Issue running Integration test in my local machine

2020-06-07 Thread Sathyaprakash G
Hi,

I am getting the below error when running the integration tests from my
local machine. Just curious to know whether anybody else has faced this
issue and whether there is a known solution.

command : mvn verify -DskipUTs=true -B
Log :
https://gist.github.com/sathyaprakashg/f1a6e6d35e5998519c89f558d9c83be3

#
412053 [main] INFO  org.apache.hudi.integ.ITTestBase  - Container :
/adhoc-1, Running command :/var/hoodie/ws/hudi-spark/run_hoodie_app.sh
--hive-sync --table-path
hdfs://namenode/docker_hoodie_multi_partition_key_mor_test --hive-url
jdbc:hive2://hiveserver:1 --table-type MERGE_ON_READ --hive-table
docker_hoodie_multi_partition_key_mor_test --use-multi-partition-keys
412053 [main] INFO  org.apache.hudi.integ.ITTestBase  -
#
457467 [dockerjava-jaxrs-async-14] INFO  org.apache.hudi.integ.ITTestBase
 - onComplete called
457479 [main] INFO  org.apache.hudi.integ.ITTestBase  - Exit code for
command : 137

The exit code is 137, which looks like it is related to an OOM
(out-of-memory) error.

I increased the memory from 1 GB to higher values in the below file and am
still getting the same error:
https://github.com/apache/hudi/blob/fb283934a33a0bc7b11f80e4149f7922fa4f0af5/hudi-spark/run_hoodie_app.sh#L40
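For what it's worth, exit code 137 is 128 plus signal 9 (SIGKILL), which
usually means the process was killed from outside the JVM, e.g. by the
kernel OOM killer or a Docker container memory limit. If that is the case
here, raising -Xmx inside the script may not help when the Docker daemon's
memory cap is the real limit. A small sketch of decoding the exit code:

```shell
#!/bin/sh
# Decode a shell exit code: values above 128 mean "killed by signal
# (code - 128)". 137 -> signal 9 (SIGKILL), which is what the kernel
# OOM killer and Docker's memory-limit enforcement both send.
code=137
if [ "$code" -gt 128 ]; then
  sig=$((code - 128))
  echo "process was killed by signal $sig ($(kill -l "$sig"))"
fi
```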

-- 
With Regards,
Sathyaprakash G


Spark datasource support for MOR table

2020-05-26 Thread Sathyaprakash G
Hi,
I see that for MOR tables, only the Read Optimized view is supported by the
Spark data source; Snapshot and Incremental queries are not supported for
the Spark data source.

Just curious to know whether Snapshot and Incremental query support for the
Spark data source is a work in progress. If so, could you please share the
related JIRA ticket?

https://hudi.apache.org/docs/querying_data.html#merge-on-read-tables
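For anyone hitting the same limitation: the view/query type on the
datasource read path is selected with an option of roughly the shape below.
These names are a hedged sketch taken from later Hudi releases (older
releases used hoodie.datasource.view.type instead), so verify them against
the docs for your version:

```properties
# Spark datasource read option selecting the query type for a MOR table
# (option names vary across Hudi versions -- treat these as illustrative).
hoodie.datasource.query.type=read_optimized
# hoodie.datasource.query.type=snapshot      # merges base files with log files
# hoodie.datasource.query.type=incremental   # also needs a begin instant time:
# hoodie.datasource.read.begin.instanttime=20200520000000
```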

-- 
With Regards,
Sathyaprakash G


Re: Merge on Read table is recreating affected parquet file on every write

2020-05-23 Thread Sathyaprakash G
Hi Vinod,

Thanks for the detailed explanation. I looked into it in a little more
detail and found that though the documentation says the default compaction
policy runs every 10 delta commits, in the code I see the default is 1. I
think a default value of 1 is a little overkill, and it also makes MERGE ON
READ behave like COPY ON WRITE, with compaction on every run. Should we
increase the default value to more than 1?

https://github.com/apache/incubator-hudi/blob/f34de3fb2738c8c36c937eba8df2a6848fafa886/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L100

https://github.com/apache/incubator-hudi/commit/605af8a82f2cb0c5ea92ba4a12d0684571a17599
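In case a different trigger is wanted before any default change lands, the
threshold can be overridden per writer. A hedged sketch of the relevant
property (name taken from HoodieCompactionConfig; verify against your Hudi
version):

```properties
# Number of delta commits to accumulate before an inline compaction is
# scheduled (maps to withMaxNumDeltaCommitsBeforeCompaction in
# HoodieCompactionConfig).
hoodie.compact.inline.max.delta.commits=10
```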

On Fri, May 22, 2020 at 11:07 AM Vinoth Chandar  wrote:

> Hi,
>
> Sorry, this slipped through the cracks. By default, the compaction policy
> would run every 10 delta commits or so.
>
> https://hudi.apache.org/docs/configurations.html#withMaxNumDeltaCommitsBeforeCompaction
>
>
>
> >> but in addition to new log file, i also see that corresponding parquet
> >> file is also rewritten.
> Did you also have inserts in the second delta commit? Inserts go to a new
> parquet file, while updates go to the log as of now.
>
> >> My question is whether when we write update to MERGE ON READ table,
> >> compaction is always called?
> Hudi supports both sync/inline compaction, which is called with every
> update, and async compaction, where it happens in parallel.
> So this depends on deltastreamer vs. the datasource. I see you are using
> the spark datasource; we only support inline compaction on it.
> That said, we are planning to add similar async compaction support for
> structured streaming sink in 0.6.0. Are you interested in that?
>
>
> On Wed, May 20, 2020 at 2:10 PM Sathyaprakash G 
> wrote:
>
> > Adding links to the images:
> >
> > https://pasteboard.co/J9iAB10.png
> > https://pasteboard.co/J9iB1gZ.png
> > https://pasteboard.co/J9iBgpx.png
> >
> > On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G <
> sathyapraka...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I created a Merge on Read table and then tried to update a record by
> > > writing the updated record to the hudi table base path.
> > >
> > > When I look at the affected partition year=2020/month=05, I was
> > > expecting just a new log file with the updated record, but in addition
> > > to the new log file, I also see that the corresponding parquet file is
> > > rewritten.
> > >
> > > I see that there is a compaction request, based on the below file:
> > > .hoodie/.aux/20200520195745.compaction.requested
> > >
> > > My question is: when we write an update to a MERGE ON READ table, is
> > > compaction always called? Or is there a setting that controls whether
> > > compaction is automatically called on a particular write?
> > >
> > > This is the code I ran; I have also attached a few screenshots of the
> > > written files:
> > > https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
> > >
> > >
> > > --
> > > With Regards,
> > > Sathyaprakash G
> > >
> >
> >
> > --
> > With Regards,
> > Sathyaprakash G
> >
>


-- 
With Regards,
Sathyaprakash G


Re: Merge on Read table is recreating affected parquet file on every write

2020-05-20 Thread Sathyaprakash G
Adding links to the images:

https://pasteboard.co/J9iAB10.png
https://pasteboard.co/J9iB1gZ.png
https://pasteboard.co/J9iBgpx.png

On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G 
wrote:

> Hi,
>
> I created a Merge on Read table and then tried to update a record by
> writing the updated record to the hudi table base path.
>
> When I look at the affected partition year=2020/month=05, I was expecting
> just a new log file with the updated record, but in addition to the new
> log file, I also see that the corresponding parquet file is rewritten.
>
> I see that there is a compaction request, based on the below file:
> .hoodie/.aux/20200520195745.compaction.requested
>
> My question is: when we write an update to a MERGE ON READ table, is
> compaction always called? Or is there a setting that controls whether
> compaction is automatically called on a particular write?
>
> This is the code I ran; I have also attached a few screenshots of the
> written files:
> https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
>
>
> --
> With Regards,
> Sathyaprakash G
>


-- 
With Regards,
Sathyaprakash G


Merge on Read table is recreating affected parquet file on every write

2020-05-20 Thread Sathyaprakash G
Hi,

I created a Merge on Read table and then tried to update a record by
writing the updated record to the hudi table base path.

When I look at the affected partition year=2020/month=05, I was expecting
just a new log file with the updated record, but in addition to the new
log file, I also see that the corresponding parquet file is rewritten.

I see that there is a compaction request, based on the below file:
.hoodie/.aux/20200520195745.compaction.requested

My question is: when we write an update to a MERGE ON READ table, is
compaction always called? Or is there a setting that controls whether
compaction is automatically called on a particular write?

This is the code I ran; I have also attached a few screenshots of the
written files:
https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
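For reference, writer-side compaction is controlled by properties of
roughly this shape (names from HoodieCompactionConfig; a hedged sketch, so
verify against your Hudi version):

```properties
# Whether compaction runs inline as part of each write to a MOR table.
hoodie.compact.inline=true
# How many delta commits accumulate before an inline compaction is scheduled.
hoodie.compact.inline.max.delta.commits=10
```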


-- 
With Regards,
Sathyaprakash G