Issue running integration tests on my local machine
Hi,

I am getting the error below when running the integration tests from my local machine. Just curious to know whether anybody else has faced this issue and whether there is a known solution for it.

Command: mvn verify -DskipUTs=true -B

Log: https://gist.github.com/sathyaprakashg/f1a6e6d35e5998519c89f558d9c83be3

412053 [main] INFO org.apache.hudi.integ.ITTestBase - Container : /adhoc-1, Running command :/var/hoodie/ws/hudi-spark/run_hoodie_app.sh --hive-sync --table-path hdfs://namenode/docker_hoodie_multi_partition_key_mor_test --hive-url jdbc:hive2://hiveserver:1 --table-type MERGE_ON_READ --hive-table docker_hoodie_multi_partition_key_mor_test --use-multi-partition-keys
412053 [main] INFO org.apache.hudi.integ.ITTestBase -
457467 [dockerjava-jaxrs-async-14] INFO org.apache.hudi.integ.ITTestBase - onComplete called
457479 [main] INFO org.apache.hudi.integ.ITTestBase - Exit code for command : 137

The exit code is 137, which looks like it is related to an OOM error. I increased the memory from 1 GB to higher values in the file below and am still getting the same error:

https://github.com/apache/hudi/blob/fb283934a33a0bc7b11f80e4149f7922fa4f0af5/hudi-spark/run_hoodie_app.sh#L40

--
With Regards,
Sathyaprakash G
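A note on reading that failure: exit codes above 128 conventionally mean the process died from a signal, so 137 = 128 + 9, i.e. SIGKILL. In a Docker-based integration test this usually indicates the container hit a memory ceiling and was killed by the kernel OOM killer, in which case raising the JVM heap inside run_hoodie_app.sh may not help; the memory limit enforced on the container itself (or, on Docker Desktop, the memory allocated to the Docker VM in its settings) is the value that has to go up.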
Spark datasource support for MOR table
Hi,

I see that for MOR tables, only the read-optimized view is supported by the Spark datasource; snapshot and incremental queries are not supported. Just curious to know whether support for snapshot and incremental queries through the Spark datasource is work in progress. If so, could you please share the related JIRA ticket?

https://hudi.apache.org/docs/querying_data.html#merge-on-read-tables

--
With Regards,
Sathyaprakash G
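For reference, a minimal sketch of the one MOR read path the datasource does support today, the read-optimized view. This is not from the thread: the base path is hypothetical, and the option key assumes a Hudi 0.5.x-era build where the view is selected via hoodie.datasource.view.type (renamed to hoodie.datasource.query.type in later releases). Runnable from spark-shell:

    // Read-optimized view on a MOR table: only the compacted base parquet
    // files are read, so updates still sitting in log files are not
    // visible until compaction runs.
    val basePath = "hdfs://namenode/tmp/my_mor_table" // hypothetical

    val roDf = spark.read
      .format("org.apache.hudi")
      .option("hoodie.datasource.view.type", "read_optimized")
      .load(basePath + "/*/*") // glob over year=.../month=... partition dirs

    roDf.show()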
Re: Merge on Read table is recreating affected parquet file on every write
Hi Vinoth,

Thanks for the detailed explanation. I looked into it in a little more detail and found that although the documentation says the default compaction policy runs every 10 delta commits, in the code I see the default is 1. I think a default value of 1 is a little overkill, and it also makes MERGE_ON_READ behave like COPY_ON_WRITE, with compaction on every run. Should we increase the default to more than 1?

https://github.com/apache/incubator-hudi/blob/f34de3fb2738c8c36c937eba8df2a6848fafa886/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L100
https://github.com/apache/incubator-hudi/commit/605af8a82f2cb0c5ea92ba4a12d0684571a17599

On Fri, May 22, 2020 at 11:07 AM Vinoth Chandar wrote:

> Hi,
>
> Sorry, this slipped through the cracks. By default, the compaction policy
> would run every 10 delta commits or so.
>
> https://hudi.apache.org/docs/configurations.html#withMaxNumDeltaCommitsBeforeCompaction
>
> >> but in addition to the new log file, I also see that the corresponding
> >> parquet file is also rewritten.
> Did you also have inserts in the second delta commit? Inserts go to a new
> parquet file, while updates go to the log as of now.
>
> >> My question is whether, when we write an update to a MERGE_ON_READ
> >> table, compaction is always called?
> Hudi supports both sync/inline compaction, which is called with every
> update, and async compaction, which happens in parallel. So this is
> dependent on the deltastreamer or datasource. I see you are using the
> Spark datasource; we only support inline compaction on it.
> That said, we are planning to add similar async compaction support for the
> structured streaming sink in 0.6.0. Are you interested in that?
>
> On Wed, May 20, 2020 at 2:10 PM Sathyaprakash G
> wrote:
>
> > Adding links to the images:
> >
> > https://pasteboard.co/J9iAB10.png
> > https://pasteboard.co/J9iB1gZ.png
> > https://pasteboard.co/J9iBgpx.png
> >
> > On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G <
> > sathyapraka...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I created a Merge on Read table and then tried to update a record by
> > > writing the updated record to the Hudi table base path.
> > >
> > > When I look at the affected partition year=2020/month=05, I was
> > > expecting just a new log file with the updated record, but in
> > > addition to the new log file, I see that the corresponding parquet
> > > file is also rewritten.
> > >
> > > I see that there is a compaction request, based on the file below:
> > > .hoodie/.aux/20200520195745.compaction.requested
> > >
> > > My question is: when we write an update to a MERGE_ON_READ table, is
> > > compaction always called? Or is there a setting that controls whether
> > > compaction is automatically called on a particular write?
> > >
> > > This is the code I ran; I have also attached a few screenshots of the
> > > written files:
> > > https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
> > >
> > > --
> > > With Regards,
> > > Sathyaprakash G
> >
> > --
> > With Regards,
> > Sathyaprakash G

--
With Regards,
Sathyaprakash G
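For anyone following along, here is a minimal sketch (not from the thread; the table name, fields, and path are hypothetical) of pinning the inline-compaction trigger explicitly on a datasource write, using the keys from the HoodieCompactionConfig linked above rather than relying on the disputed default. Runnable from spark-shell against a 0.5.x-era build:

    import org.apache.spark.sql.SaveMode
    import spark.implicits._

    // A one-row updates DataFrame, just to make the sketch self-contained.
    val df = Seq((1, "2020", "some-value", 1590000000L))
      .toDF("id", "year", "data", "ts")

    df.write
      .format("org.apache.hudi")
      .option("hoodie.table.name", "my_mor_table")
      // Key is hoodie.datasource.write.storage.type in 0.5.x
      // (hoodie.datasource.write.table.type in later releases).
      .option("hoodie.datasource.write.storage.type", "MERGE_ON_READ")
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.partitionpath.field", "year")
      .option("hoodie.datasource.write.precombine.field", "ts")
      // Compact inline, but only once every 10 delta commits instead of
      // the in-code default of 1 discussed above.
      .option("hoodie.compact.inline", "true")
      .option("hoodie.compact.inline.max.delta.commits", "10")
      .mode(SaveMode.Append)
      .save("hdfs://namenode/tmp/my_mor_table")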
Re: Merge on Read table is recreating affected parquet file on every write
Adding links to the images:

https://pasteboard.co/J9iAB10.png
https://pasteboard.co/J9iB1gZ.png
https://pasteboard.co/J9iBgpx.png

On Wed, May 20, 2020 at 2:03 PM Sathyaprakash G wrote:

> Hi,
>
> I created a Merge on Read table and then tried to update a record by
> writing the updated record to the Hudi table base path.
>
> When I look at the affected partition year=2020/month=05, I was expecting
> just a new log file with the updated record, but in addition to the new
> log file, I see that the corresponding parquet file is also rewritten.
>
> I see that there is a compaction request, based on the file below:
> .hoodie/.aux/20200520195745.compaction.requested
>
> My question is: when we write an update to a MERGE_ON_READ table, is
> compaction always called? Or is there a setting that controls whether
> compaction is automatically called on a particular write?
>
> This is the code I ran; I have also attached a few screenshots of the
> written files:
> https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68
>
> --
> With Regards,
> Sathyaprakash G

--
With Regards,
Sathyaprakash G
Merge on Read table is recreating affected parquet file on every write
Hi,

I created a Merge on Read table and then tried to update a record by writing the updated record to the Hudi table base path.

When I look at the affected partition year=2020/month=05, I was expecting just a new log file with the updated record, but in addition to the new log file, I see that the corresponding parquet file is also rewritten.

I see that there is a compaction request, based on the file below:
.hoodie/.aux/20200520195745.compaction.requested

My question is: when we write an update to a MERGE_ON_READ table, is compaction always called? Or is there a setting that controls whether compaction is automatically called on a particular write?

This is the code I ran; I have also attached a few screenshots of the written files:
https://gist.github.com/sathyaprakashg/e5107770817f1fe5a1019633ecfafb68

--
With Regards,
Sathyaprakash G
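For anyone trying to reproduce this, one way to tell whether a particular write triggered compaction is to look at the instant files on the timeline: on a MERGE_ON_READ table, plain log-file writes complete as .deltacommit instants, while completed compactions complete as .commit instants (and a scheduled-but-not-yet-run compaction appears as the .compaction.requested file under .hoodie/.aux, as seen above). A minimal sketch, with a hypothetical base path, runnable from spark-shell:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val basePath = "hdfs://namenode/tmp/my_mor_table" // hypothetical
    val metaDir = new Path(basePath + "/.hoodie")

    val fs = metaDir.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // .deltacommit => log-file write; .commit => completed compaction.
    fs.listStatus(metaDir)
      .map(_.getPath.getName)
      .filter(n => n.endsWith(".commit") || n.endsWith(".deltacommit"))
      .sorted
      .foreach(println)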