[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-30 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-606157488
 
 
   > Looks great @satishkotha Once you rebase and fix conflicts, we should be 
good to commit.
   
   @vinothchandar Checking to see if you have any other feedback here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-23 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-602769754
 
 
   @vinothchandar could you take another look at this one?




[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-19 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-601473026
 
 
   > @satishkotha : Some minor comments. Will approve once you reply/address 
them. Let's also wait for @vinothchandar to take a pass
   
   @bvaradar Sounds good. I addressed your review comments.




[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-12 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-598409536
 
 
   > > if compaction at t2 takes a long time, incremental reads using 
HoodieParquetInputFormat may make progress to read commits at t3
   > 
   > IIUC this is because you are incremental pulling from the parquet only 
table? I thought we can already incremental pull via logs. no? cc @n3nash .. is 
this really needed since it will add complexity to the system..
   > 
   > Eventually, I would like incremental query/pull on MOR to be just based on 
logs..
   
   Hudi decides which input format to use based on the view type (see 
https://github.com/apache/incubator-hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java#L91).
For RO views, we use HoodieParquetInputFormat, which does not read log 
files. For RT views, we use HoodieParquetRealtimeInputFormat, which reads the 
file slice including log files. In my limited testing, incremental reads on RT 
views also do not work well (we see duplicates after compaction under some 
conditions). @bvaradar is working on fixing any broken windows for supporting 
incremental reads on RT views.
   
   We wanted to include this change to support RO views (which are the majority 
of use cases for us). I agree with you that this adds complexity, which is why 
I added more tests than usual.
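   
   To make the intended behavior concrete, here is a minimal sketch of the 
idea in the PR title, not the actual Hudi implementation. It assumes a 
simplified timeline of (instant time, action, state) tuples and a hypothetical 
helper `commits_to_read`; instant times are compared as strings. The reader 
returns completed commits after its last checkpoint but stops before the 
earliest pending compaction, so it cannot skip past t2 and later miss the 
parquet files that the compaction writes at t2.

```python
from typing import List, Optional, Tuple

# Hypothetical timeline entry: (instant_time, action, state).
# This is an illustration only, not Hudi's timeline API.
Instant = Tuple[str, str, str]


def commits_to_read(timeline: List[Instant], last_read: str) -> List[str]:
    """Return completed instants after last_read, stopping before the
    earliest compaction that has not completed yet."""
    pending = [
        time for time, action, state in timeline
        if action == "compaction" and state != "completed"
    ]
    # The earliest pending compaction acts as a barrier for the reader.
    barrier: Optional[str] = min(pending) if pending else None
    return sorted(
        time for time, action, state in timeline
        if state == "completed"
        and time > last_read
        and (barrier is None or time < barrier)
    )


# Compaction at 002 is still pending, so the reader must not advance to 003.
pending_timeline = [
    ("001", "deltacommit", "completed"),
    ("002", "compaction", "requested"),
    ("003", "deltacommit", "completed"),
]

# Once the compaction completes, all three instants become readable.
compacted_timeline = [
    ("001", "deltacommit", "completed"),
    ("002", "commit", "completed"),
    ("003", "deltacommit", "completed"),
]
```

   With `pending_timeline`, a reader checkpointed at "000" only sees "001"; 
with `compacted_timeline`, the same reader sees all three instants in order.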
   
   Other alternatives I can think of:
   1) Support incremental reads only for RT views. Incremental reads on RO can 
either fail or fall back to RT (is this your proposal in the above comment?)
   2) Instead of doing incremental reads based on hoodie commit time, use 
parquet file creation times. This approach requires substantial changes and 
would likely break some fundamental assumptions.
   
   Also, at a high level, I want to discuss adding an additional mode for 
incremental reads. Today, it is the responsibility of hoodie users to save 
commit times and use them for the next incremental read. Can we add a 'kafka 
consumer' model, where each consumer only specifies its unique id and Hudi 
tracks read progress (perhaps as part of consolidated metadata?). This would 
simplify usage and make debugging a lot easier.
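
   A rough sketch of what such a 'kafka consumer' model could look like, 
purely as an illustration of the proposal. The class `ReadProgressTracker` and 
its `poll`/`ack` methods are hypothetical names, not an existing Hudi API, and 
progress is kept in memory here rather than in any table metadata.

```python
from typing import Dict, List


class ReadProgressTracker:
    """Hypothetical per-consumer checkpoint store (not a Hudi API)."""

    def __init__(self) -> None:
        # Maps consumer id to the last commit time it acknowledged.
        self._last_commit: Dict[str, str] = {}

    def poll(self, consumer_id: str, completed_commits: List[str]) -> List[str]:
        """Return the commits this consumer has not acknowledged yet."""
        last = self._last_commit.get(consumer_id, "")
        return sorted(c for c in completed_commits if c > last)

    def ack(self, consumer_id: str, commit_time: str) -> None:
        """Record progress; never move a consumer's checkpoint backwards."""
        current = self._last_commit.get(consumer_id, "")
        self._last_commit[consumer_id] = max(current, commit_time)


tracker = ReadProgressTracker()
first_batch = tracker.poll("consumer-a", ["001", "002"])
tracker.ack("consumer-a", "002")
second_batch = tracker.poll("consumer-a", ["001", "002", "003"])
```

   The consumer only passes its id; the saved commit time lives with the 
tracker, so a second consumer with a different id starts from the beginning 
independently.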

   FYI, @n3nash is out of office for the next 10 days. @bvaradar can likely 
share more context. Let me know if you have other suggestions.




[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-597421941
 
 
   @bvaradar sorry, I messed up the rebase on 
https://github.com/apache/incubator-hudi/pull/1389. Please take a look at this 
PR instead. As discussed in the other PR, I updated RO and RT views. Spark 
DataSource does not seem to support MOR tables, so I'm skipping that part for 
now.

