[GitHub] [hudi] bvaradar commented on issue #1979: [SUPPORT]: Is it possible to incrementally read only upserted rows where a material change has occurred?

2020-08-27 Thread GitBox


bvaradar commented on issue #1979:
URL: https://github.com/apache/hudi/issues/1979#issuecomment-682061419


   Will close the ticket for now. Please reopen if we need to discuss more on 
this topic.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #1979: [SUPPORT]: Is it possible to incrementally read only upserted rows where a material change has occurred?

2020-08-26 Thread GitBox


bvaradar commented on issue #1979:
URL: https://github.com/apache/hudi/issues/1979#issuecomment-680998998


   @hughfdjackson : Good point about incrementally reading multiple commits. 
The variation you suggested seems to make sense. 







[GitHub] [hudi] bvaradar commented on issue #1979: [SUPPORT]: Is it possible to incrementally read only upserted rows where a material change has occurred?

2020-08-24 Thread GitBox


bvaradar commented on issue #1979:
URL: https://github.com/apache/hudi/issues/1979#issuecomment-679522323


   @hughfdjackson : In general, an incremental read cannot discard duplicates 
for MOR table types, because the merging of records is deferred to 
compaction.
   
   I was thinking about alternative ways to achieve your use case for COW 
tables using an application-level boolean flag. Let me know if this makes sense:
   
   1. Introduce an additional boolean column "changed". Its default value is false.
   2. Plug in your own implementation of HoodieRecordPayload.
   3(a). In HoodieRecordPayload.getInsertValue(), return an Avro record with 
changed = true. This function is called the first time a new record is 
inserted.
   3(b). In HoodieRecordPayload.combineAndGetUpdateValue(), set changed = false 
if you determine there is no material change; otherwise set it to true.
   
   In your incremental query, add the filter changed = true to exclude the 
records without material changes.
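
The flag scheme above can be sketched in plain Python, independent of Hudi. This is only a simulation of the logic: `get_insert_value` and `combine_and_get_update_value` are hypothetical stand-ins for `HoodieRecordPayload.getInsertValue()` and `HoodieRecordPayload.combineAndGetUpdateValue()`, records are plain dicts rather than Avro records, and `MATERIAL_FIELDS`, `upsert`, and `incremental_read` are names assumed for illustration. A real implementation would be a class implementing HoodieRecordPayload.

```python
# Simulation only: plain dicts stand in for Avro records, and these functions
# mimic the logic of the payload hooks, not the Hudi API itself.

# Assumption: only these columns count as a "material" change.
MATERIAL_FIELDS = ("name", "price")


def get_insert_value(incoming):
    """First write for a key: the row is new, so mark it as changed."""
    return {**incoming, "changed": True}


def combine_and_get_update_value(stored, incoming):
    """Later write for an existing key: flag the row only on a material change."""
    changed = any(stored.get(f) != incoming.get(f) for f in MATERIAL_FIELDS)
    return {**incoming, "changed": changed}


def upsert(table, batch, key="id"):
    """Apply a batch to an in-memory table and return the rows this commit wrote."""
    written = []
    for incoming in batch:
        stored = table.get(incoming[key])
        if stored is None:
            row = get_insert_value(incoming)
        else:
            row = combine_and_get_update_value(stored, incoming)
        table[incoming[key]] = row
        written.append(row)
    return written


def incremental_read(written):
    """Equivalent of adding `changed = true` to the incremental query."""
    return [row for row in written if row["changed"]]


table = {}
commit1 = upsert(table, [
    {"id": 1, "name": "a", "price": 10, "ts": 100},
    {"id": 2, "name": "b", "price": 20, "ts": 100},
])
commit2 = upsert(table, [
    {"id": 1, "name": "a", "price": 10, "ts": 200},  # only ts moved
    {"id": 2, "name": "b", "price": 25, "ts": 200},  # price changed
])
changed_ids = [row["id"] for row in incremental_read(commit2)]
```

In this run, both rows of the first commit pass the filter because they are fresh inserts, while the second commit yields only key 2, whose price actually changed.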







[GitHub] [hudi] bvaradar commented on issue #1979: [SUPPORT]: Is it possible to incrementally read only upserted rows where a material change has occurred?

2020-08-21 Thread GitBox


bvaradar commented on issue #1979:
URL: https://github.com/apache/hudi/issues/1979#issuecomment-678595102


   Right, this dataset is essentially a log, but if you are only concerned 
with incremental queries, you will read only the records added by the new 
commits. Also note that your dataset will keep growing, so the applicability 
of this approach is limited.
   
   I don't see another generic way to do this.
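
To make the trade-off concrete, here is a minimal sketch of a log-like table read incrementally. It is plain Python rather than Hudi code: `incremental_read`, the `commit_time` field, and the sample rows are hypothetical, and treating the begin commit as exclusive is an assumption made for this illustration.

```python
# Simulation only: a list of dicts stands in for a log-like table in which
# every materially different version of a record is stored as its own row.

def incremental_read(rows, begin_commit):
    """Return rows written by commits after begin_commit (assumed exclusive)."""
    return [row for row in rows if row["commit_time"] > begin_commit]


log_table = [
    {"id": 1, "price": 10, "commit_time": "001"},
    {"id": 2, "price": 20, "commit_time": "001"},
    {"id": 1, "price": 12, "commit_time": "002"},  # new version of id 1
]

new_rows = incremental_read(log_table, begin_commit="001")
total_rows = len(log_table)  # old versions are kept, so this only grows
```

The incremental read stays small because it touches only the latest commit, but the table itself retains every earlier version.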







[GitHub] [hudi] bvaradar commented on issue #1979: [SUPPORT]: Is it possible to incrementally read only upserted rows where a material change has occurred?

2020-08-19 Thread GitBox


bvaradar commented on issue #1979:
URL: https://github.com/apache/hudi/issues/1979#issuecomment-676490788


   One option to make this work today is to also include the columns that get 
updated in the composite record key. Hudi's key uniqueness constraint then 
gives the desired result: you can filter out duplicates first and then upsert 
the rest of the records in the batch. 
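
As a rough illustration of this workaround, the sketch below builds a composite key from the business key plus the columns that can change, then drops incoming rows whose composite key has already been written. It is plain Python, not Hudi code: `composite_key`, `filter_new`, `upsert`, the `existing_keys` set, and the column names are all hypothetical, and in Hudi the existence check would come from the table's record keys rather than an in-memory set.

```python
# Simulation only: a set of strings stands in for the record keys already
# present in the table.

# Assumption: "id" is the business key; "name" and "price" are the columns
# whose updates should count as a material change.
KEY_FIELDS = ("id", "name", "price")


def composite_key(row):
    """Join the key columns into one string, e.g. 'id:1,name:a,price:10'."""
    return ",".join(f"{field}:{row[field]}" for field in KEY_FIELDS)


def filter_new(existing_keys, batch):
    """Keep only rows whose composite key has not been written before."""
    return [row for row in batch if composite_key(row) not in existing_keys]


def upsert(existing_keys, batch):
    """Drop duplicates first, then write the remaining rows."""
    fresh = filter_new(existing_keys, batch)
    existing_keys.update(composite_key(row) for row in fresh)
    return fresh


existing_keys = set()
first = upsert(existing_keys, [{"id": 1, "name": "a", "price": 10, "ts": 100}])
second = upsert(existing_keys, [
    {"id": 1, "name": "a", "price": 10, "ts": 200},  # same values: skipped
    {"id": 1, "name": "a", "price": 12, "ts": 200},  # new price: written
])
```

Only the row with the new price survives the filter in the second batch, so a reader of that commit sees just the material change.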


