[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2022-03-20 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-1073504409


   @pratyakshsharma 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2022-03-20 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-1073504190


   The purpose of introducing timestamps: in the past, when users wanted to 
consume from a particular position, deltastreamer could only be given an 
explicit checkpoint of offsets. For example, a Kafka topic may have 50+ 
partitions, and users need to manually configure the checkpoint string for 
them. Introducing timestamps simplifies this operation.
   
   Regarding your example: I think you are right, and I agree with your idea. 
Partition 2 should not be populated with this value.
   When this PR was written, the main consideration was to solve the problem 
of complex user configuration and to make consuming data as simple as 
possible. The behavior in the partition 2 example makes sense for some 
businesses. It may conflict with your current scenario, though, and I feel we 
can improve it.
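
To make the configuration point concrete, here is a minimal in-memory sketch of turning one timestamp into per-partition offsets. It is not Hudi or Kafka code: the class `TimestampCheckpointSketch`, its methods, and the sample data are all hypothetical, and the output uses the `topicName,0:123,1:456` checkpoint form quoted elsewhere in this thread. The sketch simply omits a partition that has no record at or after the timestamp; whether that matches the partition 2 case discussed above is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not Hudi or Kafka code. */
final class TimestampCheckpointSketch {

  /**
   * For each partition, find the offset of the first record whose timestamp is
   * at or after targetTs. Input: partition -> (record timestamp -> offset).
   * Partitions with no such record are left out instead of being given a
   * made-up offset.
   */
  static Map<Integer, Long> resolve(Map<Integer, TreeMap<Long, Long>> partitions, long targetTs) {
    Map<Integer, Long> result = new TreeMap<>();
    for (Map.Entry<Integer, TreeMap<Long, Long>> e : partitions.entrySet()) {
      // ceilingEntry returns the entry with the smallest key >= targetTs, or null.
      Map.Entry<Long, Long> hit = e.getValue().ceilingEntry(targetTs);
      if (hit != null) {
        result.put(e.getKey(), hit.getValue());
      }
    }
    return result;
  }

  /** Renders offsets as "topic,partition:offset,partition:offset". */
  static String toCheckpoint(String topic, Map<Integer, Long> offsets) {
    StringBuilder sb = new StringBuilder(topic);
    for (Map.Entry<Integer, Long> e : offsets.entrySet()) {
      sb.append(',').append(e.getKey()).append(':').append(e.getValue());
    }
    return sb.toString();
  }

  /** Three sample partitions; partition 2 has no record at or after 1500. */
  static String demo() {
    Map<Integer, TreeMap<Long, Long>> partitions = new TreeMap<>();
    TreeMap<Long, Long> p0 = new TreeMap<>();
    p0.put(1000L, 10L);
    p0.put(2000L, 11L);
    TreeMap<Long, Long> p1 = new TreeMap<>();
    p1.put(1500L, 7L);
    TreeMap<Long, Long> p2 = new TreeMap<>();
    p2.put(500L, 3L);
    partitions.put(0, p0);
    partitions.put(1, p1);
    partitions.put(2, p2);
    return toCheckpoint("events", resolve(partitions, 1500L));
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "events,0:11,1:7"
  }
}
```

With 50+ partitions the user would supply only the timestamp, and a string like the one printed here would be derived instead of typed by hand.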






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-16 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-881820983


   @nsivabalan  Thank you for your attention and your patient help!






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-16 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-881278532


   @nsivabalan  I have completed the changes as you requested, please take a 
look~
   Thank you very much for your help!






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-16 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-881203019


   > Let's try to land this by the weekend. It's been hanging for quite some time.
   
   ok.
   Sorry, I'll deal with it now, please excuse me
   






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-07-07 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-875975684


   working






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-06-18 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-863968623


   DeltaSync should reset this (...kafka.checkpoint.type) configuration 
(similar to how we reset checkpoints).
   To do it this way, we may need to store this in the metadata file. If it 
is only modified in memory, the risk is greater. I have submitted my latest 
implementation; please take a look and see whether it is feasible.
   @nsivabalan 






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-06-02 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-852970149


   I am currently facing a problem and would like to hear your opinion.
   Suppose we add this type:
hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp
   Does deltastreamer.checkpoint.key then keep its current format? The format 
is still: topicName,0:123,1:456
   If we keep that format, then when we specify, for example, 
--checkpoint 1622635064, we need to determine the relationship between 
commitMetadata.getMetadata(CHECKPOINT_KEY) and --checkpoint 1622635064 in 
org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource. This seems 
to contradict the outcome of our discussion, which was not to add 
Kafka-dependent code to DeltaSync.
   
   Do you have any suggestions for this? Thanks.
   @nsivabalan 
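
The two checkpoint shapes in question can be told apart mechanically. The following is a minimal sketch only: `OffsetCheckpointSketch` and its methods are hypothetical names, not Hudi APIs, and it assumes the offset form is exactly `topicName,partition:offset,...` while a timestamp checkpoint is a bare number such as 1622635064.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; the class and method names are not from Hudi. */
final class OffsetCheckpointSketch {

  /** Parses "topicName,0:123,1:456" into partition -> offset. */
  static Map<Integer, Long> parseOffsets(String checkpoint) {
    String[] parts = checkpoint.split(",");
    Map<Integer, Long> offsets = new TreeMap<>();
    // parts[0] is the topic name; the remaining entries are partition:offset pairs.
    for (int i = 1; i < parts.length; i++) {
      String[] pair = parts[i].split(":");
      if (pair.length != 2) {
        throw new IllegalArgumentException("Malformed partition entry: " + parts[i]);
      }
      offsets.put(Integer.parseInt(pair[0]), Long.parseLong(pair[1]));
    }
    return offsets;
  }

  /** True when the value is only digits, i.e. not in the offset form. */
  static boolean looksLikeTimestamp(String checkpoint) {
    return checkpoint.matches("\\d+");
  }

  public static void main(String[] args) {
    System.out.println(parseOffsets("topicName,0:123,1:456")); // prints "{0=123, 1=456}"
    System.out.println(looksLikeTimestamp("1622635064"));      // prints "true"
  }
}
```

Guessing the type from the value is fragile (a topic name made only of digits would be misread), which is one reason an explicit type setting is easier to reason about than inference.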






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-05-21 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-845791067


   > @liujinhui1994 : were you able to make progress on this? It would be 
nice to have this in before the next release.
   
   Sorry, I was too busy with work before. I have just sorted out the whole 
idea of this PR and clarified the goal, and will start soon.






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-04-01 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-811856655


   > Nishith and I discussed this. Here is our proposal.
   > Let's rely on Deltastreamer.Config.checkpoint to pass in any type of 
checkpoint.
   > We can add another config called "checkpoint.type" which could default to 
string for all default checkpoints. For the checkpoint of interest in this PR, 
we could set the value of this new config to "timestamp".
   > 
   > With this, it's up to each source to parse and interpret the checkpoint 
value, and DeltaSync does not need to deal with different checkpointing 
formats.
   > 
   > Having said this, DeltaSync readFromSource() should not have any changes 
in this diff.
   > KafkaOffsetGen should have logic to parse different checkpoint values, 
based on two values (deltastreamer.config.checkpoint and checkpoint.type).
   > 
   > With this, we also move source-specific checkpointing logic into the 
source-specific class and do not leak it to DeltaSync, which should be 
agnostic to different Sources.
   > 
   > @liujinhui1994 : Let me know what you think. Happy to chat more on this.
   
   Great, I will modify this PR based on this
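
The quoted proposal can be sketched roughly as follows. This is an illustration under assumptions, not the actual KafkaOffsetGen code: `CheckpointTypeSketch`, the `CheckpointType` enum, and both methods are hypothetical, and only the two values named in the proposal ("string" as the default and "timestamp") are modeled.

```java
import java.util.Locale;

/** Hypothetical sketch of the proposal; not the actual KafkaOffsetGen code. */
final class CheckpointTypeSketch {

  enum CheckpointType { STRING, TIMESTAMP }

  /** Maps the "checkpoint.type" value to a type, defaulting to STRING when unset. */
  static CheckpointType parseType(String configValue) {
    if (configValue == null || configValue.trim().isEmpty()) {
      return CheckpointType.STRING;
    }
    // valueOf throws IllegalArgumentException for values other than the two above.
    return CheckpointType.valueOf(configValue.trim().toUpperCase(Locale.ROOT));
  }

  /**
   * Source-side interpretation of the raw checkpoint. The caller passes the
   * value through untouched and never inspects its format.
   */
  static String describe(String checkpoint, String typeConfig) {
    switch (parseType(typeConfig)) {
      case TIMESTAMP:
        return "seek each partition to the first offset at or after timestamp "
            + Long.parseLong(checkpoint);
      case STRING:
      default:
        return "resume from the offsets encoded in '" + checkpoint + "'";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("topicName,0:123,1:456", null));
    System.out.println(describe("1622635064", "timestamp"));
  }
}
```

Keeping the switch inside the source-side class is what lets the generic sync code stay unaware of Kafka-specific checkpoint formats.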






[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-03-15 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799512747


   no problem
   
   
   
   
   




[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-28 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-787662595


   The current implementation is mainly in KafkaOffsetGen @wangxianghu 







[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-25 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-786452099


   I will add the unit tests; please review after that.







[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-20 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-782589121


   I have verified this; please help review.
   @wangxianghu @yanghua @nsivabalan 
   







[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-20 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-782588952


   @yanghua @wangxianghu @nsivabalan  
   
   I have verified this; please help review.
   


