[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

    [ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758833#comment-17758833 ]

Vinoth Chandar edited comment on HUDI-1623 at 8/29/23 2:32 AM:
---

On the naming of the active timeline instants: let's use *${start_time}_${completion_time}.${action}* for completed instants, and leave the requested/inflight instants unchanged.

was (Author: vc):
    On the naming of the active timeline instants: let's use *{_}${start_time}_{_}${completion_time}.${action}* for completed instants, and leave the requested/inflight instants unchanged.

> Support start_commit_time & end_commit_times for serializable incremental pull
> --
>
>                 Key: HUDI-1623
>                 URL: https://issues.apache.org/jira/browse/HUDI-1623
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Nishith Agarwal
>            Assignee: Danny Chen
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> We suggest a new file naming for the *completed* metadata file:
> ${start_time}.${action}.${completion_time}
>
> We also need a global *Time Generator* that ensures timestamps are generated
> in monotonically increasing order, for example by holding a mutex lock with
> the last generated timestamp backed up there. Say it holds a lock *L1*.
> Each instant time generation needs to be guarded by the lock.
>
> Before creating the completed file, we also need a lock guard from L1.
>
> Things to note:
> 1. we only add the completion timestamp to the completed metadata file;
> 2. we only add the lock guard to the completed metadata file creation, not
> the whole committing procedure;
> 3. for regular instant time generation, we also need a lock (that we should
> ship out by default)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
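
The global Time Generator described in the issue text can be illustrated with a minimal in-memory sketch. This is not Hudi code: the class name `MonotonicTimeGeneratorSketch` and its fields are hypothetical, and a plain `ReentrantLock` stands in for the lock *L1*, which in a real deployment would have to be shared across writers.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Hudi code: a lock-guarded generator that never
// returns the same or an earlier timestamp twice within one process.
class MonotonicTimeGeneratorSketch {
    // Assumption: an in-process mutex plays the role of the lock "L1".
    private final ReentrantLock l1 = new ReentrantLock();
    // The last timestamp handed out, kept next to the lock.
    private long lastGenerated = 0L;

    long generateTime() {
        l1.lock();
        try {
            long now = System.currentTimeMillis();
            // If the clock has not advanced (or moved backwards), step past
            // the last generated value instead of reusing it.
            long next = (now > lastGenerated) ? now : lastGenerated + 1;
            lastGenerated = next;
            return next;
        } finally {
            l1.unlock();
        }
    }
}
```

Calling `generateTime()` in a tight loop yields strictly increasing values even when several calls fall within the same millisecond.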
[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

    [ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758833#comment-17758833 ]

Vinoth Chandar edited comment on HUDI-1623 at 8/29/23 2:31 AM:
---

On the naming of the active timeline instants: let's use *${start_time}_${completion_time}.${action}* for completed instants, and leave the requested/inflight instants unchanged.

was (Author: vc):
    On the naming of the active timeline instants: let's use _${start_time}_${completion_time}.${action}_ for completed instants, and leave the requested/inflight instants unchanged.
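
The naming proposed in the comment above, ${start_time}_${completion_time}.${action}, could be built and parsed roughly as follows. This is a sketch only: `CompletedInstantName` is a hypothetical helper, not a Hudi class, and it assumes that neither timestamp contains `_` or `.`.

```java
// Hypothetical helper, not a Hudi class: formats and parses file names of the
// form ${start_time}_${completion_time}.${action} for completed instants.
class CompletedInstantName {
    final String startTime;
    final String completionTime;
    final String action;

    CompletedInstantName(String startTime, String completionTime, String action) {
        this.startTime = startTime;
        this.completionTime = completionTime;
        this.action = action;
    }

    // Builds the file name for a completed instant.
    String fileName() {
        return startTime + "_" + completionTime + "." + action;
    }

    // Splits a file name at the first '_' and at the first '.' after it.
    // Assumption: timestamps contain neither '_' nor '.'.
    static CompletedInstantName parse(String fileName) {
        int underscore = fileName.indexOf('_');
        int dot = fileName.indexOf('.', underscore + 1);
        if (underscore <= 0 || dot <= underscore + 1 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("Not a completed instant file name: " + fileName);
        }
        return new CompletedInstantName(
            fileName.substring(0, underscore),
            fileName.substring(underscore + 1, dot),
            fileName.substring(dot + 1));
    }
}
```

With illustrative values, `CompletedInstantName.parse("20230829023100000_20230829023200000.commit")` yields the start time, the completion time, and the action `commit`. A requested or inflight file name without a completion time is rejected, which matches the proposal to leave those names unchanged.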
[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

    [ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758834#comment-17758834 ]

Vinoth Chandar edited comment on HUDI-1623 at 8/25/23 5:40 AM:
---

On TrueTime, we add a new _TrueTimeGenerator_ interface. By default, we rely on the existing lock provider interface.
{code:java}
// Sleeps for the maximum expected clock skew while holding the lock.
class WaitBasedTrueTimeGenerator implements TrueTimeGenerator {
    long maxExpectedClockSkewMs;
    LockProvider lock;

    long generateTime() {
        try (lock) { // pseudocode: hold the lock for the duration of the block
            long ts = System.currentTimeMillis();
            Thread.sleep(maxExpectedClockSkewMs);
            return ts;
        }
    }
}
{code}
Without relying on clock skew,
{code:java}
// Persists the latest generated time and always moves past it.
class StatefulTrueTimeGenerator implements TrueTimeGenerator {
    String timeStampFilePath = ".../.hoodie/truetime_latest";
    LockProvider lock;

    long generateTime() {
        try (lock) { // pseudocode: hold the lock for the duration of the block
            long ts = System.currentTimeMillis(); // current wall-clock time
            long currentMaxTrueTime = readAsLong(timeStampFilePath);
            long newTrueTime = Math.max(ts, currentMaxTrueTime + 100);
            writeAsLong(timeStampFilePath, newTrueTime);
            return newTrueTime;
        }
    }
}
{code}

was (Author: vc):
    On TrueTime, we add a new _TrueTimeGenerator_ interface. By default, we rely on the existing lock provider interface.
{code:java}
class WaitBasedTrueTimeGenerator implements TrueTimeGenerator {
    long maxExpectedClockSkewMs;
    LockProvider lock;

    long generateTime() {
        try (lock) {
            long ts = System.currentTimeMillis();
            Thread.sleep(maxExpectedClockSkewMs);
            return ts;
        }
    }
}
{code}
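
The StatefulTrueTimeGenerator pseudocode in the comment can be turned into a small self-contained sketch. The following is illustrative only and makes several assumptions: `StatefulTimeGeneratorSketch` is a hypothetical name, a local `ReentrantLock` stands in for Hudi's lock provider, the timestamp file lives on the local file system, and a missing file is treated as "no time generated yet".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Hudi code: a file-backed time generator.
class StatefulTimeGeneratorSketch {
    private final Path timestampFile; // stand-in for ".../.hoodie/truetime_latest"
    private final long minStepMs;     // the comment's pseudocode uses 100
    // Assumption: an in-process lock replaces the distributed lock provider.
    private final Lock lock = new ReentrantLock();

    StatefulTimeGeneratorSketch(Path timestampFile, long minStepMs) {
        this.timestampFile = timestampFile;
        this.minStepMs = minStepMs;
    }

    // Reads the last persisted time, or 0 if nothing has been written yet.
    long readAsLong() {
        try {
            if (!Files.exists(timestampFile)) {
                return 0L;
            }
            byte[] bytes = Files.readAllBytes(timestampFile);
            return Long.parseLong(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrites the persisted time with the given value.
    void writeAsLong(long value) {
        try {
            Files.write(timestampFile, Long.toString(value).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long generateTime() {
        lock.lock();
        try {
            long ts = System.currentTimeMillis();
            long currentMax = readAsLong();
            // Take the wall clock if it is ahead; otherwise step past the
            // persisted maximum so the result never goes backwards.
            long next = Math.max(ts, currentMax + minStepMs);
            writeAsLong(next);
            return next;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the persisted maximum is consulted on every call, a writer whose local clock lags behind still receives a time later than any previously generated one, which is the property the comment describes as not relying on clock skew.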