[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-28 Thread Vinoth Chandar (Jira)


[ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758833#comment-17758833 ]

Vinoth Chandar edited comment on HUDI-1623 at 8/29/23 2:32 AM:
---

On the naming of the active timeline instants: 

let's use *${start_time}_${completion_time}.${action}* for completed
instants, and leave the requested/inflight instants unchanged.
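A minimal sketch of the proposed completed-instant name. It assumes both timestamps are fixed-width digit strings, so they contain no underscore, and that the action contains no dot. CompletedInstantName, format and parse are hypothetical names used only for illustration. They are not Hudi API.

{code:java}
import java.util.Objects;

// Hypothetical helper: builds and parses ${start_time}_${completion_time}.${action}.
// Requested/inflight names are not handled here; they stay as they are today.
final class CompletedInstantName {
  final String startTime;
  final String completionTime;
  final String action;

  CompletedInstantName(String startTime, String completionTime, String action) {
    this.startTime = Objects.requireNonNull(startTime);
    this.completionTime = Objects.requireNonNull(completionTime);
    this.action = Objects.requireNonNull(action);
  }

  // File name for a completed instant.
  String format() {
    return startTime + "_" + completionTime + "." + action;
  }

  // Assumes neither timestamp contains '_' and the action contains no '.'.
  static CompletedInstantName parse(String fileName) {
    int underscore = fileName.indexOf('_');
    int dot = fileName.indexOf('.', underscore + 1);
    if (underscore <= 0 || dot <= underscore + 1 || dot == fileName.length() - 1) {
      throw new IllegalArgumentException("Not a completed instant file name: " + fileName);
    }
    return new CompletedInstantName(
        fileName.substring(0, underscore),
        fileName.substring(underscore + 1, dot),
        fileName.substring(dot + 1));
  }
}
{code}

With fixed-width timestamps, a plain lexicographic sort of these names still orders completed instants by start time. The completion time can also be read from the name alone.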


was (Author: vc):
On the naming of the active timeline instants: 

let's use *${start_time}_${completion_time}.${action}* for completed
instants, and leave the requested/inflight instants unchanged.

> Support start_commit_time & end_commit_times for serializable incremental pull
> --
>
>                 Key: HUDI-1623
>                 URL: https://issues.apache.org/jira/browse/HUDI-1623
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Nishith Agarwal
>            Assignee: Danny Chen
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> We suggest a new file naming for the *completed* metadata file:
> ${start_time}.${action}.${completion_time}
>  
> We also need a global *Time Generator* that guarantees monotonically
> increasing timestamps, for example by holding a mutex lock that guards the
> last generated timestamp. Say it holds a lock *L1*; every instant time
> generation must be guarded by that lock.
>  
> Before creating the completed file, we also need to acquire L1.
>  
> Things to note:
> 1. we only add the completion timestamp to the completed metadata file;
> 2. we only guard the creation of the completed metadata file with the lock,
> not the whole committing procedure;
> 3. regular instant time generation also needs a lock (which we should
> ship by default).
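The locking scheme in the description can be sketched as follows. This is a minimal single-JVM illustration. A ReentrantLock stands in for lock L1, and a multi-writer deployment would need a distributed lock instead. MonotonicTimeGenerator, generate and complete are hypothetical names.

{code:java}
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of the global Time Generator. A JVM-local ReentrantLock
// stands in for lock L1; multiple writers would need a distributed lock.
final class MonotonicTimeGenerator {
  private final ReentrantLock l1 = new ReentrantLock();
  private final LongSupplier clock;
  private long lastGenerated = Long.MIN_VALUE; // only read/written while holding L1

  MonotonicTimeGenerator(LongSupplier clock) {
    this.clock = clock;
  }

  // Every instant time generation is guarded by L1.
  long generate() {
    l1.lock();
    try {
      // Never return a value <= the previous one, even if the clock stalls or goes back.
      long next = Math.max(clock.getAsLong(), lastGenerated + 1);
      lastGenerated = next;
      return next;
    } finally {
      l1.unlock();
    }
  }

  // Only the completed-file creation is guarded, not the whole commit:
  // the completion time is drawn and the file is created while holding L1.
  long complete(LongConsumer createCompletedFile) {
    l1.lock();
    try {
      long completionTime = generate(); // ReentrantLock lets this thread re-acquire L1
      createCompletedFile.accept(completionTime);
      return completionTime;
    } finally {
      l1.unlock();
    }
  }
}
{code}

Because complete holds L1 only while it draws the completion time and creates the file, the rest of the commit runs without the lock. This matches point 2 above.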



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-28 Thread Vinoth Chandar (Jira)


[ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758833#comment-17758833 ]

Vinoth Chandar edited comment on HUDI-1623 at 8/29/23 2:31 AM:
---

On the naming of the active timeline instants: 

let's use *${start_time}_${completion_time}.${action}* for completed
instants, and leave the requested/inflight instants unchanged.


was (Author: vc):
On the naming of the active timeline instants: 

let's use ${start_time}_${completion_time}.${action} for completed instants,
and leave the requested/inflight instants unchanged.



[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-24 Thread Vinoth Chandar (Jira)


[ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758834#comment-17758834 ]

Vinoth Chandar edited comment on HUDI-1623 at 8/25/23 5:40 AM:
---

On TrueTime: we add a new _TrueTimeGenerator_ interface.

By default, we rely on the existing lock provider interface:
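{code:java}
class WaitBasedTrueTimeGenerator implements TrueTimeGenerator {
  long maxExpectedClockSkewMs;
  LockProvider lock;

  long generateTime() throws InterruptedException {
    lock.lock(); // hold the lock for the whole read-and-wait
    try {
      long ts = System.currentTimeMillis();
      // wait out the max expected clock skew before releasing, so the next
      // lock holder cannot read a time earlier than ts
      Thread.sleep(maxExpectedClockSkewMs);
      return ts;
    } finally {
      lock.unlock();
    }
  }
}
{code}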
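To show why sleeping for the skew bound helps, here is a self-contained variant. Any java.util.concurrent.locks.Lock stands in for LockProvider. The clock and the sleep are injectable, so clock skew between writers can be simulated. WaitBasedGenerator and realSleep are hypothetical names. The sketch also assumes that maxExpectedClockSkewMs bounds the difference between any two writers' clocks.

{code:java}
import java.util.concurrent.locks.Lock;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical, self-contained variant of WaitBasedTrueTimeGenerator.
// Any Lock stands in for LockProvider; clock and sleep are injectable so that
// per-writer clock skew can be simulated.
final class WaitBasedGenerator {
  private final Lock lock;
  private final LongSupplier clock;
  private final LongConsumer sleeper;
  private final long maxExpectedClockSkewMs; // assumed bound on |clockA - clockB| for any two writers

  WaitBasedGenerator(Lock lock, LongSupplier clock, LongConsumer sleeper, long maxExpectedClockSkewMs) {
    this.lock = lock;
    this.clock = clock;
    this.sleeper = sleeper;
    this.maxExpectedClockSkewMs = maxExpectedClockSkewMs;
  }

  long generateTime() {
    lock.lock();
    try {
      long ts = clock.getAsLong();
      // Keep holding the lock until the skew bound has elapsed, so the next
      // holder's clock cannot read a time earlier than ts.
      sleeper.accept(maxExpectedClockSkewMs);
      return ts;
    } finally {
      lock.unlock();
    }
  }

  // Real-time sleeper; on interrupt it restores the flag and fails instead of returning early.
  static void realSleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting out clock skew", e);
    }
  }
}
{code}

Consider a simulation with two writers whose clocks differ by 10 ms and maxExpectedClockSkewMs = 10. The second writer reads the same time as the first, not an earlier one. Generated times are therefore non-decreasing across writers but can tie. The wait only rules out going backwards.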
Without relying on a clock-skew bound, we persist the latest generated time:
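{code:java}
class StatefulTrueTimeGenerator implements TrueTimeGenerator {
  String timeStampFilePath = ".../.hoodie/truetime_latest";
  LockProvider lock;

  long generateTime() {
    lock.lock();
    try {
      long ts = System.currentTimeMillis();
      // last true time handed out by any writer, persisted under the table path
      long currentMaxTrueTime = readAsLong(timeStampFilePath);
      // move at least 100 ms past the last persisted value, even if the local clock lags
      long newTrueTime = Math.max(ts, currentMaxTrueTime + 100);
      writeAsLong(timeStampFilePath, newTrueTime);
      return newTrueTime;
    } finally {
      lock.unlock();
    }
  }
}
{code}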
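Here is a runnable, single-JVM version of the stateful generator. It assumes the table-level file is reachable as a local java.nio.file.Path and that a Lock stands in for LockProvider. FileBackedTrueTimeGenerator and minStepMs are hypothetical names; minStepMs replaces the draft's hard-coded 100 ms step. A missing file is treated as "no time issued yet".

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.Lock;
import java.util.function.LongSupplier;

// Hypothetical single-JVM version of StatefulTrueTimeGenerator: the latest
// issued time is persisted in a file, and any Lock stands in for LockProvider.
final class FileBackedTrueTimeGenerator {
  private final Path timeStampFile;
  private final Lock lock;
  private final LongSupplier clock;
  private final long minStepMs; // the draft hard-codes 100

  FileBackedTrueTimeGenerator(Path timeStampFile, Lock lock, LongSupplier clock, long minStepMs) {
    this.timeStampFile = timeStampFile;
    this.lock = lock;
    this.clock = clock;
    this.minStepMs = minStepMs;
  }

  long generateTime() {
    lock.lock();
    try {
      long ts = clock.getAsLong();
      long currentMaxTrueTime = readAsLong();
      // Move at least minStepMs past the last persisted value, even if this clock lags.
      long newTrueTime = Math.max(ts, currentMaxTrueTime + minStepMs);
      // Note: a plain write is not crash-atomic.
      Files.write(timeStampFile, Long.toString(newTrueTime).getBytes(StandardCharsets.UTF_8));
      return newTrueTime;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      lock.unlock();
    }
  }

  // A missing file means no time has been issued yet.
  private long readAsLong() throws IOException {
    if (!Files.exists(timeStampFile)) {
      return Long.MIN_VALUE;
    }
    String text = new String(Files.readAllBytes(timeStampFile), StandardCharsets.UTF_8);
    return Long.parseLong(text.trim());
  }
}
{code}

The last issued value is persisted. A generator created after a restart therefore continues from it, even if the local clock has moved backwards in the meantime.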


