[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-08-21 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1300814472


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   @nsivabalan @guanziyue I'm interested in this issue. Please count me in for 
the discussion on this topic, I might be able to provide some input and other 
problem about this issue. My slack id is Jing Zhang.
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-08-21 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1300814472


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   @nsivabalan @guanziyue I'm interested in this issue. Please count me in for 
the discussion on this topic; I might be able to provide some input. My slack 
id is Jing Zhang.
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-25 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1241088025


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   Cool



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-19 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1233767393


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   @guanziyue Thanks for response. 
   The problem would happen in two cases:
   1. Enable speculation
   2. Task failed and  Driver not fail, the following retry tasks would always 
failed until the job failed finally.
// The first task attempt write to log file.
   // (1) append to existing file file_instant_writetoken1.log.1
   // Then this task failed and Driver still alive, then a new task attempt 
would retry
   // (2) try to append to existing file file_instant_writetoken1.log.1, 
but would fail because the append marker already exists.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-15 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1231757821


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   Under this case, if we use direct marker type, the new task would always get 
an exception when it call `preLogFileOpen` because we use `create` instead of 
`createIfNotExist()`? 
   
![image](https://github.com/apache/hudi/assets/1525333/8e06dad5-b8f0-4e23-9a4f-4271d1d7096c)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-15 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1231757821


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   Under this case, if we use direct marker type, the new task would always get 
an exception when it call `preLogFileOpen` if we use `create` instead of 
`createIfNotExist()`? 
   
![image](https://github.com/apache/hudi/assets/1525333/8e06dad5-b8f0-4e23-9a4f-4271d1d7096c)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-15 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1231757821


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   Under this case, if we use direct marker type, the new task would always get 
an exception when it call `preLogFileOpen`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] beyond1920 commented on a diff in pull request #4913: [HUDI-1517] create marker file for every log file

2023-06-15 Thread via GitHub


beyond1920 commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1231757821


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -273,4 +280,31 @@ protected static Option 
toAvroRecord(HoodieRecord record, Schema
   return Option.empty();
 }
   }
+
+  protected class AppendLogWriteCallback implements HoodieLogFileWriteCallback 
{
+// here we distinguish log files created from log files being appended. 
Considering following scenario:
+// An appending task write to log file.
+// (1) append to existing file file_instant_writetoken1.log.1
+// (2) rollover and create file file_instant_writetoken2.log.2
+// Then this task failed and retry by a new task.
+// (3) append to existing file file_instant_writetoken1.log.1
+// (4) rollover and create file file_instant_writetoken3.log.2
+// finally file_instant_writetoken2.log.2 should not be committed to hudi, 
we use marker file to delete it.
+// keep in mind that log file is not always fail-safe unless it never roll 
over
+

Review Comment:
   Under this case, if we use direct marker type, the new task would got an 
exception when it call `preLogFileOpen`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org