[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670935008



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##
@@ -113,8 +120,11 @@ protected AbstractHoodieLogRecordScanner(FileSystem fs, 
String basePath, List

[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670949047



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -90,6 +93,14 @@ protected boolean isDeleteRecord(GenericRecord genericRecord) {
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
 
+  /**
+   * Returns the ordering value of given record {@code record}.
+   */
+  protected static Object getOrderingVal(GenericRecord record, Properties properties) {
+    return getNestedFieldVal(record,
+        properties.getProperty(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY), true);

Review comment:
   There is actually no need to care whether the record comes from the 
persisted file or not; just fetch the value and return null if the old schema 
does not contain the `preCombine` field.
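
The lookup described in this comment can be illustrated with a minimal sketch. It is not the Hudi implementation: the record is modelled as a plain `Map` instead of an Avro `GenericRecord`, and the class name `OrderingValueSketch` and the property key `example.payload.ordering.field` are hypothetical stand-ins. Returning null when the key itself is unset is also an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the payload helper; a record is a plain Map here,
// not an Avro GenericRecord.
class OrderingValueSketch {
    // Assumed property key, for illustration only.
    static final String ORDERING_FIELD_KEY = "example.payload.ordering.field";

    /**
     * Returns the ordering value of the record, or null when the ordering field
     * is not configured or the record (e.g. one written with an old schema)
     * does not carry it. The source of the record does not matter.
     */
    static Object getOrderingVal(Map<String, Object> record, Properties props) {
        String field = props.getProperty(ORDERING_FIELD_KEY);
        if (field == null) {
            return null; // assumption: treat a missing key like a missing field
        }
        return record.get(field); // Map.get yields null for an absent field
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ORDERING_FIELD_KEY, "ts");

        Map<String, Object> oldSchemaRecord = new HashMap<>();
        oldSchemaRecord.put("id", "1"); // no "ts" field

        Map<String, Object> newSchemaRecord = new HashMap<>();
        newSchemaRecord.put("id", "1");
        newSchemaRecord.put("ts", 42L);

        System.out.println(getOrderingVal(oldSchemaRecord, props)); // null
        System.out.println(getOrderingVal(newSchemaRecord, props)); // 42
    }
}
```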




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670948572



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -49,7 +49,7 @@ public OverwriteWithLatestAvroPayload(Option record) {
   @Override
   public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload another) {
     // pick the payload with greatest ordering value
-    if (another.orderingVal.compareTo(orderingVal) > 0) {
+    if (another.orderingVal.compareTo(orderingVal) >= 0) {

Review comment:
   I finally found that `another` should always be the old value, so the 
original logic is correct.
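
To make the tie case concrete, here is a small self-contained sketch. It is an illustration only: `PreCombineTieSketch`, `Payload`, `preCombineStrict`, and `preCombineInclusive` are hypothetical names, and the ordering value is simplified to a `long`. It assumes, as the comment states, that `another` is the old value.

```java
// Hypothetical, simplified model of the two comparisons shown in the diff.
class PreCombineTieSketch {
    static final class Payload {
        final String label;
        final long orderingVal;

        Payload(String label, long orderingVal) {
            this.label = label;
            this.orderingVal = orderingVal;
        }

        // Original comparison (`> 0`): `another` wins only when strictly greater.
        Payload preCombineStrict(Payload another) {
            return Long.compare(another.orderingVal, orderingVal) > 0 ? another : this;
        }

        // Changed comparison (`>= 0`): `another` also wins on equal ordering values.
        Payload preCombineInclusive(Payload another) {
            return Long.compare(another.orderingVal, orderingVal) >= 0 ? another : this;
        }
    }

    public static void main(String[] args) {
        Payload incoming = new Payload("new", 7L);
        Payload existing = new Payload("old", 7L); // plays the role of `another`

        // On a tie the strict comparison keeps the incoming (latest) payload.
        System.out.println(incoming.preCombineStrict(existing).label);    // new
        // On a tie the inclusive comparison keeps the old payload instead.
        System.out.println(incoming.preCombineInclusive(existing).label); // old
    }
}
```

Under that assumption, the strict `>` comparison is the one that keeps the latest record when the ordering values are equal.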








[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670926655



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##
@@ -113,8 +120,11 @@ protected AbstractHoodieLogRecordScanner(FileSystem fs, 
String basePath, List

[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670923302



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -90,6 +93,14 @@ protected boolean isDeleteRecord(GenericRecord genericRecord) {
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
 
+  /**
+   * Returns the ordering value of given record {@code record}.
+   */
+  protected static Object getOrderingVal(GenericRecord record, Properties properties) {
+    return getNestedFieldVal(record,
+        properties.getProperty(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY), true);

Review comment:
   Reasonable, let me change it.








[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670923142



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -90,6 +93,14 @@ protected boolean isDeleteRecord(GenericRecord genericRecord) {
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
 
+  /**
+   * Returns the ordering value of given record {@code record}.
+   */
+  protected static Object getOrderingVal(GenericRecord record, Properties properties) {

Review comment:
   Hmm, I only refactored the code here; this key should always be set.








[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-15 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670231077



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -49,7 +49,7 @@ public OverwriteWithLatestAvroPayload(Option record) {
   @Override
   public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload another) {
     // pick the payload with greatest ordering value
-    if (another.orderingVal.compareTo(orderingVal) > 0) {
+    if (another.orderingVal.compareTo(orderingVal) >= 0) {

Review comment:
   Thanks @nsivabalan, I also found that `AbstractHoodieLogRecordScanner` 
currently hard-codes the `orderingVal` to `0`, and 
`HoodieMergedLogRecordScanner` then invokes 
`newRecord.preCombine(oldRecord)` to merge the records so that they follow the 
natural order.
   
   But in my opinion, `newRecord.preCombine(oldRecord)` breaks the 
semantics of this interface, so I have also fixed the `orderingVal` fetch 
logic in `AbstractHoodieLogRecordScanner`.
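
The merge behaviour described here can be sketched as follows. This is a toy model under stated assumptions, not the scanner code: `LogMergeSketch`, `Payload`, and `merge` are hypothetical names, the ordering value is a `long`, and `preCombine` uses the original strict `>` comparison. It shows that with every ordering value fixed at `0` the call `newRecord.preCombine(oldRecord)` always keeps the newer record (natural order), whereas with real ordering values an older record with a greater value wins.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of merging log records by key.
class LogMergeSketch {
    static final class Payload {
        final String key;
        final String value;
        final long orderingVal;

        Payload(String key, String value, long orderingVal) {
            this.key = key;
            this.value = value;
            this.orderingVal = orderingVal;
        }

        // Strict comparison: `another` wins only with a greater ordering value.
        Payload preCombine(Payload another) {
            return another.orderingVal > orderingVal ? another : this;
        }
    }

    // Folds records in arrival order, calling newRecord.preCombine(oldRecord).
    static Map<String, Payload> merge(List<Payload> logRecords) {
        Map<String, Payload> merged = new LinkedHashMap<>();
        for (Payload newRecord : logRecords) {
            Payload oldRecord = merged.get(newRecord.key);
            merged.put(newRecord.key,
                oldRecord == null ? newRecord : newRecord.preCombine(oldRecord));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Ordering value hard-coded to 0: arrival order decides, the last write wins.
        List<Payload> constantOrdering = Arrays.asList(
            new Payload("k", "v1", 0L), new Payload("k", "v2", 0L));
        System.out.println(merge(constantOrdering).get("k").value); // v2

        // Ordering value read from the record: the greater value wins,
        // even though that record arrived first.
        List<Payload> realOrdering = Arrays.asList(
            new Payload("k", "v1", 9L), new Payload("k", "v2", 3L));
        System.out.println(merge(realOrdering).get("k").value); // v1
    }
}
```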








[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-07-14 Thread GitBox


danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670069995



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteWithLatestAvroPayload.java
##
@@ -72,6 +72,17 @@ public void testActiveRecords() throws IOException {
 
     assertEquals(payload1.combineAndGetUpdateValue(record2, schema).get(), record1);
     assertEquals(payload2.combineAndGetUpdateValue(record1, schema).get(), record2);
+
+    GenericRecord record3 = new GenericData.Record(schema);
+    record3.put("id", "3");
+    record3.put("partition", "partition2");

Review comment:
   `ts` is not actually the `preCombine` field; the `preCombine` value was 
passed explicitly.



