[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267: URL: https://github.com/apache/hudi/pull/3267#discussion_r670935008 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java ## @@ -113,8 +120,11 @@ protected AbstractHoodieLogRecordScanner(FileSystem fs, String basePath, List
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670949047

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java

@@ -90,6 +93,14 @@ protected boolean isDeleteRecord(GenericRecord genericRecord) {
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   * Returns the ordering value of given record {@code record}.
+   */
+  protected static Object getOrderingVal(GenericRecord record, Properties properties) {
+    return getNestedFieldVal(record,
+        properties.getProperty(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY), true);

Review comment:
Actually, there is no need to care whether the record comes from the persisted file or not; just fetch the value and return null if the old schema does not contain the `preCombine` field.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
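The "fetch it and return null" behaviour described in this comment can be sketched as follows. This is only an illustration under loud assumptions: a plain `Map` stands in for the Avro `GenericRecord`, the class name `OrderingValSketch` and the field names `id` and `ts` are hypothetical, and nested (dotted) field paths are not handled. It is not the Hudi implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not Hudi code: a Map plays the role of an Avro
// GenericRecord so the sketch runs without Hudi or Avro on the classpath.
final class OrderingValSketch {

  /**
   * Returns the value stored under the ordering field, or null when the
   * record (for example one written with an older schema) has no such field.
   */
  static Object getOrderingVal(Map<String, Object> record, String orderingField) {
    if (orderingField == null) {
      return null; // no ordering field configured
    }
    return record.get(orderingField); // Map.get yields null for a missing key
  }

  public static void main(String[] args) {
    Map<String, Object> oldSchemaRecord = new HashMap<>();
    oldSchemaRecord.put("id", "1"); // written before the ordering field existed

    Map<String, Object> newSchemaRecord = new HashMap<>();
    newSchemaRecord.put("id", "1");
    newSchemaRecord.put("ts", 5L);

    System.out.println(getOrderingVal(oldSchemaRecord, "ts"));
    System.out.println(getOrderingVal(newSchemaRecord, "ts"));
  }
}
```

The caller does not need to know where the record came from; a missing field simply yields null, and the caller decides how to treat it.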
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670948572

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java

@@ -49,7 +49,7 @@ public OverwriteWithLatestAvroPayload(Option record) {
   @Override
   public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload another) {
     // pick the payload with greatest ordering value
-    if (another.orderingVal.compareTo(orderingVal) > 0) {
+    if (another.orderingVal.compareTo(orderingVal) >= 0) {

Review comment:
I finally found that `another` should always be the old value, so the original logic (the strict `>` comparison) is correct.
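To make the tie-breaking point concrete, here is a minimal sketch comparing the two comparisons from the diff when both payloads carry the same ordering value. The `Payload` class, its `label` field, and the two method names are hypothetical simplifications for illustration; the real payload class compares `Comparable` ordering values rather than a `long`.

```java
// Hypothetical, simplified payload; not the Hudi class.
final class PreCombineTieSketch {

  static final class Payload {
    final String label;
    final long orderingVal;

    Payload(String label, long orderingVal) {
      this.label = label;
      this.orderingVal = orderingVal;
    }

    // Strict comparison (the original logic): `another` wins only when its
    // ordering value is strictly greater, so a tie keeps the receiver.
    Payload preCombineStrict(Payload another) {
      return Long.compare(another.orderingVal, orderingVal) > 0 ? another : this;
    }

    // Inclusive comparison (the proposed change): a tie goes to `another`.
    Payload preCombineInclusive(Payload another) {
      return Long.compare(another.orderingVal, orderingVal) >= 0 ? another : this;
    }
  }

  public static void main(String[] args) {
    Payload oldValue = new Payload("old", 1L);
    Payload newValue = new Payload("new", 1L);
    // The receiver is the new value and `another` is the old value.
    System.out.println(newValue.preCombineStrict(oldValue).label);
    System.out.println(newValue.preCombineInclusive(oldValue).label);
  }
}
```

When `another` is always the old value, the strict comparison keeps the newer payload on a tie (the first line printed is `new`), whereas the inclusive comparison would hand the tie to the old value (`old`).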
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267: URL: https://github.com/apache/hudi/pull/3267#discussion_r670926655 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java ## @@ -113,8 +120,11 @@ protected AbstractHoodieLogRecordScanner(FileSystem fs, String basePath, List
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670923302

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java

@@ -90,6 +93,14 @@ protected boolean isDeleteRecord(GenericRecord genericRecord) {
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   * Returns the ordering value of given record {@code record}.
+   */
+  protected static Object getOrderingVal(GenericRecord record, Properties properties) {
+    return getNestedFieldVal(record,
+        properties.getProperty(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP_KEY), true);

Review comment:
Reasonable, let me change it.
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670923142

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java

@@ -90,6 +93,14 @@ protected boolean isDeleteRecord(GenericRecord genericRecord) {
     return (deleteMarker instanceof Boolean && (boolean) deleteMarker);
   }
+
+  /**
+   * Returns the ordering value of given record {@code record}.
+   */
+  protected static Object getOrderingVal(GenericRecord record, Properties properties) {

Review comment:
Hmm, I only refactored the code here; this key should always be set.
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670231077

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java

@@ -49,7 +49,7 @@ public OverwriteWithLatestAvroPayload(Option record) {
   @Override
   public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload another) {
     // pick the payload with greatest ordering value
-    if (another.orderingVal.compareTo(orderingVal) > 0) {
+    if (another.orderingVal.compareTo(orderingVal) >= 0) {

Review comment:
Thanks @nsivabalan, I also found that `AbstractHoodieLogRecordScanner` currently always hard-codes the `orderingVal` to `0`, and `HoodieMergedLogRecordScanner` then invokes `newRecord.preCombine(oldRecord)` to merge the records, which yields natural (arrival) order. In my opinion, though, `newRecord.preCombine(oldRecord)` breaks the semantics of this interface, so I have also fixed the `orderingVal` fetch logic in `AbstractHoodieLogRecordScanner`.
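The interaction described here can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `LogRecord`, `scan`, and the `HashMap`-based merge are hypothetical simplifications, not the actual scanner classes. It shows that when every record carries the same hard-coded ordering value, calling `newRecord.preCombine(oldRecord)` with a strict comparison makes the most recently scanned record win, and that once real ordering values are fetched, the record with the greatest ordering value wins instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of merging log records by key; not Hudi code.
final class NaturalOrderMergeSketch {

  static final class LogRecord {
    final String key;
    final String value;
    final long orderingVal;

    LogRecord(String key, String value, long orderingVal) {
      this.key = key;
      this.value = value;
      this.orderingVal = orderingVal;
    }

    // Strict comparison: `another` wins only with a strictly greater ordering value.
    LogRecord preCombine(LogRecord another) {
      return another.orderingVal > orderingVal ? another : this;
    }
  }

  /** Merges records in scan order, always calling newRecord.preCombine(oldRecord). */
  static Map<String, LogRecord> scan(List<LogRecord> recordsInLogOrder) {
    Map<String, LogRecord> merged = new HashMap<>();
    for (LogRecord newRecord : recordsInLogOrder) {
      LogRecord oldRecord = merged.get(newRecord.key);
      merged.put(newRecord.key,
          oldRecord == null ? newRecord : newRecord.preCombine(oldRecord));
    }
    return merged;
  }
}
```

With two records for the same key that both carry ordering value `0`, `scan` keeps the later one; with ordering values `5` then `3`, it keeps the earlier one, because the ordering value now decides.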
[GitHub] [hudi] danny0405 commented on a change in pull request #3267: [HUDI-2170] Always choose the latest record for HoodieRecordPayload
danny0405 commented on a change in pull request #3267:
URL: https://github.com/apache/hudi/pull/3267#discussion_r670069995

## File path: hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteWithLatestAvroPayload.java

@@ -72,6 +72,17 @@ public void testActiveRecords() throws IOException {
     assertEquals(payload1.combineAndGetUpdateValue(record2, schema).get(), record1);
     assertEquals(payload2.combineAndGetUpdateValue(record1, schema).get(), record2);
+
+    GenericRecord record3 = new GenericData.Record(schema);
+    record3.put("id", "3");
+    record3.put("partition", "partition2");

Review comment:
`ts` is actually not the `preCombine` field; the `preCombine` value is passed explicitly.