Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-12-04 Thread via GitHub


nsivabalan merged PR #9630:
URL: https://github.com/apache/hudi/pull/9630


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-12-03 Thread via GitHub


flashJd commented on code in PR #9630:
URL: https://github.com/apache/hudi/pull/9630#discussion_r1413320714


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1402,38 +1397,13 @@ private void 
fetchOutofSyncFilesRecordsFromMetadataTable(Map 
getRecordIndexUpdates(HoodieData writeStatuses) {
-HoodiePairData recordKeyDelegatePairs = null;
-// if update partition path is true, chances that we might get two records 
(1 delete in older partition and 1 insert to new partition)
-// and hence we might have to do reduce By key before ingesting to RLI 
partition.
-if (dataWriteConfig.getRecordIndexUpdatePartitionPath()) {
-  recordKeyDelegatePairs = writeStatuses.map(writeStatus -> 
writeStatus.getWrittenRecordDelegates().stream()
-  .map(recordDelegate -> Pair.of(recordDelegate.getRecordKey(), 
recordDelegate)))
-  .flatMapToPair(Stream::iterator)
-  .reduceByKey((recordDelegate1, recordDelegate2) -> {
-if 
(recordDelegate1.getRecordKey().equals(recordDelegate2.getRecordKey())) {
-  if (!recordDelegate1.getNewLocation().isPresent() && 
!recordDelegate2.getNewLocation().isPresent()) {
-throw new HoodieIOException("Both version of records do not 
have location set. Record V1 " + recordDelegate1.toString()
-+ ", Record V2 " + recordDelegate2.toString());
-  }
-  if (recordDelegate1.getNewLocation().isPresent()) {
-return recordDelegate1;
-  } else {
-// if record delegate 1 does not have location set, record 
delegate 2 should have location set.
-return recordDelegate2;
-  }
-} else {
-  return recordDelegate1;
-}
-  }, Math.max(1, writeStatuses.getNumPartitions()));
-} else {
-  // if update partition path = false, we should get only one entry per 
record key.
-  recordKeyDelegatePairs = writeStatuses.flatMapToPair(
-  (SerializableFunction>>) writeStatus
-  -> writeStatus.getWrittenRecordDelegates().stream().map(rec -> 
Pair.of(rec.getRecordKey(), rec)).iterator());
-}
-return recordKeyDelegatePairs
-.map(writeStatusRecordDelegate -> {
-  HoodieRecordDelegate recordDelegate = 
writeStatusRecordDelegate.getValue();
+return writeStatuses.flatMap(writeStatus -> {
+  List recordList = new LinkedList<>();
+  for (HoodieRecordDelegate recordDelegate : 
writeStatus.getWrittenRecordDelegates()) {
+if (!writeStatus.isErrored(recordDelegate.getHoodieKey())) {
+  if (recordDelegate.getIgnoreFlag()) {

Review Comment:
   Yes, that's right: `we are setting the ignore flag only in indexing code, and 
specifically when indexing could return two versions of the record delegate.`
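
   For readers following the hunk above: the removed branch collapsed the two delegates that a partition-path update can produce for one record key. A minimal Java sketch of that merge rule follows. It assumes a hypothetical `VersionedDelegate` class in place of `HoodieRecordDelegate`, a plain `Map` in place of `HoodiePairData.reduceByKey`, and `IllegalStateException` in place of `HoodieIOException`; it is an illustration, not the Hudi code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for HoodieRecordDelegate: a record key plus an optional new location.
final class VersionedDelegate {
  final String recordKey;
  final Optional<String> newLocation; // empty models the delete in the older partition

  VersionedDelegate(String recordKey, Optional<String> newLocation) {
    this.recordKey = recordKey;
    this.newLocation = newLocation;
  }
}

final class ReduceByKeySketch {
  // Merge rule from the removed branch: keep whichever version has a location set.
  static VersionedDelegate merge(VersionedDelegate v1, VersionedDelegate v2) {
    if (!v1.newLocation.isPresent() && !v2.newLocation.isPresent()) {
      throw new IllegalStateException(
          "Both versions of record " + v1.recordKey + " have no location set");
    }
    return v1.newLocation.isPresent() ? v1 : v2;
  }

  // Plain-Java substitute for reduceByKey: one surviving delegate per record key.
  static Map<String, VersionedDelegate> reduceByKey(List<VersionedDelegate> delegates) {
    Map<String, VersionedDelegate> byKey = new LinkedHashMap<>();
    for (VersionedDelegate d : delegates) {
      byKey.merge(d.recordKey, d, ReduceByKeySketch::merge);
    }
    return byKey;
  }
}
```

   With the ignore flag set by the indexing code, the version that should not reach the index is dropped directly, so this per-key shuffle is no longer needed.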






Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-12-01 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1837007853

   
   ## CI report:
   
   * fe663145f11037dda6a8832e17f45899e577bd37 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21277)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-12-01 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1836940868

   
   ## CI report:
   
   * 36306f068c3bde89f668f0b4cd03cdb21e10a97f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21255)
 
   * fe663145f11037dda6a8832e17f45899e577bd37 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21277)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-12-01 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1836936195

   
   ## CI report:
   
   * 36306f068c3bde89f668f0b4cd03cdb21e10a97f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21255)
 
   * fe663145f11037dda6a8832e17f45899e577bd37 UNKNOWN
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-30 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1834632362

   
   ## CI report:
   
   * 36306f068c3bde89f668f0b4cd03cdb21e10a97f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21255)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-30 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1834458728

   
   ## CI report:
   
   * ac63d041655ace6fdfe5549f2120e208b2a97718 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21212)
 
   * 36306f068c3bde89f668f0b4cd03cdb21e10a97f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21255)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-30 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1834449238

   
   ## CI report:
   
   * ac63d041655ace6fdfe5549f2120e208b2a97718 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21212)
 
   * 36306f068c3bde89f668f0b4cd03cdb21e10a97f UNKNOWN
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-29 Thread via GitHub


nsivabalan commented on code in PR #9630:
URL: https://github.com/apache/hudi/pull/9630#discussion_r1409728826


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1540,13 +1510,18 @@ private HoodieData 
getRecordIndexUpdates(HoodieData w
   recordDelegate.getRecordKey(), 
recordDelegate.getPartitionPath(),
   newLocation.get().getFileId(), 
newLocation.get().getInstantTime(), dataWriteConfig.getWritesFileIdEncoding());
 }
+hoodieRecord = HoodieMetadataPayload.createRecordIndexUpdate(

Review Comment:
   This does not look right. 
   We should not add an entry to the RLI metadata partition if it is an update 
record in the data table. 
   On master, we ignore it, but here it looks like we are updating the RLI 
even for update records. 
   Can you double-check the logic once more?
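
   To make the expected behavior concrete, here is a minimal Java sketch of the decision described above. The class, enum, and method names are hypothetical, and file IDs are modeled as plain strings; it only illustrates the rule that an update in the data table should not produce a new RLI entry, and is not the code in `HoodieBackedTableMetadataWriter`.

```java
import java.util.Optional;

final class RecordIndexActionSketch {
  // Hypothetical outcomes for one written record delegate.
  enum Action { UPSERT_ENTRY, SKIP, DELETE_ENTRY }

  // currentFileId: where the record lived before this write (empty for a fresh insert).
  // newFileId: where the record lives after this write (empty for a delete).
  static Action classify(Optional<String> currentFileId, Optional<String> newFileId) {
    if (!newFileId.isPresent()) {
      return Action.DELETE_ENTRY; // record is gone, so its index entry is removed
    }
    if (currentFileId.isPresent()) {
      return Action.SKIP; // update of an existing record: the index entry is already there
    }
    return Action.UPSERT_ENTRY; // new record: add an index entry
  }
}
```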
   






Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-29 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831643542

   
   ## CI report:
   
   * ac63d041655ace6fdfe5549f2120e208b2a97718 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21212)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-29 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831488172

   
   ## CI report:
   
   * 8bd9148abd6d20addb6a963244926c85c02800fb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21207)
 
   * ac63d041655ace6fdfe5549f2120e208b2a97718 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21212)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831390344

   
   ## CI report:
   
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21205)
 
   * 8bd9148abd6d20addb6a963244926c85c02800fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21207)
 
   * ac63d041655ace6fdfe5549f2120e208b2a97718 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21212)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831379723

   
   ## CI report:
   
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21205)
 
   * 8bd9148abd6d20addb6a963244926c85c02800fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21207)
 
   * ac63d041655ace6fdfe5549f2120e208b2a97718 UNKNOWN
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831370222

   
   ## CI report:
   
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21205)
 
   * 8bd9148abd6d20addb6a963244926c85c02800fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21207)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831302922

   
   ## CI report:
   
   * a37708f3db08326bda8b446afca3eebfa9f1b1fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19691)
 
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21205)
 
   * 8bd9148abd6d20addb6a963244926c85c02800fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21207)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


nsivabalan commented on code in PR #9630:
URL: https://github.com/apache/hudi/pull/9630#discussion_r1408800688


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##
@@ -133,6 +133,11 @@ public String getFieldName() {
*/
   protected HoodieRecordLocation newLocation;
 
+  /**
+   * If set, not update index after written.
+   */
+  protected boolean ignored;

Review Comment:
   @bvaradar: is this addressed?






Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831266076

   
   ## CI report:
   
   * a37708f3db08326bda8b446afca3eebfa9f1b1fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19691)
 
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21205)
 
   * 8bd9148abd6d20addb6a963244926c85c02800fb UNKNOWN
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831224173

   
   ## CI report:
   
   * a37708f3db08326bda8b446afca3eebfa9f1b1fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19691)
 
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21205)
 
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


hudi-bot commented on PR #9630:
URL: https://github.com/apache/hudi/pull/9630#issuecomment-1831213498

   
   ## CI report:
   
   * a37708f3db08326bda8b446afca3eebfa9f1b1fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19691)
 
   * b6faf77fe94fabb9c9325bcc59a38ebbfecc3595 UNKNOWN
   
   



Re: [PR] [HUDI-6822] Fix deletes handling in hbase index when partition path is updated [hudi]

2023-11-28 Thread via GitHub


nsivabalan commented on code in PR #9630:
URL: https://github.com/apache/hudi/pull/9630#discussion_r1408590637


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1402,38 +1397,13 @@ private void 
fetchOutofSyncFilesRecordsFromMetadataTable(Map 
getRecordIndexUpdates(HoodieData writeStatuses) {
-HoodiePairData recordKeyDelegatePairs = null;
-// if update partition path is true, chances that we might get two records 
(1 delete in older partition and 1 insert to new partition)
-// and hence we might have to do reduce By key before ingesting to RLI 
partition.
-if (dataWriteConfig.getRecordIndexUpdatePartitionPath()) {
-  recordKeyDelegatePairs = writeStatuses.map(writeStatus -> 
writeStatus.getWrittenRecordDelegates().stream()
-  .map(recordDelegate -> Pair.of(recordDelegate.getRecordKey(), 
recordDelegate)))
-  .flatMapToPair(Stream::iterator)
-  .reduceByKey((recordDelegate1, recordDelegate2) -> {
-if 
(recordDelegate1.getRecordKey().equals(recordDelegate2.getRecordKey())) {
-  if (!recordDelegate1.getNewLocation().isPresent() && 
!recordDelegate2.getNewLocation().isPresent()) {
-throw new HoodieIOException("Both version of records do not 
have location set. Record V1 " + recordDelegate1.toString()
-+ ", Record V2 " + recordDelegate2.toString());
-  }
-  if (recordDelegate1.getNewLocation().isPresent()) {
-return recordDelegate1;
-  } else {
-// if record delegate 1 does not have location set, record 
delegate 2 should have location set.
-return recordDelegate2;
-  }
-} else {
-  return recordDelegate1;
-}
-  }, Math.max(1, writeStatuses.getNumPartitions()));
-} else {
-  // if update partition path = false, we should get only one entry per 
record key.
-  recordKeyDelegatePairs = writeStatuses.flatMapToPair(
-  (SerializableFunction>>) writeStatus
-  -> writeStatus.getWrittenRecordDelegates().stream().map(rec -> 
Pair.of(rec.getRecordKey(), rec)).iterator());
-}
-return recordKeyDelegatePairs
-.map(writeStatusRecordDelegate -> {
-  HoodieRecordDelegate recordDelegate = 
writeStatusRecordDelegate.getValue();
+return writeStatuses.flatMap(writeStatus -> {
+  List recordList = new LinkedList<>();
+  for (HoodieRecordDelegate recordDelegate : 
writeStatus.getWrittenRecordDelegates()) {
+if (!writeStatus.isErrored(recordDelegate.getHoodieKey())) {
+  if (recordDelegate.getIgnoreFlag()) {

Review Comment:
   How do we handle deletes? That is, if we get a delete for a record in partition 
p1, then by the time it reaches the metadata writer we might have just one recordDelegate, but 
the ignore flag will not be set, since we are not setting it in any of the write 
handles. So we should be good. 
   
   We are setting the ignore flag only in indexing code, and specifically when 
indexing could return two versions of the record delegate. 
   
   Just wanted to confirm my understanding. 
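
   The understanding above can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions: `DelegateSketch` is a hypothetical stand-in for `HoodieRecordDelegate`, errored records are tracked by record key only (the hunk checks `writeStatus.isErrored(recordDelegate.getHoodieKey())`), and the output is a list of labels rather than metadata records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for HoodieRecordDelegate; not the real Hudi class.
final class DelegateSketch {
  final String recordKey;
  final String partitionPath;
  final boolean ignoreFlag;     // assumed to be set only by indexing code
  final boolean hasNewLocation; // false models a delete

  DelegateSketch(String recordKey, String partitionPath, boolean ignoreFlag, boolean hasNewLocation) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
    this.ignoreFlag = ignoreFlag;
    this.hasNewLocation = hasNewLocation;
  }
}

final class IgnoreFlagFilterSketch {
  // Returns "UPSERT:<key>" or "DELETE:<key>" for each delegate that should reach the record index.
  static List<String> recordIndexUpdates(List<DelegateSketch> written, Set<String> erroredKeys) {
    List<String> updates = new ArrayList<>();
    for (DelegateSketch d : written) {
      if (erroredKeys.contains(d.recordKey)) {
        continue; // failed writes never touch the index
      }
      if (d.ignoreFlag) {
        continue; // flagged version of a partition-path update: skip it
      }
      updates.add((d.hasNewLocation ? "UPSERT:" : "DELETE:") + d.recordKey);
    }
    return updates;
  }
}
```

   In this sketch a plain delete in p1 arrives as a single delegate with the flag unset, so it still yields a delete for the index, while the flagged half of a partition-path update is dropped.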



##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##
@@ -133,6 +133,11 @@ public String getFieldName() {
*/
   protected HoodieRecordLocation newLocation;
 
+  /**
+   * If set, not update index after written.
+   */
+  protected boolean ignored;

Review Comment:
   Maybe we can call it `ignoreIndexUpdate`, 
   and the method can be named `canIgnoreIndexUpdate`. 


