Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-09 Thread via GitHub


codope merged PR #10860:
URL: https://github.com/apache/hudi/pull/10860


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-09 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1557417357


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   Tracking in HUDI-7581



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1555159066


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   > Not following you here. In multi-writer scenario, there is a new latest 
file slice due to instant t1
   
   How about this client triggers cleaning for t0 with a very radical strategy 
while we do this loading check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041474216

   
   ## CI report:
   
   * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041454934

   
   ## CI report:
   
   * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098)
 
   * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041453013

   
   ## CI report:
   
   * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098)
 
   * bbfbe38b86b5bd11972591a346ac9b847a7daa6a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1554955230


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   Not following you here. In multi-writer scenario, there is a new latest file 
slice due to instant t1 but `dataMetaClient` had already been initialized 
before t1 (say upto instant t0), then index will only be updated upto t0. In 
this case, `getPartitionLatestFileSlicesIncludingInflight` will only return 
file slices upto t0. The fsView API will return file slices from a consistent 
snapshot.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1554954192


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java:
##
@@ -107,6 +107,14 @@ interface SliceViewWithLatestSlice {
  */
 Stream getLatestFileSlices(String partitionPath);
 
+/**
+ * Get the latest file slices for a given partition including the inflight 
ones.
+ *
+ * @param partitionPath The partition path of interest
+ * @return Stream of latest {@link FileSlice} in the partition path.
+ */
+Stream getLatestFileSlicesIncludingInflight(String 
partitionPath);
+

Review Comment:
   No we don't need there. This is an uplevel of existing API. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-07 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1554953956


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   Adding to what Vinay said, going forward we will have more indexes where 
index type and index name (mdt partition name) will differ such as secondary 
index. I think we should get rid of `MetadataPartitionType`.  This also enables 
removing the `MetadataRecordsGenerationParams` pojo which is deprecated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-05 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r155535


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java:
##
@@ -107,6 +107,14 @@ interface SliceViewWithLatestSlice {
  */
 Stream getLatestFileSlices(String partitionPath);
 
+/**
+ * Get the latest file slices for a given partition including the inflight 
ones.
+ *
+ * @param partitionPath The partition path of interest
+ * @return Stream of latest {@link FileSlice} in the partition path.
+ */
+Stream getLatestFileSlicesIncludingInflight(String 
partitionPath);
+

Review Comment:
   I didn't see impl for RocksDB fsview, do we need that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-05 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1553326198


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   Yeah, the latest file slice is there but the `instantTime` here may not 
belongs to the latest file slice?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


bhat-vinay commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1552836822


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   @danny0405 `MetadataPartitionType` has an inherent assumption that it maps 
directly to the physical partition (or rather partition path) for metadata 
table. While it is true for most of the MDT partitions, it is not the case with 
functional-index partitions (and the upcoming secondary index partition). 
`MetadataPartitionType` will only have a single entry for functional-index, 
`func_index_`, which acts as the prefix of the actual physical partition. The 
actual physical partitions is manifested as `func_index_`. This is 
what the PR is trying to fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1552661236


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   I didn't get why the enum type affects the unification, can you elaborate a 
little bit?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2037332583

   
   ## CI report:
   
   * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2037190595

   
   ## CI report:
   
   * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977)
 
   * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2037169431

   
   ## CI report:
   
   * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977)
 
   * dbda44942240ecdf008df975aa15d58eaaa45a33 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1551490057


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   Cleaner will only delete the older log files which are already committed at 
least one commit before the latest version. So, log files for this instant 
should be present.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1551488498


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java:
##
@@ -428,6 +428,7 @@ protected Option getReplaceInstant(final 
HoodieFileGroupId fileGr
* @return Stream of latest {@link FileSlice} in the partition path.
*/
   public Stream fetchLatestFileSlicesIncludingInflight(String 
partitionPath) {
+ensurePartitionLoadedCorrectly(partitionPath);

Review Comment:
   yes, i've moved to super class `AbstractTableFileSystemView` and added test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-04-04 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1551487833


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   We cannot do it currently without really getting rid of 
`MetadataPartitionType` enum. I think we should just have strings insetad of 
enum. If you agree, I can take up in a followup patch?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-22 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1535241868


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   Can we unifiy the logic into one shot?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1535126995


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   No, i changed the logic slightly. `isMetadataPartitionAvailable` in line 392 
only checks for func_index prefix in the metadata partition name. By that 
check, even an old functional index will pass for which we don't need to 
initialize. So, we get the uninitialized functional index partitions based on 
index definition in `getFunctionalIndexPartitionsToInit`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1535123166


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -534,10 +537,17 @@ private Pair> 
initializeFunctionalIndexPartiti
 
 int fileGroupCount = 
dataWriteConfig.getMetadataConfig().getFunctionalIndexFileGroupCount();
 int parallelism = Math.min(partitionFileSlicePairs.size(), 
dataWriteConfig.getMetadataConfig().getFunctionalIndexParallelism());
-Schema readerSchema = 
addMetadataFields(getProjectedSchemaForFunctionalIndex(indexDefinition, 
dataMetaClient), dataWriteConfig.allowOperationMetadataField());
+Schema readerSchema = 
getProjectedSchemaForFunctionalIndex(indexDefinition, dataMetaClient);
 return Pair.of(fileGroupCount, 
getFunctionalIndexRecords(partitionFileSlicePairs, indexDefinition, 
dataMetaClient, parallelism, readerSchema, hadoopConf));
   }
 
+  private Set getFunctionalIndexPartitionsToInit() {
+Set functionalIndexPartitions = 
dataMetaClient.getFunctionalIndexMetadata().get().getIndexDefinitions().keySet();
+Set completedMetadataPartitions = 
dataMetaClient.getTableConfig().getMetadataPartitions();

Review Comment:
   this is handled in `getFunctionalIndexMetadata` method itself. If It is 
empty and the index definition path is defined then we load the file. The 
loading ideally happens just once when instantiating the metaclient. As long as 
we're not recreating meta client in metadata writer we should be good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1534943897


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java:
##
@@ -428,6 +428,7 @@ protected Option getReplaceInstant(final 
HoodieFileGroupId fileGr
* @return Stream of latest {@link FileSlice} in the partition path.
*/
   public Stream fetchLatestFileSlicesIncludingInflight(String 
partitionPath) {
+ensurePartitionLoadedCorrectly(partitionPath);

Review Comment:
   do we need a read lock guard here? And this method is used for only testing 
purpose now, may need to upgrade it as top-level API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1534943496


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -991,9 +1001,9 @@ private void 
updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata,
   private HoodieData 
getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String 
indexPartition, String instantTime) throws Exception {
 HoodieFunctionalIndexDefinition indexDefinition = 
getFunctionalIndexDefinition(indexPartition);
 List> partitionFileSlicePairs = new ArrayList<>();
-HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+HoodieTableFileSystemView fsView = 
HoodieTableMetadataUtil.getFileSystemView(dataMetaClient);
 commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) 
-> {
-  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, 
Option.ofNullable(fsView), dataPartition);
+  List fileSlices = 
getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, 
Option.ofNullable(fsView), dataPartition);

Review Comment:
   Is there any case when the log files for this instant got cleaned?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1534943099


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -534,10 +537,17 @@ private Pair> 
initializeFunctionalIndexPartiti
 
 int fileGroupCount = 
dataWriteConfig.getMetadataConfig().getFunctionalIndexFileGroupCount();
 int parallelism = Math.min(partitionFileSlicePairs.size(), 
dataWriteConfig.getMetadataConfig().getFunctionalIndexParallelism());
-Schema readerSchema = 
addMetadataFields(getProjectedSchemaForFunctionalIndex(indexDefinition, 
dataMetaClient), dataWriteConfig.allowOperationMetadataField());
+Schema readerSchema = 
getProjectedSchemaForFunctionalIndex(indexDefinition, dataMetaClient);
 return Pair.of(fileGroupCount, 
getFunctionalIndexRecords(partitionFileSlicePairs, indexDefinition, 
dataMetaClient, parallelism, readerSchema, hadoopConf));
   }
 
+  private Set getFunctionalIndexPartitionsToInit() {
+Set functionalIndexPartitions = 
dataMetaClient.getFunctionalIndexMetadata().get().getIndexDefinitions().keySet();
+Set completedMetadataPartitions = 
dataMetaClient.getTableConfig().getMetadataPartitions();

Review Comment:
   How about when `dataMetaClient.getFunctionalIndexMetadata()` is empty?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1534942879


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String 
initializationTime, List functionalIndexPartitionsToInit = 
getFunctionalIndexPartitionsToInit();
+if (functionalIndexPartitionsToInit.isEmpty()) {
+  continue;

Review Comment:
   Isn't line 392 already hanling the repetitive loading?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


codope commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012814839

   @danny0405 @bhat-vinay @yihua All tests passed now. Please take another pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012296763

   
   ## CI report:
   
   * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012155756

   
   ## CI report:
   
   * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972)
 
   * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012137384

   
   ## CI report:
   
   * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972)
 
   * 971883fd498830b3c1257b9df61bf012d3be1554 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-21 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2011575469

   
   ## CI report:
   
   * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-20 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2011305161

   
   ## CI report:
   
   * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918)
 
   * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-20 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-2011297856

   
   ## CI report:
   
   * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918)
 
   * c37115caa8348eeb0653cb03d5f773e2d046082b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-16 Thread via GitHub


bhat-vinay commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1527415181


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() {
   /**
* Checks if metadata table is enabled and the specified partition has been 
initialized.
*
-   * @param partition The partition to check
+   * @param metadataPartitionType The metadata table partition type to check
* @returns true if the specific partition has been initialized, else 
returns false.
*/
-  public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) 
{
-return getMetadataPartitions().contains(partition.getPartitionPath());
+  public boolean isMetadataPartitionAvailable(MetadataPartitionType 
metadataPartitionType) {

Review Comment:
   Loosk good. The test failures look like simple checkstyle fixes. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-15 Thread via GitHub


danny0405 commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1999175611

   Hmm, it looks like there are many test failures.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-14 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1998845915

   
   ## CI report:
   
   * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-14 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1998792317

   
   ## CI report:
   
   * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893)
 
   * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-14 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1525665975


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() {
   /**
* Checks if metadata table is enabled and the specified partition has been 
initialized.
*
-   * @param partition The partition to check
+   * @param metadataPartitionType The metadata table partition type to check
* @returns true if the specific partition has been initialized, else 
returns false.
*/
-  public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) 
{
-return getMetadataPartitions().contains(partition.getPartitionPath());
+  public boolean isMetadataPartitionAvailable(MetadataPartitionType 
metadataPartitionType) {

Review Comment:
   i changed the logic a bit. We will still find func index in the mdt 
partitions to initialize, but then when we actually go about initializing we 
will check based on full partition path - see `initializeFromFileSystem` and 
`getFunctionalIndexPartitionsToInit` in `HoodieBackedTableMetadataWriter`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-14 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1998775560

   
   ## CI report:
   
   * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893)
 
   * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-14 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1524587040


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##
@@ -1919,7 +1920,7 @@ public HoodieRecord next() {
   public static Schema 
getProjectedSchemaForFunctionalIndex(HoodieFunctionalIndexDefinition 
indexDefinition, HoodieTableMetaClient metaClient) throws Exception {
 TableSchemaResolver schemaResolver = new TableSchemaResolver(metaClient);
 Schema tableSchema = schemaResolver.getTableAvroSchema();
-return HoodieAvroUtils.getSchemaForFields(tableSchema, 
indexDefinition.getSourceFields());
+return addMetadataFields(getSchemaForFields(tableSchema, 
indexDefinition.getSourceFields()));

Review Comment:
   yeah for update we need record key field.



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala:
##
@@ -253,6 +253,47 @@ class TestFunctionalIndex extends HoodieSparkSqlTestBase {
 }
   }
 
+  test("Test functional index update after initialization") {
+withTempDir(tmp => {
+  val tableName = generateTableName
+  val basePath = s"${tmp.getCanonicalPath}/$tableName"
+  spark.sql(
+s"""create table $tableName (
+id int,
+name string,
+price double,
+ts long
+) using hudi
+options (
+primaryKey ='id',
+type = 'mor',
+preCombineField = 'ts',
+hoodie.metadata.record.index.enable = 'true',
+hoodie.datasource.write.recordkey.field = 'id'
+)
+partitioned by(ts)
+location '$basePath'""".stripMargin)
+  spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+  spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
+  spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
+
+  checkAnswer(s"select id, name from $tableName where from_unixtime(ts, 
'-MM-dd') = '1970-01-01'")(
+Seq(1, "a1"),
+Seq(2, "a2"),
+Seq(3, "a3")
+  )
+  // create functional index
+  val createIndexSql = s"create index idx_datestr on $tableName using 
column_stats(ts) options(func='from_unixtime', format='-MM-dd')"
+  spark.sql(createIndexSql)
+  // do another insert after initializing the index
+  spark.sql(s"insert into $tableName values(4, 'a4', 10, 1000)")
+  // check query result
+  checkAnswer(s"select id, name from $tableName where from_unixtime(ts, 
'-MM-dd') = '1970-04-26'")(
+Seq(4, "a4")
+  )

Review Comment:
   ack.. will do



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-14 Thread via GitHub


codope commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1524586719


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() {
   /**
* Checks if metadata table is enabled and the specified partition has been 
initialized.
*
-   * @param partition The partition to check
+   * @param metadataPartitionType The metadata table partition type to check
* @returns true if the specific partition has been initialized, else 
returns false.
*/
-  public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) 
{
-return getMetadataPartitions().contains(partition.getPartitionPath());
+  public boolean isMetadataPartitionAvailable(MetadataPartitionType 
metadataPartitionType) {
+return getMetadataPartitions().stream().anyMatch(metadataPartition ->
+metadataPartition.equals(metadataPartitionType.getPartitionPath()) || 
(FUNCTIONAL_INDEX.equals(metadataPartitionType) && 
metadataPartition.startsWith(FUNCTIONAL_INDEX.getPartitionPath(;
   }

Review Comment:
   it's redundant. i can remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


bhat-vinay commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1524160977


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala:
##
@@ -253,6 +253,47 @@ class TestFunctionalIndex extends HoodieSparkSqlTestBase {
 }
   }
 
+  test("Test functional index update after initialization") {
+withTempDir(tmp => {
+  val tableName = generateTableName
+  val basePath = s"${tmp.getCanonicalPath}/$tableName"
+  spark.sql(
+s"""create table $tableName (
+id int,
+name string,
+price double,
+ts long
+) using hudi
+options (
+primaryKey ='id',
+type = 'mor',
+preCombineField = 'ts',
+hoodie.metadata.record.index.enable = 'true',
+hoodie.datasource.write.recordkey.field = 'id'
+)
+partitioned by(ts)
+location '$basePath'""".stripMargin)
+  spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+  spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)")
+  spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)")
+
+  checkAnswer(s"select id, name from $tableName where from_unixtime(ts, 
'-MM-dd') = '1970-01-01'")(
+Seq(1, "a1"),
+Seq(2, "a2"),
+Seq(3, "a3")
+  )
+  // create functional index
+  val createIndexSql = s"create index idx_datestr on $tableName using 
column_stats(ts) options(func='from_unixtime', format='-MM-dd')"
+  spark.sql(createIndexSql)
+  // do another insert after initializing the index
+  spark.sql(s"insert into $tableName values(4, 'a4', 10, 1000)")
+  // check query result
+  checkAnswer(s"select id, name from $tableName where from_unixtime(ts, 
'-MM-dd') = '1970-04-26'")(
+Seq(4, "a4")
+  )

Review Comment:
   Might also want to check that the functional-index is available in 
tableConfig's metadata partition lists 
(`assert(metaClient.getTableConfig.isMetadataPartitionAvailable("func_index_idx_datestr"))`)



##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() {
   /**
* Checks if metadata table is enabled and the specified partition has been 
initialized.
*
-   * @param partition The partition to check
+   * @param metadataPartitionType The metadata table partition type to check
* @returns true if the specific partition has been initialized, else 
returns false.
*/
-  public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) 
{
-return getMetadataPartitions().contains(partition.getPartitionPath());
+  public boolean isMetadataPartitionAvailable(MetadataPartitionType 
metadataPartitionType) {

Review Comment:
   Is this sufficient or should we actually check the whole partition-path. 
Something like `public boolean isMetadataPartitionPathAvailable(String 
partitionPath)`. Mostly asking for the case when there could be multiple 
functional-index.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1524070449


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##
@@ -1919,7 +1920,7 @@ public HoodieRecord next() {
   public static Schema 
getProjectedSchemaForFunctionalIndex(HoodieFunctionalIndexDefinition 
indexDefinition, HoodieTableMetaClient metaClient) throws Exception {
 TableSchemaResolver schemaResolver = new TableSchemaResolver(metaClient);
 Schema tableSchema = schemaResolver.getTableAvroSchema();
-return HoodieAvroUtils.getSchemaForFields(tableSchema, 
indexDefinition.getSourceFields());
+return addMetadataFields(getSchemaForFields(tableSchema, 
indexDefinition.getSourceFields()));

Review Comment:
   Is this change related with this fix?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


danny0405 commented on code in PR #10860:
URL: https://github.com/apache/hudi/pull/10860#discussion_r1524070179


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() {
   /**
* Checks if metadata table is enabled and the specified partition has been 
initialized.
*
-   * @param partition The partition to check
+   * @param metadataPartitionType The metadata table partition type to check
* @returns true if the specific partition has been initialized, else 
returns false.
*/
-  public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) 
{
-return getMetadataPartitions().contains(partition.getPartitionPath());
+  public boolean isMetadataPartitionAvailable(MetadataPartitionType 
metadataPartitionType) {
+return getMetadataPartitions().stream().anyMatch(metadataPartition ->
+metadataPartition.equals(metadataPartitionType.getPartitionPath()) || 
(FUNCTIONAL_INDEX.equals(metadataPartitionType) && 
metadataPartition.startsWith(FUNCTIONAL_INDEX.getPartitionPath(;
   }

Review Comment:
   Why we need this check: `FUNCTIONAL_INDEX.equals(metadataPartitionType)` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1995528694

   
   ## CI report:
   
   * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1995186398

   
   ## CI report:
   
   * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


hudi-bot commented on PR #10860:
URL: https://github.com/apache/hudi/pull/10860#issuecomment-1995162248

   
   ## CI report:
   
   * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]

2024-03-13 Thread via GitHub


codope opened a new pull request, #10860:
URL: https://github.com/apache/hudi/pull/10860

   ### Change Logs
   
   This PR fixes two issues:
   
   1. When a functional index is updated, it was getting re-initialized. This 
was happening because metadata partition type only contains the functional 
index partition prefix while hoodie.properties contains the full name. Due to 
discrepancy, metadata writer thought there is anew index that needs ot be 
initialized. This is fixed in `HoodieTableConfig#isMetadataPartitionAvailable`.
   2. When a functional index is updated, it fetches latest file slices, 
including the pending ones, in data partition. There was a typo in building the 
filesystem view. Instead of passing data meta client, metadata meta client was 
passed.
   
   Added a test which covers both these scenarios.
   
   ### Impact
   
   Fixes functional index bugs.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org