Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope merged PR #10860: URL: https://github.com/apache/hudi/pull/10860 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1557417357 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata, private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception { HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition); List> partitionFileSlicePairs = new ArrayList<>(); -HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient); +HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient); commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> { - List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition); + List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition); Review Comment: Tracking in HUDI-7581 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1555159066 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata, private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception { HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition); List> partitionFileSlicePairs = new ArrayList<>(); -HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient); +HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient); commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> { - List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition); + List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition); Review Comment: > Not following you here. In multi-writer scenario, there is a new latest file slice due to instant t1 How about this client triggers cleaning for t0 with a very radical strategy while we do this loading check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041474216 ## CI report: * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041454934 ## CI report: * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098) * bbfbe38b86b5bd11972591a346ac9b847a7daa6a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23136) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2041453013 ## CI report: * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098) * bbfbe38b86b5bd11972591a346ac9b847a7daa6a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1554955230 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata, private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception { HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition); List> partitionFileSlicePairs = new ArrayList<>(); -HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient); +HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient); commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> { - List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition); + List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition); Review Comment: Not following you here. In multi-writer scenario, there is a new latest file slice due to instant t1 but `dataMetaClient` had already been initialized before t1 (say upto instant t0), then index will only be updated upto t0. In this case, `getPartitionLatestFileSlicesIncludingInflight` will only return file slices upto t0. The fsView API will return file slices from a consistent snapshot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1554954192 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java: ## @@ -107,6 +107,14 @@ interface SliceViewWithLatestSlice { */ Stream getLatestFileSlices(String partitionPath); +/** + * Get the latest file slices for a given partition including the inflight ones. + * + * @param partitionPath The partition path of interest + * @return Stream of latest {@link FileSlice} in the partition path. + */ +Stream getLatestFileSlicesIncludingInflight(String partitionPath); + Review Comment: No we don't need there. This is an uplevel of existing API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1554953956 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: Adding to what Vinay said, going forward we will have more indexes where index type and index name (mdt partition name) will differ such as secondary index. I think we should get rid of `MetadataPartitionType`. This also enables removing the `MetadataRecordsGenerationParams` pojo which is deprecated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r155535 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java: ## @@ -107,6 +107,14 @@ interface SliceViewWithLatestSlice { */ Stream getLatestFileSlices(String partitionPath); +/** + * Get the latest file slices for a given partition including the inflight ones. + * + * @param partitionPath The partition path of interest + * @return Stream of latest {@link FileSlice} in the partition path. + */ +Stream getLatestFileSlicesIncludingInflight(String partitionPath); + Review Comment: I didn't see impl for RocksDB fsview, do we need that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1553326198 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata, private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception { HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition); List> partitionFileSlicePairs = new ArrayList<>(); -HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient); +HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient); commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> { - List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition); + List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition); Review Comment: Yeah, the latest file slice is there but the `instantTime` here may not belongs to the latest file slice? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
bhat-vinay commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1552836822 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: @danny0405 `MetadataPartitionType` has an inherent assumption that it maps directly to the physical partition (or rather partition path) for metadata table. While it is true for most of the MDT partitions, it is not the case with functional-index partitions (and the upcoming secondary index partition). `MetadataPartitionType` will only have a single entry for functional-index, `func_index_`, which acts as the prefix of the actual physical partition. The actual physical partitions is manifested as `func_index_`. This is what the PR is trying to fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1552661236 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: I didn't get why the enum type affects the unification, can you elaborate a little bit? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2037332583 ## CI report: * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2037190595 ## CI report: * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977) * dbda44942240ecdf008df975aa15d58eaaa45a33 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23098) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2037169431 ## CI report: * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977) * dbda44942240ecdf008df975aa15d58eaaa45a33 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1551490057 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata, private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception { HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition); List> partitionFileSlicePairs = new ArrayList<>(); -HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient); +HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient); commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> { - List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition); + List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition); Review Comment: Cleaner will only delete the older log files which are already committed at least one commit before the latest version. So, log files for this instant should be present. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1551488498 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java: ## @@ -428,6 +428,7 @@ protected Option getReplaceInstant(final HoodieFileGroupId fileGr * @return Stream of latest {@link FileSlice} in the partition path. */ public Stream fetchLatestFileSlicesIncludingInflight(String partitionPath) { +ensurePartitionLoadedCorrectly(partitionPath); Review Comment: yes, i've moved to super class `AbstractTableFileSystemView` and added test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1551487833 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: We cannot do it currently without really getting rid of `MetadataPartitionType` enum. I think we should just have strings insetad of enum. If you agree, I can take up in a followup patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1535241868 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: Can we unifiy the logic into one shot? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1535126995 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: No, i changed the logic slightly. `isMetadataPartitionAvailable` in line 392 only checks for func_index prefix in the metadata partition name. By that check, even an old functional index will pass for which we don't need to initialize. So, we get the uninitialized functional index partitions based on index definition in `getFunctionalIndexPartitionsToInit`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1535123166 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -534,10 +537,17 @@ private Pair> initializeFunctionalIndexPartiti int fileGroupCount = dataWriteConfig.getMetadataConfig().getFunctionalIndexFileGroupCount(); int parallelism = Math.min(partitionFileSlicePairs.size(), dataWriteConfig.getMetadataConfig().getFunctionalIndexParallelism()); -Schema readerSchema = addMetadataFields(getProjectedSchemaForFunctionalIndex(indexDefinition, dataMetaClient), dataWriteConfig.allowOperationMetadataField()); +Schema readerSchema = getProjectedSchemaForFunctionalIndex(indexDefinition, dataMetaClient); return Pair.of(fileGroupCount, getFunctionalIndexRecords(partitionFileSlicePairs, indexDefinition, dataMetaClient, parallelism, readerSchema, hadoopConf)); } + private Set getFunctionalIndexPartitionsToInit() { +Set functionalIndexPartitions = dataMetaClient.getFunctionalIndexMetadata().get().getIndexDefinitions().keySet(); +Set completedMetadataPartitions = dataMetaClient.getTableConfig().getMetadataPartitions(); Review Comment: this is handled in `getFunctionalIndexMetadata` method itself. If It is empty and the index definition path is defined then we load the file. The loading ideally happens just once when instantiating the metaclient. As long as we're not recreating meta client in metadata writer we should be good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1534943897 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java: ## @@ -428,6 +428,7 @@ protected Option getReplaceInstant(final HoodieFileGroupId fileGr * @return Stream of latest {@link FileSlice} in the partition path. */ public Stream fetchLatestFileSlicesIncludingInflight(String partitionPath) { +ensurePartitionLoadedCorrectly(partitionPath); Review Comment: do we need a read lock guard here? And this method is used for only testing purpose now, may need to upgrade it as top-level API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1534943496 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -991,9 +1001,9 @@ private void updateFunctionalIndexIfPresent(HoodieCommitMetadata commitMetadata, private HoodieData getFunctionalIndexUpdates(HoodieCommitMetadata commitMetadata, String indexPartition, String instantTime) throws Exception { HoodieFunctionalIndexDefinition indexDefinition = getFunctionalIndexDefinition(indexPartition); List> partitionFileSlicePairs = new ArrayList<>(); -HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient); +HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(dataMetaClient); commitMetadata.getPartitionToWriteStats().forEach((dataPartition, value) -> { - List fileSlices = getPartitionLatestFileSlicesIncludingInflight(metadataMetaClient, Option.ofNullable(fsView), dataPartition); + List fileSlices = getPartitionLatestFileSlicesIncludingInflight(dataMetaClient, Option.ofNullable(fsView), dataPartition); Review Comment: Is there any case when the log files for this instant got cleaned? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1534943099 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -534,10 +537,17 @@ private Pair> initializeFunctionalIndexPartiti int fileGroupCount = dataWriteConfig.getMetadataConfig().getFunctionalIndexFileGroupCount(); int parallelism = Math.min(partitionFileSlicePairs.size(), dataWriteConfig.getMetadataConfig().getFunctionalIndexParallelism()); -Schema readerSchema = addMetadataFields(getProjectedSchemaForFunctionalIndex(indexDefinition, dataMetaClient), dataWriteConfig.allowOperationMetadataField()); +Schema readerSchema = getProjectedSchemaForFunctionalIndex(indexDefinition, dataMetaClient); return Pair.of(fileGroupCount, getFunctionalIndexRecords(partitionFileSlicePairs, indexDefinition, dataMetaClient, parallelism, readerSchema, hadoopConf)); } + private Set getFunctionalIndexPartitionsToInit() { +Set functionalIndexPartitions = dataMetaClient.getFunctionalIndexMetadata().get().getIndexDefinitions().keySet(); +Set completedMetadataPartitions = dataMetaClient.getTableConfig().getMetadataPartitions(); Review Comment: How about when `dataMetaClient.getFunctionalIndexMetadata()` is empty? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1534942879 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -434,7 +433,12 @@ private boolean initializeFromFilesystem(String initializationTime, List functionalIndexPartitionsToInit = getFunctionalIndexPartitionsToInit(); +if (functionalIndexPartitionsToInit.isEmpty()) { + continue; Review Comment: Isn't line 392 already hanling the repetitive loading? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012814839 @danny0405 @bhat-vinay @yihua All tests passed now. Please take another pass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012296763 ## CI report: * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012155756 ## CI report: * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972) * 971883fd498830b3c1257b9df61bf012d3be1554 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2012137384 ## CI report: * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972) * 971883fd498830b3c1257b9df61bf012d3be1554 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2011575469 ## CI report: * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2011305161 ## CI report: * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918) * c37115caa8348eeb0653cb03d5f773e2d046082b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22972) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-2011297856 ## CI report: * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918) * c37115caa8348eeb0653cb03d5f773e2d046082b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
bhat-vinay commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1527415181 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() { /** * Checks if metadata table is enabled and the specified partition has been initialized. * - * @param partition The partition to check + * @param metadataPartitionType The metadata table partition type to check * @returns true if the specific partition has been initialized, else returns false. */ - public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) { -return getMetadataPartitions().contains(partition.getPartitionPath()); + public boolean isMetadataPartitionAvailable(MetadataPartitionType metadataPartitionType) { Review Comment: Loosk good. The test failures look like simple checkstyle fixes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1999175611 Hmm, it looks like there are many test failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1998845915 ## CI report: * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1998792317 ## CI report: * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893) * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22918) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1525665975 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() { /** * Checks if metadata table is enabled and the specified partition has been initialized. * - * @param partition The partition to check + * @param metadataPartitionType The metadata table partition type to check * @returns true if the specific partition has been initialized, else returns false. */ - public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) { -return getMetadataPartitions().contains(partition.getPartitionPath()); + public boolean isMetadataPartitionAvailable(MetadataPartitionType metadataPartitionType) { Review Comment: i changed the logic a bit. We will still find func index in the mdt partitions to initialize, but then when we actually go about initializing we will check based on full partition path - see `initializeFromFileSystem` and `getFunctionalIndexPartitionsToInit` in `HoodieBackedTableMetadataWriter` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1998775560 ## CI report: * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893) * 567e8faf62ac6ed8c6df54ae8cbc4a0c6d994323 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1524587040 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1919,7 +1920,7 @@ public HoodieRecord next() { public static Schema getProjectedSchemaForFunctionalIndex(HoodieFunctionalIndexDefinition indexDefinition, HoodieTableMetaClient metaClient) throws Exception { TableSchemaResolver schemaResolver = new TableSchemaResolver(metaClient); Schema tableSchema = schemaResolver.getTableAvroSchema(); -return HoodieAvroUtils.getSchemaForFields(tableSchema, indexDefinition.getSourceFields()); +return addMetadataFields(getSchemaForFields(tableSchema, indexDefinition.getSourceFields())); Review Comment: yeah for update we need record key field. ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala: ## @@ -253,6 +253,47 @@ class TestFunctionalIndex extends HoodieSparkSqlTestBase { } } + test("Test functional index update after initialization") { +withTempDir(tmp => { + val tableName = generateTableName + val basePath = s"${tmp.getCanonicalPath}/$tableName" + spark.sql( +s"""create table $tableName ( +id int, +name string, +price double, +ts long +) using hudi +options ( +primaryKey ='id', +type = 'mor', +preCombineField = 'ts', +hoodie.metadata.record.index.enable = 'true', +hoodie.datasource.write.recordkey.field = 'id' +) +partitioned by(ts) +location '$basePath'""".stripMargin) + spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)") + spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)") + spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)") + + checkAnswer(s"select id, name from $tableName where from_unixtime(ts, '-MM-dd') = '1970-01-01'")( +Seq(1, "a1"), +Seq(2, "a2"), +Seq(3, "a3") + ) + // create functional index + val createIndexSql = s"create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='-MM-dd')" + spark.sql(createIndexSql) + // do another insert after initializing the index + spark.sql(s"insert into $tableName values(4, 'a4', 10, 1000)") + // check query result + checkAnswer(s"select id, name from $tableName where from_unixtime(ts, '-MM-dd') = '1970-04-26'")( +Seq(4, "a4") + ) Review Comment: ack.. will do -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1524586719 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() { /** * Checks if metadata table is enabled and the specified partition has been initialized. * - * @param partition The partition to check + * @param metadataPartitionType The metadata table partition type to check * @returns true if the specific partition has been initialized, else returns false. */ - public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) { -return getMetadataPartitions().contains(partition.getPartitionPath()); + public boolean isMetadataPartitionAvailable(MetadataPartitionType metadataPartitionType) { +return getMetadataPartitions().stream().anyMatch(metadataPartition -> +metadataPartition.equals(metadataPartitionType.getPartitionPath()) || (FUNCTIONAL_INDEX.equals(metadataPartitionType) && metadataPartition.startsWith(FUNCTIONAL_INDEX.getPartitionPath(; } Review Comment: it's redundant. i can remove it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
bhat-vinay commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1524160977 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala: ## @@ -253,6 +253,47 @@ class TestFunctionalIndex extends HoodieSparkSqlTestBase { } } + test("Test functional index update after initialization") { +withTempDir(tmp => { + val tableName = generateTableName + val basePath = s"${tmp.getCanonicalPath}/$tableName" + spark.sql( +s"""create table $tableName ( +id int, +name string, +price double, +ts long +) using hudi +options ( +primaryKey ='id', +type = 'mor', +preCombineField = 'ts', +hoodie.metadata.record.index.enable = 'true', +hoodie.datasource.write.recordkey.field = 'id' +) +partitioned by(ts) +location '$basePath'""".stripMargin) + spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)") + spark.sql(s"insert into $tableName values(2, 'a2', 10, 1001)") + spark.sql(s"insert into $tableName values(3, 'a3', 10, 1002)") + + checkAnswer(s"select id, name from $tableName where from_unixtime(ts, '-MM-dd') = '1970-01-01'")( +Seq(1, "a1"), +Seq(2, "a2"), +Seq(3, "a3") + ) + // create functional index + val createIndexSql = s"create index idx_datestr on $tableName using column_stats(ts) options(func='from_unixtime', format='-MM-dd')" + spark.sql(createIndexSql) + // do another insert after initializing the index + spark.sql(s"insert into $tableName values(4, 'a4', 10, 1000)") + // check query result + checkAnswer(s"select id, name from $tableName where from_unixtime(ts, '-MM-dd') = '1970-04-26'")( +Seq(4, "a4") + ) Review Comment: Might also want to check that the functional-index is available in tableConfig's metadata partition lists (`assert(metaClient.getTableConfig.isMetadataPartitionAvailable("func_index_idx_datestr"))`) ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() { /** * Checks if metadata table is enabled and the specified partition has been initialized. * - * @param partition The partition to check + * @param metadataPartitionType The metadata table partition type to check * @returns true if the specific partition has been initialized, else returns false. */ - public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) { -return getMetadataPartitions().contains(partition.getPartitionPath()); + public boolean isMetadataPartitionAvailable(MetadataPartitionType metadataPartitionType) { Review Comment: Is this sufficient or should we actually check the whole partition-path. Something like `public boolean isMetadataPartitionPathAvailable(String partitionPath)`. Mostly asking for the case when there could be multiple functional-index. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1524070449 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1919,7 +1920,7 @@ public HoodieRecord next() { public static Schema getProjectedSchemaForFunctionalIndex(HoodieFunctionalIndexDefinition indexDefinition, HoodieTableMetaClient metaClient) throws Exception { TableSchemaResolver schemaResolver = new TableSchemaResolver(metaClient); Schema tableSchema = schemaResolver.getTableAvroSchema(); -return HoodieAvroUtils.getSchemaForFields(tableSchema, indexDefinition.getSourceFields()); +return addMetadataFields(getSchemaForFields(tableSchema, indexDefinition.getSourceFields())); Review Comment: Is this change related with this fix? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
danny0405 commented on code in PR #10860: URL: https://github.com/apache/hudi/pull/10860#discussion_r1524070179 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -737,11 +738,12 @@ public boolean isMetadataTableAvailable() { /** * Checks if metadata table is enabled and the specified partition has been initialized. * - * @param partition The partition to check + * @param metadataPartitionType The metadata table partition type to check * @returns true if the specific partition has been initialized, else returns false. */ - public boolean isMetadataPartitionAvailable(MetadataPartitionType partition) { -return getMetadataPartitions().contains(partition.getPartitionPath()); + public boolean isMetadataPartitionAvailable(MetadataPartitionType metadataPartitionType) { +return getMetadataPartitions().stream().anyMatch(metadataPartition -> +metadataPartition.equals(metadataPartitionType.getPartitionPath()) || (FUNCTIONAL_INDEX.equals(metadataPartitionType) && metadataPartition.startsWith(FUNCTIONAL_INDEX.getPartitionPath(; } Review Comment: Why we need this check: `FUNCTIONAL_INDEX.equals(metadataPartitionType)` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1995528694 ## CI report: * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1995186398 ## CI report: * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22893) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
hudi-bot commented on PR #10860: URL: https://github.com/apache/hudi/pull/10860#issuecomment-1995162248 ## CI report: * 0b76cded2d8eab1c306d16a62717c7cba02e9b96 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [HUDI-7480] Fix functional index and avoid multiple initializations [hudi]
codope opened a new pull request, #10860: URL: https://github.com/apache/hudi/pull/10860 ### Change Logs This PR fixes two issues: 1. When a functional index is updated, it was getting re-initialized. This was happening because metadata partition type only contains the functional index partition prefix while hoodie.properties contains the full name. Due to discrepancy, metadata writer thought there is anew index that needs ot be initialized. This is fixed in `HoodieTableConfig#isMetadataPartitionAvailable`. 2. When a functional index is updated, it fetches latest file slices, including the pending ones, in data partition. There was a typo in building the filesystem view. Instead of passing data meta client, metadata meta client was passed. Added a test which covers both these scenarios. ### Impact Fixes functional index bugs. ### Risk level (write none, low medium or high below) low ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org