[GitHub] [hudi] codope commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
codope commented on code in PR #8900: URL: https://github.com/apache/hudi/pull/8900#discussion_r1223850963 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java: ## @@ -61,6 +61,12 @@ public class HoodieCompactionConfig extends HoodieConfig { + "but users are expected to trigger async job for execution. If `hoodie.compact.inline` is set to true, regular writers will do both scheduling and " + "execution inline for compaction"); + public static final ConfigProperty ENABLE_LOG_COMPACTION = ConfigProperty + .key("hoodie.log.compaction.enable") + .defaultValue("false") + .sinceVersion("0.14") + .withDocumentation("By enabling log compaction through this config, log compaction will also gets enabled to metadata table."); Review Comment: ```suggestion .withDocumentation("By enabling log compaction through this config, log compaction will also get enabled for the metadata table."); ``` ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleFactory.java: ## @@ -47,6 +50,7 @@ public static HoodieMergeHandle create( String fileId, TaskContextSupplier taskContextSupplier, Option keyGeneratorOpt) { +LOG.info("Get updateHandle for fileId " + fileId + " and partitionPath " + partitionPath + " at commit " + instantTime); Review Comment: Are these logs really necessary? If so, please consider logging in debug mode. Same for all logs. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1021,17 +1023,46 @@ private void runPendingTableServicesOperations(BaseHoodieWriteClient writeClient * deltacommit. */ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) { + +// Check if there are any pending compaction or log compaction instants in the timeline. +// If pending compact/logcompaction operations are found abort scheduling new compaction/logcompaction operations. 
+Option pendingLogCompactionInstant = + metadataMetaClient.getActiveTimeline().filterPendingLogCompactionTimeline().firstInstant(); +Option pendingCompactionInstant = + metadataMetaClient.getActiveTimeline().filterPendingCompactionTimeline().firstInstant(); +if (pendingLogCompactionInstant.isPresent() || pendingCompactionInstant.isPresent()) { Review Comment: this validation can be moved inside "validateTimelineBeforeSchedulingCompaction" ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1021,17 +1023,46 @@ private void runPendingTableServicesOperations(BaseHoodieWriteClient writeClient * deltacommit. */ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) { + Review Comment: nit: remove newline ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java: ## @@ -109,4 +115,5 @@ public interface HoodieTableMetadataWriter extends Serializable, AutoCloseable { * deciding if optimizations can be performed. */ void performTableServices(Option inFlightInstantTimestamp); + Review Comment: nit: remove newline ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java: ## @@ -82,6 +84,7 @@ public HoodieCompactionPlan generateCompactionPlan() throws IOException { // filter the partition paths if needed to reduce list status partitionPaths = filterPartitionPathsByStrategy(writeConfig, partitionPaths); +LOG.info("Filtered partition paths are " + partitionPaths); Review Comment: ```suggestion LOG.debug("Filtered partition paths are " + partitionPaths); ``` ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestDataValidationCheckForLogCompactionActions.java: ## @@ -377,7 +377,7 @@ private TestTableContents setupTestTable2() throws IOException { // Create logcompaction client. 
HoodieWriteConfig logCompactionConfig = HoodieWriteConfig.newBuilder().withProps(config2.getProps()) .withCompactionConfig(HoodieCompactionConfig.newBuilder() -.withLogCompactionBlocksThreshold("2").build()) +.withLogCompactionBlocksThreshold(2).build()) Review Comment: good catch! ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.java: ## @@ -233,48 +233,49 @@ void testSyncMetadataTable() throws Exception { assertThat(completedTimeline.lastInstant().get().getTimestamp(), startsWith(HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP)); // test metadata
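The review above suggests folding the pending-instant check in `compactIfNecessary` into `validateTimelineBeforeSchedulingCompaction`: scheduling of a new compaction or log compaction on the metadata table is aborted whenever the timeline already carries a pending instant of either kind. The logic can be modeled in isolation; the sketch below uses stand-in types (not Hudi's `HoodieInstant`/`HoodieTimeline` classes) purely to illustrate the guard:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrative stand-ins for Hudi's timeline types; names are not Hudi's API.
public class MdtCompactionGuard {

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  static final class Instant {
    final State state;
    final String action;    // e.g. "compaction", "logcompaction", "deltacommit"
    final String timestamp;
    Instant(State state, String action, String timestamp) {
      this.state = state; this.action = action; this.timestamp = timestamp;
    }
  }

  // Mirrors filterPending*CompactionTimeline().firstInstant(): first instant of
  // the given action that has not yet completed.
  static Optional<Instant> firstPending(List<Instant> timeline, String action) {
    return timeline.stream()
        .filter(i -> i.action.equals(action) && i.state != State.COMPLETED)
        .findFirst();
  }

  // True only when neither a compaction nor a log compaction is pending, i.e.
  // it is safe to schedule a new table service on the metadata table.
  static boolean canScheduleTableService(List<Instant> timeline) {
    return !firstPending(timeline, "compaction").isPresent()
        && !firstPending(timeline, "logcompaction").isPresent();
  }

  public static void main(String[] args) {
    List<Instant> clean = Arrays.asList(
        new Instant(State.COMPLETED, "deltacommit", "001"),
        new Instant(State.COMPLETED, "compaction", "002"));
    List<Instant> blocked = Arrays.asList(
        new Instant(State.COMPLETED, "deltacommit", "001"),
        new Instant(State.INFLIGHT, "logcompaction", "002"));
    System.out.println(canScheduleTableService(clean));   // true
    System.out.println(canScheduleTableService(blocked)); // false
  }
}
```

Because compaction and log compaction of the MDT contend for the same file slices, aborting the second schedule attempt is simpler than coordinating two concurrent plans.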
[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
danny0405 commented on code in PR #8900: URL: https://github.com/apache/hudi/pull/8900#discussion_r1223869176 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleFactory.java: ## @@ -47,6 +50,7 @@ public static HoodieMergeHandle create( String fileId, TaskContextSupplier taskContextSupplier, Option keyGeneratorOpt) { +LOG.info("Get updateHandle for fileId " + fileId + " and partitionPath " + partitionPath + " at commit " + instantTime); Review Comment: `Create update handle for fileId ... and partition path ...` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] suryaprasanna commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
suryaprasanna commented on code in PR #8900: URL: https://github.com/apache/hudi/pull/8900#discussion_r1223865818 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java: ## @@ -111,6 +112,10 @@ public static HoodieWriteConfig createMetadataWriteConfig( // deltacommits having corresponding completed commits. Therefore, we need to compact all fileslices of all // partitions together requiring UnBoundedCompactionStrategy. .withCompactionStrategy(new UnBoundedCompactionStrategy()) +// Check if log compaction is enabled, this is needed for tables with lot of records. +.withLogCompactionEnabled(writeConfig.isLogCompactionEnabled()) +// This config is only used if enableLogCompactionForMetadata is set. Review Comment: Fixed the comment. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1021,17 +1023,46 @@ private void runPendingTableServicesOperations(BaseHoodieWriteClient writeClient * deltacommit. */ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) { + +// Check if there are any pending compaction or log compaction instants in the timeline. +// If pending compact/logcompaction operations are found abort scheduling new compaction/logcompaction operations. +Option pendingLogCompactionInstant = + metadataMetaClient.getActiveTimeline().filterPendingLogCompactionTimeline().firstInstant(); Review Comment: Test for various cases like creation of compaction plan when logcompaction and vice versa are present in TestHoodieClientOnMergeOnReadStorage. ## hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java: ## @@ -57,6 +57,15 @@ public static int archive(JavaSparkContext jsc, .build(); HoodieEngineContext context = new HoodieSparkEngineContext(jsc); HoodieSparkTable table = HoodieSparkTable.create(config, context); + +// Check if the metadata is already initialized. 
If it is initialized, ignore the input argument enableMetadata. Review Comment: Reverting these changes.
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
nsivabalan commented on code in PR #8900: URL: https://github.com/apache/hudi/pull/8900#discussion_r1223854254 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnMergeOnReadStorage.java: ## @@ -314,7 +314,7 @@ public void testSchedulingCompactionAfterSchedulingLogCompaction() throws Except // Try scheduling compaction, it wont succeed Option compactionTimeStamp = client.scheduleCompaction(Option.empty()); -assertFalse(compactionTimeStamp.isPresent()); Review Comment: do we know the reason why we had to flip. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java: ## @@ -111,6 +112,10 @@ public static HoodieWriteConfig createMetadataWriteConfig( // deltacommits having corresponding completed commits. Therefore, we need to compact all fileslices of all // partitions together requiring UnBoundedCompactionStrategy. .withCompactionStrategy(new UnBoundedCompactionStrategy()) +// Check if log compaction is enabled, this is needed for tables with lot of records. +.withLogCompactionEnabled(writeConfig.isLogCompactionEnabled()) +// This config is only used if enableLogCompactionForMetadata is set. Review Comment: not sure I get your comment here "This config is only used if enableLogCompactionForMetadata is set". from the code, it looks like we fetch from writeConfig.isLogCompactionEnabled(). ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.java: ## @@ -233,48 +233,49 @@ void testSyncMetadataTable() throws Exception { assertThat(completedTimeline.lastInstant().get().getTimestamp(), startsWith(HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP)); // test metadata table compaction -// write another 4 commits -for (int i = 1; i < 5; i++) { +// write another 9 commits to trigger compaction twice. Since default clean version to retain is 2. Review Comment: @danny0405 : can you review changes in flink classes. 
## hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java: ## @@ -57,6 +57,15 @@ public static int archive(JavaSparkContext jsc, .build(); HoodieEngineContext context = new HoodieSparkEngineContext(jsc); HoodieSparkTable table = HoodieSparkTable.create(config, context); + +// Check if the metadata is already initialized. If it is initialized, ignore the input argument enableMetadata. Review Comment: are these required? ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1021,17 +1023,46 @@ private void runPendingTableServicesOperations(BaseHoodieWriteClient writeClient * deltacommit. */ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) { + +// Check if there are any pending compaction or log compaction instants in the timeline. +// If pending compact/logcompaction operations are found abort scheduling new compaction/logcompaction operations. +Option pendingLogCompactionInstant = + metadataMetaClient.getActiveTimeline().filterPendingLogCompactionTimeline().firstInstant(); Review Comment: do we have tests for these?
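nsivabalan's comment on `HoodieMetadataWriteUtils` above boils down to config propagation: the metadata-table write config copies the data table's `isLogCompactionEnabled()` flag, and the blocks-threshold setting is only consulted once that flag is true. A minimal stand-alone sketch of that pattern (all class and method names here are stand-ins, not Hudi's actual config classes):

```java
// Illustrative-only model of deriving a metadata-table write config from the
// data-table config; not Hudi's API.
public class MetadataConfigPropagation {

  static final class WriteConfig {
    final boolean logCompactionEnabled;
    final int logCompactionBlocksThreshold;
    WriteConfig(boolean enabled, int threshold) {
      this.logCompactionEnabled = enabled;
      this.logCompactionBlocksThreshold = threshold;
    }
  }

  // The metadata config inherits the data table's log-compaction flag; the
  // threshold below is inert unless that inherited flag is true.
  static WriteConfig createMetadataWriteConfig(WriteConfig dataTableConfig) {
    return new WriteConfig(dataTableConfig.logCompactionEnabled, 4);
  }

  public static void main(String[] args) {
    WriteConfig mdt = createMetadataWriteConfig(new WriteConfig(true, 2));
    System.out.println(mdt.logCompactionEnabled); // true
  }
}
```

This matches the reviewer's reading: there is no separate "enableLogCompactionForMetadata" switch; the single data-table flag gates log compaction on the MDT as well.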
[GitHub] [hudi] hudi-bot commented on pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
hudi-bot commented on PR #8900: URL: https://github.com/apache/hudi/pull/8900#issuecomment-1583999590 ## CI report: * f9e3b8dd406a43d5808ee93105efb9154b05a6cb Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17691) * 85e65864e9376baf4d84149310810983751b87eb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17698) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
hudi-bot commented on PR #8913: URL: https://github.com/apache/hudi/pull/8913#issuecomment-1583999661 ## CI report: * 3580939238ab2c8a458df5d4a14b0a6f07ccebed Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17694) * 2aa49c14bf4df38e11087d1add3518190093f7cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17699)
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583999528 ## CI report: * 6bcf646df9a0223b8787e7bae2255c628aea54b4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17693) * 8fda23303081b08c252c8d0eb74abe431af44901 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17697)
[GitHub] [hudi] danny0405 commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
danny0405 commented on code in PR #8837: URL: https://github.com/apache/hudi/pull/8837#discussion_r1223859381 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -851,26 +919,49 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) { */ @Override public void update(HoodieRollbackMetadata rollbackMetadata, String instantTime) { -if (enabled && metadata != null) { - // Is this rollback of an instant that has been synced to the metadata table? - String rollbackInstant = rollbackMetadata.getCommitsRollback().get(0); - boolean wasSynced = metadataMetaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, rollbackInstant)); - if (!wasSynced) { -// A compaction may have taken place on metadata table which would have included this instant being rolled back. -// Revisit this logic to relax the compaction fencing : https://issues.apache.org/jira/browse/HUDI-2458 -Option latestCompaction = metadata.getLatestCompactionTime(); -if (latestCompaction.isPresent()) { - wasSynced = HoodieTimeline.compareTimestamps(rollbackInstant, HoodieTimeline.LESSER_THAN_OR_EQUALS, latestCompaction.get()); -} +// The commit which is being rolled back on the dataset +final String commitInstantTime = rollbackMetadata.getCommitsRollback().get(0); +// Find the deltacommits since the last compaction +Option> deltaCommitsInfo = + CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline()); +if (!deltaCommitsInfo.isPresent()) { + LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no deltacommits on MDT", commitInstantTime, instantTime)); + return; +} + +// This could be a compaction or deltacommit instant (See CompactionUtils.getDeltaCommitsSinceLatestCompaction) +HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue(); +HoodieTimeline deltacommitsSinceCompaction = 
deltaCommitsInfo.get().getKey(); + +// The deltacommit that will be rolled back +HoodieInstant deltaCommitInstant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime); + +// The commit being rolled back should not be older than the latest compaction on the MDT. Compaction on MDT only occurs when all actions +// are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore. +if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) { + final String compactionInstantTime = compactionInstant.getTimestamp(); + if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitInstantTime, compactionInstantTime)) { +throw new HoodieMetadataException(String.format("Commit being rolled back %s is older than the latest compaction %s. " ++ "There are %d deltacommits after this compaction: %s", commitInstantTime, compactionInstantTime, +deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants())); } +} - Map> records = - HoodieTableMetadataUtil.convertMetadataToRecords(engineContext, metadataMetaClient.getActiveTimeline(), - rollbackMetadata, getRecordsGenerationParams(), instantTime, - metadata.getSyncedInstantTime(), wasSynced); - commit(instantTime, records, false); - closeInternal(); +if (deltaCommitsInfo.get().getKey().containsInstant(deltaCommitInstant)) { + LOG.info("Rolling back MDT deltacommit " + commitInstantTime); + if (!getWriteClient().rollback(commitInstantTime, instantTime)) { +throw new HoodieMetadataException("Failed to rollback deltacommit at " + commitInstantTime); + } +} else { + LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no corresponding deltacommits on MDT", + commitInstantTime, instantTime)); } + +// Rollback of MOR table may end up adding a new log file. 
So we need to check for added files and add them to MDT +processAndCommit(instantTime, () -> HoodieTableMetadataUtil.convertMetadataToRecords(engineContext, metadataMetaClient.getActiveTimeline(), +rollbackMetadata, getRecordsGenerationParams(), instantTime, +metadata.getSyncedInstantTime(), true), false); Review Comment: Not sure why we perform the rollback again ~
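danny0405's questions above center on the new fencing rule in `update(HoodieRollbackMetadata, ...)`: a commit may only be rolled back on the MDT if it is newer than the latest MDT compaction; an older commit implies a completed action and must go through restore instead. Hudi instant times are zero-padded timestamp strings, so the `HoodieTimeline.LESSER_THAN_OR_EQUALS` test reduces, in effect, to lexicographic comparison. A standalone sketch of the rule (plain Java with illustrative names, not Hudi's API):

```java
// Illustrative sketch of the MDT rollback fencing rule; names are stand-ins.
public class RollbackFence {

  // Instant times are fixed-width, zero-padded timestamps, so lexicographic
  // order matches chronological order.
  static boolean lesserThanOrEquals(String t1, String t2) {
    return t1.compareTo(t2) <= 0;
  }

  // Rejects a rollback whose commit is not strictly newer than the latest
  // compaction on the metadata table; such cases need a restore instead.
  static void validateRollback(String commitInstantTime, String latestCompactionTime) {
    if (lesserThanOrEquals(commitInstantTime, latestCompactionTime)) {
      throw new IllegalStateException(String.format(
          "Commit being rolled back %s is older than the latest compaction %s",
          commitInstantTime, latestCompactionTime));
    }
  }

  public static void main(String[] args) {
    // Commit is newer than the compaction: rollback allowed.
    validateRollback("20230610120000000", "20230610110000000");
    // Commit predates the compaction: fencing rejects it.
    try {
      validateRollback("20230610100000000", "20230610110000000");
      throw new AssertionError("expected fencing to reject older commit");
    } catch (IllegalStateException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
```

The fencing is sound because MDT compaction only runs when no actions are in flight on the dataset, so anything at or before the compaction timestamp is guaranteed already folded into compacted base files.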
[GitHub] [hudi] danny0405 commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
danny0405 commented on code in PR #8837: URL: https://github.com/apache/hudi/pull/8837#discussion_r1223857627 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -851,26 +919,49 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) { */ @Override public void update(HoodieRollbackMetadata rollbackMetadata, String instantTime) { -if (enabled && metadata != null) { - // Is this rollback of an instant that has been synced to the metadata table? - String rollbackInstant = rollbackMetadata.getCommitsRollback().get(0); - boolean wasSynced = metadataMetaClient.getActiveTimeline().containsInstant(new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, rollbackInstant)); - if (!wasSynced) { -// A compaction may have taken place on metadata table which would have included this instant being rolled back. -// Revisit this logic to relax the compaction fencing : https://issues.apache.org/jira/browse/HUDI-2458 -Option latestCompaction = metadata.getLatestCompactionTime(); -if (latestCompaction.isPresent()) { - wasSynced = HoodieTimeline.compareTimestamps(rollbackInstant, HoodieTimeline.LESSER_THAN_OR_EQUALS, latestCompaction.get()); -} +// The commit which is being rolled back on the dataset +final String commitInstantTime = rollbackMetadata.getCommitsRollback().get(0); +// Find the deltacommits since the last compaction +Option> deltaCommitsInfo = + CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline()); +if (!deltaCommitsInfo.isPresent()) { + LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no deltacommits on MDT", commitInstantTime, instantTime)); + return; +} + +// This could be a compaction or deltacommit instant (See CompactionUtils.getDeltaCommitsSinceLatestCompaction) +HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue(); +HoodieTimeline deltacommitsSinceCompaction = 
deltaCommitsInfo.get().getKey(); + +// The deltacommit that will be rolled back +HoodieInstant deltaCommitInstant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime); + +// The commit being rolled back should not be older than the latest compaction on the MDT. Compaction on MDT only occurs when all actions +// are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore. +if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) { + final String compactionInstantTime = compactionInstant.getTimestamp(); + if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitInstantTime, compactionInstantTime)) { +throw new HoodieMetadataException(String.format("Commit being rolled back %s is older than the latest compaction %s. " ++ "There are %d deltacommits after this compaction: %s", commitInstantTime, compactionInstantTime, +deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants())); } +} - Map> records = - HoodieTableMetadataUtil.convertMetadataToRecords(engineContext, metadataMetaClient.getActiveTimeline(), - rollbackMetadata, getRecordsGenerationParams(), instantTime, - metadata.getSyncedInstantTime(), wasSynced); - commit(instantTime, records, false); - closeInternal(); +if (deltaCommitsInfo.get().getKey().containsInstant(deltaCommitInstant)) { Review Comment: Use `deltacommitsSinceCompaction` should be fine?
[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.
voonhous commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1583996315 @pftn can you please help to verify if the data in these 2 parquets are the same? 1. 20220604/0007-3477-401f-982e-e5ae38ca0e23_3-20-6_20230510170043301.parquet 2. 20220604/0007-4bc1-4340-a9d8-330666a58244_5-20-6_20230511183601566.parquet Do you still have the compaction plans that generated these 2 parquet files? It'll be extremely helpful if we can know the write token of the log files before compaction. Thanks!
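When triaging duplicate-file reports like the one above, the write token sits in the middle of the base-file name. Assuming the usual `<fileId>_<writeToken>_<instantTime>.<ext>` naming convention for Hudi base files (the helper below is not part of Hudi, just a small illustrative utility), the parts can be split off like so:

```java
// Hypothetical helper to split a Hudi base-file name of the assumed form
// <fileId>_<writeToken>_<instantTime>.parquet, useful when comparing the
// write tokens of suspected duplicate files.
public class BaseFileName {
  final String fileId;
  final String writeToken;   // e.g. "3-20-6"
  final String instantTime;  // e.g. "20230510170043301"

  BaseFileName(String fileId, String writeToken, String instantTime) {
    this.fileId = fileId;
    this.writeToken = writeToken;
    this.instantTime = instantTime;
  }

  static BaseFileName parse(String name) {
    String stem = name.substring(0, name.lastIndexOf('.'));
    // The fileId contains dashes but no underscores, so the last two '_'
    // characters delimit the three parts.
    int lastSep = stem.lastIndexOf('_');
    int midSep = stem.lastIndexOf('_', lastSep - 1);
    return new BaseFileName(stem.substring(0, midSep),
        stem.substring(midSep + 1, lastSep),
        stem.substring(lastSep + 1));
  }

  public static void main(String[] args) {
    BaseFileName f =
        parse("0007-3477-401f-982e-e5ae38ca0e23_3-20-6_20230510170043301.parquet");
    System.out.println(f.fileId + " | " + f.writeToken + " | " + f.instantTime);
  }
}
```

Comparing the write tokens of the two files in question (`3-20-6` vs `5-20-6`) against the tokens recorded in the compaction plans would show whether two different task attempts produced them.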
[GitHub] [hudi] hudi-bot commented on pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
hudi-bot commented on PR #8913: URL: https://github.com/apache/hudi/pull/8913#issuecomment-1583993645 ## CI report: * 3580939238ab2c8a458df5d4a14b0a6f07ccebed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17694) * 2aa49c14bf4df38e11087d1add3518190093f7cc UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8911: [Hudi-8882] Compatible with hive 2.2.x to read hudi rt table
hudi-bot commented on PR #8911: URL: https://github.com/apache/hudi/pull/8911#issuecomment-1583993589 ## CI report: * 1ddc84cab970a6a43ea77a729213dc8c5200d845 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17690)
[GitHub] [hudi] hudi-bot commented on pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
hudi-bot commented on PR #8900: URL: https://github.com/apache/hudi/pull/8900#issuecomment-1583993513 ## CI report: * fe74a9a7d32286ae29ded9370f6d53ccb14c8809 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17677) * f9e3b8dd406a43d5808ee93105efb9154b05a6cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17691) * 85e65864e9376baf4d84149310810983751b87eb UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583993438 ## CI report: * e2f44f2a1f574eed79090b337d7bd56e08058b51 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17680) * 6bcf646df9a0223b8787e7bae2255c628aea54b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17693) * 8fda23303081b08c252c8d0eb74abe431af44901 UNKNOWN
[GitHub] [hudi] yihua commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583990851 @CTTY Thanks for the review. I addressed all your comments. @rahil-c @mansipp let me know if you have more comments.
[GitHub] [hudi] yihua commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223853653 ## hudi-spark-datasource/hudi-spark3.4.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark34HoodieParquetFileFormat.scala: ## @@ -0,0 +1,532 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.mapred.FileSplit +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl +import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType} +import org.apache.hudi.HoodieSparkUtils +import org.apache.hudi.client.utils.SparkInternalSchemaConverter +import org.apache.hudi.common.fs.FSUtils +import org.apache.hudi.common.util.InternalSchemaCache +import org.apache.hudi.common.util.StringUtils.isNullOrEmpty +import org.apache.hudi.common.util.collection.Pair +import org.apache.hudi.internal.schema.InternalSchema +import org.apache.hudi.internal.schema.action.InternalSchemaMerger +import org.apache.hudi.internal.schema.utils.{InternalSchemaUtils, SerDeHelper} +import org.apache.parquet.filter2.compat.FilterCompat +import org.apache.parquet.filter2.predicate.FilterApi +import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS +import org.apache.parquet.hadoop.{ParquetInputFormat, ParquetRecordReader} +import org.apache.spark.TaskContext +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection +import org.apache.spark.sql.catalyst.expressions.{Cast, JoinedRow} +import org.apache.spark.sql.catalyst.util.DateTimeUtils +import org.apache.spark.sql.execution.WholeStageCodegenExec +import org.apache.spark.sql.execution.datasources.parquet.Spark34HoodieParquetFileFormat._ +import org.apache.spark.sql.execution.datasources.{DataSourceUtils, PartitionedFile, RecordReaderIterator} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources._ +import org.apache.spark.sql.types.{AtomicType, DataType, StructField, StructType} +import org.apache.spark.util.SerializableConfiguration +/** + * This class is an extension of [[ParquetFileFormat]] overriding 
Spark-specific behavior + * that's not possible to customize in any other way + * + * NOTE: This is a version of [[AvroDeserializer]] impl from Spark 3.2.1 w/ w/ the following changes applied to it: + * + * Avoiding appending partition values to the rows read from the data file + * Schema on-read + * + */ +class Spark34HoodieParquetFileFormat(private val shouldAppendPartitionValues: Boolean) extends ParquetFileFormat { + + override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = { +val conf = sparkSession.sessionState.conf +conf.parquetVectorizedReaderEnabled && schema.forall(_.dataType.isInstanceOf[AtomicType]) + } + + def supportsColumnar(sparkSession: SparkSession, schema: StructType): Boolean = { +val conf = sparkSession.sessionState.conf +// Only output columnar if there is WSCG to read it. +val requiredWholeStageCodegenSettings = + conf.wholeStageEnabled && !WholeStageCodegenExec.isTooManyFields(conf, schema) +requiredWholeStageCodegenSettings && + supportBatch(sparkSession, schema) + } + + override def buildReaderWithPartitionValues(sparkSession: SparkSession, + dataSchema: StructType, + partitionSchema: StructType, + requiredSchema: StructType, + filters: Seq[Filter], + options: Map[String, String], + hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = { +hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName) +hadoopConf.set( + ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA, + requiredSchema.json) +hadoopConf.set( +
[GitHub] [hudi] hudi-bot commented on pull request #8905: [HUDI-6337] Incremental Clean ignore partitions affected by append write commits/delta commits
hudi-bot commented on PR #8905: URL: https://github.com/apache/hudi/pull/8905#issuecomment-1583987938 ## CI report: * f8f14263190df7b66143e192188e68463e0c1efd UNKNOWN * f9adcecf4e54774510569f14af4c81a1f4951a28 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17681) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17689) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.
danny0405 commented on code in PR #8837: URL: https://github.com/apache/hudi/pull/8837#discussion_r1223851326 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -837,10 +840,75 @@ public void update(HoodieCleanMetadata cleanMetadata, String instantTime) { */ @Override public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) { -processAndCommit(instantTime, () -> HoodieTableMetadataUtil.convertMetadataToRecords(engineContext, -metadataMetaClient.getActiveTimeline(), restoreMetadata, getRecordsGenerationParams(), instantTime, -metadata.getSyncedInstantTime()), false); -closeInternal(); +dataMetaClient.reloadActiveTimeline(); + +// Since the restore has completed on the dataset, the latest write timeline instant is the one to which the +// restore was performed. This should be always present. +final String restoreToInstantTime = dataMetaClient.getActiveTimeline().getWriteTimeline() +.getReverseOrderedInstants().findFirst().get().getTimestamp(); Review Comment: Why not use `lastInstant` instead?
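danny0405's suggestion rests on a simple equivalence: on an already ordered timeline, the first element of the reverse-ordered view is the same instant a `lastInstant()`-style helper would return. A minimal Java sketch of that equivalence, with Hudi's timeline classes replaced by plain strings — the method names below are illustrative, not Hudi's API:

```java
import java.util.Comparator;
import java.util.List;

public class LastInstantSketch {

    // Mirrors getReverseOrderedInstants().findFirst(): sort descending, take the head.
    static String viaReverseOrderedFirst(List<String> orderedInstants) {
        return orderedInstants.stream()
                .sorted(Comparator.reverseOrder())
                .findFirst()
                .get();
    }

    // Mirrors a lastInstant()-style helper: the timeline is already ordered,
    // so the last element is the latest instant — no re-sorting needed.
    static String viaLastInstant(List<String> orderedInstants) {
        return orderedInstants.get(orderedInstants.size() - 1);
    }

    public static void main(String[] args) {
        List<String> instants = List.of("20230607", "20230608", "20230609");
        // Both paths pick the same (latest) instant.
        System.out.println(viaReverseOrderedFirst(instants));
        System.out.println(viaLastInstant(instants));
    }
}
```

Since instant timestamps sort lexicographically, the reverse-sort-then-first path does strictly more work for the same answer, which is presumably why the simpler call was suggested.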
[GitHub] [hudi] ad1happy2go commented on issue #8904: [SUPPORT] spark-sql hudi table Caused by: org.apache.avro.AvroTypeException: Found string, expecting union
ad1happy2go commented on issue #8904: URL: https://github.com/apache/hudi/issues/8904#issuecomment-1583978910 @zyclove This is a known issue with Hudi 0.11.1. It was fixed by this commit: https://github.com/apache/hudi/pull/6358 Can you try it out and let us know?
[jira] [Updated] (HUDI-3891) Investigate Hudi vs Raw Parquet table discrepancy
[ https://issues.apache.org/jira/browse/HUDI-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3891: - Epic Link: HUDI-1297 > Investigate Hudi vs Raw Parquet table discrepancy > - > > Key: HUDI-3891 > URL: https://issues.apache.org/jira/browse/HUDI-3891 > Project: Apache Hudi > Issue Type: Task >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2022-04-16-13-50-43-916.png, > image-2022-04-16-13-50-43-956.png > > > While benchmarking querying raw Parquet tables against Hudi tables, i've run > the test against the same (Hudi) table: > # In one query path i'm reading it as just a raw Parquet table > # In another, i'm reading it as Hudi RO (read_optimized) table > Surprisingly enough, those 2 diverge in the # of files being read: > > _Raw Parquet_ > !https://t18029943.p.clickup-attachments.com/t18029943/f700a129-35bc-4aaa-948c-9495392653f2/Screen%20Shot%202022-04-15%20at%205.20.41%20PM.png|width=1691,height=149! > > _Hudi_ > !https://t18029943.p.clickup-attachments.com/t18029943/d063c689-a254-45cf-8ba5-07fc88b354b6/Screen%20Shot%202022-04-15%20at%205.21.33%20PM.png|width=1673,height=142! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #8914: [HUDI-6344] Flink MDT bulk_insert for initial commit
hudi-bot commented on PR #8914: URL: https://github.com/apache/hudi/pull/8914#issuecomment-1583960558 ## CI report: * c72b73a619fbc720e343b1fc5a0e3e9506857d1b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17695)
[GitHub] [hudi] hudi-bot commented on pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
hudi-bot commented on PR #8913: URL: https://github.com/apache/hudi/pull/8913#issuecomment-1583960533 ## CI report: * 3580939238ab2c8a458df5d4a14b0a6f07ccebed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17694)
[jira] [Closed] (HUDI-6342) Fix flaky MultiTableDeltaStreamer test
[ https://issues.apache.org/jira/browse/HUDI-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6342. Fix Version/s: 0.14.0 Resolution: Fixed Fixed via master branch: f1c8049f81af94dc4b01b25eb80218a9d97f2a8e > Fix flaky MultiTableDeltaStreamer test > -- > > Key: HUDI-6342 > URL: https://issues.apache.org/jira/browse/HUDI-6342 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > TestHoodieDeltaStreamerWithMultiWriter. > testUpsertsContinuousModeWithMultipleWritersForConflicts > is flaky in recent times. > > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/17675/logs/21] > > {code:java} > 2023-06-08T14:02:50.4346417Z 798455 [pool-1655-thread-1] ERROR > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > [] - Continuous job failed java.lang.RuntimeException: Ingestion service was > shut down with exception. > 2023-06-08T14:02:50.4351308Z 798455 [Listener at localhost/45789] ERROR > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > [] - Conflict happened, but not expected > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Ingestion service was shut down with exception. > 2023-06-08T14:02:50.7579883Z [ERROR] Tests run: 5, Failures: 0, Errors: 1, > Skipped: 1, Time elapsed: 201.181 s <<< FAILURE! - in > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > 2023-06-08T14:02:50.7615120Z [ERROR] > testUpsertsContinuousModeWithMultipleWritersForConflicts{HoodieTableType}[2] > Time elapsed: 56.062 s <<< ERROR! > 2023-06-08T14:02:50.7615570Z java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Ingestion service was shut down with exception. 
> 2023-06-08T14:02:50.7616039Z at > java.util.concurrent.FutureTask.report(FutureTask.java:122) > 2023-06-08T14:02:50.7616662Z at > java.util.concurrent.FutureTask.get(FutureTask.java:192) > 2023-06-08T14:02:50.7617179Z at > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:398) > 2023-06-08T14:02:50.7617674Z at > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWritersForConflicts(TestHoodieDeltaStreamerWithMultiWriter.java:140) > 2023-06-08T14:02:50.7618059Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2023-06-08T14:02:50.7618319Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2023-06-08T14:02:50.7618615Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2023-06-08T14:02:50.7618896Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2023-06-08T14:02:50.7619173Z at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) > 2023-06-08T14:02:50.7619480Z at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2023-06-08T14:02:50.7619845Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2023-06-08T14:02:50.7620217Z at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2023-06-08T14:02:50.7620540Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2023-06-08T14:02:50.7620903Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92) > 2023-06-08T14:02:50.7621288Z at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 
2023-06-08T14:02:50.7621849Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2023-06-08T14:02:50.767Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2023-06-08T14:02:50.7622626Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2023-06-08T14:02:50.7623010Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2023-06-08T14:02:50.7623375Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2023-06-08T14:02:50.7623723Z at >
[hudi] branch master updated (593181397e2 -> f1c8049f81a)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 593181397e2 [HUDI-5352] Fix `LocalDate` serialization in colstats (#8840) add f1c8049f81a [HUDI-6342] Fixing flaky Continuous mode multi writer tests (#8910) No new revisions were added by this update. Summary of changes: .../TestHoodieDeltaStreamerWithMultiWriter.java | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-)
[GitHub] [hudi] danny0405 merged pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
danny0405 merged PR #8910: URL: https://github.com/apache/hudi/pull/8910
[GitHub] [hudi] codope commented on a diff in pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
codope commented on code in PR #8913: URL: https://github.com/apache/hudi/pull/8913#discussion_r1223830445 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMetadataTableWithSparkDataSource.scala: ## @@ -84,23 +84,21 @@ class TestMetadataTableWithSparkDataSource extends SparkClientFunctionalTestHarn .mode(SaveMode.Append) .save(basePath) -// Files partition of MT -val filesPartitionDF = spark.read.format(hudi).load(s"$basePath/.hoodie/metadata/files") +val mdtDf = spark.read.format("hudi").load(s"$basePath/.hoodie/metadata") +mdtDf.show() Review Comment: Yeah, let's remove this. It can be helpful for debugging, but our logs are already bloated.
[GitHub] [hudi] danny0405 commented on issue #8126: [SUPPORT] Exit code 137 (interrupted by signal 9: SIGKILL) when StreamWriteFunction detect object size
danny0405 commented on issue #8126: URL: https://github.com/apache/hudi/issues/8126#issuecomment-1583957629 Yeah, it is introduced by https://github.com/apache/hudi/pull/6657, @codope , do you have any thoughts here?
[GitHub] [hudi] hudi-bot commented on pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
hudi-bot commented on PR #8913: URL: https://github.com/apache/hudi/pull/8913#issuecomment-1583956648 ## CI report: * 3580939238ab2c8a458df5d4a14b0a6f07ccebed UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8914: [HUDI-6344] Flink MDT bulk_insert for initial commit
hudi-bot commented on PR #8914: URL: https://github.com/apache/hudi/pull/8914#issuecomment-1583956673 ## CI report: * c72b73a619fbc720e343b1fc5a0e3e9506857d1b UNKNOWN
[GitHub] [hudi] codope commented on a diff in pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
codope commented on code in PR #8910: URL: https://github.com/apache/hudi/pull/8910#discussion_r1223827683 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamerWithMultiWriter.java: ## @@ -404,12 +405,24 @@ private void runJobsInParallel(String tableBasePath, HoodieTableType tableType, * Need to perform getMessage().contains since the exception coming * from {@link org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.DeltaSyncService} gets wrapped many times into RuntimeExceptions. */ - if (expectConflict && e.getCause().getMessage().contains(ConcurrentModificationException.class.getName())) { + if (expectConflict && backfillFailed.get() && e.getCause().getMessage().contains(ConcurrentModificationException.class.getName())) { // expected ConcurrentModificationException since ingestion & backfill will have overlapping writes -if (backfillFailed.get()) { +if (!continuousFailed.get()) { // if backfill job failed, shutdown the continuous job. LOG.warn("Calling shutdown on ingestion job since the backfill job has failed for " + jobId); ingestionJob.shutdownGracefully(); +} else { + // both backfill and ingestion job cannot fail. + throw new HoodieException("Both backfilling and ingestion job failed ", e); +} + } else if (expectConflict && continuousFailed.get() && e.getCause().getMessage().contains("Ingestion service was shut down with exception")) { +// incase of regular ingestion job failing, ConcurrentModificationException is not throw all the way. Review Comment: nit: `thrown`
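The javadoc in the diff explains why the test matches on `getMessage().contains(...)` rather than on the exception type: the root cause gets wrapped in `RuntimeException`s several times, so its class is lost from `e.getCause()` but its class name survives in the message chain. A minimal sketch of that behavior — the double wrapping here is an assumed stand-in for what `DeltaSyncService` does, not the actual Hudi code path:

```java
import java.util.ConcurrentModificationException;

public class WrappedExceptionSketch {

    // RuntimeException(Throwable cause) sets its own message to cause.toString(),
    // so the root exception's class name keeps propagating outward through wrappers.
    static RuntimeException wrapTwice(Exception root) {
        return new RuntimeException(new RuntimeException(root));
    }

    public static void main(String[] args) {
        RuntimeException wrapped =
                wrapTwice(new ConcurrentModificationException("overlapping writes"));

        // The direct cause is no longer a ConcurrentModificationException...
        System.out.println(wrapped.getCause().getClass().getSimpleName());

        // ...but its message still contains the root exception's class name,
        // which is why the test checks getMessage().contains(...).
        System.out.println(wrapped.getCause().getMessage()
                .contains(ConcurrentModificationException.class.getName()));
    }
}
```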
[GitHub] [hudi] danny0405 commented on a diff in pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
danny0405 commented on code in PR #8913: URL: https://github.com/apache/hudi/pull/8913#discussion_r1223826247 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMetadataTableWithSparkDataSource.scala: ## @@ -84,23 +84,21 @@ class TestMetadataTableWithSparkDataSource extends SparkClientFunctionalTestHarn .mode(SaveMode.Append) .save(basePath) -// Files partition of MT -val filesPartitionDF = spark.read.format(hudi).load(s"$basePath/.hoodie/metadata/files") +val mdtDf = spark.read.format("hudi").load(s"$basePath/.hoodie/metadata") +mdtDf.show() Review Comment: What is the purpose of the `show` call?
[GitHub] [hudi] danny0405 commented on a diff in pull request #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
danny0405 commented on code in PR #8913: URL: https://github.com/apache/hudi/pull/8913#discussion_r1223825494 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1453,11 +1453,7 @@ public static String deleteMetadataTablePartition(HoodieTableMetaClient dataMeta * @return The fileID */ public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index) { -if (partitionType == MetadataPartitionType.FILES) { - return String.format("%s%04d-%d", partitionType.getFileIdPrefix(), index, 0); -} else { - return String.format("%s%04d", partitionType.getFileIdPrefix(), index); -} +return String.format("%s%04d-%d", partitionType.getFileIdPrefix(), index, 0); Review Comment: Nice catch ~
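The fix in this diff makes `getFileIDForFileGroup` emit the same shape for every metadata table partition: the partition's prefix, a zero-padded index, and a trailing `-0`. A minimal sketch of the unified format string — the `"files-"` and `"column-stats-"` prefixes are assumed for illustration; the real values come from `MetadataPartitionType.getFileIdPrefix()`:

```java
public class FileIdFormatSketch {

    // Same format string as the fixed getFileIDForFileGroup: prefix, 4-digit
    // zero-padded index, and a "-0" suffix, now for all MDT partition types.
    static String fileIdForFileGroup(String fileIdPrefix, int index) {
        return String.format("%s%04d-%d", fileIdPrefix, index, 0);
    }

    public static void main(String[] args) {
        System.out.println(fileIdForFileGroup("files-", 1));         // files-0001-0
        System.out.println(fileIdForFileGroup("column-stats-", 12)); // column-stats-0012-0
    }
}
```

Before the fix, only the FILES partition carried the `-0` suffix, so file groups created at first initialization could disagree with the fileIds produced later (e.g. by bulk_insert), which is the mismatch HUDI-6343 addresses.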
[jira] [Updated] (HUDI-6344) Support Flink MDT bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6344: - Labels: pull-request-available (was: ) > Support Flink MDT bulk_insert > - > > Key: HUDI-6344 > URL: https://issues.apache.org/jira/browse/HUDI-6344 > Project: Apache Hudi > Issue Type: Improvement > Components: flink-sql >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > >
[GitHub] [hudi] danny0405 opened a new pull request, #8914: [HUDI-6344] Flink MDT bulk_insert for initial commit
danny0405 opened a new pull request, #8914: URL: https://github.com/apache/hudi/pull/8914 ### Change Logs Fix the bulk_insert for Flink MDT initialization after #8684 . ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
hudi-bot commented on PR #8910: URL: https://github.com/apache/hudi/pull/8910#issuecomment-1583949317 ## CI report: * fc4825fd3b646e3b69322b386fa4b2fd4f19ba67 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17687)
[jira] [Created] (HUDI-6344) Support Flink MDT bulk_insert
Danny Chen created HUDI-6344: Summary: Support Flink MDT bulk_insert Key: HUDI-6344 URL: https://issues.apache.org/jira/browse/HUDI-6344 Project: Apache Hudi Issue Type: Improvement Components: flink-sql Reporter: Danny Chen
[jira] [Updated] (HUDI-6344) Support Flink MDT bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6344: - Fix Version/s: 0.14.0 > Support Flink MDT bulk_insert > - > > Key: HUDI-6344 > URL: https://issues.apache.org/jira/browse/HUDI-6344 > Project: Apache Hudi > Issue Type: Improvement > Components: flink-sql >Reporter: Danny Chen >Priority: Major > Fix For: 0.14.0 > >
[GitHub] [hudi] nsivabalan commented on pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
nsivabalan commented on PR #8910: URL: https://github.com/apache/hudi/pull/8910#issuecomment-1583934703 we have an unrelated test failure ``` Test Call run_clustering Procedure Order Strategy *** FAILED *** ``` since this patch is also fixing a flaky test, prefer to go ahead w/ landing.
[jira] [Updated] (HUDI-6343) File id format differs from FILES partition and others when its initialized for first time
[ https://issues.apache.org/jira/browse/HUDI-6343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6343: - Labels: pull-request-available (was: ) > File id format differs from FILES partition and others when its initialized > for first time > -- > > Key: HUDI-6343 > URL: https://issues.apache.org/jira/browse/HUDI-6343 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/hudi/blob/593181397e2f03b1172487e280ad279557bbf423/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L1455] > > When bulk insert gets triggered, the file group Id might differ for other > partitions. We might need to fix it.
[GitHub] [hudi] nsivabalan opened a new pull request, #8913: [HUDI-6343] Fixing fileId format for all mdt partitions
nsivabalan opened a new pull request, #8913: URL: https://github.com/apache/hudi/pull/8913 ### Change Logs Fixing fileId format for all mdt partitions. When bulk_insert gets triggered, the fileId will get suffixed with "-0" in the end. And so, we might need to ensure the initial instantiation also follows the same format. ### Impact Fixing fileId format for all mdt partitions ### Risk level (write none, low medium or high below) low
[jira] [Created] (HUDI-6343) File id format differs from FILES partition and others when its initialized for first time
sivabalan narayanan created HUDI-6343: - Summary: File id format differs from FILES partition and others when its initialized for first time Key: HUDI-6343 URL: https://issues.apache.org/jira/browse/HUDI-6343 Project: Apache Hudi Issue Type: Bug Components: metadata Reporter: sivabalan narayanan [https://github.com/apache/hudi/blob/593181397e2f03b1172487e280ad279557bbf423/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java#L1455] When bulk insert gets triggered, the file group Id might differ for other partitions. We might need to fix it.
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583913902 ## CI report: * e2f44f2a1f574eed79090b337d7bd56e08058b51 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17680) * 6bcf646df9a0223b8787e7bae2255c628aea54b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17693)
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583907653 ## CI report: * e2f44f2a1f574eed79090b337d7bd56e08058b51 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17680) * 6bcf646df9a0223b8787e7bae2255c628aea54b4 UNKNOWN
[GitHub] [hudi] yihua commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223791778 ## hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java: ## @@ -35,6 +36,8 @@ public class JsonUtils { private static final ObjectMapper MAPPER = new ObjectMapper(); static { +registerModules(MAPPER); + Review Comment: This is fixed now.
[GitHub] [hudi] zyclove opened a new issue, #8912: [SUPPORT]hudi 0.12.2 sometimes appear org.apache.hudi.exception.HoodieIOException: IOException when reading log file
zyclove opened a new issue, #8912: URL: https://github.com/apache/hudi/issues/8912 **Describe the problem you faced** Scheduled Hudi spark-sql tasks sometimes fail with org.apache.hudi.exception.HoodieIOException: IOException when reading log file. **To Reproduce** Steps to reproduce the behavior: 1. A spark-sql Hudi task runs once every half hour. 2. It sometimes fails with a missing log file, as shown below. 3. Touching the missing file and rerunning succeeds, but that is not a normal workaround. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.12.2 * Spark version : AWS EMR 3.2.1 * Hive version : 2.3.9 * Hadoop version : 3.2.1 * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : no **Additional context** Add any other context about the problem here. **Stacktrace** 23/06/09 02:39:16 INFO BlockManagerInfo: Added rdd_7_16 in memory on ip-172-16-13-109.us-west-2.compute.internal:42539 (size: 2.2 KiB, free: 8.4 GiB) 23/06/09 02:39:16 WARN TaskSetManager: Lost task 93.3 in stage 0.0 (TID 110) (ip-172-16-12-181.us-west-2.compute.internal executor 1): org.apache.hudi.exception.HoodieIOException: IOException when reading log file at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:374) at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220) at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:209) at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:113) at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.(HoodieMergedLogRecordScanner.java:106) at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:343) at org.apache.hudi.LogFileIterator$.scanLog(LogFileIterator.scala:305)
at org.apache.hudi.LogFileIterator.&lt;init&gt;(LogFileIterator.scala:89) at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.sql.execution.SQLConfInjectingRDD.compute(SQLConfInjectingRDD.scala:58) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414) at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384) at org.apache.spark.rdd.RDD.iterator(RDD.scala:335) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at
[hudi] branch master updated (80e0b557ffe -> 593181397e2)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 80e0b557ffe [HUDI-6310] CreateHoodieTableCommand::createHiveDataSourceTable arguments refactor (#8874) add 593181397e2 [HUDI-5352] Fix `LocalDate` serialization in colstats (#8840) No new revisions were added by this update. Summary of changes: hudi-common/pom.xml | 4 .../java/org/apache/hudi/common/util/JsonUtils.java | 8 .../hudi/functional/TestColumnStatsIndex.scala | 7 ++- packaging/hudi-flink-bundle/pom.xml | 3 ++- packaging/hudi-hadoop-mr-bundle/pom.xml | 13 + packaging/hudi-hive-sync-bundle/pom.xml | 3 +++ packaging/hudi-integ-test-bundle/pom.xml | 5 +++-- packaging/hudi-kafka-connect-bundle/pom.xml | 13 + packaging/hudi-spark-bundle/pom.xml | 3 +++ packaging/hudi-timeline-server-bundle/pom.xml| 9 + packaging/hudi-utilities-bundle/pom.xml | 3 +++ pom.xml | 20 12 files changed, 83 insertions(+), 8 deletions(-)
[GitHub] [hudi] yihua merged pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
yihua merged PR #8840: URL: https://github.com/apache/hudi/pull/8840 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
hudi-bot commented on PR #8900: URL: https://github.com/apache/hudi/pull/8900#issuecomment-1583854119 ## CI report: * fe74a9a7d32286ae29ded9370f6d53ccb14c8809 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17677) * f9e3b8dd406a43d5808ee93105efb9154b05a6cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17691) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] zyclove commented on issue #8903: [SUPPORT] aws spark3.2.1 & hudi 0.13.1 with java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile
zyclove commented on issue #8903: URL: https://github.com/apache/hudi/issues/8903#issuecomment-1583849913 @umehrot2 Hi Hudi experts, can anyone help me? This production issue is urgent; I am looking forward to an early reply.
[GitHub] [hudi] zyclove commented on issue #8904: [SUPPORT] spark-sql hudi table Caused by: org.apache.avro.AvroTypeException: Found string, expecting union
zyclove commented on issue #8904: URL: https://github.com/apache/hudi/issues/8904#issuecomment-1583848717 Hi Hudi experts, can anyone help me? This production issue is urgent; I am looking forward to an early reply.
[GitHub] [hudi] hudi-bot commented on pull request #8911: [Hudi-8882] Compatible with hive 2.2.x to read hudi rt table
hudi-bot commented on PR #8911: URL: https://github.com/apache/hudi/pull/8911#issuecomment-1583844595 ## CI report: * 1ddc84cab970a6a43ea77a729213dc8c5200d845 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17690)
[GitHub] [hudi] hudi-bot commented on pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
hudi-bot commented on PR #8900: URL: https://github.com/apache/hudi/pull/8900#issuecomment-1583844504 ## CI report: * fe74a9a7d32286ae29ded9370f6d53ccb14c8809 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17677) * f9e3b8dd406a43d5808ee93105efb9154b05a6cb UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
hudi-bot commented on PR #8840: URL: https://github.com/apache/hudi/pull/8840#issuecomment-1583844187 ## CI report: * 80b25e613cbcdf8f3e1efe39436cad173163d9d9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17686)
[GitHub] [hudi] boneanxs commented on a diff in pull request #8452: [HUDI-6077] Add more partition push down filters
boneanxs commented on code in PR #8452: URL: https://github.com/apache/hudi/pull/8452#discussion_r1223761168 ## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ## @@ -50,20 +56,25 @@ /** * Implementation of {@link HoodieTableMetadata} based file-system-backed table metadata. */ -public class FileSystemBackedTableMetadata implements HoodieTableMetadata { +public class FileSystemBackedTableMetadata extends AbstractHoodieTableMetadata { private static final int DEFAULT_LISTING_PARALLELISM = 1500; - private final transient HoodieEngineContext engineContext; - private final SerializableConfiguration hadoopConf; - private final String datasetBasePath; private final boolean assumeDatePartitioning; + private final boolean hiveStylePartitioningEnabled; + private final boolean urlEncodePartitioningEnabled; + public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, SerializableConfiguration conf, String datasetBasePath, boolean assumeDatePartitioning) { -this.engineContext = engineContext; -this.hadoopConf = conf; -this.datasetBasePath = datasetBasePath; +super(engineContext, conf, datasetBasePath); + +FileSystem fs = FSUtils.getFs(dataBasePath.get(), conf.get()); +Path metaPath = new Path(dataBasePath.get(), HoodieTableMetaClient.METAFOLDER_NAME); +TableNotFoundException.checkTableValidity(fs, this.dataBasePath.get(), metaPath); +HoodieTableConfig tableConfig = new HoodieTableConfig(fs, metaPath.toString(), null, null); Review Comment: Move the creation of `HoodieTableConfig` so it happens only in `FileSystemBackedTableMetadata`; otherwise `HoodieBackedTableMetadata` would create it twice.
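The concern in the review above can be illustrated with a simplified, self-contained sketch (these are illustrative stand-in classes, not the real Hudi types): the expensive table-config load stays out of the shared base-class constructor and lives only in the one subclass that needs it, so other subclasses never pay for a duplicate load.

```java
// Stand-in for the real HoodieTableConfig; `loads` counts how often the
// table config is read from storage, which the review wants to keep at one.
class TableConfig {
    static int loads = 0;
    TableConfig(String metaPath) { loads++; }
}

// Stand-in for AbstractHoodieTableMetadata: holds only cheap shared state.
abstract class AbstractTableMetadata {
    protected final String basePath;
    AbstractTableMetadata(String basePath) { this.basePath = basePath; }
}

// Stand-in for FileSystemBackedTableMetadata: the only subclass that
// actually needs the table config, so it loads it in its own constructor.
class FileSystemBackedTableMetadata extends AbstractTableMetadata {
    private final TableConfig tableConfig;
    FileSystemBackedTableMetadata(String basePath) {
        super(basePath);
        this.tableConfig = new TableConfig(basePath + "/.hoodie");
    }
}

// Stand-in for HoodieBackedTableMetadata: no duplicate TableConfig load here.
class BackedTableMetadata extends AbstractTableMetadata {
    BackedTableMetadata(String basePath) {
        super(basePath);
    }
}

public class Sketch {
    public static void main(String[] args) {
        new FileSystemBackedTableMetadata("/tmp/tbl");
        new BackedTableMetadata("/tmp/tbl");
        System.out.println(TableConfig.loads); // prints 1
    }
}
```

If the load instead sat in `AbstractTableMetadata`'s constructor, constructing both subclasses would read the config twice, which is exactly the duplication the comment asks to avoid.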
[GitHub] [hudi] hudi-bot commented on pull request #8911: [Hudi-8882] Compatible with hive 2.2.x to read hudi rt table
hudi-bot commented on PR #8911: URL: https://github.com/apache/hudi/pull/8911#issuecomment-1583838150 ## CI report: * 1ddc84cab970a6a43ea77a729213dc8c5200d845 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8905: [HUDI-6337] Incremental Clean ignore partitions affected by append write commits/delta commits
hudi-bot commented on PR #8905: URL: https://github.com/apache/hudi/pull/8905#issuecomment-1583838091 ## CI report: * f8f14263190df7b66143e192188e68463e0c1efd UNKNOWN * f9adcecf4e54774510569f14af4c81a1f4951a28 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17681) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17689)
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1583807262 I briefly looked at the Hive 2.0 and 2.1 branch code, and it should be the same as 2.2.
[GitHub] [hudi] stream2000 commented on pull request #8905: [HUDI-6337] Incremental Clean ignore partitions affected by append write commits/delta commits
stream2000 commented on PR #8905: URL: https://github.com/apache/hudi/pull/8905#issuecomment-1583798165 @hudi-bot run azure
[GitHub] [hudi] thomasg19930417 commented on pull request #8911: [Hudi-8882] Compatible with hive 2.2.x to read hudi rt table
thomasg19930417 commented on PR #8911: URL: https://github.com/apache/hudi/pull/8911#issuecomment-1583794413 @danny0405 Please help review the code.
[GitHub] [hudi] hudi-bot commented on pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
hudi-bot commented on PR #8910: URL: https://github.com/apache/hudi/pull/8910#issuecomment-1583793617 ## CI report: * fc4825fd3b646e3b69322b386fa4b2fd4f19ba67 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17687)
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1583792529 @danny0405 I submitted a PR (#8911) to be compatible with Hive 2.2: it copies part of the Hive 2.3 code into Hudi and converts the Hive 2.2 data structures to their 2.3 form for processing. I am not sure whether this approach is reasonable, and there may be some problems in the code; please help review it.
[GitHub] [hudi] danny0405 merged pull request #8874: [HUDI-6310] CreateHoodieTableCommand::createHiveDataSourceTable arguments refactor
danny0405 merged PR #8874: URL: https://github.com/apache/hudi/pull/8874
[hudi] branch master updated (7ae8da02d12 -> 80e0b557ffe)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 7ae8da02d12 [HUDI-6200] Enhancements to the MDT for improving performance of larger indexes. (#8684) add 80e0b557ffe [HUDI-6310] CreateHoodieTableCommand::createHiveDataSourceTable arguments refactor (#8874) No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/hudi/command/CreateHoodieTableCommand.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
[GitHub] [hudi] danny0405 commented on issue #8906: [SUPPORT] hudi upsert error: java.lang.NumberFormatException: For input string: "d880d4ea"
danny0405 commented on issue #8906: URL: https://github.com/apache/hudi/issues/8906#issuecomment-1583786336 You are right, bucket index for bulk_insert is supported only in the latest master.
[GitHub] [hudi] thomasg19930417 opened a new pull request, #8911: Compatible with hive 2.2.x to read hudi rt table
thomasg19930417 opened a new pull request, #8911: URL: https://github.com/apache/hudi/pull/8911 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
hudi-bot commented on PR #8910: URL: https://github.com/apache/hudi/pull/8910#issuecomment-1583784756 ## CI report: * fc4825fd3b646e3b69322b386fa4b2fd4f19ba67 UNKNOWN
[GitHub] [hudi] danny0405 commented on a diff in pull request #8684: [HUDI-6200] Enhancements to the MDT for improving performance of larger indexes.
danny0405 commented on code in PR #8684: URL: https://github.com/apache/hudi/pull/8684#discussion_r1223719557 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1052,51 +1091,81 @@ protected HoodieData prepRecords(Map + * Don't perform optimization if there are inflight operations on the dataset. This is for two reasons: + * - The compaction will contain the correct data as all failed operations have been rolled back. + * - Clean/compaction etc. will have the highest timestamp on the MDT and we won't be adding new operations + * with smaller timestamps to metadata table (makes for easier debugging) + * + * This adds the limitations that long-running async operations (clustering, etc.) may cause delay in such MDT + * optimizations. We will relax this after MDT code has been hardened. */ - protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String instantTime) { -// finish off any pending compactions if any from previous attempt. -writeClient.runAnyPendingCompactions(); - -String latestDeltaCommitTimeInMetadataTable = metadataMetaClient.reloadActiveTimeline() -.getDeltaCommitTimeline() -.filterCompletedInstants() -.lastInstant().orElseThrow(() -> new HoodieMetadataException("No completed deltacommit in metadata table")) -.getTimestamp(); -// we need to find if there are any inflights in data table timeline before or equal to the latest delta commit in metadata table. -// Whenever you want to change this logic, please ensure all below scenarios are considered. -// a. There could be a chance that latest delta commit in MDT is committed in MDT, but failed in DT. And so findInstantsBeforeOrEquals() should be employed -// b. There could be DT inflights after latest delta commit in MDT and we are ok with it. bcoz, the contract is, latest compaction instant time in MDT represents -// any instants before that is already synced with metadata table. -// c. Do consider out of order commits. 
For eg, c4 from DT could complete before c3. and we can't trigger compaction in MDT with c4 as base instant time, until every -// instant before c4 is synced with metadata table. -List pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested() - .findInstantsBeforeOrEquals(latestDeltaCommitTimeInMetadataTable).getInstants(); + @Override + public void performTableServices(Option inFlightInstantTimestamp) { +HoodieTimer metadataTableServicesTimer = HoodieTimer.start(); +boolean allTableServicesExecutedSuccessfullyOrSkipped = true; +try { + BaseHoodieWriteClient writeClient = getWriteClient(); + // Run any pending table services operations. + runPendingTableServicesOperations(writeClient); + + // Check and run clean operations. + String latestDeltacommitTime = metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline() + .filterCompletedInstants() + .lastInstant().get() + .getTimestamp(); + LOG.info("Latest deltacommit time found is " + latestDeltacommitTime + ", running clean operations."); + cleanIfNecessary(writeClient, latestDeltacommitTime); + + // Do timeline validation before scheduling compaction/logcompaction operations. + if (!validateTimelineBeforeSchedulingCompaction(inFlightInstantTimestamp, latestDeltacommitTime)) { +return; Review Comment: > unless compaction in MDT kicks in, archival might not have anything to do after last time it was able to archive something. Then archiving will always be blocked by the compaction.
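The guard being discussed above — abort scheduling a new compaction or log compaction on the metadata table while any compaction-family instant is still pending — can be sketched with a minimal, self-contained model. This is an assumption-laden stand-in, not the real Hudi timeline API (action names and states are simplified):

```java
import java.util.Arrays;
import java.util.List;

public class CompactionGuard {
    // Simplified instant states: pending = REQUESTED or INFLIGHT.
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Stand-in for a timeline instant (the real type is HoodieInstant).
    static class Instant {
        final String action;
        final State state;
        Instant(String action, State state) { this.action = action; this.state = state; }
    }

    // Return true only when no compaction/log-compaction instant is pending,
    // mirroring the "abort scheduling if pending operations are found" rule.
    static boolean canScheduleCompaction(List<Instant> timeline) {
        return timeline.stream().noneMatch(i ->
                i.state != State.COMPLETED
                        && (i.action.equals("compaction") || i.action.equals("logcompaction")));
    }

    public static void main(String[] args) {
        List<Instant> timeline = Arrays.asList(
                new Instant("deltacommit", State.COMPLETED),
                new Instant("compaction", State.INFLIGHT));
        System.out.println(canScheduleCompaction(timeline)); // prints false
    }
}
```

The trade-off raised in the quoted reply applies directly: while this predicate returns false, no new compaction can be scheduled, so services keyed off the latest completed compaction (such as archival) make no progress until the pending operation finishes.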
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
nsivabalan commented on code in PR #8910: URL: https://github.com/apache/hudi/pull/8910#discussion_r1223690819 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamerWithMultiWriter.java: ## @@ -404,12 +405,24 @@ private void runJobsInParallel(String tableBasePath, HoodieTableType tableType, * Need to perform getMessage().contains since the exception coming * from {@link org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.DeltaSyncService} gets wrapped many times into RuntimeExceptions. */ - if (expectConflict && e.getCause().getMessage().contains(ConcurrentModificationException.class.getName())) { + if (expectConflict && backfillFailed.get() && e.getCause().getMessage().contains(ConcurrentModificationException.class.getName())) { // expected ConcurrentModificationException since ingestion & backfill will have overlapping writes -if (backfillFailed.get()) { +if (!continuousFailed.get()) { Review Comment: NTR: in most cases the backfill job fails and hence the test succeeds, but if the continuous job fails, the test times out.
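The race described in the review comment — the backfill writer usually loses the conflict and the test passes, but occasionally the continuous writer fails first and the test used to time out — can be sketched with two flags. The flag names follow `backfillFailed`/`continuousFailed` from the diff; the classification logic below is a hypothetical simplification, not the actual test code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ConflictOutcome {
    // Classify the multi-writer test outcome from which job observed the failure.
    static String classify(boolean expectConflict,
                           AtomicBoolean backfillFailed,
                           AtomicBoolean continuousFailed) {
        if (expectConflict && backfillFailed.get() && !continuousFailed.get()) {
            // normal case: backfill hit the expected ConcurrentModificationException
            return "expected-conflict";
        }
        if (continuousFailed.get()) {
            // the case that previously made the test hang until timeout
            return "continuous-failed";
        }
        return "unexpected";
    }

    public static void main(String[] args) {
        System.out.println(classify(true, new AtomicBoolean(true), new AtomicBoolean(false)));
    }
}
```

The point of the fix is that both outcomes are now recognized explicitly instead of only the first, so a continuous-job failure surfaces as an assertion rather than a timeout.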
[jira] [Updated] (HUDI-6342) Fix flaky MultiTableDeltaStreamer test
[ https://issues.apache.org/jira/browse/HUDI-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6342: - Labels: pull-request-available (was: ) > Fix flaky MultiTableDeltaStreamer test > -- > > Key: HUDI-6342 > URL: https://issues.apache.org/jira/browse/HUDI-6342 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > TestHoodieDeltaStreamerWithMultiWriter. > testUpsertsContinuousModeWithMultipleWritersForConflicts > is flaky in recent times. > > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/17675/logs/21] > > {code:java} > 2023-06-08T14:02:50.4346417Z 798455 [pool-1655-thread-1] ERROR > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > [] - Continuous job failed java.lang.RuntimeException: Ingestion service was > shut down with exception. > 2023-06-08T14:02:50.4351308Z 798455 [Listener at localhost/45789] ERROR > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > [] - Conflict happened, but not expected > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Ingestion service was shut down with exception. > 2023-06-08T14:02:50.7579883Z [ERROR] Tests run: 5, Failures: 0, Errors: 1, > Skipped: 1, Time elapsed: 201.181 s <<< FAILURE! - in > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > 2023-06-08T14:02:50.7615120Z [ERROR] > testUpsertsContinuousModeWithMultipleWritersForConflicts{HoodieTableType}[2] > Time elapsed: 56.062 s <<< ERROR! > 2023-06-08T14:02:50.7615570Z java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Ingestion service was shut down with exception. 
> 2023-06-08T14:02:50.7616039Z at > java.util.concurrent.FutureTask.report(FutureTask.java:122) > 2023-06-08T14:02:50.7616662Z at > java.util.concurrent.FutureTask.get(FutureTask.java:192) > 2023-06-08T14:02:50.7617179Z at > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:398) > 2023-06-08T14:02:50.7617674Z at > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWritersForConflicts(TestHoodieDeltaStreamerWithMultiWriter.java:140) > 2023-06-08T14:02:50.7618059Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2023-06-08T14:02:50.7618319Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2023-06-08T14:02:50.7618615Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2023-06-08T14:02:50.7618896Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2023-06-08T14:02:50.7619173Z at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) > 2023-06-08T14:02:50.7619480Z at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2023-06-08T14:02:50.7619845Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2023-06-08T14:02:50.7620217Z at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2023-06-08T14:02:50.7620540Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2023-06-08T14:02:50.7620903Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92) > 2023-06-08T14:02:50.7621288Z at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 
2023-06-08T14:02:50.7621849Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2023-06-08T14:02:50.767Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2023-06-08T14:02:50.7622626Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2023-06-08T14:02:50.7623010Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2023-06-08T14:02:50.7623375Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2023-06-08T14:02:50.7623723Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) >
[GitHub] [hudi] nsivabalan opened a new pull request, #8910: [HUDI-6342] Fixing flaky Continuous mode multi writer tests
nsivabalan opened a new pull request, #8910: URL: https://github.com/apache/hudi/pull/8910 ### Change Logs Fixing flaky Continuous mode multi writer tests. Exception thrown when continuous mode job fails is different than exception thrown while backfill job fails. So, had to fix the tests accounting for that. ### Impact Fixing flaky Continuous mode multi writer tests. Exception thrown when continuous mode job fails is different than exception thrown while backfill job fails. So, had to fix the tests accounting for that. ### Risk level (write none, low medium or high below) Stabilizes CI ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-6342) Fix flaky MultiTableDeltaStreamer test
[ https://issues.apache.org/jira/browse/HUDI-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-6342: -- Epic Link: HUDI-4302 > Fix flaky MultiTableDeltaStreamer test > -- > > Key: HUDI-6342 > URL: https://issues.apache.org/jira/browse/HUDI-6342 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: sivabalan narayanan >Priority: Major > > TestHoodieDeltaStreamerWithMultiWriter. > testUpsertsContinuousModeWithMultipleWritersForConflicts > is flaky in recent times. > > [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/17675/logs/21] > > {code:java} > 2023-06-08T14:02:50.4346417Z 798455 [pool-1655-thread-1] ERROR > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > [] - Continuous job failed java.lang.RuntimeException: Ingestion service was > shut down with exception. > 2023-06-08T14:02:50.4351308Z 798455 [Listener at localhost/45789] ERROR > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > [] - Conflict happened, but not expected > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Ingestion service was shut down with exception. > 2023-06-08T14:02:50.7579883Z [ERROR] Tests run: 5, Failures: 0, Errors: 1, > Skipped: 1, Time elapsed: 201.181 s <<< FAILURE! - in > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter > 2023-06-08T14:02:50.7615120Z [ERROR] > testUpsertsContinuousModeWithMultipleWritersForConflicts{HoodieTableType}[2] > Time elapsed: 56.062 s <<< ERROR! > 2023-06-08T14:02:50.7615570Z java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Ingestion service was shut down with exception. 
> 2023-06-08T14:02:50.7616039Z at > java.util.concurrent.FutureTask.report(FutureTask.java:122) > 2023-06-08T14:02:50.7616662Z at > java.util.concurrent.FutureTask.get(FutureTask.java:192) > 2023-06-08T14:02:50.7617179Z at > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:398) > 2023-06-08T14:02:50.7617674Z at > org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWritersForConflicts(TestHoodieDeltaStreamerWithMultiWriter.java:140) > 2023-06-08T14:02:50.7618059Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2023-06-08T14:02:50.7618319Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2023-06-08T14:02:50.7618615Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2023-06-08T14:02:50.7618896Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2023-06-08T14:02:50.7619173Z at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) > 2023-06-08T14:02:50.7619480Z at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2023-06-08T14:02:50.7619845Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2023-06-08T14:02:50.7620217Z at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2023-06-08T14:02:50.7620540Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2023-06-08T14:02:50.7620903Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92) > 2023-06-08T14:02:50.7621288Z at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 
2023-06-08T14:02:50.7621849Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2023-06-08T14:02:50.767Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2023-06-08T14:02:50.7622626Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2023-06-08T14:02:50.7623010Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2023-06-08T14:02:50.7623375Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2023-06-08T14:02:50.7623723Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2023-06-08T14:02:50.7624054Z at >
[jira] [Created] (HUDI-6342) Fix flaky MultiTableDeltaStreamer test
sivabalan narayanan created HUDI-6342: - Summary: Fix flaky MultiTableDeltaStreamer test Key: HUDI-6342 URL: https://issues.apache.org/jira/browse/HUDI-6342 Project: Apache Hudi Issue Type: Bug Components: tests-ci Reporter: sivabalan narayanan TestHoodieDeltaStreamerWithMultiWriter. testUpsertsContinuousModeWithMultipleWritersForConflicts is flaky in recent times. [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/17675/logs/21] {code:java} 2023-06-08T14:02:50.4346417Z 798455 [pool-1655-thread-1] ERROR org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter [] - Continuous job failed java.lang.RuntimeException: Ingestion service was shut down with exception. 2023-06-08T14:02:50.4351308Z 798455 [Listener at localhost/45789] ERROR org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter [] - Conflict happened, but not expected java.util.concurrent.ExecutionException: java.lang.RuntimeException: Ingestion service was shut down with exception. 2023-06-08T14:02:50.7579883Z [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 201.181 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter 2023-06-08T14:02:50.7615120Z [ERROR] testUpsertsContinuousModeWithMultipleWritersForConflicts{HoodieTableType}[2] Time elapsed: 56.062 s <<< ERROR! 2023-06-08T14:02:50.7615570Z java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Ingestion service was shut down with exception. 
2023-06-08T14:02:50.7616039Z at java.util.concurrent.FutureTask.report(FutureTask.java:122) 2023-06-08T14:02:50.7616662Z at java.util.concurrent.FutureTask.get(FutureTask.java:192) 2023-06-08T14:02:50.7617179Z at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:398) 2023-06-08T14:02:50.7617674Z at org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWritersForConflicts(TestHoodieDeltaStreamerWithMultiWriter.java:140) 2023-06-08T14:02:50.7618059Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2023-06-08T14:02:50.7618319Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 2023-06-08T14:02:50.7618615Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2023-06-08T14:02:50.7618896Z at java.lang.reflect.Method.invoke(Method.java:498) 2023-06-08T14:02:50.7619173Z at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) 2023-06-08T14:02:50.7619480Z at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) 2023-06-08T14:02:50.7619845Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) 2023-06-08T14:02:50.7620217Z at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) 2023-06-08T14:02:50.7620540Z at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) 2023-06-08T14:02:50.7620903Z at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92) 2023-06-08T14:02:50.7621288Z at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) 2023-06-08T14:02:50.7621849Z at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) 2023-06-08T14:02:50.767Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) 2023-06-08T14:02:50.7622626Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) 2023-06-08T14:02:50.7623010Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) 2023-06-08T14:02:50.7623375Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) 2023-06-08T14:02:50.7623723Z at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) 2023-06-08T14:02:50.7624054Z at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) 2023-06-08T14:02:50.7624409Z at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:210) 2023-06-08T14:02:50.7624794Z at
[jira] [Updated] (HUDI-6315) Optimize UPSERT codepath to use meta fields instead of key generation and index lookup
[ https://issues.apache.org/jira/browse/HUDI-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6315: - Labels: pull-request-available (was: ) > Optimize UPSERT codepath to use meta fields instead of key generation and > index lookup > -- > > Key: HUDI-6315 > URL: https://issues.apache.org/jira/browse/HUDI-6315 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Amrish Lal >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8879: [HUDI-6315] [WIP] Optimize UPSERT codepath to use meta fields instead of key generation and index lookup
nsivabalan commented on code in PR #8879: URL: https://github.com/apache/hudi/pull/8879#discussion_r175056 ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java: ## @@ -233,15 +238,25 @@ public static HoodieWriteResult doDeletePartitionsOperation(SparkRDDWriteClient } public static HoodieRecord createHoodieRecord(GenericRecord gr, Comparable orderingVal, HoodieKey hKey, - String payloadClass) throws IOException { + String payloadClass, HoodieRecordLocation recordLocation) throws IOException { HoodieRecordPayload payload = DataSourceUtils.createPayload(payloadClass, gr, orderingVal); -return new HoodieAvroRecord<>(hKey, payload); + +HoodieAvroRecord record = new HoodieAvroRecord<>(hKey, payload); +if (recordLocation != null) { + record.setCurrentLocation(recordLocation); +} +return record; } + // AKL_TODO: check if this change is needed. Also validate change if needed. public static HoodieRecord createHoodieRecord(GenericRecord gr, HoodieKey hKey, -String payloadClass) throws IOException { +String payloadClass, HoodieRecordLocation recordLocation) throws IOException { Review Comment: same here ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala: ## @@ -144,20 +144,25 @@ class DefaultSource extends RelationProvider mode: SaveMode, optParams: Map[String, String], df: DataFrame): BaseRelation = { -val dfWithoutMetaCols = df.drop(HoodieRecord.HOODIE_META_COLUMNS.asScala:_*) +val dfPrepped = if (optParams.getOrDefault(DATASOURCE_WRITE_PREPPED_KEY, "false") Review Comment: dfPrepped -> processedDf ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -1160,21 +1171,29 @@ object HoodieSparkSqlWriter { // handle dropping partition columns it.map { avroRec => -val processedRecord = if (shouldDropPartitionColumns) { - HoodieAvroUtils.rewriteRecord(avroRec, dataFileSchema) +val (hoodieKey: HoodieKey, recordLocation: 
Option[HoodieRecordLocation]) = getKeyAndLocatorFromAvroRecord(keyGenerator, avroRec, + isPrepped) + +val avroRecWithoutMeta: GenericRecord = if (isPrepped) { + HoodieAvroUtils.rewriteRecord(avroRec, HoodieAvroUtils.removeMetadataFields(dataFileSchema)) } else { avroRec } -val hoodieKey = new HoodieKey(keyGenerator.getRecordKey(avroRec), keyGenerator.getPartitionPath(avroRec)) +val processedRecord = if (shouldDropPartitionColumns) { + HoodieAvroUtils.rewriteRecord(avroRecWithoutMeta, dataFileSchema) +} else { + avroRecWithoutMeta +} + val hoodieRecord = if (shouldCombine) { val orderingVal = HoodieAvroUtils.getNestedFieldVal(avroRec, config.getString(PRECOMBINE_FIELD), false, consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]] DataSourceUtils.createHoodieRecord(processedRecord, orderingVal, hoodieKey, -config.getString(PAYLOAD_CLASS_NAME)) +config.getString(PAYLOAD_CLASS_NAME), recordLocation.getOrElse(null)) } else { - DataSourceUtils.createHoodieRecord(processedRecord, hoodieKey, -config.getString(PAYLOAD_CLASS_NAME)) + // AKL_TODO: check if this change is needed. 
Review Comment: fix the comments ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -1195,18 +1214,108 @@ object HoodieSparkSqlWriter { } val sparkKeyGenerator = HoodieSparkKeyGeneratorFactory.createKeyGenerator(keyGenProps).asInstanceOf[SparkKeyGeneratorInterface] val targetStructType = if (shouldDropPartitionColumns) dataFileStructType else writerStructType + val finalStructType = if (isPrepped) { +val fieldsToExclude = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION.toArray() +StructType(targetStructType.fields.filterNot(field => fieldsToExclude.contains(field.name))) + } else { +targetStructType + } // NOTE: To make sure we properly transform records - val targetStructTypeRowWriter = getCachedUnsafeRowWriter(sourceStructType, targetStructType) + val finalStructTypeRowWriter = getCachedUnsafeRowWriter(sourceStructType, finalStructType) it.map { sourceRow => -val recordKey = sparkKeyGenerator.getRecordKey(sourceRow, sourceStructType) -val partitionPath =
[GitHub] [hudi] yihua commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583654629 Hi @zhangyue19921010 @xiarixiaoyao @nsivabalan @xushiyan @danny0405, could you also review this PR?
[GitHub] [hudi] yihua commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223651454 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestIndexSyntax.scala: ## @@ -56,30 +58,37 @@ class TestIndexSyntax extends HoodieSparkSqlTestBase { var logicalPlan = sqlParser.parsePlan(s"show indexes from default.$tableName") var resolvedLogicalPlan = analyzer.execute(logicalPlan) - assertResult(s"`default`.`$tableName`")(resolvedLogicalPlan.asInstanceOf[ShowIndexesCommand].table.identifier.quotedString) Review Comment: FR: `table.identifier.quotedString` now also has catalog name as the prefix. ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala: ## @@ -733,8 +734,8 @@ object HoodieBaseRelation extends SparkAdapterSupport { partitionedFile => { val hadoopConf = hadoopConfBroadcast.value.get() - val reader = new HoodieAvroHFileReader(hadoopConf, new Path(partitionedFile.filePath), -new CacheConfig(hadoopConf)) + val filePath = sparkAdapter.getSparkPartitionedFileUtils.getPathFromPartitionedFile(partitionedFile) Review Comment: For Reviewer (FR): all the changes in the common module of introducing new adapter support are because of Spark 3.4 class and API changes. ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetFileFormat.scala: ## @@ -34,6 +34,15 @@ class HoodieParquetFileFormat extends ParquetFileFormat with SparkAdapterSupport override def toString: String = "Hoodie-Parquet" + override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = { Review Comment: FR: Spark 3.4 now supports vectorized reader on nested fields. However, Hudi does not support this yet due to custom schema evolution logic. So we add logic to override `supportBatch` in `HoodieParquetFileFormat` for Spark 3.4.
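The `supportBatch` override described in the review comment above gates vectorized (batch) reading on the schema containing only atomic top-level types, since Hudi's custom schema-evolution logic cannot yet handle vectorized nested columns. A minimal Python sketch of that predicate (not the actual Scala code; the type names and schema representation are illustrative stand-ins for Spark's `DataType` classes):

```python
# Sketch of the supportBatch gate: batch (vectorized) reading is only
# enabled when the vectorized reader is on AND every top-level column
# has an atomic (non-nested) type. Type names are illustrative.
ATOMIC_TYPES = {"int", "long", "float", "double", "string", "boolean", "date", "timestamp"}

def support_batch(vectorized_reader_enabled, schema):
    """schema: list of (field_name, type_name) pairs."""
    return vectorized_reader_enabled and all(t in ATOMIC_TYPES for _, t in schema)

flat = [("id", "long"), ("name", "string")]
nested = [("id", "long"), ("address", "struct")]  # struct is not atomic

print(support_batch(True, flat))    # True: batch reading allowed
print(support_batch(True, nested))  # False: fall back to the row-based reader
```

This mirrors the `schema.forall(_.dataType.isInstanceOf[AtomicType])` check visible in the diff later in this thread.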
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583605229 ## CI report: * e2f44f2a1f574eed79090b337d7bd56e08058b51 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17680) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
hudi-bot commented on PR #8840: URL: https://github.com/apache/hudi/pull/8840#issuecomment-1583605102 ## CI report: * 29e4627d6ec492fb19b64777fbc4ae8e2091d6e0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17685) * 80b25e613cbcdf8f3e1efe39436cad173163d9d9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17686)
[GitHub] [hudi] mpouttu commented on issue #498: Is there any record delete examples?
mpouttu commented on issue #498: URL: https://github.com/apache/hudi/issues/498#issuecomment-1583601589 _hoodie_is_deleted allows us to delete records and replace them with new records in the same transaction, which is essential for some of our use cases. EmptyHoodieRecordPayload forces us to do the deletes in a commit separate from the inserts, which would cause balances not to tie out for GL accounts, for example.
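The single-commit behavior mpouttu describes can be modeled as one batch apply: records carrying the `_hoodie_is_deleted` flag drop the matching key while the rest upsert, all in the same commit. A toy Python sketch of that convention (not Hudi code; the record shapes are illustrative):

```python
# Toy model of one Hudi-style commit where deletes (via a
# "_hoodie_is_deleted" flag) and inserts are applied together,
# instead of a delete commit followed by an insert commit.
def apply_commit(table, batch):
    """table: dict of key -> record; batch: list of records with a 'key' field."""
    out = dict(table)
    for rec in batch:
        if rec.get("_hoodie_is_deleted"):
            out.pop(rec["key"], None)   # delete within the same batch
        else:
            out[rec["key"]] = rec       # insert or update
    return out

table = {"a1": {"key": "a1", "balance": 100}}
batch = [
    {"key": "a1", "_hoodie_is_deleted": True},  # delete the old record
    {"key": "a2", "balance": 100},              # replacement insert
]
table = apply_commit(table, batch)
print(sorted(table))  # ['a2']: delete and replacement visible atomically
```

With two separate commits, a reader between them would see neither record and the balances would not tie out, which is exactly the concern raised in the comment.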
[GitHub] [hudi] hudi-bot commented on pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
hudi-bot commented on PR #8840: URL: https://github.com/apache/hudi/pull/8840#issuecomment-1583596337 ## CI report: * 3461a1e2fbcc7b51e06f4bf803b6753466396c95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17471) * 29e4627d6ec492fb19b64777fbc4ae8e2091d6e0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17685) * 80b25e613cbcdf8f3e1efe39436cad173163d9d9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
hudi-bot commented on PR #8840: URL: https://github.com/apache/hudi/pull/8840#issuecomment-1583582258 ## CI report: * 3461a1e2fbcc7b51e06f4bf803b6753466396c95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17471) * 29e4627d6ec492fb19b64777fbc4ae8e2091d6e0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17685)
[GitHub] [hudi] hudi-bot commented on pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
hudi-bot commented on PR #8840: URL: https://github.com/apache/hudi/pull/8840#issuecomment-1583486543 ## CI report: * 3461a1e2fbcc7b51e06f4bf803b6753466396c95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17471) * 29e4627d6ec492fb19b64777fbc4ae8e2091d6e0 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583469483 ## CI report: * e2f44f2a1f574eed79090b337d7bd56e08058b51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17680)
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583463698 ## CI report: * e2f44f2a1f574eed79090b337d7bd56e08058b51 UNKNOWN
[GitHub] [hudi] yihua commented on a diff in pull request #8840: [HUDI-5352] Fix `LocalDate` serialization in colstats
yihua commented on code in PR #8840: URL: https://github.com/apache/hudi/pull/8840#discussion_r1223597615 ## hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java: ## @@ -20,41 +20,74 @@ package org.apache.hudi.common.util; import org.apache.hudi.exception.HoodieIOException; +import org.apache.hudi.util.Lazy; import com.fasterxml.jackson.annotation.JsonAutoDetect; import com.fasterxml.jackson.annotation.PropertyAccessor; import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.DeserializationFeature; import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.SerializationFeature; +import com.fasterxml.jackson.databind.util.StdDateFormat; +import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule; /** * Utils for JSON serialization and deserialization. */ public class JsonUtils { - private static final ObjectMapper MAPPER = new ObjectMapper(); - - static { -MAPPER.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES); -// We need to exclude custom getters, setters and creators which can use member fields -// to derive new fields, so that they are not included in the serialization -MAPPER.setVisibility(PropertyAccessor.FIELD, JsonAutoDetect.Visibility.ANY); -MAPPER.setVisibility(PropertyAccessor.GETTER, JsonAutoDetect.Visibility.NONE); -MAPPER.setVisibility(PropertyAccessor.IS_GETTER, JsonAutoDetect.Visibility.NONE); -MAPPER.setVisibility(PropertyAccessor.SETTER, JsonAutoDetect.Visibility.NONE); -MAPPER.setVisibility(PropertyAccessor.CREATOR, JsonAutoDetect.Visibility.NONE); - } + private static final Lazy MAPPER = Lazy.lazily(JsonUtils::instantiateObjectMapper); public static ObjectMapper getObjectMapper() { -return MAPPER; +return MAPPER.get(); } public static String toString(Object value) { try { - return MAPPER.writeValueAsString(value); + return MAPPER.get().writeValueAsString(value); } catch (JsonProcessingException e) { throw new HoodieIOException( "Fail to convert 
the class: " + value.getClass().getName() + " to Json String", e); } } + + private static ObjectMapper instantiateObjectMapper() { +ObjectMapper mapper = new ObjectMapper(); + +registerModules(mapper); + +// We're writing out dates as their string representations instead of (int) timestamps +mapper.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS); +// NOTE: This is necessary to make sure that w/ Jackson >= 2.11 colon is not infixed +// into the timezone value ("+00:00" as opposed to "+" before 2.11) +// While Jackson is able to parse both of these formats, we keep it as false +// to make sure metadata produced by Hudi stays consistent across Jackson versions +configureColonInTimezone(mapper); Review Comment: I think we serialize the column stats to the metadata record payload, correct?
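The diff above fixes `LocalDate` serialization by registering Jackson's `JavaTimeModule` and disabling `WRITE_DATES_AS_TIMESTAMPS`, so dates are written as ISO-8601 strings rather than numeric timestamps whose encoding can differ across Jackson versions. The idea can be illustrated in a few lines of Python (this is only an analogy to the Java/Jackson change, not the actual code):

```python
import datetime
import json

# Serialize dates as ISO-8601 strings, analogous to disabling
# Jackson's WRITE_DATES_AS_TIMESTAMPS: the string form is stable and
# self-describing, unlike a numeric timestamp encoding.
def encode(obj):
    if isinstance(obj, datetime.date):
        return obj.isoformat()
    raise TypeError(f"not JSON serializable: {type(obj)}")

stats = {"col": "created_at", "min": datetime.date(2023, 6, 8)}
print(json.dumps(stats, default=encode))
# {"col": "created_at", "min": "2023-06-08"}
```

Since column stats end up in the metadata record payload (per the review comment), keeping this representation stable across serializer versions matters for readers of old metadata.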
[GitHub] [hudi] jonvex opened a new pull request, #8909: [HUDI-6311] Insert Into updated behavior
jonvex opened a new pull request, #8909: URL: https://github.com/apache/hudi/pull/8909 ### Change Logs Insert into updated for the new behavior. Insert overwrite updated for the current behavior (https://issues.apache.org/jira/browse/HUDI-6021). Create table updated for pkless. ### Impact Website change ### Risk level (write none, low medium or high below) none ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] yihua commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223580325 ## hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/HoodieSpark32CatalystPlanUtils.scala: ## @@ -38,6 +36,14 @@ object HoodieSpark32CatalystPlanUtils extends HoodieSpark3CatalystPlanUtils { case _ => None } + override def unapplyMergeIntoTable(plan: LogicalPlan): Option[(LogicalPlan, LogicalPlan, Expression)] = { +plan match { + case MergeIntoTable(targetTable, sourceTable, mergeCondition, _, _) => +Some((targetTable, sourceTable, mergeCondition)) Review Comment: The inner pair of parentheses is for Scala tuple, which is required.
[GitHub] [hudi] yihua commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
yihua commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223579632 ## hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java: ## @@ -35,6 +36,8 @@ public class JsonUtils { private static final ObjectMapper MAPPER = new ObjectMapper(); static { +registerModules(MAPPER); + Review Comment: #8840 contains some minor improvements which I'm currently not inclined to include. I'll revise #8840 to contain necessary fixes and then land it before this PR.
[GitHub] [hudi] pthalasta commented on issue #8901: [SUPPORT] Spark job never terminates
pthalasta commented on issue #8901: URL: https://github.com/apache/hudi/issues/8901#issuecomment-1583381245 I was able to add the env variables mentioned in the warning message; however, the job still never terminates, and these are the last few lines of the logs that I see ``` 23/06/08 13:54:10 INFO ClusteringUtils: Found 0 files in pending clustering operations 23/06/08 13:54:10 INFO AbstractTableFileSystemView: Building file system view for partition (files) 23/06/08 13:54:11 INFO AbstractTableFileSystemView: addFilesToView: NumFiles=1, NumFileGroups=1, FileGroupsCreationTime=0, StoreTimeTaken=0 ``` Can someone help me with this?
[GitHub] [hudi] CTTY commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
CTTY commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223566158

## hudi-spark-datasource/hudi-spark3.4.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark34HoodieParquetFileFormat.scala: @@ -0,0 +1,532 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.datasources.parquet

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.FileSplit
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hudi.HoodieSparkUtils
import org.apache.hudi.client.utils.SparkInternalSchemaConverter
import org.apache.hudi.common.fs.FSUtils
import org.apache.hudi.common.util.InternalSchemaCache
import org.apache.hudi.common.util.StringUtils.isNullOrEmpty
import org.apache.hudi.common.util.collection.Pair
import org.apache.hudi.internal.schema.InternalSchema
import org.apache.hudi.internal.schema.action.InternalSchemaMerger
import org.apache.hudi.internal.schema.utils.{InternalSchemaUtils, SerDeHelper}
import org.apache.parquet.filter2.compat.FilterCompat
import org.apache.parquet.filter2.predicate.FilterApi
import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
import org.apache.parquet.hadoop.{ParquetInputFormat, ParquetRecordReader}
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
import org.apache.spark.sql.catalyst.expressions.{Cast, JoinedRow}
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.execution.WholeStageCodegenExec
import org.apache.spark.sql.execution.datasources.parquet.Spark34HoodieParquetFileFormat._
import org.apache.spark.sql.execution.datasources.{DataSourceUtils, PartitionedFile, RecordReaderIterator}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types.{AtomicType, DataType, StructField, StructType}
import org.apache.spark.util.SerializableConfiguration

/**
 * This class is an extension of [[ParquetFileFormat]] overriding Spark-specific behavior
 * that's not possible to customize in any other way
 *
 * NOTE: This is a version of [[AvroDeserializer]] impl from Spark 3.2.1 w/ the following changes applied to it:
 *
 * Avoiding appending partition values to the rows read from the data file
 * Schema on-read
 */
class Spark34HoodieParquetFileFormat(private val shouldAppendPartitionValues: Boolean) extends ParquetFileFormat {

  override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = {
    val conf = sparkSession.sessionState.conf
    conf.parquetVectorizedReaderEnabled && schema.forall(_.dataType.isInstanceOf[AtomicType])
  }

  def supportsColumnar(sparkSession: SparkSession, schema: StructType): Boolean = {
    val conf = sparkSession.sessionState.conf
    // Only output columnar if there is WSCG to read it.
    val requiredWholeStageCodegenSettings =
      conf.wholeStageEnabled && !WholeStageCodegenExec.isTooManyFields(conf, schema)
    requiredWholeStageCodegenSettings &&
      supportBatch(sparkSession, schema)
  }

  override def buildReaderWithPartitionValues(sparkSession: SparkSession,
                                              dataSchema: StructType,
                                              partitionSchema: StructType,
                                              requiredSchema: StructType,
                                              filters: Seq[Filter],
                                              options: Map[String, String],
                                              hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
    hadoopConf.set(ParquetInputFormat.READ_SUPPORT_CLASS, classOf[ParquetReadSupport].getName)
    hadoopConf.set(
      ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA,
      requiredSchema.json)
    hadoopConf.set(
```

(diff excerpt truncated in the original message)
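For context on the `supportBatch` override quoted above: vectorized (batch) reading is gated on the vectorized-reader flag being enabled and on every column in the schema being an atomic type. That all-fields predicate can be sketched generically as follows; the `DataType` hierarchy here is a stand-in for Spark's, and all names are illustrative, not Spark's actual API:

```java
import java.util.Arrays;
import java.util.List;

public class BatchSupportSketch {
    // Minimal stand-ins for Spark's DataType hierarchy (illustrative only).
    interface DataType {}
    static class AtomicType implements DataType {}
    static class MapType implements DataType {}

    // Mirrors: conf.parquetVectorizedReaderEnabled &&
    //          schema.forall(_.dataType.isInstanceOf[AtomicType])
    static boolean supportBatch(boolean vectorizedReaderEnabled, List<DataType> fieldTypes) {
        return vectorizedReaderEnabled
            && fieldTypes.stream().allMatch(t -> t instanceof AtomicType);
    }

    public static void main(String[] args) {
        List<DataType> flat = Arrays.asList(new AtomicType(), new AtomicType());
        List<DataType> nested = Arrays.asList(new AtomicType(), new MapType());
        System.out.println(supportBatch(true, flat));   // true: all columns atomic
        System.out.println(supportBatch(true, nested)); // false: a nested column disables batching
    }
}
```

A single non-atomic (nested) column is enough to fall back to the row-based reader, which is why the check is a `forall` over the schema rather than a per-column decision.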
[GitHub] [hudi] hudi-bot commented on pull request #8907: [DNM][MINOR] Add some logs to investigate flaky testUpsertsContinuousModeWithMultipleWriters
hudi-bot commented on PR #8907: URL: https://github.com/apache/hudi/pull/8907#issuecomment-1583366621

## CI report:

* ed947b39f1c42f690cbb79257399c1ec967859e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17682)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build
[GitHub] [hudi] CTTY commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
CTTY commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223562373

## hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/HoodieSpark32CatalystPlanUtils.scala: @@ -38,6 +36,14 @@

```scala
    case _ => None
  }

  override def unapplyMergeIntoTable(plan: LogicalPlan): Option[(LogicalPlan, LogicalPlan, Expression)] = {
    plan match {
      case MergeIntoTable(targetTable, sourceTable, mergeCondition, _, _) =>
        Some((targetTable, sourceTable, mergeCondition))
```

Review Comment: nit: double parentheses
[GitHub] [hudi] CTTY commented on a diff in pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
CTTY commented on code in PR #8885: URL: https://github.com/apache/hudi/pull/8885#discussion_r1223554380

## hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java: @@ -35,6 +36,8 @@

```java
public class JsonUtils {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  static {
    registerModules(MAPPER);
```

Review Comment: This change is ported from #8840. I assume we will need to merge that PR before this one?
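The diff under discussion moves module registration for the shared `ObjectMapper` into the class's static initializer, so registration runs exactly once, before any caller can observe the instance. A minimal stdlib-only sketch of that pattern, with a plain list standing in for Jackson's `ObjectMapper` and purely illustrative module names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class JsonUtilsSketch {
    // Stand-in for the shared ObjectMapper; a list of "registered modules".
    private static final List<String> MAPPER_MODULES = new ArrayList<>();

    static {
        // Runs once, at class-load time, before any caller can read
        // MAPPER_MODULES -- the point of registering in the static block.
        registerModules(MAPPER_MODULES);
    }

    private static void registerModules(List<String> modules) {
        // Module names are illustrative only, not Hudi's or Jackson's.
        modules.add("afterburner");
        modules.add("parameter-names");
    }

    public static List<String> getRegisteredModules() {
        return Collections.unmodifiableList(MAPPER_MODULES);
    }

    public static void main(String[] args) {
        System.out.println(getRegisteredModules()); // both modules visible to every caller
    }
}
```

Configuring the shared instance in the static block avoids the race where one code path uses the mapper before another has registered its modules.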
[GitHub] [hudi] nsivabalan merged pull request #8684: [HUDI-6200] Enhancements to the MDT for improving performance of larger indexes.
nsivabalan merged PR #8684: URL: https://github.com/apache/hudi/pull/8684
[GitHub] [hudi] nsivabalan commented on pull request #8684: [HUDI-6200] Enhancements to the MDT for improving performance of larger indexes.
nsivabalan commented on PR #8684: URL: https://github.com/apache/hudi/pull/8684#issuecomment-1583315765

CI is green (screenshot: https://github.com/apache/hudi/assets/513218/f9ba5e86-bc6f-4b62-a14c-502151f2a188)
[GitHub] [hudi] hudi-bot commented on pull request #8885: [HUDI-6198] Support Hudi on Spark 3.4.0
hudi-bot commented on PR #8885: URL: https://github.com/apache/hudi/pull/8885#issuecomment-1583301619

## CI report:

* e2f44f2a1f574eed79090b337d7bd56e08058b51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17680)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8908: [DNM][MINOR] Add some logs to investigate flaky testUpsertsContinuousModeWithMultipleWriters
hudi-bot commented on PR #8908: URL: https://github.com/apache/hudi/pull/8908#issuecomment-1583220401

## CI report:

* 9d6633418e12c8a06c7bdb3e271f535096299bd2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17683)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8847: [HUDI-2071] Support Reading Bootstrap MOR RT Table In Spark DataSource Table
hudi-bot commented on PR #8847: URL: https://github.com/apache/hudi/pull/8847#issuecomment-1583210976

## CI report:

* fe991dc492e5bec19b4bfd91dc0b210e6b152b7a UNKNOWN
* 818c8050bf6cab30a402bfeab83a473976c44cdd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17679)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8905: [HUDI-6337] Incremental Clean ignore partitions affected by append write commits/delta commits
hudi-bot commented on PR #8905: URL: https://github.com/apache/hudi/pull/8905#issuecomment-1583202447

## CI report:

* f8f14263190df7b66143e192188e68463e0c1efd UNKNOWN
* f9adcecf4e54774510569f14af4c81a1f4951a28 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17681)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table
hudi-bot commented on PR #8900: URL: https://github.com/apache/hudi/pull/8900#issuecomment-1583202392

## CI report:

* fe74a9a7d32286ae29ded9370f6d53ccb14c8809 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17677)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure`: re-run the last Azure build