[GitHub] [hudi] bvaradar commented on a diff in pull request #8143: [HUDI-5911] SimpleTransactionDirectMarkerBasedDetectionStrategy can't work with none-partitioned table
bvaradar commented on code in PR #8143: URL: https://github.com/apache/hudi/pull/8143#discussion_r1181169082 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/DirectMarkerTransactionManager.java: ## @@ -83,7 +83,7 @@ private static TypedProperties createUpdatedLockProps( throw new HoodieNotSupportedException("Only Support ZK-based lock for DirectMarkerTransactionManager now."); } TypedProperties props = new TypedProperties(writeConfig.getProps()); -props.setProperty(LockConfiguration.ZK_LOCK_KEY_PROP_KEY, partitionPath + "/" + fileId); +props.setProperty(LockConfiguration.ZK_LOCK_KEY_PROP_KEY, (null != partitionPath && !partitionPath.isEmpty()) ? partitionPath + "/" + fileId : fileId); Review Comment: This change will pose a challenge when upgrading. During upgrade, we would need all writers to be stopped and upgrade to the version containing this change for safe concurrency -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
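The key derivation changed by the diff above can be sketched outside Hudi as follows. This is a minimal illustration only: `LockKeySketch`, `oldLockKey` and `newLockKey` are hypothetical names, not Hudi APIs. It also shows one plausible reading of the upgrade concern: for a non-partitioned table the old and new formulas produce different keys, so writers on mixed versions would not contend on the same lock.

```java
final class LockKeySketch {
    // Behavior before the diff: always join partition path and file id with "/".
    static String oldLockKey(String partitionPath, String fileId) {
        return partitionPath + "/" + fileId;
    }

    // Behavior after the diff: fall back to the bare file id when there is no partition path.
    static String newLockKey(String partitionPath, String fileId) {
        return (partitionPath != null && !partitionPath.isEmpty())
            ? partitionPath + "/" + fileId
            : fileId;
    }

    public static void main(String[] args) {
        // Non-partitioned table: the two versions disagree on the key.
        System.out.println(oldLockKey("", "file-1"));
        System.out.println(newLockKey("", "file-1"));
        // Partitioned table: both versions agree.
        System.out.println(newLockKey("2023/04/29", "file-1"));
    }
}
```

For partitioned tables the key is unchanged, so the mixed-version risk in this sketch is limited to tables with an empty or null partition path.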
[GitHub] [hudi] hudi-bot commented on pull request #8596: [BUG-FIX] use try with resource to close stream
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528945298 ## CI report: * 8d29d9571d94e3d654e87151b16ef99ff02762b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16763) * 0c8c7d99fc250191a7eba156052f01371e431a30 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16768) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8596: [BUG-FIX] use try with resource to close stream
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528944501 ## CI report: * 8d29d9571d94e3d654e87151b16ef99ff02762b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16763) * 0c8c7d99fc250191a7eba156052f01371e431a30 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8596: [BUG-FIX] use try with resource to close stream
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528943697 ## CI report: * 8d29d9571d94e3d654e87151b16ef99ff02762b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16763) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] Hive3 query returns null when the where clause has a partition field
xicm commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1528942760 > So it is because the incorrect hive server version is used? Yes, a partition query returns null with Hive 3.
[GitHub] [hudi] c-f-cooper commented on pull request #8596: [BUG-FIX] use try with resource to close stream
c-f-cooper commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528941416 > Hi, can you elaborate a little more what would happen if the inputstream is not closed properly? Can you write a test case to demonstrate the resolution of the issue. To demonstrate the issue of unclosed IO streams, I can write the following test program: `import java.io.*; public class UnclosedIOTest { public static void main(String[] args) throws IOException { BufferedReader reader = new BufferedReader(new InputStreamReader(System.in)); BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt")); System.out.print("Please enter a line of text:"); String line = reader.readLine(); writer.write(line); System.out.println("Written to output.txt"); } } ` This program never closes its BufferedReader or its BufferedWriter. When it runs, it prompts the user for a line of text and writes that line to a file called output.txt. Because the streams are not closed, two problems can follow. Resource leakage: in a long-running process, every unclosed stream keeps holding its underlying file handle, and enough of them can exhaust system resources and crash the program. Data loss: BufferedWriter buffers its output, so if the writer is never closed or flushed, for example because an exception occurs while writing, the buffered data may never reach the disk. To solve these issues, add the following code at the end of the program to close the IO streams: `reader.close(); writer.close(); ` This ensures that the program releases its IO resources.
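Since the PR title refers to try-with-resources, here is a minimal sketch of that pattern next to the unclosed variant described in the comment above. It is not the code from PR #8596; `TryWithResourcesSketch` and its methods are hypothetical, and in-memory `StringReader`/`StringWriter` stand in for real files so the effect of the missing flush is easy to observe.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

final class TryWithResourcesSketch {
    // Copies the first line of the input; both streams are closed automatically,
    // in reverse declaration order, even if readLine() or write() throws.
    static String copyFirstLine(String input) throws IOException {
        StringWriter sink = new StringWriter();
        try (BufferedReader reader = new BufferedReader(new StringReader(input));
             BufferedWriter writer = new BufferedWriter(sink)) {
            String line = reader.readLine();
            if (line != null) {
                writer.write(line);
            }
        } // closing the writer flushes its buffer into the sink
        return sink.toString();
    }

    // The unclosed variant: the text stays in the BufferedWriter's buffer
    // and never reaches the sink.
    static String writeWithoutClose(String text) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter writer = new BufferedWriter(sink);
        writer.write(text);
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("[" + copyFirstLine("hello\nworld") + "]");
        System.out.println("[" + writeWithoutClose("hello") + "]");
    }
}
```

A test along these lines, comparing the sink contents with and without the close, is one way to answer the reviewer's request for a demonstration.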
[GitHub] [hudi] bvaradar commented on a diff in pull request #8378: [HUDI-6031] fix bug: checkpoint lost after changing cow to mor
bvaradar commented on code in PR #8378: URL: https://github.com/apache/hudi/pull/8378#discussion_r1181163294 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java: ## @@ -2604,6 +2605,59 @@ public void testForceEmptyMetaSync() throws Exception { assertTrue(hiveClient.tableExists(tableName), "Table " + tableName + " should exist"); } + @Test + public void testResumeCheckpointAfterChangingCOW2MOR() throws Exception { +String tableBasePath = basePath + "/test_resume_checkpoint_after_changing_cow_to_mor"; +// default table type is COW +HoodieDeltaStreamer.Config cfg = TestHelpers.makeConfig(tableBasePath, WriteOperationType.BULK_INSERT); +new HoodieDeltaStreamer(cfg, jsc).sync(); +TestHelpers.assertRecordCount(1000, tableBasePath, sqlContext); +TestHelpers.assertCommitMetadata("0", tableBasePath, fs, 1); +TestHelpers.assertAtLeastNCommits(1, tableBasePath, fs); + +// change cow to mor +HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder() +.setConf(new Configuration(fs.getConf())) +.setBasePath(cfg.targetBasePath) +.setLoadActiveTimelineOnLoad(false) +.build(); +Properties hoodieProps = new Properties(); +hoodieProps.load(fs.open(new Path(cfg.targetBasePath + "/.hoodie/hoodie.properties"))); +LOG.info("old props: {}", hoodieProps); +hoodieProps.put("hoodie.table.type", HoodieTableType.MERGE_ON_READ.name()); +LOG.info("new props: {}", hoodieProps); +Path metaPathDir = new Path(metaClient.getBasePathV2(), METAFOLDER_NAME); +HoodieTableConfig.create(metaClient.getFs(), metaPathDir, hoodieProps); + +// continue deltastreamer +cfg = TestHelpers.makeConfig(tableBasePath, WriteOperationType.UPSERT); +cfg.tableType = HoodieTableType.MERGE_ON_READ.name(); +new HoodieDeltaStreamer(cfg, jsc).sync(); +// out of 1000 new records, 500 are inserts, 450 are updates and 50 are deletes. Review Comment: Sounds good. -- This is an automated message from the Apache Git Service. 
[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix potential data loss for flink streaming source from table with multi writer
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1528932193 ## CI report: * e3b3799e1e360710b99bc089f193b771fc8c4db3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16759) * b184b111c6928408d082ce73486f5bd3ae7c6683 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16767) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix potential data loss for flink streaming source from table with multi writer
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1528931363 ## CI report: * e3b3799e1e360710b99bc089f193b771fc8c4db3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16759) * b184b111c6928408d082ce73486f5bd3ae7c6683 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] danny0405 commented on pull request #8493: [HUDI-6098] Use bulk insert prepped for the initial write into MDT.
danny0405 commented on PR #8493: URL: https://github.com/apache/hudi/pull/8493#issuecomment-1528930736 @prashantwason You need to rebase onto the latest master to get the tests to pass.
[GitHub] [hudi] danny0405 commented on a diff in pull request #8599: [MINOR] Ensure metrics prefix does not contain any dot.
danny0405 commented on code in PR #8599: URL: https://github.com/apache/hudi/pull/8599#discussion_r1181158282 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -2175,7 +2175,8 @@ public boolean getPushGatewayRandomJobNameSuffix() { } public String getMetricReporterMetricsNamePrefix() { -return getStringOrDefault(HoodieMetricsConfig.METRICS_REPORTER_PREFIX); +// Metrics prefixes should not have a dot as this is usually a separator +return getStringOrDefault(HoodieMetricsConfig.METRICS_REPORTER_PREFIX).replaceAll("\\.", "_"); Review Comment: Can we just report the invalid format and throw an exception?
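The two options under discussion, silently rewriting the prefix versus rejecting it, can be compared with a small sketch. `MetricsPrefixSketch`, `sanitize` and `validate` are hypothetical names, and the choice of `IllegalArgumentException` is an assumption; the thread does not say which exception type would be used.

```java
final class MetricsPrefixSketch {
    // Approach taken in the diff: replace every dot with an underscore.
    static String sanitize(String prefix) {
        return prefix.replaceAll("\\.", "_");
    }

    // Approach suggested in the review: report the invalid format instead of rewriting it.
    static String validate(String prefix) {
        if (prefix.contains(".")) {
            throw new IllegalArgumentException(
                "Metrics prefix must not contain '.': " + prefix);
        }
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("prod.hudi.table"));
        try {
            validate("prod.hudi.table");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rewriting keeps existing jobs running but changes the metric names they emit; rejecting makes the misconfiguration visible at startup.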
[hudi] branch master updated: [MINOR] Match the directory-filter-regex to the relative directory name (#8601)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 01992458d86 [MINOR] Match the directory-filter-regex to the relative directory name (#8601) 01992458d86 is described below commit 01992458d86034cbfca79a865d6ee47313fc585e Author: Prashant Wason AuthorDate: Sat Apr 29 20:33:22 2023 -0700 [MINOR] Match the directory-filter-regex to the relative directory name (#8601) --- .../apache/hudi/metadata/HoodieBackedTableMetadataWriter.java| 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java index df4d2530815..1f5f505364c 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java @@ -673,12 +673,9 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableMeta for (DirectoryInfo dirInfo : processedDirectories) { if (!dirFilterRegex.isEmpty()) { final String relativePath = dirInfo.getRelativePath(); - if (!relativePath.isEmpty()) { -Path partitionPath = new Path(datasetBasePath, relativePath); -if (partitionPath.getName().matches(dirFilterRegex)) { - LOG.info("Ignoring directory " + partitionPath + " which matches the filter regex " + dirFilterRegex); - continue; -} + if (!relativePath.isEmpty() && relativePath.matches(dirFilterRegex)) { +LOG.info("Ignoring directory " + relativePath + " which matches the filter regex " + dirFilterRegex); +continue; } }
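The behavioral difference introduced by this commit can be sketched with plain strings. This is an illustration under assumptions: `DirFilterSketch` is a hypothetical class, and `lastComponent` is a stand-in for what `Path.getName()` returned in the removed code. Because `String.matches` requires the whole input to match, a regex written for the final directory name no longer matches once it is applied to the full relative path, and vice versa.

```java
final class DirFilterSketch {
    // Stand-in for taking the final path component, as the removed code did.
    static String lastComponent(String relativePath) {
        int idx = relativePath.lastIndexOf('/');
        return idx < 0 ? relativePath : relativePath.substring(idx + 1);
    }

    // Before the commit: the regex is matched against the last directory name only.
    static boolean oldMatch(String relativePath, String regex) {
        return !relativePath.isEmpty() && lastComponent(relativePath).matches(regex);
    }

    // After the commit: the regex is matched against the whole relative path.
    static boolean newMatch(String relativePath, String regex) {
        return !relativePath.isEmpty() && relativePath.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(oldMatch("2023/04/29", "2023/04/.*")); // name "29" does not match
        System.out.println(newMatch("2023/04/29", "2023/04/.*")); // full path matches
        System.out.println(oldMatch("tmp/_staging", "_.*"));      // name "_staging" matches
        System.out.println(newMatch("tmp/_staging", "_.*"));      // full path does not match
    }
}
```

In this sketch, existing filters written against directory names would need a leading `.*/` (or similar) to keep matching nested directories.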
[GitHub] [hudi] danny0405 merged pull request #8601: [MINOR] Match the directory-filter-regex to the relative directory name.
danny0405 merged PR #8601: URL: https://github.com/apache/hudi/pull/8601
[GitHub] [hudi] danny0405 commented on a diff in pull request #8602: [MINOR] When a clean operation fails do not continue and throw the exception.
danny0405 commented on code in PR #8602: URL: https://github.com/apache/hudi/pull/8602#discussion_r1181157917 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java: ## @@ -256,13 +257,14 @@ public HoodieCleanMetadata execute() { cleanMetadataList.add(runPendingClean(table, hoodieInstant)); } catch (Exception e) { LOG.warn("Failed to perform previous clean operation, instant: " + hoodieInstant, e); +throw e; } Review Comment: Can we write a test?
[GitHub] [hudi] xccui commented on pull request #8594: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers
xccui commented on PR #8594: URL: https://github.com/apache/hudi/pull/8594#issuecomment-1528929872 Hi @danny0405, it's the HTTP connection pool (`CPool`) in `PoolingHttpClientConnectionManager` used by the s3a FileSystem. It was closed because of an OOM in the JobManager (see https://github.com/apache/httpcomponents-client/commit/ca98ad69adad79de57d8b944ba524f7267a795cb). I'm not quite sure why the JobManager was not restarted and only triggered a job failover. But when a failover is triggered, I believe the whole job, including the coordinator, should be reset.
[GitHub] [hudi] danny0405 commented on a diff in pull request #8604: [HUDI-6151] Rollback previously applied commits to MDT when operations are retried.
danny0405 commented on code in PR #8604: URL: https://github.com/apache/hudi/pull/8604#discussion_r1181157786 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java: ## @@ -161,27 +161,28 @@ protected void commit(String instantTime, Map to MDT. ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java: ## @@ -161,27 +161,28 @@ protected void commit(String instantTime, Map for e.g ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java: ## @@ -161,27 +161,28 @@ protected void commit(String instantTime, Map alreadyCompletedInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().filter(entry -> entry.getTimestamp().equals(instantTime)).lastInstant(); -if (alreadyCompletedInstant.isPresent()) { - // this code path refers to a re-attempted commit that got committed to metadata table, but failed in datatable. - // for eg, lets say compaction c1 on 1st attempt succeeded in metadata table and failed before committing to datatable. - // when retried again, data table will first rollback pending compaction. these will be applied to metadata table, but all changes - // are upserts to metadata table and so only a new delta commit will be created. - // once rollback is complete, compaction will be retried again, which will eventually hit this code block where the respective commit is - // already part of completed commit. So, we have to manually remove the completed instant and proceed. - // and it is for the same reason we enabled withAllowMultiWriteOnSameInstant for metadata table. - HoodieActiveTimeline.deleteInstantFile(metadataMetaClient.getFs(), metadataMetaClient.getMetaPath(), alreadyCompletedInstant.get()); - metadataMetaClient.reloadActiveTimeline(); +LOG.info(String.format("%s completed commit at %s being applied to metadata table", +alreadyCompletedInstant.isPresent() ? 
"Already" : "Partially", instantTime)); Review Comment: applied to metadata table -> applied to metadata table. ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java: ## @@ -161,27 +161,28 @@ protected void commit(String instantTime, Map let's say -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (f3ddcd97625 -> b56ab71c57c)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from f3ddcd97625 [MINOR] Fix the hudi-cli export command (#8608) add b56ab71c57c [MINOR] Update colstats parallelism default to 200 (#8517) No new revisions were added by this update. Summary of changes: .../main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [hudi] xushiyan merged pull request #8517: [MINOR] Update colstats parallelism default to 200
xushiyan merged PR #8517: URL: https://github.com/apache/hudi/pull/8517
[GitHub] [hudi] danny0405 commented on a diff in pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.
danny0405 commented on code in PR #8606: URL: https://github.com/apache/hudi/pull/8606#discussion_r1181157467 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java: ## @@ -197,14 +197,21 @@ protected List deleteFiles(HoodieTableMetaClient metaClient, // if first rollback attempt failed and retried again, chances that some files are already deleted. isDeleted = true; } + + if (!isDeleted) { Review Comment: In which case can `metaClient.getFs().delete()` return false if the file actually exists there?
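The check being reviewed can be sketched with `java.io.File` as a stand-in for the Hadoop file system. Everything here is an assumption for illustration: `DeleteCheckSketch` is hypothetical, the body of the `if (!isDeleted)` branch is cut off in the diff above, and throwing an exception is only one guess at what it does. For `java.io.File` specifically, `delete()` returns false on an existing path when, for example, the path is a non-empty directory or the process lacks permission.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DeleteCheckSketch {
    // Returns true when the file is gone afterwards; throws when it still exists.
    static boolean deleteOrFail(File file) {
        boolean isDeleted = file.delete();
        if (!isDeleted && !file.exists()) {
            // A retried rollback may find the file already removed by an earlier attempt.
            isDeleted = true;
        }
        if (!isDeleted) {
            // Assumption: surface the failure instead of silently continuing.
            throw new IllegalStateException("Failed to delete " + file);
        }
        return isDeleted;
    }

    // Deletes a temp file twice: once for real, once as a simulated retry.
    static boolean demo() {
        try {
            File tmp = File.createTempFile("rollback-sketch", ".tmp");
            boolean first = deleteOrFail(tmp);  // file existed and was removed
            boolean second = deleteOrFail(tmp); // already gone, treated as deleted
            return first && second && !tmp.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```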
[GitHub] [hudi] danny0405 commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.
danny0405 commented on code in PR #8605: URL: https://github.com/apache/hudi/pull/8605#discussion_r1181157260 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -170,9 +171,35 @@ public static List filterKeysFromFile(Path filePath, List candid return foundRecordKeys; } + /** + * Check if the given commit timestamp is valid for the timeline. + * + * The commit timestamp is considered to be valid if: + * 1. the commit timestamp is present in the timeline, or + * 2. the commit timestamp is less than the first commit timestamp in the timeline + * + * @param commitTimeline The timeline + * @param commitTs The commit timestamp to check + * @return true if the commit timestamp is valid for the timeline + */ public static boolean checkIfValidCommit(HoodieTimeline commitTimeline, String commitTs) { -// Check if the last commit ts for this row is 1) present in the timeline or -// 2) is less than the first commit ts in the timeline -return !commitTimeline.empty() && commitTimeline.containsOrBeforeTimelineStarts(commitTs); +if (commitTimeline.empty()) { + return false; +} + +// Check for 0.8+ timestamps which have msec granularity +if (commitTimeline.containsOrBeforeTimelineStarts(commitTs)) { + return true; +} + +// Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations +if (commitTs.length() == HoodieInstantTimeGenerator.MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH && commitTs.endsWith(HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT)) { + final String actualOlderFormatTs = commitTs.substring(0, commitTs.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length()); + if (commitTimeline.containsOrBeforeTimelineStarts(actualOlderFormatTs)) { +return true; + } +} Review Comment: Shouldn't we fix this method instead: `commitTimeline.containsOrBeforeTimelineStarts`? And should we have a version number for the timeline?
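The fallback logic in the diff above can be traced with a small stand-alone sketch. The names and values are assumptions for illustration: `CommitTsSketch` is hypothetical, a sorted set of strings stands in for `HoodieTimeline`, and the length 17 and suffix "999" are assumed stand-ins for `MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH` and `DEFAULT_MILLIS_EXT`, whose actual values do not appear in the thread.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class CommitTsSketch {
    static final int MILLIS_TS_LENGTH = 17; // assumed length of a millisecond-granularity timestamp
    static final String MILLIS_EXT = "999"; // assumed suffix appended to second-granularity timestamps

    // Stand-in for containsOrBeforeTimelineStarts: present, or ordered before the first instant.
    static boolean containsOrBeforeStart(TreeSet<String> timeline, String ts) {
        return timeline.contains(ts) || ts.compareTo(timeline.first()) < 0;
    }

    static boolean isValidCommit(TreeSet<String> timeline, String ts) {
        if (timeline.isEmpty()) {
            return false;
        }
        // Millisecond-granularity timestamps are checked as they are.
        if (containsOrBeforeStart(timeline, ts)) {
            return true;
        }
        // Otherwise strip the assumed extension and retry with the older, second-granularity form.
        if (ts.length() == MILLIS_TS_LENGTH && ts.endsWith(MILLIS_EXT)) {
            String older = ts.substring(0, ts.length() - MILLIS_EXT.length());
            return containsOrBeforeStart(timeline, older);
        }
        return false;
    }

    public static void main(String[] args) {
        TreeSet<String> timeline =
            new TreeSet<>(Arrays.asList("20210101120000", "20230429200729123"));
        System.out.println(isValidCommit(timeline, "20210101120000999")); // found after stripping
        System.out.println(isValidCommit(timeline, "20220101000000000")); // not in the timeline
    }
}
```

The reviewer's alternative would move this stripping step into the timeline lookup itself, so every caller benefits rather than only `checkIfValidCommit`.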
[GitHub] [hudi] danny0405 commented on a diff in pull request #8607: [MINOR] Fixed the reading of instants from very old archive files where ACTION_STATE is not present in instants.
danny0405 commented on code in PR #8607: URL: https://github.com/apache/hudi/pull/8607#discussion_r1181156586 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java: ## @@ -152,9 +156,13 @@ public void loadCompactionDetailsInMemory(String compactionInstantTime) { public void loadCompactionDetailsInMemory(String startTs, String endTs) { // load compactionPlan -loadInstants(new TimeRangeFilter(startTs, endTs), true, record -> - record.get(ACTION_TYPE_KEY).toString().equals(HoodieTimeline.COMPACTION_ACTION) -&& HoodieInstant.State.INFLIGHT.toString().equals(record.get(ACTION_STATE).toString()) +loadInstants(new TimeRangeFilter(startTs, endTs), true, +record -> { + // Older files don't have action state set. + Object action = record.get(ACTION_STATE); + return record.get(ACTION_TYPE_KEY).toString().equals(HoodieTimeline.COMPACTION_ACTION) +&& (action == null || HoodieInstant.State.INFLIGHT.toString().equals(action.toString())); Review Comment: When the action is null, is the instant state guaranteed to be `INFLIGHT` for old versions? Can we write a test case? ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java: ## @@ -143,7 +143,11 @@ public void loadInstantDetailsInMemory(String startTs, String endTs) { public void loadCompletedInstantDetailsInMemory() { loadInstants(null, true, -record -> HoodieInstant.State.COMPLETED.toString().equals(record.get(ACTION_STATE).toString())); +record -> { + // Very old archived instants don't have action state set. + Object action = record.get(ACTION_STATE); + return action == null || HoodieInstant.State.COMPLETED.toString().equals(action.toString()); Review Comment: When the action is null, is the instant state guaranteed to be `COMPLETED` for old versions? Can we write a test case?
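The two predicates from the diff above can be reduced to a sketch over a plain map, which makes the reviewer's question concrete: a record with no action state passes both the COMPLETED filter and the INFLIGHT compaction filter. `ArchivedStateSketch` is hypothetical, and the key names `"actionState"` and `"actionType"` and the literal `"compaction"` are assumed stand-ins for `ACTION_STATE`, `ACTION_TYPE_KEY` and `HoodieTimeline.COMPACTION_ACTION`.

```java
import java.util.HashMap;
import java.util.Map;

final class ArchivedStateSketch {
    // Mirrors the completed-instants filter: a missing state counts as COMPLETED.
    static boolean isCompleted(Map<String, Object> record) {
        Object state = record.get("actionState");
        return state == null || "COMPLETED".equals(state.toString());
    }

    // Mirrors the compaction filter: a missing state counts as INFLIGHT.
    static boolean isInflightCompaction(Map<String, Object> record) {
        Object state = record.get("actionState");
        return "compaction".equals(String.valueOf(record.get("actionType")))
            && (state == null || "INFLIGHT".equals(state.toString()));
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("actionType", "compaction"); // very old entry: no state recorded
        System.out.println(isCompleted(oldRecord));          // passes the COMPLETED filter
        System.out.println(isInflightCompaction(oldRecord)); // also passes the INFLIGHT filter
    }
}
```

A unit test over an archive entry without the state field, as the review asks for, would pin down which of the two interpretations is intended.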
[GitHub] [hudi] danny0405 commented on pull request #7355: [HUDI-5308] Hive3 query returns null when the where clause has a partition field
danny0405 commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1528927662 So is it because an incorrect Hive server version is used?
[hudi] branch master updated: [MINOR] Fix the hudi-cli export command (#8608)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new f3ddcd97625 [MINOR] Fix the hudi-cli export command (#8608) f3ddcd97625 is described below commit f3ddcd97625631f91488da745164bbe7809ecc76 Author: Prashant Wason AuthorDate: Sat Apr 29 20:07:29 2023 -0700 [MINOR] Fix the hudi-cli export command (#8608) 1. Removed the hardcoded location of archives 2. Handle the case where the metadata from an archive entry may be null (seen in very old archives) --- .../java/org/apache/hudi/cli/commands/ExportCommand.java | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java index 54227a613e4..e81a532f2a8 100644 --- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java +++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/ExportCommand.java @@ -44,6 +44,8 @@ import org.apache.hudi.common.table.timeline.HoodieTimeline; import org.apache.hudi.common.table.timeline.TimelineMetadataUtils; import org.apache.hudi.common.util.collection.ClosableIterator; import org.apache.hudi.exception.HoodieException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import org.springframework.shell.standard.ShellComponent; import org.springframework.shell.standard.ShellMethod; import org.springframework.shell.standard.ShellOption; @@ -67,6 +69,8 @@ import java.util.stream.Collectors; @ShellComponent public class ExportCommand { + private static final Logger LOG = LoggerFactory.getLogger(ExportCommand.class); + @ShellMethod(key = "export instants", value = "Export Instants and their metadata from the Timeline") public String exportInstants( @ShellOption(value = {"--limit"}, help = "Limit Instants", defaultValue = "-1") final 
Integer limit, @@ -77,7 +81,7 @@ public class ExportCommand { throws Exception { final String basePath = HoodieCLI.getTableMetaClient().getBasePath(); -final Path archivePath = new Path(basePath + "/.hoodie/.commits_.archive*"); +final Path archivePath = new Path(HoodieCLI.getTableMetaClient().getArchivePath()); final Set actionSet = new HashSet(Arrays.asList(filter.split(","))); int numExports = limit == -1 ? Integer.MAX_VALUE : limit; int numCopied = 0; @@ -121,7 +125,7 @@ public class ExportCommand { Reader reader = HoodieLogFormat.newReader(fileSystem, new HoodieLogFile(fs.getPath()), HoodieArchivedMetaEntry.getClassSchema()); // read the avro blocks - while (reader.hasNext() && copyCount < limit) { + while (reader.hasNext() && copyCount++ < limit) { HoodieAvroDataBlock blk = (HoodieAvroDataBlock) reader.next(); try (ClosableIterator> recordItr = blk.getRecordIterator(HoodieRecordType.AVRO)) { while (recordItr.hasNext()) { @@ -158,11 +162,12 @@ public class ExportCommand { } final String instantTime = archiveEntryRecord.get("commitTime").toString(); +if (metadata == null) { + LOG.error("Could not load metadata for action " + action + " at instant time " + instantTime); + continue; +} final String outPath = localFolder + Path.SEPARATOR + instantTime + "." + action; writeToFile(outPath, HoodieAvroUtils.avroToJson(metadata, true)); -if (++copyCount == limit) { - break; -} } } }
[GitHub] [hudi] danny0405 commented on pull request #8608: [MINOR] Fixed the hudi-cli export command.
danny0405 commented on PR #8608: URL: https://github.com/apache/hudi/pull/8608#issuecomment-1528927412 The failed test case: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=16754&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934&l=37674 It should not be caused by this patch; I will merge it soon.
[GitHub] [hudi] danny0405 merged pull request #8608: [MINOR] Fixed the hudi-cli export command.
danny0405 merged PR #8608: URL: https://github.com/apache/hudi/pull/8608
[GitHub] [hudi] danny0405 commented on pull request #8594: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers
danny0405 commented on PR #8594: URL: https://github.com/apache/hudi/pull/8594#issuecomment-1528927069 Thanks for the contribution @xccui , can you illustrate what kind of connection pool is not released when global failure is triggered?
[GitHub] [hudi] danny0405 commented on pull request #8596: [BUG-FIX] use try with resource to close stream
danny0405 commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528926785 Hi, can you elaborate a little more on what would happen if the input stream is not closed properly? Can you also write a test case to demonstrate that the issue is resolved?
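As an illustration of the kind of test case requested here, the following is a minimal sketch of how try-with-resources guarantees that a stream is closed even when reading fails. It is not code from PR #8596; `TrackingStream` and `readFirstByte` are hypothetical names introduced only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical example; not taken from the pull request under discussion.
final class TryWithResourcesSketch {

  // A stream that records whether close() was called and can simulate a read failure.
  static final class TrackingStream extends InputStream {
    private final boolean failOnRead;
    boolean closed = false;

    TrackingStream(boolean failOnRead) {
      this.failOnRead = failOnRead;
    }

    @Override
    public int read() throws IOException {
      if (failOnRead) {
        throw new IOException("simulated read failure");
      }
      return 42;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Reads one byte; the try-with-resources statement closes the stream
  // on both the normal path and the exceptional path.
  static int readFirstByte(InputStream in) {
    try (InputStream stream = in) {
      return stream.read();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test can then assert on the `closed` flag after both a successful read and a failed one. Without try-with-resources (or an equivalent finally block), the failing path would leave the stream open.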
[GitHub] [hudi] hudi-bot commented on pull request #8596: [BUG-FIX] use try with resource to close stream
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528925367 ## CI report: * 66912e50cc13e9fdfeaddd68bfe53aead0f493cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16731) * 8d29d9571d94e3d654e87151b16ef99ff02762b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16763) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8596: [BUG-FIX] use try with resource to close stream
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1528924611 ## CI report: * 66912e50cc13e9fdfeaddd68bfe53aead0f493cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16731) * 8d29d9571d94e3d654e87151b16ef99ff02762b4 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable
hudi-bot commented on PR #8190: URL: https://github.com/apache/hudi/pull/8190#issuecomment-1528887656 ## CI report: * 7c71b63797be01ee91268c2520f82b18b3f13b7c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762)
[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable
hudi-bot commented on PR #8190: URL: https://github.com/apache/hudi/pull/8190#issuecomment-1528869463 ## CI report: * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734) * 7c71b63797be01ee91268c2520f82b18b3f13b7c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16762)
[GitHub] [hudi] hudi-bot commented on pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable
hudi-bot commented on PR #8190: URL: https://github.com/apache/hudi/pull/8190#issuecomment-1528868341 ## CI report: * 1557ca7eeb8ef85bb76fe75ac38f0201dcf6de96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15726) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15734) * 7c71b63797be01ee91268c2520f82b18b3f13b7c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8517: [MINOR] Update colstats parallelism default to 200
hudi-bot commented on PR #8517: URL: https://github.com/apache/hudi/pull/8517#issuecomment-1528828147 ## CI report: * 100ac5f5f9d8e9935625dda5419d5d66a92126a6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16761)
[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix potential data loss for flink streaming source from table with multi writer
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1528814552 ## CI report: * e3b3799e1e360710b99bc089f193b771fc8c4db3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16759)
[GitHub] [hudi] hudi-bot commented on pull request #8517: [MINOR] Update colstats parallelism default to 200
hudi-bot commented on PR #8517: URL: https://github.com/apache/hudi/pull/8517#issuecomment-1528805834 ## CI report: * 250a4dfe87b170e5df2ec282b9214e90f77fec45 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16503) * 100ac5f5f9d8e9935625dda5419d5d66a92126a6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16761)
[GitHub] [hudi] hudi-bot commented on pull request #8517: [MINOR] Update colstats parallelism default to 200
hudi-bot commented on PR #8517: URL: https://github.com/apache/hudi/pull/8517#issuecomment-1528804334 ## CI report: * 250a4dfe87b170e5df2ec282b9214e90f77fec45 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16503) * 100ac5f5f9d8e9935625dda5419d5d66a92126a6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8610: [HUDI-6156] prevent leaving tmp file in timeline when multi process t…
hudi-bot commented on PR #8610: URL: https://github.com/apache/hudi/pull/8610#issuecomment-1528802806 ## CI report: * f34ffd6ccf4fd366ade5dad8487ff9a0a248bec8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16758)
[hudi] branch master updated: [HUDI-6035] Make simple index parallelism auto inferred (#8468)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 78ad883a067 [HUDI-6035] Make simple index parallelism auto inferred (#8468)
78ad883a067 is described below

commit 78ad883a067537bfef866dd5388faa4922efbd58
Author: clownxc <598457...@qq.com>
AuthorDate: Sat Apr 29 22:25:07 2023 +0800

    [HUDI-6035] Make simple index parallelism auto inferred (#8468)
    -
    Co-authored-by: ClownXC
    Co-authored-by: Raymond Xu
---
 .../main/java/org/apache/hudi/config/HoodieIndexConfig.java | 10 +-
 .../java/org/apache/hudi/index/simple/HoodieSimpleIndex.java | 7 ++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
index fd50fdb0f6d..dc0b1cd5f4a 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
@@ -189,14 +189,14 @@ public class HoodieIndexConfig extends HoodieConfig {
   public static final ConfigProperty SIMPLE_INDEX_PARALLELISM = ConfigProperty
       .key("hoodie.simple.index.parallelism")
-      .defaultValue("100")
+      .defaultValue("0")
       .markAdvanced()
       .withDocumentation("Only applies if index type is SIMPLE. "
           + "This limits the parallelism of fetching records from the base files of affected "
-          + "partitions. The index picks the configured parallelism if the number of base "
-          + "files is larger than this configured value; otherwise, the number of base files "
-          + "is used as the parallelism. If the indexing stage is slow due to the limited "
-          + "parallelism, you can increase this to tune the performance.");
+          + "partitions. By default, this is auto computed based on input workload characteristics. "
+          + "If the parallelism is explicitly configured by the user, the user-configured "
+          + "value is used in defining the actual parallelism. If the indexing stage is slow "
+          + "due to the limited parallelism, you can increase this to tune the performance.");
   public static final ConfigProperty GLOBAL_SIMPLE_INDEX_PARALLELISM = ConfigProperty
       .key("hoodie.global.simple.index.parallelism")
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
index 95823ff51e3..dbc49d0655f 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
@@ -107,11 +107,16 @@ public class HoodieSimpleIndex
           .getString(HoodieIndexConfig.SIMPLE_INDEX_INPUT_STORAGE_LEVEL_VALUE));
     }
+    int inputParallelism = inputRecords.getNumPartitions();
+    int configuredSimpleIndexParallelism = config.getSimpleIndexParallelism();
+    // NOTE: Target parallelism could be overridden by the config
+    int targetParallelism =
+        configuredSimpleIndexParallelism > 0 ? configuredSimpleIndexParallelism : inputParallelism;
     HoodiePairData> keyedInputRecords =
         inputRecords.mapToPair(record -> new ImmutablePair<>(record.getKey(), record));
     HoodiePairData existingLocationsOnTable =
         fetchRecordLocationsForAffectedPartitions(keyedInputRecords.keys(), context, hoodieTable,
-            config.getSimpleIndexParallelism());
+            targetParallelism);
     HoodieData> taggedRecords =
         keyedInputRecords.leftOuterJoin(existingLocationsOnTable).map(entry -> {
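The selection rule introduced by this commit can be isolated in a few lines. The sketch below restates the ternary from the HoodieSimpleIndex hunk as a standalone method; the class and method names are hypothetical, and the two integer arguments stand in for `config.getSimpleIndexParallelism()` and `inputRecords.getNumPartitions()`.

```java
// Hypothetical standalone restatement of the rule in the diff above.
final class SimpleIndexParallelismSketch {

  // A configured value greater than 0 wins; otherwise (including the new
  // default of 0) the parallelism of the incoming records is used.
  static int targetParallelism(int configuredParallelism, int inputParallelism) {
    return configuredParallelism > 0 ? configuredParallelism : inputParallelism;
  }
}
```

For example, with the default of 0 and an input of 8 partitions the result is 8, while an explicit setting of 200 yields 200 regardless of the input.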
[GitHub] [hudi] xushiyan merged pull request #8468: [HUDI-6035] Make simple index parallelism auto inferred
xushiyan merged PR #8468: URL: https://github.com/apache/hudi/pull/8468
[GitHub] [hudi] hudi-bot commented on pull request #8468: [HUDI-6035] Make simple index parallelism auto inferred
hudi-bot commented on PR #8468: URL: https://github.com/apache/hudi/pull/8468#issuecomment-1528791113 ## CI report: * 9bce0a1d69458192721d929a554ef16281a13bed UNKNOWN * 1849bb1337d66a6433cad4cd38f0f1b978390b31 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16757)
[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix potential data loss for flink streaming source from table with multi writer
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1528781383 ## CI report: * e3b3799e1e360710b99bc089f193b771fc8c4db3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16759)
[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix potential data loss for flink streaming source from table with multi writer
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1528780076 ## CI report: * e3b3799e1e360710b99bc089f193b771fc8c4db3 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8594: [HUDI-6148] Recreate StreamWriteOperatorCoordinator for global failovers
hudi-bot commented on PR #8594: URL: https://github.com/apache/hudi/pull/8594#issuecomment-1528778716 ## CI report: * ff459b2c4de2e4adcdd30977193b026d34636c7b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16755)
[jira] [Updated] (HUDI-6157) Fix potential data loss for flink streaming source from table with multi writer
[ https://issues.apache.org/jira/browse/HUDI-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6157:
    Labels: pull-request-available  (was: )

> Fix potential data loss for flink streaming source from table with multi
> writer
> ---
>
>                 Key: HUDI-6157
>                 URL: https://issues.apache.org/jira/browse/HUDI-6157
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink-sql
>            Reporter: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 opened a new pull request, #8611: [HUDI-6157] Fix potential data loss for flink streaming source from table with multi writer
danny0405 opened a new pull request, #8611: URL: https://github.com/apache/hudi/pull/8611

…able with multi writer

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Created] (HUDI-6157) Fix potential data loss for flink streaming source from table with multi writer
Danny Chen created HUDI-6157:

             Summary: Fix potential data loss for flink streaming source from table with multi writer
                 Key: HUDI-6157
                 URL: https://issues.apache.org/jira/browse/HUDI-6157
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink-sql
            Reporter: Danny Chen
[GitHub] [hudi] hudi-bot commented on pull request #8609: [HUDI-6154] Introduced rety while reading hoodie.properties to deal with parallel updates.
hudi-bot commented on PR #8609: URL: https://github.com/apache/hudi/pull/8609#issuecomment-1528768431 ## CI report: * 33114fa16eff146842ea56a8e178441ed448866f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16756)
[GitHub] [hudi] hudi-bot commented on pull request #8608: [MINOR] Fixed the hudi-cli export command.
hudi-bot commented on PR #8608: URL: https://github.com/apache/hudi/pull/8608#issuecomment-1528757217 ## CI report: * bef668a7c58f3af8ccaf2b70bdda69c5db2e9952 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16754)
[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive3 query returns null when the where clause has a partition field
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1528754157 ## CI report: * e371363eb434b8c1878b0b1cf5d26121303c05e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16740) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16753)
[GitHub] [hudi] hudi-bot commented on pull request #8610: [HUDI-6156] prevent leaving tmp file in timeline when multi process t…
hudi-bot commented on PR #8610: URL: https://github.com/apache/hudi/pull/8610#issuecomment-1528739547 ## CI report: * f34ffd6ccf4fd366ade5dad8487ff9a0a248bec8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16758)
[GitHub] [hudi] hudi-bot commented on pull request #8610: [HUDI-6156] prevent leaving tmp file in timeline when multi process t…
hudi-bot commented on PR #8610: URL: https://github.com/apache/hudi/pull/8610#issuecomment-1528737263 ## CI report: * f34ffd6ccf4fd366ade5dad8487ff9a0a248bec8 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8468: [HUDI-6035] Make simple index parallelism auto inferred
hudi-bot commented on PR #8468: URL: https://github.com/apache/hudi/pull/8468#issuecomment-1528737160 ## CI report: * 73d1149b6adf91c85e2cd45ef419b8351c07f2cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16360) * 9bce0a1d69458192721d929a554ef16281a13bed UNKNOWN * 1849bb1337d66a6433cad4cd38f0f1b978390b31 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16757)
[GitHub] [hudi] hudi-bot commented on pull request #8468: [HUDI-6035] Make simple index parallelism auto inferred
hudi-bot commented on PR #8468: URL: https://github.com/apache/hudi/pull/8468#issuecomment-1528735680 ## CI report: * 73d1149b6adf91c85e2cd45ef419b8351c07f2cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16360) * 9bce0a1d69458192721d929a554ef16281a13bed UNKNOWN * 1849bb1337d66a6433cad4cd38f0f1b978390b31 UNKNOWN
[jira] [Updated] (HUDI-6156) prevent leaving tmp file in timeline when multi task try to complete the same instant
[ https://issues.apache.org/jira/browse/HUDI-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6156:
    Labels: pull-request-available  (was: )

> prevent leaving tmp file in timeline when multi task try to complete the same
> instant
> -
>
>                 Key: HUDI-6156
>                 URL: https://issues.apache.org/jira/browse/HUDI-6156
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: HBG
>            Priority: Major
>              Labels: pull-request-available
>
[GitHub] [hudi] hbgstc123 opened a new pull request, #8610: [HUDI-6156] prevent leaving tmp file in timeline when multi process t…
hbgstc123 opened a new pull request, #8610: URL: https://github.com/apache/hudi/pull/8610

…ry to complete the same instant

### Change Logs

Currently, if two tasks try to complete the same instant, an "xxx.tmp" file is left behind in the .hoodie dir. For example, in a Flink ingestion job with offline compaction, both the ingestion job and the offline compaction can trigger a clean task, so there is a chance that two clean tasks run the same clean instant. The slower one then fails to rename its tmp file (e.g. 20230429171948763.clean.tmp) to the final file name (e.g. 20230429171948763.clean), leaving the tmp file in the timeline.

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
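The race described in the Change Logs can be reproduced on a local file system with a short sketch. This is not the fix from PR #8610 and does not use Hudi or Hadoop APIs: it relies on `java.nio.file` only, the class and method names are hypothetical, and it assumes each task writes its own uniquely named tmp file, whereas the PR describes a shared tmp name. The point it illustrates is that the task losing the rename should delete its tmp file rather than leave it in the timeline directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical local-file-system illustration; not Hudi's timeline code.
final class CompleteInstantSketch {

  // Writes content to a tmp file and renames it to the final instant file.
  // Returns true if this caller published the file, false if another task already had.
  static boolean complete(Path timelineDir, String instantFile, String content) {
    Path finalPath = timelineDir.resolve(instantFile);
    Path tmpPath = null;
    try {
      tmpPath = Files.createTempFile(timelineDir, instantFile + ".", ".tmp");
      Files.write(tmpPath, content.getBytes(StandardCharsets.UTF_8));
      // Without REPLACE_EXISTING, move fails if the target already exists.
      Files.move(tmpPath, finalPath);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false; // the other task won the race
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      // Clean up the tmp file on every path; this is a no-op after a successful move.
      if (tmpPath != null) {
        try {
          Files.deleteIfExists(tmpPath);
        } catch (IOException ignored) {
          // best-effort cleanup in this sketch
        }
      }
    }
  }

  // Helper: creates a scratch directory standing in for the .hoodie dir.
  static Path newTempDir() {
    try {
      return Files.createTempDirectory("timeline-sketch");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Helper: counts leftover *.tmp files in the directory.
  static long countTmpFiles(Path dir) {
    try (Stream<Path> files = Files.list(dir)) {
      return files.filter(p -> p.getFileName().toString().endsWith(".tmp")).count();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Rename semantics on HDFS or object stores differ from a local file system, so this only demonstrates the cleanup idea, not the behaviour of any particular storage backend.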
[GitHub] [hudi] hudi-bot commented on pull request #7173: [HUDI-5189] Make HiveAvroSerializer compatible with hive3
hudi-bot commented on PR #7173: URL: https://github.com/apache/hudi/pull/7173#issuecomment-1528735209 ## CI report: * 33e116e83e6ca348dc6039db0f76ed5df50a731f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16721) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16730) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16752) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16741)
[jira] [Created] (HUDI-6156) prevent leaving tmp file in timeline when multi task try to complete the same instant
HBG created HUDI-6156:

             Summary: prevent leaving tmp file in timeline when multi task try to complete the same instant
                 Key: HUDI-6156
                 URL: https://issues.apache.org/jira/browse/HUDI-6156
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: HBG
[GitHub] [hudi] hudi-bot commented on pull request #8468: [HUDI-6035] Make simple index parallelism auto inferred
hudi-bot commented on PR #8468: URL: https://github.com/apache/hudi/pull/8468#issuecomment-1528723365 ## CI report: * 73d1149b6adf91c85e2cd45ef419b8351c07f2cf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16360) * 9bce0a1d69458192721d929a554ef16281a13bed UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8607: [MINOR] Fixed the reading of instants from very old archive files where ACTION_STATE is not present in instants.
hudi-bot commented on PR #8607: URL: https://github.com/apache/hudi/pull/8607#issuecomment-1528718856 ## CI report: * 18ec6f29e045dbb17ba587b54279b807492f71f0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16751)
[GitHub] [hudi] hudi-bot commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.
hudi-bot commented on PR #8606: URL: https://github.com/apache/hudi/pull/8606#issuecomment-1528705558 ## CI report: * e306a06b8c62c4218a0833e271b52364e05c4b50 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16750)