[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1146551501

## CI report:

* 0e87e2b237e7272ee2e321e91280754d18c63f87 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9072)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-runs the last Azure build
[GitHub] [hudi] srinugsr2020 commented on issue #3894: [SUPPORT] Property hoodie.datasource.write.recordkey.field not found during version ONE to TWO migration
srinugsr2020 commented on issue #3894: URL: https://github.com/apache/hudi/issues/3894#issuecomment-1146548461

Hi, I recently upgraded Apache Hudi to 0.10.0 and realized that Redshift Spectrum is not able to read the data. I found a couple of other links related to the same issue, but there is no solution yet; the only option right now is to downgrade to an earlier version. I tried the downgrade command but received the same error (java.lang.IllegalArgumentException: Property hoodie.datasource.write.recordkey.field not found) even after setting hoodie.metadata.enable to false in hoodie.properties. I then tried downgrading to version ONE and it ran fine, but when I ran the job it still failed with "unknown version code 3". I am using AWS EMR 6.6.0 with Hudi CLI 0.10. Please suggest.
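For readers hitting the same problem: the downgrade being attempted is normally driven through the Hudi CLI. The session below is an illustrative sketch only; the `connect` and `downgrade table` commands reflect the 0.10.x-era CLI and the path is a placeholder, so verify the exact command set against your release before running it:

```
hudi-> connect --path s3://<bucket>/<table-base-path>
hudi:<table-name>-> downgrade table --toVersion ONE
```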
[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1146545735

## CI report:

* dad16d1d712576b3b92389ab6ab045dc16bdafbf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9071)
[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1146533938

## CI report:

* 6a2ed8538256bc9ee9ef5470cc5c573739a75b4b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9069)
* 0e87e2b237e7272ee2e321e91280754d18c63f87 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9072)
[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1146533034

## CI report:

* 6a2ed8538256bc9ee9ef5470cc5c573739a75b4b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9069)
* 0e87e2b237e7272ee2e321e91280754d18c63f87 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
hudi-bot commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146532508

## CI report:

* bb436a73c4bd66a5a90467475710851b598d2ae9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9070)
[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1146520863

## CI report:

* f84be3540d82f6bbb06e3f690671465298f63c9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9067)
* dad16d1d712576b3b92389ab6ab045dc16bdafbf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9071)
[GitHub] [hudi] hudi-bot commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
hudi-bot commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146520849

## CI report:

* e6173f00c290e9e0bb55e4c7d8092f5eb26871ae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9066)
* bb436a73c4bd66a5a90467475710851b598d2ae9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9070)
[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1146520556

## CI report:

* be76a443fe07639f0eb0cd5727ff64dc3fe29c22 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8863)
* 6a2ed8538256bc9ee9ef5470cc5c573739a75b4b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9069)
[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1146520090

## CI report:

* f84be3540d82f6bbb06e3f690671465298f63c9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9067)
* dad16d1d712576b3b92389ab6ab045dc16bdafbf UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
hudi-bot commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146520083

## CI report:

* e6173f00c290e9e0bb55e4c7d8092f5eb26871ae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9066)
* bb436a73c4bd66a5a90467475710851b598d2ae9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1146519726

## CI report:

* be76a443fe07639f0eb0cd5727ff64dc3fe29c22 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8863)
* 6a2ed8538256bc9ee9ef5470cc5c573739a75b4b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path
hudi-bot commented on PR #5747: URL: https://github.com/apache/hudi/pull/5747#issuecomment-1146492665

## CI report:

* 5e2e2ecd996e075f8a8ad026e918426f4d4cacce Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9068)
[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1146480680

## CI report:

* f84be3540d82f6bbb06e3f690671465298f63c9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9067)
[GitHub] [hudi] hudi-bot commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
hudi-bot commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146480673

## CI report:

* e6173f00c290e9e0bb55e4c7d8092f5eb26871ae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9066)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration
alexeykudinkin commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889425572

In `hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala`:

```diff
@@ -105,12 +106,16 @@ class HoodieCatalog extends DelegatingCatalogExtension
       case _ => catalogTable0
     }
-    HoodieInternalV2Table(
+
+    val v2Table = HoodieInternalV2Table(
       spark = spark,
       path = catalogTable.location.toString,
       catalogTable = Some(catalogTable),
       tableIdentifier = Some(ident.toString))
-  case o => o
+    // TODO elaborate
+    v2Table.v1TableWrapper
```

Review Comment: Why? The catalog still exposes the methods to write into Hudi tables.
[GitHub] [hudi] leesf commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration
leesf commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889424940

In `hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala`:

```diff
@@ -105,12 +106,16 @@ class HoodieCatalog extends DelegatingCatalogExtension
       case _ => catalogTable0
     }
-    HoodieInternalV2Table(
+
+    val v2Table = HoodieInternalV2Table(
      spark = spark,
      path = catalogTable.location.toString,
      catalogTable = Some(catalogTable),
      tableIdentifier = Some(ident.toString))
-  case o => o
+    // TODO elaborate
+    v2Table.v1TableWrapper
```

Review Comment: It means users are no longer able to write/read data when specifying HoodieCatalog if using V1Table, since it has no capabilities.
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
alexeykudinkin commented on code in PR #5733: URL: https://github.com/apache/hudi/pull/5733#discussion_r889423624

In `hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java`:

```diff
@@ -128,7 +133,11 @@ public void write(InternalRow record) throws IOException {
     if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen
       partitionPath = "";
     } else if (simpleKeyGen) { // SimpleKeyGen
-      partitionPath = (record.get(simplePartitionFieldIndex, simplePartitionFieldDataType)).toString();
+      Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType);
```

Review Comment: Yes, will rebase before landing.
[GitHub] [hudi] alexeykudinkin commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
alexeykudinkin commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146477125

Build succeeded: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=9066&view=results
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
alexeykudinkin commented on code in PR #5733: URL: https://github.com/apache/hudi/pull/5733#discussion_r889403945

In `hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java`:

```diff
@@ -626,4 +499,128 @@ public Option<String> getTableHistorySchemaStrFromCommitMetadata() {
     String result = manager.getHistorySchemaStr();
     return result.isEmpty() ? Option.empty() : Option.of(result);
   }
+
+  /**
+   * NOTE: This method could only be used in tests
+   *
+   * @VisibleForTesting
+   */
+  public boolean hasOperationField() {
```

Review Comment: Method did not change.

In `hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java`:

```diff
@@ -267,43 +265,47 @@ public Option getInstantDetails(HoodieInstant instant) {
   }

   /**
-   * Get the last instant with valid schema, and convert this to HoodieCommitMetadata
+   * Returns most recent instant having valid schema in its {@link HoodieCommitMetadata}
    */
   public Option<Pair<HoodieInstant, HoodieCommitMetadata>> getLastCommitMetadataWithValidSchema() {
-    List<HoodieInstant> completed = getCommitsTimeline().filterCompletedInstants().getInstants()
-        .sorted(Comparator.comparing(HoodieInstant::getTimestamp).reversed()).collect(Collectors.toList());
-    for (HoodieInstant instant : completed) {
-      try {
-        HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(
-            getInstantDetails(instant).get(), HoodieCommitMetadata.class);
-        if (!StringUtils.isNullOrEmpty(commitMetadata.getMetadata(HoodieCommitMetadata.SCHEMA_KEY))) {
-          return Option.of(Pair.of(instant, commitMetadata));
-        }
-      } catch (IOException e) {
-        LOG.warn("Failed to convert instant to HoodieCommitMetadata: " + instant.toString());
-      }
-    }
-    return Option.empty();
+    return Option.fromJavaOptional(
+        getCommitMetadataStream()
+            .filter(instantCommitMetadataPair ->
+                !StringUtils.isNullOrEmpty(instantCommitMetadataPair.getValue().getMetadata(HoodieCommitMetadata.SCHEMA_KEY)))
+            .findFirst()
+    );
   }

   /**
    * Get the last instant with valid data, and convert this to HoodieCommitMetadata
    */
   public Option<Pair<HoodieInstant, HoodieCommitMetadata>> getLastCommitMetadataWithValidData() {
-    List<HoodieInstant> completed = getCommitsTimeline().filterCompletedInstants().getInstants()
-        .sorted(Comparator.comparing(HoodieInstant::getTimestamp).reversed()).collect(Collectors.toList());
-    for (HoodieInstant instant : completed) {
-      try {
-        HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(
-            getInstantDetails(instant).get(), HoodieCommitMetadata.class);
-        if (!commitMetadata.getFileIdAndRelativePaths().isEmpty()) {
-          return Option.of(Pair.of(instant, commitMetadata));
-        }
-      } catch (IOException e) {
-        LOG.warn("Failed to convert instant to HoodieCommitMetadata: " + instant.toString());
-      }
-    }
-    return Option.empty();
+    return Option.fromJavaOptional(
+        getCommitMetadataStream()
+            .filter(instantCommitMetadataPair ->
+                !instantCommitMetadataPair.getValue().getFileIdAndRelativePaths().isEmpty())
+            .findFirst()
+    );
+  }
+
+  /**
+   * Returns stream of {@link HoodieCommitMetadata} in order reverse to chronological (ie most
+   * recent metadata being the first element)
+   */
+  private Stream<Pair<HoodieInstant, HoodieCommitMetadata>> getCommitMetadataStream() {
+    // NOTE: Streams are lazy
```

Review Comment: Yes, streams are lazy; the whole chain is only evaluated as far as needed to produce a single object.

In `hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java`:

```diff
@@ -626,4 +499,128 @@ public Option<String> getTableHistorySchemaStrFromCommitMetadata() {
     String result = manager.getHistorySchemaStr();
     return result.isEmpty() ? Option.empty() : Option.of(result);
   }
+
+  /**
+   * NOTE: This method could only be used in tests
+   *
+   * @VisibleForTesting
+   */
+  public boolean hasOperationField() {
+    try {
+      Schema tableAvroSchema = getTableAvroSchemaFromDataFile();
+      return tableAvroSchema.getField(HoodieRecord.OPERATION_METADATA_FIELD) != null;
+    } catch (Exception e) {
+      LOG.info(String.format("Failed to read operation field from avro schema (%s)", e.getMessage()));
+      return false;
+    }
+  }
+
+  private Option<Pair<HoodieInstant, HoodieCommitMetadata>> getLatestCommitMetadataWithValidSchema() {
+    if (latestCommitWithValidSchema == null) {
+      Option<Pair<HoodieInstant, HoodieCommitMetadata>> instantAndCommitMetadata =
+          metaClient.getActiveTimeline().getLastCommitMetadataWithValidSchema();
+      if (instantAndCommitMetadata.isPresent()) {
+        HoodieInstant instant = instantAndCommitMetadata.get().getLeft();
+        HoodieCommitMetadata metadata = instantAndCommitMetadata.get().getRight();
+        synchronized (this) {
+          if (latestCommitWithValidSchema == null) {
+            latestCommitWithValidSchema = instant;
```
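For readers unfamiliar with the laziness point being made above: intermediate `Stream` operations in Java are not evaluated until a terminal operation pulls elements, so a `findFirst()` only performs the expensive mapping for as many elements as it actually consumes. A minimal, self-contained illustration (not Hudi code):

```java
import java.util.List;
import java.util.Optional;

public class LazyStreamDemo {
  public static void main(String[] args) {
    List<String> instants = List.of("003", "002", "001"); // newest first

    Optional<String> first = instants.stream()
        .map(ts -> {
          // In Hudi's case this step would be the expensive HoodieCommitMetadata
          // parse; it only runs for elements the terminal operation pulls.
          System.out.println("parsing " + ts);
          return ts;
        })
        .filter(ts -> !ts.isEmpty())
        .findFirst();

    // Prints "parsing 003" exactly once, then "result: 003"
    System.out.println("result: " + first.orElse("none"));
  }
}
```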
[GitHub] [hudi] hudi-bot commented on pull request #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path
hudi-bot commented on PR #5747: URL: https://github.com/apache/hudi/pull/5747#issuecomment-1146462631

## CI report:

* 5e2e2ecd996e075f8a8ad026e918426f4d4cacce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9068)
[GitHub] [hudi] hudi-bot commented on pull request #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path
hudi-bot commented on PR #5747: URL: https://github.com/apache/hudi/pull/5747#issuecomment-1146460018

## CI report:

* 5e2e2ecd996e075f8a8ad026e918426f4d4cacce UNKNOWN
[GitHub] [hudi] nsivabalan commented on a diff in pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
nsivabalan commented on code in PR #5733: URL: https://github.com/apache/hudi/pull/5733#discussion_r889414242

In `hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java`:

```diff
@@ -128,7 +133,11 @@ public void write(InternalRow record) throws IOException {
     if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen
       partitionPath = "";
     } else if (simpleKeyGen) { // SimpleKeyGen
-      partitionPath = (record.get(simplePartitionFieldIndex, simplePartitionFieldDataType)).toString();
+      Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType);
```

Review Comment: Is it possible to remove the fixes that I have in https://github.com/apache/hudi/pull/5664? You may run into conflicts.
[jira] [Updated] (HUDI-4171) NonPartitioned Key gen w/ virtual keys fails to be read w/ presto
[ https://issues.apache.org/jira/browse/HUDI-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4171:
- Labels: pull-request-available (was: )

> Key: HUDI-4171
> URL: https://issues.apache.org/jira/browse/HUDI-4171
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.11.0

Description: Looks like the non-partitioned key generator does not work well when virtual keys are enabled.

{code:java}
Query 20220531_171243_00023_eudi3 failed: Fetching table schema failed with exception
io.prestosql.spi.PrestoException: Fetching table schema failed with exception
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:223)
    at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.prestosql.$gen.Presto_33220220531_134705_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: Fetching table schema failed with exception
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.getHoodieVirtualKeyInfo(HoodieCopyOnWriteTableInputFormat.java:289)
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:245)
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:140)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
    at org.apache.hudi.hadoop.HoodieParquetInputFormatBase.getSplits(HoodieParquetInputFormatBase.java:68)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:407)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:287)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:107)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:216)
    ... 6 more
Caused by: java.lang.NullPointerException
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.getHoodieVirtualKeyInfo(HoodieCopyOnWriteTableInputFormat.java:287)
    ... 14 more
{code}

Original table: bulk insert row writer, non-partitioned key generator, meta fields disabled.
[GitHub] [hudi] nsivabalan opened a new pull request, #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path
nsivabalan opened a new pull request, #5747: URL: https://github.com/apache/hudi/pull/5747

## What is the purpose of the pull request

When the non-partitioned key generator is used with virtual keys, the read path could break since the partition path may not exist. Fixing that in this patch.

## Brief change log

- Fixed generating virtual key info on the read path.

## Verify this pull request

Added a test to TestHoodieParquetInputFormat to validate the fix.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
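The stack trace in HUDI-4171 points at dereferencing an absent partition field while building the virtual key info. The sketch below shows the shape of guard such a fix implies; it is illustrative only, with hypothetical names, and is not the actual patch:

```java
import java.util.Optional;

public class VirtualKeyInfoSketch {
  // Illustrative only: with a non-partitioned key generator there is no
  // partition field, so the reader must tolerate an absent partition column
  // instead of dereferencing null.
  static Optional<String> partitionColumn(String[] partitionFields) {
    if (partitionFields == null || partitionFields.length == 0) {
      return Optional.empty(); // non-partitioned table: no partition path column
    }
    return Optional.of(partitionFields[0]);
  }

  public static void main(String[] args) {
    System.out.println(partitionColumn(null));                 // Optional.empty
    System.out.println(partitionColumn(new String[]{"dt"}));   // Optional[dt]
  }
}
```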
[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1146435260

## CI report:

* 0290f672dd33aefe7ad33edc95e12979ebf035bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9045)
* f84be3540d82f6bbb06e3f690671465298f63c9c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9067)
[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1146433477

## CI report:

* 0290f672dd33aefe7ad33edc95e12979ebf035bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9045)
* f84be3540d82f6bbb06e3f690671465298f63c9c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
hudi-bot commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146433441

## CI report:

* 1e269381cb33f0f92be0749eeea66c3368fc225e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9041)
* e6173f00c290e9e0bb55e4c7d8092f5eb26871ae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9066)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
alexeykudinkin commented on code in PR #5733: URL: https://github.com/apache/hudi/pull/5733#discussion_r889401027

In `hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java`:

```diff
@@ -176,86 +124,25 @@ public Schema getTableAvroSchema() throws Exception {
    * @throws Exception
    */
   public Schema getTableAvroSchema(boolean includeMetadataFields) throws Exception {
-    Schema schema;
-    Option<Schema> schemaFromCommitMetadata = getTableSchemaFromCommitMetadata(includeMetadataFields);
-    if (schemaFromCommitMetadata.isPresent()) {
-      schema = schemaFromCommitMetadata.get();
-    } else {
-      Option<Schema> schemaFromTableConfig = metaClient.getTableConfig().getTableCreateSchema();
-      if (schemaFromTableConfig.isPresent()) {
-        if (includeMetadataFields) {
-          schema = HoodieAvroUtils.addMetadataFields(schemaFromTableConfig.get(), hasOperationField);
-        } else {
-          schema = schemaFromTableConfig.get();
-        }
-      } else {
-        if (includeMetadataFields) {
-          schema = getTableAvroSchemaFromDataFile();
-        } else {
-          schema = HoodieAvroUtils.removeMetadataFields(getTableAvroSchemaFromDataFile());
-        }
-      }
-    }
-
-    Option<String[]> partitionFieldsOpt = metaClient.getTableConfig().getPartitionFields();
-    if (metaClient.getTableConfig().shouldDropPartitionColumns()) {
-      schema = recreateSchemaWhenDropPartitionColumns(partitionFieldsOpt, schema);
-    }
-    return schema;
+    return getTableAvroSchemaInternal(includeMetadataFields, Option.empty());
   }

-  public static Schema recreateSchemaWhenDropPartitionColumns(Option<String[]> partitionFieldsOpt, Schema originSchema) {
-    // when hoodie.datasource.write.drop.partition.columns is true, partition columns can't be persisted in data files.
-    // And there are no partition schema if the schema is parsed from data files.
-    // Here we create partition Fields for this case, and use StringType as the data type.
-    Schema schema = originSchema;
-    if (partitionFieldsOpt.isPresent() && partitionFieldsOpt.get().length != 0) {
-      List<String> partitionFields = Arrays.asList(partitionFieldsOpt.get());
-
-      final Schema schema0 = originSchema;
-      boolean hasPartitionColNotInSchema = partitionFields.stream().anyMatch(
-          pt -> !HoodieAvroUtils.containsFieldInSchema(schema0, pt)
-      );
-      boolean hasPartitionColInSchema = partitionFields.stream().anyMatch(
-          pt -> HoodieAvroUtils.containsFieldInSchema(schema0, pt)
-      );
-      if (hasPartitionColNotInSchema && hasPartitionColInSchema) {
-        throw new HoodieIncompatibleSchemaException(
-            "Not support: Partial partition fields are still in the schema "
-                + "when enable hoodie.datasource.write.drop.partition.columns");
-      }
-
-      if (hasPartitionColNotInSchema) {
-        // when hasPartitionColNotInSchema is true and hasPartitionColInSchema is false, all partition columns
-        // are not in originSchema. So we create and add them.
-        List<Schema.Field> newFields = new ArrayList<>();
-        for (String partitionField : partitionFields) {
-          newFields.add(new Schema.Field(
-              partitionField, createNullableSchema(Schema.Type.STRING), "", JsonProperties.NULL_VALUE));
-        }
-        schema = appendFieldsToSchema(schema, newFields);
-      }
-    }
-    return schema;
+  /**
+   * Fetches tables schema in Avro format as of the given instant
+   *
+   * @param instant as of which table's schema will be fetched
+   */
+  public Schema getTableAvroSchema(HoodieInstant instant, boolean includeMetadataFields) throws Exception {
+    return getTableAvroSchemaInternal(includeMetadataFields, Option.of(instant));
   }

   /**
    * Gets full schema (user + metadata) for a hoodie table in Parquet format.
    *
    * @return Parquet schema for the table
-   * @throws Exception
    */
   public MessageType getTableParquetSchema() throws Exception {
-    Option<Schema> schemaFromCommitMetadata = getTableSchemaFromCommitMetadata(true);
-    if (schemaFromCommitMetadata.isPresent()) {
-      return convertAvroSchemaToParquet(schemaFromCommitMetadata.get());
-    }
-    Option<Schema> schemaFromTableConfig = metaClient.getTableConfig().getTableCreateSchema();
-    if (schemaFromTableConfig.isPresent()) {
-      Schema schema = HoodieAvroUtils.addMetadataFields(schemaFromTableConfig.get(), hasOperationField);
-      return convertAvroSchemaToParquet(schema);
-    }
-    return getTableParquetSchemaFromDataFile();
+    return convertAvroSchemaToParquet(getTableAvroSchema(true));
```

Review Comment: It was not handled correctly before; this config has to be handled in all code-paths.
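For context on the `recreateSchemaWhenDropPartitionColumns` logic removed above: when `hoodie.datasource.write.drop.partition.columns` is enabled, partition columns are not persisted in data files, so the resolver re-appends them to the schema as nullable strings. A standalone sketch of that re-append step using the plain Avro API rather than Hudi's helpers (record and field names here are made up):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.JsonProperties;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class AppendPartitionColsDemo {
  public static void main(String[] args) {
    Schema base = SchemaBuilder.record("rec").fields()
        .requiredString("id").endRecord();

    // Nullable string, mirroring createNullableSchema(Schema.Type.STRING) in the diff above
    Schema nullableString = Schema.createUnion(
        Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING));

    // Avro fields cannot be reused across records, so copy the existing ones
    List<Schema.Field> fields = new ArrayList<>();
    for (Schema.Field f : base.getFields()) {
      fields.add(new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
    }
    // Re-add the dropped partition column with a null default
    fields.add(new Schema.Field("dt", nullableString, "", JsonProperties.NULL_VALUE));

    Schema rebuilt = Schema.createRecord(base.getName(), base.getDoc(),
        base.getNamespace(), false, fields);
    System.out.println(rebuilt.toString(true));
  }
}
```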
[GitHub] [hudi] hudi-bot commented on pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
hudi-bot commented on PR #5733: URL: https://github.com/apache/hudi/pull/5733#issuecomment-1146431494

## CI report:

* 1e269381cb33f0f92be0749eeea66c3368fc225e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9041)
* e6173f00c290e9e0bb55e4c7d8092f5eb26871ae UNKNOWN
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5733: [HUDI-4176] Fixing `TableSchemaResolver` to avoid repeated `HoodieCommitMetadata` parsing
alexeykudinkin commented on code in PR #5733: URL: https://github.com/apache/hudi/pull/5733#discussion_r889400793

In `hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java`:

```diff
@@ -58,100 +60,46 @@
 import org.apache.parquet.hadoop.metadata.ParquetMetadata;
 import org.apache.parquet.schema.MessageType;

+import javax.annotation.concurrent.ThreadSafe;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Iterator;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;

 import static org.apache.hudi.avro.AvroSchemaUtils.appendFieldsToSchema;
+import static org.apache.hudi.avro.AvroSchemaUtils.containsFieldInSchema;
 import static org.apache.hudi.avro.AvroSchemaUtils.createNullableSchema;

 /**
  * Helper class to read schema from data files and log files and to convert it between different formats.
- *
- * TODO(HUDI-3626) cleanup
  */
+@ThreadSafe
 public class TableSchemaResolver {

   private static final Logger LOG = LogManager.getLogger(TableSchemaResolver.class);

-  private final HoodieTableMetaClient metaClient;
-  private final boolean hasOperationField;
-
-  public TableSchemaResolver(HoodieTableMetaClient metaClient) {
-    this.metaClient = metaClient;
-    this.hasOperationField = hasOperationField();
-  }
+  private final HoodieTableMetaClient metaClient;

   /**
-   * Gets the schema for a hoodie table. Depending on the type of table, read from any file written in the latest
-   * commit. We will assume that the schema has not changed within a single atomic write.
+   * NOTE: {@link HoodieCommitMetadata} could be of non-trivial size for large tables (in 100s of Mbs)
+   * and therefore we'd want to limit amount of throw-away work being performed while fetching
+   * commits' metadata
    *
-   * @return Parquet schema for this table
+   * Please check out corresponding methods to fetch commonly used instances of {@link HoodieCommitMetadata}:
+   * {@link #getLatestCommitMetadataWithValidSchema()},
+   * {@link #getLatestCommitMetadataWithValidSchema()},
+   * {@link #getCachedCommitMetadata(HoodieInstant)}
    */
-  private MessageType getTableParquetSchemaFromDataFile() {
-    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
-    Option<Pair<HoodieInstant, HoodieCommitMetadata>> instantAndCommitMetadata =
-        activeTimeline.getLastCommitMetadataWithValidData();
-    try {
-      switch (metaClient.getTableType()) {
-        case COPY_ON_WRITE:
-          // For COW table, the file has data written must be in parquet or orc format currently.
-          if (instantAndCommitMetadata.isPresent()) {
-            HoodieCommitMetadata commitMetadata = instantAndCommitMetadata.get().getRight();
-            Iterator<String> filePaths = commitMetadata.getFileIdAndFullPaths(metaClient.getBasePath()).values().iterator();
-            return fetchSchemaFromFiles(filePaths);
-          } else {
-            throw new IllegalArgumentException("Could not find any data file written for commit, "
-                + "so could not get schema for table " + metaClient.getBasePath());
-          }
-        case MERGE_ON_READ:
-          // For MOR table, the file has data written may be a parquet file, .log file, orc file or hfile.
-          // Determine the file format based on the file name, and then extract schema from it.
-          if (instantAndCommitMetadata.isPresent()) {
-            HoodieCommitMetadata commitMetadata = instantAndCommitMetadata.get().getRight();
-            Iterator<String> filePaths = commitMetadata.getFileIdAndFullPaths(metaClient.getBasePath()).values().iterator();
-            return fetchSchemaFromFiles(filePaths);
-          } else {
-            throw new IllegalArgumentException("Could not find any data file written for commit, "
-                + "so could not get schema for table " + metaClient.getBasePath());
-          }
-        default:
-          LOG.error("Unknown table type " + metaClient.getTableType());
-          throw new InvalidTableException(metaClient.getBasePath());
-      }
-    } catch (IOException e) {
-      throw new HoodieException("Failed to read data schema", e);
-    }
-  }
+  private final Lazy<ConcurrentHashMap<HoodieInstant, HoodieCommitMetadata>> commitMetadataCache;
```

Review Comment: Discussed offline: `TableSchemaResolver` is a short-lived object not meant to be refreshed; to get the latest schema you will have to create another instance with the refreshed `HoodieMetaClient` instance.
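To make that "short-lived object" contract concrete: a caller needing a fresher schema is expected to rebuild the resolver against a reloaded meta client rather than reuse an old instance. A hedged usage sketch; it assumes the standard `HoodieTableMetaClient.reload` and `TableSchemaResolver` APIs, whose exact signatures may differ across versions:

```java
import org.apache.avro.Schema;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.TableSchemaResolver;

public class SchemaRefreshSketch {
  // Illustrative only: per the comment above, TableSchemaResolver caches commit
  // metadata for its lifetime, so a fresh schema requires a fresh resolver.
  static Schema latestSchema(HoodieTableMetaClient metaClient) throws Exception {
    HoodieTableMetaClient refreshed = HoodieTableMetaClient.reload(metaClient);
    return new TableSchemaResolver(refreshed).getTableAvroSchema();
  }
}
```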
[GitHub] [hudi] leesf commented on issue #5488: [SUPPORT] Read hive Table fail when HoodieCatalog used
leesf commented on issue #5488: URL: https://github.com/apache/hudi/issues/5488#issuecomment-1146421860

Closing the issue, @parisni please reopen if you have new problems.
[GitHub] [hudi] leesf closed issue #5488: [SUPPORT] Read hive Table fail when HoodieCatalog used
leesf closed issue #5488: [SUPPORT] Read hive Table fail when HoodieCatalog used URL: https://github.com/apache/hudi/issues/5488
[GitHub] [hudi] leesf commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory
leesf commented on issue #5537: URL: https://github.com/apache/hudi/issues/5537#issuecomment-1146419871

@melin I think you can point `spark_catalog` at `HoodieCatalog` and use a separate custom catalog for the Iceberg catalog as a workaround for now, since Hudi currently does not support custom catalog names.
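For context, the suggested workaround amounts to wiring the two catalogs in the Spark session config. The snippet below is a hedged example: the class names are the usual ones for Hudi 0.11.x and Iceberg's Spark runtime, but verify them against the versions you deploy:

```properties
# Hudi takes over the built-in session catalog...
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
# ...while Iceberg tables live under a separately named catalog
spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg_catalog.type=hive
```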
[hudi] branch asf-site updated: Resized the blog images , tags onHover:blue, readingTime only visible in blogs page (#5745)
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:

    new eb17d00b22 Resized the blog images, tags onHover:blue, readingTime only visible in blogs page (#5745)

eb17d00b22 is described below:

    commit eb17d00b2239dfb42f3b7643cac40c0dd36a7bd9
    Author: yadav-jai <97013124+yadav-...@users.noreply.github.com>
    AuthorDate: Sat Jun 4 04:04:07 2022 +0530

    Resized the blog images, tags onHover:blue, readingTime only visible in blogs page (#5745)

---
 website/src/css/custom.css              | 9 +++++---
 website/src/theme/BlogPostItem/index.js | 6 +++---
 2 files changed, 9 insertions(+), 6 deletions(-)

```diff
diff --git a/website/src/css/custom.css b/website/src/css/custom.css
index cee132bb1b..57eeca4bb2 100644
--- a/website/src/css/custom.css
+++ b/website/src/css/custom.css
@@ -205,15 +205,18 @@ footer .container {
 }

 .blogThumbnail img {
-  height: auto;
-  width: 100%;
+  height: 100%;
+  width: auto;
+}
+.tagRegular_node_modules-\@docusaurus-theme-classic-lib-next-theme-Tag-styles-module {
+  color: black
 }
-
 .blog-list-page article {
   display: inline-flex;
   width: 45%;
+  margin: 1.2em;
   vertical-align: text-top;

diff --git a/website/src/theme/BlogPostItem/index.js b/website/src/theme/BlogPostItem/index.js
index 1478e1861f..8c9525e2f2 100644
--- a/website/src/theme/BlogPostItem/index.js
+++ b/website/src/theme/BlogPostItem/index.js
```

[The JSX hunks of `index.js` were stripped of their markup in the archive; the recoverable change restyles the "Tags:" row (hunks at lines 68 and 155) and scopes the readingTime display to the blog post page via `isBlogPostPage`.]
[GitHub] [hudi] bhasudha merged pull request #5745: [MINOR][UI]Resized the blog images , tags onHover:blue, readingTime only visible in blogs page
bhasudha merged PR #5745: URL: https://github.com/apache/hudi/pull/5745
[GitHub] [hudi] xushiyan commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11
xushiyan commented on issue #5729: URL: https://github.com/apache/hudi/issues/5729#issuecomment-1146407055

@GnsCy right; this is more likely caused by out-of-date instructions or configs that may have changed in the newer release.
[jira] [Updated] (HUDI-4178) Performance regressions in Spark DataSourceV2 Integration
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-4178:
- Story Points: 4 (was: 1)
- Summary: Performance regressions in Spark DataSourceV2 Integration (was: HoodieSpark3Analysis does not pass schema from Spark Catalog)
[jira] [Updated] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-4178:
- Description updated.

New description: There are multiple issues with our current DataSource V2 integrations. Because we advertise Hudi tables as V2, Spark expects them to implement certain APIs which are not implemented at the moment; instead we're using a custom resolution rule (in HoodieSpark3Analysis) to manually fall back to V1 APIs. This poses the following problems:

1. It doesn't fully implement Spark's protocol: for example, this rule doesn't cache the produced `LogicalPlan`, making Spark re-create Hudi relations from scratch (including doing a full table file-listing) for every query reading the table. However, adding the caching in that sequence is not an option, since V2 APIs manage the cache differently, and therefore for us to be able to leverage that cache we will have to manage all of its lifecycle (adding, flushing).
2. Additionally, the HoodieSpark3Analysis rule does not pass the table's schema from the Spark Catalog to Hudi's relations, making them fetch the schema from storage (either from commit metadata or a data file) every time.

Previous description: Currently, the HoodieSpark3Analysis rule does not pass the table's schema from the Spark Catalog to Hudi's relations, making them fetch the schema from storage (either from commit metadata or a data file) every time.

> Key: HUDI-4178
> URL: https://issues.apache.org/jira/browse/HUDI-4178
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.11.0
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.1
[GitHub] [hudi] leesf commented on pull request #5743: [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables
leesf commented on PR #5743: URL: https://github.com/apache/hudi/pull/5743#issuecomment-1146401813 > @leesf can you please add a description to the PR and also the Jira? @alexeykudinkin sure and done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4183: Description: Currently, when users configure `HoodieCatalog` in 0.11.0, they cannot create non-Hudi tables, since HoodieCatalog#createTable does not handle non-Hudi tables; the logic is missing from the #createTable method, and we should fix it. > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > Currently, when users configure `HoodieCatalog` in 0.11.0, they cannot create > non-Hudi tables, since HoodieCatalog#createTable does not handle non-Hudi > tables; the logic is missing from the #createTable method, and we > should fix it. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [hudi] yihua commented on a diff in pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys
yihua commented on code in PR #5664: URL: https://github.com/apache/hudi/pull/5664#discussion_r889380010 ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java: ## @@ -87,6 +89,7 @@ public BulkInsertDataInternalWriterHelper(HoodieTable hoodieTable, HoodieWriteCo this.populateMetaFields = populateMetaFields; this.arePartitionRecordsSorted = arePartitionRecordsSorted; this.fileIdPrefix = UUID.randomUUID().toString(); +this.isHiveStylePartitioning = writeConfig.isHiveStylePartitioningEnabled(); Review Comment: nit: `writeConfig` is saved inside this helper so we don't need to have another member variable `isHiveStylePartitioning`? ## hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java: ## @@ -109,6 +109,48 @@ public void testDataInternalWriter(boolean sorted, boolean populateMetaFields) t } } + @Test + public void testDataInternalWriterHiveStylePartitioning() throws Exception { +boolean sorted = true; +boolean populateMetaFields = false; +// init config and table +HoodieWriteConfig cfg = getWriteConfig(populateMetaFields, "true"); +HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient); +for (int i = 0; i < 1; i++) { + String instantTime = "00" + i; + // init writer + HoodieBulkInsertDataInternalWriter writer = new HoodieBulkInsertDataInternalWriter(table, cfg, instantTime, RANDOM.nextInt(10), RANDOM.nextLong(), RANDOM.nextLong(), + STRUCT_TYPE, populateMetaFields, sorted); + + int size = 10 + RANDOM.nextInt(1000); + // write N rows to partition1, N rows to partition2 and N rows to partition3 ... Each batch should create a new RowCreateHandle and a new file + int batches = 3; + Dataset<Row> totalInputRows = null; + + for (int j = 0; j < batches; j++) { +String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[j % 3]; +Dataset<Row> inputRows = getRandomRows(sqlContext, size, partitionPath, false); +writeRows(inputRows, writer); +if (totalInputRows == null) { + totalInputRows = inputRows; +} else { + totalInputRows = totalInputRows.union(inputRows); +} + } + + BaseWriterCommitMessage commitMetadata = (BaseWriterCommitMessage) writer.commit(); + Option<List<String>> fileAbsPaths = Option.of(new ArrayList<>()); + Option<List<String>> fileNames = Option.of(new ArrayList<>()); + + // verify write statuses + assertWriteStatuses(commitMetadata.getWriteStatuses(), batches, size, sorted, fileAbsPaths, fileNames); + + // verify rows + Dataset<Row> result = sqlContext.read().parquet(fileAbsPaths.get().toArray(new String[0])); + assertOutput(totalInputRows, result, instantTime, fileNames, populateMetaFields); Review Comment: Do we want to validate the hive-style partition path value somewhere? ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java: ## @@ -128,7 +133,11 @@ public void write(InternalRow record) throws IOException { if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen partitionPath = ""; } else if (simpleKeyGen) { // SimpleKeyGen - partitionPath = (record.get(simplePartitionFieldIndex, simplePartitionFieldDataType)).toString(); + Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType); + partitionPath = parititionPathValue != null ? 
parititionPathValue.toString() : PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH; + if (isHiveStylePartitioning) { +partitionPath = (keyGeneratorOpt.get()).getPartitionPathFields().get(0) + "=" + partitionPath; Review Comment: For `SimpleKeyGenerator`, there could be only one partition path field. Is that correct? ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/BulkInsertDataInternalWriterHelper.java: ## @@ -128,7 +133,11 @@ public void write(InternalRow record) throws IOException { if (!keyGeneratorOpt.isPresent()) { // NoPartitionerKeyGen partitionPath = ""; } else if (simpleKeyGen) { // SimpleKeyGen - partitionPath = (record.get(simplePartitionFieldIndex, simplePartitionFieldDataType)).toString(); + Object parititionPathValue = record.get(simplePartitionFieldIndex, simplePartitionFieldDataType); + partitionPath = parititionPathValue != null ? parititionPathValue.toString() : PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH; + if (isHiveStylePartitioning) { +partitionPath = (keyGeneratorOpt.get()).getPartitionPathFields().get(0) + "=" + partitionPath; Review Comment: @nsivabalan could you simply leverage `SimpleKeyGenerator::getParti
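To summarize the logic under review in these hunks: the partition path falls back to Hudi's default-partition placeholder for null values, and hive-style partitioning then prefixes the (single) partition field name. A minimal Scala sketch of that rule, with the placeholder constant inlined (in Hudi it comes from `PartitionPathEncodeUtils.DEFAULT_PARTITION_PATH`; the literal value below is an assumption):

```scala
// Sketch of hive-style partition-path construction for the single-field
// (SimpleKeyGenerator) case; the constant value is assumed here.
object PartitionPathSketch {
  val DefaultPartitionPath = "__HIVE_DEFAULT_PARTITION__" // assumed value of the Hudi constant

  def buildPartitionPath(partitionField: String, value: Any, hiveStyle: Boolean): String = {
    // Null partition values fall back to the default-partition placeholder.
    val path = Option(value).map(_.toString).getOrElse(DefaultPartitionPath)
    // Hive-style partitioning prefixes the field name, e.g. "dt=2022-06-03".
    if (hiveStyle) s"$partitionField=$path" else path
  }
}
```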
[GitHub] [hudi] ctlgdanielli commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10
ctlgdanielli commented on issue #4622: URL: https://github.com/apache/hudi/issues/4622#issuecomment-1146360783 Hello, any updates? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys
hudi-bot commented on PR #5664: URL: https://github.com/apache/hudi/pull/5664#issuecomment-1146345218 ## CI report: * 18654512f52bf46f458d0275a844fd4e625e32e4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9063) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys
hudi-bot commented on PR #5664: URL: https://github.com/apache/hudi/pull/5664#issuecomment-1146301312 ## CI report: * db498bb903cdd264f29c3db616dab75bccffddaf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9062) * 18654512f52bf46f458d0275a844fd4e625e32e4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9063) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146269210 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * 3694b869048eff12b408a86e295ba88d3d3168fb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9061) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146263278 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * 3007879a9a938a65b1f7f9174c23f22f1bd82145 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9060) * 3694b869048eff12b408a86e295ba88d3d3168fb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9061) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys
hudi-bot commented on PR #5664: URL: https://github.com/apache/hudi/pull/5664#issuecomment-1146260471 ## CI report: * 63c2aa08ecec2dbbe98823f2b88b52874346a085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8997) * db498bb903cdd264f29c3db616dab75bccffddaf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9062) * 18654512f52bf46f458d0275a844fd4e625e32e4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146260103 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * 3007879a9a938a65b1f7f9174c23f22f1bd82145 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9060) * 3694b869048eff12b408a86e295ba88d3d3168fb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys
hudi-bot commented on PR #5664: URL: https://github.com/apache/hudi/pull/5664#issuecomment-1146256754 ## CI report: * 63c2aa08ecec2dbbe98823f2b88b52874346a085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8997) * db498bb903cdd264f29c3db616dab75bccffddaf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9062) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146256410 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * Unknown: [CANCELED](TBD) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rahil-c commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
rahil-c commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146255412 `@hudi-bot run azure` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rahil-c commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
rahil-c commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146255235 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual keys
hudi-bot commented on PR #5664: URL: https://github.com/apache/hudi/pull/5664#issuecomment-1146224569 ## CI report: * 63c2aa08ecec2dbbe98823f2b88b52874346a085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8997) * db498bb903cdd264f29c3db616dab75bccffddaf UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4186) Support Hudi with Spark 3.3
Udit Mehrotra created HUDI-4186: --- Summary: Support Hudi with Spark 3.3 Key: HUDI-4186 URL: https://issues.apache.org/jira/browse/HUDI-4186 Project: Apache Hudi Issue Type: Epic Components: spark Reporter: Udit Mehrotra Spark 3.3 voting is currently in progress and should likely go through soon: https://github.com/apache/spark/tree/v3.3.0-rc4. We should support it for our next major release, 0.12. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-541) Replace variables/comments named "data files" to "base file"
[ https://issues.apache.org/jira/browse/HUDI-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-541: --- Fix Version/s: 0.12.0 (was: 0.11.1) > Replace variables/comments named "data files" to "base file" > > > Key: HUDI-541 > URL: https://issues.apache.org/jira/browse/HUDI-541 > Project: Apache Hudi > Issue Type: Improvement > Components: code-quality, dev-experience >Reporter: Vinoth Chandar >Assignee: Pratyaksh Sharma >Priority: Major > Labels: new-to-hudi, pull-request-available > Fix For: 0.12.0 > > > Per the cWiki design and architecture page, we should converge on the same terminology. > We have _HoodieBaseFile_; we should ensure all variables of this type are > named _baseFile_ or _bf_, as opposed to _dataFile_ or _df_. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [hudi] yihua commented on pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base
yihua commented on PR #1650: URL: https://github.com/apache/hudi/pull/1650#issuecomment-1146205989 @pratyakshsharma could you rebase the PR on the latest master given there are conflicts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin opened a new pull request, #5746: [WIP] Optimize performance of Column Stats filtering seq
alexeykudinkin opened a new pull request, #5746: URL: https://github.com/apache/hudi/pull/5746 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request Currently, from our benchmarking, Column Stats has a somewhat static overhead of ~1-2s / table (~7 files). This PR is taking a stab at eliminating this overhead: - Avoiding capturing heavy objects in closures, which requires extensive cleaning by Spark - ... ## Brief change log - TBD ## Verify this pull request This pull request is already covered by existing tests, such as *(please describe tests)*. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
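On the first bullet above: a common source of such per-query overhead in Spark is accidentally capturing a heavy enclosing object in a transformation closure. An illustrative Scala sketch (not the PR's actual change, and all names here are hypothetical) of the usual remedy, copying the needed field into a local val:

```scala
import org.apache.spark.rdd.RDD

// Referencing a member of the enclosing class inside `filter` would capture `this`,
// forcing Spark's ClosureCleaner to analyze and serialize the whole object per query.
class ColumnStatsPruner(allStats: Map[String, (Long, Long)] /* file -> (min, max) */) {
  def candidateFiles(files: RDD[String], value: Long): RDD[String] = {
    // Copy only what the task needs into a local val; the closure now captures
    // just `stats`, keeping serialization and closure cleaning cheap.
    val stats = allStats
    files.filter { file =>
      stats.get(file).forall { case (min, max) => min <= value && value <= max }
    }
  }
}
```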
[GitHub] [hudi] vinothchandar closed pull request #1946: [HUDI-1176]Upgrade tp log4j2
vinothchandar closed pull request #1946: [HUDI-1176]Upgrade tp log4j2 URL: https://github.com/apache/hudi/pull/1946 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #1946: [HUDI-1176]Upgrade tp log4j2
vinothchandar commented on PR #1946: URL: https://github.com/apache/hudi/pull/1946#issuecomment-1146145764 Closing in favor of #5366 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #1637: [WIP] Adding benchmark for some of the write operations in Hudi using jmh
vinothchandar commented on PR #1637: URL: https://github.com/apache/hudi/pull/1637#issuecomment-1146144647 Closing this over the other perf efforts -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar closed pull request #1637: [WIP] Adding benchmark for some of the write operations in Hudi using jmh
vinothchandar closed pull request #1637: [WIP] Adding benchmark for some of the write operations in Hudi using jmh URL: https://github.com/apache/hudi/pull/1637 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar closed pull request #1514: [WIP] [HUDI-774] Addressing incorrect Spark to Avro schema generation
vinothchandar closed pull request #1514: [WIP] [HUDI-774] Addressing incorrect Spark to Avro schema generation URL: https://github.com/apache/hudi/pull/1514 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #1514: [WIP] [HUDI-774] Addressing incorrect Spark to Avro schema generation
vinothchandar commented on PR #1514: URL: https://github.com/apache/hudi/pull/1514#issuecomment-1146144129 I think the default values are not addressed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on pull request #5743: [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables
alexeykudinkin commented on PR #5743: URL: https://github.com/apache/hudi/pull/5743#issuecomment-1146142725 @leesf can you please add a description to the PR and also the Jira? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4185) Evaluate alternatives to using "hoodie.properties" as state store for Metadata Table
[ https://issues.apache.org/jira/browse/HUDI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4185: -- Fix Version/s: 0.12.0 > Evaluate alternatives to using "hoodie.properties" as state store for > Metadata Table > > > Key: HUDI-4185 > URL: https://issues.apache.org/jira/browse/HUDI-4185 > Project: Apache Hudi > Issue Type: Bug >Reporter: Alexey Kudinkin >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 0.12.0 > > > Currently the Metadata Table uses the "hoodie.properties" file as a state store, > adding properties that reflect the state of the metadata table being indexed. > This is creating some issues (for example, HUDI-4138) with respect to the > "hoodie.properties" lifecycle, as most of the existing code assumes > that the file is (mostly) immutable. > We should re-evaluate our usage of "hoodie.properties" as a state store, given > that it has ripple effects on the existing components. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-4185) Evaluate alternatives to using "hoodie.properties" as state store for Metadata Table
[ https://issues.apache.org/jira/browse/HUDI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4185: -- Priority: Blocker (was: Major) > Evaluate alternatives to using "hoodie.properties" as state store for > Metadata Table > > > Key: HUDI-4185 > URL: https://issues.apache.org/jira/browse/HUDI-4185 > Project: Apache Hudi > Issue Type: Bug >Reporter: Alexey Kudinkin >Assignee: Sagar Sumit >Priority: Blocker > > Currently the Metadata Table uses the "hoodie.properties" file as a state store, > adding properties that reflect the state of the metadata table being indexed. > This is creating some issues (for example, HUDI-4138) with respect to the > "hoodie.properties" lifecycle, as most of the existing code assumes > that the file is (mostly) immutable. > We should re-evaluate our usage of "hoodie.properties" as a state store, given > that it has ripple effects on the existing components. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4185) Evaluate alternatives to using "hoodie.properties" as state store for Metadata Table
Alexey Kudinkin created HUDI-4185: - Summary: Evaluate alternatives to using "hoodie.properties" as state store for Metadata Table Key: HUDI-4185 URL: https://issues.apache.org/jira/browse/HUDI-4185 Project: Apache Hudi Issue Type: Bug Reporter: Alexey Kudinkin Assignee: Sagar Sumit Currently the Metadata Table uses the "hoodie.properties" file as a state store, adding properties that reflect the state of the metadata table being indexed. This is creating some issues (for example, HUDI-4138) with respect to the "hoodie.properties" lifecycle, as most of the existing code assumes that the file is (mostly) immutable. We should re-evaluate our usage of "hoodie.properties" as a state store, given that it has ripple effects on the existing components. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4184) Creating external table in Spark SQL modifies "hoodie.properties"
Alexey Kudinkin created HUDI-4184: - Summary: Creating external table in Spark SQL modifies "hoodie.properties" Key: HUDI-4184 URL: https://issues.apache.org/jira/browse/HUDI-4184 Project: Apache Hudi Issue Type: Bug Reporter: Alexey Kudinkin Assignee: Sagar Sumit My setup was as follows: # There's a table existing in one AWS account. # I'm trying to access that table from Spark SQL from _another_ AWS account that only has read permissions on the bucket with the table. # Now, when issuing the "CREATE TABLE" Spark SQL command, it fails because Hudi tries to modify the "hoodie.properties" file for whatever reason, even though I'm not modifying the table and am just trying to create the table in the catalog. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [hudi] yadav-jai opened a new pull request, #5745: [MINOR][UI]Resized the blog images , tags onHover:blue, readingTime only visible in blogs page
yadav-jai opened a new pull request, #5745: URL: https://github.com/apache/hudi/pull/5745 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request The images earlier had a distorted width, so their sizing was fixed so that they look clear: https://user-images.githubusercontent.com/97013124/171902828-7306133c-6429-48b4-a78a-7f542fa00cdd.png Removed the reading time from the blog list page so that it is only visible on the blog's own page. Made the tags black by default and blue on hover. ## Brief change log - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request This pull request is a trivial rework / code cleanup without any test coverage. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
leesf commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889010463 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark3Analysis.scala: ## @@ -45,16 +45,22 @@ case class HoodieSpark3Analysis(sparkSession: SparkSession) extends Rule[Logical with SparkAdapterSupport with ProvidesHoodieConfig { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown { -case dsv2 @ DataSourceV2Relation(d: HoodieInternalV2Table, _, _, _, _) => - val output = dsv2.output - val catalogTable = if (d.catalogTable.isDefined) { -Some(d.v1Table) - } else { -None - } - val relation = new DefaultSource().createRelation(new SQLContext(sparkSession), -buildHoodieConfig(d.hoodieCatalogTable)) - LogicalRelation(relation, output, catalogTable, isStreaming = false) +// NOTE: This step is required since Hudi relations don't currently implement DS V2 Read API +case dsv2 @ DataSourceV2Relation(tbl: HoodieInternalV2Table, _, _, _, _) => + val qualifiedTableName = QualifiedTableName(tbl.v1Table.database, tbl.v1Table.identifier.table) + val catalog = sparkSession.sessionState.catalog + + catalog.getCachedPlan(qualifiedTableName, () => { Review Comment: > @vinothchandar does reverting back to v1 mean dropping `HoodieCatalog`? If so, I do not think it is a good idea, since users on 0.11.0 would be using the `HoodieCatalog` config, and the behavior would change if we drop `HoodieCatalog`. And please see my comment https://github.com/apache/hudi/pull/5737/files#r889018883 above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146087261 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * 3007879a9a938a65b1f7f9174c23f22f1bd82145 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9060) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] fzong76 commented on issue #5735: No hudi dataset was saved to s3
fzong76 commented on issue #5735: URL: https://github.com/apache/hudi/issues/5735#issuecomment-1146083416 Yes. I just started with one table since it's the first time I tried HoodieMultiTableDeltaStreamer. There is no exception in the logs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on issue #5728: [SUPPORT] Flink support Timeline-server-based marker
danny0405 commented on issue #5728: URL: https://github.com/apache/hudi/issues/5728#issuecomment-1146071893 > Email received! Thank you! 薛超 It should be supported: https://issues.apache.org/jira/browse/HUDI-2767 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
vinothchandar commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889028951 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark3Analysis.scala: ## @@ -45,16 +45,22 @@ case class HoodieSpark3Analysis(sparkSession: SparkSession) extends Rule[Logical with SparkAdapterSupport with ProvidesHoodieConfig { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown { -case dsv2 @ DataSourceV2Relation(d: HoodieInternalV2Table, _, _, _, _) => - val output = dsv2.output - val catalogTable = if (d.catalogTable.isDefined) { -Some(d.v1Table) - } else { -None - } - val relation = new DefaultSource().createRelation(new SQLContext(sparkSession), -buildHoodieConfig(d.hoodieCatalogTable)) - LogicalRelation(relation, output, catalogTable, isStreaming = false) +// NOTE: This step is required since Hudi relations don't currently implement DS V2 Read API +case dsv2 @ DataSourceV2Relation(tbl: HoodieInternalV2Table, _, _, _, _) => + val qualifiedTableName = QualifiedTableName(tbl.v1Table.database, tbl.v1Table.identifier.table) + val catalog = sparkSession.sessionState.catalog + + catalog.getCachedPlan(qualifiedTableName, () => { Review Comment: I am just asking for ideas to fix this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1146049163 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * c02afe06f4b0d02291112351f62b1f4046faccc1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9055) * 3007879a9a938a65b1f7f9174c23f22f1bd82145 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
leesf commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889010463 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark3Analysis.scala: ## @@ -45,16 +45,22 @@ case class HoodieSpark3Analysis(sparkSession: SparkSession) extends Rule[Logical with SparkAdapterSupport with ProvidesHoodieConfig { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown { -case dsv2 @ DataSourceV2Relation(d: HoodieInternalV2Table, _, _, _, _) => - val output = dsv2.output - val catalogTable = if (d.catalogTable.isDefined) { -Some(d.v1Table) - } else { -None - } - val relation = new DefaultSource().createRelation(new SQLContext(sparkSession), -buildHoodieConfig(d.hoodieCatalogTable)) - LogicalRelation(relation, output, catalogTable, isStreaming = false) +// NOTE: This step is required since Hudi relations don't currently implement DS V2 Read API +case dsv2 @ DataSourceV2Relation(tbl: HoodieInternalV2Table, _, _, _, _) => + val qualifiedTableName = QualifiedTableName(tbl.v1Table.database, tbl.v1Table.identifier.table) + val catalog = sparkSession.sessionState.catalog + + catalog.getCachedPlan(qualifiedTableName, () => { Review Comment: > @vinothchandar does reverting back to v1 mean dropping `HoodieCatalog`? If so, I do not think it is a good idea, since users on 0.11.0 would be using the `HoodieCatalog` config, and the behavior would change if we drop `HoodieCatalog`, which is not a good idea. And please see my comment https://github.com/apache/hudi/pull/5737/files#r889018883 above -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
leesf commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889018883 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark3Analysis.scala: ## @@ -45,16 +45,22 @@ case class HoodieSpark3Analysis(sparkSession: SparkSession) extends Rule[Logical with SparkAdapterSupport with ProvidesHoodieConfig { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown { -case dsv2 @ DataSourceV2Relation(d: HoodieInternalV2Table, _, _, _, _) => - val output = dsv2.output - val catalogTable = if (d.catalogTable.isDefined) { -Some(d.v1Table) - } else { -None - } - val relation = new DefaultSource().createRelation(new SQLContext(sparkSession), -buildHoodieConfig(d.hoodieCatalogTable)) - LogicalRelation(relation, output, catalogTable, isStreaming = false) +// NOTE: This step is required since Hudi relations don't currently implement DS V2 Read API +case dsv2 @ DataSourceV2Relation(tbl: HoodieInternalV2Table, _, _, _, _) => Review Comment: The changes look good, and I think the changes here solve the problem of passing the schema to the V1 implementation (i.e., DefaultSource); are any other changes needed? @alexeykudinkin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899 ] leesf edited comment on HUDI-4178 at 6/3/22 2:46 PM: - [~alexey.kudinkin] does `making them fetch the schema from storage (either from commit's metadata or data file) every time.` mean the schema is fetched only once per write operation, or many times per write operation? And how much does fetching from storage affect performance? was (Author: xleesf): [~alexey.kudinkin] does `making them fetch the schema from storage (either from commit's metadata or data file) every time.` mean the schema is fetched only once per write operation, or many times per write operation? > HoodieSpark3Analysis does not pass schema from Spark Catalog > > > Key: HUDI-4178 > URL: https://issues.apache.org/jira/browse/HUDI-4178 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.1 > > > Currently, the HoodieSpark3Analysis rule does not pass the table's schema from the > Spark Catalog to Hudi's relations, making them fetch the schema from storage > (either from a commit's metadata or a data file) every time. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899 ] leesf edited comment on HUDI-4178 at 6/3/22 2:46 PM: - [~alexey.kudinkin] hi, does `making them fetch the schema from storage (either from commit's metadata or data file) every time.` mean the schema is fetched only once per write operation, or many times per write operation? And how much does fetching from storage affect performance? was (Author: xleesf): [~alexey.kudinkin] does `making them fetch the schema from storage (either from commit's metadata or data file) every time.` mean the schema is fetched only once per write operation, or many times per write operation? And how much does fetching from storage affect performance? > HoodieSpark3Analysis does not pass schema from Spark Catalog > > > Key: HUDI-4178 > URL: https://issues.apache.org/jira/browse/HUDI-4178 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.1 > > > Currently, the HoodieSpark3Analysis rule does not pass the table's schema from the > Spark Catalog to Hudi's relations, making them fetch the schema from storage > (either from a commit's metadata or a data file) every time. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899 ] leesf commented on HUDI-4178: - [~alexey.kudinkin] does `making them fetch the schema from storage (either from commit's metadata or data file) every time.` mean the schema is fetched only once per write operation, or many times per write operation? > HoodieSpark3Analysis does not pass schema from Spark Catalog > > > Key: HUDI-4178 > URL: https://issues.apache.org/jira/browse/HUDI-4178 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.1 > > > Currently, the HoodieSpark3Analysis rule does not pass the table's schema from the > Spark Catalog to Hudi's relations, making them fetch the schema from storage > (either from a commit's metadata or a data file) every time. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [hudi] leesf commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
leesf commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r889010463 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark3Analysis.scala: ## @@ -45,16 +45,22 @@ case class HoodieSpark3Analysis(sparkSession: SparkSession) extends Rule[Logical with SparkAdapterSupport with ProvidesHoodieConfig { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown { -case dsv2 @ DataSourceV2Relation(d: HoodieInternalV2Table, _, _, _, _) => - val output = dsv2.output - val catalogTable = if (d.catalogTable.isDefined) { -Some(d.v1Table) - } else { -None - } - val relation = new DefaultSource().createRelation(new SQLContext(sparkSession), -buildHoodieConfig(d.hoodieCatalogTable)) - LogicalRelation(relation, output, catalogTable, isStreaming = false) +// NOTE: This step is required since Hudi relations don't currently implement DS V2 Read API +case dsv2 @ DataSourceV2Relation(tbl: HoodieInternalV2Table, _, _, _, _) => + val qualifiedTableName = QualifiedTableName(tbl.v1Table.database, tbl.v1Table.identifier.table) + val catalog = sparkSession.sessionState.catalog + + catalog.getCachedPlan(qualifiedTableName, () => { Review Comment: > @vinothchandar does reverting back to v1 mean dropping `HoodieCatalog`? If so, I do not think it is a good idea, since users on 0.11.0 would be using the `HoodieCatalog` config, and the behavior would change if we drop `HoodieCatalog`, which is not a good idea. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bkosuru commented on issue #5741: [SUPPORT] Hudi table copy failed for some partitions in 0.11.0
bkosuru commented on issue #5741: URL: https://github.com/apache/hudi/issues/5741#issuecomment-1145972813 No issues if I set option("hoodie.metadata.enable", false) for the writer in 0.11.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
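For readers hitting the same issue, a hypothetical Scala sketch of the workaround described above (disabling the metadata table on the writer); the table name, key fields, and path below are illustrative, not taken from the reporter's job:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Write to a Hudi table with the metadata table disabled on the writer side.
def writeWithoutMetadataTable(df: DataFrame, basePath: String): Unit = {
  df.write.format("hudi")
    .option("hoodie.table.name", "my_table")                  // illustrative
    .option("hoodie.datasource.write.recordkey.field", "id")  // illustrative
    .option("hoodie.datasource.write.precombine.field", "ts") // illustrative
    .option("hoodie.metadata.enable", "false")                // the workaround reported above
    .mode(SaveMode.Append)
    .save(basePath)
}
```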
[GitHub] [hudi] YannByron commented on pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC
YannByron commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1145952554 @vinothchandar > Actually what I proposed, everything uses CDC blocks. Just that when we are deriving on-the-fly we don't write before and after into the CDC blocks In this case, do you mean that only `op` and `_hoodie_record_key` will be kept in the cdc block? Then we iterate over this cdc block, get the after-image value and the inserted value from the new file (base file or log file), and get the before-image value and the deleted value from the previous file slice. If so, IMO, the cdc blocks in this case can be omitted, because we can iterate over the log file or the base file (applying the filter `_hoodie_commit_time` = the current commit time) and continue with the next operations. > everything uses CDC blocks. In my design the cdc block has the whole cdc information, and the cdc block will be written out only when `HoodieMergeHandle` is called, not always. Other scenarios can re-use the existing files. I am afraid there is still a gap here, so I want to stress this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
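To make the terminology in this exchange concrete, a hypothetical Scala shape for a CDC record; the field names are illustrative, not the RFC's final schema:

```scala
// A "thin" cdc block would persist only the operation and record key, deriving
// before/after images on the fly from the base/log files of the file slice, while a
// "full" block (written when HoodieMergeHandle runs, per the proposal) also
// materializes the images.
case class CdcRecord(
    op: String,                              // "insert" / "update" / "delete"
    hoodieRecordKey: String,                 // maps to _hoodie_record_key
    before: Option[Map[String, Any]] = None, // before-image; absent in the thin variant
    after: Option[Map[String, Any]] = None   // after-image; absent in the thin variant
)
```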
[GitHub] [hudi] YannByron commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Fixing `HoodieSpark3Analysis` missing to pass schema from Spark Catalog
YannByron commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r888915090 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSpark3Analysis.scala: ## @@ -45,16 +45,22 @@ case class HoodieSpark3Analysis(sparkSession: SparkSession) extends Rule[Logical with SparkAdapterSupport with ProvidesHoodieConfig { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDown { -case dsv2 @ DataSourceV2Relation(d: HoodieInternalV2Table, _, _, _, _) => - val output = dsv2.output - val catalogTable = if (d.catalogTable.isDefined) { -Some(d.v1Table) - } else { -None - } - val relation = new DefaultSource().createRelation(new SQLContext(sparkSession), -buildHoodieConfig(d.hoodieCatalogTable)) - LogicalRelation(relation, output, catalogTable, isStreaming = false) +// NOTE: This step is required since Hudi relations don't currently implement DS V2 Read API +case dsv2 @ DataSourceV2Relation(tbl: HoodieInternalV2Table, _, _, _, _) => + val qualifiedTableName = QualifiedTableName(tbl.v1Table.database, tbl.v1Table.identifier.table) + val catalog = sparkSession.sessionState.catalog + + catalog.getCachedPlan(qualifiedTableName, () => { Review Comment: No. v1 and v2 just differ in the internal workings of Spark, plus a few extra configurations. The interface to the user is not actually affected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ws-dohashi commented on issue #2089: Reading MOR Tables - Not Working
ws-dohashi commented on issue #2089: URL: https://github.com/apache/hudi/issues/2089#issuecomment-1145923835 @harishchanderramesh Hi! Wondering if AWS was eventually able to resolve this for you, and if so, how you did it? We are currently running into a similar `org.apache.http.NoHttpResponseException: The target server failed to respond` issue, and hoping that if you were able to find a resolution it could help us. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] GnsCy commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11
GnsCy commented on issue #5729: URL: https://github.com/apache/hudi/issues/5729#issuecomment-1145872951 @xushiyan wouldn't running the demo on docker eliminate any environment setup discrepancies? I am running the setup on a clean Ubuntu OS. P.S. Btw, I managed to run the same setup successfully for v0.10.1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pratyakshsharma commented on issue #5735: No hudi dataset was saved to s3
pratyakshsharma commented on issue #5735: URL: https://github.com/apache/hudi/issues/5735#issuecomment-1145794222 @fzong76 Do you see any exception in the logs? > but failed with --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer when trying to load multiple tables. I see you are only trying to load a single table `fei_hudi_test.table1` as mentioned in the config file but you mentioned "trying to load multiple tables". Am I missing something here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-4183. --- Resolution: Fixed > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4183: Fix Version/s: 0.12.0 > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-4183. - > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[GitHub] [hudi] yuzhaojing commented on issue #5728: [SUPPORT] Flink support Timeline-server-based marker
yuzhaojing commented on issue #5728: URL: https://github.com/apache/hudi/issues/5728#issuecomment-1145776631 This is a great proposal, I will support this feature before the 0.12 release. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 3759a38b99 [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743) 3759a38b99 is described below commit 3759a38b99cf9bb7540cd1881879cc0547a25e70 Author: leesf <490081...@qq.com> AuthorDate: Fri Jun 3 17:16:48 2022 +0800 [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables (#5743) --- .../apache/spark/sql/hudi/TestCreateTable.scala| 31 ++ .../spark/sql/hudi/catalog/HoodieCatalog.scala | 10 --- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestCreateTable.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestCreateTable.scala index cad30eca24..7091de4a8e 100644 --- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestCreateTable.scala +++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestCreateTable.scala @@ -781,4 +781,35 @@ class TestCreateTable extends HoodieSparkSqlTestBase { val tablePath = s"${dbPath}/${tableName}" assertResult(false)(existsPath(tablePath)) } + + test("Test Create Non-Hudi Table(Parquet Table)") { +val databaseName = "test_database" +spark.sql(s"create database if not exists $databaseName") +spark.sql(s"use $databaseName") + +val tableName = generateTableName +// Create a managed table +spark.sql( + s""" + | create table $tableName ( + | id int, + | name string, + | price double, + | ts long + | ) using parquet + """.stripMargin) +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tableName)) +assertResult(tableName)(table.identifier.table) +assertResult("parquet")(table.provider.get) +assertResult(CatalogTableType.MANAGED)(table.tableType) +assertResult( + Seq( +StructField("id", IntegerType), +StructField("name", StringType), +StructField("price", DoubleType), +StructField("ts", LongType)) +)(table.schema.fields) + +spark.sql("use default") + } } diff --git a/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala b/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala index 67012c7723..e1c2f228fa 100644 --- a/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala +++ b/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala @@ -118,9 +118,13 @@ class HoodieCatalog extends DelegatingCatalogExtension schema: StructType, partitions: Array[Transform], properties: util.Map[String, String]): Table = { -val locUriAndTableType = deduceTableLocationURIAndTableType(ident, properties) -createHoodieTable(ident, schema, locUriAndTableType, partitions, properties, - Map.empty, Option.empty, TableCreationMode.CREATE) +if (sparkAdapter.isHoodieTable(properties)) { + val locUriAndTableType = deduceTableLocationURIAndTableType(ident, properties) + createHoodieTable(ident, schema, locUriAndTableType, partitions, properties, +Map.empty, Option.empty, TableCreationMode.CREATE) +} else { + super.createTable(ident, schema, partitions, properties) +} } override def tableExists(ident: Identifier): Boolean = super.tableExists(ident)
[GitHub] [hudi] leesf merged pull request #5743: [HUDI-4183] Fix using HoodieCatalog to create non-hudi tables
leesf merged PR #5743: URL: https://github.com/apache/hudi/pull/5743 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] BuddyJack commented on issue #5728: [SUPPORT] Flink support Timeline-server-based marker
BuddyJack commented on issue #5728: URL: https://github.com/apache/hudi/issues/5728#issuecomment-1145747140 Email received! Thank you! 薛超 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on issue #5728: [SUPPORT] Flink support Timeline-server-based marker
xushiyan commented on issue #5728: URL: https://github.com/apache/hudi/issues/5728#issuecomment-1145746881 @yuzhaojing can you please take this and advise? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11
xushiyan commented on issue #5729: URL: https://github.com/apache/hudi/issues/5729#issuecomment-1145745363 I suspect this is a discrepancy in your environment setup. We have integration tests running end to end with the docker demo for every commit. And we certainly tested the deltastreamer: org.apache.hudi.integ.ITTestHoodieDemo#ingestFirstBatchAndHiveSync -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org