[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot commented on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992199532 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227) * 6184de8bc6d18499d0ff49a0ef8f92f8ba20ba6e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot removed a comment on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992195331 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
[GitHub] [hudi] waywtdcc commented on issue #4249: [SUPPORT]FLINK CDC WRITE HUDI, restart job get exception:org.apache.hudi.org.apache.avro.InvalidAvroMagicException: Not an Avro data file
waywtdcc commented on issue #4249: URL: https://github.com/apache/hudi/issues/4249#issuecomment-992195627 > Not an Avro data file Is the 0.10 release version okay? I see a similar exception reported against the 0.10 RC here: #4204
[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot commented on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992195331 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot removed a comment on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992165761 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221) * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767475656

## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java

@@ -166,20 +165,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
     // Check if the necessary table exists
     boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
-    // Get the parquet schema for this table looking at the latest commit
-    MessageType schema = hoodieHiveClient.getDataSchema();
-
-    // Currently HoodieBootstrapRelation does support reading bootstrap MOR rt table,
-    // so we disable the syncAsSparkDataSourceTable here to avoid read such kind table
-    // by the data source way (which will use the HoodieBootstrapRelation).
-    // TODO after we support bootstrap MOR rt table in HoodieBootstrapRelation[HUDI-2071], we can remove this logical.
-    if (hoodieHiveClient.isBootstrap()
-        && hoodieHiveClient.getTableType() == HoodieTableType.MERGE_ON_READ
-        && !readAsOptimized) {
-      cfg.syncAsSparkDataSourceTable = false;
+    // check if isDeletePartition
+    boolean isDeletePartition = hoodieHiveClient.isDeletePartition();

Review comment: rename the variable to `isDropPartition`, and rename the method to `isDropPartition` as well
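For illustration, the change leesf is asking for might look like the following. This is a hypothetical, self-contained sketch, not Hudi code: `HoodieHiveClientStub` stands in for the real `hoodieHiveClient`, and `shouldSyncPartitionDrops` stands in for the reviewed branch of `syncHoodieTable`.

```java
// Hypothetical sketch of the review suggestion: both the local flag and the
// client accessor use "drop" rather than "delete" terminology.
public class SyncRenameSketch {

  // Stand-in for hoodieHiveClient; the real class lives in hudi-hive-sync.
  static class HoodieHiveClientStub {
    private final boolean dropPartition;

    HoodieHiveClientStub(boolean dropPartition) {
      this.dropPartition = dropPartition;
    }

    // Renamed from isDeletePartition(), per the review comment.
    boolean isDropPartition() {
      return dropPartition;
    }
  }

  // Mirrors the reviewed snippet: read the flag once, then branch on it.
  static boolean shouldSyncPartitionDrops(HoodieHiveClientStub client) {
    boolean isDropPartition = client.isDropPartition();
    return isDropPartition;
  }
}
```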
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767475329

## File path: hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/HoodieDLAClient.java

@@ -287,6 +287,11 @@ public void updatePartitionsToTable(String tableName, List changedPartit
     }
   }

+  @Override
+  public void dropPartitionsToTable(String tableName, List partitionsToDelete) {
+    throw new UnsupportedOperationException("Not support dropPartitionsToTables yet.");

Review comment: `dropPartitionsToTable` (the exception message says `dropPartitionsToTables`)
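Concretely, the fix leesf points at is just aligning the message text with the method name. A minimal, hypothetical stand-alone version (not the real `HoodieDLAClient`, which extends Hudi's sync-client hierarchy):

```java
import java.util.List;

// Hypothetical stand-alone sketch; the real method lives in HoodieDLAClient.
public class DlaClientSketch {
  // Message now matches the method name "dropPartitionsToTable"
  // (the reviewed diff misspelled it as "dropPartitionsToTables").
  public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) {
    throw new UnsupportedOperationException("Not support dropPartitionsToTable yet.");
  }
}
```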
[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot removed a comment on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992158754 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4226)
[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot commented on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992183402 ## CI report: * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4226)
[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Carl-Zhou-CN commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-992178782 I think it is possible, but I am not familiar with Athena. As long as Hudi can interact with the Glue Catalog, your problem should be solved. You may need help from someone else here. @nsivabalan do you have time to help?
[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot removed a comment on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992158732 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221) * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992165710 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4228)
[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot commented on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992165761 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221) * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992099257 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
[GitHub] [hudi] xiarixiaoyao commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
xiarixiaoyao commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992164844 @hudi-bot run azure
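For anyone hitting the decimal issue before a fix like PR #4253 lands: the setting the PR automates can be applied manually. A spark-defaults.conf fragment, offered as a workaround sketch (the key is Spark's `spark.sql.parquet.writeLegacyFormat` option, note the camel case; whether enabling it is appropriate depends on your downstream Parquet readers):

```
spark.sql.parquet.writeLegacyFormat  true
```

The same option can be set per-session via `--conf` or `SparkSession.builder().config(...)`.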
[GitHub] [hudi] zztttt edited a comment on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
zztttt edited a comment on issue #4072: URL: https://github.com/apache/hudi/issues/4072#issuecomment-992151992 > hmmm, seems strange. have you tried giving a diff warehouse dir? Yes, I have already tried changing the warehouse dir URL, but it didn't work. Using a remote metastore may be a better approach; I resolved this problem by adding the "spark.hadoop." prefix to the Hive configurations that are usually placed in hive-site.xml.
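The "spark.hadoop." trick zztttt describes can be sketched as a tiny helper (hypothetical code; Spark strips the `spark.hadoop.` prefix and passes the remainder into its Hadoop/Hive configuration, so properties that normally live in hive-site.xml can be supplied as Spark conf entries):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveConfToSpark {
  // Turn hive-site.xml style properties into Spark conf entries by
  // prefixing each key with "spark.hadoop.".
  public static Map<String, String> toSparkConf(Map<String, String> hiveSiteProps) {
    Map<String, String> sparkConf = new LinkedHashMap<>();
    hiveSiteProps.forEach((k, v) -> sparkConf.put("spark.hadoop." + k, v));
    return sparkConf;
  }

  public static void main(String[] args) {
    // Example: hive.metastore.uris from hive-site.xml becomes
    // spark.hadoop.hive.metastore.uris (host name here is illustrative).
    System.out.println(toSparkConf(Map.of(
        "hive.metastore.uris", "thrift://metastore-host:9083")));
  }
}
```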
[GitHub] [hudi] mincwang commented on issue #4227: [SUPPORT] java.lang.IllegalStateException: Duplicate key Option
mincwang commented on issue #4227: URL: https://github.com/apache/hudi/issues/4227#issuecomment-992159825 Hi @yanenze, I also have this problem. Have you pushed a patch to the GitHub repository?
[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot removed a comment on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992132673 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot removed a comment on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992081197 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot commented on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992158754 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4226)
[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
hudi-bot commented on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992158732 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221) * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 UNKNOWN
[GitHub] [hudi] Arun-kc commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Arun-kc commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-992158121 @Carl-Zhou-CN It's ok. I have tried `ALTER TABLE ADD PARTITION` before and it does work, but we have to specify the partitions manually. When there are a lot of partitions this is not a viable solution unless we can automate it. I could create a script to do this using boto3; that's doable. What I was trying to do is let Hudi handle this on its own, so that in Athena we can query the partitions directly without running any other queries. Is that possible?
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1602: -- Labels: core-flow-ds pull-request-available sev:critical (was: pull-request-available sev:critical) > Corrupted Avro schema extracted from parquet file > - > > Key: HUDI-1602 > URL: https://issues.apache.org/jira/browse/HUDI-1602 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Alexander Filipchik >Assignee: Nishith Agarwal >Priority: Major > Labels: core-flow-ds, pull-request-available, sev:critical > Fix For: 0.11.0 > > > We are running a HUDI deltastreamer on a very complex stream. Schema is > deeply nested, with several levels of hierarchy (avro schema is around 6600 > LOC). > > The version of HUDI that writes the dataset is 0.5-SNAPSHOT and we recently > started attempts to upgrade to the latest. However, latest HUDI can't read > the provided dataset. Exception I get: > > > {code:java} > Got exception while parsing the arguments:Got exception while parsing the > arguments:Found recursive reference in Avro schema, which can not be > processed by Spark:{ "type" : "record", "name" : "array", "fields" : [ { > "name" : "id", "type" : [ "null", "string" ], "default" : null }, { > "name" : "type", "type" : [ "null", "string" ], "default" : null }, { > "name" : "exist", "type" : [ "null", "boolean" ], "default" : null > } ]} Stack > trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive > reference in Avro schema, which can not be processed by Spark:{ "type" : > "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ > "null", "string" ], "default" : null }, { "name" : "type", "type" : > [ "null", "string" ], "default" : null }, { "name" : "exist", > "type" : [ "null", "boolean" ], "default" : null } ]} > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at >
[jira] [Updated] (HUDI-1850) Read on table fails if the first write to table failed
[ https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1850: -- Labels: core-flow-ds pull-request-available release-blocker sev:critical spark (was: pull-request-available release-blocker sev:critical spark) > Read on table fails if the first write to table failed > -- > > Key: HUDI-1850 > URL: https://issues.apache.org/jira/browse/HUDI-1850 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Vaibhav Sinha >Priority: Major > Labels: core-flow-ds, pull-request-available, release-blocker, > sev:critical, spark > Fix For: 0.11.0 > > Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png > > > {code:java} > ava.util.NoSuchElementException: No value present in Option > at org.apache.hudi.common.util.Option.get(Option.java:88) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.MergeOnReadSnapshotRelation.(MergeOnReadSnapshotRelation.scala:65) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at > 
org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at scala.Option.getOrElse(Option.scala:189) > ~[scala-library-2.12.10.jar:?] > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > {code} > The screenshot shows the files that got created before the write had failed. > > !Screenshot 2021-04-24 at 7.53.22 PM.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-945) Cleanup spillable map files eagerly as part of close
[ https://issues.apache.org/jira/browse/HUDI-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-945: - Labels: pull-request-available sev:high (was: pull-request-available sev:critical) > Cleanup spillable map files eagerly as part of close > > > Key: HUDI-945 > URL: https://issues.apache.org/jira/browse/HUDI-945 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Balaji Varadarajan >Assignee: Rajesh Mahindra >Priority: Major > Labels: pull-request-available, sev:high > Fix For: 0.11.0 > > > Currently, files used by the external spillable map are deleted on exit. For > spark-streaming/deltastreamer continuous-mode cases which run several > iterations, it is better to eagerly delete the files when closing the handles using > them. > We need to eagerly delete the files in the following cases: > # HoodieMergeHandle > # HoodieMergedLogRecordScanner > # SpillableMapBasedFileSystemView
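The idea in HUDI-945 can be illustrated with a toy handle (not Hudi code; `SpillFileHandle` and its methods are hypothetical) that deletes its spill file eagerly in `close()` instead of relying on JVM-exit cleanup, which never fires in long-running streaming jobs:

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy stand-in for a handle backed by an on-disk spill file.
public class SpillFileHandle implements Closeable {
  private final File spillFile;

  public SpillFileHandle() {
    try {
      this.spillFile = File.createTempFile("spillable_map_", ".data");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public File file() {
    return spillFile;
  }

  @Override
  public void close() {
    // Eagerly reclaim disk space as soon as the handle is done, rather than
    // waiting for JVM exit (e.g. via deleteOnExit()) in continuous-mode jobs.
    if (spillFile.exists()) {
      spillFile.delete();
    }
  }
}
```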
[jira] [Updated] (HUDI-1607) Decimal handling bug in SparkAvroPostProcessor
[ https://issues.apache.org/jira/browse/HUDI-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1607: -- Labels: core-flow-ds sev:critical user-support-issues (was: sev:critical user-support-issues) > Decimal handling bug in SparkAvroPostProcessor > --- > > Key: HUDI-1607 > URL: https://issues.apache.org/jira/browse/HUDI-1607 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jingwei Zhang >Priority: Major > Labels: core-flow-ds, sev:critical, user-support-issues > > This issue relates to HUDI-1343 (https://github.com/apache/hudi/pull/2192). > I think the purpose of HUDI-1343 was to bridge the difference between Avro > 1.8.2 (used by Hudi) and Avro 1.9.2 (used by the upstream system) through an internal > Struct type, in particular the incompatible forms for expressing a nullable type > between those two versions. > It was all good until I hit the type Decimal. Since it can be either FIXED or > BYTES, if an Avro schema contains a decimal type with BYTES as its literal > type, after this two-way conversion its literal type becomes FIXED instead. > This will cause an exception to be thrown in AvroConversionHelper, as the data > underneath is a HeapByteBuffer rather than a GenericFixed.
[GitHub] [hudi] zztttt commented on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
zztttt commented on issue #4072: URL: https://github.com/apache/hudi/issues/4072#issuecomment-992151992 > hmmm, seems strange. have you tried giving a diff warehouse dir? Yes, I have already tried changing the warehouse dir URL, but it didn't work. Using a remote metastore may be a better approach; I worked around this by adding the "spark.hadoop." prefix to the Hive configurations that are usually placed in hive-site.xml.
[jira] [Updated] (HUDI-2374) AvroDFSSource does not use the overridden schema to deserialize Avro binaries.
[ https://issues.apache.org/jira/browse/HUDI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2374: -- Labels: core-flow-ds sev:critical (was: sev:critical) > AvroDFSSource does not use the overridden schema to deserialize Avro binaries. > -- > > Key: HUDI-2374 > URL: https://issues.apache.org/jira/browse/HUDI-2374 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Affects Versions: 0.9.0 >Reporter: Xuan Huy Pham >Assignee: Alexey Kudinkin >Priority: Major > Labels: core-flow-ds, sev:critical > Fix For: 0.11.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > Hi, > I am not sure if AvroDFSSource is intended to ignore the source schema > from the designated schema provider class, but the current logic always uses the > Avro writer schema as the reader schema. > Logic as of release-0.9.0, Class: > {{org.apache.hudi.utilities.sources.AvroDFSSource}} > {code:java} > public class AvroDFSSource extends AvroSource {
>   private final DFSPathSelector pathSelector;
>
>   public AvroDFSSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession,
>       SchemaProvider schemaProvider) throws IOException {
>     super(props, sparkContext, sparkSession, schemaProvider);
>     this.pathSelector = DFSPathSelector
>         .createSourceSelector(props, sparkContext.hadoopConfiguration());
>   }
>
>   @Override
>   protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCkptStr, long sourceLimit) {
>     Pair<Option<String>, String> selectPathsWithMaxModificationTime =
>         pathSelector.getNextFilePathsAndMaxModificationTime(sparkContext, lastCkptStr, sourceLimit);
>     return selectPathsWithMaxModificationTime.getLeft()
>         .map(pathStr -> new InputBatch<>(Option.of(fromFiles(pathStr)), selectPathsWithMaxModificationTime.getRight()))
>         .orElseGet(() -> new InputBatch<>(Option.empty(), selectPathsWithMaxModificationTime.getRight()));
>   }
>
>   private JavaRDD<GenericRecord> fromFiles(String pathStr) {
>     sparkContext.setJobGroup(this.getClass().getSimpleName(), "Fetch Avro data from files");
>     JavaPairRDD<AvroKey, NullWritable> avroRDD = sparkContext.newAPIHadoopFile(pathStr, AvroKeyInputFormat.class,
>         AvroKey.class, NullWritable.class, sparkContext.hadoopConfiguration());
>     return avroRDD.keys().map(r -> ((GenericRecord) r.datum()));
>   }
> }
> {code} > The {{schemaProvider}} parameter is completely ignored in the constructor, > making {{AvroKeyInputFormat}} always use the writer schema to read. > As a result, we often see this in DeltaStreamer logs: > {code:java} > 21/08/30 10:17:24 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired. > 21/08/30 10:17:24 INFO AvroKeyInputFormat: Using a reader schema equal to the writer schema. > {code} > [https://hudi.apache.org/blog/2021/08/16/kafka-custom-deserializer] is a > nice blog post about an AvroKafkaSource that supports BACKWARD_TRANSITIVE > schema evolution. For DFS data, this is the main blocker. If we pass > the source schema from {{schemaProvider}}, we should be able to have the same > BACKWARD_TRANSITIVE schema evolution feature for DFS Avro data. > > Suggested Fix: Pass the source schema from {{schemaProvider}} to the Hadoop > configuration key {{avro.schema.input.key}} > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
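The suggested fix boils down to setting `avro.schema.input.key` on the Hadoop configuration before constructing the RDD. A stdlib-only sketch of that idea (a plain `Map` stands in for `org.apache.hadoop.conf.Configuration`; the method name is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested fix for HUDI-2374: push the schema provider's source
// schema JSON into the configuration key that AvroKeyInputFormat reads as the
// reader schema, so it no longer falls back to the writer schema.
public class ReaderSchemaFix {
    static final String AVRO_SCHEMA_INPUT_KEY = "avro.schema.input.key";

    // Returns a copy of the configuration with the reader schema applied, if one was provided.
    static Map<String, String> withReaderSchema(Map<String, String> hadoopConf, String sourceSchemaJson) {
        Map<String, String> out = new HashMap<>(hadoopConf);
        if (sourceSchemaJson != null) {
            out.put(AVRO_SCHEMA_INPUT_KEY, sourceSchemaJson);
        }
        return out;
    }
}
```

In the real source, the equivalent one-liner would set this key on `sparkContext.hadoopConfiguration()` using `schemaProvider.getSourceSchema().toString()` before calling `newAPIHadoopFile`.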
[jira] [Updated] (HUDI-2986) Deltastreamer continuous mode run into Too many open files exception
[ https://issues.apache.org/jira/browse/HUDI-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2986: -- Labels: core-flow-ds sev:critical (was: sev:critical) > Deltastreamer continuous mode run into Too many open files exception > > > Key: HUDI-2986 > URL: https://issues.apache.org/jira/browse/HUDI-2986 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer, Writer Core >Reporter: Raymond Xu >Priority: Blocker > Labels: core-flow-ds, sev:critical > > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 6 in stage 35202.0 failed 4 times, most recent failure: Lost task 6.3 in > stage 35202.0 (TID 1172485, ip-10-211-53-165.infra.usw2.zdsys.com, executor > 1): java.io.FileNotFoundException: > /mnt/yarn/usercache/hadoop/appcache/application_1638666447607_0001/blockmgr-3725bb05-2c9a-4073-80f6-4eaa335321c9/34/temp_shuffle_8f675a83-21ac-4908-b8da-1c8e25a59b8e > (Too many open files) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:106) > at > org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:119) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:251) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:157) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:95) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405) > at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2136) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2124) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2123) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2123) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:994) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:994) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:994) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2384) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2333) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2322) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:805) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2097) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2194) > at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1143) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:385) > at org.apache.spark.rdd.RDD.fold(RDD.scala:1137) > at > org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply$mcD$sp(DoubleRDDFunctions.scala:35) > at > org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35) > at > org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) >
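The trace above is classic file-descriptor exhaustion during shuffle. One environment-side check worth doing before digging into Hudi itself (commands are standard shell; the target value is illustrative, not from the ticket):

```shell
# Show the current soft limit on open files for this shell/process
ulimit -Sn
# Raise the soft limit (bounded by the hard limit) before launching the
# Spark executor; "|| true" keeps this from aborting if the hard limit is lower
ulimit -Sn 4096 || true
ulimit -Sn
```

On YARN, the limit that matters is the one inherited by the NodeManager's container processes, so the change usually belongs in the service's startup environment rather than an interactive shell.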
[GitHub] [hudi] yihua commented on issue #4230: [SUPPORT] org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file
yihua commented on issue #4230: URL: https://github.com/apache/hudi/issues/4230#issuecomment-992150600 @BenjMaq Could you try adding this config to disable timeline-server-based markers and check if the insert is successful? ``` set hoodie.write.markers.type=direct; ``` The problem is likely that the timeline server is missing or failed during the insert operation in Spark SQL. I'm going to dig into the root cause. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Carl-Zhou-CN commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-992150191 @Arun-kc Sorry, it seems I misunderstood; what needs to be done should be `ALTER TABLE ADD PARTITION` ![image](https://user-images.githubusercontent.com/67902676/145762846-007866d1-1bfe-46fe-b082-66e723007b92.png) ![image](https://user-images.githubusercontent.com/67902676/145762903-7599f309-b5b0-4eac-9e7c-d3b9ecafe9b8.png) https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html#querying-hudi-in-athena-creating-hudi-tables -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
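The `ALTER TABLE ADD PARTITION` suggestion from the Athena docs linked above might look like this against the table from this issue (partition value and S3 path are illustrative placeholders):

```sql
-- Register a partition Athena does not yet know about; repeat per partition value
ALTER TABLE my_hudi_table ADD IF NOT EXISTS
  PARTITION (creation_date = '2021-12-01')
  LOCATION 's3://<bucket>/tmp/myhudidataset_001/2021-12-01/';
```

Alternatively, enabling Hudi's Hive sync against the Glue catalog would register partitions automatically as they are written, removing the need for manual DDL.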
[jira] [Created] (HUDI-2996) Flink streaming reader 'skip_compaction' option does not work
Danny Chen created HUDI-2996: Summary: Flink streaming reader 'skip_compaction' option does not work Key: HUDI-2996 URL: https://issues.apache.org/jira/browse/HUDI-2996 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Affects Versions: 0.10.0 Reporter: Danny Chen Fix For: 0.11.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] laurieliyang commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.
laurieliyang commented on pull request #3859: URL: https://github.com/apache/hudi/pull/3859#issuecomment-992145998 > @laurieliyang Thanks for fixing the Chinese docs. Could you fix the conflicts with the latest asf-site? I have fixed the conflicts in `overview.md`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767437932 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCExecutor.java ## @@ -141,6 +142,11 @@ private String getHiveJdbcUrlWithDefaultDBName(String jdbcUrl) { } } + @Override + public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) { + throw new UnsupportedOperationException("No support for dropPartitionsToTable"); Review comment: why not support in jdbc mode? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767437393 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java ## @@ -122,6 +122,14 @@ public void updatePartitionsToTable(String tableName, List<String> changedPartitions) { ddlExecutor.updatePartitionsToTable(tableName, changedPartitions); } + /** + * Partition path has changed - drop the following partitions. + */ + @Override + public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) { Review comment: partitionsToDrop -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767437625 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java ## @@ -147,6 +155,14 @@ public void updateTableProperties(String tableName, Map<String, String> tableProperties) { * Generate a list of PartitionEvent based on the changes required. */ List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions) { + return getPartitionEvents(tablePartitions, partitionStoragePartitions, false); + } + + /** + * Iterate over the storage partitions and find if there are any new partitions that need to be added or updated. + * Generate a list of PartitionEvent based on the changes required. + */ + List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions, boolean isDeletePartition) { Review comment: ditto -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
hudi-bot commented on pull request #4259: URL: https://github.com/apache/hudi/pull/4259#issuecomment-992142842 ## CI report: * a4e9d227602017b1b1db0d2ef706afad0ea09158 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4224) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
hudi-bot removed a comment on pull request #4259: URL: https://github.com/apache/hudi/pull/4259#issuecomment-992123974 ## CI report: * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141) * a4e9d227602017b1b1db0d2ef706afad0ea09158 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4224) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767436889 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -331,19 +338,32 @@ private boolean syncSchema(String tableName, boolean tableExists, boolean useRea * Syncs the list of storage partitions passed in (checks if the partition is in hive, if not adds it or if the * partition path does not match, it updates the partition path). */ - private boolean syncPartitions(String tableName, List<String> writtenPartitionsSince) { + private boolean syncPartitions(String tableName, List<String> writtenPartitionsSince, boolean isDeletePartition) { Review comment: rename to `isDropPartition` to align with PartitionEventType.DROP? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767436334 ## File path: hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/HoodieDLAClient.java ## @@ -287,6 +287,11 @@ public void updatePartitionsToTable(String tableName, List<String> changedPartitions) { } } + @Override + public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) { + throw new UnsupportedOperationException("No support for dropPartitionsToTable"); Review comment: Not support dropPartitionsToTables yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions
leesf commented on a change in pull request #4291: URL: https://github.com/apache/hudi/pull/4291#discussion_r767435793 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -414,6 +414,25 @@ public Schema getLatestSchema(Schema writeSchema, boolean convertTableSchemaToAd return latestSchema; } + + /** + * Get Last commit's Metadata. + */ + public HoodieCommitMetadata getLatestCommitMetadata() { Review comment: use `Option` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
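leesf's "use `Option`" suggestion for `getLatestCommitMetadata()` can be sketched with the JDK's `java.util.Optional` standing in for Hudi's own `Option` type (the class below is a hypothetical illustration, not Hudi code):

```java
import java.util.Optional;

// Sketch: when the timeline has no completed commit, return an empty Optional
// rather than null or a thrown exception, so callers must handle the absent case.
public class CommitMetadataLookup {
    private final String latestCommitJson; // null when no commit exists yet

    public CommitMetadataLookup(String latestCommitJson) {
        this.latestCommitJson = latestCommitJson;
    }

    public Optional<String> getLatestCommitMetadata() {
        return Optional.ofNullable(latestCommitJson);
    }
}
```

The real method would wrap `HoodieCommitMetadata` instead of a JSON string; the point is only the `Option`-returning shape.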
[jira] [Updated] (HUDI-2990) Sync to HMS when deleting partitions
[ https://issues.apache.org/jira/browse/HUDI-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Forward Xu updated HUDI-2990: - Summary: Sync to HMS when deleting partitions (was: Delete partitions without metadata sync to hms) > Sync to HMS when deleting partitions > > > Key: HUDI-2990 > URL: https://issues.apache.org/jira/browse/HUDI-2990 > Project: Apache Hudi > Issue Type: Bug >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot removed a comment on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992131759 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot commented on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992132673 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2994) Add judgement to existed partitionPath in the catch code block for HUDI-2743
[ https://issues.apache.org/jira/browse/HUDI-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2994: - Labels: pull-request-available (was: ) > Add judgement to existed partitionPath in the catch code block for HUDI-2743 > > > Key: HUDI-2994 > URL: https://issues.apache.org/jira/browse/HUDI-2994 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Affects Versions: 0.10.0 >Reporter: WangMinChao >Assignee: WangMinChao >Priority: Major > Labels: pull-request-available > Attachments: image-2021-12-13-13-25-33-402.png > > > !image-2021-12-13-13-25-33-402.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot removed a comment on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992130078 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot commented on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992131759 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Arun-kc commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Arun-kc commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-992131104 @Carl-Zhou-CN I tried `ALTER TABLE table_name RECOVER PARTITIONS;`, but it's not working. ![image](https://user-images.githubusercontent.com/22231409/145756912-c82b44ee-8d03-4802-8412-ddf2919aa766.png) hoodie.datasource.hive_sync.use_jdbc -> false Tried this approach too, but to no avail. @nikita-sheremet-clearscale Yes, I'm using Glue in this scenario. I'm using a Hudi connector that I subscribed to when the version was 0.4. Now in the marketplace the version is shown as 0.9.0. I'm not sure if the subscribed version gets updated automatically. I will check on the IP part and will let you know. Just to let you know, I'm creating the Hudi table manually in Athena using the following DDL ```sql CREATE EXTERNAL TABLE `my_hudi_table`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` string, `last_update_time` string) PARTITIONED BY ( `creation_date` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3:///tmp/myhudidataset_001' ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs
yihua commented on a change in pull request #4141: URL: https://github.com/apache/hudi/pull/4141#discussion_r767428311 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import org.apache.hudi.common.util.Option; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; + +import java.io.IOException; +import java.util.List; +import java.util.Objects; +import java.util.Properties; + +import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro; + +/** + * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields + * in latest record to old record instead of all fields. Review comment: nit: do you want to give a concrete example here to illustrate the operation of `combineAndGetUpdateValue()`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2815) Support partial update for streaming change logs
[ https://issues.apache.org/jira/browse/HUDI-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2815: - Labels: pull-request-available (was: ) > Support partial update for streaming change logs > > > Key: HUDI-2815 > URL: https://issues.apache.org/jira/browse/HUDI-2815 > Project: Apache Hudi > Issue Type: New Feature > Components: Common Core >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > > See issue: https://github.com/apache/hudi/issues/4030 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] yihua commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs
yihua commented on a change in pull request #4141: URL: https://github.com/apache/hudi/pull/4141#discussion_r767427510 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import org.apache.hudi.common.util.Option; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; + +import java.io.IOException; +import java.util.List; +import java.util.Objects; +import java.util.Properties; + +import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro; + +/** + * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields + * in latest record to old record instead of all fields. 
+ */ +public class PartialUpdateWithLatestAvroPayload extends DefaultHoodieRecordPayload { + + public PartialUpdateWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) { +    super(record, orderingVal); + } + + @Override + public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException { +    if (recordBytes.length == 0) { +      return Option.of(currentValue); +    } + +    GenericRecord incomingRecord = bytesToAvro(recordBytes, schema); + +    // Null check is needed here to support schema evolution. The record in storage may be from old schema where +    // the new ordering column might not be present and hence returns null. +    if (!needUpdatingPersistedRecord(currentValue, incomingRecord, properties)) { +      return Option.of(currentValue); +    } + +    if (isDeleteRecord(incomingRecord)) { +      return Option.empty(); +    } + +    GenericRecord currentRecord = (GenericRecord) currentValue; +    // The incoming record may carry fewer populated fields than the old record, so only those partial fields update the old record. +    List<Schema.Field> fields = schema.getFields(); +    fields.forEach(field -> { +      Object value = incomingRecord.get(field.name()); +      if (Objects.nonNull(value)) { +        currentRecord.put(field.name(), value); +      } Review comment: So the difference compared to DefaultHoodieRecordPayload is that a field in the existing record is overridden only when the corresponding incoming field is non-null, instead of being overwritten with null. Is that correct? The docs above are a bit confusing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
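The merge rule being reviewed above can be illustrated with a tiny stdlib sketch: only non-null incoming fields replace stored values. Plain `Map`s stand in for Avro records here, and the class name is made up; this is not the Hudi payload itself:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of partial-update semantics: start from the stored record and let the
// incoming change log override only the fields it actually carries (non-null).
public class PartialMerge {
    public static Map<String, Object> merge(Map<String, Object> current, Map<String, Object> incoming) {
        Map<String, Object> merged = new HashMap<>(current);
        incoming.forEach((field, value) -> {
            if (value != null) {          // null means "field not present in this change", so keep the old value
                merged.put(field, value);
            }
        });
        return merged;
    }
}
```

One caveat this sketch makes visible: with these semantics a genuine "set field to null" update cannot be expressed, which is the usual trade-off of partial-update payloads.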
[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot removed a comment on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992129272 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot commented on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992130078 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2962) Support JVM based local process lock provider implementation
[ https://issues.apache.org/jira/browse/HUDI-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HUDI-2962: - Summary: Support JVM based local process lock provider implementation (was: Enable metadata table along with JVM local lock provider) > Support JVM based local process lock provider implementation > > > Key: HUDI-2962 > URL: https://issues.apache.org/jira/browse/HUDI-2962 > Project: Apache Hudi > Issue Type: Task >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > Metadata table is disabled by default in master due to > https://issues.apache.org/jira/browse/HUDI-2961. > > For the single writer + async table services deployment model, to protect > against races, we can have a fairly lightweight JVM local lock provider. > This means all the writes and the table services have to be running from a > single JVM, as in the case of DeltaStreamer. This doesn't cover multi-JVM > writes with async table services, though; a full fix for that will be > covered by HUDI-2961. For now, to have the metadata table re-enabled on > master, a JVM local lock provider should be sufficient. -- This message was sent by Atlassian Jira (v8.20.1#820001)
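The deployment model described in HUDI-2962 — a single writer plus async table services (e.g. compaction) all inside one JVM — can be guarded by one process-wide lock shared by every thread. A rough JDK-only sketch of the idea (assumed names; the actual implementation is in PR #4259):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Rough sketch of a JVM-local lock provider: a single static lock that every
// writer/table-service thread in this process shares, per HUDI-2962.
public class JvmLocalLockSketch {
    // static: one lock per JVM, so the writer and async services serialize on it
    private static final ReentrantLock LOCK = new ReentrantLock();

    public static boolean tryLock(long waitMs) {
        try {
            return LOCK.tryLock(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void unlock() {
        LOCK.unlock();
    }

    public static void main(String[] args) {
        boolean writerHasLock = tryLock(10);   // writer thread wins the lock
        System.out.println(writerHasLock);     // true
        unlock();
    }
}
```

As the Jira notes, this offers no protection across JVMs; multi-process writers still need an external lock provider (ZooKeeper, Hive metastore, etc.).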
[GitHub] [hudi] hudi-bot commented on pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…
hudi-bot commented on pull request #4294: URL: https://github.com/apache/hudi/pull/4294#issuecomment-992129272 ## CI report: * 8c68cfecef8fc2892da4d332d2c2993a0460cdac UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-2995) Enable metadata table by default
Manoj Govindassamy created HUDI-2995: Summary: Enable metadata table by default Key: HUDI-2995 URL: https://issues.apache.org/jira/browse/HUDI-2995 Project: Apache Hudi Issue Type: Task Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy Fix For: 0.11.0 Metadata table was disabled by default due to https://issues.apache.org/jira/browse/HUDI-2961 The interim workaround is to have JVM based local process lock provider - https://issues.apache.org/jira/browse/HUDI-2962. With this we can turn on the metadata table by default. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] mincwang opened a new pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…
mincwang opened a new pull request #4294: URL: https://github.com/apache/hudi/pull/4294 …DI-2743 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2994) Add judgement to existed partitionPath in the catch code block for HUDI-2743
[ https://issues.apache.org/jira/browse/HUDI-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangMinChao updated HUDI-2994: -- Summary: Add judgement to existed partitionPath in the catch code block for HUDI-2743 (was: Add judge existed partitionPath in the catch code block for HUDI-2743) > Add judgement to existed partitionPath in the catch code block for HUDI-2743 > > > Key: HUDI-2994 > URL: https://issues.apache.org/jira/browse/HUDI-2994 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Affects Versions: 0.10.0 >Reporter: WangMinChao >Assignee: WangMinChao >Priority: Major > Attachments: image-2021-12-13-13-25-33-402.png > > > !image-2021-12-13-13-25-33-402.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2994) Add judge existed partitionPath in the catch code block for HUDI-2743
[ https://issues.apache.org/jira/browse/HUDI-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangMinChao updated HUDI-2994: -- Summary: Add judge existed partitionPath in the catch code block for HUDI-2743 (was: Add existed partitionPath process in the catch code block for HUDI-2743) > Add judge existed partitionPath in the catch code block for HUDI-2743 > - > > Key: HUDI-2994 > URL: https://issues.apache.org/jira/browse/HUDI-2994 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Affects Versions: 0.10.0 >Reporter: WangMinChao >Assignee: WangMinChao >Priority: Major > Attachments: image-2021-12-13-13-25-33-402.png > > > !image-2021-12-13-13-25-33-402.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
hudi-bot commented on pull request #4259: URL: https://github.com/apache/hudi/pull/4259#issuecomment-992123974 ## CI report: * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141) * a4e9d227602017b1b1db0d2ef706afad0ea09158 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4224) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
hudi-bot removed a comment on pull request #4259: URL: https://github.com/apache/hudi/pull/4259#issuecomment-992122965 ## CI report: * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141) * a4e9d227602017b1b1db0d2ef706afad0ea09158 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-2994) Add existed partitionPath process in the catch code block for HUDI-2743
WangMinChao created HUDI-2994: - Summary: Add existed partitionPath process in the catch code block for HUDI-2743 Key: HUDI-2994 URL: https://issues.apache.org/jira/browse/HUDI-2994 Project: Apache Hudi Issue Type: Bug Components: Common Core Affects Versions: 0.10.0 Reporter: WangMinChao Assignee: WangMinChao Attachments: image-2021-12-13-13-25-33-402.png !image-2021-12-13-13-25-33-402.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] yihua commented on pull request #3776: [HUDI-2543]: Added guides section
yihua commented on pull request #3776: URL: https://github.com/apache/hudi/pull/3776#issuecomment-992122994 @pratyakshsharma any update on the nit? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
hudi-bot commented on pull request #4259: URL: https://github.com/apache/hudi/pull/4259#issuecomment-992122965 ## CI report: * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141) * a4e9d227602017b1b1db0d2ef706afad0ea09158 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
hudi-bot removed a comment on pull request #4259: URL: https://github.com/apache/hudi/pull/4259#issuecomment-990409450 ## CI report: * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] manojpec commented on a change in pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
manojpec commented on a change in pull request #4259: URL: https://github.com/apache/hudi/pull/4259#discussion_r767420453 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestLocalProcessLockProvider.java ## @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.client.transaction; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hudi.client.transaction.lock.LocalProcessLockProvider; +import org.apache.hudi.common.config.LockConfiguration; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.exception.HoodieLockException; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; + +import java.util.concurrent.TimeUnit; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +public class TestLocalProcessLockProvider { + + private static final Logger LOG = LogManager.getLogger(TestLocalProcessLockProvider.class); + private final Configuration hadoopConfiguration = new Configuration(); + private final LockConfiguration lockConfiguration = new LockConfiguration(new TypedProperties()); + + @Test + public void testLockAcquisition() { +LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration); +assertDoesNotThrow(() -> { + localProcessLockProvider.lock(); +}); +assertDoesNotThrow(() -> { + localProcessLockProvider.unlock(); +}); + } + + @Test + public void testLockReAcquisitionBySameThread() { +LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration); +assertDoesNotThrow(() -> { + localProcessLockProvider.lock(); +}); +assertThrows(HoodieLockException.class, () -> { + localProcessLockProvider.lock(); +}); +assertDoesNotThrow(() -> { + localProcessLockProvider.unlock(); +}); + } + + @Test + public void testLockReAcquisitionByDifferentThread() { +LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration); + +// Main test thread +assertDoesNotThrow(() -> { + localProcessLockProvider.lock(); +}); + +// 
Another writer thread +Thread writer2 = new Thread(new Runnable() { + @Override + public void run() { +assertThrows(HoodieLockException.class, () -> { + localProcessLockProvider.lock(); +}); + } +}); + +try { + writer2.join(); +} catch (InterruptedException e) { + // +} + +assertDoesNotThrow(() -> { + localProcessLockProvider.unlock(); +}); + } + + @Test + public void testTryLockAcquisition() { +LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration); +Assertions.assertTrue(localProcessLockProvider.tryLock()); +assertDoesNotThrow(() -> { + localProcessLockProvider.unlock(); +}); + } + + @Test + public void testTryLockAcquisitionWithTimeout() { +LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration); +Assertions.assertTrue(localProcessLockProvider.tryLock(1, TimeUnit.MILLISECONDS)); +assertDoesNotThrow(() -> { + localProcessLockProvider.unlock(); +}); + } + + @Test + public void testTryLockReAcquisitionBySameThread() { Review comment: Added a new unit test for your suggested case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
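One detail in the quoted `testLockReAcquisitionByDifferentThread`: `writer2` is created but never `start()`ed, so `join()` returns immediately and the in-thread assertion never executes (and starting it as-is would block inside `lock()`). A standalone JDK-only sketch of the cross-thread contract, using the non-blocking `tryLock()` so the checker thread cannot hang:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: while the main thread holds the write lock, a second thread's
// tryLock() must fail. Uses tryLock() instead of lock() to avoid blocking.
public class CrossThreadLockCheck {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();                      // main thread owns the write lock

        AtomicBoolean otherAcquired = new AtomicBoolean(true);
        Thread writer2 = new Thread(() -> otherAcquired.set(lock.writeLock().tryLock()));
        writer2.start();                              // must start before join()
        writer2.join();

        System.out.println(otherAcquired.get());      // false: write lock is exclusive
        lock.writeLock().unlock();
    }
}
```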
[GitHub] [hudi] manojpec commented on a change in pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
manojpec commented on a change in pull request #4259: URL: https://github.com/apache/hudi/pull/4259#discussion_r767420255 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LocalProcessLockProvider.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.client.transaction.lock; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hudi.common.config.LockConfiguration; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.lock.LockProvider; +import org.apache.hudi.common.lock.LockState; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.exception.HoodieLockException; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.jetbrains.annotations.NotNull; + +import java.util.concurrent.TimeUnit; +import java.util.concurrent.locks.ReentrantReadWriteLock; + +/** + * Local process level lock. This {@link LockProvider} implementation is to + * guard table from concurrent operations happening in the local JVM process. + * + * Note: This Lock provider implementation doesn't allow lock reentrancy. 
+ * Attempting to reacquire the lock from the same thread will throw + * HoodieLockException. Threads other than the current lock owner will + * block on lock() and return false on tryLock(). + */ +public class LocalProcessLockProvider implements LockProvider { + + private static final Logger LOG = LogManager.getLogger(LocalProcessLockProvider.class); + private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock(); + private final long maxWaitTimeMillis; + + public LocalProcessLockProvider(final LockConfiguration lockConfiguration, final Configuration conf) { +TypedProperties typedProperties = lockConfiguration.getConfig(); +maxWaitTimeMillis = (typedProperties.containsKey(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) +? lockConfiguration.getConfig().getLong(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) : 0); + } + + @Override + public void lock() { Review comment: For lock provider completeness, I would like to have lock() implemented as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] manojpec commented on a change in pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations
manojpec commented on a change in pull request #4259: URL: https://github.com/apache/hudi/pull/4259#discussion_r767420030 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LocalProcessLockProvider.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.client.transaction.lock; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hudi.common.config.LockConfiguration; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.lock.LockProvider; +import org.apache.hudi.common.lock.LockState; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.exception.HoodieLockException; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; +import org.jetbrains.annotations.NotNull; + +import java.util.concurrent.TimeUnit; +import java.util.concurrent.locks.ReentrantReadWriteLock; + +/** + * Local process level lock. This {@link LockProvider} implementation is to + * guard table from concurrent operations happening in the local JVM process. + * + * Note: This Lock provider implementation doesn't allow lock reentrancy. 
+ * Attempting to reacquire the lock from the same thread will throw + * HoodieLockException. Threads other than the current lock owner will + * block on lock() and return false on tryLock(). + */ +public class LocalProcessLockProvider implements LockProvider { + + private static final Logger LOG = LogManager.getLogger(LocalProcessLockProvider.class); + private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock(); + private final long maxWaitTimeMillis; + + public LocalProcessLockProvider(final LockConfiguration lockConfiguration, final Configuration conf) { +TypedProperties typedProperties = lockConfiguration.getConfig(); +maxWaitTimeMillis = (typedProperties.containsKey(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) +? lockConfiguration.getConfig().getLong(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) : 0); + } + + @Override + public void lock() { +LOG.info(getLogMessage(LockState.ACQUIRING)); +if (LOCK.isWriteLockedByCurrentThread()) { + throw new HoodieLockException(getLogMessage(LockState.ALREADY_ACQUIRED)); +} +LOCK.writeLock().lock(); +LOG.info(getLogMessage(LockState.ACQUIRED)); + } + + @Override + public boolean tryLock() { +LOG.info(getLogMessage(LockState.ACQUIRING)); +if (LOCK.writeLock().isHeldByCurrentThread()) { + throw new HoodieLockException(getLogMessage(LockState.ALREADY_ACQUIRED)); +} +final boolean isLockAcquired; +try { + isLockAcquired = LOCK.writeLock().tryLock(maxWaitTimeMillis, TimeUnit.MILLISECONDS); +} catch (InterruptedException e) { + throw new HoodieLockException(getLogMessage(LockState.FAILED_TO_ACQUIRE)); +} +LOG.info(getLogMessage(isLockAcquired ? LockState.ACQUIRED : LockState.FAILED_TO_ACQUIRE)); +return isLockAcquired; + } + + @Override + public boolean tryLock(long time, @NotNull TimeUnit unit) { +LOG.info(getLogMessage(LockState.ACQUIRING)); Review comment: right, fixed it. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
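The tryLock implementations reviewed above hinge on two `ReentrantReadWriteLock` behaviors: the owning thread *can* reacquire the write lock (which is why the provider throws explicitly after checking `isWriteLockedByCurrentThread()`/`isHeldByCurrentThread()` to forbid reentrancy), and a timed `tryLock` returns immediately when the lock is uncontended rather than waiting out the timeout. A minimal JDK-only demonstration:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Demonstrates the ReentrantReadWriteLock semantics the provider builds on.
public class TimedTryLockDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Uncontended: a timed tryLock succeeds without waiting out the timeout.
        System.out.println(lock.writeLock().tryLock(1, TimeUnit.MILLISECONDS)); // true

        // The write lock IS reentrant for the owning thread, so a provider that
        // wants non-reentrant semantics must check ownership itself first.
        System.out.println(lock.isWriteLockedByCurrentThread());               // true
        System.out.println(lock.writeLock().tryLock());                        // true (reentrant)

        // One unlock per successful acquire.
        lock.writeLock().unlock();
        lock.writeLock().unlock();
    }
}
```

This also shows why an unlock-count mismatch is a real hazard: releasing fewer times than acquired leaves the JVM-wide lock held forever.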
[GitHub] [hudi] yihua commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.
yihua commented on pull request #3859: URL: https://github.com/apache/hudi/pull/3859#issuecomment-992121886 @leesf is there any plan to update the CN docs for 0.9.0, 0.10.0 releases, and the current version? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yihua commented on a change in pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.
yihua commented on a change in pull request #3859: URL: https://github.com/apache/hudi/pull/3859#discussion_r767224687 ## File path: website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md ## @@ -1,26 +1,26 @@ --- -title: IBM Cloud Object Storage Filesystem +title: IBM Cloud Object Storage 文件系统 keywords: [ hudi, hive, ibm, cos, spark, presto] -summary: In this page, we go over how to configure Hudi with IBM Cloud Object Storage filesystem. +summary: 在本页中,我们讨论在 IBM Cloud Object Storage 文件系统中配置 Hudi 。 last_modified_at: 2020-10-01T11:38:24-10:00 language: cn --- -In this page, we explain how to get your Hudi spark job to store into IBM Cloud Object Storage. +在本页中,我们解释如何如何将你的 Hudi Spark 作业存储到 IBM Cloud Object Storage 当中。 Review comment: `我们解释如何如何...` -> `我们解释如何...` ## File path: website/docusaurus.config.js ## @@ -383,8 +383,20 @@ module.exports = { docs: { sidebarPath: require.resolve('./sidebars.js'), // Please change this to your repo. - editUrl: -'https://github.com/apache/hudi/edit/asf-site/website/docs/', + editUrl: ({ version, versionDocsDirPath, docPath, locale }) => { +if (locale != this.defaultLocale) { + return `https://github.com/apache/hudi/tree/asf-site/website/${versionDocsDirPath}/${docPath}` +} else { + return `https://github.com/apache/hudi/tree/asf-site/website/i18n/${locale}/docusaurus-plugin-content-${versionDocsDirPath}/${version}/${docPath}` +} + }, + // type EditUrlFunction = (params: { + // version: string; + // versionDocsDirPath: string; + // docPath: string; + // permalink: string; + // locale: string; + // }) => string | undefined; Review comment: Could you remove these if not used? 
## File path: website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md ## @@ -1,58 +1,46 @@ --- -title: Migration Guide -keywords: [ hudi, migration, use case] -summary: In this page, we will discuss some available tools for migrating your existing dataset into a Hudi dataset +title: 迁移指南 +keywords: [ hudi, migration, use case, 迁移, 用例] +summary: 在本页中,我们将讨论有效的工具,他们能将你的现有数据集迁移到 Hudi 数据集。 last_modified_at: 2019-12-30T15:59:57-04:00 language: cn --- -Hudi maintains metadata such as commit timeline and indexes to manage a dataset. The commit timelines helps to understand the actions happening on a dataset as well as the current state of a dataset. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only parquet columnar formats. -To be able to start using Hudi for your existing dataset, you will need to migrate your existing dataset into a Hudi managed dataset. There are a couple of ways to achieve this. +Hudi 维护了元数据,包括提交的时间线和索引,来管理一个数据集。提交的时间线帮助理解一个数据集上发生的操作,以及数据集的当前状态。索引则被 Hudi 用来维护一个映射到文件 ID 的记录键,它能高效地定位一条记录。目前, Hudi 仅支持写 Parquet 列式格式 。 +为了在你的现有数据集上开始使用 Hudi ,你需要将你的现有数据集迁移到 Hudi 管理的数据集中。以下有多种方法实现这个目的。 -## Approaches +## 方法 -### Use Hudi for new partitions alone -Hudi can be used to manage an existing dataset without affecting/altering the historical data already present in the -dataset. Hudi has been implemented to be compatible with such a mixed dataset with a caveat that either the complete -Hive partition is Hudi managed or not. Thus the lowest granularity at which Hudi manages a dataset is a Hive -partition. Start using the datasource API or the WriteClient to write to the dataset and make sure you start writing -to a new partition or convert your last N partitions into Hudi instead of the entire table. Note, since the historical - partitions are not managed by HUDI, none of the primitives provided by HUDI work on the data in those partitions. 
More concretely, one cannot perform upserts or incremental pull on such older partitions not managed by the HUDI dataset. -Take this approach if your dataset is an append only type of dataset and you do not expect to perform any updates to existing (or non Hudi managed) partitions. +### 将 Hudi 仅用于新分区 +Hudi 可以被用来在不影响/改变数据集历史数据的情况下管理一个现有的数据集。 Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 Hudi 管理。因此, Hudi 管理一个数据集的最低粒度是一个 Hive 分区。使用数据源 API 或 WriteClient 来写入数据集,并确保你开始写入的是一个新分区,或者将过去的 N 个分区而非整张表转换为 Hudi 。需要注意的是,由于历史分区不是由 Hudi 管理的, Hudi 提供的任何操作在那些分区上都不生效。更具体地说,无法在这些非 Hudi 管理的旧分区上进行插入更新或增量拉取。 Review comment: `Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 Hudi 管理。` -> `Hudi 已经实现兼容这样的数据集,需要注意的是,单个 Hive 分区要么完全由 Hudi 管理,要么不由 Hudi 管理。` ## File path: website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md ## @@ -1,58 +1,46 @@ --- -title: Migration Guide -keywords: [ hudi, migration, use case] -summary: In this page, we will discuss some available tools for migrating your existing dataset into a Hudi dataset
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992099257 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992081147 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
hudi-bot removed a comment on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-992071081 ## CI report: * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325) * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4222)
[GitHub] [hudi] hudi-bot commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
hudi-bot commented on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-992091282 ## CI report: * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4222)
[jira] [Created] (HUDI-2993) Support sink multiple tables with schema evolution into Hudi
Casel Chen created HUDI-2993: Summary: Support sink multiple tables with schema evolution into Hudi Key: HUDI-2993 URL: https://issues.apache.org/jira/browse/HUDI-2993 Project: Apache Hudi Issue Type: New Feature Components: Flink Integration Reporter: Casel Chen We have hundreds of OLTP tables that need to be synchronized to the Hudi data lake in real time. If we launch one synchronization job per table, resources and management become a big challenge. Therefore, we are eagerly looking for a full-database synchronization tool that can synchronize multiple tables to the Hudi data lake in one job. Ideally it would also support schema evolution, because OLTP tables often modify their schemas. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms
hudi-bot removed a comment on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992061874 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 550ba7889e0d4c553b5347f26c60e97a27844468 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4219) * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992081147 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms
hudi-bot commented on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992081197 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992075851 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
[GitHub] [hudi] xiarixiaoyao commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
xiarixiaoyao commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992080922 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992075851 ## CI report: * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992054747 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094) * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
[GitHub] [hudi] hudi-bot removed a comment on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
hudi-bot removed a comment on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-992064349 ## CI report: * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325) * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
hudi-bot commented on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-992071081 ## CI report: * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325) * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4222)
[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
zhangyue19921010 commented on a change in pull request #3887: URL: https://github.com/apache/hudi/pull/3887#discussion_r767386257 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.fs; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.util.Properties; + +/** + * The consistency guard relevant config options. + */ +@ConfigClassProperty(name = "FileSystem Guard Configurations", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.") +public class FileSystemGuardConfig extends HoodieConfig { Review comment: Sure, "FileSystemRetryConfig" is more appropriate. Changed. -- This is an automated message from the Apache Git Service. 
[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
zhangyue19921010 commented on a change in pull request #3887: URL: https://github.com/apache/hudi/pull/3887#discussion_r767386141 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java ## @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.common.util; + +import org.apache.hudi.common.fs.HoodieWrapperFileSystem; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; + +import java.io.IOException; +import java.util.Random; + +public class RetryHelper { + private static final Logger LOG = LogManager.getLogger(RetryHelper.class); + private HoodieWrapperFileSystem.CheckedFunction func; + private int num; + private long maxIntervalTime; + private long initialIntervalTime = 100L; + private String taskInfo = "N/A"; + + public RetryHelper() { + } + + public RetryHelper(String taskInfo) { +this.taskInfo = taskInfo; + } + + public RetryHelper tryWith(HoodieWrapperFileSystem.CheckedFunction func) { +this.func = func; +return this; + } + + public RetryHelper tryNum(int num) { +this.num = num; +return this; + } + + public RetryHelper tryTaskInfo(String taskInfo) { +this.taskInfo = taskInfo; +return this; + } + + public RetryHelper tryMaxInterval(long time) { +maxIntervalTime = time; +return this; + } + + public RetryHelper tryInitialInterval(long time) { +initialIntervalTime = time; +return this; + } + + public T start() throws IOException { +int retries = 0; +boolean success = false; +RuntimeException exception = null; +T t = null; +do { + long waitTime = Math.min(getWaitTimeExp(retries), maxIntervalTime); + try { +t = func.get(); +success = true; +break; + } catch (RuntimeException e) { +// deal with RuntimeExceptions such like AmazonS3Exception 503 +exception = e; +LOG.warn("Catch RuntimeException " + taskInfo + ", will retry after " + waitTime + " ms.", e); +try { + Thread.sleep(waitTime); +} catch (InterruptedException ex) { +// ignore InterruptedException here +} +retries++; + } +} while (retries <= num); Review comment: emmm, we only do `++` when caught exception, so maybe can't move it out of `catch() {}` block. -- This is an automated message from the Apache Git Service. 
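[Editor's note] The retry structure discussed in the review above — incrementing the attempt counter only inside the `catch` block, so a success exits immediately — can be sketched as below. This is an illustrative sketch, not Hudi's actual `RetryHelper`; the class and method names (`RetrySketch`, `retry`) are hypothetical, and `java.util.function.Supplier` stands in for `HoodieWrapperFileSystem.CheckedFunction`.

```java
import java.util.function.Supplier;

// Illustrative sketch of the retry loop discussed above. The attempt
// counter is incremented only when an exception is caught, so it
// cannot be moved out of the catch block: a successful call returns
// before the counter is ever touched.
public class RetrySketch {

    public static <T> T retry(Supplier<T> func, int maxRetries,
                              long initialIntervalMs, long maxIntervalMs) {
        int retries = 0;
        RuntimeException last = null;
        do {
            // Exponential wait, capped at maxIntervalMs per attempt.
            long waitMs = Math.min((long) Math.pow(2, retries) * initialIntervalMs,
                                   maxIntervalMs);
            try {
                return func.get();           // success: exit immediately
            } catch (RuntimeException e) {   // e.g. a transient S3 503
                last = e;
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
                retries++;                   // only incremented on failure
            }
        } while (retries <= maxRetries);
        throw last;                          // retries exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds: three calls total.
        int result = retry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("transient");
            }
            return 42;
        }, 5, 1L, 10L);
        System.out.println(result + " after " + calls[0] + " calls"); // 42 after 3 calls
    }
}
```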
[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
zhangyue19921010 commented on a change in pull request #3887: URL: https://github.com/apache/hudi/pull/3887#discussion_r767385839 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java ## @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.common.util; + +import org.apache.hudi.common.fs.HoodieWrapperFileSystem; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; + +import java.io.IOException; +import java.util.Random; + +public class RetryHelper { + private static final Logger LOG = LogManager.getLogger(RetryHelper.class); + private HoodieWrapperFileSystem.CheckedFunction func; + private int num; + private long maxIntervalTime; + private long initialIntervalTime = 100L; + private String taskInfo = "N/A"; + + public RetryHelper() { + } + + public RetryHelper(String taskInfo) { +this.taskInfo = taskInfo; + } + + public RetryHelper tryWith(HoodieWrapperFileSystem.CheckedFunction func) { +this.func = func; +return this; + } + + public RetryHelper tryNum(int num) { +this.num = num; +return this; + } + + public RetryHelper tryTaskInfo(String taskInfo) { +this.taskInfo = taskInfo; +return this; + } + + public RetryHelper tryMaxInterval(long time) { +maxIntervalTime = time; +return this; + } + + public RetryHelper tryInitialInterval(long time) { +initialIntervalTime = time; +return this; + } + + public T start() throws IOException { +int retries = 0; +boolean success = false; +RuntimeException exception = null; +T t = null; Review comment: Sure thing, changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
zhangyue19921010 commented on a change in pull request #3887: URL: https://github.com/apache/hudi/pull/3887#discussion_r767385794 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.fs; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.util.Properties; + +/** + * The consistency guard relevant config options. 
+ */ +@ConfigClassProperty(name = "FileSystem Guard Configurations", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.") +public class FileSystemGuardConfig extends HoodieConfig { + + public static final ConfigProperty FILESYSTEM_RETRY_ENABLE = ConfigProperty + .key("hoodie.filesystem.action.retry.enabled") + .defaultValue("false") + .sinceVersion("0.10.0") + .withDocumentation("Enabled to handle S3 list/get/delete etc file system performance issue."); + + public static final ConfigProperty INITIAL_RETRY_INTERVAL_MS = ConfigProperty + .key("hoodie.filesystem.action.retry.initial_interval_ms") + .defaultValue(100L) + .sinceVersion("0.10.0") + .withDocumentation("Amount of time (in ms) to wait, before retry to do operations on storage."); + + public static final ConfigProperty MAX_RETRY_INTERVAL_MS = ConfigProperty + .key("hoodie.filesystem.action.retry.max_interval_ms") + .defaultValue(2000L) + .sinceVersion("0.10.0") + .withDocumentation("Maximum amount of time (in ms), to wait for next retry."); + + public static final ConfigProperty MAX_RETRY_NUMBERS = ConfigProperty + .key("hoodie.filesystem.action.retry.max_numbers") Review comment: We use `(long) Math.pow(2, retryCount) * initialIntervalTime + random.nextInt(100);` to calculate sleep time before each retry. And we may need `MAX_RETRY_INTERVAL_MS` to control the maximum duration of a single sleep in case sleep too long`Math.min(getWaitTimeExp(retries), maxIntervalTime)`. Also use `MAX_RETRY_NUMBERS` to control max retry numbers to limit total retry time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
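[Editor's note] The wait-time calculation described in the reply above — exponential growth from an initial interval, a small random jitter, and a per-sleep cap so no single sleep runs too long — can be sketched as follows. This is an illustrative sketch assuming the formula quoted in the comment; `BackoffSketch` and its method names are hypothetical, not Hudi's code.

```java
import java.util.Random;

// Illustrative backoff computation: 2^retryCount * initialIntervalMs
// plus up to 100 ms of jitter, with each individual sleep capped at
// maxIntervalMs (the MAX_RETRY_INTERVAL_MS-style setting discussed
// above); the total retry time is bounded separately by the max
// retry count.
public class BackoffSketch {
    private static final Random RANDOM = new Random();

    // Uncapped exponential wait with jitter, as quoted in the review reply.
    static long waitTimeExp(int retryCount, long initialIntervalMs) {
        return (long) Math.pow(2, retryCount) * initialIntervalMs + RANDOM.nextInt(100);
    }

    // Cap any single sleep so a late retry cannot stall for too long.
    static long boundedWait(int retryCount, long initialIntervalMs, long maxIntervalMs) {
        return Math.min(waitTimeExp(retryCount, initialIntervalMs), maxIntervalMs);
    }

    public static void main(String[] args) {
        // With initial=100 ms and max=2000 ms (the defaults quoted above),
        // the uncapped sequence is roughly 100, 200, 400, 800, 1600, 3200 ms,
        // so the cap first takes effect at retryCount 5.
        for (int i = 0; i < 6; i++) {
            System.out.println("retry " + i + ": " + boundedWait(i, 100L, 2000L) + " ms");
        }
    }
}
```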
[GitHub] [hudi] hudi-bot commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
hudi-bot commented on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-992064349 ## CI report: * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325) * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
hudi-bot removed a comment on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-966822668 ## CI report: * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325)
[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
zhangyue19921010 commented on a change in pull request #3887: URL: https://github.com/apache/hudi/pull/3887#discussion_r767384153 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.fs; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.util.Properties; + +/** + * The consistency guard relevant config options. + */ +@ConfigClassProperty(name = "FileSystem Guard Configurations", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.") +public class FileSystemGuardConfig extends HoodieConfig { + + public static final ConfigProperty FILESYSTEM_RETRY_ENABLE = ConfigProperty + .key("hoodie.filesystem.action.retry.enabled") Review comment: Sure, changed. 
[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms
hudi-bot commented on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992061874 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 550ba7889e0d4c553b5347f26c60e97a27844468 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4219) * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms
hudi-bot removed a comment on pull request #4291: URL: https://github.com/apache/hudi/pull/4291#issuecomment-992044866 ## CI report: * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN * 550ba7889e0d4c553b5347f26c60e97a27844468 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4219) * 301d9ab65f3983ecf77b192d4af9401b8d60b059 UNKNOWN
[GitHub] [hudi] danny0405 commented on issue #4249: [SUPPORT]FLINK CDC WRITE HUDI, restart job get exception:org.apache.hudi.org.apache.avro.InvalidAvroMagicException: Not an Avro data file
danny0405 commented on issue #4249: URL: https://github.com/apache/hudi/issues/4249#issuecomment-992055023 Can you try 0.10.0, please? This seems to have been fixed in the latest version.
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992054747 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094) * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992053877

## CI report:

* 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
* 893fe09af34779c0ef98b732a418c9ba941a2bfc UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimal
xiarixiaoyao commented on a change in pull request #4253: URL: https://github.com/apache/hudi/pull/4253#discussion_r767376871

## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala

## @@ -723,4 +723,26 @@ class TestCOWDataSource extends HoodieClientTestBase {
     val result = spark.sql("select * from tmptable limit 1").collect()(0)
     result.schema.contains(new StructField("partition", StringType, true))
   }
+
+  @Test
+  def testWriteSmallPrecisionDecimalTable(): Unit = {
+    val records1 = recordsToStrings(dataGen.generateInserts("001", 5)).toList
+    val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+      .withColumn("shortDecimal", lit(new java.math.BigDecimal(s"2090."))) // create decimalType(8, 4)
+    inputDF1.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val records2 = recordsToStrings(dataGen.generateUpdates("002", 5)).toList
+    val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+      .withColumn("shortDecimal", lit(new java.math.BigDecimal(s"2090."))) // create decimalType(8, 4)
+    inputDF2.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
+      .mode(SaveMode.Append)
+      .save(basePath)
+    assert(spark.read.format("hudi").load(basePath).count() == 5)

Review comment: yes, fixed
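As background on the test above (a sketch, not part of the PR): the `// create decimalType(8, 4)` comment relies on Spark inferring `DecimalType(precision, scale)` from a `java.math.BigDecimal` — precision counts the digits of the unscaled value, scale the fractional digits. The literal `"2090.0000"` below is a hypothetical stand-in, since the actual digits are truncated in the diff.

```java
import java.math.BigDecimal;

public class DecimalSketch {
    public static void main(String[] args) {
        // Hypothetical literal; unscaled value 20900000 has 8 digits
        // (precision = 8) and 4 fractional digits (scale = 4),
        // i.e. DecimalType(8, 4) as the test comment says.
        BigDecimal d = new BigDecimal("2090.0000");
        System.out.println("precision=" + d.precision() + ", scale=" + d.scale());
        // precision=8, scale=4

        // Precision <= 18 fits in a 64-bit long. That is the case where the
        // standard Parquet format may encode a decimal as INT32/INT64, while
        // spark.sql.parquet.writeLegacyFormat=true always writes
        // FIXED_LEN_BYTE_ARRAY -- the encoding difference this PR's automatic
        // setting is meant to address for bulk insert.
        System.out.println(d.unscaledValue().bitLength() <= 63); // true
    }
}
```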
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-992053877

## CI report:

* 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
* 893fe09af34779c0ef98b732a418c9ba941a2bfc UNKNOWN

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-988825224

## CI report:

* 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)

Bot commands

@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build