[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025464066

## CI report:

* ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618)

Bot commands — @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025426843

## CI report:

* 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533)
* ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618)
[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025426843

## CI report:

* 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533)
* ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025425611

## CI report:

* 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533)
* ca12a7818b2a799fb57ee04376dfcb14d628cdb2 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025425611

## CI report:

* 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533)
* ca12a7818b2a799fb57ee04376dfcb14d628cdb2 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022409891

## CI report:

* 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533)
[GitHub] [hudi] xushiyan commented on pull request #3964: [HUDI-2732][RFC-38] Spark Datasource V2 Integration
xushiyan commented on pull request #3964:
URL: https://github.com/apache/hudi/pull/3964#issuecomment-1025423884

> @leesf : I did not go through the lineage of this patch, but I do know we landed another PR related to Spark datasource V2. So, is this patch still valid, or can we close it out?

@nsivabalan this is the RFC PR. Current work is in https://github.com/apache/hudi/pull/4611
[GitHub] [hudi] xushiyan commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
xushiyan commented on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-1025422834

FYI @codope, can you take this one please?

> Hi team @nsivabalan, I only tested with Hive yesterday and it worked. However, testing with Presto `0.247` now, I still cannot read the table correctly.
>
> I did some research and found out Hudi was bumped to `0.9.0` in PrestoDB `0.265` ([PR](https://github.com/prestodb/presto/commit/b52db2c4c2baa720489ae3908acd9303a41081fc), [release notes](https://prestodb.io/docs/current/release/release-0.265.html)). For Presto `<0.265`, Hudi `0.5.3` is installed.
>
> I believe that's where the problem is. My version of Presto only has Hudi `0.5.3` and thus cannot correctly read replacecommit files.
>
> The [Hudi docs](https://hudi.apache.org/docs/query_engine_setup#PrestoDB) do not mention anything about this issue. They only say "Presto >= 0.240 | No action needed. Hudi 0.5.3 version is a compile time dependency."
>
> Am I correct in assuming Hudi `0.5.3` cannot correctly read replacecommit files? If yes, could we maybe update the docs to reflect this?
>
> Thank you
[GitHub] [hudi] melin commented on issue #4627: [SUPPORT] Dremio integration
melin commented on issue #4627:
URL: https://github.com/apache/hudi/issues/4627#issuecomment-1025415663

In many large enterprises, especially banks, there is strong demand for BI: leaders look at BI charts every day. As data volume grows, BI queries get slow, which drives the need for query acceleration, and pre-accelerated data is required to meet it. Dremio can meet this need by analyzing historical SQL and automatically accelerating data. Hudi has a lot of features that Iceberg doesn't, so we tend to use Hudi. But Dremio does not currently support Hudi.
[jira] [Updated] (HUDI-3204) spark on TimestampBasedKeyGenerator has no result when query by partition column
[ https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3204:
-----------------------------
    Reviewers: sivabalan narayanan  (was: Raymond Xu)

> spark on TimestampBasedKeyGenerator has no result when query by partition column
>
>                 Key: HUDI-3204
>                 URL: https://issues.apache.org/jira/browse/HUDI-3204
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark
>            Reporter: Yann Byron
>            Assignee: Yann Byron
>            Priority: Critical
>              Labels: hudi-on-call, pull-request-available, sev:critical
>             Fix For: 0.11.0
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> {code:java}
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
> import org.apache.hudi.hive.MultiPartKeysValueExtractor
>
> val df = Seq((1, "z3", 30, "v1", "2018-09-23"), (2, "z3", 35, "v1", "2018-09-24")).toDF("id", "name", "age", "ts", "data_date")
>
> // mor
> df.write.format("hudi").
>   option(HoodieWriteConfig.TABLE_NAME, "issue_4417_mor").
>   option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
>   option("hoodie.datasource.write.recordkey.field", "id").
>   option("hoodie.datasource.write.partitionpath.field", "data_date").
>   option("hoodie.datasource.write.precombine.field", "ts").
>   option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
>   option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
>   option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd").
>   option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
>   option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd").
>   mode(org.apache.spark.sql.SaveMode.Append).
>   save("file:///tmp/hudi/issue_4417_mor")
>
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|age| ts| data_date|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |  20220110172709324|20220110172709324...|                 2|            2018/09/24|703e56d3-badb-40b...|  2|  z3| 35| v1|2018-09-24|
> |  20220110172709324|20220110172709324...|                 1|            2018/09/23|58fde2b3-db0e-464...|  1|  z3| 30| v1|2018-09-23|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
>
> // can not query any data
> spark.read.format("hudi").load("file:///tmp/hudi/issue_4417_mor").where("data_date = '2018-09-24'")
>
> // still can not query any data
> spark.read.format("hudi").load("file:///tmp/hudi/issue_4417_mor").where("data_date = '2018/09/24'").show
>
> // cow
> df.write.format("hudi").
>   option(HoodieWriteConfig.TABLE_NAME, "issue_4417_cow").
>   option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
>   option("hoodie.datasource.write.recordkey.field", "id").
>   option("hoodie.datasource.write.partitionpath.field", "data_date").
>   option("hoodie.datasource.write.precombine.field", "ts").
>   option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
>   option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
>   option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd").
>   option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
>   option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd").
>   mode(org.apache.spark.sql.SaveMode.Append).
>   save("file:///tmp/hudi/issue_4417_cow")
>
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|age| ts| data_date|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |  20220110172721896|20220110172721896...|                 2|            2018/09/24|81cc7819-a0d1-4e6...|  2|  z3| 35| v1|2018/09/24|
> |  20220110172721896|20220110172721896...|                 1|            2018/09/23|d428019b-a829-41a...|  1|  z3| 30| v1|2018/09/23|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
>
> // can not query any data
> spark.read.format("hudi").load("file:///tmp/h
[jira] [Updated] (HUDI-3204) spark on TimestampBasedKeyGenerator has no result when query by partition column
[ https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3204:
-----------------------------
    Status: In Progress  (was: Open)

> spark on TimestampBasedKeyGenerator has no result when query by partition column
>
>                 Key: HUDI-3204
>                 URL: https://issues.apache.org/jira/browse/HUDI-3204
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark
>            Reporter: Yann Byron
>            Assignee: Yann Byron
>            Priority: Critical
>              Labels: hudi-on-call, pull-request-available, sev:critical
>             Fix For: 0.11.0
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
[jira] [Closed] (HUDI-3267) On-call team to triage GH issues, PRs, and JIRAs
[ https://issues.apache.org/jira/browse/HUDI-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu closed HUDI-3267.
----------------------------
    Resolution: Done

> On-call team to triage GH issues, PRs, and JIRAs
>
>                 Key: HUDI-3267
>                 URL: https://issues.apache.org/jira/browse/HUDI-3267
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: dev-experience
>            Reporter: Raymond Xu
>            Priority: Major
>              Labels: hudi-on-call
>   Original Estimate: 8h
>          Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> h4. triaged GH issues
> # https://github.com/apache/hudi/issues/4701
> # https://github.com/apache/hudi/issues/3751
>
> h4. triaged critical PRs
> # https://github.com/apache/hudi/pull/4608

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #2419: [HUDI-1421] Improvement of failure recovery for HoodieFlinkStreamer.
hudi-bot commented on pull request #2419:
URL: https://github.com/apache/hudi/pull/2419#issuecomment-1025355089

## CI report:

* ba6e4d2888efe9cfbc1b7fcc6897b97a4ad68a1b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #1817: [HUDI-651] Fix incremental queries in hive for MOR tables
hudi-bot commented on pull request #1817:
URL: https://github.com/apache/hudi/pull/1817#issuecomment-1025355020

## CI report:

* 304061316e3f633fef102f9f217c3e446ce14156 UNKNOWN
[GitHub] [hudi] xushiyan commented on issue #3711: [SUPPORT] flink jar with hudi 0.8.0 and how parsing complex structured data
xushiyan commented on issue #3711:
URL: https://github.com/apache/hudi/issues/3711#issuecomment-1025353895

Closing due to inactivity.
[GitHub] [hudi] xushiyan closed issue #3711: [SUPPORT] flink jar with hudi 0.8.0 and how parsing complex structured data
xushiyan closed issue #3711:
URL: https://github.com/apache/hudi/issues/3711
[GitHub] [hudi] xushiyan closed issue #3707: [SUPPORT] Flink Hudi write on S3 DataStreamSinkProvider error
xushiyan closed issue #3707:
URL: https://github.com/apache/hudi/issues/3707
[GitHub] [hudi] xushiyan commented on issue #3707: [SUPPORT] Flink Hudi write on S3 DataStreamSinkProvider error
xushiyan commented on issue #3707:
URL: https://github.com/apache/hudi/issues/3707#issuecomment-1025352891

@ibudanaev-crunchyroll I hope your problem was resolved, given that Flink 1.13.x is now supported.

@awsalialem hope @danny0405's reply gave you the pointer to your dependency problem: https://github.com/apache/hudi/issues/3707#issuecomment-960472407

Closing this now.
[hudi] branch asf-site updated: [MINOR] Fixing powered by pages with logos against company (#4722)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new b46b902  [MINOR] Fixing powered by pages with logos against company (#4722)

b46b902 is described below:

commit b46b90231a35aba1e0f0400dfb26419bb27b4d2e
Author: Sivabalan Narayanan
AuthorDate: Sun Jan 30 22:13:33 2022 -0500

    [MINOR] Fixing powered by pages with logos against company (#4722)

---
 website/src/pages/powered-by.md                        | 202 +++++++++++------
 website/static/assets/images/powers/DoubleVerify.png   | Bin 0 -> 36270 bytes
 website/static/assets/images/powers/GE_aviation.png    | Bin 0 -> 72123 bytes
 website/static/assets/images/powers/ai_bank.png        | Bin 0 -> 22077 bytes
 website/static/assets/images/powers/amazon.png         | Bin 0 -> 110569 bytes
 website/static/assets/images/powers/bilibili.png       | Bin 0 -> 70800 bytes
 website/static/assets/images/powers/bytedance.png      | Bin 0 -> 83244 bytes
 website/static/assets/images/powers/cirium.png         | Bin 0 -> 68961 bytes
 website/static/assets/images/powers/disney_hotstar.png | Bin 0 -> 28807 bytes
 website/static/assets/images/powers/google_cloud.png   | Bin 0 -> 52087 bytes
 website/static/assets/images/powers/grofers.png        | Bin 0 -> 51330 bytes
 website/static/assets/images/powers/halodoc.png        | Bin 0 -> 69591 bytes
 website/static/assets/images/powers/hopsworks.png      | Bin 0 -> 26456 bytes
 website/static/assets/images/powers/huawei.png         | Bin 0 -> 209684 bytes
 website/static/assets/images/powers/tencent_cloud.png  | Bin 0 -> 20976 bytes
 website/static/assets/images/powers/udemy.png          | Bin 0 -> 58830 bytes
 website/static/assets/images/powers/walmart.png        | Bin 0 -> 90716 bytes
 17 files changed, 149 insertions(+), 53 deletions(-)
[GitHub] [hudi] nsivabalan merged pull request #4722: [MINOR] Fixing powered by pages with logos against company
nsivabalan merged pull request #4722:
URL: https://github.com/apache/hudi/pull/4722
[GitHub] [hudi] xushiyan closed issue #3680: [SUPPORT]Failed to sync data to hive-3.1.2 by flink-sql
xushiyan closed issue #3680:
URL: https://github.com/apache/hudi/issues/3680
[GitHub] [hudi] xushiyan commented on issue #3680: [SUPPORT]Failed to sync data to hive-3.1.2 by flink-sql
xushiyan commented on issue #3680:
URL: https://github.com/apache/hudi/issues/3680#issuecomment-1025308955

@Cherry-Puppy @zhouhongyu888 we're closing this, assuming the problem was resolved by @danny0405 in another channel. Feel free to post any follow-up comments here. Thanks.
[GitHub] [hudi] xushiyan commented on issue #3215: [SUPPORT] Flink Hudi Quickstart - SqlClientException
xushiyan commented on issue #3215:
URL: https://github.com/apache/hudi/issues/3215#issuecomment-1025307418

@anikait-rao have you got the problem resolved?
[GitHub] [hudi] xushiyan commented on issue #3747: [SUPPORT] Hive Sync process stuck and unable to exit
xushiyan commented on issue #3747:
URL: https://github.com/apache/hudi/issues/3747#issuecomment-1025303976

@stym06 thanks for filing https://issues.apache.org/jira/browse/HUDI-2733 and the patch. Let's continue collaborating from there.
[GitHub] [hudi] xushiyan closed issue #3747: [SUPPORT] Hive Sync process stuck and unable to exit
xushiyan closed issue #3747:
URL: https://github.com/apache/hudi/issues/3747
[jira] [Updated] (HUDI-3267) On-call team to triage GH issues, PRs, and JIRAs
[ https://issues.apache.org/jira/browse/HUDI-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3267:
-----------------------------
    Description:
        h4. triaged GH issues
        # https://github.com/apache/hudi/issues/4701
        # https://github.com/apache/hudi/issues/3751
        h4. triaged critical PRs
        # https://github.com/apache/hudi/pull/4608

    was:
        h4. triaged GH issues
        # https://github.com/apache/hudi/issues/4701
        #
        h4. triaged critical PRs
        # https://github.com/apache/hudi/pull/4608

> On-call team to triage GH issues, PRs, and JIRAs
>
>                 Key: HUDI-3267
>                 URL: https://issues.apache.org/jira/browse/HUDI-3267
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: dev-experience
>            Reporter: Raymond Xu
>            Priority: Major
>              Labels: hudi-on-call
>   Original Estimate: 8h
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> h4. triaged GH issues
> # https://github.com/apache/hudi/issues/4701
> # https://github.com/apache/hudi/issues/3751
>
> h4. triaged critical PRs
> # https://github.com/apache/hudi/pull/4608
[GitHub] [hudi] xushiyan commented on issue #3751: [SUPPORT] Slow Write Speeds to Hudi
xushiyan commented on issue #3751:
URL: https://github.com/apache/hudi/issues/3751#issuecomment-1025300721

> num-executors 19
> executor-cores 1
> executor-memory 6g

@MikeBuh with this setting you can probably use 30-40 as the parallelism for the Spark and shuffle partitions and the Hudi parallelisms, given each core works with 1.5-2x concurrency. I suggest increasing executor cores to 3-5 to increase throughput, and tuning other settings accordingly. You also want to align the Spark parallelism, shuffle partitions, and the Hudi parallelisms (there are a few of them).

> hoodie.datasource.write.row.writer.enable: true

This is only for bulk insert as of now.

> data seems to be skewed and thus not easy to partition using a field and ensuring even distribution

Usually you'd use salting to handle skewed data and improve this; performance won't go far without handling skew properly.

Hope these help.
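The salting suggestion above can be sketched in a few lines. This is a generic illustration (the helper `salted_key` and the constant `NUM_SALTS` are made up for the example, not Hudi or Spark APIs): a hot key gets a random suffix so its records spread across several buckets that can be processed in parallel, with downstream aggregation combining the per-salt partial results.

```python
import random
from collections import Counter

# Generic key-salting sketch (hypothetical helper, not a Hudi/Spark API):
# append a random salt suffix so one hot key spreads over NUM_SALTS buckets.
NUM_SALTS = 4

def salted_key(key: str) -> str:
    return f"{key}_{random.randrange(NUM_SALTS)}"

# 1000 records share one hot key; 10 records carry a cold key.
rows = ["hot_key"] * 1000 + ["cold_key"] * 10
buckets = Counter(salted_key(k) for k in rows)

# "hot_key" now lands in hot_key_0 .. hot_key_3 instead of a single bucket,
# so no one task has to process all 1000 records.
```

In a Spark job the salted column would become (part of) the shuffle/partition key, and the salt is stripped or aggregated away after the skew-heavy stage.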
[GitHub] [hudi] xushiyan commented on issue #3755: [Delta Streamer] file name mismatch with meta when compaction running
xushiyan commented on issue #3755: URL: https://github.com/apache/hudi/issues/3755#issuecomment-1025291114 > > past > > think this is cause by meta table, I enabled metatable then got this error @fengjian428 did you mean the error went away after you disabled the metadata table?
[GitHub] [hudi] xushiyan commented on issue #3831: Deltastreamer through Pyspark/livy
xushiyan commented on issue #3831: URL: https://github.com/apache/hudi/issues/3831#issuecomment-1025287708 @stackls closing this due to inactivity. If you ever come up with some work to share about Deltastreamer + Livy, I'd be happy to see it and help promote it to the community. Thanks.
[GitHub] [hudi] xushiyan closed issue #3831: Deltastreamer through Pyspark/livy
xushiyan closed issue #3831: URL: https://github.com/apache/hudi/issues/3831
[GitHub] [hudi] nsivabalan opened a new pull request #4722: [MINOR] Fixing powered by pages with logos against company
nsivabalan opened a new pull request #4722: URL: https://github.com/apache/hudi/pull/4722 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] xushiyan closed issue #3933: [SUPPORT] Large amount of disk spill on initial upsert/bulk insert
xushiyan closed issue #3933: URL: https://github.com/apache/hudi/issues/3933
[GitHub] [hudi] xushiyan commented on issue #3933: [SUPPORT] Large amount of disk spill on initial upsert/bulk insert
xushiyan commented on issue #3933: URL: https://github.com/apache/hudi/issues/3933#issuecomment-1025284224 @Limess thanks for the info provided so far on this issue. Closing this now, assuming no further action is needed here.
[GitHub] [hudi] xushiyan commented on issue #3960: [SUPPORT]How to auto sync "add column" by flink ?
xushiyan commented on issue #3960: URL: https://github.com/apache/hudi/issues/3960#issuecomment-1025282665 > IMO, this requires the flink can support schema evolution, current community flink doesn't support it, you can consider use aliyun enterprise flink, we are support it. @0x574C I hope this gives you a good idea about supporting schema evolution in Flink. If you don't have further questions, we may close this. Thanks.
[GitHub] [hudi] xushiyan commented on issue #4411: [SUPPORT] - Presto Querying Issue in AWS EMR 6.3.1
xushiyan commented on issue #4411: URL: https://github.com/apache/hudi/issues/4411#issuecomment-1025274983 > @xushiyan - in 0.5.0, NonpartitionedKeyGenerator doesn't support Composite Primary Keys, so we have used ComplexKeyGenerator and the querying works as expected through Presto 0.230 and not in Presto 0.245.1 @rajgowtham24 OK, do you mean the issue is resolved with Presto 0.230?
[GitHub] [hudi] xushiyan commented on issue #4627: [SUPPORT] Dremio integration
xushiyan commented on issue #4627: URL: https://github.com/apache/hudi/issues/4627#issuecomment-1025272749 @melin can you share more info on the use case? It'll be great if you can come up with an RFC and illustrate more details of the integration proposal.
[GitHub] [hudi] xushiyan commented on issue #4636: [SUPPORT] Sync timeline from embedded timeline server in flink pipline
xushiyan commented on issue #4636: URL: https://github.com/apache/hudi/issues/4636#issuecomment-1025271879 @danny0405 any suggestion on using the timeline server with the flink writer?
[GitHub] [hudi] xushiyan edited a comment on issue #4678: [SUPPORT] spark.read.format("hudi").schema(userSpecifiedSchema) doesn't work in version 0.10.0 ,but does work in 0.5.3
xushiyan edited a comment on issue #4678: URL: https://github.com/apache/hudi/issues/4678#issuecomment-1025263395 @YannByron do you know if we ever use the schema specified in `spark.read.format().schema()` or we always infer from the commit timeline or parquet file?
[GitHub] [hudi] xushiyan commented on issue #4678: [SUPPORT] spark.read.format("hudi").schema(userSpecifiedSchema) doesn't work in version 0.10.0 ,but does work in 0.5.3
xushiyan commented on issue #4678: URL: https://github.com/apache/hudi/issues/4678#issuecomment-1025263395 @YannByron do you know if we ever use the input from `.schema()`, or do we always infer from the commit timeline or parquet file?
[GitHub] [hudi] xushiyan commented on issue #4683: [SUPPORT] Hive ro table read error
xushiyan commented on issue #4683: URL: https://github.com/apache/hudi/issues/4683#issuecomment-1025261010 @waywtdcc the stacktrace shows "unsupported type: optional int96 ts", which does not originate from Hudi. Looks like a problem with the Hive QL specifying bigint? Have you tried a different data type like TIMESTAMP for `ts`? Also, does this `ts` come from `users_cdc_hive` or `user_cdc17_ro`? You probably need to double-check the schemas of both tables and make sure int96 is not used.
[jira] [Commented] (HUDI-3007) Address minor feedbacks on the repair utility
[ https://issues.apache.org/jira/browse/HUDI-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484474#comment-17484474 ] sivabalan narayanan commented on HUDI-3007: --- [~guoyihua] : are we good to close this, or are there any pending items? > Address minor feedbacks on the repair utility > - > > Key: HUDI-3007 > URL: https://issues.apache.org/jira/browse/HUDI-3007 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > >
[jira] [Closed] (HUDI-3002) Add support for running integ test suite jobs in Azure CI once a week
[ https://issues.apache.org/jira/browse/HUDI-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3002. - Resolution: Won't Fix > Add support for running integ test suite jobs in Azure CI once a week > - > > Key: HUDI-3002 > URL: https://issues.apache.org/jira/browse/HUDI-3002 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > > We have an integ test suite job framework to run tests locally in a docker env. We > want to add support to run the same in Azure CI on a weekly cadence. > > >
[jira] [Closed] (HUDI-3014) Hudi commit timeline always uses the fixed local time zone; cloud hosts (e.g. with the spot mechanism) are affected by time zone differences
[ https://issues.apache.org/jira/browse/HUDI-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3014. - Resolution: Fixed > Hudi commit timeline always uses the fixed local time zone; cloud hosts (e.g. with the spot mechanism) are affected by time zone differences > > > Key: HUDI-3014 > URL: https://issues.apache.org/jira/browse/HUDI-3014 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > >
[jira] [Updated] (HUDI-3013) Docs for Presto and Hudi
[ https://issues.apache.org/jira/browse/HUDI-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3013: -- Component/s: trino-presto > Docs for Presto and Hudi > > > Key: HUDI-3013 > URL: https://issues.apache.org/jira/browse/HUDI-3013 > Project: Apache Hudi > Issue Type: Task > Components: trino-presto >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > >
[jira] [Updated] (HUDI-3018) Flag if user df has "_hoodie_is_deleted" field with diff data type other than boolean.
[ https://issues.apache.org/jira/browse/HUDI-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3018: -- Priority: Critical (was: Major) > Flag if user df has "_hoodie_is_deleted" field with diff data type other than > boolean. > --- > > Key: HUDI-3018 > URL: https://issues.apache.org/jira/browse/HUDI-3018 > Project: Apache Hudi > Issue Type: Improvement > Components: Usability >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Critical > Labels: sev:normal > Fix For: 0.11.0 > > > As of now, hudi interprets a special column named "_hoodie_is_deleted": if > set to true, the record is considered a delete, else an update or an insert. > This is not a reserved column as such. For example, a user dataframe can have a > column named "_hoodie_is_deleted" whose data type is a random string. > > Add validations to hudi to ensure that this column's data type is boolean if > present in the df. > > Excerpt from the user: > > I'd suggest: > * Possibly dropping the column (as you say if it has little benefits sure). > If not, documenting the behaviour somewhere. Alternatively, always include > the column, along with the other Hudi metadata fields which are prepended to > the written schema already. > * If the column is not a boolean: > ** Failing hard, as this column is essentially "reserved" for Hudi > ** Taking {{IS NOT NULL}} as truthy >
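The validation this ticket asks for can be sketched roughly as follows. This is a hypothetical illustration against a plain column-name-to-type mapping, not Hudi's actual Spark-based implementation:

```python
# Hypothetical sketch of the requested check; the real validation would
# run against the Spark dataframe schema before writing.
RESERVED_COL = "_hoodie_is_deleted"

def validate_schema(schema):
    """schema: mapping of column name -> type name, e.g. {"id": "string"}."""
    if RESERVED_COL in schema and schema[RESERVED_COL] != "boolean":
        # Fail hard: the column is effectively reserved by Hudi.
        raise TypeError(
            f"{RESERVED_COL} is reserved by Hudi and must be boolean, "
            f"got {schema[RESERVED_COL]}"
        )

# A boolean column (or no such column at all) passes; anything else fails.
validate_schema({"id": "string", "_hoodie_is_deleted": "boolean"})
```

This matches the "failing hard" option from the user's excerpt; the alternative of coercing `IS NOT NULL` to truthy would silently change semantics.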
[jira] [Updated] (HUDI-3040) Fix HoodieSparkBootstrapExample error info for usage
[ https://issues.apache.org/jira/browse/HUDI-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3040: -- Status: Open (was: In Progress) > Fix HoodieSparkBootstrapExample error info for usage > > > Key: HUDI-3040 > URL: https://issues.apache.org/jira/browse/HUDI-3040 > Project: Apache Hudi > Issue Type: Improvement > Components: Code Cleanup >Reporter: Zhou Jianpeng >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2021-12-16-21-06-04-406.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > !image-2021-12-16-21-06-04-406.png|width=692,height=234! > System.err.println("Usage: HoodieSparkBootstrapExample >");
[jira] [Closed] (HUDI-3040) Fix HoodieSparkBootstrapExample error info for usage
[ https://issues.apache.org/jira/browse/HUDI-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3040. - Resolution: Fixed
[jira] [Commented] (HUDI-3335) Loading Hudi table fails with NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484469#comment-17484469 ] Harsha Teja Kanna commented on HUDI-3335: - Hi, thanks. By deleting the metadata and running the sync I am able to load the table again (but the corrupted metadata is gone now). I will run it on another instance of the table to provide the above info. > Loading Hudi table fails with NullPointerException > -- > > Key: HUDI-3335 > URL: https://issues.apache.org/jira/browse/HUDI-3335 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Affects Versions: 0.10.1 >Reporter: Harsha Teja Kanna >Priority: Critical > Labels: hudi-on-call, user-support-issues > Fix For: 0.11.0 > > > Have a COW table with metadata enabled. Loading from a Spark query fails with > java.lang.NullPointerException > *Environment* > Spark 3.1.2 > Hudi 0.10.1 > *Query* > import org.apache.hudi.DataSourceReadOptions > import org.apache.hudi.common.config.HoodieMetadataConfig > val basePath = "s3a://datalake-hudi/v1" > val df = spark. > read. > format("org.apache.hudi"). > option(HoodieMetadataConfig.ENABLE.key(), "true"). > option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL). > load(s"${basePath}/sessions/") > df.createOrReplaceTempView(table) > *Passing an individual partition works though* > val df = spark. > read. > format("org.apache.hudi"). > option(HoodieMetadataConfig.ENABLE.key(), "true"). > option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL). > load(s"${basePath}/sessions/date=2022/01/25") > df.createOrReplaceTempView(table) > *Also, disabling metadata works, but the query takes a very long time* > val df = spark. > read. > format("org.apache.hudi"). > option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL).
> load(s"${basePath}/sessions/") > df.createOrReplaceTempView(table) > *Loading files with stacktrace:* > at > org.sparkproject.guava.base.Preconditions.checkNotNull(Preconditions.java:191) > at org.sparkproject.guava.cache.LocalCache.put(LocalCache.java:4210) > at > org.sparkproject.guava.cache.LocalCache$LocalManualCache.put(LocalCache.java:4804) > at > org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles(FileStatusCache.scala:161) > at > org.apache.hudi.HoodieFileIndex.$anonfun$loadPartitionPathFiles$4(HoodieFileIndex.scala:631) > at > org.apache.hudi.HoodieFileIndex.$anonfun$loadPartitionPathFiles$4$adapted(HoodieFileIndex.scala:629) > at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:234) > at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:468) > at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:468) > at > org.apache.hudi.HoodieFileIndex.loadPartitionPathFiles(HoodieFileIndex.scala:629) > at org.apache.hudi.HoodieFileIndex.refresh0(HoodieFileIndex.scala:387) > at org.apache.hudi.HoodieFileIndex.(HoodieFileIndex.scala:184) > at > org.apache.hudi.DefaultSource.getBaseFileOnlyView(DefaultSource.scala:199) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:119) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:69) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239) > at $anonfun$res3$1(:46) > at $anonfun$res3$1$adapted(:40) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at 
scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > *Writer config* > ** > spark-submit \ > --master yarn \ > --deploy-mode cluster \ > --driver-cores 4 \ > --driver-memory 4g \ > --executor-cores 4 \ > --executor-memory 6g \ > --num-executors 8 \ > --jars > s3://datalake/jars/unused-1.0.0.jar,s3://datalake/jars/spark-avro_2.12-3.1.2.jar > \ > --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ > --conf spark
[jira] [Closed] (HUDI-3046) Claim RFC number for RFC for Compaction / Clustering Service
[ https://issues.apache.org/jira/browse/HUDI-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3046. - Resolution: Fixed > Claim RFC number for RFC for Compaction / Clustering Service > > > Key: HUDI-3046 > URL: https://issues.apache.org/jira/browse/HUDI-3046 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: yuzhaojing >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > >
[jira] [Commented] (HUDI-3050) Unable to find server port
[ https://issues.apache.org/jira/browse/HUDI-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484467#comment-17484467 ] sivabalan narayanan commented on HUDI-3050: --- [~guoyihua] : your attached link is not accessible? Did we already merge any patch in this regard? > Unable to find server port > -- > > Key: HUDI-3050 > URL: https://issues.apache.org/jira/browse/HUDI-3050 > Project: Apache Hudi > Issue Type: Bug > Components: kafka-connect >Reporter: Di Wang >Assignee: Ethan Guo >Priority: Major > Labels: user-support-issues > Fix For: 0.11.0 > > > > {code:java} > // code placeholder > Caused by: org.apache.hudi.exception.HoodieException: Unable to find server > port > at > org.apache.hudi.common.util.NetworkUtils.getHostname(NetworkUtils.java:41) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.setHostAddr(EmbeddedTimelineService.java:104) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.(EmbeddedTimelineService.java:55) > at > org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:70) > at > org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:51) > at > org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:109) > at > org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:77) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:139) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:127) > at > org.apache.hudi.client.HoodieFlinkWriteClient.(HoodieFlinkWriteClient.java:97) > at > org.apache.hudi.util.StreamerUtil.createWriteClient(StreamerUtil.java:402) > at > org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:166) > at > 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:198) > at > org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85) > ... 24 common frames omitted > Caused by: java.net.ConnectException: Connection timed out (Connection timed > out) > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at > org.apache.hudi.common.util.NetworkUtils.getHostname(NetworkUtils.java:38) > ... 37 common frames omitted > {code} > An exception happens; I think the relevant code is below: > https://issues.apache.org/jira/browse/HUDI-3037 > s = new Socket(); > // see > https://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java > // for details. > s.connect(new InetSocketAddress("google.com", 80)); > return s.getLocalAddress().getHostAddress(); > >
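The quoted snippet discovers the local host address by opening a TCP connection to google.com, which is exactly what times out in the stack trace above. As an illustration only (sketched here in Python, not Hudi's actual fix), the same local-address lookup can be done with a UDP socket, whose connect() only consults the routing table and never transmits a packet:

```python
import socket

def local_host_address(probe="8.8.8.8", port=80):
    # connect() on a UDP socket does not send anything; it only picks
    # the local interface address via the routing table, so it works
    # even when outbound TCP is firewalled or times out.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe, port))
        return s.getsockname()[0]
    except OSError:
        # No route at all (e.g. a fully offline host): fall back to loopback.
        return "127.0.0.1"
    finally:
        s.close()
```

The probe host and port here are arbitrary placeholders; any routable address works since no traffic is sent.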
[jira] [Updated] (HUDI-3046) Claim RFC number for RFC for Compaction / Clustering Service
[ https://issues.apache.org/jira/browse/HUDI-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3046: -- Fix Version/s: 0.11.0
[jira] [Updated] (HUDI-3045) new ClusteringPlanStrategy to use regex choose partitions when building clustering plan.
[ https://issues.apache.org/jira/browse/HUDI-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3045: -- Fix Version/s: 0.11.0 > new ClusteringPlanStrategy to use regex choose partitions when building > clustering plan. > > > Key: HUDI-3045 > URL: https://issues.apache.org/jira/browse/HUDI-3045 > Project: Apache Hudi > Issue Type: Improvement > Components: clustering >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > >
[jira] [Updated] (HUDI-3045) new ClusteringPlanStrategy to use regex choose partitions when building clustering plan.
[ https://issues.apache.org/jira/browse/HUDI-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3045: -- Component/s: clustering
[jira] [Updated] (HUDI-3050) Unable to find server port
[ https://issues.apache.org/jira/browse/HUDI-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3050: -- Priority: Critical (was: Major)
[jira] [Updated] (HUDI-3050) Unable to find server port
[ https://issues.apache.org/jira/browse/HUDI-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3050: -- Labels: user-support-issues (was: )
[jira] [Updated] (HUDI-3050) Unable to find server port
[ https://issues.apache.org/jira/browse/HUDI-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3050: -- Component/s: kafka-connect > Unable to find server port > -- > > Key: HUDI-3050 > URL: https://issues.apache.org/jira/browse/HUDI-3050 > Project: Apache Hudi > Issue Type: Bug > Components: kafka-connect >Reporter: Di Wang >Assignee: Ethan Guo >Priority: Major > > > {code:java} > // code placeholder > Caused by: org.apache.hudi.exception.HoodieException: Unable to find server > port > at > org.apache.hudi.common.util.NetworkUtils.getHostname(NetworkUtils.java:41) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.setHostAddr(EmbeddedTimelineService.java:104) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.(EmbeddedTimelineService.java:55) > at > org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:70) > at > org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:51) > at > org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:109) > at > org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:77) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:139) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:127) > at > org.apache.hudi.client.HoodieFlinkWriteClient.(HoodieFlinkWriteClient.java:97) > at > org.apache.hudi.util.StreamerUtil.createWriteClient(StreamerUtil.java:402) > at > org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:166) > at > org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:198) > at > 
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85) > ... 24 common frames omitted > Caused by: java.net.ConnectException: Connection timed out (Connection timed > out) > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at > org.apache.hudi.common.util.NetworkUtils.getHostname(NetworkUtils.java:38) > ... 37 common frames omitted > {code} > An exception occurs; I think the relevant code is below (see also > https://issues.apache.org/jira/browse/HUDI-3037): > s = new Socket(); > // see > https://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java > // for details. > s.connect(new InetSocketAddress("google.com", 80)); > return s.getLocalAddress().getHostAddress(); > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3050) Unable to find server port
[ https://issues.apache.org/jira/browse/HUDI-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3050: -- Fix Version/s: 0.11.0 > Unable to find server port > -- > > Key: HUDI-3050 > URL: https://issues.apache.org/jira/browse/HUDI-3050 > Project: Apache Hudi > Issue Type: Bug > Components: kafka-connect >Reporter: Di Wang >Assignee: Ethan Guo >Priority: Major > Fix For: 0.11.0 > > > > {code:java} > // code placeholder > Caused by: org.apache.hudi.exception.HoodieException: Unable to find server > port > at > org.apache.hudi.common.util.NetworkUtils.getHostname(NetworkUtils.java:41) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.setHostAddr(EmbeddedTimelineService.java:104) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.(EmbeddedTimelineService.java:55) > at > org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:70) > at > org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:51) > at > org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:109) > at > org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:77) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:139) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:127) > at > org.apache.hudi.client.HoodieFlinkWriteClient.(HoodieFlinkWriteClient.java:97) > at > org.apache.hudi.util.StreamerUtil.createWriteClient(StreamerUtil.java:402) > at > org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:166) > at > org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:198) > at > 
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.startAllOperatorCoordinators(DefaultOperatorCoordinatorHandler.java:85) > ... 24 common frames omitted > Caused by: java.net.ConnectException: Connection timed out (Connection timed > out) > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at > org.apache.hudi.common.util.NetworkUtils.getHostname(NetworkUtils.java:38) > ... 37 common frames omitted > {code} > An exception occurs; I think the relevant code is below (see also > https://issues.apache.org/jira/browse/HUDI-3037): > s = new Socket(); > // see > https://stackoverflow.com/questions/9481865/getting-the-ip-address-of-the-current-machine-using-java > // for details. > s.connect(new InetSocketAddress("google.com", 80)); > return s.getLocalAddress().getHostAddress(); > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
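The root cause above is that NetworkUtils.getHostname performs a blocking TCP connect to google.com:80, which times out on hosts without outbound internet access. As a minimal sketch (illustrative only, not the actual Hudi fix; the target address 8.8.8.8 and the fallback chain are assumptions), a UDP "connect" avoids the timeout entirely: it sends no packets and only consults the routing table to pick a local source address.

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;

public class LocalAddressSketch {
    // Resolve the local outbound address without a blocking TCP connect.
    // A UDP "connect" sends no packets; it only sets the peer and lets the
    // OS choose the local source address from the routing table.
    static String localAddress() {
        try (DatagramSocket s = new DatagramSocket()) {
            s.connect(new InetSocketAddress("8.8.8.8", 80));
            InetAddress local = s.getLocalAddress();
            if (!local.isAnyLocalAddress()) {
                return local.getHostAddress();
            }
        } catch (Exception ignored) {
            // no usable route; fall through to the hostname-based fallback
        }
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (Exception e) {
            return "127.0.0.1"; // last resort
        }
    }
}
```

Because nothing is actually sent, this returns immediately even when the host cannot reach the internet, instead of hanging until the connect timeout.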
[jira] [Updated] (HUDI-3055) Make sure that Compression Codec configuration is respected across the board
[ https://issues.apache.org/jira/browse/HUDI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3055: -- Component/s: storage-management > Make sure that Compression Codec configuration is respected across the board > > > Key: HUDI-3055 > URL: https://issues.apache.org/jira/browse/HUDI-3055 > Project: Apache Hudi > Issue Type: Bug > Components: storage-management >Reporter: Alexey Kudinkin >Priority: Major > Labels: newbie > Fix For: 0.11.0 > > > Currently there are quite a few places where we assume GZip as the > compression codec, which is incorrect, given that this is configurable and > users might actually prefer to use a different compression codec. > Examples: > [HoodieParquetDataBlock|https://github.com/apache/hudi/pull/4333/files#diff-798a773c6eef4011aef2da2b2fb71c25f753500548167b610021336ef6f14807] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3055) Make sure that Compression Codec configuration is respected across the board
[ https://issues.apache.org/jira/browse/HUDI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3055: -- Fix Version/s: 0.11.0 > Make sure that Compression Codec configuration is respected across the board > > > Key: HUDI-3055 > URL: https://issues.apache.org/jira/browse/HUDI-3055 > Project: Apache Hudi > Issue Type: Bug >Reporter: Alexey Kudinkin >Priority: Major > Labels: newbie > Fix For: 0.11.0 > > > Currently there are quite a few places where we assume GZip as the > compression codec, which is incorrect, given that this is configurable and > users might actually prefer to use a different compression codec. > Examples: > [HoodieParquetDataBlock|https://github.com/apache/hudi/pull/4333/files#diff-798a773c6eef4011aef2da2b2fb71c25f753500548167b610021336ef6f14807] -- This message was sent by Atlassian Jira (v8.20.1#820001)
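To make the HUDI-3055 issue concrete, here is a hedged sketch of resolving the codec from write config with a default, instead of hardcoding GZip. The key name mirrors Hudi's parquet compression option but is used illustratively, and the set of accepted codecs is an assumption (the real supported list depends on the Parquet build).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

public class CompressionCodecSketch {
    // Illustrative key name, modeled on Hudi's parquet compression option.
    static final String CODEC_KEY = "hoodie.parquet.compression.codec";
    // Assumed codec set for validation purposes only.
    static final Set<String> KNOWN = new HashSet<>(
        Arrays.asList("gzip", "snappy", "lzo", "zstd", "uncompressed"));

    // Default to gzip only when the user did not configure a codec --
    // never silently override a configured value.
    static String resolveCodec(Properties writeConfig) {
        String codec = writeConfig.getProperty(CODEC_KEY, "gzip")
                                  .toLowerCase(Locale.ROOT);
        if (!KNOWN.contains(codec)) {
            throw new IllegalArgumentException("Unknown compression codec: " + codec);
        }
        return codec;
    }
}
```

Any code path that currently says "GZip" inline would instead call a resolver like this, so a user-configured codec flows through uniformly.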
[jira] [Updated] (HUDI-3062) savepoint rollback of last but one savepoint fails
[ https://issues.apache.org/jira/browse/HUDI-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3062: -- Priority: Critical (was: Major) > savepoint rollback of last but one savepoint fails > -- > > Key: HUDI-3062 > URL: https://issues.apache.org/jira/browse/HUDI-3062 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Priority: Critical > Labels: sev:critical > Fix For: 0.11.0 > > > so, I created 2 savepoints as below. > c1, c2, c3, sp1, c4, sp2, c5. > tried savepoint rollback for sp2 and it worked. but left trailing rollback > meta files. > again tried to savepoint roll back with sp1 and it failed. stacktrace does > not have sufficient info. > {code:java} > 21/12/18 06:20:00 INFO HoodieActiveTimeline: Loaded instants upto : > Option{val=[==>20211218061954430__rollback__REQUESTED]} > 21/12/18 06:20:00 INFO BaseRollbackPlanActionExecutor: Requesting Rollback > with instant time [==>20211218061954430__rollback__REQUESTED] > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 66 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 26 > 21/12/18 06:20:00 INFO BlockManagerInfo: Removed broadcast_3_piece0 on > 192.168.1.4:54359 in memory (size: 25.5 KB, free: 366.2 MB) > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 110 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 99 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 47 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 21 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 43 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 55 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 104 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 124 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 29 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 91 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 123 > 21/12/18 
06:20:00 INFO ContextCleaner: Cleaned accumulator 120 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 25 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 32 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 92 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 76 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 89 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 102 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 50 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 49 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 116 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 96 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 118 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 44 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 60 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 87 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 77 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 75 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 9 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 72 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 2 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 37 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 113 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 67 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 28 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 95 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 59 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 68 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 45 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 39 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 74 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 20 > 21/12/18 06:20:00 
INFO ContextCleaner: Cleaned accumulator 90 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 56 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 58 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 61 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 13 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 46 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 101 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 105 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 81 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 63 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 78 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 4 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned ac
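The failure pattern reported in HUDI-3062 (restore to sp2 succeeds, a subsequent restore to sp1 fails) is consistent with later timeline instants getting in the way of the earlier restore. As a purely hypothetical model of that ordering constraint (this is not Hudi's actual restore code, and the real failure may instead stem from the trailing rollback metadata files):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SavepointRestoreSketch {
    // Hypothetical model: restoring to a savepoint removes every instant
    // after it; a later savepoint still on the timeline blocks the restore,
    // because removing it would silently invalidate that savepoint.
    static List<String> instantsToRemove(List<String> timeline,
                                         Set<String> savepoints,
                                         String restoreTo) {
        int idx = timeline.indexOf(restoreTo);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown instant: " + restoreTo);
        }
        List<String> later = timeline.subList(idx + 1, timeline.size());
        for (String instant : later) {
            if (savepoints.contains(instant)) {
                throw new IllegalStateException("Delete later savepoint "
                    + instant + " before restoring to " + restoreTo);
            }
        }
        return new ArrayList<>(later);
    }
}
```

Under this model, on the timeline c1, c2, c3, sp1, c4, sp2, c5 from the report, restoring to sp2 removes only c5, while restoring to sp1 is rejected until sp2 is deleted, matching the observed order sensitivity.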
[GitHub] [hudi] xushiyan commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet
xushiyan commented on issue #4690: URL: https://github.com/apache/hudi/issues/4690#issuecomment-1025238050 @wqwl611 we need to know more details like the code you're trying to execute and the environment to help reproduce the issue. Also spark 3.2 is not supported in hudi 0.10.0. Please try spark 3.1 or 3.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on issue #4698: [SUPPORT] RowDataToAvroConverters does not support data in flink timestamp_ltz (timestamp_with_local_time_zone) format.
xushiyan commented on issue #4698: URL: https://github.com/apache/hudi/issues/4698#issuecomment-1025233718 @hanjun996 can you share environment configs, software versions, and code snippet to help us reproduce the issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on issue #4700: [SUPPORT] Adding new column to table is not propagated to Hive via HMS sync mode
xushiyan commented on issue #4700: URL: https://github.com/apache/hudi/issues/4700#issuecomment-1025231731 @117th hudi is certified with hive 2.3.x. There could be issues working with hive 3.0.0. Can you see if it works with hive 2.3.3? @codope do you happen to know the full list of compatible hive versions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (HUDI-3335) Loading Hudi table fails with NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484420#comment-17484420 ] Harsha Teja Kanna edited comment on HUDI-3335 at 1/30/22, 8:17 PM: --- Hi, I cannot do that immediately(I will have to check), also this is a very large table to reproduce. But I have see this happen on the same table after creating it for the second time. I will try to delete the metadata folder and re-run sync to see if that helps. Also will try to see if I can reproduce this on any small table. was (Author: h7kanna): Hi, I cannot do that immediately(I will have to check), also this is a very large to reproduce. But I have see this happen on the same table after creating it for the second time. I will try to delete the metadata folder and re-run sync to see if that helps. Also will try to see if I can reproduce this on any small table. > Loading Hudi table fails with NullPointerException > -- > > Key: HUDI-3335 > URL: https://issues.apache.org/jira/browse/HUDI-3335 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Affects Versions: 0.10.1 >Reporter: Harsha Teja Kanna >Priority: Critical > Labels: hudi-on-call, user-support-issues > Fix For: 0.11.0 > > > Have a COW table with metadata enabled. Loading from Spark query fails with > java.lang.NullPointerException > *Environment* > Spark 3.1.2 > Hudi 0.10.1 > *Query* > import org.apache.hudi.DataSourceReadOptions > import org.apache.hudi.common.config.HoodieMetadataConfig > val basePath = "s3a://datalake-hudi/v1" > val df = spark. > read. > format("org.apache.hudi"). > option(HoodieMetadataConfig.ENABLE.key(), "true"). > option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL). > load(s"${basePath}/sessions/") > df.createOrReplaceTempView(table) > *Passing an individual partition works though* > val df = spark. > read. > format("org.apache.hudi"). 
> option(HoodieMetadataConfig.ENABLE.key(), "true"). > option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL). > load(s"${basePath}/sessions/date=2022/01/25") > df.createOrReplaceTempView(table) > *Also, disabling metadata works, but the query taking very long time* > val df = spark. > read. > format("org.apache.hudi"). > option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL). > load(s"${basePath}/sessions/") > df.createOrReplaceTempView(table) > *Loading files with stacktrace:* > at > org.sparkproject.guava.base.Preconditions.checkNotNull(Preconditions.java:191) > at org.sparkproject.guava.cache.LocalCache.put(LocalCache.java:4210) > at > org.sparkproject.guava.cache.LocalCache$LocalManualCache.put(LocalCache.java:4804) > at > org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles(FileStatusCache.scala:161) > at > org.apache.hudi.HoodieFileIndex.$anonfun$loadPartitionPathFiles$4(HoodieFileIndex.scala:631) > at > org.apache.hudi.HoodieFileIndex.$anonfun$loadPartitionPathFiles$4$adapted(HoodieFileIndex.scala:629) > at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:234) > at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:468) > at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:468) > at > org.apache.hudi.HoodieFileIndex.loadPartitionPathFiles(HoodieFileIndex.scala:629) > at org.apache.hudi.HoodieFileIndex.refresh0(HoodieFileIndex.scala:387) > at org.apache.hudi.HoodieFileIndex.(HoodieFileIndex.scala:184) > at > org.apache.hudi.DefaultSource.getBaseFileOnlyView(DefaultSource.scala:199) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:119) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:69) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355) > at > 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239) > at $anonfun$res3$1(:46) > at $anonfun$res3$1$adapted(:40) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach
[jira] [Updated] (HUDI-3254) Introduce HoodieCatalog to manage tables for Spark Datasource V2
[ https://issues.apache.org/jira/browse/HUDI-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3254: - Status: In Progress (was: Open) > Introduce HoodieCatalog to manage tables for Spark Datasource V2 > > > Key: HUDI-3254 > URL: https://issues.apache.org/jira/browse/HUDI-3254 > Project: Apache Hudi > Issue Type: New Feature > Components: spark >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available, sev:normal > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2732) Spark Datasource V2 integration RFC
[ https://issues.apache.org/jira/browse/HUDI-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2732: - Status: Patch Available (was: In Progress) > Spark Datasource V2 integration RFC > > > Key: HUDI-2732 > URL: https://issues.apache.org/jira/browse/HUDI-2732 > Project: Apache Hudi > Issue Type: Task > Components: spark >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3254) Introduce HoodieCatalog to manage tables for Spark Datasource V2
[ https://issues.apache.org/jira/browse/HUDI-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3254: - Status: Patch Available (was: In Progress) > Introduce HoodieCatalog to manage tables for Spark Datasource V2 > > > Key: HUDI-3254 > URL: https://issues.apache.org/jira/browse/HUDI-3254 > Project: Apache Hudi > Issue Type: New Feature > Components: spark >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available, sev:normal > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2732) Spark Datasource V2 integration RFC
[ https://issues.apache.org/jira/browse/HUDI-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2732: - Status: In Progress (was: Open) > Spark Datasource V2 integration RFC > > > Key: HUDI-2732 > URL: https://issues.apache.org/jira/browse/HUDI-2732 > Project: Apache Hudi > Issue Type: Task > Components: spark >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3062) savepoint rollback of last but one savepoint fails
[ https://issues.apache.org/jira/browse/HUDI-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3062: -- Fix Version/s: 0.11.0 > savepoint rollback of last but one savepoint fails > -- > > Key: HUDI-3062 > URL: https://issues.apache.org/jira/browse/HUDI-3062 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical > Fix For: 0.11.0 > > > so, I created 2 savepoints as below. > c1, c2, c3, sp1, c4, sp2, c5. > tried savepoint rollback for sp2 and it worked. but left trailing rollback > meta files. > again tried to savepoint roll back with sp1 and it failed. stacktrace does > not have sufficient info. > {code:java} > 21/12/18 06:20:00 INFO HoodieActiveTimeline: Loaded instants upto : > Option{val=[==>20211218061954430__rollback__REQUESTED]} > 21/12/18 06:20:00 INFO BaseRollbackPlanActionExecutor: Requesting Rollback > with instant time [==>20211218061954430__rollback__REQUESTED] > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 66 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 26 > 21/12/18 06:20:00 INFO BlockManagerInfo: Removed broadcast_3_piece0 on > 192.168.1.4:54359 in memory (size: 25.5 KB, free: 366.2 MB) > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 110 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 99 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 47 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 21 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 43 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 55 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 104 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 124 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 29 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 91 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 123 > 21/12/18 06:20:00 INFO 
ContextCleaner: Cleaned accumulator 120 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 25 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 32 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 92 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 76 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 89 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 102 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 50 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 49 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 116 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 96 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 118 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 44 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 60 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 87 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 77 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 75 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 9 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 72 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 2 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 37 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 113 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 67 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 28 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 95 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 59 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 68 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 45 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 39 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 74 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 20 > 21/12/18 06:20:00 INFO 
ContextCleaner: Cleaned accumulator 90 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 56 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 58 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 61 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 13 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 46 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 101 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 105 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 81 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 63 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 78 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 4 > 21/12/18 06:20:00 INFO ContextCleaner: Cleaned accumulator 31 >
[jira] [Commented] (HUDI-3066) Very slow file listing after enabling metadata for existing tables in 0.10.0 release
[ https://issues.apache.org/jira/browse/HUDI-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484429#comment-17484429 ] sivabalan narayanan commented on HUDI-3066: --- CC [~guoyihua] > Very slow file listing after enabling metadata for existing tables in 0.10.0 > release > > > Key: HUDI-3066 > URL: https://issues.apache.org/jira/browse/HUDI-3066 > Project: Apache Hudi > Issue Type: Bug > Components: metadata, reader-core >Affects Versions: 0.10.0 > Environment: EMR 6.4.0 > Hudi version : 0.10.0 >Reporter: Harsha Teja Kanna >Assignee: sivabalan narayanan >Priority: Critical > Labels: performance, pull-request-available > Fix For: 0.11.0 > > Attachments: Screen Shot 2021-12-18 at 6.16.29 PM.png, Screen Shot > 2021-12-20 at 10.05.50 PM.png, Screen Shot 2021-12-20 at 10.17.44 PM.png, > Screen Shot 2021-12-21 at 10.22.54 PM.png, Screen Shot 2021-12-21 at 10.24.12 > PM.png, metadata_files.txt, metadata_files_compacted.txt, > metadata_timeline.txt, metadata_timeline_archived.txt, > metadata_timeline_compacted.txt, stderr_part1.txt, stderr_part2.txt, > timeline.txt, writer_log.txt > > > After 'metadata table' is enabled, File listing takes long time. > If metadata is enabled on Reader side(as shown below), it is taking even more > time per file listing task > {code:java} > import org.apache.hudi.DataSourceReadOptions > import org.apache.hudi.common.config.HoodieMetadataConfig > val hadoopConf = spark.conf > hadoopConf.set(HoodieMetadataConfig.ENABLE.key(), "true") > val basePath = "s3a://datalake-hudi" > val sessions = spark > .read > .format("org.apache.hudi") > .option(DataSourceReadOptions.QUERY_TYPE.key(), > DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL) > .option(DataSourceReadOptions.READ_PATHS.key(), > s"${basePath}/sessions_by_entrydate/entrydate=2021/*/*/*") > .load() > sessions.createOrReplaceTempView("sessions") {code} > Existing tables (COW) have inline clustering on and have many replace commits. 
> Logs seem to suggest the delay is in view.AbstractTableFileSystemView > resetFileGroupsReplaced function or metadata.HoodieBackedTableMetadata > Also many log messages in AbstractHoodieLogRecordReader > > 2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms > to read 136 instants, 9731 replaced file groups > 2021-12-18 23:37:46,086 INFO log.AbstractHoodieLogRecordReader: Number of > remaining logblocks to merge 1 > 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Reading a > data block from file > s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.76_0-20-515 > at instant 20211217035105329 > 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Number of > remaining logblocks to merge 1 > 2021-12-18 23:37:46,094 INFO log.HoodieLogFormatReader: Moving to the next > reader for logfile > HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663', > fileLen=0} > 2021-12-18 23:37:46,095 INFO log.AbstractHoodieLogRecordReader: Scanning log > file > HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613', > fileLen=0} > 2021-12-18 23:37:46,095 INFO s3a.S3AInputStream: Switching to Random IO seek > policy > 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Reading a > data block from file > s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.62_0-34-377 > at instant 20211217022049877 > 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Number of > remaining logblocks to merge 1 > 2021-12-18 23:37:46,105 INFO log.HoodieLogFormatReader: Moving to the next > reader for logfile > HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.86_0-20-362', > fileLen=0} > 2021-12-18 23:37:46,109 INFO log.AbstractHoodieLogRecordReader: Scanning log > file > 
HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663', > fileLen=0} > 2021-12-18 23:37:46,109 INFO s3a.S3AInputStream: Switching to Random IO seek > policy > 2021-12-18 23:37:46,110 INFO log.HoodieLogFormatReader: Moving to the next > reader for logfile > HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.77_0-35-590', > fileLen=0} > 2021-12-18 23:37:46,112 INFO log.AbstractHoodieLogRecordReader: Reading a > data block from file > s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613 > at ins
[jira] [Updated] (HUDI-3068) Add support to sync all partitions in hive sync tool
[ https://issues.apache.org/jira/browse/HUDI-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3068: -- Fix Version/s: 0.11.0 > Add support to sync all partitions in hive sync tool > > > Key: HUDI-3068 > URL: https://issues.apache.org/jira/browse/HUDI-3068 > Project: Apache Hudi > Issue Type: Improvement > Components: hive >Reporter: sivabalan narayanan >Assignee: Harshal Patil >Priority: Major > Labels: pull-request-available, sev:critical > Fix For: 0.11.0 > > > If a user runs hive sync occasionally, and archival kicked in and trimmed some commits, and partitions added during those commits were never updated later, hive sync will miss those partitions. > {code:java} > LOG.info("Last commit time synced is " + lastCommitTimeSynced.get() + ", Getting commits since then"); > return TimelineUtils.getPartitionsWritten(metaClient.getActiveTimeline().getCommitsTimeline() > .findInstantsAfter(lastCommitTimeSynced.get(), Integer.MAX_VALUE)); > } {code} > That is because, for recurrent syncs, we always fetch new commits from the timeline after the last synced instant, fetch the commit metadata, and then fetch the partitions added as part of those commits. > > We can add a new config to the hive sync tool to override this behavior: > --sync-all-partitions > When this config is set to true, we should ignore the last synced instant and take the route below, which is used when syncing for the first time. 
> > {code:java} > if (!lastCommitTimeSynced.isPresent()) { > LOG.info("Last commit time synced is not known, listing all partitions in " > + basePath + ",FS :" + fs); > HoodieLocalEngineContext engineContext = new > HoodieLocalEngineContext(metaClient.getHadoopConf()); > return FSUtils.getAllPartitionPaths(engineContext, basePath, > useFileListingFromMetadata, assumeDatePartitioning); > } {code} > > > Ref issue: > https://github.com/apache/hudi/issues/3890 -- This message was sent by Atlassian Jira (v8.20.1#820001)
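The proposal above boils down to one branch in the partition-fetch path: when --sync-all-partitions is set (or no last synced instant exists), fall back to a full listing instead of the incremental timeline scan. A minimal sketch under assumed names — PartitionFetcher and its methods are illustrative stand-ins, not the actual HiveSyncTool API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the proposed --sync-all-partitions behavior:
// ignore the last synced instant and list every partition, so partitions
// created in commits that were since archived are not missed.
class PartitionFetcher {
  private final boolean syncAllPartitions;

  PartitionFetcher(boolean syncAllPartitions) {
    this.syncAllPartitions = syncAllPartitions;
  }

  List<String> fetchPartitions(Optional<String> lastCommitTimeSynced) {
    if (syncAllPartitions || !lastCommitTimeSynced.isPresent()) {
      // First sync, or explicit override: list all partitions under the
      // base path (stand-in for FSUtils.getAllPartitionPaths).
      return listAllPartitions();
    }
    // Incremental sync: only partitions written after the last synced instant
    // (stand-in for TimelineUtils.getPartitionsWritten on findInstantsAfter).
    return partitionsWrittenAfter(lastCommitTimeSynced.get());
  }

  List<String> listAllPartitions() {
    return Arrays.asList("2021/12/01", "2021/12/02", "2021/12/03");
  }

  List<String> partitionsWrittenAfter(String instant) {
    // Pretend only one partition was touched after the given instant.
    return Arrays.asList("2021/12/03");
  }
}
```

With the flag on, an occasional sync would pick up partitions whose creating commits were already archived, at the cost of a full partition listing.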
[jira] [Assigned] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb
[ https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3067: - Assignee: Wenning Ding

> "Table already exists" error with multiple writers and dynamodb
>
> Key: HUDI-3067
> URL: https://issues.apache.org/jira/browse/HUDI-3067
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Nikita Sheremet
> Assignee: Wenning Ding
> Priority: Major
>
> How to reproduce:
> # Set up multiple writers ([https://hudi.apache.org/docs/concurrency_control/]) for DynamoDB (do not forget to set _hoodie.write.lock.dynamodb.region_ and {_}hoodie.write.lock.dynamodb.billing_mode{_}). Do not create any DynamoDB table.
> # Run multiple writers to the table.
> (Tested on AWS EMR, so the multiple writers are EMR steps.)
> Expected result: all steps completed.
> Actual result: some steps failed with this exception:
> {code:java}
> Caused by: com.amazonaws.services.dynamodbv2.model.ResourceInUseException: Table already exists: truedata_detections (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceInUseException; Request ID:; Proxy: null)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:6214)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:6181)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeCreateTable(AmazonDynamoDBClient.java:1160)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.createTable(AmazonDynamoDBClient.java:1124)
> at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.createLockTableInDynamoDB(DynamoDBBasedLockProvider.java:188)
> at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:99)
> at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:77)
> ... 54 more
> 21/12/19 13:42:06 INFO Yar {code}
> This happens because all steps tried to create the table at the same time.
>
> Suggested solution:
> A catch statement for the _Table already exists_ exception should be added to the DynamoDB table creation code, possibly with a delay and an additional check that the table is present.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
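The suggested fix is essentially idempotent table creation: treat the already-exists error as success, then verify the table really is present. A minimal sketch under assumed names — TableClient and TableExistsException are illustrative stand-ins for the AWS SDK's AmazonDynamoDB client and its ResourceInUseException, not real Hudi or AWS types:

```java
// Hypothetical sketch: several writers race to create the same lock table;
// the loser of the race should not fail, as long as the table ends up existing.
interface TableClient {
  void createTable(String name) throws TableExistsException;
  boolean tableExists(String name);
}

class TableExistsException extends Exception {}

class IdempotentTableCreator {
  static boolean ensureTable(TableClient client, String name) {
    try {
      client.createTable(name);
      return true; // we created it ourselves
    } catch (TableExistsException e) {
      // Another writer won the race; as the report suggests, optionally
      // delay here, then confirm the table is actually present.
      return client.tableExists(name);
    }
  }
}
```

In the real provider this catch would wrap the createTable call inside DynamoDBBasedLockProvider.createLockTableInDynamoDB, possibly plus a wait for the table to become usable.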
[jira] [Commented] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb
[ https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484428#comment-17484428 ] sivabalan narayanan commented on HUDI-3067: --- [~wenningd] : Can you please follow up on this? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb
[ https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3067: -- Fix Version/s: 0.11.0 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3073) 0.10.0 Missing Docs
[ https://issues.apache.org/jira/browse/HUDI-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3073: -- Component/s: docs > 0.10.0 Missing Docs > --- > > Key: HUDI-3073 > URL: https://issues.apache.org/jira/browse/HUDI-3073 > Project: Apache Hudi > Issue Type: Epic > Components: docs >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Major > > List of docs that were missed in 0.10.0 release -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3074) Docs for Z-order
[ https://issues.apache.org/jira/browse/HUDI-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3074: -- Component/s: docs > Docs for Z-order > > > Key: HUDI-3074 > URL: https://issues.apache.org/jira/browse/HUDI-3074 > Project: Apache Hudi > Issue Type: Task > Components: clustering, docs >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3071) Release 0.10.1 Prep
[ https://issues.apache.org/jira/browse/HUDI-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3071. - Resolution: Invalid > Release 0.10.1 Prep > --- > > Key: HUDI-3071 > URL: https://issues.apache.org/jira/browse/HUDI-3071 > Project: Apache Hudi > Issue Type: Improvement > Components: Release & Administrative >Reporter: sivabalan narayanan >Priority: Major > > We would like to do a minor 0.10.1 release with critical bug fixes on top of > 0.10.0. > Will use this ticket to track all important fixes that needs to be pulled > into 0.10.1. > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3075) Docs for Debezium source
[ https://issues.apache.org/jira/browse/HUDI-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3075: -- Component/s: docs > Docs for Debezium source > > > Key: HUDI-3075 > URL: https://issues.apache.org/jira/browse/HUDI-3075 > Project: Apache Hudi > Issue Type: Task > Components: deltastreamer, docs >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3075) Docs for Debezium source
[ https://issues.apache.org/jira/browse/HUDI-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3075: -- Component/s: deltastreamer > Docs for Debezium source > > > Key: HUDI-3075 > URL: https://issues.apache.org/jira/browse/HUDI-3075 > Project: Apache Hudi > Issue Type: Task > Components: deltastreamer >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3074) Docs for Z-order
[ https://issues.apache.org/jira/browse/HUDI-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3074: -- Component/s: clustering > Docs for Z-order > > > Key: HUDI-3074 > URL: https://issues.apache.org/jira/browse/HUDI-3074 > Project: Apache Hudi > Issue Type: Task > Components: clustering >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3081) Revisiting Read Path Infra across Query Engines
[ https://issues.apache.org/jira/browse/HUDI-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3081: -- Component/s: reader-core > Revisiting Read Path Infra across Query Engines > --- > > Key: HUDI-3081 > URL: https://issues.apache.org/jira/browse/HUDI-3081 > Project: Apache Hudi > Issue Type: Epic > Components: reader-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.11.0 > > > Currently, our Read-path infrastructure is mostly disparate for each > individual Query Engine having the same flow replicated multiple times: > * Hive leverages hierarchy based off `InputFormat` class > * Spark leverages hierarchy based off `SnapshotRelation` > This leads to substantial duplication of virtually the same flows being > replicated multiple times and unfortunately now diverging due to out of sync > lifecycle (bug-fixes, etc). > h3. Proposal > > *Phase 1: Abstracting Common Functionality* > > {_}T-shirt{_}: 1-1.5 weeks > {_}Goal{_}: Abstract following common items to avoid duplication of the > complex sequences across Engines > * Unify Hive’s RecordReaders (`RealtimeCompactedRecordReader`, > {{{}RealtimeUnmergedRecordReader{}}}) > * > ** _These Readers should only differ in the way they handle the payload, > everything else should remain constant_ > * Abstract w/in common component (name TBD) > ** Listing current file-slices at the requested instant (handling the > timeline) > ** Creating Record Iterator for the provided file-slice > > REF > [https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680] -- This message was sent by Atlassian Jira (v8.20.1#820001)
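The Phase 1 proposal above can be pictured as one engine-agnostic component that owns file-slice listing at a requested instant and record-iterator creation per slice, so engines (Hive, Spark) only differ in payload handling. An illustrative sketch, not the actual RFC design — all names below are invented for this example:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical common read-path abstraction: timeline handling and per-slice
// iteration live in one place, instead of being replicated per query engine.
interface FileSliceReader<R> {
  List<String> listFileSlices(String instantTime); // timeline handling
  Iterator<R> open(String fileSlice);              // per-slice record iterator
}

// Toy in-memory implementation demonstrating the separation of concerns.
class InMemorySliceReader implements FileSliceReader<String> {
  private final Map<String, List<String>> slicesByInstant;
  private final Map<String, List<String>> recordsBySlice;

  InMemorySliceReader(Map<String, List<String>> slicesByInstant,
                      Map<String, List<String>> recordsBySlice) {
    this.slicesByInstant = slicesByInstant;
    this.recordsBySlice = recordsBySlice;
  }

  @Override
  public List<String> listFileSlices(String instantTime) {
    return slicesByInstant.getOrDefault(instantTime, Collections.<String>emptyList());
  }

  @Override
  public Iterator<String> open(String fileSlice) {
    return recordsBySlice.getOrDefault(fileSlice, Collections.<String>emptyList()).iterator();
  }
}
```

Each engine's reader (e.g. the unified Hive RecordReaders) would then wrap such a component and supply only its own merge/payload logic.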
[jira] [Updated] (HUDI-3079) Docs for Flink 0.10.0 new features
[ https://issues.apache.org/jira/browse/HUDI-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3079: -- Component/s: flink > Docs for Flink 0.10.0 new features > -- > > Key: HUDI-3079 > URL: https://issues.apache.org/jira/browse/HUDI-3079 > Project: Apache Hudi > Issue Type: Task > Components: flink >Reporter: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3079) Docs for Flink 0.10.0 new features
[ https://issues.apache.org/jira/browse/HUDI-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3079: -- Component/s: docs > Docs for Flink 0.10.0 new features > -- > > Key: HUDI-3079 > URL: https://issues.apache.org/jira/browse/HUDI-3079 > Project: Apache Hudi > Issue Type: Task > Components: docs, flink >Reporter: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3085) Refactor fileId & writeHandler logic into partitioner for bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3085: -- Component/s: writer-core > Refactor fileId & writeHandler logic into partitioner for bulk_insert > - > > Key: HUDI-3085 > URL: https://issues.apache.org/jira/browse/HUDI-3085 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: Yuwei Xiao >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > a better partitioner abstraction for bulk_insert -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3085) Refactor fileId & writeHandler logic into partitioner for bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3085: -- Fix Version/s: 0.11.0 > Refactor fileId & writeHandler logic into partitioner for bulk_insert > - > > Key: HUDI-3085 > URL: https://issues.apache.org/jira/browse/HUDI-3085 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Yuwei Xiao >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > a better partitioner abstraction for bulk_insert -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3087) Fix DedupeSparkJob typo
[ https://issues.apache.org/jira/browse/HUDI-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3087: -- Fix Version/s: 0.11.0 > Fix DedupeSparkJob typo > --- > > Key: HUDI-3087 > URL: https://issues.apache.org/jira/browse/HUDI-3087 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: Zhou Jianpeng >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2021-12-22-00-03-14-430.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > !image-2021-12-22-00-03-14-430.png|width=582,height=179! > line 236 may be step 5 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3087) Fix DedupeSparkJob typo
[ https://issues.apache.org/jira/browse/HUDI-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3087: -- Status: Patch Available (was: In Progress) > Fix DedupeSparkJob typo > --- > > Key: HUDI-3087 > URL: https://issues.apache.org/jira/browse/HUDI-3087 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: Zhou Jianpeng >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2021-12-22-00-03-14-430.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > !image-2021-12-22-00-03-14-430.png|width=582,height=179! > line 236 may be step 5 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3087) Fix DedupeSparkJob typo
[ https://issues.apache.org/jira/browse/HUDI-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3087. - Resolution: Fixed > Fix DedupeSparkJob typo > --- > > Key: HUDI-3087 > URL: https://issues.apache.org/jira/browse/HUDI-3087 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: Zhou Jianpeng >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2021-12-22-00-03-14-430.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > !image-2021-12-22-00-03-14-430.png|width=582,height=179! > line 236 may be step 5 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3089) Add tests for S3 Incremental source
[ https://issues.apache.org/jira/browse/HUDI-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-3089. - Assignee: sivabalan narayanan Resolution: Won't Fix > Add tests for S3 Incremental source > --- > > Key: HUDI-3089 > URL: https://issues.apache.org/jira/browse/HUDI-3089 > Project: Apache Hudi > Issue Type: Improvement > Components: deltastreamer >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: sev:high > > The S3 incremental source does not have good test coverage. We need to add tests for it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HUDI-3089) Add tests for S3 Incremental source
[ https://issues.apache.org/jira/browse/HUDI-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484427#comment-17484427 ] sivabalan narayanan commented on HUDI-3089: --- I don't think this is doable: the tests would need to run against the S3 file system, since we hard-code it in these sources. Closing the ticket. > Add tests for S3 Incremental source > --- > > Key: HUDI-3089 > URL: https://issues.apache.org/jira/browse/HUDI-3089 > Project: Apache Hudi > Issue Type: Improvement > Components: deltastreamer >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:high > > The S3 incremental source does not have good test coverage. We need to add tests for it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3103) Enable MultiTableDeltaStreamer to update a single target table from multiple source tables.
[ https://issues.apache.org/jira/browse/HUDI-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3103: -- Fix Version/s: 0.11.0 > Enable MultiTableDeltaStreamer to update a single target table from multiple > source tables. > --- > > Key: HUDI-3103 > URL: https://issues.apache.org/jira/browse/HUDI-3103 > Project: Apache Hudi > Issue Type: New Feature > Components: deltastreamer >Reporter: YangXuan >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > MultiTableDeltaStreamer works well when it updates table a' from table a and table b' from table b, because it starts two threads: one to update table a' and the other to update table b'. But if the business scenario requires table a and table b to update table c at the same time, MultiTableDeltaStreamer may encounter a write conflict and fail to complete. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3113) Kafka Connect create Multiple Embedded Timeline Services
[ https://issues.apache.org/jira/browse/HUDI-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3113: - Assignee: Ethan Guo > Kafka Connect create Multiple Embedded Timeline Services > > > Key: HUDI-3113 > URL: https://issues.apache.org/jira/browse/HUDI-3113 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect >Reporter: cdmikechen >Assignee: Ethan Guo >Priority: Major > > After Kafka Connect started, I've found that Hudi will create more than one > Embedded Timeline Service. > {code} > [2021-12-28 07:52:57,154] INFO Starting Timeline service !! > (org.apache.hudi.client.embedded.EmbeddedTimelineService) > [2021-12-28 07:52:57,155] WARN Unable to find driver bind address from spark > config (org.apache.hudi.client.embedded.EmbeddedTimelineService) > [2021-12-28 07:52:57,169] INFO Creating View Manager with storage type > :MEMORY (org.apache.hudi.common.table.view.FileSystemViewManager) > [2021-12-28 07:52:57,170] INFO Creating in-memory based Table View > (org.apache.hudi.common.table.view.FileSystemViewManager) > [2021-12-28 07:52:57,184] INFO Logging initialized @27658ms to > org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog > (org.apache.hudi.org.eclipse.jetty.util.log) > [2021-12-28 07:52:57,502] INFO > __ __ _ > / / _ _ __ _ / /(_) > __ / // __ `/| | / // __ `// // // __ \ > / /_/ // /_/ / | |/ // /_/ // // // / / / > \/ \__,_/ |___/ \__,_//_//_//_/ /_/ > https://javalin.io/documentation > (io.javalin.Javalin) > [2021-12-28 07:52:57,504] INFO Starting Javalin ... 
(io.javalin.Javalin) > [2021-12-28 07:52:57,650] INFO Listening on http://localhost:43691/ > (io.javalin.Javalin) > [2021-12-28 07:52:57,650] INFO Javalin started in 151ms \o/ > (io.javalin.Javalin) > [2021-12-28 07:52:57,650] INFO Starting Timeline server on port :43691 > (org.apache.hudi.timeline.service.TimelineService) > [2021-12-28 07:52:57,650] INFO Started embedded timeline server at > 172.17.0.7:43691 (org.apache.hudi.client.embedded.EmbeddedTimelineService) > [2021-12-28 07:52:57,661] INFO Start Transaction Coordinator for topic > hudi-test-topic partition 0 > (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator) > [2021-12-28 07:52:57,690] INFO Loaded instants upto : > Option\{val=[==>20211228075022280__commit__INFLIGHT]} > (org.apache.hudi.common.table.timeline.HoodieActiveTimeline) > [2021-12-28 07:52:57,822] INFO Retrieved Raw Kafka offsets from Hudi Commit > File 0=100 (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator) > [2021-12-28 07:52:57,823] INFO Initialized the kafka offset commits \{0=100} > (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator) > [2021-12-28 07:52:57,858] INFO The value of > hoodie.datasource.write.keygenerator.type is empty, using SIMPLE > (org.apache.hudi.keygen.factory.HoodieAvroKeyGeneratorFactory) > [2021-12-28 07:52:57,928] INFO AdminClientConfig values: > bootstrap.servers = [10.3.101.60:9092] > client.dns.lookup = use_all_dns_ips > client.id = > connections.max.idle.ms = 30 > default.api.timeout.ms = 6 > metadata.max.age.ms = 30 > metric.reporters = [] > metrics.num.samples = 2 > metrics.recording.level = INFO > metrics.sample.window.ms = 3 > receive.buffer.bytes = 65536 > reconnect.backoff.max.ms = 1000 > reconnect.backoff.ms = 50 > request.timeout.ms = 3 > retries = 2147483647 > retry.backoff.ms = 100 > sasl.client.callback.handler.class = null > sasl.jaas.config = null > sasl.kerberos.kinit.cmd = /usr/bin/kinit > sasl.kerberos.min.time.before.relogin = 6 > 
sasl.kerberos.service.name = null > sasl.kerberos.ticket.renew.jitter = 0.05 > sasl.kerberos.ticket.renew.window.factor = 0.8 > sasl.login.callback.handler.class = null > sasl.login.class = null > sasl.login.refresh.buffer.seconds = 300 > sasl.login.refresh.min.period.seconds = 60 > sasl.login.refresh.window.factor = 0.8 > sasl.login.refresh.window.jitter = 0.05 > sasl.mechanism = GSSAPI > security.protocol = PLAINTEXT > security.providers = null > send.buffer.bytes = 131072 > socket.connection.setup.timeout.max.ms = 127000 > socket.connection.setup.timeout.ms = 1 > ssl.cipher.suites = null > ssl.enabled.protocols = [TLSv1.2, TLSv1.3] > ssl.endpoint.identification.algorithm = https > ssl.engine.factory.class = null > ssl.key.password = null > ssl.keymanager.algorithm = SunX509 > ssl.keystore.certificate.chain = null > ssl.keystore.key = null > ssl.keystore.location = null > ssl.keystore.password = null > ssl.key
[jira] [Updated] (HUDI-3114) Kafka Connect can not connect Hive by jdbc
[ https://issues.apache.org/jira/browse/HUDI-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3114: -- Fix Version/s: 0.11.0 > Kafka Connect can not connect Hive by jdbc > -- > > Key: HUDI-3114 > URL: https://issues.apache.org/jira/browse/HUDI-3114 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect >Reporter: cdmikechen >Assignee: Ethan Guo >Priority: Major > Fix For: 0.11.0 > > > Current Kafka Connect does not import hive-jdbc dependency, which makes it > impossible to create hive tables using hive jdbc. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3115) Kafka Connect should not be packaged as a bundle
[ https://issues.apache.org/jira/browse/HUDI-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3115: -- Fix Version/s: 0.11.0 > Kafka Connect should not be packaged as a bundle > > > Key: HUDI-3115 > URL: https://issues.apache.org/jira/browse/HUDI-3115 > Project: Apache Hudi > Issue Type: Task > Components: kafka-connect >Reporter: cdmikechen >Assignee: Ethan Guo >Priority: Major > Fix For: 0.11.0 > > > Currently, Kafka Connect is packaged based on bundles, but in fact, most > Kafka Connect projects do not package all dependencies into one jar. > I hoped that the packaging method by maven can be adjusted so that it can be > easily synchronized to the confluent hub in the future -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3113) Kafka Connect create Multiple Embedded Timeline Services
[ https://issues.apache.org/jira/browse/HUDI-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3113:
--------------------------------------
    Fix Version/s: 0.11.0

> Kafka Connect create Multiple Embedded Timeline Services
> --------------------------------------------------------
>
>                 Key: HUDI-3113
>                 URL: https://issues.apache.org/jira/browse/HUDI-3113
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect
>            Reporter: cdmikechen
>            Assignee: Ethan Guo
>            Priority: Major
>             Fix For: 0.11.0
>
> After Kafka Connect started, I've found that Hudi creates more than one Embedded Timeline Service.
> {code}
> [2021-12-28 07:52:57,154] INFO Starting Timeline service !! (org.apache.hudi.client.embedded.EmbeddedTimelineService)
> [2021-12-28 07:52:57,155] WARN Unable to find driver bind address from spark config (org.apache.hudi.client.embedded.EmbeddedTimelineService)
> [2021-12-28 07:52:57,169] INFO Creating View Manager with storage type :MEMORY (org.apache.hudi.common.table.view.FileSystemViewManager)
> [2021-12-28 07:52:57,170] INFO Creating in-memory based Table View (org.apache.hudi.common.table.view.FileSystemViewManager)
> [2021-12-28 07:52:57,184] INFO Logging initialized @27658ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog (org.apache.hudi.org.eclipse.jetty.util.log)
> [2021-12-28 07:52:57,502] INFO (Javalin ASCII art banner) https://javalin.io/documentation (io.javalin.Javalin)
> [2021-12-28 07:52:57,504] INFO Starting Javalin ... (io.javalin.Javalin)
> [2021-12-28 07:52:57,650] INFO Listening on http://localhost:43691/ (io.javalin.Javalin)
> [2021-12-28 07:52:57,650] INFO Javalin started in 151ms \o/ (io.javalin.Javalin)
> [2021-12-28 07:52:57,650] INFO Starting Timeline server on port :43691 (org.apache.hudi.timeline.service.TimelineService)
> [2021-12-28 07:52:57,650] INFO Started embedded timeline server at 172.17.0.7:43691 (org.apache.hudi.client.embedded.EmbeddedTimelineService)
> [2021-12-28 07:52:57,661] INFO Start Transaction Coordinator for topic hudi-test-topic partition 0 (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 07:52:57,690] INFO Loaded instants upto : Option\{val=[==>20211228075022280__commit__INFLIGHT]} (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
> [2021-12-28 07:52:57,822] INFO Retrieved Raw Kafka offsets from Hudi Commit File 0=100 (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 07:52:57,823] INFO Initialized the kafka offset commits \{0=100} (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 07:52:57,858] INFO The value of hoodie.datasource.write.keygenerator.type is empty, using SIMPLE (org.apache.hudi.keygen.factory.HoodieAvroKeyGeneratorFactory)
> [2021-12-28 07:52:57,928] INFO AdminClientConfig values:
>     bootstrap.servers = [10.3.101.60:9092]
>     client.dns.lookup = use_all_dns_ips
>     client.id =
>     connections.max.idle.ms = 30
>     default.api.timeout.ms = 6
>     metadata.max.age.ms = 30
>     metric.reporters = []
>     metrics.num.samples = 2
>     metrics.recording.level = INFO
>     metrics.sample.window.ms = 3
>     receive.buffer.bytes = 65536
>     reconnect.backoff.max.ms = 1000
>     reconnect.backoff.ms = 50
>     request.timeout.ms = 3
>     retries = 2147483647
>     retry.backoff.ms = 100
>     sasl.client.callback.handler.class = null
>     sasl.jaas.config = null
>     sasl.kerberos.kinit.cmd = /usr/bin/kinit
>     sasl.kerberos.min.time.before.relogin = 6
>     sasl.kerberos.service.name = null
>     sasl.kerberos.ticket.renew.jitter = 0.05
>     sasl.kerberos.ticket.renew.window.factor = 0.8
>     sasl.login.callback.handler.class = null
>     sasl.login.class = null
>     sasl.login.refresh.buffer.seconds = 300
>     sasl.login.refresh.min.period.seconds = 60
>     sasl.login.refresh.window.factor = 0.8
>     sasl.login.refresh.window.jitter = 0.05
>     sasl.mechanism = GSSAPI
>     security.protocol = PLAINTEXT
>     security.providers = null
>     send.buffer.bytes = 131072
>     socket.connection.setup.timeout.max.ms = 127000
>     socket.connection.setup.timeout.ms = 1
>     ssl.cipher.suites = null
>     ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
>     ssl.endpoint.identification.algorithm = https
>     ssl.engine.factory.class = null
>     ssl.key.password = null
>     ssl.keymanager.algorithm = SunX509
>     ssl.keystore.certificate.chain = null
>     ssl.keystore.key = null
>     ssl.keystore.location = null
>     ssl.keystore.password = null
> {code}
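A common way to avoid booting a duplicate embedded server per writer is to memoize the service per table base path. This is a hypothetical sketch in plain Java, not Hudi's actual `EmbeddedTimelineService` API; the class and method names are invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: key the running timeline service by table base path
// so every writer initialized for the same table reuses one server
// instead of starting its own.
class TimelineServiceHolder {
    private static final Map<String, Object> RUNNING = new ConcurrentHashMap<>();

    // 'starter' stands in for the code that boots the real embedded server;
    // computeIfAbsent guarantees it runs at most once per base path,
    // even under concurrent task startup.
    static Object getOrStart(String basePath, Supplier<Object> starter) {
        return RUNNING.computeIfAbsent(basePath, p -> starter.get());
    }
}
```

Because `computeIfAbsent` is atomic per key, two tasks racing to initialize the same table would still end up sharing a single service instance.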
[jira] [Commented] (HUDI-3113) Kafka Connect create Multiple Embedded Timeline Services
[ https://issues.apache.org/jira/browse/HUDI-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484426#comment-17484426 ]

sivabalan narayanan commented on HUDI-3113:
-------------------------------------------
CC [~guoyihua]

> Kafka Connect create Multiple Embedded Timeline Services
> --------------------------------------------------------
>
>                 Key: HUDI-3113
>                 URL: https://issues.apache.org/jira/browse/HUDI-3113
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect
>            Reporter: cdmikechen
>            Priority: Major
[jira] [Assigned] (HUDI-3114) Kafka Connect can not connect Hive by jdbc
[ https://issues.apache.org/jira/browse/HUDI-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-3114:
-----------------------------------------
    Assignee: Ethan Guo

> Kafka Connect can not connect Hive by jdbc
> ------------------------------------------
>
>                 Key: HUDI-3114
>                 URL: https://issues.apache.org/jira/browse/HUDI-3114
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect
>            Reporter: cdmikechen
>            Assignee: Ethan Guo
>            Priority: Major
>
> Currently, Kafka Connect does not import the hive-jdbc dependency, which makes it impossible to create Hive tables over Hive JDBC.
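The fix the issue points at amounts to adding the Hive JDBC driver to the connector's dependencies. A minimal sketch of the Maven fragment follows; the coordinates `org.apache.hive:hive-jdbc` are real, while the `hive.version` property name is an assumption about how the build would pin the version.

```xml
<!-- Sketch: the dependency HUDI-3114 says is missing from the
     Kafka Connect module. Version property is a placeholder. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
</dependency>
```

With the driver on the classpath, Hive sync over a `jdbc:hive2://` URL can resolve the driver class at runtime.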
[jira] [Assigned] (HUDI-3115) Kafka Connect should not be packaged as a bundle
[ https://issues.apache.org/jira/browse/HUDI-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-3115:
-----------------------------------------
    Assignee: Ethan Guo

> Kafka Connect should not be packaged as a bundle
> ------------------------------------------------
>
>                 Key: HUDI-3115
>                 URL: https://issues.apache.org/jira/browse/HUDI-3115
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect
>            Reporter: cdmikechen
>            Assignee: Ethan Guo
>            Priority: Major
[jira] [Updated] (HUDI-3117) Kafka Connect can not clearly distinguish every task log
[ https://issues.apache.org/jira/browse/HUDI-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3117:
--------------------------------------
    Fix Version/s: 0.11.0

> Kafka Connect can not clearly distinguish every task log
> --------------------------------------------------------
>
>                 Key: HUDI-3117
>                 URL: https://issues.apache.org/jira/browse/HUDI-3117
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: cdmikechen
>            Assignee: Ethan Guo
>            Priority: Major
>              Labels: kafka-connect
>             Fix For: 0.11.0
>
> After creating multiple tasks in Kafka Connect, it is difficult to tell which task produced a given log line, because the log messages carry no task-identifying fields.
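One lightweight way to make interleaved task logs distinguishable is to tag every message with the topic-partition the task owns. The sketch below uses invented names, not Hudi's actual connector classes; SLF4J's `MDC` would be the more idiomatic mechanism in a real fix, but a plain prefix illustrates the idea.

```java
// Hypothetical sketch: prefix each log message with the Kafka Connect
// task's topic-partition so concurrent task output can be told apart.
class TaskLogPrefix {
    // Builds a "[topic-partition] message" string for logging.
    static String tag(String topic, int partition, String message) {
        return String.format("[%s-%d] %s", topic, partition, message);
    }
}
```

For example, `TaskLogPrefix.tag("hudi-test-topic", 0, "Start Transaction Coordinator")` yields a line that immediately identifies the owning partition.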
[jira] [Commented] (HUDI-3117) Kafka Connect can not clearly distinguish every task log
[ https://issues.apache.org/jira/browse/HUDI-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484425#comment-17484425 ]

sivabalan narayanan commented on HUDI-3117:
-------------------------------------------
CC [~guoyihua]

> Kafka Connect can not clearly distinguish every task log
> --------------------------------------------------------
>
>                 Key: HUDI-3117
>                 URL: https://issues.apache.org/jira/browse/HUDI-3117
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: cdmikechen
>            Priority: Major
>              Labels: kafka-connect