[jira] [Assigned] (HUDI-7598) Remove duplicate methods in subclasses of HoodieSparkClientTestBase to enhance reusability
[ https://issues.apache.org/jira/browse/HUDI-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-7598:
-----------------------------------
    Assignee: Vova Kolmakov

> Remove duplicate methods in subclasses of HoodieSparkClientTestBase to enhance reusability
> ------------------------------------------------------------------------------------------
>
> Key: HUDI-7598
> URL: https://issues.apache.org/jira/browse/HUDI-7598
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Vova Kolmakov
> Priority: Minor
> Labels: starter
> Fix For: 1.0.0
>
> https://github.com/apache/hudi/pull/10352#discussion_r1444909613

--
This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]
hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106790589

## CI report:

* 28351cba30dbd1b366c49c7b4218d8ce61920528 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23871)
* 55ceb8d72c2eb0e23b7763102959258101a363d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23872)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-7747) In MetaClient remove getBasePathV2() and return StoragePath from getBasePath()
[ https://issues.apache.org/jira/browse/HUDI-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vova Kolmakov reassigned HUDI-7747:
-----------------------------------
    Assignee: Vova Kolmakov

> In MetaClient remove getBasePathV2() and return StoragePath from getBasePath()
> ------------------------------------------------------------------------------
>
> Key: HUDI-7747
> URL: https://issues.apache.org/jira/browse/HUDI-7747
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Jonathan Vexler
> Assignee: Vova Kolmakov
> Priority: Major
>
> In HoodieTableMetaClient remove getBasePathV2() and return StoragePath from getBasePath().
Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]
hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106778064

## CI report:

* 2e76abc1279b28780dfc17f06a96f841021f0fea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18918)
* 28351cba30dbd1b366c49c7b4218d8ce61920528 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23871)
* 55ceb8d72c2eb0e23b7763102959258101a363d1 UNKNOWN
Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]
hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106710045

## CI report:

* 2e76abc1279b28780dfc17f06a96f841021f0fea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18918)
* 28351cba30dbd1b366c49c7b4218d8ce61920528 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23871)
Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]
hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106703116

## CI report:

* 2e76abc1279b28780dfc17f06a96f841021f0fea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18918)
* 28351cba30dbd1b366c49c7b4218d8ce61920528 UNKNOWN
Re: [I] Adding New Configuration To Support ZSTD Level [hudi]
ad1happy2go commented on issue #11196:
URL: https://github.com/apache/hudi/issues/11196#issuecomment-2106679631

@Amar1404 With Spark, did you try passing the config along with `df.write`?

- .option("parquet.compression.codec.zstd.level", "22")
Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]
hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2106589168

## CI report:

* 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
* a2f928ca3c4ef9d103d48c56df4b647e961b7f56 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23869)
Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]
hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2106583567

## CI report:

* 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
* fbb9dd5d64652ddec923dc7948f77adc61e823b3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23717)
* a2f928ca3c4ef9d103d48c56df4b647e961b7f56 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23869)
Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]
hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2106578074

## CI report:

* 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
* fbb9dd5d64652ddec923dc7948f77adc61e823b3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23717)
* a2f928ca3c4ef9d103d48c56df4b647e961b7f56 UNKNOWN
Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]
hudi-bot commented on PR #10922:
URL: https://github.com/apache/hudi/pull/10922#issuecomment-2106577791

## CI report:

* 1c36f92dbff0e9be085a409d28cb9403a0343781 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23866)
(hudi) branch master updated: [HUDI-7743] Improve StoragePath usages (#11189)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 4b59b202f49 [HUDI-7743] Improve StoragePath usages (#11189)
4b59b202f49 is described below

commit 4b59b202f491780eeab8c67ce5f4b6506200c7b4
Author: Jon Vexler
AuthorDate: Sun May 12 23:18:09 2024 -0400

    [HUDI-7743] Improve StoragePath usages (#11189)

    Co-authored-by: Jonathan Vexler <=>
    Co-authored-by: Y Ethan Guo
---
 .../hudi/cli/commands/ArchivedCommitsCommand.java | 19 .
 .../apache/hudi/cli/commands/RepairsCommand.java | 11 -
 .../org/apache/hudi/cli/commands/TableCommand.java | 14 .
 .../apache/hudi/cli/commands/TimelineCommand.java | 4 ++--
 .../apache/hudi/cli/commands/TestTableCommand.java | 4 ++--
 .../cli/commands/TestUpgradeDowngradeCommand.java | 4 ++--
 .../hudi/client/heartbeat/HeartbeatUtils.java | 2 +-
 .../client/heartbeat/HoodieHeartbeatClient.java | 4 ++--
 .../utils/LegacyArchivedMetaEntryReader.java | 2 +-
 .../index/bucket/ConsistentBucketIndexUtils.java | 8 +++
 .../org/apache/hudi/io/HoodieKeyLookupHandle.java | 3 +--
 .../java/org/apache/hudi/io/HoodieReadHandle.java | 5 ++---
 .../java/org/apache/hudi/io/HoodieWriteHandle.java | 2 +-
 .../metadata/HoodieBackedTableMetadataWriter.java | 3 +--
 .../java/org/apache/hudi/table/HoodieTable.java | 4 ++--
 .../table/action/commit/HoodieMergeHelper.java | 3 +--
 .../table/action/index/RunIndexActionExecutor.java | 3 +--
 .../BaseHoodieFunctionalIndexClient.java | 3 +--
 .../rollback/ListingBasedRollbackStrategy.java | 6 ++---
 .../hudi/table/upgrade/UpgradeDowngrade.java | 6 ++---
 .../table/upgrade/ZeroToOneUpgradeHandler.java | 2 +-
 .../apache/hudi/io/FlinkWriteHandleFactory.java | 4 +++-
 .../io/storage/row/HoodieRowDataCreateHandle.java | 7 --
 .../row/HoodieRowDataFileWriterFactory.java | 4 ++--
 .../org/apache/hudi/table/HoodieJavaTable.java | 5 ++---
 .../client/utils/SparkMetadataWriterUtils.java | 5 +++--
 .../index/bloom/HoodieFileProbingFunction.java | 3 +--
 .../org/apache/hudi/table/HoodieSparkTable.java | 5 ++---
 .../functional/TestHoodieBackedMetadata.java | 4 ++--
 .../TestCopyOnWriteRollbackActionExecutor.java | 2 +-
 .../TestHoodieSparkMergeOnReadTableRollback.java | 4 ++--
 .../hudi/table/upgrade/TestUpgradeDowngrade.java | 16 ++---
 .../common/config/HoodieFunctionalIndexConfig.java | 2 +-
 .../java/org/apache/hudi/common/fs/FSUtils.java | 2 +-
 .../common/heartbeat/HoodieHeartbeatUtils.java | 2 +-
 .../hudi/common/table/HoodieTableConfig.java | 8 +++
 .../hudi/common/table/HoodieTableMetaClient.java | 6 ++---
 .../table/timeline/HoodieActiveTimeline.java | 4 ++--
 .../hudi/common/table/timeline/LSMTimeline.java | 2 +-
 .../view/HoodieTablePreCommitFileSystemView.java | 2 +-
 .../org/apache/hudi/common/util/ConfigUtils.java | 2 +-
 .../index/secondary/SecondaryIndexManager.java | 7 +++---
 .../io/FileBasedInternalSchemaStorageManager.java | 5 ++---
 .../metadata/FileSystemBackedTableMetadata.java | 2 +-
 .../hudi/metadata/HoodieBackedTableMetadata.java | 4 ++--
 .../hudi/sink/bootstrap/BootstrapOperator.java | 3 +--
 .../java/org/apache/hudi/util/StreamerUtil.java | 2 +-
 .../hudi/sink/bucket/ITTestBucketStreamWrite.java | 2 +-
 .../apache/hudi/table/format/TestInputFormat.java | 2 +-
 .../common/config/DFSPropertiesConfiguration.java | 2 +-
 .../common/bootstrap/index/TestBootstrapIndex.java | 3 +--
 .../fs/TestFSUtilsWithRetryWrapperEnable.java | 8 +++
 .../hudi/common/table/TestHoodieTableConfig.java | 26 +++---
 .../common/table/TestHoodieTableMetaClient.java | 2 +-
 .../table/view/TestHoodieTableFileSystemView.java | 6 ++---
 .../table/view/TestIncrementalFSViewSync.java | 2 +-
 .../hadoop/HoodieCopyOnWriteTableInputFormat.java | 4 ++--
 .../hudi/hadoop/HoodieHFileRecordReader.java | 3 ++-
 .../hudi/hadoop/HoodieROTablePathFilter.java | 8 ---
 .../apache/hudi/hadoop/SchemaEvolutionContext.java | 5 +++--
 .../HoodieMergeOnReadTableInputFormat.java | 3 +--
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java | 8 ---
 .../utils/HoodieRealtimeRecordReaderUtils.java | 4 ++--
 .../reader/DFSHoodieDatasetInputReader.java | 3 +--
 .../scala/org/apache/hudi/HoodieBaseRelation.scala | 11 -
 .../org/apache/spark/sql/hudi/DedupeSparkJob.scala | 15 +++--
 .../procedures/ExportInstantsProcedure.scala | 3 ++-
 .../RepairMigratePartitionMetaProcedure.scala | 2 +-
 .../RepairOverwriteHoodiePropsProcedure.scala | 5 +
 .../apache/spark/sql/hudi/common/TestSqlConf.scala | 6 ++---
 .../TestUpgradeOrDowngrad
Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]
yihua merged PR #11189:
URL: https://github.com/apache/hudi/pull/11189
[jira] [Updated] (HUDI-6563) Supports flink lookup join
[ https://issues.apache.org/jira/browse/HUDI-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-6563:
-----------------------------
    Reviewers: Danny Chen

> Supports flink lookup join
> --------------------------
>
> Key: HUDI-6563
> URL: https://issues.apache.org/jira/browse/HUDI-6563
> Project: Apache Hudi
> Issue Type: New Feature
> Components: flink
> Reporter: waywtdcc
> Priority: Major
> Labels: pull-request-available
>
> Supports flink lookup join
>
> {code:java}
> CREATE TABLE `datagen_source`(
>   id int,
>   name STRING,
>   proctime as PROCTIME()
> ) WITH (
>   'connector' = 'datagen',
>   'rows-per-second'='1',
>   'number-of-rows' = '2',
>   'fields.id.kind'='sequence',
>   'fields.id.start'='1',
>   'fields.id.end'='2'
> );
>
> select o.id,o.name,b.id as id2
> from datagen_source AS o
> join hudi_table/*+ OPTIONS('lookup.join.cache.ttl'= '2 day') */ FOR SYSTEM_TIME AS OF o.proctime AS b on o.id = b.id;
> {code}
[jira] [Updated] (HUDI-6563) Supports flink lookup join
[ https://issues.apache.org/jira/browse/HUDI-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-6563:
-----------------------------
    Status: In Progress (was: Open)

> Supports flink lookup join
> --------------------------
>
> Key: HUDI-6563
> URL: https://issues.apache.org/jira/browse/HUDI-6563
> Project: Apache Hudi
> Issue Type: New Feature
> Components: flink
> Reporter: waywtdcc
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HUDI-6563) Supports flink lookup join
[ https://issues.apache.org/jira/browse/HUDI-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-6563:
-----------------------------
    Sprint: Sprint 2023-04-26

> Supports flink lookup join
> --------------------------
>
> Key: HUDI-6563
> URL: https://issues.apache.org/jira/browse/HUDI-6563
> Project: Apache Hudi
> Issue Type: New Feature
> Components: flink
> Reporter: waywtdcc
> Priority: Major
> Labels: pull-request-available
Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]
danny0405 commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106552715

@waywtdcc Hi, can you rebase onto the latest master? I will then take a look at this PR.
Re: [PR] [HUDI-7535] Add metrics for sourceParallelism and Refresh profile in S3/GCS [hudi]
hudi-bot commented on PR #10918:
URL: https://github.com/apache/hudi/pull/10918#issuecomment-2106545016

## CI report:

* dba597f6e2b2c8dccad7b2768bffb27a623a1acf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23868)
Re: [PR] [HUDI-7535] Add metrics for sourceParallelism and Refresh profile in S3/GCS [hudi]
hudi-bot commented on PR #10918:
URL: https://github.com/apache/hudi/pull/10918#issuecomment-2106539553

## CI report:

* 95436a55a29960c5bdeb8901f83c90d4712aa40b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23007)
* dba597f6e2b2c8dccad7b2768bffb27a623a1acf UNKNOWN
Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]
hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2106534204

## CI report:

* 4095b60ef4c272c8046aeb9e2a1d13db2d1c0a9d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23865)
Re: [I] [SUPPORT]xxx.parquet is not a Parquet file [hudi]
MrAladdin commented on issue #11178:
URL: https://github.com/apache/hudi/issues/11178#issuecomment-2106518950

@ad1happy2go I need your help to answer the question I replied to you above, thank you.
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
yihua merged PR #10900:
URL: https://github.com/apache/hudi/pull/10900
(hudi) branch master updated (be0a6604b12 -> ce08875a0d7)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from be0a6604b12 [HUDI-7501] Use source profile for S3 and GCS sources (#10861)
     add ce08875a0d7 [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource (#10900)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/util/ConfigUtils.java | 17 +-
 .../apache/hudi/common/util/TestConfigUtils.java | 64 --
 .../utilities/config/HoodieIncrSourceConfig.java | 8 +++
 .../hudi/utilities/sources/HoodieIncrSource.java | 16 +-
 .../utilities/sources/TestHoodieIncrSource.java | 40 +-
 5 files changed, 121 insertions(+), 24 deletions(-)
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
hudi-bot commented on PR #10900:
URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106500988

## CI report:

* b91da909a18c11702b917910846356e98aeaecf2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23864)
(hudi) branch master updated: [HUDI-7501] Use source profile for S3 and GCS sources (#10861)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new be0a6604b12 [HUDI-7501] Use source profile for S3 and GCS sources (#10861)
be0a6604b12 is described below

commit be0a6604b12abe6ef74a7b2c83f24de6af19e3d7
Author: Vinish Reddy
AuthorDate: Mon May 13 07:23:31 2024 +0530

    [HUDI-7501] Use source profile for S3 and GCS sources (#10861)

    Co-authored-by: Y Ethan Guo
---
 .../org/apache/hudi/utilities/UtilHelpers.java | 53 -
 .../sources/GcsEventsHoodieIncrSource.java | 61 --
 .../hudi/utilities/sources/HoodieIncrSource.java | 6 +-
 .../apache/hudi/utilities/sources/RowSource.java | 8 +-
 .../sources/S3EventsHoodieIncrSource.java | 87 +++---
 .../sources/helpers/CloudDataFetcher.java | 79 -
 .../helpers/CloudObjectsSelectorCommon.java | 70 
 .../helpers/gcs/GcsObjectMetadataFetcher.java | 86 --
 .../sources/TestGcsEventsHoodieIncrSource.java | 83 ++
 .../utilities/sources/TestHoodieIncrSource.java | 3 +-
 .../sources/TestS3EventsHoodieIncrSource.java | 125 -
 .../debezium/TestAbstractDebeziumSource.java | 3 +-
 .../helpers/TestCloudObjectsSelectorCommon.java | 42 ---
 13 files changed, 383 insertions(+), 323 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index 124abeb059f..d0acffe5d17 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -40,6 +40,7 @@ import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieCompactionConfig;
 import org.apache.hudi.config.HoodieIndexConfig;
 import org.apache.hudi.config.HoodieLockConfig;
@@ -140,42 +141,30 @@ public class UtilHelpers {
   }
 
   public static Source createSource(String sourceClass, TypedProperties cfg, JavaSparkContext jssc,
-                                    SparkSession sparkSession, SchemaProvider schemaProvider,
-                                    HoodieIngestionMetrics metrics) throws IOException {
-    try {
+      SparkSession sparkSession, HoodieIngestionMetrics metrics, StreamContext streamContext) throws IOException {
+    // All possible constructors.
+    Class<?>[] constructorArgsStreamContextMetrics = new Class<?>[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, HoodieIngestionMetrics.class, StreamContext.class};
+    Class<?>[] constructorArgsStreamContext = new Class<?>[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, StreamContext.class};
+    Class<?>[] constructorArgsMetrics = new Class<?>[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, SchemaProvider.class, HoodieIngestionMetrics.class};
+    Class<?>[] constructorArgs = new Class<?>[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, SchemaProvider.class};
+    // List of constructor and their respective arguments.
+    List<Pair<Class<?>[], Object[]>> sourceConstructorAndArgs = new ArrayList<>();
+    sourceConstructorAndArgs.add(Pair.of(constructorArgsStreamContextMetrics, new Object[] {cfg, jssc, sparkSession, metrics, streamContext}));
+    sourceConstructorAndArgs.add(Pair.of(constructorArgsStreamContext, new Object[] {cfg, jssc, sparkSession, streamContext}));
+    sourceConstructorAndArgs.add(Pair.of(constructorArgsMetrics, new Object[] {cfg, jssc, sparkSession, streamContext.getSchemaProvider(), metrics}));
+    sourceConstructorAndArgs.add(Pair.of(constructorArgs, new Object[] {cfg, jssc, sparkSession, streamContext.getSchemaProvider()}));
+
+    HoodieException sourceClassLoadException = null;
+    for (Pair<Class<?>[], Object[]> constructor : sourceConstructorAndArgs) {
       try {
-        return (Source) ReflectionUtils.loadClass(sourceClass,
-            new Class<?>[] {TypedProperties.class, JavaSparkContext.class,
-                SparkSession.class, SchemaProvider.class,
-                HoodieIngestionMetrics.class},
-            cfg, jssc, sparkSession, schemaProvider, metrics);
+        return (Source) ReflectionUtils.loadClass(sourceClass, constructor.getLeft(), constructor.getRight());
       } catch (HoodieException e) {
-        return (Source) ReflectionUtils.loadClass(sourceClass,
-            new Class<?>[] {TypedProperties.class, JavaSparkContext.class,
-                SparkSession.class, SchemaProvider.class},
-            cfg, jssc, sparkSession, schemaProvider);
+        sourceClassLoadException
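The `createSource` change in the diff above tries several constructor signatures in order and remembers the last failure. A minimal, self-contained sketch of that general pattern follows, using only `java.lang.reflect`. It is an illustration under stated assumptions, not the Hudi implementation: `ConstructorFallbackSketch`, `Candidate`, `instantiateFirstMatch`, and `LegacySource` are hypothetical names, and Hudi's own `ReflectionUtils.loadClass` and `HoodieException` are deliberately not used.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "try each constructor signature in order".
final class ConstructorFallbackSketch {

  /** Pairs one constructor signature with the arguments to pass to it. */
  static final class Candidate {
    final Class<?>[] paramTypes;
    final Object[] args;

    Candidate(Class<?>[] paramTypes, Object[] args) {
      this.paramTypes = paramTypes;
      this.args = args;
    }
  }

  /** Returns an instance built by the first candidate whose public constructor exists and succeeds. */
  static Object instantiateFirstMatch(Class<?> target, List<Candidate> candidates) {
    ReflectiveOperationException lastFailure = null;
    for (Candidate candidate : candidates) {
      try {
        Constructor<?> ctor = target.getConstructor(candidate.paramTypes);
        return ctor.newInstance(candidate.args);
      } catch (ReflectiveOperationException e) {
        // Remember the failure and fall through to the next signature.
        lastFailure = e;
      }
    }
    throw new IllegalStateException("No usable constructor for " + target.getName(), lastFailure);
  }

  /** Hypothetical class that only offers the older, two-argument constructor. */
  public static final class LegacySource {
    final String config;
    final Integer parallelism;

    public LegacySource(String config, Integer parallelism) {
      this.config = config;
      this.parallelism = parallelism;
    }
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>();
    // Newest signature first; LegacySource does not have it, so the loop moves on.
    candidates.add(new Candidate(
        new Class<?>[] {String.class, Integer.class, Boolean.class},
        new Object[] {"cfg", 4, true}));
    candidates.add(new Candidate(
        new Class<?>[] {String.class, Integer.class},
        new Object[] {"cfg", 4}));

    Object source = instantiateFirstMatch(LegacySource.class, candidates);
    System.out.println(source.getClass().getSimpleName());
  }
}
```

Ordering the candidates from newest to oldest signature means existing classes that only expose an older constructor keep working, which appears to be the intent of the list built in the diff.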
Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]
yihua merged PR #10861:
URL: https://github.com/apache/hudi/pull/10861
Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]
hudi-bot commented on PR #10922:
URL: https://github.com/apache/hudi/pull/10922#issuecomment-2106494067

## CI report:

* 41e7049a782561d5f8f9a21af7ba4c1021b3fb14 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23810)
* 1c36f92dbff0e9be085a409d28cb9403a0343781 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23866)
Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]
hudi-bot commented on PR #10922:
URL: https://github.com/apache/hudi/pull/10922#issuecomment-2106487245

## CI report:

* 41e7049a782561d5f8f9a21af7ba4c1021b3fb14 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23810)
* 1c36f92dbff0e9be085a409d28cb9403a0343781 UNKNOWN
Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]
hudi-bot commented on PR #10861:
URL: https://github.com/apache/hudi/pull/10861#issuecomment-2106487144

## CI report:

* 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23861)
Re: [I] Adding New Configuration To Support ZSTD Level [hudi]
danny0405 commented on issue #11196:
URL: https://github.com/apache/hudi/issues/11196#issuecomment-2106456256

In Flink, you can use the `parquet.` prefix for any property that you want to customize on the Parquet writer; I am not sure whether Spark has a similar feature.
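The prefix convention described in the comment above can be illustrated with a small, self-contained sketch: select every option whose key starts with `parquet.` from a flat option map. This is a hypothetical illustration only; `ParquetPrefixSketch` and `selectParquetOptions` are made-up names, the `path` entry is a placeholder, and the thread does not say whether the real writer keeps or strips the prefix, so the sketch simply keeps the full key. The ZSTD level key is the one quoted elsewhere in this issue thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of prefix-based option pass-through; not Hudi or Flink code.
final class ParquetPrefixSketch {

  static final String PARQUET_PREFIX = "parquet.";

  /** Returns only the entries whose key starts with the "parquet." prefix, in insertion order. */
  static Map<String, String> selectParquetOptions(Map<String, String> tableOptions) {
    Map<String, String> selected = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : tableOptions.entrySet()) {
      if (entry.getKey().startsWith(PARQUET_PREFIX)) {
        selected.put(entry.getKey(), entry.getValue());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Map<String, String> tableOptions = new LinkedHashMap<>();
    tableOptions.put("path", "/tmp/hudi_table"); // placeholder, not from the thread
    tableOptions.put("parquet.compression.codec.zstd.level", "22");

    // Only the "parquet."-prefixed entry is handed on to the (hypothetical) writer config.
    System.out.println(selectParquetOptions(tableOptions));
  }
}
```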
Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]
hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2106449606

## CI report:

* 511e55b8d042e8db674b48b203f3bf9b8f52ad6e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23858)
* 4095b60ef4c272c8046aeb9e2a1d13db2d1c0a9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23865)
Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]
hudi-bot commented on PR #11189: URL: https://github.com/apache/hudi/pull/11189#issuecomment-210660 ## CI report: * 511e55b8d042e8db674b48b203f3bf9b8f52ad6e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23858) * 4095b60ef4c272c8046aeb9e2a1d13db2d1c0a9d UNKNOWN
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
hudi-bot commented on PR #10900: URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106444120 ## CI report: * 39c476826c6dd8182d758c39e3cfbada40ec2b1b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23862) * b91da909a18c11702b917910846356e98aeaecf2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23864)
Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]
yihua commented on code in PR #11192: URL: https://github.com/apache/hudi/pull/11192#discussion_r1597761032
## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala: ##
@@ -853,7 +852,7 @@ object HoodieBaseRelation extends SparkAdapterSupport {
 val hoodieConfig = new HoodieConfig()
 hoodieConfig.setValue(USE_NATIVE_HFILE_READER, options.getOrElse(USE_NATIVE_HFILE_READER.key(), USE_NATIVE_HFILE_READER.defaultValue().toString))
- val reader = HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
+ val reader = (new HoodieSparkIOFactory).getReaderFactory(HoodieRecordType.AVRO)
Review Comment:
Based on the discussion, it is safer to hardcode the class for now, as there are gaps in passing the storage configuration outside the `hudi-common` module.
Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]
yihua commented on code in PR #11192: URL: https://github.com/apache/hudi/pull/11192#discussion_r1597760489
## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java: ##
@@ -43,39 +40,18 @@ public class HoodieFileWriterFactory {
- private static HoodieFileWriterFactory getWriterFactory(HoodieRecord.HoodieRecordType recordType) {
Review Comment:
`HoodieFileReaderFactory` and `HoodieFileWriterFactory` contain methods, such as the following, that throw `UnsupportedOperationException`. Instead, such methods should be abstract, and the factory classes should also be made abstract classes or interfaces.
```
protected HoodieFileReader newParquetFileReader(StorageConfiguration conf, StoragePath path) {
  throw new UnsupportedOperationException();
}
```
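The refactoring suggested in this review comment can be sketched as follows. This is a minimal illustration, not the Hudi classes: `ReaderFactorySketch` and `SparkReaderFactorySketch` are hypothetical names, and a plain `String` stands in for the reader type. The point it shows is that an abstract method moves the "not implemented" failure from run time to compile time.

```java
// Hypothetical stand-in for a reader factory; not the Hudi API.
abstract class ReaderFactorySketch {

  // Dispatches on the file extension and delegates to the engine-specific method.
  final String openReader(String path) {
    if (path.endsWith(".parquet")) {
      return newParquetReader(path);
    }
    throw new UnsupportedOperationException("Unsupported file format: " + path);
  }

  // Abstract instead of a default body that throws: every concrete
  // factory must provide an implementation or it will not compile.
  protected abstract String newParquetReader(String path);
}

// Hypothetical engine-specific factory.
final class SparkReaderFactorySketch extends ReaderFactorySketch {
  @Override
  protected String newParquetReader(String path) {
    return "spark-parquet-reader:" + path;
  }
}
```

With the default-throwing variant, a subclass that forgets to override `newParquetReader` compiles and fails only when the method is first called. With the abstract variant the omission is rejected by the compiler.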
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
hudi-bot commented on PR #10900: URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106440061 ## CI report: * 39c476826c6dd8182d758c39e3cfbada40ec2b1b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23862) * b91da909a18c11702b917910846356e98aeaecf2 UNKNOWN
Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]
yihua commented on code in PR #11189: URL: https://github.com/apache/hudi/pull/11189#discussion_r1597754130
## hudi-common/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java: ##
@@ -269,7 +269,7 @@ private void updatePartitionWriteFileGroups(Map> p
 LOG.info("Syncing partition (" + partition + ") of instant (" + instant + ")");
 List pathInfoList = entry.getValue().stream()
 .map(p -> new StoragePathInfo(
-new StoragePath(String.format("%s/%s", metaClient.getBasePath(), p.getPath())),
+new StoragePath(metaClient.getBasePathV2(), p.getPath()),
Review Comment:
If `p.getPath()` starts with a slash, there will be a behavior change.
## hudi-common/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java: ##
@@ -269,7 +269,7 @@ private void updatePartitionWriteFileGroups(Map> p
 LOG.info("Syncing partition (" + partition + ") of instant (" + instant + ")");
 List pathInfoList = entry.getValue().stream()
 .map(p -> new StoragePathInfo(
-new StoragePath(String.format("%s/%s", metaClient.getBasePath(), p.getPath())),
+new StoragePath(metaClient.getBasePathV2(), p.getPath()),
Review Comment:
```suggestion
new StoragePath(String.format("%s/%s", metaClient.getBasePath(), p.getPath())),
```
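The behavior change flagged in this review can be illustrated with a small sketch. It uses `java.net.URI` as a stand-in and assumes that a parent/child path constructor resolves the child the way a URI reference is resolved; the thread does not confirm that `StoragePath` works exactly this way. `PathJoinSketch`, `concat`, and `resolve` are hypothetical names.

```java
import java.net.URI;

// Hypothetical illustration; java.net.URI stands in for a parent/child path constructor.
final class PathJoinSketch {

  // String concatenation: the child is always appended under the base.
  static String concat(String base, String child) {
    return String.format("%s/%s", base, child);
  }

  // URI-style resolution: a child with a leading slash replaces the base path.
  static String resolve(String base, String child) {
    URI baseDir = URI.create(base.endsWith("/") ? base : base + "/");
    return baseDir.resolve(child).toString();
  }

  public static void main(String[] args) {
    System.out.println(concat("/tmp/table", "/2024/f.parquet"));
    System.out.println(resolve("/tmp/table", "/2024/f.parquet"));
    System.out.println(resolve("/tmp/table", "2024/f.parquet"));
  }
}
```

Under this assumption, a leading slash keeps the file under the base path with concatenation (at the cost of a doubled slash) but escapes the base path with resolution, which is why the suggestion keeps the `String.format` form.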
Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]
yihua commented on code in PR #11189: URL: https://github.com/apache/hudi/pull/11189#discussion_r1597753236
## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java: ##
@@ -2040,7 +2040,7 @@ public void testEagerRollbackinMDT() throws IOException {
 // collect all commit meta files from metadata table.
 List metaFiles = metaClient.getStorage()
-.listDirectEntries(new StoragePath(metaClient.getMetaPath() + "/metadata/.hoodie"));
+.listDirectEntries(new StoragePath(metaClient.getMetaPath(), "/metadata/.hoodie"));
Review Comment:
```suggestion
.listDirectEntries(new StoragePath(metaClient.getMetaPath(), "metadata/.hoodie"));
```
(hudi) branch master updated: [HUDI-4732] Add support for confluent schema registry with proto (#11070)
This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/master by this push:
 new aa5bb0dda34 [HUDI-4732] Add support for confluent schema registry with proto (#11070)
aa5bb0dda34 is described below
commit aa5bb0dda34bf643d61e96f51a456cf876c0a0eb
Author: Tim Brown
AuthorDate: Sun May 12 19:59:45 2024 -0400
 [HUDI-4732] Add support for confluent schema registry with proto (#11070)
 Co-authored-by: Y Ethan Guo
---
 hudi-utilities/pom.xml | 7 +--
 .../hudi/utilities/config/KafkaSourceConfig.java | 8 +++
 .../deser/KafkaAvroSchemaDeserializer.java | 4 +-
 .../schema/ProtoClassBasedSchemaProvider.java | 10 +---
 .../ProtoSchemaToAvroSchemaConverter.java | 43 +++
 .../hudi/utilities/sources/ProtoKafkaSource.java | 40 ++
 .../sources/helpers/ProtoConversionUtil.java | 56 +--
 .../deser/TestKafkaAvroSchemaDeserializer.java | 8 +--
 .../TestProtoSchemaToAvroSchemaConverter.java | 50 +
 .../utilities/sources/TestProtoKafkaSource.java | 63 --
 packaging/hudi-utilities-bundle/pom.xml | 1 +
 packaging/hudi-utilities-slim-bundle/pom.xml | 1 +
 pom.xml | 34 +++-
 13 files changed, 288 insertions(+), 37 deletions(-)
diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index 3a7a9d6a712..47c172b7791 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -361,12 +361,10 @@
 io.confluent
 kafka-avro-serializer
- ${confluent.version}
 io.confluent
 common-config
- ${confluent.version}
 io.confluent
@@ -376,7 +374,10 @@
 io.confluent
 kafka-schema-registry-client
- ${confluent.version}
+
+
+ io.confluent
+ kafka-protobuf-serializer
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java
index 024712f8cdd..6215e99d665 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java
@@ -24,6 +24,8 @@ import org.apache.hudi.common.config.ConfigGroups;
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.config.HoodieConfig;
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+
 import javax.annotation.concurrent.Immutable;
 import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX;
@@ -120,6 +122,12 @@ public class KafkaSourceConfig extends HoodieConfig {
 .markAdvanced()
 .withDocumentation("Kafka consumer strategy for reading data.");
+ public static final ConfigProperty KAFKA_PROTO_VALUE_DESERIALIZER_CLASS = ConfigProperty
+ .key(PREFIX + "proto.value.deserializer.class")
+ .defaultValue(ByteArrayDeserializer.class.getName())
+ .sinceVersion("0.15.0")
+ .withDocumentation("Kafka Proto Payload Deserializer Class");
+
 /**
 * Kafka reset offset strategies.
 */
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java
index 246be5f8ec6..4673eceed15 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java
@@ -60,7 +60,6 @@ public class KafkaAvroSchemaDeserializer extends KafkaAvroDeserializer {
 /**
 * We need to inject sourceSchema instead of reader schema during deserialization or later stages of the pipeline.
 *
- * @param includeSchemaAndVersion
 * @param topic
 * @param isKey
 * @param payload
@@ -70,13 +69,12 @@ public class KafkaAvroSchemaDeserializer extends KafkaAvroDeserializer {
 */
 @Override
 protected Object deserialize(
- boolean includeSchemaAndVersion,
 String topic,
 Boolean isKey,
 byte[] payload,
 Schema readerSchema) throws SerializationException {
-return super.deserialize(includeSchemaAndVersion, topic, isKey, payload, sourceSchema);
+return super.deserialize(topic, isKey, payload, sourceSchema);
 }
 protected TypedProperties getConvertToTypedProperties(Map configs) {
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/ProtoClassBasedSchemaProvider.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/ProtoClassBasedSchemaProvider.java
index 7d6981efb40..a4b485e1634 100644
--- a/
Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]
yihua merged PR #11070: URL: https://github.com/apache/hudi/pull/11070
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
hudi-bot commented on PR #10900: URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106417131 ## CI report: * 5fefa9e02c016d50b2f2b1fda2c9c89f2df7d620 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23641) * 39c476826c6dd8182d758c39e3cfbada40ec2b1b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23862)
Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]
hudi-bot commented on PR #10861: URL: https://github.com/apache/hudi/pull/10861#issuecomment-2106417105 ## CI report: * 896491233f44039e8874d5a3080dd686fffd044e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22944) * 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23861)
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
yihua commented on code in PR #10900: URL: https://github.com/apache/hudi/pull/10900#discussion_r1597744597
## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/HoodieIncrSourceConfig.java: ##
@@ -101,4 +101,11 @@ public class HoodieIncrSourceConfig extends HoodieConfig {
 .withAlternatives(DELTA_STREAMER_CONFIG_PREFIX + "source.hoodieincr.partition.extractor.class")
 .markAdvanced()
 .withDocumentation("PartitionValueExtractor class to extract partition fields from _hoodie_partition_path");
+
+ public static final ConfigProperty HOODIE_SPARK_DATASOURCE_OPTIONS = ConfigProperty
+ .key(STREAMER_CONFIG_PREFIX + "source.hoodieincr.data.datasource.options")
+ .noDefaultValue()
+ .markAdvanced()
+ .withDocumentation("A comma separate list of options that can be passed to the spark dataframe reader of a hudi table, "
+ + "eg: hoodie.metadata.enable=true,hoodie.enable.data.skipping=true");
Review Comment:
We can keep the config in the `HoodieIncrSourceConfig` class since it applies to the incremental source only.
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
yihua commented on code in PR #10900: URL: https://github.com/apache/hudi/pull/10900#discussion_r159777
## hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestHoodieIncrSource.java: ##
@@ -333,14 +334,47 @@ public void testHoodieIncrSourceWithPendingTableServices(HoodieTableType tableTy
 }
 }
+ @ParameterizedTest
+ @EnumSource(HoodieTableType.class)
+ public void testHoodieIncrSourceWithDataSourceOptions(HoodieTableType tableType) throws IOException {
+this.tableType = tableType;
+metaClient = getHoodieMetaClient(hadoopConf(), basePath());
+HoodieWriteConfig writeConfig = getConfigBuilder(basePath(), metaClient)
+ .withArchivalConfig(HoodieArchivalConfig.newBuilder().archiveCommitsWith(10, 12).build())
+ .withCleanConfig(HoodieCleanConfig.newBuilder().retainCommits(9).build())
+.withCompactionConfig(
+HoodieCompactionConfig.newBuilder()
+.withScheduleInlineCompaction(true)
+.withMaxNumDeltaCommitsBeforeCompaction(1)
+.build())
+.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(true)
+.withMetadataIndexColumnStats(true)
+.withColumnStatsIndexForColumns("_hoodie_commit_time")
+.build())
+.build();
+
+TypedProperties extraProps = new TypedProperties();
+ extraProps.setProperty(HoodieIncrSourceConfig.HOODIE_SPARK_DATASOURCE_OPTIONS.key(), "hoodie.metadata.enable=true,hoodie.enable.data.skipping=true");
Review Comment:
I think it might be hard to check in the tests that the Spark reader contains the passed configs.
Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]
hudi-bot commented on PR #10861: URL: https://github.com/apache/hudi/pull/10861#issuecomment-2106414947 ## CI report: * 896491233f44039e8874d5a3080dd686fffd044e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22944) * 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 UNKNOWN
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
hudi-bot commented on PR #10900: URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106414969 ## CI report: * 5fefa9e02c016d50b2f2b1fda2c9c89f2df7d620 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23641) * 39c476826c6dd8182d758c39e3cfbada40ec2b1b UNKNOWN
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
yihua commented on code in PR #10900: URL: https://github.com/apache/hudi/pull/10900#discussion_r1597742937
## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java: ##
@@ -189,10 +193,18 @@ public Pair>, String> fetchNextBatch(Option lastCkpt
 return Pair.of(Option.empty(), queryInfo.getEndInstant());
 }
+DataFrameReader reader = sparkSession.read().format("org.apache.hudi");
+String datasourceOpts = getStringWithAltKeys(props, HoodieIncrSourceConfig.HOODIE_SPARK_DATASOURCE_OPTIONS, true);
+if (!StringUtils.isNullOrEmpty(datasourceOpts)) {
+ Map optionsMap = Arrays.stream(datasourceOpts.split(","))
+ .map(option -> Pair.of(option.split("=")[0], option.split("=")[1]))
+ .collect(Collectors.toMap(Pair::getLeft, Pair::getRight));
Review Comment:
Adjusted `ConfigUtils.toMap` so it can be reused. Unit tests are also added.
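The option parsing in the hunk above can be sketched in plain Java. This is a minimal illustration of turning a comma-separated `key=value` list into a map, not the `ConfigUtils.toMap` implementation referred to in the review; `OptionParserSketch` and `parse` are hypothetical names. Compared with indexing `split("=")[1]` directly, splitting with a limit of 2 keeps values that themselves contain `=` and allows a malformed entry to be reported explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser; not the Hudi ConfigUtils.toMap implementation.
final class OptionParserSketch {

  // Parses "k1=v1,k2=v2" into an insertion-ordered map.
  static Map<String, String> parse(String csv) {
    Map<String, String> result = new LinkedHashMap<>();
    if (csv == null || csv.trim().isEmpty()) {
      return result;
    }
    for (String pair : csv.split(",")) {
      // Limit of 2: only the first '=' separates key from value.
      String[] keyValue = pair.split("=", 2);
      if (keyValue.length != 2 || keyValue[0].trim().isEmpty()) {
        throw new IllegalArgumentException("Expected key=value but got: " + pair);
      }
      result.put(keyValue[0].trim(), keyValue[1].trim());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("hoodie.metadata.enable=true,hoodie.enable.data.skipping=true"));
  }
}
```

Each resulting entry could then be passed to the reader as an option, as the hunk does with `optionsMap`.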
Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]
yihua commented on code in PR #10900: URL: https://github.com/apache/hudi/pull/10900#discussion_r1597739617
## hudi-utilities/src/main/java/org/apache/hudi/utilities/config/HoodieIncrSourceConfig.java: ##
@@ -101,4 +101,11 @@ public class HoodieIncrSourceConfig extends HoodieConfig {
 .withAlternatives(DELTA_STREAMER_CONFIG_PREFIX + "source.hoodieincr.partition.extractor.class")
 .markAdvanced()
 .withDocumentation("PartitionValueExtractor class to extract partition fields from _hoodie_partition_path");
+
+ public static final ConfigProperty HOODIE_SPARK_DATASOURCE_OPTIONS = ConfigProperty
Review Comment:
Fixed.
Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]
hudi-bot commented on PR #11197: URL: https://github.com/apache/hudi/pull/11197#issuecomment-2106268279 ## CI report: * c6bec154954403a17aadd26bfab364ba675ce878 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23860)
Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]
hudi-bot commented on PR #11197: URL: https://github.com/apache/hudi/pull/11197#issuecomment-2106237499 ## CI report: * c6bec154954403a17aadd26bfab364ba675ce878 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23860)
Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]
hudi-bot commented on PR #11197: URL: https://github.com/apache/hudi/pull/11197#issuecomment-2106235175 ## CI report: * c6bec154954403a17aadd26bfab364ba675ce878 UNKNOWN
[jira] [Updated] (HUDI-7748) Add logs and drop _hoodie_is_deleted in Transformer
[ https://issues.apache.org/jira/browse/HUDI-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-7748:
 Labels: pull-request-available (was: )
> Add logs and drop _hoodie_is_deleted in Transformer
> ---
>
> Key: HUDI-7748
> URL: https://issues.apache.org/jira/browse/HUDI-7748
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
>
--
This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]
codope opened a new pull request, #11197: URL: https://github.com/apache/hudi/pull/11197
### Change Logs
minor logs
### Impact
none
### Risk level (write none, low medium or high below)
none
### Documentation Update
_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Created] (HUDI-7748) Add logs and drop _hoodie_is_deleted in Transformer
Sagar Sumit created HUDI-7748:
 Summary: Add logs and drop _hoodie_is_deleted in Transformer
 Key: HUDI-7748
 URL: https://issues.apache.org/jira/browse/HUDI-7748
 Project: Apache Hudi
 Issue Type: Improvement
 Reporter: Sagar Sumit