[jira] [Assigned] (HUDI-7598) Remove duplicate methods in subclasses of HoodieSparkClientTestBase to enhance reusability

2024-05-12 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7598:
---

Assignee: Vova Kolmakov

> Remove duplicate methods in subclasses of HoodieSparkClientTestBase to 
> enhance reusability
> --
>
> Key: HUDI-7598
> URL: https://issues.apache.org/jira/browse/HUDI-7598
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: starter
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1444909613



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106790589

   
   ## CI report:
   
   * 28351cba30dbd1b366c49c7b4218d8ce61920528 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23871)
 
   * 55ceb8d72c2eb0e23b7763102959258101a363d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23872)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-7747) In MetaClient remove getBasePathV2() and return StoragePath from getBasePath()

2024-05-12 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7747:
---

Assignee: Vova Kolmakov

> In MetaClient remove getBasePathV2() and return StoragePath from getBasePath()
> --
>
> Key: HUDI-7747
> URL: https://issues.apache.org/jira/browse/HUDI-7747
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jonathan Vexler
>Assignee: Vova Kolmakov
>Priority: Major
>
> In HoodieTableMetaClient remove getBasePathV2() and return StoragePath from 
> getBasePath().



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106778064

   
   ## CI report:
   
   * 2e76abc1279b28780dfc17f06a96f841021f0fea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18918)
 
   * 28351cba30dbd1b366c49c7b4218d8ce61920528 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23871)
 
   * 55ceb8d72c2eb0e23b7763102959258101a363d1 UNKNOWN
   
   



Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106710045

   
   ## CI report:
   
   * 2e76abc1279b28780dfc17f06a96f841021f0fea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18918)
 
   * 28351cba30dbd1b366c49c7b4218d8ce61920528 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23871)
 
   
   



Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106703116

   
   ## CI report:
   
   * 2e76abc1279b28780dfc17f06a96f841021f0fea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18918)
 
   * 28351cba30dbd1b366c49c7b4218d8ce61920528 UNKNOWN
   
   



Re: [I] Adding New Configuration To Support ZSTD Level [hudi]

2024-05-12 Thread via GitHub


ad1happy2go commented on issue #11196:
URL: https://github.com/apache/hudi/issues/11196#issuecomment-2106679631

   @Amar1404 With Spark, did you try passing the config along with `df.write`?
   - .option("parquet.compression.codec.zstd.level", "22") 
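Whether Hudi forwards this option to the Parquet writer is the open question in this issue. As a hedged illustration only, the hypothetical helper below (not part of Hudi, Spark, or parquet-mr) shows how a writer could resolve the level passed via `.option(...)`, assuming a default of 3 (parquet-mr's default, worth verifying for your version) and accepting only the classic zstd range 1 to 22:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi, Spark, or parquet-mr.
public class ZstdLevelOption {
    // Option key quoted in the comment above.
    static final String ZSTD_LEVEL_KEY = "parquet.compression.codec.zstd.level";
    // Assumed default (parquet-mr uses 3); verify against your Parquet version.
    static final int DEFAULT_LEVEL = 3;
    // Highest classic zstd level; negative "fast" levels are not accepted here.
    static final int MAX_LEVEL = 22;

    // Returns the requested ZSTD level, or the default when the option is absent.
    static int resolveZstdLevel(Map<String, String> options) {
        String raw = options.get(ZSTD_LEVEL_KEY);
        if (raw == null) {
            return DEFAULT_LEVEL;
        }
        int level;
        try {
            level = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ZSTD level is not an integer: " + raw, e);
        }
        if (level < 1 || level > MAX_LEVEL) {
            throw new IllegalArgumentException(
                "ZSTD level must be between 1 and " + MAX_LEVEL + ": " + level);
        }
        return level;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(ZSTD_LEVEL_KEY, "22");  // same value as the .option(...) above
        System.out.println(resolveZstdLevel(options));          // 22
        System.out.println(resolveZstdLevel(new HashMap<>()));  // 3
    }
}
```

Failing fast on an out-of-range value is a choice of this sketch; a real writer might instead clamp or defer to the codec's own validation.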
   





Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2106589168

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * a2f928ca3c4ef9d103d48c56df4b647e961b7f56 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23869)
 
   
   



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2106583567

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * fbb9dd5d64652ddec923dc7948f77adc61e823b3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23717)
 
   * a2f928ca3c4ef9d103d48c56df4b647e961b7f56 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23869)
 
   
   



Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11154:
URL: https://github.com/apache/hudi/pull/11154#issuecomment-2106578074

   
   ## CI report:
   
   * 12038dbde068e26f733a7b1c9cc7217019c31f25 UNKNOWN
   * fbb9dd5d64652ddec923dc7948f77adc61e823b3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23717)
 
   * a2f928ca3c4ef9d103d48c56df4b647e961b7f56 UNKNOWN
   
   



Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10922:
URL: https://github.com/apache/hudi/pull/10922#issuecomment-2106577791

   
   ## CI report:
   
   * 1c36f92dbff0e9be085a409d28cb9403a0343781 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23866)
 
   
   



(hudi) branch master updated: [HUDI-7743] Improve StoragePath usages (#11189)

2024-05-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b59b202f49 [HUDI-7743] Improve StoragePath usages (#11189)
4b59b202f49 is described below

commit 4b59b202f491780eeab8c67ce5f4b6506200c7b4
Author: Jon Vexler 
AuthorDate: Sun May 12 23:18:09 2024 -0400

[HUDI-7743] Improve StoragePath usages (#11189)

Co-authored-by: Jonathan Vexler <=>
Co-authored-by: Y Ethan Guo 
---
 .../hudi/cli/commands/ArchivedCommitsCommand.java  | 19 
 .../apache/hudi/cli/commands/RepairsCommand.java   | 11 -
 .../org/apache/hudi/cli/commands/TableCommand.java | 14 
 .../apache/hudi/cli/commands/TimelineCommand.java  |  4 ++--
 .../apache/hudi/cli/commands/TestTableCommand.java |  4 ++--
 .../cli/commands/TestUpgradeDowngradeCommand.java  |  4 ++--
 .../hudi/client/heartbeat/HeartbeatUtils.java  |  2 +-
 .../client/heartbeat/HoodieHeartbeatClient.java|  4 ++--
 .../utils/LegacyArchivedMetaEntryReader.java   |  2 +-
 .../index/bucket/ConsistentBucketIndexUtils.java   |  8 +++
 .../org/apache/hudi/io/HoodieKeyLookupHandle.java  |  3 +--
 .../java/org/apache/hudi/io/HoodieReadHandle.java  |  5 ++---
 .../java/org/apache/hudi/io/HoodieWriteHandle.java |  2 +-
 .../metadata/HoodieBackedTableMetadataWriter.java  |  3 +--
 .../java/org/apache/hudi/table/HoodieTable.java|  4 ++--
 .../table/action/commit/HoodieMergeHelper.java |  3 +--
 .../table/action/index/RunIndexActionExecutor.java |  3 +--
 .../BaseHoodieFunctionalIndexClient.java   |  3 +--
 .../rollback/ListingBasedRollbackStrategy.java |  6 ++---
 .../hudi/table/upgrade/UpgradeDowngrade.java   |  6 ++---
 .../table/upgrade/ZeroToOneUpgradeHandler.java |  2 +-
 .../apache/hudi/io/FlinkWriteHandleFactory.java|  4 +++-
 .../io/storage/row/HoodieRowDataCreateHandle.java  |  7 --
 .../row/HoodieRowDataFileWriterFactory.java|  4 ++--
 .../org/apache/hudi/table/HoodieJavaTable.java |  5 ++---
 .../client/utils/SparkMetadataWriterUtils.java |  5 +++--
 .../index/bloom/HoodieFileProbingFunction.java |  3 +--
 .../org/apache/hudi/table/HoodieSparkTable.java|  5 ++---
 .../functional/TestHoodieBackedMetadata.java   |  4 ++--
 .../TestCopyOnWriteRollbackActionExecutor.java |  2 +-
 .../TestHoodieSparkMergeOnReadTableRollback.java   |  4 ++--
 .../hudi/table/upgrade/TestUpgradeDowngrade.java   | 16 ++---
 .../common/config/HoodieFunctionalIndexConfig.java |  2 +-
 .../java/org/apache/hudi/common/fs/FSUtils.java|  2 +-
 .../common/heartbeat/HoodieHeartbeatUtils.java |  2 +-
 .../hudi/common/table/HoodieTableConfig.java   |  8 +++
 .../hudi/common/table/HoodieTableMetaClient.java   |  6 ++---
 .../table/timeline/HoodieActiveTimeline.java   |  4 ++--
 .../hudi/common/table/timeline/LSMTimeline.java|  2 +-
 .../view/HoodieTablePreCommitFileSystemView.java   |  2 +-
 .../org/apache/hudi/common/util/ConfigUtils.java   |  2 +-
 .../index/secondary/SecondaryIndexManager.java |  7 +++---
 .../io/FileBasedInternalSchemaStorageManager.java  |  5 ++---
 .../metadata/FileSystemBackedTableMetadata.java|  2 +-
 .../hudi/metadata/HoodieBackedTableMetadata.java   |  4 ++--
 .../hudi/sink/bootstrap/BootstrapOperator.java |  3 +--
 .../java/org/apache/hudi/util/StreamerUtil.java|  2 +-
 .../hudi/sink/bucket/ITTestBucketStreamWrite.java  |  2 +-
 .../apache/hudi/table/format/TestInputFormat.java  |  2 +-
 .../common/config/DFSPropertiesConfiguration.java  |  2 +-
 .../common/bootstrap/index/TestBootstrapIndex.java |  3 +--
 .../fs/TestFSUtilsWithRetryWrapperEnable.java  |  8 +++
 .../hudi/common/table/TestHoodieTableConfig.java   | 26 +++---
 .../common/table/TestHoodieTableMetaClient.java|  2 +-
 .../table/view/TestHoodieTableFileSystemView.java  |  6 ++---
 .../table/view/TestIncrementalFSViewSync.java  |  2 +-
 .../hadoop/HoodieCopyOnWriteTableInputFormat.java  |  4 ++--
 .../hudi/hadoop/HoodieHFileRecordReader.java   |  3 ++-
 .../hudi/hadoop/HoodieROTablePathFilter.java   |  8 ---
 .../apache/hudi/hadoop/SchemaEvolutionContext.java |  5 +++--
 .../HoodieMergeOnReadTableInputFormat.java |  3 +--
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java  |  8 ---
 .../utils/HoodieRealtimeRecordReaderUtils.java |  4 ++--
 .../reader/DFSHoodieDatasetInputReader.java|  3 +--
 .../scala/org/apache/hudi/HoodieBaseRelation.scala | 11 -
 .../org/apache/spark/sql/hudi/DedupeSparkJob.scala | 15 +++--
 .../procedures/ExportInstantsProcedure.scala   |  3 ++-
 .../RepairMigratePartitionMetaProcedure.scala  |  2 +-
 .../RepairOverwriteHoodiePropsProcedure.scala  |  5 +
 .../apache/spark/sql/hudi/common/TestSqlConf.scala |  6 ++---
 .../TestUpgradeOrDowngrad

Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-12 Thread via GitHub


yihua merged PR #11189:
URL: https://github.com/apache/hudi/pull/11189





[jira] [Updated] (HUDI-6563) Supports flink lookup join

2024-05-12 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6563:
-
Reviewers: Danny Chen

>  Supports flink lookup join
> ---
>
> Key: HUDI-6563
> URL: https://issues.apache.org/jira/browse/HUDI-6563
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink
>Reporter: waywtdcc
>Priority: Major
>  Labels: pull-request-available
>
>  Supports flink lookup join
>  
> {code:sql}
> CREATE TABLE `datagen_source` (
>   id INT,
>   name STRING,
>   proctime AS PROCTIME()
> ) WITH (
>   'connector' = 'datagen',
>   'rows-per-second' = '1',
>   'number-of-rows' = '2',
>   'fields.id.kind' = 'sequence',
>   'fields.id.start' = '1',
>   'fields.id.end' = '2'
> );
>
> SELECT o.id, o.name, b.id AS id2
> FROM datagen_source AS o
> JOIN hudi_table /*+ OPTIONS('lookup.join.cache.ttl' = '2 day') */
>   FOR SYSTEM_TIME AS OF o.proctime AS b
>   ON o.id = b.id;
> {code}
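The `'lookup.join.cache.ttl' = '2 day'` hint suggests that the Hudi side of the join is cached and refreshed on a time-to-live basis. A generic sketch of that idea, assuming a full-snapshot cache that is reloaded once the TTL has elapsed; `TtlLookupCache` is a hypothetical class and not the implementation in this PR:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical TTL lookup cache for a processing-time lookup join.
// Not Hudi's implementation: the whole dimension table is loaded at once
// and reloaded only after ttlMillis have passed since the last load.
public class TtlLookupCache<K, V> {
    private final Supplier<Map<K, V>> loader;  // reads the full dimension table
    private final long ttlMillis;
    private final LongSupplier clock;          // injectable so tests control time
    private Map<K, V> cache = null;
    private long loadedAt = 0L;
    private int loadCount = 0;

    public TtlLookupCache(Supplier<Map<K, V>> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached row for the key, or null when nothing matches.
    public V lookup(K key) {
        long now = clock.getAsLong();
        if (cache == null || now - loadedAt >= ttlMillis) {
            cache = new HashMap<>(loader.get());  // copy, so later table changes wait for the next reload
            loadedAt = now;
            loadCount++;
        }
        return cache.get(key);
    }

    public int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        Map<Integer, String> dim = new HashMap<>();
        dim.put(1, "a");
        long twoDays = 2L * 24 * 60 * 60 * 1000;  // mirrors '2 day'
        TtlLookupCache<Integer, String> c = new TtlLookupCache<>(() -> dim, twoDays, () -> now[0]);
        System.out.println(c.lookup(1));    // a
        dim.put(2, "b");                    // a new row lands in the table
        System.out.println(c.lookup(2));    // null: still serving the cached snapshot
        now[0] = twoDays;                   // TTL elapsed
        System.out.println(c.lookup(2));    // b
        System.out.println(c.loadCount());  // 2
    }
}
```

The trade-off the TTL expresses: a longer TTL means fewer reloads of the Hudi table but staler join results.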



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6563) Supports flink lookup join

2024-05-12 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6563:
-
Status: In Progress  (was: Open)



[jira] [Updated] (HUDI-6563) Supports flink lookup join

2024-05-12 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6563:
-
Sprint: Sprint 2023-04-26



Re: [PR] [HUDI-6563]Supports flink lookup join [hudi]

2024-05-12 Thread via GitHub


danny0405 commented on PR #9228:
URL: https://github.com/apache/hudi/pull/9228#issuecomment-2106552715

   @waywtdcc Hi, can you rebase onto the latest master? I will then take a look at this PR.





Re: [PR] [HUDI-7535] Add metrics for sourceParallelism and Refresh profile in S3/GCS [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10918:
URL: https://github.com/apache/hudi/pull/10918#issuecomment-2106545016

   
   ## CI report:
   
   * dba597f6e2b2c8dccad7b2768bffb27a623a1acf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23868)
 
   
   



Re: [PR] [HUDI-7535] Add metrics for sourceParallelism and Refresh profile in S3/GCS [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10918:
URL: https://github.com/apache/hudi/pull/10918#issuecomment-2106539553

   
   ## CI report:
   
   * 95436a55a29960c5bdeb8901f83c90d4712aa40b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23007)
 
   * dba597f6e2b2c8dccad7b2768bffb27a623a1acf UNKNOWN
   
   



Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2106534204

   
   ## CI report:
   
   * 4095b60ef4c272c8046aeb9e2a1d13db2d1c0a9d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23865)
 
   
   



Re: [I] [SUPPORT]xxx.parquet is not a Parquet file [hudi]

2024-05-12 Thread via GitHub


MrAladdin commented on issue #11178:
URL: https://github.com/apache/hudi/issues/11178#issuecomment-2106518950

   @ad1happy2go I need your help answering the question in my reply above. Thank you.





Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


yihua merged PR #10900:
URL: https://github.com/apache/hudi/pull/10900





(hudi) branch master updated (be0a6604b12 -> ce08875a0d7)

2024-05-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


 from be0a6604b12 [HUDI-7501] Use source profile for S3 and GCS sources (#10861)
 add ce08875a0d7 [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource (#10900)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/util/ConfigUtils.java   | 17 +-
 .../apache/hudi/common/util/TestConfigUtils.java   | 64 --
 .../utilities/config/HoodieIncrSourceConfig.java   |  8 +++
 .../hudi/utilities/sources/HoodieIncrSource.java   | 16 +-
 .../utilities/sources/TestHoodieIncrSource.java| 40 +-
 5 files changed, 121 insertions(+), 24 deletions(-)



Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10900:
URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106500988

   
   ## CI report:
   
   * b91da909a18c11702b917910846356e98aeaecf2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23864)
 
   
   



(hudi) branch master updated: [HUDI-7501] Use source profile for S3 and GCS sources (#10861)

2024-05-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new be0a6604b12 [HUDI-7501] Use source profile for S3 and GCS sources (#10861)
be0a6604b12 is described below

commit be0a6604b12abe6ef74a7b2c83f24de6af19e3d7
Author: Vinish Reddy 
AuthorDate: Mon May 13 07:23:31 2024 +0530

[HUDI-7501] Use source profile for S3 and GCS sources (#10861)

Co-authored-by: Y Ethan Guo 
---
 .../org/apache/hudi/utilities/UtilHelpers.java |  53 -
 .../sources/GcsEventsHoodieIncrSource.java |  61 --
 .../hudi/utilities/sources/HoodieIncrSource.java   |   6 +-
 .../apache/hudi/utilities/sources/RowSource.java   |   8 +-
 .../sources/S3EventsHoodieIncrSource.java  |  87 +++---
 .../sources/helpers/CloudDataFetcher.java  |  79 -
 .../helpers/CloudObjectsSelectorCommon.java|  70 
 .../helpers/gcs/GcsObjectMetadataFetcher.java  |  86 --
 .../sources/TestGcsEventsHoodieIncrSource.java |  83 ++
 .../utilities/sources/TestHoodieIncrSource.java|   3 +-
 .../sources/TestS3EventsHoodieIncrSource.java  | 125 -
 .../debezium/TestAbstractDebeziumSource.java   |   3 +-
 .../helpers/TestCloudObjectsSelectorCommon.java|  42 ---
 13 files changed, 383 insertions(+), 323 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index 124abeb059f..d0acffe5d17 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -40,6 +40,7 @@ import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieCompactionConfig;
 import org.apache.hudi.config.HoodieIndexConfig;
 import org.apache.hudi.config.HoodieLockConfig;
@@ -140,42 +141,30 @@ public class UtilHelpers {
   }
 
   public static Source createSource(String sourceClass, TypedProperties cfg, JavaSparkContext jssc,
-  SparkSession sparkSession, SchemaProvider schemaProvider,
-  HoodieIngestionMetrics metrics) throws IOException {
-try {
+SparkSession sparkSession, HoodieIngestionMetrics metrics, StreamContext streamContext) throws IOException {
+// All possible constructors.
+Class[] constructorArgsStreamContextMetrics = new Class[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, HoodieIngestionMetrics.class, StreamContext.class};
+Class[] constructorArgsStreamContext = new Class[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, StreamContext.class};
+Class[] constructorArgsMetrics = new Class[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, SchemaProvider.class, HoodieIngestionMetrics.class};
+Class[] constructorArgs = new Class[] {TypedProperties.class, JavaSparkContext.class, SparkSession.class, SchemaProvider.class};
+// List of constructor and their respective arguments.
+List<Pair<Class<?>[], Object[]>> sourceConstructorAndArgs = new ArrayList<>();
+sourceConstructorAndArgs.add(Pair.of(constructorArgsStreamContextMetrics, new Object[] {cfg, jssc, sparkSession, metrics, streamContext}));
+sourceConstructorAndArgs.add(Pair.of(constructorArgsStreamContext, new Object[] {cfg, jssc, sparkSession, streamContext}));
+sourceConstructorAndArgs.add(Pair.of(constructorArgsMetrics, new Object[] {cfg, jssc, sparkSession, streamContext.getSchemaProvider(), metrics}));
+sourceConstructorAndArgs.add(Pair.of(constructorArgs, new Object[] {cfg, jssc, sparkSession, streamContext.getSchemaProvider()}));
+
+HoodieException sourceClassLoadException = null;
+for (Pair<Class<?>[], Object[]> constructor : sourceConstructorAndArgs) {
   try {
-return (Source) ReflectionUtils.loadClass(sourceClass,
-new Class[] {TypedProperties.class, JavaSparkContext.class,
-SparkSession.class, SchemaProvider.class,
-HoodieIngestionMetrics.class},
-cfg, jssc, sparkSession, schemaProvider, metrics);
+return (Source) ReflectionUtils.loadClass(sourceClass, constructor.getLeft(), constructor.getRight());
   } catch (HoodieException e) {
-return (Source) ReflectionUtils.loadClass(sourceClass,
-new Class[] {TypedProperties.class, JavaSparkContext.class,
-SparkSession.class, SchemaProvider.class},
-cfg, jssc, sparkSession, schemaProvider);
+sourceClassLoadException
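
The diff replaces a nested try/catch with a list of (constructor signature, arguments) pairs that are tried in order, most specific first. A minimal plain-Java sketch of that pattern, using `java.lang.reflect` directly instead of Hudi's `ReflectionUtils`; `ConstructorFallback`, `Candidate`, `LegacySource`, and `MetricsSource` are hypothetical names chosen for illustration:

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "try constructor signatures in order".
public class ConstructorFallback {

    // One candidate constructor signature plus the arguments to pass to it.
    static final class Candidate {
        final Class<?>[] paramTypes;
        final Object[] args;

        Candidate(Class<?>[] paramTypes, Object[] args) {
            this.paramTypes = paramTypes;
            this.args = args;
        }
    }

    // Instantiates clazz with the first candidate signature it declares publicly.
    static Object instantiate(Class<?> clazz, List<Candidate> candidates) {
        ReflectiveOperationException last = null;
        for (Candidate c : candidates) {
            try {
                Constructor<?> ctor = clazz.getConstructor(c.paramTypes);
                return ctor.newInstance(c.args);
            } catch (NoSuchMethodException e) {
                last = e;  // signature not declared: try the next, less specific one
            } catch (ReflectiveOperationException e) {
                // The constructor exists but failed: report it instead of masking it.
                throw new IllegalStateException("Constructor of " + clazz.getName() + " failed", e);
            }
        }
        throw new IllegalStateException("No matching constructor for " + clazz.getName(), last);
    }

    // Hypothetical source classes with different constructor shapes.
    public static class LegacySource {
        public final String cfg;
        public LegacySource(String cfg) { this.cfg = cfg; }
    }

    public static class MetricsSource {
        public final String cfg;
        public final Integer metrics;
        public MetricsSource(String cfg, Integer metrics) { this.cfg = cfg; this.metrics = metrics; }
    }

    // Most specific signature first, as in the diff's ordering.
    static List<Candidate> candidates(String cfg, Integer metrics) {
        List<Candidate> list = new ArrayList<>();
        list.add(new Candidate(new Class<?>[] {String.class, Integer.class}, new Object[] {cfg, metrics}));
        list.add(new Candidate(new Class<?>[] {String.class}, new Object[] {cfg}));
        return list;
    }

    public static void main(String[] args) {
        Object a = instantiate(MetricsSource.class, candidates("props", 7));
        Object b = instantiate(LegacySource.class, candidates("props", 7));
        System.out.println(a.getClass().getSimpleName());  // MetricsSource
        System.out.println(b.getClass().getSimpleName());  // LegacySource
    }
}
```

One design choice differs from the diff: the diff catches `HoodieException` from `ReflectionUtils.loadClass` around every attempt, while this sketch falls through only on a missing signature (`NoSuchMethodException`), so a constructor that exists but throws is reported immediately.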

Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]

2024-05-12 Thread via GitHub


yihua merged PR #10861:
URL: https://github.com/apache/hudi/pull/10861





Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10922:
URL: https://github.com/apache/hudi/pull/10922#issuecomment-2106494067

   
   ## CI report:
   
   * 41e7049a782561d5f8f9a21af7ba4c1021b3fb14 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23810)
 
   * 1c36f92dbff0e9be085a409d28cb9403a0343781 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23866)
 
   
   



Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10922:
URL: https://github.com/apache/hudi/pull/10922#issuecomment-2106487245

   
   ## CI report:
   
   * 41e7049a782561d5f8f9a21af7ba4c1021b3fb14 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23810)
 
   * 1c36f92dbff0e9be085a409d28cb9403a0343781 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10861:
URL: https://github.com/apache/hudi/pull/10861#issuecomment-2106487144

   
   ## CI report:
   
   * 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23861)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] Adding New Configuration To Support ZSTD Level [hudi]

2024-05-12 Thread via GitHub


danny0405 commented on issue #11196:
URL: https://github.com/apache/hudi/issues/11196#issuecomment-2106456256

In Flink, you can use the `parquet.` prefix for any property that you want to customize on the Parquet writer; I'm not sure whether Spark has a similar feature.
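A minimal sketch of the prefix-based passthrough described above, assuming the writer receives every option whose key starts with `parquet.` unchanged (full key kept). The helper `extractParquetOptions` and the key `other.option` are hypothetical; `parquet.compression.codec.zstd.level` is, to our understanding, the parquet-mr property for the ZSTD level, but it should be checked against the Parquet version in use.

```java
import java.util.HashMap;
import java.util.Map;

public class ParquetPrefixOptions {
  // Assumed convention: any option key starting with this prefix is a Parquet writer property.
  static final String PARQUET_PREFIX = "parquet.";

  // Copies every "parquet."-prefixed option, keeping the full key, into the
  // configuration that would be handed to the Parquet writer. Hypothetical helper.
  static Map<String, String> extractParquetOptions(Map<String, String> tableOptions) {
    Map<String, String> parquetConf = new HashMap<>();
    for (Map.Entry<String, String> e : tableOptions.entrySet()) {
      if (e.getKey().startsWith(PARQUET_PREFIX)) {
        parquetConf.put(e.getKey(), e.getValue());
      }
    }
    return parquetConf;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("other.option", "ignored"); // hypothetical non-Parquet option, not forwarded
    options.put("parquet.compression.codec.zstd.level", "6"); // forwarded to the writer as-is
    System.out.println(extractParquetOptions(options));
  }
}
```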





Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2106449606

   
   ## CI report:
   
   * 511e55b8d042e8db674b48b203f3bf9b8f52ad6e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23858)
 
   * 4095b60ef4c272c8046aeb9e2a1d13db2d1c0a9d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23865)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-210660

   
   ## CI report:
   
   * 511e55b8d042e8db674b48b203f3bf9b8f52ad6e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23858)
 
   * 4095b60ef4c272c8046aeb9e2a1d13db2d1c0a9d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10900:
URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106444120

   
   ## CI report:
   
   * 39c476826c6dd8182d758c39e3cfbada40ec2b1b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23862)
 
   * b91da909a18c11702b917910846356e98aeaecf2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23864)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597761032


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##
@@ -853,7 +852,7 @@ object HoodieBaseRelation extends SparkAdapterSupport {
   val hoodieConfig = new HoodieConfig()
   hoodieConfig.setValue(USE_NATIVE_HFILE_READER,
 options.getOrElse(USE_NATIVE_HFILE_READER.key(), USE_NATIVE_HFILE_READER.defaultValue().toString))
-  val reader = HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
+  val reader = (new HoodieSparkIOFactory).getReaderFactory(HoodieRecordType.AVRO)

Review Comment:
   Based on the discussion, it is safer to hardcode the class for now as there 
are gaps in passing the storage configuration outside the `hudi-common` module.
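To make the trade-off concrete, here is a hypothetical Java sketch contrasting a hardcoded factory with one resolved from configuration via reflection. `IOFactory`, `DefaultIOFactory`, and the key `example.io.factory.class` are invented for illustration and are not Hudi's API; the config-driven path only helps when the configuration actually reaches the call site, which is the gap the comment describes.

```java
import java.util.HashMap;
import java.util.Map;

public class IOFactoryLoading {
  // Hypothetical stand-in for an IO factory abstraction (not Hudi's API).
  public interface IOFactory {
    String name();
  }

  // Hypothetical default implementation, playing the role of an engine-specific factory.
  public static class DefaultIOFactory implements IOFactory {
    @Override
    public String name() {
      return "default";
    }
  }

  // Hypothetical config key, used only for this illustration.
  static final String FACTORY_CLASS_KEY = "example.io.factory.class";

  // Option A: hardcode the implementation, as the review comment recommends for now.
  static IOFactory hardcoded() {
    return new DefaultIOFactory();
  }

  // Option B: read the class name from configuration and instantiate it reflectively.
  // This only works if the configuration is actually available at the call site.
  static IOFactory fromConfig(Map<String, String> conf) throws ReflectiveOperationException {
    String className = conf.getOrDefault(FACTORY_CLASS_KEY, DefaultIOFactory.class.getName());
    return (IOFactory) Class.forName(className).getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    System.out.println(hardcoded().name());
    System.out.println(fromConfig(new HashMap<>()).name());
  }
}
```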






Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597760489


##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java:
##
@@ -43,39 +40,18 @@
 
 public class HoodieFileWriterFactory {
 
-  private static HoodieFileWriterFactory getWriterFactory(HoodieRecord.HoodieRecordType recordType) {

Review Comment:
   `HoodieFileReaderFactory` and `HoodieFileWriterFactory` contain methods that only throw `UnsupportedOperationException`, such as the one below. Such methods should be abstract instead, and the factory classes should be made abstract classes or interfaces.
   ```
   protected HoodieFileReader newParquetFileReader(StorageConfiguration conf, StoragePath path) {
     throw new UnsupportedOperationException();
   }
   ```
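   A minimal sketch of the suggested refactoring, with hypothetical types standing in for `HoodieFileReader`, `StorageConfiguration`, and `StoragePath`: declaring the hook `abstract` turns a missing implementation into a compile-time error rather than a runtime `UnsupportedOperationException`.

```java
public class AbstractFactorySketch {
  // Hypothetical stand-in for HoodieFileReader.
  public interface FileReader {
    String format();
  }

  // Abstract base factory: subclasses must implement the hook, so an engine that
  // forgets it fails to compile instead of throwing UnsupportedOperationException at runtime.
  public abstract static class FileReaderFactory {
    protected abstract FileReader newParquetFileReader(String path);

    // Shared dispatch logic stays in the base class.
    public FileReader getFileReader(String path) {
      if (path.endsWith(".parquet")) {
        return newParquetFileReader(path);
      }
      throw new IllegalArgumentException("Unsupported file format: " + path);
    }
  }

  // Hypothetical engine-specific factory.
  public static class EngineReaderFactory extends FileReaderFactory {
    @Override
    protected FileReader newParquetFileReader(String path) {
      return () -> "parquet";
    }
  }
}
```

   One consequence worth weighing: every concrete factory must then implement every abstract hook, even for formats that its engine does not support.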






Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10900:
URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106440061

   
   ## CI report:
   
   * 39c476826c6dd8182d758c39e3cfbada40ec2b1b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23862)
 
   * b91da909a18c11702b917910846356e98aeaecf2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #11189:
URL: https://github.com/apache/hudi/pull/11189#discussion_r1597754130


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java:
##
@@ -269,7 +269,7 @@ private void updatePartitionWriteFileGroups(Map> p
 LOG.info("Syncing partition (" + partition + ") of instant (" + instant + ")");
 List pathInfoList = entry.getValue().stream()
 .map(p -> new StoragePathInfo(
-new StoragePath(String.format("%s/%s", metaClient.getBasePath(), p.getPath())),
+new StoragePath(metaClient.getBasePathV2(), p.getPath()),

Review Comment:
   If `p.getPath()` starts with a slash, this change alters the resulting path.
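   The behavior change can be illustrated by analogy with `java.nio.file`, assuming the two-argument `StoragePath(parent, child)` constructor resolves the child against the parent the way `Path.resolve` does (an assumption; Hudi's `StoragePath` is not used here). With string formatting, a leading slash only produces a doubled separator; with resolution, an absolute child discards the base path entirely. The results assume a POSIX file system. The same reasoning applies to a child such as `"/metadata/.hoodie"` passed to the two-argument constructor.

```java
import java.nio.file.Paths;

public class PathJoinSketch {
  // Mirrors String.format("%s/%s", base, child): plain concatenation.
  static String concat(String base, String child) {
    return String.format("%s/%s", base, child);
  }

  // Parent/child resolution: Path.resolve returns the child unchanged when it is absolute.
  static String resolve(String base, String child) {
    return Paths.get(base).resolve(child).toString();
  }

  public static void main(String[] args) {
    // Relative child: both approaches agree.
    System.out.println(concat("/tmp/table", "2024/05/12"));
    System.out.println(resolve("/tmp/table", "2024/05/12"));
    // Child with a leading slash: concatenation keeps the base, resolution drops it.
    System.out.println(concat("/tmp/table", "/2024/05/12"));
    System.out.println(resolve("/tmp/table", "/2024/05/12"));
  }
}
```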



##
hudi-common/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java:
##
@@ -269,7 +269,7 @@ private void updatePartitionWriteFileGroups(Map> p
 LOG.info("Syncing partition (" + partition + ") of instant (" + instant + ")");
 List pathInfoList = entry.getValue().stream()
 .map(p -> new StoragePathInfo(
-new StoragePath(String.format("%s/%s", metaClient.getBasePath(), p.getPath())),
+new StoragePath(metaClient.getBasePathV2(), p.getPath()),

Review Comment:
   ```suggestion
   new StoragePath(String.format("%s/%s", metaClient.getBasePath(), p.getPath())),
   ```






Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #11189:
URL: https://github.com/apache/hudi/pull/11189#discussion_r1597753236


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -2040,7 +2040,7 @@ public void testEagerRollbackinMDT() throws IOException {
 
 // collect all commit meta files from metadata table.
 List metaFiles = metaClient.getStorage()
-.listDirectEntries(new StoragePath(metaClient.getMetaPath() + "/metadata/.hoodie"));
+.listDirectEntries(new StoragePath(metaClient.getMetaPath(), "/metadata/.hoodie"));

Review Comment:
   ```suggestion
   .listDirectEntries(new StoragePath(metaClient.getMetaPath(), "metadata/.hoodie"));
   ```






(hudi) branch master updated: [HUDI-4732] Add support for confluent schema registry with proto (#11070)

2024-05-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new aa5bb0dda34 [HUDI-4732] Add support for confluent schema registry with proto (#11070)
aa5bb0dda34 is described below

commit aa5bb0dda34bf643d61e96f51a456cf876c0a0eb
Author: Tim Brown 
AuthorDate: Sun May 12 19:59:45 2024 -0400

[HUDI-4732] Add support for confluent schema registry with proto (#11070)

Co-authored-by: Y Ethan Guo 
---
 hudi-utilities/pom.xml |  7 +--
 .../hudi/utilities/config/KafkaSourceConfig.java   |  8 +++
 .../deser/KafkaAvroSchemaDeserializer.java |  4 +-
 .../schema/ProtoClassBasedSchemaProvider.java  | 10 +---
 .../ProtoSchemaToAvroSchemaConverter.java  | 43 +++
 .../hudi/utilities/sources/ProtoKafkaSource.java   | 40 ++
 .../sources/helpers/ProtoConversionUtil.java   | 56 +--
 .../deser/TestKafkaAvroSchemaDeserializer.java |  8 +--
 .../TestProtoSchemaToAvroSchemaConverter.java  | 50 +
 .../utilities/sources/TestProtoKafkaSource.java| 63 --
 packaging/hudi-utilities-bundle/pom.xml|  1 +
 packaging/hudi-utilities-slim-bundle/pom.xml   |  1 +
 pom.xml| 34 +++-
 13 files changed, 288 insertions(+), 37 deletions(-)

diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index 3a7a9d6a712..47c172b7791 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -361,12 +361,10 @@
 
   io.confluent
   kafka-avro-serializer
-  ${confluent.version}
 
 
   io.confluent
   common-config
-  ${confluent.version}
 
 
   io.confluent
@@ -376,7 +374,10 @@
 
   io.confluent
   kafka-schema-registry-client
-  ${confluent.version}
+
+
+  io.confluent
+  kafka-protobuf-serializer
 
 
 
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java
index 024712f8cdd..6215e99d665 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/config/KafkaSourceConfig.java
@@ -24,6 +24,8 @@ import org.apache.hudi.common.config.ConfigGroups;
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.config.HoodieConfig;
 
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+
 import javax.annotation.concurrent.Immutable;
 
 import static org.apache.hudi.common.util.ConfigUtils.DELTA_STREAMER_CONFIG_PREFIX;
@@ -120,6 +122,12 @@ public class KafkaSourceConfig extends HoodieConfig {
   .markAdvanced()
   .withDocumentation("Kafka consumer strategy for reading data.");
 
+  public static final ConfigProperty KAFKA_PROTO_VALUE_DESERIALIZER_CLASS = ConfigProperty
+  .key(PREFIX + "proto.value.deserializer.class")
+  .defaultValue(ByteArrayDeserializer.class.getName())
+  .sinceVersion("0.15.0")
+  .withDocumentation("Kafka Proto Payload Deserializer Class");
+
   /**
* Kafka reset offset strategies.
*/
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java
index 246be5f8ec6..4673eceed15 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deser/KafkaAvroSchemaDeserializer.java
@@ -60,7 +60,6 @@ public class KafkaAvroSchemaDeserializer extends KafkaAvroDeserializer {
   /**
* We need to inject sourceSchema instead of reader schema during deserialization or later stages of the pipeline.
*
-   * @param includeSchemaAndVersion
* @param topic
* @param isKey
* @param payload
@@ -70,13 +69,12 @@ public class KafkaAvroSchemaDeserializer extends KafkaAvroDeserializer {
*/
   @Override
   protected Object deserialize(
-  boolean includeSchemaAndVersion,
   String topic,
   Boolean isKey,
   byte[] payload,
   Schema readerSchema)
   throws SerializationException {
-return super.deserialize(includeSchemaAndVersion, topic, isKey, payload, sourceSchema);
+return super.deserialize(topic, isKey, payload, sourceSchema);
   }
 
   protected TypedProperties getConvertToTypedProperties(Map configs) {
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/ProtoClassBasedSchemaProvider.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/ProtoClassBasedSchemaProvider.java
index 7d6981efb40..a4b485e1634 100644
--- a/
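
A minimal sketch of how the new `KAFKA_PROTO_VALUE_DESERIALIZER_CLASS` option behaves: when unset, the value falls back to `ByteArrayDeserializer`'s class name, as the diff's `defaultValue` shows. The full key below assumes `PREFIX` expands to `hoodie.streamer.source.kafka.`, which the excerpt does not show; `java.util.Properties` stands in for Hudi's `TypedProperties`, and `io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer` is, we believe, the deserializer shipped in `kafka-protobuf-serializer`.

```java
import java.util.Properties;

public class ProtoDeserializerConfigSketch {
  // ASSUMPTION: PREFIX resolves to "hoodie.streamer.source.kafka."; the diff only shows
  // PREFIX + "proto.value.deserializer.class".
  static final String KEY = "hoodie.streamer.source.kafka.proto.value.deserializer.class";
  // Default taken from the diff: ByteArrayDeserializer.class.getName().
  static final String DEFAULT_CLASS = "org.apache.kafka.common.serialization.ByteArrayDeserializer";

  // Returns the configured deserializer class name, or the default when unset.
  static String deserializerClass(Properties props) {
    return props.getProperty(KEY, DEFAULT_CLASS);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(deserializerClass(props)); // falls back to the default
    // Opting into Confluent's schema-registry-aware Protobuf deserializer (class name assumed).
    props.setProperty(KEY, "io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer");
    System.out.println(deserializerClass(props));
  }
}
```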

Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-05-12 Thread via GitHub


yihua merged PR #11070:
URL: https://github.com/apache/hudi/pull/11070





Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10900:
URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106417131

   
   ## CI report:
   
   * 5fefa9e02c016d50b2f2b1fda2c9c89f2df7d620 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23641)
 
   * 39c476826c6dd8182d758c39e3cfbada40ec2b1b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23862)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10861:
URL: https://github.com/apache/hudi/pull/10861#issuecomment-2106417105

   
   ## CI report:
   
   * 896491233f44039e8874d5a3080dd686fffd044e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22944)
 
   * 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23861)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #10900:
URL: https://github.com/apache/hudi/pull/10900#discussion_r1597744597


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/HoodieIncrSourceConfig.java:
##
@@ -101,4 +101,11 @@ public class HoodieIncrSourceConfig extends HoodieConfig {
   .withAlternatives(DELTA_STREAMER_CONFIG_PREFIX + "source.hoodieincr.partition.extractor.class")
   .markAdvanced()
   .withDocumentation("PartitionValueExtractor class to extract partition fields from _hoodie_partition_path");
+
+  public static final ConfigProperty HOODIE_SPARK_DATASOURCE_OPTIONS = ConfigProperty
+  .key(STREAMER_CONFIG_PREFIX + "source.hoodieincr.data.datasource.options")
+  .noDefaultValue()
+  .markAdvanced()
+  .withDocumentation("A comma separate list of options that can be passed to the spark dataframe reader of a hudi table, "
+  + "eg: hoodie.metadata.enable=true,hoodie.enable.data.skipping=true");

Review Comment:
   We can keep the config in the `HoodieIncrSourceConfig` class since it 
applies to the incremental source only.






Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #10900:
URL: https://github.com/apache/hudi/pull/10900#discussion_r159777


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestHoodieIncrSource.java:
##
@@ -333,14 +334,47 @@ public void testHoodieIncrSourceWithPendingTableServices(HoodieTableType tableTy
 }
   }
 
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testHoodieIncrSourceWithDataSourceOptions(HoodieTableType tableType) throws IOException {
+this.tableType = tableType;
+metaClient = getHoodieMetaClient(hadoopConf(), basePath());
+HoodieWriteConfig writeConfig = getConfigBuilder(basePath(), metaClient)
+.withArchivalConfig(HoodieArchivalConfig.newBuilder().archiveCommitsWith(10, 12).build())
+.withCleanConfig(HoodieCleanConfig.newBuilder().retainCommits(9).build())
+.withCompactionConfig(
+HoodieCompactionConfig.newBuilder()
+.withScheduleInlineCompaction(true)
+.withMaxNumDeltaCommitsBeforeCompaction(1)
+.build())
+.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(true)
+.withMetadataIndexColumnStats(true)
+.withColumnStatsIndexForColumns("_hoodie_commit_time")
+.build())
+.build();
+
+TypedProperties extraProps = new TypedProperties();
+extraProps.setProperty(HoodieIncrSourceConfig.HOODIE_SPARK_DATASOURCE_OPTIONS.key(), "hoodie.metadata.enable=true,hoodie.enable.data.skipping=true");

Review Comment:
   I think it might be hard to check in the tests that the Spark reader actually receives the passed configs.






Re: [PR] [HUDI-7501] Use source profile for S3 and GCS sources [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10861:
URL: https://github.com/apache/hudi/pull/10861#issuecomment-2106414947

   
   ## CI report:
   
   * 896491233f44039e8874d5a3080dd686fffd044e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22944)
 
   * 2c6eb9de69f80fbc5cbd83c8e2faa4ed93bf0980 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #10900:
URL: https://github.com/apache/hudi/pull/10900#issuecomment-2106414969

   
   ## CI report:
   
   * 5fefa9e02c016d50b2f2b1fda2c9c89f2df7d620 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23641)
 
   * 39c476826c6dd8182d758c39e3cfbada40ec2b1b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #10900:
URL: https://github.com/apache/hudi/pull/10900#discussion_r1597742937


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java:
##
@@ -189,10 +193,18 @@ public Pair>, String> fetchNextBatch(Option lastCkpt
   return Pair.of(Option.empty(), queryInfo.getEndInstant());
 }
 
+DataFrameReader reader = sparkSession.read().format("org.apache.hudi");
+String datasourceOpts = getStringWithAltKeys(props, HoodieIncrSourceConfig.HOODIE_SPARK_DATASOURCE_OPTIONS, true);
+if (!StringUtils.isNullOrEmpty(datasourceOpts)) {
+  Map optionsMap = Arrays.stream(datasourceOpts.split(","))
+  .map(option -> Pair.of(option.split("=")[0], option.split("=")[1]))
+  .collect(Collectors.toMap(Pair::getLeft, Pair::getRight));

Review Comment:
   Adjusted `ConfigUtils.toMap` so it can be reused. Unit tests have also been added.
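   For reference, a hypothetical standalone parser for the comma-separated `key=value` format of `HOODIE_SPARK_DATASOURCE_OPTIONS`; it is not the adjusted `ConfigUtils.toMap`, whose code is not shown here. Splitting on the first `=` only avoids two pitfalls of the `option.split("=")[1]` approach in the diff: a value containing `=` is truncated, and an entry like `k=` throws `ArrayIndexOutOfBoundsException` because `String.split` drops trailing empty strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DatasourceOptionsParser {
  // Parses "k1=v1,k2=v2" into an insertion-ordered map (hypothetical helper, not ConfigUtils.toMap).
  static Map<String, String> parse(String opts) {
    Map<String, String> result = new LinkedHashMap<>();
    if (opts == null || opts.trim().isEmpty()) {
      return result;
    }
    for (String entry : opts.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas, e.g. a trailing ","
      }
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
      }
      // Split on the first '=' only, so values may themselves contain '='.
      // A repeated key overwrites the earlier value (last one wins).
      result.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("hoodie.metadata.enable=true,hoodie.enable.data.skipping=true"));
  }
}
```

   A design note: `Collectors.toMap` without a merge function throws `IllegalStateException` on duplicate keys, whereas this sketch lets the last occurrence win; which behavior is preferable for user-supplied options is a judgment call.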






Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-12 Thread via GitHub


yihua commented on code in PR #10900:
URL: https://github.com/apache/hudi/pull/10900#discussion_r1597739617


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/config/HoodieIncrSourceConfig.java:
##
@@ -101,4 +101,11 @@ public class HoodieIncrSourceConfig extends HoodieConfig {
   .withAlternatives(DELTA_STREAMER_CONFIG_PREFIX + "source.hoodieincr.partition.extractor.class")
   .markAdvanced()
   .withDocumentation("PartitionValueExtractor class to extract partition fields from _hoodie_partition_path");
+
+  public static final ConfigProperty HOODIE_SPARK_DATASOURCE_OPTIONS = ConfigProperty

Review Comment:
   Fixed.






Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11197:
URL: https://github.com/apache/hudi/pull/11197#issuecomment-2106268279

   
   ## CI report:
   
   * c6bec154954403a17aadd26bfab364ba675ce878 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23860)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11197:
URL: https://github.com/apache/hudi/pull/11197#issuecomment-2106237499

   
   ## CI report:
   
   * c6bec154954403a17aadd26bfab364ba675ce878 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23860)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]

2024-05-12 Thread via GitHub


hudi-bot commented on PR #11197:
URL: https://github.com/apache/hudi/pull/11197#issuecomment-2106235175

   
   ## CI report:
   
   * c6bec154954403a17aadd26bfab364ba675ce878 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7748) Add logs and drop _hoodie_is_deleted in Transformer

2024-05-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7748:
-
Labels: pull-request-available  (was: )

> Add logs and drop _hoodie_is_deleted in Transformer
> ---
>
> Key: HUDI-7748
> URL: https://issues.apache.org/jira/browse/HUDI-7748
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]

2024-05-12 Thread via GitHub


codope opened a new pull request, #11197:
URL: https://github.com/apache/hudi/pull/11197

   ### Change Logs
   
   minor logs
   
   ### Impact
   
   none
   
   ### Risk level (write none, low, medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the default value of a config is changed._
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7748) Add logs and drop _hoodie_is_deleted in Transformer

2024-05-12 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7748:
-

 Summary: Add logs and drop _hoodie_is_deleted in Transformer
 Key: HUDI-7748
 URL: https://issues.apache.org/jira/browse/HUDI-7748
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Sagar Sumit





