[jira] [Updated] (HUDI-3545) Make HoodieAvroWriteSupport class configurable

2024-01-31 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-3545:
--
Status: In Progress  (was: Open)

> Make HoodieAvroWriteSupport class configurable
> --
>
> Key: HUDI-3545
> URL: https://issues.apache.org/jira/browse/HUDI-3545
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>  Labels: pull-request-available
>
> Make the HoodieAvroWriteSupport class configurable so that it can be 
> overridden by custom write support classes.
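One way "configurable" could work is sketched below. It assumes the implementation class name is read from a config map and instantiated reflectively through a no-arg constructor. The `ConfigurableClassLoader` helper and any config key used with it are hypothetical, not Hudi's API. JDK list types stand in for write-support classes so the sketch runs without Hudi or Parquet on the classpath.

```scala
import scala.reflect.ClassTag

// Hypothetical helper: instantiate the class named under `key` in `conf`,
// falling back to `defaultClass` when the key is absent.
object ConfigurableClassLoader {
  def instantiate[T](conf: Map[String, String], key: String, defaultClass: String)
                    (implicit tag: ClassTag[T]): T = {
    val className = conf.getOrElse(key, defaultClass)
    val clazz = Class.forName(className)
    val expected = tag.runtimeClass
    // Fail fast if the configured class does not implement the expected type.
    require(expected.isAssignableFrom(clazz),
      s"$className does not implement ${expected.getName}")
    // Assumes the configured class has an accessible no-arg constructor.
    clazz.getDeclaredConstructor().newInstance().asInstanceOf[T]
  }
}
```

For example, `instantiate[java.util.List[String]](Map(key -> "java.util.LinkedList"), key, "java.util.ArrayList")` returns a `LinkedList`, and the same call with an empty map returns the default `ArrayList`. Checking `isAssignableFrom` before instantiating turns a misconfigured class name into a clear error instead of a `ClassCastException` at first use.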



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-3545) Make HoodieAvroWriteSupport class configurable

2024-01-31 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler closed HUDI-3545.
-
Resolution: Fixed




Re: [PR] [Hudi-6868] Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10577:
URL: https://github.com/apache/hudi/pull/10577#issuecomment-1919546729

   
   ## CI report:
   
   * 40cbc324442334d3e1313f995c8ae9feed7d0db7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6)
 
   * 27e72600df8807de069ab066fcf4a1d40c0d9b56 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22247)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919546440

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 7b46d61e36c1007f132c255e12d86c597a807335 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22246)
 
   
   
   



Re: [PR] [Hudi-6868] Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10577:
URL: https://github.com/apache/hudi/pull/10577#issuecomment-1919532025

   
   ## CI report:
   
   * 40cbc324442334d3e1313f995c8ae9feed7d0db7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6)
 
   * 27e72600df8807de069ab066fcf4a1d40c0d9b56 UNKNOWN
   
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919531736

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 56fad09cfbef830e9e359fda98d282431c1fdc7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22245)
 
   * 7b46d61e36c1007f132c255e12d86c597a807335 UNKNOWN
   
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919517887

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 85d468ac3479ec66a4507e07e157ef77a8e42e7b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22243)
 
   * 56fad09cfbef830e9e359fda98d282431c1fdc7f UNKNOWN
   
   
   



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919517087

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * a6973f9c50ba8fcc6485bc87a8107752988447eb Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22233)
 
   * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244)
 
   
   
   



Re: [PR] [Hudi-6868] Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on code in PR #10577:
URL: https://github.com/apache/hudi/pull/10577#discussion_r1473131674


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -998,7 +1000,16 @@ object HoodieSparkSqlWriter {
   properties.put(HiveSyncConfigHolder.HIVE_SYNC_SCHEMA_STRING_LENGTH_THRESHOLD.key, spark.sessionState.conf.getConf(StaticSQLConf.SCHEMA_STRING_LENGTH_THRESHOLD).toString)
   properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, SPARK_VERSION)
   properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
-
+  try{
+    val passwd = ShimLoader.getHadoopShims.getPassword(spark.sparkContext.hadoopConfiguration, HiveConf.ConfVars.METASTOREPWD.varname)
+    if (passwd != null && !passwd.isEmpty) {

Review Comment:
   Made the changes. Please review.
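The pattern in the diff (look up the metastore password through the Hadoop shims inside a `try`, and use it only when it is non-null and non-empty) can be sketched without Hive on the classpath. In this hypothetical version, `lookup` stands in for `ShimLoader.getHadoopShims.getPassword`, and both property keys are placeholders; the key the PR actually writes to is not visible in the excerpt.

```scala
import java.util.Properties
import scala.util.{Success, Try}

// Hypothetical sketch: copy a password from a credential-store lookup into
// sync properties only when the lookup succeeds with a non-empty value.
object PasswordResolver {
  /** Returns true if a password was found and stored under `targetKey`. */
  def resolveInto(props: Properties,
                  targetKey: String,
                  sourceKey: String,
                  lookup: String => String): Boolean =
    Try(lookup(sourceKey)) match {
      case Success(passwd) if passwd != null && passwd.nonEmpty =>
        props.setProperty(targetKey, passwd)
        true
      case _ =>
        // Missing entry, empty value, or a failed lookup: leave props untouched.
        false
    }
}
```

Wrapping the lookup in `Try` keeps a missing or unreadable credential store from failing the whole sync; a real implementation might also log the failure.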




Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919432441

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * a6973f9c50ba8fcc6485bc87a8107752988447eb Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22233)
 
   * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 UNKNOWN
   
   
   



Re: [PR] [Hudi-6868] Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on code in PR #10577:
URL: https://github.com/apache/hudi/pull/10577#discussion_r1473085790


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -998,7 +1000,16 @@ object HoodieSparkSqlWriter {
   properties.put(HiveSyncConfigHolder.HIVE_SYNC_SCHEMA_STRING_LENGTH_THRESHOLD.key, spark.sessionState.conf.getConf(StaticSQLConf.SCHEMA_STRING_LENGTH_THRESHOLD).toString)
   properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, SPARK_VERSION)
   properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
-
+  try{
+    val passwd = ShimLoader.getHadoopShims.getPassword(spark.sparkContext.hadoopConfiguration, HiveConf.ConfVars.METASTOREPWD.varname)
+    if (passwd != null && !passwd.isEmpty) {

Review Comment:
   Thanks @bvaradar for the feedback. I will make that change.




Re: [I] [SUPPORT] parquet bloom filters not supported by hudi [hudi]

2024-01-31 Thread via GitHub


jonvex commented on issue #7117:
URL: https://github.com/apache/hudi/issues/7117#issuecomment-1919400827

   In https://github.com/apache/hudi/pull/10278 I am working on the FileGroup 
Reader for Hudi 1.0, and that test was failing. If I change it to accu.add(1), 
it passes. That's why I'm asking: I don't want to break Parquet bloom 
filters.
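The discussion hinges on a test that counts rows as they are read, so a smaller count shows the bloom filter let the reader skip data. The hypothetical model below is plain Scala with no Spark or Parquet, and its simple salted-MurmurHash filter is not Parquet's split-block bloom filter. `rowsRead` plays the role of the accumulator.

```scala
import scala.util.hashing.MurmurHash3

// Illustrative bloom filter: k salted MurmurHash3 hashes over a bit set.
final class SimpleBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new java.util.BitSet(numBits)

  private def positions(value: String): Seq[Int] =
    (0 until numHashes).map(i => Math.floorMod(MurmurHash3.stringHash(s"$i:$value"), numBits))

  def add(value: String): Unit = positions(value).foreach(i => bits.set(i))

  // false means "definitely absent"; true means "possibly present".
  def mightContain(value: String): Boolean = positions(value).forall(i => bits.get(i))
}

// A hypothetical row group: its values plus a bloom filter built over them.
final case class RowGroup(values: Vector[String]) {
  val bloom: SimpleBloomFilter = {
    val filter = new SimpleBloomFilter(1024, 3)
    values.foreach(filter.add)
    filter
  }
}

object FilteredScan {
  // Returns the matching values and how many rows were read; `rowsRead`
  // stands in for an accumulator incremented once per row read.
  def scan(groups: Seq[RowGroup], target: String, useBloom: Boolean): (Vector[String], Int) = {
    var rowsRead = 0
    val matches = Vector.newBuilder[String]
    for (group <- groups if !useBloom || group.bloom.mightContain(target)) {
      group.values.foreach { value =>
        rowsRead += 1
        if (value == target) matches += value
      }
    }
    (matches.result(), rowsRead)
  }
}
```

With `useBloom = false` every row is read. With the filter on, row groups whose filter answers "definitely absent" are skipped, though a false positive can still force a group to be read, so a test should assert an upper bound rather than an exact count unless the data is controlled.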



[PR] [Docs] Added known regression note for 0.14.1 release related to ComplexKeyGen [hudi]

2024-01-31 Thread via GitHub


ad1happy2go opened a new pull request, #10597:
URL: https://github.com/apache/hudi/pull/10597

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   



Re: [I] too many s3 list when hoodie.metadata.enable=true [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9751:
URL: https://github.com/apache/hudi/issues/9751#issuecomment-1919303669

   @ad1happy2go I ran internal benchmarks with different versions of Hudi. 
With metadata enabled, I didn't see a significant increase in S3 calls across 
versions.
   
   @njalan @BruceKellan Did you try the 0.14.x release? Do you still see a high 
number of S3 calls only when metadata is enabled?



Re: [I] [SUPPORT] Unable to read column_stats sub-table of a HUDI table for some tables [hudi]

2024-01-31 Thread via GitHub


codope closed issue #9399: [SUPPORT] Unable to read column_stats sub-table of a 
HUDI table for some tables
URL: https://github.com/apache/hudi/issues/9399



Re: [I] [SUPPORT] Unable to read column_stats sub-table of a HUDI table for some tables [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9399:
URL: https://github.com/apache/hudi/issues/9399#issuecomment-1919299483

   @nandubatchu 
   



Re: [I] [SUPPORT] Unable to read column_stats sub-table of a HUDI table for some tables [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9399:
URL: https://github.com/apache/hudi/issues/9399#issuecomment-1919299172

   Closing this out. Please reopen if you are still facing this issue.



Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]

2024-01-31 Thread via GitHub


parisni commented on issue #7846:
URL: https://github.com/apache/hudi/issues/7846#issuecomment-1919296148

   We recently faced a more general problem with the Spark datasource, where 
subsequent read.table("hudi_table") calls are cached and don't reflect new Hudi 
commits unless you restart the context (or apply your config).
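The staleness described above can be modeled in plain Scala: the first read caches the table's file listing, and later reads keep returning that snapshot until the cached entry is invalidated. `TableStorage` and `CachingCatalog` are hypothetical stand-ins, not Spark or Hudi classes. In Spark itself, `spark.catalog.refreshTable("hudi_table")` is the standard call for invalidating a table's cached metadata; the thread does not confirm whether it resolves this particular case.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the table's storage: each commit adds a file.
final class TableStorage {
  private val files = mutable.ArrayBuffer.empty[String]
  def commit(file: String): Unit = files += file
  def listFiles(): Vector[String] = files.toVector
}

// Hypothetical catalog that caches a table's file listing on first read.
final class CachingCatalog(storage: TableStorage) {
  private val cache = mutable.Map.empty[String, Vector[String]]

  // Later reads return the cached snapshot, so new commits stay invisible...
  def read(table: String): Vector[String] =
    cache.getOrElseUpdate(table, storage.listFiles())

  // ...until the entry is invalidated and the next read re-lists storage.
  def refresh(table: String): Unit = cache.remove(table)
}
```

Restarting the context corresponds to discarding the whole `cache`; refreshing a single table is the narrower fix.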



Re: [I] [SUPPORT] Dirty data filtering failed [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9877:
URL: https://github.com/apache/hudi/issues/9877#issuecomment-1919294351

   @deasea Sorry for the delay here. @danny0405, do you have any insights here?



Re: [I] [SUPPORT] hudi sql task hang java.lang.System.exit block [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10112:
URL: https://github.com/apache/hudi/issues/10112#issuecomment-1919292489

   @zyclove Did you get a chance to try this? Did this PR fix your issue? 
Please share your findings here. Thanks in advance.



Re: [I] [SUPPORT] parquet bloom filters not supported by hudi [hudi]

2024-01-31 Thread via GitHub


parisni commented on issue #7117:
URL: https://github.com/apache/hudi/issues/7117#issuecomment-1919291600

   That's a good point. I don't know; I found that code in the Spark tests. The 
point is that it does increment!



Re: [I] [SUPPORT] HoodieMultiTableDeltaStreamer does not work as expected [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10246:
URL: https://github.com/apache/hudi/issues/10246#issuecomment-1919284449

   @nttq1sub Sorry for the delay in responding. Yes, you are correct. It reads 
from one topic and ingests one table in that MicroBatchExecution, and then 
runs the next table.
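The sequential behavior described above can be modeled as follows. `TableConfig`, `runRound`, and the read/write callbacks are hypothetical stand-ins, not the HoodieMultiTableDeltaStreamer API; the sketch only captures the ordering, where each table's batch is read and written before the next table starts.

```scala
// Hypothetical per-table settings: the target table and its source topic.
final case class TableConfig(table: String, topic: String)

object SequentialMultiTableIngest {
  // One round: for each table in order, read a batch from its topic and write
  // it to the table before moving on. Returns the tables in processing order.
  def runRound(tables: Seq[TableConfig],
               readBatch: String => Seq[String],
               write: (String, Seq[String]) => Unit): Seq[String] =
    tables.map { cfg =>
      val batch = readBatch(cfg.topic)
      write(cfg.table, batch)
      cfg.table
    }
}
```

A consequence of this ordering is that a slow table delays every table after it in the same round.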



Re: [I] Querying Hudi Table Created With Version 0.12.3 Not Working on Trino 430 [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10228:
URL: https://github.com/apache/hudi/issues/10228#issuecomment-1919280265

   @Amar1404 Ideally, HiveSync should also delegate to AwsGlueCatalogSync when 
Glue is enabled for EMR, so it should not make any difference.



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919262947

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 85d468ac3479ec66a4507e07e157ef77a8e42e7b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22243)
 
   
   
   



Re: [I] [SUPPORT] How to do Schema Evolution with Apache Flink DataStream API when doing CDC? [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10349:
URL: https://github.com/apache/hudi/issues/10349#issuecomment-1919260878

   @danny0405 



Re: [I] Partitioning data into two keys is taking more time (10x) than partitioning into one key. [hudi]

2024-01-31 Thread via GitHub


maheshguptags commented on issue #10456:
URL: https://github.com/apache/hudi/issues/10456#issuecomment-1919229376

   Hi @ad1happy2go,
   A small correction on the commit file size:
   
   > which ultimately causing OOM due to 400MB commit files. 
   
   It's a 41 MB commit file, @ad1happy2go.



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919174513

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 32225014f8f91229051ceea86612261f5ef1a5f8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22239)
 
   * 85d468ac3479ec66a4507e07e157ef77a8e42e7b UNKNOWN
   
   
   



Re: [PR] [HUDI-7362] Fix hudi partition base path scheme to s3 [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10596:
URL: https://github.com/apache/hudi/pull/10596#issuecomment-1919145004

   
   ## CI report:
   
   * febb22c2b62f65657dbe46f4242ca032dd64185f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22241)
 
   
   
   



Re: [PR] [HUDI-7146] Support non-unique keys for secondary index [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10211:
URL: https://github.com/apache/hudi/pull/10211#issuecomment-1919143807

   
   ## CI report:
   
   * b3c87bc228fa2be4558a349c9f44f47a695f8a8d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22242)
 
   
   
   



Re: [PR] [HUDI-7146] Support non-unique keys for secondary index [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10211:
URL: https://github.com/apache/hudi/pull/10211#issuecomment-1919059558

   
   ## CI report:
   
   * d97a61842c678093b17ae5c42f95a1f4e2aa925f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22024)
 
   * b3c87bc228fa2be4558a349c9f44f47a695f8a8d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22242)
 
   
   
   



Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10594:
URL: https://github.com/apache/hudi/pull/10594#issuecomment-1919049335

   
   ## CI report:
   
   * d5f75fac99c5cd1039f0418e0900fc3aae608a33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22240)
 
   
   
   



Re: [PR] [HUDI-7146] Support non-unique keys for secondary index [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10211:
URL: https://github.com/apache/hudi/pull/10211#issuecomment-1919048326

   
   ## CI report:
   
   * d97a61842c678093b17ae5c42f95a1f4e2aa925f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22024)
 
   * b3c87bc228fa2be4558a349c9f44f47a695f8a8d UNKNOWN
   
   
   



Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10415:
URL: https://github.com/apache/hudi/issues/10415#issuecomment-1919021038

   Thanks for trying, @ergophobiac. @CTTY, any insights here?



Re: [I] [SUPPORT] Inconsistent Checkpoint Size in Flink Applications with MoR [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10329:
URL: https://github.com/apache/hudi/issues/10329#issuecomment-1919019227

   cc @danny0405 



Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2024-01-31 Thread via GitHub


codope closed issue #10303: [SUPPORT] CoW: Hudi Upsert not working when there 
is a timestamp field in the composite key 
URL: https://github.com/apache/hudi/issues/10303



Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1919013789

   @srinikandi Closing out this issue. Please reopen in case you still face 
this issue after setting 
`hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled`.
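For reference, here is a minimal sketch of where that flag sits among typical Spark datasource write options for a Copy-on-Write upsert with a composite key. The table name, column names and the helper `hudi_upsert_options` are hypothetical. Only the option keys are Hudi's, and the comment above does not say which other settings the reporter uses.

```python
def hudi_upsert_options(table_name, record_key_fields, precombine_field):
    """Return Hudi Spark datasource options for a CoW upsert keyed on several fields.

    All argument values are hypothetical placeholders; only the option keys are Hudi's.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        # Composite record key, e.g. an id column plus a timestamp column.
        "hoodie.datasource.write.recordkey.field": ",".join(record_key_fields),
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
        # The flag suggested above for record keys that contain a timestamp field.
        "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled": "true",
    }


if __name__ == "__main__":
    opts = hudi_upsert_options("orders", ["order_id", "event_ts"], "updated_at")
    for key, value in opts.items():
        print(f"{key}={value}")
```

With PySpark the dict would be passed as `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are also placeholders.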





Re: [I] [SUPPORT] Can't delete key (row) for all commits in HUDI Table (history)? [hudi]

2024-01-31 Thread via GitHub


jens4doc commented on issue #10581:
URL: https://github.com/apache/hudi/issues/10581#issuecomment-1919012227

   Thank you. Unfortunately, the right to be forgotten cannot be applied by HUDI. 





Re: [I] [SUPPORT] Can't delete key (row) for all commits in HUDI Table (history)? [hudi]

2024-01-31 Thread via GitHub


jens4doc closed issue #10581: [SUPPORT] Can't delete key (row) for all commits 
in HUDI Table (history)?
URL: https://github.com/apache/hudi/issues/10581





Re: [I] [SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time [hudi]

2024-01-31 Thread via GitHub


bk-mz commented on issue #10511:
URL: https://github.com/apache/hudi/issues/10511#issuecomment-1919010119

   >when number of output rows with bloom is clearly lot less than number of 
output rows without bloom.
   
   @ad1happy2go 
   
   The query performance is the same for both the ro and snapshot cases, which is 
why I'm making that statement. Just having one number smaller than another 
number is cryptic. 
   
   >You can also try column stats indexing also in this case. 
   
   As you can see, they are enabled:
   
   ```
   hoodie.metadata.index.bloom.filter.column.list=id,account_id
   hoodie.metadata.index.bloom.filter.enable=true
   hoodie.metadata.index.column.stats.column.list=id,account_id
   hoodie.metadata.index.column.stats.enable=true
   ```
   
   My concern with Hudi, and with this ticket specifically, is that today Hudi 
does not let you introspect and confirm that any statistics or indexing 
feature is actually improving performance. 
   
   We can't tie Hudi configurations to actual results. As the queries above 
show, the two are not logically connected. 
   
   I.e. I can't say "OK, I removed that configuration and my query started to 
lag", nor, vice versa, "I added that column to the statistics config and my 
queries are faster now", because there are no metrics or practical evidence 
anywhere to help understand the cause.
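One low-tech way to at least record which metadata indexes a given run had enabled is to parse the writer properties, so configurations can be lined up against query timings. The sketch below uses a hypothetical helper, not a Hudi API. It reads only the four keys quoted above and says nothing about whether a query actually used the indexes.

```python
def parse_properties(text):
    """Parse simple key=value lines (Java-properties style, without escapes)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def summarize_metadata_indexes(props):
    """Map each metadata index to its configured columns, or None if it is disabled.

    An empty list means the index is enabled but no explicit column list was set.
    """
    prefixes = {
        "bloom_filter": "hoodie.metadata.index.bloom.filter",
        "column_stats": "hoodie.metadata.index.column.stats",
    }
    summary = {}
    for name, prefix in prefixes.items():
        enabled = props.get(prefix + ".enable", "false").lower() == "true"
        columns = props.get(prefix + ".column.list", "")
        summary[name] = [c.strip() for c in columns.split(",") if c.strip()] if enabled else None
    return summary


CONFIG = """
hoodie.metadata.index.bloom.filter.column.list=id,account_id
hoodie.metadata.index.bloom.filter.enable=true
hoodie.metadata.index.column.stats.column.list=id,account_id
hoodie.metadata.index.column.stats.enable=true
"""

if __name__ == "__main__":
    print(summarize_metadata_indexes(parse_properties(CONFIG)))
```

Logging this summary next to each query's runtime would at least make "before" and "after" runs comparable.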





Re: [I] Duplicate Row in Same Partition using Global Bloom Index [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9536:
URL: https://github.com/apache/hudi/issues/9536#issuecomment-1919009780

   @Raghvendradubey Closing this. Please reopen if you still face this issue 
with this PR or 0.14.1. Thanks a lot for raising this.





Re: [I] [SUPPORT] Deltastreamer throws exception when ingesting INT96 timestamps [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #9151:
URL: https://github.com/apache/hudi/issues/9151#issuecomment-1919005476

   @satyasinha-94 Any update on this? Were you able to get your issue resolved? 
Please share the insights.





Re: [I] [BUG]Data duplication, multiple data primary keys are duplicated [hudi]

2024-01-31 Thread via GitHub


codope closed issue #10545: [BUG]Data duplication, multiple data primary keys 
are duplicated
URL: https://github.com/apache/hudi/issues/10545





Re: [I] [BUG]Data duplication, multiple data primary keys are duplicated [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10545:
URL: https://github.com/apache/hudi/issues/10545#issuecomment-1919000324

   @waywtdcc Closing this out. Please reopen in case you need any more help on 
this.





Re: [I] [Support] An error occurred while calling o1748.load.\n: java.io.FileNotFoundException [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10503:
URL: https://github.com/apache/hudi/issues/10503#issuecomment-1918998944

   So you mean Spark standalone mode? Does that mode work for 0.14 as well?





Re: [I] [Support] An error occurred while calling o1748.load.\n: java.io.FileNotFoundException [hudi]

2024-01-31 Thread via GitHub


gsudhanshu commented on issue #10503:
URL: https://github.com/apache/hudi/issues/10503#issuecomment-1918985618

   @ad1happy2go it is working in 0.13.1 and standalone mode





Re: [I] [SUPPORT] Hope Hudi 0.13.1 can support Flink 1.17+ [hudi]

2024-01-31 Thread via GitHub


codope closed issue #10434: [SUPPORT] Hope Hudi 0.13.1 can support Flink 1.17+
URL: https://github.com/apache/hudi/issues/10434





Re: [I] [SUPPORT] Hope Hudi 0.13.1 can support Flink 1.17+ [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10434:
URL: https://github.com/apache/hudi/issues/10434#issuecomment-1918970432

   @lmhongwei Closing this issue. Please reopen in case you need any further 
help. 





Re: [I] [SUPPORT] Flink streaming read MOR table, thrown Unexpected cdc file split infer case: LOG_FILE Exception [hudi]

2024-01-31 Thread via GitHub


codope closed issue #10539: [SUPPORT] Flink streaming read MOR table, thrown 
Unexpected cdc file split infer case: LOG_FILE Exception 
URL: https://github.com/apache/hudi/issues/10539





Re: [I] [SUPPORT] Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o230.save. Parquet/Avro schema mismatch: Avro field 'id' not found [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10555:
URL: https://github.com/apache/hudi/issues/10555#issuecomment-1918966565

   Any update here, @jayesh2424?





Re: [I] [SUPPORT] Flink streaming read MOR table, thrown Unexpected cdc file split infer case: LOG_FILE Exception [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10539:
URL: https://github.com/apache/hudi/issues/10539#issuecomment-1918968410

   @nicholasxu Closing out this issue. Please reopen or create a new one in 
case of any further queries/issues. Thanks.





Re: [I] [SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10511:
URL: https://github.com/apache/hudi/issues/10511#issuecomment-1918965640

   @bk-mz Why do you think "indexing and statistical means in hudi are 
ineffective" when the number of output rows with bloom is clearly a lot less 
than the number of output rows without bloom? 
   You can also try column stats indexing in this case. That will optimise 
your read queries.





Re: [PR] [HUDI-7362] Fix hudi partition base path scheme to s3 [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10596:
URL: https://github.com/apache/hudi/pull/10596#issuecomment-1918959104

   
   ## CI report:
   
   * febb22c2b62f65657dbe46f4242ca032dd64185f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22241)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10458:
URL: https://github.com/apache/hudi/issues/10458#issuecomment-1918953387

   I will work on updating the docs. Thanks @stayrascal 





Re: [I] Partitioning data into two keys is taking more time (10x) than partitioning into one key. [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10456:
URL: https://github.com/apache/hudi/issues/10456#issuecomment-1918948729

   @xicm @danny0405 I had a discussion with @maheshguptags. Let me try to 
summarise his issue.
   
   He has around 5000 partitions in total and is using the bucket index. 
When he sets the parallelism (write.tasks) to 20, the job takes 1:45 mins, and 
when it is 100 it takes 35 mins.
   
   But as parallelism increases, the number of file groups explodes, as 
expected. This results in a lot of small file groups with very few records each 
(~20), which ultimately causes OOM due to 400MB commit files.
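The multiplication behind the explosion can be sketched as simple arithmetic. Assume, hypothetically, that each of the ~5000 partitions ends up with the same number of file groups (for example, one per bucket). The total then grows as partitions × file groups per partition, and the records per file group shrink accordingly. Commit metadata carries write statistics for each file group written, so a larger total also inflates the commit file. The record count and per-partition figures below are made up for illustration.

```python
def file_group_stats(num_partitions, file_groups_per_partition, total_records):
    """Return (total file groups, average records per file group).

    Assumes every partition gets the same number of file groups, a
    simplification used only for this back-of-the-envelope estimate.
    """
    total_file_groups = num_partitions * file_groups_per_partition
    return total_file_groups, total_records / total_file_groups


if __name__ == "__main__":
    # 5000 partitions comes from the summary above; the other numbers are hypothetical.
    for per_partition in (20, 100):
        groups, avg = file_group_stats(5000, per_partition, 2_000_000)
        print(f"{per_partition} file groups/partition -> {groups} file groups, {avg:.0f} records each")
```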





Re: [PR] [HUDI-7362] Fix hudi partition base path scheme to s3 [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10596:
URL: https://github.com/apache/hudi/pull/10596#issuecomment-1918891968

   
   ## CI report:
   
   * febb22c2b62f65657dbe46f4242ca032dd64185f UNKNOWN
   
   
   





[jira] [Updated] (HUDI-7362) Athena does not support s3a partition scheme anymore leading to missing data

2024-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7362:
-
Labels: pull-request-available  (was: )

>  Athena does not support s3a partition scheme anymore leading to missing data
> -
>
> Key: HUDI-7362
> URL: https://issues.apache.org/jira/browse/HUDI-7362
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: nicolas paris
>Priority: Major
>  Labels: pull-request-available
>
> see https://github.com/apache/hudi/issues/10595





[jira] [Created] (HUDI-7362) Athena does not support s3a partition scheme anymore leading to missing data

2024-01-31 Thread nicolas paris (Jira)
nicolas paris created HUDI-7362:
---

 Summary:  Athena does not support s3a partition scheme anymore 
leading to missing data
 Key: HUDI-7362
 URL: https://issues.apache.org/jira/browse/HUDI-7362
 Project: Apache Hudi
  Issue Type: Bug
Reporter: nicolas paris


see https://github.com/apache/hudi/issues/10595





[PR] Fix hudi partition base path scheme to s3 [hudi]

2024-01-31 Thread via GitHub


parisni opened a new pull request, #10596:
URL: https://github.com/apache/hudi/pull/10596

   ### Change Logs
   
   Fixes #10595
   
   ### Impact
   
   Users affected by this issue should drop the Glue table and recreate it from 
scratch with this patch.
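Per the issue title, Athena no longer reads partitions registered with an `s3a://` location. Below is a minimal sketch of the kind of scheme normalization the PR title describes: rewriting `s3a://` to `s3://` before a partition path is handed to the catalog. `to_s3_scheme` is a hypothetical name, and this is not the code from PR #10596.

```python
S3A_PREFIX = "s3a://"
S3_PREFIX = "s3://"


def to_s3_scheme(path: str) -> str:
    """Rewrite an s3a:// location to s3://, leaving every other path unchanged."""
    if path.startswith(S3A_PREFIX):
        return S3_PREFIX + path[len(S3A_PREFIX):]
    return path


if __name__ == "__main__":
    # Hypothetical bucket, table and partition names.
    print(to_s3_scheme("s3a://my-bucket/my_table/dt=2024-01-31"))
```

Only the exact lowercase `s3a://` prefix is matched. Other schemes, such as `hdfs://`, pass through untouched.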
   
   ### Risk level (write none, low, medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Commented] (HUDI-7287) Exception in streaming read while querying tables with 'cdc.enabled' is true

2024-01-31 Thread Aditya Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812652#comment-17812652
 ] 

Aditya Goenka commented on HUDI-7287:
-

A documentation update is needed here, as only MOR supports CDC, per this comment: 
https://github.com/apache/hudi/issues/10458#issuecomment-1911319106

> Exception in streaming read while querying tables with 'cdc.enabled' is true
> 
>
> Key: HUDI-7287
> URL: https://issues.apache.org/jira/browse/HUDI-7287
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 1.1.0
>
>
> Github Issue - [https://github.com/apache/hudi/issues/10458]





Re: [I] Hudi behaviour if AWS Glue concurrency is triggered[SUPPORT] [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10559:
URL: https://github.com/apache/hudi/issues/10559#issuecomment-1918835176

   @rishabhreply It will process all 10 files. Processing files in parallel is a 
basic Spark/distributed computing concept. Let me know in case I am missing 
anything.





Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #7846:
URL: https://github.com/apache/hudi/issues/7846#issuecomment-1918831478

   adding @beyond1920 @yihua @nsivabalan for more insights here.





Re: [I] Partitioning data into two keys is taking more time (10x) than partitioning into one key. [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10456:
URL: https://github.com/apache/hudi/issues/10456#issuecomment-1918825052

   @maheshguptags Let's get on a call to discuss this further.
   





Re: [I] [Support] An error occurred while calling o1748.load.\n: java.io.FileNotFoundException [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10503:
URL: https://github.com/apache/hudi/issues/10503#issuecomment-1918819721

   @gsudhanshu Can you let us know whether just downgrading Hudi to 0.13.1 
makes your existing setup work? If so, we need to dig deeper, as something in 
the 0.14 release may be causing this issue.





Re: [I] The Schema Evolution Not working For Hudi 0.12.3 [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10309:
URL: https://github.com/apache/hudi/issues/10309#issuecomment-1918797127

   @lei-su-awx I tried this code with 0.14.1 and it worked fine. With 0.14.0 I 
can see the error.
   
   @lei-su-awx @Amar1404 Can you guys try with 0.14.1 and let me know in case 
this issue persists.





(hudi) branch master updated (e466fb221b5 -> c5573ab34b2)

2024-01-31 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from e466fb221b5 [HUDI-7345] Remove usage of 
org.apache.hadoop.util.VersionUtil (#10571)
 add c5573ab34b2 [HUDI-7344] Use Java Stream instead of 
FSDataStream when possible (#10573)

No new revisions were added by this update.

Summary of changes:
 .../hudi/cli/commands/CompactionCommand.java   |  8 +++
 .../cli/commands/TestUpgradeDowngradeCommand.java  | 12 +-
 .../cli/integ/ITTestHDFSParquetImportCommand.java  |  4 ++--
 .../HoodieTestCommitMetadataGenerator.java |  6 ++---
 .../lock/FileSystemBasedLockProvider.java  |  6 ++---
 .../index/bucket/ConsistentBucketIndexUtils.java   |  8 +++
 .../org/apache/hudi/HoodieTestCommitGenerator.java |  4 ++--
 .../hudi/client/TestJavaHoodieBackedMetadata.java  |  4 ++--
 .../functional/TestHoodieBackedMetadata.java   |  4 ++--
 .../java/org/apache/hudi/table/TestCleaner.java|  4 ++--
 .../TestTimelineServerBasedWriteMarkers.java   |  8 +++
 .../hudi/table/upgrade/TestUpgradeDowngrade.java   | 12 +-
 .../common/config/HoodieFunctionalIndexConfig.java | 14 +--
 .../hudi/common/model/HoodiePartitionMetadata.java | 11 -
 .../hudi/common/table/HoodieTableConfig.java   | 16 ++---
 .../table/timeline/HoodieActiveTimeline.java   |  4 ++--
 .../org/apache/hudi/common/util/ConfigUtils.java   | 10 
 .../hudi/common/util/InternalSchemaCache.java  |  4 ++--
 .../org/apache/hudi/common/util/MarkerUtils.java   | 28 +++---
 .../io/FileBasedInternalSchemaStorageManager.java  |  4 ++--
 .../hudi/common/table/TestHoodieTableConfig.java   | 10 
 .../common/testutils/HoodieTestDataGenerator.java  |  7 +++---
 .../hudi/table/catalog/TableOptionProperties.java  |  8 +++
 .../apache/hudi/util/ViewStorageProperties.java|  8 +++
 .../hudi/hadoop/fs/HoodieWrapperFileSystem.java| 17 ++---
 .../org/apache/hudi/common/util/FileIOUtils.java   | 24 +--
 .../hudi/hive/testutils/HiveTestCluster.java   |  3 +--
 .../apache/hudi/hive/testutils/HiveTestUtil.java   | 10 
 .../hudi/sync/common/util/ManifestFileWriter.java  |  4 ++--
 .../service/handlers/marker/MarkerDirState.java| 10 
 .../hudi/utilities/HoodieCompactionAdminTool.java  |  8 +++
 .../hudi/utilities/perf/TimelineServerPerf.java|  4 ++--
 .../utilities/schema/FilebasedSchemaProvider.java  |  4 ++--
 .../apache/hudi/utilities/sources/JdbcSource.java  |  4 ++--
 .../deltastreamer/TestHoodieDeltaStreamer.java |  4 ++--
 .../functional/TestHDFSParquetImporter.java|  4 ++--
 .../sources/helpers/TestSanitizationUtils.java |  4 ++--
 37 files changed, 151 insertions(+), 153 deletions(-)



(hudi) branch master updated (a078242b19d -> e466fb221b5)

2024-01-31 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from a078242b19d [HUDI-7343] Replace Path.SEPARATOR with 
HoodieLocation.SEPARATOR (#10570)
 add e466fb221b5 [HUDI-7345] Remove usage of 
org.apache.hadoop.util.VersionUtil (#10571)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java |   5 +-
 .../apache/hudi/common/util/ComparableVersion.java | 402 +
 .../org/apache/hudi/common/util/StringUtils.java   | 108 +-
 .../apache/hudi/common/util/TestStringUtils.java   |  22 +-
 4 files changed, 525 insertions(+), 12 deletions(-)
 create mode 100644 
hudi-io/src/main/java/org/apache/hudi/common/util/ComparableVersion.java
 copy {hudi-common => 
hudi-io}/src/test/java/org/apache/hudi/common/util/TestStringUtils.java (84%)



Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10594:
URL: https://github.com/apache/hudi/pull/10594#issuecomment-1918779515

   
   ## CI report:
   
   * d5f75fac99c5cd1039f0418e0900fc3aae608a33 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22240)
 
   
   
   





Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10594:
URL: https://github.com/apache/hudi/pull/10594#issuecomment-1918766300

   
   ## CI report:
   
   * d5f75fac99c5cd1039f0418e0900fc3aae608a33 UNKNOWN
   
   
   





Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


cmmp6 commented on PR #10594:
URL: https://github.com/apache/hudi/pull/10594#issuecomment-1918722320

   Related items: 
https://github.com/apache/hudi/pull/7886, https://github.com/apache/hudi/pull/7904





Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


cmmp6 commented on PR #10594:
URL: https://github.com/apache/hudi/pull/10594#issuecomment-1918703432

   @danny0405 please review this PR.





Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


cmmp6 commented on PR #10594:
URL: https://github.com/apache/hudi/pull/10594#issuecomment-1918701697

   This PR solves the problem in https://github.com/apache/hudi/issues/9424





[PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


cmmp6 opened a new pull request, #10594:
URL: https://github.com/apache/hudi/pull/10594

   ### Change Logs
   
   This PR adds support for using the local timezone when writing Flink 
TIMESTAMP data. 
   
   ### Impact
   
   Users can choose UTC or the local timezone when writing Flink TIMESTAMP data.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   Adds a new Flink config option, "write.utc-timezone". The default value is 
"true" for backward compatibility.
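The PR does not spell out the conversion. The minimal sketch below assumes the option decides whether a zone-less TIMESTAMP wall-clock value is read as UTC or in a local zone when it is turned into an epoch instant. The function name and the fixed UTC+08:00 "local" zone are hypothetical, and this is not Hudi's implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_epoch_millis(ts: datetime, utc_timezone: bool, local_tz: timezone) -> int:
    """Convert a zone-less TIMESTAMP value to epoch milliseconds.

    utc_timezone=True reads the wall-clock value as UTC; False reads it in local_tz.
    """
    if ts.tzinfo is not None:
        raise ValueError("expected a zone-less (naive) timestamp")
    zone = timezone.utc if utc_timezone else local_tz
    # Integer division of timedeltas avoids float rounding on the millisecond value.
    return (ts.replace(tzinfo=zone) - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    local = timezone(timedelta(hours=8))  # hypothetical local zone, UTC+08:00
    ts = datetime(2024, 1, 31, 8, 0, 0)
    as_utc = timestamp_to_epoch_millis(ts, True, local)
    as_local = timestamp_to_epoch_millis(ts, False, local)
    print(as_utc - as_local)  # 28800000, i.e. 8 hours in milliseconds
```

The same wall-clock value maps to two instants eight hours apart. That gap is why the choice must match whatever the reading side assumes.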
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Closed] (HUDI-7361) Fix a concurrency issue caused by clean

2024-01-31 Thread eric (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric closed HUDI-7361.
--
Fix Version/s: 0.14.0
   Resolution: Fixed

> Fix a concurrency issue caused by clean
> ---
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1918675170

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 32225014f8f91229051ceea86612261f5ef1a5f8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22239)
 
   
   
   





[jira] [Updated] (HUDI-7361) Fix a concurrency issue caused by clean

2024-01-31 Thread eric (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric updated HUDI-7361:
---
Summary: Fix a concurrency issue caused by clean  (was: Fix a concurrency 
issue caused by rollbackFailedWrites)

> Fix a concurrency issue caused by clean
> ---
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

2024-01-31 Thread eric (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric updated HUDI-7361:
---
Attachment: (was: taskmanager_log.txt)

> Fix a concurrency issue caused by rollbackFailedWrites
> --
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

2024-01-31 Thread eric (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric resolved HUDI-7361.


> Fix a concurrency issue caused by rollbackFailedWrites
> --
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

2024-01-31 Thread eric (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric updated HUDI-7361:
---
Component/s: (was: writer-core)
Description: (was: {quote}CREATE TABLE tbl (
..
) WITH (
'connector' = 'hudi',
'path' = '/tblpath',
'table.type' = 'COPY_ON_WRITE',
'write.bucket_assign.tasks'='5',
'write.operation'='insert',
'write.tasks'='5', 
'clustering.schedule.enabled'='true',
'clustering.async.enabled'='true',
'clustering.delta_commits'='3',
'clustering.tasks'='5',
'hoodie.cleaner.policy.failed.writes'='LAZY'
);
{quote}
*Table parameters are as above*

 

*From the jobmanager and taskmanager logs, the abnormal sequence can be 
summarized as follows:* 


Before the writeClient completed commit 20240126154725671, the clean table 
service started; during cleaning, failed writes must be checked for and rolled 
back. 

This check verifies whether the heartbeat of each inflight instant has timed 
out and rolls back the instants whose heartbeats have timed out. Meanwhile, 
the write client completed commit 20240126154725671 and deleted the heartbeat 
file of this instant. 

The clean table service client therefore read a last heartbeat of 0 and rolled 
back this instant, even though it had already been committed.)

> Fix a concurrency issue caused by rollbackFailedWrites
> --
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

2024-01-31 Thread eric (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

eric updated HUDI-7361:
---
Attachment: (was: jobmanager_log.txt)

> Fix a concurrency issue caused by rollbackFailedWrites
> --
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

2024-01-31 Thread eric (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812583#comment-17812583
 ] 

eric commented on HUDI-7361:


[[HUDI-5675] fix lazy clean schedule rollback on completed instant by 
stream2000 · Pull Request #7826 · apache/hudi 
(github.com)|https://github.com/apache/hudi/pull/7826]

> Fix a concurrency issue caused by rollbackFailedWrites
> --
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Attachments: jobmanager_log.txt, taskmanager_log.txt
>
>
> {quote}CREATE TABLE tbl (
> ..
> ) WITH (
> 'connector' = 'hudi',
> 'path' = '/tblpath',
> 'table.type' = 'COPY_ON_WRITE',
> 'write.bucket_assign.tasks'='5',
> 'write.operation'='insert',
> 'write.tasks'='5', 
> 'clustering.schedule.enabled'='true',
> 'clustering.async.enabled'='true',
> 'clustering.delta_commits'='3',
> 'clustering.tasks'='5',
> 'hoodie.cleaner.policy.failed.writes'='LAZY'
> );
> {quote}
> *Table parameters are as above*
>  
> *From the jobmanager and taskmanager logs, the abnormal sequence can be 
> summarized as follows:* 
> Before the writeClient completed commit 20240126154725671, the clean table 
> service started; during cleaning, failed writes must be checked for and 
> rolled back. 
> This check verifies whether the heartbeat of each inflight instant has timed 
> out and rolls back the instants whose heartbeats have timed out. Meanwhile, 
> the write client completed commit 20240126154725671 and deleted the 
> heartbeat file of this instant. 
> The clean table service client therefore read a last heartbeat of 0 and 
> rolled back this instant, even though it had already been committed.
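The interleaving in the report can be replayed with a small, deterministic Python sketch. Everything in it is hypothetical: the `Timeline` class, the helper functions, and the timestamps and timeout are illustrative stand-ins rather than Hudi APIs, and the guarded variant shows one plausible way to close the window, not necessarily the change made in PR #7826. A missing heartbeat file is read as 0, as the report describes, so the naive check rolls back the instant the writer has just committed.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a table timeline plus heartbeat files.
@dataclass
class Timeline:
    inflight: set = field(default_factory=set)
    completed: set = field(default_factory=set)
    heartbeats: dict = field(default_factory=dict)  # instant -> last heartbeat (ms)

    def complete(self, instant):
        # Writer side: publish the completed instant first, then delete its heartbeat.
        self.inflight.discard(instant)
        self.completed.add(instant)
        self.heartbeats.pop(instant, None)


def last_heartbeat(tl, instant):
    # A missing heartbeat file is read as 0, as described in the issue.
    return tl.heartbeats.get(instant, 0)


def expired(tl, instant, now_ms, timeout_ms):
    return now_ms - last_heartbeat(tl, instant) > timeout_ms


def naive_candidates(tl, snapshot, now_ms, timeout_ms):
    # Trusts the inflight snapshot taken before the heartbeats were read.
    return [i for i in snapshot if expired(tl, i, now_ms, timeout_ms)]


def guarded_candidates(tl, snapshot, now_ms, timeout_ms):
    # Reads the heartbeat first, then re-checks the timeline before rolling back.
    result = []
    for i in snapshot:
        if expired(tl, i, now_ms, timeout_ms) and i not in tl.completed:
            result.append(i)
    return result


def replay(select):
    instant = "20240126154725671"
    now_ms, timeout_ms = 1_000_000, 60_000
    tl = Timeline(inflight={instant}, heartbeats={instant: now_ms})
    snapshot = sorted(tl.inflight)  # 1. cleaner lists inflight instants
    tl.complete(instant)            # 2. writer commits and deletes its heartbeat
    return select(tl, snapshot, now_ms, timeout_ms)  # 3. cleaner checks heartbeats


print(replay(naive_candidates))    # ['20240126154725671']: a completed commit is rolled back
print(replay(guarded_candidates))  # []
```

The guard only helps because the sketch's writer publishes the completed instant before deleting the heartbeat: if the cleaner finds the heartbeat missing, the completion is already visible when it re-reads the timeline.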





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1918663165

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 4bb5eb097f1cf2eebdd62f4dbd8982d448c96a9e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9)
 
   * 32225014f8f91229051ceea86612261f5ef1a5f8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7361] Fix a concurrency issue caused by rollbackFailedWrites [hudi]

2024-01-31 Thread via GitHub


eric9204 commented on PR #10593:
URL: https://github.com/apache/hudi/pull/10593#issuecomment-1918662863

   This issue has been resolved by this PR: https://github.com/apache/hudi/pull/7826





Re: [I] The Schema Evolution Not working For Hudi 0.12.3 [hudi]

2024-01-31 Thread via GitHub


lei-su-awx commented on issue #10309:
URL: https://github.com/apache/hudi/issues/10309#issuecomment-1918659000

   I have a similar question: when the table schema is double and the incoming 
data schema is long, why can't the data be upserted into the table? I think double 
can handle long. Below is my code (Hudi 0.14.0, Spark 3.4.1):
   
   ```
   from pyspark.sql import Row, SparkSession
   from pyspark.sql.types import StructType, StructField, LongType, DoubleType, IntegerType

   # In the pyspark shell `spark` already exists; getOrCreate() returns it.
   spark = SparkSession.builder.getOrCreate()

   # Table schema: "value" is a double
   schema1 = StructType(
       [
           StructField("id", IntegerType(), True),
           StructField("value", DoubleType(), True)
       ]
   )

   # Incoming batch schema: "value" is a long
   schema2 = StructType(
       [
           StructField("id", IntegerType(), True),
           StructField("value", LongType(), True)
       ]
   )

   data1 = [
       Row(1, 100.0),
       Row(2, 100.0),
       Row(3, 100.0),
   ]

   data2 = [
       Row(1, 100),
       Row(2, 200),
       Row(3, 100),
   ]

   hudi_configs = {
       "hoodie.table.name": 'table',
       "hoodie.datasource.write.precombine.field": "value",
       "hoodie.datasource.write.recordkey.field": "id",
       'hoodie.datasource.write.reconcile.schema': 'true',
       'hoodie.schema.on.read.enable': 'true',
   }

   PATH = 'gs://data_lake_staging_hk/data/raw_lei/test/'

   # First write: creates the table with value as double
   df = spark.createDataFrame(spark.sparkContext.parallelize(data1), schema1)
   df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)
   spark.read.format("org.apache.hudi").load(PATH).printSchema()
   spark.read.format("org.apache.hudi").load(PATH).show()

   # Second write: upserts rows whose value is a long (the failing step)
   df = spark.createDataFrame(spark.sparkContext.parallelize(data2), schema2)
   df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)
   spark.read.format("org.apache.hudi").load(PATH).printSchema()
   spark.read.format("org.apache.hudi").load(PATH).show()
   ```
   The stack trace is:
   `IllegalArgumentException: cannot update origin type: double to a incompatibility type: long`
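The question turns on the direction of the type change. As a rough illustration, the sketch below encodes the primitive promotion rules from the schema-resolution section of the Apache Avro specification. The `AVRO_PROMOTIONS` table and the `can_read_as` helper are hypothetical names, and Hudi's schema-on-read evolution applies its own compatibility checks, so this only approximates its behavior. Under these rules, long values can be read as double, but changing a column's declared type from double to long is a narrowing change, which is the direction named in the error message.

```python
# Primitive promotions allowed by Avro schema resolution (writer type -> reader types).
# Note: long -> double is allowed even though very large longs lose precision.
AVRO_PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read_as(writer_type, reader_type):
    """Return True if data written as writer_type can be read as reader_type."""
    return writer_type == reader_type or reader_type in AVRO_PROMOTIONS.get(writer_type, set())


# Incoming batch carries long values; the table column is double.
print(can_read_as("long", "double"))  # True
# Evolving the table column itself from double to long would narrow existing data.
print(can_read_as("double", "long"))  # False
```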





[jira] [Updated] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

2024-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7361:
-
Labels: pull-request-available  (was: )

> Fix a concurrency issue caused by rollbackFailedWrites
> --
>
> Key: HUDI-7361
> URL: https://issues.apache.org/jira/browse/HUDI-7361
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: eric
>Priority: Major
>  Labels: pull-request-available
> Attachments: jobmanager_log.txt, taskmanager_log.txt
>
>
> {quote}CREATE TABLE tbl (
> ..
> ) WITH (
> 'connector' = 'hudi',
> 'path' = '/tblpath',
> 'table.type' = 'COPY_ON_WRITE',
> 'write.bucket_assign.tasks'='5',
> 'write.operation'='insert',
> 'write.tasks'='5', 
> 'clustering.schedule.enabled'='true',
> 'clustering.async.enabled'='true',
> 'clustering.delta_commits'='3',
> 'clustering.tasks'='5',
> 'hoodie.cleaner.policy.failed.writes'='LAZY'
> );
> {quote}
> *Table parameters are as above*
>  
> *From the jobmanager and taskmanager logs, the abnormal sequence can be 
> summarized as follows:* 
> Before the writeClient completed commit 20240126154725671, the clean table 
> service started; during cleaning, failed writes must be checked for and 
> rolled back. 
> This check verifies whether the heartbeat of each inflight instant has timed 
> out and rolls back the instants whose heartbeats have timed out. Meanwhile, 
> the write client completed commit 20240126154725671 and deleted the 
> heartbeat file of this instant. 
> The clean table service client therefore read a last heartbeat of 0 and 
> rolled back this instant, even though it had already been committed.





Re: [PR] [HUDI-7361] Fix a concurrency issue caused by rollbackFailedWrites [hudi]

2024-01-31 Thread via GitHub


eric9204 closed pull request #10593: [HUDI-7361] Fix a concurrency issue caused 
by rollbackFailedWrites
URL: https://github.com/apache/hudi/pull/10593





Re: [PR] fix(HoodieRecord): add serialVersionUID [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10592:
URL: https://github.com/apache/hudi/pull/10592#issuecomment-1918651993

   
   ## CI report:
   
   * 8de03a278356eafd1cf9f012d58a5993f5314b56 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22236)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10591:
URL: https://github.com/apache/hudi/pull/10591#issuecomment-1918651897

   
   ## CI report:
   
   * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN
   * 4e39d3ba20d5d2236e599a55c96a9c731ed721c0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22238)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1918651519

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 4bb5eb097f1cf2eebdd62f4dbd8982d448c96a9e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




