Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920710825

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 99cf737b33b2f3687d743afde2f13a341851c237 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22266)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920709373

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN
   * c410c9ab8a8ea987b41a009a33157c387b06795a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22264)
 
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


linliu-code commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920706918

   @hudi-bot run azure





[jira] [Created] (HUDI-7365) Fix a flaky test TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimeType

2024-01-31 Thread Lin Liu (Jira)
Lin Liu created HUDI-7365:
-

 Summary: Fix a flaky test 
TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimeType
 Key: HUDI-7365
 URL: https://issues.apache.org/jira/browse/HUDI-7365
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lin Liu
Assignee: Lin Liu


We sometimes see the following error even when nothing has changed.

 

TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimeType:818 
expected: <2024-02-01 07:36:39.0> but was: <2024-02-01 07:36:39>
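
The report does not identify a cause, so the following is only a minimal sketch of how the two renderings in the assertion message can differ. It assumes (this is not confirmed by the issue) that one side is produced by `java.sql.Timestamp#toString`, which always prints at least one fractional digit, and the other by a seconds-only pattern. The class name `TimestampRendering` and its helpers are hypothetical and are not part of the Hudi test.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not code from TestHoodieParquetInputFormat.
class TimestampRendering {
    // Pattern without a fractional-seconds field (an assumption for this sketch).
    static final DateTimeFormatter SECONDS_ONLY =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // java.sql.Timestamp#toString keeps at least one fractional digit.
    static String viaTimestamp(LocalDateTime t) {
        return Timestamp.valueOf(t).toString();
    }

    // A seconds-only formatter drops the fraction entirely.
    static String viaFormatter(LocalDateTime t) {
        return SECONDS_ONLY.format(t);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2024, 2, 1, 7, 36, 39);
        System.out.println(viaTimestamp(t)); // 2024-02-01 07:36:39.0
        System.out.println(viaFormatter(t)); // 2024-02-01 07:36:39
    }
}
```

If the test compares values through two such paths, comparing the underlying instants rather than their string forms would avoid this particular mismatch. Whether that applies here still needs to be checked against the test itself.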



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] Hudi behaviour if AWS Glue concurrency is triggered[SUPPORT] [hudi]

2024-01-31 Thread via GitHub


rishabhreply commented on issue #10559:
URL: https://github.com/apache/hudi/issues/10559#issuecomment-1920687767

   @ad1happy2go Okay, so if I ingest 10 files at once and my Step Function 
triggers multiple Glue job instances to process them, there will be no 
discrepancy in the data written by the jobs. Thank you for the effort!





Re: [I] [SUPPORT] parquet bloom filters not supported by hudi [hudi]

2024-01-31 Thread via GitHub


parisni commented on issue #7117:
URL: https://github.com/apache/hudi/issues/7117#issuecomment-1920649103

   Okay, will double-check. Thanks for reaching out.





Re: [I] [SUPPORT] parquet bloom filters not supported by hudi [hudi]

2024-01-31 Thread via GitHub


parisni commented on issue #7117:
URL: https://github.com/apache/hudi/issues/7117#issuecomment-1920648234

   Got it. Can you share your draft?





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920648314

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 112adc1b0508253ec22bbc11b0fdfb90a108508d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22265)
 
   * 99cf737b33b2f3687d743afde2f13a341851c237 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22266)
 
   
   



Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]

2024-01-31 Thread via GitHub


parisni commented on issue #7846:
URL: https://github.com/apache/hudi/issues/7846#issuecomment-1920644875

   Thanks both for your insight. I am wondering whether this behavior also 
applies to Iceberg and Delta. If not, Hudi might align with them and disable 
this cache by default.





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920639652

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 112adc1b0508253ec22bbc11b0fdfb90a108508d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22265)
 
   * 99cf737b33b2f3687d743afde2f13a341851c237 UNKNOWN
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920630293

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 112adc1b0508253ec22bbc11b0fdfb90a108508d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22265)
 
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920583621

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 0be6e4bbc1c11531d777971851888cfa43ce1f73 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22263)
 
   * 112adc1b0508253ec22bbc11b0fdfb90a108508d UNKNOWN
   
   



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920569552

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262)
 
   * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN
   * c410c9ab8a8ea987b41a009a33157c387b06795a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22264)
 
   
   



[I] [SUPPORT] FileNotFoundException when clustering [hudi]

2024-01-31 Thread via GitHub


echisan opened a new issue, #10601:
URL: https://github.com/apache/hudi/issues/10601

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I am not sure why the parquet file is missing; the Flink job did not 
restart. I would like to know how to handle this issue. Is it possible to 
ignore the missing file?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Set up a Flink SQL job with Kafka as the data source.
   2. Configure the job to write data into a Hudi COW table with online 
clustering.
   3. Execute the job.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.13.1-rc1
   
   * Spark version :
   
   * Hive version : 3.1.3
   
   * Hadoop version :  2.9.2
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : flink on k8s
   
   
   **Additional context**
   
   
   ```sql
   CREATE  TABLE ods_mqtt_msg(
 
 dt STRING,
 PRIMARY KEY (`field1`, `field2`, `field3`) NOT ENFORCED
   )
   PARTITIONED BY (`dt`)
   WITH (
 'connector' = 'hudi',
 'table.type' = 'COPY_ON_WRITE',
 'path' = 's3a:///lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg',
 'write.operation' = 'INSERT',
 'clustering.async.enabled' = 'true',
 'clustering.schedule.enabled' = 'true',
 'hive_sync.enable' = 'true',
 'hive_sync.mode' = 'hms',
 'hive_sync.metastore.uris' = 'thrift://hive-metastore-svc.hms.svc:9083',
 'read.streaming.enabled' = 'true',
 'write.tasks' = '4'
   );
   ```
   
   
   
   **Stacktrace**
   
   ```
   2024-02-01 03:56:01,519 INFO  org.apache.hudi.client.HoodieFlinkWriteClient  
  [] - Cleaner has been spawned already. Waiting for it to finish
   2024-02-01 03:56:01,519 INFO  org.apache.hudi.async.AsyncCleanerService  
  [] - Waiting for async clean service to finish
   2024-02-01 03:56:01,627 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded 
instants upto : Option{val=[==>20240201035304158__commit__INFLIGHT]}
   2024-02-01 03:56:02,333 INFO  org.apache.hudi.common.util.ClusteringUtils
  [] - Found 658 files in pending clustering operations
   2024-02-01 03:56:02,333 INFO  
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView [] - Sending 
request : 
(http://169.122.153.67:46386/v1/hoodie/view/compactions/pending/?basepath=s3a%3A%2Flakehouse%2Fhudi%2Fdevice_mqtt_msg%2Fods_mqtt_msg=20240201035302997=350fb15b2282717446dd396f06ebaf80257ed284589ba906e5c3ccf6701cc223)
   2024-02-01 03:56:02,427 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Checking for 
file exists 
?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.requested
   2024-02-01 03:56:02,564 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new 
file for toInstant 
?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.inflight
   2024-02-01 03:56:02,677 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded 
instants upto : Option{val=[20240201035304158__commit__COMPLETED]}
   2024-02-01 03:56:02,677 INFO  
org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Execute 
clustering plan for instant 20240131195932968 as 17 file slices
   2024-02-01 03:56:02,937 ERROR 
org.apache.hudi.sink.clustering.ClusteringOperator   [] - Executor 
executes action [Execute clustering for instant 20240131195932968 from task 2] 
error
   org.apache.hudi.exception.HoodieClusteringException: Error reading input 
data for 
s3a://xxx-bucket/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/2024-01-31/85040fcd-3f42-4b37-865f-616fc0ad3df8-0_1-4-0_20240131164655396.parquet
 and []
at 
org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:332)
 ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
at java.lang.Iterable.spliterator(Unknown Source) ~[?:?]
at 
org.apache.hudi.sink.clustering.ClusteringOperator.lambda$readRecordsForGroupBaseFiles$5(ClusteringOperator.java:336)
 ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1]
at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[?:?]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown 
Source) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) 
~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown 

Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920521605

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262)
 
   * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN
   * c410c9ab8a8ea987b41a009a33157c387b06795a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22264)
 
   
   



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920514612

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262)
 
   * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN
   * c410c9ab8a8ea987b41a009a33157c387b06795a UNKNOWN
   
   



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920509429

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262)
 
   * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920467434

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 0be6e4bbc1c11531d777971851888cfa43ce1f73 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22263)
 
   
   



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920467036

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262)
 
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920462121

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * e8a768676bb2bf8b64211b06b7fa90785991e958 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22261)
 
   * 0be6e4bbc1c11531d777971851888cfa43ce1f73 UNKNOWN
   
   



Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920461859

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   * a926d67d3d519c49dcb7b8893671b312e1e5bcfd UNKNOWN
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920456509

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * e8a768676bb2bf8b64211b06b7fa90785991e958 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22261)
 
   
   



Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]

2024-01-31 Thread via GitHub


beyond1920 commented on issue #7846:
URL: https://github.com/apache/hudi/issues/7846#issuecomment-1920428504

   @parisni I agree with @ad1happy2go: the caching happens in Spark, not in 
Hudi. Spark caches by `dbName`.`tableName`.
   https://github.com/apache/hudi/assets/1525333/2557fd10-eadf-437c-8506-eabe70ca5b89
   
   In addition to setting `spark.sql.filesourceTableRelationCacheSize=0`, as 
@ad1happy2go proposed, you could also refresh the table manually with 
`spark.catalog.refreshTable("database.hudi_table")` before querying.
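
To make the two workarounds concrete, here is a minimal toy model of a cache keyed by the qualified table name. It is a sketch under stated assumptions, not Spark's implementation: the class `RelationCacheSketch`, its `lookup` method, and the string "snapshots" are hypothetical. It only borrows the ideas from the comment above that a size of 0 disables caching and that `refreshTable` invalidates a single entry.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model; NOT Spark's relation cache.
class RelationCacheSketch {
    private final int maxSize;                      // 0 (or less) disables caching
    private final Function<String, String> loader;  // resolves "db.table" to its current state
    private final Map<String, String> cache = new LinkedHashMap<>();

    RelationCacheSketch(int maxSize, Function<String, String> loader) {
        this.maxSize = maxSize;
        this.loader = loader;
    }

    // Return the cached entry for "db.table" if present; otherwise load and remember it.
    String lookup(String qualifiedName) {
        if (maxSize <= 0) {
            return loader.apply(qualifiedName);     // always resolve fresh
        }
        String cached = cache.get(qualifiedName);
        if (cached != null) {
            return cached;                          // may be stale
        }
        String fresh = loader.apply(qualifiedName);
        if (cache.size() >= maxSize) {
            // Evict the oldest inserted entry to stay within the size limit.
            Iterator<String> oldest = cache.keySet().iterator();
            oldest.next();
            oldest.remove();
        }
        cache.put(qualifiedName, fresh);
        return fresh;
    }

    // Drop the entry so that the next lookup reloads it.
    void refreshTable(String qualifiedName) {
        cache.remove(qualifiedName);
    }

    public static void main(String[] args) {
        String[] latestCommit = {"commit-1"};
        RelationCacheSketch cached =
            new RelationCacheSketch(10, name -> name + "@" + latestCommit[0]);
        System.out.println(cached.lookup("database.hudi_table")); // first read
        latestCommit[0] = "commit-2";
        System.out.println(cached.lookup("database.hudi_table")); // still the first snapshot
        cached.refreshTable("database.hudi_table");
        System.out.println(cached.lookup("database.hudi_table")); // reloaded after refresh
    }
}
```

In this model the second read returns the first snapshot until the entry is refreshed, which mirrors the symptom in the issue title. Constructing the cache with size 0 makes every read resolve fresh.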





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920417023

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * fbba45806b55d9801f973a4a18fe87134a41aa9c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22259)
 
   * e8a768676bb2bf8b64211b06b7fa90785991e958 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22261)
 
   
   



Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10591:
URL: https://github.com/apache/hudi/pull/10591#issuecomment-1920410955

   
   ## CI report:
   
   * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN
   * 44e334758625cfc2a35d7644cbcbed102e560062 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22260)
 
   
   



Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920410749

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * fbba45806b55d9801f973a4a18fe87134a41aa9c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22259)
 
   * e8a768676bb2bf8b64211b06b7fa90785991e958 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT] Flink streaming read MOR table, thrown Unexpected cdc file split infer case: LOG_FILE Exception [hudi]

2024-01-31 Thread via GitHub


nicholasxu commented on issue #10539:
URL: https://github.com/apache/hudi/issues/10539#issuecomment-1920406638

   > @nicholasxu Closing out this issue. Please reopen or create a new one in 
case of any further queries/issues. Thanks.
   
   ok!





Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10591:
URL: https://github.com/apache/hudi/pull/10591#issuecomment-1920404441

   
   ## CI report:
   
   * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN
   * 4e39d3ba20d5d2236e599a55c96a9c731ed721c0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22238)
 
   * 44e334758625cfc2a35d7644cbcbed102e560062 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other infomation [hudi]

2024-01-31 Thread via GitHub


xuzifu666 commented on issue #10542:
URL: https://github.com/apache/hudi/issues/10542#issuecomment-1920398436

   @ad1happy2go According to earlier feedback, was the data loss bug fixed in the 
1.0 beta version?





Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]

2024-01-31 Thread via GitHub


yihua commented on PR #10591:
URL: https://github.com/apache/hudi/pull/10591#issuecomment-1920374907

   Note to reviewer: commit 
`[44e3347](https://github.com/apache/hudi/pull/10591/commits/44e334758625cfc2a35d7644cbcbed102e560062)`
 is frozen now, and I'll only add new commits for new changes and fixes to 
make review easier.  I'll also defer the rebasing and force-push until CI passes and 
the PR is approved.





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920360582

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * fbba45806b55d9801f973a4a18fe87134a41aa9c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22259)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920354291

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * c27384b5d2ed7d697c86115f473c9b18bb76f8f3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22257)
 
   * fbba45806b55d9801f973a4a18fe87134a41aa9c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920347776

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * c27384b5d2ed7d697c86115f473c9b18bb76f8f3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22257)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10599:
URL: https://github.com/apache/hudi/pull/10599#issuecomment-1920347970

   
   ## CI report:
   
   * 096faa6576dce3781643ac3f8e7c3d7fb1f879ac Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22253)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


linliu-code commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920323482

   @hudi-bot run azure
   





Re: [PR] [MINOR] Reorder Azure CI test modules [hudi]

2024-01-31 Thread via GitHub


linliu-code closed pull request #10600: [MINOR] Reorder Azure CI test modules
URL: https://github.com/apache/hudi/pull/10600





Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


danny0405 commented on code in PR #10594:
URL: https://github.com/apache/hudi/pull/10594#discussion_r1473678410


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/RowDataToAvroConverters.java:
##
@@ -241,10 +271,10 @@ public Object convert(Schema schema, Object object) {
 };
   }
 
-  private static RowDataToAvroConverter createRowConverter(RowType rowType) {
+  private static RowDataToAvroConverter createRowConverter(RowType rowType, 
boolean utcTimezone) {
 final RowDataToAvroConverter[] fieldConverters =
 rowType.getChildren().stream()
-.map(RowDataToAvroConverters::createConverter)
+.map(type -> createConverter(type, utcTimezone))

Review Comment:
   @voonhous, would you like to take a look at this PR?
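
   For context, the `utcTimezone` flag threaded through `createRowConverter` in the hunk above decides which zone a zone-less TIMESTAMP value is interpreted in before it is written as epoch milliseconds. A minimal sketch of that difference, using only `java.time`; the names `TimestampZoneSketch` and `toEpochMillis` are hypothetical and are not taken from the PR:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration; this is not the converter code from the PR.
final class TimestampZoneSketch {

  // Interprets a zone-less timestamp either as UTC or in the given local zone
  // and returns the corresponding epoch milliseconds.
  static long toEpochMillis(LocalDateTime ts, boolean utcTimezone, ZoneId localZone) {
    return utcTimezone
        ? ts.toInstant(ZoneOffset.UTC).toEpochMilli()
        : ts.atZone(localZone).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2021, 3, 30, 15, 44, 29);
    ZoneId shanghai = ZoneId.of("Asia/Shanghai");
    long asUtc = toEpochMillis(ts, true, shanghai);
    long asLocal = toEpochMillis(ts, false, shanghai);
    // Asia/Shanghai is UTC+8, so the two readings differ by eight hours.
    System.out.println((asUtc - asLocal) / 3_600_000L + " hours apart");
  }
}
```

   The same wall-clock value therefore maps to two different instants depending on the flag, which is why the choice has to be carried down to every nested field converter.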






Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]

2024-01-31 Thread via GitHub


danny0405 commented on code in PR #10594:
URL: https://github.com/apache/hudi/pull/10594#discussion_r1473677608


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestRowDataToAvroConverters.java:
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.flink.formats.common.TimestampFormat;
+import org.apache.flink.formats.json.JsonToRowDataConverters;
+import 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonProcessingException;
+import 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.RowDataToAvroConverters;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+
+import java.time.Instant;
+import java.time.LocalDateTime;
+import java.time.ZoneId;
+import java.time.format.DateTimeFormatter;
+import java.util.TimeZone;
+
+import static org.apache.flink.table.api.DataTypes.ROW;
+import static org.apache.flink.table.api.DataTypes.FIELD;
+import static org.apache.flink.table.api.DataTypes.TIMESTAMP;
+
+class TestRowDataToAvroConverters {
+
+  DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
+  @Test
+  void testRowDataToAvroStringToRowDataWithLocalTimezone1() throws 
JsonProcessingException {
+TimeZone.setDefault(TimeZone.getTimeZone(ZoneId.of("Asia/Shanghai")));
+String timestampFromUtc8 = "2021-03-30 15:44:29";
+
+DataType rowDataType = ROW(FIELD("timestamp_from_utc_8", TIMESTAMP()));
+JsonToRowDataConverters.JsonToRowDataConverter jsonToRowDataConverter =
+new JsonToRowDataConverters(true, true, TimestampFormat.SQL)
+.createConverter(rowDataType.getLogicalType());
+Object rowData = jsonToRowDataConverter.convert(new 
ObjectMapper().readTree("{\"timestamp_from_utc_8\":\"" + timestampFromUtc8 + 
"\"}"));
+

Review Comment:
   I would like to see some ITs in `ITTestHoodieDataSource`.






Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920288525

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 3bb06bb4df1185da15fd6bb3e82fdb1ff56e19cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22255)
 
   * c27384b5d2ed7d697c86115f473c9b18bb76f8f3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch master updated (e23f402e194 -> b6642c65848)

2024-01-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from e23f402e194 [HUDI-7347] Introduce SeekableDataInputStream for random 
access (#10575)
 add b6642c65848 [MINOR] Add serialVersionUID to HoodieRecord class (#10592)

No new revisions were added by this update.

Summary of changes:
 hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java | 1 +
 1 file changed, 1 insertion(+)



Re: [PR] [MINOR] Add serialVersionUID to HoodieRecord class [hudi]

2024-01-31 Thread via GitHub


danny0405 merged PR #10592:
URL: https://github.com/apache/hudi/pull/10592





Re: [I] [BUG] Failure Encountered When Reading Hudi with Flink in Batch Runtime Mode and FlinkOptions.READ_AS_STREAMING=false [hudi]

2024-01-31 Thread via GitHub


danny0405 commented on issue #10576:
URL: https://github.com/apache/hudi/issues/10576#issuecomment-1920285622

   Yeah, people never report failures for batch snapshot queries.





(hudi) branch asf-site updated: [Docs] Added known regression note for 0.14.1 release (#10597)

2024-01-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 8df7f1ab496 [Docs] Added known regression note for 0.14.1 release 
(#10597)
8df7f1ab496 is described below

commit 8df7f1ab4964f1667af457128a8c6b5b73cdbc3c
Author: Aditya Goenka <63430370+ad1happy...@users.noreply.github.com>
AuthorDate: Thu Feb 1 06:35:37 2024 +0530

[Docs] Added known regression note for 0.14.1 release (#10597)
---
 website/releases/release-0.14.1.md | 8 
 1 file changed, 8 insertions(+)

diff --git a/website/releases/release-0.14.1.md 
b/website/releases/release-0.14.1.md
index 9b244253a96..1905810bcfb 100644
--- a/website/releases/release-0.14.1.md
+++ b/website/releases/release-0.14.1.md
@@ -31,6 +31,14 @@ import TabItem from '@theme/TabItem';
 * Flink engine
 * Unit, functional, integration tests and CI
 
+## Known Regressions
+We discovered a regression in Hudi 0.14.1 release related to Complex Key gen 
when record key consists of one field. 
+It can silently ingest duplicates if table is upgraded from previous versions.
+
+:::tip
+Avoid upgrading any existing table to 0.14.1 if you are using 
ComplexKeyGenerator and number of fields in record key is 1.
+:::
+
 ## Raw Release Notes
 
 The raw release notes are available 
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12353493)



Re: [PR] [Docs] Added known regression note for 0.14.1 release related to ComplexKeyGen [hudi]

2024-01-31 Thread via GitHub


danny0405 merged PR #10597:
URL: https://github.com/apache/hudi/pull/10597





[jira] [Closed] (HUDI-7347) Introduce SeekableDataInputStream for random access

2024-01-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7347.

Resolution: Fixed

Fixed via master branch: e23f402e194498088f17142d9f132548ffbbd91d

> Introduce SeekableDataInputStream for random access
> ---
>
> Key: HUDI-7347
> URL: https://issues.apache.org/jira/browse/HUDI-7347
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7347] Introduce SeekableDataInputStream for random access (#10575)

2024-01-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e23f402e194 [HUDI-7347] Introduce SeekableDataInputStream for random 
access (#10575)
e23f402e194 is described below

commit e23f402e194498088f17142d9f132548ffbbd91d
Author: Y Ethan Guo 
AuthorDate: Wed Jan 31 16:48:46 2024 -0800

[HUDI-7347] Introduce SeekableDataInputStream for random access (#10575)
---
 .../hudi/common/table/log/HoodieLogFileReader.java | 36 +++
 .../table/log/block/HoodieAvroDataBlock.java   |  4 +-
 .../common/table/log/block/HoodieCDCDataBlock.java |  4 +-
 .../common/table/log/block/HoodieCommandBlock.java |  5 +-
 .../common/table/log/block/HoodieCorruptBlock.java |  5 +-
 .../common/table/log/block/HoodieDataBlock.java|  4 +-
 .../common/table/log/block/HoodieDeleteBlock.java  |  6 +--
 .../table/log/block/HoodieHFileDataBlock.java  |  4 +-
 .../common/table/log/block/HoodieLogBlock.java | 16 +++
 .../table/log/block/HoodieParquetDataBlock.java|  4 +-
 .../hadoop/fs/HadoopSeekableDataInputStream.java   | 48 
 .../apache/hudi/io/SeekableDataInputStream.java| 53 ++
 12 files changed, 150 insertions(+), 39 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
index cce13c1a6e2..fa8174931c4 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
@@ -37,9 +37,11 @@ import org.apache.hudi.exception.CorruptedLogFileException;
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.exception.HoodieNotSupportedException;
 import org.apache.hudi.hadoop.fs.BoundedFsDataInputStream;
+import org.apache.hudi.hadoop.fs.HadoopSeekableDataInputStream;
 import org.apache.hudi.hadoop.fs.SchemeAwareFSDataInputStream;
 import org.apache.hudi.hadoop.fs.TimedFSDataInputStream;
 import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.io.SeekableDataInputStream;
 import org.apache.hudi.io.util.IOUtils;
 import org.apache.hudi.storage.StorageSchemes;
 
@@ -90,7 +92,7 @@ public class HoodieLogFileReader implements 
HoodieLogFormat.Reader {
   private final boolean reverseReader;
   private final boolean enableRecordLookups;
   private boolean closed = false;
-  private FSDataInputStream inputStream;
+  private SeekableDataInputStream inputStream;
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema 
readerSchema, int bufferSize,
  boolean readBlockLazily) throws IOException {
@@ -120,7 +122,7 @@ public class HoodieLogFileReader implements 
HoodieLogFormat.Reader {
 Path updatedPath = FSUtils.makeQualified(fs, logFile.getPath());
 this.logFile = updatedPath.equals(logFile.getPath()) ? logFile : new 
HoodieLogFile(updatedPath, logFile.getFileSize());
 this.bufferSize = bufferSize;
-this.inputStream = getFSDataInputStream(fs, this.logFile, bufferSize);
+this.inputStream = getDataInputStream(fs, this.logFile, bufferSize);
 this.readerSchema = readerSchema;
 this.readBlockLazily = readBlockLazily;
 this.reverseReader = reverseReader;
@@ -202,7 +204,7 @@ public class HoodieLogFileReader implements 
HoodieLogFormat.Reader {
 if (nextBlockVersion.getVersion() == 
HoodieLogFormatVersion.DEFAULT_VERSION) {
   return HoodieAvroDataBlock.getBlock(content.get(), readerSchema, 
internalSchema);
 } else {
-  return new HoodieAvroDataBlock(() -> getFSDataInputStream(fs, 
this.logFile, bufferSize), content, readBlockLazily, logBlockContentLoc,
+  return new HoodieAvroDataBlock(() -> getDataInputStream(fs, 
this.logFile, bufferSize), content, readBlockLazily, logBlockContentLoc,
   getTargetReaderSchemaForBlock(), header, footer, keyField);
 }
 
@@ -210,7 +212,7 @@ public class HoodieLogFileReader implements 
HoodieLogFormat.Reader {
 checkState(nextBlockVersion.getVersion() != 
HoodieLogFormatVersion.DEFAULT_VERSION,
 String.format("HFile block could not be of version (%d)", 
HoodieLogFormatVersion.DEFAULT_VERSION));
 return new HoodieHFileDataBlock(
-() -> getFSDataInputStream(fs, this.logFile, bufferSize), content, 
readBlockLazily, logBlockContentLoc,
+() -> getDataInputStream(fs, this.logFile, bufferSize), content, 
readBlockLazily, logBlockContentLoc,
 Option.ofNullable(readerSchema), header, footer, 
enableRecordLookups, logFile.getPath(),
 ConfigUtils.getBooleanWithAltKeys(fs.getConf(), 
USE_NATIVE_HFILE_READER));
 
@@ -218,17 +220,17 @@ public class 

Re: [PR] [HUDI-7347] Introduce SeekableDataInputStream for random access [hudi]

2024-01-31 Thread via GitHub


danny0405 merged PR #10575:
URL: https://github.com/apache/hudi/pull/10575
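
The change merged here swaps `FSDataInputStream` for a `SeekableDataInputStream` abstraction in the log file reader. As a rough sketch of what such an abstraction can look like, a data input stream can expose its current position and a `seek` for random access. This is an assumption for illustration: the names `SeekableInputSketch` and `ByteArraySeekableInput` are hypothetical, and the actual class in the PR may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical abstraction: a DataInputStream that also supports random access.
abstract class SeekableInputSketch extends DataInputStream {
  SeekableInputSketch(InputStream in) {
    super(in);
  }

  // Current byte offset from the start of the underlying data.
  abstract long getPos() throws IOException;

  // Moves the read position to the given byte offset.
  abstract void seek(long pos) throws IOException;
}

// In-memory implementation backed by a byte array.
final class ByteArraySeekableInput extends SeekableInputSketch {

  // ByteArrayInputStream keeps its cursor in the protected field `pos`;
  // this subclass exposes it so the position can be read and moved.
  private static final class Cursor extends ByteArrayInputStream {
    Cursor(byte[] data) {
      super(data);
    }

    int position() {
      return pos;
    }

    void position(int newPos) {
      pos = newPos;
    }

    int length() {
      return count;
    }
  }

  private final Cursor cursor;

  ByteArraySeekableInput(byte[] data) {
    this(new Cursor(data));
  }

  private ByteArraySeekableInput(Cursor cursor) {
    super(cursor);
    this.cursor = cursor;
  }

  @Override
  long getPos() {
    return cursor.position();
  }

  @Override
  void seek(long pos) throws IOException {
    if (pos < 0 || pos > cursor.length()) {
      throw new IOException("Seek out of range: " + pos);
    }
    cursor.position((int) pos);
  }

  // Reads one big-endian int starting at the given byte offset.
  static int readIntAt(byte[] data, long offset) {
    try (ByteArraySeekableInput in = new ByteArraySeekableInput(data)) {
      in.seek(offset);
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Callers written against the abstract type do not need to know whether the bytes come from Hadoop, local disk, or memory, which is the kind of decoupling the import changes in the diff suggest.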





[jira] [Closed] (HUDI-7340) Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer

2024-01-31 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7340.

Resolution: Fixed

Fixed via master branch: 4ed41e0f15e65431799340bb655d28db92de34b9

> Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer
> ---
>
> Key: HUDI-7340
> URL: https://issues.apache.org/jira/browse/HUDI-7340
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: Danny Chen
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






(hudi) branch master updated: [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer (#10588)

2024-01-31 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4ed41e0f15e [HUDI-7340] Use spillable map for cached log records in 
HoodieBaseFileGroupRecordBuffer (#10588)
4ed41e0f15e is described below

commit 4ed41e0f15e65431799340bb655d28db92de34b9
Author: Danny Chan 
AuthorDate: Thu Feb 1 08:43:21 2024 +0800

[HUDI-7340] Use spillable map for cached log records in 
HoodieBaseFileGroupRecordBuffer (#10588)
---
 .../table/log/HoodieMergedLogRecordReader.java |  3 ++-
 .../read/HoodieBaseFileGroupRecordBuffer.java  | 27 --
 .../common/table/read/HoodieFileGroupReader.java   | 11 ++---
 .../table/read/HoodieFileGroupRecordBuffer.java|  7 +++---
 .../read/HoodieKeyBasedFileGroupRecordBuffer.java  | 16 +
 .../HoodiePositionBasedFileGroupRecordBuffer.java  | 14 +++
 .../common/util/HoodieRecordSizeEstimator.java |  5 ++--
 .../table/read/TestHoodieFileGroupReaderBase.java  |  5 
 .../reader/HoodieFileGroupReaderTestUtils.java |  8 ++-
 ...odieFileGroupReaderBasedParquetFileFormat.scala | 17 ++
 ...stHoodiePositionBasedFileGroupRecordBuffer.java |  7 +-
 11 files changed, 88 insertions(+), 32 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java
index 44c4c973eae..6b31c200907 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java
@@ -40,6 +40,7 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.Closeable;
+import java.io.Serializable;
 import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
@@ -183,7 +184,7 @@ public class HoodieMergedLogRecordReader extends 
BaseHoodieLogRecordReader
 return recordBuffer.getLogRecordIterator();
   }
 
-  public Map, Map>> getRecords() {
+  public Map, Map>> getRecords() {
 return recordBuffer.getLogRecords();
   }
 
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java
index 2f695cf0249..70ddb5abff2 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java
@@ -27,11 +27,15 @@ import org.apache.hudi.common.model.HoodieRecordMerger;
 import org.apache.hudi.common.table.log.KeySpec;
 import org.apache.hudi.common.table.log.block.HoodieDataBlock;
 import org.apache.hudi.common.table.log.block.HoodieLogBlock;
+import org.apache.hudi.common.util.DefaultSizeEstimator;
+import org.apache.hudi.common.util.HoodieRecordSizeEstimator;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.collection.ClosableIterator;
+import org.apache.hudi.common.util.collection.ExternalSpillableMap;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.exception.HoodieCorruptedDataException;
+import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.exception.HoodieKeyException;
 import org.apache.hudi.exception.HoodieValidationException;
 
@@ -39,8 +43,8 @@ import org.apache.avro.Schema;
 import org.roaringbitmap.longlong.Roaring64NavigableMap;
 
 import java.io.IOException;
+import java.io.Serializable;
 import java.util.ArrayList;
-import java.util.HashMap;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
@@ -56,7 +60,7 @@ public abstract class HoodieBaseFileGroupRecordBuffer 
implements HoodieFileGr
   protected final Option partitionPathFieldOpt;
   protected final HoodieRecordMerger recordMerger;
   protected final TypedProperties payloadProps;
-  protected final Map, Map>> records;
+  protected final ExternalSpillableMap, 
Map>> records;
   protected ClosableIterator baseFileIterator;
   protected Iterator, Map>> logRecordIterator;
   protected T nextRecord;
@@ -68,7 +72,11 @@ public abstract class HoodieBaseFileGroupRecordBuffer 
implements HoodieFileGr
  Option 
partitionNameOverrideOpt,
  Option 
partitionPathFieldOpt,
  HoodieRecordMerger recordMerger,
- TypedProperties payloadProps) {
+ TypedProperties payloadProps,
+ long maxMemorySizeInBytes,
+   

Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]

2024-01-31 Thread via GitHub


danny0405 merged PR #10588:
URL: https://github.com/apache/hudi/pull/10588
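
The merged patch replaces the in-memory `HashMap` in the record buffer with an `ExternalSpillableMap` bounded by `maxMemorySizeInBytes`. The sketch below shows the general idea only, under simplifying assumptions: entries stay on the heap until an estimated size budget is exceeded, after which new entries are serialized to files under a base directory. `SpillableMapSketch` and its fixed per-entry size estimate are hypothetical and do not mirror Hudi's implementation; null values are not supported.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a map that spills values to disk once a memory budget is used up.
final class SpillableMapSketch<K, V extends Serializable> {
  private final long maxMemorySizeInBytes;
  private final long estimatedEntrySizeInBytes; // crude stand-in for a real size estimator
  private final Path spillDir;
  private final Map<K, V> inMemory = new HashMap<>();
  private final Map<K, Path> onDisk = new HashMap<>();
  private long nextFileId = 0;

  SpillableMapSketch(long maxMemorySizeInBytes, long estimatedEntrySizeInBytes, Path spillDir) {
    this.maxMemorySizeInBytes = maxMemorySizeInBytes;
    this.estimatedEntrySizeInBytes = estimatedEntrySizeInBytes;
    this.spillDir = spillDir;
  }

  // Convenience factory that spills into a fresh temporary directory.
  static <K, V extends Serializable> SpillableMapSketch<K, V> withTempDir(long maxBytes, long entryBytes) {
    try {
      return new SpillableMapSketch<>(maxBytes, entryBytes, Files.createTempDirectory("spill-sketch"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void put(K key, V value) {
    if (inMemory.containsKey(key)) {
      inMemory.put(key, value); // overwrite in place, no extra memory accounted
      return;
    }
    Path existing = onDisk.get(key);
    if (existing != null) {
      write(existing, value); // overwrite the spilled copy
      return;
    }
    if ((inMemory.size() + 1) * estimatedEntrySizeInBytes <= maxMemorySizeInBytes) {
      inMemory.put(key, value);
    } else {
      Path file = spillDir.resolve("entry-" + (nextFileId++) + ".bin");
      write(file, value);
      onDisk.put(key, file);
    }
  }

  V get(K key) {
    V value = inMemory.get(key);
    if (value != null) {
      return value;
    }
    Path file = onDisk.get(key);
    return file == null ? null : read(file);
  }

  int inMemoryCount() {
    return inMemory.size();
  }

  int spilledCount() {
    return onDisk.size();
  }

  private void write(Path file, V value) {
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @SuppressWarnings("unchecked")
  private V read(Path file) {
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return (V) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Values must be serializable to be spilled, which is consistent with the `java.io.Serializable` imports the diff adds alongside the new map type.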





Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]

2024-01-31 Thread via GitHub


danny0405 commented on PR #10588:
URL: https://github.com/apache/hudi/pull/10588#issuecomment-1920262000

   The failed Azure test times out often and should not be caused by this 
patch; I will merge it.





Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]

2024-01-31 Thread via GitHub


danny0405 commented on code in PR #10588:
URL: https://github.com/apache/hudi/pull/10588#discussion_r1473654923


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -107,7 +108,11 @@ public HoodieFileGroupReader(HoodieReaderContext 
readerContext,
HoodieTableConfig tableConfig,
long start,
long length,
-   boolean shouldUseRecordPosition) {
+   boolean shouldUseRecordPosition,
+   long maxMemorySizeInBytes,
+   String spillableMapBasePath,

Review Comment:
   yeah, there are too many parameters; we should add a builder for it, just 
like what we do for `AbstractHoodieLogRecordReader`.
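
A minimal sketch of the builder idea discussed above, assuming a hypothetical `FileGroupReaderOptions` class (not part of Hudi) that groups the scalar constructor arguments shown in the diff. The field names mirror the diff; the `with...` method names and the validation rules are assumptions made for illustration only.

```java
import java.util.Objects;

// Hypothetical parameter object for illustration; this class is not part of Hudi.
final class FileGroupReaderOptions {
  private final long start;
  private final long length;
  private final boolean shouldUseRecordPosition;
  private final long maxMemorySizeInBytes;
  private final String spillableMapBasePath;

  private FileGroupReaderOptions(Builder builder) {
    this.start = builder.start;
    this.length = builder.length;
    this.shouldUseRecordPosition = builder.shouldUseRecordPosition;
    this.maxMemorySizeInBytes = builder.maxMemorySizeInBytes;
    this.spillableMapBasePath = builder.spillableMapBasePath;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  long getStart() { return start; }
  long getLength() { return length; }
  boolean shouldUseRecordPosition() { return shouldUseRecordPosition; }
  long getMaxMemorySizeInBytes() { return maxMemorySizeInBytes; }
  String getSpillableMapBasePath() { return spillableMapBasePath; }

  static final class Builder {
    private long start;
    private long length;
    private boolean shouldUseRecordPosition;
    private long maxMemorySizeInBytes;
    private String spillableMapBasePath;

    Builder withStart(long start) { this.start = start; return this; }
    Builder withLength(long length) { this.length = length; return this; }
    Builder withShouldUseRecordPosition(boolean value) { this.shouldUseRecordPosition = value; return this; }
    Builder withMaxMemorySizeInBytes(long bytes) { this.maxMemorySizeInBytes = bytes; return this; }
    Builder withSpillableMapBasePath(String path) { this.spillableMapBasePath = path; return this; }

    // Validates the collected values once, then creates the immutable options object.
    FileGroupReaderOptions build() {
      if (start < 0 || length < 0) {
        throw new IllegalArgumentException("start and length must be non-negative");
      }
      if (maxMemorySizeInBytes <= 0) {
        throw new IllegalArgumentException("maxMemorySizeInBytes must be positive");
      }
      Objects.requireNonNull(spillableMapBasePath, "spillableMapBasePath must be set");
      return new FileGroupReaderOptions(this);
    }
  }
}
```

With something like this, a new setting would add one builder method rather than another positional constructor argument, and call sites would name each value they pass.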






Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920228209

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * 3bb06bb4df1185da15fd6bb3e82fdb1ff56e19cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22255)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920219664

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22252)
 
   * 3bb06bb4df1185da15fd6bb3e82fdb1ff56e19cb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920200158

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22252)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Reorder Azure CI test modules [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10600:
URL: https://github.com/apache/hudi/pull/10600#issuecomment-1920150638

   
   ## CI report:
   
   * b4e0bd6803cab032901572e45c3ab78e8e6c764a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22254)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Reorder Azure CI test modules [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10600:
URL: https://github.com/apache/hudi/pull/10600#issuecomment-1920142076

   
   ## CI report:
   
   * b4e0bd6803cab032901572e45c3ab78e8e6c764a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10599:
URL: https://github.com/apache/hudi/pull/10599#issuecomment-1920142020

   
   ## CI report:
   
   * 096faa6576dce3781643ac3f8e7c3d7fb1f879ac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22253)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920141672

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22252)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[PR] [MINOR] Reorder Azure CI test modules [hudi]

2024-01-31 Thread via GitHub


linliu-code opened a new pull request, #10600:
URL: https://github.com/apache/hudi/pull/10600

   ### Change Logs
   
   Just curious: 4 <-> 3.
   
   I want to know whether this could break the coupling between the two modules.
   
   ### Impact
   
   None.
   
   ### Risk level (write none, low medium or high below)
   
   None.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10599:
URL: https://github.com/apache/hudi/pull/10599#issuecomment-1920133552

   
   ## CI report:
   
   * 096faa6576dce3781643ac3f8e7c3d7fb1f879ac UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920133237

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


linliu-code commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920123802

   @hudi-bot run azure





Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]

2024-01-31 Thread via GitHub


linliu-code commented on code in PR #10588:
URL: https://github.com/apache/hudi/pull/10588#discussion_r1473560060


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -107,7 +108,11 @@ public HoodieFileGroupReader(HoodieReaderContext 
readerContext,
HoodieTableConfig tableConfig,
long start,
long length,
-   boolean shouldUseRecordPosition) {
+   boolean shouldUseRecordPosition,
+   long maxMemorySizeInBytes,
+   String spillableMapBasePath,

Review Comment:
   a bit ugly though. Can we wrap these parameters into a class first? 






Re: [I] [SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time [hudi]

2024-01-31 Thread via GitHub


bhasudha commented on issue #10511:
URL: https://github.com/apache/hudi/issues/10511#issuecomment-1920095297

   Hi @bk-mz. I wanted to add to this thread. Query latency may not be the 
only metric to measure, as explained in the threads above. The runs with 
parquet native bloom filters enabled that still take a similar time could be 
dominated by a few factors: the need to still open all files to load the parquet 
native bloom filter, S3 throttling, etc. 
   
   One way I would try testing this is to remove Hudi from the picture, take 
the same parquet dataset, and run the query with and without parquet native bloom 
filters enabled. You should see the number of output rows reduced, but the 
query time may not improve much because each of these files still has to be 
loaded to read the bloom filters. 
   
   The column stats in Hudi's metadata table help to reduce the number of 
files scanned (unlike parquet native bloom filters). With data skipping 
enabled, Hudi uses the column stats stored in the metadata table instead of 
scanning the metadata in each parquet file, so Hudi can better plan the query 
with such stats and the predicates by scanning/reading fewer files when 
possible (see this 
[blog](https://www.onehouse.ai/blog/hudis-column-stats-index-and-data-skipping-feature-help-speed-up-queries-by-an-orders-of-magnitude)
 for more details on data skipping in Hudi).  This is particularly helpful on 
cloud storage as cloud storage requests have constant overhead and are subject 
to rate limiting. 
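
The file-pruning idea described above can be sketched roughly as follows. This is a simplified illustration, not Hudi's implementation: the `FileColumnStats` class, the `filesToScan` helper, and the single numeric column with an equality predicate are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of min/max based data skipping; not Hudi code.
final class ColumnStatsPruning {

  // Hypothetical per-file statistics for one numeric column.
  static final class FileColumnStats {
    final String fileName;
    final long minValue;
    final long maxValue;

    FileColumnStats(String fileName, long minValue, long maxValue) {
      this.fileName = fileName;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  // Returns the files whose [min, max] range could contain the value of an
  // equality predicate. Files outside the range are skipped without being opened.
  static List<String> filesToScan(List<FileColumnStats> stats, long predicateValue) {
    List<String> candidates = new ArrayList<>();
    for (FileColumnStats s : stats) {
      if (predicateValue >= s.minValue && predicateValue <= s.maxValue) {
        candidates.add(s.fileName);
      }
    }
    return candidates;
  }
}
```

The point of the sketch is that the decision uses only the centrally stored statistics, so a skipped file costs no storage request, whereas a per-file bloom filter can only be consulted after the file has been opened.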
   
   You raise valid feedback that we will take and work on: better showcasing 
the impact of using these indexes so that users can easily spot them. We will 
update you shortly on how we are incorporating this.
   






[PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]

2024-01-31 Thread via GitHub


yihua opened a new pull request, #10599:
URL: https://github.com/apache/hudi/pull/10599

   ### Change Logs
   
   As above.
   
   This is part of the effort to provide Hudi storage abstraction and decouple 
`hudi-common` from hadoop dependencies. For reference, the single big-change PR 
can be found here: #10360.
   
   ### Impact
   
   No behavior change now.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common module

2024-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7364:
-
Labels: pull-request-available  (was: )

> Move InLineFs classes to hudi-hadoop-common module
> --
>
> Key: HUDI-7364
> URL: https://issues.apache.org/jira/browse/HUDI-7364
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common module

2024-01-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7364:

Summary: Move InLineFs classes to hudi-hadoop-common module  (was: Move 
InLineFs classes to hudi-hadoop-common)

> Move InLineFs classes to hudi-hadoop-common module
> --
>
> Key: HUDI-7364
> URL: https://issues.apache.org/jira/browse/HUDI-7364
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common

2024-01-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7364:

Priority: Blocker  (was: Major)

> Move InLineFs classes to hudi-hadoop-common
> ---
>
> Key: HUDI-7364
> URL: https://issues.apache.org/jira/browse/HUDI-7364
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 1.0.0
>
>






[jira] [Assigned] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common

2024-01-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7364:
---

Assignee: Ethan Guo

> Move InLineFs classes to hudi-hadoop-common
> ---
>
> Key: HUDI-7364
> URL: https://issues.apache.org/jira/browse/HUDI-7364
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common

2024-01-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7364:

Fix Version/s: 1.0.0

> Move InLineFs classes to hudi-hadoop-common
> ---
>
> Key: HUDI-7364
> URL: https://issues.apache.org/jira/browse/HUDI-7364
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common

2024-01-31 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7364:
---

 Summary: Move InLineFs classes to hudi-hadoop-common
 Key: HUDI-7364
 URL: https://issues.apache.org/jira/browse/HUDI-7364
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


linliu-code commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920083504

   @hudi-bot run azure





Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920036189

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] allow custom write support for row writer [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10598:
URL: https://github.com/apache/hudi/pull/10598#issuecomment-1919939495

   
   ## CI report:
   
   * c2046a168ba1705fc8d951299a7f33c5c8d4ebff Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919840344

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * bd5dc6e247ece35fffcfcc91bc78c8964317a241 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22249)
 
   * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] allow custom write support for row writer [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10598:
URL: https://github.com/apache/hudi/pull/10598#issuecomment-1919829816

   
   ## CI report:
   
   * c2046a168ba1705fc8d951299a7f33c5c8d4ebff Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919829429

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * bd5dc6e247ece35fffcfcc91bc78c8964317a241 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22249)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


linliu-code commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919813944

   @hudi-bot run azure





Re: [PR] allow custom write support for row writer [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10598:
URL: https://github.com/apache/hudi/pull/10598#issuecomment-1919757952

   
   ## CI report:
   
   * c2046a168ba1705fc8d951299a7f33c5c8d4ebff UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919757625

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * bd5dc6e247ece35fffcfcc91bc78c8964317a241 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22249)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919757002

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244)
 
   * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch asf-site updated: added new videos for hudi oss site (#10563)

2024-01-31 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 07baecfe258 added new videos for hudi oss site (#10563)
07baecfe258 is described below

commit 07baecfe2581ceefb2dd27e92f1c07e76825ba1d
Author: nadine farah 
AuthorDate: Wed Jan 31 11:07:12 2024 -0800

added new videos for hudi oss site (#10563)

* added new videos for hudi oss site

updated for singular tags and updated readme

* updated tags in general so they are consistent

* updated aws tags to amazon
---
 README.md  |   2 +-
 ...-Setup-Locally-in-Minutes-Hands-On-Exercise.png | Bin 0 -> 124716 bytes
 ...in-two-hudi-tables-Labs-with-Exercise-Files.png | Bin 0 -> 146529 bytes
 ...cker-in-Minutes-and-Connect-to-Your-S3-Data.png | Bin 0 -> 136219 bytes
 ...COW-Table-on-S3-to-MOR-Table-using-Hudi-CLI.png | Bin 0 -> 121647 bytes
 ...d-Getting-started-Spark-Connect-Hello-World.png | Bin 0 -> 138967 bytes
 ...Index-FastAPI-Spark-Connect-with-Swagger-UI.png | Bin 0 -> 159526 bytes
 ...ll-Tables-from-particular-Schema-full-video.png | Bin 0 -> 127945 bytes
 ...res-Bring-all-Tables-from-particular-Schema.png | Bin 0 -> 125607 bytes
 ...O-locally-using-Docker-Container-in-Minutes.png | Bin 0 -> 144801 bytes
 ...ating-in-UPSERT-Mode-with-Kafka-Avro-MSG-12.png | Bin 0 -> 126280 bytes
 ...a-From-MongoDB-to-Apache-Hudi-Using-PySpark.png | Bin 0 -> 140135 bytes
 ...o_remove_duplicates_on_a_data_lake_Hudi_Labs.md |   2 +-
 ..._Bucket_Index_SIMPLE_In_Apache_Hudi_with_lab.md |   2 +-
 ...otion_with_Incremental_ETL_Using_Apache_Hudi.md |   2 +-
 ...Consistent_Hashing_in_Apache_Hudi_MOR_Tables.md |   2 +-
 ...h_Incremental_ETL_using_Apache_Hudi_Hands_On.md |   2 +-
 ..._Hudi_Apache_Hudi_Data_Lakehouse_Hudi_Apache.md |   4 ++--
 ...ling_Failed_InsertsUpserts_with_Error_Tables.md |   6 +++---
 ..._Tables_to_Redshift_Using_AWS_Glue_and_Spark.md |   2 +-
 ...ion_from_Postgres_using_Triggers_and_PySpark.md |   2 +-
 ...th-DynamoDB-for-Faster-Commit-Time-Retrieval.md |   4 ++--
 ...i-Course-for-beginner-Operations-Type-Part-5.md |   8 +++
 ...Your-Medallion-Architecture-with-Apache-Hudi.md |   2 +-
 ...-Setup-Locally-in-Minutes-Hands-On-Exercise.mdx |  24 +
 ...in-two-hudi-tables-Labs-with-Exercise-Files.mdx |  17 +++
 ...cker-in-Minutes-and-Connect-to-Your-S3-Data.mdx |  17 +++
 ...COW-Table-on-S3-to-MOR-Table-using-Hudi-CLI.mdx |  17 +++
 ...d-Getting-started-Spark-Connect-Hello-World.mdx |  14 
 ...Index-FastAPI-Spark-Connect-with-Swagger-UI.mdx |  16 ++
 ...ring-all-Tables-from-particular-Schema-full.mdx |  17 +++
 ...res-Bring-all-Tables-from-particular-Schema.mdx |  17 +++
 ...O-locally-using-Docker-Container-in-Minutes.mdx |  17 +++
 ...ating-in-UPSERT-Mode-with-Kafka-Avro-MSG-12.mdx |  21 ++
 ...a-From-MongoDB-to-Apache-Hudi-Using-PySpark.mdx |  16 ++
 35 files changed, 213 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 9a9f3e1a801..2f27fc68189 100644
--- a/README.md
+++ b/README.md
@@ -204,7 +204,7 @@ Take a look at this blog for reference - (Apache Hudi vs 
Delta Lake vs Apache Ic
   - performance (involves performance related blogs)
   - blog (anything else such as announcements/release 
updates/insights/guides/tutorials/concepts overview etc)
2. tag 2
-   - Represent individual features - clustering, compaction, ingestion, 
meta-sync etc.
+   - Represent individual features - clustering, compaction, ingestion, 
meta-sync etc. Make sure you keep the features **singular**, i.e., Use `upsert` 
not `upserts` or use `delete` not `deletes`
3. tag 3
   - Source. This is usually the second level domain name for this article 
gathered from the url link.
For example if the article is 
https://www.uber.com/blog/cost-efficiency-big-data/ we would use `uber` as the 
tag here. 
diff --git 
a/website/static/assets/images/video_blogs/2023-12-24-Apache-Hudi-Spark-DBT-Glue-Hive-MetaStore-Setup-Locally-in-Minutes-Hands-On-Exercise.png
 
b/website/static/assets/images/video_blogs/2023-12-24-Apache-Hudi-Spark-DBT-Glue-Hive-MetaStore-Setup-Locally-in-Minutes-Hands-On-Exercise.png
new file mode 100644
index 000..db48750e25a
Binary files /dev/null and 
b/website/static/assets/images/video_blogs/2023-12-24-Apache-Hudi-Spark-DBT-Glue-Hive-MetaStore-Setup-Locally-in-Minutes-Hands-On-Exercise.png
 differ
diff --git 
a/website/static/assets/images/video_blogs/2023-12-25-Hudi-DBT-Spark-Glue-Hive-MetaStore-Join-two-hudi-tables-Labs-with-Exercise-Files.png
 
b/website/static/assets/images/video_blogs/2023-12-25-Hudi-DBT-Spark-Glue-Hive-MetaStore-Join-two-hudi-tables-Labs-with-Exercise-Files.png
new file 

Re: [PR] added new videos for hudi oss site [hudi]

2024-01-31 Thread via GitHub


bhasudha merged PR #10563:
URL: https://github.com/apache/hudi/pull/10563


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] allow custom write support for row writer [hudi]

2024-01-31 Thread via GitHub


jonvex opened a new pull request, #10598:
URL: https://github.com/apache/hudi/pull/10598

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low, medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10512:
URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919745623

   
   ## CI report:
   
   * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN
   * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN
   * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN
   * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN
   * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN
   * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN
   * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN
   * 7b46d61e36c1007f132c255e12d86c597a807335 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22246)
   * bd5dc6e247ece35fffcfcc91bc78c8964317a241 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919744970

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244)
   * 1c6e22304b9f819aecd328fffe84394912daf763 UNKNOWN
   
   
   





Re: [PR] [Hudi-6868] Support extracting passwords from credential store for Hive Sync [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10577:
URL: https://github.com/apache/hudi/pull/10577#issuecomment-1919734629

   
   ## CI report:
   
   * 27e72600df8807de069ab066fcf4a1d40c0d9b56 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22247)
   
   
   





Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]

2024-01-31 Thread via GitHub


hudi-bot commented on PR #10278:
URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919733841

   
   ## CI report:
   
   * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN
   * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN
   * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244)
   
   
   





[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-01-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7363:

Description: HUDI-6497 has done the work for hudi-common module.  This is 
to clean up usage for other modules.

> Replace unnecessary FileSystem, Path, and FileStatus usage in other modules
> ---
>
> Key: HUDI-7363
> URL: https://issues.apache.org/jira/browse/HUDI-7363
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.1.0
>
>
> HUDI-6497 has done the work for hudi-common module.  This is to clean up 
> usage for other modules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-01-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7363:

Fix Version/s: 1.1.0

> Replace unnecessary FileSystem, Path, and FileStatus usage in other modules
> ---
>
> Key: HUDI-7363
> URL: https://issues.apache.org/jira/browse/HUDI-7363
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.1.0
>
>






[jira] [Created] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules

2024-01-31 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7363:
---

 Summary: Replace unnecessary FileSystem, Path, and FileStatus 
usage in other modules
 Key: HUDI-7363
 URL: https://issues.apache.org/jira/browse/HUDI-7363
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








Re: [I] [SUPPORT] Hudi CLI bundle not working [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10566:
URL: https://github.com/apache/hudi/issues/10566#issuecomment-1919591261

   @CTTY I was trying to reproduce this issue but ran into a different setup 
problem. I will get back to you on this soon.





Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]

2024-01-31 Thread via GitHub


SudhirSaxena commented on issue #10587:
URL: https://github.com/apache/hudi/issues/10587#issuecomment-1919588851

   
   let me check now





Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #10587:
URL: https://github.com/apache/hudi/issues/10587#issuecomment-1919587148

   That's strange! It looks like it has stalled on the driver. Can you check 
the driver logs for this period?





Re: [I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]

2024-01-31 Thread via GitHub


codope commented on issue #9907:
URL: https://github.com/apache/hudi/issues/9907#issuecomment-1919583511

   @rahil-c Can you confirm that it is the same Athena version that can read a 
Hudi 0.13.1 table but not a 0.14.0 table? If so, that eliminates any engine 
issue, and we need to debug further in Hudi.





Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]

2024-01-31 Thread via GitHub


ad1happy2go commented on issue #7846:
URL: https://github.com/apache/hudi/issues/7846#issuecomment-1919582070

   @parisni This is a similar issue, related to Spark SQL caching the results. 
The caching is done to optimise subsequent reads from the table in the running terminal. 





Re: [I] [SUPPORT] Querying Hudi tables with Spark+Velox(C++), ObjectSizeCalculator.getObjectSize hangs causing about a 50-second delay in queries [hudi]

2024-01-31 Thread via GitHub


codope commented on issue #10580:
URL: https://github.com/apache/hudi/issues/10580#issuecomment-1919578705

   Interesting. We had done a micro-benchmark and found about 5% slowness due 
to JOL. Since we already invoke this for only a sample of records and not all 
records in the batch, we did not consider other alternatives (as mentioned in 
the description of the PR). The main reason it was added is that Trino upgraded 
to Java 17 and the trino-hudi connector build started failing (reason mentioned 
in the PR).
   
   I am curious whether something else is going on: since object size 
calculation lies on the hot path, this issue would have surfaced in the other 
large-scale benchmarks that we run before release.
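
To illustrate the sampling idea in the comment above, here is a minimal Python sketch of estimating record sizes by measuring only every Nth record. It is not Hudi's implementation: the class name, the `sample_every` parameter, and the use of `sys.getsizeof` as a stand-in for the expensive size calculation are all assumptions made for illustration.

```python
import sys


class SampledSizeEstimator:
    """Running average of record size, measured on every Nth record only.

    Hypothetical helper for illustration; not part of Hudi.
    """

    def __init__(self, sample_every=100, measure=sys.getsizeof):
        if sample_every < 1:
            raise ValueError("sample_every must be >= 1")
        self.sample_every = sample_every
        self.measure = measure  # stand-in for an expensive size calculation
        self.seen = 0
        self.sampled = 0
        self.total_sampled_bytes = 0

    def observe(self, record):
        # Measure the first record and then every Nth one; skip the rest.
        if self.seen % self.sample_every == 0:
            self.total_sampled_bytes += self.measure(record)
            self.sampled += 1
        self.seen += 1

    def average_size(self):
        # Average over sampled records only; 0.0 before any sample exists.
        return self.total_sampled_bytes / self.sampled if self.sampled else 0.0

    def estimated_total(self):
        # Extrapolate the sampled average to all records seen so far.
        return self.average_size() * self.seen
```

With `sample_every=100`, the measuring function runs on roughly 1% of records, so its cost is amortized across the batch at the price of a less precise estimate when record sizes vary widely.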





Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]

2024-01-31 Thread via GitHub


SudhirSaxena commented on issue #10587:
URL: https://github.com/apache/hudi/issues/10587#issuecomment-1919576161

   Hi @ad1happy2go, @soumilshah1995, @nsivabalan
   
   I am trying to see where the job is getting stuck. The driver in the 
executor summary (screenshot below) has been running for more than 1 hour 
without progressing. I am not sure what the reason could be. Any idea why this 
is happening and how to resolve it, so that the upsert job using the Record 
level index runs on Hudi 0.14 in EMR 6.15? 
   I appreciate your help on this.
   
   
   Executor ID | Address | Status | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write | Logs
   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
   **driver** | ip-10-156-17-51.ec2.internal:41585 | Active | 0 | 0.0 B / 15.8 GiB | 0.0 B | 0 | 0 | 0 | 0 | 0 | **1.2 h** (0.0 ms) | 0.0 B | 0.0 B | 0.0 B
   
   
   
   https://github.com/apache/hudi/assets/33292656/9b38bdf1-4602-4670-8f83-176f4423991b
   https://github.com/apache/hudi/assets/33292656/d0644aee-79fe-4955-8b30-cb54312679b6
   https://github.com/apache/hudi/assets/33292656/05682797-a9ed-4dd0-b8b0-c7dbf235e9ac
   https://github.com/apache/hudi/assets/33292656/3702640f-b93e-4055-aec6-2bf8f359b95c
   https://github.com/apache/hudi/assets/33292656/dba1e2db-f25f-441a-bc12-943d78672b9d
   





[jira] [Updated] (HUDI-3545) Make HoodieAvroWriteSupport class configurable

2024-01-31 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-3545:
--
Status: Patch Available  (was: In Progress)

> Make HoodieAvroWriteSupport class configurable
> --
>
> Key: HUDI-3545
> URL: https://issues.apache.org/jira/browse/HUDI-3545
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>  Labels: pull-request-available
>
> Make the HoodieAvroWriteSupport class configurable so that it can be 
> overridden by custom write support classes.
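
The general pattern the issue describes, choosing an implementation class from configuration and falling back to a default, can be sketched in Python as follows. This only illustrates the idea and is not Hudi code: `DefaultWriteSupport`, `load_write_support`, and the config key `example.write.support.class` are hypothetical names.

```python
import importlib


class DefaultWriteSupport:
    """Hypothetical default implementation used when nothing is configured."""

    def describe(self):
        return "default"


# Hypothetical config key, for illustration only; not a real Hudi config name.
WRITE_SUPPORT_KEY = "example.write.support.class"


def load_write_support(config, default_cls=DefaultWriteSupport):
    """Instantiate the class named in config, or the default when unset."""
    class_path = config.get(WRITE_SUPPORT_KEY)
    if not class_path:
        return default_cls()
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {class_path!r}")
    cls = getattr(importlib.import_module(module_name), class_name)
    # Only accept subclasses so callers can rely on the default's interface.
    if not (isinstance(cls, type) and issubclass(cls, default_cls)):
        raise TypeError(f"{class_path} must extend {default_cls.__name__}")
    return cls()
```

Requiring the configured class to extend the default keeps call sites unchanged: they can use whatever instance is returned through the same interface.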




