Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920710825 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 99cf737b33b2f3687d743afde2f13a341851c237 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22266) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920709373 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN * c410c9ab8a8ea987b41a009a33157c387b06795a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22264)
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
linliu-code commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920706918 @hudi-bot run azure
[jira] [Created] (HUDI-7365) Fix a flaky test TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimeType
Lin Liu created HUDI-7365: - Summary: Fix a flaky test TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimeType Key: HUDI-7365 URL: https://issues.apache.org/jira/browse/HUDI-7365 Project: Apache Hudi Issue Type: Bug Reporter: Lin Liu Assignee: Lin Liu We sometimes see the following error even though nothing has changed. TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimeType:818 expected: <2024-02-01 07:36:39.0> but was: <2024-02-01 07:36:39> -- This message was sent by Atlassian Jira (v8.20.10#820010)
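The two values in the assertion above differ only in the trailing `.0`. As an illustration of how such a difference can arise in plain Java (a sketch, not a diagnosis of the Hudi test), `java.sql.Timestamp.toString()` always prints a fractional-seconds part, while a formatter whose pattern stops at seconds does not. The class name `TimestampFormatDemo` and its two helper methods are hypothetical and do not come from the Hudi code base.

```java
import java.sql.Timestamp;
import java.time.format.DateTimeFormatter;

public class TimestampFormatDemo {
    // Timestamp.toString() always includes a fractional-seconds part,
    // so a value with zero nanoseconds ends in ".0".
    public static String viaToString(String text) {
        return Timestamp.valueOf(text).toString();
    }

    // An explicit pattern that stops at seconds drops the fraction entirely.
    public static String viaPattern(String text) {
        DateTimeFormatter secondsOnly = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        return Timestamp.valueOf(text).toLocalDateTime().format(secondsOnly);
    }

    public static void main(String[] args) {
        String input = "2024-02-01 07:36:39";
        System.out.println(viaToString(input)); // 2024-02-01 07:36:39.0
        System.out.println(viaPattern(input));  // 2024-02-01 07:36:39
    }
}
```

If the two sides of a comparison are rendered through different paths like these, comparing the underlying `Timestamp` values rather than their string forms would sidestep the mismatch. Whether that applies to this test is an open question for the ticket.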
Re: [I] Hudi behaviour if AWS Glue concurrency is triggered[SUPPORT] [hudi]
rishabhreply commented on issue #10559: URL: https://github.com/apache/hudi/issues/10559#issuecomment-1920687767 @ad1happy2go Okay, so if I ingest 10 files together and my Step Function triggers multiple Glue job instances to process them, there will be no discrepancy in the data written by the jobs. Thank you for the effort!
Re: [I] [SUPPORT] parquet bloom filters not supported by hudi [hudi]
parisni commented on issue #7117: URL: https://github.com/apache/hudi/issues/7117#issuecomment-1920649103 Okay, will double check. Thanks for reaching out.
Re: [I] [SUPPORT] parquet bloom filters not supported by hudi [hudi]
parisni commented on issue #7117: URL: https://github.com/apache/hudi/issues/7117#issuecomment-1920648234 Got it. Can you share your draft?
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920648314 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 112adc1b0508253ec22bbc11b0fdfb90a108508d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22265) * 99cf737b33b2f3687d743afde2f13a341851c237 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22266) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]
parisni commented on issue #7846: URL: https://github.com/apache/hudi/issues/7846#issuecomment-1920644875 Thanks both for your insight. I am wondering whether this behavior also applies to Iceberg and Delta. If not, Hudi might align with them and disable this cache by default.
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920639652 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 112adc1b0508253ec22bbc11b0fdfb90a108508d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22265) * 99cf737b33b2f3687d743afde2f13a341851c237 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920630293 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 112adc1b0508253ec22bbc11b0fdfb90a108508d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22265) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920583621 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 0be6e4bbc1c11531d777971851888cfa43ce1f73 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22263) * 112adc1b0508253ec22bbc11b0fdfb90a108508d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920569552 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262) * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN * c410c9ab8a8ea987b41a009a33157c387b06795a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22264) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [SUPPORT] FileNotFoundException when clustering [hudi]
echisan opened a new issue, #10601: URL: https://github.com/apache/hudi/issues/10601 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** I am not sure why the parquet file is missing, flinkjob did not restart. I would like to know how to handle this issue. Is it possible to ignore the missing file? **To Reproduce** Steps to reproduce the behavior: 1.Set up a FlinkSQL job with Kafka as the data source. 2.Configure the job to write data into a Hudi Cow table with online clustering. 3.Execute the job. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.13.1-rc1 * Spark version : * Hive version : 3.1.3 * Hadoop version : 2.9.2 * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : flink on k8s **Additional context** ```sql CREATE TABLE ods_mqtt_msg( dt STRING, PRIMARY KEY (`field1`, `field2`, `field3`) NOT ENFORCED ) PARTITIONED BY (`dt`) WITH ( 'connector' = 'hudi', 'table.type' = 'COPY_ON_WRITE', 'path' = 's3a:///lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg', 'write.operation' = 'INSERT', 'clustering.async.enabled' = 'true', 'clustering.schedule.enabled' = 'true', 'hive_sync.enable' = 'true', 'hive_sync.mode' = 'hms', 'hive_sync.metastore.uris' = 'thrift://hive-metastore-svc.hms.svc:9083', 'read.streaming.enabled' = 'true', 'write.tasks' = '4' ); ``` **Stacktrace** ``` 2024-02-01 03:56:01,519 INFO org.apache.hudi.client.HoodieFlinkWriteClient [] - Cleaner has been spawned already. 
Waiting for it to finish 2024-02-01 03:56:01,519 INFO org.apache.hudi.async.AsyncCleanerService [] - Waiting for async clean service to finish 2024-02-01 03:56:01,627 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Loaded instants upto : Option{val=[==>20240201035304158__commit__INFLIGHT]} 2024-02-01 03:56:02,333 INFO org.apache.hudi.common.util.ClusteringUtils [] - Found 658 files in pending clustering operations 2024-02-01 03:56:02,333 INFO org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView [] - Sending request : (http://169.122.153.67:46386/v1/hoodie/view/compactions/pending/?basepath=s3a%3A%2Flakehouse%2Fhudi%2Fdevice_mqtt_msg%2Fods_mqtt_msg=20240201035302997=350fb15b2282717446dd396f06ebaf80257ed284589ba906e5c3ccf6701cc223) 2024-02-01 03:56:02,427 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Checking for file exists ?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.requested 2024-02-01 03:56:02,564 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Create new file for toInstant ?s3a:/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/.hoodie/20240131195932968.replacecommit.inflight 2024-02-01 03:56:02,677 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Loaded instants upto : Option{val=[20240201035304158__commit__COMPLETED]} 2024-02-01 03:56:02,677 INFO org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Execute clustering plan for instant 20240131195932968 as 17 file slices 2024-02-01 03:56:02,937 ERROR org.apache.hudi.sink.clustering.ClusteringOperator [] - Executor executes action [Execute clustering for instant 20240131195932968 from task 2] error org.apache.hudi.exception.HoodieClusteringException: Error reading input data for s3a://xxx-bucket/lakehouse/hudi/device_mqtt_msg/ods_mqtt_msg/2024-01-31/85040fcd-3f42-4b37-865f-616fc0ad3df8-0_1-4-0_20240131164655396.parquet and [] at 
org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:332) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1] at java.lang.Iterable.spliterator(Unknown Source) ~[?:?] at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$readRecordsForGroupBaseFiles$5(ClusteringOperator.java:336) ~[hudi-flink1.16-bundle-0.13.1-rc1.jar:0.13.1-rc1] at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[?:?] at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source) ~[?:?] at java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[?:?] at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[?:?] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920521605 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262) * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN * c410c9ab8a8ea987b41a009a33157c387b06795a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22264) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920514612 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262) * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN * c410c9ab8a8ea987b41a009a33157c387b06795a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920509429 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262) * 1dab0df80b70d0d70aabd57743d8681bce3c6ec1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920467434 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 0be6e4bbc1c11531d777971851888cfa43ce1f73 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22263) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920467036 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) * a926d67d3d519c49dcb7b8893671b312e1e5bcfd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22262) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920462121 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * e8a768676bb2bf8b64211b06b7fa90785991e958 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22261) * 0be6e4bbc1c11531d777971851888cfa43ce1f73 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920461859 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) * a926d67d3d519c49dcb7b8893671b312e1e5bcfd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920456509 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * e8a768676bb2bf8b64211b06b7fa90785991e958 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22261) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]
beyond1920 commented on issue #7846: URL: https://github.com/apache/hudi/issues/7846#issuecomment-1920428504 @parisni I agree with @ad1happy2go: the caching happens in Spark rather than in Hudi. Spark caches the relation by `dbName`.`tableName`. https://github.com/apache/hudi/assets/1525333/2557fd10-eadf-437c-8506-eabe70ca5b89 In addition to setting `spark.sql.filesourceTableRelationCacheSize=0` as @ad1happy2go proposed, you could also refresh the table manually with `spark.catalog.refreshTable("database.hudi_table")` before querying.
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920417023 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * fbba45806b55d9801f973a4a18fe87134a41aa9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22259) * e8a768676bb2bf8b64211b06b7fa90785991e958 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22261) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-1920410955 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 44e334758625cfc2a35d7644cbcbed102e560062 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22260)
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920410749 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * fbba45806b55d9801f973a4a18fe87134a41aa9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22259) * e8a768676bb2bf8b64211b06b7fa90785991e958 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Flink streaming read MOR table, thrown Unexpected cdc file split infer case: LOG_FILE Exception [hudi]
nicholasxu commented on issue #10539: URL: https://github.com/apache/hudi/issues/10539#issuecomment-1920406638 > @nicholasxu Closing out this issue. Please reopen or create a new one in case of any further queries/issues. Thanks. ok!
Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]
hudi-bot commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-1920404441 ## CI report: * 8207558e8c8714386cf2f71929d6fb08db10617b UNKNOWN * 4e39d3ba20d5d2236e599a55c96a9c731ed721c0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22238) * 44e334758625cfc2a35d7644cbcbed102e560062 UNKNOWN
Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other information [hudi]
xuzifu666 commented on issue #10542: URL: https://github.com/apache/hudi/issues/10542#issuecomment-1920398436 @ad1happy2go According to earlier feedback, was the data loss bug fixed in the 1.0 beta version?
Re: [PR] [HUDI-6497] Replace FileSystem, Path, and FileStatus usage in `hudi-common` module [hudi]
yihua commented on PR #10591: URL: https://github.com/apache/hudi/pull/10591#issuecomment-1920374907 Note to reviewer: commit `[44e3347](https://github.com/apache/hudi/pull/10591/commits/44e334758625cfc2a35d7644cbcbed102e560062)` is now frozen, and I'll only add new commits for new changes and fixes to make review easier. I'll also defer rebasing and force-pushing until CI passes and the PR is approved.
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920360582 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * fbba45806b55d9801f973a4a18fe87134a41aa9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22259)
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920354291 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * c27384b5d2ed7d697c86115f473c9b18bb76f8f3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22257) * fbba45806b55d9801f973a4a18fe87134a41aa9c UNKNOWN
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920347776 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * c27384b5d2ed7d697c86115f473c9b18bb76f8f3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22257)
Re: [PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]
hudi-bot commented on PR #10599: URL: https://github.com/apache/hudi/pull/10599#issuecomment-1920347970 ## CI report: * 096faa6576dce3781643ac3f8e7c3d7fb1f879ac Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22253)
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
linliu-code commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920323482 @hudi-bot run azure
Re: [PR] [MINOR] Reorder Azure CI test modules [hudi]
linliu-code closed pull request #10600: [MINOR] Reorder Azure CI test modules URL: https://github.com/apache/hudi/pull/10600
Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]
danny0405 commented on code in PR #10594:
URL: https://github.com/apache/hudi/pull/10594#discussion_r1473678410

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/RowDataToAvroConverters.java:

@@ -241,10 +271,10 @@ public Object convert(Schema schema, Object object) {
     };
   }

-  private static RowDataToAvroConverter createRowConverter(RowType rowType) {
+  private static RowDataToAvroConverter createRowConverter(RowType rowType, boolean utcTimezone) {
     final RowDataToAvroConverter[] fieldConverters =
         rowType.getChildren().stream()
-            .map(RowDataToAvroConverters::createConverter)
+            .map(type -> createConverter(type, utcTimezone))

Review Comment:
@voonhous, would you like to take a look at this PR?
Re: [PR] [HUDI-9424]Support using local timezone when writing flink TIMESTAMP data [hudi]
danny0405 commented on code in PR #10594:
URL: https://github.com/apache/hudi/pull/10594#discussion_r1473677608

## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestRowDataToAvroConverters.java:

@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utils;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.flink.formats.common.TimestampFormat;
+import org.apache.flink.formats.json.JsonToRowDataConverters;
+import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonProcessingException;
+import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.RowDataToAvroConverters;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+
+import java.time.Instant;
+import java.time.LocalDateTime;
+import java.time.ZoneId;
+import java.time.format.DateTimeFormatter;
+import java.util.TimeZone;
+
+import static org.apache.flink.table.api.DataTypes.ROW;
+import static org.apache.flink.table.api.DataTypes.FIELD;
+import static org.apache.flink.table.api.DataTypes.TIMESTAMP;
+
+class TestRowDataToAvroConverters {
+
+  DateTimeFormatter formatter = DateTimeFormatter.ofPattern("-MM-dd HH:mm:ss");
+  @Test
+  void testRowDataToAvroStringToRowDataWithLocalTimezone1() throws JsonProcessingException {
+    TimeZone.setDefault(TimeZone.getTimeZone(ZoneId.of("Asia/Shanghai")));
+    String timestampFromUtc8 = "2021-03-30 15:44:29";
+
+    DataType rowDataType = ROW(FIELD("timestamp_from_utc_8", TIMESTAMP()));
+    JsonToRowDataConverters.JsonToRowDataConverter jsonToRowDataConverter =
+        new JsonToRowDataConverters(true, true, TimestampFormat.SQL)
+            .createConverter(rowDataType.getLogicalType());
+    Object rowData = jsonToRowDataConverter.convert(new ObjectMapper().readTree("{\"timestamp_from_utc_8\":\"" + timestampFromUtc8 + "\"}"));
+

Review Comment:
I would like to see some ITs in `ITTestHoodieDataSource`.
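For readers following the `utcTimezone` flag in this PR, here is a minimal sketch of the difference it is presumably meant to control, written with plain `java.time` and no Flink or Hudi classes. The helper `toEpochMillis` and the class name are hypothetical, and the exact semantics of the flag inside `RowDataToAvroConverters` are an assumption, not something the diff confirms.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimestampZoneSketch {

  // Hypothetical helper (not Hudi's API): turns a zone-less wall-clock value
  // into epoch milliseconds, reading it either as UTC or as local time.
  static long toEpochMillis(LocalDateTime wallClock, boolean utcTimezone, ZoneId localZone) {
    ZoneId zone = utcTimezone ? ZoneOffset.UTC : localZone;
    return wallClock.atZone(zone).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    LocalDateTime ts = LocalDateTime.parse("2021-03-30 15:44:29", formatter);
    ZoneId shanghai = ZoneId.of("Asia/Shanghai");

    long asUtc = toEpochMillis(ts, true, shanghai);
    long asLocal = toEpochMillis(ts, false, shanghai);

    // Asia/Shanghai is UTC+8, so the two readings differ by eight hours.
    System.out.println(asUtc - asLocal); // prints 28800000
  }
}
```

The zone is passed in explicitly instead of calling `TimeZone.setDefault`, which keeps the sketch free of JVM-wide side effects; a real integration test might still need the default zone, as the quoted unit test does.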
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920288525 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 3bb06bb4df1185da15fd6bb3e82fdb1ff56e19cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22255) * c27384b5d2ed7d697c86115f473c9b18bb76f8f3 UNKNOWN
(hudi) branch master updated (e23f402e194 -> b6642c65848)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e23f402e194 [HUDI-7347] Introduce SeekableDataInputStream for random access (#10575) add b6642c65848 [MINOR] Add serialVersionUID to HoodieRecord class (#10592) No new revisions were added by this update. Summary of changes: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java | 1 + 1 file changed, 1 insertion(+)
Re: [PR] [MINOR] Add serialVersionUID to HoodieRecord class [hudi]
danny0405 merged PR #10592: URL: https://github.com/apache/hudi/pull/10592
Re: [I] [BUG] Failure Encountered When Reading Hudi with Flink in Batch Runtime Mode and FlinkOptions.READ_AS_STREAMING=false [hudi]
danny0405 commented on issue #10576: URL: https://github.com/apache/hudi/issues/10576#issuecomment-1920285622 Yeah, people have never reported failures for batch snapshot queries.
(hudi) branch asf-site updated: [Docs] Added known regression note for 0.14.1 release (#10597)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 8df7f1ab496 [Docs] Added known regression note for 0.14.1 release (#10597)
8df7f1ab496 is described below

commit 8df7f1ab4964f1667af457128a8c6b5b73cdbc3c
Author: Aditya Goenka <63430370+ad1happy...@users.noreply.github.com>
AuthorDate: Thu Feb 1 06:35:37 2024 +0530

    [Docs] Added known regression note for 0.14.1 release (#10597)
---
 website/releases/release-0.14.1.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/website/releases/release-0.14.1.md b/website/releases/release-0.14.1.md
index 9b244253a96..1905810bcfb 100644
--- a/website/releases/release-0.14.1.md
+++ b/website/releases/release-0.14.1.md
@@ -31,6 +31,14 @@ import TabItem from '@theme/TabItem';
 * Flink engine
 * Unit, functional, integration tests and CI
 
+## Known Regressions
+We discovered a regression in Hudi 0.14.1 release related to Complex Key gen when record key consists of one field.
+It can silently ingest duplicates if table is upgraded from previous versions.
+
+:::tip
+Avoid upgrading any existing table to 0.14.1 if you are using ComplexKeyGenerator and number of fields in record key is 1.
+:::
+
 ## Raw Release Notes
 
 The raw release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12353493)
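The release note above does not say what changed inside the key generator. As a purely hypothetical illustration of how a key-generation change can produce silent duplicates after an upgrade, the sketch below assumes that the string encoding of a single-field record key differs between two versions. Both encodings (`prefixedKey`, `bareKey`) and the map standing in for a table are invented for this example and are not taken from Hudi's source.

```java
import java.util.HashMap;
import java.util.Map;

class KeyEncodingSketch {

  // Hypothetical "before upgrade" encoding: field name plus value.
  static String prefixedKey(String field, String value) {
    return field + ":" + value;
  }

  // Hypothetical "after upgrade" encoding: value only.
  static String bareKey(String value) {
    return value;
  }

  // Upserts the same logical record twice and reports how many rows exist.
  // The map is a stand-in for a table indexed by record key.
  static int rowsAfterUpsert(String firstKey, String secondKey) {
    Map<String, String> table = new HashMap<>();
    table.put(firstKey, "written before upgrade");
    table.put(secondKey, "written after upgrade");
    return table.size();
  }

  public static void main(String[] args) {
    // Same encoding on both writes: the second write updates the first row.
    System.out.println(rowsAfterUpsert(prefixedKey("id", "42"), prefixedKey("id", "42"))); // prints 1
    // Encoding changed between writes: the update misses and a duplicate appears.
    System.out.println(rowsAfterUpsert(prefixedKey("id", "42"), bareKey("42"))); // prints 2
  }
}
```

Nothing fails in the second case, which is why a mismatch of this kind would surface only as duplicate rows and not as an exception.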
Re: [PR] [Docs] Added known regression note for 0.14.1 release related to ComplexKeyGen [hudi]
danny0405 merged PR #10597: URL: https://github.com/apache/hudi/pull/10597
[jira] [Closed] (HUDI-7347) Introduce SeekableDataInputStream for random access
[ https://issues.apache.org/jira/browse/HUDI-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7347. Resolution: Fixed Fixed via master branch: e23f402e194498088f17142d9f132548ffbbd91d > Introduce SeekableDataInputStream for random access > --- > > Key: HUDI-7347 > URL: https://issues.apache.org/jira/browse/HUDI-7347 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch master updated: [HUDI-7347] Introduce SeekableDataInputStream for random access (#10575)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e23f402e194 [HUDI-7347] Introduce SeekableDataInputStream for random access (#10575) e23f402e194 is described below commit e23f402e194498088f17142d9f132548ffbbd91d Author: Y Ethan Guo AuthorDate: Wed Jan 31 16:48:46 2024 -0800 [HUDI-7347] Introduce SeekableDataInputStream for random access (#10575) --- .../hudi/common/table/log/HoodieLogFileReader.java | 36 +++ .../table/log/block/HoodieAvroDataBlock.java | 4 +- .../common/table/log/block/HoodieCDCDataBlock.java | 4 +- .../common/table/log/block/HoodieCommandBlock.java | 5 +- .../common/table/log/block/HoodieCorruptBlock.java | 5 +- .../common/table/log/block/HoodieDataBlock.java| 4 +- .../common/table/log/block/HoodieDeleteBlock.java | 6 +-- .../table/log/block/HoodieHFileDataBlock.java | 4 +- .../common/table/log/block/HoodieLogBlock.java | 16 +++ .../table/log/block/HoodieParquetDataBlock.java| 4 +- .../hadoop/fs/HadoopSeekableDataInputStream.java | 48 .../apache/hudi/io/SeekableDataInputStream.java| 53 ++ 12 files changed, 150 insertions(+), 39 deletions(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java index cce13c1a6e2..fa8174931c4 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java @@ -37,9 +37,11 @@ import org.apache.hudi.exception.CorruptedLogFileException; import org.apache.hudi.exception.HoodieIOException; import org.apache.hudi.exception.HoodieNotSupportedException; import org.apache.hudi.hadoop.fs.BoundedFsDataInputStream; +import org.apache.hudi.hadoop.fs.HadoopSeekableDataInputStream; import 
org.apache.hudi.hadoop.fs.SchemeAwareFSDataInputStream; import org.apache.hudi.hadoop.fs.TimedFSDataInputStream; import org.apache.hudi.internal.schema.InternalSchema; +import org.apache.hudi.io.SeekableDataInputStream; import org.apache.hudi.io.util.IOUtils; import org.apache.hudi.storage.StorageSchemes; @@ -90,7 +92,7 @@ public class HoodieLogFileReader implements HoodieLogFormat.Reader { private final boolean reverseReader; private final boolean enableRecordLookups; private boolean closed = false; - private FSDataInputStream inputStream; + private SeekableDataInputStream inputStream; public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize, boolean readBlockLazily) throws IOException { @@ -120,7 +122,7 @@ public class HoodieLogFileReader implements HoodieLogFormat.Reader { Path updatedPath = FSUtils.makeQualified(fs, logFile.getPath()); this.logFile = updatedPath.equals(logFile.getPath()) ? logFile : new HoodieLogFile(updatedPath, logFile.getFileSize()); this.bufferSize = bufferSize; -this.inputStream = getFSDataInputStream(fs, this.logFile, bufferSize); +this.inputStream = getDataInputStream(fs, this.logFile, bufferSize); this.readerSchema = readerSchema; this.readBlockLazily = readBlockLazily; this.reverseReader = reverseReader; @@ -202,7 +204,7 @@ public class HoodieLogFileReader implements HoodieLogFormat.Reader { if (nextBlockVersion.getVersion() == HoodieLogFormatVersion.DEFAULT_VERSION) { return HoodieAvroDataBlock.getBlock(content.get(), readerSchema, internalSchema); } else { - return new HoodieAvroDataBlock(() -> getFSDataInputStream(fs, this.logFile, bufferSize), content, readBlockLazily, logBlockContentLoc, + return new HoodieAvroDataBlock(() -> getDataInputStream(fs, this.logFile, bufferSize), content, readBlockLazily, logBlockContentLoc, getTargetReaderSchemaForBlock(), header, footer, keyField); } @@ -210,7 +212,7 @@ public class HoodieLogFileReader implements HoodieLogFormat.Reader { 
checkState(nextBlockVersion.getVersion() != HoodieLogFormatVersion.DEFAULT_VERSION, String.format("HFile block could not be of version (%d)", HoodieLogFormatVersion.DEFAULT_VERSION)); return new HoodieHFileDataBlock( -() -> getFSDataInputStream(fs, this.logFile, bufferSize), content, readBlockLazily, logBlockContentLoc, +() -> getDataInputStream(fs, this.logFile, bufferSize), content, readBlockLazily, logBlockContentLoc, Option.ofNullable(readerSchema), header, footer, enableRecordLookups, logFile.getPath(), ConfigUtils.getBooleanWithAltKeys(fs.getConf(), USE_NATIVE_HFILE_READER)); @@ -218,17 +220,17 @@ public class
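The diff above swaps `FSDataInputStream` for `SeekableDataInputStream` in the log reader, but it is cut off before the new class body appears. The following is a minimal sketch of what such an abstraction could look like, assuming it is a `DataInputStream` that can also report and move its read position. All class names here (`SeekableInputSketch`, `PositionedBytes`, `ByteArraySeekableInput`) are hypothetical, and the byte-array backing is chosen only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Byte-array-backed implementation, used here only to exercise the contract.
class ByteArraySeekableInput extends SeekableInputSketch {
  private final PositionedBytes bytes;

  private ByteArraySeekableInput(PositionedBytes bytes) {
    super(bytes);
    this.bytes = bytes;
  }

  static ByteArraySeekableInput of(byte[] data) {
    return new ByteArraySeekableInput(new PositionedBytes(data));
  }

  @Override
  long getPos() {
    return bytes.position();
  }

  @Override
  void seek(long pos) throws IOException {
    if (pos < 0 || pos > bytes.limit()) {
      throw new IOException("Seek out of range: " + pos);
    }
    bytes.position((int) pos);
  }

  // Convenience wrapper: jump to an offset and read one big-endian int.
  static int readIntAt(byte[] data, long pos) {
    try (ByteArraySeekableInput in = of(data)) {
      in.seek(pos);
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = {0, 0, 0, 7, 0, 0, 0, 99};
    System.out.println(readIntAt(data, 4)); // prints 99
    System.out.println(readIntAt(data, 0)); // prints 7
  }
}

// Assumed contract: sequential DataInputStream reads plus random access.
abstract class SeekableInputSketch extends DataInputStream {
  SeekableInputSketch(InputStream in) {
    super(in);
  }

  abstract long getPos() throws IOException;

  abstract void seek(long pos) throws IOException;
}

// Exposes the protected cursor of ByteArrayInputStream so it can be moved.
class PositionedBytes extends ByteArrayInputStream {
  PositionedBytes(byte[] data) {
    super(data);
  }

  int position() {
    return pos;
  }

  int limit() {
    return count;
  }

  void position(int newPos) {
    pos = newPos;
  }
}
```

A benefit of this shape is that callers such as a log reader depend only on `getPos` and `seek`, so a Hadoop-backed stream (like the `HadoopSeekableDataInputStream` named in the diffstat) can sit behind the same type.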
Re: [PR] [HUDI-7347] Introduce SeekableDataInputStream for random access [hudi]
danny0405 merged PR #10575: URL: https://github.com/apache/hudi/pull/10575
[jira] [Closed] (HUDI-7340) Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer
[ https://issues.apache.org/jira/browse/HUDI-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7340. Resolution: Fixed Fixed via master branch: 4ed41e0f15e65431799340bb655d28db92de34b9 > Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer > --- > > Key: HUDI-7340 > URL: https://issues.apache.org/jira/browse/HUDI-7340 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core >Reporter: Danny Chen >Assignee: Lin Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch master updated: [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer (#10588)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 4ed41e0f15e [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer (#10588) 4ed41e0f15e is described below commit 4ed41e0f15e65431799340bb655d28db92de34b9 Author: Danny Chan AuthorDate: Thu Feb 1 08:43:21 2024 +0800 [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer (#10588) --- .../table/log/HoodieMergedLogRecordReader.java | 3 ++- .../read/HoodieBaseFileGroupRecordBuffer.java | 27 -- .../common/table/read/HoodieFileGroupReader.java | 11 ++--- .../table/read/HoodieFileGroupRecordBuffer.java| 7 +++--- .../read/HoodieKeyBasedFileGroupRecordBuffer.java | 16 + .../HoodiePositionBasedFileGroupRecordBuffer.java | 14 +++ .../common/util/HoodieRecordSizeEstimator.java | 5 ++-- .../table/read/TestHoodieFileGroupReaderBase.java | 5 .../reader/HoodieFileGroupReaderTestUtils.java | 8 ++- ...odieFileGroupReaderBasedParquetFileFormat.scala | 17 ++ ...stHoodiePositionBasedFileGroupRecordBuffer.java | 7 +- 11 files changed, 88 insertions(+), 32 deletions(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java index 44c4c973eae..6b31c200907 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordReader.java @@ -40,6 +40,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.Closeable; +import java.io.Serializable; import java.util.HashSet; import java.util.Iterator; import java.util.List; @@ -183,7 +184,7 @@ public class HoodieMergedLogRecordReader extends 
BaseHoodieLogRecordReader return recordBuffer.getLogRecordIterator(); } - public Map, Map>> getRecords() { + public Map, Map>> getRecords() { return recordBuffer.getLogRecords(); } diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java b/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java index 2f695cf0249..70ddb5abff2 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java @@ -27,11 +27,15 @@ import org.apache.hudi.common.model.HoodieRecordMerger; import org.apache.hudi.common.table.log.KeySpec; import org.apache.hudi.common.table.log.block.HoodieDataBlock; import org.apache.hudi.common.table.log.block.HoodieLogBlock; +import org.apache.hudi.common.util.DefaultSizeEstimator; +import org.apache.hudi.common.util.HoodieRecordSizeEstimator; import org.apache.hudi.common.util.Option; import org.apache.hudi.common.util.ReflectionUtils; import org.apache.hudi.common.util.collection.ClosableIterator; +import org.apache.hudi.common.util.collection.ExternalSpillableMap; import org.apache.hudi.common.util.collection.Pair; import org.apache.hudi.exception.HoodieCorruptedDataException; +import org.apache.hudi.exception.HoodieIOException; import org.apache.hudi.exception.HoodieKeyException; import org.apache.hudi.exception.HoodieValidationException; @@ -39,8 +43,8 @@ import org.apache.avro.Schema; import org.roaringbitmap.longlong.Roaring64NavigableMap; import java.io.IOException; +import java.io.Serializable; import java.util.ArrayList; -import java.util.HashMap; import java.util.Iterator; import java.util.List; import java.util.Map; @@ -56,7 +60,7 @@ public abstract class HoodieBaseFileGroupRecordBuffer implements HoodieFileGr protected final Option partitionPathFieldOpt; protected final HoodieRecordMerger recordMerger; 
protected final TypedProperties payloadProps; - protected final Map, Map>> records; + protected final ExternalSpillableMap, Map>> records; protected ClosableIterator baseFileIterator; protected Iterator, Map>> logRecordIterator; protected T nextRecord; @@ -68,7 +72,11 @@ public abstract class HoodieBaseFileGroupRecordBuffer implements HoodieFileGr Option partitionNameOverrideOpt, Option partitionPathFieldOpt, HoodieRecordMerger recordMerger, - TypedProperties payloadProps) { + TypedProperties payloadProps, + long maxMemorySizeInBytes, +
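The change above replaces a plain in-memory map with `ExternalSpillableMap` and threads `maxMemorySizeInBytes` through the constructors. To show the general idea of a spillable map, here is a simplified sketch that is not Hudi's implementation: it caps the in-memory part by entry count instead of estimated bytes, and writes each overflow value to its own temp file with Java serialization. The class name, the `withTempDir` factory, and the per-key file layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class SpillableMapSketch<K, V extends Serializable> {
  private final Map<K, V> inMemory = new HashMap<>();
  private final Map<K, Path> spilled = new HashMap<>(); // key -> file holding the value
  private final int maxInMemoryEntries;
  private final Path spillDir;

  SpillableMapSketch(int maxInMemoryEntries, Path spillDir) {
    this.maxInMemoryEntries = maxInMemoryEntries;
    this.spillDir = spillDir;
  }

  // Hypothetical factory that spills into a fresh temp directory.
  static <K, V extends Serializable> SpillableMapSketch<K, V> withTempDir(int maxInMemoryEntries) {
    try {
      return new SpillableMapSketch<>(maxInMemoryEntries, Files.createTempDirectory("spill-sketch"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void put(K key, V value) {
    boolean fitsInMemory = inMemory.containsKey(key)
        || (!spilled.containsKey(key) && inMemory.size() < maxInMemoryEntries);
    if (fitsInMemory) {
      inMemory.put(key, value);
      return;
    }
    try {
      // Reuse the key's file on overwrite; otherwise create a new one.
      Path file = spilled.containsKey(key)
          ? spilled.get(key)
          : Files.createTempFile(spillDir, "spill-", ".bin");
      try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
        out.writeObject(value);
      }
      spilled.put(key, file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @SuppressWarnings("unchecked")
  V get(K key) {
    if (inMemory.containsKey(key)) {
      return inMemory.get(key);
    }
    Path file = spilled.get(key);
    if (file == null) {
      return null;
    }
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return (V) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  int inMemorySize() {
    return inMemory.size();
  }

  int spilledSize() {
    return spilled.size();
  }

  public static void main(String[] args) {
    SpillableMapSketch<String, String> records = withTempDir(2);
    records.put("a", "1");
    records.put("b", "2");
    records.put("c", "3"); // exceeds the in-memory budget and goes to disk
    System.out.println(records.inMemorySize()); // prints 2
    System.out.println(records.spilledSize());  // prints 1
    System.out.println(records.get("c"));       // prints 3
  }
}
```

A byte-based budget, as the `maxMemorySizeInBytes` parameter in the diff suggests, would need a size estimator per record; the entry-count cap here is a stand-in for that.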
Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]
danny0405 merged PR #10588: URL: https://github.com/apache/hudi/pull/10588
Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]
danny0405 commented on PR #10588: URL: https://github.com/apache/hudi/pull/10588#issuecomment-1920262000 The failed Azure test times out often and should not be caused by this patch, so I will merge it.
Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]
danny0405 commented on code in PR #10588:
URL: https://github.com/apache/hudi/pull/10588#discussion_r1473654923

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:

@@ -107,7 +108,11 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext,
                                HoodieTableConfig tableConfig,
                                long start,
                                long length,
-                               boolean shouldUseRecordPosition) {
+                               boolean shouldUseRecordPosition,
+                               long maxMemorySizeInBytes,
+                               String spillableMapBasePath,

Review Comment:
Yeah, there are too many parameters; we should add a builder for it, just like we do for `AbstractHoodieLogRecordReader`.
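The builder suggested in the review comment could look roughly like the sketch below. It covers only the scalar parameters visible in the quoted hunk (`start`, `length`, `shouldUseRecordPosition`, `maxMemorySizeInBytes`, `spillableMapBasePath`); the class name `FileGroupReaderOptions`, the `withX` method names, and the validation rule are hypothetical and not taken from Hudi.

```java
class FileGroupReaderOptions {
  final long start;
  final long length;
  final boolean shouldUseRecordPosition;
  final long maxMemorySizeInBytes;
  final String spillableMapBasePath;

  private FileGroupReaderOptions(Builder b) {
    this.start = b.start;
    this.length = b.length;
    this.shouldUseRecordPosition = b.shouldUseRecordPosition;
    this.maxMemorySizeInBytes = b.maxMemorySizeInBytes;
    this.spillableMapBasePath = b.spillableMapBasePath;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private long start;
    private long length;
    private boolean shouldUseRecordPosition;
    private long maxMemorySizeInBytes;
    private String spillableMapBasePath;

    Builder withStart(long start) {
      this.start = start;
      return this;
    }

    Builder withLength(long length) {
      this.length = length;
      return this;
    }

    Builder withShouldUseRecordPosition(boolean shouldUseRecordPosition) {
      this.shouldUseRecordPosition = shouldUseRecordPosition;
      return this;
    }

    Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
      this.maxMemorySizeInBytes = maxMemorySizeInBytes;
      return this;
    }

    Builder withSpillableMapBasePath(String spillableMapBasePath) {
      this.spillableMapBasePath = spillableMapBasePath;
      return this;
    }

    FileGroupReaderOptions build() {
      // Illustrative check only: a spill path is required once spilling is possible.
      if (maxMemorySizeInBytes > 0 && spillableMapBasePath == null) {
        throw new IllegalStateException("spillableMapBasePath must be set");
      }
      return new FileGroupReaderOptions(this);
    }
  }

  public static void main(String[] args) {
    FileGroupReaderOptions options = FileGroupReaderOptions.newBuilder()
        .withStart(0L)
        .withLength(1024L)
        .withShouldUseRecordPosition(false)
        .withMaxMemorySizeInBytes(64L * 1024 * 1024)
        .withSpillableMapBasePath("/tmp/hudi-spill")
        .build();
    System.out.println(options.maxMemorySizeInBytes); // prints 67108864
  }
}
```

Compared with a long positional constructor, each call site names the values it sets, and adding another option later does not change existing callers.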
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920228209 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * 3bb06bb4df1185da15fd6bb3e82fdb1ff56e19cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22255)
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920219664 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22252) * 3bb06bb4df1185da15fd6bb3e82fdb1ff56e19cb UNKNOWN
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920200158 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22252)
Re: [PR] [MINOR] Reorder Azure CI test modules [hudi]
hudi-bot commented on PR #10600: URL: https://github.com/apache/hudi/pull/10600#issuecomment-1920150638 ## CI report: * b4e0bd6803cab032901572e45c3ab78e8e6c764a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22254) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [MINOR] Reorder Azure CI test modules [hudi]
hudi-bot commented on PR #10600: URL: https://github.com/apache/hudi/pull/10600#issuecomment-1920142076 ## CI report: * b4e0bd6803cab032901572e45c3ab78e8e6c764a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]
hudi-bot commented on PR #10599: URL: https://github.com/apache/hudi/pull/10599#issuecomment-1920142020 ## CI report: * 096faa6576dce3781643ac3f8e7c3d7fb1f879ac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22253) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920141672 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * Unknown: [CANCELED](TBD) * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22252) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [MINOR] Reorder Azure CI test modules [hudi]
linliu-code opened a new pull request, #10600: URL: https://github.com/apache/hudi/pull/10600 ### Change Logs Just curious: 4 <-> 3. I want to know whether this could break the coupling between the two modules. ### Impact None. ### Risk level (write none, low, medium or high below) None. ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]
hudi-bot commented on PR #10599: URL: https://github.com/apache/hudi/pull/10599#issuecomment-1920133552 ## CI report: * 096faa6576dce3781643ac3f8e7c3d7fb1f879ac UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920133237 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * Unknown: [CANCELED](TBD) * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN * d7b4db087514e34b8d5d06b0b306d2cfaba0ff3a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
linliu-code commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920123802 @hudi-bot run azure
Re: [PR] [HUDI-7340] Use spillable map for cached log records in HoodieBaseFileGroupRecordBuffer [hudi]
linliu-code commented on code in PR #10588: URL: https://github.com/apache/hudi/pull/10588#discussion_r1473560060
## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
## @@ -107,7 +108,11 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext,
 HoodieTableConfig tableConfig,
 long start,
 long length,
- boolean shouldUseRecordPosition) {
+ boolean shouldUseRecordPosition,
+ long maxMemorySizeInBytes,
+ String spillableMapBasePath,
Review Comment: a bit ugly though. Can we wrap these parameters into a class first?
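The review comment above asks for the new constructor arguments to be wrapped in a class. A minimal sketch of that idea follows, assuming only the two parameters visible in the quoted hunk (`maxMemorySizeInBytes` and `spillableMapBasePath`); the hunk is truncated, so further parameters may exist. The class name `SpillableMapOptions` and its accessors are hypothetical and are not taken from the Hudi code base or from the PR.

```java
import java.util.Objects;

// Hypothetical parameter object: the class name and accessors are illustrative
// only and do not come from the Hudi code base.
final class SpillableMapOptions {
    private final long maxMemorySizeInBytes;
    private final String spillableMapBasePath;

    SpillableMapOptions(long maxMemorySizeInBytes, String spillableMapBasePath) {
        // Validate once here so every consumer of the options can rely on them.
        if (maxMemorySizeInBytes <= 0) {
            throw new IllegalArgumentException("maxMemorySizeInBytes must be positive");
        }
        this.maxMemorySizeInBytes = maxMemorySizeInBytes;
        this.spillableMapBasePath =
            Objects.requireNonNull(spillableMapBasePath, "spillableMapBasePath");
    }

    long getMaxMemorySizeInBytes() {
        return maxMemorySizeInBytes;
    }

    String getSpillableMapBasePath() {
        return spillableMapBasePath;
    }
}
```

With such a class, a constructor would take one extra argument instead of several, and related settings added later would not change its signature again.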
Re: [I] [SUPPORT] MOR hudi 0.14, Bloom Filters are not being used on query time [hudi]
bhasudha commented on issue #10511: URL: https://github.com/apache/hudi/issues/10511#issuecomment-1920095297 Hi @bk-mz. I wanted to add to this thread. As explained earlier in the thread, query latency may not be the only metric worth measuring. Runs with parquet native bloom filters enabled can still take a similar amount of time because they may be dominated by a few factors: every file still has to be opened to load its parquet native bloom filter, S3 throttling, etc. One way I would test this is to remove Hudi from the picture, take the same parquet dataset, and run the query with and without the parquet native bloom filter enabled. You should see the output rows reduced, but the query time may not improve much because each of these files still has to be loaded to read its bloom filter. The column stats in Hudi's metadata table help reduce the number of files scanned (unlike parquet native bloom filters). With data skipping enabled, Hudi uses the column stats stored in the metadata table instead of scanning the metadata in each parquet file, so it can plan the query better from those stats and the predicates, scanning/reading fewer files when possible (see this [blog](https://www.onehouse.ai/blog/hudis-column-stats-index-and-data-skipping-feature-help-speed-up-queries-by-an-orders-of-magnitude) for more details on data skipping in Hudi). This is particularly helpful on cloud storage, where requests have a constant overhead and are subject to rate limiting. You bring valid feedback that we will take and work on: better showcasing the impact of using these indexes so that users can easily spot them. We will update you shortly on how we are incorporating this.
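The difference described in the comment above, pruning files from centrally stored column stats instead of opening every file, can be illustrated with a toy model. This is a sketch under stated assumptions, not Hudi's implementation: `ColumnStatsPruning`, `FileStats`, and `candidateFiles` are hypothetical names, the stats cover a single numeric column, and the predicate is a simple equality.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of min/max data skipping. All names here are hypothetical;
// this is not how Hudi's metadata table is implemented.
final class ColumnStatsPruning {

    /** Min/max of one column for one file, as a central index might store it. */
    static final class FileStats {
        final String fileName;
        final long min;
        final long max;

        FileStats(String fileName, long min, long max) {
            this.fileName = fileName;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the files whose [min, max] range could contain {@code value},
     * i.e. the only files a query with predicate "col = value" needs to open.
     */
    static List<String> candidateFiles(List<FileStats> stats, long value) {
        List<String> candidates = new ArrayList<>();
        for (FileStats s : stats) {
            if (value >= s.min && value <= s.max) {
                candidates.add(s.fileName);
            }
        }
        return candidates;
    }
}
```

The planner in this model never touches the skipped files, whereas a per-file bloom filter can only be consulted after a request for that file has been issued, which is where the constant per-request overhead on cloud storage comes in.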
[PR] [HUDI-7364] Move InLineFs classes to hudi-hadoop-common module [hudi]
yihua opened a new pull request, #10599: URL: https://github.com/apache/hudi/pull/10599 ### Change Logs As above. This is part of the effort to provide Hudi storage abstraction and decouple `hudi-common` from hadoop dependencies. For reference, the single big-change PR can be found here: #10360. ### Impact No behavior change now. ### Risk level none ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common module
[ https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7364: - Labels: pull-request-available (was: ) > Move InLineFs classes to hudi-hadoop-common module > -- > > Key: HUDI-7364 > URL: https://issues.apache.org/jira/browse/HUDI-7364 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common module
[ https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7364: Summary: Move InLineFs classes to hudi-hadoop-common module (was: Move InLineFs classes to hudi-hadoop-common) > Move InLineFs classes to hudi-hadoop-common module > -- > > Key: HUDI-7364 > URL: https://issues.apache.org/jira/browse/HUDI-7364 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common
[ https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7364: Priority: Blocker (was: Major) > Move InLineFs classes to hudi-hadoop-common > --- > > Key: HUDI-7364 > URL: https://issues.apache.org/jira/browse/HUDI-7364 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common
[ https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7364: --- Assignee: Ethan Guo > Move InLineFs classes to hudi-hadoop-common > --- > > Key: HUDI-7364 > URL: https://issues.apache.org/jira/browse/HUDI-7364 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common
[ https://issues.apache.org/jira/browse/HUDI-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7364: Fix Version/s: 1.0.0 > Move InLineFs classes to hudi-hadoop-common > --- > > Key: HUDI-7364 > URL: https://issues.apache.org/jira/browse/HUDI-7364 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7364) Move InLineFs classes to hudi-hadoop-common
Ethan Guo created HUDI-7364: --- Summary: Move InLineFs classes to hudi-hadoop-common Key: HUDI-7364 URL: https://issues.apache.org/jira/browse/HUDI-7364 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
linliu-code commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1920083504 @hudi-bot run azure
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1920036189 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] allow custom write support for row writer [hudi]
hudi-bot commented on PR #10598: URL: https://github.com/apache/hudi/pull/10598#issuecomment-1919939495 ## CI report: * c2046a168ba1705fc8d951299a7f33c5c8d4ebff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22250) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919840344 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * bd5dc6e247ece35fffcfcc91bc78c8964317a241 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22249) * f8c748241017499433296ff26e6984064d8085b8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] allow custom write support for row writer [hudi]
hudi-bot commented on PR #10598: URL: https://github.com/apache/hudi/pull/10598#issuecomment-1919829816 ## CI report: * c2046a168ba1705fc8d951299a7f33c5c8d4ebff Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22250) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919829429 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * bd5dc6e247ece35fffcfcc91bc78c8964317a241 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22249) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
linliu-code commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919813944 @hudi-bot run azure
Re: [PR] allow custom write support for row writer [hudi]
hudi-bot commented on PR #10598: URL: https://github.com/apache/hudi/pull/10598#issuecomment-1919757952 ## CI report: * c2046a168ba1705fc8d951299a7f33c5c8d4ebff UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919757625 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * bd5dc6e247ece35fffcfcc91bc78c8964317a241 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22249) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919757002 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244) * 1c6e22304b9f819aecd328fffe84394912daf763 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22248) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(hudi) branch asf-site updated: added new videos for hudi oss site (#10563)
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 07baecfe258 added new videos for hudi oss site (#10563) 07baecfe258 is described below commit 07baecfe2581ceefb2dd27e92f1c07e76825ba1d Author: nadine farah AuthorDate: Wed Jan 31 11:07:12 2024 -0800 added new videos for hudi oss site (#10563) * added new videos for hudi oss site updated for singular tags and updated readme * updated tags in general so they are consistent * updated aws tags to amazon --- README.md | 2 +- ...-Setup-Locally-in-Minutes-Hands-On-Exercise.png | Bin 0 -> 124716 bytes ...in-two-hudi-tables-Labs-with-Exercise-Files.png | Bin 0 -> 146529 bytes ...cker-in-Minutes-and-Connect-to-Your-S3-Data.png | Bin 0 -> 136219 bytes ...COW-Table-on-S3-to-MOR-Table-using-Hudi-CLI.png | Bin 0 -> 121647 bytes ...d-Getting-started-Spark-Connect-Hello-World.png | Bin 0 -> 138967 bytes ...Index-FastAPI-Spark-Connect-with-Swagger-UI.png | Bin 0 -> 159526 bytes ...ll-Tables-from-particular-Schema-full-video.png | Bin 0 -> 127945 bytes ...res-Bring-all-Tables-from-particular-Schema.png | Bin 0 -> 125607 bytes ...O-locally-using-Docker-Container-in-Minutes.png | Bin 0 -> 144801 bytes ...ating-in-UPSERT-Mode-with-Kafka-Avro-MSG-12.png | Bin 0 -> 126280 bytes ...a-From-MongoDB-to-Apache-Hudi-Using-PySpark.png | Bin 0 -> 140135 bytes ...o_remove_duplicates_on_a_data_lake_Hudi_Labs.md | 2 +- ..._Bucket_Index_SIMPLE_In_Apache_Hudi_with_lab.md | 2 +- ...otion_with_Incremental_ETL_Using_Apache_Hudi.md | 2 +- ...Consistent_Hashing_in_Apache_Hudi_MOR_Tables.md | 2 +- ...h_Incremental_ETL_using_Apache_Hudi_Hands_On.md | 2 +- ..._Hudi_Apache_Hudi_Data_Lakehouse_Hudi_Apache.md | 4 ++-- ...ling_Failed_InsertsUpserts_with_Error_Tables.md | 6 +++--- ..._Tables_to_Redshift_Using_AWS_Glue_and_Spark.md | 2 +- 
...ion_from_Postgres_using_Triggers_and_PySpark.md | 2 +- ...th-DynamoDB-for-Faster-Commit-Time-Retrieval.md | 4 ++-- ...i-Course-for-beginner-Operations-Type-Part-5.md | 8 +++ ...Your-Medallion-Architecture-with-Apache-Hudi.md | 2 +- ...-Setup-Locally-in-Minutes-Hands-On-Exercise.mdx | 24 + ...in-two-hudi-tables-Labs-with-Exercise-Files.mdx | 17 +++ ...cker-in-Minutes-and-Connect-to-Your-S3-Data.mdx | 17 +++ ...COW-Table-on-S3-to-MOR-Table-using-Hudi-CLI.mdx | 17 +++ ...d-Getting-started-Spark-Connect-Hello-World.mdx | 14 ...Index-FastAPI-Spark-Connect-with-Swagger-UI.mdx | 16 ++ ...ring-all-Tables-from-particular-Schema-full.mdx | 17 +++ ...res-Bring-all-Tables-from-particular-Schema.mdx | 17 +++ ...O-locally-using-Docker-Container-in-Minutes.mdx | 17 +++ ...ating-in-UPSERT-Mode-with-Kafka-Avro-MSG-12.mdx | 21 ++ ...a-From-MongoDB-to-Apache-Hudi-Using-PySpark.mdx | 16 ++ 35 files changed, 213 insertions(+), 20 deletions(-) diff --git a/README.md b/README.md index 9a9f3e1a801..2f27fc68189 100644 --- a/README.md +++ b/README.md @@ -204,7 +204,7 @@ Take a look at this blog for reference - (Apache Hudi vs Delta Lake vs Apache Ic - performance (involves performance related blogs) - blog (anything else such as announcements/release updates/insights/guides/tutorials/concepts overview etc) 2. tag 2 - - Represent individual features - clustering, compaction, ingestion, meta-sync etc. + - Represent individual features - clustering, compaction, ingestion, meta-sync etc. Make sure you keep the features **singular**, i.e., Use `upsert` not `upserts` or use `delete` not `deletes` 3. tag 3 - Source. This is usually the second level domain name for this article gathered from the url link. For example if the article is https://www.uber.com/blog/cost-efficiency-big-data/ we would use `uber` as the tag here. 
diff --git a/website/static/assets/images/video_blogs/2023-12-24-Apache-Hudi-Spark-DBT-Glue-Hive-MetaStore-Setup-Locally-in-Minutes-Hands-On-Exercise.png b/website/static/assets/images/video_blogs/2023-12-24-Apache-Hudi-Spark-DBT-Glue-Hive-MetaStore-Setup-Locally-in-Minutes-Hands-On-Exercise.png new file mode 100644 index 000..db48750e25a Binary files /dev/null and b/website/static/assets/images/video_blogs/2023-12-24-Apache-Hudi-Spark-DBT-Glue-Hive-MetaStore-Setup-Locally-in-Minutes-Hands-On-Exercise.png differ diff --git a/website/static/assets/images/video_blogs/2023-12-25-Hudi-DBT-Spark-Glue-Hive-MetaStore-Join-two-hudi-tables-Labs-with-Exercise-Files.png b/website/static/assets/images/video_blogs/2023-12-25-Hudi-DBT-Spark-Glue-Hive-MetaStore-Join-two-hudi-tables-Labs-with-Exercise-Files.png new file
Re: [PR] added new videos for hudi oss site [hudi]
bhasudha merged PR #10563: URL: https://github.com/apache/hudi/pull/10563
[PR] allow custom write support for row writer [hudi]
jonvex opened a new pull request, #10598: URL: https://github.com/apache/hudi/pull/10598 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6902] Containerize the Azure tests [hudi]
hudi-bot commented on PR #10512: URL: https://github.com/apache/hudi/pull/10512#issuecomment-1919745623 ## CI report: * 0e5a63db2337ae435f17eb956460e22caeea65b3 UNKNOWN * 4d759f3b4d6629e738b9b1afe4157c514d6df182 UNKNOWN * a70247f32679a6441cea131e946acce6fd09523e UNKNOWN * a5529adc60d4af0c3ece9bbcdcc98ecd5482d21a UNKNOWN * b13310f2241a287a1966fe7fd63a616b86c3974c UNKNOWN * d47977a291de7374cc34436f4c4e22e1812a883e UNKNOWN * e0931770db4a4846a16b09eace9154166bd0842d UNKNOWN * 7b46d61e36c1007f132c255e12d86c597a807335 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22246) * bd5dc6e247ece35fffcfcc91bc78c8964317a241 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919744970 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244) * 1c6e22304b9f819aecd328fffe84394912daf763 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Hudi-6868] Support extracting passwords from credential store for Hive Sync [hudi]
hudi-bot commented on PR #10577: URL: https://github.com/apache/hudi/pull/10577#issuecomment-1919734629 ## CI report: * 27e72600df8807de069ab066fcf4a1d40c0d9b56 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22247) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7045] Create parquet readers inside the reader context and implement schema.on.read in the filegroup reader in spark [hudi]
hudi-bot commented on PR #10278: URL: https://github.com/apache/hudi/pull/10278#issuecomment-1919733841 ## CI report: * d98b47625ecada36364aa02aa1496dafd330c6a9 UNKNOWN * ab0b2127349325a3c939fe65da9d8caaac0da018 UNKNOWN * 4017aca3f1cc50f0a22d023d6c175fc0224bb2b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22244) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules
[ https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7363: Description: HUDI-6497 has done the work for hudi-common module. This is to clean up usage for other modules. > Replace unnecessary FileSystem, Path, and FileStatus usage in other modules > --- > > Key: HUDI-7363 > URL: https://issues.apache.org/jira/browse/HUDI-7363 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 1.1.0 > > > HUDI-6497 has done the work for hudi-common module. This is to clean up > usage for other modules. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules
[ https://issues.apache.org/jira/browse/HUDI-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7363: Fix Version/s: 1.1.0 > Replace unnecessary FileSystem, Path, and FileStatus usage in other modules > --- > > Key: HUDI-7363 > URL: https://issues.apache.org/jira/browse/HUDI-7363 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Fix For: 1.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-7363) Replace unnecessary FileSystem, Path, and FileStatus usage in other modules
Ethan Guo created HUDI-7363: --- Summary: Replace unnecessary FileSystem, Path, and FileStatus usage in other modules Key: HUDI-7363 URL: https://issues.apache.org/jira/browse/HUDI-7363 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [I] [SUPPORT] Hudi CLI bundle not working [hudi]
ad1happy2go commented on issue #10566: URL: https://github.com/apache/hudi/issues/10566#issuecomment-1919591261 @CTTY I was trying to reproduce this issue but ran into a separate setup issue. I will get back to you on this soon.
Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]
SudhirSaxena commented on issue #10587: URL: https://github.com/apache/hudi/issues/10587#issuecomment-1919588851 let me check now
Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]
ad1happy2go commented on issue #10587: URL: https://github.com/apache/hudi/issues/10587#issuecomment-1919587148 That's strange! It looks like the job has stalled on the driver. Can you check the driver logs for this period?
Re: [I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]
codope commented on issue #9907: URL: https://github.com/apache/hudi/issues/9907#issuecomment-1919583511 @rahil-c Can you confirm that it is the same Athena version that can read Hudi 0.13.1 table but not 0.14.0 table? If so, then it eliminates any engine issue, and we need to debug further in Hudi.
Re: [I] [SUPPORT] Datasource incremental subsequent read same as first read [hudi]
ad1happy2go commented on issue #7846: URL: https://github.com/apache/hudi/issues/7846#issuecomment-1919582070 @parisni This is a similar issue, related to Spark SQL caching the results. Spark does this to optimise subsequent reads from the table within the running session.
Re: [I] [SUPPORT] Querying Hudi tables with Spark+Velox(C++), ObjectSizeCalculator.getObjectSize hangs causing about a 50-second delay in queries [hudi]
codope commented on issue #10580: URL: https://github.com/apache/hudi/issues/10580#issuecomment-1919578705 Interesting. We ran a micro-benchmark and found about a 5% slowdown due to JOL. Since we already invoke this for only a sample of records rather than all records in the batch, we did not consider other alternatives (as mentioned in the description of the PR). The main reason it was added is that Trino upgraded to Java 17 and the trino-hudi connector build started failing (the reason is mentioned in the PR). I am curious whether something else is going on: because object size calculation lies on the hot path, this issue would have surfaced in the other large-scale benchmarks that we run before a release.
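The comment above mentions that object size calculation is invoked for only a sample of records rather than every record in a batch. The following is a minimal Python sketch of that general idea, not Hudi's implementation: the class `SampledSizeEstimator`, its `sample_rate` parameter, and the use of `sys.getsizeof` (a shallow size, standing in for a deep measurement such as JOL) are all assumptions made for illustration.

```python
import random
import sys


class SampledSizeEstimator:
    """Hypothetical helper: tracks an average record size from a sample."""

    def __init__(self, sample_rate=0.1, seed=None):
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)
        self.measured = 0       # number of records actually measured
        self._total_bytes = 0   # sum of the measured sizes

    def observe(self, record):
        # Always measure the first record so the average is defined;
        # afterwards measure only a random fraction of records.
        if self.measured == 0 or self._rng.random() < self.sample_rate:
            # sys.getsizeof is shallow; it stands in for a costlier deep
            # measurement (an assumption for this sketch).
            self._total_bytes += sys.getsizeof(record)
            self.measured += 1

    def average_size(self):
        return self._total_bytes / self.measured if self.measured else 0.0

    def estimate_total(self, record_count):
        # Extrapolate from the sampled average to the whole batch.
        return self.average_size() * record_count
```

With a sample rate of 0.1, roughly one record in ten pays the measurement cost, which is why a slow size calculation would normally have a limited effect on overall throughput.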
Re: [I] Upsert operation not working and job is running longer while using "Record level index" in Apache Hudi 0.14 in EMR 6.15 [hudi]
SudhirSaxena commented on issue #10587: URL: https://github.com/apache/hudi/issues/10587#issuecomment-1919576161 Hi @ad1happy2go, @soumilshah1995, @nsivabalan, I am trying to see where the job is getting stuck. The driver, shown in the executor summary below, has been running for more than 1 hour without making progress. I am not sure what the reason could be. Any idea why this is happening, and how to resolve it so that the upsert job runs using the record level index in Hudi 0.14 on EMR 6.15? I appreciate your help on this.

| Executor ID | Address | Status | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write | Logs |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| **driver** | ip-10-156-17-51.ec2.internal:41585 | Active | 0 | 0.0 B / 15.8 GiB | 0.0 B | 0 | 0 | 0 | 0 | 0 | **1.2 h** (0.0 ms) | 0.0 B | 0.0 B | 0.0 B | |

Screenshots:
https://github.com/apache/hudi/assets/33292656/9b38bdf1-4602-4670-8f83-176f4423991b
https://github.com/apache/hudi/assets/33292656/d0644aee-79fe-4955-8b30-cb54312679b6
https://github.com/apache/hudi/assets/33292656/05682797-a9ed-4dd0-b8b0-c7dbf235e9ac
https://github.com/apache/hudi/assets/33292656/3702640f-b93e-4055-aec6-2bf8f359b95c
https://github.com/apache/hudi/assets/33292656/dba1e2db-f25f-441a-bc12-943d78672b9d
[jira] [Updated] (HUDI-3545) Make HoodieAvroWriteSupport class configurable
[ https://issues.apache.org/jira/browse/HUDI-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-3545: -- Status: Patch Available (was: In Progress) > Make HoodieAvroWriteSupport class configurable > -- > > Key: HUDI-3545 > URL: https://issues.apache.org/jira/browse/HUDI-3545 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: Surya Prasanna Yalla >Assignee: Surya Prasanna Yalla >Priority: Major > Labels: pull-request-available > > Make HoodieAvroWriteSupport class configurable, that way this class can be > overridden by custom write support classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
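The issue above asks for the write support class to be configurable so that custom classes can override it. As a rough illustration of that general pattern (resolving a class from a configuration value, with a default and a type check), here is a small Python sketch. It is not Hudi code: the function `resolve_class`, the key `write.support.class`, and the parameters `default_cls` and `required_base` are hypothetical names chosen for this example.

```python
import importlib


def resolve_class(config, key, default_cls, required_base=object):
    """Return the class named by config[key], or default_cls if unset.

    The configured value is a dotted path such as "package.module.ClassName".
    """
    class_path = config.get(key)
    if not class_path:
        return default_cls

    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {class_path!r}")

    # Import the module and look the class up by name.
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)

    # Reject overrides that do not extend the expected base type.
    if not (isinstance(cls, type) and issubclass(cls, required_base)):
        raise TypeError(f"{class_path} must subclass {required_base.__name__}")
    return cls
```

A caller would then instantiate whatever `resolve_class` returns, so swapping in a custom implementation needs only a configuration change and no change to the calling code.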