[GitHub] [hudi] ranjani1993 opened a new issue, #7693: [SUPPORT] HUDI file cleanup - Not working as expected

2023-01-17 Thread GitBox
ranjani1993 opened a new issue, #7693: URL: https://github.com/apache/hudi/issues/7693 **Describe the problem you faced** HUDI file cleanup is not working as expected when we run it along with data ingestion. **config used:** df.write.format("hudi"). option(

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1386630097 ## CI report: * ed783b49dbeec18cca93a9fe43f1c4f8ee9ae6dd UNKNOWN * 091943461a6aa7e7dab9364813fb867f6a8771f6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] duc-dn commented on issue #7683: [SUPPORT] Querying data using Trino only returns records of the latest commit, not all records.

2023-01-17 Thread GitBox
duc-dn commented on issue #7683: URL: https://github.com/apache/hudi/issues/7683#issuecomment-1386629863 Hi @danny0405, I use the Trino connector. How can I check whether the metastore is synced for Trino? - [These are my log files for Trino and the Hive metastore](https://drive.google.com/drive/folders/1Apx4CA

[jira] [Created] (HUDI-5574) Support auto record key generation with Spark SQL

2023-01-17 Thread Lokesh Jain (Jira)
Lokesh Jain created HUDI-5574: - Summary: Support auto record key generation with Spark SQL Key: HUDI-5574 URL: https://issues.apache.org/jira/browse/HUDI-5574 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] danny0405 commented on a diff in pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
danny0405 commented on code in PR #6815: URL: https://github.com/apache/hudi/pull/6815#discussion_r1073171203 ## hudi-common/src/main/java/org/apache/hudi/common/fs/LeakTrackingFSDataInputStream.java: ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[GitHub] [hudi] mahesh2247 commented on issue #7688: [SUPPORT] Trying to write a glue job script for reflecting CDC delete (Data Pipelining Kinesis Streams to create Apache Hudi Table from AWS Glue Jo

2023-01-17 Thread GitBox
mahesh2247 commented on issue #7688: URL: https://github.com/apache/hudi/issues/7688#issuecomment-1386594913 Resulting in ``` 23/01/18 06:44:20 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last): File "/tmp/glue_job_script.py", line 77, in glueConte

[GitHub] [hudi] nsivabalan opened a new pull request, #7692: [HUDI-XXXX] enabling scan V2 for log record reader

2023-01-17 Thread GitBox
nsivabalan opened a new pull request, #7692: URL: https://github.com/apache/hudi/pull/7692 ### Change Logs testing ScanV2 with log record reader. ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none,

[GitHub] [hudi] mahesh2247 commented on issue #7688: [SUPPORT] Trying to write a glue job script for reflecting CDC delete (Data Pipelining Kinesis Streams to create Apache Hudi Table from AWS Glue Jo

2023-01-17 Thread GitBox
mahesh2247 commented on issue #7688: URL: https://github.com/apache/hudi/issues/7688#issuecomment-1386588309 Hey danny0405 and umehrot2, thanks for your replies. I realised that I need to add logic to delete incoming data streams with a "REMOVE" label in them, but I do not know how to implemen

[GitHub] [hudi] TengHuo commented on issue #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
TengHuo commented on issue #7691: URL: https://github.com/apache/hudi/issues/7691#issuecomment-1386565809 > Not the same. The current issue is the schema compatibility problem between Flink and Spark. Yeah, not the same, but I think they are similar. In #7284, we found it uses a patt

[GitHub] [hudi] hudi-bot commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
hudi-bot commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1386565755 ## CI report: * ae0b2c787c8e3afd7f9a3f6cc04676f910373657 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=143

[GitHub] [hudi] hudi-bot commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
hudi-bot commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1386560274 ## CI report: * fad68a69ecf7e7eba8d43307e1b0fa9da6244857 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] LinMingQiang commented on issue #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
LinMingQiang commented on issue #7691: URL: https://github.com/apache/hudi/issues/7691#issuecomment-1386558629 Not the same. The current issue is the schema compatibility problem between Flink and Spark.

[GitHub] [hudi] trushev commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
trushev commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1386555492 @danny0405 Could you please take another look? New solution: - Replaced `Map>` with `Map` - All handles are now guaranteed to be closed in a `finally` block via `closeGracefully()`

[GitHub] [hudi] hudi-bot commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
hudi-bot commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1386553951 ## CI report: * fad68a69ecf7e7eba8d43307e1b0fa9da6244857 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386552910 ## CI report: * 13fb78850890b96b86b66d7df060feb11950ec0c UNKNOWN * a516a4e3d57db84065f08219bc09442569b4627f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386552520 ## CI report: * a7ece0e42ac674d75b035220f129e5c0892dbf05 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] TengHuo commented on issue #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
TengHuo commented on issue #7691: URL: https://github.com/apache/hudi/issues/7691#issuecomment-1386540907 Is this the same issue as this one? It was an Avro schema namespace inconsistency we found before: https://github.com/apache/hudi/issues/7284

[GitHub] [hudi] xushiyan commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
xushiyan commented on code in PR #5926: URL: https://github.com/apache/hudi/pull/5926#discussion_r1072830110 ## hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/util/DateTimeUtils.java: ## @@ -0,0 +1,39 @@ +/* + * Licensed to t

[GitHub] [hudi] trushev commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
trushev commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1386515641 > The thing which I want to share is that caching write handles could take a lot of memory, because each handle obtains an instance of `HoodieTable`, and there is a `viewManager` in every `H

[jira] [Created] (HUDI-5573) Support table operation APIs for Hudi

2023-01-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-5573: Summary: Support table operation APIs for Hudi Key: HUDI-5573 URL: https://issues.apache.org/jira/browse/HUDI-5573 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] danny0405 closed issue #7686: [SUPPORT] Is there any way to delete records by specify one field value without selecting all the records out

2023-01-17 Thread GitBox
danny0405 closed issue #7686: [SUPPORT] Is there any way to delete records by specify one field value without selecting all the records out URL: https://github.com/apache/hudi/issues/7686

[GitHub] [hudi] danny0405 commented on issue #7686: [SUPPORT] Is there any way to delete records by specify one field value without selecting all the records out

2023-01-17 Thread GitBox
danny0405 commented on issue #7686: URL: https://github.com/apache/hudi/issues/7686#issuecomment-1386504650 One workaround is to insert a record with the desired primary key and set your payload class to a delete payload, but yeah, I agree Iceberg has better definition and operabilit
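A minimal sketch of the delete-payload workaround above, assuming the Spark datasource writer and Hudi's built-in `org.apache.hudi.common.model.EmptyHoodieRecordPayload`, whose merge produces no record, so upserting a key with it removes the existing row for that key. The helper `delete_by_key_options` and the table and field names (`trips`, `uuid`, `ts`, `city`) are hypothetical; check the option keys against the Hudi version in use.

```python
from typing import Dict, Optional

# Hudi payload whose merge yields an empty record, so an upsert of a key
# carrying this payload deletes the stored row for that key.
DELETE_PAYLOAD = "org.apache.hudi.common.model.EmptyHoodieRecordPayload"


def delete_by_key_options(
    table_name: str,
    record_key_field: str,
    precombine_field: str,
    partition_field: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: Hudi write options for deleting rows by record key.

    The DataFrame written with these options should carry the record key,
    the precombine field and, for partitioned tables, the partition path
    field of each row to delete.
    """
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.payload.class": DELETE_PAYLOAD,
    }
    if partition_field is not None:
        options["hoodie.datasource.write.partitionpath.field"] = partition_field
    return options


if __name__ == "__main__":
    opts = delete_by_key_options("trips", "uuid", "ts", partition_field="city")
    # With PySpark (not run here; `spark` and `base_path` are placeholders):
    # keys_df = spark.createDataFrame([("id-42", 0, "sf")], ["uuid", "ts", "city"])
    # keys_df.write.format("hudi").options(**opts).mode("append").save(base_path)
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

Hudi also accepts `hoodie.datasource.write.operation=delete`, which removes the written keys without changing the payload class; the payload variant is shown because it is the one described in the comment.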

[GitHub] [hudi] hudi-bot commented on pull request #7690: [HUDI-5485] Add File System View API for batch listing and improve savepoint performance with metadata table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7690: URL: https://github.com/apache/hudi/pull/7690#issuecomment-1386503584 ## CI report: * ca9fb1e21c08ac0eb7dc6305934f1c58803070e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7677: [HUDI-5559] Support CDC for flink bounded source

2023-01-17 Thread GitBox
hudi-bot commented on PR #7677: URL: https://github.com/apache/hudi/pull/7677#issuecomment-1386503521 ## CI report: * c81f60f80a945dd2377e2fff4bc6207cc63ef576 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
hudi-bot commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1386503348 ## CI report: * fad68a69ecf7e7eba8d43307e1b0fa9da6244857 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386502581 ## CI report: * 13fb78850890b96b86b66d7df060feb11950ec0c UNKNOWN * 3d90e88fda205fd2cbf95c402a19b5bba2ebfa18 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1386502062 ## CI report: * ed783b49dbeec18cca93a9fe43f1c4f8ee9ae6dd UNKNOWN * a94346128d6b22fec262f74d7c2c9d7d342a0a3c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] danny0405 commented on issue #7683: [SUPPORT] Querying data using Trino only returns records of the latest commit, not all records.

2023-01-17 Thread GitBox
danny0405 commented on issue #7683: URL: https://github.com/apache/hudi/issues/7683#issuecomment-1386499692 I guess the metadata is out of sync for Trino. Do you use the Trino connector or Trino Hive?

[GitHub] [hudi] hudi-bot commented on pull request #7677: [HUDI-5559] Support CDC for flink bounded source

2023-01-17 Thread GitBox
hudi-bot commented on PR #7677: URL: https://github.com/apache/hudi/pull/7677#issuecomment-1386498457 ## CI report: * c81f60f80a945dd2377e2fff4bc6207cc63ef576 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] danny0405 commented on a diff in pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
danny0405 commented on code in PR #7684: URL: https://github.com/apache/hudi/pull/7684#discussion_r1073100473 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java: ## @@ -92,7 +92,7 @@ private static Schema getB

[GitHub] [hudi] danny0405 commented on a diff in pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
danny0405 commented on code in PR #7684: URL: https://github.com/apache/hudi/pull/7684#discussion_r1073100251 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java: ## @@ -60,7 +60,7 @@ protected Schema getBootstr

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386497208 ## CI report: * 13fb78850890b96b86b66d7df060feb11950ec0c UNKNOWN * 3d90e88fda205fd2cbf95c402a19b5bba2ebfa18 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1386496551 ## CI report: * ed783b49dbeec18cca93a9fe43f1c4f8ee9ae6dd UNKNOWN * a94346128d6b22fec262f74d7c2c9d7d342a0a3c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] danny0405 commented on issue #7688: [SUPPORT] Trying to write a glue job script for reflecting CDC delete (Data Pipelining Kinesis Streams to create Apache Hudi Table from AWS Glue Job

2023-01-17 Thread GitBox
danny0405 commented on issue #7688: URL: https://github.com/apache/hudi/issues/7688#issuecomment-1386495431 Thanks, we need a more detailed stack trace for the Hudi errors to triage the problem.

[GitHub] [hudi] danny0405 commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
danny0405 commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1386493383 Thanks @BalaMahesh, can we file a JIRA issue and change the commit title to: [HUDI-${JIRA_ID}] ${your commit title}

[GitHub] [hudi] danny0405 commented on issue #7689: [SUPPORT] PriorityBasedFileSystemView: Got error running preferred function. Trying secondary

2023-01-17 Thread GitBox
danny0405 commented on issue #7689: URL: https://github.com/apache/hudi/issues/7689#issuecomment-1386491862 The secondary view is a fallback used when the preferred view returns an error from the server; the secondary is usually a local view that scans the files in the local task, which is a costly operatio
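The preferred/secondary fallback described in this comment can be illustrated with a small Python sketch. Everything in it is hypothetical: Hudi's `PriorityBasedFileSystemView` is a Java class, and the names `PriorityBasedView` and `execute`, as well as the choice to keep using the secondary after the first failure, are assumptions made only to show the pattern of trying a remote (preferred) view first and lazily building the costly local (secondary) view on error.

```python
import logging
from typing import Callable, Generic, Optional, TypeVar

V = TypeVar("V")  # view type
R = TypeVar("R")  # result type

log = logging.getLogger("priority_view")


class PriorityBasedView(Generic[V]):
    """Hypothetical sketch: prefer a remote view, fall back to a lazily built local one."""

    def __init__(self, preferred: V, secondary_factory: Callable[[], V]) -> None:
        self._preferred = preferred
        # The local view scans files itself, which is costly, so build it on demand.
        self._secondary_factory = secondary_factory
        self._secondary: Optional[V] = None
        self._preferred_failed = False

    def _secondary_view(self) -> V:
        if self._secondary is None:
            self._secondary = self._secondary_factory()
        return self._secondary

    def execute(self, call: Callable[[V], R]) -> R:
        if not self._preferred_failed:
            try:
                return call(self._preferred)
            except Exception as exc:  # any error from the remote server triggers the fallback
                log.warning("Got error running preferred function. Trying secondary: %s", exc)
                self._preferred_failed = True  # assumption: stay on the secondary from now on
        return call(self._secondary_view())
```

Building the secondary lazily keeps the expensive local scan off the normal path; routing later calls straight to the secondary avoids paying a remote timeout on every request once the server has failed, at the cost of never returning to the remote view within this object's lifetime.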

[GitHub] [hudi] hudi-bot commented on pull request #7642: [HUDI-5534][Stacked on 6782] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7642: URL: https://github.com/apache/hudi/pull/7642#issuecomment-1386491820 ## CI report: * 01697615c3d88afaa15a59cad6d0c5548b295253 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7612: [HUDI-5336] Fixing log file pattern match to ignore extraneous files

2023-01-17 Thread GitBox
hudi-bot commented on PR #7612: URL: https://github.com/apache/hudi/pull/7612#issuecomment-1386491701 ## CI report: * 1dc0a0732953fa0b470054c828981e226803e8aa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] danny0405 commented on issue #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
danny0405 commented on issue #7691: URL: https://github.com/apache/hudi/issues/7691#issuecomment-1386486937 Okay, this seems to be a bug: Flink uses the constant namespace named 'record' when generating the Avro schema. Does that cause the incompatibility? Can you file a JIRA to address and fix this?
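To illustrate the kind of incompatibility discussed in this thread, here is a simplified Python sketch (not Hudi's or Avro's actual compatibility logic) that compares two Avro record schemas with identical fields but different full names. The record names `trips_record`, `hoodie.trips` and `record`, and the helpers `full_name`, `field_types` and `compatible`, are hypothetical. A name-sensitive check rejects the pair, while a check that skips the record name, as the HUDI-5572 title proposes for Flink writes, accepts it.

```python
import json
from typing import Any, Dict


def full_name(schema: Dict[str, Any]) -> str:
    """Avro full name of a named schema: 'namespace.name', or just 'name'."""
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


def field_types(schema: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to a canonical JSON rendering of its type."""
    return {f["name"]: json.dumps(f["type"], sort_keys=True) for f in schema["fields"]}


def compatible(writer: Dict[str, Any], reader: Dict[str, Any], check_name: bool = True) -> bool:
    """Simplified check: same fields with same types, plus (optionally) same full name."""
    if check_name and full_name(writer) != full_name(reader):
        return False
    return field_types(writer) == field_types(reader)


FIELDS = [{"name": "uuid", "type": "string"}, {"name": "ts", "type": "long"}]
# Hypothetical schema named after the table, as one engine might write it.
spark_side = {"type": "record", "name": "trips_record", "namespace": "hoodie.trips", "fields": FIELDS}
# Hypothetical schema generated with a constant record name.
flink_side = {"type": "record", "name": "record", "fields": FIELDS}

if __name__ == "__main__":
    print(compatible(spark_side, flink_side))                    # False: full names differ
    print(compatible(spark_side, flink_side, check_name=False))  # True: fields match
```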

[jira] [Assigned] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-5572: --- Assignee: HunterXHunter > Flink write need to skip check the compatibility of Schema#name > -

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7323: [HUDI-5276][WIP] Exclude unnecessary partiton paths return

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7323: URL: https://github.com/apache/hudi/pull/7323#discussion_r1073075839 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ## @@ -152,7 +153,8 @@ protected Option> getRecordByKey(String key, @Overr

[jira] [Updated] (HUDI-5276) Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5276: -- Description: When we query sql in hive like: select mainwaybillno, zonecode, accountantcode,

[jira] [Assigned] (HUDI-5276) Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5276: - Assignee: Alexey Kudinkin > Hudi getAllQueryPartitionPaths use regular match caused Inval

[jira] [Updated] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5572: Description: When we use spark to initialize the hudi table, .hoodie#hoodie.properties#hoodie.table

[jira] [Updated] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-5572: Attachment: image-2023-01-18-11-51-12-914.png > Flink write need to skip check the compatibility of

[jira] [Created] (HUDI-5572) Flink write need to skip check the compatibility of Schema#name

2023-01-17 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5572: --- Summary: Flink write need to skip check the compatibility of Schema#name Key: HUDI-5572 URL: https://issues.apache.org/jira/browse/HUDI-5572 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #7642: [HUDI-5534][Stacked on 6782] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7642: URL: https://github.com/apache/hudi/pull/7642#issuecomment-1386443421 ## CI report: * 094ff711d9518a04e93ab0ba28ed636827652d8c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1426

[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386442484 ## CI report: * Unknown: [CANCELED](TBD) * a7ece0e42ac674d75b035220f129e5c0892dbf05 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039

[GitHub] [hudi] hudi-bot commented on pull request #7642: [HUDI-5534][Stacked on 6782] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7642: URL: https://github.com/apache/hudi/pull/7642#issuecomment-1386439125 ## CI report: * 094ff711d9518a04e93ab0ba28ed636827652d8c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1426

[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386438102 ## CI report: * Unknown: [CANCELED](TBD) * a7ece0e42ac674d75b035220f129e5c0892dbf05 UNKNOWN Bot commands @hudi-bot supports the following commands: - `

[GitHub] [hudi] ThinkerLei commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
ThinkerLei commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386436685 @hudi-bot run azure re-run the last Azure build

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386433086 ## CI report: * 13fb78850890b96b86b66d7df060feb11950ec0c UNKNOWN * 3d90e88fda205fd2cbf95c402a19b5bba2ebfa18 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386432790 ## CI report: * d18a40d00cb6ff6c2ff2768b289c1435e3ceaa28 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1205

[GitHub] [hudi] ThinkerLei commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
ThinkerLei commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386415494 @hudi-bot run azure

[jira] [Updated] (HUDI-5535) Add support for keyless for all keygens(non partitioned, timestamp based key gen)

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5535: - Status: Patch Available (was: In Progress) > Add support for keyless for all keygens(non partitioned, tim

[jira] [Updated] (HUDI-5535) Add support for keyless for all keygens(non partitioned, timestamp based key gen)

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5535: - Reviewers: Alexey Kudinkin > Add support for keyless for all keygens(non partitioned, timestamp based key

[jira] [Updated] (HUDI-5535) Add support for keyless for all keygens(non partitioned, timestamp based key gen)

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5535: - Status: In Progress (was: Open) > Add support for keyless for all keygens(non partitioned, timestamp base

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Status: In Progress (was: Open) > Support to read avro from non-legacy map/list in parquet log

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Status: Patch Available (was: In Progress) > Support to read avro from non-legacy map/list in p

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Story Points: 1 > Support to read avro from non-legacy map/list in parquet log > --

[jira] [Updated] (HUDI-5537) Support partitionBy with dataframe apis

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5537: - Sprint: 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3) > Support partitionBy

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Reviewers: Alexey Kudinkin > Support to read avro from non-legacy map/list in parquet log > ---

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Summary: Support to read avro from non-legacy map/list in parquet log (was: support to read av

[jira] [Updated] (HUDI-5417) support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Sprint: 0.13.0 Final Sprint 3 > support to read avro from non-legacy map/list in parquet log >

[jira] [Updated] (HUDI-5417) support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Fix Version/s: 0.13.0 > support to read avro from non-legacy map/list in parquet log >

[jira] [Updated] (HUDI-5417) support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Priority: Blocker (was: Major) > support to read avro from non-legacy map/list in parquet log

[jira] [Assigned] (HUDI-5417) support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5417: - Assignee: Frank Wong > support to read avro from non-legacy map/list in parquet log > --

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5569: - Reviewers: sivabalan narayanan > Files written by first commit/delta commit if it failed is detected as va

[jira] [Updated] (HUDI-5475) not able to generate utilities-slim bundle dependency tree

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5475: - Reviewers: Raymond Xu > not able to generate utilities-slim bundle dependency tree > -

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5569: - Story Points: 2 > Files written by first commit/delta commit if it failed is detected as valid > data fil

[jira] [Updated] (HUDI-5555) Set class loader for parquet data block

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-: - Reviewers: Alexey Kudinkin > Set class loader for parquet data block > ---

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5323: Story Points: 1 (was: 0.5) > Decouple virtual key with writing bloom filters to parquet files > ---

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Story Points: 2 (was: 1) > Fix HiveHoodieTableFileIndex to use lazy listing > -

[jira] [Updated] (HUDI-5555) Set class loader for parquet data block

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-: - Story Points: 0.5 > Set class loader for parquet data block > --- > >

[jira] [Updated] (HUDI-5534) Optimize Bloom Index lookup DAG

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5534: -- Story Points: 3 (was: 2) > Optimize Bloom Index lookup DAG > --- >

[jira] [Closed] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-4586. - Resolution: Fixed > Address S3 timeouts in Bloom Index with metadata table > -

[jira] [Updated] (HUDI-5384) Make sure predicates are appropriately pushed down to HoodieFileIndex when lazy listing

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5384: -- Story Points: 1 (was: 2) > Make sure predicates are appropriately pushed down to HoodieFileInde

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5552: -- Reviewers: Alexey Kudinkin (was: Alexey Kudinkin) > Too slow while using trino-hudi connector w

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Story Points: 1 (was: 0.5) > Improve performance of savepoint with MDT > --

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Reviewers: sivabalan narayanan > Improve performance of savepoint with MDT > ---

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5552: - Reviewers: Alexey Kudinkin > Too slow while using trino-hudi connector while querying partitioned tables.

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5552: - Story Points: 2 > Too slow while using trino-hudi connector while querying partitioned tables. > -

[jira] [Assigned] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-5552: Assignee: Ethan Guo (was: Alexey Kudinkin) > Too slow while using trino-hudi connector while query

[jira] [Updated] (HUDI-5443) Fix exception when querying MOR table after applying NestedSchemaPruning optimization

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5443: -- Story Points: 3 (was: 4) > Fix exception when querying MOR table after applying NestedSchemaPru

[jira] [Updated] (HUDI-5570) Test scenario for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Summary: Test scenario for failed compaction retried w/ MDT able to serve just the requi

[jira] [Updated] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5571: -- Epic Link: HUDI-4699 Story Points: 2 > Add support for keyless for all keygens(no

[jira] [Updated] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5571: -- Fix Version/s: 0.13.0 > Add support for keyless for all keygens(non partitioned, timesta

[jira] [Assigned] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5571: - Assignee: Lokesh Jain > Add support for keyless for all keygens(non partitioned,

[jira] [Updated] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5571: -- Sprint: 0.13.0 Final Sprint 3 > Add support for keyless for all keygens(non partitioned,

[jira] [Updated] (HUDI-5498) Update docs for reading Hudi tables on Databricks runtime

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5498: Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint, 0.13.0

[jira] [Updated] (HUDI-5535) Add support for keyless for all keygens(non partitioned, timestamp based key gen)

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5535: -- Story Points: 2 (was: 3) > Add support for keyless for all keygens(non partitioned, tim

[jira] [Updated] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5571: -- Priority: Blocker (was: Critical) > Add support for keyless for all keygens(non partiti

[jira] [Updated] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5571: -- Priority: Critical (was: Major) > Add support for keyless for all keygens(non partition

[jira] [Assigned] (HUDI-5535) Add support for keyless for all keygens(non partitioned, timestamp based key gen)

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5535: - Assignee: sivabalan narayanan (was: Lokesh Jain) > Add support for keyless for a

[jira] [Created] (HUDI-5571) Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer

2023-01-17 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5571: - Summary: Add support for keyless for all keygens(non partitioned, timestamp based key gen) row writer Key: HUDI-5571 URL: https://issues.apache.org/jira/browse/HUDI-557

[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3517: -- Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint, 0.13.0 Fi

[jira] [Updated] (HUDI-4700) RFC for primary key-less data model

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4700: -- Sprint: 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3) > RF

[jira] [Updated] (HUDI-5520) Fail MDT when list of log files grows unboundedly

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5520: -- Story Points: 2 (was: 1) > Fail MDT when list of log files grows unboundedly >

[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3636: -- Story Points: 1 (was: 0) > Clustering fails due to marker creation failure > --

[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5392: -- Story Points: 1 (was: 2) > Fix Bootstrap files reader to configure arrays to be read in the new
