[GitHub] [hudi] Mulavar commented on pull request #8529: [HUDI-6120]filter base file when there is only one file slice fetched

2023-04-26 Thread via GitHub
Mulavar commented on PR #8529: URL: https://github.com/apache/hudi/pull/8529#issuecomment-1524761536 > Add some notion like this: > > ```java > CAUTION: the method requires that all the file slices must only contain log files. > ``` Copy that, and just to make sure, we

[GitHub] [hudi] FredMkl opened a new issue, #6591: [SUPPORT]Duplicate records in MOR

2023-04-26 Thread via GitHub
FredMkl opened a new issue, #6591: URL: https://github.com/apache/hudi/issues/6591 **Describe the problem you faced** We use a MOR table, and we found that updating an existing set of rows to another partition results in both a) a generated parquet file and b) an update written to a log

[GitHub] [hudi] codope commented on issue #6591: [SUPPORT]Duplicate records in MOR

2023-04-26 Thread via GitHub
codope commented on issue #6591: URL: https://github.com/apache/hudi/issues/6591#issuecomment-1524756517 Reopening to validate against master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] prashantwason commented on a diff in pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-26 Thread via GitHub
prashantwason commented on code in PR #8523: URL: https://github.com/apache/hudi/pull/8523#discussion_r1178653357 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java: ## @@ -306,45 +306,43 @@ private void scanInternalV1(Option

[GitHub] [hudi] hudi-bot commented on pull request #8585: [HUDI-8585]Improve documentation of org.apache.hudi.common.table.view.Abstr…

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8585: URL: https://github.com/apache/hudi/pull/8585#issuecomment-1524750802 ## CI report: * ef0b65c6471448ba86899c587618e60a6377d3c8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #8493: [HUDI-6098] Use bulk insert prepped for the initial write into MDT.

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8493: URL: https://github.com/apache/hudi/pull/8493#issuecomment-1524750228 ## CI report: * 6d9d24f2e0ab70b97fb912505f2d0da60dfea86f Azure:

[GitHub] [hudi] prashantwason commented on pull request #8493: [HUDI-6098] Use bulk insert prepped for the initial write into MDT.

2023-04-26 Thread via GitHub
prashantwason commented on PR #8493: URL: https://github.com/apache/hudi/pull/8493#issuecomment-1524745657 @hudi-bot run azure

[GitHub] [hudi] codope commented on issue #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table.

2023-04-26 Thread via GitHub
codope commented on issue #7733: URL: https://github.com/apache/hudi/issues/7733#issuecomment-1524743450 Reopening to validate the fix.

[GitHub] [hudi] BalaMahesh opened a new issue, #7733: [SUPPORT] Duplicate rows found in Hudi non partitioned table.

2023-04-26 Thread via GitHub
BalaMahesh opened a new issue, #7733: URL: https://github.com/apache/hudi/issues/7733 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

[GitHub] [hudi] gtwuser opened a new issue, #6869: [SUPPORT] Incremental upsert or merge is not working

2023-04-26 Thread via GitHub
gtwuser opened a new issue, #6869: URL: https://github.com/apache/hudi/issues/6869 A clear and concise description of the problem. Hudi merge is not working as described in the hudi docs. On saving a record with few fields updated, it is saved as a new record instead of being merged

[GitHub] [hudi] codope commented on issue #6869: [SUPPORT] Incremental upsert or merge is not working

2023-04-26 Thread via GitHub
codope commented on issue #6869: URL: https://github.com/apache/hudi/issues/6869#issuecomment-1524739582 Reopening to validate the fix.

[GitHub] [hudi] PaddyMelody opened a new pull request, #8585: [DOC]Improve documentation of org.apache.hudi.common.table.view.Abstr…

2023-04-26 Thread via GitHub
PaddyMelody opened a new pull request, #8585: URL: https://github.com/apache/hudi/pull/8585 …actTableFileSystemView ### Change Logs This PR improved org.apache.hudi.common.table.view.AbstractTableFileSystemView header comments ### Impact Add the br tag to make it

[jira] [Commented] (HUDI-6057) Support Flink 1.17

2023-04-26 Thread Prabhu Joseph (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716982#comment-17716982 ] Prabhu Joseph commented on HUDI-6057: - Thanks [~danny0405] and [~rchertara] for the review and commit.

[jira] [Created] (HUDI-6143) Use the startoffset of each logfile recorded in the deltacommit metadata to read the logfile

2023-04-26 Thread lei w (Jira)
lei w created HUDI-6143: --- Summary: Use the startoffset of each logfile recorded in the deltacommit metadata to read the logfile Key: HUDI-6143 URL: https://issues.apache.org/jira/browse/HUDI-6143 Project:

[GitHub] [hudi] hudi-bot commented on pull request #8583: [MINOR] Propagate failed cleaner status

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8583: URL: https://github.com/apache/hudi/pull/8583#issuecomment-1524677684 ## CI report: * b0ddedb3fad1942ab4acca2910832900e5953977 Azure:

[GitHub] [hudi] tpcross opened a new issue, #8584: [SUPPORT] Spark SQL query FileNotFoundException using cleaner policy KEEP_LATEST_BY_HOURS

2023-04-26 Thread via GitHub
tpcross opened a new issue, #8584: URL: https://github.com/apache/hudi/issues/8584 **Describe the problem you faced** Hello, having an issue with the cleaning retention policy Spark SQL query FileNotFoundException when querying Hudi table using cleaner policy

[GitHub] [hudi] nsivabalan commented on issue #5777: [SUPPORT] Hudi table has duplicate data.

2023-04-26 Thread via GitHub
nsivabalan commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1524667071 Here are the reasons why we might see duplicates. So far, I could not pinpoint any of them for your use case, but if you can find anything resembling your use case, let us know.

[GitHub] [hudi] rohan-uptycs commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
rohan-uptycs commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1524640903 > @rohan-uptycs, please rebase master branch instead of merge. Sure

[GitHub] [hudi] nsivabalan commented on issue #5777: [SUPPORT] Hudi table has duplicate data.

2023-04-26 Thread via GitHub
nsivabalan commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1524637082 Hey @jjtjiang: for the sample data you have provided, what is the record key field and what is the partition path field? We are taking a detailed look at all data consistency issues.

[GitHub] [hudi] hbgstc123 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-26 Thread via GitHub
hbgstc123 commented on code in PR #8546: URL: https://github.com/apache/hudi/pull/8546#discussion_r1178594690 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -97,6 +97,11 @@ public void open(Configuration

[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8478: URL: https://github.com/apache/hudi/pull/8478#issuecomment-1524606488 ## CI report: * d169a9929d6c8dcf1ce3b350687592ad6e12314f Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-26 Thread via GitHub
hudi-bot commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1524604127 ## CI report: * 85b25f5cda4ccd8189a1607259e1732a910c3262 UNKNOWN * c926ead8318c18cf0ead9122c9292dd22036d05c Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1524593534 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * b061571b7738370a1bdb64d2dd9cf5220b309d6d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8478: URL: https://github.com/apache/hudi/pull/8478#issuecomment-1524593285 ## CI report: * d169a9929d6c8dcf1ce3b350687592ad6e12314f Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-26 Thread via GitHub
hudi-bot commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1524590779 ## CI report: * 85b25f5cda4ccd8189a1607259e1732a910c3262 UNKNOWN * c926ead8318c18cf0ead9122c9292dd22036d05c Azure:

[GitHub] [hudi] SteNicholas commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1524567428 @rohan-uptycs, please rebase master branch instead of merge.

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1178579855 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] danny0405 commented on a diff in pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-26 Thread via GitHub
danny0405 commented on code in PR #8546: URL: https://github.com/apache/hudi/pull/8546#discussion_r1178572693 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -97,6 +97,11 @@ public void open(Configuration

[GitHub] [hudi] danny0405 closed pull request #8501: [HUDI-6103] Validate required columns when fetching required positions

2023-04-26 Thread via GitHub
danny0405 closed pull request #8501: [HUDI-6103] Validate required columns when fetching required positions URL: https://github.com/apache/hudi/pull/8501

[GitHub] [hudi] danny0405 commented on pull request #8501: [HUDI-6103] Validate required columns when fetching required positions

2023-04-26 Thread via GitHub
danny0405 commented on PR #8501: URL: https://github.com/apache/hudi/pull/8501#issuecomment-1524529704 Close because it is invalid.

[jira] [Closed] (HUDI-6127) Flink Hudi Support Commit on empty batch

2023-04-26 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6127. Resolution: Fixed Fixed via master branch: 1b02a492c5ab34a48fb63023641b022ae5ea4c1e > Flink Hudi Support

[jira] [Updated] (HUDI-6127) Flink Hudi Support Commit on empty batch

2023-04-26 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6127: - Fix Version/s: 0.14.0 > Flink Hudi Support Commit on empty batch >

[hudi] branch master updated (d43730a65d1 -> 1b02a492c5a)

2023-04-26 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from d43730a65d1 [HUDI-6019] support config minPartitions when reading from kafka (#8376) add 1b02a492c5a

[GitHub] [hudi] danny0405 merged pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-26 Thread via GitHub
danny0405 merged PR #8550: URL: https://github.com/apache/hudi/pull/8550

[GitHub] [hudi] danny0405 commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-26 Thread via GitHub
danny0405 commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1524523790 The failed test case `ITTestHoodieDataSource#testStreamWriteReadSkippingCompaction` should not be affected by this patch (the flag defaults to false); I also ran it locally about 10 times and

[jira] [Updated] (HUDI-6086) Improve HiveSchemaUtil#generateCreateDDL With StringBuilder.

2023-04-26 Thread Shilun Fan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HUDI-6086: - Summary: Improve HiveSchemaUtil#generateCreateDDL With StringBuilder. (was: Improve

[GitHub] [hudi] hudi-bot commented on pull request #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8582: URL: https://github.com/apache/hudi/pull/8582#issuecomment-1524518313 ## CI report: * 0e206a09ca21ee70ae2093a02fad349eb6b56eac Azure:

[GitHub] [hudi] slfan1989 commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With ST

2023-04-26 Thread via GitHub
slfan1989 commented on PR #8478: URL: https://github.com/apache/hudi/pull/8478#issuecomment-1524516978 @danny0405 Can you help review this PR again? Thank you very much!

[GitHub] [hudi] hudi-bot commented on pull request #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8582: URL: https://github.com/apache/hudi/pull/8582#issuecomment-1524505235 ## CI report: * 0e206a09ca21ee70ae2093a02fad349eb6b56eac Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8582: URL: https://github.com/apache/hudi/pull/8582#issuecomment-1524489969 ## CI report: * 0e206a09ca21ee70ae2093a02fad349eb6b56eac Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1524489726 ## CI report: * 563e10e0492a8194d789772de6bb9ced9f8c0721 UNKNOWN * 7f3f4aa438927aa50346ac5dbbb38f3e5241135d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1524488845 ## CI report: * 3e9388ee9a6edaa6caab4f738b093f82744bc7dc Azure:

[hudi] branch master updated: [HUDI-6019] support config minPartitions when reading from kafka (#8376)

2023-04-26 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d43730a65d1 [HUDI-6019] support config

[GitHub] [hudi] bvaradar merged pull request #8376: [HUDI-6019] support config minPartitions when reading from kafka

2023-04-26 Thread via GitHub
bvaradar merged PR #8376: URL: https://github.com/apache/hudi/pull/8376

[GitHub] [hudi] danny0405 commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-26 Thread via GitHub
danny0405 commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1524476805 @hudi-bot run azure

[jira] [Updated] (HUDI-6057) Support Flink 1.17

2023-04-26 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6057: - Fix Version/s: 0.14.0 > Support Flink 1.17 > -- > > Key: HUDI-6057 >

[jira] [Closed] (HUDI-6057) Support Flink 1.17

2023-04-26 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6057. Resolution: Fixed Fixed via master branch: 2ef682012b8df71a69288af5ad21609aeb73f023 > Support Flink 1.17 >

[hudi] branch master updated (0894279a0b7 -> 2ef682012b8)

2023-04-26 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 0894279a0b7 [HUDI-6124] Optimize exception message in HoodieCatalogTable (#8543) add 2ef682012b8 [HUDI-6057]

[GitHub] [hudi] danny0405 merged pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-26 Thread via GitHub
danny0405 merged PR #8512: URL: https://github.com/apache/hudi/pull/8512

[GitHub] [hudi] stream2000 commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-26 Thread via GitHub
stream2000 commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1524450157 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1524381631 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #8575: [MINOR] Prevent nullptr exception if enum config class has extra fields

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8575: URL: https://github.com/apache/hudi/pull/8575#issuecomment-1524284374 ## CI report: * ba47591a4088f7a17d922438f9537a9fdf657be7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8550: [HUDI-6127]Flink Hudi Write support commit on an empty batch

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8550: URL: https://github.com/apache/hudi/pull/8550#issuecomment-1524284179 ## CI report: * 563e10e0492a8194d789772de6bb9ced9f8c0721 UNKNOWN * 7f3f4aa438927aa50346ac5dbbb38f3e5241135d Azure:

[hudi] branch master updated (c332c60ad7b -> 0894279a0b7)

2023-04-26 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from c332c60ad7b [HUDI-6131] Refactor getWritePathsOfInstants in Flink WriteProfiles (#8556) add 0894279a0b7

[GitHub] [hudi] bvaradar merged pull request #8543: [HUDI-6124] Optimize exception message in HoodieCatalogTable

2023-04-26 Thread via GitHub
bvaradar merged PR #8543: URL: https://github.com/apache/hudi/pull/8543

[GitHub] [hudi] bvaradar commented on a diff in pull request #8376: [HUDI-6019] support config minPartitions when reading from kafka

2023-04-26 Thread via GitHub
bvaradar commented on code in PR #8376: URL: https://github.com/apache/hudi/pull/8376#discussion_r1178488660 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/helpers/TestCheckpointUtils.java: ## @@ -57,63 +58,191 @@ public void testStringToOffsets() { @Test

[GitHub] [hudi] hudi-bot commented on pull request #8514: [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8514: URL: https://github.com/apache/hudi/pull/8514#issuecomment-1524183930 ## CI report: * 524a060f0e5fcf31d43cc538dbc80004d64c9b52 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8520: [HUDI-6115] Hardening expectation of corruptRecordColumn in ChainedTransformer.

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8520: URL: https://github.com/apache/hudi/pull/8520#issuecomment-1524122758 ## CI report: * 56550641906350ac03288ad2f23fff15ade2e9da Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8491: [HUDI-6095] Refactor the judgment condition of WorkloadProfile

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8491: URL: https://github.com/apache/hudi/pull/8491#issuecomment-1524081640 ## CI report: * bcd54355f02696b50cd3998e8cc93f5e64cfc338 UNKNOWN * 568d9d96e86a5a65cb6d1ffa996e28262863a5a1 Azure:

[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-04-26 Thread via GitHub
sydneyhoran commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1524068006 My team is trying to develop a Custom Transformer class that can skip over null (tombstone) records from PostgresDebezium Kafka Source to address this. We are attempting along the

[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-04-26 Thread via GitHub
sydneyhoran commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1524065639 Just as an update, we were able to set tombstones.on.delete to `false` in a lower environment and still got the following error after a delete op:

[GitHub] [hudi] hudi-bot commented on pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8512: URL: https://github.com/apache/hudi/pull/8512#issuecomment-1524022293 ## CI report: * b60243c20444b113f0725dfe4d3ee235e7f22746 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-26 Thread via GitHub
hudi-bot commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1524003196 ## CI report: * 85b25f5cda4ccd8189a1607259e1732a910c3262 UNKNOWN * c926ead8318c18cf0ead9122c9292dd22036d05c Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8583: [MINOR] Propagate failed cleaner status

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8583: URL: https://github.com/apache/hudi/pull/8583#issuecomment-1523944860 ## CI report: * b0ddedb3fad1942ab4acca2910832900e5953977 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8583: [MINOR] Propagate failed cleaner status

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8583: URL: https://github.com/apache/hudi/pull/8583#issuecomment-1523936127 ## CI report: * b0ddedb3fad1942ab4acca2910832900e5953977 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #8546: [MINOR] Add log in flink compact/cluster commit sink for troubleshoot…

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8546: URL: https://github.com/apache/hudi/pull/8546#issuecomment-1523927396 ## CI report: * 8bf018b22a413e93fb8773605166eede0cf0da62 Azure:

[GitHub] [hudi] soumilshah1995 commented on issue #8400: [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video

2023-04-26 Thread via GitHub
soumilshah1995 commented on issue #8400: URL: https://github.com/apache/hudi/issues/8400#issuecomment-1523923026 Thanks

[GitHub] [hudi] haggy commented on pull request #5965: [MINOR] Propagate cleaner exceptions

2023-04-26 Thread via GitHub
haggy commented on PR #5965: URL: https://github.com/apache/hudi/pull/5965#issuecomment-1523896930 New PR here: https://github.com/apache/hudi/pull/8583

[GitHub] [hudi] haggy opened a new pull request, #8583: [MINOR] Propagate failed cleaner status

2023-04-26 Thread via GitHub
haggy opened a new pull request, #8583: URL: https://github.com/apache/hudi/pull/8583 ### Change Logs PR Cloned from: https://github.com/apache/hudi/pull/5965 This PR changes the `HoodieCleaner` utility so that errors will propagate to spark + YARN (or whatever RM you are using).

[GitHub] [hudi] kazdy commented on issue #8502: [SUPPORT] Does spark.sql("MERGE INTO") supports schema evolution write option

2023-04-26 Thread via GitHub
kazdy commented on issue #8502: URL: https://github.com/apache/hudi/issues/8502#issuecomment-1523888006 @ad1happy2go Not sure if this is really something blocked by spark sql parser, as an example Delta Lake supports schema evolution in MERGE INTO (both for partial updates as well as

[GitHub] [hudi] haggy closed pull request #5965: [MINOR] Propagate cleaner exceptions

2023-04-26 Thread via GitHub
haggy closed pull request #5965: [MINOR] Propagate cleaner exceptions URL: https://github.com/apache/hudi/pull/5965

[GitHub] [hudi] haggy commented on pull request #5965: [MINOR] Propagate cleaner exceptions

2023-04-26 Thread via GitHub
haggy commented on PR #5965: URL: https://github.com/apache/hudi/pull/5965#issuecomment-1523885282 Mis-push. This is a very old PR so I'm just going to close it and re-open with a fresh branch. Sorry all.

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1523879004 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * 56040691bc99ee34cdeb4e5bc758abe0ba9f7711 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8501: [HUDI-6103] Validate required columns when fetching required positions

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8501: URL: https://github.com/apache/hudi/pull/8501#issuecomment-1523878950 ## CI report: * bb329b30e88f2b1a84418ddf451839dab7bb7948 Azure:

[GitHub] [hudi] PhantomHunt commented on issue #8572: [SUPPORT] Getting java.io.FileNotFoundException when reading MOR table.

2023-04-26 Thread via GitHub
PhantomHunt commented on issue #8572: URL: https://github.com/apache/hudi/issues/8572#issuecomment-1523877536 > Generally, an incremental query will work only if the cleaner has not run. For example, if you have 100 commits in your timeline and the cleaner has cleaned up the data pertaining to the first 25

[GitHub] [hudi] haggy commented on a diff in pull request #5965: [MINOR] Propagate cleaner exceptions

2023-04-26 Thread via GitHub
haggy commented on code in PR #5965: URL: https://github.com/apache/hudi/pull/5965#discussion_r1178252458 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java: ## @@ -106,12 +106,7 @@ public static void main(String[] args) { String dirName = new

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1523869894 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * 56040691bc99ee34cdeb4e5bc758abe0ba9f7711 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8575: [MINOR] Prevent nullptr exception if enum config class has extra fields

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8575: URL: https://github.com/apache/hudi/pull/8575#issuecomment-1523861124 ## CI report: * ba47591a4088f7a17d922438f9537a9fdf657be7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8575: [MINOR] Prevent nullptr exception if enum config class has extra fields

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8575: URL: https://github.com/apache/hudi/pull/8575#issuecomment-1523810547 ## CI report: * ba47591a4088f7a17d922438f9537a9fdf657be7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] voonhous commented on a diff in pull request #8501: [HUDI-6103] Validate required columns when fetching required positions

2023-04-26 Thread via GitHub
voonhous commented on code in PR #8501: URL: https://github.com/apache/hudi/pull/8501#discussion_r1178169075 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java: ## @@ -401,4 +408,35 @@ private static void

[GitHub] [hudi] voonhous commented on pull request #8579: [MINOR] Added docs for gotchas when using PartialUpdateAvroPayload

2023-04-26 Thread via GitHub
voonhous commented on PR #8579: URL: https://github.com/apache/hudi/pull/8579#issuecomment-1523770585 @danny0405 Yep, this is not a bug, but it will affect the snapshot query results. Also, MOR tables are more susceptible to encountering such issues (when performing a query on the _rt

[GitHub] [hudi] hudi-bot commented on pull request #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8582: URL: https://github.com/apache/hudi/pull/8582#issuecomment-1523733364 ## CI report: * 0e206a09ca21ee70ae2093a02fad349eb6b56eac Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8582: URL: https://github.com/apache/hudi/pull/8582#issuecomment-1523720344 ## CI report: * 0e206a09ca21ee70ae2093a02fad349eb6b56eac UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2023-04-26 Thread via GitHub
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1523716865 ## CI report: * 744a515f20bf5611d649a2a502662799929779a3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1523703148 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #8076: [HUDI-5884] Support bulk_insert for insert_overwrite and insert_overwrite_table

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8076: URL: https://github.com/apache/hudi/pull/8076#issuecomment-1523701947 ## CI report: * 6a239ada8998fd440f19c0082b26d206ed589870 UNKNOWN * 1fadedfb975375bba6571e7ecf51de55d7e8dae2 Azure:

[GitHub] [hudi] huangxiaopingRD opened a new pull request, #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-26 Thread via GitHub
huangxiaopingRD opened a new pull request, #8582: URL: https://github.com/apache/hudi/pull/8582 ### Change Logs extract the same code ### Impact No ### Risk level (write none, low medium or high below) None ### Documentation Update ###

[jira] [Updated] (HUDI-6142) Refactor the code related to creating user-defined index

2023-04-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6142: - Labels: pull-request-available (was: ) > Refactor the code related to creating user-defined

[jira] [Created] (HUDI-6142) Refactor the code related to creating user-defined index

2023-04-26 Thread xiaoping.huang (Jira)
xiaoping.huang created HUDI-6142: Summary: Refactor the code related to creating user-defined index Key: HUDI-6142 URL: https://issues.apache.org/jira/browse/HUDI-6142 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1523623798 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1523623222 ## CI report: * 3e9388ee9a6edaa6caab4f738b093f82744bc7dc Azure:

[GitHub] [hudi] PrabhuJoseph commented on a diff in pull request #8512: [HUDI-6057] Support Flink 1.17

2023-04-26 Thread via GitHub
PrabhuJoseph commented on code in PR #8512: URL: https://github.com/apache/hudi/pull/8512#discussion_r1178045112 ## pom.xml: ## @@ -2373,9 +2374,23 @@ + + flink1.17 + Review Comment: Done. -- This is an automated message from the Apache

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-26 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1523610285 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN *

[GitHub] [hudi] nsivabalan commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

2023-04-26 Thread via GitHub
nsivabalan commented on issue #8500: URL: https://github.com/apache/hudi/issues/8500#issuecomment-1523567035 since we have a tracking ticket, can we go ahead and close the GitHub issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [hudi] nsivabalan closed issue #8544: [SUPPORT] Support rate limit when reading Hudi table

2023-04-26 Thread via GitHub
nsivabalan closed issue #8544: [SUPPORT] Support rate limit when reading Hudi table URL: https://github.com/apache/hudi/issues/8544 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] nsivabalan commented on issue #8544: [SUPPORT] Support rate limit when reading Hudi table

2023-04-26 Thread via GitHub
nsivabalan commented on issue #8544: URL: https://github.com/apache/hudi/issues/8544#issuecomment-1523559602 since this is a feature request, we can probably go ahead and close the GitHub issues. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [hudi] nsivabalan commented on issue #8580: Hudi reads failing on upgrade to Hudi 0.12.2 from Hudi 0.11.1

2023-04-26 Thread via GitHub
nsivabalan commented on issue #8580: URL: https://github.com/apache/hudi/issues/8580#issuecomment-1523549310 yes. unfortunately, the readers have to be upgraded before you can upgrade the writers. this is a known limitation. -- This is an automated message from the Apache Git Service. To

[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-04-26 Thread via GitHub
zhuanshenbsj1 commented on code in PR #8505: URL: https://github.com/apache/hudi/pull/8505#discussion_r1177963180 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java: ## @@ -269,6 +269,7 @@ private int doCompact(JavaSparkContext jsc) throws Exception

[GitHub] [hudi] nsivabalan commented on issue #8576: [SUPPORT] Doubt about handling old data arrival in hudi

2023-04-26 Thread via GitHub
nsivabalan commented on issue #8576: URL: https://github.com/apache/hudi/issues/8576#issuecomment-1523546055 you can also refer to https://medium.com/@simpsons/curious-case-of-defaulthoodierecordpayload-vs-default-payload-class-in-hudi-efbfa423c48e for some details on these two payloads.

[GitHub] [hudi] nsivabalan commented on issue #8576: [SUPPORT] Doubt about handling old data arrival in hudi

2023-04-26 Thread via GitHub
nsivabalan commented on issue #8576: URL: https://github.com/apache/hudi/issues/8576#issuecomment-1523544854 if you strictly wish to honor the ordering field, you might have to use
