[jira] [Created] (HUDI-1758) Flink insert command does not update the record
Nishith Agarwal created HUDI-1758: - Summary: Flink insert command does not update the record Key: HUDI-1758 URL: https://issues.apache.org/jira/browse/HUDI-1758 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: Nishith Agarwal Assignee: Nishith Agarwal !image (1).png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1758) Flink insert command does not update the record
[ https://issues.apache.org/jira/browse/HUDI-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1758: -- Description: !image (1).png! Followed the steps mentioned in [https://hudi.apache.org/docs/flink-quick-start-guide.html] but the second insert command that is supposed to perform an `update` did not update the record. [~danny0405] Would you be able to help here? was:!image (1).png! > Flink insert command does not update the record > --- > > Key: HUDI-1758 > URL: https://issues.apache.org/jira/browse/HUDI-1758 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > > !image (1).png! > > Followed the steps mentioned in > [https://hudi.apache.org/docs/flink-quick-start-guide.html] > but the second insert command that is supposed to perform an `update` did not > update the record. > > [~danny0405] Would you be able to help here?
[jira] [Updated] (HUDI-1758) Flink insert command does not update the record
[ https://issues.apache.org/jira/browse/HUDI-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1758: -- Description: !Screen Shot 2021-04-02 at 12.10.08 AM.png! Followed the steps mentioned in [https://hudi.apache.org/docs/flink-quick-start-guide.html] but the second insert command that is supposed to perform an `update` did not update the record. [~danny0405] Would you be able to help here? was: !image (1).png! Followed the steps mentioned in [https://hudi.apache.org/docs/flink-quick-start-guide.html] but the second insert command that is supposed to perform an `update` did not update the record. [~danny0405] Would you be able to help here? > Flink insert command does not update the record > --- > > Key: HUDI-1758 > URL: https://issues.apache.org/jira/browse/HUDI-1758 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > > !Screen Shot 2021-04-02 at 12.10.08 AM.png! > > Followed the steps mentioned in > [https://hudi.apache.org/docs/flink-quick-start-guide.html] > but the second insert command that is supposed to perform an `update` did not > update the record. > > [~danny0405] Would you be able to help here?
[jira] [Updated] (HUDI-1758) Flink insert command does not update the record
[ https://issues.apache.org/jira/browse/HUDI-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1758: -- Description: [^Screen Shot 2021-04-02 at 12.10.08 AM.zip] Followed the steps mentioned in [https://hudi.apache.org/docs/flink-quick-start-guide.html] but the second insert command that is supposed to perform an `update` did not update the record. [~danny0405] Would you be able to help here? was: !Screen Shot 2021-04-02 at 12.10.08 AM.png! Followed the steps mentioned in [https://hudi.apache.org/docs/flink-quick-start-guide.html] but the second insert command that is supposed to perform an `update` did not update the record. [~danny0405] Would you be able to help here? > Flink insert command does not update the record > --- > > Key: HUDI-1758 > URL: https://issues.apache.org/jira/browse/HUDI-1758 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > > [^Screen Shot 2021-04-02 at 12.10.08 AM.zip] > > Followed the steps mentioned in > [https://hudi.apache.org/docs/flink-quick-start-guide.html] > but the second insert command that is supposed to perform an `update` did not > update the record. > > [~danny0405] Would you be able to help here?
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on a change in pull request #2666: URL: https://github.com/apache/hudi/pull/2666#discussion_r606108625 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -106,7 +110,7 @@ public HoodieMergeHandle(HoodieWriteConfig config, String instantTime, HoodieTable hoodieTable, Iterator> recordItr, String partitionPath, String fileId, TaskContextSupplier taskContextSupplier) { -super(config, instantTime, partitionPath, fileId, hoodieTable, taskContextSupplier); +super(config, instantTime, partitionPath, fileId, hoodieTable, getWriterSchemaIncludingAndExcludingMetadataPair(config, hoodieTable), taskContextSupplier); Review comment: I found through debugging that when config.setLastSchema is updated successfully, table.getConfig.getLastSchema is also updated; they both take their values from the same properties. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on a change in pull request #2666: URL: https://github.com/apache/hudi/pull/2666#discussion_r606108825 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; +import org.apache.hudi.common.util.Option; + +import java.io.IOException; +import java.util.List; + +public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload { Review comment: OK, I will add it.
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on a change in pull request #2666: URL: https://github.com/apache/hudi/pull/2666#discussion_r606116196 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; +import org.apache.hudi.common.util.Option; + +import java.io.IOException; +import java.util.List; + +public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload { + public PartialUpdatePayload(GenericRecord record, Comparable orderingVal) { +super(record, orderingVal); + } + + public PartialUpdatePayload(Option record) { +this(record.get(), (record1) -> 0); // natural order + } + + @Override + public Option combineAndGetUpdateValue(IndexedRecord lastValue, Schema schema) throws IOException { Review comment: > schema here refers to incoming partial schema right? yes
[jira] [Commented] (HUDI-1657) build failed on AArch64, Fedora 33
[ https://issues.apache.org/jira/browse/HUDI-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313672#comment-17313672 ] shenjinxin commented on HUDI-1657: -- I also encounter the same problem. My Java is JDK 1.8_281 > build failed on AArch64, Fedora 33 > --- > > Key: HUDI-1657 > URL: https://issues.apache.org/jira/browse/HUDI-1657 > Project: Apache Hudi > Issue Type: Bug >Reporter: Lutz Weischer >Priority: Major > Labels: sev:triage, user-support-issues > > [jw@cn05 hudi]$ mvn package -DskipTests > [INFO] Scanning for projects... > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-java-client:jar:0.8.0-SNAPSHOT > [WARNING] The expression ${parent.version} is deprecated. Please use > ${project.parent.version} instead. > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-spark-client:jar:0.8.0-SNAPSHOT > [WARNING] The expression ${parent.version} is deprecated. Please use > ${project.parent.version} instead. > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-flink-client:jar:0.8.0-SNAPSHOT > [WARNING] The expression ${parent.version} is deprecated. Please use > ${project.parent.version} instead. > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-spark_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. @ > org.apache.hudi:hudi-spark_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark/pom.xml, line 26, > column 15 > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-spark2_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. 
@ > org.apache.hudi:hudi-spark2_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark2/pom.xml, line 24, > column 15 > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-utilities_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. @ > org.apache.hudi:hudi-utilities_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/hudi-utilities/pom.xml, line 26, column 15 > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-spark-bundle_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. @ > org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/packaging/hudi-spark-bundle/pom.xml, line 26, column 15 > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. @ > org.apache.hudi:hudi-utilities-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/packaging/hudi-utilities-bundle/pom.xml, line 26, column > 15 > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-flink_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. @ > org.apache.hudi:hudi-flink_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/hudi-flink/pom.xml, line 28, column 15 > [WARNING] > [WARNING] Some problems were encountered while building the effective model > for org.apache.hudi:hudi-flink-bundle_2.11:jar:0.8.0-SNAPSHOT > [WARNING] 'artifactId' contains an expression but should be a constant. 
@ > org.apache.hudi:hudi-flink-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, > /home/jw/apache/hudi/packaging/hudi-flink-bundle/pom.xml, line 28, column 15 > [WARNING] > [WARNING] It is highly recommended to fix these problems because they > threaten the stability of your build. > [WARNING] > [WARNING] For this reason, future Maven versions might no longer support > building such malformed projects. > [WARNING] > [INFO] > > [INFO] Reactor Build Order: > [INFO] > [INFO] Hudi > [pom] > [INFO] hudi-common > [jar] > [INFO] hudi-timeline-service > [jar] > [INFO] hudi-client > [pom] > [INFO] hudi-client-common >
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on a change in pull request #2666: URL: https://github.com/apache/hudi/pull/2666#discussion_r606117534 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; +import org.apache.hudi.common.util.Option; + +import java.io.IOException; +import java.util.List; + +public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload { + public PartialUpdatePayload(GenericRecord record, Comparable orderingVal) { +super(record, orderingVal); + } + + public PartialUpdatePayload(Option record) { +this(record.get(), (record1) -> 0); // natural order + } + + @Override + public Option combineAndGetUpdateValue(IndexedRecord lastValue, Schema schema) throws IOException { Review comment: > existing ok
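The partial-update semantics discussed in the review thread above — overlaying only the fields the incoming (partial) record actually provides onto the stored record — can be sketched with plain maps. This is an illustration of the merge idea only, under the assumption that a missing field is represented as null; it is not the actual Avro-based `combineAndGetUpdateValue` implementation from the PR, and `PartialMergeSketch`/`merge` are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch of partial-field merging: fields that the incoming
 * (partial) record leaves null keep their current value; all other
 * fields are overwritten. The real payload works on Avro
 * IndexedRecords against the table schema, not on maps.
 */
public class PartialMergeSketch {

  static Map<String, Object> merge(Map<String, Object> current, Map<String, Object> incoming) {
    // Start from the stored record, then overlay only the provided fields.
    Map<String, Object> merged = new LinkedHashMap<>(current);
    for (Map.Entry<String, Object> e : incoming.entrySet()) {
      if (e.getValue() != null) { // null means "field not provided" in this sketch
        merged.put(e.getKey(), e.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("uuid", "id1");
    stored.put("name", "Danny");
    stored.put("age", 23);

    Map<String, Object> partial = new HashMap<>();
    partial.put("uuid", "id1");
    partial.put("age", 24);    // only age is updated
    partial.put("name", null); // name not provided -> keep stored value

    System.out.println(merge(stored, partial)); // {uuid=id1, name=Danny, age=24}
  }
}
```

The design question in the thread (whether the incoming schema is the partial one) maps directly onto this sketch: the merge has to know which fields the incoming record can legitimately carry.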
[jira] [Created] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
lrz created HUDI-1759: - Summary: Save one connection retry when hiveSyncTool run with useJdbc=false Key: HUDI-1759 URL: https://issues.apache.org/jira/browse/HUDI-1759 Project: Apache Hudi Issue Type: Improvement Reporter: lrz Fix For: 0.9.0 Attachments: image-2021-04-02-15-43-15-854.png When syncing metadata to Hive with useJdbc=false, there are two problems. First: if the Hive server enables Kerberos and Hudi syncs to Hive with useJdbc=false, the metadata will be missing its owner; check the metadata here (I tested with Hive 3.1.1): !image-2021-04-02-15-43-15-854.png! Second: there is a connection retry to the Hive metastore every time syncToHive runs; this exception also happens in the UT "TestHiveSyncTool.testBasicSync":
[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
[ https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated HUDI-1759: -- Attachment: image-2021-04-02-15-48-42-895.png > Save one connection retry when hiveSyncTool run with useJdbc=false > -- > > Key: HUDI-1759 > URL: https://issues.apache.org/jira/browse/HUDI-1759 > Project: Apache Hudi > Issue Type: Improvement >Reporter: lrz >Priority: Major > Fix For: 0.9.0 > > Attachments: image-2021-04-02-15-43-15-854.png, > image-2021-04-02-15-48-42-895.png > > > when sync metadata to hive with useJdbc=false, there will have two problem: > first: if hive server enable kerberos, and hudi sync to hive with > useJdbc=false, then the metadata will miss owner, check the meta data here(I > test with hive 3.1.1): > !image-2021-04-02-15-43-15-854.png! > second: we can see there is a connection retry to hive metastore everytime > syncToHive, this exception also happen at UT "TestHiveSyncTool.testBasicSync":
[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
[ https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated HUDI-1759: -- Description: When syncing metadata to Hive with useJdbc=false, there are two problems. First: if the Hive server enables Kerberos and Hudi syncs to Hive with useJdbc=false, the metadata will be missing its owner; check the metadata here (I tested with Hive 3.1.1): !image-2021-04-02-15-43-15-854.png! Second: there is a connection retry to the Hive metastore every time syncToHive runs; this exception also happens in the UT "TestHiveSyncTool.testBasicSync": !image-2021-04-02-15-48-42-895.png! was: when sync metadata to hive with useJdbc=false, there will have two problem: first: if hive server enable kerberos, and hudi sync to hive with useJdbc=false, then the metadata will miss owner, check the meta data here(I test with hive 3.1.1): !image-2021-04-02-15-43-15-854.png! second: we can see there is a connection retry to hive metastore everytime syncToHive, this exception also happen at UT "TestHiveSyncTool.testBasicSync": > Save one connection retry when hiveSyncTool run with useJdbc=false > -- > > Key: HUDI-1759 > URL: https://issues.apache.org/jira/browse/HUDI-1759 > Project: Apache Hudi > Issue Type: Improvement >Reporter: lrz >Priority: Major > Fix For: 0.9.0 > > Attachments: image-2021-04-02-15-43-15-854.png, > image-2021-04-02-15-48-42-895.png > > > when sync metadata to hive with useJdbc=false, there will have two problem: > first: if hive server enable kerberos, and hudi sync to hive with > useJdbc=false, then the metadata will miss owner, check the meta data here(I > test with hive 3.1.1): > !image-2021-04-02-15-43-15-854.png! > second: we can see there is a connection retry to hive metastore everytime > syncToHive, this exception also happen at UT "TestHiveSyncTool.testBasicSync": > !image-2021-04-02-15-48-42-895.png! >
[GitHub] [hudi] li36909 opened a new pull request #2759: [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false
li36909 opened a new pull request #2759: URL: https://github.com/apache/hudi/pull/2759 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request 1) Set the owner when starting the Hive session 2) Save one connection retry to the Hive metastore when syncing to Hive ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request Run the UT "TestHiveSyncTool.testBasicSync" and check the log; after this fix, there shouldn't be a connection retry exception. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
[ https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1759: - Labels: pull-request-available (was: ) > Save one connection retry when hiveSyncTool run with useJdbc=false > -- > > Key: HUDI-1759 > URL: https://issues.apache.org/jira/browse/HUDI-1759 > Project: Apache Hudi > Issue Type: Improvement >Reporter: lrz >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > Attachments: image-2021-04-02-15-43-15-854.png, > image-2021-04-02-15-48-42-895.png > > > when sync metadata to hive with useJdbc=false, there will have two problem: > first: if hive server enable kerberos, and hudi sync to hive with > useJdbc=false, then the metadata will miss owner, check the meta data here(I > test with hive 3.1.1): > !image-2021-04-02-15-43-15-854.png! > second: we can see there is a connection retry to hive metastore everytime > syncToHive, this exception also happen at UT "TestHiveSyncTool.testBasicSync": > !image-2021-04-02-15-48-42-895.png! >
[GitHub] [hudi] li36909 commented on pull request #2759: [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false
li36909 commented on pull request #2759: URL: https://github.com/apache/hudi/pull/2759#issuecomment-812402879 The retry issue is caused by the fact that closing the metaStoreClient, the sessionState, or the hiveDriver will in each case call 'Hive.closeCurrent()', so both sessionState and hiveDriver should be class members.
[GitHub] [hudi] li36909 commented on pull request #2759: [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false
li36909 commented on pull request #2759: URL: https://github.com/apache/hudi/pull/2759#issuecomment-812403045 cc @nsivabalan, could you help take a look? Thank you.
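The lifecycle problem described in the comment above — several wrapper objects all tearing down the same shared, thread-local state on close, so re-creating them per sync forces a fresh (retried) connection each time — can be sketched in isolation. All names here (`SharedStateSketch`, `getOrConnect`, `closeCurrent`, `syncOnceWithLocalWrappers`) are hypothetical stand-ins; the real code involves Hive's SessionState, the Hive driver, and `Hive.closeCurrent()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Simulates shared thread-local "connection" state torn down by any
 * wrapper's close(). Re-creating wrappers per sync reconnects each
 * time; holding them as class members connects once and reuses.
 */
public class SharedStateSketch {
  static final AtomicInteger connectCount = new AtomicInteger();
  static final ThreadLocal<Object> shared = new ThreadLocal<>();

  static Object getOrConnect() {
    if (shared.get() == null) {      // simulates a (re)connect to the metastore
      connectCount.incrementAndGet();
      shared.set(new Object());
    }
    return shared.get();
  }

  static void closeCurrent() {       // any wrapper's close() ends up here
    shared.remove();
  }

  static void syncOnceWithLocalWrappers() {
    getOrConnect();
    closeCurrent();                  // local wrapper closed -> shared state gone
  }

  public static void main(String[] args) {
    syncOnceWithLocalWrappers();
    syncOnceWithLocalWrappers();     // has to reconnect each time
    int perSync = connectCount.get();

    connectCount.set(0);
    shared.remove();
    getOrConnect();                  // "class member" style: connect once...
    getOrConnect();                  // ...and reuse across syncs
    int cached = connectCount.get();

    System.out.println(perSync + " reconnects vs " + cached); // 2 reconnects vs 1
  }
}
```

This is why the fix keeps sessionState and hiveDriver as members for the lifetime of the sync tool instead of scoping them to a single sync.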
[hudi] branch release-0.8.0 updated: [MINOR] Update release version to reflect published version 0.8.0
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch release-0.8.0 in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/release-0.8.0 by this push: new da65d3c [MINOR] Update release version to reflect published version 0.8.0 da65d3c is described below commit da65d3cae99e8fee0ede9b5ed8630a3716d284c8 Author: garyli1019 AuthorDate: Fri Apr 2 17:19:07 2021 +0800 [MINOR] Update release version to reflect published version 0.8.0 --- docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml| 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +- docker/hoodie/hadoop/sparkworker/pom.xml| 2 +- hudi-cli/pom.xml| 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml | 4 ++-- hudi-client/hudi-java-client/pom.xml| 4 ++-- hudi-client/hudi-spark-client/pom.xml | 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark2/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark3/pom.xml | 4 ++-- hudi-spark-datasource/pom.xml | 2 +- hudi-sync/hudi-dla-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml| 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml | 2 +- hudi-timeline-service/pom.xml | 2 +- hudi-utilities/pom.xml | 2 +- packaging/hudi-flink-bundle/pom.xml | 2 +- packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +- packaging/hudi-hive-sync-bundle/pom.xml 
| 2 +- packaging/hudi-integ-test-bundle/pom.xml| 2 +- packaging/hudi-presto-bundle/pom.xml| 2 +- packaging/hudi-spark-bundle/pom.xml | 2 +- packaging/hudi-timeline-server-bundle/pom.xml | 2 +- packaging/hudi-utilities-bundle/pom.xml | 2 +- pom.xml | 2 +- 42 files changed, 50 insertions(+), 50 deletions(-) diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml index 3e2bc48..85e84d0 100644 --- a/docker/hoodie/hadoop/base/pom.xml +++ b/docker/hoodie/hadoop/base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-rc1 +0.8.0 4.0.0 pom diff --git a/docker/hoodie/hadoop/datanode/pom.xml b/docker/hoodie/hadoop/datanode/pom.xml index 561d1a9..b57a19c 100644 --- a/docker/hoodie/hadoop/datanode/pom.xml +++ b/docker/hoodie/hadoop/datanode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-rc1 +0.8.0 4.0.0 pom diff --git a/docker/hoodie/hadoop/historyserver/pom.xml b/docker/hoodie/hadoop/historyserver/pom.xml index b06a238..e04d446 100644 --- a/docker/hoodie/hadoop/historyserver/pom.xml +++ b/docker/hoodie/hadoop/historyserver/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-rc1 +0.8.0 4.0.0 pom diff --git a/docker/hoodie/hadoop/hive_base/pom.xml b/docker/hoodie/hadoop/hive_base/pom.xml index c17c3da..3f85692 100644 --- a/docker/hoodie/hadoop/hive_base/pom.xml +++ b/docker/hoodie/hadoop/hive_base/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-rc1 +0.8.0 4.0.0 pom diff --git a/docker/hoodie/hadoop/namenode/pom.xml b/docker/hoodie/hadoop/namenode/pom.xml index ab7251c..2806990 100644 --- a/docker/hoodie/hadoop/namenode/pom.xml +++ b/docker/hoodie/hadoop/namenode/pom.xml @@ -19,7 +19,7 @@ hudi-hadoop-docker org.apache.hudi -0.8.0-rc1 +0.8.0 4.0.0 pom diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml index deff4ba..e8300ae 100644 --- a/docker/hoodie/hadoop/pom.xml +++ b/docker/hoodie/hadoop/pom.xml @@ -19,7 +19,7 @@ hudi org.apache.hudi -0.8.0-rc1 +0.8.0 
../../../pom.xml 4.0.0 diff --git a/docker/hoodie/hadoop/prestobase/pom.xml b/docker/hoodie/hadoop/prestobase/pom.xml index 2430969..605b
[GitHub] [hudi] ssdong edited a comment on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong edited a comment on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487 @jsbali To give some extra insights and details, as @zherenyu831 posted in the beginning: ``` [20210323080718__replacecommit__COMPLETED]: size : 0 [20210323081449__replacecommit__COMPLETED]: size : 1 [20210323082046__replacecommit__COMPLETED]: size : 1 [20210323082758__replacecommit__COMPLETED]: size : 1 [20210323084004__replacecommit__COMPLETED]: size : 1 [20210323085044__replacecommit__COMPLETED]: size : 1 [20210323085823__replacecommit__COMPLETED]: size : 1 [20210323090550__replacecommit__COMPLETED]: size : 1 [20210323091700__replacecommit__COMPLETED]: size : 1 ``` If we keep everything the same and let the archive logic handle everything, it fails on the empty (size 0) `partitionToReplaceFileIds` of `20210323080718__replacecommit__COMPLETED` (the first item in the list above); this is a known issue. To make the archive work, we tried to _manually_ delete that first _empty_ commit file, `20210323080718__replacecommit__COMPLETED`. This allowed the archive to succeed, but it then failed with `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list above). Now, to reason through the underlying mechanism of this error: given the archive was successful, a few commit files have been placed within the `.archive` folder; say ``` [20210323081449__replacecommit__COMPLETED]: size : 1 [20210323082046__replacecommit__COMPLETED]: size : 1 [20210323082758__replacecommit__COMPLETED]: size : 1 [20210323084004__replacecommit__COMPLETED]: size : 1 [20210323085044__replacecommit__COMPLETED]: size : 1 ``` have been successfully moved into `.archive`.
At this moment, the timeline has been updated and there are 3 remaining commit files which are: ``` [20210323085823__replacecommit__COMPLETED]: size : 1 [20210323090550__replacecommit__COMPLETED]: size : 1 [20210323091700__replacecommit__COMPLETED]: size : 1 ``` Now, if you pay attention to the stack trace which caused `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, and I am just pasting them again: ``` User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530) at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217) at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106) at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
```
After a `close` action is triggered on `TimelineService`, which is understandable, it propagates to `HoodieTableFileSystemView.close`, and there is:
```
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
at org.apache.hudi.common.table.view.A
```
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on a change in pull request #2666: URL: https://github.com/apache/hudi/pull/2666#discussion_r606123928

## File path: hudi-common/src/main/java/org/apache/hudi/common/util/CommitUtils.java

```
@@ -59,14 +61,24 @@ public static HoodieCommitMetadata buildMetadata(List writeStat
                                  Option> extraMetadata,
                                  WriteOperationType operationType,
                                  String schemaToStoreInCommit,
-                                 String commitActionType) {
+                                 String commitActionType,
+                                 Boolean updatePartialFields,
+                                 HoodieTableMetaClient metaClient) {
   HoodieCommitMetadata commitMetadata = buildMetadataFromStats(writeStats, partitionToReplaceFileIds, commitActionType, operationType);
   // add in extra metadata
   if (extraMetadata.isPresent()) {
     extraMetadata.get().forEach(commitMetadata::addMetadata);
   }
+  if (updatePartialFields) {
+    try {
```

Review comment: Good, I will delete it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hddong commented on a change in pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand
hddong commented on a change in pull request #2325: URL: https://github.com/apache/hudi/pull/2325#discussion_r606166490

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CompactionCommand.java

```
@@ -175,25 +174,26 @@ public String compactionShowArchived(
   HoodieArchivedTimeline archivedTimeline = client.getArchivedTimeline();
   HoodieInstant instant = new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.COMPACTION_ACTION, compactionInstantTime);
-  String startTs = CommitUtil.addHours(compactionInstantTime, -1);
-  String endTs = CommitUtil.addHours(compactionInstantTime, 1);
```

Review comment:
> if we want to load a `ts` equals `compactionInstantTime`, can we add a new method that takes only one `instantTime` as input param? WDYT

Yes, I have added it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hddong commented on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand
hddong commented on pull request #2325: URL: https://github.com/apache/hudi/pull/2325#issuecomment-812460441 @yanghua @wangxianghu: I have addressed them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1591) Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning
[ https://issues.apache.org/jira/browse/HUDI-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengzhiwei updated HUDI-1591: - Summary: Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (was: Improve Hoodie Table Query Performance And Ease Of Use For Spark) > Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource > using non-globbed table path and partition pruning > -- > > Key: HUDI-1591 > URL: https://issues.apache.org/jira/browse/HUDI-1591 > Project: Apache Hudi > Issue Type: Improvement > Components: Spark Integration >Affects Versions: 0.9.0 >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > We have found some problems when querying Hudi tables with Spark: > 1. Users must specify "*" to tell Spark the partition level for the query. > 2. Partition pruning is not supported for COW tables. > This issue aims to achieve the following goals: > 1. Support non-globbed ("no stars") queries for Hudi tables. > 2. Support partition pruning for COW tables. > Refer to the documentation for more details about this: [Optimization For > Hudi COW > Query|https://docs.google.com/document/d/1qG014M3VZg3lMswsZv7cYB9Tb0vz8yXgqvlI_Jlnnsc/edit#heading=h.k6ro6dhgwh8y] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] pengzhiwei2018 closed pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…
pengzhiwei2018 closed pull request #2334: URL: https://github.com/apache/hudi/pull/2334 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] pengzhiwei2018 commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…
pengzhiwei2018 commented on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-812496550 > https://gist.github.com/nsivabalan/91f12109e0fe1ca9749ff5290c946778 Hi @nsivabalan, I have reviewed your test code. First you write an "int" to the table, so the table schema type is "int". Then you write a "double" to the table, so the table schema becomes "double"; the table schema changed from "int" to "double". I think this is more reasonable. In my original idea, the first write schema (e.g. "int") was the table schema forever, and incoming records after that should be compatible with the original table schema (e.g. "int"); that is what this PR wanted to solve. I understand more clearly now: the table schema should change to a more generic type (e.g. from "int" to "double"), not always remain the first write schema. So I can close this PR now. Thanks @nsivabalan for correcting me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
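The widening behavior agreed on above — the table schema moving to the more generic type instead of staying fixed at the first write's schema — can be sketched in isolation. This is an illustrative toy, not Hudi's implementation; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;

public class SchemaPromotionSketch {
    // Avro-style numeric promotion order: int < long < float < double.
    private static final List<String> ORDER = Arrays.asList("int", "long", "float", "double");

    /** Returns the more generic of two numeric types, mirroring the
     *  "int write followed by double write => table schema becomes double" case. */
    static String promote(String tableType, String incomingType) {
        int a = ORDER.indexOf(tableType);
        int b = ORDER.indexOf(incomingType);
        if (a < 0 || b < 0) {
            throw new IllegalArgumentException("unsupported type: " + tableType + "/" + incomingType);
        }
        return ORDER.get(Math.max(a, b));
    }

    public static void main(String[] args) {
        System.out.println(promote("int", "double"));  // double
        System.out.println(promote("double", "int"));  // double
    }
}
```

Avro's standard promotion chain (int → long → float → double) is what makes the second write's "double" win here, regardless of the order in which the two writes arrive.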
[GitHub] [hudi] li36909 commented on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail
li36909 commented on pull request #2752: URL: https://github.com/apache/hudi/pull/2752#issuecomment-812496806 Just run any rollback/compaction command and make it fail by injecting a fault; the command will then hang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)
nsivabalan commented on issue #2756: URL: https://github.com/apache/hudi/issues/2756#issuecomment-812496813 Yes, this is expected. If you are using OverwriteWithLatestAvroPayload as your payload class, combineAndGetUpdateValue does not honor the ordering value, so we added another payload class for this purpose: DefaultHoodieRecordPayload. While using it, ensure you set "hoodie.payload.ordering.field" accordingly. Let me know how it goes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
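For reference, the write-side configuration being discussed looks roughly like the following sketch. The option names should be verified against your Hudi version, and `ts` is a hypothetical field name — use whichever column carries your ordering value.

```properties
# Payload class whose combineAndGetUpdateValue honors the ordering value
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
# Field used to decide which record wins when merging (e.g. an event-time column)
hoodie.payload.ordering.field=ts
```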
[jira] [Commented] (HUDI-1745) Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable spark api
[ https://issues.apache.org/jira/browse/HUDI-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313827#comment-17313827 ] sivabalan narayanan commented on HUDI-1745: --- We have always been testing with Spark 2.4.4, and at Uber it's 2.4.3. So we will just fix the documentation to state that the minimum supported Spark version is 2.4.3. > Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable > spark api > - > > Key: HUDI-1745 > URL: https://issues.apache.org/jira/browse/HUDI-1745 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Affects Versions: 0.8.0 >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical > > [https://github.com/apache/hudi/issues/2748] > > PR thats of interest: [https://github.com/apache/hudi/pull/2431] > > I see we have three options. Let me know if we have more. > Option1: > Similar to SparkRowSerDe, we might have to introduce an interface for > translateSqlOptions and override based on spark versions. But already we have > two sub modules for spark2 and spark3. and now we might have to add more such > modules for diff spark2 versions which might need more thought to do it > elegantly. > Option2: > Since this feature is added only w/ 0.8.0, and not like a more sought after > feature, we could revert this commit and unblock ourselves for 0.8.0. Once > release is complete, we can decide how to do about doing this and get this > feature in for next release. > Option3: > we say that hudi does not support spark version < 2.4.4 w/ 0.8.0. Don't think > we can go this route. But just listing it out. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1745) Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable spark api
[ https://issues.apache.org/jira/browse/HUDI-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1745. --- Resolution: Fixed > Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable > spark api > - > > Key: HUDI-1745 > URL: https://issues.apache.org/jira/browse/HUDI-1745 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Affects Versions: 0.8.0 >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical > > [https://github.com/apache/hudi/issues/2748] > > PR thats of interest: [https://github.com/apache/hudi/pull/2431] > > I see we have three options. Let me know if we have more. > Option1: > Similar to SparkRowSerDe, we might have to introduce an interface for > translateSqlOptions and override based on spark versions. But already we have > two sub modules for spark2 and spark3. and now we might have to add more such > modules for diff spark2 versions which might need more thought to do it > elegantly. > Option2: > Since this feature is added only w/ 0.8.0, and not like a more sought after > feature, we could revert this commit and unblock ourselves for 0.8.0. Once > release is complete, we can decide how to do about doing this and get this > feature in for next release. > Option3: > we say that hudi does not support spark version < 2.4.4 w/ 0.8.0. Don't think > we can go this route. But just listing it out. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2728: [SUPPORT]Hive sync error by using run_sync_tool.sh
nsivabalan commented on issue #2728: URL: https://github.com/apache/hudi/issues/2728#issuecomment-812497980 thanks for letting us know. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (HUDI-1734) Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : org/apache/log4j/LogManager
[ https://issues.apache.org/jira/browse/HUDI-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1734. --- Fix Version/s: 0.8.0 Resolution: Invalid > Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : > org/apache/log4j/LogManager > - > > Key: HUDI-1734 > URL: https://issues.apache.org/jira/browse/HUDI-1734 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.8.0 > > > ./run_sync_tool.sh --jdbc-url jdbc:hive://dxbigdata102:1000 \ --user appuser > \ --pass '' \ --base-path > 'hdfs://dxbigdata101:8020/user/hudi/test/data/hudi_trips_cow' \ --database > test \ --table hudi_trips_cow > > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/log4j/LogManager > at org.apache.hudi.hive.HiveSyncTool.(HiveSyncTool.java:55) > Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 1 more > > https://github.com/apache/hudi/issues/2728 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] li36909 edited a comment on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail
li36909 edited a comment on pull request #2752: URL: https://github.com/apache/hudi/pull/2752#issuecomment-812496806 Just run any rollback/compaction command and make it fail by injecting a fault; the command will then hang. For example, currently Hudi only supports the latest commit, so we can make a rollback fail by rolling back to an older commit and observe the hang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] li36909 edited a comment on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail
li36909 edited a comment on pull request #2752: URL: https://github.com/apache/hudi/pull/2752#issuecomment-812496806 Just run any rollback/compaction command and make it fail by injecting a fault; the command will then hang. For example, currently Hudi only supports rolling back to the latest commit, so we can make a rollback fail by rolling back to an older commit and observe the hang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit
[ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313828#comment-17313828 ] sivabalan narayanan commented on HUDI-1723: --- [~xushiyan]: I don't have much experience on the query side, so some noob questions. What's the granularity of the modification time? If it's milliseconds, do you mean we would have a lot of files with exactly the same modification time at ms granularity? Did you see this happen in a production environment, or is it just theoretical? I understand the problem; I'm just trying to gauge the severity and probability of it occurring. > DFSPathSelector skips files with the same modify date when read up to source > limit > -- > > Key: HUDI-1723 > URL: https://issues.apache.org/jira/browse/HUDI-1723 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Raymond Xu >Priority: Critical > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png > > > org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles > filters the input files based on last saved checkpoint, which was the > modification date from last read file. However, the last read file's > modification date could be duplicated for multiple files and resulted in > skipping a few of them when reading up to source limit. An illustration is > shown in the attached picture. -- This message was sent by Atlassian Jira (v8.3.4#803005)
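The failure mode described in HUDI-1723 is easy to see in a toy model (illustrative only — the class names are invented and this is not the actual `DFSPathSelector` code): if the saved checkpoint is the last read file's modification time and eligibility uses a strict greater-than comparison, any unread file sharing that modification time is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathSelectorSketch {
    static class FileMeta {
        final String name;
        final long mtime;
        FileMeta(String name, long mtime) { this.name = name; this.mtime = mtime; }
    }

    /** Mimics filtering "files newer than the checkpoint" with a strict comparison. */
    static List<String> listEligible(List<FileMeta> files, long checkpointMtime) {
        List<String> out = new ArrayList<>();
        for (FileMeta f : files) {
            if (f.mtime > checkpointMtime) {  // strict '>' is where equal-mtime files get lost
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // a.parquet was read last round; b.parquet shares its mtime but was never read.
        List<FileMeta> files = Arrays.asList(
            new FileMeta("a.parquet", 100L),
            new FileMeta("b.parquet", 100L),
            new FileMeta("c.parquet", 200L));
        // Checkpoint saved after reading a.parquet => b.parquet is silently skipped.
        System.out.println(listEligible(files, 100L)); // [c.parquet]
    }
}
```

The more the source limit forces reads to stop mid-batch, the more often the checkpoint lands exactly on a duplicated modification time, which is why this is more than a theoretical concern on coarse-granularity file systems.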
[jira] [Commented] (HUDI-1652) DiskBasedMap:As time goes by, the number of /temp/***** file handles held by the executor process is increasing
[ https://issues.apache.org/jira/browse/HUDI-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313832#comment-17313832 ] sivabalan narayanan commented on HUDI-1652: --- [~hainanzhongjian]: can we close the Jira then since its already fixed in hudi-0.7? > DiskBasedMap:As time goes by, the number of /temp/* file handles held by > the executor process is increasing > --- > > Key: HUDI-1652 > URL: https://issues.apache.org/jira/browse/HUDI-1652 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Affects Versions: 0.6.0 >Reporter: wangmeng >Priority: Major > Labels: sev:critical, user-support-issues > > We encountered a problem in the hudi production environment, which is very > similar to the HUDI-945 problem. > *Software environment:* spark 2.4.5, hudi 0.6 > *Scenario:* consume Kafka data and write hudi, using spark streaming > (non-StructedStreaming). > *Problem:* As time goes by, the number of /temp/* file handles held by > the executor process is increasing. > " > /tmp/10ded0f7-1bcc-4316-91e9-9b4d0507e1e0 > /tmp/49251680-0efd-4cc4-a55e-1af2038d3900 > /tmp/cc7dd284-3444-4c17-a5c8-84b3090c17f9 > " > *Reason analysis:* ExternalSpillableMap is used in HoodieMergeHandle class, > and DiskBasedMap is used to flush overflowed data to the disk. But the file > stream can only be closed and deleted by the hook when the jvm exits. When > the clear method is executed in the program, the stream is not closed and the > file is not deleted. As a result, over time, more and more file handles are > still held, leading to errors. This error is similar to Hudi-945. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan edited a comment on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive
nsivabalan edited a comment on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-779341027 @Trevor-zhang : Sorry, I didn't mean to suggest closing this out. I am also still getting conversant with Hive sync in general, so I was trying to clarify a few things. If I am not wrong, the metastore flow (the non-JDBC flow) is already supported; it's just that the config value for "hive.metastore.uris" is taken from the Hadoop configuration, and there is no direct way to pass it in as an argument. If your intention is to add an argument to make that convenient, then this patch is the right approach. If not, let's discuss what exactly we are trying to achieve here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
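As context for the discussion above, `hive.metastore.uris` is normally supplied through the Hive/Hadoop configuration on the classpath rather than as a sync-tool argument — e.g. a `hive-site.xml` entry like the following (the host and port are placeholders):

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```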
[jira] [Updated] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer
[ https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated HUDI-1751: -- Summary: DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer (was: DeltaStream print many unnecessary warn log) > DeltaStream print many unnecessary warn log because of passing hoodie config > to kafka consumer > -- > > Key: HUDI-1751 > URL: https://issues.apache.org/jira/browse/HUDI-1751 > Project: Apache Hudi > Issue Type: Improvement >Reporter: lrz >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > Because we put both Kafka parameters and Hudi configs in the same properties > file (such as kafka-source.properties), the kafkaParams object created from it > also picks up some Hudi configs, which leads to the warn logs being printed: > !https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1528) hudi-sync-tools error
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1528: -- Labels: pull-request-available user-support-issues (was: pull-request-available sev:critical user-support-issues) > hudi-sync-tools error > - > > Key: HUDI-1528 > URL: https://issues.apache.org/jira/browse/HUDI-1528 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: Trevorzhang >Assignee: Trevorzhang >Priority: Major > Labels: pull-request-available, user-support-issues > Fix For: 0.9.0 > > > When using hudi-sync-tools to synchronize to a remote hive, hivemetastore > throw exceptions. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1528) hudi-sync-tools error
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1528: -- Labels: user-support-issues (was: pull-request-available user-support-issues) > hudi-sync-tools error > - > > Key: HUDI-1528 > URL: https://issues.apache.org/jira/browse/HUDI-1528 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: Trevorzhang >Assignee: Trevorzhang >Priority: Major > Labels: user-support-issues > Fix For: 0.9.0 > > > When using hudi-sync-tools to synchronize to a remote hive, hivemetastore > throw exceptions. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1528) hudi-sync-tools error
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313834#comment-17313834 ] sivabalan narayanan commented on HUDI-1528: --- [~Trevorzhang]: Can you update the Jira with how you fixed the issue, or what the resolution was? We can close it out if you don't have any more issues. > hudi-sync-tools error > - > > Key: HUDI-1528 > URL: https://issues.apache.org/jira/browse/HUDI-1528 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: Trevorzhang >Assignee: Trevorzhang >Priority: Major > Labels: user-support-issues > Fix For: 0.9.0 > > > When using hudi-sync-tools to synchronize to a remote hive, hivemetastore > throw exceptions. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1288: -- Status: Open (was: New) > DeltaSync:writeToSink fails with Unknown datum type > org.apache.avro.JsonProperties$Null > --- > > Key: HUDI-1288 > URL: https://issues.apache.org/jira/browse/HUDI-1288 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Michal Swiatowy >Priority: Major > Labels: sev:critical, user-support-issues > > After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I run into > following error message on write to HDFS: > {code:java} > 2020-09-18 12:54:38,651 [Driver] INFO > HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing > Table of type MERGE_ON_READ from > /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC > 2020-09-18 12:54:38,663 [Driver] INFO DeltaSync:setupWriteClient:470 - > Setting up Hoodie Write Client > 2020-09-18 12:54:38,695 [Driver] INFO DeltaSync:registerAvroSchemas:522 - > Registering Schema > 
:[{"type":"record","name":"Value","namespace":"ARC_6FQS_W.dbo.S_INCOMINGMESSAGEDETAIL","fields":[{"name":"ID","type":"long"},{"name":"OPTIMISTICLOCK","type":{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}},{"name":"DOCUMENTAMOUNT","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DOCUMENTDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DOCUMENTNUMBER","type":["null","string"],"default":null},{"name":"PAYMENTTYPE","type":["null","string"],"default":null},{"name":"PURCHASEORDERNUMBER","type":["null","string"],"default":null},{"name":"VALUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"INCOMINGMESSAGEHEADERID","type":["null","long"],"default":null},{"name":"MESSAGETEXTID","type":["null","long"],"default":null},{"name":"DUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DEBTORASCNUMBER","type":["null","string"],"default":null},{"name":"DOCUMENTTYPE","type":["null","string"],"default":null},{"name":"NUMBEROFDUEDATES","type":["null","string"],"default":null},{"name":"DUEDATEINDICATOR","type":["null","string"],"default":null},{"name":"DISPUTECODE","type":["null","string"],"default":null},{"name":"INSTRUCTIONCODE","type":["null","string"],"default":null},{"name":"PAYMENTTERMS","type":["null","string"],"default":null},{"name":"PAYMENTCONDITION","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS1","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS2","type":["null","string"],"default":null},{"name":"ERRORID","type":["null","string"],"default":null},{"name":"DISCOUNTPERCENT1","type":["null",{"t
ype":"bytes","scale":5,"precision":9,"connect.version":1,"connect.parameters":{"scale":"5","connect.decimal.precision":"9"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DISCOUNTPERCENT2","type":["null",{"type":"bytes","scale":5,"precision":9,"connect.version":1,"connect.parameters":{"scale":"5","connect.decimal.precision":"9"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DEDUCTIONAMOUNT1","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DEDUCTIONAMOUNT2","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DEDUCTIONAMOUNT3","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DISPUTEAMOUNT","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"CREDITNOTENUMBER","type":["null","string"],"default":null},{"name":"DEDUCTIONCODE1","type":["null","string"],"default":null}
[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313835#comment-17313835 ] sivabalan narayanan commented on HUDI-1288: --- Closing out this Jira as we don't have any plans to back port fixes. > DeltaSync:writeToSink fails with Unknown datum type > org.apache.avro.JsonProperties$Null > --- > > Key: HUDI-1288 > URL: https://issues.apache.org/jira/browse/HUDI-1288 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Michal Swiatowy >Priority: Major > Labels: sev:critical, user-support-issues > > After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I run into > following error message on write to HDFS: > {code:java} > 2020-09-18 12:54:38,651 [Driver] INFO > HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing > Table of type MERGE_ON_READ from > /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC > 2020-09-18 12:54:38,663 [Driver] INFO DeltaSync:setupWriteClient:470 - > Setting up Hoodie Write Client > 2020-09-18 12:54:38,695 [Driver] INFO DeltaSync:registerAvroSchemas:522 - > Registering Schema > 
[GitHub] [hudi] li36909 commented on a change in pull request #2754: [HUDI-1751] DeltaStreamer print many unnecessary warn log
li36909 commented on a change in pull request #2754: URL: https://github.com/apache/hudi/pull/2754#discussion_r606207354 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -173,9 +173,11 @@ public KafkaOffsetGen(TypedProperties props) { this.props = props; kafkaParams = new HashMap<>(); -for (Object prop : props.keySet()) { +props.keySet().stream().filter(prop -> { Review comment: How about changing it to: "DeltaStreamer prints many unnecessary warn logs because Hudi configs are passed to the Kafka consumer"? The warn logs are printed by the KafkaConsumer: when Hudi constructs the Kafka consumer, it passes some non-Kafka parameters to it, which leads to these warn logs. To solve this problem we just need to filter out the Hudi configs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
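The filtering idea in the diff above can be sketched language-neutrally. This is a hypothetical Python rendition (the real fix lives in the Java class KafkaOffsetGen), assuming Hudi-specific keys share the `hoodie.` prefix:

```python
def filter_kafka_params(props):
    """Keep only genuine Kafka settings before building the consumer.

    Hypothetical sketch of the fix discussed above: Hudi-specific keys
    (assumed here to carry the 'hoodie.' prefix) are dropped so the
    KafkaConsumer no longer warns about unknown configuration entries.
    """
    return {k: v for k, v in props.items() if not k.startswith("hoodie.")}
```

Only the Kafka-recognized keys survive, so the consumer's "supplied but isn't a known config" warnings go away.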
[jira] [Resolved] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1288. --- Fix Version/s: 0.6.0 Resolution: Fixed > DeltaSync:writeToSink fails with Unknown datum type > org.apache.avro.JsonProperties$Null > --- > > Key: HUDI-1288 > URL: https://issues.apache.org/jira/browse/HUDI-1288 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Michal Swiatowy >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.6.0 > > > After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I ran into the > following error message on write to HDFS: > {code:java} > 2020-09-18 12:54:38,651 [Driver] INFO > HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing > Table of type MERGE_ON_READ from > /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC > 2020-09-18 12:54:38,663 [Driver] INFO DeltaSync:setupWriteClient:470 - > Setting up Hoodie Write Client > 2020-09-18 12:54:38,695 [Driver] INFO DeltaSync:registerAvroSchemas:522 - > Registering Schema >
:[{"type":"record","name":"Value","namespace":"ARC_6FQS_W.dbo.S_INCOMINGMESSAGEDETAIL","fields":[{"name":"ID","type":"long"},{"name":"OPTIMISTICLOCK","type":{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}},{"name":"DOCUMENTAMOUNT","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DOCUMENTDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DOCUMENTNUMBER","type":["null","string"],"default":null},{"name":"PAYMENTTYPE","type":["null","string"],"default":null},{"name":"PURCHASEORDERNUMBER","type":["null","string"],"default":null},{"name":"VALUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"INCOMINGMESSAGEHEADERID","type":["null","long"],"default":null},{"name":"MESSAGETEXTID","type":["null","long"],"default":null},{"name":"DUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DEBTORASCNUMBER","type":["null","string"],"default":null},{"name":"DOCUMENTTYPE","type":["null","string"],"default":null},{"name":"NUMBEROFDUEDATES","type":["null","string"],"default":null},{"name":"DUEDATEINDICATOR","type":["null","string"],"default":null},{"name":"DISPUTECODE","type":["null","string"],"default":null},{"name":"INSTRUCTIONCODE","type":["null","string"],"default":null},{"name":"PAYMENTTERMS","type":["null","string"],"default":null},{"name":"PAYMENTCONDITION","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS1","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS2","type":["null","string"],"default":null},{"name":"ERRORID","type":["null","string"],"default":null},{"name":"DISCOUNTPERCENT1","type":["null",{"t
ype":"bytes","scale":5,"precision":9,"connect.version":1,"connect.parameters":{"scale":"5","connect.decimal.precision":"9"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DISCOUNTPERCENT2","type":["null",{"type":"bytes","scale":5,"precision":9,"connect.version":1,"connect.parameters":{"scale":"5","connect.decimal.precision":"9"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DEDUCTIONAMOUNT1","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DEDUCTIONAMOUNT2","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DEDUCTIONAMOUNT3","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DISPUTEAMOUNT","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"CREDITNOTENUMBER","type":["null","string"],"default":null},{"name":"DE
[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313836#comment-17313836 ] sivabalan narayanan commented on HUDI-1063: --- [~WaterKnight]: Were you able to resolve your issue? If disabling the embedded timeline server worked for you, feel free to close out the ticket. If not, we would appreciate any updates. > Save in Google Cloud Storage not working > > > Key: HUDI-1063 > URL: https://issues.apache.org/jira/browse/HUDI-1063 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Affects Versions: 0.9.0 >Reporter: David Lacalle Castillo >Priority: Critical > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > I added to spark submit the following properties: > {{--packages > org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 > \ --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}} > Spark version 2.4.5 and Hadoop version 3.2.1 > > I am trying to save a DataFrame in Google Cloud Storage as follows: > tableName = "forecasts" > basePath = "gs://hudi-datalake/" + tableName > hudi_options = { > 'hoodie.table.name': tableName, > 'hoodie.datasource.write.recordkey.field': 'uuid', > 'hoodie.datasource.write.partitionpath.field': 'partitionpath', > 'hoodie.datasource.write.table.name': tableName, > 'hoodie.datasource.write.operation': 'insert', > 'hoodie.datasource.write.precombine.field': 'ts', > 'hoodie.upsert.shuffle.parallelism': 2, > 'hoodie.insert.shuffle.parallelism': 2 > } > results = results.selectExpr( > "ds as date", > "store", > "item", > "y as sales", > "yhat as sales_predicted", > "yhat_upper as sales_predicted_upper", > "yhat_lower as sales_predicted_lower", > "training_date") > results.write.format("hudi"). \ > options(**hudi_options). \ > mode("overwrite").
\ > save(basePath) > I am getting the following error: > Py4JJavaError: An error occurred while calling o312.save. : > java.lang.NoSuchMethodError: > org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at > io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50) > at io.javalin.Javalin.(Javalin.java:94) at > io.javalin.Javalin.create(Javalin.java:107) at > org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74) > at > org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102) > at > org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69) > at > org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83) > at > org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) > at > org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) > at > org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) > at > org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) > at > org.apache.spark.sql.executio
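The comment above suggests disabling the embedded timeline server as a workaround for this Jetty `NoSuchMethodError`. A minimal sketch of how that could look in the reporter's options dict, assuming the standard `hoodie.embed.timeline.server` config key applies to the Hudi version in use:

```python
tableName = "forecasts"

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.precombine.field': 'ts',
    # Workaround discussed above: skip the embedded timeline server so the
    # Javalin/Jetty startup path (where SessionHandler.setHttpOnly is
    # missing due to a conflicting Jetty on the classpath) never runs.
    'hoodie.embed.timeline.server': 'false',
}
```

The rest of the write (`results.write.format("hudi").options(**hudi_options)...`) stays unchanged; only the extra key is added.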
[GitHub] [hudi] li36909 commented on a change in pull request #2754: [HUDI-1751] DeltaStreamer print many unnecessary warn log
li36909 commented on a change in pull request #2754: URL: https://github.com/apache/hudi/pull/2754#discussion_r606207765 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java ## @@ -173,9 +173,11 @@ public KafkaOffsetGen(TypedProperties props) { this.props = props; kafkaParams = new HashMap<>(); -for (Object prop : props.keySet()) { +props.keySet().stream().filter(prop -> { Review comment: BTW, I found a UT failure caused by a concurrent write to a Hudi table; I will try to analyze it later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1036) HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit
[ https://issues.apache.org/jira/browse/HUDI-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313837#comment-17313837 ] sivabalan narayanan commented on HUDI-1036: --- [~nishith29]: This has been lying around for some time. Please fix the sev labels as appropriate. > HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit > --- > > Key: HUDI-1036 > URL: https://issues.apache.org/jira/browse/HUDI-1036 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.9.0 >Reporter: Bhavani Sudha >Assignee: Nishith Agarwal >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Opening this Jira based on the GitHub issue reported here - > [https://github.com/apache/hudi/issues/1735] when hive.input.format = > org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat it is not able to > create HoodieRealtimeFileSplit for querying the _rt table. Please see the GitHub > issue for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313840#comment-17313840 ] sivabalan narayanan commented on HUDI-874: -- [~uditme]: Is someone from AWS looking into this? > Schema evolution does not work with AWS Glue catalog > > > Key: HUDI-874 > URL: https://issues.apache.org/jira/browse/HUDI-874 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: Udit Mehrotra >Priority: Major > Labels: sev:critical, user-support-issues > > This issue has been discussed here > [https://github.com/apache/incubator-hudi/issues/1581] and at other places as > well. The Glue catalog currently does not support *cascade* for *ALTER TABLE* > statements. As a result, features like adding new columns to an existing table > do not work with the Glue catalog. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HUDI-874) Schema evolution does not work with AWS Glue catalog
[ https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313840#comment-17313840 ] sivabalan narayanan edited comment on HUDI-874 at 4/2/21, 12:10 PM: [~uditme]: Is someone from AWS looking into this? Can you give us an update? was (Author: shivnarayan): [~uditme]: Is someone from AWS looking into this? > Schema evolution does not work with AWS Glue catalog > > > Key: HUDI-874 > URL: https://issues.apache.org/jira/browse/HUDI-874 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: Udit Mehrotra >Priority: Major > Labels: sev:critical, user-support-issues > > This issue has been discussed here > [https://github.com/apache/incubator-hudi/issues/1581] and at other places as > well. The Glue catalog currently does not support *cascade* for *ALTER TABLE* > statements. As a result, features like adding new columns to an existing table > do not work with the Glue catalog. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-774) Spark to Avro converter incorrectly generates optional fields
[ https://issues.apache.org/jira/browse/HUDI-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313841#comment-17313841 ] sivabalan narayanan commented on HUDI-774: -- related issue : HUDI-1716 > Spark to Avro converter incorrectly generates optional fields > - > > Key: HUDI-774 > URL: https://issues.apache.org/jira/browse/HUDI-774 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Alexander Filipchik >Priority: Major > Labels: pull-request-available, sev:critical, user-support-issues > Fix For: 0.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > > I think https://issues.apache.org/jira/browse/SPARK-28008 is a good > description of what is happening. > > It can cause a situation where the schema in the MOR log files is incompatible > with the schema produced by RowBasedSchemaProvider, so compactions will stall. > > I have a fix which is a bit hacky -> postprocess the schema produced by the > converter and > 1) Make sure unions with null types have those null types at position 0 > 2) They have default values set to null > I couldn't find a way to do a clean fix as some classes that are problematic > are from Hive and called from Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313842#comment-17313842 ] sivabalan narayanan commented on HUDI-1716: --- related issue: HUDI-774 > rt view w/ MOR tables fails after schema evolution > -- > > Key: HUDI-1716 > URL: https://issues.apache.org/jira/browse/HUDI-1716 > Project: Apache Hudi > Issue Type: Bug > Components: Storage Management >Reporter: sivabalan narayanan >Assignee: Aditya Tiwari >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Looks like the realtime view w/ a MOR table fails if the schema present in an existing log > file is evolved to add a new field. No issues w/ writing, but reading fails. > More info: [https://github.com/apache/hudi/issues/2675] > > gist of the stack trace: > Caused by: org.apache.avro.AvroTypeException: Found > hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting > hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field > evolvedField at > org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292) at > org.apache.avro.io.parsing.Parser.advance(Parser.java:88) at > org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) > at > org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165) > at >
org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128) > at > org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106) > at > org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289) > at > org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324) > at > org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252) > ... 24 more21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage > 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): > org.apache.hudi.exception.HoodieException: Exception when reading log file > at > org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261) > at > org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100) > at > org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.(HoodieMergedLogRecordScanner.java:93) > at > org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.(HoodieMergedLogRecordScanner.java:75) > at > org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230) > at > org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328) > at > org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.(HoodieMergeOnReadRDD.scala:210) > at > org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200) > at > org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77) > > Logs from local run: > [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198] > diff with which above logs were generated: > [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec] > > Steps to reproduce in spark 
shell: > # create a MOR table w/ schema1. > # Ingest (with schema1) until log files are created. // verify via hudi-cli. > It took me 2 batches of updates to see a log file. > # create a new schema2 with one additional field. Ingest a batch with > schema2 that updates existing records. > # read the entire dataset. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
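The "missing required field evolvedField" error in the stack trace above is standard Avro schema resolution: a field added to the reader schema without a default cannot be filled in when decoding records written before the field existed. A toy Python check of that rule (a sketch, not Hudi or Avro library code):

```python
def new_fields_are_backward_compatible(old_schema, new_schema):
    """Return True if every field added in new_schema declares a default.

    Without a default, a reader using new_schema cannot decode records
    written with old_schema -- the exact failure mode in HUDI-1716.
    Schemas here are plain dicts mimicking Avro's JSON shape.
    """
    old_names = {f["name"] for f in old_schema["fields"]}
    return all(
        "default" in f
        for f in new_schema["fields"]
        if f["name"] not in old_names
    )
```

Adding `"default": null` (and making the field's type a union with `"null"`) to the evolved field is what makes the evolution backward compatible.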
[GitHub] [hudi] nsivabalan commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…
nsivabalan commented on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-812516130 Yeah, I did verify by enabling the schema compatibility check. It will fail if we try to evolve a field from double to int. ``` scala> dfFromData5.write.format("hudi"). | options(getQuickstartWriteConfigs). | option(PRECOMBINE_FIELD_OPT_KEY, "preComb"). | option(RECORDKEY_FIELD_OPT_KEY, "rowId"). | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionId"). | option("hoodie.index.type","SIMPLE"). | option(TABLE_NAME, tableName). | option("hoodie.avro.schema.validate","true"). | mode(Append). | save(basePath) org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check. at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:629) at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:152) at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:186) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230) ... 
72 elided Caused by: org.apache.hudi.exception.HoodieException: Failed schema compatibility check for writerSchema :{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]},{"name":"versionId","type":["string","null"]},{"name":"doubleToInt","type":["int","null"]}]}, table schema :{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_co mmit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]},{"name":"versionId","type":["string","null"]},{"name":"doubleToInt","type":["double","null"]}]}, base path :file:/tmp/hudi_trips_cow at org.apache.hudi.table.HoodieTable.valida
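The check above rejects the write because Avro only promotes numeric types "upward" (int → long/float/double, long → float/double, float → double); narrowing double to int has no promotion path. A small sketch of those promotion rules, per the Avro specification:

```python
# Avro's numeric promotion rules: a field whose old (writer) type is the
# key may evolve to any type in the value set; nothing promotes downward.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}

def field_evolution_ok(old_type, new_type):
    """True if changing a field from old_type to new_type keeps old data readable."""
    return old_type == new_type or new_type in PROMOTIONS.get(old_type, set())
```

This is why the `doubleToInt` field in the writer/table schemas quoted above fails the compatibility check, while the reverse (int to double) would pass.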
[GitHub] [hudi] aditiwari01 commented on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)
aditiwari01 commented on issue #2756: URL: https://github.com/apache/hudi/issues/2756#issuecomment-812516535 I think I couldn't explain myself. I am using DefaultHoodieRecordPayload only; I have attached a sample command regarding the same. The issue is not with "combineAndGetUpdateValue" but with "preCombine". As per my understanding, combineAndGetUpdateValue is used to merge the record from parquet with the in-memory record, whereas preCombine is used to dedupe multiple in-memory records with the same key. The preCombine function uses the orderingVal field to sort, and while creating a record from a log file we do not set this ordering field; hence the issue. The constructors are as follows: 1. DefaultHoodieRecordPayload(Option record) {this(record, 0);} 2. DefaultHoodieRecordPayload(GenericRecord record, Comparable orderingVal) {super(record, orderingVal)} In the read path we only call the 1st constructor and hence lose the ordering value. Also, if we compact after each commit we don't see this issue, since "combineAndGetUpdateValue" works absolutely fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] aditiwari01 edited a comment on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)
aditiwari01 edited a comment on issue #2756: URL: https://github.com/apache/hudi/issues/2756#issuecomment-812516535 I think I couldn't explain myself. I am using DefaultHoodieRecordPayload only; I have attached a sample command regarding the same. The issue is not with "combineAndGetUpdateValue" but with "preCombine". As per my understanding, combineAndGetUpdateValue is used to merge the record from parquet with the in-memory record, whereas preCombine is used to dedupe multiple in-memory records with the same key. The preCombine function uses the orderingVal field to sort, and while creating a record from a log file we do not set this ordering field; hence the issue. The constructors are as follows: 1. DefaultHoodieRecordPayload(Option record) {this(record, 0);} 2. DefaultHoodieRecordPayload(GenericRecord record, Comparable orderingVal) {super(record, orderingVal)} In the read path we only call the 1st constructor and hence lose the ordering value. Also, if we compact after each commit we don't see this issue, since "combineAndGetUpdateValue" works absolutely fine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
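The constructor asymmetry described in the comment can be illustrated with a simplified Python stand-in for the payload class (hypothetical names; the real class is Hudi's DefaultHoodieRecordPayload):

```python
class PayloadSketch:
    """Simplified model of a record payload carrying an ordering value."""

    def __init__(self, record, ordering_val=0):
        # The log-file read path uses only the single-argument form, so
        # every record deserialized from a log block gets ordering_val == 0.
        self.record = record
        self.ordering_val = ordering_val

    def pre_combine(self, other):
        # Dedupe two in-memory records with the same key: the record with
        # the larger ordering value wins; ties keep `self`.
        return self if self.ordering_val >= other.ordering_val else other
```

On the write path the true ordering values decide the winner; on the read path both collapse to 0, so `pre_combine` degenerates to "first seen wins", which matches the behavior reported in the issue.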
[jira] [Updated] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema
[ https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1453: -- Status: Open (was: New) > Throw Exception when input data schema is not equal to the hoodie table schema > -- > > Key: HUDI-1453 > URL: https://issues.apache.org/jira/browse/HUDI-1453 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > Fix For: 0.9.0 > > > The hoodie table *h0's* schema is : > {code:java} > (id long, price double){code} > when I write the *dataframe* to *h0* with the follow schema: > {code:java} > (id long, price int){code} > An Exception is threw as follow: > {code:java} > at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)at > org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136) at > org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49) > at > org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ... 
4 > moreCaused by: java.lang.UnsupportedOperationException: > org.apache.parquet.avro.AvroConverters$FieldIntegerConverter at > org.apache.parquet.io.api.PrimitiveConverter.addDouble(PrimitiveConverter.java:84) > at > org.apache.parquet.column.impl.ColumnReaderImpl$2$2.writeValue(ColumnReaderImpl.java:228) > at > org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367) > at > org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226) > ... 11 more > {code} > I have enabled *AVRO_SCHEMA_VALIDATE*; it *can pass the schema validation > in HoodieTable#validateUpsertSchema,* so it should be valid to write the "int" data > to the "double" field in Hudi. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema
[ https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313858#comment-17313858 ] sivabalan narayanan commented on HUDI-1453: --- Double to int is not a backwards-compatible schema evolution; if the schema compatibility check is enabled, it will fail. ``` scala> dfFromData5.write.format("hudi"). | options(getQuickstartWriteConfigs). | option(PRECOMBINE_FIELD_OPT_KEY, "preComb"). | option(RECORDKEY_FIELD_OPT_KEY, "rowId"). | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionId"). | option("hoodie.index.type","SIMPLE"). | option(TABLE_NAME, tableName). | option("hoodie.avro.schema.validate","true"). | mode(Append). | save(basePath) org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check. at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:629) at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:152) at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:186) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230) ... 
72 elided Caused by: org.apache.hudi.exception.HoodieException: Failed schema compatibility check for writerSchema :\{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},\{"name":"rowId","type":["string","null"]},\{"name":"partitionId","type":["string","null"]},\{"name":"preComb","type":["long","null"]},\{"name":"name","type":["string","null"]},\{"name":"versionId","type":["string","null"]},\{"name":"doubleToInt","type":["int","null"]}]}, table schema :\{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},\{"name":"rowId","type":["string","null"]},\{"name":"partitionId","type":["string","null"]},\{"name":"preComb","type":["long","null"]},\{"name":"name","type":["string","null"]},\{"name":"versionId","type":["string","null"]},\{"name":"doubleToInt","type":["double","null"]}]}, base path :file:/tmp/hudi_trips_cow at o
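The failure above matches Avro's schema-resolution rules: a numeric type may only be promoted "upward" (int to long/float/double, long to float/double, float to double), so a field evolved from double to int cannot read the existing double data. A minimal illustrative sketch of that promotion rule (plain Python, not Hudi code):

```python
# Avro schema resolution allows only upward numeric promotion: data already
# written with the old type must remain readable under the new type.
PROMOTIONS = {
    "int": {"int", "long", "float", "double"},
    "long": {"long", "float", "double"},
    "float": {"float", "double"},
    "double": {"double"},
}

def is_backwards_compatible(old_type: str, new_type: str) -> bool:
    """True if data already written as old_type can be read as new_type."""
    return new_type in PROMOTIONS.get(old_type, {old_type})

print(is_backwards_compatible("int", "double"))   # True: int -> double is a valid promotion
print(is_backwards_compatible("double", "int"))   # False: double -> int fails the check
```

This is why writing an `int` column into a `double` field is legal under promotion, while evolving the table's `double` to `int` trips the compatibility check.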
[jira] [Resolved] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema
[ https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1453. --- Resolution: Invalid > Throw Exception when input data schema is not equal to the hoodie table schema > -- > > Key: HUDI-1453 > URL: https://issues.apache.org/jira/browse/HUDI-1453 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: pengzhiwei >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available, sev:high, user-support-issues > Fix For: 0.9.0 > > > The hoodie table *h0's* schema is: > {code:java} > (id long, price double){code} > when I write the *dataframe* to *h0* with the following schema: > {code:java} > (id long, price int){code} > An exception is thrown as follows: > {code:java} > at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136) at > org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49) > at > org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:102) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ... 
4 more > Caused by: java.lang.UnsupportedOperationException: > org.apache.parquet.avro.AvroConverters$FieldIntegerConverter at > org.apache.parquet.io.api.PrimitiveConverter.addDouble(PrimitiveConverter.java:84) > at > org.apache.parquet.column.impl.ColumnReaderImpl$2$2.writeValue(ColumnReaderImpl.java:228) > at > org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367) > at > org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226) > ... 11 more > {code} > I have enabled *AVRO_SCHEMA_VALIDATE*; it *can pass the schema validation > in HoodieTable#validateUpsertSchema*, so it should be valid to write the "int" data > to the "double" field in hoodie. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] ssdong edited a comment on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong edited a comment on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487 @jsbali To give some extra insight and detail, as @zherenyu831 posted in the beginning: ``` [20210323080718__replacecommit__COMPLETED]: size : 0 [20210323081449__replacecommit__COMPLETED]: size : 1 [20210323082046__replacecommit__COMPLETED]: size : 1 [20210323082758__replacecommit__COMPLETED]: size : 1 [20210323084004__replacecommit__COMPLETED]: size : 1 [20210323085044__replacecommit__COMPLETED]: size : 1 [20210323085823__replacecommit__COMPLETED]: size : 1 [20210323090550__replacecommit__COMPLETED]: size : 1 [20210323091700__replacecommit__COMPLETED]: size : 1 ``` If we keep everything the same and let the archive logic handle everything, it fails on the empty (size 0) `partitionToReplaceFileIds` of `20210323080718__replacecommit__COMPLETED` (the first item in the list above); this is a known issue. To make the archive work, we tried to _manually_ delete that first _empty_ commit file, `20210323080718__replacecommit__COMPLETED`. This allowed the archive to succeed, but the job then failed with `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list above). Now, to reason through the underlying mechanism of this error: given that the archive was successful, a few commit files have been placed within the `.archive` folder, let's say ``` [20210323081449__replacecommit__COMPLETED]: size : 1 [20210323082046__replacecommit__COMPLETED]: size : 1 [20210323082758__replacecommit__COMPLETED]: size : 1 [20210323084004__replacecommit__COMPLETED]: size : 1 [20210323085044__replacecommit__COMPLETED]: size : 1 ``` have been successfully moved and placed in `.archive`. 
At this moment, the timeline has been updated and there are 3 remaining commit files which are: ``` [20210323085823__replacecommit__COMPLETED]: size : 1 [20210323090550__replacecommit__COMPLETED]: size : 1 [20210323091700__replacecommit__COMPLETED]: size : 1 ``` Now, if you pay attention to the stack trace which caused `User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, and I am just pasting them again: ``` User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530) at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217) at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269) at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228) at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106) at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248) at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353) at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707) at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118) at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179) at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112) ``` After a `close` action being triggered on `TimelineService`, which is understandable, it propagates to `HoodieTableFileSystemView.close` and there is: ``` at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106) at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) at org.apache.hudi.common.table.view.A
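The stack trace above can be boiled down to a small model (names are hypothetical, echoing `resetFileGroupsReplaced` from the trace, and this is not Hudi's actual implementation): on `close`, the file-system view resets and re-reads the details of every replacecommit still referenced by its cached timeline, so if archiving has already moved one of those files out of `.hoodie`, the read throws:

```python
# Hypothetical sketch of the failure mode: a cached timeline still references
# instants whose backing files were already moved to .archive by the archiver.
class HoodieIOException(Exception):
    pass

def reset_file_groups_replaced(cached_timeline, hoodie_dir):
    """Re-read commit details for every replacecommit in the cached timeline."""
    details = {}
    for instant in cached_timeline:
        if instant not in hoodie_dir:  # file already archived, so unreadable
            raise HoodieIOException(f"Could not read commit details from {instant}")
        details[instant] = hoodie_dir[instant]
    return details

# The archiver moved 20210323081449.replacecommit out of .hoodie, but the
# view's cached timeline still lists it:
cached = ["20210323081449.replacecommit", "20210323085823.replacecommit"]
live = {"20210323085823.replacecommit": "..."}
try:
    reset_file_groups_replaced(cached, live)
except HoodieIOException as e:
    print(e)  # Could not read commit details from 20210323081449.replacecommit
```

Under this model, the fix is for the close/reset path to work off the refreshed timeline rather than the stale one cached before archiving.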
[GitHub] [hudi] codecov-io edited a comment on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific partition
codecov-io edited a comment on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific partition
codecov-io edited a comment on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2452?src=pr&el=h1) Report > Merging [#2452](https://codecov.io/gh/apache/hudi/pull/2452?src=pr&el=desc) (8052abf) into [master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc) (e970e1f) will **decrease** coverage by `0.16%`. > The diff coverage is `0.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2452/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2452?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2452 +/- ## - Coverage 52.32% 52.16% -0.17% Complexity 3689 3689 Files 483 484 +1 Lines 23095 23159 +64 Branches 2460 2466 +6 - Hits 12084 12080 -4 - Misses 9942 10010 +68 Partials 1069 1069 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | | | hudicommon | `50.78% <ø> (-0.05%)` | `0.00 <ø> (ø)` | | | hudiflink | `56.71% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `67.57% <0.00%> (-2.12%)` | `0.00 <0.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2452?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../apache/hudi/utilities/HoodiePartitionCleaner.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVBhcnRpdGlvbkNsZWFuZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | | | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.42% <0.00%> (+0.34%)` | `56.00% <0.00%> (+1.00%)` | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jkdll opened a new issue #2760: [SUPPORT] Possibly Incorrect Documentation
jkdll opened a new issue #2760: URL: https://github.com/apache/hudi/issues/2760 Hi, I am using the HudiWriteClient library and have been following the documentation at [this link](https://hudi.apache.org/docs/configurations.html#writeclient-configs) to instantiate the HoodieWriteConfig object. The documentation indicates that the WriteConfig can define the [withAssumeDatePartitioning](https://hudi.apache.org/docs/configurations.html#withAssumeDatePartitioning) attribute. However, upon further inspection, it turns out that this attribute is not present [in the HoodieWriteConfig class](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java). Instead, this field looks to be present in [HoodieMetadataConfig](https://github.com/apache/hudi/blob/03668dbaf1a60428d7e0d68c6622605e0809150a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java). Similarly, [withConsistencyCheckEnabled](https://hudi.apache.org/docs/configurations.html#withConsistencyCheckEnabled) is also not in the HoodieWriteConfig class and its [usage](https://github.com/apache/hudi/blob/03668dbaf1a60428d7e0d68c6622605e0809150a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestBase.java#L146) is not properly documented. Versions from pom.xml:

```xml
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-java-client</artifactId>
  <version>0.7.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-client-common</artifactId>
  <version>0.7.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-client</artifactId>
  <version>0.7.0</version>
  <type>pom</type>
</dependency>
```

Thus I suggest that the documentation be updated. Thanks, -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1036) HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit
[ https://issues.apache.org/jira/browse/HUDI-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1036: -- Labels: sev:normal user-support-issues (was: sev:critical user-support-issues) > HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit > --- > > Key: HUDI-1036 > URL: https://issues.apache.org/jira/browse/HUDI-1036 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.9.0 >Reporter: Bhavani Sudha >Assignee: Nishith Agarwal >Priority: Major > Labels: sev:normal, user-support-issues > Fix For: 0.9.0 > > > Opening this Jira based on the GitHub issue reported here - > [https://github.com/apache/hudi/issues/1735] when hive.input.format = > org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat it is not able to > create HoodieRealtimeFileSplit for querying the _rt table. Please see the GitHub > issue for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1036) HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit
[ https://issues.apache.org/jira/browse/HUDI-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314046#comment-17314046 ] Nishith Agarwal commented on HUDI-1036: --- [~shivnarayan] Thanks for the reminder, I will take a look at this. > HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit > --- > > Key: HUDI-1036 > URL: https://issues.apache.org/jira/browse/HUDI-1036 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Affects Versions: 0.9.0 >Reporter: Bhavani Sudha >Assignee: Nishith Agarwal >Priority: Major > Labels: sev:critical, user-support-issues > Fix For: 0.9.0 > > > Opening this Jira based on the GitHub issue reported here - > [https://github.com/apache/hudi/issues/1735] when hive.input.format = > org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat it is not able to > create HoodieRealtimeFileSplit for querying the _rt table. Please see the GitHub > issue for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)
nsivabalan commented on issue #2756: URL: https://github.com/apache/hudi/issues/2756#issuecomment-812703790 CC @n3nash -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-79) how to query hoodie tables with 'Hive on Spark' engine?
[ https://issues.apache.org/jira/browse/HUDI-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-79: Labels: sev:normal user-support-issues (was: sev:critical user-support-issues) > how to query hoodie tables with 'Hive on Spark' engine? > --- > > Key: HUDI-79 > URL: https://issues.apache.org/jira/browse/HUDI-79 > Project: Apache Hudi > Issue Type: Wish > Components: Hive Integration >Reporter: t oo >Assignee: Nishith Agarwal >Priority: Major > Labels: sev:normal, user-support-issues > > > [https://cwiki.apache.org//confluence/display/Hive/Hive+on+Spark:+Getting+Started] > recommends not having any hive*.jar in the spark/jars folder. But when > running the Hive on Spark exec engine with a non-local spark.master and querying a > hoodie external table, I get ClassNotFound errors about > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
nsivabalan commented on issue #2675: URL: https://github.com/apache/hudi/issues/2675#issuecomment-812710497 Closing this as we have a tracking jira. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
nsivabalan closed issue #2675: URL: https://github.com/apache/hudi/issues/2675 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?
nsivabalan closed issue #2586: URL: https://github.com/apache/hudi/issues/2586 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?
nsivabalan commented on issue #2586: URL: https://github.com/apache/hudi/issues/2586#issuecomment-812712095 Closing this for now. Please feel free to reopen or open a new ticket. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?
nsivabalan commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-812713158 CC @n3nash -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly
nsivabalan commented on issue #2338: URL: https://github.com/apache/hudi/issues/2338#issuecomment-812713376 Closing due to inactivity, but feel free to reopen or create a new ticket. Would be happy to assist you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #2338: [SUPPORT] MOR table found duplicate and process so slowly
nsivabalan closed issue #2338: URL: https://github.com/apache/hudi/issues/2338 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer
codecov-io edited a comment on pull request #2757: URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=h1) Report > Merging [#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=desc) (8dd3a6f) into [master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc) (9804662) will **increase** coverage by `17.56%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2757 +/- ## = + Coverage 52.12% 69.69% +17.56% + Complexity 3646 371 -3275 = Files 480 54 -426 Lines 22867 1993 -20874 Branches 2417 236 -2181 = - Hits 11920 1389 -10531 + Misses 9916 473 -9443 + Partials 1031 131 -900 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.69% <ø> (-0.04%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.08% <0.00%> (-0.30%)` | `55.00% <0.00%> (ø%)` | | | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | | | | | [...g/apache/hudi/common/util/RocksDBSchemaHelper.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUm9ja3NEQlNjaGVtYUhlbHBlci5qYXZh) | | | | | [...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==) | | | | | [...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVdyaXRlclV0aWxzLnNjYWxh) | | | | | [.../org/apache/hudi/common/engine/EngineProperty.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9FbmdpbmVQcm9wZXJ0eS5qYXZh) | | | | | [...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh) | | | | | 
[...he/hudi/common/util/HoodieRecordSizeEstimator.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvSG9vZGllUmVjb3JkU2l6ZUVzdGltYXRvci5qYXZh) | | | | | [...a/org/apache/hudi/streamer/OperationConverter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9PcGVyYXRpb25Db252ZXJ0ZXIuamF2YQ==) | | | | | [...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh) | | | | | ... and [404 more](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructu
[GitHub] [hudi] codecov-io edited a comment on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer
codecov-io edited a comment on pull request #2757: URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=h1) Report > Merging [#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=desc) (9372602) into [master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc) (9804662) will **decrease** coverage by `42.74%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2757 +/- ## - Coverage 52.12% 9.38% -42.75% + Complexity 3646 48 -3598 Files 480 54 -426 Lines 22867 1993 -20874 Branches 2417 236 -2181 - Hits 11920 187 -11733 + Misses 9916 1793 -8123 + Partials 1031 13 -1018 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.38% <ø> (-60.36%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm
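The headline figure in the Codecov report above is just the difference between the two coverage percentages; a quick arithmetic cross-check (values copied from the report summary):

```python
# Cross-check the Codecov headline for PR #2757:
# master coverage 52.12%, this PR 9.38%.
master_cov = 52.12
pr_cov = 9.38

delta = round(pr_cov - master_cov, 2)
print(delta)  # -42.74, matching the reported "decrease coverage by 42.74%"
```

The drop is expected for reports like this where entire module flags (hudicli, hudiclient, etc.) failed to upload and show `?`, so only the hudiutilities flag contributes to the merged figure.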
[GitHub] [hudi] xiarixiaoyao opened a new pull request #2761: [HUDI-1676] Support SQL with spark3
xiarixiaoyao opened a new pull request #2761: URL: https://github.com/apache/hudi/pull/2761 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request Support SQL with spark3, compatible with dataSourceV1, dataSourceV2, and Hive tables:
1. Support CTAS for spark3 (for a MOR table, the ro and rt tables are created at the same time).
2. Support INSERT for spark3.
3. Support MERGE/UPDATE/DELETE without the RowKey constraint for spark3.
The PR adding dataSourceV2 support to Hudi will be put forward in the next few days. This PR supplements https://github.com/apache/hudi/pull/2645, which implements basic SQL support and MERGE INTO with the RowKey constraint. ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
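As a sketch of what the PR description covers, the statements below show the rough shape of CTAS, INSERT, and MERGE against a Hudi table in Spark 3 SQL. The table and column names are illustrative, and the exact syntax is defined by this PR and #2645 rather than by released documentation:

```sql
-- Hypothetical table/columns, shown only to illustrate the statement shapes this PR targets.
CREATE TABLE hudi_ctas_tbl USING hudi
AS SELECT 1 AS id, 'a1' AS name, 10.0 AS price;

INSERT INTO hudi_ctas_tbl SELECT 2, 'a2', 20.0;

-- MERGE without the RowKey constraint (contrast with #2645, which requires it).
MERGE INTO hudi_ctas_tbl t
USING (SELECT 1 AS id, 'a1_new' AS name, 11.0 AS price) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

For a MOR table, per the description, the CTAS would register both the `_ro` and `_rt` views at creation time.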
[jira] [Updated] (HUDI-1676) Support SQL with spark3
[ https://issues.apache.org/jira/browse/HUDI-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1676: - Labels: pull-request-available (was: ) > Support SQL with spark3 > --- > > Key: HUDI-1676 > URL: https://issues.apache.org/jira/browse/HUDI-1676 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration > Affects Versions: 0.9.0 > Reporter: tao meng > Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > 1. support CTAS for spark3 > 3. support INSERT for spark3 > 4. support merge/update/delete without RowKey constraint for spark3 > 5. support dataSourceV2 for spark3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] xiarixiaoyao commented on pull request #2761: [HUDI-1676] Support SQL with spark3
xiarixiaoyao commented on pull request #2761: URL: https://github.com/apache/hudi/pull/2761#issuecomment-812815163 @vinothchandar, could you help review this PR? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2761: [HUDI-1676] Support SQL with spark3
xiarixiaoyao commented on a change in pull request #2761: URL: https://github.com/apache/hudi/pull/2761#discussion_r606622137 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala ## @@ -302,6 +302,10 @@ case class HoodieFileIndex( PartitionRowPath(partitionRow, partitionPath) } +if (partitionRowPaths.isEmpty) { + partitionRowPaths = Seq(PartitionRowPath(InternalRow.empty, "")).toBuffer +} + Review comment: Simple fix for the bug introduced by HUDI-1591. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
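The guard in the diff above handles non-partitioned tables, where the partition-path listing comes back empty. A Python analogue of the control flow (names are illustrative, not Hudi API):

```python
# Python analogue of the HoodieFileIndex guard: when a table has no partition
# columns, the (partition row, partition path) pairs are empty, so a single
# entry with an empty row and "" path is substituted so file listing still
# returns results for the table's base path.
def partition_row_paths(partition_paths):
    pairs = [(tuple(p.split("/")), p) for p in partition_paths]
    if not pairs:  # non-partitioned table
        pairs = [((), "")]
    return pairs

print(partition_row_paths([]))           # one empty-row/empty-path entry
print(partition_row_paths(["dt=2021"]))  # one entry per partition path
```

Without the guard, an empty listing would make the index return no files at all for non-partitioned tables, which matches the regression attributed to HUDI-1591 in the comment.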
[GitHub] [hudi] codecov-io commented on pull request #2761: [HUDI-1676] Support SQL with spark3
codecov-io commented on pull request #2761: URL: https://github.com/apache/hudi/pull/2761#issuecomment-812815750 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2761?src=pr&el=h1) Report > Merging [#2761](https://codecov.io/gh/apache/hudi/pull/2761?src=pr&el=desc) (f404051) into [master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc) (e970e1f) will **decrease** coverage by `42.94%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2761/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2761?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master   #2761       +/-   ##
============================================
- Coverage     52.32%    9.38%    -42.95%
+ Complexity     3689       48      -3641
============================================
  Files           483       54       -429
  Lines         23095     1993     -21102
  Branches       2460      236      -2224
============================================
- Hits          12084      187     -11897
+ Misses         9942     1793      -8149
+ Partials       1069       13      -1056
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.38% <ø> (-60.32%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2761?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlc