[GitHub] [hudi] wzx140 commented on pull request #6745: Fix comment in RFC46
wzx140 commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272461930

@hudi-bot run azure

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272455607

## CI report:

* 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
* a455e4c67d1ac237ef999ac8d6aa584af2f4cd1f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12081)

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
svn commit: r57231 - in /dev/hudi/hudi-0.12.1-rc2: ./ hudi-0.12.1-rc2.src.tgz.asc
Author: yuzhaojing
Date: Sun Oct 9 04:04:16 2022
New Revision: 57231

Log: (empty)

Added:
    dev/hudi/hudi-0.12.1-rc2/
    dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc

Added: dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc
==============================================================================
--- dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc (added)
+++ dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc Sun Oct 9 04:04:16 2022
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEtDBVGfNt1+i35qaEWLhbgUd4POIFAmNCRrcACgkQWLhbgUd4
+POJz4w/+IiYdMJ9T3EokFjbvSRboaf7CCxFJQI6Oo4jWffNMe5ucUe7HmK9FVYzZ
+MmspbWifHrv9lojKVf9Lr37CCo+3SjktTGVPh1Ux7qOMQGJjlG4/Mf9oWq0cnKys
+pGpGFdT8P39ETtlad2ic+JYyHTidE9lwIM0/p1syZO2sTL6e1093i+COfrgQxzjQ
+BRcM95oeNmFOTjrJfKChM86IqZJqejl0duYci4BqxcYB+NgLIYtXYDxEQeGtP13N
+Y2zPVDJ1FNkltgrMBaK2m8eh/4ZUALLgzVGAf/jaOc/Zw1rvdBCJrpYUCsAiH04y
+aYsIFfCgvazrZsG/bLCa8kzR1TkjJenHNuvqQz8nhRJK+7pAludM+S3mTgVPdi8h
+/wZUFNSNDGtaFX783FYVhZ9RKvyY4hH7OBlojlTbF6hqqpAXFLFr0BVj4Lri76Z/
+ow+RHu06heVqgXhvFSycXuB0t3JiSZuvF8JwvkKxRqek2vh6MXU3jnn04vipKgx7
+pel2z3wh6aVEwJT/VFNUtqPyp/6yVxs0p+94h2Hhd5FChYOh2mi8/MI0pBHhB+Vw
+5WrqBkEbTxjiMCGyE78yK2V5tAcwV/WaH4kQ+oDTIBPa6zsNicUP7CrvvgqwBB04
+mdHuzmQtDTZoaLD7meyrTTl3xC4h4Om6JBAeFcomB4sX/uhDXVE=
+=IV7M
+-----END PGP SIGNATURE-----
[hudi] annotated tag release-0.12.1-rc2 updated (baeff4331d -> 0672671a24)
This is an automated email from the ASF dual-hosted git repository.

yuzhaojing pushed a change to annotated tag release-0.12.1-rc2
in repository https://gitbox.apache.org/repos/asf/hudi.git

*** WARNING: tag release-0.12.1-rc2 was modified! ***

    from baeff4331d (commit)
      to 0672671a24 (tag)
 tagging baeff4331dd25742f8280553281b773bc5e570a5 (commit)
 replaces release-0.12.1-rc1
      by 喻兆靖
      on Sun Oct 9 12:00:41 2022 +0800

- Log -----------------------------------------------------------------
0.12.1
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEtDBVGfNt1+i35qaEWLhbgUd4POIFAmNCR2kACgkQWLhbgUd4
POIlfw/+MoDAQDus6WUXKEPLfvhXiM3AK/xw6fupGd1N0ge6mA1FR4A0YJ1mFpEH
hNw+ZI7Nn43FeikBTq3FeqSBkHYJpDu/OmAoN7trEyidpb09uzAcFSte2XZfJK0s
xf5s0kmSl5DP7PCE4+B6DhJDDG11G40HYGiyoOoOwU1XMpOfEdQUfxU+dorwy7dj
gDucsTqHRFviiN27KsG0ONSihZhY3ODStxsq5zDqsDvaVJZfSafZE8txsFyuGJle
7/mwBuKuhP+hmc0o9m9N3gY/aGTCCTX2V309um8H8H5l/dq3Dm4gCA4NB750WuFJ
EPOUpSXwk9QXgGRIV52aOBZAVDIzZc4ME4Ngav/XFVHK/rL9zJ7iEmZ0lbQtDzt8
wXR2Ljh4F7hETQ4jG2G67ETh+XnQQQfKAkpdTigS5ox2vU4023oyx9XMQRpLx5Mc
ifsk1YeoFx0ptlm33bqaV2OGutB3t1f51iEBt5sc/lf10JO8ehf0kJwIGvNGlE6C
8hTIsQHCf4qCrnpo7/pEFZQ9XskrWnBgM9kXZhMdssiUvQ5LJHD+MWGt15pL8VM5
w8zJmIId/vAubrTAujTdV+i2WgFi0rEKsB7gp/ITroheXv3V3GRg3voNaB5Wx+DI
LsDOloyvO0z14IO13RQ4gXuiuBwMjKPBXTjPXDXUuq5fY/EBtxI=
=tdzo
-----END PGP SIGNATURE-----
-----------------------------------------------------------------------

No new revisions were added by this update.

Summary of changes:
[hudi] 01/02: [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
This is an automated email from the ASF dual-hosted git repository.

yuzhaojing pushed a commit to branch release-0.12.1
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 51105621af08eb77a8007ac965a8e5d0882c5d97
Author: Alexey Kudinkin
AuthorDate: Fri Oct 7 03:37:26 2022 -0700

    [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
---
 .../apache/hudi/io/storage/HoodieOrcWriter.java    |  10 +-
 .../hudi/avro/TestHoodieAvroParquetWriter.java     | 118 +
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java |   7 +-
 .../row/HoodieRowDataParquetWriteSupport.java      |  55 --
 .../storage/row/HoodieRowParquetWriteSupport.java  |  61 +--
 .../row/TestHoodieInternalRowParquetWriter.java    |  95 ++---
 .../apache/hudi/avro/HoodieAvroWriteSupport.java   |  60 +--
 .../hudi/avro/HoodieBloomFilterWriteSupport.java   |  96 +
 .../org/apache/hudi/common/util/BaseFileUtils.java |  13 +--
 .../hudi/avro/TestHoodieAvroWriteSupport.java      |  67 
 10 files changed, 361 insertions(+), 221 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
index a532ac66c9..4bcab2cec8 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
@@ -23,6 +23,7 @@
 import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.generic.IndexedRecord;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
+import org.apache.hudi.avro.HoodieBloomFilterWriteSupport;
 import org.apache.hudi.common.bloom.BloomFilter;
 import org.apache.hudi.common.bloom.HoodieDynamicBoundedBloomFilter;
 import org.apache.hudi.common.engine.TaskContextSupplier;
@@ -44,9 +45,6 @@
 import java.util.List;
 import java.util.concurrent.atomic.AtomicLong;
 import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_AVRO_BLOOM_FILTER_METADATA_KEY;
-import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_BLOOM_FILTER_TYPE_CODE;
-import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MAX_RECORD_KEY_FOOTER;
-import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MIN_RECORD_KEY_FOOTER;
 public class HoodieOrcWriter implements HoodieFileWriter, Closeable {
@@ -155,11 +153,11 @@ public class HoodieOrcWriter
http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.DummyTaskContextSupplier;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.BloomFilterFactory;
+import org.apache.hudi.common.bloom.BloomFilterTypeCode;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ParquetUtils;
+import org.apache.hudi.io.storage.HoodieAvroParquetWriter;
+import org.apache.hudi.io.storage.HoodieParquetConfig;
+import org.apache.parquet.avro.AvroSchemaConverter;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.apache.parquet.hadoop.metadata.FileMetaData;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestHoodieAvroParquetWriter {
+
+  @TempDir java.nio.file.Path tmpDir;
+
+  @Test
+  public void testProperWriting() throws IOException {
+    Configuration hadoopConf = new Configuration();
+
+    HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator(0xDEED);
+    List records = dataGen.generateGenericRecords(10);
+
+    Schema schema = records.get(0).getSchema();
+
+    BloomFilter filter = BloomFilterFactory.createBloomFilter(1000, 0.0001, 1,
+        BloomFilterTypeCode.DYNAMIC_V0.name());
+    HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(new AvroSchemaConverter().convert(schema),
+        schema, Option.of(filter));
+
+    HoodieParquetConfig parquetConfig =
+        new
[hudi] branch release-0.12.1 updated (28cb191df7 -> baeff4331d)
This is an automated email from the ASF dual-hosted git repository.

yuzhaojing pushed a change to branch release-0.12.1
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 28cb191df7 [MINOR] Update release version to reflect published version 0.12.1
     new 51105621af [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
     new baeff4331d Bumping release candidate number 2

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 docker/hoodie/hadoop/base/pom.xml                  |   2 +-
 docker/hoodie/hadoop/base_java11/pom.xml           |   2 +-
 docker/hoodie/hadoop/datanode/pom.xml              |   2 +-
 docker/hoodie/hadoop/historyserver/pom.xml         |   2 +-
 docker/hoodie/hadoop/hive_base/pom.xml             |   2 +-
 docker/hoodie/hadoop/namenode/pom.xml              |   2 +-
 docker/hoodie/hadoop/pom.xml                       |   2 +-
 docker/hoodie/hadoop/prestobase/pom.xml            |   2 +-
 docker/hoodie/hadoop/spark_base/pom.xml            |   2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml            |   2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml           |   2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml           |   2 +-
 docker/hoodie/hadoop/trinobase/pom.xml             |   2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml      |   2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml           |   2 +-
 hudi-aws/pom.xml                                   |   4 +-
 hudi-cli/pom.xml                                   |   2 +-
 hudi-client/hudi-client-common/pom.xml             |   4 +-
 .../apache/hudi/io/storage/HoodieOrcWriter.java    |  10 +-
 .../hudi/avro/TestHoodieAvroParquetWriter.java     | 118 +
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java |   7 +-
 hudi-client/hudi-flink-client/pom.xml              |   4 +-
 .../row/HoodieRowDataParquetWriteSupport.java      |  55 --
 hudi-client/hudi-java-client/pom.xml               |   4 +-
 hudi-client/hudi-spark-client/pom.xml              |   4 +-
 .../storage/row/HoodieRowParquetWriteSupport.java  |  61 +--
 .../row/TestHoodieInternalRowParquetWriter.java    |  95 ++---
 hudi-client/pom.xml                                |   2 +-
 hudi-common/pom.xml                                |   2 +-
 .../apache/hudi/avro/HoodieAvroWriteSupport.java   |  60 +--
 .../hudi/avro/HoodieBloomFilterWriteSupport.java   |  96 +
 .../org/apache/hudi/common/util/BaseFileUtils.java |  13 +--
 .../hudi/avro/TestHoodieAvroWriteSupport.java      |  67 
 hudi-examples/hudi-examples-common/pom.xml         |   2 +-
 hudi-examples/hudi-examples-flink/pom.xml          |   2 +-
 hudi-examples/hudi-examples-java/pom.xml           |   2 +-
 hudi-examples/hudi-examples-spark/pom.xml          |   2 +-
 hudi-examples/pom.xml                              |   2 +-
 hudi-flink-datasource/hudi-flink/pom.xml           |   4 +-
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml     |   4 +-
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml     |   4 +-
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml     |   4 +-
 hudi-flink-datasource/pom.xml                      |   4 +-
 hudi-gcp/pom.xml                                   |   2 +-
 hudi-hadoop-mr/pom.xml                             |   2 +-
 hudi-integ-test/pom.xml                            |   2 +-
 hudi-kafka-connect/pom.xml                         |   4 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml    |   4 +-
 hudi-spark-datasource/hudi-spark/pom.xml           |   4 +-
 hudi-spark-datasource/hudi-spark2-common/pom.xml   |   2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml          |   4 +-
 hudi-spark-datasource/hudi-spark3-common/pom.xml   |   2 +-
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml      |   4 +-
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml      |   4 +-
 .../hudi-spark3.2plus-common/pom.xml               |   2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml      |   4 +-
 hudi-spark-datasource/pom.xml                      |   2 +-
 hudi-sync/hudi-adb-sync/pom.xml                    |   2 +-
 hudi-sync/hudi-datahub-sync/pom.xml                |   2 +-
 hudi-sync/hudi-hive-sync/pom.xml                   |   2 +-
 hudi-sync/hudi-sync-common/pom.xml                 |   2 +-
 hudi-sync/pom.xml                                  |   2 +-
 hudi-tests-common/pom.xml                          |   2 +-
 hudi-timeline-service/pom.xml                      |   2 +-
 hudi-utilities/pom.xml                             |   2 +-
 packaging/hudi-aws-bundle/pom.xml                  |   2 +-
 packaging/hudi-datahub-sync-bundle/pom.xml         |   2 +-
 packaging/hudi-flink-bundle/pom.xml                |   2 +-
 packaging/hudi-gcp-bundle/pom.xml                  |   2 +-
[hudi] 02/02: Bumping release candidate number 2
This is an automated email from the ASF dual-hosted git repository.

yuzhaojing pushed a commit to branch release-0.12.1
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit baeff4331dd25742f8280553281b773bc5e570a5
Author: 喻兆靖
AuthorDate: Sun Oct 9 11:56:52 2022 +0800

    Bumping release candidate number 2
---
 docker/hoodie/hadoop/base/pom.xml                      | 2 +-
 docker/hoodie/hadoop/base_java11/pom.xml               | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml                  | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml             | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml                 | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml                  | 2 +-
 docker/hoodie/hadoop/pom.xml                           | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml                | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml                | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml                | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml               | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml               | 2 +-
 docker/hoodie/hadoop/trinobase/pom.xml                 | 2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml          | 2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml               | 2 +-
 hudi-aws/pom.xml                                       | 4 ++--
 hudi-cli/pom.xml                                       | 2 +-
 hudi-client/hudi-client-common/pom.xml                 | 4 ++--
 hudi-client/hudi-flink-client/pom.xml                  | 4 ++--
 hudi-client/hudi-java-client/pom.xml                   | 4 ++--
 hudi-client/hudi-spark-client/pom.xml                  | 4 ++--
 hudi-client/pom.xml                                    | 2 +-
 hudi-common/pom.xml                                    | 2 +-
 hudi-examples/hudi-examples-common/pom.xml             | 2 +-
 hudi-examples/hudi-examples-flink/pom.xml              | 2 +-
 hudi-examples/hudi-examples-java/pom.xml               | 2 +-
 hudi-examples/hudi-examples-spark/pom.xml              | 2 +-
 hudi-examples/pom.xml                                  | 2 +-
 hudi-flink-datasource/hudi-flink/pom.xml               | 4 ++--
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml         | 4 ++--
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml         | 4 ++--
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml         | 4 ++--
 hudi-flink-datasource/pom.xml                          | 4 ++--
 hudi-gcp/pom.xml                                       | 2 +-
 hudi-hadoop-mr/pom.xml                                 | 2 +-
 hudi-integ-test/pom.xml                                | 2 +-
 hudi-kafka-connect/pom.xml                             | 4 ++--
 hudi-spark-datasource/hudi-spark-common/pom.xml        | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml               | 4 ++--
 hudi-spark-datasource/hudi-spark2-common/pom.xml       | 2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml              | 4 ++--
 hudi-spark-datasource/hudi-spark3-common/pom.xml       | 2 +-
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml          | 4 ++--
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml          | 4 ++--
 hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml          | 4 ++--
 hudi-spark-datasource/pom.xml                          | 2 +-
 hudi-sync/hudi-adb-sync/pom.xml                        | 2 +-
 hudi-sync/hudi-datahub-sync/pom.xml                    | 2 +-
 hudi-sync/hudi-hive-sync/pom.xml                       | 2 +-
 hudi-sync/hudi-sync-common/pom.xml                     | 2 +-
 hudi-sync/pom.xml                                      | 2 +-
 hudi-tests-common/pom.xml                              | 2 +-
 hudi-timeline-service/pom.xml                          | 2 +-
 hudi-utilities/pom.xml                                 | 2 +-
 packaging/hudi-aws-bundle/pom.xml                      | 2 +-
 packaging/hudi-datahub-sync-bundle/pom.xml             | 2 +-
 packaging/hudi-flink-bundle/pom.xml                    | 2 +-
 packaging/hudi-gcp-bundle/pom.xml                      | 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml                | 2 +-
 packaging/hudi-hive-sync-bundle/pom.xml                | 2 +-
 packaging/hudi-integ-test-bundle/pom.xml               | 2 +-
 packaging/hudi-kafka-connect-bundle/pom.xml            | 2 +-
 packaging/hudi-presto-bundle/pom.xml                   | 2 +-
 packaging/hudi-spark-bundle/pom.xml                    | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml          | 2 +-
 packaging/hudi-trino-bundle/pom.xml                    | 2 +-
 packaging/hudi-utilities-bundle/pom.xml                | 2 +-
 packaging/hudi-utilities-slim-bundle/pom.xml           | 2 +-
 pom.xml                                                | 2 +-
 70 files changed, 87 insertions(+), 87 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml
index 39ceb4006b..8cbaa9fc06 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272447816

## CI report:

* bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
* 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
* f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)
* 3bc9f046410bead2b9f17a35e552c2a868d523c0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12082)

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272447239

## CI report:

* bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
* 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
* f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)
* 3bc9f046410bead2b9f17a35e552c2a868d523c0 UNKNOWN

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272446498

## CI report:

* bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
* 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
* f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] slfan1989 commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.
slfan1989 commented on PR #6893:
URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272438799

@xushiyan Can you help review this PR? Thank you very much! This change replaces jackson-v1 with jackson-v2 to reduce security risks. For background, see this article: https://cowtowncoder.medium.com/on-jackson-cves-dont-panic-here-is-what-you-need-to-know-54cd0d6e8062
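For context on what such a migration involves: jackson-v1 classes live under the `org.codehaus.jackson` package, while their jackson-v2 counterparts live under `com.fasterxml.jackson` (for example, `org.codehaus.jackson.map.ObjectMapper` becomes `com.fasterxml.jackson.databind.ObjectMapper`). Below is a rough stdlib-only sketch, not part of the PR, of a hypothetical helper that flags legacy jackson-v1 imports in a source file:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the Hudi codebase): flags jackson-v1 imports
// so they can be migrated to their jackson-v2 equivalents.
public class JacksonImportCheck {

  // jackson-v1 root package; jackson-v2 classes moved to com.fasterxml.jackson
  private static final String LEGACY_PREFIX = "import org.codehaus.jackson.";

  public static List<String> legacyImports(String source) {
    List<String> hits = new ArrayList<>();
    for (String line : source.split("\n")) {
      if (line.trim().startsWith(LEGACY_PREFIX)) {
        hits.add(line.trim());
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    String src = "import org.codehaus.jackson.map.ObjectMapper;\n"
        + "import com.fasterxml.jackson.databind.ObjectMapper;\n";
    // only the first import is flagged as legacy jackson-v1
    System.out.println(legacyImports(src));
  }
}
```

A check like this could run in CI to keep jackson-v1 from creeping back in after the migration lands.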
[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272437525

## CI report:

* 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
* 82dd925f9018c0ec3fb3bfaa09f70174010af90c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12079)
* a455e4c67d1ac237ef999ac8d6aa584af2f4cd1f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12081)

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
xiarixiaoyao commented on code in PR #6284:
URL: https://github.com/apache/hudi/pull/6284#discussion_r990721773

## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:

@@ -92,11 +92,12 @@ protected HoodieMergedLogRecordScanner(FileSystem fs, String basePath, List
-    this.records = new ExternalSpillableMap<>(maxMemorySizeInBytes, spillableMapBasePath, new DefaultSizeEstimator(),
+    this.records = new ExternalSpillableMap<>(maxMemorySizeInBytes, basePath + spillableMapBasePath, new DefaultSizeEstimator(),
         new HoodieRecordSizeEstimator(readerSchema), diskMapType, isBitCaskDiskMapCompressionEnabled);
+

Review Comment:
    basePath is an HDFS path, while spillableMapBasePath is a local path. It is wrong to use the HDFS path directly as the spillableMapBasePath.
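The reviewer's concern can be seen with plain string handling: concatenating an HDFS table base path onto a local spill directory yields a string that is still an HDFS URI, not a usable local directory. A minimal stdlib-only sketch (the paths and method names here are hypothetical, not Hudi code):

```java
// Illustrates the review concern: prefixing a local spill dir with an HDFS
// table base path yields a string that is not a valid local directory.
public class SpillPathExample {

  // what the `basePath + spillableMapBasePath` expression in the diff evaluates to
  static String concatenated(String basePath, String spillableMapBasePath) {
    return basePath + spillableMapBasePath;
  }

  public static void main(String[] args) {
    String basePath = "hdfs://nameservice/warehouse/hudi_table"; // table path on HDFS
    String spillDir = "/tmp/hudi_spill";                         // local disk path
    String joined = concatenated(basePath, spillDir);
    // the result is still an HDFS URI, so local spill files cannot be created there
    System.out.println(joined);
    System.out.println(joined.startsWith("hdfs://")); // true
  }
}
```

This is why the review asks to keep the spillable-map directory a purely local path rather than deriving it from the table's base path.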
[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272436538

## CI report:

* 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
* 4b0a4e72766491e15dbeb8ed904c9aabae32bb89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11563)
* 82dd925f9018c0ec3fb3bfaa09f70174010af90c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12079)
* a455e4c67d1ac237ef999ac8d6aa584af2f4cd1f UNKNOWN

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] Aload commented on issue #6618: Caused by: org.apache.http.NoHttpResponseException: xxxxxx:34812 failed to respond[SUPPORT]
Aload commented on issue #6618:
URL: https://github.com/apache/hudi/issues/6618#issuecomment-1272431470

> @Aload can you verify if the patch is used in your version of hudi? and still having the problem?
>
> > I have encountered this problem, this PR may solve your problem: #6393
>
> in order to help diagnose, we need more info also to reproduce it, like configs and code snippet

Yes, version 0.12.0.
[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272426439

## CI report:

* 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
* 4b0a4e72766491e15dbeb8ed904c9aabae32bb89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11563)
* 82dd925f9018c0ec3fb3bfaa09f70174010af90c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12079)

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272425963

## CI report:

* bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
* 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
* f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xiarixiaoyao commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
xiarixiaoyao commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272425949

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272425889

## CI report:

* 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
* 4b0a4e72766491e15dbeb8ed904c9aabae32bb89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11563)
* 82dd925f9018c0ec3fb3bfaa09f70174010af90c UNKNOWN

Bot commands:
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] suryaprasanna commented on a diff in pull request #5958: [HUDI-3900] [UBER] Support log compaction action for MOR tables
suryaprasanna commented on code in PR #5958: URL: https://github.com/apache/hudi/pull/5958#discussion_r985294717

## hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
## @@ -362,6 +381,228 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
     }
   }

+  private void scanInternalV2(Option<KeySpec> keySpecOption, boolean skipProcessingBlocks) {
+    currentInstantLogBlocks = new ArrayDeque<>();
+    progress = 0.0f;
+    totalLogFiles = new AtomicLong(0);
+    totalRollbacks = new AtomicLong(0);
+    totalCorruptBlocks = new AtomicLong(0);
+    totalLogBlocks = new AtomicLong(0);
+    totalLogRecords = new AtomicLong(0);
+    HoodieLogFormatReader logFormatReaderWrapper = null;
+    HoodieTimeline commitsTimeline = this.hoodieTableMetaClient.getCommitsTimeline();
+    HoodieTimeline completedInstantsTimeline = commitsTimeline.filterCompletedInstants();
+    HoodieTimeline inflightInstantsTimeline = commitsTimeline.filterInflights();
+    try {
+      // Get the key field based on populate meta fields config
+      // and the table type
+      final String keyField = getKeyField();
+
+      boolean enableRecordLookups = !forceFullScan;
+      // Iterate over the paths
+      logFormatReaderWrapper = new HoodieLogFormatReader(fs,
+          logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
+          readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);
+
+      /**
+       * Scanning log blocks and placing the compacted blocks at the right place requires two traversals.
+       * The first traversal identifies the rollback blocks and the valid data and compacted blocks.
+       *
+       * Scanning blocks is easy in single-writer mode, where a rollback block immediately follows the data blocks it affects.
+       * In multi-writer mode the blocks can be out of order. An example scenario:
+       * B1, B2, B3, B4, R1(B3), B5
+       * Here, rollback block R1 invalidates B3, which is not the immediately preceding block.
+       * This becomes more complicated when we have compacted blocks, which are data blocks created using log compaction.
+       *
+       * To solve this, run a single traversal and collect all the valid blocks that are not corrupted,
+       * along with the block instant times and the rollback blocks' target instant times.
+       *
+       * As part of the second traversal, iterate over the block instant times in reverse order.
+       * While iterating in reverse order, keep track of the final compacted instant time for each block.
+       * In doing so, when a data block is seen, include the final compacted block if it is not already added.
+       *
+       * The goal is to find the final compacted block, which contains the merged contents.
+       * For example, B1 and B2 are merged into a compacted block called M1, and then M1, B3 and B4 are merged into
+       * another compacted block called M2. M2 is now the final block containing all the changes of B1, B2, B3 and B4.
+       * So blockTimeToCompactionBlockTimeMap will look like
+       * (B1 -> M2), (B2 -> M2), (B3 -> M2), (B4 -> M2), (M1 -> M2)
+       * This map is updated while iterating and is used to place the compacted blocks in the correct position.
+       * This way we can have multiple layers of merge blocks and still be able to find the correct positions of merged blocks.
+       */
+
+      // Collect targetRollbackInstants, using which we can determine which blocks are invalid.
+      Set<String> targetRollbackInstants = new HashSet<>();
+
+      // This holds block instant time to list of blocks. Note that the log blocks can be normal data blocks or compacted log blocks.
+      Map<String, List<HoodieLogBlock>> instantToBlocksMap = new HashMap<>();
+
+      // Order of instants.
+      List<String> orderedInstantsList = new ArrayList<>();
+
+      Set<HoodieLogFile> scannedLogFiles = new HashSet<>();
+
+      /*
+       * 1. First step: traverse in the forward direction. While traversing the log blocks, collect the following:
+       *    a. instant times
+       *    b. instant to log-blocks map
+       *    c. targetRollbackInstants
+       */
+      while (logFormatReaderWrapper.hasNext()) {
+        HoodieLogFile logFile = logFormatReaderWrapper.getLogFile();
+        LOG.info("Scanning log file " + logFile);
+        scannedLogFiles.add(logFile);
+        totalLogFiles.set(scannedLogFiles.size());
+        // Use the HoodieLogFileReader to iterate through the blocks in the log file
+        HoodieLogBlock logBlock = logFormatReaderWrapper.next();
+        final String instantTime = logBlock.getLogBlockHeader().get(INSTANT_TIME);
+        totalLogBlocks.incrementAndGet();
+        // Ignore the corrupt blocks. No further handling is required for them.
+        if (logBlock.getBlockType().equals(CORRUPT_BLOCK)) {
+          LOG.info("Found a corrupt block in " + logFile.getPath());
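The reverse-traversal placement described in the review comment above — mapping every data or compacted block to the final compacted block that ultimately absorbed it — can be sketched in isolation. This is not the Hudi implementation: the class, method name (`buildFinalCompactionMap`), and input shapes (an ordered instant list plus a map from each compacted block to the instants it merged) are illustrative assumptions; only the resulting (block -> final compacted block) mapping mirrors the `blockTimeToCompactionBlockTimeMap` example in the comment.

```java
import java.util.*;

// Standalone sketch (NOT the Hudi code): given blocks in commit order and a map
// from each compacted block to the instants it replaced, compute, for every
// merged block, the final compacted block that contains its data.
public class CompactionMapSketch {

  static Map<String, String> buildFinalCompactionMap(
      List<String> orderedInstants,                    // e.g. B1, B2, M1, B3, B4, M2
      Map<String, List<String>> compactedToSources) {  // M1 -> [B1, B2], M2 -> [M1, B3, B4]
    Map<String, String> blockToFinal = new HashMap<>();
    // Iterate newest-to-oldest, mirroring the reverse traversal in the PR.
    for (int i = orderedInstants.size() - 1; i >= 0; i--) {
      String instant = orderedInstants.get(i);
      // The final target of this block: either a later compacted block that
      // already claimed it, or (if it is itself a compacted block) itself.
      String finalTarget = blockToFinal.get(instant);
      if (finalTarget == null && compactedToSources.containsKey(instant)) {
        finalTarget = instant;
      }
      if (finalTarget != null && compactedToSources.containsKey(instant)) {
        // Everything this compacted block merged resolves to the same final block.
        for (String source : compactedToSources.get(instant)) {
          blockToFinal.put(source, finalTarget);
        }
      }
    }
    return blockToFinal;
  }

  public static void main(String[] args) {
    List<String> instants = Arrays.asList("B1", "B2", "M1", "B3", "B4", "M2");
    Map<String, List<String>> compacted = new HashMap<>();
    compacted.put("M1", Arrays.asList("B1", "B2"));
    compacted.put("M2", Arrays.asList("M1", "B3", "B4"));
    // Matches the example in the review comment:
    // (B1 -> M2), (B2 -> M2), (B3 -> M2), (B4 -> M2), (M1 -> M2)
    System.out.println(buildFinalCompactionMap(instants, compacted));
  }
}
```

Because each compacted block resolves through the map built by later iterations, the same pass handles multiple layers of merge blocks (M1 folded into M2, and so on).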
[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency
hudi-bot commented on PR #6896: URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272399503 ## CI report: * 1e185f00b79069df14222048fb0b7b834292d2c6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12077) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272399393 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * d5266737aed5cee1b62592371219d944312c06b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12078)
[GitHub] [hudi] pratyakshsharma commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability
pratyakshsharma commented on PR #5071: URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272398411 @nsivabalan please take a pass, this should be good to review.
[GitHub] [hudi] hudi-bot commented on pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7
hudi-bot commented on PR #6871: URL: https://github.com/apache/hudi/pull/6871#issuecomment-1272388964 ## CI report: * efdbd9edebed1d540916f981722038f24d9c7266 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12076)
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272378288 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064) * d5266737aed5cee1b62592371219d944312c06b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12078)
[GitHub] [hudi] hudi-bot commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability
hudi-bot commented on PR #5071: URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272378107 ## CI report: * b7203e6d2d6f1e8d3121024faedfa2da1ccc0c71 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7088)
[GitHub] [hudi] hudi-bot commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability
hudi-bot commented on PR #5071: URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272377275 ## CI report: * b7203e6d2d6f1e8d3121024faedfa2da1ccc0c71 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7088)
[GitHub] [hudi] pratyakshsharma commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability
pratyakshsharma commented on PR #5071: URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272374574 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272367848 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064) * d5266737aed5cee1b62592371219d944312c06b4 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency
hudi-bot commented on PR #6896: URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272366290 ## CI report: * 97406bce1fcbf575139682cd0659fa154fbb214f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12074) * 1e185f00b79069df14222048fb0b7b834292d2c6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12077)
[GitHub] [hudi] hudi-bot commented on pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7
hudi-bot commented on PR #6871: URL: https://github.com/apache/hudi/pull/6871#issuecomment-1272366267 ## CI report: * 050ce213e4faa481abafff1f9127bd91753f2d6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11996) * efdbd9edebed1d540916f981722038f24d9c7266 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12076)
[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency
hudi-bot commented on PR #6896: URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272365457 ## CI report: * 97406bce1fcbf575139682cd0659fa154fbb214f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12074) * 1e185f00b79069df14222048fb0b7b834292d2c6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7
hudi-bot commented on PR #6871: URL: https://github.com/apache/hudi/pull/6871#issuecomment-1272365434 ## CI report: * 050ce213e4faa481abafff1f9127bd91753f2d6d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11996) * efdbd9edebed1d540916f981722038f24d9c7266 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272364193 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073)
[hudi] branch dependabot/maven/com.google.protobuf-protobuf-java-3.21.7 updated (050ce213e4 -> efdbd9edeb)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/com.google.protobuf-protobuf-java-3.21.7 in repository https://gitbox.apache.org/repos/asf/hudi.git

 discard 050ce213e4 Bump protobuf-java from 3.21.5 to 3.21.7
 add 48e5bb0fed [HOTFIX] Fix source release validate script (#6865)
 add 9f5d16529d [HUDI-4980] Calculate avg record size using commit only (#6864)
 add 067cc24d88 Revert "[HUDI-4915] improve avro serializer/deserializer (#6788)" (#6809)
 add fb4f026580 [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (#6857)
 add 280194d3b6 Enhancing README for multi-writer tests (#6870)
 add fd8a947e61 [MINOR] Fix deploy script for flink 1.15 (#6872)
 add a51181726c [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
 add c5125d38b5 [HUDI-4972] Fixes to make unit tests work on m1 mac (#6751)
 add 06d924137b [HUDI-2786] Docker demo on mac aarch64 (#6859)
 add 9c1fa14fd6 add support for unraveling proto schemas
 add 510d525e15 fix some compile issues
 add aad9ec1320 naming and style updates
 add 889927 make test data random, reuse code
 add a922a5beca add test for 2 different recursion depths, fix schema cache key
 add 3b37dc95d9 add unsigned long support
 add 706291d4f3 better handle other types
 add c28e874fca rebase on 4904
 add 190cc16381 get all tests working
 add f18fff886e fix oneof expected schema, update tests after rebase
 add ff5baa8706 revert scala binary change
 add 0069da2d1a try a different method to avoid avro version
 add 71a39bf488 Merge remote-tracking branch 'origin/master' into HUDI-4905
 add c5dff63375 delete unused file
 add f53d47ea3b address PR feedback, update decimal precision
 add 1831639e39 fix isNullable issue, check if class is Int64value
 add eca2992d65 checkstyle fix
 add 423da6f7bb change wrapper descriptor set initialization
 add fb2d9f0030 add in testing for unsigned long to BigInteger conversion
 add f03f9610cf shade protobuf dependency
 add 57f8b81194 Merge remote-tracking branch 'origin/master' into HUDI-4905
 add 7d5b9dc0a9 Revert "shade protobuf dependency"
 add 5d2c2853ea [HUDI-4905] Improve type handling in proto schema conversion
 add 182475a854 [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873)
 add efdbd9edeb Bump protobuf-java from 3.21.5 to 3.21.7

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O   (050ce213e4)
            \
             N -- N -- N      refs/heads/dependabot/maven/com.google.protobuf-protobuf-java-3.21.7 (efdbd9edeb)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update.
Summary of changes:
 ...ose_hadoop284_hive233_spark244_mac_aarch64.yml} | 131 +++
 docker/setup_demo.sh | 10 +-
 docker/stop_demo.sh | 7 +-
 .../cli/commands/TestUpgradeDowngradeCommand.java | 6 +-
 .../apache/hudi/io/storage/HoodieOrcWriter.java | 10 +-
 .../hudi/avro/TestHoodieAvroParquetWriter.java | 118 ++
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java | 7 +-
 .../row/HoodieRowDataParquetWriteSupport.java | 55 ++---
 .../storage/row/HoodieRowParquetWriteSupport.java | 61 +++---
 .../table/action/commit/UpsertPartitioner.java | 16 +-
 .../row/TestHoodieInternalRowParquetWriter.java | 95 
 .../hudi/table/upgrade/TestUpgradeDowngrade.java | 6 +-
 .../apache/hudi/avro/HoodieAvroWriteSupport.java | 60 +++--
 .../hudi/avro/HoodieBloomFilterWriteSupport.java | 96 
 .../apache/hudi/common/config/HoodieConfig.java | 9 +-
 .../org/apache/hudi/common/util/BaseFileUtils.java | 13 +-
 .../hudi/avro/TestHoodieAvroWriteSupport.java | 67 --
 hudi-examples/hudi-examples-java/pom.xml | 6 +
 hudi-integ-test/README.md | 52 -
 hudi-kafka-connect/README.md | 11 +-
 .../TestUpgradeOrDowngradeProcedure.scala | 5 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala | 20 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala | 17 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala | 20 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala | 19 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala | 20 +-
[hudi] branch master updated: [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 182475a854 [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873) 182475a854 is described below commit 182475a8548c6174bb21999e4b55003e9854da3c Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com> AuthorDate: Sun Oct 9 00:50:25 2022 +0800 [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873) --- packaging/hudi-aws-bundle/pom.xml | 16 ++-- packaging/hudi-datahub-sync-bundle/pom.xml| 16 ++-- packaging/hudi-flink-bundle/pom.xml | 19 ++- packaging/hudi-gcp-bundle/pom.xml | 16 ++-- packaging/hudi-hadoop-mr-bundle/pom.xml | 19 ++- packaging/hudi-hive-sync-bundle/pom.xml | 19 ++- packaging/hudi-integ-test-bundle/pom.xml | 19 ++- packaging/hudi-kafka-connect-bundle/pom.xml | 7 ++ packaging/hudi-presto-bundle/pom.xml | 20 ++- packaging/hudi-spark-bundle/pom.xml | 4 +-- packaging/hudi-timeline-server-bundle/pom.xml | 7 ++ packaging/hudi-trino-bundle/pom.xml | 19 ++- packaging/hudi-utilities-bundle/pom.xml | 4 +-- packaging/hudi-utilities-slim-bundle/pom.xml | 4 +-- pom.xml | 35 +++ 15 files changed, 63 insertions(+), 161 deletions(-) diff --git a/packaging/hudi-aws-bundle/pom.xml b/packaging/hudi-aws-bundle/pom.xml index 61aea395ed..75e13ff5f9 100644 --- a/packaging/hudi-aws-bundle/pom.xml +++ b/packaging/hudi-aws-bundle/pom.xml @@ -71,7 +71,7 @@ - + org.apache.hudi:hudi-common org.apache.hudi:hudi-hadoop-mr org.apache.hudi:hudi-sync-common @@ -102,15 +102,7 @@ org.openjdk.jol:jol-core - - - com.esotericsoftware.kryo. - org.apache.hudi.com.esotericsoftware.kryo. - - - com.esotericsoftware.minlog. - org.apache.hudi.com.esotericsoftware.minlog. - + com.beust.jcommander. org.apache.hudi.com.beust.jcommander. @@ -134,10 +126,6 @@ org.apache.htrace. org.apache.hudi.org.apache.htrace. - -org.objenesis. 
- org.apache.hudi.org.objenesis. - com.amazonaws. org.apache.hudi.com.amazonaws. diff --git a/packaging/hudi-datahub-sync-bundle/pom.xml b/packaging/hudi-datahub-sync-bundle/pom.xml index 7425631181..2bae25239d 100644 --- a/packaging/hudi-datahub-sync-bundle/pom.xml +++ b/packaging/hudi-datahub-sync-bundle/pom.xml @@ -67,7 +67,7 @@ - + org.apache.hudi:hudi-common org.apache.hudi:hudi-hadoop-mr org.apache.hudi:hudi-sync-common @@ -98,15 +98,7 @@ org.openjdk.jol:jol-core - - - com.esotericsoftware.kryo. - org.apache.hudi.com.esotericsoftware.kryo. - - - com.esotericsoftware.minlog. - org.apache.hudi.com.esotericsoftware.minlog. - + org.apache.commons.io. org.apache.hudi.org.apache.commons.io. @@ -126,10 +118,6 @@ org.apache.htrace. org.apache.hudi.org.apache.htrace. - - org.objenesis. - org.apache.hudi.org.objenesis. - org.openjdk.jol. org.apache.hudi.org.openjdk.jol. diff --git a/packaging/hudi-flink-bundle/pom.xml
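For readers parsing the flattened pom diff above: the mailing-list rendering has stripped the XML tags from the relocation entries being added and removed (the bare `com.esotericsoftware.kryo.` / `org.apache.hudi.com.esotericsoftware.kryo.` pairs). A maven-shade relocation of that kind normally has the following shape — a generic sketch of the plugin syntax, not the exact Hudi pom content:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrites every class under the pattern prefix into the shaded namespace. -->
      <relocation>
        <pattern>com.esotericsoftware.kryo.</pattern>
        <shadedPattern>org.apache.hudi.com.esotericsoftware.kryo.</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

The PR removes per-bundle copies of these entries in favor of reusable shared configuration, which is why the same pattern/shadedPattern pairs disappear from each bundle's pom.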
[GitHub] [hudi] xushiyan commented on pull request #6873: [HUDI-4971] Fix shading kryo-shaded with re-usable configs
xushiyan commented on PR #6873: URL: https://github.com/apache/hudi/pull/6873#issuecomment-1272356929 tested a few bundles including datahub sync, aws, utilities-slim+spark. working ok. the original issue is resolved.
[GitHub] [hudi] xushiyan merged pull request #6873: [HUDI-4971] Fix shading kryo-shaded with re-usable configs
xushiyan merged PR #6873: URL: https://github.com/apache/hudi/pull/6873
[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency
hudi-bot commented on PR #6896: URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272353354 ## CI report: * 97406bce1fcbf575139682cd0659fa154fbb214f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12074)
[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency
hudi-bot commented on PR #6896: URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272352483 ## CI report: * 97406bce1fcbf575139682cd0659fa154fbb214f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6845: [HUDI-4945] Add a test case for batch clean.
hudi-bot commented on PR #6845: URL: https://github.com/apache/hudi/pull/6845#issuecomment-1272351581 ## CI report: * a3851570e4d4e07ebc53bf67934829051802da04 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12072)
[GitHub] [hudi] xushiyan commented on pull request #6891: [MINOR][DOCS] update committer list
xushiyan commented on PR #6891: URL: https://github.com/apache/hudi/pull/6891#issuecomment-1272344006 @YannByron you can land this yourself :)
[hudi] branch asf-site updated: [MINOR] Update committer list (#6891)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 82bbc2ce26 [MINOR] Update committer list (#6891) 82bbc2ce26 is described below commit 82bbc2ce2675f29deb9171365de063079790ff1a Author: Yann Byron AuthorDate: Sat Oct 8 23:35:51 2022 +0800 [MINOR] Update committer list (#6891) --- website/community/team.md | 1 + 1 file changed, 1 insertion(+) diff --git a/website/community/team.md b/website/community/team.md index 062a6d2cd8..277122aaca 100644 --- a/website/community/team.md +++ b/website/community/team.md @@ -35,6 +35,7 @@ last_modified_at: 2020-09-01T15:59:57-04:00 | https://avatars.githubusercontent.com/lw309637554"} alt="liway" className="profile-pic" align="middle" /> | [Wei Li](https://github.com/lw309637554) | Committer | liway| | https://avatars.githubusercontent.com/zhedoubushishi"} className="profile-pic" alt="zhedoubushishi" /> | [Wenning Ding](https://github.com/zhedoubushishi) | Committer | wenningd | | https://avatars.githubusercontent.com/wangxianghu"} alt="wangxianghu" className="profile-pic" align="middle" /> | [Xianghu Wang](https://github.com/wangxianghu) | Committer | wangxianghu| +| https://avatars.githubusercontent.com/YannByron"} className="profile-pic" alt="Yann Byron" align="middle" /> | [Yann Byron](https://github.com/YannByron) | Committer | biyan | | https://avatars.githubusercontent.com/pengzhiwei2018"} className="profile-pic" alt="pengzhiwei2018" align="middle" /> | [Zhiwei Peng](https://github.com/pengzhiwei2018) | Committer | zhiwei| | https://avatars.githubusercontent.com/xiarixiaoyao"} className="profile-pic" alt="xiarixiaoyao" align="middle" /> | [Tao Meng](https://github.com/xiarixiaoyao) | Committer | mengtao| | https://avatars.githubusercontent.com/yuzhaojing"} className="profile-pic" alt="yuzhaojing" align="middle" /> | [Zhaojing 
Yu](https://github.com/yuzhaojing) | Committer | yuzhaojing|
[GitHub] [hudi] xushiyan merged pull request #6891: [MINOR][DOCS] update committer list
xushiyan merged PR #6891: URL: https://github.com/apache/hudi/pull/6891
[jira] [Updated] (HUDI-4975) datahub sync bundle causes class loading issue
[ https://issues.apache.org/jira/browse/HUDI-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4975:
-
    Labels: pull-request-available  (was: )

> datahub sync bundle causes class loading issue
> --
>
> Key: HUDI-4975
> URL: https://issues.apache.org/jira/browse/HUDI-4975
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies
> Reporter: Raymond Xu
> Assignee: Raymond Xu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.2
>
> run utilities-slim.jar as the main jar for deltastreamer
> set --jars /tmp/hudi-datahub-sync-bundle-0.12.1-rc1.jar,/tmp/hudi-spark3.1-bundle_2.12-0.12.1-rc1.jar
> Putting the datahub sync bundle before the spark bundle resulted in a class loader issue; it works fine if the spark bundle goes first.
> {code:bash}
> Caused by: java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
>   at org.apache.hudi.io.storage.HoodieFileWriterFactory.newParquetFileWriter(HoodieFileWriterFactory.java:78)
>   at org.apache.hudi.io.storage.HoodieFileWriterFactory.newParquetFileWriter(HoodieFileWriterFactory.java:70)
>   at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:54)
>   at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:104)
>   at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:76)
>   at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:46)
>   at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
>   at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.LogicalTypeAnnotation
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 14 more
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan opened a new pull request, #6896: [HUDI-4975] Fix datahub bundle dependency
xushiyan opened a new pull request, #6896: URL: https://github.com/apache/hudi/pull/6896

### Change Logs

- Make parquet-avro and avro scope provided in datahub bundle

### Impact

_Describe any public API or user-facing feature change or any performance impact._

**Risk level: none | low | medium | high**

_Choose one. If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272338806 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071) * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073)
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745: URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272337693 ## CI report: * 466535c2d2984fd57c471bb6127edc507d48d0b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12070)
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272327704 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071) * f06e77aa268d70f0532bdaee53db7f9be660de39 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272326567 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * f3e44c648063cc4da5198c5be5256d326511b304 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066) * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071) * f06e77aa268d70f0532bdaee53db7f9be660de39 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure
hudi-bot commented on PR #6895: URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272324593 ## CI report: * ce39faf7390aee37e4b00798c8dda25ab581e273 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12068)
[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency
hudi-bot commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272324223 ## CI report: * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN * 447cb4510301af1c3ff1aebb3bd0a668872fc3f6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12069)
[GitHub] [hudi] hudi-bot commented on pull request #6845: [HUDI-4945] Add a test case for batch clean.
hudi-bot commented on PR #6845: URL: https://github.com/apache/hudi/pull/6845#issuecomment-1272310606 ## CI report: * f368eed82d5140142889b7853597a66770e99886 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11942) * a3851570e4d4e07ebc53bf67934829051802da04 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12072)
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272310556 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * f3e44c648063cc4da5198c5be5256d326511b304 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066) * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071)
[GitHub] [hudi] hudi-bot commented on pull request #6845: [HUDI-4945] Add a test case for batch clean.
hudi-bot commented on PR #6845: URL: https://github.com/apache/hudi/pull/6845#issuecomment-1272308678 ## CI report: * f368eed82d5140142889b7853597a66770e99886 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11942) * a3851570e4d4e07ebc53bf67934829051802da04 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272308653 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * f3e44c648063cc4da5198c5be5256d326511b304 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066) * b9b24c49718554e2263e07967fbcabbb3523a1c1 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.
hudi-bot commented on PR #6893: URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272307669 ## CI report: * 0f78ff5e81d51f2972bba066804a315bb23dbe12 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12067)
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272307615 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * f3e44c648063cc4da5198c5be5256d326511b304 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
[GitHub] [hudi] LinMingQiang commented on a diff in pull request #6845: [HUDI-4945] Add a test case for batch clean.
LinMingQiang commented on code in PR #6845: URL: https://github.com/apache/hudi/pull/6845#discussion_r990634609 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java: ## @@ -96,9 +96,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) { pipeline = Pipelines.hoodieStreamWrite(conf, hoodieRecordDataStream); // compaction if (OptionsResolver.needsAsyncCompaction(conf)) { -// use synchronous compaction for bounded source. +// use synchronous compaction and clean for bounded source. if (context.isBounded()) { conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false); + conf.setBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED, false); } Review Comment: ok.
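The override in the diff above can be sketched without the Flink classes. The following is a minimal stand-in using a plain map: the option keys mirror `FlinkOptions.COMPACTION_ASYNC_ENABLED` and `FlinkOptions.CLEAN_ASYNC_ENABLED`, and the `isBounded` flag models `context.isBounded()`; names and keys are illustrative assumptions, not the Hudi API.

```java
import java.util.HashMap;
import java.util.Map;

public class BoundedSourceOverrides {
    // Simplified model of the HoodieTableSink logic in the diff: for a
    // bounded source, async compaction and async clean are both forced
    // off so the job runs them synchronously before it finishes.
    static Map<String, Boolean> applyOverrides(Map<String, Boolean> conf, boolean isBounded) {
        if (isBounded) {
            conf.put("compaction.async.enabled", false);
            conf.put("clean.async.enabled", false);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, Boolean> conf = new HashMap<>();
        conf.put("compaction.async.enabled", true);
        conf.put("clean.async.enabled", true);
        // Bounded source: both switches end up false.
        applyOverrides(conf, true);
        System.out.println(conf.get("compaction.async.enabled") + " " + conf.get("clean.async.enabled"));
    }
}
```

For an unbounded (streaming) source the map is left untouched, which is exactly why the diff guards the overrides with `context.isBounded()`.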
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745: URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272297073 ## CI report: * 4c78db48d9e86c620f0824fe1438a1d151100d98 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12041) * 466535c2d2984fd57c471bb6127edc507d48d0b1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12070)
[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency
hudi-bot commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272296814 ## CI report: * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN * 4ba91d4ce8345b4917e1f402694a55d07bf2951c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12047) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12052) * 447cb4510301af1c3ff1aebb3bd0a668872fc3f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12069)
[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46
hudi-bot commented on PR #6745: URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272296201 ## CI report: * 4c78db48d9e86c620f0824fe1438a1d151100d98 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12041) * 466535c2d2984fd57c471bb6127edc507d48d0b1 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency
hudi-bot commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272295990 ## CI report: * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN * 4ba91d4ce8345b4917e1f402694a55d07bf2951c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12047) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12052) * 447cb4510301af1c3ff1aebb3bd0a668872fc3f6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1272295291 ## CI report: * 5f6d4f624c5f20cf3c4c38384e17c7bb13e56991 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12065)
[GitHub] [hudi] zhangyue19921010 commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency
zhangyue19921010 commented on PR #5416: URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272290024

> Do we have tests for DisruptorProducers? I found tests only for DisruptorExecutor and DisruptorMessageQueue.
>
> Also, do we have tests covering a single producer and multiple producers? And can you summarize what kinds of error cases have been tested? E.g. one producer thread crashing while the others continue to produce, or memory too low to hold all produced records but still no records being dropped, etc.

Hi @nsivabalan, thanks a lot for the reminder. I added more tests, including:

1. `TestBoundedInMemoryExecutorInSpark#testExecutor` ==> tests common disruptor executor ingestion
2. `TestBoundedInMemoryExecutorInSpark#testInterruptExecutor` ==> tests disruptor executor ingestion with an interrupt
3. `TestDisruptorMessageQueue#testRecordReading` ==> tests common single-producer, single-consumer reading
4. `TestDisruptorMessageQueue#testCompositeProducerRecordReading` ==> tests multiple producers with a single consumer
5. `TestDisruptorMessageQueue#testException` ==> tests multiple producers where one producer thread crashes while the others continue to produce
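The crash scenario in the last test can be illustrated with a plain `java.util.concurrent` stand-in. This is not the Hudi `DisruptorMessageQueue` API — just a simplified model of the invariant under test: a crashed producer must not cause records from the surviving producers to be dropped. All class and method names here are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MultiProducerQueueSketch {
    // Several producers feed one queue; the producer with id == failingProducer
    // crashes before producing anything. Returns how many records the single
    // consumer drains after all producers have finished or crashed.
    static int runScenario(int producers, int perProducer, int failingProducer) throws Exception {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        CountDownLatch done = new CountDownLatch(producers);
        for (int p = 0; p < producers; p++) {
            final int id = p;
            pool.submit(() -> {
                try {
                    if (id == failingProducer) {
                        throw new RuntimeException("producer " + id + " crashed");
                    }
                    for (int i = 0; i < perProducer; i++) {
                        queue.put(id * perProducer + i);
                    }
                } catch (Exception e) {
                    // a crashed producer stops producing; the others continue
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();        // wait until every producer finished or crashed
        pool.shutdown();
        int consumed = 0;    // single consumer drains everything produced
        while (queue.poll() != null) {
            consumed++;
        }
        return consumed;
    }

    public static void main(String[] args) throws Exception {
        // 3 producers, 100 records each; producer 1 crashes before producing,
        // so exactly the 200 records from producers 0 and 2 must survive.
        System.out.println("consumed=" + runScenario(3, 100, 1));
    }
}
```

The real test presumably asserts the same property against the disruptor-backed queue: the consumed count equals the total produced by the non-crashing producers, with nothing silently dropped.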
[GitHub] [hudi] hudi-bot commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure
hudi-bot commented on PR #6895: URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272283945 ## CI report: * ce39faf7390aee37e4b00798c8dda25ab581e273 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12068)
[GitHub] [hudi] hudi-bot commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.
hudi-bot commented on PR #6893: URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272283929 ## CI report: * 0f78ff5e81d51f2972bba066804a315bb23dbe12 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12067)
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272283853 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * ca9b8fb8950e382908469a40724fddff88aa60d0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11945) * f3e44c648063cc4da5198c5be5256d326511b304 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
[GitHub] [hudi] hudi-bot commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure
hudi-bot commented on PR #6895: URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272282750 ## CI report: * ce39faf7390aee37e4b00798c8dda25ab581e273 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.
hudi-bot commented on PR #6893: URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272282740 ## CI report: * 0f78ff5e81d51f2972bba066804a315bb23dbe12 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table
hudi-bot commented on PR #6741: URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272282682 ## CI report: * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN * ca9b8fb8950e382908469a40724fddff88aa60d0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11945) * f3e44c648063cc4da5198c5be5256d326511b304 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272282554 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064)
[GitHub] [hudi] boneanxs commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure
boneanxs commented on PR #6895: URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272280375 Hi @XuQianJin-Stars, this is a minor name fix for RunBootstrapProcedure. Could you help review it?
[GitHub] [hudi] boneanxs opened a new pull request, #6895: [MINOR] Fix name spelling for RunBootstrapProcedure
boneanxs opened a new pull request, #6895: URL: https://github.com/apache/hudi/pull/6895 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: low** ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] gubinjie opened a new issue, #6894: [SUPPORT]Error running child : java.lang.NoSuchMethodError: org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageTy
gubinjie opened a new issue, #6894: URL: https://github.com/apache/hudi/issues/6894 CDH 6.3.2 Hudi 0.10.1 When querying a Hudi table through Hive, I get the following error: select * from hudi_flink_tyc_company_rt where name = '3213' `2022-10-08 16:30:27,365 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 2022-10-08 16:30:27,661 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ] 2022-10-08 16:30:27,819 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HoodieCombineRealtimeFileSplit{realtimeFileSplits=[HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet, deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.5652dad0-9e32-43f5-99c4-eff0a89c6a79_20220929181835942.log.1_0-1-0], maxCommitTime='20220929190955221', basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}, HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/6f026a25-797e-4a8b-9382-b426b94fd034_0-1-0_20220929181835942.parquet, deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.6f026a25-797e-4a8b-9382-b426b94fd034_20220929181835942.log.1_0-1-0], maxCommitTime='20220929190955221', basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}, HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/4f90e72d-d205-4640-975f-09ebb2ad136a_0-1-0_20220929180105887.parquet, deltaLogPaths=[], maxCommitTime='20220929190955221', basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}, HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/b72b41e5-7bd9-4a87-a91d-86a368a2f7b7_0-1-0_20220929181835942.parquet, deltaLogPaths=[], maxCommitTime='20220929190955221', basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}]}InputFormatClass: 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat 2022-10-08 16:30:27,873 INFO [main] org.apache.hadoop.hive.conf.HiveConf: Found configuration file null 2022-10-08 16:30:27,980 INFO [main] org.apache.hadoop.hive.ql.exec.SerializationUtilities: Deserializing MapWork using kryo 2022-10-08 16:30:28,110 INFO [main] org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Before adding Hoodie columns, Projections :_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time, Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 2022-10-08 16:30:28,110 INFO [main] org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Creating record reader with readCols :_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time, Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 2022-10-08 16:30:28,361 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. 
Instead, use mapreduce.task.attempt.id 2022-10-08 16:30:28,366 ERROR [main] org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl 2022-10-08 16:30:28,390 INFO [main] org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 44225 records. 2022-10-08 16:30:28,390 INFO [main] org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block 2022-10-08 16:30:28,412 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 2022-10-08 16:30:28,413 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] 2022-10-08 16:30:28,418 INFO [main] org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 28 ms. row count = 44225 2022-10-08 16:30:28,565 INFO [main] org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader: Enabling merged reading of realtime records for split HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
[jira] [Updated] (HUDI-4997) use jackson-v2 replace jackson-v1 import
[ https://issues.apache.org/jira/browse/HUDI-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4997: - Labels: pull-request-available (was: ) > use jackson-v2 replace jackson-v1 import > > > Key: HUDI-4997 > URL: https://issues.apache.org/jira/browse/HUDI-4997 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: fanshilun >Priority: Major > Labels: pull-request-available > > HoodieWriteCommitCallbackUtil uses ObjectMapper, but uses jackson-v1 import, > jackson-v1 has security risks, replace import with jackson-v2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] slfan1989 opened a new pull request, #6893: HUDI-4997: use jackson-v2 replace jackson-v1 import.
slfan1989 opened a new pull request, #6893: URL: https://github.com/apache/hudi/pull/6893 JIRA: HUDI-4997: use jackson-v2 replace jackson-v1 import. HoodieWriteCommitCallbackUtil uses ObjectMapper, but uses jackson-v1 import, jackson-v1 has security risks, replace import with jackson-v2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4997) use jackson-v2 replace jackson-v1 import
fanshilun created HUDI-4997: --- Summary: use jackson-v2 replace jackson-v1 import Key: HUDI-4997 URL: https://issues.apache.org/jira/browse/HUDI-4997 Project: Apache Hudi Issue Type: Improvement Components: cli Reporter: fanshilun HoodieWriteCommitCallbackUtil uses ObjectMapper, but uses jackson-v1 import, jackson-v1 has security risks, replace import with jackson-v2.
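The migration described in HUDI-4997 amounts to swapping Jackson's old `org.codehaus` coordinates for the `com.fasterxml` ones (in code, `import org.codehaus.jackson.map.ObjectMapper` becomes `import com.fasterxml.jackson.databind.ObjectMapper`). A hedged sketch of the Maven side of such a change — the version numbers below are illustrative assumptions, not taken from the PR:

```xml
<!-- Jackson 1.x (org.codehaus) — unmaintained, with known security issues -->
<!--
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.9.13</version>
</dependency>
-->
<!-- Jackson 2.x (com.fasterxml) — actively patched replacement -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.13.4</version>
</dependency>
```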
[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1272270347 ## CI report: * efc19bfcfb86bf582d4bd2584462083b8178c1c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11783) * 5f6d4f624c5f20cf3c4c38384e17c7bb13e56991 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12065) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] gubinjie closed issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond
gubinjie closed issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *:37568 failed to respond URL: https://github.com/apache/hudi/issues/6825
[GitHub] [hudi] gubinjie commented on issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond
gubinjie commented on issue #6825: URL: https://github.com/apache/hudi/issues/6825#issuecomment-1272269358 TH
[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1272269333 ## CI report: * efc19bfcfb86bf582d4bd2584462083b8178c1c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11783) * 5f6d4f624c5f20cf3c4c38384e17c7bb13e56991 UNKNOWN
[GitHub] [hudi] wzx140 commented on a diff in pull request #6745: Fix comment in RFC46
wzx140 commented on code in PR #6745: URL: https://github.com/apache/hudi/pull/6745#discussion_r990612384 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/util/HoodieSparkRecordUtils.java: ## @@ -91,21 +65,20 @@ private static Option getNullableValAsString(StructType structType, Inte * @param structType {@link StructType} instance. * @return Column value if a single column, or concatenated String values by comma. */ - public static Object getRecordColumnValues(InternalRow row, + public static ComparableList getRecordColumnValues(InternalRow row, String[] columns, StructType structType, boolean consistentLogicalTimestampEnabled) { -if (columns.length == 1) { - NestedFieldPath posList = HoodieInternalRowUtils.getCachedPosList(structType, columns[0]); - return HoodieUnsafeRowUtils.getNestedInternalRowValue(row, posList); -} else { - // TODO this is inefficient, instead we can simply return array of Comparable - StringBuilder sb = new StringBuilder(); - for (String col : columns) { -// TODO support consistentLogicalTimestampEnabled -NestedFieldPath posList = HoodieInternalRowUtils.getCachedPosList(structType, columns[0]); -return HoodieUnsafeRowUtils.getNestedInternalRowValue(row, posList); +List list = new LinkedList<>(); +for (String column : columns) { + NestedFieldPath posList = HoodieInternalRowUtils.getCachedPosList(structType, column); + Object value = HoodieUnsafeRowUtils.getNestedInternalRowValue(row, posList); + DataType dataType = posList.parts()[posList.parts().length - 1]._2.dataType(); + if (value instanceof InternalRow | value instanceof MapData | value instanceof ArrayData Review Comment: Removed ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala: ## @@ -461,6 +461,18 @@ abstract class HoodieBaseRelation(val sqlContext: SQLContext, } protected def getTableState: HoodieTableState = { +val mergerImpls = (if (optParams.contains(HoodieWriteConfig.MERGER_IMPLS.key())) { Review Comment: Fixed -- 
[jira] [Updated] (HUDI-4996) Update cleaning doc
[ https://issues.apache.org/jira/browse/HUDI-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-4996: - Description: The parameter *--hoodieConfigs* in "cleans run" is a String array, the value should be separated with " ". > Update cleaning doc > --- > > Key: HUDI-4996 > URL: https://issues.apache.org/jira/browse/HUDI-4996 > Project: Apache Hudi > Issue Type: Improvement > Components: cleaning, docs >Reporter: xi chaomin >Priority: Major > > The parameter *--hoodieConfigs* in "cleans run" is a String array, the value > should be separated with " ".
[jira] [Created] (HUDI-4996) Update cleaning doc
xi chaomin created HUDI-4996: Summary: Update cleaning doc Key: HUDI-4996 URL: https://issues.apache.org/jira/browse/HUDI-4996 Project: Apache Hudi Issue Type: Improvement Components: cleaning, docs Reporter: xi chaomin
[GitHub] [hudi] xicm opened a new pull request, #6892: Update hoodie_cleaner.md
xicm opened a new pull request, #6892: URL: https://github.com/apache/hudi/pull/6892 ### Change Logs The parameter **--hoodieConfigs** in "cleans run" is a String array, the value should be separated with " ". ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
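The space-separated form that HUDI-4996 documents can be illustrated with a hudi-cli session. This is a hedged sketch: `cleans run` and `--hoodieConfigs` come from the ticket, while the prompt and the specific property values shown are hypothetical examples, not taken from the PR.

```
hudi-><connected table>:cleans run --hoodieConfigs hoodie.cleaner.policy=KEEP_LATEST_COMMITS hoodie.cleaner.commits.retained=3
```

Because `--hoodieConfigs` is a String array, each `key=value` pair is passed as its own space-separated token rather than as one comma-joined string.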
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
YuweiXiao commented on code in PR #6680: URL: https://github.com/apache/hudi/pull/6680#discussion_r990607541 ## hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: ## @@ -179,15 +197,125 @@ public void close() throws Exception { } protected List getAllQueryPartitionPaths() { +if (cachedAllPartitionPaths != null) { + return cachedAllPartitionPaths; +} + +loadAllQueryPartitionPaths(); +return cachedAllPartitionPaths; + } + + private void loadAllQueryPartitionPaths() { List queryRelativePartitionPaths = queryPaths.stream() .map(path -> FSUtils.getRelativePartitionPath(basePath, path)) .collect(Collectors.toList()); -// Load all the partition path from the basePath, and filter by the query partition path. -// TODO load files from the queryRelativePartitionPaths directly. -List matchedPartitionPaths = getAllPartitionPathsUnchecked() -.stream() -.filter(path -> queryRelativePartitionPaths.stream().anyMatch(path::startsWith)) +this.cachedAllPartitionPaths = listQueryPartitionPaths(queryRelativePartitionPaths); + +// If the partition value contains InternalRow.empty, we query it as a non-partitioned table. +this.queryAsNonePartitionedTable = this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0); + } + + protected Map> getAllInputFileSlices() { +if (!isAllInputFileSlicesCached) { + doRefresh(); +} +return cachedAllInputFileSlices; + } + + /** + * Get input file slice for the given partition. Will use cache directly if it is computed before. 
+ */ + protected List getCachedInputFileSlices(PartitionPath partition) { +return cachedAllInputFileSlices.computeIfAbsent(partition, this::loadFileSlicesForPartition); + } + + private List loadFileSlicesForPartition(PartitionPath p) { +FileStatus[] files = loadPartitionPathFiles(p); +HoodieTimeline activeTimeline = getActiveTimeline(); +Option latestInstant = activeTimeline.lastInstant(); + +HoodieTableFileSystemView fileSystemView = new HoodieTableFileSystemView(metaClient, activeTimeline, files); + +Option queryInstant = specifiedQueryInstant.or(() -> latestInstant.map(HoodieInstant::getTimestamp)); + +validate(activeTimeline, queryInstant); + +List ret; +if (tableType.equals(HoodieTableType.MERGE_ON_READ) && queryType.equals(HoodieTableQueryType.SNAPSHOT)) { + ret = queryInstant.map(instant -> + fileSystemView.getLatestMergedFileSlicesBeforeOrOn(p.path, queryInstant.get()) + .collect(Collectors.toList()) + ) + .orElse(Collections.emptyList()); +} else { + ret = queryInstant.map(instant -> + fileSystemView.getLatestFileSlicesBeforeOrOn(p.path, instant, true) + ) + .orElse(fileSystemView.getLatestFileSlices(p.path)) + .collect(Collectors.toList()); +} + +cachedFileSize += ret.stream().mapToLong(BaseHoodieTableFileIndex::fileSliceSize).sum(); +return ret; + } + + /** + * Get partition path with the given partition value + * @param partitionNames partition names + * @param values partition values + * @return partitions that match the given partition values + */ + protected List getPartitionPaths(String[] partitionNames, String[] values) { +if (partitionNames.length == 0 || partitionNames.length != values.length) { Review Comment: Yeah, I cleaned up the code accordingly. I added `isPartial` to replace the role of `idx`. Could u take another look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
YuweiXiao commented on code in PR #6680: URL: https://github.com/apache/hudi/pull/6680#discussion_r990606063 ## hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: ## @@ -179,15 +197,125 @@ public void close() throws Exception { } protected List getAllQueryPartitionPaths() { +if (cachedAllPartitionPaths != null) { + return cachedAllPartitionPaths; +} + +loadAllQueryPartitionPaths(); +return cachedAllPartitionPaths; + } + + private void loadAllQueryPartitionPaths() { List queryRelativePartitionPaths = queryPaths.stream() .map(path -> FSUtils.getRelativePartitionPath(basePath, path)) .collect(Collectors.toList()); -// Load all the partition path from the basePath, and filter by the query partition path. -// TODO load files from the queryRelativePartitionPaths directly. -List matchedPartitionPaths = getAllPartitionPathsUnchecked() -.stream() -.filter(path -> queryRelativePartitionPaths.stream().anyMatch(path::startsWith)) +this.cachedAllPartitionPaths = listQueryPartitionPaths(queryRelativePartitionPaths); + +// If the partition value contains InternalRow.empty, we query it as a non-partitioned table. +this.queryAsNonePartitionedTable = this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0); + } + + protected Map> getAllInputFileSlices() { +if (!isAllInputFileSlicesCached) { + doRefresh(); +} +return cachedAllInputFileSlices; + } + + /** + * Get input file slice for the given partition. Will use cache directly if it is computed before. 
+ */ + protected List getCachedInputFileSlices(PartitionPath partition) { +return cachedAllInputFileSlices.computeIfAbsent(partition, this::loadFileSlicesForPartition); + } + + private List loadFileSlicesForPartition(PartitionPath p) { +FileStatus[] files = loadPartitionPathFiles(p); +HoodieTimeline activeTimeline = getActiveTimeline(); +Option latestInstant = activeTimeline.lastInstant(); + +HoodieTableFileSystemView fileSystemView = new HoodieTableFileSystemView(metaClient, activeTimeline, files); + +Option queryInstant = specifiedQueryInstant.or(() -> latestInstant.map(HoodieInstant::getTimestamp)); + +validate(activeTimeline, queryInstant); + +List ret; +if (tableType.equals(HoodieTableType.MERGE_ON_READ) && queryType.equals(HoodieTableQueryType.SNAPSHOT)) { + ret = queryInstant.map(instant -> + fileSystemView.getLatestMergedFileSlicesBeforeOrOn(p.path, queryInstant.get()) + .collect(Collectors.toList()) + ) + .orElse(Collections.emptyList()); +} else { + ret = queryInstant.map(instant -> + fileSystemView.getLatestFileSlicesBeforeOrOn(p.path, instant, true) + ) + .orElse(fileSystemView.getLatestFileSlices(p.path)) + .collect(Collectors.toList()); +} + +cachedFileSize += ret.stream().mapToLong(BaseHoodieTableFileIndex::fileSliceSize).sum(); +return ret; + } + + /** + * Get partition path with the given partition value + * @param partitionNames partition names + * @param values partition values + * @return partitions that match the given partition values + */ + protected List getPartitionPaths(String[] partitionNames, String[] values) { +if (partitionNames.length == 0 || partitionNames.length != values.length) { + LOG.info("The input partition names or value is empty, fallback to return all partition paths"); + return getAllQueryPartitionPaths(); +} + +if (cachedAllPartitionPaths != null) { + LOG.info("All partition paths have already loaded, use it directly"); + return cachedAllPartitionPaths; +} + +boolean hiveStylePartitioning = 
Boolean.parseBoolean(metaClient.getTableConfig().getHiveStylePartitioningEnable()); +boolean urlEncodePartitioning = Boolean.parseBoolean(this.metaClient.getTableConfig().getUrlEncodePartitioning()); +Map partitionNameToIdx = IntStream.range(0, partitionNames.length) +.mapToObj(i -> Pair.of(i, partitionNames[i])) +.collect(Collectors.toMap(Pair::getValue, Pair::getKey)); +StringBuilder queryPartitionPath = new StringBuilder(); +int idx = 0; +for (; idx < partitionNames.length; ++idx) { + String columnNames = this.partitionColumns[idx]; + if (partitionNameToIdx.containsKey(columnNames)) { +int k = partitionNameToIdx.get(columnNames); +String value = urlEncodePartitioning ? PartitionPathEncodeUtils.escapePathName(values[k]) : values[k]; +queryPartitionPath.append(hiveStylePartitioning ? columnNames + "=" : "").append(value).append("/"); + } else { +break; + } +} +queryPartitionPath.deleteCharAt(queryPartitionPath.length() - 1); +// Return directly if all partition values are specified. +if (idx == this.partitionColumns.length) { + return
[GitHub] [hudi] wzx140 commented on a diff in pull request #6745: Fix comment in RFC46
wzx140 commented on code in PR #6745: URL: https://github.com/apache/hudi/pull/6745#discussion_r990605829 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -253,21 +253,21 @@ private Option prepareRecord(HoodieRecord hoodieRecord) { } private HoodieRecord populateMetadataFields(HoodieRecord hoodieRecord, Schema schema, Properties prop) throws IOException { -Map metadataValues = new HashMap<>(); -String seqId = -HoodieRecord.generateSequenceId(instantTime, getPartitionId(), RECORD_COUNTER.getAndIncrement()); +MetadataValues metadataValues = new MetadataValues(); if (config.populateMetaFields()) { - metadataValues.put(HoodieRecord.HoodieMetadataField.FILENAME_METADATA_FIELD.getFieldName(), fileId); - metadataValues.put(HoodieRecord.HoodieMetadataField.PARTITION_PATH_METADATA_FIELD.getFieldName(), partitionPath); - metadataValues.put(HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName(), hoodieRecord.getRecordKey()); - metadataValues.put(HoodieRecord.HoodieMetadataField.COMMIT_TIME_METADATA_FIELD.getFieldName(), instantTime); - metadataValues.put(HoodieRecord.HoodieMetadataField.COMMIT_SEQNO_METADATA_FIELD.getFieldName(), seqId); + String seqId = + HoodieRecord.generateSequenceId(instantTime, getPartitionId(), RECORD_COUNTER.getAndIncrement()); + metadataValues.setFileName(fileId); + metadataValues.setPartitionPath(partitionPath); + metadataValues.setRecordKey(hoodieRecord.getRecordKey()); + metadataValues.setCommitTime(instantTime); + metadataValues.setCommitSeqno(seqId); } if (config.allowOperationMetadataField()) { - metadataValues.put(HoodieRecord.HoodieMetadataField.OPERATION_METADATA_FIELD.getFieldName(), hoodieRecord.getOperation().getName()); + metadataValues.setOperation(hoodieRecord.getOperation().getName()); } -return hoodieRecord.updateValues(schema, prop, metadataValues); +return hoodieRecord.updateMetadataValues(schema, prop, metadataValues); Review Comment: if 
config.populateMetaFields=false, then metadataValues is empty. And hoodieRecord.updateMetadataValues will do nothing.
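The no-op behaviour wzx140 describes can be sketched with a minimal stand-in. The class below is a hypothetical simplification, not Hudi's actual `MetadataValues`: setters record which fields were populated, and the apply step touches only those fields, so with `populateMetaFields=false` the record passes through unchanged. The meta-field names reused here (`_hoodie_record_key`, `_hoodie_commit_time`) appear in the log output earlier in this digest.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of the MetadataValues pattern discussed above:
// when populateMetaFields is false, no setter runs, the value object stays
// empty, and applying it to a record is a no-op.
public class MetadataValuesSketch {
    private final Map<String, String> values = new HashMap<>();

    public MetadataValuesSketch setRecordKey(String key) {
        values.put("_hoodie_record_key", key);
        return this;
    }

    public MetadataValuesSketch setCommitTime(String time) {
        values.put("_hoodie_commit_time", time);
        return this;
    }

    public boolean isEmpty() {
        return values.isEmpty();
    }

    // Apply only the fields that were explicitly set; an empty instance
    // leaves the record untouched.
    public Map<String, Object> applyTo(Map<String, Object> record) {
        values.forEach(record::put);
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("company_id", 42);

        boolean populateMetaFields = false; // mirrors config.populateMetaFields()
        MetadataValuesSketch metadataValues = new MetadataValuesSketch();
        if (populateMetaFields) {
            metadataValues.setRecordKey("key-1").setCommitTime("20221009040416");
        }
        metadataValues.applyTo(record);
        System.out.println("fields=" + record.size()); // still 1: the update was a no-op
    }
}
```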
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
YuweiXiao commented on code in PR #6680: URL: https://github.com/apache/hudi/pull/6680#discussion_r990605731 ## hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: ## @@ -138,7 +143,20 @@ public BaseHoodieTableFileIndex(HoodieEngineContext engineContext, this.engineContext = engineContext; this.fileStatusCache = fileStatusCache; -doRefresh(); +/** + * The `shouldRefresh` variable controls how we initialize the TableFileIndex: Review Comment: I removed `isAllInputFileSlicesCached ` and have following logic to check is all file slices cached: ``` if (cachedAllPartitionPaths == null) { return false; } return cachedAllPartitionPaths.stream().allMatch(p -> cachedAllInputFileSlices.containsKey(p)); ``` Basically, we check if all partitions are loaded. Then we check if all partitions is contained in the `cachedAllInputFileSlices`. It should be cleaner instead of maintaining a separate flag variable. ## hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: ## @@ -179,15 +197,125 @@ public void close() throws Exception { } protected List getAllQueryPartitionPaths() { +if (cachedAllPartitionPaths != null) { + return cachedAllPartitionPaths; +} + +loadAllQueryPartitionPaths(); +return cachedAllPartitionPaths; + } + + private void loadAllQueryPartitionPaths() { List queryRelativePartitionPaths = queryPaths.stream() .map(path -> FSUtils.getRelativePartitionPath(basePath, path)) .collect(Collectors.toList()); -// Load all the partition path from the basePath, and filter by the query partition path. -// TODO load files from the queryRelativePartitionPaths directly. -List matchedPartitionPaths = getAllPartitionPathsUnchecked() -.stream() -.filter(path -> queryRelativePartitionPaths.stream().anyMatch(path::startsWith)) +this.cachedAllPartitionPaths = listQueryPartitionPaths(queryRelativePartitionPaths); + +// If the partition value contains InternalRow.empty, we query it as a non-partitioned table. 
+this.queryAsNonePartitionedTable = this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0); Review Comment: Fixed.
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272253598 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * f8732300afaf355296ca13fe7f2d3e9a131315d6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12063) * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064)
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
YuweiXiao commented on code in PR #6680: URL: https://github.com/apache/hudi/pull/6680#discussion_r990601721 ## hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: ## @@ -179,15 +197,125 @@ public void close() throws Exception { } protected List getAllQueryPartitionPaths() { +if (cachedAllPartitionPaths != null) { + return cachedAllPartitionPaths; +} + +loadAllQueryPartitionPaths(); +return cachedAllPartitionPaths; + } + + private void loadAllQueryPartitionPaths() { List queryRelativePartitionPaths = queryPaths.stream() .map(path -> FSUtils.getRelativePartitionPath(basePath, path)) .collect(Collectors.toList()); -// Load all the partition path from the basePath, and filter by the query partition path. -// TODO load files from the queryRelativePartitionPaths directly. -List matchedPartitionPaths = getAllPartitionPathsUnchecked() -.stream() -.filter(path -> queryRelativePartitionPaths.stream().anyMatch(path::startsWith)) +this.cachedAllPartitionPaths = listQueryPartitionPaths(queryRelativePartitionPaths); + +// If the partition value contains InternalRow.empty, we query it as a non-partitioned table. +this.queryAsNonePartitionedTable = this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0); + } + + protected Map> getAllInputFileSlices() { +if (!isAllInputFileSlicesCached) { Review Comment: Yeah, good point. 1) generalize to batch get 2) load only remaining partitions
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex
YuweiXiao commented on code in PR #6680: URL: https://github.com/apache/hudi/pull/6680#discussion_r990601573 ## hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: ## @@ -179,15 +197,125 @@ public void close() throws Exception { } protected List getAllQueryPartitionPaths() { +if (cachedAllPartitionPaths != null) { + return cachedAllPartitionPaths; +} + +loadAllQueryPartitionPaths(); Review Comment: Yes, you are right. I will have it inlined.
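The lazy-fetching pattern reviewed in HUDI-4812 — list file slices per partition only on first access, via `Map.computeIfAbsent`, as in the `getCachedInputFileSlices` / `loadFileSlicesForPartition` hunk above — can be sketched in isolation. The class and member names below are hypothetical stand-ins, not Hudi's actual `BaseHoodieTableFileIndex`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of per-partition lazy caching: the expensive listing
// runs at most once per partition, so queries touching only a few partitions
// never pay for a full-table listing up front.
public class LazyPartitionCache {
    private final Map<String, List<String>> cachedAllInputFileSlices = new HashMap<>();
    private final AtomicInteger listingCalls = new AtomicInteger();

    // Stand-in for the expensive file-system listing of one partition.
    private List<String> loadFileSlicesForPartition(String partition) {
        listingCalls.incrementAndGet();
        return List.of(partition + "/file-slice-0");
    }

    // computeIfAbsent memoizes per key: the loader runs only on the first
    // access for a given partition; later calls are served from the cache.
    public List<String> getCachedInputFileSlices(String partition) {
        return cachedAllInputFileSlices.computeIfAbsent(partition, this::loadFileSlicesForPartition);
    }

    public int listingCount() {
        return listingCalls.get();
    }

    public static void main(String[] args) {
        LazyPartitionCache cache = new LazyPartitionCache();
        cache.getCachedInputFileSlices("2022/10/09");
        cache.getCachedInputFileSlices("2022/10/09"); // served from cache
        cache.getCachedInputFileSlices("2022/10/10");
        System.out.println("listings=" + cache.listingCount()); // 2, not 3
    }
}
```

A batch variant (point 1 in the review reply above) would iterate the requested partitions and load only the ones still missing from the cache, which is exactly what per-key `computeIfAbsent` already guarantees.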
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272242144 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * f8732300afaf355296ca13fe7f2d3e9a131315d6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12063) * 18ef7b44488dff256728b2bba024b4a4d00aebe9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272241019 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * 1d98224805b75fc0c9c8ec54948870e96c4b54e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12043) * f8732300afaf355296ca13fe7f2d3e9a131315d6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12063) * 18ef7b44488dff256728b2bba024b4a4d00aebe9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272240117 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * 1d98224805b75fc0c9c8ec54948870e96c4b54e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12043) * f8732300afaf355296ca13fe7f2d3e9a131315d6 UNKNOWN
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file
alexeykudinkin commented on code in PR #6358: URL: https://github.com/apache/hudi/pull/6358#discussion_r990595189 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -185,7 +185,7 @@ public class HoodieWriteConfig extends HoodieConfig { public static final ConfigProperty AVRO_SCHEMA_VALIDATE_ENABLE = ConfigProperty .key("hoodie.avro.schema.validate") - .defaultValue("false") + .defaultValue("true") Review Comment: This is flipped to default to make sure proper schema validation are run for every operation on the table ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java: ## @@ -81,20 +79,7 @@ public static IgnoreRecord IGNORE_RECORD = new IgnoreRecord(); /** - * The specified schema of the table. ("specified" denotes that this is configured by the client, - * as opposed to being implicitly fetched out of the commit metadata) - */ - protected final Schema tableSchema; - protected final Schema tableSchemaWithMetaFields; Review Comment: These fields were misused and are redundant, hence deleted ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaCompatibility.java: ## @@ -0,0 +1,941 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.avro; + +import org.apache.avro.AvroRuntimeException; +import org.apache.avro.Schema; +import org.apache.avro.Schema.Field; +import org.apache.avro.Schema.Type; +import org.apache.hudi.common.util.Either; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import java.util.ArrayDeque; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Deque; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.Set; +import java.util.TreeSet; +import java.util.stream.Collectors; + +import static org.apache.hudi.common.util.ValidationUtils.checkState; + +/** + * Evaluate the compatibility between a reader schema and a writer schema. A + * reader and a writer schema are declared compatible if all datum instances of + * the writer schema can be successfully decoded using the specified reader + * schema. + * + * NOTE: PLEASE READ CAREFULLY BEFORE CHANGING + * + * This code is borrowed from Avro 1.10, with the following modifications: + * + * Compatibility checks ignore schema name, unless schema is held inside + * a union + * + * + */ +public class AvroSchemaCompatibility { Review Comment: Context: Avro requires at all times that schema's names have to match in order for them to be counted as compatible. 
Provided that only Avro bears the names on the schemas themselves (Spark does not, for ex) this makes for ex, some schemas converted from Spark's [[StructType]] incompatible w/ Avro This has code is mostly borrowed as is from Avro 1.10 w/ the following critical adjustments: Schema names now are only checked in following 2 cases: - In case it's a top-level schema - In case schema is enclosed into a union (in which case its name might be used for reverse-lookup) ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java: ## @@ -18,91 +18,47 @@ package org.apache.hudi.table.action.commit; +import org.apache.avro.generic.GenericRecord; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; import org.apache.hudi.avro.HoodieAvroUtils; import org.apache.hudi.client.utils.MergingIterator; -import org.apache.hudi.common.model.HoodieBaseFile; -import org.apache.hudi.common.model.HoodieRecordPayload; import org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer; -import org.apache.hudi.exception.HoodieException; import org.apache.hudi.io.HoodieMergeHandle; import org.apache.hudi.io.storage.HoodieFileReader; import org.apache.hudi.io.storage.HoodieFileReaderFactory; import org.apache.hudi.table.HoodieTable; -import