[GitHub] [hudi] wzx140 commented on pull request #6745: Fix comment in RFC46

2022-10-08 Thread GitBox


wzx140 commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272461930

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-10-08 Thread GitBox


hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272455607

   
   ## CI report:
   
   * 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
   * a455e4c67d1ac237ef999ac8d6aa584af2f4cd1f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12081)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





svn commit: r57231 - in /dev/hudi/hudi-0.12.1-rc2: ./ hudi-0.12.1-rc2.src.tgz.asc

2022-10-08 Thread yuzhaojing
Author: yuzhaojing
Date: Sun Oct  9 04:04:16 2022
New Revision: 57231

Log: (empty)

Added:
dev/hudi/hudi-0.12.1-rc2/
dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc

Added: dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc
==
--- dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc (added)
+++ dev/hudi/hudi-0.12.1-rc2/hudi-0.12.1-rc2.src.tgz.asc Sun Oct  9 04:04:16 2022
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEtDBVGfNt1+i35qaEWLhbgUd4POIFAmNCRrcACgkQWLhbgUd4
+POJz4w/+IiYdMJ9T3EokFjbvSRboaf7CCxFJQI6Oo4jWffNMe5ucUe7HmK9FVYzZ
+MmspbWifHrv9lojKVf9Lr37CCo+3SjktTGVPh1Ux7qOMQGJjlG4/Mf9oWq0cnKys
+pGpGFdT8P39ETtlad2ic+JYyHTidE9lwIM0/p1syZO2sTL6e1093i+COfrgQxzjQ
+BRcM95oeNmFOTjrJfKChM86IqZJqejl0duYci4BqxcYB+NgLIYtXYDxEQeGtP13N
+Y2zPVDJ1FNkltgrMBaK2m8eh/4ZUALLgzVGAf/jaOc/Zw1rvdBCJrpYUCsAiH04y
+aYsIFfCgvazrZsG/bLCa8kzR1TkjJenHNuvqQz8nhRJK+7pAludM+S3mTgVPdi8h
+/wZUFNSNDGtaFX783FYVhZ9RKvyY4hH7OBlojlTbF6hqqpAXFLFr0BVj4Lri76Z/
+ow+RHu06heVqgXhvFSycXuB0t3JiSZuvF8JwvkKxRqek2vh6MXU3jnn04vipKgx7
+pel2z3wh6aVEwJT/VFNUtqPyp/6yVxs0p+94h2Hhd5FChYOh2mi8/MI0pBHhB+Vw
+5WrqBkEbTxjiMCGyE78yK2V5tAcwV/WaH4kQ+oDTIBPa6zsNicUP7CrvvgqwBB04
+mdHuzmQtDTZoaLD7meyrTTl3xC4h4Om6JBAeFcomB4sX/uhDXVE=
+=IV7M
+-----END PGP SIGNATURE-----




[hudi] annotated tag release-0.12.1-rc2 updated (baeff4331d -> 0672671a24)

2022-10-08 Thread yuzhaojing
This is an automated email from the ASF dual-hosted git repository.

yuzhaojing pushed a change to annotated tag release-0.12.1-rc2
in repository https://gitbox.apache.org/repos/asf/hudi.git


*** WARNING: tag release-0.12.1-rc2 was modified! ***

from baeff4331d (commit)
  to 0672671a24 (tag)
 tagging baeff4331dd25742f8280553281b773bc5e570a5 (commit)
 replaces release-0.12.1-rc1
  by 喻兆靖
  on Sun Oct 9 12:00:41 2022 +0800

- Log -
0.12.1
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEtDBVGfNt1+i35qaEWLhbgUd4POIFAmNCR2kACgkQWLhbgUd4
POIlfw/+MoDAQDus6WUXKEPLfvhXiM3AK/xw6fupGd1N0ge6mA1FR4A0YJ1mFpEH
hNw+ZI7Nn43FeikBTq3FeqSBkHYJpDu/OmAoN7trEyidpb09uzAcFSte2XZfJK0s
xf5s0kmSl5DP7PCE4+B6DhJDDG11G40HYGiyoOoOwU1XMpOfEdQUfxU+dorwy7dj
gDucsTqHRFviiN27KsG0ONSihZhY3ODStxsq5zDqsDvaVJZfSafZE8txsFyuGJle
7/mwBuKuhP+hmc0o9m9N3gY/aGTCCTX2V309um8H8H5l/dq3Dm4gCA4NB750WuFJ
EPOUpSXwk9QXgGRIV52aOBZAVDIzZc4ME4Ngav/XFVHK/rL9zJ7iEmZ0lbQtDzt8
wXR2Ljh4F7hETQ4jG2G67ETh+XnQQQfKAkpdTigS5ox2vU4023oyx9XMQRpLx5Mc
ifsk1YeoFx0ptlm33bqaV2OGutB3t1f51iEBt5sc/lf10JO8ehf0kJwIGvNGlE6C
8hTIsQHCf4qCrnpo7/pEFZQ9XskrWnBgM9kXZhMdssiUvQ5LJHD+MWGt15pL8VM5
w8zJmIId/vAubrTAujTdV+i2WgFi0rEKsB7gp/ITroheXv3V3GRg3voNaB5Wx+DI
LsDOloyvO0z14IO13RQ4gXuiuBwMjKPBXTjPXDXUuq5fY/EBtxI=
=tdzo
-----END PGP SIGNATURE-----
---


No new revisions were added by this update.

Summary of changes:



[hudi] 01/02: [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)

2022-10-08 Thread yuzhaojing

yuzhaojing pushed a commit to branch release-0.12.1
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 51105621af08eb77a8007ac965a8e5d0882c5d97
Author: Alexey Kudinkin 
AuthorDate: Fri Oct 7 03:37:26 2022 -0700

[HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
---
 .../apache/hudi/io/storage/HoodieOrcWriter.java|  10 +-
 .../hudi/avro/TestHoodieAvroParquetWriter.java | 118 +
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java |   7 +-
 .../row/HoodieRowDataParquetWriteSupport.java  |  55 --
 .../storage/row/HoodieRowParquetWriteSupport.java  |  61 +--
 .../row/TestHoodieInternalRowParquetWriter.java|  95 ++---
 .../apache/hudi/avro/HoodieAvroWriteSupport.java   |  60 +--
 .../hudi/avro/HoodieBloomFilterWriteSupport.java   |  96 +
 .../org/apache/hudi/common/util/BaseFileUtils.java |  13 +--
 .../hudi/avro/TestHoodieAvroWriteSupport.java  |  67 
 10 files changed, 361 insertions(+), 221 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
index a532ac66c9..4bcab2cec8 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieOrcWriter.java
@@ -23,6 +23,7 @@ import org.apache.avro.generic.GenericRecord;
 import org.apache.avro.generic.IndexedRecord;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
+import org.apache.hudi.avro.HoodieBloomFilterWriteSupport;
 import org.apache.hudi.common.bloom.BloomFilter;
 import org.apache.hudi.common.bloom.HoodieDynamicBoundedBloomFilter;
 import org.apache.hudi.common.engine.TaskContextSupplier;
@@ -44,9 +45,6 @@ import java.util.List;
 import java.util.concurrent.atomic.AtomicLong;
 
 import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_AVRO_BLOOM_FILTER_METADATA_KEY;
-import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_BLOOM_FILTER_TYPE_CODE;
-import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MAX_RECORD_KEY_FOOTER;
-import static org.apache.hudi.avro.HoodieAvroWriteSupport.HOODIE_MIN_RECORD_KEY_FOOTER;
 
 public class HoodieOrcWriter
 implements HoodieFileWriter, Closeable {
@@ -155,11 +153,11 @@ public class HoodieOrcWriterhttp://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.DummyTaskContextSupplier;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.bloom.BloomFilterFactory;
+import org.apache.hudi.common.bloom.BloomFilterTypeCode;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ParquetUtils;
+import org.apache.hudi.io.storage.HoodieAvroParquetWriter;
+import org.apache.hudi.io.storage.HoodieParquetConfig;
+import org.apache.parquet.avro.AvroSchemaConverter;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.apache.parquet.hadoop.metadata.FileMetaData;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestHoodieAvroParquetWriter {
+
+  @TempDir java.nio.file.Path tmpDir;
+
+  @Test
+  public void testProperWriting() throws IOException {
+Configuration hadoopConf = new Configuration();
+
+HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator(0xDEED);
+List<GenericRecord> records = dataGen.generateGenericRecords(10);
+
+Schema schema = records.get(0).getSchema();
+
+BloomFilter filter = BloomFilterFactory.createBloomFilter(1000, 0.0001, 1,
+BloomFilterTypeCode.DYNAMIC_V0.name());
+HoodieAvroWriteSupport writeSupport = new HoodieAvroWriteSupport(new AvroSchemaConverter().convert(schema),
+schema, Option.of(filter));
+
+HoodieParquetConfig parquetConfig =
+new 

[hudi] branch release-0.12.1 updated (28cb191df7 -> baeff4331d)

2022-10-08 Thread yuzhaojing

yuzhaojing pushed a change to branch release-0.12.1
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 28cb191df7 [MINOR] Update release version to reflect published version 0.12.1
 new 51105621af [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
 new baeff4331d Bumping release candidate number 2

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docker/hoodie/hadoop/base/pom.xml  |   2 +-
 docker/hoodie/hadoop/base_java11/pom.xml   |   2 +-
 docker/hoodie/hadoop/datanode/pom.xml  |   2 +-
 docker/hoodie/hadoop/historyserver/pom.xml |   2 +-
 docker/hoodie/hadoop/hive_base/pom.xml |   2 +-
 docker/hoodie/hadoop/namenode/pom.xml  |   2 +-
 docker/hoodie/hadoop/pom.xml   |   2 +-
 docker/hoodie/hadoop/prestobase/pom.xml|   2 +-
 docker/hoodie/hadoop/spark_base/pom.xml|   2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml|   2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml   |   2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml   |   2 +-
 docker/hoodie/hadoop/trinobase/pom.xml |   2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml  |   2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml   |   2 +-
 hudi-aws/pom.xml   |   4 +-
 hudi-cli/pom.xml   |   2 +-
 hudi-client/hudi-client-common/pom.xml |   4 +-
 .../apache/hudi/io/storage/HoodieOrcWriter.java|  10 +-
 .../hudi/avro/TestHoodieAvroParquetWriter.java | 118 +
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java |   7 +-
 hudi-client/hudi-flink-client/pom.xml  |   4 +-
 .../row/HoodieRowDataParquetWriteSupport.java  |  55 --
 hudi-client/hudi-java-client/pom.xml   |   4 +-
 hudi-client/hudi-spark-client/pom.xml  |   4 +-
 .../storage/row/HoodieRowParquetWriteSupport.java  |  61 +--
 .../row/TestHoodieInternalRowParquetWriter.java|  95 ++---
 hudi-client/pom.xml|   2 +-
 hudi-common/pom.xml|   2 +-
 .../apache/hudi/avro/HoodieAvroWriteSupport.java   |  60 +--
 .../hudi/avro/HoodieBloomFilterWriteSupport.java   |  96 +
 .../org/apache/hudi/common/util/BaseFileUtils.java |  13 +--
 .../hudi/avro/TestHoodieAvroWriteSupport.java  |  67 
 hudi-examples/hudi-examples-common/pom.xml |   2 +-
 hudi-examples/hudi-examples-flink/pom.xml  |   2 +-
 hudi-examples/hudi-examples-java/pom.xml   |   2 +-
 hudi-examples/hudi-examples-spark/pom.xml  |   2 +-
 hudi-examples/pom.xml  |   2 +-
 hudi-flink-datasource/hudi-flink/pom.xml   |   4 +-
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml |   4 +-
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml |   4 +-
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml |   4 +-
 hudi-flink-datasource/pom.xml  |   4 +-
 hudi-gcp/pom.xml   |   2 +-
 hudi-hadoop-mr/pom.xml |   2 +-
 hudi-integ-test/pom.xml|   2 +-
 hudi-kafka-connect/pom.xml |   4 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml|   4 +-
 hudi-spark-datasource/hudi-spark/pom.xml   |   4 +-
 hudi-spark-datasource/hudi-spark2-common/pom.xml   |   2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml  |   4 +-
 hudi-spark-datasource/hudi-spark3-common/pom.xml   |   2 +-
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml  |   4 +-
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml  |   4 +-
 .../hudi-spark3.2plus-common/pom.xml   |   2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml  |   4 +-
 hudi-spark-datasource/pom.xml  |   2 +-
 hudi-sync/hudi-adb-sync/pom.xml|   2 +-
 hudi-sync/hudi-datahub-sync/pom.xml|   2 +-
 hudi-sync/hudi-hive-sync/pom.xml   |   2 +-
 hudi-sync/hudi-sync-common/pom.xml |   2 +-
 hudi-sync/pom.xml  |   2 +-
 hudi-tests-common/pom.xml  |   2 +-
 hudi-timeline-service/pom.xml  |   2 +-
 hudi-utilities/pom.xml |   2 +-
 packaging/hudi-aws-bundle/pom.xml  |   2 +-
 packaging/hudi-datahub-sync-bundle/pom.xml |   2 +-
 packaging/hudi-flink-bundle/pom.xml|   2 +-
 packaging/hudi-gcp-bundle/pom.xml  |   2 +-
 

[hudi] 02/02: Bumping release candidate number 2

2022-10-08 Thread yuzhaojing

yuzhaojing pushed a commit to branch release-0.12.1
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit baeff4331dd25742f8280553281b773bc5e570a5
Author: 喻兆靖 
AuthorDate: Sun Oct 9 11:56:52 2022 +0800

Bumping release candidate number 2
---
 docker/hoodie/hadoop/base/pom.xml  | 2 +-
 docker/hoodie/hadoop/base_java11/pom.xml   | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml  | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml  | 2 +-
 docker/hoodie/hadoop/pom.xml   | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml| 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml| 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml| 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml   | 2 +-
 docker/hoodie/hadoop/trinobase/pom.xml | 2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml  | 2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml   | 2 +-
 hudi-aws/pom.xml   | 4 ++--
 hudi-cli/pom.xml   | 2 +-
 hudi-client/hudi-client-common/pom.xml | 4 ++--
 hudi-client/hudi-flink-client/pom.xml  | 4 ++--
 hudi-client/hudi-java-client/pom.xml   | 4 ++--
 hudi-client/hudi-spark-client/pom.xml  | 4 ++--
 hudi-client/pom.xml| 2 +-
 hudi-common/pom.xml| 2 +-
 hudi-examples/hudi-examples-common/pom.xml | 2 +-
 hudi-examples/hudi-examples-flink/pom.xml  | 2 +-
 hudi-examples/hudi-examples-java/pom.xml   | 2 +-
 hudi-examples/hudi-examples-spark/pom.xml  | 2 +-
 hudi-examples/pom.xml  | 2 +-
 hudi-flink-datasource/hudi-flink/pom.xml   | 4 ++--
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 ++--
 hudi-flink-datasource/pom.xml  | 4 ++--
 hudi-gcp/pom.xml   | 2 +-
 hudi-hadoop-mr/pom.xml | 2 +-
 hudi-integ-test/pom.xml| 2 +-
 hudi-kafka-connect/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark-common/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml   | 4 ++--
 hudi-spark-datasource/hudi-spark2-common/pom.xml   | 2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml  | 4 ++--
 hudi-spark-datasource/hudi-spark3-common/pom.xml   | 2 +-
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml  | 4 ++--
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml  | 4 ++--
 hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml  | 4 ++--
 hudi-spark-datasource/pom.xml  | 2 +-
 hudi-sync/hudi-adb-sync/pom.xml| 2 +-
 hudi-sync/hudi-datahub-sync/pom.xml| 2 +-
 hudi-sync/hudi-hive-sync/pom.xml   | 2 +-
 hudi-sync/hudi-sync-common/pom.xml | 2 +-
 hudi-sync/pom.xml  | 2 +-
 hudi-tests-common/pom.xml  | 2 +-
 hudi-timeline-service/pom.xml  | 2 +-
 hudi-utilities/pom.xml | 2 +-
 packaging/hudi-aws-bundle/pom.xml  | 2 +-
 packaging/hudi-datahub-sync-bundle/pom.xml | 2 +-
 packaging/hudi-flink-bundle/pom.xml| 2 +-
 packaging/hudi-gcp-bundle/pom.xml  | 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml| 2 +-
 packaging/hudi-hive-sync-bundle/pom.xml| 2 +-
 packaging/hudi-integ-test-bundle/pom.xml   | 2 +-
 packaging/hudi-kafka-connect-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml   | 2 +-
 packaging/hudi-spark-bundle/pom.xml| 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml  | 2 +-
 packaging/hudi-trino-bundle/pom.xml| 2 +-
 packaging/hudi-utilities-bundle/pom.xml| 2 +-
 packaging/hudi-utilities-slim-bundle/pom.xml   | 2 +-
 pom.xml| 2 +-
 70 files changed, 87 insertions(+), 87 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml
index 39ceb4006b..8cbaa9fc06 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ 

[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272447816

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)
   * 3bc9f046410bead2b9f17a35e552c2a868d523c0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12082)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272447239

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)
   * 3bc9f046410bead2b9f17a35e552c2a868d523c0 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272446498

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)
   
   



[GitHub] [hudi] slfan1989 commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.

2022-10-08 Thread GitBox


slfan1989 commented on PR #6893:
URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272438799

   @xushiyan Can you help review this PR? Thank you very much! This change avoids the use of jackson-v1 to reduce security risks.
   
   For background, see the article below:
   
https://cowtowncoder.medium.com/on-jackson-cves-dont-panic-here-is-what-you-need-to-know-54cd0d6e8062
   





[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-10-08 Thread GitBox


hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272437525

   
   ## CI report:
   
   * 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
   * 82dd925f9018c0ec3fb3bfaa09f70174010af90c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12079)
   * a455e4c67d1ac237ef999ac8d6aa584af2f4cd1f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12081)
   
   



[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-10-08 Thread GitBox


xiarixiaoyao commented on code in PR #6284:
URL: https://github.com/apache/hudi/pull/6284#discussion_r990721773


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:
##
@@ -92,11 +92,12 @@ protected HoodieMergedLogRecordScanner(FileSystem fs, 
String basePath, List(maxMemorySizeInBytes, 
spillableMapBasePath, new DefaultSizeEstimator(),
+  this.records = new ExternalSpillableMap<>(maxMemorySizeInBytes, basePath + spillableMapBasePath, new DefaultSizeEstimator(),
   new HoodieRecordSizeEstimator(readerSchema), diskMapType, isBitCaskDiskMapCompressionEnabled);
+

Review Comment:
   `basePath` is an HDFS path, while `spillableMapBasePath` is a local path.
   It is wrong to use an HDFS path directly as `spillableMapBasePath`.
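
   The concern can be illustrated with a minimal sketch. It assumes a hypothetical helper class `SpillPathCheck` (not part of Hudi) and made-up example paths: a table base path on HDFS carries the `hdfs` URI scheme, so prefixing it to a local spill directory yields a string that no longer names a local directory.

```java
import java.net.URI;

// Hypothetical helper, not part of Hudi: illustrates why a table base path
// on HDFS should not be prefixed to a local spill directory.
public class SpillPathCheck {

  // Treats a path as local when it has no URI scheme or the "file" scheme.
  public static boolean isLocalPath(String path) {
    String scheme = URI.create(path).getScheme();
    return scheme == null || "file".equals(scheme);
  }

  public static void main(String[] args) {
    String basePath = "hdfs://namenode:8020/warehouse/tbl"; // example table base path (assumption)
    String spillableMapBasePath = "/tmp/";                  // example local spill directory (assumption)

    // The plain local directory is recognised as local.
    System.out.println(isLocalPath(spillableMapBasePath));
    // The concatenation still carries the hdfs scheme, so it is not local.
    System.out.println(isLocalPath(basePath + spillableMapBasePath));
  }
}
```

   A check of this kind only shows the mismatch; how the spill directory should actually be chosen is up to the PR.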






[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-10-08 Thread GitBox


hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272436538

   
   ## CI report:
   
   * 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
   * 4b0a4e72766491e15dbeb8ed904c9aabae32bb89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11563)
   * 82dd925f9018c0ec3fb3bfaa09f70174010af90c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12079)
   * a455e4c67d1ac237ef999ac8d6aa584af2f4cd1f UNKNOWN
   
   



[GitHub] [hudi] Aload commented on issue #6618: Caused by: org.apache.http.NoHttpResponseException: xxxxxx:34812 failed to respond[SUPPORT]

2022-10-08 Thread GitBox


Aload commented on issue #6618:
URL: https://github.com/apache/hudi/issues/6618#issuecomment-1272431470

   > @Aload can you verify whether the patch is used in your version of Hudi, and whether you are still having the problem?
   > 
   > > I have encountered this problem; this PR may solve your problem: #6393
   > 
   > In order to help diagnose, we also need more info to reproduce it, like configs and a code snippet.
   
   Yes, version 0.12.0.





[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-10-08 Thread GitBox


hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272426439

   
   ## CI report:
   
   * 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
   * 4b0a4e72766491e15dbeb8ed904c9aabae32bb89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11563)
   * 82dd925f9018c0ec3fb3bfaa09f70174010af90c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12079)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272425963

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12080)
   
   



[GitHub] [hudi] xiarixiaoyao commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


xiarixiaoyao commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272425949

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-10-08 Thread GitBox


hudi-bot commented on PR #6284:
URL: https://github.com/apache/hudi/pull/6284#issuecomment-1272425889

   
   ## CI report:
   
   * 026dbfc7a6d4d7e489e8c8671a84e143bdb01758 UNKNOWN
   * 4b0a4e72766491e15dbeb8ed904c9aabae32bb89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11563)
   * 82dd925f9018c0ec3fb3bfaa09f70174010af90c UNKNOWN
   
   



[GitHub] [hudi] suryaprasanna commented on a diff in pull request #5958: [HUDI-3900] [UBER] Support log compaction action for MOR tables

2022-10-08 Thread GitBox


suryaprasanna commented on code in PR #5958:
URL: https://github.com/apache/hudi/pull/5958#discussion_r985294717


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##
@@ -362,6 +381,228 @@ protected synchronized void scanInternal(Option keySpecOpt) {
 }
   }
 
+  private void scanInternalV2(Option keySpecOption, boolean skipProcessingBlocks) {
+currentInstantLogBlocks = new ArrayDeque<>();
+progress = 0.0f;
+totalLogFiles = new AtomicLong(0);
+totalRollbacks = new AtomicLong(0);
+totalCorruptBlocks = new AtomicLong(0);
+totalLogBlocks = new AtomicLong(0);
+totalLogRecords = new AtomicLong(0);
+HoodieLogFormatReader logFormatReaderWrapper = null;
+HoodieTimeline commitsTimeline = this.hoodieTableMetaClient.getCommitsTimeline();
+HoodieTimeline completedInstantsTimeline = commitsTimeline.filterCompletedInstants();
+HoodieTimeline inflightInstantsTimeline = commitsTimeline.filterInflights();
+try {
+
+  // Get the key field based on populate meta fields config
+  // and the table type
+  final String keyField = getKeyField();
+
+  boolean enableRecordLookups = !forceFullScan;
+  // Iterate over the paths
+  logFormatReaderWrapper = new HoodieLogFormatReader(fs,
+  logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
+  readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);
+
+  /**
+   * Scanning log blocks and placing the compacted blocks at the right place require two traversals.
+   * First traversal to identify the rollback blocks and valid data and compacted blocks.
+   *
+   * Scanning blocks is easy to do in single writer mode, where the rollback block is right after the effected data blocks.
+   * With multiwriter mode the blocks can be out of sync. An example scenario.
+   * B1, B2, B3, B4, R1(B3), B5
+   * In this case, rollback block R1 is invalidating the B3 which is not the previous block.
+   * This becomes more complicated if we have compacted blocks, which are data blocks created using log compaction.
+   *
+   * To solve this, run a single traversal, collect all the valid blocks that are not corrupted
+   * along with the block instant times and rollback block's target instant times.
+   *
+   * As part of second traversal iterate block instant times in reverse order.
+   * While iterating in reverse order keep a track of final compacted instant times for each block.
+   * In doing so, when a data block is seen include the final compacted block if it is not already added.
+   *
+   * find the final compacted block which contains the merged contents.
+   * For example B1 and B2 are merged and created a compacted block called M1 and now M1, B3 and B4 are merged and
+   * created another compacted block called M2. So, now M2 is the final block which contains all the changes of B1,B2,B3,B4.
+   * So, blockTimeToCompactionBlockTimeMap will look like
+   * (B1 -> M2), (B2 -> M2), (B3 -> M2), (B4 -> M2), (M1 -> M2)
+   * This map is updated while iterating and is used to place the compacted blocks in the correct position.
+   * This way we can have multiple layers of merge blocks and still be able to find the correct positions of merged blocks.
+   */
+
+  // Collect targetRollbackInstants, using which we can determine which 
blocks are invalid.
+  Set targetRollbackInstants = new HashSet<>();
+
+  // This holds block instant time to list of blocks. Note here the log 
blocks can be normal data blocks or compacted log blocks.
+  Map> instantToBlocksMap = new HashMap<>();
+
+  // Order of Instants.
+  List orderedInstantsList = new ArrayList<>();
+
+  Set scannedLogFiles = new HashSet<>();
+
+  /*
+   * 1. First step to traverse in forward direction. While traversing the 
log blocks collect following,
+   *a. instant times
+   *b. instant to logblocks map.
+   *c. targetRollbackInstants.
+   */
+  while (logFormatReaderWrapper.hasNext()) {
+HoodieLogFile logFile = logFormatReaderWrapper.getLogFile();
+LOG.info("Scanning log file " + logFile);
+scannedLogFiles.add(logFile);
+totalLogFiles.set(scannedLogFiles.size());
+// Use the HoodieLogFileReader to iterate through the blocks in the 
log file
+HoodieLogBlock logBlock = logFormatReaderWrapper.next();
+final String instantTime = 
logBlock.getLogBlockHeader().get(INSTANT_TIME);
+totalLogBlocks.incrementAndGet();
+// Ignore the corrupt blocks. No further handling is required for them.
+if (logBlock.getBlockType().equals(CORRUPT_BLOCK)) {
+  LOG.info("Found a corrupt block in " + logFile.getPath());
+  
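
The two traversals described in the quoted comment can be illustrated with a small standalone sketch. This is not the code from the PR: the class, enum, and method names below are hypothetical, each instant is assumed to own exactly one block, and a compacted block is assumed never to cover a rolled-back block. It only shows the forward pass (collect instant order, rollback targets, compaction inputs) and the reverse pass (map each block to its final compacted block and emit that block once).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the two-pass scan; none of these names are Hudi APIs.
final class LogBlockScanSketch {

  enum Kind { DATA, COMPACTED, ROLLBACK, CORRUPT }

  // One toy block per instant. "targets" holds the rolled-back instants for a
  // ROLLBACK block, or the merged input instants for a COMPACTED block.
  static final class Block {
    final Kind kind;
    final String instant;
    final List<String> targets;

    Block(Kind kind, String instant, String... targets) {
      this.kind = kind;
      this.instant = instant;
      this.targets = Arrays.asList(targets);
    }
  }

  // Returns the instants whose blocks should be read, oldest first.
  static List<String> resolve(List<Block> log) {
    // Pass 1 (forward): collect instant order, rollback targets, and compaction inputs.
    List<String> orderedInstants = new ArrayList<>();
    Set<String> rolledBack = new HashSet<>();
    Map<String, List<String>> compactionInputs = new HashMap<>();
    for (Block block : log) {
      switch (block.kind) {
        case CORRUPT:
          break; // corrupt blocks need no further handling
        case ROLLBACK:
          rolledBack.addAll(block.targets);
          break;
        case COMPACTED:
          compactionInputs.put(block.instant, block.targets);
          orderedInstants.add(block.instant);
          break;
        default:
          orderedInstants.add(block.instant);
      }
    }

    // Pass 2 (reverse): map every block to its final compacted block and emit
    // that compacted block at the position of the newest data block it covers.
    Map<String, String> blockToFinalCompacted = new HashMap<>();
    Set<String> emitted = new HashSet<>();
    LinkedList<String> result = new LinkedList<>();
    for (int i = orderedInstants.size() - 1; i >= 0; i--) {
      String instant = orderedInstants.get(i);
      if (rolledBack.contains(instant)) {
        continue; // invalidated by a rollback block seen anywhere in the log
      }
      String finalInstant = blockToFinalCompacted.getOrDefault(instant, instant);
      List<String> inputs = compactionInputs.get(instant);
      if (inputs != null) {
        // A compacted block passes its own final target on to its inputs.
        for (String input : inputs) {
          blockToFinalCompacted.put(input, finalInstant);
        }
      } else if (emitted.add(finalInstant)) {
        result.addFirst(finalInstant);
      }
    }
    return result;
  }
}
```

With the log B1, B2, M1(B1,B2), B3, B4, M2(M1,B3,B4), B5 this sketch resolves to M2 followed by B5, and with B1, B2, B3, B4, R1(B3), B5 it resolves to B1, B2, B4, B5. Iterating in reverse is what lets a later compacted block claim its inputs before those inputs are visited.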

[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency

2022-10-08 Thread GitBox


hudi-bot commented on PR #6896:
URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272399503

   
   ## CI report:
   
   * 1e185f00b79069df14222048fb0b7b834292d2c6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12077)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272399393

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * d5266737aed5cee1b62592371219d944312c06b4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12078)
 
   
   



[GitHub] [hudi] pratyakshsharma commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability

2022-10-08 Thread GitBox


pratyakshsharma commented on PR #5071:
URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272398411

   @nsivabalan please take a pass; this should be ready for review.





[GitHub] [hudi] hudi-bot commented on pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7

2022-10-08 Thread GitBox


hudi-bot commented on PR #6871:
URL: https://github.com/apache/hudi/pull/6871#issuecomment-1272388964

   
   ## CI report:
   
   * efdbd9edebed1d540916f981722038f24d9c7266 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12076)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272378288

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064)
 
   * d5266737aed5cee1b62592371219d944312c06b4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12078)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability

2022-10-08 Thread GitBox


hudi-bot commented on PR #5071:
URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272378107

   
   ## CI report:
   
   * b7203e6d2d6f1e8d3121024faedfa2da1ccc0c71 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7088)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability

2022-10-08 Thread GitBox


hudi-bot commented on PR #5071:
URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272377275

   
   ## CI report:
   
   * b7203e6d2d6f1e8d3121024faedfa2da1ccc0c71 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7088)
 
   
   



[GitHub] [hudi] pratyakshsharma commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability

2022-10-08 Thread GitBox


pratyakshsharma commented on PR #5071:
URL: https://github.com/apache/hudi/pull/5071#issuecomment-1272374574

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272367848

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064)
 
   * d5266737aed5cee1b62592371219d944312c06b4 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency

2022-10-08 Thread GitBox


hudi-bot commented on PR #6896:
URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272366290

   
   ## CI report:
   
   * 97406bce1fcbf575139682cd0659fa154fbb214f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12074)
 
   * 1e185f00b79069df14222048fb0b7b834292d2c6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12077)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7

2022-10-08 Thread GitBox


hudi-bot commented on PR #6871:
URL: https://github.com/apache/hudi/pull/6871#issuecomment-1272366267

   
   ## CI report:
   
   * 050ce213e4faa481abafff1f9127bd91753f2d6d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11996)
 
   * efdbd9edebed1d540916f981722038f24d9c7266 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12076)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency

2022-10-08 Thread GitBox


hudi-bot commented on PR #6896:
URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272365457

   
   ## CI report:
   
   * 97406bce1fcbf575139682cd0659fa154fbb214f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12074)
 
   * 1e185f00b79069df14222048fb0b7b834292d2c6 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6871: Bump protobuf-java from 3.21.5 to 3.21.7

2022-10-08 Thread GitBox


hudi-bot commented on PR #6871:
URL: https://github.com/apache/hudi/pull/6871#issuecomment-1272365434

   
   ## CI report:
   
   * 050ce213e4faa481abafff1f9127bd91753f2d6d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11996)
 
   * efdbd9edebed1d540916f981722038f24d9c7266 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272364193

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073)
 
   
   



[hudi] branch dependabot/maven/com.google.protobuf-protobuf-java-3.21.7 updated (050ce213e4 -> efdbd9edeb)

2022-10-08 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/maven/com.google.protobuf-protobuf-java-3.21.7
in repository https://gitbox.apache.org/repos/asf/hudi.git


 discard 050ce213e4 Bump protobuf-java from 3.21.5 to 3.21.7
 add 48e5bb0fed [HOTFIX] Fix source release validate script (#6865)
 add 9f5d16529d [HUDI-4980] Calculate avg record size using commit only (#6864)
 add 067cc24d88 Revert "[HUDI-4915] improve avro serializer/deserializer (#6788)" (#6809)
 add fb4f026580 [HUDI-4970] Update kafka-connect readme and refactor HoodieConfig#create (#6857)
 add 280194d3b6 Enhancing README for multi-writer tests (#6870)
 add fd8a947e61 [MINOR] Fix deploy script for flink 1.15 (#6872)
 add a51181726c [HUDI-4992] Fixing invalid min/max record key stats in Parquet metadata (#6883)
 add c5125d38b5 [HUDI-4972] Fixes to make unit tests work on m1 mac (#6751)
 add 06d924137b [HUDI-2786] Docker demo on mac aarch64 (#6859)
 add 9c1fa14fd6 add support for unraveling proto schemas
 add 510d525e15 fix some compile issues
 add aad9ec1320 naming and style updates
 add 889927 make test data random, reuse code
 add a922a5beca add test for 2 different recursion depths, fix schema cache key
 add 3b37dc95d9 add unsigned long support
 add 706291d4f3 better handle other types
 add c28e874fca rebase on 4904
 add 190cc16381 get all tests working
 add f18fff886e fix oneof expected schema, update tests after rebase
 add ff5baa8706 revert scala binary change
 add 0069da2d1a try a different method to avoid avro version
 add 71a39bf488 Merge remote-tracking branch 'origin/master' into HUDI-4905
 add c5dff63375 delete unused file
 add f53d47ea3b address PR feedback, update decimal precision
 add 1831639e39 fix isNullable issue, check if class is Int64value
 add eca2992d65 checkstyle fix
 add 423da6f7bb change wrapper descriptor set initialization
 add fb2d9f0030 add in testing for unsigned long to BigInteger conversion
 add f03f9610cf shade protobuf dependency
 add 57f8b81194 Merge remote-tracking branch 'origin/master' into HUDI-4905
 add 7d5b9dc0a9 Revert "shade protobuf dependency"
 add 5d2c2853ea [HUDI-4905] Improve type handling in proto schema conversion
 add 182475a854 [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873)
 add efdbd9edeb Bump protobuf-java from 3.21.5 to 3.21.7

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (050ce213e4)
            \
             N -- N -- N   refs/heads/dependabot/maven/com.google.protobuf-protobuf-java-3.21.7 (efdbd9edeb)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 ...ose_hadoop284_hive233_spark244_mac_aarch64.yml} | 131 +++
 docker/setup_demo.sh   |  10 +-
 docker/stop_demo.sh|   7 +-
 .../cli/commands/TestUpgradeDowngradeCommand.java  |   6 +-
 .../apache/hudi/io/storage/HoodieOrcWriter.java|  10 +-
 .../hudi/avro/TestHoodieAvroParquetWriter.java | 118 ++
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java |   7 +-
 .../row/HoodieRowDataParquetWriteSupport.java  |  55 ++---
 .../storage/row/HoodieRowParquetWriteSupport.java  |  61 +++---
 .../table/action/commit/UpsertPartitioner.java |  16 +-
 .../row/TestHoodieInternalRowParquetWriter.java|  95 
 .../hudi/table/upgrade/TestUpgradeDowngrade.java   |   6 +-
 .../apache/hudi/avro/HoodieAvroWriteSupport.java   |  60 +++--
 .../hudi/avro/HoodieBloomFilterWriteSupport.java   |  96 
 .../apache/hudi/common/config/HoodieConfig.java|   9 +-
 .../org/apache/hudi/common/util/BaseFileUtils.java |  13 +-
 .../hudi/avro/TestHoodieAvroWriteSupport.java  |  67 --
 hudi-examples/hudi-examples-java/pom.xml   |   6 +
 hudi-integ-test/README.md  |  52 -
 hudi-kafka-connect/README.md   |  11 +-
 .../TestUpgradeOrDowngradeProcedure.scala  |   5 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  20 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |  17 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  20 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |  19 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  20 +-
 

[hudi] branch master updated: [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873)

2022-10-08 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 182475a854 [HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873)
182475a854 is described below

commit 182475a8548c6174bb21999e4b55003e9854da3c
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun Oct 9 00:50:25 2022 +0800

[HUDI-4971] Fix shading kryo-shaded with reusing configs (#6873)
---
 packaging/hudi-aws-bundle/pom.xml | 16 ++--
 packaging/hudi-datahub-sync-bundle/pom.xml| 16 ++--
 packaging/hudi-flink-bundle/pom.xml   | 19 ++-
 packaging/hudi-gcp-bundle/pom.xml | 16 ++--
 packaging/hudi-hadoop-mr-bundle/pom.xml   | 19 ++-
 packaging/hudi-hive-sync-bundle/pom.xml   | 19 ++-
 packaging/hudi-integ-test-bundle/pom.xml  | 19 ++-
 packaging/hudi-kafka-connect-bundle/pom.xml   |  7 ++
 packaging/hudi-presto-bundle/pom.xml  | 20 ++-
 packaging/hudi-spark-bundle/pom.xml   |  4 +--
 packaging/hudi-timeline-server-bundle/pom.xml |  7 ++
 packaging/hudi-trino-bundle/pom.xml   | 19 ++-
 packaging/hudi-utilities-bundle/pom.xml   |  4 +--
 packaging/hudi-utilities-slim-bundle/pom.xml  |  4 +--
 pom.xml   | 35 +++
 15 files changed, 63 insertions(+), 161 deletions(-)

diff --git a/packaging/hudi-aws-bundle/pom.xml 
b/packaging/hudi-aws-bundle/pom.xml
index 61aea395ed..75e13ff5f9 100644
--- a/packaging/hudi-aws-bundle/pom.xml
+++ b/packaging/hudi-aws-bundle/pom.xml
@@ -71,7 +71,7 @@
 
 
 
-
+
 
org.apache.hudi:hudi-common
 
org.apache.hudi:hudi-hadoop-mr
 
org.apache.hudi:hudi-sync-common
@@ -102,15 +102,7 @@
 org.openjdk.jol:jol-core
 
 
-
-
-
com.esotericsoftware.kryo.
-
org.apache.hudi.com.esotericsoftware.kryo.
-
-
-
com.esotericsoftware.minlog.
-
org.apache.hudi.com.esotericsoftware.minlog.
-
+
 
 com.beust.jcommander.
 
org.apache.hudi.com.beust.jcommander.
@@ -134,10 +126,6 @@
 org.apache.htrace.
 
org.apache.hudi.org.apache.htrace.
 
-
-org.objenesis.
-
org.apache.hudi.org.objenesis.
-
 
 com.amazonaws.
 
org.apache.hudi.com.amazonaws.
diff --git a/packaging/hudi-datahub-sync-bundle/pom.xml 
b/packaging/hudi-datahub-sync-bundle/pom.xml
index 7425631181..2bae25239d 100644
--- a/packaging/hudi-datahub-sync-bundle/pom.xml
+++ b/packaging/hudi-datahub-sync-bundle/pom.xml
@@ -67,7 +67,7 @@
 
   
   
-
+
   org.apache.hudi:hudi-common
   org.apache.hudi:hudi-hadoop-mr
   org.apache.hudi:hudi-sync-common
@@ -98,15 +98,7 @@
   org.openjdk.jol:jol-core
 
   
-  
-
-  com.esotericsoftware.kryo.
-  
org.apache.hudi.com.esotericsoftware.kryo.
-
-
-  com.esotericsoftware.minlog.
-  
org.apache.hudi.com.esotericsoftware.minlog.
-
+  
 
   org.apache.commons.io.
   
org.apache.hudi.org.apache.commons.io.
@@ -126,10 +118,6 @@
   org.apache.htrace.
   
org.apache.hudi.org.apache.htrace.
 
-
-  org.objenesis.
-  org.apache.hudi.org.objenesis.
-
 
   org.openjdk.jol.
   
org.apache.hudi.org.openjdk.jol.
diff --git a/packaging/hudi-flink-bundle/pom.xml 

[GitHub] [hudi] xushiyan commented on pull request #6873: [HUDI-4971] Fix shading kryo-shaded with re-usable configs

2022-10-08 Thread GitBox


xushiyan commented on PR #6873:
URL: https://github.com/apache/hudi/pull/6873#issuecomment-1272356929

   Tested a few bundles, including datahub sync, aws, and utilities-slim + spark.
   They work as expected, and the original issue is resolved.





[GitHub] [hudi] xushiyan merged pull request #6873: [HUDI-4971] Fix shading kryo-shaded with re-usable configs

2022-10-08 Thread GitBox


xushiyan merged PR #6873:
URL: https://github.com/apache/hudi/pull/6873





[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency

2022-10-08 Thread GitBox


hudi-bot commented on PR #6896:
URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272353354

   
   ## CI report:
   
   * 97406bce1fcbf575139682cd0659fa154fbb214f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12074)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6896: [HUDI-4975] Fix datahub bundle dependency

2022-10-08 Thread GitBox


hudi-bot commented on PR #6896:
URL: https://github.com/apache/hudi/pull/6896#issuecomment-1272352483

   
   ## CI report:
   
   * 97406bce1fcbf575139682cd0659fa154fbb214f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6845: [HUDI-4945] Add a test case for batch clean.

2022-10-08 Thread GitBox


hudi-bot commented on PR #6845:
URL: https://github.com/apache/hudi/pull/6845#issuecomment-1272351581

   
   ## CI report:
   
   * a3851570e4d4e07ebc53bf67934829051802da04 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12072)
 
   
   



[GitHub] [hudi] xushiyan commented on pull request #6891: [MINOR][DOCS] update committer list

2022-10-08 Thread GitBox


xushiyan commented on PR #6891:
URL: https://github.com/apache/hudi/pull/6891#issuecomment-1272344006

   @YannByron you can land this yourself :)





[hudi] branch asf-site updated: [MINOR] Update committer list (#6891)

2022-10-08 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 82bbc2ce26 [MINOR] Update committer list (#6891)
82bbc2ce26 is described below

commit 82bbc2ce2675f29deb9171365de063079790ff1a
Author: Yann Byron 
AuthorDate: Sat Oct 8 23:35:51 2022 +0800

[MINOR] Update committer list (#6891)
---
 website/community/team.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/website/community/team.md b/website/community/team.md
index 062a6d2cd8..277122aaca 100644
--- a/website/community/team.md
+++ b/website/community/team.md
@@ -35,6 +35,7 @@ last_modified_at: 2020-09-01T15:59:57-04:00
 | <img src={"https://avatars.githubusercontent.com/lw309637554"} alt="liway" className="profile-pic" align="middle" /> | [Wei Li](https://github.com/lw309637554)   | Committer | liway|
 | <img src={"https://avatars.githubusercontent.com/zhedoubushishi"} className="profile-pic" alt="zhedoubushishi" /> | [Wenning Ding](https://github.com/zhedoubushishi)   | Committer   | wenningd |
 | <img src={"https://avatars.githubusercontent.com/wangxianghu"} alt="wangxianghu" className="profile-pic" align="middle" /> | [Xianghu Wang](https://github.com/wangxianghu)   | Committer | wangxianghu|
+| <img src={"https://avatars.githubusercontent.com/YannByron"} className="profile-pic" alt="Yann Byron" align="middle" /> | [Yann Byron](https://github.com/YannByron) | Committer   | biyan   |
 | <img src={"https://avatars.githubusercontent.com/pengzhiwei2018"} className="profile-pic" alt="pengzhiwei2018" align="middle" /> | [Zhiwei Peng](https://github.com/pengzhiwei2018)  | Committer   | zhiwei|
 | <img src={"https://avatars.githubusercontent.com/xiarixiaoyao"} className="profile-pic" alt="xiarixiaoyao" align="middle" /> | [Tao Meng](https://github.com/xiarixiaoyao)  | Committer   | mengtao|
 | <img src={"https://avatars.githubusercontent.com/yuzhaojing"} className="profile-pic" alt="yuzhaojing" align="middle" /> | [Zhaojing Yu](https://github.com/yuzhaojing)  | Committer   | yuzhaojing|



[GitHub] [hudi] xushiyan merged pull request #6891: [MINOR][DOCS] update committer list

2022-10-08 Thread GitBox


xushiyan merged PR #6891:
URL: https://github.com/apache/hudi/pull/6891





[jira] [Updated] (HUDI-4975) datahub sync bundle causes class loading issue

2022-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4975:
-
Labels: pull-request-available  (was: )

> datahub sync bundle causes class loading issue
> --
>
> Key: HUDI-4975
> URL: https://issues.apache.org/jira/browse/HUDI-4975
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> Run utilities-slim.jar as the main jar for deltastreamer and set --jars to
> /tmp/hudi-datahub-sync-bundle-0.12.1-rc1.jar,/tmp/hudi-spark3.1-bundle_2.12-0.12.1-rc1.jar
> Putting the datahub sync bundle before the spark bundle results in a class loading issue.
> It works fine if the spark bundle goes first.
> {code:bash}
> Caused by: java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
>   at org.apache.hudi.io.storage.HoodieFileWriterFactory.newParquetFileWriter(HoodieFileWriterFactory.java:78)
>   at org.apache.hudi.io.storage.HoodieFileWriterFactory.newParquetFileWriter(HoodieFileWriterFactory.java:70)
>   at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:54)
>   at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:104)
>   at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:76)
>   at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:46)
>   at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
>   at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.LogicalTypeAnnotation
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 14 more
> {code}
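
When chasing a jar-ordering problem like the one reported above, it can help to ask the JVM where a class is actually resolved from. The following is a minimal sketch, not part of the Hudi code base: the class name `ClassOriginCheck` and the method `originOf` are made up for illustration, and it only uses standard JDK calls (`Class.forName`, `getProtectionDomain().getCodeSource()`).

```java
import java.security.CodeSource;

public class ClassOriginCheck {

    /**
     * Returns the location (usually a jar URL) a class was loaded from,
     * or a marker string when the class is missing or has no code source.
     * Hypothetical helper for illustration only.
     */
    public static String originOf(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class<?> cls = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source
            return source == null ? "<bootstrap or unknown>" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; pass another name as the first argument
        String name = args.length > 0
                ? args[0]
                : "org.apache.parquet.schema.LogicalTypeAnnotation";
        System.out.println(name + " -> " + originOf(name));
    }
}
```

Running this with the same classpath order as the failing job should show whether the class is missing entirely or is being picked up from an unexpected jar.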



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] xushiyan opened a new pull request, #6896: [HUDI-4975] Fix datahub bundle dependency

2022-10-08 Thread GitBox


xushiyan opened a new pull request, #6896:
URL: https://github.com/apache/hudi/pull/6896

   ### Change Logs
   
   - Set the scope of parquet-avro and avro to `provided` in the datahub bundle
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272338806

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071)
 
   * f06e77aa268d70f0532bdaee53db7f9be660de39 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12073)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46

2022-10-08 Thread GitBox


hudi-bot commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272337693

   
   ## CI report:
   
   * 466535c2d2984fd57c471bb6127edc507d48d0b1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12070)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272327704

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071)
 
   * f06e77aa268d70f0532bdaee53db7f9be660de39 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272326567

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f3e44c648063cc4da5198c5be5256d326511b304 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
 
   * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071)
 
   * f06e77aa268d70f0532bdaee53db7f9be660de39 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure

2022-10-08 Thread GitBox


hudi-bot commented on PR #6895:
URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272324593

   
   ## CI report:
   
   * ce39faf7390aee37e4b00798c8dda25ab581e273 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12068)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

2022-10-08 Thread GitBox


hudi-bot commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272324223

   
   ## CI report:
   
   * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN
   * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN
   * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN
   * 447cb4510301af1c3ff1aebb3bd0a668872fc3f6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12069)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6845: [HUDI-4945] Add a test case for batch clean.

2022-10-08 Thread GitBox


hudi-bot commented on PR #6845:
URL: https://github.com/apache/hudi/pull/6845#issuecomment-1272310606

   
   ## CI report:
   
   * f368eed82d5140142889b7853597a66770e99886 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11942)
 
   * a3851570e4d4e07ebc53bf67934829051802da04 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12072)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272310556

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f3e44c648063cc4da5198c5be5256d326511b304 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
 
   * b9b24c49718554e2263e07967fbcabbb3523a1c1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12071)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6845: [HUDI-4945] Add a test case for batch clean.

2022-10-08 Thread GitBox


hudi-bot commented on PR #6845:
URL: https://github.com/apache/hudi/pull/6845#issuecomment-1272308678

   
   ## CI report:
   
   * f368eed82d5140142889b7853597a66770e99886 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11942)
 
   * a3851570e4d4e07ebc53bf67934829051802da04 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272308653

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f3e44c648063cc4da5198c5be5256d326511b304 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
 
   * b9b24c49718554e2263e07967fbcabbb3523a1c1 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.

2022-10-08 Thread GitBox


hudi-bot commented on PR #6893:
URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272307669

   
   ## CI report:
   
   * 0f78ff5e81d51f2972bba066804a315bb23dbe12 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12067)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272307615

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * f3e44c648063cc4da5198c5be5256d326511b304 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
 
   
   



[GitHub] [hudi] LinMingQiang commented on a diff in pull request #6845: [HUDI-4945] Add a test case for batch clean.

2022-10-08 Thread GitBox


LinMingQiang commented on code in PR #6845:
URL: https://github.com/apache/hudi/pull/6845#discussion_r990634609


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:
##
@@ -96,9 +96,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
       pipeline = Pipelines.hoodieStreamWrite(conf, hoodieRecordDataStream);
       // compaction
       if (OptionsResolver.needsAsyncCompaction(conf)) {
-        // use synchronous compaction for bounded source.
+        // use synchronous compaction and clean for bounded source.
         if (context.isBounded()) {
           conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false);
+          conf.setBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED, false);
         }

Review Comment:
   ok.
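
   The decision in the hunk above can be sketched in isolation. This is an illustrative stand-in, not the Hudi implementation: it uses a plain `Map` instead of Flink's `Configuration`, the class `BoundedSinkOptions` and method `resolve` are hypothetical, and the two option keys are assumptions about what the `FlinkOptions` constants resolve to.

```java
import java.util.HashMap;
import java.util.Map;

public class BoundedSinkOptions {

    // Assumed option keys, used here only as plain map keys for illustration
    static final String COMPACTION_ASYNC_ENABLED = "compaction.async.enabled";
    static final String CLEAN_ASYNC_ENABLED = "clean.async.enabled";

    /**
     * Returns a copy of the options in which async compaction and async clean
     * are switched off when the job needs async compaction and the source is bounded.
     */
    public static Map<String, Boolean> resolve(Map<String, Boolean> conf,
                                               boolean needsAsyncCompaction,
                                               boolean bounded) {
        Map<String, Boolean> resolved = new HashMap<>(conf);
        if (needsAsyncCompaction && bounded) {
            // A bounded job terminates, so run both table services synchronously
            resolved.put(COMPACTION_ASYNC_ENABLED, false);
            resolved.put(CLEAN_ASYNC_ENABLED, false);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Boolean> conf = new HashMap<>();
        conf.put(COMPACTION_ASYNC_ENABLED, true);
        conf.put(CLEAN_ASYNC_ENABLED, true);
        System.out.println("bounded:   " + resolve(conf, true, true));
        System.out.println("unbounded: " + resolve(conf, true, false));
    }
}
```

   The input map is copied rather than mutated so the bounded and unbounded cases can be compared side by side; the actual change mutates `conf` in place.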






[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46

2022-10-08 Thread GitBox


hudi-bot commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272297073

   
   ## CI report:
   
   * 4c78db48d9e86c620f0824fe1438a1d151100d98 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12041)
 
   * 466535c2d2984fd57c471bb6127edc507d48d0b1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12070)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

2022-10-08 Thread GitBox


hudi-bot commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272296814

   
   ## CI report:
   
   * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN
   * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN
   * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN
   * 4ba91d4ce8345b4917e1f402694a55d07bf2951c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12047)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12052)
 
   * 447cb4510301af1c3ff1aebb3bd0a668872fc3f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12069)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6745: Fix comment in RFC46

2022-10-08 Thread GitBox


hudi-bot commented on PR #6745:
URL: https://github.com/apache/hudi/pull/6745#issuecomment-1272296201

   
   ## CI report:
   
   * 4c78db48d9e86c620f0824fe1438a1d151100d98 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12041)
 
   * 466535c2d2984fd57c471bb6127edc507d48d0b1 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

2022-10-08 Thread GitBox


hudi-bot commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272295990

   
   ## CI report:
   
   * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN
   * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN
   * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN
   * 4ba91d4ce8345b4917e1f402694a55d07bf2951c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12047)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12052)
 
   * 447cb4510301af1c3ff1aebb3bd0a668872fc3f6 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


hudi-bot commented on PR #6680:
URL: https://github.com/apache/hudi/pull/6680#issuecomment-1272295291

   
   ## CI report:
   
   * 5f6d4f624c5f20cf3c4c38384e17c7bb13e56991 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12065)
 
   
   



[GitHub] [hudi] zhangyue19921010 commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

2022-10-08 Thread GitBox


zhangyue19921010 commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1272290024

   > Do we have tests for DisruptorProducers? I found tests only for 
DisruptorExecutor and DisruptorMessageQueue.
   > 
   > Also, do we have tests covering both a single producer and multiple producers? 
Can you summarize which error cases have been tested? For example: one producer 
thread crashing while the others continue to produce, or memory being too low to 
hold all produced records while still ensuring that no records are dropped.
   
   Hi @nsivabalan, thanks a lot for the reminder. I added more tests, including:
   1. `TestBoundedInMemoryExecutorInSpark#testExecutor` ==> tests common 
disruptor executor ingestion
   2. `TestBoundedInMemoryExecutorInSpark#testInterruptExecutor` ==> tests 
disruptor executor ingestion with an interrupt
   3. `TestDisruptorMessageQueue#testRecordReading` ==> tests common reading with 
a single producer and a single consumer
   4. `TestDisruptorMessageQueue#testCompositeProducerRecordReading` ==> tests 
multiple producers and a single consumer
   5. `TestDisruptorMessageQueue#testException` ==> tests multiple producers where 
one producer thread crashes while the others continue to produce.
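
   The scenario in item 5 can be illustrated without the Disruptor library. The sketch below is an assumption-laden stand-in, not the test from the PR: it swaps in a bounded `ArrayBlockingQueue` from the JDK for the Disruptor ring buffer, and the class `MultiProducerSketch` and its `run` method are hypothetical names. It checks that when one producer crashes, every record from the surviving producers still reaches the single consumer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class MultiProducerSketch {

    /** Records seen by the consumer (sorted) and the number of producers that failed. */
    public static final class Result {
        public final List<Integer> records;
        public final int failedProducers;

        Result(List<Integer> records, int failedProducers) {
            this.records = records;
            this.failedProducers = failedProducers;
        }
    }

    private static boolean allDone(List<Future<Void>> futures) {
        for (Future<Void> f : futures) {
            if (!f.isDone()) {
                return false;
            }
        }
        return true;
    }

    public static Result run(int producers, int perProducer, int failingProducer) {
        // Small capacity on purpose: producers block instead of dropping records
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        List<Future<Void>> futures = new ArrayList<>();
        try {
            for (int p = 0; p < producers; p++) {
                final int id = p;
                Callable<Void> task = () -> {
                    if (id == failingProducer) {
                        throw new IllegalStateException("producer " + id + " crashed");
                    }
                    for (int i = 0; i < perProducer; i++) {
                        queue.put(id * perProducer + i); // blocks while the queue is full
                    }
                    return null;
                };
                futures.add(pool.submit(task));
            }
            // Single consumer on the calling thread
            List<Integer> consumed = new ArrayList<>();
            while (true) {
                Integer rec = queue.poll(10, TimeUnit.MILLISECONDS);
                if (rec != null) {
                    consumed.add(rec);
                } else if (allDone(futures)) {
                    queue.drainTo(consumed); // pick up anything published after the last poll
                    break;
                }
            }
            int failed = 0;
            for (Future<Void> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    failed++; // the producer threw; the others were unaffected
                }
            }
            Collections.sort(consumed);
            return new Result(consumed, failed);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Result r = run(3, 100, 1);
        System.out.println("consumed=" + r.records.size() + ", failedProducers=" + r.failedProducers);
    }
}
```

   The bounded queue also touches on the low-memory case raised in the question: a full buffer makes producers wait rather than lose records.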





[GitHub] [hudi] hudi-bot commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure

2022-10-08 Thread GitBox


hudi-bot commented on PR #6895:
URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272283945

   
   ## CI report:
   
   * ce39faf7390aee37e4b00798c8dda25ab581e273 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12068)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.

2022-10-08 Thread GitBox


hudi-bot commented on PR #6893:
URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272283929

   
   ## CI report:
   
   * 0f78ff5e81d51f2972bba066804a315bb23dbe12 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12067)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272283853

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * ca9b8fb8950e382908469a40724fddff88aa60d0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11945)
 
   * f3e44c648063cc4da5198c5be5256d326511b304 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12066)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure

2022-10-08 Thread GitBox


hudi-bot commented on PR #6895:
URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272282750

   
   ## CI report:
   
   * ce39faf7390aee37e4b00798c8dda25ab581e273 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6893: [HUDI-4997] use jackson-v2 replace jackson-v1 import.

2022-10-08 Thread GitBox


hudi-bot commented on PR #6893:
URL: https://github.com/apache/hudi/pull/6893#issuecomment-1272282740

   
   ## CI report:
   
   * 0f78ff5e81d51f2972bba066804a315bb23dbe12 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6741: [HUDI-4898] presto/hive respect payload during merge parquet file and logfile when reading mor table

2022-10-08 Thread GitBox


hudi-bot commented on PR #6741:
URL: https://github.com/apache/hudi/pull/6741#issuecomment-1272282682

   
   ## CI report:
   
   * bff3acafde6d8a1bd5574b90ce644ef30acbf0a2 UNKNOWN
   * 79a6a0be9c0f9a5aaabb36857b5b68adc5cb9522 UNKNOWN
   * ca9b8fb8950e382908469a40724fddff88aa60d0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11945)
 
   * f3e44c648063cc4da5198c5be5256d326511b304 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272282554

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064)
 
   
   



[GitHub] [hudi] boneanxs commented on pull request #6895: [MINOR] Fix name spelling for RunBootstrapProcedure

2022-10-08 Thread GitBox


boneanxs commented on PR #6895:
URL: https://github.com/apache/hudi/pull/6895#issuecomment-1272280375

   Hi @XuQianJin-Stars, this is a minor name fix for RunBootstrapProcedure. 
Could you help review it?





[GitHub] [hudi] boneanxs opened a new pull request, #6895: [MINOR] Fix name spelling for RunBootstrapProcedure

2022-10-08 Thread GitBox


boneanxs opened a new pull request, #6895:
URL: https://github.com/apache/hudi/pull/6895

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: low**
   
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] gubinjie opened a new issue, #6894: [SUPPORT]Error running child : java.lang.NoSuchMethodError: org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageTy

2022-10-08 Thread GitBox


gubinjie opened a new issue, #6894:
URL: https://github.com/apache/hudi/issues/6894

   CDH 6.3.2
   Hudi 0.10.1
   
   When querying a Hudi table through Hive, I get the following error:
   select * from hudi_flink_tyc_company_rt where name = '3213'
   
   `2022-10-08 16:30:27,365 INFO [main] 
org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
Instead, use dfs.metrics.session-id
   2022-10-08 16:30:27,661 INFO [main] org.apache.hadoop.mapred.Task:  Using 
ResourceCalculatorProcessTree : [ ]
   2022-10-08 16:30:27,819 INFO [main] org.apache.hadoop.mapred.MapTask: 
Processing split: 
HoodieCombineRealtimeFileSplit{realtimeFileSplits=[HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
 
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.5652dad0-9e32-43f5-99c4-eff0a89c6a79_20220929181835942.log.1_0-1-0],
 maxCommitTime='20220929190955221', 
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}, 
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/6f026a25-797e-4a8b-9382-b426b94fd034_0-1-0_20220929181835942.parquet,
 
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.6f026a25-797e-4a8b-9382-b426b94fd034_20220929181835942.log.1_0-1-0],
 maxCommitTime='20220929190955221', 
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}, 
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/4f90e72d-d205-4640-975f-09ebb2ad136a_0-1-0_20220929180105887.parquet,
 deltaLogPaths=[], maxCommitTime='20220929190955221', 
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}, 
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/b72b41e5-7bd9-4a87-a91d-86a368a2f7b7_0-1-0_20220929181835942.parquet,
 deltaLogPaths=[], maxCommitTime='20220929190955221', 
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}]}InputFormatClass:
 org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
   
   2022-10-08 16:30:27,873 INFO [main] org.apache.hadoop.hive.conf.HiveConf: 
Found configuration file null
   2022-10-08 16:30:27,980 INFO [main] 
org.apache.hadoop.hive.ql.exec.SerializationUtilities: Deserializing MapWork 
using kryo
   2022-10-08 16:30:28,110 INFO [main] 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Before adding 
Hoodie columns, Projections 
:_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time,
 Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
   2022-10-08 16:30:28,110 INFO [main] 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Creating 
record reader with readCols 
:_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time,
 Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
   2022-10-08 16:30:28,361 INFO [main] 
org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. 
Instead, use mapreduce.task.attempt.id
   2022-10-08 16:30:28,366 ERROR [main] 
org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due 
to context is not a instance of TaskInputOutputContext, but is 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
   2022-10-08 16:30:28,390 INFO [main] 
org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized 
will read a total of 44225 records.
   2022-10-08 16:30:28,390 INFO [main] 
org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next 
block
   2022-10-08 16:30:28,412 INFO [main] 
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & 
initialized native-zlib library
   2022-10-08 16:30:28,413 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.gz]
   2022-10-08 16:30:28,418 INFO [main] 
org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 
28 ms. row count = 44225
   2022-10-08 16:30:28,565 INFO [main] 
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader: Enabling merged 
reading of realtime records for split 
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
 

[jira] [Updated] (HUDI-4997) use jackson-v2 replace jackson-v1 import

2022-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4997:
-
Labels: pull-request-available  (was: )

> use jackson-v2 replace jackson-v1 import
> 
>
> Key: HUDI-4997
> URL: https://issues.apache.org/jira/browse/HUDI-4997
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: fanshilun
>Priority: Major
>  Labels: pull-request-available
>
> HoodieWriteCommitCallbackUtil uses ObjectMapper but imports it from jackson-v1. 
> Because jackson-v1 has security risks, replace the import with jackson-v2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] slfan1989 opened a new pull request, #6893: HUDI-4997: use jackson-v2 replace jackson-v1 import.

2022-10-08 Thread GitBox


slfan1989 opened a new pull request, #6893:
URL: https://github.com/apache/hudi/pull/6893

   JIRA: HUDI-4997: use jackson-v2 replace jackson-v1 import.
   
   HoodieWriteCommitCallbackUtil uses ObjectMapper but imports it from jackson-v1. 
Because jackson-v1 has security risks, replace the import with jackson-v2.





[jira] [Created] (HUDI-4997) use jackson-v2 replace jackson-v1 import

2022-10-08 Thread fanshilun (Jira)
fanshilun created HUDI-4997:
---

 Summary: use jackson-v2 replace jackson-v1 import
 Key: HUDI-4997
 URL: https://issues.apache.org/jira/browse/HUDI-4997
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cli
Reporter: fanshilun


HoodieWriteCommitCallbackUtil uses ObjectMapper but imports it from jackson-v1. 
Because jackson-v1 has security risks, replace the import with jackson-v2.





[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


hudi-bot commented on PR #6680:
URL: https://github.com/apache/hudi/pull/6680#issuecomment-1272270347

   
   ## CI report:
   
   * efc19bfcfb86bf582d4bd2584462083b8178c1c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11783)
   * 5f6d4f624c5f20cf3c4c38384e17c7bb13e56991 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12065)
   
   



[GitHub] [hudi] gubinjie closed issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

2022-10-08 Thread GitBox


gubinjie closed issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond
URL: https://github.com/apache/hudi/issues/6825





[GitHub] [hudi] gubinjie commented on issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

2022-10-08 Thread GitBox


gubinjie commented on issue #6825:
URL: https://github.com/apache/hudi/issues/6825#issuecomment-1272269358

   TH





[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


hudi-bot commented on PR #6680:
URL: https://github.com/apache/hudi/pull/6680#issuecomment-1272269333

   
   ## CI report:
   
   * efc19bfcfb86bf582d4bd2584462083b8178c1c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11783)
   * 5f6d4f624c5f20cf3c4c38384e17c7bb13e56991 UNKNOWN
   
   



[GitHub] [hudi] wzx140 commented on a diff in pull request #6745: Fix comment in RFC46

2022-10-08 Thread GitBox


wzx140 commented on code in PR #6745:
URL: https://github.com/apache/hudi/pull/6745#discussion_r990612384


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/util/HoodieSparkRecordUtils.java:
##
@@ -91,21 +65,20 @@ private static Option 
getNullableValAsString(StructType structType, Inte
* @param structType  {@link StructType} instance.
* @return Column value if a single column, or concatenated String values by 
comma.
*/
-  public static Object getRecordColumnValues(InternalRow row,
+  public static ComparableList getRecordColumnValues(InternalRow row,
   String[] columns,
   StructType structType, boolean consistentLogicalTimestampEnabled) {
-if (columns.length == 1) {
-  NestedFieldPath posList = 
HoodieInternalRowUtils.getCachedPosList(structType, columns[0]);
-  return HoodieUnsafeRowUtils.getNestedInternalRowValue(row, posList);
-} else {
-  // TODO this is inefficient, instead we can simply return array of 
Comparable
-  StringBuilder sb = new StringBuilder();
-  for (String col : columns) {
-// TODO support consistentLogicalTimestampEnabled
-NestedFieldPath posList = 
HoodieInternalRowUtils.getCachedPosList(structType, columns[0]);
-return HoodieUnsafeRowUtils.getNestedInternalRowValue(row, posList);
+List list = new LinkedList<>();
+for (String column : columns) {
+  NestedFieldPath posList = 
HoodieInternalRowUtils.getCachedPosList(structType, column);
+  Object value = HoodieUnsafeRowUtils.getNestedInternalRowValue(row, 
posList);
+  DataType dataType = posList.parts()[posList.parts().length - 
1]._2.dataType();
+  if (value instanceof InternalRow | value instanceof MapData | value 
instanceof ArrayData

Review Comment:
   Removed



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##
@@ -461,6 +461,18 @@ abstract class HoodieBaseRelation(val sqlContext: 
SQLContext,
   }
 
   protected def getTableState: HoodieTableState = {
+val mergerImpls = (if 
(optParams.contains(HoodieWriteConfig.MERGER_IMPLS.key())) {

Review Comment:
   Fixed






[jira] [Updated] (HUDI-4996) Update cleaning doc

2022-10-08 Thread xi chaomin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xi chaomin updated HUDI-4996:
-
Description: The parameter *--hoodieConfigs* in "cleans run" is a String 
array; its values should be separated by a space (" ").

> Update cleaning doc
> ---
>
> Key: HUDI-4996
> URL: https://issues.apache.org/jira/browse/HUDI-4996
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning, docs
>Reporter: xi chaomin
>Priority: Major
>
> The parameter *--hoodieConfigs* in "cleans run" is a String array; its values 
> should be separated by a space (" ").





[jira] [Created] (HUDI-4996) Update cleaning doc

2022-10-08 Thread xi chaomin (Jira)
xi chaomin created HUDI-4996:


 Summary: Update cleaning doc
 Key: HUDI-4996
 URL: https://issues.apache.org/jira/browse/HUDI-4996
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cleaning, docs
Reporter: xi chaomin








[GitHub] [hudi] xicm opened a new pull request, #6892: Update hoodie_cleaner.md

2022-10-08 Thread GitBox


xicm opened a new pull request, #6892:
URL: https://github.com/apache/hudi/pull/6892

   ### Change Logs
   
   The parameter **--hoodieConfigs** in "cleans run" is a String array; its 
values should be separated by a space (" ").
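
   To illustrate what a space-separated String array value means, here is a minimal sketch. It assumes a plain whitespace split, which may differ from how the CLI actually parses the option; `HoodieConfigsArgSketch` and `splitConfigs` are hypothetical names used only for this illustration.

```java
// Hypothetical helper, not part of Hudi: shows how one space-separated
// option value maps onto a String array of key=value entries.
class HoodieConfigsArgSketch {

  // Assumption: entries are separated by one or more whitespace characters.
  static String[] splitConfigs(String raw) {
    String trimmed = raw.trim();
    return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
  }

  public static void main(String[] args) {
    String raw = "hoodie.cleaner.commits.retained=3 hoodie.cleaner.policy=KEEP_LATEST_COMMITS";
    for (String entry : splitConfigs(raw)) {
      System.out.println(entry);
    }
  }
}
```

   With this reading, a comma-separated value would stay a single array element, which is why the separator matters in the docs.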
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


YuweiXiao commented on code in PR #6680:
URL: https://github.com/apache/hudi/pull/6680#discussion_r990607541


##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -179,15 +197,125 @@ public void close() throws Exception {
   }
 
   protected List getAllQueryPartitionPaths() {
+if (cachedAllPartitionPaths != null) {
+  return cachedAllPartitionPaths;
+}
+
+loadAllQueryPartitionPaths();
+return cachedAllPartitionPaths;
+  }
+
+  private void loadAllQueryPartitionPaths() {
 List queryRelativePartitionPaths = queryPaths.stream()
 .map(path -> FSUtils.getRelativePartitionPath(basePath, path))
 .collect(Collectors.toList());
 
-// Load all the partition path from the basePath, and filter by the query 
partition path.
-// TODO load files from the queryRelativePartitionPaths directly.
-List matchedPartitionPaths = getAllPartitionPathsUnchecked()
-.stream()
-.filter(path -> 
queryRelativePartitionPaths.stream().anyMatch(path::startsWith))
+this.cachedAllPartitionPaths = 
listQueryPartitionPaths(queryRelativePartitionPaths);
+
+// If the partition value contains InternalRow.empty, we query it as a 
non-partitioned table.
+this.queryAsNonePartitionedTable = 
this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0);
+  }
+
+  protected Map> getAllInputFileSlices() {
+if (!isAllInputFileSlicesCached) {
+  doRefresh();
+}
+return cachedAllInputFileSlices;
+  }
+
+  /**
+   * Get input file slice for the given partition. Will use cache directly if 
it is computed before.
+   */
+  protected List getCachedInputFileSlices(PartitionPath partition) {
+return cachedAllInputFileSlices.computeIfAbsent(partition, 
this::loadFileSlicesForPartition);
+  }
+
+  private List loadFileSlicesForPartition(PartitionPath p) {
+FileStatus[] files = loadPartitionPathFiles(p);
+HoodieTimeline activeTimeline = getActiveTimeline();
+Option latestInstant = activeTimeline.lastInstant();
+
+HoodieTableFileSystemView fileSystemView = new 
HoodieTableFileSystemView(metaClient, activeTimeline, files);
+
+Option queryInstant = specifiedQueryInstant.or(() -> 
latestInstant.map(HoodieInstant::getTimestamp));
+
+validate(activeTimeline, queryInstant);
+
+List ret;
+if (tableType.equals(HoodieTableType.MERGE_ON_READ) && 
queryType.equals(HoodieTableQueryType.SNAPSHOT)) {
+  ret = queryInstant.map(instant ->
+  fileSystemView.getLatestMergedFileSlicesBeforeOrOn(p.path, 
queryInstant.get())
+  .collect(Collectors.toList())
+  )
+  .orElse(Collections.emptyList());
+} else {
+  ret = queryInstant.map(instant ->
+  fileSystemView.getLatestFileSlicesBeforeOrOn(p.path, instant, 
true)
+  )
+  .orElse(fileSystemView.getLatestFileSlices(p.path))
+  .collect(Collectors.toList());
+}
+
+cachedFileSize += 
ret.stream().mapToLong(BaseHoodieTableFileIndex::fileSliceSize).sum();
+return ret;
+  }
+
+  /**
+   * Get partition path with the given partition value
+   * @param partitionNames partition names
+   * @param values partition values
+   * @return partitions that match the given partition values
+   */
+  protected List getPartitionPaths(String[] partitionNames, 
String[] values) {
+if (partitionNames.length == 0 || partitionNames.length != values.length) {

Review Comment:
   Yeah, I cleaned up the code accordingly. I added `isPartial` to replace the 
role of `idx`. Could you take another look?
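
   The idea of tracking a partial match with a flag instead of a loop index can be sketched as below. This is only an illustration under simplifying assumptions: `PartitionPrefixSketch`, `Prefix`, and `build` are hypothetical names, URL encoding of values is left out, and the real method in the PR may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's code: builds a relative partition path
// prefix and reports whether trailing partition columns were left unspecified.
class PartitionPrefixSketch {

  static final class Prefix {
    final String path;
    final boolean isPartial;

    Prefix(String path, boolean isPartial) {
      this.path = path;
      this.isPartial = isPartial;
    }
  }

  static Prefix build(String[] partitionColumns, String[] names, String[] values, boolean hiveStyle) {
    Map<String, String> nameToValue = new HashMap<>();
    for (int i = 0; i < names.length; i++) {
      nameToValue.put(names[i], values[i]);
    }
    StringBuilder path = new StringBuilder();
    boolean isPartial = false;
    for (String column : partitionColumns) {
      String value = nameToValue.get(column);
      if (value == null) {
        // The first missing column ends the prefix; later columns are ignored.
        isPartial = true;
        break;
      }
      if (path.length() > 0) {
        path.append('/');
      }
      if (hiveStyle) {
        path.append(column).append('=');
      }
      path.append(value);
    }
    return new Prefix(path.toString(), isPartial);
  }
}
```

   A caller could return the exact partition when `isPartial` is false and fall back to listing under the prefix otherwise.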






[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


YuweiXiao commented on code in PR #6680:
URL: https://github.com/apache/hudi/pull/6680#discussion_r990606063


##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -179,15 +197,125 @@ public void close() throws Exception {
   }
 
   protected List getAllQueryPartitionPaths() {
+if (cachedAllPartitionPaths != null) {
+  return cachedAllPartitionPaths;
+}
+
+loadAllQueryPartitionPaths();
+return cachedAllPartitionPaths;
+  }
+
+  private void loadAllQueryPartitionPaths() {
 List queryRelativePartitionPaths = queryPaths.stream()
 .map(path -> FSUtils.getRelativePartitionPath(basePath, path))
 .collect(Collectors.toList());
 
-// Load all the partition path from the basePath, and filter by the query 
partition path.
-// TODO load files from the queryRelativePartitionPaths directly.
-List matchedPartitionPaths = getAllPartitionPathsUnchecked()
-.stream()
-.filter(path -> 
queryRelativePartitionPaths.stream().anyMatch(path::startsWith))
+this.cachedAllPartitionPaths = 
listQueryPartitionPaths(queryRelativePartitionPaths);
+
+// If the partition value contains InternalRow.empty, we query it as a 
non-partitioned table.
+this.queryAsNonePartitionedTable = 
this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0);
+  }
+
+  protected Map> getAllInputFileSlices() {
+if (!isAllInputFileSlicesCached) {
+  doRefresh();
+}
+return cachedAllInputFileSlices;
+  }
+
+  /**
+   * Get input file slice for the given partition. Will use cache directly if 
it is computed before.
+   */
+  protected List getCachedInputFileSlices(PartitionPath partition) {
+return cachedAllInputFileSlices.computeIfAbsent(partition, 
this::loadFileSlicesForPartition);
+  }
+
+  private List loadFileSlicesForPartition(PartitionPath p) {
+FileStatus[] files = loadPartitionPathFiles(p);
+HoodieTimeline activeTimeline = getActiveTimeline();
+Option latestInstant = activeTimeline.lastInstant();
+
+HoodieTableFileSystemView fileSystemView = new 
HoodieTableFileSystemView(metaClient, activeTimeline, files);
+
+Option queryInstant = specifiedQueryInstant.or(() -> 
latestInstant.map(HoodieInstant::getTimestamp));
+
+validate(activeTimeline, queryInstant);
+
+List ret;
+if (tableType.equals(HoodieTableType.MERGE_ON_READ) && 
queryType.equals(HoodieTableQueryType.SNAPSHOT)) {
+  ret = queryInstant.map(instant ->
+  fileSystemView.getLatestMergedFileSlicesBeforeOrOn(p.path, 
queryInstant.get())
+  .collect(Collectors.toList())
+  )
+  .orElse(Collections.emptyList());
+} else {
+  ret = queryInstant.map(instant ->
+  fileSystemView.getLatestFileSlicesBeforeOrOn(p.path, instant, 
true)
+  )
+  .orElse(fileSystemView.getLatestFileSlices(p.path))
+  .collect(Collectors.toList());
+}
+
+cachedFileSize += 
ret.stream().mapToLong(BaseHoodieTableFileIndex::fileSliceSize).sum();
+return ret;
+  }
+
+  /**
+   * Get partition path with the given partition value
+   * @param partitionNames partition names
+   * @param values partition values
+   * @return partitions that match the given partition values
+   */
+  protected List getPartitionPaths(String[] partitionNames, 
String[] values) {
+if (partitionNames.length == 0 || partitionNames.length != values.length) {
+  LOG.info("The input partition names or value is empty, fallback to 
return all partition paths");
+  return getAllQueryPartitionPaths();
+}
+
+if (cachedAllPartitionPaths != null) {
+  LOG.info("All partition paths have already loaded, use it directly");
+  return cachedAllPartitionPaths;
+}
+
+boolean hiveStylePartitioning = 
Boolean.parseBoolean(metaClient.getTableConfig().getHiveStylePartitioningEnable());
+boolean urlEncodePartitioning = 
Boolean.parseBoolean(this.metaClient.getTableConfig().getUrlEncodePartitioning());
+Map partitionNameToIdx = IntStream.range(0, 
partitionNames.length)
+.mapToObj(i -> Pair.of(i, partitionNames[i]))
+.collect(Collectors.toMap(Pair::getValue, Pair::getKey));
+StringBuilder queryPartitionPath = new StringBuilder();
+int idx = 0;
+for (; idx < partitionNames.length; ++idx) {
+  String columnNames = this.partitionColumns[idx];
+  if (partitionNameToIdx.containsKey(columnNames)) {
+int k = partitionNameToIdx.get(columnNames);
+String value =  urlEncodePartitioning ? 
PartitionPathEncodeUtils.escapePathName(values[k]) : values[k];
+queryPartitionPath.append(hiveStylePartitioning ? columnNames + "=" : 
"").append(value).append("/");
+  } else {
+break;
+  }
+}
+queryPartitionPath.deleteCharAt(queryPartitionPath.length() - 1);
+// Return directly if all partition values are specified.
+if (idx == this.partitionColumns.length) {
+  return 

[GitHub] [hudi] wzx140 commented on a diff in pull request #6745: Fix comment in RFC46

2022-10-08 Thread GitBox


wzx140 commented on code in PR #6745:
URL: https://github.com/apache/hudi/pull/6745#discussion_r990605829


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -253,21 +253,21 @@ private Option prepareRecord(HoodieRecord hoodieRecord) {
   }
 
   private HoodieRecord populateMetadataFields(HoodieRecord hoodieRecord, Schema schema, Properties prop) throws IOException {
-Map metadataValues = new HashMap<>();
-String seqId =
-HoodieRecord.generateSequenceId(instantTime, getPartitionId(), RECORD_COUNTER.getAndIncrement());
+MetadataValues metadataValues = new MetadataValues();
 if (config.populateMetaFields()) {
-  metadataValues.put(HoodieRecord.HoodieMetadataField.FILENAME_METADATA_FIELD.getFieldName(), fileId);
-  metadataValues.put(HoodieRecord.HoodieMetadataField.PARTITION_PATH_METADATA_FIELD.getFieldName(), partitionPath);
-  metadataValues.put(HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName(), hoodieRecord.getRecordKey());
-  metadataValues.put(HoodieRecord.HoodieMetadataField.COMMIT_TIME_METADATA_FIELD.getFieldName(), instantTime);
-  metadataValues.put(HoodieRecord.HoodieMetadataField.COMMIT_SEQNO_METADATA_FIELD.getFieldName(), seqId);
+  String seqId =
+  HoodieRecord.generateSequenceId(instantTime, getPartitionId(), RECORD_COUNTER.getAndIncrement());
+  metadataValues.setFileName(fileId);
+  metadataValues.setPartitionPath(partitionPath);
+  metadataValues.setRecordKey(hoodieRecord.getRecordKey());
+  metadataValues.setCommitTime(instantTime);
+  metadataValues.setCommitSeqno(seqId);
 }
 if (config.allowOperationMetadataField()) {
-  metadataValues.put(HoodieRecord.HoodieMetadataField.OPERATION_METADATA_FIELD.getFieldName(), hoodieRecord.getOperation().getName());
+  metadataValues.setOperation(hoodieRecord.getOperation().getName());
 }
 
-return hoodieRecord.updateValues(schema, prop, metadataValues);
+return hoodieRecord.updateMetadataValues(schema, prop, metadataValues);

Review Comment:
   If `config.populateMetaFields` is false, then `metadataValues` is empty, and 
`hoodieRecord.updateMetadataValues` does nothing.
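
   The no-op behavior for an empty holder can be sketched as follows. This is a simplified stand-in rather than Hudi's `MetadataValues` or `HoodieRecord`: the class `MetadataValuesSketch`, its nested `Values` type, and the map-based record are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a holder that records only explicitly set meta fields,
// and an update that leaves the record untouched when nothing was set.
class MetadataValuesSketch {

  static final class Values {
    private final Map<String, String> fields = new LinkedHashMap<>();

    Values setCommitTime(String value) {
      fields.put("_hoodie_commit_time", value);
      return this;
    }

    Values setRecordKey(String value) {
      fields.put("_hoodie_record_key", value);
      return this;
    }

    boolean isEmpty() {
      return fields.isEmpty();
    }
  }

  // Returns the same record instance for an empty holder; otherwise returns
  // a copy with the set fields applied on top.
  static Map<String, String> updateMetadataValues(Map<String, String> record, Values values) {
    if (values.isEmpty()) {
      return record;
    }
    Map<String, String> updated = new LinkedHashMap<>(record);
    updated.putAll(values.fields);
    return updated;
  }
}
```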






[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


YuweiXiao commented on code in PR #6680:
URL: https://github.com/apache/hudi/pull/6680#discussion_r990605731


##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -138,7 +143,20 @@ public BaseHoodieTableFileIndex(HoodieEngineContext 
engineContext,
 this.engineContext = engineContext;
 this.fileStatusCache = fileStatusCache;
 
-doRefresh();
+/**
+ * The `shouldRefresh` variable controls how we initialize the 
TableFileIndex:

Review Comment:
   I removed `isAllInputFileSlicesCached` and now use the following logic to 
check whether all file slices are cached:
   
   ```
   if (cachedAllPartitionPaths == null) {
     return false;
   }
   return cachedAllPartitionPaths.stream().allMatch(p -> cachedAllInputFileSlices.containsKey(p));
   ```
   
   Basically, we first check whether all partitions are loaded. Then we check 
whether every partition is contained in `cachedAllInputFileSlices`. This should 
be cleaner than maintaining a separate flag variable.
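
   A self-contained version of this check, for readers who want to try it outside the PR, might look like the sketch below. The class `FileSliceCacheSketch` is hypothetical, and partitions and file slices are reduced to plain strings; only the two field names are taken from the snippet above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the file index; not the PR's actual class.
class FileSliceCacheSketch {
  // null means the partition listing has not been loaded yet.
  List<String> cachedAllPartitionPaths;
  final Map<String, List<String>> cachedAllInputFileSlices = new HashMap<>();

  // True only when the partition listing is loaded and every listed
  // partition already has an entry in the file slice cache.
  boolean areAllFileSlicesCached() {
    if (cachedAllPartitionPaths == null) {
      return false;
    }
    return cachedAllPartitionPaths.stream().allMatch(cachedAllInputFileSlices::containsKey);
  }

  public static void main(String[] args) {
    FileSliceCacheSketch index = new FileSliceCacheSketch();
    index.cachedAllPartitionPaths = Arrays.asList("2022/10/07", "2022/10/08");
    index.cachedAllInputFileSlices.put("2022/10/07", Collections.emptyList());
    System.out.println(index.areAllFileSlicesCached());
  }
}
```

   Deriving the answer from the two caches means there is no flag that can drift out of sync when a single partition is loaded lazily.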



##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -179,15 +197,125 @@ public void close() throws Exception {
   }
 
   protected List getAllQueryPartitionPaths() {
+if (cachedAllPartitionPaths != null) {
+  return cachedAllPartitionPaths;
+}
+
+loadAllQueryPartitionPaths();
+return cachedAllPartitionPaths;
+  }
+
+  private void loadAllQueryPartitionPaths() {
 List queryRelativePartitionPaths = queryPaths.stream()
 .map(path -> FSUtils.getRelativePartitionPath(basePath, path))
 .collect(Collectors.toList());
 
-// Load all the partition path from the basePath, and filter by the query 
partition path.
-// TODO load files from the queryRelativePartitionPaths directly.
-List matchedPartitionPaths = getAllPartitionPathsUnchecked()
-.stream()
-.filter(path -> 
queryRelativePartitionPaths.stream().anyMatch(path::startsWith))
+this.cachedAllPartitionPaths = 
listQueryPartitionPaths(queryRelativePartitionPaths);
+
+// If the partition value contains InternalRow.empty, we query it as a 
non-partitioned table.
+this.queryAsNonePartitionedTable = 
this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0);

Review Comment:
   Fixed.






[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272253598

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * f8732300afaf355296ca13fe7f2d3e9a131315d6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12063)
   * 18ef7b44488dff256728b2bba024b4a4d00aebe9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12064)
   
   



[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


YuweiXiao commented on code in PR #6680:
URL: https://github.com/apache/hudi/pull/6680#discussion_r990601721


##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -179,15 +197,125 @@ public void close() throws Exception {
   }
 
   protected List getAllQueryPartitionPaths() {
+if (cachedAllPartitionPaths != null) {
+  return cachedAllPartitionPaths;
+}
+
+loadAllQueryPartitionPaths();
+return cachedAllPartitionPaths;
+  }
+
+  private void loadAllQueryPartitionPaths() {
 List queryRelativePartitionPaths = queryPaths.stream()
 .map(path -> FSUtils.getRelativePartitionPath(basePath, path))
 .collect(Collectors.toList());
 
-// Load all the partition path from the basePath, and filter by the query partition path.
-// TODO load files from the queryRelativePartitionPaths directly.
-List matchedPartitionPaths = getAllPartitionPathsUnchecked()
-.stream()
-.filter(path -> queryRelativePartitionPaths.stream().anyMatch(path::startsWith))
+this.cachedAllPartitionPaths = listQueryPartitionPaths(queryRelativePartitionPaths);
+
+// If the partition value contains InternalRow.empty, we query it as a non-partitioned table.
+this.queryAsNonePartitionedTable = this.cachedAllPartitionPaths.stream().anyMatch(p -> p.values.length == 0);
+  }
+
+  protected Map> getAllInputFileSlices() {
+if (!isAllInputFileSlicesCached) {

Review Comment:
   Yeah, good point. I will 1) generalize this to a batch get, and 2) load only the remaining partitions.
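
   The two points above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `LazyFileSliceCache`, `batchLoader`, and the use of plain strings for partitions and file slices are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a lazily populated file index; not Hudi's actual class.
class LazyFileSliceCache {
  private final List<String> allPartitions;
  // Assumed loader: takes a batch of partition paths, returns their file slices.
  private final Function<List<String>, Map<String, List<String>>> batchLoader;
  private final Map<String, List<String>> cached = new HashMap<>();

  LazyFileSliceCache(List<String> allPartitions,
                     Function<List<String>, Map<String, List<String>>> batchLoader) {
    this.allPartitions = allPartitions;
    this.batchLoader = batchLoader;
  }

  // 1) Batch get: one loader call covers every requested partition not cached yet.
  Map<String, List<String>> getFileSlices(List<String> partitions) {
    List<String> missing = new ArrayList<>();
    for (String partition : partitions) {
      if (!cached.containsKey(partition) && !missing.contains(partition)) {
        missing.add(partition);
      }
    }
    if (!missing.isEmpty()) {
      Map<String, List<String>> loaded = batchLoader.apply(missing);
      for (String partition : missing) {
        // Cache empty results too, so empty partitions are not re-listed.
        cached.put(partition, loaded.getOrDefault(partition, Collections.emptyList()));
      }
    }
    Map<String, List<String>> result = new HashMap<>();
    for (String partition : partitions) {
      result.put(partition, cached.get(partition));
    }
    return result;
  }

  // 2) "Get all" reuses the batch get, so only the remaining partitions are loaded.
  Map<String, List<String>> getAllFileSlices() {
    return getFileSlices(allPartitions);
  }
}
```

   With this shape, a call for a single partition followed by a "get all" call lists each partition at most once.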






[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-08 Thread GitBox


YuweiXiao commented on code in PR #6680:
URL: https://github.com/apache/hudi/pull/6680#discussion_r990601573


##
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##
@@ -179,15 +197,125 @@ public void close() throws Exception {
   }
 
   protected List getAllQueryPartitionPaths() {
+if (cachedAllPartitionPaths != null) {
+  return cachedAllPartitionPaths;
+}
+
+loadAllQueryPartitionPaths();

Review Comment:
   Yes, you are right. I will have it inlined.






[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272242144

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * f8732300afaf355296ca13fe7f2d3e9a131315d6 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12063)
 
   * 18ef7b44488dff256728b2bba024b4a4d00aebe9 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272241019

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * 1d98224805b75fc0c9c8ec54948870e96c4b54e7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12043)
 
   * f8732300afaf355296ca13fe7f2d3e9a131315d6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12063)
 
   * 18ef7b44488dff256728b2bba024b4a4d00aebe9 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


hudi-bot commented on PR #6358:
URL: https://github.com/apache/hudi/pull/6358#issuecomment-1272240117

   
   ## CI report:
   
   * 288d166c49602a4593b1e97763a467811903737d UNKNOWN
   * 1d98224805b75fc0c9c8ec54948870e96c4b54e7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12043)
 
   * f8732300afaf355296ca13fe7f2d3e9a131315d6 UNKNOWN
   
   



[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-10-08 Thread GitBox


alexeykudinkin commented on code in PR #6358:
URL: https://github.com/apache/hudi/pull/6358#discussion_r990595189


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -185,7 +185,7 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty AVRO_SCHEMA_VALIDATE_ENABLE = ConfigProperty
   .key("hoodie.avro.schema.validate")
-  .defaultValue("false")
+  .defaultValue("true")

Review Comment:
   The default is flipped to `true` to make sure that proper schema validation is run for
every operation on the table.
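
   For writers that need the previous behavior, the flag can still be set explicitly. A minimal sketch, assuming the writer is configured through plain `java.util.Properties`; only the key `hoodie.avro.schema.validate` comes from the diff above, while the class and method names are hypothetical.

```java
import java.util.Properties;

// Illustrative helper only; not part of Hudi.
class SchemaValidationOptOut {
  // Key taken from the diff above.
  static final String AVRO_SCHEMA_VALIDATE_KEY = "hoodie.avro.schema.validate";

  // Copies the given properties and explicitly turns schema validation off.
  static Properties withValidationDisabled(Properties base) {
    Properties props = new Properties();
    props.putAll(base);
    props.setProperty(AVRO_SCHEMA_VALIDATE_KEY, "false");
    return props;
  }

  // Resolves the effective value, falling back to the new default of "true".
  static boolean isValidationEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(AVRO_SCHEMA_VALIDATE_KEY, "true"));
  }
}
```

   Properties that do not mention the key resolve to validation being enabled under the new default.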



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -81,20 +79,7 @@
   public static IgnoreRecord IGNORE_RECORD = new IgnoreRecord();
 
   /**
-   * The specified schema of the table. ("specified" denotes that this is configured by the client,
-   * as opposed to being implicitly fetched out of the commit metadata)
-   */
-  protected final Schema tableSchema;
-  protected final Schema tableSchemaWithMetaFields;

Review Comment:
   These fields were misused and are redundant, hence deleted



##
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaCompatibility.java:
##
@@ -0,0 +1,941 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+import org.apache.avro.AvroRuntimeException;
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Field;
+import org.apache.avro.Schema.Type;
+import org.apache.hudi.common.util.Either;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.util.ValidationUtils.checkState;
+
+/**
+ * Evaluate the compatibility between a reader schema and a writer schema. A
+ * reader and a writer schema are declared compatible if all datum instances of
+ * the writer schema can be successfully decoded using the specified reader
+ * schema.
+ *
+ * NOTE: PLEASE READ CAREFULLY BEFORE CHANGING
+ *
+ *   This code is borrowed from Avro 1.10, with the following modifications:
+ *   
+ * Compatibility checks ignore schema name, unless schema is held inside
+ * a union
+ *   
+ *
+ */
+public class AvroSchemaCompatibility {

Review Comment:
   Context: Avro always requires schema names to match for two schemas to be
considered compatible. Since only Avro carries names on the schemas themselves
(Spark, for example, does not), some schemas converted from Spark's
[[StructType]] end up incompatible w/ Avro.
   
   
   This code is mostly borrowed as is from Avro 1.10 w/ the following
critical adjustment: schema names are now only checked in the following 2 cases:
   
- In case it's a top-level schema
- In case the schema is enclosed in a union (in which case its name might be
used for reverse-lookup)
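
   The rule described above can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToySchema` and `ToyNameCheck` are hypothetical and are not Avro's `Schema` API, and children are matched by position, whereas real Avro resolution matches record fields by name and looks up union branches.

```java
import java.util.Arrays;
import java.util.List;

// Toy schema model for illustration only; this is not Avro's Schema class.
class ToySchema {
  enum Kind { RECORD, UNION }

  final Kind kind;
  final String name;               // null for unions
  final List<ToySchema> children;  // record field schemas, or union branches

  private ToySchema(Kind kind, String name, List<ToySchema> children) {
    this.kind = kind;
    this.name = name;
    this.children = children;
  }

  static ToySchema record(String name, ToySchema... fields) {
    return new ToySchema(Kind.RECORD, name, Arrays.asList(fields));
  }

  static ToySchema union(ToySchema... branches) {
    return new ToySchema(Kind.UNION, null, Arrays.asList(branches));
  }
}

class ToyNameCheck {
  // Names are compared for the top-level schema and for direct union branches only.
  static boolean namesCompatible(ToySchema reader, ToySchema writer) {
    return check(reader, writer, true);
  }

  private static boolean check(ToySchema reader, ToySchema writer, boolean checkName) {
    if (reader.kind != writer.kind || reader.children.size() != writer.children.size()) {
      return false;
    }
    if (checkName && reader.kind == ToySchema.Kind.RECORD && !reader.name.equals(writer.name)) {
      return false;
    }
    // Children of a union keep the name check; children of a record drop it.
    boolean checkChildNames = reader.kind == ToySchema.Kind.UNION;
    for (int i = 0; i < reader.children.size(); i++) {
      if (!check(reader.children.get(i), writer.children.get(i), checkChildNames)) {
        return false;
      }
    }
    return true;
  }
}
```

   In this model, a nested record whose name differs between reader and writer passes the check, while the same mismatch at the top level or directly inside a union fails it.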
   



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##
@@ -18,91 +18,47 @@
 
 package org.apache.hudi.table.action.commit;
 
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
 import org.apache.hudi.avro.HoodieAvroUtils;
 import org.apache.hudi.client.utils.MergingIterator;
-import org.apache.hudi.common.model.HoodieBaseFile;
-import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer;
-import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.io.HoodieMergeHandle;
 import org.apache.hudi.io.storage.HoodieFileReader;
 import org.apache.hudi.io.storage.HoodieFileReaderFactory;
 import org.apache.hudi.table.HoodieTable;
 
-import