[GitHub] [hudi] lanyuanxiaoyao commented on a diff in pull request #5677: [HUDI-4152] Flink offline compaction support compacting multi compaction plan at once

2022-07-05 Thread GitBox


lanyuanxiaoyao commented on code in PR #5677:
URL: https://github.com/apache/hudi/pull/5677#discussion_r914479792


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/strategy/InstantCompactionPlanSelectStrategy.java:
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.sink.compact.strategy;
+
+import java.util.Collections;
+import java.util.List;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.sink.compact.FlinkCompactionConfig;
+
+/**
+ * Specify the compaction plan instant to compact
+ */
+public class InstantCompactionPlanSelectStrategy implements CompactionPlanSelectStrategy {

Review Comment:
   Good idea. I will add it.
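
   The diff above is cut off before the class body, so the real implementation is not shown in this message. As a rough illustration of what "specify the compaction plan instant to compact" could mean, here is a minimal sketch that filters pending compaction instants against a comma-separated list. The class `InstantSelectSketch`, the method `select`, and the use of plain timestamp strings are all assumptions made for illustration; none of them are Hudi APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the class from the PR and not a Hudi API. */
final class InstantSelectSketch {

  /**
   * Returns the pending instants named in the comma-separated configured list,
   * keeping the order of the pending list.
   */
  static List<String> select(List<String> pendingInstants, String configuredInstants) {
    if (configuredInstants == null || configuredInstants.trim().isEmpty()) {
      return Collections.emptyList();
    }
    // Collect the requested instant timestamps, ignoring blanks and surrounding spaces.
    Set<String> wanted = new HashSet<>();
    for (String part : configuredInstants.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        wanted.add(trimmed);
      }
    }
    List<String> selected = new ArrayList<>();
    for (String instant : pendingInstants) {
      if (wanted.contains(instant)) {
        selected.add(instant);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<String> pending = new ArrayList<>();
    pending.add("20220705100000");
    pending.add("20220705110000");
    pending.add("20220705120000");
    System.out.println(select(pending, "20220705120000, 20220705110000"));
  }
}
```

   Instants that are requested but not pending are silently ignored in this sketch; a real strategy might prefer to raise an error for them.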



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] RoderickAdriance commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-07-05 Thread GitBox


RoderickAdriance commented on issue #5765:
URL: https://github.com/apache/hudi/issues/5765#issuecomment-1175839617

   @yihua When I use Hadoop 3 with Spark 2, this problem is resolved.
   So I think the problem is caused by an incompatibility between the Hudi jar and the Spark 3 package.
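
   A `NoSuchMethodError` generally means the class found at run time is not the one the caller was compiled against; the JVM matches methods by name and full descriptor, so a changed return type is enough to trigger it. One way to check this is to ask the running classpath which jar a class came from and what the method returns. The helper below is a minimal sketch for that purpose; `ClasspathProbe` and its method names are hypothetical and not part of Hudi, Hadoop, or Spark.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.Optional;

/** Hypothetical diagnostic helper; not part of Hudi, Hadoop, or Spark. */
final class ClasspathProbe {

  /**
   * Returns the return type of a public method with the given name, or empty if the
   * class or method cannot be found. For overloaded names any one match is returned.
   */
  static Optional<String> returnTypeOf(String className, String methodName) {
    try {
      // Load without initializing, so static initializers do not run.
      Class<?> clazz = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      for (Method method : clazz.getMethods()) {
        if (method.getName().equals(methodName)) {
          return Optional.of(method.getReturnType().getName());
        }
      }
      return Optional.empty();
    } catch (ClassNotFoundException | LinkageError e) {
      return Optional.empty();
    }
  }

  /** Returns the location (usually a jar URL) the class was loaded from, if known. */
  static Optional<String> loadedFrom(String className) {
    try {
      Class<?> clazz = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      CodeSource source = clazz.getProtectionDomain().getCodeSource();
      if (source == null || source.getLocation() == null) {
        return Optional.empty(); // e.g. JDK classes loaded by the bootstrap loader
      }
      return Optional.of(source.getLocation().toString());
    } catch (ClassNotFoundException | LinkageError e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    String cls = "org.apache.hadoop.hdfs.client.HdfsDataInputStream";
    System.out.println("return type: " + returnTypeOf(cls, "getReadStatistics").orElse("<not found>"));
    System.out.println("loaded from: " + loadedFrom(cls).orElse("<unknown>"));
  }
}
```

   Running this with the same classpath as the failing job shows which Hadoop jar actually supplies the class, which can then be compared with the Hadoop version the Hudi bundle was built against.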





[GitHub] [hudi] RoderickAdriance commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-07-05 Thread GitBox


RoderickAdriance commented on issue #5765:
URL: https://github.com/apache/hudi/issues/5765#issuecomment-1175836187

   @yihua When I use Hadoop 3 with Spark 2, this problem is resolved.
   So I think the HFile classes are not compatible with Spark 2.





[GitHub] [hudi] shqiprimbkodelabs opened a new issue, #6052: [SUPPORT] HoodieRollbackException when starting Flink Job on existing Hudi Table

2022-07-05 Thread GitBox


shqiprimbkodelabs opened a new issue, #6052:
URL: https://github.com/apache/hudi/issues/6052

   I am using Hudi with Flink, and when I submit the job it fails because the rollback fails.
   The same issue also happens if any of the checkpoints fails.
   `Caused by: org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'stream_write: HUDI_TIME_SERIES' (operator e8fc67ede24cef102d7d7a9334f93f11).
   at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:188)
   at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:103)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [initialize instant ] error
   ... 5 more
   Caused by: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback gs://bucketpath/table_ commits 20220705165415314
   at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:779)
   at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1189)
   at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1172)
   at org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1160)
   at org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommit$afea71c0$1(BaseHoodieWriteClient.java:932)
   at org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:151)
   at org.apache.hudi.client.BaseHoodieWriteClient.startCommit(BaseHoodieWriteClient.java:931)
   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.startInstant(StreamWriteOperatorCoordinator.java:375)
   at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$initInstant$6(StreamWriteOperatorCoordinator.java:403)
   at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:93)
   ... 3 more
   Caused by: java.lang.NullPointerException
   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:910)
   at org.apache.hudi.table.action.BaseActionExecutor.lambda$writeTableMetadata$2(BaseActionExecutor.java:77)
   at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
   at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:77)
   at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.finishRollback(BaseRollbackActionExecutor.java:255)
   at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:124)
   at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:145)
   at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.rollback(HoodieFlinkCopyOnWriteTable.java:333)
   at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:762)
   ... 12 more`





[jira] [Commented] (HUDI-4357) Support flink 1.15.x

2022-07-05 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562957#comment-17562957
 ] 

Danny Chen commented on HUDI-4357:
--

Fixed via master branch: 7eeaff9ee0ee12e93e6bd7a6e8fa5f15a2081a0b

> Support flink 1.15.x
> --------------------
>
> Key: HUDI-4357
> URL: https://issues.apache.org/jira/browse/HUDI-4357
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-4357) Support flink 1.15.x

2022-07-05 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-4357.
--







[hudi] branch master updated (b18c32379f -> 7eeaff9ee0)

2022-07-05 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from b18c32379f [HUDI-4219] Merge Into when update expression "col=s.col+2" 
on precombine cause exception (#5828)
 add 7eeaff9ee0 [HUDI-4357] Support flink 1.15.x (#6050)

No new revisions were added by this update.

Summary of changes:
 hudi-client/hudi-flink-client/pom.xml  | 12 ++--
 .../hudi/table/action/commit/FlinkMergeHelper.java |  3 +-
 hudi-examples/hudi-examples-flink/pom.xml  | 22 +++-
 .../quickstart/source/ContinuousFileSource.java|  5 +-
 hudi-flink-datasource/hudi-flink/pom.xml   | 22 
 .../org/apache/hudi/sink/StreamWriteFunction.java  |  1 +
 .../hudi/sink/append/AppendWriteFunction.java  |  1 +
 .../sink/common/AbstractStreamWriteFunction.java   | 14 +
 .../org/apache/hudi/sink/meta/CkpMetadata.java |  6 +-
 .../org/apache/hudi/table/HoodieTableSink.java |  4 +-
 .../org/apache/hudi/table/HoodieTableSource.java   |  4 +-
 .../table/format/cow/CopyOnWriteInputFormat.java   |  8 +--
 .../table/format/mor/MergeOnReadInputFormat.java   | 10 ++--
 .../org/apache/hudi/util/AvroSchemaConverter.java  |  3 +-
 .../java/org/apache/hudi/util/DataTypeUtils.java   | 51 +
 .../java/org/apache/hudi/util/HoodiePipeline.java  | 11 ++--
 .../apache/hudi/table/ITTestHoodieDataSource.java  |  2 +-
 .../hudi/utils/source/ContinuousFileSource.java|  5 +-
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 23 
 .../adapter/DataStreamScanProviderAdapter.java}|  6 +-
 .../adapter/DataStreamSinkProviderAdapter.java}|  6 +-
 .../main/java/org/apache/hudi/adapter/Utils.java   | 12 
 .../table/format/cow/ParquetSplitReaderUtil.java   |  0
 .../table/format/cow/vector/HeapArrayVector.java   |  0
 .../format/cow/vector/HeapMapColumnVector.java |  0
 .../format/cow/vector/HeapRowColumnVector.java |  0
 .../format/cow/vector/ParquetDecimalVector.java|  0
 .../cow/vector/reader/AbstractColumnReader.java|  0
 .../cow/vector/reader/ArrayColumnReader.java   |  0
 .../vector/reader/BaseVectorizedColumnReader.java  |  0
 .../vector/reader/FixedLenBytesColumnReader.java   |  0
 .../vector/reader/Int64TimestampColumnReader.java  |  0
 .../format/cow/vector/reader/MapColumnReader.java  |  0
 .../reader/ParquetColumnarRowSplitReader.java  |  0
 .../cow/vector/reader/ParquetDataColumnReader.java |  0
 .../reader/ParquetDataColumnReaderFactory.java |  0
 .../format/cow/vector/reader/RowColumnReader.java  |  0
 .../format/cow/vector/reader/RunLengthDecoder.java |  6 +-
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 29 ++
 .../adapter/DataStreamScanProviderAdapter.java}|  6 +-
 .../adapter/DataStreamSinkProviderAdapter.java}|  6 +-
 .../main/java/org/apache/hudi/adapter/Utils.java   | 12 
 .../table/format/cow/ParquetSplitReaderUtil.java   |  0
 .../table/format/cow/vector/HeapArrayVector.java   |  0
 .../format/cow/vector/HeapMapColumnVector.java |  0
 .../format/cow/vector/HeapRowColumnVector.java |  0
 .../format/cow/vector/ParquetDecimalVector.java|  0
 .../cow/vector/reader/AbstractColumnReader.java|  0
 .../cow/vector/reader/ArrayColumnReader.java   |  0
 .../vector/reader/BaseVectorizedColumnReader.java  |  0
 .../vector/reader/FixedLenBytesColumnReader.java   |  0
 .../vector/reader/Int64TimestampColumnReader.java  |  0
 .../format/cow/vector/reader/MapColumnReader.java  |  0
 .../reader/ParquetColumnarRowSplitReader.java  |  0
 .../cow/vector/reader/ParquetDataColumnReader.java |  0
 .../reader/ParquetDataColumnReaderFactory.java |  0
 .../format/cow/vector/reader/RowColumnReader.java  |  0
 .../format/cow/vector/reader/RunLengthDecoder.java |  6 +-
 .../{hudi-flink1.14.x => hudi-flink1.15.x}/pom.xml | 45 ---
 .../adapter/AbstractStreamOperatorAdapter.java |  0
 .../AbstractStreamOperatorFactoryAdapter.java  |  0
 .../adapter/DataStreamScanProviderAdapter.java}| 20 +++
 .../adapter/DataStreamSinkProviderAdapter.java}| 21 +++
 .../hudi/adapter/MailboxExecutorAdapter.java   |  0
 .../apache/hudi/adapter/RateLimiterAdapter.java|  0
 .../main/java/org/apache/hudi/adapter/Utils.java   | 14 +
 .../table/format/cow/ParquetSplitReaderUtil.java   | 28 +-
 .../table/format/cow/vector/HeapArrayVector.java   | 10 ++--
 .../format/cow/vector/HeapMapColumnVector.java | 10 ++--
 .../format/cow/vector/HeapRowColumnVector.java | 10 ++--
 .../format/cow/vector/ParquetDecimalVector.java|  6 +-
 .../cow/vector/reader/AbstractColumnReader.java|  4 +-
 .../cow/vector/reader/ArrayColumnReader.java   | 22 
 .../vector/reader/BaseVectorizedColumnReader.java  |  2 +-
 .../vector/reader/FixedLenBytesColumnReader.java   |  6 +-
 .../vector/reader/Int64TimestampColumnReader.java  |  4 +-
 .../format/cow/vector/

[GitHub] [hudi] danny0405 merged pull request #6050: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


danny0405 merged PR #6050:
URL: https://github.com/apache/hudi/pull/6050





[GitHub] [hudi] hudi-bot commented on pull request #6051: [HUDI-4366] Synchronous cleaning for flink bounded source

2022-07-05 Thread GitBox


hudi-bot commented on PR #6051:
URL: https://github.com/apache/hudi/pull/6051#issuecomment-1175784579

   
   ## CI report:
   
   * f825ff8c71e8912a6b656bbb51789d05e49871ce Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9741)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-05 Thread GitBox


hudi-bot commented on PR #5995:
URL: https://github.com/apache/hudi/pull/5995#issuecomment-1175784512

   
   ## CI report:
   
   * f06a5460500b4a12fa14b752de26b6eddc270ebe Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9610)
 
   * a9ab15037c87692466531b3afa38aea19e6646b2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9740)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #6051: [HUDI-4366] Synchronous cleaning for flink bounded source

2022-07-05 Thread GitBox


hudi-bot commented on PR #6051:
URL: https://github.com/apache/hudi/pull/6051#issuecomment-1175782571

   
   ## CI report:
   
   * f825ff8c71e8912a6b656bbb51789d05e49871ce UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-05 Thread GitBox


hudi-bot commented on PR #5995:
URL: https://github.com/apache/hudi/pull/5995#issuecomment-1175782509

   
   ## CI report:
   
   * f06a5460500b4a12fa14b752de26b6eddc270ebe Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9610)
 
   * a9ab15037c87692466531b3afa38aea19e6646b2 UNKNOWN
   
   
   





[GitHub] [hudi] yihua commented on issue #5979: [SUPPORT]the hudi's table of join can not handle delete operation.But simple table is ok.why?

2022-07-05 Thread GitBox


yihua commented on issue #5979:
URL: https://github.com/apache/hudi/issues/5979#issuecomment-1175781714

   cc @yuzhaojing 





[GitHub] [hudi] yihua commented on issue #5984: [SUPPORT] Error on GlobalSortPartitioner using 0.9.0

2022-07-05 Thread GitBox


yihua commented on issue #5984:
URL: https://github.com/apache/hudi/issues/5984#issuecomment-1175781226

   cc @minihippo 





[GitHub] [hudi] yihua commented on issue #5985: [SUPPORT] Hudi upsert fails with java.lang.ClassCastException: optional binary xx (STRING) is not a group

2022-07-05 Thread GitBox


yihua commented on issue #5985:
URL: https://github.com/apache/hudi/issues/5985#issuecomment-1175781069

   cc @minihippo 





[GitHub] [hudi] yihua commented on issue #5989: [SUPPORT] Schema Evolution Issue - New columns are not showing up in Spark-SQL.

2022-07-05 Thread GitBox


yihua commented on issue #5989:
URL: https://github.com/apache/hudi/issues/5989#issuecomment-1175780819

   cc @minihippo @xiarixiaoyao 





[GitHub] [hudi] yihua commented on issue #6007: spark query partition field error

2022-07-05 Thread GitBox


yihua commented on issue #6007:
URL: https://github.com/apache/hudi/issues/6007#issuecomment-1175779565

   cc @minihippo 





[GitHub] [hudi] yihua commented on issue #6011: [SUPPORT] HoodieFlinkCompactor failed

2022-07-05 Thread GitBox


yihua commented on issue #6011:
URL: https://github.com/apache/hudi/issues/6011#issuecomment-1175779210

   cc @yuzhaojing 





[GitHub] [hudi] yihua commented on issue #6014: [SUPPORT] High runtime for a batch in SparkWriteHelper stage

2022-07-05 Thread GitBox


yihua commented on issue #6014:
URL: https://github.com/apache/hudi/issues/6014#issuecomment-1175779034

   cc @minihippo @xiarixiaoyao 





[GitHub] [hudi] kumudkumartirupati commented on pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-05 Thread GitBox


kumudkumartirupati commented on PR #5995:
URL: https://github.com/apache/hudi/pull/5995#issuecomment-1175778645

   Conflicts resolved





[GitHub] [hudi] kumud-hs commented on pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-05 Thread GitBox


kumud-hs commented on PR #5995:
URL: https://github.com/apache/hudi/pull/5995#issuecomment-1175778248

   Conflicts resolved.





[jira] [Updated] (HUDI-4366) Synchronous cleaning for flink bounded source

2022-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4366:
-
Labels: pull-request-available  (was: )

> Synchronous cleaning for flink bounded source
> ---------------------------------------------
>
> Key: HUDI-4366
> URL: https://issues.apache.org/jira/browse/HUDI-4366
> Project: Apache Hudi
> Issue Type: New Feature
> Components: flink
> Reporter: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0






[GitHub] [hudi] danny0405 opened a new pull request, #6051: [HUDI-4366] Synchronous cleaning for flink bounded source

2022-07-05 Thread GitBox


danny0405 opened a new pull request, #6051:
URL: https://github.com/apache/hudi/pull/6051

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[jira] [Created] (HUDI-4366) Synchronous cleaning for flink bounded source

2022-07-05 Thread Danny Chen (Jira)
Danny Chen created HUDI-4366:


 Summary: Synchronous cleaning for flink bounded source
 Key: HUDI-4366
 URL: https://issues.apache.org/jira/browse/HUDI-4366
 Project: Apache Hudi
  Issue Type: New Feature
  Components: flink
Reporter: Danny Chen
 Fix For: 0.12.0








[GitHub] [hudi] danny0405 commented on pull request #6050: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


danny0405 commented on PR #6050:
URL: https://github.com/apache/hudi/pull/6050#issuecomment-1175759080

   The failed module `hudi-integ-test` is unrelated to this PR's change, and it has succeeded in the build history.





[GitHub] [hudi] hudi-bot commented on pull request #6050: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


hudi-bot commented on PR #6050:
URL: https://github.com/apache/hudi/pull/6050#issuecomment-1175747079

   
   ## CI report:
   
   * 64e3f11d32fc3dd5cb6bc8158913994e3b6a691f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9739)
 
   
   
   





[GitHub] [hudi] Aiden-Dong commented on pull request #5963: [HUDI-4300] Add sync clean and archive for compaction service in Spark Env

2022-07-05 Thread GitBox


Aiden-Dong commented on PR #5963:
URL: https://github.com/apache/hudi/pull/5963#issuecomment-1175746885

   @danny0405 Hello, can you take a moment to check this pr?





[GitHub] [hudi] Aiden-Dong commented on pull request #5945: [HUDI-4308] READ_OPTIMIZED read mode will temporary loss of data when compaction

2022-07-05 Thread GitBox


Aiden-Dong commented on PR #5945:
URL: https://github.com/apache/hudi/pull/5945#issuecomment-1175746309

   @danny0405 Hello, can you take a moment to check this pr?





[GitHub] [hudi] liufangqi commented on a diff in pull request #5997: [HUDI-4338] resolve the data skew when using flink datastream write hudi

2022-07-05 Thread GitBox


liufangqi commented on code in PR #5997:
URL: https://github.com/apache/hudi/pull/5997#discussion_r914375846


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -330,17 +330,23 @@ public static DataStream hoodieStreamWrite(Configuration conf, int defau
   .setParallelism(conf.getInteger(FlinkOptions.WRITE_TASKS));
 } else {
   WriteOperatorFactory operatorFactory = StreamWriteOperator.getFactory(conf);
-  return dataStream
-  // Key-by record key, to avoid multiple subtasks write to a bucket at the same time
-  .keyBy(HoodieRecord::getRecordKey)
-  .transform(
-  "bucket_assigner",
-  TypeInformation.of(HoodieRecord.class),
-  new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
-  .uid("uid_bucket_assigner_" + conf.getString(FlinkOptions.TABLE_NAME))
-  .setParallelism(conf.getOptional(FlinkOptions.BUCKET_ASSIGN_TASKS).orElse(defaultParallelism))
-  // shuffle by fileId(bucket id)
-  .keyBy(record -> record.getCurrentLocation().getFileId())
+
+  DataStream bucketDataStream = dataStream
+  // Key-by record key, to avoid multiple subtasks write to a bucket at the same time
+  .keyBy(HoodieRecord::getRecordKey)
+  .transform(
+  "bucket_assigner",
+  TypeInformation.of(HoodieRecord.class),
+  new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+  .uid("uid_bucket_assigner_" + conf.getString(FlinkOptions.TABLE_NAME))
+  .setParallelism(conf.getOptional(FlinkOptions.BUCKET_ASSIGN_TASKS).orElse(defaultParallelism));
+
+  bucketDataStream = conf.getOptional(FlinkOptions.BUCKET_ASSIGN_TASKS).orElse(defaultParallelism) ==
+  conf.getInteger(FlinkOptions.WRITE_TASKS) ? bucketDataStream : bucketDataStream
+  // shuffle by fileId(bucket id)
+  .keyBy(record -> record.getCurrentLocation().getFileId());

Review Comment:
   > But we should figure out a more random algorithm here instead of just fixing the case where the bucket assigner and the write task have the same parallelism. What about the case where the bucket assigner has parallelism 4 and the write task has parallelism 6?
   
   @danny0405 Yes, I agree. We do need to resolve the problem completely, and I will think about a better approach later.
   That said, this PR can reduce the network overhead and the data skew in some cases, so I think it should be treated as an improvement rather than a bug fix.
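
   To make the trade-off in this thread concrete, the sketch below models it without any Flink dependency. `needsFileIdShuffle` mirrors the rule in the diff (skip the second `keyBy` only when both parallelisms are equal), and `subtaskLoad` shows how unevenly a small set of file IDs can spread over write subtasks under a plain hash-modulo partitioner. The class name, both method names, and the hash-modulo partitioner are assumptions for illustration; Flink's actual key-group assignment differs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Flink-free model of the shuffle decision; not code from the PR. */
final class ShuffleDecisionSketch {

  /** Under the PR's rule, the shuffle by file ID is kept unless both parallelisms match. */
  static boolean needsFileIdShuffle(int bucketAssignParallelism, int writeParallelism) {
    return bucketAssignParallelism != writeParallelism;
  }

  /** Counts how many file IDs land on each subtask with a simple hash-modulo partitioner. */
  static int[] subtaskLoad(List<String> fileIds, int parallelism) {
    int[] load = new int[parallelism];
    for (String fileId : fileIds) {
      // floorMod keeps the index non-negative even for negative hash codes.
      load[Math.floorMod(fileId.hashCode(), parallelism)]++;
    }
    return load;
  }

  public static void main(String[] args) {
    List<String> fileIds = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      fileIds.add("file-" + i);
    }
    System.out.println("4 -> 4 needs shuffle: " + needsFileIdShuffle(4, 4));
    System.out.println("4 -> 6 needs shuffle: " + needsFileIdShuffle(4, 6));
    System.out.println("load over 6 write tasks: " + Arrays.toString(subtaskLoad(fileIds, 6)));
  }
}
```

   With few file IDs relative to the number of write tasks, some subtasks can end up idle while others receive several buckets, which is the skew the reviewer's 4-versus-6 case would still leave unaddressed.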






[GitHub] [hudi] xiarixiaoyao commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-05 Thread GitBox


xiarixiaoyao commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1175730234

   @leesf I have addressed the comments and fixed the UT. Could you please review again? Thanks.





[GitHub] [hudi] hudi-bot commented on pull request #6050: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


hudi-bot commented on PR #6050:
URL: https://github.com/apache/hudi/pull/6050#issuecomment-1175725791

   
   ## CI report:
   
   * 64e3f11d32fc3dd5cb6bc8158913994e3b6a691f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 opened a new pull request, #6050: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


danny0405 opened a new pull request, #6050:
URL: https://github.com/apache/hudi/pull/6050

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[GitHub] [hudi] danny0405 closed pull request #6036: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


danny0405 closed pull request #6036: [HUDI-4357] Support flink 1.15.x
URL: https://github.com/apache/hudi/pull/6036





[GitHub] [hudi] hudi-bot commented on pull request #6036: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


hudi-bot commented on PR #6036:
URL: https://github.com/apache/hudi/pull/6036#issuecomment-1175719724

   
   ## CI report:
   
   * ca25b34f04a0bf02daf2c2198fe3db5ede544129 UNKNOWN
   * 64e3f11d32fc3dd5cb6bc8158913994e3b6a691f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9728)
 
   
   



[GitHub] [hudi] danny0405 commented on a diff in pull request #5997: [HUDI-4338] resolve the data skew when using flink datastream write hudi

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5997:
URL: https://github.com/apache/hudi/pull/5997#discussion_r914365747


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -330,17 +330,23 @@ public static DataStream hoodieStreamWrite(Configuration conf, int defau
   .setParallelism(conf.getInteger(FlinkOptions.WRITE_TASKS));
 } else {
   WriteOperatorFactory operatorFactory = StreamWriteOperator.getFactory(conf);
-  return dataStream
-  // Key-by record key, to avoid multiple subtasks write to a bucket at the same time
-  .keyBy(HoodieRecord::getRecordKey)
-  .transform(
-  "bucket_assigner",
-  TypeInformation.of(HoodieRecord.class),
-  new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
-  .uid("uid_bucket_assigner_" + conf.getString(FlinkOptions.TABLE_NAME))
-  .setParallelism(conf.getOptional(FlinkOptions.BUCKET_ASSIGN_TASKS).orElse(defaultParallelism))
-  // shuffle by fileId(bucket id)
-  .keyBy(record -> record.getCurrentLocation().getFileId())
+
+  DataStream bucketDataStream = dataStream
+  // Key-by record key, to avoid multiple subtasks write to a bucket at the same time
+  .keyBy(HoodieRecord::getRecordKey)
+  .transform(
+  "bucket_assigner",
+  TypeInformation.of(HoodieRecord.class),
+  new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+  .uid("uid_bucket_assigner_" + conf.getString(FlinkOptions.TABLE_NAME))
+  .setParallelism(conf.getOptional(FlinkOptions.BUCKET_ASSIGN_TASKS).orElse(defaultParallelism));
+
+  bucketDataStream = conf.getOptional(FlinkOptions.BUCKET_ASSIGN_TASKS).orElse(defaultParallelism) ==
+  conf.getInteger(FlinkOptions.WRITE_TASKS) ? bucketDataStream : bucketDataStream
+  // shuffle by fileId(bucket id)
+  .keyBy(record -> record.getCurrentLocation().getFileId());

Review Comment:
   But we should figure out a more random algorithm here, instead of fixing only 
the case where the bucket assigner and the write task have the same parallelism. 
What about a bucket assigner with parallelism 4 and a write task with parallelism 6?
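
To illustrate the concern about unequal parallelism, here is a minimal sketch in plain Java with no Flink dependency. The class `FileIdShuffleSketch` and its methods are hypothetical, and the modulo hash is only a simplified stand-in for Flink's key-group assignment, not the actual implementation. It shows that when only a few distinct file IDs are active, a hash shuffle by file ID cannot keep every write subtask busy.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FileIdShuffleSketch {

  // Simplified stand-in for a hash shuffle (assumption: Flink's real
  // key-group assignment differs in detail but is also hash based).
  static int targetSubtask(String fileId, int writeParallelism) {
    return Math.floorMod(fileId.hashCode(), writeParallelism);
  }

  // Counts write subtasks that receive no file ID at all.
  static int idleSubtasks(List<String> fileIds, int writeParallelism) {
    Set<Integer> used = new HashSet<>();
    for (String fileId : fileIds) {
      used.add(targetSubtask(fileId, writeParallelism));
    }
    return writeParallelism - used.size();
  }
}
```

With 4 distinct file IDs and 6 write subtasks, at least 2 subtasks receive no data regardless of the hash function, and hash collisions can leave more of them idle.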






[GitHub] [hudi] hudi-bot commented on pull request #6036: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


hudi-bot commented on PR #6036:
URL: https://github.com/apache/hudi/pull/6036#issuecomment-1175717646

   
   ## CI report:
   
   * ca25b34f04a0bf02daf2c2198fe3db5ede544129 UNKNOWN
   * 64e3f11d32fc3dd5cb6bc8158913994e3b6a691f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9728)
 
   
   



[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914361984


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadTableState.java:
##
@@ -82,6 +85,10 @@ public int getOperationPos() {
 return operationPos;
   }
 
+  public String getMergeClass() {
+return mergeClass;
+  }

Review Comment:
   No need to pass the string in this POJO, IMO.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914361663


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##
@@ -649,7 +656,8 @@ static class MergeIterator implements RecordIterator {
 int[] requiredPos,
 boolean emitDelete,
 int operationPos,
-ParquetColumnarRowSplitReader reader) { // the reader should be with full schema
+ParquetColumnarRowSplitReader reader, // the reader should be with full schema
+String mergeClass) {
   this.tableSchema = tableSchema;

Review Comment:
   Do we need to pass `mergeClass` around explicitly? You can fetch it from the 
`flinkConf`, right?






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914360633


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -283,6 +284,13 @@ private FlinkOptions() {
   .withDescription("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting.\n"
   + "This will render any value set for the option in-effective");
 
+  public static final ConfigOption MERGE_CLASS_NAME = ConfigOptions
+  .key("write.merge.class")
+  .stringType()

Review Comment:
   It should be named `merge.class` if the class is also used on the read path, 
for example for the MOR table logs.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914360207


##
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieRecordUtils.java:
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.model.HoodieMerge;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.exception.HoodieException;
+
+import java.lang.reflect.InvocationTargetException;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * A utility class for HoodieRecord.
+ */
+public class HoodieRecordUtils {
+
+  private static final Map INSTANCE_CACHE = new HashMap<>();
+
+  /**
+   * Instantiate a given class with a record merge.
+   */
+  public static HoodieMerge loadMerge(String mergeClass) {
+try {
+  HoodieMerge merge = (HoodieMerge) INSTANCE_CACHE.get(mergeClass);

Review Comment:
   Why introduce another utility class for reflection? We already have 
`ReflectionUtil`.
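
As a rough illustration of why a single generic reflection helper can serve every caller, here is a minimal sketch. The class `ReflectiveLoader` and its `load` method are hypothetical and are not the API of Hudi's existing reflection utility. The sketch instantiates a class by name through its no-argument constructor and caches the instance in a thread-safe map, so no merge-specific loader is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ReflectiveLoader {

  // One cached instance per class name; safe for concurrent callers.
  private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

  // Loads (or reuses) an instance of className and checks it against expectedType.
  static <T> T load(String className, Class<T> expectedType) {
    Object instance = CACHE.computeIfAbsent(className, name -> {
      try {
        return Class.forName(name).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Unable to instantiate " + name, e);
      }
    });
    // Throws ClassCastException if the class does not implement the expected type.
    return expectedType.cast(instance);
  }
}
```

A caller would then write something like `ReflectiveLoader.load(mergeClassName, SomeMergeInterface.class)`, where `SomeMergeInterface` stands for whatever interface the configured class must implement.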






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914359578


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##
@@ -160,6 +162,7 @@ protected AbstractHoodieLogRecordReader(FileSystem fs, 
String basePath, List

[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914359119


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##
@@ -154,6 +155,12 @@ public class HoodieTableConfig extends HoodieConfig {
   .withDocumentation("Payload class to use for performing compactions, i.e merge delta logs with current base file and then "
   + " produce a new base file.");
 
+  public static final ConfigProperty MERGE_CLASS_NAME = ConfigProperty
+  .key("hoodie.compaction.merge.class")
+  .defaultValue(HoodieAvroRecordMerge.class.getName())
+  .withDocumentation("Merge class provide stateless component interface for merging records, and support various HoodieRecord "

Review Comment:
   I'm -1 on a separate merge class that is used only for merging. Can we elaborate 
a little more on just reusing the merge class for the write path?






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914358079


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##
@@ -169,15 +179,17 @@ public HoodieOperation getOperation() {
 return operation;
   }
 
+  public Comparable getOrderingValue() {
+return orderingVal;
+  }
+
   public T getData() {
 if (data == null) {
-  throw new IllegalStateException("Payload already deflated for record.");
+  throw new IllegalStateException("HoodieRecord already deflated for record.");
 }

Review Comment:
   Why this change? Please revert it.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914357550


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##
@@ -131,25 +131,35 @@ public String getFieldName() {
*/
   private HoodieOperation operation;
 
+  /**
+   * For purposes of preCombining.
+   */
+  private Comparable orderingVal;

Review Comment:
   > eventTime
   
   We cannot be sure that the field here is of a timestamp type. Even if it is, 
the time may be a processing time from the source, so `orderingVal` is the more 
general name: it only means that the field is used for payload ordering.
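
The point can be sketched as follows, assuming a hypothetical minimal record type `Rec` (not Hudi's `HoodieRecord`). Ordering relies only on `Comparable`, so the value may be a timestamp, a sequence number, or a string, and nothing in the comparison assumes event time.

```java
final class OrderingSketch {

  // Hypothetical record: a key plus an ordering value of any Comparable type.
  static final class Rec {
    final String key;
    final Comparable<?> orderingVal;

    Rec(String key, Comparable<?> orderingVal) {
      this.key = key;
      this.orderingVal = orderingVal;
    }
  }

  // Picks the record with the greater ordering value; ties go to the newer record.
  // Assumes both records carry ordering values of the same type.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Rec pickLatest(Rec older, Rec newer) {
    int cmp = ((Comparable) newer.orderingVal).compareTo(older.orderingVal);
    return cmp >= 0 ? newer : older;
  }
}
```

For example, with integer sequence numbers 5 (older) and 3 (newer), the older record wins, which a "latest arrival wins" rule would not give.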






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914355923


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMerge.java:
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Properties;
+
+/**
+ * HoodieMerge defines how to merge two records. It is a stateless component.
+ * It can implement the merging logic of HoodieRecord of different engines
+ * and avoid the performance consumption caused by the serialization/deserialization of Avro payload.
+ */
+public interface HoodieMerge extends Serializable {

Review Comment:
   I would suggest a name like `HoodiePayloadMerger`, and I am definitely -1 on 
`HoodieMerge`: the name is confusing, and "merge" is a verb.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914355454


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerge.java:
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.metadata.HoodieMetadataPayload;
+
+import java.io.IOException;
+import java.util.Properties;
+
+import static org.apache.hudi.TypeUtils.unsafeCast;
+
+public class HoodieAvroRecordMerge implements HoodieMerge {
+  @Override
+  public HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer) {
+HoodieRecordPayload picked = unsafeCast(((HoodieAvroRecord) newer).getData().preCombine(((HoodieAvroRecord) older).getData()));
+if (picked instanceof HoodieMetadataPayload) {
+  // NOTE: HoodieMetadataPayload return a new payload
+  return new HoodieAvroRecord(newer.getKey(), picked, newer.getOperation());
+}
+return picked.equals(((HoodieAvroRecord) newer).getData()) ? newer : older;
+  }
+
+  @Override
+  public Option combineAndGetUpdateValue(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+Option previousRecordAvroPayload;
+if (older instanceof HoodieAvroIndexedRecord) {
+  previousRecordAvroPayload = Option.ofNullable(((HoodieAvroIndexedRecord) older).getData());

Review Comment:
   Can we avoid the `instanceof` check here (`older instanceof HoodieAvroIndexedRecord`)?
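
One common way to drop such a check is to move the branch into the record types themselves. The sketch below is only an illustration under assumed names: `Rec`, `DirectRec`, `EncodedRec`, and `previousPayload` are hypothetical and do not correspond to Hudi classes. Each record type decides how to expose its payload, so the caller dispatches polymorphically.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class NoInstanceofSketch {

  // Hypothetical abstraction: every record knows how to expose its payload.
  interface Rec {
    Optional<String> payload();
  }

  // Stand-in for a record that already holds its data directly (may be null).
  static final class DirectRec implements Rec {
    private final String data;

    DirectRec(String data) {
      this.data = data;
    }

    @Override
    public Optional<String> payload() {
      return Optional.ofNullable(data);
    }
  }

  // Stand-in for a record whose data must be decoded first.
  static final class EncodedRec implements Rec {
    private final byte[] bytes;

    EncodedRec(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public Optional<String> payload() {
      return Optional.of(new String(bytes, StandardCharsets.UTF_8));
    }
  }

  // The caller needs no instanceof check; dispatch happens on the record type.
  static Optional<String> previousPayload(Rec older) {
    return older.payload();
  }
}
```

Whether this fits the PR depends on whether the record hierarchy can reasonably carry such a method; the sketch only shows the shape of the alternative.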






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914352505


##
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java:
##
@@ -44,13 +50,6 @@
 import org.apache.hudi.io.storage.HoodieOrcConfig;
 import org.apache.hudi.io.storage.HoodieParquetConfig;
 import org.apache.hudi.metadata.HoodieTableMetadataWriter;
-
-import org.apache.avro.Schema;
-import org.apache.avro.generic.GenericRecord;
-import org.apache.avro.generic.IndexedRecord;

Review Comment:
   This is an unnecessary change; please revert it.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914352001


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##
@@ -103,6 +106,7 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String
 this.taskContextSupplier = taskContextSupplier;
 this.writeToken = makeWriteToken();
 schemaOnReadEnabled = !isNullOrEmpty(hoodieTable.getConfig().getInternalSchema());
+this.merge = HoodieRecordUtils.loadMerge(config.getMergeClass());

Review Comment:
   Merge is a verb, I would suggest `HoodiePayloadMerger` for better 
readability.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


danny0405 commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914350928


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -123,6 +124,12 @@ public class HoodieWriteConfig extends HoodieConfig {
   .withDocumentation("Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. "
   + "This will render any value set for PRECOMBINE_FIELD_OPT_VAL in-effective");
 
+  public static final ConfigProperty MERGE_CLASS_NAME = ConfigProperty

Review Comment:
   I would suggest `hoodie.merge.class` if the option is also used on the 
read code path.






[GitHub] [hudi] danny0405 commented on pull request #6036: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


danny0405 commented on PR #6036:
URL: https://github.com/apache/hudi/pull/6036#issuecomment-1175694155

   @hudi-bot run azure





[hudi] branch master updated (3670e82af5 -> b18c32379f)

2022-07-05 Thread mengtao
This is an automated email from the ASF dual-hosted git repository.

mengtao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 3670e82af5 [HUDI-4356] Fix the error when sync hive in CTAS (#6029)
 add b18c32379f [HUDI-4219] Merge Into when update expression "col=s.col+2" 
on precombine cause exception (#5828)

No new revisions were added by this update.

Summary of changes:
 .../hudi/command/MergeIntoHoodieTableCommand.scala |  40 -
 .../apache/spark/sql/hudi/TestMergeIntoTable.scala | 181 +
 2 files changed, 215 insertions(+), 6 deletions(-)



[GitHub] [hudi] xiarixiaoyao merged pull request #5828: [HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception

2022-07-05 Thread GitBox


xiarixiaoyao merged PR #5828:
URL: https://github.com/apache/hudi/pull/5828





[GitHub] [hudi] hudi-bot commented on pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


hudi-bot commented on PR #6049:
URL: https://github.com/apache/hudi/pull/6049#issuecomment-1175632658

   
   ## CI report:
   
   * 48dc9f122fb2e119ed8784cdd1bd8a8412e633bf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9737)
 
   
   



[GitHub] [hudi] xushiyan commented on pull request #5995: [HUDI-4335] Bug fixes in AWSGlueCatalogSyncClient post schema evolution.

2022-07-05 Thread GitBox


xushiyan commented on PR #5995:
URL: https://github.com/apache/hudi/pull/5995#issuecomment-1175615060

   @kumudkumartirupati Thanks for the fix. Could you please resolve the conflict?





[GitHub] [hudi] hudi-bot commented on pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


hudi-bot commented on PR #6049:
URL: https://github.com/apache/hudi/pull/6049#issuecomment-1175599033

   
   ## CI report:
   
   * 1c9a917927cf0339a0271e04d474e91ac89254c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9736)
 
   * 48dc9f122fb2e119ed8784cdd1bd8a8412e633bf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9737)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


hudi-bot commented on PR #6049:
URL: https://github.com/apache/hudi/pull/6049#issuecomment-1175573690

   
   ## CI report:
   
   * 1c9a917927cf0339a0271e04d474e91ac89254c8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9736)
 
   * 48dc9f122fb2e119ed8784cdd1bd8a8412e633bf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9737)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


hudi-bot commented on PR #6049:
URL: https://github.com/apache/hudi/pull/6049#issuecomment-1175570125

   
   ## CI report:
   
   * 1c9a917927cf0339a0271e04d474e91ac89254c8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9736)
 
   * 48dc9f122fb2e119ed8784cdd1bd8a8412e633bf UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


hudi-bot commented on PR #6049:
URL: https://github.com/apache/hudi/pull/6049#issuecomment-1175515606

   
   ## CI report:
   
   * 1c9a917927cf0339a0271e04d474e91ac89254c8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9736)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


hudi-bot commented on PR #6049:
URL: https://github.com/apache/hudi/pull/6049#issuecomment-1175512074

   
   ## CI report:
   
   * 1c9a917927cf0339a0271e04d474e91ac89254c8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-4365) Bulk Insert not URL encoding Partition Path properly

2022-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4365:
-
Labels: pull-request-available  (was: )

> Bulk Insert not URL encoding Partition Path properly
> 
>
> Key: HUDI-4365
> URL: https://issues.apache.org/jira/browse/HUDI-4365
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: Screen Shot 2022-07-05 at 1.07.19 PM.png
>
>
> Currently, when using partition paths that contain slashes, Hudi lays out 
> the partitioned table incorrectly (see below):
> !Screen Shot 2022-07-05 at 1.07.19 PM.png|width=623,height=206!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] alexeykudinkin opened a new pull request, #6049: [HUDI-4365] Fixing URL-encoding in Bulk Insert row-writing path

2022-07-05 Thread GitBox


alexeykudinkin opened a new pull request, #6049:
URL: https://github.com/apache/hudi/pull/6049

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Currently, when doing a bulk insert with partition paths that contain 
slashes, the table is laid out incorrectly: the partition path is not 
URL-encoded, even though URL-encoding is set to true.
   
   This fix is purely a duct-tape measure until the issue is properly addressed by HUDI-3993.
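
To make the failure mode concrete, here is a minimal sketch (not the code in this PR) of why a partition value containing slashes needs encoding before it becomes part of a path. The class and method names are hypothetical, and the JDK's `URLEncoder` is used only as a stand-in: Hudi's own escaping routine may produce different output.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: class and method names are hypothetical, not Hudi's API.
final class PartitionPathEncodingSketch {

  // URL-encodes a single partition value so that '/' cannot be mistaken
  // for a directory separator when the value becomes part of a path.
  static String encodePartitionValue(String value) {
    try {
      return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on the JVM, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String raw = "2022/07/05";
    // Unencoded: the value would split into three nested directories.
    System.out.println("unencoded: table/" + raw);
    // Encoded: the value stays a single directory name.
    System.out.println("encoded:   table/" + encodePartitionValue(raw));
  }
}
```

Without encoding, the single partition value is indistinguishable from three nested directory levels, which matches the incorrect layout described above.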
   
   ## Brief change log
   
   See above
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[jira] [Updated] (HUDI-4365) Bulk Insert not URL encoding Partition Path properly

2022-07-05 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-4365:
--
Description: 
Currently, when using partition paths that contain slashes, Hudi lays out 
the partitioned table incorrectly (see below):
!Screen Shot 2022-07-05 at 1.07.19 PM.png|width=623,height=206!

> Bulk Insert not URL encoding Partition Path properly
> 
>
> Key: HUDI-4365
> URL: https://issues.apache.org/jira/browse/HUDI-4365
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.12.0
>
> Attachments: Screen Shot 2022-07-05 at 1.07.19 PM.png
>
>
> Currently, when using partition paths that contain slashes, Hudi lays out 
> the partitioned table incorrectly (see below):
> !Screen Shot 2022-07-05 at 1.07.19 PM.png|width=623,height=206!





[jira] [Updated] (HUDI-4365) Bulk Insert not URL encoding Partition Path properly

2022-07-05 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-4365:
--
Attachment: Screen Shot 2022-07-05 at 1.07.19 PM.png

> Bulk Insert not URL encoding Partition Path properly
> 
>
> Key: HUDI-4365
> URL: https://issues.apache.org/jira/browse/HUDI-4365
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.12.0
>
> Attachments: Screen Shot 2022-07-05 at 1.07.19 PM.png
>
>






[jira] [Created] (HUDI-4365) Bulk Insert not URL encoding Partition Path properly

2022-07-05 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-4365:
-

 Summary: Bulk Insert not URL encoding Partition Path properly
 Key: HUDI-4365
 URL: https://issues.apache.org/jira/browse/HUDI-4365
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
 Fix For: 0.12.0








[jira] [Updated] (HUDI-4346) Fix the params of bulkInsertAsRow not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED

2022-07-05 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4346:

Status: In Progress  (was: Open)

> Fix the params of bulkInsertAsRow not update 
> BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED
> --
>
> Key: HUDI-4346
> URL: https://issues.apache.org/jira/browse/HUDI-4346
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: Ethan Guo
>Assignee: Hui An
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> https://github.com/apache/hudi/pull/5999





[jira] [Closed] (HUDI-4346) Fix the params of bulkInsertAsRow not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED

2022-07-05 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-4346.
---
Resolution: Fixed

> Fix the params of bulkInsertAsRow not update 
> BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED
> --
>
> Key: HUDI-4346
> URL: https://issues.apache.org/jira/browse/HUDI-4346
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: Ethan Guo
>Assignee: Hui An
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> https://github.com/apache/hudi/pull/5999





[jira] [Updated] (HUDI-4346) Fix the params of bulkInsertAsRow not update BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED

2022-07-05 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4346:

Status: Patch Available  (was: In Progress)

> Fix the params of bulkInsertAsRow not update 
> BULKINSERT_ARE_PARTITIONER_RECORDS_SORTED
> --
>
> Key: HUDI-4346
> URL: https://issues.apache.org/jira/browse/HUDI-4346
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: Ethan Guo
>Assignee: Hui An
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> https://github.com/apache/hudi/pull/5999





[jira] [Closed] (HUDI-4360) Fix HoodieDropPartitionsTool based on refactored meta sync

2022-07-05 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-4360.
---
Resolution: Fixed

> Fix HoodieDropPartitionsTool based on refactored meta sync
> --
>
> Key: HUDI-4360
> URL: https://issues.apache.org/jira/browse/HUDI-4360
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/pull/4459] causes master to fail due to 
> refactoring of the meta sync.





[jira] [Updated] (HUDI-4360) Fix HoodieDropPartitionsTool based on refactored meta sync

2022-07-05 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4360:

Status: Patch Available  (was: In Progress)

> Fix HoodieDropPartitionsTool based on refactored meta sync
> --
>
> Key: HUDI-4360
> URL: https://issues.apache.org/jira/browse/HUDI-4360
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/pull/4459] causes master to fail due to 
> refactoring of the meta sync.





[jira] [Updated] (HUDI-4360) Fix HoodieDropPartitionsTool based on refactored meta sync

2022-07-05 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4360:

Status: In Progress  (was: Open)

> Fix HoodieDropPartitionsTool based on refactored meta sync
> --
>
> Key: HUDI-4360
> URL: https://issues.apache.org/jira/browse/HUDI-4360
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/pull/4459] causes master to fail due to 
> refactoring of the meta sync.





[GitHub] [hudi] alexeykudinkin commented on pull request #5828: [HUDI-4219] Merge Into when update expression "col=s.col+2" on precombine cause exception

2022-07-05 Thread GitBox


alexeykudinkin commented on PR #5828:
URL: https://github.com/apache/hudi/pull/5828#issuecomment-1175364275

   LGTM





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


alexeykudinkin commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914081837


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMerge.java:
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Properties;
+
+/**
+ * HoodieMerge defines how to merge two records. It is a stateless component.
+ * It can implement the merging logic of HoodieRecord of different engines
+ * and avoid the performance consumption caused by the serialization/deserialization of Avro payload.
+ */
+public interface HoodieMerge extends Serializable {
+  
+  HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer);

Review Comment:
   @wulei0302 I think we need to make sure that we wrap up RFC-46 with the 
following artifacts in hand:
   
- New APIs (forward-looking, one unified merging API)
- Legacy APIs (backward-compatible, with `preCombine` and 
`combineAndGetUpdateValue`, necessary to facilitate the migration onto the new 
API)
   
   That way, we can encourage folks to migrate from the existing model, where 
the Legacy API provides compatibility out of the box, onto the new API, which 
we can plan to declare the standard while deprecating the legacy one in 0.13.
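
One way to picture the two artifacts side by side is the sketch below. Every type in it is hypothetical (`Rec`, `LegacyMerge`, `UnifiedMerge`, `LatestWins`, `fromLegacy`); only the hook names `preCombine` and `combineAndGetUpdateValue` come from the comment, and their signatures here are simplified assumptions rather than the RFC-46 design.

```java
import java.util.Optional;

// Illustrative only: every type here is hypothetical, not Hudi's actual API.
final class MergeApiSketch {

  // Minimal stand-in for a record: key, ordering value, payload.
  static final class Rec {
    final String key;
    final long orderingVal;
    final String value;

    Rec(String key, long orderingVal, String value) {
      this.key = key;
      this.orderingVal = orderingVal;
      this.value = value;
    }
  }

  // Legacy-style API with the two hooks named in the comment above.
  interface LegacyMerge {
    Rec preCombine(Rec older, Rec newer);

    Optional<Rec> combineAndGetUpdateValue(Rec stored, Rec incoming);
  }

  // Forward-looking API: a single merge entry point.
  interface UnifiedMerge {
    Optional<Rec> merge(Rec older, Rec newer);
  }

  // A legacy implementation: the record with the higher ordering value wins.
  static final class LatestWins implements LegacyMerge {
    @Override
    public Rec preCombine(Rec older, Rec newer) {
      return newer.orderingVal >= older.orderingVal ? newer : older;
    }

    @Override
    public Optional<Rec> combineAndGetUpdateValue(Rec stored, Rec incoming) {
      return Optional.of(preCombine(stored, incoming));
    }
  }

  // Adapter: lets existing legacy logic be called through the unified API.
  static UnifiedMerge fromLegacy(LegacyMerge legacy) {
    return legacy::combineAndGetUpdateValue;
  }

  public static void main(String[] args) {
    UnifiedMerge merge = fromLegacy(new LatestWins());
    Rec older = new Rec("k1", 1L, "old");
    Rec newer = new Rec("k1", 2L, "new");
    System.out.println(merge.merge(older, newer).get().value);
  }
}
```

An adapter of this kind is one possible way to keep existing merge logic working while callers move to the single-method API; whether the real migration path looks like this is for the RFC to decide.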






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5627: [HUDI-3350][HUDI-3351] Support HoodieMerge API and Spark engine-specific HoodieRecord

2022-07-05 Thread GitBox


alexeykudinkin commented on code in PR #5627:
URL: https://github.com/apache/hudi/pull/5627#discussion_r914081837


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMerge.java:
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Properties;
+
+/**
+ * HoodieMerge defines how to merge two records. It is a stateless component.
+ * It can implement the merging logic of HoodieRecord of different engines
+ * and avoid the performance consumption caused by the serialization/deserialization of Avro payload.
+ */
+public interface HoodieMerge extends Serializable {
+  
+  HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer);

Review Comment:
   @wulei0302 I think we need to make sure that we wrap up RFC-46 with the 
following artifacts in hand:
   
- New APIs (forward-looking, one unified merging API)
- Legacy APIs (backward-compatible, with `preCombine` and 
`combineAndGetUpdateValue`, necessary to facilitate the migration onto the new 
API)
   
   That way, we can encourage folks to migrate from the existing model, where 
the Legacy API provides compatibility out of the box, onto the new API, which 
we can call the standard while deprecating the legacy one in 0.13.






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5664: [HUDI-4140] Fixing hive style partitioning and default partition with bulk insert row writer with SimpleKeyGen and virtual key

2022-07-05 Thread GitBox


alexeykudinkin commented on code in PR #5664:
URL: https://github.com/apache/hudi/pull/5664#discussion_r914070359


##
hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java:
##
@@ -109,6 +112,51 @@ public void testDataInternalWriter(boolean sorted, boolean populateMetaFields) t
 }
   }
 
+  @Test
+  public void testDataInternalWriterHiveStylePartitioning() throws Exception {
+boolean sorted = true;
+boolean populateMetaFields = false;
+// init config and table
+HoodieWriteConfig cfg = getWriteConfig(populateMetaFields, "true");
+HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
+for (int i = 0; i < 1; i++) {
+  String instantTime = "00" + i;
+  // init writer
+  HoodieBulkInsertDataInternalWriter writer = new HoodieBulkInsertDataInternalWriter(table, cfg, instantTime, RANDOM.nextInt(10), RANDOM.nextLong(), RANDOM.nextLong(),

Review Comment:
   Is this RANDOM a fixed-seed one?
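
For context on why the seed matters in a test, here is a small sketch of the property in question: two `java.util.Random` instances created with the same seed yield the same sequence, so a failing run can be reproduced. The `draw` helper and the seed value are hypothetical; how `RANDOM` is actually declared in the test class is not visible in this diff.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: shows fixed-seed reproducibility, not the test's RANDOM field.
final class FixedSeedRandomSketch {

  // Draws `count` values in [0, 10) from a generator created with `seed`.
  static int[] draw(long seed, int count) {
    Random random = new Random(seed);
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
      values[i] = random.nextInt(10);
    }
    return values;
  }

  public static void main(String[] args) {
    // Same seed, same sequence: the two arrays compare equal.
    System.out.println(Arrays.equals(draw(42L, 5), draw(42L, 5)));
  }
}
```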






[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-05 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1175294900

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * 26f78b05748846a5724c2153d52a695cba641759 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9732)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Commented] (HUDI-2749) Improve the streaming read for hudi

2022-07-05 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562738#comment-17562738
 ] 

Raymond Xu commented on HUDI-2749:
--

[~danny0405] Can you clean up the issues under this Epic to align with the main 
goal?

> Improve the streaming read for hudi
> ---
>
> Key: HUDI-2749
> URL: https://issues.apache.org/jira/browse/HUDI-2749
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Alexey Kudinkin
>Priority: Blocker
>
> Hudi has been widely used as streaming storage by our Flink users. 
> Generally speaking, its streaming-friendly features have become a 
> killer differentiator. 
> With this umbrella issue, I propose to improve the integration with streaming 
> engines in both semantics and performance.





[hudi] branch master updated (8570c3aab4 -> 3670e82af5)

2022-07-05 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 8570c3aab4 [HUDI-4359] Support show_fs_path_detail command on Call 
Produce Command (#6042)
 add 3670e82af5 [HUDI-4356] Fix the error when sync hive in CTAS (#6029)

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hudi/command/CreateHoodieTableAsSelectCommand.scala | 1 +
 1 file changed, 1 insertion(+)



[GitHub] [hudi] XuQianJin-Stars merged pull request #6029: [HUDI-4356] Fix the error when sync hive in CTAS

2022-07-05 Thread GitBox


XuQianJin-Stars merged PR #6029:
URL: https://github.com/apache/hudi/pull/6029





[jira] [Updated] (HUDI-2749) Improve the streaming read for hudi

2022-07-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2749:
-
Fix Version/s: (was: 0.12.0)

> Improve the streaming read for hudi
> ---
>
> Key: HUDI-2749
> URL: https://issues.apache.org/jira/browse/HUDI-2749
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Alexey Kudinkin
>Priority: Blocker
>
> Hudi has been widely used as streaming storage by our Flink users. 
> Generally speaking, its streaming-friendly features have become a 
> killer differentiator. 
> With this umbrella issue, I propose to improve the integration with streaming 
> engines in both semantics and performance.





[hudi] branch master updated (23c9c5c296 -> 8570c3aab4)

2022-07-05 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 23c9c5c296 [HUDI-3836] Improve the way of fetching metadata partitions 
from table (#5286)
 add 8570c3aab4 [HUDI-4359] Support show_fs_path_detail command on Call 
Produce Command (#6042)

No new revisions were added by this update.

Summary of changes:
 .../hudi/command/procedures/HoodieProcedures.scala |   1 +
 .../procedures/ShowFsPathDetailProcedure.scala | 112 +
 ...e.scala => TestShowFsPathDetailProcedure.scala} |  12 ++-
 3 files changed, 120 insertions(+), 5 deletions(-)
 create mode 100644 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFsPathDetailProcedure.scala
 copy 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/{TestExportInstantsProcedure.scala
 => TestShowFsPathDetailProcedure.scala} (76%)



[GitHub] [hudi] XuQianJin-Stars merged pull request #6042: [HUDI-4359] Support show_fs_path_detail command on Call Produce Command

2022-07-05 Thread GitBox


XuQianJin-Stars merged PR #6042:
URL: https://github.com/apache/hudi/pull/6042





[GitHub] [hudi] hudi-bot commented on pull request #6025: [HUDI-4351] Improve HoodieFlinkCompactor

2022-07-05 Thread GitBox


hudi-bot commented on PR #6025:
URL: https://github.com/apache/hudi/pull/6025#issuecomment-1175219792

   
   ## CI report:
   
   * c71c6e8ea76e26fc528daa2ed5f05a6fc99dfe7f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9731)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] fengjian428 commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-07-05 Thread GitBox


fengjian428 commented on issue #6038:
URL: https://github.com/apache/hudi/issues/6038#issuecomment-1175214688

   See https://hudi.apache.org/community/get-involved and just click the join 
group link.





[GitHub] [hudi] noahtaite commented on issue #6048: [SUPPORT] S3 throttling while loading a table written with "hoodie.metadata.enable" = true

2022-07-05 Thread GitBox


noahtaite commented on issue #6048:
URL: https://github.com/apache/hudi/issues/6048#issuecomment-1175210980

   ![MicrosoftTeams-image 
(6)](https://user-images.githubusercontent.com/24283126/177366433-d21d3d16-31ed-41c8-93cf-aff1ae37b687.png)
   
   You can see the effect is a spike here after inserting 500 more data sources 
(for a total of 1500) and then calling load() to do schema validation for the 
last 500.





[GitHub] [hudi] tommss commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-07-05 Thread GitBox


tommss commented on issue #6038:
URL: https://github.com/apache/hudi/issues/6038#issuecomment-1175202995

   I have sent a request to be added to the Slack group 
(https://github.com/apache/hudi/issues/143).
   Can you add me there?





[GitHub] [hudi] tommss commented on issue #143: Tracking ticket for folks to be added to slack group

2022-07-05 Thread GitBox


tommss commented on issue #143:
URL: https://github.com/apache/hudi/issues/143#issuecomment-1175201665

   Hi, please add me to the Slack group.
   Email: sheisher...@gmail.com
   Thanks!





[GitHub] [hudi] fengjian428 commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-07-05 Thread GitBox


fengjian428 commented on issue #6038:
URL: https://github.com/apache/hudi/issues/6038#issuecomment-1175192155

   Why don't you create a DataFrame on top of the RDD and then save it to Hudi?
   By the way, have you joined Hudi's Slack? We can discuss this there.





[GitHub] [hudi] fengjian428 commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-07-05 Thread GitBox


fengjian428 commented on issue #6038:
URL: https://github.com/apache/hudi/issues/6038#issuecomment-1175179593

   HoodieJavaWriteClient uses HoodieJavaMergeOnReadTable to handle MOR tables, 
and HoodieJavaMergeOnReadTable does nothing but inherit its functions from 
HoodieJavaCopyOnWriteTable. 
   So if you cannot switch to SparkWriteClient or Flink, we need to implement 
HoodieJavaMergeOnReadTable's functions.





[GitHub] [hudi] tommss commented on issue #6038: [SUPPORT] MOR taking more time than COW using HoodieJavaWriteClient

2022-07-05 Thread GitBox


tommss commented on issue #6038:
URL: https://github.com/apache/hudi/issues/6038#issuecomment-1175174196

   - I changed the index to Bloom to see if it makes any difference, but it does 
not.
   - What do you mean when you say HoodieJavaMergeOnReadTable is unfinished?
   - Below is what we are trying to achieve in the cluster, and the reason for 
using the Hudi Java client.
   
   
![image](https://user-images.githubusercontent.com/3656499/177359536-299c0c3f-bc68-4159-8b8e-197986d12139.png)
   
   





[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-05 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1175163203

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * d98c0e31e8d401014fe7338207e45879e5828f99 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9726)
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * 26f78b05748846a5724c2153d52a695cba641759 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9732)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-05 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1175158251

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * d98c0e31e8d401014fe7338207e45879e5828f99 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9726)
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   * 26f78b05748846a5724c2153d52a695cba641759 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-07-05 Thread GitBox


hudi-bot commented on PR #6046:
URL: https://github.com/apache/hudi/pull/6046#issuecomment-1175153416

   
   ## CI report:
   
   * 58cf2096e648ccc8c7e7c563003753ce89a90261 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9730)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields

2022-07-05 Thread GitBox


hudi-bot commented on PR #6017:
URL: https://github.com/apache/hudi/pull/6017#issuecomment-1175153262

   
   ## CI report:
   
   * 572b3bd83c499348795f380004520f880506cf86 UNKNOWN
   * d98c0e31e8d401014fe7338207e45879e5828f99 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9726)
   * 65d15683ec3b8084330a6df7e121ca4218b83b2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch master updated (fbda4ad5bd -> 23c9c5c296)

2022-07-05 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


    from fbda4ad5bd [HUDI-4360] Fix HoodieDropPartitionsTool based on refactored meta sync (#6043)
     add 23c9c5c296 [HUDI-3836] Improve the way of fetching metadata partitions from table (#5286)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/index/bloom/HoodieBloomIndex.java  |  3 +-
 .../org/apache/hudi/io/HoodieKeyLookupHandle.java  |  3 +-
 .../metadata/HoodieBackedTableMetadataWriter.java  |  7 ++---
 .../java/org/apache/hudi/table/HoodieTable.java    |  5 ++--
 .../table/action/index/RunIndexActionExecutor.java |  5 ++--
 .../index/bloom/SparkHoodieBloomIndexHelper.java   |  3 +-
 .../functional/TestHoodieBackedMetadata.java       | 27 +-
 .../hudi/client/functional/TestHoodieIndex.java    |  3 +-
 .../hudi/common/table/HoodieTableConfig.java       | 10 +++
 .../hudi/metadata/HoodieTableMetadataUtil.java     |  6 +---
 .../scala/org/apache/hudi/HoodieFileIndex.scala    |  2 +-
 .../org/apache/hudi/utilities/HoodieIndexer.java   |  5 ++--
 .../apache/hudi/utilities/TestHoodieIndexer.java   | 33 +++---
 13 files changed, 49 insertions(+), 63 deletions(-)



[GitHub] [hudi] yihua merged pull request #5286: [HUDI-3836] Improve the way of fetching metadata partitions from table

2022-07-05 Thread GitBox


yihua merged PR #5286:
URL: https://github.com/apache/hudi/pull/5286





[GitHub] [hudi] xiarixiaoyao commented on pull request #6042: [HUDI-4359] Support show_fs_path_detail command on Call Produce Command

2022-07-05 Thread GitBox


xiarixiaoyao commented on PR #6042:
URL: https://github.com/apache/hudi/pull/6042#issuecomment-1175145943

   nice work





[GitHub] [hudi] hudi-bot commented on pull request #6036: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


hudi-bot commented on PR #6036:
URL: https://github.com/apache/hudi/pull/6036#issuecomment-1175083192

   
   ## CI report:
   
   * ca25b34f04a0bf02daf2c2198fe3db5ede544129 UNKNOWN
   * 64e3f11d32fc3dd5cb6bc8158913994e3b6a691f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9728)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6036: [HUDI-4357] Support flink 1.15.x

2022-07-05 Thread GitBox


hudi-bot commented on PR #6036:
URL: https://github.com/apache/hudi/pull/6036#issuecomment-1175077768

   
   ## CI report:
   
   * ca25b34f04a0bf02daf2c2198fe3db5ede544129 UNKNOWN
   * 64e3f11d32fc3dd5cb6bc8158913994e3b6a691f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9728)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6028: [HUDI-4355] Bulk insert As Row: Should also repartition records if populateMetaFields is false

2022-07-05 Thread GitBox


hudi-bot commented on PR #6028:
URL: https://github.com/apache/hudi/pull/6028#issuecomment-1175077675

   
   ## CI report:
   
   * ba6cd2a43d6a4f4e69594d434c34a859419fcff7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9729)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] codope commented on pull request #5608: [HUDI-2150] Rename/Restructure configs for better modularity

2022-07-05 Thread GitBox


codope commented on PR #5608:
URL: https://github.com/apache/hudi/pull/5608#issuecomment-1175069366

   @liujinhui1994 I'll pick up the review this week. A couple of high-level 
questions:
   1. Are there any changes to default behaviour?
   2. For the renames, is backward compatibility handled? If not, you can 
explore the `ConfigProperty#withAlternatives` API. Let's add some 
compatibility UTs if they are not already there.
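
As a rough illustration of the backward-compatibility idea in question 2, the sketch below resolves a renamed config key by falling back to its older names. It is self-contained and does not use Hudi's `ConfigProperty`. The class `RenamedConfigSketch`, its `resolve` method, and both key names are assumptions made up for this example, and the lookup order (new key, then old keys, then default) is one plausible choice rather than a statement of how Hudi behaves.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; this is NOT Hudi's ConfigProperty.
final class RenamedConfigSketch {
    private final String key;                 // current (renamed) key
    private final String defaultValue;        // returned when no key is set
    private final List<String> alternatives;  // older keys that are still honoured

    RenamedConfigSketch(String key, String defaultValue, String... alternatives) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.alternatives = Arrays.asList(alternatives);
    }

    // Prefer the new key, then each old key in order, then the default.
    String resolve(Map<String, String> props) {
        if (props.containsKey(key)) {
            return props.get(key);
        }
        for (String alternative : alternatives) {
            if (props.containsKey(alternative)) {
                return props.get(alternative);
            }
        }
        return defaultValue;
    }
}

class RenamedConfigDemo {
    public static void main(String[] args) {
        // Both key names below are made up for this example.
        RenamedConfigSketch config =
            new RenamedConfigSketch("hoodie.example.new.key", "10", "hoodie.example.old.key");

        Map<String, String> legacyProps = new HashMap<>();
        legacyProps.put("hoodie.example.old.key", "42");

        // A job that still sets only the old key keeps its configured value.
        System.out.println(config.resolve(legacyProps));
        // With nothing set, the default applies.
        System.out.println(config.resolve(new HashMap<>()));
    }
}
```

A compatibility unit test along these lines would set only the old key and assert that the resolved value matches it, then set both keys and assert that the new key takes precedence.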




