[jira] [Updated] (HUDI-5656) Metadata Bootstrap flow resulting in NPE

2023-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5656:
-
Labels: pull-request-available  (was: )

> Metadata Bootstrap flow resulting in NPE
> 
>
> Key: HUDI-5656
> URL: https://issues.apache.org/jira/browse/HUDI-5656
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Affects Versions: 0.13.0
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> After adding a simple statement forcing the test to read the whole 
> bootstrapped table:
> {code:java}
> sqlContext.sql("select * from bootstrapped").show(); {code}
>  
> The following NPE has been observed on master 
> (testBulkInsertsAndUpsertsWithBootstrap):
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 183.0 failed 1 times, most recent failure: Lost task 0.0 in stage 183.0 
> (TID 971, localhost, executor driver): java.lang.NullPointerException
>     at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_1$(Unknown
>  Source)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>     at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:448)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:256)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:836)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:836)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)Driver stacktrace:    at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1889)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1877)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1876)
>     at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
>     at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:926)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:926)
>     at scala.Option.foreach(Option.scala:257)
>     at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:365)
>     at 
> 

[GitHub] [hudi] alexeykudinkin opened a new pull request, #7804: [HUDI-5656] Fixing NPE while reading `HoodieBootstrapRelation`

2023-01-30 Thread via GitHub


alexeykudinkin opened a new pull request, #7804:
URL: https://github.com/apache/hudi/pull/7804

   ### Change Logs
   
   Currently `HoodieBootstrapRelation` handles partitioned tables improperly, 
resulting in an NPE while trying to read a bootstrapped table. To address that, 
`HoodieBootstrapRelation` has been rebased onto `HoodieBaseRelation`, which 
provides the common semantics shared across all of Hudi's file-based Relation 
implementations (schema handling, file listing, etc.).
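
   For reference, a minimal, hedged sketch of the read path that currently 
hits the NPE (illustrative only; it assumes a local `SparkSession` and an 
already metadata-bootstrapped table at `basePath`):

   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   public class BootstrapReadRepro {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder()
           .appName("bootstrap-read-repro")
           .master("local[2]")
           .getOrCreate();

       // Assumption: basePath points at a table created via metadata bootstrap.
       String basePath = "file:///tmp/hudi/bootstrapped";

       // Register the bootstrapped table and force a full read, mirroring the
       // statement added to testBulkInsertsAndUpsertsWithBootstrap.
       Dataset<Row> df = spark.read().format("hudi").load(basePath);
       df.createOrReplaceTempView("bootstrapped");
       spark.sql("select * from bootstrapped").show();

       spark.stop();
     }
   }
   ```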
   
   ### Impact
   
   Addresses the NPE in the current implementation of `HoodieBootstrapRelation`.
   
   ### Risk level (write none, low medium or high below)
   
   Medium
   
   ### Documentation Update
   
   TBA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] SteNicholas commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


SteNicholas commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409895036

   @voonhous, I had a voice call with @YannByron. This PR is just a temporary 
check for deleting partitions. The final, more reasonable implementation should 
put the check inside the compaction or clustering table service. In other 
words, a table service should not affect the execution of DDL such as deleting 
partitions. However, temporarily adding the check in this PR is still 
beneficial. For example, user A creates a Flink task that writes Hudi table 1 
and enables asynchronous clustering. When user B then wants to delete a 
partition, the check can detect that the partition has a corresponding pending 
table service and at least temporarily refuse user B's delete, avoiding any 
impact on user A's Flink task. To sum up, you can merge this first and create a 
ticket to optimize the check in the table service.
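
   A rough sketch of the kind of guard being discussed (purely illustrative; 
the types and helper names below are hypothetical, not Hudi APIs):

   ```java
   import java.util.List;

   // Illustrative only: reject a DROP PARTITION while any pending
   // compaction/clustering plan still touches the partition.
   final class DropPartitionGuard {

     // Hypothetical view of a pending table-service plan.
     interface PendingPlan {
       List<String> affectedPartitions();
       String instantTime();
     }

     static void validateDrop(String partition, List<PendingPlan> pendingPlans) {
       for (PendingPlan plan : pendingPlans) {
         if (plan.affectedPartitions().contains(partition)) {
           throw new IllegalStateException(
               "Partition " + partition + " has a pending table service at instant "
                   + plan.instantTime() + "; retry after it completes.");
         }
       }
     }
   }
   ```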





[hudi] branch master updated (1377143656a -> 252c4033010)

2023-01-30 Thread mengtao
This is an automated email from the ASF dual-hosted git repository.

mengtao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 1377143656a [HUDI-5487] Reduce duplicate logs in ExternalSpillableMap 
(#7579)
 add 252c4033010 [MINOR] Standardise schema concepts on Flink Engine (#7761)

No new revisions were added by this update.

Summary of changes:
 .../internal/schema/utils/InternalSchemaUtils.java |  4 +-
 .../hudi/table/format/InternalSchemaManager.java   | 57 +-
 .../apache/hudi/table/format/RecordIterators.java  |  8 +--
 3 files changed, 41 insertions(+), 28 deletions(-)



[GitHub] [hudi] xiarixiaoyao merged pull request #7761: [MINOR] Standardise schema concepts on Flink Engine

2023-01-30 Thread via GitHub


xiarixiaoyao merged PR #7761:
URL: https://github.com/apache/hudi/pull/7761





[GitHub] [hudi] xiarixiaoyao commented on pull request #7761: [MINOR] Standardise schema concepts on Flink Engine

2023-01-30 Thread via GitHub


xiarixiaoyao commented on PR #7761:
URL: https://github.com/apache/hudi/pull/7761#issuecomment-1409863546

   The UT failure has nothing to do with this PR.
   





[GitHub] [hudi] duc-dn commented on issue #7791: [SUPPORT] Don't see metadata folder in .hoodie folder when ingesting data use hudi kafka connector

2023-01-30 Thread via GitHub


duc-dn commented on issue #7791:
URL: https://github.com/apache/hudi/issues/7791#issuecomment-1409860750

   @danny0405 
   This is my hoodie.properties file
   
![image](https://user-images.githubusercontent.com/64005590/215689022-dcd4398d-c314-43a1-847c-a0416463246c.png)
   





[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091522679


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/entity/AssistQueryEntity.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager.entity;
+
+import org.apache.hudi.table.service.manager.common.ServiceConfig;
+import org.apache.hudi.table.service.manager.util.DateTimeUtils;
+
+import lombok.Getter;
+
+import java.util.Date;

Review Comment:
   Ok.






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091520884


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/executor/submitter/ExecutionEngine.java:
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager.executor.submitter;
+
+import 
org.apache.hudi.table.service.manager.common.HoodieTableServiceManagerConfig;
+import org.apache.hudi.table.service.manager.entity.Instance;
+import 
org.apache.hudi.table.service.manager.exception.HoodieTableServiceManagerException;
+import org.apache.hudi.table.service.manager.store.impl.InstanceService;
+
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.util.Map;
+
+public abstract class ExecutionEngine {

Review Comment:
   Sure, will add doc.






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091520637


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/service/BaseService.java:
##
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager.service;
+
+public interface BaseService {
+
+  void init();
+
+  void startService();

Review Comment:
   Ok.






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091520367


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/store/impl/InstanceService.java:
##
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager.store.impl;
+
+import org.apache.hudi.table.service.manager.common.ServiceContext;
+import org.apache.hudi.table.service.manager.entity.AssistQueryEntity;
+import org.apache.hudi.table.service.manager.entity.Instance;
+import org.apache.hudi.table.service.manager.entity.InstanceStatus;
+
+import org.apache.hudi.table.service.manager.store.jdbc.JdbcMapper;
+
+import org.apache.ibatis.session.RowBounds;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+public class InstanceService {
+
+  private static Logger LOG = LogManager.getLogger(InstanceService.class);
+
+  private JdbcMapper jdbcMapper = ServiceContext.getJdbcMapper();
+
+  private static final String NAMESPACE = "Instance";
+
+  public void createInstance() {
+try {
+  jdbcMapper.updateObject(statement(NAMESPACE, "createInstance"), null);
+} catch (Exception e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  public void saveInstance(Instance instance) {
+try {
+  jdbcMapper.saveObject(statement(NAMESPACE, "saveInstance"), instance);
+} catch (Exception e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  public void updateStatus(Instance instance) {
+try {
+  int ret = jdbcMapper.updateObject(statement(NAMESPACE, 
getUpdateStatusSqlId(instance)), instance);
+  if (ret != 1) {
+LOG.error("Fail update status instance: " + instance);
+throw new RuntimeException("Fail update status instance: " + 
instance.getIdentifier());
+  }
+  LOG.info("Success update status instance: " + instance.getIdentifier());
+} catch (Exception e) {
+  LOG.error("Fail update status, instance: " + instance.getIdentifier() + 
", errMsg: ", e);
+  throw new RuntimeException(e);
+}
+  }
+
+  public void updateExecutionInfo(Instance instance) {
+int retryNum = 0;
+try {
+  while (retryNum++ < 3) {

Review Comment:
   Will update it.






[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091520137


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/RequestHandler.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager;
+
+import org.apache.hudi.client.HoodieTableServiceManagerClient;
+import org.apache.hudi.table.service.manager.entity.Action;
+import org.apache.hudi.table.service.manager.entity.Engine;
+import org.apache.hudi.table.service.manager.entity.Instance;
+import org.apache.hudi.table.service.manager.entity.InstanceStatus;
+import org.apache.hudi.table.service.manager.handlers.ActionHandler;
+import org.apache.hudi.table.service.manager.store.MetadataStore;
+import org.apache.hudi.table.service.manager.util.InstanceUtil;
+
+import io.javalin.Context;
+import io.javalin.Handler;
+import io.javalin.Javalin;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+import org.jetbrains.annotations.NotNull;
+
+import java.util.Locale;
+
+/**
+ * Main REST Handler class that handles and delegates calls to timeline 
relevant handlers.
+ */
+public class RequestHandler {
+
+  private static final Logger LOG = LogManager.getLogger(RequestHandler.class);
+
+  private final Javalin app;
+  private final ActionHandler actionHandler;
+
+  public RequestHandler(Javalin app,
+Configuration conf,
+MetadataStore metadataStore) {
+this.app = app;
+this.actionHandler = new ActionHandler(conf, metadataStore);
+  }
+
+  public void register() {
+registerCompactionAPI();
+registerClusteringAPI();
+registerCleanAPI();
+  }
+
+  /**
+   * Register Compaction API calls.
+   */
+  private void registerCompactionAPI() {
+app.get(HoodieTableServiceManagerClient.EXECUTE_COMPACTION, new 
ViewHandler(ctx -> {
+  for (String instant : 
ctx.validatedQueryParam(HoodieTableServiceManagerClient.INSTANT_PARAM).getOrThrow().split(","))
 {
+Instance instance = Instance.builder()
+
.basePath(ctx.validatedQueryParam(HoodieTableServiceManagerClient.BASEPATH_PARAM).getOrThrow())
+
.dbName(ctx.validatedQueryParam(HoodieTableServiceManagerClient.DATABASE_NAME_PARAM).getOrThrow())
+
.tableName(ctx.validatedQueryParam(HoodieTableServiceManagerClient.TABLE_NAME_PARAM).getOrThrow())
+.action(Action.COMPACTION.getValue())
+.instant(instant)
+
.executionEngine(Engine.valueOf(ctx.validatedQueryParam(HoodieTableServiceManagerClient.EXECUTION_ENGINE).getOrThrow().toUpperCase(Locale.ROOT)))
+
.userName(ctx.validatedQueryParam(HoodieTableServiceManagerClient.USERNAME).getOrThrow())
+
.queue(ctx.validatedQueryParam(HoodieTableServiceManagerClient.QUEUE).getOrThrow())
+
.resource(ctx.validatedQueryParam(HoodieTableServiceManagerClient.RESOURCE).getOrThrow())
+
.parallelism(ctx.validatedQueryParam(HoodieTableServiceManagerClient.PARALLELISM).getOrThrow())
+.status(InstanceStatus.SCHEDULED.getStatus())
+.build();
+InstanceUtil.checkArgument(instance);
+actionHandler.scheduleCompaction(instance);
+  }
+}));
+  }
+
+  /**
+   * Register Clustering API calls.
+   */
+  private void registerClusteringAPI() {
+app.get(HoodieTableServiceManagerClient.EXECUTE_CLUSTERING, new 
ViewHandler(ctx -> {
+  Instance instance = Instance.builder()
+  
.basePath(ctx.validatedQueryParam(HoodieTableServiceManagerClient.BASEPATH_PARAM).getOrThrow())
+  
.dbName(ctx.validatedQueryParam(HoodieTableServiceManagerClient.DATABASE_NAME_PARAM).getOrThrow())
+  
.tableName(ctx.validatedQueryParam(HoodieTableServiceManagerClient.TABLE_NAME_PARAM).getOrThrow())
+  .action(Action.CLUSTERING.getValue())
+  
.instant(ctx.validatedQueryParam(HoodieTableServiceManagerClient.INSTANT_PARAM).getOrThrow())
+  

[GitHub] [hudi] hudi-bot commented on pull request #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7803:
URL: https://github.com/apache/hudi/pull/7803#issuecomment-1409856892

   
   ## CI report:
   
   * 410d0d504acaa6ff46ee85bb3dddb46cf5fb18fb UNKNOWN
   * 53807f6493b7056be1afdd7e78353a354514f845 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091519698


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/store/jdbc/SqlSessionFactoryUtil.java:
##
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager.store.jdbc;
+
+import 
org.apache.hudi.table.service.manager.exception.HoodieTableServiceManagerException;
+
+import org.apache.ibatis.io.Resources;
+import org.apache.ibatis.session.SqlSession;
+import org.apache.ibatis.session.SqlSessionFactory;
+import org.apache.ibatis.session.SqlSessionFactoryBuilder;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.sql.PreparedStatement;
+import java.util.stream.Collectors;
+
+public class SqlSessionFactoryUtil {

Review Comment:
   In a follow-up PR, hudi-platform-common will be extracted for unification; 
let's follow up there.



##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/HoodieTableServiceManager.java:
##
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.util.ReflectionUtils;
+import org.apache.hudi.table.service.manager.common.CommandConfig;
+import 
org.apache.hudi.table.service.manager.common.HoodieTableServiceManagerConfig;
+import org.apache.hudi.table.service.manager.service.BaseService;
+import org.apache.hudi.table.service.manager.service.CleanService;
+import org.apache.hudi.table.service.manager.service.ExecutorService;
+import org.apache.hudi.table.service.manager.service.MonitorService;
+import org.apache.hudi.table.service.manager.service.RestoreService;
+import org.apache.hudi.table.service.manager.service.RetryService;
+import org.apache.hudi.table.service.manager.service.ScheduleService;
+import org.apache.hudi.table.service.manager.store.MetadataStore;
+
+import com.beust.jcommander.JCommander;
+import io.javalin.Javalin;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Main class of hoodie table service manager.
+ *
+ * @Experimental
+ * @since 0.13.0
+ */
+public class HoodieTableServiceManager {
+
+  private static final Logger LOG = 
LogManager.getLogger(HoodieTableServiceManager.class);
+
+  private final int serverPort;
+  private final Configuration conf;
+  private transient Javalin app = null;
+  private List<BaseService> services;
+  private final MetadataStore metadataStore;
+  private final HoodieTableServiceManagerConfig tableServiceManagerConfig;
+
+  public HoodieTableServiceManager(CommandConfig config) {
+this.conf = FSUtils.prepareHadoopConf(new Configuration());
+this.tableServiceManagerConfig = 
CommandConfig.toTableServiceManagerConfig(config);
+this.serverPort = config.serverPort;
+this.metadataStore = initMetadataStore();
+  }
+
+  public void startService() {
+app = Javalin.create();
+RequestHandler requestHandler = new RequestHandler(app, conf, 
metadataStore);
+app.get("/", ctx -> ctx.result("Hello World"));
+requestHandler.register();
+app.start(serverPort);
+registerService();

[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1409856482

   
   ## CI report:
   
   * 48630689184d006a4be0ad9eef2ade76919458cd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14174)
 
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   * 3c010a86327c341b29aaea9ff6ca571855951bd3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql show | update hudi's table properties

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7680:
URL: https://github.com/apache/hudi/pull/7680#issuecomment-1409856659

   
   ## CI report:
   
   * 0970573f82ef1a49184d1875975463f76f7d791d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14686)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14760)
 
   * b87dbc4ca43aa4e2565f11d87d74cd018b95b6cf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14807)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-30 Thread via GitHub


yuzhaojing commented on code in PR #5926:
URL: https://github.com/apache/hudi/pull/5926#discussion_r1091518781


##
hudi-platform-service/hudi-table-service-manager/src/main/java/org/apache/hudi/table/service/manager/util/DateTimeUtils.java:
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.service.manager.util;
+
+import java.util.Calendar;
+import java.util.Date;
+
+public class DateTimeUtils {

Review Comment:
   Sure.






[GitHub] [hudi] xicm commented on pull request #7675: [HUDI-5561] The preCombine method of PartialUpdateAvroPayload is not called

2023-01-30 Thread via GitHub


xicm commented on PR #7675:
URL: https://github.com/apache/hudi/pull/7675#issuecomment-1409854509

   @danny0405 The problem has been fixed by #7759; we can close this PR.





[GitHub] [hudi] voonhous commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


voonhous commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409854140

   @YannByron Hmmm, are you envisioning option 3 as a solution for this issue 
that is described here? 
https://github.com/apache/hudi/pull/7669#discussion_r1090237751
   
   i.e. Any new filegroup that is created from a filegroup that is flagged for 
deletion should also be flagged for deletion? 
   





[GitHub] [hudi] hudi-bot commented on pull request #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7803:
URL: https://github.com/apache/hudi/pull/7803#issuecomment-1409850518

   
   ## CI report:
   
   * 410d0d504acaa6ff46ee85bb3dddb46cf5fb18fb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7802: [DNM] Disable default Avro schema validation

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7802:
URL: https://github.com/apache/hudi/pull/7802#issuecomment-1409850470

   
   ## CI report:
   
   * 88d82b6c21f0c4d81409dae0f5420cea116954ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14805)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7787: [HUDI-5646] Guard dropping columns by a config, do not allow by default

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7787:
URL: https://github.com/apache/hudi/pull/7787#issuecomment-1409850396

   
   ## CI report:
   
   * a930137a8435863ad4b01997b731895328c0fa59 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14790)
 
   * 8849a731a979b687f490674e7fc15f8301947b27 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14804)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7706:
URL: https://github.com/apache/hudi/pull/7706#issuecomment-1409850237

   
   ## CI report:
   
   * 126951c4f2e2581ffbfb996df3d2ea325290f7f6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14447)
 
   * 8c4ddec99d740d881bbabb9ca27e860c217acc2c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14803)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql show | update hudi's table properties

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7680:
URL: https://github.com/apache/hudi/pull/7680#issuecomment-1409850148

   
   ## CI report:
   
   * 0970573f82ef1a49184d1875975463f76f7d791d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14686)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14760)
 
   * b87dbc4ca43aa4e2565f11d87d74cd018b95b6cf UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409850055

   
   ## CI report:
   
   * 6e1d03f8dd6c292959ee29c8592ca4340d2aca46 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14796)
 
   * 343b22a4f7cd783ff6a69ba19df828f221c69c5a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14802)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7378: [HUDI-5329] spark reads hudi table error when flink creates the table without preCombine fields

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7378:
URL: https://github.com/apache/hudi/pull/7378#issuecomment-1409849655

   
   ## CI report:
   
   * e6dd84eef8e98d43c19f31df2a88f6b1d3f6f717 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13455)
 
   * 576dad186d2bfbe4bc497d75d17c6ded88df35a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14801)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7614: [HUDI-5509] check if dfs support atomic creation when using filesyste…

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7614:
URL: https://github.com/apache/hudi/pull/7614#issuecomment-1409849911

   
   ## CI report:
   
   * 48630689184d006a4be0ad9eef2ade76919458cd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14174)
 
   * 058ab2703bda207fc9f5861d5e4b865e83ee1b45 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-5662) Build failure on upgrading hudi-presto-bundle to version 0.13.0 in Presto

2023-01-30 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-5662:
-

 Summary: Build failure on upgrading hudi-presto-bundle to version 
0.13.0 in Presto
 Key: HUDI-5662
 URL: https://issues.apache.org/jira/browse/HUDI-5662
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


After upgrading to 0.13.0-rc1 as shown in 
[https://github.com/codope/presto/commit/8779d4f17be10861d7726226e74397c6d9b316fd]

and building the project, we get the following error:
{code:java}
Failed while enforcing RequireUpperBoundDeps. The error(s) are [
Require upper bound dependencies error for org.objenesis:objenesis:1.3 paths to 
dependency are:
+-com.facebook.presto:presto-hive:0.280-SNAPSHOT
  +-org.apache.hudi:hudi-presto-bundle:0.13.0-rc1
    +-com.esotericsoftware:kryo-shaded:4.0.2
      +-org.objenesis:objenesis:1.3 (managed) <-- org.objenesis:objenesis:2.5.1
] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] YannByron commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


YannByron commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409849451

   @voonhous @SteNicholas @XuQianJin-Stars hey guys, sorry, but I have a 
different thought. I think the drop-partition operation should be allowed to 
execute, and the table service action should re-check whether the touched file 
groups have been updated when it actually executes. IMO, operations on data 
take priority over table management operations.
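
   A minimal sketch of the alternative being proposed (illustrative only; the 
helper below is hypothetical, not a Hudi API): at execution time the table 
service re-validates its planned file groups against those that still exist and 
skips the stale ones.

   ```java
   import java.util.List;
   import java.util.Set;
   import java.util.stream.Collectors;

   // Illustrative only: drop planned file groups that no longer exist
   // (e.g. because their partition was dropped after scheduling).
   final class StaleFileGroupCheck {

     static List<String> filterStillValid(List<String> plannedFileGroupIds,
                                          Set<String> liveFileGroupIds) {
       return plannedFileGroupIds.stream()
           .filter(liveFileGroupIds::contains)
           .collect(Collectors.toList());
     }
   }
   ```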





[GitHub] [hudi] hudi-bot commented on pull request #7802: [DNM] Disable default Avro schema validation

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7802:
URL: https://github.com/apache/hudi/pull/7802#issuecomment-1409843991

   
   ## CI report:
   
   * 88d82b6c21f0c4d81409dae0f5420cea116954ba UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7787: [HUDI-5646] Guard dropping columns by a config, do not allow by default

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7787:
URL: https://github.com/apache/hudi/pull/7787#issuecomment-1409843915

   
   ## CI report:
   
   * 4a25ef8aaf5415ea6a918b7f32c5f3a84597ddec Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14781)
 
   * a930137a8435863ad4b01997b731895328c0fa59 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14790)
 
   * 8849a731a979b687f490674e7fc15f8301947b27 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7769: [HUDI-5633] Fixing performance regression in `HoodieSparkRecord`

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7769:
URL: https://github.com/apache/hudi/pull/7769#issuecomment-1409843852

   
   ## CI report:
   
   * 9bfa20f45fcc675b79053bb8b4f379b09c6cd6c5 UNKNOWN
   * 325244765016f67034fd8f364942028fe217ecb5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14788)
 
   * 24020a964671b35fb9aa7b86748771fd71512495 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14800)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7706:
URL: https://github.com/apache/hudi/pull/7706#issuecomment-1409843749

   
   ## CI report:
   
   * 126951c4f2e2581ffbfb996df3d2ea325290f7f6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14447)
 
   * 8c4ddec99d740d881bbabb9ca27e860c217acc2c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409843559

   
   ## CI report:
   
   * c4ecd3d09de159fab46b6bcadc502cea3a76e4cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14770)
 
   * 6e1d03f8dd6c292959ee29c8592ca4340d2aca46 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14796)
 
   * 343b22a4f7cd783ff6a69ba19df828f221c69c5a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7378: [HUDI-5329] spark reads hudi table error when flink creates the table without preCombine fields

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7378:
URL: https://github.com/apache/hudi/pull/7378#issuecomment-1409843214

   
   ## CI report:
   
   * e6dd84eef8e98d43c19f31df2a88f6b1d3f6f717 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13455)
 
   * 576dad186d2bfbe4bc497d75d17c6ded88df35a5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5661) Add ConflictResolutionStrategy for bucket index

2023-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5661:
-
Labels: pull-request-available  (was: )

> Add ConflictResolutionStrategy for bucket index
> ---
>
> Key: HUDI-5661
> URL: https://issues.apache.org/jira/browse/HUDI-5661
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] xicm opened a new pull request, #7803: [HUDI-5661] Add ConflictResolutionStrategy for bucket index

2023-01-30 Thread via GitHub


xicm opened a new pull request, #7803:
URL: https://github.com/apache/hudi/pull/7803

   ### Change Logs
   
   `SimpleConcurrentFileWritesConflictResolutionStrategy` checks conflicts by 
file ID, while for the bucket index we should check the bucket ID to see 
whether there is a conflict.
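
   A hedged sketch of the idea (the helper below is hypothetical, not the 
actual strategy class; deriving the bucket id from the file-id prefix is a 
simplification of the bucket-index layout):

   ```java
   import java.util.Set;
   import java.util.stream.Collectors;

   // Illustrative only: key the conflict check on (partition, bucket id)
   // instead of the concrete file id.
   final class BucketConflictCheck {

     static String bucketKey(String partitionPath, String fileId) {
       // Assumption: bucket-index file ids start with the zero-padded bucket number.
       String bucket = fileId.length() >= 8 ? fileId.substring(0, 8) : fileId;
       return partitionPath + "/" + bucket;
     }

     static boolean hasConflict(String partitionPath,
                                Set<String> currentTxnFileIds,
                                Set<String> committedTxnFileIds) {
       Set<String> mine = currentTxnFileIds.stream()
           .map(f -> bucketKey(partitionPath, f))
           .collect(Collectors.toSet());
       return committedTxnFileIds.stream()
           .map(f -> bucketKey(partitionPath, f))
           .anyMatch(mine::contains);
     }
   }
   ```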
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7793: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7793:
URL: https://github.com/apache/hudi/pull/7793#issuecomment-1409837284

   
   ## CI report:
   
   * 6f3efd8db2ef71ad0861f468a491b6b22e032037 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14791)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7633: Fix Deletes issued without any prior commits

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1409836312

   
   ## CI report:
   
   * 6edffd10a0abadfc1f169b10f131b755c8e1280e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14789)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14799)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7434: [HUDI-5240] Clean content when recursive Invocation inflate

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7434:
URL: https://github.com/apache/hudi/pull/7434#issuecomment-1409836047

   
   ## CI report:
   
   * 3494a8f423725c70f16e1f2f4ae2e7ef45a06b35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13645)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14798)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nfarah86 commented on pull request #7549: [DOCS] improve spark quickstart, info about MT and async services

2023-01-30 Thread via GitHub


nfarah86 commented on PR #7549:
URL: https://github.com/apache/hudi/pull/7549#issuecomment-1409835064

   cc @codope 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-5661) Add ConflictResolutionStrategy for bucket index

2023-01-30 Thread xi chaomin (Jira)
xi chaomin created HUDI-5661:


 Summary: Add ConflictResolutionStrategy for bucket index
 Key: HUDI-5661
 URL: https://issues.apache.org/jira/browse/HUDI-5661
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xi chaomin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] XuQianJin-Stars commented on pull request #7434: [HUDI-5240] Clean content when recursive Invocation inflate

2023-01-30 Thread via GitHub


XuQianJin-Stars commented on PR #7434:
URL: https://github.com/apache/hudi/pull/7434#issuecomment-1409823440

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] liaotian1005 commented on pull request #7633: Fix Deletes issued without any prior commits

2023-01-30 Thread via GitHub


liaotian1005 commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1409806899

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] alexeykudinkin opened a new pull request, #7802: [DNM] Disable default Avro schema validation

2023-01-30 Thread via GitHub


alexeykudinkin opened a new pull request, #7802:
URL: https://github.com/apache/hudi/pull/7802

   ### Change Logs
   
   As we discussed earlier today, this disables Avro schema validation by default.
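   For reference, a minimal, hypothetical sketch of how a user could still opt back into the validation on a particular write, assuming the existing `hoodie.avro.schema.validate` write config remains the switch; the table name, field names, and paths below are placeholders, not part of this PR:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical usage sketch: explicitly opt back into Avro schema validation on a write.
// The config key `hoodie.avro.schema.validate` is assumed from existing Hudi write configs;
// table name, field names, and paths are placeholders.
public class ReenableSchemaValidation {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-write").getOrCreate();
    Dataset<Row> df = spark.read().json("/tmp/input");  // placeholder input
    df.write().format("hudi")
        .option("hoodie.table.name", "my_table")
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        .option("hoodie.avro.schema.validate", "true")   // re-enable the validation
        .mode(SaveMode.Append)
        .save("/tmp/hudi/my_table");
  }
}
```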
   
   ### Impact
   
   No impact
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] maheshguptags commented on issue #7613: [SUPPORT] Not able to Delete record

2023-01-30 Thread via GitHub


maheshguptags commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1409804936

   Hi @danny0405,
   I tried both options, Upsert and Delete, but the behavior is the same, so it is 
still not working.
   If you have any working code for the cross-platform case (insert with Flink, delete 
with Spark) that deletes records from a Hudi table, please share it with us as well.
   
   -Mahesh


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-5660) Support bucket index for spark bulk_insert

2023-01-30 Thread Danny Chen (Jira)
Danny Chen created HUDI-5660:


 Summary: Support bucket index for spark bulk_insert
 Key: HUDI-5660
 URL: https://issues.apache.org/jira/browse/HUDI-5660
 Project: Apache Hudi
  Issue Type: Improvement
  Components: spark
Reporter: Danny Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] waywtdcc commented on a diff in pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-30 Thread via GitHub


waywtdcc commented on code in PR #7706:
URL: https://github.com/apache/hudi/pull/7706#discussion_r1091477435


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -438,8 +439,10 @@ public CatalogBaseTable getTable(ObjectPath tablePath) 
throws TableNotExistExcep
   LOG.warn("{} does not have any hoodie schema, and use hive table schema 
to infer the table schema", tablePath);
   schema = HiveSchemaUtils.convertTableSchema(hiveTable);
 }
+org.apache.flink.table.api.Schema resultSchema = 
DataTypeUtils.dropIfExistsColumns(schema, 
HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION);
+

Review Comment:
   Indeed, I have removed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-30 Thread via GitHub


danny0405 commented on PR #7684:
URL: https://github.com/apache/hudi/pull/7684#issuecomment-1409796950

   The failed test is kind of unrelated: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=14754=logs=dcedfe73-9485-5cc5-817a-73b61fc5dcb0=746585d8-b50a-55c3-26c5-517d93af9934=40524


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on issue #7613: Not able to Delete record

2023-01-30 Thread via GitHub


danny0405 commented on issue #7613:
URL: https://github.com/apache/hudi/issues/7613#issuecomment-1409794955

   I noticed that there is a write operation named `DELETE`, so just set 
`hoodie.datasource.write.operation` to that value and try again.
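   To make this concrete, below is a minimal, hypothetical sketch of issuing deletes through the Spark datasource with that operation; the base path, record key field, and precombine field are placeholders, not taken from this issue:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch: delete records from a Hudi table (e.g. one written by Flink)
// through the Spark datasource by using the DELETE write operation.
// Base path, record key field, and precombine field are placeholders.
public class HudiSparkDelete {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("hudi-delete").getOrCreate();

    // Rows whose record keys should be deleted; only the keys matter for a delete.
    Dataset<Row> toDelete = spark.read().format("hudi")
        .load("/path/to/hudi_table")
        .filter("id in ('key1', 'key2')");

    toDelete.write().format("hudi")
        .option("hoodie.datasource.write.operation", "delete")
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        .option("hoodie.table.name", "hudi_table")
        .mode(SaveMode.Append)
        .save("/path/to/hudi_table");
  }
}
```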


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-5487) Reduce duplicate Logs in ExternalSpillableMap

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-5487.

 Reviewers: Danny Chen
Resolution: Fixed

Fixed via master branch: 1377143656a447ddd6f884380c93be6fb5ecf459

> Reduce duplicate Logs in ExternalSpillableMap
> -
>
> Key: HUDI-5487
> URL: https://issues.apache.org/jira/browse/HUDI-5487
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: dzcxzl
>Assignee: Danny Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> We see hundreds of thousands of duplicate logs in the executor log.
> {code:java}
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5487) Reduce duplicate Logs in ExternalSpillableMap

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5487:
-
Fix Version/s: 0.13.1

> Reduce duplicate Logs in ExternalSpillableMap
> -
>
> Key: HUDI-5487
> URL: https://issues.apache.org/jira/browse/HUDI-5487
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> We see hundreds of thousands of duplicate logs in the executor log.
> {code:java}
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5487) Reduce duplicate Logs in ExternalSpillableMap

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-5487:


Assignee: Danny Chen

> Reduce duplicate Logs in ExternalSpillableMap
> -
>
> Key: HUDI-5487
> URL: https://issues.apache.org/jira/browse/HUDI-5487
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: dzcxzl
>Assignee: Danny Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> We see hundreds of thousands of duplicate logs in the executor log.
> {code:java}
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567
> 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 
> 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size 
> to => 4567 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated: [HUDI-5487] Reduce duplicate logs in ExternalSpillableMap (#7579)

2023-01-30 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1377143656a [HUDI-5487] Reduce duplicate logs in ExternalSpillableMap 
(#7579)
1377143656a is described below

commit 1377143656a447ddd6f884380c93be6fb5ecf459
Author: cxzl25 
AuthorDate: Tue Jan 31 13:35:27 2023 +0800

[HUDI-5487] Reduce duplicate logs in ExternalSpillableMap (#7579)
---
 .../apache/hudi/common/util/collection/ExternalSpillableMap.java | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java
index ee930e588d0..b540b204214 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java
@@ -202,10 +202,13 @@ public class ExternalSpillableMap= maxInMemorySizeInBytes || 
inMemoryMap.size() % NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0) {
-  this.estimatedPayloadSize = (long) (this.estimatedPayloadSize * 0.9 
-+ (keySizeEstimator.sizeEstimate(key) + 
valueSizeEstimator.sizeEstimate(value)) * 0.1);
+  long tmpEstimatedPayloadSize = (long) (this.estimatedPayloadSize * 0.9
+  + (keySizeEstimator.sizeEstimate(key) + 
valueSizeEstimator.sizeEstimate(value)) * 0.1);
+  if (this.estimatedPayloadSize != tmpEstimatedPayloadSize) {
+LOG.info("Update Estimated Payload size to => " + 
this.estimatedPayloadSize);
+  }
+  this.estimatedPayloadSize = tmpEstimatedPayloadSize;
   this.currentInMemoryMapSize = this.inMemoryMap.size() * 
this.estimatedPayloadSize;
-  LOG.info("Update Estimated Payload size to => " + 
this.estimatedPayloadSize);
 }
 
 if (this.currentInMemoryMapSize < maxInMemorySizeInBytes || 
inMemoryMap.containsKey(key)) {



[GitHub] [hudi] danny0405 merged pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap

2023-01-30 Thread via GitHub


danny0405 merged PR #7579:
URL: https://github.com/apache/hudi/pull/7579


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #7647: [HUDI-5530] Fix WARNING during compile.

2023-01-30 Thread via GitHub


danny0405 commented on PR #7647:
URL: https://github.com/apache/hudi/pull/7647#issuecomment-1409787529

   cc @alexeykudinkin if you have any time for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on issue #7791: [SUPPORT] Don't see metadata folder in .hoodie folder when ingesting data use hudi kafka connector

2023-01-30 Thread via GitHub


danny0405 commented on issue #7791:
URL: https://github.com/apache/hudi/issues/7791#issuecomment-1409780774

   Can you check the `.hoodie/hoodie.properties` file so we can confirm that the 
metadata table is enabled in the table config and that there are no conflicts?
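   If it helps, here is a small, hypothetical snippet for dumping the metadata-related table configs from that file; the key name `hoodie.table.metadata.partitions` and the base path are assumptions, not details from this issue:

```java
import java.io.FileInputStream;
import java.util.Properties;

// Hypothetical check: print the metadata-related entries of .hoodie/hoodie.properties.
// A present, non-empty hoodie.table.metadata.partitions entry is assumed to indicate
// that the metadata table has been initialized; the base path is a placeholder.
public class CheckHoodieProps {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    try (FileInputStream in =
        new FileInputStream("/path/to/table/.hoodie/hoodie.properties")) {
      props.load(in);
    }
    props.stringPropertyNames().stream()
        .filter(key -> key.contains("metadata"))
        .forEach(key -> System.out.println(key + "=" + props.getProperty(key)));
  }
}
```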


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5659) Support cleaning for archived files

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5659:
-
Labels: hudi-on-call  (was: )

> Support cleaning for archived files
> ---
>
> Key: HUDI-5659
> URL: https://issues.apache.org/jira/browse/HUDI-5659
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Danny Chen
>Priority: Major
>  Labels: hudi-on-call
>
> Details come from the issue: https://github.com/apache/hudi/issues/7800



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] danny0405 commented on issue #7800: "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" while writing to Hudi COW table

2023-01-30 Thread via GitHub


danny0405 commented on issue #7800:
URL: https://github.com/apache/hudi/issues/7800#issuecomment-1409773708

   Thanks for the feedback @phani482. Sorry to say that cleaning of archived files 
is not supported yet; I have created a JIRA issue to track this: 
https://issues.apache.org/jira/browse/HUDI-5659
   
   I also noticed that you use the `INSERT` operation, so which Spark stage did you 
perceive as slow?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7769: [HUDI-5633] Fixing performance regression in `HoodieSparkRecord`

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7769:
URL: https://github.com/apache/hudi/pull/7769#issuecomment-1409773643

   
   ## CI report:
   
   * 9bfa20f45fcc675b79053bb8b4f379b09c6cd6c5 UNKNOWN
   * 325244765016f67034fd8f364942028fe217ecb5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14788)
 
   * 24020a964671b35fb9aa7b86748771fd71512495 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409773412

   
   ## CI report:
   
   * c4ecd3d09de159fab46b6bcadc502cea3a76e4cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14770)
 
   * 6e1d03f8dd6c292959ee29c8592ca4340d2aca46 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nfarah86 commented on pull request #7549: [DOCS] improve spark quickstart, info about MT and async services

2023-01-30 Thread via GitHub


nfarah86 commented on PR #7549:
URL: https://github.com/apache/hudi/pull/7549#issuecomment-1409772889

   cc @bhasudha I'm ok merging it-- I can refactor it later on 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5659) Support cleaning for archived files

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5659:
-
Description: Details come from the issue: 
https://github.com/apache/hudi/issues/7800

> Support cleaning for archived files
> ---
>
> Key: HUDI-5659
> URL: https://issues.apache.org/jira/browse/HUDI-5659
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Danny Chen
>Priority: Major
>
> Details come from the issue: https://github.com/apache/hudi/issues/7800



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5659) Support cleaning for archived files

2023-01-30 Thread Danny Chen (Jira)
Danny Chen created HUDI-5659:


 Summary: Support cleaning for archived files
 Key: HUDI-5659
 URL: https://issues.apache.org/jira/browse/HUDI-5659
 Project: Apache Hudi
  Issue Type: Improvement
  Components: writer-core
Reporter: Danny Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7769: [HUDI-5633] Fixing performance regression in `HoodieSparkRecord`

2023-01-30 Thread via GitHub


alexeykudinkin commented on code in PR #7769:
URL: https://github.com/apache/hudi/pull/7769#discussion_r1091455819


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java:
##
@@ -159,35 +153,8 @@ public Pair>, String> 
fetchNextBatch(Option lastCkpt
   queryTypeAndInstantEndpts.getRight().getRight()));
 }
 
-/*
- * log.info("Partition Fields are : (" + partitionFields + "). Initial 
Source Schema :" + source.schema());
- *
- * StructType newSchema = new StructType(source.schema().fields()); for 
(String field : partitionFields) { newSchema
- * = newSchema.add(field, DataTypes.StringType, true); }
- *
- * /** Validates if the commit time is sane and also generates Partition 
fields from _hoodie_partition_path if
- * configured
- *
- * Dataset validated = source.map((MapFunction) (Row row) 
-> { // _hoodie_instant_time String
- * instantTime = row.getString(0); 
IncrSourceHelper.validateInstantTime(row, instantTime, instantEndpts.getKey(),
- * instantEndpts.getValue()); if (!partitionFields.isEmpty()) { // 
_hoodie_partition_path String hoodiePartitionPath
- * = row.getString(3); List partitionVals =
- * extractor.extractPartitionValuesInPath(hoodiePartitionPath).stream() 
.map(o -> (Object)
- * o).collect(Collectors.toList()); 
ValidationUtils.checkArgument(partitionVals.size() == partitionFields.size(),
- * "#partition-fields != #partition-values-extracted"); List 
rowObjs = new
- * 
ArrayList<>(scala.collection.JavaConversions.seqAsJavaList(row.toSeq())); 
rowObjs.addAll(partitionVals); return
- * RowFactory.create(rowObjs.toArray()); } return row; }, 
RowEncoder.apply(newSchema));
- *
- * log.info("Validated Source Schema :" + validated.schema());
- */
-boolean dropAllMetaFields = 
props.getBoolean(Config.HOODIE_DROP_ALL_META_FIELDS_FROM_SOURCE,
-Config.DEFAULT_HOODIE_DROP_ALL_META_FIELDS_FROM_SOURCE);
-
-// Remove Hoodie meta columns except partition path from input source
-String[] colsToDrop = dropAllMetaFields ? 
HoodieRecord.HOODIE_META_COLUMNS.stream().toArray(String[]::new) :
-HoodieRecord.HOODIE_META_COLUMNS.stream().filter(x -> 
!x.equals(HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toArray(String[]::new);
-final Dataset src = source.drop(colsToDrop);
-// log.info("Final Schema from Source is :" + src.schema());
+// Remove Hoodie meta columns
+final Dataset src = 
source.drop(HoodieRecord.HOODIE_META_COLUMNS.stream().toArray(String[]::new));

Review Comment:
   The change here is to avoid keeping the partition-path meta column, since keeping 
it makes `HoodieSparkSqlWriter` treat it as a data column, which is not compatible 
with `SparkRecordMerger`.
   



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java:
##
@@ -159,35 +153,8 @@ public Pair>, String> 
fetchNextBatch(Option lastCkpt
   queryTypeAndInstantEndpts.getRight().getRight()));
 }
 
-/*
- * log.info("Partition Fields are : (" + partitionFields + "). Initial 
Source Schema :" + source.schema());

Review Comment:
   Cleaning up dead commented code (not updated since 2018)



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java:
##
@@ -159,35 +153,8 @@ public Pair>, String> 
fetchNextBatch(Option lastCkpt
   queryTypeAndInstantEndpts.getRight().getRight()));
 }
 
-/*
- * log.info("Partition Fields are : (" + partitionFields + "). Initial 
Source Schema :" + source.schema());
- *
- * StructType newSchema = new StructType(source.schema().fields()); for 
(String field : partitionFields) { newSchema
- * = newSchema.add(field, DataTypes.StringType, true); }
- *
- * /** Validates if the commit time is sane and also generates Partition 
fields from _hoodie_partition_path if
- * configured
- *
- * Dataset validated = source.map((MapFunction) (Row row) 
-> { // _hoodie_instant_time String
- * instantTime = row.getString(0); 
IncrSourceHelper.validateInstantTime(row, instantTime, instantEndpts.getKey(),
- * instantEndpts.getValue()); if (!partitionFields.isEmpty()) { // 
_hoodie_partition_path String hoodiePartitionPath
- * = row.getString(3); List partitionVals =
- * extractor.extractPartitionValuesInPath(hoodiePartitionPath).stream() 
.map(o -> (Object)
- * o).collect(Collectors.toList()); 
ValidationUtils.checkArgument(partitionVals.size() == partitionFields.size(),
- * "#partition-fields != #partition-values-extracted"); List 
rowObjs = new
- * 
ArrayList<>(scala.collection.JavaConversions.seqAsJavaList(row.toSeq())); 
rowObjs.addAll(partitionVals); return
- * RowFactory.create(rowObjs.toArray()); } return row; }, 
RowEncoder.apply(newSchema));
- *
- * log.info("Validated Source Schema :" + validated.schema());
- */
-boolean dropAllMetaFields = 

[GitHub] [hudi] hudi-bot commented on pull request #7801: [HUDI-5657] Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7801:
URL: https://github.com/apache/hudi/pull/7801#issuecomment-1409768720

   
   ## CI report:
   
   * 11517333c6d2b85b774d64e150825f5544fb7baa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14795)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7669:
URL: https://github.com/apache/hudi/pull/7669#issuecomment-1409768368

   
   ## CI report:
   
   * c4ecd3d09de159fab46b6bcadc502cea3a76e4cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14770)
 
   * 6e1d03f8dd6c292959ee29c8592ca4340d2aca46 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #7801: [HUDI-5657] Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread via GitHub


danny0405 commented on code in PR #7801:
URL: https://github.com/apache/hudi/pull/7801#discussion_r1091455210


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ExpressionEvaluator.java:
##
@@ -282,6 +286,9 @@ public static LessThan getInstance() {
 
 @Override
 public boolean eval() {
+  if (this.val == null) {
+return false;
+  }

Review Comment:
   Can we pre-process these NULLs before entering specific evaluators?
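   A rough, hypothetical sketch of that suggestion, filtering out predicates whose literal is null before any per-operator evaluator is built; the class and method names are assumptions based on Flink's expression API, not the actual patch:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.apache.flink.table.expressions.Expression;
import org.apache.flink.table.expressions.ResolvedExpression;
import org.apache.flink.table.expressions.ValueLiteralExpression;

// Hypothetical pre-processing step: drop filter expressions that reference a null
// literal before any column-stats evaluator is built, so individual evaluators
// never have to handle a null value themselves.
public class NullLiteralFilters {
  public static List<ResolvedExpression> dropNullLiteralFilters(List<ResolvedExpression> filters) {
    return filters.stream()
        .filter(filter -> filter.getChildren().stream().noneMatch(NullLiteralFilters::isNullLiteral))
        .collect(Collectors.toList());
  }

  private static boolean isNullLiteral(Expression expression) {
    return expression instanceof ValueLiteralExpression
        && !((ValueLiteralExpression) expression).getValueAs(Object.class).isPresent();
  }
}
```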



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7801: [HUDI-5657] Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7801:
URL: https://github.com/apache/hudi/pull/7801#issuecomment-1409762876

   
   ## CI report:
   
   * 11517333c6d2b85b774d64e150825f5544fb7baa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7769: [HUDI-5633] Fixing performance regression in `HoodieSparkRecord`

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7769:
URL: https://github.com/apache/hudi/pull/7769#issuecomment-1409762705

   
   ## CI report:
   
   * 9bfa20f45fcc675b79053bb8b4f379b09c6cd6c5 UNKNOWN
   * 325244765016f67034fd8f364942028fe217ecb5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14788)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7633: Fix Deletes issued without any prior commits

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1409762317

   
   ## CI report:
   
   * 6edffd10a0abadfc1f169b10f131b755c8e1280e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[hudi] branch master updated (9906df48e7c -> 5acc6fe51ac)

2023-01-30 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 9906df48e7c [HUDI-5655] Closing write client for spark ds writer in 
all cases (including exception) (#7799)
 add 5acc6fe51ac [HUDI-5654] Fixing read of an empty rollback completed 
meta files from data table timeline w/ metadata reads (#7798)

No new revisions were added by this update.

Summary of changes:
 .../hudi/metadata/HoodieBackedTableMetadata.java   | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)



[GitHub] [hudi] nsivabalan merged pull request #7798: [HUDI-5654] Fixing reading of empty rollback completed meta files from data table timeline

2023-01-30 Thread via GitHub


nsivabalan merged PR #7798:
URL: https://github.com/apache/hudi/pull/7798


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5658) Add tests for empty rollback commits in data table timeline

2023-01-30 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-5658:
--
Fix Version/s: 0.13.1

> Add tests for empty rollback commits in data table timeline 
> 
>
> Key: HUDI-5658
> URL: https://issues.apache.org/jira/browse/HUDI-5658
> Project: Apache Hudi
>  Issue Type: Test
>  Components: writer-core
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.13.1
>
>
> If data table timeline has empty rollback completed commits, it could fail 
> metadata table reads. We have fixed the issue in 
> [https://github.com/apache/hudi/pull/7798.] 
> Fast-tracking the fix for 0.13.0, but filing a tracking ticket to add tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-5658) Add tests for empty rollback commits in data table timeline

2023-01-30 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-5658:
-

Assignee: Lokesh Jain

> Add tests for empty rollback commits in data table timeline 
> 
>
> Key: HUDI-5658
> URL: https://issues.apache.org/jira/browse/HUDI-5658
> Project: Apache Hudi
>  Issue Type: Test
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.13.1
>
>
> If data table timeline has empty rollback completed commits, it could fail 
> metadata table reads. We have fixed the issue in 
> [https://github.com/apache/hudi/pull/7798.] 
> Fast-tracking the fix for 0.13.0, but filing a tracking ticket to add tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] nsivabalan commented on pull request #7798: [HUDI-5654] Fixing reading of empty rollback completed meta files from data table timeline

2023-01-30 Thread via GitHub


nsivabalan commented on PR #7798:
URL: https://github.com/apache/hudi/pull/7798#issuecomment-1409746828

   CI failed due to a known flaky test. Going ahead with landing. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #7798: [HUDI-5654] Fixing reading of empty rollback completed meta files from data table timeline

2023-01-30 Thread via GitHub


nsivabalan commented on PR #7798:
URL: https://github.com/apache/hudi/pull/7798#issuecomment-1409746650

   yes, https://issues.apache.org/jira/browse/HUDI-5658 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-5658) Add tests for empty rollback commits in data table timeline

2023-01-30 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5658:
-

 Summary: Add tests for empty rollback commits in data table 
timeline 
 Key: HUDI-5658
 URL: https://issues.apache.org/jira/browse/HUDI-5658
 Project: Apache Hudi
  Issue Type: Test
  Components: writer-core
Reporter: sivabalan narayanan


If data table timeline has empty rollback completed commits, it could fail 
metadata table reads. We have fixed the issue in 
[https://github.com/apache/hudi/pull/7798.] 

Fast-tracking the fix for 0.13.0, but filing a tracking ticket to add tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated: [HUDI-5655] Closing write client for spark ds writer in all cases (including exception) (#7799)

2023-01-30 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9906df48e7c [HUDI-5655] Closing write client for spark ds writer in 
all cases (including exception) (#7799)
9906df48e7c is described below

commit 9906df48e7c285994d56e1e2d466372eee57c268
Author: Sivabalan Narayanan 
AuthorDate: Mon Jan 30 20:37:10 2023 -0800

[HUDI-5655] Closing write client for spark ds writer in all cases 
(including exception) (#7799)

    Looks like we missed closing the writeClient in some of the failure cases 
while writing via spark-ds and spark-sql.
---
 .../org/apache/hudi/HoodieSparkSqlWriter.scala | 23 ++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index 7e234775faa..304a1303a3b 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -376,12 +376,22 @@ object HoodieSparkSqlWriter {
 }
 
   // Check for errors and commit the write.
-  val (writeSuccessful, compactionInstant, clusteringInstant) =
-commitAndPerformPostOperations(sqlContext.sparkSession, df.schema,
-  writeResult, parameters, writeClient, tableConfig, jsc,
-  TableInstantInfo(basePath, instantTime, commitActionType, 
operation), extraPreCommitFn)
+  try {
+val (writeSuccessful, compactionInstant, clusteringInstant) =
+  commitAndPerformPostOperations(sqlContext.sparkSession, df.schema,
+writeResult, parameters, writeClient, tableConfig, jsc,
+TableInstantInfo(basePath, instantTime, commitActionType, 
operation), extraPreCommitFn)
 
-  (writeSuccessful, common.util.Option.ofNullable(instantTime), 
compactionInstant, clusteringInstant, writeClient, tableConfig)
+(writeSuccessful, common.util.Option.ofNullable(instantTime), 
compactionInstant, clusteringInstant, writeClient, tableConfig)
+  } finally {
+// close the write client in all cases
+val asyncCompactionEnabled = isAsyncCompactionEnabled(writeClient, 
tableConfig, parameters, jsc.hadoopConfiguration())
+val asyncClusteringEnabled = isAsyncClusteringEnabled(writeClient, 
parameters)
+if (!asyncCompactionEnabled && !asyncClusteringEnabled) {
+  log.info("Closing write client")
+  writeClient.close()
+}
+  }
 }
   }
 
@@ -959,9 +969,6 @@ object HoodieSparkSqlWriter {
 tableInstantInfo.basePath, schema)
 
   log.info(s"Is Async Compaction Enabled ? $asyncCompactionEnabled")
-  if (!asyncCompactionEnabled && !asyncClusteringEnabled) {
-client.close()
-  }
   (commitSuccess && metaSyncSuccess, compactionInstant, clusteringInstant)
 } else {
   log.error(s"${tableInstantInfo.operation} failed with errors")



[GitHub] [hudi] nsivabalan merged pull request #7799: [HUDI-5655] Closing write client for spark ds writer in all cases

2023-01-30 Thread via GitHub


nsivabalan merged PR #7799:
URL: https://github.com/apache/hudi/pull/7799


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #7799: [HUDI-5655] Closing write client for spark ds writer in all cases

2023-01-30 Thread via GitHub


nsivabalan commented on PR #7799:
URL: https://github.com/apache/hudi/pull/7799#issuecomment-1409743454

   CI failed due to a known flaky test. Going ahead with landing. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-30 Thread via GitHub


danny0405 commented on PR #7706:
URL: https://github.com/apache/hudi/pull/7706#issuecomment-1409742579

   
[HUDI-5585.patch.zip](https://github.com/apache/hudi/files/10542826/HUDI-5585.patch.zip)
   Thanks for the contribution. I have reviewed it and attached a patch; can you 
apply the patch with the command:
   
   `git apply xxx.patch`
   
   then rebase onto the latest master code and force-push?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #7706: [HUDI-5585][flink]Fix flink creates and writes the table, the spark alter table reports an error

2023-01-30 Thread via GitHub


danny0405 commented on code in PR #7706:
URL: https://github.com/apache/hudi/pull/7706#discussion_r1091428782


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -438,8 +439,10 @@ public CatalogBaseTable getTable(ObjectPath tablePath) 
throws TableNotExistExcep
   LOG.warn("{} does not have any hoodie schema, and use hive table schema 
to infer the table schema", tablePath);
   schema = HiveSchemaUtils.convertTableSchema(hiveTable);
 }
+org.apache.flink.table.api.Schema resultSchema = 
DataTypeUtils.dropIfExistsColumns(schema, 
HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION);
+

Review Comment:
   In line 419, we already ignore the metadata column, so why drop it again?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (HUDI-5568) Fix the BucketStreamWriteFunction to rebase the local filesystem instance instead

2023-01-30 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682370#comment-17682370
 ] 

Danny Chen edited comment on HUDI-5568 at 1/31/23 4:02 AM:
---

Fixed via master branch: d90f286a1971af952becea2267ed772f5e5e4ec7


was (Author: danny0405):
Fixed via master branch: 

> Fix the BucketStreamWriteFunction to rebase the local filesystem instance 
> instead
> -
>
> Key: HUDI-5568
> URL: https://issues.apache.org/jira/browse/HUDI-5568
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: loukey_j
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> writeClient.getHoodieTable().getFileSystemView()  always return the local 
> fileSystemView,
> should use writeClient. getHoodieTable(). getHoodieView() to determine the 
> fileSystemView



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5568) Fix the BucketStreamWriteFunction to rebase the local filesystem instance instead

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5568:
-
Summary: Fix the BucketStreamWriteFunction to rebase the local filesystem 
instance instead  (was:  incorrect use of fileSystemView)

> Fix the BucketStreamWriteFunction to rebase the local filesystem instance 
> instead
> -
>
> Key: HUDI-5568
> URL: https://issues.apache.org/jira/browse/HUDI-5568
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: loukey_j
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> writeClient.getHoodieTable().getFileSystemView()  always return the local 
> fileSystemView,
> should use writeClient. getHoodieTable(). getHoodieView() to determine the 
> fileSystemView



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5568) incorrect use of fileSystemView

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-5568:
-
Fix Version/s: 0.13.0

>  incorrect use of fileSystemView
> 
>
> Key: HUDI-5568
> URL: https://issues.apache.org/jira/browse/HUDI-5568
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: loukey_j
>Assignee: loukey_j
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> writeClient.getHoodieTable().getFileSystemView()  always return the local 
> fileSystemView,
> should use writeClient. getHoodieTable(). getHoodieView() to determine the 
> fileSystemView



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5568) incorrect use of fileSystemView

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-5568.
--

>  incorrect use of fileSystemView
> 
>
> Key: HUDI-5568
> URL: https://issues.apache.org/jira/browse/HUDI-5568
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: loukey_j
>Assignee: loukey_j
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> writeClient.getHoodieTable().getFileSystemView()  always return the local 
> fileSystemView,
> should use writeClient. getHoodieTable(). getHoodieView() to determine the 
> fileSystemView



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5568) incorrect use of fileSystemView

2023-01-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-5568.

  Assignee: Danny Chen  (was: loukey_j)
Resolution: Fixed

Fixed via master branch: 

>  incorrect use of fileSystemView
> 
>
> Key: HUDI-5568
> URL: https://issues.apache.org/jira/browse/HUDI-5568
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: loukey_j
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> writeClient.getHoodieTable().getFileSystemView()  always return the local 
> fileSystemView,
> should use writeClient. getHoodieTable(). getHoodieView() to determine the 
> fileSystemView



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated: [HUDI-5568] Fix the BucketStreamWriteFunction to rebase the local filesystem instance instead (#7685)

2023-01-30 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d90f286a197 [HUDI-5568] Fix the BucketStreamWriteFunction to rebase 
the local filesystem instance instead (#7685)
d90f286a197 is described below

commit d90f286a1971af952becea2267ed772f5e5e4ec7
Author: luokey <854194...@qq.com>
AuthorDate: Mon Jan 30 23:01:14 2023 -0500

[HUDI-5568] Fix the BucketStreamWriteFunction to rebase the local 
filesystem instance instead (#7685)

Should use `writeClient. getHoodieTable(). getHoodieView()` to determine 
the fileSystemView
---
 .../java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java
 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java
index c989b4eb29a..cf06dbc18d6 100644
--- 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java
+++ 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java
@@ -156,7 +156,7 @@ public class BucketStreamWriteFunction extends 
StreamWriteFunction {
 
 // Load existing fileID belongs to this task
 Map bucketToFileIDMap = new HashMap<>();
-
this.writeClient.getHoodieTable().getFileSystemView().getAllFileGroups(partition).forEach(fileGroup
 -> {
+
this.writeClient.getHoodieTable().getHoodieView().getAllFileGroups(partition).forEach(fileGroup
 -> {
   String fileID = fileGroup.getFileGroupId().getFileId();
   int bucketNumber = BucketIdentifier.bucketIdFromFileId(fileID);
   if (isBucketToLoad(bucketNumber, partition)) {



[GitHub] [hudi] danny0405 merged pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-30 Thread via GitHub


danny0405 merged PR #7685:
URL: https://github.com/apache/hudi/pull/7685


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #7795: [HUDI-5651] sort the inputs by record keys for bulk insert tasks

2023-01-30 Thread via GitHub


danny0405 commented on code in PR #7795:
URL: https://github.com/apache/hudi/pull/7795#discussion_r1091424466


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -139,10 +142,17 @@ public static DataStreamSink 
bulkInsert(Configuration conf, RowType rowT
 dataStream = dataStream.partitionCustom(partitioner, 
rowDataKeyGen::getPartitionPath);
   }
   if (conf.getBoolean(FlinkOptions.WRITE_BULK_INSERT_SORT_INPUT)) {
-SortOperatorGen sortOperatorGen = new SortOperatorGen(rowType, 
partitionFields);
-// sort by partition keys
+String[] sortFields = partitionFields;
+if 
(conf.getBoolean(FlinkOptions.WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY)) {
+  // sort by record keys
+  ArrayList sortList = new 
ArrayList<>(Arrays.asList(partitionFields));
+  Collections.addAll(sortList, 
conf.getString(FlinkOptions.RECORD_KEY_FIELD).split(","));
+  sortFields = sortList.toArray(new String[0]);
+}
+SortOperatorGen sortOperatorGen = new SortOperatorGen(rowType, 
sortFields);
+// sort by partition keys or (partition keys and record keys)
 dataStream = dataStream
-.transform("partition_key_sorter",
+.transform("partition_and_record_key_sorter",
 InternalTypeInfo.of(rowType),
 sortOperatorGen.createSortOperator())

Review Comment:
   Generate the operator name as `sorter:(partition_key)` or 
`sorter:(partition_key, record_key)` based on the given configuration.
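   A tiny, hypothetical helper illustrating that naming scheme; `WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY` is the option introduced in the diff above, while the class and method names are assumptions:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.hudi.configuration.FlinkOptions;

// Hypothetical helper: derive the bulk-insert sorter operator name from the
// configured sort fields instead of hard-coding a single name.
public class SorterNames {
  public static String bulkInsertSorterName(Configuration conf) {
    return conf.getBoolean(FlinkOptions.WRITE_BULK_INSERT_SORT_INPUT_BY_RECORD_KEY)
        ? "sorter:(partition_key, record_key)"
        : "sorter:(partition_key)";
  }
}
```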



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5657) Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5657:
-
Labels: pull-request-available  (was: )

> Fix NPE if filters condition contains null literal when using column stats 
> data skipping for flink
> --
>
> Key: HUDI-5657
> URL: https://issues.apache.org/jira/browse/HUDI-5657
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink, flink-sql
>Reporter: Jing Zhang
>Assignee: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
>
> If the filter contains null literal, NPE would be thrown out. 
> For example:
> "where a < null", "where a <> null", "where a <= null".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] beyond1920 opened a new pull request, #7801: [HUDI-5657] Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread via GitHub


beyond1920 opened a new pull request, #7801:
URL: https://github.com/apache/hudi/pull/7801

   [HUDI-5657] Fix NPE if filters condition contains null literal when using 
column stats data skipping for flink
   
   ### Change Logs
   
   Fix NPE if filters condition contains null literal when using column stats 
data skipping for flink
   
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   NA
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #7615: [HUDI-5510] Reload active timeline when commit finish

2023-01-30 Thread via GitHub


danny0405 commented on code in PR #7615:
URL: https://github.com/apache/hudi/pull/7615#discussion_r1091422171


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -282,6 +282,8 @@ protected void commit(HoodieTable table, String commitActionType, String instant
     writeTableMetadata(table, instantTime, commitActionType, metadata);
     activeTimeline.saveAsComplete(new HoodieInstant(true, commitActionType, instantTime),
         Option.of(metadata.toJsonString().getBytes(StandardCharsets.UTF_8)));
+    // reload active timeline
+    table.getMetaClient().reloadActiveTimeline();
   }

Review Comment:
   A better choice is moving the logic into the `HoodieTimelineArchiver` constructor.
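
   A rough sketch of that suggestion, assuming the archiver keeps a reference to the table (the constructor shape below is an assumption, not the actual change):

   ```java
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.hudi.table.HoodieTable;

   // Assumed sketch: reload the active timeline once, when the archiver is created,
   // so archival decisions see instants that were just saved as complete.
   public class HoodieTimelineArchiver {
     private final HoodieWriteConfig config;
     private final HoodieTable table;

     public HoodieTimelineArchiver(HoodieWriteConfig config, HoodieTable table) {
       this.config = config;
       this.table = table;
       table.getMetaClient().reloadActiveTimeline();
     }
   }
   ```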








[GitHub] [hudi] hudi-bot commented on pull request #7799: [HUDI-5655] Closing write client for spark ds writer in all cases

2023-01-30 Thread via GitHub


hudi-bot commented on PR #7799:
URL: https://github.com/apache/hudi/pull/7799#issuecomment-1409699372

   
   ## CI report:
   
   * 9cca5f3cbc039e71feabad9f48050ec7ebce17a7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14785)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Assigned] (HUDI-5657) Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread Jing Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhang reassigned HUDI-5657:


Assignee: Jing Zhang

> Fix NPE if filters condition contains null literal when using column stats 
> data skipping for flink
> --
>
> Key: HUDI-5657
> URL: https://issues.apache.org/jira/browse/HUDI-5657
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink, flink-sql
>Reporter: Jing Zhang
>Assignee: Jing Zhang
>Priority: Major
>
> If the filter condition contains a null literal, an NPE is thrown.
> For example:
> "where a < null", "where a <> null", "where a <= null".





[jira] [Created] (HUDI-5657) Fix NPE if filters condition contains null literal when using column stats data skipping for flink

2023-01-30 Thread Jing Zhang (Jira)
Jing Zhang created HUDI-5657:


 Summary: Fix NPE if filters condition contains null literal when 
using column stats data skipping for flink
 Key: HUDI-5657
 URL: https://issues.apache.org/jira/browse/HUDI-5657
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink, flink-sql
Reporter: Jing Zhang


If the filter condition contains a null literal, an NPE is thrown.

For example:

"where a < null", "where a <> null", "where a <= null".





[GitHub] [hudi] phani482 opened a new issue, #7800: "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" while writing to Hudi COW table

2023-01-30 Thread via GitHub


phani482 opened a new issue, #7800:
URL: https://github.com/apache/hudi/issues/7800

   Hello Team,
   
   We are running a Glue streaming job which reads from Kinesis and writes to a Hudi COW table (S3) registered in the Glue catalog.
   The job has been running for ~1 year without issues. However, lately we started seeing OOM errors as below, without much insight from the logging.
   
   a. I tried moving the [.commits_.archive] files out of the .hoodie folder to reduce its size. This helped for a while, but the issue started to surface again.
   (s3:///prefix/.hoodie/.commits_.archive.1763_1-0-1)
   
   b. Here are the write options we are using for Apache Hudi Connector 0.9.0 (the retention settings are discussed briefly after this list):
 "hoodie.datasource.write.operation": "insert",
   "hoodie.insert.shuffle.parallelism": 10,
   "hoodie.bulkinsert.shuffle.parallelism": 10,
   "hoodie.upsert.shuffle.parallelism": 10,
   "hoodie.delete.shuffle.parallelism": 10,
   "hoodie.parquet.small.file.limit": 8 * 1000 * 1000,  # 8MB
   "hoodie.parquet.max.file.size": 10 * 1000 * 1000,  # 10 MB
   "hoodie.datasource.hive_sync.use_jdbc": "false",
   "hoodie.datasource.hive_sync.enable": "false",
   "hoodie.datasource.hive_sync.database": "database_name",
   "hoodie.datasource.hive_sync.table": "raw_table_name",
   "hoodie.datasource.hive_sync.partition_fields": "entity_name",
   "hoodie.datasource.hive_sync.partition_extractor_class": 
"org.apache.hudi.hive.MultiPartKeysValueExtractor",
   "hoodie.datasource.hive_sync.support_timestamp": "true",
   "hoodie.keep.min.commits": 1450,  
   "hoodie.keep.max.commits": 1500,  
   "hoodie.cleaner.commits.retained": 1449,
   
   Error:
   ###
   INFO:py4j.java_gateway:Received command  on object id 
INFO:py4j.java_gateway:Closing down callback connection
   --
   INFO:py4j.java_gateway:Callback Connection ready to receive messages
   INFO:py4j.java_gateway:Received command c on object id p0
   INFO:root:Batch ID: 160325 has 110 records
   #
   # java.lang.OutOfMemoryError: Requested array size exceeds VM limit
   # -XX:OnOutOfMemoryError="kill -9 %p"
   #   Executing /bin/sh -c "kill -9 7"...
   ###
   
   Q: We noticed that the ".commits_.archive" files are not being cleaned up by Hudi by default. Are there any settings we need to enable for this to happen?
   
   Q: Our .hoodie folder was ~1.5 GB in size before we started moving archive files out of it. Is that a huge size for a .hoodie folder? What are the best practices for keeping the .hoodie folder's size and object count under control?
   
   Q: The error logs don't give much more detail, and even using 20 G.1X DPUs on Glue does not seem to help (executor memory: 10 GB, driver memory: 10 GB, executor cores: 8). Our workload is not huge: we get a few thousand events every hour, roughly 1 million records a day, and the payload size is not more than ~300 KB.
   
   Please let me know if you need any further details
   
   Thanks
   





[GitHub] [hudi] liaotian1005 commented on pull request #7633: Fix Deletes issued without any prior commits

2023-01-30 Thread via GitHub


liaotian1005 commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1409678925

   > Hi, you'd better rebase with the latest master and force-push, there are some fixes for the CI failure.
   
   I fixed the test case, and one CI run is PENDING:
   6edffd10a0abadfc1f169b10f131b755c8e1280e Azure: [PENDING]





[GitHub] [hudi] liaotian1005 commented on pull request #7633: Fix Deletes issued without any prior commits

2023-01-30 Thread via GitHub


liaotian1005 commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1409675705

   > Hi, you'd better rebase with the latest master and force-push, there are some fixes for the CI failure.
   
   I fixed the test case, and one CI run is PENDING:
   `6edffd10a0abadfc1f169b10f131b755c8e1280e` Azure: [PENDING]
   
   




