[I] [SUPPORT] Cannot extract partition path with conf populateMetaFields set to false and dropPartitionColumns set to true [hudi]

2023-11-05 Thread via GitHub


zyl891229 opened a new issue, #9991:
URL: https://github.com/apache/hudi/issues/9991

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   Cannot extract the partition path when conf populateMetaFields is set to false 
and dropPartitionColumns is set to true.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write a hudi table using a Spark DataFrame:
   
   ```scala
   val structType = StructType(Array(
     StructField("name", StringType, nullable = true),
     StructField("age", IntegerType, nullable = true),
     StructField("city", StringType, nullable = true),
     StructField("map", MapType(StringType, StringType), nullable = true)
   ))
   ```
   
   hudi conf as follows:
   
   ```scala
   .option(DataSourceWriteOptions.TABLE_TYPE.key(), COW_TABLE_TYPE_OPT_VAL)
   .options(getQuickstartWriteConfigs)
   .option(PRECOMBINE_FIELD_OPT_KEY, "name")
   .option(RECORDKEY_FIELD_OPT_KEY, "name")
   .option(PARTITIONPATH_FIELD_OPT_KEY, "city,age")
   .option(HoodieWriteConfig.TBL_NAME.key(), tableName)
   .option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
   .option("hoodie.bulkinsert.overwrite.operation.type", "insert_overwrite")
   .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING.key(), "true")
   .option(HoodieTableConfig.POPULATE_META_FIELDS.key(), "false")
   .option("hoodie.datasource.write.drop.partition.columns", "true")
   .option(HoodieMetadataConfig.ENABLE.key(), "true")
   .option("hoodie.metadata.index.column.stats.enable", "true")
   .option("hoodie.metadata.index.column.stats.file.group.count", "8")
   .option("hoodie.metadata.index.column.stats.parallelism", "10")
   .option("hoodie.enable.data.skipping", "true")
   .option(HoodieCleanConfig.CLEANER_POLICY.key(), HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name())
   .option(HoodieCleanConfig.CLEANER_FILE_VERSIONS_RETAINED.key(), "1")
   .mode(SaveMode.Append)
   ```
   
   **Expected behavior**
   
   Able to parse partitions correctly
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.1.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : Local
   
   * Running on Docker? (yes/no) : no
   
   **Additional context**
   
   According to 
org.apache.hudi.table.action.commit.BulkInsertDataInternalWriterHelper#extractPartitionPath, 
when populateMetaFields is set to false, the partition fields are extracted 
directly from the Spark InternalRow. But according to 
org.apache.hudi.HoodieDatasetBulkInsertHelper#dropPartitionColumns, when 
dropPartitionColumns is set to true, the partition columns are dropped from the 
InternalRow before it reaches the writer. With these two settings combined, the 
following exception may occur.
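
   To make the conflict concrete, below is a minimal standalone sketch (an 
illustration, not Hudi's actual code) of why the partition-field lookup fails 
once the columns are dropped:

   ```scala
   import org.apache.spark.sql.types._

   // The table schema and partition fields from the reproduction above.
   val tableSchema = StructType(Array(
     StructField("name", StringType, nullable = true),
     StructField("age", IntegerType, nullable = true),
     StructField("city", StringType, nullable = true)))
   val partitionFields = Seq("city", "age")

   // hoodie.datasource.write.drop.partition.columns=true removes "city" and "age"
   // before the rows reach the bulk-insert writer.
   val writtenSchema = StructType(tableSchema.filterNot(f => partitionFields.contains(f.name)))

   // With populateMetaFields=false there is no _hoodie_partition_path meta column
   // to fall back on, so resolving the partition fields against the written
   // schema fails, analogous to "Failed to resolve nested partition field" below.
   partitionFields.foreach { f =>
     require(writtenSchema.fieldNames.contains(f),
       s"Cannot resolve partition field '$f' in the written row schema")
   }
   ```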
   
   **Stacktrace**
   
   ```
   [INFO ] 2023-11-06 15:07:32,445 --> [dag-scheduler-event-loop] DAGScheduler.logInfo(Logging.scala:57): ResultStage 5 (collect at HoodieDatasetBulkInsertHelper.scala:154) failed in 508.376 s due to Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5) (30.24.66.252 executor driver): org.apache.hudi.exception.HoodieException: Failed to resolve nested partition field
   	at org.apache.hudi.keygen.BuiltinKeyGenerator$SparkRowAccessor.getRecordPartitionPathValues(BuiltinKeyGenerator.java:458)
   	at org.apache.hudi.keygen.ComplexKeyGenerator.getPartitionPath(ComplexKeyGenerator.java:83)
   	at org.apache.hudi.table.action.commit.BulkInsertDataInternalWriterHelper.extractPartitionPath(BulkInsertDataInternalWriterHelper.java:159)
   	at org.apache.hudi.table.action.commit.BulkInsertDataInternalWriterHelper.write(BulkInsertDataInternalWriterHelper.java:119)
   	at org.apache.hudi.HoodieDatasetBulkInsertHelper$.$anonfun$bulkInsert$2(HoodieDatasetBulkInsertHelper.scala:189)
   	at org.apache.hudi.HoodieDatasetBulkInsertHelper$.$anonfun$bulkInsert$2$adapted(HoodieDatasetBulkInsertHelper.scala:189)
   	at scala.collection.Iterator.foreach(Iterator.scala:941)
   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.foreach(WholeStageCodegenExec.scala:753)
   	at org.apache.hudi.HoodieDatasetBulkInsertHelper$.$anonfun$bulkInsert$1(HoodieDatasetBulkInsertHelper.scala:189)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:
   ```

[PR] [DOCS] update clustering configuration [hudi]

2023-11-05 Thread via GitHub


ksmou opened a new pull request, #9990:
URL: https://github.com/apache/hudi/pull/9990

   ### Change Logs
   
   Update clustering configuration.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Incoming batch schema is not compatible with the table's one [hudi]

2023-11-05 Thread via GitHub


njalan commented on issue #9980:
URL: https://github.com/apache/hudi/issues/9980#issuecomment-1794217113

   @ad1happy2go  I am sure I have completely removed all the data files. I tested 
many times. It is weird how this **Table's schema** was generated; it is totally 
different from the source table.





Re: [PR] [HUDI-6949] Spark support non-blocking concurrency control [hudi]

2023-11-05 Thread via GitHub


codope commented on PR #9921:
URL: https://github.com/apache/hudi/pull/9921#issuecomment-1794215375

   Azure CI is blocked and @xushiyan is looking into that. Meanwhile, I am 
running the CI tests locally for this PR, as we want to merge it before the beta 
release. If the tests pass, I will merge the PR.





[PR] [DOCS] update run_clustering procedure docs [hudi]

2023-11-05 Thread via GitHub


ksmou opened a new pull request, #9989:
URL: https://github.com/apache/hudi/pull/9989

   ### Change Logs
   
   Update run_clustering procedure docs.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-05 Thread via GitHub


codope merged PR #9871:
URL: https://github.com/apache/hudi/pull/9871





(hudi) branch master updated: [HUDI-2461] Support out of order commits in MDT with completion time view (#9871)

2023-11-05 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 61f35ebe423 [HUDI-2461] Support out of order commits in MDT with completion time view (#9871)
61f35ebe423 is described below

commit 61f35ebe423da3f8df7f5343c98fc74eb3d6eb7f
Author: Sagar Sumit 
AuthorDate: Mon Nov 6 12:23:18 2023 +0530

    [HUDI-2461] Support out of order commits in MDT with completion time view (#9871)
---
 .../client/timeline/HoodieTimelineArchiver.java|   5 +-
 .../metadata/HoodieBackedTableMetadataWriter.java  |   4 +-
 .../common/testutils/HoodieMetadataTestTable.java  |  17 +-
 .../hudi/client/TestJavaHoodieBackedMetadata.java  |  41 +
 .../functional/TestHoodieBackedMetadata.java   |  58 ++
 .../apache/hudi/io/TestHoodieTimelineArchiver.java | 201 +
 .../table/timeline/CompletionTimeQueryView.java|   8 +-
 .../table/timeline/HoodieDefaultTimeline.java  |   7 +-
 .../apache/hudi/common/util/CompactionUtils.java   |   8 +-
 .../hudi/metadata/HoodieTableMetadataUtil.java |   7 -
 .../hudi/common/util/TestCompactionUtils.java  |  73 
 .../sink/TestStreamWriteOperatorCoordinator.java   |  13 +-
 12 files changed, 179 insertions(+), 263 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
index dc761e23804..3277039f31b 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
@@ -56,7 +56,6 @@ import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 import static org.apache.hudi.client.utils.ArchivalUtils.getMinAndMaxInstantsToKeep;
-import static org.apache.hudi.common.table.timeline.HoodieTimeline.COMPACTION_ACTION;
 import static org.apache.hudi.common.table.timeline.HoodieTimeline.LESSER_THAN;
 import static org.apache.hudi.common.table.timeline.HoodieTimeline.compareTimestamps;
 
@@ -213,8 +212,8 @@ public class HoodieTimelineArchiver {
         return Collections.emptyList();
       } else {
         LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
-        earliestInstantToRetainCandidates.add(Option.of(new HoodieInstant(
-            HoodieInstant.State.COMPLETED, COMPACTION_ACTION, latestCompactionTime.get())));
+        earliestInstantToRetainCandidates.add(
+            completedCommitsTimeline.findInstantsModifiedAfterByCompletionTime(latestCompactionTime.get()).firstInstant());
       }
     } catch (Exception e) {
       throw new HoodieException("Error limiting instant archival based on metadata table", e);
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
index af15fc304de..ecdf93eda1d 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
@@ -1136,7 +1136,7 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableM
     // are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore.
     if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {
       final String compactionInstantTime = compactionInstant.getTimestamp();
-      if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitToRollbackInstantTime, compactionInstantTime)) {
+      if (commitToRollbackInstantTime.length() == compactionInstantTime.length() && HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitToRollbackInstantTime, compactionInstantTime)) {
         throw new HoodieMetadataException(String.format("Commit being rolled back %s is earlier than the latest compaction %s. "
             + "There are %d deltacommits after this compaction: %s", commitToRollbackInstantTime, compactionInstantTime,
             deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants()));
@@ -1359,7 +1359,7 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableM
     // Trigger compaction with suffixes based on the same instant time. This ensures that any future
     // delta commits synced over will not have an instant time lesser than the last completed instant on the
     // metadata table.
-    final String compactionInstantTime 

Re: [I] [SUPPORT] flink-sql writes hudi with TIMESTAMP; when queried from hive, the time is +8h; with TIMESTAMP_LTZ, the hive schema is bigint instead of timestamp [hudi]

2023-11-05 Thread via GitHub


GaoYaokun commented on issue #9864:
URL: https://github.com/apache/hudi/issues/9864#issuecomment-1794180934

   I saw #8867 and switched the Hudi version to 0.14.0. The timestamp and date 
fields then synced to hive correctly. Querying the hudi table through hive is 
also correct.
   
   hudi version: 0.14.0
   flink version 1.16.1
   hive version 3.1.2





Re: [PR] [HUDI-7030] update containsInstant without containsOrBeforeTimelineStarts to fix data lost [hudi]

2023-11-05 Thread via GitHub


Xoln commented on code in PR #9982:
URL: https://github.com/apache/hudi/pull/9982#discussion_r1382840721


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -439,7 +439,7 @@ public boolean containsInstant(String ts) {
     // Check for older timestamp which have sec granularity and an extension of DEFAULT_MILLIS_EXT may have been added via Timeline operations
     if (ts.length() == HoodieInstantTimeGenerator.MILLIS_INSTANT_TIMESTAMP_FORMAT_LENGTH && ts.endsWith(HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT)) {
       final String actualOlderFormatTs = ts.substring(0, ts.length() - HoodieInstantTimeGenerator.DEFAULT_MILLIS_EXT.length());
-      return containsOrBeforeTimelineStarts(actualOlderFormatTs);
+      return containsInstant(actualOlderFormatTs);

Review Comment:
   > The start instant comparison is needed because querying all historical 
instants is costly; the comparison is an expedience.
   
   When this works on the inflight timeline, it causes an incorrect result: an 
instant whose timestamp ends with DEFAULT_MILLIS_EXT is not contained in the 
inflight timeline, but the method returns true. 
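
   A minimal sketch of the behavior being described, using simplified stand-in 
types rather than Hudi's actual timeline classes:

   ```scala
   // Simplified stand-ins for HoodieDefaultTimeline's two lookups.
   case class Timeline(instants: Seq[String]) {
     def containsInstant(ts: String): Boolean = instants.contains(ts)
     // Treats any timestamp before the first instant as contained/archived.
     def containsOrBeforeTimelineStarts(ts: String): Boolean =
       instants.contains(ts) || instants.headOption.exists(first => ts.compareTo(first) < 0)
   }

   // An inflight timeline with one pending instant.
   val inflight = Timeline(Seq("20231106120000999"))
   // An older seconds-granularity timestamp (after stripping DEFAULT_MILLIS_EXT)
   // that is NOT part of the inflight timeline.
   val olderTs = "20231105110000"
   assert(!inflight.containsInstant(olderTs))               // correct: not contained
   assert(inflight.containsOrBeforeTimelineStarts(olderTs)) // incorrectly true
   ```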






Re: [I] Incoming batch schema is not compatible with the table's one [hudi]

2023-11-05 Thread via GitHub


ad1happy2go commented on issue #9980:
URL: https://github.com/apache/hudi/issues/9980#issuecomment-1794096535

   @njalan That means the old data with the name 'address' was probably not 
deleted properly. Can you confirm once? 





Re: [PR] [HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-05 Thread via GitHub


hehuiyuan commented on PR #9936:
URL: https://github.com/apache/hudi/pull/9936#issuecomment-1793993547

   @hudi-bot run azure





Re: [PR] [HUDI-7032] ShowProcedures should add limit syntax to stay consistent [hudi]

2023-11-05 Thread via GitHub


xuzifu666 commented on code in PR #9988:
URL: https://github.com/apache/hudi/pull/9988#discussion_r1382720887


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowSavepointsProcedure.scala:
##
@@ -54,7 +56,11 @@ class ShowSavepointsProcedure extends BaseProcedure with ProcedureBuilder {
     val commits: util.List[HoodieInstant] = timeline.getReverseOrderedInstants.collect(Collectors.toList[HoodieInstant])
 
     if (commits.isEmpty) Seq.empty[Row] else {
-      commits.toArray.map(instant => instant.asInstanceOf[HoodieInstant].getTimestamp).map(p => Row(p)).toSeq
+      if (limit.isDefined) {
+        commits.stream().limit(limit.get.asInstanceOf[Int]).toArray.map(instant => instant.asInstanceOf[HoodieInstant].getTimestamp).map(p => Row(p)).toSeq

Review Comment:
   OK, I will change it and add a UT for it.







Re: [PR] [HUDI-7031] CopyToTempView support cache for improving performance [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on code in PR #9986:
URL: https://github.com/apache/hudi/pull/9986#discussion_r1382717155


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CopyToTempViewProcedure.scala:
##
@@ -56,6 +57,7 @@ class CopyToTempViewProcedure extends BaseProcedure with ProcedureBuilder with L
     val asOfInstant = getArgValueOrDefault(args, PARAMETERS(5)).get.asInstanceOf[String]
     val replace = getArgValueOrDefault(args, PARAMETERS(6)).get.asInstanceOf[Boolean]
     val global = getArgValueOrDefault(args, PARAMETERS(7)).get.asInstanceOf[Boolean]
+    val cache = getArgValueOrDefault(args, PARAMETERS(8)).get.asInstanceOf[Boolean]
 

Review Comment:
   @boneanxs Can you help with the review?






Re: [PR] [HUDI-7032] ShowProcedures should add limit syntax to stay consistent [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on code in PR #9988:
URL: https://github.com/apache/hudi/pull/9988#discussion_r1382716912


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowSavepointsProcedure.scala:
##
@@ -54,7 +56,11 @@ class ShowSavepointsProcedure extends BaseProcedure with ProcedureBuilder {
     val commits: util.List[HoodieInstant] = timeline.getReverseOrderedInstants.collect(Collectors.toList[HoodieInstant])
 
     if (commits.isEmpty) Seq.empty[Row] else {
-      commits.toArray.map(instant => instant.asInstanceOf[HoodieInstant].getTimestamp).map(p => Row(p)).toSeq
+      if (limit.isDefined) {
+        commits.stream().limit(limit.get.asInstanceOf[Int]).toArray.map(instant => instant.asInstanceOf[HoodieInstant].getTimestamp).map(p => Row(p)).toSeq

Review Comment:
Can we abstract the limit handling into common logic in the parent class, so 
that the parent class is responsible for limiting and collecting the data frame? 
And can we add a test for it?
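
A minimal sketch of the suggested refactor, assuming a hypothetical `LimitSupport` trait (not the actual BaseProcedure API):

```scala
import org.apache.spark.sql.Row

// Hypothetical trait: the parent owns the optional-limit handling so each
// Show*Procedure only has to build its full row sequence.
trait LimitSupport {
  def withLimit(rows: Seq[Row], limit: Option[Any]): Seq[Row] =
    limit match {
      case Some(n: Int) => rows.take(n)
      case _            => rows
    }
}

// Usage sketch inside a procedure (names assumed):
// withLimit(commits.map(i => Row(i.getTimestamp)), getArgValueOrDefault(args, LIMIT_PARAM))
```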






[jira] [Updated] (HUDI-7032) ShowProcedures should add limit syntax to stay consistent

2023-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7032:
-
Labels: pull-request-available  (was: )

> ShowProcedures should add limit syntax to stay consistent 
> --
>
> Key: HUDI-7032
> URL: https://issues.apache.org/jira/browse/HUDI-7032
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> like ShowArchivedCommitsProcedure, which contains a limit to avoid returning 
> too much status to the user; keep the other commands consistent with it



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7032] ShowProcedures should add limit syntax to stay consistent [hudi]

2023-11-05 Thread via GitHub


xuzifu666 commented on PR #9988:
URL: https://github.com/apache/hudi/pull/9988#issuecomment-1793950894

   cc @danny0405 





Re: [PR] [HUDI-7031] CopyToTempView support cache for improving performance [hudi]

2023-11-05 Thread via GitHub


xuzifu666 commented on PR #9986:
URL: https://github.com/apache/hudi/pull/9986#issuecomment-1793950440

   cc @danny0405 





Re: [PR] [HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on code in PR #9936:
URL: https://github.com/apache/hudi/pull/9936#discussion_r1382706501


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestSchemaEvolution.java:
##
@@ -480,16 +480,16 @@ private ExpectedResult(String[] evolvedRows, String[] rowsWithMeta, String[] row
       "+I[Alice, 9.9, unknown, +I[9, 9, s9, 99, t9, drop_add9], {Alice=.99}, [.0, .0], +I[9, 9], [9], {k9=v9}]",
     },
     new String[] {
-      "+I[uuid:id0, Indica, null, 12, null, {Indica=1212.0}, [12.0], null, null, null]",
-      "+I[uuid:id1, Danny, 1.1, 23, +I[1, 1, s1, 11, t1, drop_add1], {Danny=2323.23}, [23.0, 23.0, 23.0], +I[1, 1], [1], {k1=v1}]",
-      "+I[uuid:id2, Stephen, null, 33, +I[2, null, s2, 2, null, null], {Stephen=.0}, [33.0], null, null, null]",
-      "+I[uuid:id3, Julian, 3.3, 53, +I[3, 3, s3, 33, t3, drop_add3], {Julian=5353.53}, [53.0], +I[3, 3], [3], {k3=v3}]",
-      "+I[uuid:id4, Fabian, null, 31, +I[4, null, s4, 4, null, null], {Fabian=3131.0}, [31.0], null, null, null]",
-      "+I[uuid:id5, Sophia, null, 18, +I[5, null, s5, 5, null, null], {Sophia=1818.0}, [18.0, 18.0], null, null, null]",
-      "+I[uuid:id6, Emma, null, 20, +I[6, null, s6, 6, null, null], {Emma=2020.0}, [20.0], null, null, null]",
-      "+I[uuid:id7, Bob, null, 44, +I[7, null, s7, 7, null, null], {Bob=.0}, [44.0, 44.0], null, null, null]",

Review Comment:
   You can rebase with the latest master to resolve this test failure.






Re: [I] [SUPPORT] hoodie.table.cdc.enabled 'There should be a cdc log file.' read error with new partition. [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on issue #9987:
URL: https://github.com/apache/hudi/issues/9987#issuecomment-1793944107

   Nice findings! Do you have spare time to contribute a fix patch?





[jira] [Updated] (HUDI-6990) Configurable clustering task parallelism

2023-11-05 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-6990:

Summary: Configurable clustering task parallelism  (was: Spark clustering 
job reads records support control the parallelism)

> Configurable clustering task parallelism
> 
>
> Key: HUDI-6990
> URL: https://issues.apache.org/jira/browse/HUDI-6990
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: clustering
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.14.1
>
> Attachments: after-subtasks.png, before-subtasks.png
>
>
> When Spark executes a clustering job, it reads the clustering plan, which 
> contains multiple groups. Each group processes many base files or log files. 
> When we config the param `hoodie.clustering.plan.strategy.sort.columns`, those 
> files are read through Spark's parallelize method, and every file read 
> generates one subtask, as sketched below. That is unreasonable.
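
A minimal sketch of the proposed behavior, with placeholder file names and a 
hard-coded cap standing in for the new config (an illustration, not the actual 
patch):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("clustering-read-sketch").getOrCreate()

// Placeholder file list for one clustering group.
val filePaths = Seq("f1.parquet", "f2.parquet", "f3.parquet", "f4.parquet")

// Today: parallelize(filePaths, filePaths.size) -- one Spark subtask per file.
// Proposed: cap the number of read partitions with a configurable parallelism.
val readParallelism = math.min(filePaths.size, 2) // 2 stands in for the new config value
val rdd = spark.sparkContext.parallelize(filePaths, readParallelism)
```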





Re: [PR] [Docs] add function doc for hudi table type change command [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on PR #9985:
URL: https://github.com/apache/hudi/pull/9985#issuecomment-1793942722

   Maybe there are some syntax errors.





Re: [PR] [Docs] add function doc for hudi table type change command [hudi]

2023-11-05 Thread via GitHub


waitingF commented on PR #9985:
URL: https://github.com/apache/hudi/pull/9985#issuecomment-1793754506

   > Did you build the website in your local env? The CI failed.
   
   I built it in my local env, but it failed too. The error message is the same 
as for the last commit 
[99e573a9f399e868073b45b6b1208060ea0cfc49](https://github.com/apache/hudi/actions/runs/6752073823/job/18356985170)





[PR] [HUDI-7031] ShowProcedures should add limit syntax to stay consistent [hudi]

2023-11-05 Thread via GitHub


xuzifu666 opened a new pull request, #9988:
URL: https://github.com/apache/hudi/pull/9988

   ### Change Logs
   
   like ShowArchivedCommitsProcedure, which contains a limit to avoid returning 
too much status to the user; keep the other commands consistent with it
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7032) ShowProcedures should add limit syntax to stay consistent

2023-11-05 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7032:
-
Component/s: spark-sql

> ShowProcedures should add limit syntax to stay consistent 
> --
>
> Key: HUDI-7032
> URL: https://issues.apache.org/jira/browse/HUDI-7032
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> like ShowArchivedCommitsProcedure, which contains a limit to avoid returning 
> too much status to the user; keep the other commands consistent with it





[jira] [Created] (HUDI-7032) ShowProcedures should add limit syntax to stay consistent

2023-11-05 Thread xy (Jira)
xy created HUDI-7032:


 Summary: ShowProcedures should add limit syntax to stay consistent 
 Key: HUDI-7032
 URL: https://issues.apache.org/jira/browse/HUDI-7032
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xy
Assignee: xy


like ShowArchivedCommitsProcedure, which contains a limit to avoid returning too 
much status to the user; keep the other commands consistent with it





Re: [I] [SUPPORT] Trino queries failing when hudi.metadata_enabled is set to true. [hudi]

2023-11-05 Thread via GitHub


codope commented on issue #9758:
URL: https://github.com/apache/hudi/issues/9758#issuecomment-1793733363

   @danny0405 I see that the limit was hard-coded to 3 in 
https://github.com/apache/hudi/commit/f1286c2c764d6be9f23b41c76f4de1c8734c1f3b. 
Should we determine this based on compaction max delta commits?
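
   A one-line sketch of the suggestion, assuming a plain map stands in for the 
write config (hypothetical code; the key is the metadata-table compaction 
trigger being referred to):

   ```scala
   val conf = Map("hoodie.metadata.compact.max.delta.commits" -> "10")
   // Derive the limit from the compaction trigger instead of the hard-coded 3.
   val limit = conf.getOrElse("hoodie.metadata.compact.max.delta.commits", "10").toInt
   ```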





Re: [PR] [HUDI-7031] CopyToTempView support cache for improving performance [hudi]

2023-11-05 Thread via GitHub


xuzifu666 closed pull request #9986: [HUDI-7031] CopyToTempView support cache 
for improving performance
URL: https://github.com/apache/hudi/pull/9986





[I] [SUPPORT] hoodie.table.cdc.enabled 'There should be a cdc log file.' read error with new partition. [hudi]

2023-11-05 Thread via GitHub


Hans-Raintree opened a new issue, #9987:
URL: https://github.com/apache/hudi/issues/9987

   **Describe the problem you faced**
   
   When reading incrementally with format 'cdc' the read fails when there was 
both an insert and a delete in the last write for a new partition. Also fails 
if there was just a delete.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   ```python
   output_path = ''
   
   hudiOptions = {
   'hoodie.table.name': 'test',
   'hoodie.datasource.write.recordkey.field': '_id',
   'hoodie.datasource.write.precombine.field': 'replicadmstimestamp',
   'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator', 
   'hoodie.datasource.write.partitionpath.field': 'partition',
   'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.AWSDmsAvroPayload',
   'hoodie.table.cdc.enabled': 'true',
   'hoodie.table.cdc.supplemental.logging.mode': 'data_before_after'
   }
   
   data = [("1", "I", "2023-06-14 15:46:06.953746", "A", "A")]
   df = spark.createDataFrame(data, ["_id", "Op", "replicadmstimestamp", "code", "partition"])
   
   df.write \
   .format('org.apache.hudi') \
   .option('hoodie.datasource.write.operation', 'upsert') \
   .options(**hudiOptions) \
   .mode('append') \
   .save(output_path)
   
   data = [("10", "I", "2023-06-15 15:48:06.953746", "B", "B"),
   ("10", "D", "2023-06-15 15:49:06.953746", "B", "B")]
   df = spark.createDataFrame(data, ["_id", "Op", "replicadmstimestamp", "code", "partition"])
   
   df.write \
   .format('org.apache.hudi') \
   .option('hoodie.datasource.write.operation', 'upsert') \
   .options(**hudiOptions) \
   .mode('append') \
   .save(output_path)
   
   read_options = {
   'hoodie.datasource.query.type': 'incremental',
   'hoodie.datasource.read.begin.instanttime': '0',
   'hoodie.datasource.query.incremental.format': 'cdc'
   }
   
   df = spark.read \
   .format('org.apache.hudi') \
   .options(**read_options) \
   .load(output_path)
   df.show()
   ```
   Also, if the second write was just:
   
   `("10", "D", "2023-06-15 15:49:06.953746", "B", "B")`
   
   it would fail with the same error.
   
   **Expected behavior**
   
   Read doesn't fail.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.4.0
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   I think the issue is probably here:
   
   ```java
   if (WriteOperationType.isDelete(operation) && writeStat.getNumWrites() == 0L && writeStat.getNumDeletes() != 0) {
     // This is a delete operation wherein all the records in this file group are deleted
     // and no records have been written out a new file.
     // So, we find the previous file that this operation delete from, and treat each of
     // records as a deleted one.
     HoodieBaseFile beforeBaseFile = getOrCreateFsView().getBaseFileOn(
         fileGroupId.getPartitionPath(), writeStat.getPrevCommit(), fileGroupId.getFileId()
     ).orElseThrow(() ->
         new HoodieIOException("Can not get the previous version of the base file")
     );
     FileSlice beforeFileSlice = new FileSlice(fileGroupId, writeStat.getPrevCommit(), beforeBaseFile, Collections.emptyList());
     cdcFileSplit = new HoodieCDCFileSplit(instantTs, BASE_FILE_DELETE, new ArrayList<>(), Option.of(beforeFileSlice), Option.empty());
   } else if (writeStat.getNumUpdateWrites() == 0L && writeStat.getNumDeletes() == 0
       && writeStat.getNumWrites() == writeStat.getNumInserts()) {
     // all the records in this file are new.
     cdcFileSplit = new HoodieCDCFileSplit(instantTs, BASE_FILE_INSERT, path);
   } else {
     throw new HoodieException("There should be a cdc log file.");
   }
   ```
   
   Not sure what exactly is going wrong here; maybe writeStat.getNumDeletes() is 
somehow greater than 0, although it should be 0 because nothing actually got 
deleted.
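
   To show how a write stat can fall through both branches above, here is a 
minimal sketch with simplified stand-in types and illustrative numbers (not 
values taken from the actual run):

   ```scala
   // Simplified stand-in for HoodieWriteStat's counters.
   case class Stat(numWrites: Long, numDeletes: Long, numUpdateWrites: Long, numInserts: Long)

   def classify(isDeleteOp: Boolean, s: Stat): String =
     if (isDeleteOp && s.numWrites == 0L && s.numDeletes != 0) "BASE_FILE_DELETE"
     else if (s.numUpdateWrites == 0L && s.numDeletes == 0 && s.numWrites == s.numInserts) "BASE_FILE_INSERT"
     else "There should be a cdc log file." // the failing branch

   // An upsert into a brand-new partition that also deletes a record could
   // report, say, numDeletes=1 on a non-delete operation: neither branch matches.
   assert(classify(isDeleteOp = false, Stat(1, 1, 0, 1)) == "There should be a cdc log file.")
   ```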
   
   **Stacktrace**
   
   ```
   An error was encountered:
   An error occurred while calling o389.showString.
   : org.apache.hudi.exception.HoodieException: There should be a cdc log file.
   	at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.parseWriteStat(HoodieCDCExtractor.java:276)
   	at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.lambda$extractCDCFileSplits$1(HoodieCDCExtractor.java:131)
   	at java.util.ArrayList.forEach(ArrayList.java:1259)
   	at org.apache.hudi.common.table.cdc.HoodieCDCExtractor.extractCDCFileSplits(HoodieCDCExtractor.java:126)
   	at org.apache.hudi.cdc.CDCRelation.buildScan0(CDCRelation.scala:105)
   	at org.apache.hudi.cdc.CDCRelation.buildScan(CDCRelation.scala:87)
   	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$ap
   ```

[jira] [Updated] (HUDI-7031) CopyToTempView support cache for improving performance

2023-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7031:
-
Labels: pull-request-available  (was: )

> CopyToTempView support cache for improving performance
> --
>
> Key: HUDI-7031
> URL: https://issues.apache.org/jira/browse/HUDI-7031
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> when the user is in Spark SQL session mode, the temp view cannot be cached for 
> read_optimized or snapshot queries, so we need to support it to improve SQL in 
> query or join scenarios





[PR] [HUDI-7031] CopyToTempView support cache for improving performance [hudi]

2023-11-05 Thread via GitHub


xuzifu666 opened a new pull request, #9986:
URL: https://github.com/apache/hudi/pull/9986

   ### Change Logs
   
   when the user is in Spark SQL session mode, the temp view cannot be cached 
for read_optimized or snapshot queries, so we need to support it to improve SQL 
in query or join scenarios; CopyToTempView supports cache for improving 
performance
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7031) CopyToTempView support cache for improving performance

2023-11-05 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7031:
-
Summary: CopyToTempView support cache for improving performance  (was: 
CopyToTempView support cache for improve performance)

> CopyToTempView support cache for improving performance
> --
>
> Key: HUDI-7031
> URL: https://issues.apache.org/jira/browse/HUDI-7031
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> when the user is in Spark SQL session mode, the temp view cannot be cached for 
> read_optimized or snapshot queries, so we need to support it to improve SQL in 
> query or join scenarios





[jira] [Updated] (HUDI-7031) CopyToTempView support cache for improve performance

2023-11-05 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7031:
-
Description: when the user is in Spark SQL session mode, the temp view cannot 
be cached for read_optimized or snapshot queries, so we need to support it to 
improve SQL in query or join scenarios  (was: when the user is in Spark SQL 
session mode, the temp view cannot be cached for read_optimized or snapshot 
queries, so we need to support it)

> CopyToTempView support cache for improve performance
> 
>
> Key: HUDI-7031
> URL: https://issues.apache.org/jira/browse/HUDI-7031
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> when the user is in Spark SQL session mode, the temp view cannot be cached for 
> read_optimized or snapshot queries, so we need to support it to improve SQL in 
> query or join scenarios





[jira] [Updated] (HUDI-7031) CopyToTempView need support cache for improve performance

2023-11-05 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7031:
-
Issue Type: Improvement  (was: Bug)

> CopyToTempView need support cache for improve performance
> -
>
> Key: HUDI-7031
> URL: https://issues.apache.org/jira/browse/HUDI-7031
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> when the user is in Spark SQL session mode, the temp view cannot be cached for 
> read_optimized or snapshot queries, so we need to support it





[jira] [Updated] (HUDI-7031) CopyToTempView support cache for improve performance

2023-11-05 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7031:
-
Summary: CopyToTempView support cache for improve performance  (was: 
CopyToTempView need support cache for improve performance)

> CopyToTempView support cache for improve performance
> 
>
> Key: HUDI-7031
> URL: https://issues.apache.org/jira/browse/HUDI-7031
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> when the user is in Spark SQL session mode, the temp view cannot be cached for 
> read_optimized or snapshot queries, so we need to support it





[jira] [Created] (HUDI-7031) CopyToTempView need support cache for improve performance

2023-11-05 Thread xy (Jira)
xy created HUDI-7031:


 Summary: CopyToTempView need support cache for improve performance
 Key: HUDI-7031
 URL: https://issues.apache.org/jira/browse/HUDI-7031
 Project: Apache Hudi
  Issue Type: Bug
  Components: spark-sql
Affects Versions: 1.0.0
Reporter: xy
Assignee: xy


when the user is in Spark SQL session mode, the temp view cannot be cached for 
read_optimized or snapshot queries, so we need to support it





[jira] [Updated] (HUDI-7031) CopyToTempView need support cache for improve performance

2023-11-05 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7031:
-
Affects Version/s: 0.14.0
   (was: 1.0.0)

> CopyToTempView need support cache for improve performance
> -
>
> Key: HUDI-7031
> URL: https://issues.apache.org/jira/browse/HUDI-7031
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Affects Versions: 0.14.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> when the user is in Spark SQL session mode, the temp view cannot be cached for 
> read_optimized or snapshot queries, so we need to support it





Re: [I] Incoming batch schema is not compatible with the table's one [hudi]

2023-11-05 Thread via GitHub


njalan commented on issue #9980:
URL: https://github.com/apache/hudi/issues/9980#issuecomment-1793715912

   @ad1happy2go  I removed the hudi table and also removed all the files, but I 
still got the same error message





Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-05 Thread via GitHub


codope commented on code in PR #9871:
URL: https://github.com/apache/hudi/pull/9871#discussion_r1382548171


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -228,9 +228,10 @@ public HoodieDefaultTimeline findInstantsInRangeByCompletionTime(String startTs,
   @Override
   public HoodieDefaultTimeline findInstantsModifiedAfterByCompletionTime(String instantTime) {
     return new HoodieDefaultTimeline(instants.stream()
-        .filter(s -> s.getCompletionTime() != null
-            && HoodieTimeline.compareTimestamps(s.getCompletionTime(), GREATER_THAN, instantTime)
-            && !s.getTimestamp().equals(instantTime)), details);
+        // either pending or completionTime greater than instantTime
+        .filter(s -> (s.getCompletionTime() == null && compareTimestamps(s.getTimestamp(), GREATER_THAN, instantTime))
+            || (compareTimestamps(s.getCompletionTime(), GREATER_THAN, instantTime) && !s.getTimestamp().equals(instantTime))),

Review Comment:
   My bad, I meant to keep it; added now. Thanks for pointing it out.






Re: [PR] [MINOR] Fix npe for get internal schema [hudi]

2023-11-05 Thread via GitHub


xiarixiaoyao commented on code in PR #9984:
URL: https://github.com/apache/hudi/pull/9984#discussion_r1382534301


##
hudi-common/src/main/java/org/apache/hudi/common/util/InternalSchemaCache.java:
##
@@ -217,7 +217,11 @@ public static InternalSchema getInternalSchemaByVersionId(long versionId, String
     }
     InternalSchema fileSchema = InternalSchemaUtils.searchSchema(versionId, SerDeHelper.parseSchemas(latestHistorySchema));
     // step3:
-    return fileSchema.isEmptySchema() ? AvroInternalSchemaConverter.convert(HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(avroSchema))) : fileSchema;
+    return fileSchema.isEmptySchema()
+        ? (StringUtils.isNullOrEmpty(avroSchema)
+            ? InternalSchema.getEmptyInternalSchema()
+            : AvroInternalSchemaConverter.convert(HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(avroSchema))))
+        : fileSchema;

Review Comment:
   Thanks for your fix. Why is the avro schema null here?






Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on code in PR #9871:
URL: https://github.com/apache/hudi/pull/9871#discussion_r1382533555


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -228,9 +228,10 @@ public HoodieDefaultTimeline findInstantsInRangeByCompletionTime(String startTs,
   @Override
   public HoodieDefaultTimeline findInstantsModifiedAfterByCompletionTime(String instantTime) {
     return new HoodieDefaultTimeline(instants.stream()
-        .filter(s -> s.getCompletionTime() != null
-            && HoodieTimeline.compareTimestamps(s.getCompletionTime(), GREATER_THAN, instantTime)
-            && !s.getTimestamp().equals(instantTime)), details);
+        // either pending or completionTime greater than instantTime
+        .filter(s -> (s.getCompletionTime() == null && compareTimestamps(s.getTimestamp(), GREATER_THAN, instantTime))
+            || (compareTimestamps(s.getCompletionTime(), GREATER_THAN, instantTime) && !s.getTimestamp().equals(instantTime))),

Review Comment:
   Is the `s.getCompletionTime() != null` check missing?
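
   A minimal sketch of the filter with the null check restored, using simplified 
stand-in types rather than Hudi's actual classes:

   ```scala
   // Simplified stand-in for HoodieInstant: completionTime is None while pending.
   case class Instant(timestamp: String, completionTime: Option[String])

   // Keep pending instants that start after instantTime, and completed instants
   // whose completion time is strictly greater than instantTime (excluding the
   // instant itself). The Option match plays the role of the null check.
   def modifiedAfterByCompletionTime(instants: Seq[Instant], instantTime: String): Seq[Instant] =
     instants.filter {
       case Instant(ts, None)       => ts.compareTo(instantTime) > 0
       case Instant(ts, Some(done)) => done.compareTo(instantTime) > 0 && ts != instantTime
     }
   ```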






Re: [PR] [Docs] add function doc for hudi table type change command [hudi]

2023-11-05 Thread via GitHub


danny0405 commented on PR #9985:
URL: https://github.com/apache/hudi/pull/9985#issuecomment-1793675546

   Did you build the website in your local env? The CI failed.

