Re: [I] [SUPPORT] Poor parallelism in BLOOM indexing stage with Hudi 0.12.3 [hudi]

2023-11-15 Thread via GitHub


ad1happy2go commented on issue #10115:
URL: https://github.com/apache/hudi/issues/10115#issuecomment-1813949629

   @ChiehFu We should derive it based on the input data size. Too many partitions 
will create extra tasks and add extra overhead time. Are those 2 TSV files 
gzipped? If so, that is why the job only has 3 tasks: gzip is an unsplittable 
format.
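   A minimal PySpark sketch of deriving a repartition count from the input size 
(the 128 MB-per-partition target, `partitions_for_input`, and `input_path` are 
illustrative assumptions, not from this thread):
   
   ```
   # Sketch only: partitions_for_input, input_path, and the 128 MB target are illustrative.
   import math

   def partitions_for_input(spark, input_path, target_bytes=128 * 1024 * 1024):
       # Sum the on-disk size of the input files via the Hadoop FileSystem API.
       jvm = spark._jvm
       hadoop_conf = spark._jsc.hadoopConfiguration()
       path = jvm.org.apache.hadoop.fs.Path(input_path)
       fs = path.getFileSystem(hadoop_conf)
       total_bytes = fs.getContentSummary(path).getLength()
       # At least one partition; roughly one partition per target_bytes of input.
       return max(1, math.ceil(total_bytes / target_bytes))

   # Gzipped TSVs load as one partition per file, so repartition after reading.
   df = spark.read.option("sep", "\t").csv(input_path)
   df = df.repartition(partitions_for_input(spark, input_path))
   ```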


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Poor parallelism in BLOOM indexing stage with Hudi 0.12.3 [hudi]

2023-11-15 Thread via GitHub


ChiehFu commented on issue #10115:
URL: https://github.com/apache/hudi/issues/10115#issuecomment-1813946952

   @ad1happy2go I see. In this case there were only 2 TSV files with a total 
size of 115.7 MiB.
   
   (screenshot: https://github.com/apache/hudi/assets/11819388/da21e2b8-4061-455d-bdd2-d9a33ebba051)
   
   Is repartitioning by 1 something we could apply universally to all our 
tables regardless of input data size? Or would it be better to derive the value 
based on some factor like input size? And would over-repartitioning cause any 
harm to upsert performance?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7099] Providing metrics for archive and defining some string constants [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10101:
URL: https://github.com/apache/hudi/pull/10101#issuecomment-1813945486

   
   ## CI report:
   
   * 178ef4eadac6ab6d009d86ab86d35babe952 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20942)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7071] Throw exceptions when clustering/index job fail [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10050:
URL: https://github.com/apache/hudi/pull/10050#issuecomment-1813945205

   
   ## CI report:
   
   * 40caf2cf77aa03c17ee84077b6c2d4752c542d48 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20815)
 
   * a46978c942649269675db590f2f65186b636e70a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20945)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on code in PR #10090:
URL: https://github.com/apache/hudi/pull/10090#discussion_r1395276883


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -410,10 +410,11 @@ public static DataStream hoodieStreamWrite(Configuration conf, DataStrea
    * @return the compaction pipeline
    */
   public static DataStreamSink compact(Configuration conf, DataStream dataStream) {
-    return dataStream.transform("compact_plan_generate",
+    DataStreamSink compactionCommitEventDataStream = dataStream.transform("compact_plan_generate",
         TypeInformation.of(CompactionPlanEvent.class),
         new CompactionPlanOperator(conf))
         .setParallelism(1) // plan generate must be singleton
+        .setMaxParallelism(1)

Review Comment:
   Is this line compatible with Flink releases before 1.18?



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##
@@ -207,6 +207,7 @@ public DataStream produceDataStream(StreamExecutionEnvironment execEnv)
       SingleOutputStreamOperator source = execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
           .uid(Pipelines.opUID("split_monitor", conf))
           .setParallelism(1)
+          .setMaxParallelism(1)

Review Comment:
   Is this line compatible with Flink releases before 1.18?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


voonhous commented on code in PR #10114:
URL: https://github.com/apache/hudi/pull/10114#discussion_r1395275760


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -41,6 +41,7 @@
 import org.apache.hudi.keygen.constant.KeyGeneratorType;
 import org.apache.hudi.sink.overwrite.PartitionOverwriteMode;
 import org.apache.hudi.table.action.cluster.ClusteringPlanPartitionFilterMode;
+import org.apache.hudi.table.action.compact.CompactionTriggerStrategy;
 import org.apache.hudi.util.ClientIds;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7105] support filesystem view configuable [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on code in PR #10116:
URL: https://github.com/apache/hudi/pull/10116#discussion_r1395275031


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java:
##
@@ -279,7 +278,13 @@ public static FileSystemViewManager createViewManager(final HoodieEngineContext
       throw new IllegalArgumentException("Secondary Storage type can only be in-memory or spillable. Was :"
           + viewConfig.getSecondaryStorageType());
     }
-      return new PriorityBasedFileSystemView(remoteFileSystemView, secondaryView);
+      if (config.isRemoteViewFirst()) {
+        LOG.info("Creating remote table view first");
+        return new PriorityBasedFileSystemView(remoteFileSystemView, secondaryView);
+      } else {
+        LOG.info("Creating secondary table view first");

Review Comment:
   cc @zhedoubushishi, who has also encountered OOM during async cleaning.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


voonhous commented on code in PR #10114:
URL: https://github.com/apache/hudi/pull/10114#discussion_r1395274432


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -41,6 +41,7 @@
 import org.apache.hudi.keygen.constant.KeyGeneratorType;
 import org.apache.hudi.sink.overwrite.PartitionOverwriteMode;
 import org.apache.hudi.table.action.cluster.ClusteringPlanPartitionFilterMode;
+import org.apache.hudi.table.action.compact.CompactionTriggerStrategy;
 import org.apache.hudi.util.ClientIds;

Review Comment:
   My bad, was doing some debugging and forgot to remove it. 
   
   Will remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on code in PR #10114:
URL: https://github.com/apache/hudi/pull/10114#discussion_r1395272032


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -41,6 +41,7 @@
 import org.apache.hudi.keygen.constant.KeyGeneratorType;
 import org.apache.hudi.sink.overwrite.PartitionOverwriteMode;
 import org.apache.hudi.table.action.cluster.ClusteringPlanPartitionFilterMode;
+import org.apache.hudi.table.action.compact.CompactionTriggerStrategy;
 import org.apache.hudi.util.ClientIds;

Review Comment:
   Why is the import of this class needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7099] Providing metrics for archive and defining some string constants [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on code in PR #10101:
URL: https://github.com/apache/hudi/pull/10101#discussion_r1395270648


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##
@@ -117,6 +122,10 @@ public boolean archiveIfRequired(HoodieEngineContext context, boolean acquireLoc
       } else {
         LOG.info("No Instants to archive");
       }
+      if (success && timerContext != null) {
+        long durationMs = metrics.getDurationInMs(timerContext.stop());

Review Comment:
   Can we move the metrics handling to the write client or the service client? The 
cleaning and rollback already follow this pattern.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] NotSerializableException using SparkRDDWriteClient with OCC and DynamoDBBasedLockProvider [hudi]

2023-11-15 Thread via GitHub


chym1303 commented on issue #9807:
URL: https://github.com/apache/hudi/issues/9807#issuecomment-1813925321

   Hi @ad1happy2go, DynamoDBBasedLockProvider and 
HiveMetastoreBasedLockProvider have the same issue as 
https://issues.apache.org/jira/browse/HUDI-3638: task not serializable in the clean 
action.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Poor parallelism in BLOOM indexing stage with Hudi 0.12.3 [hudi]

2023-11-15 Thread via GitHub


ad1happy2go commented on issue #10115:
URL: https://github.com/apache/hudi/issues/10115#issuecomment-1813912870

   @ChiehFu The number of tasks in the tagging step depends on how many partitions 
there are in the input DataFrame.
   
   By any chance, are you getting large zipped files in the source? You can 
repartition before writing to Hudi:
   
   `df.repartition(1).write.format`
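   
   A minimal sketch of the repartition-before-write pattern (the names `df`, 
`hudi_options`, and `PATH` are assumed to come from the surrounding job, not from 
this comment):
   
   ```
   # Sketch only: df, hudi_options, and PATH are assumed to exist in the job.
   (df.repartition(1)  # or a count derived from the input size, as discussed above
      .write
      .options(**hudi_options)
      .format("org.apache.hudi")
      .mode("append")
      .save(PATH))
   ```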


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7105) Add FileSystemViewManager configuable

2023-11-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7105:
-
Labels: clean pull-request-available  (was: clean)

> Add FileSystemViewManager configuable
> -
>
> Key: HUDI-7105
> URL: https://issues.apache.org/jira/browse/HUDI-7105
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: clean, pull-request-available
>
> If there exist many partitions and files when generating the clean plan, 
> it's easy to throw an OOM exception. Using secondaryFileSystemView first is more 
> stable than remoteFileSystemView.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7105] support filesystem view configuable [hudi]

2023-11-15 Thread via GitHub


ksmou opened a new pull request, #10116:
URL: https://github.com/apache/hudi/pull/10116

   ### Change Logs
   
   If there are many partitions and files when generating the clean plan, it's 
easy to throw an OOM exception. The default is to use the remote table view first, 
and it cannot fall back to the secondary table view if the remote view throws an OOM 
exception. Using the secondary view first is more stable than remoteFileSystemView.
   
   ### Impact
   
   N/A
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7071] Throw exceptions when clustering/index job fail [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10050:
URL: https://github.com/apache/hudi/pull/10050#issuecomment-1813899879

   
   ## CI report:
   
   * 40caf2cf77aa03c17ee84077b6c2d4752c542d48 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20815)
 
   * a46978c942649269675db590f2f65186b636e70a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10114:
URL: https://github.com/apache/hudi/pull/10114#issuecomment-1813884877

   
   ## CI report:
   
   * 5152ea66bd6f4a3c3f506bfe051ef4122973e908 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20941)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10090:
URL: https://github.com/apache/hudi/pull/10090#issuecomment-1813884740

   
   ## CI report:
   
   * 35219c2180342faea6e09987e69271508a3f0096 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20912)
 
   * 36d5e48d7b41740a2f94be92dd0fb45cbe4806de Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20944)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10090:
URL: https://github.com/apache/hudi/pull/10090#issuecomment-1813877522

   
   ## CI report:
   
   * 35219c2180342faea6e09987e69271508a3f0096 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20912)
 
   * 36d5e48d7b41740a2f94be92dd0fb45cbe4806de UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813877422

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * 34efaac278dde7fd73515e6d54418a6ff8815326 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20939)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Query failure due to replacecommit being archived [hudi]

2023-11-15 Thread via GitHub


ad1happy2go commented on issue #10107:
URL: https://github.com/apache/hudi/issues/10107#issuecomment-1813869857

   @haoxie-aws I tried to reproduce this with the OSS version but was not able 
to. Can you try with a later version? Below is the code I used.
   
   Writer 
   ```
   # Assumed to be defined elsewhere in the user's environment:
   # get_spark_session, TABLE_NAME, PATH
   import uuid
   from datetime import datetime

   from pyspark.sql import Row
   from pyspark.sql.types import StructType, StructField, StringType

   spark = get_spark_session(spark_version="3.2", hudi_version="0.11.0")

   def generateDataFrame():
       # Define the schema for the DataFrame
       schema = StructType([
           StructField("uuid", StringType(), True),
           StructField("index", StringType(), True),
           StructField("timestamp", StringType(), True)
       ])

       # Create a list of Row objects
       data = [Row(str(uuid.uuid4()), str(i), str(datetime.now())) for i in range(11)]

       # Parallelize the data using SparkContext and create an RDD
       rdd = spark.sparkContext.parallelize(data)

       # Create a DataFrame from the RDD and schema
       df = spark.createDataFrame(rdd, schema)

       return df

   def loop():
       # Hudi write options
       hudi_options = {
           "hoodie.table.name": TABLE_NAME,
           "hoodie.table.type": "COPY_ON_WRITE",
           "hoodie.datasource.write.recordkey.field": "uuid",
           "hoodie.datasource.write.precombine.field": "timestamp",
           "hoodie.datasource.write.operation": "upsert",
           "hoodie.parquet.max.file.size": "20971520",  # 20 MB
           "hoodie.parquet.small.file.limit": "0",
           "hoodie.keep.max.commits": "12",
           "hoodie.keep.min.commits": "11",
           "hoodie.bulkinsert.sort.mode": "NONE",
           "hoodie.clustering.inline": "true",
           "hoodie.clustering.inline.max.commits": "2",
           "hoodie.clustering.plan.strategy.small.file.limit": "20971520",  # 20 MB
           "clustering.plan.strategy.target.file.max.bytes": "31457280",  # 30 MB
           "hoodie.metadata.enable": "true"
       }

       # Write DataFrame to Hudi
       generateDataFrame().write.options(**hudi_options).format("org.apache.hudi") \
           .option("hoodie.datasource.write.hive_style_partitioning", "true") \
           .mode("append") \
           .save(PATH)

   if __name__ == "__main__":
       for _ in range(1001):
           loop()
   ```
   
   READER 
   ```
   spark = get_spark_session(spark_version="3.2", hudi_version="0.11.0")

   def loop():
       print(spark.read.format("hudi").load(PATH).count())
       spark.read.format("hudi").load(PATH).show()

   if __name__ == "__main__":
       for _ in range(1001):
           loop()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Poor parallelism in BLOOM indexing stage with Hudi 0.12.3 [hudi]

2023-11-15 Thread via GitHub


ChiehFu commented on issue #10115:
URL: https://github.com/apache/hudi/issues/10115#issuecomment-1813862633

   @ad1happy2go  I am not sure why it only had 3 tasks. This particular upsert 
job upserted 328,550 records.
   (screenshot: https://github.com/apache/hudi/assets/11819388/4826e4b8-9a90-4973-b040-6decb711fda2)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7105) Add FileSystemViewManager configuable

2023-11-15 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7105:

Description: If there exists many partitions and files When generating the 
clean plan, it's easy to throw oom exception. Using secondaryFileSystemView 
first is more stable than remoteFileSystemView.  (was: If there exists mang 
partitions and files When generating the clean plan, it's easy to throw oom 
exception. Using secondaryFileSystemView first is more stable than 
remoteFileSystemView.)

> Add FileSystemViewManager configuable
> -
>
> Key: HUDI-7105
> URL: https://issues.apache.org/jira/browse/HUDI-7105
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: clean
>
> If there exists many partitions and files When generating the clean plan, 
> it's easy to throw oom exception. Using secondaryFileSystemView first is more 
> stable than remoteFileSystemView.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7105) Add FileSystemViewManager configuable

2023-11-15 Thread kwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kwang updated HUDI-7105:

Description: If there exists mang partitions and files When generating the 
clean plan, it's easy to throw oom exception. Using secondaryFileSystemView 
first is more stable than remoteFileSystemView.  (was: If there exists mang 
partitions and files When generating the clean plan, it's easy to throw oom 
exception. Using secondaryFileSystemView is more stable than 
remoteFileSystemView.)

> Add FileSystemViewManager configuable
> -
>
> Key: HUDI-7105
> URL: https://issues.apache.org/jira/browse/HUDI-7105
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: clean
>
> If there exists mang partitions and files When generating the clean plan, 
> it's easy to throw oom exception. Using secondaryFileSystemView first is more 
> stable than remoteFileSystemView.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7099] Providing metrics for archive and defining some string constants [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10101:
URL: https://github.com/apache/hudi/pull/10101#issuecomment-1813839813

   
   ## CI report:
   
   * 2f97634b8b59e9f61dc05b649e78f9fe747c5ee5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20922)
 
   * 178ef4eadac6ab6d009d86ab86d35babe952 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20942)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6887] Add test for Record Index and MIT queries [hudi]

2023-11-15 Thread via GitHub


lokeshj1703 commented on PR #9760:
URL: https://github.com/apache/hudi/pull/9760#issuecomment-1813833900

   The test added here is passing locally but failing in the CI. I have to 
debug and fix the CI failure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7099] Providing metrics for archive and defining some string constants [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10101:
URL: https://github.com/apache/hudi/pull/10101#issuecomment-1813833815

   
   ## CI report:
   
   * 2f97634b8b59e9f61dc05b649e78f9fe747c5ee5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20922)
 
   * 178ef4eadac6ab6d009d86ab86d35babe952 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated (35af64db466 -> 874b5dec5e9)

2023-11-15 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 35af64db466 [Minor] Throw exceptions when cleaner/compactor fail 
(#10108)
 add 874b5dec5e9 [HUDI-6806] Support Spark 3.5.0 (#9717)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml  | 13 +++
 .../scala/org/apache/hudi/HoodieSparkUtils.scala   |  2 +
 .../org/apache/hudi/SparkAdapterSupport.scala  |  4 +-
 .../scala/org/apache/spark/sql/DataFrameUtil.scala |  6 +-
 .../spark/sql/HoodieCatalystExpressionUtils.scala  | 16 ++--
 .../org/apache/spark/sql/HoodieSchemaUtils.scala   |  9 +++
 .../org/apache/spark/sql/HoodieUnsafeUtils.scala   | 13 +--
 .../HoodieSparkPartitionedFileUtils.scala  | 20 +++--
 .../org/apache/spark/sql/hudi/SparkAdapter.scala   |  5 +-
 .../org/apache/hudi/avro/TestHoodieAvroUtils.java  |  4 +-
 .../hudi/common/util/TestClusteringUtils.java  |  2 +
 .../dag/nodes/BaseValidateDatasetNode.java | 13 +--
 .../scala/org/apache/hudi/HoodieBaseRelation.scala |  4 +-
 .../scala/org/apache/hudi/HoodieCDCFileIndex.scala |  2 +-
 .../scala/org/apache/hudi/HoodieFileIndex.scala|  9 ++-
 .../apache/hudi/HoodieIncrementalFileIndex.scala   |  9 ++-
 .../datasources/HoodieInMemoryFileIndex.scala  |  5 +-
 .../hudi/testutils/SparkDatasetTestUtils.java  | 19 ++---
 hudi-spark-datasource/hudi-spark/pom.xml   | 30 +++
 .../spark/sql/hudi/analysis/HoodieAnalysis.scala   | 19 -
 .../hudi/command/CallProcedureHoodieCommand.scala  |  6 +-
 .../hudi/command/CompactionHoodiePathCommand.scala |  5 +-
 .../command/CompactionHoodieTableCommand.scala |  5 +-
 .../command/CompactionShowHoodiePathCommand.scala  |  5 +-
 .../command/CompactionShowHoodieTableCommand.scala |  5 +-
 .../command/InsertIntoHoodieTableCommand.scala | 10 ++-
 .../TestBulkInsertInternalPartitionerForRows.java  |  0
 .../TestHoodieDatasetBulkInsertHelper.java | 19 ++---
 .../row/TestHoodieInternalRowParquetWriter.java|  0
 .../io/storage/row/TestHoodieRowCreateHandle.java  | 14 +++-
 .../hudi/testutils/KeyGeneratorTestUtilities.java  | 20 ++---
 .../org/apache/hudi/TestAvroConversionUtils.scala  |  2 +-
 .../read/TestHoodieFileGroupReaderOnSpark.scala|  9 ++-
 .../apache/spark/sql/hudi/TestInsertTable.scala| 22 +-
 hudi-spark-datasource/hudi-spark2/pom.xml  |  8 ++
 .../sql/HoodieSpark2CatalystExpressionUtils.scala  |  7 +-
 .../apache/spark/sql/HoodieSpark2SchemaUtils.scala |  6 ++
 .../apache/spark/sql/adapter/Spark2Adapter.scala   |  7 +-
 .../HoodieSpark2PartitionedFileUtils.scala | 12 ++-
 .../HoodieBulkInsertInternalWriterTestBase.java|  0
 .../apache/hudi/spark3/internal/ReflectUtil.java   |  8 +-
 .../spark/sql/adapter/BaseSpark3Adapter.scala  |  6 +-
 hudi-spark-datasource/hudi-spark3.0.x/pom.xml  | 15 
 .../sql/HoodieSpark30CatalystExpressionUtils.scala |  7 +-
 .../spark/sql/HoodieSpark30SchemaUtils.scala   |  6 ++
 .../HoodieSpark30PartitionedFileUtils.scala| 12 ++-
 .../HoodieBulkInsertInternalWriterTestBase.java|  0
 .../TestHoodieBulkInsertDataInternalWriter.java|  0
 .../TestHoodieDataSourceInternalBatchWrite.java|  0
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml  | 15 
 .../sql/HoodieSpark31CatalystExpressionUtils.scala |  8 +-
 .../spark/sql/HoodieSpark31SchemaUtils.scala   |  6 ++
 .../HoodieSpark31PartitionedFileUtils.scala| 12 ++-
 .../HoodieBulkInsertInternalWriterTestBase.java|  0
 .../TestHoodieBulkInsertDataInternalWriter.java|  0
 .../TestHoodieDataSourceInternalBatchWrite.java|  0
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml  |  8 +-
 .../sql/HoodieSpark32CatalystExpressionUtils.scala |  7 +-
 .../spark/sql/HoodieSpark32SchemaUtils.scala   |  6 ++
 .../HoodieSpark32PartitionedFileUtils.scala| 12 ++-
 .../parquet/Spark32DataSourceUtils.scala}  |  2 +-
 .../Spark32LegacyHoodieParquetFileFormat.scala | 10 +--
 .../sql/hudi/analysis/HoodieSpark32Analysis.scala  | 66 
 .../HoodieBulkInsertInternalWriterTestBase.java|  0
 .../TestHoodieBulkInsertDataInternalWriter.java|  0
 .../TestHoodieDataSourceInternalBatchWrite.java|  0
 .../hudi/analysis/HoodieSpark32PlusAnalysis.scala  | 28 ---
 .../sql/HoodieSpark33CatalystExpressionUtils.scala |  9 ++-
 .../spark/sql/HoodieSpark33SchemaUtils.scala   |  6 ++
 .../HoodieSpark33PartitionedFileUtils.scala| 12 ++-
 .../parquet/Spark33DataSourceUtils.scala}  |  2 +-
 .../Spark33LegacyHoodieParquetFileFormat.scala | 10 +--
 .../sql/hudi/analysis/HoodieSpark33Analysis.scala  | 66 
 .../HoodieBulkInsertInternalWriterTestBase.java|  0
 .../hudi/spark3/internal/TestReflectUtil.java  |  3 +-
 .../sql/HoodieSpark34CatalystExpressionUtils.scala |  7 +-
 .../spark/sql/Hood

Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


yihua merged PR #9717:
URL: https://github.com/apache/hudi/pull/9717


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


yihua commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813833441

   Azure CI on master also fails on the fourth task.  Merging this PR.
   (screenshot: https://github.com/apache/hudi/assets/2497195/fdcd6011-d90a-4861-a9a4-c21ed62414ce)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR][DNM] Add logs to test runs [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10111:
URL: https://github.com/apache/hudi/pull/10111#issuecomment-1813828330

   
   ## CI report:
   
   * 65c56d302e05ac18639929442f9b533d11f38ed5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20938)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813828231

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * 2c51a6c39ee41fac34110a41f943a3f1dee93f0f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20936)
 
   * 34efaac278dde7fd73515e6d54418a6ff8815326 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20939)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT]When hudi integrates hive, an error is reported when the hive external table is queried [hudi]

2023-11-15 Thread via GitHub


Jackkaabe commented on issue #10084:
URL: https://github.com/apache/hudi/issues/10084#issuecomment-1813823608

   > @Jackkaabe This happens due to a conflict with the parquet dependency. You 
can try shading the parquet jars and rebuilding by adding the following 
relocation to the flink-bundle pom.xml.
   > 
   > ```
   > <relocation>
   >   <pattern>org.apache.parquet</pattern>
   >   <shadedPattern>${flink.bundle.shade.prefix}org.apache.parquet</shadedPattern>
   > </relocation>
   > ```
   > 
   > cc @danny0405
   
   I did it, but still got the same error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Poor parallelism in BLOOM indexing stage with Hudi 0.12.3 [hudi]

2023-11-15 Thread via GitHub


ad1happy2go commented on issue #10115:
URL: https://github.com/apache/hudi/issues/10115#issuecomment-1813816561

   @ChiehFu Do you know of any particular reason why it's using only 3 tasks? 
Can you paste the full UI for one of the jobs? I need to check how many tasks it 
creates for the tagging stage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-7104) Cleaner could miss to clean up some files w/ savepoint interplay

2023-11-15 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-7104:
-

Assignee: sivabalan narayanan

> Cleaner could miss to clean up some files w/ savepoint interplay 
> -
>
> Key: HUDI-7104
> URL: https://issues.apache.org/jira/browse/HUDI-7104
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cleaning
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Let's say partitioning is day-based on the created date, so older 
> partitions generally do not get any new data after a few days.
> 
> Let's say we have a savepoint added to a day and later removed:
> day 1: cleaned up.
> day 2: savepoint added, so the cleaner ignored it.
> day 3: cleaned up.
> day 4: earliest commit to retain based on cleaner configs.
> 
> With this table/timeline state, if we remove the savepointed commit, data 
> pertaining to day 2 will never be cleaned by the cleaner since it is earlier than 
> the earliest commit to retain.
> 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] hudi sql task hang java.lang.System.exit block [hudi]

2023-11-15 Thread via GitHub


zyclove commented on issue #10112:
URL: https://github.com/apache/hudi/issues/10112#issuecomment-1813780480

   I will check it and retry. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] RFC 63 Functional Index Hudi 0.1.0-beta [hudi]

2023-11-15 Thread via GitHub


codope commented on issue #10110:
URL: https://github.com/apache/hudi/issues/10110#issuecomment-1813780193

   Hi @soumilshah1995 , thanks for giving it a try! Currently, the `FUNCTION` 
keyword is not integrated. I need to update the RFC with the exact syntax which 
can be found here in the SQL DDL docs - 
https://hudi.apache.org/docs/next/sql_ddl#create-index-experimental
   We are tracking the issue to simplify the syntax. Ideally, we want users to 
be able to just say `CREATE INDEX func_index_abc on xyz_hudi_table USING 
column_stats(hour(ts))` without using `FUNCTION` keyword or provide extra 
options to specify the function. We will have it in 1.0 GA. Feel free to reach 
out to me directly on Hudi Slack if you're more interested in this feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] Poor parallelism in BLOOM indexing stage with Hudi 0.12.3 [hudi]

2023-11-15 Thread via GitHub


ChiehFu opened a new issue, #10115:
URL: https://github.com/apache/hudi/issues/10115

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   Hello, recently we migrated our datasets from Hudi 0.8 to Hudi 0.12.3 and 
started experiencing slowness in the indexing stage for some tables during upserts.
   
   After looking into the Spark stages, we found one particular stage where Hudi 
set a very low parallelism for indexing (stage 32 in the screenshot), which caused 
a long duration and shuffle spill that further slowed down the stage. We set 
`hoodie.bloom.index.parallelism=2000`; however, it doesn't seem to affect the 
parallelism of that particular stage.
   
   In Hudi 0.8, Hudi used the value we set in `hoodie.upsert.shuffle.parallelism` 
as the parallelism for this stage; however, in Hudi 0.12 the parallelism seems to 
be calculated dynamically.
   
   Can you please help us understand whether there is any Hudi configuration we 
should use to increase the parallelism for this stage?
   
   We also tried setting `hoodie.copyonwrite.record.size.estimate` to a very 
small value, as it seems to help force Hudi to use a larger parallelism for 
indexing initially, but it's very inconsistent: we still see small values 
being set for the stage across upsert jobs.
   
   
   
   **Environment Description**
   
   * Hudi version : 0.12.3
   
   * Spark version :  3.1.3
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3 
   
   * Running on Docker? (yes/no) : no
   * EMR: 6.10.0
   
   **Additional context**
   
   Hudi configs
   ```
   hoodie.metadata.enable: true
   hoodie.metadata.validate: true
   hoodie.cleaner.commits.retained: 72
   hoodie.keep.min.commits: 100
   hoodie.keep.max.commits: 150
   hoodie.datasource.write.payload.class: 
org.apache.hudi.common.model.DefaultHoodieRecordPayload
   hoodie.index.type: BLOOM
   hoodie.bloom.index.parallelism: 2000
   hoodie.copyonwrite.record.size.estimate: 1
   hoodie.metadata.enable: true
   hoodie.datasource.write.table.type: COPY_ON_WRITE
   hoodie.insert.shuffle.parallelism: 1500
   hoodie.datasource.write.operation: upsert
   hoodie.datasource.hive_sync.partition_extractor_class: 
org.apache.hudi.hive.MultiPartKeysValueExtractor
   hoodie.datasource.write.keygenerator.class: 
org.apache.hudi.keygen.ComplexKeyGenerator
   ```
   (screenshot: https://github.com/apache/hudi/assets/11819388/e0381e62-0690-4bce-8fe3-15f6590870bb)
   
   (screenshot: https://github.com/apache/hudi/assets/11819388/3a481bb8-c6ff-456d-baca-b60b22788abf)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10114:
URL: https://github.com/apache/hudi/pull/10114#issuecomment-1813755365

   
   ## CI report:
   
   * 5152ea66bd6f4a3c3f506bfe051ef4122973e908 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20941)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10114:
URL: https://github.com/apache/hudi/pull/10114#issuecomment-1813750459

   
   ## CI report:
   
   * 5152ea66bd6f4a3c3f506bfe051ef4122973e908 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10090:
URL: https://github.com/apache/hudi/pull/10090#issuecomment-1813750327

   
   ## CI report:
   
   * 35219c2180342faea6e09987e69271508a3f0096 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20912)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR][DNM] Full test runtime 2 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10113:
URL: https://github.com/apache/hudi/pull/10113#issuecomment-1813750425

   
   ## CI report:
   
   * 272f308766e7bfeaf03d7d5bfc9b15cd4bf92a15 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20940)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813750267

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * d22fcb976c5c468cb129abf9c4ee200eb249fb73 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20934)
 
   * 2c51a6c39ee41fac34110a41f943a3f1dee93f0f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20936)
 
   * 34efaac278dde7fd73515e6d54418a6ff8815326 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20939)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7105) Add FileSystemViewManager configuable

2023-11-15 Thread kwang (Jira)
kwang created HUDI-7105:
---

 Summary: Add FileSystemViewManager configuable
 Key: HUDI-7105
 URL: https://issues.apache.org/jira/browse/HUDI-7105
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: kwang


If there exists mang partitions and files When generating the clean plan, it's 
easy to throw oom exception. Using secondaryFileSystemView is more stable than 
remoteFileSystemView.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [MINOR][DNM] Full test runtime 2 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10113:
URL: https://github.com/apache/hudi/pull/10113#issuecomment-1813745670

   
   ## CI report:
   
   * 272f308766e7bfeaf03d7d5bfc9b15cd4bf92a15 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7103] Support time travel queries for COW tables [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10109:
URL: https://github.com/apache/hudi/pull/10109#issuecomment-1813745641

   
   ## CI report:
   
   * 01cd726aff602316f444f98e6e61bf2433fa3e95 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1813745621

   
   ## CI report:
   
   * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20930)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10090:
URL: https://github.com/apache/hudi/pull/10090#issuecomment-1813745573

   
   ## CI report:
   
   * 35219c2180342faea6e09987e69271508a3f0096 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20912)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813745503

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * d22fcb976c5c468cb129abf9c4ee200eb249fb73 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20934)
 
   * 2c51a6c39ee41fac34110a41f943a3f1dee93f0f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20936)
 
   * 34efaac278dde7fd73515e6d54418a6ff8815326 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [MINOR] Modified description to include missing trigger strategy [hudi]

2023-11-15 Thread via GitHub


voonhous opened a new pull request, #10114:
URL: https://github.com/apache/hudi/pull/10114

   ### Change Logs
   
   In https://github.com/apache/hudi/pull/6144, a new compaction trigger 
strategy named `NUM_COMMITS_AFTER_LAST_REQUEST` was added to 
org.apache.hudi.table.action.compact.CompactionTriggerStrategy.
   
   However, the FlinkOptions description was never updated to include this new 
trigger strategy. This change adds it so that the configs page on the doc site 
reflects this trigger strategy for completeness.
   
   TODO: 
   We might need to do some refactoring to centralise these common configs so that 
we do not have to worry about this kind of de-sync in the future. It might also 
make testing easier.
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7071] Throw exception when clustering/compaction job fail [hudi]

2023-11-15 Thread via GitHub


ksmou commented on PR #10050:
URL: https://github.com/apache/hudi/pull/10050#issuecomment-1813740241

   > Is it fixed via: #10108 ?
   
   It's good. All services that call `UtilHelpers.retry` have similar 
problems. I fixed the clustering/index job in the same way here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hehuiyuan commented on code in PR #10090:
URL: https://github.com/apache/hudi/pull/10090#discussion_r1395108190


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##
@@ -207,6 +207,7 @@ public DataStream produceDataStream(StreamExecutionEnvironment execEnv)
       SingleOutputStreamOperator source = execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
           .uid(Pipelines.opUID("split_monitor", conf))
           .setParallelism(1)
+          .setMaxParallelism(1)

Review Comment:
   single operator.
   
   
https://github.com/apache/flink/blob/012704d9884f92274495fbf6fdb7234373944212/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/table/stream/StreamingSink.java#L124
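   
   A minimal sketch of the per-operator variant for reference (the sequence 
source below is just a stand-in for the split monitor; not the actual Hudi 
code):
   
   ```java
   import org.apache.flink.streaming.api.datastream.DataStreamSource;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   
   // Calling setMaxParallelism on the operator itself only caps that operator;
   // env.setMaxParallelism(...) would set the job-wide default instead.
   public class SingletonOperatorSketch {
     public static void main(String[] args) throws Exception {
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
       DataStreamSource<Long> splitMonitor = env.fromSequence(0, 100); // stand-in source
       splitMonitor
           .setParallelism(1)      // single subtask for the monitor
           .setMaxParallelism(1);  // per-operator cap, so rescaling cannot raise it
       splitMonitor.print();
       env.execute("singleton-operator-sketch");
     }
   }
   ```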



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hehuiyuan commented on code in PR #10090:
URL: https://github.com/apache/hudi/pull/10090#discussion_r1395108190


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##
@@ -207,6 +207,7 @@ public DataStream 
produceDataStream(StreamExecutionEnvironment execEnv)
   SingleOutputStreamOperator source = 
execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
   .uid(Pipelines.opUID("split_monitor", conf))
   .setParallelism(1)
+  .setMaxParallelism(1)

Review Comment:
   single operator 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


hehuiyuan commented on PR #10090:
URL: https://github.com/apache/hudi/pull/10090#issuecomment-1813732678

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR][DNM] Add logs to test runs [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10111:
URL: https://github.com/apache/hudi/pull/10111#issuecomment-1813718687

   
   ## CI report:
   
   * 65c56d302e05ac18639929442f9b533d11f38ed5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20938)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [MINOR][DNM] Full test runtime 2 [hudi]

2023-11-15 Thread via GitHub


yihua opened a new pull request, #10113:
URL: https://github.com/apache/hudi/pull/10113

   ### Change Logs
   
   As above.  This reverts #9260 to fix CI.
   
   ### Impact
   
   Testing only.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR][DNM] Add logs to test runs [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10111:
URL: https://github.com/apache/hudi/pull/10111#issuecomment-1813712301

   
   ## CI report:
   
   * 65c56d302e05ac18639929442f9b533d11f38ed5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Can hudi support updating only specific columns ? (not rewrite base columns) [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on issue #10086:
URL: https://github.com/apache/hudi/issues/10086#issuecomment-1813711254

   The release 1.0 docs are not published yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7090]Set the maxParallelism for singleton operator [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on code in PR #10090:
URL: https://github.com/apache/hudi/pull/10090#discussion_r1395092204


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java:
##
@@ -207,6 +207,7 @@ public DataStream 
produceDataStream(StreamExecutionEnvironment execEnv)
   SingleOutputStreamOperator source = 
execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
   .uid(Pipelines.opUID("split_monitor", conf))
   .setParallelism(1)
+  .setMaxParallelism(1)

Review Comment:
   Does `setMaxParallelism` take effect with per-operator scope or global 
scope?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7071] Throw exception when clustering/compaction job fail [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on PR #10050:
URL: https://github.com/apache/hudi/pull/10050#issuecomment-1813708197

   Is it fixed via: https://github.com/apache/hudi/pull/10108 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813707163

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * d22fcb976c5c468cb129abf9c4ee200eb249fb73 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20934)
 
   * 411f1e09cc33590a4a1f7cc93c65db083494633b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20935)
 
   * 2c51a6c39ee41fac34110a41f943a3f1dee93f0f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20936)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813706840

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * afe70daf89229ab3ac4153d69b511121b8a31d9e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Query failure due to replacecommit being archived [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on issue #10107:
URL: https://github.com/apache/hudi/issues/10107#issuecomment-1813704292

   Should be fixed in recent releases, cc @ad1happy2go for double check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] hudi sql task hang java.lang.System.exit block [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on issue #10112:
URL: https://github.com/apache/hudi/issues/10112#issuecomment-1813702995

   Not sure whether this fix is related with your issue: 
https://github.com/apache/hudi/pull/10108


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on code in PR #10102:
URL: https://github.com/apache/hudi/pull/10102#discussion_r1395085791


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##
@@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) {
 && 
!HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME),
 HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime
 )) {
   // hit a block with instant time greater than should be processed, 
stop processing further
-  break;
+  continue;
 }

Review Comment:
   The reader consumption upper threshold was introduced to avoid unnecessary 
reading of log blocks; should we drop it? I don't think so, maybe you should 
just fix the threshold itself.
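   
   A stripped-down simulation of the difference for context (plain strings 
instead of the real log reader API):
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   
   // Illustrative only: instants are plain strings, not HoodieLogBlock headers.
   public class ScanThresholdSketch {
     public static void main(String[] args) {
       List<String> blockInstants = Arrays.asList("001", "005", "003"); // "005" exceeds the threshold
       String latestInstantTime = "004";
       for (String instant : blockInstants) {
         if (instant.compareTo(latestInstantTime) > 0) {
           // 'break' here would also drop "003", which is still <= the threshold;
           // 'continue' only skips the too-new block and keeps merging valid ones.
           continue;
         }
         System.out.println("merge block with instant " + instant);
       }
     }
   }
   ```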



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] hudi sql task hang java.lang.System.exit block [hudi]

2023-11-15 Thread via GitHub


zyclove commented on issue #10112:
URL: https://github.com/apache/hudi/issues/10112#issuecomment-1813701762

   Thread 8953: (state = IN_NATIVE_TRANS)
- org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, 
org.apache.hadoop.net.unix.DomainSocketWatcher$FdSet) @bci=0 (Interpreted frame)
- org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, 
org.apache.hadoop.net.unix.DomainSocketWatcher$FdSet) @bci=2, line=52 
(Interpreted frame)
- org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() @bci=763, line=503 
(Interpreted frame)
- java.lang.Thread.run() @bci=11, line=750 (Interpreted frame)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] hudi sql task hang java.lang.System.exit block [hudi]

2023-11-15 Thread via GitHub


zyclove opened a new issue, #10112:
URL: https://github.com/apache/hudi/issues/10112

   
   **Describe the problem you faced**
   The SQL task is over, but the driver sometimes cannot exit.
   
   If the same task is run many times, there is a small chance that it will 
exit abnormally.
   
   Tens of thousands of tasks are executed every day, and this problem has 
never occurred for non-Hudi Spark tasks. 
   
   Hudi tasks have occasionally shown this problem before.
   
   
![企业微信截图_28aec49f-d1c0-45d0-b9e0-dc1f31b21ee0](https://github.com/apache/hudi/assets/15028279/16c6762f-afde-47ee-ac12-bb2d2c590f45)
   
   
   
![企业微信截图_4dcf7b0c-1c6b-44b7-99e7-d8d5134434a8](https://github.com/apache/hudi/assets/15028279/6b156a0f-43c4-4fa4-adaa-b75458f16a3f)
   
   
   
![企业微信截图_b8884f5e-ff16-4115-a826-f5a50b281df9](https://github.com/apache/hudi/assets/15028279/2b79c34c-694b-49ca-839b-65c7cd2c4769)
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. /usr/lib/spark/bin/spark-sql --name 
63130__VOLCANO_JOB_1699949768615_004319 -f 
/tmp/VOLCANO_JOB_1699949768615_004319.sql --master yarn --queue hadoop 
--driver-memory 8g --executor-memory 4G --executor-cores 2 --num-executors 8  
--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.14.0 --conf 
spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension 
--conf spark.sql.autoBroadcastJoinThreshold=2G --conf 
spark.sql.broadcastTimeout=6 --conf spark.memory.storageFraction=0.7 --conf 
spark.yarn.priority=5 --conf spark.sql.adaptive.enabled=true
   
   
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :0.14.0
   
   * Spark version :3.2.1
   
   * Hive version :3.1.3
   
   * Hadoop version :3.2.2
   
   * Storage (HDFS/S3/GCS..) :s3
   
   * Running on Docker? (yes/no) :no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```
   Attaching to process ID 8854, please wait...
   Debugger attached successfully.
   Server compiler detected.
   JVM version is 25.382-b05
   Deadlock Detection:
   
   No deadlocks found.
   
   Thread 23860: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be 
imprecise)
- io.netty.util.concurrent.SingleThreadEventExecutor.confirmShutdown() 
@bci=153, line=787 (Interpreted frame)
- io.netty.channel.nio.NioEventLoop.run() @bci=406, line=530 (Interpreted 
frame)
- io.netty.util.concurrent.SingleThreadEventExecutor$4.run() @bci=44, 
line=986 (Interpreted frame)
- io.netty.util.internal.ThreadExecutorMap$2.run() @bci=11, line=74 
(Interpreted frame)
- io.netty.util.concurrent.FastThreadLocalRunnable.run() @bci=4, line=30 
(Interpreted frame)
- java.lang.Thread.run() @bci=11, line=750 (Compiled frame)
   
   
   Thread 23859: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be 
imprecise)
- io.netty.util.concurrent.SingleThreadEventExecutor.confirmShutdown() 
@bci=153, line=787 (Interpreted frame)
- io.netty.channel.nio.NioEventLoop.run() @bci=406, line=530 (Interpreted 
frame)
- io.netty.util.concurrent.SingleThreadEventExecutor$4.run() @bci=44, 
line=986 (Interpreted frame)
- io.netty.util.internal.ThreadExecutorMap$2.run() @bci=11, line=74 
(Interpreted frame)
- io.netty.util.concurrent.FastThreadLocalRunnable.run() @bci=4, line=30 
(Interpreted frame)
- java.lang.Thread.run() @bci=11, line=750 (Compiled frame)
   
   
   Thread 23858: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be 
imprecise)
- io.netty.util.concurrent.SingleThreadEventExecutor.confirmShutdown() 
@bci=153, line=787 (Interpreted frame)
- io.netty.channel.nio.NioEventLoop.run() @bci=406, line=530 (Interpreted 
frame)
- io.netty.util.concurrent.SingleThreadEventExecutor$4.run() @bci=44, 
line=986 (Interpreted frame)
- io.netty.util.internal.ThreadExecutorMap$2.run() @bci=11, line=74 
(Interpreted frame)
- io.netty.util.concurrent.FastThreadLocalRunnable.run() @bci=4, line=30 
(Interpreted frame)
- java.lang.Thread.run() @bci=11, line=750 (Compiled frame)
   
   
   Thread 23857: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be 
imprecise)
- io.netty.util.concurrent.SingleThreadEventExecutor.confirmShutdown() 
@bci=153, line=787 (Interpreted frame)
- io.netty.channel.nio.NioEventLoop.run() @bci=406, line=530 (Interpreted 
frame)
- io.netty.util.concurrent.SingleThreadEventExecutor$4.run() @bci=44, 
line=986 (Interpreted frame)
- io.netty.util.internal.ThreadExecutorMap$2.run() @bci=11, line=74 
(Interpreted frame)
- io.netty.util.concurrent.FastThreadLocalRunnable.run() @bci=4, line=30 
(Interpreted frame)
- java.lang

Re: [PR] [MINOR] CLAZZ_CACHE get should be synchronized avoid thread safe problem [hudi]

2023-11-15 Thread via GitHub


danny0405 closed pull request #9788: [MINOR] CLAZZ_CACHE get should be 
synchronized avoid thread safe problem
URL: https://github.com/apache/hudi/pull/9788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] CLAZZ_CACHE get should be synchronized avoid thread safe problem [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on PR #9788:
URL: https://github.com/apache/hudi/pull/9788#issuecomment-1813699169

   Closing because it has been fixed via https://github.com/apache/hudi/pull/9786.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [MINOR][DNM] Add logs to test runs [hudi]

2023-11-15 Thread via GitHub


yihua opened a new pull request, #10111:
URL: https://github.com/apache/hudi/pull/10111

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [Minor] Throw exceptions when cleaner/compactor fail (#10108)

2023-11-15 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 35af64db466 [Minor] Throw exceptions when cleaner/compactor fail 
(#10108)
35af64db466 is described below

commit 35af64db46668115dc7c9cd9b05844819cb1157e
Author: Shawn Chang <42792772+c...@users.noreply.github.com>
AuthorDate: Wed Nov 15 18:36:42 2023 -0800

[Minor] Throw exceptions when cleaner/compactor fail (#10108)

Co-authored-by: Shawn Chang 
---
 .../main/java/org/apache/hudi/utilities/HoodieCleaner.java  | 13 +++--
 .../java/org/apache/hudi/utilities/HoodieCompactor.java | 13 -
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java
index 53b80e55b25..49aed0b 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCleaner.java
@@ -26,6 +26,7 @@ import org.apache.hudi.config.HoodieWriteConfig;
 import com.beust.jcommander.JCommander;
 import com.beust.jcommander.Parameter;
 import org.apache.hadoop.fs.Path;
+import org.apache.hudi.exception.HoodieException;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -103,28 +104,20 @@ public class HoodieCleaner {
 JCommander cmd = new JCommander(cfg, null, args);
 if (cfg.help || args.length == 0) {
   cmd.usage();
-  System.exit(1);
+  throw new HoodieException("Failed to run cleaning for " + cfg.basePath);
 }
 
 String dirName = new Path(cfg.basePath).getName();
 JavaSparkContext jssc = UtilHelpers.buildSparkContext("hoodie-cleaner-" + 
dirName, cfg.sparkMaster);
-boolean success = true;
 
 try {
   new HoodieCleaner(cfg, jssc).run();
 } catch (Throwable throwable) {
-  success = false;
-  LOG.error("Failed to run cleaning for " + cfg.basePath, throwable);
+  throw new HoodieException("Failed to run cleaning for " + cfg.basePath, 
throwable);
 } finally {
   jssc.stop();
 }
 
-if (!success) {
-  // Return a non-zero exit code to properly notify any resource manager
-  // that cleaning was not successful
-  System.exit(1);
-}
-
 LOG.info("Cleaner ran successfully");
   }
 }
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
index 9b03cb7a724..c8bdf0da3a0 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
@@ -29,6 +29,7 @@ import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.config.HoodieCleanConfig;
+import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
 import 
org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy;
 
@@ -168,18 +169,20 @@ public class HoodieCompactor {
 JCommander cmd = new JCommander(cfg, null, args);
 if (cfg.help || args.length == 0) {
   cmd.usage();
-  System.exit(1);
+  throw new HoodieException("Fail to run compaction for " + cfg.tableName 
+ ", return code: " + 1);
 }
 final JavaSparkContext jsc = UtilHelpers.buildSparkContext("compactor-" + 
cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
 int ret = 0;
 try {
-  HoodieCompactor compactor = new HoodieCompactor(jsc, cfg);
-  ret = compactor.compact(cfg.retry);
+  ret = new HoodieCompactor(jsc, cfg).compact(cfg.retry);
 } catch (Throwable throwable) {
-  LOG.error("Fail to run compaction for " + cfg.tableName, throwable);
+  throw new HoodieException("Fail to run compaction for " + cfg.tableName 
+ ", return code: " + ret, throwable);
 } finally {
   jsc.stop();
-  System.exit(ret);
+}
+
+if (ret != 0) {
+  throw new HoodieException("Fail to run compaction for " + cfg.tableName 
+ ", return code: " + ret);
 }
   }
 



Re: [PR] [MINOR] Throw exceptions when cleaner/compactor fail [hudi]

2023-11-15 Thread via GitHub


danny0405 merged PR #10108:
URL: https://github.com/apache/hudi/pull/10108


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Throw exceptions when cleaner/compactor fail [hudi]

2023-11-15 Thread via GitHub


danny0405 commented on PR #10108:
URL: https://github.com/apache/hudi/pull/10108#issuecomment-1813697919

   The failure is not relevant: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20928&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934&l=14572


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813667907

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * 97424b66af6de869a7feba00c6e8c24f80eb90a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20927)
 
   * d22fcb976c5c468cb129abf9c4ee200eb249fb73 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20934)
 
   * 411f1e09cc33590a4a1f7cc93c65db083494633b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20935)
 
   * 2c51a6c39ee41fac34110a41f943a3f1dee93f0f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7099] Providing metrics for archive and defining some string constants [hudi]

2023-11-15 Thread via GitHub


stream2000 commented on code in PR #10101:
URL: https://github.com/apache/hudi/pull/10101#discussion_r1395063596


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/HoodieMetrics.java:
##
@@ -255,48 +277,57 @@ private void updateCommitTimingMetrics(long 
commitEpochTimeInMs, long durationIn
   Pair, Option> eventTimePairMinMax = 
metadata.getMinAndMaxEventTime();
   if (eventTimePairMinMax.getLeft().isPresent()) {
 long commitLatencyInMs = commitEpochTimeInMs + durationInMs - 
eventTimePairMinMax.getLeft().get();
-metrics.registerGauge(getMetricsName(actionType, "commitLatencyInMs"), 
commitLatencyInMs);
+metrics.registerGauge(getMetricsName(actionType, COMMIT_LATENCY_STR), 
commitLatencyInMs);
   }
   if (eventTimePairMinMax.getRight().isPresent()) {
 long commitFreshnessInMs = commitEpochTimeInMs + durationInMs - 
eventTimePairMinMax.getRight().get();
-metrics.registerGauge(getMetricsName(actionType, 
"commitFreshnessInMs"), commitFreshnessInMs);
+metrics.registerGauge(getMetricsName(actionType, 
COMMIT_FRESHNESS_STR), commitFreshnessInMs);
   }
-  metrics.registerGauge(getMetricsName(actionType, "commitTime"), 
commitEpochTimeInMs);
-  metrics.registerGauge(getMetricsName(actionType, "duration"), 
durationInMs);
+  metrics.registerGauge(getMetricsName(actionType, COMMIT_TIME_STR), 
commitEpochTimeInMs);
+  metrics.registerGauge(getMetricsName(actionType, DURATION_STR), 
durationInMs);
 }
   }
 
   public void updateRollbackMetrics(long durationInMs, long numFilesDeleted) {
 if (config.isMetricsOn()) {
   LOG.info(
   String.format("Sending rollback metrics (duration=%d, 
numFilesDeleted=%d)", durationInMs, numFilesDeleted));
-  metrics.registerGauge(getMetricsName("rollback", "duration"), 
durationInMs);
-  metrics.registerGauge(getMetricsName("rollback", "numFilesDeleted"), 
numFilesDeleted);
+  metrics.registerGauge(getMetricsName(HoodieTimeline.ROLLBACK_ACTION, 
DURATION_STR), durationInMs);
+  metrics.registerGauge(getMetricsName(HoodieTimeline.ROLLBACK_ACTION, 
DELETE_FILES_NUM_STR), numFilesDeleted);
 }
   }
 
   public void updateCleanMetrics(long durationInMs, int numFilesDeleted) {
 if (config.isMetricsOn()) {
   LOG.info(
   String.format("Sending clean metrics (duration=%d, 
numFilesDeleted=%d)", durationInMs, numFilesDeleted));
-  metrics.registerGauge(getMetricsName("clean", "duration"), durationInMs);
-  metrics.registerGauge(getMetricsName("clean", "numFilesDeleted"), 
numFilesDeleted);
+  metrics.registerGauge(getMetricsName(HoodieTimeline.CLEAN_ACTION, 
DURATION_STR), durationInMs);
+  metrics.registerGauge(getMetricsName(HoodieTimeline.CLEAN_ACTION, 
DELETE_FILES_NUM_STR), numFilesDeleted);
+}
+  }
+
+  public void updateArchiveMetrics(long durationInMs, int numFilesDeleted) {
+if (config.isMetricsOn()) {
+  LOG.info(
+  String.format("Sending archive metrics (duration=%d, 
numFilesDeleted=%d)", durationInMs, numFilesDeleted));
+  metrics.registerGauge(getMetricsName(ARCHIVE_ACTION, DURATION_STR), 
durationInMs);
+  metrics.registerGauge(getMetricsName(ARCHIVE_ACTION, 
DELETE_FILES_NUM_STR), numFilesDeleted);
 }
   }
 
   public void updateFinalizeWriteMetrics(long durationInMs, long 
numFilesFinalized) {
 if (config.isMetricsOn()) {
   LOG.info(String.format("Sending finalize write metrics (duration=%d, 
numFilesFinalized=%d)", durationInMs,
   numFilesFinalized));
-  metrics.registerGauge(getMetricsName("finalize", "duration"), 
durationInMs);
-  metrics.registerGauge(getMetricsName("finalize", "numFilesFinalized"), 
numFilesFinalized);
+  metrics.registerGauge(getMetricsName(FINALIZE_ACTION, DURATION_STR), 
durationInMs);
+  metrics.registerGauge(getMetricsName(FINALIZE_ACTION, 
FINALIZED_FILES_NUM_STR), numFilesFinalized);
 }
   }
 
   public void updateIndexMetrics(final String action, final long durationInMs) 
{
 if (config.isMetricsOn()) {
   LOG.info(String.format("Sending index metrics (%s.duration, %d)", 
action, durationInMs));

Review Comment:
   We can also update the string literal in the log here
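   
   For example (illustrative only, reusing the constant introduced in this PR):
   
   ```java
   LOG.info(String.format("Sending index metrics (%s.%s, %d)", action, DURATION_STR, durationInMs));
   ```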



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/HoodieMetrics.java:
##
@@ -92,20 +106,21 @@ public HoodieMetrics(HoodieWriteConfig config) {
 this.tableName = config.getTableName();
 if (config.isMetricsOn()) {
   metrics = Metrics.getInstance(config);
-  this.rollbackTimerName = getMetricsName("timer", 
HoodieTimeline.ROLLBACK_ACTION);
-  this.cleanTimerName = getMetricsName("timer", 
HoodieTimeline.CLEAN_ACTION);
-  this.commitTimerName = getMetricsName("timer", 
HoodieTimeline.COMMIT_ACTION);
-  this.deltaCommitTimerName = getMetricsName("timer", 
HoodieTimeline.DELTA_COMMIT_ACTION);
-  this.replaceCommitTimerName = getMetricsName("tim

Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813653979

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * 97424b66af6de869a7feba00c6e8c24f80eb90a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20927)
 
   * d22fcb976c5c468cb129abf9c4ee200eb249fb73 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20934)
 
   * 411f1e09cc33590a4a1f7cc93c65db083494633b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813653007

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 017a37588ccb55c0df8a98a48a251146256d9406 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20929)
 
   * afe70daf89229ab3ac4153d69b511121b8a31d9e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813638155

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * 97424b66af6de869a7feba00c6e8c24f80eb90a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20927)
 
   * d22fcb976c5c468cb129abf9c4ee200eb249fb73 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7104) Cleaner could miss to clean up some files w/ savepoint interplay

2023-11-15 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7104:
-

 Summary: Cleaner could miss to clean up some files w/ savepoint 
interplay 
 Key: HUDI-7104
 URL: https://issues.apache.org/jira/browse/HUDI-7104
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cleaning
Reporter: sivabalan narayanan


Let's say partitioning is day based and is based on created date. So, older 
partitions generally do not get any new data after a few days. 

Let's say we have savepoints added to a day and later removed. 

day1: cleaned up. 

day2: savepoint added, and so the cleaner ignored it. 

day3: cleaned up. 

day4: earliest commit to retain based on cleaner configs. 

So, with this table/timeline state, if we remove the savepointed commit, data 
pertaining to day2 will never be cleaned by the cleaner since it is earlier 
than the earliest commit to retain. 
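
A tiny standalone simulation of the sequence above (plain Java, not Hudi 
internals; partition names and the retain boundary are made up):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone illustration of the scenario; not Hudi cleaner code.
public class SavepointCleanerSketch {
  public static void main(String[] args) {
    List<String> partitions = Arrays.asList("day1", "day2", "day3", "day4");
    Set<String> savepointed = new HashSet<>(Collections.singletonList("day2"));
    String earliestToRetain = "day4";

    // First cleaner pass: everything before the retain boundary is cleaned,
    // except day2, which the savepoint protects.
    for (String p : partitions) {
      boolean cleaned = p.compareTo(earliestToRetain) < 0 && !savepointed.contains(p);
      System.out.println(p + " cleaned=" + cleaned);
    }

    // The savepoint is later removed, but subsequent cleaner plans only look at
    // instants from the earliest commit to retain onwards, so day2 is never revisited.
    savepointed.remove("day2");
    System.out.println("day2 stays uncleaned even though its savepoint is gone");
  }
}
```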

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813637324

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 017a37588ccb55c0df8a98a48a251146256d9406 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20929)
 
   * afe70daf89229ab3ac4153d69b511121b8a31d9e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10063:
URL: https://github.com/apache/hudi/pull/10063#issuecomment-1813623184

   
   ## CI report:
   
   * edb9997799c672e69a5a81271f32504e270846d2 UNKNOWN
   * 97424b66af6de869a7feba00c6e8c24f80eb90a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20927)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813622358

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 017a37588ccb55c0df8a98a48a251146256d9406 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1813546968

   
   ## CI report:
   
   * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20930)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7103] Support time travel queries for COW tables [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10109:
URL: https://github.com/apache/hudi/pull/10109#issuecomment-1813547074

   
   ## CI report:
   
   * 01cd726aff602316f444f98e6e61bf2433fa3e95 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813545833

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * af280647acca3e0cbf9f52c7bbe189f326cd8df6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20926)
 
   * 017a37588ccb55c0df8a98a48a251146256d9406 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7103] Support time travel queries for COW tables [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10109:
URL: https://github.com/apache/hudi/pull/10109#issuecomment-1813536258

   
   ## CI report:
   
   * 01cd726aff602316f444f98e6e61bf2433fa3e95 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7102) A bug for the time travel queries for MOR tables

2023-11-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7102:
-
Labels: pull-request-available  (was: )

> A bug for the time travel queries for MOR tables
> 
>
> Key: HUDI-7102
> URL: https://issues.apache.org/jira/browse/HUDI-7102
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Issue:
>  # Based on the provided TIMESTAMP_AS_OF, a list of file slices are returned. 
> However, these file slices that are returned are based on their base file 
> timestamp. That means, these slices may contain log files whose timestamps 
> are higher than the provided timestamp.
>  # Such that, when we try to merge the logs in the reverse order, we may see 
> these unqualified log files first, which triggers the "break" operation, and 
> no merging will be done.
>  
> Solution:
>  # The first solution is to filter the log files as well as the base files 
> for the file slices. 
>  # The second solution is to skip these unqualified log files, and keep 
> merging.
>  
> Risk:
>  * Not sure if new bugs would be introduced by changing the current behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1813536200

   
   ## CI report:
   
   * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1813535629

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * af280647acca3e0cbf9f52c7bbe189f326cd8df6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20926)
 
   * 017a37588ccb55c0df8a98a48a251146256d9406 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5936] Fix serialization problem when FileStatus is not serializable [hudi]

2023-11-15 Thread via GitHub


yihua merged PR #10065:
URL: https://github.com/apache/hudi/pull/10065


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated (dcd5a8182a1 -> bada5d91a8d)

2023-11-15 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from dcd5a8182a1 [HUDI-7069] Optimize metaclient construction and include 
table config options (#10048)
 add bada5d91a8d [HUDI-5936] Fix serialization problem when FileStatus is 
not serializable (#10065)

No new revisions were added by this update.

Summary of changes:
 .../hudi/common/fs/NonSerializableFileSystem.java  | 115 
 .../fs/TestHoodieSerializableFileStatus.java   |  86 
 .../common/fs/HoodieSerializableFileStatus.java| 144 +
 .../metadata/FileSystemBackedTableMetadata.java|  28 ++--
 4 files changed, 361 insertions(+), 12 deletions(-)
 create mode 100644 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/fs/NonSerializableFileSystem.java
 create mode 100644 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/fs/TestHoodieSerializableFileStatus.java
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java



Re: [PR] [MINOR] Throw exceptions when cleaner/compactor fail [hudi]

2023-11-15 Thread via GitHub


hudi-bot commented on PR #10108:
URL: https://github.com/apache/hudi/pull/10108#issuecomment-1813529460

   
   ## CI report:
   
   * 0165912015447a8ce331afa757ff764809113b9e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20928)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7103) Enable Time travel queries for COW

2023-11-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7103:
-
Labels: pull-request-available  (was: )

> Enable Time travel queries for COW
> --
>
> Key: HUDI-7103
> URL: https://issues.apache.org/jira/browse/HUDI-7103
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The goal of this task is to enable time travel queries for COW tables based 
> on HadoopFsRelation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7103] Support time travel queries for COW tables [hudi]

2023-11-15 Thread via GitHub


linliu-code opened a new pull request, #10109:
URL: https://github.com/apache/hudi/pull/10109

   ### Change Logs
   
   This is based on HadoopFsRelation and the new file format and file group reader.
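   
   For reference, the kind of read this change targets (a sketch; the base path 
and instant value are placeholders, and the Hudi Spark bundle is assumed to be 
on the classpath):
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;
   
   // Time-travel read on a COW table; path and instant below are placeholders.
   public class TimeTravelReadSketch {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder()
           .appName("time-travel-sketch")
           .master("local[*]")
           .getOrCreate();
       Dataset<Row> asOf = spark.read().format("hudi")
           .option("as.of.instant", "20231101000000000") // documented time-travel read option
           .load("/tmp/hudi/cow_table");                  // placeholder base path
       asOf.show();
       spark.stop();
     }
   }
   ```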
   
   ### Impact
   
   Time travel queries should be more stable.
   
   ### Risk level (write none, low medium or high below)
   
   LOW since this is for 1.0.0.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7103) Enable Time travel queries for COW

2023-11-15 Thread Lin Liu (Jira)
Lin Liu created HUDI-7103:
-

 Summary: Enable Time travel queries for COW
 Key: HUDI-7103
 URL: https://issues.apache.org/jira/browse/HUDI-7103
 Project: Apache Hudi
  Issue Type: Task
Reporter: Lin Liu
Assignee: Lin Liu
 Fix For: 1.0.0


The goal of this task is to enable time travel queries for COW tables based on 
HadoopFsRelation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-15 Thread via GitHub


yihua commented on code in PR #9717:
URL: https://github.com/apache/hudi/pull/9717#discussion_r1394831451


##
.github/workflows/bot.yml:
##
@@ -284,29 +294,33 @@ jobs:
   matrix:
 include:
   - flinkProfile: 'flink1.17'
-sparkProfile: 'spark3.4'
-sparkRuntime: 'spark3.4.0'
-  - flinkProfile: 'flink1.17'
-sparkProfile: 'spark3.3'
-sparkRuntime: 'spark3.3.2'
-  - flinkProfile: 'flink1.16'
-sparkProfile: 'spark3.3'
-sparkRuntime: 'spark3.3.2'
-  - flinkProfile: 'flink1.15'
-sparkProfile: 'spark3.3'
-sparkRuntime: 'spark3.3.1'
-  - flinkProfile: 'flink1.14'
-sparkProfile: 'spark3.2'
-sparkRuntime: 'spark3.2.3'
-  - flinkProfile: 'flink1.13'
-sparkProfile: 'spark3.1'
-sparkRuntime: 'spark3.1.3'
-  - flinkProfile: 'flink1.14'
-sparkProfile: 'spark3.0'
-sparkRuntime: 'spark3.0.2'
-  - flinkProfile: 'flink1.13'
-sparkProfile: 'spark2.4'
-sparkRuntime: 'spark2.4.8'
+sparkProfile: 'spark3.5'
+sparkRuntime: 'spark3.5.0'
+#  - flinkProfile: 'flink1.17'
+#sparkProfile: 'spark3.4'
+#sparkRuntime: 'spark3.4.0'
+#  - flinkProfile: 'flink1.17'
+#sparkProfile: 'spark3.3'
+#sparkRuntime: 'spark3.3.2'
+#  - flinkProfile: 'flink1.16'
+#sparkProfile: 'spark3.3'
+#sparkRuntime: 'spark3.3.2'
+#  - flinkProfile: 'flink1.15'
+#sparkProfile: 'spark3.3'
+#sparkRuntime: 'spark3.3.1'
+#  - flinkProfile: 'flink1.14'
+#sparkProfile: 'spark3.2'
+#sparkRuntime: 'spark3.2.3'
+#  - flinkProfile: 'flink1.13'
+#sparkProfile: 'spark3.1'
+#sparkRuntime: 'spark3.1.3'
+#  - flinkProfile: 'flink1.14'
+#sparkProfile: 'spark3.0'
+#sparkRuntime: 'spark3.0.2'
+#  - flinkProfile: 'flink1.13'
+#sparkProfile: 'spark2.4'
+#sparkRuntime: 'spark2.4.8'
+

Review Comment:
   I've built and uploaded the bundle validation image 
`apachehudi/hudi-ci-bundle-validation-base:flink1180hive313spark350`.  It's 
ready for use now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7102) A bug for the time travel queries for MOR tables

2023-11-15 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7102:
--
Description: 
Issue:
 # Based on the provided TIMESTAMP_AS_OF, a list of file slices are returned. 
However, these file slices that are returned are based on their base file 
timestamp. That means, these slices may contain log files whose timestamps are 
higher than the provided timestamp.
 # Such that, when we try to merge the logs in the reverse order, we may see 
these unqualified log files first, which triggers the "break" operation, and no 
merging will be done.

 

Solution:
 # The first solution is to filter the log files as well as the base files for 
the file slices. 
 # The second solution is to skip these unqualified log files, and keep merging.

 

Risk:
 * 1. Not sure if new bugs would be introduced by changing the current behavior.

  was:
The issue is:
 # Based on the provided TIMESTAMP_AS_OF, a list of file slices are returned. 
However, these file slices that are returned are based on their base file 
timestamp. That means, these slices may contain log files whose timestamps are 
higher than the provided timestamp.
 # Such that, when we try to merge the logs in the reverse order, we may see 
these unqualified log files first, which triggers the "break" operation, and no 
merging will be done.

 

Solution:
 # The first solution is to filter the log files as well as the base files for 
the file slices. But not sure if any other logic will be affected.
 # The second solution is to skip these unqualified log files, and keep 
merging. Not sure if any existing processing logic are based on this "break" 
logic.


> A bug for the time travel queries for MOR tables
> 
>
> Key: HUDI-7102
> URL: https://issues.apache.org/jira/browse/HUDI-7102
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Issue:
>  # Based on the provided TIMESTAMP_AS_OF, a list of file slices are returned. 
> However, these file slices that are returned are based on their base file 
> timestamp. That means, these slices may contain log files whose timestamps 
> are higher than the provided timestamp.
>  # Such that, when we try to merge the logs in the reverse order, we may see 
> these unqualified log files first, which triggers the "break" operation, and 
> no merging will be done.
>  
> Solution:
>  # The first solution is to filter the log files as well as the base files 
> for the file slices. 
>  # The second solution is to skip these unqualified log files, and keep 
> merging.
>  
> Risk:
>  * 1. Not sure if new bugs would be introduced by changing the current 
> behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7102) A bug for the time travel queries for MOR tables

2023-11-15 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7102:
--
Description: 
Issue:
 # Based on the provided TIMESTAMP_AS_OF, a list of file slices are returned. 
However, these file slices that are returned are based on their base file 
timestamp. That means, these slices may contain log files whose timestamps are 
higher than the provided timestamp.
 # Such that, when we try to merge the logs in the reverse order, we may see 
these unqualified log files first, which triggers the "break" operation, and no 
merging will be done.

 

Solution:
 # The first solution is to filter the log files as well as the base files for 
the file slices. 
 # The second solution is to skip these unqualified log files, and keep merging.

 

Risk:
 * Not sure if new bugs would be introduced by changing the current behavior.

  was:
Issue:
 # Based on the provided TIMESTAMP_AS_OF, a list of file slices is returned. 
However, the returned file slices are selected based on their base file 
timestamps. That means these slices may contain log files whose timestamps are 
higher than the provided timestamp.
 # As a result, when we merge the logs in reverse order, we may encounter these 
unqualified log files first, which triggers the "break" operation, and no 
merging is done.

 

Solution:
 # The first solution is to filter the log files as well as the base files for 
the file slices. 
 # The second solution is to skip these unqualified log files, and keep merging.

 

Risk:
 * 1. Not sure if new bugs would be introduced by changing the current behavior.


> A bug for the time travel queries for MOR tables
> 
>
> Key: HUDI-7102
> URL: https://issues.apache.org/jira/browse/HUDI-7102
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> Issue:
>  # Based on the provided TIMESTAMP_AS_OF, a list of file slices is returned. 
> However, the returned file slices are selected based on their base file 
> timestamps. That means these slices may contain log files whose timestamps 
> are higher than the provided timestamp.
>  # As a result, when we merge the logs in reverse order, we may encounter 
> these unqualified log files first, which triggers the "break" operation, and 
> no merging is done.
>  
> Solution:
>  # The first solution is to filter the log files as well as the base files 
> for the file slices. 
>  # The second solution is to skip these unqualified log files, and keep 
> merging.
>  
> Risk:
>  * Not sure if new bugs would be introduced by changing the current behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6702] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-15 Thread via GitHub


linliu-code commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1813516391

   > for bug fixes, we should have the jira filed and call out the scenarios 
where bugs could happen. Can you please file one and add details on what exact 
issue we are running into. @linliu-code Also, is it possible to add tests.
   
   This change fixed existing broken tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7102) A bug for the time travel queries for MOR tables

2023-11-15 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7102:
--
Summary: A bug for the time travel queries for MOR tables  (was: Fixed a 
bug for the time travel queries for MOR tables)

> A bug for the time travel queries for MOR tables
> 
>
> Key: HUDI-7102
> URL: https://issues.apache.org/jira/browse/HUDI-7102
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> The issue is:
>  # Based on the provided TIMESTAMP_AS_OF, a list of file slices is returned. 
> However, the returned file slices are selected based on their base file 
> timestamps. That means these slices may contain log files whose timestamps 
> are higher than the provided timestamp.
>  # As a result, when we merge the logs in reverse order, we may encounter 
> these unqualified log files first, which triggers the "break" operation, and 
> no merging is done.
>  
> Solution:
>  # The first solution is to filter the log files as well as the base files 
> for the file slices. But not sure if any other logic will be affected.
>  # The second solution is to skip these unqualified log files, and keep 
> merging. Not sure if any existing processing logic is based on this "break" 
> logic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7102) Fixed a bug for the time travel queries for MOR tables

2023-11-15 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu reassigned HUDI-7102:
-

Assignee: Lin Liu

> Fixed a bug for the time travel queries for MOR tables
> --
>
> Key: HUDI-7102
> URL: https://issues.apache.org/jira/browse/HUDI-7102
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>
> The issue is:
>  # Based on the provided TIMESTAMP_AS_OF, a list of file slices is returned. 
> However, the returned file slices are selected based on their base file 
> timestamps. That means these slices may contain log files whose timestamps 
> are higher than the provided timestamp.
>  # As a result, when we merge the logs in reverse order, we may encounter 
> these unqualified log files first, which triggers the "break" operation, and 
> no merging is done.
>  
> Solution:
>  # The first solution is to filter the log files as well as the base files 
> for the file slices. But not sure if any other logic will be affected.
>  # The second solution is to skip these unqualified log files, and keep 
> merging. Not sure if any existing processing logic is based on this "break" 
> logic.
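
For context, a time-travel read that exercises this code path can be issued 
roughly as below; the table path and instant are placeholders, and 
"as.of.instant" is the Hudi datasource read option behind TIMESTAMP_AS_OF 
style queries:
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Rough example of a time-travel read on a MOR table. Without the fix, the
// selected file slices can still carry log files written after the instant.
public class TimeTravelReadExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-time-travel-read")
        .master("local[*]")
        .getOrCreate();

    Dataset<Row> asOf = spark.read()
        .format("hudi")
        .option("as.of.instant", "20231101000000000")
        .load("/tmp/hudi/mor_table");

    asOf.show(10, false);
    spark.stop();
  }
}
{code}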



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

