Re: [PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-11 Thread via GitHub


xuzifu666 commented on code in PR #10314:
URL: https://github.com/apache/hudi/pull/10314#discussion_r1423577968


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -975,6 +975,13 @@ class HoodieSparkSqlWriterInternal {
       val metaSyncSuccess = metaSync(spark, HoodieWriterUtils.convertMapToHoodieConfig(parameters),
         tableInstantInfo.basePath, schema)
 
+      if (metaSyncSuccess) {
+        log.info("metaSyncSuccess " + tableInstantInfo.instantTime + " successful!")
+      }

Review Comment:
   The problem is that the job would stop without any error or details to dig into the cause; that would take more work without the log @danny0405 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-11 Thread via GitHub


xuzifu666 closed pull request #10314: [HUDI-7224] HoodieSparkSqlWriter metasync 
success or not show details messages log
URL: https://github.com/apache/hudi/pull/10314


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] how to config hudi table TTL in S3? The table_meta can be separated into a directory? [hudi]

2023-12-11 Thread via GitHub


zyclove opened a new issue, #10316:
URL: https://github.com/apache/hudi/issues/10316

   
   How do I configure a TTL policy for a hudi data table? Can the metadata (.hoodie) be separated into its own directory?
   The idea is to configure the appropriate TTL only for the data directory, so that data cleaning can use tiered storage with different life cycles and the data can be cleaned automatically by the object storage service at no cost.
   
   EG:
   
   s3://big-data-eu/hudi/data/bi_ods/table_name/dt=20231211/data
   < 30 days STANDARD S3
   > 30 days delete by TTL with no cost.
   =
   
   < 15 days STANDARD S3
   > 15 days GLACIER_IR
   > 105 days  delete by TTL with no cost.
   
   s3://big-data-eu/hudi/table_meta/bi_ods/table_name/.hoodie
   
   As mentioned above, if there are many data tables under the data storage and 
the storage periods are the same, I can just configure the storage period for 
the directory and rely on the object storage to automatically clean up the 
historical data at no cost.
   EG:
   
   s3://big-data-eu/hudi/data/30days/bi_ods/table_name/dt=20231211/data
   < 30 days STANDARD S3
   > 30 days delete by TTL with no cost.
   
   s3://big-data-eu/hudi/data/90days/bi_ods/table_name/dt=20231211/data
   < 30 days STANDARD S3
   30days < 90days GLACIER_IR
   > 90 days delete by TTL with no cost.
   
   =
   
   < 15 days STANDARD S3
   > 15 days GLACIER_IR
   > 105 days  delete by TTL with no cost.
   
   s3://big-data-eu/hudi/table_meta/bi_ods/table_name/.hoodie
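   (For illustration only — this is the object-storage side of the idea above, not something Hudi configures for you: a minimal boto3 sketch of a lifecycle rule scoped to one data prefix. The bucket and prefix names are taken from the example paths and are assumptions; it presumes the .hoodie metadata does not live under the same prefix, which is exactly what this issue is asking about.)
   
   ```
   import boto3

   # Hypothetical sketch matching the "90days" example above: STANDARD for the
   # first 30 days, GLACIER_IR from day 30, expire (delete at no extra cost) after 90 days.
   s3 = boto3.client("s3")
   s3.put_bucket_lifecycle_configuration(
       Bucket="big-data-eu",
       LifecycleConfiguration={
           "Rules": [{
               "ID": "hudi-data-90days",
               "Filter": {"Prefix": "hudi/data/90days/"},
               "Status": "Enabled",
               "Transitions": [{"Days": 30, "StorageClass": "GLACIER_IR"}],
               "Expiration": {"Days": 90},
           }]
       },
   )
   ```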
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] NPE fix while adding projection field & added its test cases [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10313:
URL: https://github.com/apache/hudi/pull/10313#issuecomment-1851464138

   
   ## CI report:
   
   * b9ebe136bdcafc4d5bbd407691f2420ccab45adc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21466)
 
   * 5273d8cc9ed428d2ac6896f52664618ed02c98a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21468)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-11 Thread via GitHub


lei-su-awx commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851457169

   Hi @danny0405 , do you mean like this:
   https://github.com/apache/hudi/assets/19327659/946c95b5-77f5-4d34-a315-a284c6b95b37
   I tried this, but it still reads the other partitions' data files to resolve the schema. I think this kind of filter takes effect after the source has listed all partitions' files, but I want a config that tells the source to read only the partitions in my configs.
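   (For illustration, a minimal PySpark sketch of the "filter on the partition column" approach discussed here; the column name `operation_type`, the variable `base_path`, and the option dict are assumptions taken from the issue description, and, as noted above, this filter prunes rows only after the source has listed the files.)
   
   ```
   # Hypothetical sketch: filter the streaming read on the partition column.
   stream_df = (
       spark.readStream
           .format("hudi")
           .options(**read_streaming_hudi_options)
           .load(base_path)
           .filter("operation_type = 'update'")  # row-level pruning, not file-listing pruning
   )
   ```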


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10314:
URL: https://github.com/apache/hudi/pull/10314#discussion_r1423561174


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -975,6 +975,13 @@ class HoodieSparkSqlWriterInternal {
       val metaSyncSuccess = metaSync(spark, HoodieWriterUtils.convertMapToHoodieConfig(parameters),
         tableInstantInfo.basePath, schema)
 
+      if (metaSyncSuccess) {
+        log.info("metaSyncSuccess " + tableInstantInfo.instantTime + " successful!")
+      }

Review Comment:
   I kind of remember that if the meta sync fails, the whole job just fails.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


xuzifu666 commented on code in PR #10297:
URL: https://github.com/apache/hudi/pull/10297#discussion_r1423553849


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -508,10 +509,7 @@ protected void doWrite(HoodieRecord record, Schema schema, TypedProperties props
       flushToDiskIfRequired(record, false);
       writeToBuffer(record);
     } catch (Throwable t) {
-      // Not throwing exception from here, since we don't want to fail the entire job
-      // for a single record
-      writeStatus.markFailure(record, t, recordMetadata);
-      LOG.error("Error writing record " + record, t);
+      throw new HoodieInsertException("Error writing record " + record, t);

Review Comment:
   Spark seems not to have that config @danny0405 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-7222) Fix the loose Scala style check

2023-12-11 Thread Lin Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795632#comment-17795632
 ] 

Lin Liu commented on HUDI-7222:
---

We found that the order of the imports is not enforced and unused imports are not removed.

I tried to enforce these rules, which caused hundreds of error messages. We need to understand the rationale for not enforcing these rules.

> Fix the loose Scala style check
> ---
>
> Key: HUDI-7222
> URL: https://issues.apache.org/jira/browse/HUDI-7222
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>
> We have seen the Scala code style is loose for many places, like malformed 
> imports. We have to fix this kind of problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] Flink cannot write data to HUDI when set metadata.enabled to true [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on issue #10306:
URL: https://github.com/apache/hudi/issues/10306#issuecomment-1851449422

   Do we have other exceptions? It looks like this exception is not the root cause; another exception probably relayed the exception msg and interrupted these tasks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6979][RFC-76] support event time based compaction strategy [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10266:
URL: https://github.com/apache/hudi/pull/10266#discussion_r1423554270


##
rfc/rfc-76/rfc-76.md:
##
@@ -0,0 +1,238 @@
+
+# RFC-[74]: [support EventTimeBasedCompactionStrategy]
+
+## Proposers
+
+- @waitingF
+
+## Approvers
+ - @
+ - @
+
+## Status
+
+JIRA: [HUDI-6979](https://issues.apache.org/jira/browse/HUDI-6979)
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+Currently, to gain low ingestion latency, we can adopt the MergeOnRead table, which supports appending log files and
+compacting log files into base files later. When querying the snapshot table (RT table) generated by MOR,
+the query side has to perform a compaction to get all the data, which is expected to be time-consuming and causes query latency.
+At the same time, hudi provides the read-optimized table (RO table) for low query latency, just like COW.
+
+But currently, there is no compaction strategy based on event time, so there is no data freshness guarantee for the RO table.
+For cases where users want all data before a specified time, they have to query the RT table to get all data, with the expected high query latency.

Review Comment:
   Can you modify the doc based on our new file slicing algorithm?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


xuzifu666 commented on code in PR #10297:
URL: https://github.com/apache/hudi/pull/10297#discussion_r1423553849


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -508,10 +509,7 @@ protected void doWrite(HoodieRecord record, Schema schema, TypedProperties props
       flushToDiskIfRequired(record, false);
       writeToBuffer(record);
     } catch (Throwable t) {
-      // Not throwing exception from here, since we don't want to fail the entire job
-      // for a single record
-      writeStatus.markFailure(record, t, recordMetadata);
-      LOG.error("Error writing record " + record, t);
+      throw new HoodieInsertException("Error writing record " + record, t);

Review Comment:
   Spark seems not to have that config



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851443273

   Did you try to add a filter condition on the partition fields?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10297:
URL: https://github.com/apache/hudi/pull/10297#discussion_r1423550203


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -508,10 +509,7 @@ protected void doWrite(HoodieRecord record, Schema schema, TypedProperties props
       flushToDiskIfRequired(record, false);
       writeToBuffer(record);
     } catch (Throwable t) {
-      // Not throwing exception from here, since we don't want to fail the entire job
-      // for a single record
-      writeStatus.markFailure(record, t, recordMetadata);
-      LOG.error("Error writing record " + record, t);
+      throw new HoodieInsertException("Error writing record " + record, t);

Review Comment:
   Does Spark have a similar option?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] NPE fix while adding projection field & added its test cases [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10313:
URL: https://github.com/apache/hudi/pull/10313#discussion_r1423548002


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java:
##
@@ -86,7 +86,7 @@ private static Configuration addProjectionField(Configuration conf, String field
 
   public static void addProjectionField(Configuration conf, String[] fieldName) {
     if (fieldName.length > 0) {
-      List<String> columnNameList = Arrays.stream(conf.get(serdeConstants.LIST_COLUMNS).split(",")).collect(Collectors.toList());
+      List<String> columnNameList = Arrays.stream(conf.get(serdeConstants.LIST_COLUMNS, "").split(",")).collect(Collectors.toList());
       Arrays.stream(fieldName).forEach(field -> {

Review Comment:
   When is `LIST_COLUMNS` set, and when is it not?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] NPE fix while adding projection field & added its test cases [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10313:
URL: https://github.com/apache/hudi/pull/10313#issuecomment-1851403241

   
   ## CI report:
   
   * b9ebe136bdcafc4d5bbd407691f2420ccab45adc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21466)
 
   * 5273d8cc9ed428d2ac6896f52664618ed02c98a1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10314:
URL: https://github.com/apache/hudi/pull/10314#issuecomment-1851396191

   
   ## CI report:
   
   * 88b9f8d9518f5afd376479ba9c87a8dd30170ffc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21467)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] NPE fix while adding projection field & added its test cases [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10313:
URL: https://github.com/apache/hudi/pull/10313#issuecomment-1851396149

   
   ## CI report:
   
   * b9ebe136bdcafc4d5bbd407691f2420ccab45adc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21466)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7132] Data may be lost for flink task failure [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10312:
URL: https://github.com/apache/hudi/pull/10312#issuecomment-1851396107

   
   ## CI report:
   
   * 5c971e1a0cafb635ad9cfed0f452751314bdb21c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21465)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10314:
URL: https://github.com/apache/hudi/pull/10314#issuecomment-1851388190

   
   ## CI report:
   
   * 88b9f8d9518f5afd376479ba9c87a8dd30170ffc UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] NPE fix while adding projection field & added its test cases [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10313:
URL: https://github.com/apache/hudi/pull/10313#issuecomment-1851388151

   
   ## CI report:
   
   * b9ebe136bdcafc4d5bbd407691f2420ccab45adc UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7132] Data may be lost for flink task failure [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10312:
URL: https://github.com/apache/hudi/pull/10312#issuecomment-1851388109

   
   ## CI report:
   
   * 5c971e1a0cafb635ad9cfed0f452751314bdb21c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-11 Thread via GitHub


lei-su-awx opened a new issue, #10315:
URL: https://github.com/apache/hudi/issues/10315

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I have a table partitioned by operation type and ingestion date (like insert/2023-12-11/, update/2023-12-11/, delete/2023-12-11/). When I read this table (using spark readStream), I just want to read the data under the `update` partition. I found the config 'hoodie.datasource.read.incr.path.glob', so I used it with the value `update/202*`. But the spark job init is very slow, and the job gets stuck:
   https://github.com/apache/hudi/assets/19327659/febba726-b441-4b01-abb0-2e12f8bc62d7
   But this parquet file is not under the `update` partition, it is under the `insert` partition, which is very confusing.
   So I want to ask: is there a config that reads only the target partition, skips the others, and also does not read other partitions' data files to get the schema?
   
   
   **Expected behavior**
   
   I want to know whether there is a config that reads only the target partition, skips the others, and does not read other partitions' data files to get the schema.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.4.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   I use the below configurations to write to the table:
   hudi_write_options = {
       'hoodie.table.name': hudi_table_name,
       'hoodie.datasource.write.partitionpath.field': 'operation_type, ingestion_dt',
       'hoodie.datasource.write.operation': 'insert',
       'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
       'hoodie.parquet.compression.codec': 'zstd',
       "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
       "hoodie.datasource.write.reconcile.schema": True,
       "hoodie.metadata.enable": True
   }
   
   And I use these configurations to read from the table:
   read_streaming_hudi_options = {
       'maxFilesPerTrigger': 5,
       'hoodie.datasource.read.incr.path.glob': 'update/202*',
       'hoodie.read.timeline.holes.resolution.policy': 'BLOCK',
       'hoodie.datasource.read.file.index.listing.partition-path-prefix.analysis.enabled': False,
       'hoodie.file.listing.parallelism': 1000,
       'hoodie.metadata.enable': True,
       'hoodie.datasource.read.schema.use.end.instanttime': True,
       'hoodie.datasource.streaming.startOffset': '202312110'
   }
   
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10297:
URL: https://github.com/apache/hudi/pull/10297#issuecomment-1851381361

   
   ## CI report:
   
   * 68f3125e06ebb01154d659c2b4452c7ac4c5aa25 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21461)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7211) Relax need of ordering/precombine field for tables with autogenerated record keys for DeltaStreamer

2023-12-11 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-7211:

Description: 
[https://github.com/apache/hudi/issues/10233]

 

```
NOW=$(date '+%Y%m%dt%H%M%S')
${SPARK_HOME}/bin/spark-submit \
--jars ${path_prefix}/jars/${SPARK_V}/hudi-spark${SPARK_VERSION}-bundle_2.12-${HUDI_VERSION}.jar \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
${path_prefix}/jars/${SPARK_V}/hudi-utilities-slim-bundle_2.12-${HUDI_VERSION}.jar \
--target-base-path ${path_prefix}/testcases/stocks/data/target/${NOW} \
--target-table stocks${NOW} \
--table-type COPY_ON_WRITE \
--base-file-format PARQUET \
--props ${path_prefix}/testcases/stocks/configs/hoodie.properties \
--source-class org.apache.hudi.utilities.sources.JsonDFSSource \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc \
--hoodie-conf hoodie.deltastreamer.schemaprovider.target.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc \
--op UPSERT \
--spark-master yarn \
--hoodie-conf hoodie.deltastreamer.source.dfs.root=${path_prefix}/testcases/stocks/data/source_without_ts \
--hoodie-conf hoodie.datasource.write.partitionpath.field=date \
--hoodie-conf hoodie.datasource.write.keygenerator.type=SIMPLE \
--hoodie-conf hoodie.datasource.write.hive_style_partitioning=false \
--hoodie-conf hoodie.metadata.enable=true
```

  was:
[https://github.com/apache/hudi/issues/10233]

 

Reproducible code - 
https://github.com/apache/hudi/issues/10233#issuecomment-1849561433


> Relax need of ordering/precombine field for tables with autogenerated record 
> keys for DeltaStreamer
> ---
>
> Key: HUDI-7211
> URL: https://issues.apache.org/jira/browse/HUDI-7211
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.14.1
>
>
> [https://github.com/apache/hudi/issues/10233]
>  
> ```
> NOW=$(date '+%Y%m%dt%H%M%S')
> ${SPARK_HOME}/bin/spark-submit \
> --jars 
> ${path_prefix}/jars/${SPARK_V}/hudi-spark${SPARK_VERSION}-bundle_2.12-${HUDI_VERSION}.jar
>  \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> ${path_prefix}/jars/${SPARK_V}/hudi-utilities-slim-bundle_2.12-${HUDI_VERSION}.jar
>  \
> --target-base-path ${path_prefix}/testcases/stocks/data/target/${NOW} \
> --target-table stocks${NOW} \
> --table-type COPY_ON_WRITE \
> --base-file-format PARQUET \
> --props ${path_prefix}/testcases/stocks/configs/hoodie.properties \
> --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
> --hoodie-conf 
> hoodie.deltastreamer.schemaprovider.source.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc
>  \
> --hoodie-conf 
> hoodie.deltastreamer.schemaprovider.target.schema.file=${path_prefix}/testcases/stocks/data/schema_without_ts.avsc
>  \
> --op UPSERT \
> --spark-master yarn \
> --hoodie-conf 
> hoodie.deltastreamer.source.dfs.root=${path_prefix}/testcases/stocks/data/source_without_ts
>  \
> --hoodie-conf hoodie.datasource.write.partitionpath.field=date \
> --hoodie-conf hoodie.datasource.write.keygenerator.type=SIMPLE \
> --hoodie-conf hoodie.datasource.write.hive_style_partitioning=false \
> --hoodie-conf hoodie.metadata.enable=true
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7211) Relax need of ordering/precombine field for tables with autogenerated record keys for DeltaStreamer

2023-12-11 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-7211:

Summary: Relax need of ordering/precombine field for tables with 
autogenerated record keys for DeltaStreamer  (was: Relax need of precombine key 
for tables with autogenerated record keys)

> Relax need of ordering/precombine field for tables with autogenerated record 
> keys for DeltaStreamer
> ---
>
> Key: HUDI-7211
> URL: https://issues.apache.org/jira/browse/HUDI-7211
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.14.1
>
>
> [https://github.com/apache/hudi/issues/10233]
>  
> Reproducible code - 
> https://github.com/apache/hudi/issues/10233#issuecomment-1849561433



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] restart flink job got InvalidAvroMagicException: Not an Avro data file [hudi]

2023-12-11 Thread via GitHub


zlinsc commented on issue #10285:
URL: https://github.com/apache/hudi/issues/10285#issuecomment-1851376859

   > see the fix: https://github.com/apache/hudi/pull/10298/files
   
   thx @danny0405 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The Schema Evolution Not working For Hudi 0.12.3 [hudi]

2023-12-11 Thread via GitHub


ad1happy2go commented on issue #10309:
URL: https://github.com/apache/hudi/issues/10309#issuecomment-1851374103

   @Amar1404 Can you give more details, like the table/writer configurations you are using? I tried a simple scenario and schema evolution from long to double works fine.
   
   ```
   schema1 = StructType(
       [
           StructField("id", IntegerType(), True),
           StructField("value", LongType(), True)
       ]
   )

   schema2 = StructType(
       [
           StructField("id", IntegerType(), True),
           StructField("value", DoubleType(), True)
       ]
   )

   data1 = [
       Row(1, 100),
       Row(2, 100),
       Row(3, 100),
   ]

   data2 = [
       Row(1, 100.1),
       Row(2, 200.2),
       Row(3, 100.0),
   ]

   hudi_configs = {
       "hoodie.table.name": TABLE_NAME,
       "hoodie.datasource.write.precombine.field": "value",
       "hoodie.datasource.write.recordkey.field": "id"
   }

   df = spark.createDataFrame(spark.sparkContext.parallelize(data1), schema1)
   df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)
   spark.read.format("org.apache.hudi").load(PATH).printSchema()
   spark.read.format("org.apache.hudi").load(PATH).show()
   df = spark.createDataFrame(spark.sparkContext.parallelize(data2), schema2)
   df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)
   spark.read.format("org.apache.hudi").load(PATH).printSchema()
   spark.read.format("org.apache.hudi").load(PATH).show()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7224) HoodieSparkSqlWriter metasync success or not show details messages log

2023-12-11 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7224:
-
Description: HoodieSparkSqlWriter metasync success or not should show a detailed messages log; currently only commitSuccess shows logs

> HoodieSparkSqlWriter metasync success or not show details messages log
> --
>
> Key: HUDI-7224
> URL: https://issues.apache.org/jira/browse/HUDI-7224
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>
> HoodieSparkSqlWriter metasync success or not should show a detailed messages
> log; currently only commitSuccess shows logs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7224) HoodieSparkSqlWriter metasync success or not show details messages log

2023-12-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7224:
-
Labels: pull-request-available  (was: )

> HoodieSparkSqlWriter metasync success or not show details messages log
> --
>
> Key: HUDI-7224
> URL: https://issues.apache.org/jira/browse/HUDI-7224
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-11 Thread via GitHub


xuzifu666 opened a new pull request, #10314:
URL: https://github.com/apache/hudi/pull/10314

   ### Change Logs
   
   Make HoodieSparkSqlWriter log a detailed message on metasync success or failure, to help users confirm errors in the sparksql writing process
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7224) HoodieSparkSqlWriter metasync success or not show details messages log

2023-12-11 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7224:
-
Fix Version/s: 0.14.1

> HoodieSparkSqlWriter metasync success or not show details messages log
> --
>
> Key: HUDI-7224
> URL: https://issues.apache.org/jira/browse/HUDI-7224
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Priority: Major
> Fix For: 0.14.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7224) HoodieSparkSqlWriter metasync success or not show details messages log

2023-12-11 Thread xy (Jira)
xy created HUDI-7224:


 Summary: HoodieSparkSqlWriter metasync success or not show details 
messages log
 Key: HUDI-7224
 URL: https://issues.apache.org/jira/browse/HUDI-7224
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xy






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7224) HoodieSparkSqlWriter metasync success or not show details messages log

2023-12-11 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7224:
-
Component/s: spark-sql

> HoodieSparkSqlWriter metasync success or not show details messages log
> --
>
> Key: HUDI-7224
> URL: https://issues.apache.org/jira/browse/HUDI-7224
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] NPE fix while adding projection field & added its test cases [hudi]

2023-12-11 Thread via GitHub


prathit06 opened a new pull request, #10313:
URL: https://github.com/apache/hudi/pull/10313

   ### Change Logs
   
   Describe context and summary for this change :
   
   While using `HoodieParquetInputFormat` to read a partitioned hudi table from Flink, we encountered the exception below:
   
   ```java.lang.NullPointerException
at 
org.apache.hudi.hadoop.utils.HoodieRealtimeInputFormatUtils.addProjectionField(HoodieRealtimeInputFormatUtils.java:88)
at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:90)
at 
org.apache.flink.api.java.hadoop.mapred.HadoopInputFormatBase.open(HadoopInputFormatBase.java:176)
at 
org.apache.flink.api.java.hadoop.mapred.HadoopInputFormatBase.open(HadoopInputFormatBase.java:58)
at 
org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333)```
   
   Table type : cow & insert_overwrite 
   Hudi version : 0.13.1 
   Flink version : 1.16.0 
   
   This PR fixes the NPE & Also adds a simple test case for any further testing 
in `HoodieRealtimeInputFormatUtils` 
   
   ### Impact
   
   Describe any public API or user-facing feature change or any performance 
impact._ : NA
   
   ### Risk level (write none, low medium or high below)
   
   If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7132) Data may be lost in Flink checkpoint

2023-12-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7132:
-
Labels: pull-request-available  (was: )

> Data may be lost in Flink checkpoint
> 
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1, 0.14.0
>Reporter: Bo Cui
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code, eventBuffer may be updated by `subtaskFailed`, and some
> elements of eventBuffer are null
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7132] Data may be lost in Flink checkpoint [hudi]

2023-12-11 Thread via GitHub


danny0405 opened a new pull request, #10312:
URL: https://github.com/apache/hudi/pull/10312

   ### Change Logs
   
   See the context in https://issues.apache.org/jira/browse/HUDI-7132; there is no need to clean the events on the task failure signal, since the events will be overridden when new events are received.
   
   ### Impact
   
   fix the potential data loss.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Incoming batch schema is not compatible with the table's one #9980 [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10308:
URL: https://github.com/apache/hudi/pull/10308#issuecomment-1851342245

   
   ## CI report:
   
   * 737e09fc37912e88f640393b11357cb8b27a29c5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21464)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Incoming batch schema is not compatible with the table's one #9980 [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10308:
URL: https://github.com/apache/hudi/pull/10308#issuecomment-1851336257

   
   ## CI report:
   
   * 737e09fc37912e88f640393b11357cb8b27a29c5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10307:
URL: https://github.com/apache/hudi/pull/10307#issuecomment-1851330034

   
   ## CI report:
   
   * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21462)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Incoming batch schema is not compatible with the table's one [hudi]

2023-12-11 Thread via GitHub


njalan commented on issue #9980:
URL: https://github.com/apache/hudi/issues/9980#issuecomment-1851329200

   @ad1happy2go  I just raised a PR: https://github.com/apache/hudi/pull/10308.
   Could you please kindly review it? It is my first PR for hudi, and I am not sure whether it can be merged or not.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The Schema Evolution Not working For Hudi 0.12.3 [hudi]

2023-12-11 Thread via GitHub


Amar1404 closed issue #10310: The Schema Evolution Not working For Hudi 0.12.3
URL: https://github.com/apache/hudi/issues/10310


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] [hudi]

2023-12-11 Thread via GitHub


Amar1404 closed issue #10311: [SUPPORT]
URL: https://github.com/apache/hudi/issues/10311


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] [hudi]

2023-12-11 Thread via GitHub


Amar1404 opened a new issue, #10311:
URL: https://github.com/apache/hudi/issues/10311

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] The Schema Evolution Not working For Hudi 0.12.3 [hudi]

2023-12-11 Thread via GitHub


Amar1404 opened a new issue, #10310:
URL: https://github.com/apache/hudi/issues/10310

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I have a column that was earlier a LONG and has been changed to DOUBLE. The schema evolution is not working.
   The data is written, but while reading the table the old parquet data file cannot be read, throwing **ERROR: the parquet file could not be read since the column expects DOUBLE but got INT64**
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. I have a column 
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.12.3
   
   * Spark version : 3.3
   
   * Hive version :3
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : EMR
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] The Schema Evolution Not working For Hudi 0.12.3 [hudi]

2023-12-11 Thread via GitHub


Amar1404 opened a new issue, #10309:
URL: https://github.com/apache/hudi/issues/10309

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I have a column that was earlier a LONG and has been changed to DOUBLE. The schema evolution is not working.
   The data is written, but while reading the table the old parquet data file cannot be read, throwing **ERROR: the parquet file could not be read since the column expects DOUBLE but got INT64**
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. I have a column 
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.12.3
   
   * Spark version : 3.3
   
   * Hive version :3
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : EMR
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Incoming batch schema is not compatible with the table's one #9980 [hudi]

2023-12-11 Thread via GitHub


njalan opened a new pull request, #10308:
URL: https://github.com/apache/hudi/pull/10308

   ### Change Logs
   
Use the meta sync database to fill `hoodie.table.name` if it is not set
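
   (Not part of this PR — a minimal PySpark sketch of the current workaround, i.e. setting the table name and the hive-sync database explicitly so two tables with the same name do not collide under the 'default' schema. The option keys are the standard Hudi datasource configs; the values, `df`, and `base_path` are placeholders.)

   ```
   # Hypothetical illustration; replace names/paths with your own.
   hudi_options = {
       "hoodie.table.name": "orders",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.precombine.field": "ts",
       # Sync into a dedicated database instead of relying on 'default'.
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.database": "bi_ods",
       "hoodie.datasource.hive_sync.table": "orders",
       "hoodie.datasource.hive_sync.mode": "hms",
   }

   df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
   ```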

   
   ### Impact
   
   Fixes an issue where a table with the same name under the hive 'default' schema would cause "Incoming batch schema is not compatible with the table's one".
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7132) Data may be lost in Flink checkpoint

2023-12-11 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7132:
-
Summary: Data may be lost in Flink checkpoint  (was: Data may be lost in 
flink#chk)

> Data may be lost in Flink checkpoint
> 
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1, 0.14.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code, eventBuffer may be updated by `subtaskFailed`, and some
> elements of eventBuffer are null
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-6658) Implement MOR Incremental for new file format

2023-12-11 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-6658.
-
Resolution: Done

> Implement MOR Incremental for new file format
> -
>
> Key: HUDI-6658
> URL: https://issues.apache.org/jira/browse/HUDI-6658
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Implement MOR Incremental for new file format



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-6658] Inject filters for incremental query (#10225)

2023-12-11 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new cacbb82254c [HUDI-6658] Inject filters for incremental query  (#10225)
cacbb82254c is described below

commit cacbb82254c840b96f879c8a577a6e91aff3f57e
Author: Jon Vexler 
AuthorDate: Tue Dec 12 00:08:23 2023 -0500

[HUDI-6658] Inject filters for incremental query  (#10225)

Add incremental filters to the query plan

Also fix some tests that use partition path as the precombine field

Incremental queries will now work with the new filegroup reader
---
 .../spark/sql/HoodieCatalystPlansUtils.scala   |  11 +-
 .../main/scala/org/apache/hudi/DefaultSource.scala |  14 +-
 .../scala/org/apache/hudi/HoodieCDCFileIndex.scala |   5 +
 .../hudi/HoodieHadoopFsRelationFactory.scala   |   6 +-
 .../apache/hudi/HoodieIncrementalFileIndex.scala   |   2 +-
 .../sql/FileFormatUtilsForFileGroupReader.scala| 128 +++
 ...odieFileGroupReaderBasedParquetFileFormat.scala |  36 ++-
 .../parquet/NewHoodieParquetFileFormat.scala   |   2 +
 .../spark/sql/hudi/analysis/HoodieAnalysis.scala   |   2 +-
 .../TestGlobalIndexEnableUpdatePartitions.java |   6 +
 .../TestIncrementalReadWithFullTableScan.scala |  27 +--
 .../hudi/functional/TestSparkDataSource.scala  |   4 +
 .../hudi/procedure/TestClusteringProcedure.scala   | 255 +++--
 .../procedure/TestHoodieLogFileProcedure.scala |  11 +-
 .../spark/sql/HoodieSpark2CatalystPlanUtils.scala  |  27 ++-
 .../spark/sql/HoodieSpark3CatalystPlanUtils.scala  |  25 +-
 .../spark/sql/HoodieSpark30CatalystPlanUtils.scala |   4 +-
 .../spark/sql/HoodieSpark31CatalystPlanUtils.scala |   4 +-
 .../spark/sql/HoodieSpark32CatalystPlanUtils.scala |   4 +-
 .../spark/sql/HoodieSpark33CatalystPlanUtils.scala |   4 +-
 .../spark/sql/HoodieSpark34CatalystPlanUtils.scala |   4 +-
 .../spark/sql/HoodieSpark35CatalystPlanUtils.scala |   4 +-
 .../deltastreamer/TestHoodieDeltaStreamer.java |   4 +-
 .../utilities/sources/TestHoodieIncrSource.java|   6 +
 24 files changed, 372 insertions(+), 223 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystPlansUtils.scala
 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystPlansUtils.scala
index 64ee645ba0f..b9110f1ed93 100644
--- 
a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystPlansUtils.scala
+++ 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystPlansUtils.scala
@@ -102,9 +102,11 @@ trait HoodieCatalystPlansUtils {
* Spark requires file formats to append the partition path fields to the 
end of the schema.
* For tables where the partition path fields are not at the end of the 
schema, we don't want
* to return the schema in the wrong order when they do a query like "select 
*". To fix this
-   * behavior, we apply a projection onto FileScan when the file format is 
NewHudiParquetFileFormat
+   * behavior, we apply a projection onto FileScan when the file format has 
HoodieFormatTrait
+   *
+   * Additionally, incremental queries require filters to be added to the plan
*/
-  def applyNewHoodieParquetFileFormatProjection(plan: LogicalPlan): LogicalPlan
+  def maybeApplyForNewFileFormat(plan: LogicalPlan): LogicalPlan
 
   /**
* Decomposes [[InsertIntoStatement]] into its arguments allowing to 
accommodate for API
@@ -140,4 +142,9 @@ trait HoodieCatalystPlansUtils {
   def failAnalysisForMIT(a: Attribute, cols: String): Unit = {}
 
   def createMITJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, 
condition: Option[Expression], hint: String): LogicalPlan
+
+  /**
+   * true if both plans produce the same attributes in the the same order
+   */
+  def produceSameOutput(a: LogicalPlan, b: LogicalPlan): Boolean
 }
diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
index 168502b3f08..ac8286b1bde 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
@@ -285,7 +285,12 @@ object DefaultSource {
 resolveBaseFileOnlyRelation(sqlContext, globPaths, userSchema, 
metaClient, parameters)
   }
 case (COPY_ON_WRITE, QUERY_TYPE_INCREMENTAL_OPT_VAL, _) =>
-  new IncrementalRelation(sqlContext, parameters, userSchema, 
metaClient)
+  if (fileFormatUtils.isDefined) {
+new HoodieCopyOnWriteIncrementalHadoopFsRelationFactory(
+  sqlContext, metaClient, parameters, userSchema, 
isBootstrappedTable).build()
+  }

Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-12-11 Thread via GitHub


codope merged PR #10225:
URL: https://github.com/apache/hudi/pull/10225


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6658) Implement MOR Incremental for new file format

2023-12-11 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6658:
--
Fix Version/s: 1.0.0

> Implement MOR Incremental for new file format
> -
>
> Key: HUDI-6658
> URL: https://issues.apache.org/jira/browse/HUDI-6658
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Implement MOR Incremental for new file format



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk

2023-12-11 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7132:
-
Affects Version/s: (was: 1.1.0)

> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1, 0.14.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code, eventBuffer may be updated by `subtaskFailed`, and 
> some elements of eventBuffer are null
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-7132) Data may be lost in flink#chk

2023-12-11 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795589#comment-17795589
 ] 

Danny Chen commented on HUDI-7132:
--

Did you try this fix: https://github.com/apache/hudi/pull/9867/files

> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1, 0.14.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code, eventBuffer may be updated by `subtaskFailed`, and 
> some elements of eventBuffer are null
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk

2023-12-11 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7132:
-
Affects Version/s: 0.14.0
   0.13.1

> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1, 0.14.0, 1.1.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code, eventBuffer may be updated by `subtaskFailed`, and 
> some elements of eventBuffer are null
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [I] [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key [hudi]

2023-12-11 Thread via GitHub


ad1happy2go commented on issue #10303:
URL: https://github.com/apache/hudi/issues/10303#issuecomment-1851290166

   @srinikandi I see a fix (https://github.com/apache/hudi/pull/4201) was tried 
but then reverted due to another issue. Will look into it. Thanks for raising 
this again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6658] inject filters for incremental query [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10225:
URL: https://github.com/apache/hudi/pull/10225#issuecomment-1851286731

   
   ## CI report:
   
   * 8cc63ddef9ccc12489d0461f1c928e5c579b9277 UNKNOWN
   * bf5722620a6d6c4ab30944c2ebec3d17cd65f625 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21458)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10263:
URL: https://github.com/apache/hudi/pull/10263#discussion_r1423390827


##
hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/DataGenerationPlan.java:
##
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.testutils.reader;
+
+import java.util.List;
+
+/**
+ * The blueprint of records that will be generated
+ * by the data generator.
+ *
+ * Current limitations:
+ * 1. One plan generates one file, either a base file, or a log file.
+ * 2. One file contains one operation, e.g., insert, delete, or update.
+ */
+public class DataGenerationPlan {

Review Comment:
   Is this going to be integrated with `HoodieTestTable` or similar to generate 
test tables without using transactional writes?
   



##
hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieTestReaderContext.java:
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.testutils.reader;
+
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.engine.HoodieReaderContext;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieAvroIndexedRecord;
+import org.apache.hudi.common.model.HoodieAvroRecordMerger;
+import org.apache.hudi.common.model.HoodieEmptyRecord;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieOperation;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordMerger;
+import org.apache.hudi.common.util.ConfigUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.io.storage.HoodieAvroParquetReader;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.function.UnaryOperator;
+
+import static 
org.apache.hudi.common.model.HoodieRecordMerger.DEFAULT_MERGER_STRATEGY_UUID;
+import static 
org.apache.hudi.common.testutils.reader.HoodieFileSliceTestUtils.ROW_KEY;
+
+public class HoodieTestReaderContext extends 
HoodieReaderContext {
+  @Override
+  public FileSystem getFs(String path, Configuration conf) {
+return FSUtils.getFs(path, conf);
+  }
+
+  @Override
+  public ClosableIterator getFileRecordIterator(
+  Path filePath,
+  long start,
+  long length,
+  Schema dataSchema,
+  Schema requiredSchema,
+  Configuration conf
+  ) throws IOException {
+HoodieAvroParquetReader reader = new HoodieAvroParquetReader(conf, 
filePath);
+return reader.getIndexedRecordIterator(dataSchema, requiredSchema);
+  }
+
+  @Override
+  public IndexedRecord convertAvroRecord(IndexedRecord record) {
+return record;
+  }
+
+  @Override
+  public HoodieRecordMerger getRecordMerger(String mergerStrategy) {
+switch (mergerStrategy) {
+  case DEFAULT_MERGER_STRATEGY_UUID:
+return new HoodieAvroRecordMerger();
+  default:
+throw new HoodieException(
+"The merger strategy UUID is not supported: " + mergerStrategy);
+   

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10307:
URL: https://github.com/apache/hudi/pull/10307#issuecomment-1851281091

   
   ## CI report:
   
   * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21462)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10307:
URL: https://github.com/apache/hudi/pull/10307#issuecomment-1851252749

   
   ## CI report:
   
   * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-11 Thread via GitHub


the-other-tim-brown commented on code in PR #10307:
URL: https://github.com/apache/hudi/pull/10307#discussion_r1423363621


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -351,23 +351,15 @@ private Pair> 
getFilesToCleanKeepingLatestCommits(S
 continue;
   }
 
-  if (policy == HoodieCleaningPolicy.KEEP_LATEST_COMMITS) {
+  if (policy == HoodieCleaningPolicy.KEEP_LATEST_COMMITS || policy == 
HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {

Review Comment:
   If we agree this is the correct behavior, I can update or add unit tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-11 Thread via GitHub


the-other-tim-brown opened a new pull request, #10307:
URL: https://github.com/apache/hudi/pull/10307

   ### Change Logs
   
   The `KEEP_LATEST_BY_HOURS` cleaner policy will find the earliest time to 
retain and then find the earliest commit that is greater than or equal to that 
time:
   ```
   String earliestTimeToRetain = 
HoodieActiveTimeline.formatDate(Date.from(latestDateTime.minusHours(hoursRetained).toInstant()));
   earliestCommitToRetain = 
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream().filter(i 
-> 
   HoodieTimeline.compareTimestamps(i.getTimestamp(),
 HoodieTimeline.GREATER_THAN_OR_EQUALS, 
earliestTimeToRetain)).findFirst());
   ```
   In the CleanPlanner, any file slice before this `earliestCommitToRetain` 
will be removed. This can result in deleting files that were still relevant N 
hours ago, so users will not be able to query the table as of that time.
   
   For example, imagine you have configured keeping the latest 24 hours. A new 
base file is written for a file group at time `NOW-48hr`. An update comes in 
at time `NOW-8hr` and does not touch this file group. Another commit writes a 
new base file at time `NOW-1hr` and the cleaner kicks in. It will pick `NOW-8hr` 
as the earliest commit that is not more than 24 hours old (the earliest commit 
to retain), causing the file written at `NOW-48hr` to be removed even though it 
is still required if someone wants to do a time travel query for a point in 
time 8+ hours in the past.
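   
   A minimal, self-contained sketch of the selection this change argues for 
(illustrative only; `findCommitToRetain` and the use of `java.time` are 
assumptions for the example, not the actual CleanPlanner API): additionally 
retain the latest commit at or before the `NOW - N hours` cutoff, not just the 
earliest commit after it, so the files needed to reconstruct the view at the 
cutoff survive.
   ```
   import java.time.Instant;
   import java.time.temporal.ChronoUnit;
   import java.util.Arrays;
   import java.util.List;
   import java.util.Optional;
   
   public class RetentionSketch {
     // Hypothetical helper: returns the latest completed commit at or before the
     // cutoff. File slices already superseded before this commit can be cleaned;
     // anything newer, or still live at this commit, must be kept.
     static Optional<Instant> findCommitToRetain(List<Instant> commitsAscending,
                                                 int hoursRetained, Instant now) {
       Instant cutoff = now.minus(hoursRetained, ChronoUnit.HOURS);
       Optional<Instant> latestAtOrBeforeCutoff = Optional.empty();
       for (Instant commit : commitsAscending) {
         if (commit.isAfter(cutoff)) {
           break;
         }
         latestAtOrBeforeCutoff = Optional.of(commit);
       }
       return latestAtOrBeforeCutoff;
     }
   
     public static void main(String[] args) {
       Instant now = Instant.now();
       List<Instant> commits = Arrays.asList(
           now.minus(48, ChronoUnit.HOURS),  // base file for the untouched file group
           now.minus(8, ChronoUnit.HOURS),   // update to a different file group
           now.minus(1, ChronoUnit.HOURS));  // latest commit; cleaner runs after it
       // With 24h retention this prints the NOW-48hr commit: it is the latest commit
       // at or before the cutoff, so the base file written there is still needed for
       // a time travel query as of NOW-24hr and must not be cleaned.
       System.out.println(findCommitToRetain(commits, 24, now));
     }
   }
   ```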
   
   ### Impact
   
   Fixes expectations for time based cleaning and retention.
   
   ### Risk level (write none, low medium or high below)
   
   Low, will keep files longer for those using this policy
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7223) Hudi Cleaner removing files still required for view N hours old

2023-12-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7223:
-
Labels: pull-request-available  (was: )

> Hudi Cleaner removing files still required for view N hours old
> ---
>
> Key: HUDI-7223
> URL: https://issues.apache.org/jira/browse/HUDI-7223
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> If a user is using a time-based cleaner policy, they expect to be able to 
> query the table state as of N hours ago. This means that they do not want to 
> clean up all files older than N hours, but only the files that were no longer 
> relevant to the table as of N hours ago.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10304:
URL: https://github.com/apache/hudi/pull/10304#issuecomment-1851243823

   
   ## CI report:
   
   * d858eaac14b3de45d4066165622738d91ff603fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21456)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Flink cannot write data to HUDI when set metadata.enabled to true [hudi]

2023-12-11 Thread via GitHub


zdl1 commented on issue #10306:
URL: https://github.com/apache/hudi/issues/10306#issuecomment-1851240670

   When trying to restart the job, there are other exceptions:
   ```
   2023-12-12 11:23:25
   org.apache.flink.util.FlinkException: Global failure triggered by 
OperatorCoordinator for 'stream_write: test1' (operator 
8b0eff726c52aac1276bd5cfcb9bf178).
at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:191)
at 
org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:142)
at 
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.exception.HoodieException: Executor executes 
action [commits the instant 20231212110524061] error
... 6 more
   Caused by: org.apache.hudi.exception.HoodieException: Heartbeat for instant 
20231212110524061 has expired, last heartbeat 0
at 
org.apache.hudi.client.heartbeat.HeartbeatUtils.abortIfHeartbeatExpired(HeartbeatUtils.java:95)
at 
org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:225)
at 
org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:111)
at 
org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:74)
at 
org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:199)
at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:540)
at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:516)
at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:246)
at 
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
... 3 more
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] Flink cannot write data to HUDI when set metadata.enabled to true [hudi]

2023-12-11 Thread via GitHub


zdl1 opened a new issue, #10306:
URL: https://github.com/apache/hudi/issues/10306

   
   **Describe the problem you faced**
   When I set metadata.enabled to true in Flink, Hudi cannot complete the 
delta_commit successfully and the job keeps restarting.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. start flink sql client
   ```
   create table test1 (
   c1 int primary key,
   c2 int,
   c3 int
   ) with (
   'connector' = 'hudi',
   'path' = 'hdfs:/flink/test1',
   'table.type' = 'MERGE_ON_READ',
   'metadata.enabled' = 'true'
   );
   ```
   2.
   ```
   create table datagen1 (
   c1 int,
   c2 int,
   c3 int
   ) with (
   'connector' = 'datagen',
   'number-of-rows' = '300',
   'rows-per-second' = '10'
   );
   ```
   3.
   ```
   SET execution.checkpointing.interval=1000;
   insert into test1 select * from datagen1;
   ```
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Flink version : 1.14
   
   * Hive version :
   
   * Hadoop version : 
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   
   ```
   2023-12-12 11:07:55,633 INFO  
org.apache.flink.streaming.api.functions.source.datagen.DataGeneratorSource [] 
- generated 10 rows
   2023-12-12 11:07:55,633 WARN  
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory[] - I/O error 
constructing remote block reader.
   java.nio.channels.ClosedByInterruptException: null
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
 ~[?:1.8.0_382]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658) 
~[?:1.8.0_382]
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2946) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:815)
 ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:740)
 ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385)
 ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:696) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:655) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982) 
~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
at java.io.DataInputStream.read(DataInputStream.java:149) ~[?:1.8.0_382]
at java.io.DataInputStream.read(DataInputStream.java:100) ~[?:1.8.0_382]
at java.util.Properties$LineReader.readLine(Properties.java:435) 
~[?:1.8.0_382]
at java.util.Properties.load0(Properties.java:353) ~[?:1.8.0_382]
at java.util.Properties.load(Properties.java:341) ~[?:1.8.0_382]
at 
org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:351)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.common.table.HoodieTableConfig.(HoodieTableConfig.java:284)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.common.table.HoodieTableMetaClient.(HoodieTableMetaClient.java:138)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:689)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:81)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:770)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.table.HoodieFlinkTable.create(HoodieFlinkTable.java:62) 
~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.sink.partitioner.profile.WriteProfile.getTable(WriteProfile.java:138)
 ~[hudi-flink1.14-bundle-0.13.1.jar:0.13.1]
at 
org.apache.hudi.sink.partitioner.profile.DeltaWriteProfile.getFileSystemView(DeltaWriteProfile.java:93)
 ~[hudi-flink1.14-bu

[jira] [Created] (HUDI-7223) Hudi Cleaner removing files still required for view N hours old

2023-12-11 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-7223:
---

 Summary: Hudi Cleaner removing files still required for view N 
hours old
 Key: HUDI-7223
 URL: https://issues.apache.org/jira/browse/HUDI-7223
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Timothy Brown


If a user is using a time-based cleaner policy, they expect to be able to query 
the table state as of N hours ago. This means that they do not want to clean up 
all files older than N hours, but only the files that were no longer relevant to 
the table as of N hours ago.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7222) Fix the loose Scala style check

2023-12-11 Thread Lin Liu (Jira)
Lin Liu created HUDI-7222:
-

 Summary: Fix the loose Scala style check
 Key: HUDI-7222
 URL: https://issues.apache.org/jira/browse/HUDI-7222
 Project: Apache Hudi
  Issue Type: Task
Reporter: Lin Liu
Assignee: Lin Liu


We have seen that the Scala code style is loose in many places, such as 
malformed imports. We need to fix these problems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6979][RFC-76] support event time based compaction strategy [hudi]

2023-12-11 Thread via GitHub


waitingF commented on code in PR #10266:
URL: https://github.com/apache/hudi/pull/10266#discussion_r1423345721


##
rfc/rfc-76/rfc-76.md:
##
@@ -0,0 +1,238 @@
+
+# RFC-[74]: [support EventTimeBasedCompactionStrategy]
+
+## Proposers
+
+- @waitingF
+
+## Approvers
+ - @
+ - @
+
+## Status
+
+JIRA: [HUDI-6979](https://issues.apache.org/jira/browse/HUDI-6979)
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+Currently, to gain low ingestion latency, we can adopt the MergeOnRead table, 
which support appending log files and 
+compact log files into base file later. When querying the snapshot table (RT 
table) generated by MOR, 
+query side have to perform a compaction so that they can get all data, which 
is expected time-consuming causing query latency.
+At the time, hudi provide read-optimized table (RO table) for low query 
latency just like COW.
+
+But currently, there is no compaction strategy based on event time, so there 
is no data freshness guarantee for RO table.
+For cases, user want all data before a specified time, user have to query the 
RT table to get all data with expected high query latency.

Review Comment:
   > With our new file slicing under unbounded io compaction strategy, a 
compaction plan at t is designated as including all the log files complete 
before t, does that make sense to your use case? You can then query the ro 
table after the compaction completes, the ro table data freshness is at least 
up to t.
   
   I don't think so: there will be file groups in pending compaction that get 
skipped when scheduling a compaction plan. In that case, the rule that "the ro 
table data freshness is at least up to `t`" is broken, because those file 
groups may still contain historical data.
   We should ensure that all log files before `t` are compacted, which means we 
should only generate a new plan when no file group is in pending 
compaction/clustering, i.e., no pending compaction or clustering is left. For 
this, we can introduce a new trigger.
   
   > The question is how to tell the reader the freshness of the ro table?
   
   Yeah, this is part of the RFC. We can extract the freshness from the log 
files during compaction.
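   
   A purely illustrative sketch of the trigger idea above (standalone 
pseudologic; names such as `hasPendingCompactionOrClustering` and 
`maxEventTimeOfCompletedLogFiles` are assumptions for the example, not actual 
Hudi APIs): only generate an event-time-based plan when nothing is pending, and 
surface the event-time watermark the RO table can then guarantee.
   ```
   import java.util.Optional;
   
   class EventTimeCompactionTriggerSketch {
     // Assumed inputs; a real implementation would derive these from the timeline.
     interface TimelineView {
       boolean hasPendingCompactionOrClustering();
       long maxEventTimeOfCompletedLogFiles(); // candidate freshness watermark (epoch millis)
     }
   
     // Returns the watermark `t` to compact up to, or empty if no plan should be
     // generated yet: with file groups still in pending compaction/clustering, the
     // guarantee "RO freshness is at least up to t" could not hold.
     static Optional<Long> maybeScheduleEventTimeCompaction(TimelineView view) {
       if (view.hasPendingCompactionOrClustering()) {
         return Optional.empty(); // wait until earlier plans finish
       }
       long watermark = view.maxEventTimeOfCompletedLogFiles();
       // A real plan would include every log file completed before the watermark;
       // here we only show the trigger condition and the watermark bookkeeping.
       return Optional.of(watermark);
     }
   }
   ```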



##
rfc/rfc-76/rfc-76.md:
##
@@ -0,0 +1,238 @@
+
+# RFC-[74]: [support EventTimeBasedCompactionStrategy]
+
+## Proposers
+
+- @waitingF
+
+## Approvers
+ - @
+ - @
+
+## Status
+
+JIRA: [HUDI-6979](https://issues.apache.org/jira/browse/HUDI-6979)
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+Currently, to gain low ingestion latency, we can adopt the MergeOnRead table, 
which support appending log files and 

Review Comment:
   yeah, looks like so



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10297:
URL: https://github.com/apache/hudi/pull/10297#issuecomment-1851216322

   
   ## CI report:
   
   * fe0a9fb6f96859b5d2bc2254899f2f5c8624d841 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21438)
 
   * 68f3125e06ebb01154d659c2b4452c7ac4c5aa25 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21461)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7221) Move Hudi Option class from hudi-common to hudi-io module

2023-12-11 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7221:
---

 Summary: Move Hudi Option class from hudi-common to hudi-io module
 Key: HUDI-7221
 URL: https://issues.apache.org/jira/browse/HUDI-7221
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7221) Move Hudi Option class from hudi-common to hudi-io module

2023-12-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7221:
---

Assignee: Ethan Guo

> Move Hudi Option class from hudi-common to hudi-io module
> -
>
> Key: HUDI-7221
> URL: https://issues.apache.org/jira/browse/HUDI-7221
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7221) Move Hudi Option class from hudi-common to hudi-io module

2023-12-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7221:

Fix Version/s: 1.0.0

> Move Hudi Option class from hudi-common to hudi-io module
> -
>
> Key: HUDI-7221
> URL: https://issues.apache.org/jira/browse/HUDI-7221
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10297:
URL: https://github.com/apache/hudi/pull/10297#issuecomment-1851211389

   
   ## CI report:
   
   * fe0a9fb6f96859b5d2bc2254899f2f5c8624d841 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21438)
 
   * 68f3125e06ebb01154d659c2b4452c7ac4c5aa25 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-11 Thread via GitHub


hudi-bot commented on PR #10263:
URL: https://github.com/apache/hudi/pull/10263#issuecomment-1851206311

   
   ## CI report:
   
   * 9450ab6f8a73add7f5b25b92c73429ce5f3f25ec UNKNOWN
   * 15f28aa59354f27033840dc140f1e71483257fc4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21455)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7209] Add configuration to skip not exists file in streaming read [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10295:
URL: https://github.com/apache/hudi/pull/10295#discussion_r142755


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java:
##
@@ -210,9 +210,9 @@ public Result inputSplits(
 LOG.warn("No partitions found for reading in user provided path.");
 return Result.EMPTY;
   }
-  FileStatus[] files = WriteProfiles.getFilesFromMetadata(path, 
metaClient.getHadoopConf(), metadataList, metaClient.getTableType(), false);
-  if (files == null) {
-LOG.warn("Found deleted files in metadata, fall back to full table 
scan.");
+  Boolean skipNotExistsFile = 
conf.getBoolean(FlinkOptions.READ_STREAMING_SKIP_NOT_EXIST_FILE);

Review Comment:
   Did you know that this code only works for batch queries?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r142777


##
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileDataBlock.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hudi.io.util.IOUtils;
+
+import java.util.Optional;
+
+import static org.apache.hudi.io.hfile.KeyValue.KEY_OFFSET;
+
+/**
+ * Represents a {@link HFileBlockType#DATA} block in the "Scanned block" 
section.
+ */
+public class HFileDataBlock extends HFileBlock {
+  protected HFileDataBlock(HFileContext context,
+   byte[] byteBuff,
+   int startOffsetInBuff) {
+super(context, HFileBlockType.DATA, byteBuff, startOffsetInBuff);
+  }
+
+  /**
+   * Seeks to the key to look up.
+   *
+   * @param key Key to look up.
+   * @return The {@link KeyValue} instance in the block that contains the 
exact same key as the
+   * lookup key; or empty {@link Optional} if the lookup key does not exist.
+   */
+  public Optional seekTo(Key key) {
+int offset = startOffsetInBuff + HFILEBLOCK_HEADER_SIZE;
+int endOffset = offset + onDiskSizeWithoutHeader;
+// TODO: check last 4 bytes in the data block
+while (offset + HFILEBLOCK_HEADER_SIZE < endOffset) {
+  // Full length is not known yet until parsing
+  KeyValue kv = new KeyValue(byteBuff, offset, -1);

Review Comment:
   Removed as the length is not used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


xuzifu666 commented on code in PR #10297:
URL: https://github.com/apache/hudi/pull/10297#discussion_r1423332239


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -508,10 +509,7 @@ protected void doWrite(HoodieRecord record, Schema schema, 
TypedProperties props
   flushToDiskIfRequired(record, false);
   writeToBuffer(record);
 } catch (Throwable t) {
-  // Not throwing exception from here, since we don't want to fail the 
entire job
-  // for a single record
-  writeStatus.markFailure(record, t, recordMetadata);
-  LOG.error("Error writing record " + record, t);
+  throw new HoodieInsertException("Error writing record " + record, t);

Review Comment:
   Reverted the changes in HoodieAppendHandle @danny0405 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1423331913


##
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileDataBlock.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hudi.io.util.IOUtils;
+
+import java.util.Optional;
+
+import static org.apache.hudi.io.hfile.KeyValue.KEY_OFFSET;
+
+/**
+ * Represents a {@link HFileBlockType#DATA} block in the "Scanned block" 
section.
+ */
+public class HFileDataBlock extends HFileBlock {
+  protected HFileDataBlock(HFileContext context,
+   byte[] byteBuff,
+   int startOffsetInBuff) {
+super(context, HFileBlockType.DATA, byteBuff, startOffsetInBuff);
+  }
+
+  /**
+   * Seeks to the key to look up.
+   *
+   * @param key Key to look up.
+   * @return The {@link KeyValue} instance in the block that contains the 
exact same key as the
+   * lookup key; or empty {@link Optional} if the lookup key does not exist.
+   */
+  public Optional seekTo(Key key) {
+int offset = startOffsetInBuff + HFILEBLOCK_HEADER_SIZE;
+int endOffset = offset + onDiskSizeWithoutHeader;
+// TODO: check last 4 bytes in the data block
+while (offset + HFILEBLOCK_HEADER_SIZE < endOffset) {
+  // Full length is not known yet until parsing
+  KeyValue kv = new KeyValue(byteBuff, offset, -1);
+  // TODO: Reading long instead of two integers per HBase

Review Comment:
   Filed HUDI-7220 for benchmarking.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-11 Thread via GitHub


xuzifu666 commented on code in PR #10297:
URL: https://github.com/apache/hudi/pull/10297#discussion_r1423332239


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -508,10 +509,7 @@ protected void doWrite(HoodieRecord record, Schema schema, 
TypedProperties props
   flushToDiskIfRequired(record, false);
   writeToBuffer(record);
 } catch (Throwable t) {
-  // Not throwing exception from here, since we don't want to fail the 
entire job
-  // for a single record
-  writeStatus.markFailure(record, t, recordMetadata);
-  LOG.error("Error writing record " + record, t);
+  throw new HoodieInsertException("Error writing record " + record, t);

Review Comment:
   Reverted the changes in HoodieAppendHandle @danny0405 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7220) Benchmark new HFile reader

2023-12-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7220:

Description: Do micro-benchmarks on the HFile reader on different operations, 
including reading the whole file, doing point and prefix lookups (seekTo), etc.

> Benchmark new HFile reader
> --
>
> Key: HUDI-7220
> URL: https://issues.apache.org/jira/browse/HUDI-7220
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>
> Do micro-benchmarks on the HFile reader on different operations, including 
> reading the whole file, doing point and prefix lookups (seekTo), etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7220) Benchmark new HFile reader

2023-12-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7220:

Fix Version/s: 1.0.0

> Benchmark new HFile reader
> --
>
> Key: HUDI-7220
> URL: https://issues.apache.org/jira/browse/HUDI-7220
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Do micro-benchmarks on the HFile reader on different operations, including 
> reading the whole file, doing point and prefix lookups (seekTo), etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7220) Benchmark new HFile reader

2023-12-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7220:

Priority: Blocker  (was: Major)

> Benchmark new HFile reader
> --
>
> Key: HUDI-7220
> URL: https://issues.apache.org/jira/browse/HUDI-7220
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Do micro-benchmarks on the HFile reader on different operations, including 
> reading the whole file, doing point and prefix lookups (seekTo), etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7220) Benchmark new HFile reader

2023-12-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7220:
---

Assignee: Ethan Guo

> Benchmark new HFile reader
> --
>
> Key: HUDI-7220
> URL: https://issues.apache.org/jira/browse/HUDI-7220
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Do micro-benchmarks on the HFile reader on different operations, including 
> reading the whole file, doing point and prefix lookups (seekTo), etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7220) Benchmark new HFile reader

2023-12-11 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7220:
---

 Summary: Benchmark new HFile reader
 Key: HUDI-7220
 URL: https://issues.apache.org/jira/browse/HUDI-7220
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1423330790


##
hudi-io/src/test/java/org/apache/hudi/io/hfile/TestHFileReader.java:
##
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
+
+import java.io.IOException;
+import java.util.Optional;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+
+public class TestHFileReader {

Review Comment:
   I'll add more tests and clean up existing test logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1423330150


##
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileReader.java:
##
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+/**
+ * A reader reading a HFile.
+ */
+public class HFileReader {
+  private final FSDataInputStream stream;
+  private final long fileSize;
+  private boolean isMetadataInitialized = false;
+  private HFileContext context;
+  private List blockIndexEntryList;
+  private HFileBlock metaIndexBlock;
+  private HFileBlock fileInfoBlock;
+
+  public HFileReader(FSDataInputStream stream, long fileSize) {
+this.stream = stream;
+this.fileSize = fileSize;
+  }
+
+  /**
+   * Initializes the metadata by reading the "Load-on-open" section.
+   *
+   * @throws IOException upon error.
+   */
+  public void initializeMetadata() throws IOException {
+assert !this.isMetadataInitialized;
+
+// Read Trailer (serialized in Proto)
+HFileTrailer trailer = readTrailer(stream, fileSize);
+this.context = HFileContext.builder()
+.compressAlgo(trailer.getCompressionCodec())
+.build();
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, trailer.getLoadOnOpenDataOffset(), fileSize - 
HFileTrailer.getTrailerSize());
+HFileRootIndexBlock dataIndexBlock =
+(HFileRootIndexBlock) blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.blockIndexEntryList = 
dataIndexBlock.readDataIndex(trailer.getDataIndexCount());
+this.metaIndexBlock = blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.fileInfoBlock = blockReader.nextBlock(HFileBlockType.FILE_INFO);
+
+this.isMetadataInitialized = true;
+  }
+
+  /**
+   * Seeks to the key to look up.
+   *
+   * @param key Key to look up.
+   * @return The {@link KeyValue} instance in the block that contains the 
exact same key as the
+   * lookup key; or empty {@link Optional} if the lookup key does not exist.
+   * @throws IOException upon error.
+   */
+  public Optional<KeyValue> seekTo(Key key) throws IOException {
+BlockIndexEntry lookUpKey = new BlockIndexEntry(key, -1, -1);
+int rootLevelBlockIndex = searchBlockByKey(lookUpKey);
+if (rootLevelBlockIndex < 0) {
+  // Key smaller than the start key of the first block
+  return Optional.empty();
+}
+BlockIndexEntry blockToRead = blockIndexEntryList.get(rootLevelBlockIndex);
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, blockToRead.getOffset(), blockToRead.getOffset() + 
(long) blockToRead.getSize());
+HFileDataBlock dataBlock = (HFileDataBlock) 
blockReader.nextBlock(HFileBlockType.DATA);
+return seekToKeyInBlock(dataBlock, key);
+  }
+
+  /**
+   * Reads the HFile major version from the input.
+   *
+   * @param bytes  Input data.
+   * @param offset Offset to start reading.
+   * @return Major version of the file.
+   */
+  public static int readMajorVersion(byte[] bytes, int offset) {

Review Comment:
   This only reads three bytes, which is uncommon, so I kept it here.
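
   For example, trailer version bytes {0x00, 0x00, 0x03} at that offset decode to 
(0 << 16) + (0 << 8) + 3 = 3, i.e. HFile major version 3.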



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1423329790


##
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileReader.java:
##
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+/**
+ * A reader reading a HFile.
+ */
+public class HFileReader {
+  private final FSDataInputStream stream;
+  private final long fileSize;
+  private boolean isMetadataInitialized = false;
+  private HFileContext context;
+  private List<BlockIndexEntry> blockIndexEntryList;
+  private HFileBlock metaIndexBlock;
+  private HFileBlock fileInfoBlock;
+
+  public HFileReader(FSDataInputStream stream, long fileSize) {
+this.stream = stream;
+this.fileSize = fileSize;
+  }
+
+  /**
+   * Initializes the metadata by reading the "Load-on-open" section.
+   *
+   * @throws IOException upon error.
+   */
+  public void initializeMetadata() throws IOException {
+assert !this.isMetadataInitialized;
+
+// Read Trailer (serialized in Proto)
+HFileTrailer trailer = readTrailer(stream, fileSize);
+this.context = HFileContext.builder()
+.compressAlgo(trailer.getCompressionCodec())
+.build();
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, trailer.getLoadOnOpenDataOffset(), fileSize - 
HFileTrailer.getTrailerSize());
+HFileRootIndexBlock dataIndexBlock =
+(HFileRootIndexBlock) blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.blockIndexEntryList = 
dataIndexBlock.readDataIndex(trailer.getDataIndexCount());
+this.metaIndexBlock = blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.fileInfoBlock = blockReader.nextBlock(HFileBlockType.FILE_INFO);
+
+this.isMetadataInitialized = true;
+  }
+
+  /**
+   * Seeks to the key to look up.
+   *
+   * @param key Key to look up.
+   * @return The {@link KeyValue} instance in the block that contains the 
exact same key as the
+   * lookup key; or empty {@link Optional} if the lookup key does not exist.
+   * @throws IOException upon error.
+   */
+  public Optional<KeyValue> seekTo(Key key) throws IOException {
+BlockIndexEntry lookUpKey = new BlockIndexEntry(key, -1, -1);
+int rootLevelBlockIndex = searchBlockByKey(lookUpKey);
+if (rootLevelBlockIndex < 0) {
+  // Key smaller than the start key of the first block
+  return Optional.empty();
+}
+BlockIndexEntry blockToRead = blockIndexEntryList.get(rootLevelBlockIndex);
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, blockToRead.getOffset(), blockToRead.getOffset() + 
(long) blockToRead.getSize());
+HFileDataBlock dataBlock = (HFileDataBlock) 
blockReader.nextBlock(HFileBlockType.DATA);

Review Comment:
   More APIs need to be added to the HFile reader based on the current prefix 
search implemented in `HoodieAvroHFileReader` 
(https://github.com/apache/hudi/blob/release-0.14.0/hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHFileReader.java#L298).
 I plan to cover this in a separate PR: HUDI-7217



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7213] When using wrong table.type value in hudi catalog happens NPE [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10300:
URL: https://github.com/apache/hudi/pull/10300#discussion_r1423326760


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/TableOptionProperties.java:
##
@@ -189,7 +190,16 @@ public static Map 
translateFlinkTableProperties2Spark(
 return properties.entrySet().stream()
 .filter(e -> KEY_MAPPING.containsKey(e.getKey()) && 
!catalogTable.getOptions().containsKey(KEY_MAPPING.get(e.getKey())))
 .collect(Collectors.toMap(e -> KEY_MAPPING.get(e.getKey()),
-e -> e.getKey().equalsIgnoreCase(FlinkOptions.TABLE_TYPE.key()) ? 
VALUE_MAPPING.get(e.getValue()) : e.getValue()));
+e -> {
+  if (e.getKey().equalsIgnoreCase(FlinkOptions.TABLE_TYPE.key())) {
+  String sparkTableType = VALUE_MAPPING.get(e.getValue());
+  if (sparkTableType == null) {
+throw new HoodieValidationException(String.format("%s's 
value is invalid", e.getKey()));
+  }

Review Comment:
   Nice catch. Can you write a UT in `TestHoodieCatalog`?
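
   Something along these lines, perhaps. The `catalog` fixture and the 
`createCatalogTable(options)` helper are assumed to come from the existing 
TestHoodieCatalog setup, and this assumes the create-table path ends up calling 
`translateFlinkTableProperties2Spark`:

     @Test
     void testCreateTableWithInvalidTableTypeThrows() {
       Map<String, String> options = new HashMap<>();
       // Intentionally invalid table type value.
       options.put(FlinkOptions.TABLE_TYPE.key(), "MERGE_ON_RAED");
       CatalogTable table = createCatalogTable(options); // assumed helper
       assertThrows(HoodieValidationException.class,
           () -> catalog.createTable(new ObjectPath("default", "tb1"), table, false));
     }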



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] add a util to print hudi options [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on PR #10301:
URL: https://github.com/apache/hudi/pull/10301#issuecomment-1851190487

   Can you give us an example of why printing the options is necessary?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi Bootstrap with METADATA_ONLY with Hive Sync Fails on EMR Serverless 6.10 [hudi]

2023-12-11 Thread via GitHub


soumilshah1995 closed issue #8565: [SUPPORT] Hudi Bootstrap with METADATA_ONLY 
with Hive Sync Fails on EMR Serverless 6.10
URL: https://github.com/apache/hudi/issues/8565


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-12-11 Thread via GitHub


zyclove commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1851189708

   This problem was recently verified on version 0.14 with #10224 applied. It has 
been running for a week and the problem has not reproduced so far. The fix still 
needs to be merged into release-0.14.1. 
   @nsivabalan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video [hudi]

2023-12-11 Thread via GitHub


soumilshah1995 closed issue #8400: [SUPPORT] Hudi Offline Compaction in EMR 
Serverless 6.10 for YouTube Video 
URL: https://github.com/apache/hudi/issues/8400


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Issue with Hudi Hive Sync Tool with Hive MetaStore [hudi]

2023-12-11 Thread via GitHub


soumilshah1995 closed issue #10231: [SUPPORT] Issue with Hudi Hive Sync Tool  
with Hive MetaStore
URL: https://github.com/apache/hudi/issues/10231


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1423320093


##
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileReader.java:
##
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+/**
+ * A reader reading a HFile.
+ */
+public class HFileReader {
+  private final FSDataInputStream stream;
+  private final long fileSize;
+  private boolean isMetadataInitialized = false;
+  private HFileContext context;
+  private List<BlockIndexEntry> blockIndexEntryList;
+  private HFileBlock metaIndexBlock;
+  private HFileBlock fileInfoBlock;
+
+  public HFileReader(FSDataInputStream stream, long fileSize) {
+this.stream = stream;
+this.fileSize = fileSize;
+  }
+
+  /**
+   * Initializes the metadata by reading the "Load-on-open" section.
+   *
+   * @throws IOException upon error.
+   */
+  public void initializeMetadata() throws IOException {
+assert !this.isMetadataInitialized;
+
+// Read Trailer (serialized in Proto)
+HFileTrailer trailer = readTrailer(stream, fileSize);
+this.context = HFileContext.builder()
+.compressAlgo(trailer.getCompressionCodec())
+.build();
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, trailer.getLoadOnOpenDataOffset(), fileSize - 
HFileTrailer.getTrailerSize());
+HFileRootIndexBlock dataIndexBlock =
+(HFileRootIndexBlock) blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.blockIndexEntryList = 
dataIndexBlock.readDataIndex(trailer.getDataIndexCount());
+this.metaIndexBlock = blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.fileInfoBlock = blockReader.nextBlock(HFileBlockType.FILE_INFO);
+
+this.isMetadataInitialized = true;
+  }
+
+  /**
+   * Seeks to the key to look up.
+   *
+   * @param key Key to look up.
+   * @return The {@link KeyValue} instance in the block that contains the 
exact same key as the
+   * lookup key; or empty {@link Optional} if the lookup key does not exist.
+   * @throws IOException upon error.
+   */
+  public Optional<KeyValue> seekTo(Key key) throws IOException {
+BlockIndexEntry lookUpKey = new BlockIndexEntry(key, -1, -1);
+int rootLevelBlockIndex = searchBlockByKey(lookUpKey);
+if (rootLevelBlockIndex < 0) {
+  // Key smaller than the start key of the first block
+  return Optional.empty();
+}
+BlockIndexEntry blockToRead = blockIndexEntryList.get(rootLevelBlockIndex);
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, blockToRead.getOffset(), blockToRead.getOffset() + 
(long) blockToRead.getSize());
+HFileDataBlock dataBlock = (HFileDataBlock) 
blockReader.nextBlock(HFileBlockType.DATA);
+return seekToKeyInBlock(dataBlock, key);
+  }
+
+  /**
+   * Reads the HFile major version from the input.
+   *
+   * @param bytes  Input data.
+   * @param offset Offset to start reading.
+   * @return Major version of the file.
+   */
+  public static int readMajorVersion(byte[] bytes, int offset) {
+int ch1 = bytes[offset] & 0xFF;
+int ch2 = bytes[offset + 1] & 0xFF;
+int ch3 = bytes[offset + 2] & 0xFF;
+return ((ch1 << 16) + (ch2 << 8) + ch3);
+  }
+
+  /**
+   * Reads and parses the HFile trailer.
+   *
+   * @param stream   HFile input.
+   * @param fileSize HFile size.
+   * @return {@link HFileTrailer} instance.
+   * @throws IOException upon error.
+   */
+  private static HFileTrailer readTrailer(FSDataInputStream stream,
+  long fileSize) throws IOException {
+int bufferSize = HFileTrailer.getTrailerSize();
+long seekPos = fileSize - bufferSize;
+if (seekPos < 0) {
+  // It is hard to imagine such a small HFile.
+  seekPos = 0;
+  bufferSize = (int) fileSize;
+}
+stream.seek(seekPos);
+
+byte[] byteBuff = new byte[bufferSize];
+stream.readFully(byteBuff);
+
+int majorVersion = readMajorVersi

Re: [I] [SUPPORT] restart flink job got InvalidAvroMagicException: Not an Avro data file [hudi]

2023-12-11 Thread via GitHub


danny0405 closed issue #10285: [SUPPORT] restart flink job got 
InvalidAvroMagicException: Not an Avro data file
URL: https://github.com/apache/hudi/issues/10285


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7208] Write stage should shut down with an error when insert fails, to reduce user execution time and show error details [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on code in PR #10297:
URL: https://github.com/apache/hudi/pull/10297#discussion_r1423319781


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -508,10 +509,7 @@ protected void doWrite(HoodieRecord record, Schema schema, 
TypedProperties props
   flushToDiskIfRequired(record, false);
   writeToBuffer(record);
 } catch (Throwable t) {
-  // Not throwing exception from here, since we don't want to fail the 
entire job
-  // for a single record
-  writeStatus.markFailure(record, t, recordMetadata);
-  LOG.error("Error writing record " + record, t);
+  throw new HoodieInsertException("Error writing record " + record, t);

Review Comment:
   In Flink SQL, the option `write.ignore.failed` controls whether such write 
errors are thrown or ignored.
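
   For reference, a minimal sketch of how the option can be set on a Hudi sink; 
the table name, schema and path below are made up:

     // Requires org.apache.flink.table.api.TableEnvironment and EnvironmentSettings.
     TableEnvironment tEnv =
         TableEnvironment.create(EnvironmentSettings.inStreamingMode());
     tEnv.executeSql(
         "CREATE TABLE hudi_sink ("
             + " uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,"
             + " name VARCHAR(10),"
             + " ts TIMESTAMP(3)"
             + ") WITH ("
             + " 'connector' = 'hudi',"
             + " 'path' = 'file:///tmp/hudi_sink',"
             + " 'table.type' = 'MERGE_ON_READ',"
             // 'false' makes a failed record write fail the job instead of being ignored.
             + " 'write.ignore.failed' = 'false'"
             + ")");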



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7170][WIP] Implement HFile reader independent of HBase [hudi]

2023-12-11 Thread via GitHub


yihua commented on code in PR #10241:
URL: https://github.com/apache/hudi/pull/10241#discussion_r1423319899


##
hudi-io/src/main/java/org/apache/hudi/io/hfile/HFileReader.java:
##
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.hfile;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+import java.util.Optional;
+
+/**
+ * A reader reading a HFile.
+ */
+public class HFileReader {
+  private final FSDataInputStream stream;
+  private final long fileSize;
+  private boolean isMetadataInitialized = false;
+  private HFileContext context;
+  private List<BlockIndexEntry> blockIndexEntryList;
+  private HFileBlock metaIndexBlock;
+  private HFileBlock fileInfoBlock;
+
+  public HFileReader(FSDataInputStream stream, long fileSize) {
+this.stream = stream;
+this.fileSize = fileSize;
+  }
+
+  /**
+   * Initializes the metadata by reading the "Load-on-open" section.
+   *
+   * @throws IOException upon error.
+   */
+  public void initializeMetadata() throws IOException {
+assert !this.isMetadataInitialized;
+
+// Read Trailer (serialized in Proto)
+HFileTrailer trailer = readTrailer(stream, fileSize);
+this.context = HFileContext.builder()
+.compressAlgo(trailer.getCompressionCodec())
+.build();
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, trailer.getLoadOnOpenDataOffset(), fileSize - 
HFileTrailer.getTrailerSize());
+HFileRootIndexBlock dataIndexBlock =
+(HFileRootIndexBlock) blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.blockIndexEntryList = 
dataIndexBlock.readDataIndex(trailer.getDataIndexCount());
+this.metaIndexBlock = blockReader.nextBlock(HFileBlockType.ROOT_INDEX);
+this.fileInfoBlock = blockReader.nextBlock(HFileBlockType.FILE_INFO);
+
+this.isMetadataInitialized = true;
+  }
+
+  /**
+   * Seeks to the key to look up.
+   *
+   * @param key Key to look up.
+   * @return The {@link KeyValue} instance in the block that contains the 
exact same key as the
+   * lookup key; or empty {@link Optional} if the lookup key does not exist.
+   * @throws IOException upon error.
+   */
+  public Optional<KeyValue> seekTo(Key key) throws IOException {
+BlockIndexEntry lookUpKey = new BlockIndexEntry(key, -1, -1);
+int rootLevelBlockIndex = searchBlockByKey(lookUpKey);
+if (rootLevelBlockIndex < 0) {
+  // Key smaller than the start key of the first block
+  return Optional.empty();
+}
+BlockIndexEntry blockToRead = blockIndexEntryList.get(rootLevelBlockIndex);
+HFileBlockReader blockReader = new HFileBlockReader(
+context, stream, blockToRead.getOffset(), blockToRead.getOffset() + 
(long) blockToRead.getSize());
+HFileDataBlock dataBlock = (HFileDataBlock) 
blockReader.nextBlock(HFileBlockType.DATA);
+return seekToKeyInBlock(dataBlock, key);
+  }
+
+  /**
+   * Reads the HFile major version from the input.
+   *
+   * @param bytes  Input data.
+   * @param offset Offset to start reading.
+   * @return Major version of the file.
+   */
+  public static int readMajorVersion(byte[] bytes, int offset) {
+int ch1 = bytes[offset] & 0xFF;
+int ch2 = bytes[offset + 1] & 0xFF;
+int ch3 = bytes[offset + 2] & 0xFF;
+return ((ch1 << 16) + (ch2 << 8) + ch3);
+  }
+
+  /**
+   * Reads and parses the HFile trailer.
+   *
+   * @param stream   HFile input.
+   * @param fileSize HFile size.
+   * @return {@link HFileTrailer} instance.
+   * @throws IOException upon error.
+   */
+  private static HFileTrailer readTrailer(FSDataInputStream stream,
+  long fileSize) throws IOException {
+int bufferSize = HFileTrailer.getTrailerSize();
+long seekPos = fileSize - bufferSize;
+if (seekPos < 0) {
+  // It is hard to imagine such a small HFile.
+  seekPos = 0;
+  bufferSize = (int) fileSize;
+}
+stream.seek(seekPos);
+
+byte[] byteBuff = new byte[bufferSize];
+stream.readFully(byteBuff);
+
+int majorVersion = readMajorVersi

[jira] [Closed] (HUDI-7210) In CleanFunction#open, triggers the cleaning under option 'clean.async.enabled'

2023-12-11 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7210.

Resolution: Fixed

Fixed via master branch: 1cdae9c83ecff683e014c74621c78281677c3fac

> In CleanFunction#open, triggers the cleaning under option 
> 'clean.async.enabled'
> ---
>
> Key: HUDI-7210
> URL: https://issues.apache.org/jira/browse/HUDI-7210
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated (5f3bc96dc9e -> 1cdae9c83ec)

2023-12-11 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 5f3bc96dc9e [HUDI-7023] Support querying without syncing partition 
metadata to catalog (#10153)
 add 1cdae9c83ec [HUDI-7210] In CleanFunction#open, triggers the cleaning 
under option 'clean.async.enabled' (#10298)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/sink/CleanFunction.java  | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)



Re: [PR] [HUDI-7210] In CleanFunction#open, triggers the cleaning under option 'clean.async.enabled' [hudi]

2023-12-11 Thread via GitHub


danny0405 merged PR #10298:
URL: https://github.com/apache/hudi/pull/10298


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


