[GitHub] [hudi] silencily opened a new issue, #8096: [SUPPORT]Whether there are some problems when using hudi-trino-bundle of 0.13.0 version to query hudi table of 0.12.2 version

2023-03-04 Thread via GitHub


silencily opened a new issue, #8096:
URL: https://github.com/apache/hudi/issues/8096

   **Describe the problem you faced**
   
   I am using Trino 407 to query Hudi tables. The hudi-trino-bundle version is 
0.13.0, while our Hudi tables were created with version 0.12.2. Could the 
community confirm whether there are any known problems when mixing these Hudi 
versions? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-4372) Enable metadata table by default for flink

2023-03-04 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-4372.

Fix Version/s: 0.14.0
   (was: 0.13.1)
   Resolution: Fixed

Fixed via master branch: 9bb6b55440cf385844c757344f66148039e657e8

> Enable metadata table by default for flink
> --
>
> Key: HUDI-4372
> URL: https://issues.apache.org/jira/browse/HUDI-4372
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink, metadata
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated (2ddcf96cddb -> 9bb6b55440c)

2023-03-04 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 2ddcf96cddb [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server (#8079)
 add 9bb6b55440c [HUDI-4372] Enable metadata table by default for flink (#8070)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/client/BaseHoodieWriteClient.java  |  2 +-
 .../client/transaction/ConcurrentOperation.java| 12 +--
 .../hudi/table/action/clean/CleanPlanner.java  | 37 +++---
 .../functional/TestHoodieBackedMetadata.java   |  8 +++--
 .../apache/hudi/configuration/FlinkOptions.java|  2 +-
 .../sink/clustering/HoodieFlinkClusteringJob.java  |  3 ++
 .../hudi/sink/compact/HoodieFlinkCompactor.java|  3 ++
 .../hudi/table/catalog/HoodieHiveCatalog.java  |  2 +-
 .../java/org/apache/hudi/util/CompactionUtil.java  | 19 ++-
 .../apache/hudi/table/ITTestHoodieDataSource.java  | 20 ++--
 .../org/apache/hudi/utils/TestClusteringUtil.java  |  4 +++
 .../org/apache/hudi/utils/TestCompactionUtil.java  | 13 
 packaging/bundle-validation/flink/insert.sql   |  1 +
 packaging/hudi-flink-bundle/pom.xml|  3 ++
 14 files changed, 107 insertions(+), 22 deletions(-)



[GitHub] [hudi] danny0405 merged pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


danny0405 merged PR #8070:
URL: https://github.com/apache/hudi/pull/8070





[GitHub] [hudi] soumilshah1995 closed issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance

2023-03-04 Thread via GitHub


soumilshah1995 closed issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator 
Need Assistance 
URL: https://github.com/apache/hudi/issues/8031





[GitHub] [hudi] soumilshah1995 commented on issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance

2023-03-04 Thread via GitHub


soumilshah1995 commented on issue #8031:
URL: https://github.com/apache/hudi/issues/8031#issuecomment-1454933980

   Adding the settings below resolved the issue.
   
   ```
   import os
   import sys

   from pyspark.sql import SparkSession

   # Fetch the Hudi Spark bundle when the PySpark shell starts.
   SUBMIT_ARGS = "--packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.1 pyspark-shell"
   os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
   os.environ['PYSPARK_PYTHON'] = sys.executable
   os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

   spark = SparkSession.builder \
       .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
       .config('className', 'org.apache.hudi') \
       .config('spark.sql.hive.convertMetastoreParquet', 'false') \
       .getOrCreate()

   db_name = "hudidb"
   table_name = "hudi_table"

   recordkey = 'uuid'
   precombine = 'date'

   path = f"file:///C:/tmp/{db_name}/{table_name}"

   method = 'upsert'
   table_type = "COPY_ON_WRITE"  # COPY_ON_WRITE | MERGE_ON_READ

   hudi_options = {
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.recordkey.field': recordkey,
       'hoodie.datasource.write.table.name': table_name,
       'hoodie.datasource.write.operation': method,
       'hoodie.datasource.write.precombine.field': precombine,

       # Partition on the 'date' column, formatted by the timestamp-based key generator.
       'hoodie.datasource.write.partitionpath.field': 'date',
       'hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled': 'true',
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.TimestampBasedKeyGenerator',
       'hoodie.deltastreamer.keygen.timebased.timestamp.type': 'DATE_STRING',
       'hoodie.deltastreamer.keygen.timebased.timezone': 'GMT+8:00',
       'hoodie.deltastreamer.keygen.timebased.input.dateformat': 'yyyy-MM-dd hh:mm:ss',
       'hoodie.deltastreamer.keygen.timebased.output.dateformat': 'yyyy/MM/dd',
   }

   # Input field value: "2020-01-06 12:12:12"
   # Partition path expected from the key generator: "2020/01/06"

   data_items = [
       (1, "mess 1", 111, "2020-01-06 12:12:12"),
       (2, "mes 2", 22, "2020-01-06 12:12:12"),
   ]

   columns = ["uuid", "message", "precomb", "date"]

   spark_df = spark.createDataFrame(data=data_items, schema=columns)
   spark_df.show()
   spark_df.printSchema()

   # Upsert the rows into the Hudi table at `path`.
   spark_df.write.format("hudi") \
       .options(**hudi_options) \
       .mode("append") \
       .save(path)
   ```
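
As a rough illustration of what the date-format settings in the snippet above are meant to do, the sketch below mimics the mapping in plain Python rather than calling Hudi. It assumes an input pattern of `yyyy-MM-dd hh:mm:ss` and an output pattern of `yyyy/MM/dd`, expressed here as the corresponding `strptime`/`strftime` codes. The helper name `partition_path` is hypothetical, and timezone handling is left out.

```python
from datetime import datetime

# Assumed patterns: Python equivalents of the Java patterns
# "yyyy-MM-dd hh:mm:ss" (input) and "yyyy/MM/dd" (output).
INPUT_FORMAT = "%Y-%m-%d %I:%M:%S"
OUTPUT_FORMAT = "%Y/%m/%d"


def partition_path(value: str) -> str:
    """Parse a date string and render it as a partition path (hypothetical helper)."""
    return datetime.strptime(value, INPUT_FORMAT).strftime(OUTPUT_FORMAT)


print(partition_path("2020-01-06 12:12:12"))
```

Note that `hh` (and `%I`) is a 12-hour field, so inputs with hours above 12 would need `HH` (`%H`) instead.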





[GitHub] [hudi] soumilshah1995 commented on issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance

2023-03-04 Thread via GitHub


soumilshah1995 commented on issue #8031:
URL: https://github.com/apache/hudi/issues/8031#issuecomment-1454933866

   Thank you very much for taking the time to answer my question. 
   I'm looking forward to passing this on to the community as a tutorial. 





[hudi] branch master updated (d40a6211f64 -> 2ddcf96cddb)

2023-03-04 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from d40a6211f64 [HUDI-5796] Adding auto inferring partition from incoming df (#7951)
 add 2ddcf96cddb [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server (#8079)

No new revisions were added by this update.

Summary of changes:
 .../TestRemoteFileSystemViewWithMetadataTable.java | 275 +
 .../table/view/AbstractTableFileSystemView.java|  34 ++-
 .../IncrementalTimelineSyncFileSystemView.java |  20 +-
 .../metadata/HoodieMetadataFileSystemView.java |  20 +-
 .../HoodieBackedTestDelayedTableMetadata.java  |  54 
 .../hudi/timeline/service/RequestHandler.java  |  27 +-
 6 files changed, 390 insertions(+), 40 deletions(-)
 create mode 100644 hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestRemoteFileSystemViewWithMetadataTable.java
 create mode 100644 hudi-common/src/test/java/org/apache/hudi/metadata/HoodieBackedTestDelayedTableMetadata.java



[GitHub] [hudi] yihua merged pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server

2023-03-04 Thread via GitHub


yihua merged PR #8079:
URL: https://github.com/apache/hudi/pull/8079





[GitHub] [hudi] hudi-bot commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8095:
URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454858086

   
   ## CI report:
   
   * d84c2d5274ac3e5525996156d3057a77dfe200d3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15579)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8095:
URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454837270

   
   ## CI report:
   
   * d84c2d5274ac3e5525996156d3057a77dfe200d3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15579)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8095:
URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454823902

   
   ## CI report:
   
   * d84c2d5274ac3e5525996156d3057a77dfe200d3 UNKNOWN
   
   



[GitHub] [hudi] aajisaka commented on pull request #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark

2023-03-04 Thread via GitHub


aajisaka commented on PR #8095:
URL: https://github.com/apache/hudi/pull/8095#issuecomment-1454821323

   This is my first time contributing to Hudi, and I don't have the Hudi 
contributor privilege. Could someone grant it to me?





[jira] [Updated] (HUDI-5866) Fix unnecessary log messages during bulk insert in Spark

2023-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5866:
-
Labels: pull-request-available  (was: )

> Fix unnecessary log messages during bulk insert in Spark
> 
>
> Key: HUDI-5866
> URL: https://issues.apache.org/jira/browse/HUDI-5866
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.13.0
>Reporter: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
>
> HUDI-5544 fixed excessive log message issue in Flink, but it's not fixed in 
> Spark. We need to make a similar fix in hudi-spark-client  
> https://github.com/apache/hudi/blob/47356a57930687c1bdfa66d1a62421d8a5fc0b29/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BulkInsertDataInternalWriterHelper.java#L147





[GitHub] [hudi] aajisaka opened a new pull request, #8095: [HUDI-5866] Fix unnecessary log messages during bulk insert in Spark

2023-03-04 Thread via GitHub


aajisaka opened a new pull request, #8095:
URL: https://github.com/apache/hudi/pull/8095

   ### Change Logs
   
   Currently, a log message that says "Creating new file for partition path" is 
generated every time the current partition changes, even when no new file is 
being created, which is confusing. This issue was fixed in Flink by #7658, but 
it is not yet fixed in Spark.
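
The intended behavior can be sketched as follows. This is a simplified Python model, not the Hudi code: `PartitionWriter`, `write`, and `_create_handle` are hypothetical names, and file handles are modelled as plain strings. The point is that the message is logged where a handle is actually created, not wherever the current partition changes.

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("bulk_insert_sketch")


class PartitionWriter:
    """Toy writer that keeps one open handle per partition path (hypothetical)."""

    def __init__(self):
        self.handles = {}
        self.files_created = 0

    def _create_handle(self, partition_path):
        # Log here, at the point where a file is really created.
        LOG.info("Creating new file for partition path %s", partition_path)
        self.files_created += 1
        return f"{partition_path}/file-{self.files_created}"

    def write(self, partition_path, record):
        handle = self.handles.get(partition_path)
        if handle is None:
            # Only a missing handle triggers file creation (and the log line).
            handle = self._create_handle(partition_path)
            self.handles[partition_path] = handle
        return handle


writer = PartitionWriter()
# The partition changes three times, but only two files are created.
for path in ["2023/03/03", "2023/03/04", "2023/03/03"]:
    writer.write(path, record={})
print(writer.files_created)
```

In this model, switching back to a partition that already has an open handle produces no log line.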
   
   ### Impact
   
   N/A
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8070:
URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454803021

   
   ## CI report:
   
   * 8bc5774747acac448ef96b036a4e38d832255441 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15578)
 
   
   



[jira] [Reopened] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reopened HUDI-5728:
--

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore 
> HoodieTimelineArchiver archive the latest instant before inflight 
> replacecommit.





[jira] [Resolved] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang resolved HUDI-5728.
--

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore 
> HoodieTimelineArchiver archive the latest instant before inflight 
> replacecommit.





[jira] [Reopened] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang reopened HUDI-5772:
--

> Align Flink clustering configuration with HoodieClusteringConfig
> 
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
> 'clustering.plan.strategy.cluster.begin.partition', 
> 'clustering.plan.strategy.cluster.end.partition', 
> 'clustering.plan.strategy.partition.regex.pattern', 
> 'clustering.plan.strategy.partition.selected' options which do not align with the 
> clustering configuration of HoodieClusteringConfig. FlinkOptions, 
> FlinkClusteringConfig and FlinkStreamerConfig should align their Flink clustering 
> configuration with HoodieClusteringConfig.





[jira] [Resolved] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang resolved HUDI-5772.
--

> Align Flink clustering configuration with HoodieClusteringConfig
> 
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
> 'clustering.plan.strategy.cluster.begin.partition', 
> 'clustering.plan.strategy.cluster.end.partition', 
> 'clustering.plan.strategy.partition.regex.pattern', 
> 'clustering.plan.strategy.partition.selected' options which do not align with the 
> clustering configuration of HoodieClusteringConfig. FlinkOptions, 
> FlinkClusteringConfig and FlinkStreamerConfig should align their Flink clustering 
> configuration with HoodieClusteringConfig.





[jira] [Closed] (HUDI-5531) RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to RECENT_PARTITIONS

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5531.

Resolution: Won't Fix

> RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode should rename to 
> RECENT_PARTITIONS
> 
>
> Key: HUDI-5531
> URL: https://issues.apache.org/jira/browse/HUDI-5531
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Minor
> Fix For: 0.13.1
>
>
> The javadoc of `ClusteringPlanPartitionFilter` describes RECENT_DAYS as: 
> output recent partitions given the skip num and days lookback config. The 
> RECENT_DAYS name therefore doesn't match the semantics, because it assumes that 
> Hudi tables are partitioned by day, while tables partitioned by hour can also 
> use this strategy. The RECENT_DAYS strategy of ClusteringPlanPartitionFilterMode 
> should be renamed to RECENT_PARTITIONS to match the semantics.
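
To illustrate the naming concern, here is a minimal sketch of a "recent partitions" filter. It assumes the filter sorts partition paths in descending order, skips a configured number of the newest ones, and keeps the next few. The names `recent_partitions`, `skip_from_latest`, and `lookback` are hypothetical and are not Hudi configuration keys.

```python
def recent_partitions(partitions, skip_from_latest, lookback):
    """Return `lookback` partitions after skipping the newest `skip_from_latest`."""
    ordered = sorted(partitions, reverse=True)
    return ordered[skip_from_latest:skip_from_latest + lookback]


# Hourly partitions: the filter selects hours, not days.
hourly = ["2023/03/04/08", "2023/03/04/09", "2023/03/04/10", "2023/03/04/11"]
print(recent_partitions(hourly, skip_from_latest=1, lookback=2))
```

With hourly partition paths, a lookback of 2 selects two hours rather than two days, which is why a partition-based name describes the behavior more accurately.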





[jira] [Closed] (HUDI-2503) HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-2503.

Resolution: Fixed

> HoodieFlinkWriteClient supports to allow parallel writing to tables using 
> Locking service
> -
>
> Key: HUDI-2503
> URL: https://issues.apache.org/jira/browse/HUDI-2503
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The strategy interface for conflict resolution with multiple writers has been 
> introduced, and the SparkRDDWriteClient is integrated with the 
> ConflictResolutionStrategy. HoodieFlinkWriteClient should also allow 
> parallel writing to tables using a locking service based on 
> ConflictResolutionStrategy.





[jira] [Closed] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5772.

Resolution: Fixed

> Align Flink clustering configuration with HoodieClusteringConfig
> 
>
> Key: HUDI-5772
> URL: https://issues.apache.org/jira/browse/HUDI-5772
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.13.1
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, there are 
> 'clustering.plan.strategy.cluster.begin.partition', 
> 'clustering.plan.strategy.cluster.end.partition', 
> 'clustering.plan.strategy.partition.regex.pattern', 
> 'clustering.plan.strategy.partition.selected' options which do not align with the 
> clustering configuration of HoodieClusteringConfig. FlinkOptions, 
> FlinkClusteringConfig and FlinkStreamerConfig should align their Flink clustering 
> configuration with HoodieClusteringConfig.





[jira] [Closed] (HUDI-5728) HoodieTimelineArchiver archives the latest instant before inflight replacecommit

2023-03-04 Thread Nicholas Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Jiang closed HUDI-5728.

Resolution: Fixed

> HoodieTimelineArchiver archives the latest instant before inflight 
> replacecommit
> 
>
> Key: HUDI-5728
> URL: https://issues.apache.org/jira/browse/HUDI-5728
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: table-service
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> When inline or async clustering is enabled, we need to ensure that there is a 
> commit in the active timeline to check whether the file slice generated in 
> pending clustering after archive isn't committed via 
> {{{}HoodieFileGroup#isFileSliceCommitted(slice){}}}. Therefore 
> HoodieTimelineArchiver archive the latest instant before inflight 
> replacecommit.





[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8070:
URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454766654

   
   ## CI report:
   
   * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15577)
 
   * 8bc5774747acac448ef96b036a4e38d832255441 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15578)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8070:
URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454763414

   
   ## CI report:
   
   * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15577)
 
   * 8bc5774747acac448ef96b036a4e38d832255441 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8070:
URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454761576

   
   ## CI report:
   
   * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15577)
 
   
   



[GitHub] [hudi] danny0405 commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


danny0405 commented on PR #8070:
URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454757688

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8079:
URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454732191

   
   ## CI report:
   
   * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN
   * d2b11c5747266e7f3cb77dfa19193bdb89548e50 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15562)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8079:
URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454730347

   
   ## CI report:
   
   * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN
   * d660deb903eed17560554a0145464598089fb3ea Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15574)
 
   * d2b11c5747266e7f3cb77dfa19193bdb89548e50 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8094: [HUDI-5876] Remove usage of deprecated TableConfig.

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8094:
URL: https://github.com/apache/hudi/pull/8094#issuecomment-1454727473

   
   ## CI report:
   
   * 0123e176853d34eabeb39c71f45061b927c0d93a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15575)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8079:
URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454712975

   
   ## CI report:
   
   * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN
   * d660deb903eed17560554a0145464598089fb3ea Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15574)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8070: [HUDI-4372] Enable metadata table by default for flink

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8070:
URL: https://github.com/apache/hudi/pull/8070#issuecomment-1454700721

   
   ## CI report:
   
   * 10d3659dcc94bc069d0da83ee3b711bf4ff079fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15573)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8076: Support bulk_insert for insert_overwrite and insert_overwrite_table

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8076:
URL: https://github.com/apache/hudi/pull/8076#issuecomment-1454680899

   
   ## CI report:
   
   * 6a239ada8998fd440f19c0082b26d206ed589870 UNKNOWN
   * f384bbc843028360687903b3b6de835685235b68 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15570)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8094: [HUDI-5876] Remove usage of deprecated TableConfig.

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8094:
URL: https://github.com/apache/hudi/pull/8094#issuecomment-1454668571

   
   ## CI report:
   
   * 0123e176853d34eabeb39c71f45061b927c0d93a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15575)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8094: [HUDI-5876] Remove usage of deprecated TableConfig.

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8094:
URL: https://github.com/apache/hudi/pull/8094#issuecomment-1454667399

   
   ## CI report:
   
   * 0123e176853d34eabeb39c71f45061b927c0d93a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8079:
URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454667392

   
   ## CI report:
   
   * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN
   * 7fff406e74cdf3faf047634a2d596399fa49f059 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15572)
   * d660deb903eed17560554a0145464598089fb3ea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15574)
   
   



[jira] [Updated] (HUDI-5876) Remove usage of deprecated TableConfig.

2023-03-04 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5876:
-
Labels: pull-request-available  (was: )

> Remove usage of deprecated TableConfig.
> ---
>
> Key: HUDI-5876
> URL: https://issues.apache.org/jira/browse/HUDI-5876
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> This is a small change. SortOperatorGen initializes TableConfig using the 
> deprecated no-argument constructor; use the recommended 
> TableConfig#getDefault() instead.
> The deprecated constructor in TableConfig:
> /** Please use {@link TableConfig#getDefault()} instead. */
> @Deprecated
> public TableConfig() {}





[GitHub] [hudi] slfan1989 opened a new pull request, #8094: [HUDI-5876] Remove usage of deprecated TableConfig.

2023-03-04 Thread via GitHub


slfan1989 opened a new pull request, #8094:
URL: https://github.com/apache/hudi/pull/8094

   ### Change Logs
   
   JIRA: HUDI-5876. Remove usage of deprecated TableConfig.
   
   This is a small change. SortOperatorGen initializes TableConfig using the 
   deprecated no-argument constructor; use the recommended 
   TableConfig#getDefault() instead.
   
   The deprecated constructor in TableConfig:
   ```
   /** Please use {@link TableConfig#getDefault()} instead. */
   @Deprecated
   public TableConfig() {}
   ```
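
The migration described above can be sketched in a self-contained way. `DemoTableConfig` and `SortOperatorSketch` are hypothetical stand-ins, not the real `TableConfig` or `SortOperatorGen`; only the deprecated no-argument constructor and the `getDefault()` factory mirror the snippet above, and the option map is invented purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a config class whose constructor is deprecated
// in favour of a static factory, as in the snippet above.
final class DemoTableConfig {
    private final Map<String, String> options = new HashMap<>();

    /** Please use {@link DemoTableConfig#getDefault()} instead. */
    @Deprecated
    public DemoTableConfig() {}

    /** Recommended entry point: returns a fresh config with default options. */
    public static DemoTableConfig getDefault() {
        return new DemoTableConfig();
    }

    public DemoTableConfig set(String key, String value) {
        options.put(key, value);
        return this;
    }

    public String get(String key) {
        return options.get(key);
    }
}

// Hypothetical caller standing in for the code that builds the config.
final class SortOperatorSketch {
    // Before: `new DemoTableConfig()` compiles, but callers outside the class
    // get a deprecation warning.
    // After: the same default config is obtained through the factory method.
    static DemoTableConfig createConfig() {
        return DemoTableConfig.getDefault();
    }

    public static void main(String[] args) {
        DemoTableConfig config = createConfig().set("example.key", "example-value");
        System.out.println(config.get("example.key"));
    }
}
```

The call site changes by one expression only, which is consistent with the "none" impact and risk stated below.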
   
   ### Impact
   
   none.
   
   ### Risk level (write none, low medium or high below)
   
   none.
   
   ### Documentation Update
   
   none.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8079: [HUDI-5863] Fix HoodieMetadataFileSystemView serving stale view at the timeline server

2023-03-04 Thread via GitHub


hudi-bot commented on PR #8079:
URL: https://github.com/apache/hudi/pull/8079#issuecomment-1454666283

   
   ## CI report:
   
   * 103f3efa119c4de262544fd1ee412c5375bf55cf UNKNOWN
   * c162956f9f418b4603328c37f9e2babf59613d4b Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15571)
   * 7fff406e74cdf3faf047634a2d596399fa49f059 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15572)
   * d660deb903eed17560554a0145464598089fb3ea UNKNOWN
   
   



[jira] [Created] (HUDI-5876) Remove usage of deprecated TableConfig.

2023-03-04 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5876:


 Summary: Remove usage of deprecated TableConfig.
 Key: HUDI-5876
 URL: https://issues.apache.org/jira/browse/HUDI-5876
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Shilun Fan
Assignee: Shilun Fan


This is a small change. SortOperatorGen initializes TableConfig using the 
deprecated no-argument constructor; use the recommended 
TableConfig#getDefault() instead.

The deprecated constructor in TableConfig:

/** Please use {@link TableConfig#getDefault()} instead. */
@Deprecated
public TableConfig() {}





[GitHub] [hudi] danny0405 commented on a diff in pull request #7687: [HUDI-5606] Update to handle deletes in postgres debezium

2023-03-04 Thread via GitHub


danny0405 commented on code in PR #7687:
URL: https://github.com/apache/hudi/pull/7687#discussion_r1125417796


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/debezium/DebeziumSource.java:
##
@@ -86,21 +90,28 @@ public DebeziumSource(TypedProperties props, JavaSparkContext sparkContext,
 deserializerClassName = props.getString(DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().key(),
 DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().defaultValue());
 
+// Currently, debezium source requires Confluent/Kafka schema-registry to fetch the latest schema.
+if (schemaProvider == null || !(schemaProvider instanceof SchemaRegistryProvider)) {
+  schemaRegistryProvider = new SchemaRegistryProvider(props, sparkContext);
+  schemaProvider = schemaRegistryProvider;
+} else {
+  schemaRegistryProvider = (SchemaRegistryProvider) schemaProvider;
+}
+
 try {
   props.put(NATIVE_KAFKA_VALUE_DESERIALIZER_PROP, Class.forName(deserializerClassName).getName());
+  if (deserializerClassName.equals(KafkaAvroSchemaDeserializer.class.getName())) {
+if (schemaProvider == null) {
+  throw new HoodieIOException("SchemaProvider has to be set to use KafkaAvroSchemaDeserializer");
+}
+props.put(KAFKA_AVRO_VALUE_DESERIALIZER_SCHEMA, schemaProvider.getSourceSchema().toString());
+  }

Review Comment:
   Let's try to add some tests.






[GitHub] [hudi] danny0405 commented on a diff in pull request #7687: [HUDI-5606] Update to handle deletes in postgres debezium

2023-03-04 Thread via GitHub


danny0405 commented on code in PR #7687:
URL: https://github.com/apache/hudi/pull/7687#discussion_r1125417754


##
hudi-common/src/main/java/org/apache/hudi/common/model/debezium/AbstractDebeziumAvroPayload.java:
##
@@ -55,19 +55,26 @@ public AbstractDebeziumAvroPayload(Option<GenericRecord> record) {
 
   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema) throws IOException {
-IndexedRecord insertRecord = getInsertRecord(schema);
-return handleDeleteOperation(insertRecord);
+Option<IndexedRecord> insertRecord = getInsertRecord(schema);
+if (!insertRecord.isPresent()) {
+  return insertRecord;
+}
+return handleDeleteOperation(insertRecord.get());
   }
 
   @Override
   public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
 // Step 1: If the time occurrence of the current record in storage is higher than the time occurrence of the
 // insert record (including a delete record), pick the current record.
-if (shouldPickCurrentRecord(currentValue, getInsertRecord(schema), schema)) {
-  return Option.of(currentValue);
+Option<IndexedRecord> indexedRecordOption = getInsertValue(schema);
+if (indexedRecordOption.isPresent()) {
+  if (shouldPickCurrentRecord(currentValue, getInsertRecord(schema).get(), schema)) {
+return Option.of(currentValue);
+  }
+  // Step 2: Pick the insert record (as a delete record if its a deleted event)
+  return getInsertValue(schema);

Review Comment:
   No need to invoke `getInsertValue(schema)` twice; we can fall back to line 77 
directly.
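
One way to read this suggestion is to compute the insert value once and reuse it. The sketch below is a minimal illustration under stated assumptions, not the actual Hudi code: `CombineOnceSketch`, `Rec`, and the timestamp comparison are hypothetical, `java.util.Optional` stands in for Hudi's `Option`, and returning an empty value when there is no insert record is an assumption of this sketch.

```java
import java.util.Optional;

final class CombineOnceSketch {
    /** Hypothetical record: an ordering timestamp plus a payload. */
    static final class Rec {
        final long ts;
        final String value;

        Rec(long ts, String value) {
            this.ts = ts;
            this.value = value;
        }
    }

    private final Rec incoming; // null models an empty insert value
    int insertValueCalls = 0;   // counts how often the insert value is computed

    CombineOnceSketch(Rec incoming) {
        this.incoming = incoming;
    }

    // Stand-in for getInsertValue(schema).
    Optional<Rec> getInsertValue() {
        insertValueCalls++;
        return Optional.ofNullable(incoming);
    }

    // Stand-in for shouldPickCurrentRecord: the stored record wins when it is newer.
    static boolean shouldPickCurrentRecord(Rec current, Rec insert) {
        return current.ts > insert.ts;
    }

    Optional<Rec> combineAndGetUpdateValue(Rec current) {
        // Computed once and reused for both the comparison and the result.
        Optional<Rec> insertValue = getInsertValue();
        if (insertValue.isPresent() && shouldPickCurrentRecord(current, insertValue.get())) {
            return Optional.of(current);
        }
        return insertValue;
    }
}
```

With a single local variable the comparison and the returned value come from the same call, so the two cannot disagree.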






[GitHub] [hudi] nfarah86 closed pull request #8093: Docs update1

2023-03-04 Thread via GitHub


nfarah86 closed pull request #8093: Docs update1
URL: https://github.com/apache/hudi/pull/8093





[GitHub] [hudi] nfarah86 opened a new pull request, #8093: Docs update1

2023-03-04 Thread via GitHub


nfarah86 opened a new pull request, #8093:
URL: https://github.com/apache/hudi/pull/8093

   cc @yihua @danny0405 @bhasudha: please review this docs PR.
   
   Timeline
   ![Screenshot 2023-03-03 at 4 37 51 PM](https://user-images.githubusercontent.com/5392555/222884865-91878270-85e6-450d-ae53-cc68e87875b1.png)
   
   Flink
   ![Screenshot 2023-03-04 at 12 07 23 AM](https://user-images.githubusercontent.com/5392555/222884867-fb69a5bb-2b56-40bd-a8df-ba5073e0.png)
   
   File sizing
   ![Screenshot 2023-03-04 at 12 07 57 AM](https://user-images.githubusercontent.com/5392555/222884868-67aea232-7759-4bc2-8be9-bed098223457.png)
   
   

