date:20220807

[hudi] branch master updated: [HUDI-4447] fix SQL metasync when perform delete table operation (#6180)

2022-08-07 Thread xushiyan

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 61fc3c03a6 [HUDI-4447] fix SQL metasync when perform delete table operation (#6180)
61fc3c03a6 is described below

commit 61fc3c03a69c76c47e65a1b2ee8d17bd9477c3a2
Author: RexXiong 
AuthorDate: Mon Aug 8 13:59:38 2022 +0800

[HUDI-4447] fix SQL metasync when perform delete table operation (#6180)
---
 .../main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala  | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala
index cfb357ee90..7d6db19edf 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala
@@ -258,6 +258,7 @@ trait ProvidesHoodieConfig extends Logging {
 
 val options = hoodieCatalogTable.catalogProperties
 val enableHive = isUsingHiveCatalog(sparkSession)
+val partitionFields = hoodieCatalogTable.partitionFields.mkString(",")
 
 withSparkConf(sparkSession, options) {
   Map(
@@ -273,7 +274,11 @@ trait ProvidesHoodieConfig extends Logging {
 HoodieSyncConfig.META_SYNC_ENABLED.key -> enableHive.toString,
 HiveSyncConfigHolder.HIVE_SYNC_ENABLED.key -> enableHive.toString,
 HiveSyncConfigHolder.HIVE_SYNC_MODE.key -> hiveSyncConfig.getStringOrDefault(HiveSyncConfigHolder.HIVE_SYNC_MODE, HiveSyncMode.HMS.name()),
+HoodieSyncConfig.META_SYNC_DATABASE_NAME.key -> hiveSyncConfig.getStringOrDefault(HoodieSyncConfig.META_SYNC_DATABASE_NAME),
+HoodieSyncConfig.META_SYNC_TABLE_NAME.key -> hiveSyncConfig.getStringOrDefault(HoodieSyncConfig.META_SYNC_TABLE_NAME),
 HiveSyncConfigHolder.HIVE_SUPPORT_TIMESTAMP_TYPE.key -> hiveSyncConfig.getBoolean(HiveSyncConfigHolder.HIVE_SUPPORT_TIMESTAMP_TYPE).toString,
+HoodieSyncConfig.META_SYNC_PARTITION_FIELDS.key -> partitionFields,
+HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS.key -> hiveSyncConfig.getStringOrDefault(HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS),
 HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> hoodieProps.getString(HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key, "200"),
 SqlKeyGenerator.PARTITION_SCHEMA -> partitionSchema.toDDL
   )
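
For readers following the diff, here is a minimal Scala sketch of what the added lines appear to do: join the table's partition fields into one comma-separated string and pass it, together with the database and table names, in the options map used for meta sync. The object name, method name, and key strings are hypothetical placeholders for illustration only; they are not Hudi's actual config keys or API.

```scala
// Illustrative sketch only: names and keys are hypothetical, not Hudi's real API.
object MetaSyncOptionsSketch {

  // Builds an options map similar in shape to the one extended by the commit.
  def syncOptions(
      database: String,
      table: String,
      partitionFields: Seq[String]): Map[String, String] = {
    // Same idea as `hoodieCatalogTable.partitionFields.mkString(",")` in the diff:
    // the sync layer receives the partition columns as one comma-separated string.
    val joinedPartitionFields = partitionFields.mkString(",")

    Map(
      "sketch.meta.sync.database" -> database,
      "sketch.meta.sync.table" -> table,
      "sketch.meta.sync.partition_fields" -> joinedPartitionFields
    )
  }
}
```

An unpartitioned table would pass an empty sequence here, which yields an empty string for the partition fields entry.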

[GitHub] [hudi] xushiyan merged pull request #6180: [HUDI-4447] fix the sync problem when performing delete table operation

2022-08-07 Thread GitBox



xushiyan merged PR #6180:
URL: https://github.com/apache/hudi/pull/6180


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] XuQianJin-Stars opened a new pull request, #6325: [MINOR] improve flink dummySink's parallelism

2022-08-07 Thread GitBox



XuQianJin-Stars opened a new pull request, #6325:
URL: https://github.com/apache/hudi/pull/6325

   ### Change Logs
   
   
![image](https://user-images.githubusercontent.com/10494131/183349245-2f480caa-bc97-44d3-9e4a-dfb1b9eeaa78.png)
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   



[GitHub] [hudi] yihua commented on issue #6281: [SUPPORT] AwsGlueCatalogSyncTool -The number of partition keys do not match the number of partition values

2022-08-07 Thread GitBox



yihua commented on issue #6281:
URL: https://github.com/apache/hudi/issues/6281#issuecomment-1207690470

   @zhedoubushishi @rahil-c could you guys help here?



[GitHub] [hudi] yihua commented on issue #6289: [SUPPORT] hudi support ingest oracle data with flink cdc？

2022-08-07 Thread GitBox



yihua commented on issue #6289:
URL: https://github.com/apache/hudi/issues/6289#issuecomment-1207688475

   Closing this issue as the question is answered.  @stephen5538 feel free to 
reopen the issue if you have more questions on the topic.



[GitHub] [hudi] yihua closed issue #6289: [SUPPORT] hudi support ingest oracle data with flink cdc？

2022-08-07 Thread GitBox



yihua closed issue #6289: [SUPPORT] hudi support ingest oracle data with flink 
cdc？
URL: https://github.com/apache/hudi/issues/6289



[GitHub] [hudi] yihua commented on issue #6297: [SUPPORT] Flink SQL client cow table query error "org/apache/parquet/column/ColumnDescriptor" (but mor table query normal)

2022-08-07 Thread GitBox



yihua commented on issue #6297:
URL: https://github.com/apache/hudi/issues/6297#issuecomment-1207687388

   @danny0405 could this be due to the dependency conflict?



[GitHub] [hudi] yihua commented on issue #6304: Hudi MultiTable Deltastreamer not updating glue catalog when new column added on Source

2022-08-07 Thread GitBox



yihua commented on issue #6304:
URL: https://github.com/apache/hudi/issues/6304#issuecomment-1207686416

   @SubashRanganathan have you contacted AWS support regarding this?
   cc @zhedoubushishi @rahil-c who may also help here.



[GitHub] [hudi] yihua commented on issue #6305: Hudi Delta Streamer unable to read Older Dates

2022-08-07 Thread GitBox



yihua commented on issue #6305:
URL: https://github.com/apache/hudi/issues/6305#issuecomment-1207684814

   @alexeykudinkin based on the discussion in Slack, is there a solution to it?



[jira] [Closed] (HUDI-4546) Optimize catalog cast logic in HoodieSpark3Analysis

2022-08-07 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-4546.
-
Resolution: Fixed

> Optimize catalog cast logic in HoodieSpark3Analysis
> ---
>
> Key: HUDI-4546
> URL: https://issues.apache.org/jira/browse/HUDI-4546
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> In HoodieSpark3Analysis, when the plan is a CreateV2Table, there is no need 
> to cast the HoodieCatalog, since CreateV2Table already contains a 
> TableCatalog that we can use directly.
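
A minimal sketch of the idea in the description above, assuming the plan node carries its catalog as a field. The types below are simplified stand-ins written for illustration; they are not Spark's or Hudi's real classes, and the actual change in HoodieSpark3Analysis may look different.

```scala
// Simplified, hypothetical stand-ins for the real Spark/Hudi types.
trait TableCatalog {
  def name: String
}

final case class SketchHoodieCatalog(name: String) extends TableCatalog

// Stand-in for a CreateV2Table plan node that already holds its catalog.
final case class CreateV2TableSketch(catalog: TableCatalog, tableName: String)

object CatalogSketch {
  // The plan node exposes the catalog, so it can be returned as-is
  // without looking it up elsewhere or down-casting it.
  def catalogFor(plan: CreateV2TableSketch): TableCatalog = plan.catalog
}
```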



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HUDI-4546) Optimize catalog cast logic in HoodieSpark3Analysis

2022-08-07 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-4546:
--
Fix Version/s: 0.13.0

> Optimize catalog cast logic in HoodieSpark3Analysis
> ---
>
> Key: HUDI-4546
> URL: https://issues.apache.org/jira/browse/HUDI-4546
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> In HoodieSpark3Analysis, when the plan is a CreateV2Table, there is no need 
> to cast the HoodieCatalog, since CreateV2Table already contains a 
> TableCatalog that we can use directly.




[jira] [Closed] (HUDI-4514) optimize CTAS or saveAsTable in different modes

2022-08-07 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-4514.
-
Resolution: Fixed

> optimize CTAS or saveAsTable in different modes
> ---
>
> Key: HUDI-4514
> URL: https://issues.apache.org/jira/browse/HUDI-4514
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> https://github.com/apache/hudi/issues/5904




[jira] [Closed] (HUDI-4544) support retain hour cleaning policy for flink

2022-08-07 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-4544.
-
Resolution: Done

> support retain hour cleaning policy for flink
> -
>
> Key: HUDI-4544
> URL: https://issues.apache.org/jira/browse/HUDI-4544
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: yonghua jian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-4514) optimize CTAS or saveAsTable in different modes

2022-08-07 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-4514:
--
Fix Version/s: 0.13.0

> optimize CTAS or saveAsTable in different modes
> ---
>
> Key: HUDI-4514
> URL: https://issues.apache.org/jira/browse/HUDI-4514
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> https://github.com/apache/hudi/issues/5904




[GitHub] [hudi] yihua commented on issue #6315: [SUPPORT]

2022-08-07 Thread GitBox



yihua commented on issue #6315:
URL: https://github.com/apache/hudi/issues/6315#issuecomment-1207682249

   Hudi 0.7.0 may have issues with MOR incremental queries.  MOR incremental 
reads have been improved since then. Have you tried Hudi 0.11.1 or the 
latest master to see if the problem still exists in your case?



[GitHub] [hudi] yihua commented on issue #6321: [SUPPORT]Fail to write/merge when missing any nested fields in a struct. [An error occurred while calling o185.save. Can't redefine: element]

2022-08-07 Thread GitBox



yihua commented on issue #6321:
URL: https://github.com/apache/hudi/issues/6321#issuecomment-1207679679

   @gtwuser could you provide the Hudi configs you used?  Which Hudi payload 
class do you use?
   
   By default, if the new batch of data has only a partial schema (a subset of 
the fields in the existing table), the behavior is undefined.  @xiarixiaoyao 
does schema evolution in Hudi Spark support this?



[jira] [Updated] (HUDI-4106) Identify out of the box default performance flips for spark-sql

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4106:
-
Epic Link: HUDI-3249

> Identify out of the box default performance flips for spark-sql
> ---
>
> Key: HUDI-4106
> URL: https://issues.apache.org/jira/browse/HUDI-4106
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Priority: Major
>
> We had HUDI-2151 to track performance flips, but it has been a year since we 
> last combed through all configs. Let's do another round of combing through 
> all configs and come up with a new list to flip. 
> This ticket specifically tracks spark-sql layer configs. 
>  
>  




[jira] [Updated] (HUDI-4105) Identify out of the box performance config flips for spark-ds

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4105:
-
Epic Link: HUDI-3249

> Identify out of the box performance config flips for spark-ds
> -
>
> Key: HUDI-4105
> URL: https://issues.apache.org/jira/browse/HUDI-4105
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: configs
>Reporter: sivabalan narayanan
>Priority: Major
>
> We need to identify out-of-the-box performance flips. Refer to HUDI-2151 for 
> the older ticket. We need to comb through all configs once again and come up 
> with an updated list. 
>  




[jira] [Updated] (HUDI-940) Audit bad/dangling configs and code

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-940:

Sprint: 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/16, 2022/05/31, 
2022/08/08)

> Audit bad/dangling configs and code 
> 
>
> Key: HUDI-940
> URL: https://issues.apache.org/jira/browse/HUDI-940
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Common Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Raymond Xu
>Priority: Critical
> Fix For: 0.13.0
>
>
> Motivation : Avoid bad configs like the one fixed in  
> [https://github.com/apache/hudi/pull/1654]
> We need to take a pass on the code to remove dead/bad configs and code




[jira] [Updated] (HUDI-4307) Document version where replaced filegroups are not being filtered out

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4307:
-
Priority: Major  (was: Critical)

> Document version where replaced filegroups are not being filtered out
> -
>
> Key: HUDI-4307
> URL: https://issues.apache.org/jira/browse/HUDI-4307
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 0.12.0
>
>
> See the bug in HUDI-4290
> Presto queries using version 0.272 or later (until it is patched) may contain 
> duplicates in results if clustering is enabled. We should document this in 
> https://hudi.apache.org/docs/query_engine_setup#prestodb




[jira] [Updated] (HUDI-940) Audit bad/dangling configs and code

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-940:

Fix Version/s: 0.13.0
   (was: 0.12.0)

> Audit bad/dangling configs and code 
> 
>
> Key: HUDI-940
> URL: https://issues.apache.org/jira/browse/HUDI-940
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Common Core
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Raymond Xu
>Priority: Critical
> Fix For: 0.13.0
>
>
> Motivation : Avoid bad configs like the one fixed in  
> [https://github.com/apache/hudi/pull/1654]
> We need to take a pass on the code to remove dead/bad configs and code




[jira] [Updated] (HUDI-4417) Update Hudi Storage docs

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4417:
-
Priority: Major  (was: Critical)

> Update Hudi Storage docs
> 
>
> Key: HUDI-4417
> URL: https://issues.apache.org/jira/browse/HUDI-4417
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vamshi Gudavarthi
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: Docs
> Fix For: 0.12.0
>
>
> Please update these docs, as they seem stale: 
> https://hudi.apache.org/docs/cloud




[jira] [Updated] (HUDI-4560) [DOCS] Update default value for partition extractor and note about infer function

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4560:
-
Priority: Major  (was: Blocker)

> [DOCS] Update default value for partition extractor and note about infer 
> function
> -
>
> Key: HUDI-4560
> URL: https://issues.apache.org/jira/browse/HUDI-4560
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 0.12.0
>
>
> See https://github.com/apache/hudi/pull/6310




[jira] [Updated] (HUDI-4441) Disable INFO level logs from tests

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4441:
-
Priority: Critical  (was: Major)

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>  Components: dependencies
>Reporter: Sagar Sumit
>Assignee: Timothy Brown
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the min level being set to WARN in all 
> log4j-sure.properties. To reproduce the issue, just run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this. 




[jira] [Updated] (HUDI-4549) hive sync bundle causes class loader issue

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4549:
-
Fix Version/s: 0.12.1
   (was: 0.12.0)

> hive sync bundle causes class loader issue
> --
>
> Key: HUDI-4549
> URL: https://issues.apache.org/jira/browse/HUDI-4549
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Raymond Xu
>Priority: Critical
> Fix For: 0.12.1
>
>
> A weird classpath issue I found: when testing deltastreamer using 
> hudi-utilities-slim-bundle, if I put --jars 
> hudi-hive-sync-bundle.jar,hudi-spark-bundle.jar then I get this error when 
> writing:
> {code:java}
> Caused by: java.lang.NoSuchMethodError: org.apache.hudi.avro.MercifulJsonConverter.convert(Ljava/lang/String;Lorg/apache/avro/Schema;)Lorg/apache/avro/generic/GenericRecord;
>   at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:86)
>   at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> {code}
> If I put the spark bundle before the hive sync bundle, there is no issue. 
> Without hive-sync-bundle, there is also no issue. So hive-sync-bundle somehow 
> messes up the classpath? Not sure why it reports a hudi-common API as not 
> found; is it caused by shading avro?
> I observed the same behavior with aws-bundle, which makes sense, as it is a 
> superset of hive-sync-bundle.
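
The ticket does not identify a root cause. One possible reading, stated here only as an assumption, is that both bundles ship a copy of the same class and the first jar on the classpath wins. The Scala sketch below models that "first match wins" lookup with plain maps; the object name, the `Jar` type alias, and the labels are hypothetical and do not reflect the bundles' real contents.

```scala
// Toy model of ordered classpath lookup; not a real class loader.
object ClasspathOrderSketch {

  // Each "jar" maps a class name to a label describing the copy it bundles.
  type Jar = Map[String, String]

  // Returns the copy from the first jar that contains the class, mirroring how
  // an ordered classpath is searched from front to back.
  def resolve(className: String, classpath: Seq[Jar]): Option[String] =
    classpath.collectFirst {
      case jar if jar.contains(className) => jar(className)
    }
}
```

Under this assumption, swapping the order of the two jars changes which copy is returned, which would match the observation that putting the spark bundle first avoids the error.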




[jira] [Updated] (HUDI-4503) Support table identifier with explicit catalog

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4503:
-
Sprint: 2022/08/08

> Support table identifier with explicit catalog
> --
>
> Key: HUDI-4503
> URL: https://issues.apache.org/jira/browse/HUDI-4503
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark, spark-sql
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>





[jira] [Updated] (HUDI-4549) hive sync bundle causes class loader issue

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4549:
-
Sprint: 2022/08/08

> hive sync bundle causes class loader issue
> --
>
> Key: HUDI-4549
> URL: https://issues.apache.org/jira/browse/HUDI-4549
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Raymond Xu
>Priority: Critical
> Fix For: 0.12.1
>
>
> A weird classpath issue I found: when testing deltastreamer using 
> hudi-utilities-slim-bundle, if I put --jars 
> hudi-hive-sync-bundle.jar,hudi-spark-bundle.jar then I get this error when 
> writing:
> {code:java}
> Caused by: java.lang.NoSuchMethodError: org.apache.hudi.avro.MercifulJsonConverter.convert(Ljava/lang/String;Lorg/apache/avro/Schema;)Lorg/apache/avro/generic/GenericRecord;
>   at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:86)
>   at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> {code}
> If I put the spark bundle before the hive sync bundle, there is no issue. 
> Without hive-sync-bundle, there is also no issue. So hive-sync-bundle somehow 
> messes up the classpath? Not sure why it reports a hudi-common API as not 
> found; is it caused by shading avro?
> I observed the same behavior with aws-bundle, which makes sense, as it is a 
> superset of hive-sync-bundle.




[jira] [Updated] (HUDI-4503) Support table identifier with explicit catalog

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4503:
-
Fix Version/s: 0.12.1

> Support table identifier with explicit catalog
> --
>
> Key: HUDI-4503
> URL: https://issues.apache.org/jira/browse/HUDI-4503
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark, spark-sql
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>





[jira] [Updated] (HUDI-4441) Disable INFO level logs from tests

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4441:
-
Component/s: dependencies

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>  Components: dependencies
>Reporter: Sagar Sumit
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the min level being set to WARN in all 
> log4j-sure.properties. To reproduce the issue, just run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this. 




[jira] [Updated] (HUDI-4441) Disable INFO level logs from tests

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4441:
-
Sprint: 2022/08/08

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>  Components: dependencies
>Reporter: Sagar Sumit
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the min level being set to WARN in all 
> log4j-sure.properties. To reproduce the issue, just run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this. 




[jira] [Updated] (HUDI-4441) Disable INFO level logs from tests

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4441:
-
Fix Version/s: 0.12.1

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the min level being set to WARN in all 
> log4j-sure.properties. To reproduce the issue, just run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this. 




[GitHub] [hudi] hudi-bot commented on pull request #6319: [HUDI-4556] Improve functional test coverage of column stats index

2022-08-07 Thread GitBox



hudi-bot commented on PR #6319:
URL: https://github.com/apache/hudi/pull/6319#issuecomment-1207670624

   
   ## CI report:
   
   * f8148735464d99d6d7a8531bcf72d5b6850553db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10626)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Created] (HUDI-4566) Docs writing for 0.12.0: config updates

2022-08-07 Thread Raymond Xu (Jira)

Raymond Xu created HUDI-4566:


 Summary: Docs writing for 0.12.0: config updates
 Key: HUDI-4566
 URL: https://issues.apache.org/jira/browse/HUDI-4566
 Project: Apache Hudi
  Issue Type: Task
Reporter: Raymond Xu







[jira] [Updated] (HUDI-4566) Docs writing for 0.12.0: config updates

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4566:
-
Reviewers: Raymond Xu

> Docs writing for 0.12.0: config updates
> ---
>
> Key: HUDI-4566
> URL: https://issues.apache.org/jira/browse/HUDI-4566
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Updated] (HUDI-4566) Docs writing for 0.12.0: config updates

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4566:
-
Story Points: 1

> Docs writing for 0.12.0: config updates
> ---
>
> Key: HUDI-4566
> URL: https://issues.apache.org/jira/browse/HUDI-4566
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Assigned] (HUDI-4566) Docs writing for 0.12.0: config updates

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4566:


Assignee: Sagar Sumit

> Docs writing for 0.12.0: config updates
> ---
>
> Key: HUDI-4566
> URL: https://issues.apache.org/jira/browse/HUDI-4566
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Updated] (HUDI-4565) Docs writing for 0.12.0: archival beyond savepoint, bundle changes, presto update

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4565:
-
Summary: Docs writing for 0.12.0: archival beyond savepoint, bundle 
changes, presto update  (was: Docs writing for 0.12.0: archival beyond 
savepoint and bundle changes)

> Docs writing for 0.12.0: archival beyond savepoint, bundle changes, presto 
> update
> -
>
> Key: HUDI-4565
> URL: https://issues.apache.org/jira/browse/HUDI-4565
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Updated] (HUDI-4565) Docs writing for 0.12.0: archival beyond savepoint and bundle changes

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4565:
-
Story Points: 1

> Docs writing for 0.12.0: archival beyond savepoint and bundle changes
> -
>
> Key: HUDI-4565
> URL: https://issues.apache.org/jira/browse/HUDI-4565
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Updated] (HUDI-4565) Docs writing for 0.12.0: archival beyond savepoint and bundle changes

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4565:
-
Component/s: docs

> Docs writing for 0.12.0: archival beyond savepoint and bundle changes
> -
>
> Key: HUDI-4565
> URL: https://issues.apache.org/jira/browse/HUDI-4565
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Created] (HUDI-4565) Docs writing for 0.12.0: archival beyond savepoint and bundle changes

2022-08-07 Thread Raymond Xu (Jira)

Raymond Xu created HUDI-4565:


 Summary: Docs writing for 0.12.0: archival beyond savepoint and 
bundle changes
 Key: HUDI-4565
 URL: https://issues.apache.org/jira/browse/HUDI-4565
 Project: Apache Hudi
  Issue Type: Task
Reporter: Raymond Xu







[jira] [Updated] (HUDI-4565) Docs writing for 0.12.0: archival beyond savepoint and bundle changes

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4565:
-
Reviewers: Raymond Xu

> Docs writing for 0.12.0: archival beyond savepoint and bundle changes
> -
>
> Key: HUDI-4565
> URL: https://issues.apache.org/jira/browse/HUDI-4565
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Assigned] (HUDI-4565) Docs writing for 0.12.0: archival beyond savepoint and bundle changes

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4565:


Assignee: Sagar Sumit

> Docs writing for 0.12.0: archival beyond savepoint and bundle changes
> -
>
> Key: HUDI-4565
> URL: https://issues.apache.org/jira/browse/HUDI-4565
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Updated] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support and data skipping

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4564:
-
Story Points: 1

> Docs writing for 0.12.0: spark 3.3 support and data skipping
> 
>
> Key: HUDI-4564
> URL: https://issues.apache.org/jira/browse/HUDI-4564
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Ethan Guo
>Priority: Major
>





[jira] [Updated] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support and data skipping

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4564:
-
Reviewers: Sagar Sumit

> Docs writing for 0.12.0: spark 3.3 support and data skipping
> 
>
> Key: HUDI-4564
> URL: https://issues.apache.org/jira/browse/HUDI-4564
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Ethan Guo
>Priority: Major
>





[jira] [Assigned] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support and data skipping

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4564:


Assignee: Ethan Guo

> Docs writing for 0.12.0: spark 3.3 support and data skipping
> 
>
> Key: HUDI-4564
> URL: https://issues.apache.org/jira/browse/HUDI-4564
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Ethan Guo
>Priority: Major
>





[jira] [Updated] (HUDI-4563) Docs writing for 0.12.0: key gen API change and perf improvements

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4563:
-
Story Points: 1  (was: 3)

> Docs writing for 0.12.0: key gen API change and perf improvements
> -
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Alexey Kudinkin
>Priority: Major
> Fix For: 0.12.0
>
>





[jira] [Assigned] (HUDI-4563) Docs writing for 0.12.0: key gen API change and perf improvements

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4563:


Assignee: Alexey Kudinkin

> Docs writing for 0.12.0: key gen API change and perf improvements
> -
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Alexey Kudinkin
>Priority: Major
> Fix For: 0.12.0
>
>





[jira] [Created] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support and data skipping

2022-08-07 Thread Raymond Xu (Jira)

Raymond Xu created HUDI-4564:


 Summary: Docs writing for 0.12.0: spark 3.3 support and data 
skipping
 Key: HUDI-4564
 URL: https://issues.apache.org/jira/browse/HUDI-4564
 Project: Apache Hudi
  Issue Type: Task
Reporter: Raymond Xu







[GitHub] [hudi] hudi-bot commented on pull request #6319: [HUDI-4556] Improve functional test coverage of column stats index

2022-08-07 Thread GitBox



hudi-bot commented on PR #6319:
URL: https://github.com/apache/hudi/pull/6319#issuecomment-1207667963

   
   ## CI report:
   
   * f8148735464d99d6d7a8531bcf72d5b6850553db UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (HUDI-4563) Docs writing for 0.12.0: key gen API change and perf improvements

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4563:
-
Summary: Docs writing for 0.12.0: key gen API change and perf improvements  
(was: Docs writing for 0.12.0)

> Docs writing for 0.12.0: key gen API change and perf improvements
> -
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.12.0
>
>





[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-07 Thread GitBox



hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1207665227

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * a80d4bdd93c349b09b6e640dd2229379f2173ff0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10661)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] chenshzh commented on pull request #6121: [HUDI-4406] Support Flink compaction commit write error resolvement to avoid data loss

2022-08-07 Thread GitBox



chenshzh commented on PR #6121:
URL: https://github.com/apache/hudi/pull/6121#issuecomment-1207662818

   > > @danny0405 would you pls help see this pr for avoiding data loss during 
compaction due to some write errors such as non-thrown exceptions ?
   > 
   > What kind of exceptions in your production caused the write handle to 
encounter exceptions then ? 
   
   @danny0405 As described in the opening part, we have encountered data loss 
during compaction. After analyzing the code below and validating it online, we 
found that some exceptions are caught in HoodieMergeHandle#writeRecord. The 
write handle's processing is then interrupted and the records are only marked 
as failures in WriteStatus, which causes data loss when merging delta logs into 
data files. The exceptions might be, for example, an IOException from 
HoodieFileWriter#writeToFile or a HoodieUpsertException.
   
   ```java
   protected boolean writeRecord(HoodieRecord<T> hoodieRecord, Option<IndexedRecord> indexedRecord, boolean isDelete) {
     Option recordMetadata = hoodieRecord.getData().getMetadata();
     if (!partitionPath.equals(hoodieRecord.getPartitionPath())) {
       HoodieUpsertException failureEx = new HoodieUpsertException("mismatched partition path, record partition: "
           + hoodieRecord.getPartitionPath() + " but trying to insert into partition: " + partitionPath);
       writeStatus.markFailure(hoodieRecord, failureEx, recordMetadata);
       return false;
     }
     try {
       if (indexedRecord.isPresent() && !isDelete) {
         writeToFile(hoodieRecord.getKey(), (GenericRecord) indexedRecord.get(), preserveMetadata && useWriterSchemaForCompaction);
         recordsWritten++;
       } else {
         recordsDeleted++;
       }
       writeStatus.markSuccess(hoodieRecord, recordMetadata);
       // deflate record payload after recording success. This will help users access payload as a
       // part of marking
       // record successful.
       hoodieRecord.deflate();
       return true;
     } catch (Exception e) {
       LOG.error("Error writing record  " + hoodieRecord, e);
       writeStatus.markFailure(hoodieRecord, e, recordMetadata);
     }
     return false;
   }
   ```
   
   > And what would you want to do when you encounter that ?
   
   What we want to do is described in detail in the opening part, as a) -> d). 
   
   A completed compaction commit is effectively treated as the snapshot and 
determines the latest file slices, so it is better to take the WriteStatus 
errors into consideration when deciding whether to commit or roll back a 
compaction (just as StreamWriteOperatorCoordinator does for delta commits, shown below). 
   
   In short, a compaction commit with errors should be rolled back, logged as a 
warning, and retried at the next schedule if we consider data quality more 
important than job stability.
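
A minimal sketch of that commit-or-rollback decision, reduced to plain Java. The class `CompactionCommitPolicy`, the type `FileWriteResult`, and the `ignoreFailed` flag are hypothetical names used only for illustration; they are not Hudi APIs and only mirror the idea of summing error counts before committing.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Hudi. */
public class CompactionCommitPolicy {

  /** Outcome for one compaction instant. */
  public enum Decision { COMMIT, ROLLBACK }

  /** Stand-in for a per-file write status carrying record counters. */
  public static final class FileWriteResult {
    final long totalRecords;
    final long totalErrorRecords;

    public FileWriteResult(long totalRecords, long totalErrorRecords) {
      this.totalRecords = totalRecords;
      this.totalErrorRecords = totalErrorRecords;
    }
  }

  /**
   * Commits when no record failed, or when the caller explicitly chose to
   * ignore failures; otherwise asks for a rollback so the compaction can be
   * retried at the next schedule.
   */
  public static Decision decide(List<FileWriteResult> results, boolean ignoreFailed) {
    long totalErrors = results.stream().mapToLong(r -> r.totalErrorRecords).sum();
    return (totalErrors == 0 || ignoreFailed) ? Decision.COMMIT : Decision.ROLLBACK;
  }

  public static void main(String[] args) {
    List<FileWriteResult> clean = Arrays.asList(new FileWriteResult(100, 0), new FileWriteResult(50, 0));
    List<FileWriteResult> dirty = Arrays.asList(new FileWriteResult(100, 0), new FileWriteResult(50, 3));
    System.out.println(decide(clean, false)); // COMMIT
    System.out.println(decide(dirty, false)); // ROLLBACK
    System.out.println(decide(dirty, true));  // COMMIT
  }
}
```

Defaulting to rollback when any record failed favors data quality; the `ignoreFailed` flag keeps the old behavior available for jobs that prefer stability.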
   
   org.apache.hudi.sink.StreamWriteOperatorCoordinator#doCommit
   ```java
   private void doCommit(String instant, List<WriteStatus> writeResults) {
     // commit or rollback
     long totalErrorRecords = writeResults.stream().map(WriteStatus::getTotalErrorRecords).reduce(Long::sum).orElse(0L);
     long totalRecords = writeResults.stream().map(WriteStatus::getTotalRecords).reduce(Long::sum).orElse(0L);
     boolean hasErrors = totalErrorRecords > 0;

     if (!hasErrors || this.conf.getBoolean(FlinkOptions.IGNORE_FAILED)) {
       HashMap<String, String> checkpointCommitMetadata = new HashMap<>();
       if (hasErrors) {
         LOG.warn("Some records failed to merge but forcing commit since commitOnErrors set to true. Errors/Total="
             + totalErrorRecords + "/" + totalRecords);
       }

       final Map<String, List<String>> partitionToReplacedFileIds = tableState.isOverwrite
           ? writeClient.getPartitionToReplacedFileIds(tableState.operationType, writeResults)
           : Collections.emptyMap();
       boolean success = writeClient.commit(instant, writeResults, Option.of(checkpointCommitMetadata),
           tableState.commitAction, partitionToReplacedFileIds);
       if (success) {
         reset();
         this.ckpMetadata.commitInstant(instant);
         LOG.info("Commit instant [{}] success!", instant);
       } else {
         throw new HoodieException(String.format("Commit instant [%s] failed!", instant));
       }
     } else {
       LOG.error("Error when writing. Errors/Total=" + totalErrorRecords + "/" + totalRecords);
       LOG.error("The first 100 error messages");
       writeResults.stream().filter(WriteStatus::hasErrors).limit(100).forEach(ws -> {
         LOG.error("Global error for partition path {} and fileID {}: {}",
             ws.getGlobalError(), ws.getPartitionPath(), ws.getFileId());
         if (ws.getErrors().size() > 0) {
           ws.getErrors().forEach((key, value) -> LOG.trace("Error for key:" + key + " and value " + value));
         }
       });
       // Rolls back instant

[jira] [Commented] (HUDI-3973) Implement GENERATE manifest command for Snowflake integration

2022-08-07 Thread Raymond Xu (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576538#comment-17576538
 ] 

Raymond Xu commented on HUDI-3973:
--

[~vino] [~joyansil] Is this work required for the Snowflake integration? If so, 
is there any plan to complete it?

> Implement GENERATE manifest command for Snowflake integration
> -
>
> Key: HUDI-3973
> URL: https://issues.apache.org/jira/browse/HUDI-3973
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Joyan Sil
>Assignee: Joyan Sil
>Priority: Major
> Fix For: 0.13.0
>
>
> Implement GENERATE manifest command for Snowflake integration




[jira] [Updated] (HUDI-3973) Implement GENERATE manifest command for Snowflake integration

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3973:
-
Fix Version/s: 0.13.0
   (was: 0.12.0)

> Implement GENERATE manifest command for Snowflake integration
> -
>
> Key: HUDI-3973
> URL: https://issues.apache.org/jira/browse/HUDI-3973
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Joyan Sil
>Assignee: Joyan Sil
>Priority: Major
> Fix For: 0.13.0
>
>
> Implement GENERATE manifest command for Snowflake integration




[jira] [Updated] (HUDI-4137) Implement SnowflakeSyncTool to support Hudi to Snowflake Integration

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4137:
-
Sprint: 2022/08/08

> Implement SnowflakeSyncTool to support Hudi to Snowflake Integration
> 
>
> Key: HUDI-4137
> URL: https://issues.apache.org/jira/browse/HUDI-4137
> Project: Apache Hudi
>  Issue Type: Task
>  Components: meta-sync
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: integration, pull-request-available
> Fix For: 0.13.0
>
>
> Implement SnowflakeSyncTool similar to the BigQuerySyncTool to support Hudi 
> to Snowflake Integration




[jira] [Updated] (HUDI-4254) Refactor SnowflakeSyncTool and BigQuerySyncTool

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4254:
-
Sprint: 2022/08/08

> Refactor SnowflakeSyncTool and BigQuerySyncTool
> ---
>
> Key: HUDI-4254
> URL: https://issues.apache.org/jira/browse/HUDI-4254
> Project: Apache Hudi
>  Issue Type: Task
>  Components: meta-sync
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
> Fix For: 0.13.0
>
>
> There are many similarities between SnowflakeSyncTool and BigQuerySyncTool. 
> Refactor the common methods into an abstract class and reuse it in both tools 
> to avoid code duplication.
>  




[jira] [Updated] (HUDI-2832) [Umbrella] [RFC-40] Integrated Hudi with Snowflake

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2832:
-
Story Points: 0  (was: 0.5)

> [Umbrella] [RFC-40] Integrated Hudi with Snowflake 
> ---
>
> Key: HUDI-2832
> URL: https://issues.apache.org/jira/browse/HUDI-2832
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Critical
>  Labels: BigQuery, Integration, pull-request-available
> Fix For: 0.13.0
>
>
> Snowflake is a fully managed service that’s simple to use but can power a 
> near-unlimited number of concurrent workloads. Snowflake is a solution for 
> data warehousing, data lakes, data engineering, data science, data 
> application development, and securely sharing and consuming shared data. 
> Snowflake [doesn’t 
> support|https://docs.snowflake.com/en/sql-reference/sql/alter-file-format.html]
>  Apache Hudi file format yet, but it has support for Parquet, ORC, and Delta 
> file format. This proposal is to implement a SnowflakeSync similar to 
> HiveSync to sync the Hudi table as the Snowflake External Parquet table so 
> that users can query the Hudi tables using Snowflake. Many users have asked, 
> in Hudi and other support channels, for a Snowflake integration; this will 
> unlock new use cases for Hudi.




[jira] [Updated] (HUDI-3973) Implement GENERATE manifest command for Snowflake integration

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3973:
-
Sprint: 2022/08/08

> Implement GENERATE manifest command for Snowflake integration
> -
>
> Key: HUDI-3973
> URL: https://issues.apache.org/jira/browse/HUDI-3973
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Joyan Sil
>Assignee: Joyan Sil
>Priority: Major
> Fix For: 0.12.0
>
>
> Implement GENERATE manifest command for Snowflake integration




[jira] [Updated] (HUDI-4254) Refactor SnowflakeSyncTool and BigQuerySyncTool

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4254:
-
Component/s: meta-sync

> Refactor SnowflakeSyncTool and BigQuerySyncTool
> ---
>
> Key: HUDI-4254
> URL: https://issues.apache.org/jira/browse/HUDI-4254
> Project: Apache Hudi
>  Issue Type: Task
>  Components: meta-sync
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
> Fix For: 0.13.0
>
>
> There are many similarities between SnowflakeSyncTool and BigQuerySyncTool. 
> Refactor the common methods into an abstract class and reuse it in both tools 
> to avoid code duplication.
>  




[jira] [Updated] (HUDI-4137) Implement SnowflakeSyncTool to support Hudi to Snowflake Integration

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4137:
-
Component/s: meta-sync

> Implement SnowflakeSyncTool to support Hudi to Snowflake Integration
> 
>
> Key: HUDI-4137
> URL: https://issues.apache.org/jira/browse/HUDI-4137
> Project: Apache Hudi
>  Issue Type: Task
>  Components: meta-sync
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: integration, pull-request-available
> Fix For: 0.13.0
>
>
> Implement SnowflakeSyncTool similar to the BigQuerySyncTool to support Hudi 
> to Snowflake Integration




[jira] [Updated] (HUDI-4254) Refactor SnowflakeSyncTool and BigQuerySyncTool

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4254:
-
Fix Version/s: 0.13.0

> Refactor SnowflakeSyncTool and BigQuerySyncTool
> ---
>
> Key: HUDI-4254
> URL: https://issues.apache.org/jira/browse/HUDI-4254
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
> Fix For: 0.13.0
>
>
> There are many similarities between SnowflakeSyncTool and BigQuerySyncTool. 
> Refactor the common methods into an abstract class and reuse it in both tools 
> to avoid code duplication.
>  




[jira] [Updated] (HUDI-2832) [Umbrella] [RFC-40] Integrated Hudi with Snowflake

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2832:
-
Story Points: 0.5  (was: 0)

> [Umbrella] [RFC-40] Integrated Hudi with Snowflake 
> ---
>
> Key: HUDI-2832
> URL: https://issues.apache.org/jira/browse/HUDI-2832
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Critical
>  Labels: BigQuery, Integration, pull-request-available
> Fix For: 0.13.0
>
>
> Snowflake is a fully managed service that’s simple to use but can power a 
> near-unlimited number of concurrent workloads. Snowflake is a solution for 
> data warehousing, data lakes, data engineering, data science, data 
> application development, and securely sharing and consuming shared data. 
> Snowflake [doesn’t 
> support|https://docs.snowflake.com/en/sql-reference/sql/alter-file-format.html]
>  Apache Hudi file format yet, but it has support for Parquet, ORC, and Delta 
> file format. This proposal is to implement a SnowflakeSync similar to 
> HiveSync to sync the Hudi table as the Snowflake External Parquet table so 
> that users can query the Hudi tables using Snowflake. Many users have asked, 
> in Hudi and other support channels, for a Snowflake integration; this will 
> unlock new use cases for Hudi.




[jira] [Updated] (HUDI-3654) Support basic actions based on hudi metastore server

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3654:
-
Fix Version/s: 0.13.0
   (was: 0.12.0)

> Support basic actions based on hudi metastore server 
> -
>
> Key: HUDI-3654
> URL: https://issues.apache.org/jira/browse/HUDI-3654
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: metadata
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-4428) Make the meta server configurable when starting

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4428:
-
Sprint: 2022/08/08

> Make the meta server configurable when starting
> ---
>
> Key: HUDI-4428
> URL: https://issues.apache.org/jira/browse/HUDI-4428
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-3654) Support basic actions based on hudi metastore server

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3654:
-
Sprint: 2022/08/08

> Support basic actions based on hudi metastore server 
> -
>
> Key: HUDI-3654
> URL: https://issues.apache.org/jira/browse/HUDI-3654
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: metadata
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-4428) Make the meta server configurable when starting

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4428:
-
Fix Version/s: 0.13.0

> Make the meta server configurable when starting
> ---
>
> Key: HUDI-4428
> URL: https://issues.apache.org/jira/browse/HUDI-4428
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-3345) [Umbrella] Hudi metastore server

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3345:
-
Fix Version/s: 0.13.0

> [Umbrella] Hudi metastore server
> 
>
> Key: HUDI-3345
> URL: https://issues.apache.org/jira/browse/HUDI-3345
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: writer-core
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-3345) [Umbrella] Hudi metastore server

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3345:
-
Priority: Blocker  (was: Major)

> [Umbrella] Hudi metastore server
> 
>
> Key: HUDI-3345
> URL: https://issues.apache.org/jira/browse/HUDI-3345
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: writer-core
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-246) Apache Pulsar data source for Hudi DeltaStreamer

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-246:

Sprint: 2022/08/08

> Apache Pulsar data source for Hudi DeltaStreamer
> 
>
> Key: HUDI-246
> URL: https://issues.apache.org/jira/browse/HUDI-246
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Taher Koitawala
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> [Apache Pulsar|https://pulsar.apache.org/en/] is a pub/sub messaging system 
> like Kafka, with a lot of new features like multiple subscription modes, out 
> of the box service discovery etc. The goal here is to add Pulsar as a data 
> source to DeltaStreamer. To get started please follow [Pulsar adaptor for 
> Apache Spark|https://pulsar.apache.org/docs/en/adaptors-spark/]




[jira] [Assigned] (HUDI-246) Apache Pulsar data source for Hudi DeltaStreamer

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-246:
---

Assignee: Alexey Kudinkin  (was: Vinoth Chandar)

> Apache Pulsar data source for Hudi DeltaStreamer
> 
>
> Key: HUDI-246
> URL: https://issues.apache.org/jira/browse/HUDI-246
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Taher Koitawala
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [Apache Pulsar|https://pulsar.apache.org/en/] is a pub/sub messaging system 
> like Kafka, with a lot of new features like multiple subscription modes, out 
> of the box service discovery etc. The goal here is to add Pulsar as a data 
> source to DeltaStreamer. To get started please follow [Pulsar adaptor for 
> Apache Spark|https://pulsar.apache.org/docs/en/adaptors-spark/]




[jira] [Closed] (HUDI-2939) [DOC] Add description of write commit callback by pulsar to document

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2939.

Resolution: Done

> [DOC] Add description of write commit callback by pulsar to document
> 
>
> Key: HUDI-2939
> URL: https://issues.apache.org/jira/browse/HUDI-2939
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
> Fix For: 0.11.0
>
>





[jira] [Closed] (HUDI-2937) Introduce a pulsar implementation of hoodie write commit callback

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2937.

Resolution: Done

> Introduce a pulsar implementation of hoodie write commit callback
> -
>
> Key: HUDI-2937
> URL: https://issues.apache.org/jira/browse/HUDI-2937
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>





[jira] [Updated] (HUDI-246) Apache Pulsar data source for Hudi DeltaStreamer

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-246:

Fix Version/s: 0.13.0
   (was: 0.12.0)

> Apache Pulsar data source for Hudi DeltaStreamer
> 
>
> Key: HUDI-246
> URL: https://issues.apache.org/jira/browse/HUDI-246
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Taher Koitawala
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> [Apache Pulsar|https://pulsar.apache.org/en/] is a pub/sub messaging system 
> like Kafka, with a lot of new features like multiple subscription modes, out 
> of the box service discovery etc. The goal here is to add Pulsar as a data 
> source to DeltaStreamer. To get started please follow [Pulsar adaptor for 
> Apache Spark|https://pulsar.apache.org/docs/en/adaptors-spark/]




[jira] [Updated] (HUDI-3958) Resolve parquet-avro conflict in hudi-gcp-bundle and hudi-spark3.1-bundle

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3958:
-
Story Points: 1

> Resolve parquet-avro conflict in hudi-gcp-bundle and hudi-spark3.1-bundle
> -
>
> Key: HUDI-3958
> URL: https://issues.apache.org/jira/browse/HUDI-3958
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.12.0
>
>
> In the GCP bundle (master version) we include parquet-avro, which causes the 
> following failure when running on Dataproc 2.0.34-ubuntu18 with the 
> spark3.1-bundle and the utilities-slim bundle:
> {code:text}
> 22/04/23 15:02:14 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 
> 0.0 in stage 36.0 (TID 93) 
> (cluster-4275-m.asia-southeast1-a.c.hudi-bq.internal executor 1): 
> java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: 
> org/apache/parquet/schema/LogicalTypeAnnotation$UUIDLogicalTypeAnnotation
>   at 
> org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>   at 
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
>   at 
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
>   at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
>   at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: 
> org/apache/parquet/schema/LogicalTypeAnnotation$UUIDLogicalTypeAnnotation
>   at 
> org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:94)
>   at 
> org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:37)
>   at 
> org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
>   ... 22 more
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: 
> org/apache/parquet/schema/LogicalTypeAnnotation$UUIDLogicalTypeAnnotation
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:160)
>   at 
> org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:90)
>   ... 24 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NoClassDefFoundError: 
> org/apache/parquet/schema/LogicalTypeAnnotation$UUIDLogicalTypeAnnotation
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:154)
>   ... 25 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/parquet/schema/LogicalTypeAnnotation$UUIDLogicalTypeAnnotation
>   at 
> org.apache.hudi.io.storage.HoodieFileWriterFactory.newParquetFileWriter(HoodieFileWriterFactory.java:78)
>   at 
>

[GitHub] [hudi] RexXiong commented on a diff in pull request #5250: [HUDI-3817] shade parquet dependency for hudi-hadoop-mr-bundle

2022-08-07 Thread GitBox



RexXiong commented on code in PR #5250:
URL: https://github.com/apache/hudi/pull/5250#discussion_r939809114


##
packaging/hudi-hadoop-mr-bundle/pom.xml:
##
@@ -67,8 +67,9 @@
 
   org.apache.hudi:hudi-common
   org.apache.hudi:hudi-hadoop-mr
-
+  
   org.apache.parquet:parquet-avro
+  org.apache.parquet:parquet-hadoop-bundle

Review Comment:
   @xushiyan Testing suggests that parquet-avro 1.10.x is compatible with 
parquet-hadoop 1.8.1, so I will pin the parquet-avro version for 
hadoop-mr-bundle. This was also the first solution proposed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] danny0405 commented on a diff in pull request #6320: [HUDI-4558] lost 'hoodie.table.keygenerator.class' in hoodie.properties

2022-08-07 Thread GitBox



danny0405 commented on code in PR #6320:
URL: https://github.com/apache/hudi/pull/6320#discussion_r939808811


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##
@@ -875,6 +883,33 @@ public static  boolean 
isDefaultValueDefined(Configuration conf, ConfigOption
 || conf.get(option).equals(option.defaultValue());
   }
 
+  public static String getKeyGenClassNameByType(Configuration conf) {
+String genType = conf.get(FlinkOptions.KEYGEN_TYPE);

Review Comment:
   Would giving the option `KEYGEN_CLASS_NAME` a default value of 
`SimpleAvroKeyGenerator` solve your problem?
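
   The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual `FlinkOptions` code: the class `KeyGenClassResolver`, the key string `keygen.class`, and the plain `Map` standing in for the Flink `Configuration` are all hypothetical. It shows a lookup that falls back to a default key generator class name when none is configured.

```java
import java.util.Collections;
import java.util.Map;

public class KeyGenClassResolver {
    // Hypothetical option key and default value, used for illustration only;
    // these are not the real Hudi/Flink constants.
    public static final String KEYGEN_CLASS_NAME = "keygen.class";
    public static final String DEFAULT_KEYGEN_CLASS = "SimpleAvroKeyGenerator";

    // Returns the configured key generator class name, or the default
    // when the option is missing or empty.
    public static String resolveKeyGenClass(Map<String, String> conf) {
        String configured = conf.get(KEYGEN_CLASS_NAME);
        if (configured == null || configured.isEmpty()) {
            return DEFAULT_KEYGEN_CLASS;
        }
        return configured;
    }

    public static void main(String[] args) {
        // No option set: the default is used.
        System.out.println(resolveKeyGenClass(Collections.<String, String>emptyMap()));
        // Option set explicitly: the configured value wins.
        System.out.println(resolveKeyGenClass(
            Collections.singletonMap(KEYGEN_CLASS_NAME, "ComplexAvroKeyGenerator")));
    }
}
```

   With a default in place, readers of the option never see a missing value, which may be enough to keep the key generator class from being dropped when table properties are written.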




[jira] [Closed] (HUDI-3687) Make sure CI run tests against all Spark versions

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3687.

Resolution: Fixed

> Make sure CI run tests against all Spark versions
> -
>
> Key: HUDI-3687
> URL: https://issues.apache.org/jira/browse/HUDI-3687
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> Currently, CI only runs tests against Spark 2.4.4. Since we pledge to support 
> all patch versions of Spark within each supported minor version branch 
> (3.1, 3.2, etc.), we need to run the Spark-related tests against every Spark 
> version we pledge to support.




[jira] [Updated] (HUDI-3687) Make sure CI run tests against all Spark versions

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3687:
-
Fix Version/s: 0.12.0
   (was: 0.13.0)

> Make sure CI run tests against all Spark versions
> -
>
> Key: HUDI-3687
> URL: https://issues.apache.org/jira/browse/HUDI-3687
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> Currently, CI only runs tests against Spark 2.4.4. Since we pledge to support 
> all patch versions of Spark within each supported minor version branch 
> (3.1, 3.2, etc.), we need to run the Spark-related tests against every Spark 
> version we pledge to support.




[jira] [Comment Edited] (HUDI-3687) Make sure CI run tests against all Spark versions

2022-08-07 Thread Raymond Xu (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576526#comment-17576526
 ] 

Raymond Xu edited comment on HUDI-3687 at 8/8/22 4:10 AM:
--

Fixed in https://github.com/apache/hudi/pull/6279 and 
https://github.com/apache/hudi/pull/5943


was (Author: xushiyan):
Fixed in https://github.com/apache/hudi/pull/6279 and 
https://github.com/apache/hudi/pull/6279

> Make sure CI run tests against all Spark versions
> -
>
> Key: HUDI-3687
> URL: https://issues.apache.org/jira/browse/HUDI-3687
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Currently, CI only runs tests against Spark 2.4.4. Since we pledge to support 
> all patch versions of Spark within each supported minor version branch 
> (3.1, 3.2, etc.), we need to run the Spark-related tests against every Spark 
> version we pledge to support.




[jira] [Commented] (HUDI-3687) Make sure CI run tests against all Spark versions

2022-08-07 Thread Raymond Xu (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576526#comment-17576526
 ] 

Raymond Xu commented on HUDI-3687:
--

Fixed in https://github.com/apache/hudi/pull/6279 and 
https://github.com/apache/hudi/pull/6279

> Make sure CI run tests against all Spark versions
> -
>
> Key: HUDI-3687
> URL: https://issues.apache.org/jira/browse/HUDI-3687
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Currently, CI only runs tests against Spark 2.4.4. Since we pledge to support 
> all patch versions of Spark within each supported minor version branch 
> (3.1, 3.2, etc.), we need to run the Spark-related tests against every Spark 
> version we pledge to support.




[jira] [Updated] (HUDI-3687) Make sure CI run tests against all Spark versions

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3687:
-
Reviewers: Alexey Kudinkin, Raymond Xu

> Make sure CI run tests against all Spark versions
> -
>
> Key: HUDI-3687
> URL: https://issues.apache.org/jira/browse/HUDI-3687
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Currently, CI only runs tests against Spark 2.4.4. Since we pledge to support 
> all patch versions of Spark within each supported minor version branch 
> (3.1, 3.2, etc.), we need to run the Spark-related tests against every Spark 
> version we pledge to support.




[jira] [Updated] (HUDI-3687) Make sure CI run tests against all Spark versions

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3687:
-
Flagged:   (was: Impediment)

> Make sure CI run tests against all Spark versions
> -
>
> Key: HUDI-3687
> URL: https://issues.apache.org/jira/browse/HUDI-3687
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Currently, CI only runs tests against Spark 2.4.4. Since we pledge to support 
> all patch versions of Spark within each supported minor version branch 
> (3.1, 3.2, etc.), we need to run the Spark-related tests against every Spark 
> version we pledge to support.




[jira] [Updated] (HUDI-4307) Document version where replaced filegroups are not being filtered out

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4307:
-
Story Points: 0.5

> Document version where replaced filegroups are not being filtered out
> -
>
> Key: HUDI-4307
> URL: https://issues.apache.org/jira/browse/HUDI-4307
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Critical
> Fix For: 0.12.0
>
>
> See the bug in HUDI-4290
> Presto queries using version 0.272 or later (until it is patched) may contain 
> duplicates in results if clustering is enabled. We should document this in 
> https://hudi.apache.org/docs/query_engine_setup#prestodb




[jira] [Updated] (HUDI-4417) Update Hudi Storage docs

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4417:
-
Story Points: 0.5

> Update Hudi Storage docs
> 
>
> Key: HUDI-4417
> URL: https://issues.apache.org/jira/browse/HUDI-4417
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vamshi Gudavarthi
>Assignee: Sagar Sumit
>Priority: Critical
>  Labels: Docs
> Fix For: 0.12.0
>
>
> Please update these docs, as they seem stale: 
> https://hudi.apache.org/docs/cloud




[jira] [Updated] (HUDI-2517) Simplify the amount of configs that need to be passed in for Delta Streamer

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2517:
-
Sprint: 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/16, 2022/05/31, 
2022/08/08)

> Simplify the amount of configs that need to be passed in for Delta Streamer
> ---
>
> Key: HUDI-2517
> URL: https://issues.apache.org/jira/browse/HUDI-2517
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality, configs
>Reporter: Vinoth Chandar
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 0.13.0
>
>





[jira] [Updated] (HUDI-4560) [DOCS] Update default value for partition extractor and note about infer function

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4560:
-
Story Points: 0.5

> [DOCS] Update default value for partition extractor and note about infer 
> function
> -
>
> Key: HUDI-4560
> URL: https://issues.apache.org/jira/browse/HUDI-4560
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.12.0
>
>
> See https://github.com/apache/hudi/pull/6310




[jira] [Updated] (HUDI-2871) Decouple metrics dependencies from hudi-client-common

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2871:
-
Sprint: Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, 
Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, 2022/05/16, 2022/05/31, 2022/09/19  
(was: Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, 
Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, 2022/05/16, 2022/05/31, 2022/08/08)

> Decouple metrics dependencies from hudi-client-common
> -
>
> Key: HUDI-2871
> URL: https://issues.apache.org/jira/browse/HUDI-2871
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality, dependencies, metrics, writer-core
>Reporter: Vinoth Chandar
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Several metrics dependencies (CloudWatch, Graphite, Prometheus, etc.) are all 
> pulled in. 
> It might be good to break these out into their own modules and include them 
> during packaging. This needs some form of reflection-based instantiation of 
> the metrics reporter.
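
A minimal sketch of the reflection-based instantiation mentioned above, assuming a hypothetical `MetricsReporter` interface and a hypothetical `ConsoleReporter` implementation (neither is Hudi's actual API). The idea is that the core module only knows the interface and a class name taken from configuration, so the concrete reporter can live in a separate module that is added at packaging time.

```java
public class ReporterLoader {
    // Hypothetical reporter contract; the real interface may look different.
    public interface MetricsReporter {
        String name();
    }

    // Hypothetical implementation that would live in its own module.
    public static class ConsoleReporter implements MetricsReporter {
        @Override
        public String name() {
            return "console";
        }
    }

    // Loads a reporter by class name via its public no-arg constructor.
    // Fails with IllegalArgumentException if the class is missing, cannot be
    // constructed, or does not implement MetricsReporter.
    public static MetricsReporter load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (MetricsReporter) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate metrics reporter: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name would normally come from configuration.
        MetricsReporter reporter = load(ConsoleReporter.class.getName());
        System.out.println(reporter.name());
    }
}
```

Because the lookup happens by name at runtime, the core module needs no compile-time dependency on any specific reporter, at the cost of surfacing a missing dependency only when the reporter is first loaded.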




[jira] [Updated] (HUDI-3155) java.lang.NoSuchFieldError for logical timestamp types when run hive sync tool

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3155:
-
Sprint: 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/16, 2022/05/31, 
2022/08/08)

> java.lang.NoSuchFieldError for logical timestamp types when run hive sync tool
> --
>
> Key: HUDI-3155
> URL: https://issues.apache.org/jira/browse/HUDI-3155
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive, meta-sync
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 0.12.0
>
>
> https://github.com/apache/hudi/issues/4176
> Looks like parquet-column is not part of the bundle




[jira] [Updated] (HUDI-3649) Add HoodieTableConfig defaults to HoodieWriteConfig

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3649:
-
Sprint: 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/16, 2022/05/31, 
2022/08/08)

> Add HoodieTableConfig defaults to HoodieWriteConfig
> ---
>
> Key: HUDI-3649
> URL: https://issues.apache.org/jira/browse/HUDI-3649
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: configs
>Reporter: Ethan Guo
>Assignee: Sagar Sumit
>Priority: Major
> Fix For: 0.12.0
>
>
> HoodieWriteConfig does not set defaults from HoodieTableConfig.  We need to 
> see if some HoodieTableConfig defaults should be set in HoodieWriteConfig.




[jira] [Updated] (HUDI-1101) Decouple Hive dependencies from hudi-spark

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1101:
-
Sprint: 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/16, 2022/05/31, 
2022/08/08)

> Decouple Hive dependencies from hudi-spark
> --
>
> Key: HUDI-1101
> URL: https://issues.apache.org/jira/browse/HUDI-1101
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Yanjia Gary Li
>Priority: Major
> Fix For: 0.12.0
>
>
> We have the syncHive tool in both the hudi-spark and hudi-utilities modules. This 
> might cause dependency conflicts when the user doesn't use Hive at all. We could 
> move all the Hive sync related methods to the hudi-hive-sync module.




[jira] [Updated] (HUDI-3952) Presto HadoopExtendedFileSystem now supports getScheme. Update presto version to 0.273 in docker setup and IT test

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3952:
-
Sprint: 2022/05/02, 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/02, 
2022/05/16, 2022/05/31, 2022/08/08)

> Presto HadoopExtendedFileSystem now supports getScheme. Update presto version 
> to 0.273 in docker setup and IT test
> --
>
> Key: HUDI-3952
> URL: https://issues.apache.org/jira/browse/HUDI-3952
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.13.0
>
>
> When switching DEFAULT_METADATA_ENABLE_FOR_READERS to true and setting 
> "hoodie.metadata.enable.full.scan.log.files" to false, running presto queries 
> with hudi-presto-bundle on HDFS in docker demo throws 
> UnsupportedOperationException during HFile log merging, because 
> HadoopExtendedFileSystem does not implement getScheme().
> {code:java}
> 2022-04-23T07:26:13.085Z INFO hive-hive-0 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Reading a data block from file hdfs://namenode:8020/user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-_00.log.1_0-10-10 at instant 20220423072304019
> 2022-04-23T07:26:13.086Z INFO hive-hive-0 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Merging the final data blocks
> 2022-04-23T07:26:13.086Z INFO hive-hive-0 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Number of remaining logblocks to merge 3
> 2022-04-23T07:26:13.185Z INFO hive-hive-0 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Number of remaining logblocks to merge 2
> 2022-04-23T07:26:13.190Z ERROR hive-hive-0 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Got exception when reading log file
> java.lang.UnsupportedOperationException: Not implemented by the HadoopExtendedFileSystem FileSystem implementation
>   at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:219)
>   at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
>   at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordIterator(HoodieDataBlock.java:168)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.getRecordsIterator(AbstractHoodieLogRecordReader.java:488)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:378)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:466)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:342)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:195)
>   at org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.getRecordsByKeys(HoodieMetadataMergedLogRecordReader.java:124)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:257)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:213)
>   at java.util.HashMap.forEach(HashMap.java:1289)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:200)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:140)
>   at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:312)
>   at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:135)
>   at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
>   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:304)
>   at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
>   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFiles(AbstractTableFileSystemView.java:478)
>   at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:189)
>   at com.facebook.presto.hive.util.HiveFileIterator.lambda$getLocatedFileStatusRemoteIterator$0(HiveFileIterator.java:103)
>   at com.google.common.collect.Iterators$5.computeNext(Iterators.java:639)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
>   at
>  
>

[jira] [Updated] (HUDI-4142) RFC for new Table APIs proposal

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4142:
-
Sprint: 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/16, 2022/05/31, 
2022/08/08)

> RFC for new Table APIs proposal
> ---
>
> Key: HUDI-4142
> URL: https://issues.apache.org/jira/browse/HUDI-4142
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>
> Document all APIs.




[jira] [Updated] (HUDI-4035) Improve point lookup in Metadata Table

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4035:
-
Sprint: 2022/05/02, 2022/05/16, 2022/05/31, 2022/09/19  (was: 2022/05/02, 
2022/05/16, 2022/05/31, 2022/08/08)

> Improve point lookup in Metadata Table
> --
>
> Key: HUDI-4035
> URL: https://issues.apache.org/jira/browse/HUDI-4035
> Project: Apache Hudi
>  Issue Type: Task
>  Components: metadata
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Concurrent lookup, etc.




[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-07 Thread GitBox



hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1207633040

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 8bd34a6bee3084bdc6029f3c0740cf06906acfd5 UNKNOWN
   * 96701a705ec8c271438ff696c1451364b9b4398d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10660)
 
   * a80d4bdd93c349b09b6e640dd2229379f2173ff0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10661)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Updated] (HUDI-2955) Upgrade Hadoop to 3.3.x

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2955:
-
Sprint: Hudi-Sprint-Feb-14, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, 
Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 
2022/05/02, 2022/05/16, 2022/05/31, 2022/08/22  (was: Hudi-Sprint-Feb-14, 
Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, 
Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/05/02, 2022/05/16, 2022/05/31, 
2022/08/08)

> Upgrade Hadoop to 3.3.x
> ---
>
> Key: HUDI-2955
> URL: https://issues.apache.org/jira/browse/HUDI-2955
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Alexey Kudinkin
>Assignee: Rahil Chertara
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: Screen Shot 2021-12-07 at 2.32.51 PM.png
>
>
> According to Hadoop compatibility matrix, this is a pre-requisite to 
> upgrading to JDK11:
> !Screen Shot 2021-12-07 at 2.32.51 PM.png|width=938,height=230!
> [https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions]
>  
> *Upgrading Hadoop from 2.x to 3.x*
> [https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts]
> Everything (relevant to us) seems to be in good shape, except Spark 2.2/.3




[jira] [Updated] (HUDI-4563) Docs writing for 0.12.0

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4563:
-
Fix Version/s: 0.12.0

> Docs writing for 0.12.0
> ---
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.12.0
>
>





[jira] [Updated] (HUDI-4563) Docs writing for 0.12.0

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4563:
-
Story Points: 3

> Docs writing for 0.12.0
> ---
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Major
>





[jira] [Updated] (HUDI-4563) Docs writing for 0.12.0

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4563:
-
Reviewers: Sagar Sumit

> Docs writing for 0.12.0
> ---
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Priority: Major
>





[jira] [Assigned] (HUDI-4563) Docs writing for 0.12.0

2022-08-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4563:


Assignee: (was: Sagar Sumit)

> Docs writing for 0.12.0
> ---
>
> Key: HUDI-4563
> URL: https://issues.apache.org/jira/browse/HUDI-4563
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Raymond Xu
>Priority: Major
>





[jira] [Created] (HUDI-4563) Docs writing for 0.12.0

2022-08-07 Thread Raymond Xu (Jira)

Raymond Xu created HUDI-4563:


 Summary: Docs writing for 0.12.0
 Key: HUDI-4563
 URL: https://issues.apache.org/jira/browse/HUDI-4563
 Project: Apache Hudi
  Issue Type: Task
Reporter: Raymond Xu






