[jira] [Updated] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3107:
-
Component/s: Hive Integration

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>   at 
> 
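The ParseException above shows the root cause: the generated `DROP PARTITION` statement passes the raw storage partition path `20210623/0/20210623` where Hive expects a `key='value'` partition spec. A minimal sketch of the translation the fix needs, assuming hypothetical field names and a hypothetical helper (this is not Hudi's actual API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch: turn a slash-separated partition path into a Hive partition spec
// so that ALTER TABLE ... DROP PARTITION parses. Field names are illustrative.
public class DropPartitionSpecSketch {
    static String toPartitionClause(List<String> fieldNames, String partitionPath) {
        String[] values = partitionPath.split("/");
        // Pair each partition field with its value: date='20210623', hh='0', ...
        return IntStream.range(0, fieldNames.size())
                .mapToObj(i -> fieldNames.get(i) + "='" + values[i] + "'")
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("date", "hh", "dt");
        String clause = toPartitionClause(fields, "20210623/0/20210623");
        System.out.println("ALTER TABLE `forecast_agg` DROP PARTITION (" + clause + ")");
    }
}
```

With the hypothetical fields above, this emits a statement Hive's parser accepts, instead of the bare path that triggered the `cannot recognize input near '20210623' '/' '0'` error.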

[jira] [Assigned] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3107:


Assignee: Yue Zhang

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Updated] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3107:
-
Fix Version/s: 0.11.0
   0.10.1

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>

[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002442373


   > @dongkelun @xushiyan I offer another solution to discuss.
   > 
   > Querying incrementally in Hive requires setting 
`hoodie.%s.consume.start.timestamp`, which is used in 
`HoodieHiveUtils.readStartCommitTime`. Currently, we pass the 
`hoodie.table.name` value (named `tableName`) to this function. We can add the 
configs `hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and 
`hoodie.database.name` in `HoodieTableConfig`. If a `database.name` is provided, 
we join the `database.name` and `table.name` and pass the result to 
`readStartCommitTime`. Users can then set 
`hoodie.dbName.tableName.consume.start.timestamp` in Hive and query.
   > 
   > Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` 
can be reused in other scenarios.
   > 
   > @xushiyan what do you think?
   
   @xushiyan @YannByron I think I understand the solution. 
   
   SQL persists the database name to `hoodie.properties` by default, while the 
DataFrame writer persists it selectively through an optional database 
parameter. Then, in an incremental query, if `databaseName.tableName` is set, 
we match on `databaseName.tableName`; if it is inconsistent, or there is no 
databaseName, the incremental query is not performed. If it is consistent, the 
incremental query proceeds. If the incremental query does not set a database 
name, we match only the table name, not the database name.
   
   So, which parameter should the DataFrame writer use to persist the database name?
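The naming scheme discussed above can be sketched as follows. The helper name and the exact property template are illustrative assumptions, not Hudi's actual API:

```java
// Sketch: build the Hive incremental-consume property key from an optional
// database name plus the table name, as proposed in the discussion above.
public class ConsumeStartKeySketch {
    static String consumeStartKey(String databaseName, String tableName) {
        // Qualify the table with its database when one is provided, so two
        // tables with the same name in different databases get distinct keys.
        String qualified = (databaseName == null || databaseName.isEmpty())
                ? tableName
                : databaseName + "." + tableName;
        return String.format("hoodie.%s.consume.start.timestamp", qualified);
    }

    public static void main(String[] args) {
        System.out.println(consumeStartKey("warehouse", "forecast_agg"));
        System.out.println(consumeStartKey(null, "forecast_agg"));
    }
}
```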


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] RocMarshal commented on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-12-28 Thread GitBox


RocMarshal commented on pull request #3813:
URL: https://github.com/apache/hudi/pull/3813#issuecomment-1002441676


   > I'm not in favor of fat Enum either. But would like to understand the main 
benefit of this change: is it meant for portability of these logic? @RocMarshal
   
   Thanks @xushiyan for the review. After considering your opinion, I don't 
think this refactoring is very significant; perhaps we should close it. If new 
strategies are introduced in the future, this approach may not scale well.
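The "fat enum" trade-off under discussion can be sketched as below. The enum constants and thresholds are illustrative, not Hudi's actual `CompactionTriggerStrategy` API:

```java
// Sketch: keep the trigger decision logic outside the enum, so the enum stays
// a plain set of constants and adding a new strategy stays cheap.
public class TriggerStrategySketch {
    enum CompactionTriggerStrategy { NUM_COMMITS, TIME_ELAPSED }

    // Hypothetical thresholds for illustration only.
    static boolean shouldCompact(CompactionTriggerStrategy s, int deltaCommits, long elapsedSecs) {
        switch (s) {
            case NUM_COMMITS:  return deltaCommits >= 5;
            case TIME_ELAPSED: return elapsedSecs >= 3600;
            default:           return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldCompact(CompactionTriggerStrategy.NUM_COMMITS, 6, 0));
        System.out.println(shouldCompact(CompactionTriggerStrategy.TIME_ELAPSED, 0, 10));
    }
}
```

The alternative being questioned — attaching the evaluation logic to each enum constant — couples every new strategy to the enum itself, which is the scalability concern raised in the review.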
   






[jira] [Closed] (HUDI-3093) spark-sql query error when use TimestampBasedKeyGenerator

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3093.

 Reviewers: Raymond Xu
Resolution: Fixed

> spark-sql query error when use TimestampBasedKeyGenerator
> -
>
> Key: HUDI-3093
> URL: https://issues.apache.org/jira/browse/HUDI-3093
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> https://github.com/apache/hudi/issues/4200



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3093) spark-sql query error when use TimestampBasedKeyGenerator

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3093:
-
Reporter: Raymond Xu  (was: Yann Byron)

> spark-sql query error when use TimestampBasedKeyGenerator
> -
>
> Key: HUDI-3093
> URL: https://issues.apache.org/jira/browse/HUDI-3093
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> https://github.com/apache/hudi/issues/4200





[jira] [Updated] (HUDI-3093) spark-sql query error when use TimestampBasedKeyGenerator

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3093:
-
Priority: Critical  (was: Major)

> spark-sql query error when use TimestampBasedKeyGenerator
> -
>
> Key: HUDI-3093
> URL: https://issues.apache.org/jira/browse/HUDI-3093
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> https://github.com/apache/hudi/issues/4200





[jira] [Updated] (HUDI-2990) Sync to HMS when deleting partitions

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2990:
-
Component/s: Hive Integration

> Sync to HMS when deleting partitions
> 
>
> Key: HUDI-2990
> URL: https://issues.apache.org/jira/browse/HUDI-2990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.11.0, 0.10.1
>
>






[jira] [Updated] (HUDI-2915) Fix field not found in record error for spark-sql

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2915:
-
Priority: Critical  (was: Major)

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!





[jira] [Updated] (HUDI-2915) Fix field not found in record error for spark-sql

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2915:
-
Reporter: Raymond Xu  (was: Forward Xu)

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!





[jira] [Updated] (HUDI-2837) The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2837:
-
Fix Version/s: 0.11.0
Reviewers: Raymond Xu

> The original hoodie.table.name should be maintained in Spark SQL
> 
>
> Key: HUDI-2837
> URL: https://issues.apache.org/jira/browse/HUDI-2837
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> When querying Hudi incrementally in Hive, we set the start query time for the 
> table. This setting applies to all tables with the same name, not only to the 
> table in the current database. In practice, it cannot be guaranteed that 
> table names differ across databases, so this could be handled by setting 
> hoodie.table.name to database name + table name. However, at present the 
> original value of hoodie.table.name is not kept consistent in Spark 
> SQL.





[GitHub] [hudi] hudi-bot commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002428182


   
   ## CI report:
   
   * 2e5ad082fa641bd060c7b8b25a23ef042c240460 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002409677


   
   ## CI report:
   
   * 0c1a86fb69261aef8f6bd7f017a04ce087b2fc98 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4780)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4779)
 
   * 2e5ad082fa641bd060c7b8b25a23ef042c240460 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Closed] (HUDI-2986) Deltastreamer continuous mode run into Too many open files exception

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2986.

Fix Version/s: (was: 0.11.0)
   (was: 0.10.1)
   Resolution: Won't Fix

not seeing the issue in 0.10.0

> Deltastreamer continuous mode run into Too many open files exception
> 
>
> Key: HUDI-2986
> URL: https://issues.apache.org/jira/browse/HUDI-2986
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Writer Core
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: core-flow-ds, sev:critical
>
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 6 in stage 35202.0 failed 4 times, most recent failure: Lost task 6.3 in 
> stage 35202.0 (TID 1172485, ip-10-211-53-165.infra.usw2.zdsys.com, executor 
> 1): java.io.FileNotFoundException: 
> /mnt/yarn/usercache/hadoop/appcache/application_1638666447607_0001/blockmgr-3725bb05-2c9a-4073-80f6-4eaa335321c9/34/temp_shuffle_8f675a83-21ac-4908-b8da-1c8e25a59b8e
>  (Too many open files)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:106)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:119)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:251)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:157)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:95)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2136)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2124)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2123)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2123)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:994)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:994)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:994)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2384)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2333)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2322)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:805)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2097)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2194)
>   at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1143)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
>   at org.apache.spark.rdd.RDD.fold(RDD.scala:1137)
>   at 
> org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply$mcD$sp(DoubleRDDFunctions.scala:35)
>   at 
> org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
>   at 
> org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
>  

[jira] [Closed] (HUDI-2989) Hive sync to Glue tables not updating S3 location

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2989.

Fix Version/s: (was: 0.11.0)
   (was: 0.10.1)
   Resolution: Won't Fix

> Hive sync to Glue tables not updating S3 location
> -
>
> Key: HUDI-2989
> URL: https://issues.apache.org/jira/browse/HUDI-2989
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-2987) event time not recorded in commit metadata when insert or bulk insert

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2987:
-
Priority: Critical  (was: Blocker)

> event time not recorded in commit metadata when insert or bulk insert
> -
>
> Key: HUDI-2987
> URL: https://issues.apache.org/jira/browse/HUDI-2987
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available, sev:high
> Fix For: 0.11.0, 0.10.1
>
>






[jira] [Updated] (HUDI-2989) Hive sync to Glue tables not updating S3 location

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2989:
-
Status: Resolved  (was: Patch Available)

> Hive sync to Glue tables not updating S3 location
> -
>
> Key: HUDI-2989
> URL: https://issues.apache.org/jira/browse/HUDI-2989
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Reopened] (HUDI-2989) Hive sync to Glue tables not updating S3 location

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reopened HUDI-2989:
--

> Hive sync to Glue tables not updating S3 location
> -
>
> Key: HUDI-2989
> URL: https://issues.apache.org/jira/browse/HUDI-2989
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2989) Hive sync to Glue tables not updating S3 location

2021-12-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2989:
-
Priority: Critical  (was: Blocker)

> Hive sync to Glue tables not updating S3 location
> -
>
> Key: HUDI-2989
> URL: https://issues.apache.org/jira/browse/HUDI-2989
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4467:
URL: https://github.com/apache/hudi/pull/4467#issuecomment-1002402985


   
   ## CI report:
   
   * 200dc06debc347edec4496d5b09fb7942cdec1a3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4795)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4467:
URL: https://github.com/apache/hudi/pull/4467#issuecomment-1002422707


   
   ## CI report:
   
   * 200dc06debc347edec4496d5b09fb7942cdec1a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4795)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bhasudha commented on issue #2529: [SUPPORT] - Hudi Jar update in EMR

2021-12-28 Thread GitBox


bhasudha commented on issue #2529:
URL: https://github.com/apache/hudi/issues/2529#issuecomment-1002415806


   Should be in FAQ already - 
https://hudi.apache.org/learn/faq/#how-to-override-hudi-jars-in-emr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bhasudha closed issue #2529: [SUPPORT] - Hudi Jar update in EMR

2021-12-28 Thread GitBox


bhasudha closed issue #2529:
URL: https://github.com/apache/hudi/issues/2529


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#issuecomment-1002415715


   
   ## CI report:
   
   * 3904a789ff694a3b4ef0bc015e73f840e150a797 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4793)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#issuecomment-1002399306


   
   ## CI report:
   
   * 0ec7317ac54ffcfe925206deeb0f4866dff1f298 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4776)
 
   * 3904a789ff694a3b4ef0bc015e73f840e150a797 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4793)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4459: [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job.

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4459:
URL: https://github.com/apache/hudi/pull/4459#issuecomment-1002412658


   
   ## CI report:
   
   * d9182c1661e37f29622caafd9eaa23de73b26331 UNKNOWN
   * 270eee7ef88fc59339675b1443b8918e63015fed Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4773)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4770)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4794)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4459: [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job.

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4459:
URL: https://github.com/apache/hudi/pull/4459#issuecomment-1002401796


   
   ## CI report:
   
   * d9182c1661e37f29622caafd9eaa23de73b26331 UNKNOWN
   * 270eee7ef88fc59339675b1443b8918e63015fed Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4773)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4770)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4794)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002409677


   
   ## CI report:
   
   * 0c1a86fb69261aef8f6bd7f017a04ce087b2fc98 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4780)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4779)
 
   * 2e5ad082fa641bd060c7b8b25a23ef042c240460 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4796)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002409018


   
   ## CI report:
   
   * 0c1a86fb69261aef8f6bd7f017a04ce087b2fc98 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4780)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4779)
 
   * 2e5ad082fa641bd060c7b8b25a23ef042c240460 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002409018


   
   ## CI report:
   
   * 0c1a86fb69261aef8f6bd7f017a04ce087b2fc98 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4780)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4779)
 
   * 2e5ad082fa641bd060c7b8b25a23ef042c240460 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002165902


   
   ## CI report:
   
   * 0c1a86fb69261aef8f6bd7f017a04ce087b2fc98 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4780)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4779)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] cdmikechen commented on a change in pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


cdmikechen commented on a change in pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#discussion_r776160563



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/avro/HudiAvroParquetReader.java
##
@@ -0,0 +1,72 @@
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.avro;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils;
+import org.apache.parquet.avro.AvroReadSupport;
+import org.apache.parquet.filter2.compat.FilterCompat;
+import org.apache.parquet.hadoop.ParquetRecordReader;
+
+import java.io.IOException;
+
+public class HudiAvroParquetReader extends RecordReader<Void, ArrayWritable> {
+
+  private final ParquetRecordReader<GenericRecord> parquetRecordReader;
+
+  public HudiAvroParquetReader(FilterCompat.Filter filter) {
+    parquetRecordReader = new ParquetRecordReader<>(new AvroReadSupport<>(), filter);
+  }
+
+  @Override
+  public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
+    parquetRecordReader.initialize(split, context);
+  }
+
+  @Override
+  public boolean nextKeyValue() throws IOException, InterruptedException {
+    return parquetRecordReader.nextKeyValue();
+  }
+
+  @Override
+  public Void getCurrentKey() throws IOException, InterruptedException {
+    return parquetRecordReader.getCurrentKey();
+  }
+
+  @Override
+  public ArrayWritable getCurrentValue() throws IOException, InterruptedException {
+    GenericRecord record = parquetRecordReader.getCurrentValue();
+    return (ArrayWritable) HoodieRealtimeRecordReaderUtils.avroToArrayWritable(record, record.getSchema());

Review comment:
   @vinothchandar 
   I have been running this fork for several months, and so far it has not caused many additional problems. That may simply be because my Hive workload processes a small amount of data, so any memory impact has been limited.
   
   Currently, both the Avro parsing done by Hudi Spark in `org.apache.hudi.AvroConversionHelper` and Hive's own `org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable` wrap an Avro GenericRecord, so I would not expect this path to add serious processing overhead.
   
   Meanwhile, in the parts that instantiate `TimestampWritableV2`, I restructured some code to fix several of the original errors and problems.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [hudi] hudi-bot commented on pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4467:
URL: https://github.com/apache/hudi/pull/4467#issuecomment-1002402985


   
   ## CI report:
   
   * 200dc06debc347edec4496d5b09fb7942cdec1a3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4795)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4467:
URL: https://github.com/apache/hudi/pull/4467#issuecomment-1002402429


   
   ## CI report:
   
   * 200dc06debc347edec4496d5b09fb7942cdec1a3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4467:
URL: https://github.com/apache/hudi/pull/4467#issuecomment-1002402429


   
   ## CI report:
   
   * 200dc06debc347edec4496d5b09fb7942cdec1a3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3124) Bootstrap when timeline have completed instant

2021-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3124:
-
Labels: pull-request-available  (was: )

> Bootstrap when timeline have completed instant
> --
>
> Key: HUDI-3124
> URL: https://issues.apache.org/jira/browse/HUDI-3124
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4459: [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job.

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4459:
URL: https://github.com/apache/hudi/pull/4459#issuecomment-1002401796


   
   ## CI report:
   
   * d9182c1661e37f29622caafd9eaa23de73b26331 UNKNOWN
   * 270eee7ef88fc59339675b1443b8918e63015fed Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4773)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4770)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4794)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4459: [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job.

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4459:
URL: https://github.com/apache/hudi/pull/4459#issuecomment-1002058975


   
   ## CI report:
   
   * d9182c1661e37f29622caafd9eaa23de73b26331 UNKNOWN
   * 270eee7ef88fc59339675b1443b8918e63015fed Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4773)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4770)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on pull request #4459: [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job.

2021-12-28 Thread GitBox


zhangyue19921010 commented on pull request #4459:
URL: https://github.com/apache/hudi/pull/4459#issuecomment-1002401764


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #4459: [HUDI-3116]Add a new HoodieDropPartitionsTool to let users drop table partitions through a standalone job.

2021-12-28 Thread GitBox


zhangyue19921010 removed a comment on pull request #4459:
URL: https://github.com/apache/hudi/pull/4459#issuecomment-1001991875


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yuzhaojing opened a new pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-28 Thread GitBox


yuzhaojing opened a new pull request #4467:
URL: https://github.com/apache/hudi/pull/4467


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


hudi-bot commented on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002400317


   
   ## CI report:
   
   * ecb72b89015831cfbfa99ebcb027f660729b3195 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002388626


   
   ## CI report:
   
   * e19068fd9ef591062e9ae920f3e2fe74f1eabfe3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1549)
 
   * ecb72b89015831cfbfa99ebcb027f660729b3195 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4792)
 
   
   




[jira] [Created] (HUDI-3124) Bootstrap when timeline have completed instant

2021-12-28 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3124:


 Summary: Bootstrap when timeline have completed instant
 Key: HUDI-3124
 URL: https://issues.apache.org/jira/browse/HUDI-3124
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#issuecomment-1002399306


   
   ## CI report:
   
   * 0ec7317ac54ffcfe925206deeb0f4866dff1f298 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4776)
 
   * 3904a789ff694a3b4ef0bc015e73f840e150a797 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4793)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#issuecomment-1002398601


   
   ## CI report:
   
   * 0ec7317ac54ffcfe925206deeb0f4866dff1f298 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4776)
 
   * 3904a789ff694a3b4ef0bc015e73f840e150a797 UNKNOWN
   
   




[GitHub] [hudi] cdmikechen commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


cdmikechen commented on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002398757


   > I have a concern around performance overhead and also wondering if we can 
just do it as a part of the existing inputformat with a flag, instead of 
switching over entirely to a new ipf? thoughts?
   
   This is for compatibility with `com.twitter:parquet-hadoop-bundle`, which 
is used for `ParquetInputFormat` in Spark 2. That bundle only provides a 
parameterless constructor, while Hive 2 and Hive 3 add a constructor that 
takes a `ParquetInputFormat`:
   
   
https://github.com/apache/hive/blob/8e7f23f34b2ce7328c9d571a13c336f0c8cdecb6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java#L48-L55
   ```java
   public MapredParquetInputFormat() {
     this(new ParquetInputFormat<ArrayWritable>(DataWritableReadSupport.class));
   }

   protected MapredParquetInputFormat(final ParquetInputFormat<ArrayWritable> inputFormat) {
     this.realInput = inputFormat;
     vectorizedSelf = new VectorizedParquetInputFormat();
   }
   ```
   Otherwise, we could consider refactoring directly into:
   ```java
   public HoodieParquetInputFormat() {
     super(new HudiAvroParquetInputFormat());
   }
   ```
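
   The constructor-delegation pattern under discussion can be sketched in
isolation. The classes below are illustrative stand-ins, not the real
Hive/Hudi types: a no-arg constructor (required where the format is created
reflectively) delegates to a protected constructor that lets a subclass swap
in its own underlying input format.

   ```java
   // Stand-in for a read-support class and its custom variant.
   class BaseReadSupport {}
   class CustomReadSupport extends BaseReadSupport {}

   // Stand-in for the wrapped low-level input format.
   class BaseInputFormat {
       private final BaseReadSupport support;
       BaseInputFormat(BaseReadSupport support) { this.support = support; }
       String supportName() { return support.getClass().getSimpleName(); }
   }

   class WrappedInputFormat {
       private final BaseInputFormat realInput;

       // Parameterless constructor, kept for callers that instantiate reflectively.
       public WrappedInputFormat() {
           this(new BaseInputFormat(new BaseReadSupport()));
       }

       // Protected delegation constructor: subclasses choose the real input format.
       protected WrappedInputFormat(BaseInputFormat inputFormat) {
           this.realInput = inputFormat;
       }

       public String realInputSupport() { return realInput.supportName(); }
   }

   // Subclass swaps in a custom read support without touching the no-arg path.
   class CustomInputFormat extends WrappedInputFormat {
       public CustomInputFormat() {
           super(new BaseInputFormat(new CustomReadSupport()));
       }
   }

   public class Main {
       public static void main(String[] args) {
           System.out.println(new WrappedInputFormat().realInputSupport()); // BaseReadSupport
           System.out.println(new CustomInputFormat().realInputSupport());  // CustomReadSupport
       }
   }
   ```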


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#issuecomment-1002101189


   
   ## CI report:
   
   * 0ec7317ac54ffcfe925206deeb0f4866dff1f298 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4776)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#issuecomment-1002398601


   
   ## CI report:
   
   * 0ec7317ac54ffcfe925206deeb0f4866dff1f298 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4776)
 
   * 3904a789ff694a3b4ef0bc015e73f840e150a797 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002391692


   
   ## CI report:
   
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4791)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002379085


   
   ## CI report:
   
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4791)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


hudi-bot commented on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002388626


   
   ## CI report:
   
   * e19068fd9ef591062e9ae920f3e2fe74f1eabfe3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1549)
 
   * ecb72b89015831cfbfa99ebcb027f660729b3195 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4792)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002387959


   
   ## CI report:
   
   * e19068fd9ef591062e9ae920f3e2fe74f1eabfe3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1549)
 
   * ecb72b89015831cfbfa99ebcb027f660729b3195 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


hudi-bot commented on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002387959


   
   ## CI report:
   
   * e19068fd9ef591062e9ae920f3e2fe74f1eabfe3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1549)
 
   * ecb72b89015831cfbfa99ebcb027f660729b3195 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-961588069


   
   ## CI report:
   
   * e19068fd9ef591062e9ae920f3e2fe74f1eabfe3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1549)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1002371458


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 735b3d908f02bc2404192f74987afc82151fa837 UNKNOWN
   * 48eab85f19170f1ebfc8dbf86d7a66bf089604e1 UNKNOWN
   * 3402524a8b685565ceff5fdbd9d592f0228740c4 UNKNOWN
   * fbebef9773e5a513dd03f993b76bbf2b908c5f33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4759)
 
   * cbf3703dbf1ab502bb61ba6800118f9069382b0a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4790)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1002385852


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 735b3d908f02bc2404192f74987afc82151fa837 UNKNOWN
   * 48eab85f19170f1ebfc8dbf86d7a66bf089604e1 UNKNOWN
   * 3402524a8b685565ceff5fdbd9d592f0228740c4 UNKNOWN
   * cbf3703dbf1ab502bb61ba6800118f9069382b0a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4790)
 
   
   




[GitHub] [hudi] stym06 commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-28 Thread GitBox


stym06 commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-1002380917


   Yes, to access S3 data from the local environment.






[GitHub] [hudi] YannByron commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-28 Thread GitBox


YannByron commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1002380413


   @dongkelun @xushiyan 
   Let me offer another solution to discuss.
   
   Querying incrementally in Hive requires setting 
`hoodie.%s.consume.start.timestamp`, which is read in 
`HoodieHiveUtils.readStartCommitTime`. Currently, we pass the 
`hoodie.table.name` value (`tableName`) to this function.
   We can add the configs `hoodie.datasource.write.database.name` in 
`DataSourceWriteOptions` and `hoodie.database.name` in `HoodieTableConfig`. If 
a `database.name` is provided, we join the `database.name` and `table.name` 
and pass the result to `readStartCommitTime`. Users can then set 
`hoodie.dbName.tableName.consume.start.timestamp` in Hive and query.
   
   Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` 
could be reused in other scenarios.
   
   @xushiyan what do you think?
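
   The proposed key construction can be sketched as below. The property
pattern and helper are hypothetical, written only to illustrate the idea of
qualifying the table name with an optional database name; they are not
actual Hudi code.

   ```java
   public class Main {
       // Hypothetical pattern matching the hoodie.%s.consume.start.timestamp key.
       static final String CONSUME_START_TIMESTAMP_PATTERN =
           "hoodie.%s.consume.start.timestamp";

       // Qualify the table name with the database only when one is configured.
       static String consumeStartTimestampKey(String databaseName, String tableName) {
           String qualified = (databaseName == null || databaseName.isEmpty())
               ? tableName
               : databaseName + "." + tableName;
           return String.format(CONSUME_START_TIMESTAMP_PATTERN, qualified);
       }

       public static void main(String[] args) {
           // Without a database name, the key keeps its current shape.
           System.out.println(consumeStartTimestampKey(null, "forecast_agg"));
           // With one, the key becomes hoodie.<db>.<table>.consume.start.timestamp.
           System.out.println(consumeStartTimestampKey("db1", "forecast_agg"));
       }
   }
   ```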






[jira] [Updated] (HUDI-2590) Validate Diff key gen w/ and w/o glob path with and w/o metadata enabled

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2590:
--
Status: Open  (was: In Progress)

> Validate Diff key gen w/ and w/o glob path with and w/o metadata enabled
> 
>
> Key: HUDI-2590
> URL: https://issues.apache.org/jira/browse/HUDI-2590
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[jira] [Updated] (HUDI-2947) HoodieDeltaStreamer/DeltaSync can improperly pick up the checkpoint config from CLI in continuous mode

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2947:
--
Labels: sev:high  (was: )

> HoodieDeltaStreamer/DeltaSync can improperly pick up the checkpoint config 
> from CLI in continuous mode
> --
>
> Key: HUDI-2947
> URL: https://issues.apache.org/jira/browse/HUDI-2947
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: sev:high
> Fix For: 0.11.0
>
>
> *Problem:*
> When deltastreamer is started with a given checkpoint, e.g., `--checkpoint 
> 0`, in the continuous mode, the deltastreamer job may pick up the wrong 
> checkpoint later on.  The wrong checkpoint (for the 20211206203551080 
> commit) occurs after the replacecommit and clean: the checkpoint is reset to 
> "0" instead of "5" after 20211206202728233.commit.  More details below.
>  
> The bug is due to the check here: 
> [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L335]
> {code:java}
> if (cfg.checkpoint != null && 
> (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))  
>   || 
> !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY {
> resumeCheckpointStr = Option.of(cfg.checkpoint);
> } {code}
> In this case of resuming after a clustering commit, "cfg.checkpoint != null" 
> and 
> "StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))" 
>  are both true as "--checkpoint 0" is configured and last commit is 
> replacecommit without checkpoint keys.  This leads to the resume checkpoint 
> string being reset to the configured checkpoint, skipping the timeline 
> walk-back logic below, which is wrong.  
>  
> Timeline:
>  
> {code:java}
>  189069 Dec  6 12:19 20211206201238649.commit
>       0 Dec  6 12:12 20211206201238649.commit.requested
>       0 Dec  6 12:12 20211206201238649.inflight
>  189069 Dec  6 12:27 20211206201959151.commit
>       0 Dec  6 12:20 20211206201959151.commit.requested
>       0 Dec  6 12:20 20211206201959151.inflight
>  189069 Dec  6 12:34 20211206202728233.commit
>       0 Dec  6 12:27 20211206202728233.commit.requested
>       0 Dec  6 12:27 20211206202728233.inflight
>   36662 Dec  6 12:35 20211206203449899.replacecommit
>       0 Dec  6 12:35 20211206203449899.replacecommit.inflight
>   34656 Dec  6 12:35 20211206203449899.replacecommit.requested
>   28013 Dec  6 12:35 20211206203503574.clean
>   19024 Dec  6 12:35 20211206203503574.clean.inflight
>   19024 Dec  6 12:35 20211206203503574.clean.requested
>  189069 Dec  6 12:43 20211206203551080.commit
>       0 Dec  6 12:35 20211206203551080.commit.requested
>       0 Dec  6 12:35 20211206203551080.inflight
>  189069 Dec  6 12:50 20211206204311612.commit
>       0 Dec  6 12:43 20211206204311612.commit.requested
>       0 Dec  6 12:43 20211206204311612.inflight
>       0 Dec  6 12:50 20211206205044595.commit.requested
>       0 Dec  6 12:50 20211206205044595.inflight
>     128 Dec  6 12:56 archived
>     483 Dec  6 11:52 hoodie.properties
>  {code}
>  
> Checkpoints in commits:
>  
> {code:java}
> grep "deltastreamer.checkpoint.key" *
> 20211206201238649.commit:    "deltastreamer.checkpoint.key" : "2"
> 20211206201959151.commit:    "deltastreamer.checkpoint.key" : "3"
> 20211206202728233.commit:    "deltastreamer.checkpoint.key" : "4"
> 20211206203551080.commit:    "deltastreamer.checkpoint.key" : "1"
> 20211206204311612.commit:    "deltastreamer.checkpoint.key" : "2" {code}
>  
> *Steps to reproduce:*
> Run HoodieDeltaStreamer in the continuous mode, by providing both 
> "--checkpoint 0" and "--continuous", with inline clustering and sync clean 
> enabled (some configs are masked).
>  
> {code:java}
> spark-submit \
>   --master yarn \
>   --driver-memory 8g --executor-memory 8g --num-executors 3 --executor-cores 
> 4 \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>   --conf 
> spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain
>  \
>   --conf spark.speculation=true \
>   --conf spark.speculation.multiplier=1.0 \
>   --conf spark.speculation.quantile=0.5 \
>   --packages org.apache.spark:spark-avro_2.12:3.2.0 \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
>   file:/home/hadoop/ethan/hudi-utilities-bundle_2.12-0.10.0-rc3.jar \
>   --props file:/home/hadoop/ethan/test.properties \
>   --source-class ... \
>   --source-ordering-field ts \
>   --target-base-path s3a://hudi-testing/test_hoodie_table_11/ \
>   --target-table test_table \
>   --table-type COPY_ON_WRITE \
>   --op BULK_INSERT 
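
The faulty guard quoted in the description can be exercised in isolation with
a simplified sketch. `usesCliCheckpoint` and `isNullOrEmpty` below are
illustrative stand-ins for the DeltaSync condition and
`StringUtils.isNullOrEmpty`; `lastResetKey` plays the role of
`CHECKPOINT_RESET_KEY` from the last commit's metadata.

```java
public class Main {
    static boolean isNullOrEmpty(String s) { return s == null || s.isEmpty(); }

    // Returns true when the CLI-provided checkpoint overrides the resume logic,
    // mirroring the condition quoted from DeltaSync above.
    static boolean usesCliCheckpoint(String cliCheckpoint, String lastResetKey) {
        return cliCheckpoint != null
            && (isNullOrEmpty(lastResetKey) || !cliCheckpoint.equals(lastResetKey));
    }

    public static void main(String[] args) {
        // A replacecommit carries no checkpoint metadata, so the reset key is
        // null and the stale CLI value "0" wins over the walked-back "5".
        System.out.println(usesCliCheckpoint("0", null));  // true: the bug path
        // Once a commit records reset key "0", the CLI value is ignored.
        System.out.println(usesCliCheckpoint("0", "0"));   // false
    }
}
```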

[jira] [Updated] (HUDI-3066) Very slow file listing after enabling metadata for existing tables in 0.10.0 release

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3066:
--
Status: Open  (was: In Progress)

> Very slow file listing after enabling metadata for existing tables in 0.10.0 
> release
> 
>
> Key: HUDI-3066
> URL: https://issues.apache.org/jira/browse/HUDI-3066
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.10.0
> Environment: EMR 6.4.0
> Hudi version : 0.10.0
>Reporter: Harsha Teja Kanna
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: performance, pull-request-available
> Fix For: 0.11.0
>
> Attachments: Screen Shot 2021-12-18 at 6.16.29 PM.png, Screen Shot 
> 2021-12-20 at 10.05.50 PM.png, Screen Shot 2021-12-20 at 10.17.44 PM.png, 
> Screen Shot 2021-12-21 at 10.22.54 PM.png, Screen Shot 2021-12-21 at 10.24.12 
> PM.png, metadata_files.txt, metadata_files_compacted.txt, 
> metadata_timeline.txt, metadata_timeline_archived.txt, 
> metadata_timeline_compacted.txt, stderr_part1.txt, stderr_part2.txt, 
> timeline.txt, writer_log.txt
>
>
> After 'metadata table' is enabled, File listing takes long time.
> If metadata is enabled on Reader side(as shown below), it is taking even more 
> time per file listing task
> {code:java}
> import org.apache.hudi.DataSourceReadOptions
> import org.apache.hudi.common.config.HoodieMetadataConfig
> val hadoopConf = spark.conf
> hadoopConf.set(HoodieMetadataConfig.ENABLE.key(), "true")
> val basePath = "s3a://datalake-hudi"
> val sessions = spark
> .read
> .format("org.apache.hudi")
> .option(DataSourceReadOptions.QUERY_TYPE.key(), 
> DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
> .option(DataSourceReadOptions.READ_PATHS.key(), 
> s"${basePath}/sessions_by_entrydate/entrydate=2021/*/*/*")
> .load()
> sessions.createOrReplaceTempView("sessions") {code}
> Existing tables (COW) have inline clustering on and have many replace commits.
> Logs seem to suggest the delay is in view.AbstractTableFileSystemView 
> resetFileGroupsReplaced function or metadata.HoodieBackedTableMetadata
> Also many log messages in AbstractHoodieLogRecordReader
>  
> 2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms 
> to read  136 instants, 9731 replaced file groups
> 2021-12-18 23:37:46,086 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.76_0-20-515
>  at instant 20211217035105329
> 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,094 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663',
>  fileLen=0}
> 2021-12-18 23:37:46,095 INFO log.AbstractHoodieLogRecordReader: Scanning log 
> file 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613',
>  fileLen=0}
> 2021-12-18 23:37:46,095 INFO s3a.S3AInputStream: Switching to Random IO seek 
> policy
> 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.62_0-34-377
>  at instant 20211217022049877
> 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,105 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.86_0-20-362',
>  fileLen=0}
> 2021-12-18 23:37:46,109 INFO log.AbstractHoodieLogRecordReader: Scanning log 
> file 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663',
>  fileLen=0}
> 2021-12-18 23:37:46,109 INFO s3a.S3AInputStream: Switching to Random IO seek 
> policy
> 2021-12-18 23:37:46,110 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.77_0-35-590',
>  fileLen=0}
> 2021-12-18 23:37:46,112 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613
>  at instant 20211216183448389
> 2021-12-18 23:37:46,112 INFO 

[jira] [Updated] (HUDI-3057) Instants should be generated strictly under locks

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3057:
--
Labels: sev:high  (was: )

> Instants should be generated strictly under locks
> -
>
> Key: HUDI-3057
> URL: https://issues.apache.org/jira/browse/HUDI-3057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Alexey Kudinkin
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: sev:high
> Fix For: 0.11.0
>
> Attachments: logs.txt
>
>
> While looking into the flakiness of the tests outlined here:
> https://issues.apache.org/jira/browse/HUDI-3043
>  
> I've stumbled upon following failure where one of the writers tries to 
> complete the Commit but it couldn't b/c such file does already exist:
> {code:java}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.hudi.exception.HoodieIOException: Failed to create file 
> /var/folders/kb/cnff55vj041g2nnlzs5ylqk0gn/T/junit5142536255031969586/testtable_MERGE_ON_READ/.hoodie/20211217150157632.commit
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:336)
>     at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWriters(TestHoodieDeltaStreamerWithMultiWriter.java:150)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
>     at 
> org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)

[jira] [Updated] (HUDI-2947) HoodieDeltaStreamer/DeltaSync can improperly pick up the checkpoint config from CLI in continuous mode

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2947:
--
Priority: Critical  (was: Blocker)

> HoodieDeltaStreamer/DeltaSync can improperly pick up the checkpoint config 
> from CLI in continuous mode
> --
>
> Key: HUDI-2947
> URL: https://issues.apache.org/jira/browse/HUDI-2947
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: sivabalan narayanan
>Priority: Critical
> Fix For: 0.11.0
>
>
> *Problem:*
> When the deltastreamer is started with a given checkpoint, e.g., `--checkpoint 
> 0`, in continuous mode, the job may later pick up the wrong checkpoint. The 
> wrong checkpoint (for the 20211206203551080 commit) appears right after the 
> replacecommit and clean: it is reset to "0" instead of advancing to "5" after 
> 20211206202728233.commit. More details below.
>  
> The bug is due to the check here: 
> [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L335]
> {code:java}
> if (cfg.checkpoint != null
>     && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
>         || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
>   resumeCheckpointStr = Option.of(cfg.checkpoint);
> } {code}
> In this case of resuming after a clustering commit, "cfg.checkpoint != null" 
> and 
> "StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))" 
> are both true, since "--checkpoint 0" is configured and the last commit is a 
> replacecommit without checkpoint keys. As a result, the resume checkpoint 
> string is wrongly reset to the configured checkpoint, skipping the timeline 
> walk-back logic below.
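The intent of a tightened guard can be sketched as below. This is a hedged illustration, not Hudi's actual fix: `shouldApplyCliCheckpoint` is a hypothetical helper, and the real logic lives in DeltaSync. The point is that a missing CHECKPOINT_RESET_KEY on the latest commit (e.g. right after a replacecommit) should fall through to the timeline walk-back rather than re-applying the CLI checkpoint.

```java
// Hedged sketch; shouldApplyCliCheckpoint is a hypothetical helper, not Hudi API.
public class CheckpointGuard {
  // Apply the CLI --checkpoint only when the most recent recorded reset key
  // holds a *different* value. A missing reset key (e.g. the latest instant
  // is a replacecommit without checkpoint metadata) must NOT re-trigger the
  // reset; it should fall through to the timeline walk-back logic instead.
  static boolean shouldApplyCliCheckpoint(String cliCheckpoint, String lastResetKey) {
    if (cliCheckpoint == null) {
      return false; // no CLI checkpoint configured
    }
    return lastResetKey != null && !lastResetKey.isEmpty()
        && !cliCheckpoint.equals(lastResetKey);
  }

  public static void main(String[] args) {
    // Replacecommit without checkpoint keys: do NOT re-apply "--checkpoint 0".
    System.out.println(shouldApplyCliCheckpoint("0", null)); // false
    // Reset key already matches the CLI value: nothing to re-apply.
    System.out.println(shouldApplyCliCheckpoint("0", "0"));  // false
    // Operator passed a genuinely new checkpoint: apply it.
    System.out.println(shouldApplyCliCheckpoint("5", "0"));  // true
  }
}
```

With this shape, the buggy scenario above (CLI "--checkpoint 0", last commit a replacecommit with no reset key) returns false, so the walk-back logic runs and resumes from "5".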
>  
> Timeline:
>  
> {code:java}
>  189069 Dec  6 12:19 20211206201238649.commit
>       0 Dec  6 12:12 20211206201238649.commit.requested
>       0 Dec  6 12:12 20211206201238649.inflight
>  189069 Dec  6 12:27 20211206201959151.commit
>       0 Dec  6 12:20 20211206201959151.commit.requested
>       0 Dec  6 12:20 20211206201959151.inflight
>  189069 Dec  6 12:34 20211206202728233.commit
>       0 Dec  6 12:27 20211206202728233.commit.requested
>       0 Dec  6 12:27 20211206202728233.inflight
>   36662 Dec  6 12:35 20211206203449899.replacecommit
>       0 Dec  6 12:35 20211206203449899.replacecommit.inflight
>   34656 Dec  6 12:35 20211206203449899.replacecommit.requested
>   28013 Dec  6 12:35 20211206203503574.clean
>   19024 Dec  6 12:35 20211206203503574.clean.inflight
>   19024 Dec  6 12:35 20211206203503574.clean.requested
>  189069 Dec  6 12:43 20211206203551080.commit
>       0 Dec  6 12:35 20211206203551080.commit.requested
>       0 Dec  6 12:35 20211206203551080.inflight
>  189069 Dec  6 12:50 20211206204311612.commit
>       0 Dec  6 12:43 20211206204311612.commit.requested
>       0 Dec  6 12:43 20211206204311612.inflight
>       0 Dec  6 12:50 20211206205044595.commit.requested
>       0 Dec  6 12:50 20211206205044595.inflight
>     128 Dec  6 12:56 archived
>     483 Dec  6 11:52 hoodie.properties
>  {code}
>  
> Checkpoints in commits:
>  
> {code:java}
> grep "deltastreamer.checkpoint.key" *
> 20211206201238649.commit:    "deltastreamer.checkpoint.key" : "2"
> 20211206201959151.commit:    "deltastreamer.checkpoint.key" : "3"
> 20211206202728233.commit:    "deltastreamer.checkpoint.key" : "4"
> 20211206203551080.commit:    "deltastreamer.checkpoint.key" : "1"
> 20211206204311612.commit:    "deltastreamer.checkpoint.key" : "2" {code}
>  
> *Steps to reproduce:*
> Run HoodieDeltaStreamer in the continuous mode, by providing both 
> "--checkpoint 0" and "--continuous", with inline clustering and sync clean 
> enabled (some configs are masked).
>  
> {code:java}
> spark-submit \
>   --master yarn \
>   --driver-memory 8g --executor-memory 8g --num-executors 3 --executor-cores 
> 4 \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>   --conf 
> spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain
>  \
>   --conf spark.speculation=true \
>   --conf spark.speculation.multiplier=1.0 \
>   --conf spark.speculation.quantile=0.5 \
>   --packages org.apache.spark:spark-avro_2.12:3.2.0 \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
>   file:/home/hadoop/ethan/hudi-utilities-bundle_2.12-0.10.0-rc3.jar \
>   --props file:/home/hadoop/ethan/test.properties \
>   --source-class ... \
>   --source-ordering-field ts \
>   --target-base-path s3a://hudi-testing/test_hoodie_table_11/ \
>   --target-table test_table \
>   --table-type COPY_ON_WRITE \
>   --op BULK_INSERT \
>   --checkpoint 0 \

[jira] [Updated] (HUDI-3057) Instants should be generated strictly under locks

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3057:
--
Priority: Critical  (was: Blocker)

> Instants should be generated strictly under locks
> -
>
> Key: HUDI-3057
> URL: https://issues.apache.org/jira/browse/HUDI-3057
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Alexey Kudinkin
>Assignee: sivabalan narayanan
>Priority: Critical
> Fix For: 0.11.0
>
> Attachments: logs.txt
>
>
> While looking into the flakiness of the tests outlined here:
> https://issues.apache.org/jira/browse/HUDI-3043
>  
> I've stumbled upon the following failure, where one of the writers tries to 
> complete the commit but can't, because the commit file already exists:
> {code:java}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.hudi.exception.HoodieIOException: Failed to create file 
> /var/folders/kb/cnff55vj041g2nnlzs5ylqk0gn/T/junit5142536255031969586/testtable_MERGE_ON_READ/.hoodie/20211217150157632.commit
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:336)
>     at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWriters(TestHoodieDeltaStreamerWithMultiWriter.java:150)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>     at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
>     at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
>     at 
> org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>     at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
>     at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)

[GitHub] [hudi] hudi-bot removed a comment on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002368726


   
   ## CI report:
   
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002379085


   
   ## CI report:
   
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4791)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] putaozhi123 opened a new issue #4466: [SUPPORT]ERROR table.HoodieTimelineArchiveLog: Failed to archive commits,Not an Avro data file

2021-12-28 Thread GitBox


putaozhi123 opened a new issue #4466:
URL: https://github.com/apache/hudi/issues/4466


   **Environment Description**
   
   Hudi version : 0.8.0
   
   Spark version : 2.4.7
   
   Storage (HDFS/S3/GCS..) : HDFS
   
   Running on Docker? (yes/no) : no
   
   **Additional context**
   the Hudi program fails to restart when I re-run spark-submit; the exception 
is as follows:
   
   **Stacktrace**
   21/12/29 05:57:06 ERROR table.HoodieTimelineArchiveLog: Failed to archive 
commits, .commit file: 20210916102118.rollback.inflight
   java.io.IOException: Not an Avro data file
at 
org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
at 
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
at 
org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
at 
org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
at 
org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
at 
org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
at 
org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
at 
org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
at 
org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:476)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:222)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
at hudi.WriteHudi$$anonfun$1.apply(WriteHudi.scala:148)
at hudi.WriteHudi$$anonfun$1.apply(WriteHudi.scala:136)
at 
org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)

[GitHub] [hudi] nikenfls commented on issue #4461: [SUPPORT]Hudi(0.10.0) write to Aliyun oss using metadata table warning

2021-12-28 Thread GitBox


nikenfls commented on issue #4461:
URL: https://github.com/apache/hudi/issues/4461#issuecomment-1002372820


   > 
   
   Thank you very much for your time. I will try these SDKs. XD


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-3110) parquet max file size not honored

2021-12-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466293#comment-17466293
 ] 

sivabalan narayanan commented on HUDI-3110:
---

Setting the parquet block size fixed the issue.
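The working combination can be sketched as writer options. This is a hedged illustration: `hoodie.parquet.max.file.size` and `hoodie.parquet.block.size` are real Hudi write configs, but `buildSizeCappedOptions` is just an illustrative helper. The likely reason the cap wasn't honored is that a parquet file can only be cut at row-group (block) boundaries, so the block size must be no larger than the target file size.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch; buildSizeCappedOptions is an illustrative helper, not Hudi API.
public class ParquetSizing {
  static Map<String, String> buildSizeCappedOptions(long maxFileBytes) {
    Map<String, String> opts = new HashMap<>();
    // Target max parquet file size (50 MB in the issue's repro).
    opts.put("hoodie.parquet.max.file.size", Long.toString(maxFileBytes));
    // Row-group size must not exceed the file cap; otherwise files can only
    // be cut at the default ~120 MB block size, as observed in the issue.
    opts.put("hoodie.parquet.block.size", Long.toString(maxFileBytes));
    return opts;
  }

  public static void main(String[] args) {
    Map<String, String> opts = buildSizeCappedOptions(52428800L); // 50 MB
    System.out.println(opts.get("hoodie.parquet.max.file.size")); // 52428800
  }
}
```

Each entry would be passed through as a `.option(key, value)` on the DataFrame writer alongside the options shown in the issue description.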

> parquet max file size not honored
> -
>
> Key: HUDI-3110
> URL: https://issues.apache.org/jira/browse/HUDI-3110
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: sev:high
> Fix For: 0.11.0
>
>
> Setting hoodie.parquet.max.file.size does not get honored: file sizes still 
> reach ~120 MB even though the max parquet file size is configured to 50 MB. 
> This happens in both the row-writer path and the non-row-writer path.
>  
>  df.write.format("hudi").
>    option(PRECOMBINE_FIELD_OPT_KEY, "other").
>    option(RECORDKEY_FIELD_OPT_KEY, "id").
>    option(PARTITIONPATH_FIELD_OPT_KEY, "type").
>    option(OPERATION_OPT_KEY, "bulk_insert").
>    option("hoodie.bulkinsert.shuffle.parallelism", "4").
>    option("hoodie.parquet.max.file.size", "52428800").
>    option(TABLE_NAME, tableName).
>    option("hoodie.datasource.write.row.writer.enable", "false").
>    mode(Overwrite).
>    save(basePath)
>  
>  ls -ltr /tmp/hudi_trips_cow/PullRequestEvent
> total 754048
> -rw-r--r--  1 nsb  wheel  121847456 Dec 27 19:14 
> e199774a-ceec-47bb-883e-4e669877f778-3_1-34-192_20211227191149448.parquet
> -rw-r--r--  1 nsb  wheel  119741276 Dec 27 19:14 
> e199774a-ceec-47bb-883e-4e669877f778-4_1-34-192_20211227191149448.parquet
> -rw-r--r--  1 nsb  wheel  114652047 Dec 27 19:14 
> e199774a-ceec-47bb-883e-4e669877f778-5_1-34-192_20211227191149448.parquet



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3110) parquet max file size not honored

2021-12-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-3110.
-
Resolution: Invalid

> parquet max file size not honored
> -
>
> Key: HUDI-3110
> URL: https://issues.apache.org/jira/browse/HUDI-3110
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: sev:high
> Fix For: 0.11.0
>
>
> Setting hoodie.parquet.max.file.size does not get honored: file sizes still 
> reach ~120 MB even though the max parquet file size is configured to 50 MB. 
> This happens in both the row-writer path and the non-row-writer path.
>  
>  df.write.format("hudi").
>    option(PRECOMBINE_FIELD_OPT_KEY, "other").
>    option(RECORDKEY_FIELD_OPT_KEY, "id").
>    option(PARTITIONPATH_FIELD_OPT_KEY, "type").
>    option(OPERATION_OPT_KEY, "bulk_insert").
>    option("hoodie.bulkinsert.shuffle.parallelism", "4").
>    option("hoodie.parquet.max.file.size", "52428800").
>    option(TABLE_NAME, tableName).
>    option("hoodie.datasource.write.row.writer.enable", "false").
>    mode(Overwrite).
>    save(basePath)
>  
>  ls -ltr /tmp/hudi_trips_cow/PullRequestEvent
> total 754048
> -rw-r--r--  1 nsb  wheel  121847456 Dec 27 19:14 
> e199774a-ceec-47bb-883e-4e669877f778-3_1-34-192_20211227191149448.parquet
> -rw-r--r--  1 nsb  wheel  119741276 Dec 27 19:14 
> e199774a-ceec-47bb-883e-4e669877f778-4_1-34-192_20211227191149448.parquet
> -rw-r--r--  1 nsb  wheel  114652047 Dec 27 19:14 
> e199774a-ceec-47bb-883e-4e669877f778-5_1-34-192_20211227191149448.parquet



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1002371458


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 735b3d908f02bc2404192f74987afc82151fa837 UNKNOWN
   * 48eab85f19170f1ebfc8dbf86d7a66bf089604e1 UNKNOWN
   * 3402524a8b685565ceff5fdbd9d592f0228740c4 UNKNOWN
   * fbebef9773e5a513dd03f993b76bbf2b908c5f33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4759)
 
   * cbf3703dbf1ab502bb61ba6800118f9069382b0a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4790)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1002365199


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 735b3d908f02bc2404192f74987afc82151fa837 UNKNOWN
   * 48eab85f19170f1ebfc8dbf86d7a66bf089604e1 UNKNOWN
   * 3402524a8b685565ceff5fdbd9d592f0228740c4 UNKNOWN
   * fbebef9773e5a513dd03f993b76bbf2b908c5f33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4759)
 
   * cbf3703dbf1ab502bb61ba6800118f9069382b0a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vingov commented on issue #4429: [SUPPORT] Spark SQL CTAS command doesn't work with 0.10.0 version and Spark 3.1.1

2021-12-28 Thread GitBox


vingov commented on issue #4429:
URL: https://github.com/apache/hudi/issues/4429#issuecomment-1002370883


   Thanks, @xushiyan, I've updated the title to reflect the CTAS issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-28 Thread GitBox


YannByron edited a comment on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-1002370436


   @nsivabalan @BenjMaq 
   I use basically the same commands, on Hudi 0.9 + Spark 2.4.4.
   
   ```
   CREATE TABLE IF NOT EXISTS test_overwrite (
   id bigint,
   name string,
   dt string
   ) USING hudi
   LOCATION 'file:///tmp/hudi/test_overwrite'
   OPTIONS (
 type = 'cow'
   )
   PARTITIONED by (dt);
   
   insert into test_overwrite
   values
   (1, 'a1', '2021-11-29'),
   (2, 'a2', '2021-11-29')
   ;
   
   insert overwrite table test_overwrite
   values
   (3, 'a3', '2021-11-29'),
   (4, 'a4', '2021-11-29')
   ;
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-28 Thread GitBox


YannByron edited a comment on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-1002370436


   @nsivabalan @BenjMaq 
   I use basically the same commands, on Hudi 0.9 + Spark 2.4.4.
   
   ```CREATE TABLE IF NOT EXISTS test_overwrite (
   id bigint,
   name string,
   dt string
   ) USING hudi
   LOCATION 'file:///tmp/hudi/test_overwrite'
   OPTIONS (
 type = 'cow'
   )
   PARTITIONED by (dt);
   
   insert into test_overwrite
   values
   (1, 'a1', '2021-11-29'),
   (2, 'a2', '2021-11-29')
   ;
   
   insert overwrite table test_overwrite
   values
   (3, 'a3', '2021-11-29'),
   (4, 'a4', '2021-11-29')
   ;```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-28 Thread GitBox


YannByron commented on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-1002370436


   @nsivabalan @BenjMaq 
   I use basically the same commands, on Hudi 0.9 + Spark 2.4.4.
   
   `CREATE TABLE IF NOT EXISTS test_overwrite (
   id bigint,
   name string,
   dt string
   ) USING hudi
   LOCATION 'file:///tmp/hudi/test_overwrite'
   OPTIONS (
 type = 'cow'
   )
   PARTITIONED by (dt);
   
   insert into test_overwrite
   values
   (1, 'a1', '2021-11-29'),
   (2, 'a2', '2021-11-29')
   ;
   
   insert overwrite table test_overwrite
   values
   (3, 'a3', '2021-11-29'),
   (4, 'a4', '2021-11-29')
   ;`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002368196


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002368726


   
   ## CI report:
   
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] dongkelun commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


dongkelun commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1002368631


   > @dongkelun : can you rebase with latest master.
   
   @nsivabalan Hello, I have rebased onto the latest master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002365326


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002368196


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   * 15a6d4ea2eaae3e5b8fe5e174127016ea72b0e05 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YuweiXiao commented on issue #4461: [SUPPORT]Hudi(0.10.0) write to Aliyun oss using metadata table warning

2021-12-28 Thread GitBox


YuweiXiao commented on issue #4461:
URL: https://github.com/apache/hudi/issues/4461#issuecomment-1002367507


   I cannot reproduce the warning. I am using the master branch and running directly inside the IDE (with Spark local mode and core-site.xml set up).
   
   By the way, I am using the following versions of the Aliyun OSS SDK:
   ```xml
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-aliyun</artifactId>
     <version>2.7.2.4-oss-magic-copy-12</version>
   </dependency>

   <dependency>
     <groupId>com.aliyun.oss</groupId>
     <artifactId>aliyun-sdk-oss</artifactId>
     <version>3.3.0</version>
   </dependency>

   <dependency>
     <groupId>com.aliyun</groupId>
     <artifactId>aliyun-java-sdk-core</artifactId>
     <version>3.7.1</version>
   </dependency>
   ```






[GitHub] [hudi] lamberken commented on a change in pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


lamberken commented on a change in pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#discussion_r776130859



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DropHoodieTableCommand.scala
##
@@ -85,25 +91,42 @@ case class DropHoodieTableCommand(
   }
 
   private def dropHiveDataSourceTable(
-sparkSession: SparkSession,
-table: CatalogTable,
-ifExists: Boolean,
-purge: Boolean): Unit = {
+   sparkSession: SparkSession,
+   hoodieCatalogTable: HoodieCatalogTable): Unit = {
+val table = hoodieCatalogTable.table
 val dbName = table.identifier.database.get
 val tableName = table.identifier.table
+
 // check database exists
 val dbExists = sparkSession.sessionState.catalog.databaseExists(dbName)
 if (!dbExists) {
   throw new NoSuchDatabaseException(dbName)
 }
-// check table exists
-if (!sparkSession.sessionState.catalog.tableExists(table.identifier)) {
-  throw new NoSuchTableException(dbName, table.identifier.table)
+
+if (HoodieTableType.MERGE_ON_READ == hoodieCatalogTable.tableType && purge) {
+  val snapshotTableName = hoodieCatalogTable.tableName + SUFFIX_SNAPSHOT_TABLE
+  val roTableName = hoodieCatalogTable.tableName + SUFFIX_READ_OPTIMIZED_TABLE
+
+  dropHiveTable(sparkSession, dbName, snapshotTableName)
+  dropHiveTable(sparkSession, dbName, roTableName)
+  dropHiveTable(sparkSession, dbName, hoodieCatalogTable.tableName, purge)

Review comment:
   let's unify the `tableName` here.








[jira] [Updated] (HUDI-3123) Consistent hashing index for upsert/insert write path

2021-12-28 Thread Yuwei Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuwei Xiao updated HUDI-3123:
-
Parent: HUDI-3000
Issue Type: Sub-task  (was: Improvement)

> Consistent hashing index for upsert/insert write path
> -
>
> Key: HUDI-3123
> URL: https://issues.apache.org/jira/browse/HUDI-3123
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Yuwei Xiao
>Priority: Major
>
> Basic write path (insert/upsert) implementation of consistent hashing index.
>  
> A framework will be provided for flexibly plugging in different dynamic hashing
> schemes, e.g., consistent hashing or extendible hashing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3123) Consistent hashing index for upsert/insert write path

2021-12-28 Thread Yuwei Xiao (Jira)
Yuwei Xiao created HUDI-3123:


 Summary: Consistent hashing index for upsert/insert write path
 Key: HUDI-3123
 URL: https://issues.apache.org/jira/browse/HUDI-3123
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Yuwei Xiao


Basic write path (insert/upsert) implementation of consistent hashing index.

 

A framework will be provided for flexibly plugging in different dynamic hashing
schemes, e.g., consistent hashing or extendible hashing.
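The bucket-assignment idea behind a consistent hashing index can be sketched with a minimal ring: each bucket owns several virtual nodes on a hash ring, and a record key maps to the first virtual node at or after its hash, wrapping around. This is an illustration only; `HashRing`, `addBucket`, and `locate` are made-up names, not Hudi APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hashing ring (illustrative sketch, not Hudi code).
public class HashRing {
    // Ring position -> bucket id; each bucket owns several virtual nodes.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addBucket(String bucket, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(bucket + "#vn" + i), bucket);
        }
    }

    // A record key maps to the first ring position at or after its hash,
    // wrapping around to the smallest position when none is larger.
    public String locate(String recordKey) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(recordKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xffL);
            }
            return h;
        } catch (java.security.NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        HashRing r = new HashRing();
        r.addBucket("bucket-0", 8);
        r.addBucket("bucket-1", 8);
        // The same key always lands in the same bucket.
        System.out.println("record-key-42 -> " + r.locate("record-key-42"));
    }
}
```

Because only whole buckets (via their virtual nodes) move when the ring changes, splitting or adding a bucket remaps only a fraction of record keys, which is the property that makes such an index dynamically resizable.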





[GitHub] [hudi] hudi-bot removed a comment on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002364571


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002365326


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1002365199


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 735b3d908f02bc2404192f74987afc82151fa837 UNKNOWN
   * 48eab85f19170f1ebfc8dbf86d7a66bf089604e1 UNKNOWN
   * 3402524a8b685565ceff5fdbd9d592f0228740c4 UNKNOWN
   * fbebef9773e5a513dd03f993b76bbf2b908c5f33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4759)
 
   * cbf3703dbf1ab502bb61ba6800118f9069382b0a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-1001876380


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 735b3d908f02bc2404192f74987afc82151fa837 UNKNOWN
   * 48eab85f19170f1ebfc8dbf86d7a66bf089604e1 UNKNOWN
   * 3402524a8b685565ceff5fdbd9d592f0228740c4 UNKNOWN
   * fbebef9773e5a513dd03f993b76bbf2b908c5f33 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4759)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot removed a comment on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002005192


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


hudi-bot commented on pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#issuecomment-1002364571


   
   ## CI report:
   
   * 5cd1675199d0dd65733982cca2132c03b5bf9d6c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4769)
 
   * 1f8244a3e0db6f82af5e8d45c8045c8b759309ba UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-3108) Fix Purge Drop MOR Table Cause error

2021-12-28 Thread Forward Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Forward Xu updated HUDI-3108:
-
 Attachment: image-2021-12-29-10-04-31-025.png
 image-2021-12-29-09-52-30-999.png
Description: 
1. When creating a MOR table, Spark creates three tables, e.g. `hudi_01`, `hudi_01_ro`, and `hudi_01_rt`, which all share the same table directory.

2. `drop table hudi_01_ro purge` deletes the shared table directory, so any subsequent operation on `hudi_01_rt` reports an error:

drop table test_hudi_table_ro purge;

select * from test_hudi_table_rt;
{code:java}
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:381)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:500)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:494)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:494)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:284)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File 
file:/opt/sourcecode/data-lake/warehouse/hudi/test_hudi_table does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
at 
org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50)
at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:76)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:103)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:353)
at 
org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:261)
at 
org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at 
org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) 
{code}

  was:
1. When creating a MOR table, Spark creates three tables, e.g. `hudi_01`, `hudi_01_ro`, and `hudi_01_rt`, which all share the same table directory.

2. `drop table hudi_01_ro purge` deletes the shared table directory, so any subsequent operation on `hudi_01_rt` reports an error.


>  Fix Purge Drop MOR Table Cause error
> -
>
> Key: HUDI-3108
> URL: https://issues.apache.org/jira/browse/HUDI-3108
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-12-29-09-52-30-999.png, 
> image-2021-12-29-10-04-31-025.png
>
>
> 1. When creating a MOR table, Spark creates three tables, e.g. `hudi_01`, `hudi_01_ro`, 
> and `hudi_01_rt`, which all share the same table directory.
> 

[GitHub] [hudi] minihippo commented on pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-12-28 Thread GitBox


minihippo commented on pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#issuecomment-1002355744


   @vinothchandar I addressed all comments, and the failing UT is not related 
to this PR. Can we land this?






[GitHub] [hudi] lamberken commented on a change in pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


lamberken commented on a change in pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#discussion_r776120977



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DropHoodieTableCommand.scala
##
@@ -38,6 +38,9 @@ case class DropHoodieTableCommand(
 purge: Boolean)
 extends HoodieLeafRunnableCommand {
 
+  val SUFFIX_SNAPSHOT_TABLE = "_rt"

Review comment:
   we can use `MOR_SNAPSHOT_TABLE_SUFFIX` and `MOR_READ_OPTIMIZED_TABLE_SUFFIX`








[GitHub] [hudi] lamberken commented on a change in pull request #4455: [HUDI-3108] Fix Purge Drop MOR Table Cause error

2021-12-28 Thread GitBox


lamberken commented on a change in pull request #4455:
URL: https://github.com/apache/hudi/pull/4455#discussion_r776120977



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DropHoodieTableCommand.scala
##
@@ -38,6 +38,9 @@ case class DropHoodieTableCommand(
 purge: Boolean)
 extends HoodieLeafRunnableCommand {
 
+  val SUFFIX_SNAPSHOT_TABLE = "_rt"

Review comment:
   we can use `MOR_SNAPSHOT_TABLE_SUFFIX` 








[jira] [Commented] (HUDI-3122) presto query failed for bootstrap tables

2021-12-28 Thread Wenning Ding (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466279#comment-17466279
 ] 

Wenning Ding commented on HUDI-3122:


Thanks, I will give it a shot.

> presto query failed for bootstrap tables
> 
>
> Key: HUDI-3122
> URL: https://issues.apache.org/jira/browse/HUDI-3122
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>
>  
> {{java.lang.NoClassDefFoundError: 
> org/apache/hudi/org/apache/hadoop/hbase/io/hfile/CacheConfig
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.createReader(HFileBootstrapIndex.java:181)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.access$400(HFileBootstrapIndex.java:76)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.partitionIndexReader(HFileBootstrapIndex.java:272)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.fetchBootstrapIndexInfo(HFileBootstrapIndex.java:262)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.initIndexInfo(HFileBootstrapIndex.java:252)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.(HFileBootstrapIndex.java:243)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.createReader(HFileBootstrapIndex.java:191)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:137)
> at java.util.HashMap.forEach(HashMap.java:1290)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:294)
> at 
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:281)}}





[jira] [Commented] (HUDI-3122) presto query failed for bootstrap tables

2021-12-28 Thread Yue Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466280#comment-17466280
 ] 

Yue Zhang commented on HUDI-3122:
-

I see it in hudi-presto-bundle.jar, but I am not sure if it solves your problem.
  inflated: org/apache/hadoop/hbase/io/hfile/Cacheable.class
  inflated: org/apache/hadoop/hbase/io/hfile/CacheableDeserializer.class
  inflated: org/apache/hadoop/hbase/io/hfile/CacheableDeserializerIdManager.class
  inflated: org/apache/hadoop/hbase/io/hfile/CacheConfig$1.class
  inflated: org/apache/hadoop/hbase/io/hfile/CacheConfig$ExternalBlockCaches.class
  inflated: org/apache/hadoop/hbase/io/hfile/CacheConfig.class
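Note that the listing above shows the unrelocated path `org/apache/hadoop/hbase/...`, while the `NoClassDefFoundError` names the shaded/relocated path `org/apache/hudi/org/apache/hadoop/hbase/...`, so the two may not match. A self-contained sketch of that check, using an in-memory zip as a stand-in for the bundle (`BundleCheck` and its helpers are illustrative, not Hudi code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Simulates the jar-content check: build an in-memory "bundle" holding the
// unrelocated class path (as in the listing above) and show that the relocated
// path named by the NoClassDefFoundError is absent.
public class BundleCheck {

    static byte[] makeBundle(List<String> entryNames) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static List<String> listEntries(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            for (ZipEntry e = zis.getNextEntry(); e != null; e = zis.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] bundle = makeBundle(
            List.of("org/apache/hadoop/hbase/io/hfile/CacheConfig.class"));
        List<String> names = listEntries(bundle);
        // The error expects the shaded/relocated org/apache/hudi/... prefix.
        System.out.println("unrelocated present: "
            + names.contains("org/apache/hadoop/hbase/io/hfile/CacheConfig.class"));
        System.out.println("relocated present:   "
            + names.contains("org/apache/hudi/org/apache/hadoop/hbase/io/hfile/CacheConfig.class"));
    }
}
```

On a real bundle the same check is just listing the jar's entries and searching for the exact path the error message names.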

> presto query failed for bootstrap tables
> 
>
> Key: HUDI-3122
> URL: https://issues.apache.org/jira/browse/HUDI-3122
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>
>  
> {{java.lang.NoClassDefFoundError: 
> org/apache/hudi/org/apache/hadoop/hbase/io/hfile/CacheConfig
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.createReader(HFileBootstrapIndex.java:181)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.access$400(HFileBootstrapIndex.java:76)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.partitionIndexReader(HFileBootstrapIndex.java:272)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.fetchBootstrapIndexInfo(HFileBootstrapIndex.java:262)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.initIndexInfo(HFileBootstrapIndex.java:252)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HFileBootstrapIndexReader.(HFileBootstrapIndex.java:243)
> at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.createReader(HFileBootstrapIndex.java:191)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:137)
> at java.util.HashMap.forEach(HashMap.java:1290)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:294)
> at 
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:281)}}




