[jira] [Updated] (HUDI-4447) Hive Sync fails when performing delete table data operation

2022-07-21 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-4447:

Description: 
 Currently, meta sync fails when performing a delete table data operation, 
because the sync runs without setting the database name, fields, extractor 
class, etc. This causes two problems:

1) schema sync will recreate the table in the default database

2) sync will use the wrong extractor class (HiveStylePartitionValueExtractor) 
when syncing a non-partitioned table with the metadata table enabled

 
{code:java}
spark-sql> delete from test_db.hudi_mor_none_part_table_321_0111 where id=1;
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync 
class org.apache.hudi.hive.HiveSyncTool
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:626)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:625)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing hudi_mor_none_part_table_321_0111
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:143)
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
    ... 93 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table hudi_mor_none_part_table_321_0111_ro
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:418)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:156)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:140)
    ... 94 more
Caused by: java.lang.IllegalArgumentException: Partition path  is not in the 
form partition_key=partition_value.
    at 
org.apache.hudi.hive.HiveStylePartitionValueExtractor.extractPartitionValuesInPath(HiveStylePartitionValueExtractor.java:37)
    at 
org.apache.hudi.hive.AbstractHiveSyncHoodieClient.getPartitionEvents(AbstractHiveSyncHoodieClient.java:81)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:396)
    ... 97 more
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync 
class org.apache.hudi.hive.HiveSyncTool
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:626)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:625)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:625)

{code}
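
For reference, these are the hive-sync settings the failing delete path does not 
populate. A minimal spark-shell sketch of how they would normally be set 
explicitly (an illustration only: the hoodie.datasource.hive_sync.* keys are the 
standard sync configs, and whether the delete path actually honors session-level 
settings is exactly what this issue is about):
{code:java}
// Illustration only: explicitly set the hive-sync options the description mentions
// (database, table, partition fields, extractor class) before running the delete.
// NonPartitionedExtractor is the extractor intended for non-partitioned tables.
spark.sql("set hoodie.datasource.hive_sync.database=test_db")
spark.sql("set hoodie.datasource.hive_sync.table=hudi_mor_none_part_table_321_0111")
spark.sql("set hoodie.datasource.hive_sync.partition_fields=")
spark.sql("set hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor")
spark.sql("delete from test_db.hudi_mor_none_part_table_321_0111 where id=1")
{code}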
 

  was:
 Currently, meta sync fails if performing a delete table data operation, 
because the sync runs without setting the database name, fields, extractor 
class, etc. This causes two problems:

1) schema sync will recreate the table in the default database

2) sync will use the wrong extractor class (HiveStylePartitionValueExtractor) 
when syncing a non-partitioned table with the metadata table enabled

 
{code:java}
spark-sql> delete from test_db.hudi_mor_none_part_table_321_0111 where id=1;
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync 
class org.apache.hudi.hive.HiveSyncTool
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:626)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:625)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing hudi_mor_none_part_table_321_0111
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:143)
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
    ... 93 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table hudi_mor_none_part_table_321_0111_ro
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:418)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:156)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:140)
    ... 94 more
Caused by: java.lang.IllegalArgumentException: Partition path  is not in the 
form partition_key=partition_value.
    at 
org.apache.hudi.hive.HiveStylePartitionValueExtractor.extractPartitionValuesInPath(HiveStylePartitionValueExtractor.java:37)
    at 
org.apache.hudi

[jira] [Created] (HUDI-4447) Hive Sync fails when performing delete table data operation

2022-07-21 Thread rex xiong (Jira)
rex xiong created HUDI-4447:
---

 Summary: Hive Sync fails when performing delete table data operation
 Key: HUDI-4447
 URL: https://issues.apache.org/jira/browse/HUDI-4447
 Project: Apache Hudi
  Issue Type: Bug
  Components: meta-sync
 Environment: Spark3.2.1 & Hudi 0.11.1
Reporter: rex xiong
Assignee: rex xiong
 Fix For: 0.12.0


 Currently, meta sync fails if performing a delete table data operation, 
because the sync runs without setting the database name, fields, extractor 
class, etc. This causes two problems:

1) schema sync will recreate the table in the default database

2) sync will use the wrong extractor class (HiveStylePartitionValueExtractor) 
when syncing a non-partitioned table with the metadata table enabled

 
{code:java}
spark-sql> delete from test_db.hudi_mor_none_part_table_321_0111 where id=1;
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync 
class org.apache.hudi.hive.HiveSyncTool
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:626)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:625)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing hudi_mor_none_part_table_321_0111
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:143)
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
    ... 93 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table hudi_mor_none_part_table_321_0111_ro
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:418)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:156)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:140)
    ... 94 more
Caused by: java.lang.IllegalArgumentException: Partition path  is not in the 
form partition_key=partition_value.
    at 
org.apache.hudi.hive.HiveStylePartitionValueExtractor.extractPartitionValuesInPath(HiveStylePartitionValueExtractor.java:37)
    at 
org.apache.hudi.hive.AbstractHiveSyncHoodieClient.getPartitionEvents(AbstractHiveSyncHoodieClient.java:81)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:396)
    ... 97 more
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync 
class org.apache.hudi.hive.HiveSyncTool
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:626)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:625)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:625)

{code}
 





[jira] [Updated] (HUDI-3818) hudi doesn't support bytes column as primary key

2022-04-08 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-3818:

Description: 
 When using a bytes column as the primary key, Hudi generates a fixed hoodie 
key, so upserts will only ever insert one row. 
{code:java}
scala> sql("desc extended binary_test1").show()
+++---+
|            col_name|           data_type|comment|
+++---+
| _hoodie_commit_time|              string|   null|
|_hoodie_commit_seqno|              string|   null|
|  _hoodie_record_key|              string|   null|
|_hoodie_partition...|              string|   null|
|   _hoodie_file_name|              string|   null|
|                  id|              binary|   null|
|                name|              string|   null|
|                  dt|              string|   null|
|                    |                    |       |
|# Detailed Table ...|                    |       |
|            Database|             default|       |
|               Table|        binary_test1|       |
|               Owner|                root|       |
|        Created Time|Sat Apr 02 13:28:...|       |
|         Last Access|             UNKNOWN|       |
|          Created By|         Spark 3.2.0|       |
|                Type|             MANAGED|       |
|            Provider|                hudi|       |
|    Table Properties|[last_commit_time...|       |
|          Statistics|        435194 bytes|       |
+++---+

scala> sql("select * from binary_test1").show()
+---+++--+++-++
|_hoodie_commit_time|_hoodie_commit_seqno|  
_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|                 
 id|     name|      dt|
+---+++--+++-++
|  20220402132927590|20220402132927590...|id:java.nio.HeapB...|                 
     |1a06106e-5e7a-4e6...|[03 45 6A 00 00 0...|Mary Jane|20220401|
+---+++--+++-++{code}
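
The fixed record key above (id:java.nio.HeapB...) suggests the key is produced 
by stringifying the binary value. A plain Scala sketch of that behaviour (this 
is an assumption about the cause, not a confirmed Hudi code path): 
ByteBuffer.toString reports only position/limit/capacity, so any two 16-byte 
ids produce the same key string.
{code:java}
import java.nio.ByteBuffer

// Two different 16-byte binary ids...
val id1 = ByteBuffer.wrap("key-one-16bytes!".getBytes("UTF-8"))
val id2 = ByteBuffer.wrap("key-two-16bytes!".getBytes("UTF-8"))

// ...stringify to the same representation, so they collapse into one hoodie key.
println(String.valueOf(id1)) // java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
println(String.valueOf(id2)) // java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
{code}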

  was:
 
{code:java}

scala> sql("desc extended binary_test1").show(false)
++--+---+
|col_name                    |data_type                                         
                                    |comment|
++--+---+
|_hoodie_commit_time         |string                                            
                                    |null   |
|_hoodie_commit_seqno        |string                                            
                                    |null   |
|_hoodie_record_key          |string                                            
                                    |null   |
|_hoodie_partition_path      |string                                            
                                    |null   |
|_hoodie_file_name           |string                                            
                                    |null   |
|id                          |binary                                            
                                    |null   |
|name                        |string                                            
                                    |null   |
|dt                          |string                                            
                                    |null   |
|                            |                                                  
                                    |       |
|# Detailed Table Information|                                                  
                                    |       |
|Database                    |default                                           
                                    |       |
|Table                       |binary_test1                                      
                                    |       |
|Owner                       |root                                              
                                    |       |
|Created Time                |Sat Apr 02 13:28:29 CST 2022                      
                                    |       |
|Last Access                 |UNKNOWN                                           
                                    |       |
|Created By                  |Spark 3.2.0                                       
                                    |       |
|Type                        |MANAGED                                           
    

[jira] [Updated] (HUDI-3818) hudi doesn't support bytes column as primary key

2022-04-08 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-3818:

Description: 
 
{code:java}

scala> sql("desc extended binary_test1").show(false)
++--+---+
|col_name                    |data_type                                         
                                    |comment|
++--+---+
|_hoodie_commit_time         |string                                            
                                    |null   |
|_hoodie_commit_seqno        |string                                            
                                    |null   |
|_hoodie_record_key          |string                                            
                                    |null   |
|_hoodie_partition_path      |string                                            
                                    |null   |
|_hoodie_file_name           |string                                            
                                    |null   |
|id                          |binary                                            
                                    |null   |
|name                        |string                                            
                                    |null   |
|dt                          |string                                            
                                    |null   |
|                            |                                                  
                                    |       |
|# Detailed Table Information|                                                  
                                    |       |
|Database                    |default                                           
                                    |       |
|Table                       |binary_test1                                      
                                    |       |
|Owner                       |root                                              
                                    |       |
|Created Time                |Sat Apr 02 13:28:29 CST 2022                      
                                    |       |
|Last Access                 |UNKNOWN                                           
                                    |       |
|Created By                  |Spark 3.2.0                                       
                                    |       |
|Type                        |MANAGED                                           
                                    |       |
|Provider                    |hudi                                              
                                    |       |
|Table Properties            |[last_commit_time_sync=20220402132927590, 
preCombineField=id, primaryKey=id, type=cow]|       |
|Statistics                  |435194 bytes                                      
                                    |       |
++--+---+
only showing top 20 rows


scala> sql("select * from binary_test1").show(false)
+---+-+---+--+--+-+-++
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                   
          |_hoodie_partition_path|_hoodie_file_name                             
                            |id                                               
|name     |dt      |
+---+-+---+--+--+-+-++
|20220402132927590  |20220402132927590_0_1|id:java.nio.HeapByteBuffer[pos=0 
lim=16 cap=16]|                      
|1a06106e-5e7a-4e68-9ebb-a0dceab70d87-0_0-12-1005_20220402132927590.parquet|[03 
45 6A 00 00 00 00 00 00 00 00 00 00 00 00 00]|Mary Jane|20220401|
+---+-+---+--+--+-+-++
 {code}

> hudi doesn't support bytes column as primary key
> 
>
> Key: HUDI-3818
> URL: https://issues.apache.org/jira/browse/HUDI-3818
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter:

[jira] [Updated] (HUDI-3817) Need to specify parquet version for hudi-hadoop-mr-bundle when compiling hudi using -Dspark3

2022-04-07 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-3817:

Priority: Minor  (was: Major)

> Need to specify parquet version for hudi-hadoop-mr-bundle when compiling hudi 
> using -Dspark3
> --
>
> Key: HUDI-3817
> URL: https://issues.apache.org/jira/browse/HUDI-3817
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: rex xiong
>Assignee: rex xiong
>Priority: Minor
>
> If -Dspark3 is used to compile hudi, the hudi-hadoop-mr module will use 
> parquet 1.12.2, which conflicts with hive. 
> {code:java}
> hive> select * from h_321_0401_mor_rt;
> OK
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/parquet/schema/LogicalTypeAnnotation
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:177)
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:242)
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:199)
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:260)
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:146)
>     at 
> org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:137)
>     at 
> org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:520)
>     at 
> org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:503)
>     at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:105)
>     at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:138)
>     at 
> org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:530)
>     at 
> org.apache.hudi.common.table.TableSchemaResolver.(TableSchemaResolver.java:72)
>     at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:90)
>     at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.(AbstractRealtimeRecordReader.java:72)
>     at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.(RealtimeCompactedRecordReader.java:62)
>     at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>     at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.(HoodieRealtimeRecordReader.java:47)
>     at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:74)
>     at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:776)
>     at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:344)
>     at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
>     at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2777)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) 
> {code}





[jira] [Created] (HUDI-3818) hudi doesn't support bytes column as primary key

2022-04-07 Thread rex xiong (Jira)
rex xiong created HUDI-3818:
---

 Summary: hudi doesn't support bytes column as primary key
 Key: HUDI-3818
 URL: https://issues.apache.org/jira/browse/HUDI-3818
 Project: Apache Hudi
  Issue Type: Bug
Reporter: rex xiong
Assignee: rex xiong








[jira] [Created] (HUDI-3817) Need to specify parquet version for hudi-hadoop-mr-bundle when compiling hudi using -Dspark3

2022-04-07 Thread rex xiong (Jira)
rex xiong created HUDI-3817:
---

 Summary: Need to specify parquet version for hudi-hadoop-mr-bundle 
when compiling hudi using -Dspark3
 Key: HUDI-3817
 URL: https://issues.apache.org/jira/browse/HUDI-3817
 Project: Apache Hudi
  Issue Type: Bug
  Components: hive
Reporter: rex xiong
Assignee: rex xiong


If -Dspark3 is used to compile hudi, the hudi-hadoop-mr module will use parquet 
1.12.2, which conflicts with hive. 
{code:java}
hive> select * from h_321_0401_mor_rt;
OK
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/parquet/schema/LogicalTypeAnnotation
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:177)
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:242)
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:199)
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:260)
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:146)
    at 
org.apache.hudi.org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:137)
    at 
org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:520)
    at 
org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:503)
    at 
org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:105)
    at 
org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:138)
    at 
org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:530)
    at 
org.apache.hudi.common.table.TableSchemaResolver.(TableSchemaResolver.java:72)
    at 
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:90)
    at 
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.(AbstractRealtimeRecordReader.java:72)
    at 
org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.(RealtimeCompactedRecordReader.java:62)
    at 
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
    at 
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.(HoodieRealtimeRecordReader.java:47)
    at 
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:74)
    at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:776)
    at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:344)
    at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
    at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2777)
    at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) 
{code}
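
A hedged build sketch of what the summary suggests (assumptions: the Hudi build 
exposes a parquet.version Maven property that can be overridden, and 1.10.1 
matches the Parquet line the target Hive ships; adjust both to your environment):
{code:java}
// command: compile with the spark3 switch but pin the Parquet version used by hudi-hadoop-mr
mvn clean package -DskipTests -Dspark3 -Dparquet.version=1.10.1
{code}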





[jira] [Commented] (HUDI-2762) Ensure hive can query insert only logs in MOR

2022-04-07 Thread rex xiong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518693#comment-17518693
 ] 

rex xiong commented on HUDI-2762:
-

 As [~mengtao] mentioned, Hive will filter out files whose names start with "." 
or "_". I don't think it's appropriate to simply modify this on the Hive side, 
because many users may rely on this "feature" in their own production scenarios, 
since Hive treats such files as temporary files.
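
For reference, a small Scala sketch of the filtering convention described above 
(this mirrors the behaviour, it is not Hive's actual implementation): names 
starting with "." or "_" are treated as hidden/temporary and skipped, which is 
why insert-only log files (whose names start with ".") stay invisible to Hive.
{code:java}
import org.apache.hadoop.fs.{Path, PathFilter}

// Sketch of the hidden-file convention: reject names starting with "_" or "."
val hiddenFileFilter: PathFilter = new PathFilter {
  override def accept(p: Path): Boolean = {
    val name = p.getName
    !name.startsWith("_") && !name.startsWith(".")
  }
}

// Hypothetical file names: only the plain data file passes the filter.
Seq("part-0000.parquet", "_SUCCESS", ".hoodie_partition_metadata", ".some-hudi-log-file")
  .map(new Path(_))
  .foreach(p => println(s"$p accepted=${hiddenFileFilter.accept(p)}"))
{code}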

> Ensure hive can query insert only logs in MOR
> -
>
> Key: HUDI-2762
> URL: https://issues.apache.org/jira/browse/HUDI-2762
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive
>Reporter: Rajesh Mahindra
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>
> Currently, we are able to query MOR tables that have base parquet files with 
> inserts and log files with updates. However, we are currently unable to query 
> tables with insert only log files. Both _ro and _rt tables are returning 0 
> rows. However, hms does create the table and partitions for the table. 
>  
> One sample table is here:
> [https://s3.console.aws.amazon.com/s3/buckets/debug-hive-site?prefix=database/&region=us-east-2]
>  
>  





[jira] [Commented] (HUDI-3744) NoSuchMethodError of getReadStatistics with Spark 3.2/Hadoop 3.2 using HBase

2022-04-06 Thread rex xiong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518031#comment-17518031
 ] 

rex xiong commented on HUDI-3744:
-

 [~xushiyan] [~guoyihua] My old environment has been released, so I set up 
another one to test, but I can't reproduce the problem either. I suspect it may 
be because my cluster had a different version of HBase installed, so I think we 
can close this issue directly.

> NoSuchMethodError of getReadStatistics with Spark 3.2/Hadoop 3.2 using HBase 
> -
>
> Key: HUDI-3744
> URL: https://issues.apache.org/jira/browse/HUDI-3744
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> Environment: Hadoop 3.2.1 & Spark-3.2.1 
> hudi compiled from commit f2a93ead3b5a6964a72b3543ada58aa334edef9c 
> just use spark-sql and the default job configuration to execute "show partitions 
> [hudi_table_name];"
> {code:java}
> // command
> spark-sql  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer 
> --conf 
> spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension 
> --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
> // spark-sql
> spark-sql> show partitions hudi_partition_table;
> {code}
> // code placeholder
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
>     at 
> org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
>     at 
> org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:423)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:435)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:162)
>     at java.util.HashMap.forEach(HashMap.java:1290)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:138)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:128)
>     at 
> org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
>     at 
> org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
>     at 
> org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:308)
>     at 
> org.apache.spark.sql.hudi.HoodieSqlCommonUtils$.getAllPartitionPaths(HoodieSqlCommonUtils.scala:81)
>     at 
> org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.getPartitionPaths(HoodieCatalogTable.scala:157)
>     at 
> org.apache.spark.sql.hudi.command.ShowHoodieTablePartitionsCommand.run(ShowHoodieTablePartitionsCommand.scala:51)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)





[jira] [Commented] (HUDI-1180) Upgrade HBase to 2.x

2022-03-28 Thread rex xiong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513790#comment-17513790
 ] 

rex xiong commented on HUDI-1180:
-

[~guoyihua] 

Environment: Hadoop 3.2.1 & Spark-3.2.1 

hudi compiled from commit f2a93ead3b5a6964a72b3543ada58aa334edef9c 

just use spark-sql with the default job configuration to execute "show partitions 
[hudi_table_name];"
{code:java}
// command
spark-sql  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer 
--conf 
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension 
--conf 
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog

// spark-sql
spark-sql> show partitions hudi_partition_table;

{code}

> Upgrade HBase to 2.x
> 
>
> Key: HUDI-1180
> URL: https://issues.apache.org/jira/browse/HUDI-1180
> Project: Apache Hudi
>  Issue Type: Task
>  Components: writer-core
>Affects Versions: 0.9.0
>Reporter: Wenning Ding
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: image-2022-03-28-13-48-58-149.png
>
>
> Trying to upgrade HBase to 2.3.3 but ran into several issues.
> According to the Hadoop version support matrix: 
> [http://hbase.apache.org/book.html#hadoop], also need to upgrade Hadoop to 
> 2.8.5+.
>  
> There are several API conflicts between HBase 2.2.3 and HBase 1.2.3, we need 
> to resolve this first. After resolving conflicts, I am able to compile it but 
> then I ran into a tricky jetty version issue during the testing:
> {code:java}
> [ERROR] TestHBaseIndex.testDelete()  Time elapsed: 4.705 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdate()  Time elapsed: 0.174 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdateWithRollback()  Time 
> elapsed: 0.076 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSmallBatchSize()  Time elapsed: 0.122 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTagLocationAndDuplicateUpdate()  Time elapsed: 
> 0.16 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalGetsBatching()  Time elapsed: 1.771 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalPutsBatching()  Time elapsed: 0.082 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> 34206 [Thread-260] WARN  
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner  - DirectoryScanner: 
> shutdown has been called
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager  - 
> IncrementalBlockReportManager interrupted
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.DataNode  - Ending block pool service 
> for: Block pool BP-1058834949-10.0.0.2-1597189606506 (Datanode Uuid 
> cb7bd8aa-5d79-4955-b1ec-bdaf7f1b6431) service to localhost/127.0.0.1:55924
> 34246 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data1/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 34247 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data2/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 37192 [HBase-Metrics2-1] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  
> - Cannot locate configuration: tried 
> hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
> 43904 
> [master/iad1-ws-cor-r12:0:becomeActiveMaster-SendThread(localhost:58768)] 
> WARN  org.apache.zookeeper.ClientCnxn  - Session 0x173dfeb0c8b0004 for server 
> null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

[jira] [Commented] (HUDI-1180) Upgrade HBase to 2.x

2022-03-27 Thread rex xiong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513153#comment-17513153
 ] 

rex xiong commented on HUDI-1180:
-

[~guoyihua] For Hadoop 3.2.1, the current HBase 2.x version has incompatible 
APIs, as shown below:
{code:java}
// code placeholder
java.lang.NoSuchMethodError: 
org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
    at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
    at 
org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
    at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
    at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
    at 
org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
    at 
org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:423)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:435)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:162)
    at java.util.HashMap.forEach(HashMap.java:1290)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:138)
    at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:128)
    at 
org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
    at 
org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
    at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:308)
    at 
org.apache.spark.sql.hudi.HoodieSqlCommonUtils$.getAllPartitionPaths(HoodieSqlCommonUtils.scala:81)
    at 
org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.getPartitionPaths(HoodieCatalogTable.scala:157)
    at 
org.apache.spark.sql.hudi.command.ShowHoodieTablePartitionsCommand.run(ShowHoodieTablePartitionsCommand.scala:51)
    at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
    at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
{code}
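
A quick diagnostic sketch (assuming hadoop-hdfs-client is on the classpath of 
the spark-shell session): the trace above expects the Hadoop 2 return type 
DFSInputStream$ReadStatistics, so printing the method's actual return type shows 
whether the runtime Hadoop matches what the bundled HBase was compiled against.
{code:java}
// Print the return type(s) of getReadStatistics provided by the runtime Hadoop.
val cls = Class.forName("org.apache.hadoop.hdfs.client.HdfsDataInputStream")
cls.getMethods
  .filter(_.getName == "getReadStatistics")
  .foreach(m => println(s"${m.getName}: ${m.getReturnType.getName}"))
{code}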

> Upgrade HBase to 2.x
> 
>
> Key: HUDI-1180
> URL: https://issues.apache.org/jira/browse/HUDI-1180
> Project: Apache Hudi
>  Issue Type: Task
>  Components: writer-core
>Affects Versions: 0.9.0
>Reporter: Wenning Ding
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: image-2022-03-28-13-48-58-149.png
>
>
> Trying to upgrade HBase to 2.3.3 but ran into several issues.
> According to the Hadoop version support matrix: 
> [http://hbase.apache.org/book.html#hadoop], also need to upgrade Hadoop to 
> 2.8.5+.
>  
> There are several API conflicts between HBase 2.2.3 and HBase 1.2.3, we need 
> to resolve this first. After resolving conflicts, I am able to compile it but 
> then I ran into a tricky jetty version issue during the testing:
> {code:java}
> [ERROR] TestHBaseIndex.testDelete()  Time elapsed: 4.705 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdate()  Time elapsed: 0.174 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdateWithRollback()  Time 
> elapsed: 0.076 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclips

[jira] [Updated] (HUDI-1180) Upgrade HBase to 2.x

2022-03-27 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-1180:

Attachment: image-2022-03-28-13-48-58-149.png

> Upgrade HBase to 2.x
> 
>
> Key: HUDI-1180
> URL: https://issues.apache.org/jira/browse/HUDI-1180
> Project: Apache Hudi
>  Issue Type: Task
>  Components: writer-core
>Affects Versions: 0.9.0
>Reporter: Wenning Ding
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: image-2022-03-28-13-48-58-149.png
>
>
> Trying to upgrade HBase to 2.3.3 but ran into several issues.
> According to the Hadoop version support matrix: 
> [http://hbase.apache.org/book.html#hadoop], also need to upgrade Hadoop to 
> 2.8.5+.
>  
> There are several API conflicts between HBase 2.2.3 and HBase 1.2.3, we need 
> to resolve this first. After resolving conflicts, I am able to compile it but 
> then I ran into a tricky jetty version issue during the testing:
> {code:java}
> [ERROR] TestHBaseIndex.testDelete()  Time elapsed: 4.705 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdate()  Time elapsed: 0.174 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdateWithRollback()  Time 
> elapsed: 0.076 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSmallBatchSize()  Time elapsed: 0.122 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTagLocationAndDuplicateUpdate()  Time elapsed: 
> 0.16 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalGetsBatching()  Time elapsed: 1.771 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalPutsBatching()  Time elapsed: 0.082 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> 34206 [Thread-260] WARN  
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner  - DirectoryScanner: 
> shutdown has been called
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager  - 
> IncrementalBlockReportManager interrupted
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.DataNode  - Ending block pool service 
> for: Block pool BP-1058834949-10.0.0.2-1597189606506 (Datanode Uuid 
> cb7bd8aa-5d79-4955-b1ec-bdaf7f1b6431) service to localhost/127.0.0.1:55924
> 34246 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data1/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 34247 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data2/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 37192 [HBase-Metrics2-1] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  
> - Cannot locate configuration: tried 
> hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
> 43904 
> [master/iad1-ws-cor-r12:0:becomeActiveMaster-SendThread(localhost:58768)] 
> WARN  org.apache.zookeeper.ClientCnxn  - Session 0x173dfeb0c8b0004 for server 
> null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.s

[jira] [Updated] (HUDI-2520) Certify sync with Hive 3

2022-03-14 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-2520:

Description: 
# When executing a CTAS statement, the query fails due to a double meta-sync 
problem: HoodieSparkSqlWriter syncs the meta first, and then 
HoodieCatalog.createHoodieTable syncs it a second time from 
HoodieStagedTable.commitStagedChanges
{code:java}
create table if not exists h3_cow using hudi partitioned by (dt) options (type 
= 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as price, 
'2021-01-03' as dt;

22/03/14 14:26:21 ERROR [main] Utils: Aborting task
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
view 'h3_cow' already exists in database 'default'
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
        at 
org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
        at 
org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481){code}

2. When truncating a partitioned table, neither metadata nor data is truncated, 
and truncating a partitioned table with a partition spec fails 
{code:java}
// truncate partition table without partition spec: the query succeeds but 
// never deletes data
spark-sql> truncate table mor_partition_table_0314;
Time taken: 0.256 seconds

// truncate partition table with partition spec, 
spark-sql> truncate table mor_partition_table_0314 partition(dt=3);
Error in query: Table spark_catalog.default.mor_partition_table_0314 does not 
support partition management.;
'TruncatePartition unresolvedpartitionspec((dt,3), None)
+- ResolvedTable org.apache.spark.sql.hudi.catalog.HoodieCatalog@63f609a4, 
default.mor_partition_table_0314,
{code}
3. Re-dropping an existing partition causes an NPE when syncing to the Hive metastore 
{code:java}
spark-sql> alter table  mor_partition_table_0314 drop partition  (dt=3);
spark-sql> alter table  mor_partition_table_0314 drop partition  (dt=3);

MetaException(message:java.lang.NullPointerException)
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$drop_partition_by_name_with_environment_context_result$drop_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java)
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$drop_partition_by_name_with_environment_context_result$drop_partition_by_name_with_environment_context_resultS

[jira] [Updated] (HUDI-2520) Certify sync with Hive 3

2022-03-14 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-2520:

Description: 
# When executing a CTAS statement, the query fails due to a double meta-sync 
problem: HoodieSparkSqlWriter syncs the meta first, and then 
HoodieCatalog.createHoodieTable syncs it a second time from 
HoodieStagedTable.commitStagedChanges
{code:java}
create table if not exists h3_cow using hudi partitioned by (dt) options (type 
= 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as price, 
'2021-01-03' as dt;

22/03/14 14:26:21 ERROR [main] Utils: Aborting task
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
view 'h3_cow' already exists in database 'default'
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
        at 
org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
        at 
org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481){code}

2. When truncating a partitioned table, neither metadata nor data is truncated, 
and truncating a partitioned table with a partition spec fails 
{code:java}
// truncate partition table without partition spec: the query succeeds but 
// never deletes data
spark-sql> truncate table mor_partition_table_0314;
Time taken: 0.256 seconds

// truncate partition table with partition spec, 
spark-sql> truncate table mor_partition_table_0314 partition(dt=3);
Error in query: Table spark_catalog.default.mor_partition_table_0314 does not 
support partition management.;
'TruncatePartition unresolvedpartitionspec((dt,3), None)
+- ResolvedTable org.apache.spark.sql.hudi.catalog.HoodieCatalog@63f609a4, 
default.mor_partition_table_0314,
{code}
 

 

  was:
# When executing a CTAS statement, the query fails due to a double meta-sync 
problem: HoodieSparkSqlWriter syncs the meta first, and then 
HoodieCatalog.createHoodieTable syncs it a second time from 
HoodieStagedTable.commitStagedChanges
{code:java}
create table if not exists h3_cow using hudi partitioned by (dt) options (type 
= 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as price, 
'2021-01-03' as dt;

22/03/14 14:26:21 ERROR [main] Utils: Aborting task
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
view 'h3_cow' already exists in database 'default'
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCo

[jira] [Updated] (HUDI-2520) Certify sync with Hive 3

2022-03-14 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-2520:

Attachment: image-2022-03-14-15-52-02-021.png

> Certify sync with Hive 3
> 
>
> Key: HUDI-2520
> URL: https://issues.apache.org/jira/browse/HUDI-2520
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive, meta-sync
>Reporter: Sagar Sumit
>Assignee: rex xiong
>Priority: Blocker
> Fix For: 0.11.0
>
> Attachments: image-2022-03-14-15-52-02-021.png
>
>
> # When executing a CTAS statement, the query fails due to a double meta-sync 
> problem: HoodieSparkSqlWriter syncs the meta first, and then 
> HoodieCatalog.createHoodieTable syncs it a second time from 
> HoodieStagedTable.commitStagedChanges
> {code:java}
> create table if not exists h3_cow using hudi partitioned by (dt) options 
> (type = 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as 
> price, '2021-01-03' as dt;
> 22/03/14 14:26:21 ERROR [main] Utils: Aborting task
> org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
> view 'h3_cow' already exists in database 'default'
>         at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
>         at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
>         at 
> org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
>         at 
> org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
>         at 
> org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
>         at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
>         at 
> org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
>         at 
> org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
>         at 
> org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
>         at 
> org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
>         at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
>         at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
>         at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
>         at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>         at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>         at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>         at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>         at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2520) Certify sync with Hive 3

2022-03-13 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-2520:

Description: 
# when executing a CTAS statement, the query fails due to a double meta-sync 
problem: HoodieSparkSqlWriter syncs the meta a first time, followed by 
HoodieCatalog.createHoodieTable syncing it a second time during 
HoodieStagedTable.commitStagedChanges (an illustrative guard sketch follows the 
stack trace below)
{code:java}
create table if not exists h3_cow using hudi partitioned by (dt) options (type 
= 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as price, 
'2021-01-03' as dt;

22/03/14 14:26:21 ERROR [main] Utils: Aborting task
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
view 'h3_cow' already exists in database 'default'
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
        at 
org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
        at 
org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)

{code}

 #  
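As a side note, one possible shape of a workaround on the catalog side is a plain 
existence check before the second creation; the sketch below is only illustrative 
(it is not the actual Hudi fix) and reuses the default.h3_cow name from the repro 
above.
{code:scala}
// hedged sketch: idempotent catalog-side creation, not the actual Hudi change.
// `qualifiedName` is hypothetical; in the repro above the table is default.h3_cow.
val qualifiedName = "default.h3_cow"

if (!spark.catalog.tableExists(qualifiedName)) {
  // first (and only) creation: register the table in the Hive metastore here
} else {
  // the table was already registered by the earlier meta sync in HoodieSparkSqlWriter,
  // so skip re-creation instead of hitting the TableAlreadyExistsException above
}
{code}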

  was:1. For CTAS statement, ah


> Certify sync with Hive 3
> 
>
> Key: HUDI-2520
> URL: https://issues.apache.org/jira/browse/HUDI-2520
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive, meta-sync
>Reporter: Sagar Sumit
>Assignee: rex xiong
>Priority: Blocker
> Fix For: 0.11.0
>
>
> # when executing a CTAS statement, the query fails due to a double meta-sync 
> problem: HoodieSparkSqlWriter syncs the meta a first time, followed by 
> HoodieCatalog.createHoodieTable syncing it a second time during 
> HoodieStagedTable.commitStagedChanges
> {code:java}
> create table if not exists h3_cow using hudi partitioned by (dt) options 
> (type = 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as 
> price, '2021-01-03' as dt;
> 22/03/14 14:26:21 ERROR [main] Utils: Aborting task
> org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
> view 'h3_cow' already exists in database 'default'
>         at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
>         at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
>         at 
> org.apache.spark.sql.hudi.catalog.HoodieCatalog.creat

[jira] [Updated] (HUDI-2520) Certify sync with Hive 3

2022-03-13 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-2520:

Description: 
# when executing a CTAS statement, the query fails due to a double meta-sync 
problem: HoodieSparkSqlWriter syncs the meta a first time, followed by 
HoodieCatalog.createHoodieTable syncing it a second time during 
HoodieStagedTable.commitStagedChanges
{code:java}
create table if not exists h3_cow using hudi partitioned by (dt) options (type 
= 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as price, 
'2021-01-03' as dt;

22/03/14 14:26:21 ERROR [main] Utils: Aborting task
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
view 'h3_cow' already exists in database 'default'
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
        at 
org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
        at 
org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
        at 
org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
        at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)

{code}

  was:
# when executing a CTAS statement, the query fails due to a double meta-sync 
problem: HoodieSparkSqlWriter syncs the meta a first time, followed by 
HoodieCatalog.createHoodieTable syncing it a second time during 
HoodieStagedTable.commitStagedChanges
{code:java}
create table if not exists h3_cow using hudi partitioned by (dt) options (type 
= 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as price, 
'2021-01-03' as dt;

22/03/14 14:26:21 ERROR [main] Utils: Aborting task
org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
view 'h3_cow' already exists in database 'default'
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
        at 
org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
        at 
org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
        at 
org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
        at 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHel

[jira] [Updated] (HUDI-2520) Certify sync with Hive 3

2022-03-13 Thread rex xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rex xiong updated HUDI-2520:

Description: 1. For CTAS statement, ah

> Certify sync with Hive 3
> 
>
> Key: HUDI-2520
> URL: https://issues.apache.org/jira/browse/HUDI-2520
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive, meta-sync
>Reporter: Sagar Sumit
>Assignee: rex xiong
>Priority: Blocker
> Fix For: 0.11.0
>
>
> 1. For CTAS statement, ah



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3605) hbase dependency need relocation or use other version for hive3

2022-03-10 Thread rex xiong (Jira)
rex xiong created HUDI-3605:
---

 Summary: hbase dependency need relocation or use other version for 
hive3
 Key: HUDI-3605
 URL: https://issues.apache.org/jira/browse/HUDI-3605
 Project: Apache Hudi
  Issue Type: Epic
Reporter: rex xiong
 Fix For: 0.11.0


Hive 3 now uses hbase version 2.0.0-alpha4, which conflicts with the hbase 
version used by hudi. See the problem below (a classpath diagnostic sketch 
follows the stack trace).
{code:java}
hive> select * from hudi_mor_part_table_spark3_0110_0309_ro;
OK
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.io.hfile.HFile.createReader(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/hbase/io/FSDataInputStreamWrapper;JLorg/apache/hadoop/hbase/io/hfile/CacheConfig;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/hbase/io/hfile/HFile$Reader;
at 
org.apache.hudi.io.storage.HoodieHFileReader.<init>(HoodieHFileReader.java:98)
at 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.deserializeRecords(HoodieHFileDataBlock.java:158)
at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.readRecordsFromBlockPayload(HoodieDataBlock.java:167)
at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordItr(HoodieDataBlock.java:125)
at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordItr(HoodieDataBlock.java:151)
at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:363)
at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:427)
at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:242)
at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:181)
at 
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:101)
at 
org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.<init>(HoodieMetadataMergedLogRecordReader.java:71)
at 
org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.<init>(HoodieMetadataMergedLogRecordReader.java:51)
at 
org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader$Builder.build(HoodieMetadataMergedLogRecordReader.java:246)
at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:379)
at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$openReadersIfNeeded$4(HoodieBackedTableMetadata.java:295)
at 
java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.openReadersIfNeeded(HoodieBackedTableMetadata.java:283)
at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:139)
at java.util.HashMap.forEach(HashMap.java:1289)
at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:138)
at 
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:128)
at 
org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:274)
{code}
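To decide between relocation and a version bump, it can help to first confirm 
which hbase jar actually provides the conflicting class; the sketch below is an 
assumption on my part (not from the original report) and can be run from any JVM 
that shares the failing classpath, e.g. a spark-shell on the same node.
{code:scala}
// hedged sketch: print the jar that actually supplies the conflicting HFile class.
val hfileClass = Class.forName("org.apache.hadoop.hbase.io.hfile.HFile")
val location = Option(hfileClass.getProtectionDomain.getCodeSource).map(_.getLocation)
// If this points at the hive3 lib directory's 2.0.0-alpha4 jar rather than the hudi
// bundle, the bundled hbase classes need relocation (or the versions must be aligned).
println(s"HFile loaded from: ${location.getOrElse("unknown (no code source)")}")
{code}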



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-2520) Certify sync with Hive 3

2022-02-23 Thread rex xiong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497117#comment-17497117
 ] 

rex xiong commented on HUDI-2520:
-

OK, I will take a look at this.

> Certify sync with Hive 3
> 
>
> Key: HUDI-2520
> URL: https://issues.apache.org/jira/browse/HUDI-2520
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive, meta-sync
>Reporter: Sagar Sumit
>Assignee: rex xiong
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)