[ https://issues.apache.org/jira/browse/HUDI-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan reassigned HUDI-1718:
-----------------------------------------

    Assignee: tao meng

> when querying the incremental view of a MOR table with multi-level partitions, the query fails
> -----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-1718
>                 URL: https://issues.apache.org/jira/browse/HUDI-1718
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: tao meng
>            Assignee: tao meng
>            Priority: Major
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.9.0
>
>
> HoodieCombineHiveInputFormat uses "," to join multiple partition columns, whereas Hive uses "/" to join them. This gap breaks the incremental query below, so HoodieCombineHiveInputFormat's logic needs to be modified; a minimal sketch of the gap follows.
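>
> A minimal standalone sketch of the gap (the object and method names below are illustrative, not Hudi's actual code; per the stack trace the real split happens in HoodieRealtimeRecordReaderUtils.addPartitionFields). Because the reader splits the configured partition column list on Hive's "/", a comma-joined list survives as the single token "p,p1,p2", which Avro rejects as a field name:
>
> import org.apache.avro.{Schema, SchemaBuilder}
>
> object PartitionDelimiterGap {
>   // downstream code splits the configured partition columns on Hive's "/"
>   // and appends each name to the record schema as a nullable field
>   def appendPartitionFields(partitionColumns: String): Schema =
>     partitionColumns.split("/")
>       .foldLeft(SchemaBuilder.record("row").fields()) { (fields, name) =>
>         // building the field triggers Avro's Schema.validateName(name)
>         fields.optionalString(name)
>       }
>       .endRecord()
>
>   def main(args: Array[String]): Unit = {
>     val cols = Seq("p", "p1", "p2")
>     appendPartitionFields(cols.mkString("/")) // Hive-style join: fields p, p1, p2 -- ok
>     appendPartitionFields(cols.mkString(",")) // comma join survives the split as a single
>                                               // token and throws SchemaParseException:
>                                               // Illegal character in: p,p1,p2
>   }
> }
>
> Joining with "/" in HoodieCombineHiveInputFormat, matching Hive's convention, keeps every partition column a valid Avro field name.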
>
> test env:
> spark 2.4.5, hadoop 3.1.1, hive 3.1.1
>
> step 1:
> val df = spark.range(0, 10000).toDF("keyid")
>   .withColumn("col3", expr("keyid + 10000000"))
>   .withColumn("p", lit(0))
>   .withColumn("p1", lit(0))
>   .withColumn("p2", lit(6))
>   .withColumn("a1", lit(Array[String]("sb1", "rz")))
>   .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // bulk_insert df, partitioned by p, p1, p2
> merge(df, 4, "default", "hive_8b",
>   DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
>
> step 2:
> val df = spark.range(0, 10000).toDF("keyid")
>   .withColumn("col3", expr("keyid + 10000000"))
>   .withColumn("p", lit(0))
>   .withColumn("p1", lit(0))
>   .withColumn("p2", lit(7))
>   .withColumn("a1", lit(Array[String]("sb1", "rz")))
>   .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // upsert table hive_8b
> merge(df, 4, "default", "hive_8b",
>   DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
>
> step 3:
> start hive beeline:
> set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> set hoodie.hive_8b.consume.mode=INCREMENTAL;
> set hoodie.hive_8b.consume.max.commits=3;
> set hoodie.hive_8b.consume.start.timestamp=20210325141300;
> -- this timestamp is smaller than the earliest commit, so the query covers all commits
> select `p`, `p1`, `p2`, `keyid` from hive_8b_rt where `_hoodie_commit_time` > '20210325141300';
>
> The query fails with:
>
> 2021-03-25 14:14:36,036 | INFO | AsyncDispatcher event handler | Diagnostics report from attempt_1615883368881_0028_m_000000_3: Error: org.apache.hudi.org.apache.avro.SchemaParseException: Illegal character in: p,p1,p2
>     at org.apache.hudi.org.apache.avro.Schema.validateName(Schema.java:1151)
>     at org.apache.hudi.org.apache.avro.Schema.access$200(Schema.java:81)
>     at org.apache.hudi.org.apache.avro.Schema$Field.<init>(Schema.java:403)
>     at org.apache.hudi.org.apache.avro.Schema$Field.<init>(Schema.java:396)
>     at org.apache.hudi.avro.HoodieAvroUtils.appendNullSchemaFields(HoodieAvroUtils.java:268)
>     at org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.addPartitionFields(HoodieRealtimeRecordReaderUtils.java:286)
>     at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:98)
>     at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
>     at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:53)
>     at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>     at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>     at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:123)
>     at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:975)
>     at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:556)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>     at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:183)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:177)


--
This message was sent by Atlassian Jira
(v8.3.4#803005)