[ https://issues.apache.org/jira/browse/HUDI-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6877:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix unqualified namespace issues in Spark3.1
> --------------------------------------------
>
>                 Key: HUDI-6877
>                 URL: https://issues.apache.org/jira/browse/HUDI-6877
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>
> Spark3.1 uses Avro 1.8.2, where Avro schema resolution is strictly matched
> for any type that is allowed to have a defined namespace, i.e. fields are
> resolved using their fully qualified name.
>
> This means that namespaces must match between the reader and writer schemas.
> However, when ALTER-TABLE-NAME-DDL is performed, the tableName in
> _hoodie.properties_ is changed. Because the namespace is derived from the
> table name, the Avro schema generated from the requiredSchema struct is hence
> different for the reader and the writer (although the field names and types
> are the same).
>
> This leads to read errors when log files already exist at the time
> ALTER-TABLE-NAME-DDL is performed.
>
> {code:java}
> test("Test rename table") {
>   withTempDir { tmp =>
>     // Create table with INMEMORY index to generate a log-only MOR table.
>     val tableName = generateTableName
>     spark.sql(
>       s"""
>          |create table $tableName (
>          |  id int,
>          |  name string,
>          |  price decimal(20,0),
>          |  ts long
>          |) using hudi
>          | location '${tmp.getCanonicalPath}'
>          | tblproperties (
>          |  primaryKey ='id',
>          |  type = 'mor',
>          |  preCombineField = 'ts',
>          |  hoodie.index.type = 'INMEMORY',
>          |  hoodie.compact.inline = 'true'
>          | )
>        """.stripMargin)
>     // Write log files under the original table name.
>     spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
>     // Rename the table, then write and read under the new name.
>     spark.sql(s"ALTER TABLE $tableName rename to h0NewTableName")
>     spark.sql(s"insert into h0NewTableName values(2, 'a1', 10, 1001),(2, 'a2', 10, 1000),(3, 'a3', 10, 1000)")
>     spark.sql(s"select id, name, price, ts from h0NewTableName order by id").show(false)
>   }
> } {code}
>
> Spark3.2 does not have this issue as it uses Avro 1.10.2, where Avro schema
> resolution resolves fields using their unqualified name.
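
The namespace mismatch described above can be illustrated with Avro's `SchemaBuilder` alone. This is only a minimal sketch, not Hudi's actual schema generation: the helper `fixedFor`, the `hoodie.<tableName>` namespace pattern, the type name `fixed`, the size of 9 bytes and the original table name `h0` are all assumptions made for illustration. It shows that two named types with the same unqualified name but namespaces built from different table names have different fully qualified names, which is what a strictly matching resolver compares.

```scala
import org.apache.avro.{Schema, SchemaBuilder}

object NamespaceSketch {
  // Hypothetical helper (not a Hudi API): builds a fixed type whose
  // namespace embeds the table name. The naming pattern is an assumption.
  def fixedFor(tableName: String): Schema =
    SchemaBuilder.fixed("fixed").namespace(s"hoodie.$tableName").size(9)

  def main(args: Array[String]): Unit = {
    val writer = fixedFor("h0")             // schema written before the rename
    val reader = fixedFor("h0NewTableName") // schema generated after the rename

    println(s"writer full name: ${writer.getFullName}")
    println(s"reader full name: ${reader.getFullName}")
    // The unqualified names agree, the fully qualified names do not.
    println(s"unqualified names match: ${writer.getName == reader.getName}")
    println(s"full names match: ${writer.getFullName == reader.getFullName}")
  }
}
```

A resolver that compares only `getName` would treat the two types as the same, while one that compares `getFullName` would not, which corresponds to the difference in behaviour reported between the two Avro versions.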