Vinaykumar Bhat created HUDI-7580:
-------------------------------------

             Summary: Inserting rows into partitioned table leads to data sanity issues
                 Key: HUDI-7580
                 URL: https://issues.apache.org/jira/browse/HUDI-7580
             Project: Apache Hudi
          Issue Type: Bug
    Affects Versions: 1.0.0-beta1
            Reporter: Vinaykumar Bhat

Came across this behaviour of partitioned tables while debugging an unrelated issue with the functional index. It seems that the column ordering gets mixed up when inserting records into a Hudi table whose partition column is not the last column in the schema. As a result, a subsequent query returns wrong results.

An example follows, written as a Scala test:
{code:java}
test("Test Create Functional Index") {
  if (HoodieSparkUtils.gteqSpark3_2) {
    withTempDir { tmp =>
      val tableType = "cow"
      val tableName = "rides"
      val basePath = s"${tmp.getCanonicalPath}/$tableName"
      spark.sql("set hoodie.metadata.enable=true")
      spark.sql(
        s"""
           |create table $tableName (
           |  id int,
           |  name string,
           |  price int,
           |  ts long
           |) using hudi
           | options (
           |  primaryKey ='id',
           |  type = '$tableType',
           |  preCombineField = 'ts',
           |  hoodie.metadata.record.index.enable = 'true',
           |  hoodie.datasource.write.recordkey.field = 'id'
           | )
           | partitioned by(price)
           | location '$basePath'
       """.stripMargin)
      spark.sql(s"insert into $tableName (id, name, price, ts) values(1, 'a1', 10, 1000)")
      spark.sql(s"insert into $tableName (id, name, price, ts) values(2, 'a2', 100, 200000)")
      spark.sql(s"insert into $tableName (id, name, price, ts) values(3, 'a3', 1000, 2000000000)")
      spark.sql(s"select id, name, price, ts from $tableName").show(false)
    }
  }
}
{code}

The query returns the following result (note how the price and ts columns are mixed up):
{code:java}
+---+----+----------+----+
|id |name|price     |ts  |
+---+----+----------+----+
|3  |a3  |2000000000|1000|
|2  |a2  |200000    |100 |
|1  |a1  |1000      |10  |
+---+----+----------+----+
{code}

Having the partition column as the last column in the schema does not cause this problem. If the mixed-up columns are of incompatible datatypes, the insert fails with an error.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)