[ https://issues.apache.org/jira/browse/HUDI-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sagar Sumit updated HUDI-7964:
------------------------------
    Attachment: Screenshot 2024-07-11 at 5.43.41 PM.png

> Partitions not created correctly with SQL when multiple partitions specified out of order
> -----------------------------------------------------------------------------------------
>
>                 Key: HUDI-7964
>                 URL: https://issues.apache.org/jira/browse/HUDI-7964
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Sagar Sumit
>            Priority: Major
>              Labels: spark-sql
>             Fix For: 1.0.0
>
>         Attachments: Screenshot 2024-07-06 at 11.34.17 AM.png, Screenshot 2024-07-11 at 5.43.41 PM.png
>
>
> When multiple partition columns are specified out of order (as compared to the
> order of fields in the create table command), the partitioning on storage is
> incorrect. Test script (notice that the create table and insert into commands
> have city and then state, while the partitioned by clause has state first and
> then city):
> {code:java}
> DROP TABLE IF EXISTS hudi_table_mlp;
> CREATE TABLE hudi_table_mlp (
>   ts BIGINT,
>   id STRING,
>   rider STRING,
>   driver STRING,
>   fare DOUBLE,
>   city STRING,
>   state STRING
> ) USING HUDI
> options(
>   primaryKey = 'id',
>   preCombineField = 'ts',
>   hoodie.metadata.record.index.enable = 'true'
> )
> PARTITIONED BY (state, city)
> location 'file:///tmp/hudi_table_mlp';
> INSERT INTO hudi_table_mlp VALUES (1695159649,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco','california');
> INSERT INTO hudi_table_mlp VALUES (1695091554,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'sunnyvale','california');
> INSERT INTO hudi_table_mlp VALUES (1695332066,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'austin','texas');
> INSERT INTO hudi_table_mlp VALUES (1695516137,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'houston','texas');
> {code}
> This creates partitions as follows (note that city and state values are
> swapped):
> !Screenshot 2024-07-06 at 11.34.17 AM.png!
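One possible reading of the swap, which the report itself does not confirm: the values in each INSERT are matched to columns by position, against a layout in which the partition columns come last in PARTITIONED BY order (state, city) rather than in declared order (city, state). The sketch below is plain Python, not Hudi or Spark code; `positional_layout` and `partition_values` are hypothetical helpers that only model that assumption, using the third row from the test script.

```python
# Hypothetical model of the symptom; this is NOT Hudi or Spark code.
# Assumption (not confirmed by the report): INSERT values are matched by
# position against a layout where partition columns are moved to the end,
# in PARTITIONED BY order.

declared = ["ts", "id", "rider", "driver", "fare", "city", "state"]
partition_by = ["state", "city"]


def positional_layout(declared_cols, partition_cols):
    """Data columns in declared order, then partition columns in clause order."""
    data_cols = [c for c in declared_cols if c not in partition_cols]
    return data_cols + list(partition_cols)


def partition_values(row, layout, partition_cols):
    """Map row values onto `layout` by position and pick the partition columns."""
    named = dict(zip(layout, row))
    return {c: named[c] for c in partition_cols}


# Third row of the test script, written in declared order (city, then state).
row = (1695332066, "1dced545-862b-4ceb-8b43-d2a568f6616b",
       "rider-E", "driver-O", 93.50, "austin", "texas")

# What the user intends: values line up with the declared column order.
intended = partition_values(row, declared, partition_by)

# What the assumed positional mapping would produce instead.
assumed = partition_values(row, positional_layout(declared, partition_by), partition_by)

print("intended:", intended)
print("assumed: ", assumed)
```

Under this assumption the row would land in a partition whose state value is 'austin' and whose city value is 'texas', which would match the swapped values described in the report.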
> Now, if I query with a state='texas' filter, there are no results:
> {code:java}
> spark-sql> select * from hudi_table_mlp where state='texas';
> 24/07/06 11:30:36 INFO HoodieFileIndex: Using provided predicates to prune number of target table's partitions scanned from 4 to 0
> Time taken: 0.056 seconds
> {code}
> I have tested this with master, 0.15.0 and 0.14.1, so it's not a recent
> regression.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)