[ 
https://issues.apache.org/jira/browse/HUDI-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7964:
------------------------------
    Attachment: Screenshot 2024-07-11 at 5.43.41 PM.png

> Partitions not created correctly with SQL when multiple partitions specified 
> out of order
> -----------------------------------------------------------------------------------------
>
>                 Key: HUDI-7964
>                 URL: https://issues.apache.org/jira/browse/HUDI-7964
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Sagar Sumit
>            Priority: Major
>              Labels: spark-sql
>             Fix For: 1.0.0
>
>         Attachments: Screenshot 2024-07-06 at 11.34.17 AM.png, Screenshot 
> 2024-07-11 at 5.43.41 PM.png
>
>
> When multiple partition columns are specified in a different order than the 
> fields in the CREATE TABLE command, the partitioning on storage is 
> incorrect. Test script (note that the CREATE TABLE column list and the 
> INSERT INTO values have city before state, while the PARTITIONED BY clause 
> has state before city):
> {code:java}
> DROP TABLE IF EXISTS hudi_table_mlp;
> CREATE TABLE hudi_table_mlp (
>     ts BIGINT,
>     id STRING,
>     rider STRING,
>     driver STRING,
>     fare DOUBLE,
>     city STRING,
>     state STRING
> ) USING HUDI
> OPTIONS (
>     primaryKey = 'id',
>     preCombineField = 'ts',
>     hoodie.metadata.record.index.enable = 'true'
> )
> PARTITIONED BY (state, city)
> LOCATION 'file:///tmp/hudi_table_mlp';
> INSERT INTO hudi_table_mlp VALUES
> (1695159649,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco','california');
> INSERT INTO hudi_table_mlp VALUES
> (1695091554,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'sunnyvale','california');
> INSERT INTO hudi_table_mlp VALUES
> (1695332066,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'austin','texas');
> INSERT INTO hudi_table_mlp VALUES
> (1695516137,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'houston','texas');
>  {code}
> This creates partitions as follows (note that the city and state values are 
> swapped):
> !Screenshot 2024-07-06 at 11.34.17 AM.png!
> Now, if I query with a state='texas' filter, no results are returned:
> {code:java}
> spark-sql> select * from hudi_table_mlp where state='texas';
> 24/07/06 11:30:36 INFO HoodieFileIndex: Using provided predicates to prune 
> number of target table's partitions scanned from 4 to 0
> Time taken: 0.056 seconds
> {code}
> I have tested this with master, 0.15.0 and 0.14.1, so it's not a recent 
> regression.
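> One possible way to confirm the on-storage layout without the screenshot, 
> offered only as a sketch: list the table's partitions and compare them with 
> the column values read back. This assumes SHOW PARTITIONS is available in 
> the same Spark SQL session; the last query assumes the swap described above 
> and is a hypothesis check, not a confirmed result. No output is shown 
> because none was captured in this report.
> {code:sql}
> -- List the partition paths registered for the table.
> SHOW PARTITIONS hudi_table_mlp;
> -- Read back the state and city values stored for each record.
> SELECT id, state, city FROM hudi_table_mlp ORDER BY id;
> -- Hypothesis check: filter the state column on a city value.
> SELECT * FROM hudi_table_mlp WHERE state = 'houston';
> {code}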
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
