[ 
https://issues.apache.org/jira/browse/HUDI-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7964:
-------------------------------

    Assignee: Sagar Sumit

> Partitions not created correctly with SQL when multiple partitions specified 
> out of order
> -----------------------------------------------------------------------------------------
>
>                 Key: HUDI-7964
>                 URL: https://issues.apache.org/jira/browse/HUDI-7964
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Major
>              Labels: pull-request-available, spark-sql
>             Fix For: 1.0.0
>
>         Attachments: Screenshot 2024-07-06 at 11.34.17 AM.png, Screenshot 
> 2024-07-11 at 5.43.41 PM.png
>
>
> When multiple partition columns are specified out of order (compared to the 
> order of fields in the create table command), the partitioning on storage is 
> incorrect. Test script (note that the create table and insert into commands 
> list city and then state, while the partitioned by clause has state first and 
> then city):
> {code:sql}
> DROP TABLE IF EXISTS hudi_table_mlp;
> CREATE TABLE hudi_table_mlp (
>   ts BIGINT,
>   id STRING,
>   rider STRING,
>   driver STRING,
>   fare DOUBLE,
>   city STRING,
>   state STRING)
> USING HUDI OPTIONS (
>   primaryKey = 'id',
>   preCombineField = 'ts')
> PARTITIONED BY (state, city)
> LOCATION 'file:///tmp/hudi_table_mlp';
> INSERT INTO hudi_table_mlp VALUES 
> (1695159649,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco','california');
> INSERT INTO hudi_table_mlp VALUES 
> (1695091554,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'sunnyvale','california');
> INSERT INTO hudi_table_mlp VALUES 
> (1695332066,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'austin','texas');
> INSERT INTO hudi_table_mlp VALUES 
> (1695516137,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'houston','texas');
>  {code}
> This creates partitions as follows (note that the city and state values are 
> swapped):
> !Screenshot 2024-07-11 at 5.43.41 PM.png|width=737,height=335!
> Now, if I query with a state='texas' filter, there are no results:
> {code:sql}
> spark-sql> select * from hudi_table_mlp where state='texas'; -- no results --
> Time taken: 0.356 seconds
> {code}
> I have tested this with master, 0.15.0 and 0.14.1, so it's not a recent 
> regression.
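> To double-check the swap from SQL rather than from the storage listing, the 
> statements below are a minimal diagnostic sketch. They only use the table from 
> the test script above, and expected output is deliberately not shown.
> {code:sql}
> -- Column order as recorded in the catalog
> DESCRIBE TABLE hudi_table_mlp;
> -- Partition paths registered for the table
> SHOW PARTITIONS hudi_table_mlp;
> -- Values stored in the two partition columns for each inserted row
> SELECT id, city, state FROM hudi_table_mlp ORDER BY id;
> {code}
> If the values are swapped as in the screenshot, the city column would be 
> expected to hold state names and vice versa, but that is an assumption to 
> verify rather than a confirmed result.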
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
