[ https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390817#comment-17390817 ]
sivabalan narayanan edited comment on HUDI-1842 at 7/30/21, 11:57 PM:
----------------------------------------------------------------------
I was just playing around and am dumping my findings here. My intention: if we upgrade/update hoodie.properties with the appropriate entries, what does it take to start using the table from spark-sql?

I created a table via spark-shell as per the quick start, and then executed this command via spark-sql:

{code:sql}
create table hudi_cow1 (
  begin_lat double, begin_lon double, driver string,
  end_lat double, end_lon double, fare double,
  partitionpath string, rider string, ts bigint, uuid string
) using hudi
options (primaryKey = 'uuid', precombineField = 'ts')
partitioned by (partitionpath)
location 'file:///tmp/hudi_cow/';
{code}

The table name has to match the table name in hoodie.properties. Note: I created this table with latest master, so it already had all the properties required for sql even though it was created with the spark datasource.

After this, I tried to insert records via spark-sql:

{code:sql}
insert into hudi_cow1 values(1.0, 2.0, "driver_1", 3.0, 4.0, 100.0, "rider_1", 12345, "ajsdfih23498q405qtahgkfsg", "americas/united_states/san_francisco/");
{code}

I see that for the record key and partition path, the respective field names are prefixed to the column values in the meta fields. Result of the select command (showing 2 rows; row 1 was inserted via spark-shell, row 2 via spark-sql):
{code:java}
20210730180218 20210730180218_1_8 ef9f4d56-12e0-4266-91ad-c4bca0580db6 americas/united_states/san_francisco 14e81925-2479-4a57-a932-42d1078fe988-0_1-27-28_20210730180218.parquet 0.1856488085068272 0.9694586417848392 driver-213 0.38186367037201974 0.25252652214479043 33.92216483948643 rider-213 1627136598584 ef9f4d56-12e0-4266-91ad-c4bca0580db6 americas/united_states/san_francisco
20210730190704 20210730190704_0_1001 uuid:ajsdfih23498q405qtahgkfsg partitionpath=americas%2Funited_states%2Fsan_francisco%2F 9a350a54-bb5d-4aba-bf5e-bbcc665c4449-0_0-66-3383_20210730190704.parquet 1.0 2.0 driver_1 3.0 4.0 100.0 rider_1 12345 ajsdfih23498q405qtahgkfsg americas/united_states/san_francisco/
{code}
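The prefixed meta-field values in the second row look like complex-key/hive-style formatting: the record key carries a `field:value` prefix and the partition path a `field=value` prefix with the slashes URL-encoded. A minimal sketch of formatting that would reproduce those values (illustrative only, assuming this key/partition scheme; this is not Hudi's actual key-generator code):

{code:python}
from urllib.parse import quote

def record_key(field: str, value: str) -> str:
    # Prefix the record-key field name, e.g. "uuid:ajsdfih23498q405qtahgkfsg"
    return f"{field}:{value}"

def partition_path(field: str, value: str) -> str:
    # Hive-style "field=value" plus URL encoding, so "/" becomes "%2F",
    # e.g. "partitionpath=americas%2Funited_states%2Fsan_francisco%2F"
    return f"{field}={quote(value, safe='')}"

print(record_key("uuid", "ajsdfih23498q405qtahgkfsg"))
print(partition_path("partitionpath", "americas/united_states/san_francisco/"))
{code}

This matches the second row above, where the row written via the spark datasource (row 1) has the plain values instead.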
> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---------------------------------------------------
>
>                 Key: HUDI-1842
>                 URL: https://issues.apache.org/jira/browse/HUDI-1842
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: pengzhiwei
>            Priority: Blocker
>              Labels: release-blocker
>             Fix For: 0.9.0
>
> In order to support spark sql for hoodie, we persist some table properties to
> the hoodie.properties, e.g. primaryKey, preCombineField, partition columns.
> For existing hoodie tables, these properties are missing. We need to do some
> work in UpgradeDowngrade to support spark sql for existing tables.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)