[jira] [Comment Edited] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

2021-07-30 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390817#comment-17390817
 ] 

sivabalan narayanan edited comment on HUDI-1842 at 7/30/21, 11:57 PM:
--

I was just playing around and am dumping my findings here. My intention: if we 
upgrade/update hoodie.properties with the appropriate entries, what does it take 
to start using the existing table from spark-sql?

 

I tried creating a table via spark-shell as per the quick start, and then executed 
this command via spark-sql:

{code:sql}
create table hudi_cow1 (
  begin_lat double, begin_lon double, driver string,
  end_lat double, end_lon double, fare double,
  partitionpath string, rider string, ts bigint, uuid string
) using hudi
options (primaryKey = 'uuid', precombineField = 'ts')
partitioned by (partitionpath)
location 'file:///tmp/hudi_cow/';
{code}
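
For reference, the spark-shell creation step followed the quick start flow. The snippet below is only a rough sketch of that write, assuming the DataGenerator-based quick start utilities and the 0.8-era DataSourceWriteOptions constants; the table name and base path are inferred from the SQL statements in this comment:

{code:scala}
// Rough sketch of the quick-start style spark-shell write (assumptions noted above).
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.SaveMode._
import scala.collection.JavaConversions._

val tableName = "hudi_cow1"            // must match hoodie.table.name in hoodie.properties
val basePath  = "file:///tmp/hudi_cow" // same location used in the create table above
val dataGen   = new DataGenerator

// Generate a handful of sample trips and write them through the Spark datasource.
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  mode(Overwrite).
  save(basePath)
{code}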

Note that the table name has to match the table name recorded in hoodie.properties. 

Note: I created this table with the latest master, so it already had all the 
properties required for SQL even though it was created via the Spark datasource writer. 
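
A quick way to double-check what name (and which other entries) hoodie.properties actually records, sketched here assuming the HoodieTableMetaClient builder API from spark-shell:

{code:scala}
import org.apache.hudi.common.table.HoodieTableMetaClient

// Load the table's metadata (reads .hoodie/hoodie.properties under the base path).
val metaClient = HoodieTableMetaClient.builder()
  .setConf(spark.sparkContext.hadoopConfiguration)
  .setBasePath("file:///tmp/hudi_cow")
  .build()

// The spark-sql table name must match this value.
println(metaClient.getTableConfig.getTableName)
{code}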

 

After this, I tried to insert records via spark-sql

{code:sql}
insert into hudi_cow1 values(1.0, 2.0, "driver_1", 3.0, 4.0, 100.0, "rider_1",
  12345, "ajsdfih23498q405qtahgkfsg", "americas/united_states/san_francisco/");
{code}

 

I see that for the record key and partition path meta fields, the respective field 
names are prefixed to the column values (e.g. uuid:<value> and partitionpath=<url-encoded path>). 

Result of the select command: 

// showing 2 rows: the first row was inserted via spark-shell and the second row 
// via spark-sql. 
{code:java}
20210730180218  20210730180218_1_8  ef9f4d56-12e0-4266-91ad-c4bca0580db6  americas/united_states/san_francisco
  14e81925-2479-4a57-a932-42d1078fe988-0_1-27-28_20210730180218.parquet
  0.1856488085068272  0.9694586417848392  driver-213  0.38186367037201974  0.25252652214479043
  33.92216483948643  rider-213  1627136598584  ef9f4d56-12e0-4266-91ad-c4bca0580db6  americas/united_states/san_francisco

20210730190704  20210730190704_0_1001  uuid:ajsdfih23498q405qtahgkfsg  partitionpath=americas%2Funited_states%2Fsan_francisco%2F
  9a350a54-bb5d-4aba-bf5e-bbcc665c4449-0_0-66-3383_20210730190704.parquet
  1.0  2.0  driver_1  3.0  4.0  100.0  rider_1  12345  ajsdfih23498q405qtahgkfsg  americas/united_states/san_francisco/
{code}
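
To see the prefixing directly, the Hudi meta columns can be projected next to the data columns from spark-shell. A minimal sketch (on some builds the partition glob, e.g. basePath + "/*/*/*/*", may still be needed when loading):

{code:scala}
// Compare the meta fields of the two rows: the spark-sql row shows up as
// _hoodie_record_key = "uuid:ajsdfih..." and _hoodie_partition_path = "partitionpath=americas%2F...".
val hudiDf = spark.read.format("hudi").load("file:///tmp/hudi_cow")
hudiDf.select("_hoodie_record_key", "_hoodie_partition_path", "uuid", "partitionpath")
  .show(false)
{code}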
 

 

 


> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---
>
> Key: HUDI-1842
> URL: https://issues.apache.org/jira/browse/HUDI-1842
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 0.9.0
>
>
> In order to support spark sql for hoodie, we persist some table properties to 
> hoodie.properties, e.g. primaryKey, preCombineField, and the partition columns. 
> For existing hoodie tables, these properties are missing. We need to add some 
> code in UpgradeDowngrade to support spark sql for the existing tables.


--
This message was sent by Atlassian Jira