[ https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390817#comment-17390817 ]
sivabalan narayanan edited comment on HUDI-1842 at 7/30/21, 11:57 PM:
----------------------------------------------------------------------
I was just playing around and am dumping my findings here. My intention: if we upgrade/update hoodie.properties with the appropriate entries, what does it take to start using the table from spark-sql?

I created a table via spark-shell as per the quick start, and then executed this command via spark-sql:

{code:sql}
create table hudi_cow1 (
  begin_lat double, begin_lon double, driver string,
  end_lat double, end_lon double, fare double,
  partitionpath string, rider string, ts bigint, uuid string
) using hudi
options (primaryKey = 'uuid', precombineField = 'ts')
partitioned by (partitionpath)
location 'file:///tmp/hudi_cow/';
{code}

The table name has to match the table name in hoodie.properties. Note: I created this table with latest master, so it already had all the properties required for sql even though it was created with the spark datasource.

After this, I tried to insert records via spark-sql:

{code:sql}
insert into hudi_cow1 values(1.0, 2.0, "driver_1", 3.0, 4.0, 100.0, "rider_1", 12345, "ajsdfih23498q405qtahgkfsg", "americas/united_states/san_francisco/");
{code}

I see that for the record key and partition path, the respective field names are prefixed to the column values in the meta fields. Result of the select command (showing 2 rows; row 1 was inserted via spark-shell, row 2 via spark-sql):
{code:java}
20210730180218 20210730180218_1_8 ef9f4d56-12e0-4266-91ad-c4bca0580db6 americas/united_states/san_francisco 14e81925-2479-4a57-a932-42d1078fe988-0_1-27-28_20210730180218.parquet 0.1856488085068272 0.9694586417848392 driver-213 0.38186367037201974 0.25252652214479043 33.92216483948643 rider-213 1627136598584 ef9f4d56-12e0-4266-91ad-c4bca0580db6 americas/united_states/san_francisco
20210730190704 20210730190704_0_1001 uuid:ajsdfih23498q405qtahgkfsg partitionpath=americas%2Funited_states%2Fsan_francisco%2F 9a350a54-bb5d-4aba-bf5e-bbcc665c4449-0_0-66-3383_20210730190704.parquet 1.0 2.0 driver_1 3.0 4.0 100.0 rider_1 12345 ajsdfih23498q405qtahgkfsg americas/united_states/san_francisco/
{code}
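The prefixed meta-field values in the second row look like complex-key/hive-style formatting: the record key carries a `field:value` prefix and the partition path a `field=value` prefix with the slashes URL-encoded. A minimal sketch of formatting that would reproduce those values (illustrative only, assuming this key/partition scheme; this is not Hudi's actual key-generator code):

{code:python}
from urllib.parse import quote

def record_key(field: str, value: str) -> str:
    # Prefix the record-key field name, e.g. "uuid:ajsdfih23498q405qtahgkfsg"
    return f"{field}:{value}"

def partition_path(field: str, value: str) -> str:
    # Hive-style "field=value" plus URL encoding, so "/" becomes "%2F",
    # e.g. "partitionpath=americas%2Funited_states%2Fsan_francisco%2F"
    return f"{field}={quote(value, safe='')}"

print(record_key("uuid", "ajsdfih23498q405qtahgkfsg"))
print(partition_path("partitionpath", "americas/united_states/san_francisco/"))
{code}

This matches the second row above, where the row written via the spark datasource (row 1) has the plain values instead.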
> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---------------------------------------------------
>
>                 Key: HUDI-1842
>                 URL: https://issues.apache.org/jira/browse/HUDI-1842
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: pengzhiwei
>            Priority: Blocker
>              Labels: release-blocker
>             Fix For: 0.9.0
>
> In order to support spark sql for hoodie, we persist some table properties to
> the hoodie.properties, e.g. primaryKey, preCombineField, partition columns.
> For existing hoodie tables, these properties are missing. We need to do some
> work in UpgradeDowngrade to support spark sql for existing tables.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)