[ https://issues.apache.org/jira/browse/SPARK-46477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Sharma updated SPARK-46477:
----------------------------------
Description: 

Presto fails to read a partition of a Hive table that was updated by Spark SQL, with the following error:
{noformat}
Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
{noformat}
The Spark SQL statement that causes the read failure in Presto:
{noformat}
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') SET LOCATION 'file:///tmp/location=nk1';
{noformat}
*Root Cause*

{{ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "loc";}} drops the bucket-column information in the Hive Metastore (HMS).

Repro Script
{code:java}
CREATE TABLE bucketdb.dept_part_buk (
  deptno INT,
  dname STRING)
PARTITIONED BY (location STRING)
CLUSTERED BY (deptno) INTO 2 BUCKETS
STORED AS textfile
LOCATION 'file:///tmp';

ALTER TABLE bucketdb.dept_part_buk ADD IF NOT EXISTS PARTITION (location='nk')
  LOCATION 'file:///tmp/location=nk';

ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk')
  SET LOCATION 'file:///tmp/location=nk1';
{code}
Investigation

HMS state before running the query. Note {{NUM_BUCKETS = 2}} in SDS and the {{deptno}} row in BUCKETING_COLS:
{noformat}
mysql> select * from SDS where SD_ID = 137;
+-------+-------+------------------------------------------+---------------+---------------------------+-----------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION              | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+-----------------------+-------------+------------------------------------------------------------+----------+
|   137 |   106 | org.apache.hadoop.mapred.TextInputFormat | 0x00          | 0x00                      | file:/tmp/location=nk | 2           | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |      137 |
+-------+-------+------------------------------------------+---------------+---------------------------+-----------------------+-------------+------------------------------------------------------------+----------+

mysql> select * from BUCKETING_COLS where SD_ID = 137;
+-------+-----------------+-------------+
| SD_ID | BUCKET_COL_NAME | INTEGER_IDX |
+-------+-----------------+-------------+
|   137 | deptno          |           0 |
+-------+-----------------+-------------+
{noformat}
Spark SQL query:
{noformat}
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') SET LOCATION 'file:///Users/someuser/sparkdata/location=nk1';
{noformat}
HMS state after running the Spark SQL query. {{NUM_BUCKETS}} has been reset to 0 and the bucket-column entry is gone:
{noformat}
mysql> select * from SDS where SD_ID = 137;
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION               | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------+-------------+------------------------------------------------------------+----------+
|   137 |   106 | org.apache.hadoop.mapred.TextInputFormat | 0x00          | 0x00                      | file:/tmp/location=nk1 | 0           | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |      137 |
+-------+-------+------------------------------------------+---------------+---------------------------+------------------------+-------------+------------------------------------------------------------+----------+

mysql> select * from BUCKETING_COLS where SD_ID = 141;
Empty set (0.00 sec)
{noformat}
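The metastore-level effect can be sketched with a small model (illustrative only: the field names follow the Hive Thrift {{StorageDescriptor}} schema, and {{set_partition_location}} / {{buggy_set_partition_location}} are hypothetical helpers, not Spark's actual code path). A correct {{SET LOCATION}} copies the existing storage descriptor and changes only the location; the observed bug behaves as if the descriptor were rebuilt without the bucketing fields:

```python
from dataclasses import dataclass, replace
from typing import List

# Simplified model of a Hive Metastore StorageDescriptor.
# Field names mirror the Thrift schema (location, numBuckets, bucketCols).
@dataclass(frozen=True)
class StorageDescriptor:
    location: str
    num_buckets: int
    bucket_cols: List[str]

def set_partition_location(sd: StorageDescriptor, new_location: str) -> StorageDescriptor:
    """Correct SET LOCATION: copy every field, change only the location."""
    return replace(sd, location=new_location)

def buggy_set_partition_location(sd: StorageDescriptor, new_location: str) -> StorageDescriptor:
    """Models the observed bug: the descriptor is effectively rebuilt,
    so numBuckets falls back to 0 and bucketCols is lost."""
    return StorageDescriptor(location=new_location, num_buckets=0, bucket_cols=[])

# Partition state as seen in the HMS tables above (SD_ID = 137).
before = StorageDescriptor("file:/tmp/location=nk", num_buckets=2, bucket_cols=["deptno"])

good = set_partition_location(before, "file:/tmp/location=nk1")
bad = buggy_set_partition_location(before, "file:/tmp/location=nk1")
```

Presto's check that a partition of a bucketed table is itself bucketed fails exactly on the {{bad}} state: location updated, but {{num_buckets}} zeroed and {{bucket_cols}} empty.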
> Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-46477
>                 URL: https://issues.apache.org/jira/browse/SPARK-46477
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: Ashish Sharma
>            Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)