[jira] [Commented] (SPARK-32281) Spark wipes out SORTED spec in metastore when DESC is used
[ https://issues.apache.org/jira/browse/SPARK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212156#comment-17212156 ]

Apache Spark commented on SPARK-32281:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30011

> Spark wipes out SORTED spec in metastore when DESC is used
> ----------------------------------------------------------
>
> Key: SPARK-32281
> URL: https://issues.apache.org/jira/browse/SPARK-32281
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Bruce Robbins
> Priority: Major
>
> When altering a Hive bucketed table or updating its statistics, Spark will
> wipe out the SORTED specification in the metastore if the specification uses
> DESC.
> For example:
> {noformat}
> 0: jdbc:hive2://localhost:1> -- in beeline
> 0: jdbc:hive2://localhost:1> create table bucketed (a int, b int, c int,
> d int) clustered by (c) sorted by (c asc, d desc) into 10 buckets;
> No rows affected (0.045 seconds)
> 0: jdbc:hive2://localhost:1> show create table bucketed;
> ++
> | createtab_stmt |
> ++
> | CREATE TABLE `bucketed`( |
> | `a` int, |
> | `b` int, |
> | `c` int, |
> | `d` int) |
> | CLUSTERED BY ( |
> | c) |
> | SORTED BY ( |
> | c ASC, |
> | d DESC) |
> | INTO 10 BUCKETS |
> | ROW FORMAT SERDE |
> | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
> | STORED AS INPUTFORMAT |
> | 'org.apache.hadoop.mapred.TextInputFormat' |
> | OUTPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
> | LOCATION |
> | 'file:/Users/bruce/hadoop/apache-hive-2.3.7-bin/warehouse/bucketed' |
> | TBLPROPERTIES ( |
> | 'transient_lastDdlTime'='1594488043') |
> ++
> 21 rows selected (0.042 seconds)
> 0: jdbc:hive2://localhost:1>
> -
> -
> -
> scala> // in spark
> scala> sql("alter table bucketed set tblproperties ('foo'='bar')")
> 20/07/11 10:21:36 WARN HiveConf: HiveConf of name hive.metastore.local does
> not exist
> 20/07/11 10:21:38 WARN SessionState: METASTORE_FILTER_HOOK will be ignored,
> since hive.security.authorization.manager is set to instance of
> HiveAuthorizerFactory.
> res0: org.apache.spark.sql.DataFrame = []
> scala>
> -
> -
> -
> 0: jdbc:hive2://localhost:1> -- back in beeline
> 0: jdbc:hive2://localhost:1> show create table bucketed;
> ++
> | createtab_stmt |
> ++
> | CREATE TABLE `bucketed`( |
> | `a` int, |
> | `b` int, |
> | `c` int, |
> | `d` int) |
> | CLUSTERED BY ( |
> | c) |
> | INTO 10 BUCKETS |
> | ROW FORMAT SERDE |
> | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
> | STORED AS INPUTFORMAT |
> | 'org.apache.hadoop.mapred.TextInputFormat' |
> | OUTPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
> | LOCATION |
> | 'file:/Users/bruce/hadoop/apache-hive-2.3.7-bin/warehouse/bucketed' |
> | TBLPROPERTIES ( |
> | 'foo'='bar', |
> | 'spark.sql.partitionProvider'='catalog', |
> | 'transient_lastDdlTime'='1594488098') |
> ++
> 20 rows selected (0.038 seconds)
> 0: jdbc:hive2://localhost:1>
> {noformat}
> Note that the SORTED specification disappears.
> Another example, this time using insert:
> {noformat}
> 0: jdbc:hive2://localhost:1> -- in beeline
> 0: jdbc:hi
[jira] [Commented] (SPARK-32281) Spark wipes out SORTED spec in metastore when DESC is used
[ https://issues.apache.org/jira/browse/SPARK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158125#comment-17158125 ]

Ankit Raj Boudh commented on SPARK-32281:
-----------------------------------------

[~bersprockets], I will raise PR for this soon.