[jira] [Commented] (SPARK-32281) Spark wipes out SORTED spec in metastore when DESC is used

2020-10-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212156#comment-17212156
 ] 

Apache Spark commented on SPARK-32281:
--------------------------------------

User 'AngersZhuuuu' has created a pull request for this issue:
https://github.com/apache/spark/pull/30011

> Spark wipes out SORTED spec in metastore when DESC is used
> ----------------------------------------------------------
>
>                 Key: SPARK-32281
>                 URL: https://issues.apache.org/jira/browse/SPARK-32281
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> When altering a Hive bucketed table or updating its statistics, Spark will 
> wipe out the SORTED specification in the metastore if the specification uses 
> DESC.
>  For example:
> {noformat}
> 0: jdbc:hive2://localhost:1> -- in beeline
> 0: jdbc:hive2://localhost:1> create table bucketed (a int, b int, c int, 
> d int) clustered by (c) sorted by (c asc, d desc) into 10 buckets;
> No rows affected (0.045 seconds)
> 0: jdbc:hive2://localhost:1> show create table bucketed;
> +-----------------------------------------------------------------------+
> |                            createtab_stmt                             |
> +-----------------------------------------------------------------------+
> | CREATE TABLE `bucketed`(                                              |
> |   `a` int,                                                            |
> |   `b` int,                                                            |
> |   `c` int,                                                            |
> |   `d` int)                                                            |
> | CLUSTERED BY (                                                        |
> |   c)                                                                  |
> | SORTED BY (                                                           |
> |   c ASC,                                                              |
> |   d DESC)                                                             |
> | INTO 10 BUCKETS                                                       |
> | ROW FORMAT SERDE                                                      |
> |   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'                |
> | STORED AS INPUTFORMAT                                                 |
> |   'org.apache.hadoop.mapred.TextInputFormat'                          |
> | OUTPUTFORMAT                                                          |
> |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'        |
> | LOCATION                                                              |
> |   'file:/Users/bruce/hadoop/apache-hive-2.3.7-bin/warehouse/bucketed' |
> | TBLPROPERTIES (                                                       |
> |   'transient_lastDdlTime'='1594488043')                               |
> +-----------------------------------------------------------------------+
> 21 rows selected (0.042 seconds)
> 0: jdbc:hive2://localhost:1> 
> -
> -
> -
> scala> // in spark
> scala> sql("alter table bucketed set tblproperties ('foo'='bar')")
> 20/07/11 10:21:36 WARN HiveConf: HiveConf of name hive.metastore.local does 
> not exist
> 20/07/11 10:21:38 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> res0: org.apache.spark.sql.DataFrame = []
> scala> 
> -
> -
> -
> 0: jdbc:hive2://localhost:1> -- back in beeline
> 0: jdbc:hive2://localhost:1> show create table bucketed;
> +-----------------------------------------------------------------------+
> |                            createtab_stmt                             |
> +-----------------------------------------------------------------------+
> | CREATE TABLE `bucketed`(                                              |
> |   `a` int,                                                            |
> |   `b` int,                                                            |
> |   `c` int,                                                            |
> |   `d` int)                                                            |
> | CLUSTERED BY (                                                        |
> |   c)                                                                  |
> | INTO 10 BUCKETS                                                       |
> | ROW FORMAT SERDE                                                      |
> |   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'                |
> | STORED AS INPUTFORMAT                                                 |
> |   'org.apache.hadoop.mapred.TextInputFormat'                          |
> | OUTPUTFORMAT                                                          |
> |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'        |
> | LOCATION                                                              |
> |   'file:/Users/bruce/hadoop/apache-hive-2.3.7-bin/warehouse/bucketed' |
> | TBLPROPERTIES (                                                       |
> |   'foo'='bar',                                                        |
> |   'spark.sql.partitionProvider'='catalog',                            |
> |   'transient_lastDdlTime'='1594488098')                               |
> +-----------------------------------------------------------------------+
> 20 rows selected (0.038 seconds)
> 0: jdbc:hive2://localhost:1> 
> {noformat}
> Note that the SORTED specification disappears.
> Another example, this time using insert:
> {noformat}
> 0: jdbc:hive2://localhost:1> -- in beeline
> 0: jdbc:hi

[jira] [Commented] (SPARK-32281) Spark wipes out SORTED spec in metastore when DESC is used

2020-07-15 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158125#comment-17158125
 ] 

Ankit Raj Boudh commented on SPARK-32281:
-----------------------------------------

[~bersprockets], I will raise a PR for this soon.
