[ https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Madav updated SPARK-20697:
-----------------------------------
    Affects Version/s: 2.2.0
                       2.2.1
                       2.3.0

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-20697
>                 URL: https://issues.apache.org/jira/browse/SPARK-20697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Abhishek Madav
>            Priority: Major
>
> MSCK REPAIR TABLE, when used to recover partitions for a partitioned+bucketed
> table, does not restore the bucketing information to the storage descriptor
> in the metastore.
> Steps to reproduce:
> 1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int)
> PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
> 2) In Hive-CLI, issue a desc formatted for the table.
> # col_name              data_type               comment
>
> a                       int
>
> # Partition Information
> # col_name              data_type               comment
>
> b                       int
>
> # Detailed Table Information
> Database:               sparkhivebucket
> Owner:                  devbld
> CreateTime:             Wed May 10 10:31:07 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               hdfs://localhost:8020/user/hive/warehouse/partbucket
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         transient_lastDdlTime   1494437467
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:             No
> Num Buckets:            10
> Bucket Columns:         [a]
> Sort Columns:           []
> Storage Desc Params:
>         field.delim             ,
>         serialization.format    ,
> 3) In spark-shell,
> scala> spark.sql("MSCK REPAIR TABLE partbucket")
> 4) Back in Hive-CLI,
> desc formatted partbucket;
> # col_name              data_type               comment
>
> a                       int
>
> # Partition Information
> # col_name              data_type               comment
>
> b                       int
>
> # Detailed Table Information
> Database:               sparkhivebucket
> Owner:                  devbld
> CreateTime:             Wed May 10 10:31:07 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         spark.sql.partitionProvider     catalog
>         transient_lastDdlTime   1494437647
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>         field.delim             ,
>         serialization.format    ,
> After this, further inserts into the table through Hive can no longer be
> made in bucketed fashion.
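
The before/after comparison in steps 2 and 4 can be automated. The following is a minimal sketch, not part of the original report: it assumes the two `desc formatted` outputs have been captured as plain strings, and the names `BucketInfoCheck`, `BucketInfo`, `parse`, and `bucketingLost` are hypothetical helpers introduced only for illustration. It uses only the Scala standard library and does not call Spark or Hive.

```scala
// Hypothetical helper (not from the report): extracts the bucketing fields
// from captured `desc formatted` text and compares two snapshots.
object BucketInfoCheck {
  final case class BucketInfo(numBuckets: Int, bucketColumns: List[String])

  // Patterns for the two storage-descriptor rows shown in the report.
  private val NumBuckets = """^Num Buckets:\s*(-?\d+)\s*$""".r
  private val BucketCols = """^Bucket Columns:\s*\[(.*)\]\s*$""".r

  // Returns None when either row is missing from the text.
  def parse(descFormatted: String): Option[BucketInfo] = {
    // Tolerate mail-style "> " quoting in front of each line.
    val lines = descFormatted.split("\n").map(_.trim.stripPrefix(">").trim)
    val num = lines.collectFirst { case NumBuckets(n) => n.toInt }
    val cols = lines.collectFirst {
      case BucketCols(c) => c.split(",").map(_.trim).filter(_.nonEmpty).toList
    }
    for (n <- num; c <- cols) yield BucketInfo(n, c)
  }

  // True when the table was bucketed before and the bucketing fields differ after.
  def bucketingLost(before: String, after: String): Boolean =
    (parse(before), parse(after)) match {
      case (Some(b), Some(a)) => b.numBuckets > 0 && a != b
      case _                  => false
    }
}
```

With the outputs from the report, `parse` on the step 2 text would yield 10 buckets on column `a`, and on the step 4 text it would yield -1 buckets with no columns, so `bucketingLost` reports the regression.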