Abhishek Madav created SPARK-20697:
--------------------------------------

             Summary: MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
                 Key: SPARK-20697
                 URL: https://issues.apache.org/jira/browse/SPARK-20697
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Abhishek Madav


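Running MSCK REPAIR TABLE from Spark to recover the partitions of a partitioned and bucketed Hive table resets the bucketing information in the table's storage descriptor in the metastore: after the repair, Num Buckets changes from 10 to -1 and Bucket Columns changes from [a] to [].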

Steps to reproduce:
1) Create a partitioned+bucketed table in Hive:

CREATE TABLE partbucket(a int) PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

2) In the Hive CLI, describe the table:

desc formatted partbucket;

# col_name              data_type               comment             
                 
a                       int                                         
                 
# Partition Information          
# col_name              data_type               comment             
                 
b                       int                                         
                 
# Detailed Table Information             
Database:               sparkhivebucket          
Owner:                  devbld                   
CreateTime:             Wed May 10 10:31:07 PDT 2017     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://localhost:8020/user/hive/warehouse/partbucket 
Table Type:             MANAGED_TABLE            
Table Parameters:                
        transient_lastDdlTime   1494437467          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No                       
Num Buckets:            10                       
Bucket Columns:         [a]                      
Sort Columns:           []                       
Storage Desc Params:             
        field.delim             ,                   
        serialization.format    , 

3) In spark-shell, run the repair:

scala> spark.sql("MSCK REPAIR TABLE partbucket")

4) Back in the Hive CLI, describe the table again:

desc formatted partbucket;

# col_name              data_type               comment             
                 
a                       int                                         
                 
# Partition Information          
# col_name              data_type               comment             
                 
b                       int                                         
                 
# Detailed Table Information             
Database:               sparkhivebucket          
Owner:                  devbld                   
CreateTime:             Wed May 10 10:31:07 PDT 2017     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket
Table Type:             MANAGED_TABLE            
Table Parameters:                
        spark.sql.partitionProvider     catalog             
        transient_lastDdlTime   1494437647          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        field.delim             ,                   
        serialization.format    , 
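To make the regression easier to spot when comparing the two dumps, the bucketing fields can be pulled out of captured `desc formatted` text with a small Scala sketch. `BucketInfoCheck`, `BucketInfo` and `parse` are hypothetical names, not a Spark or Hive API, and the parser assumes the `Key:   value` layout the Hive CLI printed above.

```scala
import scala.util.Try

// Hypothetical helper, not part of Spark or Hive: extracts the bucketing
// fields from text captured from the Hive CLI's `desc formatted` output.
object BucketInfoCheck {

  // The two bucketing fields listed under "# Storage Information".
  final case class BucketInfo(numBuckets: Int, bucketColumns: List[String])

  // Returns the trimmed value after "key:" on the first line starting with it.
  private def field(descOutput: String, key: String): Option[String] =
    descOutput
      .split("\\r?\\n")
      .map(_.trim)
      .find(_.startsWith(key + ":"))
      .map(_.stripPrefix(key + ":").trim)

  // Parses "Num Buckets" and "Bucket Columns" (e.g. "[a]" or "[]");
  // returns None if either field is missing or the bucket count is not an Int.
  def parse(descOutput: String): Option[BucketInfo] =
    for {
      rawNum  <- field(descOutput, "Num Buckets")
      num     <- Try(rawNum.toInt).toOption
      rawCols <- field(descOutput, "Bucket Columns")
    } yield {
      val cols = rawCols.stripPrefix("[").stripSuffix("]")
        .split(",").map(_.trim).filter(_.nonEmpty).toList
      BucketInfo(num, cols)
    }

  def main(args: Array[String]): Unit = {
    // Excerpts of the Storage Information sections from steps 2 and 4.
    val before = "Num Buckets:            10\nBucket Columns:         [a]"
    val after  = "Num Buckets:            -1\nBucket Columns:         []"
    val (b, a) = (parse(before), parse(after))
    println(s"before repair: $b")
    println(s"after repair:  $a")
    if (b != a) println("Bucketing information changed after MSCK REPAIR TABLE")
  }
}
```

Pasting the full step 2 and step 4 dumps into `before` and `after` should flag the change (10 buckets on column a, then -1 buckets with no columns). The sketch only compares CLI text; it does not explain why the metastore entry changed.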

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
