[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.

Abhishek Madav (JIRA) Tue, 20 Mar 2018 16:29:25 -0700

     [ https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Abhishek Madav updated SPARK-20697:
-----------------------------------
    Affects Version/s: 2.2.0
                       2.2.1
                       2.3.0

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-20697
>                 URL: https://issues.apache.org/jira/browse/SPARK-20697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Abhishek Madav
>            Priority: Major
>
> Running MSCK REPAIR TABLE from Spark to recover partitions for a 
> partitioned+bucketed Hive table resets the bucketing information (Num 
> Buckets, Bucket Columns) in the table's storage descriptor in the metastore. 
> Steps to reproduce:
> 1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int) 
> PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',';
> 2) In the Hive CLI, run desc formatted on the table:
> # col_name                    data_type               comment             
>                
> a                     int                                         
>                
> # Partition Information                
> # col_name                    data_type               comment             
>                
> b                     int                                         
>                
> # Detailed Table Information           
> Database:             sparkhivebucket          
> Owner:                devbld                   
> CreateTime:           Wed May 10 10:31:07 PDT 2017     
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://localhost:8020/user/hive/warehouse/partbucket 
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       transient_lastDdlTime   1494437467          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No                       
> Num Buckets:          10                       
> Bucket Columns:       [a]                      
> Sort Columns:         []                       
> Storage Desc Params:           
>       field.delim             ,                   
>       serialization.format    , 
> 3) In spark-shell, run:
> scala> spark.sql("MSCK REPAIR TABLE partbucket")
> 4) Back in the Hive CLI, run desc formatted again:
> desc formatted partbucket;
> # col_name                    data_type               comment             
>                
> a                     int                                         
>                
> # Partition Information                
> # col_name                    data_type               comment             
>                
> b                     int                                         
>                
> # Detailed Table Information           
> Database:             sparkhivebucket          
> Owner:                devbld                   
> CreateTime:           Wed May 10 10:31:07 PDT 2017     
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket
> Table Type:           MANAGED_TABLE            
> Table Parameters:              
>       spark.sql.partitionProvider     catalog             
>       transient_lastDdlTime   1494437647          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       field.delim             ,                   
>       serialization.format    , 
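> After the repair, the metastore no longer records the table as bucketed 
> (Num Buckets: -1, Bucket Columns: []), so further inserts to this table 
> cannot be made in bucketed fashion through Hive. 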
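
The comparison in steps 2 and 4 can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the two desc formatted outputs were captured as plain text, and the names BUCKET_FIELDS, storage_info and changed_fields are hypothetical helpers introduced only for illustration. It extracts the bucketing fields from each output and reports the ones that differ.

```python
import re

# Bucketing-related fields shown under "# Storage Information" in the report.
BUCKET_FIELDS = ("Num Buckets", "Bucket Columns", "Sort Columns")

# Abbreviated copies of the Storage Information excerpts from steps 2 and 4.
BEFORE = """
Compressed:           No
Num Buckets:          10
Bucket Columns:       [a]
Sort Columns:         []
"""

AFTER = """
Compressed:           No
Num Buckets:          -1
Bucket Columns:       []
Sort Columns:         []
"""


def storage_info(desc_output):
    """Return the bucketing fields found in `desc formatted` text.

    Leading "> " quote markers (as in an email archive) are tolerated.
    """
    pattern = re.compile(r"(%s):\s*(.*)$" % "|".join(BUCKET_FIELDS))
    fields = {}
    for raw in desc_output.splitlines():
        line = raw.lstrip("> ").strip()
        match = pattern.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def changed_fields(before, after):
    """Map each bucketing field whose value differs to (before, after)."""
    b, a = storage_info(before), storage_info(after)
    return {
        name: (b.get(name), a.get(name))
        for name in BUCKET_FIELDS
        if b.get(name) != a.get(name)
    }


print(changed_fields(BEFORE, AFTER))
# {'Num Buckets': ('10', '-1'), 'Bucket Columns': ('[a]', '[]')}
```

An empty result would mean the bucketing metadata survived the repair. With the outputs quoted above, both Num Buckets and Bucket Columns are reported as changed.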



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
