[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
[ https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-20697:
------------------------------
    Priority: Major  (was: Critical)

This sounds like Hive functionality though; is it even resolvable in Spark?

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-20697
>                 URL: https://issues.apache.org/jira/browse/SPARK-20697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Abhishek Madav
>            Priority: Major
>
> MSCK REPAIR TABLE used to recover partitions for a partitioned+bucketed table
> does not restore the bucketing information to the storage descriptor in the
> metastore.
> Steps to reproduce:
> 1) Create a partitioned+bucketed table in Hive:
> CREATE TABLE partbucket(a int) PARTITIONED BY (b int) CLUSTERED BY (a)
> INTO 10 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2) In Hive-CLI, issue a desc formatted for the table.
> # col_name              data_type               comment
>
> a                       int
>
> # Partition Information
> # col_name              data_type               comment
>
> b                       int
>
> # Detailed Table Information
> Database:               sparkhivebucket
> Owner:                  devbld
> CreateTime:             Wed May 10 10:31:07 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               hdfs://localhost:8020/user/hive/warehouse/partbucket
> Table Type:             MANAGED_TABLE
> Table Parameters:
>       transient_lastDdlTime   1494437467
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:             No
> Num Buckets:            10
> Bucket Columns:         [a]
> Sort Columns:           []
> Storage Desc Params:
>       field.delim             ,
>       serialization.format    ,
> 3) In spark-shell,
> scala> spark.sql("MSCK REPAIR TABLE partbucket")
> 4) Back to Hive-CLI
> desc formatted partbucket;
> # col_name              data_type               comment
>
> a                       int
>
> # Partition Information
> # col_name              data_type               comment
>
> b                       int
>
> # Detailed Table Information
> Database:               sparkhivebucket
> Owner:                  devbld
> CreateTime:             Wed May 10 10:31:07 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket
> Table Type:             MANAGED_TABLE
> Table Parameters:
>       spark.sql.partitionProvider     catalog
>       transient_lastDdlTime           1494437647
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>       field.delim             ,
>       serialization.format    ,
> Further inserts to this table cannot be made in bucketed fashion through
> Hive.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
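The before/after comparison in the reproduction steps of SPARK-20697 can be automated. What follows is only a minimal sketch, assuming the two `desc formatted` outputs have been captured as plain text (for example, copied from Hive-CLI). The helper names `bucketing_info` and `bucketing_lost` are hypothetical and are not part of Spark or Hive; the sketch only parses text and does not talk to the metastore.

```python
import re

def bucketing_info(desc_output: str) -> dict:
    """Extract 'Num Buckets' and 'Bucket Columns' from `desc formatted` text.

    Hypothetical helper: assumes the field labels shown in the report.
    """
    info = {}
    for raw in desc_output.splitlines():
        # Tolerate the '> ' quoting used in the e-mail and any padding.
        line = raw.lstrip("> ").strip()
        m = re.match(r"Num Buckets:\s*(-?\d+)$", line)
        if m:
            info["num_buckets"] = int(m.group(1))
            continue
        m = re.match(r"Bucket Columns:\s*\[(.*)\]$", line)
        if m:
            info["bucket_columns"] = [
                c.strip() for c in m.group(1).split(",") if c.strip()
            ]
    return info

def bucketing_lost(before: str, after: str) -> bool:
    """True if the table was bucketed before and its bucketing info changed."""
    b, a = bucketing_info(before), bucketing_info(after)
    return b.get("num_buckets", -1) > 0 and a != b

# Sample fragments taken from the values quoted in the report.
BEFORE = "> Num Buckets:            10\n> Bucket Columns:         [a]\n"
AFTER = "> Num Buckets:            -1\n> Bucket Columns:         []\n"

if __name__ == "__main__":
    print(bucketing_info(BEFORE))
    print(bucketing_info(AFTER))
    print(bucketing_lost(BEFORE, AFTER))
```

Running the check once before and once after `spark.sql("MSCK REPAIR TABLE partbucket")` would flag the reset described above without reading the full output by eye.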
[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
[ https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Madav updated SPARK-20697:
-----------------------------------
    Priority: Critical  (was: Major)

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-20697
>                 URL: https://issues.apache.org/jira/browse/SPARK-20697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Abhishek Madav
>            Priority: Critical
[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
[ https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Madav updated SPARK-20697:
-----------------------------------
    Affects Version/s: 2.2.0
                       2.2.1
                       2.3.0

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-20697
>                 URL: https://issues.apache.org/jira/browse/SPARK-20697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>            Reporter: Abhishek Madav
>            Priority: Major
[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
[ https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Madav updated SPARK-20697:
-----------------------------------
    Description:
MSCK REPAIR TABLE used to recover partitions for a partitioned+bucketed table does not restore the bucketing information to the storage descriptor in the metastore.

Steps to reproduce:
1) Create a partitioned+bucketed table in Hive:
CREATE TABLE partbucket(a int) PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

2) In Hive-CLI, issue a desc formatted for the table.

# col_name              data_type               comment

a                       int

# Partition Information
# col_name              data_type               comment

b                       int

# Detailed Table Information
Database:               sparkhivebucket
Owner:                  devbld
CreateTime:             Wed May 10 10:31:07 PDT 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://localhost:8020/user/hive/warehouse/partbucket
Table Type:             MANAGED_TABLE
Table Parameters:
      transient_lastDdlTime   1494437467

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            10
Bucket Columns:         [a]
Sort Columns:           []
Storage Desc Params:
      field.delim             ,
      serialization.format    ,

3) In spark-shell,

scala> spark.sql("MSCK REPAIR TABLE partbucket")

4) Back to Hive-CLI

desc formatted partbucket;

# col_name              data_type               comment

a                       int

# Partition Information
# col_name              data_type               comment

b                       int

# Detailed Table Information
Database:               sparkhivebucket
Owner:                  devbld
CreateTime:             Wed May 10 10:31:07 PDT 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket
Table Type:             MANAGED_TABLE
Table Parameters:
      spark.sql.partitionProvider     catalog
      transient_lastDdlTime           1494437647

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
      field.delim             ,
      serialization.format    ,

Further inserts to this table cannot be made in bucketed fashion through Hive.

  was:
MSCK REPAIR TABLE used to recover partitions for a partitioned+bucketed table does not restore the bucketing information to the storage descriptor in the metastore.

Steps to reproduce:
1) Create a partitioned+bucketed table in Hive:
CREATE TABLE partbucket(a int) PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

2) In Hive-CLI, issue a desc formatted for the table.

# col_name              data_type               comment

a                       int

# Partition Information
# col_name              data_type               comment

b                       int

# Detailed Table Information
Database:               sparkhivebucket
Owner: