[ 
https://issues.apache.org/jira/browse/HIVE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212517#comment-14212517
 ] 

Venki Korukanti commented on HIVE-5631:
---------------------------------------

Differences in the .out files are:

{noformat}
< Num Buckets:          0                        
---
> Num Buckets:          -1                       
45a46,47
> Storage Desc Params:           
>       serialization.format    1
{noformat}

The second difference, "Storage Desc Params", is due to deep copying the parent 
table's StorageDescriptor and not resetting these parameters in the copy. The 
first difference is also due to the deep copy. However, all tables that are not 
bucketed have "-1" as the value for "Num Buckets". I am wondering whether we 
should keep the same default value for index tables too.

> Index creation on a skew table fails
> ------------------------------------
>
>                 Key: HIVE-5631
>                 URL: https://issues.apache.org/jira/browse/HIVE-5631
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>         Attachments: HIVE-5631.1.patch.txt, HIVE-5631.2.patch.txt, 
> HIVE-5631.3.patch.txt, HIVE-5631.4.patch.txt
>
>
> REPRO STEPS:
> create database skewtest;
> use skewtest;
> create table skew (id bigint, acct string) skewed by (acct) on ('CC','CH');
> create index skew_indx on table skew (id) as 
> 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED 
> REBUILD;
> The last DDL statement fails with the following error:
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> InvalidObjectException(message:Invalid skew column [acct])
> When creating a table, Hive runs sanity checks to make sure the columns have 
> proper names and the skewed columns are a subset of the table columns. Here we 
> fail because the index table has skewed column info. The index table's skewed 
> columns include {acct} and its columns are {id, _bucketname, _offsets}. As 
> the skewed column {acct} is not one of the table columns, Hive throws the 
> exception.
> The reason the index table gets skewed column info even though its definition 
> has none is this: when creating the index table, a deep copy of the base 
> table's StorageDescriptor (SD) (in this case 'skew') is made. In that 
> copied SD, index-specific parameters are set and unrelated parameters are 
> reset. The skewed column info is not reset (there are a few other params that 
> are not reset either). That is why the index table contains the skewed column 
> info.
> Fix: Instead of deep copying the base table's StorageDescriptor, create a new 
> one from the gathered info. This keeps the index table from inheriting 
> unnecessary SD properties from the base table.
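
To illustrate the failure mode and the fix described above, here is a minimal 
Java sketch. It does not use the real metastore classes: `Sd`, 
`skewedColsValid`, `indexSdByDeepCopy` and `indexSdFromScratch` are all 
hypothetical, simplified stand-ins, and the actual patch may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexSdSketch {

    /** Hypothetical, simplified stand-in for a storage descriptor. */
    static class Sd {
        List<String> cols = new ArrayList<>();
        List<String> skewedCols = new ArrayList<>();
        int numBuckets = -1; // assumed default for non-bucketed tables
        Map<String, String> serdeParams = new HashMap<>();

        Sd deepCopy() {
            Sd copy = new Sd();
            copy.cols = new ArrayList<>(cols);
            copy.skewedCols = new ArrayList<>(skewedCols);
            copy.numBuckets = numBuckets;
            copy.serdeParams = new HashMap<>(serdeParams);
            return copy;
        }
    }

    /** Sanity check: every skewed column must be one of the table columns. */
    static boolean skewedColsValid(Sd sd) {
        return sd.cols.containsAll(sd.skewedCols);
    }

    /** Old approach: copy the base SD, then overwrite only some fields. */
    static Sd indexSdByDeepCopy(Sd base, List<String> indexCols) {
        Sd sd = base.deepCopy();
        sd.cols = new ArrayList<>(indexCols);
        // skewedCols is never reset, so it still names base-table columns.
        return sd;
    }

    /** Fixed approach: start from an empty SD and set only what is needed. */
    static Sd indexSdFromScratch(List<String> indexCols) {
        Sd sd = new Sd();
        sd.cols = new ArrayList<>(indexCols);
        return sd;
    }

    public static void main(String[] args) {
        Sd base = new Sd();
        base.cols.addAll(Arrays.asList("id", "acct"));
        base.skewedCols.add("acct");

        List<String> indexCols = Arrays.asList("id", "_bucketname", "_offsets");

        // Deep copy inherits skewed column "acct", which the index table lacks.
        System.out.println(skewedColsValid(indexSdByDeepCopy(base, indexCols))); // false
        // A freshly built SD carries no skew info, so the check passes.
        System.out.println(skewedColsValid(indexSdFromScratch(indexCols)));      // true
    }
}
```

In this sketch a freshly built descriptor also picks up the class defaults 
(here `numBuckets = -1`) instead of whatever the base table carried.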



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
