[ https://issues.apache.org/jira/browse/HIVE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venki Korukanti updated HIVE-5631:
----------------------------------
    Status: Patch Available  (was: Open)

> Index creation on a skew table fails
> ------------------------------------
>
>                 Key: HIVE-5631
>                 URL: https://issues.apache.org/jira/browse/HIVE-5631
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 0.13.0, 0.12.0, 0.14.0
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>         Attachments: HIVE-5631.1.patch.txt, HIVE-5631.2.patch.txt,
>                      HIVE-5631.3.patch.txt, HIVE-5631.4.patch.txt
>
>
> REPRO STEPS:
>   create database skewtest;
>   use skewtest;
>   create table skew (id bigint, acct string) skewed by (acct) on ('CC','CH');
>   create index skew_indx on table skew (id) as
>     'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
>
> The last DDL statement fails with the following error:
>   FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>   InvalidObjectException(message:Invalid skew column [acct])
>
> When creating a table, Hive runs sanity checks to make sure that the columns
> have proper names and that the skewed columns are a subset of the table
> columns. Here the check fails because the index table carries skewed column
> info. The index table's skewed columns are {acct}, while its columns are
> {id, _bucketname, _offsets}. Because the skewed column {acct} is not one of
> the index table's columns, Hive throws the exception.
>
> The index table ends up with skewed column info, even though its definition
> has none, for the following reason: when the index table is created, a deep
> copy of the base table's StorageDescriptor (SD) (in this case that of 'skew')
> is made. In that copied SD, index-specific parameters are set and unrelated
> parameters are reset. The skewed column info is not reset (a few other
> parameters are not reset either), so the index table contains the skewed
> column info.
>
> Fix: Instead of deep copying the base table StorageDescriptor, create a new
> one from gathered info.
> This way, the index table does not inherit unnecessary SD properties from
> the base table.
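
The failure mode described above can be reproduced in a small toy model. The sketch below is not Hive code: the StorageDescriptor class, its two fields, and the helper names (validate, index_sd_by_deep_copy, index_sd_from_scratch) are hypothetical stand-ins chosen for illustration. Only the column names and the error text come from the report. It shows why a deep copy that resets only some fields trips the skewed-column check, and why building a fresh descriptor does not.

```python
import copy
from dataclasses import dataclass, field
from typing import List


class InvalidObjectException(Exception):
    """Stand-in for the metastore exception named in the report."""


@dataclass
class StorageDescriptor:
    # Toy model (assumption): only the two fields relevant to this bug.
    cols: List[str]
    skewed_col_names: List[str] = field(default_factory=list)


def validate(sd: StorageDescriptor) -> None:
    # Sanity check from the report: skewed columns must be table columns.
    bad = [c for c in sd.skewed_col_names if c not in sd.cols]
    if bad:
        raise InvalidObjectException("Invalid skew column [%s]" % ", ".join(bad))


# Index table columns as listed in the report.
INDEX_COLS = ["id", "_bucketname", "_offsets"]


def index_sd_by_deep_copy(base: StorageDescriptor,
                          index_cols: List[str]) -> StorageDescriptor:
    # Failing approach: copy everything, then overwrite only some fields.
    sd = copy.deepcopy(base)
    sd.cols = list(index_cols)  # skewed_col_names is never reset
    return sd


def index_sd_from_scratch(index_cols: List[str]) -> StorageDescriptor:
    # Approach described under "Fix": build a new descriptor from gathered info,
    # so nothing is inherited from the base table by accident.
    return StorageDescriptor(cols=list(index_cols))


if __name__ == "__main__":
    base = StorageDescriptor(cols=["id", "acct"], skewed_col_names=["acct"])
    validate(base)  # the base table itself passes the check

    try:
        validate(index_sd_by_deep_copy(base, INDEX_COLS))
    except InvalidObjectException as e:
        print("deep copy:", e)

    validate(index_sd_from_scratch(INDEX_COLS))
    print("fresh descriptor: ok")
```

The design point is that a copy-then-reset strategy must enumerate every field that should not carry over, and silently breaks when one is missed, whereas constructing the descriptor from scratch only carries over what is set explicitly.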