[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963472#comment-13963472 ]
Laljo John Pullokkaran commented on HIVE-6867:
----------------------------------------------

BucketingSortingReduceSinkOptimizer removes the RS (ReduceSink) op if src & destination are bucketed on the same key.

> Bucketized Table feature fails in some cases
> --------------------------------------------
>
>                 Key: HIVE-6867
>                 URL: https://issues.apache.org/jira/browse/HIVE-6867
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.12.0
>            Reporter: Laljo John Pullokkaran
>            Assignee: Laljo John Pullokkaran
>
> Bucketized Table feature fails in some cases. If src & destination are
> bucketed on the same key, and the actual data in the src is not bucketed
> (because it was loaded using LOAD DATA LOCAL INPATH), then the data won't be
> bucketed while writing to the destination.
> Example
> ----------------------------------------------------------------------
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt'
> INTO TABLE P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --------------------------------------------------
> This is not a regression. This has never worked.
> It was only discovered because of the Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of
> what is requested by the app. Hadoop2 now honors the number-of-reducers
> setting in local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA on bucketed tables.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
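
A possible workaround, sketched under assumptions rather than taken from the issue: load the raw file into a non-bucketed staging table first, then populate P1 with an INSERT so that Hive does the bucketing itself. The table name P1_staging is hypothetical. hive.enforce.bucketing and hive.enforce.sorting are the Hive 0.12-era settings that ask the insert to bucket and sort its output. Because the staging table is not bucketed, the "src & destination bucketed on same key" condition should not hold, so the RS op would be expected to stay in the plan.

```sql
-- Hypothetical staging table: same columns as P1, but deliberately NOT bucketed,
-- so LOAD DATA cannot leave it in a state that contradicts its metadata.
CREATE TABLE P1_staging(key STRING, val STRING) STORED AS TEXTFILE;

-- Load the raw file into the staging table instead of into P1.
LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt'
INTO TABLE P1_staging;

-- Ask Hive to bucket and sort the rows while writing to P1.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Source is not bucketed, so the insert should keep its reduce stage
-- and write one file per bucket of P1.
INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_staging;
```

Setting hive.optimize.bucketingsorting=false before the original INSERT OVERWRITE ... FROM P1 may be another way to keep the reduce stage. That is an assumption and has not been verified against this issue.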