[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2014-04-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969001#comment-13969001
 ] 

Ashutosh Chauhan commented on HIVE-6867:


Is this a duplicate of HIVE-3077 / HIVE-3244?

 Bucketized Table feature fails in some cases
 

                 Key: HIVE-6867
                 URL: https://issues.apache.org/jira/browse/HIVE-6867
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 0.12.0
            Reporter: Laljo John Pullokkaran
            Assignee: Laljo John Pullokkaran

 The bucketized table feature fails in some cases. If the source and
 destination are bucketed on the same key, and the actual data in the source is
 not bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the
 data won't be bucketed when it is written to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt'
 INTO TABLE P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression. This has never worked.
 It was only discovered because of the Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers is always 1, regardless of
 what the application requests. Hadoop2 now honors the requested number of
 reducers in local mode (by spawning threads).
 The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
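
 Until something like that is in place, one way to sidestep the problem is to
 keep LOAD DATA away from the bucketed table altogether. A minimal sketch,
 assuming a hypothetical unbucketed staging table named P1_STAGING (not part of
 this issue) and the hive.enforce.bucketing / hive.enforce.sorting settings of
 this Hive line. Because the staging table is not bucketed, the insert should
 keep its reduce stage and write the two bucket files:

```sql
-- Hypothetical staging table: same columns as P1, but NOT bucketed.
CREATE TABLE P1_STAGING(key STRING, val STRING) STORED AS TEXTFILE;

-- Load the raw file into the staging table instead of into P1.
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt'
INTO TABLE P1_STAGING;

-- Ask Hive to bucket and sort on insert, per the target table definition.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- The source is unbucketed, so rows pass through the reducers that bucket them.
INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_STAGING;
```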



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2014-04-08 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963472#comment-13963472
 ] 

Laljo John Pullokkaran commented on HIVE-6867:
--

BucketingSortingReduceSinkOptimizer removes the ReduceSink (RS) operator if the
source and destination are bucketed on the same key.
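
If that optimizer is what drops the reduce stage, switching it off for the
session should keep the ReduceSink in the plan. A sketch only, untested against
the example in this issue, and assuming hive.optimize.bucketingsorting is the
setting that gates this optimizer in the affected version:

```sql
-- Assumption: this flag controls BucketingSortingReduceSinkOptimizer.
SET hive.optimize.bucketingsorting = false;
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- Check the plan first: a "Reduce Output Operator" should appear again.
EXPLAIN INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1;

-- With the optimizer off, the insert should go through the reducers.
INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1;
```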



