[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-19 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Attachment: HIVE-9718-0.14.patch
HIVE-9718-1.0.patch
HIVE-9718.patch

Patches for trunk, 0.14 release, 1.0 release

> Insert into dynamic partitions with same column structure in the "distibute 
> by" clause barfs
> 
>
> Key: HIVE-9718
> URL: https://issues.apache.org/jira/browse/HIVE-9718
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Pavan Srinivas
>Priority: Critical
> Attachments: HIVE-9718-0.14.patch, HIVE-9718-1.0.patch, 
> HIVE-9718.patch, nation.tbl, patch.txt
>
>
> Sample reproducible query: 
> {code}
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.dynamic.partition=true;
>  insert overwrite table nation_new_p partition (some)
> select n_name as name1, n_name as name2, n_name as name3 from nation 
> distribute by name3;
> {code}
> Note: Make sure there is data in the source table to reproduce the issue. 
> During the optimizations done for Jira: 
> https://issues.apache.org/jira/browse/HIVE-4867, an optimization of 
> deduplication of columns is done. But, when one of the columns is used as 
> part of partitioned/distribute by, its not taken care of.  
> The above query produces exception as follows:
> {code}
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
> {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
> carefully final deposits detect slyly agai"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
> {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
> carefully final deposits detect slyly agai"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
>   ... 12 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
>   ... 13 more
> Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
>   ... 19 more
> {code}
> Tables used are: 
> {code}
> CREATE EXTERNAL TABLE `nation

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table nation_new_p partition (some)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/browse/HIVE-4867, an optimization of 
deduplication of columns is done. But, when one of the columns is used as part 
of partitioned/distribute by, its not taken care of.  

The above query produces exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided by the file attached with. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 inser

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table nation_new_p partition (some)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/browse/HIVE-4867, a optimization of 
deduplication of columns is done. But when one of the columns is used as part 
of partitioned/distribute by, its not taken care of.  

The above query produces exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided by the file attached with. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert 

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table nation_new_p partition (some)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/browse/HIVE-4867, a optimization of 
deduplication of columns is done. But when one of the columns is used as part 
of partitioned/distribute by, its not taken care of.  

The above query produces exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string,
  `n_name3` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided by the file attached with. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.parti

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (name3)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/browse/HIVE-4867, a optimization of 
deduplication of columns is done. But when one of the columns is used as part 
of partitioned/distribute by, its not taken care of.  

The above query produces exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided by the file attached with. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstri

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (name3)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/browse/HIVE-4867, a optimization of 
deduplication of columns is done. But when one of the columns is used as part 
of partitioned/distribute by, its not taken care of.  

The above query produces exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
carefully final deposits detect slyly agai"}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Table schema used: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}

Sample data for the table is provided by the file attached with. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (p)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/brows

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Priority: Critical  (was: Major)

> Insert into dynamic partitions with same column structure in the "distibute 
> by" clause barfs
> 
>
> Key: HIVE-9718
> URL: https://issues.apache.org/jira/browse/HIVE-9718
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Pavan Srinivas
>Priority: Critical
> Attachments: nation.tbl, patch.txt
>
>
> Sample reproducible query: 
> {code}
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.dynamic.partition=true;
> explain insert overwrite table nation_new_p partition (p)
> select n_name as name1, n_name as name2, n_name as name3 from nation 
> distribute by name3;
> {code}
> Note: Make sure there is data in the source table to reproduce the issue. 
> During the optimizations done for Jira: 
> https://issues.apache.org/jira/browse/HIVE-4867, a optimization of 
> deduplication of columns is done. But when one of the columns is used as part 
> of partitioned/distribute by, its not taken care of.  
> The above query produces exception as follows:
> {code}
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
> {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
> carefully final deposits detect slyly agai"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
> {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
> carefully final deposits detect slyly agai"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
>   ... 12 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
>   ... 13 more
> Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
>   ... 19 more
> {code}
> Table schema used: 
> {code}
> CREATE EXTERNAL TABLE `nation`(
>   `n_nationkey` int,
>   `n_name` string,
>   `n_regionkey` int,
>   `n_comment` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '|'
> STORED AS INPUTFORMAT
>  

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the "distibute by" clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Attachment: nation.tbl
patch.txt

> Insert into dynamic partitions with same column structure in the "distibute 
> by" clause barfs
> 
>
> Key: HIVE-9718
> URL: https://issues.apache.org/jira/browse/HIVE-9718
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0
>Reporter: Pavan Srinivas
> Attachments: nation.tbl, patch.txt
>
>
> Sample reproducible query: 
> {code}
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.dynamic.partition=true;
> explain insert overwrite table nation_new_p partition (p)
> select n_name as name1, n_name as name2, n_name as name3 from nation 
> distribute by name3;
> {code}
> Note: Make sure there is data in the source table to reproduce the issue. 
> During the optimizations done for Jira: 
> https://issues.apache.org/jira/browse/HIVE-4867, a optimization of 
> deduplication of columns is done. But when one of the columns is used as part 
> of partitioned/distribute by, its not taken care of.  
> The above query produces exception as follows:
> {code}
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
> {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
> carefully final deposits detect slyly agai"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
>   at 
> org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
> {"n_nationkey":0,"n_name":"ALGERIA","n_regionkey":0,"n_comment":" haggle. 
> carefully final deposits detect slyly agai"}
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
>   ... 12 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
>   ... 13 more
> Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
>   ... 19 more
> {code}
> Table schema used: 
> {code}
> CREATE EXTERNAL TABLE `nation`(
>   `n_nationkey` int,
>   `n_name` string,
>   `n_regionkey` int,
>   `n_comment` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '|'
> STORED AS INPUTFORMAT
>   'org.apache.had