[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-18 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034859#comment-14034859
 ] 

Navis commented on HIVE-7232:
-

[~gopalv] Could you try the query on a build from before HIVE-7021? It seems to be caused by that change.

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis
 Attachments: HIVE-7232-extra-logging.patch, q5.explain.txt, q5.sql


 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
        sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
      orders,
      lineitem,
      supplier,
      nation,
      region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
   Reduce Operator Tree:
     Join Operator
       condition map:
            Inner Join 0 to 1
       condition expressions:
         0 {KEY.reducesinkkey0} {VALUE._col2}
         1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
       outputColumnNames: _col0, _col3, _col10, _col11, _col14
       Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
       Reduce Output Operator
         key expressions: _col10 (type: int)
         sort order: +
         Map-reduce partition columns: _col10 (type: int)
         Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
         value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034869#comment-14034869
 ] 

Gopal V commented on HIVE-7232:
---

[~navis]: I tested this with git commit id 50f517a3930 - it has been broken 
from before HIVE-7121.

{code}
$ hive --version
Hive 0.14.0-SNAPSHOT
Subversion 
git://cn041.l42scl.hortonworks.com/grid/5/dev/gopalv/tez-autobuild/hive -r 
50f517a3930da0a987e6f6e908a91a7705bf9c60
Compiled by gopal on Tue Jun 17 23:16:51 PDT 2014
From source with checksum 8f75b133edadf23e29096f5e9b5d0f99
{code}

Sorry about the reducesinkkey0 confusion. 

I have assigned this to myself for more investigation - will edit the bug 
tomorrow to the actual issue of incorrect results.



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-17 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034322#comment-14034322
 ] 

Gopal V commented on HIVE-7232:
---

[~navis]: I found out that there are indeed o_orderkey entries which show up as 
214800 in text, which lies outside the range of the TPC-H Identifier column 
spec.

I will reload the data using bigint for o_orderkey soon.

But I still want to locate and confirm the different results between MR and Tez 
here.



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-17 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034675#comment-14034675
 ] 

Navis commented on HIVE-7232:
-

Looks like something is wrong in broadcast join. I'll look into this.



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-17 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034734#comment-14034734
 ] 

Navis commented on HIVE-7232:
-

I've reproduced the problem. It occurs with the combination of mapjoin and vectorization.



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-16 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032176#comment-14032176
 ] 

Navis commented on HIVE-7232:
-

[~gopalv] Could you attach the full explain result and the query? The explain 
result on my notebook differs from yours (for me, it's Reducer 6, not Reducer 3). 
It seems hard to reproduce with small data (I used scale factor 1).



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-16 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032250#comment-14032250
 ] 

Navis commented on HIVE-7232:
-

Could you print out the whole row instead of just the keys?
{noformat}
x1.append("row=" + SerDeUtils.getJSONString(row, rowInspector));
{noformat}
Thanks,



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-16 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033253#comment-14033253
 ] 

Navis commented on HIVE-7232:
-

_col10 is NULL. It is VALUE._col0 of MAP2, which in turn is o_orderkey:
{noformat}
Reduce Output Operator
  key expressions: o_custkey (type: int)
  sort order: +
  Map-reduce partition columns: o_custkey (type: int)
  Statistics: Num rows: 1 Data size: 86571942280 Basic stats: COMPLETE Column stats: NONE
  value expressions: o_orderkey (type: int), o_orderdate (type: string)
{noformat}
Does the orders table contain NULLs?



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033435#comment-14033435
 ] 

Gopal V commented on HIVE-7232:
---

[~navis]: TPC-H data shouldn't have any NULLs in the join keys.

I will re-run the scans tomorrow. I can see one case where the schema from 
HIVE-600 might be completely broken. The Integer requirement in TPC-H only 
covers -2,147,483,646 to 2,147,483,647. 

Though rethinking this a bit, I think HIVE-600's schema has a bug: it assumed 
O_ORDERKEY would be an int (it might not fit in 32 bits anymore at 1 TB scale). 
I will verify tomorrow that we're not overflowing that integer limit at a 
higher scale and producing NULLs.

I can confirm that. But that aside, I am more concerned about the difference in 
output between Tez & MR.

In MR, no stage with a reduce sink will have a key row fed by a reduce input.

I will debug this more tomorrow to narrow down the query to a pair of 
shuffle-joins and compare output between the MR & Tez plans.
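
The overflow suspicion above can be illustrated with a minimal, standalone sketch. It is not Hive code: `parseIntOrNull` and `parseLongOrNull` are hypothetical helpers, and the sketch assumes a text reader that turns a value it cannot parse into the declared type into NULL instead of failing. The sample value 6000000000 is only an illustrative key beyond the 32-bit range, not one taken from the data.

```java
final class IntOverflowSketch {
    // Hypothetical helper: parse text as a 32-bit int, returning null
    // when the text is not a number or does not fit in an int.
    static Integer parseIntOrNull(String text) {
        try {
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Hypothetical helper: the same idea for a 64-bit value (a bigint column).
    static Long parseLongOrNull(String text) {
        try {
            return Long.valueOf(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // 2147483647 is the largest int; the last two samples exceed it.
        String[] samples = {"442", "2147483647", "2147483648", "6000000000"};
        for (String s : samples) {
            System.out.println(s + " as int -> " + parseIntOrNull(s)
                + ", as bigint -> " + parseLongOrNull(s));
        }
    }
}
```

Under that assumption, an order key past the int limit would surface as a NULL in an int column but survive intact in a bigint column, which is why reloading o_orderkey as bigint is a reasonable check.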



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032033#comment-14032033
 ] 

Gopal V commented on HIVE-7232:
---

[~ashutoshc]: Incorrect results as well.

Ran the same query with Tez & MR and got different results.

MR doesn't hit the same scenario because of the empty Map task, which doesn't 
have any input columns named reducesinkkey0.

Tez seems to hit a corner case where there are 2 shuffle joins one after the 
other - there is an input col named KEY.reducesinkkey0 and an output col named 
reducesinkkey0, which have no relation to each other.

{code}
$ diff -y -W 72 results/q5.tez.txt results/q5.mr.txt
CHINA       985314.0848      | VIETNAM     1.897236998313891E10
INDIA       819113.441801    | CHINA       1.894405687452681E10
VIETNAM     637407.2255      | INDONESIA   1.89306456994551
JAPAN       523754.9791      | JAPAN       1.892184676125508E10
INDONESIA   517900.1924      | INDIA       1.886882412417209E10
{code}
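
Because the two engines return rows in a different order, a line-by-line diff mixes up nations. A per-key comparison makes the mismatch easier to read. The following is a rough sketch only: `ResultDiffSketch` and `mismatches` are hypothetical names, the relative tolerance is an arbitrary choice, and the values are the Tez and MR revenues as printed in the diff above (INDONESIA is left out because its MR value appears cut off).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class ResultDiffSketch {
    // Returns, per key, the {left, right} pair for every key whose values differ
    // by more than relTol (relative to the larger magnitude) or that is missing
    // on one side (reported as NaN for the missing side).
    static Map<String, double[]> mismatches(Map<String, Double> left,
                                            Map<String, Double> right,
                                            double relTol) {
        Map<String, double[]> out = new TreeMap<>();
        for (Map.Entry<String, Double> e : left.entrySet()) {
            Double other = right.get(e.getKey());
            double x = e.getValue();
            if (other == null) {
                out.put(e.getKey(), new double[] {x, Double.NaN});
                continue;
            }
            double scale = Math.max(Math.abs(x), Math.abs(other));
            if (Math.abs(x - other) > relTol * scale) {
                out.put(e.getKey(), new double[] {x, other});
            }
        }
        for (Map.Entry<String, Double> e : right.entrySet()) {
            if (!left.containsKey(e.getKey())) {
                out.put(e.getKey(), new double[] {Double.NaN, e.getValue()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> tez = new LinkedHashMap<>();
        tez.put("CHINA", 985314.0848);
        tez.put("INDIA", 819113.441801);
        tez.put("VIETNAM", 637407.2255);
        tez.put("JAPAN", 523754.9791);

        Map<String, Double> mr = new LinkedHashMap<>();
        mr.put("VIETNAM", 1.897236998313891E10);
        mr.put("CHINA", 1.894405687452681E10);
        mr.put("JAPAN", 1.892184676125508E10);
        mr.put("INDIA", 1.886882412417209E10);

        // Print every nation whose revenue disagrees between the two engines.
        mismatches(tez, mr, 1e-9).forEach((nation, v) ->
            System.out.println(nation + ": tez=" + v[0] + " mr=" + v[1]));
    }
}
```

Keying on n_name rather than on row position keeps the comparison independent of the ORDER BY, which itself sorts differently once the revenues disagree.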

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     // Debug logging: dump each cached distribution key next to its evaluator.
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
        sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
   Reduce Operator Tree:
     Join Operator
       condition map:
            Inner Join 0 to 1
       condition expressions:
         0 {KEY.reducesinkkey0} {VALUE._col2}
         1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
       outputColumnNames: _col0, _col3, _col10, _col11, _col14
       Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
       Reduce Output Operator
         key expressions: _col10 (type: int)
         sort order: +
         Map-reduce partition columns: _col10 (type: int)
         Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
         value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)
 {code}





[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032088#comment-14032088
 ] 

Ashutosh Chauhan commented on HIVE-7232:


It seems this can also be triggered on the MR path. I think the latest patch on 
HIVE-5771 is failing tests such as subquery_in.q because they are hitting 
this issue.



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032106#comment-14032106
 ] 

Navis commented on HIVE-7232:
-

The failure of subquery_in.q in HIVE-5771 does not seem to be caused by 
HIVE-4867 directly, but it is strongly related, because HIVE-4867 has 
(intentionally) broken an internal assumption about the keys/values of RS. 
With the constant propagation optimizer, subquery_in.q produces different keys 
for each alias of the join, which does not seem valid.
{code}
-- sq_1
Reduce Output Operator
  key expressions: _col1 (type: int)
  sort order: ++
  Map-reduce partition columns: _col1 (type: int)
{code}
and
{code}
-- others
Reduce Output Operator
  key expressions: _col0 (type: int), _col1 (type: int)
  sort order: ++
  Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
{code}
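
Why differing key shapes across join aliases look invalid can be sketched as follows. This is only an illustration, assuming a plain hash-modulo partitioner over the key columns; the class, method, and sample values are hypothetical and are not Hive's actual shuffle code. A row keyed on one column and a row keyed on two columns hash differently, so rows that should meet in the join may be routed to different reducers.

```java
import java.util.Arrays;

final class JoinKeyPartitionSketch {
    // Hypothetical stand-in for a shuffle partitioner (an assumption for
    // illustration, not the hashing Hive actually uses).
    static int partition(Object[] key, int numReducers) {
        return Math.floorMod(Arrays.hashCode(key), numReducers);
    }

    public static void main(String[] args) {
        int numReducers = 4;
        Object[] sq1Key = {7};        // sq_1 emits one key column: _col1
        Object[] otherKey = {3, 7};   // the other alias emits two: _col0, _col1
        System.out.println("sq_1 partition:   " + partition(sq1Key, numReducers));
        System.out.println("others partition: " + partition(otherKey, numReducers));
        System.out.println("keys comparable as equal: " + Arrays.equals(sq1Key, otherKey));
    }
}
```

Even when two such keys happen to land on the same reducer, they never compare as equal, because they do not have the same number of columns.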



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032108#comment-14032108
 ] 

Navis commented on HIVE-7232:
-

For this problem, I cannot understand how the RS, which is a child of the JOIN, 
can receive a ROW of the format
{noformat}
{reducesinkkey0:442}
{noformat}
In my reading, the join would emit a ROW and rowOI labeled with the output 
columns, like below
{noformat}
_col0   {KEY.reducesinkkey0}
_col3   {VALUE._col2}
_col10  {VALUE._col0}
_col11  {KEY.reducesinkkey0}
_col14  {VALUE._col3}
{noformat}

I don't have an environment for hadoop-2, so this is hard to verify and might 
take some time.
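
The mismatch described above can be sketched in a few lines of Java. This is a simplified model, assuming a by-name column lookup over a map; the class, the helper methods, and every value except the logged `reducesinkkey0` of 442 are hypothetical, and it is not how ExprNodeColumnEvaluator is implemented. It only shows why evaluating `_col10` against a row whose single field is named `reducesinkkey0` would produce NULL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ColumnLookupSketch {
    // Hypothetical by-name evaluator; an assumption standing in for column
    // evaluation, not Hive's ExprNodeColumnEvaluator.
    static Object evaluate(Map<String, Object> row, String column) {
        return row.get(column); // null when the row has no field with that name
    }

    // Row labeled with the join's output columns, as the plan suggests.
    // All values here are made up for illustration.
    static Map<String, Object> expectedRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("_col0", 1);
        row.put("_col3", 2);
        row.put("_col10", 442);
        row.put("_col11", 1);
        row.put("_col14", "x");
        return row;
    }

    // Row shaped like the one in the log line: a single field, reducesinkkey0.
    static Map<String, Object> observedRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("reducesinkkey0", 442);
        return row;
    }

    public static void main(String[] args) {
        System.out.println("expected row: _col10 = " + evaluate(expectedRow(), "_col10"));
        System.out.println("observed row: _col10 = " + evaluate(observedRow(), "_col10"));
    }
}
```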



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032123#comment-14032123
 ] 

Gopal V commented on HIVE-7232:
---

[~navis]: I can run tests for you, if you have a patch file with log lines.

I can reproduce this issue consistently for all recent runs of this query.



[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031791#comment-14031791
 ] 

Ashutosh Chauhan commented on HIVE-7232:


[~gopalv] Is this producing wrong results (because a NULL key was emitted 
incorrectly), or is it lowering performance (because it caused a skew 
towards NULL)?
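
The skew half of this question can be illustrated with a small Java sketch. It assumes a plain hash-modulo partitioner (a stand-in, not the partitioning code Hive or Tez actually runs), and the class and method names are hypothetical. Because `Objects.hashCode(null)` is 0, every row whose key evaluated to NULL is routed to the same reducer.

```java
import java.util.Arrays;
import java.util.Objects;

final class NullKeySkewSketch {
    // Hypothetical hash-modulo partitioner; an assumption for illustration only.
    static int partition(Object key, int numReducers) {
        // Objects.hashCode(null) is 0, so every NULL key maps to one partition.
        return Math.floorMod(Objects.hashCode(key), numReducers);
    }

    // Counts how many keys land on each reducer.
    static int[] countPerPartition(Object[] keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (Object key : keys) {
            counts[partition(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Object[] healthyKeys = {0, 1, 2, 3, 4, 5, 6, 7};
        Object[] failedKeys = new Object[8]; // key evaluation returned null for every row
        System.out.println("healthy: " + Arrays.toString(countPerPartition(healthyKeys, 4)));
        System.out.println("failed:  " + Arrays.toString(countPerPartition(failedKeys, 4)));
    }
}
```

The sketch covers only the skew; whether results are also wrong depends on what the downstream join does with rows that carry a NULL key.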
