[jira] [Commented] (HIVE-21016) Duplicate column name in GROUP BY statement causing Vertex failures

2019-01-22 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749611#comment-16749611
 ] 

Peter Vary commented on HIVE-21016:
---

Since I am not absolutely confident in my knowledge of this part of the code, I 
would prefer to have it in the master branch only. That way we will have 
more testing around it before it is released. If we find somebody who is 
more experienced with this part of the code, that could be another story :)

Thanks, Peter 

> Duplicate column name in GROUP BY statement causing Vertex failures
> ---
>
> Key: HIVE-21016
> URL: https://issues.apache.org/jira/browse/HIVE-21016
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Bjorn Olsen
>Assignee: Mani M
>Priority: Major
>
> Hive queries fail with "Vertex failure" messages when the user submits a 
> query containing duplicate GROUP BY columns. The Hive query parser should 
> detect and reject this scenario with a meaningful error message, rather than 
> executing the query and failing with an obfuscated message. For complex 
> queries this can result in a lot of debugging effort, whereas a simple error 
> message could have saved some time.
> To repeat the issue, choose any table and perform a GROUP BY with a duplicate 
> column name.
> {{For example:}}
> select count( * ), party_id from party {{group by party_id, party_id;}}
> Note the duplicate column in the GROUP BY.
> This will fail with messages similar to below:
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing vector batch (tag=0) ffb9-5fb1-3024-922a-10cc313a7c171
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:390)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:232)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>  ... 14 more
>  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing vector batch (tag=0) 
> ffb9-5fb1-3024-922a-10cc313a7c171
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:454)
>  at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:381)
>  ... 17 more
>  *Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21016) Duplicate column name in GROUP BY statement causing Vertex failures

2019-01-22 Thread Mani M (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749159#comment-16749159
 ] 

Mani M commented on HIVE-21016:
---

[~pvary]

Thanks for the info.

Which branch should I make the fix in?



[jira] [Commented] (HIVE-21016) Duplicate column name in GROUP BY statement causing Vertex failures

2019-01-22 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748607#comment-16748607
 ] 

Peter Vary commented on HIVE-21016:
---

[~rmsm...@gmail.com]: I would prefer to fix this in the SemanticAnalyzer class. 
On the other hand, I have seen several versions of the 
{{genGroupByPlanGroupByOperator}} method, and we have to make sure that this 
check is called for every GROUP BY. So I would look around in the {{genPlan}} 
method to find a better place for this. It may also be worth using 
{{ParseUtils.validateColumnNameUniqueness}}.



[jira] [Commented] (HIVE-21016) Duplicate column name in GROUP BY statement causing Vertex failures

2019-01-17 Thread Mani M (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744925#comment-16744925
 ] 

Mani M commented on HIVE-21016:
---

Hi [~pvary],

As per my initial analysis, I think we need to check for duplicates in the 
source below, where the GROUP BY clause is generated:

 

https://github.com/apache/hive/blob/8e7c3b340f36a3b76453338b04b8cda360eeaa70/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L4937

 
{code:java}
List<ASTNode> grpByExprs = getGroupByForClause(parseInfo, dest);
// Need to check for duplicates in the list derived above.

for (int i = 0; i < grpByExprs.size(); ++i) {
  ASTNode grpbyExpr = grpByExprs.get(i);
  ColumnInfo exprInfo = groupByInputRowResolver.getExpression(grpbyExpr);
  if (exprInfo == null) {
    throw new SemanticException(ErrorMsg.INVALID_COLUMN.getMsg(grpbyExpr));
  }
  // (remainder of the loop body omitted from this excerpt)
}
{code}
Correct me if my understanding is wrong.
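
To make the idea concrete, here is a minimal, self-contained sketch of such a duplicate check. It is not Hive code: the class {{GroupByDuplicateCheck}} and its methods are hypothetical, the GROUP BY expressions are represented as plain strings rather than {{ASTNode}} instances, and the case-insensitive comparison is an assumption about how column names should be matched. A real fix would presumably throw a {{SemanticException}} instead of the generic exception used here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hive: detects repeated GROUP BY expressions.
final class GroupByDuplicateCheck {

  // Returns each expression that occurs more than once (compared
  // case-insensitively after trimming), listed once, in the order in
  // which its first repeat is encountered.
  static List<String> findDuplicates(List<String> groupByExprs) {
    Set<String> seen = new HashSet<>();
    Set<String> reported = new HashSet<>();
    List<String> duplicates = new ArrayList<>();
    for (String expr : groupByExprs) {
      String key = expr.trim().toLowerCase(Locale.ROOT);
      // Set.add returns false when the key is already present.
      if (!seen.add(key) && reported.add(key)) {
        duplicates.add(expr.trim());
      }
    }
    return duplicates;
  }

  // Rejects the list early with a readable message instead of letting the
  // query fail later at runtime.
  static void validate(List<String> groupByExprs) {
    List<String> duplicates = findDuplicates(groupByExprs);
    if (!duplicates.isEmpty()) {
      throw new IllegalArgumentException(
          "Duplicate column in GROUP BY: " + duplicates);
    }
  }
}
```

With the query from the issue description, the list would contain {{party_id}} twice, so {{validate}} would reject it before any plan is generated.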



[jira] [Commented] (HIVE-21016) Duplicate column name in GROUP BY statement causing Vertex failures

2019-01-16 Thread Mani M (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743828#comment-16743828
 ] 

Mani M commented on HIVE-21016:
---

Hi [~pvary],

Is it correct to put the validation check before this line?

[https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java#L359]

 
