[jira] [Assigned] (HIVE-15486) DefaultGraphWalker invokes getChildren() as many times as there are children

2017-02-15 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar reassigned HIVE-15486:
---

Assignee: Dhiraj Kumar

> DefaultGraphWalker invokes getChildren() as many times as there are children
> 
>
> Key: HIVE-15486
> URL: https://issues.apache.org/jira/browse/HIVE-15486
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: query
>
>
> While walking a Node, DefaultGraphWalker calls the getChildren() method as 
> many times as there are children. This leads to a performance penalty when a 
> node has many direct children. 
> Attached is the query file. Instructions to run:
> 1. time hive -f query  
> Compare the time on Hive 1.2 vs. Hive 2.1. 
> This change was introduced in 
> [HIVE-11652|https://issues.apache.org/jira/browse/HIVE-11652]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-09 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811588#comment-15811588
 ] 

Dhiraj Kumar commented on HIVE-15531:
-

If commons-logging is not used by Hive, then log4j-jcl is not required. But if 
commons-logging is used by Hadoop, a bridge is required. By default, 
log4j-1.2-api acts as the bridge for Hadoop logging; it was added in this 
commit: 

https://github.com/apache/hive/commit/c93d6c77e31e2eb9b40f5167ab3491d44eae351a

I also agree that with a properly ordered classpath, this issue will not 
surface at all. 

I would have preferred log4j-jcl, since it removes the dependency on classpath 
ordering, including the case where a user library lands on the classpath ahead 
of Hive's log4j libraries. 

I will leave it to you and will close the discussion here. 





> Hive breaks Hadoop commons logging with log4j2
> --
>
> Key: HIVE-15531
> URL: https://issues.apache.org/jira/browse/HIVE-15531
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
>Priority: Minor
> Attachments: HIVE-15531.patch
>
>
> Hadoop (2.7), which uses commons-logging, is not compatible with log4j2 
> without a bridge. 
> The bridge is missing in Hive. 
> As a result, commons-logging initialises a log4j (1.2) Logger, does not 
> configure it properly because its configuration is missing, and sends 
> logging output to stdout (the default). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-08 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810881#comment-15810881
 ] 

Dhiraj Kumar commented on HIVE-15531:
-

{quote}   Does Hive logger still get log4j2, and Hadoop logger log4j?  {quote}

It is possible if the classpath is messed up. It depends on what extra 
libraries the end user has put on their classpath. That is exactly what 
happened in my case, as explained in an earlier comment. 
This patch makes sure that even if end users mess up their classpath (from the 
log4j perspective only), logging will still be set up properly and won't run 
into the log4j vs. log4j2 issue. 

I did not look at it from the perspective of slf4j. 

[~prasanth_j] can you have a look?




[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-06 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803961#comment-15803961
 ] 

Dhiraj Kumar commented on HIVE-15531:
-

[~sershe] I thought about test cases with the approach below: 
a) Manipulating the classpath at runtime to put log4j 1.2 before the log4j 2.4 
libraries, showing that commons-logging will pick the older version of the 
logger. 
b) Keeping log4j 1.2 at the beginning and putting log4j-jcl anywhere on the 
classpath, showing that commons-logging will pick Log4j 2.4. 

The problem with this approach is that I have to manipulate the classpath at 
runtime, which might affect other tests downstream. Moreover, I have not been 
able to find a clean way to do it. 

Would a trivial test like this suffice? 

{code}
Log log = LogFactory.getLog(CommonsLoggingTest.class);
// JUnit convention: expected value first, actual value second
assertEquals(org.apache.logging.log4j.jcl.Log4jLog.class, log.getClass());
{code}



[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-05 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800899#comment-15800899
 ] 

Dhiraj Kumar commented on HIVE-15531:
-

Related issue [HIVE-11572|https://issues.apache.org/jira/browse/HIVE-11572]



[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-04 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15531:

Description: 
Hadoop (2.7), which is using Commons-logging is not compatible with log4j2 
without bridge. 

The bridge is missing in Hive. 

This leads to a problem whereby commons-logging initialises a log4j (1.2) 
version Logger, does not configure it properly since configuration for it is 
missing and sends logging output to stdout (the default). 



  was:
Hadoop (2.7), which is using Commons-logging is not compatible with log4j2 
without bridge. 

The bridge is missing in Hive. 

This leads to a problem whereby commons-logging initialises a log4j (1.2) 
version, does not configure it properly since configuration for it is missing 
and sends logging output to stdout (the default). 






[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-04 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798980#comment-15798980
 ] 

Dhiraj Kumar commented on HIVE-15531:
-

[~sershe], I dug deeper into the bug and found that it was an issue with our 
environment. There were two problems at our end:

1. A UDF jar near the beginning of the classpath had log4j (1.2) classes in 
it. It was picked up by commons-logging. 
2. Another jar, containing a basic log4j.properties, was picked up for 
properties, since the Hive configuration directory (first on the classpath) 
had log4j2.properties but no log4j.properties. 

So, even without log4j-jcl (the current patch), it may work fine, since 
log4j-1.2-api (already part of Hive 2.x) has the org.apache.log4j.Logger class 
and initialises it properly with the log4j2 configuration. That works like a 
bridge, provided there is no other jar on the classpath before log4j-1.2-api 
that has an org.apache.log4j.Logger class. 

log4j-jcl overcomes the classpath issue, since commons-logging tries to find a 
LogFactory and log4j-jcl injects itself there. So there won't be any classpath 
issue with it. The code snippet below is from 
org.apache.commons.logging.LogFactory:

{code}
// Second, try to find a service by using the JDK1.3 class
// discovery mechanism, which involves putting a file with the name
// of an interface class in the META-INF/services directory, where the
// contents of the file is a single line specifying a concrete class
// that implements the desired interface.

if (factory == null) {
    if (isDiagnosticsEnabled()) {
        logDiagnostic("[LOOKUP] Looking for a resource file of name [" + SERVICE_ID +
                      "] to define the LogFactory subclass to use...");
    }
    try {
        final InputStream is = getResourceAsStream(contextClassLoader, SERVICE_ID);
{code}

Since this issue shows up depending on the classpath, I am thinking of adding 
log4j 1.2 to the classpath with test scope and checking that the right class 
has been loaded by commons-logging. 
I would like to know your thoughts. 
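
For illustration only, the service-file lookup quoted above can be reproduced 
with plain JDK calls. This is a minimal sketch, not commons-logging code: the 
class ServiceLookupSketch and the method findFactoryClassName are hypothetical 
names, and the resource path is assumed to match LogFactory.SERVICE_ID. 

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; mirrors the idea of the quoted lookup, not its code.
class ServiceLookupSketch {
    // Assumed to be the resource name behind LogFactory.SERVICE_ID.
    static final String SERVICE_ID =
        "META-INF/services/org.apache.commons.logging.LogFactory";

    // Returns the first line of the service file visible to the given
    // class loader, or null when no jar on the classpath provides one.
    static String findFactoryClassName(ClassLoader loader, String resource) {
        InputStream is = loader.getResourceAsStream(resource);
        if (is == null) {
            return null; // no bridge registered through a service file
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line == null ? null : line.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = findFactoryClassName(
            ClassLoader.getSystemClassLoader(), SERVICE_ID);
        System.out.println("LogFactory from service file: " + name);
    }
}
```

Because the lookup goes by resource name rather than by class name, it does 
not matter where the jar carrying the file sits relative to a jar carrying 
org.apache.log4j.Logger, which is the point made about log4j-jcl above. 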













[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-03 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15531:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-03 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15531:

Status: In Progress  (was: Patch Available)



[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2

2017-01-03 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15531:

Attachment: HIVE-15531.patch



[jira] [Comment Edited] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-21 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767191#comment-15767191
 ] 

Dhiraj Kumar edited comment on HIVE-11652 at 12/21/16 2:33 PM:
---

[~jcamachorodriguez] As suggested, I have created another bug at 
[HIVE-15486|https://issues.apache.org/jira/browse/HIVE-15486]. 

I have attached a larger version of the same query that I had posted earlier 
in this thread. It has 50K elements inside the IN clause. Although it seems a 
convoluted query, we do have a similar query running on our production system. 

We did not face this problem until we upgraded to 2.1 from an older version. 
This code path is not the only problem for this query, but it accounts for 50% 
of the time spent in Hive processing. 

I did not understand your suggestion of keeping the last added child in the 
stack. Since the node peeked from the stack keeps changing, the list of 
children changes with it. Keeping only the last added child won't help, since 
you lose the list of children once you peek another node from the stack. I 
believe you need to keep all the children of the node as well. 

In fact, finding the position is not itself the problem; the 
ASTNode.getChildren() invocation is. 

I also have a question about ASTNode.java: why add a single child at a time to 
ret_vec? Why not return all the children in one shot? 

{code}
public ArrayList<Node> getChildren() {
  if (super.getChildCount() == 0) {
    return null;
  }

  ArrayList<Node> ret_vec = new ArrayList<Node>();
  for (int i = 0; i < super.getChildCount(); ++i) {
    ret_vec.add((Node) super.getChild(i));
  }

  return ret_vec;
}
{code}

I will add some profiler output to the bug as well. 
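
To make the "one shot" question above concrete, here is a small self-contained 
sketch. SimpleNode is a hypothetical stand-in for ASTNode, and it assumes the 
node has direct access to its own child list, which may not hold for the real 
class. 

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ASTNode; not Hive code.
class SimpleNode {
    private final List<SimpleNode> children = new ArrayList<SimpleNode>();

    void addChild(SimpleNode child) { children.add(child); }
    int getChildCount() { return children.size(); }
    SimpleNode getChild(int i) { return children.get(i); }

    // Same shape as the quoted method: one add() per child.
    ArrayList<SimpleNode> getChildrenOneByOne() {
        if (getChildCount() == 0) {
            return null;
        }
        ArrayList<SimpleNode> result = new ArrayList<SimpleNode>();
        for (int i = 0; i < getChildCount(); ++i) {
            result.add(getChild(i));
        }
        return result;
    }

    // "One shot" variant: the copy constructor sizes the list up front
    // and copies all elements in a single bulk operation.
    ArrayList<SimpleNode> getChildrenBulk() {
        if (children.isEmpty()) {
            return null;
        }
        return new ArrayList<SimpleNode>(children);
    }
}
```

Both variants still allocate and fill a new list on every call, so the bulk 
copy only trims constant factors. The number of getChildren() calls made by 
the walker remains the larger cost. 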


> Avoid expensive call to removeAll in DefaultGraphWalker
> ---
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.





[jira] [Updated] (HIVE-15486) DefaultGraphWalker invokes getChildren() as many times as there are children

2016-12-21 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15486:

Description: 
DefaultGraphWalker while walking a Node, calls up getChildren method as many 
times as there are children. This leads a performance penalty where a node has 
too many direct children. 

Attached is query file. Instruction to run

1. time hive -f query  

Checkout the time on hive 1.2 version vs 2.1 version. 

This change was introduced in 
[HIVE-11652|https://issues.apache.org/jira/browse/HIVE-11652]




  was:
DefaultGraphWalker while walking a Node, calls up getChildren method as many 
times as there are children. This leads a performance penalty where a node has 
too many direct children. 

Attached is query file. Instructions to run

1. time hive -f query  

Checkout the time on 1.2 version vs 2.1 version. 

This change was introduced in 
[HIVE-11652|https://issues.apache.org/jira/browse/HIVE-11652]







[jira] [Updated] (HIVE-15486) DefaultGraphWalker invokes getChildren() as many times as there are children

2016-12-21 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15486:

Attachment: query



[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-20 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764924#comment-15764924
 ] 

Dhiraj Kumar commented on HIVE-11652:
-

[~jcamachorodriguez] The patch causes a performance issue. 

Example query. 
{code}select a from (select 1 as a ) tbl where a in 
(1,2,3,4,5,6,7,8,9,10);{code}

Method source
{code}
protected void walk(Node nd) throws SemanticException {
  // Push the node in the stack
  opStack.push(nd);

  // While there are still nodes to dispatch...
  while (!opStack.empty()) {
    Node node = opStack.peek();

    if (node.getChildren() == null ||
        getDispatchedList().containsAll(node.getChildren())) {
      // Dispatch current node
      if (!getDispatchedList().contains(node)) {
        dispatch(node, opStack);
        opQueue.add(node);
      }
      opStack.pop();
      continue;
    }

    // Add a single child and restart the loop
    for (Node childNode : node.getChildren()) {
      if (!getDispatchedList().contains(childNode)) {
        opStack.push(childNode);
        break;
      }
    }
  } // end while
}
{code}

The walk method pushes the root node onto the stack (the where clause in this 
case, which has 12 children) and fetches all of its direct children at line 
166. It processes a single child (in this example) and then invokes 
node.getChildren() again. A total of 12 invocations of getChildren() are made. 

Now, if the IN clause has a huge list, it causes the following:

1. As many invocations of getChildren() as there are children. So if the IN 
clause has 50K values, getChildren() is invoked 50K times.
2. The huge number of nodes and the repeated invocations put memory pressure 
on ASTNode.getChildren(), since it returns all the children on every call. 
3. Since the thread takes a lock before compilation starts, it blocks other 
compilations from making progress. 

Depending on the query, it is an order of magnitude slower.  
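
The effect described above can be reproduced outside Hive with a small, 
self-contained sketch. CountingNode and WalkerSketch are hypothetical names, 
not Hive classes. walkNaive follows the control flow of the quoted walk() but 
reads the children only once per loop pass, so its counts are a lower bound 
for the quoted method. walkCached is one possible alternative that remembers 
an iterator per node; it is not the actual fix for this issue. 

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a plan/AST node; counts getChildren() calls.
class CountingNode {
  final String name;
  final List<CountingNode> children = new ArrayList<CountingNode>();
  int getChildrenCalls = 0;

  CountingNode(String name) { this.name = name; }

  // Like the quoted ASTNode method: null for a leaf, a fresh copy otherwise.
  List<CountingNode> getChildren() {
    getChildrenCalls++;
    if (children.isEmpty()) {
      return null;
    }
    return new ArrayList<CountingNode>(children);
  }
}

class WalkerSketch {
  // Builds a root with n leaf children (the shape of a large IN list).
  static CountingNode tree(int n) {
    CountingNode root = new CountingNode("root");
    for (int i = 0; i < n; i++) {
      root.children.add(new CountingNode("c" + i));
    }
    return root;
  }

  // Re-reads the children every time a node is on top of the stack.
  static List<String> walkNaive(CountingNode root) {
    Deque<CountingNode> stack = new ArrayDeque<CountingNode>();
    Set<CountingNode> dispatched = new HashSet<CountingNode>();
    List<String> order = new ArrayList<String>();
    stack.push(root);
    while (!stack.isEmpty()) {
      CountingNode node = stack.peek();
      List<CountingNode> children = node.getChildren();
      if (children == null || dispatched.containsAll(children)) {
        if (dispatched.add(node)) {
          order.add(node.name); // "dispatch" the node
        }
        stack.pop();
        continue;
      }
      // Push a single undispatched child and restart the loop.
      for (CountingNode child : children) {
        if (!dispatched.contains(child)) {
          stack.push(child);
          break;
        }
      }
    }
    return order;
  }

  // Reads the children once per node and resumes from a saved iterator.
  static List<String> walkCached(CountingNode root) {
    Deque<CountingNode> stack = new ArrayDeque<CountingNode>();
    Map<CountingNode, Iterator<CountingNode>> pending =
        new IdentityHashMap<CountingNode, Iterator<CountingNode>>();
    Set<CountingNode> dispatched = new HashSet<CountingNode>();
    List<String> order = new ArrayList<String>();
    stack.push(root);
    while (!stack.isEmpty()) {
      CountingNode node = stack.peek();
      Iterator<CountingNode> it = pending.get(node);
      if (it == null) {
        List<CountingNode> children = node.getChildren(); // only call per node
        it = (children == null)
            ? Collections.<CountingNode>emptyIterator()
            : children.iterator();
        pending.put(node, it);
      }
      CountingNode next = null;
      while (it.hasNext()) {
        CountingNode candidate = it.next();
        if (!dispatched.contains(candidate)) {
          next = candidate;
          break;
        }
      }
      if (next != null) {
        stack.push(next);
        continue;
      }
      if (dispatched.add(node)) {
        order.add(node.name); // all children done: "dispatch" the node
      }
      pending.remove(node);
      stack.pop();
    }
    return order;
  }
}
```

For a root with 10 leaf children, walkNaive calls getChildren() on the root 11 
times (once for each pass in which the root is on top of the stack), while 
walkCached calls it once. Both dispatch the nodes in the same order. The 
sketch assumes a tree; a graph with shared nodes would need more care. 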
 




[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-12-01 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Hadoop Flags:   (was: Incompatible change)

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch, 
> HIVE-15291.3.patch, HIVE-15291.4.patch
>
>
> Summary: If a query compares two timestamps and one of them is provided in 
> "YYYY-MM-DD" format, without the time part, it returns an incorrect result. 
> Steps to reproduce: 
> 1. Start a hive-cli. 
> 2. Run the query -> select cast("2016-12-31 12:00:00" as timestamp) > "2016-12-30";
> 3. Expected result: true
> 4. Actual result: NULL
> Detailed description: 
> If two primitives of different types need to be compared, a common 
> comparator type is chosen. Prior to 2.1, the common type Text was chosen to 
> compare the Timestamp type and the Text type. 
> In version 2.1, the common type Timestamp is chosen to compare the Timestamp 
> type and the Text type. This leads to converting the Text value (YYYY-MM-DD) 
> into java.sql.Timestamp, which throws an exception saying the input is not 
> in the proper format. The exception is suppressed and a null is returned. 
> The code below is from org.apache.hadoop.hive.ql.exec.FunctionRegistry:
> {code:java}
> if (pgA == PrimitiveGrouping.STRING_GROUP && pgB == PrimitiveGrouping.DATE_GROUP) {
>   return b;
> }
> // date/timestamp is higher precedence than String_GROUP
> if (pgB == PrimitiveGrouping.STRING_GROUP && pgA == PrimitiveGrouping.DATE_GROUP) {
>   return a;
> }
> {code}
> The bug was introduced in 
> [HIVE-13381|https://issues.apache.org/jira/browse/HIVE-13381]
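
The conversion failure described in the quoted issue can be seen with the JDK 
alone. The sketch below is illustrative only: TimestampCompareSketch and its 
methods are hypothetical names, and Hive's own conversion path may differ in 
detail from a direct call to java.sql.Timestamp.valueOf. 

```java
import java.sql.Timestamp;

// Hypothetical helper; shows the two comparison strategies side by side.
class TimestampCompareSketch {
    // Timestamp.valueOf requires "yyyy-[m]m-[d]d hh:mm:ss[.f...]" and throws
    // IllegalArgumentException otherwise; swallowing it yields null.
    static Timestamp parseOrNull(String text) {
        try {
            return Timestamp.valueOf(text);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Comparison with Text as the common type: plain lexicographic order.
    static boolean greaterAsText(String left, String right) {
        return left.compareTo(right) > 0;
    }

    // Comparison with Timestamp as the common type: null if either side
    // cannot be converted.
    static Boolean greaterAsTimestamp(String left, String right) {
        Timestamp l = parseOrNull(left);
        Timestamp r = parseOrNull(right);
        if (l == null || r == null) {
            return null;
        }
        return l.after(r);
    }

    public static void main(String[] args) {
        String ts = "2016-12-31 12:00:00";
        String dateOnly = "2016-12-30";
        System.out.println("as text:      " + greaterAsText(ts, dateOnly));
        System.out.println("as timestamp: " + greaterAsTimestamp(ts, dateOnly));
    }
}
```

With the date-only operand, the text comparison yields true while the 
timestamp comparison yields null, matching the expected and actual results 
listed in the reproduction steps. 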





[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-12-01 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711714#comment-15711714
 ] 

Dhiraj Kumar commented on HIVE-15291:
-

The flag was set because the result is incompatible with Hive 1.2. But since 
we agree that it is a bug, it is no longer an "Incompatible change". 





[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-30 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708407#comment-15708407
 ] 

Dhiraj Kumar commented on HIVE-15291:
-

Added the .q test.

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch, 
> HIVE-15291.3.patch, HIVE-15291.4.patch


[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-30 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708409#comment-15708409
 ] 

Dhiraj Kumar commented on HIVE-15291:
-

Added the .q test.

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch, 
> HIVE-15291.3.patch, HIVE-15291.4.patch


[jira] [Issue Comment Deleted] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-30 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Comment: was deleted

(was: Added the .q test.)

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch, 
> HIVE-15291.3.patch, HIVE-15291.4.patch


[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-30 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Attachment: HIVE-15291.4.patch

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch, 
> HIVE-15291.3.patch, HIVE-15291.4.patch


[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-29 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Attachment: HIVE-15291.3.patch

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch, 
> HIVE-15291.3.patch


[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-29 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705823#comment-15705823
 ] 

Dhiraj Kumar commented on HIVE-15291:
-

Hello [~pvary], 

I am removing the extraneous test case. I agree that testGetTimestampFromString
is sufficient to unit test the change.

[~jdere] can you please review the change? 

--Dhiraj

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch


[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-29 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Status: Patch Available  (was: Open)

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch


[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-29 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Attachment: HIVE-15291.2.patch

This has unit test cases. 

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch


[jira] [Assigned] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-29 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar reassigned HIVE-15291:
---

Assignee: Dhiraj Kumar

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
>Assignee: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch


[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-28 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Description: 
Summary : If a query compares two timestamps and one of them is provided in
"YYYY-MM-DD" format, omitting the time part, it returns an incorrect result.

Steps to reproduce :

1. Start the Hive CLI.
2. Run the query: select cast("2016-12-31 12:00:00" as timestamp) > "2016-12-30";
3. Expected result : true
4. Actual result : NULL

Detailed description :
If two primitives of different types need to be compared, a common comparator
type is chosen. Prior to 2.1, the common type Text was chosen to compare
Timestamp type and Text type.

In version 2.1, the common type Timestamp is chosen to compare Timestamp type
and Text type. This leads to converting the Text value (YYYY-MM-DD) into
java.sql.Timestamp, which throws an exception saying the input is not in the
proper format. The exception is suppressed and NULL is returned.

The code below is from org.apache.hadoop.hive.ql.exec.FunctionRegistry:
{code:java}
if (pgA == PrimitiveGrouping.STRING_GROUP && pgB == PrimitiveGrouping.DATE_GROUP) {
  return b;
}
// date/timestamp is higher precedence than String_GROUP
if (pgB == PrimitiveGrouping.STRING_GROUP && pgA == PrimitiveGrouping.DATE_GROUP) {
  return a;
}
{code}


The bug was introduced in
[HIVE-13381|https://issues.apache.org/jira/browse/HIVE-13381]

  was:
Summary : If a query needs to compare two timestamp with one timestamp provided 
in "YYYY-MM-DD" format and skipping the time part, it returns incorrect result. 

Steps to reproduce : 

1. Start a hive-cli. 
2. Fire up the query -> select cast("2016-12-31 12:00:00" as timestamp) > 
"2016-12-30";
3. Expected result : true
4. Actual result : NULL

Detailed description : 
If two primitives of different type needs to compared, a common comparator type 
is chosen. Prior to 2.1, Common type Text was chosen to compare Timestamp type 
and Text type. 

In version 2.1, Common type Timestamp is chosen to compare Timestamp type and 
Text type. This leads to converting Text type (YYYY-MM-DD) to be converted into 
java.sql.Timestamp which throws Exception saying the input is not in proper 
format. The exception is suppressed and a null is returned. 

Code below from org.apache.hadoop.hive.ql.exec.FunctionRegistry
{code:java}
if (pgA == PrimitiveGrouping.STRING_GROUP && pgB == 
PrimitiveGrouping.DATE_GROUP) {
  return b;
}
// date/timestamp is higher precedence than String_GROUP
if (pgB == PrimitiveGrouping.STRING_GROUP && pgA == 
PrimitiveGrouping.DATE_GROUP) {
  return a;
}
{code}


The bug was introduced in  
[HIVE-13381|https://issues.apache.org/jira/browse/HIVE-13381]
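
The precedence rule quoted from FunctionRegistry can be illustrated in isolation. The sketch below is a simplified model under stated assumptions: the class CommonComparisonType, the two-value Grouping enum and the TypeInfo holder are hypothetical stand-ins, not Hive's actual classes, and only the two branches shown in the issue are modelled.

```java
public class CommonComparisonType {

    /** Simplified stand-in for Hive's primitive groupings (only two are modelled). */
    public enum Grouping { STRING_GROUP, DATE_GROUP }

    /** Hypothetical minimal type descriptor: a name plus its grouping. */
    public static final class TypeInfo {
        public final String name;
        public final Grouping grouping;

        public TypeInfo(String name, Grouping grouping) {
            this.name = name;
            this.grouping = grouping;
        }
    }

    /**
     * Picks the type both operands are converted to before comparison.
     * Returns null when this sketch has no rule for the pair.
     */
    public static TypeInfo commonType(TypeInfo a, TypeInfo b) {
        Grouping pgA = a.grouping;
        Grouping pgB = b.grouping;
        // date/timestamp takes precedence over the string group
        if (pgA == Grouping.STRING_GROUP && pgB == Grouping.DATE_GROUP) {
            return b;
        }
        if (pgB == Grouping.STRING_GROUP && pgA == Grouping.DATE_GROUP) {
            return a;
        }
        return null;
    }

    public static void main(String[] args) {
        TypeInfo timestamp = new TypeInfo("timestamp", Grouping.DATE_GROUP);
        TypeInfo string = new TypeInfo("string", Grouping.STRING_GROUP);

        // Regardless of operand order, the string operand is the one
        // that gets converted, so it must parse as a timestamp.
        System.out.println(commonType(timestamp, string).name);
        System.out.println(commonType(string, timestamp).name);
    }
}
```

Because the string side is always the one converted, any string that the timestamp parser rejects, such as a date without a time part, ends up as NULL in the comparison.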


> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch


[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.

2016-11-28 Thread Dhiraj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Kumar updated HIVE-15291:

Attachment: HIVE-15291.1.patch

This patch fixes the issue trivially. 

> Comparison of timestamp fails if only date part is provided. 
> -
>
> Key: HIVE-15291
> URL: https://issues.apache.org/jira/browse/HIVE-15291
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.1.0
>Reporter: Dhiraj Kumar
> Attachments: HIVE-15291.1.patch


[jira] [Commented] (HIVE-13381) Timestamp & date should have precedence in type hierarchy than string group

2016-11-28 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701374#comment-15701374
 ] 

Dhiraj Kumar commented on HIVE-13381:
-

[~ashutoshc] This patch introduced a bug, logged as
[HIVE-15291|https://issues.apache.org/jira/browse/HIVE-15291]

> Timestamp & date should have precedence in type hierarchy than string group
> ---
>
> Key: HIVE-13381
> URL: https://issues.apache.org/jira/browse/HIVE-13381
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.1.0
>
> Attachments: HIVE-13381.2.patch, HIVE-13381.3.patch, HIVE-13381.patch
>
>
> Both SQL Server & Oracle treat date/timestamp higher in the hierarchy than
> varchars


