[jira] [Assigned] (HIVE-15486) DefaultGraphWalker invokes getChildren() as many times as there are children
[ https://issues.apache.org/jira/browse/HIVE-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar reassigned HIVE-15486:
-----------------------------------

    Assignee: Dhiraj Kumar

> DefaultGraphWalker invokes getChildren() as many times as there are children
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15486
>                 URL: https://issues.apache.org/jira/browse/HIVE-15486
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Dhiraj Kumar
>            Assignee: Dhiraj Kumar
>         Attachments: query
>
>
> While walking a Node, DefaultGraphWalker calls the getChildren() method as
> many times as the node has children. This causes a performance penalty when
> a node has many direct children.
> A query file is attached. Instructions to run:
> 1. time hive -f query
> Compare the time on Hive 1.2 with the time on Hive 2.1.
> This change was introduced in
> [HIVE-11652|https://issues.apache.org/jira/browse/HIVE-11652]

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811588#comment-15811588 ]

Dhiraj Kumar commented on HIVE-15531:
-------------------------------------

If commons-logging is not used by Hive, then log4j-jcl is not required. But if commons-logging is used by Hadoop, a bridge is required.
By default, log4j-1.2-api acts as the bridge for Hadoop logging. It was added in the commit https://github.com/apache/hive/commit/c93d6c77e31e2eb9b40f5167ab3491d44eae351a
I also agree that with a properly ordered classpath this issue will not surface at all.
I would have preferred log4j-jcl, because it removes the dependency on classpath ordering as well as the effect of user libraries on the classpath (placed ahead of Hive's log4j libraries).
I will leave the decision to you and close the discussion here.

> Hive breaks Hadoop commons logging with log4j2
> ----------------------------------------------
>
>                 Key: HIVE-15531
>                 URL: https://issues.apache.org/jira/browse/HIVE-15531
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Dhiraj Kumar
>            Assignee: Dhiraj Kumar
>            Priority: Minor
>         Attachments: HIVE-15531.patch
>
>
> Hadoop (2.7), which uses commons-logging, is not compatible with log4j2
> without a bridge.
> The bridge is missing in Hive.
> As a result, commons-logging initialises a log4j (1.2) Logger, does not
> configure it properly because the configuration for it is missing, and sends
> logging output to stdout (the default).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810881#comment-15810881 ]

Dhiraj Kumar commented on HIVE-15531:
-------------------------------------

{quote}
Does Hive logger still get log4j2, and Hadoop logger log4j?
{quote}
That is possible if the classpath is messed up. It depends on which extra libraries the end user has put on the classpath. That is exactly what happened in my case, as explained in an earlier comment.
This patch makes sure that even if end users mess up their classpath (from the log4j perspective only), logging is still set up properly and does not run into the log4j vs log4j2 issue.
I did not look at it from the slf4j perspective.
[~prasanth_j], can you have a look?
[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803961#comment-15803961 ]

Dhiraj Kumar commented on HIVE-15531:
-------------------------------------

[~sershe] I considered test cases along the following lines:
a) Manipulate the classpath at runtime to put the log4j 1.2 libraries before the log4j 2.4 libraries, and show that commons-logging picks the older version of the logger.
b) Keep log4j 1.2 at the beginning, put log4j-jcl anywhere on the classpath, and show that commons-logging picks log4j 2.4.
The problem with this approach is that I have to manipulate the classpath at runtime, which might affect other tests downstream. Moreover, I have not been able to find a clean way to do it.
Would a trivial test like this suffice?
{code}
Log log = LogFactory.getLog(CommonsLoggingTest.class);
// JUnit convention: expected value first, actual value second
assertEquals(org.apache.logging.log4j.jcl.Log4jLog.class, log.getClass());
{code}
[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800899#comment-15800899 ]

Dhiraj Kumar commented on HIVE-15531:
-------------------------------------

Related issue [HIVE-11572|https://issues.apache.org/jira/browse/HIVE-11572]
[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15531:
--------------------------------

    Description:
Hadoop (2.7), which is using Commons-logging is not compatible with log4j2 without bridge.
The bridge is missing in Hive.
This leads to a problem whereby commons-logging initialises a log4j (1.2) version Logger, does not configure it properly since configuration for it is missing and sends logging output to stdout (the default).

  was:
Hadoop (2.7), which is using Commons-logging is not compatible with log4j2 without bridge.
The bridge is missing in Hive.
This leads to a problem whereby commons-logging initialises a log4j (1.2) version, does not configure it properly since configuration for it is missing and sends logging output to stdout (the default).
[jira] [Commented] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798980#comment-15798980 ]

Dhiraj Kumar commented on HIVE-15531:
-------------------------------------

[~sershe], I dug deeper into the bug and found that it was an issue with our environment. There were two problems at our end:
1. A UDF jar near the beginning of the classpath contained log4j (1.2) classes, and commons-logging picked them up.
2. Another jar containing a basic log4j.properties was picked up for configuration, because the Hive configuration directory (first on the classpath) had log4j2.properties but no log4j.properties.

So, even without log4j-jcl (the current patch), logging may work fine, because log4j-1.2-api (already part of Hive 2.x) provides the org.apache.log4j.Logger class and initialises it properly with the log4j2 configuration. It works like a bridge, provided that no jar ahead of log4j-1.2-api on the classpath also contains org.apache.log4j.Logger.

log4j-jcl overcomes the classpath issue because commons-logging looks for a LogFactory and log4j-jcl injects itself there, so classpath order does not matter with it. The snippet below is from org.apache.commons.logging.LogFactory:
{code}
// Second, try to find a service by using the JDK1.3 class
// discovery mechanism, which involves putting a file with the name
// of an interface class in the META-INF/services directory, where the
// contents of the file is a single line specifying a concrete class
// that implements the desired interface.

if (factory == null) {
    if (isDiagnosticsEnabled()) {
        logDiagnostic("[LOOKUP] Looking for a resource file of name [" + SERVICE_ID +
                      "] to define the LogFactory subclass to use...");
    }
    try {
        final InputStream is = getResourceAsStream(contextClassLoader, SERVICE_ID);
{code}

Since this issue depends on the classpath, I am thinking of adding log4j 1.2 to the classpath with test scope and checking that commons-logging loads the right class. I would like to know your thoughts.
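The two lookup behaviours described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not load real classes or call commons-logging, the jar names and the `JARS` catalog are hypothetical, and the resolution order is reduced to the two steps the comment discusses (service file first, then whichever jar supplies `org.apache.log4j.Logger` first). The factory class name `org.apache.logging.log4j.jcl.LogFactoryImpl` is what log4j-jcl is believed to register and should be checked against the jar in use.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class ClasspathOrderDemo {
    static final String SERVICE_ID = "META-INF/services/org.apache.commons.logging.LogFactory";
    static final String LOG4J1_LOGGER = "org/apache/log4j/Logger.class";

    // Toy catalog (hypothetical): jar name -> (resource path -> content).
    static final Map<String, Map<String, String>> JARS = new LinkedHashMap<>();
    static {
        JARS.put("udf.jar",
                Collections.singletonMap(LOG4J1_LOGGER, "log4j 1.2 classes bundled in a UDF jar"));
        JARS.put("log4j-1.2-api.jar",
                Collections.singletonMap(LOG4J1_LOGGER, "log4j2 compatibility API"));
        JARS.put("log4j-jcl.jar",
                Collections.singletonMap(SERVICE_ID, "org.apache.logging.log4j.jcl.LogFactoryImpl"));
    }

    /** Name of the first jar, in classpath order, that contains the resource; null if none. */
    static String firstJarWith(String resource, String... classpath) {
        for (String jar : classpath) {
            Map<String, String> contents =
                    JARS.getOrDefault(jar, Collections.<String, String>emptyMap());
            if (contents.containsKey(resource)) {
                return jar;
            }
        }
        return null;
    }

    /** Simplified resolution: a service file wins; otherwise the first Logger class wins. */
    static String resolve(String... classpath) {
        String serviceJar = firstJarWith(SERVICE_ID, classpath);
        if (serviceJar != null) {
            return "LogFactory " + JARS.get(serviceJar).get(SERVICE_ID) + " via " + serviceJar;
        }
        String loggerJar = firstJarWith(LOG4J1_LOGGER, classpath);
        return loggerJar == null
                ? "no log4j 1.2 Logger found"
                : "org.apache.log4j.Logger from " + loggerJar;
    }

    public static void main(String[] args) {
        // UDF jar first: its bundled log4j 1.2 classes shadow the bridge.
        System.out.println(resolve("udf.jar", "log4j-1.2-api.jar"));
        // Bridge first: log4j-1.2-api supplies the Logger class.
        System.out.println(resolve("log4j-1.2-api.jar", "udf.jar"));
        // With log4j-jcl present, the service file decides regardless of position.
        System.out.println(resolve("udf.jar", "log4j-1.2-api.jar", "log4j-jcl.jar"));
    }
}
```

In this model the first two scenarios differ only in jar order, which is the environmental problem reported above, while the third is insensitive to where log4j-jcl sits. If several jars shipped the same service file, order would matter again for that file.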
[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15531:
--------------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15531:
--------------------------------

    Status: In Progress  (was: Patch Available)
[jira] [Updated] (HIVE-15531) Hive breaks Hadoop commons logging with log4j2
[ https://issues.apache.org/jira/browse/HIVE-15531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15531:
--------------------------------

    Attachment: HIVE-15531.patch
[jira] [Comment Edited] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767191#comment-15767191 ]

Dhiraj Kumar edited comment on HIVE-11652 at 12/21/16 2:33 PM:
---------------------------------------------------------------

[~jcamachorodriguez] As suggested, I have created another bug at [HIVE-15486|https://issues.apache.org/jira/browse/HIVE-15486]

I have attached a larger version of the query that I posted earlier in this thread. It has 50K elements inside the IN clause.

Although the query looks convoluted, we do have similar queries running on our production system. We did not face the problem until we upgraded to 2.1 from an older version. This code path is not the only problem for this query, but it accounts for 50% of the time consumed in Hive processing.

I did not understand your suggestion of keeping the last added child in the stack. Since the node peeked from the stack keeps changing, the list of children changes with it. Keeping only the last added child does not help, because the list of children is lost once another node is peeked from the stack. I believe you need to keep all the children of the node as well. In fact, finding the position is not itself the problem; the ASTNode.getChildren() invocation is.

I also have a question about ASTNode.java: why add a single child at a time to ret_vec? Why not return all the children in one shot?
{code}
public ArrayList getChildren() {
  if (super.getChildCount() == 0) {
    return null;
  }

  ArrayList ret_vec = new ArrayList();
  for (int i = 0; i < super.getChildCount(); ++i) {
    ret_vec.add((Node) super.getChild(i));
  }

  return ret_vec;
}
{code}

I will add some profiler output to the bug as well.

> Avoid expensive call to removeAll in DefaultGraphWalker
> -------------------------------------------------------
>
>                 Key: HIVE-11652
>                 URL: https://issues.apache.org/jira/browse/HIVE-11652
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Physical Optimizer
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch,
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line
> 140) will take very long as it will have to go through the list looking for
> each of the nodes. We try to get rid of this call by rewriting the logic in
> the walker.
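The "one shot" question raised in the comment above can be sketched as follows. This is a minimal illustration, assuming a stand-in base class (`ToyTree`, hypothetical) that keeps its children in a list, as the parser's tree class is assumed to do. It is not the actual ASTNode code and not necessarily the change made in Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChildCopyDemo {
    /** Hypothetical stand-in for the parser's tree base class. */
    static class ToyTree {
        protected final List<Object> children = new ArrayList<>();

        int getChildCount() { return children.size(); }

        Object getChild(int i) { return children.get(i); }
    }

    static class ToyAstNode extends ToyTree {
        /** Element-by-element copy, mirroring the shape of the quoted method. */
        List<Object> getChildrenOneByOne() {
            if (getChildCount() == 0) {
                return null;
            }
            List<Object> ret = new ArrayList<>();
            for (int i = 0; i < getChildCount(); ++i) {
                ret.add(getChild(i));
            }
            return ret;
        }

        /** Bulk copy: the result list is sized once and filled in a single call. */
        List<Object> getChildrenBulk() {
            return children.isEmpty() ? null : new ArrayList<>(children);
        }
    }

    public static void main(String[] args) {
        ToyAstNode node = new ToyAstNode();
        node.children.addAll(Arrays.asList("a", "IN", 1, 2, 3));
        // Both variants return the same children in the same order.
        System.out.println(node.getChildrenOneByOne().equals(node.getChildrenBulk()));
    }
}
```

Both variants still allocate a new list on every call, so the bulk copy only trims the per-call cost. Reducing the number of getChildren() calls, as discussed in HIVE-15486, addresses the larger part of the problem.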
[jira] [Updated] (HIVE-15486) DefaultGraphWalker invokes getChildren() as many times as there are children
[ https://issues.apache.org/jira/browse/HIVE-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15486:
--------------------------------

    Description:
DefaultGraphWalker while walking a Node, calls up getChildren method as many times as there are children. This leads a performance penalty where a node has too many direct children.
Attached is query file. Instruction to run
1. time hive -f query
Checkout the time on hive 1.2 version vs 2.1 version.
This change was introduced in [HIVE-11652|https://issues.apache.org/jira/browse/HIVE-11652]

  was:
DefaultGraphWalker while walking a Node, calls up getChildren method as many times as there are children. This leads a performance penalty where a node has too many direct children.
Attached is query file. Instructions to run
1. time hive -f query
Checkout the time on 1.2 version vs 2.1 version.
This change was introduced in [HIVE-11652|https://issues.apache.org/jira/browse/HIVE-11652]
[jira] [Updated] (HIVE-15486) DefaultGraphWalker invokes getChildren() as many times as there are children
[ https://issues.apache.org/jira/browse/HIVE-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15486:
--------------------------------

    Attachment: query
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764924#comment-15764924 ]

Dhiraj Kumar commented on HIVE-11652:
-------------------------------------

[~jcamachorodriguez]
The patch causes a performance issue.

Example query:
{code}select a from (select 1 as a ) tbl where a in (1,2,3,4,5,6,7,8,9,10);{code}

Method source:
{code}
protected void walk(Node nd) throws SemanticException {
  // Push the node in the stack
  opStack.push(nd);

  // While there are still nodes to dispatch...
  while (!opStack.empty()) {
    Node node = opStack.peek();

    if (node.getChildren() == null ||
        getDispatchedList().containsAll(node.getChildren())) {
      // Dispatch current node
      if (!getDispatchedList().contains(node)) {
        dispatch(node, opStack);
        opQueue.add(node);
      }
      opStack.pop();
      continue;
    }

    // Add a single child and restart the loop
    for (Node childNode : node.getChildren()) {
      if (!getDispatchedList().contains(childNode)) {
        opStack.push(childNode);
        break;
      }
    }
  } // end while
}
{code}

The walk method pushes the root node onto the stack (the WHERE clause in this case, which has 12 children) and fetches all of its direct children at line 166. It processes a single child (in this example) and then invokes node.getChildren() again. In total, getChildren() is invoked 12 times.
If the IN clause has a huge list, this causes:
1. As many invocations of getChildren() as there are children. If the IN clause has 50K values, getChildren() is invoked 50K times.
2. Memory pressure in ASTNode.getChildren(), because of the huge number of nodes and the repeated invocations, each of which returns all the children.
3. Blocking of other compilations, because the thread took a lock before compilation started.

Depending on the query, it is an order of magnitude slower.
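The invocation pattern described in the comment above can be reproduced with a toy model. The sketch below assumes a hypothetical `CountingNode` class that counts its own getChildren() calls and returns a fresh copy each time. `walkAsQuoted` mirrors the shape of the quoted loop and `walkCached` shows one possible way to fetch each node's children only once. Neither is Hive code, and `walkCached` is not claimed to be the fix adopted for HIVE-15486.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WalkerCallCount {
    /** Hypothetical node that counts how often its children are requested. */
    static final class CountingNode {
        final List<CountingNode> children = new ArrayList<>();
        int getChildrenCalls = 0;

        List<CountingNode> getChildren() {
            getChildrenCalls++;
            // Like the quoted ASTNode method: null when empty, otherwise a new copy.
            return children.isEmpty() ? null : new ArrayList<>(children);
        }
    }

    /** Builds one parent with n leaf children, like a flat IN list. */
    static CountingNode parentWith(int n) {
        CountingNode parent = new CountingNode();
        for (int i = 0; i < n; i++) {
            parent.children.add(new CountingNode());
        }
        return parent;
    }

    /** Same loop shape as the quoted walk(); returns the root's getChildren() call count. */
    static int walkAsQuoted(CountingNode root) {
        Set<CountingNode> dispatched = new HashSet<>();
        Deque<CountingNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            CountingNode node = stack.peek();
            if (node.getChildren() == null || dispatched.containsAll(node.getChildren())) {
                dispatched.add(node);
                stack.pop();
                continue;
            }
            for (CountingNode child : node.getChildren()) {
                if (!dispatched.contains(child)) {
                    stack.push(child);
                    break;
                }
            }
        }
        return root.getChildrenCalls;
    }

    /** Variant that remembers each node's children after the first request. */
    static int walkCached(CountingNode root) {
        Map<CountingNode, List<CountingNode>> childrenOf = new IdentityHashMap<>();
        Set<CountingNode> dispatched = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<CountingNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            CountingNode node = stack.peek();
            List<CountingNode> kids = childrenOf.computeIfAbsent(node, n -> {
                List<CountingNode> c = n.getChildren();
                return c == null ? Collections.<CountingNode>emptyList() : c;
            });
            CountingNode next = null;
            for (CountingNode kid : kids) {
                if (!dispatched.contains(kid)) {
                    next = kid;
                    break;
                }
            }
            if (next == null) {
                dispatched.add(node);
                stack.pop();
            } else {
                stack.push(next);
            }
        }
        return root.getChildrenCalls;
    }

    public static void main(String[] args) {
        System.out.println("quoted loop, 12 children: " + walkAsQuoted(parentWith(12)));
        System.out.println("cached walk, 12 children: " + walkCached(parentWith(12)));
    }
}
```

Because the toy counts every call site in the loop, the root is asked for its children 3n + 2 times for n leaf children (38 for n = 12), and each call copies all n children, so the total work grows quadratically. The cached variant asks once per node, although it still rescans the child list to find the next undispatched child.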
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------

    Hadoop Flags:   (was: Incompatible change)

> Comparison of timestamp fails if only date part is provided.
> -------------------------------------------------------------
>
>                 Key: HIVE-15291
>                 URL: https://issues.apache.org/jira/browse/HIVE-15291
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, UDF
>    Affects Versions: 2.1.0
>            Reporter: Dhiraj Kumar
>            Assignee: Dhiraj Kumar
>         Attachments: HIVE-15291.1.patch, HIVE-15291.2.patch,
> HIVE-15291.3.patch, HIVE-15291.4.patch
>
>
> Summary: If a query compares two timestamps and one of them is provided in
> "YYYY-MM-DD" format, without the time part, it returns an incorrect result.
> Steps to reproduce:
> 1. Start hive-cli.
> 2. Run the query: select cast("2016-12-31 12:00:00" as timestamp) >
> "2016-12-30";
> 3. Expected result: true
> 4. Actual result: NULL
> Detailed description:
> If two primitives of different types need to be compared, a common
> comparator type is chosen. Prior to 2.1, the common type Text was chosen to
> compare the Timestamp type and the Text type.
> In version 2.1, the common type Timestamp is chosen to compare the Timestamp
> type and the Text type. This leads to converting the Text value (YYYY-MM-DD)
> into java.sql.Timestamp, which throws an exception saying the input is not in
> the proper format. The exception is suppressed and a null is returned.
> The code below is from org.apache.hadoop.hive.ql.exec.FunctionRegistry:
> {code:java}
> if (pgA == PrimitiveGrouping.STRING_GROUP && pgB == PrimitiveGrouping.DATE_GROUP) {
>   return b;
> }
> // date/timestamp is higher precedence than String_GROUP
> if (pgB == PrimitiveGrouping.STRING_GROUP && pgA == PrimitiveGrouping.DATE_GROUP) {
>   return a;
> }
> {code}
> The bug was introduced in
> [HIVE-13381|https://issues.apache.org/jira/browse/HIVE-13381]
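The conversion failure in the description above can be reproduced outside Hive with plain JDK classes. The sketch below assumes hypothetical helper names (`strictParse`, `lenientParse`, `greaterThan`) and is not taken from the HIVE-15291 patches. It only shows that `java.sql.Timestamp.valueOf` rejects a date-only string, and that falling back to `java.sql.Date.valueOf` is one way to accept it.

```java
import java.sql.Date;
import java.sql.Timestamp;

class DateOnlyTimestampDemo {
    /** Mirrors the reported behaviour: a parse failure is swallowed and becomes null. */
    static Timestamp strictParse(String s) {
        try {
            return Timestamp.valueOf(s);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /** Hypothetical lenient variant: falls back to a date-only parse at local midnight. */
    static Timestamp lenientParse(String s) {
        try {
            return Timestamp.valueOf(s);
        } catch (IllegalArgumentException notATimestamp) {
            try {
                return new Timestamp(Date.valueOf(s).getTime());
            } catch (IllegalArgumentException notADate) {
                return null;
            }
        }
    }

    /** SQL-style comparison: null if either side is null. */
    static Boolean greaterThan(Timestamp left, Timestamp right) {
        if (left == null || right == null) {
            return null;
        }
        return left.after(right);
    }

    public static void main(String[] args) {
        Timestamp left = Timestamp.valueOf("2016-12-31 12:00:00");
        // Strict parsing of the date-only string fails, so the comparison yields null.
        System.out.println(greaterThan(left, strictParse("2016-12-30")));
        // Lenient parsing accepts the date-only string, so the comparison yields true.
        System.out.println(greaterThan(left, lenientParse("2016-12-30")));
    }
}
```

Both parsers interpret the text in the JVM's default time zone, so the comparison in this example does not depend on the zone.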
[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711714#comment-15711714 ]

Dhiraj Kumar commented on HIVE-15291:
-------------------------------------

The flag was set because the result is incompatible with Hive 1.2. But since we agree that it is a bug, it is no longer an "Incompatible change".
[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708407#comment-15708407 ]

Dhiraj Kumar commented on HIVE-15291:
-------------------------------------

Added the .q test.
[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708409#comment-15708409 ]

Dhiraj Kumar commented on HIVE-15291:
-------------------------------------

Added the .q test.
[jira] [Issue Comment Deleted] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Comment: was deleted

(was: Added the .q test.)
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Attachment: HIVE-15291.4.patch
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Attachment: HIVE-15291.3.patch
[jira] [Commented] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705823#comment-15705823 ]

Dhiraj Kumar commented on HIVE-15291:
-------------------------------------

Hello [~pvary],
I am removing the extraneous test case. I agree that testGetTimestampFromString is sufficient to unit test the change.
[~jdere], can you please review the change?
--Dhiraj
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Attachment: HIVE-15291.2.patch

This has unit test cases.
[jira] [Assigned] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar reassigned HIVE-15291:
-----------------------------------
    Assignee: Dhiraj Kumar
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Description: 
Summary : If a query needs to compare two timestamps with one timestamp provided in "YYYY-MM-DD" format, skipping the time part, it returns incorrect result.
Steps to reproduce :
1. Start a hive-cli.
2. Fire up the query -> select cast("2016-12-31 12:00:00" as timestamp) > "2016-12-30";
3. Expected result : true
4. Actual result : NULL
Detailed description :
If two primitives of different types need to be compared, a common comparator type is chosen. Prior to 2.1, Common type Text was chosen to compare Timestamp type and Text type.
In version 2.1, Common type Timestamp is chosen to compare Timestamp type and Text type. This leads to converting Text type (YYYY-MM-DD) into java.sql.Timestamp which throws exception saying the input is not in proper format. The exception is suppressed and a null is returned.
Code below from org.apache.hadoop.hive.ql.exec.FunctionRegistry
{code:java}
if (pgA == PrimitiveGrouping.STRING_GROUP && pgB == PrimitiveGrouping.DATE_GROUP) {
  return b;
}
// date/timestamp is higher precedence than String_GROUP
if (pgB == PrimitiveGrouping.STRING_GROUP && pgA == PrimitiveGrouping.DATE_GROUP) {
  return a;
}
{code}
The bug was introduced in [HIVE-13381|https://issues.apache.org/jira/browse/HIVE-13381]

  was:
Summary : If a query needs to compare two timestamps with one timestamp provided in "YYYY-MM-DD" format and skipping the time part, it returns incorrect result.
Steps to reproduce :
1. Start a hive-cli.
2. Fire up the query -> select cast("2016-12-31 12:00:00" as timestamp) > "2016-12-30";
3. Expected result : true
4. Actual result : NULL
Detailed description :
If two primitives of different types need to be compared, a common comparator type is chosen. Prior to 2.1, Common type Text was chosen to compare Timestamp type and Text type.
In version 2.1, Common type Timestamp is chosen to compare Timestamp type and Text type. This leads to converting Text type (YYYY-MM-DD) to be converted into java.sql.Timestamp which throws Exception saying the input is not in proper format. The exception is suppressed and a null is returned.
Code below from org.apache.hadoop.hive.ql.exec.FunctionRegistry
{code:java}
if (pgA == PrimitiveGrouping.STRING_GROUP && pgB == PrimitiveGrouping.DATE_GROUP) {
  return b;
}
// date/timestamp is higher precedence than String_GROUP
if (pgB == PrimitiveGrouping.STRING_GROUP && pgA == PrimitiveGrouping.DATE_GROUP) {
  return a;
}
{code}
The bug was introduced in [HIVE-13381|https://issues.apache.org/jira/browse/HIVE-13381]
[jira] [Updated] (HIVE-15291) Comparison of timestamp fails if only date part is provided.
[ https://issues.apache.org/jira/browse/HIVE-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhiraj Kumar updated HIVE-15291:
--------------------------------
    Attachment: HIVE-15291.1.patch

This patch fixes the issue trivially.
[jira] [Commented] (HIVE-13381) Timestamp & date should have precedence in type hierarchy than string group
[ https://issues.apache.org/jira/browse/HIVE-13381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701374#comment-15701374 ]

Dhiraj Kumar commented on HIVE-13381:
-------------------------------------

[~ashutoshc] This patch introduced a bug, logged as [HIVE-15291|https://issues.apache.org/jira/browse/HIVE-15291].

> Timestamp & date should have precedence in type hierarchy than string group
> ---------------------------------------------------------------------------
>
> Key: HIVE-13381
> URL: https://issues.apache.org/jira/browse/HIVE-13381
> Project: Hive
> Issue Type: Bug
> Components: Types
> Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Fix For: 2.1.0
>
> Attachments: HIVE-13381.2.patch, HIVE-13381.3.patch, HIVE-13381.patch
>
>
> Both SQL Server and Oracle treat date/timestamp as higher in the type
> hierarchy than varchars.
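To see why the choice of common comparison type matters for the query reported in HIVE-15291, the two behaviors can be contrasted in a small standalone model. This is a sketch under stated assumptions, not Hive's implementation: `ComparisonTypeSketch`, `greaterAsText` and `greaterAsTimestamp` are hypothetical names, both operands are passed as strings for simplicity, and a Java `null` stands in for SQL NULL.

```java
import java.sql.Timestamp;

// Illustrative model only; class and method names are hypothetical, not Hive's.
final class ComparisonTypeSketch {

    /** Common type Text: compare the textual forms lexicographically. */
    static Boolean greaterAsText(String left, String right) {
        return left.compareTo(right) > 0;
    }

    /** Common type Timestamp: convert both sides first; null if a conversion fails. */
    static Boolean greaterAsTimestamp(String left, String right) {
        try {
            return Timestamp.valueOf(left).after(Timestamp.valueOf(right));
        } catch (IllegalArgumentException e) {
            // Conversion failed, so the result is unknown (modelled as null).
            return null;
        }
    }

    public static void main(String[] args) {
        String ts = "2016-12-31 12:00:00";
        String dateOnly = "2016-12-30";

        // Text as the common type: the date-only operand compares fine.
        System.out.println(greaterAsText(ts, dateOnly));       // prints "true"

        // Timestamp as the common type: the date-only operand cannot be converted.
        System.out.println(greaterAsTimestamp(ts, dateOnly));  // prints "null"
    }
}
```

With a full "yyyy-mm-dd hh:mm:ss" string on both sides, `greaterAsTimestamp` returns a proper boolean; only the date-only operand triggers the null.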