[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767233#comment-15767233 ] Jesus Camacho Rodriguez commented on HIVE-11652: [~dhiraj.kumar], thanks for creating the issue. I meant keeping position per node, not only a single position for last node. That is why it would add memory pressure. {quote} In fact that finding position itself is not a problem, but ASTNode.getChildren() invocation is problem. {quote} Yes, that was my guess when I checked the code and probably we should focus on this part. Each time we call that method, a new list is created and its children are copied to it; I do not not know the original reason why we overwrote _getChildren_ to do that in every call and if there is actually a good reason for that. I would expect that by default, we return the original list; in turn, we return a copy if it is explicitly pointed out (via boolean). In any case, we can continue the discussion in HIVE-15486. > Avoid expensive call to removeAll in DefaultGraphWalker > --- > > Key: HIVE-11652 > URL: https://issues.apache.org/jira/browse/HIVE-11652 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Physical Optimizer >Affects Versions: 1.3.0, 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, > HIVE-11652.patch > > > When the plan is too large, the removeAll call in DefaultGraphWalker (line > 140) will take very long as it will have to go through the list looking for > each of the nodes. We try to get rid of this call by rewriting the logic in > the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767191#comment-15767191 ] Dhiraj Kumar commented on HIVE-11652: - [~jcamachorodriguez] As suggested, I have create another bug at [https://issues.apache.org/jira/browse/HIVE-15486] I have attached a larger version of the same query that I had put here in the trail. It has 50K elements inside IN clause. Although it seems a convoluted query, we do have similar query running on our production system. We did not face problem until the upgrade to 2.1 from an older version. Although, this codepath is not the only problem for this query. But it accounts for 50% of the time consumed at hive processing. I did not get your suggestion of keeping last added child in the stack. Since the node peeked from stack would keep changing, the list of child would change too for that node. Keeping the last added won't be able to help since you have lost list of children once you peeked another node from stack. I believe you need to keep all the children of node as well. In fact that finding position itself is not a problem, but ASTNode.getChildren() invocation is problem. I have to pick something in ASTNode.java as well, why to add single child at at time to ret_vec. Why not return all the children() in one shot? {code} public ArrayList getChildren() { if (super.getChildCount() == 0) { return null; } ArrayList ret_vec = new ArrayList(); for (int i = 0; i < super.getChildCount(); ++i) { ret_vec.add((Node) super.getChild(i)); } return ret_vec; } {code} I will add some profiler output to bug as well. > Avoid expensive call to removeAll in DefaultGraphWalker > --- > > Key: HIVE-11652 > URL: https://issues.apache.org/jira/browse/HIVE-11652 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Physical Optimizer >Affects Versions: 1.3.0, 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, > HIVE-11652.patch > > > When the plan is too large, the removeAll call in DefaultGraphWalker (line > 140) will take very long as it will have to go through the list looking for > each of the nodes. We try to get rid of this call by rewriting the logic in > the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766640#comment-15766640 ] Jesus Camacho Rodriguez commented on HIVE-11652: PS. I have just checked the _getChildren_ method in _ASTNode_: it creates a copy of the list of the original children of the AST node. It might be possible to return the list itself for the walker, thus preventing the copy of the list, which is expensive. Cc [~dhiraj.kumar] > Avoid expensive call to removeAll in DefaultGraphWalker > --- > > Key: HIVE-11652 > URL: https://issues.apache.org/jira/browse/HIVE-11652 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Physical Optimizer >Affects Versions: 1.3.0, 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, > HIVE-11652.patch > > > When the plan is too large, the removeAll call in DefaultGraphWalker (line > 140) will take very long as it will have to go through the list looking for > each of the nodes. We try to get rid of this call by rewriting the logic in > the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766626#comment-15766626 ] Jesus Camacho Rodriguez commented on HIVE-11652: [~dhiraj.kumar], thanks for the feedback. FWIW, when we worked on this patch, we focused on the _removeAll_ call because it was causing slowness for many queries. We run tests on multiple workloads w/o the patch, and we did not observe performance regression wrt previous version. However, it is indeed possible that our queries were not covering all possible cases. The performance of the algorithm could be improved by storing the position of the last added child to the stack. However, this will impact the memory footprint too, as we will need to store an integer per AST node in the stack. I think that is acceptable, but we just should be aware that there is a trade-off here. I would suggest you open a new JIRA case pointing to this one, describe the problem, and provide some query where this issue is noticeable so we can reproduce the issue in a different environment. If you have a clear idea on the solution, you could contribute a patch. Otherwise, I will try to look into the issue asap. -- Btw, wrt 3., have you set _hive.driver.parallel.compilation_ to _true_? That should at least prevent the locking issue. > Avoid expensive call to removeAll in DefaultGraphWalker > --- > > Key: HIVE-11652 > URL: https://issues.apache.org/jira/browse/HIVE-11652 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Physical Optimizer >Affects Versions: 1.3.0, 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, > HIVE-11652.patch > > > When the plan is too large, the removeAll call in DefaultGraphWalker (line > 140) will take very long as it will have to go through the list looking for > each of the nodes. We try to get rid of this call by rewriting the logic in > the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764924#comment-15764924 ] Dhiraj Kumar commented on HIVE-11652: - [~jcamachorodriguez] The patch causes a performance issue. Example query. {code}select a from (select 1 as a ) tbl where a in (1,2,3,4,5,6,7,8,9,10);{code} Method source {code} protected void walk(Node nd) throws SemanticException { // Push the node in the stack opStack.push(nd); // While there are still nodes to dispatch... while (!opStack.empty()) { Node node = opStack.peek(); if (node.getChildren() == null || getDispatchedList().containsAll(node.getChildren())) { // Dispatch current node if (!getDispatchedList().contains(node)) { dispatch(node, opStack); opQueue.add(node); } opStack.pop(); continue; } // Add a single child and restart the loop for (Node childNode : node.getChildren()) { if (!getDispatchedList().contains(childNode)) { opStack.push(childNode); break; } } } // end while } {code} The walk method will push the root node onto stack (where clause in this case, which has 12 child) and will call all its direct child at line 166. It will process single child (in this example) and will again invoke node.getChildren(). A total of 12 invocation of getChildren() will be made. Now, if in clause has huge list, it will causes 1. As many invocation of getChildren() method as there are children. So if "in clause" has 50K values, getChildren() will be invoked 50K times. 2. Huge number of nodes and their repeated invocation puts memory pressure in ASTNode.getChildren(). Since it returns all the children in every case. 3. Since the thread has taken a lock initially before compilation started, it blocks another compilation to make progress. Depending on the query, it is order of magnitude slower. > Avoid expensive call to removeAll in DefaultGraphWalker > --- > > Key: HIVE-11652 > URL: https://issues.apache.org/jira/browse/HIVE-11652 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Physical Optimizer >Affects Versions: 1.3.0, 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, > HIVE-11652.patch > > > When the plan is too large, the removeAll call in DefaultGraphWalker (line > 140) will take very long as it will have to go through the list looking for > each of the nodes. We try to get rid of this call by rewriting the logic in > the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720762#comment-14720762 ] Jesus Camacho Rodriguez commented on HIVE-11652: [~hsubramaniyan]/[~ashutoshc], could you take a look? Thanks Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720810#comment-14720810 ] Ashutosh Chauhan commented on HIVE-11652: - +1 LGTM, while your committing it will be good to add comments describing role of following data structures in walker: * opStack * opQueue * toWalk Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720660#comment-14720660 ] Hive QA commented on HIVE-11652: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752973/HIVE-11652.02.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12752973 - PreCommit-HIVE-TRUNK-Build Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720870#comment-14720870 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11652: -- +1, this should nullify HIVE-11341. Thanks Hari Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716683#comment-14716683 ] Hive QA commented on HIVE-11652: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752607/HIVE-11652.01.patch {color:red}ERROR:{color} -1 due to 330 failed/errored test(s), 9379 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.accumulo.predicate.TestAccumuloPredicateHandler.testCreateIteratorSettings org.apache.hadoop.hive.accumulo.predicate.TestAccumuloPredicateHandler.testIteratorIgnoreRowIDFields org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_insert_common_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_multi_insert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input1_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into_with_schema org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715528#comment-14715528 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11652: -- [~jcamachorodriguez] You may want to look at the latest patch of HIVE-11341 where a similar issue is being looked into by me. {code} // While all its descendants have not been dispatched, // we do not move forward while(!childrenDispatched) { for (Node childNode : nd.getChildren()) { walk(childNode); } childrenDispatched = getDispatchedList().containsAll(nd.getChildren()); } {code} In the above change, I see that you are using the system stack here instead of toWalk list to traverse the children nodes. However a couple of points : 1. Why do you need the 'while(!childrenDispatched)' condition at all. Wont the below code work? {code} if (!getDispatchedList().contains(childNode)) { walk(childNode). } {code} 2. we should make sure that walk() is not called again for a node already dispatched, we should make sure that this logic is present in startWalking as well {code} public void startWalking(CollectionNode startNodes, HashMapNode, Object nodeOutput) throws SemanticException { for Node (nd : startNodes) { // If already dispatched list, continue. if (getDispatchedList().contains(nd)) { continue; } walk(nd); if (nodeOutput != null getDispatchedList().contains(nd)) { nodeOutput.put(nd, retMap.get(nd)); } } } {code} In short, if you are using the system stack via recursion you can eliminate the use of toWalk data structure all together for DefaultGraphWalker. Otherwise, you can replace toWalk to be a stack instead of an arraylist. Thanks Hari Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713645#comment-14713645 ] Hive QA commented on HIVE-11652: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12752449/HIVE-11652.patch {color:red}ERROR:{color} -1 due to 635 failed/errored test(s), 9376 tests executed *Failed tests:* {noformat} TestCustomAuthentication - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_table_bincolserde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_table_colserde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast_qualified_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_not_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_udf_udaf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_udaf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_windowing org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_nested_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cp_sel org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_translate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cteViews org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_6
[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
[ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715763#comment-14715763 ] Jesus Camacho Rodriguez commented on HIVE-11652: [~hsubramaniyan], thanks for your comments. I checked the patch in HIVE-11341, but my intention in this patch is a bit different: I'd like to understand fully the logic and dependencies, and get rid of the remove call completely. The patch that I uploaded was WIP and indeed not ready: I just wanted to trigger a QA run to check tests where we would possibly hit different issues. I just uploaded a new one that I think that moves in the right direction. As part of this patch, I would like to get rid of the removeAll call in ForwardWalker, ColumnPruner, and ConstantPropagate too, so it is still WIP. Avoid expensive call to removeAll in DefaultGraphWalker --- Key: HIVE-11652 URL: https://issues.apache.org/jira/browse/HIVE-11652 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11652.01.patch, HIVE-11652.patch When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will take very long as it will have to go through the list looking for each of the nodes. We try to get rid of this call by rewriting the logic in the walker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)