[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767233#comment-15767233
 ] 

Jesus Camacho Rodriguez commented on HIVE-11652:


[~dhiraj.kumar], thanks for creating the issue.

I meant keeping position per node, not only a single position for last node. 
That is why it would add memory pressure.

{quote}
In fact that finding position itself is not a problem, but 
ASTNode.getChildren() invocation is problem.
{quote}
Yes, that was my guess when I checked the code and probably we should focus on 
this part.
Each time we call that method, a new list is created and its children are 
copied to it; I do not not know the original reason why we overwrote 
_getChildren_ to do that in every call and if there is actually a good reason 
for that. I would expect that by default, we return the original list; in turn, 
we return a copy if it is explicitly pointed out (via boolean).

In any case, we can continue the discussion in HIVE-15486.

> Avoid expensive call to removeAll in DefaultGraphWalker
> ---
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-21 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767191#comment-15767191
 ] 

Dhiraj Kumar commented on HIVE-11652:
-

[~jcamachorodriguez] As suggested, I have create another bug at 
[https://issues.apache.org/jira/browse/HIVE-15486] 

I have attached a larger version of the same query that I had put here in the 
trail. It has 50K elements inside IN clause. Although it seems a convoluted 
query, we do have similar query running on our production system. 

We did not face problem until the upgrade to 2.1 from an older version. 
Although, this codepath is not the only problem for this query. But it accounts 
for 50% of the time consumed at hive processing. 

I did not get your suggestion of keeping last added child in the stack. Since 
the node peeked from stack would keep changing,  the list of child would change 
too for that node. Keeping the last added won't be able to help since you have 
lost list of children once you peeked another node from stack. I believe you 
need to keep all the children of node as well. 

In fact that finding position itself is not a problem, but 
ASTNode.getChildren() invocation is problem.  

I have to pick something in ASTNode.java as well,  why to add single child at 
at time to ret_vec. Why not return all the children() in one shot? 

{code}
public ArrayList getChildren() {
if (super.getChildCount() == 0) {
  return null;
}

ArrayList ret_vec = new ArrayList();
for (int i = 0; i < super.getChildCount(); ++i) {
  ret_vec.add((Node) super.getChild(i));
}

return ret_vec;
  }
{code}

I will add some profiler output to bug as well. 

> Avoid expensive call to removeAll in DefaultGraphWalker
> ---
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766640#comment-15766640
 ] 

Jesus Camacho Rodriguez commented on HIVE-11652:


PS. I have just checked the _getChildren_ method in _ASTNode_: it creates a 
copy of the list of the original children of the AST node. It might be possible 
to return the list itself for the walker, thus preventing the copy of the list, 
which is expensive.

Cc [~dhiraj.kumar]

> Avoid expensive call to removeAll in DefaultGraphWalker
> ---
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766626#comment-15766626
 ] 

Jesus Camacho Rodriguez commented on HIVE-11652:


[~dhiraj.kumar], thanks for the feedback.

FWIW, when we worked on this patch, we focused on the _removeAll_ call because 
it was causing slowness for many queries. We run tests on multiple workloads 
w/o the patch, and we did not observe performance regression wrt previous 
version. However, it is indeed possible that our queries were not covering all 
possible cases.

The performance of the algorithm could be improved by storing the position of 
the last added child to the stack. However, this will impact the memory 
footprint too, as we will need to store an integer per AST node in the stack. I 
think that is acceptable, but we just should be aware that there is a trade-off 
here.

I would suggest you open a new JIRA case pointing to this one, describe the 
problem, and provide some query where this issue is noticeable so we can 
reproduce the issue in a different environment. If you have a clear idea on the 
solution, you could contribute a patch. Otherwise, I will try to look into the 
issue asap.

--

Btw, wrt 3., have you set _hive.driver.parallel.compilation_ to _true_? That 
should at least prevent the locking issue.

> Avoid expensive call to removeAll in DefaultGraphWalker
> ---
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2016-12-20 Thread Dhiraj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764924#comment-15764924
 ] 

Dhiraj Kumar commented on HIVE-11652:
-

[~jcamachorodriguez] The patch causes a performance issue. 

Example query. 
{code}select a from (select 1 as a ) tbl where a in 
(1,2,3,4,5,6,7,8,9,10);{code}

Method source
{code}  
protected void walk(Node nd) throws SemanticException {
// Push the node in the stack
opStack.push(nd);

// While there are still nodes to dispatch...
while (!opStack.empty()) {
  Node node = opStack.peek();

  if (node.getChildren() == null ||
  getDispatchedList().containsAll(node.getChildren())) {
// Dispatch current node
if (!getDispatchedList().contains(node)) {
  dispatch(node, opStack);
  opQueue.add(node);
}
opStack.pop();
continue;
  }

  // Add a single child and restart the loop
  for (Node childNode : node.getChildren()) {
if (!getDispatchedList().contains(childNode)) {
  opStack.push(childNode);
  break;
}
  }
} // end while
  }
{code}

The walk method will push the root node onto stack (where clause in this case, 
which has 12 child) and will call all its direct child at line 166. It will 
process single child (in this example) and will again invoke 
node.getChildren(). A total of 12 invocation of getChildren() will be made. 

Now, if in clause has huge list, it will causes

1. As many invocation of getChildren() method as there are children. So if "in 
clause" has 50K values,  getChildren() will be invoked 50K times.
2. Huge number of nodes and their repeated invocation puts memory pressure in 
ASTNode.getChildren(). Since it returns all the children in every case. 
3. Since the thread has taken a lock initially before compilation started, it 
blocks another compilation to make progress. 

Depending on the query, it is order of magnitude slower.  
 


> Avoid expensive call to removeAll in DefaultGraphWalker
> ---
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer, Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720762#comment-14720762
 ] 

Jesus Camacho Rodriguez commented on HIVE-11652:


[~hsubramaniyan]/[~ashutoshc], could you take a look? Thanks

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720810#comment-14720810
 ] 

Ashutosh Chauhan commented on HIVE-11652:
-

+1 LGTM, while your committing it will be good to add comments describing role 
of following data structures in walker:
 
* opStack
* opQueue
* toWalk

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720660#comment-14720660
 ] 

Hive QA commented on HIVE-11652:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752973/HIVE-11652.02.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5104/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752973 - PreCommit-HIVE-TRUNK-Build

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720870#comment-14720870
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11652:
--

+1, this should nullify HIVE-11341.

Thanks
Hari

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716683#comment-14716683
 ] 

Hive QA commented on HIVE-11652:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752607/HIVE-11652.01.patch

{color:red}ERROR:{color} -1 due to 330 failed/errored test(s), 9379 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.accumulo.predicate.TestAccumuloPredicateHandler.testCreateIteratorSettings
org.apache.hadoop.hive.accumulo.predicate.TestAccumuloPredicateHandler.testIteratorIgnoreRowIDFields
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_insert_common_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_multi_insert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input1_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into_with_schema
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view

[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-26 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715528#comment-14715528
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11652:
--

[~jcamachorodriguez] You may want to look at the latest patch of HIVE-11341 
where a similar issue is being looked into by me.

{code}
// While all its descendants have not been dispatched,
// we do not move forward
while(!childrenDispatched) {
  for (Node childNode : nd.getChildren()) {
 walk(childNode);
   }
   childrenDispatched = getDispatchedList().containsAll(nd.getChildren());
 }
{code}

In the above change, I see that you are using the system stack here instead of 
toWalk list to traverse the children nodes. 
However a couple of points :
1. Why do you need the 'while(!childrenDispatched)' condition at all. Wont the 
below code work? 
{code}
 if (!getDispatchedList().contains(childNode)) {
  walk(childNode). 
 }
{code}

2. we should make sure that walk() is not called again for a node already 
dispatched, we should make sure that this logic is present in startWalking as 
well
{code}
  public void startWalking(CollectionNode startNodes,
  HashMapNode, Object nodeOutput) throws SemanticException {
for Node (nd : startNodes) {
  // If already dispatched list, continue.
  if (getDispatchedList().contains(nd)) {
continue;
   }
  walk(nd);
  if (nodeOutput != null  getDispatchedList().contains(nd)) {
nodeOutput.put(nd, retMap.get(nd));
  }
}
  }
{code}

In short, if you are using the system stack via recursion you can eliminate the 
use of toWalk data structure all together for DefaultGraphWalker.
Otherwise, you can replace toWalk to be a stack instead of an arraylist.

Thanks
Hari

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713645#comment-14713645
 ] 

Hive QA commented on HIVE-11652:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752449/HIVE-11652.patch

{color:red}ERROR:{color} -1 due to 635 failed/errored test(s), 9376 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_table_bincolserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_table_colserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast_qualified_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_udf_udaf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_nested_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cp_sel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_translate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cteViews
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_date_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_6

[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-26 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715763#comment-14715763
 ] 

Jesus Camacho Rodriguez commented on HIVE-11652:


[~hsubramaniyan], thanks for your comments.

I checked the patch in HIVE-11341, but my intention in this patch is a bit 
different: I'd like to understand fully the logic and dependencies, and get rid 
of the remove call completely.

The patch that I uploaded was WIP and indeed not ready: I just wanted to 
trigger a QA run to check tests where we would possibly hit different issues. I 
just uploaded a new one that I think that moves in the right direction. As part 
of this patch, I would like to get rid of the removeAll call in ForwardWalker, 
ColumnPruner, and ConstantPropagate too, so it is still WIP.

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)