[jira] [Commented] (TEZ-2119) Counter for launched containers
[ https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053860#comment-17053860 ] Bikas Saha commented on TEZ-2119: - Been a while. The intent of total_used might have been to maintain the total containers used irrespective of losses, returns and re-acquisitions. Initial_held is number held even when nothing is running (hot start). > Counter for launched containers > --- > > Key: TEZ-2119 > URL: https://issues.apache.org/jira/browse/TEZ-2119 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: László Bodor >Priority: Major > Attachments: TEZ-2119.01.patch > > > org.apache.tez.common.counters.DAGCounter > NUM_SUCCEEDED_TASKS=32976 > TOTAL_LAUNCHED_TASKS=32976 > OTHER_LOCAL_TASKS=2 > DATA_LOCAL_TASKS=9147 > RACK_LOCAL_TASKS=23761 > It would be very nice to have TOTAL_LAUNCHED_CONTAINERS counter added to > this. The difference between TOTAL_LAUNCHED_CONTAINERS and > TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is > happening. It is very hard to find out now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
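The counter arithmetic the description asks for is easy to sketch. The snippet below is illustrative only (not Tez code); the 4,000-container figure is an assumed value, since the TOTAL_LAUNCHED_CONTAINERS counter is exactly what this issue proposes to add:

```python
# Illustrative sketch: estimating container reuse from DAG counters.
# TOTAL_LAUNCHED_TASKS comes from the example above; the container count
# is hypothetical, since the proposed counter does not exist yet.
def container_reuse_factor(total_launched_tasks, total_launched_containers):
    """Average number of task attempts run per container.

    A factor of 1.0 means no reuse; higher means more reuse.
    """
    if total_launched_containers == 0:
        return 0.0
    return total_launched_tasks / total_launched_containers

# With the counters from the issue description and an assumed 4,000 containers:
factor = container_reuse_factor(32976, 4000)
```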
[jira] [Commented] (TEZ-1786) Support for speculation of slow tasks
[ https://issues.apache.org/jira/browse/TEZ-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156018#comment-16156018 ] Bikas Saha commented on TEZ-1786: - That's correct. > Support for speculation of slow tasks > - > > Key: TEZ-1786 > URL: https://issues.apache.org/jira/browse/TEZ-1786 > Project: Apache Tez > Issue Type: New Feature >Reporter: Bikas Saha >Assignee: Bikas Saha > > Umbrella jira to track speculation of attempts to mitigate stragglers -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3770) DAG-aware YARN task scheduler
[ https://issues.apache.org/jira/browse/TEZ-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065756#comment-16065756 ] Bikas Saha commented on TEZ-3770: - Just clarifying that the original scheduler was deliberately not made DAG-aware. That was an attempt to prevent leaky abstractions, where a change would have to span both the scheduler and the DAG state machine, as happened in the MR code where such logic was spread all over. The DAG core logic and the VertexManager user logic could determine the dependencies and priorities of tasks, and the scheduler would allocate resources based on priority. So other schedulers could be written easily, since they don't need to understand complex relationships. However, not all of those design assumptions have been validated, since we don't have many schedulers written :P > DAG-aware YARN task scheduler > - > > Key: TEZ-3770 > URL: https://issues.apache.org/jira/browse/TEZ-3770 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: TEZ-3770.001.patch > > > There are cases where priority alone does not convey the relationship between > tasks, and this can cause problems when scheduling or preempting tasks. If > the YARN task scheduler was aware of the relationship between tasks then it > could make smarter decisions when trying to assign tasks to containers or > preempt running tasks to schedule pending tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-394) Better scheduling for uneven DAGs
[ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030365#comment-16030365 ] Bikas Saha commented on TEZ-394: Not sure I understood this correctly. bq.V1->V3->V4->V5 bq.V2->V5 bq.V6->V7 V2 being lower priority seems similar to the issue in the original description of this jira. V6 -> V7 being disconnected from the other vertices makes sense. For that case, the current approach of either distance from root or distance from leaf would give V6 high priority. Is the intent to make V6 low priority? > Better scheduling for uneven DAGs > - > > Key: TEZ-394 > URL: https://issues.apache.org/jira/browse/TEZ-394 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Rohini Palaniswamy >Assignee: Jason Lowe > Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch > > > Consider a series of joins or group by on dataset A with few datasets that > takes 10 hours followed by a final join with a dataset X. The vertex that > loads dataset X will be one of the top vertexes and initialized early even > though its output is not consumed till the end after 10 hours. > 1) Could either use delayed start logic for better resource allocation > 2) Else if they are started upfront, need to handle failure/recovery cases > where the nodes which executed the MapTask might have gone down when the > final join happens. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
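The two priority heuristics mentioned in the comment (distance from root vs. distance from leaf) can be computed for the example DAG. This is a minimal sketch with vertex names taken from the comment; it is not the Tez scheduler's actual code:

```python
# Sketch: compute distance-from-root and distance-from-leaf for the example
# DAG from the comment: V1->V3->V4->V5, V2->V5, V6->V7.
from collections import defaultdict

edges = [("V1", "V3"), ("V3", "V4"), ("V4", "V5"), ("V2", "V5"), ("V6", "V7")]
vertices = {"V1", "V2", "V3", "V4", "V5", "V6", "V7"}
succ, pred = defaultdict(list), defaultdict(list)
for s, d in edges:
    succ[s].append(d)
    pred[d].append(s)

def longest_distance(vs, neighbors):
    """Longest path length from each vertex to a vertex with no neighbors."""
    memo = {}
    def dist(v):
        if v not in memo:
            memo[v] = 0 if not neighbors[v] else 1 + max(dist(n) for n in neighbors[v])
        return memo[v]
    return {v: dist(v) for v in vs}

dist_from_root = longest_distance(vertices, pred)  # 0 for roots V1, V2, V6
dist_from_leaf = longest_distance(vertices, succ)  # 0 for leaves V5, V7
```

Under distance-from-root the disconnected V6 lands in the same tier as the real roots, which illustrates why the comment asks whether V6 should instead be made low priority.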
[jira] [Commented] (TEZ-3696) Jobs can hang when both concurrency and speculation are enabled
[ https://issues.apache.org/jira/browse/TEZ-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997706#comment-15997706 ] Bikas Saha commented on TEZ-3696: - Thanks [~ebadger]! I missed that part of the code. Makes sense. > Jobs can hang when both concurrency and speculation are enabled > --- > > Key: TEZ-3696 > URL: https://issues.apache.org/jira/browse/TEZ-3696 > Project: Apache Tez > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Fix For: 0.9.0, 0.8.6 > > Attachments: TEZ-3696.001.patch, TEZ-3696.002.patch, > TEZ-3696.003.patch, TEZ-3696.004.patch > > > We can reproduce the hung job by doing the following: > 1. Run a sleep job with a concurrency of 1, speculation enabled, and 3 tasks > {noformat} > HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$TEZ_CONF_DIR" yarn jar > $TEZ_HOME/tez-tests-*.jar mrrsleep -Dtez.am.vertex.max-task-concurrency=1 > -Dtez.am.speculation.enabled=true -Dtez.task.timeout-ms=6 -m 3 -mt 6 > -ir 0 -irt 0 -r 0 -rt 0 > {noformat} > 2. Let the 1st task run to completion and then stop the 2nd task so that a > speculative attempt is scheduled. Once the speculative attempt is scheduled > for the 2nd task, continue the original attempt and let it complete. > {noformat} > kill -STOP > // wait a few seconds for a speculative attempt to kick off > kill -CONT > {noformat} > 3. Kill the 3rd task, which will create a 2nd attempt > {noformat} > kill -9 > {noformat} > 4. The next thing to be drawn off of the queue will be the speculative > attempt of the 2nd task. However, it is already completed, so it will just > sit in the final state and the job will hang. > Basically, for the failure to happen, the number of speculative tasks that > are scheduled, but not yet ran has to be >= the concurrency of the job and > there has to be at least 1 task failure. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TEZ-3696) Jobs can hang when both concurrency and speculation are enabled
[ https://issues.apache.org/jira/browse/TEZ-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993419#comment-15993419 ] Bikas Saha commented on TEZ-3696: - Thanks for the ping. Looking at the code again, I am not sure why I had the check for succeeded attempts for sending the completed event. I renamed the event type from succeeded to completed in the same patch and hence I may have intended to stop differentiating them. But the sending code for the completed event was under the succeeded check. That seems inconsistent. If the above is correct, then perhaps the issue would happen even without speculation and on every attempt failure. Because a failed attempt would not decrease the running count and so its retry would not get scheduled, leading to an off-by-N situation in the concurrency count in the dag scheduler. If this is correct, then perhaps the fix is only to send the completed event all the time and not make any other changes in the dag scheduler itself. > Jobs can hang when both concurrency and speculation are enabled > --- > > Key: TEZ-3696 > URL: https://issues.apache.org/jira/browse/TEZ-3696 > Project: Apache Tez > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: TEZ-3696.001.patch, TEZ-3696.002.patch, > TEZ-3696.003.patch > > > We can reproduce the hung job by doing the following: > 1. Run a sleep job with a concurrency of 1, speculation enabled, and 3 tasks > {noformat} > HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$TEZ_CONF_DIR" yarn jar > $TEZ_HOME/tez-tests-*.jar mrrsleep -Dtez.am.vertex.max-task-concurrency=1 > -Dtez.am.speculation.enabled=true -Dtez.task.timeout-ms=6 -m 3 -mt 6 > -ir 0 -irt 0 -r 0 -rt 0 > {noformat} > 2. Let the 1st task run to completion and then stop the 2nd task so that a > speculative attempt is scheduled. Once the speculative attempt is scheduled > for the 2nd task, continue the original attempt and let it complete. 
> {noformat} > kill -STOP > // wait a few seconds for a speculative attempt to kick off > kill -CONT > {noformat} > 3. Kill the 3rd task, which will create a 2nd attempt > {noformat} > kill -9 > {noformat} > 4. The next thing to be drawn off of the queue will be the speculative > attempt of the 2nd task. However, it is already completed, so it will just > sit in the final state and the job will hang. > Basically, for the failure to happen, the number of speculative tasks that > are scheduled, but not yet ran has to be >= the concurrency of the job and > there has to be at least 1 task failure. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
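The off-by-N concurrency accounting discussed in the comments can be modeled in a few lines. This is an illustrative sketch with made-up names, not the actual Tez DAG scheduler: the key point is that the scheduler must return the concurrency slot for every terminal attempt (succeeded, failed, or killed), not just for successes, otherwise a failure leaks a slot and pending work hangs once the leaks reach the concurrency limit.

```python
# Sketch of the concurrency-accounting fix: attempt_completed() is invoked
# for EVERY terminal attempt state, so the slot is always returned and the
# next pending attempt (e.g. a retry) can be launched.
from collections import deque

class ConcurrencyLimitedScheduler:
    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        self.running = 0
        self.pending = deque()
        self.launched = []

    def submit(self, attempt):
        self.pending.append(attempt)
        self._maybe_launch()

    def attempt_completed(self, attempt):
        # Called on success, failure, or kill -- never skipped.
        self.running -= 1
        self._maybe_launch()

    def _maybe_launch(self):
        while self.pending and self.running < self.max_concurrency:
            self.running += 1
            self.launched.append(self.pending.popleft())

sched = ConcurrencyLimitedScheduler(max_concurrency=1)
sched.submit("t1_attempt0")
sched.submit("t2_attempt0")             # waits: concurrency limit reached
sched.attempt_completed("t1_attempt0")  # even if t1 failed, the slot is freed
```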
[jira] [Comment Edited] (TEZ-394) Better scheduling for uneven DAGs
[ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865631#comment-15865631 ] Bikas Saha edited comment on TEZ-394 at 2/14/17 12:30 PM: -- Thanks for doing this! I regret not having done this right from the start. Mostly looks good to me. The name of the assigned variable is now misleading because it's not topologically sorted anymore. {code}
+topologicalVertexStack = reorderForCriticalPath(topologicalVertexStack,
+    vertexMap, inboundVertexMap, outboundVertexMap);
{code} [~rohini] IIRC, this will only change the vertex priority wrt other vertices. Vertices would still be scheduled based on their managers and typically based on completion of their inputs. So Root1, Root2 would both be ready and start running. Int3 would currently be blocked behind both but after this would be preferred to Root2 after Int3 is deemed capable of running. [~gopalv] Would this break any assumptions in Hive? was (Author: bikassaha): Thanks for doing this! I regret not having done this right from the start. Mostly looks good to me. The name of the assigned variable is now misleading because it's not topologically sorted anymore. {code}
+topologicalVertexStack = reorderForCriticalPath(topologicalVertexStack,
+    vertexMap, inboundVertexMap, outboundVertexMap);
{code} [~gopalv] Would this break any assumptions in Hive? > Better scheduling for uneven DAGs > - > > Key: TEZ-394 > URL: https://issues.apache.org/jira/browse/TEZ-394 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Rohini Palaniswamy >Assignee: Jason Lowe > Attachments: TEZ-394.001.patch > > > Consider a series of joins or group by on dataset A with few datasets that > takes 10 hours followed by a final join with a dataset X. The vertex that > loads dataset X will be one of the top vertexes and initialized early even > though its output is not consumed till the end after 10 hours. 
> 1) Could either use delayed start logic for better resource allocation > 2) Else if they are started upfront, need to handle failure/recovery cases > where the nodes which executed the MapTask might have gone down when the > final join happens. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TEZ-394) Better scheduling for uneven DAGs
[ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865631#comment-15865631 ] Bikas Saha commented on TEZ-394: Thanks for doing this! I regret not having done this right from the start. Mostly looks good to me. The name of the assigned variable is now misleading because it's not topologically sorted anymore. {code}
+topologicalVertexStack = reorderForCriticalPath(topologicalVertexStack,
+    vertexMap, inboundVertexMap, outboundVertexMap);
{code} [~gopalv] Would this break any assumptions in Hive? > Better scheduling for uneven DAGs > - > > Key: TEZ-394 > URL: https://issues.apache.org/jira/browse/TEZ-394 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Rohini Palaniswamy >Assignee: Jason Lowe > Attachments: TEZ-394.001.patch > > > Consider a series of joins or group by on dataset A with few datasets that > takes 10 hours followed by a final join with a dataset X. The vertex that > loads dataset X will be one of the top vertexes and initialized early even > though its output is not consumed till the end after 10 hours. > 1) Could either use delayed start logic for better resource allocation > 2) Else if they are started upfront, need to handle failure/recovery cases > where the nodes which executed the MapTask might have gone down when the > final join happens. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TEZ-3512) Update EdgePlan proto for named edge
[ https://issues.apache.org/jira/browse/TEZ-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768737#comment-15768737 ] Bikas Saha commented on TEZ-3512: - How can we be sure that SrcDest or DestSrc set by the AM will not conflict with an edge name set by the user? If we can be sure of that in the AM, why can we not be sure of that in the client? What am I missing here? Clearly you have something in mind that I am missing. > Update EdgePlan proto for named edge > > > Key: TEZ-3512 > URL: https://issues.apache.org/jira/browse/TEZ-3512 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3512.1.patch, TEZ-3512.2.patch > > > EdgePlan (protobuf) should have one more field for edge name. Related DAG > plan creation and parsing should be modified accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3512) Update EdgePlan proto for named edge
[ https://issues.apache.org/jira/browse/TEZ-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768614#comment-15768614 ] Bikas Saha commented on TEZ-3512: - I can see that in the patch :) But what will the value be for these null names? I ask because you made a valid point that any system-generated names may collide with user-defined names. In that case, it is better to fail fast (on the client) than late (in the AM). This was not a problem earlier because there were no edge names. Hence we need to be clear about that now. > Update EdgePlan proto for named edge > > > Key: TEZ-3512 > URL: https://issues.apache.org/jira/browse/TEZ-3512 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3512.1.patch, TEZ-3512.2.patch > > > EdgePlan (protobuf) should have one more field for edge name. Related DAG > plan creation and parsing should be modified accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3512) Update EdgePlan proto for named edge
[ https://issues.apache.org/jira/browse/TEZ-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765960#comment-15765960 ] Bikas Saha commented on TEZ-3512: - bq. Default value is inappropriate because any default value may also be used by user What is the solution to the problem then? Even in the AM we cannot safely pick a default value, since the user may have specified that value as the edge name. Is that correct? If so, isn't it better to check for that on the client side and fail fast (instead of waiting for the job to run and then fail)? Rest looks good to me. > Update EdgePlan proto for named edge > > > Key: TEZ-3512 > URL: https://issues.apache.org/jira/browse/TEZ-3512 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3512.1.patch, TEZ-3512.2.patch > > > EdgePlan (protobuf) should have one more field for edge name. Related DAG > plan creation and parsing should be modified accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3512) Update EdgePlan proto for named edge
[ https://issues.apache.org/jira/browse/TEZ-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738699#comment-15738699 ] Bikas Saha commented on TEZ-3512: - When the DAG is being compiled on the client side, a default value could be provided to an edge between v1 and v2 if the edge name is null. In the tests, it would be good to have a string s="edge2" and refer to that instead of hard-coding "edge2" everywhere. > Update EdgePlan proto for named edge > > > Key: TEZ-3512 > URL: https://issues.apache.org/jira/browse/TEZ-3512 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3512.1.patch > > > EdgePlan (protobuf) should have one more field for edge name. Related DAG > plan creation and parsing should be modified accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case
[ https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709837#comment-15709837 ] Bikas Saha commented on TEZ-3222: - bq. routeInputSourceTaskFailedEventToDestination I think this could be deferred because it's not the common case; it applies mainly to failed-task event handling. > Reduce messaging overhead for auto-reduce parallelism case > -- > > Key: TEZ-3222 > URL: https://issues.apache.org/jira/browse/TEZ-3222 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3222.1.patch, TEZ-3222.2.patch, TEZ-3222.3.patch, > TEZ-3222.4.patch, TEZ-3222.5.patch, TEZ-3222.6.patch, TEZ-3222.7.patch > > > A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data > size is appropriate for 1 task attempt, this results in an increase in task > attempt message processing of 1000x. > This jira aims to reduce the message processing in the auto-reduced task > while keeping the amount of message processing in the AM the same or less. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case
[ https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671995#comment-15671995 ] Bikas Saha commented on TEZ-3222: - Sounds good! Thanks! > Reduce messaging overhead for auto-reduce parallelism case > -- > > Key: TEZ-3222 > URL: https://issues.apache.org/jira/browse/TEZ-3222 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3222.1.patch, TEZ-3222.2.patch, TEZ-3222.3.patch, > TEZ-3222.4.patch, TEZ-3222.5.patch, TEZ-3222.6.patch > > > A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data > size is appropriate for 1 task attempt, this results in an increase in task > attempt message processing of 1000x. > This jira aims to reduce the message processing in the auto-reduced task > while keeping the amount of message processing in the AM the same or less. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1190) Allow multiple edges between two vertices
[ https://issues.apache.org/jira/browse/TEZ-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671986#comment-15671986 ] Bikas Saha commented on TEZ-1190: - Still don't understand why making named/unnamed exclusive is going to help. An example would help. Having the backend completely named would make the new feature (enabled/disabled/exclusive/hybrid) a purely client-side API thing, which it naturally seems to be. I have not looked at the code for a while :) With that caveat, it's not clear to me why having the AM handle default names makes things easier vs doing it at the client and keeping the AM agnostic. You are right that since some user code also runs in edge/VM plugins, the plugin wrapper layer also needs the default-case handling. That would be similar to the handling on the client side. Either way works. I guess the reason I am persisting on this is that I think the separation of concerns would be better if this were handled in the API layer. After all, this is more of an API thing (which until now has leaked into the server side). > Allow multiple edges between two vertices > - > > Key: TEZ-1190 > URL: https://issues.apache.org/jira/browse/TEZ-1190 > Project: Apache Tez > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Zhiyuan Yang > Attachments: NamedEdgeDesign.pdf, TEZ-1190.prototype.patch > > > This will be helpful in some scenario. In particular example, we can merge > two small pipelines together in one pair of vertex. Note it is possible the > edge type between the two vertexes are different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1190) Allow multiple edges between two vertices
[ https://issues.apache.org/jira/browse/TEZ-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652811#comment-15652811 ] Bikas Saha commented on TEZ-1190: - How is the restriction of either all named or unnamed helpful? How about an implementation approach where all implicit behavior is removed in the core layer, and edges are always named. In the DAGClient layer, the new API will provide the name, or for backwards compatibility the DAGClient layer will auto-generate unique names (e.g. SourceDestinationCounter). Thus the existing implicit behavior is limited to the DAGClient layer. Similarly for plugins/contexts, we could add a new API with the edge name semantics instead of overloading the semantics, because both parameters (sourceName or edgeName) are strings. And we could deprecate the existing semantic API that uses vertex names. A translation layer could handle the implicit conversion of vertexName to auto-generated names produced by the DAGClient. The reason I suggest changing the internal core layer to always use edge names and keeping the compatibility handling in the API layers is that it might be a cleaner cut of the code, and reduce the number of bugs left behind due to missed cases of implicit use. By continuing to support implicit names internally we may increase the surface area of such leaks. Rest looks good to me for now. Nice job with capturing the cases! Of course the devil is in the details :) BTW, the doc implicitly assumes that the dummy vertex approach is being dropped in favor of the named edge approach? > Allow multiple edges between two vertices > - > > Key: TEZ-1190 > URL: https://issues.apache.org/jira/browse/TEZ-1190 > Project: Apache Tez > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Zhiyuan Yang > Attachments: NamedEdgeDesign.pdf, TEZ-1190.prototype.patch > > > This will be helpful in some scenario. In particular example, we can merge > two small pipelines together in one pair of vertex. 
Note it is possible the > edge type between the two vertexes are different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
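The auto-generated naming scheme suggested in the comments (always-named edges internally, names like SourceDestinationCounter generated client-side, with a fail-fast collision check) could look roughly like the sketch below. The function name, tuple shape, and name format are all invented for illustration; they are not the Tez API:

```python
# Sketch of client-side default edge naming: when the user supplies no name,
# generate <source><destination><counter>; reject any duplicate name up front
# so the job fails fast on the client instead of later in the AM.
def assign_edge_names(edges):
    """edges: list of (source, destination, name-or-None) tuples.

    Returns {(source, destination, index): unique edge name}.
    Raises ValueError on any name collision.
    """
    used = set()
    counters = {}
    named = {}
    for i, (src, dst, name) in enumerate(edges):
        if name is None:
            n = counters.get((src, dst), 0)
            counters[(src, dst)] = n + 1
            name = f"{src}{dst}{n}"  # hypothetical format, e.g. "v1v20"
        if name in used:
            raise ValueError(f"duplicate edge name: {name}")
        used.add(name)
        named[(src, dst, i)] = name
    return named

names = assign_edge_names([("v1", "v2", None), ("v1", "v2", None), ("v1", "v3", "join")])
```

Keeping this entirely in the client/API layer is what the comment argues for: the core layer then only ever sees explicit, unique edge names.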
[jira] [Comment Edited] (TEZ-1190) Allow multiple edges between two vertices
[ https://issues.apache.org/jira/browse/TEZ-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623227#comment-15623227 ] Bikas Saha edited comment on TEZ-1190 at 10/31/16 8:15 PM: --- +1 for design doc. A while back we had discussed about this and thought that an edge name could be made optional in the edge definition. When the name is specified, it's used. If not specified, it defaults to the source/destination as it does today. So existing edges would continue to work with the implicit default. That seemed like a natural extension API-wise. The internal impl would change to always use edge names. The names would be set from the API or implicitly. Would be good to know if this design is being used or a new design is being proposed. Thanks! was (Author: bikassaha): +1 for design doc. A while back we had discussed about this and thought that an edge name could be made optional in the edge definition. When the name is specified, it's used. If not specified, it defaults to the source/destination as it does today. So existing edges would continue to work with the implicit default. That seemed like a natural extension API-wise. Would be good to know if this design is being used or a new design is being proposed. Thanks! > Allow multiple edges between two vertices > - > > Key: TEZ-1190 > URL: https://issues.apache.org/jira/browse/TEZ-1190 > Project: Apache Tez > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Zhiyuan Yang > > This will be helpful in some scenario. In particular example, we can merge > two small pipelines together in one pair of vertex. Note it is possible the > edge type between the two vertexes are different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1190) Allow multiple edges between two vertices
[ https://issues.apache.org/jira/browse/TEZ-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623227#comment-15623227 ] Bikas Saha commented on TEZ-1190: - +1 for design doc. A while back we had discussed about this and thought that an edge name could be made optional in the edge definition. When the name is specified, it's used. If not specified, it defaults to the source/destination as it does today. So existing edges would continue to work with the implicit default. That seemed like a natural extension API-wise. Would be good to know if this design is being used or a new design is being proposed. Thanks! > Allow multiple edges between two vertices > - > > Key: TEZ-1190 > URL: https://issues.apache.org/jira/browse/TEZ-1190 > Project: Apache Tez > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Zhiyuan Yang > > This will be helpful in some scenario. In particular example, we can merge > two small pipelines together in one pair of vertex. Note it is possible the > edge type between the two vertexes are different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case
[ https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590836#comment-15590836 ] Bikas Saha commented on TEZ-3222: - {code}
-return commonRouteMeta[sourceTaskIndex];
+return CompositeEventRouteMetadata.create(1, sourceTaskIndex, 0);
{code} The removed code is looking up an array indexed by sourceTaskIndex while the new code is directly using the sourceTaskIndex. Is there a difference? Also, reusing the caching (as done earlier) may improve critical path CPU for object creation. Though for broadcast edge I am not sure if CDME is used as of now. {code}
+message CompositeRoutedDataMovementEventProto {
+  optional int32 source_index = 1;
+  optional int32 target_index = 2;
+  optional int32 count = 3;
+  optional bytes user_payload = 4;
+  optional int32 version = 5;
+}
{code} Can we create a message for CompositeRouteMeta and use it, vs expanding its contents? That way CompositeRouteMeta could evolve independently. {code}
 if (event instanceof DataMovementEvent) {
   numDmeEvents.incrementAndGet();
-  processDataMovementEvent((DataMovementEvent)event);
+  DataMovementEvent dmEvent = (DataMovementEvent)event;
+  DataMovementEventPayloadProto shufflePayload;
+  try {
+    shufflePayload = DataMovementEventPayloadProto.parseFrom(ByteString.copyFrom(dmEvent.getUserPayload()));
+  } catch (InvalidProtocolBufferException e) {
+    throw new TezUncheckedException("Unable to parse DataMovementEvent payload", e);
+  }
+  BitSet emptyPartitionsBitSet = null;
+  if (shufflePayload.hasEmptyPartitions()) {
+    try {
+      byte[] emptyPartitions = TezCommonUtils.decompressByteStringToByteArray(shufflePayload.getEmptyPartitions(), inflater);
{code} I don't think DMEs have an empty-partition bitset, since they don't carry multi-partition data. Right? [~rajesh.balamohan] Rest looks good to me. +1 Thanks! 
> Reduce messaging overhead for auto-reduce parallelism case > -- > > Key: TEZ-3222 > URL: https://issues.apache.org/jira/browse/TEZ-3222 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3222.1.patch, TEZ-3222.2.patch, TEZ-3222.3.patch, > TEZ-3222.4.patch, TEZ-3222.5.patch, TEZ-3222.6.patch > > > A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data > size is appropriate for 1 task attempt, this results in an increase in task > attempt message processing of 1000x. > This jira aims to reduce the message processing in the auto-reduced task > while keeping the amount of message processing in the AM the same or less. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
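The CompositeRoutedDataMovementEventProto fields quoted in the review (source_index, target_index, count) suggest that one composite event stands in for a run of per-target events. The sketch below is one interpretation of those fields for illustration only, not the actual Tez routing semantics:

```python
# Illustrative sketch: expanding a composite routed data-movement event into
# individual (source, target) routes. ASSUMPTION: `count` covers consecutive
# source and target indices; this is an interpretation of the quoted proto
# fields, not a statement of the real Tez behavior.
def expand_composite_route(source_index, target_index, count):
    """Return the individual routes a composite event stands in for."""
    return [(source_index + i, target_index + i) for i in range(count)]

routes = expand_composite_route(source_index=0, target_index=5, count=3)
```

Carrying one such event instead of `count` separate ones is the kind of saving this jira targets for the auto-reduce case, where a single task can become the target of thousands of events.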
[jira] [Commented] (TEZ-3163) Reuse and tune Inflaters and Deflaters to speed DME processing
[ https://issues.apache.org/jira/browse/TEZ-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504112#comment-15504112 ] Bikas Saha commented on TEZ-3163: - /cc [~hitesh] [~aplusplus] > Reuse and tune Inflaters and Deflaters to speed DME processing > -- > > Key: TEZ-3163 > URL: https://issues.apache.org/jira/browse/TEZ-3163 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3163.1-branch-0.7.patch, TEZ-3163.1.patch, > TEZ-3163.2.patch, TEZ-3163.PERF.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3388) Provide error information in shuffle response header
[ https://issues.apache.org/jira/browse/TEZ-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-3388: Description: In MR shuffle, if any partition has an error then the reader gets an exception while reading the response stream and loses all the data for all partitions. Instead if the shuffle response header had more metadata then errors could be handled more efficiently. See YARN-1773 for history. (was: In MR shuffle, if any partition has an error then the reader gets an exception while reading the response stream and loses all the data for all partitions. Instead if the shuffle response header had more metadata then errors could be handled more efficiently.) > Provide error information in shuffle response header > > > Key: TEZ-3388 > URL: https://issues.apache.org/jira/browse/TEZ-3388 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha > > In MR shuffle, if any partition has an error then the reader gets an > exception while reading the response stream and loses all the data for all > partitions. Instead if the shuffle response header had more metadata then > errors could be handled more efficiently. See YARN-1773 for history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3388) Provide error information in shuffle response header
Bikas Saha created TEZ-3388: --- Summary: Provide error information in shuffle response header Key: TEZ-3388 URL: https://issues.apache.org/jira/browse/TEZ-3388 Project: Apache Tez Issue Type: Sub-task Reporter: Bikas Saha In MR shuffle, if any partition has an error then the reader gets an exception while reading the response stream and loses all the data for all partitions. Instead if the shuffle response header had more metadata then errors could be handled more efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
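The idea in the description (a richer shuffle response header, so one bad partition does not poison the whole stream) can be sketched as a per-partition header record. Every field name and the wire layout below are invented for illustration; this is not the MR or Tez shuffle protocol:

```python
# Sketch: a shuffle response framed with a per-partition header carrying an
# error flag, so the reader can skip a failed partition and keep consuming
# the rest instead of aborting on a mid-stream exception.
# All names and the header layout are hypothetical.
import struct

HEADER = struct.Struct("!IQB")  # partition id, payload length, error flag

def frame_partitions(partitions):
    """partitions: list of (partition_id, payload_bytes_or_None)."""
    out = b""
    for pid, payload in partitions:
        if payload is None:  # partition hit an error on the server side
            out += HEADER.pack(pid, 0, 1)
        else:
            out += HEADER.pack(pid, len(payload), 0) + payload
    return out

def read_partitions(stream):
    """Return [(partition_id, payload-or-None)]; None marks a failed partition."""
    offset, results = 0, []
    while offset < len(stream):
        pid, length, err = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = None if err else stream[offset:offset + length]
        offset += length
        results.append((pid, payload))
    return results

framed = frame_partitions([(0, b"data0"), (1, None), (2, b"data2")])
```

With a framing like this, an error in partition 1 costs only that partition; in the stream-exception model described above, all three would be lost.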
[jira] [Commented] (TEZ-3317) Speculative execution starts too early due to 0 progress
[ https://issues.apache.org/jira/browse/TEZ-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383436#comment-15383436 ] Bikas Saha commented on TEZ-3317: - Sorry, I did not understand what the issue here is from the above comment. Is task progress = 0 always or occasionally? What is the flow in the buggy situation? Is it the case where the processor makes no progress because the input is slow, and because input progress is not available to the processor, it reports 0 progress overall for a long time? > Speculative execution starts too early due to 0 progress > > > Key: TEZ-3317 > URL: https://issues.apache.org/jira/browse/TEZ-3317 > Project: Apache Tez > Issue Type: Improvement >Reporter: Jonathan Eagles > > Don't know at this point if this is a tez or a PigProcessor issue. There is > some setProgress chain that is keeping task progress from being correctly > reported. Task status is always zero, so as soon as the first task finishes, > tasks up to the speculation limit are always launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374071#comment-15374071 ] Bikas Saha commented on TEZ-3334: - Also reporting errors properly in the response such that 1 error does not corrupt the entire data stream. YARN-1773. > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374065#comment-15374065 ] Bikas Saha commented on TEZ-3334: - YARN-4577 covers classpath isolation of aux services. Perhaps the first thing could be the POC: take the existing MR shuffle and change its packaging to org.apache.tez, then add it as tez_shuffle in YARN alongside mapreduce_shuffle, and verify that Tez jobs use the Tez shuffle and MR jobs use the MR shuffle (both shuffle services effectively running the same code). After that we can create follow-up jiras for new features and improvements to the Tez shuffle. Sounds like a plan? > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
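For reference, running the repackaged service next to the MR one would amount to a NodeManager config along these lines. The tez_shuffle service name matches the comment above, but the Tez handler class name shown is an assumption for the POC, not a settled name:

```xml
<!-- yarn-site.xml (sketch): run tez_shuffle alongside mapreduce_shuffle.
     The org.apache.tez handler class below is hypothetical for the POC. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,tez_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.tez_shuffle.class</name>
  <value>org.apache.tez.auxservices.ShuffleHandler</value>
</property>
```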
[jira] [Comment Edited] (TEZ-1248) Reduce slow-start should special case 1 reducer runs
[ https://issues.apache.org/jira/browse/TEZ-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371643#comment-15371643 ] Bikas Saha edited comment on TEZ-1248 at 7/11/16 9:16 PM: -- lgtm. seems like a simple code change whose side-effect produces the results for this jira :P +1. Thanks for the fix! was (Author: bikassaha): lgtm. seems like a simple code change whose side-effect produces the results for this jira :P > Reduce slow-start should special case 1 reducer runs > > > Key: TEZ-1248 > URL: https://issues.apache.org/jira/browse/TEZ-1248 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.0 > Environment: 20 node cluster running tez >Reporter: Gopal V >Assignee: Zhiyuan Yang >Priority: Critical > Attachments: TEZ-1248.1.patch > > > Reducer slow-start has a performance problem for the small cases where there > is just 1 reducer for a case with a single wave. > Tez knows the split count and wave count, being able to determine if the > cluster has enough spare capacity to run the reducer earlier for lower > latency in a N-mapper -> 1 reducer case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1248) Reduce slow-start should special case 1 reducer runs
[ https://issues.apache.org/jira/browse/TEZ-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371643#comment-15371643 ] Bikas Saha commented on TEZ-1248: - lgtm. seems like a simple code change whose side-effect produces the results for this jira :P > Reduce slow-start should special case 1 reducer runs > > > Key: TEZ-1248 > URL: https://issues.apache.org/jira/browse/TEZ-1248 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.0 > Environment: 20 node cluster running tez >Reporter: Gopal V >Assignee: Zhiyuan Yang >Priority: Critical > Attachments: TEZ-1248.1.patch > > > Reducer slow-start has a performance problem for the small cases where there > is just 1 reducer for a case with a single wave. > Tez knows the split count and wave count, being able to determine if the > cluster has enough spare capacity to run the reducer earlier for lower > latency in a N-mapper -> 1 reducer case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371277#comment-15371277 ] Bikas Saha commented on TEZ-3334: - +1. The new YARN aux service isolation work should make this easier to deploy alongside the existing MR shuffle while we iron things out. > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3287) Have UnorderedPartitionedKVWriter honor tez.runtime.empty.partitions.info-via-events.enabled
[ https://issues.apache.org/jira/browse/TEZ-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351682#comment-15351682 ] Bikas Saha commented on TEZ-3287: - [~rajesh.balamohan] [~sseth] please help review > Have UnorderedPartitionedKVWriter honor > tez.runtime.empty.partitions.info-via-events.enabled > > > Key: TEZ-3287 > URL: https://issues.apache.org/jira/browse/TEZ-3287 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3287.001.patch > > > The ordered partitioned output allows applications to specify if empty > partition stats should be included as part of DataMovementEvent via a > configuration. It seems unordered partitioned output should honor that > configuration as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342583#comment-15342583 ] Bikas Saha commented on TEZ-3291: - Sure. Let's create a follow-up jira. > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.2.patch, TEZ-3291.3.patch, TEZ-3291.4.patch, > TEZ-3291.5.patch, TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, current split computation does not go through the > rack local and allow-small groups optimizations and ends up creating a small > number of splits. Depending on clusters this can end up creating long-running > map jobs. > Example with hive: > == > 1. Inventory table in tpc-ds dataset is partitioned and is a relatively small > table. > 2. With query-22, hive requests with the original splits count as 52 and > overall length of splits themselves is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files to be processed in the split. In clusters with split locations, this > would have ended up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost" all splits get added to a > single group. This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by a single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. 
> E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335081#comment-15335081 ] Bikas Saha commented on TEZ-3296: - Thanks! It's clear now. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Fix For: 0.7.2, 0.9.0, 0.8.4 > > Attachments: TEZ-3296.001.patch, taskschedulerlog > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334623#comment-15334623 ] Bikas Saha commented on TEZ-3296: - Ah. Looks like a result of using priority as a key for unique requests vs using it as just a priority. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Fix For: 0.7.2, 0.9.0, 0.8.4 > > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334623#comment-15334623 ] Bikas Saha edited comment on TEZ-3296 at 6/16/16 8:29 PM: -- Ah. Looks like a result of using priority as a key for unique requests vs using it as just a priority. It's one thing to not support multiple resource sizes at the same priority and another to lose such requests altogether. Sigh! /cc [~vinodkv] [~wangda] was (Author: bikassaha): Ah. Looks like a result of using priority as a key for unique requests vs using it as just a priority. Sigh! /cc [~vinodkv] [~wangda] > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Fix For: 0.7.2, 0.9.0, 0.8.4 > > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334623#comment-15334623 ] Bikas Saha edited comment on TEZ-3296 at 6/16/16 8:29 PM: -- Ah. Looks like a result of using priority as a key for unique requests vs using it as just a priority. Sigh! /cc [~vinodkv] [~wangda] was (Author: bikassaha): Ah. Looks like a result of using priority as a key for unique requests vs using it as just a priority. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Fix For: 0.7.2, 0.9.0, 0.8.4 > > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334492#comment-15334492 ] Bikas Saha edited comment on TEZ-3296 at 6/16/16 7:20 PM: -- Sure. Let's commit this patch. Could you please attach the task scheduler logs for the hung job and mention the conflicting vertices? I follow what you described above and I'd expect the RM to return x+y containers at 2G where x is at 1.5G and y at 2G. The AM should accept y containers at 2G for vertex2G and x containers at 2G for vertex1.5G because 2G > 1.5G and the matching heuristic in AMRMClient considers fits-in vs exact match, since the RM is always guaranteed to return a container that's larger than requested due to rounding. E.g. if the min container size is 1G then asking for 1.5G will return 2G containers and the situation would still be the same for the vertex1.5G in the AM. One reason why I think it may hang is if the RM returns x+y containers at 1.5G because then y containers for vertex2G would never get a match. Or the RM returns fewer than x+y containers at 2G. The second case would be a bad RM bug that should be fixed in YARN urgently. The AM logs would shed some light on this. was (Author: bikassaha): Sure. Let's commit this patch. Could you please attach the task scheduler logs for the hung job and mention the conflicting vertices? I follow what you described above and I'd expect the RM to return x+y containers at 2G where x is at 1.5G and y at 2G. The AM should accept y containers at 2G for vertex2G and x containers at 2G for vertex1.5G because 2G > 1.5G and the matching heuristic in AMRMClient considers fits-in vs exact match, since the RM is always guaranteed to return a container that's larger than requested due to rounding. E.g. if the min container size is 1G then asking for 1.5G will return 2G containers and the situation would still be the same for the vertex1.5G in the AM. 
One reason why I think it may hang is if the RM returns x+y containers at 1.5G because then y containers for vertex2G would never get a match. Or the RM returns fewer than x+y containers at 2G. The second case would be a bad RM bug that should be fixed in YARN urgently. The AM logs would shed some light on this. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334492#comment-15334492 ] Bikas Saha edited comment on TEZ-3296 at 6/16/16 7:20 PM: -- Sure. Let's commit this patch. Could you please attach the task scheduler logs for the hung job and mention the conflicting vertices? I follow what you described above and I'd expect the RM to return x+y containers at 2G where x is at 1.5G and y at 2G. The AM should accept y containers at 2G for vertex2G and x containers at 2G for vertex1.5G because 2G > 1.5G and the matching heuristic in AMRMClient considers fits-in vs exact match, since the RM is always guaranteed to return a container that's larger than requested due to rounding. E.g. if the min container size is 1G then asking for 1.5G will return 2G containers and the situation would still be the same for the vertex1.5G in the AM. One reason why I think it may hang is if the RM returns x+y containers at 1.5G because then y containers for vertex2G would never get a match. Or the RM returns fewer than x+y containers at 2G. The second case would be a bad RM bug that should be fixed in YARN urgently. The AM logs would shed some light on this. was (Author: bikassaha): Sure. Let's commit this patch. Could you please attach the task scheduler logs for the hung job and mention the conflicting vertices? I follow what you described above and I'd expect the RM to return x+y containers at 2G where x is at 1.5G and y at 2G. The AM should accept y containers at 2G for vertex2G and x containers at 2G for vertex1.5G because 2G > 1.5G and the matching heuristic in AMRMClient considers fitsIn vs exact match, since the RM is always guaranteed to return a container that's larger than requested due to rounding. E.g. if the min container size is 1G then asking for 1.5G will return 2G containers and the situation would still be the same for the vertex1.5G in the AM. 
One reason why I think it may hang is if the RM returns x+y containers at 1.5G because then y containers for vertex2G would never get a match. Or the RM returns fewer than x+y containers at 2G. The second case would be a bad RM bug that should be fixed in YARN urgently. The AM logs would shed some light on this. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334492#comment-15334492 ] Bikas Saha commented on TEZ-3296: - Sure. Let's commit this patch. Could you please attach the task scheduler logs for the hung job and mention the conflicting vertices? I follow what you described above and I'd expect the RM to return x+y containers at 2G where x is at 1.5G and y at 2G. The AM should accept y containers at 2G for vertex2G and x containers at 2G for vertex1.5G because 2G > 1.5G and the matching heuristic in AMRMClient considers fitsIn vs exact match, since the RM is always guaranteed to return a container that's larger than requested due to rounding. E.g. if the min container size is 1G then asking for 1.5G will return 2G containers and the situation would still be the same for the vertex1.5G in the AM. One reason why I think it may hang is if the RM returns x+y containers at 1.5G because then y containers for vertex2G would never get a match. Or the RM returns fewer than x+y containers at 2G. The second case would be a bad RM bug that should be fixed in YARN urgently. The AM logs would shed some light on this. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
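The rounding and fits-in reasoning in the comment above can be made concrete with a small sketch. This is plain Java, not AMRMClient code, and the 1024 MB minimum allocation is an assumed cluster setting:

```java
/** Sketch (not Tez/YARN code): why a 1.5G request can be served by a 2G container. */
public class RoundingSketch {
    // The RM rounds each request up to a multiple of the minimum allocation.
    static long roundUp(long requestedMb, long minAllocMb) {
        return ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    // "Fits-in" matching: an allocated container can serve any request
    // whose resource ask is no larger than the container.
    static boolean fitsIn(long requestedMb, long allocatedMb) {
        return requestedMb <= allocatedMb;
    }
}
```

So with a 1G minimum, the 1.5G vertex's request is rounded to 2G, and a 2G container should match it under fits-in semantics; the hang only makes sense if the RM returns containers at 1.5G or returns too few at 2G.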
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328921#comment-15328921 ] Bikas Saha commented on TEZ-3296: - Sorry. My bad. I even used a calculator for that :P If this is urgent I think we can go with the current proposal. Would be good to open a follow-up item to use a BFS or topo-sort-based method that uses the priority space more conservatively. > Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
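A follow-up along the lines suggested above could hand every vertex its own priority in BFS order from the roots, so two vertices at the same root distance never share one. A minimal sketch (not Tez code; vertex names are illustrative, and a production version would track in-degrees, i.e. Kahn's algorithm, rather than plain BFS):

```java
import java.util.*;

/** Sketch: assign each vertex a distinct, monotonically increasing priority
 *  in BFS order from the DAG roots. */
public class PrioritySketch {
    // edges: vertex -> downstream vertices; roots have no incoming edges
    static Map<String, Integer> assignPriorities(Map<String, List<String>> edges,
                                                 List<String> roots) {
        Map<String, Integer> prio = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>(roots);
        int next = 1;
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (prio.containsKey(v)) continue;   // already assigned
            prio.put(v, next++);                 // unique per vertex
            for (String w : edges.getOrDefault(v, List.of())) queue.add(w);
        }
        return prio;
    }
}
```

Because priorities are unique per vertex, a YARN request for one vertex's resource size can no longer clobber another vertex's requests at the same priority.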
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328326#comment-15328326 ] Bikas Saha commented on TEZ-3291: - I am with Gopal on the fragility of this workaround. A single-machine setup is affected: we assume localhost will not be used as a real location, but it could be. [~gopalv] [~rajesh.balamohan] can we please evaluate an extension of fileSizeEstimator or something similar to handle this. My gut feeling is that this is not the first S3-related issue we will hit, and having an abstraction in place might make handling future issues better. > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.2.patch, TEZ-3291.3.patch, TEZ-3291.4.patch, > TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, current split computation does not go through the > rack local and allow-small groups optimizations and ends up creating a small > number of splits. Depending on clusters this can end up creating long-running > map jobs. > Example with hive: > == > 1. Inventory table in tpc-ds dataset is partitioned and is a relatively small > table. > 2. With query-22, hive requests with the original splits count as 52 and > overall length of splits themselves is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files to be processed in the split. In clusters with split locations, this > would have ended up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost" all splits get added to a > single group. 
This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. > E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326689#comment-15326689 ] Bikas Saha commented on TEZ-3291: - The comment could be more explicit, like "this is a workaround for systems like S3 that pass the same fake hostname for all splits". The log statement could include newDesiredSplits and also the final value of desired splits so that we get all the info in one log line. > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.2.patch, TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, current split computation does not go through the > rack local and allow-small groups optimizations and ends up creating a small > number of splits. Depending on clusters this can end up creating long-running > map jobs. > Example with hive: > == > 1. Inventory table in tpc-ds dataset is partitioned and is a relatively small > table. > 2. With query-22, hive requests with the original splits count as 52 and > overall length of splits themselves is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files to be processed in the split. In clusters with split locations, this > would have ended up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost" all splits get added to a > single group. This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. 
In this case, 52 files have to be opened and processed by single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. > E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
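Option 3 above can be sketched as follows. This is a hypothetical, simplified model of the grouping heuristic, not the actual TezSplitGrouper code; the function name and parameters are illustrative only, and the numbers in the checks come from the log in the report.

```python
def desired_num_splits(original_desired, total_length, min_size, distinct_locations):
    """Illustrative stand-in for the split-count heuristic.

    length_per_group is how large each grouped split would be if the
    caller's desired split count were honored as-is.
    """
    length_per_group = total_length / original_desired
    if length_per_group < min_size:
        if distinct_locations > 1:
            # Normal case: clamp the count so groups are at least min-size.
            return max(1, total_length // min_size)
        # All splits claim the same (likely fake) location, e.g. S3
        # reporting "localhost" for everything: keep the caller's count
        # so the work is still spread across many tasks.
        return original_desired
    return original_desired

# From the log: 110 desired splits over 12061817 bytes, min-size 16 MB.
assert desired_num_splits(110, 12061817, 16 * 1024 * 1024, 5) == 1    # clamped today
assert desired_num_splits(110, 12061817, 16 * 1024 * 1024, 1) == 110  # with option 3
```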
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326684#comment-15326684 ] Bikas Saha commented on TEZ-3291: - Would the split not have the URLs with S3 in them? Wondering how the ORC split size estimator works. If it casts the split into ORCSplit and inspects internal members, then perhaps the S3 split could also be cast into the correct object to look at the URLs?
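The cast-and-inspect idea could look roughly like this sketch. All class and function names here are hypothetical; a real version would be a Java interface analogous to the existing SplitSizeEstimator, and would cast to the concrete split class to read its path.

```python
class SplitLike:
    """Stand-in for an input split that knows its underlying path.
    Purely illustrative; not a real Tez or ORC class."""
    def __init__(self, path, hosts):
        self.path = path
        self.hosts = hosts

def locations_potentially_fake(split):
    # Look at the split's underlying URL scheme rather than its
    # (possibly bogus) reported hosts.
    scheme = split.path.split("://", 1)[0]
    return scheme in ("s3", "s3a", "s3n")

assert locations_potentially_fake(SplitLike("s3a://bucket/tbl/part-0", ["localhost"]))
assert not locations_potentially_fake(SplitLike("hdfs://nn/tbl/part-0", ["host1"]))
```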
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326673#comment-15326673 ] Bikas Saha commented on TEZ-3296: - bq. Today each vertex uses a set of three priority values, the low, the high, and the mean of those two. (Oddly containers for high are never requested in practice, just the low and mean.) The middle priority is the default. The lower value (higher priority) is for failed task reruns. The higher value (lower priority) was intended for speculative tasks but may have been missed being used for that. Wondering why the app hung. IIRC YARN keeps the higher resource request when there are multiple at the same priority because that's the safer thing to do. So when 2 vertices have the same priority but different resources, we would expect to get containers for both, but with the higher resource value across the board. If the above is correct, then perhaps there is a bug in the task scheduler code that needs to get fixed, which we might miss if we change the vertex priorities to be unique as a workaround. The vertex priority change is good in its own right. But it would be good to make sure we don't have some pending bug in the task scheduler that may have other side effects. Could you please attach the task scheduler log for the job that hung, in case it has some clues. On the patch itself, the formula looks like (Height*Total*3) + V*3. Now - (1*24*3) + 20*3 = 150 = (2*24*3) + 2*3 So we could still have collisions depending on the manner in which vertexIds get assigned, right? Unless currently we are getting lucky in the vId assignment such that vertices close to the root also happen to get low ids.
> Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
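The collision question in the comment above can be checked numerically. A quick sketch using the quoted formula and the comment's own values (note the two quoted examples actually evaluate to 132 and 150, and as long as each vertexId is below the vertex total, the per-height ranges cannot overlap):

```python
def priority(height, vertex_id, num_vertices):
    # Formula quoted in the comment: (Height * Total * 3) + V * 3
    return height * num_vertices * 3 + vertex_id * 3

# The comment's example values, with Total = 24 vertices:
assert priority(1, 20, 24) == 132
assert priority(2, 2, 24) == 150
# For heights h and h+1 with 0 <= vertex_id < num_vertices, the ranges
# [3*h*N, 3*h*N + 3*(N-1)] and [3*(h+1)*N, ...] are disjoint, so two
# vertices at different heights cannot share a priority under this formula.
```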
[jira] [Commented] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326660#comment-15326660 ] Bikas Saha commented on TEZ-3297: - Looking at the code further, it looks like the crucial change is not holding our own vertex lock while trying to take the src/dest vertex lock. That makes sense; the old pattern seems like a lock-ordering issue waiting to happen. Perhaps a quick scan for such nested locking is in order, in case it is not already done. The removal of the overall lock is fine since each internal method invocation, like getTotalTasks(), already handles its own locking. lgtm. Moving VM-invoked sync calls onto the dispatcher is a good idea but would need the addition of new callbacks into the VM to notify them of completion of the requested vertex state change operation. Since most current VMs don't do much after changing parallelism, the change might be simpler to implement now. Not sure about Hive custom VMs. > Deadlock scenario in AM during ShuffleVertexManager auto reduce > --- > > Key: TEZ-3297 > URL: https://issues.apache.org/jira/browse/TEZ-3297 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Priority: Critical > Attachments: TEZ-3297.1.patch, TEZ-3297.2.patch, am_log, thread_dump > > > Here is what's happening in the attached thread dump. > App Pool thread #9 does the auto reduce on V2 and initializes the new edge > manager; it holds the V2 write lock and wants the read lock of source vertex V1. > At the same time, another App Pool thread #2 schedules a task of V1 and gets > the output spec, so it holds the V1 read lock and wants the V2 read lock. > Also, the dispatcher thread wants the V1 write lock to begin the state machine > transition. Since the dispatcher thread is at the head of the V1 ReadWriteLock queue, > thread #9 cannot get the V1 read lock even though thread #2 is holding the V1 read lock. > This is a circular lock scenario. #2 blocks dispatcher, dispatcher blocks #9, > and #9 blocks #2.
> There is no problem with ReadWriteLock behavior in this case. Please see this > java bug report, http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
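The circular wait described above is the classic lock-ordering problem. One standard discipline (a sketch only, not the actual patch, which instead avoids holding the vertex's own lock while reading other vertices) is to acquire every needed lock up front in a fixed global order:

```python
import threading

class Vertex:
    _next_rank = 0
    def __init__(self, name):
        self.name = name
        self.lock = threading.RLock()
        # Fixed global rank used purely to order lock acquisition.
        self.rank = Vertex._next_rank
        Vertex._next_rank += 1

def with_vertex_locks(vertices, fn):
    """Acquire all needed vertex locks in rank order before doing any
    work, so two threads can never hold locks in opposite orders."""
    ordered = sorted(vertices, key=lambda v: v.rank)
    for v in ordered:
        v.lock.acquire()
    try:
        return fn()
    finally:
        for v in reversed(ordered):
            v.lock.release()

v1, v2 = Vertex("V1"), Vertex("V2")
# Call sites may list the locks in any order; acquisition order is identical.
assert with_vertex_locks([v2, v1], lambda: "ok") == "ok"
assert with_vertex_locks([v1, v2], lambda: "ok") == "ok"
```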
[jira] [Commented] (TEZ-3216) Support for more precise partition stats in VertexManagerEvent
[ https://issues.apache.org/jira/browse/TEZ-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326650#comment-15326650 ] Bikas Saha commented on TEZ-3216: - /cc [~rajesh.balamohan] in case he is interested in this optimization. > Support for more precise partition stats in VertexManagerEvent > -- > > Key: TEZ-3216 > URL: https://issues.apache.org/jira/browse/TEZ-3216 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: TEZ-3216.patch > > > Follow up on TEZ-3206 discussion, at least for some use cases, more accurate > partition stats will be useful for DataMovementEvent routing. Maybe we can > provide a config option to allow apps to choose the more accurate partition > stats over RoaringBitmap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326649#comment-15326649 ] Bikas Saha commented on TEZ-3291: - Why is the numLoc=1 check only in the size < min case? A comment before the code, explaining the above workaround, would be useful. Also a log statement. This may affect single-node cases because numLoc=1 in that case too. Is there any way we can find out if the splits are coming from an S3-like source and use that information instead? E.g. something similar to splitSizeEstimator that can look at the split and return whether its locations are potentially fake.
[jira] [Commented] (TEZ-3300) Tez UI: A wiki must be created with info about each page in Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326638#comment-15326638 ] Bikas Saha commented on TEZ-3300: - Could pages to the wiki be linked directly from the UI page for quick access? > Tez UI: A wiki must be created with info about each page in Tez UI > -- > > Key: TEZ-3300 > URL: https://issues.apache.org/jira/browse/TEZ-3300 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram > > - It would be a page under Tez confluence > - Must be flexible enough to support different versions of Tez UI, and give > context based help. > - Add a section on understanding various errors displayed in the error-bar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3300) Tez UI: A wiki must be created with info about each page in Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326638#comment-15326638 ] Bikas Saha edited comment on TEZ-3300 at 6/12/16 9:22 PM: -- Could pages to the wiki be linked directly from the corresponding UI pages for quick access? was (Author: bikassaha): Could pages to the wiki be linked directly from the UI page for quick access?
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325748#comment-15325748 ] Bikas Saha commented on TEZ-3291: - [~rajesh.balamohan] Is the patch still WIP or ready for final review?
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324989#comment-15324989 ] Bikas Saha commented on TEZ-3296: - Could you please help me understand the logic used to make these unique? I am sorry, I could not follow it from the code :) The minimum solution would be to break ties when needed, such that each vertex has a unique priority. Right now vertex depth from the root is proxying for the priority. Instead we could do a BFS on the DAG and assign priority based on the traversal. Or we could reuse the topological sort in the client (done during DAG submission) and assign that as the priority of the vertex.
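The BFS idea above sketches out simply. This is a hypothetical helper with illustrative names; the real change would live in the DAG/Vertex code on the Java side.

```python
from collections import deque

def assign_unique_priorities(dag, roots):
    """BFS from the root vertices; each vertex gets a distinct priority
    in visit order, so ties at the same depth are broken deterministically."""
    priority, seen, queue = {}, set(roots), deque(roots)
    next_p = 0
    while queue:
        v = queue.popleft()
        priority[v] = next_p
        next_p += 1
        for child in dag.get(v, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return priority

# Two maps feeding one reducer: the maps no longer share a priority.
dag = {"M1": ["R1"], "M2": ["R1"], "R1": []}
p = assign_unique_priorities(dag, ["M1", "M2"])
assert len(set(p.values())) == 3               # all priorities distinct
assert p["M1"] < p["R1"] and p["M2"] < p["R1"]  # upstream runs first
```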
[jira] [Commented] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324981#comment-15324981 ] Bikas Saha commented on TEZ-3297: - I am not sure we can simply remove the lock since it may affect visibility. Also, the assumption that the task count won't change may be inaccurate in the future. With progressive creation of splits, the task count may change with time. Similarly, input/output specs are theoretically pluggable and different per task. Let's be cautious wrt these future features when fixing this issue, else we may forget about it later on. A deadlock could sometimes be better than wrong results :)
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323264#comment-15323264 ] Bikas Saha commented on TEZ-3291: - I will take a quick look at the patch by EOD. Looks like the main issue was that there was a split size heuristic that needed an update to account for cases where locations are invalid. The patch is using distinctLocations=1 as a proxy for invalid locations. Unless this negatively affects a real single-node cluster scenario, this should be fine.
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319263#comment-15319263 ] Bikas Saha commented on TEZ-3291: - Then that would be a bug to fix. Hopefully that's what the patch is doing.
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318941#comment-15318941 ] Bikas Saha commented on TEZ-3291: - IIRC they should because localhost will be treated as a valid machine name and it will group as if all splits are on the same machine. The code itself does the same thing by adding a same bogus machine location name for all splits that have no location. Thereafter the code works identically for splits that have real locations and other that have fake locations. > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, curent split computation does not go through the > rack local and allow-small groups optimizations and ends up creating small > number of splits. Depending on clusters this can end creating long running > map jobs. > Example with hive: > == > 1. Inventory table in tpc-ds dataset is partitioned and is relatively a small > table. > 2. With query-22, hive requests with the original splits count as 52 and > overall length of splits themselves is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files be processed in the split. In clusters with split locations, this > would have landed up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost" all splits get added to > single group. This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. 
For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. > E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
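The comment above describes how the grouper normalizes missing locality by giving all location-less splits one shared bogus location. A minimal sketch of that normalization, with invented names (`SplitInfo`, `FAKE_LOCATION`) standing in for Tez's actual split-grouping types:

```java
import java.util.*;

// Illustrative sketch, not Tez's real API: splits without locations get one
// shared fake location, so downstream grouping logic treats splits with real
// and fake locations identically, as described in the comment above.
public class SplitLocationNormalizer {
    static final String FAKE_LOCATION = "__fake_location__";

    public static class SplitInfo {
        final long length;
        final String[] locations;
        public SplitInfo(long length, String[] locations) {
            this.length = length;
            this.locations = locations;
        }
    }

    // Replace null/empty location arrays with the shared fake location.
    public static String[] effectiveLocations(SplitInfo split) {
        if (split.locations == null || split.locations.length == 0) {
            return new String[] { FAKE_LOCATION };
        }
        return split.locations;
    }

    // Count distinct locations across all splits; a result of 1 signals the
    // degenerate "no real locality" case discussed in the JIRA (e.g. S3).
    public static int distinctLocationCount(List<SplitInfo> splits) {
        Set<String> locs = new HashSet<>();
        for (SplitInfo s : splits) {
            locs.addAll(Arrays.asList(effectiveLocations(s)));
        }
        return locs.size();
    }
}
```

A distinct-location count of 1 is exactly the condition that alternate option 3 in the description keys off.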
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316870#comment-15316870 ] Bikas Saha commented on TEZ-3291: - Since the data fits within the max size for a grouped split its creating 1 split. Whats the issue here? > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, curent split computation does not go through the > rack local and allow-small groups optimizations and ends up creating small > number of splits. Depending on clusters this can end creating long running > map jobs. > Example with hive: > == > 1. Inventory table in tpc-ds dataset is partitioned and is relatively a small > table. > 2. With query-22, hive requests with the original splits count as 52 and > overall length of splits themselves is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files be processed in the split. In clusters with split locations, this > would have landed up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost" all splits get added to > single group. This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. 
> E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
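Alternate option 3 in the description can be sketched as a small heuristic: only clamp the desired split count up to `tez.grouping.min-size` when the splits have more than one distinct location. This is a simplification of `TezMapredSplitsGrouper`'s real arithmetic, under the assumption that the single-location case should honor the requested parallelism:

```java
// Hedged sketch of the proposed heuristic, not Tez's actual implementation.
public class DesiredSplitsHeuristic {
    public static int desiredNumSplits(long totalLength, int requestedSplits,
                                       long minGroupSize, int distinctLocations) {
        long lengthPerGroup = totalLength / Math.max(1, requestedSplits);
        if (lengthPerGroup < minGroupSize && distinctLocations > 1) {
            // Normal clusters: don't create groups smaller than min-size.
            return (int) Math.max(1, totalLength / minGroupSize);
        }
        // Single-location (e.g. S3 "localhost") case: keep the requested
        // parallelism instead of collapsing everything into one group.
        return Math.max(1, requestedSplits);
    }
}
```

With the numbers from the quoted log (total length 12061817, 110 desired splits, 16 MB min-size, 1 location), this would keep 110 splits instead of collapsing to 1.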
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1536#comment-1536 ] Bikas Saha commented on TEZ-3271: - It would help if there were a bit more detail on what the objective is here. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch > > > mapreduce.map.failures.maxpercent > mapreduce.reduce.failures.maxpercent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start
[ https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302549#comment-15302549 ] Bikas Saha commented on TEZ-3274: - There probably isn't. We could use this one. Or, if you need an urgent point fix in this jira, some scheduling heuristics could optionally be added to RootInputInitializer. Though I am not sure what exactly is happening: since these tasks also read data from HDFS, why would we not want them to start asap if there is spare capacity? Slow start also effectively tries to start tasks as soon as possible (in fact sooner than its inputs are ready, so I am not sure why it was called slow start when it could have been called eager start :) ). > Vertex with MRInput and shuffle input does not respect slow start > - > > Key: TEZ-3274 > URL: https://issues.apache.org/jira/browse/TEZ-3274 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles > > Vertices with shuffle input and MRInput choose RootInputVertexManager (and > not ShuffleVertexManager) and start containers and tasks immediately. In this > scenario, resources can be wasted since they do not respect > tez.shuffle-vertex-manager.min-src-fraction > tez.shuffle-vertex-manager.max-src-fraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
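The slow-start behavior discussed in this thread can be pictured as a linear ramp between the min and max source fractions. This is an illustration of the idea behind `tez.shuffle-vertex-manager.min-src-fraction` and `tez.shuffle-vertex-manager.max-src-fraction`, not Tez's exact scheduling code:

```java
// Minimal sketch: the fraction of downstream tasks to schedule grows
// linearly as completed source tasks move from minFraction to maxFraction.
public class SlowStartPolicy {
    public static int tasksToSchedule(int totalTasks, int srcTotal, int srcCompleted,
                                      float minFraction, float maxFraction) {
        float completed = srcTotal == 0 ? 1.0f : (float) srcCompleted / srcTotal;
        if (completed < minFraction) {
            return 0; // nothing scheduled before the min threshold
        }
        if (completed >= maxFraction) {
            return totalTasks; // everything scheduled at/after the max threshold
        }
        // Linear ramp between min and max.
        float ramp = (completed - minFraction) / (maxFraction - minFraction);
        return Math.min(totalTasks, (int) Math.ceil(ramp * totalTasks));
    }
}
```

The bug in this JIRA is that RootInputVertexManager never consults such a policy, so all tasks are scheduled immediately.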
[jira] [Commented] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start
[ https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300753#comment-15300753 ] Bikas Saha commented on TEZ-3274: - This is a known limitation. The ideal solution is to split the VertexManager from a monolith into a composition of vertex modifiers and schedulers. The root input manager is a modifier. Auto-reduce is a modifier. Slow start is a scheduler. > Vertex with MRInput and shuffle input does not respect slow start > - > > Key: TEZ-3274 > URL: https://issues.apache.org/jira/browse/TEZ-3274 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles > > Vertices with shuffle input and MRInput choose RootInputVertexManager (and > not ShuffleVertexManager) and start containers and tasks immediately. In this > scenario, resources can be wasted since they do not respect > tez.shuffle-vertex-manager.min-src-fraction > tez.shuffle-vertex-manager.max-src-fraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291917#comment-15291917 ] Bikas Saha commented on TEZ-2950: - bq. 2. Rely on pipelined shuffle to avoid the final merge. Per an old discussion with [~rajesh.balamohan], avoiding the final merge is independent of pipelined shuffle and could be enabled without it (this needs a code change though). Perhaps that is what you allude to in point 4. > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy >Assignee: Kuhu Shukla > Attachments: TEZ-2950.001_prelim.patch > > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case
[ https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290228#comment-15290228 ] Bikas Saha commented on TEZ-3222: - Thanks for the update! And sorry for the delayed response. {code}@@ -78,10 +78,10 @@ public class BroadcastEdgeManager extends EdgeManagerPluginOnDemand { } @Override - public EventRouteMetadata routeCompositeDataMovementEventToDestination( + public CompositeEventRouteMetadata routeCompositeDataMovementEventToDestination( int sourceTaskIndex, int destinationTaskIndex) throws Exception { -return commonRouteMeta[sourceTaskIndex]; +return CompositeEventRouteMetadata.create(1, sourceTaskIndex, 0); }{code} This should probably used the same caching logic instead creating new objects. {code} @@ -360,8 +360,8 @@ public class ShuffleVertexManager extends VertexManagerPlugin { partitionRange = basePartitionRange; } - return EventRouteMetadata.create(partitionRange, targetIndicesToSend, - sourceIndices[destinationTaskIndex]); + return CompositeEventRouteMetadata.create(partitionRange, targetIndicesToSend[0], + sourceIndices[destinationTaskIndex][0]); }{code} This is not clear to me. The main reason for array type in EventRouteMetadata is this auto-reduce edge manager case where a single source CDME expands to multiple DMEs for the same destination task where the expansion number is the number of partitions coalesced during auto-reduce. Hence its not clear how passing the first element in the array would work. If the above is true then perhaps we could look at adding EventRouteMetadata at a member of CDME cloned from the source for the destination. 
And in the destination, the CDME with route metadata gets expanded into DMEs in the same manner as the following code in Edge (which could be moved into a helper method on CDME) {code}- int numEvents = routeMeta.getNumEvents(); - int[] sourceIndices = routeMeta.getSourceIndices(); - int[] targetIndices = routeMeta.getTargetIndices(); - while (numEventsDone < numEvents && listSize++ < listMaxSize) { -DataMovementEvent e = compEvent.expand(sourceIndices[numEventsDone], -targetIndices[numEventsDone]); -numEventsDone++; -TezEvent tezEventToSend = new TezEvent(e, tezEvent.getSourceInfo(), -tezEvent.getEventReceivedTime()); -tezEventToSend.setDestinationInfo(destinationMetaInfo); -listToAdd.add(tezEventToSend); - }{code} This would also keep the API unchanged for the edge plugin. Does the above sound correct to you? I am looking at this code after a while and I may have gotten it all wrong :) The code change in all inputs look quite similar to each other. Any potential for common methods? > Reduce messaging overhead for auto-reduce parallelism case > -- > > Key: TEZ-3222 > URL: https://issues.apache.org/jira/browse/TEZ-3222 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3222.1.patch, TEZ-3222.2.patch, TEZ-3222.3.patch, > TEZ-3222.4.patch > > > A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data > size is appropriate for 1 task attempt, this results in an increase in task > attempt message processing of 1000x. > This jira aims to reduce the message processing in the auto-reduced task > while keeping the amount of message processing in the AM the same or less. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
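The expansion loop quoted above from Edge can be summarized in a self-contained form: a composite data movement event fans out into one plain event per (source index, target index) pair carried by the route metadata. The class names here (`RouteMeta`, `Event`) are simplified stand-ins for Tez's `EventRouteMetadata` and `DataMovementEvent`:

```java
import java.util.*;

// Sketch of composite-event expansion for one destination task. In the
// auto-reduce case the index arrays have one entry per coalesced partition,
// which is why a single composite event fans out to multiple events.
public class CompositeEventExpander {
    public static class RouteMeta {
        final int[] sourceIndices;
        final int[] targetIndices;
        public RouteMeta(int[] src, int[] tgt) {
            this.sourceIndices = src;
            this.targetIndices = tgt;
        }
    }

    public static class Event {
        public final int sourceIndex;
        public final int targetIndex;
        Event(int s, int t) { sourceIndex = s; targetIndex = t; }
    }

    // Expand the composite event into per-pair events, mirroring the loop
    // in Edge that the comment above suggests moving into a helper on CDME.
    public static List<Event> expand(RouteMeta meta) {
        List<Event> out = new ArrayList<>();
        for (int i = 0; i < meta.sourceIndices.length; i++) {
            out.add(new Event(meta.sourceIndices[i], meta.targetIndices[i]));
        }
        return out;
    }
}
```

Doing this expansion in the destination task rather than in the AM is what keeps the edge-plugin API unchanged while cutting AM-side message processing.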
[jira] [Commented] (TEZ-3242) Reduce bytearray copy with TezEvent Serialization and deserialization
[ https://issues.apache.org/jira/browse/TEZ-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280723#comment-15280723 ] Bikas Saha commented on TEZ-3242: - lgtm. > Reduce bytearray copy with TezEvent Serialization and deserialization > - > > Key: TEZ-3242 > URL: https://issues.apache.org/jira/browse/TEZ-3242 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.7.2, 0.8.4 > > Attachments: TEZ-3242-1.patch > > > Byte arrays are created for serializing protobuf messages and parsing them > which creates lot of garbage when we have lot of events. > {code} > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3236) > at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118) > at > java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) > at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at java.io.FilterOutputStream.write(FilterOutputStream.java:97) > at > org.apache.tez.runtime.api.impl.TezEvent.serializeEvent(TezEvent.java:197) > at org.apache.tez.runtime.api.impl.TezEvent.write(TezEvent.java:268) > at > org.apache.tez.runtime.api.impl.TezHeartbeatResponse.write(TezHeartbeatResponse.java:95) > at > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:202) > at > org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:128) > at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:82) > at org.apache.hadoop.ipc.Server.setupResponse(Server.java:2496) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3244) Allow overlap of input and output memory when they are not concurrent
[ https://issues.apache.org/jira/browse/TEZ-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274621#comment-15274621 ] Bikas Saha commented on TEZ-3244: - Nice idea! This definitely works for the case where the processor blocks on all the inputs before it outputs anything. Not sure if other users like Hive always have that behavior; e.g. it could block on one input and then stream through the other input while streaming to the output. Does this need an API on the processor that says which mode to use, because in the same job some vertices could need all inputs and outputs in parallel while others could allow inputs first and then outputs? How do we mitigate the risk, present in any such approach, that not giving memory to an input or output when it needs it causes it to malfunction or crash due to starvation? > Allow overlap of input and output memory when they are not concurrent > - > > Key: TEZ-3244 > URL: https://issues.apache.org/jira/browse/TEZ-3244 > Project: Apache Tez > Issue Type: Bug >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: TEZ-3244.001.patch > > > For cases when memory for inputs and outputs are not needed simultaneously it > would be more efficient to allow inputs to use the memory normally set aside > for outputs and vice-versa. -- This message was sent by Aatlassian JIRA (v6.3.4#6332)
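The idea in this JIRA, together with the concern raised in the comment, can be sketched as a simple memory ledger that lets inputs borrow the output budget only while the phases are declared non-concurrent. All names below are illustrative; this is not Tez's memory-distributor API:

```java
// Hedged sketch: a ledger that overlaps input and output memory budgets
// when the processor's input and output phases do not run concurrently.
public class OverlappingMemoryLedger {
    private final long inputBudget;
    private final long outputBudget;
    private boolean phasesConcurrent; // set per-vertex, e.g. via a processor API

    public OverlappingMemoryLedger(long inputBudget, long outputBudget,
                                   boolean phasesConcurrent) {
        this.inputBudget = inputBudget;
        this.outputBudget = outputBudget;
        this.phasesConcurrent = phasesConcurrent;
    }

    // Memory available to inputs: their own budget, plus the output budget
    // when inputs and outputs are guaranteed not to run at the same time.
    public long availableForInputs() {
        return phasesConcurrent ? inputBudget : inputBudget + outputBudget;
    }

    // One mitigation for the starvation risk raised above: if the processor
    // turns out to stream to outputs while still reading, fall back to the
    // strict split so outputs keep their reserved memory.
    public void markConcurrent() {
        phasesConcurrent = true;
    }

    public long availableForOutputs() {
        return outputBudget; // outputs always retain at least their own share
    }
}
```

The open question from the comment remains: whether the mode is declared statically per vertex or detected at runtime.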
[jira] [Commented] (TEZ-3239) ShuffleVertexManager recovery issue when auto parallelism is enabled
[ https://issues.apache.org/jira/browse/TEZ-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267913#comment-15267913 ] Bikas Saha commented on TEZ-3239: - Barring a bug, this should not be happening in the new recovery design. Thats because after a vertex has been reconfigured, the new AM attempt will start a NoOp Vertex Manager. > ShuffleVertexManager recovery issue when auto parallelism is enabled > > > Key: TEZ-3239 > URL: https://issues.apache.org/jira/browse/TEZ-3239 > Project: Apache Tez > Issue Type: Bug >Reporter: Ming Ma > > Repro: > * Enable {{tez.shuffle-vertex-manager.enable.auto-parallel}}. > * kill the Tez AM container after the job has reached to the point that VM > has reconfigured the Edge. > * The new Tez AM attempt will fail to the following error. > {noformat} > org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source > should exist > at > org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:497) > at > org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:589) > at > org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:658) > at > org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:653) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > {noformat} > That is because the edge routing type changed to {{DataMovementType.CUSTOM}} > after reconfiguration. Allowing {{DataMovementType.CUSTOM}} in the following > check seems to fix the issue. > {noformat} > if (entry.getValue().getDataMovementType() == > DataMovementType.SCATTER_GATHER) { > bipartiteSources++; > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
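The fix suggested in the issue description can be sketched directly: count an edge as a bipartite source if it is SCATTER_GATHER or was rewritten to CUSTOM by auto-parallelism before the AM restarted. The enum and loop shape mirror the quoted ShuffleVertexManager snippet but are simplified here:

```java
// Simplified sketch of the bipartite-source check with the proposed fix.
public class BipartiteSourceCounter {
    public enum DataMovementType { SCATTER_GATHER, CUSTOM, BROADCAST, ONE_TO_ONE }

    public static int countBipartiteSources(Iterable<DataMovementType> edgeTypes) {
        int bipartiteSources = 0;
        for (DataMovementType t : edgeTypes) {
            // CUSTOM is accepted so a recovered, reconfigured edge still counts.
            if (t == DataMovementType.SCATTER_GATHER || t == DataMovementType.CUSTOM) {
                bipartiteSources++;
            }
        }
        return bipartiteSources;
    }
}
```

Per the comment above, the newer recovery design should avoid this path entirely by starting a no-op vertex manager for already-reconfigured vertices.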
[jira] [Commented] (TEZ-3203) DAG hangs when one of the upstream vertices has zero tasks
[ https://issues.apache.org/jira/browse/TEZ-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261376#comment-15261376 ] Bikas Saha commented on TEZ-3203: - Now, looking at the full code based on the findbugs report, I think I don't know what I am talking about :). The last patch is not needed. Patch number 2 from Jason is good to go. > DAG hangs when one of the upstream vertices has zero tasks > -- > > Key: TEZ-3203 > URL: https://issues.apache.org/jira/browse/TEZ-3203 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3203.001.patch, TEZ-3203.002.patch, TEZ-3203.3.patch > > > A DAG hangs during execution if it has a vertex with multiple inputs and one > of those upstream vertices has zero tasks and is using ShuffleVertexManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3203) DAG hangs when one of the upstream vertices has zero tasks
[ https://issues.apache.org/jira/browse/TEZ-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261136#comment-15261136 ] Bikas Saha commented on TEZ-3203: - Uploaded new patch. Credit for the jira and patch goes to Jason entirely. > DAG hangs when one of the upstream vertices has zero tasks > -- > > Key: TEZ-3203 > URL: https://issues.apache.org/jira/browse/TEZ-3203 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3203.001.patch, TEZ-3203.002.patch, TEZ-3203.3.patch > > > A DAG hangs during execution if it has a vertex with multiple inputs and one > of those upstream vertices has zero tasks and is using ShuffleVertexManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3203) DAG hangs when one of the upstream vertices has zero tasks
[ https://issues.apache.org/jira/browse/TEZ-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-3203: Attachment: TEZ-3203.3.patch > DAG hangs when one of the upstream vertices has zero tasks > -- > > Key: TEZ-3203 > URL: https://issues.apache.org/jira/browse/TEZ-3203 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3203.001.patch, TEZ-3203.002.patch, TEZ-3203.3.patch > > > A DAG hangs during execution if it has a vertex with multiple inputs and one > of those upstream vertices has zero tasks and is using ShuffleVertexManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3203) DAG hangs when one of the upstream vertices has zero tasks
[ https://issues.apache.org/jira/browse/TEZ-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261125#comment-15261125 ] Bikas Saha commented on TEZ-3203: - My bad. I should have been clearer. The following would be safer than removing the pendingTasks check altogether, to handle the (potentially impossible) case where pending tasks is still not initialized from its -1 value. I can make the change and post the final patch. {code} if (numBipartiteSourceTasksCompleted == totalNumBipartiteSourceTasks && numPendingTasks >= 0) { {code} > DAG hangs when one of the upstream vertices has zero tasks > -- > > Key: TEZ-3203 > URL: https://issues.apache.org/jira/browse/TEZ-3203 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3203.001.patch, TEZ-3203.002.patch > > > A DAG hangs during execution if it has a vertex with multiple inputs and one > of those upstream vertices has zero tasks and is using ShuffleVertexManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
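The guard discussed in this thread can be isolated as a small predicate: ramp up the remaining tasks only once all bipartite source tasks are done AND the pending-task count has been initialized (it starts at -1 before the managed vertex is configured). The surrounding class here is invented; the condition mirrors the snippet in the comment:

```java
// Sketch of the safer ramp-up condition from the comment above.
public class RampUpGuard {
    public static boolean shouldRampUp(int numBipartiteSourceTasksCompleted,
                                       int totalNumBipartiteSourceTasks,
                                       int numPendingTasks) {
        // ">= 0" (rather than "> 0") still ramps up when zero tasks are
        // pending -- the zero-task upstream case that caused the hang --
        // while skipping the uninitialized -1 state.
        return numBipartiteSourceTasksCompleted == totalNumBipartiteSourceTasks
            && numPendingTasks >= 0;
    }
}
```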
[jira] [Commented] (TEZ-2104) A CrossProductEdge which produces synthetic cross-product parallelism
[ https://issues.apache.org/jira/browse/TEZ-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260681#comment-15260681 ] Bikas Saha commented on TEZ-2104: - bq. Sorry for the inconsistency. Slow start only make sense for partitioned case; for non-partitioned case, we launch a task only if its input is ready. Why so for non-partitioned? bq. While #connection is a issue in very large scale, having a grouping layer may not make it more scalable. Because everyone gets data from grouping nodes and grouping nodes may not have enough network bandwidth. +1. In my experience, aggregate trees are useful when there is a massive data reduction expected using the intermediate aggregation/combine operators. If not then the downside of redundant data copying is likely not useful compared to a carefully orchestrated sequence of connections that ensures limited and uniformly distributed load on data sources. Of course, not saying that we already have any heuristics to ensure limited and uniform load on sources now :P. If not, then that would be something to consider under this scenario because it significantly increases the connections compared to the current edges. TEZ-3209 would be good to have as something that could be used in the shuffle edge or cross edge. Wondering if its related to the cross edge idea of having filters that prune unwanted partitions, in the sense of removing or merging partitions - ie logical partition management, seems to be a unifying idea between them. On that note, if we figure out that some partitions are not needed then will we not create any tasks for them? I.e. this information is calculated up front before determining tasks? Is this available statically at compile time (provided to VM) or needed runtime information (calculated in VM)? 
> A CrossProductEdge which produces synthetic cross-product parallelism > - > > Key: TEZ-2104 > URL: https://issues.apache.org/jira/browse/TEZ-2104 > Project: Apache Tez > Issue Type: New Feature >Reporter: Gopal V >Assignee: Zhiyuan Yang > Labels: gsoc, gsoc2015, hadoop, hive, java, tez > Attachments: Cartesian product edge design.2.pdf, Cross product edge > design.pdf > > > Instead of producing duplicate data for the synthetic cross-product, to fit > into partitions, the amount of net IO can be vastly reduced by a special > purpose cross-product data movement edge. > The Shuffle edge routes each partition's output to a single reducer, while > the cross-product edge routes it into a matrix of reducers without actually > duplicating the disk data. > A partitioning scheme with 3 partitions on the lhs and rhs of a join > operation can be routed into 9 reducers by performing a cross-product similar > to > (1,2,3) x (a,b,c) = [(1,a), (1,b), (1,c), (2,a), (2,b) ...] > This turns a single task cross-product model into a distributed cross product. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
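The routing arithmetic implied by the description can be sketched directly: with L partitions on the lhs and R on the rhs, the (i, j) partition pair maps to the single reducer index i * R + j, so 3 x 3 partitions feed 9 reducers without duplicating source data. Method names are illustrative, not from the actual edge implementation:

```java
// Sketch of cross-product edge routing between partition pairs and reducers.
public class CrossProductRouting {
    // Map an (lhsPartition, rhsPartition) pair to its destination task index.
    public static int destinationTask(int lhsPartition, int rhsPartition,
                                      int numRhsPartitions) {
        return lhsPartition * numRhsPartitions + rhsPartition;
    }

    // Inverse mapping: which lhs partition a destination task reads.
    public static int lhsPartitionOf(int destinationTask, int numRhsPartitions) {
        return destinationTask / numRhsPartitions;
    }

    // Inverse mapping: which rhs partition a destination task reads.
    public static int rhsPartitionOf(int destinationTask, int numRhsPartitions) {
        return destinationTask % numRhsPartitions;
    }
}
```

Each source partition's data is read by a whole row (or column) of the reducer matrix, which is why the comment above worries about connection counts and load on sources.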
[jira] [Commented] (TEZ-3232) Disable randomFailingInputs in testFaulttolerance to unblock other tests
[ https://issues.apache.org/jira/browse/TEZ-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258790#comment-15258790 ] Bikas Saha commented on TEZ-3232: - lgtm > Disable randomFailingInputs in testFaulttolerance to unblock other tests > - > > Key: TEZ-3232 > URL: https://issues.apache.org/jira/browse/TEZ-3232 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-3232.1.patch > > > The randomFailingInputs test causes the AM to hit an error condition and fail > other tests. For now it should be disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3219) Allow service plugins to define log locations link for remotely run task attempts
[ https://issues.apache.org/jira/browse/TEZ-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256846#comment-15256846 ] Bikas Saha commented on TEZ-3219: - Is there any issue in having the YARN based plugins provide the existing info via the new APIs instead of special casing them in the core code. This way the handling of all plugins is identical and makes the flow consistent and better for debugging and other issues. {code}-if (containerId != null && nodeHttpAddress != null) { - final String containerIdStr = containerId.toString(); - inProgressLogsUrl = nodeHttpAddress - + "/" + "node/containerlogs" - + "/" + containerIdStr - + "/" + this.appContext.getUser(); +if (getVertex().getServicePluginInfo().getContainerLauncherName().equals( + TezConstants.getTezYarnServicePluginName()) +|| getVertex().getServicePluginInfo().getContainerLauncherName().equals( + TezConstants.getTezUberServicePluginName())) { + if (containerId != null && nodeHttpAddress != null) { +final String containerIdStr = containerId.toString(); +inProgressLogsUrl = nodeHttpAddress ++ "/" + "node/containerlogs" ++ "/" + containerIdStr ++ "/" + this.appContext.getUser(); + } +} else { + inProgressLogsUrl = appContext.getTaskCommunicatorManager().getInProgressLogsUrl( + getVertex().getTaskCommunicatorIdentifier(), + attemptId, containerNodeId); }{code} > Allow service plugins to define log locations link for remotely run task > attempts > -- > > Key: TEZ-3219 > URL: https://issues.apache.org/jira/browse/TEZ-3219 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Fix For: 0.9.0, 0.8.4 > > Attachments: TEZ-3219.1.patch, TEZ-3219.2.patch, TEZ-3219.3.patch, > TEZ-3219.4.patch, TEZ-3219.5.patch > > > Today log links are generated based on the assumption that they are running > in yarn containers. For LLAP-like service plugin runs, the log links are > incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
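The uniform flow suggested in the comment above — every task communicator plugin, including the YARN-based ones, answering the same log-URL call so core code needs no special cases — can be sketched as follows. The interface and the YARN-style implementation are illustrative, not Tez's actual plugin API; the URL shape is taken from the quoted diff:

```java
// Sketch of a uniform log-URL plugin interface with a YARN-style provider.
public class LogUrlPlugins {
    public interface TaskLogUrlProvider {
        // Return null if no log URL can be produced for this attempt.
        String getInProgressLogsUrl(String containerId, String nodeHttpAddress,
                                    String user);
    }

    // YARN-style provider reproducing the URL shape from the quoted diff;
    // core code would call this uniformly instead of special-casing YARN.
    public static class YarnLogUrlProvider implements TaskLogUrlProvider {
        @Override
        public String getInProgressLogsUrl(String containerId,
                                           String nodeHttpAddress, String user) {
            if (containerId == null || nodeHttpAddress == null) {
                return null;
            }
            return nodeHttpAddress + "/node/containerlogs/" + containerId + "/" + user;
        }
    }
}
```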
[jira] [Comment Edited] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case
[ https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252975#comment-15252975 ] Bikas Saha edited comment on TEZ-3222 at 4/21/16 11:12 PM: --- ShuffleVertexManager, in theory, is a user land object and packaged in tez-runtime-library (along with other user land Inputs and Outputs). Hence, leaking ShuffleVertexManager into the framework DAG engine is crossing the border between user and system. I am afraid we may need to think about a different approach. {code}-if (routeMeta != null) { +if (edgeManagerOnDemand instanceof CustomShuffleEdgeManager) { + EnhancedDataMovementEvent edme = compEvent.expandEnhanced(srcTaskIndex, taskIndex, edgeManagerOnDemand.getNumDestinationTaskPhysicalInputs(0) / edgeManagerOnDemand.getContext().getSourceVertexNumTasks(), edgeManagerOnDemand.getNumDestinationTaskPhysicalInputs(taskIndex) / {code} Throttling of events being fetched by the input such that we dont get everything at once alleviated some issues. Is that related to this jira? If yes, what is the current jira trying to fix beyond the throttling mitigation? Just to throw some light on the criticality. was (Author: bikassaha): ShuffleVertexManager, in theory, is a user land object and packaged in tez-runtime-library (along with other user land Inputs and Outputs). Hence, leaking ShuffleVertexManager into the framework DAG engine is crossing the border between user and system. I am afraid we may need to think about a different approach. {code}-if (routeMeta != null) { +if (edgeManagerOnDemand instanceof CustomShuffleEdgeManager) { + EnhancedDataMovementEvent edme = compEvent.expandEnhanced(srcTaskIndex, taskIndex, edgeManagerOnDemand.getNumDestinationTaskPhysicalInputs(0) / edgeManagerOnDemand.getContext().getSourceVertexNumTasks(), edgeManagerOnDemand.getNumDestinationTaskPhysicalInputs(taskIndex) / {code}. 
Throttling of events being fetched by the input such that we dont get everything at once alleviated some issues. Is that related to this jira? If yes, what is the current jira trying to fix beyond the throttling mitigation? Just to throw some light on the criticality. > Reduce messaging overhead for auto-reduce parallelism case > -- > > Key: TEZ-3222 > URL: https://issues.apache.org/jira/browse/TEZ-3222 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3222.1.patch > > > A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data > size is appropriate for 1 task attempt, this results in an increase in task > attempt message processing of 1000x. > This jira aims to reduce the message processing in the auto-reduced task > while keeping the amount of message processing in the AM the same or less. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case
[ https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252975#comment-15252975 ] Bikas Saha commented on TEZ-3222: - ShuffleVertexManager, in theory, is a user land object and packaged in tez-runtime-library (along with other user land Inputs and Outputs). Hence, leaking ShuffleVertexManager into the framework DAG engine is crossing the border between user and system. I am afraid we may need to think about a different approach. {code}-if (routeMeta != null) { +if (edgeManagerOnDemand instanceof CustomShuffleEdgeManager) { + EnhancedDataMovementEvent edme = compEvent.expandEnhanced(srcTaskIndex, taskIndex, edgeManagerOnDemand.getNumDestinationTaskPhysicalInputs(0) / edgeManagerOnDemand.getContext().getSourceVertexNumTasks(), edgeManagerOnDemand.getNumDestinationTaskPhysicalInputs(taskIndex) / {code}. Throttling of events being fetched by the input such that we dont get everything at once alleviated some issues. Is that related to this jira? If yes, what is the current jira trying to fix beyond the throttling mitigation? Just to throw some light on the criticality. > Reduce messaging overhead for auto-reduce parallelism case > -- > > Key: TEZ-3222 > URL: https://issues.apache.org/jira/browse/TEZ-3222 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3222.1.patch > > > A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data > size is appropriate for 1 task attempt, this results in an increase in task > attempt message processing of 1000x. > This jira aims to reduce the message processing in the auto-reduced task > while keeping the amount of message processing in the AM the same or less. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3203) DAG hangs when one of the upstream vertices has zero tasks
[ https://issues.apache.org/jira/browse/TEZ-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231460#comment-15231460 ] Bikas Saha commented on TEZ-3203: - Good catch! Maybe we can get away with removing the numPendingTasks check here. I am worried that doing it earlier may be susceptible to calling scheduleTasks() multiple times. {code}if (numBipartiteSourceTasksCompleted == totalNumBipartiteSourceTasks && numPendingTasks > 0) { LOG.info("All source tasks assigned. " + "Ramping up " + numPendingTasks + " remaining tasks for vertex: " + getContext().getVertexName()); schedulePendingTasks(numPendingTasks, 1); return; }{code} > DAG hangs when one of the upstream vertices has zero tasks > -- > > Key: TEZ-3203 > URL: https://issues.apache.org/jira/browse/TEZ-3203 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3203.001.patch > > > A DAG hangs during execution if it has a vertex with multiple inputs and one > of those upstream vertices has zero tasks and is using ShuffleVertexManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
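The concern in the comment above (that moving the completion check earlier could trigger scheduleTasks() more than once) is essentially an idempotence problem. A minimal sketch of a one-shot guard, using illustrative names rather than actual ShuffleVertexManager members:

```java
// A minimal, hypothetical sketch of a one-shot scheduling guard; these are
// illustrative names, not actual ShuffleVertexManager members.
class RampUpGuard {
  private boolean rampedUp = false; // set once the full ramp-up has fired
  private int scheduleCalls = 0;    // how many times scheduling actually ran

  // Returns true only on the first call where all source tasks are complete,
  // so a full schedulePendingTasks() cannot be triggered twice even if the
  // completion condition is re-evaluated on later source-task events.
  boolean tryScheduleAll(int completedSourceTasks, int totalSourceTasks) {
    if (completedSourceTasks == totalSourceTasks && !rampedUp) {
      rampedUp = true;
      scheduleCalls++;
      return true;
    }
    return false;
  }

  int getScheduleCalls() {
    return scheduleCalls;
  }

  public static void main(String[] args) {
    RampUpGuard g = new RampUpGuard();
    System.out.println(g.tryScheduleAll(5, 10));  // false: not all complete
    System.out.println(g.tryScheduleAll(10, 10)); // true: ramp up once
    System.out.println(g.tryScheduleAll(10, 10)); // false: already ramped up
  }
}
```

With such a guard, the check need not depend on numPendingTasks being nonzero to stay safe against repeated completion events.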
[jira] [Commented] (TEZ-3198) Shuffle failures for the trailing task in a vertex are often fatal to the entire DAG
[ https://issues.apache.org/jira/browse/TEZ-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230773#comment-15230773 ] Bikas Saha commented on TEZ-3198: - Yes. Looks like our defaults can be better for real life workloads. > Shuffle failures for the trailing task in a vertex are often fatal to the > entire DAG > > > Key: TEZ-3198 > URL: https://issues.apache.org/jira/browse/TEZ-3198 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0, 0.8.2 >Reporter: Jason Lowe >Priority: Critical > > I've seen an increasing number of cases where a single-node failure caused > the whole Tez DAG to fail. These scenarios are common in that they involve > the last task of a vertex attempting to complete a shuffle where all the peer > tasks have already finished shuffling. The last task's attempt encounters > errors shuffling one of its inputs and keeps reporting it to the AM. > Eventually the attempt decides it must be the cause of the shuffle error and > fails. The subsequent attempts all do the same thing, and eventually we hit > the task max attempts limit and fail the vertex and DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill
[ https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227749#comment-15227749 ] Bikas Saha commented on TEZ-3161: - Does a fatal error affect the recovery code path? E.g. the fatal error got stored but the DAG failure did not get stored. What happens in recovery? Should the DAG fail after recovery because the task's fatal error was recovered? Likely yes, but does it work? Please ignore this comment in case it's already covered. > Allow task to report different kinds of errors - fatal / kill > - > > Key: TEZ-3161 > URL: https://issues.apache.org/jira/browse/TEZ-3161 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.8.3 > > Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt, > TEZ-3161.4.txt, TEZ-3161.5.txt, TEZ-3161.6.txt > > > In some cases, task failures will be the same across all attempts - e.g. > exceeding memory utilization on an operation. In this case, there's no point > in running another attempt of the same task. > There are other cases where a task may want to mark itself as KILLED - i.e. a > temporary error. An example of this is pipelined shuffle. > Tez should allow both operations. > cc [~vikram.dixit], [~rajesh.balamohan] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3198) Shuffle failures for the trailing task in a vertex are often fatal to the entire DAG
[ https://issues.apache.org/jira/browse/TEZ-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227137#comment-15227137 ] Bikas Saha commented on TEZ-3198: - Yeah. Looks like a gap in that heuristic. Maybe a test for this case would help when we make the next heuristic update. > Shuffle failures for the trailing task in a vertex are often fatal to the > entire DAG > > > Key: TEZ-3198 > URL: https://issues.apache.org/jira/browse/TEZ-3198 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0, 0.8.2 >Reporter: Jason Lowe >Priority: Critical > Fix For: 0.7.1, 0.8.3 > > > I've seen an increasing number of cases where a single-node failure caused > the whole Tez DAG to fail. These scenarios are common in that they involve > the last task of a vertex attempting to complete a shuffle where all the peer > tasks have already finished shuffling. The last task's attempt encounters > errors shuffling one of its inputs and keeps reporting it to the AM. > Eventually the attempt decides it must be the cause of the shuffle error and > fails. The subsequent attempts all do the same thing, and eventually we hit > the task max attempts limit and fail the vertex and DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3193) Deadlock in AM during task commit request
[ https://issues.apache.org/jira/browse/TEZ-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221046#comment-15221046 ] Bikas Saha commented on TEZ-3193: - This is probably a leftover from the removal of such reverse calls. There were more of them, and some were removed by making sure that such objects/members are available locally to the TaskAttemptImpl (from the Task passed in via the constructor) instead of calling back into the task to get them. Hence, the task location hint and taskSpec could be passed in via the constructor and referenced locally. Doing this helps other future scenarios as well. If the TA location hint is passed in via the constructor then it could be made different for each attempt. E.g. remove the machine for v.1 from the location hint of v.2 for a speculative execution so that the speculated attempt does not end up on the same machine. There is a jira open for this. Similarly, change the spec of v.1 to have higher memory than the default for that vertex because v.0 died with OOM. > Deadlock in AM during task commit request > - > > Key: TEZ-3193 > URL: https://issues.apache.org/jira/browse/TEZ-3193 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1, 0.8.2 >Reporter: Jason Lowe >Priority: Blocker > > The AM can deadlock between TaskImpl and TaskAttemptImpl. Stacktrace and > details in a followup comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
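The constructor-injection idea described in the comment above (computing per-attempt state up front and passing it in, rather than the attempt calling back into its parent task) can be sketched as follows; all class and method names here are hypothetical, not actual Tez classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive a per-attempt location hint before constructing
// the attempt, instead of the attempt calling back into the task for it.
class AttemptLocationHints {
  // Drop the machines used by earlier attempts (e.g. the node running v.1)
  // so a speculative attempt v.2 does not land on the same machine.
  static List<String> hintForNextAttempt(List<String> baseHint, List<String> nodesToAvoid) {
    List<String> hint = new ArrayList<>(baseHint);
    hint.removeAll(nodesToAvoid);
    return hint;
  }

  public static void main(String[] args) {
    List<String> base = List.of("node1", "node2", "node3");
    // Speculative attempt avoids the machine running the original attempt.
    System.out.println(hintForNextAttempt(base, List.of("node1"))); // [node2, node3]
  }
}
```

The same pattern extends to the taskSpec: a retry attempt could be constructed with a larger memory request when the prior attempt died with OOM.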
[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle
[ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208891#comment-15208891 ] Bikas Saha commented on TEZ-2442: - We typically use fs instead of dfs and DistributedFileSystem is actually the name of the HDFS impl of the FileSystem API. > Support DFS based shuffle in addition to HTTP shuffle > - > > Key: TEZ-2442 > URL: https://issues.apache.org/jira/browse/TEZ-2442 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.3 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: HDFS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, > hdfs_broadcast_hack.txt, tez_hdfs_shuffle.patch > > > In Tez, Shuffle is a mechanism by which intermediate data can be shared > between stages. Shuffle data is written to local disk and fetched from any > remote node using HTTP. A DFS like MapR file system can support writing this > shuffle data directly to its DFS using a notion of local volumes and retrieve > it using HDFS API from remote node. The current Shuffle implementation > assumes local data can only be managed by LocalFileSystem. So it uses > RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption > and introduce an abstraction to manage local disks, then we can reuse most of > the shuffle logic (store, sort) and inject a HDFS API based retrieval instead > of HTTP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle
[ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1520#comment-1520 ] Bikas Saha commented on TEZ-2442: - IIRC, this is the same for both kinds of shuffle, because consumers can fetch and merge spills in a pipelined manner as they receive the DME for each spilled output. The physical fetch method (HTTP or FS) is likely not relevant. [~rajesh.balamohan] can correct me if this is inaccurate. > Support DFS based shuffle in addition to HTTP shuffle > - > > Key: TEZ-2442 > URL: https://issues.apache.org/jira/browse/TEZ-2442 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.3 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: HDFS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, > hdfs_broadcast_hack.txt, tez_hdfs_shuffle.patch > > > In Tez, Shuffle is a mechanism by which intermediate data can be shared > between stages. Shuffle data is written to local disk and fetched from any > remote node using HTTP. A DFS like MapR file system can support writing this > shuffle data directly to its DFS using a notion of local volumes and retrieve > it using HDFS API from remote node. The current Shuffle implementation > assumes local data can only be managed by LocalFileSystem. So it uses > RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption > and introduce an abstraction to manage local disks, then we can reuse most of > the shuffle logic (store, sort) and inject a HDFS API based retrieval instead > of HTTP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle
[ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207668#comment-15207668 ] Bikas Saha commented on TEZ-2442: - It's important to keep in mind that for significant perf gains, the final merge of the output could be avoided, so the output would live as separate files. It would be good for the design to allow for this improvement in the future, i.e. allow multiple final output files to be written. [~rajesh.balamohan] > Support DFS based shuffle in addition to HTTP shuffle > - > > Key: TEZ-2442 > URL: https://issues.apache.org/jira/browse/TEZ-2442 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.3 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: HDFS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, > hdfs_broadcast_hack.txt, tez_hdfs_shuffle.patch > > > In Tez, Shuffle is a mechanism by which intermediate data can be shared > between stages. Shuffle data is written to local disk and fetched from any > remote node using HTTP. A DFS like MapR file system can support writing this > shuffle data directly to its DFS using a notion of local volumes and retrieve > it using HDFS API from remote node. The current Shuffle implementation > assumes local data can only be managed by LocalFileSystem. So it uses > RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption > and introduce an abstraction to manage local disks, then we can reuse most of > the shuffle logic (store, sort) and inject a HDFS API based retrieval instead > of HTTP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle
[ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207647#comment-15207647 ] Bikas Saha edited comment on TEZ-2442 at 3/23/16 12:59 AM: --- Should this config instead be something that specifies the class name used to do the final read/write or the filesystem scheme to use instead of hard coding hdfs? Then we could specify RawLocalImpl/HDFSImpl/WASBImpl/S3Impl or local/hdfs/wasb/s3. Of course that would depend on the impl :) was (Author: bikassaha): Should this config instead be something that specifies the class name used to do the final read/write or the filesystem scheme to use instead of hard coding hdfs? Then we could specify RawLocalImpl/HDFSImpl/WASBImpl/S3Impl or local/hdfs/wasb/s3 > Support DFS based shuffle in addition to HTTP shuffle > - > > Key: TEZ-2442 > URL: https://issues.apache.org/jira/browse/TEZ-2442 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.3 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: HDFS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, > hdfs_broadcast_hack.txt, tez_hdfs_shuffle.patch > > > In Tez, Shuffle is a mechanism by which intermediate data can be shared > between stages. Shuffle data is written to local disk and fetched from any > remote node using HTTP. A DFS like MapR file system can support writing this > shuffle data directly to its DFS using a notion of local volumes and retrieve > it using HDFS API from remote node. The current Shuffle implementation > assumes local data can only be managed by LocalFileSystem. So it uses > RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption > and introduce an abstraction to manage local disks, then we can reuse most of > the shuffle logic (store, sort) and inject a HDFS API based retrieval instead > of HTTP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle
[ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207647#comment-15207647 ] Bikas Saha commented on TEZ-2442: - Should this config instead be something that specifies the class name used to do the final read/write or the filesystem scheme to use instead of hard coding hdfs? Then we could specify RawLocalImpl/HDFSImpl/WASBImpl/S3Impl or local/hdfs/wasb/s3 > Support DFS based shuffle in addition to HTTP shuffle > - > > Key: TEZ-2442 > URL: https://issues.apache.org/jira/browse/TEZ-2442 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.3 >Reporter: Kannan Rajah >Assignee: Kannan Rajah > Attachments: HDFS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, > hdfs_broadcast_hack.txt, tez_hdfs_shuffle.patch > > > In Tez, Shuffle is a mechanism by which intermediate data can be shared > between stages. Shuffle data is written to local disk and fetched from any > remote node using HTTP. A DFS like MapR file system can support writing this > shuffle data directly to its DFS using a notion of local volumes and retrieve > it using HDFS API from remote node. The current Shuffle implementation > assumes local data can only be managed by LocalFileSystem. So it uses > RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption > and introduce an abstraction to manage local disks, then we can reuse most of > the shuffle logic (store, sort) and inject a HDFS API based retrieval instead > of HTTP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
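The scheme-keyed configuration suggested in the comment above could look roughly like the sketch below. The property name "tez.shuffle.fs.scheme" and the impl class names are invented for illustration; they are not real Tez configuration keys:

```java
import java.util.Map;

// Hedged sketch: resolve the shuffle filesystem implementation from a config
// value instead of hard-coding hdfs. All names here are hypothetical.
class ShuffleFsResolver {
  // Hypothetical mapping from filesystem scheme to an implementation class.
  private static final Map<String, String> IMPLS = Map.of(
      "local", "RawLocalImpl",
      "hdfs", "HDFSImpl",
      "wasb", "WASBImpl",
      "s3", "S3Impl");

  // Default to local disk when the (hypothetical) key is unset.
  static String resolve(Map<String, String> conf) {
    String scheme = conf.getOrDefault("tez.shuffle.fs.scheme", "local");
    String impl = IMPLS.get(scheme);
    if (impl == null) {
      throw new IllegalArgumentException("Unknown shuffle fs scheme: " + scheme);
    }
    return impl;
  }

  public static void main(String[] args) {
    System.out.println(resolve(Map.of("tez.shuffle.fs.scheme", "wasb"))); // WASBImpl
    System.out.println(resolve(Map.of())); // RawLocalImpl (default)
  }
}
```

In practice, Hadoop's FileSystem abstraction already resolves implementations by URI scheme, so a config carrying just the scheme (local/hdfs/wasb/s3) may be enough.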
[jira] [Commented] (TEZ-3181) History parser : Handle invalid/unsupported history event types gracefully
[ https://issues.apache.org/jira/browse/TEZ-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205820#comment-15205820 ] Bikas Saha commented on TEZ-3181: - I understand. But after that, when this incomplete data is passed to 0.8 analyzers, how can we expect them to work correctly? My concern is that consumers of this data may not handle such dropped data and instead depend on the parser to ensure that the data is valid. Dropping this event would make the data invalid. Perhaps instead of dropping it, we could translate it into something that makes sense on the 0.8 side, but that would need versioning via TEZ-3179. Does this make sense or am I missing something? :) I am not sure how making the parser succeed would be an end goal by itself since the parsed data is going to be consumed by analyzers. > History parser : Handle invalid/unsupported history event types gracefully > -- > > Key: TEZ-3181 > URL: https://issues.apache.org/jira/browse/TEZ-3181 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-3181.1.patch > > > TEZ-2581 changed/renamed some of HistoryEventType. This causes the parser to > throw an exception when trying to parse 0.7.x ATS data with the 0.8.x parser.
> {noformat} > Exception in thread "main" java.lang.IllegalArgumentException: No enum > constant > org.apache.tez.dag.history.HistoryEventType.VERTEX_PARALLELISM_UPDATED >at java.lang.Enum.valueOf(Enum.java:238) >at > org.apache.tez.dag.history.HistoryEventType.valueOf(HistoryEventType.java:21) >at > org.apache.tez.history.parser.datamodel.VertexInfo.<init>(VertexInfo.java:117) >at > org.apache.tez.history.parser.datamodel.VertexInfo.create(VertexInfo.java:159) >at > org.apache.tez.history.parser.ATSFileParser.processVertices(ATSFileParser.java:98) >at > org.apache.tez.history.parser.ATSFileParser.parseATSZipFile(ATSFileParser.java:202) >at > org.apache.tez.history.parser.ATSFileParser.getDAGData(ATSFileParser.java:70) > {noformat} > Long term fix is to have versioning support (TEZ-3179) in ATS data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3181) History parser : Handle invalid/unsupported history event types gracefully
[ https://issues.apache.org/jira/browse/TEZ-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205794#comment-15205794 ] Bikas Saha commented on TEZ-3181: - Do we need this? Could we use the 0.7 parser for 0.7 jobs and the 0.8 parser for 0.8 jobs? My concern is that we use the 0.8 parser, ignore some fields that are needed while parsing, and then the analyzers will fail or, worse, produce wrong results. This could be because analyzers are expecting a certain structure and that has changed in the data. > History parser : Handle invalid/unsupported history event types gracefully > -- > > Key: TEZ-3181 > URL: https://issues.apache.org/jira/browse/TEZ-3181 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-3181.1.patch > > > TEZ-2581 changed/renamed some of HistoryEventType. This causes the parser to > throw an exception when trying to parse 0.7.x ATS data with the 0.8.x parser. > {noformat} > Exception in thread "main" java.lang.IllegalArgumentException: No enum > constant > org.apache.tez.dag.history.HistoryEventType.VERTEX_PARALLELISM_UPDATED >at java.lang.Enum.valueOf(Enum.java:238) >at > org.apache.tez.dag.history.HistoryEventType.valueOf(HistoryEventType.java:21) >at > org.apache.tez.history.parser.datamodel.VertexInfo.<init>(VertexInfo.java:117) >at > org.apache.tez.history.parser.datamodel.VertexInfo.create(VertexInfo.java:159) >at > org.apache.tez.history.parser.ATSFileParser.processVertices(ATSFileParser.java:98) >at > org.apache.tez.history.parser.ATSFileParser.parseATSZipFile(ATSFileParser.java:202) >at > org.apache.tez.history.parser.ATSFileParser.getDAGData(ATSFileParser.java:70) > {noformat} > Long term fix is to have versioning support (TEZ-3179) in ATS data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
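The "graceful handling" this jira asks for amounts to tolerating event-type names that the current enum no longer defines, instead of letting Enum.valueOf throw as in the stack trace above. A minimal sketch with a stand-in enum (not the real org.apache.tez HistoryEventType):

```java
// Minimal sketch of graceful event-type parsing: return null for names the
// current enum no longer defines instead of letting Enum.valueOf throw.
// EventType is a stand-in, not the real Tez HistoryEventType.
class SafeEventParse {
  enum EventType { VERTEX_CONFIGURE_DONE, TASK_STARTED }

  // A 0.7-era name such as VERTEX_PARALLELISM_UPDATED yields null, letting
  // the caller decide whether to skip the event or translate it.
  static EventType parseOrNull(String name) {
    try {
      return EventType.valueOf(name);
    } catch (IllegalArgumentException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseOrNull("TASK_STARTED"));               // TASK_STARTED
    System.out.println(parseOrNull("VERTEX_PARALLELISM_UPDATED")); // null
  }
}
```

As the comments note, whether the caller should skip such an event or translate it into a 0.8-era equivalent is the real design question, and translation would need the versioning proposed in TEZ-3179.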
[jira] [Commented] (TEZ-3168) Provide a more predictable approach for total resource guidance for wave/split calculation
[ https://issues.apache.org/jira/browse/TEZ-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200619#comment-15200619 ] Bikas Saha commented on TEZ-3168: - For all of the problems with queue capacity, IMO cluster capacity is a more stable metric to look at. Logically, the data is distributed across the cluster, so the split calculation should account for that dispersion. This also solves the current immediate problem of creating too-small splits. Essentially the job wants to run tasks across all cluster nodes. The queue capacity determines how the job gets waves/windows of tasks that move around the cluster to read that data locally. > Provide a more predictable approach for total resource guidance for > wave/split calculation > --- > > Key: TEZ-3168 > URL: https://issues.apache.org/jira/browse/TEZ-3168 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-3168.wip.2.patch, TEZ-3168.wip.patch > > > Currently, Tez uses headroom for checking total available resources. This is > flaky as it ends up causing the split count to be determined by a point in > time lookup at what is available in the cluster. A better approach would be > either the queue size or even cluster size to get a more predictable count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3164) Surface error histograms from the AM
Bikas Saha created TEZ-3164: --- Summary: Surface error histograms from the AM Key: TEZ-3164 URL: https://issues.apache.org/jira/browse/TEZ-3164 Project: Apache Tez Issue Type: Improvement Reporter: Bikas Saha Job tasks are constantly probing the cluster. So if there are some issues in the cluster, jobs would be the first to notice them. If we can surface these observations to the user, we could quickly identify cluster issues. Let's say a set of bad machines got added to the cluster and tasks started seeing shuffle errors from those machines. This can slow down or hang the job. If the AM can surface increased error counts from source and destination machines, then that could pinpoint the bad machines, versus having to arrive at those machines from first principles and log searching. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
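The per-machine error counting TEZ-3164 describes can be sketched as a simple histogram keyed by host; the class and method names below are hypothetical, not an actual Tez API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of the histogram idea: the AM counts shuffle-error
// reports per host so outlier machines stand out. Names are hypothetical.
class ShuffleErrorHistogram {
  private final Map<String, Integer> errorsByHost = new HashMap<>();

  // Called whenever a task reports a shuffle error against a source host.
  void recordError(String host) {
    errorsByHost.merge(host, 1, Integer::sum);
  }

  // Hosts reported more often than the threshold are candidate bad machines.
  List<String> hostsAbove(int threshold) {
    return errorsByHost.entrySet().stream()
        .filter(e -> e.getValue() > threshold)
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    ShuffleErrorHistogram h = new ShuffleErrorHistogram();
    h.recordError("badnode1");
    h.recordError("badnode1");
    h.recordError("goodnode");
    System.out.println(h.hostsAbove(1)); // [badnode1]
  }
}
```

A real implementation would also need decay or windowing so transient errors don't permanently blacklist a node, but the aggregation itself is this simple.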
[jira] [Commented] (TEZ-3085) In session mode, the credentials passed via the Tez client constructor is not available to all the tasks
[ https://issues.apache.org/jira/browse/TEZ-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181432#comment-15181432 ] Bikas Saha commented on TEZ-3085: - Yes. And it looks like it's already mentioned in the first comment of this jira. > In session mode, the credentials passed via the Tez client constructor is not > available to all the tasks > > > Key: TEZ-3085 > URL: https://issues.apache.org/jira/browse/TEZ-3085 > Project: Apache Tez > Issue Type: Bug >Reporter: Vinoth Sathappan > > The credentials passed through the Tez client constructor isn't available for > the tasks in session mode. > TezClient(String name, TezConfiguration tezConf, > @Nullable Map localResources, > @Nullable Credentials credentials) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3085) In session mode, the credentials passed via the Tez client constructor is not available to all the tasks
[ https://issues.apache.org/jira/browse/TEZ-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181338#comment-15181338 ] Bikas Saha commented on TEZ-3085: - IIRC, didn't we recently start passing AM credentials to the DAG? > In session mode, the credentials passed via the Tez client constructor is not > available to all the tasks > > > Key: TEZ-3085 > URL: https://issues.apache.org/jira/browse/TEZ-3085 > Project: Apache Tez > Issue Type: Bug >Reporter: Vinoth Sathappan > > The credentials passed through the Tez client constructor isn't available for > the tasks in session mode. > TezClient(String name, TezConfiguration tezConf, > @Nullable Map localResources, > @Nullable Credentials credentials) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1210) TezClientUtils.localizeDagPlanAsText() needs to be fixed for session mode
[ https://issues.apache.org/jira/browse/TEZ-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180180#comment-15180180 ] Bikas Saha commented on TEZ-1210: - The DAGPlan is downloaded as a local resource to be used to run the DAG. In session mode the AM is already running and can accept the DAGPlan over RPC. In non-session mode the AM is going to be launched (and there is no connection between it and the client) and thus the DAGPlan needs to be provided indirectly as a YARN LocalResource via HDFS. > TezClientUtils.localizeDagPlanAsText() needs to be fixed for session mode > - > > Key: TEZ-1210 > URL: https://issues.apache.org/jira/browse/TEZ-1210 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Alexander Pivovarov > Labels: newbie > Fix For: 0.5.2 > > Attachments: TEZ-1210.1.patch, TEZ-1210.2.patch > > > It writes the dagPlan in text form to the same location. Either it should not > be invoked in session mode or it should be written with a differentiating prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3149) Tez-tools: Add username in DagInfo
[ https://issues.apache.org/jira/browse/TEZ-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172648#comment-15172648 ] Bikas Saha commented on TEZ-3149: - lgtm. A backport to 0.7 would be good. Thanks! > Tez-tools: Add username in DagInfo > -- > > Key: TEZ-3149 > URL: https://issues.apache.org/jira/browse/TEZ-3149 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan > Attachments: TEZ-3149.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3014) OOM during Shuffle in JDK 8
[ https://issues.apache.org/jira/browse/TEZ-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171245#comment-15171245 ] Bikas Saha commented on TEZ-3014: - [~jeagles] [~jlowe] Is this still an issue? OOM + JDK 8. If not, then we could close this. > OOM during Shuffle in JDK 8 > --- > > Key: TEZ-3014 > URL: https://issues.apache.org/jira/browse/TEZ-3014 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2580) Remove VertexManagerPlugin#setVertexParallelism with VertexManagerPlugin#reconfigureVertex
[ https://issues.apache.org/jira/browse/TEZ-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171239#comment-15171239 ] Bikas Saha commented on TEZ-2580: - We can only change this if dependent projects like Hive stop using it or else they will fail to compile. Not sure if they have done that. > Remove VertexManagerPlugin#setVertexParallelism with > VertexManagerPlugin#reconfigureVertex > -- > > Key: TEZ-2580 > URL: https://issues.apache.org/jira/browse/TEZ-2580 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: TEZ-2580.001.patch > > > This was deprecated in 0.7. Should be replaced with reconfigureVertex() - > change of name - to make it consistent with other reconfigureVertex() API's. > Should be done just close to release to enabled Hive to continue to build/use > master of Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery
[ https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160058#comment-15160058 ] Bikas Saha commented on TEZ-3124: - So in this case the task needed an event to start, and so it hung. If the init-generated events list is legitimately empty, then the task will not hang and overall we will not hang. > Running task hangs due to missing event to initialize input in recovery > --- > > Key: TEZ-3124 > URL: https://issues.apache.org/jira/browse/TEZ-3124 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.2 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Labels: Recovery > Fix For: 0.8.3 > > Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, > TEZ-3124-4.patch, TEZ-3124-5.patch, a.log > > > {noformat} > 2016-02-09 04:48:42 Starting to run new task attempt: > attempt_1454993155302_0001_1_00_61_3 > /attempt_1454993155302_0001_1_00_61 > 2016-02-09 04:48:43,196 [INFO] [I/O Setup 0 Initialize: {MRInput}] > |input.MRInput|: MRInput using newmapreduce API=true, split via event=true, > numPhysicalInputs=1 > 2016-02-09 04:48:43,200 [INFO] [I/O Setup 0 Initialize: {MRInput}] > |input.MRInputLegacy|: MRInput MRInputLegacy deferring initialization > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: All initializers finished > 2016-02-09 04:48:43,345 [INFO] [TezChild] |resources.MemoryDistributor|: > InitialRequests=[MRInput:INPUT:0:org.apache.tez.mapreduce.input.MRInputLegacy], > > [ireduce1:OUTPUT:1802502144:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput] > 2016-02-09 04:48:43,559 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: > ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1] > 2016-02-09 04:48:43,563 [INFO] [TezChild] > |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.3, > AdditionalReservationFractionForIOs=0.03, > finalReserveFractionUsed=0.32996 > 2016-02-09 04:48:43,564 [INFO] [TezChild] > |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: > 2, numScaledRequests: 13, TotalRequested: 1802502144, TotalRequestedScaled: > 1.663848132923077E9, TotalJVMHeap: 2577399808, TotalAvailable: 1726857871, > TotalRequested/TotalJVMHeap:0.70 > 2016-02-09 04:48:43,564 [INFO] [TezChild] |resources.MemoryDistributor|: > Allocations=[MRInput:org.apache.tez.mapreduce.input.MRInputLegacy:INPUT:0:0], > [ireduce1:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:1802502144:1726857871] > 2016-02-09 04:48:43,564 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Starting Inputs/Outputs > 2016-02-09 04:48:43,572 [INFO] [I/O Setup 1 Start: {MRInput}] > |runtime.LogicalIOProcessorRuntimeTask|: Started Input with src edge: MRInput > 2016-02-09 04:48:43,572 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Input: MRInput being auto started by > the framework. 
Subsequent instances will not be auto-started > 2016-02-09 04:48:43,573 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Num IOs determined for AutoStart: 1 > 2016-02-09 04:48:43,574 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start > 2016-02-09 04:48:43,574 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete > 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running > task, taskAttemptId=attempt_1454993155302_0001_1_00_61_3 > 2016-02-09 04:48:43,583 [INFO] [TezChild] |map.MapProcessor|: Running map: > attempt_1454993155302_0001_1_00_61_3_10001 > 2016-02-09 04:48:43,675 [INFO] [TezChild] |impl.ExternalSorter|: ireduce1 > using: memoryMb=1646, keySerializerClass=class > org.apache.hadoop.io.IntWritable, > valueSerializerClass=org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer@5f143de6, > comparator=org.apache.hadoop.io.IntWritable$Comparator@ec52d1f, > partitioner=org.apache.tez.mapreduce.partition.MRPartitioner, > serialization=org.apache.hadoop.io.serializer.WritableSerialization > 2016-02-09 04:48:43,686 [INFO] [TezChild] |impl.PipelinedSorter|: Setting up > PipelinedSorter for ireduce1: , UsingHashComparator=false > 2016-02-09 04:48:45,093 [INFO] [TezChild] |impl.PipelinedSorter|: Newly >
[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery
[ https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160059#comment-15160059 ] Bikas Saha commented on TEZ-3124: - lgtm. +1. Thanks! > Running task hangs due to missing event to initialize input in recovery > --- > > Key: TEZ-3124 > URL: https://issues.apache.org/jira/browse/TEZ-3124 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.2 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Labels: Recovery > Fix For: 0.8.3 > > Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, > TEZ-3124-4.patch, TEZ-3124-5.patch, a.log > > > {noformat} > 2016-02-09 04:48:42 Starting to run new task attempt: > attempt_1454993155302_0001_1_00_61_3 > /attempt_1454993155302_0001_1_00_61 > 2016-02-09 04:48:43,196 [INFO] [I/O Setup 0 Initialize: {MRInput}] > |input.MRInput|: MRInput using newmapreduce API=true, split via event=true, > numPhysicalInputs=1 > 2016-02-09 04:48:43,200 [INFO] [I/O Setup 0 Initialize: {MRInput}] > |input.MRInputLegacy|: MRInput MRInputLegacy deferring initialization > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish > 2016-02-09 04:48:43,333 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: All initializers finished > 2016-02-09 04:48:43,345 [INFO] [TezChild] |resources.MemoryDistributor|: > InitialRequests=[MRInput:INPUT:0:org.apache.tez.mapreduce.input.MRInputLegacy], > > [ireduce1:OUTPUT:1802502144:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput] > 2016-02-09 04:48:43,559 [INFO] [TezChild] > |resources.WeightedScalingMemoryDistributor|: > 
ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1] > 2016-02-09 04:48:43,563 [INFO] [TezChild] > |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.3, > AdditionalReservationFractionForIOs=0.03, > finalReserveFractionUsed=0.32996 > 2016-02-09 04:48:43,564 [INFO] [TezChild] > |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: > 2, numScaledRequests: 13, TotalRequested: 1802502144, TotalRequestedScaled: > 1.663848132923077E9, TotalJVMHeap: 2577399808, TotalAvailable: 1726857871, > TotalRequested/TotalJVMHeap:0.70 > 2016-02-09 04:48:43,564 [INFO] [TezChild] |resources.MemoryDistributor|: > Allocations=[MRInput:org.apache.tez.mapreduce.input.MRInputLegacy:INPUT:0:0], > [ireduce1:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:1802502144:1726857871] > 2016-02-09 04:48:43,564 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Starting Inputs/Outputs > 2016-02-09 04:48:43,572 [INFO] [I/O Setup 1 Start: {MRInput}] > |runtime.LogicalIOProcessorRuntimeTask|: Started Input with src edge: MRInput > 2016-02-09 04:48:43,572 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Input: MRInput being auto started by > the framework. 
Subsequent instances will not be auto-started > 2016-02-09 04:48:43,573 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Num IOs determined for AutoStart: 1 > 2016-02-09 04:48:43,574 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start > 2016-02-09 04:48:43,574 [INFO] [TezChild] > |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete > 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running > task, taskAttemptId=attempt_1454993155302_0001_1_00_61_3 > 2016-02-09 04:48:43,583 [INFO] [TezChild] |map.MapProcessor|: Running map: > attempt_1454993155302_0001_1_00_61_3_10001 > 2016-02-09 04:48:43,675 [INFO] [TezChild] |impl.ExternalSorter|: ireduce1 > using: memoryMb=1646, keySerializerClass=class > org.apache.hadoop.io.IntWritable, > valueSerializerClass=org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer@5f143de6, > comparator=org.apache.hadoop.io.IntWritable$Comparator@ec52d1f, > partitioner=org.apache.tez.mapreduce.partition.MRPartitioner, > serialization=org.apache.hadoop.io.serializer.WritableSerialization > 2016-02-09 04:48:43,686 [INFO] [TezChild] |impl.PipelinedSorter|: Setting up > PipelinedSorter for ireduce1: , UsingHashComparator=false > 2016-02-09 04:48:45,093 [INFO] [TezChild] |impl.PipelinedSorter|: Newly > allocated block size=1725956096, index=0, Number of buffers=1, > currentAllocatableMemory=0, currentBufferSize=1725956096, total=1725956096 >
[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery
[ https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159946#comment-15159946 ] Bikas Saha commented on TEZ-3124: - Then the fix should be restricted to not logging VertexInitializedEvent if shouldSkipInit is true. Initializing initGeneratedEvents to the old value might have a side effect when shouldSkipInit is false: in that case init will run again, and initGeneratedEvents could contain both old recovered events and new init-generated events. Can this happen? Even if not, why add the side effect of initializing initGeneratedEvents? Your explanation makes sense for the fix; my concern is the change to initGeneratedEvents. Orthogonally, initGeneratedEvents could be empty even after init. This is valid. Will that be a problem? Asking because in this case we got hung because the vertex initialized event had empty initGeneratedEvents. > Running task hangs due to missing event to initialize input in recovery > --- > > Key: TEZ-3124 > URL: https://issues.apache.org/jira/browse/TEZ-3124 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.2 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Labels: Recovery > Fix For: 0.8.3 > > Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, TEZ-3124-4.patch, a.log
[jira] [Commented] (TEZ-3102) Fetch failure of a speculated task causes job hang
[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159835#comment-15159835 ] Bikas Saha commented on TEZ-3102: - +1. I think testTaskSucceedAndRetroActiveFailure() should be covering the new code changes in the success attempt code path. In the small chance that it's not, would you please update the test. Thanks! > Fetch failure of a speculated task causes job hang > -- > > Key: TEZ-3102 > URL: https://issues.apache.org/jira/browse/TEZ-3102 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch > > > If a task speculates then succeeds, one task will be marked successful and > the other killed. Then if the task retroactively fails due to fetch failures > the Tez AM will fail to reschedule another task. This results in a hung job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3102) Fetch failure of a speculated task causes job hang
[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159835#comment-15159835 ] Bikas Saha edited comment on TEZ-3102 at 2/23/16 11:09 PM: --- +1. I think testTaskSucceedAndRetroActiveFailure() should already be covering the new code changes in the success attempt code path. In the small chance that it's not, would you please update the test. Thanks! was (Author: bikassaha): +1. I think testTaskSucceedAndRetroActiveFailure() should be covering the new code changes in the success attempt code path. In the small chance that it's not, would you please update the test. Thanks! > Fetch failure of a speculated task causes job hang > -- > > Key: TEZ-3102 > URL: https://issues.apache.org/jira/browse/TEZ-3102 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch > > > If a task speculates then succeeds, one task will be marked successful and > the other killed. Then if the task retroactively fails due to fetch failures > the Tez AM will fail to reschedule another task. This results in a hung job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery
[ https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159548#comment-15159548 ] Bikas Saha commented on TEZ-3124: - Let's say shouldSkipInit() is false because VertexInitializedEvent != null but ConfigurationDoneEvent == null. So we will rerun init, and then we will log another VertexInitializedEvent. Right? In that case, how will the next AM attempt handle multiple VertexInitializedEvents? If we are doing init again, then that process will add new items into initGeneratedEvents. So we should not be restoring older initGeneratedEvents into the new object, or else the new object will have more items than necessary. So I am not sure what is broken and how the fix is working. Could you please help by pointing out the exact sequence of events that causes the issue? Thanks! > Running task hangs due to missing event to initialize input in recovery > --- > > Key: TEZ-3124 > URL: https://issues.apache.org/jira/browse/TEZ-3124 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.2 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Labels: Recovery > Fix For: 0.8.3 > > Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, TEZ-3124-4.patch, a.log
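The duplication concern raised in the TEZ-3124 comment above can be sketched with a toy example. The list and event names here are illustrative stand-ins, not the actual Tez fields or event classes:

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryEventsDemo {
    // Restore recovered events into the live list, then simulate init rerunning
    // and regenerating the same event: the list ends up with a duplicate entry.
    static List<String> restoreThenRerunInit(List<String> recovered) {
        List<String> initGeneratedEvents = new ArrayList<>(recovered); // restore old value
        initGeneratedEvents.add("inputDataInformationEvent"); // rerun of init regenerates it
        return initGeneratedEvents;
    }

    public static void main(String[] args) {
        List<String> events = restoreThenRerunInit(List.of("inputDataInformationEvent"));
        System.out.println(events.size()); // 2: one recovered copy plus one regenerated copy
    }
}
```

This is why restoring the old list is only safe when init is actually skipped; if init reruns, the restored events are counted twice.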
[jira] [Commented] (TEZ-2962) Use per partition stats in shuffle vertex manager auto parallelism
[ https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159414#comment-15159414 ] Bikas Saha commented on TEZ-2962: - The downside of partition stats is that the values are approximate, in buckets of 1MB/10MB/100MB etc. So a 100MB stat could imply 900MB of actual data, which makes respecting a max data size per task tricky. > Use per partition stats in shuffle vertex manager auto parallelism > -- > > Key: TEZ-2962 > URL: https://issues.apache.org/jira/browse/TEZ-2962 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Priority: Critical > > The original code used output size sent by completed tasks. Recently per > partition stats have been added that provide granular information. Using > partition stats may be more accurate and also remove the duplicate counting > of data size in partition stats and per task overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
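The bucketing caveat from the TEZ-2962 comment can be made concrete with a small sketch. The bucket boundaries and method name below are assumptions for illustration; the real Tez partition-stats encoding may use different boundaries:

```java
public class PartitionStatsBucketing {
    static final long MB = 1024L * 1024L;

    // Report a size as the floor of assumed order-of-magnitude buckets.
    static long bucketFloor(long actualBytes) {
        long[] buckets = {1 * MB, 10 * MB, 100 * MB, 1024 * MB};
        long floor = 0;
        for (long b : buckets) {
            if (actualBytes >= b) {
                floor = b;
            }
        }
        return floor;
    }

    public static void main(String[] args) {
        // 900MB of real partition output is still reported in the 100MB bucket,
        // so a planner enforcing "max data size per task" from the reported
        // stat can underestimate the true input by roughly 9x.
        System.out.println(bucketFloor(900 * MB) / MB); // prints 100
    }
}
```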
[jira] [Commented] (TEZ-3126) Log reason for not reducing parallelism
[ https://issues.apache.org/jira/browse/TEZ-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158125#comment-15158125 ] Bikas Saha commented on TEZ-3126: - lgtm > Log reason for not reducing parallelism > --- > > Key: TEZ-3126 > URL: https://issues.apache.org/jira/browse/TEZ-3126 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Critical > Attachments: TEZ-3126.1.patch, TEZ-3126.2.patch > > > For example, when reducing parallelism from 36 to 22. The basePartitionRange > will be 1 and will not re-configure the vertex. > {code:java|title=ShuffleVertexManager#determineParallelismAndApply|borderStyle=dashed|bgColor=lightgrey} > int desiredTaskParallelism = > (int)( > (expectedTotalSourceTasksOutputSize+desiredTaskInputDataSize-1)/ > desiredTaskInputDataSize); > if(desiredTaskParallelism < minTaskParallelism) { > desiredTaskParallelism = minTaskParallelism; > } > > if(desiredTaskParallelism >= currentParallelism) { > return true; > } > > // most shufflers will be assigned this range > basePartitionRange = currentParallelism/desiredTaskParallelism; > > if (basePartitionRange <= 1) { > // nothing to do if range is equal 1 partition. shuffler does it by > default > return true; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
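The 36-to-22 case in the TEZ-3126 description comes down to the integer division in the quoted snippet; a minimal standalone sketch (class name hypothetical):

```java
public class BasePartitionRangeDemo {
    // Mirrors the integer math from ShuffleVertexManager#determineParallelismAndApply.
    static int basePartitionRange(int currentParallelism, int desiredTaskParallelism) {
        return currentParallelism / desiredTaskParallelism;
    }

    public static void main(String[] args) {
        // 36 current tasks, 22 desired: 36 / 22 == 1, so the vertex is not reconfigured
        System.out.println(basePartitionRange(36, 22)); // prints 1
        // the range only exceeds 1 once the reduction is at least 2x
        System.out.println(basePartitionRange(36, 18)); // prints 2
    }
}
```

Logging this computed range, as the patch proposes, makes the otherwise silent "nothing to do" early return visible to users.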
[jira] [Commented] (TEZ-3131) Support a way to override test_root_dir for FaultToleranceTestRunner
[ https://issues.apache.org/jira/browse/TEZ-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157572#comment-15157572 ] Bikas Saha commented on TEZ-3131: - Sure. Please go ahead. +1. > Support a way to override test_root_dir for FaultToleranceTestRunner > > > Key: TEZ-3131 > URL: https://issues.apache.org/jira/browse/TEZ-3131 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Minor > Attachments: TEZ-3131.1.patch > > > The path is hardcoded. For regression testing, it will be useful if it can be > overridden via command-line if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)