[jira] [Created] (TEZ-1737) Should add taskNum in VertexFinishedEvent

2014-11-04 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-1737:
---

 Summary: Should add taskNum in VertexFinishedEvent
 Key: TEZ-1737
 URL: https://issues.apache.org/jira/browse/TEZ-1737
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1737) Should add taskNum in VertexFinishedEvent

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1737:

Affects Version/s: 0.5.1

> Should add taskNum in VertexFinishedEvent
> -
>
> Key: TEZ-1737
> URL: https://issues.apache.org/jira/browse/TEZ-1737
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> When the DAG has completed but the recovery log is incomplete, especially
> when VertexInitializedEvent has not been written to HDFS, the vertex's task
> number may be -1. Because we only recover the DAG to the desired state,
> taskNum may not be recovered, which causes getVertexStatus to return the
> wrong task number.





[jira] [Updated] (TEZ-1737) Should add taskNum in VertexFinishedEvent

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1737:

Description: When the DAG has completed but the recovery log is incomplete,
especially when VertexInitializedEvent has not been written to HDFS, the
vertex's task number may be -1. Because we only recover the DAG to the desired
state, taskNum may not be recovered, which causes getVertexStatus to return the
wrong task number.

> Should add taskNum in VertexFinishedEvent
> -
>
> Key: TEZ-1737
> URL: https://issues.apache.org/jira/browse/TEZ-1737
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> When the DAG has completed but the recovery log is incomplete, especially
> when VertexInitializedEvent has not been written to HDFS, the vertex's task
> number may be -1. Because we only recover the DAG to the desired state,
> taskNum may not be recovered, which causes getVertexStatus to return the
> wrong task number.





[jira] [Commented] (TEZ-1737) Should add taskNum in VertexFinishedEvent

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196057#comment-14196057
 ] 

Jeff Zhang commented on TEZ-1737:
-

When the DAG has completed but the recovery log is incomplete, especially
when VertexInitializedEvent has not been written to HDFS, the vertex's task
number may be -1. Because we only recover the DAG to the desired state,
taskNum may not be recovered, which causes getVertexStatus to return the
wrong task number.

> Should add taskNum in VertexFinishedEvent
> -
>
> Key: TEZ-1737
> URL: https://issues.apache.org/jira/browse/TEZ-1737
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>






[jira] [Issue Comment Deleted] (TEZ-1737) Should add taskNum in VertexFinishedEvent

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1737:

Comment: was deleted

(was: When the DAG has completed but the recovery log is incomplete,
especially when VertexInitializedEvent has not been written to HDFS, the
vertex's task number may be -1. Because we only recover the DAG to the desired
state, taskNum may not be recovered, which causes getVertexStatus to return the
wrong task number.)

> Should add taskNum in VertexFinishedEvent
> -
>
> Key: TEZ-1737
> URL: https://issues.apache.org/jira/browse/TEZ-1737
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> When the DAG has completed but the recovery log is incomplete, especially
> when VertexInitializedEvent has not been written to HDFS, the vertex's task
> number may be -1. Because we only recover the DAG to the desired state,
> taskNum may not be recovered, which causes getVertexStatus to return the
> wrong task number.





[jira] [Updated] (TEZ-1737) Should add taskNum in VertexFinishedEvent

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1737:

Target Version/s: 0.5.2

> Should add taskNum in VertexFinishedEvent
> -
>
> Key: TEZ-1737
> URL: https://issues.apache.org/jira/browse/TEZ-1737
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> When the DAG has completed but the recovery log is incomplete, especially
> when VertexInitializedEvent has not been written to HDFS, the vertex's task
> number may be -1. Because we only recover the DAG to the desired state,
> taskNum may not be recovered, which causes getVertexStatus to return the
> wrong task number.





[jira] [Updated] (TEZ-1734) Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1734:

Attachment: TEZ-1734.patch

> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> ---
>
> Key: TEZ-1734
> URL: https://issues.apache.org/jira/browse/TEZ-1734
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, its taskNum may be -1;
> in this case, we don't need to recover its tasks.





[jira] [Updated] (TEZ-1734) Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1734:

Description: When a vertex is recovered from NEW to FAILED/KILLED, its taskNum
may be -1; in this case, we don't need to recover its tasks.

> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> ---
>
> Key: TEZ-1734
> URL: https://issues.apache.org/jira/browse/TEZ-1734
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, its taskNum may be -1;
> in this case, we don't need to recover its tasks.





[jira] [Commented] (TEZ-1734) Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196097#comment-14196097
 ] 

Jeff Zhang commented on TEZ-1734:
-

[~hitesh], please help review the patch.
[~bikassaha] I added some test cases, please help verify whether it is your 
test case scenario.

* Fix the issue in StartRecoveryTransition & RecoveryTransition
* Add test cases for recovery from NEW to FAILED when taskNum is -1

> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> ---
>
> Key: TEZ-1734
> URL: https://issues.apache.org/jira/browse/TEZ-1734
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, its taskNum may be -1;
> in this case, we don't need to recover its tasks.





[jira] [Commented] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-04 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196124#comment-14196124
 ] 

Prakash Ramachandran commented on TEZ-1733:
---

[~gopalv] The TreeSet uses the comparator function to check for equality too.
So something like the code below, with the comparator checking only length,
{code}
FileChunk f1 = new FileChunk(new Path("/tmp/", "f1"), 0, 100);
FileChunk f2 = new FileChunk(new Path("/tmp/", "f2"), 0, 200);
FileChunk f3 = new FileChunk(new Path("/tmp/", "f3"), 0, 300);
FileChunk f4 = new FileChunk(new Path("/tmp/", "f4"), 0, 100);
onDiskMapOutputs.add(f1);
onDiskMapOutputs.add(f2);
onDiskMapOutputs.add(f3);
onDiskMapOutputs.add(f4);
for(FileChunk fc : onDiskMapOutputs) {
  System.out.println(fc.toString());
}
{code}
will end up yielding only 3 values (f4 won't be inserted because the comparator
returns 0 for f1 and f4, which the TreeSet treats as equal).
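
To illustrate the point above, here is a minimal, self-contained sketch. It does not use the real FileChunk class: Chunk, TreeSetTieBreakDemo, and both comparators are hypothetical names introduced only for illustration, and breaking ties on the path is just one possible way to keep equal-length entries distinct, not necessarily what the TEZ-1733 patch does.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for FileChunk: only a path and a length.
final class Chunk {
  final String path;
  final long length;

  Chunk(String path, long length) {
    this.path = path;
    this.length = length;
  }

  @Override
  public String toString() {
    return path + ":" + length;
  }
}

final class TreeSetTieBreakDemo {
  // Compares by length only, so two chunks of equal length compare as 0
  // and the TreeSet treats the second one as a duplicate.
  static final Comparator<Chunk> BY_LENGTH_ONLY =
      Comparator.comparingLong((Chunk c) -> c.length);

  // Compares by length first, then by path, so distinct paths never compare as 0.
  static final Comparator<Chunk> BY_LENGTH_THEN_PATH =
      BY_LENGTH_ONLY.thenComparing((Chunk c) -> c.path);

  // Adds the same four chunks as in the comment above to a TreeSet
  // ordered by the given comparator.
  static TreeSet<Chunk> fill(Comparator<Chunk> cmp) {
    TreeSet<Chunk> set = new TreeSet<>(cmp);
    set.add(new Chunk("/tmp/f1", 100));
    set.add(new Chunk("/tmp/f2", 200));
    set.add(new Chunk("/tmp/f3", 300));
    set.add(new Chunk("/tmp/f4", 100)); // same length as f1
    return set;
  }

  public static void main(String[] args) {
    System.out.println(fill(BY_LENGTH_ONLY).size());      // prints 3
    System.out.println(fill(BY_LENGTH_THEN_PATH).size()); // prints 4
  }
}
```

With the tie-break in place the set still iterates in ascending length order; entries of equal length are simply ordered by path instead of being dropped.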

> TezMerger should sort FileChunks on decompressed size
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.2.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the
> data within rather than the actual file sizes.





[jira] [Commented] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196366#comment-14196366
 ] 

Gopal V commented on TEZ-1733:
--

[~pramachandran]: that would be a problem.

Can you take a closer look at this and suggest an alternative?

> TezMerger should sort FileChunks on decompressed size
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.2.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the
> data within rather than the actual file sizes.





[jira] [Assigned] (TEZ-1738) tez tfile parser for log parsing

2014-11-04 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-1738:
-

Assignee: Rajesh Balamohan

> tez tfile parser for log parsing
> 
>
> Key: TEZ-1738
> URL: https://issues.apache.org/jira/browse/TEZ-1738
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>
> It can be time consuming to download logs via "yarn logs -applicationId
>  | grep something". Mining large volumes of logs on a single node can also
> be time consuming.
> A simple Pig loader in tez-tools that parses TFiles and emits them line by
> line (as tuples of (machine, key, line)) would be useful for distributed
> processing of logs.





[jira] [Created] (TEZ-1738) tez tfile parser for log parsing

2014-11-04 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1738:
-

 Summary: tez tfile parser for log parsing
 Key: TEZ-1738
 URL: https://issues.apache.org/jira/browse/TEZ-1738
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


It can be time consuming to download logs via "yarn logs -applicationId  
| grep something". Mining large volumes of logs on a single node can also be
time consuming.
A simple Pig loader in tez-tools that parses TFiles and emits them line by
line (as tuples of (machine, key, line)) would be useful for distributed
processing of logs.
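
A rough sketch of the (machine, key, line) tuple shape described above, in plain Java. It deliberately avoids the TFile and Pig APIs, so it only shows how one log's content could be turned into per-line tuples. LogLine, LogTupleSketch, and toTuples are hypothetical names, and skipping empty lines is an assumption rather than something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type for one (machine, key, line) tuple.
final class LogLine {
  final String machine;
  final String key;
  final String line;

  LogLine(String machine, String key, String line) {
    this.machine = machine;
    this.key = key;
    this.line = line;
  }

  @Override
  public String toString() {
    return "(" + machine + ", " + key + ", " + line + ")";
  }
}

final class LogTupleSketch {
  // Splits one log's content into per-line tuples tagged with machine and key.
  // Empty lines are skipped (an assumption made for this sketch).
  static List<LogLine> toTuples(String machine, String key, String content) {
    List<LogLine> tuples = new ArrayList<>();
    for (String line : content.split("\\R")) { // \R matches any line break
      if (!line.isEmpty()) {
        tuples.add(new LogLine(machine, key, line));
      }
    }
    return tuples;
  }

  public static void main(String[] args) {
    for (LogLine t : toTuples("node1", "syslog", "first line\nsecond line\n")) {
      System.out.println(t);
    }
  }
}
```

Because every tuple carries its machine and key, the lines can be filtered or grouped independently, which is what makes the format convenient for distributed processing.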





[jira] [Updated] (TEZ-1738) tez tfile parser for log parsing

2014-11-04 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1738:
--
Attachment: TEZ-1738.1.patch

[~gopalv], [~sseth] - Please review.

> tez tfile parser for log parsing
> 
>
> Key: TEZ-1738
> URL: https://issues.apache.org/jira/browse/TEZ-1738
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1738.1.patch
>
>
> It can be time consuming to download logs via "yarn logs -applicationId
>  | grep something". Mining large volumes of logs on a single node can also
> be time consuming.
> A simple Pig loader in tez-tools that parses TFiles and emits them line by
> line (as tuples of (machine, key, line)) would be useful for distributed
> processing of logs.





[jira] [Comment Edited] (TEZ-1738) tez tfile parser for log parsing

2014-11-04 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196439#comment-14196439
 ] 

Rajesh Balamohan edited comment on TEZ-1738 at 11/4/14 5:46 PM:


[~gopalv], [~sseth] - Can you please review?


was (Author: rajesh.balamohan):
[~gopalv], [~sseth] - Please review.

> tez tfile parser for log parsing
> 
>
> Key: TEZ-1738
> URL: https://issues.apache.org/jira/browse/TEZ-1738
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1738.1.patch
>
>
> It can be time consuming to download logs via "yarn logs -applicationId
>  | grep something". Mining large volumes of logs on a single node can also
> be time consuming.
> A simple Pig loader in tez-tools that parses TFiles and emits them line by
> line (as tuples of (machine, key, line)) would be useful for distributed
> processing of logs.





[jira] [Updated] (TEZ-1738) tez tfile parser for log parsing

2014-11-04 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1738:
--
Attachment: TEZ-1738.2.patch

renaming package.

> tez tfile parser for log parsing
> 
>
> Key: TEZ-1738
> URL: https://issues.apache.org/jira/browse/TEZ-1738
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1738.1.patch, TEZ-1738.2.patch
>
>
> It can be time consuming to download logs via "yarn logs -applicationId
>  | grep something". Mining large volumes of logs on a single node can also
> be time consuming.
> A simple Pig loader in tez-tools that parses TFiles and emits them line by
> line (as tuples of (machine, key, line)) would be useful for distributed
> processing of logs.





[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1733:
--
Attachment: TEZ-1733.1.patch

- changed compareTo to order by compressed size first.


> TezMerger should sort FileChunks on decompressed size
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.1.patch, TEZ-1733.2.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the
> data within rather than the actual file sizes.





[jira] [Assigned] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran reassigned TEZ-1733:
-

Assignee: Prakash Ramachandran  (was: Gopal V)

> TezMerger should sort FileChunks on decompressed size
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.1.patch, TEZ-1733.2.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the
> data within rather than the actual file sizes.





[jira] [Created] (TEZ-1739) fix package name for FileChunk.java

2014-11-04 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created TEZ-1739:
-

 Summary: fix package name for FileChunk.java
 Key: TEZ-1739
 URL: https://issues.apache.org/jira/browse/TEZ-1739
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran


The package name for FileChunk.java is set to org.apache.hadoop.io.





[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on decompressed size

2014-11-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1733:
--
Attachment: TEZ-1733.3.patch

Apologies for the confusing naming of the patch; I have renamed it.
[~gopalv]/[~rajesh.balamohan]/[~sseth], can you have a look?
I also created a ticket to track the package naming issue: TEZ-1739.

> TezMerger should sort FileChunks on decompressed size
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.1.patch, TEZ-1733.2.patch, 
> TEZ-1733.3.patch
>
>
> MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the
> data within rather than the actual file sizes.





[jira] [Commented] (TEZ-1735) Allow setting basic info per DAG for Tez UI

2014-11-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196694#comment-14196694
 ] 

Bikas Saha commented on TEZ-1735:
-

Sorry for coming late on this. If this is similar to setHistoryText() that 
currently exists for other entities, should we keep the naming consistent 
instead of setDAGInfo()? Also, most setters also have getters in the API.

> Allow setting basic info per DAG for Tez UI
> ---
>
> Key: TEZ-1735
> URL: https://issues.apache.org/jira/browse/TEZ-1735
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1735.1.patch
>
>






[jira] [Commented] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins and fix TEZ-1494 without latency penalty

2014-11-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196803#comment-14196803
 ] 

Bikas Saha commented on TEZ-1547:
-

For the case of source being shuffle, probably no more than the previous 
versions of the patch, since the delay is in the slow-start/auto-reduce 
calculation. Thanks for the reviews.

> Make use of state change notifier in VertexManagerPlugins and fix TEZ-1494 
> without latency penalty
> --
>
> Key: TEZ-1547
> URL: https://issues.apache.org/jira/browse/TEZ-1547
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Bikas Saha
> Attachments: TEZ-1547.1.patch, TEZ-1547.10.patch, TEZ-1547.11.patch, 
> TEZ-1547.3.patch, TEZ-1547.4.patch, TEZ-1547.5.patch, TEZ-1547.6.patch, 
> TEZ-1547.7.patch, TEZ-1547.8.patch, TEZ-1547.9.patch
>
>
> Instead of the various APIs like onVertexStarted, simple notifications could 
> be sent.
> Some existing APIs could end up being deprecated.





[jira] [Commented] (TEZ-1732) Temporary mitigation for out of order scheduling

2014-11-04 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196921#comment-14196921
 ] 

Rajesh Balamohan commented on TEZ-1732:
---

{noformat}
M2 M7   M3
(sg) \  /   /
  \/   /
  \   /   /
   R3/ (b)   / (sg)
\   /   /
 (b) \ /   /
  \   /   /
M5 --/
|
R6
{noformat}
Attaching a DAG which can cause out of order execution.  This happens when 
enough data is available to M2 and M7. 

> Temporary mitigation for out of order scheduling
> 
>
> Key: TEZ-1732
> URL: https://issues.apache.org/jira/browse/TEZ-1732
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Critical
>






[jira] [Commented] (TEZ-1732) Temporary mitigation for out of order scheduling

2014-11-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196945#comment-14196945
 ] 

Siddharth Seth commented on TEZ-1732:
-

For any vertex in the middle of a graph with a delayed CONFIGURED notification,
and assuming there are no such vertices after it, the vertex with the delayed
CONFIGURED and its children will end up not scheduling tasks. Other downstream
vertices will schedule their tasks.

> Temporary mitigation for out of order scheduling
> 
>
> Key: TEZ-1732
> URL: https://issues.apache.org/jira/browse/TEZ-1732
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Critical
>






[jira] [Commented] (TEZ-1740) Support multiple DAGs in a Tez AM

2014-11-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196960#comment-14196960
 ] 

Bikas Saha commented on TEZ-1740:
-

For cases like Hive, where DAGs tend to have a V shape in resource 
requirements, pipelining can be beneficial for throughput/latency by having 
subsequent DAGs fill out the space left behind as the current DAG draws to a 
close.

> Support multiple DAGs in a Tez AM
> -
>
> Key: TEZ-1740
> URL: https://issues.apache.org/jira/browse/TEZ-1740
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>
> Currently, the Tez AM supports only one DAG submission and execution at a
> time. It could be enhanced to accept more DAGs and/or run them concurrently.
> What "concurrently" means needs to be defined: fair-share, FIFO, pipelined,
> or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1740) Support multiple DAGs in a Tez AM

2014-11-04 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1740:
---

 Summary: Support multiple DAGs in a Tez AM
 Key: TEZ-1740
 URL: https://issues.apache.org/jira/browse/TEZ-1740
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Bikas Saha


Currently, the Tez AM supports only one DAG submission and execution at a
time. It could be enhanced to accept more DAGs and/or run them concurrently.
What "concurrently" means needs to be defined: fair-share, FIFO, pipelined,
or something else.





[jira] [Commented] (TEZ-1734) Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED

2014-11-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197007#comment-14197007
 ] 

Bikas Saha commented on TEZ-1734:
-

My test case just got committed via TEZ-1547.

> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> ---
>
> Key: TEZ-1734
> URL: https://issues.apache.org/jira/browse/TEZ-1734
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, its taskNum may be -1;
> in this case, we don't need to recover its tasks.





[jira] [Updated] (TEZ-1708) Make UI part of TEZ build process

2014-11-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-1708:

Attachment: ambari-views-1.3.0-SNAPSHOT.jar
patch.sh
TEZ-1708.1.patch

- tez-ui is now part of the Tez Maven build.
- After the build, the Ambari jar and web tar can be found in tez-ui/target.

- Please copy the three files into the parent directory of tez and run
patch.sh. The script will make all the required changes.
- More info on the build can be found @ 
https://cwiki.apache.org/confluence/display/TEZ/TEZ-UI+Build

> Make UI part of TEZ build process
> -
>
> Key: TEZ-1708
> URL: https://issues.apache.org/jira/browse/TEZ-1708
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-1708.1.patch, ambari-views-1.3.0-SNAPSHOT.jar, 
> patch.sh
>
>
> - Ensure that the code base follows Maven standards.
> - On build, a web tar and Ambari jar must be created in the target folder.





[jira] [Updated] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1687:

Target Version/s: 0.5.2

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>
> Some places still use vertexId; logIdentifier is better for troubleshooting
> because it combines vertexId with the vertex name.





[jira] [Updated] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1687:

Affects Version/s: 0.5.1

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>
> Some places still use vertexId; logIdentifier is better for troubleshooting
> because it combines vertexId with the vertex name.





[jira] [Updated] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1687:

Attachment: TEZ-1687.patch

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1687.patch
>
>
> Some places still use vertexId; logIdentifier is better for troubleshooting
> because it combines vertexId with the vertex name.





[jira] [Commented] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197294#comment-14197294
 ] 

Jeff Zhang commented on TEZ-1687:
-

[~hitesh], [~bikassaha], [~sseth], could any of you help review it?

It's a simple patch that uses the vertex's logIdentifier for logging. I'd like
to include it in 0.5.2 because I think it will help with troubleshooting when
checking the logs.

 

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1687.patch
>
>
> Some places still use vertexId; logIdentifier is better for troubleshooting
> because it combines vertexId with the vertex name.





[jira] [Commented] (TEZ-1724) Refactoring - Consolidate ROOT_INPUT and INPUT

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197313#comment-14197313
 ] 

Jeff Zhang commented on TEZ-1724:
-

[~sseth], on second thought, I think ROOT_INPUT here may mean that the input
is at the root rather than coming from another vertex. In that case it makes
sense to call it ROOT_INPUT, and it would be consistent with the API
RootInputLeafOutput. What do you think?
 

> Refactoring - Consolidate ROOT_INPUT and INPUT
> --
>
> Key: TEZ-1724
> URL: https://issues.apache.org/jira/browse/TEZ-1724
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> In some places we use ROOT_INPUT, while in others we use INPUT. (Actually, an
> Input can be attached to a non-root vertex, so I think we need to
> consolidate both to INPUT.) Here are some places that need to be refactored.
> * RootInputInitializerManager -> InputInitializerManager
> * TezRootInputInitializerContextImpl -> TezInputInitializerContextImpl
> * VertexEventRootInputInitialized -> VertexEventInputInitialized
> * VertexEventRootInputFailed -> VertexEventInputFailed
> * VertexTerminationCause.ROOT_INPUT_INIT_FAILURE -> 
> VertexTerminationCause.INPUT_INIT_FAILURE.
> * EventType.ROOT_INPUT_DATA_INFORMATION_EVENT -> 
> EventType.INPUT_DATA_INFORMATION_EVENT
> * EventType.ROOT_INPUT_INITIALIZER_EVENT -> EventType.INPUT_INITIALIZER_EVENT
> * VertexEventType.V_ROOT_INPUT_INITIALIZED -> 
> VertexEventType.V_INPUT_INITIALIZED
> * VertexEventType.V_ROOT_INPUT_FAILED -> VertexEventType.V_INPUT_INIT_FAILED





[jira] [Comment Edited] (TEZ-1724) Refactoring - Consolidate ROOT_INPUT and INPUT

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197313#comment-14197313
 ] 

Jeff Zhang edited comment on TEZ-1724 at 11/5/14 2:07 AM:
--

[~sseth], on second thought, I think ROOT_INPUT here may mean that the input 
is at the root rather than coming from another vertex. It makes sense to call it 
ROOT_INPUT in that case, and that would be consistent with the API 
RootInputLeafOutput. What do you think?
 

e.g.
{code}
input1
  |
  v1   input2
    \    |
      v2
        \
         v3
{code}


was (Author: zjffdu):
[~sseth], on second thought, I think ROOT_INPUT here may mean that the input 
is at the root rather than coming from another vertex. It makes sense to call it 
ROOT_INPUT in that case, and that would be consistent with the API 
RootInputLeafOutput. What do you think?
 

> Refactoring - Consolidate ROOT_INPUT and INPUT
> --
>
> Key: TEZ-1724
> URL: https://issues.apache.org/jira/browse/TEZ-1724
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> Some places use ROOT_INPUT, while other places use INPUT. (Actually, an 
> Input can be attached to a non-root vertex, so I think we need to 
> consolidate them both to INPUT.) Here are some places that need refactoring:
> * RootInputInitializerManager -> InputInitializerManager
> * TezRootInputInitializerContextImpl -> TezInputInitializerContextImpl
> * VertexEventRootInputInitialized -> VertexEventInputInitialized
> * VertexEventRootInputFailed -> VertexEventInputFailed
> * VertexTerminationCause.ROOT_INPUT_INIT_FAILURE -> 
> VertexTerminationCause.INPUT_INIT_FAILURE.
> * EventType.ROOT_INPUT_DATA_INFORMATION_EVENT -> 
> EventType.INPUT_DATA_INFORMATION_EVENT
> * EventType.ROOT_INPUT_INITIALIZER_EVENT -> EventType.INPUT_INITIALIZER_EVENT
> * VertexEventType.V_ROOT_INPUT_INITIALIZED -> 
> VertexEventType.V_INPUT_INITIALIZED
> * VertexEventType.V_ROOT_INPUT_FAILED -> VertexEventType.V_INPUT_INIT_FAILED





[jira] [Comment Edited] (TEZ-1724) Refactoring - Consolidate ROOT_INPUT and INPUT

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197313#comment-14197313
 ] 

Jeff Zhang edited comment on TEZ-1724 at 11/5/14 2:07 AM:
--

[~sseth], on second thought, I think ROOT_INPUT here may mean that the input 
is at the root rather than coming from another vertex. It makes sense to call it 
ROOT_INPUT in that case, and that would be consistent with the API 
RootInputLeafOutput. What do you think?
 

e.g., it would make sense to call both input1 and input2 ROOT_INPUT:
{code}
input1
  |
  v1   input2
    \    |
      v2
        \
         v3
{code}
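
The diagram can also be written down as a small sketch. The class name, the map names, and the method below are made up for illustration and are not Tez APIs. It only shows that input2 is attached to v2, which is not a root vertex, even though neither input has anything upstream of it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RootInputSketch {
    // Upstream vertices of each vertex, following the diagram: v1 -> v2 -> v3.
    static final Map<String, List<String>> UPSTREAM = new LinkedHashMap<>();
    // Vertex that each external input is attached to (hypothetical model).
    static final Map<String, String> INPUT_TARGET = new LinkedHashMap<>();

    static {
        UPSTREAM.put("v1", Arrays.<String>asList());
        UPSTREAM.put("v2", Arrays.asList("v1"));
        UPSTREAM.put("v3", Arrays.asList("v2"));
        INPUT_TARGET.put("input1", "v1");
        INPUT_TARGET.put("input2", "v2");
    }

    // True when the vertex the input is attached to has no upstream vertex.
    static boolean attachedToRootVertex(String input) {
        String vertex = INPUT_TARGET.get(input);
        return UPSTREAM.get(vertex).isEmpty();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : INPUT_TARGET.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()
                + ", attached to root vertex: " + attachedToRootVertex(e.getKey()));
        }
    }
}
```

In this model input1 is attached to a root vertex and input2 is not, which is the case the naming discussion is about.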


was (Author: zjffdu):
[~sseth], on second thought, I think ROOT_INPUT here may mean that the input 
is at the root rather than coming from another vertex. It makes sense to call it 
ROOT_INPUT in that case, and that would be consistent with the API 
RootInputLeafOutput. What do you think?
 

e.g.
{code}
input1
  |
  v1   input2
    \    |
      v2
        \
         v3
{code}

> Refactoring - Consolidate ROOT_INPUT and INPUT
> --
>
> Key: TEZ-1724
> URL: https://issues.apache.org/jira/browse/TEZ-1724
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>
> Some places use ROOT_INPUT, while other places use INPUT. (Actually, an 
> Input can be attached to a non-root vertex, so I think we need to 
> consolidate them both to INPUT.) Here are some places that need refactoring:
> * RootInputInitializerManager -> InputInitializerManager
> * TezRootInputInitializerContextImpl -> TezInputInitializerContextImpl
> * VertexEventRootInputInitialized -> VertexEventInputInitialized
> * VertexEventRootInputFailed -> VertexEventInputFailed
> * VertexTerminationCause.ROOT_INPUT_INIT_FAILURE -> 
> VertexTerminationCause.INPUT_INIT_FAILURE.
> * EventType.ROOT_INPUT_DATA_INFORMATION_EVENT -> 
> EventType.INPUT_DATA_INFORMATION_EVENT
> * EventType.ROOT_INPUT_INITIALIZER_EVENT -> EventType.INPUT_INITIALIZER_EVENT
> * VertexEventType.V_ROOT_INPUT_INITIALIZED -> 
> VertexEventType.V_INPUT_INITIALIZED
> * VertexEventType.V_ROOT_INPUT_FAILED -> VertexEventType.V_INPUT_INIT_FAILED





[jira] [Updated] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1687:

Target Version/s:   (was: 0.5.2)

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1687.patch
>
>
> Some places still use vertexId; logIdentifier is better for 
> troubleshooting because it combines vertexId with the vertex name.
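
For illustration only, a minimal sketch of why a combined identifier reads better in logs. The class `VertexLogIdSketch`, the helper `logIdentifier`, its output format, and the sample id and name are all hypothetical; they are not taken from the Tez code base or from the attached patch, and the format Tez actually uses may differ.

```java
public class VertexLogIdSketch {
    // Hypothetical helper: combines the vertex name and the vertex id into one log label.
    static String logIdentifier(String vertexName, String vertexId) {
        return vertexName + " [" + vertexId + "]";
    }

    public static void main(String[] args) {
        String vertexId = "vertex_0000000000000_0001_1_00"; // made-up id
        String vertexName = "Tokenizer";                     // made-up name

        // Logging only the id forces the reader to look up which vertex it is.
        System.out.println("Vertex finished: " + vertexId);
        // Logging the combined identifier names the vertex directly.
        System.out.println("Vertex finished: " + logIdentifier(vertexName, vertexId));
    }
}
```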





[jira] [Commented] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197405#comment-14197405
 ] 

Bikas Saha commented on TEZ-1687:
-

I am sorry, I just cut RC0 for 0.5.2. Would it be OK to get this into 0.5.3? 
The change looks fine.

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1687.patch
>
>
> Some places still use vertexId; logIdentifier is better for 
> troubleshooting because it combines vertexId with the vertex name.





[jira] [Commented] (TEZ-1687) Use logIdentifier of Vertex for logging

2014-11-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197407#comment-14197407
 ] 

Jeff Zhang commented on TEZ-1687:
-

OK, I will put it in 0.5.3

> Use logIdentifier of Vertex for logging
> ---
>
> Key: TEZ-1687
> URL: https://issues.apache.org/jira/browse/TEZ-1687
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.1
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1687.patch
>
>
> Some places still use vertexId; logIdentifier is better for 
> troubleshooting because it combines vertexId with the vertex name.





[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on size when merging

2014-11-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1733:
-
Summary: TezMerger should sort FileChunks on size when merging  (was: 
TezMerger should sort FileChunks on decompressed size)

> TezMerger should sort FileChunks on size when merging
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.1.patch, TEZ-1733.2.patch, 
> TEZ-1733.3.patch
>
>
>  MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts by the size of the 
> data within rather than by the actual file sizes.
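
A rough sketch of what a size-ordered TreeSet could look like. The `Chunk` class, its `dataSize` field, and `mergeOrder` are hypothetical stand-ins, not the real Tez `FileChunk` or `TezMerger` code, and this is not the attached patch. The tie-breaker on the path is there because a TreeSet treats elements that compare as equal as duplicates and would silently drop one of two chunks with the same size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class ChunkSortSketch {
    // Hypothetical stand-in for a merge segment; not the real Tez FileChunk class.
    static final class Chunk {
        final String path;
        final long dataSize; // size of the data inside the chunk (assumed field)

        Chunk(String path, long dataSize) {
            this.path = path;
            this.dataSize = dataSize;
        }
    }

    // Order by data size first, then by path so equal-sized chunks are both kept.
    static final Comparator<Chunk> BY_DATA_SIZE =
        Comparator.comparingLong((Chunk c) -> c.dataSize)
                  .thenComparing(c -> c.path);

    // Returns the chunk paths in the order a smallest-first merge would visit them.
    static List<String> mergeOrder(List<Chunk> chunks) {
        TreeSet<Chunk> sorted = new TreeSet<>(BY_DATA_SIZE);
        sorted.addAll(chunks);
        List<String> order = new ArrayList<>();
        for (Chunk c : sorted) {
            order.add(c.path);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("spill_2", 900L));
        chunks.add(new Chunk("spill_0", 100L));
        chunks.add(new Chunk("spill_1", 100L));
        System.out.println(mergeOrder(chunks)); // prints [spill_0, spill_1, spill_2]
    }
}
```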





[jira] [Updated] (TEZ-1733) TezMerger should sort FileChunks on size when merging

2014-11-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1733:
-
Description: 
 MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
decompressed size, to cut down on CPU and IO costs.

TezMerger needs an equivalent sorted TreeSet which sorts the data by size.


  was:
 MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
decompressed size, to cut down on CPU and IO costs.

TezMerger needs an equivalent sorted TreeSet which sorts by the size of the 
data within rather than by the actual file sizes.



> TezMerger should sort FileChunks on size when merging
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.1.patch, TEZ-1733.2.patch, 
> TEZ-1733.3.patch
>
>
>  MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts the data by size.





[jira] [Commented] (TEZ-1733) TezMerger should sort FileChunks on size when merging

2014-11-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197466#comment-14197466
 ] 

Gopal V commented on TEZ-1733:
--

This is sufficient & low-risk enough to satisfy the merger IO issues.

LGTM - +1.

> TezMerger should sort FileChunks on size when merging
> -
>
> Key: TEZ-1733
> URL: https://issues.apache.org/jira/browse/TEZ-1733
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Gopal V
>Assignee: Prakash Ramachandran
>Priority: Critical
> Attachments: TEZ-1733.1.patch, TEZ-1733.1.patch, TEZ-1733.2.patch, 
> TEZ-1733.3.patch
>
>
>  MAPREDUCE-3685 fixed the Merger sort order for file chunks to use the 
> decompressed size, to cut down on CPU and IO costs.
> TezMerger needs an equivalent sorted TreeSet which sorts the data by size.


