[jira] [Reopened] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reopened TEZ-1337:
-


[~airbots] - apologies for not updating this jira after TEZ-1393. The earlier 
comments aren't valid anymore. Given the number of directories involved, as 
pointed out in TEZ-1393 (and comments on this), this was never a very 
well understood issue.

However, there's still utility in letting users configure a directory which 
will be used for what would otherwise be the YARN local-dirs and log-dirs. The 
comment about this directory being deleted during staging / task cleanup still 
holds.
Essentially, allow users to set up a directory that they control, into which 
tasks write their intermediate output and into which logs are generated. If 
this isn't set up, the staging directory can always be used for 
the same.
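
A minimal sketch of the fallback described above, assuming a plain key/value configuration. The class name and both property keys are hypothetical and are not actual Tez configuration keys; the point is only the order of resolution: user-configured directory first, staging directory otherwise.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Sketch only: the property names are hypothetical, not Tez configuration keys. */
final class LocalModeDirs {
    // Hypothetical key for a user-controlled scratch/log directory.
    static final String LOCAL_WORK_DIR_KEY = "example.local-mode.work-dir";
    // Hypothetical key for the staging directory.
    static final String STAGING_DIR_KEY = "example.staging-dir";

    /** Returns the user-configured directory, or the staging directory if none is set. */
    static Path resolveWorkDir(Map<String, String> conf) {
        String configured = conf.get(LOCAL_WORK_DIR_KEY);
        if (configured != null && !configured.trim().isEmpty()) {
            return Paths.get(configured.trim());
        }
        String staging = conf.get(STAGING_DIR_KEY);
        if (staging == null || staging.trim().isEmpty()) {
            throw new IllegalStateException("staging directory is not configured");
        }
        return Paths.get(staging.trim());
    }
}
```

Since the directory may be deleted during staging / task cleanup, a caller of something like this should not keep anything in it that has to outlive the session.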

> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
> Fix For: 0.5.0
>
> Attachments: TEZ-1337.patch
>
>
> staging dir is being used to write intermediate data. At some point, it may 
> be worthwhile to configure a separate work area. IAC, these should be cleaned 
> up  - at least the intermediate data after a local session executes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1072) Consolidate monitoring APIs in DAGClient

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096616#comment-14096616
 ] 

Siddharth Seth commented on TEZ-1072:
-

+1. Looks good.

> Consolidate monitoring APIs in DAGClient
> 
>
> Key: TEZ-1072
> URL: https://issues.apache.org/jira/browse/TEZ-1072
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: api
> Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch, TEZ-1072-v1.patch
>
>
> Rename waitForCompletionWithAllStatusUpdates - was this meant to be 
> waitForCompletionWithAllVertexUpdates
> Reduce the number of methods exposed - waitForCompletion, 
> waitForCompletionWithStatusUpdates(@Nullable Set vertices,
>   @Nullable Set statusGetOpts), 
> waitForCompletionWithAllStatusUpdates(@Nullable Set 
> statusGetOpts)





[jira] [Updated] (TEZ-1416) tez-api project javadoc/annotations review and clean up.

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1416:


Attachment: TEZ-1416.1.patch.review

Patch file with comments / questions marked with XXX.

Also, 

TezCounters, I believe needs some work. I think we should mark these as 
Unstable. JobCounters as @Private - and potentially move to MapReduce.

Misc in TezConfiguration - all the constants (TEZ_PREFIX, etc) should be private
The following should likely be private.
TEZ_AM_CANCEL_DELEGATION_TOKEN - Private
TEZ_AM_COUNTERS_MAX_KEYS - private
TEZ_AM_PLAN_REMOTE_PATH
TEZ_AM_INLINE_TASK_EXECUTION_ENABLED

The following should be private. Also in TezConstants instead of 
TezConfiguration - is there a separate jira for that ? There's more than just 
the ones listed below - so a separate jira would be better.
{code}
  public static final String TEZ_PB_BINARY_CONF_NAME = "tez-conf.pb";
  public static final String TEZ_PB_PLAN_BINARY_NAME = "tez-dag.pb";
  public static final String TEZ_PB_PLAN_TEXT_NAME = "tez-dag.pb.txt";
  public static final String TEZ_CONTAINER_LOG4J_PROPERTIES_FILE =
      "tez-container-log4j.properties";
  public static final String TEZ_CONTAINER_LOGGER_NAME = "CLA";
  public static final String TEZ_ROOT_LOGGER_NAME = "tez.root.logger";
  public static final String TEZ_CONTAINER_LOG_FILE_NAME = "syslog";
  public static final String TEZ_CONTAINER_ERR_FILE_NAME = "stderr";
  public static final String TEZ_CONTAINER_OUT_FILE_NAME = "stdout";
{code}

TEZ_AM_GROUPING* - should these be renamed to just TEZ_GROUPING_*

CompositeDataMovementEvent, DataMovementEvent, some others missing 
public/private annotations.

InputInitializer, InputInitializerContext need to move out of runtime.api

Annotations or comments on the proto files.

Some of this isn't related to this jira, and should be a follow up - otherwise 
this patch grows too big.

> tez-api project javadoc/annotations review and clean up.
> 
>
> Key: TEZ-1416
> URL: https://issues.apache.org/jira/browse/TEZ-1416
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1416.1.patch, TEZ-1416.1.patch.review
>
>






[jira] [Updated] (TEZ-1400) Reducers stuck when enabling auto-reduce parallelism (MRR case)

2014-08-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1400:
--

Attachment: TEZ-1400.2.patch

Yes, "key":"tez.am.shuffle-vertex-manager.enable.auto-parallel","value":"true" 
is explicitly enabled by Hive only for certain vertices. Further debugging 
revealed that TezConfiguration picked up the wrong "tez-site.xml" from the 
classpath, which had the min/max settings as 0.0. The TezConfiguration class 
gets initialized from HiveSplitGenerator (when it tries to compute the waves). 
Picking up the wrong tez-site.xml caused the issue, and from that point onwards 
every configuration would end up loading tez-site.xml with the wrong values. 
The attached patch makes two changes:
1. TezConfiguration.java should not load tez-site.xml during class 
initialization. It should load it via the constructor.
2. If the payload is null, VertexManager gets the DAG conf instead of amConf.
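
To illustrate why the timing in point 1 matters, here is a small sketch under stated assumptions. It does not use the real TezConfiguration or Hadoop classes; `ConfigLoadTiming`, `EagerConf`, `LazyConf` and `siteResource` are hypothetical stand-ins. It only shows that a resource read during class initialization is pinned to whatever was visible the first time the class was touched, while a resource read in the constructor reflects what is visible when each instance is created.

```java
import java.util.Properties;
import java.util.function.Supplier;

/** Sketch only: hypothetical classes, not the actual TezConfiguration code. */
final class ConfigLoadTiming {
    // Stands in for "whichever site file is first on the classpath right now".
    static Supplier<Properties> siteResource = Properties::new;

    /** Eager variant: the site resource is read once, when the class is initialized. */
    static final class EagerConf {
        private static final Properties SITE = siteResource.get();
        String get(String key) { return SITE.getProperty(key); }
    }

    /** Constructor variant: each instance reads the site resource when it is created. */
    static final class LazyConf {
        private final Properties site;
        LazyConf() { this.site = siteResource.get(); }
        String get(String key) { return site.getProperty(key); }
    }

    /** Small helper that builds a one-entry Properties object. */
    static Properties props(String key, String value) {
        Properties p = new Properties();
        p.setProperty(key, value);
        return p;
    }
}
```

In this sketch, if `EagerConf` is first touched while `siteResource` points at a file with bad values, every later `EagerConf` instance keeps seeing those values, which mirrors the symptom described above.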

> Reducers stuck when enabling auto-reduce parallelism (MRR case)
> ---
>
> Key: TEZ-1400
> URL: https://issues.apache.org/jira/browse/TEZ-1400
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: TEZ-1400.1.patch, TEZ-1400.2.patch, dag.dot
>
>
> In M -> R1 -> R2 case, if R1 is optimized by auto-parallelism R2 gets stuck 
> waiting for events.
> e.g
> Map 1: 0/1  Map 2: -/-  Map 5: 0/1  Map 6: 0/1  Map 7: 0/1
>   Reducer 3: 0/23 Reducer 4: 0/1
> ...
> ...
> Map 1: 1/1  Map 2: 148(+13)/161 Map 5: 1/1  Map 6: 1/1  Map 
> 7: 1/1  Reducer 3: 0(+3)/3  Reducer 4: 0(+1)/1  <== Auto reduce 
> parallelism kicks in
> ..
> Map 1: 1/1  Map 2: 161/161  Map 5: 1/1  Map 6: 1/1  Map 7: 1/1
>   Reducer 3: 3/3  Reducer 4: 0(+1)/1
> Job is stuck waiting for events in Reducer 4.
>  [fetcher [Reducer_3] #23] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3 
> of 23 at 0.02 MB/s) <=== *Waiting for 20 more partitions, even though 
> Reducer3 has been optimized to use 3 reducers





[jira] [Commented] (TEZ-1416) tez-api project javadoc/annotations review and clean up.

2014-08-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096574#comment-14096574
 ] 

Hitesh Shah commented on TEZ-1416:
--

Mostly looks good. Should likely wait for [~sseth] to see if he has any 
comments on which classes should/should not be marked stable. 

Also, I saw TEZ_AM_CONTAINER_REUSE_LOCALITY_DELAY_ALLOCATION_MILLIS_DEFAULT 
getting changed to 250 - is that intentional? Does not seem right to set such a 
low value for the general case? 



> tez-api project javadoc/annotations review and clean up.
> 
>
> Key: TEZ-1416
> URL: https://issues.apache.org/jira/browse/TEZ-1416
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1416.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1411) Address initial feedback on swimlanes

2014-08-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096528#comment-14096528
 ] 

Gopal V commented on TEZ-1411:
--

#1 - this can only be done if everyone runs ATS in their cluster.

> Address initial feedback on swimlanes
> -
>
> Key: TEZ-1411
> URL: https://issues.apache.org/jira/browse/TEZ-1411
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Gopal V
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Few other good to have things
> 1) A wrapper script that takes care of the command chaining with a single 
> appId as input from the user.
> 2) Legend in the README or in the svg itself about what is what.





[jira] [Updated] (TEZ-1418) Provide Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH

2014-08-13 Thread Subroto Sanyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subroto Sanyal updated TEZ-1418:


Summary: Provide Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH  
(was: Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH)

> Provide Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH
> ---
>
> Key: TEZ-1418
> URL: https://issues.apache.org/jira/browse/TEZ-1418
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Subroto Sanyal
>Priority: Blocker
> Fix For: 0.5.0
>
>
> As part of the fix for the issue TEZ-1127 two new configurations have been 
> introduced:
> # _TEZ_AM_LAUNCH_ENV_
> # _TEZ_TASK_LAUNCH_
> Ideally these properties should be configured with default value of:
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native"
> as in the case for _mapreduce.admin.user.env_
> The default value for these properties is set to "" (empty string).
> Now the user has to explicitly set these values from the application code to 
> use the native libs (like for compression).
> From Hitesh:
> {quote}As commented on TEZ-1127, it is a question as to what the default 
> should be - whether HADOOP_COMMON_HOME or HADOOP_PREFIX and to some extent, 
> it needs to handle Windows deployments too.{quote}
>  





[jira] [Created] (TEZ-1418) Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH

2014-08-13 Thread Subroto Sanyal (JIRA)
Subroto Sanyal created TEZ-1418:
---

 Summary: Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH
 Key: TEZ-1418
 URL: https://issues.apache.org/jira/browse/TEZ-1418
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Subroto Sanyal
Priority: Blocker
 Fix For: 0.5.0


As part of the fix for the issue TEZ-1127 two new configurations have been 
introduced:
# _TEZ_AM_LAUNCH_ENV_
# _TEZ_TASK_LAUNCH_

Ideally these properties should be configured with default value of:
"LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native"

as in the case for _mapreduce.admin.user.env_

The default value for these properties is set to "" (empty string).
Now the user has to explicitly set these values from the application code to 
use the native libs (like for compression).

From Hitesh:
{quote}As commented on TEZ-1127, it is a question as to what the default should 
be - whether HADOOP_COMMON_HOME or HADOOP_PREFIX and to some extent, it needs 
to handle Windows deployments too.{quote}
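
A rough sketch of what "use a default when the property is empty" could look like. The default string is the one proposed above; the class and method names are hypothetical, and the comma-separated KEY=VALUE format is an assumption modelled on how such env properties are commonly written. Variable references like $HADOOP_COMMON_HOME are deliberately left unexpanded, and the open question about HADOOP_PREFIX and Windows is not addressed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: helper names are hypothetical; the default string is the one proposed above. */
final class LaunchEnvDefaults {
    static final String PROPOSED_DEFAULT = "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native";

    /** Uses the configured value if non-empty, otherwise the proposed default. */
    static String effectiveEnv(String configured) {
        return (configured == null || configured.trim().isEmpty()) ? PROPOSED_DEFAULT : configured;
    }

    /** Parses "K1=V1,K2=V2" (assumed format) into an ordered map; values are not expanded. */
    static Map<String, String> parse(String env) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String entry : env.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected KEY=VALUE, got: " + trimmed);
            }
            out.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
        return out;
    }
}
```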
 





[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096517#comment-14096517
 ] 

Chen He commented on TEZ-1337:
--

I am confused about some comments in previous discussion: 

On 01/Aug/14
{quote}
I think we should make the following changes.
Introduce a local dir for local mode. This will be the base directory for the 
filesystem that is being used for local mode.
The staging directory, however, must continue to work since the Client writes 
into this and the AM reads contents back.
*Fallback to using the base staging-dir if this property is not 
configured.*
The root path would end up being the root-staging dir, and not the application 
specific staging directory, since that is likely to be deleted.
{quote}

On 08/Aug/14
{quote}
The local directory for AM/tasks: This is the scratch area that tasks rely 
upon. This is what would end up being configurable via this jira, 
*instead of relying on the staging directory.*
{quote}

If this JIRA has already been implemented by TEZ-1393, I will close it.


> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
> Attachments: TEZ-1337.patch
>
>
> staging dir is being used to write intermediate data. At some point, it may 
> be worthwhile to configure a separate work area. IAC, these should be cleaned 
> up  - at least the intermediate data after a local session executes.





[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096516#comment-14096516
 ] 

Tsuyoshi OZAWA commented on TEZ-1390:
-

Thanks for sharing, Jonathan!

[~bikassaha], I attached a first patch that makes UserPayload accept 
ByteBuffer and makes getPayload return ByteBuffer. Could you take a look? 

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: TEZ-1390.1.patch, pig.payload.txt
>
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.





[jira] [Updated] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated TEZ-1390:


Attachment: TEZ-1390.1.patch

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: TEZ-1390.1.patch, pig.payload.txt
>
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.





[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096513#comment-14096513
 ] 

Tsuyoshi OZAWA commented on TEZ-1390:
-

ByteString#asReadOnlyByteBuffer offers the feature to create a read only 
ByteBuffer without copying.
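
For reference, a dependency-free sketch of the same idea using only java.nio. It does not call protobuf's ByteString; `PayloadBuffers` and its methods are hypothetical names. It shows a read-only ByteBuffer view over an existing byte[] without copying, plus the ByteBuffer to byte[] translation that the API layer would need while the internals still use byte[].

```java
import java.nio.ByteBuffer;

/** Sketch only: uses java.nio in place of protobuf's ByteString to stay dependency-free. */
final class PayloadBuffers {
    /** Wraps the array without copying and hands out a read-only view. */
    static ByteBuffer readOnlyView(byte[] payload) {
        return ByteBuffer.wrap(payload).asReadOnlyBuffer();
    }

    /** Copies the remaining bytes out of a buffer without disturbing its position. */
    static byte[] toBytes(ByteBuffer buffer) {
        ByteBuffer copy = buffer.duplicate(); // independent position/limit, same content
        byte[] out = new byte[copy.remaining()];
        copy.get(out);
        return out;
    }
}
```

Note that the view shares storage with the wrapped array, so a caller that later mutates the array will be visible through the buffer; `toBytes` is the one place where a copy is made.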

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: pig.payload.txt
>
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.





[jira] [Commented] (TEZ-1072) Consolidate monitoring APIs in DAGClient

2014-08-13 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096511#comment-14096511
 ] 

Jonathan Eagles commented on TEZ-1072:
--

[~sseth], [~bikassaha], Incorporating the feedback, much nicer with only 2 wait 
functions.

- removed waitForCompletionWithStatusUpdates
- renamed waitForCompletionWithAllStatusUpdates to 
waitForCompletionWithAllStatusUpdates
- made sure waitForCompletion does not log updates

> Consolidate monitoring APIs in DAGClient
> 
>
> Key: TEZ-1072
> URL: https://issues.apache.org/jira/browse/TEZ-1072
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: api
> Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch, TEZ-1072-v1.patch
>
>
> Rename waitForCompletionWithAllStatusUpdates - was this meant to be 
> waitForCompletionWithAllVertexUpdates
> Reduce the number of methods exposed - waitForCompletion, 
> waitForCompletionWithStatusUpdates(@Nullable Set vertices,
>   @Nullable Set statusGetOpts), 
> waitForCompletionWithAllStatusUpdates(@Nullable Set 
> statusGetOpts)





[jira] [Updated] (TEZ-1072) Consolidate monitoring APIs in DAGClient

2014-08-13 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1072:
-

Attachment: TEZ-1072-v1.patch

> Consolidate monitoring APIs in DAGClient
> 
>
> Key: TEZ-1072
> URL: https://issues.apache.org/jira/browse/TEZ-1072
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: api
> Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch, TEZ-1072-v1.patch
>
>
> Rename waitForCompletionWithAllStatusUpdates - was this meant to be 
> waitForCompletionWithAllVertexUpdates
> Reduce the number of methods exposed - waitForCompletion, 
> waitForCompletionWithStatusUpdates(@Nullable Set vertices,
>   @Nullable Set statusGetOpts), 
> waitForCompletionWithAllStatusUpdates(@Nullable Set 
> statusGetOpts)





[jira] [Updated] (TEZ-1409) Change MRInputConfigurer, MROutputConfigurer to accept specific classes, instead of a generic class

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1409:


Attachment: TEZ-1409.1.txt

Patch makes the following changes.

- InputFormat and path nullable
- InputFormat and useNewApi properties are not updated if an inputFormat is not 
specified.
- Renames groupSplitsInAM to groupSplits - that's more generic.
- Removes new JobConf(conf) - was this required for any specific reason, 
other than potentially not changing the original conf?
- Removed the call to MRHelpers.setApi*. That's being taken care of anyway.

Wanted to change createConfigurer(Configuration conf, @Nullable Class 
inputFormat)
to createConfigurer(Configuration conf, @Nullable Class inputFormat) and createConfigurer(Configuration conf, 
@Nullable Class inputFormat) - but that 
isn't an option since they have the same runtime signature. Left that as is.

Similarly for the Outputs.
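
A small sketch of the erasure problem mentioned above, assuming the two desired overloads differed only in the type parameter of the Class argument. All types here are hypothetical stand-ins (`OldApiFormat` for a mapred-style InputFormat, `NewApiFormat` for a mapreduce-style one), not Tez or Hadoop classes. One possible way around the clash, shown here, is to give the two methods distinct names; this is not what the patch does, since the patch leaves the single method as is.

```java
/** Sketch only: all types here are hypothetical stand-ins, not Tez or Hadoop classes. */
final class ErasureSketch {
    interface OldApiFormat {}   // stands in for a mapred-style InputFormat
    interface NewApiFormat {}   // stands in for a mapreduce-style InputFormat

    static final class OldTextFormat implements OldApiFormat {}
    static final class NewTextFormat implements NewApiFormat {}

    // Two overloads with the SAME name taking Class<? extends OldApiFormat> and
    // Class<? extends NewApiFormat> would both erase to (Class), so the compiler
    // rejects them as a name clash. Distinct names keep the compile-time check.
    static String createOldApiConfigurer(Class<? extends OldApiFormat> format) {
        return "old:" + (format == null ? "unset" : format.getSimpleName());
    }

    static String createNewApiConfigurer(Class<? extends NewApiFormat> format) {
        return "new:" + (format == null ? "unset" : format.getSimpleName());
    }
}
```

With this shape, passing `NewTextFormat.class` to `createOldApiConfigurer` fails at compile time, which is the kind of error the jira is after, and `null` is still accepted for the case where the format is already set in the configuration.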

[~bikassaha], please review.

> Change MRInputConfigurer, MROutputConfigurer to accept specific classes, 
> instead of a generic class
> ---
>
> Key: TEZ-1409
> URL: https://issues.apache.org/jira/browse/TEZ-1409
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1409.1.txt
>
>
> Separate methods to accept either mapred or mapreduce InputFormat. Similarly 
> for the Output. 
> This generates compile time errors while using these methods.
> Ran into this on the first iteration of TEZ-1407, where I had set the wrong 
> class (a committer instead of an OF).
> Also, ideally these should be @Nullable - in case the user has already set 
> them up correctly in the Configuration.





[jira] [Commented] (TEZ-1334) Annotate all non public classes in tez-runtime-library with @private

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096493#comment-14096493
 ] 

Bikas Saha commented on TEZ-1334:
-

I am guessing this covers MRInput/MROutput that are in tez-mapreduce also.


> Annotate all non public classes in tez-runtime-library with @private
> 
>
> Key: TEZ-1334
> URL: https://issues.apache.org/jira/browse/TEZ-1334
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
>
> This prevents javadoc from being generated.
> Alternative would be to mark classes explicitly public using annotation.





[jira] [Commented] (TEZ-1416) tez-api project javadoc/annotations review and clean up.

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096492#comment-14096492
 ] 

Bikas Saha commented on TEZ-1416:
-

[~sseth] [~hitesh] please review.

> tez-api project javadoc/annotations review and clean up.
> 
>
> Key: TEZ-1416
> URL: https://issues.apache.org/jira/browse/TEZ-1416
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1416.1.patch
>
>






[jira] [Created] (TEZ-1417) Rename *Configurer

2014-08-13 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1417:
---

 Summary: Rename *Configurer
 Key: TEZ-1417
 URL: https://issues.apache.org/jira/browse/TEZ-1417
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Priority: Blocker


From offline feedback from [~bikassaha], [~acmurthy] and [~hagleitn] - this 
needs to be renamed.
Something like Configurator as Bikas had earlier suggested, or ConfigBuilder 
which I like more.

This can be done as a last refactor before 0.5 since it's very disruptive to 
patches in progress.





[jira] [Updated] (TEZ-1416) tez-api project javadoc/annotations review and clean up.

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1416:


Attachment: TEZ-1416.1.patch

Attached patch cleans up the javadoc.
Adds public annotation for all public classes. Private for the rest.
There are a couple of TODOs that I will address either in this or a follow up.
ObjectRegistry and LogUtils package moved.
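
As an illustration of the "Public for public classes, Private for the rest" rule, here is a self-contained sketch of a check that flags classes carrying neither annotation. The `Public` and `Private` annotations below are local stand-ins defined only for this example, not the Hadoop classification annotations, and `AudienceCheck` is a hypothetical helper that is not part of the patch.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: local stand-in annotations, not the Hadoop classification annotations. */
final class AudienceCheck {
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Private {}

    @Public static final class ClientFacing {}
    @Private static final class InternalHelper {}
    static final class Unmarked {}

    /** Returns the simple names of classes that carry neither audience annotation. */
    static List<String> missingAudience(Class<?>... classes) {
        List<String> missing = new ArrayList<>();
        for (Class<?> c : classes) {
            if (!c.isAnnotationPresent(Public.class) && !c.isAnnotationPresent(Private.class)) {
                missing.add(c.getSimpleName());
            }
        }
        return missing;
    }
}
```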

> tez-api project javadoc/annotations review and clean up.
> 
>
> Key: TEZ-1416
> URL: https://issues.apache.org/jira/browse/TEZ-1416
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1416.1.patch
>
>






[jira] [Commented] (TEZ-1334) Annotate all non public classes in tez-runtime-library with @private

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096488#comment-14096488
 ] 

Bikas Saha commented on TEZ-1334:
-

TEZ-1416 covers API. This can cover runtime-library.

> Annotate all non public classes in tez-runtime-library with @private
> 
>
> Key: TEZ-1334
> URL: https://issues.apache.org/jira/browse/TEZ-1334
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
>
> This prevents javadoc from being generated.
> Alternative would be to mark classes explicitly public using annotation.





[jira] [Commented] (TEZ-1330) Create a dist target which contains required jars

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096487#comment-14096487
 ] 

Bikas Saha commented on TEZ-1330:
-

minimal - since this is the minimal set of jars needed to run tez.

> Create a dist target which contains required jars
> -
>
> Key: TEZ-1330
> URL: https://issues.apache.org/jira/browse/TEZ-1330
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1330.1.wip.txt
>
>
> Comment from [~rohini] on TEZ-1300
> bq. The tez-dist now only contains tez-0.5.0-SNAPSHOT.tar.gz. Can you also 
> retain the directory structure with the individual jars? The pig 
> client needs the individual jars in the classpath. It is convenient to 
> compile tez and point to the tez-dist directory for e2e testing. Without that 
> we will have to do the extra step of untarring it, which is an inconvenience 
> during development.





[jira] [Updated] (TEZ-1334) Annotate all non public classes in tez-runtime-library with @private

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1334:


Summary: Annotate all non public classes in tez-runtime-library with 
@private  (was: Annotate all non public classes in tez-api/tez-runtime-library 
with @private)

> Annotate all non public classes in tez-runtime-library with @private
> 
>
> Key: TEZ-1334
> URL: https://issues.apache.org/jira/browse/TEZ-1334
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
>
> This prevents javadoc from being generated.
> Alternative would be to mark classes explicitly public using annotation.





[jira] [Comment Edited] (TEZ-684) Uber/Local modes for Tez

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096467#comment-14096467
 ] 

Siddharth Seth edited comment on TEZ-684 at 8/14/14 2:28 AM:
-

LocalResources are not handled in local mode. I'm not sure if there's a jira 
for this or not. It's a fairly complicated, and far-reaching change to get 
something like this to work. Will require the concept of a working directory - 
which would then have to be explicitly used.
In most cases, this isn't a problem since required libraries are already part 
of the client / AM. For custom resources though - this is problematic. That's 
the main reason SplitsOnClient_DistCache does not function.


was (Author: sseth):
LocalResources are not handled in local mode. I'm not sure if there's a jira 
for this or not. It's a fairly complicated, and far-reaching change to get 
something like this to work. Will require the concept of a working directory - 
which would then have to be explicitly used.

> Uber/Local modes for Tez
> 
>
> Key: TEZ-684
> URL: https://issues.apache.org/jira/browse/TEZ-684
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Chen He
>Assignee: Chen He
> Attachments: TEZ-684-2014-7-21.patch, TEZ-684.patch, TEZ-684.patch, 
> TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, 
> TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, 
> TezUberModeDesignDraft.png
>
>
> Similar to MapReduce Uber-mode in YARN, we plan to create an Uber-mode for 
> Tez. It runs all tasks locally in one process.
> Our target is to start DAGAppMaster in a local JVM and let it run all tasks in 
> one process. 
> Here is my design: 
> Once the user submits a DAG, Tez starts an instance of DAGAppMaster. This 
> DAGAppMaster will check TezConfiguration before instantiating 
> ContainerLauncher. If "is_Uber" is true, DAGAppMaster creates a 
> LocalContainerLauncher. LocalTaskScheduler and LocalTaskSchedulerEventHandler 
> will call LocalContainerLauncher to run all tasks one by one in a single JVM. 
> Communications between ResourceManager and local classes (DAGAppMaster, 
> LocalContainerLauncher, LocalTaskScheduler, and 
> LocalTaskSchedulerEventHandler) are muted.





[jira] [Comment Edited] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096462#comment-14096462
 ] 

Siddharth Seth edited comment on TEZ-1337 at 8/14/14 2:29 AM:
--

Not a blocker after TEZ-1393. The main issue users were facing was the user.dir 
being reset - and hence not having access to data which was generated 
previously.


was (Author: sseth):
Not a blocker after TEZ-1397. The main issue users were facing was the user.dir 
being reset - and hence not having access to data which was generated 
previously.

> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
> Attachments: TEZ-1337.patch
>
>
> staging dir is being used to write intermediate data. At some point, it may 
> be worthwhile to configure a separate work area. IAC, these should be cleaned 
> up  - at least the intermediate data after a local session executes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-684) Uber/Local modes for Tez

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096467#comment-14096467
 ] 

Siddharth Seth commented on TEZ-684:


LocalResources are not handled in local mode. I'm not sure if there's a jira 
for this or not. It's a fairly complicated, and far-reaching change to get 
something like this to work. Will require the concept of a working directory - 
which would then have to be explicitly used.

> Uber/Local modes for Tez
> 
>
> Key: TEZ-684
> URL: https://issues.apache.org/jira/browse/TEZ-684
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Chen He
>Assignee: Chen He
> Attachments: TEZ-684-2014-7-21.patch, TEZ-684.patch, TEZ-684.patch, 
> TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, 
> TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, 
> TezUberModeDesignDraft.png
>
>
> Similar to MapReduce Uber-mode in YARN, we plan to create an Uber-mode for 
> Tez. It runs all tasks locally in one process.
> Our target is to start DAGAppMaster in a local JVM and let it run all tasks in 
> one process. 
> Here is my design: 
> Once the user submits a DAG, Tez starts an instance of DAGAppMaster. This 
> DAGAppMaster will check TezConfiguration before instantiating 
> ContainerLauncher. If "is_Uber" is true, DAGAppMaster creates a 
> LocalContainerLauncher. LocalTaskScheduler and LocalTaskSchedulerEventHandler 
> will call LocalContainerLauncher to run all tasks one by one in a single JVM. 
> Communications between ResourceManager and local classes (DAGAppMaster, 
> LocalContainerLauncher, LocalTaskScheduler, and 
> LocalTaskSchedulerEventHandler) are muted.
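
A rough sketch of the launcher selection described in the design above, for
illustration only. The Sketch* types are hypothetical stand-ins, not the real
Tez classes, and a plain Map stands in for TezConfiguration; only the "is_Uber"
key is taken from the description.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a container launcher; not the real Tez interface.
interface SketchLauncher {
    void launch(Runnable task);
}

// Local variant: runs each task inline, one by one, in the calling JVM.
final class SketchLocalLauncher implements SketchLauncher {
    @Override
    public void launch(Runnable task) {
        task.run();
    }
}

// Cluster variant: placeholder only, talking to a ResourceManager is out of scope here.
final class SketchClusterLauncher implements SketchLauncher {
    @Override
    public void launch(Runnable task) {
        throw new UnsupportedOperationException("would request a container from the cluster");
    }
}

final class SketchAppMaster {
    // Checks the configuration before instantiating a launcher, as the design describes.
    static SketchLauncher chooseLauncher(Map<String, String> conf) {
        boolean uber = Boolean.parseBoolean(conf.getOrDefault("is_Uber", "false"));
        return uber ? new SketchLocalLauncher() : new SketchClusterLauncher();
    }

    // Runs the given tasks sequentially through the chosen launcher.
    static void runAll(SketchLauncher launcher, List<Runnable> tasks) {
        for (Runnable task : tasks) {
            launcher.launch(task);
        }
    }
}
```

The point of the sketch is that the decision is made once, up front, so the
rest of the app master only ever sees the launcher interface.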





[jira] [Commented] (TEZ-1132) Consistent naming of Input and Outputs

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096463#comment-14096463
 ] 

Siddharth Seth commented on TEZ-1132:
-

bq. Change KV to KeyValue in all names.
Optional; the class names are already long. Also, this can get confusing w.r.t.
Readers, since some are KeyValue based and others are KeyValues based.
If we evolve these to be RowBased at some point, that will just be a new set
of Inputs/Outputs.

bq. LocalOnFileSorterOutput should probably be removed.
bq. LocalMergedInput should probably be moved out.
+1

bq. Do we need the OnFile prefix on these? These could potentially write to 
HDFS?
Agree. I think we should remove it.

bq. Is the Shuffled prefix needed? The reader threads could potentially read 
from HDFS?
Shuffled can be interpreted in several different ways - mapreduce shuffle, just 
moving data. Probably best to just remove it to avoid confusion.

The proposed input names should also have KV/KeyValue.



> Consistent naming of Input and Outputs
> --
>
> Key: TEZ-1132
> URL: https://issues.apache.org/jira/browse/TEZ-1132
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
>
> In some places we use Sorted Partitioned. In others we use Shuffled. We
> should use a consistent naming scheme based on Sorted, Grouped, Partitioned
> sub-terms so that the function is clear from the name.





[jira] [Updated] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1337:


Priority: Major  (was: Blocker)

> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
> Attachments: TEZ-1337.patch
>
>
> The staging dir is being used to write intermediate data. At some point, it may
> be worthwhile to configure a separate work area. In any case, these should be
> cleaned up - at least the intermediate data, after a local session executes.





[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096462#comment-14096462
 ] 

Siddharth Seth commented on TEZ-1337:
-

Not a blocker after TEZ-1397. The main issue users were facing was the user.dir 
being reset - and hence not having access to data which was generated 
previously.

> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
>Priority: Blocker
> Attachments: TEZ-1337.patch
>
>
> The staging dir is being used to write intermediate data. At some point, it may
> be worthwhile to configure a separate work area. In any case, these should be
> cleaned up - at least the intermediate data, after a local session executes.





[jira] [Created] (TEZ-1416) tez-api project javadoc/annotations review and clean up.

2014-08-13 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1416:
---

 Summary: tez-api project javadoc/annotations review and clean up.
 Key: TEZ-1416
 URL: https://issues.apache.org/jira/browse/TEZ-1416
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker








[jira] [Commented] (TEZ-1330) Create a dist target which contains required jars

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096461#comment-14096461
 ] 

Siddharth Seth commented on TEZ-1330:
-

Suggestions? -withoutHadoop?

> Create a dist target which contains required jars
> -
>
> Key: TEZ-1330
> URL: https://issues.apache.org/jira/browse/TEZ-1330
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1330.1.wip.txt
>
>
> Comment from [~rohini] on TEZ-1300
> bq. The tez-dist now only contains tez-0.5.0-SNAPSHOT.tar.gz. Can you also
> retain the directory structure with the individual jars? The pig
> client needs the individual jars in the classpath. It is convenient to
> compile tez and point to the tez-dist directory for e2e testing. Without that
> we will have to do the extra step of untarring it, which is an inconvenience
> during development.





[jira] [Updated] (TEZ-1246) Replace constructors with create() methods for DAG, Vertex, Edge etc in the API

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1246:


Priority: Blocker  (was: Major)
Target Version/s: 0.5.0

I think this change is worth making before 0.5. Gives us some control over 
evolving DAG, Vertex etc.
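
To illustrate why create() methods give more room to evolve than public
constructors, here is a minimal static-factory sketch. SketchVertex and its
fields are hypothetical and only stand in for an API class such as Vertex;
this is not the actual Tez code or the shape the final patch will take.

```java
// Hypothetical stand-in for an API class such as Vertex; not the real Tez class.
final class SketchVertex {
    private final String name;
    private final int parallelism;

    // Private constructor: callers cannot depend on it, so it can change freely.
    private SketchVertex(String name, int parallelism) {
        this.name = name;
        this.parallelism = parallelism;
    }

    // Factory method: the only supported way to obtain an instance.
    static SketchVertex create(String name, int parallelism) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        if (parallelism < -1) {
            throw new IllegalArgumentException("parallelism must be >= -1");
        }
        return new SketchVertex(name, parallelism);
    }

    // Overload added later without touching existing callers; -1 is used here
    // as an assumed "decide at runtime" marker.
    static SketchVertex create(String name) {
        return create(name, -1);
    }

    String getName() {
        return name;
    }

    int getParallelism() {
        return parallelism;
    }
}
```

With this shape, a later release could return a subclass or a cached instance
from create() without breaking code compiled against the earlier API.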

> Replace constructors with create() methods for DAG, Vertex, Edge etc in the 
> API
> ---
>
> Key: TEZ-1246
> URL: https://issues.apache.org/jira/browse/TEZ-1246
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
>






[jira] [Updated] (TEZ-1334) Annotate all non public classes in tez-api/tez-runtime-library with @private

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1334:


Target Version/s: 0.5.0

> Annotate all non public classes in tez-api/tez-runtime-library with @private
> 
>
> Key: TEZ-1334
> URL: https://issues.apache.org/jira/browse/TEZ-1334
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
>
> This prevents javadoc from being generated.
> Alternative would be to mark classes explicitly public using annotation.





[jira] [Updated] (TEZ-1334) Annotate all non public classes in tez-api/tez-runtime-library with @private

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1334:


Priority: Blocker  (was: Major)

> Annotate all non public classes in tez-api/tez-runtime-library with @private
> 
>
> Key: TEZ-1334
> URL: https://issues.apache.org/jira/browse/TEZ-1334
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
>
> This prevents javadoc from being generated.
> Alternative would be to mark classes explicitly public using annotation.





[jira] [Updated] (TEZ-1316) Document public interfaces/classes which are not meant to be implemented/extended

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1316:


Priority: Blocker  (was: Critical)

> Document public interfaces/classes which are not meant to be 
> implemented/extended
> -
>
> Key: TEZ-1316
> URL: https://issues.apache.org/jira/browse/TEZ-1316
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
>






[jira] [Commented] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096429#comment-14096429
 ] 

Bikas Saha commented on TEZ-1414:
-

There is no clear solution for the actual dirs on the jenkins host. Creating a
mini cluster can slow down this unit test. Can we mock any of this? If not, then
please see if you can reuse any of the existing tests that already start a mini
cluster, e.g. TestMRRJobsDAGApi, which is already doing a bunch of other tests.

> Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
> ---
>
> Key: TEZ-1414
> URL: https://issues.apache.org/jira/browse/TEZ-1414
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1414.1.patch
>
>
> The test passes locally but for some reason it fails on Jenkins. Disabling
> temporarily until the issue gets worked out.





[jira] [Commented] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.

2014-08-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096389#comment-14096389
 ] 

Hitesh Shah commented on TEZ-1414:
--

[~pramachandran] The other approach would be to use a local resource path from 
HDFS which is under the control of the test and not left to the env. You can 
launch a MiniDFSCluster, create files/dirs within it with the required path 
permissions set as needed.  

> Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
> ---
>
> Key: TEZ-1414
> URL: https://issues.apache.org/jira/browse/TEZ-1414
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1414.1.patch
>
>
> The test passes locally but for some reason it fails on Jenkins. Disabling
> temporarily until the issue gets worked out.





[jira] [Created] (TEZ-1415) Merge various Util classes in Tez

2014-08-13 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1415:
---

 Summary: Merge various Util classes in Tez
 Key: TEZ-1415
 URL: https://issues.apache.org/jira/browse/TEZ-1415
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha


TezCommonUtils, LogUtils.





[jira] [Commented] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.

2014-08-13 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096377#comment-14096377
 ] 

Prakash Ramachandran commented on TEZ-1414:
---

Working on this. The reason for the failure on jenkins could be that one of the
ancestor directories of the test directory does not have the o+x permission.
Will check if java.io.tmpdir can be used - not sure what the temp directory for
the jenkins build is set to.
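
One way to confirm the o+x theory on a given host is to walk up from the test
directory and report the first ancestor that lacks the permission. The sketch
below assumes a POSIX file system (Files.getPosixFilePermissions throws
UnsupportedOperationException elsewhere); the class and method names are made
up for illustration and are not part of the test or of Tez.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical helper; assumes a POSIX file system.
final class AncestorPermissionCheck {
    // Returns the nearest ancestor directory of 'path' that lacks o+x,
    // or null if every ancestor up to the root has it.
    static Path firstAncestorWithoutOthersExecute(Path path) throws IOException {
        Path current = path.toAbsolutePath().normalize().getParent();
        while (current != null) {
            Set<PosixFilePermission> perms = Files.getPosixFilePermissions(current);
            if (!perms.contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return current;
            }
            current = current.getParent();
        }
        return null;
    }
}
```

Running this against the directory used by the test on the build host would
show which ancestor, if any, is responsible.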

> Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
> ---
>
> Key: TEZ-1414
> URL: https://issues.apache.org/jira/browse/TEZ-1414
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1414.1.patch
>
>
> The test passes locally but for some reason it fails on Jenkins. Disabling
> temporarily until the issue gets worked out.





[jira] [Commented] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized

2014-08-13 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096356#comment-14096356
 ] 

Jeff Zhang commented on TEZ-1345:
-

Attached the patch.

[~hitesh] I made the following changes in the patch:
* handle V_RouteEvent synchronously in VertexManager to make sure the
RootDataInputFormation is written to recovery before VertexInitializedEvent
* add a unit test to verify that RootDataInputFormation is written to recovery
before VertexInitializedEvent

> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> --
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033





[jira] [Commented] (TEZ-684) Uber/Local modes for Tez

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096355#comment-14096355
 ] 

Bikas Saha commented on TEZ-684:


What happens to local resources that are part of the vertex but not part of the
AM? How does the task find them in its working directory?
Do tasks run in separate dirs? If yes, then temp files from previous tasks
would be visible to the next one. If no, then do we symlink their local files to
the different dirs?

> Uber/Local modes for Tez
> 
>
> Key: TEZ-684
> URL: https://issues.apache.org/jira/browse/TEZ-684
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Chen He
>Assignee: Chen He
> Attachments: TEZ-684-2014-7-21.patch, TEZ-684.patch, TEZ-684.patch, 
> TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, 
> TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, 
> TezUberModeDesignDraft.png
>
>
> Similar to MapReduce Uber mode in YARN, we plan to create an Uber mode for
> Tez. It runs all tasks locally in one process.
> Our target is to start DAGAppMaster in a local JVM and let it run all tasks in
> one process.
> Here is my design:
> Once a user submits a DAG, Tez starts an instance of DAGAppMaster. This
> DAGAppMaster will check TezConfiguration before instantiating
> ContainerLauncher. If "is_Uber" is true, DAGAppMaster creates a
> LocalContainerLauncher. LocalTaskScheduler and LocalTaskSchedulerEventHandler
> will call LocalContainerLauncher to run all tasks one by one in a single JVM.
> Communications between ResourceManager and local classes (DAGAppMaster,
> LocalContainerLauncher, LocalTaskScheduler, and
> LocalTaskSchedulerEventHandler) are muted.





[jira] [Updated] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized

2014-08-13 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1345:


Attachment: Tez-1345.patch

> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> --
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033





[jira] [Commented] (TEZ-1222) Add examples for uses of the API

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096333#comment-14096333
 ] 

Bikas Saha commented on TEZ-1222:
-

WordCount, OrderedWordCount and SimpleSessionExample are committed. The
Intersect example already exists.
Removing this from blocker status.

> Add examples for uses of the API
> 
>
> Key: TEZ-1222
> URL: https://issues.apache.org/jira/browse/TEZ-1222
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
>
> Add examples of writing Input, Output, Processor.
> Add examples of creating DAGs with different properties on the edges to
> clarify use cases.





[jira] [Updated] (TEZ-1222) Add examples for uses of the API

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1222:


Priority: Major  (was: Blocker)

> Add examples for uses of the API
> 
>
> Key: TEZ-1222
> URL: https://issues.apache.org/jira/browse/TEZ-1222
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>
> Add examples of writing Input, Output, Processor.
> Add examples of creating DAGs with different properties on the edges to
> clarify use cases.





[jira] [Commented] (TEZ-671) Support View/Modify ACLs for DAGs

2014-08-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096328#comment-14096328
 ] 

Hitesh Shah commented on TEZ-671:
-

API introductions involved. [~bikassaha] [~sseth] please review.

> Support View/Modify ACLs for DAGs
> -
>
> Key: TEZ-671
> URL: https://issues.apache.org/jira/browse/TEZ-671
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Hitesh Shah
> Attachments: TEZ-671.2.patch, TEZ-671.3.patch
>
>






[jira] [Updated] (TEZ-1072) Consolidate monitoring APIs in DAGClient

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1072:


Assignee: Jonathan Eagles

> Consolidate monitoring APIs in DAGClient
> 
>
> Key: TEZ-1072
> URL: https://issues.apache.org/jira/browse/TEZ-1072
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jonathan Eagles
>Priority: Blocker
>  Labels: api
> Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch
>
>
> Rename waitForCompletionWithAllStatusUpdates - was this meant to be 
> waitForCompletionWithAllVertexUpdates
> Reduce the number of methods exposed - waitForCompletion, 
> waitForCompletionWithStatusUpdates(@Nullable Set vertices,
>   @Nullable Set statusGetOpts), 
> waitForCompletionWithAllStatusUpdates(@Nullable Set 
> statusGetOpts)





[jira] [Comment Edited] (TEZ-1132) Consistent naming of Input and Outputs

2014-08-13 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096298#comment-14096298
 ] 

Jeff Zhang edited comment on TEZ-1132 at 8/13/14 11:29 PM:
---

[~bikassaha] Sure, assign it to you.


was (Author: zjffdu):
[~bikassaha] Sure, please take this.

> Consistent naming of Input and Outputs
> --
>
> Key: TEZ-1132
> URL: https://issues.apache.org/jira/browse/TEZ-1132
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
>
> In some places we use Sorted Partitioned. In others we use Shuffled. We
> should use a consistent naming scheme based on Sorted, Grouped, Partitioned
> sub-terms so that the function is clear from the name.





[jira] [Commented] (TEZ-1132) Consistent naming of Input and Outputs

2014-08-13 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096298#comment-14096298
 ] 

Jeff Zhang commented on TEZ-1132:
-

[~bikassaha] Sure, please take this.

> Consistent naming of Input and Outputs
> --
>
> Key: TEZ-1132
> URL: https://issues.apache.org/jira/browse/TEZ-1132
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jeff Zhang
>Priority: Blocker
>
> In some places we use Sorted Partitioned. In others we use Shuffled. We
> should use a consistent naming scheme based on Sorted, Grouped, Partitioned
> sub-terms so that the function is clear from the name.





[jira] [Updated] (TEZ-1132) Consistent naming of Input and Outputs

2014-08-13 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1132:


Assignee: Bikas Saha  (was: Jeff Zhang)

> Consistent naming of Input and Outputs
> --
>
> Key: TEZ-1132
> URL: https://issues.apache.org/jira/browse/TEZ-1132
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
>
> In some places we use Sorted Partitioned. In others we use Shuffled. We
> should use a consistent naming scheme based on Sorted, Grouped, Partitioned
> sub-terms so that the function is clear from the name.





[jira] [Resolved] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved TEZ-1414.
-

   Resolution: Fixed
Fix Version/s: 0.5.0

commit e2692f7cacd56fd2e8d2734f79ecfbb447e2fc2c
Author: Bikas Saha 
Date:   Wed Aug 13 16:20:59 2014 -0700

TEZ-1414. Disable TestTezClientUtils.testLocalResourceVisibility to make 
builds pass(bikas)


> Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
> ---
>
> Key: TEZ-1414
> URL: https://issues.apache.org/jira/browse/TEZ-1414
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1414.1.patch
>
>
> The test passes locally but for some reason it fails on Jenkins. Disabling
> temporarily until the issue gets worked out.





[jira] [Updated] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1414:


Attachment: TEZ-1414.1.patch

Committing trivial patch.

> Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
> ---
>
> Key: TEZ-1414
> URL: https://issues.apache.org/jira/browse/TEZ-1414
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1414.1.patch
>
>
> The test passes locally but for some reason it fails on Jenkins. Disabling
> temporarily until the issue gets worked out.





[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096284#comment-14096284
 ] 

Chen He commented on TEZ-1337:
--

No need, I can submit patch today. 

> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
>Priority: Blocker
> Attachments: TEZ-1337.patch
>
>
> The staging dir is being used to write intermediate data. At some point, it may
> be worthwhile to configure a separate work area. In any case, these should be
> cleaned up - at least the intermediate data, after a local session executes.





[jira] [Created] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.

2014-08-13 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1414:
---

 Summary: Disable TestTezClientUtils.testLocalResourceVisibility to 
make builds pass.
 Key: TEZ-1414
 URL: https://issues.apache.org/jira/browse/TEZ-1414
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha


The test passes locally but for some reason it fails on Jenkins. Disabling
temporarily until the issue gets worked out.





[jira] [Created] (TEZ-1413) Fix build for TestTezClientUtils.testLocalResourceVisibility

2014-08-13 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1413:
---

 Summary: Fix build for 
TestTezClientUtils.testLocalResourceVisibility
 Key: TEZ-1413
 URL: https://issues.apache.org/jira/browse/TEZ-1413
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran


Build failed in Jenkins: Tez-Build #565
org.apache.tez.client.TestTezClientUtils
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.131 sec <<< 
FAILURE!
testLocalResourceVisibility(org.apache.tez.client.TestTezClientUtils)  Time 
elapsed: 0.093 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.tez.client.TestTezClientUtils.testLocalResourceVisibility(TestTezClientUtils.java:258)

Running org.apache.tez.common.security.TestTokenCache
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.944 sec 
Running org.apache.tez.common.TestTezCommonUtils
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.949 sec 
Running org.apache.tez.common.TestReflectionUtils
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.69 sec

Results :

Failed tests: 
  TestTezClientUtils.testLocalResourceVisibility:258 null






[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096278#comment-14096278
 ] 

Bikas Saha commented on TEZ-1337:
-

Is this a 0.5 blocker, or can it be followed up in 0.5.1?

> Handling of local-dirs for Local Mode
> -
>
> Key: TEZ-1337
> URL: https://issues.apache.org/jira/browse/TEZ-1337
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Chen He
>Priority: Blocker
> Attachments: TEZ-1337.patch
>
>
> The staging dir is being used to write intermediate data. At some point, it may
> be worthwhile to configure a separate work area. In any case, these should be
> cleaned up - at least the intermediate data, after a local session executes.





[jira] [Commented] (TEZ-1072) Consolidate monitoring APIs in DAGClient

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096276#comment-14096276
 ] 

Bikas Saha commented on TEZ-1072:
-

+1 on just removing.
public abstract DAGStatus waitForCompletionWithStatusUpdates(@Nullable 
Set vertices,
@Nullable Set statusGetOpts) throws IOException, 
TezException;

And renaming
public abstract DAGStatus waitForCompletionWithAllStatusUpdates(FOO) to
public abstract DAGStatus waitForCompletionWithStatusUpdates(FOO)

[~jeagles] I see what you are saying with the Utils class but it seems more 
natural to be able to use the DAGClient directly IMO.

> Consolidate monitoring APIs in DAGClient
> 
>
> Key: TEZ-1072
> URL: https://issues.apache.org/jira/browse/TEZ-1072
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Priority: Blocker
>  Labels: api
> Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch
>
>
> Rename waitForCompletionWithAllStatusUpdates - was this meant to be 
> waitForCompletionWithAllVertexUpdates
> Reduce the number of methods exposed - waitForCompletion, 
> waitForCompletionWithStatusUpdates(@Nullable Set vertices,
>   @Nullable Set statusGetOpts), 
> waitForCompletionWithAllStatusUpdates(@Nullable Set 
> statusGetOpts)





[jira] [Commented] (TEZ-1320) Remove getApplicationId from DAGClient

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096271#comment-14096271
 ] 

Bikas Saha commented on TEZ-1320:
-

The patch looks fine except for the following, which should be retained in the
tests. It's important to test this.
{code}   verify(yarnClient, times(1)).submitApplication(captor.capture());
-  Assert.assertEquals(appId1, dagClient.getApplicationId());{code}

I think Pig and Hive use this (mostly for logging and UI) to point users to the
application for debugging, so I'm not sure about removing it. If we do remove
this, then we should add something like dagClient.getExecutionContext() that
provides a string that can be used for logging.
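
A minimal sketch of what such a string-only accessor could look like. Both
classes are hypothetical stand-ins (not DAGClient or the YARN ApplicationId),
and the id format and message wording are assumptions chosen for illustration.

```java
import java.util.Locale;

// Hypothetical stand-in for a typed application id; not the YARN ApplicationId class.
final class SketchAppId {
    private final long clusterTimestamp;
    private final int sequence;

    SketchAppId(long clusterTimestamp, int sequence) {
        this.clusterTimestamp = clusterTimestamp;
        this.sequence = sequence;
    }

    @Override
    public String toString() {
        // Assumed display format, modelled on how application ids are commonly printed.
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, sequence);
    }
}

// Hypothetical client that keeps the typed id private and exposes only a string.
final class SketchDagClient {
    private final SketchAppId appId;

    SketchDagClient(SketchAppId appId) {
        this.appId = appId;
    }

    // Returns an opaque description for logs and UIs; callers should not parse it.
    String getExecutionContext() {
        return "Executing on YARN cluster with App id " + appId;
    }
}
```

Because callers only ever see a string, the client would not need to expose a
YARN type in its public API, and the same method could describe a local-mode
run as well.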

> Remove getApplicationId from DAGClient
> --
>
> Key: TEZ-1320
> URL: https://issues.apache.org/jira/browse/TEZ-1320
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Jonathan Eagles
>Priority: Blocker
> Attachments: TEZ-1320-v1.patch
>
>
> We should either get rid of this, or convert it to a String. Not sure why 
> this API needs to be exposed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1330) Create a dist target which contains required jars

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096258#comment-14096258
 ] 

Bikas Saha commented on TEZ-1330:
-

Tried it. Works for me. +1. Super useful for dev.
Can we find a better name for partial?

> Create a dist target which contains required jars
> -
>
> Key: TEZ-1330
> URL: https://issues.apache.org/jira/browse/TEZ-1330
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1330.1.wip.txt
>
>
> Comment from [~rohini] on TEZ-1300
> bq. The tez-dist now only contains tez-0.5.0-SNAPSHOT.tar.gz. Can you also
> retain the directory structure with the individual jars? The pig
> client needs the individual jars in the classpath. It is convenient to
> compile tez and point to the tez-dist directory for e2e testing. Without that
> we will have to do the extra step of untarring it, which is an inconvenience
> during development.





[jira] [Updated] (TEZ-269) Fix ResourceMgrDelegate#getDelegationToken after YARN-868 is fixed

2014-08-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated TEZ-269:
-

Attachment: TEZ-269.patch

Patch can be submitted once YARN-868 is committed

> Fix ResourceMgrDelegate#getDelegationToken after YARN-868 is fixed
> --
>
> Key: TEZ-269
> URL: https://issues.apache.org/jira/browse/TEZ-269
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>  Labels: TEZ-0.3.0
> Attachments: TEZ-269.patch
>
>






[jira] [Updated] (TEZ-671) Support View/Modify ACLs for DAGs

2014-08-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-671:


Attachment: TEZ-671.3.patch

Comments addressed. 

> Support View/Modify ACLs for DAGs
> -
>
> Key: TEZ-671
> URL: https://issues.apache.org/jira/browse/TEZ-671
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Hitesh Shah
> Attachments: TEZ-671.2.patch, TEZ-671.3.patch
>
>






[jira] [Commented] (TEZ-1132) Consistent naming of Input and Outputs

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096248#comment-14096248
 ] 

Bikas Saha commented on TEZ-1132:
-

LocalOnFileSorterOutput should probably be removed.
OnFileSortedOutput -> OnFileOrderedPartitionedKVOutput
Change KV to KeyValue in all names.

Do we need the OnFile prefix on these? These could potentially write to HDFS?

LocalMergedInput should probably be moved out.
SortedGroupedMergedInput -> OrderedGroupedMergedInput
ShuffledMergedInput -> ShuffledOrderedGroupedInput
ShuffledMergedInputLegacy -> ShuffledOrderedGroupedInput

[~zjffdu] Do you mind if I take this over? This may be easier to do in PST, as 
most of the Hive/Pig people whose code will break because of this are in the 
same time zone and could iterate faster over it and ask for help if needed.



> Consistent naming of Input and Outputs
> --
>
> Key: TEZ-1132
> URL: https://issues.apache.org/jira/browse/TEZ-1132
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jeff Zhang
>Priority: Blocker
>
> In some places we use Sorted Partitioned. In others we use Shuffled. We 
> should use a consistent naming scheme based on Sorted, Grouped, Partitioned 
> sub-terms so that the function is clear from the name.





[jira] [Comment Edited] (TEZ-1132) Consistent naming of Input and Outputs

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096248#comment-14096248
 ] 

Bikas Saha edited comment on TEZ-1132 at 8/13/14 10:41 PM:
---

LocalOnFileSorterOutput should probably be removed.
OnFileSortedOutput -> OnFileOrderedPartitionedKVOutput
Change KV to KeyValue in all names.

Do we need the OnFile prefix on these? These could potentially write to HDFS?

LocalMergedInput should probably be moved out.
SortedGroupedMergedInput -> OrderedGroupedMergedInput
ShuffledMergedInput -> ShuffledOrderedGroupedInput
ShuffledMergedInputLegacy -> ShuffledOrderedGroupedInput

Is the Shuffled prefix needed? The reader threads could potentially read from 
HDFS?


[~zjffdu] Do you mind if I take this over? This may be easier to do in PST, as 
most of the Hive/Pig people whose code will break because of this are in the 
same time zone and could iterate faster over it and ask for help if needed.




was (Author: bikassaha):
LocalOnFileSorterOutput should probably be removed.
OnFileSortedOutput -> OnFileOrderedPartitionedKVOutput
Change KV to KeyValue in all names.

Do we need the OnFile prefix on these? These could potentially write to HDFS?

LocalMergedInput should probably be moved out.
SortedGroupedMergedInput -> OrderedGroupedMergedInput
ShuffledMergedInput -> ShuffledOrderedGroupedInput
ShuffledMergedInputLegacy -> ShuffledOrderedGroupedInput

[~zjffdu] Do you mind if I take this over? This may be easier to do in PST, as 
most of the Hive/Pig people whose code will break because of this are in the 
same time zone and could iterate faster over it and ask for help if needed.



> Consistent naming of Input and Outputs
> --
>
> Key: TEZ-1132
> URL: https://issues.apache.org/jira/browse/TEZ-1132
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jeff Zhang
>Priority: Blocker
>
> In some places we use Sorted Partitioned. In others we use Shuffled. We 
> should use a consistent naming scheme based on Sorted, Grouped, Partitioned 
> sub-terms so that the function is clear from the name.





[jira] [Commented] (TEZ-1347) Consolidate MRHelpers

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096219#comment-14096219
 ] 

Siddharth Seth commented on TEZ-1347:
-

Committing.

> Consolidate MRHelpers
> -
>
> Key: TEZ-1347
> URL: https://issues.apache.org/jira/browse/TEZ-1347
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt, 
> TEZ-1347.2.txt
>
>
> - Remove methods which don't belong in MRHelpers and potentially move them to 
> TezHelpers.
> - Get rid of methods which we don't expect/want users to use.
> - Get rid of multiple variants of the same method, if these exist.
> - Investigate other cleanup in MRHelpers.





[jira] [Created] (TEZ-1412) Create a KeyValue(s)Reader hierarchy to show properties

2014-08-13 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1412:
---

 Summary: Create a KeyValue(s)Reader hierarchy to show properties
 Key: TEZ-1412
 URL: https://issues.apache.org/jira/browse/TEZ-1412
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha


E.g. OrderedKeyValuesReader to show that the keys are ordered. This way the 
users can cast to the appropriate reader instead of casting the inputs. This 
enables input impls to be changed transparently.
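
A minimal Java sketch of the idea, not the actual Tez API. All type names 
below (KeyValuesReaderSketch, OrderedKeyValuesReaderSketch, InputSketch, 
SortedInMemoryInput) are hypothetical and only illustrate how a reader 
sub-interface can advertise a property such as key ordering, so that the 
consumer casts the reader rather than the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical base reader: iterates over keys, each with a group of values.
interface KeyValuesReaderSketch<K, V> {
  boolean next();
  K getCurrentKey();
  Iterable<V> getCurrentValues();
}

// Hypothetical marker sub-interface: promises that keys arrive in sorted order.
interface OrderedKeyValuesReaderSketch<K, V> extends KeyValuesReaderSketch<K, V> {}

// Hypothetical input; callers only see an untyped reader.
interface InputSketch {
  Object getReader();
}

// One possible input implementation, backed by a TreeMap so keys come out ordered.
class SortedInMemoryInput implements InputSketch {
  private final TreeMap<String, List<Integer>> data = new TreeMap<>();

  void add(String key, Integer... values) {
    data.computeIfAbsent(key, k -> new ArrayList<>()).addAll(Arrays.asList(values));
  }

  @Override
  public Object getReader() {
    final Iterator<Map.Entry<String, List<Integer>>> it = data.entrySet().iterator();
    return new OrderedKeyValuesReaderSketch<String, Integer>() {
      private Map.Entry<String, List<Integer>> current;

      @Override public boolean next() {
        if (!it.hasNext()) {
          return false;
        }
        current = it.next();
        return true;
      }

      @Override public String getCurrentKey() { return current.getKey(); }

      @Override public Iterable<Integer> getCurrentValues() { return current.getValue(); }
    };
  }
}

class ReaderHierarchyDemo {
  // The consumer depends on the reader type only, so the input impl can be swapped.
  @SuppressWarnings("unchecked")
  static List<String> readKeys(InputSketch input) {
    OrderedKeyValuesReaderSketch<String, Integer> reader =
        (OrderedKeyValuesReaderSketch<String, Integer>) input.getReader();
    List<String> keys = new ArrayList<>();
    while (reader.next()) {
      keys.add(reader.getCurrentKey());
    }
    return keys;
  }

  public static void main(String[] args) {
    SortedInMemoryInput input = new SortedInMemoryInput();
    input.add("b", 2);
    input.add("a", 1, 3);
    System.out.println(readKeys(input)); // prints [a, b]
  }
}
```

Any other input that hands out an OrderedKeyValuesReaderSketch could replace 
SortedInMemoryInput here without changing readKeys().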





[jira] [Updated] (TEZ-1347) Consolidate MRHelpers

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1347:


Attachment: TEZ-1347.2.txt

Updated patch with comments addressed.

Renamed TezAPIHelpers to TezUtils and TezUtils to TezUtilsInternal. No 
consolidation of the various TezUtils in this patch though.

Fixed javadoc, removed Hive/Pig LimitedPrivate. Have removed the unstable from 
the API though.

Removed the numReducers check, and the associated fields. For YARNRunner, this 
check already runs in the JobClient - so nothing required there.

> Consolidate MRHelpers
> -
>
> Key: TEZ-1347
> URL: https://issues.apache.org/jira/browse/TEZ-1347
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt, 
> TEZ-1347.2.txt
>
>
> - Remove methods which don't belong in MRHelpers and potentially move them to 
> TezHelpers.
> - Get rid of methods which we don't expect/want users to use.
> - Get rid of multiple variants of the same method, if these exist.
> - Investigate other cleanup in MRHelpers.





[jira] [Commented] (TEZ-1055) Rename tez-mapreduce-examples to tez-examples

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096187#comment-14096187
 ] 

Bikas Saha commented on TEZ-1055:
-

[~rekhajoshm] Can I take up this blocker jira, since we are looking at getting a 
tez 0.5 release done by this week?

> Rename tez-mapreduce-examples to tez-examples
> -
>
> Key: TEZ-1055
> URL: https://issues.apache.org/jira/browse/TEZ-1055
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Rekha Joshi
>Priority: Blocker
>
> And also the internal classes where applicable to remove MR references.





[jira] [Assigned] (TEZ-1361) Move SimpleMRProcessor, MRInput and MROutput into runtime-library

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha reassigned TEZ-1361:
---

Assignee: Bikas Saha

> Move SimpleMRProcessor, MRInput and MROutput into runtime-library
> -
>
> Key: TEZ-1361
> URL: https://issues.apache.org/jira/browse/TEZ-1361
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Fix For: 0.5.0
>
>
> It's currently in tez-mapreduce and inaccessible.





[jira] [Commented] (TEZ-1361) Move SimpleMRProcessor, MRInput and MROutput into runtime-library

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096179#comment-14096179
 ] 

Bikas Saha commented on TEZ-1361:
-

I am leaning towards leaving this in tez-mapreduce but moving the main "API" 
classes into proper packages within the project. This can be done under 
TEZ-1367. These classes are essentially MR specific. Closing this jira. Please 
reopen if anyone disagrees.

> Move SimpleMRProcessor, MRInput and MROutput into runtime-library
> -
>
> Key: TEZ-1361
> URL: https://issues.apache.org/jira/browse/TEZ-1361
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
> Fix For: 0.5.0
>
>
> It's currently in tez-mapreduce and inaccessible.





[jira] [Resolved] (TEZ-1361) Move SimpleMRProcessor, MRInput and MROutput into runtime-library

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved TEZ-1361.
-

Resolution: Not a Problem

> Move SimpleMRProcessor, MRInput and MROutput into runtime-library
> -
>
> Key: TEZ-1361
> URL: https://issues.apache.org/jira/browse/TEZ-1361
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Priority: Blocker
> Fix For: 0.5.0
>
>
> It's currently in tez-mapreduce and inaccessible.





[jira] [Created] (TEZ-1411) Address initial feedback on swimlanes

2014-08-13 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-1411:
---

 Summary: Address initial feedback on swimlanes
 Key: TEZ-1411
 URL: https://issues.apache.org/jira/browse/TEZ-1411
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Gopal V
Priority: Blocker
 Fix For: 0.5.0


A few other good-to-have things:
1) A wrapper script that takes care of the command chaining with a single appId 
as input from the user.
2) Legend in the README or in the svg itself about what is what.





[jira] [Updated] (TEZ-1402) MRoutput configurer should disable committer

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1402:


Summary: MRoutput configurer should disable committer  (was: MRoutput 
configurer should allow other committers and no committer)

> MRoutput configurer should disable committer
> 
>
> Key: TEZ-1402
> URL: https://issues.apache.org/jira/browse/TEZ-1402
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch
>
>






[jira] [Updated] (TEZ-1402) MRoutput configurer should allow disabling the committer

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1402:


Summary: MRoutput configurer should allow disabling the committer  (was: 
MRoutput configurer should disable committer)

> MRoutput configurer should allow disabling the committer
> 
>
> Key: TEZ-1402
> URL: https://issues.apache.org/jira/browse/TEZ-1402
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch
>
>






[jira] [Commented] (TEZ-1402) MRoutput configurer should allow other committers and no committer

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096148#comment-14096148
 ] 

Siddharth Seth commented on TEZ-1402:
-

Looks good.

> MRoutput configurer should allow other committers and no committer
> --
>
> Key: TEZ-1402
> URL: https://issues.apache.org/jira/browse/TEZ-1402
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch
>
>






[jira] [Updated] (TEZ-1402) MRoutput configurer should allow other committers and no committer

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1402:


Attachment: TEZ-1402.3.patch

Thanks. Attaching commit patch.

> MRoutput configurer should allow other committers and no committer
> --
>
> Key: TEZ-1402
> URL: https://issues.apache.org/jira/browse/TEZ-1402
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch
>
>






[jira] [Commented] (TEZ-1347) Consolidate MRHelpers

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096093#comment-14096093
 ] 

Bikas Saha commented on TEZ-1347:
-

TezAPIHelpers: Why does it have to have API in the name? Isn't TezHelpers enough? 
Or TezUtils. (Merge other *Utils into TezInternalUtils).

Remove the byte reference from the javadoc? User payload should be enough.
Convert a Configuration to compressed user pay load (i.e. byte[]) using
Convert compressed pay load in byte[] to a Conf

Remove the LimitedPrivate for Hive and Pig if this is needed by anyone who 
essentially needs to offload MR-based pipelines to Tez? Same for other such 
cases. @Unstable is fine to keep.
   @LimitedPrivate("Hive, Pig")
   @Unstable
-  public static void translateVertexConfToTez(Configuration conf) {
+  public static void translateMRConfToTez(Configuration conf) {
 convertVertexConfToTez(conf);

Wrong javadoc
+   * This is only meant to be used if frameworks are not setting up their own java options,
+   * and would like to fallback to using java options which may already be configured for
+   * Hadoop MapReduce mappers. < HERE
*
* Uses mapreduce.admin.reduce.child.java.opts, mapreduce.reduce.java.opts
* and mapreduce.reduce.log.level from config to generate the opts.
@@ -213,7 +305,7 @@ public class MRHelpers {
* @return JAVA_OPTS string to be used in launching the JVM
*/
   @SuppressWarnings("deprecation")
-  public static String getReduceJavaOpts(Configuration conf) {
+  public static String getJavaOptsForMRReducer(Configuration conf) {

Will numReducers ever be true for anything other than YARNRunner? If that is 
the case, then we may not need this code at all for everybody else. Just move 
it to YARNRunner?
+  if (numReduces != 0) {
+conf.setBooleanIfUnset("mapred.reducer.new-api",
..
+if (numReduces != 0) {
+  ensureNotSet(conf, "mapred.partitioner.class", mode);

Some more javadoc on when to use would help for
MRHelpers.translateMRConfToTez()




> Consolidate MRHelpers
> -
>
> Key: TEZ-1347
> URL: https://issues.apache.org/jira/browse/TEZ-1347
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt
>
>
> - Remove methods which don't belong in MRHelpers and potentially move them to 
> TezHelpers.
> - Get rid of methods which we don't expect/want users to use.
> - Get rid of multiple variants of the same method, if these exist.
> - Investigate other cleanup in MRHelpers.





[jira] [Resolved] (TEZ-817) TEZ_LIB_URI are always uploaded as public Local Resource

2014-08-13 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved TEZ-817.


   Resolution: Fixed
Fix Version/s: 0.5.0
 Hadoop Flags: Reviewed

Thanks for your contribution. 
Committed.
commit 215909e04415425688b9eb54d342d45ab6f5fa53
Author: Bikas Saha 
Date:   Wed Aug 13 11:47:28 2014 -0700

TEZ-817. TEZ_LIB_URI are always uploaded as public Local Resource (Prakash 
Ramachandran via bikas)


> TEZ_LIB_URI are always uploaded as public Local Resource
> 
>
> Key: TEZ-817
> URL: https://issues.apache.org/jira/browse/TEZ-817
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Prakash Ramachandran
>Priority: Critical
> Fix For: 0.5.0
>
> Attachments: TEZ-817.1.patch, TEZ-817.2.patch, TEZ-817.3.patch, 
> TEZ-817.4.patch, TEZ-817.5.patch
>
>
> They can point to any remote location that may be specific to a user (if the 
> user is playing with a private build). In that case, job submission will fail 
> since YARN will complain that the public LR is not public on the remote FS.





[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095827#comment-14095827
 ] 

Bikas Saha commented on TEZ-1390:
-

We do not have a strong case right now, except that some of the internal buffer 
copies may be avoided by using a ByteBuffer, because ByteString (from the 
internal protobuf) allows creating a read-only ByteBuffer without copying.
If you have any concerns then now would be a good time to voice them :)

[~ozawa] Please make sure that all getPayload() methods that return ByteBuffer 
return a clone of the byte buffer, as ByteBuffer is not thread safe.
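
One way to read the request above, sketched in plain java.nio. 
PayloadHolderSketch is a hypothetical class, not the real Tez descriptor, and 
the choice of duplicate() on a read-only buffer is an assumption about what 
"a clone" could mean here: each caller gets its own position and limit while 
the underlying bytes are shared and cannot be modified through the view.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical holder for a user payload; not the real Tez class.
class PayloadHolderSketch {
  // Never read from directly, so its position and limit never change.
  private final ByteBuffer payload;

  PayloadHolderSketch(byte[] bytes) {
    // Copy once on the way in so later changes to the caller's array are not visible.
    this.payload = ByteBuffer.wrap(bytes.clone()).asReadOnlyBuffer();
  }

  // Each call returns a new buffer object with an independent position and limit.
  // The content is shared (no copy) and the returned view is read-only.
  ByteBuffer getPayload() {
    return payload.duplicate();
  }
}

class PayloadDemo {
  // Drains the given buffer and decodes it as UTF-8.
  static String readAll(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.get(out);
    return new String(out, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    PayloadHolderSketch holder =
        new PayloadHolderSketch("conf".getBytes(StandardCharsets.UTF_8));
    ByteBuffer first = holder.getPayload();
    System.out.println(readAll(first));                      // prints conf
    System.out.println(first.remaining());                   // prints 0
    System.out.println(holder.getPayload().remaining());     // prints 4
  }
}
```

Draining the first view does not affect later callers, which is the property 
the comment asks for. A full deep copy would also work but gives up the 
no-copy benefit mentioned above.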

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: pig.payload.txt
>
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.





[jira] [Commented] (TEZ-1400) Reducers stuck when enabling auto-reduce parallelism (MRR case)

2014-08-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095816#comment-14095816
 ] 

Bikas Saha commented on TEZ-1400:
-

Can you confirm that ShuffleVertexManager is being explicitly enabled for 
certain (or all) vertices by calling the vertex.setVertexManager() and then 
providing it a payload that configures 
TEZ_AM_SHUFFLE_VERTEX_MANAGER_ENABLE_AUTO_PARALLEL to true.
This should not be turned on via the main job configuration as it will get 
inadvertently turned on for vertices that should not change their parallelism. 
If this is being enabled explicitly via setVertexManager() with a payload 
then that is where the bug should be. If it's not being explicitly turned on via 
setVertexManager() then that should change. 
One other thing you could try is to create a formal payload object for this 
manager and have a configurer that can set up all its parameters. By default it 
could pick up params from the client side tez-site.xml. Also remove the 
creation of payload from the AM conf if there is no payload, to make the payload 
required.

> Reducers stuck when enabling auto-reduce parallelism (MRR case)
> ---
>
> Key: TEZ-1400
> URL: https://issues.apache.org/jira/browse/TEZ-1400
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: TEZ-1400.1.patch, dag.dot
>
>
> In M -> R1 -> R2 case, if R1 is optimized by auto-parallelism R2 gets stuck 
> waiting for events.
> e.g
> Map 1: 0/1  Map 2: -/-  Map 5: 0/1  Map 6: 0/1  Map 7: 0/1
>   Reducer 3: 0/23 Reducer 4: 0/1
> ...
> ...
> Map 1: 1/1  Map 2: 148(+13)/161 Map 5: 1/1  Map 6: 1/1  Map 
> 7: 1/1  Reducer 3: 0(+3)/3  Reducer 4: 0(+1)/1  <== Auto reduce 
> parallelism kicks in
> ..
> Map 1: 1/1  Map 2: 161/161  Map 5: 1/1  Map 6: 1/1  Map 7: 1/1
>   Reducer 3: 3/3  Reducer 4: 0(+1)/1
> Job is stuck waiting for events in Reducer 4.
>  [fetcher [Reducer_3] #23] 
> org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3 
> of 23 at 0.02 MB/s) <=== *Waiting for 20 more partitions, even though 
> Reducer3 has been optimized to use 3 reducers





[jira] [Comment Edited] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095579#comment-14095579
 ] 

Jonathan Eagles edited comment on TEZ-1390 at 8/13/14 3:24 PM:
---

These are the usages of Payload in pig.  [^pig.payload.txt]


was (Author: jeagles):
These are the usages of Payload in pig. 

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: pig.payload.txt
>
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.





[jira] [Updated] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1390:
-

Attachment: pig.payload.txt

These are the usages of Payload in pig. 

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
> Attachments: pig.payload.txt
>
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1331) Investigate : interrupts being swallowed by TezClient/DAGClient methods

2014-08-13 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095503#comment-14095503
 ] 

Johannes Zillmann commented on TEZ-1331:


Had a look at the code base. The only crucial swallowing I found was in 
DAGClient. Created TEZ-1410 for that. The remaining "catch 
InterruptedException" cases are either re-thrown in an IOException (e.g. 
TezClientUtils#getAMProxy()) or really internal stuff only.

> Investigate : interrupts being swallowed by TezClient/DAGClient methods
> ---
>
> Key: TEZ-1331
> URL: https://issues.apache.org/jira/browse/TEZ-1331
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Priority: Blocker
>
> TEZ-1278 fixes waitTillReady to not ignore interrupts. This jira is to look 
> through other APIs to figure out whether interrupts handling needs to be 
> fixed.





[jira] [Updated] (TEZ-1410) DAGClient#waitForCompletion() methods should not swallow interrupts

2014-08-13 Thread Johannes Zillmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Zillmann updated TEZ-1410:
---

Attachment: TEZ-1410.1.patch

> DAGClient#waitForCompletion() methods should not swallow interrupts
> ---
>
> Key: TEZ-1410
> URL: https://issues.apache.org/jira/browse/TEZ-1410
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.5.0
>Reporter: Johannes Zillmann
>Assignee: Johannes Zillmann
> Attachments: TEZ-1410.1.patch
>
>
> Based on TEZ-1331 I found that the 3 waitForCompletion() methods of DAGClient 
> swallow interrupts as well. That way you can never stop the wait call 
> since all interrupts are caught and the wait logic just happily proceeds 
> (same as TEZ-1278).





[jira] [Created] (TEZ-1410) DAGClient#waitForCompletion() methods should not swallow interrupts

2014-08-13 Thread Johannes Zillmann (JIRA)
Johannes Zillmann created TEZ-1410:
--

 Summary: DAGClient#waitForCompletion() methods should not swallow 
interrupts
 Key: TEZ-1410
 URL: https://issues.apache.org/jira/browse/TEZ-1410
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Johannes Zillmann
Assignee: Johannes Zillmann


Based on TEZ-1331 I found that the 3 waitForCompletion() methods of DAGClient 
swallow interrupts as well. That way you can never stop the wait call since 
all interrupts are caught and the wait logic just happily proceeds (same as 
TEZ-1278).
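
To illustrate the difference, here is a small self-contained Java sketch. 
WaitSketch and its methods are hypothetical stand-ins for a polling wait loop, 
not the actual DAGClient code. The first method shows the swallowing pattern 
described above, the second lets the interrupt reach the caller.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a client-side wait loop; not the real DAGClient code.
class WaitSketch {
  // Problematic: the interrupt is caught and dropped, so the loop keeps polling
  // until 'done' flips and the caller has no way to cancel the wait.
  static void waitSwallowing(AtomicBoolean done, long pollMillis) {
    while (!done.get()) {
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // ignored
      }
    }
  }

  // Preferred: let InterruptedException propagate to the caller.
  static void waitInterruptibly(AtomicBoolean done, long pollMillis)
      throws InterruptedException {
    while (!done.get()) {
      Thread.sleep(pollMillis);
    }
  }
}

class WaitDemo {
  // Returns true if interrupting the waiting thread ends the wait.
  static boolean interruptStopsWait() {
    // Never set to true, so only an interrupt can end the wait.
    AtomicBoolean done = new AtomicBoolean(false);
    AtomicBoolean sawInterrupt = new AtomicBoolean(false);
    Thread waiter = new Thread(() -> {
      try {
        WaitSketch.waitInterruptibly(done, 10);
      } catch (InterruptedException e) {
        sawInterrupt.set(true);
      }
    });
    waiter.start();
    waiter.interrupt();
    try {
      waiter.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return sawInterrupt.get() && !waiter.isAlive();
  }

  public static void main(String[] args) {
    System.out.println(interruptStopsWait());
  }
}
```

The demo deliberately never calls waitSwallowing(), since with the same setup 
it would not return.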





[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095485#comment-14095485
 ] 

Johannes Zillmann commented on TEZ-1390:


Just curious, what's the benefit of using ByteBuffer vs byte[] here?

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.





[jira] [Updated] (TEZ-1347) Consolidate MRHelpers

2014-08-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1347:


Attachment: TEZ-1347.1.txt

The last bit of cleanup in MRHelpers (not related to MRInput etc).

Changes
- Add a new class called TezAPIHelpers which contains some methods for conf to 
payload, etc
- Helper methods for payloads etc removed from MRHelpers, in favor of the 
methods in TezAPIHelpers
- Removed doJobClient magic. Replaced the important bit of determining which 
API to use with configureMRApiUsage
- Renamed most of the methods in MRHelpers, and improved javadoc to indicate 
these are just helpers to parse out existing MR config values.
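
For readers unfamiliar with the "conf to payload" helpers mentioned in the 
list above, the following is a rough Java sketch of the general technique 
only. It uses java.util.Properties as a stand-in for a Hadoop Configuration 
and java.util.zip for compression. ConfPayloadSketch and its method names are 
hypothetical and do not reflect the signatures in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helpers; Properties stands in for a Hadoop Configuration.
class ConfPayloadSketch {
  // Serialize the key/value pairs and compress them into a byte[] payload.
  static byte[] toCompressedPayload(Properties conf) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
      conf.store(out, null);
    }
    return bytes.toByteArray();
  }

  // Decompress a payload produced by toCompressedPayload() back into key/value pairs.
  static Properties fromCompressedPayload(byte[] payload) throws IOException {
    Properties conf = new Properties();
    try (InflaterInputStream in =
        new InflaterInputStream(new ByteArrayInputStream(payload))) {
      conf.load(in);
    }
    return conf;
  }
}

class ConfPayloadDemo {
  // Round-trips a conf through the compressed payload form.
  static Properties roundTrip(Properties conf) {
    try {
      return ConfPayloadSketch.fromCompressedPayload(
          ConfPayloadSketch.toCompressedPayload(conf));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("mapreduce.reduce.java.opts", "-Xmx512m");
    System.out.println(roundTrip(conf).getProperty("mapreduce.reduce.java.opts"));
  }
}
```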

[~bikassaha], please review.

> Consolidate MRHelpers
> -
>
> Key: TEZ-1347
> URL: https://issues.apache.org/jira/browse/TEZ-1347
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Blocker
> Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt
>
>
> - Remove methods which don't belong in MRHelpers and potentially move them to 
> TezHelpers.
> - Get rid of methods which we don't expect/want users to use.
> - Get rid of multiple variants of the same method, if these exist.
> - Investigate other cleanup in MRHelpers.





[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API

2014-08-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095226#comment-14095226
 ] 

Siddharth Seth commented on TEZ-1390:
-

Sounds good.

> Replace byte[] with ByteBuffer as the type of user payload in the API
> -
>
> Key: TEZ-1390
> URL: https://issues.apache.org/jira/browse/TEZ-1390
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
>Priority: Blocker
>
> This is just an API change. Internally we can continue to use byte[] since 
> that's a much bigger change.
> The translation from ByteBuffer to byte[] in the API layer should not have 
> perf impact.


