[jira] [Reopened] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reopened TEZ-1337: - [~airbots] - apologies for not updating this jira after TEZ-1393. The earlier comments aren't valid anymore. Given the number of directories involved, as pointed out in TEZ-1393 (and the comments on it), this was never a very well understood issue. However, there's still utility in letting users configure a directory which will be used for what would otherwise be the YARN local-dirs and log-dirs. The comment about this directory being deleted during staging / task cleanup still holds. Essentially, allow users to set up a directory, controllable by them, into which tasks write their intermediate output and into which logs are generated. If this isn't set up, the staging directory can always be used for the same purpose. > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He > Fix For: 0.5.0 > > Attachments: TEZ-1337.patch > > > staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. IAC, these should be cleaned > up - at least the intermediate data after a local session executes. -- This message was sent by Atlassian JIRA (v6.2#6252)
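A minimal sketch of the fallback described above, assuming a plain `Map<String, String>` in place of Hadoop's `Configuration`. Both property keys (`example.local-mode.work-dir`, `example.staging-dir`) are hypothetical placeholders, not real Tez settings. The user-controlled directory stands in for the YARN local-dirs and log-dirs, and the root staging directory is used when it is not set:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Sketch only: resolves the local-mode work area with a fallback to the root staging dir.
// NOTE: both keys below are hypothetical, not real Tez configuration properties.
final class LocalModeDirs {
  static final String LOCAL_WORK_DIR_KEY = "example.local-mode.work-dir"; // hypothetical
  static final String STAGING_DIR_KEY = "example.staging-dir";            // hypothetical

  private LocalModeDirs() {}

  /** Base directory: the user-configured work dir if set, else the root staging dir. */
  static Path baseDir(Map<String, String> conf) {
    String workDir = conf.get(LOCAL_WORK_DIR_KEY);
    if (workDir != null && !workDir.trim().isEmpty()) {
      return Paths.get(workDir.trim());
    }
    String staging = conf.get(STAGING_DIR_KEY);
    if (staging == null) {
      throw new IllegalStateException("Neither a work dir nor a staging dir is configured");
    }
    // Root staging dir, not the per-application one, which is likely to be deleted.
    return Paths.get(staging);
  }

  /** Stand-in for YARN local-dirs: where tasks write intermediate output. */
  static Path localDir(Map<String, String> conf, String appId) {
    return baseDir(conf).resolve(appId).resolve("local");
  }

  /** Stand-in for YARN log-dirs: where task logs are generated. */
  static Path logDir(Map<String, String> conf, String appId) {
    return baseDir(conf).resolve(appId).resolve("logs");
  }
}
```

Falling back to the root staging directory rather than the application-specific one follows the earlier discussion on this jira, since the application-specific directory is likely to be deleted.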
[jira] [Commented] (TEZ-1072) Consolidate monitoring APIs in DAGClient
[ https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096616#comment-14096616 ] Siddharth Seth commented on TEZ-1072: - +1. Looks good. > Consolidate monitoring APIs in DAGClient > > > Key: TEZ-1072 > URL: https://issues.apache.org/jira/browse/TEZ-1072 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jonathan Eagles >Priority: Blocker > Labels: api > Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch, TEZ-1072-v1.patch > > > Rename waitForCompletionWithAllStatusUpdates - was this meant to be > waitForCompletionWithAllVertexUpdates > Reduce the number of methods exposed - waitForCompletion, > waitForCompletionWithStatusUpdates(@Nullable Set vertices, > @Nullable Set statusGetOpts), > waitForCompletionWithAllStatusUpdates(@Nullable Set > statusGetOpts)
[jira] [Updated] (TEZ-1416) tez-api project javadoc/annotations review and clean up.
[ https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1416: Attachment: TEZ-1416.1.patch.review Patch file with comments / questions marked with XXX. Also, TezCounters, I believe, needs some work. I think we should mark these as Unstable. JobCounters as @Private - and potentially move to MapReduce. Misc in TezConfiguration - all the constants (TEZ_PREFIX, etc) should be private. The following should likely be private: TEZ_AM_CANCEL_DELEGATION_TOKEN - private, TEZ_AM_COUNTERS_MAX_KEYS - private, TEZ_AM_PLAN_REMOTE_PATH, TEZ_AM_INLINE_TASK_EXECUTION_ENABLED. The following should be private, and should also live in TezConstants instead of TezConfiguration - is there a separate jira for that? There are more than just the ones listed below, so a separate jira would be better.
{code}
public static final String TEZ_PB_BINARY_CONF_NAME = "tez-conf.pb";
public static final String TEZ_PB_PLAN_BINARY_NAME = "tez-dag.pb";
public static final String TEZ_PB_PLAN_TEXT_NAME = "tez-dag.pb.txt";
public static final String TEZ_CONTAINER_LOG4J_PROPERTIES_FILE = "tez-container-log4j.properties";
public static final String TEZ_CONTAINER_LOGGER_NAME = "CLA";
public static final String TEZ_ROOT_LOGGER_NAME = "tez.root.logger";
public static final String TEZ_CONTAINER_LOG_FILE_NAME = "syslog";
public static final String TEZ_CONTAINER_ERR_FILE_NAME = "stderr";
public static final String TEZ_CONTAINER_OUT_FILE_NAME = "stdout";
{code}
TEZ_AM_GROUPING* - should these be renamed to just TEZ_GROUPING_*? CompositeDataMovementEvent, DataMovementEvent and some others are missing public/private annotations. InputInitializer and InputInitializerContext need to move out of runtime.api. Annotations or comments are needed on the proto files. Some of this isn't related to this jira and should be a follow-up - otherwise this patch grows too big. > tez-api project javadoc/annotations review and clean up.
> > > Key: TEZ-1416 > URL: https://issues.apache.org/jira/browse/TEZ-1416 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1416.1.patch, TEZ-1416.1.patch.review > >
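One way to follow the suggestion of moving framework-internal file names out of TezConfiguration is a non-instantiable constants holder. The sketch below is hypothetical: `InternalTezConstants` and the `@PrivateApi` marker (a stand-in for Hadoop's `@InterfaceAudience.Private`) are illustrative names, and only the string values are taken from the review comment:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for Hadoop's @InterfaceAudience.Private, so the sketch compiles alone.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PrivateApi {}

// Framework-internal names live in a constants holder instead of the
// user-facing configuration class; no instances can be created.
@PrivateApi
final class InternalTezConstants {
  private InternalTezConstants() {}

  static final String TEZ_PB_BINARY_CONF_NAME = "tez-conf.pb";
  static final String TEZ_PB_PLAN_BINARY_NAME = "tez-dag.pb";
  static final String TEZ_PB_PLAN_TEXT_NAME = "tez-dag.pb.txt";
  static final String TEZ_CONTAINER_LOG_FILE_NAME = "syslog";
}
```

Keeping such names package-private or audience-private means they drop out of the public API surface while remaining usable inside the framework.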
[jira] [Updated] (TEZ-1400) Reducers stuck when enabling auto-reduce parallelism (MRR case)
[ https://issues.apache.org/jira/browse/TEZ-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1400: -- Attachment: TEZ-1400.2.patch Yes, "key":"tez.am.shuffle-vertex-manager.enable.auto-parallel","value":"true" is explicitly enabled only for certain vertices by Hive. Further debugging revealed that TezConfiguration picked up the wrong "tez-site.xml" from the classpath, which had the min/max settings as 0.0. The TezConfiguration class gets initialized from HiveSplitGenerator (when it tries to compute the waves). Picking up the wrong tez-site.xml caused the issue, and from that point onwards the configuration would keep loading tez-site.xml with the wrong values. The attached patch fixes the following: 1. TezConfiguration.java should not load tez-site.xml during class initialization; it should load it via the constructor. 2. If the payload is null, the VertexManager gets the DAG conf instead of the amConf. > Reducers stuck when enabling auto-reduce parallelism (MRR case) > --- > > Key: TEZ-1400 > URL: https://issues.apache.org/jira/browse/TEZ-1400 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Labels: performance > Attachments: TEZ-1400.1.patch, TEZ-1400.2.patch, dag.dot > > > In M -> R1 -> R2 case, if R1 is optimized by auto-parallelism R2 gets stuck > waiting for events. > e.g > Map 1: 0/1 Map 2: -/- Map 5: 0/1 Map 6: 0/1 Map 7: 0/1 > Reducer 3: 0/23 Reducer 4: 0/1 > ... > ... > Map 1: 1/1 Map 2: 148(+13)/161 Map 5: 1/1 Map 6: 1/1 Map > 7: 1/1 Reducer 3: 0(+3)/3 Reducer 4: 0(+1)/1 <== Auto reduce > parallelism kicks in > .. > Map 1: 1/1 Map 2: 161/161 Map 5: 1/1 Map 6: 1/1 Map 7: 1/1 > Reducer 3: 3/3 Reducer 4: 0(+1)/1 > Job is stuck waiting for events in Reducer 4.
> [fetcher [Reducer_3] #23] > org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3 > of 23 at 0.02 MB/s) <=== *Waiting for 20 more partitions, even though > Reducer3 has been optimized to use 3 reducers
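The first fix in the patch can be illustrated with two hypothetical classes: one that captures site defaults in a static initializer, and one that reads them in its constructor. The `siteSource` supplier stands in for "whichever tez-site.xml the classpath resolves to", and the key name is made up; this is a sketch of the class-initialization pitfall, not the actual TezConfiguration code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of static-initializer loading vs constructor loading.
final class SiteConfigSketch {
  // Stand-in for "the tez-site.xml currently visible on the classpath".
  static Supplier<Map<String, String>> siteSource = HashMap::new;

  private SiteConfigSketch() {}

  // Anti-pattern: SITE is captured once, by whichever caller first
  // initializes the class, and is never re-read afterwards.
  static final class StaticLoaded {
    static final Map<String, String> SITE = new HashMap<>(siteSource.get());
    final Map<String, String> props = new HashMap<>(SITE);
  }

  // Fix: each instance reads the current source in its constructor.
  static final class ConstructorLoaded {
    final Map<String, String> props;

    ConstructorLoaded() {
      this.props = new HashMap<>(siteSource.get());
    }
  }
}
```

With static loading, whichever caller triggers class initialization first (HiveSplitGenerator in the report above) fixes the values for the lifetime of the JVM; constructor loading re-reads the source for every new instance.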
[jira] [Commented] (TEZ-1416) tez-api project javadoc/annotations review and clean up.
[ https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096574#comment-14096574 ] Hitesh Shah commented on TEZ-1416: -- Mostly looks good. Should likely wait for [~sseth] to see if he has any comments on which classes should/should not be marked stable. Also, I saw TEZ_AM_CONTAINER_REUSE_LOCALITY_DELAY_ALLOCATION_MILLIS_DEFAULT getting changed to 250 - is that intentional? Does not seem right to set such a low value for the general case? > tez-api project javadoc/annotations review and clean up. > > > Key: TEZ-1416 > URL: https://issues.apache.org/jira/browse/TEZ-1416 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1416.1.patch > >
[jira] [Commented] (TEZ-1411) Address initial feedback on swimlanes
[ https://issues.apache.org/jira/browse/TEZ-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096528#comment-14096528 ] Gopal V commented on TEZ-1411: -- #1 - this can only be done if everyone runs ATS in their cluster. > Address initial feedback on swimlanes > - > > Key: TEZ-1411 > URL: https://issues.apache.org/jira/browse/TEZ-1411 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Gopal V >Priority: Blocker > Fix For: 0.5.0 > > > A few other good-to-have things > 1) A wrapper script that takes care of the command chaining with a single > appId as input from the user. > 2) A legend in the README or in the svg itself about what is what.
[jira] [Updated] (TEZ-1418) Provide Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH
[ https://issues.apache.org/jira/browse/TEZ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subroto Sanyal updated TEZ-1418: Summary: Provide Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH (was: Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH) > Provide Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH > --- > > Key: TEZ-1418 > URL: https://issues.apache.org/jira/browse/TEZ-1418 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Subroto Sanyal >Priority: Blocker > Fix For: 0.5.0 > > > As part of the fix for the issue TEZ-1127, two new configurations have been > introduced: > # _TEZ_AM_LAUNCH_ENV_ > # _TEZ_TASK_LAUNCH_ > Ideally these properties should be configured with a default value of: > "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" > as is the case for _mapreduce.admin.user.env_ > The default value for these properties is set to "" (empty string). > Now users have to explicitly set these values from the application code to use > the native libs (like for compression). > From Hitesh: > {quote}As commented on TEZ-1127, it is a question as to what the default > should be - whether HADOOP_COMMON_HOME or HADOOP_PREFIX and to some extent, > it needs to handle Windows deployments too.{quote}
[jira] [Created] (TEZ-1418) Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH
Subroto Sanyal created TEZ-1418: --- Summary: Default value for TEZ_AM_LAUNCH_ENV and TEZ_TASK_LAUNCH Key: TEZ-1418 URL: https://issues.apache.org/jira/browse/TEZ-1418 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.0 Reporter: Subroto Sanyal Priority: Blocker Fix For: 0.5.0 As part of the fix for the issue TEZ-1127, two new configurations have been introduced: # _TEZ_AM_LAUNCH_ENV_ # _TEZ_TASK_LAUNCH_ Ideally these properties should be configured with a default value of: "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" as is the case for _mapreduce.admin.user.env_ The default value for these properties is set to "" (empty string). Now users have to explicitly set these values from the application code to use the native libs (like for compression). From Hitesh: {quote}As commented on TEZ-1127, it is a question as to what the default should be - whether HADOOP_COMMON_HOME or HADOOP_PREFIX and to some extent, it needs to handle Windows deployments too.{quote}
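To illustrate what a default such as "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native" has to go through, here is a hypothetical parser for a comma-separated `KEY=VALUE` list that expands Unix-style `$VAR` references from a supplied environment map. It is not Tez or Hadoop code: it expands variables eagerly on the client, whereas a real launcher may leave expansion to the container's launch script, and Windows-style `%VAR%` references (the open question raised above) are not handled:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses "K1=V1,K2=V2" and expands $VAR from the given environment.
final class LaunchEnvSketch {
  private static final Pattern UNIX_VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

  private LaunchEnvSketch() {}

  static Map<String, String> parse(String spec, Map<String, String> env) {
    Map<String, String> result = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return result; // the current "" default: nothing gets set
    }
    for (String entry : spec.split(",")) {
      int eq = entry.indexOf('=');
      if (eq <= 0) {
        continue; // skip malformed entries
      }
      String key = entry.substring(0, eq).trim();
      result.put(key, expand(entry.substring(eq + 1).trim(), env));
    }
    return result;
  }

  // Replaces each $VAR with its value from env; unknown variables become "".
  private static String expand(String value, Map<String, String> env) {
    Matcher m = UNIX_VAR.matcher(value);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String replacement = env.getOrDefault(m.group(1), "");
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
```

With `HADOOP_COMMON_HOME=/opt/hadoop` in the environment, the proposed default resolves to `LD_LIBRARY_PATH=/opt/hadoop/lib/native`; with the current empty-string default, no variable is set at all.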
[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096517#comment-14096517 ] Chen He commented on TEZ-1337: -- I am confused by some comments in the previous discussion: On 01/Aug/14 {quote} I think we should make the following changes. Introduce a local dir for local mode. This will be the base directory for the filesystem that is being used for local mode. The staging directory, however, must continue to work since the Client writes into this and the AM reads contents back. *Fallback to using the base staging-dir if this property is not configured.* The root path would end up being the root-staging dir, and not the application specific staging directory, since that is likely to be deleted. {quote} On 08/Aug/14 {quote} The local directory for AM/tasks: This is the scratch area that tasks rely upon. This is what would end up being configurable via this jira, *instead of relying on the staging directory.* {quote} If this JIRA has already been implemented by TEZ-1393, I will close it. > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He > Attachments: TEZ-1337.patch > > > staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. IAC, these should be cleaned > up - at least the intermediate data after a local session executes.
[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096516#comment-14096516 ] Tsuyoshi OZAWA commented on TEZ-1390: - Thanks for sharing, Jonathan! [~bikassaha], I attached a first patch that makes UserPayload accept a ByteBuffer and makes getPayload return a ByteBuffer. Could you take a look? > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: TEZ-1390.1.patch, pig.payload.txt > > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact.
[jira] [Updated] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated TEZ-1390: Attachment: TEZ-1390.1.patch > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: TEZ-1390.1.patch, pig.payload.txt > > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact.
[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096513#comment-14096513 ] Tsuyoshi OZAWA commented on TEZ-1390: - ByteString#asReadOnlyByteBuffer can create a read-only ByteBuffer without copying. > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: pig.payload.txt > > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact.
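A sketch of keeping byte[] internally while exposing ByteBuffer in the API, using only `java.nio`. `PayloadSketch` and its methods are hypothetical names, not the actual UserPayload API. `ByteBuffer.wrap(...).asReadOnlyBuffer()` plays the same zero-copy role for a byte[] that `ByteString#asReadOnlyByteBuffer` plays for a protobuf ByteString:

```java
import java.nio.ByteBuffer;

// Hypothetical payload holder: byte[] internally, ByteBuffer at the API boundary.
final class PayloadSketch {
  private final byte[] bytes; // internal representation stays byte[]

  PayloadSketch(byte[] bytes) {
    this.bytes = bytes;
  }

  /** API view: read-only and backed by the same array, so no copy is made. */
  ByteBuffer getPayload() {
    return ByteBuffer.wrap(bytes).asReadOnlyBuffer();
  }

  /** Converts an incoming ByteBuffer to byte[] without moving the caller's position. */
  static byte[] toBytes(ByteBuffer buffer) {
    ByteBuffer dup = buffer.duplicate(); // independent position/limit, shared content
    byte[] out = new byte[dup.remaining()];
    dup.get(out);
    return out;
  }
}
```

Because a read-only buffer exposes no accessible backing array, converting an incoming ByteBuffer back to byte[] has to copy; `duplicate()` keeps the caller's position and limit untouched.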
[jira] [Commented] (TEZ-1072) Consolidate monitoring APIs in DAGClient
[ https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096511#comment-14096511 ] Jonathan Eagles commented on TEZ-1072: - [~sseth], [~bikassaha], I incorporated the feedback; it is much nicer with only 2 wait functions. - removed waitForCompletionWithStatusUpdates - renamed waitForCompletionWithAllStatusUpdates to waitForCompletionWithStatusUpdates - made sure waitForCompletion does not log updates > Consolidate monitoring APIs in DAGClient > > > Key: TEZ-1072 > URL: https://issues.apache.org/jira/browse/TEZ-1072 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jonathan Eagles >Priority: Blocker > Labels: api > Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch, TEZ-1072-v1.patch > > > Rename waitForCompletionWithAllStatusUpdates - was this meant to be > waitForCompletionWithAllVertexUpdates > Reduce the number of methods exposed - waitForCompletion, > waitForCompletionWithStatusUpdates(@Nullable Set vertices, > @Nullable Set statusGetOpts), > waitForCompletionWithAllStatusUpdates(@Nullable Set > statusGetOpts)
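The consolidated shape (one quiet wait, one wait that reports status) can be sketched with hypothetical stand-ins. `MonitoringClientSketch`, `StatusOpt` and `DagState` are illustrative and do not mirror the real DAGClient signatures exactly; a real client would also sleep between polls instead of spinning:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins, not the real Tez types.
enum StatusOpt { GET_COUNTERS }

enum DagState { RUNNING, SUCCEEDED }

abstract class MonitoringClientSketch {
  final List<String> log = new ArrayList<>();

  /** One poll of the DAG state; opts may be null. */
  abstract DagState poll(Set<StatusOpt> opts);

  /** Blocks until the DAG leaves RUNNING; does not log updates. */
  DagState waitForCompletion() {
    DagState s;
    while ((s = poll(null)) == DagState.RUNNING) {
      // quiet wait; a real client would sleep here
    }
    return s;
  }

  /** Same wait, but records every status update it observes. */
  DagState waitForCompletionWithStatusUpdates(Set<StatusOpt> opts) {
    DagState s;
    do {
      s = poll(opts);
      log.add("status=" + s + (opts == null ? "" : " opts=" + opts));
    } while (s == DagState.RUNNING);
    return s;
  }
}

// Simulated client: RUNNING for a fixed number of polls, then SUCCEEDED.
final class FakeClient extends MonitoringClientSketch {
  private int remaining;

  FakeClient(int runningPolls) {
    this.remaining = runningPolls;
  }

  @Override
  DagState poll(Set<StatusOpt> opts) {
    return remaining-- > 0 ? DagState.RUNNING : DagState.SUCCEEDED;
  }
}
```

Keeping the logging only in the status-update variant matches the last bullet above: waitForCompletion stays silent.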
[jira] [Updated] (TEZ-1072) Consolidate monitoring APIs in DAGClient
[ https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-1072: - Attachment: TEZ-1072-v1.patch > Consolidate monitoring APIs in DAGClient > > > Key: TEZ-1072 > URL: https://issues.apache.org/jira/browse/TEZ-1072 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jonathan Eagles >Priority: Blocker > Labels: api > Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch, TEZ-1072-v1.patch > > > Rename waitForCompletionWithAllStatusUpdates - was this meant to be > waitForCompletionWithAllVertexUpdates > Reduce the number of methods exposed - waitForCompletion, > waitForCompletionWithStatusUpdates(@Nullable Set vertices, > @Nullable Set statusGetOpts), > waitForCompletionWithAllStatusUpdates(@Nullable Set > statusGetOpts)
[jira] [Updated] (TEZ-1409) Change MRInputConfigurer, MROutputConfigurer to accept specific classes, instead of a generic class
[ https://issues.apache.org/jira/browse/TEZ-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1409: Attachment: TEZ-1409.1.txt Patch makes the following changes. - Makes InputFormat and path nullable - InputFormat and useNewApi properties are not updated if an inputFormat is not specified. - Renames groupSplitsInAM to groupSplits - that's more generic. - Removes new JobConf(conf) - was this required for any specific reason, other than potentially not changing the original conf? - Removes the call to MRHelpers.setApi*. That's being taken care of anyway. I wanted to split createConfigurer(Configuration conf, @Nullable Class inputFormat) into two overloads, one accepting a mapred InputFormat class and one accepting a mapreduce InputFormat class - but that isn't an option since they have the same runtime signature after type erasure. Left that as is. Similarly for the Outputs. [~bikassaha], please review. > Change MRInputConfigurer, MROutputConfigurer to accept specific classes, > instead of a generic class > --- > > Key: TEZ-1409 > URL: https://issues.apache.org/jira/browse/TEZ-1409 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1409.1.txt > > > Separate methods to accept either mapred or mapreduce InputFormat. Similarly > for the Output. > This generates compile time errors when the wrong class is passed to these methods. > Ran into this on the first iteration of TEZ-1407, where I had set the wrong > class (a committer instead of an OF). > Also, ideally these should be @Nullable - in case the user has already set > them up correctly in the Configuration.
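The erasure problem can be sidestepped by giving the two variants different names, which keeps the compile-time check that catches mistakes like passing a committer class. This is a hypothetical sketch: `OldApiInputFormat` and `NewApiInputFormat` stand in for the mapred and mapreduce InputFormat types, and the Configuration argument is omitted:

```java
// Hypothetical stand-ins for org.apache.hadoop.mapred.InputFormat and
// org.apache.hadoop.mapreduce.InputFormat, so the sketch compiles on its own.
interface OldApiInputFormat {}

abstract class NewApiInputFormat {}

final class InputConfigSketch {
  final String formatClass;
  final boolean useNewApi;

  private InputConfigSketch(String formatClass, boolean useNewApi) {
    this.formatClass = formatClass;
    this.useNewApi = useNewApi;
  }

  // Two overloads named createConfigurer(Class<? extends A>) and
  // createConfigurer(Class<? extends B>) would not compile: both erase to
  // createConfigurer(Class). Distinct names keep the static type check.
  static InputConfigSketch forMapredFormat(Class<? extends OldApiInputFormat> format) {
    return new InputConfigSketch(format.getName(), false);
  }

  static InputConfigSketch forMapreduceFormat(Class<? extends NewApiInputFormat> format) {
    return new InputConfigSketch(format.getName(), true);
  }
}

final class MyOldFormat implements OldApiInputFormat {}

final class MyNewFormat extends NewApiInputFormat {}
```

Passing any class that is not an InputFormat of the expected API (for example an OutputCommitter) to either method is rejected by the compiler rather than failing at runtime.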
[jira] [Commented] (TEZ-1334) Annotate all non public classes in tez-runtime-library with @private
[ https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096493#comment-14096493 ] Bikas Saha commented on TEZ-1334: - I am guessing this covers MRInput/MROutput that are in tez-mapreduce also. > Annotate all non public classes in tez-runtime-library with @private > > > Key: TEZ-1334 > URL: https://issues.apache.org/jira/browse/TEZ-1334 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > > This prevents javadoc from being generated. > Alternative would be to mark classes explicitly public using annotation.
[jira] [Commented] (TEZ-1416) tez-api project javadoc/annotations review and clean up.
[ https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096492#comment-14096492 ] Bikas Saha commented on TEZ-1416: - [~sseth] [~hitesh] please review. > tez-api project javadoc/annotations review and clean up. > > > Key: TEZ-1416 > URL: https://issues.apache.org/jira/browse/TEZ-1416 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1416.1.patch > >
[jira] [Created] (TEZ-1417) Rename *Configurer
Siddharth Seth created TEZ-1417: --- Summary: Rename *Configurer Key: TEZ-1417 URL: https://issues.apache.org/jira/browse/TEZ-1417 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Priority: Blocker From offline feedback from [~bikassaha], [~acmurthy] and [~hagleitn] - this needs to be renamed. Something like Configurator, as Bikas had earlier suggested, or ConfigBuilder, which I like more. This can be done as a last refactor before 0.5 since it's very disruptive to patches in progress.
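For reference, a ConfigBuilder-style API usually means a fluent builder whose setters return the builder and whose `build()` produces the final configuration. Everything in this sketch (class name, method names, keys) is hypothetical and only illustrates the naming being discussed:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fluent builder; names and keys are illustrative only.
final class ExampleInputConfigBuilder {
  private final Map<String, String> conf = new LinkedHashMap<>();

  private ExampleInputConfigBuilder() {}

  static ExampleInputConfigBuilder newBuilder() {
    return new ExampleInputConfigBuilder();
  }

  ExampleInputConfigBuilder setInputPath(String path) {
    conf.put("example.input.path", path);
    return this; // returning the builder enables call chaining
  }

  ExampleInputConfigBuilder groupSplits(boolean value) {
    conf.put("example.input.group-splits", Boolean.toString(value));
    return this;
  }

  /** Returns an immutable snapshot; later builder calls do not affect it. */
  Map<String, String> build() {
    return Collections.unmodifiableMap(new LinkedHashMap<>(conf));
  }
}
```

A typical call reads `ExampleInputConfigBuilder.newBuilder().setInputPath("/in").groupSplits(false).build()`, which is the reading that makes "Builder" a more descriptive suffix than "Configurer".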
[jira] [Updated] (TEZ-1416) tez-api project javadoc/annotations review and clean up.
[ https://issues.apache.org/jira/browse/TEZ-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1416: Attachment: TEZ-1416.1.patch The attached patch cleans up the javadoc. It adds the Public annotation to all public classes and Private to the rest. There are a couple of TODOs that I will address either in this jira or in a follow-up. ObjectRegistry and LogUtils package moved. > tez-api project javadoc/annotations review and clean up. > > > Key: TEZ-1416 > URL: https://issues.apache.org/jira/browse/TEZ-1416 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1416.1.patch > >
[jira] [Commented] (TEZ-1334) Annotate all non public classes in tez-runtime-library with @private
[ https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096488#comment-14096488 ] Bikas Saha commented on TEZ-1334: - TEZ-1416 covers API. This can cover runtime-library. > Annotate all non public classes in tez-runtime-library with @private > > > Key: TEZ-1334 > URL: https://issues.apache.org/jira/browse/TEZ-1334 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > > This prevents javadoc from being generated. > Alternative would be to mark classes explicitly public using annotation.
[jira] [Commented] (TEZ-1330) Create a dist target which contains required jars
[ https://issues.apache.org/jira/browse/TEZ-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096487#comment-14096487 ] Bikas Saha commented on TEZ-1330: - minimal - since this is the minimal set of jars needed to run tez. > Create a dist target which contains required jars > - > > Key: TEZ-1330 > URL: https://issues.apache.org/jira/browse/TEZ-1330 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1330.1.wip.txt > > > Comment from [~rohini] on TEZ-1300 > bq. The tez-dist now only contains tez-0.5.0-SNAPSHOT.tar.gz. Can you retain > the directory structure with the individual jars as well? The pig > client needs the individual jars in the classpath. It is convenient to > compile tez and point to the tez-dist directory for e2e testing. Without that > we will have to do an extra step of untarring it, which is an inconvenience during > development.
[jira] [Updated] (TEZ-1334) Annotate all non public classes in tez-runtime-library with @private
[ https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1334: Summary: Annotate all non public classes in tez-runtime-library with @private (was: Annotate all non public classes in tez-api/tez-runtime-library with @private) > Annotate all non public classes in tez-runtime-library with @private > > > Key: TEZ-1334 > URL: https://issues.apache.org/jira/browse/TEZ-1334 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > > This prevents javadoc from being generated. > Alternative would be to mark classes explicitly public using annotation.
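A quick way to find classes that still lack an audience annotation is a reflection check over a list of classes. This sketch uses hypothetical stand-ins `@AudiencePublic` / `@AudiencePrivate` for Hadoop's `@InterfaceAudience.Public` / `@InterfaceAudience.Private` so that it compiles on its own; the stand-ins use RUNTIME retention so reflection can see them, and the class list is illustrative:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Hadoop's audience annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AudiencePublic {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AudiencePrivate {}

final class AudienceCheck {
  private AudienceCheck() {}

  /** Returns the simple names of classes that carry neither audience annotation. */
  static List<String> unannotated(Class<?>... classes) {
    List<String> missing = new ArrayList<>();
    for (Class<?> c : classes) {
      boolean annotated = c.isAnnotationPresent(AudiencePublic.class)
          || c.isAnnotationPresent(AudiencePrivate.class);
      if (!annotated) {
        missing.add(c.getSimpleName());
      }
    }
    return missing;
  }
}

// Illustrative classes: one public API type, one internal helper, one forgotten.
@AudiencePublic
class SomeInput {}

@AudiencePrivate
class SomeShuffleHelper {}

class ForgottenHelper {}
```

Run over the classes of a module, a non-empty result lists exactly the classes that still need a Public or Private marker before the javadoc can be filtered.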
[jira] [Comment Edited] (TEZ-684) Uber/Local modes for Tez
[ https://issues.apache.org/jira/browse/TEZ-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096467#comment-14096467 ] Siddharth Seth edited comment on TEZ-684 at 8/14/14 2:28 AM: - LocalResources are not handled in local mode. I'm not sure if there's a jira for this or not. It's a fairly complicated and far-reaching change to get something like this to work. It will require the concept of a working directory - which would then have to be explicitly used. In most cases, this isn't a problem since required libraries are already part of the client / AM. For custom resources, though, this is problematic. That's the main reason SplitsOnClient_DistCache does not function. was (Author: sseth): LocalResources are not handled in local mode. I'm not sure if there's a jira for this or not. It's a fairly complicated, and far-reaching change to get something like this to work. Will require the concept of a working directory - which would then have to be explicitly used. > Uber/Local modes for Tez > > > Key: TEZ-684 > URL: https://issues.apache.org/jira/browse/TEZ-684 > Project: Apache Tez > Issue Type: New Feature >Reporter: Chen He >Assignee: Chen He > Attachments: TEZ-684-2014-7-21.patch, TEZ-684.patch, TEZ-684.patch, > TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, > TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, > TezUberModeDesignDraft.png > > > Similar to MapReduce Uber-mode in Yarn, we plan to create an Uber-mode for > Tez. It runs all tasks locally in one process. > Our target is to start DAGAppMaster in a local JVM and let it run all tasks in > one process. > Here is my design: > Once a user submits a DAG, Tez starts an instance of DAGAppMaster. This > DAGAppMaster will check TezConfiguration before instantiating > ContainerLauncher. If "is_Uber" is true, DAGAppMaster creates a > LocalContainerLauncher.
LocalTaskScheduler and LocalTaskSchedulerEventHandler > will call LocalContainerLauncher to run all tasks one by one in a single JVM. > Communications between ResourceManager and local classes (DAGAppMaster, > LocalContainerLauncher, LocalTaskScheduler, and > LocalTaskSchedulerEventHandler) are muted.
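The proposed LocalContainerLauncher behaviour (run every task one by one inside the AM's JVM, without talking to the ResourceManager) can be sketched with a single-threaded executor. The `TaskLauncher` interface and `LauncherFactory` below are hypothetical and much simpler than Tez's real launcher and scheduler APIs:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical launcher abstraction; not Tez's ContainerLauncher API.
interface TaskLauncher {
  void launch(Runnable task);

  /** Stops accepting tasks and waits for queued ones; true if all finished. */
  boolean shutdownAndWait();
}

// Runs tasks one by one on a single thread inside the current JVM; no RM involved.
final class LocalTaskLauncher implements TaskLauncher {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  @Override
  public void launch(Runnable task) {
    executor.execute(task); // queued FIFO, executed sequentially
  }

  @Override
  public boolean shutdownAndWait() {
    executor.shutdown();
    try {
      return executor.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}

final class LauncherFactory {
  private LauncherFactory() {}

  /** Mirrors the "check the config flag before instantiating the launcher" step. */
  static TaskLauncher create(boolean uberMode) {
    if (uberMode) {
      return new LocalTaskLauncher();
    }
    throw new UnsupportedOperationException("cluster launcher is out of scope for this sketch");
  }
}
```

A single-thread executor runs submitted tasks in submission order, which matches the "one by one in a single JVM" description; the cluster path is deliberately left out.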
[jira] [Comment Edited] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096462#comment-14096462 ] Siddharth Seth edited comment on TEZ-1337 at 8/14/14 2:29 AM: -- Not a blocker after TEZ-1393. The main issue users were facing was the user.dir being reset - and hence not having access to data which was generated previously. was (Author: sseth): Not a blocker after TEZ-1397. The main issue users were facing was the user.dir being reset - and hence not having access to data which was generated previously. > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He > Attachments: TEZ-1337.patch > > > staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. IAC, these should be cleaned > up - at least the intermediate data after a local session executes.
[jira] [Commented] (TEZ-684) Uber/Local modes for Tez
[ https://issues.apache.org/jira/browse/TEZ-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096467#comment-14096467 ] Siddharth Seth commented on TEZ-684: LocalResources are not handled in local mode. I'm not sure if there's a jira for this or not. It's a fairly complicated and far-reaching change to get something like this to work. It will require the concept of a working directory - which would then have to be explicitly used. > Uber/Local modes for Tez > > > Key: TEZ-684 > URL: https://issues.apache.org/jira/browse/TEZ-684 > Project: Apache Tez > Issue Type: New Feature >Reporter: Chen He >Assignee: Chen He > Attachments: TEZ-684-2014-7-21.patch, TEZ-684.patch, TEZ-684.patch, > TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, > TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, > TezUberModeDesignDraft.png > > > Similar to MapReduce Uber-mode in Yarn, we plan to create an Uber-mode for > Tez. It runs all tasks locally in one process. > Our target is to start DAGAppMaster in a local JVM and let it run all tasks in > one process. > Here is my design: > Once a user submits a DAG, Tez starts an instance of DAGAppMaster. This > DAGAppMaster will check TezConfiguration before instantiating > ContainerLauncher. If "is_Uber" is true, DAGAppMaster creates a > LocalContainerLauncher. LocalTaskScheduler and LocalTaskSchedulerEventHandler > will call LocalContainerLauncher to run all tasks one by one in a single JVM. > Communications between ResourceManager and local classes (DAGAppMaster, > LocalContainerLauncher, LocalTaskScheduler, and > LocalTaskSchedulerEventHandler) are muted.
[jira] [Commented] (TEZ-1132) Consistent naming of Input and Outputs
[ https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096463#comment-14096463 ] Siddharth Seth commented on TEZ-1132: - bq. Change KV to KeyValue in all names. Optional; the class names are already long. Also this can get confusing w.r.t. Readers, since some are KeyValue based and others are KeyValues based. If we evolve these to be RowBased at some point, that will just be a new set of Inputs/Outputs. bq. LocalOnFileSorterOutput should probably be removed. bq. LocalMergedInput should probably be moved out. +1 bq. Do we need the OnFile prefix on these? These could potentially write to HDFS? Agree. I think we should remove it. bq. Is the Shuffled prefix needed? The reader threads could potentially read from HDFS? Shuffled can be interpreted in several different ways - mapreduce shuffle, just moving data. Probably best to just remove it to avoid confusion. The proposed input names should also have KV/KeyValue. > Consistent naming of Input and Outputs > -- > > Key: TEZ-1132 > URL: https://issues.apache.org/jira/browse/TEZ-1132 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > > In some places we use Sorted Partitioned. In others we use Shuffled. We > should use a consistent naming scheme based on Sorted, Grouped, Partitioned > sub-terms so that the function is clear from the name.
[jira] [Updated] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1337: Priority: Major (was: Blocker) > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He > Attachments: TEZ-1337.patch > > > The staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. In any case, these should be cleaned > up, at least the intermediate data after a local session executes. -- This message was sent by Atlassian JIRA (v6.2#6252)
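A minimal sketch of the work-area idea in the issue description, assuming a hypothetical optional user-configured directory (null when unset) that falls back to the staging directory, with a per-session subdirectory holding intermediate data and logs. `LocalWorkArea` and its methods are illustrative names, not Tez classes, and only intermediate data is removed on close, matching the "at least the intermediate data" wording.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating the idea; not a Tez class.
final class LocalWorkArea implements AutoCloseable {
    private final Path sessionDir;

    // userWorkDir may be null, in which case the staging dir is used instead.
    LocalWorkArea(Path userWorkDir, Path stagingDir, String sessionId) throws IOException {
        Path base = (userWorkDir != null) ? userWorkDir : stagingDir;
        this.sessionDir = Files.createDirectories(base.resolve("local-" + sessionId));
    }

    Path root() {
        return sessionDir;
    }

    // Where tasks would write intermediate output (spills, shuffle data).
    Path intermediateDir() throws IOException {
        return Files.createDirectories(sessionDir.resolve("intermediate"));
    }

    // Where task logs would go; kept after the session for debugging.
    Path logDir() throws IOException {
        return Files.createDirectories(sessionDir.resolve("logs"));
    }

    // Deletes only the intermediate data, children before parents.
    @Override
    public void close() throws IOException {
        Path intermediate = sessionDir.resolve("intermediate");
        if (!Files.exists(intermediate)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(intermediate)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Reverse lexicographic order guarantees that every child path is deleted before its parent, which is why a plain sort suffices here.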
[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096462#comment-14096462 ] Siddharth Seth commented on TEZ-1337: - Not a blocker after TEZ-1397. The main issue users were facing was user.dir being reset, and hence losing access to data that was generated previously. > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He >Priority: Blocker > Attachments: TEZ-1337.patch > > > The staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. In any case, these should be cleaned > up, at least the intermediate data after a local session executes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1416) tez-api project javadoc/annotations review and clean up.
Bikas Saha created TEZ-1416: --- Summary: tez-api project javadoc/annotations review and clean up. Key: TEZ-1416 URL: https://issues.apache.org/jira/browse/TEZ-1416 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Priority: Blocker -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1330) Create a dist target which contains required jars
[ https://issues.apache.org/jira/browse/TEZ-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096461#comment-14096461 ] Siddharth Seth commented on TEZ-1330: - Suggestions? -withoutHadoop? > Create a dist target which contains required jars > - > > Key: TEZ-1330 > URL: https://issues.apache.org/jira/browse/TEZ-1330 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1330.1.wip.txt > > > Comment from [~rohini] on TEZ-1300 > bq. The tez-dist now only contains tez-0.5.0-SNAPSHOT.tar.gz. Can you also retain > the directory structure with the individual jars? The pig > client needs the individual jars in the classpath. It is convenient to > compile tez and point to the tez-dist directory for e2e testing. Without that > we will have to do the extra step of untarring it, which is an inconvenience during > development. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1246) Replace constructors with create() methods for DAG, Vertex, Edge etc in the API
[ https://issues.apache.org/jira/browse/TEZ-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1246: Priority: Blocker (was: Major) Target Version/s: 0.5.0 I think this change is worth making before 0.5. It gives us some control over evolving DAG, Vertex, etc. > Replace constructors with create() methods for DAG, Vertex, Edge etc in the > API > --- > > Key: TEZ-1246 > URL: https://issues.apache.org/jira/browse/TEZ-1246 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
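One way to read the "control over evolving" argument: with a private constructor and static create() factories, the class can add validation or new overloads later without breaking callers that compiled against the old constructor. The `Vertex` below is a hypothetical stand-in, not the Tez API, and treating -1 as "parallelism decided at runtime" is an assumption for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for an API class such as Vertex; not the Tez class.
final class Vertex {
    private final String name;
    private final int parallelism;

    // Private: callers cannot depend on the constructor signature.
    private Vertex(String name, int parallelism) {
        this.name = name;
        this.parallelism = parallelism;
    }

    // Factory method: free to validate or normalize arguments later.
    static Vertex create(String name, int parallelism) {
        Objects.requireNonNull(name, "name");
        if (parallelism < -1) {
            throw new IllegalArgumentException("parallelism must be >= -1");
        }
        return new Vertex(name, parallelism);
    }

    // Overload that can be added later without touching existing callers.
    // -1 is assumed here to mean "parallelism decided at runtime".
    static Vertex create(String name) {
        return create(name, -1);
    }

    String getName() {
        return name;
    }

    int getParallelism() {
        return parallelism;
    }
}
```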
[jira] [Updated] (TEZ-1334) Annotate all non public classes in tez-api/tez-runtime-library with @private
[ https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1334: Target Version/s: 0.5.0 > Annotate all non public classes in tez-api/tez-runtime-library with @private > > > Key: TEZ-1334 > URL: https://issues.apache.org/jira/browse/TEZ-1334 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > > This prevents javadoc from being generated. > Alternative would be to mark classes explicitly public using annotation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1334) Annotate all non public classes in tez-api/tez-runtime-library with @private
[ https://issues.apache.org/jira/browse/TEZ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1334: Priority: Blocker (was: Major) > Annotate all non public classes in tez-api/tez-runtime-library with @private > > > Key: TEZ-1334 > URL: https://issues.apache.org/jira/browse/TEZ-1334 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > > This prevents javadoc from being generated. > Alternative would be to mark classes explicitly public using annotation. -- This message was sent by Atlassian JIRA (v6.2#6252)
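The two policies in this issue differ in what happens when someone forgets an annotation: excluding classes marked private leaks unannotated internals into the javadoc, while including only classes marked public hides unannotated API. A minimal sketch of both filtering rules, using hypothetical local annotations (Hadoop ships comparable ones in `org.apache.hadoop.classification.InterfaceAudience`, but they are not used here).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotations, retained at runtime so reflection can see them.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface PrivateApi { }
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface PublicApi { }

// Sample classes for the two policies.
@PublicApi class ExampleApi { }
@PrivateApi class ExampleInternal { }
class ExampleUnannotated { }

final class JavadocFilter {
    enum Policy { EXCLUDE_PRIVATE, INCLUDE_ONLY_PUBLIC }

    // Decides whether a class would appear in the generated javadoc.
    static boolean include(Class<?> c, Policy policy) {
        switch (policy) {
            case EXCLUDE_PRIVATE:
                // Opt-out: everything is documented unless marked private.
                return !c.isAnnotationPresent(PrivateApi.class);
            case INCLUDE_ONLY_PUBLIC:
                // Opt-in: only explicitly public classes are documented.
                return c.isAnnotationPresent(PublicApi.class);
            default:
                throw new IllegalArgumentException(policy.toString());
        }
    }
}
```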
[jira] [Updated] (TEZ-1316) Document public interfaces/classes which are not meant to be implemented/extended
[ https://issues.apache.org/jira/browse/TEZ-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1316: Priority: Blocker (was: Critical) > Document public interfaces/classes which are not meant to be > implemented/extended > - > > Key: TEZ-1316 > URL: https://issues.apache.org/jira/browse/TEZ-1316 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
[ https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096429#comment-14096429 ] Bikas Saha commented on TEZ-1414: - There is no clear solution for the actual dirs on the Jenkins host. Creating a mini cluster can slow down this unit test. Can we mock any of this? If not, then please see if you can reuse one of the existing tests that already start a mini cluster, e.g. TestMRRJobsDAGApi, which is already doing a bunch of other tests. > Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass. > --- > > Key: TEZ-1414 > URL: https://issues.apache.org/jira/browse/TEZ-1414 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.5.0 > > Attachments: TEZ-1414.1.patch > > > The test passes locally but for some reason it fails on Jenkins. Disabling > temporarily until the issue gets worked out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
[ https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096389#comment-14096389 ] Hitesh Shah commented on TEZ-1414: -- [~pramachandran] The other approach would be to use a local resource path from HDFS which is under the control of the test and not left to the environment. You can launch a MiniDFSCluster and create files/dirs within it with the required path permissions set as needed. > Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass. > --- > > Key: TEZ-1414 > URL: https://issues.apache.org/jira/browse/TEZ-1414 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.5.0 > > Attachments: TEZ-1414.1.patch > > > The test passes locally but for some reason it fails on Jenkins. Disabling > temporarily until the issue gets worked out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1415) Merge various Util classes in Tez
Bikas Saha created TEZ-1415: --- Summary: Merge various Util classes in Tez Key: TEZ-1415 URL: https://issues.apache.org/jira/browse/TEZ-1415 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha TezCommonUtils, LogUtils. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
[ https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096377#comment-14096377 ] Prakash Ramachandran commented on TEZ-1414: --- Working on this. The reason for failing in Jenkins could be that one of the ancestor directories of the test directory does not have o+x permissions. Will check if java.io.tmpdir can be used; not sure where the temp directory for the Jenkins build is set to. > Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass. > --- > > Key: TEZ-1414 > URL: https://issues.apache.org/jira/browse/TEZ-1414 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.5.0 > > Attachments: TEZ-1414.1.patch > > > The test passes locally but for some reason it fails on Jenkins. Disabling > temporarily until the issue gets worked out. -- This message was sent by Atlassian JIRA (v6.2#6252)
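The o+x hypothesis can be checked directly on the build host. A minimal sketch, assuming the rule YARN applies when deciding whether a local resource can be PUBLIC: the file must be world-readable and every ancestor directory must grant others execute. `PublicVisibilityCheck` is a hypothetical helper that only inspects the local POSIX filesystem; it is not part of Tez or YARN.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical diagnostic helper; requires a POSIX filesystem.
final class PublicVisibilityCheck {
    // Returns the nearest ancestor directory lacking o+x, or null if all grant it.
    static Path firstBlockingAncestor(Path file) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        while (dir != null) {
            Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
            if (!perms.contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return dir;
            }
            dir = dir.getParent();
        }
        return null;
    }

    // World-readable file plus a fully traversable ancestor chain.
    static boolean looksPublic(Path file) throws IOException {
        return Files.getPosixFilePermissions(file).contains(PosixFilePermission.OTHERS_READ)
            && firstBlockingAncestor(file) == null;
    }
}
```

Printing `firstBlockingAncestor` for the test's resource path on the Jenkins slave would show which directory, if any, breaks the chain.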
[jira] [Commented] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized
[ https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096356#comment-14096356 ] Jeff Zhang commented on TEZ-1345: - Attached the patch. [~hitesh] I made the following changes in the patch: * handle V_RouteEvent synchronously in VertexManager to make sure the RootInputDataInformationEvent is written to recovery before VertexInitializedEvent * add a unit test to verify that RootInputDataInformationEvent is written to recovery before VertexInitializedEvent > Add checks to guarantee all init events are written to recovery to consider > vertex initialized > -- > > Key: TEZ-1345 > URL: https://issues.apache.org/jira/browse/TEZ-1345 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: Tez-1345.patch > > > Related to an issue discovered in TEZ-1033 -- This message was sent by Atlassian JIRA (v6.2#6252)
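The ordering the patch enforces can be stated as a check over the recovery log: no root-input data-information event for a vertex may appear after that vertex's initialized event. A minimal sketch with a hypothetical, simplified event model (string types plus a vertex name), not Tez's recovery classes. It checks ordering only; it does not verify that the expected number of init events was written.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecoveryOrderCheck {
    // Hypothetical, simplified recovery log entry.
    static final class LogEntry {
        final String type;   // "ROOT_INPUT_DATA_INFO" or "VERTEX_INITIALIZED"
        final String vertex;

        LogEntry(String type, String vertex) {
            this.type = type;
            this.vertex = vertex;
        }
    }

    // True if every data-info event for a vertex precedes its initialized event.
    static boolean initEventsPrecedeInitialized(List<LogEntry> log) {
        Set<String> initialized = new HashSet<>();
        for (LogEntry e : log) {
            if (e.type.equals("VERTEX_INITIALIZED")) {
                initialized.add(e.vertex);
            } else if (e.type.equals("ROOT_INPUT_DATA_INFO") && initialized.contains(e.vertex)) {
                return false;
            }
        }
        return true;
    }
}
```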
[jira] [Commented] (TEZ-684) Uber/Local modes for Tez
[ https://issues.apache.org/jira/browse/TEZ-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096355#comment-14096355 ] Bikas Saha commented on TEZ-684: What happens to local resources that are part of the vertex but not part of the AM? How does the task find them in its working directory? Do tasks run in separate dirs? If no, then temp files from previous tasks would be visible to the next one. If yes, then do we symlink their local files into the different dirs? > Uber/Local modes for Tez > > > Key: TEZ-684 > URL: https://issues.apache.org/jira/browse/TEZ-684 > Project: Apache Tez > Issue Type: New Feature >Reporter: Chen He >Assignee: Chen He > Attachments: TEZ-684-2014-7-21.patch, TEZ-684.patch, TEZ-684.patch, > TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, > TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, TEZ-684.patch, > TezUberModeDesignDraft.png > > > Similar to MapReduce Uber-mode in YARN, we plan to create an Uber-mode for > Tez. It runs all tasks locally in one process. > Our target is to start the DAGAppMaster in a local JVM and let it run all tasks in > one process. > Here is my design: > Once a user submits a DAG, Tez starts an instance of DAGAppMaster. This > DAGAppMaster will check TezConfiguration before instantiating > ContainerLauncher. If "is_Uber" is true, DAGAppMaster creates a > LocalContainerLauncher. LocalTaskScheduler and LocalTaskSchedulerEventHandler > will call LocalContainerLauncher to run all tasks one by one in a single JVM. > Communications between ResourceManager and the local classes (DAGAppMaster, > LocalContainerLauncher, LocalTaskScheduler, and > LocalTaskSchedulerEventHandler) are disabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
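The separate-directories option raised here could look roughly like this: each task gets its own working directory, and every local resource (localized once into a shared cache) is symlinked in under its expected name, so tasks find resources by relative name without seeing each other's temp files. A minimal sketch; `TaskWorkDirs`, the cache layout, and the task ids are hypothetical, and symlink creation requires a filesystem that supports it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper; not part of Tez.
final class TaskWorkDirs {
    // Creates <root>/<taskId> and links each named resource from the shared cache.
    // Throws FileAlreadyExistsException if the same task dir is prepared twice.
    static Path prepare(Path root, String taskId, Map<String, Path> localizedResources)
            throws IOException {
        Path workDir = Files.createDirectories(root.resolve(taskId));
        for (Map.Entry<String, Path> e : localizedResources.entrySet()) {
            Path link = workDir.resolve(e.getKey());
            Files.createSymbolicLink(link, e.getValue().toAbsolutePath());
        }
        return workDir;
    }
}
```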
[jira] [Updated] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized
[ https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-1345: Attachment: Tez-1345.patch > Add checks to guarantee all init events are written to recovery to consider > vertex initialized > -- > > Key: TEZ-1345 > URL: https://issues.apache.org/jira/browse/TEZ-1345 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: Tez-1345.patch > > > Related to issue discovered in TEZ-1033 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1222) Add examples for uses of the API
[ https://issues.apache.org/jira/browse/TEZ-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096333#comment-14096333 ] Bikas Saha commented on TEZ-1222: - WordCount, OrderedWordCount and SimpleSessionExample are committed. The Intersect example already exists. Removing this from blocker status. > Add examples for uses of the API > > > Key: TEZ-1222 > URL: https://issues.apache.org/jira/browse/TEZ-1222 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > > Add examples of writing Input, Output, Processor. > Add examples of creating DAGs with different properties on the edges to > clarify use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1222) Add examples for uses of the API
[ https://issues.apache.org/jira/browse/TEZ-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1222: Priority: Major (was: Blocker) > Add examples for uses of the API > > > Key: TEZ-1222 > URL: https://issues.apache.org/jira/browse/TEZ-1222 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha > > Add examples of writing Input, Output, Processor. > Add examples of creating DAGs with different properties on the edges to > clarify use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-671) Support View/Modify ACLs for DAGs
[ https://issues.apache.org/jira/browse/TEZ-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096328#comment-14096328 ] Hitesh Shah commented on TEZ-671: - This involves new APIs. [~bikassaha] [~sseth] please review. > Support View/Modify ACLs for DAGs > - > > Key: TEZ-671 > URL: https://issues.apache.org/jira/browse/TEZ-671 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Hitesh Shah > Attachments: TEZ-671.2.patch, TEZ-671.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1072) Consolidate monitoring APIs in DAGClient
[ https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1072: Assignee: Jonathan Eagles > Consolidate monitoring APIs in DAGClient > > > Key: TEZ-1072 > URL: https://issues.apache.org/jira/browse/TEZ-1072 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jonathan Eagles >Priority: Blocker > Labels: api > Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch > > > Rename waitForCompletionWithAllStatusUpdates - was this meant to be > waitForCompletionWithAllVertexUpdates > Reduce the number of methods exposed - waitForCompletion, > waitForCompletionWithStatusUpdates(@Nullable Set vertices, > @Nullable Set statusGetOpts), > waitForCompletionWithAllStatusUpdates(@Nullable Set > statusGetOpts) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (TEZ-1132) Consistent naming of Input and Outputs
[ https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096298#comment-14096298 ] Jeff Zhang edited comment on TEZ-1132 at 8/13/14 11:29 PM: --- [~bikassaha] Sure, assigned it to you. was (Author: zjffdu): [~bikassaha] Sure, please take this. > Consistent naming of Input and Outputs > -- > > Key: TEZ-1132 > URL: https://issues.apache.org/jira/browse/TEZ-1132 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > > In some places we use Sorted Partitioned. In others we use Shuffled. We > should use a consistent naming scheme based on Sorted, Grouped, Partitioned > sub-terms so that the function is clear from the name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1132) Consistent naming of Input and Outputs
[ https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096298#comment-14096298 ] Jeff Zhang commented on TEZ-1132: - [~bikassaha] Sure, please take this. > Consistent naming of Input and Outputs > -- > > Key: TEZ-1132 > URL: https://issues.apache.org/jira/browse/TEZ-1132 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Jeff Zhang >Priority: Blocker > > In some places we use Sorted Partitioned. In others we use Shuffled. We > should use a consistent naming scheme based on Sorted, Grouped, Partitioned > sub-terms so that the function is clear from the name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1132) Consistent naming of Input and Outputs
[ https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-1132: Assignee: Bikas Saha (was: Jeff Zhang) > Consistent naming of Input and Outputs > -- > > Key: TEZ-1132 > URL: https://issues.apache.org/jira/browse/TEZ-1132 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > > In some places we use Sorted Partitioned. In others we use Shuffled. We > should use a consistent naming scheme based on Sorted, Grouped, Partitioned > sub-terms so that the function is clear from the name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
[ https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved TEZ-1414. - Resolution: Fixed Fix Version/s: 0.5.0 commit e2692f7cacd56fd2e8d2734f79ecfbb447e2fc2c Author: Bikas Saha Date: Wed Aug 13 16:20:59 2014 -0700 TEZ-1414. Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass(bikas) > Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass. > --- > > Key: TEZ-1414 > URL: https://issues.apache.org/jira/browse/TEZ-1414 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.5.0 > > Attachments: TEZ-1414.1.patch > > > The test passes locally but for some reason it fails on Jenkins. Disabling > temporarily until the issue gets worked out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
[ https://issues.apache.org/jira/browse/TEZ-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1414: Attachment: TEZ-1414.1.patch Committing trivial patch. > Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass. > --- > > Key: TEZ-1414 > URL: https://issues.apache.org/jira/browse/TEZ-1414 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.5.0 > > Attachments: TEZ-1414.1.patch > > > The test passes locally but for some reason it fails on Jenkins. Disabling > temporarily until the issue gets worked out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096284#comment-14096284 ] Chen He commented on TEZ-1337: -- No need, I can submit a patch today. > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He >Priority: Blocker > Attachments: TEZ-1337.patch > > > The staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. In any case, these should be cleaned > up, at least the intermediate data after a local session executes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1414) Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass.
Bikas Saha created TEZ-1414: --- Summary: Disable TestTezClientUtils.testLocalResourceVisibility to make builds pass. Key: TEZ-1414 URL: https://issues.apache.org/jira/browse/TEZ-1414 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha The test passes locally but for some reason it fails on Jenkins. Disabling temporarily until the issue gets worked out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1413) Fix build for TestTezClientUtils.testLocalResourceVisibility
Bikas Saha created TEZ-1413: --- Summary: Fix build for TestTezClientUtils.testLocalResourceVisibility Key: TEZ-1413 URL: https://issues.apache.org/jira/browse/TEZ-1413 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Prakash Ramachandran

Build failed in Jenkins: Tez-Build #565

org.apache.tez.client.TestTezClientUtils
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.131 sec <<< FAILURE!
testLocalResourceVisibility(org.apache.tez.client.TestTezClientUtils)  Time elapsed: 0.093 sec  <<< FAILURE!
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.tez.client.TestTezClientUtils.testLocalResourceVisibility(TestTezClientUtils.java:258)

Running org.apache.tez.common.security.TestTokenCache
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.944 sec
Running org.apache.tez.common.TestTezCommonUtils
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.949 sec
Running org.apache.tez.common.TestReflectionUtils
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.69 sec

Results :

Failed tests:
  TestTezClientUtils.testLocalResourceVisibility:258 null

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1337) Handling of local-dirs for Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096278#comment-14096278 ] Bikas Saha commented on TEZ-1337: - Is this a 0.5 blocker, or can it be followed up in 0.5.1? > Handling of local-dirs for Local Mode > - > > Key: TEZ-1337 > URL: https://issues.apache.org/jira/browse/TEZ-1337 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Chen He >Priority: Blocker > Attachments: TEZ-1337.patch > > > The staging dir is being used to write intermediate data. At some point, it may > be worthwhile to configure a separate work area. In any case, these should be cleaned > up, at least the intermediate data after a local session executes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1072) Consolidate monitoring APIs in DAGClient
[ https://issues.apache.org/jira/browse/TEZ-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096276#comment-14096276 ] Bikas Saha commented on TEZ-1072: - +1 on just removing public abstract DAGStatus waitForCompletionWithStatusUpdates(@Nullable Set vertices, @Nullable Set statusGetOpts) throws IOException, TezException; and renaming public abstract DAGStatus waitForCompletionWithAllStatusUpdates(FOO) to public abstract DAGStatus waitForCompletionWithStatusUpdates(FOO). [~jeagles] I see what you are saying about the Utils class, but IMO it seems more natural to be able to use the DAGClient directly. > Consolidate monitoring APIs in DAGClient > > > Key: TEZ-1072 > URL: https://issues.apache.org/jira/browse/TEZ-1072 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Priority: Blocker > Labels: api > Attachments: TEZ-1072-DAGCLIENTUTILS-v1.patch > > > Rename waitForCompletionWithAllStatusUpdates - was this meant to be > waitForCompletionWithAllVertexUpdates? > Reduce the number of methods exposed - waitForCompletion, > waitForCompletionWithStatusUpdates(@Nullable Set vertices, > @Nullable Set statusGetOpts), > waitForCompletionWithAllStatusUpdates(@Nullable Set > statusGetOpts) -- This message was sent by Atlassian JIRA (v6.2#6252)
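A minimal sketch of the consolidated monitoring surface being proposed: one plain waitForCompletion() and a single waitForCompletionWithStatusUpdates(opts) taking a nullable option set, both sharing one polling loop. `MonitoringClient`, its `State`/`StatusOpt` enums, the `onStatusUpdate` hook and the 10 ms poll interval are all hypothetical; this is not Tez's DAGClient, which reports progress differently.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical client illustrating the consolidated API shape; not Tez's DAGClient.
abstract class MonitoringClient {
    enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Stand-in for a status-options enum (e.g. "also fetch counters").
    enum StatusOpt { GET_COUNTERS }

    // Transport-specific status fetch (RPC to the AM, local call, ...).
    abstract State getState(Set<StatusOpt> opts);

    // Hook for reporting progress; the default does nothing.
    protected void onStatusUpdate(State state) { }

    protected long pollIntervalMs() {
        return 10;
    }

    // Blocks until a terminal state without reporting progress.
    State waitForCompletion() throws InterruptedException {
        return poll(EnumSet.noneOf(StatusOpt.class), false);
    }

    // The single "with status updates" variant; null opts means no extra options.
    State waitForCompletionWithStatusUpdates(Set<StatusOpt> opts) throws InterruptedException {
        return poll(opts == null ? EnumSet.noneOf(StatusOpt.class) : opts, true);
    }

    private State poll(Set<StatusOpt> opts, boolean report) throws InterruptedException {
        while (true) {
            State state = getState(opts);
            if (report) {
                onStatusUpdate(state);
            }
            if (state != State.RUNNING) {
                return state;
            }
            Thread.sleep(pollIntervalMs());
        }
    }
}
```

Keeping one loop behind both entry points is what makes dropping the vertex-set overload cheap: the remaining methods differ only in whether progress is reported.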
[jira] [Commented] (TEZ-1320) Remove getApplicationId from DAGClient
[ https://issues.apache.org/jira/browse/TEZ-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096271#comment-14096271 ] Bikas Saha commented on TEZ-1320: - The patch looks fine except for the following, which should be retained in the tests. It's important to test this.
{code}
 verify(yarnClient, times(1)).submitApplication(captor.capture());
- Assert.assertEquals(appId1, dagClient.getApplicationId());
{code}
Pig and Hive use this (mostly for logging and UI), I think: they use it to point users to the application for debugging. So I'm not sure about removing it. If we do remove it, then we should add something like dagClient.getExecutionContext() that provides a string that can be used for logging. > Remove getApplicationId from DAGClient > -- > > Key: TEZ-1320 > URL: https://issues.apache.org/jira/browse/TEZ-1320 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Jonathan Eagles >Priority: Blocker > Attachments: TEZ-1320-v1.patch > > > We should either get rid of this, or convert it to a String. Not sure why > this API needs to be exposed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1330) Create a dist target which contains required jars
[ https://issues.apache.org/jira/browse/TEZ-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096258#comment-14096258 ] Bikas Saha commented on TEZ-1330: - Tried it. Works for me. +1. Super useful for dev. Can we find a better name for "partial"? > Create a dist target which contains required jars > - > > Key: TEZ-1330 > URL: https://issues.apache.org/jira/browse/TEZ-1330 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1330.1.wip.txt > > > Comment from [~rohini] on TEZ-1300 > bq. The tez-dist now only contains tez-0.5.0-SNAPSHOT.tar.gz. Can you also retain > the directory structure with the individual jars? The pig > client needs the individual jars in the classpath. It is convenient to > compile tez and point to the tez-dist directory for e2e testing. Without that > we will have to do the extra step of untarring it, which is an inconvenience during > development. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-269) Fix ResourceMgrDelegate#getDelegationToken after YARN-868 is fixed
[ https://issues.apache.org/jira/browse/TEZ-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated TEZ-269: - Attachment: TEZ-269.patch Patch can be submitted once YARN-868 is committed > Fix ResourceMgrDelegate#getDelegationToken after YARN-868 is fixed > -- > > Key: TEZ-269 > URL: https://issues.apache.org/jira/browse/TEZ-269 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah > Labels: TEZ-0.3.0 > Attachments: TEZ-269.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-671) Support View/Modify ACLs for DAGs
[ https://issues.apache.org/jira/browse/TEZ-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-671: Attachment: TEZ-671.3.patch Comments addressed. > Support View/Modify ACLs for DAGs > - > > Key: TEZ-671 > URL: https://issues.apache.org/jira/browse/TEZ-671 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Hitesh Shah > Attachments: TEZ-671.2.patch, TEZ-671.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1132) Consistent naming of Input and Outputs
[ https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096248#comment-14096248 ] Bikas Saha commented on TEZ-1132: - LocalOnFileSorterOutput should probably be removed. OnFileSortedOutput -> OnFileOrderedPartitionedKVOutput Change KV to KeyValue in all names. Do we need the OnFile prefix on these? These could potentially write to HDFS? LocalMergedInput should probably be moved out. SortedGroupedMergedInput -> OrderedGroupedMergedInput ShuffledMergedInput -> ShuffledOrderedGroupedInput ShuffledMergedInputLegacy -> ShuffledOrderedGroupedInput [~zjffdu] Do you mind if I take this over? This may be easier to do in PST, as most of the Hive/Pig people whose code will break because of this are in the same time zone and could iterate over it faster and ask for help if needed. > Consistent naming of Input and Outputs > -- > > Key: TEZ-1132 > URL: https://issues.apache.org/jira/browse/TEZ-1132 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Jeff Zhang >Priority: Blocker > > In some places we use Sorted Partitioned. In others we use Shuffled. We > should use a consistent naming scheme based on Sorted, Grouped, Partitioned > sub-terms so that the function is clear from the name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (TEZ-1132) Consistent naming of Input and Outputs
[ https://issues.apache.org/jira/browse/TEZ-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096248#comment-14096248 ] Bikas Saha edited comment on TEZ-1132 at 8/13/14 10:41 PM: --- LocalOnFileSorterOutput should probably be removed. OnFileSortedOutput -> OnFileOrderedPartitionedKVOutput Change KV to KeyValue in all names. Do we need the OnFile prefix on these? These could potentially write to HDFS? LocalMergedInput should probably be moved out. SortedGroupedMergedInput -> OrderedGroupedMergedInput ShuffledMergedInput -> ShuffledOrderedGroupedInput ShuffledMergedInputLegacy -> ShuffledOrderedGroupedInput Is the Shuffled prefix needed? The reader threads could potentially read from HDFS? [~zjffdu] Do you mind if I take this over? This may be easier to do in PST, as most of the Hive/Pig people whose code will break because of this are in the same time zone and could iterate over it faster and ask for help if needed. was (Author: bikassaha): LocalOnFileSorterOutput should probably be removed. OnFileSortedOutput -> OnFileOrderedPartitionedKVOutput Change KV to KeyValue in all names. Do we need the OnFile prefix on these? These could potentially write to HDFS? LocalMergedInput should probably be moved out. SortedGroupedMergedInput -> OrderedGroupedMergedInput ShuffledMergedInput -> ShuffledOrderedGroupedInput ShuffledMergedInputLegacy -> ShuffledOrderedGroupedInput [~zjffdu] Do you mind if I take this over? This may be easier to do in PST, as most of the Hive/Pig people whose code will break because of this are in the same time zone and could iterate over it faster and ask for help if needed. > Consistent naming of Input and Outputs > -- > > Key: TEZ-1132 > URL: https://issues.apache.org/jira/browse/TEZ-1132 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Jeff Zhang >Priority: Blocker > > In some places we use Sorted Partitioned. In others we use Shuffled.
We > should use a consistent naming scheme based on Sorted, Grouped, Partitioned > sub-terms so that the function is clear from the name. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1347) Consolidate MRHelpers
[ https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096219#comment-14096219 ] Siddharth Seth commented on TEZ-1347: - Committing. > Consolidate MRHelpers > - > > Key: TEZ-1347 > URL: https://issues.apache.org/jira/browse/TEZ-1347 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt, > TEZ-1347.2.txt > > > - Remove methods which don't belong in MRHelpers and potentially move them to > TezHelpers. > - Get rid of methods which we don't expect/want users to use. > - Get rid of multiple variants of the same method, if these exist. > - Investigate other cleanup in MRHelpers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1412) Create a KeyValue(s)Reader hierarchy to show properties
Bikas Saha created TEZ-1412: --- Summary: Create a KeyValue(s)Reader hierarchy to show properties Key: TEZ-1412 URL: https://issues.apache.org/jira/browse/TEZ-1412 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha E.g. OrderedKeyValuesReader to show that the keys are ordered. This way users can cast to the appropriate reader instead of casting the inputs. This enables input impls to be changed transparently. -- This message was sent by Atlassian JIRA (v6.2#6252)
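A minimal sketch of the kind of reader hierarchy described above, assuming plain Java interfaces. `KeyValuesReader`, `OrderedKeyValuesReader`, `SortedMapReader` and `ReaderUtil` are hypothetical names for illustration (only `OrderedKeyValuesReader` appears in the issue), not the actual Tez runtime API. Processor code checks for the reader property it needs, so the Input implementation behind the reader can change without breaking the cast.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical base reader: iterates over groups of (key, values). */
interface KeyValuesReader<K, V> {
    boolean next();            // advance to the next key group
    K getCurrentKey();
    Iterable<V> getCurrentValues();
}

/** Hypothetical marker sub-interface: keys are returned in sorted order. */
interface OrderedKeyValuesReader<K, V> extends KeyValuesReader<K, V> {
}

/** One possible implementation, backed by an in-memory sorted map. */
class SortedMapReader<K extends Comparable<K>, V> implements OrderedKeyValuesReader<K, V> {
    private final Iterator<Map.Entry<K, List<V>>> it;
    private Map.Entry<K, List<V>> current;

    SortedMapReader(TreeMap<K, List<V>> data) {
        this.it = data.entrySet().iterator();
    }

    public boolean next() {
        if (!it.hasNext()) {
            current = null;
            return false;
        }
        current = it.next();
        return true;
    }

    public K getCurrentKey() { return current.getKey(); }

    public Iterable<V> getCurrentValues() { return current.getValue(); }
}

class ReaderUtil {
    /** Depends only on the reader's ordering property, never on a concrete Input class. */
    static <K, V> List<K> keysInOrder(KeyValuesReader<K, V> reader) {
        if (!(reader instanceof OrderedKeyValuesReader)) {
            throw new IllegalArgumentException("This processor requires ordered keys");
        }
        List<K> keys = new ArrayList<>();
        while (reader.next()) {
            keys.add(reader.getCurrentKey());
        }
        return keys;
    }
}
```

Swapping `SortedMapReader` for any other `OrderedKeyValuesReader` implementation leaves `ReaderUtil` untouched, which is the transparency the issue is after.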
[jira] [Updated] (TEZ-1347) Consolidate MRHelpers
[ https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1347: Attachment: TEZ-1347.2.txt Updated patch with comments addressed. Renamed TezAPIHelpers to TezUtils and TezUtils to TezUtilsInternal. No consolidation of the various TezUtils in this patch though. Fixed javadoc, removed Hive/Pig LimitedPrivate. Have removed the unstable from the API though. Removed the numReducers check, and the associated fields. For YARNRunner, this check already runs in the JobClient - so nothing required there. > Consolidate MRHelpers > - > > Key: TEZ-1347 > URL: https://issues.apache.org/jira/browse/TEZ-1347 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt, > TEZ-1347.2.txt > > > - Remove methods which don't belong in MRHelpers and potentially move them to > TezHelpers. > - Get rid of methods which we don't expect/want users to use. > - Get rid of multiple variants of the same method, if these exist. > - Investigate other cleanup in MRHelpers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1055) Rename tez-mapreduce-examples to tez-examples
[ https://issues.apache.org/jira/browse/TEZ-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096187#comment-14096187 ] Bikas Saha commented on TEZ-1055: - [~rekhajoshm] Can I take up this blocker jira, since we are looking at getting a Tez 0.5 release done this week? > Rename tez-mapreduce-examples to tez-examples > - > > Key: TEZ-1055 > URL: https://issues.apache.org/jira/browse/TEZ-1055 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Rekha Joshi >Priority: Blocker > > And also the internal classes where applicable to remove MR references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (TEZ-1361) Move SimpleMRProcessor, MRInput and MROutput into runtime-library
[ https://issues.apache.org/jira/browse/TEZ-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha reassigned TEZ-1361: --- Assignee: Bikas Saha > Move SimpleMRProcessor, MRInput and MROutput into runtime-library > - > > Key: TEZ-1361 > URL: https://issues.apache.org/jira/browse/TEZ-1361 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Fix For: 0.5.0 > > > It's currently in tez-mapreduce and inaccessible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1361) Move SimpleMRProcessor, MRInput and MROutput into runtime-library
[ https://issues.apache.org/jira/browse/TEZ-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096179#comment-14096179 ] Bikas Saha commented on TEZ-1361: - I am leaning towards leaving this in tez-mapreduce but moving the main "API" classes into proper packages within the project. This can be done under TEZ-1367. These classes are essentially MR specific. Closing this jira. Please reopen if anyone disagrees. > Move SimpleMRProcessor, MRInput and MROutput into runtime-library > - > > Key: TEZ-1361 > URL: https://issues.apache.org/jira/browse/TEZ-1361 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > Fix For: 0.5.0 > > > It's currently in tez-mapreduce and inaccessible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (TEZ-1361) Move SimpleMRProcessor, MRInput and MROutput into runtime-library
[ https://issues.apache.org/jira/browse/TEZ-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved TEZ-1361. - Resolution: Not a Problem > Move SimpleMRProcessor, MRInput and MROutput into runtime-library > - > > Key: TEZ-1361 > URL: https://issues.apache.org/jira/browse/TEZ-1361 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Priority: Blocker > Fix For: 0.5.0 > > > It's currently in tez-mapreduce and inaccessible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1411) Address initial feedback on swimlanes
Bikas Saha created TEZ-1411: --- Summary: Address initial feedback on swimlanes Key: TEZ-1411 URL: https://issues.apache.org/jira/browse/TEZ-1411 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Gopal V Priority: Blocker Fix For: 0.5.0 A few other good-to-have things: 1) A wrapper script that takes care of the command chaining, with a single appId as input from the user. 2) A legend in the README or in the SVG itself explaining what is what. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1402) MRoutput configurer should disable committer
[ https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1402: Summary: MRoutput configurer should disable committer (was: MRoutput configurer should allow other committers and no committer) > MRoutput configurer should disable committer > > > Key: TEZ-1402 > URL: https://issues.apache.org/jira/browse/TEZ-1402 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1402) MRoutput configurer should allow disabling the committer
[ https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1402: Summary: MRoutput configurer should allow disabling the committer (was: MRoutput configurer should disable committer) > MRoutput configurer should allow disabling the committer > > > Key: TEZ-1402 > URL: https://issues.apache.org/jira/browse/TEZ-1402 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1402) MRoutput configurer should allow other committers and no committer
[ https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096148#comment-14096148 ] Siddharth Seth commented on TEZ-1402: - Looks good. > MRoutput configurer should allow other committers and no committer > -- > > Key: TEZ-1402 > URL: https://issues.apache.org/jira/browse/TEZ-1402 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1402) MRoutput configurer should allow other committers and no committer
[ https://issues.apache.org/jira/browse/TEZ-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1402: Attachment: TEZ-1402.3.patch Thanks. Attaching commit patch. > MRoutput configurer should allow other committers and no committer > -- > > Key: TEZ-1402 > URL: https://issues.apache.org/jira/browse/TEZ-1402 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1402.1.patch, TEZ-1402.2.patch, TEZ-1402.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1347) Consolidate MRHelpers
[ https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096093#comment-14096093 ] Bikas Saha commented on TEZ-1347: - TezAPIHelpers: Why does it have to have API in the name? Isn't TezHelpers enough? Or TezUtils (merge other *Utils into TezInternalUtils). Remove the byte reference from the javadoc? User payload should be enough. Convert a Configuration to compressed user pay load (i.e. byte[]) using Convert compressed pay load in byte[] to a Conf Remove the LimitedPrivate for Hive and Pig if this is needed by anyone who essentially needs to offload MR-based pipelines to Tez. Same for other such cases. @Unstable is fine to keep. @LimitedPrivate("Hive, Pig") @Unstable - public static void translateVertexConfToTez(Configuration conf) { + public static void translateMRConfToTez(Configuration conf) { convertVertexConfToTez(conf); Wrong javadoc + * This is only meant to be used if frameworks are not setting up their own java options, + * and would like to fallback to using java options which may already be configured for + * Hadoop MapReduce mappers. < HERE * * Uses mapreduce.admin.reduce.child.java.opts, mapreduce.reduce.java.opts * and mapreduce.reduce.log.level from config to generate the opts. @@ -213,7 +305,7 @@ public class MRHelpers { * @return JAVA_OPTS string to be used in launching the JVM */ @SuppressWarnings("deprecation") - public static String getReduceJavaOpts(Configuration conf) { + public static String getJavaOptsForMRReducer(Configuration conf) { Will numReducers ever be true for anything other than YARNRunner? If that is the case, then we may not need this code at all for everybody else. Just move it to YARNRunner? + if (numReduces != 0) { +conf.setBooleanIfUnset("mapred.reducer.new-api", .. 
+if (numReduces != 0) { + ensureNotSet(conf, "mapred.partitioner.class", mode); Some more javadoc on when to use would help for MRHelpers.translateMRConfToTez() > Consolidate MRHelpers > - > > Key: TEZ-1347 > URL: https://issues.apache.org/jira/browse/TEZ-1347 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt > > > - Remove methods which don't belong in MRHelpers and potentially move them to > TezHelpers. > - Get rid of methods which we don't expect/want users to use. > - Get rid of multiple variants of the same method, if these exist. > - Investigate other cleanup in MRHelpers. -- This message was sent by Atlassian JIRA (v6.2#6252)
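The javadoc quoted above describes converting a Configuration into a compressed user payload (a `byte[]`) and back. A minimal sketch of that round trip, assuming a plain `Map<String, String>` as a stand-in for Hadoop's `Configuration` and Deflate compression from `java.util.zip`; `ConfPayloadSketch` and its method names are hypothetical, and this is not the actual TezUtils implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ConfPayloadSketch {
    /** Serializes key/value pairs (stand-in for a Configuration) and Deflate-compresses them. */
    static byte[] toCompressedPayload(Map<String, String> conf) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeInt(conf.size());
            for (Map.Entry<String, String> e : conf.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } // closing finishes the deflater and flushes all compressed bytes into 'bytes'
        return bytes.toByteArray();
    }

    /** Inverse of toCompressedPayload: inflate, then read the pairs back. */
    static Map<String, String> fromCompressedPayload(byte[] payload) throws IOException {
        Map<String, String> conf = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(payload)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                conf.put(key, value);
            }
        }
        return conf;
    }
}
```

Compression is useful here because MR-derived configurations tend to carry many repetitive keys, so the payload shipped with each vertex shrinks noticeably.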
[jira] [Resolved] (TEZ-817) TEZ_LIB_URI are always uploaded as public Local Resource
[ https://issues.apache.org/jira/browse/TEZ-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved TEZ-817. Resolution: Fixed Fix Version/s: 0.5.0 Hadoop Flags: Reviewed Thanks for your contribution. Committed. commit 215909e04415425688b9eb54d342d45ab6f5fa53 Author: Bikas Saha Date: Wed Aug 13 11:47:28 2014 -0700 TEZ-817. TEZ_LIB_URI are always uploaded as public Local Resource (Prakash Ramachandran via bikas) > TEZ_LIB_URI are always uploaded as public Local Resource > > > Key: TEZ-817 > URL: https://issues.apache.org/jira/browse/TEZ-817 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Prakash Ramachandran >Priority: Critical > Fix For: 0.5.0 > > Attachments: TEZ-817.1.patch, TEZ-817.2.patch, TEZ-817.3.patch, > TEZ-817.4.patch, TEZ-817.5.patch > > > They can point to any remote location that may be specific to a user (if the > user is playing with a private build). In that case, job submission will fail > since YARN will complain that the public LR is not public on the remote FS. -- This message was sent by Atlassian JIRA (v6.2#6252)
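One way to avoid the failure described above is to choose the visibility from the file's permissions instead of always using PUBLIC. A minimal sketch, assuming the usual YARN rule that a resource can only be localized as PUBLIC if the file is world-readable and every ancestor directory is world-executable. `ResourceVisibility` and `VisibilityChooser` are hypothetical names (YARN has its own `LocalResourceVisibility`), and this is not claimed to be what the committed patch does.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for YARN's resource visibility levels. */
enum ResourceVisibility { PUBLIC, PRIVATE }

class VisibilityChooser {
    /**
     * Returns PUBLIC only if the file is readable by others and every ancestor
     * directory is executable (traversable) by others; otherwise PRIVATE.
     */
    static ResourceVisibility choose(Set<PosixFilePermission> filePerms,
                                     List<Set<PosixFilePermission>> ancestorPerms) {
        if (!filePerms.contains(PosixFilePermission.OTHERS_READ)) {
            return ResourceVisibility.PRIVATE;
        }
        for (Set<PosixFilePermission> dirPerms : ancestorPerms) {
            if (!dirPerms.contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return ResourceVisibility.PRIVATE; // e.g. a private build under a user's home dir
            }
        }
        return ResourceVisibility.PUBLIC;
    }
}
```

Falling back to PRIVATE is the safe choice: the resource is then localized per user, which works for a private build at the cost of not being shared across users.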
[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095827#comment-14095827 ] Bikas Saha commented on TEZ-1390: - We do not have a strong case right now, except that some of the internal buffer copies may be avoided by using a ByteBuffer, because ByteString (from the internal protobuf) allows creating a read-only ByteBuffer without copying. If you have any concerns, now would be a good time to voice them :) [~ozawa] Please make sure that all getPayload() methods that return a ByteBuffer return a clone of the buffer, as ByteBuffer is not thread-safe. > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: pig.payload.txt > > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact. -- This message was sent by Atlassian JIRA (v6.2#6252)
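The "return a clone" advice above can be met without copying the bytes: `ByteBuffer.asReadOnlyBuffer()` returns a new buffer object that shares the content but has its own position and limit. A minimal sketch, where `UserPayloadHolder` is a hypothetical class and not the Tez `UserPayload` API.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder for a user payload exposed as a ByteBuffer. */
final class UserPayloadHolder {
    private final ByteBuffer payload;

    UserPayloadHolder(byte[] bytes) {
        // Wraps without copying; callers only ever see views of this buffer.
        // Note: later changes to 'bytes' by the caller would be visible here.
        this.payload = ByteBuffer.wrap(bytes);
    }

    /**
     * Returns a read-only view with its own position and limit, so one caller
     * consuming the buffer cannot move the position seen by another caller.
     * The underlying bytes are shared, not copied.
     */
    ByteBuffer getPayload() {
        return payload.asReadOnlyBuffer();
    }
}
```

The read-only view also stops a consumer from mutating the shared bytes, which a plain `duplicate()` would still allow.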
[jira] [Commented] (TEZ-1400) Reducers stuck when enabling auto-reduce parallelism (MRR case)
[ https://issues.apache.org/jira/browse/TEZ-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095816#comment-14095816 ] Bikas Saha commented on TEZ-1400: - Can you confirm that ShuffleVertexManager is being explicitly enabled for certain (or all) vertices by calling vertex.setVertexManager() and then providing it a payload that sets TEZ_AM_SHUFFLE_VERTEX_MANAGER_ENABLE_AUTO_PARALLEL to true? This should not be turned on via the main job configuration, as it would then get inadvertently turned on for vertices that should not change their parallelism. If this is being enabled explicitly via setVertexManager() with a payload, then that is where the bug should be. If it's not being explicitly turned on via setVertexManager(), then that should change. One other thing you could try is to create a formal payload object for this manager and have a configurer that can set up all its parameters. By default it could pick up params from the client-side tez-site.xml. Also remove the creation of the payload from the AM conf when there is no payload, to make the payload required. > Reducers stuck when enabling auto-reduce parallelism (MRR case) > --- > > Key: TEZ-1400 > URL: https://issues.apache.org/jira/browse/TEZ-1400 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Labels: performance > Attachments: TEZ-1400.1.patch, dag.dot > > > In M -> R1 -> R2 case, if R1 is optimized by auto-parallelism R2 gets stuck > waiting for events. > e.g. > Map 1: 0/1 Map 2: -/- Map 5: 0/1 Map 6: 0/1 Map 7: 0/1 > Reducer 3: 0/23 Reducer 4: 0/1 > ... > ... > Map 1: 1/1 Map 2: 148(+13)/161 Map 5: 1/1 Map 6: 1/1 Map > 7: 1/1 Reducer 3: 0(+3)/3 Reducer 4: 0(+1)/1 <== Auto reduce > parallelism kicks in > .. > Map 1: 1/1 Map 2: 161/161 Map 5: 1/1 Map 6: 1/1 Map 7: 1/1 > Reducer 3: 3/3 Reducer 4: 0(+1)/1 > Job is stuck waiting for events in Reducer 4. 
> [fetcher [Reducer_3] #23] > org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler: copy(3 > of 23 at 0.02 MB/s) <=== *Waiting for 20 more partitions, even though > Reducer3 has been optimized to use 3 reducers -- This message was sent by Atlassian JIRA (v6.2#6252)
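A minimal sketch of the "formal payload object plus configurer" suggestion above, assuming the client-side tez-site.xml values are available as a plain `Map<String, String>` and the payload is serialized with `java.util.Properties`. `ShuffleVertexManagerSettings` is hypothetical; the auto-parallel key string is an assumption about the value behind `TEZ_AM_SHUFFLE_VERTEX_MANAGER_ENABLE_AUTO_PARALLEL`, and the desired-input-size key and its default are invented purely for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

/** Hypothetical typed payload for a shuffle vertex manager, attached to one vertex. */
final class ShuffleVertexManagerSettings {
    // Assumed key name for the auto-parallelism switch.
    static final String ENABLE_AUTO_PARALLEL_KEY =
        "tez.am.shuffle-vertex-manager.enable.auto-parallel";
    // Hypothetical key and default, for illustration only.
    static final String DESIRED_INPUT_SIZE_KEY = "example.desired-task-input-size";
    static final long DEFAULT_DESIRED_INPUT_SIZE = 100L * 1024 * 1024;

    final boolean enableAutoParallel;
    final long desiredTaskInputSize;

    private ShuffleVertexManagerSettings(boolean enableAutoParallel, long desiredTaskInputSize) {
        this.enableAutoParallel = enableAutoParallel;
        this.desiredTaskInputSize = desiredTaskInputSize;
    }

    /** Configurer that starts from client-side defaults (e.g. read from tez-site.xml). */
    static Builder newBuilder(Map<String, String> clientConf) {
        return new Builder(clientConf);
    }

    static final class Builder {
        private boolean enableAutoParallel;
        private long desiredTaskInputSize;

        private Builder(Map<String, String> clientConf) {
            enableAutoParallel = Boolean.parseBoolean(
                clientConf.getOrDefault(ENABLE_AUTO_PARALLEL_KEY, "false"));
            desiredTaskInputSize = Long.parseLong(clientConf.getOrDefault(
                DESIRED_INPUT_SIZE_KEY, Long.toString(DEFAULT_DESIRED_INPUT_SIZE)));
        }

        Builder setAutoParallel(boolean enable) {
            this.enableAutoParallel = enable;
            return this;
        }

        ShuffleVertexManagerSettings build() {
            return new ShuffleVertexManagerSettings(enableAutoParallel, desiredTaskInputSize);
        }
    }

    /** Serializes the settings so they can be set on a single vertex's manager. */
    String toPayload() {
        Properties props = new Properties();
        props.setProperty(ENABLE_AUTO_PARALLEL_KEY, Boolean.toString(enableAutoParallel));
        props.setProperty(DESIRED_INPUT_SIZE_KEY, Long.toString(desiredTaskInputSize));
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** The payload is required: there is no silent fallback to the AM-wide config. */
    static ShuffleVertexManagerSettings fromPayload(String payload) {
        if (payload == null) {
            throw new IllegalArgumentException("ShuffleVertexManager payload is required");
        }
        Properties props = new Properties();
        try {
            props.load(new StringReader(payload));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ShuffleVertexManagerSettings(
            Boolean.parseBoolean(props.getProperty(ENABLE_AUTO_PARALLEL_KEY)),
            Long.parseLong(props.getProperty(DESIRED_INPUT_SIZE_KEY)));
    }
}
```

Because the settings travel only in the payload of the vertices that opt in, turning on auto-parallelism for one reducer stage cannot leak into other vertices through the job-wide configuration.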
[jira] [Comment Edited] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095579#comment-14095579 ] Jonathan Eagles edited comment on TEZ-1390 at 8/13/14 3:24 PM: --- These are the usages of Payload in pig. [^pig.payload.txt] was (Author: jeagles): These are the usages of Payload in pig. > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: pig.payload.txt > > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-1390: - Attachment: pig.payload.txt These are the usages of Payload in pig. > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > Attachments: pig.payload.txt > > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1331) Investigate : interrupts being swallowed by TezClient/DAGClient methods
[ https://issues.apache.org/jira/browse/TEZ-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095503#comment-14095503 ] Johannes Zillmann commented on TEZ-1331: Had a look at the code base. The only crucial swallowing I found was in DAGClient; created TEZ-1410 for that. The remaining "catch InterruptedException" blocks either re-throw in an IOException (e.g. TezClientUtils#getAMProxy()) or are really internal stuff only. > Investigate : interrupts being swallowed by TezClient/DAGClient methods > --- > > Key: TEZ-1331 > URL: https://issues.apache.org/jira/browse/TEZ-1331 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Priority: Blocker > > TEZ-1278 fixes waitTillReady to not ignore interrupts. This jira is to look > through other APIs to figure out whether interrupt handling needs to be > fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
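When an API can only declare `IOException`, re-throwing an interrupt inside it (as the comment above notes for methods like `TezClientUtils#getAMProxy()`) is only safe if the interrupt stays visible. A minimal sketch of one common pattern, assuming nothing about the Tez code itself: restore the thread's interrupt flag and throw the standard `java.io.InterruptedIOException` with the original cause. `InterruptAwareCalls` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;

final class InterruptAwareCalls {
    private InterruptAwareCalls() {}

    /**
     * Runs a blocking call from an API that can only declare IOException.
     * An interrupt is not swallowed: the flag is restored and the caller gets
     * an InterruptedIOException carrying the original cause.
     */
    static <T> T callBlocking(Callable<T> call) throws IOException {
        try {
            return call.call();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            InterruptedIOException ioe = new InterruptedIOException("Interrupted while waiting");
            ioe.initCause(e);
            throw ioe;
        } catch (IOException | RuntimeException e) {
            throw e; // pass through unchanged
        } catch (Exception e) {
            throw new IOException(e); // any other checked exception
        }
    }
}
```

Callers that catch `IOException` generically still see an exception, and callers that check `Thread.interrupted()` afterwards still see the interrupt.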
[jira] [Updated] (TEZ-1410) DAGClient#waitForCompletion() methods should not swallow interrupts
[ https://issues.apache.org/jira/browse/TEZ-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Zillmann updated TEZ-1410: --- Attachment: TEZ-1410.1.patch > DAGClient#waitForCompletion() methods should not swallow interrupts > --- > > Key: TEZ-1410 > URL: https://issues.apache.org/jira/browse/TEZ-1410 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.0 >Reporter: Johannes Zillmann >Assignee: Johannes Zillmann > Attachments: TEZ-1410.1.patch > > > Based on TEZ-1331, I found that the 3 waitForCompletion() methods of DAGClient > swallow interrupts as well. That way you can never stop the wait call, > since all interrupts are caught and the wait logic just happily proceeds > (same as TEZ-1278). -- This message was sent by Atlassian JIRA (v6.2#6252)
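A minimal sketch of a polling wait that does not swallow interrupts, assuming completion can be checked with a simple predicate. `CompletionWaiter` is a hypothetical class, not the DAGClient implementation. The swallowing pattern described in the issue would be `try { Thread.sleep(pollMillis); } catch (InterruptedException e) { /* ignored */ }` inside the loop; here the exception is simply allowed to propagate.

```java
import java.util.function.BooleanSupplier;

final class CompletionWaiter {
    private CompletionWaiter() {}

    /**
     * Polls until done() returns true. If the waiting thread is interrupted,
     * Thread.sleep throws InterruptedException and the wait ends, because the
     * exception is deliberately not caught here.
     */
    static void waitForCompletion(BooleanSupplier done, long pollMillis)
            throws InterruptedException {
        while (!done.getAsBoolean()) {
            Thread.sleep(pollMillis);
        }
    }
}
```

Declaring `throws InterruptedException` on the wait method also makes the cancellation path explicit in the signature, so callers must decide how to handle it.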
[jira] [Created] (TEZ-1410) DAGClient#waitForCompletion() methods should not swallow interrupts
Johannes Zillmann created TEZ-1410: -- Summary: DAGClient#waitForCompletion() methods should not swallow interrupts Key: TEZ-1410 URL: https://issues.apache.org/jira/browse/TEZ-1410 Project: Apache Tez Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Johannes Zillmann Assignee: Johannes Zillmann Based on TEZ-1331, I found that the 3 waitForCompletion() methods of DAGClient swallow interrupts as well. That way you can never stop the wait call, since all interrupts are caught and the wait logic just happily proceeds (same as TEZ-1278). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095485#comment-14095485 ] Johannes Zillmann commented on TEZ-1390: Just curious, what's the benefit of using ByteBuffer vs byte[] here? > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1347) Consolidate MRHelpers
[ https://issues.apache.org/jira/browse/TEZ-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1347: Attachment: TEZ-1347.1.txt The last bit of cleanup in MRHelpers (not related to MRInput etc). Changes - Add a new class called TezAPIHelpers which contains some methods for conf to payload, etc - Helper methods for payloads etc removed from MRHelpers, in favor of the methods in TezAPIHelpers - Removed doJobClient magic. Replaced the important bit of determining which API to use with configureMRApiUsage - Renamed most of the methods in MRHelpers, and improved javadoc to indicate these are just helpers to parse out existing MR config values. [~bikassaha], please review. > Consolidate MRHelpers > - > > Key: TEZ-1347 > URL: https://issues.apache.org/jira/browse/TEZ-1347 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Blocker > Attachments: TEZ-1347-initial-review.txt, TEZ-1347.1.txt > > > - Remove methods which don't belong in MRHelpers and potentially move them to > TezHelpers. > - Get rid of methods which we don't expect/want users to use. > - Get rid of multiple variants of the same method, if these exist. > - Investigate other cleanup in MRHelpers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1390) Replace byte[] with ByteBuffer as the type of user payload in the API
[ https://issues.apache.org/jira/browse/TEZ-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095226#comment-14095226 ] Siddharth Seth commented on TEZ-1390: - Sounds good. > Replace byte[] with ByteBuffer as the type of user payload in the API > - > > Key: TEZ-1390 > URL: https://issues.apache.org/jira/browse/TEZ-1390 > Project: Apache Tez > Issue Type: Improvement >Reporter: Bikas Saha >Assignee: Tsuyoshi OZAWA >Priority: Blocker > > This is just an API change. Internally we can continue to use byte[] since > that's a much bigger change. > The translation from ByteBuffer to byte[] in the API layer should not have > perf impact. -- This message was sent by Atlassian JIRA (v6.2#6252)