[ 
https://issues.apache.org/jira/browse/TEZ-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974938#comment-13974938
 ] 

Bikas Saha commented on TEZ-698:
--------------------------------

bq. is this thread safe?
The patch just refactors the code. If it was thread safe earlier, it should be 
thread safe now. If it wasn't thread safe earlier, we need to follow up in a 
separate jira.

bq. do we really need to use MRHelpers for resource calculation and java opts?
Currently we do, since they provide the only default values we have that work 
with each other. I am thinking of a TezUtils.getDefaultResource() that replaces 
this for users who don't have specific needs. For the java opts, I am thinking 
of transparently adding an xmx value derived from the vertex memory settings 
(if not already set by the user). But those are follow-up jiras orthogonal to 
this one.
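
To make the xmx idea above concrete, here is a minimal sketch of one way it 
could work. It is not part of the patch or of Tez: the class name XmxDefaults, 
the method withDefaultXmx and the 80% heap fraction are all assumptions chosen 
for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Tez API: adds a default -Xmx to JVM options.
final class XmxDefaults {
    // Assumed share of the container memory given to the heap (illustrative).
    static final int HEAP_PERCENT = 80;

    // Matches an existing -Xmx flag at the start or after whitespace.
    private static final Pattern XMX = Pattern.compile("(^|\\s)-Xmx\\S+");

    // Returns the options unchanged if the user already set -Xmx,
    // otherwise appends a value derived from the container memory.
    public static String withDefaultXmx(String javaOpts, int containerMemoryMb) {
        String opts = javaOpts == null ? "" : javaOpts.trim();
        if (XMX.matcher(opts).find()) {
            return opts; // an explicit user setting always wins
        }
        int heapMb = containerMemoryMb * HEAP_PERCENT / 100;
        String xmx = "-Xmx" + heapMb + "m";
        return opts.isEmpty() ? xmx : opts + " " + xmx;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultXmx("-server", 1024));
        System.out.println(withDefaultXmx("-Xmx512m -server", 1024));
    }
}
```

The key design point is that an existing user-supplied -Xmx is never 
overridden; the derived value is only a fallback.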

bq. perspective of a user looking at example code, are the payloads for an 
input/output pair meant to be the same?
That's why the API is called MRHelpers.createMRIntermediateDataPayload. Our MR 
intermediate data inputs/outputs (based on MR shuffle) take identical payloads 
(basically KV class and compression settings). Secondly, our intermediate 
inputs/outputs are in tez-runtime-library, which does not depend on 
tez-mapreduce, so the MRHelpers that do the conf translation etc. are not 
accessible to that code. So we cannot move the payload creation helper method 
to the actual Input/Output classes. Unfortunate.
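
The point about identical payloads can be illustrated with a small 
self-contained sketch. It does not use the real MRHelpers or any Tez classes: 
SharedPayloadSketch, its methods, the setting names and the text encoding are 
all hypothetical, chosen only to show one payload being built once and handed 
to both sides of an input/output pair.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the real payload format is not shown here.
final class SharedPayloadSketch {
    // Encodes the settings both sides must agree on: KV classes and compression.
    public static byte[] createIntermediatePayload(
            String keyClass, String valueClass, boolean compress) {
        String text = "key.class=" + keyClass + "\n"
                + "value.class=" + valueClass + "\n"
                + "compress=" + compress + "\n";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes a payload back into its settings, as an input or output would.
    public static Map<String, String> decode(byte[] payload) {
        Map<String, String> settings = new LinkedHashMap<>();
        String text = new String(payload, StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                settings.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        byte[] payload = createIntermediatePayload(
                "org.apache.hadoop.io.Text", "org.apache.hadoop.io.IntWritable", true);
        // The same bytes go to the producing output and the consuming input.
        Map<String, String> outputSide = decode(payload);
        Map<String, String> inputSide = decode(payload);
        System.out.println(outputSide.equals(inputSide));
    }
}
```

Because both sides decode the same bytes, the KV types and compression 
settings cannot drift apart between the output and the input.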

bq. does this work with -Dparams?
Not quite sure what you mean. I noticed it crashed with 0 args and so I made a 
quick fix.

Fixed missing useNewApi. MRInput was already correct. I don't expect the conf 
argument to be JobConf, and I want to create a new conf so that I don't mess up 
the user conf object. The most important thing that magic does is set up the 
partitioner and combiner.



> Make it easy to create and configure 
> MRInput/MROutput/ShuffleInput/SortedOutput
> -------------------------------------------------------------------------------
>
>                 Key: TEZ-698
>                 URL: https://issues.apache.org/jira/browse/TEZ-698
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-698.1.patch
>
>
> We have moved away from MR and it's not necessary for anyone to write mappers 
> and reducers or to configure them. But MR input and output and shuffle-related 
> inputs/outputs are still used. Currently we have to invoke a host of methods 
> to configure them. If we can have a single API to make these configs then it 
> would really help. Secondly, for IO pairs like ShuffleInput/SortedOutput, 
> their configs are related (e.g. KV types), so it may be useful to have a 
> combined API that generates configs for both in a single call.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
