[
https://issues.apache.org/jira/browse/TEZ-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974938#comment-13974938
]
Bikas Saha commented on TEZ-698:
--------------------------------
bq. is this thread safe?
The patch just refactors the code. If it was thread safe earlier, it should be
thread safe now. If it wasn't thread safe earlier, we need to follow up in a
separate jira.
bq. do we really need to use MRHelpers for resource calculation and java opts?
Currently we do, since they provide the only default values we have that work
with each other. I am thinking of a TezUtils.getDefaultResource() that replaces
this for users who don't have specific needs. For the java opts I am thinking of
transparently adding an xmx value derived from the vertex memory settings (if
not already set by the user). But those are follow-up jiras orthogonal to this
one.
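A rough sketch of the xmx idea, purely for illustration. The class name,
method name and the 80% heap fraction are all hypothetical and not part of Tez
or of this patch; the only point is "add -Xmx from the vertex memory unless the
user already set one".

```java
public final class XmxDefaults {
    private XmxDefaults() {}

    /**
     * Hypothetical helper: returns javaOpts unchanged if it already carries an
     * -Xmx setting, otherwise appends one derived from the vertex memory.
     * The 80% heap fraction is an assumed value, not a Tez default.
     */
    public static String withDefaultXmx(String javaOpts, int vertexMemoryMb) {
        String opts = javaOpts == null ? "" : javaOpts.trim();
        for (String token : opts.split("\\s+")) {
            if (token.startsWith("-Xmx")) {
                return opts; // the user's explicit setting wins
            }
        }
        int heapMb = vertexMemoryMb * 8 / 10; // leave headroom for non-heap memory
        String xmx = "-Xmx" + heapMb + "m";
        return opts.isEmpty() ? xmx : opts + " " + xmx;
    }
}
```

With this sketch, opts of "-server" and 1024 MB of vertex memory become
"-server -Xmx819m", while opts that already contain -Xmx are left alone.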
bq. perspective of a user looking at example code, are the payloads for an
input/output pair meant to be the same?
That's why the API is called MRHelpers.createMRIntermediateDataPayload. Our MR
intermediate data inputs/outputs (based on MR shuffle) take identical payloads
(basically KV class and compression settings). Secondly, our intermediate
inputs/outputs are in tez-runtime-library, which does not depend on
tez-mapreduce, so the MRHelpers that do the conf translation etc. are not
accessible to that code. So we cannot move the payload creation helper method
to the actual Input/Output classes. Unfortunate.
bq. does this work with -Dparams?
Not quite sure what you mean. I noticed it crashed with 0 args and so I made a
quick fix.
Fixed missing useNewApi. MRInput was already correct. I don't expect the conf
argument to be a JobConf, and I want to create a new conf so that I don't mess up
the user conf object. The most important thing that magic does is set up the
partitioner and combiner.
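To illustrate the "create a new conf" point, here is a minimal sketch of the
copy-then-modify pattern. It uses java.util.Properties as a stand-in for the
Hadoop conf object so that it is self-contained; the class name, method name
and the property keys are assumptions for illustration and are not taken from
the patch.

```java
import java.util.Properties;

public final class ConfCopySketch {
    private ConfCopySketch() {}

    /**
     * Hypothetical helper: copies the user's settings into a fresh object and
     * applies the partitioner/combiner settings to the copy only, so the
     * caller's object is never mutated.
     */
    public static Properties derive(Properties userConf,
                                    String partitionerClass,
                                    String combinerClass) {
        Properties copy = new Properties();
        copy.putAll(userConf); // defensive copy of the user's settings
        // Illustrative keys; assumed here, not confirmed by the patch.
        copy.setProperty("mapreduce.job.partitioner.class", partitionerClass);
        copy.setProperty("mapreduce.job.combine.class", combinerClass);
        return copy;
    }
}
```

The caller keeps its original object untouched and only the derived copy
carries the extra settings.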
> Make it easy to create and configure
> MRInput/MROutput/ShuffleInput/SortedOutput
> -------------------------------------------------------------------------------
>
> Key: TEZ-698
> URL: https://issues.apache.org/jira/browse/TEZ-698
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-698.1.patch
>
>
> We have moved away from MR and it's not necessary for anyone to write mappers
> and reducers or to configure them. But MR input and output and Shuffle
> related inputs/outputs are still used. Currently we have to invoke a host of
> methods to configure them. If we can have a single API to make these configs
> then it would really help. Secondly, for IO pairs like
> ShuffleInput/SortedOutput, their configs are related (KV types, e.g.), so it
> may be useful to have a combined API that generates configs for both in a
> single call.
--
This message was sent by Atlassian JIRA
(v6.2#6252)