[
https://issues.apache.org/jira/browse/TEZ-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siddharth Seth updated TEZ-1080:
--------------------------------
Attachment: TEZ-1080.wip.1.txt
Very early WIP patch to get feedback / suggestions.
This creates a configuration class per Input / Output and exposes the main
configuration parameters via API methods. Serialization continues to be
Configuration based, but this isn't a public API and hence can be changed
at a later point.
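To make the idea concrete, here is a minimal sketch of what one per-Output configuration class could look like. It is not taken from the attached patch: the class name, the key strings, and the default value are all made up for illustration, and a plain map stands in for the Configuration object.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-Output configuration class. Not the classes in the patch.
final class SortedOutputConfigSketch {
    // Made-up key names for illustration; these are not real Tez keys.
    static final String KEY_CLASS = "sketch.output.key.class";
    static final String VALUE_CLASS = "sketch.output.value.class";
    static final String SORT_BUFFER_MB = "sketch.output.sort.buffer.mb";

    private final String keyClass;
    private final String valueClass;
    private int sortBufferMb = 100; // arbitrary in-code default

    SortedOutputConfigSketch(String keyClass, String valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    // Typed setter: callers never see the underlying key name.
    SortedOutputConfigSketch setSortBufferMb(int mb) {
        if (mb <= 0) {
            throw new IllegalArgumentException("sort buffer must be positive: " + mb);
        }
        this.sortBufferMb = mb;
        return this;
    }

    int getSortBufferMb() {
        return sortBufferMb;
    }

    // Internal serialization stays key/value based and can change later.
    Map<String, String> toKeyValues() {
        Map<String, String> kv = new TreeMap<>();
        kv.put(KEY_CLASS, keyClass);
        kv.put(VALUE_CLASS, valueClass);
        kv.put(SORT_BUFFER_MB, Integer.toString(sortBufferMb));
        return kv;
    }
}
```

Callers only touch the setters and getters; the key names stay an internal detail, which is what would let the serialization format be swapped out later.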
Several things to consider:
- Do we want to keep supporting default values for things like
intermediate-output-compression via tez-site.xml, or should we just rely on
the in-code default? If tez-site isn't used, my guess is individual users like
Pig, Hive, Cascading will end up requiring a similar configuration which they
will then set on this API. So we'll likely need to define a set of keys which
are picked up from tez-site.xml for these Inputs / Outputs.
- A lot of the configuration is common between the Inputs / Outputs: key class
name, compression, etc. In fact, for an edge, these need to be the same.
Creating an Edge configuration pair is possible but starts to get super
complicated.
That would likely end up looking like
{code}
createIOPairConfiguration(String keyClass, String valClass)
    .setCompression()
    .setOtherCommonConfig()
    .configureOutput(OnFileSortedOutput) // Returns an OnFileSortedOutput config builder
        .configureSortBuffer()
        .configureSorterNumThreads()
    .configureInput(ShuffledMergedInput) // Returns a ShuffledMergedInputConfigBuilder
    .getInputConfBytes()
    .getOutputConfBytes()
{code}
Something along those lines anyway.
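A slightly more fleshed-out version of that shape, as a rough Java sketch. Everything here is hypothetical: the class and method names, the key strings, and the fetcher setting on the input side are invented for illustration, and maps are returned instead of serialized bytes to keep it short. The point is only that the common settings are stored once and end up in both payloads, so the two sides of the edge cannot disagree.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical edge-level builder; not an existing Tez API.
final class EdgePairConfigSketch {
    private final Map<String, String> common = new TreeMap<>();
    private final Map<String, String> outputOnly = new TreeMap<>();
    private final Map<String, String> inputOnly = new TreeMap<>();

    private EdgePairConfigSketch(String keyClass, String valClass) {
        // Made-up key names; shared by both sides of the edge.
        common.put("sketch.key.class", keyClass);
        common.put("sketch.value.class", valClass);
    }

    static EdgePairConfigSketch create(String keyClass, String valClass) {
        return new EdgePairConfigSketch(keyClass, valClass);
    }

    EdgePairConfigSketch setCompression(boolean enabled) {
        common.put("sketch.compress", Boolean.toString(enabled));
        return this;
    }

    OutputBuilder configureOutput() {
        return new OutputBuilder();
    }

    InputBuilder configureInput() {
        return new InputBuilder();
    }

    // Output payload = common settings + output-specific settings.
    Map<String, String> getOutputConf() {
        Map<String, String> conf = new TreeMap<>(common);
        conf.putAll(outputOnly);
        return conf;
    }

    // Input payload = common settings + input-specific settings.
    Map<String, String> getInputConf() {
        Map<String, String> conf = new TreeMap<>(common);
        conf.putAll(inputOnly);
        return conf;
    }

    // Nested builder for output-side settings; done() returns to the edge.
    final class OutputBuilder {
        OutputBuilder setSortBufferMb(int mb) {
            outputOnly.put("sketch.output.sort.buffer.mb", Integer.toString(mb));
            return this;
        }

        OutputBuilder setSorterNumThreads(int threads) {
            outputOnly.put("sketch.output.sorter.threads", Integer.toString(threads));
            return this;
        }

        EdgePairConfigSketch done() {
            return EdgePairConfigSketch.this;
        }
    }

    // Nested builder for input-side settings (the setting itself is invented).
    final class InputBuilder {
        InputBuilder setFetcherParallelism(int fetchers) {
            inputOnly.put("sketch.input.fetchers", Integer.toString(fetchers));
            return this;
        }

        EdgePairConfigSketch done() {
            return EdgePairConfigSketch.this;
        }
    }
}
```

Even in this reduced form, the nested builders and the done() hop back to the edge show where the complexity comes from.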
> Configuration for non MR based Inputs/Outputs
> ---------------------------------------------
>
> Key: TEZ-1080
> URL: https://issues.apache.org/jira/browse/TEZ-1080
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-1080.wip.1.txt
>
>
> De-link configuration from MRHelpers (except for the YARNRunner case), and
> allow for these to be configured easily - exposing necessary setters /
> getters without having to rely on config keys.