[ 
https://issues.apache.org/jira/browse/TEZ-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1080:
--------------------------------

    Attachment: TEZ-1080.wip.1.txt

Very early WIP patch to get feedback / suggestions.

The patch creates a configuration class per Input / Output and exposes the 
main configuration parameters via API methods. Serialization continues to be 
Configuration based, but this isn't a public API and hence can be swapped 
out at a later point.
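
As a rough illustration of the per-Output idea, here is a minimal sketch. It is not taken from the patch: the class name `SortedOutputConfig`, the `example.*` keys, and the default values are all made up for illustration, and `java.util.Properties` stands in for Hadoop's Configuration so that the snippet runs on its own. The point is only that callers see typed setters / getters, while the key names and the byte format stay internal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical per-Output configuration class; not the API from the patch.
final class SortedOutputConfig {
    // Internal keys (made up). Callers never see these strings.
    private static final String KEY_CLASS = "example.output.key.class";
    private static final String SORT_BUFFER_MB = "example.output.sort.buffer.mb";
    private static final String COMPRESS = "example.output.compress";

    // Stand-in for a Hadoop Configuration object.
    private final Properties props = new Properties();

    SortedOutputConfig(String keyClass) {
        props.setProperty(KEY_CLASS, keyClass);
        // In-code defaults (arbitrary values for the sketch).
        props.setProperty(SORT_BUFFER_MB, "100");
        props.setProperty(COMPRESS, "false");
    }

    SortedOutputConfig setSortBufferMb(int mb) {
        if (mb <= 0) {
            throw new IllegalArgumentException("sort buffer must be positive: " + mb);
        }
        props.setProperty(SORT_BUFFER_MB, Integer.toString(mb));
        return this;
    }

    SortedOutputConfig setCompression(boolean enabled) {
        props.setProperty(COMPRESS, Boolean.toString(enabled));
        return this;
    }

    String getKeyClass() { return props.getProperty(KEY_CLASS); }
    int getSortBufferMb() { return Integer.parseInt(props.getProperty(SORT_BUFFER_MB)); }
    boolean isCompressionEnabled() { return Boolean.parseBoolean(props.getProperty(COMPRESS)); }

    // Serialize to an opaque payload; the format can change without touching callers.
    byte[] toBytes() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Rebuild the configuration on the receiving side from the payload.
    static SortedOutputConfig fromBytes(byte[] payload) {
        Properties loaded = new Properties();
        try {
            loaded.load(new ByteArrayInputStream(payload));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String keyClass = loaded.getProperty(KEY_CLASS);
        if (keyClass == null) {
            throw new IllegalArgumentException("payload is missing the key class");
        }
        SortedOutputConfig config = new SortedOutputConfig(keyClass);
        config.props.putAll(loaded);
        return config;
    }
}
```

A caller would write something like `new SortedOutputConfig("org.apache.hadoop.io.Text").setSortBufferMb(256).toBytes()` and hand the bytes over as the payload, without ever naming a config key.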

Several things to consider:
- Do we want to keep supporting default values for things like 
intermediate-output-compression via tez-site.xml, or should we just rely on 
the in-code default? If tez-site isn't used, my guess is individual users like 
Pig, Hive, Cascading will end up requiring a similar configuration which they 
will then set on this API. So we'll likely need to define a set of keys which 
are picked up from tez-site.xml for these Inputs / Outputs.
- A lot of the configuration is common between the Inputs / Outputs: key class 
name, compression, etc. In fact, for an edge, these need to be the same. 
Creating an Edge configuration pair is possible but starts to get super 
complicated.
That would likely end up looking like:
{code}
createIOPairConfiguration(String keyClass, String valClass)
    .setCompression()
    .setOtherCommonConfig()
    .configureOutput(OnFileSortedOutput) // Returns an OnFileSortedOutput config builder
        .configureSortBuffer()
        .configureSorterNumThreads()
    .configureInput(ShuffledMergedInput) // Returns a ShuffledMergedInputConfigBuilder

.getInputConfBytes()
.getOutputConfBytes()
{code}
Something along those lines anyway.
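
To make the shape of such an edge pair a bit more concrete, below is a minimal, self-contained sketch. Everything in it is an assumption for illustration only: the class `EdgeConfig`, the nested builders, the `done()` method used to return from a nested builder, the `setMergeFactor` parameter, the `example.*` keys, and the plain `key=value` byte format are not from the patch. What it tries to show is that the common settings (key / value class, compression) are set once and end up identical in both payloads, while each side adds its own parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical edge configuration pair; names and keys are illustrative only.
final class EdgeConfig {
    private final Map<String, String> common = new TreeMap<>();
    private final Map<String, String> output = new TreeMap<>();
    private final Map<String, String> input = new TreeMap<>();

    private EdgeConfig(String keyClass, String valClass) {
        common.put("example.key.class", keyClass);
        common.put("example.value.class", valClass);
        common.put("example.compress", "false"); // in-code default for the sketch
    }

    static EdgeConfig create(String keyClass, String valClass) {
        return new EdgeConfig(keyClass, valClass);
    }

    // Common setting: applied to both the Output and the Input payload.
    EdgeConfig setCompression(boolean enabled) {
        common.put("example.compress", Boolean.toString(enabled));
        return this;
    }

    OutputBuilder configureOutput() { return new OutputBuilder(); }
    InputBuilder configureInput() { return new InputBuilder(); }

    // Output-only parameters.
    final class OutputBuilder {
        OutputBuilder setSortBufferMb(int mb) {
            output.put("example.output.sort.buffer.mb", Integer.toString(mb));
            return this;
        }
        OutputBuilder setSorterNumThreads(int threads) {
            output.put("example.output.sorter.threads", Integer.toString(threads));
            return this;
        }
        EdgeConfig done() { return EdgeConfig.this; }
    }

    // Input-only parameters.
    final class InputBuilder {
        InputBuilder setMergeFactor(int factor) {
            input.put("example.input.merge.factor", Integer.toString(factor));
            return this;
        }
        EdgeConfig done() { return EdgeConfig.this; }
    }

    byte[] getOutputConfBytes() { return serialize(output); }
    byte[] getInputConfBytes() { return serialize(input); }

    // Merge the common settings with one side's settings into sorted key=value lines.
    private byte[] serialize(Map<String, String> sideSpecific) {
        Map<String, String> merged = new TreeMap<>(common);
        merged.putAll(sideSpecific);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Usage would be along the lines of `EdgeConfig.create(keyClass, valClass).setCompression(true).configureOutput().setSortBufferMb(256).done().configureInput().setMergeFactor(10).done()`, followed by `getOutputConfBytes()` and `getInputConfBytes()`. Even in this reduced form the nested builders need an explicit way back to the parent, which hints at how quickly the real thing could get complicated.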


> Configuration for non MR based Inputs/Outputs
> ---------------------------------------------
>
>                 Key: TEZ-1080
>                 URL: https://issues.apache.org/jira/browse/TEZ-1080
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1080.wip.1.txt
>
>
> De-link configuration from MRHelpers (except for the YARNRunner case), and 
> allow for these to be configured easily - exposing necessary setters / 
> getters without having to rely on config keys.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
