[
https://issues.apache.org/jira/browse/TEZ-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036282#comment-14036282
]
Siddharth Seth commented on TEZ-1080:
-------------------------------------
bq. Yes. We would like to have cluster wide configuration through tez-site.xml
for compression, shuffle tuning, etc.
This will be supported. What will not be supported, though, is having Tez
properties in a configuration file like hive-site or pig-site. Similarly,
properties specified via the command line will need to be handled. Properties
in those files will need to be handled by Hive/Tez. Alternatively, of course, a
separate tez-site for Pig or Hive could be included in the Pig/Hive directory -
and that would be picked up via the classpath.
bq. How about input format/output format?
Input / Output formats should not come into play on such edges - this is for
intermediate data.
bq. API looks clean. But in terms of simplicity and how we are going to use it,
it might actually be more complicated for us. For starters, we will have to do
conversion. Code is going to look like this for us.
Eventually, I'd imagine Pig would want to configure things like the sort buffer
size based on container and data sizes, rather than letting users overwrite it.
Most of these properties should be configured independently for each edge - a
single tez property via tez-site is not optimal.
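As a rough illustration of the per-edge idea above, here is a minimal Java sketch. It is not the Tez API: the class EdgeConfigSketch, the EdgeSettings holder, the "sketch.edge.*" key names, and the sizing rule (a 64 MB floor, a cap of 40% of the container, compression above 256 MB) are all hypothetical and chosen only to show settings being derived per edge from container and data sizes instead of being read from one cluster-wide property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of per-edge configuration; not part of the Tez API. */
public class EdgeConfigSketch {

    /** Settings for one intermediate-data edge (all names are illustrative). */
    static final class EdgeSettings {
        final int sortBufferMb;
        final boolean compress;

        EdgeSettings(int sortBufferMb, boolean compress) {
            this.sortBufferMb = sortBufferMb;
            this.compress = compress;
        }

        /** Only the keys this edge needs, rather than a full configuration. */
        Map<String, String> toPayload() {
            Map<String, String> payload = new LinkedHashMap<>();
            // Made-up key names, used only for this sketch.
            payload.put("sketch.edge.sort.buffer.mb", Integer.toString(sortBufferMb));
            payload.put("sketch.edge.compress", Boolean.toString(compress));
            return payload;
        }
    }

    /**
     * Derives settings for a single edge from the container size and the
     * expected amount of data on that edge. The thresholds are arbitrary
     * illustration values, not recommendations.
     */
    static EdgeSettings forEdge(int containerMb, long expectedBytes) {
        int floorMb = 64;
        // Cap the buffer at 40% of the container, but never below the floor.
        int capMb = Math.max(floorMb, containerMb * 2 / 5);
        long neededMb = expectedBytes / (1024L * 1024L) + 1;
        int bufferMb = (int) Math.min(capMb, Math.max(floorMb, neededMb));
        // Compress only edges expected to carry more than 256 MB.
        boolean compress = expectedBytes > 256L * 1024 * 1024;
        return new EdgeSettings(bufferMb, compress);
    }

    public static void main(String[] args) {
        // Two edges of the same job end up with different settings.
        EdgeSettings small = forEdge(1024, 10L * 1024 * 1024);
        EdgeSettings large = forEdge(4096, 8L * 1024 * 1024 * 1024);
        System.out.println("small edge: " + small.toPayload());
        System.out.println("large edge: " + large.toPayload());
    }
}
```

The point of the sketch is only that the platform (Pig or Hive) computes the values per edge and ships a small payload for that edge, so a user-facing override in tez-site is not the primary mechanism.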
What this is trying to do is get rid of unnecessary settings which are
otherwise sent over the wire to configure intermediate data edges. Also, it's
far better if platforms think about these settings w.r.t. what kind of data the
edge will be processing, rather than relying on defaults.
> Configuration for non MR based Inputs/Outputs
> ---------------------------------------------
>
> Key: TEZ-1080
> URL: https://issues.apache.org/jira/browse/TEZ-1080
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-1080.wip.1.txt, TEZ-1080.wip.2.txt
>
>
> De-link configuration from MRHelpers (except for the YARNRunner case), and
> allow for these to be configured easily - exposing necessary setters /
> getters without having to rely on config keys.