[ 
https://issues.apache.org/jira/browse/TEZ-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036282#comment-14036282
 ] 

Siddharth Seth commented on TEZ-1080:
-------------------------------------

bq. Yes. We would like to have cluster wide configuration through tez-site.xml 
for compression, shuffle tuning, etc.
This will be supported. What will not be supported, though, is having Tez 
properties in a configuration file like hive-site or pig-site. Similarly, 
properties specified via the command line will need to be handled. Properties 
in those files will need to be handled by Hive/Tez. Alternatively, of course, a 
separate tez-site for Pig or Hive could be included in the Pig/Hive directory - 
and that would be picked up via the classpath.

bq. How about input format/output format?
Input / Output formats should not come into play on such edges - this is for 
intermediate data.

bq. API looks clean. But in terms of simplicity and how we are going to use it, 
it might actually be more complicated for us. For starters, we will have to do 
conversion. Code is going to look like this for us.
Eventually, I'd imagine Pig would want to configure things like the sort buffer 
size based on container and data sizes, rather than letting users override it. 
Most of these properties should be configured independently for each edge - a 
single Tez property set via tez-site is not optimal.
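
To illustrate the per-edge idea, here is a minimal sketch of deriving a sort 
buffer size from container and data sizes. It does not use any Tez API: the 
`EdgeHint` class, the `sort_buffer_mb` function, the 0.4 memory fraction and 
the 16 MB floor are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeHint:
    """Hypothetical description of one intermediate-data edge."""
    name: str
    container_mb: int      # memory of the container producing the edge
    expected_data_mb: int  # estimated output size per task on this edge


def sort_buffer_mb(hint: EdgeHint, max_fraction: float = 0.4,
                   floor_mb: int = 16) -> int:
    """Pick a buffer bounded by a fraction of the container and by the data size."""
    cap = int(hint.container_mb * max_fraction)
    return max(floor_mb, min(cap, hint.expected_data_mb))


# Two edges in the same DAG end up with different settings,
# which a single cluster-wide value could not express.
edges = [
    EdgeHint("scan->join", container_mb=1024, expected_data_mb=2000),
    EdgeHint("join->agg", container_mb=1024, expected_data_mb=50),
]
per_edge = {e.name: sort_buffer_mb(e) for e in edges}
```

With these assumed numbers, the large edge is capped by the container 
fraction, while the small edge only gets as much buffer as its data needs.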

What this is trying to do is get rid of unnecessary settings which are 
otherwise sent over the wire to configure intermediate data edges. Also, it's 
far better if platforms think about these settings w.r.t. what kind of data the 
edge will be processing, rather than relying on defaults.
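
As a rough sketch of what trimming the settings sent over the wire could look 
like, the snippet below keeps only the keys an edge is assumed to understand 
and drops everything else from a full job configuration. The key names and 
the `edge_payload` helper are hypothetical and are not actual Tez 
configuration keys or APIs.

```python
# Hypothetical set of keys that an intermediate-data edge cares about.
EDGE_KEYS = frozenset({"sort.buffer.mb", "compress", "compress.codec"})


def edge_payload(full_conf: dict) -> dict:
    """Return only the entries relevant to the edge; unrelated keys are not shipped."""
    return {k: v for k, v in full_conf.items() if k in EDGE_KEYS}


# A full job configuration typically carries many unrelated properties.
full_conf = {
    "sort.buffer.mb": 409,
    "compress": True,
    "job.name": "q1",          # unrelated to the edge
    "client.retries": 3,       # unrelated to the edge
}
payload = edge_payload(full_conf)
```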

> Configuration for non MR based Inputs/Outputs
> ---------------------------------------------
>
>                 Key: TEZ-1080
>                 URL: https://issues.apache.org/jira/browse/TEZ-1080
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1080.wip.1.txt, TEZ-1080.wip.2.txt
>
>
> De-link configuration from MRHelpers (except for the YARNRunner case), and 
> allow for these to be configured easily - exposing necessary setters / 
> getters without having to rely on config keys.



