[ https://issues.apache.org/jira/browse/PIG-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Groschupf updated PIG-111:
---------------------------------

    Attachment: PIG-111_v_3_sg.patch

Here is my suggestion. I'm sorry it got a little bigger than expected. I know it
is painful to review and discuss larger patches, but I think configuration is
a very important part.
Hadoop had big problems in the beginning because they made some mistakes early
(the configuration was static). In general I'm very much in favor of Inversion
of Control and constructor-based injection of concrete configuration values, but
I understand that we can't do that if we want to be able to switch Execution
Engine implementations easily.

I noticed that Pig currently works only because all configuration values are set
as system properties and then read back as system properties
(System.getProperty). I have had very bad experiences using system properties in
production environments, since it is not clear to the user what the values are.
Then you run a job that takes a week on the wrong cluster and all services are down.
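
A minimal sketch of the alternative, assuming a hypothetical SessionContext class that stands in for PigContext and a hypothetical key "cluster.address": each session reads only from the Properties instance it was handed, so two sessions in the same JVM can point at different clusters, which a single global System property cannot express.

```java
import java.util.Properties;

class ExplicitConfigDemo {

    // Hypothetical stand-in for a Pig context: it reads configuration only
    // from the Properties instance it was given, never from System properties.
    static final class SessionContext {
        private final Properties props;

        SessionContext(Properties props) {
            // Defensive copy so later changes by the caller do not leak in.
            this.props = new Properties();
            this.props.putAll(props);
        }

        String get(String key) {
            return props.getProperty(key);
        }
    }

    public static void main(String[] args) {
        // "cluster.address" is a hypothetical key used only for illustration.
        Properties dev = new Properties();
        dev.setProperty("cluster.address", "dev-cluster:9000");

        Properties prod = new Properties();
        prod.setProperty("cluster.address", "prod-cluster:9000");

        // Two sessions in the same JVM see different values; with
        // System.getProperty they would share one global value.
        SessionContext a = new SessionContext(dev);
        SessionContext b = new SessionContext(prod);
        System.out.println(a.get("cluster.address")); // dev-cluster:9000
        System.out.println(b.get("cluster.address")); // prod-cluster:9000
    }
}
```

The value a job uses is whatever was explicitly passed in, which makes it easy to log or print before a long-running job is submitted.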

From my point of view these are the important points:
+ the configuration object itself has no dependencies; java.util.Properties
would be the best choice from my point of view
+ the configuration is not static, so we pass a Properties instance around and
do not use system properties at all.
+ each Execution Engine implementation has to take care itself of converting
properties into a format the underlying technology understands (e.g. properties
to a Hadoop configuration)
+ a default properties configuration file is part of our distribution
(PIG_HOME/conf) and contains all possible configuration values (for
documentation), but perhaps only sets the required values by default

The attached patch implements those points. I had to change some APIs; I'm very
sorry, but I personally think it is cleaner now. I also had to adjust the tests.
I suggest applying the patch and reviewing the changed sources instead of reading
the patch file.
For sure this is just the starting point and we need further improvements in
the sources; e.g. I suggest Grunt should allow setting all kinds of properties,
not just known ones.
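
A minimal sketch of what accepting arbitrary properties in Grunt could look like, assuming a hypothetical applySet helper and a `set <key> <value>` line syntax with optional single quotes around the value; this is not the existing Grunt parser. No key is checked against a list of known properties.

```java
import java.util.Properties;

class GruntSetDemo {

    // Hypothetical handler for a Grunt line of the form: set <key> <value>
    // Any key is accepted; there is no whitelist of "known" properties.
    static boolean applySet(String line, Properties props) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length != 3 || !parts[0].equalsIgnoreCase("set")) {
            return false; // not a set command, or malformed
        }
        String value = parts[2];
        // Strip one pair of surrounding single quotes, if present.
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            value = value.substring(1, value.length() - 1);
        }
        props.setProperty(parts[1], value);
        return true;
    }

    public static void main(String[] args) {
        Properties session = new Properties();
        applySet("set my.custom.key 'some value'", session);
        applySet("set job.name nightly-report", session);
        System.out.println(session.getProperty("my.custom.key")); // some value
        System.out.println(session.getProperty("job.name"));      // nightly-report
    }
}
```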

The patch is based on the previous patches for this issue.
Patch is against r631358. At least on my box the test suite passes.



> Configuration of Pig
> --------------------
>
>                 Key: PIG-111
>                 URL: https://issues.apache.org/jira/browse/PIG-111
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Craig Macdonald
>         Attachments: after.png, before.png, config.patch.1502, 
> PIG-111_v_3_sg.patch, PIG-93-v01.patch, PIG-93-v02.patch
>
>
> This JIRA discusses issues relating to the configuration of Pig.
> Use cases:
>  
> 1. I want to configure Pig programmatically from Java
>  Motivation: Pig can be embedded in another Java program, and configuration 
> should be accessible to be set by the client code
> 2. I want to configure Pig from the command line
> 3. I want to configure Pig from the Pig shell (Grunt)
> 4. I want Pig to remember my configuration for every Pig session
>  Motivation: to save me typing in some configuration stuff every time.
> 5. I want Pig to remember my configuration for this script.
>  Motivation: I must use a common configuration for 50% of my Pig scripts - 
> can I share this configuration between scripts.
> Current Status: 
>  * Pig uses System properties for some configuration
>  * A configuration properties object in PigContext is not used.
>  * pigrc can contain properties
>  * Configuration properties can not be set from Grunt
> Proposed solutions to use cases:
> 1. Configuration should be set in PigContext, and accessible from client code.
> 2. System properties are copied to PigContext, or can be specified on the 
> command line (duplication with System properties)
> 3. Allow configuration properties to be set using the "set" command in Grunt
> 4. Pigrc can contain properties. Is this enough, or can other configuration 
> stuff be set, eg aliases, imports, etc.
> 5. Add an include directive to pig, to allow a shared configuration/Pig 
> script to be included.
> Connections to Shell scripting: 
>  * The source command in Bash allows another bash script file to be included 
> - this allows shared variables to be set in one file shared between a set of 
> scripts.
>  * Aliases can be set, according to user preferences, etc.
>  * All this can be done in your .bashrc file
> Issues: 
>  * What happens when you change a property after the property has been read?
>  * Can Grunt read a pigrc containing various statements etc before the 
> PigServer is completely configured?
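
A minimal sketch of proposed solution 2 that also gives one possible answer to the first open question, assuming a hypothetical snapshot helper and a hypothetical key "demo.setting": file defaults are loaded first, System properties (e.g. -Dkey=value from the command line) are copied over them once at startup, and later changes to System properties no longer affect the session. Note that this copies every System property, including the JVM's own java.* entries; a real implementation might filter them.

```java
import java.util.Properties;

class SnapshotDemo {

    // Builds a per-session snapshot: file defaults first, then System
    // properties (e.g. -Dkey=value on the command line) on top.
    // Copying once means later System.setProperty calls do not change
    // the session's configuration.
    static Properties snapshot(Properties fileDefaults) {
        Properties session = new Properties();
        session.putAll(fileDefaults);
        Properties sys = System.getProperties();
        for (String key : sys.stringPropertyNames()) {
            session.setProperty(key, sys.getProperty(key));
        }
        return session;
    }

    public static void main(String[] args) {
        Properties fileDefaults = new Properties();
        fileDefaults.setProperty("demo.setting", "from-file"); // hypothetical key

        System.setProperty("demo.setting", "from-command-line");
        Properties session = snapshot(fileDefaults);

        System.setProperty("demo.setting", "changed-later");
        System.out.println(session.getProperty("demo.setting")); // from-command-line
    }
}
```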

-- 
This message is automatically generated by JIRA.