[
https://issues.apache.org/jira/browse/PIG-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Groschupf updated PIG-111:
---------------------------------
Attachment: PIG-111_v_3_sg.patch
Here is my suggestion. I'm sorry it got a little bigger than expected. I know
it is painful to review and discuss larger patches, but I think configuration
is a very important part.
Hadoop had big problems in the beginning because of some early mistakes
(the configuration was static). In general I'm very much in favor of Inversion
of Control and constructor-based injection of concrete configuration values,
but I understand that we can't do that if we want to switch Execution Engine
implementations easily.
I noticed that Pig currently only works because all configuration values are
set as system properties and then read back as system properties
(System.getProperty). I have had very bad experiences using system properties
in production environments, since it is not clear to the user what the values
are. Then you run a job that takes a week on the wrong cluster and all services
are down.
From my point of view these are the important points:
+ the configuration object itself has no dependencies - java.util.Properties
would be the best choice from my point of view
+ the configuration is not static, so we pass a Properties instance around and
do not use system properties at all.
+ each Execution Engine implementation has to take care of converting the
properties into a format the underlying technology understands (properties to
Hadoop configuration)
+ a default properties configuration file is part of our distribution
(PIG_HOME/conf) and contains all possible configuration values (for
documentation), but perhaps only sets the required values by default
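To make the four points above concrete, here is a minimal sketch. It is not
taken from the patch: the class names PigConfigSketch and SketchEngine and the
keys "cluster" and "exectype" are only illustrative assumptions, and the
TreeMap merely stands in for whatever native format a real engine needs (for
Hadoop, its own configuration object).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: layers explicit settings over the shipped defaults.
final class PigConfigSketch {
    // defaultsFile would be a reader on the file under PIG_HOME/conf.
    static Properties load(Reader defaultsFile, Properties overrides) {
        Properties defaults = new Properties();
        try {
            // Commented-out entries in the file serve as documentation only.
            defaults.load(defaultsFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Lookups on 'effective' fall back to 'defaults' for unset keys.
        Properties effective = new Properties(defaults);
        for (String name : overrides.stringPropertyNames()) {
            effective.setProperty(name, overrides.getProperty(name));
        }
        return effective;
    }
}

// Hypothetical engine: gets its configuration through the constructor and
// converts it into its own format; it never calls System.getProperty.
final class SketchEngine {
    private final Map<String, String> nativeConf = new TreeMap<>();

    SketchEngine(Properties props) {
        // stringPropertyNames() also returns keys inherited from the defaults.
        for (String name : props.stringPropertyNames()) {
            nativeConf.put(name, props.getProperty(name));
        }
    }

    String get(String key) {
        return nativeConf.get(key);
    }
}
```

Because the engine copies the values at construction time, two engines in the
same JVM can run with different settings, and nothing depends on global state.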
The attached patch implements those points. I had to change some API. I'm very
sorry, but I personally think it is cleaner now. I also had to adjust the tests.
I suggest applying the patch and reviewing the changed sources instead of
reading the patch file.
For sure this is just the starting point and we need further improvement in
the sources - e.g. I suggest Grunt should allow setting all kinds of
properties, not just known ones.
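As a rough sketch of that Grunt suggestion (again not from the patch; the
class SetCommandSketch and the exact "set <key> <value>" line format are
assumptions), a handler could store any key in the Properties instance rather
than checking it against a fixed list:

```java
import java.util.Properties;

// Hypothetical handler for a Grunt line such as "set some.key some value".
final class SetCommandSketch {
    // Returns true if the line was a well-formed set command and was applied.
    static boolean applySet(Properties props, String line) {
        // Split into "set", the key, and the rest of the line as the value.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 3 || !parts[0].equals("set")) {
            return false;
        }
        // No whitelist: unknown keys are stored and left to the engine.
        props.setProperty(parts[1], parts[2]);
        return true;
    }
}
```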
The patch is based on the patches done before for this issue.
Patch is against r631358. At least on my box the test suite passes.
> Configuration of Pig
> --------------------
>
> Key: PIG-111
> URL: https://issues.apache.org/jira/browse/PIG-111
> Project: Pig
> Issue Type: Improvement
> Reporter: Craig Macdonald
> Attachments: after.png, before.png, config.patch.1502,
> PIG-111_v_3_sg.patch, PIG-93-v01.patch, PIG-93-v02.patch
>
>
> This JIRA discusses issues relating to the configuration of Pig.
> Use cases:
>
> 1. I want to configure Pig programmatically from Java
> Motivation: pig can be embedded from another Java program, and configuration
> should be accessible to be set by the client code
> 2. I want to configure Pig from the command line
> 3. I want to configure Pig from the Pig shell (Grunt)
> 4. I want Pig to remember my configuration for every Pig session
> Motivation: to save me typing in some configuration stuff every time.
> 5. I want Pig to remember my configuration for this script.
> Motivation: I must use a common configuration for 50% of my Pig scripts -
> can I share this configuration between scripts.
> Current Status:
> * Pig uses System properties for some configuration
> * A configuration properties object in PigContext is not used.
> * pigrc can contain properties
> * Configuration properties can not be set from Grunt
> Proposed solutions to use cases:
> 1. Configuration should be set in PigContext, and accessible from client code.
> 2. System properties are copied to PigContext, or can be specified on the
> command line (duplication with System properties)
> 3. Allow configuration properties to be set using the "set" command in Grunt
> 4. Pigrc can contain properties. Is this enough, or can other configuration
> stuff be set, eg aliases, imports, etc.
> 5. Add an include directive to pig, to allow a shared configuration/Pig
> script to be included.
> Connections to Shell scripting:
> * The source command in Bash allows another bash script file to be included
> - this allows shared variables to be set in one file shared between a set of
> scripts.
> * Aliases can be set, according to user preferences, etc.
> * All this can be done in your .bashrc file
> Issues:
> * What happens when you change a property after the property has been read?
> * Can Grunt read a pigrc containing various statements etc before the
> PigServer is completely configured?