Hi Folks

I've been doing some release engineering around Pig 0.7 and thought I 
would share this in case any of you have it baked into a distribution. 
Using the techniques below, you can cut the distro from 44MB down to a 
runtime-only distro of 26MB. Also, if I've missed something, or if anything 
I'm suggesting here has negative ramifications, I'd love to know.

1) Delete everything out of the lib directory and copy the following files 
into the lib directory:

    commons-el.jar
    commons-httpclient-3.0.1.jar
    commons-logging-1.0.4.jar
    hadoop-0.20.2-core.jar
    hbase-0.20.6.jar
    hbase-0.20.6-test.jar
    jline-0.9.94.jar
    log4j-1.2.15.jar

2) Delete the Pig jars in $PIG_HOME except pig-0.7.1-dev-core.jar, and copy 
that jar into the lib directory
3) Add the following to bin/pig so that grunt still works:

for f in "$PIG_DIR"/lib/*.jar; do
    CLASSPATH="${CLASSPATH}:$f"
done
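
For anyone who wants to script steps 1 and 2, here is a rough sketch rather 
than a tested release script. It makes a few assumptions that are mine: the 
eight jars from step 1 have already been collected in a separate staging 
directory (the second argument, which must not be the lib directory itself), 
"the Pig jars" means files matching pig*.jar at the top of the Pig home 
directory, and the function name slim_pig is just a placeholder.

```shell
# Sketch of steps 1 and 2. Arguments (both hypothetical names):
#   $1 = Pig home directory
#   $2 = staging directory that already holds the jars listed in step 1
slim_pig() {
    pig_home=$1
    jar_src=$2
    core=pig-0.7.1-dev-core.jar
    keep="commons-el.jar commons-httpclient-3.0.1.jar
          commons-logging-1.0.4.jar hadoop-0.20.2-core.jar
          hbase-0.20.6.jar hbase-0.20.6-test.jar
          jline-0.9.94.jar log4j-1.2.15.jar"

    # Check everything up front so nothing is deleted if a jar is missing.
    for j in $keep; do
        [ -f "$jar_src/$j" ] || { echo "missing: $jar_src/$j" >&2; return 1; }
    done
    [ -f "$pig_home/$core" ] || { echo "missing: $pig_home/$core" >&2; return 1; }

    # Step 1: empty lib, then copy in the runtime jars.
    rm -rf "$pig_home/lib"
    mkdir -p "$pig_home/lib"
    for j in $keep; do
        cp "$jar_src/$j" "$pig_home/lib/"
    done

    # Step 2: copy the core jar into lib, then delete the other Pig jars.
    # Assumption: the Pig jars are the pig*.jar files in the Pig home.
    cp "$pig_home/$core" "$pig_home/lib/"
    for j in "$pig_home"/pig*.jar; do
        [ -e "$j" ] || continue
        [ "$(basename "$j")" = "$core" ] || rm -f "$j"
    done
}
```

It would be called along the lines of: slim_pig "$PIG_HOME" /tmp/pig-runtime-jars 
(the staging path is made up). Try it on a copy of the distro first, since 
it deletes files.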

Lastly, some observations

- According to its JIRA ticket, automaton.jar is part of Pig 0.8, so what 
is the jar doing in Pig 0.7?

- Those who ship Pig need to run legal scans on the software to ensure that 
all the dependencies (jars in the lib folder) have friendly licenses and 
can be shipped along with the base project. Files like Hadoop20.jar, where 
Hadoop, all of its dependencies, and a bunch of classes of undetermined 
origin are bundled into a single jar, make this extremely difficult. I'd 
like to propose that future releases ship an independent jar for each 
project in the lib directory. Otherwise, for each class we have to work out 
which project it comes from (to determine its license) and which version it 
is, based on the package name and the date of the classes.
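
To give a feel for the manual work involved, here is a small sketch that 
summarizes which packages the classes in a bundled jar belong to. It reads a 
listing of entry names on stdin, one per line, as printed by "jar tf". The 
function name class_packages and the choice to group on the first three 
package segments are my own assumptions, and the output only hints at the 
originating project. It says nothing about licenses or versions.

```shell
# Reads jar entry names on stdin (one per line) and prints
# "count package" for each package prefix, sorted by package name.
class_packages() {
    grep '\.class$' |
    awk -F/ 'NF > 1 {
        # Number of directory segments to keep, capped at 3.
        n = (NF - 1 < 3) ? NF - 1 : 3
        pkg = $1
        for (i = 2; i <= n; i++) pkg = pkg "." $i
        count[pkg]++
    }
    END { for (p in count) print count[p], p }' |
    sort -k2
}
```

Usage would look like: jar tf Hadoop20.jar | class_packages. Running 
"jar tvf" on the same file also prints each entry's size and timestamp, 
which is the date information mentioned above.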

Regards
Steve Watt
