Hi Folks,

I've been doing some release engineering around Pig 0.7 and thought I would share this in case any of you have it baked into a distribution. Using the steps below, you can shrink the current distro from 44MB to a runtime-only distro of 26MB. Also, if I've missed something, or if anything I'm suggesting here has negative ramifications, I'd love to know.

1) Delete everything in the lib directory, then copy the following files into it:

   commons-el.jar
   commons-httpclient-3.0.1.jar
   commons-logging-1.0.4.jar
   hadoop-0.20.2-core.jar
   hbase-0.20.6.jar
   hbase-0.20.6-test.jar
   jline-0.9.94.jar
   log4j-1.2.15.jar

2) Delete the Pig jars in $PIG_HOME except pig-0.7.1-dev-core.jar, and copy that jar into the lib directory.

3) Add the following to bin/pig so that grunt still works:

   for f in $PIG_DIR/lib/*.jar; do CLASSPATH=${CLASSPATH}:$f; done

Lastly, some observations:

- According to its JIRA ticket, automaton.jar is part of Pig 0.8. What is the jar doing in Pig 0.7?

- Those who ship Pig need to run legal scans on the software to ensure that all the dependencies (the jars in the lib folder) have friendly licenses and can be shipped along with the base project. Creating files like Hadoop20.jar, where Hadoop, all of its dependencies, and a bunch of classes of undetermined origin are compiled into a single jar, makes this extremely difficult. I'd like to propose that future releases ship an independent jar for each project in lib. Otherwise, for each class we have to work out which project it belongs to (to determine its license) and which version it is, based only on the package name and the date of the class file.

Regards,
Steve Watt
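
For anyone who wants to script steps 1 and 2 above, here is a minimal sketch rather than a tested release tool. It assumes the eight jars to keep have already been staged in a separate directory; that staging directory and the function name trim_pig_lib are hypothetical, introduced only for illustration. The bin/pig change in step 3 is still a manual edit.

```shell
#!/bin/sh
# Sketch only: trims a Pig 0.7 install down to the runtime jars listed in step 1.
# Assumption: the jars to keep were staged beforehand in a separate directory
# (the second argument). That directory is not part of the Pig distribution.

KEEP_JARS="commons-el.jar commons-httpclient-3.0.1.jar commons-logging-1.0.4.jar
hadoop-0.20.2-core.jar hbase-0.20.6.jar hbase-0.20.6-test.jar
jline-0.9.94.jar log4j-1.2.15.jar"
PIG_CORE_JAR="pig-0.7.1-dev-core.jar"

# trim_pig_lib PIG_HOME JAR_SRC  (hypothetical helper name)
trim_pig_lib() {
  pig_home="$1"
  jar_src="$2"

  # Refuse to run against an empty or wrong path before deleting anything.
  if [ -z "$pig_home" ] || [ ! -d "$pig_home/lib" ]; then
    echo "not a Pig install: $pig_home" >&2
    return 1
  fi
  if [ ! -f "$pig_home/$PIG_CORE_JAR" ]; then
    echo "missing: $pig_home/$PIG_CORE_JAR" >&2
    return 1
  fi
  for jar in $KEEP_JARS; do
    if [ ! -f "$jar_src/$jar" ]; then
      echo "missing: $jar_src/$jar" >&2
      return 1
    fi
  done

  # Step 1: empty lib, then copy in only the runtime jars.
  rm -rf "$pig_home/lib"
  mkdir -p "$pig_home/lib"
  for jar in $KEEP_JARS; do
    cp "$jar_src/$jar" "$pig_home/lib/" || return 1
  done

  # Step 2: remove every Pig jar in PIG_HOME except the core jar,
  # then copy the core jar into lib.
  for jar in "$pig_home"/pig*.jar; do
    [ -e "$jar" ] || continue
    if [ "$(basename "$jar")" != "$PIG_CORE_JAR" ]; then
      rm -f "$jar"
    fi
  done
  cp "$pig_home/$PIG_CORE_JAR" "$pig_home/lib/"
}

# Example call (paths are placeholders):
#   trim_pig_lib "$PIG_HOME" /path/to/staged-jars
```

The function checks that every jar is present before it deletes anything, so a missing file leaves the install untouched. Try it on a copy of the distribution first.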