Alan Gates wrote:
A few answers to your questions.

The hadoopX.jar files in Pig's lib directory are not the standard Hadoop jars. They differ in two ways. First, we recreate a Hadoop jar that rolls in all the jars needed to compile with Hadoop. This is somewhere around 15 jars. Second, we have a small hack we add for historical reasons. We need to resolve both of those issues. Once we do, we can use stock Hadoop jars instead of carrying along our own.


If you want to keep Hadoop-related jars separate from other jars, you could put them all together in a lib/hadoop subdirectory. Repackaging jars is confusing: you lose the version information of the dependent jars, and some jars may depend on specific values in their MANIFEST, which repackaging may have dropped.
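
To illustrate the MANIFEST point, here is a minimal Java sketch that prints the main manifest attributes of each jar given on the command line, so a stock jar can be compared with a repackaged one. It uses only the standard java.util.jar API; the class name ManifestInspector is made up for this example and is not part of Pig, Hadoop, or Nutch.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, named for this example only.
final class ManifestInspector {

    /** Returns the main manifest attributes of a jar, or an empty map if it has no manifest. */
    static Map<String, String> mainAttributes(String jarPath) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            // getManifest() returns null when META-INF/MANIFEST.MF is absent.
            Manifest manifest = jar.getManifest();
            if (manifest != null) {
                manifest.getMainAttributes().forEach(
                    (key, value) -> result.put(key.toString(), value.toString()));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Print the attributes of every jar path passed as an argument.
        for (String path : args) {
            System.out.println(path + ": " + mainAttributes(path));
        }
    }
}
```

Running it against a stock jar and against the repackaged one should show whether entries such as Implementation-Version survived the repackaging.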

Regarding the hack: we had similar problems in Nutch. If changes are required to core Hadoop, perhaps it's better to submit them to Hadoop for inclusion. If they are a temporary hack, perhaps a facade class is a better approach. In some cases in Nutch we had to use a patched library anyway, which was then clearly marked as such, and the diffs from the stock version were available in JIRA.
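
A rough sketch of the facade idea in Java follows. Since the actual hack is not described here, everything in it is hypothetical: StockService stands in for a class from an unmodified third-party jar, and the key normalization stands in for whatever the temporary workaround really is.

```java
import java.util.Locale;

/** Stand-in for a class shipped in an unmodified third-party jar (hypothetical). */
final class StockService {
    String lookup(String key) {
        return "value-for-" + key;
    }
}

/** Facade: project code calls this class instead of StockService directly. */
final class ServiceFacade {
    private final StockService delegate = new StockService();

    String lookup(String key) {
        // The temporary workaround lives here, not in a patched copy of the library.
        // Hypothetical example: normalize the key before delegating to the stock class.
        String normalized = key.trim().toLowerCase(Locale.ROOT);
        return delegate.lookup(normalized);
    }
}
```

Once the workaround is no longer needed, only the facade changes and the stock jar stays untouched.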

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
