Alan Gates wrote:
> A few answers to your questions.
> The hadoopX.jar files in pig's lib directory are not the standard hadoop
> jars. They differ in two ways. First, we recreate a hadoop jar that
> rolls in all the jars needed to compile with hadoop. This is somewhere
> around 15 jars. Second, we have a small hack we add for historical
> reasons. We need to resolve both of those issues. Once we do we can
> use stock hadoop jars instead of carrying along our own.

If you want to keep hadoop-related jars separate from other jars, you
could put them all together in a lib/hadoop subdir. Repackaging jars is
confusing: you lose the versioning information of the dependent jars, and
some jars may depend on specific values in their MANIFEST, which
repackaging may have dropped.
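
To illustrate the MANIFEST point, here is a minimal sketch (not Pig or Hadoop code) that reads the Implementation-Version entry from a jar's manifest using the standard java.util.jar API. The class name ManifestPeek and its helper methods are made up for this example. A repackaged jar that dropped the original manifests would typically return null here for the libraries that were rolled in.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, for illustration only.
public class ManifestPeek {

    /** Returns the Implementation-Version main attribute, or null if it is absent. */
    static String versionOf(Manifest mf) {
        if (mf == null) {
            return null; // the jar carries no manifest at all
        }
        return mf.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
    }

    /** Opens the jar at jarPath and reports the version recorded in its manifest. */
    static String versionOfJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return versionOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ManifestPeek lib/hadoop/*.jar
        for (String path : args) {
            System.out.println(path + " -> " + versionOfJar(path));
        }
    }
}
```

Running this over the individual jars in a lib/hadoop subdir would show each jar's own recorded version, which is exactly the information a single rolled-up jar cannot carry for all of its parts.
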
Regarding the hack: we had similar problems in Nutch. If changes are
required to core Hadoop, perhaps it's better to submit them to Hadoop
for inclusion. If they are a temporary hack, perhaps a facade class is a
better approach. In some cases in Nutch we had to use a patched library
anyway, which was then clearly marked as such, and diffs from the stock
version were available in JIRA.
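
As a rough sketch of what I mean by a facade: I don't know what the Pig hack actually does, so everything below is hypothetical. StockPathResolver stands in for some class from the unmodified library, and the slash normalization stands in for whatever workaround is needed. The point is only that callers go through the facade, so the workaround lives in one clearly marked place and the stock jar stays untouched.

```java
import java.util.Objects;

// Illustrative only; none of these classes exist in Hadoop, Pig or Nutch.
public class FacadeSketch {

    /** Hypothetical stand-in for a class shipped in the stock library. */
    static class StockPathResolver {
        String resolve(String path) {
            return path;
        }
    }

    /** Facade: application code calls this instead of the stock class. */
    static class PathResolverFacade {
        private final StockPathResolver delegate = new StockPathResolver();

        String resolve(String path) {
            Objects.requireNonNull(path, "path");
            // Hypothetical temporary workaround: collapse repeated slashes
            // before handing the path to the unmodified library class.
            String normalized = path.replaceAll("/{2,}", "/");
            return delegate.resolve(normalized);
        }
    }

    public static void main(String[] args) {
        System.out.println(new PathResolverFacade().resolve("/user//pig///data"));
    }
}
```

Once the underlying issue is fixed upstream, the workaround can be deleted from the facade without touching any callers.
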
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com