You can register it in the pig script (or, with a recent patch, even on the command line), and it will get shipped and put on the classpath; or you can prep your machines to have a local copy. For something like JDBC drivers I think it may be reasonable to let users decide rather than bundle it in by default -- shipping jars from the client to the cluster does have some overhead, and a lot of folks probably have these installed on their hadoop nodes anyway.
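For the script route it's just the standard REGISTER statement -- something like the below, with made-up jar paths (and if the command-line option from that patch is the pig.additional.jars property, you'd pass -Dpig.additional.jars=/path/to/mysql-connector-java.jar to pig instead of editing the script):

    -- ship both the UDF jar and the JDBC driver jar from the client to the cluster
    REGISTER /path/to/piggybank.jar;
    REGISTER /path/to/mysql-connector-java.jar;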
Just imho (and I haven't actually tried using Ankur's patch yet).

On Thu, Feb 18, 2010 at 9:37 AM, zaki rahaman <[email protected]> wrote:

> Hey,
>
> First off, @Ankur, great work so far on the patch. This probably is not an
> efficient way of doing mass dumps to DB (but why would you want to do that
> anyway when you have HDFS?), but it hits the sweetspot for my particular use
> case (storing aggregates to interface with a webapp). I was able to apply
> the patch cleanly and build. I had a question about actually using the
> DBStorage UDF, namely where I have to keep the JDBC driver? I was wondering
> if it would be possible to simply bundle it in the same jar as the UDF
> itself, but I know that Hadoop's DBOutputFormat requires a local copy of the
> driver on each machine. Any pointers?
>
> --
> Zaki Rahaman
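To make the question concrete: since I haven't run the patch, this is only a guess at what a store through DBStorage would look like once the driver jar is registered as above. The driver class, JDBC URL, table, and insert statement are placeholders, and the constructor arguments are my reading of the patch rather than anything verified:

    -- illustrative only; assumes the driver jar was REGISTERed as shown earlier
    aggregates = LOAD 'output/daily_counts' AS (day:chararray, hits:long);
    STORE aggregates INTO 'ignored' USING
        org.apache.pig.piggybank.storage.DBStorage(
            'com.mysql.jdbc.Driver',
            'jdbc:mysql://dbhost/webapp',
            'INSERT INTO daily_counts (day, hits) VALUES (?, ?)');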
