Zaki,
        Thanks for the appreciation :-). I agree it is not an efficient way of
dumping to a DB; for that you would use SQL*Loader or something similar. I had
exactly the same use case as yours, which is why it was developed. You can
either bundle the drivers into the UDF jar or keep them in a separate jar; in
the latter case both jars need to be registered in the pig script, unless the
drivers are already installed on your cluster and on the hadoop classpath. Just
an FYI, DBStorage does not use Hadoop's DBOutputFormat. Take a look at
TestDBStorage.java for a sample use case.
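
As a rough illustration only (the jar names, connection URL, table, and insert
statement below are made up, the package name assumes the piggybank location,
and the exact DBStorage constructor arguments should be checked against
TestDBStorage.java), the script would look something like:

    -- Sketch: register the UDF jar and the JDBC driver jar if the driver
    -- is not already on the cluster's hadoop classpath.
    REGISTER /path/to/dbstorage-udf.jar;
    REGISTER /path/to/mysql-connector-java.jar;

    aggs = LOAD 'aggregates' AS (name:chararray, cnt:long);
    -- the INTO location is just a placeholder string here
    STORE aggs INTO 'dummy_location' USING
        org.apache.pig.piggybank.storage.DBStorage(
            'com.mysql.jdbc.Driver',
            'jdbc:mysql://dbhost/mydb',
            'INSERT INTO aggregates (name, cnt) VALUES (?, ?)');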

Hope this helps

-Ankur


On 2/18/10 11:56 PM, "Dmitriy Ryaboy" <[email protected]> wrote:

You can register it in the pig script (or, with a recent patch, even on the
command line), and it will get shipped and put on the classpath; or you can
prep your machines to have a local copy.  For something like JDBC drivers I
think it may be reasonable to let users decide rather than bundle them in by
default -- shipping jars from the client to the cluster does have some
overhead, and a lot of folks will probably have these installed on their
hadoop nodes anyway.
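
For the command-line route, something along these lines should work (a sketch
only -- the jar path and script name are placeholders, the exact invocation
may vary by Pig version, and I'm assuming the support referred to is the
pig.additional.jars property):

    pig -Dpig.additional.jars=/path/to/mysql-connector-java.jar myscript.pig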

Just imho (and I haven't actually tried using Ankur's patch yet).

On Thu, Feb 18, 2010 at 9:37 AM, zaki rahaman <[email protected]> wrote:

> Hey,
>
> First off, @Ankur, great work so far on the patch. This probably is not an
> efficient way of doing mass dumps to a DB (but why would you want to do that
> anyway when you have HDFS?), but it hits the sweet spot for my particular use
> case (storing aggregates to interface with a webapp). I was able to apply
> the patch cleanly and build. I had a question about actually using the
> DBStorage UDF, namely: where do I have to keep the JDBC driver? I was
> wondering if it would be possible to simply bundle it in the same jar as the
> UDF itself, but I know that Hadoop's DBOutputFormat requires a local copy of
> the driver on each machine. Any pointers?
>
> --
> Zaki Rahaman
>
