Hey Ankur

Thanks, first of all; your effort is closely aligned with my need at the
moment, but I needed to understand the UDF a little more.
I had these queries:

1) Will every reducer have a different connection?
2) Will the store be transactional in nature across reducers?
3) Will the store be transactional on the dataset being pushed to the DB
(i.e., if even one record fails, is there a rollback)?

Regards
Rohan

Ankur C. Goel wrote:
Zaki,
        Thanks for the appreciation :-). I agree it is not an efficient way of 
dumping to a DB; for that you would use SQL*Loader or something similar. I had 
exactly the same use case as yours, which is why it was developed. You can 
either have the drivers bundled with the UDF jar or in a separate jar, and 
both would need to be registered in the Pig script if the drivers are not 
already installed on your cluster and part of the Hadoop classpath. Just an 
FYI, DBStorage does not use Hadoop's DBOutputFormat. Take a look at 
TestDBStorage.java for a sample use case.
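
For illustration, here is a rough sketch of how the registration and store
steps might look in a Pig script. The jar file names, the fully qualified
class name, the JDBC URL, and the DBStorage constructor arguments below are
all assumptions made for the example; check TestDBStorage.java for the actual
signature and usage:

```pig
-- Register the UDF jar and the JDBC driver jar separately
-- (file names here are hypothetical; use your actual build artifacts)
REGISTER dbstorage.jar;
REGISTER mysql-connector-java-5.1.jar;

aggregates = LOAD 'daily_aggregates' AS (id:int, total:long);

-- Assumed constructor: (driver class, JDBC URL, parameterized INSERT);
-- the INTO location is a placeholder since rows go to the DB, not HDFS
STORE aggregates INTO 'dummy' USING
    org.apache.pig.piggybank.storage.DBStorage(
        'com.mysql.jdbc.Driver',
        'jdbc:mysql://dbhost/mydb',
        'INSERT INTO daily_aggregates (id, total) VALUES (?, ?)');
```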

Hope this helps

-...@nkur


On 2/18/10 11:56 PM, "Dmitriy Ryaboy" <[email protected]> wrote:

You can register it in the Pig script (or, with a recent patch, even on the
command line), and it will get shipped and put on the classpath; or you can
prep your machines to have a local copy. For something like JDBC drivers, I
think it may be reasonable to let users decide rather than bundle them in by
default -- shipping jars from the client to the cluster does have some
overhead, and a lot of folks will probably have these installed on their
Hadoop nodes anyway.

Just imho (and I haven't actually tried using Ankur's patch yet).

On Thu, Feb 18, 2010 at 9:37 AM, zaki rahaman <[email protected]> wrote:


Hey,

First off, @Ankur, great work so far on the patch. This probably is not an
efficient way of doing mass dumps to a DB (but why would you want to do that
anyway when you have HDFS?), but it hits the sweet spot for my particular use
case (storing aggregates to interface with a webapp). I was able to apply the
patch cleanly and build. I had a question about actually using the DBStorage
UDF, namely: where do I have to keep the JDBC driver? I was wondering if it
would be possible to simply bundle it in the same jar as the UDF itself, but
I know that Hadoop's DBOutputFormat requires a local copy of the driver on
each machine. Any pointers?

--
Zaki Rahaman







