Thanks Bejoy,
I was looking at DBInputFormat with MultipleInputs. MultipleInputs takes a
Path parameter. Are these paths just ignored here?

On Mon, Dec 5, 2011 at 2:31 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:

> Hi Justin,
>            Just to add to my response: if you need to fetch data from an
> RDBMS in your own custom MapReduce code, you can use DBInputFormat with
> MultipleInputs in your job. Be careful about the number of mappers your
> application uses, as databases limit the maximum number of simultaneous
> connections. You also need to ensure that the same query is not executed
> n times by n mappers all fetching the same data; that would just waste
> network bandwidth. Sqoop + Hive would be my recommendation and is a good
> combination for such use cases. If you have Pig competency, you can also
> look into Pig instead of Hive.
>
> Hope it helps!...
>
> Regards
> Bejoy.K.S
>
> On Tue, Dec 6, 2011 at 1:36 AM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
>
> > Justin
> >         If I understand your requirement correctly, you need to pull in
> > data from multiple RDBMS sources and join it, and maybe do some more
> > custom operations on top of that. For this you don't need to write
> > custom MapReduce code unless it is really required. You can achieve the
> > same in two easy steps:
> > - Import the data from the RDBMS into Hive using Sqoop (import)
> > - Use Hive to do the join and processing on this data
> >
> > Hope it helps!..
> >
> > Regards
> > Bejoy.K.S
> >
> >
> > On Tue, Dec 6, 2011 at 12:13 AM, Justin Vincent <justi...@gmail.com> wrote:
> >
> >> I would like to join some db tables, possibly from different databases,
> >> in an MR job.
> >>
> >> I would essentially like to use MultipleInputs, but that seems file
> >> oriented. I need a different mapper for each db table.
> >>
> >> Suggestions?
> >>
> >> Thanks!
> >>
> >> Justin Vincent
> >>
> >
> >
>
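
Bejoy's caution about n mappers all running the same query can be illustrated with a small sketch. The idea is to give each mapper its own non-overlapping row range, so that together the mappers read every row exactly once. As far as I know, DBInputFormat divides the row count among mappers in much the same way, but the code below is not Hadoop code: the class RangeSplits, its methods, the table name "orders", the column "id", and the MySQL-style LIMIT/OFFSET syntax are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: splits a row count into disjoint ranges. */
final class RangeSplits {

    /**
     * Divides rows [0, rowCount) into numMappers contiguous, non-overlapping
     * ranges. Each element is {startInclusive, endExclusive}. The last range
     * absorbs the remainder so every row is covered exactly once.
     */
    static List<long[]> split(long rowCount, int numMappers) {
        if (rowCount < 0 || numMappers < 1) {
            throw new IllegalArgumentException("rowCount >= 0 and numMappers >= 1 required");
        }
        long chunk = rowCount / numMappers;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            long start = i * chunk;
            long end = (i + 1 == numMappers) ? rowCount : start + chunk;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    /** Builds the paged query one mapper would run (MySQL-style LIMIT/OFFSET assumed). */
    static String pagedQuery(String table, String orderBy, long[] range) {
        return "SELECT * FROM " + table + " ORDER BY " + orderBy
                + " LIMIT " + (range[1] - range[0]) + " OFFSET " + range[0];
    }

    public static void main(String[] args) {
        // Assumed figures for illustration: 10 rows, 3 mappers.
        for (long[] r : split(10, 3)) {
            System.out.println(pagedQuery("orders", "id", r));
        }
    }
}
```

Keeping numMappers small also keeps the number of simultaneous database connections small, which addresses the connection-limit concern raised in the same message.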
