Thanks Bejoy, I was looking at DBInputFormat with MultipleInputs. MultipleInputs takes a Path parameter. Are these paths just ignored here?
On Mon, Dec 5, 2011 at 2:31 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:

> Hi Justin,
> Just to add on to my response: if you need to fetch data from an
> RDBMS on your mapper using your custom MapReduce code, you can use
> DBInputFormat in your mapper class with MultipleInputs. You have to be
> careful with the number of mappers in your application, as databases
> limit the maximum number of simultaneous connections. You also need to
> ensure that the same query is not executed n times in n mappers, all
> fetching the same data; that would just waste network bandwidth.
> Sqoop + Hive would be my recommendation and is a good combination for
> such use cases. If you have Pig competency, you can also look into Pig
> instead of Hive.
>
> Hope it helps!
>
> Regards
> Bejoy.K.S
>
> On Tue, Dec 6, 2011 at 1:36 AM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
>
> > Justin
> > If I get your requirement right, you need to get data in from
> > multiple RDBMS sources and do a join on it, and maybe some more
> > custom operations on top of this. For this you don't need to write
> > your own custom MapReduce code unless it is really required. You can
> > achieve the same in two easy steps:
> > - Import data from RDBMS into Hive using Sqoop (import)
> > - Use Hive to do the join and processing on this data
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy.K.S
> >
> > On Tue, Dec 6, 2011 at 12:13 AM, Justin Vincent <justi...@gmail.com> wrote:
> >
> >> I would like to join some db tables, possibly from different
> >> databases, in a MR job.
> >>
> >> I would essentially like to use MultipleInputs, but that seems file
> >> oriented. I need a different mapper for each db table.
> >>
> >> Suggestions?
> >>
> >> Thanks!
> >>
> >> Justin Vincent
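
A minimal sketch of the point above about not running the same query in every mapper: the idea is to give each mapper a disjoint slice of the rows, for example with LIMIT/OFFSET. DBInputFormat is believed to do something along these lines internally, but the code below is plain Java and not Hadoop's implementation. The class `SplitRanges`, its methods, and the `customers` table are hypothetical names used only for illustration, and the LIMIT/OFFSET syntax assumes a database that supports it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: shows how n mappers can each
// read a disjoint slice of a table instead of all running the same query.
class SplitRanges {

    // Returns one {offset, length} pair per mapper. Together the pairs
    // cover rows [0, rowCount) exactly once; the last mapper takes the
    // remainder when rowCount is not divisible by the mapper count.
    static List<long[]> ranges(long rowCount, int mappers) {
        if (rowCount < 0 || mappers < 1) {
            throw new IllegalArgumentException("rowCount >= 0 and mappers >= 1 required");
        }
        List<long[]> out = new ArrayList<>();
        long chunk = rowCount / mappers;
        for (int i = 0; i < mappers; i++) {
            long start = i * chunk;
            long end = (i + 1 == mappers) ? rowCount : start + chunk;
            out.add(new long[] {start, end - start});
        }
        return out;
    }

    // Appends a LIMIT/OFFSET clause so a mapper fetches only its own slice.
    // The base query should have a stable ORDER BY so slices do not overlap.
    static String sliceQuery(String baseQuery, long[] range) {
        return baseQuery + " LIMIT " + range[1] + " OFFSET " + range[0];
    }

    public static void main(String[] args) {
        // Hypothetical table and columns, for illustration only.
        String base = "SELECT id, name FROM customers ORDER BY id";
        for (long[] r : ranges(10, 3)) {
            System.out.println(sliceQuery(base, r));
        }
    }
}
```

With 10 rows and 3 mappers this yields slices of 3, 3 and 4 rows. Each mapper still opens its own connection, so the number of mappers should stay below the database's connection limit.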