I have 3 HTables: Table1, Table2 and Table3. I also have 3 flat files: the first contains keys for Table1, the second contains keys for Table2, and the third contains keys for Table3.
Use case: for every combination of these 3 keys, I need to perform a complex calculation and save the result in another HTable. In other words, I need to calculate values for the following combinations: (1,1,1), (1,1,2), ..., (1,1,N), (1,2,1), (1,3,1), and so on.

So I figured the best way to do this is to start a MapReduce job for each of these combinations. The job will get (Key1, Key2, Key3) as input, then read Table1, Table2 and Table3 with these keys and perform the calculations. Is this the correct approach? If it is, I need to pass Key1, Key2 and Key3 to the Mapper and Reducer. What's the best way to do this?

At this time I don't need to join these tables in MapReduce, but in the future I might have to.

Thanks.

________________________________
From: Kevin Peterson <kpeter...@biz360.com>
To: hbase-u...@hadoop.apache.org
Sent: Thu, October 15, 2009 11:39:22 AM
Subject: Re: Question about MapReduce

On Thu, Oct 15, 2009 at 11:30 AM, Something Something <luckyguy2...@yahoo.com> wrote:

> 1) I don't think TableInputFormat is useful in this case. It looks like it's
> used for scanning columns from a single HTable.
> 2) TableMapReduceUtil: same problem. It seems to work with just one
> table.
> 3) JV recommended NLineInputFormat, but my parameters are not in a file.
> They come from multiple files and are in memory.
>
> I guess what I am looking for is something like an InMemoryInputFormat,
> similar to FileInputFormat and DbInputFormat. There's no such class right
> now.
>
> Worst comes to worst, I can write the parameters into a flat file and use
> FileInputFormat, but that will slow down this process considerably. Is
> there no other way?

So you need to pull input from multiple tables at once? Are you expecting to do a join on these tables? If you explain what the data looks like, we'd understand better. What are your tables, and what would you like to treat as a single input record?
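The "write the parameters into a flat file" fallback mentioned in the thread could look roughly like the sketch below. It is a minimal, hypothetical helper (`KeyCombinations`, `readKeys` and `writeCombinations` are illustrative names, not part of Hadoop or HBase), and it assumes each key file holds one key per line. It streams the full set of (Key1, Key2, Key3) combinations to a single file, one tab-separated combination per line. A single job reading that file with NLineInputFormat would then give each mapper a handful of lines, so every map() call receives its three keys directly in the input record and can look up Table1, Table2 and Table3 itself, instead of one job being launched per combination.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: expands the keys from three flat files into one
 * MapReduce input file with one "key1<TAB>key2<TAB>key3" line per combination.
 */
final class KeyCombinations {

    private KeyCombinations() {}

    /** Reads one key per line, trimming whitespace and skipping blank lines (assumed layout). */
    static List<String> readKeys(Path file) throws IOException {
        List<String> keys = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String key = line.trim();
            if (!key.isEmpty()) {
                keys.add(key);
            }
        }
        return keys;
    }

    /**
     * Writes every (k1, k2, k3) combination as a tab-separated line and returns
     * the number of lines written. Streaming to a Writer avoids holding all
     * N1 * N2 * N3 combinations in memory at once.
     */
    static long writeCombinations(List<String> keys1, List<String> keys2,
                                  List<String> keys3, Writer out) throws IOException {
        long count = 0;
        for (String k1 : keys1) {
            for (String k2 : keys2) {
                for (String k3 : keys3) {
                    out.write(k1);
                    out.write('\t');
                    out.write(k2);
                    out.write('\t');
                    out.write(k3);
                    out.write('\n');
                    count++;
                }
            }
        }
        return count;
    }

    /** Usage: KeyCombinations keys1.txt keys2.txt keys3.txt combos.txt */
    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: KeyCombinations <keys1> <keys2> <keys3> <out>");
            System.exit(1);
        }
        List<String> keys1 = readKeys(Paths.get(args[0]));
        List<String> keys2 = readKeys(Paths.get(args[1]));
        List<String> keys3 = readKeys(Paths.get(args[2]));
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get(args[3]), StandardCharsets.UTF_8)) {
            long n = writeCombinations(keys1, keys2, keys3, out);
            System.out.println("Wrote " + n + " combinations to " + args[3]);
        }
    }
}
```

The output file has N1 × N2 × N3 lines, which is why the sketch writes through a buffered stream rather than building a list first. Whether writing this file is actually slower than some in-memory alternative depends on the key counts; for a few thousand combinations it is usually negligible next to the HBase reads each mapper performs.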