Re: Reading 2 table data in MapReduce for Performing Join

2015-03-18 Thread Suraj Nayak
Hi All,

The patch from https://issues.apache.org/jira/browse/HIVE-4997 helped!

On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak snay...@gmail.com wrote:

 Hi,

 I tried reading data via HCatalog for 1 Hive table in MapReduce using
 something similar to
 https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog.
 I was able to read successfully.

 Now I am trying to read 2 tables, as the requirement is to join them. I
 did not find an API similar to *FileInputFormat.addInputPaths* in
 *HCatInputFormat*. What is the equivalent in HCat?

 I had previously performed a join using FileInputFormat on HDFS (by getting
 the split information in the mapper). This article
 (http://www.codingjunkie.com/mapreduce-reduce-joins/) helped me code the join.
 Can someone suggest how I can perform the join operation using HCatalog?

 Briefly, the aim is to

- Read 2 tables (with similar schemas).
- If a key exists in both tables, send its records to the same reducer.
- Do some processing on the records in the reducer.
- Save the output into a file/Hive table.
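
 To make the steps above concrete, here is a minimal sketch of the
 reduce-side join logic in plain Java. It deliberately uses no Hadoop or
 HCatalog classes: the shuffle is simulated with an in-memory map, and all
 names (ReduceSideJoinSketch, Tagged, the table tags "A" and "B") are
 hypothetical and chosen only for illustration. In a real job the tagging
 would happen in the mapper and the pairing in the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** A value tagged with its source table, as a mapper might emit it. */
    static final class Tagged {
        final String table;
        final String value;

        Tagged(String table, String value) {
            this.table = table;
            this.value = value;
        }
    }

    /** "Map" step: tag each {key, value} row with its table and group by key (simulated shuffle). */
    static void mapAndShuffle(String table, List<String[]> rows,
                              Map<String, List<Tagged>> shuffled) {
        for (String[] row : rows) {
            shuffled.computeIfAbsent(row[0], k -> new ArrayList<>())
                    .add(new Tagged(table, row[1]));
        }
    }

    /** "Reduce" step: for every key present in both tables, emit each left/right pair. */
    static List<String> reduce(Map<String, List<Tagged>> shuffled,
                               String left, String right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> entry : shuffled.entrySet()) {
            List<String> leftValues = new ArrayList<>();
            List<String> rightValues = new ArrayList<>();
            for (Tagged t : entry.getValue()) {
                if (t.table.equals(left)) {
                    leftValues.add(t.value);
                } else if (t.table.equals(right)) {
                    rightValues.add(t.value);
                }
            }
            // Keys found in only one table produce no output (inner join).
            for (String lv : leftValues) {
                for (String rv : rightValues) {
                    out.add(entry.getKey() + "\t" + lv + "\t" + rv);
                }
            }
        }
        return out;
    }

    /** Runs both steps over two in-memory "tables" of {key, value} rows. */
    static List<String> join(List<String[]> tableA, List<String[]> tableB) {
        Map<String, List<Tagged>> shuffled = new TreeMap<>();
        mapAndShuffle("A", tableA, shuffled);
        mapAndShuffle("B", tableB, shuffled);
        return reduce(shuffled, "A", "B");
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.<String[]>asList(
                new String[] {"1", "alice"}, new String[] {"2", "bob"});
        List<String[]> b = Arrays.<String[]>asList(
                new String[] {"2", "x"}, new String[] {"3", "y"});
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

 Only key 2 appears in both sample tables, so it is the only key that
 yields a joined record. The tag plays the same role as the split
 information in the file-based approach: it tells the reducer which table
 each value came from.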

 *P.S.: The reason for using MapReduce to perform the join is a complex
 requirement that can't be solved via Hive/Pig directly.*

 Any help will be greatly appreciated :)

 --
 Thanks
 Suraj Nayak M




-- 
Thanks
Suraj Nayak M


Best practises to learn hadoop for new users

2013-03-24 Thread suraj nayak