Re: hadoop cluster for querying data on mongodb

Ayon Sinha Tue, 20 Dec 2011 21:13:20 -0800

A couple of things:
1. Hadoop's strength is data locality, so you want most of your Hadoop heavy 
lifting to happen on the local filesystem (HDFS, where Hadoop ships the 
computation to the nodes that hold the data).
2. Assuming you pull data from Mongo into Hadoop to crunch it, and put the 
resulting data back into Mongo, only as the first and last steps of your entire 
workflow, you are basically looking for a MongoInputFormat and a 
MongoOutputFormat (I made up the class names). You are probably looking for 
https://jira.mongodb.org/browse/HADOOP/component/10736
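
To make the shape of point 2 concrete, here is a minimal plain-Java sketch of 
that workflow: read documents once, crunch them, write the results back once. 
Everything in it is an assumption for illustration: DocumentSource and 
DocumentSink are hypothetical stand-ins for the input and output format roles, 
not the API of any real connector, and the in-memory list stands in for a 
Mongo collection. In a real job the middle step would run as map and reduce 
tasks on the cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MongoRoundTripSketch {

    /** Hypothetical stand-in for the input side (the "MongoInputFormat" role). */
    interface DocumentSource {
        List<Map<String, Object>> readAll();
    }

    /** Hypothetical stand-in for the output side (the "MongoOutputFormat" role). */
    interface DocumentSink {
        void write(String key, long value);
    }

    /** The "crunch" step: count documents per distinct value of groupField. */
    static Map<String, Long> countBy(List<Map<String, Object>> docs, String groupField) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(groupField);
            if (value == null) {
                continue; // skip documents that lack the field
            }
            counts.merge(value.toString(), 1L, Long::sum);
        }
        return counts;
    }

    /** Touch the source only in the first step and the sink only in the last. */
    static void run(DocumentSource source, DocumentSink sink, String groupField) {
        List<Map<String, Object>> docs = source.readAll();    // first step: pull in
        Map<String, Long> counts = countBy(docs, groupField); // middle: crunch
        counts.forEach(sink::write);                          // last step: push out
    }

    /** Small helper to build a one-field document. */
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("city", "Jakarta"));
        docs.add(doc("city", "Surabaya"));
        docs.add(doc("city", "Jakarta"));

        Map<String, Long> written = new TreeMap<>();
        run(() -> docs, written::put, "city");
        System.out.println(written); // prints {Jakarta=2, Surabaya=1}
    }
}
```

The point of keeping the source and sink behind narrow interfaces is that the 
crunching code never talks to Mongo directly, so the intermediate steps can 
stay on HDFS as described in point 1.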


Your other option, if you are using Pig or Hive, is to write loader UDFs, 
similar to PigStorage, HBaseStorage, etc.
 
-Ayon



________________________________
 From: Martinus Martinus <martinus...@gmail.com>
To: hdfs-user@hadoop.apache.org 
Sent: Tuesday, December 20, 2011 7:31 PM
Subject: hadoop cluster for querying data on mongodb
 

Hi,

I have a Hadoop cluster running and my data is in a MongoDB database. I have 
already written Java code to query the data in MongoDB using the mongodb-java 
driver. Now I want to use the Hadoop cluster to run my Java code to read data 
from and write data to the Mongo database. Has anyone done this before? Can 
you explain to me how to do that?

Thanks.
