Couple of things:

1. Hadoop's strength is data locality: with HDFS, the computation is shipped to the nodes that hold the data. So you want most of your Hadoop heavy lifting to run against the local filesystem (HDFS).

2. Assuming you pull data from Mongo into Hadoop only as the first step of your workflow, and put the resulting data back into Mongo only as the last step, you are basically looking for a MongoInputFormat and a MongoOutputFormat (I made up the class names). You are probably looking for https://jira.mongodb.org/browse/HADOOP/component/10736
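To make the shape of point 2 concrete, here is a rough in-memory sketch in plain Java. It does not use Hadoop or MongoDB classes at all: a list of maps stands in for the documents an input format would hand to the mapper, and another list stands in for what an output format would write back. The class name, method names, and the "user" field are all made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a Mongo -> Hadoop -> Mongo job.
// No Hadoop or MongoDB classes are used; every name below is hypothetical.
class MongoJobSketch {

    // Map step: emit (fieldValue, 1) for every input document that has the field.
    static List<Map.Entry<String, Integer>> map(List<Map<String, Object>> docs, String field) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            if (value != null) {
                pairs.add(new AbstractMap.SimpleEntry<>(value.toString(), 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum the emitted counts per key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Output step: turn each (key, total) into a document to write back.
    static List<Map<String, Object>> toDocuments(Map<String, Integer> totals) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("_id", e.getKey());
            doc.put("count", e.getValue());
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample input documents; in a real job these would come from Mongo.
        List<Map<String, Object>> input = new ArrayList<>();
        for (String user : new String[] {"a", "b", "a"}) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("user", user);
            input.add(doc);
        }
        // Read -> map -> reduce -> write, with stdout standing in for the write.
        for (Map<String, Object> doc : toDocuments(reduce(map(input, "user")))) {
            System.out.println(doc);
        }
    }
}
```

In a real job, only the two ends change: the input list would be replaced by whatever input format reads from Mongo, and the final loop by whatever output format writes to it. The map and reduce logic in the middle stays ordinary Hadoop code.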
Your other option, if you are using Pig or Hive, is to write Loader UDFs, similar to PigStorage, HBaseStorage, etc.

-Ayon

________________________________
From: Martinus Martinus <martinus...@gmail.com>
To: hdfs-user@hadoop.apache.org
Sent: Tuesday, December 20, 2011 7:31 PM
Subject: hadoop cluster for querying data on mongodb

Hi,

I have a Hadoop cluster running, and my data is inside a MongoDB database. I have already written Java code to query the data in MongoDB using the mongodb-java driver. Now I want to use the Hadoop cluster to run my Java code, so that it gets data from and puts data into the Mongo database. Has anyone done this before? Can you explain to me how to do it?

Thanks.