On SSD you will get around 30-40 MB/s on a single machine (on 4 cores).

Thanks
Best Regards
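The 30-40 MB/s figure presumably refers to scanning mongodump-produced
.bson files with BSONFileInputFormat, the approach shared further down
this thread. BSONFileInputFormat reads dump files off a Hadoop-visible
filesystem, so the path given to newAPIHadoopFile must be an hdfs:// or
file:// URI; handing it a mongodb:// connection string is exactly what
makes Hadoop throw "No FileSystem for scheme: mongodb". A minimal sketch
under those assumptions (the dump path is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.BSONFileInputFormat;

    public class BsonDumpCount {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[4]", "BsonDumpCount");
            Configuration conf = new Configuration();

            // Hypothetical location of a .bson file produced by mongodump;
            // note the filesystem scheme, not mongodb://.
            String dumpUri = "file:///data/dump/mydb/mycollection.bson";

            JavaPairRDD<Object, BSONObject> rdd = sc.newAPIHadoopFile(
                    dumpUri, BSONFileInputFormat.class,
                    Object.class, BSONObject.class, conf);

            System.out.println("records: " + rdd.count());
            sc.stop();
        }
    }

That trades freshness for speed: the job scans flat files at disk
bandwidth instead of paging a live collection through a Mongo cursor.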
On Mon, Aug 31, 2015 at 3:13 PM, Deepesh Maheshwari <
deepesh.maheshwar...@gmail.com> wrote:

> I tried it; it gives the same exception as above:
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme:
> mongodb
>
> In your case, did you use the above code? What read throughput do you get?
>
> On Mon, Aug 31, 2015 at 2:04 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> FYI, newAPIHadoopFile and newAPIHadoopRDD both use the NewHadoopRDD class
>> underneath, so neither is limited to reading from HDFS. Give it a shot if
>> you haven't tried it already (it's just the input format and the record
>> reader that differ from your approach).
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Aug 31, 2015 at 1:14 PM, Deepesh Maheshwari <
>> deepesh.maheshwar...@gmail.com> wrote:
>>
>>> Hi Akhil,
>>>
>>> This code snippet is from the link below:
>>>
>>> https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
>>>
>>> There it reads data from the HDFS file system, but in our case I need
>>> to read from MongoDB.
>>>
>>> I tried it earlier and have now tried it again, but it gives the error
>>> below, which is self-explanatory:
>>>
>>> Exception in thread "main" java.io.IOException: No FileSystem for
>>> scheme: mongodb
>>>
>>> On Mon, Aug 31, 2015 at 1:03 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> Here's a piece of code which works well for us (Spark 1.4.1):
>>>>
>>>> Configuration bsonDataConfig = new Configuration();
>>>> bsonDataConfig.set("mongo.job.input.format",
>>>>         "com.mongodb.hadoop.BSONFileInputFormat");
>>>>
>>>> Configuration predictionsConfig = new Configuration();
>>>> predictionsConfig.set("mongo.output.uri", mongodbUri);
>>>>
>>>> JavaPairRDD<Object, BSONObject> bsonRatingsData =
>>>>         sc.newAPIHadoopFile(
>>>>                 ratingsUri, BSONFileInputFormat.class, Object.class,
>>>>                 BSONObject.class, bsonDataConfig);
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Aug 31, 2015 at 12:59 PM, Deepesh Maheshwari <
>>>> deepesh.maheshwar...@gmail.com> wrote:
>>>>
>>>>> Hi, I am using <spark.version>1.3.0</spark.version>.
>>>>>
>>>>> I am not getting a constructor for the above values.
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> So I tried to shuffle the values in the constructor.
>>>>>
>>>>> [image: Inline image 2]
>>>>>
>>>>> But it gives this error. Please suggest.
>>>>>
>>>>> [image: Inline image 3]
>>>>>
>>>>> Best Regards
>>>>>
>>>>> On Mon, Aug 31, 2015 at 12:43 PM, Akhil Das <
>>>>> ak...@sigmoidanalytics.com> wrote:
>>>>>
>>>>>> Can you try with these key and value classes and see the performance?
>>>>>>
>>>>>> inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
>>>>>>
>>>>>> keyClassName = "org.apache.hadoop.io.Text"
>>>>>> valueClassName = "org.apache.hadoop.io.MapWritable"
>>>>>>
>>>>>> Taken from the Databricks blog
>>>>>> <https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html>
>>>>>>
>>>>>> Thanks
>>>>>> Best Regards
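The snippet quoted just above appears to come from the PySpark API, where
the input format and the key/value classes are passed as strings. The
Java API has no such overload; newAPIHadoopRDD takes Class objects, and
com.mongodb.hadoop.MongoInputFormat is typed <Object, BSONObject> rather
than <Text, MapWritable>, which would explain the missing-constructor
errors in the screenshots. A minimal Java sketch of the call (the
connection string is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.MongoInputFormat;

    public class MongoReadJava {
        public static void main(String[] args) {
            Configuration config = new Configuration();
            // Hypothetical connection string.
            config.set("mongo.input.uri",
                    "mongodb://localhost:27017/mydb.mycollection");

            JavaSparkContext sc = new JavaSparkContext("local[4]", "MongoReadJava");

            // MongoInputFormat is <Object, BSONObject>, so these are the
            // only key/value classes that compile against it.
            JavaPairRDD<Object, BSONObject> rdd = sc.newAPIHadoopRDD(
                    config, MongoInputFormat.class,
                    Object.class, BSONObject.class);

            // Values arrive as BSONObjects; pull fields out explicitly.
            JavaRDD<String> hosts = rdd.map(t -> (String) t._2().get("host"));
            System.out.println("distinct hosts: " + hosts.distinct().count());

            sc.stop();
        }
    }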
>>>>>> On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
>>>>>> deepesh.maheshwar...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, I am trying to read MongoDB in Spark with newAPIHadoopRDD.
>>>>>>>
>>>>>>> /**** Code *****/
>>>>>>>
>>>>>>> config.set("mongo.job.input.format",
>>>>>>>         "com.mongodb.hadoop.MongoInputFormat");
>>>>>>> config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
>>>>>>> config.set("mongo.input.query", "{host: 'abc.com'}");
>>>>>>>
>>>>>>> JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>>>>>>>
>>>>>>> JavaPairRDD<Object, BSONObject> mongoRDD =
>>>>>>>         sc.newAPIHadoopRDD(config,
>>>>>>>                 com.mongodb.hadoop.MongoInputFormat.class,
>>>>>>>                 Object.class,
>>>>>>>                 BSONObject.class);
>>>>>>>
>>>>>>> long count = mongoRDD.count();
>>>>>>>
>>>>>>> There are about 1.5 million records. I am getting the data, but the
>>>>>>> read operation took around 15 minutes for the whole collection.
>>>>>>>
>>>>>>> Is this API really this slow, or am I missing something?
>>>>>>> Please suggest an alternate approach if there is a faster way to
>>>>>>> read data from Mongo.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Deepesh
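On the original performance question, two knobs are worth checking
before blaming the API; a sketch of both follows. First, a master of
"local" runs every task on a single core, while "local[4]" runs four at
a time. Second, the connector decides read parallelism by how it splits
the collection; the sketch assumes the mongo-hadoop option
mongo.input.split_size (split size in MB, smaller values giving more
splits and hence more partitions), so verify that option name against
your connector version:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.MongoInputFormat;

    public class MongoOpsTuned {
        public static void main(String[] args) {
            Configuration config = new Configuration();
            // Hypothetical connection string; query taken from the thread.
            config.set("mongo.input.uri",
                    "mongodb://localhost:27017/mydb.mycollection");
            config.set("mongo.input.query", "{host: 'abc.com'}");
            // Assumed mongo-hadoop option: split size in MB. Smaller splits
            // mean more partitions, i.e. more concurrent read cursors.
            config.set("mongo.input.split_size", "8");

            // "local" = 1 core; "local[4]" = 4 concurrent tasks.
            JavaSparkContext sc = new JavaSparkContext("local[4]", "MongoOpsTuned");

            JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
                    config, MongoInputFormat.class,
                    Object.class, BSONObject.class);

            System.out.println("partitions: " + mongoRDD.partitions().size());
            System.out.println("records:    " + mongoRDD.count());

            sc.stop();
        }
    }

If the count still crawls with many partitions and all cores busy, look
at the server side; an unindexed mongo.input.query can force each split
to scan its whole key range.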