Because of the existing architecture, I am bound to use MongoDB. Please suggest an approach for this.
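One direction while staying on MongoDB is to make the scan itself cheaper on the same newAPIHadoopRDD path used later in this thread: push the filter down with mongo.input.query, project only the needed fields, and lower the split size so more partitions read in parallel. Below is a minimal, untested sketch; the URI, field names, and split size are placeholders, and the mongo.input.fields / mongo.input.split_size options are assumed to behave as described in the mongo-hadoop connector documentation.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;

public class MongoReadTuned {
    public static void main(String[] args) {
        Configuration config = new Configuration();
        // Hypothetical URI -- replace with your own database.collection.
        config.set("mongo.input.uri", "mongodb://localhost:27017/db.events");
        // Push the filter down to MongoDB instead of scanning everything.
        config.set("mongo.input.query", "{host: 'abc.com'}");
        // Assumed option: project only needed fields to shrink each document.
        config.set("mongo.input.fields", "{host: 1, time: 1}");
        // Assumed option: split size in MB; smaller splits -> more partitions.
        config.set("mongo.input.split_size", "4");

        // "local" runs a single task thread; "local[*]" uses all cores.
        JavaSparkContext sc = new JavaSparkContext("local[*]", "MongoOps");

        JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
                config,
                com.mongodb.hadoop.MongoInputFormat.class,
                Object.class,
                BSONObject.class);

        System.out.println("count = " + mongoRDD.count());
        sc.stop();
    }
}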
On Thu, Sep 3, 2015 at 9:10 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> You might think about another storage layer instead of MongoDB
> (HDFS+ORC+compression or HDFS+Parquet+compression) to improve
> performance.
>
> On Thu, Sep 3, 2015 at 9:15 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> On SSD you will get around 30-40 MB/s on a single machine (on 4 cores).
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Aug 31, 2015 at 3:13 PM, Deepesh Maheshwari <
>> deepesh.maheshwar...@gmail.com> wrote:
>>
>>> Tried it; it gives the same exception as above:
>>>
>>> Exception in thread "main" java.io.IOException: No FileSystem for
>>> scheme: mongodb
>>>
>>> In your case, did you use the above code? What read throughput do
>>> you get?
>>>
>>> On Mon, Aug 31, 2015 at 2:04 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> FYI, newAPIHadoopFile and newAPIHadoopRDD both use the NewHadoopRDD
>>>> class underneath, and that does not mean they can only read from
>>>> HDFS. Give it a shot if you haven't tried it already (it is just the
>>>> input format and the reader that differ from your approach).
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Aug 31, 2015 at 1:14 PM, Deepesh Maheshwari <
>>>> deepesh.maheshwar...@gmail.com> wrote:
>>>>
>>>>> Hi Akhil,
>>>>>
>>>>> This code snippet is from the link below:
>>>>>
>>>>> https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
>>>>>
>>>>> There it reads data from the HDFS file system, but in our case I
>>>>> need to read from MongoDB.
>>>>>
>>>>> I tried it earlier and have now tried it again, but it gives the
>>>>> error below, which is self-explanatory:
>>>>>
>>>>> Exception in thread "main" java.io.IOException: No FileSystem for
>>>>> scheme: mongodb
>>>>>
>>>>> On Mon, Aug 31, 2015 at 1:03 PM, Akhil Das
>>>>> <ak...@sigmoidanalytics.com> wrote:
>>>>>
>>>>>> Here's a piece of code which works well for us (Spark 1.4.1):
>>>>>>
>>>>>> Configuration bsonDataConfig = new Configuration();
>>>>>> bsonDataConfig.set("mongo.job.input.format",
>>>>>>         "com.mongodb.hadoop.BSONFileInputFormat");
>>>>>>
>>>>>> Configuration predictionsConfig = new Configuration();
>>>>>> predictionsConfig.set("mongo.output.uri", mongodbUri);
>>>>>>
>>>>>> JavaPairRDD<Object, BSONObject> bsonRatingsData =
>>>>>>     sc.newAPIHadoopFile(
>>>>>>         ratingsUri, BSONFileInputFormat.class, Object.class,
>>>>>>         BSONObject.class, bsonDataConfig);
>>>>>>
>>>>>> Thanks
>>>>>> Best Regards
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 12:59 PM, Deepesh Maheshwari <
>>>>>> deepesh.maheshwar...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, I am using <spark.version>1.3.0</spark.version>.
>>>>>>>
>>>>>>> I am not getting a constructor for the above values.
>>>>>>>
>>>>>>> [image: Inline image 1]
>>>>>>>
>>>>>>> So I tried to shuffle the values in the constructor:
>>>>>>>
>>>>>>> [image: Inline image 2]
>>>>>>>
>>>>>>> But it gives this error. Please suggest.
>>>>>>>
>>>>>>> [image: Inline image 3]
>>>>>>>
>>>>>>> Best Regards
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 12:43 PM, Akhil Das <
>>>>>>> ak...@sigmoidanalytics.com> wrote:
>>>>>>>
>>>>>>>> Can you try with these key value classes and see the performance?
>>>>>>>>
>>>>>>>> inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
>>>>>>>>
>>>>>>>> keyClassName = "org.apache.hadoop.io.Text"
>>>>>>>> valueClassName = "org.apache.hadoop.io.MapWritable"
>>>>>>>>
>>>>>>>> Taken from the Databricks blog:
>>>>>>>> https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
>>>>>>>> deepesh.maheshwar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, I am trying to read MongoDB in Spark via newAPIHadoopRDD.
>>>>>>>>>
>>>>>>>>> /**** Code *****/
>>>>>>>>>
>>>>>>>>> config.set("mongo.job.input.format",
>>>>>>>>>         "com.mongodb.hadoop.MongoInputFormat");
>>>>>>>>> config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
>>>>>>>>> config.set("mongo.input.query", "{host: 'abc.com'}");
>>>>>>>>>
>>>>>>>>> JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>>>>>>>>>
>>>>>>>>> JavaPairRDD<Object, BSONObject> mongoRDD =
>>>>>>>>>     sc.newAPIHadoopRDD(config,
>>>>>>>>>         com.mongodb.hadoop.MongoInputFormat.class,
>>>>>>>>>         Object.class,
>>>>>>>>>         BSONObject.class);
>>>>>>>>>
>>>>>>>>> long count = mongoRDD.count();
>>>>>>>>>
>>>>>>>>> There are about 1.5 million records. I am getting the data, but
>>>>>>>>> the read operation took around 15 minutes for the whole
>>>>>>>>> collection.
>>>>>>>>>
>>>>>>>>> Is this API really that slow, or am I missing something?
>>>>>>>>> Please suggest if there is a faster approach to read data from
>>>>>>>>> MongoDB.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Deepesh
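A note on the recurring error in this thread: "No FileSystem for scheme: mongodb" is raised because newAPIHadoopFile resolves its path through Hadoop's FileSystem API, and no FileSystem is registered for the mongodb:// scheme. BSONFileInputFormat reads static .bson dump files (mongodump output) from a Hadoop-visible filesystem such as hdfs:// or file://, not a live server. Below is a minimal, untested sketch mirroring the snippet quoted above; the dump path and class name are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;

import com.mongodb.hadoop.BSONFileInputFormat;

public class BsonDumpRead {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "BsonDumpRead");

        Configuration bsonDataConfig = new Configuration();
        bsonDataConfig.set("mongo.job.input.format",
                "com.mongodb.hadoop.BSONFileInputFormat");

        // Hypothetical path: must be a real filesystem URI (hdfs:// or
        // file://) pointing at mongodump output, never mongodb://.
        String dumpUri = "hdfs:///dumps/mydb/events.bson";

        JavaPairRDD<Object, BSONObject> bsonData = sc.newAPIHadoopFile(
                dumpUri,
                BSONFileInputFormat.class,
                Object.class,
                BSONObject.class,
                bsonDataConfig);

        System.out.println("count = " + bsonData.count());
        sc.stop();
    }
}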