[ https://issues.apache.org/jira/browse/SPARK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Farrellee resolved SPARK-1443.
--------------------------------------
    Resolution: Done
    Fix Version/s:     (was: 0.9.0)

> Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API
> ----------------------------------------------------------------------
>
>                 Key: SPARK-1443
>                 URL: https://issues.apache.org/jira/browse/SPARK-1443
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Java API, Spark Core
>    Affects Versions: 0.9.0
>         Environment: Java 1.7, Hadoop 2.2.0, Spark 0.9.0, Ubuntu 12.04
>            Reporter: Pavan Kumar Varma
>            Priority: Critical
>              Labels: GridFS, MongoDB, Spark, hadoop2, java
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process that
> GridFS collection data with the Spark Java MapReduce API. I have previously
> processed MongoDB collections successfully with Apache Spark using the
> mongo-hadoop connector, but I am unable to read GridFS collections with the
> following code:
>
> MongoConfigUtil.setInputURI(config,
>     "mongodb://localhost:27017/pdfbooks.fs.chunks");
> MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>     com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>     BSONObject.class);
> JavaRDD<String> words = mongoRDD.flatMap(
>     new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
>       @Override
>       public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
>         System.out.println(arg._2.toString());
>         ...
>
> Please suggest/provide better API methods to access MongoDB GridFS data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
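For context on the report above: reading `fs.chunks` directly (as the reporter's snippet does) yields raw chunk documents, not the file, so the chunks must be grouped by file and reassembled in order. The core of that step can be sketched in plain Java with no MongoDB or Spark dependency. This is an illustrative sketch, not connector API: the field names `files_id`, `n`, and `data` follow the GridFS chunk schema, while the in-memory `Map`s stand in for the `BSONObject` values the connector would actually deliver.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GridFSReassemble {

    // Build a stand-in for a GridFS chunk document:
    // files_id = owning file, n = chunk index, data = chunk bytes.
    static Map<String, Object> chunk(Object filesId, int n, byte[] data) {
        Map<String, Object> c = new HashMap<>();
        c.put("files_id", filesId);
        c.put("n", n);
        c.put("data", data);
        return c;
    }

    // Reassemble one file's bytes from its chunk documents. Chunks may
    // arrive in any order (as a Spark partition might see them), so we
    // filter on files_id and sort by the chunk index n before concatenating.
    static byte[] reassemble(List<Map<String, Object>> chunks, Object filesId) {
        List<Map<String, Object>> mine = new ArrayList<>();
        for (Map<String, Object> c : chunks) {
            if (c.get("files_id").equals(filesId)) {
                mine.add(c);
            }
        }
        mine.sort(Comparator.comparingInt(c -> (Integer) c.get("n")));

        int total = 0;
        for (Map<String, Object> c : mine) {
            total += ((byte[]) c.get("data")).length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Map<String, Object> c : mine) {
            byte[] d = (byte[]) c.get("data");
            System.arraycopy(d, 0, out, pos, d.length);
            pos += d.length;
        }
        return out;
    }

    public static void main(String[] args) {
        // Chunks deliberately out of order to show the sort is what matters.
        List<Map<String, Object>> chunks = new ArrayList<>();
        chunks.add(chunk("book1", 1, "World".getBytes()));
        chunks.add(chunk("book1", 0, "Hello ".getBytes()));
        System.out.println(new String(reassemble(chunks, "book1")));
    }
}
```

For whole-file access outside Spark, the MongoDB Java driver's `GridFSBucket` API is the usual route; the sketch above only illustrates why pointing the connector at `fs.chunks` alone is not sufficient.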