[ 
https://issues.apache.org/jira/browse/SPARK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Kumar Varma updated SPARK-1443:
-------------------------------------

    Summary: Unable to Access MongoDB GridFS data with Spark using mongo-hadoop 
API  (was: Accessing Mongo GridFS data with Spark using mongo-hadoop API)

> Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API
> ----------------------------------------------------------------------
>
>                 Key: SPARK-1443
>                 URL: https://issues.apache.org/jira/browse/SPARK-1443
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Java API, Spark Core
>    Affects Versions: 0.9.0
>         Environment: Java 1.7, Hadoop 2.2.0, Spark 0.9.0, Ubuntu 12.04
>            Reporter: Pavan Kumar Varma
>            Priority: Critical
>             Fix For: 0.9.0
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I saved a 2GB PDF file into MongoDB using GridFS. Now I want to process that 
> GridFS collection data using the Java Spark MapReduce API. I have previously 
> processed MongoDB collections successfully with Apache Spark using the 
> Mongo-Hadoop connector. However, I'm unable to read GridFS collections with 
> the following code:
> MongoConfigUtil.setInputURI(config,
>     "mongodb://localhost:27017/pdfbooks.fs.chunks");
> MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>     com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>     BSONObject.class);
> JavaRDD<String> words = mongoRDD.flatMap(
>     new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
>         @Override
>         public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
>             System.out.println(arg._2.toString());
>             ...
> Please provide a better API to access MongoDB GridFS data.
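For context: reading fs.chunks with MongoInputFormat yields one BSON document per chunk, not the original file, so any consumer must reorder and concatenate the chunks itself. Below is a minimal, self-contained sketch of that reassembly step. It uses plain in-memory pairs rather than the connector's BSONObject values, so the class and method names (GridFsReassembly, reassemble) are illustrative; only the field name "n" (the chunk sequence number) and the idea of a binary "data" payload come from the GridFS chunk layout.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of client-side GridFS chunk reassembly. In the real Spark job,
// each (n, data) pair would be pulled out of a BSONObject produced by
// the mongoRDD over fs.chunks; here they are plain objects so the
// ordering/concatenation logic can run standalone.
public class GridFsReassembly {

    // One GridFS chunk: its sequence number "n" and its binary payload.
    static final class Chunk {
        final int n;
        final byte[] data;
        Chunk(int n, byte[] data) { this.n = n; this.data = data; }
    }

    // Chunks can arrive in any order from a distributed read, so sort
    // by the sequence number before concatenating the payloads.
    static byte[] reassemble(List<Chunk> chunks) {
        chunks.sort(Comparator.comparingInt(c -> c.n));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : chunks) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Chunks deliberately out of order, as a parallel read may return them.
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(1, "world".getBytes()));
        chunks.add(new Chunk(0, "hello ".getBytes()));
        System.out.println(new String(reassemble(chunks)));
    }
}
```

Note that for a single multi-gigabyte file this concatenation must happen on one node (or be streamed), which is exactly why a dedicated GridFS-aware input format, as the reporter requests, would be preferable to reading raw fs.chunks documents.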



--
This message was sent by Atlassian JIRA
(v6.2#6252)