Re: issue with spark and bson input

2014-08-06 Thread Dmitriy Selivanov
Finally I made it work. The trick was the asSubclass method:
val mongoRDD = sc.newAPIHadoopFile(
  "file:///root/jobs/dump/input.bson",
  classOf[BSONFileInputFormat].asSubclass(
    classOf[org.apache.hadoop.mapreduce.lib.input.FileInputFormat[Object, BSONObject]]),
  classOf[Object],
  classOf[BSONObject],
  config)
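For completeness, here is a minimal self-contained sketch of the whole job with the cast in place, assuming mongo-hadoop-connector 1.3.0 and Spark 1.0.0 on the classpath (the object wrapper, the app name, and the final count() are illustrative additions, not from my original code). It appears the cast is needed because BSONFileInputFormat extends FileInputFormat as a raw Java type, so scalac cannot infer K and V from classOf[BSONFileInputFormat] alone; asSubclass re-attaches the type parameters.

import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.BSONFileInputFormat

object BsonInputJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bson-input"))

    val config = new Configuration()
    config.set("mongo.job.input.format", "com.mongodb.hadoop.BSONFileInputFormat")
    config.set("mapred.input.dir", "file:///root/jobs/dump/input.bson")
    config.set("mongo.output.uri", "mongodb://" + args(0) + "/" + args(2))

    // asSubclass presents the raw BSONFileInputFormat as a parameterized
    // FileInputFormat[Object, BSONObject], which satisfies the bound
    // F <: InputFormat[K, V] that newAPIHadoopFile requires.
    val mongoRDD = sc.newAPIHadoopFile(
      "file:///root/jobs/dump/input.bson",
      classOf[BSONFileInputFormat].asSubclass(
        classOf[org.apache.hadoop.mapreduce.lib.input.FileInputFormat[Object, BSONObject]]),
      classOf[Object],
      classOf[BSONObject],
      config)

    println(mongoRDD.count()) // illustrative action, just to materialize the RDD
    sc.stop()
  }
}

(This assumes the mongo-hadoop core and mongo-java-driver jars are on the classpath at submit time.)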


2014-08-06 0:43 GMT+04:00 Dmitriy Selivanov selivanov.dmit...@gmail.com:

 Hello, I have an issue when trying to use a BSON file as Spark input. I am
 using mongo-hadoop-connector 1.3.0 and Spark 1.0.0:
 val sparkConf = new SparkConf()
 val sc = new SparkContext(sparkConf)
 val config = new Configuration()
 config.set("mongo.job.input.format", "com.mongodb.hadoop.BSONFileInputFormat")
 config.set("mapred.input.dir", "file:///root/jobs/dump/input.bson")
 config.set("mongo.output.uri", "mongodb://" + args(0) + "/" + args(2))
 val mongoRDD = sc.newAPIHadoopFile(
   "file:///root/jobs/dump/input.bson",
   classOf[BSONFileInputFormat], classOf[Object], classOf[BSONObject], config)

 But on the last line I receive an error: inferred type arguments
 [Object,org.bson.BSONObject,com.mongodb.hadoop.BSONFileInputFormat] do not
 conform to method newAPIHadoopFile's type parameter bounds [K,V,F <:
 org.apache.hadoop.mapreduce.InputFormat[K,V]]
 This is very strange, because BSONFileInputFormat
 extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
 https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/BSONFileInputFormat.java
 How can I solve this issue?
 I have no problems with com.mongodb.hadoop.MongoInputFormat when using a
 MongoDB collection as input.
 Moreover, there seems to be no problem with the Java API:
 https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
 I'm not a professional Java/Scala developer; please help.

 --
 Regards
 Dmitriy Selivanov

-- 
Regards
Dmitriy Selivanov


issue with spark and bson input

2014-08-05 Thread Dmitriy Selivanov
Hello, I have an issue when trying to use a BSON file as Spark input. I am
using mongo-hadoop-connector 1.3.0 and Spark 1.0.0:
val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val config = new Configuration()
config.set("mongo.job.input.format", "com.mongodb.hadoop.BSONFileInputFormat")
config.set("mapred.input.dir", "file:///root/jobs/dump/input.bson")
config.set("mongo.output.uri", "mongodb://" + args(0) + "/" + args(2))
val mongoRDD = sc.newAPIHadoopFile(
  "file:///root/jobs/dump/input.bson",
  classOf[BSONFileInputFormat], classOf[Object], classOf[BSONObject], config)

But on the last line I receive an error: inferred type arguments
[Object,org.bson.BSONObject,com.mongodb.hadoop.BSONFileInputFormat] do not
conform to method newAPIHadoopFile's type parameter bounds [K,V,F <:
org.apache.hadoop.mapreduce.InputFormat[K,V]]
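For reference, the method I am calling is declared in SparkContext roughly as follows (paraphrased from the Spark 1.0 sources, not quoted verbatim):

def newAPIHadoopFile[K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration): RDD[(K, V)]

The bound F <: InputFormat[K, V] is the constraint the compiler says my call does not satisfy.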
This is very strange, because BSONFileInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/BSONFileInputFormat.java
How can I solve this issue?
I have no problems with com.mongodb.hadoop.MongoInputFormat when using a
MongoDB collection as input.
Moreover, there seems to be no problem with the Java API:
https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
I'm not a professional Java/Scala developer; please help.

-- 
Regards
Dmitriy Selivanov