I finally made it work. The trick was the asSubclass method:
val mongoRDD = sc.newAPIHadoopFile("file:///root/jobs/dump/input.bson",
  classOf[BSONFileInputFormat].asSubclass(
    classOf[org.apache.hadoop.mapreduce.lib.input.FileInputFormat[Object, BSONObject]]),
  classOf[Object], classOf[BSONObject], config)
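For anyone hitting the same error, here is a minimal sketch of what asSubclass does, using only JDK classes as stand-ins for the Hadoop ones (ArrayList plays the role of BSONFileInputFormat, java.util.List the role of FileInputFormat, and the name AsSubclassDemo is made up for illustration). The likely cause of the original error, which is an assumption and not confirmed in this thread, is that the Java class extends FileInputFormat without type arguments, so Scala cannot see that its key and value types are Object and BSONObject. asSubclass does not change the class at all; it only gives the Class object a more specific static type after a runtime check.

```scala
import java.util.{ArrayList, List => JList}

object AsSubclassDemo {
  // A Class object whose static type carries no element type,
  // standing in for classOf[BSONFileInputFormat].
  val raw: Class[_] = classOf[ArrayList[_]]

  // asSubclass checks at runtime that the class is assignable to JList
  // and returns the same Class object with a narrower static type.
  val typed: Class[_ <: JList[String]] = raw.asSubclass(classOf[JList[String]])

  // For a class that is not a subclass, asSubclass throws ClassCastException.
  def rejectsUnrelated: Boolean =
    try {
      classOf[String].asSubclass(classOf[JList[String]])
      false
    } catch {
      case _: ClassCastException => true
    }

  def main(args: Array[String]): Unit = {
    println(typed eq raw)      // same Class object, only the static type differs
    println(rejectsUnrelated)  // String is not a java.util.List
  }
}
```

Because of type erasure the runtime check only looks at the class itself (List, not List[String]), so the type arguments are taken on trust; a wrong choice would show up later as a ClassCastException when records are read.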
2014-08-06 0:43 GMT+04:00 Dmitriy Selivanov selivanov.dmit...@gmail.com:
Hello, I have an issue when trying to use a BSON file as Spark input. I use
mongo-hadoop-connector 1.3.0 and Spark 1.0.0:
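import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.bson.BSONObject
import com.mongodb.hadoop.BSONFileInputFormat

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val config = new Configuration()
config.set("mongo.job.input.format", "com.mongodb.hadoop.BSONFileInputFormat")
config.set("mapred.input.dir", "file:///root/jobs/dump/input.bson")
config.set("mongo.output.uri", "mongodb://" + args(0) + "/" + args(2))
// This call is the one that fails to compile:
val mongoRDD = sc.newAPIHadoopFile("file:///root/jobs/dump/input.bson",
  classOf[BSONFileInputFormat], classOf[Object], classOf[BSONObject], config)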
But on the last line I receive this error: inferred type arguments
[Object,org.bson.BSONObject,com.mongodb.hadoop.BSONFileInputFormat] do not
conform to method newAPIHadoopFile's type parameter bounds [K,V,F <:
org.apache.hadoop.mapreduce.InputFormat[K,V]]
This is very strange, because BSONFileInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/BSONFileInputFormat.java
How can I solve this issue?
I have no problems with com.mongodb.hadoop.MongoInputFormat when using
a MongoDB collection as input.
Moreover, there seems to be no problem with the Java API:
https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
I'm not a professional Java/Scala developer, so please help.
--
Regards
Dmitriy Selivanov