Re: Slow Mongo Read from Spark

2015-09-03 Thread Jörn Franke
You might think about another storage layer that is not MongoDB
(HDFS+ORC+compression or HDFS+Parquet+compression) to improve performance.
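
For illustration only (an editorial sketch, not code from this thread),
assuming the Spark 1.3-era Java API; the DataFrame and paths below are
placeholders:

// imports assumed: org.apache.spark.sql.SQLContext, org.apache.spark.sql.DataFrame
SQLContext sqlContext = new SQLContext(sc);  // sc is the existing JavaSparkContext

// One-off export: persist the data as columnar, compressed Parquet on HDFS.
eventsDf.saveAsParquetFile("hdfs:///data/events.parquet");

// Later jobs read the Parquet files instead of hitting MongoDB.
DataFrame events = sqlContext.parquetFile("hdfs:///data/events.parquet");
long matches = events.filter("host = 'abc.com'").count();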



Re: Slow Mongo Read from Spark

2015-09-03 Thread Deepesh Maheshwari
Because of the existing architecture, I am bound to use MongoDB.

Please suggest an approach for this.
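
If MongoDB has to stay, one thing worth checking is whether the read is split
into enough partitions and whether unneeded fields can be projected away. A
minimal sketch, assuming the mongo-hadoop connector honours its
mongo.input.split_size and mongo.input.fields options (the values and field
names here are illustrative):

Configuration config = new Configuration();
config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
config.set("mongo.input.query", "{host: 'abc.com'}");
// Split size in MB: a smaller value yields more input splits and therefore
// more parallel read tasks (assumed connector option).
config.set("mongo.input.split_size", "4");
// Project only the fields the job needs to cut transfer volume (assumed
// connector option; field names are illustrative).
config.set("mongo.input.fields", "{host: 1, time: 1}");

JavaPairRDD<Object, BSONObject> mongoRDD =
    sc.newAPIHadoopRDD(config, com.mongodb.hadoop.MongoInputFormat.class,
        Object.class, BSONObject.class);

Note also that new JavaSparkContext("local", "MongoOps"), as used earlier in
the thread, runs with a single worker thread; a "local[*]" or cluster master
is needed before extra splits can actually be read in parallel.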



Re: Slow Mongo Read from Spark

2015-09-03 Thread Akhil Das
On SSD you will get around 30-40 MB/s on a single machine (on 4 cores).
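
As a rough, editorial back-of-the-envelope (not a figure from the thread):
1.5 million documents in ~15 minutes is about 1,500,000 / 900 s ≈ 1,700
documents per second; if documents average around 1 KB (an assumption), that
is only ~1.7 MB/s, far below the 30-40 MB/s above, which points at the read
path (number of splits, query selectivity, the single-threaded "local"
master) rather than raw disk speed.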

Thanks
Best Regards

On Mon, Aug 31, 2015 at 3:13 PM, Deepesh Maheshwari <
deepesh.maheshwar...@gmail.com> wrote:

> Tried it; it gives the same exception as above:
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme:
> mongodb
>
> In your case, did you use the above code?
> What read throughput do you get?
>


Re: Slow Mongo Read from Spark

2015-08-31 Thread Akhil Das
Can you try with these key value classes and see the performance?

inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"


keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"


Taken from the Databricks blog


Thanks
Best Regards

On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
deepesh.maheshwar...@gmail.com> wrote:

> Hi, I am trying to read MongoDB in Spark via newAPIHadoopRDD.
>
> /* Code */
>
> config.set("mongo.job.input.format",
> "com.mongodb.hadoop.MongoInputFormat");
> config.set("mongo.input.uri",SparkProperties.MONGO_OUTPUT_URI);
> config.set("mongo.input.query","{host: 'abc.com'}");
>
> JavaSparkContext sc=new JavaSparkContext("local", "MongoOps");
>
> JavaPairRDD<Object, BSONObject> mongoRDD =
>     sc.newAPIHadoopRDD(config, com.mongodb.hadoop.MongoInputFormat.class,
>         Object.class, BSONObject.class);
>
> long count=mongoRDD.count();
>
> There are about 1.5 million records.
> Though I am getting the data, the read operation took around 15 minutes to
> read the whole set.
>
> Is this API really too slow, or am I missing something?
> Please suggest if there is a faster alternate approach to read data from
> Mongo.
>
> Thanks,
> Deepesh
>
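
As a follow-up illustration (an editorial sketch, not code from the thread;
the field name and imports are assumed), one way to check whether the read
above is actually split across partitions, and to pull a field out of the
BSONObject values:

// Number of input splits the connector produced; 1 means no read parallelism.
System.out.println("partitions: " + mongoRDD.partitions().size());

// Extract a single field from each document (JavaRDD, Function and
// scala.Tuple2 assumed imported).
JavaRDD<String> hosts = mongoRDD.map(
    new Function<Tuple2<Object, BSONObject>, String>() {
        public String call(Tuple2<Object, BSONObject> t) {
            return (String) t._2().get("host");
        }
    });
System.out.println("records: " + hosts.count());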


Re: Slow Mongo Read from Spark

2015-08-31 Thread Akhil Das
Here's a piece of code which works well for us (Spark 1.4.1):

Configuration bsonDataConfig = new Configuration();
bsonDataConfig.set("mongo.job.input.format",
"com.mongodb.hadoop.BSONFileInputFormat");

Configuration predictionsConfig = new Configuration();
predictionsConfig.set("mongo.output.uri", mongodbUri);

JavaPairRDD<Object, BSONObject> bsonRatingsData =
    sc.newAPIHadoopFile(ratingsUri, BSONFileInputFormat.class,
        Object.class, BSONObject.class, bsonDataConfig);
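
One detail worth calling out (editorial note; the path is illustrative):
BSONFileInputFormat reads static .bson dump files, so ratingsUri above is
expected to be a Hadoop-visible filesystem path such as
"hdfs:///dumps/ratings.bson" (e.g. mongodump output copied to HDFS), not a
mongodb:// connection URI.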


Thanks
Best Regards

On Mon, Aug 31, 2015 at 12:59 PM, Deepesh Maheshwari <
deepesh.maheshwar...@gmail.com> wrote:

> Hi, I am using Spark 1.3.0.
>
> I am not getting a constructor for the above values.
>
> [image: Inline image 1]
>
> So, I tried to shuffle the values in the constructor.
> [image: Inline image 2]
>
> But it is giving this error. Please suggest.
> [image: Inline image 3]
>
> Best Regards
>


Re: Slow Mongo Read from Spark

2015-08-31 Thread Akhil Das
FYI, newAPIHadoopFile and newAPIHadoopRDD both use the NewHadoopRDD class
underneath, which doesn't mean they will only read from HDFS. Give it a shot
if you haven't tried it already (it is just the InputFormat and the record
reader that differ from your approach).
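
To make the distinction concrete (an editorial sketch, not code from the
thread; URIs are placeholders, and config / bsonDataConfig are the
Configuration objects from the earlier messages): only the file-based path
goes through Hadoop's FileSystem API, which is why a mongodb:// URI passed to
newAPIHadoopFile fails with "No FileSystem for scheme: mongodb".

// (a) Live read from MongoDB: the URI sits in the Configuration, so no
//     Hadoop FileSystem lookup for "mongodb" ever happens.
config.set("mongo.input.uri", "mongodb://host:27017/db.collection");
JavaPairRDD<Object, BSONObject> live =
    sc.newAPIHadoopRDD(config, com.mongodb.hadoop.MongoInputFormat.class,
        Object.class, BSONObject.class);

// (b) Offline read of a mongodump .bson file copied to HDFS: here the path
//     really is a filesystem URI, which is what newAPIHadoopFile expects.
JavaPairRDD<Object, BSONObject> dump =
    sc.newAPIHadoopFile("hdfs:///dumps/db/collection.bson",
        com.mongodb.hadoop.BSONFileInputFormat.class,
        Object.class, BSONObject.class, bsonDataConfig);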

Thanks
Best Regards



Re: Slow Mongo Read from Spark

2015-08-31 Thread Deepesh Maheshwari
Hi Akhil,

This code snippet is from the link below:
https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java

Here it is reading data from the HDFS file system, but in our case I need to
read from MongoDB.

I tried it earlier and have now tried it again, but it gives the below error,
which is self-explanatory.

Exception in thread "main" java.io.IOException: No FileSystem for scheme:
mongodb
