Re: Integrating Spark Streaming with Reactive Mongo

2015-02-26 Thread Tathagata Das
Hey Mike,

I quickly looked through the example and found a major performance issue.
You are collecting the RDDs to the driver and then sending them to Mongo in
a foreach. Why not do a distributed push to Mongo?

WHAT YOU HAVE
val mongoConnection = ...

WHAT YOU SHOULD DO

rdd.foreachPartition { iterator =>
  // Runs on the executors: one connection per partition, not one per record
  val connection = createConnection()
  iterator.foreach { ... push partition using connection ...  }
}
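The per-partition pattern can be illustrated without a cluster. The sketch below is not Spark code: it assumes a `Seq[Seq[String]]` standing in for an RDD's partitions, a hypothetical `FakeConnection` standing in for a Mongo client, and a hypothetical `pushPartitions` helper that mimics what `rdd.foreachPartition` does. It shows one connection being opened per partition, reused for every record in that partition, and closed when the partition is finished.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a MongoDB client connection (not a real driver API).
final class FakeConnection(val id: Int) {
  val written = ArrayBuffer.empty[String]
  var closed = false

  def insert(doc: String): Unit = {
    require(!closed, "connection already closed")
    written += doc
  }

  def close(): Unit = closed = true
}

object PerPartitionPush {
  // Keeps every connection that was opened, so the effect can be inspected.
  val opened = ArrayBuffer.empty[FakeConnection]

  // Hypothetical factory playing the role of createConnection() in the reply.
  def createConnection(): FakeConnection = {
    val c = new FakeConnection(opened.size)
    opened += c
    c
  }

  // Mimics rdd.foreachPartition: the function runs once per partition,
  // so the connection cost is paid per partition rather than per record.
  def pushPartitions(partitions: Seq[Seq[String]]): Unit =
    partitions.foreach { partition =>
      val connection = createConnection()
      try partition.iterator.foreach(doc => connection.insert(doc))
      finally connection.close()
    }
}
```

For example, `PerPartitionPush.pushPartitions(Seq(Seq("a", "b", "c"), Seq("d", "e")))` opens exactly two connections. In a real streaming job the same body would sit inside `dstream.foreachRDD { rdd => rdd.foreachPartition { iterator => ... } }`, where it runs on the executors. Creating the connection inside the partition function also avoids shipping a driver-side connection object to the executors; such objects are rarely transferable across machines.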


On Thu, Feb 26, 2015 at 1:25 PM, Mike Trienis wrote:

> Hi All,
>
> I have Spark Streaming set up to write data to a replicated MongoDB
> database and would like to understand if there would be any issues using
> the Reactive Mongo library to write directly to MongoDB. My stack is
> Apache Spark sitting on top of Cassandra for the datastore, so my thinking
> is that the MongoDB connector for Hadoop will not be particularly useful
> for me since I'm not using HDFS. Is there anything that I'm missing?
>
> Here is an example of code that I'm planning on using as a starting point
> for my implementation.
>
> LogAggregator
> 
>
> Thanks, Mike.
>


Integrating Spark Streaming with Reactive Mongo

2015-02-26 Thread Mike Trienis
Hi All,

I have Spark Streaming set up to write data to a replicated MongoDB database
and would like to understand if there would be any issues using the Reactive
Mongo library to write directly to MongoDB. My stack is Apache Spark
sitting on top of Cassandra for the datastore, so my thinking is that the
MongoDB connector for Hadoop will not be particularly useful for me since I'm
not using HDFS. Is there anything that I'm missing?

Here is an example of code that I'm planning on using as a starting point
for my implementation. 

LogAggregator
<https://github.com/chimpler/blog-spark-streaming-log-aggregation/blob/master/src/main/scala/com/chimpler/sparkstreaminglogaggregation/LogAggregator.scala>
  

Thanks, Mike. 
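With an asynchronous driver such as ReactiveMongo, one issue to watch for is that write calls return Futures. If the function passed to `foreachPartition` only fires off the inserts and returns, Spark treats the partition as done while writes may still be in flight, and a failed write will not fail the task. Below is a minimal sketch of waiting for a partition's writes, using only `scala.concurrent`; the `AsyncConnection` trait and the `pushPartition` helper are hypothetical stand-ins, not the ReactiveMongo API.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical async client: each insert returns a Future, as Future-based
// drivers do. This is NOT the ReactiveMongo API, only a stand-in for it.
trait AsyncConnection {
  def insert(doc: String): Future[Unit]
  def close(): Unit
}

object AwaitPartitionWrites {
  // Body of what would be passed to rdd.foreachPartition. It starts every
  // insert for the partition, then blocks until all of them have completed,
  // so a failed write surfaces as an exception in the calling task.
  def pushPartition(iterator: Iterator[String],
                    connect: () => AsyncConnection,
                    timeout: FiniteDuration)
                   (implicit ec: ExecutionContext): Unit = {
    val connection = connect()
    try {
      val writes: List[Future[Unit]] =
        iterator.map(doc => connection.insert(doc)).toList
      // Future.sequence fails if any insert fails; Await.result rethrows it.
      Await.result(Future.sequence(writes), timeout)
      ()
    } finally connection.close()
  }
}
```

This version starts every insert in the partition before waiting, so for very large partitions you may prefer to await in smaller batches. Blocking with `Await.result` inside the partition function is deliberate: it ties the outcome of the writes to the lifetime of the Spark task.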



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Integrating-Spark-Streaming-with-Reactive-Mongo-tp21828.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
