Processing Time Spikes (Spark Streaming)

2015-04-04 Thread t1ny
Hi all, I am running some benchmarks on a simple Spark application which consists of : - textFileStream() to extract text records from HDFS files - map() to parse records into JSON objects - updateStateByKey() to calculate and store an in-memory state for each key. The processing time per batch

Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny
Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent the Spark Job, its stages and the tasks inside the stages, with the dependencies between them (either narrow or shuffle dependencies). The Spark

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny
For anybody who's interested in this, here's a link to a PR that addresses this feature : https://github.com/apache/spark/pull/2077 (thanks to Todd Nist for sending it to me) -- View this message in context:

Creating RDDs from within foreachPartition() [Spark-Streaming]

2015-02-18 Thread t1ny
Hi all, I am trying to create RDDs from within /rdd.foreachPartition()/ so I can save these RDDs to ElasticSearch on the fly : stream.foreachRDD(rdd = { rdd.foreachPartition { iterator = { val sc = rdd.context iterator.foreach { case (cid,

Spark Streaming occasionally hangs after processing first batch

2014-10-20 Thread t1ny
Hi all, Spark Streaming occasionally (not always) hangs indefinitely on my program right after the first batch has been processed. As you can see in the following screenshots of the Spark Streaming monitoring UI, it hangs on the map stages that correspond (I assume) to the second batch that is

Re: Spark Streaming and ReactiveMongo

2014-09-19 Thread t1ny
Here's what we've tried so far as a first example of a custom Mongo receiver : /class MongoStreamReceiver(host: String) extends NetworkReceiver[String] { protected lazy val blocksGenerator: BlockGenerator = new BlockGenerator(StorageLevel.MEMORY_AND_DISK_SER_2) protected def onStart()