StreamingKMeans does not update cluster centroid locations

2016-02-18 Thread ramach1776
I have streaming application wherein I train the model using a receiver input stream in 4 sec batches val stream = ssc.receiverStream(receiver) //receiver gets new data every batch model.trainOn(stream.map(Vectors.parse)) If I use model.latestModel.clusterCenters.foreach(println) the value of

Re: adding a split and union to a streaming application cause big performance hit

2016-02-18 Thread ramach1776
bq. streamingContext.remember("duration") did not help Can you give a bit more detail on the above ? Did you mean the job encountered OOME later on ? Which Spark release are you using ? tried these 2 global settings (and restarted the app) after enabling cache for stream1

adding a split and union to a streaming application cause big performance hit

2016-02-17 Thread ramach1776
We have a streaming application containing approximately 12 jobs every batch, running in streaming mode (4 sec batches). Each job has several transformations and 1 action (output to cassandra) which causes the execution of the job (DAG) For example the first job, /job 1 ---> receive Stream A