Re: Unit testing framework for Spark Jobs?

2016-03-02 Thread radoburansky
I am sure you have googled this: https://github.com/holdenk/spark-testing-base On Wed, Mar 2, 2016 at 6:54 PM, SRK [via Apache Spark User List] < ml-node+s1001560n2638...@n3.nabble.com> wrote: > Hi, > > What is a good unit testing framework for Spark batch/streaming jobs? I > have core spark,

Re: Running multiple foreach loops

2016-02-17 Thread radoburansky
Why would you expect performance degradation? On Wed, Feb 17, 2016 at 10:30 PM, Daniel Imberman [via Apache Spark User List] wrote: > Hi all, > > So I'm currently figuring out how to accumulate three separate > accumulators: > > val a:Accumulator > val

Re: Optimize the performance of inserting data to Cassandra with Kafka and Spark Streaming

2016-02-17 Thread radoburansky
Hi Jerry, How do you know that only 100 messages are inserted? What is the primary key of the "tableOfTopicA" Cassandra table? Isn't it possible that you map several messages to the same primary key, so that they overwrite each other in Cassandra? Regards Rado On Tue, Feb 16, 2016 at 10:29
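
To illustrate the overwrite question above: a Cassandra INSERT with a primary key that already exists replaces the stored row rather than adding a second one. The snippet below is only a toy model of that behaviour using a plain dictionary, not Cassandra itself; the field names "id" and "payload" and the choice of 1000 messages over 100 distinct keys are assumptions made up for the example.

```python
# Toy model of Cassandra write semantics: an INSERT whose primary key
# already exists replaces the stored row instead of adding a new one.
# The message fields ("id", "payload") are hypothetical.

def insert_all(messages, key_field):
    """Write every message into a dict keyed by the primary key."""
    table = {}
    for msg in messages:
        table[msg[key_field]] = msg  # same key: the later write wins
    return table


# 1000 messages, but only 100 distinct primary key values.
messages = [{"id": i % 100, "payload": f"message-{i}"} for i in range(1000)]
table = insert_all(messages, "id")

print(len(messages), "messages written,", len(table), "rows stored")
```

If the real job shows the same pattern (many messages in, few rows out), counting the distinct key values in the mapped messages before writing is a cheap way to confirm or rule out this explanation.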

Re: Number of CPU cores for a Spark Streaming app in Standalone mode

2016-01-18 Thread radoburansky
I am adding an answer from SO: http://stackoverflow.com/questions/34861947/read-more-kafka-topics-than-number-of-cpu-cores

Number of CPU cores for a Spark Streaming app in Standalone mode

2016-01-18 Thread radoburansky
I somehow don't want to believe this waste of resources. Is it really true that if I have 20 input streams I must have at least 21 CPU cores? Even if I read only once per minute and only a few messages? I still hope that I am missing some important information. Thanks a lot
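
For reference, the arithmetic behind the "20 input streams, 21 cores" figure can be sketched as below. It assumes receiver-based input streams, where each receiver occupies one core for as long as the application runs, plus at least one further core to process the received data. The helper name min_cores is made up for this example and is not part of any Spark API. Sources that do not use receivers (for example the direct Kafka approach) would not be counted this way.

```python
def min_cores(num_receivers, processing_cores=1):
    """Minimum cores when every receiver-based input stream pins one core.

    Hypothetical helper for illustration only, not a Spark API.
    """
    if num_receivers < 0 or processing_cores < 1:
        raise ValueError("need num_receivers >= 0 and processing_cores >= 1")
    # One core per long-running receiver, plus cores left for processing.
    return num_receivers + processing_cores


print(min_cores(20))  # 20 receivers + 1 processing core = 21
```

How rarely the streams are read does not change the count under this assumption, because a receiver holds its core whether or not messages arrive.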