Does Spark Streaming calculate during a batch?

2014-11-13 Thread Michael Campbell
I was running a proof of concept for my company with Spark Streaming, and the conclusion I came to is that Spark collects data for the batch duration, THEN starts the data-pipeline calculations. My batch size was 5 minutes, and the CPU was all but dead for those 5 minutes; then, when the 5 minutes were up, the

Re: Does Spark Streaming calculate during a batch?

2014-11-13 Thread Michael Campbell
On Thu, Nov 13, 2014 at 11:02 AM, Sean Owen so...@cloudera.com wrote: Yes. Data is collected for 5 minutes, then processing starts at the end. The result may be an arbitrary function of the data in the interval, so the interval has to finish before computation can start. Thanks everyone.
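The behavior Sean describes can be sketched with a minimal StreamingContext. This is a hedged illustration, not code from the thread; the app name, socket source, and port are assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Records arriving during each 5-minute window are buffered by the
// receiver; the transformations below run only once the interval closes,
// which is why the CPU looks idle until the batch boundary.
val conf = new SparkConf().setAppName("batch-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(300)) // 5-minute batch interval

val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source

// count() is an arbitrary function of the whole interval's data, so it
// cannot start until the interval is complete.
lines.count().print()

ssc.start()
ssc.awaitTermination()
```

Shrinking the batch interval (e.g. `Seconds(5)`) trades per-batch latency for more frequent scheduling overhead; the collect-then-compute model itself doesn't change.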

Re: Submission to cluster fails (Spark SQL; NoSuchMethodError on SchemaRDD)

2014-10-17 Thread Michael Campbell
For posterity's sake, I solved this. The problem was the Cloudera cluster I was submitting to is running 1.0, and I was compiling against the latest 1.1 release. Downgrading to 1.0 on my compile got me past this. On Tue, Oct 14, 2014 at 6:08 PM, Michael Campbell michael.campb...@gmail.com
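The fix described here amounts to pinning the compile-time Spark version to what the cluster runs. A hypothetical `build.sbt` fragment (version number and module list are illustrative, not from the thread):

```scala
// build.sbt — match the cluster's Spark version (1.0.x here) and mark the
// artifacts "provided" so the uber-jar doesn't ship a conflicting copy that
// shadows the cluster's classes and triggers NoSuchMethodError at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.0.2" % "provided"
)
```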

Spark Bug? job fails to run when given options on spark-submit (but starts and fails without)

2014-10-16 Thread Michael Campbell
TL;DR - a Spark SQL job fails with an OOM (out of heap space) error. If given --executor-memory values, it won't even start, even (!) if the values given ARE THE SAME AS THE DEFAULT. Without --executor-memory: 14/10/16 17:14:58 INFO TaskSetManager: Serialized task 1.0:64 as 14710 bytes in 1
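The invocation being described would look roughly like the following sketch. Class name, master URL, and jar path are placeholders; the report's point is that passing --executor-memory explicitly, even at the default value, changed startup behavior:

```shell
# Hypothetical spark-submit for the failing job (Spark 1.x era).
# 512m was the executor memory default in that era; passing it explicitly
# should be a no-op, which is what makes the reported failure surprising.
spark-submit \
  --class com.example.SqlJob \
  --master spark://master:7077 \
  --executor-memory 512m \
  target/sql-job.jar
```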

Re: jsonRDD: NoSuchMethodError

2014-10-15 Thread Michael Campbell
How did you resolve it? On Tue, Jul 15, 2014 at 3:50 AM, SK skrishna...@gmail.com wrote: The problem is resolved. Thanks.

Submission to cluster fails (Spark SQL; NoSuchMethodError on SchemaRDD)

2014-10-14 Thread Michael Campbell
Hey all, I'm trying a very basic Spark SQL job and apologies as I'm new to a lot of this, but I'm getting this failure: Exception in thread main java.lang.NoSuchMethodError: org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row; I've tried a variety of uber-jar

Re: can't print DStream after reduce

2014-07-15 Thread Michael Campbell
Owen so...@cloudera.com wrote: How about a PR that rejects a context configured for local or local[1]? As I understand it, that configuration is not intended to work and has bitten several people. On Jul 14, 2014 12:24 AM, Michael Campbell michael.campb...@gmail.com wrote: This almost had me not using Spark; I

Re: can't print DStream after reduce

2014-07-13 Thread Michael Campbell
This almost had me not using Spark; I couldn't get any output. It is not at all obvious to the layman what's going on here (and to the best of my knowledge, it's not documented anywhere), but now that you know, you'll be able to answer this question for the numerous people who will also have it. On Sun,

Re: not getting output from socket connection

2014-07-13 Thread Michael Campbell
Make sure you use local[n] (where n > 1) in your context setup too, (if you're running locally), or you won't get output. On Sat, Jul 12, 2014 at 11:36 PM, Walrus theCat walrusthe...@gmail.com wrote: Thanks! I thought it would get passed through netcat, but given your email, I was able to
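The reason for the `local[n]` advice: a receiver-based source (socket, Kafka, etc.) permanently occupies one thread, so `local` or `local[1]` leaves no thread free to process batches and nothing is ever printed. A minimal sketch of the working setup (port and app name are assumed):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// "local[2]" = one thread for the socket receiver, one for processing.
// With "local[1]" the receiver starves the scheduler and print() is silent.
val conf = new SparkConf().setAppName("socket-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

ssc.socketTextStream("localhost", 9999).print()
ssc.start()
ssc.awaitTermination()
```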

Re: HELP!? Re: Streaming trouble (reduceByKey causes all printing to stop)

2014-06-12 Thread Michael Campbell
: 0 - Waiting batches: 1 Why would a batch be waiting for longer than my batch time of 5 seconds? On Thu, Jun 12, 2014 at 10:18 AM, Michael Campbell michael.campb...@gmail.com wrote: And... it's NOT working. Here's the code: val bytes = kafkaStream.map({ case (key

Re: Having trouble with streaming (updateStateByKey)

2014-06-11 Thread Michael Campbell
, 2014 at 1:47 PM, Michael Campbell michael.campb...@gmail.com wrote: I'm having a little trouble getting an updateStateByKey() call to work; was wondering if anyone could help. In my chain of calls from getting Kafka messages out of the queue to converting the message to a set of things
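A common stumbling block with `updateStateByKey` is the mandatory checkpoint directory. A hedged sketch of a running count per key, assuming `ssc` is an existing StreamingContext and `pairs` is a `DStream[(String, Int)]` built from the Kafka messages (both assumed, not from the thread):

```scala
import org.apache.spark.streaming.StreamingContext._ // Spark 1.x implicit pair ops

// updateStateByKey requires checkpointing; forgetting this makes the
// job fail at start. The path here is a hypothetical placeholder.
ssc.checkpoint("/tmp/streaming-checkpoint")

// For each key: fold the batch's new values into the previous state.
val updateCount = (newValues: Seq[Int], state: Option[Int]) =>
  Some(state.getOrElse(0) + newValues.sum)

val runningCounts = pairs.updateStateByKey[Int](updateCount)
runningCounts.print()
```

Returning `None` from the update function instead of `Some(...)` drops the key from the state, which is how stale keys are expired.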

Kafka client - specify offsets?

2014-06-11 Thread Michael Campbell
Is there a way in the Apache Spark Kafka Utils to specify an offset to start reading? Specifically, from the start of the queue, or failing that, a specific point?
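With the receiver-based Kafka API available at the time, there was no parameter for an exact starting offset; the closest control was the consumer's `auto.offset.reset` setting, which only applies when the consumer group has no committed offset yet. A hedged sketch (ZooKeeper address, group id, and topic name are placeholders); exact per-partition offsets only arrived later with the direct Kafka API:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
  "zookeeper.connect" -> "zk:2181",    // hypothetical ZooKeeper quorum
  "group.id"          -> "fresh-group", // a new group has no stored offset...
  "auto.offset.reset" -> "smallest"     // ...so this starts from the queue's head
)

// "events" -> 1: consume the hypothetical topic with one receiver thread.
val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Map("events" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)
```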

Re: New user streaming question

2014-06-07 Thread Michael Campbell
AM, Michael Campbell michael.campb...@gmail.com wrote: I've been playing with spark and streaming and have a question on stream outputs. The symptom is I don't get any. I have run spark-shell and all does as I expect, but when I run the word-count example with streaming, it *works
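For reference, the streaming word-count pipeline being discussed looks roughly like this sketch, assuming `lines` is a `DStream[String]` from some receiver (the variable name is an assumption). Note that `print()` writes to the *driver's* stdout once per batch, which is easy to miss when expecting output elsewhere:

```scala
// Per-batch word count over a DStream[String].
val words  = lines.flatMap(_.split(" "))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
counts.print() // prints the first few (word, count) pairs of each batch
```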