I was running a proof of concept for my company with Spark Streaming, and
the conclusion I came to is that Spark collects data for the
batch duration, THEN starts the data-pipeline calculations.
My batch size was 5 minutes, and the CPU was all but idle for those 5
minutes; then, when the 5 minutes were up, the processing started.
On Thu, Nov 13, 2014 at 11:02 AM, Sean Owen so...@cloudera.com wrote:
Yes. Data is collected for 5 minutes, then processing starts at the
end. The result may be an arbitrary function of the data in the
interval, so the interval has to finish before computation can start.
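For later readers: this micro-batch behavior follows directly from the batch
duration given when the StreamingContext is created. A minimal sketch (the
socket source, host, and port are assumptions, not from this thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

val conf = new SparkConf().setAppName("BatchDurationDemo").setMaster("local[2]")
// Every operation below runs once per 5-minute batch; records are buffered
// for the full interval before any processing begins.
val ssc = new StreamingContext(conf, Minutes(5))
val lines = ssc.socketTextStream("localhost", 9999)  // assumed source
lines.count().print()
ssc.start()
ssc.awaitTermination()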
Thanks everyone.
For posterity's sake, I solved this. The problem was that the Cloudera
cluster I was submitting to was running 1.0, while I was compiling against
the latest 1.1 release. Downgrading my compile-time dependency to 1.0 got
me past this.
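In build terms, the fix amounts to pinning the compile-time dependency to
the cluster's Spark line. A hypothetical build.sbt line (the exact 1.0.x
patch version is an assumption):

// Compile against the same Spark line the cluster runs; "provided" keeps
// Spark's own jars out of the uber-jar.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.2" % "provided"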
On Tue, Oct 14, 2014 at 6:08 PM, Michael Campbell
michael.campb...@gmail.com wrote:
TL;DR - a Spark SQL job fails with an OOM (out of heap space) error. If
given --executor-memory values, it won't even start. Even (!) if the
values given ARE THE SAME AS THE DEFAULT.
Without --executor-memory:
14/10/16 17:14:58 INFO TaskSetManager: Serialized task 1.0:64 as 14710
bytes in 1
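For anyone comparing notes: --executor-memory maps to the
spark.executor.memory setting, so the same value can also be set in code
before the context is created. A hedged Scala equivalent (app name and
memory value are placeholders, not from this thread):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("SqlJob")                    // placeholder
  .set("spark.executor.memory", "512m")    // same knob as --executor-memory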
How did you resolve it?
On Tue, Jul 15, 2014 at 3:50 AM, SK skrishna...@gmail.com wrote:
The problem is resolved. Thanks.
Hey all, I'm trying a very basic Spark SQL job (apologies, I'm new to a
lot of this), but I'm getting this failure:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row;
I've tried a variety of uber-jar
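For context, a minimal sketch of the kind of job that hits this when the
compile-time and cluster versions diverge; the file name, table name, and
query are assumptions, and SchemaRDD.take is the call whose compiled
signature must exist on the cluster:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("basic-sql"))
val sqlContext = new SQLContext(sc)
val people = sqlContext.jsonFile("people.json")  // assumed input
people.registerAsTable("people")                 // 1.0/1.1-era API
sqlContext.sql("SELECT * FROM people").take(5).foreach(println)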
Sean Owen so...@cloudera.com wrote:
How about a PR that rejects a context configured for local or local[1]?
As I understand it, this is not intended to work and has bitten several
people.
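Something like this guard is the idea; a rough sketch, assuming the check
would run when the streaming context is created:

// Hypothetical validation: streaming needs at least one thread for the
// receiver plus one for processing, so local and local[1] cannot work.
val master = conf.get("spark.master")
require(master != "local" && master != "local[1]",
  s"Master '$master' leaves Spark Streaming no threads for processing; " +
  "use local[n] with n > 1")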
On Jul 14, 2014 12:24 AM, Michael Campbell michael.campb...@gmail.com
wrote:
This almost had me not using Spark; I couldn't get any output. It is not
at all obvious to the layman what's going on here (and to the best of my
knowledge, it isn't documented anywhere), but now that you know, you'll be
able to answer this question for the numerous people who will also have it.
On Sun,
Make sure you use local[n] (where n > 1) in your context setup too (if
you're running locally), or you won't get any output.
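Concretely, the master string is the only thing that needs to change; a
minimal sketch (app name and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]")        // one thread for the receiver, one to process
  .setAppName("StreamingDemo")  // placeholder
val ssc = new StreamingContext(conf, Seconds(5))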
On Sat, Jul 12, 2014 at 11:36 PM, Walrus theCat walrusthe...@gmail.com
wrote:
Thanks!
I thought it would get passed through netcat, but given your email, I
was able to
- Waiting batches: 1
Why would a batch be waiting far longer than my batch time of 5 seconds?
On Thu, Jun 12, 2014 at 10:18 AM, Michael Campbell
michael.campb...@gmail.com wrote:
And... it's NOT working.
Here's the code:
val bytes = kafkaStream.map({ case (key
, 2014 at 1:47 PM, Michael Campbell
michael.campb...@gmail.com wrote:
I'm having a little trouble getting an updateStateByKey() call to work;
I was wondering if anyone could help.
In my chain of calls from getting Kafka messages out of the queue to
converting the message to a set of things
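For anyone else stuck here, this is the shape of a working call; a minimal
sketch, assuming a DStream[(String, Int)] named counts, and noting that
stateful operations need a checkpoint directory and an output operation:

ssc.checkpoint("/tmp/streaming-checkpoint")  // placeholder path

// Running total per key: fold each batch's new values into the prior state.
val totals = counts.updateStateByKey[Int] { (newValues: Seq[Int], state: Option[Int]) =>
  Some(newValues.sum + state.getOrElse(0))
}
totals.print()  // without an output operation, nothing is computed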
Is there a way in the Apache Spark Kafka Utils to specify an offset to
start reading? Specifically, from the start of the queue, or failing that,
a specific point?
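As far as I know, the high-level-consumer-based KafkaUtils.createStream
can't seek to an arbitrary offset, but it can start from the beginning of
the queue via the consumer config, at least when the group has no committed
offsets yet. A hedged sketch (ZooKeeper address, group id, and topic are
placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
  "zookeeper.connect" -> "localhost:2181",  // placeholder
  "group.id"          -> "my-consumer",     // placeholder; offsets are per group
  "auto.offset.reset" -> "smallest")        // start from the earliest offset

val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)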
AM, Michael Campbell
michael.campb...@gmail.com wrote:
I've been playing with Spark and streaming and have a question on stream
outputs. The symptom is that I don't get any.
I have run spark-shell and everything behaves as I expect, but when I run
the word-count example with streaming, it *works