I would just like to add that we do the very same/similar thing at Webtrends,
updateStateByKey has been a life-saver for our sessionization use-cases.
Cheers,
Sean
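For readers new to the pattern, here is a minimal plain-Scala sketch of what updateStateByKey-style sessionization does: each micro-batch of (sessionId, event) pairs is folded into per-session state. This runs without Spark; the names and the per-session event count are illustrative only, not Webtrends' actual code.

```scala
// Sketch: fold each batch's new events into per-key session state,
// mirroring the (newEvents, previousState) => newState shape of the
// function passed to updateStateByKey. Illustrative names throughout.
object SessionStateDemo {
  type State = Map[String, Int]

  def update(state: State, batch: Seq[(String, Int)]): State =
    batch.foldLeft(state) { case (s, (session, n)) =>
      s.updated(session, s.getOrElse(session, 0) + n)
    }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      Seq("a" -> 1, "b" -> 1), // first micro-batch
      Seq("a" -> 2)            // second micro-batch
    )
    val finalState = batches.foldLeft(Map.empty[String, Int])(update)
    assert(finalState == Map("a" -> 3, "b" -> 1))
    println(finalState)
  }
}
```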
On Jul 15, 2015, at 11:20 AM, Silvio Fiorito
silvio.fior...@granturing.com wrote:
Hi Cody,
.map is just a transformation, so no work will actually be performed until
something takes action against it. Try adding a .count(), like so:
inputRDD.map { x =>
  counter += 1
}.count()
In case it is helpful, here are the docs on exactly which operations are
transformations and which are actions:
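For intuition, the laziness described above can be mimicked with a plain Scala lazy view standing in for an RDD. This is an illustration of the transformation-vs-action distinction only, not the Spark API itself:

```scala
// Sketch: a lazy view's map does no work until the result is forced,
// just as an RDD transformation does no work until an action runs.
object LazyDemo {
  def main(args: Array[String]): Unit = {
    var counter = 0
    val mapped = (1 to 10).view.map { x => counter += 1; x * 2 }
    assert(counter == 0)       // nothing ran yet: map is lazy
    val forced = mapped.toList // forcing plays the role of an action like count()
    assert(counter == 10 && forced.size == 10)
    println(s"counter = $counter")
  }
}
```

Note also that on a real cluster, a driver-side `var` captured in a closure is not reliably updated by the executors; Spark provides accumulators for that kind of counting.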
We have gone down a similar path at Webtrends, and Spark has worked amazingly
well for us in this use case. Our solution goes from REST directly into Spark,
and back out to the UI instantly.
Here is the resulting product in case you are curious (and please pardon the
self promotion):
Hi Kane-
http://spark.apache.org/docs/latest/tuning.html has excellent information that
may be helpful. In particular, increasing the number of tasks may help, as well
as confirming that you don't have more data than expected landing on a single
key.
Also, if you are using spark 1.2.0,
Can the 0.8.1.1 client still talk to 0.8.0 versions of Kafka?
Yes, it can.
0.8.1 is fully compatible with 0.8; this is noted (though somewhat buried) on this page:
http://kafka.apache.org/documentation.html
In addition to the pom version bump, SPARK-2492 would bring the Kafka streaming
receiver (which was originally
Are you using spark streaming?
On Oct 14, 2014, at 10:35 AM, Salman Haq sal...@revmetrix.com wrote:
Hi,
In my application, I am successfully using foreachPartition to write large
amounts of data into a Cassandra database.
What is the recommended practice if the application wants to know
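For background, the foreachPartition write pattern amortizes per-record setup cost by creating one expensive resource per partition. Here is a plain-Scala sketch of that shape; `FakeSession` is a stand-in for a real Cassandra session, and grouped collections stand in for RDD partitions — none of this is the connector's actual API:

```scala
// Sketch: one session per partition, many writes per session, mirroring
// rdd.foreachPartition { part => ... }. All names are illustrative.
class FakeSession {
  var writes = 0
  def execute(row: Int): Unit = writes += 1
  def close(): Unit = ()
}

object PartitionWriteDemo {
  def main(args: Array[String]): Unit = {
    val partitions = (1 to 100).grouped(25) // stand-in for 4 RDD partitions
    var sessionsOpened = 0
    partitions.foreach { part =>
      val session = new FakeSession         // one session per partition, not per record
      sessionsOpened += 1
      part.foreach(session.execute)
      session.close()
    }
    assert(sessionsOpened == 4)
    println(s"opened $sessionsOpened sessions")
  }
}
```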
Would you mind sharing the code leading to your createStream? Are you also
setting group.id?
Thanks,
Sean
On Oct 10, 2014, at 4:31 PM, Abraham Jacob abe.jac...@gmail.com wrote:
Hi Folks,
I am seeing some strange behavior when using the Spark Kafka connector in
Spark streaming.
it (https://issues.apache.org/jira/browse/SPARK-2492).
Thanks
Jerry
From: Abraham Jacob [mailto:abe.jac...@gmail.com]
Sent: Saturday, October 11, 2014 6:57 AM
To: Sean McNamara
Cc: user@spark.apache.org
Subject: Re: Spark Streaming KafkaUtils Issue
subsequent queries unless task scheduling time
beats spark.locality.wait. This can cause overall low performance for all
subsequent tasks.
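For reference, the wait mentioned above is configurable. A hedged fragment — the "1s" value is purely an example, not a recommendation, and shortening it trades data locality for faster scheduling:

```scala
// Illustrative config fragment only: how long the scheduler waits for a
// data-local slot before falling back to a less-local one (default 3s).
val conf = new org.apache.spark.SparkConf()
  .set("spark.locality.wait", "1s")
```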
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue, Jun 24, 2014 at 4:10 AM, Sean
We have a use case where we’d like something to execute once on each node and I
thought it would be good to ask here.
Currently we achieve this by setting the parallelism to the number of nodes and
using a mod partitioner:
val balancedRdd = sc.parallelize(
(0 until Settings.parallelism)
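The key property behind that pattern can be shown in plain Scala: with N keys 0..N-1 and N partitions, `key % N` places exactly one element on each partition, so per-partition work runs once per node. This is a hypothetical illustration, not the poster's actual (truncated) code, and `parallelism` stands in for `Settings.parallelism`:

```scala
// Sketch: a mod partitioner over N keys and N partitions yields exactly
// one element per partition. Names here are illustrative.
object ModPartitionDemo {
  def main(args: Array[String]): Unit = {
    val parallelism = 4 // stand-in for Settings.parallelism (number of nodes)
    val keys = 0 until parallelism
    val byPartition = keys.groupBy(_ % parallelism)
    assert(byPartition.size == parallelism)              // every partition is used
    assert(byPartition.values.forall(_.size == 1))       // one element per partition
    println(byPartition)
  }
}
```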