Agree that filter is perhaps unintuitive, though the Scala collections API has
filter and filterNot, which together provide context that makes it more
intuitive.
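For anyone unfamiliar with the pair, a quick illustration with plain Scala collections (the values here are made up):

```scala
// filter keeps elements for which the predicate is true;
// filterNot keeps those for which it is false.
val nums = List(1, 2, 3, 4, 5)
val evens = nums.filter(_ % 2 == 0)
val odds  = nums.filterNot(_ % 2 == 0)
println(evens) // List(2, 4)
println(odds)  // List(1, 3, 5)
```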
And yes, the change could be made via added methods that don't break the
existing API. Still, overall I would be -1 on this unless a
On Fri, Feb 7, 2014 at 7:48 AM, Aaron Davidson ilike...@gmail.com wrote:
Sorry for the delay; by long-running I just meant if you were running an
iterative algorithm that was slowing down over time. We have observed this
in the spark-perf benchmark; as file system state builds up, the job can
Yes! Spark Streaming programs are just like any Spark program, so any
EC2 cluster set up using the spark-ec2 scripts can be used to run Spark
Streaming programs as well.
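As a rough sketch, launching such a cluster looks like this (keypair name, identity file, slave count, and cluster name are all placeholders; see the 0.9 EC2 scripts docs for the exact options):

```shell
# Launch a 2-slave Spark cluster on EC2 with the spark-ec2 script.
./spark-ec2 -k my-keypair -i my-keypair.pem -s 2 launch my-spark-cluster
```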
On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia buendia...@gmail.com wrote:
Hi,
Does the EC2 support for Spark 0.9
On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
Great. Does it come with any input
sortByKey would be better, I think, as I am not sure groupByKey will sort the
keyspace globally.
I would say you should:
take input (K, V)
groupByKey: (K, V) => (K, Seq(V...))
partitionBy the default partitioner (hash)
sortByKey: (K, Seq(V...))
Output this; the only thing is, if you need (K, V) pairs you will have to
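A minimal sketch of those steps using plain Scala collections in place of RDDs (the method names mirror the RDD API; `input` is made-up data, and the partitionBy step has no plain-collections analogue so it is skipped), just to show the shape of the data at each stage:

```scala
// Input: (K, V) pairs.
val input = List(("b", 2), ("a", 1), ("b", 3), ("a", 4))

// groupByKey: (K, V) => (K, Seq(V...))
val grouped: Map[String, List[Int]] =
  input.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

// sortByKey: order the grouped pairs globally by key.
// (In Spark, partitionBy/sortByKey handle this across partitions.)
val sorted: List[(String, List[Int])] = grouped.toList.sortBy(_._1)

println(sorted) // List((a,List(1, 4)), (b,List(2, 3)))
```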
Just as a second note, I am able to build the source in the official 0.9.0
release
(http://d3kbcqa49mib13.cloudfront.net/spark-0.9.0-incubating-bin-hadoop2.tgz).
The provided Spark EC2 scripts
(https://spark.incubator.apache.org/docs/0.9.0/ec2-scripts.html) and
default AMI ship with Python 2.6.8.
I would like to use Python 2.7.5 or later. I believe that among the 2.x
versions, 2.7 is the most popular.
What's the easiest way to get my Spark cluster on Python
Yes, the default Spark EC2 cluster runs in standalone deploy mode. Since
Spark 0.9, the standalone deploy mode allows you to launch the driver app
within the cluster itself and automatically restart it if it fails. You can
read about launching your app inside the cluster
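For reference, a hedged sketch of what that in-cluster driver launch looks like with the Spark 0.9 standalone-mode Client (the master URL, jar path, and main class below are all placeholders):

```shell
# Submit a driver to the standalone master; --supervise asks the
# master to restart the driver automatically if it fails.
./bin/spark-class org.apache.spark.deploy.Client launch \
  --supervise \
  spark://master-host:7077 \
  hdfs://namenode:8020/path/to/app.jar \
  com.example.MyApp
```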