Hello,
I was wondering if there is an easy way to launch EC2 instances which have
Spark built for Scala 2.11.
The only way I can think of is to prepare the sources for 2.11 as shown in
the Spark build instructions (
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211),
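For reference, the approach described on that page looked roughly like the following at the time. This is a sketch from memory of the 1.x instructions, so check the linked build page for the exact script name and profiles:

```shell
# Switch the source tree to Scala 2.11, then build with the 2.11 profile
# (profile flags are illustrative; verify against the linked docs)
dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
```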
Since switching to Spark 1.2.1 I'm seeing logging for the stage progress
(ex.):
[error] [Stage 2154: (14 + 8) / 48][Stage 2210: (0 + 0) / 48]
Any reason why these are error-level logs? Shouldn't they be info level?
In any case, is there a way to disable them other than
to achieve the animation, and this won't work via a logging framework.
stderr is where log-like output goes, because stdout is for program output.
On Wed, Apr 1, 2015 at 10:56 AM, Theodore Vasiloudis
theodoros.vasilou...@gmail.com wrote:
Since switching to Spark 1.2.1 I'm seeing logging
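As an aside: assuming the output in question is Spark's console progress bar (introduced in 1.2), it can be switched off with a configuration flag rather than through log4j, for example in spark-defaults.conf:

```
spark.ui.showConsoleProgress false
```

The same setting can be passed on the command line via --conf.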
Hello,
in the context of SPARK-2394, "Make it easier to read LZO-compressed files
from EC2 clusters" (https://issues.apache.org/jira/browse/SPARK-2394), I
was wondering:
Is there an easy way to make a user-provided script run on every machine in
a cluster launched on EC2?
Regards,
Theodore
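One generic way to do this, sketched below, is to loop over the cluster's slaves file and push the script over SSH. This is a hypothetical helper, not a spark-ec2 feature: it assumes a file of hostnames like the one spark-ec2 keeps on the master, and passwordless SSH to the slaves. The `runner` hook is injectable so the loop can be exercised without a live cluster.

```python
import subprocess

def run_on_all(slaves_file, script, runner=None):
    """Run `script` on every host listed in `slaves_file` (one per line)."""
    if runner is None:
        def runner(host):
            # Feed the local script to a remote bash over SSH.
            with open(script) as f:
                subprocess.run(["ssh", "-o", "StrictHostKeyChecking=no",
                                host, "bash", "-s"], stdin=f, check=True)
    with open(slaves_file) as f:
        hosts = [line.strip() for line in f if line.strip()]
    for host in hosts:
        runner(host)
    return hosts
```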
--
all incoming edge pairs without
repartitioning the data by dstID. You need to perform this shuffle for
joining too. Otherwise two incoming edges could be in separate partitions
and never meet. Am I missing something?
On Mon, Dec 8, 2014 at 3:53 PM, Theodore Vasiloudis
theodoros.vasilou
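The partitioning argument above can be made concrete with a toy illustration (plain Python, not GraphX): hash-partitioning edges by dstId guarantees that every incoming edge of a vertex lands in the same partition, so incoming-edge pairs can be formed locally after that one shuffle.

```python
def partition_by_dst(edges, num_partitions):
    """edges: iterable of (srcId, dstId) pairs with integer ids."""
    parts = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        # Modulo on the integer id stands in for a hash partitioner.
        parts[dst % num_partitions].append((src, dst))
    return parts
```

Without this step, two edges pointing at the same vertex can sit in different partitions and, as the thread puts it, never meet.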
improves performance. Decreasing the number of
partitions has a large negative effect on the runtime.
On Mon, Dec 8, 2014 at 5:46 PM, Daniel Darabos
daniel.dara...@lynxanalytics.com wrote:
On Mon, Dec 8, 2014 at 5:26 PM, Theodore Vasiloudis
theodoros.vasilou...@gmail.com wrote:
@Daniel
It's
Hello everyone,
I was wondering what the most efficient way is to retrieve the top K
values per key in a (key, value) RDD.
The simplest way I can think of is to do a groupByKey, sort the iterables,
and then take the top K elements for every key.
But reduceByKey is an operation that can be
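The truncated suggestion above points at a combiner-style approach: instead of materializing whole groups as groupByKey does, keep only a bounded structure of K values per key and merge those, the way reduceByKey or aggregateByKey combine values map-side. A plain-Python sketch of the per-key logic (a hypothetical helper illustrating the idea, not a Spark API):

```python
import heapq
from collections import defaultdict

def top_k_per_key(pairs, k):
    """Keep a min-heap of at most k values per key; O(n log k) overall."""
    heaps = defaultdict(list)
    for key, value in pairs:
        h = heaps[key]
        if len(h) < k:
            heapq.heappush(h, value)
        elif value > h[0]:
            # Evict the smallest retained value in favor of the new one.
            heapq.heapreplace(h, value)
    # Present each key's survivors in descending order.
    return {key: sorted(h, reverse=True) for key, h in heaps.items()}
```

In Spark the same bounded-heap merge would be expressed with aggregateByKey, using the heap insertion as the sequence op and a heap merge as the combine op, so only K values per key ever cross the shuffle.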