On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote:
I am considering porting some of this to a more general spark-cloud
launcher, including google/aliyun/rackspace. It shouldn't be hard at all
given the current approach for setup/install.
FWIW, there are already some tools
Hi all,
I am a newbie trying to understand Spark internals. There is some entity referred
to as 'buckets' in many places in the Spark Core code, but I am having a hard
time understanding what it is, as it is only mentioned in code comments and I didn't come
across any data structure that referred to it or any class for
Do we have a data structure that corresponds to buckets in shuffle? That is,
if we wanted to explore the 'content' of these buckets in the shuffle phase, can
we do that? If yes, how?
There are two uses of buckets in Spark core.
The first is in the histogram used to perform sorting. Basically we
build an approximate histogram of the data in order to decide how to
partition the data when sorting. Each bucket is a range in the histogram.
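
To make the histogram idea concrete, here is a minimal plain-Python sketch of range partitioning from a sample. It is not Spark's actual code: the function names (sample_bounds, bucket_for, range_partition) and the sample size are made up for illustration. It only shows how bucket boundaries taken from a sample can decide which partition each key goes to.

```python
import bisect
import random


def sample_bounds(data, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 split points from a random sample.

    The sorted sample acts as an approximate histogram of the data;
    evenly spaced elements of it become the bucket boundaries.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    if not sample:
        return []
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def bucket_for(key, bounds):
    """Return the index of the bucket (range) that key falls into.

    Bucket i holds keys with bounds[i-1] < key <= bounds[i].
    """
    return bisect.bisect_left(bounds, key)


def range_partition(data, num_partitions):
    """Assign every element of data to a bucket based on sampled bounds."""
    bounds = sample_bounds(data, num_partitions)
    buckets = [[] for _ in range(num_partitions)]
    for x in data:
        buckets[bucket_for(x, bounds)].append(x)
    return bounds, buckets


if __name__ == "__main__":
    rng = random.Random(1)
    values = [rng.randint(0, 1000) for _ in range(500)]
    bounds, buckets = range_partition(values, 4)
    print("bounds:", bounds)
    print("bucket sizes:", [len(b) for b in buckets])
```

Because every key in bucket i is smaller than every key in bucket i+1, sorting each bucket independently and concatenating the results gives a globally sorted sequence. The sample only affects how evenly the buckets are filled, not correctness.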
The 2nd is used in shuffle, where
To help us track planned / finished configuration renames, defaults
changes, and configuration deprecation for the upcoming 1.5.0 release, I
have created https://issues.apache.org/jira/browse/SPARK-9550.
As you make configuration changes or think of configurations that need to
be audited, please
Thank you for your reply!
Do you mean that currently, if I want to use this Tungsten feature, I have to
set the sort shuffle manager (spark.shuffle.manager=sort), right? However, I
saw the slides "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare
Metal", published at Spark Summit 2015, and it