Re: Should spark-ec2 get its own repo?

2015-08-02 Thread Nicholas Chammas
On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote:

 I am considering porting some of this to a more general spark-cloud
 launcher, including google/aliyun/rackspace.  It shouldn't be hard at all
 given the current approach for setup/install.


FWIW, there are already some tools for launching Spark clusters on GCE and
Azure:

http://spark-packages.org/?q=tags%3A%22Deployment%22

Nick


What are 'Buckets' referred in Spark Core code

2015-08-02 Thread Haseeb
Hi all,
I am a newbie trying to understand Spark internals. Something referred
to as 'buckets' comes up in many places in the Spark Core code, but I am
having a hard time understanding what it is: it is only mentioned in code
comments, and I didn't come across any data structure or class that refers
to it. I'd be really grateful if someone could shed some light on what
exactly buckets are and what their function is with respect to Spark
internals.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/What-are-Buckets-referred-in-Spark-Core-code-tp13557.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread cheez
Do we have a data structure that corresponds to buckets in shuffle? That
is, if we wanted to explore the 'content' of these buckets in the shuffle
phase, can we do that? If yes, how?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/What-are-Buckets-referred-in-Spark-Core-code-tp13557p13559.html



Re: What are 'Buckets' referred in Spark Core code

2015-08-02 Thread Reynold Xin
There are two uses of the term 'buckets' in Spark core.

The first is in the histogram used to perform sorting. Basically, we
build an approximate histogram of the data in order to decide how to
partition the data for the sort. Each bucket is a range in the histogram.
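
To make the histogram idea concrete, here is a rough sketch in plain Python. It is not Spark's code: the helper names range_bounds and bucket_for are made up for illustration, and it uses the whole dataset as the "sample" where a real system would sample. It only shows how split points taken from a sample define range buckets, so that sorting each bucket and concatenating them gives a globally sorted result.

```python
import bisect
import random


def range_bounds(sample, num_buckets):
    """Pick num_buckets - 1 split points so buckets hold roughly equal counts.

    Hypothetical helper: the split points play the role of the bucket
    boundaries in the approximate histogram.
    """
    ordered = sorted(sample)
    step = len(ordered) / num_buckets
    return [ordered[int(step * i)] for i in range(1, num_buckets)]


def bucket_for(key, bounds):
    """Return the index of the range bucket that key falls into."""
    return bisect.bisect_left(bounds, key)


# Unsorted input; the seed only makes the shuffle repeatable.
data = list(range(100))
random.Random(0).shuffle(data)

num_buckets = 4
bounds = range_bounds(data, num_buckets)  # whole dataset stands in for a sample

# Route every record to its range bucket.
buckets = [[] for _ in range(num_buckets)]
for x in data:
    buckets[bucket_for(x, bounds)].append(x)

# Sort each bucket independently; concatenation is then globally sorted.
result = [x for b in buckets for x in sorted(b)]
```

Because the buckets cover disjoint, ordered key ranges, each one can be sorted on its own and no merge step across buckets is needed.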

The second is in shuffle, where we partition the output of each map task
into different buckets, so that each reduce task can fetch the map-side
data for its partition id.
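
As a minimal sketch of the shuffle case, again in plain Python rather than Spark's actual implementation: every name here (partition_id, run_map_task, fetch_for_reducer) is hypothetical, and a simple hash-modulo rule is assumed for assigning keys to partitions. Each map task writes one bucket per reduce partition, and reducer i collects bucket i from every map output.

```python
NUM_REDUCERS = 3


def partition_id(key, num_partitions):
    """Assumed partitioning rule: hash of the key modulo the partition count."""
    return hash(key) % num_partitions


def run_map_task(records, num_partitions):
    """Split one map task's output into one bucket per reduce partition."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_id(key, num_partitions)].append((key, value))
    return buckets


def fetch_for_reducer(all_map_outputs, reduce_id):
    """Gather the bucket for reduce_id from every map task's output."""
    fetched = []
    for buckets in all_map_outputs:
        fetched.extend(buckets[reduce_id])
    return fetched


# Two map tasks, each with its own input records (integer keys keep the
# hash values stable across runs).
map_inputs = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(4, "d"), (5, "e"), (6, "f")],
]
map_outputs = [run_map_task(records, NUM_REDUCERS) for records in map_inputs]

reducer_inputs = [fetch_for_reducer(map_outputs, r) for r in range(NUM_REDUCERS)]
```

In this toy version the buckets are just in-memory lists, so their contents can be printed directly; in a real shuffle they would typically be files or file segments written by the map side.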


On Sun, Aug 2, 2015 at 1:55 PM, Haseeb 11besemja...@seecs.edu.pk wrote:

 Hi all,
 I am a newbie trying to understand Spark internals. Something referred
 to as 'buckets' comes up in many places in the Spark Core code, but I am
 having a hard time understanding what it is: it is only mentioned in code
 comments, and I didn't come across any data structure or class that refers
 to it. I'd be really grateful if someone could shed some light on what
 exactly buckets are and what their function is with respect to Spark
 internals.







Master JIRA ticket for tracking Spark 1.5.0 configuration renames, defaults changes, and configuration deprecation

2015-08-02 Thread Josh Rosen
To help us track planned / finished configuration renames, defaults
changes, and configuration deprecation for the upcoming 1.5.0 release, I
have created https://issues.apache.org/jira/browse/SPARK-9550.

As you make configuration changes or think of configurations that need to
be audited, please update that ticket by editing it or posting a comment.

This ticket will help us later when it comes time to draft release notes.

Thanks,
Josh


Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-02 Thread james
Thank you for your reply!
Do you mean that currently, if I want to use this Tungsten feature, I have
to set the sort shuffle manager (spark.shuffle.manager=sort), right? However,
I saw the slides "Deep Dive into Project Tungsten: Bringing Spark Closer to
Bare Metal" from Spark Summit 2015, and they seem to recommend the
'tungsten-sort' manager.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Came-across-Spark-SQL-hang-Error-issue-with-Spark-1-5-Tungsten-feature-tp13537p13561.html