Re: Should spark-ec2 get its own repo?
On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote:

I am considering porting some of this to a more general spark-cloud launcher, including google/aliyun/rackspace. It shouldn't be hard at all given the current approach for setup/install.

FWIW, there are already some tools for launching Spark clusters on GCE and Azure: http://spark-packages.org/?q=tags%3A%22Deployment%22

Nick
What are 'Buckets' referred in Spark Core code
Hi all, I am a newbie trying to understand Spark internals. Some entity referred to as 'buckets' is mentioned in many places in the Spark Core code, but I am having a hard time understanding what it is: it only appears in code comments, and I didn't come across any data structure or class that refers to it. I'd be really grateful if someone could shed some light on what exactly buckets are and what their function is with respect to Spark internals.

-- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-are-Buckets-referred-in-Spark-Core-code-tp13557.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: What are 'Buckets' referred in Spark Core code
Do we have a data structure that corresponds to buckets in shuffle? That is, if we wanted to explore the 'content' of these buckets in the shuffle phase, can we do that? If yes, how?
Re: What are 'Buckets' referred in Spark Core code
There are two uses of buckets in Spark core.

The first is in histograms, used to perform sorting. Basically, we build an approximate histogram of the data in order to decide how to partition the data for sorting. Each bucket is a range in the histogram.

The second is in shuffle, where we partition the output of each map task into different buckets, letting the reduce side fetch the map-side data based on its partition id.

On Sun, Aug 2, 2015 at 1:55 PM, Haseeb 11besemja...@seecs.edu.pk wrote:

Hi all, I am a newbie trying to understand Spark internals. Some entity referred to as 'buckets' is mentioned in many places in the Spark Core code, but I am having a hard time understanding what it is: it only appears in code comments, and I didn't come across any data structure or class that refers to it. I'd be really grateful if someone could shed some light on what exactly buckets are and what their function is with respect to Spark internals.
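As a rough illustration of the two meanings of "bucket" described in the reply above, here is a minimal standalone sketch in plain Python. It is not Spark's code and does not use any Spark API: the function names (`range_bounds`, `range_bucket`, `shuffle_write`, `shuffle_read`) are hypothetical, the range boundaries are taken from a small in-memory sample, and the shuffle bucket is assumed to be chosen by hashing the key modulo the number of reducers.

```python
from bisect import bisect_right


def range_bounds(sample, num_buckets):
    """Pick num_buckets - 1 split points from a non-empty sample.

    The sorted sample acts as an approximate histogram of the data;
    each gap between consecutive split points is one range bucket.
    """
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_buckets]
            for i in range(1, num_buckets)]


def range_bucket(key, bounds):
    """Return the index of the range bucket that `key` falls into."""
    return bisect_right(bounds, key)


def shuffle_write(records, num_reducers):
    """Map side: split one task's (key, value) output into buckets,
    one bucket per reduce partition (hash partitioning is assumed)."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def shuffle_read(map_outputs, reduce_id):
    """Reduce side: fetch the bucket with this partition id
    from every map task's output."""
    fetched = []
    for buckets in map_outputs:
        fetched.extend(buckets[reduce_id])
    return fetched


if __name__ == "__main__":
    bounds = range_bounds([5, 1, 8, 3, 2, 7, 4, 6], num_buckets=4)
    print("range bounds:", bounds)
    print("bucket for key 4:", range_bucket(4, bounds))

    map0 = shuffle_write([(1, "a"), (2, "b"), (3, "c")], num_reducers=2)
    map1 = shuffle_write([(4, "d"), (5, "e")], num_reducers=2)
    print("reducer 1 fetches:", shuffle_read([map0, map1], reduce_id=1))
```

In this sketch, looking at the "content" of a shuffle bucket just means inspecting one of the lists returned by `shuffle_write`; how the real shuffle lays out and exposes that data is not something this example attempts to show.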
Master JIRA ticket for tracking Spark 1.5.0 configuration renames, defaults changes, and configuration deprecation
To help us track planned / finished configuration renames, defaults changes, and configuration deprecation for the upcoming 1.5.0 release, I have created https://issues.apache.org/jira/browse/SPARK-9550. As you make configuration changes or think of configurations that need to be audited, please update that ticket by editing it or posting a comment. This ticket will help us later when it comes time to draft release notes. Thanks, Josh
Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature
Thank you for your reply! Do you mean that, currently, if I want to use the Tungsten feature, I have to set the sort shuffle manager (spark.shuffle.manager=sort)? However, I saw the slides "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal" published at Spark Summit 2015, and they seem to recommend the 'tungsten-sort' manager.