Re: spark job scheduling

2016-01-27 Thread Jakob Odersky
Nitpick: the up-to-date version of said wiki page is https://spark.apache.org/docs/1.6.0/job-scheduling.html (not sure how much it has changed, though). On Wed, Jan 27, 2016 at 7:50 PM, Chayapan Khannabha wrote: > I would start at this wiki page >

Re: spark job scheduling

2016-01-27 Thread Niranda Perera
Sorry, I made some typos. Let me rephrase. 1. As I understand it, the smallest unit of work an executor can perform is a 'task'. In the 'FAIR' scheduler mode, let's say a job is submitted to the Spark context that has a considerable amount of work to do in a single task. While such a 'big' task is
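For readers following the thread: the 'FAIR' mode mentioned above is selected through the `spark.scheduler.mode` property described on the job-scheduling page linked in this thread. A minimal sketch, assuming the property is supplied through a `spark-defaults.conf` style file (one key and value per line, separated by whitespace); the helper name `to_properties` is hypothetical and not part of Spark:

```python
def to_properties(settings):
    """Render Spark properties as spark-defaults.conf style lines.

    Each line is "<key> <value>"; keys are sorted so the output is stable.
    `to_properties` is an illustrative helper, not a Spark API.
    """
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


# spark.scheduler.mode is the property that switches between FIFO and FAIR.
print(to_properties({"spark.scheduler.mode": "FAIR"}))
```

This only shows where the mode is set; it does not address how a long-running task interacts with the scheduler, which is the question being asked.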

RE: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

2016-01-27 Thread james.gre...@baesystems.com
Thanks Yin, here are the logs: INFO SparkContext - Added JAR file:/home/jegreen1/mms/zookeeper-3.4.6.jar at http://10.39.65.122:38933/jars/zookeeper-3.4.6.jar with timestamp 1453907484092 INFO SparkContext - Added JAR file:/home/jegreen1/mms/mms-http-0.2-SNAPSHOT.jar at

Adding Naive Bayes sample code in Documentation

2016-01-27 Thread Vinayak Agrawal
Hi, I was reading through the Spark ML package and couldn't find Naive Bayes examples documented on the Spark documentation page: http://spark.apache.org/docs/latest/ml-classification-regression.html However, the API exists and can be used.

Mutiple spark contexts

2016-01-27 Thread Jakob Odersky
A while ago, I remember reading that multiple active Spark contexts per JVM was a possible future enhancement. I was wondering if this is still on the roadmap, what the major obstacles are and if I can be of any help in adding this feature? regards, --Jakob

Re: Mutiple spark contexts

2016-01-27 Thread Ashish Soni
There is a property you need to set: spark.driver.allowMultipleContexts=true. Ashish On Wed, Jan 27, 2016 at 1:39 PM, Jakob Odersky wrote: > A while ago, I remember reading that multiple active Spark contexts > per JVM was a possible future enhancement. > I was
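A minimal sketch of one way the property named in this reply could be passed to an application, assuming it is supplied as a `--conf` argument to `spark-submit`. The helper `conf_flags` and the application file `my_app.py` are hypothetical; only the property name and value come from the message above:

```python
import shlex


def conf_flags(settings):
    """Turn a dict of Spark properties into spark-submit --conf arguments.

    `conf_flags` is an illustrative helper, not part of Spark.
    """
    args = []
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# Property quoted in the reply above; my_app.py is a placeholder application.
settings = {"spark.driver.allowMultipleContexts": "true"}
command = ["spark-submit", *conf_flags(settings), "my_app.py"]

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in command))
```

Whether running several contexts in one JVM is advisable is a separate matter; the JIRA discussion linked in the next message covers that.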

Re: Mutiple spark contexts

2016-01-27 Thread Nicholas Chammas
There is a lengthy discussion about this on the JIRA: https://issues.apache.org/jira/browse/SPARK-2243 On Wed, Jan 27, 2016 at 1:43 PM Herman van Hövell tot Westerflier < hvanhov...@questtec.nl> wrote: > Just out of curiosity. What is the use case for having multiple active > contexts in a

Re: Spark 2.0.0 release plan

2016-01-27 Thread Michael Armbrust
We do maintenance releases on demand when there is enough to justify doing one. I'm hoping to cut 1.6.1 soon, but have not had time yet. On Wed, Jan 27, 2016 at 8:12 AM, Daniel Siegmann < daniel.siegm...@teamaol.com> wrote: > Will there continue to be monthly releases on the 1.6.x branch during

spark job scheduling

2016-01-27 Thread Niranda Perera
hi all, I have a few questions on spark job scheduling. 1. As I understand, the smallest unit of work an executor can perform. In the 'fair' scheduler mode, let's say a job is submitted to the spark ctx which has a considerable amount of work to do in a task. While such a 'big' task is running,

Re: spark job scheduling

2016-01-27 Thread Chayapan Khannabha
I would start at this wiki page https://spark.apache.org/docs/1.2.0/job-scheduling.html Although I'm sure this depends a lot on your cluster environment and the deployed Spark version. IMHO On Thu, Jan 28, 2016 at 10:27 AM, Niranda Perera wrote: > Sorry I have made

Re: Using distinct count in over clause

2016-01-27 Thread Akhil Das
Does it support over? I couldn't find it in the documentation http://spark.apache.org/docs/latest/sql-programming-guide.html#supported-hive-features Thanks Best Regards On Fri, Jan 22, 2016 at 2:31 PM, 汪洋 wrote: > I think it cannot be right. > > 在 2016年1月22日,下午4:53,汪洋

Re: Generate Amplab queries set

2016-01-27 Thread Akhil Das
Have a look at the TPC-H queries; I found this repository with the queries: https://github.com/ssavvides/tpch-spark Thanks Best Regards On Fri, Jan 22, 2016 at 1:35 AM, sara mustafa wrote: > Hi, > I have downloaded the Amplab benchmark dataset from >

Re: BUILD FAILURE at spark-sql_2.11?!

2016-01-27 Thread Jean-Baptiste Onofré
Thanks Jacek, I have the same issue here. Regards JB On 01/27/2016 10:15 AM, Jacek Laskowski wrote: Hi, Pull request submitted https://github.com/apache/spark/pull/10946/files. Please review and merge. Pozdrawiam, Jacek Jacek Laskowski | https://medium.com/@jaceklaskowski/ Mastering Apache

Re: BUILD FAILURE at spark-sql_2.11?!

2016-01-27 Thread Jacek Laskowski
Hi, My very rough investigation has shown that the commit that may have broken the build was https://github.com/apache/spark/commit/555127387accdd7c1cf236912941822ba8af0a52 (nongli committed with rxin 7 hours ago). Found a fix and building the source again... Pozdrawiam, Jacek Jacek Laskowski |

Re: timeout in shuffle problem

2016-01-27 Thread Hamel Kothari
Are you running on YARN? Another possibility here is that your shuffle managers are facing GC pain and becoming less responsive, thus missing timeouts. Can you try increasing the memory on the node managers and see if that helps? On Sun, Jan 24, 2016 at 4:58 PM Ted Yu wrote: