Re: dataset aggregators with kryo encoder very slow

2017-01-21 Thread Koert Kuipers
sorry i meant to say SPARK-18980 On Sat, Jan 21, 2017 at 1:48 AM, Koert Kuipers wrote: > found it :) SPARK-1890 > thanks cloud-fan > > On Sat, Jan 21, 2017 at 1:46 AM, Koert Kuipers wrote: > >> trying to replicate this in spark itself i can for v2.1.0 but

Re: Why StringIndexer uses double instead of int for indexing?

2017-01-21 Thread Holden Karau
In downstream stages the labels & features are generally expected to be doubles, so it's easier to use a double. On Sat, Jan 21, 2017 at 5:32 PM Shiyuan wrote: > Hi Spark, > StringIndexer uses double instead of int for indexing >

Why StringIndexer uses double instead of int for indexing?

2017-01-21 Thread Shiyuan
Hi Spark, StringIndexer uses double instead of int for indexing http://spark.apache.org/docs/latest/ml-features.html#stringindexer. What's the rationale for using double to index? Would it be more appropriate to use int to index (which is consistent with other places like Vector.sparse)? Shiyuan
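For context on what StringIndexer produces: it orders labels by descending frequency and assigns each one a Double index (the most frequent label gets 0.0), since downstream ML stages expect Double-typed label and feature columns. The following is a plain-Scala sketch of that indexing scheme, not Spark's actual implementation; the alphabetical tie-break is an assumption for illustration.

```scala
// Sketch of StringIndexer-style indexing (NOT Spark's implementation):
// order labels by descending frequency, assign Double indices.
// Ties are broken alphabetically here purely for determinism.
def indexLabels(labels: Seq[String]): Map[String, Double] =
  labels
    .groupBy(identity)
    .toSeq
    .sortBy { case (label, occurrences) => (-occurrences.size, label) }
    .map { case (label, _) => label }
    .zipWithIndex
    .map { case (label, i) => label -> i.toDouble }
    .toMap

val idx = indexLabels(Seq("a", "b", "c", "a", "a", "c"))
// "a" is most frequent -> 0.0, then "c" -> 1.0, then "b" -> 2.0
```

Note the output type: the indices are Doubles even though they are always whole numbers, which is exactly the design choice the thread is asking about.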

mvn deploy tries to upload artifacts multiple times

2017-01-21 Thread Koert Kuipers
i noticed when doing maven deploy for spark (for an inhouse release) that it tries to upload certain artifacts multiple times. for example it tried to upload the spark-network-common tests jar twice. our inhouse repo doesn't appreciate this for releases. it will refuse the second time. also it makes no

Re:

2017-01-21 Thread Mark Hamstra
I wouldn't say that Executors are dumb, but there are some pretty clear divisions of concepts and responsibilities across the different pieces of the Spark architecture. A Job is a concept that is completely unknown to an Executor, which deals instead with just the Tasks that it is given. So you

Re: is this something to worry about? HADOOP_HOME or hadoop.home.dir are not set

2017-01-21 Thread Haviv, Daniel
No. Thank you. Daniel On 20 Jan 2017, at 23:28, kant kodali wrote: Hi, I am running spark standalone with no storage. when I use spark-submit to submit my job I get the following Exception and I wonder if this is something to worry about?

Re:

2017-01-21 Thread Jacek Laskowski
Executors are "dumb", i.e. they execute TaskRunners for tasks and...that's it. Your logic should be on the driver that can intercept events and...trigger cleanup. I don't think there's another way to do it. Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache
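The driver-side pattern described above is what Spark exposes through its `SparkListener` API (extend `org.apache.spark.scheduler.SparkListener` and override callbacks such as `onJobEnd`). Since that requires a running Spark driver, here is a plain-Scala stand-in that models the same idea: executors just run tasks, while the driver intercepts job-end events and triggers cleanup. All names below (`JobListener`, `CleanupListener`, `Driver`) are hypothetical, for illustration only.

```scala
import scala.collection.mutable.ListBuffer

// Minimal stand-in for Spark's SparkListener callback interface.
trait JobListener {
  def onJobEnd(jobId: Int): Unit
}

// A listener that runs a user-supplied cleanup action when a job ends.
class CleanupListener(cleanup: Int => Unit) extends JobListener {
  override def onJobEnd(jobId: Int): Unit = cleanup(jobId)
}

// A toy "driver": it owns the listeners and notifies them on job completion,
// mirroring how cleanup logic lives on the driver, not on executors.
class Driver {
  private var listeners = List.empty[JobListener]
  def addListener(l: JobListener): Unit = listeners ::= l
  def finishJob(jobId: Int): Unit = listeners.foreach(_.onJobEnd(jobId))
}

val cleaned = ListBuffer.empty[Int]
val driver = new Driver
driver.addListener(new CleanupListener(id => cleaned += id))
driver.finishJob(42)
// cleaned now contains job id 42
```

In real Spark code you would register the listener with `SparkContext.addSparkListener` instead of the toy `Driver` above.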

tuning the spark.locality.wait

2017-01-21 Thread Cesar
I am working with datasets on the order of 200 GB using 286 cores divided across 143 executors. Each executor has 32 GB (which gives every core 16 GB). And I am using Spark 1.6. I would like to tune spark.locality.wait. Can anyone give me a range of values for spark.locality.wait that
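For reference when tuning: the default for spark.locality.wait is 3s, and it can be set top-level or per locality level (process, node, rack), with the per-level variants inheriting the top-level value unless overridden. A hedged config sketch, with the specific values chosen purely for illustration:

```scala
// Sketch: configuring locality wait on Spark 1.6. The example values are
// assumptions, not recommendations; lower values give up on data locality
// sooner, higher values wait longer for a local slot.
val conf = new org.apache.spark.SparkConf()
  .set("spark.locality.wait", "1s")
  // Optional finer-grained overrides per locality level:
  .set("spark.locality.wait.process", "1s")
  .set("spark.locality.wait.node", "3s")
  .set("spark.locality.wait.rack", "5s")
```

The same properties can also be passed via `spark-submit --conf spark.locality.wait=1s` without touching code.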