Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-06 Thread Reynold Xin
This is not supposed to happen. Do you have a repro? On Tue, Dec 6, 2016 at 6:11 PM, Nicholas Chammas wrote: > [Re-titling thread.] > > OK, I see that the exception from my original email is being triggered > from this part of UnsafeInMemorySorter:

Reduce memory usage of UnsafeInMemorySorter

2016-12-06 Thread Nicholas Chammas
[Re-titling thread.] OK, I see that the exception from my original email is being triggered from this part of UnsafeInMemorySorter: https://github.com/apache/spark/blob/v2.0.2/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L209-L212 So I can ask a more
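For context, the guard at those lines throws rather than growing or spilling itself. A minimal Scala paraphrase of the mechanism (the linked original is Java; the class name here is illustrative, not Spark's):

    // Scala paraphrase of the guard in UnsafeInMemorySorter.insertRecord
    // (Spark v2.0.2; the linked original is Java). The sorter stores
    // (recordPointer, keyPrefix) pairs in a long array and does not grow it
    // on insert: callers must check hasSpaceForAnotherRecord and grow or
    // spill first, or this exception is thrown.
    class PointerArraySorterSketch(capacity: Int) {
      private val array = new Array[Long](capacity)
      private var pos = 0

      def hasSpaceForAnotherRecord: Boolean = pos + 2 <= array.length

      def insertRecord(recordPointer: Long, keyPrefix: Long): Unit = {
        if (!hasSpaceForAnotherRecord) {
          throw new IllegalStateException("There is no space for new record")
        }
        array(pos) = recordPointer
        array(pos + 1) = keyPrefix
        pos += 2
      }
    }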

Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Maciej Szymkiewicz
This sounds much better. A follow-up question is whether we should also provide MAP@k, which I believe is the more widely used metric. On 12/06/2016 09:52 PM, Sean Owen wrote: > As I understand, this might best be called "mean precision@k", not > "mean average precision, up to k". > > On Tue, Dec 6, 2016 at 9:43 PM
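To make the distinction concrete, an illustrative Scala sketch (not MLlib's RankingMetrics code; the function names and the min(|relevant|, k) normalization are assumptions of the sketch, following a common MAP@k convention):

    // Illustrative only, not MLlib's implementation.
    // precisionAtK: fraction of the top-k predictions that are relevant.
    // avgPrecisionAtK: precision at each relevant hit's rank within the top k,
    // averaged; taking the mean of this across queries yields MAP@k.
    def precisionAtK[T](predicted: Seq[T], relevant: Set[T], k: Int): Double =
      predicted.take(k).count(relevant.contains).toDouble / k

    def avgPrecisionAtK[T](predicted: Seq[T], relevant: Set[T], k: Int): Double = {
      val topK = predicted.take(k)
      val hitPrecisions = topK.zipWithIndex.collect {
        case (item, i) if relevant.contains(item) =>
          topK.take(i + 1).count(relevant.contains).toDouble / (i + 1)
      }
      if (relevant.isEmpty) 0.0
      else hitPrecisions.sum / math.min(relevant.size, k) // common MAP@k normalization
    }

    // Example: predicted = Seq("a", "b", "c"), relevant = Set("a", "c"), k = 3
    // gives precisionAtK = 2/3, while avgPrecisionAtK = (1/1 + 2/3) / 2 ~= 0.83.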

Re: Can I add a new method to RDD class?

2016-12-06 Thread Teng Long
Hi Jakob, It seems like I’ll have to either replace the version with my custom version in all the pom.xml files in every subdirectory that has one and publish locally, or keep the version (i.e. 2.0.2) and manually remove the spark repository cache in ~/.ivy2 and ~/.m2 and publish spark
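As a sketch of the first route (a distinct version string avoids the cache-clearing dance of the second; the "2.0.2-custom" below is hypothetical), a downstream sbt build could consume the locally published artifacts like this:

    // build.sbt -- a sketch, assuming Spark was installed to the local Maven
    // repository (e.g. via ./build/mvn -DskipTests install) after changing the
    // version in the root pom.xml to the hypothetical "2.0.2-custom".
    resolvers += Resolver.mavenLocal

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2-custom"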

Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Sean Owen
As I understand, this might best be called "mean precision@k", not "mean average precision, up to k". On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz wrote: > Thank you Sean. > > Maybe I am just confused about the language. When I read that it returns "the > average

Seeing bytecode of each task executed.

2016-12-06 Thread Mr rty ff
Hi, Is there some way to see the bytecode of each task that is executed by Spark? Thanks

Re: SparkR Function for Step Wise Regression

2016-12-06 Thread Miao Wang
I tried one example on SparkR:

> training <- suppressWarnings(createDataFrame(iris))
> step(spark.glm(training, Sepal_Width ~ Sepal_Length + Species), direction = "forward")

There is an error:

Error: $ operator not defined for this S4 class

Based on my understanding of mllib.R, I think it is

Re: Spark-9487, Need some insight

2016-12-06 Thread Saikat Kanjilal
Well, other than making the code consistent, what's the high-level goal in doing this, and why does it matter so much how many workers we have in different scenarios (PySpark versus different components of Spark)? I'm OK not making the change and working on something else, to be honest, but

Re: Can I add a new method to RDD class?

2016-12-06 Thread Jakob Odersky
Yes, I think changing the property (line 29) in spark's root pom.xml should be sufficient. However, keep in mind that you'll also need to publish spark locally before you can access it in your test application. On Tue, Dec 6, 2016 at 2:50 AM, Teng Long wrote: > Thank you

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-06 Thread Hegner, Travis
Steve, I appreciate your experience and insight when dealing with large clusters at the data-center scale. I'm also well aware of the complex nature of schedulers, and that it is an area of ongoing research being done by people/companies with many more resources than I have. This might

Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Maciej Szymkiewicz
Thank you Sean. Maybe I am just confused about the language. When I read that it returns "the average precision at the first k ranking positions" I somehow expect there will be an ap@k in there, and that the final output would be MAP@k, not average precision at the k-th position. I guess it is not enough

Re: unhelpful exception thrown on predict() when ALS trained model doesn't contain user or product?

2016-12-06 Thread chris snow
Ah cool, thanks for the link! On 6 December 2016 at 12:25, Nick Pentreath wrote: > Indeed, it's being tracked here: https://issues.apache.org/jira/browse/SPARK-18230 though no PR has been opened yet. > > > On Tue, 6 Dec 2016 at 13:36 chris snow

Re: unhelpful exception thrown on predict() when ALS trained model doesn't contain user or product?

2016-12-06 Thread Nick Pentreath
Indeed, it's being tracked here: https://issues.apache.org/jira/browse/SPARK-18230 though no PR has been opened yet. On Tue, 6 Dec 2016 at 13:36 chris snow wrote: > I'm using the MatrixFactorizationModel.predict() method and encountered > the following exception: > > Name:

Re: Difference between netty and netty-all

2016-12-06 Thread Steve Loughran
Nicholas, FYI, there's some patch for Hadoop 2.8? 2.9? to move up to Netty https://issues.apache.org/jira/browse/HADOOP-13866 https://issues.apache.org/jira/browse/HADOOP-12854 On 5 Dec 2016, at 19:46, Nicholas Chammas wrote:

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-06 Thread Steve Loughran
This is essentially what the cluster schedulers do: allow different people to submit work with different credentials and priority; cgroups & equivalent to limit granted resources to requested ones. If you have pre-emption enabled, you can even have one job kill work off the others. Spark does

unhelpful exception thrown on predict() when ALS trained model doesn't contain user or product?

2016-12-06 Thread chris snow
I'm using the MatrixFactorizationModel.predict() method and encountered the following exception:

Name: java.util.NoSuchElementException
Message: next on empty iterator
StackTrace: scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
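One workaround sketch until a fix lands (the wrapper is illustrative and not from the thread; userFeatures, productFeatures, and predict are the real MatrixFactorizationModel members): check that both factor vectors exist before predicting.

    // Defensive wrapper sketch: return None instead of failing with
    // "next on empty iterator" when the user or product was not in training.
    // Note: lookup() scans the RDD unless it is partitioned by key, so this
    // is a sketch, not a performance recommendation.
    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    def safePredict(model: MatrixFactorizationModel,
                    user: Int,
                    product: Int): Option[Double] = {
      val hasUser = model.userFeatures.lookup(user).nonEmpty
      val hasProduct = model.productFeatures.lookup(product).nonEmpty
      if (hasUser && hasProduct) Some(model.predict(user, product)) else None
    }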

Re: Spark-9487, Need some insight

2016-12-06 Thread Steve Loughran
Jenkins uses SBT, so you need to do the test run there. They are different, and have different test runners in particular. On 30 Nov 2016, at 04:14, Saikat Kanjilal wrote: Hello Spark dev community, I took the following JIRA item

Re: Can I add a new method to RDD class?

2016-12-06 Thread Teng Long
Thank you Jakob for clearing things up for me. Before, I thought my application was compiled against my local build since I can get all the logs I just added in spark-core. But it was all along using Spark downloaded from the remote Maven repository, and that's why I "cannot" add new RDD methods
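For completeness: if the goal is only to call new methods on RDDs from application code rather than to change Spark internals, the standard extension-method (implicit class) pattern sidesteps rebuilding Spark entirely. A minimal sketch, with illustrative names (RddExtensions, countWhere are not from the thread):

    // Adds a "method" to RDD via an implicit class; no Spark rebuild needed.
    import org.apache.spark.rdd.RDD

    object RddExtensions {
      implicit class RichRdd[T](val rdd: RDD[T]) extends AnyVal {
        // Count the elements that satisfy a predicate.
        def countWhere(p: T => Boolean): Long = rdd.filter(p).count()
      }
    }

    // Usage: import RddExtensions._
    //        sc.parallelize(1 to 10).countWhere(_ % 2 == 0)  // res: 5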