Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Richard Startin
I've seen the feature work very well. For tuning, you've got: spark.streaming.backpressure.pid.proportional (defaults to 1, non-negative) - weight for the response to the "error" (the change between the last batch and this batch); spark.streaming.backpressure.pid.integral (defaults to 0.2, non-negative) -
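A minimal sketch of enabling backpressure and setting the PID weights mentioned above (the property names are real Spark Streaming settings; the values shown are just the documented defaults, written out for illustration):

```scala
import org.apache.spark.SparkConf

// Enable rate-based backpressure and tune the PID rate estimator weights.
val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.pid.proportional", "1.0") // weight for the current error
  .set("spark.streaming.backpressure.pid.integral", "0.2")     // weight for the accumulated error
```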

SparkR Function for Step Wise Regression

2016-12-05 Thread Prasann modi
Hello, I have an issue related to SparkR. I want to build a stepwise regression model using SparkR — is there any function in SparkR to build that kind of model? In R, a function is available for stepwise regression; the code is given below: step(glm(formula,data,family),direction = "forward")

driver in queued state and not started

2016-12-05 Thread Yu Wei
Hi Guys, I tried to run Spark on a Mesos cluster. However, when I try to submit jobs via spark-submit, the driver stays in "Queued" state and is not started. What should I check? Thanks, Jared Software developer Interested in open source software, big data, Linux

Re: Spark-9487, Need some insight

2016-12-05 Thread Reynold Xin
Honestly it is pretty difficult. Given the difficulty, would it still make sense to do that change? (the one that sets the same number of workers/parallelism across different languages in testing) On Mon, Dec 5, 2016 at 3:33 PM, Saikat Kanjilal wrote: > Hello again dev

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Cody Koeninger
If you want finer-grained max rate setting, SPARK-17510 got merged a while ago. There's also SPARK-18580 which might help address the issue of starting backpressure rate for the first batch. On Mon, Dec 5, 2016 at 4:18 PM, Liren Ding wrote: > Hey all, > > Does
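For reference, the coarser per-partition cap that predates SPARK-17510, and the initial-rate setting in the area SPARK-18580 addresses, look roughly like this (a sketch only — the values are illustrative, and availability depends on your Spark version):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Cap the ingest rate per Kafka partition, in records per second.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Starting rate for the backpressure estimator before any batch feedback exists.
  .set("spark.streaming.backpressure.initialRate", "500")
```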

Re: Can I add a new method to RDD class?

2016-12-05 Thread Jakob Odersky
It looks like you're having issues with including your custom spark version (with the extensions) in your test project. To use your local spark version: 1) make sure it has a custom version (let's call it 2.1.0-CUSTOM) 2) publish it to your local machine with `sbt publishLocal` 3) include the
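The consuming side of the steps above might look like this, assuming the custom build was published as 2.1.0-CUSTOM with `sbt publishLocal` (version numbers here are hypothetical):

```scala
// build.sbt of the test project.
// sbt resolves publishLocal artifacts from the local Ivy repository by default.
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0-CUSTOM"
```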

Re: Difference between netty and netty-all

2016-12-05 Thread Shixiong(Ryan) Zhu
No. I meant only updating master. It's not worth updating a maintenance branch unless there are critical issues. On Mon, Dec 5, 2016 at 5:39 PM, Nicholas Chammas wrote: > You mean just for branch-2.0, right? > ​ > > On Mon, Dec 5, 2016 at 8:35 PM Shixiong(Ryan) Zhu

Re: [MLLIB] RankingMetrics.precisionAt

2016-12-05 Thread Sean Owen
I read it again and that looks like it implements mean precision@k as I would expect. What is the issue? On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz wrote: > Hi, > > Could I ask for a fresh pair of eyes on this piece of code: > > >

Re: Difference between netty and netty-all

2016-12-05 Thread Nicholas Chammas
You mean just for branch-2.0, right? ​ On Mon, Dec 5, 2016 at 8:35 PM Shixiong(Ryan) Zhu wrote: > Hey Nick, > > It should be safe to upgrade Netty to the latest 4.0.x version. Could you > submit a PR, please? > > On Mon, Dec 5, 2016 at 11:47 AM, Nicholas Chammas < >

Re: Difference between netty and netty-all

2016-12-05 Thread Shixiong(Ryan) Zhu
Hey Nick, It should be safe to upgrade Netty to the latest 4.0.x version. Could you submit a PR, please? On Mon, Dec 5, 2016 at 11:47 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > That file is in Netty 4.0.29, but I believe the PR I referenced is not. > It's only in Netty 4.0.37

Re: Can I add a new method to RDD class?

2016-12-05 Thread Teng Long
Tarun, I want to access some private methods, e.g. withScope, so I added a similar implicit class compiled with Spark. But I can't import that into my application. For example, I have added org/apache/spark/rdd/RDDExtensions.scala, in which I defined an implicit class inside the RDDExtensions

Re: Spark-9487, Need some insight

2016-12-05 Thread Saikat Kanjilal
Hello again dev community, Ping on this — apologies for re-running this thread, but I never heard from anyone. Based on this link: https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins I can try to install Jenkins locally, but is that really needed? Thanks in advance.

[MLLIB] RankingMetrics.precisionAt

2016-12-05 Thread Maciej Szymkiewicz
Hi, Could I ask for a fresh pair of eyes on this piece of code: https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80 @Since("1.2.0") def precisionAt(k: Int): Double = { require(k
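For anyone trying to reproduce the discussion, a minimal usage sketch of RankingMetrics (the data is made up, and `sc` is assumed to be an existing SparkContext):

```scala
import org.apache.spark.mllib.evaluation.RankingMetrics

// Each element pairs a predicted ranking with the set of relevant documents.
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1, 2, 3, 4), Array(1, 3)),
  (Array(5, 6, 7), Array(7))
))

val metrics = new RankingMetrics(predictionAndLabels)
val p2 = metrics.precisionAt(2) // mean precision@2 across the two queries
```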

Re: Future of the Python 2 support.

2016-12-05 Thread Maciej Szymkiewicz
Fair enough. I have to admit I am a bit disappointed, but that's life :) On 12/04/2016 07:28 PM, Reynold Xin wrote: > Echoing Nick. I don't see any strong reason to drop Python 2 support. > We typically drop support for X when it is rarely used and support for > X is long past EOL. Python 2 is

Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Liren Ding
Hey all, Does backpressure actually work on Spark Kafka streaming? According to the latest Spark Streaming document: http://spark.apache.org/docs/latest/streaming-programming-guide.html "*In Spark 1.5, we have introduced a

Re: Can I add a new method to RDD class?

2016-12-05 Thread Teng Long
I’m trying to implement a transformation that can merge partitions (to align with GPU specs) and move them onto GPU memory, for example rdd.toGPU() and later transformations like map can automatically be performed on GPU. And another transformation rdd.offGPU() to move partitions off GPU memory

Re: Can I add a new method to RDD class?

2016-12-05 Thread Tarun Kumar
Teng, Can you please share the details of the transformation that you want to implement in your method foo? I have created a gist of one dummy transformation for your method foo; this foo method transforms an RDD[T] to an RDD[(T,T)]. Many more such transformations can easily be achieved.
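The gist itself isn't preserved in the archive, but a transformation of that shape (RDD[T] to RDD[(T, T)]) exposed through an implicit class might look like this sketch (the object and method names are hypothetical):

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object RDDExtensions {
  implicit class FooOps[T: ClassTag](rdd: RDD[T]) {
    // Pair every element with itself: RDD[T] => RDD[(T, T)]
    def foo: RDD[(T, T)] = rdd.map(x => (x, x))
  }
}

// Usage elsewhere, after `import RDDExtensions._`:
// val paired = someRdd.foo
```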

Re: Can I add a new method to RDD class?

2016-12-05 Thread Thakrar, Jayesh
Teng, Before you go down creating your own custom Spark system, do give some thought to what Holden and others are suggesting, viz. using implicit methods. If you want real concrete examples, have a look at the Spark Cassandra Connector - Here you will see an example of "extending"

Re: Can I add a new method to RDD class?

2016-12-05 Thread Teng Long
Thank you, Ryan. Didn't know there was a method for that! > On Dec 5, 2016, at 4:10 PM, Shixiong(Ryan) Zhu > wrote: > > RDD.sparkContext is public: > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@sparkContext:org.apache.spark.SparkContext

Re: Can I add a new method to RDD class?

2016-12-05 Thread Teng Long
Thank you for providing another answer, Holden. So I did what Tarun and Michal suggested, and it didn't work out, as I want to have a new transformation method in the RDD class and need to use that RDD's SparkContext, which is private. So I guess the only thing I can do now is to sbt publishLocal?

Re: Can I add a new method to RDD class?

2016-12-05 Thread Shixiong(Ryan) Zhu
RDD.sparkContext is public: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@sparkContext:org.apache.spark.SparkContext On Mon, Dec 5, 2016 at 1:04 PM, Teng Long wrote: > Thank you for providing another answer, Holden. > > So I did what

Re: ability to provide custom serializers

2016-12-05 Thread Michael Armbrust
Let's start with a new ticket, link them, and we can merge if the solution ends up working out for both cases. On Sun, Dec 4, 2016 at 5:39 PM, Erik LaBianca wrote: > Thanks Michael! > > On Dec 2, 2016, at 7:29 PM, Michael Armbrust > wrote: > > I

Re: Difference between netty and netty-all

2016-12-05 Thread Nicholas Chammas
That file is in Netty 4.0.29, but I believe the PR I referenced is not. It's only in Netty 4.0.37 and up. On Mon, Dec 5, 2016 at 1:57 PM Ted Yu wrote: > This should be in netty-all : > > $ jar tvf >

Re: Difference between netty and netty-all

2016-12-05 Thread Sean Owen
netty should be Netty 3.x. It is all but unused but I couldn't manage to get rid of it: https://issues.apache.org/jira/browse/SPARK-17875 netty-all should be 4.x, actually used. On Tue, Dec 6, 2016 at 12:54 AM Nicholas Chammas wrote: > I’m looking at the list of

Re: Please limit commits for branch-2.1

2016-12-05 Thread Reynold Xin
I would like to re-iterate that committers please be very conservative now in merging patches into branch-2.1. Spark is a very sophisticated (compiler, optimizer) project and sometimes one-line changes can have huge consequences and introduce regressions. If it is just a tiny optimization, don't

Re: Difference between netty and netty-all

2016-12-05 Thread Ted Yu
This should be in netty-all : $ jar tvf /home/x/.m2/repository/io/netty/netty-all/4.0.29.Final/netty-all-4.0.29.Final.jar | grep ThreadLocalRandom 967 Tue Jun 23 11:10:30 UTC 2015 io/netty/util/internal/ThreadLocalRandom$1.class 1079 Tue Jun 23 11:10:30 UTC 2015

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-05 Thread Michal Šenkýř
Hello Travis, I am just a short-time member of this list but I can definitely see the benefit of using built-in OS resource management facilities to dynamically manage cluster resources on the node level in this manner. At our company we often fight for resources on our development cluster

Difference between netty and netty-all

2016-12-05 Thread Nicholas Chammas
I’m looking at the list of dependencies here: https://github.com/apache/spark/search?l=Groff&q=netty&type=Code&utf8=%E2%9C%93 What’s the difference between netty and netty-all? The reason I ask is because I’m looking at a Netty PR and trying to figure out if Spark

Re: SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-05 Thread Hegner, Travis
My apologies, in my excitement of finding a rather simple way to accomplish the scheduling goal I have in mind, I hastily jumped straight into a technical solution, without explaining that goal, or the problem it's attempting to solve. You are correct that I'm looking for an additional running

Re: Can I add a new method to RDD class?

2016-12-05 Thread Holden Karau
Doing that requires publishing a custom version of Spark; you can edit the version number and do a publishLocal - but maintaining that change is going to be difficult. The other approaches suggested are probably better, but also does your method need to be defined on the RDD class? Could you

java.lang.IllegalStateException: There is no space for new record

2016-12-05 Thread Nicholas Chammas
I was testing out a new project at scale on Spark 2.0.2 running on YARN, and my job failed with an interesting error message: TaskSetManager: Lost task 37.3 in stage 31.0 (TID 10684, server.host.name): java.lang.IllegalStateException: There is no space for new record 05:27:09.573 at

Re: Can I add a new method to RDD class?

2016-12-05 Thread long
Thank you very much! But why can’t I just add new methods into the source code of RDD? > On Dec 5, 2016, at 3:15 AM, Michal Šenkýř [via Apache Spark Developers List] > wrote: > > A simple Scala example of implicit classes: > > implicit class

Re: Can I add a new method to RDD class?

2016-12-05 Thread Michal Šenkýř
A simple Scala example of implicit classes: implicit class EnhancedString(str: String) { def prefix(prefix: String) = prefix + str } println("World".prefix("Hello ")) As Tarun said, you have to import it if it's not in the same class where you use it. Hope this makes it clearer, Michal
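To make the import requirement concrete, here is the same pattern wrapped in an object so it can be imported from another file (a sketch; the object name is arbitrary):

```scala
object StringEnhancements {
  implicit class EnhancedString(str: String) {
    def prefix(p: String): String = p + str
  }
}

// In another file:
// import StringEnhancements._
// println("World".prefix("Hello "))  // prints "Hello World"
```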