Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-01 Thread Bharath Ravi Kumar
Thanks for fixing it.

On Sun, Aug 2, 2015 at 3:17 AM, Patrick Wendell  wrote:

> Hey All,
>
> I got it up and running - it was a newly surfaced bug in the build scripts.
>
> - Patrick


Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-29 Thread Bharath Ravi Kumar
Hey Patrick,

Any update on this front please?

Thanks,
Bharath

On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell  wrote:

> Hey Bharath,
>
> There was actually an incompatible change to the build process that
> broke several of the Jenkins builds. This should be patched up in the
> next day or two and nightly builds will resume.
>
> - Patrick


Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-26 Thread Bharath Ravi Kumar
Thanks, Patrick. I'll await resumption of the master tree's nightly builds.

-Bharath

On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell  wrote:

> Hey Bharath,
>
> There was actually an incompatible change to the build process that
> broke several of the Jenkins builds. This should be patched up in the
> next day or two and nightly builds will resume.
>
> - Patrick


Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-24 Thread Bharath Ravi Kumar
I noticed the last (1.5) build has a timestamp of 16th July. Have nightly
builds been discontinued since then?

Thanks,
Bharath

On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell  wrote:

> Hi All,
>
> This week I got around to setting up nightly builds for Spark on
> Jenkins. I'd like feedback on these and if it's going well I can merge
> the relevant automation scripts into Spark mainline and document it on
> the website. Right now I'm doing:
>
> 1. SNAPSHOTs of Spark master and release branches published to the ASF
> Maven snapshot repo:
>
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>
> These are usable by adding this repository to your build and using a
> snapshot version (e.g. 1.3.2-SNAPSHOT).
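>
> For instance, a minimal build.sbt sketch of consuming these (spark-core is
> just an illustrative module here; any published Spark artifact works the
> same way):
>
>   resolvers += "ASF Snapshots" at
>     "https://repository.apache.org/content/repositories/snapshots/"
>
>   libraryDependencies +=
>     "org.apache.spark" %% "spark-core" % "1.3.2-SNAPSHOT"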
>
> 2. Nightly binary package builds and doc builds of master and release
> versions.
>
> http://people.apache.org/~pwendell/spark-nightly/
>
> These are built 4 times per day and tagged based on commits.
>
> If anyone has feedback on these please let me know.
>
> Thanks!
> - Patrick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Spark on Mesos vs Yarn

2015-06-09 Thread Bharath Ravi Kumar
All,

Despite the common origin of Spark and Mesos, the stability and adoption of
Mesos, and the age of the Spark-Mesos binding, I find the Mesos support less
mature, with fundamental shortcomings (like framework auth
<https://issues.apache.org/jira/browse/SPARK-6284>) remaining unresolved.
If there's a shortage of developer time, I'd be glad to contribute, but it's
unclear whether the committer group has sufficient time (and gives this
sufficient priority) to take the Mesos support forward. While it has often
been stated that support for Mesos and YARN is equally important, that
doesn't seem to translate into visible progress. I'd be glad to be proven
wrong here; what I'm after is better focus on, and a long-term commitment to,
the Mesos support.
As for the specific issue (SPARK-6284), I'm happy to build, test, and
eventually deploy the patch in our production cluster, but I'd rather see it
become mainstream.
Thanks for your consideration.

-Bharath
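
For concreteness, a Scala sketch of the framework-auth configuration that
SPARK-6284 asks for. The property names below are illustrative assumptions
(they follow the spark.mesos.* naming the Mesos backend already uses), not
options that existed when this was written:

  import org.apache.spark.SparkConf

  // Hypothetical: register the Spark framework with a Mesos principal and
  // secret, so a multi-tenant cluster can authenticate it and assign roles.
  val conf = new SparkConf()
    .setMaster("mesos://zk://host:2181/mesos")        // illustrative master URL
    .set("spark.mesos.principal", "spark-framework")  // illustrative principal
    .set("spark.mesos.secret", "framework-secret")    // illustrative secret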


On Thu, May 28, 2015 at 9:18 AM, Bharath Ravi Kumar 
wrote:

> A follow-up: considering that Spark on Mesos is indeed important to
> Databricks, its partners, and the community, fundamental issues like
> SPARK-6284 shouldn't be languishing for this long. A Mesos cluster hosting
> diverse (i.e. multi-tenant) workloads is a common scenario in production
> for serious users. The ability to authenticate a framework and assign roles
> would be a fairly basic ask, one would imagine. Is a lack of time/effort
> the constraint? If so, I'd be glad to help (as mentioned in the JIRA).
>
> On Fri, May 15, 2015 at 5:29 PM, Iulian Dragoș wrote:
>
>> Hi Ankur,
>>
>> Just to add a thought to Tim's excellent answer, Spark on Mesos is very
>> important to us and is the recommended deployment for our customers at
>> Typesafe.
>>
>> Thanks for pointing to your PR, I see Tim already went through a round of
>> reviews. It seems very useful, I'll give it a try as well.
>>
>> thanks,
>> iulian
>>
>>
>>
>> On Fri, May 15, 2015 at 9:53 AM, Ankur Chauhan 
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for such a detailed email. I am excited to hear about the new
>>> features. I had a pull request going for adding "attribute-based
>>> filtering in the Mesos scheduler", but it hasn't received much love:
>>> https://github.com/apache/spark/pull/5563 . I am a fan of the
>>> Mesos/Marathon/Mesosphere and Spark ecosystems and am trying to push
>>> adoption at my workplace.
>>>
>>> I would love to see documentation, tutorials (anything, actually) that
>>> would make Mesos + Spark a better and more fleshed-out solution. Would
>>> it be possible for you to share some links to the JIRAs and pull
>>> requests so that I can keep track of the progress/features?
>>>
>>> Again, thanks for replying.
>>>
>>> -- Ankur Chauhan
>>>
>>> On 15/05/2015 00:39, Tim Chen wrote:
>>> > Hi Ankur,
>>> >
>>> > This is a great question as I've heard similar concerns about Spark
>>> > on Mesos.
>>> >
>>> > When I started contributing to Spark on Mesos approximately half a
>>> > year ago, the Mesos scheduler and related code hadn't really
>>> > received much attention from anyone, and it was pretty much in
>>> > maintenance mode.
>>> >
>>> > As a Mesos PMC member who is really interested in Spark, I started
>>> > to refactor and review the various JIRAs and PRs around the Mesos
>>> > scheduler, and after that went on to fix various bugs in Spark, add
>>> > documentation, and also fix related issues in Mesos itself.
>>> >
>>> > Just recently, for 1.4 we've merged in cluster mode and Docker
>>> > support, and there are also pending PRs around framework
>>> > authentication, multi-role support, dynamic allocation, more finely
>>> > tuned coarse-grained mode scheduling configurations, etc.
>>> >
>>> > And finally, I just want to mention that Mesosphere and Typesafe are
>>> > collaborating to bring a certified distribution
>>> > (https://databricks.com/spark/certification/certified-spark-distribution)
>>> > of Spark on Mesos and DCOS, and we will be pouring resources into
>>> > not just maintaining Spark on Mesos but driving more features into
>>> > the Mesos scheduler, and also into Mesos itself, so stateful services
>>> > can leverage new APIs and features to make better scheduling decisions.


NPE calling reduceByKey on JavaPairRDD

2014-06-26 Thread Bharath Ravi Kumar
Hi,

I've been encountering an NPE when invoking reduceByKey on a JavaPairRDD
since upgrading to 1.0.0. The issue is straightforward to reproduce with
1.0.0 and doesn't occur with 0.9.0. The stack trace is as follows:

14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
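
For reference, a minimal Scala sketch of the call pattern that exercises the
frames above (pairFunToScalaFun and Aggregator.combineValuesByKey) through
the Java API; this is an illustrative word count, not the reporter's actual
job:

  import java.util.Arrays

  import org.apache.spark.SparkConf
  import org.apache.spark.api.java.JavaSparkContext
  import org.apache.spark.api.java.function.{Function2, PairFunction}

  object ReduceByKeySketch {
    def main(args: Array[String]): Unit = {
      val jsc = new JavaSparkContext(
        new SparkConf().setAppName("npe-sketch").setMaster("local[2]"))
      val words = jsc.parallelize(Arrays.asList("a", "b", "a"))
      // mapToPair wraps the PairFunction via pairFunToScalaFun; reduceByKey
      // then feeds it to Aggregator.combineValuesByKey, per the trace above.
      val counts = words
        .mapToPair(new PairFunction[String, String, Integer] {
          override def call(w: String) = new Tuple2[String, Integer](w, 1)
        })
        .reduceByKey(new Function2[Integer, Integer, Integer] {
          override def call(a: Integer, b: Integer): Integer = a + b
        })
      println(counts.collect())
      jsc.stop()
    }
  }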


I've raised a bug to track this issue:
https://issues.apache.org/jira/browse/SPARK-2292

Thanks,
Bharath


Re: LogisticRegression: Predicting continuous outcomes

2014-05-29 Thread Bharath Ravi Kumar
Xiangrui, Christopher,

Thanks for responding. I'll go through the code in detail to evaluate whether
the loss function used is suitable for our dataset. I'll also go through the
referenced paper, since I was unaware of the underlying theory. Thanks again.

-Bharath


On Thu, May 29, 2014 at 8:16 AM, Christopher Nguyen  wrote:

> Bharath, (apologies if you're already familiar with the theory): the
> proposed approach may or may not be appropriate depending on the overall
> transfer function in your data. In general, a single logistic regressor
> cannot approximate arbitrary non-linear functions (of linear combinations
> of the inputs). You can review works by, e.g., Hornik and Cybenko in the
> late '80s to see if you need something more, such as a simple
> one-hidden-layer neural network.
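>
> (In symbols, using standard definitions rather than anything specific to
> this thread: a single logistic regressor models
>
>   y ~= sigma(w.x + b),   where sigma(z) = 1 / (1 + exp(-z)),
>
> so its output is a monotone function of one linear combination of the
> inputs, while a one-hidden-layer network of the kind Hornik and Cybenko
> analyze,
>
>   y ~= sum_i a_i * sigma(w_i.x + b_i),
>
> can approximate any continuous function arbitrarily well on a compact set.)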
>
> This is a good summary:
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf
>
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao <http://adatao.com>
> linkedin.com/in/ctnguyen
>


LogisticRegression: Predicting continuous outcomes

2014-05-28 Thread Bharath Ravi Kumar
I'm looking to reuse the LogisticRegression model (with SGD) to predict a
real-valued outcome variable. (I understand that logistic regression is
generally applied to predict a binary outcome, but for various reasons, this
model suits our needs better than LinearRegression.) Related to that, I have
the following questions:

1) Can the current LogisticRegression model be used as-is to train on binary
input (i.e. explanatory) features, or is there an assumption that the
explanatory features must be continuous?

2) I intend to reuse the current class to train a model on LabeledPoints
where the label is a real value (and not 0/1). I'd like to know whether
invoking setValidateData(false) would suffice, or whether one must override
the validator to achieve this.

3) I recall seeing an experimental method on the class (
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala)
that clears the threshold separating positive & negative predictions. Once
the model is trained on real-valued labels, would clearing this flag
suffice to predict an outcome that is continuous in nature?
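
For concreteness, a Scala sketch of exactly the usage being asked about
(the data is illustrative; this only shows the mechanics in question, not
whether they are statistically sound for real-valued labels):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  object ContinuousLabelSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("lr-sketch").setMaster("local[2]"))
      // Real-valued labels instead of 0/1 (question 2).
      val data = sc.parallelize(Seq(
        LabeledPoint(0.7, Vectors.dense(1.0, 0.0)),
        LabeledPoint(0.2, Vectors.dense(0.0, 1.0))))
      val algo = new LogisticRegressionWithSGD()
      algo.setValidateData(false) // bypass the binary-label check (question 2)
      val model = algo.run(data)
      model.clearThreshold()      // return raw scores, not 0/1 (question 3)
      println(model.predict(Vectors.dense(1.0, 1.0)))
      sc.stop()
    }
  }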

Thanks,
Bharath

P.S.: I'm writing to dev@ and not user@ on the assumption that library
changes might be necessary. Apologies if the mailing list is incorrect.