Re: weightCol doesn't seem to be handled properly in PySpark

2016-09-12 Thread Evan Zamir
Yep, done. https://issues.apache.org/jira/browse/SPARK-17508

On Mon, Sep 12, 2016 at 9:06 AM Nick Pentreath wrote:

> Could you create a JIRA ticket for it?
>
> https://issues.apache.org/jira/browse/SPARK
>
> On Thu, 8 Sep 2016 at 07:50 evanzamir  wrote:
>
>> When I try to use LinearRegression, it seems that unless a weight column
>> is actually specified, it raises a py4j error. This seems odd because the
>> documented default is weightCol=None, yet when I explicitly pass
>> weightCol=None to LinearRegression, I get this error. (A minimal
>> reproduction is sketched below this message.)
>>
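For reference, a minimal reproduction along the lines of the report above. This is only a sketch, assuming Spark 2.0, a local SparkSession, and a toy DataFrame (the data and column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.getOrCreate()

    # Toy data: a label column plus a single-feature vector column.
    df = spark.createDataFrame(
        [(1.0, Vectors.dense(0.0)),
         (2.0, Vectors.dense(1.0)),
         (3.0, Vectors.dense(2.0))],
        ["label", "features"])

    # Leaving weightCol unset works as expected.
    LinearRegression(maxIter=5).fit(df)

    # Passing weightCol=None explicitly is what reportedly triggers the
    # py4j error, even though None is the documented default.
    LinearRegression(maxIter=5, weightCol=None).fit(df)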


Re: I noticed LinearRegression sometimes produces negative R^2 values

2016-09-07 Thread Evan Zamir
Yes, it's on a hold out segment from the data set being fitted.
On Wed, Sep 7, 2016 at 1:02 AM Sean Owen <so...@cloudera.com> wrote:

> Yes, should be.
> It's also not necessarily nonnegative if you evaluate R^2 on a
> different data set than you fit it to. Is that the case?
>
> On Tue, Sep 6, 2016 at 11:15 PM, Evan Zamir <zamir.e...@gmail.com> wrote:
> > I am using the default setting for fitIntercept, which *should* be
> > TRUE, right?
> >
> > On Tue, Sep 6, 2016 at 1:38 PM Sean Owen <so...@cloudera.com> wrote:
> >>
> >> Are you not fitting an intercept / regressing through the origin? With
> >> that constraint it's no longer true that R^2 is necessarily
> >> nonnegative. A negative value basically means that the errors are even
> >> bigger than what you'd get by predicting the data's mean value as a
> >> constant model.
> >>
> >> On Tue, Sep 6, 2016 at 8:49 PM, evanzamir <zamir.e...@gmail.com> wrote:
> >> > Am I misinterpreting what r2() in the LinearRegression Model summary
> >> > means?
> >> > By definition, R^2 should never be a negative number!
> >> >
>
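To make the point above concrete, here is a small numeric sketch (plain NumPy, values illustrative) of why R^2 = 1 - SS_res / SS_tot can go negative on held-out data: nothing forces the residual sum of squares below the total sum of squares once the predictions come from a model fit on different data.

    import numpy as np

    def r2(y_true, y_pred):
        # R^2 = 1 - SS_res / SS_tot
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1.0 - ss_res / ss_tot

    # Hold-out labels and the predictions of a model that generalises badly.
    y_test = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([4.0, 3.0, 2.0, 1.0])

    print(r2(y_test, y_pred))  # -3.0: worse than just predicting the mean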


Re: I noticed LinearRegression sometimes produces negative R^2 values

2016-09-06 Thread Evan Zamir
I am using the default setting for *fitIntercept*, which *should* be TRUE,
right?

On Tue, Sep 6, 2016 at 1:38 PM Sean Owen  wrote:

> Are you not fitting an intercept / regressing through the origin? With
> that constraint it's no longer true that R^2 is necessarily
> nonnegative. A negative value basically means that the errors are even
> bigger than what you'd get by predicting the data's mean value as a
> constant model.
>
> On Tue, Sep 6, 2016 at 8:49 PM, evanzamir  wrote:
> > Am I misinterpreting what r2() in the LinearRegression Model summary
> means?
> > By definition, R^2 should never be a negative number!
> >
>
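For what it's worth, the default can be checked directly from a PySpark shell; a quick sketch, assuming Spark 2.x:

    from pyspark.ml.regression import LinearRegression

    lr = LinearRegression()
    # The shared-param getter falls back to the declared default when
    # nothing has been set explicitly.
    print(lr.getFitIntercept())  # True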


Re: How to add custom steps to Pipeline models?

2016-08-14 Thread Evan Zamir
Thanks, but I should have been clearer that I'm trying to do this in
PySpark, not Scala. Using an example I found on Stack Overflow, I was able
to implement a custom Pipeline step in Python (along the lines of the
sketch below), but it seems much harder (perhaps currently impossible) to
make it persist to disk (I tried implementing a _to_java method, to no
avail). Any ideas about that?
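For context, the kind of Python-side stage I mean is roughly the following. This is only a sketch against the Spark 2.x pyspark.ml API; the class and column names are made up for illustration, and it is the save/load part that is missing:

    from pyspark.ml import Transformer
    from pyspark.ml.param.shared import HasInputCol, HasOutputCol
    from pyspark.sql import functions as F

    class Lowercaser(Transformer, HasInputCol, HasOutputCol):
        """Hypothetical custom stage: lower-cases a string column."""

        def __init__(self, inputCol, outputCol):
            super(Lowercaser, self).__init__()
            self._set(inputCol=inputCol, outputCol=outputCol)

        def _transform(self, dataset):
            return dataset.withColumn(
                self.getOutputCol(), F.lower(F.col(self.getInputCol())))

    # Usable in a Pipeline like any built-in stage, but it has no Java
    # counterpart, which is what makes saving/loading the pipeline awkward.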

On Sun, Aug 14, 2016 at 6:02 PM Jacek Laskowski  wrote:

> Hi,
>
> It should just work if you followed the Transformer interface [1].
> When you have the transformers, creating a Pipeline is a matter of
> setting them as additional stages (using Pipeline.setStages [2]).
>
> [1]
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala
> [2]
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L107
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Aug 12, 2016 at 9:19 AM, evanzamir  wrote:
> > I'm building an LDA Pipeline, currently with 4 steps: Tokenizer,
> > StopWordsRemover, CountVectorizer, and LDA. I would like to add more
> > steps, for example stemming and lemmatization, and also combined 1-grams
> > and 2-grams (which I believe the default NGram class does not support,
> > since it produces n-grams for a single n). Is there a way to add these
> > steps? In sklearn, you can create classes with fit() and transform()
> > methods, and that is enough. Is the same true in Spark ML (or something
> > similar)?
> >
>
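To illustrate the answer above: assembling the stages from the original question is just a matter of listing them when constructing the Pipeline. A sketch, assuming a DataFrame named docs with a string column "text" (both names are made up here); a custom Python stage such as a stemmer would simply be inserted into the same list:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, StopWordsRemover, CountVectorizer
    from pyspark.ml.clustering import LDA

    tokenizer = Tokenizer(inputCol="text", outputCol="tokens")
    remover = StopWordsRemover(inputCol="tokens", outputCol="filtered")
    cv = CountVectorizer(inputCol="filtered", outputCol="features")
    lda = LDA(k=10, featuresCol="features")

    # Any extra stage implementing transform() (a Transformer) or fit()
    # (an Estimator) can be added to this list, e.g. between remover and cv.
    pipeline = Pipeline(stages=[tokenizer, remover, cv, lda])
    model = pipeline.fit(docs)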