Re: weightCol doesn't seem to be handled properly in PySpark
Yep, done. https://issues.apache.org/jira/browse/SPARK-17508

On Mon, Sep 12, 2016 at 9:06 AM Nick Pentreath wrote:

> Could you create a JIRA ticket for it?
>
> https://issues.apache.org/jira/browse/SPARK
>
> On Thu, 8 Sep 2016 at 07:50 evanzamir wrote:
>
> > When I am trying to use LinearRegression, it seems that unless there is
> > a column specified with weights, it will raise a py4j error. Seems odd,
> > because supposedly the default is weightCol=None, but when I
> > specifically pass in weightCol=None to LinearRegression, I get this
> > error.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/weightCol-doesn-t-seem-to-be-handled-properly-in-PySpark-tp27677.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
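[Editor's note: a plain-Python sketch of the likely mechanism, not the actual PySpark source. PySpark constructors capture their keyword arguments, so a keyword that is explicitly passed as None is still recorded as a user-set parameter, whereas an omitted keyword is left unset; the captured None is what later trips the py4j error. The `init_params` helper below is hypothetical, purely for illustration.]

```python
# Hypothetical stand-in for a keyword-capturing constructor: an explicitly
# passed None still shows up in **kwargs, so it is treated as user-set.
def init_params(**kwargs):
    # Keep every keyword the caller actually passed, even when its value is None.
    return dict(kwargs)

# Omitting the keyword leaves it unset entirely.
assert "weightCol" not in init_params(maxIter=10)

# Passing weightCol=None records it as a set parameter with value None.
assert init_params(maxIter=10, weightCol=None)["weightCol"] is None
```

Until the fix landed, the practical workaround was simply to omit `weightCol` rather than passing `weightCol=None` explicitly.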
Re: I noticed LinearRegression sometimes produces negative R^2 values
Yes, it's on a hold-out segment from the data set being fitted.

On Wed, Sep 7, 2016 at 1:02 AM Sean Owen <so...@cloudera.com> wrote:

> Yes, it should be.
> It's also not necessarily nonnegative if you evaluate R^2 on a
> different data set than you fit it to. Is that the case?
>
> On Tue, Sep 6, 2016 at 11:15 PM, Evan Zamir <zamir.e...@gmail.com> wrote:
> > I am using the default setting for fitIntercept, which *should* be
> > TRUE, right?
> >
> > On Tue, Sep 6, 2016 at 1:38 PM Sean Owen <so...@cloudera.com> wrote:
> >> Are you not fitting an intercept / regressing through the origin? With
> >> that constraint it's no longer true that R^2 is necessarily
> >> nonnegative. It basically means that the errors are even bigger than
> >> what you'd get by predicting the data's mean value as a constant
> >> model.
> >>
> >> On Tue, Sep 6, 2016 at 8:49 PM, evanzamir <zamir.e...@gmail.com> wrote:
> >> > Am I misinterpreting what r2() in the LinearRegressionModel summary
> >> > means? By definition, R^2 should never be a negative number!
> >> >
> >> > --
> >> > View this message in context:
> >> > http://apache-spark-user-list.1001560.n3.nabble.com/I-noticed-LinearRegression-sometimes-produces-negative-R-2-values-tp27667.html
Re: I noticed LinearRegression sometimes produces negative R^2 values
I am using the default setting for *fitIntercept*, which *should* be TRUE, right?

On Tue, Sep 6, 2016 at 1:38 PM Sean Owen wrote:

> Are you not fitting an intercept / regressing through the origin? With
> that constraint it's no longer true that R^2 is necessarily
> nonnegative. It basically means that the errors are even bigger than
> what you'd get by predicting the data's mean value as a constant
> model.
>
> On Tue, Sep 6, 2016 at 8:49 PM, evanzamir wrote:
> > Am I misinterpreting what r2() in the LinearRegressionModel summary
> > means? By definition, R^2 should never be a negative number!
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/I-noticed-LinearRegression-sometimes-produces-negative-R-2-values-tp27667.html
Re: How to add custom steps to Pipeline models?
Thanks, but I should have been clearer that I'm trying to do this in PySpark, not Scala. Using an example I found on SO, I was able to implement a Pipeline step in Python, but it seems more difficult (perhaps currently impossible) to make it persist to disk (I tried implementing the _to_java method, to no avail). Any ideas about that?

On Sun, Aug 14, 2016 at 6:02 PM Jacek Laskowski wrote:

> Hi,
>
> It should just work if you followed the Transformer interface [1].
> When you have the transformers, creating a Pipeline is a matter of
> setting them as additional stages (using Pipeline.setStages [2]).
>
> [1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala
> [2] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L107
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Fri, Aug 12, 2016 at 9:19 AM, evanzamir wrote:
> > I'm building an LDA Pipeline, currently with 4 steps: Tokenizer,
> > StopWordsRemover, CountVectorizer, and LDA. I would like to add more
> > steps, for example stemming and lemmatization, and also 1-grams and
> > 2-grams together (which I believe is not supported by the default
> > NGram class). Is there a way to add these steps? In sklearn, you can
> > create classes with fit() and transform() methods, and that should be
> > enough. Is that true in Spark ML as well (or something similar)?
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-custom-steps-to-Pipeline-models-tp27522.html
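[Editor's note: the sklearn-style pattern the original post describes, a class exposing fit() and transform(), can be sketched in plain Python, independent of Spark. The `UniBigramExtractor` class below is hypothetical, invented for this illustration; it shows the combined 1-gram/2-gram output the poster wanted, which plain NGram (fixed n) does not produce on its own.]

```python
class UniBigramExtractor:
    """Hypothetical fit/transform step: emits 1-grams and 2-grams together."""

    def fit(self, docs):
        # Stateless step: nothing to learn, but fit() returns self by convention.
        return self

    def transform(self, docs):
        out = []
        for tokens in docs:
            # Bigrams come from pairing each token with its successor.
            bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
            out.append(list(tokens) + bigrams)
        return out

docs = [["spark", "ml", "pipeline"]]
result = UniBigramExtractor().fit(docs).transform(docs)
assert result == [["spark", "ml", "pipeline", "spark ml", "ml pipeline"]]
```

In PySpark the analogous approach would be to subclass pyspark.ml.Transformer and implement _transform on a DataFrame column; as the thread notes, the hard part is persistence, since saving a Python-only stage also requires the Java-backed wrapper machinery that _to_java is part of.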