Re: LogisticRegression: Predicting continuous outcomes

2014-05-29 Thread Bharath Ravi Kumar
Xiangrui, Christopher,

Thanks for responding. I'll go through the code in detail to evaluate
whether the loss function used is suitable for our dataset. I'll also go
through the referenced paper, since I was unaware of the underlying theory.
Thanks again.

-Bharath


On Thu, May 29, 2014 at 8:16 AM, Christopher Nguyen wrote:

> Bharath, (apologies if you're already familiar with the theory): the
> proposed approach may or may not be appropriate depending on the overall
> transfer function in your data. In general, a single logistic regressor
> cannot approximate arbitrary non-linear functions (of linear combinations
> of the inputs). You can review works by, e.g., Hornik and Cybenko from the
> late 1980s to see if you need something more, such as a simple neural
> network with one hidden layer.
>
> This is a good summary:
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf
>
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao 
> linkedin.com/in/ctnguyen
>
>
>
> On Wed, May 28, 2014 at 11:18 AM, Bharath Ravi Kumar wrote:
>
> > I'm looking to reuse the LogisticRegression model (with SGD) to predict a
> > real-valued outcome variable. (I understand that logistic regression is
> > generally applied to predict binary outcome, but for various reasons,
> > this model suits our needs better than LinearRegression). Related to
> > that I have the following questions:
> >
> > 1) Can the current LogisticRegression model be used as is to train based
> > on binary input (i.e. explanatory) features, or is there an assumption
> > that the explanatory features must be continuous?
> >
> > 2) I intend to reuse the current class to train a model on LabeledPoints
> > where the label is a real value (and not 0 / 1). I'd like to know if
> > invoking setValidateData(false) would suffice or if one must override the
> > validator to achieve this.
> >
> > 3) I recall seeing an experimental method on the class (
> > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
> > )
> > that clears the threshold separating positive & negative predictions.
> > Once the model is trained on real valued labels, would clearing this flag
> > suffice to predict an outcome that is continuous in nature?
> >
> > Thanks,
> > Bharath
> >
> > P.S: I'm writing to dev@ and not user@ assuming that lib changes might
> > be necessary. Apologies if the mailing list is incorrect.
> >
>


Re: LogisticRegression: Predicting continuous outcomes

2014-05-28 Thread Christopher Nguyen
Bharath, (apologies if you're already familiar with the theory): the
proposed approach may or may not be appropriate depending on the overall
transfer function in your data. In general, a single logistic regressor
cannot approximate arbitrary non-linear functions (of linear combinations
of the inputs). You can review works by, e.g., Hornik and Cybenko from the
late 1980s to see if you need something more, such as a simple neural
network with one hidden layer.

This is a good summary:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf
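
To make the limitation concrete, here is a minimal sketch in plain Python (not
MLlib code; every name in it is hypothetical). It searches a coarse grid of
weights for a single logistic unit over two binary features. The grid and the
AND/XOR targets are assumptions chosen only for illustration: AND is linearly
separable and is fitted easily, while XOR is not fitted by any setting.

```python
import math
from itertools import product


def sigmoid(z):
    """Logistic function mapping a linear score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fits(target, w1, w2, b):
    """True if one logistic unit scores every positive example above 0.5
    and every negative example below 0.5."""
    for (x1, x2), label in target.items():
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        if label == 1 and not p > 0.5:
            return False
        if label == 0 and not p < 0.5:
            return False
    return True


# Truth tables over two binary features (illustrative targets).
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Coarse grid of candidate (w1, w2, b) values from -5.0 to 5.0 in steps of 0.5.
grid = [i / 2.0 for i in range(-10, 11)]
candidates = list(product(grid, repeat=3))

and_fits = [c for c in candidates if fits(AND, *c)]
xor_fits = [c for c in candidates if fits(XOR, *c)]

print("settings that fit AND:", len(and_fits))
print("settings that fit XOR:", len(xor_fits))  # 0
```

The empty XOR result is not an artifact of the grid. The linear scores always
satisfy z(0,1) + z(1,0) = z(0,0) + z(1,1) + 0, up to the shared intercept, so
the two positives cannot both be above zero while the two negatives are both
below it. A hidden layer removes that constraint.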

--
Christopher T. Nguyen
Co-founder & CEO, Adatao 
linkedin.com/in/ctnguyen



On Wed, May 28, 2014 at 11:18 AM, Bharath Ravi Kumar wrote:

> I'm looking to reuse the LogisticRegression model (with SGD) to predict a
> real-valued outcome variable. (I understand that logistic regression is
> generally applied to predict binary outcome, but for various reasons, this
> model suits our needs better than LinearRegression). Related to that I have
> the following questions:
>
> 1) Can the current LogisticRegression model be used as is to train based on
> binary input (i.e. explanatory) features, or is there an assumption that
> the explanatory features must be continuous?
>
> 2) I intend to reuse the current class to train a model on LabeledPoints
> where the label is a real value (and not 0 / 1). I'd like to know if
> invoking setValidateData(false) would suffice or if one must override the
> validator to achieve this.
>
> 3) I recall seeing an experimental method on the class (
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
> )
> that clears the threshold separating positive & negative predictions. Once
> the model is trained on real valued labels, would clearing this flag
> suffice to predict an outcome that is continuous in nature?
>
> Thanks,
> Bharath
>
> P.S: I'm writing to dev@ and not user@ assuming that lib changes might be
> necessary. Apologies if the mailing list is incorrect.
>


Re: LogisticRegression: Predicting continuous outcomes

2014-05-28 Thread Xiangrui Meng
Please find my comments inline. -Xiangrui

On Wed, May 28, 2014 at 11:18 AM, Bharath Ravi Kumar wrote:
> I'm looking to reuse the LogisticRegression model (with SGD) to predict a
> real-valued outcome variable. (I understand that logistic regression is
> generally applied to predict binary outcome, but for various reasons, this
> model suits our needs better than LinearRegression). Related to that I have
> the following questions:
>
> 1) Can the current LogisticRegression model be used as is to train based on
> binary input (i.e. explanatory) features, or is there an assumption that
> the explanatory features must be continuous?
>

Binary features should be okay.

> 2) I intend to reuse the current class to train a model on LabeledPoints
> where the label is a real value (and not 0 / 1). I'd like to know if
> invoking setValidateData(false) would suffice or if one must override the
> validator to achieve this.
>

I'm not sure whether the loss function makes sense with real-valued
labels. The implementation may assume that the label is binary to
simplify the computation of the loss. You can take a look at the code and
see whether the loss function fits your model.
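
As a rough illustration of why this matters, the sketch below (plain Python,
not the MLlib implementation; the function names are hypothetical) compares
the general cross-entropy form of the logistic loss with a shortcut that only
checks whether the label is positive. Whether the library takes such a
shortcut is an assumption here and is exactly what needs checking in the code.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def cross_entropy(z, y):
    """General logistic loss for linear score z and label y in [0, 1].

    Equals -y*log(p) - (1 - y)*log(1 - p) with p = sigmoid(z),
    which simplifies to softplus(z) - y*z.
    """
    return softplus(z) - y * z


def binary_shortcut(z, y):
    """Hypothetical simplification that treats any positive label as 1."""
    return softplus(z) - z if y > 0 else softplus(z)


z = 2.0
for y in (0.0, 1.0, 0.3):
    print(y, cross_entropy(z, y), binary_shortcut(z, y))
```

The two forms agree for labels 0 and 1 but diverge for a fractional label such
as 0.3. Note also that the general form is only a sensible objective for
labels in [0, 1]: for a label above 1 it decreases without bound as z grows,
so real-valued targets would have to be rescaled into that range first.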

> 3) I recall seeing an experimental method on the class (
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala)
> that clears the threshold separating positive & negative predictions. Once
> the model is trained on real valued labels, would clearing this flag
> suffice to predict an outcome that is continuous in nature?
>

If you clear the threshold, predict returns the raw scores from the
logistic function (values between 0 and 1) instead of 0/1 labels.
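
The behaviour described above can be sketched in plain Python as follows. This
is not the MLlib class: raw_score and predict are hypothetical names, and the
0.5 default threshold is an assumption made for the example.

```python
import math


def raw_score(weights, intercept, features):
    """Logistic function applied to the linear score; always in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, intercept, features, threshold=0.5):
    """Return a 0/1 label, or the raw score when threshold is None
    (the analogue of clearing the threshold)."""
    score = raw_score(weights, intercept, features)
    if threshold is None:
        return score
    return 1.0 if score > threshold else 0.0


weights, intercept = [0.8, -1.2], 0.1   # made-up model parameters
point = [1.0, 0.0]                      # binary input features

print(predict(weights, intercept, point))                  # thresholded label
print(predict(weights, intercept, point, threshold=None))  # raw score
```

Even with the threshold cleared, the output stays strictly between 0 and 1, so
a continuous outcome outside that range cannot be predicted directly.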

> Thanks,
> Bharath
>
> P.S: I'm writing to dev@ and not user@ assuming that lib changes might be
> necessary. Apologies if the mailing list is incorrect.


LogisticRegression: Predicting continuous outcomes

2014-05-28 Thread Bharath Ravi Kumar
I'm looking to reuse the LogisticRegression model (with SGD) to predict a
real-valued outcome variable. (I understand that logistic regression is
generally applied to predict binary outcome, but for various reasons, this
model suits our needs better than LinearRegression). Related to that I have
the following questions:

1) Can the current LogisticRegression model be used as is to train based on
binary input (i.e. explanatory) features, or is there an assumption that
the explanatory features must be continuous?

2) I intend to reuse the current class to train a model on LabeledPoints
where the label is a real value (and not 0 / 1). I'd like to know if
invoking setValidateData(false) would suffice or if one must override the
validator to achieve this.

3) I recall seeing an experimental method on the class (
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala)
that clears the threshold separating positive & negative predictions. Once
the model is trained on real valued labels, would clearing this flag
suffice to predict an outcome that is continuous in nature?

Thanks,
Bharath

P.S: I'm writing to dev@ and not user@ assuming that lib changes might be
necessary. Apologies if the mailing list is incorrect.