I'm looking to reuse the LogisticRegression model (with SGD) to predict a
real-valued outcome variable. (I understand that logistic regression is
generally applied to predict a binary outcome, but for various reasons this
model suits our needs better than LinearRegression.) Related to that, I have
the following questions:

1) Can the current LogisticRegression model be used as is to train based on
binary input (i.e. explanatory) features, or is there an assumption that
the explanatory features must be continuous?

2) I intend to reuse the current class to train a model on LabeledPoints
where the label is a real value (and not 0 / 1). I'd like to know if
invoking setValidateData(false) would suffice or if one must override the
validator to achieve this.

3) I recall seeing an experimental method on the class (
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala)
that clears the threshold separating positive and negative predictions. Once
the model is trained on real-valued labels, would clearing this threshold
suffice to predict an outcome that is continuous in nature?
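
To make questions 2 and 3 concrete, here is a minimal sketch in plain Python of the behaviour I am hoping for. It is not MLlib code: train_sgd and predict are hypothetical helpers written only for this illustration, and it assumes the real-valued labels lie in [0, 1] and that the features are binary with a constant bias term. Whether the MLlib validator and threshold handling actually allow this is exactly what I am asking.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(points, num_iterations=500, step_size=0.5, seed=0):
    """Fit logistic-regression weights by per-example SGD on the log loss.

    points is a list of (label, features) pairs. Assumption for this sketch:
    labels are real values in [0, 1] rather than strictly 0 or 1.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(points[0][1])
    for _ in range(num_iterations):
        # Visit the examples in a fresh random order on every pass.
        for label, features in rng.sample(points, len(points)):
            margin = sum(w * x for w, x in zip(weights, features))
            error = sigmoid(margin) - label  # derivative of log loss w.r.t. margin
            weights = [w - step_size * error * x
                       for w, x in zip(weights, features)]
    return weights


def predict(weights, features, threshold=0.5):
    """Return a 0/1 class, or the raw score when threshold is None."""
    score = sigmoid(sum(w * x for w, x in zip(weights, features)))
    if threshold is None:
        return score  # "cleared" threshold: continuous output in (0, 1)
    return 1.0 if score > threshold else 0.0


# Binary explanatory features (first entry is a constant bias term),
# paired with real-valued labels.
data = [(0.2, [1.0, 0.0]), (0.8, [1.0, 1.0])]
weights = train_sgd(data)

print(predict(weights, [1.0, 1.0]))                  # thresholded class
print(predict(weights, [1.0, 1.0], threshold=None))  # continuous score
```

In this sketch the log-loss gradient is well defined for fractional labels, so nothing in the training loop itself depends on the labels being 0 or 1; only the final thresholding step turns the score into a class.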

Thanks,
Bharath

P.S: I'm writing to dev@ and not user@ assuming that lib changes might be
necessary. Apologies if the mailing list is incorrect.
