Bharath (apologies if you're already familiar with the theory): whether the proposed approach is appropriate depends on the overall transfer function in your data. A single logistic regressor computes a fixed sigmoid of one linear combination of the inputs, so in general it cannot approximate arbitrary non-linear functions. You can review the work of, e.g., Hornik and Cybenko from the late '80s to see whether you need something more, such as a simple neural network with one hidden layer.
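As a rough illustration of the point above, here is a minimal Python sketch. It is a toy example and not MLlib code. The XOR target, the weight grid, and the hand-picked hidden-layer weights are all assumptions chosen for illustration. It compares the best a single logistic unit can do on XOR with what a two-unit hidden layer can do.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# XOR truth table: a standard example of a target that is not linearly separable.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def single_unit(x, w1, w2, b):
    """One logistic regressor: sigmoid of a single linear combination of the inputs."""
    return sigmoid(w1 * x[0] + w2 * x[1] + b)


def hidden_layer_net(x):
    """One hidden layer with two units; weights are hand-picked, not trained."""
    s = x[0] + x[1]
    h_or = sigmoid(20 * s - 10)     # close to 1 when at least one input is 1
    h_nand = sigmoid(-20 * s + 30)  # close to 1 unless both inputs are 1
    return sigmoid(20 * h_or + 20 * h_nand - 30)  # roughly AND of the hidden units


def count_correct(predict):
    """Number of XOR rows matched when the output is thresholded at 0.5."""
    return sum(int(predict(x) > 0.5) == y for x, y in XOR)


# Coarse grid over (w1, w2, b) in [-5, 5] with step 0.5 (illustrative only).
grid = [i / 2 for i in range(-10, 11)]
best_single = max(
    count_correct(lambda x, p=p: single_unit(x, *p))
    for p in itertools.product(grid, repeat=3)
)
net_correct = count_correct(hidden_layer_net)

print("best single logistic unit:", best_single, "of 4")
print("one-hidden-layer network: ", net_correct, "of 4")
```

No weight setting lets the single unit match all four rows, because its output is monotone in one linear combination of the inputs, while the small hidden-layer network matches all four. Whether that gap matters for your data is the question the universal approximation literature addresses.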
This is a good summary:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen

On Wed, May 28, 2014 at 11:18 AM, Bharath Ravi Kumar <[email protected]> wrote:

> I'm looking to reuse the LogisticRegression model (with SGD) to predict a
> real-valued outcome variable. (I understand that logistic regression is
> generally applied to predict binary outcome, but for various reasons, this
> model suits our needs better than LinearRegression). Related to that I have
> the following questions:
>
> 1) Can the current LogisticRegression model be used as is to train based on
> binary input (i.e. explanatory) features, or is there an assumption that
> the explanatory features must be continuous?
>
> 2) I intend to reuse the current class to train a model on LabeledPoints
> where the label is a real value (and not 0 / 1). I'd like to know if
> invoking setValidateData(false) would suffice or if one must override the
> validator to achieve this.
>
> 3) I recall seeing an experimental method on the class (
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
> )
> that clears the threshold separating positive & negative predictions. Once
> the model is trained on real valued labels, would clearing this flag
> suffice to predict an outcome that is continuous in nature?
>
> Thanks,
> Bharath
>
> P.S: I'm writing to dev@ and not user@ assuming that lib changes might be
> necessary. Apologies if the mailing list is incorrect.
>
