Hi Naftali,

Yes, you're right. For now, please add a column of ones. We are working on adding a weighted regularization term and on exposing the Scala intercept option in the Python binding.
Best,
Reza

On Mon, Jun 16, 2014 at 12:19 PM, Naftali Harris <naft...@affirm.com> wrote:

> Hi everyone,
>
> The Python LogisticRegressionWithSGD does not appear to estimate an
> intercept. When I run the following, the returned weights and intercept
> are both 0.0:
>
> from pyspark import SparkContext
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.classification import LogisticRegressionWithSGD
>
> def main():
>     sc = SparkContext(appName="NoIntercept")
>
>     train = sc.parallelize([LabeledPoint(0, [0]), LabeledPoint(1, [0]),
>                             LabeledPoint(1, [0])])
>
>     model = LogisticRegressionWithSGD.train(train, iterations=500,
>                                             step=0.1)
>     print "Final weights: " + str(model.weights)
>     print "Final intercept: " + str(model.intercept)
>
> if __name__ == "__main__":
>     main()
>
>
> Of course, one can fit an intercept with the simple expedient of adding a
> column of ones, but that's kind of annoying. Moreover, it looks like the
> scala version has an intercept option.
>
> Am I missing something? Should I just add the column of ones? If I
> submitted a PR doing that, is that the sort of thing you guys would accept?
>
> Thanks! :-)
>
> Naftali
>
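
For reference, the column-of-ones workaround discussed above could be sketched roughly as follows. This is only an illustration, not the MLlib implementation: the helper names `with_bias_column` and `split_weights` are hypothetical, and the sketch assumes the constant column is appended last, so the final fitted weight plays the role of the intercept. The Spark calls are left in comments so the snippet runs without a cluster.

```python
def with_bias_column(features):
    """Return a copy of `features` with a constant 1.0 appended.

    Hypothetical helper: the weight fitted for this extra column
    then acts as the intercept.
    """
    return list(features) + [1.0]


def split_weights(weights):
    """Split fitted weights into (feature_weights, intercept).

    Assumes the constant column was appended last, as above.
    """
    weights = list(weights)
    return weights[:-1], weights[-1]


# The three training points from the report, as (label, features) pairs.
raw = [(0, [0]), (1, [0]), (1, [0])]
augmented = [(label, with_bias_column(x)) for label, x in raw]
print(augmented)

# With PySpark available, the augmented rows would then be used as in
# the original script, for example:
#   train = sc.parallelize([LabeledPoint(label, x) for label, x in augmented])
#   model = LogisticRegressionWithSGD.train(train, iterations=500, step=0.1)
#   feature_weights, intercept = split_weights(model.weights)
```

Appending the constant last keeps the original feature indices unchanged, so the remaining weights line up with the original columns.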