Thanks Yanbo! That works! The only issue is that it won't print the predicted values computed from lp.features in this line:

    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

It prints the test input data correctly, but it keeps printing "0.0" as the predicted value for every lp.features.

Thanks
Tri

From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Thursday, November 27, 2014 12:22 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Hi Tri,

My previous reply about your problem may have been lost. In any case, the following code snippet runs correctly:

    val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
    model.algorithm.setIntercept(true)

All the setXXX() methods of StreamingLinearRegressionWithSGD return this.type, i.e. the instance itself, so a chain of them still yields the streaming model. setIntercept(), however, is called on model.algorithm and returns that inner LinearRegressionWithSGD. Ending the chain with .algorithm.setIntercept(true) therefore assigns the algorithm, not the streaming model, to model, which is why trainOn and predictOnValues were not found. The intercept has to be set in a separate statement whose return value is discarded.

2014-11-27 1:04 GMT+08:00 Bui, Tri <tri....@verizonwireless.com.invalid>:

Thanks Yanbo!
Modified code below:

    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingLinearRegression")
    val ssc = new StreamingContext(conf, Seconds(args(2).toLong))
    val trainingData = ssc.textFileStream(args(0)).map(LabeledPoint.parse)
    val testData = ssc.textFileStream(args(1)).map(LabeledPoint.parse)
    val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt)).setNumIterations(args(4).toInt).setStepSize(.0001).algorithm.setIntercept(true)
    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
    ssc.start()
    ssc.awaitTermination()

But I am getting a compile error:

    [error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:54: value trainOn is not a member of org.apache.spark.mllib.regression.LinearRegressionWithSGD
    [error] model.trainOn(trainingData)
    [error] ^
    [error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:55: value predictOnValues is not a member of org.apache.spark.mllib.regression.LinearRegressionWithSGD
    [error] model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
    [error] ^
    [error] two errors found
    [error] (compile:compile) Compilation failed

Thanks
Tri

From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Tuesday, November 25, 2014 8:57 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Hi Tri,

setIntercept() is not a member function of StreamingLinearRegressionWithSGD. It is a member function of LinearRegressionWithSGD (a GeneralizedLinearAlgorithm), which is a member variable (named algorithm) of StreamingLinearRegressionWithSGD.
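A minimal plain-Scala sketch of the member relationship described above, and of how the way the call is written decides what ends up in model. Algo, StreamingModel, and ChainDemo are hypothetical stand-ins, not the MLlib classes; they only mimic the shape discussed in this thread: setters that return this.type and an inner algorithm reached through an algorithm member.

```scala
// Hypothetical stand-ins that mimic the shape discussed in the thread; NOT the real MLlib classes.
class Algo {
  var addIntercept = false
  // Returns this.type, i.e. the Algo instance it was called on.
  def setIntercept(flag: Boolean): this.type = { addIntercept = flag; this }
}

class StreamingModel {
  var numIterations = 1
  // The inner algorithm is exposed as a member, like `algorithm` in the thread.
  val algorithm = new Algo
  def setNumIterations(n: Int): this.type = { numIterations = n; this }
  def trainOn(data: Seq[Double]): Unit = () // placeholder: no real training happens here
}

object ChainDemo {
  // The chain ends on the inner Algo, so the whole expression has type Algo.
  val chained: Algo = new StreamingModel().setNumIterations(100).algorithm.setIntercept(true)
  // chained.trainOn(Seq(1.0)) would not compile: trainOn is not a member of Algo.

  // Keep the reference to the streaming wrapper; configure the algorithm in its own statement.
  val model: StreamingModel = new StreamingModel().setNumIterations(100)
  model.algorithm.setIntercept(true)
  model.trainOn(Seq(1.0, 2.0))
}
```

The commented-out line reproduces, in miniature, the "is not a member of" compile error quoted in this thread.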
So you need to change your code to:

    val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
      .algorithm.setIntercept(true)

Thanks
Yanbo

2014-11-25 23:51 GMT+08:00 Bui, Tri <tri....@verizonwireless.com.invalid>:

Thanks Liang! It was my mistake: I fat-fingered one of the data points. After correcting it, the result matches yours.

I am still not able to get the intercept. I am getting:

    [error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:47: value setIntercept is not a member of org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

I tried the code below:

    val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
    model.setIntercept(addIntercept = true).trainOn(trainingData)

and:

    val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
      .setIntercept(true)

But I still get a compilation error.

Thanks
Tri

From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Tuesday, November 25, 2014 4:08 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

The case runs correctly in my environment:

    14/11/25 17:48:20 INFO regression.StreamingLinearRegressionWithSGD: Model updated at time 1416908900000 ms
    14/11/25 17:48:20 INFO regression.StreamingLinearRegressionWithSGD: Current model: weights, [0.9999999999998588]

Could you provide more details, if convenient?

The intercept can be turned on as follows:

    val model = new StreamingLinearRegressionWithSGD()
      .algorithm.setIntercept(true)

2014-11-25 3:31 GMT+08:00 Bui, Tri <tri....@verizonwireless.com.invalid>:

Hi,

I am getting an incorrect weights model from StreamingLinearRegressionWithSGD.

The input data has one feature:

(1,[1])
(2,[2])
…
(20,[20])

The result from the current model: weights is [-4.432]…, which is not correct.

Also, how do I turn on the intercept for StreamingLinearRegressionWithSGD?

Thanks
Tri
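In the one-feature data above every label equals its feature value, so the exact least-squares fit is weight 1.0 with intercept 0.0, consistent with the 0.9999999999998588 in Yanbo's log. The plain-Scala sketch below is not Spark code, and LinearSketch, predict, and fit are hypothetical names. It fits that data with full-batch gradient descent on squared loss using a fixed step size. This is a simplification: MLlib's SGD samples mini-batches and decays the step size. The sketch also shows that a linear model predicts dot(weights, features) + intercept, so a model still holding its initial zero weights predicts 0.0 for every input.

```scala
// Plain-Scala sketch, not Spark/MLlib code. All names are hypothetical.
object LinearSketch {
  // A linear model's prediction: dot(weights, features) + intercept.
  def predict(weights: Array[Double], intercept: Double, features: Array[Double]): Double =
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept

  // Full-batch gradient descent on (1 / 2n) * sum((w * x + b - y)^2) for a single feature.
  // Fixed step size is an assumption for illustration; MLlib's SGD differs (mini-batches, decaying step).
  def fit(data: Seq[(Double, Double)], stepSize: Double, iterations: Int,
          fitIntercept: Boolean): (Double, Double) = {
    var w = 0.0 // start from zero weights, like Vectors.zeros(...)
    var b = 0.0
    val n = data.size.toDouble
    for (_ <- 1 to iterations) {
      // data holds (label, feature) pairs, matching the (1,[1]) ... (20,[20]) input.
      val residuals = data.map { case (y, x) => (w * x + b - y, x) }
      val gradW = residuals.map { case (r, x) => r * x }.sum / n
      val gradB = residuals.map(_._1).sum / n
      // Both updates use gradients computed from the previous (w, b).
      w -= stepSize * gradW
      if (fitIntercept) b -= stepSize * gradB
    }
    (w, b)
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 20).map(i => (i.toDouble, i.toDouble))
    println(s"zero-weight prediction: ${predict(Array(0.0), 0.0, Array(5.0))}")
    val (w, b) = fit(data, stepSize = 0.01, iterations = 10000, fitIntercept = true)
    println(f"fitted weight = $w%.6f, intercept = $b%.6f")
  }
}
```

With these settings the fitted weight comes out very close to 1.0 and the intercept very close to 0.0. The step size has to be small relative to the feature scale (here the features reach 20); a much larger fixed step makes plain gradient descent on this data diverge.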