Thanks Yanbo!  That works!

The only issue is that it won’t print the predicted values for lp.features in
the line below:

model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

It prints the test input data correctly, but the predicted value for each
lp.features is always “0.0”.
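For reference, here is a minimal plain-Scala sketch (no Spark; LinearModelSketch and predictOnValuesSketch are hypothetical names, not MLlib classes) of what a linear prediction step like predictOnValues computes: each (key, features) pair becomes (key, weights · features + intercept). It assumes the model may still hold its zero initial weights (from Vectors.zeros) when the predictions are made, which this thread does not confirm; in that case every prediction is 0.0 regardless of the features.

```scala
// Hypothetical stand-in for a linear model (not Spark's class):
// prediction = weights · features + intercept.
final case class LinearModelSketch(weights: Array[Double], intercept: Double) {
  def predict(features: Array[Double]): Double = {
    require(features.length == weights.length, "dimension mismatch")
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  }
}

// Hypothetical analogue of predictOnValues: keep the key (here the label),
// replace the feature vector with the model's prediction.
def predictOnValuesSketch[K](model: LinearModelSketch, data: Seq[(K, Array[Double])]): Seq[(K, Double)] =
  data.map { case (key, features) => (key, model.predict(features)) }

val testData = Seq((3.0, Array(3.0)), (7.0, Array(7.0)))

// Assumption: the model has not been updated yet, so weights and intercept are still zero.
val untrained = LinearModelSketch(Array(0.0), 0.0)
// For comparison: the weight of about 1.0 that this data should produce after training.
val trained = LinearModelSketch(Array(1.0), 0.0)

val beforeTraining = predictOnValuesSketch(untrained, testData)
val afterTraining = predictOnValuesSketch(trained, testData)
println(beforeTraining) // List((3.0,0.0), (7.0,0.0))
println(afterTraining)  // List((3.0,3.0), (7.0,7.0))
```

With zero weights the features never influence the output, so a column of 0.0 predictions alongside correct labels is what this sketch would print.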

Thanks
Tri

From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Thursday, November 27, 2014 12:22 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Hi Tri,

Maybe my last response to your problem got lost. In any case, the following
code snippet runs correctly:

val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(args(3).toInt))

model.algorithm.setIntercept(true)

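This is because every setXXX() method of StreamingLinearRegressionWithSGD
returns this.type (the instance itself), so those calls can be chained. But
.algorithm.setIntercept(true) returns the inner algorithm (a
LinearRegressionWithSGD), so chaining it makes model a LinearRegressionWithSGD,
which has no trainOn or predictOnValues. Setting the intercept in a separate
statement, discarding its return value, keeps model a
StreamingLinearRegressionWithSGD.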
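A minimal plain-Scala sketch of the same typing issue, using hypothetical classes (not Spark's) that only mimic the API shape: the wrapper's setters return this.type, and it exposes an inner algorithm whose setIntercept also returns this.type. The chained expression ends up with the inner type, which matches the "value trainOn is not a member of ... LinearRegressionWithSGD" errors in this thread.

```scala
// Hypothetical inner algorithm (not Spark's class): its setter returns this.type.
class RegressionAlgorithmSketch {
  var addIntercept: Boolean = false
  def setIntercept(value: Boolean): this.type = { addIntercept = value; this }
}

// Hypothetical streaming wrapper: its setters also return this.type,
// and it exposes the inner algorithm as a public val.
class StreamingRegressionSketch {
  val algorithm = new RegressionAlgorithmSketch
  var numIterations: Int = 50
  var batchesSeen: Int = 0
  def setNumIterations(n: Int): this.type = { numIterations = n; this }
  // Stand-in for a streaming-only method such as trainOn; it just counts batches.
  def trainOn(batch: Seq[Double]): Unit = batchesSeen += 1
}

// Chained: the last call returns the inner algorithm, so the whole expression
// has type RegressionAlgorithmSketch; chained.trainOn(...) would not compile.
val chained: RegressionAlgorithmSketch =
  new StreamingRegressionSketch().setNumIterations(10).algorithm.setIntercept(true)

// Separate statement: model keeps the wrapper type, and the setter's
// return value is simply discarded.
val model = new StreamingRegressionSketch().setNumIterations(10)
model.algorithm.setIntercept(true)
model.trainOn(Seq(1.0, 2.0))

println(s"chained.addIntercept = ${chained.addIntercept}")
println(s"model.algorithm.addIntercept = ${model.algorithm.addIntercept}, batches = ${model.batchesSeen}")
```

Both variants set the flag on the inner algorithm; the difference is only which object the val ends up referring to.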

2014-11-27 1:04 GMT+08:00 Bui, Tri <tri....@verizonwireless.com.invalid>:
Thanks Yanbo!

Modified code below:

    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingLinearRegression")
    val ssc = new StreamingContext(conf, Seconds(args(2).toLong))
    val trainingData = ssc.textFileStream(args(0)).map(LabeledPoint.parse)
    val testData = ssc.textFileStream(args(1)).map(LabeledPoint.parse)
    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(args(3).toInt))
      .setNumIterations(args(4).toInt)
      .setStepSize(.0001)
      .algorithm.setIntercept(true)
    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
    ssc.start()
    ssc.awaitTermination()

But I am getting compile errors:
[error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:54: value trainOn is not a member of org.apache.spark.mllib.regression.LinearRegressionWithSGD
[error]     model.trainOn(trainingData)
[error]           ^
[error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:55: value predictOnValues is not a member of org.apache.spark.mllib.regression.LinearRegressionWithSGD
[error]     model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
[error]           ^
[error] two errors found
[error] (compile:compile) Compilation failed

Thanks
Tri

From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Tuesday, November 25, 2014 8:57 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Hi Tri,

setIntercept() is not a member function of StreamingLinearRegressionWithSGD;
it is a member function of LinearRegressionWithSGD (a GeneralizedLinearAlgorithm),
which is a member variable (named algorithm) of StreamingLinearRegressionWithSGD.

So you need to change your code to:
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(args(3).toInt))
  .algorithm.setIntercept(true)

Thanks
Yanbo


2014-11-25 23:51 GMT+08:00 Bui, Tri <tri....@verizonwireless.com.invalid>:
Thanks Liang!

It was my bad: I fat-fingered one of the data points. After correcting it, the
result matches yours.

I am still not able to get the intercept. I am getting:
[error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:47: value setIntercept is not a member of org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

I tried the code below:
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(args(3).toInt))
model.setIntercept(addIntercept = true).trainOn(trainingData)

and:

val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(args(3).toInt))
  .setIntercept(true)

But I still get a compilation error.

Thanks
Tri




From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Tuesday, November 25, 2014 4:08 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

The case runs correctly in my environment:

14/11/25 17:48:20 INFO regression.StreamingLinearRegressionWithSGD: Model updated at time 1416908900000 ms
14/11/25 17:48:20 INFO regression.StreamingLinearRegressionWithSGD: Current model: weights, [0.9999999999998588]

Could you provide more detailed information, if convenient?

The intercept can be turned on as follows:
val model = new StreamingLinearRegressionWithSGD()
      .algorithm.setIntercept(true)

2014-11-25 3:31 GMT+08:00 Bui, Tri <tri....@verizonwireless.com.invalid>:
Hi,

I am getting an incorrect weights model from StreamingLinearRegressionWithSGD.

The input data (one feature) is:

(1,[1])
(2,[2])
…
(20,[20])

The resulting "Current model: weights" value is [-4.432]…, which is not correct.
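For comparison, here is a minimal plain-Scala sketch (no Spark) of fitting y ≈ w·x to this data with simple full-batch gradient descent. It is not MLlib's SGD implementation, and fitWeight is a hypothetical helper. On the clean data 1..20 the least-squares weight is exactly 1.0, in line with the weight of about 1.0 in Yanbo's log in this thread. The thread traced the wrong value to a mistyped data point; the second call is only a general illustration that a step size that is too large also produces a wrong weight.

```scala
// Training data from the question: (label, feature) pairs with label = feature for 1..20.
val data: Seq[(Double, Double)] = (1 to 20).map(i => (i.toDouble, i.toDouble))

// Hypothetical helper: full-batch gradient descent for y ≈ w * x (no intercept),
// minimizing (1/2n) * sum((w*x - y)^2). Plain Scala, not MLlib's SGD.
def fitWeight(points: Seq[(Double, Double)], stepSize: Double, iterations: Int): Double = {
  val n = points.size.toDouble
  var w = 0.0 // zero initial weight, like Vectors.zeros(1)
  for (_ <- 1 to iterations) {
    // Gradient with respect to w: (1/n) * sum((w*x - y) * x)
    val grad = points.map { case (y, x) => (w * x - y) * x }.sum / n
    w -= stepSize * grad
  }
  w
}

// Closed-form least-squares weight without intercept: sum(x*y) / sum(x*x).
val closedForm = data.map { case (y, x) => x * y }.sum / data.map { case (_, x) => x * x }.sum

val converged = fitWeight(data, stepSize = 0.005, iterations = 100)
val diverged = fitWeight(data, stepSize = 0.02, iterations = 20)

println(f"closed form: $closedForm%.6f")
println(f"gradient descent, step 0.005: $converged%.6f")
println(f"gradient descent, step 0.02:  $diverged%.1f")
```

For this data the mean of x² is 143.5, so plain gradient descent only converges when the step size is below 2 / 143.5 ≈ 0.0139; at 0.02 each iteration multiplies the error by about -1.87 and the weight runs away from 1.0.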

Also, how do I turn on the intercept for StreamingLinearRegressionWithSGD?

Thanks
Tri


