Re: Fwd: Model weights of linear regression becomes abnormal values

2015-05-29 Thread Petar Zecevic


You probably need to scale the features in the data set so that they 
all have comparable ranges, and shift them so that each has a mean of 0.


You can use a pyspark.mllib.feature.StandardScaler(True, True) object for 
that (the two arguments are withMean and withStd).
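
As a rough illustration of what that scaling does, here is a minimal sketch in plain Python (no Spark needed). The helper name `standardize` is hypothetical, and using the sample standard deviation is an assumption about what StandardScaler computes. The rows are the feature columns of the first four lines of the attached data set.

```python
from statistics import mean, stdev

def standardize(rows):
    """Shift every column to mean 0 and scale it to unit sample standard
    deviation. Constant columns are mapped to 0.0 to avoid dividing by zero."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Feature columns (everything after the first value) of the first four rows.
features = [
    [148, 72, 35, 0, 336, 627, 50, 1],
    [85, 66, 29, 0, 266, 351, 31, 0],
    [183, 64, 0, 0, 233, 672, 32, 1],
    [89, 66, 23, 94, 281, 167, 21, 0],
]
scaled = standardize(features)
```

After this step every column has comparable magnitude, so a single SGD step size is no longer far too large for some features and far too small for others.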


On 28.5.2015. 6:08, Maheshakya Wijewardena wrote:


Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with 
the attached dataset. The code is attached. When I check the model 
weights vector after training, it contains `nan` values.

[nan,nan,nan,nan,nan,nan,nan,nan]
But for some data sets, this problem does not occur. What might be the reason 
for this?
Is this an issue with the data I'm using or a bug?
Best regards.
--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org




Re: Model weights of linear regression becomes abnormal values

2015-05-27 Thread Maheshakya Wijewardena
Thanks for the information. I'll try that out with Spark 1.4.

On Thu, May 28, 2015 at 9:54 AM, DB Tsai dbt...@dbtsai.com wrote:

 LinearRegressionWithSGD requires careful tuning of the step size and
 the number of iterations. Please try the linear regression with elastic
 net implementation in the ML framework in Spark 1.4, which uses a
 quasi-Newton method and determines the step size automatically. That
 implementation also matches the results from R.

 Sincerely,

 DB Tsai
 ---
 Blog: https://www.dbtsai.com


 On Wed, May 27, 2015 at 9:08 PM, Maheshakya Wijewardena
 mahesha...@wso2.com wrote:
 
  Hi,
 
  I'm trying to use Spark's LinearRegressionWithSGD in PySpark with the
  attached dataset. The code is attached. When I check the model weights
  vector after training, it contains `nan` values.
 
  [nan,nan,nan,nan,nan,nan,nan,nan]
 
  But for some data sets, this problem does not occur. What might be the
  reason for this?
  Is this an issue with the data I'm using or a bug?
 
  Best regards.
 
  --
  Pruthuvi Maheshakya Wijewardena
  Software Engineer
  WSO2 Lanka (Pvt) Ltd
  Email: mahesha...@wso2.com
  Mobile: +94711228855
 
 
 
 




-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855


Re: Model weights of linear regression becomes abnormal values

2015-05-27 Thread 吴明瑜
Sorry, I meant the `step` parameter.
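
To see how an oversized step can turn weights into `nan`, here is a toy sketch in plain Python. It is not MLlib's implementation: `fit_slope` is a hypothetical helper that runs batch gradient descent on a one-feature least-squares problem, with made-up data whose feature values are in the hundreds, roughly like the attached data set.

```python
import math

def fit_slope(xs, ys, step, iterations=100):
    """Batch gradient descent for y ~ w * x, minimising 0.5 * mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of 0.5 * mean((w*x - y)^2) with respect to w.
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= step * grad
    return w

# Made-up data with y = 2 * x and large feature values.
xs = [100.0, 200.0]
ys = [200.0, 400.0]

diverged = fit_slope(xs, ys, step=1.0)    # step far too large for these features
converged = fit_slope(xs, ys, step=1e-5)  # small enough to shrink the error each pass
print(diverged, converged)
```

With step 1.0 each update overshoots by a factor of tens of thousands, the weight overflows to infinity, and the next update computes infinity minus infinity, which is `nan`. With the smaller step the same loop settles near the true slope of 2.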

2015-05-28 12:21 GMT+08:00 Maheshakya Wijewardena mahesha...@wso2.com:

 What is the parameter for the learning rate alpha? LinearRegressionWithSGD
 has only the following parameters.


 @param data:              The training data.
 @param iterations:        The number of iterations (default: 100).
 @param step:              The step parameter used in SGD (default: 1.0).
 @param miniBatchFraction: Fraction of data to be used for each SGD
                           iteration.
 @param initialWeights:    The initial weights (default: None).
 @param regParam:          The regularizer parameter (default: 1.0).
 @param regType:           The type of regularizer used for training
                           our model.
                           Allowed values: l1 for using L1Updater,
                                           l2 for using SquaredL2Updater,
                                           none for no regularizer.
                           (default: none)
 @param intercept:         Boolean parameter which indicates the use
                           or not of the augmented representation for
                           training data (i.e. whether bias features
                           are activated or not).


 On Thu, May 28, 2015 at 9:42 AM, 吴明瑜 timbero...@gmail.com wrote:

 The problem may occur when your algorithm cannot converge. Maybe you can
 check if the learning rate alpha is too large. Try reducing it.

 2015-05-28 12:08 GMT+08:00 Maheshakya Wijewardena mahesha...@wso2.com:


 Hi,

 I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the
 attached dataset. The code is attached. When I check the model weights
 vector after training, it contains `nan` values.

 [nan,nan,nan,nan,nan,nan,nan,nan]

 But for some data sets, this problem does not occur. What might be the 
 reason for this?
 Is this an issue with the data I'm using or a bug?

 Best regards.

 --
 Pruthuvi Maheshakya Wijewardena
 Software Engineer
 WSO2 Lanka (Pvt) Ltd
 Email: mahesha...@wso2.com
 Mobile: +94711228855








 --
 Mingyu Wu

 Institute of Parallel and Distributed Systems

 School of Software Engineering

 Shanghai Jiao Tong University




 --
 Pruthuvi Maheshakya Wijewardena
 Software Engineer
 WSO2 Lanka (Pvt) Ltd
 Email: mahesha...@wso2.com
 Mobile: +94711228855





-- 
Mingyu Wu

Institute of Parallel and Distributed Systems

School of Software Engineering

Shanghai Jiao Tong University


Fwd: Model weights of linear regression becomes abnormal values

2015-05-27 Thread Maheshakya Wijewardena
Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the
attached dataset. The code is attached. When I check the model weights
vector after training, it contains `nan` values.

[nan,nan,nan,nan,nan,nan,nan,nan]

But for some data sets, this problem does not occur. What might be the
reason for this?
Is this an issue with the data I'm using or a bug?

Best regards.

-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855
6,148,72,35,0,336,627,50,1
1,85,66,29,0,266,351,31,0
8,183,64,0,0,233,672,32,1
1,89,66,23,94,281,167,21,0
0,137,40,35,168,431,2288,33,1
5,116,74,0,0,256,201,30,0
3,78,50,32,88,310,248,26,1
10,115,0,0,0,353,134,29,0
2,197,70,45,543,305,158,53,1
8,125,96,0,0,0,232,54,1
4,110,92,0,0,376,191,30,0
10,168,74,0,0,380,537,34,1
10,139,80,0,0,271,1441,57,0
1,189,60,23,846,301,398,59,1
5,166,72,19,175,258,587,51,1
7,100,0,0,0,300,484,32,1
0,118,84,47,230,458,551,31,1
7,107,74,0,0,296,254,31,1
1,103,30,38,83,433,183,33,0
1,115,70,30,96,346,529,32,1
3,126,88,41,235,393,704,27,0
import sys
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
from numpy import array

# Load and parse data: the first value on each line is the label,
# the remaining values are the features.
def parse_point(line):
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[0], values[1:])

sc = SparkContext(appName='LinearRegression')
# Add path to your dataset.
data = sc.textFile('dummy_data_sest.csv')
parsedData = data.map(parse_point)

# Build the model
model = LinearRegressionWithSGD.train(parsedData)

# Check model weight vector
print(model.weights)

Re: Model weights of linear regression becomes abnormal values

2015-05-27 Thread DB Tsai
LinearRegressionWithSGD requires careful tuning of the step size and
the number of iterations. Please try the linear regression with elastic
net implementation in the ML framework in Spark 1.4, which uses a
quasi-Newton method and determines the step size automatically. That
implementation also matches the results from R.
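
As a hedged illustration of why a curvature-aware method needs no hand-tuned step size, here is a toy sketch in plain Python. It uses an exact Newton step on a one-feature least-squares problem, which is a simplification: a quasi-Newton method only approximates the curvature, and this is not the Spark implementation. The helper name `newton_step` and the data are made up for the example.

```python
def newton_step(xs, ys, w=0.0):
    """One Newton update for y ~ w * x under 0.5 * mean squared error."""
    n = len(xs)
    # First derivative of the loss with respect to w.
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
    # Second derivative (curvature); it depends only on the feature scale.
    curvature = sum(x * x for x in xs) / n
    # Dividing by the curvature replaces the hand-picked step size.
    return w - grad / curvature

# Made-up data with y = 2 * x and large feature values.
xs = [100.0, 200.0]
ys = [200.0, 400.0]

w = newton_step(xs, ys)
print(w)
```

Because the update is divided by the curvature, the scale of the features cancels out, and for this quadratic loss a single step lands on the least-squares slope regardless of how large the inputs are.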

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com


On Wed, May 27, 2015 at 9:08 PM, Maheshakya Wijewardena
mahesha...@wso2.com wrote:

 Hi,

 I'm trying to use Spark's LinearRegressionWithSGD in PySpark with the
 attached dataset. The code is attached. When I check the model weights
 vector after training, it contains `nan` values.

 [nan,nan,nan,nan,nan,nan,nan,nan]

 But for some data sets, this problem does not occur. What might be the
 reason for this?
 Is this an issue with the data I'm using or a bug?

 Best regards.

 --
 Pruthuvi Maheshakya Wijewardena
 Software Engineer
 WSO2 Lanka (Pvt) Ltd
 Email: mahesha...@wso2.com
 Mobile: +94711228855



