Re: Fwd: Model weights of linear regression becomes abnormal values
You probably need to scale the values in the data set so that all features have comparable ranges, and shift them so that each feature has zero mean. You can use a pyspark.mllib.feature.StandardScaler(True, True) object for that (the two arguments are withMean and withStd).

On 28.5.2015. 6:08, Maheshakya Wijewardena wrote:

Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the attached dataset. The code is attached. When I check the model weights vector after training, it contains `nan` values.

[nan,nan,nan,nan,nan,nan,nan,nan]

But for some data sets, this problem does not occur. What might be the reason for this? Is this an issue with the data I'm using or a bug?

Best regards.

--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
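For readers who want to try the scaling suggestion, here is a minimal sketch of how it might look. The helper names `parse_line` and `train_on_scaled`, the `step=0.1` value, and the choice of `intercept=True` are assumptions made for illustration, not something stated in the thread. The label is taken from the first CSV column, as in the posted script.

```python
def parse_line(line):
    """Split one CSV line into (label, features); the first column is the label."""
    values = [float(x) for x in line.split(',')]
    return values[0], values[1:]


def train_on_scaled(sc, path, step=0.1, iterations=100):
    """Hypothetical helper: standardize the features, then train with SGD."""
    # pyspark is imported here so parse_line can be tried without a Spark install.
    from pyspark.mllib.feature import StandardScaler
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

    parsed = sc.textFile(path).map(parse_line)
    labels = parsed.map(lambda p: p[0])
    features = parsed.map(lambda p: Vectors.dense(p[1]))

    # withMean=True, withStd=True: center each column and scale it to unit variance.
    scaler = StandardScaler(True, True).fit(features)
    scaled = labels.zip(scaler.transform(features)).map(
        lambda pair: LabeledPoint(pair[0], pair[1]))

    # The labels are not centered, so let the model fit an intercept (an assumption).
    return LinearRegressionWithSGD.train(
        scaled, iterations=iterations, step=step, intercept=True)
```

With an existing SparkContext `sc`, calling `train_on_scaled(sc, 'dummy_data_sest.csv').weights` should show whether the weights stay finite after scaling. The step size may still need tuning.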
Re: Model weights of linear regression becomes abnormal values
Thanks for the information. I'll try that out with Spark 1.4.

On Thu, May 28, 2015 at 9:54 AM, DB Tsai dbt...@dbtsai.com wrote:

LinearRegressionWithSGD requires tuning the step size and the number of iterations very carefully. Please try the linear regression with elastic net implementation in the ML framework in Spark 1.4, which uses a quasi-Newton method, so the step size is determined automatically. That implementation also matches the results from R.

Sincerely,
DB Tsai
---
Blog: https://www.dbtsai.com

--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855
Re: Model weights of linear regression becomes abnormal values
Sorry, I meant the parameter step.

2015-05-28 12:21 GMT+08:00 Maheshakya Wijewardena mahesha...@wso2.com:

What is the parameter for the learning rate alpha? LinearRegressionWithSGD has only the following parameters.

@param data: The training data.
@param iterations: The number of iterations (default: 100).
@param step: The step parameter used in SGD (default: 1.0).
@param miniBatchFraction: Fraction of data to be used for each SGD iteration.
@param initialWeights: The initial weights (default: None).
@param regParam: The regularizer parameter (default: 1.0).
@param regType: The type of regularizer used for training our model. Allowed values: l1 for using L1Updater, l2 for using SquaredL2Updater, none for no regularizer. (default: none)
@param intercept: Boolean parameter which indicates the use or not of the augmented representation for training data (i.e. whether bias features are activated or not).

On Thu, May 28, 2015 at 9:42 AM, 吴明瑜 timbero...@gmail.com wrote:

The problem may occur when your algorithm cannot converge. Check whether the learning rate alpha is too large, and try reducing it.

--
Mingyu Wu
Institute of Parallel and Distributed Systems
School of Software Engineering
Shanghai Jiao Tong University
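To illustrate the advice about the step parameter, here is a plain-Python sketch that runs gradient descent on the first ten rows of the posted dataset. It is not MLlib's implementation: the function names are made up for this note, the whole batch is used on every iteration, and the `step / sqrt(t)` decay is an assumption about how MLlib shrinks the step. The point is only that unscaled features in the hundreds make a step of 1.0 blow up, while a much smaller step stays finite.

```python
import math

# First ten rows of the dataset posted in this thread: label first, then eight features.
ROWS = [
    [6, 148, 72, 35, 0, 336, 627, 50, 1],
    [1, 85, 66, 29, 0, 266, 351, 31, 0],
    [8, 183, 64, 0, 0, 233, 672, 32, 1],
    [1, 89, 66, 23, 94, 281, 167, 21, 0],
    [0, 137, 40, 35, 168, 431, 2288, 33, 1],
    [5, 116, 74, 0, 0, 256, 201, 30, 0],
    [3, 78, 50, 32, 88, 310, 248, 26, 1],
    [10, 115, 0, 0, 0, 353, 134, 29, 0],
    [2, 197, 70, 45, 543, 305, 158, 53, 1],
    [8, 125, 96, 0, 0, 0, 232, 54, 1],
]


def gradient_descent(rows, step, iterations=100):
    """Least-squares gradient descent with the rate shrinking as step / sqrt(t)."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    for t in range(1, iterations + 1):
        gradient = [0.0] * n_features
        for row in rows:
            label, features = row[0], row[1:]
            error = sum(w * x for w, x in zip(weights, features)) - label
            for j, x in enumerate(features):
                gradient[j] += error * x / len(rows)
        rate = step / math.sqrt(t)
        weights = [w - rate * g for w, g in zip(weights, gradient)]
    return weights


def all_finite(weights):
    """True if no weight has overflowed to inf or turned into nan."""
    return all(math.isfinite(w) for w in weights)


diverged = gradient_descent(ROWS, step=1.0)    # the documented default step
converged = gradient_descent(ROWS, step=1e-7)  # an arbitrarily chosen, much smaller step
print(all_finite(diverged), all_finite(converged))  # prints: False True
```

A step this small converges very slowly, which is one reason to scale the features first rather than only shrinking the step.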
Fwd: Model weights of linear regression becomes abnormal values
Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the attached dataset. The code is attached. When I check the model weights vector after training, it contains `nan` values.

[nan,nan,nan,nan,nan,nan,nan,nan]

But for some data sets, this problem does not occur. What might be the reason for this? Is this an issue with the data I'm using or a bug?

Best regards.

--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855

6,148,72,35,0,336,627,50,1
1,85,66,29,0,266,351,31,0
8,183,64,0,0,233,672,32,1
1,89,66,23,94,281,167,21,0
0,137,40,35,168,431,2288,33,1
5,116,74,0,0,256,201,30,0
3,78,50,32,88,310,248,26,1
10,115,0,0,0,353,134,29,0
2,197,70,45,543,305,158,53,1
8,125,96,0,0,0,232,54,1
4,110,92,0,0,376,191,30,0
10,168,74,0,0,380,537,34,1
10,139,80,0,0,271,1441,57,0
1,189,60,23,846,301,398,59,1
5,166,72,19,175,258,587,51,1
7,100,0,0,0,300,484,32,1
0,118,84,47,230,458,551,31,1
7,107,74,0,0,296,254,31,1
1,103,30,38,83,433,183,33,0
1,115,70,30,96,346,529,32,1
3,126,88,41,235,393,704,27,0

import sys
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
from numpy import array

# Load and parse data
def parse_point(line):
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[0], values[1:])

sc = SparkContext(appName='LinearRegression')

# Add path to your dataset.
data = sc.textFile('dummy_data_sest.csv')
parsedData = data.map(parse_point)

# Build the model
model = LinearRegressionWithSGD.train(parsedData)

# Check model weight vector
print(model.weights)
Re: Model weights of linear regression becomes abnormal values
LinearRegressionWithSGD requires tuning the step size and the number of iterations very carefully. Please try the linear regression with elastic net implementation in the ML framework in Spark 1.4, which uses a quasi-Newton method, so the step size is determined automatically. That implementation also matches the results from R.

Sincerely,
DB Tsai
---
Blog: https://www.dbtsai.com
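As a rough sketch of the ML-framework route suggested above: the helper name `fit_elastic_net` and its defaults are assumptions for illustration. It expects a DataFrame that already has a "label" column and a "features" vector column. How that DataFrame is built, and which vector class the column must hold, depends on the Spark version, so that part is left out here.

```python
def fit_elastic_net(train_df, reg_param=0.0, elastic_net_param=0.0, max_iter=100):
    """Hypothetical helper: fit pyspark.ml's LinearRegression on a prepared DataFrame."""
    # Imported lazily so this function can be defined without Spark installed.
    from pyspark.ml.regression import LinearRegression

    lr = LinearRegression(
        featuresCol="features",
        labelCol="label",
        maxIter=max_iter,
        regParam=reg_param,
        elasticNetParam=elastic_net_param,
    )
    # No step size is passed: per the reply above, the solver determines it itself.
    return lr.fit(train_df)
```

Depending on the Spark version, the fitted model may expose its weight vector under a different attribute name, so it is worth checking the API documentation for the release in use.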