Hi, I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the attached dataset. The code is attached below. When I check the model's weights vector after training, it contains `nan` values:
[nan,nan,nan,nan,nan,nan,nan,nan]

For some datasets this problem does not occur. What might be the reason? Is this an issue with the data I'm using, or a bug?

Best regards.

--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855
6,148,72,35,0,336,627,50,1
1,85,66,29,0,266,351,31,0
8,183,64,0,0,233,672,32,1
1,89,66,23,94,281,167,21,0
0,137,40,35,168,431,2288,33,1
5,116,74,0,0,256,201,30,0
3,78,50,32,88,310,248,26,1
10,115,0,0,0,353,134,29,0
2,197,70,45,543,305,158,53,1
8,125,96,0,0,0,232,54,1
4,110,92,0,0,376,191,30,0
10,168,74,0,0,380,537,34,1
10,139,80,0,0,271,1441,57,0
1,189,60,23,846,301,398,59,1
5,166,72,19,175,258,587,51,1
7,100,0,0,0,300,484,32,1
0,118,84,47,230,458,551,31,1
7,107,74,0,0,296,254,31,1
1,103,30,38,83,433,183,33,0
1,115,70,30,96,346,529,32,1
3,126,88,41,235,393,704,27,0
import sys
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
from numpy import array

# Load and parse data
def parse_point(line):
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[0], values[1:])

sc = SparkContext(appName='LinearRegression')

# Add path to your dataset.
data = sc.textFile('dummy_data_sest.csv')
parsedData = data.map(parse_point)

# Build the model
model = LinearRegressionWithSGD.train(parsedData)

# Check model weight vector
print(model.weights)
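For what it's worth, I can reproduce the same `nan` behaviour outside of Spark with a plain-NumPy sketch of fixed-step SGD. This is an assumption about the cause, not MLlib's actual code path: with feature values in the hundreds and a step size of 1.0 (the default for `LinearRegressionWithSGD.train`), the per-sample gradient updates grow without bound, the weights overflow to `inf`, and subsequent arithmetic turns them into `nan`. Standardizing the features (or passing a smaller `step` to `train`) keeps the updates stable.

```python
import numpy as np

# Synthetic data with magnitudes similar to the attached dataset
# (hypothetical stand-in, not the actual CSV).
rng = np.random.default_rng(0)
X = rng.uniform(0, 500, size=(100, 8))
y = X @ rng.uniform(-1, 1, size=8)

def sgd(X, y, step=1.0, iters=100):
    """Plain per-sample SGD for least-squares linear regression."""
    w = np.zeros(X.shape[1])
    for i in range(iters):
        j = i % len(X)
        err = X[j] @ w - y[j]
        w -= step * err * X[j]  # update blows up when X is unscaled and step is large
    return w

# Unscaled features, step 1.0: weights overflow and become nan.
with np.errstate(over='ignore', invalid='ignore'):
    w_raw = sgd(X, y, step=1.0)

# Standardized features and a smaller step: weights stay finite.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
w_scaled = sgd(X_scaled, y, step=0.01)

print(np.isnan(w_raw).any())     # nan weights, as in my Spark run
print(np.isnan(w_scaled).any())  # finite weights
```

If this is indeed the cause, then in PySpark the equivalent workaround would presumably be scaling the features before training (e.g. with `pyspark.mllib.feature.StandardScaler`) or lowering the `step` argument of `LinearRegressionWithSGD.train` — but I'd appreciate confirmation from someone who knows the implementation.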