Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the
attached dataset (code also attached). When I check the model's weight
vector after training, it contains only `nan` values:

[nan,nan,nan,nan,nan,nan,nan,nan]

For some other datasets, though, this problem does not occur. What might
be causing it — is it an issue with the data I'm using, or a bug?

Best regards.

-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855
6,148,72,35,0,336,627,50,1
1,85,66,29,0,266,351,31,0
8,183,64,0,0,233,672,32,1
1,89,66,23,94,281,167,21,0
0,137,40,35,168,431,2288,33,1
5,116,74,0,0,256,201,30,0
3,78,50,32,88,310,248,26,1
10,115,0,0,0,353,134,29,0
2,197,70,45,543,305,158,53,1
8,125,96,0,0,0,232,54,1
4,110,92,0,0,376,191,30,0
10,168,74,0,0,380,537,34,1
10,139,80,0,0,271,1441,57,0
1,189,60,23,846,301,398,59,1
5,166,72,19,175,258,587,51,1
7,100,0,0,0,300,484,32,1
0,118,84,47,230,458,551,31,1
7,107,74,0,0,296,254,31,1
1,103,30,38,83,433,183,33,0
1,115,70,30,96,346,529,32,1
3,126,88,41,235,393,704,27,0
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

# Load and parse data
def parse_point(line):
    """Parse a CSV line into a LabeledPoint (first field = label)."""
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[0], values[1:])

sc = SparkContext(appName='LinearRegression')
# Add path to your dataset.
data = sc.textFile('dummy_data_sest.csv')
parsedData = data.map(parse_point)

# Build the model
model = LinearRegressionWithSGD.train(parsedData)

# Check model weight vector
print(model.weights)
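(For context, my current hypothesis: the features in this dataset are on very different scales, and with a large enough step size gradient descent can diverge until the weights overflow to `nan`. Below is a plain-NumPy sketch — not Spark, and using made-up data — of batch gradient descent on least squares showing that behaviour. The function `sgd_weights` and the step values are mine, purely for illustration.)

```python
import numpy as np

# Toy illustration (plain NumPy, not Spark): gradient descent on a
# least-squares problem diverges to nan when the step size is too large
# for the feature scale -- one plausible cause of all-nan weights.
def sgd_weights(X, y, step, iters=100):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w = w - step * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * 100.0     # large-scale features, like raw counts
y = X @ np.array([0.1, -0.2, 0.3])       # noiseless linear target

w_big = sgd_weights(X, y, step=1.0)      # too-large step: weights blow up to nan
w_small = sgd_weights(X, y, step=1e-5)   # small step: converges toward the true weights
print(np.isnan(w_big).all())
print(w_small)
```

If that is what is happening here, rescaling the features (or passing a smaller step to `train`) should make the `nan`s go away — but I'd appreciate confirmation.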