The data is in LIBSVM format, so this line won't work:
values = [float(s) for s in line.split(' ')]
Please use the loadLibSVMFile utility in MLUtils to load it as an RDD of LabeledPoint.
http://spark.apache.org/docs/latest/mllib-data-types.html#labeled-point
from pyspark.mllib.util import MLUtils
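To illustrate the point above, here is a minimal sketch of why the plain float() parsing fails on LIBSVM-style input and what the format actually contains. The sample line is made up for illustration, and parse_libsvm_line, naive_parse and load_labeled_points are hypothetical helper names, not part of Spark. The last helper assumes an existing SparkContext and simply wraps MLUtils.loadLibSVMFile, which is what you would normally call instead of parsing by hand.

```python
def naive_parse(line):
    # Same approach as the failing line: assumes every token is a plain float.
    return [float(s) for s in line.split(' ')]


def parse_libsvm_line(line):
    # LIBSVM layout: "<label> <index>:<value> <index>:<value> ..."
    # Indices in the file are 1-based; convert them to 0-based here.
    tokens = line.strip().split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index) - 1] = float(value)
    return label, features


def load_labeled_points(sc, path):
    # Assumes `sc` is an existing SparkContext and `path` points to a
    # LIBSVM-format file. Imported lazily so the helpers above run
    # without Spark installed.
    from pyspark.mllib.util import MLUtils
    return MLUtils.loadLibSVMFile(sc, path)  # RDD of LabeledPoint


if __name__ == "__main__":
    sample = "1.5 1:0.25 3:-0.5"  # made-up line, not taken from the real file
    try:
        naive_parse(sample)
    except ValueError as err:
        print("naive parse fails:", err)
    print(parse_libsvm_line(sample))
```

The hand-written parser is only there to show the structure of a line. In a real job, load_labeled_points(sc, path) (or the MLUtils call directly) replaces the per-line float() parsing.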
Can you please suggest sample data for running logistic_regression.py?
I am trying to use the sample data file at
https://github.com/apache/spark/blob/master/data/mllib/sample_linear_regression_data.txt
I am running this on the CDH 5.2 QuickStart VM.
[cloudera@quickstart mllib]$ spark-submit