Re: support vector machine does not classify properly?

2016-02-14 Thread prem09




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/support-vector-machine-does-not-classify-properly-tp26216p26223.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



support vector machine does not classify properly?

2016-02-12 Thread prem09
Hi,
I created a dataset of 100 points, ranging from X=1.0 to X=100.0. I let
the y variable be 0.0 if X < 51.0 and 1.0 otherwise. I then fit an
SVMWithSGD model. When I predict the y values for the same values of X as in
the sample, I get back 1.0 for every predicted y!
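
For reference, a minimal sketch of how a file in this layout could be
generated. The tab-separated "label, then feature" layout is inferred from
the parsing code in this message; the helper names make_rows and
write_dataset are hypothetical and not part of the original script.

```python
def make_rows(n=100, cutoff=51.0):
    """Return n tab-separated 'label<TAB>feature' lines for X = 1.0 .. n."""
    rows = []
    for i in range(1, n + 1):
        x = float(i)
        # Label is 0.0 below the cutoff and 1.0 at or above it.
        y = 0.0 if x < cutoff else 1.0
        rows.append("%s\t%s" % (y, x))
    return rows


def write_dataset(path, rows):
    """Write one row per line to the given path (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write("\n".join(rows) + "\n")


rows = make_rows()
# Example use: write_dataset("classifier.txt", rows)
```

Each line then carries the label first and the single feature second, which
is the order LabeledPoint(values[0], values[1:]) reads them in.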

Incidentally, I don't get perfect separation when I replace SVMWithSGD with
LogisticRegressionWithSGD or NaiveBayes.

Here's the code:


import sys
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD, LogisticRegressionModel
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.regression import LabeledPoint
import numpy as np

# Create the Spark context.
sc=SparkContext(appName="Prem")

# Load and parse the data
def parsePoint(line):
    # Each line is tab-separated: the label first, then the feature value(s).
    values = [float(x) for x in line.split('\t')]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("c:/python27/classifier.txt")
parsedData = data.map(parsePoint)
print parsedData

# Build the model
model = SVMWithSGD.train(parsedData, iterations=100)
model.setThreshold(0.5)
print model

### Build the model
##model = LogisticRegressionWithSGD.train(parsedData, iterations=100, intercept=True)
##print model

### Build the model
##model = NaiveBayes.train(parsedData)
##print model

# Predict the label for each X value from 1.0 to 100.0.
for i in range(100):
    print i+1, model.predict(np.array([float(i+1)]))


Incidentally, the weight I observe in MLlib is 0.8949991, while if I run
the same data through the scikit-learn support vector machine, I get
0.05417109. Is this indicative of the problem?
Can you please let me know what I am doing wrong?
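
To make the symptom concrete, here is a rough check of what a linear decision
rule would produce with the weight reported above. This is only a sketch
under assumptions: it assumes the prediction is 1.0 when weight * x +
intercept exceeds the threshold, and that the intercept is 0.0 because the
train call above does not pass intercept=True. The function name
linear_predict and the intercept value in the second half are hypothetical.

```python
def linear_predict(x, weight, intercept=0.0, threshold=0.5):
    """Assumed decision rule: 1.0 if weight * x + intercept > threshold."""
    margin = weight * x + intercept
    return 1.0 if margin > threshold else 0.0


weight = 0.8949991  # weight reported for the MLlib model
threshold = 0.5     # value passed to model.setThreshold in the script

# With no intercept, every X from 1.0 to 100.0 gives a margin above 0.5.
no_intercept = [linear_predict(float(i), weight) for i in range(1, 101)]

# Hypothetical intercept chosen so the margin equals the threshold at X = 50.5.
intercept = threshold - weight * 50.5
with_intercept = [
    linear_predict(float(i), weight, intercept, threshold)
    for i in range(1, 101)
]
```

Under these assumptions the first list is 1.0 everywhere, which matches what
I see, while the second list switches from 0.0 to 1.0 between X=50 and X=51.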

Thanks,
Prem



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/support-vector-machine-does-not-classify-properly-tp26216.html