You should never use the training data to measure your prediction accuracy.
Always use a fresh dataset (test data) for this purpose.
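
For example, a minimal sketch of holding out a test set with randomSplit
(reusing the "data" RDD and model setup from your code below; the 70/30
ratio and the seed are arbitrary):

// Split the data: 70% for training, 30% held out for evaluation.
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.7, 0.3}, 11L);
JavaRDD<LabeledPoint> training = splits[0].cache();
JavaRDD<LabeledPoint> test = splits[1];

// Train on the training split only...
final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);
// ...then compute accuracy by mapping over "test" instead of "data".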

On Sun, Nov 29, 2015 at 8:36 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> I think this represents the label of the LabeledPoint (0 means negative, 1
> means positive)
> http://spark.apache.org/docs/latest/mllib-data-types.html#labeled-point
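>
> For instance, a minimal sketch of the two labels (the feature values here
> are made up):
>
> import org.apache.spark.mllib.linalg.Vectors;
> import org.apache.spark.mllib.regression.LabeledPoint;
>
> // For binary classification the label must be 0.0 (negative) or 1.0 (positive).
> LabeledPoint positive = new LabeledPoint(1.0, Vectors.dense(1.0, 0.5, 3.0));
> LabeledPoint negative = new LabeledPoint(0.0, Vectors.dense(2.0, 1.0, 1.5));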
>
> The document you mention describes the mathematical formulation, not the
> implementation.
>
> On Sun, Nov 29, 2015 at 9:13 AM, Tarek Elgamal <tarek.elga...@gmail.com>
> wrote:
>
>> According to the documentation
>> <http://spark.apache.org/docs/latest/mllib-linear-methods.html>, by
>> default, if w^T x >= 0 then the outcome is positive, and negative otherwise.
>> I suppose that w^T x is the "score" in my case. If the score is >= 0 and the
>> label is positive, then I return 1, which is a correct classification, and I
>> return zero otherwise. Do you have any idea how to classify a point as
>> positive or negative using this score or another function?
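>>
>> For reference, what I am doing now is essentially the following (score is
>> the raw margin returned by predict() after clearThreshold(), which I assume
>> is w^T x):
>>
>> double score = model.predict(p.features());
>> int predicted = (score >= 0) ? 1 : 0; // positive iff w^T x >= 0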
>>
>> On Sat, Nov 28, 2015 at 5:14 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>>         if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>           return 1; // correct classification
>>>         } else {
>>>           return 0;
>>>         }
>>>
>>>
>>>
>>> I suspect the score is always between 0 and 1.
>>>
>>>
>>>
>>> On Sat, Nov 28, 2015 at 10:39 AM, Tarek Elgamal <tarek.elga...@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to run the straightforward SVM example, but I am getting
>>>> low accuracy (around 50%) when I predict using the same data I used for
>>>> training. I am probably doing the prediction the wrong way. My code is
>>>> below. I would appreciate any help.
>>>>
>>>>
>>>> import org.apache.spark.SparkConf;
>>>> import org.apache.spark.SparkContext;
>>>> import org.apache.spark.api.java.JavaRDD;
>>>> import org.apache.spark.api.java.function.Function;
>>>> import org.apache.spark.api.java.function.Function2;
>>>> import org.apache.spark.mllib.classification.SVMModel;
>>>> import org.apache.spark.mllib.classification.SVMWithSGD;
>>>> import org.apache.spark.mllib.regression.LabeledPoint;
>>>> import org.apache.spark.mllib.util.MLUtils;
>>>>
>>>> public class SimpleDistSVM {
>>>>   public static void main(String[] args) {
>>>>     SparkConf conf = new SparkConf().setAppName("SVM Classifier Example");
>>>>     SparkContext sc = new SparkContext(conf);
>>>>     String inputPath = args[0];
>>>>
>>>>     // Read training data
>>>>     JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, inputPath).toJavaRDD();
>>>>
>>>>     // Run training algorithm to build the model.
>>>>     int numIterations = 3;
>>>>     final SVMModel model = SVMWithSGD.train(data.rdd(), numIterations);
>>>>
>>>>     // Clear the default threshold so predict() returns the raw margin
>>>>     // (w^T x) rather than a thresholded 0/1 label.
>>>>     model.clearThreshold();
>>>>
>>>>     // Predict each point and map to an RDD of 0/1 values, where 0 is a
>>>>     // misclassification and 1 is a correct classification. Note that this
>>>>     // predicts on the same data the model was trained on.
>>>>     JavaRDD<Integer> classification = data.map(new Function<LabeledPoint, Integer>() {
>>>>       public Integer call(LabeledPoint p) {
>>>>         int label = (int) p.label();
>>>>         double score = model.predict(p.features());
>>>>         if ((score >= 0 && label == 1) || (score < 0 && label == 0)) {
>>>>           return 1; // correct classification
>>>>         } else {
>>>>           return 0; // misclassification
>>>>         }
>>>>       }
>>>>     });
>>>>     // Sum all values in the RDD to get the number of correctly
>>>>     // classified examples.
>>>>     int sum = classification.reduce(new Function2<Integer, Integer, Integer>() {
>>>>       public Integer call(Integer arg0, Integer arg1) throws Exception {
>>>>         return arg0 + arg1;
>>>>       }
>>>>     });
>>>>
>>>>     // Compute accuracy as the fraction of correctly classified examples.
>>>>     double accuracy = ((double) sum) / ((double) classification.count());
>>>>     System.out.println("Accuracy = " + accuracy);
>>>>   }
>>>> }
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Thanks & Regards,

Fazlan Nazeem

*Software Engineer*

*WSO2 Inc*
Mobile: +94772338839
fazl...@wso2.com
