[ 
https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhao zhendong updated MAHOUT-232:
---------------------------------

    Affects Version/s: 0.1
               Status: Patch Available  (was: Open)

Sequential SVM based on Pegasos.
-------------------------------------------------------------------------------------------
Currently, this package provides (Features):
-------------------------------------------------------------------------------------------

1. Sequential linear SVM solver, including training and testing.

2. It currently supports only the general file system; HDFS support is
planned for the near future.

3. Support for large-scale data sets (requires setting the argument
"trainSampleNum").
   Because Pegasos only needs to draw a limited number of samples, this
   package can pre-fetch a fixed number of samples (e.g. the maximum
   iteration count) into memory.
   For example, if the data set has 100,000,000 samples and the default
   maximum iteration count is 10,000, this package randomly loads only
   10,000 samples into memory.
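
One way to load a fixed-size random subset from a file too large for memory
is reservoir sampling. The sketch below is only an illustration of that idea;
the class ReservoirSampler and its method are hypothetical names, and the
patch itself may select its samples differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of the patch): uniform sample of k items from a stream. */
final class ReservoirSampler {
  private ReservoirSampler() {}

  /** Reads the stream once and keeps at most k items, each with equal probability. */
  static <T> List<T> sample(Iterator<T> items, int k, Random rng) {
    List<T> reservoir = new ArrayList<>(k);
    long seen = 0;
    while (items.hasNext()) {
      T item = items.next();
      seen++;
      if (reservoir.size() < k) {
        // Fill the reservoir with the first k items.
        reservoir.add(item);
      } else {
        // Keep the new item with probability k / seen, replacing a random slot.
        long j = (long) (rng.nextDouble() * seen);
        if (j < k) {
          reservoir.set((int) j, item);
        }
      }
    }
    return reservoir;
  }
}
```

With k set to the maximum iteration count (10,000 in the example above), only
k training lines are held in memory no matter how large the input file is.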

-------------------------------------------------------------------------------------------
TODO:
-------------------------------------------------------------------------------------------
1. Support HDFS.

2. Because the code uses mahout.math.SparseMatrix and
mahout.math.SparseVectorUnsafe, I must specify the cardinality of the matrix
when creating it. This makes it hard to read data sets in SVM-light or
libsvm format, which are very popular in the machine learning community,
because such data sets store neither the number of samples nor the number
of dimensions.
   Currently I use a naive method: read the data into a map<> first,
   then dump it into a SparseMatrix.
   Does anyone know a smarter method, or another matrix type that supports
   this operation?
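
To make the cardinality problem concrete, here is a minimal sketch of reading
libsvm-style lines ("label index:value index:value ...") while tracking the
row count and the largest feature index, so both are known before any matrix
is allocated. LibSvmReader, Row and Dataset are hypothetical names, the
sketch does not call any Mahout class, and it does not handle SVM-light
"qid:" tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader (not part of the patch) for libsvm-style text lines. */
final class LibSvmReader {
  private LibSvmReader() {}

  /** One sparse sample: label plus parallel arrays of feature indices and values. */
  static final class Row {
    final double label;
    final int[] indices;
    final double[] values;

    Row(double label, int[] indices, double[] values) {
      this.label = label;
      this.indices = indices;
      this.values = values;
    }
  }

  /** Parsed rows plus the largest feature index seen (indices assumed 1-based). */
  static final class Dataset {
    final List<Row> rows;
    final int numFeatures;

    Dataset(List<Row> rows, int numFeatures) {
      this.rows = rows;
      this.numFeatures = numFeatures;
    }
  }

  static Dataset parse(Iterable<String> lines) {
    List<Row> rows = new ArrayList<>();
    int maxIndex = 0;
    for (String raw : lines) {
      // Strip a trailing "# comment" and skip blank lines.
      int hash = raw.indexOf('#');
      String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
      if (line.isEmpty()) {
        continue;
      }
      String[] tokens = line.split("\\s+");
      double label = Double.parseDouble(tokens[0]);
      int[] indices = new int[tokens.length - 1];
      double[] values = new double[tokens.length - 1];
      for (int t = 1; t < tokens.length; t++) {
        int colon = tokens[t].indexOf(':');
        indices[t - 1] = Integer.parseInt(tokens[t].substring(0, colon));
        values[t - 1] = Double.parseDouble(tokens[t].substring(colon + 1));
        maxIndex = Math.max(maxIndex, indices[t - 1]);
      }
      rows.add(new Row(label, indices, values));
    }
    return new Dataset(rows, maxIndex);
  }
}
```

After parsing, rows.size() and numFeatures give the two cardinalities, and
the compact rows can be copied into whatever sparse matrix type is chosen.
This still buffers the data once, but in primitive arrays rather than a map.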

-------------------------------------------------------------------------------------------
Usage:
-------------------------------------------------------------------------------------------
Training:
SVMPegasosTraining.java
The arguments are hard-coded in this file. To supply your own arguments,
uncomment the first line of the main function.
The default arguments are:
-tr ../examples/src/test/resources/svmdataset/train.dat -m ../examples/src/test/resources/svmdataset/SVM.model


Testing:
SVMPegasosTesting.java
The arguments are hard-coded in this file. To supply your own arguments,
uncomment the first line of the main function.
The default arguments are:
-te ../examples/src/test/resources/svmdataset/test.dat -m ../examples/src/test/resources/svmdataset/SVM.model
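
For readers unfamiliar with what the training and testing steps compute, the
following is a minimal sketch of the basic Pegasos update (one random sample
per iteration, step size 1 / (lambda * t)) on dense arrays. It is not the
code in the patch: PegasosSketch and its methods are hypothetical names, the
optional projection step is omitted, and the patch uses Mahout sparse vectors
rather than double[].

```java
import java.util.Random;

/** Hypothetical illustration of the Pegasos update; not the code in the patch. */
final class PegasosSketch {
  private PegasosSketch() {}

  /** Trains a linear SVM weight vector; labels in y must be +1 or -1. */
  static double[] train(double[][] x, int[] y, double lambda, int maxIter, Random rng) {
    int dim = x[0].length;
    double[] w = new double[dim];
    for (int t = 1; t <= maxIter; t++) {
      int i = rng.nextInt(x.length);        // pick one training sample at random
      double eta = 1.0 / (lambda * t);      // decreasing step size
      double margin = y[i] * dot(w, x[i]);  // computed before w is modified
      double shrink = 1.0 - 1.0 / t;        // equals 1 - eta * lambda
      for (int d = 0; d < dim; d++) {
        w[d] *= shrink;
        if (margin < 1.0) {
          // Hinge loss is active: move w toward classifying sample i correctly.
          w[d] += eta * y[i] * x[i][d];
        }
      }
    }
    return w;
  }

  /** Testing step: the sign of the inner product gives the predicted label. */
  static int predict(double[] w, double[] x) {
    return dot(w, x) >= 0.0 ? 1 : -1;
  }

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      sum += a[d] * b[d];
    }
    return sum;
  }
}
```

In this sketch the trained model is just the weight vector w, and testing
applies predict to each sample and compares the result with its label.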

> Implementation of sequential SVM solver based on Pegasos
> --------------------------------------------------------
>
>                 Key: MAHOUT-232
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-232
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.1
>            Reporter: zhao zhendong
>
> After discussing this with people in the community, I decided to
> re-implement a sequential SVM solver based on Pegasos for the Mahout
> platform (Mahout command-line style, SparseMatrix, SparseVector, etc.).
> Eventually, it will support HDFS.
> The plan for Sequential Pegasos:
> 1. Support the general file system (almost finished);
> 2. Support HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
