If you do want to stick with SVM --
This is a question that I keep coming back to myself, and unfortunately I
have forgotten more (and lost more literature) than I've retained.
I believe that the most easily parallelizable sections of libSVM for small
datasets are (for C-SVC(R), with RBF and polynomial kernels):
1. The kernel projections
2. The hyper-parameter grid search for C, \gamma (I believe this is now
included in libSVM -- I haven't looked at it in a while)
3. For multi-class SVC: the concurrent computation of each SVM for each
one-against-one class vote.
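For (1), a minimal sketch of the idea: each row of the kernel (Gram) matrix depends only on one training point against the rest, so rows can be computed concurrently. This is an illustrative sketch, not libSVM code; the names `rbf_row` and `gram_matrix` are made up for the example.

```python
# Sketch of item (1): computing rows of the RBF Gram matrix in parallel.
# Illustrative only -- not the libSVM API; function names are invented.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def rbf_row(X, i, gamma):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    d = X - X[i]                                   # broadcast over rows
    return np.exp(-gamma * np.einsum('ij,ij->i', d, d))

def gram_matrix(X, gamma, workers=4):
    # Rows are independent, so they can be computed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        rows = list(ex.map(lambda i: rbf_row(X, i, gamma), range(len(X))))
    return np.vstack(rows)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = gram_matrix(X, gamma=0.5)
```

Threads work here because NumPy releases the GIL inside the vectorized kernels; a distributed version would scatter row ranges to nodes instead.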
I'm unfamiliar with any easily parallelizable method for the QP itself.
Unfortunately, for (2) and (3) this involves broadcasting the entire
dataset out to each node of a cluster (or working in a shared-memory
environment), so it may not be practical depending on the size of your
data set. I've only ever implemented (2), for relatively small datasets,
using MPI and with a pure Java socket implementation.
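The structure of (2) can be sketched in a few lines: every (C, \gamma) cell of the grid is an independent training run over the same (broadcast) dataset, so the grid scatters trivially across workers. The `evaluate` function below is a hypothetical stand-in for a real cross-validated libSVM training run; its objective is a placeholder chosen so the example is deterministic.

```python
# Sketch of item (2): parallel grid search over (C, gamma).
# evaluate() is a stand-in for k-fold CV with svm_train; the objective
# here is a toy function peaking at C=1.0, gamma=0.5.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(C, gamma, data):
    # Placeholder score; a real version would return CV accuracy.
    return -(abs(C - 1.0) + abs(gamma - 0.5)), (C, gamma)

def grid_search(Cs, gammas, data, workers=4):
    cells = list(product(Cs, gammas))
    # Each cell trains independently on the full (broadcast) dataset.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        scores = list(ex.map(lambda cg: evaluate(cg[0], cg[1], data), cells))
    return max(scores)[1]          # (C, gamma) with the best score

best = grid_search([0.1, 1.0, 10.0], [0.25, 0.5, 1.0], data=None)
# best -> (1.0, 0.5) for this placeholder objective
```

An MPI or socket version replaces the executor with a scatter of grid cells and a gather of scores; the broadcast cost of the dataset is the limiting factor, as noted above.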
Other approaches (further from simple libSVM), which are more applicable
to large datasets (I'm less familiar with these):
4. Divide and conquer the QP/SMO problem itself and solve (as I've said,
I'm unfamiliar with this and I don't know of any standard approach).
5. Break the training set into subsets and solve each.
For (5) there are several approaches; two that I know of are ensemble
approaches, and those that accumulate support vectors from each partition
and heuristically keep/reject them until the model converges. I've also
read some research on implementing this in a MapReduce style [2].
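The support-vector-accumulation flavor of (5) (a cascade-style heuristic) can be sketched as follows. The `train` function is a hypothetical stand-in for `svm_train`: a real call would return the support vectors of an SVM fit on its input; here it just keeps the extreme points of each partition so the loop is deterministic and terminates.

```python
# Sketch of approach (5): partition the data, train per partition, pool
# the resulting support vectors, and retrain on the pool until the kept
# set stops changing. train() is a placeholder, NOT a real SVM.
def train(points):
    # Stand-in for svm_train: min/max play the role of "points near
    # the decision boundary" that survive as support vectors.
    return {min(points), max(points)}

def cascade(data, parts=4):
    chunks = [data[i::parts] for i in range(parts)]
    # Level 1: each chunk is independent, so this loop parallelizes.
    pool = set().union(*(train(c) for c in chunks))
    while True:
        kept = train(sorted(pool))     # retrain on the pooled SVs
        if kept == pool:
            return kept                # converged: kept set is stable
        pool = kept

svs = cascade(list(range(100)))
```

The MapReduce formulation in [2] has the same shape: map = per-partition training, reduce = pooling support vectors and retraining.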
I came across this paper [1] last night which you may find interesting as
well: it is a comparison of some SVM parallelization strategies. In
particular, it discusses (1) for a shared-memory environment and for
offloading work to GPUs (using OpenMP and CUDA). It also cites several
other nice papers discussing SVM parallelization strategies, especially
for (5), and then goes on to discuss a more purely linear-algebra
approach to optimizing SVMs (sec. 5).
Also regarding (5) you may be interested in [2] (something I’ve only
looked over briefly).
[1] http://arxiv.org/pdf/1404.1066v1.pdf
[2] http://arxiv.org/pdf/1301.0082.pdf
From: ted.dunn...@gmail.com
Date: Tue, 21 Oct 2014 17:32:22 -0700
Subject: Re: Any idea which approaches to non-linear svm are easily
parallelizable?
To: dev@mahout.apache.org
Last I heard, the best methods pre-project and do linear SVM.
Beyond that, I would guess that deep learning techniques would subsume
non-linear SVM pretty easily. The best parallel implementation I know
for that is in H2O.
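A minimal sketch of the "pre-project and do linear SVM" idea, assuming random Fourier features (Rahimi & Recht) as the projection -- one common choice, not necessarily the one meant above. After the projection, any parallel *linear* solver can be trained on Z. The function name `rff_project` and all dimensions are illustrative.

```python
# Sketch of pre-projection: approximate an RBF kernel with random
# Fourier features, so z(x).z(y) ~ exp(-gamma * ||x - y||^2), then
# train a linear SVM on the projected data Z.
import numpy as np

def rff_project(X, gamma, dim, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # For k(d) = exp(-gamma ||d||^2), sample w ~ N(0, 2*gamma * I).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=dim)
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)

X = np.array([[0.0, 0.0], [0.1, 0.0]])
Z = rff_project(X, gamma=0.5, dim=2000)
approx = Z[0] @ Z[1]                 # approximate kernel value
exact = np.exp(-0.5 * 0.01)          # exact RBF kernel value
```

The approximation error shrinks as O(1/sqrt(dim)), and the projection is embarrassingly parallel across rows of X, which is what makes the subsequent linear SVM the only hard part to distribute.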
On Tue, Oct 21, 2014 at 4:12 PM, Dmitriy Lyubimov <dlie...@gmail.com>
wrote:
in particular, from libSVM --
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ?
thanks.
-d