On Tue, Mar 20, 2012 at 5:58 PM, Olivier Grisel <[email protected]> wrote:
> On 20 Mar 2012 at 22:07, James Bergstra <[email protected]> wrote:
>> So recently I wrote this code:
>> https://github.com/jaberg/asgd/blob/early_stopping/asgd/linsvm.py
>>
>> My intent with this class was to provide a sklearn-like interface for
>> training linear SVMs, with automatic selection logic to handle the
>> various problem dimensions that call for different algorithms:
>> * if you have more features than examples, you should use a
>> gram-matrix algorithm,
>
> Are you sure? Even for 100k sparse features for 20k text documents?
> That would not fit in memory if you use a dense Gram matrix, and I
> have never seen any linear models fitted for high-dim sparse data that
> used precomputed Grams.
Good point, feature sparsity is another important consideration.

>> * if you don't, then you should use an sgd-type algorithm
>> * if you have more than two classes, you should use a larank-type
>> algorithm (I think?), but ...
>
> @mblondel is planning to work on a LaSVM. I wonder if LaRank shares
> some design (I have not re-read the paper recently).

I might be misusing terminology; I meant to refer to the multi-class
margin defined by the difference between the score of the correct label
and that of the best among the incorrect labels.

> Contributing Polyak averaging as implemented in @npinto's asgd to the
> sklearn SGD Cython code, plus early stopping and a robust heuristic
> for switching from the pure SGD to the ASGD model, would indeed be a
> great contrib to the project :)

Definitely. I think this has been done already, but I'm not sure where
the code is, or whether it's finished. I'll try to get back to the list
about that.

> Automated model switching implemented as a meta-estimator that routes
> the data to the right algorithm, on the other hand, should be
> motivated by extensive testing on a large number of realistic datasets
> IMHO. Furthermore, the numerous hyperparameters of the underlying
> models may not work well with scikit-learn's flat-is-better-than-nested
> philosophy...

This is all true, but how many algorithm-specific hyperparameters are
there for a linear SVM? There's the cache size and the trade-off point
between ASGD and SGD... these shouldn't affect the solution, so you
wouldn't choose them by cross-validation anyway. There are constants
related to the stopping criterion which I think might actually be
common between the different implementations.

I agree that flat is better than nested, but... convenient is better
than annoying too! I think this might be an instance where sklearn can
take care of details that no one should have to think about. If the
logic of picking a solver turns out to be overly complex, though, I'd
be surprised, and I'd say forget it.
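For concreteness, the Polyak averaging + switching heuristic discussed above can be sketched in a few lines. This is a toy least-squares version (standing in for the hinge loss a linear SVM would use), and all names, constants, and the burn-in rule are illustrative only -- this is not the sklearn or asgd API:

```python
import numpy as np

def asgd_fit(X, y, n_epochs=5, lr0=0.1, burn_in=1):
    """Sketch of SGD with Polyak-Ruppert (iterate) averaging on a
    least-squares objective.  Averaging starts after `burn_in` epochs,
    a crude stand-in for a robust SGD-to-ASGD switching heuristic."""
    rng = np.random.RandomState(0)
    n, d = X.shape
    w = np.zeros(d)        # current SGD iterate
    w_avg = np.zeros(d)    # running mean of post-burn-in iterates
    t = 0                  # total step counter (drives the step size)
    n_avg = 0              # number of iterates averaged so far
    for epoch in range(n_epochs):
        for i in rng.permutation(n):
            t += 1
            lr = lr0 / (1.0 + 0.01 * t)         # decaying step size
            grad = (X[i] @ w - y[i]) * X[i]     # squared-loss gradient
            w = w - lr * grad
            if epoch >= burn_in:
                n_avg += 1
                w_avg += (w - w_avg) / n_avg    # incremental mean
    return w_avg if n_avg else w
```

The point of the averaging is that the mean of the post-burn-in iterates is much less sensitive to the step-size schedule than the final SGD iterate itself.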
Anyway, I'll keep using the code I linked for now, and maybe once it
has been hardened some I'll send a PR.

- James

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
