2013/1/28 Eustache DIEMERT <[email protected]>:
> I would be much interested.
>
> There are a few initiatives out there to interface VW with python, e.g.
>
> https://github.com/shilad/PyVowpal
>
> I haven't tested this one myself but I have the feeling that it's a proof of
> concept rather than a long term solution.
>
> One important point to keep in mind is that the benefit of using VW is super
> fast learning of gigantic datasets in streaming (from disk or network) mode.
>
> I guess 2 issues might need to get addressed :
>
> 1/ stream api : do we have a mature API to learn from a stream ?

Vowpal Wabbit is super efficient because it manages disk access itself,
using a dedicated thread that performs hashing-based vectorization
directly from its own input file format (basically an extension of
the svmlight file format).
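
To make the hashing step concrete, here is a minimal sketch of
hashing-based vectorization over a simplified "label | name:value ..."
line. It is not VW's actual parser: the function name hash_vectorize
and the n_bits parameter are made up for illustration, zlib.crc32
stands in for the hash function VW really uses, and namespaces,
importance weights and tags are ignored.

```python
import zlib


def hash_vectorize(line, n_bits=18):
    """Map one simplified 'label | name[:value] ...' line to (label, sparse dict).

    Hypothetical helper for illustration only; crc32 is a stand-in hash.
    """
    label_part, _, feature_part = line.partition("|")
    label = float(label_part.strip())
    n_buckets = 1 << n_bits  # size of the hashed feature space
    features = {}
    for token in feature_part.split():
        name, _, value = token.partition(":")
        # The feature name is never stored: only its hashed index is kept.
        index = zlib.crc32(name.encode("utf-8")) % n_buckets
        # A missing value means 1.0; colliding features are summed.
        features[index] = features.get(index, 0.0) + (float(value) if value else 1.0)
    return label, features


label, features = hash_vectorize("1 | price:0.5 sunny")
```

Since no vocabulary has to be built or kept in memory, each line can be
vectorized on its own as it is read, which is what makes a single
streaming pass over the file possible.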

The numpy ecosystem, on the other hand, is very much based on the
assumption that your data is already loaded in memory as numpy arrays
or other data structures built on top of them, such as scipy.sparse
matrices or pandas DataFrames.

Using Vowpal Wabbit from numpy defeats many of the benefits of using
VW in the first place. So a lightweight utility wrapper, as done by
PyVowpal, is probably the right approach.
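
As a rough sketch of what such a lightweight wrapper could look like
(this is not PyVowpal's API; to_vw_line and stream_examples are
hypothetical names), the Python side only formats examples as text
lines and writes them to a file-like object, one at a time, without
ever building a numpy array. It assumes feature names contain no
spaces, ':' or '|'.

```python
import io


def to_vw_line(label, features):
    """Format one example as a simplified 'label | name:value ...' line."""
    parts = [f"{name}:{value}" for name, value in features.items()]
    return f"{label} | " + " ".join(parts)


def stream_examples(examples, sink):
    """Write examples to `sink` one line at a time; return how many were written."""
    count = 0
    for label, features in examples:
        sink.write(to_vw_line(label, features) + "\n")
        count += 1
    return count


# Usage with an in-memory buffer standing in for the real destination.
buf = io.StringIO()
n_written = stream_examples([(1, {"price": 0.5}), (-1, {"sqft": 2})], buf)
```

In practice the sink would be a file on disk or a pipe to a separately
started vw process, so that the parsing, hashing and learning all stay
on the VW side.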

> 2/ options management : how do we expose options in VW itself given that
> many are added/updated on a regular basis ?

IMO it's probably best kept as a separate project.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
