2013/1/28 Eustache DIEMERT <[email protected]>:
> I would be much interested.
>
> There are a few initiatives out there to interface VW with python, e.g.
>
> https://github.com/shilad/PyVowpal
>
> I haven't tested this one myself but I have the feeling that it's a proof of
> concept rather than a long term solution.
>
> One important point to keep in mind is that the benefit of using VW is super
> fast learning of gigantic datasets in streaming (from disk or network) mode.
>
> I guess 2 issues might need to get addressed :
>
> 1/ stream api : do we have a mature API to learn from a stream ?

Vowpal Wabbit is super efficient because it manages disk access itself,
using a dedicated thread that does hashing-based vectorization directly
from its own input file format (basically an extension of the svmlight
file format).

The numpy ecosystem, on the other hand, is very much based on the
assumption that your data is already loaded in memory as numpy arrays or
other data structures built on top of them, such as scipy.sparse matrices
or pandas DataFrames. Using Vowpal Wabbit from numpy defeats a lot of the
benefits of using VW in the first place.

So a lightweight utility wrapper, as done by PyVowpal, is probably the
right approach.

> 2/ options management : how do we expose options in VW itself given that
> many are added/updated on a regular basis ?

IMO it's probably best kept as a separate project.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
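
To make the "lightweight wrapper" idea in the message above a bit more concrete, here is a minimal sketch that writes examples one at a time in VW's basic text input format (`label |namespace name:value ...`) without ever loading the dataset into numpy arrays. The helper names `to_vw_line` and `write_vw_stream` and the namespace `f` are hypothetical and are not taken from PyVowpal or VW. The sketch only covers the simplest form of the format (no importance weights or tags).

```python
import io


def to_vw_line(label, features, namespace="f"):
    """Format one example as a line of VW text input.

    `features` is a dict mapping feature name -> numeric value.
    (Hypothetical helper, for illustration only.)
    """
    parts = []
    for name, value in features.items():
        # Space, ':' and '|' are separators in the VW text format,
        # so replace them in feature names.
        safe = str(name).replace(" ", "_").replace(":", "_").replace("|", "_")
        parts.append(f"{safe}:{value:g}")
    return f"{label} |{namespace} " + " ".join(parts)


def write_vw_stream(examples, out):
    """Write (label, features) pairs to `out` one line at a time.

    `examples` can be any iterator (file, socket, generator), so the
    full dataset never has to fit in memory.
    """
    count = 0
    for label, features in examples:
        out.write(to_vw_line(label, features) + "\n")
        count += 1
    return count


# Usage: stream two toy examples into an in-memory buffer.
examples = [(1, {"height": 1.5, "color red": 1}), (-1, {"height": 0.25})]
buf = io.StringIO()
n_written = write_vw_stream(iter(examples), buf)
```

In practice `out` would be a file on disk, or a pipe to a `vw` process, rather than a `StringIO`; VW would then do the hashing and learning itself.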
