Hi Jack.
All sklearn estimators work on NumPy arrays or SciPy sparse matrices.
I guess the easiest way would be to just use gensim for the feature
extraction and then feed the resulting features into sklearn.
Hth,
Andy
On 01/03/2013 10:02 PM, Jack Alan wrote:
Hi all,
I'm working on document classification and I wonder if there is a way
of having the feature vector calculated based on Latent Semantic
Indexing (LSI) instead of tf or tf-idf. As you know with LSI or Latent
Dirichlet Allocation (LDA), semantic features are captured.
I found a Python library called gensim that does this. The question is
how to combine gensim with sklearn to achieve this. Or are there any
alternatives?
Jack
------------------------------------------------------------------------------
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. ON SALE this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122712
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general