Hi Jack.
All sklearn estimators work on numpy arrays or sparse matrices.
I guess the easiest way would be to just use gensim for the feature extraction and then feed the resulting features into sklearn.
Hth,
Andy


On 01/03/2013 10:02 PM, Jack Alan wrote:
Hi all,

I'm working on document classification and I wonder if there is a way to compute the feature vectors with Latent Semantic Indexing (LSI) instead of tf or tf-idf. As you know, LSI and Latent Dirichlet Allocation (LDA) capture semantic features.

I found a Python library called gensim that does this. The question is how to combine gensim with sklearn to fulfill this requirement. Or are there any alternatives?

Jack




_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
