2014/1/23 Maheshakya Wijewardena <[email protected]>:
> Hi
>
> I think that using sparse data we can enhance the descriptiveness of the data
> while keeping its size smaller than the dense data, without losing
> information.

I don't understand what you mean by "sparse data we can enhance the
descriptiveness of the data".

> I will try using sparse data on 20newsgroups data and let you know the
> results.

What do you mean? 20newsgroups data is inherently sparse in the sense
that the extracted BoW features are mostly zero-valued. The problem is
that the current implementation of Decision Trees requires a dense
*representation* of that sparse data to work. Making Decision Trees
work on a sparse representation (e.g. a CSC sparse matrix) would
require re-implementing a lot of the code.
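To make the dense-versus-sparse distinction concrete, here is a minimal pure-Python sketch (no scikit-learn or SciPy needed). The toy corpus and the helper names `bag_of_words`, `to_csc` and `column` are hypothetical, chosen for illustration only. The helpers mimic the `data`/`indices`/`indptr` layout of a CSC matrix but are not the SciPy or scikit-learn implementation. The sketch builds the dense count matrix a Decision Tree would have to be fed, then its CSC equivalent, which keeps only the nonzero counts, grouped column by column (i.e. feature by feature).

```python
# Hypothetical toy corpus standing in for 20newsgroups posts (not real data).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell sharply on monday",
    "the senate passed the bill",
]


def bag_of_words(docs):
    """Build a dense document-term count matrix (one list per document)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    dense = [[0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.split():
            dense[i][index[w]] += 1
    return vocab, dense


def to_csc(dense):
    """Convert a dense matrix to CSC-style arrays (data, indices, indptr).

    Column j's nonzero values are data[indptr[j]:indptr[j + 1]] and
    their row numbers are indices[indptr[j]:indptr[j + 1]].
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    data, indices, indptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                data.append(dense[i][j])
                indices.append(i)
        indptr.append(len(data))
    return data, indices, indptr


def column(data, indices, indptr, j):
    """Return {row: value} for the nonzero entries of column (feature) j."""
    start, end = indptr[j], indptr[j + 1]
    return dict(zip(indices[start:end], data[start:end]))


vocab, dense = bag_of_words(docs)
data, indices, indptr = to_csc(dense)
n_cells = len(dense) * len(vocab)
print(f"{len(data)} nonzero out of {n_cells} cells ({len(data) / n_cells:.0%})")
print("'the' occurs in rows:", column(data, indices, indptr, vocab.index("the")))
```

Even with four short documents, most cells of the dense matrix are zero. With a real 20newsgroups vocabulary the nonzero fraction is far smaller, which is why densifying it is expensive. Because CSC stores each feature's nonzeros contiguously, a split search that scans one feature at a time could read only those entries. That is presumably why CSC is the natural target format, but adapting the tree code to it is the re-implementation work mentioned above.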

> Arnaud,
> I've gone through those messages and I've already started working on
> patches. Last year I did a project for a module at our university. The
> goal was to implement Bagging in scikit-learn. As Gilles had already begun
> that, I was not able to get my code merged. Moreover, I did not implement
> feature bootstrapping, as it was beyond the scope of my original proposal
> for the project.
> https://github.com/maheshakya/scikit-learn/blob/bagging2/sklearn/ensemble/bagging.py
>
> I would appreciate it if you could review my implementation and give me
> some feedback on it and on what I can do further.

I don't really see the point in spending time reviewing past
alternative implementations of existing features. There are already
129 pull requests that need reviewers' time:

  https://github.com/scikit-learn/scikit-learn/pulls

In my opinion it would be much more productive to fix bugs in the
current code base.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
