Hi Alexandre,

Thank you very much for your help. This is exactly what I needed for my
problem, and your help is much appreciated.
I am also running the sampling method that Robert suggested.
I will try the block version and compare the results, then let
you all know as soon as possible.

Regards,



On Thu, May 9, 2013 at 12:18 PM, Alexandre ABRAHAM <
[email protected]> wrote:

> Hi Bao,
>
> Sorry for the late reply. I set up some code yesterday evening, but my
> post got blocked because of its size. The code is really simple and I
> kept the scikit formalism, so if you have looked at the scikit function,
> this should be familiar to you.
>
> Gist : https://gist.github.com/AlexandreAbraham/5544803
>
> Methods :
> - *_slow : these functions implement the "compute distances on the fly"
> method.
> - *_block : the smarter method. Basically, distance matrices are computed
> per cluster.
>
> Benches:
> - small data (look at the main of the gist) :
>     Scikit silhouette (1s): -0.002484
>     Slow silhouette (154s): -0.002484
>     Block silhouette (2s): -0.002484
> - big data (X = np.random.random((20000, 1000)), y =
> np.repeat(np.arange(100), 200)):
>     Scikit silhouette (585.857552s): -0.003101, memory usage: about 4GB
>     Block silhouette (633.306765s): -0.003101, memory usage: about 200MB
>
> Conclusion:
> - you should *not* use the slow version. It is deadly slow.
> - block method is a little slower but uses far less memory. This,
> obviously, depends on your cluster sizes.
>
> I would advise you to try the block version and, if your data do not fit
> in memory, then try sampling as Robert said (this option is available with
> the block approach in my code).
>
> Alexandre.
>
>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and
> their applications. This 200-page book is written by three acclaimed
> leaders in the field. The early access version is available now.
> Download your free book today! http://p.sf.net/sfu/neotech_d2d_may
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
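
For anyone reading this thread later, the block idea quoted above can be
sketched roughly as follows. This is not the code from the gist: it is a
minimal illustration that assumes Euclidean distance, and the names
euclidean_block and block_silhouette_sketch are hypothetical. Only one
cluster-by-cluster distance block is held in memory at a time, instead of
the full n x n matrix.

```python
import numpy as np


def euclidean_block(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :]
    sq -= 2.0 * A.dot(B.T)
    # Rounding can leave tiny negative values; clip them before the sqrt.
    return np.sqrt(np.maximum(sq, 0.0))


def block_silhouette_sketch(X, labels):
    """Mean silhouette computed one pair of clusters at a time (illustrative)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    n = X.shape[0]
    a = np.zeros(n)          # mean distance to the other points of the own cluster
    b = np.full(n, np.inf)   # smallest mean distance to any other cluster
    singleton = np.zeros(n, dtype=bool)
    masks = [labels == c for c in clusters]

    for i, mask_i in enumerate(masks):
        Xi = X[mask_i]
        n_i = Xi.shape[0]
        if n_i > 1:
            d = euclidean_block(Xi, Xi)
            np.fill_diagonal(d, 0.0)  # a point is at distance 0 from itself
            a[mask_i] = d.sum(axis=1) / (n_i - 1)
        else:
            singleton[mask_i] = True
        # Each inter-cluster block is computed once and used in both directions.
        for mask_j in masks[i + 1:]:
            d = euclidean_block(Xi, X[mask_j])
            b[mask_i] = np.minimum(b[mask_i], d.mean(axis=1))
            b[mask_j] = np.minimum(b[mask_j], d.mean(axis=0))

    with np.errstate(divide="ignore", invalid="ignore"):
        s = (b - a) / np.maximum(a, b)
    s = np.nan_to_num(s)   # a == b == 0 (duplicate points) gives 0
    s[singleton] = 0.0     # assumed convention: singleton clusters score 0
    return s.mean()
```

With 100 clusters of 200 points each, as in the big-data bench, every block
is only 200 x 200 floats, whereas a full 20000 x 20000 float64 distance
matrix takes about 3.2 GB. That is roughly in line with the memory figures
reported in the quoted message.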


-- 
Nguyen Thien Bao

NeuroInformatics Laboratory (NILab),
Fondazione Bruno Kessler (FBK), Trento, Italy
Centro Interdipartimentale Mente e Cervello (CIMeC)
Università degli Studi di Trento, Italy
Email: [email protected]  or  [email protected]
Cellphone: +39.345.293.1006 (Italy)
Cellphone: +84.996.352.452 (VietNam)