Hi Bao,
The silhouette function wasn't written with this kind of scalability
in mind.
It computes the full pairwise distance matrix, whose size grows with the
square of the number of samples, so it is prohibitive at this scale (as
others have said).
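To put a rough number on "prohibitive", here is a back-of-the-envelope
sketch. It assumes the distances are stored as 64-bit floats (8 bytes per
entry) in a dense matrix; the helper name dense_matrix_bytes and the
5,000-point sample size are only illustrative.

```python
def dense_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory needed for a dense n_samples x n_samples matrix.

    bytes_per_entry=8 assumes 64-bit floats (an assumption, not something
    stated in this thread).
    """
    return n_samples * n_samples * bytes_per_entry


# Full data set: 300K samples.
full_gb = dense_matrix_bytes(300_000) / 1e9

# A hypothetical random sample of 5,000 points.
sample_gb = dense_matrix_bytes(5_000) / 1e9

print("full matrix:   %.1f GB" % full_gb)
print("sample matrix: %.1f GB" % sample_gb)
```

Under those assumptions the full matrix is far beyond any ordinary
workstation, while a sample of a few thousand points fits in 2GB.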
If the number of clusters is low, computing the score on a random sample
of the points should give you a good approximation of the silhouette
score, although I can't offer any mathematical guarantees on this.
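In case it helps, here is a minimal sketch of the sampling idea. It
assumes your scikit-learn version accepts the sample_size and random_state
arguments of silhouette_score; the synthetic data, the labels, and the
sample size of 1000 are made up for illustration. Repeating the estimate
with a few different seeds gives an empirical feel for how much the
sampled score varies, which is not a guarantee but is better than nothing.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the real data: three well-separated 2-D blobs.
rng = np.random.RandomState(0)
centers = (0.0, 5.0, 10.0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(5000, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 5000)  # hypothetical cluster assignments

# Each call scores a random subset of 1000 points, so only distances
# among those 1000 points are needed rather than among all 15000.
scores = np.array([
    silhouette_score(X, labels, metric="euclidean",
                     sample_size=1000, random_state=seed)
    for seed in range(5)
])

print("mean of sampled scores: %.3f" % scores.mean())
print("spread across samples:  %.3f" % scores.std())
```

If the spread across seeds is large for your data, a bigger sample_size
(as large as memory allows) is probably the first thing to try.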
Thanks,
- Robert
On 8 May 2013 06:12, Bao Thien <[email protected]> wrote:
> Thanks Ronnie,
>
> But the data has 300K samples, so the distance matrix has about
> 300K x 300K = 9 x 10^10 entries: ~90GB even at one byte per entry, and
> ~720GB as 64-bit floats. It is impossible to upgrade the RAM that far :(.
> Do you have any other suggestions?
>
> Regards,
>
> On Tue, May 7, 2013 at 10:06 PM, Ronnie Ghose <[email protected]> wrote:
>
>> ....can you just get more ram?
>> On May 7, 2013 2:42 PM, "Bao Thien" <[email protected]> wrote:
>>
>>> I ran a clustering algorithm and want to evaluate the result using the
>>> silhouette score in scikit-learn. However, scikit-learn needs to
>>> calculate the full distance matrix: distances = pairwise_distances(X,
>>> metric=metric, **kwds)
>>>
>>> My data has on the order of 300K samples and my machine has 2GB of
>>> memory, so the computation runs out of memory.
>>>
>>> Does anyone know how to overcome this problem? Thank you for your
>>> help.
>>>
>>>
>>> --
>>> Nguyen Thien Bao
>>>
>>> NeuroInformatics Laboratory (NILab),
>>> Fondazione Bruno Kessler (FBK), Trento, Italy
>>> Centro Interdipartimentale Mente e Cervello (CIMeC)
>>> Università degli Studi di Trento, Italy
>>> Email: [email protected] or [email protected]
>>> Cellphone: +39.345.293.1006 (Italy)
>>> Cellphone: +84.996.352.452 (VietNam)
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Learn Graph Databases - Download FREE O'Reilly Book
>>> "Graph Databases" is the definitive new guide to graph databases and
>>> their applications. This 200-page book is written by three acclaimed
>>> leaders in the field. The early access version is available now.
>>> Download your free book today! http://p.sf.net/sfu/neotech_d2d_may
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>
>
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)