In a machine learning context, data compression is more a distance metric
than a specific algorithm. A data compression algorithm generally learns
some key K describing the patterns in a document D, then uses K to
compress D.
The intuition behind using data compression methods for machine learning is
that if we learn K from one document, and use it to compress another
document, then the compression will be better (i.e. smaller size) if the
two documents are similar.
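
As a rough sketch of that intuition (this is not something scikit-learn
provides), one common formulation is the normalized compression distance,
which can be approximated with zlib from the standard library. The function
names below are made up for illustration, and compressing the concatenation
of the two documents stands in for "learn K from one, compress the other":

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Approximate normalized compression distance between x and y.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Smaller values suggest that the
    two inputs share more patterns.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical sample documents, only here to exercise the function.
    doc_a = b"the quick brown fox jumps over the lazy dog " * 20
    doc_b = b"the quick brown fox leaps over the lazy cat " * 20
    print("distance(a, b) =", ncd(doc_a, doc_b))
```

Note that zlib only looks back over a 32 KB window, so this approximation
degrades for long documents.
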
The answer to the question is no -- we don't have anything like that in
scikit-learn.
From the zlib documentation
<http://docs.python.org/2/library/zlib.html#module-zlib>, I don't think
Python gives us "easy" access to what we would need to do this. If there is
another way, this would be a good addition to scikit-learn :)
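
One possible "other way", offered only as a sketch: newer Python versions
(3.3 and later, so not the 2.x documentation linked above) accept a preset
dictionary through the zdict argument of zlib.compressobj. Treating a
reference document as that dictionary is an assumption on my part about how
K could be approximated; the helper name is hypothetical:

```python
import zlib


def size_with_dictionary(data: bytes, dictionary: bytes = b"") -> int:
    """Compressed size of `data`, optionally primed with a preset dictionary.

    The dictionary plays the role of the key K: byte sequences that appear
    in it can be referenced by the compressor instead of being re-encoded.
    """
    if dictionary:
        comp = zlib.compressobj(level=9, zdict=dictionary)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


if __name__ == "__main__":
    # Hypothetical documents, only here to exercise the function.
    reference = b"scikit-learn provides estimators for classification and regression. " * 3
    similar = b"scikit-learn provides estimators for clustering and regression."
    print("plain:     ", size_with_dictionary(similar))
    print("with dict: ", size_with_dictionary(similar, reference))
```

The gap between the two sizes could then serve as a similarity score
between the reference document and the new one.
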
On 8 July 2013 01:56, Olivier Grisel <[email protected]> wrote:
> 2013/7/7 Anubhab Baksi <[email protected]>:
> > Hi,
> > I searched for several Data Compression codes in Scikit learn, but I could
> > not find it.
> >
> > Now, can anybody please tell me, are these really implemented?
>
> I am not sure what you really mean by such a generic term as "Data
> Compression" but if this is a name for a specific machine learning
> algorithm then it is not implemented in scikit-learn. BTW most machine
> learning models can be interpreted as doing some sort of lossy
> training set compression but the purpose is generally not to be able
> to "uncompress" the model later to recover the training set but rather
> to use the statistical summary of the training set to be able to make
> useful predictions on any future test set assuming they share the same
> statistical distribution.
>
> If you are interested in lossless compression algorithms in Python
> then you should rather use the gzip or bz2 modules of the standard
> lib:
>
> http://docs.python.org/2/library/archiving.html
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)