Re: compression algorithm!

2009-04-08 Thread Richard Foley
CPAN:

   http://search.cpan.org/search?query=compression&mode=all

-- 
Richard Foley
Ciao - shorter than aufwiedersehen

http://www.rfi.net/

On Wednesday 08 April 2009 07:08:33 abhishek jain wrote:
 Hi Friends, I have a task to discover or search for a compression algorithm
 which compresses even 300-400 characters to at least 200-300%
 compression, reducing them to 150 characters.
 
 Is this a possibility? I know it should be.
 I need to research more on this, so if you can please point me to some
 articles or let me know with some code, it would be great.
 
 -- 
 Thanks and kind Regards,
 Abhishek jain
 07799 328 727



Re: compression algorithm!

2009-04-08 Thread Paul Makepeace
On Wed, Apr 8, 2009 at 6:08 AM, abhishek jain
<abhishek.netj...@gmail.com> wrote:
 Hi Friends, I have a task to discover or search for a compression algorithm
 which compresses even 300-400 characters to at least 200-300%
 compression, reducing them to 150 characters.

If you have e.g. XML, compression ratios of 8:1 (an 85+% reduction)
aren't unreasonable, depending on how verbose the schema & data is. The
problem you're likely to run into with such small file sizes (300-400
chars) is that there's a dictionary overhead the compression algorithm
requires per file.

One solution to this is to concatenate your files and then compress as one.

bzip2 and gzip, both free, work well with text.
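A minimal sketch of the concatenation point, using the core
IO::Compress::Gzip module (the exact byte counts are illustrative, not
guaranteed):

    use strict;
    use warnings;
    use IO::Compress::Gzip qw(gzip $GzipError);

    # Three small, similar messages of a few hundred characters each.
    my @messages = ("status: all services nominal\n" x 10) x 3;

    # Compress each message separately: each stream pays its own
    # header and dictionary warm-up cost.
    my $separate = 0;
    for my $msg (@messages) {
        gzip \$msg => \my $out or die "gzip failed: $GzipError";
        $separate += length $out;
    }

    # Compress the concatenation as one stream: the overhead is paid once.
    my $joined = join '', @messages;
    gzip \$joined => \my $one or die "gzip failed: $GzipError";

    printf "separately: %d bytes, concatenated: %d bytes\n",
        $separate, length $one;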

Paul



 Is this a possibility? I know it should be.
 I need to research more on this, so if you can please point me to some
 articles or let me know with some code, it would be great.

 --
 Thanks and kind Regards,
 Abhishek jain
 07799 328 727



Re: compression algorithm!

2009-04-08 Thread Peter Corlett
On Wed, Apr 08, 2009 at 10:43:34AM +0100, Paul Makepeace wrote:
 abhishek jain <abhishek.netj...@gmail.com> wrote:
 Hi Friends, I have a task to discover or search for a compression
 algorithm which compresses even 300-400 characters to at least
 200-300% compression, reducing them to 150 characters.
[...]
 One solution to this is to concatenate your files and then compress as one.

150 characters suggests that these might be distinct messages rather than
files, that they are to be sent over SMS or some other expensive
transmission medium, and that they are to be compressed to reduce costs.
Concatenation would be inappropriate in that case, as the messages would not
be sent in a timely manner, and some means of ordering the messages would be
required.

One other feature of general-purpose lossless compression schemes is that
the output size is going to vary depending on the complexity of the input.
This makes it somewhat tricky to say "I want to compress losslessly to 150
characters". You can lossily compress to 150 characters in an SMS of course:
it's called "txtspeak".
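A rough illustration with Compress::Zlib (a sketch; actual sizes depend
on the zlib version and settings):

    use strict;
    use warnings;
    use Compress::Zlib qw(compress);

    my $repetitive = 'abc' x 100;   # 300 characters, very low entropy
    my $random = join '', map { chr int rand 256 } 1 .. 300;   # high entropy

    printf "repetitive: %d -> %d bytes\n",
        length $repetitive, length compress($repetitive);
    printf "random: %d -> %d bytes\n",
        length $random, length compress($random);

The repetitive string shrinks to a few dozen bytes; the random one
typically comes out slightly larger than its input.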

The problem, as stated, is intractable. Rather than struggle along trying to
solve it, one should step back and reconsider that requirement and how some
other part of the greater system could be engineered to make it easier. I'd
consider using UDP datagrams over GPRS, for example.

As a side-note, you can't send 160 eight-bit characters in a single SMS as
it uses a seven-bit coding. You can send eight- and sixteen-bit messages, but
you get fewer characters.
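The arithmetic, assuming the standard 140-octet SMS payload:

    my $octets = 140;                                     # one SMS payload
    printf "7-bit:  %d chars\n", int($octets * 8 / 7);    # 160
    printf "8-bit:  %d chars\n", $octets;                 # 140
    printf "16-bit: %d chars\n", int($octets / 2);        # 70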



Re: compression algorithm!

2009-04-08 Thread abhishek jain

  You can lossily compress to 150 characters in an SMS of course:
 it's called "txtspeak".


By "txtspeak", do you mean writing e.g. 'brb' for "be right back"?
No, I cannot do that.



 The problem, as stated, is intractable. Rather than struggle along trying
 to solve it, one should step back and reconsider that requirement and how
 some other part of the greater system could be engineered to make it
 easier. I'd consider using UDP datagrams over GPRS, for example.


Also, I cannot use GPRS.
Can anyone think of a solution with regard to the compression?
Thanks a lot for replying,
Abhi


Re: compression algorithm!

2009-04-08 Thread Peter Corlett
On Wed, Apr 08, 2009 at 05:39:23PM +0530, abhishek jain wrote:
[...]
 Can anyone think of a solution with regard to the compression?

Not really, because we don't know what the source data is beyond it being
300-400 characters. Theory states it's impossible to compress all strings
into smaller strings - and wishing otherwise will not make it work in
practice - so you're going to have to narrow it down a bit for us.
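The counting argument in miniature (a sketch over a two-symbol alphabet;
the same pigeonhole reasoning scales to bytes):

    my $n = 8;
    my $inputs  = 2 ** $n;       # 256 strings of length exactly 8
    my $shorter = 2 ** $n - 1;   # 255 strings of length 0..7 (2^0 + ... + 2^7)
    printf "%d inputs, only %d strictly shorter outputs\n", $inputs, $shorter;

With more inputs than strictly shorter outputs, any lossless scheme that
shrinks some strings must expand others.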

Also, what do you mean by "characters"? Do you mean bytes or some other
encoding? What you're trying to do seems quite muddled and smells doomed to
failure to me.



Re: compression algorithm!

2009-04-08 Thread Peter Corlett
On Wed, Apr 08, 2009 at 10:38:33AM +0530, abhishek jain wrote:
 Hi Friends, I have a task to discover or search for a compression algorithm
 which compresses even 300-400 characters to at least 200-300%
 compression, reducing them to 150 characters.

This is not possible in the general case. See section 9 of the
comp.compression FAQ at
http://www.faqs.org/faqs/compression-faq/part1/section-8.html for discussion
and a proof.



Re: compression algorithm!

2009-04-08 Thread Nicholas Clark
On Wed, Apr 08, 2009 at 01:20:14PM +0100, Peter Corlett wrote:

 Also, what do you mean by "characters"? Do you mean bytes or some other
 encoding? What you're trying to do seems quite muddled and smells doomed to
 failure to me.

I have this feeling that it's not mandated that homework is done with Perl
either. (Nor, necessarily, that the submitter is reading anything other than
direct replies.)

Nicholas Clark


Re: compression algorithm!

2009-04-08 Thread Philip Newton
On Wed, Apr 8, 2009 at 07:08, abhishek jain <abhishek.netj...@gmail.com> wrote:
 Hi Friends, I have a task to discover or search for a compression algorithm
 which compresses even 300-400 characters to at least 200-300%
 compression, reducing them to 150 characters.

What kind of text are we talking about?

If it's random data (i.e. all 256 characters are equally possible),
then what you are asking is, of course, impossible.

If it's English text but otherwise not especially repetitive, 50%
compression is probably hard if not impossible to achieve for a
general-purpose compression algorithm.

If it's something repetitive (say, status reports which always start
the same way or always contain certain fixed phrases), then a custom
codebook may be the way to go. (For example, "Server 'indigo' has
failed due to: case temperature exceeded maximum permissible
temperature" might compress to "sift" given { s => "Server ", i =>
"'indigo'", f => " has failed due to: ", t => "case temperature
exceeded maximum permissible temperature" }.)
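A minimal sketch of such a codebook (the phrases and single-letter codes
are made up for illustration, and it assumes the code letters never
appear literally in the text; the receiver decodes with the same shared
table applied in reverse):

    use strict;
    use warnings;

    my %codebook = (
        s => 'Server ',
        i => "'indigo'",
        f => ' has failed due to: ',
        t => 'case temperature exceeded maximum permissible temperature',
    );
    my %by_phrase = reverse %codebook;   # phrase => code

    # Encode by replacing known phrases, longest first, with their codes.
    sub encode {
        my ($text) = @_;
        for my $phrase (sort { length $b <=> length $a } keys %by_phrase) {
            $text =~ s/\Q$phrase\E/$by_phrase{$phrase}/g;
        }
        return $text;
    }

    print encode("Server 'indigo' has failed due to: case temperature"
        . " exceeded maximum permissible temperature"), "\n";   # prints "sift"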

Cheers,
-- 
Philip Newton <philip.new...@gmail.com>


Re: compression algorithm!

2009-04-08 Thread Michael Lush

On Wed, 8 Apr 2009, abhishek jain wrote:

Hi Friends, I have a task to discover or search for a compression algorithm
which compresses even 300-400 characters to at least 200-300%
compression, reducing them to 150 characters.

Is this a possibility? I know it should be.
I need to research more on this, so if you can please point me to some
articles or let me know with some code, it would be great.


Do an SHA digest, and keep the key and plaintext in an internet-accessible
database :-)

--
Michael
~~~
Michael John Lush PhD   Tel:44-1223 492626
Bioinformatician 
HUGO Gene Nomenclature Committee	Email: h...@genenames.org

European Bioinformatics Institute
Hinxton, Cambridge
URL: http://www.genenames.org
~~~


Re: compression algorithm!

2009-04-08 Thread David Cantrell
On Wed, Apr 08, 2009 at 10:38:33AM +0530, abhishek jain wrote:

 Hi Friends, I have a task to discover or search for a compression algorithm
 which compresses even 300-400 characters to at least 200-300%
 compression, reducing them to 150 characters.

Easy.  You want a Redundant Array of Illiterate Teenagers and some cheap
blingy mobile phones.

-- 
David Cantrell | Bourgeois reactionary pig

  Languages for which ISO-Latin-$n is not necessary, #1 in a series:

Latin