Re: compression algorithm!
CPAN: http://search.cpan.org/search?query=compression&mode=all

--
Richard Foley
Ciao - shorter than aufwiedersehen
http://www.rfi.net/

On Wednesday 08 April 2009 07:08:33 abhishek jain wrote:
> Hi Friends,
> I have a task to discover or search for a compression algorithm which
> compresses even 300 - 400 characters to about at least 200-300% compression
> reducing them to 150 characters. Is this a possibility, i know it should
> be, I need to research more on this, so if you can please point me to some
> articles or let me know with some code it would be great.
> --
> Thanks and kind Regards,
> Abhishek jain
> 07799 328 727
Re: compression algorithm!
On Wed, Apr 8, 2009 at 6:08 AM, abhishek jain <abhishek.netj...@gmail.com> wrote:
> Hi Friends,
> I have a task to discover or search for a compression algorithm which
> compresses even 300 - 400 characters to about at least 200-300% compression
> reducing them to 150 characters.

If you have e.g. XML, compression ratios of 8:1 (a saving of 85+% of the
original size) aren't unreasonable, depending on how verbose the schema data
is.

The problem you're likely to run into with such small file sizes (300-400
chars) is that there's a dictionary overhead the compression algorithm
requires per file. One solution to this is to concatenate your files and then
compress them as one. bzip2 and gzip, both free, work well with text.

Paul

> Is this a possibility, i know it should be, I need to research more on
> this, so if you can please point me to some articles or let me know with
> some code it would be great.
> --
> Thanks and kind Regards,
> Abhishek jain
> 07799 328 727
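The per-file overhead and the benefit of concatenating before compressing can be demonstrated with a short sketch (the message texts are hypothetical, and zlib stands in for gzip, which uses the same DEFLATE algorithm):

```python
import zlib

# Hypothetical short, repetitive messages standing in for the 300-400
# character inputs under discussion.
messages = [
    ("Server 'alpha' reported status OK at 10:00; "
     "all services responding within normal thresholds. ") * 3,
    ("Server 'beta' reported status OK at 10:05; "
     "all services responding within normal thresholds. ") * 3,
]

# Compressing each message separately pays the stream/dictionary
# overhead once per message...
separate = sum(len(zlib.compress(m.encode())) for m in messages)

# ...whereas concatenating first pays it once, and later messages can
# reuse matches found in earlier ones.
together = len(zlib.compress("".join(messages).encode()))

print(separate, together)
assert together < separate
```

The gap grows with the number of messages, which is why archive formats like `.tar.gz` (concatenate, then compress) beat compressing files individually.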
Re: compression algorithm!
On Wed, Apr 08, 2009 at 10:43:34AM +0100, Paul Makepeace wrote:
> abhishek jain <abhishek.netj...@gmail.com> wrote:
>> Hi Friends,
>> I have a task to discover or search for a compression algorithm which
>> compresses even 300 - 400 characters to about at least 200-300%
>> compression reducing them to 150 characters.
[...]
> One solution to this is to concatenate your files and then compress as one.

150 characters suggests that it might be distinct messages rather than files,
that these are to be sent over SMS or some other expensive transmission
medium, and that these are to be compressed to reduce costs. Concatenation
would be inappropriate in that case, as the messages would not be sent in a
timely manner, and some means of ordering the messages would be required.

One other feature of general-purpose lossless compression schemes is that the
output size is going to vary depending on the complexity of the input. This
makes it somewhat tricky to say "I want to compress losslessly to 150
characters". You can lossily compress to 150 characters in an SMS of course:
it's called txtspeak.

The problem, as stated, is intractable. Rather than struggle along trying to
solve it, one should step back, reconsider that requirement, and ask how some
other part of the greater system could be engineered to make it easier. I'd
consider using UDP datagrams over GPRS, for example.

As a side-note, you can't send 160 eight-bit characters in a single SMS as it
uses a seven-bit coding. You can send eight- and sixteen-bit messages, but
you get fewer characters.
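The seven-bit point comes down to simple arithmetic: 160 characters x 7 bits = 1120 bits = 140 bytes, which is the SMS payload size. A simplified packing sketch (real GSM 03.38 packing is little-endian within octets and uses its own alphabet; this only illustrates the arithmetic):

```python
def pack_7bit(codes):
    """Pack a sequence of 7-bit values into bytes (big-endian sketch)."""
    bits = 0
    nbits = 0
    out = bytearray()
    for c in codes:
        assert 0 <= c < 128, "seven-bit alphabet only"
        bits = (bits << 7) | c   # append 7 bits to the accumulator
        nbits += 7
        while nbits >= 8:        # emit whole octets as they fill up
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:                    # flush any trailing partial octet
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

packed = pack_7bit([ord(ch) for ch in "A" * 160])
print(len(packed))  # 160 seven-bit characters fit in exactly 140 bytes
```

Conversely, 140 bytes holds only 140 eight-bit or 70 sixteen-bit (UCS-2) characters, hence "fewer characters" in those modes.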
Re: compression algorithm!
> You can lossily compress to 150 characters in an SMS of course: it's
> called txtspeak.

Do you mean by txtspeak writing e.g. 'brb' for "be right back"? No, I cannot
do so.

> The problem, as stated, is intractable. Rather than struggle along trying
> to solve it, one should step back and reconsider that requirement and how
> some other part of the greater system could be engineered to make it
> easier. I'd consider using UDP datagrams over GPRS, for example.

Also I cannot use GPRS. Any solution one can think of in regards to the
compression?

Thanks a lot for replying,
Abhi
Re: compression algorithm!
On Wed, Apr 08, 2009 at 05:39:23PM +0530, abhishek jain wrote:
[...]
> Any solution one can think of in regards to the compression,

Not really, because we don't know what the source data is beyond it being
300-400 characters. Theory states it's impossible to compress all strings
into smaller strings - and wishing otherwise will not make it work in
practice - so you're going to have to narrow it down a bit for us.

Also, what do you mean by characters? Do you mean bytes or some other
encoding?

What you're trying to do seems quite muddled and smells doomed to failure to
me.
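The impossibility claim is the standard counting (pigeonhole) argument, and it can be sketched in a few lines: there are strictly fewer short strings than long ones, so no lossless scheme can shrink every input. A related practical symptom is that a general-purpose compressor makes incompressible (random) data slightly *larger*, because of header and framing overhead:

```python
import os
import zlib

# Counting argument: there are 2**8 = 256 distinct 8-bit strings, but
# only 255 strings of fewer than 8 bits - one too few to give every
# 8-bit string its own shorter encoding.
shorter = sum(2 ** k for k in range(8))  # strings of length 0..7 bits
print(shorter)  # 255

# Practical symptom: 400 random bytes ("characters") do not compress;
# zlib's framing makes the output a little bigger than the input.
random_data = os.urandom(400)
compressed = zlib.compress(random_data, 9)
print(len(compressed), len(random_data))
```

This is why the answer depends so heavily on what the source data actually is: only inputs with exploitable structure can be mapped to the scarce short outputs.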
Re: compression algorithm!
On Wed, Apr 08, 2009 at 10:38:33AM +0530, abhishek jain wrote:
> Hi Friends,
> I have a task to discover or search for a compression algorithm which
> compresses even 300 - 400 characters to about at least 200-300% compression
> reducing them to 150 characters.

This is not possible in the general case. See section 9 of the
comp.compression FAQ at
http://www.faqs.org/faqs/compression-faq/part1/section-8.html for discussion
and a proof.
Re: compression algorithm!
On Wed, Apr 08, 2009 at 01:20:14PM +0100, Peter Corlett wrote:
> Also, what do you mean by characters? Do you mean bytes or some other
> encoding? What you're trying to do seems quite muddled and smells doomed
> to failure to me.

I have this feeling that it's not mandated that homework is done with Perl
either. (Nor, necessarily, that the submitter is reading anything other than
direct replies.)

Nicholas Clark
Re: compression algorithm!
On Wed, Apr 8, 2009 at 07:08, abhishek jain <abhishek.netj...@gmail.com> wrote:
> Hi Friends,
> I have a task to discover or search for a compression algorithm which
> compresses even 300 - 400 characters to about at least 200-300% compression
> reducing them to 150 characters.

What kind of text are we talking about?

If it's random data (i.e. all 256 characters are equally possible), then
what you are asking is, of course, impossible.

If it's English text but otherwise not especially repetitive, 50% compression
is probably hard if not impossible to achieve for a general-purpose
compression algorithm.

If it's something repetitive (say, status reports which always start the same
way or always contain certain fixed phrases), then a custom codebook may be
the way to go. (For example, "Server 'indigo' has failed due to: case
temperature exceeded maximum permissible temperature" might compress to
"sift" given { s = "Server ", i = "'indigo' ", f = "has failed due to: ",
t = "case temperature exceeded maximum permissible temperature" }.)

Cheers,
--
Philip Newton <philip.new...@gmail.com>
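The codebook idea above is easy to sketch; the codebook entries and the example sentence are the hypothetical ones from the post, and both sender and receiver are assumed to share the same fixed table:

```python
# Hypothetical shared codebook: single characters expand to fixed
# phrases agreed on by both ends in advance.
codebook = {
    "s": "Server ",
    "i": "'indigo' ",
    "f": "has failed due to: ",
    "t": "case temperature exceeded maximum permissible temperature",
}

def expand(code):
    """Expand a string of codebook keys back into the full message."""
    return "".join(codebook[ch] for ch in code)

message = expand("sift")
print(message)
print(len("sift"), len(message))  # 4 characters on the wire vs ~90 expanded
```

Note the codebook itself is never transmitted, which is exactly how this sidesteps the per-message dictionary overhead mentioned earlier in the thread; the cost is that only messages built from the agreed phrases can be sent.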
Re: compression algorithm!
On Wed, 8 Apr 2009, abhishek jain wrote:
> Hi Friends,
> I have a task to discover or search for a compression algorithm which
> compresses even 300 - 400 characters to about at least 200-300% compression
> reducing them to 150 characters. Is this a possibility, i know it should
> be, I need to research more on this, so if you can please point me to some
> articles or let me know with some code it would be great.

Do an SHA digest and keep the digest and plaintext in an internet-accessible
database :-)

--
Michael
~~~
Michael John Lush PhD                Tel: 44-1223 492626
Bioinformatician, HUGO Gene Nomenclature Committee
Email: h...@genenames.org
European Bioinformatics Institute, Hinxton, Cambridge
URL: http://www.genenames.org
~~~
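The (tongue-in-cheek) digest-and-lookup scheme can be sketched as follows. It isn't compression at all - the full text still has to live somewhere both parties can reach - it merely trades a 300-400 character message for a fixed-size key plus a database round-trip. The in-memory dict here stands in for the hypothetical internet-accessible database:

```python
import hashlib

database = {}  # stand-in for the shared, internet-accessible store

def store(text):
    """Save the plaintext under its SHA-1 digest and return the key."""
    key = hashlib.sha1(text.encode()).hexdigest()
    database[key] = text
    return key

def fetch(key):
    """Recover the plaintext from its digest key."""
    return database[key]

msg = "a status report of several hundred characters, repeated " * 7
key = store(msg)
print(len(key))           # an SHA-1 hex digest is always 40 characters
print(fetch(key) == msg)  # round-trips exactly
```

A 40-character key does fit comfortably in an SMS, which is presumably the joke.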
Re: compression algorithm!
On Wed, Apr 08, 2009 at 10:38:33AM +0530, abhishek jain wrote:
> Hi Friends,
> I have a task to discover or search for a compression algorithm which
> compresses even 300 - 400 characters to about at least 200-300% compression
> reducing them to 150 characters.

Easy. You want a Redundant Array of Illiterate Teenagers and some cheap
blingy mobile phones.

--
David Cantrell | Bourgeois reactionary pig
Languages for which ISO-Latin-$n is not necessary, #1 in a series: Latin