Anyone know anything about detecting sub-strings in strings? If you want to reply off the RB list that's fine, but on my ElfData list is a good place to reply.
I've finally decided on a good decompressor code. I'm writing some compression code for my plugin, you see. It's pretty cool actually! I got so many optimisations over the standard "deflate" algorithm. Mine for example can even compress strings of 2 bytes down to 1 byte. Deflate can only compress strings of 4 bytes minimum, and even then it's harder to pull off, as there is an increased overhead from breaking last escaped block, and then having to write another escaped block header. Mine has a minimum 1 byte length for an escaped block header, whereas deflate has a 2 byte escaped block header. But my escaped blocks are variable, due to some clever carry and multiply code. Anyhow, it's a far more sophisticated decompressor, yet only written in 1 page of code (if you include the inlined functions), and it's safe and fast. It can't be hacked to crash by reading unallocated RAM, for example. So it's good for apps which need security, we wouldn't want someone to "decompress" variables off the stack now would we :). Of course ElfData must do everything correctly so that's why I paid attention to details like this. I'm going to add entropy encoding, later, as a final stage, to compress things further. And experiment with BOCU compression for UTF-8, and byte-math-difference based code (similar idea to BOCU) for non-UTF-8. This basically means I can acheive tighter compression due to a tighter syntax, and also compress patterns that other compressors would have to ignore, increasing the compression more! Well, that's all very well, but it's only the decompressor :) I didn't write the compressor yet. I don't know anything about pattern recognition theory. In theory, I could just dump all the strings found, into an ElfDataDictionary... but that would be a lot of strings. And I'm not so sure about how it would find suffixes. To dump everything from a sliding window of 1-257 bytes into an ElfDataDictionary would be ridiculously slow and require huge amounts of RAM, making it basically like a PPM compressor. It would take about 66,000 setting/getting of dictionary keys, per byte! So I can't do that. Obviously, there has to be a faster more practical way, but I'm not familiar with it yet. I'm sure I can see (see, meaning invent) some good optimisations over the standard common practice, though :) Any tips or pointers on recognising repeating sub-strings are much appreciated! -- http://elfdata.com/plugin/ "String processing, done right" _______________________________________________ Unsubscribe or switch delivery mode: <http://www.realsoftware.com/support/listmanager/> Search the archives: <http://support.realsoftware.com/listarchives/lists.html>
