Inspired by a conversation a while ago, I invented a byte stuffing
algorithm that has both the expected optimal 0.2% overhead _and_ is
prefix preserving, meaning values that are close to each other in
radix space before encoding stay close together afterwords. The exact
bytes may change, but long runs of the same character on the input
remain long runs on the output. It also requires no lookahead so can
be done on the fly during the JudySL lookup process.

The motivation was partiallly COBS, which is a good byte stuffing
algorithm but requires a 256 byte buffer, making it inpractical on uCs
and the lookahead means that JudySL performance is killed.  444444409
and 444444410  end up begining with 844... and 9444.. respectively
meaning none of the shared run of 4s can be shared when they should
take up adjacent bits in a leaf tree.

I call it cuckoo stuffing, as it is partially inspired by the cuckoo
hashing algorithm, where a hash collision "ejects" the previous value
which then has to be rehashed, with cuckoo stuffing an escape
character occuring in the stream ejects the escape character causing a
new one to be selected via a synchronized PRNG.

details are here
http://notanumber.net/archives/183/cuckoo-byte-stuffing-algorithm

It also is very fast and lean, 7 bytes of memory overhead and
implementable in less than a hundred cpu instructions.

    John

-- 
John Meacham - http://notanumber.net/

------------------------------------------------------------------------------
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Reply via email to