On Wed, Dec 1, 2010 at 9:50 AM, Anthony <o...@inbox.org> wrote:
> On Wed, Dec 1, 2010 at 10:47 AM, Stefan de Konink <ste...@konink.de> wrote:
>> On Wed, 1 Dec 2010, Anthony wrote:
>>
>>>> Yeah, but your lead basically shows we are talking about more than 10%...
>>>
>>> Yeah, probably, but at the expense of more complicated code, greater
>>> memory usage, etc.
>>
>> The whole process is IO-bound... memory is used anyway to overcome the IO
>> issues...
>
> Not in an embedded system, which is where a small difference like 10%
> is going to matter.

CPUs are going to get faster, for free. Developer time, especially
OSM developer time, is severely limited. The community is better served
by having developers do new stuff than code an overcomplicated format.

>
>>> I'm interested now in seeing how the full history compression goes,
>>> though.  If it can achieve 70, 80, 90% on top of zlib, then it might
>>> be worth embedding the compression as opposed to just using it for
>>> transfer over the Internet.
>>
>> The dictionary is compressed per block, so it greatly depends if the trick
>> works.
>
> 32 megs is a lot better than 900K, though.  900K is how much zlib uses, right?
>

Each fileblock is independently decodable, which means that I have to
reset the dictionary for each fileblock. There are around 100k
fileblocks in the planet, and 13 GB uncompressed, which means that the
average fileblock has 130 KB of data. gzip has a 32 KB window,
smaller than the number of bytes in the fileblock. bzip2 has a
window that is 900 KB, and LZMA's is megabytes... but LZMA's
multi-megabyte window doesn't matter, because the compressor is
restarted for each fileblock, every few hundred kilobytes.
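
To make the arithmetic above concrete, here is a rough sketch in Python. The planet size and block count are the approximate figures from this message; the window sizes are the usual format limits (32 KiB for DEFLATE, 900 kB blocks for bzip2 -9, an 8 MiB dictionary for the default xz preset) and are my assumption about what would be used, not something measured here. The helper names are made up for illustration.

```python
# Approximate figures quoted in this message.
PLANET_BYTES = 13 * 10**9   # ~13 GB uncompressed
FILEBLOCKS = 100_000        # ~100k fileblocks

# How far back each compressor can look, in bytes.
# Assumed values: format limits / xz default preset, not taken from the thread.
WINDOWS = {
    "gzip (DEFLATE)": 32 * 1024,
    "bzip2 (-9 block)": 900 * 1000,
    "xz/LZMA (preset 6 dictionary)": 8 * 1024 * 1024,
}


def average_block_bytes(total_bytes, blocks):
    """Average uncompressed payload per fileblock."""
    return total_bytes // blocks


def usable_history(window, block_bytes):
    """History a compressor can actually exploit when it is reset per block."""
    return min(window, block_bytes)


avg = average_block_bytes(PLANET_BYTES, FILEBLOCKS)
print(f"average fileblock: {avg} bytes")
for name, window in WINDOWS.items():
    print(f"{name}: window {window}, usable per block {usable_history(window, avg)}")
```

With these numbers only gzip's window is smaller than a block; for bzip2 and LZMA the block boundary, not the window, is the limit.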

The 15% gain you measured between .rawpbf.xz and .pbf really lets
lzma cheat too much, because it can exploit a window tens of times
larger than it would have if integrated.
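
A toy way to see the effect, using Python's standard lzma module: compress one stream as a whole, then compress the same bytes in independent 130 KB chunks and add up the sizes. This is only a sketch on synthetic data (a small pool of random records repeated throughout the stream), not on real PBF data, and the function names are invented for the example. Each chunk also pays the .xz container overhead, which slightly exaggerates the per-block cost.

```python
import lzma
import random


def make_stream(n_records=20000, pool_size=200, record_len=64, seed=0):
    """Synthetic stream: records drawn from a small pool, so redundancy is long-range."""
    rng = random.Random(seed)
    pool = [
        bytes(rng.getrandbits(8) for _ in range(record_len))
        for _ in range(pool_size)
    ]
    return b"".join(rng.choice(pool) for _ in range(n_records))


def whole_stream_size(data):
    """Compressed size when LZMA sees the entire stream at once."""
    return len(lzma.compress(data))


def per_block_size(data, block_bytes):
    """Total compressed size when the compressor is restarted for every block."""
    return sum(
        len(lzma.compress(data[i:i + block_bytes]))
        for i in range(0, len(data), block_bytes)
    )


data = make_stream()
whole = whole_stream_size(data)
blocked = per_block_size(data, 130_000)
print(f"raw: {len(data)}  whole-stream: {whole}  per-block: {blocked}")
```

The per-block total comes out larger because every block has to relearn the record pool from scratch. How big the gap is on real planet data is exactly what the test I am asking for would show.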

Could you run your test on a whole planet, or a hack-integration of
LZMA into osmosis?

Scott

_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
