On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote:
> On 12/09/2013 11:34 AM, Alexander Korotkov wrote: > >> On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas >> <hlinnakan...@vmware.com>wrote: >> >> Even if we use varbyte encoding, I wonder if it would be better to treat >>> block + offset number as a single 48-bit integer, rather than encode them >>> separately. That would allow the delta of two items on the same page to >>> be >>> stored as a single byte, rather than two bytes. Naturally it would be a >>> loss on other values, but would be nice to see some kind of an analysis >>> on >>> that. I suspect it might make the code simpler, too. >>> >> >> Yeah, I had that idea, but I thought it's not a better option. Will try to >> do some analysis. >> > > The more I think about that, the more convinced I am that it's a good > idea. I don't think it will ever compress worse than the current approach > of treating block and offset numbers separately, and, although I haven't > actually tested it, I doubt it's any slower. About the same amount of > arithmetic is required in both versions. > > Attached is a version that does that. Plus some other minor cleanup. > > (we should still investigate using a completely different algorithm, > though) Yes, when I though about that, I didn't realize that we can reserve less than 16 bits for offset number. I rerun my benchmark and got following results: event | period -----------------------+----------------- index_build | 00:01:46.39056 index_build_recovery | 00:00:05 index_update | 00:06:01.557708 index_update_recovery | 00:01:23 search_new | 00:24:05.600366 search_updated | 00:25:29.520642 (6 rows) label | blocks_mark ----------------+------------- search_new | 847509920 search_updated | 883789826 (2 rows) label | size ---------------+----------- new | 364560384 after_updates | 642736128 (2 rows) Speed is same while index size is less. In previous format it was: label | size ---------------+----------- new | 419299328 after_updates | 715915264 (2 rows) Good optimization, thanks. I'll try another datasets but I expect similar results. However, patch didn't apply to head. Corrected version is attached. ------ With best regards, Alexander Korotkov.
gin-packed-postinglists-20.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers