Re: [HACKERS] GIN improvements part 1: additional information

Alexander Korotkov Tue, 10 Dec 2013 11:45:26 -0800

On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas <
[email protected]> wrote:


> On 12/09/2013 11:34 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
>> <[email protected]>wrote:
>>
>>  Even if we use varbyte encoding, I wonder if it would be better to treat
>>> block + offset number as a single 48-bit integer, rather than encode them
>>> separately. That would allow the delta of two items on the same page to
>>> be
>>> stored as a single byte, rather than two bytes. Naturally it would be a
>>> loss on other values, but would be nice to see some kind of an analysis
>>> on
>>> that. I suspect it might make the code simpler, too.
>>>
>>
>> Yeah, I had that idea, but I thought it's not a better option. Will try to
>> do some analysis.
>>
>
> The more I think about that, the more convinced I am that it's a good
> idea. I don't think it will ever compress worse than the current approach
> of treating block and offset numbers separately, and, although I haven't
> actually tested it, I doubt it's any slower. About the same amount of
> arithmetic is required in both versions.
>
> Attached is a version that does that. Plus some other minor cleanup.
>
> (we should still investigate using a completely different algorithm,
> though)


Yes, when I though about that, I didn't realize that we can reserve less
than 16 bits for offset number.
I rerun my benchmark and got following results:

         event         |     period
-----------------------+-----------------
 index_build           | 00:01:46.39056
 index_build_recovery  | 00:00:05
 index_update          | 00:06:01.557708
 index_update_recovery | 00:01:23
 search_new            | 00:24:05.600366
 search_updated        | 00:25:29.520642
(6 rows)

     label      | blocks_mark
----------------+-------------
 search_new     |   847509920
 search_updated |   883789826
(2 rows)

     label     |   size
---------------+-----------
  new           | 364560384
 after_updates | 642736128
(2 rows)

Speed is same while index size is less. In previous format it was:

     label     |   size
---------------+-----------
 new           | 419299328
 after_updates | 715915264
(2 rows)

Good optimization, thanks. I'll try another datasets but I expect similar
results.
However, patch didn't apply to head. Corrected version is attached.

------
With best regards,
Alexander Korotkov.

gin-packed-postinglists-20.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN improvements part 1: additional information

Reply via email to