Re: [HACKERS] GIN improvements part 1: additional information

Alexander Korotkov Wed, 08 Jan 2014 14:00:02 -0800

On Mon, Jan 6, 2014 at 12:35 PM, Amit Langote <[email protected]> wrote:


> On Sat, Dec 21, 2013 at 4:36 AM, Heikki Linnakangas
> <[email protected]> wrote:
> >
> > Yet another version. The encoding/decoding code is now quite isolated in
> > ginpostinglist.c, so it's easy to experiment with different encodings. This
> > patch uses varbyte encoding again.
> >
> > I got a bit carried away, experimented with a bunch of different encodings.
> > I tried rice encoding, rice encoding with block and offset number delta
> > stored separately, the simple9 variant, and varbyte encoding.
> >
> > The compressed size obviously depends a lot on the distribution of the
> > items, but in the test set I used, the differences between different
> > encodings were quite small.
> >
> > One fatal problem with many encodings is VACUUM. If a page is completely
> > full and you remove one item, the result must still fit. In other words,
> > removing an item must never enlarge the space needed. Otherwise we have to
> > be able to split on vacuum, which adds a lot of code, and also makes it
> > possible for VACUUM to fail if there is no disk space left. That's
> > unpleasant if you're trying to run VACUUM to release disk space. (gin fast
> > updates already has that problem BTW, but let's not make it worse)
> >
> > I believe that eliminates all encodings in the Simple family, as well as
> > PForDelta, and surprisingly also Rice encoding. For example, if you have
> > three items in consecutive offsets, the differences between them are encoded
> > as 11 in rice encoding. If you remove the middle item, the encoding for the
> > next item becomes 010, which takes more space than the original.
> >
> > AFAICS varbyte encoding is safe from that. (a formal proof would be nice
> > though).
> >
> > So, I'm happy to go with varbyte encoding now, indeed I don't think we have
> > much choice unless someone can come up with an alternative that's
> > VACUUM-safe. I have to put this patch aside for a while now, I spent a lot
> > more time on these encoding experiments than I intended. If you could take a
> > look at this latest version, spend some time reviewing it and cleaning up
> > any obsolete comments, and re-run the performance tests you did earlier,
> > that would be great. One thing I'm slightly worried about is the overhead of
> > merging the compressed and uncompressed posting lists in a scan. This patch
> > will be in good shape for the final commitfest, or even before that.
> >
>
>
> I just tried out the patch "gin-packed-postinglists-varbyte2.patch"
> (which looks like the latest one in this thread) as follows:
>
> 1) Applied patch to the HEAD (on commit
> 94b899b829657332bda856ac3f06153d09077bd1)
> 2) Created a test table and index
>
> create table test (a text);
> copy test from '/usr/share/dict/words';
> create index test_trgm_idx on test using gin (a gin_trgm_ops);
>
> 3) Got the following error on a wildcard query:
>
> postgres=# explain (buffers, analyze) select count(*) from test where
> a like '%tio%';
> ERROR:  lock 9447 is not held
> STATEMENT:  explain (buffers, analyze) select count(*) from test where
> a like '%tio%';
> ERROR:  lock 9447 is not held
>

Thanks for reporting. Fixed version is attached.
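
For anyone following the VACUUM-safety argument quoted above, here is a rough
Python sketch of why varbyte encoding should be safe. It is not the code from
ginpostinglist.c. It assumes items are plain increasing integers (the patch
packs block and offset numbers, which is not modeled here) and that each delta
is stored with 7 payload bits per byte plus a continuation bit. All function
names are made up for illustration.

```python
def encode_varbyte(delta):
    """Encode a non-negative integer, 7 payload bits per byte.

    The high bit of each byte is set when more bytes follow (assumed layout).
    """
    out = bytearray()
    while delta >= 0x80:
        out.append((delta & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        delta >>= 7
    out.append(delta)                      # final byte, continuation bit clear
    return bytes(out)


def encoded_size(items):
    """Total bytes needed to store a sorted item list as varbyte deltas."""
    prev = 0
    total = 0
    for item in items:
        total += len(encode_varbyte(item - prev))
        prev = item
    return total


def removal_never_grows(items):
    """Check that deleting any single item never enlarges the encoding."""
    base = encoded_size(items)
    return all(
        encoded_size(items[:i] + items[i + 1:]) <= base
        for i in range(len(items))
    )
```

The intuition: removing an item replaces two deltas d1 and d2 with d1 + d2.
The sum needs at most one more bit than the larger of the two, so at most one
more byte than the larger one, and the smaller one already occupied at least
one byte. This is only an informal argument, not the formal proof asked for.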

------
With best regards,
Alexander Korotkov.

gin-packed-postinglists-varbyte3.patch.gz
Description: GNU Zip compressed data
