Re: [HACKERS] WIP: store additional info in GIN index

Alexander Korotkov Wed, 05 Dec 2012 00:12:32 -0800

On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra <t...@fuzzy.cz> wrote:

> On 4.12.2012 20:12, Alexander Korotkov wrote:
> > Hi!
> >
> > On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
> >
> >     I've tried to apply the patch with the current HEAD, but I'm getting
> >     segfaults whenever VACUUM runs (either called directly or from
> >     autovac workers).
> >
> >     The patch applied cleanly against 9b3ac49e and needed a minor fix
> >     when applied on HEAD (because of an assert added to
> >     ginRedoCreatePTree), but that shouldn't be a problem.
> >
> >
> > Thanks for testing! Patch is rebased with HEAD. The bug you reported was
> > fixed.
>
> Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
> The whole backtrace is here: http://pastebin.com/YEPuWeuV
>
> The messages written into PostgreSQL log are quite variable - usually it
> looks like this:
>
> 2012-12-04 22:31:08 CET 31839 LOG:  database system was not properly shut down; automatic recovery in progress
> 2012-12-04 22:31:08 CET 31839 LOG:  redo starts at 0/68A76E48
> 2012-12-04 22:31:08 CET 31839 LOG:  unexpected pageaddr 0/1BE64000 in log segment 000000010000000000000069, offset 15089664
> 2012-12-04 22:31:08 CET 31839 LOG:  redo done at 0/69E63638
>
> but I've seen this message too
>
> 2012-12-04 22:20:29 CET 31709 LOG:  database system was not properly shut down; automatic recovery in progress
> 2012-12-04 22:20:29 CET 31709 LOG:  redo starts at 0/AEAFAF8
> 2012-12-04 22:20:29 CET 31709 LOG:  record with zero length at 0/C7D5698
> 2012-12-04 22:20:29 CET 31709 LOG:  redo done at 0/C7D55E
>
>
> I wasn't able to prepare a simple testcase to reproduce this, so I've
> attached two files from my "fun project" where I noticed it. It's a
> simple DB + a bit of Python for indexing mbox archives inside Pg.
>
> - create.sql - a database structure with a bunch of GIN indexes on
>                tsvector columns on "messages" table
>
> - load.py - script for parsing mbox archives / loading them into the
>             "messages" table (warning: it's a bit messy)
>
>
> Usage:
>
> 1) create the DB structure
> $ createdb archives
> $ psql archives < create.sql
>
> 2) fetch some archives (I consistently get SIGSEGV after first three)
> $ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
> $ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
> $ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz
>
> 3) gunzip and load them using the python script
> $ gunzip pgsql-hackers.*.gz
> $ ./load.py --db archives pgsql-hackers.*
>
> 4) et voila - a SIGSEGV :-(
>
>
> I suspect this might be related to the fact that the load.py script uses
> savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).
>
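
For reference, the savepoint-per-row pattern mentioned above might look roughly
like the sketch below. This is not taken from load.py, which is not reproduced
in this thread. It uses Python's built-in sqlite3 module as a stand-in for
PostgreSQL so that it runs without a server, and the function name
load_messages and the two-column "messages" layout are assumptions made for
illustration only. With a PostgreSQL driver the same SAVEPOINT / ROLLBACK TO
SAVEPOINT / RELEASE SAVEPOINT statements would apply, with the driver's own
unique-violation exception in place of sqlite3.IntegrityError.

```python
import sqlite3


def load_messages(conn, messages):
    """Insert (message_id, body) pairs, skipping duplicates.

    Hypothetical helper: each INSERT is wrapped in its own savepoint so that
    a unique violation rolls back only that row, not the whole transaction.
    The connection must be opened with isolation_level=None so that the
    BEGIN/COMMIT statements issued here are the only transaction control.
    """
    inserted = skipped = 0
    cur = conn.cursor()
    cur.execute("BEGIN")
    for message_id, body in messages:
        cur.execute("SAVEPOINT msg")
        try:
            cur.execute(
                "INSERT INTO messages (message_id, body) VALUES (?, ?)",
                (message_id, body),
            )
        except sqlite3.IntegrityError:
            # Duplicate message: undo just this row and carry on.
            cur.execute("ROLLBACK TO SAVEPOINT msg")
            skipped += 1
        else:
            inserted += 1
        # Releasing is valid after either outcome; it discards the savepoint.
        cur.execute("RELEASE SAVEPOINT msg")
    cur.execute("COMMIT")
    return inserted, skipped
```

A load that hits many duplicates creates and releases one savepoint
(subtransaction) per row, which is the access pattern the report above
suspects is involved.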


Thanks for the bug report. It is fixed in the attached patch.

------
With best regards,
Alexander Korotkov.

ginaddinfo.3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
