subject:"Re\: \[HACKERS\] \[PATCHES\] GIN improvements"

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-11 Thread Teodor Sigaev


But the real bottom line is: if autovacuum is working properly, it
should clean up the index before the list ever gets to the point where
it'd be sane to turn off indexscans.  So I don't see why we need to hack
the planner for this at all.  If any hacking is needed, it should be
in the direction of making sure autovacuum puts sufficient priority
on this task.


Autovacuum will start if table has GIN index with fastupdate=on and number of 
inserted tuple since last vacuum > autovacuum_vacuum_threshold.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-09 Thread Tom Lane

Jeff Davis  writes:
> On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote:
>> Well, there's nothing to force that plan to be invalidated when the
>> state of the pending list changes, is there?

> Would it be unreasonable to invalidate cached plans during the pending
> list cleanup?

If the pending list cleanup is done by VACUUM then such an invalidation
already happens (VACUUM forces it after updating pg_class.reltuples/
relpages).  What's bothering me is the lack of any reasonable mechanism
for invalidating plans in the other direction, ie when the list grows
past the threshold where this code wants to turn off indexscans.  Since
the threshold depends on parameters that can vary across sessions, you'd
more or less have to send a global invalidation after every addition to
the list, in case that addition put it over the threshold in some other
session's view.  That's unreasonably often, in my book.

Also, as mentioned earlier, I'm pretty down on the idea of a threshold
where indexscans suddenly turn off entirely; that's not my idea of how
the planner ought to work.

But the real bottom line is: if autovacuum is working properly, it
should clean up the index before the list ever gets to the point where
it'd be sane to turn off indexscans.  So I don't see why we need to hack
the planner for this at all.  If any hacking is needed, it should be
in the direction of making sure autovacuum puts sufficient priority
on this task.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Robert Haas

On Wed, Feb 4, 2009 at 4:23 PM, Jeff Davis  wrote:
> On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote:
>> Well, there's nothing to force that plan to be invalidated when the
>> state of the pending list changes, is there?
>>
>
> Would it be unreasonable to invalidate cached plans during the pending
> list cleanup?
>
> Anyway, it just strikes me as strange to expect a plan to be a good plan
> for very long. Can you think of an example where we applied this rule
> before?

Well, I am not an expert on this topic.

But, plans for prepared statements and statements within PL/pgsql
functions are cached for the lifetime of the session, which in some
situations could be quite long.

I would think that invalidating significantly more often would be bad
for performance.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Jeff Davis

On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote:
> Well, there's nothing to force that plan to be invalidated when the
> state of the pending list changes, is there?
> 

Would it be unreasonable to invalidate cached plans during the pending
list cleanup?

Anyway, it just strikes me as strange to expect a plan to be a good plan
for very long. Can you think of an example where we applied this rule
before?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Robert Haas

On Wed, Feb 4, 2009 at 1:39 PM, Jeff Davis  wrote:
> On Mon, 2009-02-02 at 20:38 -0500, Tom Lane wrote:
>> Also, I really think it's a pretty bad idea to make index cost
>> estimation depend on the current state of the index's pending list
>> --- that state seems far too transient to base plan choices on.
>
> I'm confused by this. Don't we want to base the plan choice on the most
> current data, even if it is transient?
>
> Regards,
>Jeff Davis

Well, there's nothing to force that plan to be invalidated when the
state of the pending list changes, is there?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Jeff Davis

On Mon, 2009-02-02 at 20:38 -0500, Tom Lane wrote:
> Also, I really think it's a pretty bad idea to make index cost
> estimation depend on the current state of the index's pending list
> --- that state seems far too transient to base plan choices on.

I'm confused by this. Don't we want to base the plan choice on the most
current data, even if it is transient?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Teodor Sigaev


I looked at this a little bit --- it needs proofreading ("VACUUME"?).

Sorry, VACUUME fixed



Do we really need an additional column in pgstat table entries in
order to store something that looks like it can be derived from the
other columns?  The stats tables are way too big already.


It's not derived, because vacuum resets n_inserted_tuples to zero, but it 
doesn't reset tuples_inserted, tuples_updated and tuples_hot_updated. So, 
n_inserted_tuples is calculable until first vacuum occurs.





Also, I really think it's a pretty bad idea to make index cost
estimation depend on the current state of the index's pending list
--- that state seems far too transient to base plan choices on.


I asked for that. v0.23 used statistic data by calling 
pg_stat_get_fresh_inserted_tuples(), so revert to that.

It's possible to add pending list information to IndexOptInfo, if it's 
acceptable.


It's particularly got to be nuts to turn off indexscans entirely
if the pending list is "too full".  Having some lossy pages might
not be great but I don't believe it can be so bad that you should
go to a seqscan all the time.


It's impossible to return "lossy page" via amgettuple interface. Although, with 
amgetbitmap it works well - and GIN will not emit error even bitmap becomes lossy.


In attached version, gincostestimate will disable index scan if estimation of 
number of matched tuples in pending list is greater than non-lossy limit of 
tidbitmap. That estimation is a product of indexSelectivity and number of tuples 
in pending list.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.26.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-02-02 Thread Tom Lane

Teodor Sigaev  writes:
> I'm very sorry, but v0.24 has a silly bug with not initialized value :(.
> New version is attached

I looked at this a little bit --- it needs proofreading ("VACUUME"?).

Do we really need an additional column in pgstat table entries in
order to store something that looks like it can be derived from the
other columns?  The stats tables are way too big already.

Also, I really think it's a pretty bad idea to make index cost
estimation depend on the current state of the index's pending list
--- that state seems far too transient to base plan choices on.
It's particularly got to be nuts to turn off indexscans entirely
if the pending list is "too full".  Having some lossy pages might
not be great but I don't believe it can be so bad that you should
go to a seqscan all the time.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-23 Thread Teodor Sigaev


I'm very sorry, but v0.24 has a silly bug with not initialized value :(.
New version is attached

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.25.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-22 Thread Teodor Sigaev

BTW, gincostestimate could use that information for cost estimation, but is 
index opening and metapge reading in amcostestimate acceptable?


That sounds reasonable to me. I think that's what the index-specific
cost estimators are for. 


Done.


Do you expect a performance impact?


I'm afraid for that and will test tomorrow. But statistic from index is exact.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.24.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-21 Thread Jeff Davis

On Wed, 2009-01-21 at 15:06 +0300, Teodor Sigaev wrote:
> Done. Now GIN counts number of pending tuples and pages and stores they on 
> metapage. Index cleanup could start during normal insertion in two cases:
> - number of pending tuples is too high to keep guaranteed non-lossy tidbitmap
> - pending page's content doesn't fit into work_mem.

Great, thanks. I will take a look at this version tonight.

Because time is short, I will mark it as "Ready for committer review"
now. I think all of the major issues have been addressed, and I'll just
be looking at the code and testing it.

> BTW, gincostestimate could use that information for cost estimation, but is 
> index opening and metapge reading in amcostestimate acceptable?

That sounds reasonable to me. I think that's what the index-specific
cost estimators are for. Do you expect a performance impact?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-21 Thread Teodor Sigaev

- after limit is reached, force cleanup of pending list by calling 
gininsertcleanup. Not very good, because users sometimes will see a huge 
execution time of simple insert. Although users who runs a huge update should be 
satisfied.


I have difficulties in a choice of way. Seems to me, the better will be second 
way: if user gets very long time of insertion then (auto)vacuum of his 
installation should tweaked.

I agree that the second solution sounds better to me.



Done. Now GIN counts number of pending tuples and pages and stores they on 
metapage. Index cleanup could start during normal insertion in two cases:

- number of pending tuples is too high to keep guaranteed non-lossy tidbitmap
- pending page's content doesn't fit into work_mem.

BTW, gincostestimate could use that information for cost estimation, but is 
index opening and metapge reading in amcostestimate acceptable?




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.23.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-20 Thread Jeff Davis

On Mon, 2009-01-19 at 19:53 +0300, Teodor Sigaev wrote:
> I see only two guaranteed solution of the problem:
> - after limit is reached, force normal index inserts. One of the motivation 
> of 
> patch was frequent question from users: why update of whole table with GIN 
> index 
> is so slow? So this way will not resolve this question.
> - after limit is reached, force cleanup of pending list by calling 
> gininsertcleanup. Not very good, because users sometimes will see a huge 
> execution time of simple insert. Although users who runs a huge update should 
> be 
> satisfied.
> 
> I have difficulties in a choice of way. Seems to me, the better will be 
> second 
> way: if user gets very long time of insertion then (auto)vacuum of his 
> installation should tweaked.
> 

I agree that the second solution sounds better to me.

With the new Visibility Map, it's more reasonable to run VACUUM more
often, so those that are inserting single tuples at a time should not
encounter the long insert time.

I'm still looking at the rest of the patch.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev




I suggest you take StdRdOptions out of the GinOptions struct, and leave
fillfactor out of ginoptions.  I don't think there's much point in
supporting options that don't actually do anything.  If the user tries
to set fillfactor for a gin index, he will get an error.  Which is a
good thing IMHO.

Oh, I see. Fixed.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.22.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Tom Lane

Alvaro Herrera  writes:
> Teodor Sigaev wrote:
>> I didn't change a recognition of fillfactor value, although GIN doesn't 
>> use it for now.

> I suggest you take StdRdOptions out of the GinOptions struct, and leave
> fillfactor out of ginoptions.  I don't think there's much point in
> supporting options that don't actually do anything.  If the user tries
> to set fillfactor for a gin index, he will get an error.  Which is a
> good thing IMHO.

+1 ... appearing to accept an option that doesn't really do anything is
likely to confuse users.  We didn't have much choice in the previous
incarnation of reloptions, but I think now we should throw errors when
we can.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Alvaro Herrera

Teodor Sigaev wrote:
>> I notice you added a fillfactor reloption in ginoptions, but does it
>> really make sense?  I recall removing it because the original code
>> contained a comment that says "this is here because default_reloptions
>> wants it, but it has no effect".
>
> I didn't change a recognition of fillfactor value, although GIN doesn't 
> use it for now.

I suggest you take StdRdOptions out of the GinOptions struct, and leave
fillfactor out of ginoptions.  I don't think there's much point in
supporting options that don't actually do anything.  If the user tries
to set fillfactor for a gin index, he will get an error.  Which is a
good thing IMHO.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev


I notice you added a fillfactor reloption in ginoptions, but does it
really make sense?  I recall removing it because the original code
contained a comment that says "this is here because default_reloptions
wants it, but it has no effect".


I didn't change a recognition of fillfactor value, although GIN doesn't use it 
for now.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Alvaro Herrera

Teodor Sigaev wrote:
> New version. Changes:
>  - synced with current CVS

I notice you added a fillfactor reloption in ginoptions, but does it
really make sense?  I recall removing it because the original code
contained a comment that says "this is here because default_reloptions
wants it, but it has no effect".

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev


Changes:
 Results of pernding list's scan now are placed directly in resulting 
tidbitmap. This saves cycles for filtering results and reduce memory usage. 
Also, it allows to not check losiness of tbm.




Is this a 100% bulletproof solution, or is it still possible for a query
to fail due to the pending list? It relies on the stats collector, so
perhaps in rare cases it could still fail?

Yes :(


Can you explain why the tbm must not be lossy?


The problem with lossy tbm has two aspects:
 - amgettuple interface hasn't possibility to work with page-wide result instead
   of exact ItemPointer. amgettuple can not return just a block number as
   amgetbitmap can.
 - Because of concurrent vacuum process: while we scan pending list, it's
   content could be transferred into regular structure of index and then we will
   find the same tuple twice. Again, amgettuple hasn't protection from that,
   only amgetbitmap has it. So, we need to filter results from regular GIN
   by results from pending list. ANd for filtering we can't use lossy tbm.

v0.21 prevents from that fail on call of gingetbitmap, because now all results 
are collected in single resulting tidbitmap.





Also, can you clarify why a large update can cause a problem? In the


If query looks like
UPDATE tbl SET col=... WHERE col ... and planner choose GIN indexscan over col 
then there is a probability of increasing of pending list over non-lossy limit.




previous discussion, you suggested that it force normal index inserts
after a threshold based on work_mem:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php


I see only two guaranteed solution of the problem:
- after limit is reached, force normal index inserts. One of the motivation of 
patch was frequent question from users: why update of whole table with GIN index 
is so slow? So this way will not resolve this question.
- after limit is reached, force cleanup of pending list by calling 
gininsertcleanup. Not very good, because users sometimes will see a huge 
execution time of simple insert. Although users who runs a huge update should be 
satisfied.


I have difficulties in a choice of way. Seems to me, the better will be second 
way: if user gets very long time of insertion then (auto)vacuum of his 
installation should tweaked.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.21.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-18 Thread Jeff Davis

On Fri, 2009-01-16 at 15:39 +0300, Teodor Sigaev wrote:
> > START_CRIT_SECTION();
> > ...
> > l = PageAddItem(...);
> > if (l == InvalidOffsetNumber)
> > elog(ERROR, "failed to add item to index page in \"%s\"",
> >  RelationGetRelationName(index));
> > 
> > It's no use using ERROR, because it will turn into PANIC, which is
> I did that similar to other GIN/GiST places. BTW, BTree  directly emits PANIC 
> if
> PageAddItem fails
> 

I'd still prefer PANIC over an ERROR that will always turn into a PANIC.
I'll leave it as you did, though.

> > 
> > 4. Heikki mentioned:
> > http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php
> > 
> > "To make things worse, a query will fail if all the matching 
> > fast-inserted tuples don't fit in the non-lossy tid bitmap."
> > 
> > That issue still remains, correct? Is there a resolution to that?
> 
> Now gincostestimate can forbid index scan by disable_cost (see Changes). Of 
> course, it doesn't prevent failure in case of large update (for example), but 
> it 
> prevents in most cases. BTW, because of sequential scan of pending list cost 
> of 
> scan grows up fast and index scan becomes non-optimal.

Is this a 100% bulletproof solution, or is it still possible for a query
to fail due to the pending list? It relies on the stats collector, so
perhaps in rare cases it could still fail?

It might be surprising though, that after an UPDATE and before a VACUUM,
the gin index just stops working (if work_mem is too low). For many
use-cases, if GIN is not used, it's just as bad as the query failing,
because it would be so slow.

Can you explain why the tbm must not be lossy?

Also, can you clarify why a large update can cause a problem? In the
previous discussion, you suggested that it force normal index inserts
after a threshold based on work_mem:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-16 Thread Teodor Sigaev


New version. Changes:
 - synced with current CVS
 - added all your changes
 - autovacuum will run if fast update mode is turned on and
   trigger of fresh tuple is fired
 - gincostestimate now tries to calculate cost of scan of pending pages.
   gincostestimate set disable_cost if it believe that tidbitmap will
   become lossy. So, tidbitmap has new method - estimation of
   maximum number of tuples with guaranteed non-lossy mode.



START_CRIT_SECTION();
...
l = PageAddItem(...);
if (l == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"",
 RelationGetRelationName(index));

It's no use using ERROR, because it will turn into PANIC, which is

I did that similar to other GIN/GiST places. BTW, BTree  directly emits PANIC if
PageAddItem fails




4. Heikki mentioned:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php

"To make things worse, a query will fail if all the matching 
fast-inserted tuples don't fit in the non-lossy tid bitmap."


That issue still remains, correct? Is there a resolution to that?


Now gincostestimate can forbid index scan by disable_cost (see Changes). Of 
course, it doesn't prevent failure in case of large update (for example), but it 
prevents in most cases. BTW, because of sequential scan of pending list cost of 
scan grows up fast and index scan becomes non-optimal.




5. I attached a newer version merged with HEAD.

Thank you


6. You defined:

#define GinPageHasFullRow(page) ( GinPageGetOpaque(page)->flags &
GIN_LIST_FULLROW )


Fixed



7.  I don't understand this chunk of code:

How can (!ItemPointerEquals(&pos->item, &item)) ever happen?

And how can (scanGetCandidate(scan, pos) == false) ever happen? Should
that be an Assert() instead?

If those can happen during normal operation, then we need a better error
message there.


It should be assert, but assert enabled and disabled code will be different :(.
In both cases, scanGetCandidate() should be called, but in assert enabled code 
we need to check return value and pos->item.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



fast_insert_gin-0.20.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-12 Thread Teodor Sigaev


Changes:
 - added vacuum_delay_point() in gininsertcleanup
 - add trigger to run vacuum by number of inserted tuples from
   last vacuum. Number of inserted tuples represents number
   of really inserted tuples in index and it is calculated as
   tuples_inserted + tuples_updated - tuples_hot_updated.
   Trigger fires on instuples > vac_base_thresh because search time is linear
   on number of pending pages (tuples)
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.17.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Heikki Linnakangas


Tom Lane wrote:

Greg Stark <[EMAIL PROTECTED]> writes:
If we do this though it would be really nice to do it at a higher  
level than the indexam. If we could do it for any indexam that  
provides a kind of bulk insert method that would be great.


I'm just not sure how to support all the indexable operators for the  
various indexams on the local buffered list.


In principle, just return all those TIDs marked "lossy, please recheck".
This is a bit brute-force but I'm not sure any useful optimization is
possible.


You could flush the local buffer to the index whenever the index is 
queried. Not sure if it's better than returning them for recheck, though.


This wouldn't work for unique indexes, BTW, but that's not a problem for 
GIN.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Tom Lane

Greg Stark <[EMAIL PROTECTED]> writes:
> If we do this though it would be really nice to do it at a higher  
> level than the indexam. If we could do it for any indexam that  
> provides a kind of bulk insert method that would be great.

> I'm just not sure how to support all the indexable operators for the  
> various indexams on the local buffered list.

In principle, just return all those TIDs marked "lossy, please recheck".
This is a bit brute-force but I'm not sure any useful optimization is
possible.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-03 Thread Greg Stark




On 3 Dec 2008, at 06:57 AM, Heikki Linnakangas <[EMAIL PROTECTED] 
> wrote:



Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
If *that* is a use case we're interested in, the incoming tuples  
could be accumulated in backend-private memory, and inserted into  
the index at commit. That would be a lot simpler, with no need to  
worry about concurrent inserts or vacuums.
Doesn't work --- the index would yield wrong answers for later  
queries

in the same transaction.


Queries would still need to check the backend-private list.



More to the point -- at least if I'm guessing right about tom's  
thoughts --queries would still have to check the heap. That is the  
backend private list would just be a proxy for buffered *index* tuples.


If we do this though it would be really nice to do it at a higher  
level than the indexam. If we could do it for any indexam that  
provides a kind of bulk insert method that would be great.


I'm just not sure how to support all the indexable operators for the  
various indexams on the local buffered list.


Incidentally buffering btree index inserts was originally Heikki's idea.



--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Heikki Linnakangas


Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
If *that* is a use case we're interested in, the incoming tuples could 
be accumulated in backend-private memory, and inserted into the index at 
commit. That would be a lot simpler, with no need to worry about 
concurrent inserts or vacuums.


Doesn't work --- the index would yield wrong answers for later queries
in the same transaction.


Queries would still need to check the backend-private list.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Tom Lane

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Teodor Sigaev wrote:
>> - Falling back to regular insert will take long time for update of whole 
>> table - and that was one of reasons of that patch. Users forget to drop 
>> GIN index before a global update and query runs forever.

> If *that* is a use case we're interested in, the incoming tuples could 
> be accumulated in backend-private memory, and inserted into the index at 
> commit. That would be a lot simpler, with no need to worry about 
> concurrent inserts or vacuums.

Doesn't work --- the index would yield wrong answers for later queries
in the same transaction.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Heikki Linnakangas


Teodor Sigaev wrote:
- Falling back to regular insert will take long time for update of whole 
table - and that was one of reasons of that patch. Users forget to drop 
GIN index before a global update and query runs forever.


If *that* is a use case we're interested in, the incoming tuples could 
be accumulated in backend-private memory, and inserted into the index at 
commit. That would be a lot simpler, with no need to worry about 
concurrent inserts or vacuums.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-12-02 Thread Teodor Sigaev

grovel through. The situation will only be rectified at the next vacuum, 
but if there's no deletes or updates on the table, just inserts, 
autovacuum won't happen until the next anti-wraparound vacuum.
There is not agreement here, see 
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]


Yet another problem is that if so much work is offloaded to autovacuum, 
it can tie up autovacuum workers for a very long time. And the work can 
happen on an unfortunate time, when the system is busy, and affect other 
queries. There's no vacuum_delay_point()s in gininsertcleanup, so 
there's no way to throttle that work.

Will insert.


I think we need a hard limit on the number of list pages, before we can 
consider accepting this patch. After the limit is full, the next 
inserter can flush the list, inserting the tuples in the list into the 
tree, or just fall back to regular, slow, inserts.


Hard limit is not very good decision
- If it will make a flush when limit is reached then sometimes insert or update 
will take unacceptable amount of time. Small limit is not very helpful, large 
will takes  a lot of time. Although if we calculate limit using work_mem setting 
then, may  be, it will be useful. Bulk insert will collect all pending pages in 
memory at once.
- Falling back to regular insert will take long time for update of whole table - 
and that was one of reasons of that patch. Users forget to drop GIN index before 
a global update and query runs forever.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-11-27 Thread Alvaro Herrera

Gregory Stark wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> 
> > I think we need a hard limit on the number of list pages, before we can
> > consider accepting this patch. After the limit is full, the next inserter 
> > can
> > flush the list, inserting the tuples in the list into the tree, or just fall
> > back to regular, slow, inserts.
> 
> I do like the idea of having the work fall to vacuum though. Perhaps we need
> some way for autovacuum to ask an access method what shape an index is in and
> whether it needs vacuuming? Or more likely a separate command from vacuum that
> specifically cleans up an index.

Yeah, this is what we agreed to on Ottawa.  We need to collect some
different stats for the GIN indexes where this is active, and ensure
that autovacuum checks them.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-11-27 Thread Gregory Stark

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

> I think we need a hard limit on the number of list pages, before we can
> consider accepting this patch. After the limit is full, the next inserter can
> flush the list, inserting the tuples in the list into the tree, or just fall
> back to regular, slow, inserts.

I do like the idea of having the work fall to vacuum though. Perhaps we need
some way for autovacuum to ask an access method what shape an index is in and
whether it needs vacuuming? Or more likely a separate command from vacuum that
specifically cleans up an index.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-11-27 Thread Heikki Linnakangas

There's a pretty fundamental issue with this patch, which is that while 
buffering the inserts in the "list pages" makes the inserts fast, all 
subsequent queries become slower until the tuples have been properly 
inserted into the index. I'm sure it's a good tradeoff in many cases, 
but there has got to be a limit to it. Currently, if you create an empty 
table, and load millions of tuples into it using INSERTs, the index 
degenerates into  just a pile of "fast" tuples that every query needs to 
grovel through. The situation will only be rectified at the next vacuum, 
but if there's no deletes or updates on the table, just inserts, 
autovacuum won't happen until the next anti-wraparound vacuum.


To make things worse, a query will fail if all the matching 
fast-inserted tuples don't fit in the non-lossy tid bitmap. That's 
another reason to limit the number of list pages; queries will start 
failing otherwise.


Yet another problem is that if so much work is offloaded to autovacuum, 
it can tie up autovacuum workers for a very long time. And the work can 
happen on an unfortunate time, when the system is busy, and affect other 
queries. There's no vacuum_delay_point()s in gininsertcleanup, so 
there's no way to throttle that work.


I think we need a hard limit on the number of list pages, before we can 
consider accepting this patch. After the limit is full, the next 
inserter can flush the list, inserting the tuples in the list into the 
tree, or just fall back to regular, slow, inserts.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-10-31 Thread Teodor Sigaev


Reworked version of fast insertion patch for GIN.


* shiftList does LockBufferForCleanup, which means that it can be blocked
for an indefinitely long time by a concurrent scan, and since it's 
holding
exclusive lock on metapage no new scans or insertions can start 
meanwhile.
This is not only horrid from a performance standpoint but it very 
probably

can result in deadlocks --- which will be deadlocks on LWLocks and thus
not even detected by the system.
Ops, I see possible scenario: UPDATE tbl SET gin_indexed_field = ... 
where gin_indexed_field   with concurrent shiftList. Will fix. Thank 
you.

Fixed, see below


* GIN index scans release lock and pin on one pending-list page before
acquiring pin and lock on the next, which means there's a race condition:
shiftList could visit and delete the next page before we get to it,
because there's a window where we're holding no buffer lock at all.

Agree, will fix.

Fixed


* There is a bigger race condition, which is that after a scan has
returned a tuple from a pending page, vacuum could move the index entry
into the main index structure, and then that same scan could return that
same index entry a second time.  This is a no-no, and I don't see any 
easy

fix.

Fixed. TIDBitmap is used for that and for preventing deadlock mentioned above.
TIDBitmap is used for collectiong matched tuples from pending pages and after 
that it used as filter for results from regular GIN's scan.


Patch extends TIDBitmap interface by 2 functions:
bool tbm_check_tuple(TIDBitmap *tbm, const ItemPointer tid);
returns true if tid already exists in bitmap
bool   tbm_has_lossy(TIDBitmap *tbm);
returns true if bitmap becomes lossy


Also, sequential scan on pending page is replaced to binary search for 
performance. It's not a big win but it might improve performance.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.15.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-25 Thread Teodor Sigaev





(a) that's not back-patchable and (b) it'll create a merge conflict with
your patch, if you're still going to add a new AM function column.
I think that aminsertcleanup per se isn't needed, but if we want an
"amanalyze" there'd still be a conflict.  Where are we on that?


I'll revert aminsertcleanup framework but leave gininsertcleanup function as is, 
because I'll not have enough time until end of summer - I'd like to finalize 
patch and fixes first.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
>   - GiST already supports both scan directions in theory, but page split may 
> change order between forward and backward scans (user-defined pageSplit 
> doesn't 
> preserve order of tuples). Holding of split until end of scan will produce 
> unacceptable concurrency level.

>   - GIN doesn't support backward scan direction and will not support in close 
> future.

Okay.  I'll see about fixing the planner to not assume that GIST or GIN
indexscans are scrollable.

The cleanest way to do this is to introduce a new bool column in pg_am
rather than hard-wiring assumptions about which AMs can do it.  However
(a) that's not back-patchable and (b) it'll create a merge conflict with
your patch, if you're still going to add a new AM function column.
I think that aminsertcleanup per se isn't needed, but if we want an
"amanalyze" there'd still be a conflict.  Where are we on that?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Teodor Sigaev


queries return the same row twice.  A bitmap indexscan plan would mask
such an index bug ... but a plain indexscan won't.


Fuh. :(. Well. Will fix.

GiST:
 - GiST already supports both scan directions in theory, but page split may 
change order between forward and backward scans (user-defined pageSplit doesn't 
preserve order of tuples). Holding of split until end of scan will produce 
unacceptable concurrency level.
 - GiST  can return one itempointer twice. It's fixable by storing content of 
current page in memory instead of just keeping page pinned. Will do (backpatches 
too).


GIN:
 - GIN doesn't support backward scan direction and will not support in close 
future.
 - Right now GIN doesn't return twice the same itempointer, but with current 
fast_insert patch it might return. So, suppose, to fix that it's enough just to 
remember itempointers returned from pending list and use it as filter for 
results from  regular structure. Will do.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> operations such as page splits.  Do we need to change the planner to
>> assume that this only works nicely for btree?

> It seems to that direction (backward or forward) has meaning only for
> indexes with amcanorder = true. With amcanorder=false results will be
> occasionally for any direction.

Well, no; amcanorder specifies that the index can return results that
are sorted according to some externally meaningful ordering.  The
question at hand is just whether the results of a single indexscan
are self-consistent.  That's a property that can reasonably be expected
to hold regardless of amcanorder; it does hold for hash indexes for
instance.  (In the case of hash we have to forbid splitting a bucket
that's actively being scanned in order to make it true.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Teodor Sigaev


operations such as page splits.  Do we need to change the planner to
assume that this only works nicely for btree?


It seems to that direction (backward or forward) has meaning only for indexes 
with amcanorder = true. With amcanorder=false results will be occasionally for 
any direction.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane

I wrote:
> Really?  Then GiST needs to be fixed too.  Otherwise you risk having
> queries return the same row twice.  A bitmap indexscan plan would mask
> such an index bug ... but a plain indexscan won't.

BTW, there's another issue I forgot about yesterday, which is that
the planner assumes that all index AMs work correctly for backwards
scan.  The place where the rubber meets the road here is that
if you DECLARE SCROLL CURSOR for a plan implemented as a plain
indexscan, then FETCH BACKWARDS is supposed to reliably generate
results consistent with previous FETCH FORWARDS, to wit the same
tuples in the reverse order.

We can assume that the query is using an MVCC snapshot, which means
that at the index level it's okay for the index to return newly-inserted
entries that weren't returned in the previous forward scan, or to not
return entries that were removed meanwhile by VACUUM.  But re-ordering
live tuples is bad news.

The idea of copying the pending-tuples list into local scan state would
make this work as expected as far as the proposed patch goes, but I'm
wondering whether the behavior isn't completely broken anyway by
operations such as page splits.  Do we need to change the planner to
assume that this only works nicely for btree?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> * There is a bigger race condition, which is that after a scan has
>> returned a tuple from a pending page, vacuum could move the index entry
>> into the main index structure, and then that same scan could return that
>> same index entry a second time.  This is a no-no, and I don't see any easy
>> fix.

> Hmm, isn't it allowed for indexes? At least GiST has this behaviour from its 
> birth date.

Really?  Then GiST needs to be fixed too.  Otherwise you risk having
queries return the same row twice.  A bitmap indexscan plan would mask
such an index bug ... but a plain indexscan won't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-24 Thread Teodor Sigaev


* shiftList() holds an exclusive lock on metapage throughout its run,
which means that it's impossible for two of them to run concurrently.
So why bother with "concurrent deletion" detection?
Because metapage is locked immediately before shiftList call, while  metapage is 
 unlocked another process could produce locking metapage and execution of 
shiftList. So, when shiftList starts it should check of already deleted page. If 
shiftList sees already deleted page then it doesn't do anything and reports to 
the caller.



* shiftList does LockBufferForCleanup, which means that it can be blocked
for an indefinitely long time by a concurrent scan, and since it's holding
exclusive lock on metapage no new scans or insertions can start meanwhile.
This is not only horrid from a performance standpoint but it very probably
can result in deadlocks --- which will be deadlocks on LWLocks and thus
not even detected by the system.
Ops, I see possible scenario: UPDATE tbl SET gin_indexed_field = ... where 
gin_indexed_field   with concurrent shiftList. Will fix. Thank you.


Nevertheless,  shiftList should be fast in typical scenario: it doesn't do 
complicated work but just marks as deleted pages which already was readed before.



* GIN index scans release lock and pin on one pending-list page before
acquiring pin and lock on the next, which means there's a race condition:
shiftList could visit and delete the next page before we get to it,
because there's a window where we're holding no buffer lock at all.

Agree, will fix.


* It seems also possible that once a list page has been marked
GIN_DELETED, it could be re-used for some other purpose before a
scan-in-flight reaches it -- reused either as a regular index page or as a
Impossible - because deletion is running from the head of list and scan too. But 
deletion locks metapage and locks pages for cleanup. So, scan may start only 
from not yet deleted page and will go through the list before deletion process.




* There is a bigger race condition, which is that after a scan has
returned a tuple from a pending page, vacuum could move the index entry
into the main index structure, and then that same scan could return that
same index entry a second time.  This is a no-no, and I don't see any easy
fix.
Hmm, isn't it allowed for indexes? At least GiST has this behaviour from its 
birth date.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane

Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> It's a mess:

> These are rather severe problems.  Maybe there's a better solution, but
> perhaps it would be good enough to lock out concurrent access to the
> index while the bulkinsert procedure is working.

Ugh...

The idea I was toying with was to not allow GIN scans to "stop" on
pending-insertion pages; rather, they should suck out all the matching
tuple IDs into backend-local memory as fast as they can, and then return
the TIDs to the caller one at a time from that internal array.  Then,
when the scan is later visiting the main part of the index, it could
check each matching TID against that array to see if it'd already
returned the TID.  (So it might be an idea to sort the TID array after
gathering it, to make those subsequent checks fast via binary search.)

This would cost in backend-local memory, of course, but hopefully not
very much.  The advantages are the elimination of the deadlock risk
from scan-blocks-insertcleanup-blocks-insert, and fixing the race
condition when a TID previously seen in the pending list is moved to
the main index.  There were still a number of locking issues to fix
but I think they're all relatively easy to deal with.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Alvaro Herrera

Tom Lane wrote:
> Teodor Sigaev <[EMAIL PROTECTED]> writes:

> I didn't get much further than that because I got discouraged after
> looking at the locking issues around the pending-insertions list.
> It's a mess:

These are rather severe problems.  Maybe there's a better solution, but
perhaps it would be good enough to lock out concurrent access to the
index while the bulkinsert procedure is working.




-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz

Here is the GIN fast-insert patch back again.  Changes:

* Sync with CVS HEAD
* Clean up documentation and some of the code comments
* Fix up custom reloptions code
* Suppress some compiler warnings

I didn't get much further than that because I got discouraged after
looking at the locking issues around the pending-insertions list.
It's a mess:

* shiftList() holds an exclusive lock on metapage throughout its run,
which means that it's impossible for two of them to run concurrently.
So why bother with "concurrent deletion" detection?

* shiftList does LockBufferForCleanup, which means that it can be blocked
for an indefinitely long time by a concurrent scan, and since it's holding
exclusive lock on metapage no new scans or insertions can start meanwhile.
This is not only horrid from a performance standpoint but it very probably
can result in deadlocks --- which will be deadlocks on LWLocks and thus
not even detected by the system.

* GIN index scans release lock and pin on one pending-list page before
acquiring pin and lock on the next, which means there's a race condition:
shiftList could visit and delete the next page before we get to it,
because there's a window where we're holding no buffer lock at all.
I think this isn't fatal in itself, since presumably the data in the next
page has been moved into the main index and we can scan it later, but the
scan code isn't checking whether the page has been deleted out from under
it.

* It seems also possible that once a list page has been marked
GIN_DELETED, it could be re-used for some other purpose before a
scan-in-flight reaches it -- reused either as a regular index page or as a
new list page.  Neither case is being defended against.  It might be that
the new-list-page case isn't a problem, or it might not.

* There is a bigger race condition, which is that after a scan has
returned a tuple from a pending page, vacuum could move the index entry
into the main index structure, and then that same scan could return that
same index entry a second time.  This is a no-no, and I don't see any easy
fix.

I haven't really finished reviewing this code, but I'm going to bounce it
back to you to see if you can solve the locking problems.  Unless that can
be made safe there is no point doing any more work on this patch.

regards, tom lane



bineMnFgM4K6g.bin
Description: fast_insert_gin-0.10.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane

I wrote:
>> Yeah, I was going to complain about that next :-).  Autovacuum isn't
>> going to trigger as a result of INSERT operations; somehow we have
>> to teach it what to do for GIN indexes.  I remember we discussed this
>> at PGCon but I don't think we decided exactly what to do...

One simple idea is to call aminsertcleanup (probably renamed to
something else like amanalyzehook) during ANALYZE.  This seems a bit
grotty, but it has the very attractive property that we don't need to
give the autovacuum control logic any special knowledge about GIN
indexes.  Either inserts or updates will lead it to trigger either
auto-ANALYZE or auto-VACUUM, and either way GIN gets a cleanup
opportunity.

A possible argument against this is that if we later fix things so
that VACUUM and ANALYZE can happen concurrently on the same table,
amanalyzehook could get called concurrently with ambulkdelete or
other vacuum-support operations.  So the AM author would have to
take care to interlock that safely.  But this doesn't seem like
a big deal to me --- interlocks against regular inserts/updates
are probably a harder problem anyway.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> So, may be we just move insertcleanup call to   
> ginbulkdelete/ginvacuumcleanup 
> but leave aminsertcleanup field in pg_proc for a future.

I'd be inclined not to add the extra AM call if we aren't going to use
it now.  There's no very good reason to think that a definition we
settled on today would be exactly the right thing for whatever future
need might appear.  Better to wait till we have a concrete example to
design around.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Teodor Sigaev


once, for regular VACUUM I think you really have to call it within
each bulkdelete operation.  


Exactly what I did in last patch.


There's probably no point in optimizing
it away in VACUUM FULL either, since surely it'll be fast to call
index_cleanup when there's nothing in the pending list?


Sure, with empty pending list insertcleanup will just lock/unlock metapage.


Yeah, I was going to complain about that next :-).  Autovacuum isn't
going to trigger as a result of INSERT operations; somehow we have
to teach it what to do for GIN indexes.  I remember we discussed this
at PGCon but I don't think we decided exactly what to do...
So, may be we just move insertcleanup call to   ginbulkdelete/ginvacuumcleanup 
but leave aminsertcleanup field in pg_proc for a future.



I've already made a number of changes to the patch; let me keep working
on it and send it back to you later.

ok

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Teodor Sigaev


Well, if that is required to be true then this whole design is pretty
broken, because VACUUM doesn't hold any lock that would guarantee that
no insert happens between the two calls.  If we fold the two AM calls
into one call then it'd be okay for the index AM to take such a lock
transiently during the single index-cleanup-plus-bulkdelete call.
Actually, lock doesn't needed. Just bulkdelete should not try to remove not yet 
"insertcleanuped" items pointer. That's easy because VacPageList is prepared 
before insertcleanup call.




Maybe it'd be better if ambulkdelete *did* scan the pending list?


I don't like that idea because it requires to add a lot of code (concurrent 
deletion of pages in list), much simpler to call insertcleanup inside 
ginbulkdelete/ginvacuumcleanup.



You'd still need at least page-level locking but perhaps not anything
stronger.


That's close to trivial to revert this piece to add cleanup call to 
ginbulkdelete/ginvacuumcleanup. Early variants used this variant.

Reasons for new variant was:
 - defining needing of call of insertcleanup, and stats argument was used for
   it in both function. If it's a NULL then call cleanup.
 - I thought about statistic-based trigger for separate call of insertcleanup.
   Trigger should be fired on massive insert/update events very similar to
   trigger on massive delete for ambulkdelete. I'm very sorry but I didn't do it
   yet, and definitely I need some help here.

Do I revert that piece?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I started to look at this.  I don't understand why VACUUM does an insert
>> cleanup before starting to vacuum, but VACUUM FULL doesn't?

> Hmm. May be I missed something, but I don't understand where and what... I 
> tried 
> to track all places of ambultdelete call. aminsertcleanup should be called 
> before any ambulkdelete, because ambulkdelete doesn't scan pending list which 
> can store items to be deleted and hence index will store item pointers to 
> absent 
> tuples.

>> needed is the one at vacuum startup, which tempts me to propose that
>> the new AM entry point should be called "amvacuumstartup", instead of
>> wiring in the assumption that what it's for is specifically cleanup
>> of insertions.

> That's possible but inserts into index should be forbidden between 
> amvacuumstartup and last call of ambulkdelete.

Well, if that is required to be true then this whole design is pretty
broken, because VACUUM doesn't hold any lock that would guarantee that
no insert happens between the two calls.  If we fold the two AM calls
into one call then it'd be okay for the index AM to take such a lock
transiently during the single index-cleanup-plus-bulkdelete call.

For VACUUM FULL there's no such issue because the whole table is locked,
but I still don't see any real point in having two successive index AM
calls when the AM could perfectly well do all the work in one call.

Maybe it'd be better if ambulkdelete *did* scan the pending list?
You'd still need at least page-level locking but perhaps not anything
stronger.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> That's close to trivial to revert this piece to add cleanup call to 
> ginbulkdelete/ginvacuumcleanup. Early variants used this variant.

Yeah, I think we should do it that way.

On reflection I don't think you even need the amvacuumstartup call,
because it is *not* safe to assume that an index cleanup operation
there will guarantee that vacuum won't try to remove pending tuples.
Remember that a tuple inserted by a transaction that later aborted
is DEAD and can be reclaimed instantly by VACUUM.  So while in the
case of VACUUM FULL it might be okay to call index_cleanup only
once, for regular VACUUM I think you really have to call it within
each bulkdelete operation.  There's probably no point in optimizing
it away in VACUUM FULL either, since surely it'll be fast to call
index_cleanup when there's nothing in the pending list?

>   - I thought about statistic-based trigger for separate call of 
> insertcleanup.
> Trigger should be fired on massive insert/update events very similar to
> trigger on massive delete for ambulkdelete. I'm very sorry but I didn't 
> do it
> yet, and definitely I need some help here.

Yeah, I was going to complain about that next :-).  Autovacuum isn't
going to trigger as a result of INSERT operations; somehow we have
to teach it what to do for GIN indexes.  I remember we discussed this
at PGCon but I don't think we decided exactly what to do...

> Do I revert that piece?

I've already made a number of changes to the patch; let me keep working
on it and send it back to you later.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Teodor Sigaev


I started to look at this.  I don't understand why VACUUM does an insert
cleanup before starting to vacuum, but VACUUM FULL doesn't?


Hmm. May be I missed something, but I don't understand where and what... I tried 
to track all places of ambultdelete call. aminsertcleanup should be called 
before any ambulkdelete, because ambulkdelete doesn't scan pending list which 
can store items to be deleted and hence index will store item pointers to absent 
tuples.



needed is the one at vacuum startup, which tempts me to propose that
the new AM entry point should be called "amvacuumstartup", instead of
wiring in the assumption that what it's for is specifically cleanup
of insertions.


That's possible but inserts into index should be forbidden between 
amvacuumstartup and last call of ambulkdelete.





Comments?  I can make the change if you think it's okay --- I'm busy
cleaning up docs and comments at the moment.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-22 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz
> I still havn't clearness of acceptability for suggested aminsertcleanup  
> calling.

I started to look at this.  I don't understand why VACUUM does an insert
cleanup before starting to vacuum, but VACUUM FULL doesn't?

I don't particularly like the idea of adding aminsertcleanup calls
immediately before other AM operations such as ambulkdelete.  It seems
to me that those operations ought to include the cleanup subroutine
themselves, if they need it; they shouldn't depend on callers to get
this right.  Offhand it looks to me like the only new index AM call
needed is the one at vacuum startup, which tempts me to propose that
the new AM entry point should be called "amvacuumstartup", instead of
wiring in the assumption that what it's for is specifically cleanup
of insertions.

Comments?  I can make the change if you think it's okay --- I'm busy
cleaning up docs and comments at the moment.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-13 Thread Teodor Sigaev


Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz


need more review of fast_insert yet?  It looked like a number of people
commented on it already.


I still havn't clearness of acceptability for suggested aminsertcleanup  
calling.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-11 Thread Teodor Sigaev


I've committed the multicolumn one with minor revisions (fix some poor
English in docs and comments, add regression-test coverage).  Do you

Thank you very much.


need more review of fast_insert yet?  It looked like a number of people
commented on it already.
I should modify it to support/synchronize with multicolumn GIN - both patches 
touch the same pieces of code, and I didn't make a single patch to simplify review.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-11 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> http://www.sigaev.ru/misc/fast_insert_gin-0.7.gz
> http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz

I've committed the multicolumn one with minor revisions (fix some poor
English in docs and comments, add regression-test coverage).  Do you
need more review of fast_insert yet?  It looked like a number of people
commented on it already.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-09 Thread Josh Berkus

Neil,

> I was tasked with reviewing this for the current commit fest, although
> so far I've just been working on grokking the rest of the GIN code. But
> if you'd like to review it instead, that's fine with me.

I have plenty of other stuff I could assign you if you're not needed on 
GIN.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-09 Thread Neil Conway

On Tue, 2008-07-08 at 14:51 -0400, Tom Lane wrote:
> I'd still like to take a look.

I was tasked with reviewing this for the current commit fest, although
so far I've just been working on grokking the rest of the GIN code. But
if you'd like to review it instead, that's fine with me.

-Neil



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-08 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I looked this over and it looks good in general. 

> May I think that patch passed review and commit it?

I'd still like to take a look.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-08 Thread Teodor Sigaev

I looked this over and it looks good in general. 

May I think that patch passed review and commit it?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-07 Thread Alvaro Herrera

Teodor Sigaev wrote:
>> I looked this over and it looks good in general.  I was only wondering
>> about for single-column indexes -- we're storing attribute numbers too,
>> right?
> No, GinState->oneCol field signals to GinFormTuple and  
> gin_index_getattr/gintuple_get_attrnum about actual storage.
>
> Single column index is binary compatible with current index :)

Ah, neat!

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-07 Thread Teodor Sigaev


I looked this over and it looks good in general.  I was only wondering
about for single-column indexes -- we're storing attribute numbers too,
right?
No, GinState->oneCol field signals to GinFormTuple and 
gin_index_getattr/gintuple_get_attrnum about actual storage.


Single column index is binary compatible with current index :)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-07 Thread Alvaro Herrera

Teodor Sigaev wrote:
> Sync with current CVS HEAD and post in hackers- too because patches- 
> close to the closing.
>
> http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz

I looked this over and it looks good in general.  I was only wondering
about for single-column indexes -- we're storing attribute numbers too,
right?  Would it be too difficult to strip them out in that case?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-02 Thread Teodor Sigaev

Sync with current CVS HEAD and post in hackers- too because patches- close to 
the closing.


http://www.sigaev.ru/misc/fast_insert_gin-0.7.gz
http://www.sigaev.ru/misc/multicolumn_gin-0.3.gz

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

63 matches

Mail list logo