Re: [HACKERS] more than one index in a single heap pass?
On Jul 15, 2009, at 2:52 PM, Dimitri Fontaine wrote: Le 15 juil. 09 à 02:01, Glen Parker a écrit : Sounds to me like another reason to separate index definition from creation. If an index can be defined but not yet created or valid, then you could imagine syntax like: DEFINE INDEX blahblah1 ON mytable (some fields); DEFINE INDEX blahblah2 ON mytable (some other fields); [RE]INDEX TABLE mytable; ...provided that REINDEX TABLE could recreate all indexes simultaneously as you suggest. Well to me it sounded much more like: BEGIN; CREATE INDEX idx_a ON t(a) DEFERRED; CREATE INDEX idx_b ON t(b) DEFERRED; COMMIT; And at commit time, PostgreSQL would build all the transaction indexes in one pass over the heap, but as Tom already pointed out, using only 1 CPU. Maybe that'd be a way to limit the overall io bandwidth usage while not consuming too many CPU resources at the same time. I mean now we have a choice to either sync scan the table heap on multiple CPU, saving IO but using 1 CPU per index, or to limit CPU to only 1 but then scan the heap once per index. The intermediary option of using 1 CPU while still making a single heap scan sure can be worthwhile to some? Here's an off-the-wall thought... since most of the CPU time is in the sort, what about allowing a backend to fork off dedicated sort processes? Aside from building multiple indexes at once, that functionality could also be useful in general queries. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
decibel wrote: Here's an off-the-wall thought... since most of the CPU time is in the sort, what about allowing a backend to fork off dedicated sort processes? Aside from building multiple indexes at once, that functionality could also be useful in general queries. Sure, that would be cool. And also a lot of work :-). The comparison operators can be arbitrarily complex, potentially querying other tables etc, so you would indeed need pretty much all the infrastrucutre you need to solve the general case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Andrew Dunstan wrote: Wasn't that a big part of the point of the parallel pg_restore feature? Well, yes, it's some of it, and in theory Tom's late addition of a queue that gets all the dependencies of a table as soon as the table data is restored should make that work better. But of course, that's not the only time indexes are created, and each index creation command will be doing its own heap processing, albeit that synchronised scanning will make that lots cheaper. As I said originally, it was just an idle thought that came to me today. Well, TODO has: Allow multiple indexes to be created concurrently, ideally via a single heap scan, and have pg_restore use it Isn't this already largely done by parallel pg_restore work? so we have to decide if we still want that item. I think what we don't have is a way to create multiple indexes simultaneously via SQL. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Bruce Momjian br...@momjian.us writes: Well, TODO has: Allow multiple indexes to be created concurrently, ideally via a single heap scan, and have pg_restore use it Isn't this already largely done by parallel pg_restore work? so we have to decide if we still want that item. I think what we don't have is a way to create multiple indexes simultaneously via SQL. And if we did build that, people would be bleating because it could only make use of one CPU. I think multiple backends using the existing syncscan infrastructure meets this need pretty well already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Hi, Le 15 juil. 09 à 02:01, Glen Parker a écrit : Sounds to me like another reason to separate index definition from creation. If an index can be defined but not yet created or valid, then you could imagine syntax like: DEFINE INDEX blahblah1 ON mytable (some fields); DEFINE INDEX blahblah2 ON mytable (some other fields); [RE]INDEX TABLE mytable; ...provided that REINDEX TABLE could recreate all indexes simultaneously as you suggest. Well to me it sounded much more like: BEGIN; CREATE INDEX idx_a ON t(a) DEFERRED; CREATE INDEX idx_b ON t(b) DEFERRED; COMMIT; And at commit time, PostgreSQL would build all the transaction indexes in one pass over the heap, but as Tom already pointed out, using only 1 CPU. Maybe that'd be a way to limit the overall io bandwidth usage while not consuming too many CPU resources at the same time. I mean now we have a choice to either sync scan the table heap on multiple CPU, saving IO but using 1 CPU per index, or to limit CPU to only 1 but then scan the heap once per index. The intermediary option of using 1 CPU while still making a single heap scan sure can be worthwhile to some? Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Andrew Dunstan wrote: I was just wondering idly today if we could usefully build a number of indexes at the same time in a single pass over the heap, or could it be that we wouldn't gain much? I haven't even got around to thinking about any syntax for it. Could we make it work on two backends building one index each in synchronized scans? -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Alvaro Herrera alvhe...@commandprompt.com writes: Andrew Dunstan wrote: I was just wondering idly today if we could usefully build a number of indexes at the same time in a single pass over the heap, or could it be that we wouldn't gain much? I haven't even got around to thinking about any syntax for it. Could we make it work on two backends building one index each in synchronized scans? Don't we more or less have that already? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
On Tue, Jul 14, 2009 at 8:50 PM, Tom Lanet...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Andrew Dunstan wrote: I was just wondering idly today if we could usefully build a number of indexes at the same time in a single pass over the heap, or could it be that we wouldn't gain much? I haven't even got around to thinking about any syntax for it. Could we make it work on two backends building one index each in synchronized scans? Don't we more or less have that already? Wasn't that a big part of the point of the parallel pg_restore feature? -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Greg Stark wrote: On Tue, Jul 14, 2009 at 8:50 PM, Tom Lanet...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Andrew Dunstan wrote: I was just wondering idly today if we could usefully build a number of indexes at the same time in a single pass over the heap, or could it be that we wouldn't gain much? I haven't even got around to thinking about any syntax for it. Could we make it work on two backends building one index each in synchronized scans? Don't we more or less have that already? Wasn't that a big part of the point of the parallel pg_restore feature? Well, yes, it's some of it, and in theory Tom's late addition of a queue that gets all the dependencies of a table as soon as the table data is restored should make that work better. But of course, that's not the only time indexes are created, and each index creation command will be doing its own heap processing, albeit that synchronised scanning will make that lots cheaper. As I said originally, it was just an idle thought that came to me today. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more than one index in a single heap pass?
Andrew Dunstan wrote: Well, yes, it's some of it, and in theory Tom's late addition of a queue that gets all the dependencies of a table as soon as the table data is restored should make that work better. But of course, that's not the only time indexes are created, and each index creation command will be doing its own heap processing, albeit that synchronised scanning will make that lots cheaper. As I said originally, it was just an idle thought that came to me today. Sounds to me like another reason to separate index definition from creation. If an index can be defined but not yet created or valid, then you could imagine syntax like: DEFINE INDEX blahblah1 ON mytable (some fields); DEFINE INDEX blahblah2 ON mytable (some other fields); [RE]INDEX TABLE mytable; ...provided that REINDEX TABLE could recreate all indexes simultaneously as you suggest. -Glen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers