Re: [HACKERS] more than one index in a single heap pass?

2009-07-17 Thread decibel

On Jul 15, 2009, at 2:52 PM, Dimitri Fontaine wrote:

Le 15 juil. 09 à 02:01, Glen Parker a écrit :
Sounds to me like another reason to separate index definition from  
creation.  If an index can be defined but not yet created or  
valid, then you could imagine syntax like:


DEFINE INDEX blahblah1 ON mytable (some fields);
DEFINE INDEX blahblah2 ON mytable (some other fields);
[RE]INDEX TABLE mytable;

...provided that REINDEX TABLE could recreate all indexes  
simultaneously as you suggest.


Well to me it sounded much more like:
 BEGIN;
  CREATE INDEX idx_a ON t(a) DEFERRED;
  CREATE INDEX idx_b ON t(b) DEFERRED;
 COMMIT;

And at commit time, PostgreSQL would build all the transaction  
indexes in one pass over the heap, but as Tom already pointed out,  
using only 1 CPU. Maybe that'd be a way to limit the overall io  
bandwidth usage while not consuming too many CPU resources at the  
same time.


I mean now we have a choice to either sync scan the table heap on  
multiple CPU, saving IO but using 1 CPU per index, or to limit CPU  
to only 1 but then scan the heap once per index. The intermediary  
option of using 1 CPU while still making a single heap scan sure  
can be worthwhile to some?



Here's an off-the-wall thought... since most of the CPU time is in  
the sort, what about allowing a backend to fork off dedicated sort  
processes? Aside from building multiple indexes at once, that  
functionality could also be useful in general queries.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-17 Thread Heikki Linnakangas
decibel wrote:
 Here's an off-the-wall thought... since most of the CPU time is in the
 sort, what about allowing a backend to fork off dedicated sort
 processes? Aside from building multiple indexes at once, that
 functionality could also be useful in general queries.

Sure, that would be cool. And also a lot of work :-). The comparison
operators can be arbitrarily complex, potentially querying other tables
etc, so you would indeed need pretty much all the infrastrucutre you
need to solve the general case.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-15 Thread Bruce Momjian
Andrew Dunstan wrote:
  Wasn't that a big part of the point of the parallel pg_restore feature?
 

 
 Well, yes, it's some of it, and in theory Tom's late addition of a queue 
 that gets all the dependencies of a table as soon as the table data is 
 restored should make that work better.  But of course, that's not the 
 only time indexes are created, and each index creation command will be 
 doing its own heap processing, albeit that synchronised scanning will 
 make that lots cheaper.
 
 As I said originally, it was just an idle thought that came to me today.

Well, TODO has:

Allow multiple indexes to be created concurrently, ideally via a
single heap scan, and have pg_restore use it

Isn't this already largely done by parallel pg_restore work? 

so we have to decide if we still want that item.  I think what we don't
have is a way to create multiple indexes simultaneously via SQL.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-15 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Well, TODO has:

   Allow multiple indexes to be created concurrently, ideally via a
   single heap scan, and have pg_restore use it

   Isn't this already largely done by parallel pg_restore work? 

 so we have to decide if we still want that item.  I think what we don't
 have is a way to create multiple indexes simultaneously via SQL.

And if we did build that, people would be bleating because it could only
make use of one CPU.  I think multiple backends using the existing
syncscan infrastructure meets this need pretty well already.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-15 Thread Dimitri Fontaine

Hi,

Le 15 juil. 09 à 02:01, Glen Parker a écrit :
Sounds to me like another reason to separate index definition from  
creation.  If an index can be defined but not yet created or valid,  
then you could imagine syntax like:


DEFINE INDEX blahblah1 ON mytable (some fields);
DEFINE INDEX blahblah2 ON mytable (some other fields);
[RE]INDEX TABLE mytable;

...provided that REINDEX TABLE could recreate all indexes  
simultaneously as you suggest.


Well to me it sounded much more like:
 BEGIN;
  CREATE INDEX idx_a ON t(a) DEFERRED;
  CREATE INDEX idx_b ON t(b) DEFERRED;
 COMMIT;

And at commit time, PostgreSQL would build all the transaction indexes  
in one pass over the heap, but as Tom already pointed out, using only  
1 CPU. Maybe that'd be a way to limit the overall io bandwidth usage  
while not consuming too many CPU resources at the same time.


I mean now we have a choice to either sync scan the table heap on  
multiple CPU, saving IO but using 1 CPU per index, or to limit CPU to  
only 1 but then scan the heap once per index. The intermediary option  
of using 1 CPU while still making a single heap scan sure can be  
worthwhile to some?


Regards,
--
dim
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-14 Thread Alvaro Herrera
Andrew Dunstan wrote:

 I was just wondering idly today if we could usefully build a number of  
 indexes at the same time in a single pass over the heap, or could it be  
 that we wouldn't gain much? I haven't even got around to thinking about  
 any syntax for it.

Could we make it work on two backends building one index each in
synchronized scans?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-14 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Andrew Dunstan wrote:
 I was just wondering idly today if we could usefully build a number of  
 indexes at the same time in a single pass over the heap, or could it be  
 that we wouldn't gain much? I haven't even got around to thinking about  
 any syntax for it.

 Could we make it work on two backends building one index each in
 synchronized scans?

Don't we more or less have that already?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-14 Thread Greg Stark
On Tue, Jul 14, 2009 at 8:50 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Andrew Dunstan wrote:
 I was just wondering idly today if we could usefully build a number of
 indexes at the same time in a single pass over the heap, or could it be
 that we wouldn't gain much? I haven't even got around to thinking about
 any syntax for it.

 Could we make it work on two backends building one index each in
 synchronized scans?

 Don't we more or less have that already?

Wasn't that a big part of the point of the parallel pg_restore feature?

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-14 Thread Andrew Dunstan



Greg Stark wrote:

On Tue, Jul 14, 2009 at 8:50 PM, Tom Lanet...@sss.pgh.pa.us wrote:
  

Alvaro Herrera alvhe...@commandprompt.com writes:


Andrew Dunstan wrote:
  

I was just wondering idly today if we could usefully build a number of
indexes at the same time in a single pass over the heap, or could it be
that we wouldn't gain much? I haven't even got around to thinking about
any syntax for it.


Could we make it work on two backends building one index each in
synchronized scans?
  

Don't we more or less have that already?



Wasn't that a big part of the point of the parallel pg_restore feature?

  


Well, yes, it's some of it, and in theory Tom's late addition of a queue 
that gets all the dependencies of a table as soon as the table data is 
restored should make that work better.  But of course, that's not the 
only time indexes are created, and each index creation command will be 
doing its own heap processing, albeit that synchronised scanning will 
make that lots cheaper.


As I said originally, it was just an idle thought that came to me today.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more than one index in a single heap pass?

2009-07-14 Thread Glen Parker

Andrew Dunstan wrote:
Well, yes, it's some of it, and in theory Tom's late addition of a queue 
that gets all the dependencies of a table as soon as the table data is 
restored should make that work better.  But of course, that's not the 
only time indexes are created, and each index creation command will be 
doing its own heap processing, albeit that synchronised scanning will 
make that lots cheaper.


As I said originally, it was just an idle thought that came to me today.



Sounds to me like another reason to separate index definition from 
creation.  If an index can be defined but not yet created or valid, then 
you could imagine syntax like:


DEFINE INDEX blahblah1 ON mytable (some fields);
DEFINE INDEX blahblah2 ON mytable (some other fields);
[RE]INDEX TABLE mytable;

...provided that REINDEX TABLE could recreate all indexes simultaneously 
as you suggest.


-Glen


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers