Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Andres Freund
On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
 On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund and...@2ndquadrant.com wrote:
  On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
   On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund and...@2ndquadrant.com
  
  wrote:
   Could you clarify what do you mean here by cleanup?
   I am afraid I do not get your point here.
  
  Sorry, was a bit tired when writing the above.
  
  The point is that to work concurrent the CONCURRENT operations
  commit/start multiple transactions internally. It can be interrupted
  (user, shutdown, error,
  crash) and leave transient state behind every time it does so. What I
  wanted to
  say is that we need to take care that each of those can easily be cleaned
  up
  afterwards.
 
 Sure, many errors may happen.
 But, in the case of CREATE INDEX CONCURRENTLY, there is no clean up method
 implemented as far as I know (might be missing something though). Isn't an
 index only considered as invalid in case of failure for concurrent creation?
Well, you can DROP or REINDEX the invalid index.

There are several scenarios where you can get invalid indexes. Unique 
violations, postgres restarts, aborted index creation...
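
A minimal sketch of the manual cleanup mentioned above, assuming the invalid indexes are found by checking pg_index.indisvalid. The helper names (quote_ident, cleanup_statements) and the sample row are hypothetical and only illustrate generating the DROP or REINDEX commands; nothing here is part of PostgreSQL itself.

```python
# Catalog query listing the indexes currently marked invalid.
FIND_INVALID_INDEXES = """
SELECT n.nspname, c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
"""


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def cleanup_statements(rows, action="DROP"):
    """Build one DROP INDEX or REINDEX INDEX statement per (schema, index) row."""
    keywords = {"DROP": "DROP INDEX", "REINDEX": "REINDEX INDEX"}
    if action not in keywords:
        raise ValueError("action must be 'DROP' or 'REINDEX'")
    return [
        f"{keywords[action]} {quote_ident(schema)}.{quote_ident(name)};"
        for schema, name in rows
    ]


# Hypothetical row, shaped like the output of FIND_INVALID_INDEXES.
for statement in cleanup_statements([("public", "idx_orders_customer")], "REINDEX"):
    print(statement)
```

The statements are only generated, not executed, so they can be reviewed first; an invalid index that backs a constraint would probably need different handling.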

 In the case of REINDEX it would be essential to create such a cleanup
 mechanism as I cannot imagine a production database with an index that has
 been marked as invalid due to a concurrent reindex failure, by assuming here,
 of course, that REINDEX CONCURRENTLY would use the same level of process
 error as CREATE INDEX CONCURRENTLY.
Not sure what you're getting at?

 One of the possible cleanup mechanisms I got on top of my head is a
 callback at transaction abort, each callback would need to be different for
 each subtransaction used at during the concurrent operation.
 In case the callback itself fails, well the old and/or new indexes become
 invalid.
That's not going to work. E.g. the session might have been aborted or such.
Also, there is not much you can do from a callback at transaction end as you
cannot do catalog modifications.

I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely similar.
 
2. no support for concurrent on system tables (not easy for shared
catalogs)
   
   Doesn't this exclude all the tables that are in the schema catalog?
  
  No. Only SELECT array_to_string(array_agg(relname), ', ') FROM pg_class
  WHERE relisshared AND relkind = 'r';
  their toast tables and their indexes are shared. The problem is that for
  those you cannot create a separate index and let it update concurrently
  because you cannot write into each databases pg_class/pg_index.

 Yes indeed, I didn't think about things that are shared among databases.
 Blocking that is pretty simple, only a matter of places checked.

It's just a bit sad to make the thing not really appear lockless ;)


Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Michael Paquier
On Wed, Sep 26, 2012 at 8:13 PM, Andres Freund and...@2ndquadrant.com wrote:

 On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
  On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund and...@2ndquadrant.com
 wrote:
   On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund 
 and...@2ndquadrant.com
   
   wrote:
Could you clarify what do you mean here by cleanup?
I am afraid I do not get your point here.
  
   Sorry, was a bit tired when writing the above.
  
   The point is that to work concurrent the CONCURRENT operations
   commit/start multiple transactions internally. It can be interrupted
   (user, shutdown, error,
   crash) and leave transient state behind every time it does so. What I
   wanted to
   say is that we need to take care that each of those can easily be
 cleaned
   up
   afterwards.
 
  Sure, many errors may happen.
  But, in the case of CREATE INDEX CONCURRENTLY, there is no clean up
 method
  implemented as far as I know (might be missing something though). Isn't
 an
  index only considered as invalid in case of failure for concurrent
 creation?
 Well, you can DROP or REINDEX the invalid index.

 There are several scenarios where you can get invalid indexes. Unique
 violations, postgres restarts, aborted index creation...

  In the case of REINDEX it would be essential to create such a cleanup
  mechanism as I cannot imagine a production database with an index that
 has
  been marked as invalid due to a concurrent reindex failure, by assuming
 here,
  of course, that REINDEX CONCURRENTLY would use the same level of process
  error as CREATE INDEX CONCURRENTLY.
 Not sure what youre getting at?

I just meant that when CREATE INDEX CONCURRENTLY fails, the index created is
considered invalid, so it cannot be used by the planner.

Based on what you described before:
1) build new index with indisready = false
newindex.indisready = true
wait
2) newindex.indisvalid = true
wait
3) swap(oldindex.relfilenode, newindex.relfilenode)
oldindex.indisvalid = false
wait
4) oldindex.indisready = false
wait
drop new index with old relfilenode

If the reindex fails at step 1 or 2, the new index is not usable, so the
relation is left with an index that is not valid. If it fails at step 4,
the old index is invalid. If it fails at step 3, both indexes are valid and
both are usable for the given relation.
Do you think it is acceptable to consider that the user has to do the
cleanup of the old or new index himself if there is a failure?


  One of the possible cleanup mechanisms I got on top of my head is a
  callback at transaction abort, each callback would need to be different
 for
  each subtransaction used at during the concurrent operation.
  In case the callback itself fails, well the old and/or new indexes become
  invalid.
 Thats not going to work. E.g. the session might have been aborted or such.
 Also, there is not much you can do from an callback at transaction end as
 you
 cannot do catalog modifications.

 I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely
 similar.

You could also reissue the reindex command and avoid an additional command.
When launching a concurrent reindex, it would be possible to check whether
there is already an index that was created to replace the old one by a run
that failed previously. To control that, why not add an additional field to
pg_index? When creating a new index concurrently, we register in its pg_index
entry the oid of the index that it has to replace. When reissuing the command
after a failure, it is then possible to check whether there is already an
index created by a previous REINDEX CONCURRENTLY command and, based on the
flag values of the old and new indexes, to replay the command from the step
where it previously failed.
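
One way to picture the replay idea is the sketch below. It assumes the old and new indexes have already been paired (for example through the additional pg_index field proposed above, which does not exist today) and that only indisready and indisvalid are inspected. The function name resume_point and its return values are hypothetical, and the step numbers refer to steps 1) to 4) listed earlier in this message.

```python
def resume_point(new_ready, new_valid, old_ready, old_valid):
    """Guess where a reissued command would pick up, from the pg_index flags.

    Assumption: each step sets its flag only once the step has completed, and
    the relfilenode swap is committed together with oldindex.indisvalid = false.
    """
    if not new_ready:
        return "step 1"      # build did not finish: rebuild the new index
    if not new_valid:
        return "step 2"      # new index is maintained but not yet valid
    if old_valid:
        return "step 3"      # both indexes valid: swap has not happened yet
    if old_ready:
        return "step 4"      # old index invalidated but still maintained
    return "final drop"      # only the leftover index remains to be dropped


# Failure during step 2: the new index is ready but was never marked valid.
print(resume_point(new_ready=True, new_valid=False, old_ready=True, old_valid=True))
```

The flags alone cannot tell a half-finished replacement index apart from an ordinary invalid one, which is why some extra bookkeeping seems to be needed.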
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Andres Freund
On Wednesday, September 26, 2012 02:39:36 PM Michael Paquier wrote:
 Do you think it is acceptable to consider that the user has to do the
 cleanup of the old or new index himself if there is a failure?
The problem I see is that if you want the thing to be efficient you might end
up doing step 1) for all or a bunch of indexes, then 2), and so on. In that
case you can have loads of invalid indexes around.

 You could also reissue the reindex command and avoid an additional command.
 When launching a concurrent reindex, it could be possible to check if there
 is already an index that has been created to replace the old one that failed
 previously. In order to control that, why not adding an additional field in
 pg_index?
 When creating a new index concurrently, we register in its pg_index entry
 the oid of the index that it has to replace. When reissuing the command
 after a failure, it is then possible to check if there is already an index
 that has been issued by a previous REINDEX CONCURRENT command and based on
 the flag values of the old and new indexes it is then possible to replay the
 command from the step where it previously failed.
I don't really like this idea but we might end up there anyway because we
probably need to keep track of whether an index is actually only a replacement
index that shouldn't exist on its own. Otherwise it's hard to know which
indexes to drop if it failed halfway through.

Greetings,

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_reorg in core?

2012-09-26 Thread Bruce Momjian
On Mon, Sep 24, 2012 at 03:55:35PM -0700, Josh Berkus wrote:
 On 9/24/12 3:43 PM, Simon Riggs wrote:
  On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:
 
  For me, the Postgres user interface should include
  * REINDEX CONCURRENTLY
 
  I don't see why we don't have REINDEX CONCURRENTLY now.
  
  Same reason for everything on (anyone's) TODO list.
 
 Yes, I'm just pointing out that it would be a very small patch for
 someone, and that AFAIK it didn't make it on the TODO list yet.

I see it on the TODO list, and it has been there for years:

https://wiki.postgresql.org/wiki/Todo#Indexes
Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY 

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_reorg in core?

2012-09-25 Thread Andres Freund
On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
 On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund and...@2ndquadrant.com wrote:
  On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
   On 9/24/12 3:43 PM, Simon Riggs wrote:
On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:
For me, the Postgres user interface should include
* REINDEX CONCURRENTLY

I don't see why we don't have REINDEX CONCURRENTLY now.

Same reason for everything on (anyone's) TODO list.
   
   Yes, I'm just pointing out that it would be a very small patch for
   someone, and that AFAIK it didn't make it on the TODO list yet.
  
  Its not *that* small.
  
  1. You need more than you can do with CREATE INDEX CONCURRENTLY and DROP
  INDEX
  CONCURRENTLY because the index can e.g. be referenced by a foreign key
  constraint. So you need to replace the existing index oid with a new one
  by swapping the relfilenodes of both after verifying several side
  conditions (indcheckxmin, indisvalid, indisready).
  
  It would probably have to look like:
  
  - build new index with indisready = false
  - newindex.indisready = true
  - wait
  - newindex.indisvalid = true
  - wait
  - swap(oldindex.relfilenode, newindex.relfilenode)
  - oldindex.indisvalid = false
  - wait
  - oldindex.indisready = false
  - wait
  - drop new index with old relfilenode
  
  Every wait indicates an externally visible state which you might
  encounter/need
  to cleanup...
 
 Could you clarify what do you mean here by cleanup?
 I am afraid I do not get your point here.
Sorry, was a bit tired when writing the above.

The point is that to work concurrently the CONCURRENT operations commit/start
multiple transactions internally. Such an operation can be interrupted (user,
shutdown, error, crash) and leave transient state behind every time it does
so. What I wanted to say is that we need to take care that each of those
states can easily be cleaned up afterwards.

  2. no support for concurrent on system tables (not easy for shared
  catalogs)
 Doesn't this exclude all the tables that are in the schema catalog?
No. Only the tables returned by

SELECT array_to_string(array_agg(relname), ', ') FROM pg_class WHERE
relisshared AND relkind = 'r';

plus their toast tables and their indexes are shared. The problem is that for
those you cannot create a separate index and let it update concurrently
because you cannot write into each database's pg_class/pg_index.

  3. no support for the indexes of exclusion constraints (not hard I think)
 This just consists in a check of indisready in pg_index.
It will probably be several places, but yeah, I don't think it's hard.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_reorg in core?

2012-09-25 Thread Michael Paquier
On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund and...@2ndquadrant.com wrote:

 On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
  On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  Could you clarify what do you mean here by cleanup?
  I am afraid I do not get your point here.

 Sorry, was a bit tired when writing the above.

 The point is that to work concurrent the CONCURRENT operations commit/start
 multiple transactions internally. It can be interrupted (user, shutdown,
 error,
 crash) and leave transient state behind every time it does so. What I
 wanted to
 say is that we need to take care that each of those can easily be cleaned
 up
 afterwards.

Sure, many errors may happen.
But in the case of CREATE INDEX CONCURRENTLY, there is no cleanup method
implemented as far as I know (I might be missing something though). Isn't an
index simply considered invalid when concurrent creation fails?
In the case of REINDEX it would be essential to create such a cleanup
mechanism, as I cannot imagine a production database with an index that has
been marked as invalid due to a concurrent reindex failure, assuming here, of
course, that REINDEX CONCURRENTLY would use the same level of error handling
as CREATE INDEX CONCURRENTLY.

One possible cleanup mechanism off the top of my head is a callback at
transaction abort; each callback would need to be different for each
subtransaction used during the concurrent operation.
If the callback itself fails, well, the old and/or new indexes become
invalid.



   2. no support for concurrent on system tables (not easy for shared
   catalogs)
  Doesn't this exclude all the tables that are in the schema catalog?
 No. Only

 SELECT array_to_string(array_agg(relname), ', ') FROM pg_class WHERE
 relisshared AND relkind = 'r';

 their toast tables and their indexes are shared. The problem is that for
 those
 you cannot create a separate index and let it update concurrently because
 you
 cannot write into each databases pg_class/pg_index.

Yes indeed, I didn't think about things that are shared among databases.
Blocking that is pretty simple, only a matter of places checked.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-25 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes:
 For me, the Postgres user interface should include
 * REINDEX CONCURRENTLY
 * CLUSTER CONCURRENTLY
 * ALTER TABLE CONCURRENTLY
 and also that autovacuum would be expanded to include REINDEX and
 CLUSTER, renaming it to automaint.

FWIW, +1 to all those user requirements, and for not having pg_reorg
simply moved as-is nearer to core. I would paint the shed autoheal,
maybe.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] pg_reorg in core?

2012-09-25 Thread Michael Paquier
On Wed, Sep 26, 2012 at 4:42 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 Simon Riggs si...@2ndquadrant.com writes:
  For me, the Postgres user interface should include
  * REINDEX CONCURRENTLY
  * CLUSTER CONCURRENTLY
  * ALTER TABLE CONCURRENTLY
  and also that autovacuum would be expanded to include REINDEX and
  CLUSTER, renaming it to automaint.

 FWIW, +1 to all those user requirements, and for not having pg_reorg
 simply moved as-is nearer to core. I would paint the shed autoheal,
 maybe.

Yes, completely agreed.
Based on what Simon is suggesting, the REINDEX and CLUSTER extensions
are prerequisites for the autovacuum extension. It would need to use a
mechanism that is slightly different from pg_reorg. ALTER TABLE could use
something close to pg_reorg by creating a new table and then swapping the two
tables. The cases of column drop and addition particularly need some thought.

I would like to work on such features and provide patches for the first two.
This will of course strongly depend on the time I can spend on it in the next
couple of months.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Alvaro Herrera
Excerpts from Daniele Varrazzo's message of dom sep 23 22:02:51 -0300 2012:
 On Mon, Sep 24, 2012 at 12:23 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 
  As proposed by Masahiko, a single organization grouping all the tools (one
  repository per tool) would be enough. Please note that github can also host
  documentation. Bug tracker would be tool-dedicated in this case.
 
 From this PoV, pgFoundry allows your tool to be under
 http://yourtool.projects.postgresql.org instead of under a more
 generic namespace: I find it a nice and cozy place in the url space
 where to put your project. If pgFoundry will be dismissed I hope at
 least a hosting service for static pages will remain.

I don't think that has been offered.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Roberto Mello
On Sat, Sep 22, 2012 at 3:25 AM, Satoshi Nagayasu sn...@uptime.jp wrote:

 To solve this problem, I would like to have some umbrella project.
 It would be called pg dba utils, or something like this.
 This umbrella project may contain several third-party tools (pg_reorg,
 pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.

Great idea!

+1

Roberto Mello




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Simon Riggs
On 21 September 2012 08:42, Michael Paquier michael.paqu...@gmail.com wrote:


 On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada umi.tan...@gmail.com
 wrote:

 I'm not familiar with pg_reorg, but I wonder why we need a separate
 program for this task.  I know pg_reorg is ok as an external program
 per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
 pessimistic about) in the same way, it's much nicer than having
 additional binary + extension.  Isn't it possible to do the same thing
 above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?

 CLUSTER might be more adapted in this case as the purpose is to reorder the
 table.
 The same technique used by pg_reorg (aka table coupled with triggers) could
 lower the lock access of the table.
 Also, it could be possible to control each sub-operation in the same fashion
 way as CREATE INDEX CONCURRENTLY.
 By the way, whatever the operation, VACUUM or CLUSTER used, I got a couple
 of doubts:
 1) isn't it be too costly for a core operation as pg_reorg really needs many
 temporary objects? Could be possible to reduce the number of objects created
 if added to core though...
 2) Do you think the current CLUSTER is enough and are there wishes to
 implement such an optimization directly in core?


For me, the Postgres user interface should include
* REINDEX CONCURRENTLY
* CLUSTER CONCURRENTLY
* ALTER TABLE CONCURRENTLY
and also that autovacuum would be expanded to include REINDEX and
CLUSTER, renaming it to automaint.

The actual implementation mechanism for those probably looks something
like pg_reorg, but I don't see it as preferable to include the utility
directly into core, though potentially some of the underlying code
might be.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Satoshi Nagayasu

2012/09/25 0:15, Simon Riggs wrote:

On 21 September 2012 08:42, Michael Paquier michael.paqu...@gmail.com wrote:



On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada umi.tan...@gmail.com
wrote:


I'm not familiar with pg_reorg, but I wonder why we need a separate
program for this task.  I know pg_reorg is ok as an external program
per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
pessimistic about) in the same way, it's much nicer than having
additional binary + extension.  Isn't it possible to do the same thing
above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?


CLUSTER might be more adapted in this case as the purpose is to reorder the
table.
The same technique used by pg_reorg (aka table coupled with triggers) could
lower the lock access of the table.
Also, it could be possible to control each sub-operation in the same fashion
way as CREATE INDEX CONCURRENTLY.
By the way, whatever the operation, VACUUM or CLUSTER used, I got a couple
of doubts:
1) isn't it be too costly for a core operation as pg_reorg really needs many
temporary objects? Could be possible to reduce the number of objects created
if added to core though...
2) Do you think the current CLUSTER is enough and are there wishes to
implement such an optimization directly in core?



For me, the Postgres user interface should include
* REINDEX CONCURRENTLY
* CLUSTER CONCURRENTLY
* ALTER TABLE CONCURRENTLY
and also that autovacuum would be expanded to include REINDEX and
CLUSTER, renaming it to automaint.

The actual implementation mechanism for those probably looks something
like pg_reorg, but I don't see it as preferable to include the utility
directly into core, though potentially some of the underlying code
might be.


I think it depends on what trade-off we can see.

AFAIK, basically, rebuilding tables and/or indexes has
a trade-off between lock-free operation and disk space.

So, if we have enough disk space to build a temporary
table/index when rebuilding a table/index, a concurrent
rebuild would be a great option, and I would love to have
it in core.

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Christopher Browne
On Mon, Sep 24, 2012 at 10:17 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Excerpts from Daniele Varrazzo's message of dom sep 23 22:02:51 -0300 2012:
 On Mon, Sep 24, 2012 at 12:23 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  As proposed by Masahiko, a single organization grouping all the tools (one
  repository per tool) would be enough. Please note that github can also host
  documentation. Bug tracker would be tool-dedicated in this case.

 From this PoV, pgFoundry allows your tool to be under
 http://yourtool.projects.postgresql.org instead of under a more
 generic namespace: I find it a nice and cozy place in the url space
 where to put your project. If pgFoundry will be dismissed I hope at
 least a hosting service for static pages will remain.

 I don't think that has been offered.

But I don't think it's necessarily the case that pgFoundry is getting
dismissed, either.

I got a note from Marc Fournier not too long ago (sent to some
probably-not-small set of people with pgFoundry accounts) indicating
that they were planning to upgrade gForge as far as they could, and
then switch to FusionForge http://fusionforge.org/, which is
evidently the successor.  It shouldn't be assumed that the upgrade
process will be easy or quick.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Josh Berkus

 For me, the Postgres user interface should include
 * REINDEX CONCURRENTLY

I don't see why we don't have REINDEX CONCURRENTLY now.  When I was
writing out the instructions for today's update, I was thinking we
already have all the commands for this.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Simon Riggs
On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:

 For me, the Postgres user interface should include
 * REINDEX CONCURRENTLY

 I don't see why we don't have REINDEX CONCURRENTLY now.

Same reason for everything on (anyone's) TODO list.

Lack of vision is not holding us back, we just need the vision to realise it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Josh Berkus
On 9/24/12 3:43 PM, Simon Riggs wrote:
 On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:

 For me, the Postgres user interface should include
 * REINDEX CONCURRENTLY

 I don't see why we don't have REINDEX CONCURRENTLY now.
 
 Same reason for everything on (anyone's) TODO list.

Yes, I'm just pointing out that it would be a very small patch for
someone, and that AFAIK it didn't make it on the TODO list yet.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Andres Freund
On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
 On 9/24/12 3:43 PM, Simon Riggs wrote:
  On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:
  For me, the Postgres user interface should include
  * REINDEX CONCURRENTLY
  
  I don't see why we don't have REINDEX CONCURRENTLY now.
  
  Same reason for everything on (anyone's) TODO list.
 
 Yes, I'm just pointing out that it would be a very small patch for
 someone, and that AFAIK it didn't make it on the TODO list yet.
It's not *that* small.

1. You need more than you can do with CREATE INDEX CONCURRENTLY and DROP INDEX 
CONCURRENTLY because the index can e.g. be referenced by a foreign key 
constraint. So you need to replace the existing index oid with a new one by 
swapping the relfilenodes of both after verifying several side conditions 
(indcheckxmin, indisvalid, indisready).

It would probably have to look like:

- build new index with indisready = false
- newindex.indisready = true
- wait
- newindex.indisvalid = true
- wait
- swap(oldindex.relfilenode, newindex.relfilenode)
- oldindex.indisvalid = false
- wait
- oldindex.indisready = false
- wait
- drop new index with old relfilenode

Every wait indicates an externally visible state which you might encounter/need
to clean up...
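
The states left visible at each wait can be enumerated with a small sketch. It models only the two pg_index flags named in the list above; the relfilenode swap, indcheckxmin and the final drop are not modelled, and the names IndexFlags and visible_states are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexFlags:
    indisready: bool  # index is maintained by inserts/updates
    indisvalid: bool  # index may be used for queries


def visible_states():
    """Return (label, new index flags, old index flags) as seen at each wait."""
    new = IndexFlags(indisready=False, indisvalid=False)  # freshly built index
    old = IndexFlags(indisready=True, indisvalid=True)    # index being replaced
    states = []

    new = replace(new, indisready=True)
    states.append(("wait 1", new, old))

    new = replace(new, indisvalid=True)
    states.append(("wait 2", new, old))

    # swap(oldindex.relfilenode, newindex.relfilenode) would happen here;
    # this sketch tracks the flags only.
    old = replace(old, indisvalid=False)
    states.append(("wait 3", new, old))

    old = replace(old, indisready=False)
    states.append(("wait 4", new, old))
    return states


for label, new, old in visible_states():
    print(label, "new:", new, "old:", old)
```

An interruption at any of these four points leaves the corresponding flag combination in the catalog, and that is the state a cleanup would have to recognise.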

To make it viable to use that system-wide it might be necessary to batch the
individual steps together for multiple indexes because all that waiting is
going to suck if you do it for every single table in the database while you
also have long-running queries...
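
To illustrate why batching matters, here is a rough sketch that only counts waits. It assumes each of the four phases from the list above needs exactly one wait and that one wait can cover every index in a batch; PHASES, one_by_one, batched and count_waits are hypothetical names for this illustration.

```python
PHASES = [
    "build new index; newindex.indisready = true",
    "newindex.indisvalid = true",
    "swap relfilenodes; oldindex.indisvalid = false",
    "oldindex.indisready = false",
]


def one_by_one(indexes):
    """Run all phases for one index before moving on to the next index."""
    ops = []
    for idx in indexes:
        for phase in PHASES:
            ops.append((idx, phase))
            ops.append((idx, "wait"))
        ops.append((idx, "drop"))
    return ops


def batched(indexes):
    """Run each phase for every index, then wait once for the whole batch."""
    ops = []
    for phase in PHASES:
        for idx in indexes:
            ops.append((idx, phase))
        ops.append(("*", "wait"))
    for idx in indexes:
        ops.append((idx, "drop"))
    return ops


def count_waits(ops):
    """Count the wait operations in a schedule."""
    return sum(1 for _, op in ops if op == "wait")


indexes = ["idx_a", "idx_b", "idx_c"]
print(count_waits(one_by_one(indexes)), count_waits(batched(indexes)))
```

The number of waits stays constant in the batched schedule, at the cost of having many indexes in an intermediate state at the same time.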

2. no support for concurrent on system tables (not easy for shared catalogs)

3. no support for the indexes of exclusion constraints (not hard I think)

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_reorg in core?

2012-09-24 Thread Michael Paquier
On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund and...@2ndquadrant.com wrote:

 On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
  On 9/24/12 3:43 PM, Simon Riggs wrote:
   On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:
   For me, the Postgres user interface should include
   * REINDEX CONCURRENTLY
  
   I don't see why we don't have REINDEX CONCURRENTLY now.
  
   Same reason for everything on (anyone's) TODO list.
 
  Yes, I'm just pointing out that it would be a very small patch for
  someone, and that AFAIK it didn't make it on the TODO list yet.
 Its not *that* small.

 1. You need more than you can do with CREATE INDEX CONCURRENTLY and DROP
 INDEX
 CONCURRENTLY because the index can e.g. be referenced by a foreign key
 constraint. So you need to replace the existing index oid with a new one by
 swapping the relfilenodes of both after verifying several side conditions
 (indcheckxmin, indisvalid, indisready).

 It would probably have to look like:

 - build new index with indisready = false
 - newindex.indisready = true
 - wait
 - newindex.indisvalid = true
 - wait
 - swap(oldindex.relfilenode, newindex.relfilenode)
 - oldindex.indisvalid = false
 - wait
 - oldindex.indisready = false
 - wait
 - drop new index with old relfilenode

 Every wait indicates an externally visible state which you might
 encounter/need
 to cleanup...

Could you clarify what you mean here by cleanup?
I am afraid I do not get your point here.


 2. no support for concurrent on system tables (not easy for shared
 catalogs)

Doesn't this exclude all the tables that are in the schema catalog?



 3. no support for the indexes of exclusion constraints (not hard I think)

This just consists of a check of indisready in pg_index.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-23 Thread Satoshi Nagayasu
2012/09/23 12:37, Greg Sabino Mullane wrote:
 
 
 I think it's time to consider some *umbrella project* for maintaining
 several small projects outside the core.

 Well, that was pgfoundry, and it didn't work out.
 
 I'm not sure that is quite analogous to what was being proposed.
 I read it as more of let's package a bunch of these small utilities
 together into a single project, such that installing one installs them
 all (e.g. aptitude install pg_tools), and they all have a single bug
 tracker, etc. That tracker could be github, of course.

Exactly --- I do not care about the SCM system though. :)

 I'm not convinced of the merit of that plan, but that's an alternative
 interpretation that doesn't involve our beloved pgfoundry. :)

For example, xlogdump had not been maintained for 5 years when
I picked it up last year. And the latest pg_filedump that supports 9.2
has not been released yet. pg_reorg as well.

If those tools are in a single project, it would be easier to keep
attention on them. Then, developers can easily build *all of them*
at once, fix them, and post any patch on the single mailing list.
Actually, it would save developers from wasting their time.

From my viewpoint, it's not just an SCM or distribution issue.
It's about how such small projects around the core can survive
even if they cannot get into the core.

Regards,

 
 Oh, and -1 for putting it in core. Way too early, and not
 important enough.
 


-- 
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_reorg in core?

2012-09-23 Thread Michael Paquier
On Mon, Sep 24, 2012 at 1:14 AM, Satoshi Nagayasu sn...@uptime.jp wrote:

 2012/09/23 12:37, Greg Sabino Mullane wrote:
  I think it's time to consider some *umbrella project* for maintaining
  several small projects outside the core.
 
  Well, that was pgfoundry, and it didn't work out.
 
  I'm not sure that is quite analogous to what was being proposed.
  I read it as more of let's package a bunch of these small utilities
  together into a single project, such that installing one installs them
  all (e.g. aptitude install pg_tools), and they all have a single bug
  tracker, etc. That tracker could be github, of course.

 Exactly, though I do not care which SCM system is used. :)

The bug tracker is going to be a mess if it has to manage 100 subprojects,
given that each of them is strictly independent.
Each tool also has different maintainers.



  I'm not convinced of the merit of that plan, but that's an alternative
  interpretation that doesn't involve our beloved pgfoundry. :)

 For example, xlogdump had not been maintained for 5 years when
 I picked it up last year. And the latest pg_filedump that supports 9.2
 has not been released yet. pg_reorg as well.

 If those tools are in a single project, it would be easier to keep
 attention on them. Then, developers can easily build *all of them*
 at once, fix them, and post any patch on the single mailing list.
 Actually, it would save developers from wasting their time.

 From my viewpoint, it's not just an SCM or distribution issue.
 It's about how such small projects around the core can survive
 even if they cannot get into core.

The package manager could easily be pgxn. It is already designed
for that.
For development, what you are looking for here is something that github
could manage perfectly.
As proposed by Masahiko, a single organization grouping all the tools (one
repository per tool) would be enough. Please note that github can also host
documentation. The bug tracker would be per-tool in this case.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread M.Sakamoto
Hi,
I'm sakamoto, maintainer of reorg.

 What could be also great is to move the project directly into github to
 facilitate its maintenance and development.
No argument from me there, especially as I have my own fork in github,
but that's up to the current maintainers.
Yup, I think development on CVS (on pgfoundry) is a bit awkward for
me, and github would be a suitable place.

To be honest, we have few development resources available, so
no additional features have been added recently. But features and fixes
to be done have piled up, as Josh summed up.

In the short term, within this month I'll release a minor version upgrade
of reorg to support PostgreSQL 9.2. And I think it's time to
reconsider the way we maintain pg_reorg.
I'm happy that Josh and Michael are interested in reorg,
and I would like you to become maintainers :)

I think we can discuss this on the reorg list.

M.Sakamoto NTT OSS Center




Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Satoshi Nagayasu
(2012/09/22 11:01), sakamoto wrote:
 (2012/09/22 10:02), Christopher Browne wrote:

 If the present project is having a tough time doing enhancements, I 
 should think it mighty questionable to try to draw it into core, that 
 presses it towards a group of already very busy developers.

 On the other hand, if the present development efforts can be made more 
 public, by having them take place in a more public repository, that at 
 least has potential to let others in the community see and 
 participate.  There are no guarantees, but privacy is liable to hurt.

 I wouldn't expect any sudden huge influx of developers, but a steady 
 visible stream of development effort would be mighty useful to a 
 merge into core argument.

 A *lot* of projects are a lot like this.  On the Slony project, we 
 have tried hard to maintain this sort of visibility.  Steve Singer, 
 Jan Wieck and I do our individual efforts on git repos visible at 
 GitHub to ensure ongoing efforts aren't invisible inside a corporate 
 repo.  It hasn't led to any massive influx of extra developers, but I am 
 always grateful to see Peter Eisentraut's bug reports.

 
 Agreed.  What the reorg project needs first is transparency, including
 issue tracking, bugs, a list of todo items, clarified release schedules,
 quality assurance and so forth.
 Only after all that is done can the discussion about putting it into core
 start.
 
 Until now, reorg has been developed and maintained behind a corporate repository.
 But now that its activity has slowed, what I should do as a maintainer is to
 make the development process more public and find someone to cooperate with :)

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

As you pointed out, the problem here is that it's difficult to keep
enough eyeballs and development resource on tiny projects outside
the core.

For example, NTT OSSC has created lots of tools, but they have
difficulty keeping them maintained because of limited
development resources. There are different code repositories, different
web sites, different issue tracking systems and different dev mailing
lists for different small projects. My xlogdump as well.

Actually, that's the reason why it's difficult to keep enough eyeballs
on small third-party projects. And also the reason why some developers
want to push their tools into the core, isn't it? :)

To solve this problem, I would like to have some umbrella project.
It would be called pg dba utils, or something like this.
This umbrella project may contain several third-party tools (pg_reorg,
pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.

It may also have a single web site, code repository, issue tracking
system and developer mailing list in order to share development
resources for testing, maintaining and releasing. I think it would help
third-party projects keep enough eyeballs even outside the core.

Of course, if a third-party project has a faster development pace
and enough eyeballs to maintain it, it's ok for it to be an independent
project. However, when a tool has already matured and has fewer eyeballs,
it needs to be merged into this umbrella project.

Any comments?

 
 Sakamoto
 
 


-- 
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Pavel Stehule
2012/9/22 Satoshi Nagayasu sn...@uptime.jp:
 (2012/09/22 11:01), sakamoto wrote:
 (2012/09/22 10:02), Christopher Browne wrote:

 If the present project is having a tough time doing enhancements, I
 should think it mighty questionable to try to draw it into core, that
 presses it towards a group of already very busy developers.

 On the other hand, if the present development efforts can be made more
 public, by having them take place in a more public repository, that at
 least has potential to let others in the community see and
 participate.  There are no guarantees, but privacy is liable to hurt.

 I wouldn't expect any sudden huge influx of developers, but a steady
 visible stream of development effort would be mighty useful to a
 merge into core argument.

 A *lot* of projects are a lot like this.  On the Slony project, we
 have tried hard to maintain this sort of visibility.  Steve Singer,
 Jan Wieck and I do our individual efforts on git repos visible at
 GitHub to ensure ongoing efforts aren't invisible inside a corporate
 repo.  It hasn't led to any massive influx of extra developers, but I am
 always grateful to see Peter Eisentraut's bug reports.


 Agreed.  What the reorg project needs first is transparency, including
 issue tracking, bugs, a list of todo items, clarified release schedules,
 quality assurance and so forth.
 Only after all that is done can the discussion about putting it into core
 start.

 Until now, reorg has been developed and maintained behind a corporate repository.
 But now that its activity has slowed, what I should do as a maintainer is to
 make the development process more public and find someone to cooperate with :)

 I think it's time to consider some *umbrella project* for maintaining
 several small projects outside the core.

 As you pointed out, the problem here is that it's difficult to keep
 enough eyeballs and development resource on tiny projects outside
 the core.

 For example, NTT OSSC has created lots of tools, but they have
 difficulty keeping them maintained because of limited
 development resources. There are different code repositories, different
 web sites, different issue tracking systems and different dev mailing
 lists for different small projects. My xlogdump as well.

 Actually, that's the reason why it's difficult to keep enough eyeballs
 on small third-party projects. And also the reason why some developers
 want to push their tools into the core, isn't it? :)

 To solve this problem, I would like to have some umbrella project.
 It would be called pg dba utils, or something like this.
 This umbrella project may contain several third-party tools (pg_reorg,
 pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.

 It may also have a single web site, code repository, issue tracking
 system and developer mailing list in order to share development
 resources for testing, maintaining and releasing. I think it would help
 third-party projects keep enough eyeballs even outside the core.

 Of course, if a third-party project has a faster development pace
 and enough eyeballs to maintain it, it's ok for it to be an independent
 project. However, when a tool has already matured and has fewer eyeballs,
 it needs to be merged into this umbrella project.

 Any comments?


good idea

Pavel


 Sakamoto




 --
 Satoshi Nagayasu sn...@uptime.jp
 Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Daniele Varrazzo
On Fri, Sep 21, 2012 at 9:45 AM, M.Sakamoto
sakamoto_masahiko...@lab.ntt.co.jp wrote:
 Hi,
 I'm sakamoto, maintainer of reorg.

 What could be also great is to move the project directly into github to
 facilitate its maintenance and development.
No argument from me there, especially as I have my own fork in github,
but that's up to the current maintainers.
 Yup, I am thinking development on CVS(onPgfoundry) is a bit awkward for
 me and github would be a suitable place.

Hello Sakamoto-san

I have created a reorg organization on github: https://github.com/reorg/
You are welcome to become one of the owners of the organization. I
have already added Itagaki Takahiro as owner because he has a github
account. If you open a github account or give me the email of one you
own I will invite you as organization owner. Michael is also a member of
the organization.

I have re-converted the original CVS repository as Michael's
conversion was missing the commit email info, but I have rebased his
commits on the new master. My intention is to track CVS commits into
the cvs branch of the repos and merge them into the master, until
official development is moved to git.

The repository is at https://github.com/reorg/pg_reorg. Because I'm
not sure yet about a few details (from the development model to the
committers emails) it may be rebased in the near future, until
everything has been decided.

Thank you very much.

-- Daniele




Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Peter Eisentraut
On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:
 I think it's time to consider some *umbrella project* for maintaining
 several small projects outside the core.

Well, that was pgfoundry, and it didn't work out.





Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Christopher Browne
On Sat, Sep 22, 2012 at 7:45 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:
 I think it's time to consider some *umbrella project* for maintaining
 several small projects outside the core.

 Well, that was pgfoundry, and it didn't work out.

There seem to be some efforts to update it, but yeah, the software
behind it didn't age gracefully, and it seems doubtful to me that
people will be flocking back to pgfoundry.

The other ongoing attempt at an umbrella is PGXN, and it's different
enough in approach that, while it's not obvious that it'll succeed, if
it fails, the failure wouldn't involve the same set of issues that
made pgfoundry problematic.

PGXN notably captures metadata about the project; resources (e.g.
SCM) don't have to be kept there.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?




Re: [HACKERS] pg_reorg in core?

2012-09-22 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 I think it's time to consider some *umbrella project* for maintaining
 several small projects outside the core.

 Well, that was pgfoundry, and it didn't work out.

I'm not sure that is quite analogous to what was being proposed. 
I read it as more of let's package a bunch of these small utilities 
together into a single project, such that installing one installs them 
all (e.g. aptitude install pg_tools), and they all have a single bug 
tracker, etc. That tracker could be github, of course.

I'm not convinced of the merit of that plan, but that's an alternative 
interpretation that doesn't involve our beloved pgfoundry. :)

Oh, and -1 for putting it in core. Way too early, and not 
important enough.

- -- 
Greg Sabino Mullane g...@turnstep.com
PGP Key: 0x14964AC8 201209222334
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAlBeg/AACgkQvJuQZxSWSsjL5ACgimT71B4lSb1ELhgMw5EBzAKs
xHIAn08vxGzmM6eSmDfZfxlJDTousq7h
=KgXW
-END PGP SIGNATURE-






Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread Daniele Varrazzo
On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt schmi...@gmail.com wrote:

 If the argument for moving pg_reorg into core is faster and easier
 development, well I don't really buy that.

I don't see any problem in having pg_reorg in PGXN instead.

I've tried adding a META.json to the project and it seems to work fine
with the pgxn client. It is available, together with other patches, in my
own github fork.

https://github.com/dvarrazzo/pg_reorg/
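
For readers unfamiliar with PGXN, a distribution is described by a META.json file at its root. The Python sketch below builds a minimal one and checks that the required top-level keys are present. It is not the file from the fork above: every value (versions, abstract, maintainer, file path, license) is a placeholder assumption, and the list of required keys follows the PGXN Meta Spec as best recalled, so check it against the spec before relying on it.

```python
import json

# Placeholder values throughout; this is NOT the META.json from the fork.
meta = {
    "name": "pg_reorg",
    "abstract": "Reorganize tables with minimal locking",  # placeholder text
    "version": "0.0.0",  # placeholder, not a real release number
    "maintainer": "Maintainer Name <maintainer@example.org>",  # placeholder
    "license": "bsd",  # assumption
    "provides": {
        "pg_reorg": {
            "file": "lib/pg_reorg.sql",  # hypothetical path
            "version": "0.0.0",  # placeholder
        }
    },
    "meta-spec": {"version": "1.0.0", "url": "http://pgxn.org/meta/spec.txt"},
}

# Top-level keys the Meta Spec requires (as recalled; verify against the spec).
REQUIRED_KEYS = ["name", "version", "abstract", "maintainer",
                 "license", "provides", "meta-spec"]


def missing_required_keys(candidate):
    """Return the required top-level keys absent from `candidate`."""
    return [key for key in REQUIRED_KEYS if key not in candidate]


meta_json = json.dumps(meta, indent=2)

if __name__ == "__main__":
    print(meta_json)
    print("missing:", missing_required_keys(meta))
```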

I haven't submitted it to PGXN as I prefer the original author to keep
the ownership.

-- Daniele




Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread Michael Paquier
On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo 
daniele.varra...@gmail.com wrote:

 On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt schmi...@gmail.com
 wrote:

  If the argument for moving pg_reorg into core is faster and easier
  development, well I don't really buy that.

 I don't see any problem in having pg_reorg in PGXN instead.

 I've tried adding a META.json to the project and it seems to work fine
 with the pgxn client. It is available, together with other patches, in my
 own github fork.

 https://github.com/dvarrazzo/pg_reorg/

 I haven't submitted it to PGXN as I prefer the original author to keep
 the ownership.

Thanks, I merged your patches with the dev branch for the time being.
It would be great to have some input from the maintainers of pg_reorg in
pgfoundry to see if they agree about putting it in pgxn.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread Michael Paquier
On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada umi.tan...@gmail.com wrote:

 I'm not familiar with pg_reorg, but I wonder why we need a separate
 program for this task.  I know pg_reorg is ok as an external program
 per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
 pessimistic about) in the same way, it's much nicer than having
 additional binary + extension.  Isn't it possible to do the same thing
 above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?

CLUSTER might be better suited in this case, as the purpose is to reorder the
table.
The same technique used by pg_reorg (i.e. a table coupled with triggers) could
reduce the locking needed on the table.
Also, it could be possible to control each sub-operation in the same
way as CREATE INDEX CONCURRENTLY.
By the way, whichever operation is used, VACUUM or CLUSTER, I have a couple
of doubts:
1) Wouldn't it be too costly for a core operation, as pg_reorg really needs
many temporary objects? It might be possible to reduce the number of objects
created if added to core, though...
2) Do you think the current CLUSTER is enough, or is there a wish to
implement such an optimization directly in core?
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread sakamoto

(2012/09/21 22:32), Michael Paquier wrote:
On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo 
daniele.varra...@gmail.com wrote:

On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt
schmi...@gmail.com wrote:

I haven't submitted it to PGXN as I prefer the original author to keep
the ownership.

Thanks, I merged your patches with the dev branch for the time being.
It would be great to have some input from the maintainers of pg_reorg 
in pgfoundry to see if they agree about putting it in pgxn.



Hi, I'm Sakamoto, reorg maintainer.
I'm very happy Josh, Michael and Daniele are interested in reorg.

I'm working on the next version of reorg, 1.1.8, which will be released 
in a couple of days.
And I have come to think that this is a good point to reconsider the way we 
develop/maintain it.
To be honest, we have few development resources available, so no 
additional features have been added recently. But features and fixes to be
done have piled up (as Josh summed up, thanks).


I think it is a good idea to develop on github. Would Michael's repo be the root?
After the release of 1.1.8, I will freeze the CVS repository and create a 
mirror on github.

# Or Michael's repo will do :)

I have received some patches from Josh and Daniele. They should be developed 
in the next major version, 1.2, so some of them may not be included in 1.1.8
(because it's a minor version upgrade), but I really appreciate them.

I think we can discuss further on the reorg list.

Sakamoto




Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread Michael Paquier
On Sat, Sep 22, 2012 at 9:08 AM, sakamoto dsakam...@lolloo.net wrote:

 (2012/09/21 22:32), Michael Paquier wrote:

 On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo 
 daniele.varra...@gmail.com wrote:

 On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt
 schmi...@gmail.com wrote:

 I haven't submitted it to PGXN as I prefer the original author to keep
 the ownership.

 Thanks, I merged your patches with the dev branch for the time being.
 It would be great to have some input from the maintainers of pg_reorg in
 pgfoundry to see if they agree about putting it in pgxn.

  Hi, I'm Sakamoto, reorg maintainer.
 I'm very happy Josh, Michael and Daniele are interested in reorg.

 I'm working on the next version of reorg, 1.1.8, which will be released in
 a couple of days.
 And I have come to think that this is a good point to reconsider the way we
 develop/maintain it.
 To be honest, we have few development resources available, so no
 additional features have been added recently. But features and fixes to be
 done have piled up (as Josh summed up, thanks).

 I think it is a good idea to develop on github. Would Michael's repo be the root?
 After the release of 1.1.8, I will freeze the CVS repository and create a
 mirror on github.
 # Or Michael's repo will do :)

As you wish. You could create a root folder under a new organization, or
under your own account, or use my repo.
The result will be the same. I leave it to your discretion.

 I have received some patches from Josh and Daniele. They should be developed
 in the next major version, 1.2, so some of them may not be included in 1.1.8
 (because it's a minor version upgrade), but I really appreciate them.

Great!
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread Christopher Browne
If the present project is having a tough time doing enhancements, I should
think it mighty questionable to try to draw it into core, that presses it
towards a group of already very busy developers.

On the other hand, if the present development efforts can be made more
public, by having them take place in a more public repository, that at
least has potential to let others in the community see and participate.
There are no guarantees, but privacy is liable to hurt.

I wouldn't expect any sudden huge influx of developers, but a steady
visible stream of development effort would be mighty useful to a merge
into core argument.

A *lot* of projects are a lot like this.  On the Slony project, we have
tried hard to maintain this sort of visibility.  Steve Singer, Jan Wieck
and I do our individual efforts on git repos visible at GitHub to ensure
ongoing efforts aren't invisible inside a corporate repo.  It hasn't led to
any massive influx of extra developers, but I am always grateful to see Peter
Eisentraut's bug reports.


Re: [HACKERS] pg_reorg in core?

2012-09-21 Thread sakamoto

(2012/09/22 10:02), Christopher Browne wrote:


If the present project is having a tough time doing enhancements, I 
should think it mighty questionable to try to draw it into core, that 
presses it towards a group of already very busy developers.


On the other hand, if the present development efforts can be made more 
public, by having them take place in a more public repository, that at 
least has potential to let others in the community see and 
participate.  There are no guarantees, but privacy is liable to hurt.


I wouldn't expect any sudden huge influx of developers, but a steady 
visible stream of development effort would be mighty useful to a 
merge into core argument.


A *lot* of projects are a lot like this.  On the Slony project, we 
have tried hard to maintain this sort of visibility.  Steve Singer, 
Jan Wieck and I do our individual efforts on git repos visible at 
GitHub to ensure ongoing efforts aren't invisible inside a corporate 
repo.  It hasn't led to any massive influx of extra developers, but I am 
always grateful to see Peter Eisentraut's bug reports.




Agreed.  What the reorg project needs first is transparency, including
issue tracking, bugs, a list of todo items, clarified release schedules,
quality assurance and so forth.
Only after all that is done can the discussion about putting it into core start.

Until now, reorg has been developed and maintained behind a corporate repository.
But now that its activity has slowed, what I should do as a maintainer is to
make the development process more public and find someone to cooperate with :)

Sakamoto




Re: [HACKERS] pg_reorg in core?

2012-09-20 Thread Josh Kupershmidt
On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi all,

 During the last PGCon, I heard that some community members would be
 interested in having pg_reorg directly in core.

I'm actually not crazy about this idea, at least not given the current
state of pg_reorg. Right now, there are quite a few fixes and
features which remain to be merged into cvs head, but at least we can
develop pg_reorg on a schedule independent of Postgres itself, i.e. we
can release new features more often than once a year. Perhaps when
pg_reorg is more stable, and the known bugs and missing features have
been ironed out, we could think about integrating into core.

Granted, a nice thing about integrating with core is we'd probably
have more of an early warning when reshuffling of PG breaks pg_reorg
(e.g. the recent splitting of the htup headers), but such changes have
been quick and easy to fix so far.

 Just to recall, pg_reorg is a functionality developed by NTT that allows
 a table to be redistributed without taking locks on it.
 The technique it uses to reorganize the table is to create a temporary copy
 of the table to be redistributed with a CREATE TABLE AS
 whose definition changes if table is redistributed with a VACUUM FULL or
 CLUSTER.
 Then it follows this mechanism:
 - triggers are created to redirect all the DMLs that occur on the table to
 an intermediate log table.

N.B. CREATE TRIGGER takes an AccessExclusiveLock on the table, see below.

 - creation of indexes on the temporary table based on what the user wishes
 - Apply the logs registered during the index creation
 - Swap the names of freshly created table and old table
 - Drop the useless objects
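
The quoted steps can be pictured with a small in-memory model. This Python sketch is only an illustration of the copy, log, replay and swap sequence described above; it does not reflect pg_reorg's actual code, and the function name, the dict-based "table" and the (op, pk, row) log format are all assumptions made for the example.

```python
# Toy in-memory model of the quoted steps; no PostgreSQL involved.
# A "table" is a dict keyed by primary key; all names are hypothetical.

def reorganize(table, concurrent_dml, sort_key):
    """Return a reordered copy of `table` that also reflects `concurrent_dml`.

    table          -- dict pk -> row, the live table when the copy starts
    concurrent_dml -- list of (op, pk, row) captured by the "trigger"
                      while the copy was being built
    sort_key       -- function used to order the rows of the copy
    """
    # 1. The trigger's log table: DML arriving during the copy lands here.
    log = list(concurrent_dml)

    # 2. CREATE TABLE AS ... ORDER BY: build the reordered temporary copy.
    copy = dict(sorted(table.items(), key=lambda item: sort_key(item[1])))

    # 3. (Index creation on the copy would happen here.)

    # 4. Apply the logged changes to the copy, in arrival order.
    for op, pk, row in log:
        if op == "DELETE":
            copy.pop(pk, None)
        else:  # INSERT or UPDATE
            copy[pk] = row

    # 5. Swap: the caller replaces the old table with the returned copy.
    return copy


if __name__ == "__main__":
    live = {1: {"name": "c"}, 2: {"name": "a"}, 3: {"name": "b"}}
    changes = [("UPDATE", 1, {"name": "z"}), ("DELETE", 3, None),
               ("INSERT", 4, {"name": "d"})]
    print(reorganize(live, changes, lambda row: row["name"]))
```

Note that rows replayed from the log are not re-sorted in this toy, so the copy is only ordered as of the moment the copy was taken.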

 The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
 I am also maintaining a fork in github in sync with pgfoundry here:
 https://github.com/michaelpq/pg_reorg.

 Just, do you guys think it is worth adding a functionality like pg_reorg in
 core or not?

 If yes, well I think the code of pg_reorg is going to need some
 modifications to make it more compatible with contrib modules using only
 EXTENSION.
 For the time being pg_reorg is divided into 2 parts, binary and library.
 The library part is the SQL portion of pg_reorg, containing a set of C
 functions that are called by the binary part. This has been extended to
 support CREATE EXTENSION recently.
 The binary part creates a command pg_reorg in charge of calling the set of
 functions created by the lib part, being just a wrapper of the library part
 to control the creation and deletion of the objects.
 It is also in charge of deleting the temporary objects by callback if an
 error occurs.

 By using the binary command, it is possible to reorganize a single table or
 a database, in this case reorganizing a database launches only a loop on
 each table of this database.

 My idea is to remove the binary part and to rely only on the library part to
 make pg_reorg a single extension with only system functions like other
 contrib modules.

 In order to do that what is missing is a function that could be used as an
 entry point for table reorganization, a function of the type
 pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
 All the functionalities of pg_reorg could be reproducible:
 - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
 - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
 CLUSTER key
 - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
 a wanted column.
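
The three proposed call forms can be sketched as a single dispatcher. This is a hypothetical Python illustration of the proposal only: no pg_reorg_table() function exists in pg_reorg, and the returned tuples and the placeholder string for the existing CLUSTER key are assumptions. In SQL these would more likely be two overloaded functions, and the two-argument one could not be declared STRICT, since a STRICT function returns NULL without running when given a NULL argument.

```python
# Hypothetical dispatcher mirroring the three proposed call forms.
# It only decides which reorganization mode applies; it does no real work.
_NOT_GIVEN = object()  # distinguishes "no second argument" from NULL/None


def pg_reorg_table(tableoid, column=_NOT_GIVEN):
    """Return (mode, tableoid, ordering) for the requested reorganization."""
    if column is _NOT_GIVEN:
        # pg_reorg_table(tableoid): plain rewrite, like VACUUM FULL
        return ("VACUUM FULL", tableoid, None)
    if column is None:
        # pg_reorg_table(tableoid, NULL): use the table's existing CLUSTER key
        return ("CLUSTER", tableoid, "<existing cluster key>")
    # pg_reorg_table(tableoid, columnname): order by the requested column
    return ("CLUSTER", tableoid, column)
```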

 Is it worth the shot?

I haven't seen this documented as such, but AFAICT the reason that
pg_reorg is split into a binary and set of backend functions which are
called by the binary is that pg_reorg needs to be able to control its
steps in several transactions so as to avoid holding locks
excessively. The reorg_one_table() function uses four or five
transactions per table, in fact. If all the logic currently in the
pg_reorg binary were moved into backend functions,  calling
pg_reorg_table() would have to be a single transaction, and there
would be no advantage to using such a function vs. CLUSTER or VACUUM
FULL.
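
A back-of-the-envelope model shows why the transaction boundaries matter. PostgreSQL holds a lock until the end of the transaction that took it, so the Python sketch below adds up how long an exclusive lock stays held depending on where the commits fall. The step list and durations are made-up units for illustration, not measurements of pg_reorg.

```python
# Toy model: a lock taken in a step is held until the next COMMIT.
# Steps and durations are invented for illustration only.

def exclusive_lock_time(steps, commit_after):
    """steps: list of (duration, takes_exclusive_lock) in execution order.
    commit_after: set of step indexes after which a COMMIT happens.
    Returns the total time during which the exclusive lock is held."""
    held = False
    total = 0
    for index, (duration, takes_lock) in enumerate(steps):
        held = held or takes_lock
        if held:
            total += duration
        if index in commit_after:
            held = False  # locks are released at transaction end
    return total


STEPS = [
    (1, True),     # create trigger (needs a strong lock, per the note above)
    (100, False),  # copy the table
    (50, False),   # build indexes on the copy
    (5, False),    # apply the logged changes
    (1, True),     # swap old and new
]

if __name__ == "__main__":
    print("one transaction:", exclusive_lock_time(STEPS, {4}))
    print("one per step:   ", exclusive_lock_time(STEPS, {0, 1, 2, 3, 4}))
```

With these invented numbers the single-transaction variant holds the lock for the whole run, while committing after each step holds it only during the two short steps that need it.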

Also, having a separate binary we should be able to perform some neat
tricks such as parallel index builds using multiple connections (I'm
messing around with this idea now). AFAIK this would also not be
possible if pg_reorg were contained solely in the library functions.
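
The multiple-connections idea could look roughly like this from the client side. The Python sketch below uses a thread pool in which each worker stands in for one database connection; build_index() is a placeholder that performs no database work, and the function names and the max_connections parameter are assumptions rather than anything in pg_reorg.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(index_definition):
    # Stand-in for: open a dedicated connection, run the CREATE INDEX
    # statement on the temporary table, then close the connection.
    return f"built: {index_definition}"


def build_indexes_in_parallel(index_definitions, max_connections=2):
    """Run one build per index, at most `max_connections` at a time.
    Results come back in the same order as the input definitions."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(build_index, index_definitions))


if __name__ == "__main__":
    print(build_indexes_in_parallel(["idx_a", "idx_b", "idx_c"]))
```

A single backend function runs inside one session, which is why this kind of fan-out needs a client program that can open several connections.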

Josh




Re: [HACKERS] pg_reorg in core?

2012-09-20 Thread Michael Paquier
On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt schmi...@gmail.com wrote:

 On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Hi all,
 
  During the last PGCon, I heard that some community members would be
  interested in having pg_reorg directly in core.

 I'm actually not crazy about this idea, at least not given the current
 state of pg_reorg. Right now, there are quite a few fixes and
 features which remain to be merged into cvs head, but at least we can
 develop pg_reorg on a schedule independent of Postgres itself, i.e. we
 can release new features more often than once a year. Perhaps when
 pg_reorg is more stable, and the known bugs and missing features have
 been ironed out, we could think about integrating into core.


What would also be great is to move the project directly into github to
facilitate its maintenance and development.
My own copy is based on and synced with what is in pgfoundry, as I don't have
any admin access on pgfoundry (honestly, I don't think I can get it either),
even though I am from NTT. Hey, are there people with admin rights here?


 Granted, a nice thing about integrating with core is we'd probably
 have more of an early warning when reshuffling of PG breaks pg_reorg
 (e.g. the recent splitting of the htup headers), but such changes have
 been quick and easy to fix so far.

Yes, that is also why I am proposing to integrate it into core. Its
maintenance would be faster and easier than it is now in pgfoundry.
However, if hackers do not think that it is worth adding it to core... Well,
separate development as done now would be fine but slower...
Also, just by looking at the extension modules in contrib, I haven't seen one
using both a library and a binary at the same time like pg_reorg does.

 - creation of indexes on the temporary table based on what the user wishes
  - Apply the logs registered during the index creation
  - Swap the names of freshly created table and old table
  - Drop the useless objects
 
  The code is hosted on pgfoundry here: http://pgfoundry.org/projects/reorg/.
  I am also maintaining a fork in github in sync with pgfoundry here:
  https://github.com/michaelpq/pg_reorg.
 
  Just, do you guys think it is worth adding a functionality like pg_reorg in
  core or not?
 
  If yes, well I think the code of pg_reorg is going to need some
  modifications to make it more compatible with contrib modules using only
  EXTENSION.
  For the time being pg_reorg is divided into 2 parts, binary and library.
  The library part is the SQL portion of pg_reorg, containing a set of C
  functions that are called by the binary part. This has been extended to
  support CREATE EXTENSION recently.
  The binary part creates a command pg_reorg in charge of calling the set of
  functions created by the lib part, being just a wrapper of the library part
  to control the creation and deletion of the objects.
  It is also in charge of deleting the temporary objects by callback if an
  error occurs.
 
  By using the binary command, it is possible to reorganize a single table or
  a database, in this case reorganizing a database launches only a loop on
  each table of this database.
 
  My idea is to remove the binary part and to rely only on the library part to
  make pg_reorg a single extension with only system functions like other
  contrib modules.

  In order to do that what is missing is a function that could be used as an
  entry point for table reorganization, a function of the type
  pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
  All the functionalities of pg_reorg could be reproducible:
  - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
  - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
    CLUSTER key
  - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
    a wanted column.
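
To make the three proposed call forms concrete, here is a minimal Python
sketch of how such an entry point might choose its mode. It is not pg_reorg
code: plan_reorg, the cluster_key parameter (a stand-in for looking up the
table's CLUSTER key in the catalogs) and the error raised when no key exists
are all assumptions made for illustration.

```python
from typing import Optional, Tuple

_NO_ARG = object()  # marks "second argument omitted", as opposed to an explicit NULL


def plan_reorg(tableoid: int, column=_NO_ARG,
               cluster_key: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Map a proposed pg_reorg_table() call to (mode, ordering column).

    cluster_key is a hypothetical stand-in for the table's existing CLUSTER key.
    """
    if column is _NO_ARG:
        # pg_reorg_table(tableoid): rewrite without reordering, like VACUUM FULL.
        return ("vacuum_full", None)
    if column is None:
        # pg_reorg_table(tableoid, NULL): reuse the table's CLUSTER key.
        if cluster_key is None:
            # Assumption: the proposal does not say what happens without a key.
            raise ValueError(f"table {tableoid} has no CLUSTER key")
        return ("cluster", cluster_key)
    # pg_reorg_table(tableoid, columnname): order by the requested column.
    return ("cluster", column)


print(plan_reorg(16384))
print(plan_reorg(16384, None, cluster_key="id"))
print(plan_reorg(16384, "created_at"))
```

The sentinel is only needed because Python has no overloading; in SQL the
one-argument and two-argument forms would simply be two function signatures.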
 
  Is it worth the shot?

 I haven't seen this documented as such, but AFAICT the reason that
 pg_reorg is split into a binary and set of backend functions which are
 called by the binary is that pg_reorg needs to be able to control its
 steps in several transactions so as to avoid holding locks
 excessively. The reorg_one_table() function uses four or five
 transactions per table, in fact. If all the logic currently in the
 pg_reorg binary were moved into backend functions, calling
 pg_reorg_table() would have to be a single transaction, and there
 would be no advantage to using such a function vs. CLUSTER or VACUUM
 FULL.
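
The point about several transactions can be illustrated with a small
client-side driver. This is a sketch only: it uses Python's sqlite3 module
instead of PostgreSQL, and run_in_separate_transactions, the progress table
and the step names are hypothetical. It shows the general pattern of
committing after each step, and the consequence that an interruption leaves
the earlier steps committed.

```python
import sqlite3


def run_in_separate_transactions(conn, steps):
    """Run each step in its own transaction; return how many steps committed."""
    done = 0
    for step in steps:
        conn.execute("BEGIN")
        try:
            step(conn)
        except Exception:
            conn.execute("ROLLBACK")  # only the failing step is undone
            break
        conn.execute("COMMIT")        # anything held by this step is released here
        done += 1
    return done


# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE progress (step TEXT)")


def record(name):
    """Build a step that just records its own name."""
    def step(c):
        c.execute("INSERT INTO progress VALUES (?)", (name,))
    return step


def failing_swap(c):
    """A step that does some work and is then interrupted."""
    c.execute("INSERT INTO progress VALUES ('swap names')")
    raise RuntimeError("interrupted")


committed = run_in_separate_transactions(
    conn,
    [record("create log table"), record("copy rows"), failing_swap,
     record("drop temporary objects")],
)
rows = [r[0] for r in conn.execute("SELECT step FROM progress ORDER BY rowid")]
print(committed, rows)
```

A single function call running in one transaction could not behave this way:
either every step would become visible at the end, or none would.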

Of course, but features like CREATE INDEX CONCURRENTLY use multiple
transactions. Wouldn't it be possible to use something similar to make the
modifications visible to other backends?



 Also, having a separate binary we should be able to perform some neat
 tricks such as parallel index builds using multiple connections (I'm
 messing around with this idea now). AFAIK this would also not be
 possible if pg_reorg were contained solely in the library functions.
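
As a rough illustration of the multiple-connection idea, the Python sketch
below hands each index definition to a worker that opens its own connection.
It is not how pg_reorg does or will do it: build_indexes_in_parallel, the
connect callable and FakeConnection are hypothetical, and the fake connection
only records the statements it is given.

```python
from concurrent.futures import ThreadPoolExecutor


def build_indexes_in_parallel(index_ddl, connect, workers=2):
    """Run each CREATE INDEX statement on its own connection, in parallel."""
    def build(ddl):
        conn = connect()  # one connection per index build
        try:
            conn.execute(ddl)
            return ddl
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever order they finish in.
        return list(pool.map(build, index_ddl))


class FakeConnection:
    """Hypothetical stand-in for a database connection; records what it runs."""
    executed = []

    def execute(self, sql):
        FakeConnection.executed.append(sql)

    def close(self):
        pass


ddl = [
    "CREATE INDEX t_new_a ON t_new (a)",
    "CREATE INDEX t_new_b ON t_new (b)",
    "CREATE INDEX t_new_c ON t_new (c)",
]
built = build_indexes_in_parallel(ddl, FakeConnection)
print(built)
```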


Re: [HACKERS] pg_reorg in core?

2012-09-20 Thread Hitoshi Harada
On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi all,

 During the last PGCon, I heard that some community members would be
 interested in having pg_reorg directly in core.
 Just to recall, pg_reorg is a functionality developed by NTT that allows a
 table to be redistributed without taking locks on it.
 The technique it uses to reorganize the table is to create a temporary copy
 of the table to be redistributed with a CREATE TABLE AS
 whose definition differs depending on whether the table is redistributed as
 with VACUUM FULL or as with CLUSTER.
 Then it follows this mechanism:
 - triggers are created to redirect all the DMLs that occur on the table to
 an intermediate log table.
 - creation of indexes on the temporary table based on what the user wishes
 - Apply the logs registered during the index creation
 - Swap the names of freshly created table and old table
 - Drop the useless objects
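
The steps above can be sketched end to end with Python's sqlite3 module. This
is an illustration of the log-and-replay technique, not pg_reorg's code: the
table names t, t_new and t_log are made up, the triggers are created before
the copy is taken (an assumption about ordering), and the replay assumes the
primary key is never updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT);
INSERT INTO t VALUES (3, 'c'), (1, 'a'), (2, 'b');

-- Log table plus triggers that record every change made to t.
CREATE TABLE t_log (seq INTEGER PRIMARY KEY, op TEXT, id INTEGER, v TEXT);
CREATE TRIGGER t_ins AFTER INSERT ON t BEGIN
    INSERT INTO t_log (op, id, v) VALUES ('I', NEW.id, NEW.v);
END;
CREATE TRIGGER t_upd AFTER UPDATE ON t BEGIN
    INSERT INTO t_log (op, id, v) VALUES ('U', NEW.id, NEW.v);
END;
CREATE TRIGGER t_del AFTER DELETE ON t BEGIN
    INSERT INTO t_log (op, id, v) VALUES ('D', OLD.id, NULL);
END;

-- Reordered copy of the table, as CLUSTER would produce.
CREATE TABLE t_new AS SELECT * FROM t ORDER BY id;
""")

# Writes arriving during the rebuild still go to t; the triggers log them.
conn.execute("INSERT INTO t VALUES (4, 'd')")
conn.execute("UPDATE t SET v = 'B' WHERE id = 2")
conn.execute("DELETE FROM t WHERE id = 1")

# Build the index on the copy.
conn.execute("CREATE UNIQUE INDEX t_new_id ON t_new (id)")

# Apply the logged changes to the copy, in the order they happened.
log = conn.execute("SELECT op, id, v FROM t_log ORDER BY seq").fetchall()
for op, row_id, v in log:
    if op == "I":
        conn.execute("INSERT INTO t_new VALUES (?, ?)", (row_id, v))
    elif op == "U":
        conn.execute("UPDATE t_new SET v = ? WHERE id = ?", (v, row_id))
    else:
        conn.execute("DELETE FROM t_new WHERE id = ?", (row_id,))

# Swap the names, then drop the objects that are no longer needed.
conn.executescript("""
ALTER TABLE t RENAME TO t_old;
ALTER TABLE t_new RENAME TO t;
DROP TABLE t_old;
DROP TABLE t_log;
""")

result = conn.execute("SELECT id, v FROM t ORDER BY id").fetchall()
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(result, tables)
```

In a real concurrent setting the final replay and the name swap would have to
happen under a short lock so that no write slips in between them; the sketch
is single-threaded and does not model that.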


I'm not familiar with pg_reorg, but I wonder why we need a separate
program for this task.  I know pg_reorg is ok as an external program
per se, but if we could optimize CLUSTER (or VACUUM, which I'm a little
pessimistic about) in the same way, it would be much nicer than having
an additional binary + extension.  Isn't it possible to do the same thing
as above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?

Thanks,
-- 
Hitoshi Harada




Re: [HACKERS] pg_reorg in core?

2012-09-20 Thread Josh Kupershmidt
On Thu, Sep 20, 2012 at 8:33 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

 On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt schmi...@gmail.com
 wrote:

 On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 What could be also great is to move the project directly into github to
 facilitate its maintenance and development.

No argument from me there, especially as I have my own fork in github,
but that's up to the current maintainers.

 Granted, a nice thing about integrating with core is we'd probably
 have more of an early warning when reshuffling of PG breaks pg_reorg
 (e.g. the recent splitting of the htup headers), but such changes have
 been quick and easy to fix so far.

 Yes, that is also why I am proposing to integrate it into core. Its
 maintenance pace would be faster and easier than it is now in pgfoundry.

If the argument for moving pg_reorg into core is faster and easier
development, well I don't really buy that. Yes, there would presumably
be more eyeballs on the project, but you could make the same argument
about any auxiliary Postgres project which wants more attention, and
we can't have everything in core. And I fail to see how being in-core
makes development easier; I think everyone here would agree that the
bar to commit things to core is pretty darn high. If you're concerned
about the [lack of] development on pg_reorg, there are plenty of
things to fix without moving the project. I recently posted an issues
roundup to the reorg list, if you are interested in pitching in.

Josh

