Re: [HACKERS] remove flatfiles.c

2009-09-16 Thread Andrew McNamara
 That's what I want to believe. But picture if you have, say a
 1-terabyte table which is 50% dead tuples and you don't have a spare
 1-terabytes to rewrite the whole table.

 But trying to VACUUM FULL that table is going to be horridly painful
 too, and you'll still have bloated indexes afterwards.  You might as
 well just live with the 50% waste, especially since if you did a
 full-table update once you'll probably do it again sometime.
 
 I'm having a hard time believing that VACUUM FULL really has any
 interesting use-case anymore.

This was almost exactly the scenario I faced recently. A production
database unexpectedly filled up its partition. On investigation, we found
a developer had added a component to the application that updated every
row in one table each day, exhausting the free space map. Over time,
most of the tables in the system had grown to contain 50-70% dead tuples.

The owner of the system was understandably reluctant to dump and restore
the system, and there wasn't enough space left on the system to rewrite
any of the large tables. In the end, I dropped a table (the one owned
by the offending developer... 8-), and this gave me just enough space
to VACUUM FULL one table at a time.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove flatfiles.c

2009-09-03 Thread daveg
On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
 Greg Stark gsst...@mit.edu writes:
  On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:
  The use cases where VACUUM FULL wins currently are where storing two
  copies of the table and its indexes concurrently just isn't practical.
  
  Yeah, but then do you really need to use VACUUM FULL?  If that's really
  a problem then there ain't that many dead tuples around.
 
  That's what I want to believe. But picture if you have, say a
  1-terabyte table which is 50% dead tuples and you don't have a spare
  1-terabytes to rewrite the whole table.
 
 But trying to VACUUM FULL that table is going to be horridly painful
 too, and you'll still have bloated indexes afterwards.  You might as
 well just live with the 50% waste, especially since if you did a
 full-table update once you'll probably do it again sometime.
 
 I'm having a hard time believing that VACUUM FULL really has any
 interesting use-case anymore.

I have a client who uses temp tables heavily, hundreds of thousands of creates
and drops per day. They also have long running queries. The only thing that
keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
a few times a day. Without that, pg_class, pg_attribute, etc. quickly balloon
to thousands of pages.
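As a back-of-the-envelope illustration of why a few vacuums a day keep this in check, here is a toy model (not PostgreSQL's actual accounting; all numbers are illustrative):

```python
import math

def peak_catalog_pages(drops_per_day, vacuums_per_day, rows_per_page=100):
    """Toy model of catalog bloat from temp-table churn: every DROP leaves a
    dead row in pg_class/pg_attribute that only a vacuum reclaims, so dead
    rows pile up between vacuum runs.  Returns the peak page count those
    dead rows alone would occupy."""
    dead_rows_at_peak = drops_per_day / max(vacuums_per_day, 1)
    return math.ceil(dead_rows_at_peak / rows_per_page)
```

With 200,000 drops a day, vacuuming four times a day keeps the peak in the hundreds of pages; vacuuming once a day already puts it in the thousands.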

-dg

-- 
David Gould   da...@sonic.net  510 536 1443  510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] remove flatfiles.c

2009-09-03 Thread Andrew Dunstan



daveg wrote:
 On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
  Greg Stark gsst...@mit.edu writes:
   On Wed, Sep 2, 2009 at 12:01 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:
   The use cases where VACUUM FULL wins currently are where storing two
   copies of the table and its indexes concurrently just isn't practical.
 
   Yeah, but then do you really need to use VACUUM FULL?  If that's really
   a problem then there ain't that many dead tuples around.
 
   That's what I want to believe. But picture if you have, say a
   1-terabyte table which is 50% dead tuples and you don't have a spare
   1-terabytes to rewrite the whole table.
 
  But trying to VACUUM FULL that table is going to be horridly painful
  too, and you'll still have bloated indexes afterwards.  You might as
  well just live with the 50% waste, especially since if you did a
  full-table update once you'll probably do it again sometime.
 
  I'm having a hard time believing that VACUUM FULL really has any
  interesting use-case anymore.
 
 I have a client who uses temp tables heavily, hundreds of thousands of creates
 and drops per day. They also have long running queries. The only thing that
 keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
 a few times a day. Without that, pg_class, pg_attribute, etc. quickly balloon
 to thousands of pages.

That's a rate of more than one create and drop per second. How does your 
client handle the fact that VACUUM FULL will exclusively lock those 
catalog tables? Without knowing more, it looks like a bit of a design issue.


cheers

andrew



Re: [HACKERS] remove flatfiles.c

2009-09-03 Thread daveg
On Thu, Sep 03, 2009 at 07:57:25PM -0400, Andrew Dunstan wrote:
 daveg wrote:
 On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
 I'm having a hard time believing that VACUUM FULL really has any
 interesting use-case anymore.
 
 I have a client who uses temp tables heavily, hundreds of thousands of 
 creates
 and drops per day. They also have long running queries. The only thing that
 keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
 a few times a day. Without that pg_class, pg_attribute etc quickly balloon 
 to thousands of pages.
 
 That's a rate of more than one create and drop per second. How does your 
 client handle the fact that VACUUM FULL will exclusively lock those 
 catalog tables? Without knowing more, it looks like a bit of a design issue.

I'd say it is several per second.

They wait for the catalog locks sometimes. This is not an interactive
application so that is somewhat acceptable. It also occasionally causes
deadlocks which is less agreeable.

There are various reasons for the heavy use of temps, mainly having to do
with loading external feeds or reusing intermediate query results in a series
of queries.

It would be great if there were a way to have temp tables that
did not get cataloged, e.g. kept as a local cache only.

-dg

-- 
David Gould   da...@sonic.net  510 536 1443  510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Greg Stark
On Wed, Sep 2, 2009 at 6:30 AM, Jaime Casanova jcasa...@systemguards.com.ec wrote:
 On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas robertmh...@gmail.com wrote:

 I'm a bit skeptical about partitioning as a solution, too.  The
 planner is just not clever enough with partitioned tables, yet.

Yeah, we need to fix that :)

I think we're already reaching the point where the pains of dealing
with partitioned tables are usually less than the pains of dealing
with VACUUM FULL.

 analyzing and vacuuming a *very* big table, and even scanning a huge
 index, is no joke either...

Hm, not sure I see this. The sample size for ANALYZE is not dependent
on the size of the table, only on the stats_target. And vacuum with
the VM is now going to be dependent only on the number of updates to
the table, not on the size of the table.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

Perhaps we should go one version with an enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I did start a while ago on a replacement which used the existing
rewrite mechanism to do the equivalent of cluster without changing the
ordering. I forget where I left that but I could go back and look at
it. I'll be busy for the next few weeks though so it won't be right
away.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Josh Berkus
All,


 I'm having a hard time believing that VACUUM FULL really has any
 interesting use-case anymore.

Basically, for:
a) people who don't understand CLUSTER (easily fixed, simply create a
VACUUM FULL command which just does CLUSTER on the primary key)

b) people who are completely out of space on disk and are trying to
shrink the database to free up space.

For (b), I think it's OK to just tell those people that they need to
move the database files or find something else to delete.  Most of the
time, they have to do that *anyway* in order for VACUUM FULL to work,
since the transaction log is on the same disk.  We just need a little
more documentation, is all.

 The problem use cases we have today are only when you really do have
 enough dead space to clean up that you want to compact the file -- but
 not so much that it's worth rewriting the whole table using CLUSTER or
 ALTER TABLE.

I haven't seen this use-case in the field.  I'm not sure that it
actually exists.  Anyone run across a case where this made sense?

Recently I actually had a client dump and reload their database rather
than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
more than 18.

 Perhaps we should go one version with an enable_legacy_full_vacuum
 which defaults to off. That would at least let us hear about use cases
 where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts.  But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

 I did start a while ago on a replacement which used the existing
 rewrite mechanism to do the equivalent of cluster without changing the
 ordering. I forget where I left that but I could go back and look at
 it. I'll be busy for the next few weeks though so it won't be right
 away.

This would be very helpful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Joshua D. Drake
On Wed, 2009-09-02 at 10:41 -0700, Josh Berkus wrote:
 All,
 
 
  I'm having a hard time believing that VACUUM FULL really has any
  interesting use-case anymore.
 
 Basically, for:
 a) people who don't understand CLUSTER (easily fixed, simply create a
 VACUUM FULL command which just does CLUSTER on the primary key)
 
 b) people who are completely out of space on disk and are trying to
 shrink the database to free up space.
 
 For (b), I think it's OK to just tell those people that they need to
 move the database files or find something else to delete.  Most of the
 time, they have to do that *anyway* in order for VACUUM FULL to work,
 since the transaction log is on the same disk.  We just need a little
 more documentation, is all.

Right.

 
  The problem use cases we have today are only when you really do have
  enough dead space to clean up that you want to compact the file -- but
  not so much that it's worth rewriting the whole table using CLUSTER or
  ALTER TABLE.
 
 I haven't seen this use-case in the field.  I'm not sure that it
 actually exists.  Anyone run across a case where this made sense?
 

No.

 Recently I actually had a client dump and reload their database rather
 than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
 more than 18.
 

Exactly.

   Perhaps we should go one version with an enable_legacy_full_vacuum
  which defaults to off. That would at least let us hear about use cases
  where people are unhappy with a replacement.
 
 I think we do need to do this, just because people won't have changed
 their admin scripts.  But the goal should be to dump VACUUM FULL
 entirely by 8.6 if we *don't* get serious use-cases.
 

Agreed, but I think we shouldn't even put it in the postgresql.conf by
default. Just document that it exists. Settings for the sake of settings
(even ones that may have a corner case) seem to confuse users.

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering





Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Greg Stark
On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote:
 All,


 I'm having a hard time believing that VACUUM FULL really has any
 interesting use-case anymore.

 Basically, for:
 a) people who don't understand CLUSTER (easily fixed, simply create a
 VACUUM FULL command which just does CLUSTER on the primary key)

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.


 The problem use cases we have today are only when you really do have
 enough dead space to clean up that you want to compact the file -- but
 not so much that it's worth rewriting the whole table using CLUSTER or
 ALTER TABLE.

 I haven't seen this use-case in the field.  I'm not sure that it
 actually exists.  Anyone run across a case where this made sense?

Well I've certainly seen people whose disks are more than 50% full.
They tend to be the same people who want to compact their tables. I
can't say whether any of them had a single table with associated
indexes that were taking up more than 50% but it's not uncommon to
have a single table that dominates your database.



 Recently I actually had a client dump and reload their database rather
 than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
 more than 18.

 Perhaps we should go one version with an enable_legacy_full_vacuum
 which defaults to off. That would at least let us hear about use cases
 where people are unhappy with a replacement.

 I think we do need to do this, just because people won't have changed
 their admin scripts.  But the goal should be to dump VACUUM FULL
 entirely by 8.6 if we *don't* get serious use-cases.

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I would prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken. We could name the command something which is more
descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
that.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Kevin Grittner
Greg Stark gsst...@mit.edu wrote:
 
 I don't think we want to cluster on the primary key. I think we just
 want to rewrite the table keeping the same physical ordering.
 
Well, if that's what you want to do, couldn't you do something like this?:
 
Lock the table.
Drop all indexes.
Make a pass over the heap with two pointers, one to the first available
empty space and one to the next non-dead row past that, and move rows
inside the existing file.
Rebuild the indexes.
Release the lock.
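A rough sketch of that compaction pass in Python (purely illustrative: real heap pages, TIDs, and WAL are nothing like this):

```python
def compact_in_place(slots):
    """One pass over a toy 'heap': slots is a list where None marks a dead
    tuple.  Live tuples are shifted into the earliest free slots and the
    now-empty tail is truncated, shrinking the file without a rewrite."""
    write = 0  # first available empty slot
    for read in range(len(slots)):
        if slots[read] is not None:  # next non-dead row at or past the free slot
            slots[write] = slots[read]
            write += 1
    del slots[write:]  # truncate the dead tail
    return slots
```

Since every surviving tuple may have moved, all indexes would indeed have to be rebuilt afterwards, which is why the drop/rebuild steps bracket the pass.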
 
-Kevin



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Josh Berkus
Greg,

 I don't think we want to cluster on the primary key. I think we just
 want to rewrite the table keeping the same physical ordering.

Agreed.

 Well I've certainly seen people whose disks are more than 50% full.
 They tend to be the same people who want to compact their tables. I
 can't say whether any of them had a single table with associated
 indexes that were taking up more than 50% but it's not uncommon to
 have a single table that dominates your database.

Those people would also need for the tables involved to be fairly small,
or to be able to afford a lot of downtime.  VACUUM FULL on a 100GB table
with current commodity servers can take upwards of 8 hours.  I really
think the set of people who have more available downtime than disk
space is a vanishingly small group.

However, I'll do a survey.  Why not?

 We could deal with the admin scripts by making VACUUM FULL do the new
 behaviour. But I actually don't really like that. I would prefer to
 break VACUUM FULL since anyone doing it routinely is probably
 mistaken. We could name the command something which is more
 descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
 that.

Agreed.  I like VACUUM REWRITE, as it makes it fairly clear what's going on.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote:
 Perhaps we should go one version with an enable_legacy_full_vacuum
 which defaults to off. That would at least let us hear about use cases
 where people are unhappy with a replacement.
 
 I think we do need to do this, just because people won't have changed
 their admin scripts.  But the goal should be to dump VACUUM FULL
 entirely by 8.6 if we *don't* get serious use-cases.

 We could deal with the admin scripts by making VACUUM FULL do the new
 behaviour. But I actually don't really like that. I would prefer to
 break VACUUM FULL since anyone doing it routinely is probably
 mistaken. We could name the command something which is more
 descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
 that.

What's wrong with just ignoring the FULL option?  It's a reserved
word anyway because of FULL OUTER JOINs, so there's no syntactic
benefit to be had from eliminating it from the VACUUM syntax.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Robert Haas
On Wed, Sep 2, 2009 at 1:52 PM, Greg Stark gsst...@mit.edu wrote:
 We could deal with the admin scripts by making VACUUM FULL do the new
 behaviour. But I actually don't really like that. I would prefer to
 break VACUUM FULL since anyone doing it routinely is probably
 mistaken.

So I have a script that goes and finds bloated tables and runs VACUUM
FULL on them in the middle of the night if the bloat passes a certain
threshold.  The tables are small enough and the number of users is low
enough that this doesn't cause any problems for me.  I'm OK if the
name of the command changes, but I'd like there to be a command that I
can pass a table name to and get my table debloated without having to
make any follow-on decisions (such as picking an index to cluster by).
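The decision logic in such a script is simple; here is a hedged sketch (the stats mapping stands in for a query against pg_stat_user_tables, and the 40% threshold is an arbitrary made-up default):

```python
def tables_to_debloat(stats, dead_fraction_threshold=0.4):
    """stats maps table name -> (n_live_tup, n_dead_tup), as one might pull
    from pg_stat_user_tables.  Returns the tables whose dead-tuple fraction
    exceeds the threshold, i.e. the ones worth rewriting tonight."""
    picks = []
    for table, (live, dead) in stats.items():
        total = live + dead
        if total and dead / total > dead_fraction_threshold:
            picks.append(table)
    return sorted(picks)
```

The script would then issue the debloat command for each table in the returned list during the low-traffic window.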

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So I have a script that goes and finds bloated tables and runs VACUUM
 FULL on them in the middle of the night if the bloat passes a certain
 threshold.  The tables are small enough and the number of users is low
 enough that this doesn't cause any problems for me.  I'm OK if the
 name of the command changes, but I'd like there to be a command that I
 can pass a table name to and get my table debloated without having to
 make any follow-on decisions (such as picking an index to cluster by).

I think we *should* have a command that works like CLUSTER except it just
seqscans the source table without depending on any particular index.
Whether it's called VACUUM FULL or something else is a detail.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Greg Stark
On Wed, Sep 2, 2009 at 6:57 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Greg Stark gsst...@mit.edu wrote:

 I don't think we want to cluster on the primary key. I think we just
 want to rewrite the table keeping the same physical ordering.

 Well, if that's what you want to do, couldn't you do something like this?:

 Lock the table.
 Drop all indexes.
 Make a pass over the heap with two pointers, one to the first available
 empty space and one to the next non-dead row past that, and move rows
 inside the existing file.
 Rebuild the indexes.
 Release the lock.

Well dropping the indexes and moving tuples are both hard if you
care about crash-safety and transactional integrity.

The way we rewrite tables now is:

Lock table
Create new filenode.
Scan the old table and copy each record into the new filenode, keeping
update chains intact.
Rebuild all indexes for the table (using a similar strategy with new
relfilenodes)
Commit the transaction

If the transaction aborts at any point you still have the old pg_class
record which points to the old relfilenode and all the old indexes are
still valid.

We have all the pieces we need to do this, it's just a matter of
putting them together with a command to call them.
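The crash-safety property here, where the old file stays valid until the new one is committed, is the same pattern as write-to-temp-then-atomic-swap. A rough analogy in Python (a hypothetical line-based file format; Postgres does this via relfilenodes and the pg_class row, not filesystem renames):

```python
import os
import tempfile

def rewrite_table(path):
    """Copy live rows into a brand-new file, make it durable, then atomically
    swap it in.  A crash before the final step leaves the old file untouched,
    analogous to the old pg_class row still pointing at the old relfilenode.
    Lines starting with '#dead' stand in for dead tuples (made-up format)."""
    with open(path) as f:
        live_rows = [line for line in f if not line.startswith("#dead")]

    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as out:
        out.writelines(live_rows)
        out.flush()
        os.fsync(out.fileno())  # the new "filenode" is durable before the swap
    os.replace(tmp, path)       # the atomic "commit"
```

The cost, of course, is the transient double storage that started this thread: old and new copies coexist until the swap.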

A big part of why VACUUM FULL is annoying is the complexity of moving
tuples in place. VACUUM FULL has to mark the old tuples and the new
copies with its xid. It can't truncate the relation until it commits
that xid.

Actually I wonder how much performance improvement would come on
normal DML just from not having to check xvac in the visibility
checks. It's probably not much but...

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Robert Haas
On Wed, Sep 2, 2009 at 2:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
  On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote:
  Perhaps we should go one version with an enable_legacy_full_vacuum
 which defaults to off. That would at least let us hear about use cases
 where people are unhappy with a replacement.

 I think we do need to do this, just because people won't have changed
 their admin scripts.  But the goal should be to dump VACUUM FULL
 entirely by 8.6 if we *don't* get serious use-cases.

 We could deal with the admin scripts by making VACUUM FULL do the new
  behaviour. But I actually don't really like that. I would prefer to
 break VACUUM FULL since anyone doing it routinely is probably
 mistaken. We could name the command something which is more
 descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
 that.

 What's wrong with just ignoring the FULL option?  It's a reserved
 word anyway because of FULL OUTER JOINs, so there's no syntactic
 benefit to be had from eliminating it from the VACUUM syntax.

Silent behavior changes are usually a bad idea.

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Robert Haas
On Wed, Sep 2, 2009 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So I have a script that goes and finds bloated tables and runs VACUUM
 FULL on them in the middle of the night if the bloat passes a certain
 threshold.  The tables are small enough and the number of users is low
 enough that this doesn't cause any problems for me.  I'm OK if the
 name of the command changes, but I'd like there to be a command that I
 can pass a table name to and get my table debloated without having to
 make any follow-on decisions (such as picking an index to cluster by).

 I think we *should* have a command that works like CLUSTER except it just
 seqscans the source table without depending on any particular index.
 Whether it's called VACUUM FULL or something else is a detail.

Yeah.  We could do this by extending the syntax for cluster (e.g.
CLUSTER [VERBOSE] tablename [USING indexname | WITH NO INDEX]), but
I'm not sure whether that has any real advantage over just using the
existing command name.

I confess to being a little fuzzy on the details of how this
implementation (seq-scanning the source table for live tuples) is
different/better from the current VACUUM FULL implementation.  Can
someone fill me in?

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Greg Stark
On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas robertmh...@gmail.com wrote:
 I confess to being a little fuzzy on the details of how this
 implementation (seq-scanning the source table for live tuples) is
 different/better from the current VACUUM FULL implementation.  Can
 someone fill me in?


VACUUM FULL is a *lot* more complex.

It scans pages *backwards* from the end (which does wonderful things
on rotating media), marks each live tuple it finds as moved off, and
finds a new place for it (using the free space map, I think?). It inserts
the tuple on the new page, marks it moved in, and updates the
indexes.

Then it commits the transaction but keeps the lock. Then it has to
vacuum all the indexes of the references to the old tuples at the end
of the table. I think it has to commit that too before it can finally
truncate the table.

The backwards scan is awful for rotating media. The reading from the
end and writing to the beginning is bad too, though hopefully the
cache can help that.

A lot of the complexity comes in from other parts of the system that
have to be aware of tuples that have been moved off or moved in.
They have to be able to check whether the vacuum committed or not.

That reminds me there was another proposal to do an online vacuum
full similar to our concurrent index builds. Do noop-updates to tuples
at the end of the table, hopefully finding space for them earlier in
the table. Wait until those transactions are no longer visible to
anyone else and then truncate. (Actually I think you could just not do
anything and let regular lazy vacuum do the truncate). That might be a
good practical alternative for sites where copying their entire table
isn't practical.


-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 The backwards scan is awful for rotating media. The reading from the
 end and writing to the beginning is bad too, though hopefully the
 cache can help that.

Yeah.  And all that pales in comparison to what happens in the indexes.
You have to insert index entries (retail) for each moved-in tuple,
then after doing the intermediate commit you run around and remove
the index entries for the moved-off tuples.  Lots of nonsequential
access to insert the entries.  The cleanup isn't so bad --- it's
comparable to what regular lazy VACUUM has to do --- but that's just
one step in a very expensive process.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 It scans pages *backwards* from the end (which does wonderful things
 on rotating media). Marks each live tuple it finds as moved off,
 finds a new place for it (using the free space map I think?).

BTW, VACUUM FULL doesn't use the free space map --- that code predates
the FSM by a lot.  It builds its own map of free space during its
initial lazy-VACUUM-equivalent scan that just removes dead tuples.
While I don't think this hurts performance any, I have seen reports of
VACUUM FULL failing outright on large tables because it runs out of
memory for this map.  So that's still another way in which it's not
actually all that useful for huge tables.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Alvaro Herrera
Tom Lane escribió:
 Greg Stark gsst...@mit.edu writes:
  It scans pages *backwards* from the end (which does wonderful things
  on rotating media). Marks each live tuple it finds as moved off,
  finds a new place for it (using the free space map I think?).
 
 BTW, VACUUM FULL doesn't use the free space map --- that code predates
 the FSM by a lot.  It builds its own map of free space during its
 initial lazy-VACUUM-equivalent scan that just removes dead tuples.

Another weird consequence of this is that it bails out if it finds a
tuple larger than it can fit in one of the earlier pages; if there's
dead space to be compacted before that, it's not compacted.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Robert Haas
On Wed, Sep 2, 2009 at 3:30 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas robertmh...@gmail.com wrote:
 I confess to being a little fuzzy on the details of how this
 implementation (seq-scanning the source table for live tuples) is
 different/better from the current VACUUM FULL implementation.  Can
 someone fill me in?


 VACUUM FULL is a *lot* more complex.

 It scans pages *backwards* from the end (which does wonderful things
 on rotating media). Marks each live tuple it finds as moved off,
 finds a new place for it (using the free space map I think?). Insert
 the tuple on the new page and marks it moved in and updates the
 indexes.

 Then it commits the transaction but keeps the lock. Then it has to
 vacuum all the indexes of the references to the old tuples at the end
 of the table. I think it has to commit that too before it can finally
 truncate the table.

 The backwards scan is awful for rotating media. The reading from the
 end and writing to the beginning is bad too, though hopefully the
 cache can help that.

 A lot of the complexity comes in from other parts of the system that
 have to be aware of tuples that have been moved off or moved in.
 They have to be able to check whether the vacuum committed or not.

Ugh.

 That reminds me there was another proposal to do an online vacuum
 full similar to our concurrent index builds. Do noop-updates to tuples
 at the end of the table, hopefully finding space for them earlier in
 the table. Wait until those transactions are no longer visible to
 anyone else and then truncate. (Actually I think you could just not do
 anything and let regular lazy vacuum do the truncate). That might be a
 good practical alternative for sites where copying their entire table
 isn't practical.

I don't have a strong opinion about whether it's better to take an
exclusive lock on everything and reorganize freely, or whether it's
better to try to use MVCC semantics to reduce the locking impact (at a
cost of difficulty in the presence of long-running transactions).  But
I think it would be really nice to have an incremental way to reduce
table bloat.  Locking a table (or even better, part of a table) for a
couple of seconds once an hour for several days or weeks figures to be
practical in some (many?) environments where locking a table for
minutes or hours is not.

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Another weird consequence of this is that it bails out if it finds a
 tuple larger than it can fit in one of the earlier pages; if there's
 dead space to be compacted before that, it's not compacted.

I don't find a lot wrong with that.  The code defines its purpose as
being to shorten the table file length.  Once it hits a page that
can't be emptied, it cannot shorten the file any further, so why
shouldn't it stop?

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Alvaro Herrera
Tom Lane escribió:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Another weird consequence of this is that it bails out if it finds a
  tuple larger than it can fit in one of the earlier pages; if there's
  dead space to be compacted before that, it's not compacted.
 
 I don't find a lot wrong with that.  The code defines its purpose as
 being to shorten the table file length.  Once it hits a page that
 can't be emptied, it cannot shorten the file any further, so why
 shouldn't it stop?

All that work, and it wasn't capable of defragging the other pages?  At
the very least it could register them in the FSM.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane escribió:
 I don't find a lot wrong with that.  The code defines its purpose as
 being to shorten the table file length.  Once it hits a page that
 can't be emptied, it cannot shorten the file any further, so why
 shouldn't it stop?

 All that work, and it wasn't capable of defragging the other pages?  At
 the very least it could register them in the FSM.

You mean like vac_update_fsm() ?

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Alvaro Herrera
Tom Lane escribió:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Tom Lane escribió:
  I don't find a lot wrong with that.  The code defines its purpose as
  being to shorten the table file length.  Once it hits a page that
  can't be emptied, it cannot shorten the file any further, so why
  shouldn't it stop?
 
  All that work, and it wasn't capable of defragging the other pages?  At
  the very least it could register them in the FSM.
 
 You mean like vac_update_fsm() ?

Huh :-)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Ron Mayer
Robert Haas wrote:
 On Tue, Sep 1, 2009 at 9:29 PM, Alvaro
 Herrera alvhe...@commandprompt.com wrote:
 Ron Mayer wrote:
 Greg Stark wrote:
 That's what I want to believe. But picture if you have, say a
 1-terabyte table which is 50% dead tuples and you don't have a spare
 1-terabytes to rewrite the whole table.
 Could one hypothetically do
update bigtable set pk = pk where ctid in (select ctid from bigtable 
 order by ctid desc limit 100);
vacuum;
 and repeat until max(ctid) is small enough?
 I remember Hannu Krosing said they used something like that to shrink
 really bloated tables.  Maybe we should try to explicitly support a
 mechanism that worked in that fashion.  I think I tried it at some point
 and found that the problem with it was that ctid was too limited in what
 it was able to do.
 
 I think a way to incrementally shrink large tables would be enormously
 beneficial.   Maybe vacuum could try to do a bit of that each time it
 runs.

Yet when I try it now, I'm having trouble making it work.
Would you expect the ctid to be going down in the psql session
shown below?  I wonder why it isn't.





regression=# create table shrink_test as select * from tenk1;
SELECT
regression=# delete from shrink_test where (unique2 % 2) = 0;
DELETE 5000
regression=# create index shrink_test_unique1 on shrink_test(unique1);
CREATE INDEX
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,10)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid 
from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,21)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid 
from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,27)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid 
from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
   max
----------
 (333,33)
(1 row)




Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Greg Stark
On Wed, Sep 2, 2009 at 11:55 PM, Ron Mayer rm...@cheapcomplexdevices.com wrote:
 Yet when I try it now, I'm having trouble making it work.
 Would you expect the ctid to be going down in the psql session
 shown below?  I wonder why it isn't.

Even before HOT we preferentially tried to put updated tuples on the
same page they were on before. On pre-8.3 if you did these updates
*without* the vacuum they would eventually be forced to find a new
page and hopefully would find one earlier in the table.

On 8.4 HOT will (hopefully) prevent even that from working. Unless you
have a long-running transaction in the background it will clean up the
old tuples in the chain on the page each time the page fills up.
You've deleted half the tuples on the page so the updates will always
fit in that space.

Eventually you'll hit the maximum number of tuples allowed on the page
dead or alive. But the vacuums are defeating that too.  A special
purpose command could work around all of this.
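
For anyone who wants to watch this mechanism at work, a hedged aside: with
contrib/pageinspect installed (available since 8.3), the line pointers of a
heap page can be examined directly. Page 333 here is just the tail page from
the session quoted above:

```sql
-- Sketch only, assumes contrib/pageinspect is installed.  lp_flags = 1 is
-- a normal tuple, 2 a redirect left behind by HOT pruning, 3 a dead stub.
SELECT lp, lp_flags, t_ctid
  FROM heap_page_items(get_raw_page('shrink_test', 333));
```

Watching lp_flags change across the update/vacuum cycles shows whether HOT
pruning, rather than tuple relocation, is absorbing the updates.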

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-02 Thread Greg Stark
On Wed, Sep 2, 2009 at 8:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 The backwards scan is awful for rotating media. The reading from the
 end and writing to the beginning is bad too, though hopefully the
 cache can help that.

 Yeah.  And all that pales in comparison to what happens in the indexes.
 You have to insert index entries (retail) for each moved-in tuple,

Hm, that could be addressed by buffering index inserts in backend
local memory. That's something Heikki proposed a long time ago
primarily for improving bulk data loading. Basically it would be a
retail version of the bulk loader that we saw at the 10th anniversary
where you merge a sorted list into the index.

You would still have to flush the buffer at transaction commit but
even if it only buffered a few dozen tuples if they're in the same
region of the index it would be a win. In this case it could probably
buffer hundreds and merge them all into the index en masse.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Simon Riggs

On Mon, 2009-08-31 at 18:53 -0400, Alvaro Herrera wrote:

 Regarding sync commits that previously happen and now won't, I think the
 only case worth worrying about is the one in vacuum.c.  Do we need a
 ForceSyncCommit() in there?  I'm not sure if vacuum itself already
 forces sync commit.

VACUUM FULL requires ForceSyncCommit().

Not sure why removing them elsewhere is important? Getting robustness
wrong is a big, bad thing and this opens us to future error. We already
tuned VACUUM so it does very little if it has no work to do, why would
one extra I/O improve things so much? If it ain't broke... 

VACUUM does so many things that I'd rather have it all safely on disk.
I'd feel happier with the rule VACUUM always sync commits, so we all
remember it and can rely upon it to be the same from release to release.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 VACUUM does so many things that I'd rather have it all safely on disk.
 I'd feel happier with the rule VACUUM always sync commits, so we all
 remember it and can rely upon it to be the same from release to release.

Non-FULL vacuum has *never* done a sync commit, except in the unusual
corner case that it moves the database's datfrozenxid, which is a corner
case that didn't even exist until fairly recently.  I think the argument
that we should have it force sync for no reason whatsoever is silly.
We get beat up on a regular basis about spikes in response time;
why would you want to have vacuum creating one when it doesn't need to?

As for the FULL case, the sync commit is to try to protect a horribly
unsafe kluge that should go away entirely (if vacuum full itself doesn't
go away entirely).  That's hardly something I want to institutionalize
either.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Simon Riggs

On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote:

 We get beat up on a regular basis about spikes in response time;
 why would you want to have vacuum creating one when it doesn't need
 to?

If one I/O on a background utility can cause such a spike, we are in
serious shitake. I would be more comfortable if the various important
things VACUUM does were protected by sync commit. I see no reason to
optimise away one I/O just because we might theoretically do so. Any
mistake in the theory and we are exposed. Why take the risk? We do many
things to check and secure our data, why not this one? If this were
suggested separately as an optimisation, you'd laugh and say why
bother?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote:
 We get beat up on a regular basis about spikes in response time;
 why would you want to have vacuum creating one when it doesn't need
 to?

 If one I/O on a background utility can cause such a spike, we are in
 serious shitake. I would be more comfortable if the various important
 things VACUUM does were protected by sync commit. I see no reason to
 optimise away one I/O just because we might theoretically do so. Any
 mistake in the theory and we are exposed. Why take the risk?

*WHAT* risk?  Most vacuums do not do a sync commit, and never have.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Greg Stark
On Tue, Sep 1, 2009 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We get beat up on a regular basis about spikes in response time;
 why would you want to have vacuum creating one when it doesn't need to?

Isn't this sync commit just going to do the same thing that the wal
writer is going to do in at most 200ms anyways?

 As for the FULL case, the sync commit is to try to protect a horribly
 unsafe kluge that should go away entirely (if vacuum full itself doesn't
 go away entirely).

I'm all for throwing away VACUUM FULL btw. I was thinking of proposing
that we replace it with something like CLUSTER which just rewrites the
tuples in the order it finds them.

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.
Also perhaps tables where there are too many large indexes to make
rebuilding them all in one maintenance window practical.

I don't see any way to address these problems without something as
complex as xvac and moved_in/moved_off and without the index bloat
problems. I think we could improve the i/o access patterns we have
currently which make vacuum full so slow, but the fundamental problems
would remain.

So the question is whether those use cases are worth keeping our
existing vacuum full for or whether we could do without it and just
recommend partitioning for people with tables large enough to make
table rewrites impractical.
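
For reference, partitioning in the 8.x line is done with inheritance, so each
child is its own set of files that can be rewritten or dropped independently
(the table, child naming, and the logdate column are hypothetical):

```sql
-- Hedged sketch: one child table per month.  Heavy maintenance (CLUSTER,
-- VACUUM FULL, or simply DROP TABLE of expired data) then touches only one
-- manageably sized child at a time instead of the whole terabyte.
CREATE TABLE bigtable_200909 (
    CHECK (logdate >= DATE '2009-09-01' AND logdate < DATE '2009-10-01')
) INHERITS (bigtable);
```

With constraint_exclusion enabled, the planner can also skip children whose
CHECK constraints rule them out of a query.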

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Alvaro Herrera
Greg Stark wrote:

 The use cases where VACUUM FULL wins currently are where storing two
 copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL?  If that's really
a problem then there ain't that many dead tuples around.

 Also perhaps tables where there are too many large indexes to make
 rebuilding them all in one maintenance window practical.

If that's the concern maybe we oughta do something about concurrently
re-creating those indexes somehow.  Plain REINDEX doesn't work of
course, but maybe we can do some trick with creating a new index and
dropping the original one afterwards.
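
The trick alluded to here can be sketched as follows (index and table names
are hypothetical; CREATE INDEX CONCURRENTLY has existed since 8.2). Note that
this does not directly apply to indexes backing PRIMARY KEY or UNIQUE
constraints:

```sql
-- Hedged sketch: rebuild a bloated index without holding a long exclusive
-- lock, by building a replacement concurrently and swapping it in.
CREATE INDEX CONCURRENTLY bigtable_pk_new ON bigtable (pk);
DROP INDEX bigtable_pk;                            -- brief exclusive lock
ALTER INDEX bigtable_pk_new RENAME TO bigtable_pk;
```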

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Greg Stark
On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera alvhe...@commandprompt.com wrote:
 The use cases where VACUUM FULL wins currently are where storing two
 copies of the table and its indexes concurrently just isn't practical.

 Yeah, but then do you really need to use VACUUM FULL?  If that's really
 a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say a
1-terabyte table which is 50% dead tuples and you don't have a spare
1-terabytes to rewrite the whole table.

 Also perhaps tables where there are too many large indexes to make
 rebuilding them all in one maintenance window practical.

 If that's the concern maybe we oughta do something about concurrently
 re-creating those indexes somehow.  Plain REINDEX doesn't work of
 course, but maybe we can do some trick with creating a new index and
 dropping the original one afterwards.

Well that doesn't really work if you want to rewrite the table.
CLUSTER has to rebuild all the indexes when it's done.


I think the solution for both of these is actually partitioning. The
bottom line is that having a single table which contains very large
amounts of data is awkward to maintain.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
 Herrera alvhe...@commandprompt.com wrote:
 The use cases where VACUUM FULL wins currently are where storing two
 copies of the table and its indexes concurrently just isn't practical.
 
 Yeah, but then do you really need to use VACUUM FULL?  If that's really
 a problem then there ain't that many dead tuples around.

 That's what I want to believe. But picture if you have, say a
 1-terabyte table which is 50% dead tuples and you don't have a spare
 1-terabytes to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards.  You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

regards, tom lane



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Ron Mayer
Greg Stark wrote:
 
 That's what I want to believe. But picture if you have, say a
 1-terabyte table which is 50% dead tuples and you don't have a spare
 1-terabytes to rewrite the whole table.

Could one hypothetically do
   update bigtable set pk = pk where ctid in (select ctid from bigtable order 
by ctid desc limit 100);
   vacuum;
and repeat until max(ctid) is small enough?

Sure, it'll take longer than vacuum full; but at first glance
it seems lightweight enough to do even on a live, heavily accessed
table.
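
Spelled out, one iteration of the loop would look like this (names are
hypothetical; since VACUUM cannot run inside a transaction block, a client
script has to drive the repetition):

```sql
-- Hedged sketch of a single pass; repeat while max(ctid) keeps dropping.
UPDATE bigtable SET pk = pk
 WHERE ctid IN (SELECT ctid FROM bigtable ORDER BY ctid DESC LIMIT 100);
VACUUM bigtable;                 -- truncates trailing empty pages, if any
SELECT max(ctid) FROM bigtable;  -- progress check for the driving script
```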

IIRC I tried something like this once, and it worked to some extent,
but after a few loops didn't shrink the table as much as I had expected.




Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Alvaro Herrera
Ron Mayer wrote:
 Greg Stark wrote:
  
  That's what I want to believe. But picture if you have, say a
  1-terabyte table which is 50% dead tuples and you don't have a spare
  1-terabytes to rewrite the whole table.
 
 Could one hypothetically do
update bigtable set pk = pk where ctid in (select ctid from bigtable order 
 by ctid desc limit 100);
vacuum;
 and repeat until max(ctid) is small enough?

I remember Hannu Krosing said they used something like that to shrink
really bloated tables.  Maybe we should try to explicitly support a
mechanism that worked in that fashion.  I think I tried it at some point
and found that the problem with it was that ctid was too limited in what
it was able to do.

The neat thing is that now that we have the visibility fork, each vacuum
needn't scan the whole table each time.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Robert Haas
On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
 Herrera alvhe...@commandprompt.com wrote:
 The use cases where VACUUM FULL wins currently are where storing two
 copies of the table and its indexes concurrently just isn't practical.

 Yeah, but then do you really need to use VACUUM FULL?  If that's really
 a problem then there ain't that many dead tuples around.

 That's what I want to believe. But picture if you have, say a
 1-terabyte table which is 50% dead tuples and you don't have a spare
 1-terabytes to rewrite the whole table.

 But trying to VACUUM FULL that table is going to be horridly painful
 too, and you'll still have bloated indexes afterwards.  You might as
 well just live with the 50% waste, especially since if you did a
 full-table update once you'll probably do it again sometime.

 I'm having a hard time believing that VACUUM FULL really has any
 interesting use-case anymore.

What if your large table doesn't have an index?  Then there's no way to cluster.

I'm a bit skeptical about partitioning as a solution, too.  The
planner is just not clever enough with partitioned tables, yet.

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Robert Haas
On Tue, Sep 1, 2009 at 9:29 PM, Alvaro
Herrera alvhe...@commandprompt.com wrote:
 Ron Mayer wrote:
 Greg Stark wrote:
 
  That's what I want to believe. But picture if you have, say a
  1-terabyte table which is 50% dead tuples and you don't have a spare
  1-terabytes to rewrite the whole table.

 Could one hypothetically do
    update bigtable set pk = pk where ctid in (select ctid from bigtable 
 order by ctid desc limit 100);
    vacuum;
 and repeat until max(ctid) is small enough?

 I remember Hannu Krosing said they used something like that to shrink
 really bloated tables.  Maybe we should try to explicitly support a
 mechanism that worked in that fashion.  I think I tried it at some point
 and found that the problem with it was that ctid was too limited in what
 it was able to do.

I think a way to incrementally shrink large tables would be enormously
beneficial.   Maybe vacuum could try to do a bit of that each time it
runs.

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Alvaro Herrera
Robert Haas escribió:
 On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  But trying to VACUUM FULL that table is going to be horridly painful
  too, and you'll still have bloated indexes afterwards.  You might as
  well just live with the 50% waste, especially since if you did a
  full-table update once you'll probably do it again sometime.
 
  I'm having a hard time believing that VACUUM FULL really has any
  interesting use-case anymore.
 
 What if your large table doesn't have an index?  Then there's no way to 
 cluster.

But there's nothing saying we cannot provide a version of CLUSTER that
does not follow any index and just copies the live tuples.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Robert Haas
On Tue, Sep 1, 2009 at 10:58 PM, Alvaro
Herrera alvhe...@commandprompt.com wrote:
 Robert Haas escribió:
 On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  But trying to VACUUM FULL that table is going to be horridly painful
  too, and you'll still have bloated indexes afterwards.  You might as
  well just live with the 50% waste, especially since if you did a
  full-table update once you'll probably do it again sometime.
 
  I'm having a hard time believing that VACUUM FULL really has any
  interesting use-case anymore.

 What if your large table doesn't have an index?  Then there's no way to 
 cluster.

 But there's nothing saying we cannot provide a version of CLUSTER that
 does not follow any index and just copies the live tuples.

Agreed.

...Robert



Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Rod Taylor
On Tue, Sep 1, 2009 at 19:34, Greg Stark gsst...@mit.edu wrote:

 On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
 Herrera alvhe...@commandprompt.com wrote:
  The use cases where VACUUM FULL wins currently are where storing two
  copies of the table and its indexes concurrently just isn't practical.
 
  Yeah, but then do you really need to use VACUUM FULL?  If that's really
  a problem then there ain't that many dead tuples around.

 That's what I want to believe. But picture if you have, say a
 1-terabyte table which is 50% dead tuples and you don't have a spare
 1-terabytes to rewrite the whole table.



It would be interesting if there were something between VACUUM FULL and
CLUSTER which could, say, work on a single 1GB segment at a time in a
manner similar to CLUSTER.

You would still end up with index bloat like vacuum full, though perhaps not
as bad, but shuffling around the tuples should be faster.


The idea here is that the files can be truncated individually. Two 500MB
files is pretty much the same as a single 1GB file on disk.


Of course, I'm hand waving and don't have the technical expertise to figure
out if it can be done easily within PostgreSQL.


Re: [HACKERS] remove flatfiles.c

2009-09-01 Thread Jaime Casanova
On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas robertmh...@gmail.com wrote:

 I'm a bit skeptical about partitioning as a solution, too.  The
 planner is just not clever enough with partitioned tables, yet.


Analyzing and vacuuming a *very* big table, or even scanning a huge
index, is no joke either...
And yes, the planner is not very clever about partitioning, and
certainly that is something we need to fix, not something we have to
live with... not that it will be easy, but hey! we have very brilliant
people here (you being one of them).

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157



[HACKERS] remove flatfiles.c

2009-08-31 Thread Alvaro Herrera
This patch removes flatfiles.c for good.

It doesn't change the keeping of locks in dbcommands.c and user.c,
because at least some of them are still required.

Regarding sync commits that previously happen and now won't, I think the
only case worth worrying about is the one in vacuum.c.  Do we need a
ForceSyncCommit() in there?  I'm not sure if vacuum itself already
forces sync commit.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/access/transam/twophase_rmgr.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/twophase_rmgr.c,v
retrieving revision 1.8
diff -c -p -r1.8 twophase_rmgr.c
*** src/backend/access/transam/twophase_rmgr.c	1 Jan 2009 17:23:36 -	1.8
--- src/backend/access/transam/twophase_rmgr.c	31 Aug 2009 21:52:17 -
***
*** 18,24 
  #include "commands/async.h"
  #include "pgstat.h"
  #include "storage/lock.h"
- #include "utils/flatfiles.h"
  #include "utils/inval.h"
  
  
--- 18,23 
*** const TwoPhaseCallback twophase_recover_
*** 27,33 
  	NULL,		/* END ID */
  	lock_twophase_recover,		/* Lock */
  	NULL,		/* Inval */
- 	NULL,		/* flat file update */
  	NULL,		/* notify/listen */
  	NULL		/* pgstat */
  };
--- 26,31 
*** const TwoPhaseCallback twophase_postcomm
*** 37,43 
  	NULL,		/* END ID */
  	lock_twophase_postcommit,	/* Lock */
  	inval_twophase_postcommit,	/* Inval */
- 	flatfile_twophase_postcommit,		/* flat file update */
  	notify_twophase_postcommit, /* notify/listen */
  	pgstat_twophase_postcommit	/* pgstat */
  };
--- 35,40 
*** const TwoPhaseCallback twophase_postabor
*** 47,53 
  	NULL,		/* END ID */
  	lock_twophase_postabort,	/* Lock */
  	NULL,		/* Inval */
- 	NULL,		/* flat file update */
  	NULL,		/* notify/listen */
  	pgstat_twophase_postabort	/* pgstat */
  };
--- 44,49 
Index: src/backend/access/transam/xact.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.274
diff -c -p -r1.274 xact.c
*** src/backend/access/transam/xact.c	11 Jun 2009 14:48:54 -	1.274
--- src/backend/access/transam/xact.c	31 Aug 2009 21:50:41 -
***
*** 43,49 
  #include "storage/sinvaladt.h"
  #include "storage/smgr.h"
  #include "utils/combocid.h"
- #include "utils/flatfiles.h"
  #include "utils/guc.h"
  #include "utils/inval.h"
  #include "utils/memutils.h"
--- 43,48 
*** CommitTransaction(void)
*** 1608,1619 
  	/* NOTIFY commit must come before lower-level cleanup */
  	AtCommit_Notify();
  
- 	/*
- 	 * Update flat files if we changed pg_database, pg_authid or
- 	 * pg_auth_members.  This should be the last step before commit.
- 	 */
- 	AtEOXact_UpdateFlatFiles(true);
- 
  	/* Prevent cancel/die interrupt while cleaning up */
  	HOLD_INTERRUPTS();
  
--- 1607,1612 
*** PrepareTransaction(void)
*** 1797,1803 
  	/* close large objects before lower-level cleanup */
  	AtEOXact_LargeObject(true);
  
! 	/* NOTIFY and flatfiles will be handled below */
  
  	/*
  	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
--- 1790,1796 
  	/* close large objects before lower-level cleanup */
  	AtEOXact_LargeObject(true);
  
! 	/* NOTIFY will be handled below */
  
  	/*
  	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
*** PrepareTransaction(void)
*** 1860,1866 
  	StartPrepare(gxact);
  
  	AtPrepare_Notify();
- 	AtPrepare_UpdateFlatFiles();
  	AtPrepare_Inval();
  	AtPrepare_Locks();
  	AtPrepare_PgStat();
--- 1853,1858 
*** PrepareTransaction(void)
*** 1909,1915 
  	/* Clean up the snapshot manager */
  	AtEarlyCommit_Snapshot();
  
! 	/* notify and flatfiles don't need a postprepare call */
  
  	PostPrepare_PgStat();
  
--- 1901,1907 
  	/* Clean up the snapshot manager */
  	AtEarlyCommit_Snapshot();
  
! 	/* notify doesn't need a postprepare call */
  
  	PostPrepare_PgStat();
  
*** AbortTransaction(void)
*** 2036,2042 
  	AtAbort_Portals();
  	AtEOXact_LargeObject(false);	/* 'false' means it's abort */
  	AtAbort_Notify();
- 	AtEOXact_UpdateFlatFiles(false);
  
  	/*
  	 * Advertise the fact that we aborted in pg_clog (assuming that we got as
--- 2028,2033 
*** CommitSubTransaction(void)
*** 3764,3771 
	AtEOSubXact_LargeObject(true, s->subTransactionId,
			s->parent->subTransactionId);
	AtSubCommit_Notify();
- 	AtEOSubXact_UpdateFlatFiles(true, s->subTransactionId,
- s->parent->subTransactionId);

	CallSubXactCallbacks(SUBXACT_EVENT_COMMIT_SUB, s->subTransactionId,
		 s->parent->subTransactionId);
--- 3755,3760 
*** AbortSubTransaction(void)
*** 3885,3892 
  		

Re: [HACKERS] remove flatfiles.c

2009-08-31 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 This patch removes flatfiles.c for good.

Aw, you beat me to it.

 Regarding sync commits that previously happen and now won't, I think the
 only case worth worrying about is the one in vacuum.c.  Do we need a
 ForceSyncCommit() in there?  I'm not sure if vacuum itself already
 forces sync commit.

Hmm, I had been assuming we wouldn't need that anymore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove flatfiles.c

2009-08-31 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:

  Regarding sync commits that previously happen and now won't, I think the
  only case worth worrying about is the one in vacuum.c.  Do we need a
  ForceSyncCommit() in there?  I'm not sure if vacuum itself already
  forces sync commit.
 
 Hmm, I had been assuming we wouldn't need that anymore.

The comment in user.c and dbcommands.c says


/*
 * Force synchronous commit, thus minimizing the window between
 * creation of the database files and commital of the transaction. If
 * we crash before committing, we'll have a DB that's taking up disk
 * space but is not in pg_database, which is not good.
 */
ForceSyncCommit();

so I think those ones are still necessary.  There's another call in
RenameDatabase() which I don't think needs a sync commit (because it
won't change the dir name), and one in vacuum.c:

/*
!* If we were able to advance datfrozenxid, mark the flat-file copy of
!* pg_database for update at commit, and see if we can truncate pg_clog.
!* Also force update if the shared XID-wrap-limit info is stale.
 */
if (dirty || !TransactionIdLimitIsValid())
-   {
-   database_file_update_needed();
vac_truncate_clog(newFrozenXid);
-   }
  }

AFAICT this doesn't need a sync commit.  (Right now, VACUUM FULL forces
one, but lazy vacuum doesn't).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] remove flatfiles.c

2009-08-31 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 Hmm, I had been assuming we wouldn't need that anymore.

 The comment in user.c and dbcommands.c says [...]
 so I think those ones are still necessary.

Yeah, after a look through the code I think you can trust the associated
comments: if it says it needs sync commit, put in ForceSyncCommit, else
we don't need it.

regards, tom lane
