Re: [HACKERS] Vacuum questions...

2005-09-30 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 12:50:13AM +0300, Hannu Krosing wrote:
> On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> > On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > > 
> > > >Would it be difficult to vacuum as part of a dump? The reasoning behind
> > > >this is that you have to read the table to do the dump anyway, 
> > > 
> > > I think aside from what's been said so far, it would be rather difficult 
> > > anyway. pg_dump relies on MVCC and needs to run in one transaction to 
> > > see a consistent snapshot while vacuum jiggles around with transactions 
> > > in some rather non-standard way.
> > 
> > Is this true even if they were in different connections?
> > 
> > My (vague) understanding of the vacuum process is that it first vacuums
> > indexes, and then vacuums the heap. 
> 
> Actually, (lazy) vacuum does this:
> 
> 1) scan heap, collect ctids of rows to remove
> 2) clean indexes
> 3) clean heap
> 
> > Since we don't dump indexes, there's
> > nothing for backup to do while those are vacuumed, so my idea is:
> > 
> > pg_dump:
> > foreach (table)
> > spawn vacuum
> > wait for vacuum to hit heap
> > start copy
> > wait for analyze to finish
> > next;
> 
> Probably the first heap scan of vacuum would go faster than the dump,
> as it does not have to write out anything, and the second scan (nr. 3
> in the above list) would be either faster or slower, as it has to lock
> each page and rearrange tuples there.
> 
> So it would be very hard to synchronize vacuum with either of them.

Well, I guess it depends on what the dump is writing to. It also depends
on the available cache, I expect.

Is this something that could be hacked together fairly easily just for
testing purposes? Would firing off a VACUUM tablename at the same time
as a COPY tablename be a good enough approximation?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Vacuum questions...

2005-09-28 Thread Hannu Krosing
On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > 
> > >Would it be difficult to vacuum as part of a dump? The reasoning behind
> > >this is that you have to read the table to do the dump anyway, 
> > 
> > I think aside from what's been said so far, it would be rather difficult 
> > anyway. pg_dump relies on MVCC and needs to run in one transaction to 
> > see a consistent snapshot while vacuum jiggles around with transactions 
> > in some rather non-standard way.
> 
> Is this true even if they were in different connections?
> 
> My (vague) understanding of the vacuum process is that it first vacuums
> indexes, and then vacuums the heap. 

Actually, (lazy) vacuum does this:

1) scan heap, collect ctids of rows to remove
2) clean indexes
3) clean heap
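The three phases above can be sketched in miniature. This is a toy illustration in plain Python, not PostgreSQL internals; the heap pages, ctids, and the single-column index are stand-ins:

```python
# Toy model of the three phases of lazy VACUUM (a sketch for illustration,
# not PostgreSQL internals; ctids, pages, and the index are stand-ins).

def lazy_vacuum(heap, index):
    # Phase 1: scan the heap, collecting the ctids of dead tuples.
    dead = [ctid for page in heap for (ctid, live) in page if not live]
    # Phase 2: clean the indexes -- drop entries pointing at dead ctids.
    for key in list(index):
        if index[key] in dead:
            del index[key]
    # Phase 3: clean the heap -- remove the dead tuples page by page.
    for i, page in enumerate(heap):
        heap[i] = [(ctid, live) for (ctid, live) in page if live]
    return dead

heap = [[((0, 1), True), ((0, 2), False)],   # page 0: one live, one dead
        [((1, 1), False), ((1, 2), True)]]   # page 1: one dead, one live
index = {"a": (0, 1), "b": (0, 2), "c": (1, 1), "d": (1, 2)}

removed = lazy_vacuum(heap, index)
print(removed)            # -> [(0, 2), (1, 1)]
print(sorted(index))      # -> ['a', 'd']
```

Note that the index cleanup (phase 2) can only start once phase 1 has the complete list of dead ctids, which is why the heap is scanned twice.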

> Since we don't dump indexes, there's
> nothing for backup to do while those are vacuumed, so my idea is:
> 
> pg_dump:
> foreach (table)
> spawn vacuum
> wait for vacuum to hit heap
> start copy
> wait for analyze to finish
> next;

Probably the first heap scan of vacuum would go faster than the dump, as
it does not have to write out anything, and the second scan (nr. 3 in
the above list) would be either faster or slower, as it has to lock each
page and rearrange tuples there.

So it would be very hard to synchronize vacuum with either of them.

-- 
Hannu Krosing <[EMAIL PROTECTED]>




Re: [HACKERS] Vacuum questions...

2005-09-27 Thread Jim C. Nasby
On Tue, Sep 27, 2005 at 07:12:21PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > AFAIK, this should allow both to run in separate transactions.
> 
> ... and pretty much destroy any synchronization between the two scans,
> which was sort of the point wasn't it?

Aren't there ways to sync them outside of a transaction? My theory is
that you don't need to synchronize them at the tuple level, since
whichever one gets ahead reading the heap will be pulling the data off
the drive, while the one that's behind will just grab it out of the
buffer (or at worst, the kernel's cache). So all you should need to do
is start both scans at about (as in within a few seconds) the same time.
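That intuition can be sketched with a toy LRU-cache model (a simulation only, with page numbers standing in for heap blocks; real buffer managers are more subtle):

```python
from collections import OrderedDict

def simulate(npages, cache_size, lag):
    """Two sequential scans over the same pages, the second `lag` pages
    behind the first, sharing one LRU buffer cache.  Returns the number
    of physical reads each scan performs."""
    cache = OrderedDict()              # page -> None, in LRU order
    reads = [0, 0]

    def fetch(scan, page):
        if page in cache:
            cache.move_to_end(page)    # cache hit: no physical read
        else:
            reads[scan] += 1           # cache miss: physical read
            cache[page] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)

    for step in range(npages + lag):
        if step < npages:
            fetch(0, step)             # leading scan (say, vacuum)
        if step >= lag:
            fetch(1, step - lag)       # trailing scan (say, COPY)
    return reads

# Started within a few pages of each other, the trailer reads nothing:
print(simulate(npages=1000, cache_size=64, lag=8))    # -> [1000, 0]
# Drift apart beyond the cache size, and both scans hit the disk:
print(simulate(npages=1000, cache_size=64, lag=500))  # -> [1000, 1000]
```

The knife edge is the cache size: as long as the lag between the scans stays under it, the second scan is free; once they drift further apart, the I/O doubles.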

Heck, if vacuum were made to put more verbose info in its process status,
then it could be as simple as having pg_dump start a vacuum of a table
in a separate connection and just watching for the status to indicate it
had started vacuuming the table.

I *think* this shouldn't be too hard to test, which is good since it's
all theory right now. :)



Re: [HACKERS] Vacuum questions...

2005-09-27 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> AFAIK, this should allow both to run in separate transactions.

... and pretty much destroy any synchronization between the two scans,
which was sort of the point wasn't it?

regards, tom lane



Re: [HACKERS] Vacuum questions...

2005-09-27 Thread Jim C. Nasby
On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> 
> >Would it be difficult to vacuum as part of a dump? The reasoning behind
> >this is that you have to read the table to do the dump anyway, 
> 
> I think aside from what's been said so far, it would be rather difficult 
> anyway. pg_dump relies on MVCC and needs to run in one transaction to 
> see a consistent snapshot while vacuum jiggles around with transactions 
> in some rather non-standard way.

Is this true even if they were in different connections?

My (vague) understanding of the vacuum process is that it first vacuums
indexes, and then vacuums the heap. Since we don't dump indexes, there's
nothing for backup to do while those are vacuumed, so my idea is:

pg_dump:
foreach (table)
spawn vacuum
wait for vacuum to hit heap
start copy
wait for analyze to finish
next;

dump_vacuum (table):
foreach (index on table)
vacuum index
next;

notify pg_dump we're going to start vacuum of heap
vacuum heap

if we should analyze {
analyze table
}

notify pg_dump analyze is done
exit
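The notify/wait handshake in the pseudocode above can be simulated with two threads and events. This is plain Python; the strings appended to `log` and the index names are stand-ins for real work, not database calls:

```python
import threading

# Events stand in for the two "notify pg_dump ..." points in the
# pseudocode; strings appended to `log` stand in for real work.
heap_vacuum_started = threading.Event()
analyze_done = threading.Event()
log = []

def dump_vacuum(table):
    for idx in ("idx_a", "idx_b"):       # vacuum each index first
        log.append(f"vacuum index {idx}")
    heap_vacuum_started.set()            # notify: starting vacuum of heap
    log.append(f"vacuum heap {table}")
    log.append(f"analyze {table}")       # (the optional analyze step)
    analyze_done.set()                   # notify: analyze is done

def pg_dump(table):
    worker = threading.Thread(target=dump_vacuum, args=(table,))
    worker.start()                       # spawn vacuum
    heap_vacuum_started.wait()           # wait for vacuum to hit the heap
    log.append(f"copy {table}")          # start COPY alongside the heap scan
    analyze_done.wait()                  # wait for analyze to finish
    worker.join()

pg_dump("big_table")
print(log)  # COPY always starts after both indexes are vacuumed
```

The events guarantee only the ordering the pseudocode asks for: COPY never starts before the heap phase, and pg_dump never moves to the next table before analyze finishes; within the heap phase the two run concurrently.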

AFAIK, this should allow both to run in separate transactions. Granted,
it would slow down the dump, since it would have to wait while indexes
were being vacuumed, but it would win when it came to the heap.



Re: [HACKERS] Vacuum questions...

2005-09-27 Thread Jan Wieck

On 9/24/2005 8:17 PM, Jim C. Nasby wrote:

> Would it be difficult to vacuum as part of a dump? The reasoning behind
> this is that you have to read the table to do the dump anyway,

I think aside from what's been said so far, it would be rather difficult
anyway. pg_dump relies on MVCC and needs to run in one transaction to
see a consistent snapshot while vacuum jiggles around with transactions
in some rather non-standard way.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Vacuum questions...

2005-09-27 Thread Gaetano Mendola
Gaetano Mendola wrote:
> Alvaro Herrera wrote:
>> On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
>>> Joshua D. Drake wrote:
>>>> Autovacuum is integrated into the backend for 8.1
>>> Can I set the autovacuum parameter per table instead of per
>>> engine ?
>> Yes.
> 

Reading the 8.1 release note I found:


Move /contrib/pg_autovacuum into the main server (Alvaro)

Integrating autovacuum into the server allows it to be automatically started
and stopped in sync with the database server, and allows autovacuum to be
configured from postgresql.conf.


Maybe it would be worth mentioning that it was not exactly pg_autovacuum
that was moved, because, for example, you can now set parameters per
table, which pg_autovacuum did not allow.


Regards
Gaetano Mendola





Re: [HACKERS] Vacuum questions...

2005-09-26 Thread Gaetano Mendola
Alvaro Herrera wrote:
> On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
>> Joshua D. Drake wrote:
> 
>>> Autovacuum is integrated into the backend for 8.1
>> Can I set the autovacuum parameter per table instead of per
>> engine ?
> 
> Yes.

Finally :-)

good work.

Regards
Gaetano Mendola



Re: [HACKERS] Vacuum questions...

2005-09-26 Thread Jim C. Nasby
On Sat, Sep 24, 2005 at 08:25:30PM -0700, Joshua D. Drake wrote:
> Jim C. Nasby wrote:
> 
> >Would it be difficult to vacuum as part of a dump? The reasoning behind
> >this is that you have to read the table to do the dump anyway, so it
> >would be a good time to be able to piggy-back other operations that need
> >to read the entire table on top. I know vacuuming of indexes complicates
> >this, so it's probably not as simple as just firing off a vacuum and
> >copy at the same time (although that idea is probably worth testing,
> >since it might still be a win).
> > 
> >
> This would be a nightmare on a large database. Think of how
> long it takes to dump 20 gig, now add how long it is going to
> take to vacuum that size of DB, now think about a 500 gig
> database.

Who says the speed of the dump is limited by the disks the
database is on? I suspect in many cases it's not.

Anyway, this would certainly be an optional step, so if it turns out it
hurts you in your environment, you don't have to use it.

> Actually this also probably would not gain you much in 8.1
> as vacuum in theory is already dealing with itself.

Maybe, maybe not. If you're already reading the entire table to dump it,
why read the entire table again some other time to vacuum it? Just get
it all done at once.

Once we get a 'map of pages that need vacuuming', which will presumably
greatly speed up vacuum, then maybe your argument makes sense. Right now
I don't see how it helps anything.



Re: [HACKERS] Vacuum questions...

2005-09-26 Thread Jim C. Nasby
On Sun, Sep 25, 2005 at 11:50:14AM -0400, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Perhaps VACUUM could send some statistics after each N pages and this
> > would then be available through something similar to pg_statistics
> > table.
> 
> Why not just have it send some text to be displayed in the "current
> command" field of pg_stat_activity?  The infrastructure is all there
> already for that.

If someone wanted to write a nice interface showing the status of a
vacuum it would be easier if they didn't have to parse a text field...
but *anything* would be a vast improvement over what we have now.

newbie TODO?



Re: [HACKERS] Vacuum questions...

2005-09-26 Thread Alvaro Herrera
On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
> Joshua D. Drake wrote:

> > Autovacuum is integrated into the backend for 8.1
> 
> Can I set the autovacuum parameter per table instead of per
> engine ?

Yes.

-- 
Alvaro Herrera Architect, http://www.EnterpriseDB.com
Jude: I wish humans laid eggs
Ringlord: Why would you want humans to lay eggs?
Jude: So I can eat them



Re: [HACKERS] Vacuum questions...

2005-09-26 Thread Gaetano Mendola
Joshua D. Drake wrote:

> Hannu Krosing wrote:
>
>> On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
>>
>>> Actually this also probably would not gain you much in 8.1
>>> as vacuum in theory is already dealing with itself.
>>
>> Interesting. Could you explain it in a more detailed way ?
>> How does vacuum "deal with itself" in 8.1 ?
>
> Autovacuum is integrated into the backend for 8.1

Can I set the autovacuum parameters per table instead of per
engine? I'm using pg_autovacuum right now on 7.4 and it is not
enough, because some tables (one that implements a materialized
view, for example) are far above average engine usage, and other
tables are so huge that they go unanalyzed for months.
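For what it's worth, in 8.1 the per-table knobs are rows in a system catalog rather than GUCs. A sketch of what enabling custom thresholds for one table might look like (column names and values quoted from memory, `my_matview` is a hypothetical table; verify against the 8.1 pg_autovacuum catalog documentation before use):

```sql
-- Sketch only: in 8.1, per-table autovacuum settings are rows in the
-- pg_autovacuum system catalog (column names from memory -- check the
-- 8.1 documentation before relying on them).
INSERT INTO pg_autovacuum
       (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
VALUES ('my_matview'::regclass, true, 200, 0.1, 50, 0.05, -1, -1);
```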


Regards
Gaetano Mendola





Re: [HACKERS] Vacuum questions...

2005-09-25 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Perhaps VACUUM could send some statistics after each N pages and this
> would then be available through something similar to pg_statistics
> table.

Why not just have it send some text to be displayed in the "current
command" field of pg_stat_activity?  The infrastructure is all there
already for that.
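That field is already visible from SQL, so a DBA could poll it along these lines (8.1-era column names; these columns were renamed in much later releases, and the LIKE filter assumes vacuum's status text would begin with "VACUUM"):

```sql
-- Poll the "current command" field that vacuum progress text would
-- show up in (procpid/current_query are the 8.x column names).
SELECT procpid, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE 'VACUUM%';
```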

regards, tom lane



Re: [HACKERS] Vacuum questions...

2005-09-25 Thread Joshua D. Drake

Hannu Krosing wrote:

> On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
>
>> Actually this also probably would not gain you much in 8.1
>> as vacuum in theory is already dealing with itself.
>
> Interesting. Could you explain it in a more detailed way ?
> How does vacuum "deal with itself" in 8.1 ?

Autovacuum is integrated into the backend for 8.1.

> Not sure of a "quick scan" approach, especially for tables big enough
> for the progress info to be interesting (in my experience a scan is
> never quick).

It would be a seq scan, so on a larger table it would probably take a
long time. I was thinking there could be some mapping of known dead
rows or something so we didn't have to scan the pages for the
statistics.

We of course would scan to do the actual work, but if vacuum cleared
the map while doing the work it may not be that bad.

Sincerely,

Joshua D. Drake



> Perhaps VACUUM could send some statistics after each N pages and this
> would then be available through something similar to pg_statistics
> table.

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
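The "mapping of known dead rows" idea above can be sketched as a per-page bitmap kept alongside the heap. This is a toy model, not a design (PostgreSQL later grew similar side structures); `Table`, `delete`, and `vacuum` here are illustrative names only:

```python
class Table:
    """Heap with a side 'dead map': the set of pages known to contain
    dead rows, maintained as deletes happen."""

    def __init__(self, npages):
        self.pages = [[] for _ in range(npages)]   # dead tuple ids per page
        self.dead_map = set()                      # pages with dead rows

    def delete(self, page, tid):
        self.pages[page].append(tid)   # tuple becomes dead
        self.dead_map.add(page)        # cheap: one set insert per delete

    def vacuum(self):
        scanned = 0
        for page in sorted(self.dead_map):   # visit only flagged pages
            self.pages[page].clear()
            scanned += 1
        self.dead_map.clear()                # clear the map while working
        return scanned

t = Table(npages=10_000)
for page in (3, 3, 42, 9_999):
    t.delete(page, tid=1)
print(t.vacuum())   # -> 3 pages scanned, not 10,000
```

The payoff is exactly the one discussed in the thread: vacuum's cost becomes proportional to the number of dirtied pages, not the table size, and clearing the map as part of the cleanup keeps it honest.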




Re: [HACKERS] Vacuum questions...

2005-09-25 Thread Hannu Krosing
On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:

> Actually this also probably would not gain you much in 8.1
> as vacuum in theory is already dealing with itself.

Interesting. Could you explain it in a more detailed way ?
How does vacuum "deal with itself" in 8.1 ?

> >Also, would it be possible to add some means to check the status of a
> >running vacuum? Even with vacuum verbose, once it starts in on a large
> >table you have no way to know how far along it is.
> >  
> >
> That is an interesting thought... Perhaps a quick scan of
> the table to see how many dead rows there are? Then check
> back every n/10 ? Hmmm... I am not a C guy so I don't know if
> that is technically feasible (although probably possible) but it
> is interesting from a DBA perspective.

Not sure of a "quick scan" approach, especially for tables big enough
for the progress info to be interesting (in my experience a scan is
never quick).

Perhaps VACUUM could send some statistics after each N pages and this
would then be available through something similar to pg_statistics
table.





Re: [HACKERS] Vacuum questions...

2005-09-24 Thread Alvaro Herrera
On Sat, Sep 24, 2005 at 07:17:38PM -0500, Jim C. Nasby wrote:

> Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
> a miss as not in the database buffer, or not in the kernel buffer?

The database buffer.

> I
> remember discussions about trying to track IO request times to try and
> determine if something came out of kernel buffers or not, but AFAIK
> that's all vaporware right now...

I don't remember the discussion, but it certainly hasn't been
implemented.
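For reference, the cost-based delay knobs involved look like this (values are the common 8.0/8.1-era settings as I recall them; check your version's postgresql.conf documentation rather than trusting this sketch):

```
# postgresql.conf -- cost-based vacuum delay (common 8.0/8.1 values;
# vacuum_cost_delay defaults to 0, i.e. disabled, for manual VACUUM)
vacuum_cost_delay = 10          # ms to nap once the cost limit is hit
vacuum_cost_page_hit = 1        # page found in shared buffers
vacuum_cost_page_miss = 10      # page read in: a *database* buffer miss
vacuum_cost_page_dirty = 20     # page dirtied by vacuum
vacuum_cost_limit = 200         # accumulated cost that triggers the nap
```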

-- 
Alvaro Herrerahttp://www.PlanetPostgreSQL.org
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)



Re: [HACKERS] Vacuum questions...

2005-09-24 Thread Joshua D. Drake

Jim C. Nasby wrote:

> Would it be difficult to vacuum as part of a dump? The reasoning behind
> this is that you have to read the table to do the dump anyway, so it
> would be a good time to be able to piggy-back other operations that need
> to read the entire table on top. I know vacuuming of indexes complicates
> this, so it's probably not as simple as just firing off a vacuum and
> copy at the same time (although that idea is probably worth testing,
> since it might still be a win).

This would be a nightmare on a large database. Think of how
long it takes to dump 20 gig, now add how long it is going to
take to vacuum that size of DB, now think about a 500 gig
database.

Actually this also probably would not gain you much in 8.1
as vacuum in theory is already dealing with itself.

> When dropping a table or index, is its space immediately released in
> the FSM?

I would have to double check, but I believe you would have to
vacuum to reclaim the space to the FSM because the relationship
is still there, just like when you delete (but not truncate).

> Also, would it be possible to add some means to check the status of a
> running vacuum? Even with vacuum verbose, once it starts in on a large
> table you have no way to know how far along it is.

That is an interesting thought... Perhaps a quick scan of
the table to see how many dead rows there are? Then check
back every n/10? Hmmm... I am not a C guy so I don't know if
that is technically feasible (although probably possible), but it
is interesting from a DBA perspective.

Although that could be an issue on a large table as well, I think.

> Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
> a miss as not in the database buffer, or not in the kernel buffer? I
> remember discussions about trying to track IO request times to try and
> determine if something came out of kernel buffers or not, but AFAIK
> that's all vaporware right now...

Good question, anyone else?

Sincerely,

Joshua D. Drake









[HACKERS] Vacuum questions...

2005-09-24 Thread Jim C. Nasby
Would it be difficult to vacuum as part of a dump? The reasoning behind
this is that you have to read the table to do the dump anyway, so it
would be a good time to be able to piggy-back other operations that need
to read the entire table on top. I know vacuuming of indexes complicates
this, so it's probably not as simple as just firing off a vacuum and
copy at the same time (although that idea is probably worth testing,
since it might still be a win).

When dropping a table or index, is its space immediately released in
the FSM?

Also, would it be possible to add some means to check the status of a
running vacuum? Even with vacuum verbose, once it starts in on a large
table you have no way to know how far along it is.

Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
a miss as not in the database buffer, or not in the kernel buffer? I
remember discussions about trying to track IO request times to try and
determine if something came out of kernel buffers or not, but AFAIK
that's all vaporware right now...
