[GENERAL] VACUUM and ANALYZE With Empty Tables

2004-11-23 Thread Mark Dexter

We use a development environment that works with Postgres via ODBC and uses cursors to insert and update rows in Postgres tables.  I'm using Postgres version 7.4.5.

I have a test program that reads 34,000 rows from an external file and inserts them into a Postgres table.  Under normal circumstances, it takes about 1 minute to complete the test.  In troubleshooting a performance problem, I have discovered the following:

A. If I TRUNCATE or DELETE all of the rows in the table and then run VACUUM or ANALYZE on the empty table, the test program takes over 15 minutes to complete (i.e., 15X performance drop).

B. If I drop and create the table without using VACUUM or ANALYZE, it takes about 1 minute to complete the test program.

C. If I insert 94 or more rows into the table and then run VACUUM or ANALYZE, it takes 1 minute to complete the test program.

D. If I insert 93 or fewer rows into the table and then run VACUUM or ANALYZE, it takes over 15 minutes to complete the test.

E. If the test program is running slowly, I can speed it up to normal speed by running ANALYZE from another database session.  (Scenario A is sketched just below.)
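
A rough sketch of the slow scenario (table and column names here are made up; the real inserts actually go through ODBC cursors rather than plain INSERT statements):

CREATE TABLE work_rows (id integer, payload text);
-- ... the application fills the table, the process ends, the rows are cleared ...
TRUNCATE TABLE work_rows;
ANALYZE work_rows;   -- statistics now describe an empty table
-- the next bulk load of ~34,000 rows runs roughly 15X slower than normal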

My concern about this is as follows.  Our application uses a number of "work" tables that will have many rows inserted into them during a process (e.g., a user running a report) and then the rows will be deleted once the process is over.  (I don't think we can use TEMPORARY tables because the tables need to be seen outside of the current database session.)  

If we routinely run VACUUM or VACUUM ANALYZE (e.g., nightly), these work tables will normally be empty when the VACUUM is run.  So it would appear from the testing above that they will experience performance problems when inserting large numbers of rows  through our application.

Is there some easy way around this problem?  Is there a way to force VACUUM or ANALYZE to optimize for a set number of rows even if the table is empty when it is run?  Thanks for your help.   Mark




Re: [GENERAL] VACUUM and ANALYZE With Empty Tables

2004-11-24 Thread Mark Dexter
Thanks very much for the information.  It would appear that our best option 
might be to vacuum analyze these tables in our application at a point in time 
when they contain rows instead of doing it at night.  Needless to say, it 
would be nice to have an option to analyze with a target number of rows instead of 
the number presently in the table.
 
I suppose another option would be to keep a small number of rows permanently in 
these tables.  In my testing, 100 rows (94 to be exact) did the trick.  Is this 
number going to vary from table to table?
 
Thanks again for your help.  Mark



From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Wed 11/24/2004 1:26 AM
To: Mark Dexter
Cc: [EMAIL PROTECTED]
Subject: Re: [GENERAL] VACUUM and ANALYZE With Empty Tables



Mark Dexter wrote:
> We use a development environment that works with Postgres via ODBC and
> uses cursors to insert and update rows in Postgres tables.  I'm using
> Postgres version 7.4.5.

> A. If I TRUNCATE or DELETE all of the rows in the table and then run
> VACUUM or ANALYZE on the empty table, the test program takes over 15
> minutes to complete (i.e., 15X performance drop).

> If we routinely run VACUUM or VACUUM ANALYZE (e.g., nightly), these work
> tables will normally be empty when the VACUUM is run.  So it would
> appear from the testing above that they will experience performance
> problems when inserting large numbers of rows  through our application.

Yep - it's a known issue. The analyse is doing what you asked, it's just
not what you want.

> Is there some easy way around this problem?  Is there a way to force
> VACUUM or ANALYZE to optimize for a set number of rows even if the table
> is empty when it is run?  Thanks for your help.   Mark

There are only two options I know of:
  1. Vacuum analyse each table separately (tedious, I know)
  2. Try pg_autovacuum in the contrib/ directory

The autovacuum utility monitors activity for you and targets tables when
they've seen a certain amount of activity. Even if it hasn't got the
tunability you need, it should be a simple patch to add a list of
"excluded" tables.

--
   Richard Huxton
   Archonet Ltd





[GENERAL] VACUUM and ANALYZE Follow-Up

2004-11-29 Thread Mark Dexter

Several recent postings appear to confirm that there is an issue with the use of VACUUM or ANALYZE on empty tables.  Specifically, if you VACUUM or ANALYZE a table that is empty and then insert a large number of rows into this table, you will experience very poor performance.

For example, in our testing, we suffered a 15X performance penalty when inserting 35,000 rows into a table that had been VACUUM'd or ANALYZE'd when empty.  Also, in our testing, it didn't matter whether you just did VACUUM or VACUUM ANALYZE -- in both cases the subsequent inserts were slow.


In the short run, the work-around appears to be either to avoid using these commands on empty tables or to keep some "dummy" rows in these tables that don't get deleted (and use DELETE instead of TRUNCATE).
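
A sketch of the dummy-row approach (table and column names are hypothetical):

CREATE TABLE work_rows (id integer, is_dummy boolean DEFAULT false, payload text);

-- seed a handful of placeholder rows that the application never touches
-- (roughly 100 were enough in our tests)
INSERT INTO work_rows (id, is_dummy) VALUES (-1, true);
-- ...repeat for as many placeholder rows as needed...

-- when a process finishes, clear only the real rows
DELETE FROM work_rows WHERE NOT is_dummy;  -- DELETE, not TRUNCATE, so the placeholders survive
-- the nightly VACUUM ANALYZE then sees the placeholder rows instead of an empty table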


However, in the long run, it would seem to make sense to address the issue directly so DBAs and developers don't have to deal with it.  Several possible solutions come to mind, and I'm sure there are others.


1. Provide an option with ANALYZE to force it to work as if a table had a minimum number of rows (e.g., ANALYZE MINIMUM 1000 would analyze tables as if they all had at least 1000 rows).

2. Provide an option during table creation to state the minimum number of rows to use for ANALYZE.

3. Just change ANALYZE to assume that all tables might have a reasonable number of rows at some point even if they are empty now.  (How much performance is actually gained currently when ANALYZE updates the stats for an empty table?)

In any case, it is hard to see how the present behaviour can be seen as desirable.  It obviously causes problems at least for new Postgres users, and we all hope there will be many more of these folks in the future.  Thanks for considering this.  Mark





Re: [GENERAL] VACUUM and ANALYZE Follow-Up

2004-11-29 Thread Mark Dexter
Hmm... it seems that we're maybe not understanding one another here.
I'm going to try to be more clear.  Below are the questions that were
raised and my attempt to answer them clearly.

1. Why run VACUUM on an empty table?  

This is a good question, but perhaps there is a valid answer.  Our
production database contains about 1500 tables.  It is MUCH more
convenient to be able to VACUUM ANALYZE the entire database than to list
specific tables.  Furthermore, our application uses "work" tables that
often will be empty (e.g., at night) but that, at times, will contain a
large number of rows.  The Postgres documentation says the following:
"We recommend that active production databases be vacuumed frequently
(at least nightly), in order to remove expired rows.".  This is going to
be difficult without messing up the performance.

2. Hang on, it's an empty table. Is it supposed to fabricate these
statistics out of thin air? Any made up numbers will probably be worse
than none at all.

Well, that's why I suggested some type of command line option so the
user could give it a number of rows to use for the analysis (e.g.,
ANALYZE MINIMUM 1000).  

Another point of interest:  If I DROP and then CREATE the table, without
doing ANALYZE, I get good performance when inserting rows.  So whatever
assumptions the database is making about a newly-created table appear to
be different (and BETTER) than the assumptions made when doing ANALYZE
on an empty table.  It's not clear to me why this should be.  In both
cases, you don't really know anything about the table other than at this
moment it has zero rows.  Obviously, it would be better (at least in
this instance) if running ANALYZE on an empty table had the same
performance result as using CREATE to make a new (empty) table.
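
For what it's worth, the per-table statistics the planner works from can be inspected directly (the table name below is hypothetical; relpages and reltuples in pg_class are the values that VACUUM and ANALYZE maintain):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'work_rows';

-- Comparing these values right after CREATE TABLE with the values right after
-- VACUUM or ANALYZE on the emptied table should show what the planner is being
-- told in each case.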

Finally, my testing would seem to contradict the idea that any made-up
number will be worse than none at all.  In my testing (inserting 35,000 rows
into an empty table), I could only measure two distinct outcomes -- one
good and one bad (with a 15X performance difference).  I got good
performance with any of the following scenarios:  CREATE TABLE, or ANALYZE
or VACUUM with 94 or more rows in the table.  I got bad performance if
I did ANALYZE or VACUUM with fewer than 94 rows in the table.  I could
not measure any difference between other numbers of rows (between 0 and
35,000).  So I don't think in practice it is that sensitive, at least in
the simple test case I was doing.

Below are three additional questions I have.

3. Is there some benefit to having ANALYZE behave the way it now does on
empty or nearly empty tables?  Is there a large performance improvement
for really small tables (e.g., under 100 rows or under 1000 rows)?  Does
anyone really care about performance for small tables? 

4. Isn't ANALYZE on a totally empty table really a special case?  The
presumption should be that the table will not remain empty.  To optimize
the performance assuming that there will be zero (or close to zero) rows
seems somewhat pointless.  However, there are valid reasons why a table
might be empty at the moment in time when the ANALYZE is run.  (In our
case, we use "work" tables that get cleared at the end of an application
process.)  And, as mentioned above, it is easier to VACUUM ANALYZE an
entire database than it is to list tables individually.

5. Why does DROP / CREATE work better than TRUNCATE / VACUUM in terms of
creating a fresh table in which to insert new rows?  Is this desirable?
In both cases, the optimizer doesn't really know anything about what to
expect for the table.  But CREATE provides a better starting point for
inserts than does VACUUM or ANALYZE, at least in my testing.

I am relatively new to Postgres, and I apologize if I'm repeating issues
that have been raised before.  However, it does seem to me to be an
important issue.  The easier it is to maintain a Postgres database, the
more likely it is to be widely used.  Thanks.  Mark



Re: [GENERAL] VACUUM and ANALYZE Follow-Up

2004-11-30 Thread Mark Dexter

Tom, I did read through the links you provided.  Unfortunately, I don't feel qualified to judge the technical merits of the possible solutions.  Since you appear to be well informed on this issue, can I ask you a couple of quick questions?

1. Would it be difficult to add an option to ANALYZE to force it to pretend that there are a minimum number of rows (e.g., ANALYZE MINIMUM 1000 or something)?  This would appear to be a simple-minded way to solve the problem without any concerns about backward compatibility.

2. Why does a newly CREATE'd table behave differently than an empty table after ANALYZE?  Does it make sense that it should?  In the CREATE case, the assumptions appear to be much more reasonable for a table that is going to grow.  

3. Has anyone ever tested whether there is a measurable performance gain from doing ANALYZE on empty or nearly empty tables?  We know that there is a very large (in my case 15X) performance loss when the table starts growing.  If the gain is small or negligible when the tables really are small, then perhaps worrying about maintaining current behaviour is not as important.

The nice thing about option (1) is that it solves the slow insert issue both for empty tables and for tables with a few rows.  It also causes absolutely no backward-compatibility issues.

Thanks very much for your comments on this.  Mark






Re: [GENERAL] disabling OIDs?

2004-12-13 Thread Mark Dexter

For what it's worth, OIDs are required if you ever want to use updateable cursors with the ODBC driver.  We discovered this the hard way.  Mark Dexter




Re: [GENERAL] disabling OIDs?

2004-12-13 Thread Mark Dexter
I don't know why they use OIDs for cursors.  But I do know that if you
run a trace, the SQL that creates the cursor uses OIDs,
so it doesn't work if the table is created without OIDs.  Also, if you
want to have updateable cursors against views (i.e., a view with rules
for INSERT, UPDATE, and DELETE), you must name the OID and CTID as
columns in the view.  Again, we learned this the hard way.  Mark Dexter
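
P.S. A rough sketch of what that looks like (table, column, and view names are made up; the rules themselves are omitted):

CREATE TABLE orders (
    orderid   integer PRIMARY KEY,
    orderdate date
) WITH OIDS;

-- the view must expose oid and ctid so the driver can locate rows
CREATE VIEW orders_v AS
    SELECT oid, ctid, orderid, orderdate
    FROM orders;

-- INSERT/UPDATE/DELETE rules on orders_v go here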

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 13, 2004 12:06 PM
To: Mark Dexter
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: disabling OIDs?



"Mark Dexter" <[EMAIL PROTECTED]> writes:

> > For what it's worth, OIDs are required if you ever want to use 
> > updateable cursors with the ODBC driver.  We discovered this the 
> > hard way.  Mark Dexter

That's unfortunate. Is it because it's difficult to track down the
primary key of the table? Is it any easier to track down the primary key
of the table in 8.0? It would be much better if it checked the primary
key and used that instead of OIDs.

Though I'm unclear whether implementing "updateable cursors" on the
client end is really a good idea. I suppose it's nice if you understand
the inherent limitations.

-- 
greg




[GENERAL] Version Control Software for Database Objects

2005-01-13 Thread Mark Dexter

We maintain multiple versions of our application's database and we are looking for version control software to help us automate this.  Specifically, we would like to have a program that automatically tracks all changes to the database (tables, views, functions, etc.) and assists with updating customers' databases from one version to the next.

Does anyone know of such a program that works with PostgreSQL?  Thanks for your help.






[GENERAL] Complex Update Queries with Fromlist

2004-10-14 Thread Mark Dexter

In Microsoft SQL Server, I can write an UPDATE query as follows:


update orders set RequiredDate =
  (case when c.City IN ('Seattle','Portland') then o.OrderDate + 2
        else o.OrderDate + 1 end)
from orders o
join customers c on o.Customerid = c.Customerid
where c.region in ('WA','OR')


This query finds 47 rows matching the WHERE clause and updates the RequiredDate in the Orders table based on data in the orders table and the customer table for these 47 rows.

It appears that I can do the same thing in Postgres with the following syntax:


update orders set RequiredDate =
  (case when c.city in ('Seattle','Portland') then date(o.OrderDate) + 1
        else date(o.OrderDate) + 2 end)
from orders o
join customers c on o.Customerid = c.Customerid
where c.region in ('WA','OR')
  and orders.orderid = o.orderid


The only difference is that I need to add the condition at the end to join the orders table being updated with the "orders o" table in the fromlist.

First, does this look correct?  It appears to work the way I want.  Second, it would be really nice if there was better documentation of the UPDATE statement in Postgres, including examples of this type.

Thanks.  


Mark Dexter

Dexter + Chaney

9700 Lake City Way NE, Seattle, WA  98115-2347

Direct Phone: 206.777.6819  Fax: 206-367-9613

General Phone: 800-875-1400  

Email: [EMAIL PROTECTED]






Re: [GENERAL] Complex Update Queries with Fromlist

2004-10-16 Thread Mark Dexter
Thank you.  That works for the simple example I had.
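
For the archive, the working form for the simple case looks roughly like this (a sketch only) -- customers joins directly against the table being updated, so orders does not need to appear in the FROM list at all:

UPDATE orders SET requireddate =
  (case when c.city in ('Seattle','Portland') then date(orders.orderdate) + 1
        else date(orders.orderdate) + 2 end)
FROM customers c
WHERE orders.customerid = c.customerid
  and c.region in ('WA','OR');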

However, I have an additional question about this.  What if I need to do
a LEFT OUTER JOIN in the Fromlist?  For example, this query works but it
requires the orders table to appear twice.

UPDATE orders set requireddate =
  (case when c.city in ('Seattle','Portland') then date(o.OrderDate) + 1
        else date(o.OrderDate) + 2 end)
FROM orders o
LEFT OUTER JOIN customers c on
  o.customerid = c.customerid
WHERE c.region in ('WA','OR')
  and orders.orderid = o.orderid

Is there some way to do an outer join in the Fromlist back to the main
table being updated without having it appear twice?  Thanks.
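
P.S. One possible rewrite (a sketch only; I am not sure it is the recommended approach): since the WHERE clause on c.region already discards the rows the outer join would otherwise keep, the customers lookup can move into correlated subqueries, so that orders appears only once:

UPDATE orders SET requireddate =
  (SELECT case when c.city in ('Seattle','Portland') then date(orders.orderdate) + 1
               else date(orders.orderdate) + 2 end
   FROM customers c
   WHERE c.customerid = orders.customerid)
WHERE EXISTS (SELECT 1 FROM customers c
              WHERE c.customerid = orders.customerid
                and c.region in ('WA','OR'));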

