Re: [PERFORM] Autovacuum / full vacuum

2006-01-18 Thread Michael Riess

There's a number of sites that have lots of info on postgresql.conf
tuning. Google for 'postgresql.conf tuning' or 'annotated
postgresql.conf'.


I know some of these sites, but how am I supposed to know whether the information
on those pages is correct? That information should be published as part of the
postgres documentation. It doesn't have to be much, maybe something like this page:


http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

But it should be part of the documentation, to show newbies that the information
is not only correct, but also approved of and recommended by the postgres team.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] wildcard search performance with like

2006-01-18 Thread Michael Riess
As far as I know the index is only used when you do a prefix search, for 
example


col like 'xyz%'

I think that if you are looking for expressions such as 'A%B', you could 
rephrase them like this:


col like 'A%' AND col like 'A%B'

So the database could use the index to narrow down the result and then 
do a sequential search for the second condition.
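
Applied to the query quoted below, the rewrite would look something like this (untested):

select file_name from catalog
where file_name like 'MOD04_L2.A2005311.%'                       -- prefix part, can use the index
  and file_name like 'MOD04_L2.A2005311.%.004.2005312013%.hdf';  -- full pattern, applied as a filter

(Note that whether the planner can use a plain btree index for the LIKE prefix
also depends on the locale; with a non-C locale you need an index created with
one of the *_pattern_ops operator classes.)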


Mike


Yantao Shi schrieb:

Hi,

I have a Postgres 8.1.1 table with over 29 million rows in it. The column
(file_name) that I need to search on has entries like the following:


MOD04_L2.A2005311.1400.004.2005312013848.hdf

MYD04_L2.A2005311.0700.004.2005312013437.hdf
I have an index on this column, but an index scan is used only when I give the
full file_name to search for:


testdbspc=# explain select file_name from catalog where file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf';

QUERY PLAN
Index Scan using catalog_pk_idx on catalog  (cost=0.00..6.01 rows=1 
width=404)
 Index Cond: (file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf'::bpchar)

(2 rows)

What I really need to do most of the time is a multi-wildcard search on 
this column, which is now doing a whole table scan without using the 
index at all:


testdbspc=# explain select file_name from catalog where file_name like 
'MOD04_L2.A2005311.%.004.2005312013%.hdf';

QUERY PLAN
Seq Scan on catalog  (cost=0.00..429.00 rows=1 width=404)
 Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text)
(2 rows)

Obviously, the performance of the table scan on such a large table is 
not acceptable.


I tried full-text indexing and searching. It did NOT work on this column,
because all the letters and numbers are joined together by '.' and treated as
one big word by to_tsvector.


Is there any way to get an index search on this column with multiple wildcards?


Thanks a lot,
Yantao Shi




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

hi,

I'm curious as to why autovacuum is not designed to do full vacuum. I 
know that the necessity of doing full vacuums can be reduced by 
increasing the FSM, but in my opinion that is the wrong decision for 
many applications. My application does not continuously 
insert/update/delete tuples at a constant rate. Basically there are long
periods of relatively few modifications and short bursts of high activity.
Increasing the FSM so that even during these bursts most space would be reused
would mean reducing the available memory for all other database tasks.


So my question is: What's the use of an autovacuum daemon if I still have to
use a cron job to do full vacuums? Wouldn't it be a minor job to enhance
autovacuum so that it can perform full vacuums, if one really wants it to do
that - even if some developers think that it's the wrong approach?


Mike

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

Hi,

did you read my post? In the first part I explained why I don't want to 
increase the FSM that much.


Mike

So my question is: What's the use of an autovacuum daemon if I still 
have to use a cron job to do full vacuums? wouldn't it just be a minor 
job to enhance autovacuum to be able to perform full vacuums, if one 
really wants it to do that - even if some developers think that it's 
the wrong approach?


You should never have to do full vacuums...

Chris

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess



VACUUM FULL blocks the application.  That is NOT something that anyone
wants to throw into the activity mix randomly.


There must be a way to implement a daemon which frees up space of a 
relation without blocking it too long. It could abort after a certain 
number of blocks have been freed and then move to the next relation.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

Well,

I think that the documentation is not exactly easy to understand. I have always
wondered why there are no examples for common postgresql configurations. All I
know is that the default configuration seems to be tuned too low for production
use. And while running postgres I get no hints as to which settings need to be
increased to improve performance. I have no way to see whether my FSM settings
are too low, other than to run vacuum full verbose in psql, pipe the result to
a text file and grep for certain words to get a somewhat comprehensive idea of
how much unused space there is in my system.
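
(For the record, what I grep for is the free space map summary that a
database-wide vacuum verbose prints at the very end in 8.1; the number to
compare against max_fsm_pages is the "required" one. The lines look roughly
like this:)

vacuum full verbose;
-- ... per-table output ...
-- INFO:  free space map contains ... pages in ... relations
-- DETAIL:  A total of ... page slots are in use (including overhead).
--   ... page slots are required to track all free space.
--   Current limits are:  ... page slots, ... relations, using ... KB.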


Don't get me wrong - I really like PostgreSQL and it works well in my 
application. But somehow I feel that it might run much better ...


About the FSM: you say that increasing the FSM is fairly cheap - how should I
know that?



did you read my post? In the first part I explained why I don't want
to increase the FSM that much.


No, you didn't.  You explained *that* you thought you didn't want to
increase the FSM.  You didn't explain why.

FSM expansion comes fairly cheap  ...
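
(For scale: in 8.1 each max_fsm_pages slot takes about six bytes of shared
memory, so even a few hundred thousand slots only cost a couple of megabytes.
The current settings can be checked with:)

show max_fsm_pages;      -- default is 20000; roughly 6 bytes of shared memory per page slot
show max_fsm_relations;  -- default is 1000
-- both can only be changed in postgresql.conf and require a server restart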


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

Hi,


hi,

I'm curious as to why autovacuum is not designed to do full vacuum. I 


Because nothing that runs automatically should ever take an exclusive
lock on the entire database, which is what VACUUM FULL does.


I thought that vacuum full only locks the table which it currently 
operates on? I'm pretty sure that once a table has been vacuumed, it can 
be accessed without any restrictions while the vacuum process works on 
the next table.




activity. Increasing the FSM so that even during these bursts most space would
be reused would mean reducing the available memory for all other database tasks.


I don't believe the hit is enough that you should even notice it. 
You'd have to post some pretty incredible use cases to show that the

tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the
loss of efficiency you get from having some preallocated pages in
tables.


I have 5000 tables and a workstation with 1 GB RAM which hosts an Apache Web
Server, Tomcat Servlet Container and PostgreSQL. RAM is not something that I
have plenty of ... and the hardware is fixed and cannot be changed.




---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Materialized Views

2006-01-16 Thread Michael Riess

Hi,

I've been reading an interesting article which compared different 
database systems, focusing on materialized views. I was wondering how 
the postgresql developers feel about this feature ... is it planned to 
implement materialized views any time soon? They would greatly improve 
both performance and readability (and thus maintainability) of my code.


In particular I'm interested in a view which materializes whenever 
queried, and is invalidated as soon as underlying data is changed.


Mike

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Materialized Views

2006-01-16 Thread Michael Riess

Thanks!

Of course I know that I can build materialized views with triggers, but so far
I've avoided using triggers altogether ... I would really appreciate something
like "create view foo as (select * from b) materialize on query".


But I'll look into your blog entry, thanks again!

Mike

On Mon, 16 Jan 2006 15:36:53 +0100
Michael Riess [EMAIL PROTECTED] wrote:


Hi,

I've been reading an interesting article which compared different 
database systems, focusing on materialized views. I was wondering how 
the postgresql developers feel about this feature ... is it planned

to implement materialized views any time soon? They would greatly
improve both performance and readability (and thus maintainability)
of my code.

In particular I'm interested in a view which materializes whenever 
queried, and is invalidated as soon as underlying data is changed.


  You can already build materialized views in PostgreSQL, but you
  end up doing the heavy lifting yourself with triggers. You put
  insert/update/delete triggers on the underlying tables of your
  view that do the right thing in your materialized view table.

  I wrote a blog entry about this recently,
  http://revsys.com/blog/archive/9, where I used a very simple
  materialized view to achieve the performance I needed. It has links
  to the relevant documentation you'll need however to build triggers
  for a more complex situation. 
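
  A minimal sketch of that approach (all names here are made up, and it
  ignores the corner case of an UPDATE that moves a row to a different
  customer) might look like:

  create table order_totals (
      customer_id integer primary key,
      total       numeric not null default 0
  );

  create or replace function refresh_order_totals() returns trigger as $$
  declare
      cid integer;
  begin
      if TG_OP = 'DELETE' then
          cid := OLD.customer_id;
      else
          cid := NEW.customer_id;
      end if;
      -- crude but simple: recompute the affected customer's row
      delete from order_totals where customer_id = cid;
      insert into order_totals (customer_id, total)
          select customer_id, sum(amount)
            from orders
           where customer_id = cid
           group by customer_id;
      return null;
  end;
  $$ language plpgsql;

  create trigger orders_totals_refresh
      after insert or update or delete on orders
      for each row execute procedure refresh_order_totals();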

  Hope this helps! 


 -
   Frank Wiles [EMAIL PROTECTED]
   http://www.wiles.org
 -


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Read only transactions - Commit or Rollback

2005-12-20 Thread Michael Riess

Markus Schaber schrieb:

Hello,

We have a database containing PostGIS MAP data; it is accessed mainly
via JDBC. There are multiple simultaneous read-only connections taken
from the JBoss connection pool, and there usually are no active
writers. We use connection.setReadOnly(true).

Now my question is what is best performance-wise, if it does make any
difference at all:

Having autocommit on or off? (I presume off)



If you are using large ResultSets, it is interesting to know that
Statement.setFetchSize() does not do anything as long as you have autocommit
on. So you might want to always disable autocommit and set a reasonable fetch
size for large results, or you may otherwise run into serious memory problems
in Java/JDBC.
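
With autocommit off and a fetch size set, the driver can fetch the result in
chunks instead of building the whole ResultSet in memory; the effect is roughly
what these statements do by hand (cursor and table names made up):

begin;
declare map_data cursor for select * from some_postgis_table;
fetch forward 1000 from map_data;   -- repeat until no more rows come back
close map_data;
commit;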


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-07 Thread Michael Riess

Christopher Kings-Lynne schrieb:
No, my problem is that using TSearch2 interferes with other core 
components of postgres like (auto)vacuum or dump/restore.


That's nonsense...seriously.

The only trick with dump/restore is that you have to install the 
tsearch2 shared library before restoring.  That's the same as all 
contribs though.


Well, then it must have changed since I last read the documentation. That was
about a year ago, and since then we have been using Lucene ... and as it works
quite nicely, I see no reason to switch to TSearch2. Including it with the
pgsql core would make it much more attractive to me, as it seems to me that
features become more stable once they are included in the core. Call me
paranoid, if you must ... ;-)





Chris


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Michael Riess

Hi,

Is it possible to make this query run faster than it does now, by adding
indexes or changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE
customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)

The customers table has an index on contactaddress and objectid.
The addresses table has an index on zipcode+housenumber and objectid.


When the resulting relation contains all the info from both tables, indexes
won't help; a seq scan is inevitable.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?


I'll stay away from TSearch2 until it is fully integrated in the 
postgres core (like create index foo_text on foo (texta, textb) USING 
TSearch2). Because a full integration is unlikely to happen in the near 
future (as far as I know), I'll stick to Lucene.


Mike

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess

Bruce Momjian schrieb:

Oleg Bartunov wrote:

Folks,

tsearch2 and Lucene are very different search engines, so it would be an unfair
comparison. If you need full access to metadata and instant indexing, you will
probably find tsearch2 more suitable than Lucene. But if you can live without
those features and just need to search read-only archives, you need Lucene.

Tsearch2 integration into pgsql would be cool, but I see no problem with using
tsearch2 as an official extension module. After completing our todo, which we
hope will happen for the 8.2 release, you could forget about Lucene and other
engines :) We'll be available for development in spring and we estimate about
three months for our todo, so it's really doable.


Agreed.  There isn't anything magical about a plug-in vs. something integrated,
at least in PostgreSQL.  In other databases, plug-ins can't function as fully
as integrated features, but in PostgreSQL everything is really a plug-in
because it is all abstracted.



I only remember evaluating TSearch2 about a year ago, and when I read
statements like "vacuum and/or database dump/restore work differently when
using TSearch2", "SQL scripts need to be executed" etc., I knew that I would
not want to go there.


But I don't doubt that it works, and that it is a sane concept.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Michael Riess

Ameet Kini schrieb:


This didn't get through the first time around, so resending it again.
Sorry for any duplicate entries.

Hello,

I have a question on postgres's performance tuning, in particular, the
vacuum and reindex commands. Currently I do a vacuum (without full) on all
of my tables.  


I'm curious ... why no full vacuum? I bet that the full vacuum will 
compact your (index) tables as much as a reindex would.


I guess the best advice is to increase FSM and to use autovacuum.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess
No, my problem is that using TSearch2 interferes with other core 
components of postgres like (auto)vacuum or dump/restore.




...

So you'll avoid a non-core product and instead only use another non-core 
product...?


Chris

Michael Riess wrote:


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?



I'll stay away from TSearch2 until it is fully integrated in the 
postgres core (like create index foo_text on foo (texta, textb) USING 
TSearch2). Because a full integration is unlikely to happen in the 
near future (as far as I know), I'll stick to Lucene.


Mike

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] 15,000 tables - next step

2005-12-04 Thread Michael Riess

William Yu schrieb:
 Michael Riess wrote:
 Well, I'd think that's where your problem is.  Not only do you have a
 (relatively speaking) small server -- you also share it with other
 very-memory-hungry services!  That's not a situation I'd like to be in.
 Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
 to Postgres.


 No can do. I can try to switch to a 2GB machine, but I will not use 
several machines. Not for a 5GB database. ;-)


 With 1500 shared buffers you are not really going
 anywhere -- you should have ten times that at the very least.


 Like I said - I tried to double the buffers and the performance did 
not improve in the least. And I also tried this on a 2GB machine, and 
swapping was not a problem. If I used 10x more buffers, I would in 
essence remove the OS buffers.


 Increasing buffers does improve performance -- if you have enough memory. You
just don't have enough memory to play with. My servers run w/ 10K buffers
(128MB on 64-bit FC4) and it definitely runs better w/ them at 10K versus 1500.


 With that many tables, your system catalogs are probably huge.


content2=# select sum(relpages) from pg_class where relname like 'pg_%';
  sum
---
 64088
(1 row)

:-)


 While my situation was fixable by scheduling a nightly vacuum/analyze on the
system catalogs to get rid of the bazillion dead table/index entries, you have
no choice but to get more memory so you can stuff your entire system catalog
into buffers/OS cache. Personally, w/ 1GB of ECC RAM at ~$85, it's a
no-brainer. Get as much memory as your server can support.
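
 (Such a nightly job can be as simple as running something like the following
 against each database:)

 vacuum analyze pg_catalog.pg_class;
 vacuum analyze pg_catalog.pg_attribute;
 vacuum analyze pg_catalog.pg_index;
 vacuum analyze pg_catalog.pg_depend;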


The problem is that we use pre-built hardware which isn't configurable. 
We can only switch to a bigger server with 2GB, but that's tops.


I will do the following:

- switch to 10k buffers on a 1GB machine, 20k buffers on a 2GB machine
- try to optimize my connection pools to remember which apps (groups of
30 tables) were accessed, so that there is a better chance of using caches
- swap out tables which are rarely used: export the content, drop the 
table, and re-create it on the fly upon access.


Thanks for your comments!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Michael Riess

Jan Wieck schrieb:

On 12/2/2005 6:01 PM, Michael Riess wrote:


Hi,

thanks for your comments so far - I appreciate it. I'd like to narrow 
down my problem a bit:


As I said in the other thread, I estimate that only 20% of the 15,000 
tables are accessed regularly. So I don't think that vacuuming or the 
number of file handles is a problem. Have a look at this:


What makes you think that? Have you at least tried to adjust your shared 
buffers, freespace map settings and background writer options to values 
that match your DB? How does increasing the kernel file descriptor
limit (try the current limit times 5 or 10) affect your performance?





Of course I tried to tune these settings. You should take into account 
that the majority of the tables are rarely ever modified, therefore I 
don't think that I need a gigantic freespace map. And the background 
writer never complained.


Shared memory ... I currently use 1500 buffers for 50 connections, and 
performance really suffered when I used 3000 buffers. The problem is 
that it is a 1GB machine, and Apache + Tomcat need about 400MB.


But thanks for your suggestions! I guess that I'll have to find a way to 
reduce the number of tables. Unfortunately my application needs them, so 
I'll have to find a way to delete rarely used tables and create them on 
the fly when they're accessed again. But this will really make my 
application much more complex and error-prone, and I had hoped that the 
database system could take care of that. I still think that a database 
system's performance should not suffer from the mere presence of unused 
tables.


Mike

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Michael Riess

Alvaro Herrera schrieb:

Michael Riess wrote:

Shared memory ... I currently use 1500 buffers for 50 connections, and 
performance really suffered when I used 3000 buffers. The problem is 
that it is a 1GB machine, and Apache + Tomcat need about 400MB.


Well, I'd think that's where your problem is.  Not only do you have a
(relatively speaking) small server -- you also share it with other
very-memory-hungry services!  That's not a situation I'd like to be in.
Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
to Postgres.


No can do. I can try to switch to a 2GB machine, but I will not use 
several machines. Not for a 5GB database. ;-)



With 1500 shared buffers you are not really going
anywhere -- you should have ten times that at the very least.



Like I said - I tried to double the buffers and the performance did not 
improve in the least. And I also tried this on a 2GB machine, and 
swapping was not a problem. If I used 10x more buffers, I would in 
essence remove the OS buffers.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] 15,000 tables - next step

2005-12-02 Thread Michael Riess

Hi,

thanks for your comments so far - I appreciate it. I'd like to narrow 
down my problem a bit:


As I said in the other thread, I estimate that only 20% of the 15,000 
tables are accessed regularly. So I don't think that vacuuming or the 
number of file handles is a problem. Have a look at this:


content2=# select relpages, relname from pg_class order by relpages desc 
limit 20;

 relpages |             relname
----------+---------------------------------
    11867 | pg_attribute
    10893 | pg_attribute_relid_attnam_index
     3719 | pg_class_relname_nsp_index
     3310 | wsobjects_types
     3103 | pg_class
     2933 | wsobjects_types_fields
     2903 | wsod_133143
     2719 | pg_attribute_relid_attnum_index
     2712 | wsod_109727
     2666 | pg_toast_98845
     2601 | pg_toast_9139566
     1876 | wsod_32168
     1837 | pg_toast_138780
     1678 | pg_toast_101427
     1409 | wsobjects_types_fields_idx
     1088 | wso_log
      943 | pg_depend
      797 | pg_depend_depender_index
      737 | wsod_3100
      716 | wp_hp_zen

I don't think that postgres was designed for a situation like this, 
where a system table that should be fairly small (pg_attribute) is this 
large.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi,

we are currently running a postgres server (upgraded to 8.1) which has 
one large database with approx. 15,000 tables. Unfortunately performance 
suffers from that, because the internal tables (especially that which 
holds the attribute info) get too large.


(We NEED that many tables, please don't recommend reducing them)

Logically these tables could be grouped into 500 databases. My question is:

Would performance be better if I had 500 databases (on one postgres 
server instance) which each contain 30 tables, or is it better to have 
one large database with 15,000 tables? In the old days of postgres 6.5 
we tried that, but performance was horrible with many databases ...


BTW: I searched the mailing list, but found nothing on the subject - and 
there also isn't any information in the documentation about the effects 
of the number of databases, tables or attributes on the performance.


Now, what do you say? Thanks in advance for any comment!

Mike

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi David,

incidentally: The directory which holds our database currently contains
73883 files ... do I get a prize or something? ;-)


Regards,

Mike

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi David,



with 15,000 tables you are talking about a LOT of files to hold these
(30,000 files with one index each and each database being small enough
to not need more than one file to hold it); on Linux ext2/3 this many
files in one directory will slow you down horribly.


We use ReiserFS, and I don't think that this is causing the problem ... 
although it would probably help to split the directory up using tablespaces.
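
Something like this would do it (the path is made up; tablespaces exist since 8.0):

create tablespace big_apps location '/var/lib/pgsql/tablespaces/big_apps';
alter table wsod_133143 set tablespace big_apps;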


But thanks for the suggestion!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi,



On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:

Hi,

we are currently running a postgres server (upgraded to 8.1) which has
one large database with approx. 15,000 tables. Unfortunately performance
suffers from that, because the internal tables (especially that which
holds the attribute info) get too large.

(We NEED that many tables, please don't recommend to reduce them)



Have you ANALYZEd your database? VACUUMing?


Of course ... before 8.1 we routinely did a vacuum full analyze each 
night. As of 8.1 we use autovacuum.




BTW, are you using some kind of weird ERP? I have one that treats Informix as
a fool and doesn't let me get all of Informix's potential... maybe the same is
true in your case...


No. Our database contains tables for web content management systems. The
server hosts approx. 500 CMS applications, and each of them has approx. 30
tables.


That's why I'm asking whether it would be better to have 500 databases with 30
tables each. In previous Postgres versions this led to even worse performance ...


Mike

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi Tom,


Michael Riess [EMAIL PROTECTED] writes:

(We NEED that many tables, please don't recommend to reduce them)


No, you don't.  Add an additional key column to fold together different
tables of the same structure.  This will be much more efficient than
managing that key at the filesystem level, which is what you're
effectively doing now.
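
In other words, something along these lines (names made up): one shared table
per logical structure, keyed by application:

create table news (
    app_id   integer not null,   -- which of the ~500 applications owns the row
    news_id  integer not null,
    title    text,
    body     text,
    primary key (app_id, news_id)
);

-- each application then always filters on its own app_id:
select title from news where app_id = 42 order by news_id desc limit 10;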


Been there, done that. (see below)



(If you really have 15000 distinct rowtypes, I'd like to know what
your database design is...)


Sorry, I should have included that info in the initial post. You're 
right in that most of these tables have a similar structure. But they 
are independent and can be customized by the users.


Think of it this way: On the server there are 500 applications, and each 
has 30 tables. One of these might be a table which contains the products 
of a webshop, another contains news items which are displayed on the
website, etc.


The problem is that the customers can freely change the tables ... add
columns, remove columns, change column types, etc. So I cannot use
system-wide tables with a key column.



Mike

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Michael Riess [EMAIL PROTECTED] writes:

On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:

we are currently running a postgres server (upgraded to 8.1) which
has one large database with approx. 15,000 tables. Unfortunately
performance suffers from that, because the internal tables
(especially that which holds the attribute info) get too large.

(We NEED that many tables, please don't recommend to reduce them)


Have you ANALYZEd your database? VACUUMing?

Of course ... before 8.1 we routinely did a vacuum full analyze each
night. As of 8.1 we use autovacuum.


VACUUM FULL was probably always overkill, unless "always" includes
versions prior to 7.3...


Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, 
but the database got considerably slower near the end of the week.





BTW, are you using some kind of weird ERP? I have one that treats
Informix as a fool and doesn't let me get all of Informix's potential...
maybe the same is true in your case...

No. Our database contains tables for web content management
systems. The server hosts approx. 500 CMS applications, and each of
them has approx. 30 tables.

That's why I'm asking whether it would be better to have 500 databases with 30
tables each. In previous Postgres versions this led to even worse
performance ...


This has the feeling of fitting with Alan Perlis' dictum below...

Supposing you have 500 databases, each with 30 tables, each with 4
indices, then you'll find you have, on disk...

# of files = 500 x 30 x 5 = 75000 files

If each is regularly being accessed, that's bits of 75000 files
getting shoved through OS and shared memory caches.  Oh, yes, and
you'll also have regular participation of some of the pg_catalog
files, with ~500 instances of THOSE, multiplied some number of ways...



Not all of the tables are frequently accessed. In fact I would estimate 
that only 20% are actually used ... but there is no way to determine if 
or when a table will be used. I thought about a way to swap out tables 
which have not been used for a couple of days ... maybe I'll do just 
that. But it would be cumbersome ... I had hoped that an unused table 
does not hurt performance. But of course the internal tables which 
contain the meta info get too large.



An application with 15000 frequently accessed tables doesn't strike me
as being something that can possibly turn out well.  You have, in
effect, more tables than (arguably) bloated ERP systems like SAP R/3;
it only has a few thousand tables, and since many are module-specific,
and nobody ever implements *all* the modules, it is likely only a few
hundred that are hot spots.  No 15000 there..


I think that my system conforms to the 80/20 rule ...

---(end of broadcast)---
TIP 6: explain analyze is your friend