Re: [PERFORM] BUG #2784: Performance serious degrades over a period
Bruno Wolff III <[EMAIL PROTECTED]> wrote: > > This really should have been asked on pgsql-performance and would probably > get a better response there.. > > On Sun, Nov 26, 2006 at 16:35:52 +, > Michael Simms <[EMAIL PROTECTED]> wrote: > > PostgreSQL version: 8.1.4 > > Operating system: Linux kernel 2.6.12 > > Description:Performance serious degrades over a period of a month > > Details: > > > > OK, we have a database that runs perfectly well after a dump and restore, > > but over a period of a month or two, it just degrades to the point of > > uselessness. > > vacuumdb -a is run every 24 hours. We have also run for months at a time > > using -a -z but the effect doesnt change. > > > > This sounds like you either need to increase your FSM setting or vacuum > more often. I think vacuumdb -v will give you enough information to tell > if FSM is too low at the frequency you are vacuuming. > > > The database is for a counter, not the most critical part of the system, but > > a part of the system nonetheless. Other tables we have also degrade over > > time, but the counter is the most pronounced. There seems to be no common > > feature of the tables that degrade. All I know is that a series of queries > > that are run on the database every 24 hours, after a dump/restore takes 2 > > hours. Now, 2 months after, it is taking over 12. We are seriously > > considering switching to mysql to avoid this issue. > > You probably will want to vacuum the counter table more often than the other > tables in the database. Depending on how often the counter(s) are being > updated and how many separate counters are in the table you might want to > vacuum that table as often as once a minute. > > Depending on your requirements you might also want to consider using a > sequence > instead of a table row for the counter. Just to throw it in to the mix: you might also be in a usage pattern that would benefit from a scheduled reindex every so often. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
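A rough sketch of the sequence-based counter suggested above (the sequence name is hypothetical). A sequence is updated in place and leaves no dead row versions behind, so it avoids the bloat that makes a single-row counter table degrade between vacuums:

    CREATE SEQUENCE hit_counter;
    SELECT nextval('hit_counter');        -- increment and return the new value
    SELECT last_value FROM hit_counter;   -- read the current value without incrementing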
[PERFORM] Advice on selecting good values for work_mem?
I'm gearing up to do some serious investigation into performance for PostgreSQL with regard to our application. I have two issues that I have questions about, and I'll address them in two separate emails. This email regards the tuning of work_mem. I'm planning on going through all of the queries our application does, under various load scenarios and approaching each performance issue as it appears. What I'm fuzzy on is how to discretely know when I'm overflowing work_mem? Obviously, if work_mem is exhausted by a particular query, temp files will be created and performance will begin to suck, but it would be nice to have some more information -- how large was the resultant temp file, for example. Does the creation of a temp file trigger any logging? I've yet to see any, but we may not have hit any circumstances where work_mem was exhausted. I've been looking through the docs at the various pg_stat* views and functions, but it doesn't look as if there's anything in there about this. That leads to my other question. Assuming I've got lots of connections (which I do), how can I determine if work_mem is too high? Do server processes allocate it even if they don't actually use it? Is the only way to find out to reduce it and see when it starts to be a problem? If so, that leads back to my first question: how can I be sure whether temp files were created or not? My goal is to set work_mem as small as is possible for the most common queries, then force the developers to use "set work_mem to x" to adjust it for big queries. -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
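A minimal sketch of the per-query override mentioned above; the table and column names are made up, and the values are in kB. SET LOCAL keeps the larger setting scoped to one transaction so it doesn't leak into the rest of the session:

    SET work_mem = 4096;              -- 4MB session default for common queries
    BEGIN;
    SET LOCAL work_mem = 262144;      -- 256MB, only for this known-big query
    SELECT * FROM big_table ORDER BY some_column;
    COMMIT;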
[PERFORM] How to determine if my setting for shared_buffers is too high?
I'm gearing up to do some serious investigation into performance for PostgreSQL with regard to our application. I have two issues that I have questions about, and I'll address them in two separate emails. This one regards tuning shared_buffers. I believe I have a good way to monitor database activity and tell when a database grows large enough that it would benefit from more shared_buffers: if I monitor the blks_read column of pg_stat_database, it should increase very slowly if there is enough shared_buffer space. When shared buffer space runs out, more disk read requests will be required and this number will begin to climb. If anyone sees a flaw in this approach, I'd be interested to hear it. The other tuning issue with shared_buffers is how to tell if I'm allocating too much. For example, if I allocate 1G of RAM to shared buffers, and the entire database can fit in 100M, that 900M might be better used as work_mem, or something else. I haven't been able to find anything regarding how much of the shared buffer space PostgreSQL is actually using, as opposed to simply allocating. -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
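A quick way to sample the counters described above, assuming block-level statistics are enabled (stats_block_level = on in this era of PostgreSQL). Watching how fast blks_read grows relative to blks_hit gives a rough read on whether the working set still fits in shared_buffers:

    SELECT datname, blks_hit, blks_read
      FROM pg_stat_database
     WHERE datname = current_database();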
Re: [PERFORM] New to PostgreSQL, performance considerations
In response to Ron <[EMAIL PROTECTED]>: > > 3= Daniel van Ham Colchete is running Gentoo. That means every SW > component on his box has been compiled to be optimized for the HW it > is running on. > There may be a combination of effects going on for him that others > not running a system optimized from the ground up for its HW do not see. http://www.potentialtech.com/wmoran/source.php You get an idea of how old these tests are by the fact that the latest and greatest was FreeBSD 4.9 at the time, but I suppose it may still be relevant. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Scaling concerns
tsuraan <[EMAIL PROTECTED]> wrote: > > I'm writing a webmail-type application that is meant to be used in a > corporate environment. The core of my system is a Postgres database > that is used as a message header cache. The two (relevant) tables > being used are pasted into the end of this message. My problem is > that, as the messages table increases to tens of millions of rows, > pgsql slows down considerably. Even an operation like "select > count(*) from messages" can take minutes, with a totally idle system. > Postgres seems to be the most scalable Free database out there, so I > must be doing something wrong. > > As for the most common strategy of having a slower (more rows) > "archival" database and a smaller, faster "live" database, all the > clients in the company are using their normal corporate email server > for day-to-day email handling. The webmail is used for access email > that's no longer on the corporate server, so it's not really simple to > say which emails should be considered live and which are really > out-of-date. > > My postgres settings are entirely default with the exception of > shared_buffers being set to 40,000 and max_connections set to 400. > I'm not sure what the meaning of most of the other settings are, so I > haven't touched them. The machines running the database servers are > my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB > SATA II drive), and a production server with two dual-core Intel > chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550 > controller. Both machines are running Gentoo Linux with a 2.6.1x > kernel, and both exhibit significant performance degradation when I > start getting tens of millions of records. > > Any advice would be most appreciated. Thanks in advance! > > Tables: > > CREATE TABLE EmailAddresses ( > emailid SERIAL PRIMARY KEY, -- The unique identifier of this address > name TEXT NOT NULL, -- The friendly name in the address > addrspec TEXT NOT NULL, -- The [EMAIL PROTECTED] part of the > address > UNIQUE(name, addrspec) > ); > > and > > CREATE TABLE Messages ( > -- Store info: > msgkeyBIGSERIAL PRIMARY KEY, -- Unique identifier for a message > path TEXT NOT NULL, -- Where the message is on the file > system > inserted TIMESTAMP DEFAULT now(),-- When the message was fetched > -- Message Info: > msgid TEXT UNIQUE NOT NULL, -- Message's Message-Id field > mfrom INTEGER -- Who sent the message > REFERENCES EmailAddresses > DEFAULT NULL, > mdate TIMESTAMP DEFAULT NULL, -- Message "date" header field > replyto TEXT DEFAULT NULL, -- Message-ID of replied-to message > subject TEXT DEFAULT NULL, -- Message "subject" header field > numatch INTEGER DEFAULT NULL, -- Number of attachments > UNIQUE(path) > ); You might benefit from adding some performance-specific changes to your schema. For example, if you created a separate table for each emailid (call them Messages_1, Messages_2, etc). I expect that no one user will have an unbearable number of messages, thus each user will see reasonable performance when working with their mailbox. You can handle the management of this entirely in your application logic, but it might be better of you wrote stored procedures to access message tables -- to make it easier on the application side. -Bill ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
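A hypothetical sketch of the per-emailid split suggested above, with 42 standing in for one user's emailid; the application (or a stored procedure) would pick the right table by name. Note that LIKE copies only the column definitions, so constraints and defaults would need to be added separately:

    CREATE TABLE Messages_42 (LIKE Messages);
    CREATE INDEX messages_42_mdate_idx ON Messages_42 (mdate);
    -- queries for that user then touch a much smaller table:
    SELECT msgid, subject FROM Messages_42 ORDER BY mdate DESC LIMIT 50;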
Re: [PERFORM] Advice on selecting good values for work_mem?
In response to Tom Lane <[EMAIL PROTECTED]>: > Bill Moran <[EMAIL PROTECTED]> writes: > > Does the creation of a temp file trigger any logging? > > No; but it wouldn't be hard to add some if you wanted. I'd do it at > deletion, not creation, so you could log the size the file reached. > See FileClose() in src/backend/storage/file/fd.c. Is this along the lines of what you were thinking? Is this acceptable to get pulled into the tree (maintaining local patches sucks ;) I've only been using this patch a day and I'm already giddy about how much it helps tuning work memory sizes ... -- Bill Moran Collaborative Fusion Inc. *** fd.c.prev Mon Dec 18 16:09:51 2006 --- fd.c Mon Dec 18 16:09:31 2006 *** *** 939,944 --- 939,945 FileClose(File file) { Vfd *vfdP; + struct stat filestats; Assert(FileIsValid(file)); *** *** 968,973 --- 969,982 { /* reset flag so that die() interrupt won't cause problems */ vfdP->fdstate &= ~FD_TEMPORARY; + if (fstat(vfdP->fd, &filestats) == 0) { + ereport(WARNING, + (errmsg("A temporary file of %lu bytes was used", + (unsigned long) filestats.st_size), + errhint("You may need to increase work_mem."))); + } else { + elog(LOG, "could not stat \"%s\": %m", vfdP->fileName); + } if (unlink(vfdP->fileName)) elog(LOG, "failed to unlink \"%s\": %m", vfdP->fileName); ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Advice on selecting good values for work_mem?
In response to Stephen Frost <[EMAIL PROTECTED]>: > * Bill Moran ([EMAIL PROTECTED]) wrote: > > What I'm fuzzy on is how to discretely know when I'm overflowing > > work_mem? Obviously, if work_mem is exhausted by a particular > > query, temp files will be created and performance will begin to suck, > > I don't believe this is necessairly *always* the case. There are > instances in PostgreSQL where it will just continue to allocate memory > beyond the work_mem setting. This is usually due to poor statistics > (you changed the data in the table dramatically and havn't run analyze, > or you never ran analyze on the table at all, or the statistics > gathering values are set too low to capture enough information about > the data, etc). It would nice if it was possible to have this detected > and logged, or similar. Additionally, work_mem isn't actually a > per-query thing, aiui, it's more like a per-node in the planner thing. > That is to say that if you have multiple sorts going on, or a sort and a > hash, that *both* of those expect to be able to use up to work_mem > amount of memory. I'm aware of that. It's one of the reasons I asked about monitoring its usage. I mean, if I could be sure that each process only used work_mem amount of space, it would be pretty easy to run some calculations and go to management and say, "these servers need X amount of RAM for optimal performance ..." As it is, I'm trying to find the most complex queries and estimate how many joins and sorts there are and how much that's going to add up to. It'd be nice to be able to crank up the debugging and have postgresql say: QUERY 0: total work_mem: bytes JOIN 0: x bytes JOIN 1: y bytes ... Perhaps it's in there somewhere ... I haven't experimented with cranking the logging up to maximum yet. If it's missing, I'm hoping to have some time to add it. Adding debugging to PostgreSQL is a pretty easy way to learn how the code fits together ... > Also, another point you might want to consider how to handle is that > work_mem has no bearing on libpq and I don't recall there being a way to > constrain libpq's memory usage. This has been an issue for me just > today when a forgot a couple parameters to a join which caused a > cartesean product result and ended up running the box out of memory. > Sure, it's my fault, and unlikely to happen in an application, but it > still sucks. :) It also managed to run quickly enough that I didn't > notice what was happening. :/ Of course, the server side didn't need > much memory at all to generate that result. Also, libpq stores > everything in *it's* memory before passing it to the client. An example > scenario of this being kind of an issue is psql, you need double the > memory size of a given result because the result is first completely > grabbed and stored in libpq and then sent to your pager (eg: less) which > then sucks it all into memory again. In applications (and I guess psql, > though I never think of it, and it'd be nice to have as a configurable > option if it isn't already...) you can use cursors to limit the amount > of memory libpq uses. In our case, the database servers are always dedicated, and the application side always runs on a different server. This is both a blessing and a curse: On the one hand, I don't have to worry about any client apps eating up RAM on the DB server. 
On the other hand, last week we found a place where a query with lots of joins was missing a key WHERE clause, it was pulling something like 10X the number of records it needed, then limiting it further on the client side. Optimizing this sort of thing is something I enjoy. > As these are new things (both the temp file creation logging and the > work_mem overflow detection, I believe), this discussion is probably > more appropriate for -hackers. True. It started out here because I wasn't sure that the stuff didn't already exist, and was curious how others were doing it. When I've had some more opportunity to investigate work_mem monitoring, I'll start the discussion back up on -hackers. > > That leads to my other question. Assuming I've got lots of > > connections (which I do), how can I determine if work_mem is too > > high? Do server processes allocated it even if they don't actually > > use it? Is the only way to find out to reduce it and see when it > > starts to be a problem? If so, that leads back to my first question: > > how can I be sure whether temp files were created or not? > > Yeah, look for swappiness... It'd be nice to be able to get memory > statistics on queries which have been run though... > > > My goal
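Along the lines of the cursor suggestion quoted above, a cursor keeps libpq from buffering an entire result set at once; the table name here is hypothetical:

    BEGIN;
    DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
    FETCH 1000 FROM big_cur;   -- repeat until no rows come back
    CLOSE big_cur;
    COMMIT;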
Re: [PERFORM] Vacuum v/s Autovacuum
In response to "Gauri Kanekar" <[EMAIL PROTECTED]>: > On 1/18/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote: > > > > On Jan 18, 2007, at 22:24 , Gauri Kanekar wrote: > > > > > is autovacuum similar to vacuum full analyse verbose. > > > > http://www.postgresql.org/docs/8.2/interactive/routine- > > vacuuming.html#AUTOVACUUM > > > > Apparently, no FULL, no VERBOSE (which is only really useful if you > > want to see the results, not for routine maintenance). [please don't top-post] Actually, you can raise the debugging level in PostgreSQL and get something similar to VERBOSE. The only problem is that it also increases the amount of logging that occurs with everything. > We have autovacuum ON , but still postgres server warns to > increas max_fsm_pages value. > > Do autovacuum release space after it is over? Yes. If you're still getting warnings about max_fsm_pages while autovac is running, you need to do one of two things: 1) Increase max_fsm_pages 2) Adjust autovacuum's settings so it vacuums more often. Depending on this, you may also need to temporarily adjust max_fsm_pages, then manually vacuum -- you may then find that autovacuum can keep everything clean with lower settings of max_fsm_pages. Overall, the best settings for 1 and 2 depend on the nature of your workload, and simulation and monitoring will be required to find the best values. I feel that the docs on this are very good. If the amount of data that changes between runs of autovacuum is greater than max_fsm_pages, then vacuum will be unable to reclaim all the space. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] slow result
In response to Laurent Manchon <[EMAIL PROTECTED]>: > > I have a slow response of my PostgreSQL database 7.4 using this query below > on a table with 800,000 rows: > > select count(*) from tbl; > > PostgreSQL return result in 28 sec every time. > although MS-SQL return result in 0.02 sec every time. > > My server is a DELL PowerEdge 2600 with bi-processor Xeon at 3.2 Ghz > with 3GBytes RAM While there's truth in everything that's been said by others, the query should not take _that_ long. I just tried a count(*) on a table with 460,000 rows, and it took less than a second. count(*) in PostgreSQL is not likely to compare to most other RDBMS for the reasons others have stated, but counting 800,000 rows shouldn't take 28 seconds. The standard question applies: have you vacuumed recently? > My PostgreSQL Conf is > * > log_connections = yes > syslog = 2 > effective_cache_size = 5 > sort_mem = 1 > max_connections = 200 > shared_buffers = 3000 > vacuum_mem = 32000 > wal_buffers = 8 > max_fsm_pages = 2000 > max_fsm_relations = 100 > > Can you tell me is there a way to enhance performance ? On our 4G machines, we use shared_buffers=24 (which equates to about 2G). The only reason I don't set it higher is that FreeBSD has a limit on shared memory of 2G. The caveat here is that I'm running a mix of 8.1 and 8.2. There have been significant improvements in both the usage of shared memory, and the optimization of count(*) since 7.4, so the first suggestion I have is to upgrade your installation. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
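One quick check along these lines, using the table name from the post above: after a vacuum, a page count wildly out of proportion to the row count is the kind of bloat that makes count(*) crawl:

    VACUUM ANALYZE tbl;
    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname = 'tbl';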
Re: [PERFORM] work_mem
"Campbell, Lance" <[EMAIL PROTECTED]> wrote: > > I have been researching how to improve my overall performance of > postgres. I am a little confused on the reasoning for how work_mem is > used in the postgresql.conf file. The way I understand the > documentation is you define with work_mem how much memory you want to > allocate per search. Couldn't you run out of memory? This approach > seems kind of odd to me. How do you tell the system not to allocate too > much memory if you all of the sudden got hit with a heavier number of > queries? work_mem tells PostgreSQL how much memory to use for each sort/join. If a sort/join exceeds that amount, PostgreSQL uses temp files on the disk instead of memory to do the work. If you want a query to complete, you've got to allow Postgres to finish it. The work_mem setting gives Postgres information on how to go about doing that in the best way. If you want to guarantee that individual processes can't suck up tons of memory, use your operating system's ulimit or equivalent functionality. That's one of the advantages of the forking model, it allows the operating system to enforce a certain amount of policy an a per-connection basis. HTH, Bill ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [OT] Very strange postgresql behaviour
In response to Arnau <[EMAIL PROTECTED]>: > >I have postgresql 7.4.2 running on debian and I have the oddest > postgresql behaviour I've ever seen. > > I do the following queries: > > > espsm_asme=# select customer_app_config_id, customer_app_config_name > from customer_app_config where customer_app_config_id = 5929 or > customer_app_config_id = 11527 order by customer_app_config_id; > > > customer_app_config_id | customer_app_config_name > +-- > 5929 | INFO > (1 row) > > >I do the same query but changing the order of the or conditions: > > > espsm_asme=# select customer_app_config_id, customer_app_config_name > from customer_app_config where customer_app_config_id = 11527 or > customer_app_config_id = 5929 order by customer_app_config_id; > > > customer_app_config_id | customer_app_config_name > +-- >11527 | MOVIDOSERENA TONI 5523 > (1 row) > > > >As you can see, the configuration 5929 and 11527 both exists, but > when I do the queries they don't appear. [snip] Just a guess, but perhaps your index is damaged. Have you tried REINDEXing? -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
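If the index really is damaged or badly bloated, rebuilding every index on the table from the post above is straightforward (REINDEX takes a lock on the table, so schedule it accordingly):

    REINDEX TABLE customer_app_config;
    ANALYZE customer_app_config;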
Re: [PERFORM] work-mem how do I identify the proper size
In response to "Campbell, Lance" <[EMAIL PROTECTED]>: > If I set work-mem at a particular amount of memory how do I answer the > following questions: > > 1) How many of my queries were able to run inside the memory I > allocated for work-mem? > > 2) How many of my queries had to run from disk because work-mem > was not set high enough? > > 3) If a query had to go to disk in order to be sorted or completed > is there a way to identify how much memory it would have taken in order > to run the query from memory? I don't know of any good way to answer these questions on current versions. I have a patch in for 8.3 that logs the usage of temporary files, which helps with some of this. It'd be nice to have additional debug logging that tells you: 1) when a sort/join operation uses disk instead of memory 2) A higher level debugging that announces "this query used temp files for some operations". #1 would be nice for optimizing, but may involve a lot of overhead. #2 could (potentially) be enabled on production servers to flag queries that need investigated, without generating a significant amount of logging overhead. Hopefully I'll get some time to try to hack some stuff together for this soon. A little bit of playing around shows that cost estimates for queries change radically when the system thinks it will be creating temp files (which makes sense ...) Notice these two partial explains: -> Sort (cost=54477.32..55674.31 rows=478798 width=242) -> Sort (cost=283601.32..284798.31 rows=478798 width=242) These are explains of the same query (a simple select * + order by on a non-indexed column) The first one is taken with work_mem set at 512m, which would appear to be enough space to do the entire sort in memory. The second is with work_mem set to 128k. More interesting is that that actual runtime doesn't differ by nearly that much: 3100ms vs 2200ms. (I've realized that my setting for random_page_cost is too damn high for this hardware -- thanks for causing me to look at that ... :) Anyway -- hope that helps. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Postgres performance Linux vs FreeBSD
In response to "Jacek Zaręba" <[EMAIL PROTECTED]>: > Hello, I've set up 2 identical machines, hp server 1ghz p3, > 768mb ram, 18gb scsi3 drive. On the first one I've installed > Debian/GNU 4.0 Linux, on the second FreeBSD 6.2. On both > machines I've installed Postgresql 8.2.3 from sources. > Now the point :)) According to my tests postgres on Linux > box run much faster then on FreeBSD, here are my results: > > *** setting up ** > creeate table foo as select x from generate_series(1,250) x; > vacuum foo; > checkpoint; > \timing > > * > > *** BSD * > actual=# select count(*) from foo; >count > - > 250 > (1 row) > > Time: 1756.455 ms > actual=# explain analyze select count(*) from foo; >QUERY PLAN > -- > Aggregate (cost=34554.20..34554.21 rows=1 width=0) (actual > time=12116.841..12116.843 rows=1 loops=1) > -> Seq Scan on foo (cost=0.00..28304.20 rows=250 width=0) > (actual time=9.276..6435.890 rows=250 loops=1) > Total runtime: 12116.989 ms > (3 rows) > > Time: 12117.803 ms > > ** > > > *** LIN ** > actual=# select count(*) from foo; >count > - > 250 > (1 row) > > Time: 1362,193 ms > actual=# EXPLAIN ANALYZE > actual-# select count(*) from foo; >QUERY PLAN > -- > Aggregate (cost=34554.20..34554.21 rows=1 width=0) (actual > time=4737.243..4737.244 rows=1 loops=1) > -> Seq Scan on foo (cost=0.00..28304.20 rows=250 width=0) > (actual time=0.058..2585.170 rows=250 loops=1) > Total runtime: 4737.363 ms > (3 rows) > > Time: 4738,367 ms > actual=# > ** > > Just a word about FS i've used: > BSD: > /dev/da0s1g on /usr/local/pgsql (ufs, local, noatime, soft-updates) > > LIN: > /dev/sda7 on /usr/local/pgsql type xfs (rw,noatime) > > > My question is simple :) what's wrong with the FreeBSD BOX?? > What's the rule for computing gettimeofday() time ?? I can't speak to the gettimeofday() question, but I have a slew of comments regarding other parts of this email. The first thing that I expect most people will comment on is your testing strategy. You don't get a lot of details, but it seems as if you ran 1 query on each server, 1 run on each. If you actually did more tests, you should provide that information, otherwise, people will criticize your testing strategy instead of looking at the problem. The other side to this is that you haven't shown enough information about your alleged problem to even start to investigate it. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Recommended Initial Settings
In response to "Campbell, Lance" <[EMAIL PROTECTED]>: > Richard, > Thanks for your reply. > > You said: > "Your operating-system should be doing the caching for you." > > My understanding is that as long as Linux has memory available it will > cache files. Then from your comment I get the impression that since > Linux would be caching the data files for the postgres database it would > be redundant to have a large shared_buffers. Did I understand you > correctly? Keep in mind that keeping the data in the kernel's buffer requires Postgres to make a syscall to read a file, which the kernel then realizes is cached in memory. The kernel then has to make that data available to the Postgres (userland) process. If the data is in Postgres' buffers, Postgres can fetch it directly, thus avoiding the overhead of the syscalls and the kernel activity. You still have to make sysvshm calls, though. So, it depends on which is able to manage the memory better. Is the kernel so much more efficient that it makes up for the overhead of the syscalls? My understanding is that in recent versions of Postgres, this is not the case, and large shared_buffers improve performance. I've yet to do any in-depth testing on this, though. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Estimate the size of the SQL file generated by pg_dump utility
In response to Bruce Momjian <[EMAIL PROTECTED]>: > Bricklen Anderson wrote: > > Bruce Momjian wrote: > > > Ravindran G-TLS,Chennai. wrote: > > >> Note: Please bear with us for the disclaimer because it is automated in > > >> the exchange server. > > >> Regards, > > >> Ravi > > > > > > FYI, we are getting closer to rejecting any email with such a > > > disclaimer, or emailing you back every time saying we are ignoring the > > > disclaimer. > > > > I think this issue cropped up a year or two ago, and one of the > > suggestions was for the offender to simply put a link back to their > > disclaimer at the foot of their email, rather than that uber-verbose > > message. > > Right. The problem is that most of the posters have no control over > their footers --- it is added by their email software. I'm curious, what problem does the disclaimer cause? I wrote the following TOS for my personal system: https://www.potentialtech.com/cms/node/9 Excerpt of the relevant part: "If you send me email, you are granting me the unrestricted right to use the contents of that email however I see fit, unless otherwise agreed in writing beforehand. You have no rights to the privacy of any email that you send me. If I feel the need, I will forward emails to authorities or make their contents publicly available. By sending me email you consent to this policy and agree that it overrides any disclaimers or policies that may exist elsewhere." I have no idea if that's legally binding or not, but I've talked to a few associates who have some experience in law, and they all argue that email disclaimers probably aren't legally binding anyway -- so the result is undefined. Don't know if this addresses the issue or confuses it ... ? -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance of count(*)
t; > This message and its attachments are intended exclusively for the > > named addressee. If you receive this message in error, please > > immediately delete it from your system and notify the sender. You > > may not use this message or any part of it for any purpose. > > The message may contain information that is confidential or > > protected by law, and any opinions expressed are those of the > > individualsender. Internet e-mail guarantees neither the > > confidentiality nor the proper receipt of the message sent. > > If the addressee of this message does not consent to the use > > of internete-mail,pleaseinform usinmmediately. > > > > > > > > > > > > ---(end of broadcast)--- > > TIP 7: You can help support the PostgreSQL project by donating at > > > > http://www.postgresql.org/about/donate > > > ---(end of broadcast)--- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > > > > > > -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Potential memory usage issue
In response to David Brain <[EMAIL PROTECTED]>: > > I recently migrated one of our large (multi-hundred GB) dbs from an > Intel 32bit platform (Dell 1650 - running 8.1.3) to a 64bit platform > (Dell 1950 - running 8.1.5). However I am not seeing the performance > gains I would expect What were you expecting? It's possible that your expectations are unreasonable. In our testing, we found that 64bit on the same hardware as 32bit only gave us a 5% gain, in the best case. In many cases the gain was near 0, and in some there was a small performance loss. These findings seemed to jive with what others have been reporting. > - I am suspecting that some of this is due to > differences I am seeing in reported memory usage. > > On the 1650 - a 'typical' postmaster process looks like this in top: > > 5267 postgres 16 0 439m 427m 386m S 3.0 21.1 3:31.73 postmaster > > On the 1940 - a 'typical' postmaster process looks like: > > 10304 postgres 16 0 41896 13m 11m D4 0.3 0:11.73 postmaster > > I currently have both systems running in parallel so the workloads will > be approximately equal. The configurations of the two systems in terms > of postgresql.conf is pretty much identical between the two systems, I > did make some changes to logging, but nothing to buffers/shared memory > config. > > I have never seen a postmaster process on the new system consume > anywhere near as much RAM as the old system - I am wondering if there is > something up with the shared memory config/usage that is causing my > performance issues. Any thoughts as to where I should go from here? Provide more information, for one thing. I'm assuming from the top output that this is some version of Linux, but more details on that are liable to elicit more helpful feedback. We run everything on FreeBSD here, but I haven't seen any difference in the way PostgreSQL uses memory on ia32 FreeBSD vs. amd64 FreeBSD. Without more details on your setup, my only suggestion would be to double-verify that your postgresql.conf settings are correct on the 64 bit system. -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Potential memory usage issue
In response to David Brain <[EMAIL PROTECTED]>: > > Thanks for the response. > Bill Moran wrote: > > In response to David Brain <[EMAIL PROTECTED]>: > >> I recently migrated one of our large (multi-hundred GB) dbs from an > >> Intel 32bit platform (Dell 1650 - running 8.1.3) to a 64bit platform > >> (Dell 1950 - running 8.1.5). However I am not seeing the performance > >> gains I would expect > > > > What were you expecting? It's possible that your expectations are > > unreasonable. > > Possibly - but there is a fair step up hardware performance wise from a > 1650 (Dual 1.4 Ghz PIII with U160 SCSI) to a 1950 (Dual, Dual Core 2.3 > Ghz Xeons with SAS) - so I wasn't necessarily expecting much from the > 32->64 transition (except maybe the option to go > 4GB easily - although > currently we only have 4GB in the box), but was from the hardware > standpoint. Ahh ... I didn't get that from your original message. > I am curious as to why 'top' gives such different output on the two > systems - the datasets are large and so I know I benefit from having > high shared_buffers and effective_cache_size settings. Have you done any actual queries on the new system? PG won't use the shm until it needs it -- and that doesn't occur until it gets a request for data via a query. Install the pg_buffercache contrib module and take a look at how shared memory is being used. I like to use MRTG to graph shared buffer usage over time, but you can just do a SELECT count(*) WHERE NOT NULL to see how many buffers are actually in use. > > Provide more information, for one thing. I'm assuming from the top output > > that this is some version of Linux, but more details on that are liable > > to elicit more helpful feedback. > > > Yes the OS is Linux - on the 1650 version 2.6.14, on the 1950 version 2.6.18 -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Potential memory usage issue
In response to David Brain <[EMAIL PROTECTED]>: > Bill Moran wrote: > > > > > Install the pg_bufferstats contrib module and take a look at how shared > > memory is being use. I like to use MRTG to graph shared buffer usage > > over time, but you can just do a SELECT count(*) WHERE NOT NULL to see > > how many buffers are actually in use. > > > > Can you explain what you'd use as a diagnostic on this - I just > installed the module - but I'm not entirely clear as to what the output > is actually showing me and/or what would be considered good or bad. Well, there are different things you can do with it. See the README, which I found pretty comprehensive. What I was referring to was the ability to track how many shared_buffers were actually in use, which can easily be seen at a cluster-wide view with two queries: select count(*) from pg_buffercache; select count(*) from pg_buffercache where reldatabase is not null; The first gives you the total number of buffers available (you could get this from your postgresql.conf as well, but with automated collection and graphing via mrtg, doing it this way guarantees that we'll always know what the _real_ value is) The second gives you the number of buffers that are actually holding data. If #2 is smaller than #1, that indicates that the entire working set of your database is able to fit in shared memory. This might not be your entire database, as some tables might never be queried from (i.e. log tables that are only queried when stuff goes wrong ...) This means that Postgres is usually able to execute queries without going to the disk for data, which usually equates to fast queries. If it's consistently _much_ lower, it may indicate that your shared_buffers value is too high, and the system may benefit from re-balancing memory usage. If #2 is equal to #1, it probably means that your working set is larger than the available shared buffers, this _may_ mean that your queries are using the disk a lot, and that you _may_ benefit from increasing shared_buffers, adding more RAM, sacrificing a 15000 RPM SCSI drive to the gods of performance, etc ... Another great thing to track is read activity. I do this via the pg_stat_database table: select sum(blks_hit) from pg_stat_database; select sum(blks_read) from pg_stat_database; (Note that you need block-level stats collecting enabled to make these usable) If the second one is increasing particularly fast, that's a strong indication that more shared_memory might improve performance. If neither of them are increasing, that indicates that nobody's really doing much with the database ;) I strongly recommend that you graph these values using mrtg or cacti or one of the many other programs designed to do that. It makes life nice when someone says, "hey, the DB system was really slow yesterday while you where busy in meetings, can you speed it up." -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend
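A convenient way to fold the two pg_stat_database counters above into a single cache-hit ratio (again, this assumes block-level statistics collection is enabled):

    SELECT sum(blks_hit)::float8 / nullif(sum(blks_hit) + sum(blks_read), 0) AS hit_ratio
      FROM pg_stat_database;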
Re: [PERFORM] Basic Q on superfluous primary keys
In response to "Kynn Jones" <[EMAIL PROTECTED]>: > Consider these two very similar schemas: > > Schema 1: > > > CREATE TABLE foo ( > id serial PRIMARY KEY, > frobnitz character(varying 100) NOT NULL UNIQUE > ); > > > CREATE TABLE bar ( > id serial PRIMARY KEY, > foo_id int REFERENCES foo(id) > ) > > > Schema 2: > > > CREATE TABLE foo ( > frobnitz character(varying 100) PRIMARY KEY > ); > > > CREATE TABLE bar ( > id serial PRIMARY KEY, > frobnitz character(varying 100) REFERENCES foo(frobnitz) > ) > > > > > The two situations are semantically identical: each record in table bar > refers to a record in table foo. The difference is that in the first > schema, this referencing is done through an "artificial" serial-integer > primary key, while in the second schema this reference is done through a > data field that happens to be unique and not null, so it can serve as > primary key. The first case is call a "surrogate key". A little googling on that term will turn up a wealth of discussion -- both for and against. > I find Schema 1 awkward and unnatural; more specifically, foo.id seems > unnecessary in light of the non-null uniqueness of foo.frobnitz. But I > remember once reading that "long" fields like foo.frobnitz did not make good > primary keys. I had a discussion about this recently on the Drupal mailing lists, at the end of which I promised to do some benchmarking to determine whether or not text keys really do hurt performance of indexes. Unfortunately, I still haven't followed through on that promise -- maybe I'll get to it tomorrow. > Is the field foo.id in Schema 1 superfluous? For example, wouldn't the > referencing from bar to foo really be done "behind the scenes" through some > hidden field (oid?) instead of through the frobnitz text field? Which of > the two schemas would give better perfornance? -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Fragmentation of WAL files
In response to Jim Nasby <[EMAIL PROTECTED]>: > I was recently running defrag on my windows/parallels VM and noticed > a bunch of WAL files that defrag couldn't take care of, presumably > because the database was running. What's disturbing to me is that > these files all had ~2000 fragments. Now, this was an EnterpriseDB > database which means the WAL files were 64MB instead of 16MB, but > even having 500 fragments for a 16MB WAL file seems like it would > definitely impact performance. I don't know about that. I've seen marketing material that claims that modern NTFS doesn't suffer performance problems from fragmentation. I've never tested it myself, but my point is that you might want to do some experiments -- you might find out that it doesn't make any difference. If it does, you should be able to stop the DB, defragment the files, then start the DB back up. Since WAL files are recycled, they shouldn't fragment again -- unless I'm missing something. If that works, it may indicate that (on Windows) a good method for installing is to create all the necessary WAL files as empty files before launching the DB. > Can anyone else confirm this? I don't know if this is a windows-only > issue, but I don't know of a way to check fragmentation in unix. I can confirm that it's only a Windows problem. No UNIX filesystem that I'm aware of suffers from fragmentation. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Filesystem fragmentation (Re: [PERFORM] Fragmentation of WAL files)
In response to Heikki Linnakangas <[EMAIL PROTECTED]>: [snip] > >> Can anyone else confirm this? I don't know if this is a windows-only > >> issue, but I don't know of a way to check fragmentation in unix. > > > > I can confirm that it's only a Windows problem. No UNIX filesystem > > that I'm aware of suffers from fragmentation. > > What do you mean by suffering? All filesystems fragment files at some > point. When and how differs from filesystem to filesystem. And some > filesystems might be smarter than others in placing the fragments. To clarify my viewpoint: To my knowledge, there is no Unix filesystem that _suffers_ from fragmentation. Specifically, all filesystems have some degree of fragmentation that occurs, but every Unix filesystem that I am aware of has built-in mechanisms to mitigate this and prevent it from becoming a performance issue. > There's a tool for Linux in the e2fsprogs package called filefrag that > shows the fragmentation of a file, but I've never used it myself. Interesting. However, the existence of a tool does not particularly indicate the _need_ for said tool. It might just have been something cool that somebody wrote. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Simple query, 10 million records...MySQL ten times faster
In response to zardozrocks <[EMAIL PROTECTED]>: > I have this table: > > CREATE TABLE test_zip_assoc ( > id serial NOT NULL, > f_id integer DEFAULT 0 NOT NULL, > lat_radians numeric(6,5) DEFAULT 0.0 NOT NULL, > long_radians numeric(6,5) DEFAULT 0.0 NOT NULL > ); > CREATE INDEX lat_radians ON test_zip_assoc USING btree (lat_radians); > CREATE INDEX long_radians ON test_zip_assoc USING btree > (long_radians); > > > > It's basically a table that associates some foreign_key (for an event, > for instance) with a particular location using longitude and > latitude. I'm basically doing a simple proximity search. I have > populated the database with *10 million* records. I then test > performance by picking 50 zip codes at random and finding the records > within 50 miles with a query like this: > > SELECT id > FROM test_zip_assoc > WHERE > lat_radians > 0.69014816041 > AND lat_radians < 0.71538026567 > AND long_radians > -1.35446228028 > AND long_radians < -1.32923017502 > > > On my development server (dual proc/dual core Opteron 2.8 Ghz with 4GB > ram) this query averages 1.5 seconds each time it runs after a brief > warmup period. In PostGreSQL it averages about 15 seconds. > > Both of those times are too slow. I need the query to run in under a > second with as many as a billion records. I don't know if this is > possible but I'm really hoping someone can help me restructure my > indexes (multicolumn?, multiple indexes with a 'where' clause?) so > that I can get this running as fast as possible. > > If I need to consider some non-database data structure in RAM I will > do that too. Any help or tips would be greatly appreciated. I'm > willing to go to greath lengths to test this if someone can make a > good suggestion that sounds like it has a reasonable chance of > improving the speed of this search. There's an extensive thread on my > efforts already here: > > http://phpbuilder.com/board/showthread.php?t=10331619&page=10 Why didn't you investigate/respond to the last posts there? The advice to bump shared_buffers is good advice. work_mem might also need bumped. Figure out which postgresql.conf your system is using and get it dialed in for your hardware. You can make all the indexes you want, but if you've told Postgres that it only has 8M of RAM to work with, performance is going to suck. I don't see hardware specs on that thread (but I didn't read the whole thing) If the system you're using is a dedicated DB system, set shared_buffers to 1/3 - 1/2 of the physical RAM on the machine for starters. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
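One of the options the poster mentions, a multicolumn index, might look like the sketch below; whether it actually beats the two single-column indexes depends on the data distribution and the planner, so treat it as something to benchmark rather than a definitive fix:

    CREATE INDEX test_zip_assoc_lat_long_idx
        ON test_zip_assoc (lat_radians, long_radians);
    ANALYZE test_zip_assoc;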
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
workload. But this doesn't have to be an either/or > proposition. I don't think any control needs to be abandoned. But > self-adjusting defaults seem like an achievable goal ( I know, I know, "show > us > the patch" ). I just don't know if this feeling has resonated well between > new > users and long-term developers. I know it must be grating to have to answer > the > same questions over and over and over "have you analyzed? Did you leave > postgresql.conf at the defaults??". Seems like a win-win for both sides, > IMHO. Well, it seems like this is happening where it's practical -- autovacuum is a good example. Personally, I wouldn't be opposed to more automagic stuff, just as long as I have the option to disable it. There are some cases where I still disable autovac. > In closing, I am not bashing PG! I love it and swear by it. These comments > are > purely from an advocacy perspective. I'd love to see PG user base continue > to grow. I expect that part of the problem is "who's going to do it?" -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Best OS for Postgres 8.2
In response to "Joshua D. Drake" <[EMAIL PROTECTED]>: > David Levy wrote: > > Hi, > > > > I am about to order a new server for my Postgres cluster. I will > > probably get a Dual Xeon Quad Core instead of my current Dual Xeon. > > Which OS would you recommend to optimize Postgres behaviour (i/o > > access, multithreading, etc) ? > > > > I am hesitating between Fedora Core 6, CentOS and Debian. Can anyone > > help with this ? > > Well you just described three linux distributions, which is hardly a > question about which OS to use ;). I would stick with the long supported > versions of Linux, thus CentOS 5, Debian 4, Ubuntu Dapper. There used to be a prominent site that recommended FreeBSD for Postgres. Don't know if that's still recommended or not -- but bringing it up is likely to start a Holy War. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] estimating the need for VACUUM FULL and REINDEX
In response to Guillaume Cottenceau <[EMAIL PROTECTED]>: > I'm trying to come up with a way to estimate the need for a > VACUUM FULL and/or a REINDEX on some tables. You shouldn't vacuum full unless you have a good reason. Vacuum full causes index bloat. > According to documentation[1], VACUUM FULL's only benefit is > returning unused disk space to the operating system; am I correct > in assuming there's also the benefit of optimizing the > performance of scans, because rows are physically compacted on > the disk? In my experience, the smaller the overall database size, the less shared memory it requires. Keeping it vacuumed will reduce the amount of space taken up in memory, which means it's more likely that the data you need at any particular time is in memory. Look up a thread with my name on it a lot related to reindexing. I did some experiments with indexes and reindexing and the only advantage I found was that the space requirement for the indexes is reduced by reindexing. I was not able to find any performance difference in newly created indexes vs. indexes that were starting to bloat. > With that in mind, I've tried to estimate how much benefit would > be brought by running VACUUM FULL, with the output of VACUUM > VERBOSE. However, it seems that for example the "removable rows" > reported by each VACUUM VERBOSE run is actually reused by VACUUM, > so is not what I'm looking for. I'm not sure what you mean by that last sentence. There are only two circumstances (I can think of) for running vacuum full: 1) You've just made some major change to the database (such as adding an obscene # of records, making massive changes to a large percentage of the existing data, or issuing a lot of "alter table") and want to get the FSM back down to a manageable size. 2) You are desperately hurting for disk space, and need a holdover until you can get bigger drives. Reindexing pretty much falls into the same 2 scenerios. I do recommend that you reindex after any vacuum full. However, a much better approach is to either schedule frequent vacuums (without the full) or configure/enable autovacuum appropriately for your setup. > Then according to documentation[2], REINDEX has some benefit when > all but a few index keys on a page have been deleted, because the > page remains allocated (thus, I assume it improves index scan > performance, am I correct?). However, again I'm unable to > estimate the expected benefit. With a slightly modified version > of a query found in documentation[3] to see the pages used by a > relation[4], I'm able to see that the index data from a given > table... > > relname | relpages | reltuples > +--+--- > idx_sessions_owner_key | 38 | 2166 > pk_sessions| 25 | 2166 > > ...is duly optimized after a REINDEX: > > relname | relpages | reltuples > +--+--- > idx_sessions_owner_key | 13 | 2166 > pk_sessions|7 | 2166 > > but what I'd need is really these 38-13 and 25-7 figures (or > estimates) prior to running REINDEX. Again, my experience shows that reindexing is only worthwhile if you're really hurting for disk space/memory. I don't know of any way to tell what size an index would be if it were completely packed, but it doesn't seem as if this is the best approach anyway. Newer versions of PG have the option to create indexes with empty space already there at creation time (I believe this is called "fill factor") to allow for future growth. The only other reason I can see for vacuum full/reindex is if you _can_. 
For example, if there is a period that you know the database will be unused that it sufficiently long that you know these operations can complete. Keep in mind that both reindex and vacuum full create performance problems while they are running. If you knew, however, that the system was _never_ being used between 6:00 PM and 8:00 AM, you could run them over night. In that case, I would recommend replacing vacuum full with cluster. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
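If such a maintenance window exists, the CLUSTER alternative mentioned above rewrites the table in index order and compacts it in one pass. Assuming the table behind the pk_sessions index above is named sessions, it would look like the following; like VACUUM FULL, it takes an exclusive lock, so it only fits inside that idle window:

    CLUSTER pk_sessions ON sessions;
    ANALYZE sessions;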
Re: [PERFORM] specific query (not all) on Pg8 MUCH slower than Pg7
In response to "Alexander Staubo" <[EMAIL PROTECTED]>: > On 5/8/07, Tom Lane <[EMAIL PROTECTED]> wrote: > > You're not getting the indexscan optimization of the LIKE clause, which > > is most likely due to having initdb'd the 8.1 installation in something > > other than C locale. You can either redo the initdb in C locale (which > > might be a good move to fix other inconsistencies from the 7.3 behavior > > you're used to) or create a varchar_pattern_ops index on the column(s) > > you're using LIKE with. > > Given the performance implications of setting the wrong locale, and > the high probability of accidentally doing this (I run my shells with > LANG=en_US.UTF-8, so all my databases have inherited this locale), why > is there no support for changing the database locale after the fact? > > # alter database test set lc_collate = 'C'; > ERROR: parameter "lc_collate" cannot be changed How would that command handle UTF data that could not be converted to C? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Throttling PostgreSQL's CPU usage
In response to Daniel Griscom <[EMAIL PROTECTED]>: > I'm building a kiosk with a 3D front end accessing PostGIS/PostgreSQL > via Apache/PHP. The 3D display is supposed to show smooth motion from > location to location, with PostGIS giving dynamically updated > information on the locations. Everything runs on the same machine, > and it all works, but when I start a query the 3D display stutters > horribly. It looks like PostgreSQL grabs hold of the CPU and doesn't > let go until it's completed the query. > > I don't need the PostgreSQL query to return quickly, but I must > retain smooth animation while the query is being processed. In other > words, I need PostgreSQL to spread out its CPU usage so that it > doesn't monopolize the CPU for any significant time (more than 50ms > or so). > > Possible solutions: > > 1: Set the PostgreSQL task priority lower than the 3D renderer task, > and to make sure that the 3D renderer sleep()s enough to let > PostgreSQL get its work done. The obvious objection to this obvious > solution is "Priority inversion!", but I see that as an additional > challenge to be surmounted rather than an absolute prohibition. So, > any thoughts on setting the PostgreSQL task priority (including by > the much-maligned tool shown at > <http://weblog.bignerdranch.com/?p=11>)? If it's all PostgreSQL processes that you want take a backseat to your rendering process, why not just nice the initial PostgreSQL daemon? All children will inherit the nice value, and there's no chance of priority inversion because all the PostgreSQL backends are running at the same priority. Just a thought. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] REVISIT specific query (not all) on Pg8 MUCH slower than Pg7
id and cvt.name = 'gene'",$rec,$fbgnwc)); > > > my $uq = $dbh2->prepare(sprintf("SELECT db.name, accession, > version, is_current from feature_dbxref fd, dbxref dx, db where fd.feature_id > = %d and fd.dbxref_id = dx.dbxref_id and dx.db_id = db.db_id and db.name = > '%s'",$pr{feature_id},$uds{$uh{$rec}{stflag}})); > > > >my $cq = $dbh2->prepare(sprintf("SELECT f.uniquename, f.name, cvt.name > as ntype, dx.db_id, dx.accession, fd.is_current from dbxref dx, feature f, > feature_dbxref fd, cvte > rm cvt where accession like '%s' and dx.dbxref_id = fd.dbxref_id and > fd.feature_id = f.feature_id and f.type_id = cvt.cvterm_id and cvt.name not > in > ('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region')",$nacc)); > > > > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match > > > > > > -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] pg_stats how-to?
In response to "Y Sidhu" <[EMAIL PROTECTED]>: > My immediate problem is to decrease vacuum times. Don't take this as being critical, I'm just trying to point out a slight difference between what you're doing and what you think you're doing: Your problem is not decreasing vacuum times. You _think_ that the solution to your problem is decreasing vacuum times. We don't know what your actual problem is, and "decreasing vacuum times" may not be the correct solution to it. Please describe the _problem_. Is vacuum causing performance issues while it's running? I mean, if vacuum takes a long time to run, so what -- what is the actual _problem_ caused by vacuum taking long to run. You may benefit by enabling autovacuum, or setting vacuum_cost_delay to allow vacuum to run with less interference to other queries (for example). Some details on what you're doing and what's happening would be helpful, such as the output of vacuum verbose, details on the size of your database, your hardware, how long vacuum is taking, what you feel is an acceptable length of time, your PG config. > On 5/14/07, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > > > > On Mon, May 14, 2007 at 12:02:03PM -0700, Y Sidhu wrote: > > > I am sorry about this Jim, please understand that I am a newbie and am > > > trying to solve long vacuum time problems and get a handle on speeding > > up > > > queries/reports. I was pointed to pg_stats and that's where I am at now. > > I > > > > Well, I have no idea what that person was trying to convey then. What > > are you trying to look up? Better yet, what's your actual problem? > > > > > have added this into my conf file: > > > stats_start_collector TRUE stats_reset_on_server_start FALSE > > > stats_command_string TRUE > > > However, these being production servers, I have not enabled these: > > > stats_row_level stats_block_level > > FYI, stats_command_string has a far larger performance overhead than any > > of the other stats commands prior to 8.2. > > > > > Yes, I have re-started the server(s). It seems like I query tables to > > get > > > the info. If so, are there any queries written that I can use? > > > > > > Thanks for following up on this with me. > > > > > > Yudhvir > > > > > > === > > > On 5/14/07, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > > > > > > > >On Mon, May 14, 2007 at 11:09:21AM -0700, Y Sidhu wrote: > > > >> The stats_block_level and stats_row_level are NOT enabled. The > > question > > > >is > > > >> how to use pg_stats. Do I access/see them via the ANALYZE command? or > > > >using > > > >> SQL. I cannot find any document which will get me started on this. > > > > > > > >Ok, we're both confused I think... I thought you were talking about the > > > >pg_stat* views, which depend on the statistics collector (that's what > > > >the stats_* parameters control). > > > > > > > >That actually has nothing at all to do with pg_stats or pg_statistics. > > > >Those deal with statistics about the data in the database, and not > > about > > > >statistics from the engine (which is what the pg_stat* views do...). > > > > > > > >If you want to know about pg_stats, take a look at > > > >http://www.postgresql.org/docs/8.2/interactive/view-pg-stats.html ... > > > >but normally you shouldn't need to worry yourself about that. Are you > > > >trying to debug something? > > > > > > > >Information about the backend statistics can be found at > > > >http://www.postgresql.org/docs/8.2/interactive/monitoring.html -- Bill Moran Collaborative Fusion Inc. 
http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
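If the real issue is vacuum interfering with foreground queries rather than its wall-clock time, the cost-based delay can be set just for the session doing the vacuuming; a sketch, with a placeholder table name and an arbitrary starting value:

    SET vacuum_cost_delay = 20;        -- milliseconds to sleep between batches of work; 0 disables
    VACUUM VERBOSE ANALYZE some_big_table;

This makes the vacuum itself take longer, but throttles its I/O so other queries feel it less.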
Re: [PERFORM] PITR performance costs
Dave Cramer <[EMAIL PROTECTED]> wrote: > Since PITR has to enable archiving does this not increase the amount > of disk I/O required ? It does increase the required amount of I/O. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] DB cluster sharing between 32 and 64 bit software versions
In response to Ireneusz Pluta <[EMAIL PROTECTED]>: > Hello, > > I am going to build a new PostgreSQL dedicated server, on FreeBSD. Before it > goes to production > service I need to make some tests and take configuration decisions, focused > on my application needs. > Usual thing. One of them is selection of one of 32 or 64 bit versions of both > OS and PG. What I am > going to do is to install both versions on different filesystems of the same > machine. As a > consequence I would also have to deal with two independent copies of my real > databases on which I > want to perfrom my tests. However, the databases are rather large, so I am > thinking about > possibilities of not to have to restore two copies of my data, but use just > one instead, and sharing > it between the 32 and 64 versions, across reboots. > > Would that scenario work, or I am simply too naive considering it? It won't work, unfortunately. The on-disk representation of the data is different between ia32 and amd64. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] PostgreSQL not fully utilizing system resources?
"Gregory Stewart" <[EMAIL PROTECTED]> wrote: > > Hello List, > > We've been running PostgreSQL as our web application database for > almost a year and it has noticeably slowed down over last few months. Just going to go through your email and address each point inline. First off, you say nothing of your vacuum/analyze schedule other than to point out that autovacuum is on. If you run "vacuum verbose" on the database, what does the output say? > Our current setup and pgsql configuration looks like this: > > 8.1.2 on Ubuntu 4 on Opteron Dual Core with 2 GBytes RAM. This is a > dedicated DB server. Upgrade. 8.1.2 is old, you should be running 8.1.9 unless you have a specific reason not to. > We currently have about 3.5 million rows in 91 tables. How large is the dataset? What does pg_database_size tell you? 3.5M could be a lot or a little, depending on the size of each row. > Besides the > requests coming from the web server, we have batch processes running > every 15 minutes from another internal machine that do a lot of > UPDATE, DELETE and INSERT queries on thousands of rows. Hence my concern that your autovacuum settings may not be aggressive enough. > Many of the SELECT queries coming from the web server contain large > JOINS and aggregate calculations. > > We are running a financial application which is very data intensive > and calculates a lot on the SQL side. > > Anyways, watching the system processes we realized that PostgreSQL is > only using about 300 Mbytes for itself. That's because you told it to. Below, you allocated 143M of RAM to shared buffers. Current thinking is to allocate 1/3 of your RAM to shared buffers and start fine-tuning from there. If you haven't already determined that less is better for your workload, I'd consider bumping shared_buffers up to ~7. > Also, both cores are usually > maxed out to 100% usage. Maxed out on CPU usage? What's your IO look like? > Are we expecting too much from our server? Hard to say without more details. > Our non-default configuration settings are: > > max_connections = 100 > shared_buffers = 17500 > work_mem = 2048 While I can't be sure without more details, you may benefit by raising the work_mem value. If you've got 2G of RAM, and you allocate 600M to shared_buffers, that leaves 1.4G for work_mem. Depending on whether or not the large joins you describe need it or not, you may benefit from increasing work_mem. Your description gives the impression that most of the RAM on this system is completely free. If that's the case, you may be constraining PG without need, but there's not enough information in your post to be sure. > maintenance_work_mem = 4 > max_fsm_pages = 35000 > autovacuum = on > > What can I do to make best use of my db server? Is our configuration > flawed? Or are we already at a point where we need consider clustering > / load balancing? It's a tough call. Explain of some problematic queries would be helpful. It is entirely possible that you're doing some intensive math and you're simply going to need more CPU horsepower to get it done any faster, but there's just not enough information in your post to know for sure. Post some explains of some problem queries. Let us know more about your IO load. Give us some snapshots of top under load. Find out how large the database is. Provide the output of vacuum verbose. -- Bill Moran Collaborative Fusion Inc. 
[EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
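For the pg_database_size question above, a quick way to get overall sizes (available in 8.1 and later):

    SELECT datname,
           pg_size_pretty(pg_database_size(datname)) AS size
      FROM pg_database
     ORDER BY pg_database_size(datname) DESC;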
Re: [PERFORM] Please help me understand these numbers
In response to "Chris Hoover" <[EMAIL PROTECTED]>: > I need some help. I have started taking snapshots of performance of my > databases with concerns to io. I created a view on each cluster defined as: > SELECT pg_database.datname AS database_name, > pg_stat_get_db_blocks_fetched(pg_database.oid) AS blocks_fetched, > pg_stat_get_db_blocks_hit(pg_database.oid) AS blocks_hit, > pg_stat_get_db_blocks_fetched(pg_database.oid) - > pg_stat_get_db_blocks_hit(pg_database.oid) AS physical_reads >FROM pg_database > WHERE pg_stat_get_db_blocks_fetched(pg_database.oid) > 0 > ORDER BY pg_stat_get_db_blocks_fetched(pg_database.oid) - > pg_stat_get_db_blocks_hit(pg_database.oid) DESC; > > I am taking 5 minute snapshots of this view. > > When I look at my data, I am getting row like this: > database_name: xxx > blocks_fetched: 2396915583 > blocks_hit: 1733190669 > physical_reads: 663724914 > snapshot_timestamp: 2007-06-08 09:20:01.396079 > > database_name: xxx > blocks_fetched: 2409671770 > blocks_hit: 1733627788 > physical_reads: 676043982 > snapshot_timestamp: 2007-06-08 09:25:01.512911 > > Subtracting these 2 lines gives me a 5 minute number of > blocks_fetched: 12756187 > blocks_hit: 437119 > physical_reads: 12319068 > > If I am interpreting these number correctly, for this 5 minute interval I > ended up hitting only 3.43% of the requested data in my shared_buffer, and > ended up requesting 12,319,068 blocks from the os? Since a postgres block > is 8KB, that's 98,553,544 KB (~94GB)! > > Are my assumptions correct in this? It certainly seems possible. > I am just having a hard time fathoming > this. For this particular db, that is almost 1/2 of the total database (it > is a 200GB+ db) requested in just 5 minutes! What are your share_buffers setting and the total RAM available to the OS? My guess would be that you have plenty of RAM in the system (8G+ ?) but that you haven't allocated very much of it to shared_buffers (only a few 100 meg?). As a result, PostgreSQL is constantly asking the OS for disk blocks that it doesn't have cached, but the OS has those disk blocks cached in RAM. If my guess is right, you'll probably see improved performance by allocating more shared memory to PostgreSQL, thus avoiding having to move data from one area in memory to another before it can be used. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
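The same counters can be expressed as a cache hit percentage straight from pg_stat_database, which is often easier to read than raw block counts; keep in mind these are cumulative since the last stats reset, so deltas between snapshots are still needed for per-interval figures:

    SELECT datname,
           blks_hit,
           blks_read,
           round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
      FROM pg_stat_database
     WHERE blks_hit + blks_read > 0;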
Re: [PERFORM] Please help me understand these numbers
In response to "Chris Hoover" <[EMAIL PROTECTED]>: > On 6/8/07, Bill Moran <[EMAIL PROTECTED]> wrote: > > > > In response to "Chris Hoover" <[EMAIL PROTECTED]>: > > > > > I need some help. I have started taking snapshots of performance of my > > > databases with concerns to io. I created a view on each cluster defined > > as: > > > SELECT pg_database.datname AS database_name, > > > pg_stat_get_db_blocks_fetched(pg_database.oid) AS blocks_fetched, > > > pg_stat_get_db_blocks_hit(pg_database.oid) AS blocks_hit, > > > pg_stat_get_db_blocks_fetched(pg_database.oid) - > > > pg_stat_get_db_blocks_hit(pg_database.oid) AS physical_reads > > >FROM pg_database > > > WHERE pg_stat_get_db_blocks_fetched(pg_database.oid) > 0 > > > ORDER BY pg_stat_get_db_blocks_fetched(pg_database.oid) - > > > pg_stat_get_db_blocks_hit(pg_database.oid) DESC; > > > > > > I am taking 5 minute snapshots of this view. > > > > > > When I look at my data, I am getting row like this: > > > database_name: xxx > > > blocks_fetched: 2396915583 > > > blocks_hit: 1733190669 > > > physical_reads: 663724914 > > > snapshot_timestamp: 2007-06-08 09:20:01.396079 > > > > > > database_name: xxx > > > blocks_fetched: 2409671770 > > > blocks_hit: 1733627788 > > > physical_reads: 676043982 > > > snapshot_timestamp: 2007-06-08 09:25:01.512911 > > > > > > Subtracting these 2 lines gives me a 5 minute number of > > > blocks_fetched: 12756187 > > > blocks_hit: 437119 > > > physical_reads: 12319068 > > > > > > If I am interpreting these number correctly, for this 5 minute interval > > I > > > ended up hitting only 3.43% of the requested data in my shared_buffer, > > and > > > ended up requesting 12,319,068 blocks from the os? Since a postgres > > block > > > is 8KB, that's 98,553,544 KB (~94GB)! > > > > > > Are my assumptions correct in this? > > > > It certainly seems possible. > > > > > I am just having a hard time fathoming > > > this. For this particular db, that is almost 1/2 of the total database > > (it > > > is a 200GB+ db) requested in just 5 minutes! > > > > What are your share_buffers setting and the total RAM available to the OS? > > > > My guess would be that you have plenty of RAM in the system (8G+ ?) but > > that > > you haven't allocated very much of it to shared_buffers (only a few 100 > > meg?). > > As a result, PostgreSQL is constantly asking the OS for disk blocks that > > it > > doesn't have cached, but the OS has those disk blocks cached in RAM. > > > > If my guess is right, you'll probably see improved performance by > > allocating > > more shared memory to PostgreSQL, thus avoiding having to move data from > > one area in memory to another before it can be used. > > > > -- > > Bill Moran > > Collaborative Fusion Inc. > > http://people.collaborativefusion.com/~wmoran/ > > > > [EMAIL PROTECTED] > > Phone: 412-422-3463x4023 > > > > Wow, that's amazing. You pretty much hit my config on the head. 9GB ram > with 256MB shared_buffers. Some days are better than others :) > I have just started playing with my shared_buffers config on another server > that tends to be my main problem server. I just ran across these > informational functions the other day, and they are opening up some great > territory for me that I have been wanting to know about for a while. Have a look at the pg_buffercache module, which can be pretty useful for figuring out what data is being accessed. > I was starting to bump my shared_buffers up slowly. Would it be more > advisable to just push them to 25% of my ram and start there or work up > slowly. 
I was going slowly since it takes a database restart to change the > parameter. I looked back through and couldn't find which version of PostgreSQL you were using. If it's 8.X, the current wisdom is to start with 25 - 30% of your unused RAM for shared buffers (by "unused", it's meant to take into account any other applications running on the same machine and their RAM requirements) and then tune down or up as seems to help. So, my recommendation would be to bump shared_buffers up to around 2G and go from there. Another thing that I realized wasn't in your original email is if you're having any sort of problems? If there are slow queries or other performance issues, do before/after tests to see if you're adjusting values in the right direction. If you don't have any performance issues outstanding, it can be easy to waste a lot of time/energy tweaking settings that don't really help anything. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
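A typical pg_buffercache query for this looks something like the sketch below (column names as shipped with the 8.1/8.2 contrib module; the module has to be installed into the database first):

    SELECT c.relname, count(*) AS buffers
      FROM pg_buffercache b
      JOIN pg_class c ON c.relfilenode = b.relfilenode
     WHERE b.reldatabase = (SELECT oid FROM pg_database
                             WHERE datname = current_database())
     GROUP BY c.relname
     ORDER BY buffers DESC
     LIMIT 20;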
Re: [PERFORM] How much memory PostgreSQL is going to use?
In response to Arnau <[EMAIL PROTECTED]>: > Hi all, > >I have a server with 4GB of memory and I'm tweaking the PostgreSQL > configuration. This server will be dedicated to run PostgreSQL so I'd > like to dedicate as much as possible RAM to it. > >I have dedicated 1GB to shared_buffers (shared_buffers=131072) but > I'm not sure if this will be the maximum memory used by PostgreSQL or > additional to this it will take more memory. Because if shared_buffers > is the maximum I could raise that value even more. Individual backend processes will allocate more memory above shared_buffers for processing individual queries. See work_mem. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
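As a rough sizing rule: worst-case memory use is approximately shared_buffers, plus work_mem for every sort or hash running at once across all backends, plus maintenance_work_mem per concurrent vacuum or index build. The relevant settings can be pulled in one place:

    SELECT name, setting
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'work_mem',
                    'maintenance_work_mem', 'max_connections');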
Re: [PERFORM] VACUUM vs auto-vacuum daemon
In response to "Sabin Coanda" <[EMAIL PROTECTED]>: > Hi there, > > Using explicitly VACUUM command give me the opportunity to fine tune my > VACUUM scheduling parameters, after I analyze the log generated by VACUUM > VERBOSE. > > On the other hand I'd like to use the auto-vacuum mechanism because of its > facilities. Unfortunately, after I made some initial estimations for > autovacuum_naptime, and I set the specific data into pg_autovacuum table, I > have not a feedback from the auto-vacuum mechanism to check that it works > well or not. It would be nice to have some kind of log similar with the one > generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide > such a useful log ? Ditto what Alvaro said. However, you can get some measure of tracking by running VACUUM VERBOSE on a regular basis to see how well autovacuum is keeping up. There's no problem with running manual vacuum and autovacuum together, and you'll be able to gather _some_ information about how well autovacuum is keeping up. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
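For reference, on 8.1/8.2 the per-table settings Sabin mentions go into the pg_autovacuum system catalog (superuser only). A hedged sketch using the 8.1 column list, with made-up thresholds; -1 in a numeric column means "use the global default":

    INSERT INTO pg_autovacuum
           (vacrelid, enabled,
            vac_base_thresh, vac_scale_factor,
            anl_base_thresh, anl_scale_factor,
            vac_cost_delay, vac_cost_limit)
    VALUES ('some_table'::regclass, true,   -- placeholder table name
            200, 0.2,
            100, 0.1,
            -1, -1);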
Re: [PERFORM] VACUUM vs auto-vacuum daemon
In response to "Sabin Coanda" <[EMAIL PROTECTED]>: > Hi Bill, > > ... > > > > However, you can get some measure of tracking my running VACUUM VERBOSE > > on a regular basis to see how well autovacuum is keeping up. There's > > no problem with running manual vacuum and autovacuum together, and you'll > > be able to gather _some_ information about how well autovacuum is > > keeping up. > > Well, I think it is useful just if I am able to synchronize the autovacuum > to run always after I run vacuum verbose. But I don't know how to do that. > Do you ? No, I don't. Why would you want to do that? Personally, I'd be more interested in whether autovacuum, running whenever it wants without me knowing, is keeping the table bloat under control. If this were a concern for me (which it was during initial testing of our DB) I would run vacuum verbose once a day to watch sizes and what not. After a while, I'd switch to once a week, then probably settle on once a month to ensure nothing ever gets out of hand. Put it in a cron job and have the output mailed. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance query about large tables, lots of concurrent access
In response to Karl Wright <[EMAIL PROTECTED]>: > Alvaro Herrera wrote: > > Karl Wright wrote: > >> Alvaro Herrera wrote: > >>> Karl Wright wrote: > >>> > >>>> This particular run lasted four days before a VACUUM became essential. > >>>> The symptom that indicates that VACUUM is needed seems to be that the > >>>> CPU usage of any given postgresql query skyrockets. Is this essentially > >>>> correct? > >>> Are you saying you weren't used to run VACUUM all the time? If so, > >>> that's where the problem lies. > >> Postgresql 7.4 VACUUM runs for so long that starting it with a cron job > >> even every 24 hours caused multiple instances of VACUUM to eventually be > >> running in my case. So I tried to find a VACUUM schedule that permitted > >> each individual vacuum to finish before the next one started. A vacuum > >> seemed to require 4-5 days with this particular database - or at least > >> it did for 7.4. So I had the VACUUM schedule set to run every six days. > > > > How large is the database? I must admit I have never seen a database > > that took 4 days to vacuum. This could mean that your database is > > humongous, or that the vacuum strategy is wrong for some reason. > > The database is humongus, and the machine is under intense load. On the > instance where this long vacuum occurred, there were several large > tables - one with 7,000,000 rows, one with 14,000,000, one with > 140,000,000, and one with 250,000,000. Don't rule out the possibility that the only way to fix this _might_ be to throw more hardware at it. Proper configuration can buy you a lot, but if your usage is exceeding the available bandwidth of the IO subsystem, the only way you're going to get better performance is to put in a faster IO subsystem. > > You know that you can run vacuum on particular tables, right? It would > > be probably a good idea to run vacuum on the most updated tables, and > > leave alone those that are not or little updated (hopefully the biggest; > > this would mean that an almost-complete vacuum run would take much less > > than a whole day). > > Yeah, sorry, that doesn't apply here. Why not? I see no reason why an appropriate autovaccum schedule would not apply to your scenario. I'm not saying it does, only that your response does not indicate that it doesn't, and thus I'm concerned that you're writing autovacuum off without proper research. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
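To find the tables that actually need the frequent vacuuming, the row-level activity counters are a decent guide (on 8.1/8.2 this needs stats_start_collector and stats_row_level turned on):

    SELECT schemaname, relname,
           n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_user_tables
     ORDER BY n_tup_upd + n_tup_del DESC
     LIMIT 10;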
Re: [PERFORM] Maintenance question / DB size anomaly...
Kurt Overberg <[EMAIL PROTECTED]> wrote: > > That's the thing thats kinda blowing my mind here, when I look at > that table: > > db1=# select count(*) from _my_cluster.sl_log_1 ; > count > --- >6788 > (1 row) > > As far as my DB is concerned, there's only ~7000 rows (on average) > when I look > in there (it does fluctuate, I've seen it go as high as around 12k, > but then its > gone back down, so I know events are moving around in there). This is consistent with my experience with Slony and sl_log_[12] I'm pretty sure that the slon processes vacuum sl_log_* on a fairly regular basis. I'm absolutely positive that slon occasionally switches from using sl_log_1, to sl_log_2, then truncates sl_log_1 (then, after some time, does the same in reverse) So, in order for you to get massive bloat of the sl_log_* tables, you must be doing a LOT of transactions in the time before it switches logs and truncates the unused version. Either that, or something is going wrong. > So from what I can tell- from the disk point of view, there's ~11Gb > of data; from the > vacuum point of view there's 309318 rows. From the psql point of > view, there's only > around 7,000. Am I missing something? Something seems wrong here. Correct me if I'm missing something, but you're saying the table takes up 11G on disk, but vacuum says there are ~14000 pages. That would mean your page size is ~800K. Doesn't seem right. > Unless there's something > going on under the > hood that I don't know about (more than likely), it seems like my > sl_log_1 table is munged or > somehow otherwise very screwed up. I fear that a re-shuffling or > dropping/recreating > the index will mess it up further. Maybe when I take my production > systems down for > maintenance, can I wait until sl_log_1 clears out, so then I can just > drop that > table altogether (and re-create it of course)? Possibly drop this node from the Slony cluster and re-add it. Unless it's the origin node, in which case you'll have to switchover, then redo the origin then switch back ... > > Thanks! > > /kurt > > > > > On Jun 19, 2007, at 5:33 PM, Tom Lane wrote: > > > Kurt Overberg <[EMAIL PROTECTED]> writes: > >> mydb # vacuum verbose _my_cluster.sl_log_1 ; > >> INFO: "sl_log_1": found 455001 removable, 309318 nonremovable row > >> versions in 13764 pages > >> DETAIL: 0 dead row versions cannot be removed yet. > > > > Hmm. So you don't have a long-running-transactions problem (else that > > DETAIL number would have been large). What you do have is a failure > > to vacuum sl_log_1 on a regular basis (because there are so many > > dead/removable rows). I suspect also some sort of Slony problem, > > because AFAIK a properly operating Slony system shouldn't have that > > many live rows in sl_log_1 either --- don't they all represent > > as-yet-unpropagated events? I'm no Slony expert though. You probably > > should ask about that on the Slony lists. > > > >> ...I then checked the disk and those pages are still there. > > > > Yes, regular VACUUM doesn't try very hard to shorten the disk file. > > > >> Would a VACUUM FULL take care of this? > > > > It would, but it will take an unpleasantly long time with so many live > > rows to reshuffle. I'd advise first working to see if you can get the > > table down to a few live rows. Then a VACUUM FULL will be a snap. > > Also, you might want to do REINDEX after VACUUM FULL to compress the > > indexes --- VACUUM FULL isn't good at that. 
> > > > regards, tom lane > > > > ---(end of > > broadcast)--- > > TIP 4: Have you searched our list archives? > > > >http://archives.postgresql.org > > > ---(end of broadcast)--- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > > > > > > -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023
Re: [PERFORM] cached entities
In response to "Sergey Konoplev" <[EMAIL PROTECTED]>: > Hi > > I'd like to know how to get information about which PG entities are in > kernel cache, if possible. That's going to be specific to the OS you're running. Unless you're talking about PG's shared_buffers -- if that's the case, have a look at the pg_buffercache contrib module. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Maintenance question / DB size anomaly...
In response to Kurt Overberg <[EMAIL PROTECTED]>: > OOookay. Since the discussion has wandered a bit I just > wanted to restate things in an effort to clear the problem in my head. > > Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to > be messed up, specifically sl_log_1_idx1 seems to think that there's > > 300,000 rows in the table its associated with. I just want to fix > the index, really. So my question remains: Apologies, I must have misunderstood some previous message. > Its it okay to dump and recreate that index (or reindex it) while the > servers are down and the database is not being accessed? There are people here who know _way_ more than me -- but I can't see any reason why you couldn't just REINDEX it while everything is running. There may be some performance slowdown during the reindex, but everything should continue to chug along. A drop/recreate of the index should be OK as well. > Tom, Bill, Chris and Richard, thank you so much for your thoughts on > this matter so far. It helps to not feel "so alone" when dealing > with difficult issues (for me anyway) on a system I don't know so > much about. :D Isn't Open Source great! -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Database-wide VACUUM ANALYZE
In response to "Steven Flatt" <[EMAIL PROTECTED]>: > On 6/21/07, Francisco Reyes <[EMAIL PROTECTED]> wrote: > > > > Are you on FreeBSD by any chance? > > > > I think the FreeBSD port by default installs a script that does a daily > > vacuum. > > > Yes, FreeBSD. Do you know what script that is? /usr/local/etc/periodic/daily/502.pgsql > And it does a db-wide > VACUUM ANALYZE every day?! That is certainly not necessary, and in fact, > costly for us. You can control it with knobs in /etc/periodic.conf (just like any other periodic job): daily_pgsql_vacuum_enable="YES" daily_pgsql_backup_enable="NO" are the defaults. > Hmmm... I wonder why this would just start now, three days ago. Everything > seemed to be normal for the last two weeks. Did someone alter /etc/periodic.conf? Perhaps it's been running all along but you never noticed it before now? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Query is taking 5 HOURS to Complete on 8.1 version
In response to smiley2211 <[EMAIL PROTECTED]>: > > This query is taking less than 5 minutes on 7.4 but over 5 hours on 8.1... > > PostgreSQL 8.1.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) > 4.1.0 (SUSE Linux) > Total runtime: 20448310.101 ms = 5.6800862 hour > (132 rows) When was the last time you vacuum analyzed the database? Also, you don't even provide the query. I can't imagine how you'd expect anyone to help you. If vacuum analyze doesn't fix the problem, please provide the query, explain output of the query, and the schema of any tables involved, including information on indexes. > > --postgresql.conf: > > shared_buffers = 114688 # min 16 or max_connections*2, 8KB > each > #temp_buffers = 2 # min 100, 8KB each > #max_prepared_transactions = 5 # can be 0 or more > # note: increasing max_prepared_transactions costs ~600 bytes of shared > memory > # per transaction slot, plus lock space (see max_locks_per_transaction). > work_mem = 10240# size in KB > maintenance_work_mem = 64384# min 1024, size in KB > max_stack_depth = 4096 # min 100, size in KB > > # - Free Space Map - > > max_fsm_pages = 50 # min max_fsm_relations*16, 6 bytes each > max_fsm_relations = 1000# min 100, ~70 bytes each > > # - Kernel Resource Usage - > > #max_files_per_process = 1000 # min 25 > #preload_libraries = '' > > # - Cost-Based Vacuum Delay - > > #vacuum_cost_delay = 0 # 0-1000 milliseconds > #vacuum_cost_page_hit = 1 # 0-1 credits > #vacuum_cost_page_miss = 10 # 0-1 credits > #vacuum_cost_page_dirty = 20# 0-1 credits > #vacuum_cost_limit = 200# 0-1 credits > > # - Background writer - > > #bgwriter_delay = 200 # 10-1 milliseconds between > rounds > #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers > scanned/round > #bgwriter_lru_maxpages = 5 # 0-1000 buffers max written/round > #bgwriter_all_percent = 0.333 # 0-100% of all buffers > scanned/round > #bgwriter_all_maxpages = 5 # 0-1000 buffers max written/round > > > #--- > # WRITE AHEAD LOG > #--- > > # - Settings - > > #fsync = on # turns forced synchronization on or > off > #wal_sync_method = fsync# the default is the first option > # supported by the operating system: > # open_datasync > # fdatasync > # fsync > # fsync_writethrough > # open_sync > #full_page_writes = on # recover from partial page writes > #wal_buffers = 8# min 4, 8KB each > #commit_delay = 0 # range 0-10, in microseconds > #commit_siblings = 5# range 1-1000 > > # - Checkpoints - > > checkpoint_segments = 12# in logfile segments, min 1, 16MB > each > #checkpoint_timeout = 300 # range 30-3600, in seconds > #checkpoint_warning = 30# in seconds, 0 is off > > # - Archiving - > > #archive_command = '' # command to use to archive a > logfile > # segment > > > #--- > # QUERY TUNING > #--- > > # - Planner Method Configuration - > > enable_bitmapscan = off > enable_hashagg = on > enable_hashjoin = on > enable_indexscan = on > enable_mergejoin = on > enable_nestloop = on > enable_seqscan = off > enable_sort = on > enable_tidscan = on > > # - Planner Cost Constants - > > effective_cache_size = 1# typically 8KB each > random_page_cost = 4# units are one sequential page > fetch > # cost > #cpu_tuple_cost = 0.01 # (same) > #cpu_index_tuple_cost = 0.001 # (same) > #cpu_operator_cost = 0.0025 # (same) > #--- > # LOCK MANAGEMENT > #-----
Re: [PERFORM] Is it possible to know where is the "deadlock"
In response to Arnau <[EMAIL PROTECTED]>: > Hi all, > >I have a serious problem with a server. This server holds severals > DB, the problem is thet the CPU's spend most of the time waiting: > > Cpu0: 4.0% us, 2.3% sy, 0.0% ni, 61.5% id, 32.1% wa, 0.0% hi, 0.0% si > Cpu1: 2.3% us, 0.3% sy, 0.0% ni, 84.1% id, 13.3% wa, 0.0% hi, 0.0% si > Cpu2: 1.3% us, 0.3% sy, 0.0% ni, 68.6% id, 29.8% wa, 0.0% hi, 0.0% si > Cpu3: 4.6% us, 3.3% sy, 0.0% ni, 2.6% id, 88.4% wa, 0.3% hi, 0.7% si > > The iostat -c says about 8% of time waiting for IO. I'm afraid this > is due to locks between concurrent queries, is there anyway to have more > info about? This looks perfectly normal for a medium-load server. Although you don't state your problem (you state what you think is a symptom, and call it the problem) I'm guessing you have queries that are executing slower than you would like? If that's the case, I would suggest investigating the slow queries directly. Check for indexes and ensure your vacuum/analyze schedule is acceptable. If you get stumped, post details of the queries here asking for help. Another thing that (I'm guessing) may be confusing you is if this system has multiple CPUs, each query can only execute on a single CPU. So a single query at full throttle on a 8-way system will only use 12.5% max. If you have reason to believe that locks are an issue, the pg_locks view can help you prove/disprove that theory: http://www.postgresql.org/docs/8.2/interactive/view-pg-locks.html If none of those are the case, then please describe the actual problem you are having. HTH. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
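To test the lock theory concretely, something along these lines shows any backend waiting on a lock it has not been granted, along with what it is running (8.1/8.2 column names; current_query is only populated when stats_command_string is on):

    SELECT l.pid, l.locktype, l.mode, l.granted,
           a.usename, a.current_query
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.procpid = l.pid
     WHERE NOT l.granted;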
Re: [PERFORM] disk filling up
In response to "Brandon Shalton" <[EMAIL PROTECTED]>: > Hello all, > > My hard disk is filling up in the /base directory to where it has consumed > all 200gig of that drive. > > All the posts that i see keep saying move to a bigger drive, but at some > point a bigger drive would just get consumed. > > How can i keep the disk from filling up other than get like a half TB setup > just to hold the ./base/* folder Are you vacuuming regularly? What is the output of vacuum verbose. If table bloat (fixed by correctly vacuuming) is not your problem, then you either need to implement a data expiration policy to get rid of old data, or increase the amount of storage to accommodate the data you want to keep. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
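Before buying disk, it's worth seeing which relations are actually eating the space under base/; on 8.1 and later something like this lists the biggest ones:

    SELECT relname, relkind,
           pg_size_pretty(pg_relation_size(oid)) AS size
      FROM pg_class
     WHERE relkind IN ('r', 'i', 't')   -- tables, indexes, toast
     ORDER BY pg_relation_size(oid) DESC
     LIMIT 20;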
Re: [PERFORM] Simple select hangs while CPU close to 100%
In response to "Jozsef Szalay" <[EMAIL PROTECTED]>: > Our application is such that any update to the database is done by a > single session in a batch process using bulk load. The frequency of > these usually larger scale updates is variable but an update runs every > 2-3 days on average. > > Originally a plain VACUUM ANALYZE was executed on every affected table > after every load. Any other insert/update activity outside of the bulk loads? What's the vacuum policy outside the bulk loads? You say originally, does it still do so? I agree with Pavel that the output of vacuum verbose when the problem is occurring would be helpful. > VACUUM FULL ANALYZE is scheduled to run on a weekly basis. If you need to do this, then other settings are incorrect. > I do understand the need for vacuuming. Nevertheless I expect Postgres > to return data eventually even if I do not vacuum. In my case, the > simple SELECT COUNT(*) FROM table; statement on a table that had around > 100K "live" rows has not returned the result for more than 6 hours after > which I manually killed it. It should, 6 hours is too long for that process, unless you're running a 486dx2. You didn't mention your hardware or your postgresql.conf settings. What other activity is occurring during this long count()? Can you give us a shot of the iostat output and/or top during this phenomenon? > > Jozsef > > > -Original Message- > From: Bill Moran [mailto:[EMAIL PROTECTED] > Sent: Wednesday, July 25, 2007 1:12 PM > To: Jozsef Szalay > Cc: Pavel Stehule; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Simple select hangs while CPU close to 100% > > In response to "Jozsef Szalay" <[EMAIL PROTECTED]>: > > > Hi Pavel, > > > > > > Yes I did vacuum. In fact the only way to "fix" this problem is > > executing a "full" vacuum. The plain vacuum did not help. > > I read over my previous reply and picked up on something else ... > > What is your vacuum _policy_? i.e. how often do you vacuum/analyze? > The fact that you had to do a vacuum full to get things back under > control tends to suggest that your current vacuum schedule is not > aggressive enough. > > An explicit vacuum of this table after the large delete/insert may > be helpful. > > > -Original Message- > > From: Pavel Stehule [mailto:[EMAIL PROTECTED] > > Sent: Sunday, July 22, 2007 10:53 AM > > To: Jozsef Szalay > > Cc: pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] Simple select hangs while CPU close to 100% > > > > Hello > > > > did you vacuum? > > > > It's good technique do vacuum table after remove bigger number of > rows. > > > > Regards > > Pavel Stehule > > > > 2007/7/22, Jozsef Szalay <[EMAIL PROTECTED]>: > > > > > > > > > > > > > > > I'm having this very disturbing problem. I got a table with about > > 100,000 > > > rows in it. Our software deletes the majority of these rows and then > > bulk > > > loads another 100,000 rows into the same table. All this is > happening > > within > > > a single transaction. I then perform a simple "select count(*) from > > ..." > > > statement that never returns. In the mean time, the backend Postgres > > process > > > is taking close to 100% of the CPU. The hang-up does not always > happen > > on > > > the same statement but eventually it happens 2 out of 3 times. If I > > dump and > > > then restore the schema where this table resides the problem is gone > > until > > > the next time we run through the whole process of deleting, loading > > and > > > querying the table. 
> > > > > > > > > > > > There is no other activity in the database. All requested locks are > > granted. > > > > > > > > > > > > Has anyone seen similar behavior? > > > > > > > > > > > > Some details: > > > > > > > > > > > > Postgres v 8.1.2 > > > > > > Linux Fedora 3 > > > > > > > > > > > > shared_buffers = 65536 > > > > > > temp_buffers = 32768 > > > > > > work_mem = 131072 > > > > > > maintenance_work_mem = 131072 > > > > > > max_stack_depth = 8192 > > > > > > max_fsm_pages = 4 > > > > > > wal_buffers = 16 > > > > > > checkpoint_segments = 16 > > > > >
Re: [PERFORM] Simple select hangs while CPU close to 100%
In response to "Jozsef Szalay" <[EMAIL PROTECTED]>: > Hi Pavel, > > Yes I did vacuum. In fact the only way to "fix" this problem is > executing a "full" vacuum. The plain vacuum did not help. Based on the information, I would expect that this problem is the result of improper PG tuning, or inadequate server sizing (RAM, etc). A table changing 100,000 rows shouldn't cause enough bloat to hurt count(*)'s performance significantly, unless something else is wrong. Some quick piddling around shows that tables with over a million rows can count(*) the whole table in about 1/2 second on a system 2G of RAM. A table with 13 mil takes a min and a half. We haven't specifically tuned this server for count(*) performance, as it's not a priority for this database, so I expect the performance drop for the 13 mil database is a result of exhausting shared_buffers and hitting the disks. > -Original Message- > From: Pavel Stehule [mailto:[EMAIL PROTECTED] > Sent: Sunday, July 22, 2007 10:53 AM > To: Jozsef Szalay > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Simple select hangs while CPU close to 100% > > Hello > > did you vacuum? > > It's good technique do vacuum table after remove bigger number of rows. > > Regards > Pavel Stehule > > 2007/7/22, Jozsef Szalay <[EMAIL PROTECTED]>: > > > > > > > > > > I'm having this very disturbing problem. I got a table with about > 100,000 > > rows in it. Our software deletes the majority of these rows and then > bulk > > loads another 100,000 rows into the same table. All this is happening > within > > a single transaction. I then perform a simple "select count(*) from > ..." > > statement that never returns. In the mean time, the backend Postgres > process > > is taking close to 100% of the CPU. The hang-up does not always happen > on > > the same statement but eventually it happens 2 out of 3 times. If I > dump and > > then restore the schema where this table resides the problem is gone > until > > the next time we run through the whole process of deleting, loading > and > > querying the table. > > > > > > > > There is no other activity in the database. All requested locks are > granted. > > > > > > > > Has anyone seen similar behavior? > > > > > > > > Some details: > > > > > > > > Postgres v 8.1.2 > > > > Linux Fedora 3 > > > > > > > > shared_buffers = 65536 > > > > temp_buffers = 32768 > > > > work_mem = 131072 > > > > maintenance_work_mem = 131072 > > > > max_stack_depth = 8192 > > > > max_fsm_pages = 4 > > > > wal_buffers = 16 > > > > checkpoint_segments = 16 > > > > > > > > > > > > top reports > > > > > > > > PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND > > > > 19478 postgres 25 0 740m 721m 536m R 99.7 4.4 609:41.16 > postmaster > > > > > > > > ps -ef | grep postgres reports > > > > > > > > postgres 19478 8061 99 00:11 ?10:13:03 postgres: user dbase > [local] > > SELECT > > > > > > > > strace -p 19478 > > > > no system calls reported > > > > > > > > > > > > Thanks for the help! > > > > Jozsef > > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > >http://archives.postgresql.org > > > > > > -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. 
If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Simple select hangs while CPU close to 100%
In response to "Jozsef Szalay" <[EMAIL PROTECTED]>: > Hi Pavel, > > > Yes I did vacuum. In fact the only way to "fix" this problem is > executing a "full" vacuum. The plain vacuum did not help. I read over my previous reply and picked up on something else ... What is your vacuum _policy_? i.e. how often do you vacuum/analyze? The fact that you had to do a vacuum full to get things back under control tends to suggest that your current vacuum schedule is not aggressive enough. An explicit vacuum of this table after the large delete/insert may be helpful. > -Original Message- > From: Pavel Stehule [mailto:[EMAIL PROTECTED] > Sent: Sunday, July 22, 2007 10:53 AM > To: Jozsef Szalay > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Simple select hangs while CPU close to 100% > > Hello > > did you vacuum? > > It's good technique do vacuum table after remove bigger number of rows. > > Regards > Pavel Stehule > > 2007/7/22, Jozsef Szalay <[EMAIL PROTECTED]>: > > > > > > > > > > I'm having this very disturbing problem. I got a table with about > 100,000 > > rows in it. Our software deletes the majority of these rows and then > bulk > > loads another 100,000 rows into the same table. All this is happening > within > > a single transaction. I then perform a simple "select count(*) from > ..." > > statement that never returns. In the mean time, the backend Postgres > process > > is taking close to 100% of the CPU. The hang-up does not always happen > on > > the same statement but eventually it happens 2 out of 3 times. If I > dump and > > then restore the schema where this table resides the problem is gone > until > > the next time we run through the whole process of deleting, loading > and > > querying the table. > > > > > > > > There is no other activity in the database. All requested locks are > granted. > > > > > > > > Has anyone seen similar behavior? > > > > > > > > Some details: > > > > > > > > Postgres v 8.1.2 > > > > Linux Fedora 3 > > > > > > > > shared_buffers = 65536 > > > > temp_buffers = 32768 > > > > work_mem = 131072 > > > > maintenance_work_mem = 131072 > > > > max_stack_depth = 8192 > > > > max_fsm_pages = 4 > > > > wal_buffers = 16 > > > > checkpoint_segments = 16 > > > > > > > > > > > > top reports > > > > > > > > PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND > > > > 19478 postgres 25 0 740m 721m 536m R 99.7 4.4 609:41.16 > postmaster > > > > > > > > ps -ef | grep postgres reports > > > > > > > > postgres 19478 8061 99 00:11 ?10:13:03 postgres: user dbase > [local] > > SELECT > > > > > > > > strace -p 19478 > > > > no system calls reported > > > > > > > > > > > > Thanks for the help! > > > > Jozsef > > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > >http://archives.postgresql.org > > > > > > -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. 
E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] When/if to Reindex
In response to "Steven Flatt" <[EMAIL PROTECTED]>: > On 8/8/07, Vivek Khera <[EMAIL PROTECTED]> wrote: > > > > If all you ever did was insert into that table, then you probably > > don't need to reindex. If you did mass updates/deletes mixed with > > your inserts, then perhaps you do. > > > > Do some experiments comparing pg_class.relpages for your table and > > its indexes before and after a reindex. Decide if the number of > > pages you save on the index is worth the trouble. If it shaves off > > just a handful of pages, I'd vote no... > > > What's interesting is that an insert-only table can benefit significantly > from reindexing after the table is fully loaded. I had done experiments > exactly as you suggest (looking at pg_class.relpages), and determined that > reindexing results in about a 30% space savings for all indexes except the > PK index. The PK index (integer based on a sequence) does not benefit at > all. By setting fillfactor=100 on the index prior to reindexing, I get > another 10% space savings on all the indexes. > > Not to mention the general performance improvements when reading from the > table... > > So, we decided that reindexing partitions after they're fully loaded *was* > worth it. I've had similar experience. One thing you didn't mention that I've noticed is that VACUUM FULL often bloats indexes. I've made it SOP that after application upgrades (which usually includes lots of ALTER TABLES and other massive schema and data changes) I VACUUM FULL and REINDEX (in that order). Lots of ALTER TABLEs seem to bloat the database size considerably, beyond what normal VACUUM seems to fix. A FULL seems to fix that, but it appears to bloat the indexes, thus a REINDEX helps. I would expect that setting fillfactor to 100 will encourage indexs to bloat faster, and would only be recommended if you didn't expect the index contents to change? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
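For the fillfactor piece: from 8.2 on it can be changed on an existing index and then applied with a REINDEX (the index name is a placeholder; the b-tree default is 90):

    ALTER INDEX some_index SET (fillfactor = 100);  -- pack pages completely; only sensible for read-mostly data
    REINDEX INDEX some_index;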
Re: [PERFORM] When/if to Reindex
In response to "Decibel!" <[EMAIL PROTECTED]>: > On Wed, Aug 08, 2007 at 03:27:57PM -0400, Bill Moran wrote: > > I've had similar experience. One thing you didn't mention that I've noticed > > is that VACUUM FULL often bloats indexes. I've made it SOP that > > after application upgrades (which usually includes lots of ALTER TABLES and > > other massive schema and data changes) I VACUUM FULL and REINDEX (in that > > order). > > You'd be better off with a CLUSTER in that case. It'll be faster, and > you'll ensure that the table has optimal ordering. Point taken. > > Lots of ALTER TABLEs seem to bloat the database size considerably, beyond > > what normal VACUUM seems to fix. A FULL seems to fix that, but it appears > > to bloat the indexes, thus a REINDEX helps. > > Hrm, are you sure that's still true? I just did an ALTER TABLE ... TYPE > and it created a new file, meaning no bloating. No, I'm not. This isn't something I've analyzed or investigated in detail. During upgrades, a lot happens: ATLER TABLES, tables are dropped, new tables are created, massive amounts of data may be altered in a short period, stored procedures are replaced, etc, etc. I don't remember what led me to believe that the ALTER TABLES were causing the worst of the problem, but it's entirely possible that I was off-base. (I seem to remember being concerned about too many DROP COLUMN and ADD COLUMNs) In any event, my original statement (that it's a good idea to REINDEX after VACUUM FULL) still seems to be correct. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
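A sketch of the CLUSTER alternative mentioned above, in the pre-8.4 syntax (index name first); table and index names are placeholders:

    CLUSTER some_table_pkey ON some_table;  -- rewrites the table in index order; takes an exclusive lock
    ANALYZE some_table;                     -- refresh planner stats, since ordering/correlation changed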
Re: [PERFORM] deadlock_timeout parameter in Postgresql.cof
In response to "Sachchida Ojha" <[EMAIL PROTECTED]>: > I am having some dead locking problem with my app system. Our dev are > debugging the app to find out the cause of the problem. In the mean time > I looked at postgresql.conf file. I found that there is a parameter in > postgresql.conf file deadlock_timeout which was set 1000 (ms). Normally > I see deadlock in the night or when auto vacuum is running for a long > time. > > My question is > > What is the significance of this parameter and updating this parameter > value will make any difference ? Does the documentation leave anything unanswered? http://www.postgresql.org/docs/8.2/static/runtime-config-locks.html -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] significant vacuum issues - looking for suggestions
prod) >a) install verson 8.2.4 from source, leaving 8.1.4 in place > >b) create the new 8.2.4 cluster on a new port > >c) setup WAL archiving on the 8.1.4 cluster > >d) do a full dump of the 8.1.4 cluster and restore it to the new 8.2.4 > cluster > >e) stop the 8.2.4 cluster and bring it up in recovery mode, pointing it to > the directory where we're archiving the 8.1.4 cluster's WAL segments. > >f) once caught up, bring both clusters down > >g) copy any final files from the 8.1.4 cluster's pg_xlog directory into > the > new 8.2.4 pg_xlog dir (is this ok, since I'm moving 8.1.4 version tx logs > into an 8.2.4 xlog dir?) > >h) Change the port on the 8.2.4 cluster to what the original 8.1.4 cluster > port was > >i) bring up the new 8.2.4 system, and actively manage the vacuum needs > moving fwd via a combination of autovacuum, cron processes for specififed > table vac's (daily, hourly, 15min, 5min, etc), and as needed interactive > session vacuums > = > > > The src based install will allow me to setup a robust upgrade CM process > capable of supporting multiple concurrent versions on a server if needed, the > ability to quickly revert to a previous version, etc however this is a > discussion for another day - I only mention it in case the question "why not > just use RPM's?" arises... > > > So here's my questions: > > 1) Does this sound like a good plan? > > 2) Are there other steps I should be taking, other Issues I should be > concerned about short-term, etc? > > 3) Does anyone have any additional advice for managing either this initial > mess, or the system(s) long term? > > Thanks in advance... > > /Kevin > > ---(end of broadcast)--- > TIP 3: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faq > > > > > > -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] significant vacuum issues - looking for suggestions
In response to "Kevin Grittner" <[EMAIL PROTECTED]>: > >>> On Fri, Aug 24, 2007 at 2:57 PM, in message > <[EMAIL PROTECTED]>, Kevin Kempter > <[EMAIL PROTECTED]> wrote: > >c) setup WAL archiving on the 8.1.4 cluster > > > >d) do a full dump of the 8.1.4 cluster and restore it to the new 8.2.4 > > cluster > > > >e) stop the 8.2.4 cluster and bring it up in recovery mode, pointing it > > to > > the directory where we're archiving the 8.1.4 cluster's WAL segments. > > You can't use these techniques for a major version upgrade. > Use pg_dump piped to psql. That will also eliminate all bloat. If you can't afford any downtime, you may be able to use Slony to do your upgrade. However, slony adds overhead, and if this system is tapped out already, it may not tolerate the additional overhead. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Postgres performance problem
In response to Chris Mair <[EMAIL PROTECTED]>: > > Hi, > > > > Note: I have already vacumm full. It does not solve the problem. To jump in here in Chris' defense, regular vacuum is not at all the same as vacuum full. Periodic vacuum is _much_ preferable to an occasional vacuum full. The output of vacuum verbose would have useful information ... are you exceeding your FSM limits? Try a reindex on the database. There may be some obscure corner cases where reindex makes a notable improvement in performance. > > I have a postgres 8.1 database. In the last days I have half traffic > > than 4 weeks ago, and resources usage is twice. The resource monitor > > graphs also shows hight peaks (usually there is not peaks) Resource monitor graphs? That statement means nothing to me, therefore I don't know if the information they're providing is useful or accurate, or even _what_ it is. What, exactly, are these graphs monitoring? You might want to provide your postgresql.conf. Have you considered the possibility that the database has simply got more records and therefore access takes more IO and CPU? > > The performarce is getting poor with the time. > > > > Im not able to find the problem, seems there is not slow querys ( I have > > log_min_duration_statement = 5000 right now, tomorrow I ll decrease it ) > > > > Server is HP, and seems there is not hardware problems detected. > > > > Any ideas to debug it? > > Hi, > > first of all: let us know the exact version of PG and the OS. > > If performance is getting worse, there ususally is some bloat > envolved. Not vacuuming aggressivly enough, might be the most > common cause. Do you autovacuum or vacuum manually? > Tell us more... > > > Bye, > Chris. > > > > ---(end of broadcast)--- > TIP 3: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faq > > > > > > -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
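(To answer the FSM question above, the tail end of a database-wide VACUUM VERBOSE is the place to look; a sketch, with the exact message wording varying a bit between versions:)

    VACUUM VERBOSE;
    -- the final INFO/DETAIL lines report roughly:
    --   free space map contains N pages in M relations
    --   X page slots are required to track all free space
    -- if X exceeds max_fsm_pages (or M exceeds max_fsm_relations), plain
    -- vacuums can no longer keep up and bloat accumulates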
Re: [PERFORM] Performance issue
In response to Willo van der Merwe <[EMAIL PROTECTED]>: > Hi Guys, > > I have something odd. I have Gallery2 running on PostgreSQL 8.1, and > recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is > how do I get PostgreSQL to work with their horrible code. The queries > they generate look something like : > SELECT blah, blah FROM table1, table2 WHERE AND > id IN () > > On the previous version (which I can't recall what it was, but it was a > version 8.1) the queries executed fine, but suddenly now, these queries > are taking up-to 4 minutes to complete. I am convinced it's the > parsing/handling of the IN clause. It could, of course, be that the list > has grown so large that it can't fit into a buffer anymore. For obvious > reasons I can't run an EXPLAIN ANALYZE from a prompt. Those reasons are not obvious to me. The explain analyze output is going to be key to working this out -- unless it's something like your postgresql.conf isn't properly tuned. > I vacuum and > reindex the database daily. > > I'd prefer not to have to rewrite the code, so any suggestions would be > very welcome. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
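(If the generated SQL can be captured -- for instance via log_min_duration_statement -- it can be re-run by hand under EXPLAIN ANALYZE. A sketch, with made-up table names standing in for the Gallery2 schema:)

    EXPLAIN ANALYZE
    SELECT i.id, i.title
    FROM g2_item i, g2_entity e
    WHERE i.id = e.id
      AND i.id IN (101, 102, 103 /* ... the full generated list ... */);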
Re: [PERFORM] Transaction Log
In response to Mark Mielke <[EMAIL PROTECTED]>: > [EMAIL PROTECTED] wrote: > > For best performance, the transaction log should be on a separate disk. > > > > Does the writing of the log benefit from a battery backed controller > > as well? If not, what do people think about writing the transaction > > log to a flash card or the like? Flash cards write _very_ slowly. > How popular are the battery backed RAM drives that exist today? I don't > recall seeing them spoken about in this mailing list. The local geek > shop has these devices on sale. Are they still too expensive? I've seen them around and as best I can tell, they're pretty inexpensive. The main drawback is the storage, you'd be looking at the price of the card, plus the price of however much RAM you wanted on it. http://www.amazon.com/Gigabyte-GC-RAMDISK-i-RAM-Hard-Drive/dp/B000EPM9NC/ref=pd_bbs_sr_1/102-3968336-1618519?ie=UTF8&s=electronics&qid=1188418613&sr=8-1 http://techreport.com/articles.x/9312/1 Up to 4G, but you have to add the price of the RAM on to the price of the card. In the case of WAL logs, you could probably get away with a lot less space than many other usages, so they might be very practical. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Vacum Analyze problem
In response to [EMAIL PROTECTED]: > Hello everyone: > >I wanted to ask you about how the VACUUM ANALYZE works. is it possible > that something can happen in order to reset its effects forcing to execute > the VACUUM ANALYZE comand again? i am asking this because i am struggling > with a query which works ok after i run a VACUUM ANALYZE, however, sudennly, > it starts to take forever (the execution of the query) until i make another > VACUUM ANALYZE, and so on ... >I'd like to point that i am a novice when it comes to non basic > postgresql performance related stuff. > > Thank you all in advance To add to Mikko's comments: Periodic vacuuming and analyzing is a mandatory part of running a PostgreSQL database server. You'll probably be best served to configure the autovacuum daemon to handle this for you. See the postgresql.conf config file. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
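(On 8.1 the autovacuum daemon is driven entirely from postgresql.conf; a minimal sketch of the settings involved, with example values only:)

    stats_start_collector = on   # both stats settings must be on
    stats_row_level = on         #   for autovacuum to work on 8.1
    autovacuum = on
    autovacuum_naptime = 60      # seconds between checks of each database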
Re: [PERFORM] About autovacuum
In response to Jean-David Beyer <[EMAIL PROTECTED]>: > I have this turned on, and if I look at the log, it runs once a minute, > which is fine. > > But what does it do? I.e, it runs VACUUM, but does it also do an analyze? Yes. If you turn up the debugging level, you'll see detailed log messages about its activities. There were discussions on other lists about improving autovacuum's log messages, I'm pretty sure it will log more helpful information in 8.3. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] R: DELETE queries slow down
In response to "Galantucci Giovanni" <[EMAIL PROTECTED]>: > I perform simple INSERT and simple where-clause DELETE. > I also force a commit after every DELETE. Do you mean that you delete 1 row at a time? This is slower than batching your deletes. > My two tables are about these: > > TABLE_A > Column_1 | column2 | ... > > TABLE_B > Column_1B foreign key references TABLE_A(column_1) on delete cascade | > . > > Every row in TABLE_B is also present in TABLE_A, but the contrary is not true. > After hours in which I insert and delete only on TABLE_A (everything ok), I > start inserting also on TABLE_B, exploiting the constrain on column_1B. After > the first DELETE I perform on both tables, each following DELETE lasts for > minutes, with cpu usage on 99,9%. > I tried also to perform a VACUUM after each DELETE, but had no benefits. > Even the EXPLAIN ANALYZE of the DELETE shows no changes with respect to the > previous DELETEs: it uses an index on column_1 of TABLE_A. Are you unable to provide these details? (i.e. output of explain, the actual table schema, actual queries) Without them, the question is very vague and difficult to give advice on. If the planner comes up with the same plan whether running fast or slow, the question is what part of that plan is no longer valid (what part's actual time no longer matches it's predicted time) > My doubt is that the query planner is not enough fast to follow sudden > changes in the way I use the DB, is there a way in which I can help it to > adjust its statistics and its query planner more quickly? See: http://www.postgresql.org/docs/8.2/static/sql-analyze.html which also has links to other information on this topic. If you can demonstrate that the statistics are stale, you might benefit from manual analyze after large operations. > My other doubt is that the foreign key on TABLE_B is a problem when I try to > delete from TABLE_A, and postgres tries to find nonexistent constrained rows > on TABLE_B. It's quite possible, considering the fact that you seem to be CPU bound. > > -Messaggio originale- > Da: Gregory Stark [mailto:[EMAIL PROTECTED] > Inviato: lunedì 17 settembre 2007 12.22 > A: Heikki Linnakangas > Cc: Galantucci Giovanni; pgsql-performance@postgresql.org > Oggetto: Re: DELETE queries slow down > > "Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > > > Galantucci Giovanni wrote: > > > >> For 1 or 2 hours we update only one table, and everything goes ok, where > >> DELETE last at most 6 or 7 seconds. > >> > >> Then for a minute we do INSERT on both table, and everything continue > >> going ok, with DELETE that last about 10 seconds. > >> > >> From that moment on, DELETES become timeless, and last for 240 and more > >> seconds! > > What do the inserts and deletes actually look like? Are there subqueries or > joins or are they just inserting values and deleting simple where clauses? > > And are these in autocommit mode or are you running multiple commands in a > single transaction? > > Generally it's faster to run more commands in a single transaction but what > I'm worried about is that you may have a transaction open which you aren't > committing for a long time. This can stop vacuum from being able to clean up > dead space and if it's in the middle of a query can actually cause vacuum to > get stuck waiting for the query to finish using the page it's using. -- Bill Moran Collaborative Fusion Inc. 
http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
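(One concrete thing worth checking for the cascade-delete case above: PostgreSQL does not automatically index the referencing column of a foreign key, so every DELETE on TABLE_A has to scan TABLE_B for matching rows unless such an index exists. A sketch, using the column names from the example:)

    CREATE INDEX table_b_column_1b_idx ON table_b (column_1b);
    ANALYZE table_b;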
Re: R: [PERFORM] R: DELETE queries slow down
"Galantucci Giovanni" <[EMAIL PROTECTED]> wrote: > > No, I perform a single DELETE for about 8/10 rows at a time. > > Yesterday I tried to raise the parameter default_statistics_target on the > file postgresql.conf, setting it to 50 (previously it was set to 10) and > everything went ok. > > It seems that postgres needs some time to adapt itself to sudden changes in > the way I use the DB, maybe to adapt its planner to the new way of use. I > think that tuning this parameter could be enough to help postgres update it's > planner faster. > > Do you think it could be reasonable? Based on the information you've given and the responses you've made, I think you're as likely to roll a 1d6 and get the right solution as anything else. Good luck. > -Messaggio originale- > Da: Bill Moran [mailto:[EMAIL PROTECTED] > Inviato: martedì 18 settembre 2007 18.19 > A: Galantucci Giovanni > Cc: pgsql-performance@postgresql.org > Oggetto: Re: [PERFORM] R: DELETE queries slow down > > > > In response to "Galantucci Giovanni" <[EMAIL PROTECTED]>: > > > > > I perform simple INSERT and simple where-clause DELETE. > > > I also force a commit after every DELETE. > > > > Do you mean that you delete 1 row at a time? This is slower than > > batching your deletes. > > > > > My two tables are about these: > > > > > > TABLE_A > > > Column_1 | column2 | ... > > > > > > TABLE_B > > > Column_1B foreign key references TABLE_A(column_1) on delete cascade | > > . > > > > > > Every row in TABLE_B is also present in TABLE_A, but the contrary is not > > true. > > > After hours in which I insert and delete only on TABLE_A (everything ok), I > > start inserting also on TABLE_B, exploiting the constrain on column_1B. > > After the first DELETE I perform on both tables, each following DELETE > > lasts for minutes, with cpu usage on 99,9%. > > > I tried also to perform a VACUUM after each DELETE, but had no benefits. > > > Even the EXPLAIN ANALYZE of the DELETE shows no changes with respect to the > > previous DELETEs: it uses an index on column_1 of TABLE_A. > > > > Are you unable to provide these details? (i.e. output of explain, the > > actual table schema, actual queries) Without them, the question is > > very vague and difficult to give advice on. > > > > If the planner comes up with the same plan whether running fast or slow, > > the question is what part of that plan is no longer valid (what part's > > actual time no longer matches it's predicted time) > > > > > My doubt is that the query planner is not enough fast to follow sudden > > changes in the way I use the DB, is there a way in which I can help it to > > adjust its statistics and its query planner more quickly? > > > > See: > > http://www.postgresql.org/docs/8.2/static/sql-analyze.html > > which also has links to other information on this topic. > > > > If you can demonstrate that the statistics are stale, you might benefit > > from manual analyze after large operations. > > > > > My other doubt is that the foreign key on TABLE_B is a problem when I try > > to delete from TABLE_A, and postgres tries to find nonexistent constrained > > rows on TABLE_B. > > > > It's quite possible, considering the fact that you seem to be CPU bound. 
> > > > > > > > -Messaggio originale- > > > Da: Gregory Stark [mailto:[EMAIL PROTECTED] > > > Inviato: lunedì 17 settembre 2007 12.22 > > > A: Heikki Linnakangas > > > Cc: Galantucci Giovanni; pgsql-performance@postgresql.org > > > Oggetto: Re: DELETE queries slow down > > > > > > "Heikki Linnakangas" <[EMAIL PROTECTED]> writes: > > > > > > > Galantucci Giovanni wrote: > > > > > > > >> For 1 or 2 hours we update only one table, and everything goes ok, where > > > >> DELETE last at most 6 or 7 seconds. > > > >> > > > >> Then for a minute we do INSERT on both table, and everything continue > > > >> going ok, with DELETE that last about 10 seconds. > > > >> > > > >> From that moment on, DELETES become timeless, and last for 240 and more > > > >> seconds! > > > > > > What do the inserts and deletes actually look like? Are there subqueries or > > > jo
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
-> Hash > Join (cost=10847.38..100010780.52 rows=65 width=20) (never executed) >Hash > Cond: ("outer".encounter_id = "inner".id) >-> > Seq Scan on encounters_questions_answers eqa > (cost=1.00..17608.66 rows=464766 width=8) (never executed) >-> > Hash (cost=847.37..847.37 rows=3 width=20) (never executed) > > -> Hash Join (cost=214.73..847.37 rows=3 width=20) (never executed) > > > Hash Cond: ("outer".enrollment_id = "inner".id) > > > -> Index Scan using encounters_id on encounters ec (cost=0.00..524.72 > rows=21578 width=8) (never executed) > > > -> Hash (cost=214.73..214.73 rows=1 width=20) (never executed) > > > -> Index Scan using enrollements_pk on enrollments en (cost=0.00..214.73 > rows=1 width=20) (never executed) > > > Filter: ($0 = person_id) > -> HashAggregate > (cost=96.86..96.87 rows=1 width=36) (never executed) >-> Nested Loop > (cost=60.61..96.85 rows=1 width=36) (never executed) > -> Nested > Loop (cost=60.61..93.72 rows=1 width=32) (never executed) >-> > Nested Loop (cost=60.61..90.69 rows=1 width=36) (never executed) > > -> Nested Loop (cost=0.00..9.37 rows=1 width=36) (never executed) > > > -> Index Scan using people_pk on people p (cost=0.00..4.35 rows=1 width=8) > (never executed) > > > Index Cond: (id = $0) > > > -> Index Scan using answers_answer_un on answers a (cost=0.00..5.01 rows=1 > width=28) (never executed) > > > Index Cond: ((answer)::text = 'Yes'::text) > > -> Bitmap Heap Scan on questions_answers qa (cost=60.61..81.23 rows=7 > width=16) (never executed) > > > Recheck Cond: ((qa.answer_id = "outer".id) AND (((qa.question_tag)::text = > 'consentTransfer'::text) OR ((qa.question_tag)::text = 'shareWithEval': > :text))) > > > -> BitmapAnd (cost=60.61..60.61 rows=7 width=0) (never executed) > > > -> Bitmap Index Scan on qs_as_answer_id (cost=0.00..5.27 rows=649 width=0) > (never executed) > > > Index Cond: (qa.answer_id = "outer".id) > > > -> BitmapOr (cost=55.08..55.08 rows=6596 width=0) (never executed) > > > -> Bitmap Index Scan on qs_as_qtag (cost=0.00..27.54 rows=3298 width=0) > (never executed) > > > Index Cond: ((question_tag)::text = 'consentTransfer'::text) > > > -> Bitmap Index Scan on qs_as_qtag (cost=0.00..27.54 rows=3298 width=0) > (never executed) > >
Re: [PERFORM] Low CPU Usage
In response to [EMAIL PROTECTED]: > > That's not what it looks like based on the EXPLAIN ANALYZE output. > > It looks like run time dropped from two seconds to half a second. > > > It seems as though you either have a network delay delivering the results, > > or your application is slow to read them. > > > Exactly how are you arriving at those timings you're reporting to us? > > I have noticed this in a daly process I run which involves normally 45 > minutes and with the new server takes 1:40. > > Some days ago I began to do some tests with no success, then I opened PgAdmin > with this simply query to read 2 big tables and then compare disk access. > SELECT * > FROM fact_ven_renta fvr, dim_producto_std_producto dpp > WHERE > fvr.producto_std_producto_sk = dpp.producto_sk > > fact_ven_renta has 136316 rows > dim_producto_std_producto has 3669 rows Run the tests from psql on the same server. As Kevin pointed out, the _server_ is faster, but it appears as if the connection between PGadmin and this new server is slower somehow. Are you sure of your speed/duplex settings on the network side? That's the most common cause of this kind of thing in my experience. Try doing a raw FTP transfer between the client and server and see if you get the speed you should. > > > > I have made all possible combinations pgadmin (running in the same server > each query, in the old one, in the new one), without difference and I only > retrieve the first 100 records (I didn't count the network time in any case). > But the weird thing is running the query in the new server the are many disk > access and cpu usage. And with other applications in the same server are a > lot of disks access. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Low CPU Usage
In response to [EMAIL PROTECTED]: > >> > That's not what it looks like based on the EXPLAIN ANALYZE output. > >> > It looks like run time dropped from two seconds to half a second. > >> > >> > It seems as though you either have a network delay delivering the > >> > results, > >> > or your application is slow to read them. > >> > >> > Exactly how are you arriving at those timings you're reporting to us? > >> > >> I have noticed this in a daly process I run which involves normally 45 > >> minutes and with the new server takes 1:40. > >> > >> Some days ago I began to do some tests with no success, then I opened > >> PgAdmin with this simply query to read 2 big tables and then compare disk > >> access. > >> SELECT * > >> FROM fact_ven_renta fvr, dim_producto_std_producto dpp > >> WHERE > >> fvr.producto_std_producto_sk = dpp.producto_sk > >> > >> fact_ven_renta has 136316 rows > >> dim_producto_std_producto has 3669 rows > > >Run the tests from psql on the same server. As Kevin pointed out, the > >_server_ is faster, but it appears as if the connection between PGadmin and > >this new server is slower somehow. > > It runs quickly!!! But I don't know how to compare because looks like it > retrieve fields by demand, when I put ctrl+end (go to the last record) it use > a lot of CPU and disk, run quickly anyway. That's pretty odd. If you use \timing in psql, you can get execution time for each query, if it helps you track things down. > Correct me if am I wrong but, executing PgAdmin in the same server there > aren't networks delays! Not network, no. But the results of your explains seem to show that the query is executing much faster on the new system than the old, so the problem still becomes, "what is happening after the query completes that is so slow?" It's just that networking is ruled out. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
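(For the record, the \timing approach looks like this from psql on the server itself; the table name is taken from the earlier test query:)

    \timing
    SELECT count(*) FROM fact_ven_renta;
    -- psql then prints something like:  Time: 123.456 ms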
Re: [PERFORM] select count(*) performance (vacuum did not help)
In response to "Gábor Farkas" <[EMAIL PROTECTED]>: > Heikki Linnakangas wrote: > > Gábor Farkas wrote: > >> > >> if i do a full vacuum to that table only, will the database still serve > >> data from the other tables at a normal speed? > > > > Yes. The extra I/O load vacuum full generates while it's running might > > disrupt other activity, though. > > > > i see. > > will i achieve the same thing by simply dropping that table and > re-creating it? Yes. Once you've done so, keep up the vacuum schedule you've already established. You may want to (as has already been suggested) explicitly vacuum this table after large delete operations as well. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] quickly getting the top N rows
In response to Ben <[EMAIL PROTECTED]>: > If I have this: > > create table foo (bar int primary key); > > ...then in my ideal world, Postgres would be able to use that index on bar > to help me with this: > > select bar from foo order by bar desc limit 20; > > But in my experience, PG8.2 is doing a full table scan on foo, then > sorting it, then doing the limit. I have a more complex primary key, but I > was hoping the same concept would still apply. Am I doing something wrong, > or just expecting something that doesn't exist? Show us the explain. However, 2 guesses: 1) You never analyzed the table, thus PG has awful statistics and doesn't know how to pick a good plan. 2) You have so few rows in the table that a seq scan is actually faster than an index scan, which is why PG uses it instead. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
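(A quick way to test both guesses, using the table from the example:)

    ANALYZE foo;                      -- rules out stale or missing statistics
    EXPLAIN ANALYZE
    SELECT bar FROM foo ORDER BY bar DESC LIMIT 20;
    -- with enough rows and fresh stats, the hoped-for plan is roughly:
    --   Limit -> Index Scan Backward using foo_pkey on foo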
Re: [PERFORM] Postgres running Very slowly
In response to "Radhika S" <[EMAIL PROTECTED]>: > Hi -, > I have a very peculiar situation. > > I am running a postgres 7.4.6 database. It is running slow... . 7.4.6 is very old. You're lucky it hasn't corrupted your data. At least upgrade to the latest 7.4.18 (yes, that's 12 patches ahead of you). Optimally, upgrade to 8.2.5, which has a huge number of performance improvements. > I vacuum --analyze daily. I just did again. > I did a vacuum full last night. > > But to no avail. CPU usage and memory are normal, but the system is > crawling. You need to specifically define "crawling" before anyone will be able to provide any useful advice. What queries are running slow? What does the explain output look like? The answers are in the details, so we can't provide the answers unless you provide the details. Like the OS you're running it on, for example. > Here is the info from vacuum. > > CPU 0.02s/0.01u sec elapsed 0.02 sec. > INFO: free space map: 167 relations, 1412 pages stored; 3440 total pages > needed > DETAIL: Allocated FSM size: 1000 relations + 2 pages = 178 kB shared > memory. > VACUUM This doesn't look problematic, so I doubt your vacuum policy is to blame. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 12 hour table vacuums
In response to Ron St-Pierre <[EMAIL PROTECTED]>: > We vacuum only a few of our tables nightly, this one is the last one > because it takes longer to run. I'll probably re-index it soon, but I > would appreciate any advice on how to speed up the vacuum process (and > the db in general). I doubt anyone can provide meaningful advice without the output of vacuum verbose. > > Okay, here's our system: >postgres 8.1.4 >Linux version 2.4.21 >Red Hat Linux 3.2.3 >8 GB ram >Intel(R) Xeon(TM) CPU 3.20GHz >Raid 5 >autovacuum=off >serves as the application server and database server >server is co-located in another city, hardware upgrade is not > currently an option > > Here's the table information: > The table has 140,000 rows, 130 columns (mostly NUMERIC), 60 indexes. It > is probably our 'key' table in the database and gets called by almost > every query (usually joined to others). The table gets updated only > about 10 times a day. We were running autovacuum but it interfered with > the updates to we shut it off. We vacuum this table nightly, and it > currently takes about 12 hours to vacuum it. Not much else is running > during this period, nothing that should affect the table. > > Here are the current non-default postgresql.conf settings: > max_connections = 100 > shared_buffers = 5 > work_mem = 9192 > maintenance_work_mem = 786432 > max_fsm_pages = 7 > vacuum_cost_delay = 200 > vacuum_cost_limit = 100 > bgwriter_delay = 1 > fsync = on > checkpoint_segments = 64 > checkpoint_timeout = 1800 > effective_cache_size = 27 > random_page_cost = 2 > log_destination = 'stderr' > redirect_stderr = on > client_min_messages = warning > log_min_messages = warning > stats_start_collector = off > stats_command_string = on > stats_block_level = on > stats_row_level = on > autovacuum = off > autovacuum_vacuum_threshold = 2000 > deadlock_timeout = 1 > max_locks_per_transaction = 640 > add_missing_from = on > > As I mentioned, any insights into changing the configuration to optimize > performance are most welcome. > > Thanks > > Ron > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings > > > > > > -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 12 hour table vacuums
In response to Ron St-Pierre <[EMAIL PROTECTED]>: > Bill Moran wrote: > > In response to Ron St-Pierre <[EMAIL PROTECTED]>: > > > > > >> We vacuum only a few of our tables nightly, this one is the last one > >> because it takes longer to run. I'll probably re-index it soon, but I > >> would appreciate any advice on how to speed up the vacuum process (and > >> the db in general). > >> > > > > I doubt anyone can provide meaningful advice without the output of > > vacuum verbose. Understood, however I may have spoken too soon. It appears that Tom found an obvious issue with your config that seems likely to be the problem. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Large (8M) cache vs. dual-core CPUs
I've been given the task of making some hardware recommendations for the next round of server purchases. The machines to be purchased will be running FreeBSD & PostgreSQL. Where I'm stuck is in deciding whether we want to go with dual-core pentiums with 2M cache, or with HT pentiums with 8M cache. Both of these are expensive bits of hardware, and I'm trying to gather as much evidence as possible before making a recommendation. The FreeBSD community seems pretty divided over which is likely to be better, and I have been unable to discover a method for estimating how much of the 2M cache on our existing systems is being used. Does anyone in the PostgreSQL community have any experience with large caches or dual-core pentiums that could make any recommendations? Our current Dell 2850 systems are CPU bound - i.e. they have enough RAM, and fast enough disks that the CPUs seem to be the limiting factor. As a result, this decision on what kind of CPUs to get in the next round of servers is pretty important. Any advice is much appreciated. -- Bill Moran Collaborative Fusion Inc. IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Why so slow?
"Bealach-na Bo" <[EMAIL PROTECTED]> wrote: > >If you don't need access to the old data constantly: > > > > - copy the live data to a new table > > - TRUNCATE the old table (which needs an exclusive lock but is very fast) > > - insert the data back in > > - for an event log I would imagine this could work > > Obtaining exclusive locks on this table is very difficult, or rather, > will make life very difficult for others, so I'm averse to running > vacuum full or truncate (though I don't know how fast truncate is) > on a regular basis. I might just get away with running it > once a month, but no more. > > (Lazy) vacuum, however is a much more palatable option. But (lazy) > vacuum does not always reclaim space. Will this affect performance and > does this mean that a full vacuum is unavoidable? Or can I get away > with daily (lazy) vacuums? Disk space is not an issue for me, but > performance is a BIG issue. Of course, I realize that I could improve > the latter with better schema design - I'm working on a new schema, > but can't kill this one yet :|. My understanding is basically that if you vacuum with the correct frequency, you'll never need to vacuum full. This is why the autovacuum system is so nice, it adjusts the frequency of vacuum according to how much use the DB is getting. The problem is that if you get behind, plain vacuum is unable to get things caught up again, and a vacuum full is required to recover disk space. At this point, it seems like you need to do 2 things: 1) Schedule lazy vacuum to run, or configure autovacuum. 2) Schedule some downtime to run "vacuum full" to recover some disk space. #2 only needs done once to get you back on track, assuming that #1 is done properly. A little bit of wasted space in the database is OK, and lazy vacuum done on a reasonable schedule will keep the level of wasted space to an acceptable level. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Why so slow?
"Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > On Sun, Apr 30, 2006 at 10:03:46AM -0400, Bill Moran wrote: > > At this point, it seems like you need to do 2 things: > > 1) Schedule lazy vacuum to run, or configure autovacuum. > > 2) Schedule some downtime to run "vacuum full" to recover some disk space. > > > > #2 only needs done once to get you back on track, assuming that #1 is > > done properly. > > You'll also want to reindex since vacuum full won't clean the indexes > up. You might also want to read > http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10087 and > http://www.pervasivepostgres.com/instantkb13/article.aspx?id=10116. Reindexing is in a different class than vacuuming. Neglecting to vacuum creates a problem that gets worse and worse as time goes on. Neglecting to reindex does not create an infinately growing problem, since empty index pages are recycled automatically. It's also conceivable that some usage patterns don't need to reindex at all. http://www.postgresql.org/docs/8.1/interactive/routine-reindex.html -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] 64-bit vs 32-bit performance ... backwards?
Anthony Presley <[EMAIL PROTECTED]> wrote: > Hi all! > > I had an interesting discussion today w/ an Enterprise DB developer and > sales person, and was told, twice, that the 64-bit linux version of > Enterprise DB (which is based on the 64-bit version of PostgreSQL 8.1) > is SIGNIFICANTLY SLOWER than the 32-bit version. Since the guys of EDB > are PostgreSQL . has anyone seen that the 64-bit is slower than the > 32-bit version? > > I was told that the added 32-bits puts a "strain" and extra "overhead" > on the processor / etc which actually slows down the pointers and > necessary back-end "stuff" on the database. > > I'm curious if anyone can back this up or debunk it. It's about > the polar opposite of everything I've heard from every other database > vendor for the past several years, and would be quite an eye-opener for > me. We did some tests on with identical hardware in both EMT64 and ia32 mode. (Dell 2850, if you're curious) This was PostgreSQL 8.1 running on FreeBSD 6. We found 64 bit to be ~5% slower than 32 bit mode in the (very) limited tests that we did. We pulled the plug before doing any extensive testing, because it just didn't seem as if it was going to be worth it. -- Bill Moran I already know the ending it's the part that makes your face implode. I don't know what makes your face implode, but that's the way the movie ends. TMBG ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM]Is it possible to start two instances of postgresql?
In response to Dan Harris <[EMAIL PROTECTED]>: > > > [EMAIL PROTECTED] wrote: > > > >> both of the two database are live but use for two different web app. > >> my company don't want to spend more to buy a new server, so then I think of > >> to implement both under the same server and one instance.. > > Just as an anecdote, I am running 30 databases on a single instance and > it's working quite well. There may be reasons to run multiple > instances but it seems like tuning them to cooperate for memory would > pose some problems - e.g. effective_cache_size. The only reason I can see for doing this is when you need to run two different versions of PostgreSQL. Which is what I've been forced to do on one of our servers. It works, but it's a pain to admin. If you can just put all the databases in one db cluster (is that terminology still correct?) it'll be much easier. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance penalty for remote access of postgresql
In response to "Guoping Zhang" <[EMAIL PROTECTED]>: > > Thanks for pointing me the cause, but we simply cannot use the COPY FROM > solution. > > Currently, our application service is running with its own dedicated local > database, IF Feasible, we want to separate the application services out of > database server and run SEVERAL instances of applation serivice on its own > server (one per server), and make them all shall one database server. This > helps to the scalability and also reduce the device cost as only database > server would need mirror/backup/UPS etc. > > Obviously, if there is no better solution, the TCP round trip penalty will > stop us doing so as we do have performance requirement. > > I guess there shall be quite number of people out there facing the similar > problem, right? No alternative solution? I suppose I'm a little confused on two points: 1) What did you expect. 2) What is your network? On #1: networking adds overhead. Period. Always. I believe you earlier said you estimated around %20 perf hit. For small transactions, I wouldn't expect much better. TCP adds a good bit of header to each packet, plus the time in the kernel, and the RTT. 20% sounds about average to me. #2 falls into a number of different categories. For example: a) What is your topology? If you absolutely need blazing speed, you should have a dedicated gigabit switched network between the machines. b) Not all network hardware is created equal. Cheap switches seldom perform at their advertised speed. Stick with high-end stuff. NICs are the same way. On #2, you'll want to ensure that the problem is not in the hardware before you start complaining about PostgreSQL, or even TCP. If you've got a cheap, laggy switch, not amount of TCP or PostgreSQL tuning is going to overcome it. Hope some of this is helpful. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] PITR performance overhead?
In response to "George Pavlov" <[EMAIL PROTECTED]>: > I am looking for some general guidelines on what is the performance > overhead of enabling point-in-time recovery (archive_command config) on > an 8.1 database. Obviously it will depend on a multitude of factors, but > some broad-brush statements and/or anecdotal evidence will suffice. > Should one worry about its performance implications? Also, what can one > do to mitigate it? Prior to implementing PITR, I did some testing to see what kind of overhead it would add. It was negligible. I don't remember the details, but I seem to remember the performance hit was barely measurable. Note that in our usage scenarios, we have very little IO compared to CPU usage. The result is that our DB servers have plenty of disk bandwidth to spare. Since the log backup occurs as a background process, it made almost no difference in our tests. If your DB is very IO intensive, you may have different results. -- Bill Moran Collaborative Fusion Inc. IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] unsubscribe
In response to "Gourish Singbal" <[EMAIL PROTECTED]>: If you look in the mail headers: List-Unsubscribe: <mailto:[EMAIL PROTECTED]> -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] [PATCHES] Template0 age is increasing speedily.
In response to "Nimesh Satam" <[EMAIL PROTECTED]>: > Hi, > > Postgres Version used is 8.1.3 > OS: Linux > > > 'SELECT datname, age(datfrozenxid) FROM pg_database' > > postgres | 1575 > xyz | 1073743934 > template1 | 1632 > template0 | 61540256 > > This is the command which I tried and got the above output, and the number > is increasing pretty fast for template0. > > Please let me know if this a problem. Short answer: no, this is not a problem. Long answer: http://www.postgresql.org/docs/8.1/interactive/manage-ag-templatedbs.html -- Bill Moran Collaborative Fusion Inc. IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] autovacuum on a -mostly- r/o table
In response to Edoardo Ceccarelli <[EMAIL PROTECTED]>: > Hello, > > we are running a 7.3 postgres db with only a big table (avg > 500.000records) and 7 indexes for a search engine. > we have 2 of this databases and we can switch from one to another. > Last week we decided to give a try to 8.1 on one of them and everything > went fine, db is faster (about 2 or 3 times in our case) and the server > load is higher - which should mean that faster response time is achieved > by taking a better use of the server. > > We also activated the autovacuum feature to give it a try and that's > were our problems started. > I left the standard autovacuum configuration just to wait and see, pg > decided to start a vacuum on the table just midday when users were > launching search queries on the table and server load reached a very > high value so that in a couple of minutes the db was unusable > > With pg7.3 we use to vacuum the db night time, mostly because the insert > and updates in this table is made in a batch way: a single task that > puts 100.000 records in the db in 10/20minutes, so the best time to > actually vacuum the db would be after this batch. > > I have read that autovacuum cannot check to see pg load before launching > vacuum but is there any patch about it? that would sort out the problem > in a good and simple way. > Otherwise, which kind of set of parameters I should put in autovacuum > configuration? I am stuck because in our case the table gets mostly read > and if I set up things as to vacuum the table after a specific amount of > insert/updates, I cannot foresee whether this could happen during > daytime when server is under high load. > How can I configure the vacuum to run after the daily batch insert/update? It doesn't sound as if your setup is a good match for autovacuum. You might be better off going back to the cron vacuums. That's the beauty of Postgres -- it gives you the choice. If you want to continue with autovac, you may want to experiment with vacuum_cost_delay and associated parameters, which can lessen the impact of vacuuming. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
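(A sketch of the cost-delay knobs referred to above, as they appear in postgresql.conf on 8.1; the values are only starting points to experiment with:)

    vacuum_cost_delay = 10               # ms to sleep after each "cost limit" of work;
    vacuum_cost_limit = 200              #   a non-zero delay throttles vacuum's I/O
    autovacuum_vacuum_cost_delay = 20    # -1 means "fall back to vacuum_cost_delay"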
Re: [PERFORM] autovacuum on a -mostly- r/o table
In response to Edoardo Ceccarelli <[EMAIL PROTECTED]>: > Rod Taylor wrote: > > On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote: > > > >> I have read that autovacuum cannot check to see pg load before > >> launching > >> vacuum but is there any patch about it? that would sort out the > >> problem > >> in a good and simple way. > >> > > > > In some cases the solution to high load is to vacuum the tables being > > hit the heaviest -- meaning that simply checking machine load isn't > > enough to make that decision. > > > > In fact, that high load problem is exactly why autovacuum was created in > > the first place. > > > True, > but autovacuum could check load -before- and -during- it's execution and > it could adjust himself automatically to perform more or less > aggressively depending on the difference between those two values. > Maybe with a parameter like: maximum-autovacuum-load=0.2 > that would mean: "never load the machine more than 20% for the autovacuum" This is pretty non-trivial. How do you define 20% load? 20% of the CPU? Does that mean that it's OK for autovac to use 3% cpu and 100% of your IO? Ok, so we need to calculate an average of IO and CPU -- which disks? If your WAL logs are on one disk, and you've used tablespaces to spread the rest of your DB across different partitions, it can be pretty difficult to determine which IO parameters you want to take into consideration. -- Bill Moran Collaborative Fusion Inc. IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Problems with inconsistant query performance.
In response to Matthew Schumacher <[EMAIL PROTECTED]>: > > What I really need is a way to profile my proc when it runs slow so that > I can resolve which of the queries is really slow. Anyone with an idea > on how to do this? You could turn on statement logging and duration logging. This would give you a record of when things run and how long they take. A little work analyzing should show you which queries are running when your favorite query slows down. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
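(The relevant postgresql.conf settings, roughly -- values are examples:)

    log_statement = 'all'                 # log every statement as it starts
    log_min_duration_statement = 1000     # also log statements over 1000 ms,
                                          #   together with their run time
    log_line_prefix = '%t [%p] '          # timestamp + backend pid, to correlate entries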
Re: [PERFORM] Problems with inconsistant query performance.
In response to "Jim C. Nasby" <[EMAIL PROTECTED]>: > On Thu, Sep 28, 2006 at 11:28:43AM -0400, Bill Moran wrote: > > In response to Matthew Schumacher <[EMAIL PROTECTED]>: > > > > > > What I really need is a way to profile my proc when it runs slow so that > > > I can resolve which of the queries is really slow. Anyone with an idea > > > on how to do this? > > > > You could turn on statement logging and duration logging. This would > > give you a record of when things run and how long they take. A little > > work analyzing should show you which queries are running when your > > favorite query slows down. > > By default, that doesn't help you debug what's happening inside a > function, because you only get the call to the function. I don't know if > you can increase verbosity to combat that. Right, but my point was that he believes another query is interfering when the target query is slow. Turning on those logging statements will: a) Allow him to identify times when the query is slow. b) Identify other queries that are running at the same time. If he sees a pattern (i.e. My query is always slow if query X5 is running at the same time) he'll have a good lead into further research. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performace Optimization for Dummies
In response to "Carlo Stonebanks" <[EMAIL PROTECTED]>: > >> indexes. I don't know whether autovacuum will also analyze tables > >> for you automagically, but it would be a good idea to analyze the table > > > > It does. > > So, I have checked my log and I see an autovacuum running once every minute > on our various databases being hosted on the server - once every minute! > > From what I can see, autovacuum is hitting the db's in question about once > every five minutes. Does this imply an ANALYZE is being done automatically > that would meet the requirements we are talking about here? Is there any > benefit ot explicitly performing an ANALYZE? > > (Or does this go hand-in-and with turning off autovacuum...?) It's only checking to see if vacuum/analyze needs done every 5 minutes. It may or may not do any actual work at that time, based on how much the tables have changed. See: http://www.postgresql.org/docs/8.1/interactive/maintenance.html#AUTOVACUUM This is a case, during your bulk loads, where autovacuum might actually hurt you. How many records are you inserting/updating in 5 minutes? You may be exceeding autovacuum's ability to keep things clean. I can't say for sure, but I would suspect that you'd be better off not using autovacuum until after the initial data loads are done. My guess is that you'll get better performance if you disable autovac and write manual vacuum/analyze into your load scripts. Exactly how often to have your script do it is something that will require testing to figure out, but probably starting with every 100 or so, then adjust it up and down and see what works best. Explicitly performing a vacuum or analyze can be very beneficial, especially if you know what kind of changes your creating in the data. (Now that I think of it, there's no reason to disable autovac, as it will notice if you've just manually vacuumed a table and not do it again.) If you know that you're radically changing the kind of data in a table, manually running analyze is a good idea. If you know that you're creating a lot of dead tuples, manually vacuuming is a good idea. Especially during a big data load where these changes might be taking place faster than autovac notices. -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] autovacuum not working?
In response to "Medora Schauer" <[EMAIL PROTECTED]>: > I've recently moved to 8.1 and find that autovacuum doesn't seem to be > working, at least not the way I expected it to. I need the tuple count > for a table to be updated so indexes will be used when appropriate. I > was expecting the tuples count for a table to be updated after > autovacuum ran. This doesn't seem to be the case. I added 511 records > to a previously empty table and waited over an hour. Tuples for the > table (as per pgaccess) was 0. After I did a manual vacuum analyze it > went to 511. >From your attached config file: #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before # vacuum -- Bill Moran Collaborative Fusion Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
Subbiah, Stalin wrote: Has anyone done performance benchmark testing with Solaris SPARC/Intel Linux? I once read a post here, which had benchmarking test results for using different filesystems like xfs, ext3, ext2, ufs, etc. I couldn't find that link anymore and Google is failing on me, so does anyone have the link handy? If you're talking about the work I did, it's here: http://www.potentialtech.com/wmoran/ (then follow the link) Anyway, that should be easily portable to any platform that will run Postgres, but I don't know how useful it is in comparing two different platforms. See the information in the document. It was intended only to test disk access speed, and attempts to flood the HDD system with database work to do. Thanks! -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 23, 2004 12:13 PM To: Matt Clark; Subbiah, Stalin; 'Andrew Sullivan'; [EMAIL PROTECTED] Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux Matt, Stalin, As for the compute intensive side (complex joins & sorts etc), the Dell will most likely beat the Sun by some distance, although what the Sun lacks in CPU power it may make up a bit in memory bandwidth/latency. Personally, I've been unimpressed by Dell/Xeon; I think the Sun might do better than you think, comparatively. On all the Dell servers I've used so far, I've not seen performance that comes even close to the hardware specs. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] slow vacuum performance
pginfo wrote: Hi, I am running pg 7.4.1 on a Linux box. I have a middle-sized DB with many updates, and after them I try to run vacuum full analyze. It takes about 2 h. If I try to dump and reload the DB it takes 20 min. How can I improve the vacuum full analyze time? How often are you vacuuming? If you've gone a LONG time since the last vacuum, it can take quite a while, to the point where a dump/restore is faster. A recent realization that I've had some misconceptions about vacuuming led me to re-read section 8.2 of the admin guide (on vacuuming) ... I highly suggest a review of these 3 pages of the admin manual, as they contain an excellent description of why databases need to be vacuumed, which one can use to determine how often vacuuming is necessary. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] good pc but bad performance,why?
huang yaqin wrote: > hello, > > I have some questions about using postgresql 7.4.1 on redhat adv server 2.1. > I use an IBM 335 as the server; it has 4 CPUs, 1G RAM, but I got very bad performance. > I can only do about 50 inserts per second. Even worse than my PC (PIII 800, 256M > RAM). Can anyone give me some advice? Have you referenced this document?: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] index v. seqscan for certain values
Quick bit of input, since you didn't mention it. How often do you run ANALYZE? I found it interesting that a database I was doing tests on sped up by a factor of 20 after ANALYZE. If your data changes a lot, you should probably schedule ANALYZE to run with VACUUM. Jeremy Dunn wrote: I've searched the archives and can't find an answer to this seemingly simple question. Apologies if it's too common. The table in question has ~1.3M rows. It has 85 columns, 5 of which have single-column indexes. The column in question (CID) has 183 distinct values. For these values, the largest has ~38,000 rows, and the smallest has 1 row. About 30 values have < 100 rows, and about 10 values have > 20,000 rows. The database is 7.2.3 running on RedHat 7.1. (We are in the process of upgrading to PG 7.4.2.) All of the query plan options are enabled, and the cpu costs are set to the default values (cpu_tuple_cost is 0.01, cpu_index_tuple_cost is 0.001). The database is VACUUM'd every night. The problem: A simple query: select count(*) from xxx where CID=<value>, where <value> is a CID value which has relatively few rows, returns a plan using the index on that column. explain analyze select count(*) from xxx where cid=869366; Aggregate (cost=19136.33..19136.33 rows=1 width=0) (actual time=78.49..78.49 rows=1 loops=1) -> Index Scan using xxx_cid on emailrcpts (cost=0.00..19122.21 rows=5648 width=0) (actual time=63.40..78.46 rows=1 loops=1) Total runtime: 78.69 msec The same plan is true for values which have up to about 20,000 rows: explain analyze select count(*) from xxx where cid=6223341; Aggregate (cost=74384.19..74384.19 rows=1 width=0) (actual time=11614.89..11614.89 rows=1 loops=1) -> Index Scan using xxx_cid on emailrcpts (cost=0.00..74329.26 rows=21974 width=0) (actual time=35.75..11582.10 rows=20114 loops=1) Total runtime: 11615.05 msec However for the values that have > 20,000 rows, the plan changes to a sequential scan, which is proportionately much slower. explain analyze select count(*) from xxx where cid=7191032; Aggregate (cost=97357.61..97357.61 rows=1 width=0) (actual time=46427.81..46427.82 rows=1 loops=1) -> Seq Scan on xxx (cost=0.00..97230.62 rows=50792 width=0) (actual time=9104.45..46370.27 rows=37765 loops=1) Total runtime: 46428.00 msec The question: why does the planner consider a sequential scan to be better for these top 10 values? In terms of elapsed time it is more than twice as slow, proportionate to an index scan for the same number of rows. What I tried: A) alter table xxx alter column cid set statistics 500; analyze xxx; This does not affect the results. B) Dropped/rebuilt the index, with no improvement. C) Decreasing cpu_index_tuple_cost by a factor of up to 1000, with no success. D) Forcing an index scan for the larger values by using a very high value for cpu_tuple_cost (e.g. .5), but this doesn't seem like a wise thing to do. Your thoughts appreciated in advance! - Jeremy (7+ years experience in Oracle performance tuning, relatively new to PostgreSQL) -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
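One standard way to sanity-check the planner's choice here (not something from the thread, just a common diagnostic) is to temporarily forbid the sequential scan and compare actual runtimes for one of the large CID values:

  -- see what the forced index plan really costs
  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT count(*) FROM xxx WHERE cid = 7191032;
  SET enable_seqscan = on;

If the forced index scan is consistently faster on this hardware, lowering random_page_cost a little is usually a more targeted knob than inflating cpu_tuple_cost.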
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
Rajesh Kumar Mallah wrote: Hi, The problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution. If dropping/recreating the table improves things, then we can reasonably assume that the table is pretty active with updates/inserts. Correct? The problem is that such phenomena obscure our judgement used in optimising queries and the database. Lots of phenomena obscure that ... If a query runs slow we really can't tell if it's a problem with the query itself, hardware, or dead rows. I already did vacuum full on the table but it still did not have that effect on performance. In fact the last figures were after doing a vacuum full. If the data gets too fragmented, a vacuum may not be enough. Also, read up on the recommendations _against_ vacuum full (recommending only using vacuum on databases). With full, vacuum condenses the database, which may actually hurt performance. A regular vacuum just fixes things up, and may leave unused space lying around. However, this should apparently achieve a balance between usage and vacuum. See the docs, they are much better at describing this than I am. Can there be any more elegant solution to this problem? As a guess, look into CLUSTER (a Postgres SQL command). CLUSTER will basically recreate the table while ordering rows based on an index. (This might benefit you in other ways as well.) Don't forget to analyze after cluster. If the problem is caused by frequent updates/inserts, you may find that re-clustering the table on a certain schedule is worthwhile. Be warned, this suggestion is based on an educated guess; I make no guarantees that it will help your problem. Read the docs on cluster and come to your own conclusions. Regds Mallah. Richard Huxton wrote: On Thursday 15 April 2004 08:10, Rajesh Kumar Mallah wrote: The problem is that I want to know if I need a hardware upgrade at the moment. E.g. I have another table rfis which contains ~ .6 million records. SELECT count(*) from rfis where sender_uid > 0; Time: 117560.635 ms Which is approximately 4804 records per second. Is it acceptable performance on the hardware below: RAM: 2 GB DISKS: Ultra160, 10K, 18 GB Processor: 2x 2.0 GHz Xeon Hmm - doesn't seem good, does it? If you run it again, is it much faster (since the data should be cached then)? What does "vmstat 10" show while you're running the query? One thing you should have done is read the performance tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php The default values are very conservative, and you will need to change them. What kind of upgrades should be put on the server for it to become reasonably fast? If you've only got one disk, then a second disk for OS/logging. Difficult to say more without knowing numbers of users/activity etc. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
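A sketch of the CLUSTER-plus-ANALYZE step suggested above, with made-up index and table names (the syntax shown is the 7.x/8.x form):

  -- physically rewrite the table in the order of one of its indexes
  CLUSTER some_index ON some_table;
  ANALYZE some_table;   -- refresh statistics for the newly clustered table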
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
Rajesh Kumar Mallah wrote: Bill Moran wrote: Rajesh Kumar Mallah wrote: Hi, The problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution. If dropping/recreating the table improves things, then we can reasonably assume that the table is pretty active with updates/inserts. Correct? Yes, the table results from an import process and undergoes lots of inserts and updates, but that's before the vacuum full operation. The table is not accessed during vacuum. What I want to know is: is there any way to automate the dumping and reloading of a table individually? Will the below be safe and effective: The CLUSTER command I described is one way of doing this. It essentially automates the task of copying the table, dropping the old one, and recreating it. If the data gets too fragmented, a vacuum may not be enough. Also, read up on the recommendations _against_ vacuum full (recommending only using vacuum on databases). With full, vacuum condenses the database, which may actually hurt performance. A regular vacuum just fixes things up, and may leave unused space lying around. However, this should apparently achieve a balance between usage and vacuum. See the docs, they are much better at describing this than I am. I understand that simultaneous vacuum and usage mostly deteriorates performance, but this case is different. Just want to make sure we're on the same page here. I'm not talking about vacuuming simultaneously with anything. I'm simply saying that "vacuum full" isn't always the best choice. You should probably only be doing "vacuum". The reason and details for this are in the admin docs. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
Shea,Dan [CIS] wrote: Bill, if you had a lot of updates and deletions and wanted to optimize your table, can you just issue the cluster command? Will the cluster command rewrite the table without the obsolete data that a vacuum flags, or do you need to issue a vacuum first? From the reference docs: "During the cluster operation, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes. "CLUSTER preserves GRANT, inheritance, index, foreign key, and other ancillary information about the table. "Because the optimizer records statistics about the ordering of tables, it is advisable to run ANALYZE on the newly clustered table. Otherwise, the optimizer may make poor choices of query plans." The primary reason CLUSTER exists is to allow you to physically reorder a table based on a key. This should provide a performance improvement if data with the same key is accessed all at once. (i.e. if you do "SELECT * FROM table WHERE key=5" and it returns 100 rows, those 100 rows are guaranteed to be all on the same part of the disk after CLUSTER, thus a performance improvement should result.) Updates and inserts will add data in the next available space in a table with no regard for any keys, and _may_ require running all over the disk to retrieve the data in the previous example query. I doubt if CLUSTER is an end-all optimization tool. The specific reason I suggested it was because the original poster was asking for an easier way to drop/recreate a table (as prior experimentation had shown this to improve performance). I can't think of anything easier than "CLUSTER <indexname> ON <tablename>". Since CLUSTER recreates the table, it implicitly removes the dead tuples. However, it's going to be a LOT slower than vacuum, so if dead tuples are the main problem, vacuum is still the way to go. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Why will vacuum not end?
Shea,Dan [CIS] wrote: No, but data is constantly being inserted by userid scores. It is postgres running the vacuum. Dan. -Original Message- From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 12:02 AM To: Shea,Dan [CIS] Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Why will vacuum not end? This vacuum is running a marathon. Why will it not end and show me free space map INFO? We have deleted a lot of data and I would like to be confident that these deletions will be used as free space, rather than creating more table files. Does another postgres query running have a lock on that table? This may be a dumb question (but only because I don't know the answer): doesn't/shouldn't vacuum have some kind of timeout so that if a table is locked it will give up eventually (loudly complaining when it does so)? -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
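To answer the "does another query have a lock on that table" question directly, something along these lines can be run from another session while the vacuum appears stuck (the table name is a placeholder, and the columns assume a reasonably recent 7.4/8.x server):

  -- show who holds or is waiting on locks for a given table
  SELECT pid, mode, granted
  FROM pg_locks
  WHERE relation = 'some_table'::regclass;

Any row with granted = false is a session waiting on the lock; cross-referencing that process ID against pg_stat_activity (procpid in older releases) shows what that session is doing.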
Re: [PERFORM] Need to run CLUSTER to keep performance
In response to Rafael Martinez <[EMAIL PROTECTED]>: > Heikki Linnakangas wrote: > > Rafael Martinez wrote: > > >> The tables with this 'problem' are not big, so CLUSTER finishes very fast > >> and it does not have an impact on access because of locking. But we > >> wonder why this happens. > > > > 2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have > > you increased shared_buffers from the default? Which operating system > > are you using? Shared memory access is known to be slower on Windows. > > > > This is a server with 8GB of ram, we are using 25% as shared_buffers. > Linux RHELAS4 with a 2.6.9-55.0.9.ELsmp kernel / x86_64. > > > On a small table like that you could run VACUUM every few minutes > > without much impact on performance. That should keep the table size in > > check. > > > > Ok, we run VACUUM ANALYZE only one time a day, every night. But we would > expect the performance to get ok again after running vacuum, and it > doesn't. Only CLUSTER helps. If you have a large value for max_fsm_pages, but only vacuum once a day, you could end up with considerable bloat on a small table, but not enough to exceed max_fsm_pages (thus you wouldn't see any warnings/errors). I recommend either: a) autovacuum, with aggressive settings for that table b) a more aggressive schedule for that particular table, maybe a cron job that vacuums that table every 5 minutes. You could also do a combination, i.e. enable autovacuum with conservative settings and set a cron to vacuum the table every 10 minutes. Vacuuming once a day is usually only enough if you have very minimal updates. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
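A sketch of option (b), assuming the table is called "hosts" and the database "ops" (both names are just placeholders), as a crontab entry for the postgres user:

  # vacuum and analyze the hot table every 5 minutes
  */5 * * * * vacuumdb --analyze --table=hosts ops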
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Fri, 9 Nov 2007 11:11:18 -0500 (EST) Greg Smith <[EMAIL PROTECTED]> wrote: > On Fri, 9 Nov 2007, Sebastian Hennebrueder wrote: > > > If the queries are complex, this is understable. > > The queries used for this comparison are trivial. There's only one table > involved and there are no joins. It's testing very low-level aspects of > performance. Actually, what it's really showing is parallelism, and I've always expected PostgreSQL to come out on top in that arena. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] work_mem and shared_buffers
On Fri, 9 Nov 2007 12:08:57 -0600 "Campbell, Lance" <[EMAIL PROTECTED]> wrote: > How do you know when you should up the value of work_mem? Just play > with the number. Is there a query I could do that would tell me if > PostgreSql is performing SQL that could use more memory for sorting? In 8.2 and older it can be difficult to know, and I don't have a specific recommendation. 8.3 includes a parameter to log the usage of temporary files by Postgres. When a sort can't fit in the available memory, it uses a temp file, thus you could use this new feature to track when sorts don't fit in work_mem. -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
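The 8.3 parameter being referred to is presumably log_temp_files; a minimal example of how it might be set in postgresql.conf (the threshold is illustrative):

  log_temp_files = 0        # log every temporary file, with its size, when it is removed;
                            # set to e.g. 10240 (kB) to log only files larger than 10 MB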
Re: [PERFORM] Need to run CLUSTER to keep performance
In response to Heikki Linnakangas <[EMAIL PROTECTED]>: > Rafael Martinez wrote: > > DETAIL: 83623 dead row versions cannot be removed yet. > > Looks like you have a long-running transaction in the background, so > VACUUM can't remove all dead tuples. I didn't see that in the vacuum > verbose outputs you sent earlier. Are there any backends in "Idle in > transaction" state, if you run ps? > > In 8.1, CLUSTER will remove those tuples anyway, but it's actually not > correct. If the long-running transaction decides to do a select on > hosts-table later on, it will see an empty table because of that. That's > been fixed in 8.3, but it also means that CLUSTER might no longer help > you on 8.3. VACUUM FULL is safe in that sense in 8.1 as well. Considering how small the table is, you may want to just program the process holding the transaction open to do a vacuum full of that table when it's done with its work. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
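As a small illustration of that suggestion, the tail end of the long-running job might simply look like this (using the "hosts" table named in the thread; whether VACUUM FULL or plain VACUUM is appropriate depends on how bloated the table has become):

  -- at the very end of the batch job, once its transaction is closed
  -- (VACUUM cannot run inside a transaction block, hence the COMMIT first)
  COMMIT;
  VACUUM FULL ANALYZE hosts;   -- reclaims the dead rows the open transaction was pinning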
Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)
In response to Jeff Davis <[EMAIL PROTECTED]>: > On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote: > > This is probably a FAQ, but I can't find a good answer... > > > > So - are there common techniques to compensate for the lack of > > clustered/covering indexes in PostgreSQL? To be more specific - here is my > > table (simplified): > > > > topic_id int > > post_id int > > post_text varchar(1024) > > > > The most used query is: SELECT post_id, post_text FROM Posts WHERE > > topic_id=XXX. Normally I would have created a clustered index on topic_id, > > and the whole query would take ~1 disk seek. > > > > What would be the common way to handle this in PostgreSQL, provided that I > > can't afford 1 disk seek per record returned? > > > > Periodically CLUSTER the table on the topic_id index. The table will not > be perfectly clustered at all times, but it will be close enough that it > won't make much difference. > > There's still the hit of performing a CLUSTER, however. > > Another option, if you have a relatively small number of topic_ids, is > to break it into separate tables, one for each topic_id. Or materialize the data, if performance is the utmost requirement. Create a second table: materialized_topics ( topic_id int, post_ids int[], post_texts text[] ) Now add a trigger to your original table that updates materialized_topics any time the first table is altered. Thus you always have fast lookups. Of course, this may be non-optimal if that table sees a lot of updates. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
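A rough sketch of that trigger idea, written for the 8.x era (hence ARRAY(SELECT ...) rather than array_agg); the table layout follows the example above, the function and trigger names are made up, plpgsql is assumed to be installed, and DELETEs on posts would need a similar trigger keyed on OLD.topic_id:

  -- keep materialized_topics in sync with the posts table (sketch only)
  CREATE OR REPLACE FUNCTION refresh_materialized_topic() RETURNS trigger AS $$
  BEGIN
      DELETE FROM materialized_topics WHERE topic_id = NEW.topic_id;
      INSERT INTO materialized_topics (topic_id, post_ids, post_texts)
      VALUES (NEW.topic_id,
              ARRAY(SELECT post_id   FROM posts WHERE topic_id = NEW.topic_id ORDER BY post_id),
              ARRAY(SELECT post_text FROM posts WHERE topic_id = NEW.topic_id ORDER BY post_id));
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER posts_refresh_materialized
      AFTER INSERT OR UPDATE ON posts
      FOR EACH ROW EXECUTE PROCEDURE refresh_materialized_topic();

A lookup then becomes a single-row fetch: SELECT post_ids, post_texts FROM materialized_topics WHERE topic_id = XXX.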
Re: [PERFORM] autovacuum: recommended?
In response to Jean-David Beyer <[EMAIL PROTECTED]>: > Decibel! wrote: > > On Nov 18, 2007, at 1:26 PM, gabor wrote: > >> hubert depesz lubaczewski wrote: > >>> On Fri, Nov 16, 2007 at 10:40:43AM +0100, Gábor Farkas wrote: > >>>> we are moving one database from postgresql-7.4 to postgresql-8.2.4. > >>> any particular reason why not 8.2.5? > >> > >> the distribution i use only has 8.2.4 currently. > > > > Then I think you need to consider abandoning your distribution's > > packages or find a better distribution. IIRC, 8.2.5 is over 2-3 months > > old now; there's no reason a distribution shouldn't have it at this > > point. (Unless of course you haven't kept your distribution > > up-to-date... ;) > > Some people run distributions such as Red Hat Enterprise Linux 5 (their > latest); I do. postgresql that comes with that. > > Now once they pick a version of a program, they seldom change it. They do > put security and bug fixes in it by back-porting the changes into the source > code and rebuilding it. I guess for postgresql the changes were too much for > backporting, so they upgraded from postgresql-8.1.4-1.1 that came with it > originally and are now up to postgresql-8.1.9-1.el5. I am pretty sure they > will never upgrade RHEL5 to the 8.2 series because they do not do it to get > new features. > > Now you may think there are better distributions than Red Hat Enterprise > Linux 5, but enough people seem to think it good enough to pay for it and > keep Red Hat in business. I doubt they are all foolish. > > Luckily I do not seem to be troubled by the problems experienced by the O.P. > > I do know that if I try to use .rpms from other sources, I can get in a lot > of trouble with incompatible libraries. And I cannot upgrade the libraries > without damaging other programs. I think you've missed the point. The discussion is not that the distro is bad because it hasn't moved from 8.1 -> 8.2. The comment is that it's bad because it hasn't updated a major branch with the latest bug fixes. i.e. it hasn't moved from 8.1.4 to 8.1.5. If this is indeed the case, I agree that such a distro isn't worth using. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] tuning for TPC-C benchmark
'it_IT.UTF-8' # locale for system > error message > # strings > lc_monetary = 'it_IT.UTF-8' # locale for monetary > formatting > lc_numeric = 'it_IT.UTF-8' # locale for number > formatting > lc_time = 'it_IT.UTF-8' # locale for time > formatting > > # - Other Defaults - > > #explain_pretty_print = on > #dynamic_library_path = '$libdir' > > > #--- > # LOCK MANAGEMENT > #--- > > #deadlock_timeout = 1000# in milliseconds > #max_locks_per_transaction = 64 # min 10 > # note: each lock table slot uses ~220 bytes of shared memory, and > there are > # max_locks_per_transaction * (max_connections + > max_prepared_transactions) > # lock table slots. > > > #--- > # VERSION/PLATFORM COMPATIBILITY > #--- > > # - Previous Postgres Versions - > > #add_missing_from = off > #backslash_quote = safe_encoding# on, off, or safe_encoding > #default_with_oids = off > #escape_string_warning = off > #regex_flavor = advanced# advanced, extended, or basic > #sql_inheritance = on > > # - Other Platforms & Clients - > > #transform_null_equals = off > > > #--- > # CUSTOMIZED OPTIONS > #--- > > #custom_variable_classes = '' # list of custom variable > class names -- Bill Moran Collaborative Fusion Inc. [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query only slow on first run
In response to cluster <[EMAIL PROTECTED]>: > >> Probably by buying much faster disk hardware. > > Or buy more RAM, so that the data can stay cached. > > So the only problem here is lack of RAM and/or disk speed? Not automatically, but the chances that more RAM and/or faster disks will improve this situation are probably 90% or better. Other things that could cause this problem are poor schema design, and unreasonable expectations. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings