Re: [PERFORM] forced sequential scan when condition has current_user
On Jan 4, 2010, at 1:59 PM, Robert Haas wrote: > The thing is, PostgreSQL doesn't know at planning time what the value of > current_user() will be, so the plan can't depend on that; the planner > just takes its best shot. current_user() is a stable function and the manual is explicit that the result of a stable function can be used in an index scan: "A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call. In particular, it is safe to use an expression containing such a function in an index scan condition. (Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition.)" postgres=# select provolatile from pg_proc where proname = 'current_user'; provolatile => 's' (i.e. stable) So, I think the OP's question is still valid. Erik Jones, Database Administrator Engine Yard Support, Scalability, Reliability 866.518.9273 x 260 Location: US/Pacific IRC: mage2k -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
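For what it's worth, the manual's claim can be checked with a sketch along these lines (the table, column, and index names here are hypothetical, not taken from the OP's report):

```sql
-- current_user is marked stable ('s') in the catalog:
SELECT provolatile FROM pg_proc WHERE proname = 'current_user';

-- Given an index on a hypothetical accounts.owner column, a stable
-- comparison value should permit an index scan on this condition:
CREATE INDEX accounts_owner_idx ON accounts (owner);
EXPLAIN SELECT * FROM accounts WHERE owner = current_user;
```

Whether the planner actually chooses the index scan still depends on its row estimates, which is where the OP's surprise likely comes from.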
Re: [PERFORM] CPU bound at 99%
On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote: Hi, I'm running into a performance problem where a Postgres db is running at 99% CPU (4 cores) with about 500 concurrent connections doing various queries from a web application. This problem started about a week ago, and has been steadily going downhill. I have been tweaking the config a bit, mainly shared_memory, but have seen no noticeable improvements. At any given time there are about 5-6 postgres processes in startup (ps auxwww | grep postgres | grep startup | wc -l), about 2300 connections in idle (ps auxwww | grep postgres | grep idle | wc -l), and loads of "FATAL: sorry, too many clients already" being logged. The server that connects to the db is an apache server using persistent connections. MaxClients is 2048, thus the high number of connections needed. The application was written in PHP using the Pear DB class. Are you referring to PHP's persistent connections? Do not use those. Here's a thread that details the issues with why not: http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php . Basically, PHP's persistent connections are NOT a pooling solution. Use pgpool or somesuch. max_connections = 2400 That is WAY too high. Get a real pooler, such as pgpool, and drop that down to 1000 and test from there. I see you mentioned 500 concurrent connections. Are each of those connections actually doing something? My guess is that once you cut down on the number of actual connections you'll find that each connection can get its work done faster and you'll see that number drop significantly. For example, our application does anywhere from 200 - 600 transactions per second, dependent on the time of day/week, and we never need more than 150 to 200 connections (although we do have max_connections set to 500). Erik Jones DBA | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. 
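As a rough illustration of the pooling suggestion above (the parameter names are pgpool-II-style and the numbers are only a starting point to test from, not a recommendation tuned for this workload):

```ini
# pgpool.conf (assumed pgpool-II-era parameter names; check your version's docs)
num_init_children = 500   # max simultaneous client connections pgpool will accept
max_pool = 2              # cached backend connections per pool process

# postgresql.conf
max_connections = 1000    # down from 2400; pgpool keeps the real number in check
```

The idea is that Apache's 2048 workers connect to pgpool, while Postgres itself only ever sees the much smaller pooled number.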
Visit us online at http://www.myemma.com
Re: [PERFORM] connections slowing everything down?
On Apr 21, 2008, at 9:15 AM, Adrian Moisey wrote: Hi # ps -ef | grep idle | wc -l 87 [...] I have 2 web servers which connect to PGPool which connects to our postgres db. I have noticed that idle connections seem to take up CPU and RAM (according to top). Could this in any way cause things to slow down? Dependent on how much memory you have in your system, yes. You can fix the constant use of memory by idle connections by adjusting the child_life_time setting in your pgpool.conf file. The default is 5 minutes, which is a bit long. Try dropping that down to 20 or 30 seconds. We have 32GBs. If I get it to close the connections faster, will that actually help? Is there a way I can figure it out? First, sorry, I gave you the wrong config setting: I meant connection_life_time. child_life_time is the lifetime of an idle pool process on the client machine and connection_life_time is the lifetime of an idle connection (i.e. no transaction running) on the server. With the default connection_life_time of 5 minutes it's easily possible to keep a connection open indefinitely. Imagine a client gets a connection and runs a single query, then nothing happens on that connection for 4:30 minutes, at which point another single query is run. If that pattern continues, that connection will never be relinquished. While the point of a pool is to cut down on the number of connections that need to be established, you don't necessarily want to go to the extreme and never tear down connections, as that will cause a degradation in available server resources. With a smaller, but not 0, connection life time, connections will stay open and available during periods of high work rates from the client, but will be relinquished when there isn't as much to do. Without more details on what exactly is happening on your system I can't say for sure that this is your fix. Are you tracking/monitoring your server's free memory? 
If not, I'd suggest getting either Cacti or Monit in place to monitor system stats such as free memory (using vmstat), system IO (using iostat), and db transaction rates (using db queries). Then you'll be able to draw correlations between application behavior (slowness, etc.) and actual system numbers. I know that I had issues with connections being held open for long times (using the default 300s) causing our free memory to gradually decrease over the day; resetting our pools would clear it out, so there was a direct cause and effect relationship there. When I dropped the connection_life_time to 30s the problem went away. Erik Jones
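A sketch of the two settings discussed in this thread as they would appear in pgpool.conf (parameter names and defaults are from memory of pgpool-II of that era; verify against your version's documentation):

```ini
# Lifetime of an idle pool process on the client side:
child_life_time = 300        # seconds

# Lifetime of an idle backend connection (no transaction running) on the
# server side; dropping this from the 300s default to ~30s is what released
# the idle-connection memory in the case described above:
connection_life_time = 30
```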
Re: [PERFORM] connections slowing everything down?
On Apr 21, 2008, at 4:50 AM, Adrian Moisey wrote: Hi # ps -ef | grep idle | wc -l 87 # ps -ef | grep SELECT | wc -l 5 I have 2 web servers which connect to PGPool which connects to our postgres db. I have noticed that idle connections seem to take up CPU and RAM (according to top). Could this in any way cause things to slow down? Dependent on how much memory you have in your system, yes. You can fix the constant use of memory by idle connections by adjusting the child_life_time setting in your pgpool.conf file. The default is 5 minutes, which is a bit long. Try dropping that down to 20 or 30 seconds. Erik Jones
Re: [PERFORM] shared_buffers in 8.2.x
On Apr 10, 2008, at 7:39 AM, Gaetano Mendola wrote: Hi all, specifying shared_buffers = 26800 in 8.2.x, will this value be accepted as in the 8.1.x series, i.e. 26800 * 8192 bytes = 209 MB, or as 26800 bytes (no memory unit being specified)? With no unit specified it defaults to 8K pages, so 26800 still means about 209 MB, as before. Erik Jones
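To make the unit behavior concrete, these two sketch lines for an 8.2.x postgresql.conf should be equivalent (the arithmetic, 26800 × 8 kB ≈ 209 MB, is the point):

```ini
shared_buffers = 26800    # bare number = 8kB pages: 26800 * 8kB ~= 209MB
shared_buffers = 209MB    # 8.2+ also accepts an explicit memory unit
```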
Re: [PERFORM] large tables and simple "= constant" queries using indexes
On Apr 10, 2008, at 9:44 AM, John Beaver wrote: Thanks a lot, all of you - this is excellent advice. With the data clustered and statistics at a more reasonable value of 100, it now reproducibly takes even less time - 20-57 ms per query. After reading the section on "Statistics Used by the Planner" in the manual, I was a little concerned that, while the statistics sped up the queries that I tried immeasurably, the most_common_vals array was where the speedup was happening, and values which wouldn't fit in this array wouldn't be sped up. Though I couldn't offhand find an example where this occurred, the clustering approach seems intuitively like a much more complete and scalable solution, at least for a read-only table like this. As to whether the entire index/table was getting into RAM between my statistics calls, I don't think this was the case. Here's the behavior that I found:
- With statistics at 10, the query took 25 (or so) seconds no matter how many times I tried different values. The query plan was the same as for the 200 and 800 statistics below.
- Trying the same constant a second time gave an instantaneous result, I'm guessing because of query/result caching.
- Immediately on increasing the statistics to 200, the query reproducibly took less time. I tried about 10 different values.
- Immediately on increasing the statistics to 800, the query reproducibly took less than a second every time. I tried about 30 different values.
- Decreasing the statistics to 100 and running the cluster command brought it to 57 ms per query.
- The Activity Monitor (OSX) lists the relevant postgres process as taking a little less than 500 megs.
- I didn't try decreasing the statistics back to 10 before I ran the cluster command, so I can't show the search times going up because of that. But I tried killing the 500 meg process. The new process uses less than 5 megs of RAM, and still reproducibly returns a result in less than 60 ms. 
Again, this is with a statistics value of 100 and the data clustered by gene_prediction_view_gene_ref_key. And I'll consider the idea of using triggers with an ancillary table for other purposes; seems like it could be a useful solution for something. FWIW, killing the backend process responsible for the query won't necessarily clear the table's data from memory, as that will be in the shared_buffers. If you really want to flush the data from memory you need to read in data from other tables of a total size greater than your shared_buffers setting. Erik Jones
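The steps discussed in this thread look roughly like the following sketch; the table and column names are guessed from the index name mentioned above (gene_prediction_view_gene_ref_key), so adjust to the real schema:

```sql
-- Raise the per-column statistics target and re-gather stats:
ALTER TABLE gene_prediction_view
    ALTER COLUMN gene_ref SET STATISTICS 100;
ANALYZE gene_prediction_view;

-- Physically order the table by that index (pre-8.3 CLUSTER syntax):
CLUSTER gene_prediction_view_gene_ref_key ON gene_prediction_view;
```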
Re: [PERFORM] Performance with temporary table
On Apr 9, 2008, at 6:41 PM, samantha mahindrakar wrote: Hi The reason for using the temporary table is that I need this data buffered somewhere so that I can use it for later computation. And the fact that for each imputation I need to have historical data from the 10 previous weeks makes it necessary to create something that can hold the data. However, once the computation is done for each record I wouldn't need that historical data for that record. I would be moving on to the next record and finding its own historical data. Is there any way I can avoid using the temp table? What's wrong with the data in the partitions? Erik Jones
Re: [PERFORM] how can a couple of expensive queries drag my system down?
On Mar 26, 2008, at 3:31 PM, Scott Marlowe wrote: On Wed, Mar 26, 2008 at 1:48 PM, p prince <[EMAIL PROTECTED]> wrote: is this 'normal'? (loaded question I know) Should I be looking to offload expensive reporting queries to read-only replicants of my database? Yes, definitely look into setting up something like a slony slave that's used for reporting queries. The nice thing about this setup is you only need to replicate the tables you run reports against. For a simple two-node setup (i.e. no cascaded replication) I'd suggest looking into Londiste. It's loads easier to wrap your head around, and it's extremely easy to add/remove tables from replication as it doesn't deal with "table sets" like Slony does. Erik Jones
Re: [PERFORM] Optimisation help
On Mar 4, 2008, at 6:54 PM, dforums wrote: Hello, After checking the settings I saw that shared_buffers is configured at 1024 (the default); however, in my postgresql.conf I set it to 25. Is it due to shared memory settings? Should I increase shmmax? Did you do a full restart of the db cluster? Changes to shared memory settings require that. Erik Jones
Re: [PERFORM] 7 hrs for a pg_restore?
On Feb 20, 2008, at 10:54 AM, Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: On Feb 20, 2008, at 8:14 AM, Gregory Stark wrote: I would suggest leaving out the && which only obfuscates what's going on here. PGOPTIONS=... pg_restore ... would work just as well and be clearer about what's going on. Right, that's just an unnecessary habit of mine. Isn't that habit outright wrong? ISTM that with the && in there, what you're doing is equivalent to "PGOPTIONS=whatever; pg_restore ..." This syntax will set PGOPTIONS for the remainder of the shell session, causing it to also affect (say) a subsequent psql invocation. Which is exactly not what is wanted. Yes. Erik Jones ---(end of broadcast)--- TIP 6: explain analyze is your friend
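The scoping difference Tom describes can be demonstrated without pg_restore at all; in this sketch a child shell and echo stand in for the client program:

```shell
unset PGOPTIONS

# Prefix form: the variable is in the environment of that one command only.
PGOPTIONS="-c maintenance_work_mem=512MB" sh -c 'echo "child sees: $PGOPTIONS"'
echo "parent afterwards: ${PGOPTIONS:-unset}"

# '&&' form: the assignment sticks in the current shell for the rest of the
# session (and, not being exported, isn't even guaranteed to reach a child
# process like pg_restore).
PGOPTIONS="-c maintenance_work_mem=512MB" && echo "parent afterwards: ${PGOPTIONS:-unset}"
```

With the prefix form the parent shell reports the variable as unset afterwards; with the && form it lingers, which is exactly Tom's objection.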
Re: [PERFORM] 7 hrs for a pg_restore?
On Feb 20, 2008, at 8:14 AM, Gregory Stark wrote: "Douglas J Hunley" <[EMAIL PROTECTED]> writes: On Tuesday 19 February 2008 16:32:02 Erik Jones wrote: pg_restore is a postgres client app that uses libpq to connect and, thus, will pick up anything in your $PGOPTIONS env variable. So, PGOPTIONS="-c maintenance_work_mem=512MB" && pg_restore now that's just plain cool /me updates our wiki I would suggest leaving out the && which only obfuscates what's going on here. PGOPTIONS=... pg_restore ... would work just as well and be clearer about what's going on. Right, that's just an unnecessary habit of mine. Erik Jones ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] wal_sync_methods for AIX
On Feb 19, 2008, at 3:58 PM, Dan Langille wrote: Erik Jones wrote: On Feb 15, 2008, at 3:55 PM, Dan Langille wrote: We're using PostgreSQL 8.1.11 on AIX 5.3 and we've been doing some playing around with various settings. So far, we've (I say we, but it's another guy doing the work) found that open_datasync seems better than fsync. By how much, we have not yet determined, but initial observations are promising. Here's a good explanation (by Greg Smith) of the different sync methods. It basically says that if you have open_datasync available, it'll probably beat everything else. Where is that explanation? Sorry, did I leave off the link? http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm Erik Jones
Re: [PERFORM] 7 hrs for a pg_restore?
On Feb 19, 2008, at 2:55 PM, Douglas J Hunley wrote: On Tuesday 19 February 2008 15:07:30 Jeff wrote: On Feb 19, 2008, at 1:22 PM, Tom Lane wrote: maintenance_work_mem, to be more specific. If that's too small it will definitely cripple restore speed. I'm not sure fsync would make much difference, but checkpoint_segments would. See http://www.postgresql.org/docs/8.3/static/populate.html#POPULATE-PG-DUMP I wonder if it would be worthwhile if pg_restore could emit a warning if maint_work_mem is "low" (start flamewar on what "low" is). And as an addition to that - allow a cmd line arg to have pg_restore bump it before doing its work? On several occasions I was moving a largish table and the COPY part went plenty fast, but when it hit index creation it slowed down to a crawl due to low maint_work_mem. fwiw, I +1 this. Now that I have a (minor) understanding of what's going on, I'd love to do something like: pg_restore -WM $large_value pg_restore is a postgres client app that uses libpq to connect and, thus, will pick up anything in your $PGOPTIONS env variable. So, PGOPTIONS="-c maintenance_work_mem=512MB" && pg_restore Erik Jones ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Disable WAL completely
On Feb 18, 2008, at 3:32 AM, hubert depesz lubaczewski wrote: On Mon, Feb 18, 2008 at 03:00:47PM +0530, Kathirvel, Jeevanandam wrote: Is there a way to minimize the I/O operations on disk/CF? Can I create a RAM file system and point the pg_xlog files to the RAM location instead of CF; will this work? it will, but in case you lose power you will also (most probably) lose your database. Right. Without the xlog directory you'll have very little chance of ever doing any kind of clean stop/start of your database. If you don't need the reliability offered by Postgres's use of transaction logs you'll probably be much better served with a different database or even a completely different storage scheme than trying to make Postgres fit that bill. Erik Jones
Re: [PERFORM] wal_sync_methods for AIX
On Feb 15, 2008, at 3:55 PM, Dan Langille wrote: We're using PostgreSQL 8.1.11 on AIX 5.3 and we've been doing some playing around with various settings. So far, we've (I say we, but it's another guy doing the work) found that open_datasync seems better than fsync. By how much, we have not yet determined, but initial observations are promising. Here's a good explanation (by Greg Smith) of the different sync methods. It basically says that if you have open_datasync available, it'll probably beat everything else. Our tests have been on a p550 connected to a DS6800 array using pgbench. One nasty behaviour we have seen is long running commits. Initial thoughts connected them with checkpoints, but the long running commits do not correlate with checkpoints being written. Have you seen this behaviour? FYI, 8.3.0 is not an option for us in the short term. What have you been using on AIX and why? I really don't know anything about AIX, but are you sure that these long running commits are directly correlated with using open_datasync? Erik Jones ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
On Feb 15, 2008, at 12:42 PM, Joshua D. Drake wrote: On Fri, 15 Feb 2008 12:37:10 -0600 Erik Jones <[EMAIL PROTECTED]> wrote: (well, forced to) migrate to a new system with a sane drive configuration. The old setup was done horribly by a sysadmin who's no longer with us who set us up with a RAID5 array with both the data and xlogs mirrored across all of the disks with no spares. Is the admin still with us? Or is he fertilizer? I know some great gardeners from Jersey... Heh, he's definitely no longer with us, although not in the sense that he's now "pushin' up daisies"... Erik Jones ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
On Feb 15, 2008, at 12:06 PM, Josh Berkus wrote: On Friday 15 February 2008 06:29, Greg Smith wrote: PostgreSQL only uses direct I/O for writing to the WAL; everything else goes through the regular OS buffer cache unless you force it to do otherwise at the OS level (like some Solaris setups do with forcedirectio). Also, note that even when direct I/O is available, most users and benchmark tests have reported that having PostgreSQL "take over" the entire cache is not a net performance gain. I believe this is mostly because our I/O and caching code aren't designed for this kind of operation. I believe that MyEmma had a different experience on their workload, though. Actually, while we did have shared_buffers set to 4G on an 8G system when we were running with forcedirectio, the decision to even run with forcedirectio was a temporary one until we were able (well, forced) to migrate to a new system with a sane drive configuration. The old setup was done horribly by a sysadmin who's no longer with us who set us up with a RAID5 array with both the data and xlogs mirrored across all of the disks with no spares. So, I wouldn't consider the numbers I was seeing then a reliable expectation, as that system was nowhere close to ideal. We've seen much more sane and consistent numbers on a more normal setup, i.e. without forcedirectio and with shared_buffers <= 25% of system memory. Erik Jones
Re: [PERFORM] Optimizer : query rewrite and execution plan ?
On Feb 6, 2008, at 7:35 AM, Roberts, Jon wrote: Since the SQL is not your fault and difficult to control, it is an argument in favour of an optional planner mode that would perform additional checks for redundant clauses of various kinds. The default for that would be "off" since most people don't suffer from this problem. BO isn't the only SQL-generating client out there, so I think this is a fairly wide problem. I would have to disagree. I spend a lot of time writing code that generates SQL from a business app and feel strongly that any optimisation is my responsibility. The point of a BI tool like BO is to abstract the data collection and do it dynamically. The SQL is built at run time because the tool is designed to give the end user as much flexibility as the data structure allows to query the data however they want. It isn't feasible, possible, or recommended to rewrite all of the possible generated SQL that could be designed at runtime by the tool. No, but it is feasible to expect the tool to generate well-formed queries without redundant clauses. There are plenty that do. Erik Jones
Re: [PERFORM] After Vacuum Analyse - Procedure performance notimproved - Innner select is faster
On Jan 9, 2008, at 12:00 AM, Anoo Sivadasan Pillai wrote: Why is the procedure not getting the performance advantage of VACUUM ANALYZE? Plan caching by the function, probably. Try disconnecting the session and reconnecting to prove the hypothesis. If it is a recurring problem for you, you could put the SELECT under EXECUTE in the function. But most likely this is just a one-time problem. Is there any way to clear the cached plan manually other than disconnecting (with the help of some commands/configuration settings)? Only as of 8.3. Erik Jones
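For pre-8.3 servers, the EXECUTE workaround mentioned above looks roughly like this sketch (the table, column, and function names are made up for illustration; pre-8.4 PL/pgSQL has no EXECUTE ... USING, hence the string concatenation):

```sql
-- Dynamic SQL is re-planned on each execution, so it always sees
-- current statistics instead of a plan cached at first call.
CREATE OR REPLACE FUNCTION get_rows(p_id integer)
RETURNS SETOF my_table AS $$
DECLARE
    r my_table%ROWTYPE;
BEGIN
    FOR r IN EXECUTE 'SELECT * FROM my_table WHERE id = ' || p_id LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```

The trade-off is a plan (and parse) cost on every call, which is usually acceptable when a stale cached plan is the alternative.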
Re: [PERFORM] Putting files into fields in a table
On Dec 13, 2007, at 2:09 PM, Campbell, Lance wrote: I did not see much info in the 8.2 documentation on BLOBs. I did read about the "bytea" binary data type. It seems like it would work for storing files. I guess I could stick with the OS for file storage but it is a pain. It would be easier to use the DB. In postgres they're simply called Large Objects (or LOBs) and there is a whole chapter devoted to them in Part IV of the manual. Note that you only need to use this facility if you're going to be storing data over 1G in size (at which point your limit becomes 2G). What kind of data are in these files? What gain do you foresee in storing the files directly in the db (as opposed, say, to storing the files in the filesystem and their paths in the db)? Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
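A quick sketch of the two options (the table names and file path are hypothetical; lo_import as used here reads a file on the server's filesystem and requires appropriate privileges):

```sql
-- bytea: the file contents live in a regular column.
CREATE TABLE docs (id serial PRIMARY KEY, body bytea);

-- Large object: the table stores only an OID referencing the LOB,
-- which is streamed in and out via the large-object API.
CREATE TABLE docs_lo (id serial PRIMARY KEY, body_oid oid);
INSERT INTO docs_lo (body_oid) VALUES (lo_import('/path/to/file'));
```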
Re: [PERFORM] database tuning
On Dec 11, 2007, at 5:18 PM, kelvan wrote: you know what you lot have left my original question this server is a temporary piece of shit my original question is what are the overheads for postgres but obviously no one knows or no one knows where a webpage containing this information is -_- overhead information i would to know is row overheads column overheads and header overheads for blocks and anything else i have missed trust me postgres and a Mac don't like working together you have no idea the amount of problems we have incurred with php trying to talk to postgres on a Mac out biggest problem is Mac tecs are incompetent and we cannot get any support for the server I know postgres connects fine just we cannot get it working on the Mac so I love your guys ideas but they don't work that's why I have had to take another approach if we were not using a Mac we would have none of the problems we have with connection issues such as php seems to want to take up 20 db connections at a time but I think we fixed that problem overall our problem is the Mac and we cannot get it support neither I nor the web app developer are Mac savvy hell as far as we have seen no Mac tec is Mac savvy either we cannot get parts of postgres to run on a Mac either such as pgagent which is necessary for us but we cannot seem to find a daemon that works on a Mac I have a list of problems a mile long and none of them are postgres it is the Mac so minus all that as the Mac is only a temporary solution can anyone just answer the original question for me if not and I mean no offence to anyone but I really don't care as I am going to re do it all later down the track as I have said your ideas sound good just not Mac oriented nor are they to do with my original question I have never had trouble finding overhead information on any other DBMS I have used this is the first time I have had to ask for it and since this DBMS is open source I have to ask a community rather than a company if anyone is 
wondering why I don't switch now money and time are not on my side and for those who wonder why don't I leave this job is big time just starts off small time but the potential of this job is very nice and as they say if you want something good you have to work hard for it I am not a fan of taking the easy way out as it brings no benefits for those who want to know more I cannot tell you as I am under a confidentiality agreement Kelvan, proper capitalization and punctuation are virtues when trying to communicate extensively via text mediums. I, for one, read the first couple and last couple of lines of this message after gruelingly reading your last message, and I wouldn't be surprised if others with more experience and better answers at the ready simply ignored both, as that much text is extremely difficult to follow in the absence of those aforementioned virtues. Erik Jones
Re: [PERFORM] Cost-Based Vacuum Delay tuning
On Dec 7, 2007, at 10:44 AM, Guillaume Cottenceau wrote: Erik Jones writes:

vacuum_cost_delay/vacuum_cost_limit   (deactivated)  20/200  40/200  100/1000  150/1000  200/1000  300/1000
VACUUM ANALYZE time                        54 s       112 s   188 s    109 s     152 s     190 s     274 s
SELECT time                                50 s        28 s    26 s     24 s      22 s      20 s      19 s

While you do mention that the table you're running your select on is too big to fit in the shared_buffers, the drop in time between the first run and the rest most likely still reflects the fact that when These figures don't show a difference between first run and subsequent runs. For each parameter tuning, a couple of runs are fired after a database restart and, once the value is approximately constant, it's picked and put in this table. The "deactivated" column shows the (stable, from subsequent runs) figure when vacuum delay is disabled (vacuum_cost_delay parameter quoted out), not the first run, if that's where the confusion came from. It was. Is it on pgsql-hackers? I haven't found much stuff in pgsql-performance while looking for "vacuum_cost_delay tuning". It would be good to have multiple autovacuum workers that could be tuned for different times or workloads. I know Alvaro was going to work Sounds interesting. Run the initial archive search against pgsql-general over the last year for a thread called 'Autovacuum Improvements'. I'm wondering if it would not be possible to dynamically ignore (or lower, if it makes more sense?) the Cost-Based Vacuum Delay during vacuum full, if a configurable amount of queries are waiting for the lock? (please save yourself from answering "you should never run VACUUM FULL if you're vacuuming enough" - as long as VACUUM FULL is available in PostgreSQL, there's no reason to not make it as practically usable as possible, albeit with low dev priority) Ok, I won't say what you said not to say. But, I will say that I don't agree with your conjecture that VACUUM FULL should be made more lightweight; it's like using dynamite to knock a hole in a wall for a window. 
Thanks for opening a new kind of trol^Hargument against VACUUM FULL, that one's more fresh (at least to me, who doesn't follow the list too close anyway). Just for the record, I inherited a poorly (actually, "not" would be more appropriate) tuned database, containing more than 90% of dead tuples on large tables, and I witnessed quite some performance improvement while I could fix that. If you really want the VACUUM FULL effect without having to deal with vacuum_cost_delay, use CLUSTER. It also re-writes the table and, AFAIK, is not subject to any of the vacuum related configuration parameters. I'd argue that if you really need VACUUM FULL, you may as well use CLUSTER to get a good ordering of the re-written table. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
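For reference, on the 8.1/8.2-era releases discussed here, the suggested CLUSTER alternative looks something like this (table and index names are hypothetical):

```sql
-- Rewrite the bloated table in the order of one of its indexes.
-- Like VACUUM FULL, this takes an exclusive lock, but it rewrites
-- the table in one pass instead of shuffling tuples in place, and
-- it is not throttled by the vacuum_cost_* settings.
CLUSTER invoices_pkey ON invoices;

-- The rewrite invalidates the planner statistics, so re-analyze.
ANALYZE invoices;
```

(On 8.3 and later the syntax is `CLUSTER invoices USING invoices_pkey`.)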
Re: [PERFORM] doubt with pg_dump and high concurrent used databases
On Nov 25, 2007, at 10:46 AM, Pablo Alcaraz wrote: Hi all, I read that pg_dump can run while the database is being used and makes "consistent backups". I have a huge and *heavily* selected, inserted, and updated database. Currently I have a cron task that disconnects the database users, makes a backup using pg_dump, and puts the database online again. The problem is that there is now too much information, and every day the database stores more and more data, so the backup process needs more and more time to run. I am thinking about doing the backup using a process that lets me do it with minimal interruption for the users. I do not need an up-to-the-last-second backup. I could do a backup with "almost all" the data, but I need the information in it to be coherent. For example, if the backup stores information about an invoice it *must* store both the header and the item information for that invoice. I can live with the backup not storing some invoices when it is run, because they'll be backed up the next time the backup process runs. But I cannot store only part of an invoice. That is what I call a coherent backup. The best thing for me would be for the cron job to do a concurrent backup with all the information up to the time it starts running, while the clients are using the database. Example: if cron launches the backup process at 12:30 AM, the backup must be built with all the information *up until* 12:30 AM. So if I need to restore it, I get a database coherent with the same information it held at 12:30 AM. It does not matter if the process needs 4 hours to run. Does pg_dump create this kind of "consistent backup"? Or do I need to do the backups using another program? Yes, that is exactly what pg_dump does. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
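Concretely, pg_dump takes its snapshot when it starts, so a cron entry along these lines (paths and database name are hypothetical) yields a backup coherent as of launch time without disconnecting anyone:

```
# crontab fragment: nightly consistent dump at 12:30 AM.
# pg_dump runs in a single transaction with a snapshot taken at start,
# so the dump reflects the database exactly as of 12:30 AM no matter
# how long it runs; it only takes ACCESS SHARE locks, which do not
# block ordinary reads and writes.
30 0 * * *  pg_dump -Fc -f /backups/mydb.dump mydb
```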
Re: [PERFORM] Curious about dead rows.
On Nov 14, 2007, at 4:46 PM, Tom Lane wrote: Russell Smith <[EMAIL PROTECTED]> writes: Is it possible that analyze is not getting the number of dead rows right? Hah, I think you are on to something. ANALYZE is telling the truth about how many "dead" rows it saw, but its notion of "dead" is "not good according to SnapshotNow". Thus, rows inserted by a not-yet-committed transaction would be counted as dead. So if these are background auto-analyzes being done in parallel with inserting transactions that run for awhile, seeing a few not-yet-committed rows would be unsurprising. Wouldn't this result in a variable number of dead rows being reported on separate runs, including zero while no pending inserts are happening? This may be a good way to verify that this is what is happening, if he can quiet down his app long enough to run an ANALYZE in isolation. Perhaps, if the ANALYZE runs fast enough, he can just lock the table for the run. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
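A sketch of that isolation test (table name hypothetical); ANALYZE, unlike VACUUM, can run inside a transaction block, so the lock and the ANALYZE can share one transaction:

```sql
BEGIN;
-- SHARE mode blocks concurrent INSERT/UPDATE/DELETE but still allows
-- reads, so no in-flight insert can show up as a "dead" row while
-- ANALYZE scans the table.
LOCK TABLE my_table IN SHARE MODE;
ANALYZE my_table;
COMMIT;
-- With writers held off, repeated runs should now report a stable
-- (and likely near-zero) dead-row count.
```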
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 11, 2007, at 2:17 PM, Joshua D. Drake wrote: Dimitri wrote: It seems to me there is more of a thread model implementation problem on FreeBSD, and the databases are just reflecting it... Most of the tests I've done on Solaris show the same performance level on the same short READ-only queries for MySQL and PostgreSQL. And to be completely honest, a thread model should be far faster (context switching between threads is way faster than between processes), but - as I usually say - even a very good idea may be wasted by a poor implementation... And in the case of MySQL they have too much locking to manage concurrency between threads, which kills all the thread model's benefits... Also, to compare apples to apples, they should run this test from a remote client rather than locally on the same host - however, in that case the result for PostgreSQL will mostly depend on the client implementation: if the client implements reading via a CURSOR (quite common), reading will generate 4x more network traffic than necessary and the final PostgreSQL result will be worse... Reading this article I'm just happy for them to see progress made on FreeBSD :-) As well, to demonstrate OS parallelism it's not so impressive to see 4-CPU server results rather than 8-CPU or 32-thread Niagara... Don't know why they did not present similar performance graphs for those platforms, strange, no?... I don't find it strange. I would rather see benchmarks on what the majority of people running on the platform are going to run. Most people don't run 8-core machines and they especially don't run 32-thread Niagara boxes. Wait! So, what do you check your email with? :) Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Re: [PERFORM] work_mem and shared_buffers
On Nov 9, 2007, at 1:24 PM, Scott Marlowe wrote: On Nov 9, 2007 1:19 PM, Campbell, Lance <[EMAIL PROTECTED]> wrote: It is amazing how, after working with databases very actively for over 8 years, I am still learning things. The fun thing about postgresql is that just when you've got it figured out, somebody will come along and improve it in such a way as to make your previously gathered knowledge obsolete. In a good way. I imagine in a few years, hardly anyone using postgresql will remember the ancient art of having eight apostrophes in a row inside your plpgsql functions... Speaking of that devil, I started working with Postgres mere months after that particular evil went away, but we still have a good bit of plpgsql with it in production. I've been meaning to convert it and clean it up for a while now. Would you, or anybody, happen to know of any scripts out there that I could grab to make a quick, no-brains-required job of it? Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
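For anyone who missed that era, the dollar-quoting introduced in 8.0 is what made the apostrophe-doubling obsolete; a toy illustration (function names are hypothetical), and a conversion is mostly just swapping the outer quotes for $$ and un-doubling what's inside:

```sql
-- Old style: the whole function body is a single-quoted string, so
-- every quote inside it must be doubled (and doubled again per level
-- of nesting -- hence the runs of many apostrophes).
CREATE OR REPLACE FUNCTION greet_old() RETURNS text AS
'SELECT ''it''''s working''::text;'
LANGUAGE sql;

-- New style: dollar-quote the body; nested string literals can use a
-- differently tagged dollar quote and need no doubling at all.
CREATE OR REPLACE FUNCTION greet_new() RETURNS text AS $$
    SELECT $q$it's working$q$::text;
$$ LANGUAGE sql;
```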
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 9, 2007, at 6:06 AM, Ivan Voras wrote: Hi, I just read this document and thought I should share it with this list: http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf Among other things (FreeBSD advocacy, mostly :) ), it contains a direct comparison between MySQL and PostgreSQL on various platforms, with PostgreSQL winning! Which is typical for those who aren't in on the FUD :) Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Migrating to 8.3 - checkpoints and background writer
On Nov 4, 2007, at 6:33 PM, Greg Smith wrote: For those of you considering a move to the upcoming 8.3 release, now in beta, I've written some documentation on the changes made in checkpoint and background writer configuration in the new version: http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm Since the first half of that covers the current behavior in 8.1 and 8.2, those sections may be helpful if you'd like to know more about checkpoint slowdowns and ways to resolve them even if you have no plans to evaluate 8.3 yet. I'd certainly encourage anyone who can run the 8.3 beta to consider adding some tests in this area while there's still time to correct any issues encountered before the official release. Greg, thanks a lot for this. I'd say this should definitely be linked to from the main site's techdocs section. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Re: [PERFORM] Bunching "transactions"
On Oct 25, 2007, at 10:30 AM, Jean-David Beyer wrote: I have just changed around some programs that ran too slowly (too much time in io-wait) and they sped up greatly. This was not unexpected, but I wonder about the limitations. By transaction, I mean a single INSERT or a few related INSERTs. What I used to do is roughly like this:

for each file {
  for each record {
    BEGIN WORK;
    INSERT stuff in table(s);
    if error { ROLLBACK WORK } else { COMMIT WORK; }
  }
}

The speedup was the obvious one:

for each file {
  BEGIN WORK;
  for each record {
    INSERT stuff in table(s);
  }
  if error { ROLLBACK WORK } else { COMMIT WORK; }
}

This means, of course, that the things I think of as transactions have been bunched into a much smaller number of what PostgreSQL thinks of as large transactions, since there is only one per file rather than one per record. Now if a file has several thousand records, this seems to work out just great. But what is the limitation on such a thing? In this case, I am just populating the database and there are no other users at such a time. I am willing to lose the whole insert of a file if something goes wrong -- I would fix whatever went wrong and start over anyway. But at some point, disk IO would have to be done. Is this just a function of how big shared_buffers is set to in postgresql.conf? Or does it have to do with wal_buffers and checkpoint_segments? You're reading data from a file and generating inserts? Can you not use COPY? That would be the most performant. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
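To make the COPY suggestion concrete (table and file names hypothetical): instead of one INSERT per record, a single COPY loads a whole file in one statement and one transaction, skipping the per-statement parse/plan overhead entirely.

```sql
-- Server-side COPY: the file must be readable by the postgres server
-- process; the entire file loads, or fails, as one transaction.
COPY stuff_table FROM '/data/incoming/records.csv' WITH CSV;

-- From psql, the client-side equivalent streams the file over the
-- connection instead of reading it on the server:
--   \copy stuff_table from 'records.csv' with csv
```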
Re: [PERFORM] [SQL] two queryes in a single tablescan
On Oct 20, 2007, at 12:19 PM, Andreas Kretschmer wrote: Markus Schaber <[EMAIL PROTECTED]> schrieb: Is there any way to get both results in a single query, eventually through a stored procedure? The retrieved [count(*),A] ; [count(*),B)] data couldn't fit in a single table, of course. The main goal would be to get multiple results while scanning the table[s] only once, thus getting the results faster. PostgreSQL 8.3 contains great improvements in this area; you can simply start the selects from concurrent connections, and the backend will synchronize the scans. Does this work right across different transactions? I mean, for instance, TX a inserts rows and TX b inserts other rows and both clients (with different transactions) start a seq-scan? If you are in read-committed mode and both backends start their scans after the other has made its insert, then yes. Note Markus's point that both queries must be initiated by concurrent connections. Since Postgres doesn't have any kind of shared transaction mechanism across connections, this is inherent. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Re: [PERFORM] Huge amount of memory consumed during transaction
On Oct 12, 2007, at 4:48 PM, henk de wit wrote: > > I have work_mem set to 256MB. > Wow. That's inordinately high. I'd recommend dropping that to 32-43MB. Ok, it seems I was totally wrong with the work_mem setting. I'll adjust it to a saner level. Thanks a lot for the advice, everyone! > Explain is your friend in that respect. It shows all the operators, but it doesn't really say that these will all actually run in parallel, right? Of course, I guess it would give a good idea of what the upper bound is. You can determine what runs in parallel based on the indentation of the output. Items at the same indentation level under the same "parent" line will run in parallel. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Re: [PERFORM] Huge amount of memory consumed during transaction
On Oct 12, 2007, at 4:09 PM, henk de wit wrote: > It looks to me like you have work_mem set optimistically large. This > query seems to be doing *many* large sorts and hashes: I have work_mem set to 256MB. Reading the PG documentation I now realize that "several sort or hash operations might be running in parallel". So this is most likely the problem, although I don't really understand why memory never seems to increase for any of the other queries (not executed in a transaction). Some of these are at least the size of the query that is giving problems. Wow. That's inordinately high. I'd recommend dropping that to 32-43MB. Btw, is there some way to determine up front how many sort or hash operations will be running in parallel for a given query? Explain is your friend in that respect. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
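A sketch of how to experiment with this per session rather than globally (the 32MB figure is just the level suggested above):

```sql
-- work_mem is the budget for EACH sort or hash operation, per backend;
-- one query with several such nodes can use several multiples of it.
SET work_mem = '32MB';

-- EXPLAIN shows the plan tree: Sort / Hash / HashAggregate nodes at
-- the same indentation level under a common parent can each be holding
-- their own work_mem at the same time, which bounds peak usage.
EXPLAIN SELECT ...;  -- the problem query goes here
```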
Re: [PERFORM] Huge amount of memory consumed during transaction
On Oct 11, 2007, at 9:51 AM, Tom Lane wrote: henk de wit <[EMAIL PROTECTED]> writes: ERROR: out of memory DETAIL: Failed on request of size 4194304. This error should have produced a map of per-context memory use in the postmaster log. Please show us that. regards, tom lane Tom, are there any docs anywhere that explain how to interpret those per-context memory dumps? For example, when I see an autovacuum context listed is it safe to assume that the error came from an autovac operation, etc.? Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] [Again] Postgres performance problem
On Sep 13, 2007, at 12:58 AM, Greg Smith wrote: On Wed, 12 Sep 2007, Scott Marlowe wrote: I'm getting more and more motivated to rewrite the vacuum docs. I think a rewrite from the ground up might be best... I keep seeing people doing vacuum full on this list and I'm thinking it's as much because of the way the docs represent vacuum full as anything. I agree you shouldn't start thinking in terms of how to fix the existing documentation. I'd suggest instead writing a tutorial leading someone through what they need to know about their tables first and then going into how vacuum works based on that data. As an example, people throw around terms like "index bloat" and "dead tuples" when talking about vacuuming. The tutorial I'd like to see somebody write would start by explaining those terms and showing how to measure them--preferably with a good and bad example to contrast. The way these terms are thrown around right now, I don't expect newcomers to understand either the documentation or the advice people are giving them; I think it's shooting over their heads and what's needed are some walkthroughs. Another example I'd like to see thrown in there is what it looks like when you don't have enough FSM slots. Isn't that the point of the documentation? I mean, if the existing, official manual has been demonstrated (through countless mailing list help requests) to not sufficiently explain a given topic, shouldn't it be revised? One thing that might help is a hyperlinked glossary so that people reading through the documentation can go straight to the postgres definition of dead tuple, index bloat, etc. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] [Again] Postgres performance problem
On Sep 12, 2007, at 2:19 PM, Frank Schoep wrote: On Sep 12, 2007, at 9:07 PM, Scott Marlowe wrote: On 9/12/07, Mikko Partio <[EMAIL PROTECTED]> wrote: … Aren't you mixing up REINDEX and CLUSTER? … Either one does what a vacuum full did / does, but generally does it better. On topic of REINDEX / VACUUM FULL versus a CLUSTER / VACUUM ANALYZE I'd like to ask if CLUSTER is safe to run on a table that is in active use. After updating my maintenance scripts from a VACUUM FULL (add me to the list) to CLUSTER (which improves performance a lot) I noticed I was getting "could not open relation …" errors in the log while the scripts ran so I reverted the change. This was on 8.1.9. You'd probably see the same behavior on 8.2.x. CLUSTER is not transactionally safe so you don't want to run CLUSTER on tables that are actively being used. I believe that's been fixed for 8.3. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] 8.2 Autovacuum BUG ?
On Aug 31, 2007, at 2:08 PM, Tom Lane wrote: Alvaro Herrera <[EMAIL PROTECTED]> writes: Mikko Partio escribió: Off-topic question: the documentation says that XID numbers are 32 bit. Could the XID be 64 bit when running on a 64 bit platform? That would effectively prevent wrap-around issues. No, because they would take too much space in tuple headers. It's worth noting that the patch Florian is working on, to suppress assignment of XIDs for transactions that never write anything, will make for a large reduction in the rate of XID consumption in many real-world applications. That will reduce the need for tuple freezing and probably lessen the attraction of wider XIDs even more. If he gets it done soon (before the HOT dust settles) I will be strongly tempted to try to sneak it into 8.3 ... regards, tom lane Off topic and just out of curiosity, is this the work that will allow standby servers to have selects run on them without stopping WAL replay? Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
Re: [PERFORM] io storm on checkpoints, postgresql 8.2.4, linux
On Aug 22, 2007, at 10:57 AM, Kenneth Marshall wrote: On Wed, Aug 22, 2007 at 07:33:35PM +0400, Dmitry Potapov wrote: Hello! We run a large (~66Gb) web-backend database on Postgresql 8.2.4 on Linux. The hardware is Dual Xeon 5130 with 16Gb ram, LSI Megaraid U320-2x scsi controller w/512Mb writeback cache and a BBU. Storage setup contains 3 raid10 arrays (data, xlog, indexes, each on different array), 12 HDDs total. Frontend application uses jdbc driver, connection pooling and threads. We've run into an issue of IO storms on checkpoints. Once in 20min (which is checkpoint_interval) the database becomes unresponsive for about 4-8 seconds. Query processing is suspended, server does nothing but writing a large amount of data to disks. Because of the db server being stalled, some of the web clients get timeout and disconnect, which is unacceptable. Even worse, as the new requests come at a pretty constant rate, by the time this storm comes to an end there is a huge amount of sleeping app. threads waiting for their queries to complete. After the db server comes back to life again, these threads wake up and flood it with queries, so performance suffer even more, for some minutes after the checkpoint. It seemed strange to me that our 70%-read db generates so much dirty pages that writing them out takes 4-8 seconds and grabs the full bandwidth. First, I started to tune bgwriter to a more aggressive settings, but this was of no help, nearly no performance changes at all. Digging into the issue further, I discovered that linux page cache was the reason. "Dirty" parameter in /proc/meminfo (which shows the amount of ready-to- write "dirty" data currently sitting in page cache) grows between checkpoints from 0 to about 100Mb. When checkpoint comes, all the 100mb got flushed out to disk, effectively causing a IO storm. 
I found this document (http://www.westnet.com/~gsmith/content/linux-pdflush.htm) and peeked into mm/page-writeback.c in the linux kernel source tree. I'm not sure that I understand pdflush writeout semantics correctly, but it looks like when the amount of "dirty" data is less than dirty_background_ratio*RAM/100, pdflush only writes pages in the background, waking up every dirty_writeback_centisecs and writing no more than 1024 pages (the MAX_WRITEBACK_PAGES constant). When we hit dirty_background_ratio, pdflush starts to write out more aggressively. So, it looks like the following scenario takes place: postgresql constantly writes something to the database and xlog files, dirty data gets into the page cache, and is then slowly written out by pdflush. When postgres generates more dirty pages than pdflush writes out, the amount of dirty data in the pagecache grows. When we hit a checkpoint, postgres does fsync() on the database files and sleeps until the whole page cache is written out. By default, dirty_background_ratio is 2%, which is about 328Mb of 16Gb total. Following the current pdflush logic, nearly this amount of data would have to be written out at checkpoint time, effectively stalling everything else, so even 1% of 16Gb is too much. My setup experiences 4-8 sec pauses in operation even with a ~100Mb dirty pagecache... I temporarily solved this problem by setting dirty_background_ratio to 0%. This causes the dirty data to be written out immediately. It is ok for our setup (mostly because of the large controller cache), but it doesn't look to me like an elegant solution. Is there some other way to fix this issue without disabling the pagecache and the IO smoothing it was designed to perform? -- Regards, Dmitry Dmitry, You are working at the correct level. The bgwriter performs the I/O smoothing function at the database level. Obviously, the OS level smoothing function needed to be tuned and you have done that within the parameters of the OS. 
You may want to bring this up on the Linux kernel lists and see if they have any ideas. Good luck, Ken Have you tried decreasing your checkpoint interval? That would at least help to reduce the amount of data that needs to be flushed when Postgres fsyncs. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
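On 8.2 the knobs for shrinking each checkpoint's flush live in postgresql.conf; a hedged starting point (the numbers are illustrative, not a recommendation for this specific hardware):

```
# Checkpoint more often so each checkpoint has less dirty data to
# fsync (this setup was effectively checkpointing every 20 minutes):
checkpoint_timeout = 5min
checkpoint_segments = 16

# Let the background writer trickle dirty shared buffers out between
# checkpoints (8.2 parameter names) so the final fsync is smaller:
bgwriter_delay = 200ms
bgwriter_all_percent = 5
bgwriter_all_maxpages = 200
```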
Re: [GENERAL] [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
On Aug 25, 2007, at 2:58 PM, Erik Jones wrote: On Aug 24, 2007, at 7:41 PM, Benjamin Arai wrote: Hi, I have an application which loads millions of NEW documents each month into a PostgreSQL tsearch2 table. I have the initial version completed and searching performance is great but my problem is that each time a new month rolls around I have to drop all the indexes do a COPY and re- index the entire table. This is problematic considering that each month takes longer than the previous to rebuild the indexes and the application in unavailable during the rebuilding process. In order to avoid the re-indexing I was thinking of instead creating a new table each month (building its indexes and etc) and accessing them all through a view. This way I only have to index the new data each month. Does this work? Does a view with N tables make it N times slower for tsearch2 queries? Is there a better solution? You can use Postgres's inheritance mechanism for your partitioning mechanism and combine it with constraint exclusion to avoid the N^2 issues. See: http://www.postgresql.org/docs/8.2/interactive/ddl-inherit.html and http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html Basically, create a table from which all of your partitioned tables inherit. Partition in such a way that you can use constraint exclusion and then you can treat the parent table like the view you were suggesting. Erik Jones Sorry, I didn't see that you had crossposted and carried the conversation on another list. Please, don't do that. Avoid the top posting, as well. They both make it difficult for others to join in or follow the conversations and issues. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
On Aug 24, 2007, at 7:41 PM, Benjamin Arai wrote: Hi, I have an application which loads millions of NEW documents each month into a PostgreSQL tsearch2 table. I have the initial version completed and searching performance is great but my problem is that each time a new month rolls around I have to drop all the indexes do a COPY and re- index the entire table. This is problematic considering that each month takes longer than the previous to rebuild the indexes and the application in unavailable during the rebuilding process. In order to avoid the re-indexing I was thinking of instead creating a new table each month (building its indexes and etc) and accessing them all through a view. This way I only have to index the new data each month. Does this work? Does a view with N tables make it N times slower for tsearch2 queries? Is there a better solution? You can use Postgres's inheritance mechanism for your partitioning mechanism and combine it with constraint exclusion to avoid the N^2 issues. See: http://www.postgresql.org/docs/8.2/interactive/ddl-inherit.html and http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html Basically, create a table from which all of your partitioned tables inherit. Partition in such a way that you can use constraint exclusion and then you can treat the parent table like the view you were suggesting. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Update table performance
On Aug 8, 2007, at 3:00 AM, Heikki Linnakangas wrote: Erik Jones wrote: Decibel! wrote: I should mention that if you can handle splitting the update into multiple transactions, that will help a lot since it means you won't be doubling the size of the table. As I mentioned above, when you do an update you're actually inserting a new row and deleting the old one. That deleted row is still considered part of the table (for reasons of concurrency, read up on the concurrency chapter in the manual for the details) and once it is no longer visible by any live transactions can be re-used by future inserts. So, if you update one column on every row of a one million row table all at once, you have to allocate and write out one million new rows. But, if you do the update a quarter million at a time, the last three updates would be able to re-use many of the rows deleted in earlier updates. Only if you vacuum between the updates. This is true. In fact, the chapter on Routine Database Maintenance tasks that discusses vacuuming explains all of this. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Update table performance
On Aug 7, 2007, at 6:13 PM, Mark Makarowsky wrote: Can you provide more detail on what you mean by your two suggestions below: Yeah, I've used "vertical partitioning" very successfully in the past, though I've never done it for just a single field. I'll typically leave the few most common fields in the "main" table and pull everything else into a second table. Vertical partitioning is where you split up your table on disk by columns, i.e. on the vertical lines. He quoted it because Postgres doesn't actually support it transparently, but you can always fake it by splitting up your table. For example, given the following table wherein column bar gets updated a lot but the others don't:

create table foo (
    id int not null,
    bar int,
    baz int,
    primary key (id)
);

You could split it up like so:

create table foo_a (
    id int,
    baz int,
    primary key (id)
);

create table foo_b (
    foo_id int,
    bar int,
    foreign key (foo_id) references foo_a (id)
);

The reason you'd ever want to do this is that when Postgres goes to update a row, what it actually does is insert a new row with the new value(s) that you changed and mark the old one as deleted. So, if you have a wide table and frequently update only certain columns, you'll take a performance hit as you're having to re-write a lot of static values. I should mention that if you can handle splitting the update into multiple transactions, that will help a lot since it means you won't be doubling the size of the table. As I mentioned above, when you do an update you're actually inserting a new row and deleting the old one. That deleted row is still considered part of the table (for reasons of concurrency; read up on the concurrency chapter in the manual for the details) and, once it is no longer visible to any live transactions, can be re-used by future inserts. So, if you update one column on every row of a one million row table all at once, you have to allocate and write out one million new rows. 
But, if you do the update a quarter million at a time, the last three updates would be able to re-use many of the rows deleted in earlier updates. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
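The quarter-million-at-a-time approach described above might look like the following sketch (table and column names are hypothetical; vacuuming between batches is what actually returns the dead row versions from earlier batches to the free space map so later batches can re-use them):

```sql
begin;
update foo set bar = bar + 1 where id between 1 and 250000;
commit;
vacuum foo;  -- make the dead row versions re-usable

begin;
update foo set bar = bar + 1 where id between 250001 and 500000;
commit;
vacuum foo;

-- ...and likewise for the third and fourth quarters.
```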
Re: [PERFORM] How to use a trigger to write rows to a remote server
On Jul 18, 2007, at 11:30 AM, Michael Dengler wrote: Hmm.. I was hoping to avoid personal insults. Anyway, nuts or not... what I am attempting is to simply have rows from one table inserted into another server's DB. I don't see it as replication because: a) The destination table will have a trigger that modifies the arriving data to fit its table schema. b) It is not critical that the data be synchronous (i.e. a lost row on the destination DB is not a big deal). c) I see it as more of a provision of data to the destination DB, NOT A REPLICATION OF DATA. Essentially the remote server just wants to know when some record arrives at the source server and wants to know some of the info contained in the new record. And yes, it may be that I know little about the myriad of problems involved with replication... but I do know how to carry on a civil, adult conversation... maybe we can have a knowledge exchange. Cheers Mike

Mike, If all you need is for your trigger to make a simple query on another db then you can use dblink or an untrusted version of one of the available procedural languages such as plperlu or plpythonu. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
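The dblink suggestion above might be sketched like this (the table names and connection string are hypothetical; dblink_exec is provided by the dblink contrib module):

```sql
create or replace function push_to_remote() returns trigger as $$
begin
    -- Fire-and-forget insert on the remote server; the remote
    -- table's own trigger can then massage the data to fit its schema.
    perform dblink_exec(
        'host=remote.example.com dbname=dest user=app',
        'insert into incoming (id, info) values ('
            || new.id || ', ' || quote_literal(new.info) || ')'
    );
    return new;
end;
$$ language plpgsql;

create trigger push_to_remote_trg
    after insert on source_table
    for each row execute procedure push_to_remote();
```

Note that a failure on the remote side will abort the local insert with this naive version, which may be acceptable given that a lost row "is not a big deal" here.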
Re: [PERFORM] Foreign Key Deadlocking
On Apr 19, 2007, at 9:00 AM, Dave Cramer wrote: On 18-Apr-07, at 11:36 AM, Csaba Nagy wrote: Can someone confirm that I've identified the right fix? I'm pretty sure that won't help you... see: http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php The deadlock will be there if you update/insert the child table and update/insert the parent table in the same transaction (even if you update some other field on the parent table than the key referenced by the child table). If your transactions always update/insert only one of those tables, it won't deadlock (assuming you order the inserts/updates properly per PK). Cheers, Csaba.

Hi Csaba, I have a similar problem. In an attempt to avoid the overhead of select count(*) from mailbox where uid = somuid I've implemented triggers on insert and delete. So there is a user table which refers to an inbox table, so when people insert into the inbox there is an RI trigger grabbing the shared lock, then the count triggers try to grab an exclusive lock, resulting in a deadlock. Can we safely remove the shared locks? Is there a right way to implement the count triggers? I've tried before triggers and after triggers; both result in different kinds of deadlocks. Dave

The way I've done this in the past is to have the count triggers make inserts into some interim table rather than try to update the actual count field, and have another process that continually sweeps what's in the interim table and makes aggregated updates to the count table. Even if there isn't much to aggregate on any given sweep, this gives you a sequential pattern as your inserts/deletes on the main table don't depend on any locking in another table (well, strictly speaking, your inserts into the interim table would be blocked by any exclusive locks on it, but you shouldn't need to ever do that anyway).
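The interim-table scheme described above might look like this sketch (all names are hypothetical; note the sweeper should delete only the delta rows it has already aggregated, e.g. by tagging them before the update):

```sql
-- Deltas are only ever inserted, so the main-table triggers
-- never contend for locks on the counts table.
create table inbox_count_deltas (
    uid   int not null,
    delta int not null   -- +1 for an insert, -1 for a delete
);

create or replace function record_count_delta() returns trigger as $$
begin
    if tg_op = 'INSERT' then
        insert into inbox_count_deltas values (new.uid, 1);
    else
        insert into inbox_count_deltas values (old.uid, -1);
    end if;
    return null;  -- after trigger; return value is ignored
end;
$$ language plpgsql;

create trigger inbox_count_trg
    after insert or delete on inbox
    for each row execute procedure record_count_delta();

-- Sweeper, run periodically in its own transaction:
update inbox_counts c
   set message_count = message_count + d.total
  from (select uid, sum(delta) as total
          from inbox_count_deltas
         group by uid) d
 where c.uid = d.uid;
-- ...then delete the swept rows from inbox_count_deltas.
```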
erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] a question about Direct I/O and double buffering
On Apr 5, 2007, at 3:33 PM, [EMAIL PROTECTED] wrote: On Thu, 5 Apr 2007, Xiaoning Ding wrote: To the best of my knowledge, Postgres itself does not have a direct IO option (although it would be a good addition). So, in order to use direct IO with postgres you'll need to consult your filesystem docs for how to set the forcedirectio mount option. I believe it can be set dynamically, but if you want it to be permanent you'll need to add it to your fstab/vfstab file. I use Linux. It supports direct I/O on a per-file basis only. To bypass the OS buffer cache, files should be opened with the O_DIRECT option. I'm afraid that I have to modify PG. As someone who has been reading the linux-kernel mailing list for 10 years, let me comment on this a bit. Linux does have a direct i/o option, Yes, I know applications can request direct i/o with the O_DIRECT flag to open(), but can this be set to be forced for all applications or for individual applications from "outside" the application (not that I've ever heard of something like the second)? but it has significant limits on when and how you can use it (buffers must be 512-byte aligned and multiples of 512 bytes, things like that). That's a standard limit imposed by the sector size of hard drives, and is present in all direct i/o implementations, not just Linux. Also, in many cases testing has shown that there is a fairly significant performance hit for this, not a performance gain. Have those performance hits been noticed for high i/o transaction databases? The idea here is that these kinds of databases manage their own caches and having a separate filesystem cache in virtual memory that works with system memory page sizes is an unneeded level of indirection.
Yes, you should expect other "normal" utilities to suffer a performance hit: if you try to cp a 500-byte file you'll still have to work with 8K writes and reads, whereas with the filesystem cache you can just write/read part of a page in memory and let the cache decide when it needs to write to and read from disk. If there are other caveats to direct i/o on Linux I'd love to hear them. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] a question about Direct I/O and double buffering
On Apr 5, 2007, at 2:56 PM, Mark Lewis wrote: ... [snipped for brevity] ... Not to hijack this thread, but has anybody here tested the behavior of PG on a file system with OS-level caching disabled via forcedirectio or by using an inherently non-caching file system such as ocfs2? I've been thinking about trying this setup to avoid double-caching now that the 8.x series scales shared buffers better, but I figured I'd ask first if anybody here had experience with similar configurations. -- Mark Rather than repeat everything that was said just last week, I'll point out that we just had a pretty decent discussion on this last week that I started, so check the archives. In summary though, if you have a high io transaction load with a db where the average size of your "working set" of data doesn't fit in memory with room to spare, then direct io can be a huge plus; otherwise you probably won't see much of a difference. I have yet to hear of anybody actually seeing any degradation in db performance from it. In addition, while it doesn't bother me, I'd watch the top posting as some people get pretty religious about it (I moved your comments down). I saw the thread, but my understanding from reading through it was that you never fully tracked down the cause of the factor of 10 write volume mismatch, so I pretty much wrote it off as a data point for forcedirectio because of the unknowns. Did you ever figure out the cause of that? -- Mark Lewis Nope. What we never tracked down was the factor of 10 drop in database transactions, not disk transactions. The write volume was most definitely due to the direct io setting -- writes are now being done in terms of the system's block size whereas before they were being done in terms of the filesystem's cache page size (as it's in virtual memory). Basically, we do so many write transactions that the fs cache was constantly paging. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] a question about Direct I/O and double buffering
On Apr 5, 2007, at 1:27 PM, Mark Lewis wrote: On Thu, 2007-04-05 at 13:09 -0500, Erik Jones wrote: On Apr 5, 2007, at 12:09 PM, Xiaoning Ding wrote: Hi, A page may be double buffered in PG's buffer pool and in the OS's buffer cache. Other DBMSs like DB2 and Oracle have provided a Direct I/O option to eliminate double buffering. I noticed there were discussions on the list, but I can not find a similar option in PG. Does PG support direct I/O now? The tuning guide of PG usually recommends a small shared buffer pool (compared to the size of physical memory). I think it is to avoid swapping. If there were swapping, the OS kernel may swap out some pages in PG's buffer pool even if PG wants to keep them in memory, i.e. PG would lose full control over the buffer pool. A large buffer pool is not good because it may 1. cause more pages to be double buffered, and thus decrease the efficiency of the buffer cache and buffer pool. 2. cause swapping. Am I right? If PG's buffer pool is small compared with physical memory, can I say that the hit ratio of PG's buffer pool is not so meaningful because most misses can be satisfied by the OS kernel's buffer cache? Thanks! To the best of my knowledge, Postgres itself does not have a direct IO option (although it would be a good addition). So, in order to use direct IO with postgres you'll need to consult your filesystem docs for how to set the forcedirectio mount option. I believe it can be set dynamically, but if you want it to be permanent you'll need to add it to your fstab/vfstab file. Not to hijack this thread, but has anybody here tested the behavior of PG on a file system with OS-level caching disabled via forcedirectio or by using an inherently non-caching file system such as ocfs2? I've been thinking about trying this setup to avoid double-caching now that the 8.x series scales shared buffers better, but I figured I'd ask first if anybody here had experience with similar configurations.
-- Mark Rather than repeat everything that was said just last week, I'll point out that we just had a pretty decent discussion on this last week that I started, so check the archives. In summary though, if you have a high io transaction load with a db where the average size of your "working set" of data doesn't fit in memory with room to spare, then direct io can be a huge plus; otherwise you probably won't see much of a difference. I have yet to hear of anybody actually seeing any degradation in db performance from it. In addition, while it doesn't bother me, I'd watch the top posting as some people get pretty religious about it (I moved your comments down). erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] a question about Direct I/O and double buffering
On Apr 5, 2007, at 1:22 PM, Xiaoning Ding wrote: Erik Jones wrote: On Apr 5, 2007, at 12:09 PM, Xiaoning Ding wrote: Hi, A page may be double buffered in PG's buffer pool and in the OS's buffer cache. Other DBMSs like DB2 and Oracle have provided a Direct I/O option to eliminate double buffering. I noticed there were discussions on the list, but I can not find a similar option in PG. Does PG support direct I/O now? The tuning guide of PG usually recommends a small shared buffer pool (compared to the size of physical memory). I think it is to avoid swapping. If there were swapping, the OS kernel may swap out some pages in PG's buffer pool even if PG wants to keep them in memory, i.e. PG would lose full control over the buffer pool. A large buffer pool is not good because it may 1. cause more pages to be double buffered, and thus decrease the efficiency of the buffer cache and buffer pool. 2. cause swapping. Am I right? If PG's buffer pool is small compared with physical memory, can I say that the hit ratio of PG's buffer pool is not so meaningful because most misses can be satisfied by the OS kernel's buffer cache? Thanks! To the best of my knowledge, Postgres itself does not have a direct IO option (although it would be a good addition). So, in order to use direct IO with postgres you'll need to consult your filesystem docs for how to set the forcedirectio mount option. I believe it can be set dynamically, but if you want it to be permanent you'll need to add it to your fstab/vfstab file. I use Linux. It supports direct I/O on a per-file basis only. To bypass the OS buffer cache, files should be opened with the O_DIRECT option. I'm afraid that I have to modify PG. Xiaoning Looks like it. I just did a cursory search of the archives and it seems that others have looked at this before, so you'll probably want to start there if you're up to it. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] a question about Direct I/O and double buffering
On Apr 5, 2007, at 12:09 PM, Xiaoning Ding wrote: Hi, A page may be double buffered in PG's buffer pool and in the OS's buffer cache. Other DBMSs like DB2 and Oracle have provided a Direct I/O option to eliminate double buffering. I noticed there were discussions on the list, but I can not find a similar option in PG. Does PG support direct I/O now? The tuning guide of PG usually recommends a small shared buffer pool (compared to the size of physical memory). I think it is to avoid swapping. If there were swapping, the OS kernel may swap out some pages in PG's buffer pool even if PG wants to keep them in memory, i.e. PG would lose full control over the buffer pool. A large buffer pool is not good because it may 1. cause more pages to be double buffered, and thus decrease the efficiency of the buffer cache and buffer pool. 2. cause swapping. Am I right? If PG's buffer pool is small compared with physical memory, can I say that the hit ratio of PG's buffer pool is not so meaningful because most misses can be satisfied by the OS kernel's buffer cache? Thanks! To the best of my knowledge, Postgres itself does not have a direct IO option (although it would be a good addition). So, in order to use direct IO with postgres you'll need to consult your filesystem docs for how to set the forcedirectio mount option. I believe it can be set dynamically, but if you want it to be permanent you'll need to add it to your fstab/vfstab file. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
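On Solaris UFS, the mount option mentioned above can be set dynamically or made permanent in /etc/vfstab, along these lines (a config sketch; the device names and mount point are hypothetical):

```
# one-off remount of an already-mounted filesystem:
mount -o remount,forcedirectio /pgdata

# permanent, as a single line in /etc/vfstab:
/dev/dsk/c0t0d0s5  /dev/rdsk/c0t0d0s5  /pgdata  ufs  2  yes  forcedirectio
```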
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Apr 3, 2007, at 11:51 AM, Dimitri wrote: Well, to check if there is a real potential gain all we need is a small comparison test using PgSQL compiled with a LOG block size equal to, say, 1K and direct IO enabled. Rgds, -Dimitri On 3/30/07, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > On Fri, Mar 30, 2007 at 04:25:16PM +0200, Dimitri wrote: > > The problem is that while your goal is to commit as fast as possible - it's > > a pity to waste I/O operation speed just keeping a common block size... > > Let's say if your transaction modification fits into 512 bytes - you'll > > be able to write many more 512-byte blocks per second rather than 8K per second > > (for the same amount of data)... Even if we rewrite the same block several > > times with incoming transactions - it still costs on > > traffic, and we will process slower even if H/W can do better. Don't > > think it's good, no? ;) > > > > Rgds, > > -Dimitri > > > With block sizes you are always trading off overhead versus space > efficiency. Most OSes write only in 4k/8k to the underlying hardware > regardless of the size of the write you issue. Issuing 16 512-byte > writes has much more overhead than 1 8k write. On the light transaction > end, there is no real benefit to a small write and it will slow > performance for high throughput environments. It would be better to, > and I think that someone is looking into, batching I/O. > > Ken > Folks, to close the topic with the "LOG block size=1K" idea - I took the time to test it (yes) and in the best cases there is only a 15% gain compared to 8K - the storage protocol is quite heavy itself, so sending less or more data within it doesn't reduce service time too much... And even this gain decreases quickly with growing workload! So, yes, 8K is good enough and probably the most optimal choice for the LOG (as well as data) block size. Rgds, -Dimitri Hey, man, thanks for taking the time to profile that! erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 30, 2007, at 4:46 PM, Josh Berkus wrote: Erik, You're welcome! However, I believe our situation is very different from what you're testing, if I understand you correctly. Are you saying that your entire database will fit in memory? If so, then these are very different situations, as there is no way ours could ever do that. In fact, I'm not sure that forcedirectio would really net you any gain in that situation as the IO service time will be basically nil if the filesystem cache doesn't have to page, which I would think is why you're seeing what you are. Even more interesting. I guess we've been doing too much work with benchmark workloads, which tend to be smaller databases. Thing is, there's *always* I/O for a read/write database. If nothing else, updates have to be synched to disk. Right. But, how *much* I/O? Anyway ... regarding the mystery transactions ... are you certain that it's not your application? I can imagine that, if your app has a fairly tight retry interval for database non-response, that I/O sluggishness could result in commit attempts spinning out of control. Well, our application code itself doesn't retry queries if the db is taking a long time to respond. However, we do have a number of our servers making db connections via pgpool so you may be on to something here. While I will be taking these questions to the pgpool lists, I'll posit them here as well: If a pgpool child process reaches its connection lifetime while waiting on a query to complete, does pgpool retry the query with another child? If a connection thus dies, does the transaction complete normally on the server? If the answers to these questions are both yes, this could definitely be what was happening. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 30, 2007, at 10:05 AM, Kenneth Marshall wrote: On Fri, Mar 30, 2007 at 04:25:16PM +0200, Dimitri wrote: The problem is that while your goal is to commit as fast as possible - it's a pity to waste I/O operation speed just keeping a common block size... Let's say if your transaction modification fits into 512 bytes - you'll be able to write many more 512-byte blocks per second rather than 8K per second (for the same amount of data)... Even if we rewrite the same block several times with incoming transactions - it still costs on traffic, and we will process slower even if H/W can do better. Don't think it's good, no? ;) Rgds, -Dimitri With block sizes you are always trading off overhead versus space efficiency. Most OSes write only in 4k/8k to the underlying hardware regardless of the size of the write you issue. Issuing 16 512-byte writes has much more overhead than 1 8k write. On the light transaction end, there is no real benefit to a small write and it will slow performance for high throughput environments. It would be better to, and I think that someone is looking into, batching I/O. Ken True, and really, considering that data is only written to disk by the bgwriter and at checkpoints, writes are already somewhat batched. Also, Dimitri, I feel I should backtrack a little and point out that it is possible to have postgres write in 512-byte blocks (at least for UFS, which is what's in my head right now) if you set the system's logical block size to 4K and fragment size to 512 bytes and then set postgres's BLCKSZ to 512 bytes. However, as Ken has just pointed out, what you gain in space efficiency you lose in performance, so if you're working with a high traffic database this wouldn't be a good idea. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 30, 2007, at 8:14 AM, Dimitri wrote: You are right in that the page size constraint is lifted in that directio cuts out the VM filesystem cache. However, the Solaris kernel still issues io ops in terms of its logical block size (which we have at the default 8K). It can issue io ops for fragments as small as 1/8th of the block size, but Postgres issues its io requests in terms of the block size, which means that io ops from Postgres will be in 8K chunks, which is exactly what we see when we look at our system io stats. In fact, if any io request is made that isn't a multiple of 512 bytes (the disk sector size), the file system switches back to buffered io. Oh, yes, of course! Yes, you still need to respect multiples of the 512-byte block size on read and write - sorry, I was tired :) Then it seems to be true - the default XLOG block size is 8K, meaning that for every small auto-committed transaction we should write 8K?... Is there any reason to use such a big default block size?... Probably it may be a good idea to make it an 'initdb' parameter and have such a value per database server? I believe it's because that is a pretty normal Unix kernel block size and you want the two to match. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 29, 2007, at 5:15 PM, Dimitri wrote: > > Erik, > > using 'forcedirectio' simply brings your write operations to the *real* volume - means while you need to write 10 bytes you'll write 10 bytes (instead of the UFS block size (8K)). So it explains why your write volume became slower. I meant 'lower' (not slower) Sorry, that's not true. Google "ufs forcedirectio", go to the first link, and you will find: "forcedirectio The forcedirectio (read "force direct IO") UFS option causes data to be buffered in kernel address space whenever data is transferred between user address space and the disk. In other words, it bypasses the file system cache. For certain types of applications -- primarily database systems -- this option can dramatically improve performance. In fact, some database experts have argued that a file using the forcedirectio option will outperform a raw partition, though this opinion seems fairly controversial. The forcedirectio improves file system performance by eliminating double buffering, providing a small, efficient code path for file system reads and writes and removing pressure on memory." Erik, please, don't take me wrong, but reading Google (or better, the man pages) doesn't replace brains and basic practice... The Direct IO option is not a silver bullet which will solve all your problems (try to do 'cp' on a filesystem mounted with 'forcedirectio', or use your mailbox on it - you'll quickly understand the impact)... However, what this does mean is that writes will be at the actual filesystem block size and not the cache block size (8K v. 512K). When a UFS filesystem is mounted normally, it uses its own cache for all operations (read and write) and saves data modifications on a per-page basis, meaning: when a process writes 200 bytes there will be 200 bytes modified in cache, then the whole page (8K) is written once the data is demanded to be flushed (and WAL is written per each commit)...
Now, mounted with the 'forcedirectio' option, UFS is free of the page size constraint and will write like a raw device exactly the demanded amount of data, meaning: when a process writes 200 bytes it'll write exactly 200 bytes to the disk. You are right in that the page size constraint is lifted in that directio cuts out the VM filesystem cache. However, the Solaris kernel still issues io ops in terms of its logical block size (which we have at the default 8K). It can issue io ops for fragments as small as 1/8th of the block size, but Postgres issues its io requests in terms of the block size, which means that io ops from Postgres will be in 8K chunks, which is exactly what we see when we look at our system io stats. In fact, if any io request is made that isn't a multiple of 512 bytes (the disk sector size), the file system switches back to buffered io. However, to understand the TX number mystery I think the only possible solution is to reproduce a small live test: (I'm sure you're aware you can mount/unmount forcedirectio dynamically?) during stable workload do: # mount -o remount,logging /path_to_your_filesystem and check if I/O volume is increasing as well as TX numbers, then come back: # mount -o remount,forcedirectio /path_to_your_filesystem and see if I/O volume is decreasing as well as TX numbers... That's an excellent idea and I'll run it by the rest of our team tomorrow. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 29, 2007, at 7:23 PM, Josh Berkus wrote: Erik, Wow, thanks for the post. We've just started testing the option of sizing shared_buffers bigger than the database, and using forcedirectio in benchmarks at Sun. So far, our experience has been *equal* performance in that configuration, so it's *very* interesting to see you're getting a gain. -- --Josh Josh, You're welcome! However, I believe our situation is very different from what you're testing, if I understand you correctly. Are you saying that your entire database will fit in memory? If so, then these are very different situations, as there is no way ours could ever do that. In fact, I'm not sure that forcedirectio would really net you any gain in that situation, as the IO service time will be basically nil if the filesystem cache doesn't have to page, which I would think is why you're seeing what you are. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 29, 2007, at 2:19 PM, Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: One very important thing to note here is that the number, or rather rate, of disk writes has not changed. It's the volume of data in those writes that has dropped, along with those mysterious transaction counts. Hmm. I'm suddenly thinking about the stats collector: in existing 8.2.x releases it's got a bug that causes it to write the collected-stats file much too often. If you had done something that would shrink the size of the stats file, that might explain this observation. Do you have stats_reset_on_server_start turned on? Nope. The drop in reported transaction rate is still baffling though. Are you sure you're really doing the same amount of work? Can you estimate what you think the transaction rate *should* be from a what-are-your-clients-doing perspective? Unfortunately, I can't think of any way to do that. Our app is made up of a lot of different components and not all of them are even directly client driven. For the client driven portions of the app any given web request can contain anywhere from around 10 to sometimes over 50 different xacts (and that's just a guesstimate). Also, we didn't start tracking xact counts via pg_stat_database until about two months ago when we were in IO bound hell, and we actually thought that the really big xact #s were normal for our app as those were the first, and thus only, numbers we had to work with. Also, another metric we track is to take a count from pg_stat_activity of queries running longer than 1 second every five minutes. Before these recent changes it wasn't uncommon to see that count seriously stack up to over 200 at times, with write intensive queries hanging out for sometimes 30 minutes or more (we'd often end up having to kill them...). Since we upped the shared buffers and turned on forcedirectio for our fs mount, that number has stayed under 50 and has only crossed 20 once.
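The long-running-query count described above can be taken with something like the following (a sketch using the 8.2-era pg_stat_activity column names; current_query was renamed to query in later releases, and '<IDLE>' was how idle backends were reported then):

```sql
select count(*)
  from pg_stat_activity
 where current_query not like '<IDLE>%'
   and now() - query_start > interval '1 second';
```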
erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 29, 2007, at 12:41 PM, dimitri k wrote: On 3/29/07, Erik Jones <[EMAIL PROTECTED]> wrote: On Mar 29, 2007, at 11:16 AM, Tom Lane wrote: > Erik Jones <[EMAIL PROTECTED]> writes: >> We've recently made a couple changes to our system that have resulted >> in a drastic increase in performance as well as some very confusing >> changes to the database statistics, specifically >> pg_stat_database.xact_commit. Here's the details: > > I'm kinda boggled too. I can see how increasing shared buffers could > result in a drastic reduction in write rate, if the working set of your > queries fits in the new space but didn't fit in the old. I have no idea > how that leads to a drop in number of transactions committed though. > It doesn't make sense that autovac would run less frequently, because > it's driven by number of tuples changed not number of disk writes; and > that could hardly account for a 10x drop anyway. > > Did you by any chance take note of exactly which processes were > generating all the I/O or the CPU load? Well, wrt the CPU load, as I said, we're pretty sure that's autovac as we still get spikes that hit about the same threshold, after which cache hits go up dramatically and the spikes just don't last two days anymore. As far as the procs responsible for the writes go, we were unable to see that from the OS level as the guy we had as a systems admin last year totally screwed us with the way he set up the SunCluster on the boxes and we have been unable to run Dtrace, which has left us watching a lot of iostat. However, we did notice a direct correlation between write spikes and "write intensive" queries like large COPYs, UPDATEs, and INSERTs. One very important thing to note here is that the number, or rather rate, of disk writes has not changed. It's the volume of data in those writes that has dropped, along with those mysterious transaction counts. Could the bgwriter be the culprit here? Does anything it does get logged as a transaction?
erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r) Erik, using 'forcedirectio' simply brings your write operations to the *real* volume - means while you need to write 10 bytes you'll write 10 bytes (instead of the UFS block size (8K)). So it explains why your write volume became slower. Sorry, that's not true. Google "ufs forcedirectio", go to the first link, and you will find: "forcedirectio The forcedirectio (read "force direct IO") UFS option causes data to be buffered in kernel address space whenever data is transferred between user address space and the disk. In other words, it bypasses the file system cache. For certain types of applications -- primarily database systems -- this option can dramatically improve performance. In fact, some database experts have argued that a file using the forcedirectio option will outperform a raw partition, though this opinion seems fairly controversial. The forcedirectio improves file system performance by eliminating double buffering, providing a small, efficient code path for file system reads and writes and removing pressure on memory." However, what this does mean is that writes will be at the actual filesystem block size and not the cache block size (8K v. 512K). Now, why the TX number is reduced - is a small mystery :) Options: - you really do 10 times fewer commits, meaning you work 10 times slower? ;) what about users? how do you measure your work performance? We are an email marketing service provider with a web front end application. We measure work performance via web requests (counts, types, etc...), mailer activity and the resulting database activity. We are doing as much or more work now than previously, and faster. - TX reported in pg_* tables are not exact, but I don't believe that at all :) Even if they aren't exact, being off by a factor of 10 wouldn't be believable.
the forcedirectio mount setting for ufs can definitely explain the drop in data written volume, but doesn't do much to explain the difference in xact commits. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
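To put rough numbers on the block-size point above, here is a sketch (purely illustrative arithmetic; `write_volume` is a made-up helper, and the 512K figure is the cache block size and 8K the UFS filesystem block size mentioned in the exchange) of how much data a tiny logical write can turn into at each granularity:

```python
import math

def write_volume(dirty_bytes: int, block_size: int) -> int:
    """Bytes hitting disk when dirty_bytes must be flushed in units of
    block_size (a deliberately simplified model of block-granular I/O)."""
    return math.ceil(dirty_bytes / block_size) * block_size

# A 10-byte logical write under the two granularities discussed above:
buffered = write_volume(10, 512 * 1024)   # via the 512K cache block size
direct   = write_volume(10, 8 * 1024)     # forcedirectio: the 8K UFS block
ratio = buffered / direct                 # 64.0, a ~64x difference in volume
```

Under this (admittedly crude) model, a workload of many small scattered writes could easily see an order-of-magnitude drop in write volume after the mount option change, which matches the observation while leaving the xact_commit drop unexplained.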
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
On Mar 29, 2007, at 11:16 AM, Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: We've recently made a couple changes to our system that have resulted in a drastic increase in performance as well as some very confusing changes to the database statistics, specifically pg_stat_database.xact_commit. Here's the details: I'm kinda boggled too. I can see how increasing shared buffers could result in a drastic reduction in write rate, if the working set of your queries fits in the new space but didn't fit in the old. I have no idea how that leads to a drop in number of transactions committed though. It doesn't make sense that autovac would run less frequently, because it's driven by number of tuples changed not number of disk writes; and that could hardly account for a 10x drop anyway. Did you by any chance take note of exactly which processes were generating all the I/O or the CPU load? Well, wrt the CPU load, as I said, we're pretty sure that's autovac as we still get spikes that hit about the same threshold, after which cache hits go up dramatically, and the spikes just don't last two days anymore. As far as the procs responsible for the writes go, we were unable to see that from the OS level as the guy we had as a systems admin last year totally screwed us with the way he set up the SunCluster on the boxes and we have been unable to run Dtrace, which has left us watching a lot of iostat. However, we did notice a direct correlation between write spikes and "write intensive" queries like large COPYs, UPDATEs, and INSERTs. One very important thing to note here is that the number, or rather rate, of disk writes has not changed. It's the volume of data in those writes that has dropped, along with those mysterious transaction counts. Could the bgwriter be the culprit here? Does anything it does get logged as a transaction? erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
[PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
Greetings, We've recently made a couple changes to our system that have resulted in a drastic increase in performance as well as some very confusing changes to the database statistics, specifically pg_stat_database.xact_commit. Here's the details: OS: Solaris 10 x86 Server: Sunfire X4100, 8GB Memory, 2 Dual Core Opterons Postgres 8.2.3 Disk array: Sun STK 6130 + CSM100 SATA tray, dual channel MPXIO, 15k drives, RAID5 across 14 disks WAL logs on SATA RAID10 SAN architecture, 2 brocade FABRIC switches The changes we made were: Increase shared buffers from 15 to 20 Set the disk mount for the data directory to use forcedirectio (added that mount option to the /etc/vfstab entry (ufs fs)) So, the reason we did this was that for months now we'd been experiencing extremely high IO load from both the perspective of the OS and the database, specifically where writes were concerned. During peak hours it wasn't unheard of for pg_stat_database to report anywhere from 50 to 100 transactions committed in an hour. iostat's %b (disk busy) sat at 100% for longer than we'd care to think about, with the wait percentage going from a few percent on up to 50% at times and the cpu load almost never rising from around a 2 avg., i.e. we were extremely IO bound in all cases. As soon as we restarted postgres after making those changes the IO load was gone. While the number and amount of disk reads have stayed pretty much the same and the number of disk writes has stayed the same, the amount of data being written has dropped by about a factor of 10, which is huge. The cpu load shot way up to around a 20 avg. and stayed that way for about two days (we're thinking that was autovacuum "catching up"). In addition, and this is the truly confusing part, the xact_commit and xact_rollback stats from pg_stat_database both dropped by an order of magnitude (another factor of 10). So, we are now doing 5 to 10 commits per hour during peak hours. 
So, where were all of those extra transactions coming from? Are transactions reported on in pg_stat_database anything but SQL statements? What was causing all of the excess(?!) data being written to the disk (it seems that there's a 1:1 correspondence between the xacts and volume of data being written)? Given that we have the bgwriter on, could it have been the culprit and one of the changes allowed it to now operate more efficiently and/or correctly? erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
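For reference on how figures like the ones above are derived: pg_stat_database.xact_commit is a cumulative counter, so a commit rate has to come from the difference between two readings over a known interval. A minimal sketch (hypothetical helper, made-up snapshot numbers):

```python
def tx_rate_per_hour(xact_commit_t0: int, xact_commit_t1: int,
                     elapsed_seconds: float) -> float:
    """Commits/hour derived from two readings of the cumulative
    pg_stat_database.xact_commit counter (illustrative helper;
    assumes no stats reset happened between the readings)."""
    return (xact_commit_t1 - xact_commit_t0) * 3600.0 / elapsed_seconds

# e.g. two snapshots taken 15 minutes apart (made-up numbers):
rate = tx_rate_per_hour(1_000_000, 1_250_000, 15 * 60)   # 1,000,000.0/hour
```

One consequence worth noting: if anything resets or skews the counter between readings, every rate derived from it is wrong, which is one reason a sudden 10x drop in the reported rate deserves suspicion.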
Re: OT: Munin (was Re: [PERFORM] Determining server load from client)
On Mar 21, 2007, at 4:13 PM, Tobias Brox wrote: [Erik Jones - Wed at 09:31:48AM -0500] I use cacti (http://cacti.net) which does the same thing that munin does but in php instead. Here's what I use to report db stats to it (again, php): I haven't tried cacti, but our sysadm has done a little bit of research and concluded "cacti is better". Maybe some day we'll move over. Munin is generating all the graphs statically every fifth minute, while cacti generates them on demand as far as I've understood. The munin approach is pretty bloated, since one usually watches the graphs much more seldom than they are generated (at least, we do). That's not really an argument since CPU is cheap nowadays - but a real argument is that the munin approach is less flexible. One would often like to adjust the graph (like, min/max values for both axes) while watching. Well, by "default", Cacti polls all of the data sources you've set up every five minutes, as that's how the docs instruct you to set up the cron job for the poller. However, with a little understanding of how the rrdtool rras work, you could definitely poll more often and simply edit the existing rras and datasources to expect that, or create new ones. And, yes, the graph customization is pretty cool, although for the most part they just map what's available from the rrdtool graph functionality. If you do decide to set up Cacti I suggest you go straight to the faq section of the manual and read the part about going from a simple script to a graph. The main manual is almost entirely centered on the built-in networking (e.g. snmp) data sources and, as such, doesn't do much for explaining how to set up other data sources. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
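For what it's worth, the counter-type data sources discussed above (xact_commit, blks_read, etc.) are turned into per-second rates by rrdtool at poll time. A simplified sketch of that counter-to-rate step (this is not rrdtool's actual code; real rrdtool also handles heartbeats and unknown samples):

```python
def counter_rate(prev: int, curr: int, step_seconds: float,
                 width_bits: int = 64) -> float:
    """Per-second rate from two raw COUNTER samples, compensating for
    counter wraparound (a simplified sketch of rrdtool's COUNTER
    data-source handling)."""
    delta = curr - prev
    if delta < 0:                    # the counter wrapped between polls
        delta += 1 << width_bits
    return delta / step_seconds

# 300 commits over a 5-minute poll interval -> 1 commit/sec:
assert counter_rate(100, 400, 300) == 1.0
```

This is also why the active_queries stat above is set up as a gauge instead: it is an instantaneous count, not a monotonically growing counter, so differencing it would produce nonsense.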
Re: OT: Munin (was Re: [PERFORM] Determining server load from client)
On Mar 21, 2007, at 5:13 AM, Tobias Brox wrote: I have my postgres munin monitoring script at http://oppetid.no/~tobixen/pg_activity.munin.txt (had to suffix it with .txt to make the local apache happy). I would like to see what others have done as well. I use cacti (http://cacti.net) which does the same thing that munin does but in php instead. Here's what I use to report db stats to it (again, php). You basically call the script with the database name and the stat you want. I have the active_queries stat set up as a gauge in cacti and the others as counters:

if (!isset($argv[1])) {
    echo "DB name argument required!\n";
    exit();
}

$stats = array('xact_commit', 'xact_rollback', 'blks_read', 'blks_hit',
               'active_queries');
if (!isset($argv[2]) || !in_array($argv[2], $stats)) {
    echo "Invalid stat arg!: {$argv[2]}";
    exit();
}

require_once('DB.php');

$db_name = $argv[1];
if (DB::isError($db = DB::connect("pgsql://[EMAIL PROTECTED]:5432/$db_name"))) {
    exit();
}

if ($argv[2] == 'active_queries') {
    $actives_sql = "SELECT COUNT(*)
                    FROM pg_stat_activity
                    WHERE current_query NOT ILIKE '<IDLE>%'
                    AND now() - query_start > '1 second';";
    if (DB::isError($db_stat = $db->getOne($actives_sql))) {
        exit();
    }
    echo "$db_stat\n";
    exit();
}

$db_stat_sql = "SELECT {$argv[2]} FROM pg_stat_database WHERE datname = '$db_name';";
if (DB::isError($db_stat = $db->getOne($db_stat_sql))) {
    exit();
}
echo "$db_stat\n";

erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Partitioning
On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Erik Jones wrote: Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: That's a good idea, but first I'll still need to run it by my sysadmin wrt space -- our dump files are around 22GB when we can let them finish these days. Given that we're now speculating about regex problems, you could do a test run of "pg_dump -s" with logging enabled; that shouldn't take an unreasonable amount of time or space. regards, tom lane Sounds like a post-lunch plan! By the way, even though this isn't even solved yet, thank you for all of your help! Ok, this ended up taking a bit longer to get to due to the fact that we've been building indexes on our user tables off and on for the last few days. But, I'm back on it now. Here is my general plan of action: I'm going to do a schema dump of the pg_catalog schema from a fresh, clean 8.2 install and, tomorrow night after I do the same against the db we've been having issues with, diff the two to see if there are any glaring discrepancies. While running the dump from the live db I will have statement logging on for the dump, are there any queries or query lengths that I should pay particular attention to? -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: That's a good idea, but first I'll still need to run it by my sysadmin wrt space -- our dump files are around 22GB when we can let them finish these days. Given that we're now speculating about regex problems, you could do a test run of "pg_dump -s" with logging enabled; that shouldn't take an unreasonable amount of time or space. regards, tom lane Sounds like a post-lunch plan! By the way, even though this isn't even solved yet, thank you for all of your help! -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: Guillaume Smet wrote: Could you set log_min_duration_statement=0 on your server and enable Heh, unfortunately, setting log_min_duration_statement=0 would be a total last resort as the last we counted (2 months ago) we were doing approximately 3 million transactions per hour. Do it just for the pg_dump: export PGOPTIONS="--log_min_duration_statement=0" pg_dump ... I don't think that the regex issue explains pg_dump being slow, unless perhaps you are making use of the table-selection switches? That's a good idea, but first I'll still need to run it by my sysadmin wrt space -- our dump files are around 22GB when we can let them finish these days. We do have plans to move off of the dump to a snapshot backup strategy that will eventually lead to a PITR warm-standby setup but, first, we want to make sure we have a stable, fast, up-to-date server -- our web servers are still connecting to the db via 8.1.4 client libs since, given what we've seen of the track record of 8.2 client libs on our setup, we're a bit reticent to move the rest of the application over. While I wait to see what we can do about logging everything during the dump I'll probably build 8.2 on a remote linux machine and see how connecting via those tools compares. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Guillaume Smet wrote: Erik, Could you set log_min_duration_statement=0 on your server and enable logging (tutorial here if you don't know how to do that: http://pgfouine.projects.postgresql.org/tutorial.html). You should see which queries are executed in both cases and find the slow one easily. Heh, unfortunately, setting log_min_duration_statement=0 would be a total last resort as the last we counted (2 months ago) we were doing approximately 3 million transactions per hour. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
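To see why logging every statement at that volume is a last resort, a back-of-the-envelope sketch (the 150-byte average log line is an assumption for illustration; real lines vary widely with query length):

```python
def log_volume_gb_per_day(stmts_per_hour: int, avg_line_bytes: int = 150) -> float:
    """Rough daily log size if every statement is logged at
    log_min_duration_statement=0 (assumed 150-byte average line)."""
    return stmts_per_hour * 24 * avg_line_bytes / 1024**3

vol = log_volume_gb_per_day(3_000_000)   # roughly 10 GB of log per day
```

Beyond the disk space, writing that much log contends for the same I/O the database needs, which is the other half of why full statement logging on a busy box is avoided unless scoped to one session (e.g. via PGOPTIONS for just the pg_dump run, as suggested elsewhere in the thread).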
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: Tom Lane wrote: I could see this taking an unreasonable amount of time if you had a huge number of pg_class rows or a very long search_path --- is your database at all out of the ordinary in those ways? Well, running "select count(*) from pg_class;" returns 524699 rows Ouch. our search path is the default. I'd also like to reiterate that \d pg_class returns instantly when run from the 8.1.4 psql client connected to the 8.2 db. I think I know where the problem is: would you compare timing of select * from pg_class c where c.relname ~ '^(pg_class)$'; Approximately 4 seconds. select * from pg_class c where c.relname ~ '^pg_class$'; Instant. Recent versions of psql put parentheses into the regex pattern for safety in case it's got "|", but I just realized that that probably confuses the optimizer's check for an indexable regex :-( However, this only explains slowdown in psql's \d commands, which wasn't your original complaint ... Well, it explains the slowdown wrt a query against the catalog tables by a postgres client application. Were there any changes made like this to pg_dump and/or pg_restore? -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
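A toy illustration of the prefix-extraction idea being discussed (this is not PostgreSQL's planner code; it just shows why `'^pg_class$'` yields a literal prefix usable for an index range scan while `'^(pg_class)$'` can defeat a naive extractor that stops at the first special character):

```python
def indexable_prefix(pattern: str) -> str:
    """Toy extraction of a fixed prefix from an anchored regex: walk
    from the '^' anchor and stop at the first character with special
    regex meaning. (Simplified; PostgreSQL's real check is smarter
    and was later taught to see through the parentheses.)"""
    if not pattern.startswith('^'):
        return ''                     # unanchored: no usable prefix at all
    prefix = []
    for ch in pattern[1:]:
        if ch in '()[]{}.*+?|\\$':
            break                     # special char ends the literal prefix
        prefix.append(ch)
    return ''.join(prefix)

assert indexable_prefix('^pg_class$') == 'pg_class'
assert indexable_prefix('^(pg_class)$') == ''   # parens hide the prefix
```

With an empty prefix the planner has nothing to turn into an index condition, so the pattern falls back to scanning and testing every row, which on a 500k-row pg_class is exactly the 4-second vs. instant difference reported above.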
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: ... sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 send(4, " Q\0\0\0E5 S E L E C T ".., 230, 0) = 230 <--- Hang is right here! sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 pollsys(0x08046EE8, 1, 0x, 0x) (sleeping...) pollsys(0x08046EE8, 1, 0x, 0x) = 1 recv(4, " T\0\0\0 P\003 o i d\0\0".., 16384, 0) = 140 ... Hmph. So it seems the delay really is on the server's end. Any chance you could truss the connected backend process too and see what it's doing? Actually ... before you do that, the first query for "\d pg_class" should look like SELECT c.oid, n.nspname, c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname ~ '^(pg_class)$' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 2, 3; I could see this taking an unreasonable amount of time if you had a huge number of pg_class rows or a very long search_path --- is your database at all out of the ordinary in those ways? Well, running "select count(*) from pg_class;" returns 524699 rows and our search path is the default. I'd also like to reiterate that \d pg_class returns instantly when run from the 8.1.4 psql client connected to the 8.2 db. How would I go about determining which backend server process psql was attached to? -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: Hmm... This gets stranger and stranger. When connecting to the database with the psql client in 8.2's bin directory and using commands such as \d the client hangs, or takes an extremely long time. Hangs at what point? During connection? Try strace'ing psql (or whatever the Solaris equivalent is) to see what it's doing. Ok, here's the truss output when attached to psql with "\d pg_class", I put a marker where the pause is. Note that today the pause is only (sic) about 3-4 seconds long before the command completes and the output is displayed and that the only difference in the system between yesterday and today is that today we don't have a dump running. I realize that most of this output below is unnecessary, but while I know what most of this is doing individually, I wouldn't know what to cut out for brevity's sake without accidentally also clipping something that is needed.

read(0, 0x08047B7B, 1) (sleeping...)
read(0, " \", 1) = 1
write(1, " \", 1) = 1
read(0, " d", 1) = 1
write(1, " d", 1) = 1
read(0, " ", 1) = 1
write(1, " ", 1) = 1
read(0, " ", 1) = 1
write(1, " ", 1) = 1
read(0, " p", 1) = 1
write(1, " p", 1) = 1
read(0, "7F", 1) = 1
write(1, "\b \b", 3) = 3
read(0, "7F", 1) = 1
write(1, "\b \b", 3) = 3
read(0, " p", 1) = 1
write(1, " p", 1) = 1
read(0, " g", 1) = 1
write(1, " g", 1) = 1
read(0, " _", 1) = 1
write(1, " _", 1) = 1
read(0, " c", 1) = 1
write(1, " c", 1) = 1
read(0, " l", 1) = 1
write(1, " l", 1) = 1
read(0, " a", 1) = 1
write(1, " a", 1) = 1
read(0, " s", 1) = 1
write(1, " s", 1) = 1
read(0, " s", 1) = 1
write(1, " s", 1) = 1
read(0, "\r", 1) = 1
write(1, "\n", 1) = 1
lwp_sigmask(SIG_SETMASK, 0x0002, 0x) = 0xFFBFFEFF [0x]
ioctl(0, TCSETSW, 0xFEF431E0) = 0
lwp_sigmask(SIG_SETMASK, 0x, 0x) = 0xFFBFFEFF [0x]
sigaction(SIGINT, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGTERM, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGQUIT, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGALRM, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGTSTP, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGTTOU, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGTTIN, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGWINCH, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGWINCH, 0x08047B80, 0x08047BD0) = 0
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
send(4, " Q\0\0\0E5 S E L E C T ".., 230, 0) = 230 <--- Hang is right here!
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
pollsys(0x08046EE8, 1, 0x, 0x) (sleeping...)
pollsys(0x08046EE8, 1, 0x, 0x) = 1
recv(4, " T\0\0\0 P\003 o i d\0\0".., 16384, 0) = 140
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
send(4, " Q\0\0\08F S E L E C T ".., 144, 0) = 144
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
pollsys(0x08046EE8, 1, 0x, 0x) = 1
recv(4, " T\0\0\0D3\007 r e l h a".., 16384, 0) = 272
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
send(4, " Q\0\00186 S E L E C T ".., 391, 0) = 391
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
pollsys(0x08046EE8, 1, 0x, 0x) = 1
recv(4, " T\0\0\08F\005 a t t n a".., 16384, 0) = 1375
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
send(4, " Q\0\001 g S E L E C T ".., 360, 0) = 360
sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0
pollsys(0x08046EE8, 1, 0x, 0x) = 1
recv(4, " T\0\0\0DD\007 r e l n
Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)
Richard Huxton wrote: Erik Jones wrote: Hmm... This gets stranger and stranger. When connecting to the database with the psql client in 8.2's bin directory and using commands such as \d the client hangs, or takes an extremely long time. If we connect to the same 8.2 database with a psql client from 8.1.4, both remotely and locally, \d responds immediately. Could the issue be with the client programs somehow? Couldn't be some DNS problems that only affect the 8.2 client I suppose? Hmm... I don't see how that would matter when the 8.2. client is being run locally. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] More 8.2 client issues (Was: [Slow dump?)
Hmm... This gets stranger and stranger. When connecting to the database with the psql client in 8.2's bin directory and using commands such as \d the client hangs, or takes an extremely long time. If we connect to the same 8.2 database with a psql client from 8.1.4, both remotely and locally, \d responds immediately. Could the issue be with the client programs somehow? Note also that we did our migration over the xmas weekend using the dump straight into a restore command. We kicked it off Saturday (12-23-06) night and it had just reached the point of adding foreign keys the morning of the 26th. We stopped it there, wrote a script to go through and build indexes (which finished in a timely manner) and have added just the foreign keys strictly necessary for our application's functionality (i.e. foreign keys set to cascade on update/delete, etc...). Original Message Subject:Re: [PERFORM] Slow dump? Date: Tue, 02 Jan 2007 11:40:18 -0600 From: Erik Jones <[EMAIL PROTECTED]> To: Tom Lane <[EMAIL PROTECTED]> CC: pgsql-performance@postgresql.org References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: Hello, we recently migrated our system from 8.1.x to 8.2 and when running dumps have noticed an extreme decrease in speed where the dump is concerned (by more than a factor of 2). That's odd. pg_dump is normally pretty much I/O bound, at least assuming your tables are sizable. The only way it wouldn't be is if you have a datatype with a very slow output converter. Have you looked into exactly which tables are slow to dump and what datatypes they contain? (Running pg_dump with log_min_duration_statement enabled would provide useful data about which steps take a long time, if you're not sure.) regards, tom lane Well, all of our tables use pretty basic data types: integer (various sizes), text, varchar, boolean, and timestamps without time zone. 
In addition, other than not having a lot of our foreign keys in place, there have been no other schema changes since the migration. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Slow dump?
Tom Lane wrote: Erik Jones <[EMAIL PROTECTED]> writes: Hello, we recently migrated our system from 8.1.x to 8.2 and when running dumps have noticed an extreme decrease in speed where the dump is concerned (by more than a factor of 2). That's odd. pg_dump is normally pretty much I/O bound, at least assuming your tables are sizable. The only way it wouldn't be is if you have a datatype with a very slow output converter. Have you looked into exactly which tables are slow to dump and what datatypes they contain? (Running pg_dump with log_min_duration_statement enabled would provide useful data about which steps take a long time, if you're not sure.) regards, tom lane Well, all of our tables use pretty basic data types: integer (various sizes), text, varchar, boolean, and timestamps without time zone. In addition, other than not having a lot of our foreign keys in place, there have been no other schema changes since the migration. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Slow dump?
Hello, we recently migrated our system from 8.1.x to 8.2 and when running dumps have noticed an extreme decrease in speed where the dump is concerned (by more than a factor of 2). I was wondering if someone might offer some suggestions as to what may be causing the problem. How important are max_fsm_pages and max_fsm_relations to doing a dump? I was just looking over your config file and that's the only thing that jumped out at me as needing to be changed. Machine info: OS: Solaris 10 Sunfire X4100 XL 2x AMD Opteron Model 275 dual core procs 8GB of ram Pertinent postgres settings: shared_buffers: 5 work_mem: 8192 maintenance_work_mem: 262144 max_stack_depth: 3048 (default) There doesn't seem to be any other performance degradation while the dump is running (which I suppose is good). Any ideas? -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Problems with an update-from statement and pg-8.1.4
Rafael Martinez wrote: On Wed, 2006-12-06 at 14:19 -0600, Erik Jones wrote: Rafael Martinez wrote: On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote: Stephan Szabo wrote: On Wed, 6 Dec 2006, Rafael Martinez wrote: mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; I don't think this statement does what you expect. You're ending up with two copies of mail in the above, one as "mail" and one as "m". You probably want to remove the mail m in FROM and use mail rather than m in the where clause. Worse yet, I think you're setting "spamscore" for EVERY row in mail to '-5.026'. The above solution should fix it though. -- Ted Thanks for the answers. I think the 'problem' is explained in the documentation: "fromlist A list of table expressions, allowing columns from other tables to appear in the WHERE condition and the update expressions. This is similar to the list of tables that can be specified in the FROM clause of a SELECT statement. Note that the target table must not appear in the fromlist, unless you intend a self-join (in which case it must appear with an alias in the fromlist)". And as you said, we cannot have 'mail m' in the FROM clause. I have contacted the developers and they will change the statement. 
I gave them these 2 examples:

---
mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';

 Nested Loop (cost=0.00..6.54 rows=1 width=57)
   -> Index Scan using received_queue_id_index on mail_received mr (cost=0.00..3.20 rows=1 width=4)
        Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
        Filter: (mailhost = '129.240.10.47'::inet)
   -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 width=57)
        Index Cond: ("outer".mail_id = mail.mail_id)
(6 rows)

mailstats=# explain update mail SET spamscore = '-5.026' where mail_id = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1' and mailhost = '129.240.10.47');

 Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57)
   Index Cond: (mail_id = $0)
   InitPlan
     -> Index Scan using received_queue_id_index on mail_received (cost=0.00..3.20 rows=1 width=4)
          Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
          Filter: (mailhost = '129.240.10.47'::inet)
(6 rows)
---

Look again at the estimated costs of those two query plans. You haven't gained anything there. Try this out: EXPLAIN UPDATE mail SET spamscore = '-5.026' FROM mail_received mr WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ; Haven't we? * In the statement with problems we got this: Nested Loop (cost=0.00..932360.78 rows=7184312 width=57) * In the ones I sent: Nested Loop (cost=0.00..6.54 rows=1 width=57) Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) * And in the last one you sent me:

--
 Nested Loop (cost=0.00..6.53 rows=1 width=57)
   -> Index Scan using received_queue_id_index on mail_received mr (cost=0.00..3.20 rows=1 width=4)
        Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
   -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 width=57)
        Index Cond: (mail.mail_id = "outer".mail_id)
(5 rows)
--

I can not see the difference. 
regards, Ah, sorry, I was just looking at the two that you sent in your last message thinking that they were 'old' and 'new', not both 'new'. My bad... -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Problems with an update-from statement and pg-8.1.4
Rafael Martinez wrote: On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote: Stephan Szabo wrote: On Wed, 6 Dec 2006, Rafael Martinez wrote: mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; I don't think this statement does what you expect. You're ending up with two copies of mail in the above, one as "mail" and one as "m". You probably want to remove the mail m in FROM and use mail rather than m in the where clause. Worse yet, I think you're setting "spamscore" for EVERY row in mail to '-5.026'. The above solution should fix it though. -- Ted Thanks for the answers. I think the 'problem' is explained in the documentation: "fromlist A list of table expressions, allowing columns from other tables to appear in the WHERE condition and the update expressions. This is similar to the list of tables that can be specified in the FROM clause of a SELECT statement. Note that the target table must not appear in the fromlist, unless you intend a self-join (in which case it must appear with an alias in the fromlist)". And as you said, we cannot have 'mail m' in the FROM clause. I have contacted the developers and they will change the statement. 
I gave them these 2 examples:

---
mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';

 Nested Loop (cost=0.00..6.54 rows=1 width=57)
   -> Index Scan using received_queue_id_index on mail_received mr (cost=0.00..3.20 rows=1 width=4)
        Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
        Filter: (mailhost = '129.240.10.47'::inet)
   -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 width=57)
        Index Cond: ("outer".mail_id = mail.mail_id)
(6 rows)

mailstats=# explain update mail SET spamscore = '-5.026' where mail_id = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1' and mailhost = '129.240.10.47');

 Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57)
   Index Cond: (mail_id = $0)
   InitPlan
     -> Index Scan using received_queue_id_index on mail_received (cost=0.00..3.20 rows=1 width=4)
          Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
          Filter: (mailhost = '129.240.10.47'::inet)
(6 rows)
---

Look again at the estimated costs of those two query plans. You haven't gained anything there. Try this out: EXPLAIN UPDATE mail SET spamscore = '-5.026' FROM mail_received mr WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ; -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 6: explain analyze is your friend
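The difference between the broken and corrected statements in this thread can be simulated outside the database. A sketch with made-up rows (hypothetical `broken_update`/`correct_update` helpers) showing that repeating the target table in FROM lets every target row be updated, while the corrected form touches only the matched row:

```python
def broken_update(mail, m, mr):
    """UPDATE mail ... FROM mail m, mail_received mr WHERE mr.mail_id =
    m.mail_id: the join constrains the alias m, not the target mail, so
    if the join yields any row at all, EVERY target row gets the new
    spamscore. Returns the number of rows updated."""
    join_rows = [(a, b) for a in m for b in mr if a['mail_id'] == b['mail_id']]
    if join_rows:
        for row in mail:
            row['spamscore'] = -5.026
    return sum(1 for r in mail if r['spamscore'] is not None)

def correct_update(mail, mr):
    """UPDATE mail ... FROM mail_received mr WHERE mail.mail_id =
    mr.mail_id: the target itself is constrained by the join."""
    matched = {b['mail_id'] for b in mr}
    for row in mail:
        if row['mail_id'] in matched:
            row['spamscore'] = -5.026
    return sum(1 for r in mail if r['spamscore'] is not None)

updated_broken = broken_update(
    [{'mail_id': i, 'spamscore': None} for i in range(5)],
    [{'mail_id': i} for i in range(5)],
    [{'mail_id': 2}])          # all 5 target rows updated
updated_correct = correct_update(
    [{'mail_id': i, 'spamscore': None} for i in range(5)],
    [{'mail_id': 2}])          # only the mail_id = 2 row updated
```

This also matches the plan costs quoted above: the broken form's Nested Loop estimated millions of result rows, while the corrected forms estimate one.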
Re: [PERFORM] Locking in PostgreSQL?
Casey Duncan wrote: On Dec 5, 2006, at 11:04 PM, Joost Kraaijeveld wrote: Does PostgreSQL lock the entire row in a table if I update only 1 column? Know that updating 1 column is actually updating the whole row. So if one transaction updates column A of a row, it will block another concurrent transaction that tries to update column B of the same row. As was mentioned however, neither of these transactions block others reading the row in question, though they see the row as it existed before the updates until those update transactions commit. If you know that your application will suffer excessive update contention trying to update different columns of the same row, you could consider splitting the columns into separate tables. This is an optimization to favor write contention over read performance (since you would likely need to join the tables when selecting) and I wouldn't do it speculatively. I'd only do it if profiling the application demonstrated significantly better performance with two tables. -Casey Or, come up with some kind of (pre)caching strategy for your updates wherein you could then combine multiple updates to the same row into one update. -- erik jones <[EMAIL PROTECTED]> software development emma(r) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
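That write-combining suggestion could look something like the following sketch (hypothetical `UpdateBuffer` class, not a real library): multiple column-level updates to the same row are merged in memory and flushed as one write per row.

```python
class UpdateBuffer:
    """Coalesce column-level updates per row key so each row is written
    once per flush (a sketch of the write-combining idea; a real
    implementation would also bound buffer size and flush on a timer)."""
    def __init__(self):
        self.pending = {}              # row key -> merged column dict

    def update(self, key, **columns):
        # Later updates to the same column of the same row win.
        self.pending.setdefault(key, {}).update(columns)

    def flush(self):
        """Return one merged column set per row and clear the buffer;
        a real version would issue a single UPDATE per entry here."""
        merged, self.pending = self.pending, {}
        return merged

buf = UpdateBuffer()
buf.update(42, spamscore=-5.026)
buf.update(42, flagged=True)           # merged with the earlier update
batch = buf.flush()                    # one combined update for row 42
```

The trade-off is durability: anything sitting in the buffer is lost if the process dies before a flush, so this only suits data that can tolerate a small window of loss in exchange for less row-level update contention.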