Re: [PERFORM] ECC RAM really needed?
On Fri, May 25, 2007 at 18:45:15 -0700, Craig James <[EMAIL PROTECTED]> wrote:
> We're thinking of building some new servers. We bought some a while back that have ECC (error correcting) RAM, which is absurdly expensive compared to the same amount of non-ECC RAM.
> Does anyone have any real-life data about the error rate of non-ECC RAM, and whether it matters or not? In my long career, I've never once had a computer that corrupted memory, or at least I never knew if it did.
> ECC sounds like a good idea, but is it solving a non-problem?

In the past when I purchased ECC RAM it wasn't that much more expensive than non-ECC RAM. Wikipedia suggests a rule of thumb of one error per month per gigabyte, though it notes that error rates vary widely. It references a paper that should provide you with more background.
Re: [PERFORM] BUG #3270: limit < 16 optimizer behaviour
This should have been asked on the performance list, not filed as a bug. I doubt anyone will have a complete answer to your question without EXPLAIN ANALYZE output from the query. Have you ANALYZEd the tables recently? Poor statistics are one possible cause of the issue you are having.

On Fri, May 11, 2007 at 14:07:57 +0000, Liviu Ionescu <[EMAIL PROTECTED]> wrote:
> The following bug has been logged online:
>
> Bug reference: 3270
> Logged by: Liviu Ionescu
> Email address: [EMAIL PROTECTED]
> PostgreSQL version: 8.2.4
> Operating system: Linux
> Description: limit < 16 optimizer behaviour
> Details:
>
> I have a table of about 15M rows, and a query like this:
>
> SELECT historianid,storagedate,slotdate,status,value FROM historiandata
> JOIN rtunodes ON(historiandata.historianid=rtunodes.nodeid)
> JOIN rtus ON(rtunodes.rtuid=rtus.nodeid)
> WHERE realmid IN (1119,1422,698,1428)
> AND historianid in (2996)
> ORDER BY storagedate desc
> LIMIT 10
>
> If there are no records with the given historianid, and limit is >= 16, the query is quite fast; otherwise it takes forever.
>
> My current fix was to always increase the limit to 16, but, although I know the optimizer behaviour depends on LIMIT, I still feel this looks like a bug; if the resultset has no records the value of the LIMIT should not matter.
>
> regards,
>
> Liviu Ionescu
>
> CREATE TABLE historiandata
> (
>   historianid int4 NOT NULL,
>   status int2 NOT NULL DEFAULT 0,
>   value float8,
>   slotdate timestamptz NOT NULL,
>   storagedate timestamptz NOT NULL DEFAULT now(),
>   CONSTRAINT historiandata_pkey PRIMARY KEY (historianid, slotdate),
>   CONSTRAINT historianid_fkey FOREIGN KEY (historianid)
>     REFERENCES historians (nodeid) MATCH SIMPLE
>     ON UPDATE CASCADE ON DELETE RESTRICT
> )
> WITHOUT OIDS;
> ALTER TABLE historiandata OWNER TO tomcat;
>
> CREATE INDEX historiandata_historianid_index
>   ON historiandata USING btree (historianid);
>
> CREATE INDEX historiandata_slotdate_index
>   ON historiandata USING btree (slotdate);
>
> CREATE INDEX historiandata_storagedate_index
>   ON historiandata USING btree (storagedate);
>
> CREATE TABLE rtunodes
> (
>   nodeid int4 NOT NULL,
>   rtuid int4 NOT NULL,
>   no_publicnodeid int4,
>   name varchar(64) NOT NULL,
>   isinvalid bool NOT NULL DEFAULT false,
>   nodetype varchar(16),
>   CONSTRAINT rtunodes_pkey PRIMARY KEY (nodeid),
>   CONSTRAINT nodeid_fkey FOREIGN KEY (nodeid)
>     REFERENCES nodes (nodeid) MATCH SIMPLE
>     ON UPDATE CASCADE ON DELETE RESTRICT,
>   CONSTRAINT rtuid_fkey FOREIGN KEY (rtuid)
>     REFERENCES rtus (nodeid) MATCH SIMPLE
>     ON UPDATE CASCADE ON DELETE RESTRICT
> )
> WITHOUT OIDS;
> ALTER TABLE rtunodes OWNER TO tomcat;
>
> CREATE TABLE rtus
> (
>   nodeid int4 NOT NULL,
>   passwd varchar(10) NOT NULL,
>   xml text,
>   no_nextpublicnodeid int4 NOT NULL DEFAULT 1,
>   rtudriverid int2,
>   realmid int4 NOT NULL,
>   enablegetlogin bool NOT NULL DEFAULT false,
>   enablegetconfig bool NOT NULL DEFAULT false,
>   businfoxml text,
>   uniqueid varchar(32) NOT NULL,
>   no_publicrtuid int4,
>   loginname varchar(10) NOT NULL,
>   protocolversion varchar(8) DEFAULT '0.0'::character varying,
>   isinvalid bool DEFAULT false,
>   CONSTRAINT rtus_pkey PRIMARY KEY (nodeid),
>   CONSTRAINT nodeid_fkey FOREIGN KEY (nodeid)
>     REFERENCES nodes (nodeid) MATCH SIMPLE
>     ON UPDATE CASCADE ON DELETE RESTRICT,
>   CONSTRAINT realmid_fkey FOREIGN KEY (realmid)
>     REFERENCES realms (nodeid) MATCH SIMPLE
>     ON UPDATE CASCADE ON DELETE RESTRICT,
>   CONSTRAINT rtudriverid_fkey FOREIGN KEY (rtudriverid)
>     REFERENCES rtudrivers (rtudriverid) MATCH SIMPLE
>     ON UPDATE CASCADE ON DELETE RESTRICT,
>   CONSTRAINT rtus_loginname_unique UNIQUE (loginname),
>   CONSTRAINT rtus_uniqueid_unique UNIQUE (uniqueid)
> )
> WITHOUT OIDS;
> ALTER TABLE rtus OWNER TO tomcat;
>
> CREATE INDEX rtus_realmid_index
>   ON rtus USING btree (realmid);
>
> CREATE INDEX rtus_rtudriverid_index
>   ON rtus USING btree (rtudriverid);
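For reference, a minimal sketch of the diagnostics suggested above, using the tables from the quoted report:

  ANALYZE historiandata;
  ANALYZE rtunodes;
  ANALYZE rtus;

  EXPLAIN ANALYZE
  SELECT historianid, storagedate, slotdate, status, value
  FROM historiandata
  JOIN rtunodes ON (historiandata.historianid = rtunodes.nodeid)
  JOIN rtus ON (rtunodes.rtuid = rtus.nodeid)
  WHERE realmid IN (1119, 1422, 698, 1428)
    AND historianid IN (2996)
  ORDER BY storagedate DESC
  LIMIT 10;

Posting the resulting plan, ideally for both a fast and a slow LIMIT value, is what makes the behaviour diagnosable.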
Re: [PERFORM] Two hard drives --- what to do with them?
On Wed, Feb 28, 2007 at 05:21:41 +1030, Shane Ambler <[EMAIL PROTECTED]> wrote:
> The difference between SCSI and IDE/SATA in this case is that a lot if not all IDE/SATA drives tell you that the cache is disabled when you ask it to, but they either don't actually disable it or they don't retain the setting, so you get caught later. SCSI disks can be trusted when you set this option.

I have some Western Digital Caviars and they don't lie about disabling write caching.
Re: [PERFORM] Two hard drives --- what to do with them?
On Tue, Feb 27, 2007 at 15:35:13 +1030, Shane Ambler <[EMAIL PROTECTED]> wrote:
> From all that I have heard this is another advantage of SCSI disks - they honor these settings as you would expect - many IDE/SATA disks often say "sure I'll disable the cache" but continue to use it or don't retain the setting after restart.

It is easy enough to test whether your disks lie about disabling the cache. I doubt that it is all that common for modern disks to do that.
Re: [PERFORM] Two hard drives --- what to do with them?
On Sun, Feb 25, 2007 at 23:11:01 +0100, Peter Kovacs <[EMAIL PROTECTED]> wrote:
> A related question:
> Is it sufficient to disable write cache only on the disk where pg_xlog is located? Or should write cache be disabled on both disks?

With recent Linux kernels you may also have the option to use write barriers instead of disabling caching. You need to make sure all of your stacked block devices will handle it, and most versions of software RAID (other than RAID 1) won't. This won't be a lot faster, since at sync points the OS still needs to order a cache flush, but it does give the disks a chance to reorder some commands in between flushes.
Re: [PERFORM] How long should it take to insert 200,000 records?
On Tue, Feb 06, 2007 at 10:31:26 -0800, Mark Lewis <[EMAIL PROTECTED]> wrote:
> Sure it's possible:
>
> CREATE TABLE parent (col1 int4);
> -- insert many millions of rows into parent
> CREATE TABLE child (col1 int4 REFERENCES parent(col1));
> -- insert many millions of rows into child, very very slowly.

I don't think Postgres allows this. You don't have to have an index on the child table, but you do need one on the parent table. Quoting from http://developer.postgresql.org/pgdocs/postgres/sql-createtable.html:

  The referenced columns must be the columns of a unique or primary key constraint in the referenced table.
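In other words, a sketch of what the quoted example would need before the foreign key is even accepted (same table and column names as above):

  CREATE TABLE parent (col1 int4 PRIMARY KEY);  -- or at least UNIQUE
  -- insert many millions of rows into parent
  CREATE TABLE child (col1 int4 REFERENCES parent (col1));

Without the PRIMARY KEY (or UNIQUE) constraint on parent.col1 the REFERENCES clause is rejected, and the index that comes with that constraint is what keeps each child insert from scanning the whole parent table.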
Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)
On Tue, Feb 06, 2007 at 09:39:54 -0500, Mark Stosberg <[EMAIL PROTECTED]> wrote:
> I've been investigating partial indexes for the pets table. It has about 300,000 rows, but only about 10 are "active", and those are the ones we care about. Queries are also frequently made on males vs females, dogs vs cats

It probably won't pay to make partial indexes on sex or species (at least for the popular ones), as you aren't likely to save enough by eliminating only half the cases to make up for maintaining another index. A partial index for active rows probably does make sense.

> or specific ages, and those specific cases seem like possible candidates for partial indexes as well. I played with that approach some, but had trouble coming up with anything that benchmarked faster.
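A sketch of the kind of partial index that does tend to pay off here (the column names are assumptions, since the pets schema isn't shown):

  CREATE INDEX pets_active_idx ON pets (pet_id) WHERE active;

Queries that repeat the same WHERE active condition can use this small index, and the inactive rows never appear in it at all.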
Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)
On Mon, Feb 05, 2007 at 18:01:05 -0500, Mark Stosberg <[EMAIL PROTECTED]> wrote:
> It's also notable that the units used are meters, not miles like geo_distance(). That's what the magic number of "16093.44" is -- 10 miles converted to meters.

You can change the earth() function in earthdistance.sql before running it to use some unit other than meters:

  -- earth() returns the radius of the earth in meters. This is the only
  -- place you need to change things for the cube base distance functions
  -- in order to use different units (or a better value for the Earth's radius).
  CREATE OR REPLACE FUNCTION earth() RETURNS float8
  LANGUAGE 'sql' IMMUTABLE
  AS 'SELECT ''6378168''::float8';

> However, my next step was to try a more "real world" query that involved a more complex where clause and a couple of table joins. So far, that result is coming out /slower/ with the new approach, even though the index is being used. I believe this may be because of the additional results found that are outside of the sphere, but inside the cube. This causes additional rows that need processing in the joined tables.

This is unlikely to be the cause. The ratio of the area of the cube to the circle for small radii (small compared to the radius of the earth, so that we can consider things flat) is 4/pi = 1.27, which shouldn't cause that much of a change. It might be that you are getting a bad plan. The guess on the selectivity of the gist constraint may not be very good. Some people here may be able to tell you more if you show us EXPLAIN ANALYZE output.
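For example, a hedged sketch of redefining earth() so the cube-based functions work in miles instead of meters (the value is roughly 6378168 m converted to miles; apply it before loading earthdistance.sql as described above):

  CREATE OR REPLACE FUNCTION earth() RETURNS float8
  LANGUAGE 'sql' IMMUTABLE
  AS 'SELECT ''3963.2''::float8';

With that in place, the "magic number" for a 10 mile search box is simply 10.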
Re: [PERFORM] optimizing a geo_distance() proximity query
On Mon, Feb 05, 2007 at 14:47:25 -0500, Mark Stosberg <[EMAIL PROTECTED]> wrote:
> This is also interesting. Is this approach practical if I want to index what's near each of about 40,000 US zipcodes, or is the approach mostly useful if there are just a small number of fixed points to address?

I think the answer depends on what your data model is. If you treat each zip code as having a location at a single point, the earth distance stuff should work. If you are trying to include the shape of each zip code in your model and measure distances to the nearest point of zip codes, then you will probably be better off using PostGIS.
Re: [PERFORM] optimizing a geo_distance() proximity query
On Sat, Feb 03, 2007 at 14:00:26 -0500, Mark Stosberg <[EMAIL PROTECTED]> wrote:
> I'm using geo_distance() from contrib/earthdistance and would like to find a way to speed up the geo distance calculation if possible. This is for a proximity search: "Show me adoptable pets within 250 miles of this zipcode".

If you are using the "cube" based part of the earth distance package, then you can use gist indexes to speed those searches up. There are functions for creating boxes that include all of the points within some distance of a fixed point. This is lossy, so you need to recheck if you don't want some points a bit farther away returned. Also you would need to pick a point to be where the zip code is located, rather than using area based zip codes. However, if you have actual addresses you could use the TIGER database to locate them instead of just zip code locations.
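A rough sketch of that cube/earthdistance approach (table and column names are assumptions; 402336 is 250 miles expressed in meters):

  CREATE INDEX pets_location_idx ON pets USING gist (ll_to_earth(lat, lon));

  SELECT *
  FROM pets
  WHERE earth_box(ll_to_earth(45.0, -93.0), 402336) @> ll_to_earth(lat, lon)
    -- the box is lossy, so recheck with the exact distance
    AND earth_distance(ll_to_earth(45.0, -93.0), ll_to_earth(lat, lon)) < 402336;
    -- note: on older releases the cube containment operator may be spelled @ rather than @>

The index satisfies the box condition, and the earth_distance() recheck throws out the corner points that are inside the cube but outside the circle.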
Re: [PERFORM] Querying distinct values from a large table
On Tue, Jan 30, 2007 at 14:33:34 +0600, Igor Lobanov <[EMAIL PROTECTED]> wrote:
> Greetings!
>
> I have rather large table with about 5 millions of rows and a dozen of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need to query distinct pairs of ('a';'b') from this table.
>
> Is there any way to somehow improve the performance of this operation? Table can not be changed.

DISTINCT currently can't use a hash aggregate plan and will use a sort. If there aren't many distinct values, the hash aggregate plan will run much faster. To get around this limitation, rewrite the query as a GROUP BY. Something like:

  SELECT a, b FROM table GROUP BY a, b;
Re: [PERFORM] slow result
On Tue, Jan 23, 2007 at 11:34:52 +0100, Laurent Manchon <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a slow response of my PostgreSQL database 7.4 using this query below on a table with 80 rows:
>
> select count(*) from tbl;
>
> PostgreSQL returns the result in 28 sec every time, although MS-SQL returns the result in 0.02 sec every time.

Besides the other advice mentioned in this thread, check that you don't have a lot of dead tuples in that table. 28 seconds seems a bit high for even a sequential scan of 80 tuples unless they are pretty large.
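A quick way to check for that kind of bloat (a sketch; on 7.4 the dead row and page counts show up in the VACUUM VERBOSE output):

  VACUUM VERBOSE tbl;
  -- compare the reported page count with what the live rows should need
  SELECT relpages, reltuples FROM pg_class WHERE relname = 'tbl';

If relpages is huge for a tiny row count, the table needs a one-time VACUUM FULL (or CLUSTER), followed by more frequent routine vacuums.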
Re: [PERFORM] DB benchmark and pg config file help
On Fri, Jan 19, 2007 at 09:05:35 -0500, Kevin Hunter <[EMAIL PROTECTED]> wrote:
> Seriously though, that would have bitten me. Thank you, I did not know that. Does that mean that I can't publish the results outside of my work/research/personal unit at all? Or do I just need to obscure about which DB I'm talking? (Like Vendor {1,2,3,...} Product).

Check with your lawyer. Depending on where you are, those clauses may not even be valid.
Re: [PERFORM] Configuration Advice
> From: "Steve" <[EMAIL PROTECTED]>
> To: pgsql-performance@postgresql.org
> Sent: 1/17/2007 2:41 PM
> Subject: [PERFORM] Configuration Advice
>
> SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it -- of which there are many. Unfortunately this table gets queried in a lot of different ways and needs these indexes; also unfortunately, we have operator class indexes to support both ASC and DESC sorting on columns so these are for all intents and purposes duplicate but required under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still a requirement?)

Note that you only need to have the ASC and DESC versions of opclasses when you are going to use multicolumn indexes with some columns in ASC order and some in DESC order. For columns used by themselves in an index, you don't need to do this, no matter which order you are sorting on.
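A sketch of the distinction (the reverse-sort operator class name here is hypothetical, standing in for whatever opclass the poster built):

  -- Single column, or all columns sorted the same way: one plain index is enough,
  -- because a btree can be scanned forward for ASC and backward for DESC.
  CREATE INDEX summary_a_b_idx ON summary (a, b);

  -- Only a mixed-direction sort such as ORDER BY a ASC, b DESC needs a
  -- reverse-sort opclass on one of the columns.
  CREATE INDEX summary_a_bdesc_idx ON summary (a, b reverse_int4_ops);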
Re: [PERFORM] File Systems Compared
On Fri, Dec 15, 2006 at 10:44:39 -0600, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> The other feature I would like is to be able to use write barriers with encrypted file systems. I haven't found anything on whether or not there are near term plans by anyone to support that.

I asked about this on the dm-crypt list and was told that write barriers work pre 2.6.19. There was a change for 2.6.19 that might break things for SMP systems. But that will probably get fixed eventually.
Re: [PERFORM] File Systems Compared
On Fri, Dec 15, 2006 at 10:34:15 -0600, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> The reply wasn't directly copied to the performance list, but I will copy this one back.

Sorry about this one, I meant to intersperse my replies and hit the 'y' key at the wrong time. (And there ended up being a copy on performance anyway from the news gateway.)
Re: [PERFORM] File Systems Compared
On Thu, Dec 14, 2006 at 13:21:11 -0800, Ron Mayer <[EMAIL PROTECTED]> wrote:
> Bruno Wolff III wrote:
> > On Thu, Dec 14, 2006 at 01:39:00 -0500, Jim Nasby <[EMAIL PROTECTED]> wrote:
> >> On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
> >>> This appears to be changing under Linux. Recent kernels have write barriers implemented using cache flush commands (which some drives ignore, so you need to be careful).
>
> Is it true that some drives ignore this, or is it mostly an urban legend that was started by testers that didn't have kernels with write barrier support? I'd be especially interested in knowing if there are any currently available drives which ignore those commands.

I saw posts claiming this, but no specific drives mentioned. I did see one post that claimed that the cache flush command was mandated (not optional) by the spec.

> >>> In very recent kernels, software raid using raid 1 will also handle write barriers. To get this feature, you are supposed to mount ext3 file systems with the barrier=1 option. For other file systems, the parameter may need to be different.
>
> With XFS the default is apparently to enable write barrier support unless you explicitly disable it with the nobarrier mount option. It also will warn you in the system log if the underlying device doesn't have write barrier support.

I think there might be a similar patch for ext3 going into 2.6.19. I haven't checked a 2.6.19 kernel to make sure though.

> SGI recommends that you use the "nobarrier" mount option if you do have a persistent (battery backed) write cache on your raid device.
>
> http://oss.sgi.com/projects/xfs/faq.html#wcache
>
> >> But would that actually provide a meaningful benefit? When you COMMIT, the WAL data must hit non-volatile storage of some kind, which without a BBU or something similar, means hitting the platter. So I don't see how enabling the disk cache will help, unless of course it's ignoring fsync.
>
> With write barriers, fsync() waits for the physical disk; but I believe the background writes from write() done by pdflush don't have to; so it's kinda like only disabling the cache for WAL files and the filesystem's journal, but having it enabled for the rest of your write activity (the tables except at checkpoints? the log file?).

Not exactly. Whenever you commit the file system log or fsync the WAL file, all previously written blocks will be flushed to the disk platter before any new write requests are honored. So journalling semantics will work properly.

> > Note the use case for this is more for hobbyists or development boxes. You can only use it on software raid (md) 1, which rules out most "real" systems.
>
> Ugh. Looking for where that's documented; and hoping it is or will soon work on software 1+0 as well.

I saw a comment somewhere that raid 0 posed some problems and the suggestion was to handle the barrier at a different level (though I don't know how you could). So I don't believe 1+0 or 5 are currently supported or will be in the near term.

The other feature I would like is to be able to use write barriers with encrypted file systems. I haven't found anything on whether or not there are near term plans by anyone to support that.
Re: [PERFORM] File Systems Compared
On Thu, Dec 14, 2006 at 01:39:00 -0500, Jim Nasby <[EMAIL PROTECTED]> wrote:
> On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
> > This appears to be changing under Linux. Recent kernels have write barriers implemented using cache flush commands (which some drives ignore, so you need to be careful). In very recent kernels, software raid using raid 1 will also handle write barriers. To get this feature, you are supposed to mount ext3 file systems with the barrier=1 option. For other file systems, the parameter may need to be different.
>
> But would that actually provide a meaningful benefit? When you COMMIT, the WAL data must hit non-volatile storage of some kind, which without a BBU or something similar, means hitting the platter. So I don't see how enabling the disk cache will help, unless of course it's ignoring fsync.

When you do an fsync, the OS sends a cache flush command to the drive. On most drives (though supposedly there are ones that ignore this command) the command doesn't complete until all of the cached pages have been written to the platter, and the fsync doesn't return until the flush is complete. While this writes more sectors than you really need, it is safe. And it still allows caching to speed up some things (though not as much as having queued commands would).

I have done some tests on my systems, and the speeds I am getting make it clear that write barriers slow things down to about the same range as having caches disabled. So I believe that it is likely working as advertised.

Note the use case for this is more for hobbyists or development boxes. You can only use it on software raid (md) 1, which rules out most "real" systems.
Re: [PERFORM] File Systems Compared
On Wed, Dec 06, 2006 at 08:55:14 -0800, Mark Lewis <[EMAIL PROTECTED]> wrote:
> > Anyone run their RAIDs with disk caches enabled, or is this akin to having fsync off?
>
> Disk write caches are basically always akin to having fsync off. The only time a write-cache is (more or less) safe to enable is when it is backed by a battery or in some other way made non-volatile.
>
> So a RAID controller with a battery-backed write cache can enable its own write cache, but can't safely enable the write-caches on the disk drives it manages.

This appears to be changing under Linux. Recent kernels have write barriers implemented using cache flush commands (which some drives ignore, so you need to be careful). In very recent kernels, software raid using raid 1 will also handle write barriers. To get this feature, you are supposed to mount ext3 file systems with the barrier=1 option. For other file systems, the parameter may need to be different.
Re: [PERFORM] File Systems Compared
On Wed, Dec 06, 2006 at 18:45:56 +0100, Markus Schiltknecht <[EMAIL PROTECTED]> wrote:
> Cool, thank you for the example :-) I thought the MTA or at least the mailing list would wrap mails at some limit. I've now set word-wrap to characters (it seems not possible to turn it off completely in Thunderbird). But when writing, I'm now getting one long line.
>
> What's common practice? What's it on the pgsql mailing lists?

If you do this you should set format=flowed (see RFC 2646). If you do that, then clients can break the lines in an appropriate way. This is actually better than fixing the line width in the original message, since the recipient may not have the same number of characters (or pixels) of display as the sender.
Re: [PERFORM] BUG #2784: Performance serious degrades over a period of a month
This really should have been asked on pgsql-performance and would probably get a better response there.

On Sun, Nov 26, 2006 at 16:35:52 +0000, Michael Simms <[EMAIL PROTECTED]> wrote:
> PostgreSQL version: 8.1.4
> Operating system: Linux kernel 2.6.12
> Description: Performance serious degrades over a period of a month
> Details:
>
> OK, we have a database that runs perfectly well after a dump and restore, but over a period of a month or two, it just degrades to the point of uselessness. vacuumdb -a is run every 24 hours. We have also run for months at a time using -a -z but the effect doesn't change.

This sounds like you either need to increase your FSM setting or vacuum more often. I think vacuumdb -v will give you enough information to tell if FSM is too low at the frequency you are vacuuming.

> The database is for a counter, not the most critical part of the system, but a part of the system nonetheless. Other tables we have also degrade over time, but the counter is the most pronounced. There seems to be no common feature of the tables that degrade. All I know is that a series of queries that are run on the database every 24 hours takes 2 hours after a dump/restore. Now, 2 months after, it is taking over 12. We are seriously considering switching to mysql to avoid this issue.

You probably will want to vacuum the counter table more often than the other tables in the database. Depending on how often the counter(s) are being updated and how many separate counters are in the table, you might want to vacuum that table as often as once a minute.

Depending on your requirements you might also want to consider using a sequence instead of a table row for the counter.
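A hedged sketch of the sequence-based alternative (the sequence name is made up for illustration):

  CREATE SEQUENCE hit_counter;
  SELECT nextval('hit_counter');        -- atomically increment and return the new value
  SELECT last_value FROM hit_counter;   -- read the current value without incrementing

A sequence never accumulates dead row versions the way an UPDATE-heavy counter row does, though sequence values are not transactional, so rolled-back increments leave gaps.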
Re: [PERFORM] Simple join optimized badly?
On Mon, Oct 09, 2006 at 23:33:03 +0200, Tobias Brox <[EMAIL PROTECTED]> wrote:
> Just a comment from the side line; can't the rough "set enable_seqscan=off" be considered as sort of a hint anyway? There have been situations where we've actually had to resort to such crud.

That only works for simple queries. To be generally useful, you want to be able to hint how to handle each join being done in the query. The current controls affect all joins.
Re: [PERFORM] Unsubscribe
On Wed, Oct 04, 2006 at 08:30:03 -0700, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
> Although I 100% agree with you Bruno, it should be noted that our lists are a closed box for most people. They don't follow what is largely considered standard amongst lists, which is to have list information at the bottom of each e-mail.

There are reasons you don't want to do that. Footers work OK for single part email messages. They don't make so much sense in multipart messages. You can probably take a crap shoot and add the footer to the first text/plain part and not break things. This won't work so well for multipart/alternative messages that have text/plain and text/html parts. You could also try to insert a footer into the html part, but that's a bit trickier since you can't just put it at the end.

However, since the postgres lists are mostly just using text/plain parts for message bodies, and there are already footers being used to distribute tips, it wouldn't make things significantly worse to add unsubscribe information as well. I would prefer just making the unsubscribe instructions easy to find on the web.
Re: [PERFORM] Unsubscribe
On Wed, Oct 04, 2006 at 10:03:00 +0200, Luc Delgado <[EMAIL PROTECTED]> wrote:
> Please unsubscribe me! Thank you!

If you really can't figure out how to unsubscribe from a list, you should contact the list owner, not the list. The list members can't unsubscribe you (and it isn't their job to), and the owner may not be subscribed to the list. The convention for lists is that adding '-owner' to the local part of the list email address gives you an address for the owner.

A good way to find out how to unsubscribe from a list is to search for the mailing list using Google. Usually the information on how to subscribe and unsubscribe is in the same place, and you were able to find out how to subscribe in the first place, so you should be able to figure out how to unsubscribe by yourself as well.

> Also, it would be better to have a message foot saying how to unsubscribe.

No, the standard is that the list information is kept in the headers so that it can be extracted by mail clients that care to. There is an RFC describing these headers. They are supplied by the mailing list software used for the Postgres mailing lists. Have your mail client display full headers for one of the list messages to get the instructions from there.
Re: [PERFORM] BUG #2658: Query not using index
On Tue, Oct 03, 2006 at 12:13:43 -0700, Graham Davis <[EMAIL PROTECTED]> wrote:
> Also, the multikey index of (assetid, ts) would already be sorted and that is why using such an index in this case is faster than doing a sequential scan that does the sorting afterwards.

That isn't necessarily true. The sequential scan and sort will need a lot fewer disk seeks, and could run faster than an index scan that has the disk drives doing seeks for every tuple (in the worst case, where the on-disk order of tuples doesn't match the order in the index). If your server is caching most of the blocks, then the index scan might give better results.

You might try disabling sequential scans to try to coerce the other plan and see what results you get. If it is substantially faster the other way, then you might want to look at lowering the random page cost factor. However, since this can affect other queries, you need to be careful that you don't speed up one query at the expense of a lot of other queries.
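A sketch of that experiment, run at the session level so nothing else is affected (the query shape and table name are assumptions based on the (assetid, ts) index being discussed):

  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT * FROM assets WHERE assetid = 42 ORDER BY ts;  -- hypothetical query
  SET enable_seqscan = on;

  -- only if the index plan wins consistently, and with care:
  SET random_page_cost = 2;   -- default is 4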
Re: [PERFORM] BUG #2543: Performance delay acrros the same day
On Fri, Jul 21, 2006 at 07:41:02 +0000, Alaa El Gohary <[EMAIL PROTECTED]> wrote:
> The following bug has been logged online:

The report below isn't a bug, it's a performance question and should have been sent to [EMAIL PROTECTED] I am redirecting replies there.

> A query on the postgresql DB takes about 5 seconds and then it starts to take more time till it reaches about 60 seconds by the end of the same day. I tried vacuum but nothing changed; the only thing that works is to dump the DB, then drop and create a new one from the dump taken.
> I need to know if there is any way to restore the performance back without the need for drop and create, because I can't do this across the day.

You most likely aren't vacuuming often enough and/or don't have your FSM setting high enough.
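A sketch of what to check (the numbers are placeholders, not recommendations):

  VACUUM VERBOSE;
  -- the INFO lines at the end report how much of the free space map is in use;
  -- if it is full, raise these in postgresql.conf (a restart is needed on these versions):
  --   max_fsm_pages = 200000
  --   max_fsm_relations = 1000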
Re: [PERFORM] hyper slow after upgrade to 8.1.4
On Wed, Jul 12, 2006 at 15:41:14 -0500, Medora Schauer <[EMAIL PROTECTED]> wrote:
> I have just upgraded from 7.3.4 to 8.1.4 and now *all* db access calls are extremely slow. I didn't need to preserve any old data so at this point all my tables are empty. Just connecting to a db takes several seconds.
>
> I know I'm not giving much to go on but I'm stumped. Can anyone suggest how I might track down the cause of this problem?

That connections are slow makes me think DNS is worth looking at. It might be that reverse lookups are timing out.
Re: [PERFORM] Commit slower on faster PC
On Wed, Jul 12, 2006 at 10:16:40 -0600, "Koth, Christian (DWBI)" <[EMAIL PROTECTED]> wrote:
> I have noticed a strange performance behaviour using a commit statement on two different machines. On one of the machines the commit is many times faster than on the other machine, which has faster hardware. Server and client are always running on the same machine.
>
> Server version (same on both machines): PostgreSQL 8.1.3 (same binaries as well)
>
> PC1: IDE-HDD (approx. 50 MB/s rw), fs: ext3
> PC2: SCSI-HDD (approx. 65 MB/s rw), fs: ext3
>
> Both installations of the database have the same configuration; different from default are only the following settings on both machines:
>
> pgbench gives me the following results:
> PC1: tps = 293.625393 (excluding connections establishing)
> PC2: tps = 46.519634 (excluding connections establishing)

Have you checked to see if the IDE drive is lying about having written the data to the platters?
Re: [PERFORM] High CPU Usage - PostgreSQL 7.3
On Mon, Jul 10, 2006 at 17:55:38 +1000, Neil Hepworth <[EMAIL PROTECTED]> wrote:
> running on our server (obviously we need to update certain queries, e.g. delete .. using.. and test with 8.1 first) - I will be pushing for an upgrade as soon as possible. And the fsync=false is a

You can set add_missing_from if you want to delay rewriting queries that use that feature.
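That is, as a stopgap while the queries are being updated (a sketch; the setting still exists in 8.1 but is deprecated):

  SET add_missing_from = on;   -- or set it in postgresql.conf for all sessions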
Re: [PERFORM] Optimizer internals
On Thu, Jun 15, 2006 at 15:38:32 -0400, John Vincent <[EMAIL PROTECTED]> wrote:
> > Any suggestions? FYI the original question wasn't meant as a poke at comparing PG to MySQL to DB2. I'm not making any value judgements either way. I'm just trying to understand how we can use it the best way possible.
>
> Actually we just thought about something. With PG, we can create an index that is a SUM of the column we're indexing, no? We're going to test this in a few hours. Would that be able to be satisfied by an index scan?

No, that won't work. While you can make indexes on functions of a row, you can't make indexes on aggregate functions. You might find that making a materialized view of the information you want can help with performance.

The issues with "sum" are pretty much the same ones as with "count". You can find a couple of different ways of doing materialized views for "count" in the archives. There is a simple way of doing it that doesn't work well with lots of concurrent updates, and a more complicated method that does work well with lots of concurrent updates.
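A minimal sketch of the simple trigger-maintained approach (table and column names are invented for illustration; as noted, this naive single-row version suffers under heavy concurrent updates because every writer locks the same summary row):

  CREATE TABLE sales_sum (total numeric NOT NULL);
  INSERT INTO sales_sum SELECT coalesce(sum(amount), 0) FROM sales;

  CREATE OR REPLACE FUNCTION sales_sum_trig() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'INSERT' THEN
          UPDATE sales_sum SET total = total + NEW.amount;
      ELSIF TG_OP = 'DELETE' THEN
          UPDATE sales_sum SET total = total - OLD.amount;
      ELSE
          UPDATE sales_sum SET total = total - OLD.amount + NEW.amount;
      END IF;
      RETURN NULL;  -- AFTER trigger, so the return value is ignored
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER sales_sum_upd AFTER INSERT OR UPDATE OR DELETE ON sales
      FOR EACH ROW EXECUTE PROCEDURE sales_sum_trig();

SELECT total FROM sales_sum then gives an instant answer to SELECT sum(amount) FROM sales.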
Re: [PERFORM] Question about clustering multiple columns
On Fri, Jun 16, 2006 at 11:11:59 -0700, Benjamin Arai <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Thanks for the reply. I have one more question. Does it matter in which order that I make the index?

Please keep replies copied to the lists so that other people can learn from and contribute to the discussion. In this case I am just going to copy back to the performance list, since it is generally better for performance questions than the general list.

> For example, should I create an index on cusip,date or date,cusip; does it matter which order? My goal is to cluster the entries by cusip, then for each cusip order the data by date (maybe the order by date occurs automatically). Hm, in that case maybe I only need to cluster by cusip, but then how do I ensure that each cusip has its data ordered by date?

I think that you want to order by cusip (assuming that corresponds to "name" in your sample query below) first. You won't end up having to go through values in the index that will be filtered out if you do it that way.

The documentation for the CLUSTER command says that it clusters on indexes, not columns. So if the index is on (cusip, date), then the records will be ordered by cusip, date immediately after the cluster. (New records added after the cluster are not guaranteed to be ordered by the index.)

> Benjamin
>
> -----Original Message-----
> From: Bruno Wolff III
> Sent: Friday, June 16, 2006 8:32 AM
> To: Benjamin Arai
> Cc: pgsql-general@postgresql.org; pgsql-performance@postgresql.org
> Subject: Re: Question about clustering multiple columns
>
> On Tue, Jun 13, 2006 at 09:04:15 -0700, Benjamin Arai <[EMAIL PROTECTED]> wrote:
> > I have a database where there are three columns (name,date,data). The queries are almost always something like SELECT date,data FROM table WHERE name=blah AND date > 1/1/2005 AND date < 1/1/2006;. I currently have three B-tree indexes, one for each of the columns. Is clustering on the date index going to be what I want, or do I need an index that contains both name and date?
>
> I would expect that clustering on the name would be better for the above query. You probably want an index on name and date combined.
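A sketch of the commands being described (the table name is assumed; on these versions CLUSTER takes the index name first):

  CREATE INDEX prices_cusip_date_idx ON prices (cusip, date);
  CLUSTER prices_cusip_date_idx ON prices;
  -- re-run CLUSTER periodically, since rows added afterwards are not kept in index order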
Re: [PERFORM] Question about clustering multiple columns
On Tue, Jun 13, 2006 at 09:04:15 -0700, Benjamin Arai <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a database where there are three columns (name,date,data). The queries are almost always something like SELECT date,data FROM table WHERE name=blah AND date > 1/1/2005 AND date < 1/1/2006;. I currently have three B-tree indexes, one for each of the columns. Is clustering on the date index going to be what I want, or do I need an index that contains both name and date?

I would expect that clustering on the name would be better for the above query. You probably want an index on name and date combined.
Re: [PERFORM] App very unresponsive while performing simple update
On Wed, May 31, 2006 at 01:23:07 -0500, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> On Sun, May 28, 2006 at 07:20:59PM -0400, Greg Stark wrote:
> > Brendan Duddridge <[EMAIL PROTECTED]> writes:
> > More likely you were blocking on some lock. Until that other query holding that lock tries to commit Postgres won't actually detect a deadlock, it'll just sit waiting until the lock becomes available.
>
> Wow, are you sure that's how it works? I would think it would be able to detect deadlocks as soon as both processes are waiting on each other's locks. I don't see how it could wait for a commit.

If a command is blocked waiting for a lock, how are you going to get a commit (you might get a rollback if the query is aborted)?
Re: [PERFORM] is it possible to make this faster?
On Thu, May 25, 2006 at 16:31:40 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> On 5/25/06, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> > On Thu, May 25, 2006 at 16:07:19 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> > > been doing a lot of pgsql/mysql performance testing lately, and there is one query that mysql does much better than pgsql...and I see it a lot in normal development:
> > >
> > > select a,b,max(c) from t group by a,b;
> >
> > SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;
>
> that is actually slower than group by in my case...am i missing something? (both essentially resolved to seq_scan)

If there aren't many c's for each (a,b), then a sort might be the best way to do this. I don't remember if skip scanning ever got done, but if it did, it would have been 8.1 or later.
Re: [PERFORM] is it possible to make this faster?
On Thu, May 25, 2006 at 16:07:19 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> been doing a lot of pgsql/mysql performance testing lately, and there is one query that mysql does much better than pgsql...and I see it a lot in normal development:
>
> select a,b,max(c) from t group by a,b;
>
> t has an index on a,b,c.
>
> in my sample case with cardinality of 1000 for a, 2000 for b, and 30 records in t, pgsql does a seq. scan on dev box in about a second (returning 2000 records).
>
> recent versions of mysql do much better, returning same set in < 20ms. mysql explain says it uses an index to optimize the group by somehow. is there a faster way to write this query?

SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;
Re: [PERFORM] Optimizer: limit not taken into account
Please don't reply to previous messages to start new threads. This makes it harder to find stuff in the archives and may keep people from noticing your message.

On Wed, May 17, 2006 at 08:54:52 -0700, "Craig A. James" <[EMAIL PROTECTED]> wrote:
> Here's a "corner case" that might interest someone. It tripped up one of our programmers.
>
> We have a table with > 10 million rows. The ID column is indexed, the table has been vacuum/analyzed. Compare these two queries:
>
>   select * from tbl where id >= 1000 limit 1;
>   select * from tbl where id >= 1000 order by id limit 1;
>
> The first takes 4 seconds, and uses a full table scan. The second takes 32 msec and uses the index. Details are below.

I suspect it wasn't intended to be a full table scan, but rather a sequential scan until it found a matching row. If the data in the table is ordered by id, this strategy may not work out well, whereas if the data is randomly ordered it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the table, that is going to make it more likely to choose a bad plan.

> I understand why the planner makes the choices it does -- the "id > 1000" isn't very selective and under normal circumstances a full table scan is probably the right choice. But the "limit 1" apparently doesn't alter the planner's strategy at all. We were surprised by this.
>
> Adding the "order by" was a simple solution.
>
> Craig
>
> pg=> explain analyze select url, url_digest from url_queue where priority >= 1000 limit 1;
>  Limit (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 rows=1 loops=1)
>    -> Seq Scan on url_queue (cost=0.00..391254.35 rows=606176 width=108) (actual time=4036.101..4036.101 rows=1 loops=1)
>          Filter: (priority >= 1000)
>  Total runtime: 4036.200 ms
> (4 rows)
>
> pg=> explain analyze select url, url_digest from url_queue where priority >= 1000 order by priority limit 1;
>  Limit (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 rows=1 loops=1)
>    -> Index Scan using url_queue_priority on url_queue (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434 rows=1 loops=1)
>          Index Cond: (priority >= 1000)
>  Total runtime: 32.566 ms
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
On Thu, May 11, 2006 at 18:41:25 -0500, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> On Thu, May 11, 2006 at 07:20:27PM -0400, Bruce Momjian wrote:
>
> My damn powerbook drive recently failed with very little warning, other than I did notice that disk activity seemed to be getting a bit slower. IIRC it didn't log any errors or anything. Even if it did, if the OS was catching them I'd hope it would pop up a warning or something. But from what I've heard, some drives now-a-days will silently remap dead sectors without telling the OS anything, which is great until you've used up all of the spare sectors and there's nowhere to remap to. :(

You might look into smartmontools. One part of this is a daemon that runs self-tests on the disks on a regular basis. You can have warnings mailed to you on various conditions. Drives will fail the self-test before they run out of spare sectors. There are other drive characteristics that can be used to tell if drive failure is imminent, giving you a chance to replace a drive before it fails.
Re: [PERFORM] UNSUBSCRIBE
On Wed, May 10, 2006 at 01:15:11 -0400, Tom Lane <[EMAIL PROTECTED]> wrote:
> Maybe the real problem is at the other end of the process, ie we should require some evidence of a greater-than-room-temp IQ to subscribe in the first place?

I suspect it is more laziness than smarts. They had to at least figure out how to respond to the confirm message in the first place in order to get subscribed. My theory is that they don't want to take the trouble to figure out how to unsubscribe when they (think that they) can just send a message to the list (not even the admin) asking to be unsubscribed and it will (well, actually won't on these lists) happen.

Maybe posts with "unsubscribe" in the subject could be held for moderation and/or get an automated reply with instructions for unsubscribing.
Re: [PERFORM] Arguments Pro/Contra Software Raid
On Tue, May 09, 2006 at 12:10:32 +0200, "Jean-Yves F. Barbier" <[EMAIL PROTECTED]> wrote:
> Naa, you can find ATA &| SATA ctrlrs for about EUR30 !

But those are the ones that you would generally be better off not using.

> Definitely NOT, however if your server doesn't have a heavy load, the software overhead can't be noticed (essentially cache managing and syncing)

It is fairly common for database machines to be I/O, rather than CPU, bound, so the CPU impact of software RAID is low.

> Some hardware ctrlrs are able to avoid the loss of a disk if you turn out to have some faulty sectors (by relocating them internally); software RAID doesn't as sectors *must* be @ the same (linear) addresses.

That is not true. Software RAID works just fine on drives that have internally remapped sectors.
Re: [PERFORM] Takes too long to fetch the data from database
On Tue, May 09, 2006 at 09:24:15 +0530, soni de <[EMAIL PROTECTED]> wrote:
> pdb=# EXPLAIN ANALYZE select * from wan where kname = 'pluto' order by stime limit 50;
> NOTICE: QUERY PLAN:
>
>  Limit (cost=3515.32..3515.32 rows=50 width=95) (actual time=230492.69..230493.07 rows=50 loops=1)
>    -> Sort (cost=3515.32..3515.32 rows=208 width=95) (actual time=230492.68..230493.00 rows=51 loops=1)
>          -> Seq Scan on wan (cost=0.00..3507.32 rows=208 width=95) (actual time=0.44..229217.38 rows=18306 loops=1)
>  Total runtime: 230631.62 msec

Unless you have an index on (kname, stime), the query is going to need to find the records with a kname of 'pluto' and then get the most recent 50 of them. It looks like there are enough estimated records with kname = 'pluto' that a sequential scan is being preferred. Creating an extra index will slow down inserts somewhat, but will speed up queries like the above significantly, so it may be worthwhile for you.

I think later versions of Postgres are smarter, but for sure in 7.2 you will need to write the query like:

  SELECT * FROM wan WHERE kname = 'pluto' ORDER BY kname DESC, stime DESC LIMIT 50;
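For reference, a sketch of the index that supports that rewritten query (the index name is just illustrative):

  CREATE INDEX wan_kname_stime_idx ON wan (kname, stime);

The btree can then be walked backward from the end of the kname = 'pluto' range, so the first 50 matching rows come straight off the index with no sort step.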
Re: [PERFORM] postgresql transaction id monitoring with nagios
On Tue, May 02, 2006 at 12:06:30 -0700, Tony Wasson <[EMAIL PROTECTED]> wrote:
> Ah thanks, it's a bug in my understanding of the thresholds.
>
> "With the standard freezing policy, the age column will start at one billion for a freshly-vacuumed database."
>
> So essentially, 1B is normal, 2B is the max. The logic is now:
>
> The script detects a wrap at 2 billion. It starts warning once one or more databases show an age over 1.5 billion transactions. It reports critical at 1.75B transactions.
>
> If anyone else understands differently, hit me with a clue bat.

Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against wraparound.
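For reference, the kind of age check scripts like this are built on (a sketch using the standard catalog columns):

  SELECT datname, age(datfrozenxid)
  FROM pg_database
  ORDER BY age(datfrozenxid) DESC;

The reported age is the number of transactions since each database was last frozen, which is the value being compared against the warning and critical thresholds above.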
Re: [PERFORM] Why so slow?
On Fri, Apr 28, 2006 at 17:37:30 +0000, Bealach-na Bo <[EMAIL PROTECTED]> wrote:
> > The above shows that the indexes contained 10M rows and 160M of dead space each. That means you weren't vacuuming nearly enough.
>
> How is it that a row in the table can grow to a size far exceeding the sum of the maximum sizes of the fields it consists of?

Because unless you run vacuum, the old deleted rows are not reused. Those rows cannot be deleted immediately, because they may still be visible to other transactions. Periodic vacuums are used to find deleted rows which are no longer visible to any transaction so their space can be reused. You probably want to read the following: http://developer.postgresql.org/docs/postgres/routine-vacuuming.html
Re: [PERFORM] Running on an NFS Mounted Directory
On Thu, Apr 27, 2006 at 09:06:48 -0400, Ketema Harris <[EMAIL PROTECTED]> wrote:
> Yes, you're right, I meant not having to do the backups from the db server itself. I can do that within the storage device now, by allocating space for it, and letting the device copy the data files on some periodic basis.

Only if the database server isn't running, or your SAN provides a way to take a snapshot of the data at a particular instant in time.
Re: [PERFORM] Running on an NFS Mounted Directory
On Thu, Apr 27, 2006 at 08:57:51 -0400, Ketema Harris <[EMAIL PROTECTED]> wrote:
> performance from the db. I also would hopefully then not have to do periodic backups from the db server to some other type of storage. Is this not a good idea? How bad of a performance hit are we talking about? Also,

You always need to do backups if you care about your data. What if someone accidentally deletes a lot of data? What if someone blows up your data center (or there is a flood)?
Re: [PERFORM] Takes too long to fetch the data from database
On Fri, Apr 21, 2006 at 10:12:24 +0530, soni de <[EMAIL PROTECTED]> wrote:
> I don't want to query exactly 81900 rows into a set. I just want to fetch 50 or 100 rows at a time in decreasing order of stime (i.e. 50 or 100 rows starting from last to end).

You can do this efficiently if stime has an index and you can deal with using the stime from the previous query instead of the record count. The idea is to select up to 50 or 100 records in descending order where the stime is <= the previous stime. This can give you some overlapping records, so you need some way to deal with that.
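A sketch of that approach, continuing the wan/stime example from this thread (:last_stime stands for the smallest stime value seen on the previous page):

  -- first page
  SELECT * FROM wan ORDER BY stime DESC LIMIT 50;

  -- each following page
  SELECT * FROM wan
  WHERE stime <= :last_stime
  ORDER BY stime DESC
  LIMIT 50;

Rows whose stime equals :last_stime can show up on two consecutive pages, so the client has to skip the ones it has already seen.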
Re: [PERFORM] Takes too long to fetch the data from database
On Thu, Apr 20, 2006 at 11:07:31 +0530, soni de <[EMAIL PROTECTED]> wrote:
> Please provide me some help regarding how could I use cursor in following cases? :
>
> I want to fetch 50 records at a time starting from largest stime.
>
> SELECT * FROM wan ORDER BY stime LIMIT 50 OFFSET 81900;

Something like the following may be faster:

  SELECT * FROM wan ORDER BY stime DESC LIMIT 50;
Re: [PERFORM] Help optimizing a slow index scan
On Sat, Mar 18, 2006 at 11:50:48 +0300, Oleg Bartunov wrote:
> I may be wrong but we in astronomy have several sky indexing schemes, which allow effective use of a classical btree index. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization for details. Sergei Koposov has developed the Q3C contrib module for PostgreSQL 8.1+ and we use it with billiard size astronomical catalogs.

Note that Earth Distance can also be used for astronomy. If you use an appropriate radius, distances will be in degrees.
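A sketch of what that means in practice, following the earth() customization shown earlier in this digest (the value is just 180/pi, so that great-circle distances come back in degrees of arc):

  CREATE OR REPLACE FUNCTION earth() RETURNS float8
  LANGUAGE 'sql' IMMUTABLE
  AS 'SELECT ''57.29577951308232''::float8';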
Re: [PERFORM] Help optimizing a slow index scan
On Fri, Mar 17, 2006 at 08:34:26 -0700, Dan Harris <[EMAIL PROTECTED]> wrote: > Markus Bertheau wrote: > >Have you tried using a GIST index on lat & long? These things are > >meant for two-dimensional data, whereas btree doesn't handle > >two-dimensional data that well. How many rows satisfy either of the > >long / lat condition? > > > > > >> > According to the analyze, less than 500 rows matched. I'll look into > GIST indexes, thanks for the feedback. Have you looked at using the Earth Distance contrib module? If a spherical model of the earth is suitable for your application, then it may work for you and might be easier than trying to create Gist indexes yourself. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] No vacuum for insert-only database?
On Mon, Mar 13, 2006 at 09:19:32 -0800, "Craig A. James" <[EMAIL PROTECTED]> wrote: > Alvaro Herrera wrote: > >>If I only insert data into a table, never update or delete, then I should > >>never have to vacuum it. Is that correct? > > > >You still need to vacuum eventually, to avoid transaction Id wraparound > >issues. But not as often. > > Thanks. Any suggestions for what "not as often" means? For example, if my > database will never contain more than 10 million rows, is that a problem? > 100 million rows? When does transaction ID wraparound become a problem? I believe it is at about a billion (10^9) transactions, and it depends on how many transactions have been used, not on how many rows the table holds. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
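[Editor's note: the usual way to see how close each database is to wraparound, as described in the routine-vacuuming documentation of that era:]

SELECT datname, age(datfrozenxid) FROM pg_database;

The age is measured in transactions, so an insert-only table still needs an occasional vacuum once enough transactions have gone by.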
Re: [PERFORM] help needed asap....
On Sun, Mar 12, 2006 at 11:46:25 -, Phadnis <[EMAIL PROTECTED]> wrote: > > 1 ) when i try to query for count or for any thg it takes a long time to > return the result. How to avoid this Postgres doesn't cache counts, so if you are counting a lot of records, this may take a while to run. If you do a lot of counts or need them to be fast even if it slows other things down, there are some things you can do to address this. Several strategies have been repeatedly discussed in the archives. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Large Table With Only a Few Rows
On Mon, Feb 27, 2006 at 06:48:02 -0800, Nik <[EMAIL PROTECTED]> wrote: > I have a table that has only a few records in it at the time, and they > get deleted every few seconds and new records are inserted. Table never > has more than 5-10 records in it. > > However, I noticed a deteriorating performance in deletes and inserts > on it. So I performed vacuum analyze on it three times (twice in a row, > and once two days later). In the statistics it says that the table size > is 863Mb, toast table size is 246Mb, and indexes size is 134Mb, even > though the table has only 5-10 rows in it it. I was wondering how can I > reclaim all this space and improve the performance? You can use VACUUM FULL to recover the space. You should be running normal VACUUMs on that table every minute or two, not once a day. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
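[Editor's note: a sketch of the two steps being suggested; the table name is only a placeholder since the original message does not give one.]

VACUUM FULL VERBOSE mytable;   -- one-time: returns the space, but takes an exclusive lock
VACUUM ANALYZE mytable;        -- routine: run every minute or two (e.g. from cron) to keep the table small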
Re: [PERFORM] rotate records
On Tue, Feb 28, 2006 at 09:14:59 +0530, "Jeevanandam, Kathirvel (IE10)" <[EMAIL PROTECTED]> wrote: > Hi all, Please don't hijack existing threads to start new ones. This can cause people to miss your question and messes up the archives. Performance questions should generally be posted to the performance list. I have redirected followups to there. > > I am facing performance issues even with less than 3000 records, I am > using Triggers/SPs in all the tables. What could be the problem. > Any idea it is good to use triggers w.r.t performance? A common cause of this kind of thing is not running vacuum often enough leaving you with a lot of dead tuples. You should probably start by doing a vacuum full analyse and then showing the list some problem query sources along with explain analyse output for them. > > Regards, > Jeeva.K > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > >http://archives.postgresql.org ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index Usage using IN
On Thu, Feb 02, 2006 at 09:12:59 +1300, Ralph Mason <[EMAIL PROTECTED]> wrote: > Hi, > > I have 2 tables both have an index on ID (both ID columns are an oid). > > I want to find only only rows in one and not the other. > > Select ID from TableA where ID not IN ( Select ID from Table B) > > This always generates sequential scans. > > Table A has about 250,000 rows. Table B has about 250,000 Rows. > > We should get a Scan on Table B and a Index Lookup on Table A. I don't think that is going to work if there are NULLs in table B. I don't know whether or not Postgres has code to special case NULL testing (either for constraints ruling them out, or doing probes for them in addition to the key it is trying to match) for doing NOT IN. Just doing a simple index probe into table A isn't going to tell you all you need to know if you don't find a match. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
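[Editor's note: one common rewrite that sidesteps the NULL semantics of NOT IN is NOT EXISTS, sketched here with the table names from the question; whether the planner turns it into the scan-plus-index-probe plan the poster expects depends on the version.]

SELECT a.id
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.id = a.id);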
Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?
On Sat, Dec 24, 2005 at 22:13:43 -0500, Luke Lonergan <[EMAIL PROTECTED]> wrote: > David, > > > now hot-swap may not be supported on all interface types, that may be what > > you have run into, but with SCSI or SATA you should be able to hot-swap > > with the right controller. > > That's actually the problem - Linux hot swap is virtually non-functional for > SCSI. You can write into the proper places in /proc, then remove and rescan > to get a new drive up, but I've found that the resulting OS state is flaky. > This is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers. > > The problems I've seen are with Linux, not the controllers. The other option is to keep hot spares available so that you can have a failure or two before you have to pull drives out. This might allow you to get to a maintenance window to swap out the bad drives. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Overriding the optimizer
On Thu, Dec 15, 2005 at 21:41:06 -0800, "Craig A. James" <[EMAIL PROTECTED]> wrote: > > If I understand enable_seqscan, it's an all-or-nothing affair. Turning it > off turns it off for the whole database, right? The same is true of all of You can turn it off just for specific queries. However, it will apply to all joins within a query. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
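[Editor's note: per-query use looks like this; the setting lasts for the session (or only the transaction with SET LOCAL), and the query shown is just a stand-in.]

SET enable_seqscan = off;
SELECT * FROM orders WHERE customer_id = 42;   -- the one query you want to steer away from seqscans
SET enable_seqscan = on;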
Re: [PERFORM] Performance degradation after successive UPDATE's
On Wed, Dec 07, 2005 at 14:14:31 +0200, Assaf Yaari <[EMAIL PROTECTED]> wrote: > Hi Jan, > > As I'm novice with PostgreSQL, can you elaborate the term FSM and > settings recommendations? http://developer.postgresql.org/docs/postgres/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM > BTW: I'm issuing VACUUM ANALYZE every 15 minutes (using cron) and also > changes the setting of fsync to false in postgresql.conf but still time > seems to be growing. You generally don't want fsync set to false. > Also no other transactions are open. Have you given us explain analyse samples yet? > > Thanks, > Assaf. > > > -Original Message- > > From: Jan Wieck [mailto:[EMAIL PROTECTED] > > Sent: Tuesday, December 06, 2005 2:35 PM > > To: Assaf Yaari > > Cc: Bruno Wolff III; pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] Performance degradation after > > successive UPDATE's > > > > On 12/6/2005 4:08 AM, Assaf Yaari wrote: > > > Thanks Bruno, > > > > > > Issuing VACUUM FULL seems not to have influence on the time. > > > I've added to my script VACUUM ANALYZE every 100 UPDATE's > > and run the > > > test again (on different record) and the time still increase. > > > > I think he meant > > > > - run VACUUM FULL once, > > - adjust FSM settings to database size and turnover ratio > > - run VACUUM ANALYZE more frequent from there on. > > > > > > Jan > > > > > > > > Any other ideas? > > > > > > Thanks, > > > Assaf. > > > > > >> -Original Message- > > >> From: Bruno Wolff III [mailto:[EMAIL PROTECTED] > > >> Sent: Monday, December 05, 2005 10:36 PM > > >> To: Assaf Yaari > > >> Cc: pgsql-performance@postgresql.org > > >> Subject: Re: Performance degradation after successive UPDATE's > > >> > > >> On Mon, Dec 05, 2005 at 19:05:01 +0200, > > >> Assaf Yaari <[EMAIL PROTECTED]> wrote: > > >> > Hi, > > >> > > > >> > I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0. > > >> > > > >> > My application updates counters in DB. I left a test > > over the night > > >> > that increased counter of specific record. After night running > > >> > (several hundreds of thousands updates), I found out > > that the time > > >> > spent on UPDATE increased to be more than 1.5 second (at > > >> the beginning > > >> > it was less than 10ms)! Issuing VACUUM ANALYZE and even > > >> reboot didn't > > >> > seemed to solve the problem. > > >> > > >> You need to be running vacuum more often to get rid of the deleted > > >> rows (update is essentially insert + delete). Once you get > > too many, > > >> plain vacuum won't be able to clean them up without > > raising the value > > >> you use for FSM. By now the table is really bloated and > > you probably > > >> want to use vacuum full on it. > > >> > > > > > > ---(end of > > > broadcast)--- > > > TIP 9: In versions below 8.0, the planner will ignore your desire to > > >choose an index scan if your joining column's > > datatypes do not > > >match > > > > > > -- > > #= > > =# > > # It's easier to get forgiveness for being wrong than for > > being right. # > > # Let's break this rule - forgive me. > > # > > #== > > [EMAIL PROTECTED] # > > ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Can this query go faster???
On Tue, Dec 06, 2005 at 10:52:57 +0100, Csaba Nagy <[EMAIL PROTECTED]> wrote: > Joost, > > Why do you use an offset here ? I guess you're traversing the table > somehow, in this case it would be better to remember the last zipcode + > housenumber and put an additional condition to get the next bigger than > the last one you've got... that would go for the index on > zipcode+housenumber and be very fast. The big offset forces postgres to > traverse that many entries until it's able to pick the one row for the > result... The other problem with saving an offset is that unless the data isn't changing, or you are doing all of the searches in one serialized transaction, the fixed offset might not put you back where you left off. Using the last key, instead of counting records, is normally a better way to do this. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
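[Editor's note: a sketch of the last-key approach for this case, assuming a table with zipcode and housenumber columns and an index on (zipcode, housenumber); the table name and literals are invented and stand for the last row of the previous page.]

SELECT *
FROM addresses
WHERE zipcode > '9725AB'
   OR (zipcode = '9725AB' AND housenumber > 112)
ORDER BY zipcode, housenumber
LIMIT 100;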
Re: [PERFORM] Performance degradation after successive UPDATE's
On Tue, Dec 06, 2005 at 11:08:07 +0200, Assaf Yaari <[EMAIL PROTECTED]> wrote: > Thanks Bruno, > > Issuing VACUUM FULL seems not to have influence on the time. That was just to get the table size back down to something reasonable. > I've added to my script VACUUM ANALYZE every 100 UPDATE's and run the > test again (on different record) and the time still increase. Vacuuming every 100 updates should put an upper bound on how slow things get. I doubt you need to analyze every 100 updates, but that doesn't cost much more on top of a vacuum. However, if there is another transaction open while you are doing the updates, that would prevent clearing out the deleted rows, since they are potentially visible to it. This is something you want to rule out. > Any other ideas? Do you have any triggers on this table? Are you updating any other tables at the same time? In particular, ones that are referred to by the problem table. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Performance degradation after successive UPDATE's
On Mon, Dec 05, 2005 at 19:05:01 +0200, Assaf Yaari <[EMAIL PROTECTED]> wrote: > Hi, > > I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0. > > My application updates counters in DB. I left a test over the night that > increased counter of specific record. After night running (several > hundreds of thousands updates), I found out that the time spent on > UPDATE increased to be more than 1.5 second (at the beginning it was > less than 10ms)! Issuing VACUUM ANALYZE and even reboot didn't seemed to > solve the problem. You need to be running vacuum more often to get rid of the deleted rows (update is essentially insert + delete). Once you get too many, plain vacuum won't be able to clean them up without raising the value you use for FSM. By now the table is really bloated and you probably want to use vacuum full on it. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Effects of cascading references in foreign keys
On Sat, Oct 29, 2005 at 08:24:32 -0600, Michael Fuhr <[EMAIL PROTECTED]> wrote: > > Does an UPDATE of e.g. m_fld1 in t_master cause a 'lookup' in all tables > > which have a cascading update-rule or is this 'lookup' only triggered if > > the referenced column in t_master is explicitly updated? > > My tests suggest that a lookup on the referring key is done only > if the referenced key is changed. Here's an example from 8.1beta4; > I used this version because EXPLAIN ANALYZE shows triggers and the > time spent in them, but I see similar performance characteristics > in earlier versions. I've intentionally not put an index on the > referring column to make lookups on it slow. It looks like this feature was added last May, so I think it only applies to 8.1. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Effects of cascading references in foreign keys
On Sat, Oct 29, 2005 at 13:10:31 +0200, Martin Lesser <[EMAIL PROTECTED]> wrote: > Which effects have UPDATEs on REFERENCEd TABLEs when only columns in the > referenced table are updated which are not part of the FOREIGN KEY > constraint? In 8.1 there is a check to see if the foreign key value has changed, and if it hasn't, no trigger is queued. In the currently released versions any update will fire triggers. The comment for that check in trigger.c didn't say whether this optimization applies to both referencing and referenced keys or just one of those. If you need to know more you can look at the code at: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/ for trigger.c. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] blue prints please
On Tue, Oct 25, 2005 at 22:24:06 -0600, Sidar López Cruz <[EMAIL PROTECTED]> wrote: > where can i find bests practices for tunning postgresql? You should first read the documentation. For 8.1, that would be here: http://developer.postgresql.org/docs/postgres/runtime-config.html There is also good information on techdocs at: http://techdocs.postgresql.org/#techguides (Look under the subcategory "optimising".) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] prepared transactions that persist across sessions?
On Sun, Oct 23, 2005 at 00:14:23 -0400, [EMAIL PROTECTED] wrote: > Hey all. > > Please point me to a place I should be looking if this is a common > question that has been debated periodically and at great length > already. :-) You probably want to read: http://candle.pha.pa.us/main/writings/pgsql/sgml/runtime-config-query.html Connection pooling might be another approach, since it should be possible to reuse prepared statements when reusing a connection. > I have a complex query. It's a few Kbytes large, and yes, I've already > worked on reducing it to be efficient in terms of database design, and > minimizing the expressions used to join the tables. Running some timing > tests, I've finding that the query itself, when issued in full, takes > around 60 milliseconds to complete on modest hardware. If prepared, and > then executed, however, it appears to take around 60 milliseconds to > prepare, and 20 milliseconds to execute. I'm not surprised. PostgreSQL > is very likely calculating the costs of many, many query plans. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
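[Editor's note: for reference, this is the prepare-once/execute-many pattern the 60ms/20ms split comes from. Statement name, table, and parameters are invented, and the prepared plan only lives for the lifetime of the connection, which is why connection pooling helps.]

PREPARE find_orders (int, date) AS
  SELECT * FROM orders WHERE customer_id = $1 AND order_date >= $2;

EXECUTE find_orders(42, '2005-10-01');
EXECUTE find_orders(43, '2005-10-01');   -- reuses the already-planned query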
Re: [PERFORM] The need for full vacuum / reindex
On Wed, Sep 28, 2005 at 05:33:27 +0200, Tobias Brox <[EMAIL PROTECTED]> wrote: > By occation, we dropped the whole production database and refreshed it from > a database backup - and all our performance problems seems to have gone. I > suppose this means that to keep the database efficient, one eventually does > have to do reindexing and/or full vacuum from time to time? Normally you only need to do that if you didn't vacuum often enough or with high enough fsm setting and bloat has gotten out of hand to the point that you need to recover some space. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] VACUUM FULL vs CLUSTER
On Fri, Sep 23, 2005 at 18:16:44 +0200, Stef <[EMAIL PROTECTED]> wrote: > Bruno Wolff III mentioned : > => If you have a proper FSM setting you shouldn't need to do vacuum fulls > => (unless you have an older version of postgres where index bloat might > => be an issue). > > What version of postgres was the last version that had > the index bloat problem? You can check the release notes to be sure, but my memory is that the unbounded bloat problem was fixed in 7.4. There still are usage patterns that can result in bloating, but it is limited to some constant multiplier of the minimum index size. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] VACUUM FULL vs CLUSTER
On Tue, Sep 20, 2005 at 14:53:19 -0400, Markus Benne <[EMAIL PROTECTED]> wrote: > I have a table that is purged by 25% each night. I'd like to do a > vacuum nightly after the purge to reclaim the space, but I think I'll > still need to do a vacuum full weekly. > > Would there be any benefit to doing a cluster instead of the vacuum? If you have a proper FSM setting you shouldn't need to do vacuum fulls (unless you have an older version of postgres where index bloat might be an issue). ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
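[Editor's note: the FSM knobs being referred to are in postgresql.conf in the 7.x/8.x line; a sketch with illustrative numbers only, sized to the number of pages expected to accumulate dead space between vacuums.]

max_fsm_pages = 200000     # needs to cover the dead pages created between vacuums
max_fsm_relations = 1000   # tables and indexes tracked in the free space map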
Re: [PERFORM] Slow update
On Mon, Sep 12, 2005 at 10:14:25 +0100, Hilary Forbes <[EMAIL PROTECTED]> wrote: > Hello everyone > > I must be doing something very wrong here so help please! I have two tables > > tableA has 300,000 recs > tableB has 20,000 recs > > I need to set the value of a field in table A to a value in table B depending > on the existence of the record in table B. So what I have done is > > UPDATE tableA set a.val1=b.somefield FROM tableA a, tableB b WHERE > a.key1=b.key1; > > The primary key of tableA is key1 and that of tableB is key1 ie the join is > on primary keys. > > The "optimizer" has elected to d a sequential scan on tableA to determine > which fields to update rather than the query being driveb by tableB and it is > taking forever. Surely I must be able to force the system to read down > tableB in preference to reading down tableA? It would help to see the exact query and the explain analyze output. Hopefully you didn't really write the query as shown above, since it is using illegal syntax, and if it were changed slightly to become legal it would do a cross join of tableA with the inner join of tableA and tableB, which isn't what you want. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
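[Editor's note: for what it's worth, a legal form of that UPDATE in PostgreSQL lists the target table only once and puts the other table in FROM; a sketch using the column names from the original message.]

UPDATE tableA
   SET val1 = b.somefield
  FROM tableB b
 WHERE tableA.key1 = b.key1;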
Re: [PERFORM] RAID Configuration Sugestion
Please keep replies copied to the list so that others may contribute to and learn from the discussion. On Tue, Aug 30, 2005 at 10:15:13 -0300, Alvaro Nunes Melo <[EMAIL PROTECTED]> wrote: > Hello Bruno, > > Bruno Wolff III wrote: > > >On Tue, Aug 30, 2005 at 09:37:17 -0300, > > Alvaro Nunes Melo <[EMAIL PROTECTED]> wrote: > > > > > >>The server is a Dual Xeon 3.0 with 2 GB RAM and two SCSI disks. Our main > >>doubt is what is the best configuration for the disks. We are thinking > >>about use them in a RAID-0 array. Is this the best option? What do you > >>suggest on partitioning? Separate partitions for the OS, data and pg_xlog? > > > Our main goal is performance speedup. Disk space might not be a problem. > I've read a lot here about movig pg_xlog to different partitions, and > we'll surely make tests to see what configuration might be better. This isn't a very good mix of hardware for running postgres. Xeons have some context switching issues for which you will probably see some speedup in 8.1. (So if you aren't going into production for several months you might want to be using 8.1beta.) Having only two disk drives is also not a good idea. With what you have, you either want to use raid 0 and not worry too much about how the disks are partitioned, or use one disk for WAL logging and the other for everything else. There are other people on the list who can probably give you a better idea of which of these options is likely to be better in your case. However, they may need to know more about your raid controller. In particular, its model and how much battery-backed memory it has. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] RAID Configuration Sugestion
On Tue, Aug 30, 2005 at 09:37:17 -0300, Alvaro Nunes Melo <[EMAIL PROTECTED]> wrote: > > The server is a Dual Xeon 3.0 with 2 GB RAM and two SCSI disks. Our main > doubt is what is the best configuration for the disks. We are thinking > about use them in a RAID-0 array. Is this the best option? What do you > suggest on partitioning? Separate partitions for the OS, data and pg_xlog? You don't have a lot of options with just two disks. What are you trying to accomplish with raid? Raid 0 will possibly give you some speedup, while raid 1 will give you some fault tolerance and some speedup of reads, but cost you half your disk space. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Need indexes on empty tables for good performance ?
On Tue, Aug 23, 2005 at 13:41:32 +1000, "Lenard, Rohan (Rohan)" <[EMAIL PROTECTED]> wrote: > I've read that indexes aren't used for COUNT(*) and I've noticed (7.3.x) > with EXPLAIN that indexes never seem to be used on empty tables - is > there any reason to have indexes on empty tables, or will postgresql > never use them. count will use indexes if appropriate. The counts themselves are NOT in the indexes, so counts of significant fractions of a table (in particular of the whole table) won't benefit from indexes. You aren't going to get query speedups by putting indexes on empty tables. However, they may be required if you have unique or primary keys declared in the table. You may want them to enforce some kinds of constraints. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] How does the planner execute unions?
On Fri, Aug 26, 2005 at 16:14:18 -0400, Chris Hoover <[EMAIL PROTECTED]> wrote: > Hopefully a quick question. > > In 7.3.4, how does the planner execute a query with union alls in it? > > Does it execute the unions serially, or does it launch a "thread" for > each union (or maybe something else entirely). Postgres doesn't have parallel execution of parts of queries. So it is going to do one part followed by the other part. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching by Postgres
On Tue, Aug 23, 2005 at 10:10:45 -0700, gokulnathbabu manoharan <[EMAIL PROTECTED]> wrote: > Hi all, > > I like to know the caching policies of Postgresql. > What parameter in the postgresql.conf affects the > cache size used by the Postgresql? As far as I have > searched my knowledge of the parameters are The main policy is to let the OS do most of the caching. > 1. shared_buffers - Sets the limit on the amount of > shared memory used. If I take this is as the cache > size then my performance should increase with the > increase in the size of shared_buffers. But it seems > it is not the case and my performance actually > decreases with the increase in the shared_buffers. I > have a RAM size of 32 GB. The table which I use more > frequently has around 68 million rows. Can I cache > this entire table in RAM? Using extremely large values for shared buffers is known to be a performance loss for Postgres. Some improvements were made for 8.0 and more for 8.1. The OS will cache frequently used data from files for you. So if you are using that table a lot and the rows aren't too wide, it should mostly be cached for you by the OS. > 2. work_mem - It is the amount of memory used by an > operation. My guess is once the operation is complete > this is freed and hence has nothing to do with the > caching. This is used for sorts and some other things. > 3. effective_cache_size - The parameter used by the > query planner and has nothing to do with the actual > caching. You are supposed to use this to give the planner an idea about how much space the OS will be using for caching on behalf of Postgres. > So kindly help me in pointing me to the correct > parameter to set. > > It will be great if you can point me to the docs that > explains the implementation of caching in Postgresql > which will help me in understanding things much > clearly. You probably want to read the following: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
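[Editor's note: a hedged sketch of the kind of postgresql.conf settings being described for a machine like this one. In this era shared_buffers and effective_cache_size are measured in 8KB pages and work_mem in KB; the particular numbers are only illustrative, not recommendations.]

shared_buffers = 50000           # ~400MB; much larger tends to hurt before 8.1
work_mem = 16384                 # KB per sort/hash step, per backend
effective_cache_size = 3000000   # ~24GB: a rough guess at what the OS will cache for postgres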
Re: [PERFORM] BUG #1797: Problem using Limit in a function, seqscan
On Fri, Jul 29, 2005 at 13:52:45 +0100, Magno Leite <[EMAIL PROTECTED]> wrote: > > Description:Problem using Limit in a function, seqscan > > I looked for about this problem in BUG REPORT but I can't find. This is my > problem, when I try to use limit in a function, the Postgre doesn't use my > index, then it use sequencial scan. What is the problem ? You haven't described the problem well enough to allow us to help you and you posted it to the wrong list. This should be discussed on the performance list, not the bug list. It would help if you showed us the query you are running and run it outside of the function with EXPLAIN ANALYSE and show us that output. Depending on what that output shows, we may ask you other questions. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] slow joining very large table to smaller ones
On Thu, Jul 14, 2005 at 16:29:58 -0600, Dan Harris <[EMAIL PROTECTED]> wrote: > > Ok, I tried this one. My ssh keeps getting cut off by a router > somewhere between me and the server due to inactivity timeouts, so > all I know is that both the select and explain analyze are taking > over an hour to run. Here's the explain select for that one, since > that's the best I can get. Are you using NAT at home? That's probably where the issue is. If you have control of that box you can probably increase the timeout to a couple of hours. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] join and query planner
On Wed, Jul 06, 2005 at 18:54:02 -0300, Dario Pudlo <[EMAIL PROTECTED]> wrote: > (first at all, sorry for my english) > Hi. >- Does "left join" restrict the order in which the planner must join > tables? I've read about join, but i'm not sure about left join... The left join operator is not associative so in general the planner doesn't have much flexibility to reorder left (or right) joins. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Why the planner is not using the INDEX .
On Mon, Jul 04, 2005 at 20:29:50 -0400, David Gagnon <[EMAIL PROTECTED]> wrote: > Thanks .. I miss that FK don't create indexed ... since Primary key > implicitly does ... > > I'm a bit surprised of that behavior thought, since it means that if we > delete a row from table A all tables (B,C,D) with FK pointing to this > table (A) must be scanned. But in some applications you don't ever do that, so you don't save anything by having the index for deletes but have to pay the cost to update it when modifying the referencing table. If you think an index will help in your case, just create one. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] tricky query
On Tue, Jun 28, 2005 at 12:02:09 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote: > > Confirmed. Hats off to you, the above some really wicked querying. > IIRC I posted the same question several months ago with no response and > had given up on it. I think your solution (smallest X1 not in X) is a > good candidate for general bits, so I'm passing this to varlena for > review :) > > SELECT t1.id+1 as id_new FROM id_test t1 > WHERE NOT EXISTS > (SELECT t2.id FROM id_test t2 WHERE t2.id = t1.id+1) > ORDER BY t1.id LIMIT 1; You need to rework this to check to see if row '1' is missing. The above returns the start of the first gap after the first row that isn't missing. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
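[Editor's note: one way to fold in the check for a missing id of 1, as a sketch against the id_test table from this thread.]

SELECT CASE
         WHEN NOT EXISTS (SELECT 1 FROM id_test WHERE id = 1) THEN 1
         ELSE (SELECT t1.id + 1
                 FROM id_test t1
                WHERE NOT EXISTS (SELECT 1 FROM id_test t2 WHERE t2.id = t1.id + 1)
                ORDER BY t1.id
                LIMIT 1)
       END AS smallest_missing;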
Re: [PERFORM] tricky query
On Tue, Jun 28, 2005 at 10:21:16 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote: > I need a fast way (sql only preferred) to solve the following problem: > > I need the smallest integer that is greater than zero that is not in the > column of a table. In other words, if an 'id' column has values > 1,2,3,4,6 and 7, I need a query that returns the value of 5. > > I've already worked out a query using generate_series (not scalable) and > pl/pgsql. An SQL only solution would be preferred, am I missing > something obvious? I would expect that using generate_series from 1 to the max (using order by and limit 1 to avoid extra sequential scans), subtracting out the current list using except, and then taking the minimum value would be the best way to do this if the list is pretty dense and you don't want to change the structure. If it is sparse then you can do a special check for 1 and, if it is present, find the first row whose successor is not in the table. That shouldn't be too slow. If you are willing to change the structure you might keep one row for each number and use a flag to mark which ones are empty. If there are relatively few empty rows at any time, then you can create a partial index on the row number for only empty rows. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
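[Editor's note: a sketch of the generate_series/EXCEPT idea described above, again using the id_test table from the thread. It returns NULL if there is no gap, and it materializes one row per candidate id, which is why it only suits fairly dense lists.]

SELECT min(id) AS smallest_missing
FROM (SELECT generate_series(1, (SELECT id FROM id_test ORDER BY id DESC LIMIT 1)) AS id
      EXCEPT
      SELECT id FROM id_test) AS missing;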
Re: [PERFORM] parameterized LIKE does not use index
On Thu, Jun 23, 2005 at 10:33:18 +0200, Kurt De Grave <[EMAIL PROTECTED]> wrote: > > Now it's tempting to dream of some mechanism that could make the > database consider > replanning the query automatically once it knows the parameter, or > choose from > a set of plans depending on the parameter. In this case the general plan > was about three orders > of magnitude slower than the specialized plan. But I guess this case is > not all that common > and the developer can work around it. I remember some discussion about delaying planning until the first actual query so that the actual parameter values could be used when planning. If you really want to have it check the parameters every time, I think you will need to replan every time. I don't know if there is a way to save some of the prepare work while doing this. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Needed: Simplified guide to optimal memory configuration
On Thu, Jun 16, 2005 at 07:46:45 -0700, Todd Landfried <[EMAIL PROTECTED]> wrote: > Yes, it is 7.2. Why? because an older version of our software runs on > RH7.3 and that was the latest supported release of Postgresql for > RH7.3 (that we can find). We're currently ported to 8, but we still > have a large installed base with the other version. You can build it from source. I run 8.0 stable from CVS on a RH 6.1 box. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Needed: Simplified guide to optimal memory configuration
On Wed, Jun 15, 2005 at 02:06:27 -0700, Todd Landfried <[EMAIL PROTECTED]> wrote: > > What's the problem? The sucker gets s-l-o-w on relatively simple > queries. For example, simply listing all of the users online at one > time takes 30-45 seconds if we're talking about 800 users. We've > adjusted the time period for vacuuming the tables to the point where > it occurs once an hour, but we're getting only a 25% performance gain > from that. We're looking at the system settings now to see how those > can be tweaked. It might be useful to see example slow queries and the corresponding explain analyze output. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Index ot being used
On Mon, Jun 13, 2005 at 15:05:00 -0400, Madison Kelly <[EMAIL PROTECTED]> wrote: > Wow! > > With the sequence scan off my query took less than 2sec. When I turned > it back on the time jumped back up to just under 14sec. > > > tle-bu=> set enable_seqscan = off; SET > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, > file_parent_dir ASC, file_name ASC; > > QUERY PLAN > > Index Scan using file_info_7_display_idx on file_info_7 > (cost=0.00..83171.78 rows=25490 width=119) (actual > time=141.405..1700.459 rows=25795 loops=1) >Index Cond: ((file_type)::text = 'd'::text) > Total runtime: 1851.366 ms > (3 rows) > > > tle-bu=> set enable_seqscan = on; SET > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, > file_parent_dir ASC, file_name ASC; > QUERY PLAN > > Sort (cost=14810.92..14874.65 rows=25490 width=119) (actual > time=13605.185..13728.436 rows=25795 loops=1) >Sort Key: file_type, file_parent_dir, file_name >-> Seq Scan on file_info_7 (cost=0.00..11956.84 rows=25490 > width=119) (actual time=0.048..2018.996 rows=25795 loops=1) > Filter: ((file_type)::text = 'd'::text) > Total runtime: 13865.830 ms > (5 rows) > > So the index obiously provides a major performance boost! I just need > to figure out how to tell the planner how to use it... The two things you probably want to look at are (in postgresql.conf): effective_cache_size = 1000 # typically 8KB each random_page_cost = 2 # units are one sequential page fetch cost Increasing effective cache size and decreasing the penalty for random disk fetches will favor using index scans. People have reported that dropping random_page_cost from the default of 4 to 2 works well. Effective cache size should be set to some reasonable estimate of the memory available on your system to postgres, not counting that set aside for shared buffers. However, since the planner thought the index scan plan was going to be 6 times slower than the sequential scan plan, I don't know if tweaking these values enough to switch the plan choice won't cause problems for other queries. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
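[Editor's note: before editing postgresql.conf, the effect can be tried per session; a sketch using the query from this thread, with illustrative numbers only (effective_cache_size is in 8KB pages here).]

SET random_page_cost = 2;
SET effective_cache_size = 50000;   -- ~400MB
EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display
  FROM file_info_7 WHERE file_type = 'd'
 ORDER BY file_type ASC, file_parent_dir ASC, file_name ASC;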
Re: [PERFORM] Index ot being used
On Mon, Jun 13, 2005 at 11:46:46 -0500, Kevin Grittner <[EMAIL PROTECTED]> wrote: > I agree that ignoring useless columns in an ORDER BY clause is less > important than ignoring index columns where the value is fixed. There > is one use case for ignoring useless ORDER BY columns that leaps to > mind, however -- a column is added to the ORDER BY clause of a query to > help out the optimizer, then the indexes are modified such that that > column is no longer useful. Whether this merits the programming effort > and performance hit you describe seems highly questionable, though. I suspect that this isn't a big deal. There was a question like that that has been going back and forth over the last couple of days. If you remove the constant expression from the index, you aren't likely going to use the index anyway, but will instead sort the output rows from either a sequential scan or an index scan based on an index that does use the constant expression. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Index ot being used
On Mon, Jun 13, 2005 at 12:22:14 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: > > I don't think the use-case has been shown that justifies doing this much > work to ignore useless ORDER BY clauses. The examples that have come up > in the past all suggest ignoring index columns not the other way 'round. > Can you make a case that we need to do that part of it? I don't think so. I don't think people are likely to order by constant expressions except by adding them to the front to help optimization. When I was thinking about this I was looking at what equivalences could be used and didn't look back to see which ones would be useful in the normal case. And I think it is a lot more likely people will leave out columns they know not to be relevant than to include them. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Index ot being used
On Mon, Jun 13, 2005 at 09:51:57 -0500, John A Meinel <[EMAIL PROTECTED]> wrote: > > I don't know if there are specific reasons why not, other than just not > being implemented yet. It might be tricky to get it correct (for > instance, how do you know which columns can be added, which ones will be > constant) Perhaps you could just potentially add the WHERE items if they > have an equality constraint with a constant. But I'm guessing there are > more cases than that where the optimization could be performed. I think there is already some intelligence about which expressions are constant in particular parts of a plan. I think you need to be able to do two things. One is to drop constant expressions from order by lists. The other is, when looking for an index to produce a specific ordering, to ignore leading constant expressions when comparing to the order by expressions. > Also, the more options you give the planner, the longer it takes on > average to plan any single query. Yes, it is beneficial for this use > case, but does that balance out slowing down all the other queries by a > tiny bit. But there aren't that many possible indexes, so I don't expect this will slow things down much more than the current check for potentially useful indexes. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Index ot being used
On Mon, Jun 13, 2005 at 00:29:08 -0400, Madison Kelly <[EMAIL PROTECTED]> wrote: > Bruno Wolff III wrote: > >On Sun, Jun 12, 2005 at 23:42:05 -0400, > > Madison Kelly <[EMAIL PROTECTED]> wrote: > > > >>As you probably saw in my last reply, I went back to the old index and > >>tried the query you and Tom Lane recommended. Should this not have > >>caught the index? > > > > > >Probably, but there might be some other reason the planner thought it > >was better to not use it. Using indexes is not always faster. > > > >It would help to see your latest definition of the table and indexes, > >the exact query you used and explain analyze output. > > > > Okay, here's what I have at the moment: > > tle-bu=> \d file_info_7 Table > "public.file_info_7" > Column| Type |Modifiers > --+--+- > file_group_name | text | > file_group_uid | bigint | not null > file_mod_time| bigint | not null > file_name| text | not null > file_parent_dir | text | not null > file_perm| text | not null > file_size| bigint | not null > file_type| character varying(2) | not null default > 'f'::character varying > file_user_name | text | > file_user_uid| bigint | not null > file_backup | boolean | not null default true > file_display | boolean | not null default false > file_restore_display | boolean | not null default false > file_restore | boolean | not null default false > Indexes: > "file_info_7_display_idx" btree (file_parent_dir, file_name) > > > tle-bu=> \d file_info_7_display_idx > Index "public.file_info_7_display_idx" > Column | Type > -+-- > file_parent_dir | text > file_name | text > btree, for table "public.file_info_7" > > > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > file_name ASC; > QUERY PLAN > > Sort (cost=15091.53..15165.29 rows=29502 width=114) (actual > time=12834.933..12955.136 rows=25795 loops=1) >Sort Key: file_parent_dir, file_name >-> Seq Scan on file_info_7 (cost=0.00..11762.44 rows=29502 > width=114) (actual time=0.244..2533.388 rows=25795 loops=1) > Filter: ((file_type)::text = 'd'::text) > Total runtime: 13042.421 ms > (5 rows) > > > Since my last post I went back to a query closer to what I actually > want. What is most important to me is that 'file_parent_dir, file_name, > file_display' are returned and that the results are sorted by > 'file_parent_dir, file_name' and the results are restricted to where > 'file_info='d''. I am guessing you mean 'file_type' instead of 'file_info'. To do this efficiently you want an index on (file_type, file_parent_dir, file_name). Currently you only have an index on (file_parent_dir, file_name) which won't help for this query. You also need to order by file_type even though it will be constant for all of the returned rows in order to help out the planner. This will allow an index scan over the desired rows that returns them in the desired order. Please actually try this before changing anything else. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
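[Editor's note: concretely, the suggestion above amounts to something like the following; the index name is made up.]

CREATE INDEX file_info_7_type_display_idx
  ON file_info_7 (file_type, file_parent_dir, file_name);

SELECT file_parent_dir, file_name, file_display
  FROM file_info_7
 WHERE file_type = 'd'
 ORDER BY file_type ASC, file_parent_dir ASC, file_name ASC;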
Re: [PERFORM] Index ot being used
On Sun, Jun 12, 2005 at 23:42:05 -0400, Madison Kelly <[EMAIL PROTECTED]> wrote: > > As you probably saw in my last reply, I went back to the old index and > tried the query you and Tom Lane recommended. Should this not have > caught the index? Probably, but there might be some other reason the planner thought it was better to not use it. Using indexes is not always faster. It would help to see your latest definition of the table and indexes, the exact query you used and explain analyze output. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Index ot being used
On Sun, Jun 12, 2005 at 22:00:01 -0500, Bruno Wolff III <[EMAIL PROTECTED]> wrote: > On Sun, Jun 12, 2005 at 18:52:05 -0400, > Madison Kelly <[EMAIL PROTECTED]> wrote: > > > > After sending that email I kept plucking away and in the course of > > doing so decided that I didn't need to return the 'file_type' column. > > Other than that, it would see my query now matches what you two have > > recommended in the 'ORDER BY...' front but I still can't get an index > > search. > > No it doesn't. Even if you don't return file_type you still need it > in the order by clause if you want postgres to consider using your > index. I didn't notice that you had changed the index. The reason this index doesn't help is that you can't use it to select on records with the desired file_type. > > Is there some reason you didn't actually try out our suggestion, but are > now asking for more advice? > > > > > Here is the latest query and the new index: > > > > tle-bu=> \d file_info_7_display_idx; > > Index "public.file_info_7_display_idx" > > Column | Type > > -+-- > > file_parent_dir | text > > file_name | text > > btree, for table "public.file_info_7" > > > > tle-bu=> EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display > > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > > file_name ASC; > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Index ot being used
On Sun, Jun 12, 2005 at 18:52:05 -0400, Madison Kelly <[EMAIL PROTECTED]> wrote: > > After sending that email I kept plucking away and in the course of > doing so decided that I didn't need to return the 'file_type' column. > Other than that, it would see my query now matches what you two have > recommended in the 'ORDER BY...' front but I still can't get an index > search. No it doesn't. Even if you don't return file_type you still need it in the order by clause if you want postgres to consider using your index. Is there some reason you didn't actually try out our suggestion, but are now asking for more advice? > > Here is the latest query and the new index: > > tle-bu=> \d file_info_7_display_idx; > Index "public.file_info_7_display_idx" > Column | Type > -+-- > file_parent_dir | text > file_name | text > btree, for table "public.file_info_7" > > tle-bu=> EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > file_name ASC; ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index ot being used
On Sun, Jun 12, 2005 at 10:12:27 -0400, Madison Kelly <[EMAIL PROTECTED]> wrote: > Indexes: > "file_info_7_display_idx" btree (file_type, file_parent_dir, file_name) > Here is my full query: > > tle-bu=> EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type > FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, > file_name ASC; > QUERY PLAN > - This is a case where postgres's planner can't make a deduction needed for it to realize that the index can be used. Try rewriting the query as: SELECT file_name, file_parent_dir, file_type FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, file_parent_dir ASC, file_name ASC; ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Help with rewriting query
On Thu, Jun 09, 2005 at 18:26:09 -0700, Junaili Lie <[EMAIL PROTECTED]> wrote: > Hi Bruno, > I followed your suggestion. > The query plan shows that it uses the index (id, person_id). However, > the execution time is still slow. I have to do ctl-C to stop it. > Maybe something is wrong with my postgresql config. > It's running Solaris on dual Opteron, 4GB. > I allocated around 128MB for sorting and more than 80% for > effective_cache_size and shared_buffers = 32768. > Any further ideas is much appreciated. It might be useful to see that plan and the actual query you used. There were only 569 entries in the people table, so I find it hard to believe that an index look up per person is taking so long that you need to cancel the query. > > > > > On 6/8/05, Bruno Wolff III <[EMAIL PROTECTED]> wrote: > > On Wed, Jun 08, 2005 at 15:48:27 -0700, > > Junaili Lie <[EMAIL PROTECTED]> wrote: > > > Hi, > > > The suggested query below took forever when I tried it. > > > In addition, as suggested by Tobias, I also tried to create index on > > > food(p_id, id), but still no goal (same query plan). > > > Here is the explain: > > > TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where > > > (f.p_id = p.id) group by p.id; > > > > The above is going to require reading all the food table (assuming no > > orphaned records), so the plan below seems reasonable. > > > > > QUERY PLAN > > > > > > GroupAggregate (cost=0.00..214585.51 rows=569 width=16) > > > -> Merge Join (cost=0.00..200163.50 rows=2884117 width=16) > > > Merge Cond: ("outer".id = "inner".p_id) > > > -> Index Scan using person_pkey on person p > > > (cost=0.00..25.17 rows=569 width=8) > > > -> Index Scan using person_id_food_index on food f > > > (cost=0.00..164085.54 rows=2884117 width=16) > > > (5 rows) > > > > > > > > > > > > > > > TEST1=# explain select p.id, (Select f.id from food f where > > > f.p_id=p.id order by f.id desc limit 1) from person p; > > > > Using a subselect seems to be the best hope of getting better performance. > > I think you almost got it right, but in order to use the index on > > (p_id, id) you need to order by f.p_id desc, f.id desc. Postgres won't > > deduce this index can be used because f.p_id is constant in the subselect, > > you need to give it some help. > > > > >QUERY PLAN > > > --- > > > Seq Scan on Person p (cost=1.00..17015.24 rows=569 width=8) > > > SubPlan > > > -> Limit (cost=0.00..12.31 rows=1 width=8) > > > -> Index Scan Backward using food_pkey on food f > > > (cost=0.00..111261.90 rows=9042 width=8) > > > Filter: (p_id = $0) > > > (5 rows) > > > > > > any ideas or suggestions is appreciate. > > > > > > > > > On 6/8/05, Tobias Brox <[EMAIL PROTECTED]> wrote: > > > > [Junaili Lie - Wed at 12:34:32PM -0700] > > > > > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group > > > > > by f.p_id will work. > > > > > But I understand this is not the most efficient way. Is there another > > > > > way to rewrite this query? (maybe one that involves order by desc > > > > > limit 1) > > > > > > > > eventually, try something like > > > > > > > > select p.id,(select f.id from food f where f.p_id=p.id order by f.id > > > > desc limit 1) > > > > from person p > > > > > > > > not tested, no warranties. > > > > > > > > Since subqueries can be inefficient, use "explain analyze" to see which > > > > one > > > > is actually better. > > > > > > > > This issue will be solved in future versions of postgresql. 
> > > > > > > > -- > > > > Tobias Brox, +47-91700050 > > > > Tallinn > > > > > > > > > > ---(end of broadcast)--- > > > TIP 9: the planner will ignore your desire to choose an index scan if your > > > joining column's datatypes do not match > > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help with rewriting query
On Wed, Jun 08, 2005 at 15:48:27 -0700, Junaili Lie <[EMAIL PROTECTED]> wrote: > Hi, > The suggested query below took forever when I tried it. > In addition, as suggested by Tobias, I also tried to create index on > food(p_id, id), but still no goal (same query plan). > Here is the explain: > TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where > (f.p_id = p.id) group by p.id; The above is going to require reading all the food table (assuming no orphaned records), so the plan below seems reasonable. > QUERY PLAN > > GroupAggregate (cost=0.00..214585.51 rows=569 width=16) > -> Merge Join (cost=0.00..200163.50 rows=2884117 width=16) > Merge Cond: ("outer".id = "inner".p_id) > -> Index Scan using person_pkey on person p > (cost=0.00..25.17 rows=569 width=8) > -> Index Scan using person_id_food_index on food f > (cost=0.00..164085.54 rows=2884117 width=16) > (5 rows) > > > > > TEST1=# explain select p.id, (Select f.id from food f where > f.p_id=p.id order by f.id desc limit 1) from person p; Using a subselect seems to be the best hope of getting better performance. I think you almost got it right, but in order to use the index on (p_id, id) you need to order by f.p_id desc, f.id desc. Postgres won't deduce this index can be used because f.p_id is constant in the subselect, you need to give it some help. >QUERY PLAN > --- > Seq Scan on Person p (cost=1.00..17015.24 rows=569 width=8) > SubPlan > -> Limit (cost=0.00..12.31 rows=1 width=8) > -> Index Scan Backward using food_pkey on food f > (cost=0.00..111261.90 rows=9042 width=8) > Filter: (p_id = $0) > (5 rows) > > any ideas or suggestions is appreciate. > > > On 6/8/05, Tobias Brox <[EMAIL PROTECTED]> wrote: > > [Junaili Lie - Wed at 12:34:32PM -0700] > > > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group > > > by f.p_id will work. > > > But I understand this is not the most efficient way. Is there another > > > way to rewrite this query? (maybe one that involves order by desc > > > limit 1) > > > > eventually, try something like > > > > select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc > > limit 1) > > from person p > > > > not tested, no warranties. > > > > Since subqueries can be inefficient, use "explain analyze" to see which one > > is actually better. > > > > This issue will be solved in future versions of postgresql. > > > > -- > > Tobias Brox, +47-91700050 > > Tallinn > > > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
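[Editor's note: spelled out, the rewrite being suggested looks like this, assuming the index on food(p_id, id) mentioned earlier in the thread.]

SELECT p.id,
       (SELECT f.id
          FROM food f
         WHERE f.p_id = p.id
         ORDER BY f.p_id DESC, f.id DESC
         LIMIT 1) AS max_food_id
  FROM person p;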
Re: [PERFORM] full outer performance problem
On Wed, Jun 08, 2005 at 11:37:40 +0200, Kim Bisgaard <[EMAIL PROTECTED]> wrote: > Hi, > > I'm having problems with the query optimizer and FULL OUTER JOIN on > PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. > I might be naive, but I think that it should be possible? > > I have two BIG tables (virtually identical) with 3 NOT NULL columns > Station_id, TimeObs, Temp_, with unique indexes on (Station_id, > TimeObs) and valid ANALYSE (set statistics=100). I want to join the two > tables with a FULL OUTER JOIN. > > When I specify the query as: > > SELECT station_id, timeobs,temp_grass, temp_dry_at_2m >FROM temp_dry_at_2m a >FULL OUTER JOIN temp_grass b >USING (station_id, timeobs) >WHERE station_id = 52981 > AND timeobs = '2004-1-1 0:0:0' > > I get the correct results > > station_id | timeobs | temp_grass | temp_dry_at_2m > +-++ > 52944 | 2004-01-01 00:10:00 || -1.1 > (1 row) > > BUT LOUSY performance, and the following EXPLAIN: > > >QUERY PLAN > -- > Merge Full Join (cost=1542369.83..1618958.58 rows=6956994 width=32) > (actual time=187176.408..201436.264 rows=1 loops=1) > Merge Cond: (("outer".station_id = "inner".station_id) AND > ("outer".timeobs = "inner".timeobs)) > Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND > (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 > 00:00:00'::timestamp without time zone)) > -> Sort (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual > time=145748.253..153851.607 rows=6956994 loops=1) > Sort Key: a.station_id, a.timeobs > -> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 > rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 > loops=1) > -> Sort (cost=334456.38..340472.11 rows=2406292 width=16) (actual > time=31668.876..34491.123 rows=2406292 loops=1) > Sort Key: b.station_id, b.timeobs > -> Seq Scan on temp_grass b (cost=0.00..40658.92 rows=2406292 > width=16) (actual time=0.052..5484.489 rows=2406292 loops=1) > Total runtime: 201795.989 ms > (10 rows) Someone else will need to comment on why Postgres can't use a more efficient plan. What I think will work for you is to restrict the station_id and timeobs on each side and then do a full join. You can try something like the sample query below (which hasn't been tested): SELECT station_id, timeobs, temp_grass, temp_dry_at_2m FROM (SELECT station_id, timeobs, temp_dry_at_2m FROM temp_dry_at_2m WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a FULL OUTER JOIN (SELECT station_id, timeobs, temp_grass FROM temp_grass WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b USING (station_id, timeobs) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] BUG #1697: Select getting slower on continously updating data
On Fri, Jun 03, 2005 at 00:09:00 -0700, Bahadur Singh <[EMAIL PROTECTED]> wrote:
>
> Many thanks for this tip !
> But is this good idea to analyse/vacuuming the
> database tables while updates are taking place..
> Since, I update continuously say (100,000 ) times or
> more the same data set.
>
> This is the result of analyze command.
>
> INFO: analyzing "public.salesarticle"
> INFO: "salesarticle": scanned 3000 of 20850 pages,
> containing 62 live rows and 134938 dead rows; 62 rows
> in sample, 431 estimated total rows
>
> Gesamtlaufzeit der Abfrage: 5531 ms.
> Total Time Taken : 5531 ms.
>
> Can you suggest me some clever way to do so, because I
> would prefer to do vacuuming while database is not
> loaded with queries/transactions.

While that may be a nice preference, under your usage pattern that does not
appear to be a good idea. As long as your disk I/O isn't saturated, you want
to be running vacuums a lot more often than you are. (Analyze should only be
needed if the distribution of values is changing constantly. An example would
be timestamps indicating when an update occurred.)

---(end of broadcast)---
TIP 8: explain analyze is your friend
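A minimal sketch of what "a lot more often" could look like in practice (an illustration, not something from the thread; the table name is taken from the ANALYZE output quoted above). A plain VACUUM, unlike VACUUM FULL, does not take a lock that blocks reads or writes, so it can simply be issued periodically while the updates continue:

-- run periodically (e.g. from cron or from the application's update loop)
-- while the updates are going on; it reclaims the dead row versions so the
-- table and its indexes stop growing
VACUUM ANALYZE public.salesarticle;

If the distribution of values really is static, the ANALYZE step can be dropped and a plain VACUUM of the table is enough.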
Re: [PERFORM] Index on a NULL-value
On Tue, May 31, 2005 at 11:31:58 +0800, Tobias Brox <[EMAIL PROTECTED]> wrote:
> [Tobias Brox]
> > test=# set enable_seqscan=off;
>
> [Bruno Wolff III - Mon at 10:16:53PM -0500]
> > It isn't surprising that an index wasn't used since a sequential scan is
> > going to be faster in your test case.
> >
> > If you want to test this out, you want to use realistically sized tables.
>
> Wrong. In this case I was not wondering about the planner's choice of not
> using the index, but the fact that the planner could not find the index at
> all. Reproducing it on a simple table in a test environment was a valid
> strategy to solve this specific problem.

I missed that you turned sequential scans off for your test.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Index on a NULL-value
On Tue, May 31, 2005 at 11:21:20 +0800, Tobias Brox <[EMAIL PROTECTED]> wrote:
> [Tobias Brox - Tue at 11:02:07AM +0800]
> > test=# explain select * from mock where b is NULL;
> >                       QUERY PLAN
> >
> >  Seq Scan on mock (cost=1.00..10020.00 rows=6 width=8)
> >    Filter: (b IS NULL)
> > (2 rows)
>
> (...)
>
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> > joining column's datatypes do not match
>
> That tip helped me :-)
>
> test=# explain select * from mock where (b IS NULL)=true;
>                                QUERY PLAN
> --
>  Index Scan using b_is_null on mock (cost=0.00..4.68 rows=1 width=8)
>    Index Cond: ((b IS NULL) = true)
> (2 rows)

I looked back at your first example and saw that you didn't use a partial
index, which is why you had to contort things to make it possible to use an
indexed search. (Though the planner really shouldn't have chosen the index
scan here, since all of the rows should fit in one disk block and an index
scan requires more disk reads than a sequential scan for the test case you
used.)

You want something like this:
CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

The advantage is that the index can be a lot smaller than an index over all
of the rows in the case where only a small fraction of rows have a null value
for b. (If this isn't the case, you probably don't want the index.)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
               http://www.postgresql.org/docs/faq
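As an illustration of the size difference Bruno mentions (a sketch, not taken from the thread; it assumes a table where only a small fraction of rows have b IS NULL):

CREATE INDEX mock_b_null_idx ON mock(b) WHERE b IS NULL;  -- indexes only the rows where b is null
CREATE INDEX mock_b_full_idx ON mock(b);                  -- indexes every row in the table

The partial index only has entries for the NULL rows, so it stays small and cheap to maintain even when the table itself is large.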
Re: [PERFORM] Index on a NULL-value
On Tue, May 31, 2005 at 11:02:07 +0800, Tobias Brox <[EMAIL PROTECTED]> wrote:
> I read in the manual today:
>
> Indexes are not used for IS NULL clauses by default. The best way to use
> indexes in such cases is to create a partial index using an IS NULL
> predicate.
>
> This is from the documentation for PostgreSQL 8. I did not find anything
> equivalent in the 7.4.8-documentation.
>
> I wasn't aware of this until it became an issue :-) Well, so I follow the
> tip but in vain. Reduced and reproduced like this in PostgreSQL 7.4.7:
>
> test=# create table mock(a int, b int);
> CREATE TABLE
> test=# create index b_is_null on mock((b IS NULL));
> CREATE INDEX
> test=# insert into mock values (10,20);
> INSERT 70385040 1
> test=# insert into mock values (20,30);
> INSERT 70385041 1
> test=# insert into mock values (30, NULL);
> INSERT 70385042 1
> test=# set enable_seqscan=off;
> SET
> test=# explain select * from mock where b is NULL;
>                       QUERY PLAN
>
>  Seq Scan on mock (cost=1.00..10020.00 rows=6 width=8)
>    Filter: (b IS NULL)
> (2 rows)
>
> vacuum analyze also didn't help to recognize the index ;-)

It isn't surprising that an index wasn't used since a sequential scan is
going to be faster in your test case.

If you want to test this out, you want to use realistically sized tables.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
       joining column's datatypes do not match
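For example, a test table big enough for the planner's choices to mean something could be built along these lines (a sketch of my own, not from the thread; it assumes a server new enough to have generate_series(), i.e. 8.0 or later):

create table mock(a int, b int);
-- roughly one row in a thousand gets a NULL in b
insert into mock
  select i, case when i % 1000 = 0 then null else i end
    from generate_series(1, 1000000) as s(i);
create index b_is_null on mock((b IS NULL));
vacuum analyze mock;
explain select * from mock where (b IS NULL) = true;

With a table like this the index scan should actually be the cheaper plan for the few NULL rows, so no enable_seqscan trickery is needed to see the index being considered.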
Re: [PERFORM] Can anyone explain this: duplicate dbs.
On Tue, May 24, 2005 at 21:39:15 -0500, John A Meinel <[EMAIL PROTECTED]> wrote:
>
> By the way, I think doing:
>
> CREATE DATABASE tempdb WITH TEMPLATE = originaldb;
>
> Is a much faster way of doing dump and load. I *think* it would recreate
> indexes, etc. If it just does a copy it may not show the dump/restore
> improvement.

You need to be careful when doing this. See section 18.3 of the 8.0 docs
for caveats.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
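The main caveat from that section of the documentation is that no other session may be connected to the source database while it is being copied, and that the copy is a file-level copy of the template rather than a dump and restore. A minimal sketch (the database names are just placeholders from the quoted message):

-- run from a connection to some other database, e.g. template1;
-- this fails if any other session is connected to originaldb
CREATE DATABASE tempdb WITH TEMPLATE = originaldb;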
Re: [PERFORM] Select performance vs. mssql
On Tue, May 24, 2005 at 08:36:36 -0700, mark durrant <[EMAIL PROTECTED]> wrote:
>
> --MSSQL's ability to hit the index only and not having
> to go to the table itself results in a _big_
> performance/efficiency gain. If someone who's in
> development wants to pass this along, it would be a
> nice addition to PostgreSQL sometime in the future.
> I'd suspect that as well as making one query faster,
> it would make everything else faster/more scalable as
> the server load is so much less.

This gets brought up a lot. The problem is that the index doesn't include
information about whether the current transaction can see the referenced row.
Putting this information in the index would add significant overhead to every
update, and the opinion of the developers is that it would be a net loss
overall.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster