Re: [PERFORM] FreeBSD config
On Thu, Feb 26, 2004 at 06:06:06PM -0500, Christopher Weimann wrote: > On 02/26/2004-11:16AM, Dror Matalon wrote: > > > > > > effective_cache_size changes no cache settings for postgresql, it simply > > > acts as a hint to the planner about how much of the dataset your OS / > > > Kernel / Disk cache can hold. > > > > I understand that. The question is why have the OS, in this case FreeBSD, > > use only 200 Megs for disk cache and not more. Why not double the > > vfs.hibufspace to 418119680 and double the effective_cache_size to 51040. > > > > FreeBSD uses ALL RAM that isn't being used for something else as > its disk cache. The "effective_cache_size" in the PostgreSQL config > has no effect on how the OS chooses to use memory; it is just a hint > to the PostgreSQL planner so it can guess the likelihood of > what it is looking for being in the cache. Let me try to say it again. I know that setting effective_cache_size doesn't affect the OS's cache. I know it just gives Postgres an *idea* of how much cache the OS is using. I know that. I also know that a correct hint helps performance. I've read Matt Dillon's discussion about the FreeBSD VM at http://www.daemonnews.org/21/freebsd_vm.html and I didn't see him saying that FreeBSD uses all the free RAM for disk cache. Would you care to provide a URL pointing to that? Assuming you are correct, why has the ongoing recommendation been to use hibufspace/8192 as the effective_cache_size? It seems like it would be quite a bit more on machines with lots of RAM. Regards, Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] FreeBSD config
On Mon, Mar 01, 2004 at 08:30:58PM +1300, Mark Kirkwood wrote: > > > Shridhar Daithankar wrote: > > >Dror Matalon wrote: > > > >>I've read Matt Dillon's discussion about the FreeBSD VM at > >>http://www.daemonnews.org/21/freebsd_vm.html and I didn't see him > >>saying that FreeBSD uses all the free RAM for disk cache. Would you care > >>to provide a URL pointing to that? > > > > > > I noticed this passage too, but ... > Quoting from http://www.daemonnews.org/21/freebsd_vm.html : > > *When To Free a Page* > > Since the VM system uses all available memory for disk caching, there The VM system, as you can see from the article, is focused on paging and caching of programs and program data. Is the cache for disk reads and writes thrown into the mix as well? > are usually very few truly-free pages... > > > Got to say - a very interesting discussion you have all been having, I > am now quite confused about what those vfs.*buf* variables actually do... Same here. > > Please feed back any clarifications from the FreeBSD experts to this list! > > regards > > Mark > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] FreeBSD config
I guess the thing to do is to move this topic over to a FreeBSD list where we can get more definitive answers on how disk caching is handled. I asked here since I know that FreeBSD is often recommended, http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html# as a good platform for Postgres, and with modern machines often having gigabytes of memory, the question of possibly having a disk cache of only 200MB is one that will often come up. On Fri, Feb 27, 2004 at 12:46:08PM +0530, Shridhar Daithankar wrote: > Dror Matalon wrote: > > >Let me try to say it again. I know that setting effective_cache_size > >doesn't affect the OS's cache. I know it just gives Postgres an *idea* > >of how much cache the OS is using. I know that. I also know that a > >correct hint helps performance. > > > >I've read Matt Dillon's discussion about the FreeBSD VM at > >http://www.daemonnews.org/21/freebsd_vm.html and I didn't see him > >saying that FreeBSD uses all the free RAM for disk cache. Would you care > >to provide a URL pointing to that? > > I don't believe FreeBSD uses everything available, unlike Linux. It is > actually a good thing. If you have 1GB RAM and kernel buffers set at 600MB, > you are guaranteed to have some memory in crunch situations. > > As for your original question, I think you can increase the kernel buffer > sizes for VFS safely. However, remember that the more you dedicate to kernel > buffers, the less space you have in case of a crunch, for whatever reason. > > FreeBSD gives you a control which Linux does not. Use it to the best of your > advantage. > > Shridhar -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] FreeBSD config
Hi, We have Postgres running on FreeBSD 4.9 with 2 Gigs of memory. As per repeated advice on the mailing lists we configured effective_cache_size = 25520, which you get by doing `sysctl -n vfs.hibufspace` / 8192. Which results in using 200 Megs for disk caching. Is there a reason not to increase hibufspace beyond the 200 megs and provide a bigger cache to Postgres? I looked both on the Postgres and FreeBSD mailing lists and couldn't find a good answer to this. If yes, any suggestions on what would be a good size on a 2 Gig machine? Regards, Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
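[Editorial sketch: the formula discussed in this thread, as a runnable shell fragment. Since vfs.hibufspace only exists on FreeBSD, the value from this thread is hard-coded here; on a real FreeBSD box you would use `hibufspace=$(sysctl -n vfs.hibufspace)` instead.]

```shell
# effective_cache_size is measured in 8 kB pages, so divide the byte
# count of the kernel buffer space by 8192.
hibufspace=209059840   # sample value matching effective_cache_size = 25520
echo "effective_cache_size = $((hibufspace / 8192))"
```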
Re: [PERFORM] FreeBSD config
On Fri, Feb 27, 2004 at 05:47:47AM +0800, Christopher Kings-Lynne wrote: > > We have postgres running on freebsd 4.9 with 2 Gigs of memory. As per > > repeated advice on the mailing lists we configured effective_cache_size > > = 25520 which you get by doing `sysctl -n vfs.hibufspace` / 8192 > > > > Which results in using 200 Megs for disk caching. > > effective_cache_size does nothing of the sort. Check your > shared_buffers value... Sigh. http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html says:

effective_cache_size
    Sets the optimizer's assumption about the effective size of the disk cache (that is, the portion of the kernel's disk cache that will be used for PostgreSQL data files). This is measured in disk pages, which are normally 8 kB each.

http://archives.postgresql.org/pgsql-performance/2003-07/msg00159.php talks about how to programmatically determine the right setting for effective_cache_size:

case `uname` in
    "FreeBSD")
        echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
        ;;
    *)
        echo "Unable to automatically determine the effective cache size" >> /dev/stderr
        ;;
esac

which brings me back to my question: why not make FreeBSD use more of its memory for disk caching and then tell Postgres about it?

> > Is there a reason not to increase the hibufspace beyond the 200 megs and > > provide a bigger cache to postgres? I looked both on the postgres and > > freebsd mailing lists and couldn't find a good answer to this. > > Well, maybe, but not necessarily. It's better to leave the OS to look after > most of your RAM. > > Chris > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] FreeBSD config
Thanks for the pointer. So maxbufspace = nbuf * BKVASIZE; Which is confirmed in http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2003-09/0045.html and it looks like there's a patch by Sean Chittenden at http://people.freebsd.org/~seanc/patches/patch-HEAD-kern.nbuf that does what I was asking. Seems a little on the bleeding edge. Has anyone tried this? On Thu, Feb 26, 2004 at 04:36:01PM -0600, Kevin Barnard wrote: > On 26 Feb 2004 at 13:58, Dror Matalon wrote: > > > > > which brings me back to my question why not make FreeBSD use more of its > > memory for disk caching and then tell Postgres about it. > > > > I think there is some confusion about maxbufspace and hibufspace. I'm looking at a > comment in the FreeBSD 4.9 source that explains this. I think you will want to > increase effective_cache_size to match maxbufspace, not hibufspace, but I could be wrong. > > $FreeBSD: src/sys/kern/vfs_bio.c,v 1.242.2.21 line 363 > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] FreeBSD config
On Thu, Feb 26, 2004 at 11:55:31AM -0700, scott.marlowe wrote: > On Thu, 26 Feb 2004, Dror Matalon wrote: > > > Hi, > > > > We have postgres running on freebsd 4.9 with 2 Gigs of memory. As per > > repeated advice on the mailing lists we configured effective_cache_size > > = 25520 which you get by doing `sysctl -n vfs.hibufspace` / 8192 > > > > Which results in using 200 Megs for disk caching. > > > > Is there a reason not to increase the hibufspace beyond the 200 megs and > > provide a bigger cache to postgres? I looked both on the postgres and > > freebsd mailing lists and couldn't find a good answer to this. > > Actually, I think you're confusing effective_cache_size with > shared_buffers. No, I'm not. > > effective_cache_size changes no cache settings for postgresql, it simply > acts as a hint to the planner about how much of the dataset your OS / > Kernel / Disk cache can hold. I understand that. The question is why have the OS, in this case FreeBSD, use only 200 Megs for disk cache and not more. Why not double the vfs.hibufspace to 418119680 and double the effective_cache_size to 51040. > > Making it bigger only tells the query planner it's more likely the data > it's looking for will be in cache. > > shared_buffers, OTOH, sets the amount of cache that postgresql uses. It's > generally considered that 256 Megs or 1/4 of memory, whichever is LESS, is > a good setting for production database servers. > Actually, last I looked, I thought that the recommended max shared buffers was 10,000 (80MB), even on machines with large amounts of memory. Regards, Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
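[Editorial sketch: the quoted rule of thumb — 256 Megs or 1/4 of memory, whichever is less — worked out in 8 kB pages. The helper name is invented for illustration; it is not a PostgreSQL API.]

```python
def shared_buffers_pages(ram_bytes, page=8192):
    """Rule of thumb from the message above: min(256 MB, RAM/4), in 8 kB pages."""
    cap = 256 * 1024 * 1024        # the 256 MB ceiling
    return min(cap, ram_bytes // 4) // page

# On the 2 Gig box in this thread, RAM/4 = 512 MB exceeds the cap,
# so the rule gives 256 MB worth of pages.
print(shared_buffers_pages(2 * 1024**3))   # -> 32768
```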
Re: [PERFORM] tuning questions
On Thu, Dec 04, 2003 at 11:59:32AM -0500, Jeff wrote: > On Thu, 04 Dec 2003 08:06:23 -0800 > Jack Coates <[EMAIL PROTECTED]> wrote: > > > testbed: > > dual P3 1.3 GHz box with 2GB RAM > > two IDE 120G drives on separate channels (DMA on), OS on one, DB on > > the other, some swap on each (totalling 2.8G). > > RH Linux 8. > > Side Note: be sure to turn off write caching on those disks or you may > have data corruption in the event of a failure I've seen this comment several times from different people. Would someone care to explain how you would get data corruption? I thought that the whole idea of the log is to provide a journal similar to what you get in a journaling file system. In other words, the db writes a series of transactions to the log and marks that "log entry" (I don't know the right nomenclature) as valid. When the db crashes, it reads the log, discards the last "log entry" if it wasn't marked as valid, and "replays" any transactions that haven't been committed to the db. The end result being that you might lose your last transaction(s) if the db crashes, but nothing ever gets corrupted. So what am I missing in this picture? Regards, Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
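[Editorial sketch: the recovery model described in this message, as a toy simulation. This is only an illustration of the idea — discard an unmarked tail entry, replay the rest — not PostgreSQL's actual WAL format. The danger the "side note" warns about is that a disk write cache can report an entry as durably written (and hence "valid") before it actually reaches the platters, which breaks this model's core assumption.]

```python
def replay(log):
    """Return the transactions that survive crash recovery."""
    recovered = []
    for entry in log:
        if entry["valid"]:            # fully written and marked valid: replay it
            recovered.append(entry["txn"])
        else:                         # torn/unmarked tail entry: discard and stop
            break
    return recovered

log = [
    {"txn": "t1", "valid": True},
    {"txn": "t2", "valid": True},
    {"txn": "t3", "valid": False},    # crash hit mid-write
]
print(replay(log))  # -> ['t1', 't2']: the last transaction is lost, nothing corrupted
```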
Re: [PERFORM] For full text indexing, which is better, tsearch2 or
On Thu, Nov 27, 2003 at 08:51:14AM +0800, Christopher Kings-Lynne wrote: > >Which one is better (performance/easier to use), > >tsearch2 or fulltextindex? > >there is an example how to use fulltextindex in the > >techdocs, but I checked the contrib/fulltextindex > >package, there is a WARNING that fulltextindex is > >much slower than tsearch2. but tsearch2 seems > >complex to use, and I cannot find a good example. > >Which one should I use? Any suggestions? > > I believe I wrote that warning :) > > Tsearch2 is what you should use. Yes, it's more complicated but it's > HEAPS faster and seriously powerful. > Can you provide some numbers please, both for creating full text indexes as well as for searching them? I tried to use tsearch and it seemed like just creating a full text index on million+ records took forever. > Just read the README file. > > You could also try out the original tsearch (V1), but that will probably > be superseded soon, now that tsearch2 is around. > > Chris > > > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Maximum Possible Insert Performance?
But the permissions of the base ramdisk might be wrong. I'd su to the user that you run postgres as (probably postgres), and make sure that you can go to the directory where the log and the database files are and make sure you can see the files. On Wed, Nov 26, 2003 at 10:03:47AM -0800, William Yu wrote: > Tom Lane wrote: > >William Yu <[EMAIL PROTECTED]> writes: > > > >>I then tried to put the WAL directory onto a ramdisk. I turned off > >>swapping, created a tmpfs mount point and copied the pg_xlog directory > >>over. Everything looked fine as far as I could tell but Postgres just > >>panic'd with a "file permissions" error. Anybody have thoughts to why > >>tmpfs would not work? > > > > > >I'd say you got the file or directory ownership or permissions wrong. > > I did a mv instead of a cp which duplicates ownership & permissions exactly. > > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] why index scan not working when using 'like'?
Hi, Searches with LIKE or regexes often can't use an index. Think of the index as a sorted list of your items. It's easy to find an item when you know what it starts with, so 'mif%' should use the index. But when you use a LIKE pattern that starts with '%', the index is useless and the search needs to do a sequential scan. Regards, Dror

On Tue, Nov 25, 2003 at 07:48:49PM +, LIANHE SHAO wrote: > Hi all, > > I want to use the index on the gene_symbol column in my > query, and gene_symbol is indexed. But when I use > lower(gene_symbol) like lower('%mif%'), the index > is not used, while when I change to > lower(gene_symbol) = lower('mif'), the index is used > and an index scan works. But this is not what I want: I > want all the gene_symbols containing the substring > 'mif' to be pulled out, not necessarily exact matches. > > Could anybody give me some hints on how to deal with > this? If I do not use the index, the query takes too long.
>
> PGA> explain select distinct probeset_id, chip, gene_symbol, title, sequence_description, pfam from affy_array_annotation where lower(gene_symbol) like upper('%mif%');
>                           QUERY PLAN
> -
>  Unique (cost=29576.44..29591.44 rows=86 width=265)
>    -> Sort (cost=29576.44..29578.59 rows=857 width=265)
>         Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam
>         -> Seq Scan on affy_array_annotation (cost=0.00..29534.70 rows=857 width=265)
>              Filter: (lower((gene_symbol)::text) ~~ 'MIF%'::text)
> (5 rows)
>
> PGA=> explain select distinct probeset_id, chip, gene_symbol, title, sequence_description, pfam from affy_array_annotation where lower(gene_symbol) = upper('%mif%');
>                           QUERY PLAN
> -
>  Unique (cost=3433.44..3448.44 rows=86 width=265)
>    -> Sort (cost=3433.44..3435.58 rows=857 width=265)
>         Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam
>         -> Index Scan using gene_symbol_idx_fun1 on affy_array_annotation (cost=0.00..3391.70 rows=857 width=265)
>              Index Cond: (lower((gene_symbol)::text) = '%MIF%'::text)
> (5 rows)
>
> Regards, William > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
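[Editorial sketch: the sorted-list intuition above, as a toy model. A sorted Python list stands in for a B-tree index; the names and data are invented for illustration.]

```python
# A prefix search can jump to the right place in a sorted "index" and
# stop early; a substring search gets no help from the ordering and
# must examine every entry -- the analogue of a sequential scan.
import bisect

index = sorted(["actb", "gapdh", "mif", "mif4gd", "tnf"])  # toy sorted index

def prefix_search(prefix):
    """Like 'mif%': range scan starting at the first candidate."""
    i = bisect.bisect_left(index, prefix)
    out = []
    while i < len(index) and index[i].startswith(prefix):
        out.append(index[i])
        i += 1
    return out

def substring_search(sub):
    """Like '%mif%': ordering is useless, so check every entry."""
    return [w for w in index if sub in w]

print(prefix_search("mif"))     # -> ['mif', 'mif4gd'], touching only 2 entries
print(substring_search("mif"))  # -> ['mif', 'mif4gd'], touching all 5 entries
```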
Re: [PERFORM] Various performance questions
On Mon, Oct 27, 2003 at 07:52:06AM -0500, Christopher Browne wrote: > In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote: > > I was answering an earlier response that suggested that maybe the actual > > counting took time so it would take quite a bit longer when there are > > more rows to count. > > Well, if a "where clause" allows the system to use an index to search > for the subset of elements, that would reduce the number of pages that > have to be examined, thereby diminishing the amount of work. > > Why don't you report what EXPLAIN ANALYZE returns as output for the > query with WHERE clause? That would allow us to get more of an idea > of what is going on...

Here it is once again, and I've added another data point, "channel < 1000", which takes even less time than "channel < 5000". It almost seems like the optimizer knows that it can skip certain rows ("rows=4910762" vs. "rows=1505605"). But how can it do that without using an index or actually looking at each row?

zp1936=> EXPLAIN ANALYZE select count(*) from items;
                          QUERY PLAN
--
 Aggregate (cost=245044.53..245044.53 rows=1 width=0) (actual time=55806.893..55806.897 rows=1 loops=1)
   -> Seq Scan on items (cost=0.00..232767.62 rows=4910762 width=0) (actual time=0.058..30481.482 rows=4910762 loops=1)
 Total runtime: 55806.992 ms
(3 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 5000;
                          QUERY PLAN
--
 Aggregate (cost=248808.54..248808.54 rows=1 width=0) (actual time=26071.264..26071.269 rows=1 loops=1)
   -> Seq Scan on items (cost=0.00..245044.52 rows=1505605 width=0) (actual time=0.161..17623.033 rows=1632057 loops=1)
        Filter: (channel < 5000)
 Total runtime: 26071.361 ms
(4 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 1000;
                          QUERY PLAN
---
 Aggregate (cost=245429.74..245429.74 rows=1 width=0) (actual time=10225.272..10225.276 rows=1 loops=1)
   -> Seq Scan on items (cost=0.00..245044.52 rows=154085 width=0) (actual time=7.633..10083.246 rows=25687 loops=1)
        Filter: (channel < 1000)
 Total runtime: 10225.373 ms
(4 rows)

> -- > (format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org") > http://www3.sympatico.ca/cbbrowne/spiritual.html > When replying, it is often possible to cleverly edit the original > message in such a way as to subtly alter its meaning or tone to your > advantage while appearing that you are taking pains to preserve the > author's intent. As a bonus, it will seem that your superior > intellect is cutting through all the excess verbiage to the very heart > of the matter. -- from the Symbolics Guidelines for Sending Mail > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Various performance questions
On Mon, Oct 27, 2003 at 11:12:37AM -0500, Vivek Khera wrote: > >>>>> "DM" == Dror Matalon <[EMAIL PROTECTED]> writes: > > DM> effective_cache_size = 25520 -- freebsd formula: vfs.hibufspace / 8192 > > DM> 1. While it seems to work correctly, I'm unclear on why this number is > DM> correct. 25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it > DM> seems like the number should be more like 1 - 1.5 Gigs. > > Nope, that's correct... I know it's correct. I was asking why it's correct. > > > -- > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > Vivek Khera, Ph.D.Khera Communications, Inc. > Internet: [EMAIL PROTECTED] Rockville, MD +1-240-453-8497 > AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Various performance questions
On Mon, Oct 27, 2003 at 12:52:27PM +0530, Shridhar Daithankar wrote: > Dror Matalon wrote: > > >On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote: > >>Most of the time involves: > >> > >>a) Reading each page of the table, and > >>b) Figuring out which records on those pages are still "live." > > > > > >The table has been VACUUM ANALYZED so that there are no "dead" records. > >It's still not clear why select count() would be slower than select with > >a "where" clause. > > Do a vacuum verbose full and then everything should be within a small range > of each other. > I did vacuum full verbose and the results are the same as before: 55 seconds for count(*) and 26 seconds for count(*) where channel < 5000. > Also in the where clause, does explicit typecasting help? > > Like 'where channel<5000::int2;' It makes no difference. > > HTH > > Shridhar > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Various performance questions
On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote: > [EMAIL PROTECTED] (Dror Matalon) wrote: > > On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote: > >> Dror Matalon <[EMAIL PROTECTED]> writes: > >> > >> > explain analyze select count(*) from items where channel < 5000; > >> > QUERY PLAN > >> > -- > >> > Aggregate (cost=249141.54..249141.54 rows=1 width=0) (actual > >> > time=26224.603..26224.608 rows=1 loops=1) > >> >-> Seq Scan on items (cost=0.00..245377.52 rows=1505605 width=0) (actual > >> > time=7.599..17686.869 rows=1632057 loops=1) > >> > Filter: (channel < 5000) > >> > Total runtime: 26224.703 ms > >> > > >> > > >> > How can it do a sequential scan and apply a filter to it in less time > >> > than the full sequential scan? Is it actually using an index without > >> > really telling me? > >> > >> It's not using the index and not telling you. > >> > >> It's possible the count(*) operator itself is taking some time. Postgres > > > > I find it hard to believe that the actual counting would take a > > significant amount of time. > > Most of the time involves: > > a) Reading each page of the table, and > b) Figuring out which records on those pages are still "live." The table has been VACUUM ANALYZED so that there are no "dead" records. It's still not clear why select count() would be slower than select with a "where" clause. > > What work were you thinking was involved in doing the counting? I was answering an earlier response that suggested that maybe the actual counting took time so it would take quite a bit longer when there are more rows to count. > > >> doesn't have to call it on the rows that don't match the where clause. How > >> long does "explain analyze select 1 from items" with and without the where > >> clause take? > > > > Same as count(*). Around 55 secs with no where clause, around 25 secs > > with. > > Good; at least that's consistent... 
> -- > (format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org") > http://www3.sympatico.ca/cbbrowne/postgresql.html > Signs of a Klingon Programmer #2: "You question the worthiness of my > code? I should kill you where you stand!" > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Various performance questions
On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote: > Dror Matalon <[EMAIL PROTECTED]> writes: > > > explain analyze select count(*) from items where channel < 5000; > > QUERY PLAN > > -- > > Aggregate (cost=249141.54..249141.54 rows=1 width=0) (actual > > time=26224.603..26224.608 rows=1 loops=1) > >-> Seq Scan on items (cost=0.00..245377.52 rows=1505605 width=0) (actual > > time=7.599..17686.869 rows=1632057 loops=1) > > Filter: (channel < 5000) > > Total runtime: 26224.703 ms > > > > > > How can it do a sequential scan and apply a filter to it in less time > > than the full sequential scan? Is it actually using an index without > > really telling me? > > It's not using the index and not telling you. > > It's possible the count(*) operator itself is taking some time. Postgres I find it hard to believe that the actual counting would take a significant amount of time. > doesn't have to call it on the rows that don't match the where clause. How > long does "explain analyze select 1 from items" with and without the where > clause take? Same as count(*). Around 55 secs with no where clause, around 25 secs with. > > What version of postgres is this?. In 7.4 (and maybe 7.3?) count() uses an This is 7.4. > int8 to store its count so it's not limited to 4 billion records. > Unfortunately int8 is somewhat inefficient as it has to be dynamically > allocated repeatedly. It's possible it's making a noticeable difference, > especially with all the pages in cache, though I'm a bit surprised. There's > some thought about optimizing this in 7.5. > > -- > greg > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PERFORM] Various performance questions
Hi, We're in the process of setting up a new database server. The application is an online RSS aggregator which you can see at www.fastbuzz.com (still running with the old hardware). The new machine is a dual Xeon with 2 Gigs of RAM. The OS is FreeBSD 4.9.

shared_buffers = 1
sort_mem = 32768
effective_cache_size = 25520 -- freebsd formula: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is correct. 25520*8 = 204160, or 200 Megs. On a machine with 2 Gigs it seems like the number should be more like 1 - 1.5 Gigs.

2. The main performance challenges are with the items table, which has around five million rows and grows at the rate of more than 100,000 rows a day. If I do a select count(*) from the items table it takes 55 - 60 seconds to execute. I find it interesting that it takes that long whether it's doing it the first time and fetching the pages from disk, or on subsequent requests where it fetches the pages from memory. I know that it's not touching the disks because I'm running an iostat in a different window. Here's the explain analyze:

explain analyze select count(*) from items;
                          QUERY PLAN
--
 Aggregate (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
   -> Seq Scan on items (cost=0.00..233100.62 rows=4910762 width=0) (actual time=0.054..30220.641 rows=4910762 loops=1)
 Total runtime: 55246.129 ms
(3 rows)

and the number of pages:

select relpages from pg_class where relname = 'items';
 relpages
--
   183993

So does it make sense that it would take close to a minute to count the 5 million rows even if all pages are in memory?

3. Relpages is 183993, so the file size should be 183993*8192 = 1507270656, roughly 1.5 gig. The actual disk size is 1073741824, or roughly 1 gig. Why the difference?

4. If I put a certain filter/condition on the query it tells me that it's doing a sequential scan, and yet it takes less time than a full sequential scan:

explain analyze select count(*) from items where channel < 5000;
                          QUERY PLAN
--
 Aggregate (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
   -> Seq Scan on items (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
        Filter: (channel < 5000)
 Total runtime: 26224.703 ms

How can it do a sequential scan and apply a filter to it in less time than the full sequential scan? Is it actually using an index without really telling me?

Here's the structure of the items table:

    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 articlenumber | integer                  | not null
 channel       | integer                  | not null
 title         | character varying        |
 link          | character varying        |
 description   | character varying        |
 comments      | character varying(500)   |
 dtstamp       | timestamp with time zone |
 signature     | character varying(32)    |
 pubdate       | timestamp with time zone |
Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)
    "items_channel_tstamp" btree (channel, dtstamp)

5. Any other comments/suggestions on the above setup.

Thanks, Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
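[Editorial sketch: checking the size arithmetic in question 3 above. The values are the ones quoted in the message; the variable names are invented for illustration.]

```python
# Expected on-disk size is the page count times the page size (8 kB by
# default), so relpages understates nothing by itself; the mismatch with
# the actual file size is exactly the question being asked.
BLCKSZ = 8192                      # default PostgreSQL page size in bytes
relpages = 183993
expected = relpages * BLCKSZ
actual = 1073741824                # observed size from the message

print(expected)                    # -> 1507270656 (roughly 1.5 gig)
print(expected - actual)           # -> 433528832 bytes unaccounted for
```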
Re: [PERFORM] Speeding up Aggregates
On Fri, Oct 10, 2003 at 10:32:32AM -0700, Josh Berkus wrote: > Dror, > > > Ouch. I just double checked and you're right. Is this considered a bug, > > or just an implementation issue? > > It's an implementation issue, which may be fixed by 7.5 but not sooner. > Basically, the free ability of PostgreSQL users to define their own > aggregates limits our ability to define query planner optimization for > aggregates. Only recently has anyone suggested a feasible way around this. > > > While I've seen this hint a few times in the lists, it seems like it's > > one of those magic incantations that those in the know, know about, and > > that people new to postgres are going to be surprised by the need to use > > this idiom. > > It IS in the FAQ. It might be a good idea to put it in its own section rather than under "My queries are slow or don't make use of the indexes. Why?" Also, for 7.4 you might want to take out item 4.22, "Why are my subqueries using IN so slow?" > > -- > Josh Berkus > Aglio Database Solutions > San Francisco -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speeding up Aggregates
On Thu, Oct 09, 2003 at 08:35:22PM -0500, Bruno Wolff III wrote: > On Thu, Oct 09, 2003 at 17:44:46 -0700, > Dror Matalon <[EMAIL PROTECTED]> wrote: > > > > How is doing order by limit 1 faster than doing max()? Seems like the > > optimizer will need to sort or scan the data set either way. That part > > didn't actually make a difference in my specific case. > > max() will never be evaluated by using an index to find the greatest value. > So in many cases using order by and limit 1 is faster. Ouch. I just double checked and you're right. Is this considered a bug, or just an implementation issue? While I've seen this hint a few times in the lists, it seems like it's one of those magic incantations that those in the know, know about, and that people new to postgres are going to be surprised by the need to use this idiom. Regards, Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
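[Editorial sketch: the behavior discussed above — a 7.x-era max() scanning every row while ORDER BY ... DESC LIMIT 1 can read a single entry off the end of an index — as a toy model. The lists and function names are invented for illustration; neither is PostgreSQL code.]

```python
values = [3, 17, 9, 42, 5]           # the column, in heap (insertion) order

def max_via_scan(vals):
    """What max() did: examine every row, because the planner could not
    assume anything about a user-definable aggregate."""
    best = vals[0]
    for v in vals[1:]:
        if v > best:
            best = v
    return best

sorted_index = sorted(values)         # stand-in for a B-tree index on the column

def max_via_index(idx):
    """What ORDER BY col DESC LIMIT 1 can do: read one entry from the
    high end of the index."""
    return idx[-1]

# Same answer either way; the difference is how many entries get touched.
assert max_via_scan(values) == max_via_index(sorted_index) == 42
```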
Re: [PERFORM] Speeding up Aggregates
On Thu, Oct 09, 2003 at 07:07:00PM -0400, Greg Stark wrote: > Dror Matalon <[EMAIL PROTECTED]> writes: > > > Actually what finally solved the problem is repeating the > > dtstamp > last_viewed > > in the sub select > > That will at least convince the optimizer to use an index range lookup. But it > still will have to scan every record that matches channel==$1, link==$2, and > dtstamp>$3. > > The trick of using limit 1 will be faster still as it only has to retrieve a > single record using the index. But you have to be sure to convince it to use How is doing order by limit 1 faster than doing max()? Seems like the optimizer will need to sort or scan the data set either way. That part didn't actually make a difference in my specific case. > the index and the way to do that is to list exactly the same columns in the > ORDER BY as are in the index definition. > > Even if some of the leading columns are redundant because they'll be constant > for all of the records retrieved. The optimizer doesn't know to ignore those. The main problem in my case was that the optimizer was doing the max() on all 700 rows, rather than the filtered rows. It's not until I put the "dtstamp > last_viewed" in the sub select as well as in the main query that it realized that it could first filter out the 696 rows and then do the max() on the 4 rows that satisfied this constraint. That was the big saving. Hope this all makes sense, Dror > > > > (This is the thing i pointed out previously in > > > <[EMAIL PROTECTED]> on Feb 13th 2003 on pgsql-general) > > -- > greg > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL
Yeah, I had a similar thought to Oliver's and suspected that this would be the answer. Also, while it's not too hard to do this for a single platform, it gets complicated once you start looking at different ones. Josh, let me know when you're ready to do this. I'll try to help, although my Perl's kind of rusty. Also, can you even assume Perl for a Postgres install? Does Solaris, for instance, come with Perl? Dror On Thu, Oct 09, 2003 at 09:56:11AM -0700, Josh Berkus wrote: > Oliver, > > > I think instead of thinking about where to put the > > information about tuning, someone should provide a > > "pgsql-autotune". Maybe even a shell script would do the > > trick. > > Well, you see, there's the issue. "I think someone." Lots of people have > spoken in favor of an "auto-conf" script; nobody so far has stepped forward > to get it done for 7.4, and I doubt we have time now. > > I'll probably create a Perl script in a month or so, but not before that > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Compare rows
Greg, On Wed, Oct 08, 2003 at 03:07:30PM -0400, Greg Spiegelberg wrote: > Dror, > > I gave this some serious thought at first. I only deal with > int8, numeric(24,12) and varchar(32) columns which I could > reduce to 3 different tables. Problem was going from 1700-3000 I'm not sure how the data types come into play here. I was for the most part following your examples. > rows to around 300,000-1,000,000 rows per system per day that > is sending data to our database. > Depending on the distribution of your data you can end up with more, less, or roughly the same amount of data in the end. It all depends on how many of the 600+ columns change every time you insert a row. If only a few of them do, then you'll clearly end up with less total data, since you'll be writing several rows that are very short instead of one huge row that contains all the information. In other words, you're tracking changes better. It also sounds like you feel that having a few thousand rows in a very "wide" table is better than having 300,000 - 1,000,000 rows in a "narrow" table. My gut feeling is that it's the other way around, but there are plenty of people on this list who can provide a more informed answer. Using the above example, assuming that both tables roughly have the same number of pages in them, would postgres deal better with a table with 3-4 columns with 300,000 - 1,000,000 rows or with a table with several hundred columns with only 3000 or so rows? Regards, Dror > BTW, the int8 and numeric(24,12) are for future expansion. > I hate limits. > > Greg > > > Dror Matalon wrote: > >It's still not quite clear what you're trying to do. Many people's gut > >reaction is that you're doing something strange with so many columns in > >a table. 
> > > >Using your example, a different approach might be to do this instead: > > > > Day | Name | Value > > --+-+--- > > Oct 1 | OS | Solaris 5.8 > > Oct 1 | Patch | 108528-12 > > Oct 3 | Patch | 108528-13 > > > > > >You end up with lots more rows, fewer columns, but it might be > >harder to query the table. On the other hand, queries should run quite > >fast, since it's a much more "normal" table. > > > >But without knowing more, and seeing what the other columns look like, > >it's hard to tell. > > > >Dror > > > -- > Greg Spiegelberg > Sr. Product Development Engineer > Cranel, Incorporated. > Phone: 614.318.4314 > Fax: 614.431.8388 > Email: [EMAIL PROTECTED] > Cranel. Technology. Integrity. Focus. > > > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
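A sketch of the narrow ("name/value") layout being debated above. The schema and names here are illustrative, not from the original thread:

```sql
-- Narrow layout: one row per changed attribute instead of one wide
-- row per system per day.
CREATE TABLE system_attrs (
    day   date        NOT NULL,
    name  varchar(32) NOT NULL,
    value varchar(64) NOT NULL
);

-- Current value of every attribute: join each name to its latest day.
SELECT s.name, s.value
  FROM system_attrs s
  JOIN (SELECT name, max(day) AS day
          FROM system_attrs
         GROUP BY name) latest USING (name, day);
```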
Re: [PERFORM] Compare rows
It's still not quite clear what you're trying to do. Many people's gut reaction is that you're doing something strange with so many columns in a table. Using your example, a different approach might be to do this instead: Day | Name | Value --+-+--- Oct 1 | OS | Solaris 5.8 Oct 1 | Patch | 108528-12 Oct 3 | Patch | 108528-13 You end up with lots more rows, fewer columns, but it might be harder to query the table. On the other hand, queries should run quite fast, since it's a much more "normal" table. But without knowing more, and seeing what the other columns look like, it's hard to tell. Dror On Wed, Oct 08, 2003 at 02:39:54PM -0400, Greg Spiegelberg wrote: > Joe Conway wrote: > >Greg Spiegelberg wrote: > > > >>The reason for my initial question was this. We save changes only. > >>In other words, if system S has row T1 for day D1 and if on day D2 > >>we have another row T1 (excluding our time column) we don't want > >>to save it. > > > > > >It still isn't entirely clear to me what you are trying to do, but > >perhaps some sort of calculated checksum or hash would work to determine > >if the data has changed? > > Best example I have is this. > > You're running Solaris 5.8 with patch 108528-X and you're collecting > that data daily. Would you want option 1 or 2 below? > > Option 1 - Store it all > Day | OS | Patch > --+-+--- > Oct 1 | Solaris 5.8 | 108528-12 > Oct 2 | Solaris 5.8 | 108528-12 > Oct 3 | Solaris 5.8 | 108528-13 > Oct 4 | Solaris 5.8 | 108528-13 > Oct 5 | Solaris 5.8 | 108528-13 > and so on... > > To find what you're running: > select * from table order by day desc limit 1; > > To find when it last changed however takes a join. 
> > > Option 2 - Store only changes > Day | OS | Patch > --+-+--- > Oct 1 | Solaris 5.8 | 108528-12 > Oct 3 | Solaris 5.8 | 108528-13 > > To find what you're running: > select * from table order by day desc limit 1; > > To find when it last changed: > select * from table order by day desc limit 1 offset 1; > > I selected Option 2 because I'm dealing with mounds of complicated and > varying data formats and didn't want to have to write complex queries > for everything. > > Greg > > -- > Greg Spiegelberg > Sr. Product Development Engineer > Cranel, Incorporated. > Phone: 614.318.4314 > Fax: 614.431.8388 > Email: [EMAIL PROTECTED] > Cranel. Technology. Integrity. Focus. > > > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speeding up Aggregates
Actually what finally solved the problem is repeating the dtstamp > last_viewed in the sub select select articlenumber, channel, description, title, link, dtstamp from items i1, my_channels where ((i1.channel = '2' and my_channels.id = '2' and owner = 'drormata' and (dtstamp > last_viewed)) ) and (dtstamp = (select max (dtstamp) from items i2 where channel = '2' and i1.link = i2.link)); to explain analyze select articlenumber, channel, description, title, link, dtstamp from items i1, my_channels where ((i1.channel = '2' and my_channels.id = '2' and owner = 'drormata' and (dtstamp > last_viewed)) ) and (dtstamp = (select max (dtstamp) from items i2 where channel = '2' and i1.link = i2.link and dtstamp > last_viewed)); Which in the stored procedure looks like this: CREATE or REPLACE FUNCTION item_max_date (int4, varchar, timestamptz) RETURNS timestamptz AS ' select max(dtstamp) from items where channel = $1 and link = $2 and dtstamp > $3; ' LANGUAGE 'sql'; Basically I have hundreds or thousands of items but only a few that satisfy "dtstamp > last_viewed". Obviously I want to run the max() only on a few items. Repeating "dtstamp > last_viewed" did the trick, but it seems like there should be a more elegant/clear way to tell the planner which constraint to apply first. Dror On Wed, Oct 08, 2003 at 10:54:24AM -0400, Greg Stark wrote: > Rod Taylor <[EMAIL PROTECTED]> writes: > > > On Fri, 2003-10-03 at 17:53, Dror Matalon wrote: > > > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote: > > > > > > > > It is too bad the (channel, link) index doesn't have dtstamp at the end > > > > of it, otherwise the below query would be a gain (might be a small one > > > > anyway). > > > > > > > > select dtstamp > > > > from items > > > >where channel = $1 > > > > and link = $2 > > > > ORDER BY dtstamp DESC > > > >LIMIT 1; > > > > It didn't make a difference even with the 3 term index? I guess you > > don't have very many common values for channel / link combination. 
> > You need to do: > > ORDER BY channel DESC, link DESC, dtstamp DESC > > This is an optimizer nit. It doesn't notice that since it selected on channel > and link already the remaining tuples in the index will be ordered simply by > dtstamp. > > (This is the thing i pointed out previously in > <[EMAIL PROTECTED]> on Feb 13th 2003 on pgsql-general) > > > -- > greg > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
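Greg's advice about listing every index column in the ORDER BY, applied to the query in this thread (assuming the three-term (channel, link, dtstamp) index mentioned earlier):

```sql
-- channel and link are constant for every matching row, but the
-- planner of this era only uses the index for the sort if the
-- ORDER BY names the index columns exactly:
SELECT dtstamp
  FROM items
 WHERE channel = $1
   AND link = $2
 ORDER BY channel DESC, link DESC, dtstamp DESC
 LIMIT 1;
```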
Re: [PERFORM] Speeding up Aggregates
On Fri, Oct 03, 2003 at 06:10:29PM -0400, Rod Taylor wrote: > On Fri, 2003-10-03 at 17:53, Dror Matalon wrote: > > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote: > > > > item_max_date() looks like this: > > > >select max(dtstamp) from items where channel = $1 and link = $2; > > > > > > It is too bad the (channel, link) index doesn't have dtstamp at the end > > > of it, otherwise the below query would be a gain (might be a small one > > > anyway). > > > > > > select dtstamp > > > from items > > >where channel = $1 > > > and link = $2 > > > ORDER BY dtstamp DESC > > >LIMIT 1; > > It didn't make a difference even with the 3 term index? I guess you > don't have very many common values for channel / link combination. There's no noticeable difference between two term and three term indexes. > > > > How about the below? Note the word STABLE on the end. > > CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS > timestamptz AS ' > select max(dtstamp) from items where channel = $1 and link = $2; > ' LANGUAGE 'sql' STABLE; Made no difference. -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speeding up Aggregates
On Fri, Oct 03, 2003 at 02:35:46PM -0700, Josh Berkus wrote: > Dror, > > > I am using 7.4, and had tried NOT EXISTS and didn't see any > > improvements. > > It wouldn't if you're using 7.4, which has improved IN performance immensely. > > What happens if you stop using a function and instead use a subselect? An improvement. Now I'm getting in the 200 msec response time. And by the way, I tried "not exists" again and it actually runs slower than "not in." > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Speeding up Aggregates
On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote: > > item_max_date() looks like this: > >select max(dtstamp) from items where channel = $1 and link = $2; > > It is too bad the (channel, link) index doesn't have dtstamp at the end > of it, otherwise the below query would be a gain (might be a small one > anyway). > > select dtstamp > from items >where channel = $1 > and link = $2 > ORDER BY dtstamp DESC >LIMIT 1; Similar idea to what Josh suggested. I did create an additional index with dtstamp at the end and it doesn't look like the planner used it. Using the above query instead of max() didn't improve things either. > > > Could you show us the exact specification of the function? In > particular, did you mark it VOLATILE, IMMUTABLE, or STABLE? > > I hope it isn't the first or second one ;) CREATE or REPLACE FUNCTION item_max_date (int4, varchar) RETURNS timestamptz AS ' select max(dtstamp) from items where channel = $1 and link = $2; ' LANGUAGE 'sql'; -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Speeding up Aggregates
Hi Josh, On Fri, Oct 03, 2003 at 02:07:10PM -0700, Josh Berkus wrote: > Dror, > > > select articlenumber, channel, description, title, link, dtstamp from > > items, my_channels where items.channel = '2' and my_channels.id = > > '2' and owner = 'drormata' and dtstamp > last_viewed and > > articlenumber not in (select item from viewed_items where channel > > ='2' and owner = 'drormata'); > > the NOT IN is a bad idea unless the subselect never returns more than a > handful of rows. If viewed_items can grow to dozens of rows, wyou should > use WHERE NOT EXISTS instead. Unless you're using 7.4. > I am using 7.4, and had tried NOT EXISTS and didn't see any improvements. > > item_max_date() looks like this: > >select max(dtstamp) from items where channel = $1 and link = $2; > > Change it to > > SELECT dtstamp from items where channel = $1 and link = $2 > ORDER BY dtstamp DESC LIMIT 1 > Didn't make a difference. And plugging real values into this query as well as into the original select max(dtstamp) from items where channel = $1 and link = $2; and doing an explain analyze shows that the cost is the same. The strange thing is that when I run the above queries by hand they take about .5 msec. Yet on a resultset that fetches 5 rows, I go up from 15 msec to 300 msec. It would seem like it should be something like 15 + (0.5 * 5) + small overhead = 30 msec or so rather than the 300 I'm seeing. > and possibly build an index on channel, link, dtstamp Didn't make a difference either. Explain analyze shows that it didn't use it. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
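Josh's "use a subselect" suggestion, which a later message reports brought the query down to around 200 msec, would look roughly like this. This is a sketch; the exact rewritten query is not shown in the thread:

```sql
-- Replace the item_max_date() function call with an inline correlated
-- subselect, so the planner can see into it and optimize it:
SELECT articlenumber, channel, description, title, link, dtstamp
  FROM items i1, my_channels
 WHERE i1.channel = 2 AND my_channels.id = 2
   AND owner = 'drormata' AND dtstamp > last_viewed
   AND dtstamp = (SELECT max(dtstamp)
                    FROM items i2
                   WHERE i2.channel = i1.channel
                     AND i2.link = i1.link);
```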
[PERFORM] Speeding up Aggregates
Hi, I have a query that ran quite well initially, but slowed down quite a bit once I introduced an aggregate into the equation. The average execution time went up from around 15 msec to around 300 msec. The original query fetches a bunch of articles: select articlenumber, channel, description, title, link, dtstamp from items, my_channels where items.channel = '2' and my_channels.id = '2' and owner = 'drormata' and dtstamp > last_viewed and articlenumber not in (select item from viewed_items where channel ='2' and owner = 'drormata'); I then added a call to a function: and (dtstamp = item_max_date(2, link)) item_max_date() looks like this: select max(dtstamp) from items where channel = $1 and link = $2; This should eliminate duplicate articles and only show the most recent one. resulting in the following query select articlenumber, channel, description, title, link, dtstamp from items, my_channels where items.channel = '2' and my_channels.id = '2' and owner = 'drormata' and dtstamp > last_viewed and articlenumber not in (select item from viewed_items where channel ='2' and owner = 'drormata') and (dtstamp = item_max_date(2, link)); Any suggestions on optimizing the query/function? It makes sense that it slowed down, but I wonder if I can do better. I'm including index list as well as "explain analyze" of both versions. 
Indexes: "item_channel_link" btree (channel, link) "item_created" btree (dtstamp) "item_signature" btree (signature) "items_channel_article" btree (channel, articlenumber) explain analyze select articlenumber, channel, description, title, link, dtstamp from items, my_channels where items.channel = '2' and my_channels.id = '2' and owner = 'drormata' and dtstamp > last_viewed and articlenumber not in (select item from viewed_items where channel ='2' and owner = 'drormata'); QUERY PLAN --- Nested Loop (cost=8.19..6982.58 rows=302 width=259) (actual time=16.95..17.16 rows=8 loops=1) Join Filter: ("inner".dtstamp > "outer".last_viewed) -> Seq Scan on my_channels (cost=0.00..3.23 rows=1 width=8) (actual time=0.36..0.38 rows=1 loops=1) Filter: ((id = 2) AND (("owner")::text = 'drormata'::text)) -> Index Scan using items_channel_article on items (cost=8.19..6968.05 rows=904 width=259) (actual time=0.68..13.94 rows=899 loops=1) Index Cond: (channel = 2) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on viewed_items (cost=0.00..8.19 rows=2 width=4) (actual time=0.48..0.48 rows=0 loops=1) Filter: ((channel = 2) AND (("owner")::text = 'drormata'::text)) Total runtime: 17.42 msec (11 rows) explain analyze select articlenumber, channel, description, title, link, dtstamp from items, my_channels where items.channel = '2' and my_channels.id = '2' and owner = 'drormata' and dtstamp > last_viewed and articlenumber not in (select item from viewed_items where channel ='2' and owner = 'drormata') and (dtstamp = item_max_date(2, link)); QUERY PLAN -- Nested Loop (cost=8.19..6980.33 rows=1 width=259) (actual time=262.94..265.14 rows=7 loops=1) Join Filter: ("outer".dtstamp > "inner".last_viewed) -> Index Scan using items_channel_article on items (cost=8.19..6977.08 rows=1 width=259) (actual time=1.94..150.55 rows=683 loops=1) Index Cond: (channel = 2) Filter: ((dtstamp = item_max_date(2, link)) AND (NOT (hashed subplan))) SubPlan -> Seq Scan on viewed_items (cost=0.00..8.19 rows=2 
width=4) (actual time=0.43..0.43 rows=0 loops=1) Filter: ((channel = 2) AND (("owner")::text = 'drormata'::text)) -> Seq Scan on my_channels (cost=0.00..3.23 rows=1 width=8) (actual time=0.14..0.15 rows=1 loops=683) Filter: ((id = 2) AND (("owner")::text = 'drormata'::text)) Total runtime: 265.39 msec -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
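For the record, "only the most recent row per link" can also be expressed with PostgreSQL's DISTINCT ON, an alternative not raised in the thread (a sketch against the same items table):

```sql
-- One row per link: the one with the greatest dtstamp.
SELECT DISTINCT ON (link)
       articlenumber, channel, description, title, link, dtstamp
  FROM items
 WHERE channel = 2
 ORDER BY link, dtstamp DESC;
```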
Re: [PERFORM] count(*) slow on large tables
I smell a religious war in the air :-). Can you go several days in a row without doing select count(*) on any of your tables? I suspect that this is somewhat a domain specific issue. In some areas you don't need to know the total number of rows in your tables, in others you do. I also suspect that you're right, that end user applications don't use this information as often as DBAs would. On the other hand, it seems whenever you want to optimize your app (something relevant to this list), one of the things you do need to know is the number of rows in your table. Dror On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote: > The world rejoiced as [EMAIL PROTECTED] (Dror Matalon) wrote: > > I don't have an opinion on how hard it would be to implement the > > tracking in the indexes, but "select count(*) from some table" is, in my > > experience, a query that people tend to run quite often. > > One of the databases that I've used, I believe it was Informix, had that > > info cached so that it always knew how many rows there were in any > > table. It was quite useful. > > I can't imagine why the raw number of tuples in a relation would be > expected to necessarily be terribly useful. > > I'm involved with managing Internet domains, and it's only when people > are being pretty clueless that anyone imagines that "select count(*) > from domains;" would be of any use to anyone. There are enough "test > domains" and "inactive domains" and other such things that the raw > number of "things in the table" aren't really of much use. > > - I _do_ care how many pages a table occupies, to some extent, as that > determines whether it will fit in my disk space or not, but that's not > COUNT(*). > > - I might care about auditing the exact numbers of records in order to > be assured that a data conversion process was done correctly. But in > that case, I want to do something a whole *lot* more detailed than > mere COUNT(*). 
> > I'm playing "devil's advocate" here, to some extent. But > realistically, there is good reason to be skeptical of the merits of > using SELECT COUNT(*) FROM TABLE for much of anything. > > Furthermore, the relation that you query mightn't be a physical > "table." It might be a more virtual VIEW, and if that's the case, > bets are even MORE off. If you go with the common dictum of "good > design" that users don't directly access tables, but go through VIEWs, > users may have no way to get at SELECT COUNT(*) FROM TABLE. > -- > output = reverse("ac.notelrac.teneerf" "@" "454aa") > http://www.ntlug.org/~cbbrowne/finances.html > Rules of the Evil Overlord #74. "When I create a multimedia > presentation of my plan designed so that my five-year-old advisor can > easily understand the details, I will not label the disk "Project > Overlord" and leave it lying on top of my desk." > <http://www.eviloverlord.com/> > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster -- Dror Matalon, President Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] count(*) slow on large tables
I don't have an opinion on how hard it would be to implement the tracking in the indexes, but "select count(*) from some table" is, in my experience, a query that people tend to run quite often. One of the databases that I've used, I believe it was Informix, had that info cached so that it always knew how many rows there were in any table. It was quite useful. On Thu, Oct 02, 2003 at 05:57:30PM -0400, Christopher Browne wrote: > [EMAIL PROTECTED] (Jean-Luc Lachance) writes: > > That's one of the draw back of MVCC. > > I once suggested that the transaction number and other house keeping > > info be included in the index, but was told to forget it... > > It would solve once and for all the issue of seq_scan vs index_scan. > > It would simplify the aggregate problem. > > It would only simplify _one_ case, namely the case where someone cares > about the cardinality of a relation, and it would do that at > _considerable_ cost. > > A while back I outlined how this would have to be done, and for it to > be done efficiently, it would be anything BUT simple. > > It would be very hairy to implement it correctly, and all this would > cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;" > > If you had a single WHERE clause attached, you would have to revert to > walking through the tuples looking for the ones that are live and > committed, which is true for any DBMS. > > And it still begs the same question, of why the result of this query > would be particularly meaningful to anyone. I don't see the > usefulness; I don't see the value of going to the considerable effort > of "fixing" this purported problem. 
> -- > let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];; > <http://dev6.int.libertyrms.com/> > Christopher Browne > (416) 646 3304 x124 (land) > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster -- Dror Matalon, President Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
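PostgreSQL does keep a cached estimate of each table's row count, maintained by VACUUM and ANALYZE, for cases where an approximate answer is enough:

```sql
-- Approximate row count with no table scan; accuracy depends on how
-- recently the table was VACUUMed or ANALYZEd:
SELECT reltuples FROM pg_class WHERE relname = 'mytable';
```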
[PERFORM] count(*) slow on large tables
Hi, I have a somewhat large table, 3 million rows, 1 Gig on disk, and growing. Doing a count(*) takes around 40 seconds. Looks like the count(*) fetches the table from disk and goes through it. Made me wonder why the optimizer doesn't just choose the smallest index, which in my case is around 60 Megs, and go through it, which it could do in a fraction of the time. Dror -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.zapatec.com ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly