Re: [PERFORM] Querying 19million records very slowly
> I'm running FreeBSD 4.11, and im editing the file in
> /usr/local/etc/postgresql.conf, but it doesnt help.

On my system the 'live' config file resides in
/var/lib/postgresql/data/postgresql.conf - maybe you have them in
/usr/local/var/lib ...

--
Tobias Brox, +86-13521622905
Nordicbet, IT dept

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Querying 19million records very slowly
- I also changed now() to current_time, which improved performance quite a
  bit. I need to run further tests before I post any results.
- I tried changing now() - interval 'x hours' to a literal like
  '2005-06-22 16:00:00+02'. This also improved performance. Either
  time > '2005-06-22 16:00:00+02' (or whatever date is 24 hours back) or
  time > current_time - interval '24 hours' will be used.

I'm running FreeBSD 4.11, and I'm editing the file in
/usr/local/etc/postgresql.conf, but it doesn't help. When I start up
"psql database", none of the options are changed (even with a restart of
the postmaster). I can't find a '--configuration=path/file' option for
the postmaster either...

Kjell Tore

On 6/22/05, Tobias Brox <[EMAIL PROTECTED]> wrote:
> [Kjell Tore Fossbakk - Wed at 07:41:54AM -0700]
> > I cant get the config file to load into my postgres. that's the
> > problem. I want to set it to 10k, but it is only still at 1000... I
> > save the file and restart the service..
> >
> > yes, i ment 'pg_ctl reload', sry about that one.
>
> Classical problem, a bit depending on the distro you are using. The
> "master" file usually resides in /etc/postgresql while the actual file
> used usually resides in /var/lib/postgres/data ... or something. Some
> distros copy the file over (such that one always should edit the file
> in /etc), others don't (thus you either have to do that yourself, or
> edit the file in the database data directory).
>
> --
> Tobias Brox, +86-13521622905
> Nordicbet, IT dept
Re: [PERFORM] Querying 19million records very slowly
[Kjell Tore Fossbakk - Wed at 07:41:54AM -0700]
> I cant get the config file to load into my postgres. that's the
> problem. I want to set it to 10k, but it is only still at 1000... I
> save the file and restart the service..
>
> yes, i ment 'pg_ctl reload', sry about that one.

Classical problem, a bit depending on the distro you are using. The
"master" file usually resides in /etc/postgresql while the actual file
used usually resides in /var/lib/postgres/data ... or something. Some
distros copy the file over (such that one always should edit the file in
/etc), others don't (thus you either have to do that yourself, or edit
the file in the database data directory).

--
Tobias Brox, +86-13521622905
Nordicbet, IT dept
Re: [PERFORM] Querying 19million records very slowly
On 2005-06-22 10:55, Michael Glaesemann wrote:
> There has been discussion in the past on storing the time zone name
> with the timestamptz as well, though no one has implemented this yet.

The reason for this may be that time zone names (abbreviations) are not
unique. For example, "ECT" can mean "Ecuador Time" (offset -05) or
"Eastern Caribbean Time" (offset -04).

http://www.worldtimezone.com/wtz-names/timezonenames.html

cheers,
stefan
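Stefan's point can be sketched in a few lines (a hypothetical illustration, not PostgreSQL internals): Python's `datetime.timezone` happily attaches the same abbreviation to two different offsets, which is exactly why the abbreviation alone cannot identify a zone.

```python
from datetime import timedelta, timezone

# Two distinct zones that share the abbreviation "ECT":
ecuador_time = timezone(timedelta(hours=-5), "ECT")            # Ecuador Time
eastern_caribbean_time = timezone(timedelta(hours=-4), "ECT")  # Eastern Caribbean Time

# Both report the identical name ...
assert ecuador_time.tzname(None) == eastern_caribbean_time.tzname(None) == "ECT"
# ... yet their offsets differ by a full hour, so the name cannot round-trip.
assert ecuador_time.utcoffset(None) != eastern_caribbean_time.utcoffset(None)
```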
Re: [PERFORM] Querying 19million records very slowly
I can't get the config file to load into my postgres. That's the problem.
I want to set it to 10k, but it is still only at 1000... I save the file
and restart the service..

Yes, I meant 'pg_ctl reload', sorry about that one.

kjell tore

On 6/22/05, Bricklen Anderson <[EMAIL PROTECTED]> wrote:
> >> enable_seqscan = false
> >> enable_indexscan = true
> >> ..
> >> effective_cache_size = 1
> >> random_page_cost = 2
> >> ..
> >>
> >> I save the file, type 'pg_crl reload' then enter 'psql database'.
> >>
> >> argus=> show effective_cache_size ;
> >>  effective_cache_size
> >> ----------------------
> >>  1000
> >> (1 row)
>
> I assume that 'pg_crl' is a typo? That should read 'pg_ctl reload'.
> Also, you said that your effective_cache_size = 1, yet when you SHOW
> it, it's only 1000. A cut 'n paste error, or maybe your erroneous
> "pg_crl" didn't trigger the reload?
>
> --
> ___
> This e-mail may be privileged and/or confidential, and the sender does
> not waive any related rights and obligations. Any distribution, use or
> copying of this e-mail or the information it contains by other than an
> intended recipient is unauthorized. If you received this e-mail in
> error, please advise me (by return e-mail or otherwise) immediately.
> ___
Re: [PERFORM] Querying 19million records very slowly
OK, so the planner is in fact making a mistake (I think). Try turning
down your random_page_cost a little. It defaults to 4.0; see if 2.0 works
"right". (Careful: move these things around too much for one query and
you will wreck others.) 4.0 is a little large for almost all modern
hardware, so see if moving it down a little makes things somewhat
smarter.

P

On Wednesday, June 22, 2005, at 12:45 AM, Kjell Tore Fossbakk wrote:

database=> set enable_seqscan to on;
SET
Time: 0.34 ms
database=> explain analyze select count(*) from test where p1=53;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=522824.50..522824.50 rows=1 width=0) (actual time=56380.72..56380.72 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..517383.30 rows=2176479 width=0) (actual time=9.61..47677.48 rows=2220746 loops=1)
         Filter: (p1 = 53)
 Total runtime: 56380.79 msec
(4 rows)
Time: 56381.40 ms

database=> explain analyze select count(*) from test where p1=53 and
time > now() - interval '24 hours' ;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=661969.01..661969.01 rows=1 width=0) (actual time=45787.02..45787.02 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..660155.28 rows=725493 width=0) (actual time=37799.32..45613.58 rows=42424 loops=1)
         Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
 Total runtime: 45787.09 msec
(4 rows)
Time: 45787.79 ms

database=> explain analyze select date_trunc('hour', time),count(*) as
total from test where p1=53 and time>now()-interval '24 hours' group by
date_trunc order by date_trunc;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=755116.97..760558.17 rows=72549 width=8) (actual time=46040.63..46717.61 rows=23 loops=1)
   ->  Group  (cost=755116.97..758744.44 rows=725493 width=8) (actual time=46022.06..46548.84 rows=42407 loops=1)
         ->  Sort  (cost=755116.97..756930.70 rows=725493 width=8) (actual time=46022.04..46198.94 rows=42407 loops=1)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Seq Scan on test  (cost=0.00..660155.28 rows=725493 width=8) (actual time=37784.91..45690.88 rows=42407 loops=1)
                     Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
 Total runtime: 46718.43 msec
(7 rows)
Time: 46719.44 ms

database=> create index test_time_p1_idx on test(time,p1) ;
CREATE INDEX
Time: 178926.02 ms
database=> vacuum analyze test ;
VACUUM
Time: 73058.33 ms
database=> \d test
              Table "public.test"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 time        | timestamp with time zone |
 source      | inet                     |
 destination | inet                     |
 p1          | integer                  |
 p2          | integer                  |

database=> \di
 public | test_time_idx        | index | database | test
 public | test_source_idx      | index | database | test
 public | test_destination_idx | index | database | test
 public | test_p1_idx          | index | database | test
 public | test_p2_idx          | index | database | test
 public | test_time_p1_idx     | index | database | test

database=> set enable_seqscan to off ;
SET
Time: 0.28 ms
database=> explain analyze select date_trunc('hour', time),count(*) as
total from test where p1=53 and time>now()-interval '24 hours' group by
date_trunc order by date_trunc;
 Aggregate  (cost=2315252.66..2320767.17 rows=73527 width=8) (actual time=2081.15..2720.44 rows=23 loops=1)
   ->  Group  (cost=2315252.66..2318929.00 rows=735268 width=8) (actual time=2079.76..2564.22 rows=41366 loops=1)
         ->  Sort  (cost=2315252.66..2317090.83 rows=735268 width=8) (actual time=2079.74..2243.32 rows=41366 loops=1)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Index Scan using test_time_p1_idx on test  (cost=0.00..2218878.46 rows=735268 width=8) (actual time=29.50..1774.52 rows=41366 loops=1)
                     Index Cond: (("time" > (now() - '1 day'::interval)) AND (p1 = 53))
 Total runtime: 2735.42 msec
Time: 2736.48 ms

database=> explain analyze select date_trunc('hour', time),count(*) as
total from test where p1=80 and time>now()-interval '24 hours' group by
date_trunc order by date_trunc;
 Aggregate  (cost=1071732.15..1074305.59 rows=34313 width=8) (actual time=6353.93..7321.99 rows=22 loops=1)
   ->  Group  (cost=1071732.15..1073447.77 rows=343125 width=8) (actual time=6323.76..7078.10 rows=64267 loops=1)
         ->  Sort  (cost=1071732.15..1072589.96 rows=343125 width=8) (actual time=6323.75..6579.42 rows=64267 loops=1)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Index Scan using test_time_p1_idx on test  (cost=0.00..1035479.58 rows=343125 width=8) (actual time=0.20..5858.67 rows=64267 loops=1)
                     Index Cond: (("time" > (now() - '1 day'::interval)) AND (p1 = 80))
 Total runtime: 7322.82 msec
Time: 7323.90 ms
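The random_page_cost suggestion above can be tried per-session before committing anything to postgresql.conf (a sketch; the value 2.0 is just a starting point, not a recommendation for this hardware):

```
-- SET affects only the current backend, so it is safe to experiment.
SET random_page_cost = 2.0;

EXPLAIN ANALYZE
SELECT count(*) FROM test
 WHERE p1 = 53
   AND time > now() - interval '24 hours';

-- If the plan and runtime improve across your typical queries, make it
-- permanent in postgresql.conf and run 'pg_ctl reload'.
```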
Re: [PERFORM] Querying 19million records very slowly
Tobias Brox <[EMAIL PROTECTED]> writes:
>> time (datetime) > '2005-06-21 10:36:22+02'
>> or
>> time (timestamp) > 'some timestamp pointing to yesterday'

> If I have understood it correctly, the planner will recognize the
> timestamp and compare it with the statistics in the first example but
> not in the second, and thus it will be more likely to use index scan on
> the first one and seqscan on the second.

That statement is true for releases before 8.0. Kjell has not at any
point told us what PG version he is running, unless I missed it...

			regards, tom lane
Re: [PERFORM] Querying 19million records very slowly
[Kjell Tore Fossbakk - Wed at 11:10:42AM +0200]
> Which one does Pg read fastes? Does he convert datetime in the table,
> then my where clause and check, for each row? How does he compare a
> datetime with a datetime? Timestamp are easy, large number bigger than
> another large number..
>
> time (datetime) > '2005-06-21 10:36:22+02'
>
> or
>
> time (timestamp) > 'some timestamp pointing to yesterday'

If I have understood it correctly, the planner will recognize the
timestamp and compare it with the statistics in the first example but not
in the second, and thus it will be more likely to use index scan on the
first one and seqscan on the second.

--
Tobias Brox, +86-13521622905
Nordicbet, IT dept
Re: [PERFORM] Querying 19million records very slowly
> Try to type in '2005-06-21 16:36:22+08' directly in the query, and see
> if it makes changes. Or probably '2005-06-21 10:36:22+02' in your case
> ;-)

Which one does Pg read fastest? Does he convert the datetime in the
table, then my where clause, and check, for each row? How does he compare
a datetime with a datetime? Timestamps are easy: one large number bigger
than another large number..

time (datetime) > '2005-06-21 10:36:22+02'

or

time (timestamp) > 'some timestamp pointing to yesterday'

Hmm.. I can't find any doc that describes this very well.

On 6/22/05, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
>
> On Jun 22, 2005, at 5:39 PM, Tobias Brox wrote:
>
> > (btw, does postgresql really handles timezones? '+02' is quite
> > different from 'CET', which will be obvious sometime in the late
> > autoumn...)
>
> Yes, it does. It doesn't (currently) record the time zone name, but
> rather only the offset from UTC. If a time zone name (rather than UTC
> offset) is given, it is converted to the UTC offset *at that
> timestamptz* when it is stored. For time zones that take into account
> DST, their UTC offset changes during the year, and PostgreSQL records
> the equivalent UTC offset for the appropriate timestamptz values.
>
> There has been discussion in the past on storing the time zone name
> with the timestamptz as well, though no one has implemented this yet.
>
> Michael Glaesemann
> grzm myrealbox com
Re: [PERFORM] Querying 19million records very slowly
On Jun 22, 2005, at 5:39 PM, Tobias Brox wrote:

> (btw, does postgresql really handles timezones? '+02' is quite
> different from 'CET', which will be obvious sometime in the late
> autoumn...)

Yes, it does. It doesn't (currently) record the time zone name, but
rather only the offset from UTC. If a time zone name (rather than UTC
offset) is given, it is converted to the UTC offset *at that timestamptz*
when it is stored. For time zones that take into account DST, their UTC
offset changes during the year, and PostgreSQL records the equivalent UTC
offset for the appropriate timestamptz values.

There has been discussion in the past on storing the time zone name with
the timestamptz as well, though no one has implemented this yet.

Michael Glaesemann
grzm myrealbox com
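The behavior described here - only a UTC offset survives, and for a DST-observing zone that offset depends on the date - can be sketched outside PostgreSQL with Python's zoneinfo module (an illustration of the concept, not of the server's internals):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Europe/Oslo is CET (+01) in winter and CEST (+02) in summer, so the
# same zone name maps to two different UTC offsets during the year.
oslo = ZoneInfo("Europe/Oslo")

winter = datetime(2005, 1, 15, 12, 0, tzinfo=oslo)
summer = datetime(2005, 7, 15, 12, 0, tzinfo=oslo)

assert winter.utcoffset().total_seconds() == 3600   # +01
assert summer.utcoffset().total_seconds() == 7200   # +02
```

This is why storing '+02' and storing 'CET' are not interchangeable: the name carries the DST rule, the offset does not.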
Re: [PERFORM] Querying 19million records very slowly
[Kjell Tore Fossbakk - Wed at 10:18:30AM +0200]
> I'll test the use of current_timestamp, rather than now(). I am not
> sure if Pg can do a match between a fixed timestamp and a datetime?

I have almost all my experience with timestamps wo timezones, but ...
isn't that almost the same as the timedate type?

> time > current_timestamp - interval '24 hours',
> when time is yyyy-mm-dd hh:mm:ss+02, like 2005-06-22 16:00:00+02.

Try to type in '2005-06-21 16:36:22+08' directly in the query, and see if
it makes changes. Or probably '2005-06-21 10:36:22+02' in your case ;-)

(btw, does postgresql really handles timezones? '+02' is quite different
from 'CET', which will be obvious sometime in the late autoumn...)

--
Tobias Brox, +86-13521622905
Nordicbet, IT dept
Re: [PERFORM] Querying 19million records very slowly
Appreciate your time, Mr Brox.

I'll test the use of current_timestamp, rather than now(). I am not sure
if Pg can do a match between a fixed timestamp and a datetime?

time > current_timestamp - interval '24 hours',
when time is yyyy-mm-dd hh:mm:ss+02, like 2005-06-22 16:00:00+02.

If Pg can't do it, and current_time is faster, I could possibly convert
the time field in my database to timestamp, and insert all rows as
timestamp rather than a timedate. But that is some script to run over 19
mill rows, so I need to know if that will give me any more speed..

Kjell Tore.

On 6/22/05, Tobias Brox <[EMAIL PROTECTED]> wrote:
> [Kjell Tore Fossbakk - Wed at 09:45:22AM +0200]
> > database=> explain analyze select count(*) from test where p1=53 and
> > time > now() - interval '24 hours' ;
>
> Sorry to say that I have not followed the entire thread neither read
> the entire email I'm replying to, but I have a quick hint on this one
> (ref my earlier thread about timestamp indices) - the postgresql
> planner will generally behave smarter when using a fixed timestamp
> (typically generated by the app server) than logics based on now().
>
> One of my colleagues also claimed that he found the usage of
> localtimestamp faster than now().
>
> --
> Tobias Brox, +86-13521622905
> Nordicbet, IT dept
Re: [PERFORM] Querying 19million records very slowly
[Kjell Tore Fossbakk - Wed at 09:45:22AM +0200]
> database=> explain analyze select count(*) from test where p1=53 and
> time > now() - interval '24 hours' ;

Sorry to say that I have not followed the entire thread neither read the
entire email I'm replying to, but I have a quick hint on this one (ref my
earlier thread about timestamp indices) - the postgresql planner will
generally behave smarter when using a fixed timestamp (typically
generated by the app server) than logics based on now().

One of my colleagues also claimed that he found the usage of
localtimestamp faster than now().

--
Tobias Brox, +86-13521622905
Nordicbet, IT dept
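The hint above - compute the cutoff in the application and embed it as a fixed literal instead of now() - interval - might look like this (a hypothetical sketch; `cutoff_literal` and the query string are made up for illustration, and in real code the value should go through a parameterized query rather than string formatting):

```python
from datetime import datetime, timedelta, timezone

def cutoff_literal(hours_back: int = 24) -> str:
    """Render a fixed timestamptz literal for 'hours_back' hours ago,
    e.g. '2005-06-22 16:00:00+00', computed once by the app server."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours_back)
    return cutoff.strftime("%Y-%m-%d %H:%M:%S+00")

# The literal replaces now() - interval '24 hours' in the WHERE clause,
# so the planner sees a constant it can compare against its statistics.
query = (
    "SELECT count(*) FROM test "
    "WHERE p1 = 53 AND time > '%s'" % cutoff_literal()
)
```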
Re: [PERFORM] Querying 19million records very slowly
database=> set enable_seqscan to on;
SET
Time: 0.34 ms
database=> explain analyze select count(*) from test where p1=53;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=522824.50..522824.50 rows=1 width=0) (actual time=56380.72..56380.72 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..517383.30 rows=2176479 width=0) (actual time=9.61..47677.48 rows=2220746 loops=1)
         Filter: (p1 = 53)
 Total runtime: 56380.79 msec
(4 rows)
Time: 56381.40 ms

database=> explain analyze select count(*) from test where p1=53 and
time > now() - interval '24 hours' ;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=661969.01..661969.01 rows=1 width=0) (actual time=45787.02..45787.02 rows=1 loops=1)
   ->  Seq Scan on test  (cost=0.00..660155.28 rows=725493 width=0) (actual time=37799.32..45613.58 rows=42424 loops=1)
         Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
 Total runtime: 45787.09 msec
(4 rows)
Time: 45787.79 ms

database=> explain analyze select date_trunc('hour', time),count(*) as
total from test where p1=53 and time>now()-interval '24 hours' group by
date_trunc order by date_trunc;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=755116.97..760558.17 rows=72549 width=8) (actual time=46040.63..46717.61 rows=23 loops=1)
   ->  Group  (cost=755116.97..758744.44 rows=725493 width=8) (actual time=46022.06..46548.84 rows=42407 loops=1)
         ->  Sort  (cost=755116.97..756930.70 rows=725493 width=8) (actual time=46022.04..46198.94 rows=42407 loops=1)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Seq Scan on test  (cost=0.00..660155.28 rows=725493 width=8) (actual time=37784.91..45690.88 rows=42407 loops=1)
                     Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
 Total runtime: 46718.43 msec
(7 rows)
Time: 46719.44 ms

database=> create index test_time_p1_idx on test(time,p1) ;
CREATE INDEX
Time: 178926.02 ms
database=> vacuum analyze test ;
VACUUM
Time: 73058.33 ms
database=> \d test
              Table "public.test"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 time        | timestamp with time zone |
 source      | inet                     |
 destination | inet                     |
 p1          | integer                  |
 p2          | integer                  |

database=> \di
 public | test_time_idx        | index | database | test
 public | test_source_idx      | index | database | test
 public | test_destination_idx | index | database | test
 public | test_p1_idx          | index | database | test
 public | test_p2_idx          | index | database | test
 public | test_time_p1_idx     | index | database | test

database=> set enable_seqscan to off ;
SET
Time: 0.28 ms
database=> explain analyze select date_trunc('hour', time),count(*) as
total from test where p1=53 and time>now()-interval '24 hours' group by
date_trunc order by date_trunc;
 Aggregate  (cost=2315252.66..2320767.17 rows=73527 width=8) (actual time=2081.15..2720.44 rows=23 loops=1)
   ->  Group  (cost=2315252.66..2318929.00 rows=735268 width=8) (actual time=2079.76..2564.22 rows=41366 loops=1)
         ->  Sort  (cost=2315252.66..2317090.83 rows=735268 width=8) (actual time=2079.74..2243.32 rows=41366 loops=1)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Index Scan using test_time_p1_idx on test  (cost=0.00..2218878.46 rows=735268 width=8) (actual time=29.50..1774.52 rows=41366 loops=1)
                     Index Cond: (("time" > (now() - '1 day'::interval)) AND (p1 = 53))
 Total runtime: 2735.42 msec
Time: 2736.48 ms

database=> explain analyze select date_trunc('hour', time),count(*) as
total from test where p1=80 and time>now()-interval '24 hours' group by
date_trunc order by date_trunc;
 Aggregate  (cost=1071732.15..1074305.59 rows=34313 width=8) (actual time=6353.93..7321.99 rows=22 loops=1)
   ->  Group  (cost=1071732.15..1073447.77 rows=343125 width=8) (actual time=6323.76..7078.10 rows=64267 loops=1)
         ->  Sort  (cost=1071732.15..1072589.96 rows=343125 width=8) (actual time=6323.75..6579.42 rows=64267 loops=1)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Index Scan using test_time_p1_idx on test  (cost=0.00..1035479.58 rows=343125 width=8) (actual time=0.20..5858.67 rows=64267 loops=1)
                     Index Cond: (("time" > (now() - '1 day'::interval)) AND (p1 = 80))
 Total runtime: 7322.82 msec
Time: 7323.90 ms
Re: [PERFORM] Querying 19million records very slowly
use CURRENT_TIME which is a constant instead of now() which is not
considered constant... (I think)
Re: [PERFORM] Querying 19million records very slowly
Some tips:

- EXPLAIN ANALYZE provides a more useful analysis of a slow query,
  because it gives both the estimate and actual times/rows for each step
  in the plan.
- The documentation is right: rows with little variation are pretty
  useless to index. Indexing is about "selectivity", reducing the amount
  of stuff the database has to read off the disk.
- You only have two things in your WHERE clause, so that is where the
  most important indexes reside. How many of your rows have p1=53? How
  many of your rows have happened in the last day? If your answer is "a
  lot" then the indexes are not going to help: PostgreSQL will be more
  efficient scanning every tuple than it will be jumping around the
  index structure for a large number of tuples.
- If neither time nor p1 is particularly selective individually, but
  they are selective when taken together, try a multi-key index on them
  both.

Paul

Kjell Tore Fossbakk wrote:

Hello!

I use FreeBSD 4.11 with PostGreSQL 7.3.8.

I got a huge database with roughly 19 million records. There is just one
table, with a time field, a few ints and a few strings.

table test
fields time (timestamp), source (string), destination (string),
p1 (int), p2 (int)

I have run VACUUM ANALYZE ;

I have created indexes on every field, but for some reason my postgres
server wants to use a seqscan, even tho I know an indexed scan would be
much faster.

create index test_time_idx on test (time) ;
create index test_source_idx on test (source) ;
create index test_destination_idx on test (destination) ;
create index test_p1_idx on test (p1) ;
create index test_p2_idx on test (p2) ;

What is really strange, is that when I query a count(*) on one of the
int fields (p1), which has a very low count, postgres uses seqscan. In
another count on the same int field (p1), I know he is giving about 2.2
million hits, but then he suddenly uses seqscan, instead of an indexed
one. Isn't the whole idea of indexing to increase performance in large
queries.. To make sort of a phonebook for the values, to make it faster
to look up whatever you need... This just seems opposite..

Here is an EXPLAIN of my query:

database=> explain select date_trunc('hour', time),count(*) as total
from test where p1=53 and time > now() - interval '24 hours' group by
date_trunc order by date_trunc ;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=727622.61..733143.23 rows=73608 width=8)
   ->  Group  (cost=727622.61..731303.02 rows=736083 width=8)
         ->  Sort  (cost=727622.61..729462.81 rows=736083 width=8)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Seq Scan on test  (cost=0.00..631133.12 rows=736083 width=8)
                     Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
(6 rows)

database=> drop INDEX test_
test_source_idx       test_destination_idx  test_p1_idx
test_p2_idx           test_time_idx

After all this, I tried to set enable_seqscan to off and
enable_nestedloops to on. This didn't help much either. The time to run
the query is still in minutes. My results are the number of elements for
each hour, and it gives about 1000-2000 hits per hour.

I have read somewhere, about PostGreSQL, that it can easily handle
100-200 million records. And with the right tuned system, have a great
performance.. I would like to learn how :)

I also found an article on a page
(http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):

  Tip #11: Don't bother indexing columns with huge numbers of records
  and a small range of values, such as BOOLEAN columns.

  This tip, regretfully, is perhaps the only tip where I cannot provide
  a good, real-world example from my work. So I'll give you a
  hypothetical situation instead:

  Imagine that you have a database table with a list of every
  establishment vending ice cream in the US. A simple example might look
  like: Where there were almost 1 million rows, but due to simplistic
  data entry, only three possible values for type (1-SUPERMARKET,
  2-BOUTIQUE, and 3-OTHER) which are relatively evenly distributed.

  In this hypothetical situation, you might find (with testing using
  EXPLAIN) that an index on type is ignored and the parser uses a "seq
  scan" (or table scan) instead. This is because a table scan can
  actually be faster than an index scan in this situation. Thus, any
  index on type should be dropped. Certainly, the boolean column
  (active) requires no indexing as it has only two possible values and
  no index will be faster than a table scan.

Then I ask, what is useful with indexing, when I can't use it on a VERY
large database? It is on my 15 million record database it takes for ever
to do seqscans over and over again... This is probably why, as I
mentioned earlier, the reason (read the quote) why he chooses a full
scan and not an indexed one...

So what do I do? :confused:

I've used SQL for years, but never in such a big scale.
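The multi-key index tip above can be mimicked with a toy model (hypothetical data; a Python sorted list plus bisect stands in for a btree on (time, p1)): once rows are ordered by time, the last-24-hours window is a small tail slice, and filtering p1 inside that slice touches far fewer tuples than a full scan.

```python
import bisect
import random

random.seed(1)

NOW = 1_000_000  # current "time" in our toy clock

# Toy stand-in for a (time, p1) btree: rows kept sorted by time.
rows = sorted(
    (random.randrange(NOW), random.choice([53, 80, 99]))
    for _ in range(10_000)
)

def count_recent(rows, p1, window):
    """Count rows with time > NOW - window and the given p1, examining
    only the tail of the sorted structure (the index-scan analogue)."""
    start = bisect.bisect_right(rows, (NOW - window,))
    return sum(1 for t, v in rows[start:] if t > NOW - window and v == p1)

# Same answer as scanning every tuple, but far fewer rows examined.
full_scan = sum(1 for t, v in rows if t > NOW - 50_000 and v == 53)
assert count_recent(rows, 53, 50_000) == full_scan
```

The analogy is loose (a real planner weighs random vs sequential I/O), but it shows why the combined key is selective even when each column alone is not.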
[PERFORM] Querying 19million records very slowly
Hello!

I use FreeBSD 4.11 with PostGreSQL 7.3.8.

I got a huge database with roughly 19 million records. There is just one
table, with a time field, a few ints and a few strings.

table test
fields time (timestamp), source (string), destination (string),
p1 (int), p2 (int)

I have run VACUUM ANALYZE ;

I have created indexes on every field, but for some reason my postgres
server wants to use a seqscan, even tho I know an indexed scan would be
much faster.

create index test_time_idx on test (time) ;
create index test_source_idx on test (source) ;
create index test_destination_idx on test (destination) ;
create index test_p1_idx on test (p1) ;
create index test_p2_idx on test (p2) ;

What is really strange, is that when I query a count(*) on one of the
int fields (p1), which has a very low count, postgres uses seqscan. In
another count on the same int field (p1), I know he is giving about 2.2
million hits, but then he suddenly uses seqscan, instead of an indexed
one. Isn't the whole idea of indexing to increase performance in large
queries.. To make sort of a phonebook for the values, to make it faster
to look up whatever you need... This just seems opposite..

Here is an EXPLAIN of my query:

database=> explain select date_trunc('hour', time),count(*) as total
from test where p1=53 and time > now() - interval '24 hours' group by
date_trunc order by date_trunc ;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=727622.61..733143.23 rows=73608 width=8)
   ->  Group  (cost=727622.61..731303.02 rows=736083 width=8)
         ->  Sort  (cost=727622.61..729462.81 rows=736083 width=8)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Seq Scan on test  (cost=0.00..631133.12 rows=736083 width=8)
                     Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
(6 rows)

database=> drop INDEX test_
test_source_idx       test_destination_idx  test_p1_idx
test_p2_idx           test_time_idx

After all this, I tried to set enable_seqscan to off and
enable_nestedloops to on. This didn't help much either. The time to run
the query is still in minutes. My results are the number of elements for
each hour, and it gives about 1000-2000 hits per hour.

I have read somewhere, about PostGreSQL, that it can easily handle
100-200 million records. And with the right tuned system, have a great
performance.. I would like to learn how :)

I also found an article on a page
(http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):

  Tip #11: Don't bother indexing columns with huge numbers of records
  and a small range of values, such as BOOLEAN columns.

  This tip, regretfully, is perhaps the only tip where I cannot provide
  a good, real-world example from my work. So I'll give you a
  hypothetical situation instead:

  Imagine that you have a database table with a list of every
  establishment vending ice cream in the US. A simple example might look
  like: Where there were almost 1 million rows, but due to simplistic
  data entry, only three possible values for type (1-SUPERMARKET,
  2-BOUTIQUE, and 3-OTHER) which are relatively evenly distributed.

  In this hypothetical situation, you might find (with testing using
  EXPLAIN) that an index on type is ignored and the parser uses a "seq
  scan" (or table scan) instead. This is because a table scan can
  actually be faster than an index scan in this situation. Thus, any
  index on type should be dropped. Certainly, the boolean column
  (active) requires no indexing as it has only two possible values and
  no index will be faster than a table scan.

Then I ask, what is useful with indexing, when I can't use it on a VERY
large database? It is on my 15 million record database it takes for ever
to do seqscans over and over again... This is probably why, as I
mentioned earlier, the reason (read the quote) why he chooses a full
scan and not an indexed one...

So what do I do? :confused:

I've used SQL for years, but never in such a big scale. Thus, not having
to learn how to deal with large numbers of records. Usually a maximum of
1000 records. Now, with millions, I need to learn a way to make my sucky
queries better.

I'm trying to learn more about tuning my system, making better queries
and such. I've found some documents on the Internet, but far from the
best. Feedback most appreciated!

Regards,
a learning PostGreSQL user