Re: [postgis-users] OT Understanding slow queries
32-bit Windows is actually limited to 2 GB per process, or 3 GB if you start Windows with the special /3GB boot option.

Aren

On Thu, Sep 1, 2011 at 6:45 PM, Paragon Corporation wrote:
[snip history]

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
Re: [postgis-users] OT Understanding slow queries
Charles,

Honestly I have little idea. In theory you should fare better with Linux, but I think it depends on the kinds of processes you run, how much better it is, and how much RAM you have. One advantage that Linux has over Windows (an advantage even when running 32-bit Linux, but I think more of an advantage when you are running 64-bit Linux) is that you can allocate more shared memory, so processes that benefit from a lot of shared memory should benefit (basically, processes where you are using more or less the same data in different ways can live in RAM). Windows is limited not just in the 32-bit case but also in the 64-bit case, and I think this is just a fundamental flaw in PostgreSQL on Windows.

So generally speaking, on Windows I can't boost my shared memory beyond roughly 700 MB - 1 GB without running into crashing issues. As far as work_mem and so forth is concerned, on 32-bit Windows you can have as much as 4 GB per postgres process (and more with 64-bit Windows), but of course if you are running Linux, and 64-bit at that, you can go up way more.

I'm sure Linux folks will shoot me for saying this and chew me out, but I haven't really noticed much of a difference running my processes on 64-bit Linux vs. 32-bit Windows. Then again, my processes are probably different from other people's, and I don't have a 32 GB RAM Linux box to take advantage of the massively larger shared memory I could allocate.

So on the low end (say the 4-8 GB RAM range) I suspect there isn't much of a difference, but when you get higher, to the 32 GB/64 GB range, you would probably do a lot better with Linux.

Thanks,
Regina

From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-users-boun...@postgis.refractions.net] On Behalf Of Charles Galpin
Sent: Thursday, September 01, 2011 3:10 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] OT Understanding slow queries
[snip history]
Re: [postgis-users] OT Understanding slow queries
Hi Regina

I am revisiting this again. How much of a performance difference should one expect to see between the 32-bit version and the 64-bit version of postgres when using PostGIS for typical GIS queries like filtering by bounding box, locating nearest points, etc.? Depending on how I partition my data, I'll have anywhere from 200k to 260M records per table, on a machine with 32 GB of RAM.

I am trying to make a case to use Linux for a specific project, but without being able to say there are significant gains in performance I'm just fighting an uphill battle. At best right now I can use the latest postgis/postgresql under Windows, but only 32-bit.

Thanks,
charles

On Aug 26, 2011, at 3:50 PM, Paragon Corporation wrote:
[snip history]
Re: [postgis-users] OT Understanding slow queries
On Aug 26, 2011, at 3:50 PM, Paragon Corporation wrote:

> You know we do have pretty much latest builds of PostGIS (even trunk on PostGIS website for windows).
> http://www.postgis.org/download/windows/experimental.php
>
> and as far as PostgreSQL -- they have released windows binaries for even PostgreSQL 9.1 RC1
> http://www.enterprisedb.com/products-services-training/pgdevdownload
>
> Can't get too much more current than that (all without having to compile anything unless you are talking about the 64-bit versions).

Sorry, I should have been more clear. I also use pgRouting, and afaik I am tied to 8.4 on Windows without building this stuff myself. If I am wrong, please let me know!

charles
Re: [postgis-users] OT Understanding slow queries
> Sadly it's for immediate production use and I'm forced to use windows which limits my version choices a bit given my lack of skill under windows to build postgis :(
> charles

Charles,

You know we do have pretty much the latest builds of PostGIS (even trunk) on the PostGIS website for Windows:
http://www.postgis.org/download/windows/experimental.php

And as far as PostgreSQL goes -- they have released Windows binaries for even PostgreSQL 9.1 RC1:
http://www.enterprisedb.com/products-services-training/pgdevdownload

Can't get much more current than that (all without having to compile anything, unless you are talking about the 64-bit versions).

Thanks,
Regina
http://www.postgis.us
Re: [postgis-users] OT Understanding slow queries
On 8/25/2011 1:21 PM, Ben Madin wrote:

> Steve, does this just apply to count(*), or is count(id) just as bad? I was originally a MySQL user and count(*) could be very efficient there.

The best way to evaluate this is with EXPLAIN. But yes, both force the same full table scan. MySQL maintains the row count as one of its table stats, so it just pulls that from the stats metadata.

If you just want an approximate row count, there are statistics based on the last ANALYZE. If you use pgAdmin III you can see it in the table properties:

Rows (estimated): 1957620

This is one of the largest hurdles for MySQL users moving over to PostgreSQL, because it is so obvious and in your face. But the things I can do today with PostgreSQL and with PostGIS, I could never do in MySQL.

-Steve

[snip history]
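[Editor's note] The estimate pgAdmin displays comes from the planner statistics in the system catalogs, and you can query it directly. A minimal sketch, assuming the `links` table from earlier in the thread (`pg_class.reltuples` is a standard PostgreSQL catalog column):

```sql
-- Approximate row count from the last ANALYZE / VACUUM,
-- without scanning the table. Cheap, but only as fresh as the stats.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'links';

-- Refresh the estimate (samples the table; far cheaper than count(*)):
ANALYZE links;
```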
Re: [postgis-users] OT Understanding slow queries
Charles Galpin lhsw.com> writes:

> My apologies

Hehe -- none necessary -- welcome to the light side.

> I would consider Postgres on trunk if it has features you really need.
>
> Sadly it's for immediate production use and I'm forced to use windows which limits my version choices a bit given my lack of skill under windows to build postgis :(

Yeah, so much for index-only queries ;)

You ask interesting questions -- don't be a stranger.
Re: [postgis-users] OT Understanding slow queries
On Aug 25, 2011, at 4:00 PM, fork wrote:

> One thing -- while we hope that you ask lots of questions on this list, would you mind not "top posting", and trimming out non-germane text? Threading and trimming make a conversation MUCH easier to follow.

My apologies

> Also -- if you are developing an app that will be rolled out later or that is somewhat academic, I would consider Postgres on trunk if it has features you really need. It is easy to build, just make sure you back up your data...

Sadly it's for immediate production use and I'm forced to use windows which limits my version choices a bit given my lack of skill under windows to build postgis :(

charles
Re: [postgis-users] OT Understanding slow queries
Charles Galpin lhsw.com> writes:

> All of your feedback has been most helpful. Yes this query is contrived but I *thought* it was representative of a worst case scenario that might be similar to future data sets and it's likely not.

I think we are all glad to help. The count(*) assumption is reasonable enough to be made by LOTS of people.

One thing -- while we hope that you ask lots of questions on this list, would you mind not "top posting", and trimming out non-germane text? Threading and trimming make a conversation MUCH easier to follow.

Also -- if you are developing an app that will be rolled out later or that is somewhat academic, I would consider Postgres on trunk if it has features you really need. It is easy to build, just make sure you back up your data...
Re: [postgis-users] OT Understanding slow queries
> there is no way to get a count without seq scanning.

True for a single table; in the given example, there was a join. A possible outcome could have been a seq scan on the smaller table and a nested-loop index lookup on the bigger table. The optimizer must have thought this was more expensive than the full scan with hash join.

> -----Original Message-----
> From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-users-boun...@postgis.refractions.net] On Behalf Of fork
> Sent: Thursday, August 25, 2011 1:56 PM
> To: postgis-users@postgis.refractions.net
> Subject: Re: [postgis-users] OT Understanding slow queries
[snip history]
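[Editor's note] One way to check what the optimizer thought of the nested-loop alternative is to disable hash joins for the session and compare estimated costs. A sketch, reusing the `links`/`source_link` tables from the thread (`enable_hashjoin` is a standard planner setting; this is a diagnostic trick, not something to leave on in production):

```sql
-- Plan A: whatever the planner picks (a hash join, in the thread's example)
EXPLAIN SELECT count(l.*)
FROM links l, source_link ld
WHERE l.link_id = ld.link_id;

-- Plan B: forbid hash joins for this session and see the fallback,
-- typically a merge join or a nested loop over the link_id indexes
SET enable_hashjoin = off;
EXPLAIN SELECT count(l.*)
FROM links l, source_link ld
WHERE l.link_id = ld.link_id;

RESET enable_hashjoin;
```

If plan B's estimated cost is higher, the hash join was a deliberate choice, not a missed index.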
Re: [postgis-users] OT Understanding slow queries
I haven't noticed count(*) causing additional query time, but I was just using it to rule out the overhead of pulling all those rows back.

All of your feedback has been most helpful. Yes this query is contrived, but I *thought* it was representative of a worst case scenario that might be similar to future data sets, and it's likely not. Due to your prompting I did figure out why this particular problem I was seeing was slow, and it was really just user stupidity.

The real use case is using GeoServer to visualize this data. In the worst case scenario someone zooms out nice and far and effectively gets all the links in that join. Now GeoServer seems to be able to get all the links with something like "select * from links" and generate the tiles at this zoom level fairly quickly, so I figured, all the other overhead being equal, the join (sans the count(*)) would be the worst case. And although it is, I think you are right that it simply has to join across all those ids, and there is no way to improve that if selecting all.

I'll show a more reasonable case of what usually happens, but first I'll explain the actual problem. I thought adding a distinct on (link_id) would speed up the join, since the source_link table has many rows for each link_id, but it turns out this was what was making it slow (and I didn't realize that's what the real query was doing). I also thought for some reason an index would make distinct run quickly, since the index is effectively all unique values, right?

So here is what is typically going on for the query. After changing the "select distinct on (link_id) l.*" to "select l.*" it performs reasonably, with an additional level of filtering by a bounding box.
explain analyse SELECT "link_id", encode(ST_AsBinary(ST_Force_2D("the_geom")), 'base64') as "the_geom"
FROM (select l.* from links l, source_link ld where l.link_id = ld.link_id) as "vtable"
WHERE "the_geom" && ST_GeomFromText('POLYGON ((-74.79526213727112 40.11142841966784, -74.79526213727112 40.2127052195, -74.66273955428638 40.2127052195, -74.66273955428638 40.11142841966784, -74.79526213727112 40.11142841966784))', 4326)
/*
'Hash Join  (cost=3722.92..74955.80 rows=58813 width=378) (actual time=412.729..5946.610 rows=44469 loops=1)'
'  Hash Cond: (ld.link_id = l.link_id)'
'  ->  Seq Scan on source_link ld  (cost=0.00..58282.29 rows=3179029 width=10) (actual time=0.026..2823.685 rows=3179029 loops=1)'
'  ->  Hash  (cost=3706.60..3706.60 rows=1306 width=378) (actual time=7.805..7.805 rows=1285 loops=1)'
'        ->  Bitmap Heap Scan on links l  (cost=74.42..3706.60 rows=1306 width=378) (actual time=0.944..6.093 rows=1285 loops=1)'
'              Recheck Cond: (the_geom && '010320E61001000500E6D42993E5B252C0BF285549430E4440E6D42993E5B252C0ADE2B3EC391B44403DDB29536AAA52C0ADE2B3EC391B44403DDB29536AAA52C0BF285549430E4440E6D42993E5B252C0BF285549430E4440'::geometry)'
'              ->  Bitmap Index Scan on links_geom_idx  (cost=0.00..74.09 rows=1306 width=0) (actual time=0.883..0.883 rows=1285 loops=1)'
'                    Index Cond: (the_geom && '010320E61001000500E6D42993E5B252C0BF285549430E4440E6D42993E5B252C0ADE2B3EC391B44403DDB29536AAA52C0ADE2B3EC391B44403DDB29536AAA52C0BF285549430E4440E6D42993E5B252C0BF285549430E4440'::geometry)'
'Total runtime: 5983.473 ms'
*/

So what prompted my initial concern is solved. This slowness caught me at a time where, in the back of my mind, I am contemplating our next data challenge, which is why I looked into it. If I run into problems with our other data set I'll be sure to cover all my options and give realistic queries if I ask for help.

Thanks again for all the advice.
charles

----- Original Message -----
From: "Ben Madin"
To: "PostGIS Users Discussion"
Sent: Thursday, August 25, 2011 1:21:00 PM GMT -05:00 US/Canada Eastern
Subject: Re: [postgis-users] OT Understanding slow queries
[snip history]
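[Editor's note] An aside on the DISTINCT ON part: when the second table only serves as a filter (every link that appears in source_link at least once), a semi-join avoids materializing the duplicates in the first place. A sketch under that assumption, reusing the tables from the query above (the polygon coordinates are placeholders, rounded from the thread's bounding box):

```sql
-- Semi-join: returns each matching link once, without DISTINCT ON,
-- because EXISTS stops at the first matching source_link row.
SELECT l.link_id, l.the_geom
FROM links l
WHERE l.the_geom && ST_GeomFromText(
        'POLYGON((-74.795 40.111, -74.795 40.213, -74.663 40.213, -74.663 40.111, -74.795 40.111))',
        4326)
  AND EXISTS (SELECT 1
              FROM source_link ld
              WHERE ld.link_id = l.link_id);
```

With the btree index on source_link(link_id) from earlier in the thread, each EXISTS probe should be a cheap index lookup for the ~1300 links that survive the bounding-box filter.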
Re: [postgis-users] OT Understanding slow queries
On 08/25/2011 10:56 AM, fork wrote:
[snip history]

I think PostgreSQL 9.2 will have index-only scans that should improve the performance of SELECT count(*) queries.

http://rhaas.blogspot.com/2011/08/index-only-scans-now-theres-patch.html

Granted, this is in trunk, so it won't help for any production databases.

-bborie

--
Bborie Park
Programmer
Center for Vectorborne Diseases
UC Davis
530-752-8380
bkp...@ucdavis.edu
Re: [postgis-users] OT Understanding slow queries
Ben Madin remoteinformation.com.au> writes:

> does this just apply to count(*), or is count(id) just as bad? I was originally a MySQL user and count(*) could be very efficient there.

My understanding is that Postgres does not keep the row count for any of its tables internally, so there is no way to get a count without seq scanning. The trade-off is that inserts and deletes are faster, but a very common query is much slower. I don't know if the planner could benefit in any way from the count being available, though.

The lists say to use a trigger on inserts and deletes to update a metadata table if you really do need to know exactly how many elements are in it, but that is a far less frequent need than you may think (for example, an EXISTS can get you an answer to "are there any records" more quickly than a count(*)). I think you can do a quick and rough estimate by doing some math with the table size on disk, but I never have.

It is unfortunate that the first (rather lame) "benchmark" anyone tries to do with a new database is to run "select count(*) from x" -- I am sure lots of people have been turned off from PG because this particular query is slow compared to MySQL. (MySQL always wins in the more hollywood competitions against PG, but fails in the long run, IMHO)
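[Editor's note] The trigger approach mentioned above can be sketched as follows. This is a minimal, hypothetical version (the `row_counts` table and function names are invented for illustration); the usual caveat is that under heavy concurrency all writers contend on the single counter row:

```sql
-- Side table holding one counter per tracked table
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n          bigint NOT NULL
);

-- Seed it once with the real count (one last full scan)
INSERT INTO row_counts VALUES ('links', (SELECT count(*) FROM links));

-- Trigger function: bump the counter on every insert/delete
CREATE OR REPLACE FUNCTION links_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'links';
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'links';
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER links_count
AFTER INSERT OR DELETE ON links
FOR EACH ROW EXECUTE PROCEDURE links_count_trig();

-- Exact count, no seq scan:
SELECT n FROM row_counts WHERE table_name = 'links';
```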
Re: [postgis-users] OT Understanding slow queries
Steve,

does this just apply to count(*), or is count(id) just as bad? I was originally a MySQL user and count(*) could be very efficient there.

cheers

Ben

On 26/08/2011, at 12:01 AM, Stephen Woodbridge wrote:
[snip history]
Re: [postgis-users] OT Understanding slow queries
The issue here is the count(*), which forces a full table scan in PostgreSQL, as fork mentioned. You need to look at a real query, unless you are really doing a count(*).

-Steve

On 8/25/2011 11:49 AM, Ben Madin wrote:
[snip history]
Re: [postgis-users] OT Understanding slow queries
I'm no expert at this, but my understanding (which is limited) was that you are asking for the whole table, so indexing doesn't really get used (my understanding is that indexing helps to find the page for a subset of data more quickly than scanning through the whole lot).

Also, you might be able to get some speedup by using a different join type (outer join and a not-null where clause)?

cheers

Ben

On 25/08/2011, at 9:41 PM, Charles Galpin wrote:
[snip history]
Re: [postgis-users] OT Understanding slow queries
Charles Galpin lhsw.com> writes:

> explain select count(l.*)
> from links l, source_link ld where l.link_id = ld.link_id;

Can you try this returning some sort of value (like the keys) instead of a count(*)? count(*) can be pretty slow in Postgres, sometimes (I think) forcing a seq scan. I am not particularly confident this will fix your problem, but it is worth a shot.

I would also experiment with DISTINCT and LIMIT, after making sure ANALYZE has been run appropriately.
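[Editor's note] A sketch of the experiments suggested above, assuming the `links`/`source_link` schema from the original post (the LIMIT value is arbitrary):

```sql
-- Make sure planner statistics are current before judging any plan
ANALYZE links;
ANALYZE source_link;

-- Return keys instead of an aggregate, capped with LIMIT;
-- a small LIMIT often flips the plan to a nested loop over the indexes
EXPLAIN ANALYZE
SELECT l.link_id
FROM links l, source_link ld
WHERE l.link_id = ld.link_id
LIMIT 100;
```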
[postgis-users] OT Understanding slow queries
If this is too off topic, please let me know and I'll sign up on a postgres list to get help. But this is related to my use of postgis, and if anyone knows this stuff, it's you guys.

I have an example query that I expect to be much faster, but my main concern is we are about to do some visualization of historical congestion data which will require queries across much larger data sets - like 150 million records a day. We are about to test using partitions, but the number per table will still be much larger than what I am dealing with now.

So here is a query I would think would be much faster than 43 seconds for two tables, one with about 97k rows, and the other 3.2 million.

explain select count(l.*)
from links l, source_link ld where l.link_id = ld.link_id;
/*
'Aggregate  (cost=174731.72..174731.73 rows=1 width=32)'
'  ->  Hash Join  (cost=13024.27..166784.14 rows=3179029 width=32)'
'        Hash Cond: (ld.link_id = l.link_id)'
'        ->  Seq Scan on source_link ld  (cost=0.00..58282.29 rows=3179029 width=10)'
'        ->  Hash  (cost=10963.12..10963.12 rows=96812 width=42)'
'              ->  Seq Scan on links l  (cost=0.00..10963.12 rows=96812 width=42)'
*/

Each table has an index on link_id, defined like this:

CREATE INDEX links_link_id_idx ON links USING btree (link_id);

CREATE INDEX source_link_link_id_idx ON source_link USING btree (link_id);

Shouldn't this index prevent these sequential scans, or am I misreading this? Should this really take 43 seconds?

thanks for any advice,
charles
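[Editor's note] On the partitioning plan mentioned above: in the PostgreSQL versions current at the time (8.4/9.0), partitioning meant table inheritance plus constraint exclusion; declarative `PARTITION BY` arrived much later, in PostgreSQL 10. A minimal, hypothetical sketch of daily partitions for the congestion data (table and column names are invented for illustration):

```sql
-- Parent table; the children carry the actual rows
CREATE TABLE congestion (
    link_id     integer   NOT NULL,
    observed_at timestamp NOT NULL,
    speed       real
);

-- One child per day, with a CHECK constraint the planner can use
-- to skip partitions (requires constraint_exclusion = partition)
CREATE TABLE congestion_20110825 (
    CHECK (observed_at >= '2011-08-25' AND observed_at < '2011-08-26')
) INHERITS (congestion);

CREATE INDEX congestion_20110825_link_id_idx
    ON congestion_20110825 USING btree (link_id);

-- Queries constrained on observed_at then touch only the matching children
EXPLAIN SELECT count(*)
FROM congestion
WHERE observed_at >= '2011-08-25' AND observed_at < '2011-08-26';
```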