Re: [postgis-users] OT Understanding slow queries

2011-09-01 Thread Aren Cambre
32-bit Windows actually limits each process to 2 GB of address space, or 3 GB
if you start Windows with the special */3GB* boot option.

Aren

On Thu, Sep 1, 2011 at 6:45 PM, Paragon Corporation  wrote:

> [snip]  As far as work mem and so forth is concerned, on 32-bit windows
> you can have as much as 4GB per postgres process (and more with 64-bit
> windows), but of course if you are running Linux and 64-bit at that you
> can go up way more.
>
> [snip history]
>
>
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] OT Understanding slow queries

2011-09-01 Thread Paragon Corporation
Charles,
 
Honestly I have little idea.  In theory you should fare better with Linux,
but I think it depends on the kinds of processes you run how much better it
is and how much RAM you have.  One advantage that Linux has over Windows
(an advantage even when running 32-bit Linux, but more of an advantage, I
think, when you are running 64-bit Linux) is that you can allocate more
shared memory, so processes that benefit from a lot of shared memory should
benefit (basically, processes where you are using more or less the same
data in different ways can live in RAM).  Windows is limited not just in
the 32-bit build but also the 64-bit one, and I think this is just a
fundamental flaw in PostgreSQL on Windows.
 
So generally speaking, on Windows I can't boost my shared memory beyond
about 700MB - 1GB without running into crashing issues.
As far as work_mem and so forth is concerned, on 32-bit Windows you can have
as much as 4GB per postgres process (and more with 64-bit Windows), but of
course if you are running Linux, and 64-bit at that, you can go up way more.
 
I'm sure Linux folks will shoot me for saying this and chew me out, but I
haven't really noticed much of a difference running my processes on 64-bit
Linux vs. 32-bit Windows,
but then again my processes are probably different from other people's, and
I don't have a 32GB-RAM Linux box to take advantage of the much larger
shared memory I could allocate.
 
So on the low end (say the 4-8GB RAM range) I suspect there isn't much of a
difference, but when you get higher, to the 32GB/64GB range, you would
probably do a lot better with Linux.
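
As a rough illustration of the ceilings described above, a postgresql.conf
sketch; the numbers are illustrative of the era and hardware under
discussion, not recommendations:

```ini
# 32-bit Windows: shared_buffers much beyond ~700MB-1GB reportedly
# risks crashes, per the experience described above.
shared_buffers = 512MB

# Per-sort/per-hash working memory; a single backend can use several
# multiples of this, each within the per-process address-space limit.
work_mem = 64MB

# On 64-bit Linux with ample RAM, a much larger shared_buffers
# (several GB) is feasible:
#shared_buffers = 8GB
```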
 
Thanks,
Regina


From: postgis-users-boun...@postgis.refractions.net
[mailto:postgis-users-boun...@postgis.refractions.net] On Behalf Of Charles
Galpin
Sent: Thursday, September 01, 2011 3:10 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] OT Understanding slow queries

[snip history]


Re: [postgis-users] OT Understanding slow queries

2011-09-01 Thread Charles Galpin
Hi Regina

I am revisiting this again.  How much of a performance difference should one 
expect to see between the 32-bit version and the 64-bit version of Postgres 
when using PostGIS for typical GIS queries like filtering by bounding box, 
locating nearest points, etc.? I'll have anywhere from 200k to 260M records 
per table depending on how I partition the data, on a machine with 32 GB of RAM.

I am trying to make a case to use Linux for a specific project, but without 
being able to say there are significant performance gains I'm just 
fighting an uphill battle. At best right now I can use the latest 
PostGIS/PostgreSQL under Windows, but only 32-bit.

Thanks,
charles

On Aug 26, 2011, at 3:50 PM, Paragon Corporation wrote:

> [snip history]


Re: [postgis-users] OT Understanding slow queries

2011-08-26 Thread Charles Galpin

On Aug 26, 2011, at 3:50 PM, Paragon Corporation wrote:

> [snip history]

Sorry, I should have been more clear. I also use pgRouting, and AFAIK I am tied 
to 8.4 on Windows without building this stuff myself.  If I am wrong, please 
let me know!

charles



Re: [postgis-users] OT Understanding slow queries

2011-08-26 Thread Paragon Corporation
 >  Sadly it's for immediate production use and I'm forced to use windows
which limits my version choices a bit given my lack of skill under windows
to build postgis :(

 >  charles 
 
Charles,
 
You know we do have pretty much the latest builds of PostGIS (even trunk) on
the PostGIS website for Windows:
http://www.postgis.org/download/windows/experimental.php
 
And as far as PostgreSQL goes, they have released Windows binaries for even
PostgreSQL 9.1 RC1:
http://www.enterprisedb.com/products-services-training/pgdevdownload
 
You can't get much more current than that (all without having to compile
anything, unless you are talking about the 64-bit versions).
 
Thanks,
Regina
http://www.postgis.us
 
 
 





Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Stephen Woodbridge

On 8/25/2011 1:21 PM, Ben Madin wrote:

Steve,

does this just apply to count(*), or is count(id) just as bad? I was
originally a MySQL user and count(*) could be very efficient there.



The best way to evaluate this is with EXPLAIN. But yes, both force the same 
full table scan.


MySQL maintains the row count as one of its table stats, so it just pulls 
that from the stats metadata. If you just want an approximate row count, 
there are stats based on the last ANALYZE. If you use pgadmin3 you can 
see it in the table properties:

Rows (estimated)    1957620
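
The estimate pgadmin3 displays comes from the planner statistics, and it can
also be queried directly from the system catalog without a table scan. A
sketch (the table name is illustrative):

```sql
-- Approximate row count from the last ANALYZE/autovacuum run;
-- no sequential scan involved.  'links' is an example table name.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'links';
```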

This is one of the biggest hurdles for MySQL users moving over to 
PostgreSQL, because it is so obvious and in your face. But the things I 
can do today with PostgreSQL and PostGIS, I could never do in MySQL.


-Steve

[snip history]


Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread fork
Charles Galpin  lhsw.com> writes:

> My apologies

Hehe -- none necessary -- welcome to the light side. 

> I would consider Postgres on trunk if it has features you really need.
> 
> Sadly it's for immediate production use and I'm forced to use windows which
limits my version choices a bit given my lack of skill under windows to build
postgis :(

Yeah, so much for index only queries ;)

You ask interesting questions -- don't be a stranger.




Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Charles Galpin

On Aug 25, 2011, at 4:00 PM, fork wrote:

> One thing -- while we hope that you ask lots of questions on this list, would
> you mind not top posting, and trimming out non-germane text?  Threading and
> trimming make a conversation MUCH easier to follow.

My apologies

> Also -- if you are developing an app that will be rolled out later or that is
> somewhat academic, I would consider Postgres on trunk if it has features you
> really need.  It is easy to build, just make sure you back up your data...

Sadly it's for immediate production use and I'm forced to use windows which 
limits my version choices a bit given my lack of skill under windows to build 
postgis :(

charles



Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread fork
Charles Galpin  lhsw.com> writes:


> All of your feedback has been most helpful.  Yes this query is contrived but I
*thought* it was
> representative of a worst case scenario that might be similar to future data
sets and it's likely not.  

I think we are all glad to help.  The count(*) assumption is reasonable enough
to be made by LOTS of people.  

One thing -- while we hope that you ask lots of questions on this list, would
you mind not top posting, and trimming out non-germane text?  Threading and
trimming make a conversation MUCH easier to follow.

Also -- if you are developing an app that will be rolled out later or that is
somewhat academic, I would consider Postgres on trunk if it has features you
really need.  It is easy to build, just make sure you back up your data...





Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Bergenroth, Brandon
> there is no way to get a count without seq scanning.  

True for a single table; in the given example, there was a join.

A possible outcome could have been a seq scan on the smaller table and a nested 
loop index lookup on the bigger table.  The optimizer must have thought this 
was more expensive than the full scan with hash join.



> -Original Message-
> From: postgis-users-boun...@postgis.refractions.net [mailto:postgis-
> users-boun...@postgis.refractions.net] On Behalf Of fork
> Sent: Thursday, August 25, 2011 1:56 PM
> To: postgis-users@postgis.refractions.net
> Subject: Re: [postgis-users] OT Understanding slow queries
> 
> [snip history]


Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Charles Galpin
I haven't noticed count(*) causing additional query time, but I was just using 
it to rule out overhead of pulling all those rows back. 

All of your feedback has been most helpful.  Yes, this query is contrived, but I 
*thought* it was representative of a worst-case scenario that might be similar 
to future data sets, and it's likely not.  Due to your prompting I did figure 
out why this particular problem I was seeing was slow, and it was really just 
user stupidity.

The real use case is using GeoServer to visualize this data. In the worst-case 
scenario someone zooms out nice and far and effectively gets all the links in 
that join.  Now GeoServer seems to be able to get all the links with something 
like "select * from links" and generate the tiles at this zoom level fairly 
quickly, so I figured, all the other overhead being equal, the join (sans the 
count(*)) would be the worst case, and although it is, I think you are right 
that it simply has to join across all those ids and there is no way to improve 
that if selecting all. 

I'll show a more reasonable case of what usually happens, but I'll explain the 
actual problem. I thought adding a distinct on (link_id) would speed up the join, 
since the source_link table has many rows for each link_id, but it turns out 
this was what was making it slow (and I didn't realize that's what the real 
query was doing). I also thought for some reason that an index would make 
distinct run quickly, since the index is effectively all unique values, right?
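
A hedged aside: for a join like this, an EXISTS semi-join avoids producing
the duplicate rows in the first place, so no DISTINCT ON is needed. A sketch
using the table names from the thread (untested against the data in question):

```sql
-- Return each links row at most once when it has any matching
-- source_link row, instead of joining and then de-duplicating.
SELECT l.*
FROM links l
WHERE EXISTS (
  SELECT 1 FROM source_link ld
  WHERE ld.link_id = l.link_id
);
```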

So here is what is typically going on for the query, and after changing the 
"select distinct on (link_id) l.*" to "select l.*" it performs reasonably, with 
an additional level of filtering by bounding box. 

explain analyse SELECT "link_id",
  encode(ST_AsBinary(ST_Force_2D("the_geom")),'base64') as "the_geom"
FROM (select l.* from links l, source_link ld
      where l.link_id = ld.link_id) as "vtable"
WHERE "the_geom" && ST_GeomFromText('POLYGON ((-74.79526213727112
  40.11142841966784, -74.79526213727112 40.2127052195, -74.66273955428638
  40.2127052195, -74.66273955428638 40.11142841966784, -74.79526213727112
  40.11142841966784))', 4326)
/*
'Hash Join  (cost=3722.92..74955.80 rows=58813 width=378) (actual time=412.729..5946.610 rows=44469 loops=1)'
'  Hash Cond: (ld.link_id = l.link_id)'
'  ->  Seq Scan on source_link ld  (cost=0.00..58282.29 rows=3179029 width=10) (actual time=0.026..2823.685 rows=3179029 loops=1)'
'  ->  Hash  (cost=3706.60..3706.60 rows=1306 width=378) (actual time=7.805..7.805 rows=1285 loops=1)'
'        ->  Bitmap Heap Scan on links l  (cost=74.42..3706.60 rows=1306 width=378) (actual time=0.944..6.093 rows=1285 loops=1)'
'              Recheck Cond: (the_geom && '010320E61001000500E6D42993E5B252C0BF285549430E4440E6D42993E5B252C0ADE2B3EC391B44403DDB29536AAA52C0ADE2B3EC391B44403DDB29536AAA52C0BF285549430E4440E6D42993E5B252C0BF285549430E4440'::geometry)'
'              ->  Bitmap Index Scan on links_geom_idx  (cost=0.00..74.09 rows=1306 width=0) (actual time=0.883..0.883 rows=1285 loops=1)'
'                    Index Cond: (the_geom && '010320E61001000500E6D42993E5B252C0BF285549430E4440E6D42993E5B252C0ADE2B3EC391B44403DDB29536AAA52C0ADE2B3EC391B44403DDB29536AAA52C0BF285549430E4440E6D42993E5B252C0BF285549430E4440'::geometry)'
'Total runtime: 5983.473 ms'
*/

So what prompted my initial concern is solved.  This slowness caught me at a 
time when, in the back of my mind, I am contemplating our next data challenge, 
which is why I looked into it.  If I run into problems with our other data set 
I'll be sure to cover all my options and give realistic queries if I ask for 
help. 

Thanks again for all the advice.

charles

[snip history]

Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Bborie Park



On 08/25/2011 10:56 AM, fork wrote:

[snip history]

I think PostgreSQL 9.2 will have index-only scans that should improve 
the performance of SELECT count(*) queries.


http://rhaas.blogspot.com/2011/08/index-only-scans-now-theres-patch.html

Granted, this is in trunk so it won't help for any production databases.

-bborie

--
Bborie Park
Programmer
Center for Vectorborne Diseases
UC Davis
530-752-8380
bkp...@ucdavis.edu


Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread fork
Ben Madin  remoteinformation.com.au> writes:

> does this just apply to count(*), or is count(id) just as bad? I was
originally a MySQL user and count(*)
> could be very efficient there.

My understanding is that Postgres does not keep record counts for any of its
tables internally, so there is no way to get a count without seq scanning.  The
trade-off is that inserts and deletes are faster, but a very common query is much
slower.  I don't know if the planner could benefit in any way from the count
being available, though.

The lists say to use a trigger on inserts and deletes to update a metadata table
if you really do need to know how many elements are in it exactly, but that is a
far less frequent need than you may think (for example an EXISTS can get you an
answer to "are there any records" more quickly than a count(*)).  I think you
can do a quick and rough estimate by doing some math with the table size on
disk, but I never have.  
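
The trigger approach described above can be sketched as follows; all table,
function, and trigger names here are made-up examples, not anything from the
thread:

```sql
-- Counter table seeded with one exact count(*) up front.
CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_counts VALUES ('links', (SELECT count(*) FROM links));

-- Keep the counter in sync on every insert/delete.
CREATE OR REPLACE FUNCTION links_count_trig() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE row_counts SET n = n + 1 WHERE table_name = 'links';
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE row_counts SET n = n - 1 WHERE table_name = 'links';
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER links_count
  AFTER INSERT OR DELETE ON links
  FOR EACH ROW EXECUTE PROCEDURE links_count_trig();

-- "Are there any records?" via EXISTS stops at the first match,
-- unlike count(*):
SELECT EXISTS (SELECT 1 FROM links);
```

Note the trigger serializes concurrent writers on the counter row, which is
part of why the lists suggest it only when an exact count is genuinely needed.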

It is unfortunate that the first (rather lame) "benchmark" anyone tries to do
with a new database is to run "select count(*) from x" -- I am sure lots of people
have been turned off from PG because this particular query is slow compared to
MySQL. 

(MySQL always wins in the more Hollywood competitions against PG, but fails in
the long run, IMHO) 





Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Ben Madin
Steve,

does this just apply to count(*), or is count(id) just as bad? I was originally 
a MySQL user and count(*) could be very efficient there.

cheers

Ben


On 26/08/2011, at 12:01 AM, Stephen Woodbridge wrote:

> The issue here is the count(*) which forces a full table scan in postgresql 
> as fork mentioned. You need to look at a real query, unless you are really 
> doing a count(*).
> 
> [snip history]


Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Stephen Woodbridge
The issue here is the count(*), which forces a full table scan in 
PostgreSQL, as fork mentioned. You need to look at a real query, unless 
you are really doing a count(*).


-Steve

On 8/25/2011 11:49 AM, Ben Madin wrote:

[snip history]


Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread Ben Madin
I'm no expert at this, but my understanding (which is limited) was that you are 
asking for the whole table, so indexing doesn't really get used (my 
understanding is that indexing helps to find the page for a subset of data more 
quickly than scanning through the whole lot).

Also, you might be able to get some speed up by using a different join type 
(outer join and not null where clause)?

cheers

Ben


On 25/08/2011, at 9:41 PM, Charles Galpin wrote:

> If this is too off topic, please let me know and I'll sign up on a postgres 
> list to get help. But this is related to my use of postgis and if anyone 
> knows this stuff, it's you guys.
> 
> I have an example query that I expect to be much faster, but my main concern 
> is we are about to do some visualization of historical congestion data which 
> will require queries across much larger data sets - like 150 million records 
> a day. We are about to test using partitions but the number per table will 
> still be much larger than what I am dealing with now.
> 
> So here is a query I would think would be much faster than 43 seconds for two 
> tables, one with about 97k rows, and the other 3.2 million.
> 
> explain select count(l.*) 
> from links l, source_link ld where l.link_id = ld.link_id;
> /*
> 'Aggregate  (cost=174731.72..174731.73 rows=1 width=32)'
> '  ->  Hash Join  (cost=13024.27..166784.14 rows=3179029 width=32)'
> 'Hash Cond: (ld.link_id = l.link_id)'
> '->  Seq Scan on source_link ld  (cost=0.00..58282.29 rows=3179029 
> width=10)'
> '->  Hash  (cost=10963.12..10963.12 rows=96812 width=42)'
> '  ->  Seq Scan on links l  (cost=0.00..10963.12 rows=96812 
> width=42)'
> */
> 
> Each table has an index on link_id, defined like this
> 
> CREATE INDEX links_link_id_idx
>  ON links
>  USING btree
>  (link_id);
> 
> CREATE INDEX source_link_link_id_idx
>  ON source_link
>  USING btree
>  (link_id);
> 
> Shouldn't this index prevent these sequential scans, or am I misreading this? 
>  Should this really take 43 seconds?
> 
> thanks for any advice,
> charles
> 


Re: [postgis-users] OT Understanding slow queries

2011-08-25 Thread fork
Charles Galpin  lhsw.com> writes:

> explain select count(l.*) 
> from links l, source_link ld where l.link_id = ld.link_id;

Can you try returning some sort of value (like the keys) instead of a
count(*)?  count(*) can be pretty slow in Postgres, sometimes (I think)
forcing a seq scan.

I am not particularly confident this will fix your problem, 
but it is worth a shot.

I would also experiment with DISTINCT and LIMIT, after making sure ANALYZE
has been run appropriately.
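[Editorial note: a sketch of the suggestions above, assuming the table names from the original post. count(l.*) materialises a whole-row value for every joined row; counting plain rows avoids that per-row overhead.]

```sql
-- Refresh planner statistics first so row estimates are accurate.
ANALYZE links;
ANALYZE source_link;

-- count(*) instead of count(l.*): no whole-row value is built per match.
SELECT count(*)
FROM links l
JOIN source_link ld USING (link_id);
```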



[postgis-users] OT Understanding slow queries

2011-08-25 Thread Charles Galpin
If this is too off topic, please let me know and I'll sign up on a postgres 
list to get help. But this is related to my use of postgis and if anyone knows 
this stuff, it's you guys.

I have an example query that I expect to be much faster, but my main concern is 
we are about to do some visualization of historical congestion data which will 
require queries across much larger data sets - like 150 million records a day. 
We are about to test using partitions but the number per table will still be 
much larger than what I am dealing with now.
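[Editorial note: at the time of this thread, partitioning in PostgreSQL meant table inheritance with CHECK constraints plus constraint exclusion. A minimal sketch of one daily partition; all table and column names here are illustrative, not from the original post.]

```sql
-- Parent table; children hold one day of data each.
CREATE TABLE congestion (
    link_id   integer   NOT NULL,
    observed  timestamp NOT NULL,
    speed     numeric
);

-- One child per day, constrained to its date range.
CREATE TABLE congestion_2011_08_25 (
    CHECK (observed >= DATE '2011-08-25'
       AND observed <  DATE '2011-08-26')
) INHERITS (congestion);

-- Index each child explicitly; indexes are not inherited from the parent.
CREATE INDEX congestion_2011_08_25_link_id_idx
    ON congestion_2011_08_25 (link_id);

-- Let the planner skip children whose CHECK constraint rules them out.
SET constraint_exclusion = partition;
```

A query filtered on observed then scans only the matching day's child table instead of the full data set.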

So here is a query I would think would be much faster than 43 seconds for two 
tables, one with about 97k rows, and the other 3.2 million.

explain select count(l.*) 
from links l, source_link ld where l.link_id = ld.link_id;
/*
'Aggregate  (cost=174731.72..174731.73 rows=1 width=32)'
'  ->  Hash Join  (cost=13024.27..166784.14 rows=3179029 width=32)'
'Hash Cond: (ld.link_id = l.link_id)'
'->  Seq Scan on source_link ld  (cost=0.00..58282.29 rows=3179029 
width=10)'
'->  Hash  (cost=10963.12..10963.12 rows=96812 width=42)'
'  ->  Seq Scan on links l  (cost=0.00..10963.12 rows=96812 
width=42)'
*/

Each table has an index on link_id, defined like this

CREATE INDEX links_link_id_idx
  ON links
  USING btree
  (link_id);
  
CREATE INDEX source_link_link_id_idx
  ON source_link
  USING btree
  (link_id);

Shouldn't this index prevent these sequential scans, or am I misreading this?  
Should this really take 43 seconds?

thanks for any advice,
charles
