Re: [PERFORM] bad plan and LIMIT

2009-05-01 Thread Adam Ruth
You could try changing the IN to an EXISTS; that may alter how the
optimizer weighs the limit.


SELECT id FROM ps_image
WHERE EXISTS (SELECT null FROM ps_gallery_image
              WHERE gallery_id = 'G7ejKGoWS_cY'
                AND image_id = ps_image.id)
ORDER BY LOWER(file_name) ASC
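
(Untested sketch: to compare plans directly against the slow case further down, the same statement can be run through explain analyze with the limit added.)

EXPLAIN ANALYZE
SELECT id FROM ps_image
WHERE EXISTS (SELECT null FROM ps_gallery_image
              WHERE gallery_id = 'G7ejKGoWS_cY'
                AND image_id = ps_image.id)
ORDER BY LOWER(file_name) ASC
LIMIT 1;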


On 30/04/2009, at 3:51 AM, James Nelson wrote:



Hi, I'm hoping you guys can help with improving this query I'm  
having a problem with. The main problem is that the query plan  
changes depending on the value of the LIMIT clause, with small  
values using a poor plan and running very slowly. The two times are  
roughly 5 minutes for the bad plan and 1.5 secs for the good plan.


I have read a little about how the query planner takes into account  
the limit clause, and I can see the effect this has on the costs  
shown by explain. The problem is that the estimated cost ends up  
being wildly inaccurate. I'm not sure if this is a problem with the
planner or something I am doing wrong on my end.


the query (without the limit clause):

SELECT id FROM ps_image
WHERE id IN (SELECT image_id FROM ps_gallery_image
             WHERE gallery_id = 'G7ejKGoWS_cY')
ORDER BY LOWER(file_name) ASC


The ps_image table has about 24 million rows, ps_gallery_image has  
about 14 million. The query above produces roughly 50 thousand rows.


When looking at the explain output with the limit, I can see the
interpolation that the planner does for the limit node (arriving at
a final cost of 458.32 for this example), but I'm not sure why it is
so inaccurate compared to the actual times.
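
(For what it's worth, my rough reading of that interpolation, using the estimates from the bad plan below and assuming the planner simply scales the join's total cost by the fraction of rows requested:

  limit cost ≈ 17700128.78 * (1 / 38620) ≈ 458.3

i.e. it presumes the matching rows are spread evenly through the index order.)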


Thanks in advance for taking a look at this, let me know if there is  
additional information I should provide.


Some information about the tables and the explain output follows below.

James Nelson

[ja...@db2 ~] psql --version
psql (PostgreSQL) 8.3.5
contains support for command-line editing

photoshelter=# \d ps_image
            Table "public.ps_image"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 id        | character varying(16)  | not null
 user_id   | character varying(16)  |
 album_id  | character varying(16)  | not null
 parent_id | character varying(16)  |
 file_name | character varying(200) |
 file_size | bigint                 |
 [... 20 rows snipped ...]
Indexes:
    "ps_image_pkey" PRIMARY KEY, btree (id)
    "i_file_name_l" btree (lower(file_name::text))
[... indexes, fk constraints and triggers snipped ...]

photoshelter=# \d ps_gallery_image
              Table "public.ps_gallery_image"
    Column     |           Type           |       Modifiers
---------------+--------------------------+------------------------
 gallery_id    | character varying(16)    | not null
 image_id      | character varying(16)    | not null
 display_order | integer                  | not null default 0
 caption       | character varying(2000)  |
 ctime         | timestamp with time zone | not null default now()
 mtime         | timestamp with time zone | not null default now()
 id            | character varying(16)    | not null
Indexes:
    "ps_gallery_image_pkey" PRIMARY KEY, btree (id)
    "gi_gallery_id" btree (gallery_id)
    "gi_image_id" btree (image_id)
Foreign-key constraints:
    "ps_gallery_image_gallery_id_fkey" FOREIGN KEY (gallery_id) REFERENCES ps_gallery(id) ON DELETE CASCADE
    "ps_gallery_image_image_id_fkey" FOREIGN KEY (image_id) REFERENCES ps_image(id) ON DELETE CASCADE
Triggers:
    ps_image_gi_sync AFTER INSERT OR DELETE OR UPDATE ON ps_gallery_image FOR EACH ROW EXECUTE PROCEDURE ps_image_sync()


======================================================================

explain analyze for bad plan

photoshelter=# explain analyze SELECT ID FROM ps_image WHERE id IN (SELECT image_id FROM ps_gallery_image WHERE gallery_id='G7ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC limit 1;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..458.32 rows=1 width=36) (actual time=709831.847..709831.847 rows=1 loops=1)
   ->  Nested Loop IN Join  (cost=0.00..17700128.78 rows=38620 width=36) (actual time=709831.845..709831.845 rows=1 loops=1)
         ->  Index Scan using i_file_name_l on ps_image  (cost=0.00..1023863.22 rows=24460418 width=36) (actual time=0.063..271167.293 rows=8876340 loops=1)
         ->  Index Scan using gi_image_id on ps_gallery_image  (cost=0.00..0.85 rows=1 width=17) (actual time=0.048..0.048 rows=0 loops=8876340)
               Index Cond: ((ps_gallery_image.image_id)::text = (ps_image.id)::text)
               Filter: ((ps_gallery_image.gallery_id)::text = 'G7ejKGoWS_cY'::text)
 Total runtime: 709831.932 ms

======================================================================



Re: [PERFORM] [ADMIN] Databases Vs. Schemas

2004-03-26 Thread Adam Ruth
On Mar 23, 2004, at 11:16 AM, Subbiah, Stalin wrote:

And we also created rules to allow update, delete, and insert on those views so that they looked like tables. The reason we did this is that we ran into issues with too many open files during pg_dump when we had thousands of tables instead of about one hundred tables and thousands of views.

Is it because you had a smaller value set for the maximum allowable number of open file descriptors? What was ulimit -a set to?
It was actually running on OS X and it was a shared memory issue. We would have had to recompile the Darwin kernel to get a bigger SHMMAX, but this solution seemed better since we would possibly be installing on servers where we wouldn't have that much leeway. I think the view idea works better for a number of other reasons. For one, I can do a query on the base table and see all of the rows for all of the schemas at once, which has proven quite useful.
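
(A trivial sketch of what I mean, using the file/siteid example from my earlier mail further down; the query itself is just illustrative:)

select siteid, count(*) from public.file group by siteid;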


We, however, did have a need to periodically select data from 2 schemas at a time, and it was simpler logic than if we needed 2 database connections.
Adam Ruth

On Mar 22, 2004, at 2:30 PM, Subbiah, Stalin wrote:

-- sorry to repost, just subscribed to the list. Hopefully it gets to the list this time --

Hi All,

We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom-grown app that we developed. Each app installs the same set of tables for each service, and the services could easily number in the thousands. So is it better to have 1000 databases vs 1000 schemas in a database cluster? What are the performance overheads of having multiple databases vs. schemas (if any)? I'm leaning towards having schemas rather than databases but I would like to get others' opinions on this. Appreciate your reply.

Thanks,
Stalin






Re: [PERFORM] [ADMIN] Databases Vs. Schemas

2004-03-26 Thread Adam Ruth
We have a similarly sized database and we went with schemas. We did something different, though: we created one schema that contained all of the tables (we used the public schema) and then created the hundreds of schemas with views that access only the related rows for a particular schema. Something like this:

create table public.file (siteid int, id int, [fields]);
create schema sc1;
create view sc1.file as select * from public.file where siteid = 1;
create schema sc2;
create view sc2.file as select * from public.file where siteid = 2;
And we also created rules to allow update, delete, and insert on those views so that they looked like tables. The reason we did this is that we ran into issues with too many open files during pg_dump when we had thousands of tables instead of about one hundred tables and thousands of views.
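
A minimal sketch of what such rules might look like for the sc1.file view above (using only the siteid and id columns shown; a real rule set would need the full column list):

create rule file_ins as on insert to sc1.file
    do instead insert into public.file (siteid, id) values (1, new.id);
create rule file_upd as on update to sc1.file
    do instead update public.file set id = new.id where siteid = 1 and id = old.id;
create rule file_del as on delete to sc1.file
    do instead delete from public.file where siteid = 1 and id = old.id;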

We, however, did have a need to periodically select data from 2 schemas at a time, and it was simpler logic than if we needed 2 database connections.

Adam Ruth

On Mar 22, 2004, at 2:30 PM, Subbiah, Stalin wrote:

-- sorry to repost, just subscribed to the list. Hopefully it gets to the list this time --

Hi All,

We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom-grown app that we developed. Each app installs the same set of tables for each service, and the services could easily number in the thousands. So is it better to have 1000 databases vs 1000 schemas in a database cluster? What are the performance overheads of having multiple databases vs. schemas (if any)? I'm leaning towards having schemas rather than databases but I would like to get others' opinions on this. Appreciate your reply.

Thanks,
Stalin





Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?

2004-02-03 Thread Adam Ruth
Wow, I didn't know that (I didn't get far enough to test any rollback). That's not a good thing. <facetious>But then again, it's MySQL; who needs rollback anyway?</facetious>

On Feb 2, 2004, at 5:44 PM, Christopher Kings-Lynne wrote:

One more thing that annoyed me. If you started a process, such as a large DDL operation, or heaven forbid, a cartesian join (what? I never do that!).

I believe InnoDB also has O(n) rollback time, e.g. if you are rolling back 100 million row changes, it takes a long, long time. In PostgreSQL rolling back is O(1)...

Chris





Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?

2004-02-02 Thread Adam Ruth
Josh,

I evaluated MySQL + InnoDB briefly for a project, once.  I didn't get 
very far because of some severe limitations in MySQL.

I had to import all of the data from an existing database (MS SQL). One of the tables had about 8 million rows, 10 fields, and 5 indexes. I found it quite impossible to import into MySQL. I would import the data into a table with no indexes, then perform a bunch of manipulation on it (I wasn't just converting from MS SQL, but also needed to alter quite a bit of the structure). After the manipulation, I would drop some columns and build the indexes. It took MySQL over 4 days to do this!

What I found out was that any DDL change to a table in MySQL actually does this: create a new table, copy all of the data over, then drop the old table and rename the new one. Whenever I added a new index, MySQL would go through the process of rebuilding each previous index. The same thing happened when adding or dropping columns.

I could not find a way to import all of the data in a reasonable amount of time. For comparison, it took less than 45 minutes to import all of the data into PostgreSQL (that's ALL of the data, not just that one table).

Needless to say (but I'll say it anyway :-), I didn't get any farther in my evaluation; there was no point.

One more thing that annoyed me: if you start a process, such as a large DDL operation or, heaven forbid, a cartesian join (what? I never do that!), there's no way to cancel it with InnoDB. You have to wait for it to finish. Hitting ctrl+c in their command-line tool only kills the command-line tool; the process continues. Even if you stop the database and restart it (including with a hard boot), it will pick right up where it left off and continue. That proved to be way too much of a pain for me.

Disclaimer:  I'm not a real MySQL expert, or anything.  There could be 
ways of getting around this, but after two weeks of trying, I decided 
to give up.  It only took me a few hours to build the requisite 
PostgreSQL scripts and I never looked back.

Adam Ruth

On Feb 2, 2004, at 10:21 AM, Josh Berkus wrote:

Folks,

I've had requests from a couple of businesses to see results of informal MySQL+InnoDB vs. PostgreSQL tests. I know that we don't have the setup to do full formal benchmarking, but surely someone in our community has gone head-to-head on your own application?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


