Re: [PERFORM] bad plan and LIMIT
You could try changing the IN to an EXISTS, that may alter how the optimizer weighs the limit. SELECT ID FROM ps_image WHERE EXISTS (SELECT null FROM ps_gallery_image WHERE gallery_id ='G7ejKGoWS_cY' and image_id = ps_image.id) ORDER BY LOWER(FILE_NAME) ASC On 30/04/2009, at 3:51 AM, James Nelson wrote: Hi, I'm hoping you guys can help with improving this query I'm having a problem with. The main problem is that the query plan changes depending on the value of the LIMIT clause, with small values using a poor plan and running very slowly. The two times are roughly 5 minutes for the bad plan and 1.5 secs for the good plan. I have read a little about how the query planner takes into account the limit clause, and I can see the effect this has on the costs shown by explain. The problem is that the estimated cost ends up being wildly inaccurate. I'm not sure if this a problem with the planner or if it is something I am doing wrong on my end. the query (without the limit clause): SELECT ID FROM ps_image WHERE id IN (SELECT image_id FROM ps_gallery_image WHERE gallery_id='G7ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC The ps_image table has about 24 million rows, ps_gallery_image has about 14 million. The query above produces roughly 50 thousand rows. When looking at the explain with the limit, I can see the interpolation that the planner does for the limit node (arriving at a final cost of 458.32 for this example) but not sure why it is inaccurate compared to the actual times. Thanks in advance for taking a look at this, let me know if there is additional information I should provide. Some information about the tables and the explains follow below. James Nelson [ja...@db2 ~] psql --version psql (PostgreSQL) 8.3.5 contains support for command-line editing photoshelter=# \d ps_image Table public.ps_image Column | Type | Modifiers ---+-- +--- id| character varying(16)| not null user_id | character varying(16)| album_id | character varying(16)| not null parent_id | character varying(16)| file_name | character varying(200) | file_size | bigint | 20 rows snipped Indexes: ps_image_pkey PRIMARY KEY, btree (id) i_file_name_l btree (lower(file_name::text)) indexes, fk constraints and triggers snipped photoshelter=# \d ps_gallery_image Table public.ps_gallery_image Column | Type | Modifiers ---+--+ gallery_id| character varying(16)| not null image_id | character varying(16)| not null display_order | integer | not null default 0 caption | character varying(2000) | ctime | timestamp with time zone | not null default now() mtime | timestamp with time zone | not null default now() id| character varying(16)| not null Indexes: ps_gallery_image_pkey PRIMARY KEY, btree (id) gi_gallery_id btree (gallery_id) gi_image_id btree (image_id) Foreign-key constraints: ps_gallery_image_gallery_id_fkey FOREIGN KEY (gallery_id) REFERENCES ps_gallery(id) ON DELETE CASCADE ps_gallery_image_image_id_fkey FOREIGN KEY (image_id) REFERENCES ps_image(id) ON DELETE CASCADE Triggers: ps_image_gi_sync AFTER INSERT OR DELETE OR UPDATE ON ps_gallery_image FOR EACH ROW EXECUTE PROCEDURE ps_image_sync() = = = = = = = = = = = = = = = == explain analyze for bad plan photoshelter=# explain analyze SELECT ID FROM ps_image WHERE id IN (SELECT image_id FROM ps_gallery_image WHERE gallery_id='G7ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC limit 1; QUERY PLAN - Limit (cost=0.00..458.32 rows=1 width=36) (actual time=709831.847..709831.847 rows=1 loops=1) - Nested Loop IN Join (cost=0.00..17700128.78 rows=38620 width=36) (actual time=709831.845..709831.845 rows=1 loops=1) - Index Scan using i_file_name_l on ps_image (cost=0.00..1023863.22 rows=24460418 width=36) (actual time=0.063..271167.293 rows=8876340 loops=1) - Index Scan using gi_image_id on ps_gallery_image (cost=0.00..0.85 rows=1 width=17) (actual time=0.048..0.048 rows=0 loops=8876340) Index Cond: ((ps_gallery_image.image_id)::text = (ps_image.id)::text) Filter: ((ps_gallery_image.gallery_id)::text = 'G7ejKGoWS_cY'::text) Total runtime: 709831.932 ms = = = = = = = = = = = = = = =
[PERFORM]
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [ADMIN] Databases Vs. Schemas
On Mar 23, 2004, at 11:16 AM, Subbiah, Stalin wrote: And we also created rules to allow update, delete, and insert on those views so that they looked like tables. The reason we did this is because we ran into issues with too many open files during pg_dump when we had thousands of tables instead of about 1 hundred tables and thousands of views. Is it because you had smaller value set for max. allowable number of open files descriptor. what was ulimit -a set to ? It was actually running on OS X and it was a shared memory issue. We would have had to recompile the Darwin kernel to get a bigger SHMMAX, but this solution seemed better since we would possibly be installing on servers where we wouldn't have that much leeway. I think that the view idea works better for a number of other reasons. For one, I can do a query on the base table and see all of the rows for all of the schemas at once, that has proven quite useful. We, however, did have a need to periodically select data from 2 schemas at a time, and it was simpler logic than if we needed 2 database connections. Adam Ruth On Mar 22, 2004, at 2:30 PM, Subbiah, Stalin wrote: --sorry to repost, just subscribed to the list. hopefully it gets to the list this time -- Hi All, We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom grown app that we developed. Each app installs same set of tables for each service. And the service could easily be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a database cluster. What are the performance overhead of having multiple databases vs. schemas (if any). I'm leaning towards having schemas rather than databases but i would like to get others opinion on this. Appreciate your reply. Thanks, Stalin ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] [ADMIN] Databases Vs. Schemas
We have a similarly sized database and we went with schemas. We did something different, though, we created one schema that contained all of the tables (we used the public schema) and then created the hundreds of schemas with views that access only the related rows for a particular schema. Something like this: create table public.file (siteid int, id int, [fields]); create schema sc1; create view sc1.file as select * from public.file where siteid = 1; create schema sc2; create view sc2.file as select * from public file where siteid = 2; And we also created rules to allow update, delete, and insert on those views so that they looked like tables. The reason we did this is because we ran into issues with too many open files during pg_dump when we had thousands of tables instead of about 1 hundred tables and thousands of views. We, however, did have a need to periodically select data from 2 schemas at a time, and it was simpler logic than if we needed 2 database connections. Adam Ruth On Mar 22, 2004, at 2:30 PM, Subbiah, Stalin wrote: --sorry to repost, just subscribed to the list. hopefully it gets to the list this time -- Hi All, We are evaluating the options for having multiple databases vs. schemas on a single database cluster for a custom grown app that we developed. Each app installs same set of tables for each service. And the service could easily be in thousands. so Is it better to have 1000 databases vs 1000 schemas in a database cluster. What are the performance overhead of having multiple databases vs. schemas (if any). I'm leaning towards having schemas rather than databases but i would like to get others opinion on this. Appreciate your reply. Thanks, Stalin ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
Wow, I didn't know that (didn't get far enough to test any rollback). That's not a good thing. facetiousBut then again, it's MySQL who needs rollback anyway?/facetious On Feb 2, 2004, at 5:44 PM, Christopher Kings-Lynne wrote: One more thing that annoyed me. If you started a process, such as a large DDL operation, or heaven forbid, a cartesian join (what? I never do that!). I believe InnoDB also has O(n) rollback time. eg. if you are rolling back 100 million row changes, it takes a long, long time. In PostgreSQL rolling back is O(1)... Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
Josh, I evaluated MySQL + InnoDB briefly for a project, once. I didn't get very far because of some severe limitations in MySQL. I had to import all of the data from an existing database (MS SQL). One of the tables was about 8 million rows, 10 fields, and had 5 indexes. I found it quite impossible to import into MySQL. I would import the data into a table with no indexes, then perform a bunch of manipulation on it (I wasn't just converting from MS SQL, but also needed to alter quite a bit of the structure). After the manipulation, I would drop some columns and build the indexes. It took MySQL over 4 days to do this! What I found out was that any DDL changes to a table in MySQL actually does this: create a new table, copy all of the data over, then drop the old table and rename the new one. Whenever I added a new index, MySQL would go through the process of rebuilding each previous index. Same thing when adding or dropping columns. I could not find a way to import all of the data in a reasonable amount of time. For comparison, it took less that 45 minutes to import all of the data in to PostgreSQL (that's ALL of the data, not just that one table). Needless to say (but I'll say it anyway :-), I didn't get any farther in my evaluation, there was no point. One more thing that annoyed me. If you started a process, such as a large DDL operation, or heaven forbid, a cartesian join (what? I never do that!). There's no way to cancel it with InnoDB. You have to wait for it to finish. Hitting ctrl+c in their command line tool only kills the command line tool, the process continues. Even if you stop the database and restart it (including with a hard boot), it will pick right up where it left off and continue. That proved to be way too much of a pain for me. Disclaimer: I'm not a real MySQL expert, or anything. There could be ways of getting around this, but after two weeks of trying, I decided to give up. It only took me a few hours to build the requisite PostgreSQL scripts and I never looked back. Adam Ruth On Feb 2, 2004, at 10:21 AM, Josh Berkus wrote: Folks, I've had requests from a couple of businesses to see results of infomal MySQL +InnoDB vs. PostgreSQL tests.I know that we don't have the setup to do full formal benchmarking, but surely someone in our community has gone head-to-head on your own application? -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster