Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
1.451 ms = 1.451 milliseconds
1451.0 ms = 1.451 seconds

... so 32.918 ms for a commit seems perhaps reasonable ?

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Zeugswetter Andreas DCP SD
Sent: Thu 5/11/2006 12:55 AM
To: Jim C. Nasby; PFC
Cc: Greg Stark; Tom Lane; pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

> Something else worth considering is not using the normal catalog
> methods for storing information about temp tables, but hacking that
> together would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst case
he showed.

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
>   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect ?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
>   LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT
>
> CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the
file is fsynced before it is dropped.

Andreas
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 06:08:36PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > I'd hope that wasn't what's happening... is the backend smart enough to
> > know not to fsync anything involved with the temp table?
>
> The catalog entries required for it have to be fsync'd, unless you enjoy
> putting your entire database at risk (a bad block in pg_class, say,
> would probably take out more than one table).

Yeah, thought about that after sending... :(

> It's interesting to speculate about keeping such catalog entries in
> child tables of pg_class etc that are themselves temp tables. Resolving
> the apparent circularity of this is left as an exercise for the reader.

Well, since it'd be a system table with a fixed OID there could
presumably be a special case in the recovery code for it, though that's
pretty fugly sounding.

Another alternative would be to support global temp tables... I think
that would handle all the complaints of the OP except for the cost of
ANALYZE. I suspect this would be easier to do than creating a special
type of temp table that used a tuplestore instead of the full table
framework, and it'd certainly be more general-purpose.

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> I'd hope that wasn't what's happening... is the backend smart enough to
> know not to fsync anything involved with the temp table?

The catalog entries required for it have to be fsync'd, unless you enjoy
putting your entire database at risk (a bad block in pg_class, say,
would probably take out more than one table).

It's interesting to speculate about keeping such catalog entries in
child tables of pg_class etc that are themselves temp tables.  Resolving
the apparent circularity of this is left as an exercise for the reader.

			regards, tom lane
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
>>> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
>>>   LIMIT 20
>>> 0.443 ms ANALYZE tmp
>>> 0.365 ms SELECT * FROM tmp
>>> 0.310 ms DROP TABLE tmp
>>> 32.918 ms COMMIT

>> The 32 seconds for commit can hardly be catalog related. It seems the
>> file is fsynced before it is dropped.

> I'd hope that wasn't what's happening... is the backend smart enough to
> know not to fsync anything involved with the temp table? ISTM that that
> transaction shouldn't actually be creating any WAL traffic at all.
> Though on the other hand there's no reason that DROP should be in the
> transaction at all; maybe that's gumming things up during the commit.

I included the DROP to make it clear that the time was spent in
COMMITting, not in DROPping the table.

Also, you can't use CREATE TEMP TABLE AS SELECT ... and at the same time
make it ON COMMIT DROP. You have to CREATE and INSERT.

With an ON COMMIT DROP temp table, the global timings are the same
whether or not it is dropped before commit : it is always the COMMIT
which takes all the milliseconds.

I still bet on system catalog updates being the main cause of the time
spent in COMMIT... (because ANALYZE changes this time)
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 09:55:15AM +0200, Zeugswetter Andreas DCP SD wrote:
> > Something else worth considering is not using the normal catalog
> > methods for storing information about temp tables, but hacking that
> > together would probably be a rather large task.
>
> But the timings suggest that it cannot be the catalogs in the worst case
> he showed.
>
> > 0.101 ms BEGIN
> > 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> >   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
>
> 1.4 seconds is not great for create table, is that what we expect ?

milliseconds... :) Given the amount of code and locking that it looks
like is involved in creating a table, that might not be unreasonable...

> > 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
> >   LIMIT 20
> > 0.443 ms ANALYZE tmp
> > 0.365 ms SELECT * FROM tmp
> > 0.310 ms DROP TABLE tmp
> > 32.918 ms COMMIT
> >
> > CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> > seeking frantically.
>
> The 32 seconds for commit can hardly be catalog related. It seems the
> file is fsynced before it is dropped.

I'd hope that wasn't what's happening... is the backend smart enough to
know not to fsync anything involved with the temp table? ISTM that that
transaction shouldn't actually be creating any WAL traffic at all.

Though on the other hand there's no reason that DROP should be in the
transaction at all; maybe that's gumming things up during the commit.

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 08:43:46PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 11, 2006 at 11:35:34AM -0400, Greg Stark wrote:
> > I can say that I've seen plenty of instances where the ability to
> > create temporary tables very quickly with no overhead over the
> > original query would be useful.
>
> I wonder if this requires what the standard refers to as a global
> temporary table. As I read it (which may be wrong, I find the language
> obtuse), a global temporary table is a temporary table whose structure
> is predefined. So, you'd define it once, updating the catalog only
> once, but still get a table that is emptied at each startup.
>
> Of course, it may not be what the standard means, but it still seems
> like a useful idea, to cut down on schema bloat.

IIRC that's the exact syntax Oracle uses:

CREATE GLOBAL TEMPORARY TABLE ...

I always found it a bit odd, since it always seemed to me like a global
temporary table would be one that every backend could read... something
akin to a real table that doesn't worry about fsync or any of that (and
is potentially not backed on disk at all).

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
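For comparison, a sketch of the Oracle flavour (hypothetical table and
column names ; the definition persists in the catalog while each session
sees only its own rows) :

    CREATE GLOBAL TEMPORARY TABLE search_results (
        id       INTEGER NOT NULL,
        user_id  INTEGER NOT NULL
    ) ON COMMIT DELETE ROWS;  -- contents vanish at commit, definition stays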
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 11:35:34AM -0400, Greg Stark wrote:
> I can say that I've seen plenty of instances where the ability to
> create temporary tables very quickly with no overhead over the original
> query would be useful.

I wonder if this requires what the standard refers to as a global
temporary table. As I read it (which may be wrong, I find the language
obtuse), a global temporary table is a temporary table whose structure
is predefined. So, you'd define it once, updating the catalog only once,
but still get a table that is emptied at each startup.

Of course, it may not be what the standard means, but it still seems
like a useful idea, to cut down on schema bloat.

Have a nice day,
--
Martijn van Oosterhout     http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Perhaps it would be worth creating a class of temporary tables that
> used a tuplestore, although that would greatly limit what could be done
> with that temp table.

I can say that I've seen plenty of instances where the ability to create
temporary tables very quickly with no overhead over the original query
would be useful.

For instance, in one site I had to do exactly what I always advise
others against: use offset/limit to implement paging. So first I have to
execute the query with a count(*) aggregate to get the total, then
execute the same query a second time to fetch the actual page of
interest. This would be (or could be arranged to be) within the same
transaction, and doesn't require the ability to execute any DML against
the tuplestore, which I imagine would be the main issue?

For bonus points, what would be real neat would be if the database could
notice shared plan segments, keep around the materialized tuple store,
and substitute it instead of re-executing that segment of the plan. Of
course this requires keeping track of transaction snapshot states and
making sure it's still correct.

> Something else worth considering is not using the normal catalog
> methods for storing information about temp tables, but hacking that
> together would probably be a rather large task.

It would be nice if using this feature didn't interact poorly with
preplanning all your queries and using the cached plans. Perhaps if you
had some way to create a single catalog entry that defined all the
column names and types, and then simply pointed it at a new tuplestore
each time without otherwise altering the catalog entry?

--
greg
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Thu, May 11, 2006 at 09:55:15AM +0200, Zeugswetter Andreas DCP SD wrote:
> > 0.101 ms BEGIN
> > 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
> >   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
>
> 1.4 seconds is not great for create table, is that what we expect ?

Hmm, I'm hoping ms means milliseconds...

--
Martijn van Oosterhout     http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> Something else worth considering is not using the normal catalog
> methods for storing information about temp tables, but hacking that
> together would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst case
he showed.

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT
>   NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect ?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC
>   LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT
>
> CREATING the table is OK, but what happens on COMMIT ? I hear the disk
> seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the
file is fsynced before it is dropped.

Andreas
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:
> > You mean the cursors' storage is in fact the same internal machinery
> > as a temporary table ?
>
> Use the source, Luke...

LOL, yeah, I should have, sorry.

> See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and
> heap_create_with_catalog in backend/catalog/heap.c. You'll find that
> creating a tuplestore is far easier than creating a temp table.

I had used intuition (instead of the source) to come to the same
conclusion regarding the level of complexity of these two... But I'll
look at the source ;)

> Perhaps it would be worth creating a class of temporary tables that
> used a tuplestore, although that would greatly limit what could be done
> with that temp table.

Just selecting from it, I guess, but that's all that's needed. Anything
more would duplicate the functionality of a temp table.

I find cursors awkward. The application can FETCH from them, but
postgres itself can't do it in SQL, unless using FOR .. IN in plpgsql...

It would be a powerful addition to be able to split queries, factor out
common parts between multiple queries, etc., using this system ; it
could even be used to execute an inner part of a query, then plan the
rest according to the results and execute it... without the overhead of
a temp table.
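The plpgsql route PFC alludes to looks roughly like this (a sketch with
hypothetical names ; OPEN/FETCH is the portable spelling) :

    CREATE OR REPLACE FUNCTION walk_bookmarks() RETURNS void AS $$
    DECLARE
        c CURSOR FOR SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20;
        r RECORD;
    BEGIN
        OPEN c;
        LOOP
            FETCH c INTO r;
            EXIT WHEN NOT FOUND;
            -- rows are visible here, but not to plain SQL statements
            RAISE NOTICE 'annonce %', r.annonce_id;
        END LOOP;
        CLOSE c;
    END;
    $$ LANGUAGE plpgsql;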
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote:
> You mean the cursors' storage is in fact the same internal machinery
> as a temporary table ?

Use the source, Luke... See tuplestore_begin_heap in
backend/utils/sort/tuplestore.c and heap_create_with_catalog in
backend/catalog/heap.c. You'll find that creating a tuplestore is far
easier than creating a temp table.

Perhaps it would be worth creating a class of temporary tables that used
a tuplestore, although that would greatly limit what could be done with
that temp table.

Something else worth considering is not using the normal catalog methods
for storing information about temp tables, but hacking that together
would probably be a rather large task.

--
Jim C. Nasby, Sr. Engineering Consultant       [EMAIL PROTECTED]
Pervasive Software        http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf          cell: 512-569-9461
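A rough way to see the catalog side of that difference (a sketch ; exact
row counts vary by version) :

    BEGIN;
    CREATE TEMPORARY TABLE tmp (a INTEGER, b INTEGER) ON COMMIT DROP;
    -- a pg_class row, a pg_type row, dependency rows, plus one
    -- pg_attribute row per column (user and system columns alike) :
    SELECT count(*) FROM pg_attribute WHERE attrelid = 'tmp'::regclass;
    COMMIT;

A DECLARE ... CURSOR backed by a tuplestore touches none of these
catalogs.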
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
> Creating cursors for a simple plan like a single sequential scan is
> fast because it's using the original data from the table.

I used the following query :

SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20

It's a backward index scan + limit... not a seq scan. And it's damn fast :

0.042 ms BEGIN
0.153 ms DECLARE tmp SCROLL CURSOR WITHOUT HOLD FOR SELECT * FROM bookmarks
         ORDER BY annonce_id DESC LIMIT 20
0.246 ms FETCH ALL FROM tmp
0.048 ms MOVE FIRST IN tmp
0.246 ms FETCH ALL FROM tmp
0.048 ms CLOSE tmp
0.084 ms COMMIT

> But your example was predicated on this part of the job being a complex
> query. If it's a complex query involving joins and groupings, etc, then
> it will have to be materialized and there's no (good) reason for that
> to be any faster than a temporary table which is effectively the same
> thing.

You mean the cursors' storage is in fact the same internal machinery as
a temporary table ? In that case, this raises an interesting question :
why is the cursor faster ?

Let's try a real-life example from my website : it is a search query
(quite complex) which is then joined to a lot of tables to resolve
FKeys. To that query I must add an application-made join using a big
IN() clause extracted from the data. Timings include the time to fetch
the results into Python. The "running total" column is the sum of all
timings since the BEGIN.

query_time   running_total   rows   query
0.061 ms     0.061 ms        -1     BEGIN
23.420 ms    23.481 ms       85     SELECT * FROM (huge query with a lot of joins)
4.318 ms     27.799 ms       2      SELECT l.*, u.login, u.bg_color FROM annonces_log l,
                                    users u WHERE u.id=l.user_id AND l.annonce_id IN
                                    (list of ids from previous query)
                                    ORDER BY annonce_id, added
0.241 ms     28.040 ms       -1     COMMIT

(Just in case you want to hurt yourself, here's the EXPLAIN ANALYZE
output : http://peufeu.com/temp/big_explain.txt)

Using a cursor takes about the same time. Also, doing just the search
query takes about 12 ms ; the joins take up the rest.

Now, I'll rewrite my query eliminating the joins and using a temp table.
Storing the whole result in the temp table will be too slow, because
there are too many columns. Therefore I will only store the primary and
foreign key columns, and join again to the main table to get the full
records.
query_time   running_total   rows   query
0.141 ms     0.141 ms        -1     BEGIN

Do the search :

8.229 ms     8.370 ms        -1     CREATE TEMPORARY TABLE tmp AS SELECT id, city_id,
                                    zipcode, contact_id, contact_group_id,
                                    price/terrain as sort
                                    FROM (stripped down search query)
0.918 ms     9.287 ms        -1     ANALYZE tmp

Fetch the main data to display :

7.663 ms     16.951 ms       85     SELECT a.* FROM tmp t, annonces_display a
                                    WHERE a.id=t.id ORDER BY t.sort

Fetch log entries associated with each row (one row to many log entries) :

1.021 ms     17.972 ms       2      SELECT l.*, u.login, u.bg_color FROM annonces_log l,
                                    users u, tmp t WHERE u.id=l.user_id AND
                                    l.annonce_id = t.id ORDER BY annonce_id, added
3.468 ms     21.440 ms       216    SELECT annonce_id, array_accum(list_id) AS list_ids,
                                    array_accum(COALESCE(user_id,0)) AS list_added_by,
                                    max(added) AS added_to_list FROM bookmarks
                                    GROUP BY annonce_id

Resolve foreign key relations :

1.034 ms     22.474 ms       37     SELECT r.annonce_id FROM read_annonces r, tmp t
                                    WHERE r.annonce_id = t.id
0.592 ms     23.066 ms       9      SELECT * FROM cities_dist_zipcode WHERE zipcode IN
                                    (SELECT zipcode FROM tmp)
0.716 ms     23.782 ms       11     SELECT * FROM cities_dist WHERE id IN
                                    (SELECT city_id FROM tmp)
1.125 ms     24.907 ms       45     SELECT * FROM contacts WHERE id IN
                                    (SELECT contact_id FROM tmp)
0.799 ms     25.705 ms       42     SELECT * FROM contact_groups WHERE id IN
                                    (SELECT contact_group_id FROM tmp)
0.463 ms     26.169 ms       -1     DROP TABLE tmp
32.208 ms    58.377 ms       -1     COMMIT

From this we see :

Using a temporary table is FASTER than doing the large query with all
the joins (26 ms versus 28 ms). It's also nicer and cleaner. However,
the COMMIT takes as much time as all the queries together !

Let's run with fsync=off :

query_time   running_total   rows   query
0.109 ms     0.109 ms        -1     BEGIN
8.321 ms     8.430 ms        -1     CREATE TEMPORARY TABLE tmp AS SELECT id, city_id,
                                    zipcode, contact_id, contact_group_id,
                                    price/terrain as sort
                                    FROM (stripped down search query)
0.849 ms     9.280 ms        -1     ANALYZE tmp
7.360 ms     16.640 ms       85     SELECT a.* FROM tmp t, annonces_display a
                                    WHERE a.id=t.id ORDER BY t.sort
1.067 ms     17.707 ms       2      SELECT l.*, u.login, u.bg_color FROM annonces_log l,
                                    users u, tmp t WHERE u.id=l.user_id AND
                                    l.annonce_id = t.id ORDER BY annonce_id, added
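As an aside, fsync is a server-wide setting that cannot be changed with
a per-session SET ; a run like the one above presumably means editing
postgresql.conf (fsync = off) and reloading :

    SHOW fsync;           -- check the current value
    -- SET fsync TO off;  -- rejected : not changeable per session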
Re: [HACKERS] Big IN() clauses etc : feature proposal
PFC <[EMAIL PROTECTED]> writes:
> > I really like this. It's clean, efficient, and easy to use. This
> > would be a lot faster than using temp tables.
>
> Creating cursors is very fast so we can create two, and avoid doing
> twice the same work (ie. hashing the ids from the results to grab
> categories only once).

Creating cursors for a simple plan like a single sequential scan is fast
because it's using the original data from the table.

But your example was predicated on this part of the job being a complex
query. If it's a complex query involving joins and groupings, etc, then
it will have to be materialized and there's no (good) reason for that to
be any faster than a temporary table, which is effectively the same
thing.

--
greg
[HACKERS] Big IN() clauses etc : feature proposal
> The moral of the story is that you're probably better off running a
> bunch of small selects than in trying to optimize things with one
> gargantuan select. Ever experiment with loading the parameters into a
> temp table and joining to that?

> Also, it might be worth re-testing that conclusion with PG CVS tip (or
> 8.2 when it comes out). The reimplementation of IN as = ANY that I did
> a couple months ago might well change the results.

Long mail, but I think it's interesting...

I think this is a generic problem, which is often encountered :
selecting a bunch of records based on a list of primary keys (or other
indexed, unique field) ; said list being anything from very short to
quite large.

Here are a few occurrences of this need :

1- The application supplies a list of id's (the case of the OP of this
thread).

2- A query Q1 yields a list of selected objects, that we wish to use in
several subsequent queries. And Q1 is a query we don't wish to do
several times, either because it's slow, complicated (advanced search,
for instance), or it acts on a constantly moving dataset, so the results
would be different each time. So we store the result of Q1 in the
application, or in a temp table, or in an array in a plpgsql variable,
whatever, to reuse them. Then, for each of these objects, often we will
make more queries to resolve foreign keys (get category name, owner
name, from categories and users tables, etc).

I have encountered both cases quite often, and they both pose a few
problems. I think it would be a good opportunity for a new feature (see
below).

A typical use case for point 2 :

Consider an "objects" table. Each object...
- is related to one or several rows from the "categories" table via an
  "objects_categories" link table.
- has an owner_id referencing the "users" table.

I do an "advanced search" query on "objects", which returns a list of
objects. I can join directly to "users" to get the owner's name, but
joining to "categories" is already problematic because of the
many-to-many relationship.

I wish to do this : fetch all objects matching the search criteria ;
fetch the owner users ; fetch the categories ; build in my application
object space a clean and straightforward data representation for all
this.

Also :
- I do not wish to complicate the search query.
- The row estimates for the search query results are likely to be "not
  so good" (because it's a complex query) ; so the joins to users and
  categories are likely to use suboptimal plans based on "not so good"
  estimates.
- The rows from "objects" are large ; so moving them around through a
  lot of small joins hurts performance.

The obvious solution is this :

BEGIN;
CREATE TEMPORARY TABLE results ON COMMIT DROP
    AS SELECT * FROM (advanced search query);
ANALYZE results;

-- get the results to the application
SELECT * FROM results;

-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);

-- get category info
SELECT * FROM categories WHERE id IN
    (SELECT category_id FROM objects_to_categories
      WHERE object_id IN (SELECT id FROM results));

-- get object/category relations (the ORM will use this to link objects
-- in the application)
SELECT * FROM objects_to_categories
 WHERE object_id IN (SELECT id FROM results);

COMMIT;

You might wonder why I do it this way on the "categories" table. This is
because I use an Object-Relational mapper which will instantiate a User
or Category class object for each row I fetch from these tables.
I do not want to fetch just the username, using a simple join, but I
want the full object, because :
- I want to instantiate these objects (they have useful methods to
  process rights etc).
- I do not want to mix columns from "objects" and "users".

And I do not wish to instantiate each category more than once. This
would waste memory, but more importantly, it is a lot cleaner to have
only one instance per row, because my ORM then translates the foreign
key relations into object relations (pointers). Each instantiated
category will contain a list of Object instances ; each Object instance
will contain a list of the categories it belongs to, and point to its
owner user.

Back to the point : I can't use the temp table method, because temp
tables are too slow. Creating a temp table, filling it, analyzing it and
then dropping it takes about 100 ms. The search query, on average, takes
10 ms.

So I have to move this logic to the application, or to plpgsql, and jump
through hoops and use big IN() clauses ; which has the following
drawbacks :
- slow
- ugly
- very hard for the ORM to auto-generate

***

Feature proposal :

A way to store query results in a named buffer and reuse them in the
next queries. This should be as fast as possible, store results in RAM
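Purely as illustration, invented syntax for such a named result buffer
might look like this (none of these keywords exist ; they only make the
intent concrete) :

    BEGIN;
    -- hypothetical : materialize once, into RAM, with no catalog entries
    DECLARE results BUFFER FOR SELECT * FROM (advanced search query) q;
    SELECT * FROM BUFFER results;
    SELECT * FROM users WHERE id IN (SELECT user_id FROM BUFFER results);
    COMMIT;  -- the buffer simply evaporates : nothing to fsync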