[SQL] performance of functions - or rather lack of it
We are building a PostgreSQL-based backend database for our 'hosting provisioning' system. In a vain attempt to add some (what I thought were) simple performance tweaks, I tried putting some of the larger and more straightforward queries into functions. With everything else the same, the functions are on the whole slower. Should they be? The whole thing is being driven through Perl DBI, which may be a contributing factor.

Anyhow, the original query:

    SELECT COUNT(mb.instance)
    FROM domain dm, mail ms, mailbox mb
    WHERE dm.enabled = true AND dm.existent = true AND dm.available = true
      AND ms.enabled = true AND ms.existent = true AND ms.available = true
      AND mb.enabled = true AND mb.existent = true AND mb.available = true
      AND dm.parent = ms.parent
      AND mb.parent = ms.serviceid
      AND dm.instance = $q_domain
      AND mb.instance = $q_local_part;

where $q_XXX are quoted Perl scalars. The function is then:

    CREATE FUNCTION mail_is_mailbox(text, text) RETURNS int4 AS '
        SELECT COUNT(mb.instance)
        FROM domain dm, mail ms, mailbox mb
        WHERE dm.enabled = true AND dm.existent = true AND dm.available = true
          AND ms.enabled = true AND ms.existent = true AND ms.available = true
          AND mb.enabled = true AND mb.existent = true AND mb.available = true
          AND dm.parent = ms.parent
          AND mb.parent = ms.serviceid
          AND dm.instance = $2
          AND mb.instance = $1;
    ' LANGUAGE 'sql';

    SELECT mail_is_mailbox($q_local_part, $q_domain);

Running both of these 1000 times from a remote client (same subnet, 100BaseTX) with the same query results in the function typically taking 20-25% longer than the bare query: 22 vs 16 seconds, for example. I would have thought that not sending the long SQL across the wire 1000 times would have saved some time, even without any potential query optimisations from pre-parsing the SQL?

rgds,
--
Peter Galbavy
Knowledge Matters Ltd.
http://www.knowledge.com/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
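The round trip being measured - parse the SQL once, then send only the two parameter values per call - can be sketched with Python's DB-API against an in-memory SQLite stand-in. The mini-schema and data here are invented for illustration, not the original tables:

```python
import sqlite3

# Hypothetical, simplified schema: a mailbox is "live" when its domain,
# mail service, and mailbox rows are all enabled and correctly linked.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE domain  (parent INT, instance TEXT, enabled INT);
    CREATE TABLE mail    (parent INT, serviceid INT, enabled INT);
    CREATE TABLE mailbox (parent INT, instance TEXT, enabled INT);
    INSERT INTO domain  VALUES (1, 'example.com', 1);
    INSERT INTO mail    VALUES (1, 10, 1);
    INSERT INTO mailbox VALUES (10, 'peter', 1);
""")

IS_MAILBOX = """
    SELECT COUNT(mb.instance)
      FROM domain dm, mail ms, mailbox mb
     WHERE dm.enabled = 1 AND ms.enabled = 1 AND mb.enabled = 1
       AND dm.parent = ms.parent
       AND mb.parent = ms.serviceid
       AND dm.instance = ?
       AND mb.instance = ?
"""

def mail_is_mailbox(domain, local_part):
    # The SQL text stays constant across calls; only the two bound
    # parameter values change, as with the SQL function in the post.
    return con.execute(IS_MAILBOX, (domain, local_part)).fetchone()[0]

print(mail_is_mailbox("example.com", "peter"))   # 1
print(mail_is_mailbox("example.com", "nobody"))  # 0
```

In PostgreSQL 7.x the SQL-function route was the nearest server-side equivalent; later server versions added PREPARE/EXECUTE for exactly this bind-per-call pattern.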
Re: [SQL] performance of functions - or rather lack of it
BTW, the server is 7.0.2, the client 7.1RC1, and the OSes are OpenBSD/i386 2.8-stable.

On Wed, Apr 04, 2001 at 11:12:34AM +0100, Peter Galbavy wrote:
> We are building a postgresql based backend database for our 'hosting
> provisioning' system. In a vain attempt to add some, what I thought, simple
> performance tweaks, I thought I would try putting some of the larger and
> more straightforward queries into functions. For everything else the same,
> the functions are on the whole slower. Should they be ?

--
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/
Re: [SQL] performance of functions - or rather lack of it
On Wed, Apr 04, 2001 at 01:01:15PM -0400, Tom Lane wrote:
> Without knowing which PG version you're using, what plans you're
> getting, or even whether you've VACUUM ANALYZEd lately, it's difficult
> to say more than that.

I followed that up quickly: server 7.0.2, client 7.1RC1, VACUUMed etc. prior to the test. OTOH, since the tests were run multiple times directly after one another, the underlying infrastructure should be the same.

BTW, I deleted your paragraph above, but I agree about the parameters and the constant stuff. I will retry with a fixed-value function just for completeness.

> Unless your TCP connection is running across tin cans and string,
> the transfer time for the query text is negligible ...

Fair point. I am not really in the 100Mb networking world in my heart ... :-)

--
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/
[SQL] how do i insert an empty string ?
FAQ: A search yielded nothing explicit...

I have an INSERT statement:

    INSERT INTO metadata (md5, origin, name, value)
    VALUES ('fd859f263bd0579935f2146a22d24f32', 'EXIF', 'UserComment', '');

but this fails (using Perl DBI, DBD::Pg) because $dbh->quote() returns two single quotes, which fails because something along the way thinks this is a single quote. I do NOT want to insert a NULL, but an empty string...

(This happens whether I use $dbh->do(...) or a prepare ... execute without $dbh->quote().)

Peter
Re: [SQL] how do i insert an empty string ?
Sorry: 7.3 beta 2 on OpenBSD 3.2.

Peter

- Original Message -
From: "Peter Galbavy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 15, 2002 11:01 AM
Subject: [SQL] how do i insert an empty string ?
Re: [SQL] IGNORE ME how do i insert an empty string ?
Please ignore me for now. The string is NOT empty, but full of NUL characters. My bad for not using 'less' to view the output...

Peter

- Original Message -
From: "Peter Galbavy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 15, 2002 11:01 AM
Subject: [SQL] how do i insert an empty string ?
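For the record, PostgreSQL text fields cannot contain NUL bytes, so a value like this EXIF UserComment needs sanitising client-side before the INSERT. A minimal sketch (the helper name is mine, not from the thread):

```python
def sanitize(value: str) -> str:
    """Strip NUL characters, which PostgreSQL text fields reject;
    an all-NUL value then becomes a genuine empty string."""
    return value.replace("\x00", "")

# What the EXIF UserComment actually held: padding NULs, not text.
comment = "\x00" * 8
print(repr(sanitize(comment)))  # ''
```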
[SQL] 'fake' join and performance ?
OK, I am now confused; PostgreSQL 7.3beta2 on OpenBSD:

    photos=# select * from metadata WHERE name = 'Make' and value = 'Canon' limit 10;

*bang*, 10 values, sub-second response.

    photos=# select * from metadata m, images i WHERE m.name = 'Make' and m.value = 'Canon' limit 10;

*yawn* - see you later...

Now, 'images' is a new and currently empty table that I intend to do a join on later, but I started building a query to test my joining skills and found this. EXPLAINing both:

    photos=# explain select * from metadata WHERE name = 'Make' and value = 'Canon' limit 10;
                                      QUERY PLAN
    Limit  (cost=0.00..27711.98 rows=6 width=92)
      ->  Index Scan using metadata_index_2 on metadata  (cost=0.00..31072.94 rows=7 width=92)
            Index Cond: (name = 'Make'::text)
            Filter: (value = 'Canon'::text)
    (4 rows)

    photos=# explain select * from metadata m, images i WHERE m.name = 'Make' and m.value = 'Canon' limit 10;
                                      QUERY PLAN
    Limit  (cost=0.00..27712.04 rows=6 width=816)
      ->  Nested Loop  (cost=0.00..31073.00 rows=7 width=816)
            ->  Index Scan using metadata_index_2 on metadata m  (cost=0.00..31072.94 rows=7 width=92)
                  Index Cond: (name = 'Make'::text)
                  Filter: (value = 'Canon'::text)
            ->  Seq Scan on images i  (cost=0.00..0.00 rows=1 width=724)
    (6 rows)

Er, what's that nested loop? I *know* I have shot myself in the foot somehow, but my initial reaction was that the optimiser should just make the 'fake' (i.e. unreferenced) reference to another table go away...

peter
Re: [SQL] 'fake' join and performance ?
> That would be in violation of the SQL spec. The query is defined to
> return each join row from the cross product of the FROM tables that
> meets the condition of the WHERE clause. As you wrote the query, each
> metadata row that meets the WHERE clause will be returned exactly as
> many times as there are rows in the images table. There is no such
> thing as an "unreferenced" FROM entry as far as SQL is concerned.

Sounds about right. Thanks for the clarification.

Peter
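The cross-product rule described above is easy to see with a toy example; this sketch uses SQLite purely as a stand-in, with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE metadata (name TEXT, value TEXT);
    CREATE TABLE images   (md5 TEXT);
    INSERT INTO metadata VALUES ('Make', 'Canon'), ('Make', 'Nikon');
""")

q = "SELECT * FROM metadata m, images i WHERE m.name='Make' AND m.value='Canon'"

# images is empty: the cross product is empty, so no rows come back at all.
print(len(con.execute(q).fetchall()))   # 0

# Three rows in images: the one matching metadata row comes back three times.
con.executemany("INSERT INTO images VALUES (?)", [("a",), ("b",), ("c",)])
print(len(con.execute(q).fetchall()))   # 3
```

The empty-table case in the original post is the degenerate version of this: the nested loop produces zero join rows, but the planner still has to build the plan that way.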
[SQL] help optimise this ?
I have a table of image 'instances' where the columns include:

    md5 char(32),                  -- the 'original' image md5 key
    file_md5 char(32) PRIMARY KEY, -- the md5 of each version of an image
    image_width int,
    image_length int

I want to find either the largest (max) or smallest (min) version of an image that falls within some range of sizes, e.g.:

    SELECT file_md5 FROM image_instance
    WHERE image_width = (SELECT min(image_width) FROM image_instance
                         WHERE md5 = '546b94e94851a56ee721f3b755f58462')
      AND image_length = (SELECT min(image_length) FROM image_instance
                          WHERE md5 = '546b94e94851a56ee721f3b755f58462')
      AND md5 = '546b94e94851a56ee721f3b755f58462'
      AND image_width BETWEEN 0 AND 160
      AND image_length BETWEEN 0 AND 160;

Now, having to do three selects on 'md5' to limit the search seems a little suboptimal to me. Note that the test tables are small and I have no other indexes apart from the 'primary key' constraint yet - that is not my primary concern at this point; I would just like cleaner SQL. All I want back is (for some definition) the 'file_md5' that best matches my min/max criteria.

I have not handled - and will leave for now - the case where a cropped image results in a scale change between width and length, such that the min/max test returns a different set of rows for each dimension. Argh.

Any help given is greatly appreciated.

rgds,
--
Peter
Re: [SQL] help optimise this ?
Wow. Three people have replied with an effectively identical solution. Why didn't I think of this? Answers on a postcard to... Thanks to all who replied.

Peter

- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Peter Galbavy" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, November 21, 2002 3:46 PM
Subject: Re: [SQL] help optimise this ?

> "Peter Galbavy" <[EMAIL PROTECTED]> writes:
> > I want to then find either the largest (max) or smallest (min) version
> > of an image that falls within some range of sizes:
>
> Depends on how you want to define "largest" and "smallest", but if
> "area" is a good enough definition, seems like this would work:
>
>     select file_md5 from image_instance
>     where md5 = '546b94e94851a56ee721f3b755f58462'
>     and image_width between 0 and 160
>     and image_length between 0 and 160
>     order by image_width * image_length
>     limit 1;
>
> Probably an index on md5 would be sufficient to make this go fast ---
> I assume you're not going to be storing a vast number of sizes of
> the same image.
>
> regards, tom lane
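The ORDER BY ... LIMIT 1 approach can be checked against a toy data set (SQLite as a stand-in, invented rows; "smallest" here means smallest area, as in the quoted reply):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE image_instance (
        md5 TEXT, file_md5 TEXT PRIMARY KEY,
        image_width INT, image_length INT);
    INSERT INTO image_instance VALUES
        ('orig', 'thumb', 120, 80),    -- area  9600
        ('orig', 'small', 160, 107),   -- area 17120
        ('orig', 'web',   640, 480);   -- outside the 0..160 box
""")

# Smallest version (by area) of image 'orig' within the 0..160 bounding box.
row = con.execute("""
    SELECT file_md5 FROM image_instance
     WHERE md5 = 'orig'
       AND image_width  BETWEEN 0 AND 160
       AND image_length BETWEEN 0 AND 160
     ORDER BY image_width * image_length
     LIMIT 1
""").fetchone()
print(row[0])   # thumb
```

Flipping to `ORDER BY image_width * image_length DESC` gives the largest version within the box instead.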
Re: [SQL] Need help paging through record sets
I have an almost identical application, but I am using Apache::ASP instead of PHP. Apart from the language differences, I suspect the ideas are the same.

What I have done is store the *entire* list of results in a session variable with a clock-time. When I get a new query (...?page=2), I check whether the query args are the same (I only have two) and whether the results are still timely (I use a 30-second timeout, but use whatever is good for you). If they are OK, slice the array/list (in Perl); otherwise do a new query and use those results. I have yet to add sorting, but that becomes another key in the 'validation' of the stored list.

As my entire site is driven through the use of MD5 checksums to identify photographs (rather than file names or hierarchies), all I end up carrying around is (at most) 100 or so 32-character strings in the session store. A few thousand results should be fine for most medium-sized servers, I guess.

Peter

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 20, 2002 8:53 PM
Subject: [SQL] Need help paging through record sets

Hello everyone, and thanks for reading my first newbie post. :-)

I am a neophyte PHP and PostgreSQL user, with a website at www.the-athenaeum.org. We store (among other things) artworks, which people can view in a list, sorted by artist name, date, medium, etc. We now have enough works that I need to rewrite the PHP listings script (and its embedded SQL) so that users can page through records.

As an example, if a user is looking at works by date ascending, they may want to see 100 records at a time. Since we have 600+ records, there would be 7 pages. They'd start on the first page (of course!) and there would be links to pages 2 through 7 as well, just like the results pages of a Google search. They could, from page 1, click any of the other pages to go immediately to that set of 100 records for display.

I see this kind of thing all over the place, and in looking it up, I see most solutions use "SELECT TOP x", which PostgreSQL doesn't seem to have. I know how to use LIMIT, but that always starts from the top. I could add a piece to the WHERE clause, say something like "WHERE date > 01-02-1853", but how do I know where the cutoffs are several pages along, without retrieving the whole record set?

I suppose the optimal solution for me would be to sort all of the records, then be able to select a range from that sorted record set. So, if they click the link to page 3, I'd like to do this (in pseudocode):

    1. SORT records by the date field, descending
    2. Retrieve only records 200-299 from the sorted list

Is there a way to do that? How is it done elsewhere?

Thanks in advance for your help,

Chris McCormick, webmaster
The Athenaeum - Interactive Humanities Online
www.the-athenaeum.org
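The pseudocode in the question maps directly onto LIMIT/OFFSET, which PostgreSQL does have (it is the standard substitute for "SELECT TOP x"). A sketch with invented data, SQLite used as a stand-in:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artworks (id INTEGER PRIMARY KEY, date TEXT)")
# 693 invented rows, roughly matching the "600+ records" in the question.
con.executemany("INSERT INTO artworks (date) VALUES (?)",
                [(f"18{i:02d}-01-01",) for i in range(99)] * 7)

PAGE_SIZE = 100

def fetch_page(page):  # page is 1-based, as in the page links
    return con.execute(
        "SELECT id FROM artworks ORDER BY date DESC, id LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE)).fetchall()

print(len(fetch_page(1)))   # 100
print(len(fetch_page(7)))   # 93  (the leftover rows on the last page)
```

One caveat worth knowing: the server still has to sort and skip the OFFSET rows, so very deep pages get slower; the session-caching approach in the reply above trades memory for avoiding that re-sort on every click.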
[SQL] weighting (the results of) a query ?
I have a table with a primary key ('md5') and a bunch of text fields. There is one row per 'photograph', and the number of rows is about 1100 now but will rise to over 20,000 in a few months - assuming I get time to import all my stuff.

I would like to offer users on my web site a free-text search on these text fields, but I would like to weight the results based on which field the text came from. Let's say those fields are (for simplicity) 'category', 'subcategory', 'caption' and 'keywords'. I want to do:

    SELECT md5, weighting() FROM images WHERE
        category ~* 'term' OR subcategory ~* 'term' OR ...

Is there anything I can do - including writing functions - to return a number that is somehow representative of which WHERE clause matched 'first', and even better, weights higher the more columns matched?

I am guessing that, like C, an 'OR' conditional stops at the first match and does not process further conditions after a previous one has matched - that's good enough for me for day one...

It is not critical that I get a value out; the return order of results could be fine too. I would like to minimise the number of queries to the DB, but I can fall back on doing one query per column and combining the results in Perl. This is my approach for an initial implementation later today unless anyone can suggest otherwise...

Any pointers, tips, code, suggestions greatly appreciated.

Happy New Year all, BTW
--
Peter
Re: [SQL] weighting (the results of) a query ?
Thanks to Len Morgan for the hints to get to this:

    SELECT *, '4' AS result FROM images
        WHERE iptc_release_date < 'today'
          AND iptc_priority BETWEEN 1 AND 5
          AND iptc_caption ~* 'term'
    UNION
    SELECT *, '3' AS result FROM images
        WHERE iptc_release_date < 'today'
          AND iptc_priority BETWEEN 1 AND 5
          AND iptc_keywords ~* 'term'
    UNION
    SELECT *, '2' AS result FROM images
        WHERE iptc_release_date < 'today'
          AND iptc_priority BETWEEN 1 AND 5
          AND iptc_category ~* 'term'
    UNION
    SELECT *, '1' AS result FROM images
        WHERE iptc_release_date < 'today'
          AND iptc_priority BETWEEN 1 AND 5
          AND iptc_subcategory ~* 'term'
    ORDER BY result DESC, iptc_priority ASC, shot_time, image_serial ASC;

Using a constant and UNION made it work OK. Not sure yet about real-world performance, but that's what tuning is for :)

Hope someone finds this in the archive and finds it useful.

Peter

- Original Message -
From: "Peter Galbavy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 05, 2003 10:41 AM
Subject: [SQL] weighting (the results of) a query ?
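A miniature, runnable version of the same weighting idea (SQLite as a stand-in, LIKE instead of ~*, invented columns and data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE images (md5 TEXT, caption TEXT, keywords TEXT);
    INSERT INTO images VALUES
        ('aaa', 'sunset over lake', ''),
        ('bbb', '', 'sunset beach'),
        ('ccc', 'city at night', '');
""")

# Caption hits carry the higher constant, so they sort above keyword hits.
rows = con.execute("""
    SELECT md5, 2 AS result FROM images WHERE caption  LIKE '%sunset%'
    UNION ALL
    SELECT md5, 1 AS result FROM images WHERE keywords LIKE '%sunset%'
    ORDER BY result DESC
""").fetchall()
print(rows)   # [('aaa', 2), ('bbb', 1)]
```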
Re: [SQL] weighting (the results of) a query ?
> Most likely you should write UNION ALL, not UNION. As given, the query
> will go through a pass of attempted duplicate-row-elimination, which is
> almost certainly not what you want.

Not sure - what I want is only one row per real row, but ordered as per the constants. When you say duplicate-row-elimination, do you mean including the 'constant' weighting? In that case UNION ALL is probably right. At the moment, the data set returned appears correctly de-duplicated.

What I mean is: when an image row has 'eat static' both in the caption and as a keyword, I want it returned only once...

Peter
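The subtlety is that UNION's duplicate elimination compares whole rows, weight column included, so an image matching in two fields still comes back twice; one way to get a single row carrying its best weight is to aggregate over the union. A sketch (SQLite as a stand-in, LIKE instead of ~*, invented data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images (md5 TEXT, caption TEXT, keywords TEXT)")
# One image matches in BOTH the caption and the keywords.
con.execute("INSERT INTO images VALUES ('aaa', 'eat static live', 'eat static')")

# UNION does not merge the two hits: the weights differ, so the rows differ.
union = con.execute("""
    SELECT md5, 2 AS result FROM images WHERE caption  LIKE '%static%'
    UNION
    SELECT md5, 1 AS result FROM images WHERE keywords LIKE '%static%'
""").fetchall()
print(len(union))   # 2 -- the image comes back twice despite UNION

# Aggregating keeps one row per image with its highest weight.
deduped = con.execute("""
    SELECT md5, MAX(result) FROM (
        SELECT md5, 2 AS result FROM images WHERE caption  LIKE '%static%'
        UNION ALL
        SELECT md5, 1 AS result FROM images WHERE keywords LIKE '%static%'
    ) AS hits GROUP BY md5
""").fetchall()
print(deduped)   # [('aaa', 2)]
```

The subquery alias (`AS hits`) is required by PostgreSQL, and harmless in SQLite.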
Re: [SQL] weighting (the results of) a query ?
> I think your query might fail on that requirement regardless, no? At
> least I missed how you'd prevent it.

I have had about 10 minutes to play with this - my day job requires I do real testing when I get home later tonight :)

Thanks, and I will keep an eye out for this and figure a way around it.

Peter
[SQL] techniques for manual ordering of data ?
I tried googling, but the keywords here are so general that I couldn't find much useful.

I am looking for recommendations on how to build an ordered list of media (in this case photographs) in a particular "context". What I have - which I have somewhat simplified manually - is:

    CREATE TABLE contexts (
        photo_id        int8 REFERENCES photos ...,
        catalogue_id    int8 REFERENCES catalogues ...,
        collection_id   int8 REFERENCES collections ...,
        ordering        int
    );

Here, a photograph may appear in multiple catalogue/collection places. What I want to do is allow the admin to reorder the display using + / - style buttons.

My initial thought was to use an arbitrary "ordering" column, like old BASIC line numbers, initially allocated in 10s or 100s, then updated via the UI, with a "renum"-style function applied occasionally when the list gets uneven.

To avoid performance hits from too many updates, I am trying to avoid having the admin UI UPDATE all the rows in a specific catalogue/collection every time a button is pressed - both for the SQL and for the subsequent fragmented table.

Hmm. Anyone got any "prior art" or suggestions on how they achieve this?

Note that I am NOT including ordering based on photograph-specific metadata like time or location. That's another story and another interface.

rgds,
--
Peter
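The BASIC-line-number idea can be sketched outside SQL: allocate ordering values in gaps, compute a midpoint on each move so only the moved row needs an UPDATE, and renumber only when a gap runs out. All names here are illustrative, not from the schema above:

```python
# Hypothetical sketch of gap-numbered manual ordering.
GAP = 100

def initial_ordering(n):
    """Ordering values for n photos, allocated in gaps: 100, 200, 300, ..."""
    return [GAP * (i + 1) for i in range(n)]

def move_between(prev_val, next_val):
    """New ordering value for a photo dropped between two neighbours.
    Returns None when the gap is exhausted and a renumber is needed;
    otherwise only the moved row's ordering changes (one UPDATE)."""
    mid = (prev_val + next_val) // 2
    return mid if prev_val < mid < next_val else None

order = initial_ordering(3)      # [100, 200, 300]
print(move_between(100, 200))    # 150: neighbours untouched
print(move_between(100, 101))    # None: time to run the "renum" pass
```

The occasional "renum" pass is then a single pass over one catalogue/collection, reassigning `initial_ordering(n)` values in display order, rather than an UPDATE storm on every button press.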