[SQL] performance of functions - or rather lack of it

2001-04-04 Thread Peter Galbavy

We are building a postgresql based backend database for our 'hosting
provisioning' system. In a vain attempt to add some, what I thought, simple
performance tweaks, I thought I would try putting some of the larger and
more straighforward queries into functions. For everything else the same,
the functions are on the whole slower. Should they be ?

The whole thing is being driver through perl DBI. This may be contributory.

Anyhow, the original query:

SELECT COUNT(mb.instance) FROM domain dm, mail ms, mailbox mb WHERE
dm.enabled = true and dm.existent = true and dm.available = true AND
ms.enabled = true and ms.existent = true and ms.available = true AND
mb.enabled = true and mb.existent = true and mb.available = true AND
dm.parent = ms.parent AND
mb.parent = ms.serviceid AND
dm.instance = $q_domain AND
mb.instance = $q_local_part;

where $q_XXX are quoted perl scalars. The function is then:

CREATE FUNCTION mail_is_mailbox(text, text) RETURNS int4 AS '
SELECT COUNT(mb.instance) FROM domain dm, mail ms, mailbox mb WHERE
dm.enabled = true and dm.existent = true and dm.available = true AND
ms.enabled = true and ms.existent = true and ms.available = true AND
mb.enabled = true and mb.existent = true and mb.available = true AND
dm.parent = ms.parent AND
mb.parent = ms.serviceid AND
dm.instance = $2 AND
mb.instance = $1;'
LANGUAGE 'sql';

SELECT mail_is_mailbox($q_local_part, $q_domain);

Running both these 1000 times from a remote (same subnet 100BaseTX) client
with the same query results in time for the function typically 20 - 25% more
than the bare query. 22 vs 16 seconds for example.

I would have thought that not sending the long SQL across the wire 1000
times would have saved some time even without any potential query
optimisations by pre-parsing the SQL ?

rgds,
--
Peter Galbavy
Knowledge Matters Ltd.
http://www.knowledge.com/


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [SQL] performance of functions - or rather lack of it

2001-04-04 Thread Peter Galbavy

BTW The service is 7.0.2 and the client 7.1RC1 and the OSes are
OpenBSD/i386 2.8-stable.

On Wed, Apr 04, 2001 at 11:12:34AM +0100, Peter Galbavy wrote:
> We are building a postgresql based backend database for our 'hosting
> provisioning' system. In a vain attempt to add some, what I thought, simple
> performance tweaks, I thought I would try putting some of the larger and
> more straighforward queries into functions. For everything else the same,
> the functions are on the whole slower. Should they be ?

-- 
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [SQL] performance of functions - or rather lack of it

2001-04-05 Thread Peter Galbavy

On Wed, Apr 04, 2001 at 01:01:15PM -0400, Tom Lane wrote:
> Without knowing which PG version you're using, what plans you're
> getting, or even whether you've VACUUM ANALYZEd lately, it's difficult
> to say more than that.

I followed that up quickly - server 7.0.2, client 7.1RC1. VACUUMed
etc prior to the test. OTOH, since the tests were done multiple
times directly after the other, the underlying infrastructure should
be the same.

BTW I deleted your paragraph above, but I agree about the parameters
and the constant stuff. I will retry with a fixed-value function
just for the completeness of it.

> Unless your TCP connection is running across tin cans and string,
> the transfer time for the query text is negligible ...

Fair point. I am not really in the 100Mb networking work in my heart
... :-)

-- 
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[SQL] how do i insert an empty string ?

2002-10-15 Thread Peter Galbavy

FAQ: A search yielded nothing explicit...

I have an INSERT statement:

INSERT INTO metadata (md5, origin, name, value)
  VALUES ('fd859f263bd0579935f2146a22d24f32', 'EXIF',
'UserComment', '')

but this fails (using Perl DBI, DBD::Pg) because $dbh->quote() returns two
single quotes, which fails because something along the way thinks this is a
single quote. I do NOT want to insert a NULL but an empty string...

(This is either doing a $dbh->do(...) or a prepare ... execute without
$dbh->quote())

Peter


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [SQL] how do i insert an empty string ?

2002-10-15 Thread Peter Galbavy

Sorry: 7.3 beta 2 on OpenBSD 3.2

Peter
- Original Message -
From: "Peter Galbavy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 15, 2002 11:01 AM
Subject: [SQL] how do i insert an empty string ?


> FAQ: A search yielded nothing explicit...
>
> I have an INSERT statement:
>
> INSERT INTO metadata (md5, origin, name, value)
>   VALUES ('fd859f263bd0579935f2146a22d24f32', 'EXIF',
> 'UserComment', '')
>
> but this fails (using Perl DBI, DBD::Pg) because $dbh->quote() returns two
> single quotes, which fails because something along the way thinks this is
a
> single quote. I do NOT want to insert a NULL but an empty string...
>
> (This is either doing a $dbh->do(...) or a prepare ... execute without
> $dbh->quote())
>
> Peter
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] IGNORE ME how do i insert an empty string ?

2002-10-15 Thread Peter Galbavy

Please ignore me for now. The string is NOT empty, but full of NUL
characters. My bad for not using 'less' to view the output...

Peter

- Original Message -----
From: "Peter Galbavy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 15, 2002 11:01 AM
Subject: [SQL] how do i insert an empty string ?


> FAQ: A search yielded nothing explicit...
>
> I have an INSERT statement:
>
> INSERT INTO metadata (md5, origin, name, value)
>   VALUES ('fd859f263bd0579935f2146a22d24f32', 'EXIF',
> 'UserComment', '')
>
> but this fails (using Perl DBI, DBD::Pg) because $dbh->quote() returns two
> single quotes, which fails because something along the way thinks this is
a
> single quote. I do NOT want to insert a NULL but an empty string...
>
> (This is either doing a $dbh->do(...) or a prepare ... execute without
> $dbh->quote())
>
> Peter
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[SQL] 'fake' join and performance ?

2002-10-22 Thread Peter Galbavy
OK, I am now confused; postgresql 7.3beta2 on OpenBSD:


photos=# select * from metadata WHERE name = 'Make' and value = 'Canon'
limit 10;

*bang*, 10 values, sub second response.

photos=# select * from metadata m, images i WHERE m.name = 'Make' and
m.value = 'Canon' limit 10;

*yawn* - see you later...

Now, 'images' is a new and currently empty table that I intend to do a join
on later, but I started building a query to test my join'ing skills and
found this;

Explain'ing for both:

photos=# explain select * from metadata WHERE name = 'Make' and value =
'Canon' limit 10;
 QUERY PLAN


 Limit  (cost=0.00..27711.98 rows=6 width=92)
   ->  Index Scan using metadata_index_2 on metadata  (cost=0.00..31072.94
rows=7 width=92)
 Index Cond: (name = 'Make'::text)
 Filter: (value = 'Canon'::text)
(4 rows)

photos=# explain select * from metadata m, images i WHERE m.name = 'Make'
and m.value = 'Canon' limit 10;
 QUERY PLAN


 Limit  (cost=0.00..27712.04 rows=6 width=816)
   ->  Nested Loop  (cost=0.00..31073.00 rows=7 width=816)
 ->  Index Scan using metadata_index_2 on metadata m
(cost=0.00..31072.94 rows=7 width=92)
   Index Cond: (name = 'Make'::text)
   Filter: (value = 'Canon'::text)
 ->  Seq Scan on images i  (cost=0.00..0.00 rows=1 width=724)
(6 rows)



Er, what's that nested loop. I *know* I have shot myself in the foot
somehow, but my initial reaction was that the optimiser should just make the
'fake' (i.e. unreferenced) reference to another table go away...

peter


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [SQL] 'fake' join and performance ?

2002-10-22 Thread Peter Galbavy
> That would be in violation of the SQL spec.  The query is defined to
> return each join row from the cross product of the FROM tables that
> meets the condition of the WHERE clause.  As you wrote the query, each
> metadata row that meets the WHERE clause will be returned exactly as
> many times as there are rows in the images table.  There is no such
> thing as an "unreferenced" FROM entry as far as SQL is concerned.

Sounds about right. Thanks for the clarification.

Peter



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[SQL] help optimise this ?

2002-11-21 Thread Peter Galbavy
I have a table of image 'instances' where the columns include:

md5 char(32),-- the 'original' image md5 key
file_md5 char(32) primary key,   -- the md5 of each version of an image
image_width int,
image_length int

I want to then find either the largest (max) or smallest (min) version of an
image that falls within some range of sizes:

e.g.

select file_md5 from image_instance
where image_width =
(select min(image_width) from image_instance where md5 =
'546b94e94851a56ee721f3b755f58462')
and image_length =
(select min(image_length) from image_instance where md5 =
'546b94e94851a56ee721f3b755f58462')
and md5 = '546b94e94851a56ee721f3b755f58462'
and image_width between 0 and 160
and image_length between 0 and 160;

Now, having to do three selects on 'md5' to limit the search seems a little
unoptimal to me. Note that the test tables are small and I have no other
indexes apart from the 'primary key' constraint yet - this is not my primary
concern at this point, I would just like cleaner SQL.

All I want back is (for some definition) the 'file_md5' that best matches my
min/max criteria.

I have not - and will leave for now - the case where a cropped image results
in a scale change between width and length such that the min/max test
returns a different set of rows for each dimension. Argh.

And help given is greatly appreciated.

rgds,
--
Peter


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [SQL] help optimise this ?

2002-11-21 Thread Peter Galbavy
Wow. Three people have replied with an effectively identical solution.

Why didn't I think of this ? Answers on a postcard to...

Thanks to all that have replied.

Peter
- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Peter Galbavy" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, November 21, 2002 3:46 PM
Subject: Re: [SQL] help optimise this ?


> "Peter Galbavy" <[EMAIL PROTECTED]> writes:
> > I want to then find either the largest (max) or smallest (min) version
of an
> > image that falls within some range of sizes:
>
> Depends on how you want to define "largest" and "smallest", but if
> "area" is a good enough definition, seems like this would work:
>
> select file_md5 from image_instance
> where md5 = '546b94e94851a56ee721f3b755f58462'
> and image_width between 0 and 160
> and image_length between 0 and 160
> order by image_width * image_length
> limit 1;
>
> Probably an index on md5 would be sufficient to make this go fast ---
> I assume you're not going to be storing a vast number of sizes of
> the same image.
>
> regards, tom lane
>


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [SQL] Need help paging through record sets

2002-12-21 Thread Peter Galbavy
I have an almost identical application, but I am using Apache::ASP instead of PHP. 
Apart from the language differences, I
suspect the ideas are the same.

What I have done is store the *entire* list of results in a session variable with a 
clock-time. When I get a new query
(...?page=2), I check if the query args are the same (I only have two) and to see if 
the results are still timely (I use a 30
second timeout, but whatever is good for you). If they are OK, slice the array/list 
(in perl) else do a new query and use
those results.

I have yet to add sorting, but that becomes another key in the 'valdation' of the 
stored list.

As my entire site is driven through the use of MD5 checksums to identify photographs 
(rather than files names or hierarchies)
all I end up carrying around is (at most) 100 or so 32 character strings in the 
session store. A few thousand results should
be fine for most medium sized servers I guess.

Peter

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, December 20, 2002 8:53 PM
Subject: [SQL] Need help paging through record sets


Hello everyone, and thanks for reading my first newbie post. :-)

I am a neopyhte PHP and postgreSQL user, with a website at www.the-athenaeum.org.  We 
store (among other things) artworks,
which people can view in a list, sorted by artist name, date, medium, etc.

We now have enough works that I need to rewrite the PHP listings script (and its 
embedded SQL) so that users can page through
records.  As an example, if a user is looking at works by date ascending, they may 
want to see 100 records at a time.  Since
we have 600+ records, there would be 7 pages.  They'd start on the first page (of 
course!) and there would be links to pages
2 through 7 as well, just like with results pages of a Google search.  They could, 
from page 1, click any of the other pages
to go immdiately to that set of 100 records for display.

I see this kind of thing all over the place, and in looking it up, I see most 
solutions use "SELECT TOP x", which postgreSQL
doesn't seem to have.  I know how to use LIMIT, but that always starts from the top.  
I could add a piece to the WHERE
clause, say something like "WHERE date > 01-02-1853", but how do I know where the 
cutoffs are several pages along, without
retrieving the whole record set?

I suppose the optimal solution for me would be to sort all of the records, then be 
able to select a range from that sorted
record set.  So, if they click the link to page 3, I'd like to do this (in pseudocode):

1.  SORT records by the date field, descending
2.  Retrieve only records 200-299 from the sorted list

Is there a way to do that?  How is it done elsewhere?

Thanks in advance for your help,
Chris McCormick, webmaster
The Athenaeum - Interactive Humanities Online
www.the-athenaeum.org




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[SQL] weighting (the results of) a query ?

2003-01-05 Thread Peter Galbavy
I have a table with a primary key ('md5') and a bunch of text fields.
There is one row per 'photograph' and the number of rows is about 1100
now but will rise to over 20,000 in a few months - assuming I get time
to import all my stuff.

I would like to offer users on my web site a free text search on these
text fields, but I would like to weight the results base on which field
the text came from.

Let's say those fields are (for simplicity) 'category', 'subcategory',
'caption' and 'keywords'.

I want to do:

SELECT md5, weighting() FROM images WHERE
category ~* 'term' OR subcategory ~* 'term' OR ...

Is there anything I can do - including writing functions - to return a
number that is somehow representative of which WHERE clause matched
'first' and even better the more columns matched ?

I am guessing that like 'C' an 'OR' conditional stops at the first match
and does not process further conditions after a previous one has
matched - that's good enough for me for day one...

It is not critial that I get a value out, the return order of results
could be fine too.

I would like to minimise the number of queries to the DB, but I can fall
back on doing one query per column and combining the results in perl.
This is my approach for an initial implementation later today unless
anyone can suggest otherwise...

Any pointers, tips, code, suggestions greatly appreciated.

Happy New Year all, BTW
--
Peter


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [SQL] weighting (the results of) a query ?

2003-01-07 Thread Peter Galbavy
Thanks to Len Morgan for the hints to get to this:

SELECT *, '4' as result  FROM images
WHERE
iptc_release_date < 'today' AND
iptc_priority BETWEEN 1 AND 5 AND iptc_caption ~* 'term'
UNION
SELECT *, '3' as result FROM images
WHERE
iptc_release_date < 'today' AND
iptc_priority BETWEEN 1 AND 5 AND iptc_keywords ~* 'term'
UNION
SELECT *, '2' as result FROM images
WHERE
iptc_release_date < 'today' AND
iptc_priority BETWEEN 1 AND 5 AND iptc_category ~* 'term'
UNION
SELECT *, '1' as result FROM images
WHERE
iptc_release_date < 'today' AND
iptc_priority BETWEEN 1 AND 5 AND iptc_subcategory ~* 'term'

ORDER BY result desc, iptc_priority asc, shot_time, image_serial asc;

Using a constant and UNION made it work OK. Not sure yet on real world
performance, but that's what tuning is for :)

Hope someone finds this in the archive and finds it useful.

Peter

- Original Message -
From: "Peter Galbavy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 05, 2003 10:41 AM
Subject: [SQL] weighting (the results of) a query ?


> I have a table with a primary key ('md5') and a bunch of text fields.
> There is one row per 'photograph' and the number of rows is about 1100
> now but will rise to over 20,000 in a few months - assuming I get time
> to import all my stuff.
>
> I would like to offer users on my web site a free text search on these
> text fields, but I would like to weight the results base on which field
> the text came from.
>
> Let's say those fields are (for simplicity) 'category', 'subcategory',
> 'caption' and 'keywords'.
>
> I want to do:
>
> SELECT md5, weighting() FROM images WHERE
> category ~* 'term' OR subcategory ~* 'term' OR ...
>
> Is there anything I can do - including writing functions - to return a
> number that is somehow representative of which WHERE clause matched
> 'first' and even better the more columns matched ?
>
> I am guessing that like 'C' an 'OR' conditional stops at the first match
> and does not process further conditions after a previous one has
> matched - that's good enough for me for day one...
>
> It is not critial that I get a value out, the return order of results
> could be fine too.
>
> I would like to minimise the number of queries to the DB, but I can fall
> back on doing one query per column and combining the results in perl.
> This is my approach for an initial implementation later today unless
> anyone can suggest otherwise...
>
> Any pointers, tips, code, suggestions greatly appreciated.
>
> Happy New Year all, BTW
> --
> Peter
>
>
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
>


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [SQL] weighting (the results of) a query ?

2003-01-07 Thread Peter Galbavy
> Most likely you should write UNION ALL, not UNION.  As given, the query
> will go through a pass of attempted duplicate-row-elimination, which is
> almost certainly not what you want.

Not sure - what I want is only one row per real row but ordered as per the
constants. When you say duplicate-row-elimination do you mean including the
'constant' weighting, in which case UNION ALL is probably right. At the
moment, the dataset returned appears correctly de-duplicated.

What I mean is when an image row has both 'eat static' in the caption and as
a keyword, I want it returned only once...

Peter


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] weighting (the results of) a query ?

2003-01-07 Thread Peter Galbavy
> I think your query might fail on that requirement regardless, no?  At
> least I missed how you'd prevent it.

I have had about 10 minutes to play with this - my day jobrequires I do real
testing when I get home later tonight :)

Thanks, and I will keep an eye out for this and figure a way around it.

Petr


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[SQL] techniques for manual ordering of data ?

2004-02-07 Thread Peter Galbavy
I tried googling, but the keywords here are so general, that I couldn't find
much useful.

I am looking for reommendations on how to build an ordered list of media (in
this case photographs) in a particular "context". What I have - which I have
somewhat simplified maually - is:

CREATE TABLE contexts (
photo_idint8 REFERENCES photos ...,
catalogue_idint8 REFERENCES catalogues ...,
collection_id   int8 REFERENCES collections ...,

orderingint
);

Here, a photograph may appear in multiple catalogue/collection places. What
I want to do is to allow the admin to reorder the display by using + / -
style buttons. My initial thought was to use an arbitrary "ordering" column,
like old BASIC line numbers, which would be initially allocated in 10s or
100s, and then updating the value via the UI and occassionally applying a
"renum" style function when the list gets uneven.

To avoid performance hits through too many updates, I am trying to avoid the
admin UI from UPDATEing all the rows in a specific catalogue/collection
every time a button is pressed - both the SQL and the subsequent fragmented
table. Hmm.

Anyone got any "prior art" or suggestions on how they acheive this ?

Note that I am NOT including ordering based on an photograph specific
metadata like time or location. That's another story and another interface.

rgds,
--
Peter


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html