[SQL] Comparing a string against an XPath result set

2009-07-16 Thread Tim Landscheidt
Hi,

I'm trying to check whether a given string is part of an
XPath result set. I have encountered some performance prob-
lems in the process, but before I track those down further,
I'd like to see if I've chosen the right path to begin with:

| SELECT * FROM $TABLE
|   WHERE $COLUMN = ANY((xpath($EXPR, xmlparse(DOCUMENT $DOC)))::TEXT[]);

In this case, xpath() returns an array of XML fragments made
of a "pure" text child each. I cannot compare XML fragments
with anything, so the conversion to TEXT[] seems to be the
only way I could check whether $COLUMN is a part of those.

  Is there any other way I could tackle this?

Tim


-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] De-duplicating rows

2009-07-16 Thread Christophe
The Subject: is somewhat imprecise, but here's what I'm trying to do.   
For some reason, my brain is locking up over it.


I'm moving a 7.2 (yes) database to 8.4.  In the table in question, the  
structure is along the lines of:


serial_number   SERIAL, PRIMARY KEY
email   TEXT
create_date TIMESTAMP
attr1   type
attr2   type
attr3   type
...

(The point of the "attr" fields is that there are many more columns  
for each row.)


The new structure removes the "serial_number" field, and uses "email"  
as the primary key, but is otherwise unchanged:


email   TEXT, PRIMARY KEY
create_date TIMESTAMP
attr1   type
attr2   type
attr3   type
...

Now, since this database has been production since 7.2 days, cruft has  
crept in: in particular, there are duplicate email addresses, some  
with mismatched attributes.  The policy decision by the client is that  
the correct row is the one with the earliest timestamp.  (The  
timestamps are widely distributed; it's not the case that there is a  
single timestamp above which all the duplicates live.)  Thus, ideally,  
I want to select exactly one row per "email", picking the row with the  
earliest timestamp in the case that there is more than one row with  
that email.


Any suggestions on how to write such a SELECT?  Of course, I could do  
this with an application against the db, but a single SELECT would be  
great if possible.


TIA!

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql