Thank you for the answer Grzegorz.

> if you have a primary key on the table, and you should, you might get better 
> performance using LEFT JOIN.

Well, as far as I know, the result of such a JOIN is a Cartesian product, which 
is not exactly what I need. I need the same structure as the table 'data' has. 
Or am I missing a trick for how a LEFT OUTER JOIN can be used instead of 
EXCEPT? :)

> EXCEPT will compare all columns, which might not be that fast, especially if 
> those are text. (hence why I always tell others to use int as key in a table, 
> but that's a different story). 

There is no good int to start using as a key in my 'data'. I would have to 
create one (out of some hash function, digesting the whole row, probably), but 
there is a strong possibility of adding columns to 'data' later on, which 
would require recalculating such a 'hash' column over and over again for 
millions of rows. While not impossible, it is certainly something I would like 
to avoid. 
Moreover, if one creates and maintains such a hash column by hand, it is very 
likely that one will forget something, or even mess it up completely. However, 
if there is a tool (something like an index on all columns) available in the 
database itself, I would be eager to use it.
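What I have in mind is roughly the following (columns a and b are placeholders 
again; I have not measured how such a multi-column unique index behaves on 
wide text columns):

===
-- Hypothetical: let the database enforce uniqueness across all columns,
-- instead of maintaining a hand-made hash column.
-- Note: rows containing NULLs are not caught by a unique index.
CREATE UNIQUE INDEX data_all_cols_uniq ON data (a, b);
===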

Thanks again,
Winco


> -- 
> GJ



    * From: "Kincel, Martin" <mkin...@soitron.com>
    * To: <pgsql-general@postgresql.org>
    * Subject: optimalisation with EXCEPT clause
    * Date: Tue, 13 Apr 2010 17:01:18 +0200
    * Message-id: <a5ed43533e983e4685c9e6156be8874f0840d...@kenya.tronet.as>

Hello,


every day I collect a couple of thousand rows of unique data from our
systems and INSERT them into a table. Since I need no duplicate
data, I use an EXCEPT clause when INSERTing, like this:

===
INSERT INTO data
SELECT * FROM new_collected_data()
EXCEPT
SELECT * FROM data;
===

It works exactly as I need, but there is a small issue I am thinking
about improving. Yes, it's performance, what else? :)

Since I am INSERTing new_collected_data() in 10000-row chunks into a
table already containing millions of rows, it takes a few minutes
(literally), which is something I understand and accept. 
However, I am wondering whether there is any way to improve the
performance, either via indices, or by ALTERing the TABLE with a UNIQUE
constraint, or something else I might have completely forgotten about.

Does anyone have a recommended approach for speeding up queries
containing an EXCEPT clause? 


Thanks a lot,
Winco


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
