Thank you for the answer, Grzegorz.

> if you have a primary key on the table, and you should, you might get better
> performance using LEFT JOIN.

Well, as far as I know, the result of such a JOIN is a Cartesian product, which is not exactly what I need. I need the same structure that the table 'data' has. Or am I missing a trick for how a LEFT OUTER JOIN can be used instead of EXCEPT? :)
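If the trick is an anti-join, is this roughly the shape you have in mind? This is only a rough sketch: col_a and col_b are placeholders for the real columns of 'data', and I am assuming they are NOT NULL, so that the IS NULL test really means "no matching row":

===
INSERT INTO data
SELECT n.*
FROM new_collected_data() AS n
LEFT JOIN data AS d
       ON d.col_a = n.col_a
      AND d.col_b = n.col_b
WHERE d.col_a IS NULL;  -- keep only rows with no match already in 'data'
===

One difference I notice is that EXCEPT also removes duplicates within new_collected_data() itself, which this variant would not do, though my collected rows are unique anyway.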
> EXCEPT will compare all columns, which might not be that fast, especially if
> those are text. (hence why I always tell others to use int as key in a table,
> but that's a different story).

There is no good int to use as a key in my 'data'. I would have to create one (probably out of some hash function digesting the whole row), but there is a strong possibility of adding columns to 'data' later on, which would require recalculating such a 'hash' column over and over again for millions of rows. While not impossible, that is certainly something I would like to avoid. Moreover, if one creates and maintains such a hash column by hand, it is very likely that something will be forgotten, or messed up completely. However, if there is a tool (something like an index on all columns) available in the database itself, I would be eager to use it (a rough sketch of what I mean is at the end of this message).

Thanks again,
Winco

> --
> GJ

* From: "Kincel, Martin" <mkin...@soitron.com>
* To: <pgsql-general@postgresql.org>
* Subject: optimalisation with EXCEPT clause
* Date: Tue, 13 Apr 2010 17:01:18 +0200
* Message-id: <a5ed43533e983e4685c9e6156be8874f0840d...@kenya.tronet.as>

Hello,

every day I collect a couple of thousand rows of unique data from our systems and INSERT them into the table. Since I need no duplicate data, I use an EXCEPT clause when INSERTing, like this:

===
INSERT INTO data
SELECT * FROM new_collected_data()
EXCEPT
SELECT * FROM data;
===

It works exactly as I need, but there is a small issue I am thinking about how to improve. Yes, it's performance, what else? :)

Since I am INSERTing new_collected_data() in 10000-row chunks into a table already containing millions of rows, it takes a few minutes (literally), which is something I understand and accept. However, I am wondering whether there is any way to improve the performance, either via indices, or by ALTERing the TABLE with a UNIQUE constraint, or something else I might have completely forgotten about. Does anyone have a recommended approach for speeding up queries containing an EXCEPT clause?

Thanks a lot,
Winco
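P.S. To make the "index on all columns" idea a bit more concrete, this is roughly what I am picturing. Again only a sketch: col_a and col_b stand in for the real columns of 'data', assumed NOT NULL:

===
-- let the database itself reject duplicate rows
CREATE UNIQUE INDEX data_all_columns_uniq
    ON data (col_a, col_b);

-- then the load only has to skip rows that already exist
INSERT INTO data
SELECT n.*
FROM new_collected_data() AS n
WHERE NOT EXISTS (
    SELECT 1
    FROM data AS d
    WHERE d.col_a = n.col_a
      AND d.col_b = n.col_b
);
===

I am not sure how practical such an index is when the columns are long text values, and the INSERT would fail outright if a duplicate ever slipped past the NOT EXISTS check, so I would be glad to hear of a better built-in option.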