On 16 Jan 2011, at 5:38, Randall Smith wrote:

>> But you already do have an index on that id-field, so what's the problem 
>> with using a unique constraint? Its unique index could just replace the 
>> existing one.
> 
> I'm OK with indexing the 8 byte integer, but I don't want to index the
> text field that could possibly be up to 1k.  In some simple tests,
> indexing both fields, the index was larger than the table.


Ah, it was about a different field - that makes much more sense :)

You probably don't need the full width of that field to be fairly certain that 
it's going to be unique from that point on anyway. The chance that a unique 
constraint kicks in when it shouldn't gets lower the wider your index on that 
field gets, but it's probably sufficiently low with relatively few characters, 
while it's probably not that big a deal if it kicks in early in a few cases.

For example, if you only index the first 100 bytes of each of these fields, you 
are very unlikely to run into a situation where your constraint claims the 
field is not unique while it actually is.

That shrinks your index down by a factor of 10, and IMO 100 bytes is still on 
the wide side for a unique text column. I don't think many people successfully 
enter 100 characters in a row without ever making an error.

If instead the field contains computer-generated data, then there surely is an 
easier way to detect that this text is not going to be unique. You could store 
the values of the parameters used to generate that data for example, and put a 
unique constraint on those.


I'd also wonder what the value is of two distinct records with the same 1000 
characters of text up to the last one. Are those distinct on purpose or is it 
more likely that someone made an error entering that field and accidentally 
managed to create a unique entry while in fact the unique constraint was 
supposed to kick in?

Besides that, you say there will be billions of records, but that text field 
needs to be unique? I really do hope the contents of that field will be 
computer-generated, or your users will get quite desperate coming up with new 
values to use after a while, especially if the contents need to make any sense.

But of course much depends on the nature of the data in your field. You haven't 
said anything about that.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d32bdf511764853411139!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to