On Sun, 2003-02-16 at 23:34, Tom Lane wrote: > Ryan Bradetich <[EMAIL PROTECTED]> writes: > > Although the table schema is immaterial, I will provide it so we have a > > common framework for this discussion: > > > host_id integer (not null) > > timestamp datetime (not null) > > category text (not null) [<= 5 chars] > > anomaly text (not null) [<= 1024 chars] > > > This table is used to store archived data, so each row in the table must > > be unique. Currently I am using a primary key across each column to > > enforce this uniqueness. > > It's not real clear to me why you bother enforcing a constraint that the > complete row be unique. Wouldn't a useful constraint be that the first > three columns be unique? Even if that's not correct, what's wrong with > tolerating a few duplicates? You can't tell me it's to save on storage > ;-)
The table holds system policy compliance data. The catagory is basically the policy, and the anomaly is the detailed text explaining why the system is out of compliance. So the anomaly data is important (and often the reason why the key is unique). The reason we are archiving the data is to generate reports and graphs showing policy compliance over time. Duplicated rows will artifically inflate the numbers in the reports and graphs. The other option we had was to perform a DISTINCT select at report / graph time, we chose no to go this route bacause of the sort added to the query. (Also it just seemed tidier to only store good data :)) The disk storage is a minor concern :), but I was actually looking at it as a possible performance enhancement. I am curious how it affects the shared buffer cache, and also there should be less average pages to read since the index size was smaller. Does this make sense? Or am I out in left field again? :) > > I am not sure why all the data is duplicated in the index ... but i bet > > it has to do with performance since it would save a lookup in the main > > table. > > An index that can't prevent looking into the main table wouldn't be > worth anything AFAICS ... Ok, scratch that idea then :) I will continue looking at other ideas like the MD5 data hashing etc. Thanks for your input Tom! - Ryan regards, tom lane -- Ryan Bradetich <[EMAIL PROTECTED]> ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])