Re: [PERFORM] Basic Q on superfluous primary keys
Merlin Moncure wrote:
> In the context of this debate, I see this argument all the time, with
> the implied suffix: 'If only we used integer keys we would not have
> had this problem...'. Either the customer identifies parts with a
> part number or they don't...and if they do identify parts with a
> number and recycle the numbers, you have a problem...period.

On the contrary. You create a new record with the same part number. You mark the old part number "obsolete". Everything else (the part's description, and all the relationships that it's in, such as order history, catalog inclusion, revision history, etc.) is unaffected. New orders are placed against the new part number's DB record; for safety the old part number can have a trigger that prevents new orders from being placed.

Since the part number is NOT the primary key, duplicate part numbers are not a problem. If you had instead used the part number as the primary key, you'd be dead in the water.

You can argue that the customer is making a dumb decision by reusing catalog numbers, and I'd agree. But they do it, and as database designers we have to handle it. In my particular system, we aggregate information from several hundred companies, and this exact scenario happens frequently. Since we're only aggregating information, we have no control over the data that these companies provide. If we'd used catalog numbers for primary keys, we'd have big problems.

Craig

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
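[For readers following along, Craig's scheme can be sketched in DDL. Table, column, and function names here are hypothetical, not from any actual system:]

```sql
-- Parts carry a surrogate primary key; the customer's part number is
-- ordinary data and need not be unique. A NULL obsoletion_date marks
-- the current record for a given part number.
CREATE TABLE part (
    part_id         serial PRIMARY KEY,
    part_number     text NOT NULL,
    description     text,
    obsoletion_date date            -- NULL = still current
);

CREATE TABLE part_order (
    order_id serial PRIMARY KEY,
    part_id  int NOT NULL REFERENCES part(part_id)
);

-- Safety trigger: refuse new orders placed against an obsoleted part.
CREATE FUNCTION reject_obsolete_part() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM part
               WHERE part_id = NEW.part_id
                 AND obsoletion_date IS NOT NULL) THEN
        RAISE EXCEPTION 'part % is obsolete', NEW.part_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER part_order_check BEFORE INSERT ON part_order
    FOR EACH ROW EXECUTE PROCEDURE reject_obsolete_part();
```

Reusing a part number is then just another INSERT into part; all existing order history keeps pointing at the old part_id.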
Re: [PERFORM] Basic Q on superfluous primary keys
On 4/18/07, Dave Dutcher <[EMAIL PROTECTED]> wrote:
> I think a database with all natural keys is unrealistic. For example if
> you have a table that refers to people, are you going to use their name
> as a primary key? Names change all the time due to things like marriage,
> divorce, or trouble with the law. We have tables with 20 million rows
> which reference back to a table of people, and if I used the person's
> name as key, it would be a major pain when somebody's name changes. Even
> if there is referential integrity, one person might be referred to by 25%
> of the 20 million rows, so the update would take quite a long time. Also
> the table will be filled with dead rows and the indexes will likely be
> bloated. If I want to clean that up, it will take a vacuum full or a
> cluster which will lock the whole table and run for hours. If I use a
> surrogate key, I can change their name in one row and be done with it.

That's perfectly reasonable (I mentioned this upthread)...there are a couple of corner cases where RI costs too much. Exchanging a surrogate for a natural is a valid performance consideration. Usually, the performance win is marginal at best (and your example suggests possible normalization issues in the child table); sometimes there is no alternative...updating 5 million rows is obviously nasty.

That said -- if the cost of update was zero, would you still do it that way? I'm trying to separate performance-related issues, which are reasonable and valid depending on the situation, from good design principles.

merlin
Re: [PERFORM] Basic Q on superfluous primary keys
On Tue, 2007-04-17 at 21:06 -0700, Craig A. James wrote:
> Merlin Moncure wrote:
> > In the context of this debate, I see this argument all the time, with
> > the implied suffix: 'If only we used integer keys we would not have
> > had this problem...'. Either the customer identifies parts with a
> > part number or they don't...and if they do identify parts with a
> > number and recycle the numbers, you have a problem...period.
>
> On the contrary. You create a new record with the same part number. You
> mark the old part number "obsolete". Everything else (the part's
> description, and all the relationships that it's in, such as order history,
> catalog inclusion, revision history, etc.) is unaffected. New orders are
> placed against the new part number's DB record; for safety the old part
> number can have a trigger that prevents new orders from being placed.
>
> Since the part number is NOT the primary key, duplicate part numbers are not
> a problem. If you had instead used the part number as the primary key, you'd
> be dead in the water.
>
> You can argue that the customer is making a dumb decision by reusing catalog
> numbers, and I'd agree. But they do it, and as database designers we have to
> handle it. In my particular system, we aggregate information from several
> hundred companies, and this exact scenario happens frequently. Since we're
> only aggregating information, we have no control over the data that these
> companies provide. If we'd used catalog numbers for primary keys, we'd have
> big problems.

Storing data is easy. The difficulty lies in storing data in such a way that your assumptions about the data remain valid and your queries still answer the questions that you think they answer. Because an internal ID field has no meaning outside of the database (some auto-generated keys do have meaning outside the database, but I'm not talking about those), you can't effectively query based on an internal id any more than you can query by the ctid.

So what do you query by then? You query by natural keys anyway. Internal id fields are an implementation detail related to performance (the real topic of this discussion). If you have two parts with the same part id, what does that mean? Sure, you can store the data, but then the queries that assumed that data was unique no longer hold. Sometimes you need two parts with the same part id, but you have to know the meaning in order to query based on that data.

Let me ask these questions:
- Do you think that all of your relations have an internal id?
- Do you think that all the internal ids you use are unique in the relations in which they appear?

If you answer "yes" to either question, consider that every query on that data is also a relation, and so are subselects and intermediate results. Do those all have an id? If not, why not? How do you join a virtual relation to a physical relation if the virtual relation has no internal id? Is the id field still unique in the result of a join or Cartesian product?

Regards, Jeff Davis
Re: [PERFORM] Basic Q on superfluous primary keys
I think a database with all natural keys is unrealistic. For example if you have a table that refers to people, are you going to use their name as a primary key? Names change all the time due to things like marriage, divorce, or trouble with the law. We have tables with 20 million rows which reference back to a table of people, and if I used the person's name as key, it would be a major pain when somebody's name changes. Even if there is referential integrity, one person might be referred to by 25% of the 20 million rows, so the update would take quite a long time. Also the table will be filled with dead rows and the indexes will likely be bloated. If I want to clean that up, it will take a vacuum full or a cluster which will lock the whole table and run for hours. If I use a surrogate key, I can change their name in one row and be done with it.

Just my 2 cents. Dave
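[Dave's update-cost point can be made concrete. With a surrogate key, a name change touches exactly one row; with the name as the key, RI has to rewrite every referencing row. A sketch only; the names are made up:]

```sql
CREATE TABLE person (
    person_id serial PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE activity (             -- imagine 20 million rows here
    activity_id serial PRIMARY KEY,
    person_id   int NOT NULL REFERENCES person(person_id)
);

-- A name change is a single-row update; the 20M-row table is untouched.
-- Had activity referenced person(name) instead, RI would have to touch
-- every matching activity row (and bloat its index) to keep up.
UPDATE person SET name = 'New Name' WHERE person_id = 42;
```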
Re: [PERFORM] Basic Q on superfluous primary keys
Merlin Moncure wrote:
> > Since the part number is NOT the primary key, duplicate part numbers are
> > not a problem. If you had instead used the part number as the primary
> > key, you'd be dead in the water.
>
> You are redefining the primary key to be (part_number, obsoletion_date).
> Now, if you had not anticipated that in the original design (likely
> enough), you do have to refactor queries that join on the table...so what?
> If that's too much work, you can use a view to take care of the problem
> (which may be a good idea anyways). *You have to refactor the system
> anyways because you are now allowing duplicate part numbers where
> previously (from the perspective of the user), they were unique.* The
> hidden advantage of pushing the full key through the database is it tends
> to expose holes in the application/business logic. Chances are some query
> is not properly distinguishing obsoleted parts and now the real problems
> come...surrogate keys do not remove complexity, they simply sweep it
> under the rug.

This really boils down to an object-oriented perspective. I have an object, a customer's catalog entry. It has properties such as catalog number, description, etc, and whether it's obsolete or not. Management of the object (its relation to other objects, its history, etc.) should NOT depend on the object's specific definition. This is true whether the object is represented in Lisp, C++, Perl, or (in this case) an SQL schema. Good object-oriented design abstracts the object and its behavior from management of the object. In C++, Perl, etc., we manage objects via a pointer or object reference. In SQL, we reference objects by an *arbitrary* integer that is effectively a pointer to the object.

What you're suggesting is that I should break the object-oriented encapsulation by pulling out specific fields of the object, exposing those internal object details to the applications, and spreading those details across the whole schema. And I argue that this is wrong, because it breaks encapsulation. By exposing the details of the object, if the details change, *all* of your relationships break, and all of your applications have to change. And I've never seen a system where breaking object-oriented encapsulation was a good long-term solution. Systems change, and object-oriented techniques were invented to help manage change.

This is one of the reasons the Postgres project was started way back when: to bring object-oriented techniques to the relational-database world.

Craig
Re: [PERFORM] Basic Q on superfluous primary keys
On 4/18/07, Craig A. James <[EMAIL PROTECTED]> wrote:
> Merlin Moncure wrote:
> > In the context of this debate, I see this argument all the time, with
> > the implied suffix: 'If only we used integer keys we would not have
> > had this problem...'. Either the customer identifies parts with a
> > part number or they don't...and if they do identify parts with a
> > number and recycle the numbers, you have a problem...period.
>
> On the contrary. You create a new record with the same part number. You
> mark the old part number "obsolete". Everything else (the part's
> description, and all the relationships that it's in, such as order
> history, catalog inclusion, revision history, etc.) is unaffected. New
> orders are placed against the new part number's DB record; for safety the
> old part number can have a trigger that prevents new orders from being
> placed.
>
> Since the part number is NOT the primary key, duplicate part numbers are
> not a problem. If you had instead used the part number as the primary
> key, you'd be dead in the water.

You are redefining the primary key to be (part_number, obsoletion_date). Now, if you had not anticipated that in the original design (likely enough), you do have to refactor queries that join on the table...so what? If that's too much work, you can use a view to take care of the problem (which may be a good idea anyways). *You have to refactor the system anyways because you are now allowing duplicate part numbers where previously (from the perspective of the user), they were unique.* The hidden advantage of pushing the full key through the database is it tends to expose holes in the application/business logic. Chances are some query is not properly distinguishing obsoleted parts and now the real problems come...surrogate keys do not remove complexity, they simply sweep it under the rug.

merlin
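[The view Merlin mentions might look something like the following, assuming a part table with an obsoletion_date column (hypothetical names). Queries that relied on part numbers being unique can be pointed at the view unchanged:]

```sql
-- Present only current parts; obsoleted rows are filtered out, so
-- part_number is once again unique from the application's perspective.
CREATE VIEW current_part AS
    SELECT part_id, part_number, description
    FROM part
    WHERE obsoletion_date IS NULL;
```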
Re: [PERFORM] Basic Q on superfluous primary keys
On Wed, 18 Apr 2007, Richard Huxton wrote:
> And at the end of the day that's the root of the problem. It's easy to be
> lulled into "well it looks like a primary key" rather than being able to
> guarantee it.

In some of these cases it is guaranteed to be a primary key given all available information at the time. The funny thing about unexpected changes to a business model is that you never expect them.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Basic Q on superfluous primary keys
Craig A. James wrote:
> Since we're only aggregating information, we have no control over the
> data that these companies provide.

And at the end of the day that's the root of the problem. It's easy to be lulled into "well it looks like a primary key" rather than being able to guarantee it.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Basic Q on superfluous primary keys
On 4/16/07, Greg Smith <[EMAIL PROTECTED]> wrote:
> I think the point Craig was trying to make is that what you refer to here
> as "extraordinary cases" are, in fact, rather common. I've never seen a
> database built on natural keys that didn't at some point turn ugly when
> some internal or external business need suddenly invalidated the believed
> uniqueness of that key.

I don't think it's so terrible to add a field to a key...I too have worked on an ERP system based on natural keys and was quite amazed at how well organized the database was. When the company decided to re-number all the items in the database, it was a minor pain. Extending a critical key would be a project any way you organize the database IMO. Natural keys are most common in manufacturing and accounting systems because of the COBOL heritage, when natural keys were the only way to realistically do it. Unfortunately SQL really missed the boat on keys...otherwise they would behave more like a composite type.

> The last really bad one I saw was a manufacturing database that used a
> combination of the customer code and the customer's part number as the
> key. Surely if the customer changes their part number, we should switch
> ours to match so the orders are easy to process, right? When this got fun
> was when one large customer who released products on a yearly schedule
> decided to change the bill of material for many of their parts for the
> new year, but re-used the same part number; oh, and they were still
> ordering the old parts as well. Hilarity ensued.

In the context of this debate, I see this argument all the time, with the implied suffix: 'If only we used integer keys we would not have had this problem...'. Either the customer identifies parts with a part number or they don't...and if they do identify parts with a number and recycle the numbers, you have a problem...period. Adding an integer key only moves the confusion to a separate place, unless it is used by the user to identify the part number and then *becomes* the key, or a part of it. If you hide the id from the user, then I claim the data model is pretty much busted.

merlin
Re: [PERFORM] Basic Q on superfluous primary keys
On Mon, 16 Apr 2007, Merlin Moncure wrote:
> extraordinary cases do happen, like a company overhauling its numbering
> systems, but such cases can be dealt with by a number of methods
> including letting RI do its thing.

I think the point Craig was trying to make is that what you refer to here as "extraordinary cases" are, in fact, rather common. I've never seen a database built on natural keys that didn't at some point turn ugly when some internal or external business need suddenly invalidated the believed uniqueness of that key.

The last really bad one I saw was a manufacturing database that used a combination of the customer code and the customer's part number as the key. Surely if the customer changes their part number, we should switch ours to match so the orders are easy to process, right? When this got fun was when one large customer who released products on a yearly schedule decided to change the bill of material for many of their parts for the new year, but re-used the same part number; oh, and they were still ordering the old parts as well. Hilarity ensued.

> it is especially unclear how adding an integer to a table will somehow
> magically solve these problems.

If the key is an integer, it's always possible to figure out a trivial map that renumbers the entire database programmatically in order to merge two sets of data.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
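[The "trivial map" Greg describes usually amounts to shifting one side's ids past the other's range before the merge. A rough sketch, with hypothetical table names and an offset chosen to clear the existing id range:]

```sql
-- Renumber the incoming data set so its surrogate ids cannot collide
-- with ours, then merge. Assumes any foreign keys within the incoming
-- set are declared ON UPDATE CASCADE (or are renumbered the same way).
UPDATE incoming_part SET part_id = part_id + 1000000;
INSERT INTO part SELECT * FROM incoming_part;
```

The equivalent operation on a natural key is what turns into a hand-inspection project, as in the chemical-database story upthread.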
Re: [PERFORM] Basic Q on superfluous primary keys
Craig A. James wrote:
> Merlin Moncure wrote:
> > Using surrogate keys is dangerous and can lead to very bad design
> > habits that are unfortunately so prevalent in the software industry
> > they are virtually taught in schools. ... While there is
> > nothing wrong with them in principle (you are exchanging one key for
> > another as a performance optimization), they make it all too easy to
> > create denormalized designs and tables with no real identifying
> > criteria, etc,...
>
> Wow, that's the opposite of everything I've ever been taught, and all my
> experience in the last few decades.
>
> ...chemistry...two companies using the same "natural"
> keys had as much as 10% differences in their multi-million-compound
> databases. These errors led to six-month to year-long delays, as each
> of the conflicting chemical records had to be examined by hand by a PhD
> chemist to reclassify it.

That sounds almost like a feature, not a bug - giving information about which of the assumptions that went into the "natural key" need to be reconsidered. And I don't see how it would have been improved by adding a surrogate key - except that the data would have been just as messed up, though harder to see where the mess-ups were.

> We've always recommended to our customers that all primary keys be
> completely information free. They should not be based on any
> information or combination of information from the data records. Every
> time the customer has not followed this advice, they've later regretted it.

Hmm... but then do you put a unique index on the otherwise-would-have-been-natural-primary-key columns? If not, you tend to get into the odd situation of multiple rows that only vary in their surrogate key -- and it seems the surrogate key is redundant.

> I'm sure there are situations where a natural key is appropriate, but I
> haven't seen it in my work.

I've seen both - and indeed usually use surrogate keys for convenience; but also find that having to fix incorrect assumptions in natural primary keys tends to raise business issues that are worth addressing anyway.
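[The arrangement asked about above -- a surrogate key for joins plus a unique constraint keeping the natural columns honest -- looks like this (hypothetical names):]

```sql
CREATE TABLE part (
    part_id       serial PRIMARY KEY,      -- surrogate, used for joins
    customer_code text NOT NULL,
    part_number   text NOT NULL,
    UNIQUE (customer_code, part_number)    -- the natural key, still enforced
);
```

Without that UNIQUE constraint, nothing stops two rows that differ only in part_id, which is the redundancy being pointed out.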
Re: [PERFORM] Basic Q on superfluous primary keys
On 4/16/07, Craig A. James <[EMAIL PROTECTED]> wrote:
> Merlin Moncure wrote:
> > Using surrogate keys is dangerous and can lead to very bad design
> > habits that are unfortunately so prevalent in the software industry
> > they are virtually taught in schools. ... While there is
> > nothing wrong with them in principle (you are exchanging one key for
> > another as a performance optimization), they make it all too easy to
> > create denormalized designs and tables with no real identifying
> > criteria, etc,...
>
> I can't recall ever seeing a "natural" key that was immutable. In my
> business (chemistry), we've seen several disastrous situations where
> companies picked keys they thought were natural and immutable, and years
> down the road they discovered (for example) that chemical compounds they
> thought were pure were in fact isotopic mixtures, or simply the wrong
> molecule (as analytical techniques improved). Or during a corporate
> takeover, they discovered that two companies using the same "natural"
> keys had as much as 10% differences in their multi-million-compound
> databases. These errors led to six-month to year-long delays, as each of
> the conflicting chemical records had to be examined by hand by a PhD
> chemist to reclassify it.

while your example might be a good case study in proper classification, it has nothing to do with key selection. it is especially unclear how adding an integer to a table will somehow magically solve these problems. are you claiming that a primary key can't be changed? mutability is strictly a performance argument. since RI handles cascading primary key changes, it's simply a matter of if you are willing to wait for RI to do its work or not (if not, swap in the id key as in my example). the performance argument really only applies to specific cases, and can be considered an attribute of certain tables.

extraordinary cases do happen, like a company overhauling its numbering systems, but such cases can be dealt with by a number of methods including letting RI do its thing.

merlin
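["Letting RI do its thing" means declaring the foreign key with ON UPDATE CASCADE, so a natural-key change propagates automatically. A minimal sketch with hypothetical names:]

```sql
CREATE TABLE part (
    part_number text PRIMARY KEY
);

CREATE TABLE order_line (
    order_no    int,
    part_number text REFERENCES part(part_number)
                     ON UPDATE CASCADE,    -- key changes ripple through
    PRIMARY KEY (order_no, part_number)
);

-- Renumbering a part is one statement; RI rewrites order_line for us:
UPDATE part SET part_number = 'B-200' WHERE part_number = 'A-100';
```

Whether you can afford to wait for that cascade on a large referencing table is exactly the performance question being debated.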
Re: [PERFORM] Basic Q on superfluous primary keys
Merlin Moncure wrote:
> Using surrogate keys is dangerous and can lead to very bad design
> habits that are unfortunately so prevalent in the software industry
> they are virtually taught in schools. ... While there is
> nothing wrong with them in principle (you are exchanging one key for
> another as a performance optimization), they make it all too easy to
> create denormalized designs and tables with no real identifying
> criteria, etc,...

Wow, that's the opposite of everything I've ever been taught, and all my experience in the last few decades.

I can't recall ever seeing a "natural" key that was immutable. In my business (chemistry), we've seen several disastrous situations where companies picked keys they thought were natural and immutable, and years down the road they discovered (for example) that chemical compounds they thought were pure were in fact isotopic mixtures, or simply the wrong molecule (as analytical techniques improved). Or during a corporate takeover, they discovered that two companies using the same "natural" keys had as much as 10% differences in their multi-million-compound databases. These errors led to six-month to year-long delays, as each of the conflicting chemical records had to be examined by hand by a PhD chemist to reclassify it.

In other businesses, almost any natural identifier you pick is subject to simple typographical errors. When you discover the errors in a field you've used as a primary key, it can be quite hard to fix, particularly if you have distributed data across several systems and schemas.

We've always recommended to our customers that all primary keys be completely information free. They should not be based on any information or combination of information from the data records. Every time the customer has not followed this advice, they've later regretted it.

I'm sure there are situations where a natural key is appropriate, but I haven't seen it in my work.

Craig
Re: [PERFORM] Basic Q on superfluous primary keys
On 4/14/07, Bill Moran <[EMAIL PROTECTED]> wrote:
> In response to "Kynn Jones" <[EMAIL PROTECTED]>:
> > The two situations are semantically identical: each record in table bar
> > refers to a record in table foo. The difference is that in the first
> > schema, this referencing is done through an "artificial" serial-integer
> > primary key, while in the second schema this reference is done through a
> > data field that happens to be unique and not null, so it can serve as
> > primary key.
>
> I had a discussion about this recently on the Drupal mailing lists, at
> the end of which I promised to do some benchmarking to determine whether
> or not text keys really do hurt performance of indexes. Unfortunately, I
> still haven't followed through on that promise -- maybe I'll get to it
> tomorrow.

The main reason why integer indexes are faster than natural counterparts is that the index is smaller and puts less pressure on cache. This is however offset by removing joins here and there, and you usually just end up indexing the data anyways. Performance is kind of tangential to the argument though -- I've seen databases using all natural keys and found them to be very clean and performant.

Using surrogate keys is dangerous and can lead to very bad design habits that are unfortunately so prevalent in the software industry they are virtually taught in schools. Many software frameworks assume you use them and refuse to work without them (avoid!). While there is nothing wrong with them in principle (you are exchanging one key for another as a performance optimization), they make it all too easy to create denormalized designs and tables with no real identifying criteria, etc, and the resultant stinky queries to put it all back together again (full of unions, self joins, extraneous groups, case statements, etc).

A good compromise in your designs is to identify your natural key but use the surrogate if you have valid performance reasons:

CREATE TABLE foo (
    frobnitz_id int UNIQUE,
    frobnitz character varying(100) PRIMARY KEY
    [...]
);

frobnitz_id is of course optional and not necessary in all tables. It may be a pain to relate a large table with a four- or five-part key, and judicious use of surrogates may be justified for performance or even just to keep your queries smaller:

create table order_line_item_discount (
    company_name text,
    order_no int,
    line_item_seq_no int,
    discount_code text,
    primary key (company_name, order_no, line_item_seq_no, discount_code)
);

becomes

create table order_line_item_discount (
    order_line_item_id int,
    discount_code text,
    primary key (order_line_item_id, discount_code)
);

merlin
Re: [PERFORM] Basic Q on superfluous primary keys
In response to "Kynn Jones" <[EMAIL PROTECTED]>:

> Consider these two very similar schemas:
>
> Schema 1:
>
> CREATE TABLE foo (
>     id serial PRIMARY KEY,
>     frobnitz character varying(100) NOT NULL UNIQUE
> );
>
> CREATE TABLE bar (
>     id serial PRIMARY KEY,
>     foo_id int REFERENCES foo(id)
> );
>
> Schema 2:
>
> CREATE TABLE foo (
>     frobnitz character varying(100) PRIMARY KEY
> );
>
> CREATE TABLE bar (
>     id serial PRIMARY KEY,
>     frobnitz character varying(100) REFERENCES foo(frobnitz)
> );
>
> The two situations are semantically identical: each record in table bar
> refers to a record in table foo. The difference is that in the first
> schema, this referencing is done through an "artificial" serial-integer
> primary key, while in the second schema this reference is done through a
> data field that happens to be unique and not null, so it can serve as
> primary key.

The first case is called a "surrogate key". A little googling on that term will turn up a wealth of discussion -- both for and against.

> I find Schema 1 awkward and unnatural; more specifically, foo.id seems
> unnecessary in light of the non-null uniqueness of foo.frobnitz. But I
> remember once reading that "long" fields like foo.frobnitz did not make
> good primary keys.

I had a discussion about this recently on the Drupal mailing lists, at the end of which I promised to do some benchmarking to determine whether or not text keys really do hurt performance of indexes. Unfortunately, I still haven't followed through on that promise -- maybe I'll get to it tomorrow.

> Is the field foo.id in Schema 1 superfluous? For example, wouldn't the
> referencing from bar to foo really be done "behind the scenes" through
> some hidden field (oid?) instead of through the frobnitz text field?
> Which of the two schemas would give better performance?

--
Bill Moran
Collaborative Fusion Inc.
[EMAIL PROTECTED] Phone: 412-422-3463x4023
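[For what it's worth, the promised benchmark can be sketched in a few statements. This assumes PostgreSQL 8.1 or later for pg_relation_size; the table names and key format are made up. Comparing the two primary-key index sizes shows how much extra cache pressure the text key creates:]

```sql
CREATE TABLE t_int  (id int PRIMARY KEY);
CREATE TABLE t_text (id text PRIMARY KEY);

INSERT INTO t_int  SELECT g FROM generate_series(1, 1000000) g;
INSERT INTO t_text SELECT 'some-longish-key-' || g
    FROM generate_series(1, 1000000) g;

-- Compare the sizes of the two primary-key indexes; the smaller index
-- stays in cache more easily, which is the core of the int-key argument.
SELECT pg_relation_size('t_int_pkey')  AS int_index_bytes,
       pg_relation_size('t_text_pkey') AS text_index_bytes;
```

Timing repeated lookups against each table (e.g. with EXPLAIN ANALYZE) would complete the picture.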
[PERFORM] Basic Q on superfluous primary keys
Consider these two very similar schemas:

Schema 1:

CREATE TABLE foo (
    id serial PRIMARY KEY,
    frobnitz character varying(100) NOT NULL UNIQUE
);

CREATE TABLE bar (
    id serial PRIMARY KEY,
    foo_id int REFERENCES foo(id)
);

Schema 2:

CREATE TABLE foo (
    frobnitz character varying(100) PRIMARY KEY
);

CREATE TABLE bar (
    id serial PRIMARY KEY,
    frobnitz character varying(100) REFERENCES foo(frobnitz)
);

The two situations are semantically identical: each record in table bar refers to a record in table foo. The difference is that in the first schema, this referencing is done through an "artificial" serial-integer primary key, while in the second schema this reference is done through a data field that happens to be unique and not null, so it can serve as primary key.

I find Schema 1 awkward and unnatural; more specifically, foo.id seems unnecessary in light of the non-null uniqueness of foo.frobnitz. But I remember once reading that "long" fields like foo.frobnitz did not make good primary keys.

Is the field foo.id in Schema 1 superfluous? For example, wouldn't the referencing from bar to foo really be done "behind the scenes" through some hidden field (oid?) instead of through the frobnitz text field? Which of the two schemas would give better performance?

Thanks!
kj
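[One practical difference between the two schemas, worth noting for readers of the thread: under Schema 2, a query that only needs the frobnitz value never has to join back to foo at all.]

```sql
-- Schema 1: a join is required to recover the natural value.
SELECT f.frobnitz
FROM bar b
JOIN foo f ON f.id = b.foo_id;

-- Schema 2: the value is already in the referencing row.
SELECT frobnitz FROM bar;
```

This is the "offset by removing joins here and there" trade-off discussed elsewhere in the thread: the integer index is smaller, but the text key can eliminate joins entirely.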