> This all started with David making a broad blanket statement about "data
> integrity" and "row duplication" and how using "synthetic" record ID keys
> ruined the ability to automatically filter out "dupes". (I _think_ that
> was your point David. Please correct me if not.) And in that strict sense,
> if a "row" is really actually
I've been a bit taken aback at how this thread has gone, to be honest. My point was really basic: bolting on a random number field makes a row unique, it doesn't make the underlying *data* unique. That's it. That's more or less me using one of my standard techniques: 1) Find the obvious. 2) Draw a circle around it. 3) Declare it mine. Patent pending...

Seriously, I'm saying something that just shouldn't be even the tiniest bit contentious. That point is nothing but an established principle of database design; it's not something I'd normally consider debatable. I'd put it in the same category as when someone says, "Race condition? I've figured out a shortcut." No you haven't. That's pretty much in line with things the patent office won't even review:

* Perpetual motion machines.
* Squirrel-proof bird feeders.

Until such time as the fundamental laws of nature change, neither of those inventions can work.

A lot of the stuff about duplicates doesn't bear on the design question about the table's columns. Take these three rows:

Keller 123
Keller 123
Keler  123

How many rooms are there? You have no way to know. It could be 1, 2, or 3:

1: You've got a pure duplicate and a typo-based duplicate.
2: You've got a pure duplicate and two buildings with very similar names.
3: You've got two buildings with very similar names *and you are missing enough data to identify rows uniquely.*

In that last case, there's a design question. If you haven't identified enough columns of *data* from the real world to uniquely identify each row, you don't have a data model that can be implemented in a relational database. I mean, sure, you can use a tool to store the data - but I don't see how you model that. What makes each row unique? I mean out in the real world. Perhaps there's a floor, or a direction, or some kind of vernacular detail that actual people use to distinguish rooms. In which case, you've found a missing field! That's part of the point of modeling. But what if there really is nothing?
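Just to make the Keller example concrete - here's a quick sketch using Python's sqlite3 rather than 4D (the table and column names are made up for illustration). The engine happily reports three "unique" rows because of the synthetic key, while the data itself contains an exact duplicate the key does nothing to reveal:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE rooms (
        id       INTEGER PRIMARY KEY,  -- synthetic key, nothing to do with the data
        building TEXT,
        room     INTEGER
    )
""")

# The three rows from the example above.
rows = [("Keller", 123), ("Keller", 123), ("Keler", 123)]
con.executemany("INSERT INTO rooms (building, room) VALUES (?, ?)", rows)

# Every row is "unique" as far as the engine is concerned...
print(con.execute("SELECT COUNT(*) FROM rooms").fetchone()[0])  # 3

# ...but the underlying data has a pure duplicate the id can't surface.
print(con.execute(
    "SELECT COUNT(DISTINCT building || '/' || room) FROM rooms"
).fetchone()[0])  # 2
```

The synthetic id answers "how many rows?", not "how many rooms?" - which is the whole point.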
Well, then it's pretty common to add something, like a number. But that then (generally) needs to flow out into the real world. So you synthesize data in the database, add it to the row *and* to the real world. Serial numbers are a perfect example of this. Same with account numbers, customer numbers, etc. And UUIDs are functionally serial numbers. They have nothing at all to do with the data in the row; they're purely an implementation-level convenience.

As some people seem to be getting the impression I'm against UUIDs: that's incorrect. I'm glad they're native (I was using them before that), and I've got nothing against them. But they are what they are and not more. Any complaints I had about how they were implemented in 4D are irrelevant now, as that was addressed back in V14. Too old to matter.

I've also been getting the impression that when people say "primary key", they don't always appreciate what a "key" really is - an attribute, or set of attributes, that is entirely about the row of data and that also uniquely identifies the row. That's a key... off the top of my head; I'm sure there's a better summary out there.

The 4D world has long had a peculiarly hostile relationship to normalization, starting with 4D itself (subtables, wrong examples in the manuals, etc.). I've seen the same cropping up again with how object fields are being promoted. (Slapping some JSON in a text field and calling your system NoSQL isn't necessarily helpful...) But just slapping a random-but-unique number (sequential or not) onto the row? That easily masks real duplicates. That's precisely why other databases prominently support multi-field constraints on tables. Again, you *can* do this in 4D with a compound index set to unique. So the feature exists, I just haven't seen it being used widely. Perhaps it's because engine-level errors feel disproportionately painful to deal with in 4D? That could just be me. Sincerely, I may be more allergic to 4D engine-level errors than others.
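For anyone who hasn't used a multi-field constraint, here's what it looks like in SQL terms - again sketched with sqlite3 rather than 4D, with invented names. It's the equivalent of a 4D compound index set to unique: you keep the synthetic id for convenience, but the constraint on the real-world columns is what actually rejects duplicates, and the engine error it raises is straightforward to trap:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE rooms (
        id       INTEGER PRIMARY KEY,   -- synthetic convenience key
        building TEXT    NOT NULL,
        room     INTEGER NOT NULL,
        UNIQUE (building, room)         -- multi-field constraint on the real data
    )
""")

con.execute("INSERT INTO rooms (building, room) VALUES ('Keller', 123)")

# A second ('Keller', 123) fails at the engine level, regardless of its id.
try:
    con.execute("INSERT INTO rooms (building, room) VALUES ('Keller', 123)")
except sqlite3.IntegrityError as err:
    print("duplicate rejected:", err)
```

The same idea in Postgres is `UNIQUE (building, room)` in the table definition (or `ALTER TABLE ... ADD CONSTRAINT`); the typo row ('Keler', 123) would still get through, of course - a constraint enforces the model, it doesn't fix data entry.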
I pretty much turn off the unique attribute and check uniqueness myself. Hmmm. I might need to up my game here. I seem to remember that, years ago, hitting a duplicate error wasn't easy to trap and got ugly. I can't swear that's true, I just remember having that impression. Long story shorter, I probably haven't tried 4D's uniqueness controls in years. Do people use them? How are they in V16? Since I'm doing some Postgres stuff these days, yeah, over there I set up multi-column constraints, no question.

There are lots of other kinds of duplicates out there that can crop up, even if you have a good model behind your tables. Life is hard. But it's important and helpful to keep in mind the distinction between the issues around the design of the table and the issues around data entry. They're not all the same. As far as duplicate matching goes, that's a topic I enjoy, but it's a different topic.

**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ: http://lists.4d.com/faqnug.html
Archive: http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub: mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************