Mark, Nathan, I'm moving this over to the PGSQL-SQL list, away from -hackers, as it's no longer a -hackers type discussion. Hope you don't mind!
> On Wed, Sep 07, 2005 at 11:31:16AM -0700, Josh Berkus wrote: > > I'm also a little baffled to come up with any real application where > > making an id number for most tables "unguessable" would provide any > > kind of real protection not far better provided by other means. For > > your "users" table, sure, but that's a very special case. > > It should never be the sole means of defense, however, it can be quite > effective at prevention. > > For a rather simple example, consider a site that associates a picture > with each member. If the pictures are named 1.jpg, 2.jpg, 3.jpg, etc. > it makes it ridiculously easy to write a script to pull all of the > pictures off the site. This can be bothersome, as the only type of > person who would do this, is the type of person with an illegitimate > motivation. I want the data to be easily and freely accessible as > specific objects, but I do not wish to provide an easy way of > dumping all of the data as a unit. > > By making the picture identifier unguessable, it discourages the most > common sort of abuse of the system. If the number is unguessable, and > they can't access the directory as a listing, it will be sufficiently > difficult as to discourage the common abuser of the system. On the > other hand, an obviously guessable identifier may *encourage* the > common person to consider abuse. > > In my case, it isn't only pictures. I don't want people pulling all > the data off the site as a dump, and using it how they wish, but I do > wish to make the data freely available, and easily accessible from a > web browser. > > I'm not under the impression that it is impossible for a competent > person to dump my database. I am under the impression that the people > who would do such a thing, tend not to be intelligent, and will be > stopped by this simple tactic. > > I could use any identifier at all. It could be a random sequence of > characters. The UUID appeals to me, as I don't have to re-invent > the concept. This use of UUID falls outside the scope of using it > to join tables. It's a handle that is associated with the data, > for external identification of the object. Seems like this would be better served by simply encrypting the three-part universal key (server|table|row) using an encryption key which is not public/obvious. That would preserve the obscurity of object naming while still allowing the UUID to contain useful information. > I happen to also use it as an internal primary key for the objects > that fit this category, as I wish to benefit from the built-in merge > capabilities of UUID over SERIAL, and I don't currently see the > point of keeping a SERIAL and a UUID for each object. On the last > point, I did start to do this, but every single one of my queries > become more complicated as a result. Using the SERIAL for joining, > and the UUID for identifying a set of rows was becoming a little > ridiculous for my purposes. Using only the UUID to provide for > all my purposes is suiting my requirements for the cost of 1.5X > the size of a primary key index, 2X the size of a index for > a n to n relation mapping UUID to UUID, and an far less significant > increase in table space (much less than 1.5X, although I haven't > finished calculating it yet). > > Not that everybody should rip out SERIAL and replace it with UUID, > but it really isn't that bad, and in some cases, such as mine, > I don't see the point of using both, and choose to instead allow > UUID to solve many of my concerns at the same time, with an > acceptable for me cost in database pages. Oh, you won't get any argument from me on that one -- no need for *two* surrogate keys in a table. IME, most tables don't need even *one*. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match