It's interesting that some people are arguing over the key strategy. I personally would rather use UUID's than sequences, identities, integers (some of which are DB specific, and some aren't even safe on the cluster). Most of the problems people are pointing out would probably be valid if roller was hitting the database directly every time, but for people who looked at the code knows that it's doing a lot of caching in order to scale. Changing your key strategy won't allow you to scale any better. It would be better IMHO if the developers instead focused on improving caching even more, then make the roller work properly on a cluster. It's too bad Tangosol doesn't offer an offer source license for Coherence, but then again - there are other distributed caches out there like Jboss Cache and etc.
George Daswani -----Original Message----- From: Greg Hamer [mailto:[EMAIL PROTECTED] Sent: Thursday, March 16, 2006 5:50 PM To: [email protected] Subject: Re: why primary key is using hex instead of integer Following is a minor addendum to my email Tuesday. In my work today I was reviewing the following from the MySQL 5.1 Reference Manual: 7.4.2. Make Your Data as Small as Possible Particularly relevant excerpts follow below. I believe Oracle, Microsoft, Sybase and IBM provide similar guidelines. EXCERPTS One of the most basic optimizations is to design your tables to take as little space on the disk as possible. This can result in huge improvements because disk reads are faster, and smaller tables normally require less main memory while their contents are being actively processed during query execution. Indexing also is a lesser resource burden if done on smaller columns. ... You can get better performance for a table and minimize storage space by using the techniques listed here: a.. Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space. b.. ... c.. The primary index of a table should be as short as possible. This makes identification of each row easy and efficient. d.. ... e.. ... Shorter indexes are faster, not only because they require less disk space, but because they give also you more hits in the index cache, and thus fewer disk seeks. ----- Original Message ----- From: Greg Hamer To: [email protected] Sent: Tuesday, March 14, 2006 2:52 PM Subject: Re: why primary key is using hex instead of integer Based on my experience as a DBA, I suspect there are penalties, but you are only likely to see them under load. All data returned from databases passes through RAM. All databases enhance performance via caching data in RAM. The longer a row (i.e the more bytes in a row), the less rows that can be cached in a given amount of RAM. Making primary keys that require 32 bytes of storage per row versus, say integers which require 4 bytes, reduces the amount of data that can be served from cache. Now multiply the 32 vs 4 bytes by: a) every foreign key referencing back to primary key; b) every non-clustered index on the primary key; and c) every index on the foreign keys. Add it all up and there likely is detrimental impact on: a) performance; b) scalability; and c) cost (for both disks and RAM). I think this is way 9 out of 10 DBAs prefer the shortest, most compact datatype possible for primary keys ;-) Yes, there are cases where uuids are nice (e.g. Elias and Matt's examples of "merging content from different servers" and "migrate a blog"). But having the larger datatype introduces a penalty that you pay 100% of the time, vs these less frequent cases when having uuids help out. Just my 2ยข. g P.S. a quick google turned up the following link of a MSSQL DBA's thoughts on GUIDs as primary keys: http://www.sqljunkies.com/WebLog/odds_and_ends/archive/2005/08/31/16595.aspx ----- Original Message ----- From: "Elias Torres" <[EMAIL PROTECTED]> To: <[email protected]> Sent: Tuesday, March 14, 2006 8:46 AM Subject: Re: why primary key is using hex instead of integer -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I don't think it imposes a performance hit and even if it did it's neglible. Once the database has indexed the strings or the ints the lookup cost should be the same. I think it might help if we used fixed char(32) as opposed to varchar(40), but that's mostly a space issue since we are not using the last 8 bytes. - -Elias Allen Gilliland wrote: > I don't know about why this decision was made historically, but i would > agree that technically this imposes a slight performance hit. AFAIK, > databases can do lookups on purely numeric keys a little bit faster than > alphanumerics. > > I have no idea if the difference is significant or not. > > -- Allen > > > On Tue, 2006-03-14 at 07:25, Lance Lavandowska wrote: > >>IIRC it is largely historical: at one point Roller used Castor, which >>has as it's "default" a GUID generator that creates a 30-char primary >>key. >> >>Lance >> >>On 3/14/06, David M Johnson <[EMAIL PROTECTED]> wrote: >> >>>On Mar 11, 2006, at 4:45 PM, BigLiu wrote: >>> >>>>I want to create some customer tables to extend roller. But I have >>>>question >>>>regarding why the primary key is defined in hex instead of integer? >>>>Will >>>>this cause any performance problem? >>> >>>I can't really remember why I chose the hex key. I don't think it has >>>a significant impact on performance. >>> >>>- Dave
