It's interesting that some people are arguing over the key strategy.  I 
personally would rather use UUID's than sequences, identities, integers (some 
of which are DB specific, and some aren't even safe on the cluster).  Most of 
the problems people are pointing out would probably be valid if roller was 
hitting the database directly every time, but for people who looked at the code 
knows that it's doing a lot of caching in order to scale.  Changing your key 
strategy won't allow you to scale any better.  It would be better IMHO if the 
developers instead focused on improving caching even more, then make the roller 
work properly on a cluster.  It's too bad Tangosol doesn't offer an offer 
source license for Coherence, but then again - there are other distributed 
caches out there like Jboss Cache and etc.

George Daswani
-----Original Message-----
From: Greg Hamer [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 16, 2006 5:50 PM
To: [email protected]
Subject: Re: why primary key is using hex instead of integer

Following is a minor addendum to my email Tuesday.

In my work today I was reviewing the following from the MySQL 5.1 Reference 
Manual:  
7.4.2. Make Your Data as Small as Possible

Particularly relevant excerpts follow below.  I believe Oracle, Microsoft, 
Sybase and IBM provide similar guidelines.


EXCERPTS

One of the most basic optimizations is to design your tables to take as little 
space on the disk as possible. This can result in huge improvements because 
disk reads are faster, and smaller tables normally require less main memory 
while their contents are being actively processed during query execution. 
Indexing also is a lesser resource burden if done on smaller columns.

...

You can get better performance for a table and minimize storage space by using 
the techniques listed here:
  a.. Use the most efficient (smallest) data types possible. MySQL has many 
specialized types that save disk space and memory. For example, use the smaller 
integer types if possible to get smaller tables. MEDIUMINT is often a better 
choice than INT because a MEDIUMINT column uses 25% less space. 
  b.. ... 
  c.. The primary index of a table should be as short as possible. This makes 
identification of each row easy and efficient. 
  d.. ... 
  e.. ... Shorter indexes are faster, not only because they require less disk 
space, but because they give also you more hits in the index cache, and thus 
fewer disk seeks.



  ----- Original Message ----- 
  From: Greg Hamer 
  To: [email protected] 
  Sent: Tuesday, March 14, 2006 2:52 PM
  Subject: Re: why primary key is using hex instead of integer


  Based on my experience as a DBA, I suspect there are penalties, but you are 
only likely to see them under load.

  All data returned from databases passes through RAM.  All databases enhance 
performance via caching data in RAM.  The longer a row (i.e the more bytes in a 
row), the less rows that can be cached in a given amount of RAM.  Making 
primary keys that require 32 bytes of storage per row versus, say integers 
which require 4 bytes, reduces the amount of data that can be served from cache.

  Now multiply the 32 vs 4 bytes by:  a) every foreign key referencing back to 
primary key; b) every non-clustered index on the primary key; and c) every 
index on the foreign keys.  

  Add it all up and there likely is detrimental impact on:  a) performance; b) 
scalability; and c) cost (for both disks and RAM).

  I think this is way 9 out of 10 DBAs prefer the shortest, most compact 
datatype possible for primary keys ;-)

  Yes, there are cases where uuids are nice (e.g. Elias and Matt's examples of 
"merging content from different servers" and "migrate a blog").  But having the 
larger datatype introduces a penalty that you pay 100% of the time, vs these 
less frequent cases when having uuids help out.

  Just my 2ยข. 

  g

  P.S.  a quick google turned up the following link of a MSSQL DBA's thoughts 
on GUIDs as primary keys:
  http://www.sqljunkies.com/WebLog/odds_and_ends/archive/2005/08/31/16595.aspx


  ----- Original Message ----- 
  From: "Elias Torres" <[EMAIL PROTECTED]>
  To: <[email protected]>
  Sent: Tuesday, March 14, 2006 8:46 AM
  Subject: Re: why primary key is using hex instead of integer


  -----BEGIN PGP SIGNED MESSAGE-----
  Hash: SHA1

  I don't think it imposes a performance hit and even if it did it's
  neglible. Once the database has indexed the strings or the ints the
  lookup cost should be the same. I think it might help if we used fixed
  char(32) as opposed to varchar(40), but that's mostly a space issue
  since we are not using the last 8 bytes.

  - -Elias

  Allen Gilliland wrote:
  > I don't know about why this decision was made historically, but i would
  > agree that technically this imposes a slight performance hit.  AFAIK,
  > databases can do lookups on purely numeric keys a little bit faster than
  > alphanumerics.
  > 
  > I have no idea if the difference is significant or not.
  > 
  > -- Allen
  > 
  > 
  > On Tue, 2006-03-14 at 07:25, Lance Lavandowska wrote:
  > 
  >>IIRC it is largely historical: at one point Roller used Castor, which
  >>has as it's "default" a GUID generator that creates a 30-char primary
  >>key.
  >>
  >>Lance
  >>
  >>On 3/14/06, David M Johnson <[EMAIL PROTECTED]> wrote:
  >>
  >>>On Mar 11, 2006, at 4:45 PM, BigLiu wrote:
  >>>
  >>>>I want to create some customer tables to extend roller. But I have
  >>>>question
  >>>>regarding why the primary key is defined in hex instead of integer?
  >>>>Will
  >>>>this cause any performance problem?
  >>>
  >>>I can't really remember why I chose the hex key. I don't think it has
  >>>a significant impact on performance.
  >>>
  >>>- Dave

Reply via email to