RE: [JBoss-user] Re: random keys

2001-06-27 Thread Wood, Alan

Sorry for this being off topic a bit... this is probably better discussed
in some sort of EJB user mailing list rather than on the JBoss list itself.

True.  I put it up since some databases don't have that concept.  Also,
there is no real bit twiddling needed:

#1 On startup:
---
Load the value from the database into an Integer object (assumes the last 8 bits are 0)
Save the value into the object
Increment it by 256
Store the new value back into the database

#2 On key gen:
---
Give the current value stored in the object to the requestor
Increment the current value by 1
If you've incremented 255 times, then it is time to reserve a new block.
  (Repeat step #1 above)

Conceptually you are just incrementing the low side and keeping the
high side of the integer the same.  :)
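
In code, a minimal sketch of steps #1 and #2 in plain Java, ignoring the
EJB plumbing.  The KeyTable accessor is hypothetical; it stands in for
whatever loads and bumps the stored high value (an entity bean, direct
SQL, etc.):

interface KeyTable {
    // Atomically return the stored high value and add 256 to it
    // (e.g. SELECT FOR UPDATE + UPDATE, or an entity bean method).
    int nextHigh();
}

public class HiLoKeyGenerator {

    private static final int BLOCK_SIZE = 256;  // 8 low bits

    private final KeyTable keyTable;
    private int current;  // next key to hand out
    private int limit;    // first key of the *next* block

    public HiLoKeyGenerator(KeyTable keyTable) {
        this.keyTable = keyTable;
        reserveBlock();  // step #1 on startup
    }

    // Step #1: claim a fresh block of 256 keys from the database.
    private void reserveBlock() {
        current = keyTable.nextHigh();
        limit = current + BLOCK_SIZE;
    }

    // Step #2: hand out keys from the reserved block.
    public synchronized int nextKey() {
        if (current == limit) {  // 256 keys handed out: repeat step #1
            reserveBlock();
        }
        return current++;
    }
}

One database round trip per 256 keys; everything in between is a plain
in-memory increment.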

There really is no large difference between using a counter and a table to
store the high value...it is just that a table can be done in all databases,
so if your application needs to be vendor/db neutral, then this can be the
better mechanism.  I think that to make it truly vendor/db neutral, you have
to use an entity bean for the high value...but I'm not positive about that.

Alan



-----Original Message-----
From: David Jencks [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 26, 2001 5:36 PM
To: [EMAIL PROTECTED]
Subject: RE: [JBoss-user] Re: random keys


Hi,
You can get the same effect with the generators and sequences I am familiar
with by requesting a large step or increment; then you don't need to do any
bit twiddling.  If your db supports generators/sequences, use them.
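
A minimal JDBC sketch of that idea, assuming a PostgreSQL-style sequence
(the name key_seq and the exact DDL are assumptions; syntax varies by
vendor):

import java.sql.*;

// Assumes a sequence created with a large increment, e.g. (PostgreSQL
// syntax):  CREATE SEQUENCE key_seq INCREMENT BY 256
// Each nextval() call then reserves a whole block of 256 keys, which
// the caller hands out from memory with a plain in-JVM counter.
public class SequenceBlockFetcher {
    public static long nextBlockStart(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT nextval('key_seq')")) {
            rs.next();
            return rs.getLong(1);  // first key of a fresh 256-key block
        }
    }
}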

david jencks

On 2001.06.26 16:38:03 -0400 Wood, Alan wrote:
  
  If you're going to run in multiple JVMs, the solution is a counter
  table.  Lock the counter table when you update the count.  This doesn't
  perform as badly as it sounds.
 
 Which would leave this alternative of course. It somehow seems like
 overkill to me, but it may very well be the option I land on.
 
 Thanks for the input.
 
 Cheers
  Bent D
 
 An alternative to this is discussed in many forums (including
 theserverside.com).  It sacrifices some key space for a bit of
 performance.  I'm just reading up on it, but the basics are:
 
 Make the key an integer with a 24 bit high index value and an 8 bit low
 index value; the result is a true 32 bit integer.  (You can use any bit
 split you would like.  Use 32/32 if you need lots of room in the key
 space.)
 
 Hold the last used 24 bit high index value in the database exactly as
 mentioned in the previous post.  
 
 Create a SessionBean that will generate a unique key for you.  Make it
 stateless, although it will hold state.  (The state in this case is
 discardable...if the state is missing, a new one can be generated.)
 
 Either encapsulate the high value in an entity bean, or just use direct
 SQL in the SessionBean to load the high value when the bean is created.
 Since the SessionBean is stateless, it will probably be created in a pool
 (set it to a small number of instances) and only get initialized once per
 run of the EJB server.  (Not a requirement, though.)
 
 Now, generate the lower order keys as you are called.  Just increment the
 low value and append it to the high value to create the full integer (or,
 in other words, add 1 to your saved key).
 
 Detect when you are about to cross into the next high value range, and if
 you are, reload the high value from the database (incrementing it when
 you do).
 
 This method requires the following:
 
   You can atomically read the high value, increment the high value, and
   write the high value back, at the database level.  (SELECT FOR UPDATE,
   lock the record, or whatever needs to be done.  I believe that if you
   put it into an entity bean, this will already be done for you if you
   make an operation (method) that does the increment and mark the bean
   with transaction attribute RequiresNew, or maybe Required?)
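 
 A minimal JDBC sketch of that atomic read/increment/write, assuming a
 hypothetical one-row table HIGH_VALUE(NEXT_HI INTEGER); inside an entity
 bean you would let the container drive the transaction instead:
 
 import java.sql.*;
 
 public class HighValueDao {
     public static int nextHigh(Connection conn) throws SQLException {
         boolean oldAutoCommit = conn.getAutoCommit();
         conn.setAutoCommit(false);
         try {
             int hi;
             // Lock the row so concurrent generators serialize here.
             try (Statement stmt = conn.createStatement();
                  ResultSet rs = stmt.executeQuery(
                          "SELECT NEXT_HI FROM HIGH_VALUE FOR UPDATE")) {
                 rs.next();
                 hi = rs.getInt(1);
             }
             try (Statement stmt = conn.createStatement()) {
                 stmt.executeUpdate(
                         "UPDATE HIGH_VALUE SET NEXT_HI = NEXT_HI + 256");
             }
             conn.commit();  // commit releases the row lock
             return hi;
         } catch (SQLException e) {
             conn.rollback();
             throw e;
         } finally {
             conn.setAutoCommit(oldAutoCommit);
         }
     }
 }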
 
 This allows many different SessionBeans across multiple VMs to generate
 unique keys (since they will all have different high values).  It will
 result in some of the integer space not being used (due to shutdowns of
 the server, etc.).  It should also allow pooling of the unique key
 generator so that you have less of a bottleneck there during entity
 creation.
 
 I'd still keep the pool size limited, though, so that your key space
 isn't used up at each server shutdown.
 
 Keep in mind, I'm still learning; if someone could correct me where I got
 something wrong, I'd much appreciate it.  But I've read about this
 mechanism a few times and it seems solid enough.
 
 Hope this helps,
 
 Alan
 
 

Re: [JBoss-user] Re: random keys

2001-06-26 Thread danch

[EMAIL PROTECTED] wrote:

 On Tue, Jun 26, 2001 at 05:32:43PM -0400, Michael Bilow wrote:
 
  On 2001-06-26 at 21:45 +0200, [EMAIL PROTECTED] wrote:
 
   Why would this matter? Do databases assume that records with primary
   keys near one another will often be used together?
 
  Yes, this is why they are called primary keys.  Traditionally, database
  engines would try to entry-sequence records by primary key, and there
  remains an expectation that access by primary key will always be the
  fastest and most efficient mechanism for accessing a table.
 
 It seems strange to me that locality would be important in this
 case. The assumption that record number 5 and record number 6 are
 inherently linked more than record 5 and record 8793 are would
 certainly hold for some databases, but that it should be true in the
 general case (or even just often enough that it matters)?

(From my understanding of relational databases.)  Actually, it's not so
much that the database assumes they're linked as that the table is
generally organized in some sort of B-tree structure, where a record's
location in the tree (what page it's on) is determined by the primary
key.  That way, when the database does a search by primary key, once it
has found the key it doesn't need another I/O to get the actual data.
This just optimizes the case of finding by the primary key.


 
 I can only see the usefulness for binary search, but there you would
 presumably build index tables anyway so actual location of data
 doesn't matter.

Actual location matters only in terms of I/O.

-danch







Re: [JBoss-user] Re: random keys

2001-06-26 Thread Michael Bilow

On 2001-06-27 at 00:24 +0200, [EMAIL PROTECTED] wrote:

 On Tue, Jun 26, 2001 at 05:32:43PM -0400, Michael Bilow wrote:
  On 2001-06-26 at 21:45 +0200, [EMAIL PROTECTED] wrote:
  
   Why would this matter? Do databases assume that records with primary
   keys near one another will often be used together?
  
  Yes, this is why they are called primary keys.  Traditionally, database
  engines would try to entry-sequence records by primary key, and there
  remains an expectation that access by primary key will always be the
  fastest and most efficient mechanism for accessing a table.
 
 It seems strange to me that locality would be important in this
 case. The assumption that record number 5 and record number 6 are
 inherently linked more than record 5 and record 8793 are would
 certainly hold for some databases, but that it should be true in the
 general case (or even just often enough that it matters)?
 
 I can only see the usefulness for binary search, but there you would
 presumably build index tables anyway so actual location of data
 doesn't matter.

The main reason primary key access is expected to be more efficient is
that experience has shown databases tend to be made up of two flavors of
table: tables which are read frequently and written infrequently, which
are usually searched on the same key, and tables which are inserted into
frequently and read not much more often than they are written.

An example is something like an order entry system where orders are
created in an orders table for customers in a customers table to sell
items that are in an items table.  The items table will be written very
rarely, only when new items are introduced, but will be read frequently.  
Although there might be occasional need to search the items table on some
key other than the primary key, such as a description field, the vast
majority of accesses from the point of view of the database engine will be
to resolve references from other tables and these will all be done by
primary key.  For example, whenever an order is viewed, the orders table
references to items by primary key will have to be resolved through the
items table.  Because of this, optimizing for primary key will usually
result in an order of magnitude performance improvement.  The customers
table may be modified more frequently than the items table, but if there
are regular customers then the customers table will still be modified much
less frequently than the orders table.

The orders table, in turn, is mostly being modified by insertion
operations.  There might be occasions to modify an order record, say to
note that an order has been shipped or that part of an order is
backordered, but the basic common operation on an orders table will
be to either insert a new order or to locate all orders associated with
some other entity, such as a customer.  Looking up all orders for a
customer will require resolving through a secondary index on the orders
table, but those references will themselves resolve back to primary keys
in the orders table.  So the end result is that all database accesses are
eventually going to become a search by primary key, and optimizing for
that is invariably a huge win.

-- Mike


