On Dec 4, 2005, at 6:51 PM, Kevin Toppenberg wrote:
Our billing/scheduling software runs on Oracle. Today I had the "pleasure" of working on my schedule on it. I thought I was going to scream because it was so SLOW. And this is running on a state-of-the-art (2005) Windows server.
I'm sure we've all had good and bad experiences with database performance. I certainly have. My sense is that it is really difficult to come up with reliable generalizations, but one holds up well: the more flexibility you ask of a database, the more work it has to do to provide it. If you want to run any query you can imagine, that's going to cost more than a simple lookup on an indexed key. One way of thinking about it is that the Search option is essentially equivalent to SELECT ... WHERE in terms of expressivity, and we all know how relatively slow and resource-intensive that can be. Fortunately, you rarely need that much flexibility, and ordinary (Fileman) queries generally suffice.
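To make that concrete, here's a toy sketch (Python, with entirely made-up appointment data -- nothing to do with Oracle's or Fileman's actual internals):

    # Hypothetical rows, just for illustration.
    appointments = [
        {"id": i, "clinic": "TMG" if i % 2 else "TMGLAUGH", "slot": i}
        for i in range(8500)
    ]

    # SELECT ... WHERE style: every row must be examined -- O(n) per query.
    matches = [row for row in appointments if row["clinic"] == "TMG"]

    # An index trades memory and insert-time work for fast lookups.
    by_id = {row["id"]: row for row in appointments}   # built once, O(n)
    row = by_id[4217]                                  # each lookup ~O(1)

The index costs something to build and maintain, but once it exists, each keyed lookup avoids touching every row.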
For example, I have approximately 8500 "slots" into which appointments can be put for 2006. There was a mixup, and the location for these slots was set incorrectly. So I told the database to reassign the location from TMG to TMGLAUGH. It took approximately 30 minutes to do this! That works out to about 5 transactions/sec. And this was a weekend, so no one else was using the server. It was painfully slow, because there were several other things I had to do, and waiting through these long delays each cycle was quite a pain.
I like to think of relational databases as being like diesel trains. They're fast and powerful, but they take time to get up to speed.
A few weeks ago I was working on creating these slots. I would tell it that I wanted 15-minute appointment slots from 1:00 to 4:30, 1/1/2006-12/31/2006, start a timer, and then do something else. It would take 30-45 minutes to do this! Then I would tell it that I wanted a 30-minute physical slot at two given times during the day and set it churning. That would take another 30 minutes or so. I probably had 5-6 such cycles to wade through. I couldn't help but wonder whether an M database wouldn't have blown through this.
I wouldn't be surprised.
Now, at this point, it would be easy to fall into a relational-database bashing party. But I would like comments on WHY (from a technology point of view) it is intrinsically slower. Bhaskar made a comment once that he suspected relational databases might be using B-trees in the background to increase speed, while still presenting a flat table to the user. I wonder if that is true, or if it would even be possible.
Oh, I'm sure they do. Are you familiar with self-balancing binary trees? They are just ordinary binary (or nearly binary) trees, but INSERT operations that would cause a branch to grow too far are followed by ROTATE operations that put the tree back into balance. This is important because it ensures that lookup operations run in logarithmic time, but at the cost of insertions that are slower on average. If you pick up a standard book on algorithms (I recommend Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms"), you'll find a variety of algorithms for implementing balanced trees (2-3 trees, red-black trees, 2-3-4 trees, and AVL trees, for instance). The trouble is that these trees are all close to binary, so rotations are very common, making them impractical for use in a DBMS. A B-tree simply takes this idea and extends it to trees where a node may have many children (and I wonder if the motivation for the name might not have been that they are "bushy"). Since B-trees have a small depth relative to their breadth, there is a fixed slowdown due to linear searches within each node, but it is bounded and predictable. On the other hand, the structures are much more stable, and the path you need to follow down the tree to find a node is much shorter, making them well suited for use in a DBMS.
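If you'd like to see the insert-then-rotate idea in miniature, here is a bare-bones AVL insert -- a Python sketch of the general technique, not anything resembling real DBMS code:

    # Each node carries its subtree height, which drives rebalancing.
    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.height = 1

    def height(n):
        return n.height if n else 0

    def update(n):
        n.height = 1 + max(height(n.left), height(n.right))

    def balance(n):
        return height(n.left) - height(n.right)

    def rotate_right(y):            # the left branch grew too tall
        x = y.left
        y.left, x.right = x.right, y
        update(y); update(x)
        return x

    def rotate_left(x):             # the right branch grew too tall
        y = x.right
        x.right, y.left = y.left, x
        update(x); update(y)
        return y

    def insert(root, key):
        # Ordinary binary-search-tree insertion...
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        else:
            return root             # ignore duplicate keys
        update(root)
        # ...followed by a rotation whenever a branch grew too far.
        b = balance(root)
        if b > 1 and key < root.left.key:       # left-left case
            return rotate_right(root)
        if b > 1:                               # left-right case
            root.left = rotate_left(root.left)
            return rotate_right(root)
        if b < -1 and key > root.right.key:     # right-right case
            return rotate_left(root)
        if b < -1:                              # right-left case
            root.right = rotate_right(root.right)
            return rotate_left(root)
        return root

    root = None
    for k in ["SMITH", "ADAMS", "JONES", "BAKER"]:
        root = insert(root, k)

A B-tree applies the same discipline, but since each node holds many keys, rebalancing (splitting or merging nodes) happens far less often.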
It seems that the computationally intensive task for a relational database is the indexing.
Right.
Of course this must involve sorting. And M users have learned that "M means never having to sort."
Don't take that too seriously! There are many ways a global (or local, for that matter) array could be implemented, but the modest size limit on subscripts (as opposed to nodes) makes hashing an attractive alternative. The idea behind hashing is to compute a number from a string in such a way that different strings are unlikely to hash to the same value, but the values computed are reasonably small. Then you can use the hash values to index into a table where the values are actually stored. Of course, there can be "collisions" where different strings hash to the same value; in that case, you'll probably have a list of nodes that needs to be searched linearly. Balanced trees like AVL trees are also an option, but updating hash tables is fast, lookups are nearly as fast, and the memory requirements are predictable, making them attractive. But no matter what, the sorting has to occur somewhere -- you can't get something for nothing, not even in M.
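For the curious, a chained hash table takes only a few lines. This is a toy Python sketch (the global-style key below is just for flavor), not a claim about how any M implementation actually stores things:

    class HashTable:
        def __init__(self, nbuckets=64):
            self.buckets = [[] for _ in range(nbuckets)]

        def _bucket(self, key):
            # Compute a small number from the string...
            h = 0
            for ch in key:
                h = (h * 31 + ord(ch)) % len(self.buckets)
            return self.buckets[h]

        def set(self, key, value):
            bucket = self._bucket(key)
            for pair in bucket:
                if pair[0] == key:       # key already present: overwrite
                    pair[1] = value
                    return
            bucket.append([key, value])  # new key (or collision): chain it

        def get(self, key):
            # Hash once, then search the (short) chain linearly.
            for k, v in self._bucket(key):
                if k == key:
                    return v
            raise KeyError(key)

    t = HashTable()
    t.set("DIZ(200,1,0)", "TOPPENBERG,KEVIN")
    print(t.get("DIZ(200,1,0)"))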
But is it just that M is sorting for us in the background? That is, the work is still being done; we just don't have to worry about it. Or is it using a B-tree that intrinsically doesn't have to be indexed? Is this where the speed advantage comes from?
I've never had the occasion (or motivation, perhaps) to dig into a MUMPS implementation, but I hope these general observations will be helpful.
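Still, to make the "sorting has to occur somewhere" point concrete, here is a toy Python illustration -- emphatically not how any actual MUMPS implementation works. If subscripts are kept ordered at insertion time, an in-order walk (think $ORDER) needs no separate sorting step; the cost was paid, a little at a time, on each SET:

    import bisect

    subscripts = []   # kept sorted at all times
    data = {}

    def set_node(sub, value):
        if sub not in data:
            bisect.insort(subscripts, sub)   # O(log n) search + O(n) shift
        data[sub] = value

    def order(sub):
        # Return the next subscript after 'sub', like $ORDER: just a step
        # through an already-sorted list, with no sort at read time.
        i = bisect.bisect_right(subscripts, sub)
        return subscripts[i] if i < len(subscripts) else ""

    set_node("SMITH", 1)
    set_node("ADAMS", 2)
    set_node("JONES", 3)
    print(order(""))        # ADAMS
    print(order("ADAMS"))   # JONES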
Any thoughts? Kevin
===
Gregory Woodhouse
[EMAIL PROTECTED]

"One must act on what has not yet happened." --Lao Tzu