On Dec 4, 2005, at 6:51 PM, Kevin Toppenberg wrote:
Our billing/scheduling software runs on Oracle. Today I had the "pleasure" of working on my schedule on it. I thought I was going to scream because it was so SLOW. And this is running on a state-of-the-art (2005) Windows server.
I'm sure we've all had good and bad experiences with database performance. I certainly have. My sense is that it is really difficult to come up with reliable generalizations, but one holds up well: the more flexibility you ask of a database, the more work it has to do to provide it. If you want to run any query you can imagine, that's going to cost more than a simple lookup on an indexed key. One way of thinking about it is that the Search option is essentially equivalent to SELECT ... WHERE in terms of expressivity, and we all know how relatively slow and resource-intensive that can be. Fortunately, you rarely need that much flexibility, and ordinary (Fileman) queries generally suffice.
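To make that concrete, here's a toy sketch (Python, with entirely made-up appointment data -- nothing to do with Oracle's or Fileman's actual internals):

    # Hypothetical rows, just for illustration.
    appointments = [
        {"id": i, "clinic": "TMG" if i % 2 else "TMGLAUGH", "slot": i}
        for i in range(8500)
    ]

    # SELECT ... WHERE style: every row must be examined -- O(n) per query.
    matches = [row for row in appointments if row["clinic"] == "TMG"]

    # An index trades memory and insert-time work for fast lookups.
    by_id = {row["id"]: row for row in appointments}   # built once, O(n)
    row = by_id[4217]                                  # each lookup ~O(1)

The index costs something to build and maintain, but once it exists, each keyed lookup avoids touching every row.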
For example, I have approximately 8500 "slots" into which appointments can be put for 2006. There was a mixup, and the location for these slots was set incorrectly. So I told the database to reassign the location from TMG to TMGLAUGH. It took approximately 30 minutes to do this! That works out to about 5 transactions/sec. And this was a weekend, so no one else was using the server. It was painfully slow, because there were several other things I had to do, and waiting through these long delays each cycle was quite a pain.
I like to think of relational databases as being like diesel trains. They're fast and powerful, but they take time to get up to speed.
A few weeks ago I was working on creating these slots. I would tell it that I wanted 15-minute appointment slots from 1:00 to 4:30, 1/1/2006-12/31/2006, start a timer, and then do something else. It would take 30-45 minutes to do this! Then I would tell it that I wanted a 30-minute physical slot at two given times during the day and set it churning. That would take another 30 minutes or so. I probably had 5-6 such cycles to wade through. I couldn't help but wonder whether an M database wouldn't have blown through this.
I wouldn't be surprised.
Now, at this point, it would be easy to fall into a relational-database bashing party. But I would like comments on WHY (from a technology point of view) it is intrinsically slower. Bhaskar made a comment once that he suspected relational databases might be using B-trees in the background to increase speed, while still presenting a flat table to the user. I wonder if that is true, or if it would even be possible.
Oh, I'm sure they do. Are you familiar with self-balancing binary trees? They are just ordinary binary (or nearly binary) trees, but INSERT operations that would cause a branch to grow too far are followed by ROTATE operations that put the tree back into balance. This is important because it ensures that lookup operations run in logarithmic time, but at the cost of insertions that are slower on average. If you pick up a standard book on algorithms (I recommend Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms"), you'll find a variety of algorithms for implementing balanced trees (2-3 trees, red-black trees, 2-3-4 trees, and AVL trees, for instance). The trouble is that these trees are all close to binary, so rotations are very common, making them impractical for use in a DBMS. A B-tree simply takes this idea and extends it to trees where a node may have many children (and I wonder if the motivation for the name might not have been that they are "bushy"). Since B-trees have a small depth relative to their breadth, there is a fixed slowdown due to linear searches within each node, but it is bounded and predictable. On the other hand, the structures are much more stable, and the path you need to follow down the tree to find a node is much shorter, making them well suited for use in a DBMS.
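If you'd like to see the insert-then-rotate idea in miniature, here is a bare-bones AVL insert -- a Python sketch of the general technique, not anything resembling real DBMS code:

    # Each node carries its subtree height, which drives rebalancing.
    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.height = 1

    def height(n):
        return n.height if n else 0

    def update(n):
        n.height = 1 + max(height(n.left), height(n.right))

    def balance(n):
        return height(n.left) - height(n.right)

    def rotate_right(y):            # the left branch grew too tall
        x = y.left
        y.left, x.right = x.right, y
        update(y); update(x)
        return x

    def rotate_left(x):             # the right branch grew too tall
        y = x.right
        x.right, y.left = y.left, x
        update(x); update(y)
        return y

    def insert(root, key):
        # Ordinary binary-search-tree insertion...
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        else:
            return root             # ignore duplicate keys
        update(root)
        # ...followed by a rotation whenever a branch grew too far.
        b = balance(root)
        if b > 1 and key < root.left.key:       # left-left case
            return rotate_right(root)
        if b > 1:                               # left-right case
            root.left = rotate_left(root.left)
            return rotate_right(root)
        if b < -1 and key > root.right.key:     # right-right case
            return rotate_left(root)
        if b < -1:                              # right-left case
            root.right = rotate_right(root.right)
            return rotate_left(root)
        return root

    root = None
    for k in ["SMITH", "ADAMS", "JONES", "BAKER"]:
        root = insert(root, k)

A B-tree applies the same discipline, but since each node holds many keys, rebalancing (splitting or merging nodes) happens far less often.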
It seems that the computationally intensive task for a relational database is the indexing.
Right.
Of course this must involve sorting. And M users have learned that "M means never having to sort."
Don't take that too seriously! There are many ways a global (or local, for that matter) array could be implemented, but the modest size limit on subscripts (as opposed to nodes) makes hashing an attractive alternative. The idea behind hashing is to compute a number from a string in such a way that different strings are unlikely to hash to the same value, but the values computed are reasonably small. Then you can use the hash values to index into a table where the values are actually stored. Of course, there can be "collisions" where different strings hash to the same value; in that case, you'll probably have a list of nodes that needs to be searched linearly. Balanced trees like AVL trees are also an option, but updating hash tables is fast, lookups are nearly as fast, and the memory requirements are predictable, making them attractive. But no matter what, the sorting has to occur somewhere -- you can't get something for nothing, not even in M.
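For the curious, a chained hash table takes only a few lines. This is a toy Python sketch (the global-style key below is just for flavor), not a claim about how any M implementation actually stores things:

    class HashTable:
        def __init__(self, nbuckets=64):
            self.buckets = [[] for _ in range(nbuckets)]

        def _bucket(self, key):
            # Compute a small number from the string...
            h = 0
            for ch in key:
                h = (h * 31 + ord(ch)) % len(self.buckets)
            return self.buckets[h]

        def set(self, key, value):
            bucket = self._bucket(key)
            for pair in bucket:
                if pair[0] == key:       # key already present: overwrite
                    pair[1] = value
                    return
            bucket.append([key, value])  # new key (or collision): chain it

        def get(self, key):
            # Hash once, then search the (short) chain linearly.
            for k, v in self._bucket(key):
                if k == key:
                    return v
            raise KeyError(key)

    t = HashTable()
    t.set("DIZ(200,1,0)", "TOPPENBERG,KEVIN")
    print(t.get("DIZ(200,1,0)"))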
But is it just that M is sorting for us in the background? That is, the work is still being done; we just don't have to worry about it. Or is it using a B-tree that intrinsically doesn't have to be indexed? Is this where the speed advantage comes from?
I've never had the occasion (or motivation, perhaps) to dig into a MUMPS implementation, but I hope these general observations will be helpful.
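Still, to make the "sorting has to occur somewhere" point concrete, here is a toy Python illustration -- emphatically not how any actual MUMPS implementation works. If subscripts are kept ordered at insertion time, an in-order walk (think $ORDER) needs no separate sorting step; the cost was paid, a little at a time, on each SET:

    import bisect

    subscripts = []   # kept sorted at all times
    data = {}

    def set_node(sub, value):
        if sub not in data:
            bisect.insort(subscripts, sub)   # O(log n) search + O(n) shift
        data[sub] = value

    def order(sub):
        # Return the next subscript after 'sub', like $ORDER: just a step
        # through an already-sorted list, with no sort at read time.
        i = bisect.bisect_right(subscripts, sub)
        return subscripts[i] if i < len(subscripts) else ""

    set_node("SMITH", 1)
    set_node("ADAMS", 2)
    set_node("JONES", 3)
    print(order(""))        # ADAMS
    print(order("ADAMS"))   # JONES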
Any thoughts? Kevin
===
Gregory Woodhouse
[EMAIL PROTECTED]

"One must act on what has not yet happened." --Lao Tzu