On 4/17/08, J. Andrew Rogers <[EMAIL PROTECTED]> wrote:
>
> No, you are not correct about this. All good database engines use a
> combination of clever adaptive cache replacement algorithms (read: keeps
> stuff you are most likely to access next in RAM) and cost-based
> optimization (read: optimizes performance by adaptively selecting query
> execution algorithms based on measured resource access costs) to optimize
> performance across a broad range of use cases. For highly regular access
> patterns (read: similar query types and complexity), the engine will
> converge on very efficient access patterns and resource management that
> match this usage. For irregular access patterns, it will attempt to
> dynamically select the best options given recent access history and
> resource cost statistics -- not always the best result (on occasion hand
> optimization could do better), but on average more likely to produce good
> results than simpler rule-based optimization.
>
> Note that by "good database engine" I am talking about engines that
> actually support these kinds of tightly integrated and adaptive
> management features: Oracle, DB2, PostgreSQL, et al. This does *not*
> include MySQL, which is a naive and relatively non-adaptive engine, and
> which scales much worse and is generally slower than PostgreSQL anyway if
> you are looking for a free open source solution.
>
> I would also point out that different engines are optimized for different
> use cases. For example, while Oracle and PostgreSQL share the same
> transaction model, Oracle's design decisions optimize for massive numbers
> of small concurrent update transactions, and PostgreSQL's design
> decisions optimize for massive numbers of small concurrent insert/delete
> transactions. Databases based on other transaction models, such as IBM's
> DB2, sacrifice extreme write concurrency for superior read-only
> performance.
> There are unavoidable tradeoffs with such things, so the market has a
> diverse ecology of engines that have each chosen a different set of
> tradeoffs, and buyers should be aware of what those tradeoffs are if
> scalable performance is a criterion.
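To make the cost-based optimization point concrete, here is a toy sketch of how a planner might choose between a sequential scan and an index scan from table statistics. The constants and cost formulas are illustrative assumptions, not taken from any real engine.

```python
# Toy cost-based plan selection: estimate the cost of each access path from
# table statistics and pick the cheapest. All constants are made up for
# illustration; real engines use measured, per-installation costs.

SEQ_PAGE_COST = 1.0      # cost of reading one page sequentially
RANDOM_PAGE_COST = 4.0   # cost of one random page read (index probe)
CPU_PER_ROW = 0.01       # cost of examining one row

def seq_scan_cost(pages, rows):
    """Full table scan: read every page, examine every row."""
    return pages * SEQ_PAGE_COST + rows * CPU_PER_ROW

def index_scan_cost(matching_rows):
    """Index scan: roughly one random page fetch per matching row."""
    return matching_rows * (RANDOM_PAGE_COST + CPU_PER_ROW)

def choose_plan(pages, rows, selectivity):
    """Return the cheaper plan for a predicate with the given selectivity."""
    matching = rows * selectivity
    seq = seq_scan_cost(pages, rows)
    idx = index_scan_cost(matching)
    return ("index scan", idx) if idx < seq else ("seq scan", seq)

# A highly selective predicate favors the index; a broad one favors the scan.
plan_selective = choose_plan(pages=1000, rows=100_000, selectivity=0.001)
plan_broad = choose_plan(pages=1000, rows=100_000, selectivity=0.5)
```

The point of the sketch is the adaptivity the quoted text describes: the same query flips between plans purely because the measured statistics (here, selectivity) change.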
Thanks for the info -- I studied database systems almost a decade ago, so I
can hardly remember the details =)

ARC (Adaptive Replacement Cache) seems to be one of the most popular
methods, and it works by tracking both "frequently used" and "recently
used" pages. Unfortunately, for AGI / inference purposes, those may not be
the right optimization objectives. Inference requires access to a large
number of *different* nodes, but the same node may not be needed many
times, so recency and frequency are poor predictors of the next access.

Perhaps what we need is to *bundle* up nodes that are associated with each
other, so we can read a whole block of related nodes with one disk access.
This requires a very special type of storage organization -- it seems that
existing DBMSs don't have it =(

YKY

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244&id_secret=101455710-f059c4
Powered by Listbox: http://www.listbox.com
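The "bundle up associated nodes" idea above can be sketched as follows. This is a hypothetical illustration, not an existing DBMS feature: nodes that link to each other are greedily packed into the same fixed-size block via a BFS walk, so fetching one block pulls in a node together with its neighborhood. All names (`cluster_nodes`, `BLOCK_SIZE`) are invented for the sketch.

```python
# Hypothetical node-bundling layout: pack linked nodes into the same block
# so one block read brings in a whole neighborhood. The clustering here is
# a naive greedy BFS packing, purely to illustrate the storage idea.

from collections import deque

BLOCK_SIZE = 4  # nodes per block, standing in for one disk page

def cluster_nodes(graph):
    """Greedy BFS packing: walk the graph, filling blocks with connected nodes.

    graph: dict mapping node -> list of neighbor nodes (adjacency list).
    Returns a list of blocks, each a list of up to BLOCK_SIZE nodes.
    """
    blocks, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        block, queue = [], deque([start])
        while queue and len(block) < BLOCK_SIZE:
            node = queue.popleft()
            if node in seen:
                continue  # may have been enqueued twice
            seen.add(node)
            block.append(node)
            queue.extend(n for n in graph[node] if n not in seen)
        blocks.append(block)
    return blocks

# Two clusters of mutually associated nodes.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y"], "y": ["x", "z"], "z": ["y"],
}
blocks = cluster_nodes(graph)
# Associated nodes land in the same block, so loading the block that holds
# "a" also brings "b" and "c" into memory with the same disk access.
```

Under this layout, the cache unit becomes the cluster rather than the individual node, which is exactly the access pattern the reply argues recency/frequency replacement does not serve.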