On Fri, Mar 24, 2017 at 04:59:00PM +0000, dlangPupil via Digitalmars-d wrote:
[...]
> In addition to prevalence of random lookups, certain additional app
> conditions and designs might make "gigantic AAs" more useful or
> competitive with alternatives:
>
> 1. Use by programmers who need a data structure that's easier to
> reason about or faster and safer to implement or prototype, and easier
> for future users to read and maintain.
This point is arguable, since a B-tree is pretty well-known and thus not difficult to reason about. Generally, whether you use a hashtable or a B-tree or something else, the high-level view is that it's some opaque data structure that, given some key K, returns some value V. Which underlying algorithm is used is an implementation detail. And as far as application programmers are concerned, you generally wouldn't want to implement an advanced data structure from scratch; you'd use an off-the-shelf library that has already implemented it.

> 2. Apps with large tables that use (or could use) non-natural keys,
> e.g., crypto-quality GUIDs, which can't benefit from (or impose the
> extra computational cost of) the sorting that must be performed and
> maintained on natural key values (e.g., LastName="Smith") in order to
> use a B-Tree search.

Encrypted GUIDs are a very good point. You probably can't expect much locality from them.

But this point, and the following one, presume a rather specific use case, i.e., document retrieval, presumably over the network. The application of hashtables and B-trees, however, is far wider in scope. In particular, what I had in mind when I asked about cache speeds is compute-intensive (as opposed to I/O-intensive) processes. These two domains of application are quite different in some ways.

In document retrieval, for example, if you view it as a server serving documents over the network to a set of clients, having 1 I/O roundtrip per lookup is probably acceptable, because the time it takes to send the document over the network to the client is large enough that making lookups any faster probably won't make much of a difference. Besides, if client requests are essentially unpredictable, then 1 I/O roundtrip is about the best you can do.

In CPU-intensive computations, however, the cache hierarchy becomes very prominent, because you're talking about CPU speeds vs. RAM speeds, which is currently a rather large gap.
Being cache-friendly can mean a difference of orders of magnitude in overall performance. If you're talking about data lookups inside a compute-intensive inner loop, having 1 I/O roundtrip per iteration is unacceptable. Even 1 L1 cache miss per iteration may be unacceptable, depending on the application. Ideally, you want to maximize the number of iterations per L1 cache miss, and then, on the next level, serve as many L1 misses from L2 as possible before you have to go to L3 or fall back to RAM, because RAM is slow, relatively speaking, even if it's still very fast compared to an I/O roundtrip to the hard disk. In other words, you want as much locality as you can possibly get in your memory accesses -- the entire cache hierarchy, after all, is built on the premise that memory accesses exhibit locality.

Furthermore, in CPU-intensive applications you quite often already know the pattern of lookups beforehand, since it usually comes from within the application itself, not from external clients that may be completely unpredictable. So you can usually predict the next n memory accesses much more easily, and take advantage of that by organizing your data structure around it.

[...]
> Finally, re: caches: I haven't found whether it is or isn't possible
> to combine a server's DRAM and its Optane SSD RAM or DIMMs to form a
> single pool of RAM. Mixing DIMMs of different speeds is a no-no; but
> if this could be done, then the hottest data could be "cached" in DRAM
> and thus spared the 10x latency penalty of the SSD device.

If the SSD device incurs a 10x latency penalty, that's still a far cry from L1/L2 cache latencies! True, it would help in the worst case, when you have no choice but to take an I/O roundtrip (a hard drive would be far slower)... but this would hardly be anything revolutionary if DRAM is still faster.
At the end of the day, you're still talking about a memory hierarchy with L1/L2 at the top, L3 and RAM in between, and hard drive / SSD at the bottom. If so, then the cache-unfriendliness of hashtables still applies, and we still have to find ways of improving hashtable locality, e.g., with locality-sensitive hashing, cache-oblivious hashtables, etc., or use a more cache-friendly data structure like B-trees or one of the newer cache-oblivious trees.

(In my case, though, B-trees may not represent much of an improvement, because I'm dealing with high-dimensional data that cannot be easily linearized to take maximum advantage of B-tree locality. So at some level I still need some kind of hash-like structure to work with my data. But it will probably have some tree-like structure to it, because of the (high-dimensional) locality it exhibits.)


T

-- 
Without geometry, life would be pointless. -- VS