On Friday, 24 March 2017 at 06:30:25 UTC, H. S. Teoh wrote:

You have to keep in mind that one downside of hashtables is that they tend to be unfriendly towards caching hierarchies
...

For certain applications, where key lookups are more-or-less random, this is the best you could do, but for a good number of applications, there tends to be some correlation between lookups, and even things like B-trees could potentially do better because of their cache-friendliness (and locality), even if you assume I/O is much faster than your traditional spindle-based hard drive. The O(log n) could have a much smaller constant than the hashtable O(1) if lookups tend to be correlated (i.e., exhibit locality), and lots more cache misses are incurred in the latter.

Having said that, though, large-capacity low-latency SSDs are very interesting to me because I'm interested in certain applications that involve very large lookup tables. Currently I'm just using traditional hashtables, but once you get to a certain size, I/O overhead dominates and progress grinds to a halt. I've been researching ways of taking advantage of locality to ease this, but so far haven't gotten it to actually work yet.


T

Thanks, T, for the great insight. Very helpful! Nice to see that you, Laeeth, the ACM, Intel, and Micron all agree: this new technology could be very disruptive.

Beyond the prevalence of random lookups, certain app conditions and designs might make "gigantic AAs" more useful and more competitive with the alternatives:

1. Use by programmers who need a data structure that's easier to reason about, faster and safer to implement or prototype, and easier for future maintainers to read and modify.

2. Apps with large tables that use (or could use) non-natural keys, e.g., crypto-quality GUIDs. Such keys can't benefit from (and so needn't pay the ongoing computational cost of) the ordering that must be built and maintained over natural key values (e.g., LastName="Smith") for a B-tree search.

3. Tables (and related tables) whose row data (and even aggregate data) could be "clumped" (think "documentized") to achieve a singularity (and not merely some degree of locality), consolidating multiple lookups into a single lookup; see the sketch after this list.

-In other words, a key's value is itself an array of data that can be fetched in a single lookup.

-The key id for each "next" (updated) version of a document could also be stored with the current version.

-The next key-value entry to hold the update could be created immediately but not have its values written unless and until the prior version of the document has changed.

-Better yet, in an append-only design, creation of the next key-value entry could be deferred until a revision actually occurs.

-Such "chained documentization" would automate histories, and could be further abstracted and adapted (e.g., with columnstore concepts) to accommodate apps in which lookups aren't mostly random.

Finally, re: caches: I haven't been able to find out whether it's possible to combine a server's DRAM with its Optane SSDs or DIMMs to form a single pool of RAM. Mixing DIMMs of different speeds is normally a no-no; but if it could be done, the hottest data could be "cached" in DRAM and so spared the roughly 10x latency penalty of the SSD device.
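
Even if the hardware can't be pooled, a small DRAM-resident AA could still sit in front of the gigantic table in software. A rough sketch follows; hotCache, bigTable, and the crude dump-everything eviction are all hypothetical stand-ins (the real big table would live on the Optane device, and a real cache would use LRU/CLOCK or similar):

import std.stdio;

enum hotCapacity = 4;              // tiny for illustration; would be sized to DRAM
string[string] hotCache;           // DRAM-resident tier for the hottest keys
string[string] bigTable;           // stand-in for the SSD-resident gigantic AA

string lookup(string key)
{
    if (auto p = key in hotCache)  // fast path: DRAM hit
        return *p;

    auto value = bigTable[key];    // slow path: device lookup

    // Crude eviction: dump the whole hot tier when it fills up.
    if (hotCache.length >= hotCapacity)
        hotCache.clear();
    hotCache[key] = value;
    return value;
}

void main()
{
    bigTable["guid-0001"] = "document payload";
    writeln(lookup("guid-0001"));  // misses DRAM, fills the hot tier
    writeln(lookup("guid-0001"));  // now served from DRAM
}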
