Score another one for random testing! :)
On Sun, Nov 18, 2012 at 10:26 PM, Danny Yoo <d...@hashcollision.org> wrote: > > > On Sun, Nov 18, 2012 at 4:24 PM, Pierpaolo Bernardi <olopie...@gmail.com> > wrote: >> >> How does compare to builtin mutable hashes? > > > > The following code represents a rough hashtable equivalent of what my rb > code would be enabling (quick search for word by position): > > > ;; We might be curious as to the overhead of the tree structure. > ;; (of course, it's worth it because we are dealing with a dynamic set > here.) > ;; Still, let's compare with inserting into a native hash: > (printf "just for comparison: inserting in a native hash:\n") > (let ([ht (make-hash)]) > (time > (for/fold ([acc 0]) ([word (in-list (force all-words))]) > (hash-set! ht acc word) > (+ acc (string-length word))))) > > > It's also useful to compare this vs the existing splay tree approach in > syntax-color/token-tree: > > (printf "just for comparison: inserting in the original token > tree:\n") > (let ([t (new token-tree%)]) > (time > (for ([word (in-list (force all-words))]) > (insert-last-spec! t (string-length word) word)))) > > > Here's the output of the insertion benchmark: > > ;; (from the rb-tree insertion) > inserting 235886 words at the end... > cpu time: 204 real time: 205 gc time: 0 > > just for comparison: inserting in a native hash: > cpu time: 108 real time: 107 gc time: 0 > > just for comparison: inserting in the original token tree: > cpu time: 51 real time: 51 gc time: 0 > > > So compared to the rb-tree version, insertions into the hashtable are about > twice as fast. And as one might expect, the splay tree bulk insertion is > the fastest: it doesn't deal with balance at insertion time and can it delay > that work until we start searching the structure. > > The rb-tree (as well as the original splay code) allows for much more > flexible searches and dynamic updates into the sequence structure than the > hash, so it's definitely worth the extra complexity. My conjecture is that > the non-allocating nature of the rb-tree, as well as its always-balanced > structure, will be worth the cost of the extra insertion time vs splays. I > just hope I can show it! :) I'll see tomorrow when I code up a token-tree% > implementation and start measuring times in DrRacket. > > I just got done with the fundamental rb-tree data structures this evening. > Thank goodness Robby strongly recommended me to write a randomized testing > approach. It caught a lot of things I hadn't even considered. > > _________________________ > Racket Developers list: > http://lists.racket-lang.org/dev > _________________________ Racket Developers list: http://lists.racket-lang.org/dev