Re: [Chicken-users] Good way to code the equivalent to this?
On Aug 25, 2008, at 12:50 PM, Tobia Conforto wrote: Matt Welland wrote: the tactic of loading lots of data into a hierarchy of hash arrays and then extracting the needed pieces in a myriad of ways, sometimes on the fly in a meeting with management nervously looking on :-) has been tremendously useful for me. I for one am hoping that there are faster hash tables in the future of chicken. Until there are, here's how you can have the best of both worlds, especially considering: 1. the use you seem to make of hash tables (mainly setting new keys and referencing them, rarely or never deleting them) 2. what has been said of alists vs. hash tables for small data sets 3. what has been said of alist->hash-table Just use your own hash table-like structure, that starts as an alist and only becomes a proper hash table when it grows over a certain size, tuneable with a parameter. Something like this should go a long way in letting you keep your coding style and have good performance: (define smart-hash-threshold (make-parameter 20)) (define (make-smart-hash) (cons (list) (void))) ;cdr currently unused (define (smart-hash-ref table key) (if (list? (car table)) (alist-ref key (car table)) (hash-table-ref (car table) key))) (define (smart-hash-set! table key value) (if (list? (car table)) (begin (set-car! table (alist-update! key value (car table))) (if (> (length (car table)) (smart-hash-threshold)) (set-car! table (alist->hash-table (car table) (hash-table-set! (car table) key value))) Tobia The "lookup-table" egg operates in a similar fashion. ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users Best Wishes, Kon ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Matt Welland wrote: the tactic of loading lots of data into a hierarchy of hash arrays and then extracting the needed pieces in a myriad of ways, sometimes on the fly in a meeting with management nervously looking on :-) has been tremendously useful for me. I for one am hoping that there are faster hash tables in the future of chicken. Until there are, here's how you can have the best of both worlds, especially considering: 1. the use you seem to make of hash tables (mainly setting new keys and referencing them, rarely or never deleting them) 2. what has been said of alists vs. hash tables for small data sets 3. what has been said of alist->hash-table Just use your own hash table-like structure, that starts as an alist and only becomes a proper hash table when it grows over a certain size, tuneable with a parameter. Something like this should go a long way in letting you keep your coding style and have good performance: (define smart-hash-threshold (make-parameter 20)) (define (make-smart-hash) (cons (list) (void))) ;cdr currently unused (define (smart-hash-ref table key) (if (list? (car table)) (alist-ref key (car table)) (hash-table-ref (car table) key))) (define (smart-hash-set! table key value) (if (list? (car table)) (begin (set-car! table (alist-update! key value (car table))) (if (> (length (car table)) (smart-hash-threshold)) (set-car! table (alist->hash-table (car table) (hash-table-set! (car table) key value))) Tobia ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
On Mon, 25 Aug 2008, Tobia Conforto wrote: Elf wrote: (define a (alist->hash-table (let loop ((i 0)) (if (fx= 25 i) '() (cons (cons (random 50) (random 50)) (loop (fx+ 1 i) =)) for an improvement in time (surprisingly), use (define a (alist->hash-table (let loop ((i 0) (r '())) (if (fx= 25 i) r (loop (fx+ 1 i) (cons (cons (random 50) (random 50)) r =)) ...there are a lot more minor GCs this way Why is it surprising that the tail-recursive version performs better? And why do you say it has more GC? Isn't it supposed to produce less garbage? well, i say it has more cause thats what the stats said :) the reason why it would be more in large cases like this is because it has to repeatedly copy a large structure in the iterative-tail-recursion version, whereas in the stacking recursion it just has to copy the frames onto the heap every so often. -elf ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
i got almost the same time as with my previous (incorrect) method by using a single hash table and an alist instead of a hash for the inner structure. memory usage isnt so good though. :( using a sparse multiple-alist structure seems to work nicely in small space and good time. i can post the code if interested. -elf On Sun, 24 Aug 2008, Jim Ursetto wrote: Note that your solution effects a mapping of x -> y, not from x -> y -> #t. That gets the job accomplished correctly, but doesn't reflect the original Perl code. For a fair comparison, you have to change the perl code to match, and then it runs in 1/3 of the memory and 1/2 the time of the original perl code. This is what I mean concretely: print "Filling the array with 25 entries.\n"; foreach $n (0 .. 25) { $x = int(rand(50)); $y = int(rand(50)); $a{$x} = $y; # change to one-level hash } print "Reading from the array 1 times\n"; $hits=0; foreach $n (0 .. 1) { $x = int(rand(50)); $y = int(rand(50)); if (exists $a{$x} and $a{$x} == $y) { # quell warning on undef $hits++; } } print "Done (hits $hits)\n"; On Sun, Aug 24, 2008 at 5:56 AM, Elf <[EMAIL PROTECTED]> wrote: for an improvement in time (surprisingly), use (define a (alist->hash-table (let loop ((i 0) (r '())) (if (fx= 25 i) r (loop (fx+ 1 i) (cons (cons (random 50) (random 50)) r =)) ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Elf wrote: (define a (alist->hash-table (let loop ((i 0)) (if (fx= 25 i) '() (cons (cons (random 50) (random 50)) (loop (fx+ 1 i) =)) for an improvement in time (surprisingly), use (define a (alist->hash-table (let loop ((i 0) (r '())) (if (fx= 25 i) r (loop (fx+ 1 i) (cons (cons (random 50) (random 50)) r =)) ...there are a lot more minor GCs this way Why is it surprising that the tail-recursive version performs better? And why do you say it has more GC? Isn't it supposed to produce less garbage? Tobia ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Wow, thanks. That cut my run time in half to aprox 8..4 seconds. I'm using chicken 2.74 on this machine. I've tried 3.3 at a different machine but not seen much difference. Note that changing to non-hierarchical hashes breaks a lot of my existing code and is not exactly apples to apples with the perl. Still useful though and I may be able to use that approach with a little foresight next time. On Sun, Aug 24, 2008 at 9:36 PM, Alex Shinn <[EMAIL PROTECTED]> wrote: >> "Matt" == Matt Welland <[EMAIL PROTECTED]> writes: > >Matt> chicken: ~16 secs (when it didn't crash) >Matt> stk: ~17 secs >Matt> ruby: ~1.3 secs >Matt> perl: ~0.7 secs > > Which Chicken version are you timing, and with what compiler > optimizations? If you use -Ob and a suitable initial heap > size, then on my machine elf's version runs in under 700ms, > as opposed to ~500ms for the Perl version. Use the command: > > $ csc -Ob -heap-initial-size 300M file.scm > > If you really just want to get the job done regardless of > method, you can note that in this particular case a simple > vector is suitable and skip the hash tables altogether. > > This is over twice as fast as the perl version: > > > (define a (make-vector 50 '())) > > (print "filling ...") > > (do ((i 0 (add1 i))) >((>= i 25)) > (let ((x (random 50)) >(y (random 50))) >(vector-set! a x (cons y (vector-ref a x) > > (print "reading ...") > > (define hits 0) > > (do ((i 0 (add1 i))) >((>= i 1)) > (let ((x (random 50)) >(y (random 50))) >(if (memq y (vector-ref a x)) >(set! hits (add1 hits) > > (print "done.") > > > > -- > Alex > ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Note that your solution effects a mapping of x -> y, not from x -> y -> #t. That gets the job accomplished correctly, but doesn't reflect the original Perl code. For a fair comparison, you have to change the perl code to match, and then it runs in 1/3 of the memory and 1/2 the time of the original perl code. This is what I mean concretely: print "Filling the array with 25 entries.\n"; foreach $n (0 .. 25) { $x = int(rand(50)); $y = int(rand(50)); $a{$x} = $y; # change to one-level hash } print "Reading from the array 1 times\n"; $hits=0; foreach $n (0 .. 1) { $x = int(rand(50)); $y = int(rand(50)); if (exists $a{$x} and $a{$x} == $y) { # quell warning on undef $hits++; } } print "Done (hits $hits)\n"; On Sun, Aug 24, 2008 at 5:56 AM, Elf <[EMAIL PROTECTED]> wrote: > > for an improvement in time (surprisingly), use > > (define a >(alist->hash-table >(let loop ((i 0) > (r '())) >(if (fx= 25 i) >r >(loop (fx+ 1 i) > (cons (cons (random 50) (random 50)) r >=)) ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Thanks all for the various workarounds. The ones I understand I will try. The others I'll study :-) My test results were as follows (this is with 300,000 entries - closer to what my current problem is hitting): chicken: ~16 secs (when it didn't crash) stk:~17 secs ruby: ~1.3 secs perl: ~0.7 secs By the by the tactic of loading lots of data into a hierarchy of hash arrays and then extracting the needed pieces in a myriad of ways, sometimes on the fly in a meeting with management nervously looking on :-) has been tremendously useful for me. I for one am hoping that there are faster hash tables in the future of chicken. ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
As a comparison, Felix's solution performs equivalently to my alist example when compiled, but runs much faster from the REPL. It also uses 9MB instead of 13MB. It's also simpler. On Sun, Aug 24, 2008 at 4:06 AM, felix winkelmann <[EMAIL PROTECTED]> wrote: > This should perform much better and more closely > resembles what you are trying to achieve. ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
one final note: the time and memory usage for the list-tabulate, map!, and unfold solutions, on my box, are approximately 0.4s slower than perl, with slightly better memory usage stats (in chicken, that is). hope this helps. -elf ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
i've been testing various solutions, which ill be happy to post if interested. all of them involve alist->hash-table. so far, the fastest solution, on average, has been list-tabluate to generate the list. (map! was a close contender, though.) the solution with the least GCs, on average, uses unfold for generation. all of these have been run in the interpreter, not compiled. -elf ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
for an improvement in time (surprisingly), use (define a (alist->hash-table (let loop ((i 0) (r '())) (if (fx= 25 i) r (loop (fx+ 1 i) (cons (cons (random 50) (random 50)) r =)) instead. personally, id test both on whatever box youre running, as there are a lot more minor GCs this way. the alist->hash has an interesting property, though... the first value (if there are duplicate keys) seems to be the one thats used, the others are ignored. however, its a lot faster to build the list than to do the hash-table-set!, interestingly. -elf ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
my solution :) its faster (at least on chicken 3.3.something). timing on this box put this at just under 8s. -elf --- (use extras) (print "filling ...") (define a (alist->hash-table (let loop ((i 0)) (if (fx= 25 i) '() (cons (cons (random 50) (random 50)) (loop (fx+ 1 i) =)) (print "reading ...") (define hits (let loop ((i 0) (r 0)) (if (fx= i 1) r (if (fx= (random 50) (hash-table-ref/default a (random 50) -1)) (loop (fx+ 1 i) (fx+ 1 r)) (loop (fx+ 1 i) r) (print "done.") ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Hi! This should perform much better and more closely resembles what you are trying to achieve. Note that Perl's data structures are bummed to absurd dimensions. We don't want that. cheers, felix -- (use extras) (define a (make-hash-table equal?)) (print "filling ...") (do ((i 0 (add1 i))) ((>= i 25)) (hash-table-set! a (cons (random 50) (random 50)) #t)) (print "reading ...") (do ((i 0 (add1 i)) (hits 0)) ((>= i 1)) (let ((x (random 50)) (y (random 50))) (when (hash-table-ref/default a (cons x y) #f) (set! hits (add1 hits) (print "done.") ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
>>> My attempts all use gigs of memory and run 10x as long. Let's take a look at why so much memory is being used, using the following very similar code. (define a (make-hash-table =)) (let loop ((n 25)) (let ((x (random 50)) (y (random 50))) (when (> n 0) (hash-table-update! a x (lambda (h) (hash-table-set! h y #t) h) (cut make-hash-table eq?)) (loop (sub1 n) $ csi #;1> ,r Memory: heap size is 50 bytes with 361224 bytes currently in use We evaluate the code above and recheck the memory usage. Remember that half the "in use" memory is dedicated to the stopĀ© GC. #;3> ,r Memory: heap size is 1025151236 bytes with 776804714 bytes currently in use So, an entire gigabyte of virtual memory and about 265MB of memory just for the hash itself (btw, my RSS states 566MB of actual memory in use): #;3> (- 776804714 (/ 1025151236 2)) ; heap used at end 264229096 #;4> (- 361224 25); heap used at beginning 111224 #;5> (- 264229096 111224) ; total mem used 264117872 Now we calculate the overhead actually used by the hash itself. It relies on knowledge of Chicken internals. #;6> (vector-length (##sys#slot a 1)) ; number of buckets 635413 #;7> (hash-table-size a) ; number of associations 196597 #;8> (* 196597 1280) ; overhead in bytes for empty hash tables (307 buckets) 251644160 #;10> (* 25 12); size in bytes of y elements (12 bytes/pair) 300 #;11> (* 25 12); size in bytes of x elements (12 bytes/pair) 300 #;14> (length (filter null? (vector->list (##sys#slot a 1; # of empty buckets in a 438816 #;15> (- 635413 196597); equal to #14, so each bucket contains at most 1 value 438816 #;16> (* 438816 8) ; overhead in bytes for empty buckets in a 3510528 #;20> (* 196597 8) ; overhead in bytes for full buckets in a 1572776 #;21> (+ 251644160 3510528 300 300 1572776) ; calculated memory usage 262727464 Our final calculation is 263MB for the hash, which is close enough for government work. Of this, 250MB is spent on the second-level hashes, which take up a full 1280 bytes when empty because each has a minimum of 307 buckets. Another 5MB is spent on bucket overhead in the first-level array, and 6MB is spent on actual (key . value) pairs. I found you can save a few megs by compacting the first-level array using (define a (make-hash-table = min-load: 0.99 max-load: 0.99). This is a complete hack, but the number of buckets dropped from 635,000 to 317,701 (i.e. the next lowest possible value). However, nothing else I tried worked. You can't cap the number of buckets, you can't adjust the permissible load to 1.00 or higher, you can't reduce the minimum size of a hash table, and trying a different integer hashing function gave no benefit. But I am not an expert, so may be missing something. In short, Chicken hash tables are not suitable for your application, or, in my opinion, anything serious right now. I believe this is being worked on. Also keep in mind that perl hashes are highly, highly optimized as they are Perl's swiss-army data structure. Meantime, write your own data structure that takes into consideration the normal use of your sparse arrays. For example, I made a dumb change that implements the second level as a plain alist, since collisions at the first level are rare in this particular case. Compiled, it runs at 1.2 seconds on my machine vs 0.8 seconds for the perl script. The data structure is 13MB because of the savings at the second level. Although I couldn't solve your problem, here is the modified example. (print "Filling the associative array with 25 entries.") (define a (make-hash-table =)) (let loop ((n 25)) (let ((x (random 50)) (y (random 50))) (when (> n 0) (hash-table-update!/default a x (lambda (L) (cons (cons y #t) L)) '()) (loop (sub1 n) (print "Reading from the associative array 1 times\n") (printf "Done (hits ~a)" (let loop ((n 1) (hits 0)) (let ((x (random 50)) (y (random 50))) (if (= n 0) hits (if (and-let* ((h (hash-table-ref/default a x #f))) (alist-ref y h)) (loop (sub1 n) (add1 hits)) (loop (sub1 n) hits)) ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Sorry for the overly brief problem statement. This is the two deep sparse array code I threw together. BTW, in the test code I am using numbers as the keys out of lazyness. The real code may use strings and or numbers. I'm using chicken 2.740 and 3.3.0 if it matters. I guess my question really has two parts. 1) How do chicken hash arrays compare in speed and memory usage with hash arrays in perl? 2) Is there some more scheme oriented way to store data like this for quick random lookup? (define vv (make-hash-table)) (define (sparse-array-set! a x y val) (let ((sh (hash-table-ref/default a x #f))) (if sh (hash-table-set! sh y val) (let ((h (make-hash-table))) (hash-table-set! a x h) (hash-table-set! h y val) (define (sparse-array-ref a x y) (let ((h (hash-table-ref/default a x #f))) (if h (hash-table-ref/default h y #f) #f))) (define maxval 50) (let loop ((x (random maxval)) (y (random maxval)) (n 0)) (sparse-array-set! vv x y #t) (if (< n 25) (loop (random maxval)(random maxval)(+ n 1 On Sat, Aug 23, 2008 at 2:35 PM, John Cowan <[EMAIL PROTECTED]> wrote: > Matt Welland scripsit: > >> My attempts all use gigs of memory and run 10x as long. > > So show us the problematic code, then. > > > -- > When I'm stuck in something boring John Cowan > where reading would be impossible or(who loves Asimov too) > rude, I often set up math problems for [EMAIL PROTECTED] > myself and solve them as a way to pass http://www.ccil.org/~cowan > the time. --John Jenkins > ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] Good way to code the equivalent to this?
Matt Welland scripsit: > My attempts all use gigs of memory and run 10x as long. So show us the problematic code, then. -- When I'm stuck in something boring John Cowan where reading would be impossible or(who loves Asimov too) rude, I often set up math problems for [EMAIL PROTECTED] myself and solve them as a way to pass http://www.ccil.org/~cowan the time. --John Jenkins ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users
[Chicken-users] Good way to code the equivalent to this?
My attempts all use gigs of memory and run 10x as long. #!/usr/bin/perl -w print "Filling the array with 25 entries.\n"; foreach $n (0 .. 25) { $x = int(rand(50)); $y = int(rand(50)); $a{$x}{$y}=1; } print "Reading from the arrary 1 times\n"; $hits=0; foreach $n (0 .. 1) { $x = int(rand(50)); $y = int(rand(50)); if (defined $a{$x}{$y}) { $hits++; } } print "Done\n"; -- Visit www.kiatoa.com, a user-run, self governing web site. ___ Chicken-users mailing list Chicken-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/chicken-users