Re: [Chicken-users] Good way to code the equivalent to this?

2008-09-24 Thread Kon Lovett


On Aug 25, 2008, at 12:50 PM, Tobia Conforto wrote:


Matt Welland wrote:
the tactic of loading lots of data into a hierarchy of hash arrays  
and then extracting the needed pieces in a myriad of ways,  
sometimes on the fly in a meeting with management nervously looking  
on :-) has been tremendously useful for me. I for one am hoping  
that there are faster hash tables in the future of chicken.


Until there are, here's how you can have the best of both worlds,  
especially considering:
1. the use you seem to make of hash tables (mainly setting new keys  
and referencing them, rarely or never deleting them)

2. what has been said of alists vs. hash tables for small data sets
3. what has been said of alist->hash-table

Just use your own hash table-like structure, that starts as an alist  
and only becomes a proper hash table when it grows over a certain  
size, tuneable with a parameter.


Something like this should go a long way in letting you keep your  
coding style and have good performance:



(define smart-hash-threshold (make-parameter 20))

(define (make-smart-hash)
 (cons (list)
   (void))) ;cdr currently unused

(define (smart-hash-ref table key)
 (if (list? (car table))
 (alist-ref key (car table))
 (hash-table-ref (car table) key)))

(define (smart-hash-set! table key value)
 (if (list? (car table))
 (begin
   (set-car! table (alist-update! key value (car table)))
   (if (> (length (car table)) (smart-hash-threshold))
   (set-car! table (alist->hash-table (car table)
 (hash-table-set! (car table) key value)))


Tobia


The "lookup-table" egg operates in a similar fashion.





___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Best Wishes,
Kon




___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-25 Thread Tobia Conforto

Matt Welland wrote:
the tactic of loading lots of data into a hierarchy of hash arrays  
and then extracting the needed pieces in a myriad of ways, sometimes  
on the fly in a meeting with management nervously looking on :-) has  
been tremendously useful for me. I for one am hoping that there are  
faster hash tables in the future of chicken.


Until there are, here's how you can have the best of both worlds,  
especially considering:
1. the use you seem to make of hash tables (mainly setting new keys  
and referencing them, rarely or never deleting them)

2. what has been said of alists vs. hash tables for small data sets
3. what has been said of alist->hash-table

Just use your own hash table-like structure, that starts as an alist  
and only becomes a proper hash table when it grows over a certain  
size, tuneable with a parameter.


Something like this should go a long way in letting you keep your  
coding style and have good performance:



(define smart-hash-threshold (make-parameter 20))

(define (make-smart-hash)
  (cons (list)
(void))) ;cdr currently unused

(define (smart-hash-ref table key)
  (if (list? (car table))
  (alist-ref key (car table))
  (hash-table-ref (car table) key)))

(define (smart-hash-set! table key value)
  (if (list? (car table))
  (begin
(set-car! table (alist-update! key value (car table)))
(if (> (length (car table)) (smart-hash-threshold))
(set-car! table (alist->hash-table (car table)
  (hash-table-set! (car table) key value)))


Tobia


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-25 Thread Elf

On Mon, 25 Aug 2008, Tobia Conforto wrote:


Elf wrote:

(define a
  (alist->hash-table
(let loop ((i 0))
  (if (fx= 25 i)
  '()
  (cons (cons (random 50)
  (random 50))
(loop (fx+ 1 i)
=))



for an improvement in time (surprisingly), use

(define a
  (alist->hash-table
(let loop ((i 0)
   (r '()))
  (if (fx= 25 i)
  r
  (loop (fx+ 1 i)
(cons (cons (random 50)
(random 50))
  r
=))

...there are a lot more minor GCs this way


Why is it surprising that the tail-recursive version performs better?  And 
why do you say it has more GC?  Isn't it supposed to produce less garbage?




well, i say it has more cause thats what the stats said :)  the reason why 
it would be more in large cases like this is because it has to repeatedly

copy a large structure in the iterative-tail-recursion version, whereas
in the stacking recursion it just has to copy the frames onto the heap 
every so often.


-elf



___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-25 Thread Elf


i got almost the same time as with my previous (incorrect) method by using a 
single hash table and an alist instead of a hash for the inner structure. 
memory usage isnt so good though. :(


using a sparse multiple-alist structure seems to work nicely in small space
and good time.  i can post the code if interested.

-elf

On Sun, 24 Aug 2008, Jim Ursetto wrote:


Note that your solution effects a mapping of x -> y, not from x -> y
-> #t.  That gets the job accomplished correctly, but doesn't reflect
the original Perl code.  For a fair comparison, you have to change the
perl code to match, and then it runs in 1/3 of the memory and 1/2 the
time of the original perl code.

This is what I mean concretely:

print "Filling the array with 25 entries.\n";
foreach $n (0 .. 25) {
$x = int(rand(50));
$y = int(rand(50));
$a{$x} = $y; # change to one-level hash
}

print "Reading from the array 1 times\n";
$hits=0;
foreach $n (0 .. 1) {
$x = int(rand(50));
$y = int(rand(50));
if (exists $a{$x} and $a{$x} == $y) {  # quell warning on undef
  $hits++;
}
}

print "Done (hits $hits)\n";

On Sun, Aug 24, 2008 at 5:56 AM, Elf <[EMAIL PROTECTED]> wrote:


for an improvement in time (surprisingly), use

(define a
   (alist->hash-table
   (let loop ((i   0)
  (r   '()))
   (if (fx= 25 i)
   r
   (loop (fx+ 1 i)
 (cons (cons (random 50) (random 50)) r
   =))





___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-25 Thread Tobia Conforto

Elf wrote:

(define a
  (alist->hash-table
(let loop ((i 0))
  (if (fx= 25 i)
  '()
  (cons (cons (random 50)
  (random 50))
(loop (fx+ 1 i)
=))



for an improvement in time (surprisingly), use

(define a
  (alist->hash-table
(let loop ((i 0)
   (r '()))
  (if (fx= 25 i)
  r
  (loop (fx+ 1 i)
(cons (cons (random 50)
(random 50))
  r
=))

...there are a lot more minor GCs this way


Why is it surprising that the tail-recursive version performs better?   
And why do you say it has more GC?  Isn't it supposed to produce less  
garbage?



Tobia


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Matt Welland
Wow, thanks. That cut my run time in half to aprox 8..4 seconds. I'm
using chicken 2.74 on this machine. I've tried 3.3 at a different
machine but not seen much difference.

Note that changing to non-hierarchical hashes breaks a lot of my
existing code and is not exactly apples to apples with the perl. Still
useful though and I may be able to use that approach with a little
foresight next time.

On Sun, Aug 24, 2008 at 9:36 PM, Alex Shinn <[EMAIL PROTECTED]> wrote:
>> "Matt" == Matt Welland <[EMAIL PROTECTED]> writes:
>
>Matt> chicken: ~16 secs (when it didn't crash)
>Matt> stk: ~17 secs
>Matt> ruby: ~1.3 secs
>Matt> perl: ~0.7 secs
>
> Which Chicken version are you timing, and with what compiler
> optimizations?  If you use -Ob and a suitable initial heap
> size, then on my machine elf's version runs in under 700ms,
> as opposed to ~500ms for the Perl version.  Use the command:
>
>  $ csc -Ob -heap-initial-size 300M file.scm
>
> If you really just want to get the job done regardless of
> method, you can note that in this particular case a simple
> vector is suitable and skip the hash tables altogether.
>
> This is over twice as fast as the perl version:
>
> 
> (define a (make-vector 50 '()))
>
> (print "filling ...")
>
> (do ((i 0 (add1 i)))
>((>= i 25))
>  (let ((x (random 50))
>(y (random 50)))
>(vector-set! a x (cons y (vector-ref a x)
>
> (print "reading ...")
>
> (define hits 0)
>
> (do ((i 0 (add1 i)))
>((>= i 1))
>  (let ((x (random 50))
>(y (random 50)))
>(if (memq y (vector-ref a x))
>(set! hits (add1 hits)
>
> (print "done.")
> 
>
>
> --
> Alex
>


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Jim Ursetto
Note that your solution effects a mapping of x -> y, not from x -> y
-> #t.  That gets the job accomplished correctly, but doesn't reflect
the original Perl code.  For a fair comparison, you have to change the
perl code to match, and then it runs in 1/3 of the memory and 1/2 the
time of the original perl code.

This is what I mean concretely:

print "Filling the array with 25 entries.\n";
foreach $n (0 .. 25) {
 $x = int(rand(50));
 $y = int(rand(50));
 $a{$x} = $y; # change to one-level hash
}

print "Reading from the array 1 times\n";
$hits=0;
foreach $n (0 .. 1) {
 $x = int(rand(50));
 $y = int(rand(50));
 if (exists $a{$x} and $a{$x} == $y) {  # quell warning on undef
   $hits++;
 }
}

print "Done (hits $hits)\n";

On Sun, Aug 24, 2008 at 5:56 AM, Elf <[EMAIL PROTECTED]> wrote:
>
> for an improvement in time (surprisingly), use
>
> (define a
>(alist->hash-table
>(let loop ((i   0)
>   (r   '()))
>(if (fx= 25 i)
>r
>(loop (fx+ 1 i)
>  (cons (cons (random 50) (random 50)) r
>=))


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Matt Welland
Thanks all for the various workarounds. The ones I understand I will
try. The others I'll study :-)

My test results were as follows (this is  with 300,000 entries -
closer to what my current problem is hitting):

chicken: ~16 secs (when it didn't crash)
stk:~17 secs
ruby:  ~1.3 secs
perl:   ~0.7 secs

By the by the tactic of loading lots of data into a hierarchy of hash
arrays and then extracting the needed pieces in a myriad of ways,
sometimes on the fly in a meeting with management nervously looking on
:-) has been tremendously useful for me. I for one am hoping that
there are faster hash tables in the future of chicken.


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Jim Ursetto
As a comparison, Felix's solution performs equivalently to my alist
example when compiled, but runs much faster from the REPL.  It also
uses 9MB instead of 13MB.  It's also simpler.

On Sun, Aug 24, 2008 at 4:06 AM, felix winkelmann <[EMAIL PROTECTED]> wrote:
> This should perform much better and more closely
> resembles what you are trying to achieve.


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Elf


one final note:

the time and memory usage for the list-tabulate, map!, and unfold solutions, 
on my box, are approximately 0.4s slower than perl, with slightly better 
memory usage stats (in chicken, that is).


hope this helps.

-elf



___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Elf


i've been testing various solutions, which ill be happy to post if interested.
all of them involve alist->hash-table.  so far, the fastest solution, on 
average, has been list-tabluate to generate the list. (map! was a close 
contender, though.)  the solution with the least GCs, on average, uses unfold

for generation.   all of these have been run in the interpreter, not compiled.

-elf




___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Elf


for an improvement in time (surprisingly), use

(define a
(alist->hash-table
(let loop ((i   0)
   (r   '()))
(if (fx= 25 i)
r
(loop (fx+ 1 i)
  (cons (cons (random 50) (random 50)) r
=))

instead.  personally, id test both on whatever box youre running, as there 
are a lot more minor GCs this way.  the alist->hash has an interesting property,

though... the first value (if there are duplicate keys) seems to be the one
thats used, the others are ignored.  however, its a lot faster to build the
list than to do the hash-table-set!, interestingly.

-elf



___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Elf


my solution :)  its faster (at least on chicken 3.3.something).
timing on this box put this at just under 8s.

-elf

---

(use extras)

(print "filling ...")

(define a
(alist->hash-table
(let loop ((i   0))
(if (fx= 25 i)
'()
(cons (cons (random 50) (random 50))
  (loop (fx+ 1 i)
=))

(print "reading ...")

(define hits
(let loop ((i   0)
   (r   0))
(if (fx= i 1)
r
(if (fx= (random 50)
 (hash-table-ref/default a (random 50) -1))
(loop (fx+ 1 i) (fx+ 1 r))
(loop (fx+ 1 i) r)

(print "done.")





___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread felix winkelmann
Hi!


This should perform much better and more closely
resembles what you are trying to achieve.

Note that Perl's data structures are bummed to absurd
dimensions. We don't want that.


cheers,
felix
--

(use extras)

(define a (make-hash-table equal?))

(print "filling ...")

(do ((i 0 (add1 i)))
((>= i 25))
  (hash-table-set!
   a
   (cons (random 50) (random 50))
   #t))

(print "reading ...")

(do ((i 0 (add1 i))
 (hits 0))
((>= i 1))
  (let ((x (random 50))
(y (random 50)))
(when (hash-table-ref/default a (cons x y) #f)
  (set! hits (add1 hits)

(print "done.")


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-24 Thread Jim Ursetto
>>> My attempts all use gigs of memory and run 10x as long.

Let's take a look at why so much memory is being used, using the
following very similar code.

(define a (make-hash-table =))
(let loop ((n 25))
  (let ((x (random 50))
(y (random 50)))
(when (> n 0)
  (hash-table-update! a x
  (lambda (h)
(hash-table-set! h y #t)
h)
  (cut make-hash-table eq?))
  (loop (sub1 n)

$ csi
#;1> ,r
Memory: heap size is 50 bytes with 361224 bytes currently in use

We evaluate the code above and recheck the memory usage.  Remember
that half the "in use" memory is dedicated to the stopĀ© GC.

#;3> ,r
Memory: heap size is 1025151236 bytes with 776804714 bytes currently in use

So, an entire gigabyte of virtual memory and about 265MB of memory
just for the hash itself (btw, my RSS states 566MB of actual memory in
use):

#;3> (- 776804714 (/ 1025151236 2))  ; heap used at end
264229096
#;4> (- 361224 25); heap used at beginning
111224
#;5> (- 264229096 111224) ; total mem used
264117872

Now we calculate the overhead actually used by the hash itself.  It
relies on knowledge of Chicken internals.

#;6> (vector-length (##sys#slot a 1))  ; number of buckets
635413
#;7> (hash-table-size a)   ; number of associations
196597
#;8> (* 196597 1280)   ; overhead in bytes for empty
hash tables (307 buckets)
251644160
#;10> (* 25 12); size in bytes of y elements
(12 bytes/pair)
300
#;11> (* 25 12); size in bytes of x elements
(12 bytes/pair)
300
#;14> (length (filter null? (vector->list (##sys#slot a 1;
# of empty buckets in a
438816
#;15> (- 635413 196597); equal to #14, so each bucket
contains at most 1 value
438816
#;16> (* 438816 8) ; overhead in bytes for empty
buckets in a
3510528
#;20> (* 196597 8) ; overhead in bytes for full buckets in a
1572776
#;21> (+ 251644160 3510528 300 300 1572776)  ; calculated memory usage
262727464

Our final calculation is 263MB for the hash, which is close enough for
government work.  Of this, 250MB is spent on the second-level hashes,
which take up a full 1280 bytes when empty because each has a minimum
of 307 buckets. Another 5MB is spent on bucket overhead in the
first-level array, and 6MB is spent on actual (key . value) pairs.

I found you can save a few megs by compacting the first-level array
using (define a (make-hash-table = min-load: 0.99 max-load: 0.99).
This is a complete hack, but the number of buckets dropped from
635,000 to 317,701 (i.e. the next lowest possible value).

However, nothing else I tried worked.  You can't cap the number of
buckets, you can't adjust the permissible load to 1.00 or higher, you
can't reduce the minimum size of a hash table, and trying a different
integer hashing function gave no benefit.  But I am not an expert, so
may be missing something.

In short, Chicken hash tables are not suitable for your application,
or, in my opinion, anything serious right now.  I believe this is
being worked on.  Also keep in mind that perl hashes are highly,
highly optimized as they are Perl's swiss-army data structure.

Meantime, write your own data structure that takes into consideration
the normal use of your sparse arrays.  For example, I made a dumb
change that implements the second level as a plain alist, since
collisions at the first level are rare in this particular case.
Compiled, it runs at 1.2 seconds on my machine vs 0.8 seconds for the
perl script.  The data structure is 13MB because of the savings at the
second level.  Although I couldn't solve your problem, here is the
modified example.

(print "Filling the associative array with 25 entries.")

(define a (make-hash-table =))

(let loop ((n 25))
  (let ((x (random 50))
(y (random 50)))
(when (> n 0)
  (hash-table-update!/default a x
  (lambda (L)
(cons (cons y #t) L))
  '())
  (loop (sub1 n)

(print "Reading from the associative array 1 times\n")

(printf "Done (hits ~a)"
(let loop ((n 1) (hits 0))
  (let ((x (random 50))
(y (random 50)))
(if (= n 0)
hits
(if (and-let* ((h (hash-table-ref/default a x #f)))
  (alist-ref y h))
(loop (sub1 n) (add1 hits))
(loop (sub1 n) hits))


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-23 Thread Matt Welland
Sorry for the overly brief problem statement. This is the two deep
sparse array code I threw together. BTW, in the test code I am using
numbers as the keys out of lazyness. The real code may use strings and
or numbers. I'm using chicken 2.740 and 3.3.0 if it matters.

I guess my question really has two parts.
1) How do chicken hash arrays compare in speed and memory usage with
hash arrays in perl?
2) Is there some more scheme oriented way to store data like this for
quick random lookup?

(define vv  (make-hash-table))

(define (sparse-array-set! a x y val)
  (let ((sh (hash-table-ref/default a x #f)))
(if sh
(hash-table-set! sh y val)
(let ((h (make-hash-table)))
  (hash-table-set! a x h)
  (hash-table-set! h y val)

(define (sparse-array-ref a x y)
  (let ((h (hash-table-ref/default a x #f)))
(if h
(hash-table-ref/default h y #f)
#f)))

(define maxval 50)
(let loop ((x (random maxval))
   (y (random maxval))
   (n 0))
  (sparse-array-set! vv x y #t)
  (if (< n 25)
  (loop (random maxval)(random maxval)(+ n 1


On Sat, Aug 23, 2008 at 2:35 PM, John Cowan <[EMAIL PROTECTED]> wrote:
> Matt Welland scripsit:
>
>> My attempts all use gigs of memory and run 10x as long.
>
> So show us the problematic code, then.
>
>
> --
> When I'm stuck in something boring  John Cowan
> where reading would be impossible or(who loves Asimov too)
> rude, I often set up math problems for  [EMAIL PROTECTED]
> myself and solve them as a way to pass  http://www.ccil.org/~cowan
> the time.  --John Jenkins
>


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Good way to code the equivalent to this?

2008-08-23 Thread John Cowan
Matt Welland scripsit:

> My attempts all use gigs of memory and run 10x as long.

So show us the problematic code, then.


-- 
When I'm stuck in something boring  John Cowan
where reading would be impossible or(who loves Asimov too)
rude, I often set up math problems for  [EMAIL PROTECTED]
myself and solve them as a way to pass  http://www.ccil.org/~cowan
the time.  --John Jenkins


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


[Chicken-users] Good way to code the equivalent to this?

2008-08-23 Thread Matt Welland
My attempts all use gigs of memory and run 10x as long.

#!/usr/bin/perl -w
print "Filling the array with 25 entries.\n";
foreach $n (0 .. 25) {
  $x = int(rand(50));
  $y = int(rand(50));
  $a{$x}{$y}=1;
}

print "Reading from the arrary 1 times\n";
$hits=0;
foreach $n (0 .. 1) {
  $x = int(rand(50));
  $y = int(rand(50));
  if (defined $a{$x}{$y}) {
$hits++;
  }
}

print "Done\n";


-- 
Visit www.kiatoa.com, a user-run, self governing
web site.


___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users