On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote:
> > That's what I did. So, new performance data: this is with bytes instead of 
> > strings for the data on the hard drive, but still bignums in the hash.
> > 
> > As a single large file with a hash of 2000003 buckets for 26.6 million 
> > records, the data rate is 98408 records/sec.
> > 
> > When I split into 11 smaller files and hash with 500009 buckets, the 
> > data rate is 106281 records/sec.
> 
> The hash is reworked, now bytes based. Same format though: a vector of bytes. 
> Time test results:
> 
> Single large file, same number of buckets as above: data rate 175962 records/sec.
> 
> 11 smaller files, same number of buckets as above: data rate 205971 records/sec.

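For anyone following along, here is a rough sketch of that kind of byte-based 
bucketed hash used for duplicate detection. It's illustrative only: the record 
size, the bucket index function, and the names are assumptions, not my actual 
code.

#lang racket
;; Illustrative sketch of a byte-based bucketed hash for duplicate detection.
;; RECORD-SIZE, the bucket count, and the index function are assumptions.

(define BUCKETS 500009)     ; prime bucket count, as in the split-file runs
(define RECORD-SIZE 16)     ; assumed fixed record size in bytes

;; the "hash" is a vector of byte strings, one byte string per bucket
(define table (make-vector BUCKETS #""))

;; crude bucket index taken from the first 8 bytes of the record
(define (bucket-of rec)
  (modulo (integer-bytes->integer rec #f #t 0 8) BUCKETS))

;; return #t if rec is already in its bucket; otherwise append it and return #f
(define (seen?! rec)
  (define i (bucket-of rec))
  (define b (vector-ref table i))
  (define found?
    (for/or ([j (in-range 0 (bytes-length b) RECORD-SIZE)])
      (bytes=? rec (subbytes b j (+ j RECORD-SIZE)))))
  (unless found?
    (vector-set! table i (bytes-append b rec)))
  found?)
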
Throughput update. I had to hand code some of this (places are just not 
working for me), but I managed to hack my way through running it in parallel. 
I copied the original 26.6 million records to a new file and ran two slightly 
reworked copies of my duplicate removal code at a shell prompt like this:

racket ddd-parallel.rkt &
racket ddd-parallel1.rkt &

I'm not messing with the single large file anymore, so for twice the data the 
combined data rate is up to 356649 records/sec.
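
For reference, the same two-worker run could also be driven from one small 
launcher script instead of backgrounding the jobs by hand. This is just a 
sketch using the file names from above; nothing else in it is my actual setup.

#lang racket
;; Launch both worker scripts at once and wait for them to finish.
;; File names are the ones used above; everything else is illustrative.

(define worker-files '("ddd-parallel.rkt" "ddd-parallel1.rkt"))

;; run each worker with system in its own thread so the two run concurrently;
;; system inherits stdout, so the workers' output still shows up as usual
(define runners
  (for/list ([f (in-list worker-files)])
    (thread (lambda () (system (string-append "racket " f))))))

;; block until both workers exit
(for-each thread-wait runners)
(displayln "both workers finished")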
