On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote:
> that's what i did. so new performance data. this is with bytes instead of
> strings for data on the hard drive but bignums in the hash still.
>
> as a single large file and a hash with 2000003 buckets for 26.6 million
> records the data rate is 98408/sec.
>
> when i split and go with 11 smaller files and hash with 500009 buckets
> the data rate is 106281/sec.

hash is reworked, bytes based. same format though: vector of bytes. so time test results:

single large file, same # of buckets as above: data rate 175962/sec.

11 smaller files, same # of buckets as above: data rate 205971/sec.
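for reference, here's a minimal sketch of the bytes-keyed duplicate removal idea. the record width, file handling, and the use of racket's built-in equal?-based hash are assumptions on my part; the actual code uses a custom bucketed vector-of-bytes hash instead:

```racket
#lang racket
;; sketch: remove duplicate fixed-width byte records from a file,
;; keeping the first occurrence of each. rec-len (record width) is
;; an assumption; the real record layout may differ.
(define (dedup-file in-path out-path [rec-len 8])
  (define seen (make-hash)) ; equal?-based hash, keyed on immutable bytes
  (call-with-input-file in-path
    (lambda (in)
      (call-with-output-file out-path
        #:exists 'replace
        (lambda (out)
          (let loop ()
            (define rec (read-bytes rec-len in))
            (unless (eof-object? rec)
              ;; copy to an immutable byte string so it's a safe hash key
              (define key (bytes->immutable-bytes rec))
              (unless (hash-ref seen key #f)
                (hash-set! seen key #t)
                (write-bytes rec out))
              (loop))))))))
```

note this keeps every distinct key in memory at once, which for 26.6 million records is the same memory/speed trade-off the custom bucketed hash is tuning.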
throughput update. i had to hand code some of the stuff (places are just not working for me) but i just managed to hack my way through running this in parallel. i copied the original 26.6 million records to a new file, then ran two slightly reworked copies of my duplicate removal code at a shell prompt like this:

racket ddd-parallel.rkt & racket ddd-parallel1.rkt &

i'm not messing with the single large file anymore. so for twice the data, the data rate is up to 356649/sec.