On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
> <warren.weckes...@enthought.com> wrote:
> > I haven't pushed it to the extreme, but the "big" example (in the examples/
> > directory) is a 1 gig text file with 2 million rows and 50 fields in each
> > row. This is read in less than 30 seconds (but that's with a solid state
> > drive).
>
> Obviously this was just a quick test, but FYI, a solid state drive
> shouldn't really make any difference here -- this is a pure sequential
> read, and for those, SSDs are if anything actually slower than
> traditional spinning-platter drives.
>

Good point.

> For this kind of benchmarking, you'd really rather be measuring the
> CPU time, or reading byte streams that are already in memory. If you
> can process more MB/s than the drive can provide, then your code is
> effectively perfectly fast. Looking at this number has a few
> advantages:
> - You get more repeatable measurements (no disk buffers and stuff
>   messing with you)
> - If your code can go faster than your drive, then the drive won't
>   make your benchmark look bad
> - There are probably users out there that have faster drives than you
>   (e.g., I just measured ~340 megabytes/s off our lab's main RAID
>   array), so it's nice to be able to measure optimizations even after
>   they stop mattering on your equipment.
>

For anyone benchmarking software like this, be sure to clear the disk
cache before each run. In Linux:

    $ sync
    $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"

In Mac OS X:

    $ purge

I'm not sure what the equivalent is in Windows.

Warren
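
For reference, here is a minimal sketch of the in-memory, CPU-time measurement suggested in the quoted text. It is not the reader discussed in this thread: make_csv_bytes, parse, and benchmark are hypothetical names, and parse is a plain-Python stand-in for whatever text reader is being timed. The data is generated in memory, so neither the drive nor the disk cache affects the result.

```python
import io
import time


def make_csv_bytes(rows, fields):
    """Build a synthetic comma-separated payload entirely in memory."""
    line = ",".join(str(i + 0.5) for i in range(fields)) + "\n"
    return line.encode("ascii") * rows


def parse(stream):
    """Hypothetical stand-in parser: split each line and convert to float."""
    return [
        [float(tok) for tok in line.decode("ascii").split(",")]
        for line in stream
    ]


def benchmark(data, repeats=3):
    """Return the best CPU time in seconds and the throughput in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        stream = io.BytesIO(data)          # byte stream already in memory
        start = time.process_time()        # CPU time, not wall-clock time
        parse(stream)
        best = min(best, time.process_time() - start)
    megabytes = len(data) / 1e6
    rate = megabytes / best if best > 0 else float("inf")
    return best, rate


if __name__ == "__main__":
    data = make_csv_bytes(rows=20000, fields=50)
    cpu, rate = benchmark(data)
    print(f"{len(data) / 1e6:.1f} MB in {cpu:.3f} s CPU ({rate:.0f} MB/s)")
```

Comparing the reported MB/s against the drive's sequential read rate shows whether the parser or the disk is the bottleneck. Taking the best of several repeats is a common way to reduce noise from other processes.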
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion