On 12/07/2010 04:20 PM, Levente Uzonyi wrote: > On Tue, 7 Dec 2010, Sven Van Caekenberghe wrote: > >> Related to the Matrix CSV input/output optimalization quest, I was >> puzzled why writing seemed so much slower than reading. >> >> Here is a simple example: >> >> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream | >> 100000 timesRepeat: [ stream print: 100 atRandom; space ] ] ] >> timeToRun. >> 1558 >> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream | >> 100000 timesRepeat: [ Integer readFrom: stream. stream peekFor: $ >> ] ] ] timeToRun. >> 183 >> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream | >> 100000 timesRepeat: [ stream nextPut: ($a to: $z) atRandom; space >> ] ] ] timeToRun. >> 1705 >> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream | >> 100000 timesRepeat: [ stream next. stream peekFor: $ ] ] ] timeToRun. >> 47 >> >> Clearly, the writing is close to an order of magnitude slower than >> reading. >> >> This was on Pharo 1.1 with Cog, but I double-checked with Pharo 1.2 >> and Squeak 4.1. >> >> On my machine (Mac Book Pro), this is what another dynamic language does: >> >> > (time (with-output-to-file (out "/tmp/numbers.txt") >> (loop repeat 100000 do (format out "~d " (random 100))))) >> Timing the evaluation of (WITH-OUTPUT-TO-FILE (OUT "/tmp/numbers.txt") >> (LOOP REPEAT 100000 DO (FORMAT OUT "~d " (RANDOM 100)))) >> >> User time = 0.413 >> System time = 0.002 >> Elapsed time = 0.401 >> Allocation = 2502320 bytes >> 0 Page faults >> Calls to %EVAL 1700063 >> NIL >> >>> (time (with-open-file (in "/tmp/numbers.txt") >> (loop repeat 100000 do (read in)))) >> Timing the evaluation of (WITH-OPEN-FILE (IN "/tmp/numbers.txt") (LOOP >> REPEAT 100000 DO (READ IN))) >> >> User time = 0.328 >> System time = 0.001 >> Elapsed time = 0.315 >> Allocation = 2500764 bytes >> 0 Page faults >> Calls to %EVAL 1400056 >> NIL >> >> So Pharo Smalltalk clearly matches the read/parse speed, which is >> great, but fails at simple writing. >> >> Maybe I am doing something wrong here (I know these are >> MultiByteFileSteams), but I fail to see what. Something with >> buffering/flushing ? >> >> Anybody any idea ? > > That's because filestreams are read buffered, but not write buffered. I > implemented a subclass of FileStream (intended as a possible replacement > of StandardFileStream) which is read and write buffered. It gives the > same performance for reading as the current implementation and a > significant boost for writes, so it can be done. But write buffering has > side effects, while read buffering doesn't. Maybe it can be added as a > separate subclass of FileStream if there's need for it, but the > multibyte stuff has to be duplicated in this case (note that it's > already duplicated in MultiByteFileStream and > MultiByteBinaryOrTextStream). I also had an idea to create > MultiByteStream which would be a stream that wraps another stream and > does the conversion stuff using a TextConverter. It'd be a lot of work > to do it and I don't expect more than 30% performance improvement (for > the read performance).
No, buffering should not be in a subclass or even the file stream class itself. Buffering should be an other class that wraps file stream. Cheers Philippe
