On Tue, 7 Dec 2010, Sven Van Caekenberghe wrote:

Related to the Matrix CSV input/output optimization quest, I was puzzled why
writing seemed so much slower than reading.

Here is a simple example (timeToRun answers milliseconds):

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
        100000 timesRepeat: [ stream print: 100 atRandom; space ] ] ] timeToRun.
1558
[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
        100000 timesRepeat: [ Integer readFrom: stream. stream peekFor: $ ] ] ] timeToRun.
183
[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
        100000 timesRepeat: [ stream nextPut: ($a to: $z) atRandom; space ] ] ] timeToRun.
1705
[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
        100000 timesRepeat: [ stream next. stream peekFor: $ ] ] ] timeToRun.
47

Clearly, the writing is close to an order of magnitude slower than reading:
about 8.5 times slower for printed integers, and over 35 times slower for
single characters.

This was on Pharo 1.1 with Cog, but I double-checked with Pharo 1.2 and Squeak 
4.1.

On my machine (MacBook Pro), this is what another dynamic language, Common Lisp, does:

> (time (with-output-to-file (out "/tmp/numbers.txt")
              (loop repeat 100000 do (format out "~d " (random 100)))))
Timing the evaluation of (WITH-OUTPUT-TO-FILE (OUT "/tmp/numbers.txt") (LOOP REPEAT 100000 DO (FORMAT OUT "~d " (RANDOM 100))))

User time    =        0.413
System time  =        0.002
Elapsed time =        0.401
Allocation   = 2502320 bytes
0 Page faults
Calls to %EVAL    1700063
NIL

> (time (with-open-file (in "/tmp/numbers.txt")
               (loop repeat 100000 do (read in))))
Timing the evaluation of (WITH-OPEN-FILE (IN "/tmp/numbers.txt") (LOOP REPEAT 100000 DO (READ IN)))

User time    =        0.328
System time  =        0.001
Elapsed time =        0.315
Allocation   = 2500764 bytes
0 Page faults
Calls to %EVAL    1400056
NIL

So Pharo Smalltalk clearly matches the read/parse speed, which is great, but 
fails at simple writing.

Maybe I am doing something wrong here (I know these are MultiByteFileStreams),
but I fail to see what. Something with buffering/flushing?
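
One way to test the buffering hypothesis (just a sketch, I have not measured
it) would be to collect all the output in an in-memory WriteStream first and
hand it to the file with a single nextPutAll: call:

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
        | buffer |
        "Collect everything in memory, then write to the file once,
        avoiding a file write per printed element."
        buffer := WriteStream on: (String new: 400000).
        100000 timesRepeat: [ buffer print: 100 atRandom; space ].
        stream nextPutAll: buffer contents ] ] timeToRun.

If the per-write overhead is the problem, this should land much closer to the
read timings.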

Anybody have any idea?

Sven

That's because FileStreams are read buffered, but not write buffered.

I implemented a subclass of FileStream (intended as a possible replacement for StandardFileStream) which is both read and write buffered. It gives the same read performance as the current implementation and a significant boost for writes, so it can be done. But write buffering has side effects, while read buffering doesn't.

Maybe it can be added as a separate subclass of FileStream if there's a need for it, but the multibyte stuff would have to be duplicated in that case (note that it's already duplicated in MultiByteFileStream and MultiByteBinaryOrTextStream). I also had an idea to create a MultiByteStream: a stream that wraps another stream and does the character conversion with a TextConverter. It would be a lot of work, and I don't expect more than a 30% improvement (for read performance).
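
To illustrate, here is a minimal sketch of write buffering as a wrapper around
another stream (the class name, buffer size and details are hypothetical; this
is not the actual subclass mentioned above):

Object subclass: #WriteBufferedStream
        instanceVariableNames: 'inner buffer'
        classVariableNames: ''
        category: 'Example-Streams'.

WriteBufferedStream>>on: aStream
        inner := aStream.
        buffer := WriteStream on: (String new: 8192)

WriteBufferedStream>>nextPut: aCharacter
        "Writes go to the in-memory buffer; the underlying stream is only
        touched when the buffer fills up or flush is sent."
        buffer nextPut: aCharacter.
        buffer position >= 8192 ifTrue: [ self flush ].
        ^ aCharacter

WriteBufferedStream>>flush
        "This is where the side effect shows: until flush runs, the data
        has not reached the file, so anyone reading it sees stale state."
        inner nextPutAll: buffer contents.
        buffer := WriteStream on: (String new: 8192)

A real implementation would also have to flush before close, before position:,
and before any read on the same stream, which is exactly where the
compatibility issues come from.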

There are several stream libraries (for example XTreams) that can easily support write buffering without the need to care about compatibility.


Levente
