On Tue, 7 Dec 2010, Sven Van Caekenberghe wrote:
Related to the Matrix CSV input/output optimization quest, I was puzzled why writing seemed so much slower than reading. Here is a simple example:

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
	100000 timesRepeat: [ stream print: 100 atRandom; space ] ] ] timeToRun.
=> 1558

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
	100000 timesRepeat: [ Integer readFrom: stream. stream peekFor: $ ] ] ] timeToRun.
=> 183

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
	100000 timesRepeat: [ stream nextPut: ($a to: $z) atRandom; space ] ] ] timeToRun.
=> 1705

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
	100000 timesRepeat: [ stream next. stream peekFor: $ ] ] ] timeToRun.
=> 47

Clearly, writing is close to an order of magnitude slower than reading. This was on Pharo 1.1 with Cog, but I double-checked on Pharo 1.2 and Squeak 4.1.

On my machine (a MacBook Pro), this is what another dynamic language does:

> (time (with-output-to-file (out "/tmp/numbers.txt")
          (loop repeat 100000 do (format out "~d " (random 100)))))
Timing the evaluation of (WITH-OUTPUT-TO-FILE (OUT "/tmp/numbers.txt") (LOOP REPEAT 100000 DO (FORMAT OUT "~d " (RANDOM 100))))
User time    = 0.413
System time  = 0.002
Elapsed time = 0.401
Allocation   = 2502320 bytes
0 Page faults
Calls to %EVAL 1700063
NIL

> (time (with-open-file (in "/tmp/numbers.txt")
          (loop repeat 100000 do (read in))))
Timing the evaluation of (WITH-OPEN-FILE (IN "/tmp/numbers.txt") (LOOP REPEAT 100000 DO (READ IN)))
User time    = 0.328
System time  = 0.001
Elapsed time = 0.315
Allocation   = 2500764 bytes
0 Page faults
Calls to %EVAL 1400056
NIL

So Pharo Smalltalk clearly matches the read/parse speed, which is great, but fails at simple writing. Maybe I am doing something wrong here (I know these are MultiByteFileStreams), but I fail to see what. Something with buffering/flushing? Anybody have any idea?
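As a quick sanity check of the buffering/flushing hypothesis (a minimal sketch, not from the thread), one can build the entire output in memory with streamContents: and hand it to the file stream in a single nextPutAll:. If per-character file writes are the bottleneck, this variant should land much closer to the read timings:

[ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
	stream nextPutAll: (String streamContents: [ :buffer |
		100000 timesRepeat: [ buffer print: 100 atRandom; space ] ]) ] ] timeToRun.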
That's because FileStreams are read buffered, but not write buffered. I implemented a subclass of FileStream (intended as a possible replacement for StandardFileStream) which is both read and write buffered. It gives the same read performance as the current implementation and a significant boost for writes, so it can be done. But write buffering has side effects (the file on disk lags behind the image until the buffer is flushed), while read buffering doesn't. Maybe it could be added as a separate subclass of FileStream if there's a need for it, but the multibyte stuff would have to be duplicated in that case (note that it's already duplicated in MultiByteFileStream and MultiByteBinaryOrTextStream).

I also had an idea to create a MultiByteStream: a stream that wraps another stream and does the conversion using a TextConverter. That would be a lot of work, and I don't expect more than a 30% performance improvement (for reads).
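Here is a minimal sketch of the write-buffering idea; the class name, selectors (WriteBufferedFileStream, flushWriteBuffer) and buffer size are illustrative, not the actual implementation. nextPut: appends to an in-image buffer and only touches the file primitive when the buffer fills up:

StandardFileStream subclass: #WriteBufferedFileStream
	instanceVariableNames: 'writeBuffer writeIndex'
	classVariableNames: ''
	category: 'Experimental-Streams'

"writeBuffer is assumed to be initialized to (String new: 8192) and writeIndex to 1."

nextPut: aCharacter
	"Queue the character instead of issuing one file primitive per character."
	writeIndex > writeBuffer size ifTrue: [ self flushWriteBuffer ].
	writeBuffer at: writeIndex put: aCharacter.
	writeIndex := writeIndex + 1.
	^ aCharacter

flushWriteBuffer
	"Write all queued characters with a single primitive call."
	writeIndex > 1 ifTrue: [
		super nextPutAll: (writeBuffer first: writeIndex - 1).
		writeIndex := 1 ]

The side effects mentioned above fall directly out of this design: the on-disk contents, position, and size only agree with the image after flushWriteBuffer runs, so close, position:, and similar methods all have to flush first.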
There are several stream libraries (for example XTreams) that can easily support write buffering, because they don't have to worry about compatibility with the existing stream hierarchy.
Levente
Sven
