On 12/07/2010 04:20 PM, Levente Uzonyi wrote:
> On Tue, 7 Dec 2010, Sven Van Caekenberghe wrote:
> 
>> Related to the Matrix CSV input/output optimalization quest, I was
>> puzzled why writing seemed so much slower than reading.
>>
>> Here is a simple example:
>>
>> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
>>     100000 timesRepeat: [ stream print: 100 atRandom; space ] ] ]
>> timeToRun.
>> 1558
>> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
>>     100000 timesRepeat: [ Integer readFrom: stream. stream peekFor: $
>> ] ] ] timeToRun.
>> 183
>> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
>>     100000 timesRepeat: [ stream nextPut: ($a to: $z) atRandom; space
>> ] ] ] timeToRun.
>> 1705
>> [ FileStream fileNamed: '/tmp/numbers.txt' do: [ :stream |
>>     100000 timesRepeat: [ stream next. stream peekFor: $ ] ] ] timeToRun.
>> 47
>>
>> Clearly, the writing is close to an order of magnitude slower than
>> reading.
>>
>> This was on Pharo 1.1 with Cog, but I double-checked with Pharo 1.2
>> and Squeak 4.1.
>>
>> On my machine (Mac Book Pro), this is what another dynamic language does:
>>
>> > (time (with-output-to-file (out "/tmp/numbers.txt")
>>               (loop repeat 100000 do (format out "~d " (random 100)))))
>> Timing the evaluation of (WITH-OUTPUT-TO-FILE (OUT "/tmp/numbers.txt")
>> (LOOP REPEAT 100000 DO (FORMAT OUT "~d " (RANDOM 100))))
>>
>> User time    =        0.413
>> System time  =        0.002
>> Elapsed time =        0.401
>> Allocation   = 2502320 bytes
>> 0 Page faults
>> Calls to %EVAL    1700063
>> NIL
>>
>>> (time (with-open-file (in "/tmp/numbers.txt")
>>                (loop repeat 100000 do (read in))))
>> Timing the evaluation of (WITH-OPEN-FILE (IN "/tmp/numbers.txt") (LOOP
>> REPEAT 100000 DO (READ IN)))
>>
>> User time    =        0.328
>> System time  =        0.001
>> Elapsed time =        0.315
>> Allocation   = 2500764 bytes
>> 0 Page faults
>> Calls to %EVAL    1400056
>> NIL
>>
>> So Pharo Smalltalk clearly matches the read/parse speed, which is
>> great, but fails at simple writing.
>>
>> Maybe I am doing something wrong here (I know these are
>> MultiByteFileSteams), but I fail to see what. Something with
>> buffering/flushing ?
>>
>> Anybody any idea ?
> 
> That's because filestreams are read buffered, but not write buffered. I
> implemented a subclass of FileStream (intended as a possible replacement
> of StandardFileStream) which is read and write buffered. It gives the
> same performance for reading as the current implementation and a
> significant boost for writes, so it can be done. But write buffering has
> side effects, while read buffering doesn't. Maybe it can be added as a
> separate subclass of FileStream if there's need for it, but the
> multibyte stuff has to be duplicated in this case (note that it's
> already duplicated in MultiByteFileStream and
> MultiByteBinaryOrTextStream). I also had an idea to create
> MultiByteStream which would be a stream that wraps another stream and
> does the conversion stuff using a TextConverter. It'd be a lot of work
> to do it and I don't expect more than 30% performance improvement (for
> the read performance).

No, buffering should not be in a subclass or even the file stream class
itself. Buffering should be an other class that wraps file stream.

Cheers
Philippe


Reply via email to