[Newbies] Re: Binary file I/O performance problems

Klaus D. Witzel Wed, 03 Sep 2008 02:54:24 -0700

Hi David,

let me respond in "reverse" order of your points:

> I find it troubling that I am having to write code below the
> abstraction level of C to read and write data from a file.  I thought
> Smalltalk was supposed to free me from this kind of drudgery? Right
> now, Java looks good and Python/Ruby look fantastic by comparison.

Here the difference from Squeak/Smalltalk is that the intermediate-level routines like #uint32 are made available at the Smalltalk language level, where users can see them, use them and modify them. Smalltalk users see such an approach as part of an invaluable resource. It has a price, yes.

But Squeak/Smalltalk can go faster, dramatically faster than what you observed. The .image file (10s - 100s MB) is read from disk and de-endianessed in a second or so. Of course this is possible only because the file is in a ready-to-use format, but this can be a clue if you want to consider alternative input methods.

> This (I think) cleans up some of the code smell, but for only marginal
> performance improvements. It seems that I may need to implement a
> buffer on the binary stream. Is there a good example on how this
> should be done in the image or elsewhere?

I don't know of a particular example (one specialized for your problem at hand, buffered reading of arbitrary "struct"s), but this is easy to do in Squeak:


  byteArray := ByteArray new: 2 << 20.
  actuallyTransferred :=
        binaryStream readInto: byteArray startingAt: 1 count: byteArray size
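
The same call can be repeated to walk through a whole file chunk by chunk. This is only a minimal sketch: the file name 'data.bin' and the variable totalBytes are made up for illustration, and a record that straddles two chunks would still need extra handling.

  | binaryStream byteArray actuallyTransferred totalBytes |
  "'data.bin' is a hypothetical file name."
  binaryStream := FileStream readOnlyFileNamed: 'data.bin'.
  binaryStream binary.
  byteArray := ByteArray new: 2 << 20.
  totalBytes := 0.
  [[binaryStream atEnd] whileFalse: [
        actuallyTransferred := binaryStream
                readInto: byteArray startingAt: 1 count: byteArray size.
        "Only bytes 1..actuallyTransferred are valid in this pass."
        totalBytes := totalBytes + actuallyTransferred]]
                ensure: [binaryStream close].
  totalBytes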

You may want to check that GBs can be brought into Squeak's memory in a matter of seconds; just #printIt in a workspace:


"1024 passes of up to 2 MB each; answers the elapsed milliseconds"
[1024 timesRepeat: [
        [(binaryStream := (SourceFiles at: 1) readOnlyCopy) binary.
        byteArray := ByteArray new: 2 << 20.
        actuallyTransferred := binaryStream
                reset;
                readInto: byteArray startingAt: 1 count: byteArray size]
                        ensure: [binaryStream close]]] timeToRun

When reading from disk 4-byte-wise this makes a huge difference for sure. From here on you would use the ByteArray protocol (#byteAt:*, #shortAt:*, #longAt:*, #doubleAt:*), but as mentioned earlier these methods are perhaps not optimal for you (when compared to other languages and their implementation libraries).
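
As a minimal sketch of decoding from such a buffer, assuming ByteArray>>unsignedLongAt:bigEndian: is present in your image (that selector is not named in this thread, and the sample bytes are made up):

  | bytes first second |
  "Eight sample bytes standing in for a filled buffer."
  bytes := #[1 0 0 0 0 0 1 0].
  "Two little-endian unsigned 32-bit values, at byte indexes 1 and 5."
  first := bytes unsignedLongAt: 1 bigEndian: false.
  second := bytes unsignedLongAt: 5 bigEndian: false.
  "printIt should answer #(1 65536)"
  Array with: first with: second

Decoding by index out of one large ByteArray avoids the stream call and the small intermediate collection that each 4-byte read would otherwise cost.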

Last but not least, when doing performance-critical I/O or conversions, Squeak users sometimes write a Squeak plugin (which then extends the Squeak VM). It is still written at the Smalltalk/Slang language level, but with it they can do/call any hardware-oriented routine for speeding things up dramatically, and this indeed compares well to other languages and their implementation libraries :)


HTH.

/Klaus


On Wed, 03 Sep 2008 08:00:54 +0200, David Finlayson wrote:

OK - I made some of the suggested changes. I broke the readers into two parts:


uint32
        "returns the next unsigned, 32-bit integer from the binary
        stream"
        isBigEndian
                ifTrue: [^ self nextBigEndianNumber: 4]
                ifFalse: [^ self nextLittleEndianNumber: 4]

Where nextLittleEndianNumber looks like this:

nextLittleEndianNumber: n
        "Answer the next n bytes as a positive Integer or
        LargePositiveInteger, where the bytes are ordered from least
        significant to most significant.
        Copied from PositionableStream"
        | bytes s |
        [bytes := stream next: n.
        s := 0.
        n
                to: 1
                by: -1
                do: [:i | s := (s bitShift: 8)
                                                bitOr: (bytes at: i)].
        ^ s]
                on: Error
                do: [^ nil]



David



_______________________________________________
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners
