Hi David,

Let me respond to your points in reverse order:

> I find it troubling that I am having to write code below the
> abstraction level of C to read and write data from a file.  I thought
> Smalltalk was supposed to free me from this kind of drudgery? Right
> now, Java looks good and Python/Ruby look fantastic by comparison.

The difference in Squeak/Smalltalk is that intermediate-level routines like #uint32 are available at the Smalltalk language level, where users can see them, use them and modify them. Smalltalk users see this as an invaluable resource. It has a price, yes.

But Squeak/Smalltalk can be faster, dramatically faster, than what you observed. The .image file (tens to hundreds of MB) is read from disk and converted to the host's byte order in a second or so. Of course, this is only possible because the file is in a ready-to-use format, but it may be a clue if you want to consider alternative input methods.

> This (I think) cleans up some of the code smell, but for only marginal
> performance improvements. It seems that I may need to implement a
> buffer on the binary stream. Is there a good example on how this
> should be done in the image or elsewhere?

I don't know of a particular example specialized for your problem (buffered reading of arbitrary "struct"s), but reading a large block of bytes in one go is easy to do in Squeak:

  byteArray := ByteArray new: 2 << 20.
  actuallyTransferred :=
        binaryStream readInto: byteArray startingAt: 1 count: byteArray size
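
To read a whole file this way, the same call can be repeated until nothing more is transferred. This is only a rough, untested sketch: it assumes binaryStream is an already opened binary file stream, and totalBytes is a name I made up for illustration:

  | byteArray actuallyTransferred totalBytes |
  byteArray := ByteArray new: 2 << 20.
  totalBytes := 0.
  "Refill the same 2 MB buffer until the stream delivers no more bytes."
  [(actuallyTransferred := binaryStream
          readInto: byteArray startingAt: 1 count: byteArray size) > 0]
          whileTrue: [
                  "Only bytes 1 to: actuallyTransferred are valid in this round."
                  totalBytes := totalBytes + actuallyTransferred]

Note that the last chunk is usually shorter than the buffer, so always go by actuallyTransferred and not by byteArray size.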

You may want to verify that gigabytes can be brought into Squeak's memory in a matter of seconds. Just print the following (printIt) in a workspace:

  [1024 timesRepeat: [
          [(binaryStream := (SourceFiles at: 1) readOnlyCopy) binary.
          byteArray := ByteArray new: 2 << 20.
          actuallyTransferred := binaryStream
                  reset;
                  readInto: byteArray startingAt: 1 count: byteArray size]
                  ensure: [binaryStream close]]] timeToRun

Compared with reading from disk four bytes at a time, this certainly makes a huge difference. From here on you would use the ByteArray protocol (#byteAt:*, #shortAt:*, #longAt:*, #doubleAt:*), but as mentioned earlier, these methods may not be optimal for you when compared to other languages and their implementation libraries.
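
As a rough sketch of that second step (untested against your data): I assume here that your records are plain unsigned 32-bit values, that binaryStream is already open in binary mode, and that isBigEndian is the flag from your own reader. The name values is just made up for illustration:

  | byteArray actuallyTransferred values |
  byteArray := ByteArray new: 2 << 20.
  actuallyTransferred := binaryStream
          readInto: byteArray startingAt: 1 count: byteArray size.
  values := OrderedCollection new.
  "Step through the filled part of the buffer four bytes at a time."
  1 to: actuallyTransferred - 3 by: 4 do: [:index |
          values add: (byteArray unsignedLongAt: index bigEndian: isBigEndian)]

You can check the byte order handling in a workspace without any file:

  #(1 0 0 0) asByteArray unsignedLongAt: 1 bigEndian: false.   "1"
  #(1 0 0 0) asByteArray unsignedLongAt: 1 bigEndian: true.    "16777216"

If a chunk does not end on a 4-byte boundary, the loop above silently ignores the leftover bytes, so with several chunks you would have to carry them over to the next round.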

Last but not least: for performance-critical I/O or conversions, Squeak users sometimes write a plugin, which then extends the Squeak VM. The plugin is still written at the Smalltalk/Slang language level, but it can call any hardware-oriented routine to speed things up dramatically, and this does compare well with other languages and their implementation libraries :)

HTH.

/Klaus


On Wed, 03 Sep 2008 08:00:54 +0200, David Finlayson wrote:

OK - I made some of the suggested changes. I broke the readers into two parts:

uint32
        "returns the next unsigned, 32-bit integer from the binary
        stream"
        isBigEndian
                ifTrue: [^ self nextBigEndianNumber: 4]
                ifFalse: [^ self nextLittleEndianNumber: 4]

Where nextLittleEndianNumber looks like this:

nextLittleEndianNumber: n
        "Answer the next n bytes as a positive Integer or
        LargePositiveInteger, where the bytes are ordered from least
        significant to most significant.
        Copied from PositionableStream"
        | bytes s |
        [bytes := stream next: n.
        s := 0.
        n
                to: 1
                by: -1
                do: [:i | s := (s bitShift: 8)
                                                bitOr: (bytes at: i)].
        ^ s]
                on: Error
                do: [^ nil]



David

