Re: [Newbies] Re: Binary file I/O performance problems

2008-09-06 Thread Herbert König
Hello David,


YO> I'm sure that there are other implications, but it sounds like you
YO> do need some primitives to make it efficient.  I would make a
YO> primitive that is the equivalent of read_xyza_ping() that fills a
YO> Squeak object, or if you are dealing with an array of XYZA_Ping
YO> structures, make an array of homogeneous arrays so that all linenames
YO> are stored in a ByteArray, all pingnums are stored in a WordArray,
YO> etc.  In this way, you may still be able to utilize the vector
YO> primitives.

This approach seems to offer a chance of solving the speed problem.

In your original post you talked about 10 significant figures, so be
aware that FloatArray holds only 32-bit floats, with only about 7
significant figures.
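
To make the precision caveat concrete, here is a minimal sketch in C (used here only because it is easy to check; the helper name survives_float32 is made up for illustration). It narrows a double to a 32-bit float, which is what storing it in a 32-bit float slot amounts to, and checks whether the value still prints the same to a given number of significant figures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `value` still prints identically to
 * `digits` significant figures after being narrowed to a 32-bit float. */
int survives_float32(double value, int digits)
{
    char before[64], after[64];
    float narrowed = (float)value;   /* what a 32-bit float slot would hold */

    snprintf(before, sizeof before, "%.*e", digits - 1, value);
    snprintf(after, sizeof after, "%.*e", digits - 1, (double)narrowed);
    return strcmp(before, after) == 0;
}
```

For a value such as 1234.567891, six significant figures survive the narrowing but ten do not, so data that really needs ten figures has to stay in 64-bit floats.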

The second caveat is that if many of your floats are in the range of
1e-38 (roughly the closest-to-zero normalized 32-bit float), FloatArray
gets very slow (speed degradation by a factor of 8).  I'm talking about
FloatArray>>* and FloatArray>>*= here.
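
A slowdown near 1e-38 is what one would typically expect from denormal (subnormal) floats, although that is an assumption about the cause, not something measured here. Under that assumption, a quick way to see whether a data set is at risk is to count the values that fall below FLT_MIN, the smallest normalized float. The function name count_subnormals is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Count the values that are nonzero but smaller in magnitude than
 * FLT_MIN (about 1.18e-38), i.e. subnormal 32-bit floats. */
size_t count_subnormals(const float *values, size_t count)
{
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        float x = values[i];
        float mag = x < 0.0f ? -x : x;   /* magnitude without needing libm */
        if (mag != 0.0f && mag < FLT_MIN)
            hits++;
    }
    return hits;
}
```

If the count is a significant fraction of the array, rescaling the data before doing the vector arithmetic may avoid the slow path.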

Sorry if I sound negative; I just think it's bad to ignore problems
that are known in advance.


-- 
Cheers,

Herbert   

___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


Re: [Newbies] Re: Binary file I/O performance problems

2008-09-06 Thread David Finlayson
I have implemented a number of signal processing programs in both C99
and Python (with the psyco JIT). I have an 8-core Mac Pro workstation
which I can use for parallel processing by launching multiple
instances of the code using Make scripts. An interesting thing
happened when I compared the performance of the C code to the Python
code:

The C code became I/O bound at 4 cores, saturating either the disks or
the memory bus (I am not sure exactly where the bottleneck is). The
Python version never became I/O bound, even at 8 cores, but it did
close to within a factor of 10 of the performance of the C code. This
suggested to me that if I had enough processors to saturate the I/O,
there would be no speed advantage to writing the code in C.

The next generation of workstations we buy will probably have dozens
of cores, but hard drives and memory will only be marginally faster (if
history is any indication). So, if I/O is the rate-limiting factor,
not CPU speed, why not look for the most productive programming
environment possible? I've always read that Smalltalk is often
considered the most productive programming environment ever invented.
So I wanted to give it a try. But I am discovering that (from the point
of view of a scientist programmer like myself) it lacks a lot in
comparison to Matlab or Python (both high-level) and especially C and
C++ (lots and lots of library code).

I am going to have to weigh the pros and cons of whether it makes
sense to push on with this.

David


[Newbies] Re: Binary file I/O performance problems

2008-09-05 Thread nicolas cellier

nicolas cellier wrote:

Yoshiki Ohshima wrote:

At Fri, 5 Sep 2008 10:59:03 -0700,
David Finlayson wrote:

I re-wrote the test application to load the test file entirely into
memory before parsing the data. The total time to parse the file
decreased by about 50%. Now that I/O is removed from the picture, the
new bottleneck is turning bytes into integers (and then integers into
Floats).

I know that Smalltalk isn't the common language for number crunching,
but if I can get acceptable performance out of it, then down the road
I would like to tap into the Croquet environment. That is why I am
trying to learn a way that will work.


  If the integers or floats are in the layout of C's int[] or float[],
there is a better chance to make it much faster.

  Look at the methods Bitmap>>asByteArray and
Bitmap>>copyFromByteArray:.  You can convert a big array of non-pointer
words from/to a byte array.

  data := (1 to: 100) as: FloatArray.
  words := Bitmap new: data size.
  words replaceFrom: 1 to: data size with: data.
  bytes := words asByteArray.

  and you write out the bytes into a binary file.

  to get them back:

  words copyFromByteArray: bytes.
  data replaceFrom: 1 to: words size with: words.

Obviously, you can recycle some of the intermediate buffer allocation
and that would speed it up.
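
For comparison, the same bulk idea in C is a single block copy between a byte buffer and a float array, with no per-element conversion. This is only a sketch of the concept, not of how the Bitmap methods are implemented; the function names are invented, and it assumes the bytes are already in the host's byte order and float format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` floats into a raw byte buffer (host byte order). */
void floats_to_bytes(const float *src, size_t count, unsigned char *dst)
{
    memcpy(dst, src, count * sizeof(float));
}

/* Copy raw bytes back into `count` floats (host byte order). */
void bytes_to_floats(const unsigned char *src, size_t count, float *dst)
{
    memcpy(dst, src, count * sizeof(float));
}
```

Reusing the same destination buffers between calls corresponds to recycling the intermediate allocations mentioned above.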

  FloatArray has some vector arithmetic primitives, and the Kedama
system in the OLPC Etoys image has more elaborate vector arithmetic
primitives on integers and floats, including operations with masked
vectors.

-- Yoshiki


Hi David,
your application has piqued my curiosity. Which company/organization 
do you work for, if that's not indiscreet?


I think you will solve most of the performance problems by following 
Yoshiki's good advice.


You might also want to investigate FFI as an alternative way of 
handling C-layout ByteArray memory from within Smalltalk.
I made an example of this in Smallapack-Collections (search for 
Smallapack on SqueakSource, http://www.squeaksource.com/Smallapack/).
ExternalArray is an abstract class for handling memory laid out as a 
C array of any type from within Smalltalk (only float, double and 
complex are implemented in subclasses, but you can extend it), and in 
fact FFI can handle any structure (though you might have to resolve 
alignment problems yourself).
There's a trade-off between fast reading (no conversion) and slower 
access (a conversion at each access). However, with the ByteArray 
#doubleAt: and #floatAt: primitives (from FFI), and fast hacks to 
reverse the endianness of a whole array at once when needed, 
ExternalArrays of elementary types or small structures still provide 
reasonable access times.
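
As an illustration of reversing the endianness of a whole array at once, here is a minimal C sketch. It is not the Smallapack code; the name swap32_in_place is hypothetical, and it assumes the data consists of 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of every 32-bit word in the array, in place. */
void swap32_in_place(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t w = words[i];
        words[i] = (w >> 24)
                 | ((w >> 8) & 0x0000FF00u)
                 | ((w << 8) & 0x00FF0000u)
                 | (w << 24);
    }
}
```

Doing the swap once over the whole buffer keeps the per-access path free of any byte-order logic.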


Nicolas


Forgot to provide some timings (1 GHz 32-bit Athlon) for write/read access:

| a b c |
{
  [a := FloatArray withAll: (1 to: 10)] timeToRun.
  [b := ExternalFloatArray withAll: (1 to: 10)] timeToRun.
  [c := ExternalDoubleArray withAll: (1 to: 10)] timeToRun.
  [a do: [:e | ]] timeToRun.
  [b do: [:e | ]] timeToRun.
  [c do: [:e | ]] timeToRun.
}.
 #(142 312 335 80 181 182)




Re: [Newbies] Re: Binary file I/O performance problems

2008-09-05 Thread Yoshiki Ohshima
At Fri, 05 Sep 2008 23:00:07 +0200,
nicolas cellier wrote:
 
> Hi David,
> your application has piqued my curiosity. Which company/organization 
> do you work for, if that's not indiscreet?

  I assume the answer is USGS, because of his email address!  Yes, it
sounds like something cool is going on.

-- Yoshiki


Re: [Newbies] Re: Binary file I/O performance problems

2008-09-05 Thread David Finlayson
Unfortunately, the data is not a simple block of floats. For example,
here is how I read a ping header block in C from one of our vendors'
formats:

/* read_xyza_ping: read ping block, returns 1 if successful, EOF if
 * end of file  */
int read_xyza_ping(FILE *fin, XYZA_Ping *pp) {
    int8_t byte[4];   /* scratch space for padding bytes */

    fread(pp->linename, sizeof(int8_t), MAX_LINENAME_LEN, fin);
    fread(&pp->pingnum, sizeof(uint32_t), 1, fin);
    fread(byte, sizeof(int8_t), 4, fin);          /* skip padding */
    fread(&pp->time, sizeof(double), 1, fin);
    fread(&pp->notxers, sizeof(int32_t), 1, fin);
    fread(byte, sizeof(int8_t), 4, fin);          /* skip padding */
    read_posn(fin, &pp->posn);
    fread(&pp->roll, sizeof(double), 1, fin);
    fread(&pp->pitch, sizeof(double), 1, fin);
    fread(&pp->heading, sizeof(double), 1, fin);
    fread(&pp->height, sizeof(double), 1, fin);
    fread(&pp->tide, sizeof(double), 1, fin);
    fread(&pp->sos, sizeof(double), 1, fin);

    if (ferror(fin) != 0) {
        perror("sxpfile: error: (read_xyza_ping)");
        abort();
    }

    // time between 1995 - 2020?
    assert(788936400 < pp->time && pp->time < 1577865600);
    assert(0 < pp->notxers && pp->notxers <= MAX_TXERS);
    assert(-90.0 < pp->roll && pp->roll < 90.0);
    assert(-90.0 < pp->pitch && pp->pitch < 90.0);
    assert(0.0 <= pp->heading && pp->heading <= 360.0);

    // heave values
    assert(-10.0 < pp->height && pp->height < 10.0);
    assert(-100 < pp->tide && pp->tide < 100.0);

    // speed of sound reasonable? (freshwater too)
    assert(1000 <= pp->sos && pp->sos < 1600);

    return feof(fin) ? EOF : 1;
}

Note how there are various sized integers and floating point numbers
mixed together along with padding space put into the file during the
write (the original engineer must have just used fwrite on the
structs).

The notxers variable above indicates the number of XYZA_Txer structs
to follow, each XYZA_Txer struct indicates the number of XYZA_Point
structs to follow and so on until the entire structure is read into
memory. Then you start over again and read the next ping.

It is painful, but I don't know any way to read these files except
one structure at a time.
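
One way to keep the structure-at-a-time logic while avoiding many small reads is to load the file into memory first and walk it with a cursor. The sketch below is only an illustration: the Cursor type, the helper names and the DemoRecord layout (a 32-bit id, 4 bytes of padding, then a double) are all hypothetical and are not the vendor format, and it assumes the file was written in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A read position inside a buffer that is already in memory. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} Cursor;

/* Copy n bytes out of the buffer; returns 1 on success, 0 on overrun. */
int cur_read(Cursor *c, void *out, size_t n)
{
    if (n > c->len - c->pos)
        return 0;
    memcpy(out, c->data + c->pos, n);
    c->pos += n;
    return 1;
}

/* Skip n bytes (for example struct padding); returns 1 on success. */
int cur_skip(Cursor *c, size_t n)
{
    if (n > c->len - c->pos)
        return 0;
    c->pos += n;
    return 1;
}

/* Hypothetical record, for illustration only. */
typedef struct {
    uint32_t id;
    double time;
} DemoRecord;

/* Read one record: uint32 id, 4 padding bytes, double time. */
int read_demo_record(Cursor *c, DemoRecord *r)
{
    return cur_read(c, &r->id, sizeof r->id)
        && cur_skip(c, 4)
        && cur_read(c, &r->time, sizeof r->time);
}
```

Nested structures fit the same pattern: read the count field, then loop that many times calling the reader for the child record.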





-- 
David Finlayson, Ph.D.
Operational Geologist

U.S. Geological Survey
Pacific Science Center
400 Natural Bridges Drive
Santa Cruz, CA 95060, USA

Tel: 831-427-4757, Fax: 831-427-4748, E-mail: [EMAIL PROTECTED]


[Newbies] Re: Binary file I/O performance problems

2008-09-03 Thread Klaus D. Witzel

Hi David,

let me respond in reverse order of your points:


> I find it troubling that I am having to write code below the
> abstraction level of C to read and write data from a file.  I thought
> Smalltalk was supposed to free me from this kind of drudgery? Right
> now, Java looks good and Python/Ruby look fantastic by comparison.


The difference in Squeak/Smalltalk is that intermediate-level routines  
like #uint32 are available at the Smalltalk language level, where users  
can see them, use them and modify them. Smalltalk users see this approach  
as an invaluable resource. It has a price, yes.


But Squeak/Smalltalk can be faster, dramatically faster, than what you  
observed. The .image file (tens to hundreds of MB) is read from disk and  
has its endianness corrected in a second or so. Of course this is possible  
only because the file is in a ready-to-use format, but this can be a clue  
if you want to consider alternative input methods.



> This (I think) cleans up some of the code smell, but for only marginal
> performance improvements. It seems that I may need to implement a
> buffer on the binary stream. Is there a good example on how this
> should be done in the image or elsewhere?


I don't know of a particular example (one specialized for your problem  
at hand, buffered reading of arbitrary structs), but this is easy to do  
in Squeak:


  byteArray := ByteArray new: 2 << 20.
  actuallyTransferred :=
    binaryStream readInto: byteArray startingAt: 1 count: byteArray size

You may perhaps want to check that GBs can be brought into Squeak's memory  
in a matter of seconds, just #printIt in a workspace:


[1024 timesRepeat: [
    [(binaryStream := (SourceFiles at: 1) readOnlyCopy) binary.
     byteArray := ByteArray new: 2 << 20.
     actuallyTransferred := binaryStream reset;
         readInto: byteArray startingAt: 1 count: byteArray size]
        ensure: [binaryStream close]]] timeToRun

Compared with reading from disk 4 bytes at a time, this makes a huge  
difference for sure. From here on you would use the ByteArray protocol  
(#byteAt:*, #shortAt:*, #longAt:*, #doubleAt:*), but as mentioned earlier  
these methods are perhaps not optimal for you (when compared to other  
languages and their implementation libraries).
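
For readers who think in C, the same pattern of a few large transfers instead of many 4-byte reads looks roughly like the sketch below. The function name read_in_chunks is invented for illustration, and the inner loop is only a stand-in for whatever parsing is done on each chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read the whole stream in chunks of up to `bufsize` bytes. Returns the
 * total number of bytes read and stores the sum of all byte values in
 * *byte_sum as a placeholder for real processing. */
size_t read_in_chunks(FILE *fin, unsigned char *buf, size_t bufsize,
                      unsigned long *byte_sum)
{
    size_t total = 0;
    size_t got;

    *byte_sum = 0;
    while ((got = fread(buf, 1, bufsize, fin)) > 0) {
        for (size_t i = 0; i < got; i++)
            *byte_sum += buf[i];   /* stand-in for parsing the chunk */
        total += got;
    }
    return total;
}
```

The buffer size is a tuning knob; the 2 MB used above is one reasonable starting point, not a measured optimum.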


Last but not least, when doing performance-critical I/O or conversions,  
Squeak users sometimes write a Squeak plugin (which then extends the  
Squeak VM). It is still written at the Smalltalk/Slang language level,  
but it can call any hardware-oriented routine to speed things up  
dramatically, and this indeed compares well to other languages and their  
implementation libraries :)


HTH.

/Klaus


On Wed, 03 Sep 2008 08:00:54 +0200, David Finlayson wrote:

OK - I made some of the suggested changes. I broke the readers into two  
parts:


uint32
    "returns the next unsigned, 32-bit integer from the binary
    stream"
    isBigEndian
        ifTrue: [^ self nextBigEndianNumber: 4]
        ifFalse: [^ self nextLittleEndianNumber: 4]

Where nextLittleEndianNumber looks like this:

nextLittleEndianNumber: n
    "Answer the next n bytes as a positive Integer or
    LargePositiveInteger, where the bytes are ordered from least
    significant to most significant.
    Copied from PositionableStream"
    | bytes s |
    [bytes := stream next: n.
    s := 0.
    n
        to: 1
        by: -1
        do: [:i | s := (s bitShift: 8)
                    bitOr: (bytes at: i)].
    ^ s]
        on: Error
        do: [^ nil]
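
The shift-and-or loop above translates directly to C, which may help when comparing the two versions. This is a sketch with an invented name, le_bytes_to_uint, and it assumes n is at most 8 so the result fits in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine n bytes, least significant first, into an unsigned integer.
 * Walks from the last byte down to the first, like the Smalltalk loop. */
uint64_t le_bytes_to_uint(const unsigned char *bytes, size_t n)
{
    uint64_t s = 0;
    for (size_t i = n; i > 0; i--)
        s = (s << 8) | bytes[i - 1];
    return s;
}
```

Because the result is built from individual bytes, it does not depend on the byte order of the machine running the code.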



David


