[Numpy-discussion] tofile speed

2007-07-23 Thread Lars Friedrich
Hello everyone,

I am using array.tofile successfully for a data-acquisition streaming 
application. That is, I do the following:

for a long time:
    temp = dataAcquisitionDevice.getData()
    temp.tofile(myDataFile)

temp is a numpy array used to store the data temporarily. The data 
acquisition device acquires continuously and writes the data to a buffer 
from which I can read with .getData(). This works fine, but of course, 
when I raise the sample rate, there is a point at which temp.tofile is 
too slow: the dataAcquisitionDevice's buffer fills up before I can fetch 
the data again.

(temp has a size of roughly a megabyte, and the loop has a period of 
about 0.5 seconds, so increasing the chunk size won't help.)

I have no idea how efficient array.tofile() is. Maybe it is terribly 
efficient and what I see is just the limitation of my hardware (hard 
disk). Currently I can stream roughly 4 MB/s, which is quite fast, I 
guess. However, if anyone can point me to a way to write my data to the 
hard disk faster, I would be very happy!

Thanks

Lars


-- 
Dipl.-Ing. Lars Friedrich

Photonic Measurement Technology
Department of Microsystems Engineering -- IMTEK
University of Freiburg
Georges-Köhler-Allee 102
D-79110 Freiburg
Germany

phone: +49-761-203-7531
fax:   +49-761-203-7537
room:  01 088
email: [EMAIL PROTECTED]


Re: [Numpy-discussion] tofile speed

2007-07-23 Thread Sebastian Haase
Just a guess out of my hat:
there might be a buffer class in the standard Python library... I'm
thinking of a class that implements file I/O and collects input up to
a maximum buffer size before it copies the byte stream to its
output. Since I/O is more efficient when larger chunks are written, this
should improve overall performance.
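
A minimal sketch of that idea, under stated assumptions: the 16 MB buffer
size and the chunk shape are illustrative only, and f.write(chunk.tostring())
is used instead of tofile() so that the writes definitely pass through the
Python-level buffer set up by open().

### start code

import numpy as N

# A large stdio buffer coalesces many small writes into fewer disk writes.
# The 16 MB figure is an assumption, not a tuned value.
BUFSIZE = 16 * 1024 * 1024
f = open('buffered.bin', 'wb', BUFSIZE)

chunk = N.zeros(1000, dtype=N.int16)   # stand-in for one small acquisition chunk
for i in range(10000):                 # many small writes, few actual disk writes
    f.write(chunk.tostring())          # tostring() gives the array's raw bytes
f.close()

### end code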

How large are your data chunks per write? (IOW: what is len(temp.data)?)

HTH,
Sebastian Haase




Re: [Numpy-discussion] tofile speed

2007-07-23 Thread Charles R Harris

4 MB/s is extremely slow; these days most drives will do better than 50 MB/s
during sustained writes. RAID-0 will roughly double that rate if you aren't
terribly worried about drive failure. What operating system and hardware are
you using?

Chuck


Re: [Numpy-discussion] tofile speed

2007-07-23 Thread Lars Friedrich
Hello everyone,

thank you for the replies.

Sebastian, the chunk size is roughly 4*10^6 samples; with two bytes per 
sample, that is about 8 MB. I can vary this size, but increasing it only 
helps for much smaller values. For example, with a chunk size of 100 
samples I am much too slow. It gets better at 1000 samples, 10000 
samples and so on. But since I have already reached a chunk size in the 
megabyte range, it is difficult to increase my buffer size further, and 
I have the feeling that increasing it does not help in this size region 
anyway. (Correct me if I am wrong...)

Chuck, I am using a Windows XP system with a new (a few months old) 
Maxtor SATA drive.

Lars


Re: [Numpy-discussion] tofile speed

2007-07-24 Thread Sebastian Haase
So you are saying that a given tofile() call returns only after ~2 seconds!?
Can you measure the time of the getData() call (just comment the tofile()
out for a while, assuming getData() alone doesn't use 100% CPU)? The timeit
module, or a couple of time.time() calls, should do.
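
For instance, a sketch of where the timers would go, reusing the
(hypothetical) names from the first post:

### start code

import time

t0 = time.time()
temp = dataAcquisitionDevice.getData()   # hypothetical device API from the first post
t1 = time.time()
temp.tofile(myDataFile)
t2 = time.time()
print 'getData: %.3f s   tofile: %.3f s' % (t1 - t0, t2 - t1)

### end code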
Maybe multithreading might help, so that tofile() and getData() can overlap.
But 2 sec is really slow...
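
A hedged sketch of that overlap, with a bounded queue between an
acquisition thread and a writer thread; the FakeDevice class is a
stand-in for the real acquisition hardware, and the chunk count is
illustrative:

### start code

import threading, Queue            # the Queue module was renamed queue in Python 3
import numpy as N

class FakeDevice:                  # stand-in for the real acquisition device
    def getData(self):
        return N.zeros(4000000, dtype=N.int16)   # ~8 MB chunk, as in this thread

dataAcquisitionDevice = FakeDevice()
buf = Queue.Queue(maxsize=8)       # bound the backlog to ~8 chunks of memory

def writer(path):
    f = open(path, 'wb')
    while True:
        chunk = buf.get()
        if chunk is None:          # sentinel: acquisition is finished
            break
        chunk.tofile(f)
    f.close()

t = threading.Thread(target=writer, args=('data.bin',))
t.start()
for i in range(10):                # number of chunks is illustrative
    buf.put(dataAcquisitionDevice.getData())
buf.put(None)                      # tell the writer to finish
t.join()

### end code

Whether the two phases truly overlap depends on whether tofile() releases
the GIL during the write; if it does not, the writer would have to live in
a separate process instead of a thread.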

-S.




Re: [Numpy-discussion] tofile speed

2007-07-24 Thread Sebastian Haase
You are not generating text files, right?
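
(For context: ndarray.tofile writes raw binary by default and switches to
much slower formatted text only when a sep string is passed, e.g.:)

### start code

import numpy as N

a = N.arange(5, dtype=N.int16)
f = open('raw.bin', 'wb')
a.tofile(f)                       # default: raw binary, here 10 bytes
f.close()
a.tofile('values.txt', sep='\n')  # non-empty sep means text output, one value per line

### end code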



Re: [Numpy-discussion] tofile speed

2007-07-25 Thread Lars Friedrich
Hello,

I tried the following:


### start code

import numpy as N

a = N.random.rand(1000000)   # ~8 MB of float64 per write

myFile = file('test.bin', 'wb')

for i in range(100):
    a.tofile(myFile)

myFile.close()

### end code


And this gives roughly 50 MB/s on my office machine but only 6.5 MB/s on 
the machine that I was reporting about.

Both computers run Python 2.4.3 with Enthought 1.0.0 and numpy 1.0.1.
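
For reference, a hedged way to arrive at such a MB/s figure, with the same
array size as the benchmark above:

### start code

import time
import numpy as N

a = N.random.rand(1000000)        # ~8 MB of float64 per write

myFile = file('test.bin', 'wb')
t0 = time.time()
for i in range(100):
    a.tofile(myFile)
myFile.close()                    # include the final flush in the measurement
elapsed = time.time() - t0

print '%.1f MB/s' % (a.nbytes * 100 / elapsed / 1e6)

### end code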

So I think I will go and check the hard-disk drivers. array.tofile does 
not seem to be the problem and actually seems to be very fast. Any other 
recommendations?

Thanks
Lars


Re: [Numpy-discussion] tofile speed

2007-07-25 Thread Charles R Harris


You might check which disk controllers the disks are using. I got an almost
10x speedup moving some disks from a Dell PCI CERC board to the onboard SATA
and using software RAID. Sometimes DMA isn't enabled, but that is pretty
rare these days.

Chuck