On 09/23/2009 10:00 AM, Dave Wood wrote:
"If the text file has 'numbers and strings' how is numpy meant to know
what dtype to use?
Please try genfromtxt especially if columns contain both numbers and
strings."
Well, I suppose they are all considered to be strings here. I haven't
tried to convert the numbers to floats yet.
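(If I understand right, the suggested call would be something like the
following; dtype=None makes genfromtxt guess a type for each column.
'data.txt' stands in for the real file.)

    import numpy as np

    # dtype=None asks genfromtxt to infer a dtype per column, so
    # numeric columns come back as numbers and the rest as strings.
    data = np.genfromtxt('data.txt', dtype=None)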
"What happens if you read a file instead of using stdin?"
Same problem.
"It is possible that one or more rows have multiple sequential delimiters.
Please check the row lengths of your 'data' variable after doing:"
Already done; the rows all have the same number of fields.
The fact that the script works with the first 40k lines, and also with
the last 40k lines, suggests to me that there is no problem with the file.
(I calculate column means and standard deviations later in the script;
it's only the first two columns that can't be cast to floating-point
numbers.)
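(For concreteness, that part of the script is roughly the following,
with the first two columns being the string ones; indices simplified:)

    # 'data' is the 2-D array of strings read from the file; skip the
    # two string columns before casting the rest to floats.
    values = data[:, 2:].astype(float)
    col_means = values.mean(axis=0)
    col_stds = values.std(axis=0)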
"Really without the input or system, it is hard to say anything.
If you really know your data I would suggest preallocating the array
and updating the array one line at a time to avoid the large multiple
intermediate objects."
I'm running on Linux. My machine is Red Hat with 2 GB of RAM, but when
memory became an issue I tried running on other Linux machines with
much more RAM. I don't know which distros.
I just tried preallocating the array and updating it one line at a
time, and that works fine. Thanks very much for the suggestion. :)
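(Roughly what I ended up with; the file name, row count and string
width are simplified:)

    import numpy as np

    nrows, ncols = 500000, 10                     # known ahead of time
    data = np.empty((nrows, ncols), dtype='S20')  # fixed-width strings

    f = open('data.txt')
    for i, line in enumerate(f):
        data[i] = line.split()   # fill one row at a time, avoiding a
                                 # big intermediate list of lists
    f.close()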
This doesn't seem like the expected behaviour, though, and the error
message seems wrong.
Many thanks,
Dave
------------------------------------------------------------------------
Glad you got a solution.
While I'm far from an expert, with 2 GB of RAM you do not have that
much free memory beyond the OS and other overhead. With your code, the
OS has to read all the data in at least once, and Python has to
allocate storage for the result as well as the intermediate objects (a
list of lists of Python strings can easily take several times the
memory of the final array). So it is easy to exhaust memory.
I agree that the error message is too vague, so you could file a ticket.
If memory is a problem for you, use PyTables.
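A minimal sketch (untested; PyTables 2.x names, and the shape and atom
are placeholders for your data):

    import tables

    fileh = tables.openFile('data.h5', mode='w')
    # EArray is extendable: append rows in chunks without ever
    # holding the whole dataset in RAM.
    arr = fileh.createEArray(fileh.root, 'data',
                             tables.Float64Atom(), shape=(0, 10))
    arr.append([[0.0] * 10])    # append one (or many) rows at a time
    fileh.close()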
Also see the recent 'np.memmap and memory usage' thread on the
numpy-discussion list:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg18863.html
Especially the post by Francesc Alted:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg18868.html
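In the same spirit, np.memmap keeps the array on disk and only pages in
what you touch (a sketch; the shape, dtype and file name are
placeholders):

    import numpy as np

    mm = np.memmap('scratch.dat', dtype='float64', mode='w+',
                   shape=(1000000, 10))
    mm[0] = 1.0     # use it like an ordinary array
    mm.flush()      # write pending changes back to disk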
Bruce