On 09/29/2009 01:30 PM, Pierre GM wrote:
On Sep 29, 2009, at 1:57 PM, Bruce Southey wrote:
On 09/29/2009 11:37 AM, Christopher Barker wrote:
Pierre GM wrote:
Probably a bigger concern than memory is the execution time involved in printing these problem rows.
The rows with problems will be printed outside the loop (with at least an associated warning, or possibly by raising an exception). My concern is whether to store only the tuples (row index, number of columns) for the invalid rows, or just to create a list of the number of columns per row that I'd parse afterwards. The first solution requires an extra test in the loop; the second may waste some memory.
Bah, I'll figure it out. Please send me some test cases so that I can time/test the best option.
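To make the trade-off concrete, here is a minimal sketch of the two bookkeeping options; the names, the sample data, and the loop are illustrative only and are not taken from the genfromtxt source:

# A tiny stand-in for the parsed input: the second data row has a
# missing delimiter, so it yields 4 fields instead of 5.
lines = ['1,1,1,1,1', '2,2,2,2 2', '3,3,3,3,3']
delimiter = ','
nbcols = 5

# Option 1: extra test inside the loop; store (row index, nb of columns)
# only for the invalid rows.
bad_rows = []
for i, line in enumerate(lines):
    ncols = len(line.split(delimiter))
    if ncols != nbcols:
        bad_rows.append((i, ncols))
print bad_rows                                      # -> [(1, 4)]

# Option 2: no test inside the loop; record every row's column count
# and locate the invalid rows afterwards (one stored count per row).
counts = [len(line.split(delimiter)) for line in lines]
print [(i, n) for (i, n) in enumerate(counts) if n != nbcols]   # -> [(1, 4)]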
Hi,

The first case just has to handle a missing delimiter - actually I expect that most of my cases would relate to this. So here is some simple Python code to generate an arbitrarily large list with the occasional missing delimiter. I set it up so it reads the desired number of rows and the frequency of bad rows from the Linux command line:

$ time python tbig.py 1000000 100000

If I comment out the extra prints that I put into io.py, it takes about 22 seconds to finish when the delimiters are correct. With the missing delimiter it takes 20.5 seconds to crash.

Bruce
import sys
import numpy as np
from StringIO import StringIO

if len(sys.argv) != 3:
    print 'incorrect number of arguments'
    sys.exit(1)
else:
    a = ['a,b,c,d,e']               # header row
    numberofrows = int(sys.argv[1])
    error_freq = int(sys.argv[2])
    # Rows where x % error_freq == 0 (including x == 0) get a missing
    # delimiter: '2,2,2,2 2' parses as 4 fields instead of 5.
    for x in range(numberofrows):
        if x % error_freq:
            a.append('1,1,1,1,1')
        else:
            a.append('2,2,2,2 2')
    st = StringIO('\n'.join(a))
    arr = np.genfromtxt(st, names=True, dtype=None, delimiter=',')
    print arr.shape, len(arr.dtype)
    print arr
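For reference, here is a tiny version of the same failure with the exception caught instead of letting the script die; the exact exception type and message depend on the numpy version:

import numpy as np
from StringIO import StringIO

small = StringIO('a,b,c,d,e\n1,1,1,1,1\n2,2,2,2 2\n3,3,3,3,3')
try:
    np.genfromtxt(small, names=True, dtype=None, delimiter=',')
except Exception, e:
    # the row '2,2,2,2 2' has only 4 fields, so the parse fails
    print 'genfromtxt failed:', e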
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion