On 09/29/2009 01:30 PM, Pierre GM wrote:
On Sep 29, 2009, at 1:57 PM, Bruce Southey wrote:
On 09/29/2009 11:37 AM, Christopher Barker wrote:
Pierre GM wrote:
Probably a bigger concern than memory is the execution time involved in printing these problem rows.
The rows with problems will be printed outside the loop (with at least an associated warning, or possibly by raising an exception). My concern is whether to store only the tuples (row index, number of columns) for the invalid rows, or just to create a list of the number of columns per row that I'd parse afterwards. The first solution requires an extra test in the loop; the second may waste some memory.
Bah, I'll figure it out. Please send me some test cases so that I can time/test the best option.
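To make the trade-off concrete, here is a minimal sketch of the two bookkeeping options; the names, the sample data, and the loop are illustrative only and are not taken from the genfromtxt source:

# A tiny stand-in for the parsed input: the second data row has a
# missing delimiter, so it yields 4 fields instead of 5.
lines = ['1,1,1,1,1', '2,2,2,2 2', '3,3,3,3,3']
delimiter = ','
nbcols = 5

# Option 1: extra test inside the loop; store (row index, nb of columns)
# only for the invalid rows.
bad_rows = []
for i, line in enumerate(lines):
    ncols = len(line.split(delimiter))
    if ncols != nbcols:
        bad_rows.append((i, ncols))
print bad_rows                                      # -> [(1, 4)]

# Option 2: no test inside the loop; record every row's column count
# and locate the invalid rows afterwards (one stored count per row).
counts = [len(line.split(delimiter)) for line in lines]
print [(i, n) for (i, n) in enumerate(counts) if n != nbcols]   # -> [(1, 4)]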
Hi,

The first case just has to handle a missing delimiter - actually I expect that most of my cases would relate to this. So here is some simple Python code to generate an arbitrarily large list with the occasional missing delimiter. I set it up so it reads the desired number of rows and the frequency of bad rows from the Linux command line:

$ time python tbig.py 1000000 100000

If I comment out the extra prints that I put into io.py, it takes about 22 seconds to finish when the delimiters are correct. With the missing delimiter it takes 20.5 seconds to crash.

Bruce
import sys
import numpy as np
from StringIO import StringIO

if len(sys.argv) != 3:
    print 'incorrect number of arguments'
    sys.exit(1)
else:
    a = ['a,b,c,d,e']               # header row
    numberofrows = int(sys.argv[1])
    error_freq = int(sys.argv[2])
    # Rows where x % error_freq == 0 (including x == 0) get a missing
    # delimiter: '2,2,2,2 2' parses as 4 fields instead of 5.
    for x in range(numberofrows):
        if x % error_freq:
            a.append('1,1,1,1,1')
        else:
            a.append('2,2,2,2 2')
    st = StringIO('\n'.join(a))
    arr = np.genfromtxt(st, names=True, dtype=None, delimiter=',')
    print arr.shape, len(arr.dtype)
    print arr
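For reference, here is a tiny version of the same failure with the exception caught instead of letting the script die; the exact exception type and message depend on the numpy version:

import numpy as np
from StringIO import StringIO

small = StringIO('a,b,c,d,e\n1,1,1,1,1\n2,2,2,2 2\n3,3,3,3,3')
try:
    np.genfromtxt(small, names=True, dtype=None, delimiter=',')
except Exception, e:
    # the row '2,2,2,2 2' has only 4 fields, so the parse fails
    print 'genfromtxt failed:', e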
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion