Re: [Numpy-discussion] loadtxt/savetxt tickets

Bruce Southey Tue, 05 Apr 2011 12:08:48 -0700

On 04/04/2011 12:38 PM, Charles R Harris wrote:

On Mon, Apr 4, 2011 at 11:01 AM, Bruce Southey <bsout...@gmail.com> wrote:


    On 04/04/2011 11:20 AM, Charles R Harris wrote:



    On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsout...@gmail.com> wrote:

        On 03/31/2011 12:02 PM, Derek Homeier wrote:
        > On 31 Mar 2011, at 17:03, Bruce Southey wrote:
        >
        >> This is an invalid ticket because the docstring clearly states, in
        >> three different yet critical places, that missing values are not
        >> handled here:
        >>
        >> "Each row in the text file must have the same number of values."
        >> "genfromtxt : Load data with missing values handled as specified."
        >> "   This function aims to be a fast reader for simply formatted
        >>     files.  The `genfromtxt` function provides more sophisticated
        >>     handling of, e.g., lines with missing values."
        >>
        >> Really I am trying to separate the usage of loadtxt and genfromtxt to
        >> avoid unnecessary duplication and confusion. Part of this is
        >> historical, because loadtxt was added in 2007 and genfromtxt was added
        >> in 2009. So certain features of loadtxt have been 'kept' for
        >> backwards compatibility purposes, yet these features can be 'abused' to
        >> handle missing data. But I really consider that any missing values
        >> should cause loadtxt to fail.
        >>
        > OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
        > you could probably say also for historical reasons since I have not
        > used genfromtxt much so far.
        > Anyway the docstring statement "Converters can also be used to
        > provide a default value for missing data:" then appears quite
        > misleading, or an invitation to abuse, if you will.
        > This should better be removed from the documentation then, or users
        > explicitly discouraged from using converters instead of genfromtxt
        > (I don't see how you could completely prevent using converters in
        > this way).
        >
        >> The patch is incorrect because it should not include a space in the
        >> split() as indicated in the comment by the original reporter. Of
        > The split('\r\n') alone caused test_dtype_with_object(self) to fail,
        > probably because it relies on stripping the blanks. But maybe the
        > test is ill-formed?
        >
        >> course a corrected patch alone still is not sufficient to address the
        >> problem without the user providing the correct converter. Also you
        >> start to run into problems with multiple delimiters (such as one space
        >> versus two spaces) so you start down the path to add all the features
        >> that duplicate genfromtxt.
        > Given that genfromtxt provides that functionality more conveniently,
        > I agree again users should be encouraged to use this instead of
        > converters.
        > But the actual tab-problem causes in fact an issue not related to
        > missing values at all (well, depending on what you call a missing
        > value).
        > I am describing an example on the ticket.
        >
        > Cheers,
        >                                       Derek
        >
        > _______________________________________________
        > NumPy-Discussion mailing list
        > NumPy-Discussion@scipy.org
        > http://mail.scipy.org/mailman/listinfo/numpy-discussion
        Okay I see that 1071 got closed which I am fine with.

        I think that your following example should be a test because the two
        spaces should not be removed with a tab delimiter:
        np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
        dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
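
To make the point about the tab delimiter concrete, here is a rough pure-Python illustration of how the characters stripped from each line change the fields that come out of a tab split. It is only a sketch and not the actual loadtxt code; the helper name split_fields and its parameters are made up for this example.

```python
def split_fields(line, strip_chars=None, delimiter="\t"):
    """Strip the ends of a line, then split it on the delimiter.

    Illustration only: this is a hypothetical helper, not NumPy's code.
    strip_chars=None means "strip all whitespace", as str.strip() does.
    """
    return line.strip(strip_chars).split(delimiter)


# The three lines of the example above, as they would be read from the file.
lines = ["aa\tbb\n", " \t \n", "cc\t"]

# Stripping only line terminators keeps both blank fields and the empty one.
newline_only = [split_fields(line, "\r\n") for line in lines]
# [['aa', 'bb'], [' ', ' '], ['cc', '']]

# Also stripping spaces turns the two single-space fields into empty strings.
with_space = [split_fields(line, " \r\n") for line in lines]
# [['aa', 'bb'], ['', ''], ['cc', '']]

# Stripping all whitespace also eats the tabs, so the column count changes.
all_whitespace = [split_fields(line) for line in lines]
# [['aa', 'bb'], [''], ['cc']]

print(newline_only)
print(with_space)
print(all_whitespace)
```

Only the first variant returns two fields per line with the spaces intact, which is the behaviour a test for this example would need to pin down.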


    Make a test and we'll put it in.

    Chuck

    I know!
    Trying to write one made me realize that loadtxt is not handling
    string arrays correctly. So I have to check more on this as I
    think loadtxt is giving a 1-d array instead of a 2-d array.


Tests often have that side effect.

<snip>

Chuck

Okay,

My confusion aside (sorry for that), I added ticket 1784 with a possible test that should work with ticket 1071:

http://projects.scipy.org/numpy/ticket/1794
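
For readers without access to the ticket, a test along these lines might look roughly like the sketch below. It is not the test attached to the ticket. The function names are hypothetical, and the expected values assume that loadtxt keeps whitespace-only and empty string fields when the delimiter is a tab, which is the behaviour argued for earlier in this thread.

```python
from io import StringIO

import numpy as np


def load_tab_example():
    """Load the three-line, tab-delimited example quoted in this thread."""
    data = StringIO("aa\tbb\n \t \ncc\t")
    dt = np.dtype([("label", "S4"), ("comment", "S4")])
    return np.loadtxt(data, delimiter="\t", dtype=dt)


def test_tab_delimiter_keeps_blank_fields():
    """Hypothetical test: blank and empty fields should survive a tab split."""
    result = load_tab_example()
    # Assumed expectation: one record per line, spaces kept, last field empty.
    expected = np.array(
        [(b"aa", b"bb"), (b" ", b" "), (b"cc", b"")],
        dtype=result.dtype,
    )
    np.testing.assert_array_equal(result, expected)
```

The structured dtype gives a 1-d array of records (one per line) rather than a 2-d array of strings, so the comparison is made record by record.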

Bruce

