On 04/04/2011 12:38 PM, Charles R Harris wrote:
On Mon, Apr 4, 2011 at 11:01 AM, Bruce Southey <bsout...@gmail.com> wrote:
On 04/04/2011 11:20 AM, Charles R Harris wrote:
On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsout...@gmail.com> wrote:
On 03/31/2011 12:02 PM, Derek Homeier wrote:
> On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>
>> This is an invalid ticket because the docstring clearly states, in
>> 3 different yet critical places, that missing values are not handled
>> here:
>>
>> "Each row in the text file must have the same number of values."
>> "genfromtxt : Load data with missing values handled as specified."
>> "This function aims to be a fast reader for simply formatted files. The
>> `genfromtxt` function provides more sophisticated handling of, e.g.,
>> lines with missing values."
>>
>> Really I am trying to separate the usage of loadtxt and genfromtxt to
>> avoid unnecessary duplication and confusion. Part of this is
>> historical because loadtxt was added in 2007 and genfromtxt was added
>> in 2009. So really certain features of loadtxt have been 'kept' for
>> backwards compatibility purposes yet these features can be 'abused' to
>> handle missing data. But I really consider that any missing values
>> should cause loadtxt to fail.
>>
> OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
> you could probably say also for historical reasons since I have not
> used genfromtxt much so far.
> Anyway the docstring statement "Converters can also be used to
> provide a default value for missing data:"
> then appears quite misleading, or an invitation to abuse, if you will.
> This should better be removed from the documentation then, or users
> explicitly discouraged from using converters instead of genfromtxt
> (I don't see how you could completely prevent using converters in
> this way).
>
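For readers unfamiliar with the distinction drawn above, here is a minimal sketch of how genfromtxt treats a missing field. The sample data is made up for illustration and does not come from the ticket; only the documented `delimiter` and `filling_values` parameters are used.

```python
from io import StringIO

import numpy as np

# Hypothetical sample data: the second row is missing its middle value.
text = "1,2,3\n4,,6\n"

# With default settings, genfromtxt reads the empty float field as nan.
with_nan = np.genfromtxt(StringIO(text), delimiter=",")

# filling_values supplies an explicit default for missing fields instead.
with_default = np.genfromtxt(StringIO(text), delimiter=",", filling_values=0.0)

print(with_nan)
print(with_default)
```

This is the behaviour that a loadtxt converter such as the one in the docstring would otherwise have to emulate column by column.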
>> The patch is incorrect because it should not include a space in the
>> split() as indicated in the comment by the original reporter. Of
> The split('\r\n') alone caused test_dtype_with_object(self) to fail,
> probably because it relies on stripping the blanks. But maybe the test
> is ill-formed?
>
>> course a corrected patch alone still is not sufficient to address the
>> problem without the user providing the correct converter. Also you
>> start to run into problems with multiple delimiters (such as one space
>> versus two spaces) so you start down the path to add all the features
>> that duplicate genfromtxt.
> Given that genfromtxt provides that functionality more conveniently,
> I agree again users should be encouraged to use this instead of
> converters.
> But the actual tab-problem causes in fact an issue not related to
> missing values at all (well, depending on what you call a missing value).
> I am describing an example on the ticket.
>
> Cheers,
> Derek
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
Okay I see that 1071 got closed which I am fine with.
I think that your following example should be a test because the two
spaces should not be removed with a tab delimiter:
np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
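A minimal sketch, in plain Python rather than loadtxt's actual line-splitting code, of why the characters stripped before splitting matter for this input. The helper name `split_fields` is hypothetical and exists only for this illustration.

```python
def split_fields(line, strip_chars, delimiter="\t"):
    """Strip strip_chars from both ends of line, then split on delimiter.

    Passing None for strip_chars strips all leading/trailing whitespace.
    """
    return line.strip(strip_chars).split(delimiter)


# The same text as in the loadtxt call above, one entry per input line.
lines = "aa\tbb\n \t \ncc\t".splitlines(keepends=True)

# Strip only line terminators: blank fields and the trailing empty field survive.
newline_only = [split_fields(line, "\r\n") for line in lines]

# Also strip spaces: the field count is kept, but the single spaces are lost.
with_space = [split_fields(line, " \r\n") for line in lines]

# Strip all whitespace: tabs at the ends vanish, so the field count changes.
all_whitespace = [split_fields(line, None) for line in lines]

print(newline_only)    # [['aa', 'bb'], [' ', ' '], ['cc', '']]
print(with_space)      # [['aa', 'bb'], ['', ''], ['cc', '']]
print(all_whitespace)  # [['aa', 'bb'], [''], ['cc']]
```

Only the first variant keeps both the two single-space fields and the empty trailing field, which is the outcome the proposed test is meant to pin down.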
Make a test and we'll put it in.
Chuck
I know!
Trying to write one made me realize that loadtxt is not handling
string arrays correctly. So I have to check more on this as I
think loadtxt is giving a 1-d array instead of a 2-d array.
Tests often have that side effect.
<snip>
Chuck
Okay,
My confusion aside (sorry for that), I added ticket 1794 with a possible
test that should work with ticket 1071:
http://projects.scipy.org/numpy/ticket/1794
Bruce