On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsout...@gmail.com> wrote:
> On 03/31/2011 12:02 PM, Derek Homeier wrote:
> > On 31 Mar 2011, at 17:03, Bruce Southey wrote:
> >
> >> This is an invalid ticket because the docstring clearly states that in
> >> 3 different, yet critical places, that missing values are not handled
> >> here:
> >>
> >> "Each row in the text file must have the same number of values."
> >> "genfromtxt : Load data with missing values handled as specified."
> >> " This function aims to be a fast reader for simply formatted
> >> files. The
> >> `genfromtxt` function provides more sophisticated handling of,
> >> e.g.,
> >> lines with missing values."
> >>
> >> Really I am trying to separate the usage of loadtxt and genfromtxt to
> >> avoid unnecessary duplication and confusion. Part of this is
> >> historical because loadtxt was added in 2007 and genfromtxt was added
> >> in 2009. So really certain features of loadtxt have been 'kept' for
> >> backwards compatibility purposes yet these features can be 'abused' to
> >> handle missing data. But I really consider that any missing values
> >> should cause loadtxt to fail.
> >>
> > OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
> > you could probably say also for historical reasons since I have not
> > used genfromtxt much so far.
> > Anyway the docstring statement "Converters can also be used to
> > provide a default value for missing data:"
> > then appears quite misleading, or an invitation to abuse, if you will.
> > This should better be removed from the documentation then, or users
> > explicitly discouraged from using converters instead of genfromtxt
> > (I don't see how you could completely prevent using converters in
> > this way).
> >
> >> The patch is incorrect because it should not include a space in the
> >> split() as indicated in the comment by the original reporter. Of
> > The split('\r\n') alone caused test_dtype_with_object(self) to fail,
> > probably
> > because it relies on stripping the blanks. But maybe the test is
> > ill-formed?
> >
> >> course a corrected patch alone still is not sufficient to address the
> >> problem without the user providing the correct converter. Also you
> >> start to run into problems with multiple delimiters (such as one space
> >> versus two spaces) so you start down the path to add all the features
> >> that duplicate genfromtxt.
> > Given that genfromtxt provides that functionality more conveniently,
> > I agree again users should be encouraged to use this instead of
> > converters.
> > But the actual tab-problem causes in fact an issue not related to
> > missing
> > values at all (well, depending on what you call a missing value).
> > I am describing an example on the ticket.
> >
> > Cheers,
> > Derek
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> Okay I see that 1071 got closed which I am fine with.
>
> I think that your following example should be a test because the two
> spaces should not be removed with a tab delimiter:
> np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
> dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
>
> Make a test and we'll put it in.

Chuck
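
For readers following the tab-delimiter point, here is a minimal sketch in plain Python of why stripping blanks before splitting loses fields. It assumes the line handling boils down to "strip, then split on the delimiter"; the two helper names are hypothetical and are not NumPy's internals.

```python
def split_strip_all(line, delimiter="\t"):
    # Hypothetical helper: strip() removes ALL surrounding whitespace,
    # including tabs, so leading/trailing empty or blank fields vanish.
    return line.strip().split(delimiter)


def split_strip_newline(line, delimiter="\t"):
    # Hypothetical helper: remove only the line terminator, so tabs and
    # spaces that belong to the fields are preserved.
    return line.rstrip("\r\n").split(delimiter)


# The three rows from the example in the thread, one per line.
rows = ["aa\tbb\n", " \t \n", "cc\t\n"]

for row in rows:
    print(repr(row), split_strip_all(row), split_strip_newline(row))
```

With the full strip, the second and third rows collapse to a single field each, so they no longer match the two-column dtype; stripping only the line terminator keeps two fields on every row, including the two single-space fields.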