On Mon, Apr 4, 2011 at 11:01 AM, Bruce Southey <bsout...@gmail.com> wrote:
> On 04/04/2011 11:20 AM, Charles R Harris wrote:
>
> On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsout...@gmail.com> wrote:
>
>> On 03/31/2011 12:02 PM, Derek Homeier wrote:
>> > On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>> >
>> >> This is an invalid ticket because the docstring clearly states that in
>> >> 3 different, yet critical places, that missing values are not handled
>> >> here:
>> >>
>> >> "Each row in the text file must have the same number of values."
>> >> "genfromtxt : Load data with missing values handled as specified."
>> >> " This function aims to be a fast reader for simply formatted files. The
>> >> `genfromtxt` function provides more sophisticated handling of, e.g.,
>> >> lines with missing values."
>> >>
>> >> Really I am trying to separate the usage of loadtxt and genfromtxt to
>> >> avoid unnecessary duplication and confusion. Part of this is
>> >> historical because loadtxt was added in 2007 and genfromtxt was added
>> >> in 2009. So really certain features of loadtxt have been 'kept' for
>> >> backwards compatibility purposes yet these features can be 'abused' to
>> >> handle missing data. But I really consider that any missing values
>> >> should cause loadtxt to fail.
>> >>
>> > OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
>> > you could probably say also for historical reasons since I have not
>> > used genfromtxt much so far.
>> > Anyway the docstring statement "Converters can also be used to
>> > provide a default value for missing data:"
>> > then appears quite misleading, or an invitation to abuse, if you will.
>> > This should better be removed from the documentation then, or users
>> > explicitly discouraged from using converters instead of genfromtxt
>> > (I don't see how you could completely prevent using converters in
>> > this way).
>> >
>> >> The patch is incorrect because it should not include a space in the
>> >> split() as indicated in the comment by the original reporter.
>> > The split('\r\n') alone caused test_dtype_with_object(self) to fail,
>> > probably because it relies on stripping the blanks. But maybe the test
>> > is ill-formed?
>> >
>> >> Of course a corrected patch alone still is not sufficient to address the
>> >> problem without the user providing the correct converter. Also you
>> >> start to run into problems with multiple delimiters (such as one space
>> >> versus two spaces) so you start down the path to add all the features
>> >> that duplicate genfromtxt.
>> > Given that genfromtxt provides that functionality more conveniently,
>> > I agree again users should be encouraged to use this instead of
>> > converters.
>> > But the actual tab-problem causes in fact an issue not related to
>> > missing values at all (well, depending on what you call a missing value).
>> > I am describing an example on the ticket.
>> >
>> > Cheers,
>> > Derek
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> Okay I see that 1071 got closed which I am fine with.
>>
>> I think that your following example should be a test because the two
>> spaces should not be removed with a tab delimiter:
>> np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
>>            dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
>>
> Make a test and we'll put it in.
>
> Chuck
>
> I know!
> Trying to write one made me realize that loadtxt is not handling string
> arrays correctly. So I have to check more on this as I think loadtxt is
> giving a 1-d array instead of a 2-d array.
>

Tests often have that side effect.

<snip>

Chuck
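
For anyone following the tab-delimiter point quoted above, here is a minimal pure-Python sketch of why the two spaces matter. It is not NumPy's actual loadtxt code; the two helper names are hypothetical and exist only for illustration. It assumes the reader either strips all surrounding whitespace from a line before splitting on the delimiter, or strips only the line terminator, and it uses the example string from the thread.

```python
def split_stripping_all(line, delimiter="\t"):
    """Hypothetical helper: strip ALL surrounding whitespace, then split.

    Because a tab is whitespace, blank or space-only fields at the
    edges of the line are lost before the split happens.
    """
    return line.strip().split(delimiter)


def split_stripping_newline(line, delimiter="\t"):
    """Hypothetical helper: strip only the line terminator, then split.

    Space-only and empty fields survive as separate columns.
    """
    return line.rstrip("\r\n").split(delimiter)


# The example data from the thread: three rows, tab-delimited.
text = "aa\tbb\n \t \ncc\t"

for line in text.splitlines(keepends=True):
    print(repr(line),
          split_stripping_all(line),
          split_stripping_newline(line))
```

With full stripping, the second and third rows collapse to a single field, so the rows no longer have the same number of values. Stripping only the terminator keeps two fields in every row, which is the behaviour the proposed test would check for.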