Hi,

On Tue, Apr 5, 2011 at 10:56 AM, Charles R Harris
<charlesr.har...@gmail.com> wrote:
>
>
> On Tue, Apr 5, 2011 at 11:45 AM, <josef.p...@gmail.com> wrote:
>>
>> On Tue, Apr 5, 2011 at 1:20 PM, Charles R Harris
>> <charlesr.har...@gmail.com> wrote:
>> >
>> >
>> > On Tue, Apr 5, 2011 at 10:46 AM, Christopher Barker
>> > <chris.bar...@noaa.gov> wrote:
>> >>
>> >> On 4/4/11 10:35 PM, Charles R Harris wrote:
>> >> > IIUC, "Ub" is undefined -- "U" means universal newlines, which makes no
>> >> > sense when used with "b" for binary. I looked at the code a ways back,
>> >> > and I can't remember the resolution order, but there isn't any checking
>> >> > for incompatible flags.
>> >> >
>> >> > I'd expect that genfromtxt, being txt, and line oriented, should use
>> >> > 'rU'. but if it wants the raw line endings (why would it?) then rb
>> >> > should be fine.
>> >> >
>> >> >
>> >> > "U" has been kept around for backwards compatibility, the python
>> >> > documentation recommends that it not be used for new code.
>> >>
>> >> That is for 3.* -- the 2.7.* docs say:
>> >>
>> >> """
>> >> In addition to the standard fopen() values mode may be 'U' or 'rU'.
>> >> Python is usually built with universal newline support; supplying 'U'
>> >> opens the file as a text file, but lines may be terminated by any of the
>> >> following: the Unix end-of-line convention '\n', the Macintosh
>> >> convention '\r', or the Windows convention '\r\n'. All of these external
>> >> representations are seen as '\n' by the Python program. If Python is
>> >> built without universal newline support a mode with 'U' is the same as
>> >> normal text mode. Note that file objects so opened also have an
>> >> attribute called newlines which has a value of None (if no newlines have
>> >> yet been seen), '\n', '\r', '\r\n', or a tuple containing all the
>> >> newline types seen.
>> >>
>> >> Python enforces that the mode, after stripping 'U', begins with 'r', 'w'
>> >> or 'a'.
>> >> """
>> >>
>> >> which does, in fact indicate that 'Ub' is NOT allowed. We should be
>> >> using 'Ur', I think. Maybe the "python enforces" is what we saw the
>> >> error from -- it didn't used to enforce anything.
>> >>
>> >
>> > 'rbU' works and I put that in as a quick fix.
>> >>
>> >> On 4/5/11 7:12 AM, Charles R Harris wrote:
>> >>
>> >> > The 'Ub' mode doesn't work for '\r' on python 3. This may be a bug in
>> >> > python, as it works just fine on python 2.7.
>> >>
>> >> "Ub" never made any sense anywhere -- "U" means universal newline text
>> >> file. "b" means binary -- combining them makes no sense. On older
>> >> pythons, the behaviour of 'Ub' was undefined -- now, it looks like it is
>> >> supposed to raise an error.
>> >>
>> >> does 'Ur' work with \r line endings on Python 3?
>> >
>> > Yes.
>> >
>> >>
>> >> According to my read of the docs, 'U' does nothing -- "universal"
>> >> newline support is supposed to be the default:
>> >>
>> >> """
>> >> On input, if newline is None, universal newlines mode is enabled. Lines
>> >> in the input can end in '\n', '\r', or '\r\n', and these are translated
>> >> into '\n' before being returned to the caller.
>> >> """
>> >>
>> >> > It may indeed be desirable
>> >> > to read the files as text, but that would require more work on both
>> >> > loadtxt and genfromtxt.
>> >>
>> >> Why can't we just open the file with mode 'Ur'? text is text, messing
>> >> with line endings shouldn't hurt anything, and it might help.
>> >>
>> >
>> > Well, text in the files then gets the numpy 'U' type instead of 'S', and
>> > there are places where byte streams are assumed for stripping and such.
>> > Which is to say that changing to text mode requires some work. Another
>> > possibility is to use a generator:
>> >
>> > def usetext(fname):
>> >     f = open(fname, 'rt')
>> >     for l in f:
>> >         yield asbytes(f.next())
>> >
>> > I think genfromtxt could use a refactoring and cleanup, but probably not
>> > for 1.6.
>>
>> I think it should also be possible to read "rb" and strip any \r, \r\n
>> in _iotools.py,
>> that's were the bytes are used, from my reading and the initial error
>> message.
>>
>
> Doesn't work for \r, you get the whole file at once instead of line by line.
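
On the generator idea quoted above: as written, calling f.next() inside the
for loop would skip every other line, so I guess the intent was to yield each
line in turn. Here is a rough Python 3 sketch of that, assuming latin-1 is an
acceptable stand-in for what asbytes does. The function names and the
encoding choice are my assumptions for illustration, not what loadtxt or
genfromtxt currently do.

```python
import os
import tempfile


def iter_text_lines_as_bytes(fname, encoding="latin-1"):
    r"""Yield each line of a text file as bytes, normalised to '\n'.

    Hypothetical helper, not part of numpy. newline=None turns on
    universal newlines, so '\r', '\r\n' and '\n' all end a line.
    """
    with open(fname, "r", encoding=encoding, newline=None) as f:
        for line in f:
            # Yield the current line; calling next(f) here would skip lines.
            yield line.encode(encoding)


def demo():
    """Write the same two rows with three line endings and read them back.

    Returns {name: (lines read in text mode, lines read in 'rb' mode)}.
    """
    results = {}
    endings = [("unix", b"\n"), ("windows", b"\r\n"), ("oldmac", b"\r")]
    for name, eol in endings:
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(b"1 2" + eol + b"3 4" + eol)
            text_mode = list(iter_text_lines_as_bytes(path))
            # Iterating a binary file splits on b'\n' only.
            with open(path, "rb") as f:
                binary_mode = list(f)
            results[name] = (text_mode, binary_mode)
        finally:
            os.remove(path)
    return results
```

Calling demo() gives two lines in text mode for all three endings, while the
'rb' read of the \r file comes back as a single line, which matches the
"whole file at once" behaviour described above.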

Thanks for trying to sort out this ugliness. I've added another pull
request:

https://github.com/numpy/numpy/pull/71

It adds tests for files with \n, \r\n and \r line endings, raising
SkipTest for the \r case, which currently fails on Python 3.2.

Matthew