On Tue, Apr 5, 2011 at 1:20 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
>
>
> On Tue, Apr 5, 2011 at 10:46 AM, Christopher Barker <chris.bar...@noaa.gov>
> wrote:
>>
>> On 4/4/11 10:35 PM, Charles R Harris wrote:
>> > IIUC, "Ub" is undefined -- "U" means universal newlines, which makes no
>> > sense when used with "b" for binary. I looked at the code a ways back,
>> > and I can't remember the resolution order, but there isn't any checking
>> > for incompatible flags.
>> >
>> > I'd expect that genfromtxt, being txt, and line oriented, should use
>> > 'rU'. but if it wants the raw line endings (why would it?) then rb
>> > should be fine.
>> >
>> >
>> > "U" has been kept around for backwards compatibility, the python
>> > documentation recommends that it not be used for new code.
>>
>> That is for 3.* -- the 2.7.* docs say:
>>
>> """
>> In addition to the standard fopen() values mode may be 'U' or 'rU'.
>> Python is usually built with universal newline support; supplying 'U'
>> opens the file as a text file, but lines may be terminated by any of the
>> following: the Unix end-of-line convention '\n', the Macintosh
>> convention '\r', or the Windows convention '\r\n'. All of these external
>> representations are seen as '\n' by the Python program. If Python is
>> built without universal newline support a mode with 'U' is the same as
>> normal text mode. Note that file objects so opened also have an
>> attribute called newlines which has a value of None (if no newlines have
>> yet been seen), '\n', '\r', '\r\n', or a tuple containing all the
>> newline types seen.
>>
>> Python enforces that the mode, after stripping 'U', begins with 'r', 'w'
>> or 'a'.
>> """
>>
>> which does, in fact indicate that 'Ub' is NOT allowed. We should be
>> using 'Ur', I think. Maybe the "python enforces" is what we saw the
>> error from -- it didn't used to enforce anything.
>>
>
> 'rbU' works and I put that in as a quick fix.
>>
>> On 4/5/11 7:12 AM, Charles R Harris wrote:
>>
>> > The 'Ub' mode doesn't work for '\r' on python 3. This may be a bug in
>> > python, as it works just fine on python 2.7.
>>
>> "Ub" never made any sense anywhere -- "U" means universal newline text
>> file. "b" means binary -- combining them makes no sense. On older
>> pythons, the behaviour of 'Ub' was undefined -- now, it looks like it is
>> supposed to raise an error.
>>
>> does 'Ur' work with \r line endings on Python 3?
>
> Yes.
>
>>
>> According to my read of the docs, 'U' does nothing -- "universal"
>> newline support is supposed to be the default:
>>
>> """
>> On input, if newline is None, universal newlines mode is enabled. Lines
>> in the input can end in '\n', '\r', or '\r\n', and these are translated
>> into '\n' before being returned to the caller.
>> """
>>
>> > It may indeed be desirable
>> > to read the files as text, but that would require more work on both
>> > loadtxt and genfromtxt.
>>
>> Why can't we just open the file with mode 'Ur'? text is text, messing
>> with line endings shouldn't hurt anything, and it might help.
>>
>
> Well, text in the files then gets the numpy 'U' type instead of 'S', and
> there are places where byte streams are assumed for stripping and such.
> Which is to say that changing to text mode requires some work. Another
> possibility is to use a generator:
>
> def usetext(fname):
>     with open(fname, 'rt') as f:
>         for l in f:
>             # yield each line exactly once, converted to bytes
>             yield asbytes(l)
>
> I think genfromtxt could use a refactoring and cleanup, but probably not for
> 1.6.
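
For reference, a minimal sketch of the generator idea quoted above, written for Python 3 and not taken from numpy. It assumes the default text mode (newline=None), where '\r', '\n' and '\r\n' all arrive as '\n', and it uses str.encode with latin-1 as a stand-in for asbytes. The name iter_lines_as_bytes, the encoding choice and the demo file are assumptions for illustration only.

```python
import os
import tempfile


def iter_lines_as_bytes(fname, encoding="latin-1"):
    # Hypothetical helper: read as text so Python normalizes line endings,
    # then hand each line back as bytes for code that expects byte strings.
    # newline=None (the Python 3 default) enables universal newlines on input.
    with open(fname, "r", encoding=encoding, newline=None) as f:
        for line in f:
            yield line.encode(encoding)


# Demo file mixing old-Mac (\r), Windows (\r\n) and Unix (\n) line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"1 2\r3 4\r\n5 6\n")
    demo_path = tmp.name

demo_lines = list(iter_lines_as_bytes(demo_path))
os.remove(demo_path)
print(demo_lines)  # [b'1 2\n', b'3 4\n', b'5 6\n']
```

Every line comes back ending in b'\n' regardless of the convention used in the file, which is the behaviour the byte-oriented stripping code would need.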
I think it should also be possible to read "rb" and strip any \r, \r\n
in _iotools.py, that's where the bytes are used, from my reading and the
initial error message.

Josef

>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
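
A rough sketch of the "read 'rb' and handle \r, \r\n yourself" suggestion, assuming the whole input fits in memory. This is not the _iotools.py code; normalize_newlines and the sample data are hypothetical and only illustrate the byte-level approach.

```python
def normalize_newlines(raw):
    # Hypothetical helper operating purely on bytes.
    # Replace b'\r\n' first so a Windows ending does not become two newlines,
    # then convert any remaining bare b'\r' (old Mac convention).
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Bytes as they would come from a file opened with mode 'rb'.
raw = b"1 2\r3 4\r\n5 6\n"
normalized = normalize_newlines(raw)
rows = normalized.splitlines()
print(rows)  # [b'1 2', b'3 4', b'5 6']
```

The data stays bytes throughout, so fields would still end up as numpy 'S' rather than 'U', at the cost of doing the newline handling by hand.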