One other caveat here: "line" contains the newline at the end, so
you might use
print line.rstrip('\r\n')
to remove them.
I don't understand the presence of the '\r' there. Any '\x0d' that
remains after reading the file in text mode and is removed by that
rstrip would be a strange occurrence in the data which the OP may
prefer to find out about and deal with; it is not part of "the
newline". Why suppress one particular data character in preference to
others?
In an ideal world where everybody knew how to make a proper
text-file, it wouldn't be an issue. Recreating the form of some
of the data I get from customers/providers:
>>> f = file('tmp/x.txt', 'wb')
>>> f.write('headers\n') # headers in Unix format
>>> f.write('data1\r\n') # data in Dos format
>>> f.write('data2\r\n')
>>> f.write('data3') # no trailing newline of any sort
>>> f.close()
Then reading it back in:
>>> for line in file('tmp/x.txt'): print repr(line)
...
'headers\n'
'data1\r\n'
'data2\r\n'
'data3'
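For anyone reproducing this on Python 3, here is a rough sketch of the same experiment. It assumes Python 3's open() (the session above is Python 2), and the temp-directory path is just a stand-in for 'tmp/x.txt'. As far as I know, default text mode translates '\r\n' to '\n' on read, so you need newline='' to see the stray '\r' at all:

```python
import os
import tempfile

# Hypothetical scratch file; the original session used 'tmp/x.txt'.
path = os.path.join(tempfile.mkdtemp(), 'x.txt')

with open(path, 'wb') as f:
    f.write(b'headers\n')   # headers in Unix format
    f.write(b'data1\r\n')   # data in DOS format
    f.write(b'data2\r\n')
    f.write(b'data3')       # no trailing newline of any sort

# newline='' turns off newline translation, so '\r\n' comes back untouched.
with open(path, newline='') as f:
    raw_lines = list(f)

# Default text mode (universal newlines) rewrites '\r\n' as '\n'.
with open(path) as f:
    translated_lines = list(f)

for line in raw_lines:
    print(repr(line))
```

In that setting rstrip('\n') alone would be enough for the translated lines; the '\r' only matters when translation is off or the file is read in binary mode.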
As for wanting to know about stray '\r' characters, I only want
the data -- I don't particularly like to be reminded of the
incompetence of those who send me malformed text-files ;-)
The same applies in any case to the use of rstrip('\n'); if that finds
more than one occurrence of '\x0a' to remove, it has exceeded the
mandate of removing the newline (if any).
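A small illustration of the point, for what it's worth: rstrip() treats its argument as a set of characters and keeps stripping as long as the last character is in that set, so it can remove more than one terminator. The sample strings here are made up for the demonstration:

```python
# rstrip() takes a set of characters, not a suffix to remove once.
double_newline = 'data\n\n'
after_lf_strip = double_newline.rstrip('\n')    # both '\n' are removed

messy = 'data\r\r\n'
after_crlf_strip = messy.rstrip('\r\n')         # every trailing '\r' and '\n' goes

# An embedded '\r' is left alone because stripping stops at the first
# character that is not in the set.
embedded = 'da\rta\r\n'
after_embedded_strip = embedded.rstrip('\r\n')

print(repr(after_lf_strip), repr(after_crlf_strip), repr(after_embedded_strip))
```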
I believe that using the formulaic "for line in file(FILENAME)"
iteration guarantees that each "line" will have at most one
'\n', and that it will be at the end (again, a malformed text-file
with no terminal '\n' may cause it to be absent from the last line).
So, we are left with the unfortunately awkward
if line.endswith('\n'):
    line = line[:-1]
You're welcome to it, but I'll stick with my more DWIM solution
of "get rid of anything that resembles an attempt at a CR/LF".
Thank goodness I haven't found any of my data-sources using
"\n\r" instead, which would require me to left-strip '\r'
characters as well. Sigh. My kingdom for competency. :-/
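A possible middle ground between the two positions, sketched under my own assumptions: strip_one_eol is a hypothetical helper name, not anything from the standard library, and it removes at most one terminator ('\r\n', '\n' or a bare '\r') instead of everything that looks like one:

```python
def strip_one_eol(line):
    # Remove at most one trailing line terminator.
    # '\r\n' is tested first so a DOS ending is removed as a unit.
    for eol in ('\r\n', '\n', '\r'):
        if line.endswith(eol):
            return line[:-len(eol)]
    return line


dos_line = strip_one_eol('data1\r\n')        # DOS ending removed
unterminated = strip_one_eol('data3')        # nothing to remove
one_only = strip_one_eol('blank\n\n')        # only the last '\n' removed
greedy = 'blank\n\n'.rstrip('\r\n')          # rstrip removes both

print(repr(dos_line), repr(unterminated), repr(one_only), repr(greedy))
```

Whether the extra function is worth it over a plain rstrip('\r\n') depends on whether trailing '\r' or '\n' characters can ever be real data in your files.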
-tkc