Re: Irregular last line in a text file, was Re: Regular expressions
On 2015-11-04 14:39, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 03:56, Tim Chase wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>       assert all(not line.endswith(("\n", "\r")) for line in f)
>
> # Works only on Windows text files.
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip('\r\n')

.rstrip() takes a string that is a set of characters, so it will
remove any \r or \n at the end of the string (so it works with both
Windows & *nix line-endings), whereas just using .rstrip() without a
parameter can throw away data you might want:

>>> "hello \r\n\r\r\n\n\n".rstrip("\r\n")
'hello '
>>> "hello \r\n\r\r\n\n\n".rstrip()
'hello'

-tkc

--
https://mail.python.org/mailman/listinfo/python-list
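As a side note on the character-set behaviour described above, here is a small sketch contrasting the two `rstrip` forms with a helper that removes at most one line terminator. The helper name `strip_one_eol` is hypothetical and not something proposed in the thread; it only illustrates that `rstrip('\r\n')` removes every trailing `\r` and `\n`, not just one line ending.

```python
def strip_one_eol(line):
    """Remove at most one trailing line terminator: CR LF, LF, or CR."""
    # Check the two-character terminator first so "\r\n" is removed as a unit.
    for eol in ("\r\n", "\n", "\r"):
        if line.endswith(eol):
            return line[:-len(eol)]
    return line


sample = "hello \r\n\n"

# With an argument, rstrip treats it as a set of characters to remove.
print(repr(sample.rstrip("\r\n")))   # 'hello '
# With no argument, rstrip removes all trailing whitespace, including the space.
print(repr(sample.rstrip()))         # 'hello'
# The hypothetical helper removes only the final "\n".
print(repr(strip_one_eol(sample)))   # 'hello \r\n'
```

For lines produced by iterating over a file, each line carries at most one terminator, so `rstrip('\r\n')` and the helper should normally give the same result; the difference only shows up on strings with several trailing terminator characters.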
Re: Irregular last line in a text file, was Re: Regular expressions
On 4 November 2015 at 03:39, Steven D'Aprano wrote:
>
> Better would be this:
>
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip()  # remove all trailing whitespace
>
>
> with open(...) as f:
>     for line in chomp(f): ...

with open(...) as f:
    for line in map(str.rstrip, f): ...

--
Oscar
Re: Irregular last line in a text file, was Re: Regular expressions
On Wednesday 04 November 2015 03:56, Tim Chase wrote:

> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>       assert all(not line.endswith(("\n", "\r")) for line in f)

# Works only on Windows text files.
def chomp(lines):
    for line in lines:
        yield line.rstrip('\r\n')

Better would be this:

def chomp(lines):
    for line in lines:
        yield line.rstrip()  # remove all trailing whitespace

with open(...) as f:
    for line in chomp(f): ...

--
Steve
Re: Irregular last line in a text file, was Re: Regular expressions
On 2015-11-03, Tim Chase wrote:

[re. iterating over lines in a file]

> I can't think of more than 1-2 times in my last 10+ years of
> Pythoning that I've actually had potential use for the newlines,

If you can think of 1-2 times when you've been iterating over the
lines in a file and wanted to see the EOL markers, then that's 1-2
times more than I've ever wanted to see them since I started using
Python 16 years ago...

--
Grant Edwards   grant.b.edwards at gmail.com
Yow! ! Up ahead! It's a DONUT HUT!!
Re: Irregular last line in a text file, was Re: Regular expressions
On 2015-11-03 11:39, Ian Kelly wrote:
> >> because I have countless loops that look something like
> >>
> >>   with open(...) as f:
> >>       for line in f:
> >>           line = line.rstrip('\r\n')
> >>           process(line)
> >
> > What would happen if you read a file opened like this without
> > iterating over lines?
>
> I think I'd go with this:
>
> >>> def strip_newlines(iterable):
> ...     for line in iterable:
> ...         yield line.rstrip('\r\n')
> ...

Behind the scenes, this is what I usually end up doing, but the
effective logic is the same. I just like the notion of being able to
tell open() that I want iteration to happen over the *content* of the
lines, ignoring the new-line delimiters. I can't think of more than
1-2 times in my last 10+ years of Pythoning that I've actually had
potential use for the newlines, usually on account of simply feeding
the entire line back into some filelike.write() method where I wanted
the newlines in the resulting file. But even in those cases, I seem
to recall stripping off the arbitrary newlines (LF vs. CR/LF) and
then adding my own known line delimiter.

-tkc
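The "strip the arbitrary newline, then add a known delimiter" approach mentioned above could look roughly like the sketch below. The function name `rewrite_with_eol` and its `eol` parameter are assumptions made for illustration; nothing in the thread names such a helper.

```python
import io


def rewrite_with_eol(lines, out, eol="\n"):
    """Write each line to `out`, replacing its terminator with `eol`.

    Hypothetical helper: `lines` is any iterable of strings and `out`
    is any object with a write() method.
    """
    for line in lines:
        # Drop whatever terminator the line had (LF, CR, or CR LF) ...
        content = line.rstrip("\r\n")
        # ... and append the one delimiter we actually want.
        out.write(content + eol)


out = io.StringIO()
rewrite_with_eol(["one\r\n", "two\n", "three"], out)
print(repr(out.getvalue()))  # 'one\ntwo\nthree\n'
```

Note that this also gives the final line a terminator even if the input lacked one, which may or may not be what is wanted.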
Re: Irregular last line in a text file, was Re: Regular expressions
On Tue, Nov 3, 2015 at 11:33 AM, Ian Kelly wrote:
> On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>       assert all(not line.endswith(("\n", "\r")) for line in f)
>>
>> because I have countless loops that look something like
>>
>>   with open(...) as f:
>>       for line in f:
>>           line = line.rstrip('\r\n')
>>           process(line)
>
> What would happen if you read a file opened like this without
> iterating over lines?

I think I'd go with this:

>>> def strip_newlines(iterable):
...     for line in iterable:
...         yield line.rstrip('\r\n')
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Or if I care about optimizing the for loop (but we're talking about
file I/O, so probably not), this might be faster:

>>> import operator
>>> def strip_newlines(iterable):
...     return map(operator.methodcaller('rstrip', '\r\n'), iterable)
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Then the iteration is just:

for line in strip_newlines(f):
Re: Irregular last line in a text file, was Re: Regular expressions
On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase wrote:
> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>>
>>   with open(..., newline="normalize") as f:
>>       assert all(line.endswith("\n") for line in f)
>>
>> to ensure that all lines end with "\n"?
>
> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>       assert all(not line.endswith(("\n", "\r")) for line in f)
>
> because I have countless loops that look something like
>
>   with open(...) as f:
>       for line in f:
>           line = line.rstrip('\r\n')
>           process(line)

What would happen if you read a file opened like this without
iterating over lines?
Re: Irregular last line in a text file, was Re: Regular expressions
Tim Chase wrote:

> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>>
>>   with open(..., newline="normalize") as f:
>>       assert all(line.endswith("\n") for line in f)
>>
>> to ensure that all lines end with "\n"?
>
> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>       assert all(not line.endswith(("\n", "\r")) for line in f)
>
> because I have countless loops that look something like
>
>   with open(...) as f:
>       for line in f:
>           line = line.rstrip('\r\n')
>           process(line)

Indeed. It's obvious now you're saying it...
Re: Irregular last line in a text file, was Re: Regular expressions
On 2015-11-03 16:35, Peter Otten wrote:
> I wish there were a way to prohibit such files. Maybe a special
> value
>
>   with open(..., newline="normalize") as f:
>       assert all(line.endswith("\n") for line in f)
>
> to ensure that all lines end with "\n"?

Or even more valuable to me:

  with open(..., newline="strip") as f:
      assert all(not line.endswith(("\n", "\r")) for line in f)

because I have countless loops that look something like

  with open(...) as f:
      for line in f:
          line = line.rstrip('\r\n')
          process(line)

-tkc
Re: Irregular last line in a text file, was Re: Regular expressions
Peter Otten writes:

> Jussi Piitulainen wrote:
>> Peter Otten writes:
>>
>>> If a "line" is defined as a string that ends with a newline
>>>
>>> def ends_in_asterisk(line):
>>>     return False
>>>
>>> would also satisfy the requirement. Lies, damned lies, and specs ;)
>>
>> Even if a "line" is defined as a string that comes from reading
>> something like a file with default options, a line may end in
>> an asterisk.
>
> Note that the last line from the file is not a line as defined by me
> in the above post ;)

Noted.

>> >>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
>> [False, False, True]
>
> I wish there were a way to prohibit such files. Maybe a special value
>
>   with open(..., newline="normalize") as f:
>       assert all(line.endswith("\n") for line in f)
>
> to ensure that all lines end with "\n"?

I'd like that. It should be the default.
Irregular last line in a text file, was Re: Regular expressions
Jussi Piitulainen wrote:

> Peter Otten writes:
>
>> If a "line" is defined as a string that ends with a newline
>>
>> def ends_in_asterisk(line):
>>     return False
>>
>> would also satisfy the requirement. Lies, damned lies, and specs ;)
>
> Even if a "line" is defined as a string that comes from reading
> something like a file with default options, a line may end in
> an asterisk.

Note that the last line from the file is not a line as defined by me
in the above post ;)

> >>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
> [False, False, True]

I wish there were a way to prohibit such files. Maybe a special value

  with open(..., newline="normalize") as f:
      assert all(line.endswith("\n") for line in f)

to ensure that all lines end with "\n"?
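For reference, `newline="normalize"` is a wished-for value here, not one that `open()` accepts. A rough approximation of the idea can be had with a wrapper generator; the sketch below uses the hypothetical name `normalize_eol` and reuses the `StringIO` sample from the quoted example.

```python
import io


def normalize_eol(lines):
    """Yield each line with its terminator (if any) replaced by one LF.

    Hypothetical helper approximating the wished-for behaviour: the
    unterminated last line of a file also comes out ending in a newline.
    """
    for line in lines:
        yield line.rstrip("\r\n") + "\n"


# The last line of this sample has no terminator, so it ends in '*'.
f = io.StringIO('rivi*\nrivi*\nrivi*')
print([line.endswith('*') for line in f])   # [False, False, True]

# After normalizing, every line ends with a newline.
f = io.StringIO('rivi*\nrivi*\nrivi*')
assert all(line.endswith("\n") for line in normalize_eol(f))
```

Unlike a real `open()` option, this only affects iteration over lines; calling `f.read()` on the underlying file would still return the text unchanged.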