Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-04 Thread Tim Chase
On 2015-11-04 14:39, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 03:56, Tim Chase wrote:
>> Or even more valuable to me:
>> 
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
> 
> # Works only on Windows text files.
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip('\r\n')

.rstrip() takes a string that is a set of characters, so it will
remove any \r or \n at the end of the string (so it works with
both Windows & *nix line-endings) whereas just using .rstrip()
without a parameter can throw away data you might want:

  >>> "hello \r\n\r\r\n\n\n".rstrip("\r\n")
  'hello '
  >>> "hello \r\n\r\r\n\n\n".rstrip()
  'hello'
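
Note that rstrip("\r\n") removes every trailing \r and \n, not just one
line ending, as the first example above shows. If only a single
terminator should go, a minimal sketch could look like this (chomp_one
is a name made up for illustration, not a standard-library function):

```python
def chomp_one(line):
    """Remove at most one trailing line terminator (CRLF, LF, or CR).

    Hypothetical helper for illustration; not part of the standard library.
    """
    # Check the two-character terminator first so "\r\n" goes as a unit.
    for terminator in ("\r\n", "\n", "\r"):
        if line.endswith(terminator):
            return line[:-len(terminator)]
    return line


print(repr(chomp_one("hello \r\n")))  # 'hello '
print(repr(chomp_one("hello")))       # 'hello'
```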

-tkc




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-04 Thread Oscar Benjamin
On 4 November 2015 at 03:39, Steven D'Aprano wrote:
>
> Better would be this:
>
> def chomp(lines):
>     for line in lines:
>         yield line.rstrip()  # remove all trailing whitespace
>
>
> with open(...) as f:
>     for line in chomp(f): ...

with open(...) as f:
    for line in map(str.rstrip, f): ...
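
A runnable sketch of the one-liner, with io.StringIO standing in for a
real file (that stand-in and the sample text are assumptions for the
demo). It also shows what the no-argument form costs compared with
passing the character set:

```python
import io

# io.StringIO stands in for a real file (an assumption for this demo).
f = io.StringIO("one \r\ntwo\t\nthree")

# str.rstrip with no argument strips all trailing whitespace,
# including the space after "one" and the tab after "two".
stripped = list(map(str.rstrip, f))
print(stripped)  # ['one', 'two', 'three']

f.seek(0)
# Passing the character set keeps other trailing whitespace intact.
kept = [line.rstrip("\r\n") for line in f]
print(kept)  # ['one ', 'two\t', 'three']
```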

--
Oscar


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Steven D'Aprano
On Wednesday 04 November 2015 03:56, Tim Chase wrote:

> Or even more valuable to me:
> 
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)

# Works only on Windows text files.
def chomp(lines):
    for line in lines:
        yield line.rstrip('\r\n')


Better would be this:

def chomp(lines):
    for line in lines:
        yield line.rstrip()  # remove all trailing whitespace


with open(...) as f:
    for line in chomp(f): ...


-- 
Steve



Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Grant Edwards
On 2015-11-03, Tim Chase wrote:

[re. iterating over lines in a file]

> I can't think of more than 1-2 times in my last 10+ years of
> Pythoning that I've actually had potential use for the newlines,

If you can think of 1-2 times when you've been iterating over the
lines in a file and wanted to see the EOL markers, then that's 1-2
times more than I've ever wanted to see them since I started using
Python 16 years ago...

-- 
Grant Edwards               grant.b.edwards        Yow! !  Up ahead!  It's a
                                  at               DONUT HUT!!
                              gmail.com


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-03 11:39, Ian Kelly wrote:
> >> because I have countless loops that look something like
> >>
> >>   with open(...) as f:
> >>     for line in f:
> >>       line = line.rstrip('\r\n')
> >>       process(line)
> >
> > What would happen if you read a file opened like this without
> > iterating over lines?
>
> I think I'd go with this:
>
> >>> def strip_newlines(iterable):
> ...     for line in iterable:
> ...         yield line.rstrip('\r\n')
> ...

Behind the scenes, this is what I usually end up doing, but the
effective logic is the same.  I just like the notion of being able to
tell open() that I want iteration to happen over the *content* of
the lines, ignoring the new-line delimiters.

I can't think of more than 1-2 times in my last 10+ years of
Pythoning that I've actually had potential use for the newlines,
usually on account of simply feeding the entire line back into some
filelike.write() method where I wanted the newlines in the resulting
file. But even in those cases, I seem to recall stripping off the
arbitrary newlines (LF vs. CR/LF) and then adding my own known line
delimiter.
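
That strip-then-re-add pattern can be sketched roughly as follows. The
function name rewrite_with_newline and the sample lines are assumptions
for illustration, and io.StringIO stands in for the output file:

```python
import io


def rewrite_with_newline(lines, out, newline="\n"):
    """Write each line to `out`, replacing its terminator with `newline`."""
    for line in lines:
        # Drop whatever terminator the line arrived with (LF, CR/LF, or
        # none), then append the one known delimiter.
        out.write(line.rstrip("\r\n") + newline)


src = ["one\r\n", "two\n", "three"]
out = io.StringIO()
rewrite_with_newline(src, out)
print(repr(out.getvalue()))  # 'one\ntwo\nthree\n'
```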

-tkc





Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Ian Kelly
On Tue, Nov 3, 2015 at 11:33 AM, Ian Kelly wrote:
> On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
>>
>> because I have countless loops that look something like
>>
>>   with open(...) as f:
>>     for line in f:
>>       line = line.rstrip('\r\n')
>>       process(line)
>
> What would happen if you read a file opened like this without
> iterating over lines?

I think I'd go with this:

>>> def strip_newlines(iterable):
...     for line in iterable:
...         yield line.rstrip('\r\n')
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Or if I care about optimizing the for loop (but we're talking about
file I/O, so probably not), this might be faster:

>>> import operator
>>> def strip_newlines(iterable):
...     return map(operator.methodcaller('rstrip', '\r\n'), iterable)
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Then the iteration is just:
for line in strip_newlines(f):
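
A runnable sketch of that loop, with io.StringIO standing in for the
file (an assumption for the demo, as is the sample text). It also shows
that under the default newline=None the "\r\n" and "\r" endings are
already translated to "\n" before the lines reach the loop, so the "\r"
in the character set only matters when translation is off:

```python
import io
import operator


def strip_newlines(iterable):
    # Same approach as the map() version above.
    return map(operator.methodcaller("rstrip", "\r\n"), iterable)


raw = "one\r\ntwo\rthree\n"

# newline=None (the default for open()) translates "\r\n" and "\r" to "\n".
translated = list(io.StringIO(raw, newline=None))
# newline="" still splits on all three terminators but leaves them as-is.
untranslated = list(io.StringIO(raw, newline=""))

print(translated)    # ['one\n', 'two\n', 'three\n']
print(untranslated)  # ['one\r\n', 'two\r', 'three\n']
print(list(strip_newlines(untranslated)))  # ['one', 'two', 'three']
```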


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Ian Kelly
On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase wrote:
> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>>
>> with open(..., newline="normalize") as f:
>>     assert all(line.endswith("\n") for line in f)
>>
>> to ensure that all lines end with "\n"?
>
> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)
>
> because I have countless loops that look something like
>
>   with open(...) as f:
>     for line in f:
>       line = line.rstrip('\r\n')
>       process(line)

What would happen if you read a file opened like this without
iterating over lines?


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Peter Otten
Tim Chase wrote:

> On 2015-11-03 16:35, Peter Otten wrote:
>> I wish there were a way to prohibit such files. Maybe a special
>> value
>> 
>> with open(..., newline="normalize") as f:
>>     assert all(line.endswith("\n") for line in f)
>>
>> to ensure that all lines end with "\n"?
>
> Or even more valuable to me:
>
>   with open(..., newline="strip") as f:
>     assert all(not line.endswith(("\n", "\r")) for line in f)
>
> because I have countless loops that look something like
>
>   with open(...) as f:
>     for line in f:
>       line = line.rstrip('\r\n')
>       process(line)

Indeed. It's obvious now you're saying it...



Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Tim Chase
On 2015-11-03 16:35, Peter Otten wrote:
> I wish there were a way to prohibit such files. Maybe a special
> value
> 
> with open(..., newline="normalize") as f:
>     assert all(line.endswith("\n") for line in f)
> 
> to ensure that all lines end with "\n"?

Or even more valuable to me:

  with open(..., newline="strip") as f:
    assert all(not line.endswith(("\n", "\r")) for line in f)

because I have countless loops that look something like

  with open(...) as f:
    for line in f:
      line = line.rstrip('\r\n')
      process(line)
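
open() has no newline="strip" mode, so the closest thing today is a
small wrapper. A minimal sketch, assuming a made-up helper name
stripped_lines and a throwaway demo file:

```python
import os
import tempfile


def stripped_lines(path, encoding="utf-8"):
    """Yield each line of the file at `path` without its line terminator.

    Hypothetical helper; open() has no newline="strip" mode.
    """
    # newline="" turns off newline translation, so "\r\n", "\n" and "\r"
    # all still end a line but arrive unchanged.
    with open(path, encoding=encoding, newline="") as f:
        for line in f:
            yield line.rstrip("\r\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\nthree\rfour")
    result = list(stripped_lines(path))

print(result)  # ['one', 'two', 'three', 'four']
assert all(not line.endswith(("\n", "\r")) for line in result)
```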

-tkc






Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Jussi Piitulainen
Peter Otten writes:
> Jussi Piitulainen wrote:
>> Peter Otten writes:
>> 
>>> If a "line" is defined as a string that ends with a newline
>>>
>>> def ends_in_asterisk(line):
>>>     return False
>>>
>>> would also satisfy the requirement. Lies, damned lies, and specs ;)
>> 
>> Even if a "line" is defined as a string that comes from reading
>> something like a file with default options, a line may end in
>> an asterisk.
>  
> Note that the last line from the file is not a line as defined by me
> in the above post ;)

Noted.

>> >>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
>> [False, False, True]
>
> I wish there were a way to prohibit such files. Maybe a special value
>
> with open(..., newline="normalize") as f:
>     assert all(line.endswith("\n") for line in f)
>
> to ensure that all lines end with "\n"?

I'd like that. It should be the default.


Irregular last line in a text file, was Re: Regular expressions

2015-11-03 Thread Peter Otten
Jussi Piitulainen wrote:

> Peter Otten writes:
> 
>> If a "line" is defined as a string that ends with a newline
>>
>> def ends_in_asterisk(line):
>>     return False
>>
>> would also satisfy the requirement. Lies, damned lies, and specs ;)
> 
> Even if a "line" is defined as a string that comes from reading
> something like a file with default options, a line may end in
> an asterisk.
 
Note that the last line from the file is not a line as defined by me in the 
above post ;)

> >>> [ line.endswith('*') for line in StringIO('rivi*\nrivi*\nrivi*') ]
> [False, False, True]

I wish there were a way to prohibit such files. Maybe a special value

with open(..., newline="normalize") as f:
    assert all(line.endswith("\n") for line in f)

to ensure that all lines end with "\n"?
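
No such newline value exists in open(), but the check itself is easy to
bolt on. A rough sketch, assuming a made-up generator name
require_newlines and io.StringIO in place of a file:

```python
import io


def require_newlines(lines):
    """Yield lines unchanged; raise ValueError if one lacks a trailing newline.

    Hypothetical helper; open() accepts no newline="normalize" value.
    """
    for number, line in enumerate(lines, start=1):
        if not line.endswith("\n"):
            raise ValueError(f"line {number} has no trailing newline: {line!r}")
        yield line


good = io.StringIO("rivi*\nrivi*\n")
print(list(require_newlines(good)))  # ['rivi*\n', 'rivi*\n']

bad = io.StringIO("rivi*\nrivi*\nrivi*")
try:
    list(require_newlines(bad))
except ValueError as exc:
    print(exc)
```

This rejects the irregular file rather than repairing it; appending the
missing "\n" instead would be an equally small change.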

