On 8 August 2018 at 07:32, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Aug 8, 2018 at 4:14 PM, Ken Hilton <kenlhil...@gmail.com> wrote:
>>
>> Now, let's take a look at the following scenario:
>>
>>     def read_multiple(*filenames):
>>         for filename in filenames:
>>             with open(filename) as f:
>>                 yield f.read()
>>
>> Can you spot the problem? The "with open(filename)" statement is
>> supposed to ensure that the file object is disposed of properly.
>> However, the "yield f.read()" statement suspends execution within the
>> with block, so if this happened:
>>
>>     for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
>>         if contents == 'hello':
>>             break
>>
>> and the contents of "chunk2" were "hello" then the loop would exit, and
>> "chunk2" would never be closed! Yielding inside a with block, therefore,
>> doesn't make sense and can only lead to obscure bugs.
>
> This is only a problem if you consider it to be one. The value of the
> 'with' statement is not destroyed; for example, you're capped at _one_
> open file (whereas without the context manager, it's entirely possible
> for file-open in a loop to leak a large number of handles).
Without the context manager you could write:

    def read_multiple(*filenames):
        for filename in filenames:
            f = open(filename)
            yield f.read()
            f.close()

which also only leaks one file descriptor. The point of the with
statement is that this was considered unsatisfactory, but when you
yield from inside a with block the advantage of with is lost.

In fact, if you're happy depending on garbage collection for cleanup,
you can write this more conveniently:

    def read_multiple(*filenames):
        for filename in filenames:
            yield open(filename).read()

Or even:

    for text in (open(filename).read() for filename in filenames):
        # stuff

In any case, saying that you only leak one file descriptor misses the
point. The with statement can do many different things, and its purpose
is to guarantee *without depending on garbage collection* that the
__exit__ method will be called. Yielding from inside the with block
circumvents that guarantee.

>> I believe all possible cases where one would yield inside a context
>> manager can be covered by saving anything required from the context
>> manager and then yielding the results outside. Therefore, I propose
>> making a "yield" inside a with block become a SyntaxError.
>
> What about this:
>
>     def read_words(*filenames):
>         for filename in filenames:
>             with open(filename) as f:
>                 for line in f:
>                     yield from line.split()
>
> It'll read from a series of files and yield individual words (ignoring
> annoying little things like punctuation and hyphenation for the sake
> of simplicity). You are assuming that every yield-inside-with is a
> *single* yield.

Actually the general way to solve this problem is to move all context
managers out of iterators/generators. Instead of a helper generator,
what you really want in this situation is a helper context manager.
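The deferred cleanup that this guarantee is about can be seen directly in a
minimal sketch (the Tracker class and events list are my own illustration,
not from the proposal):

```python
# Minimal sketch: a context manager that records when __enter__/__exit__
# run, used inside a generator. Tracker and events are hypothetical
# names for illustration only.
events = []

class Tracker:
    def __enter__(self):
        events.append("enter")
        return self
    def __exit__(self, *exc):
        events.append("exit")
        return False

def make_items():
    with Tracker():
        yield 1
        yield 2

g = make_items()
next(g)          # generator is now suspended inside the with block
print(events)    # ['enter'] -- __exit__ has not run yet
g.close()        # GeneratorExit is raised at the yield,
                 # which finally runs __exit__
print(events)    # ['enter', 'exit']
```

If a caller simply abandons the loop without closing the generator, that
__exit__ call is postponed until the generator object is finalized, which
is exactly the reliance on garbage collection that with is meant to avoid.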
I think the reason this isn't always used is just because it's a bit
harder to write a context manager than a generator, e.g.:

    from contextlib import contextmanager

    @contextmanager
    def open_cat_split(*filenames):
        f = None
        def get_words():
            nonlocal f
            for filename in filenames:
                with open(filename) as f:
                    for line in f:
                        yield from line.split()
            else:
                f = None
        try:
            yield get_words()
        finally:
            if f is not None:
                f.close()

Then you can do:

    with open_cat_split('file1.txt', 'file2.txt') as words:
        for word in words:
            print(word)

I am not sure exactly what the best primitive would be to make this
sort of thing easier. I most often see examples of this when someone
wants to process multiple files concatenated. The stdlib fileinput
module has a context manager that *almost* works:

https://docs.python.org/3.7/library/fileinput.html#fileinput.input

The corner case that isn't handled correctly by fileinput.input is
that when the list of filenames is empty it uses sys.argv or stdin,
which is unlikely to be what someone wants in most of these
situations. You could fix that with:

    from contextlib import contextmanager
    import fileinput

    def open_cat(*filenames):
        if not filenames:
            @contextmanager
            def dummy():
                yield ()
            return dummy()
        else:
            return fileinput.input(filenames)

And with that you could do:

    def map_split(strings):
        for string in strings:
            yield from string.split()

    with open_cat('file1.txt', 'file2.txt') as lines:
        for word in map_split(lines):
            print(word)

Perhaps actually what is wanted is a more generic contextlib helper.
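As a side note, on Python 3.7+ the hand-rolled dummy() helper could likely
be replaced by contextlib.nullcontext; a sketch of that variant (my
suggestion, not part of the original mail):

```python
from contextlib import nullcontext
import fileinput

def open_cat(*filenames):
    # With no filenames, yield an empty iterable rather than letting
    # fileinput.input fall back to sys.argv / stdin.
    if not filenames:
        return nullcontext(())
    return fileinput.input(filenames)

# The empty case now behaves sensibly:
with open_cat() as lines:
    print(list(lines))    # []
```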
I guess ExitStack is intended for this sort of thing, but it's a bit
unwieldy to use:

https://docs.python.org/3.7/library/contextlib.html#contextlib.ExitStack

We can use ExitStack to make something more convenient though:

    from contextlib import ExitStack, contextmanager

    @contextmanager
    def map_with(func, iterable):
        stack = ExitStack()
        def g():
            for arg in iterable:
                stack.close()
                iterable_cm = func(arg)
                stack.push(iterable_cm)
                yield iterable_cm
        with stack:
            yield g()

Then you can do something like:

    def cat_split(files):
        for f in files:
            for line in f:
                yield from line.split()

    with map_with(open, ['file1.txt', 'file2.txt']) as files:
        for word in cat_split(files):
            print(word)

--
Oscar
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/