Hi Oscar,

Thanks for the comments! Can I ask that you hold onto them until I post
to python-ideas, though? (Should be later today.) It's a discussion
worth having, but if we have it here then we'll just end up having to
repeat it there anyway :-).
-n

On Mon, Oct 17, 2016 at 5:04 AM, Oscar Benjamin
<oscar.j.benja...@gmail.com> wrote:
> On 17 October 2016 at 09:08, Nathaniel Smith <n...@pobox.com> wrote:
>> Hi all,
>>
>> I've been poking at an idea for changing how 'for' loops work to
>> hopefully make them work better for pypy and async/await code. I
>> haven't taken it to python-ideas yet -- this is its first public
>> outing, actually -- but since it directly addresses pypy GC issues I
>> thought I'd send around a draft to see what you think. (E.g., would
>> this be something that makes your life easier?)
>
> To be clear, I'm not a PyPy dev so I'm just answering from a general
> Python perspective here.
>
>> Always inject resources, and do all cleanup at the top level
>> ------------------------------------------------------------
>>
>> It was suggested on python-dev (XX find link) that a pattern to avoid
>> these problems is to always pass resources in from above, e.g.
>> ``read_newline_separated_json`` should take a file object rather than
>> a path, with cleanup handled at the top level::
>
> I suggested this and I still think that it is the best idea.
>
>>     def read_newline_separated_json(file_handle):
>>         for line in file_handle:
>>             yield json.loads(line)
>>
>>     def read_users(file_handle):
>>         for document in read_newline_separated_json(file_handle):
>>             yield User.from_json(document)
>>
>>     with open(path) as file_handle:
>>         for user in read_users(file_handle):
>>             ...
>>
>> This works well in simple cases; here it lets us avoid the "N+1
>> problem". But unfortunately, it breaks down quickly when things get
>> more complex. Consider if instead of reading from a file, our
>> generator was processing the body returned by an HTTP GET request --
>> while handling redirects and authentication via OAUTH. Then we'd
>> really want the sockets to be managed down inside our HTTP client
>> library, not at the top level. Plus there are other cases where
>> ``finally`` blocks embedded inside generators are important in their
>> own right: db transaction management, emitting logging information
>> during cleanup (one of the major motivating use cases for WSGI
>> ``close``), and so forth.
>
> I haven't written the kind of code that you're describing so I can't
> say exactly how I would do it. I imagine, though, that helpers could
> be used to solve some of the problems that you're referring to.
> Here's a case I do know where the above suggestion is awkward:
>
>     def concat(filenames):
>         for filename in filenames:
>             with open(filename) as inputfile:
>                 yield from inputfile
>
>     for line in concat(filenames):
>         ...
>
> It's still possible to safely handle this use case by creating a
> helper, though. fileinput.input almost does what you want:
>
>     with fileinput.input(filenames) as lines:
>         for line in lines:
>             ...
>
> Unfortunately, if filenames is empty this will default to sys.stdin,
> so it's not perfect, but really I think introducing useful helpers
> for common cases (rather than core language changes) should be
> considered as the obvious solution here. Generally it would have been
> better if the discussion for PEP 525 had focussed more on helping
> people to debug/fix dependence on __del__ rather than trying to
> magically fix broken code.
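For concreteness, here is one shape such a helper could take. This is a
minimal sketch, not an existing stdlib API -- the name ``open_each`` is
made up. Like fileinput.input it is used as a context manager, so the
files' cleanup is tied to a top-level ``with`` block and runs promptly
even if the caller breaks out of the loop early, without relying on GC:

    import contextlib

    @contextlib.contextmanager
    def open_each(filenames):
        # Same iteration logic as concat() above, but the generator's
        # lifetime is pinned to this context manager, so cleanup is
        # deterministic instead of being left to the GC.
        def lines():
            for filename in filenames:
                with open(filename) as inputfile:
                    yield from inputfile
        gen = lines()
        try:
            yield gen
        finally:
            # Closing the generator throws GeneratorExit into it, which
            # runs the pending ``with open(...)`` exit immediately.
            gen.close()

    # Usage (filenames is assumed to be a list of paths):
    #
    #     with open_each(filenames) as lines:
    #         for line in lines:
    #             if line.startswith('#'):
    #                 break    # current file is still closed promptly

And unlike fileinput.input, an empty filenames list here simply yields
nothing rather than falling back to sys.stdin.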
>> New convenience functions
>> -------------------------
>>
>> The ``itertools`` module gains a new iterator wrapper that can be used
>> to selectively disable the new ``__iterclose__`` behavior::
>>
>>     # XX FIXME: I feel like there might be a better name for this one?
>>     class protect:
>>         def __init__(self, iterable):
>>             self._it = iter(iterable)
>>
>>         def __iter__(self):
>>             return self
>>
>>         def __next__(self):
>>             return next(self._it)
>>
>>         def __iterclose__(self):
>>             # Swallow __iterclose__ without passing it on
>>             pass
>>
>> Example usage (assuming that file objects implement ``__iterclose__``)::
>>
>>     with open(...) as handle:
>>         # Iterate through the same file twice:
>>         for line in itertools.protect(handle):
>>             ...
>>         handle.seek(0)
>>         for line in itertools.protect(handle):
>>             ...
>
> It would be much simpler to reverse this suggestion and say let's
> introduce a helper that selectively *enables* the new behaviour you're
> proposing, i.e.:
>
>     for line in itertools.closeafter(open(...)):
>         ...
>         if not line.startswith('#'):
>             break  # <--------------- file gets closed here
>
> Then we can leave (async) for loops as they are and there are no
> backward compatibility problems etc.
>
> --
> Oscar

--
Nathaniel J. Smith -- https://vorpus.org
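For reference, a minimal sketch of what such a ``closeafter`` helper
could look like today, without any language change (``closeafter`` is
hypothetical, not an existing itertools function; the sketch assumes
the wrapped iterable has a close() method, as file objects do):

    def closeafter(iterable):
        # Hypothetical helper: yield everything from ``iterable`` and
        # guarantee ``iterable.close()`` runs when iteration finishes,
        # whether by exhaustion, by an exception, or when this wrapper
        # generator is itself finalized after an early ``break``.
        try:
            yield from iterable
        finally:
            iterable.close()

The caveat is the one this thread is about: after a ``break``, the
``finally`` clause only runs once the abandoned wrapper generator is
destroyed, which happens immediately under CPython's reference counting
but only at some later collection under PyPy's GC -- which is why the
draft instead proposes having the ``for`` loop itself call
``__iterclose__``.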