Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones
On Thu, Apr 16, 2015 at 1:14 AM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Wed, Apr 15, 2015 at 4:46 PM, Akira Li 4kir4...@gmail.com wrote: Look what happened on July 1, 1990. At 2 AM, the clocks in Ukraine were moved back one hour. So times like 01:30 AM happened twice there on that day. Let's see how Python handles this situation $ TZ=Europe/Kiev python3 from email.utils import localtime from datetime import datetime localtime(datetime(1990,7,1,1,30)).strftime('%c %z %Z') 'Sun Jul 1 01:30:00 1990 +0400 MSD' So far so good, I've got the first of the two 01:30AM's. But what if I want the other 01:30AM? Well, localtime(datetime(1990,7,1,1,30), isdst=0).strftime('%c %z %Z') 'Sun Jul 1 01:30:00 1990 +0300 EEST' gives me the other 01:30AM, but it is counter-intuitive: I have to ask for the standard (winter) time to get the daylight savings (summer) time. It looks incorrect. Here's the corresponding pytz code: from datetime import datetime import pytz tz = pytz.timezone('Europe/Kiev') print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False).strftime('%c %z %Z')) # - Sun Jul 1 01:30:00 1990 +0300 EEST print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True).strftime('%c %z %Z')) # - Sun Jul 1 01:30:00 1990 +0400 MSD See also Enhance support for end-of-DST-like ambiguous time [1] [1] https://bugs.launchpad.net/pytz/+bug/1378150 `email.utils.localtime()` is broken: If you think there is a bug in email.utils.localtime - please open an issue at bugs.python.org. Your question below suggests that you believe it is not a bug i.e., `email.utils.localtime()` is broken *by design* unless you think it is ok to ignore `+0400 MSD`. pytz works for me (I can get both `+0300 EEST` and `+0400 MSD`). I don't think `localtime()` can be fixed without the tz database. I don't know whether it should be fixed, let somebody else who can't use pytz to pioneer the issue. 
The purpose of the code example is to **inform** that `email.utils.localtime()` fails (it returns only +0300 EEST) in this case:

    from datetime import datetime
    from email.utils import localtime

    print(localtime(datetime(1990, 7, 1, 1, 30)).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(localtime(datetime(1990, 7, 1, 1, 30), isdst=0).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(localtime(datetime(1990, 7, 1, 1, 30), isdst=1).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(localtime(datetime(1990, 7, 1, 1, 30), isdst=-1).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST

Versions:

    $ ./python -V
    Python 3.5.0a3+
    $ dpkg -s tzdata | grep -i version
    Version: 2015b-0ubuntu0.14.04

The uncertainty about how to deal with the repeated hour was the reason why email.utils.localtime-like interface did not make it to the datetime module.

repeated hour (time jumps back) can be treated like a end-of-DST transition, to resolve ambiguities [1].

I don't understand what you are complaining about. It is quite possible that pytz uses is_dst flag differently from the way email.utils.localtime uses isdst. I was not able to find a good description of what is_dst means in pytz, but localtime's isdst is documented as follows:

    a positive or zero value for *isdst* causes localtime to presume initially that summer time (for example, Daylight Saving Time) is or is not (respectively) in effect for the specified time.

Can you demonstrate that email.utils.localtime does not behave as documented?

No need to be so defensive about it.
*repeated hour (time jumps back) can be treated like a end-of-DST transition, to resolve ambiguities [1].* is just *an example* of how to fix the problem in the same way it is done in pytz:

    >>> from datetime import datetime
    >>> import pytz
    >>> tz = pytz.timezone('Europe/Kiev')
    >>> after = tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False)
    >>> before = tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True)
    >>> before < after
    True
    >>> before
    datetime.datetime(1990, 7, 1, 1, 30, tzinfo=<DstTzInfo 'Europe/Kiev' MSD+4:00:00 DST>)
    >>> after
    datetime.datetime(1990, 7, 1, 1, 30, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>)
    >>> before.astimezone(pytz.utc)
    datetime.datetime(1990, 6, 30, 21, 30, tzinfo=<UTC>)
    >>> after.astimezone(pytz.utc)
    datetime.datetime(1990, 6, 30, 22, 30, tzinfo=<UTC>)
    >>> before.dst()
    datetime.timedelta(0, 3600)
    >>> after.dst()
    datetime.timedelta(0, 3600)
    >>> pytz.OLSON_VERSION
    '2015b'

Here it is summer time in both cases, i.e., it is not a *true* end-of-DST transition (that is why I've used the word *like* above). If we ignore ambiguous times that may occur more than twice, then a boolean flag such as pytz's is_dst is *always* enough to resolve the ambiguity, assuming we have access to the tz database. And yes, the example demonstrates that the behavior of pytz's is_dst
Re: [Python-Dev] Status on PEP-431 Timezones
Lennart Regebro rege...@gmail.com writes:

OK, so I realized another thing today, and that is that arithmetic doesn't necessarily round trip. For example, 2002-10-27 01:00 US/Eastern comes both in DST and STD. But 2002-10-27 01:00 US/Eastern STD minus two days is 2002-10-25 01:00 US/Eastern DST

"Two days" is ambiguous here. It is incorrect if you mean 48 hours (the difference is 49 hours):

    #!/usr/bin/env python3
    from datetime import datetime, timedelta
    import pytz

    tz = pytz.timezone('US/Eastern')
    then_isdst = False  # STD
    then = tz.localize(datetime(2002, 10, 27, 1), is_dst=then_isdst)
    now = tz.localize(datetime(2002, 10, 25, 1), is_dst=None)  # no utc transition
    print((then - now) // timedelta(hours=1))
    # -> 49

However, 2002-10-25 01:00 US/Eastern DST plus two days is 2002-10-27 01:00 US/Eastern, but it is ambiguous if you want DST or not DST.

It is not ambiguous if you know what two days *in your particular application* should mean (`day+2` vs. +48h exactly):

    print(tz.localize(now.replace(tzinfo=None) + timedelta(2), is_dst=then_isdst))
    # -> 2002-10-27 01:00:00-05:00 # +49h
    print(tz.normalize(now + timedelta(2)))  # +48h
    # -> 2002-10-27 01:00:00-04:00

Here's a simple mental model that can be used for date arithmetic:

- naive datetime + timedelta(2) == same time, elapsed hours unknown
- aware utc datetime + timedelta(2) == same time, +48h
- aware datetime with timezone that may have different utc offsets at different times + timedelta(2) == unknown time, +48h

"unknown" means that you can't tell without knowing the specific timezone. The model ignores leap seconds.

The 3rd case behaves *as if* the calculations are performed using these steps (the actual implementation may be different):

1. convert an aware datetime object to utc (dt.astimezone(pytz.utc))
2. do the simple arithmetic using utc time
3. convert the result to the original pytz timezone (utc_dt.astimezone(tz))

You don't need `.localize()`, `.normalize()` calls here.
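The three steps above can be sketched without pytz. This is only an illustration: it uses fixed-offset `datetime.timezone` objects, and `to_eastern()` is a hypothetical stand-in for `utc_dt.astimezone(tz)` that is valid only around the October 2002 transition (assumed here to happen at 06:00 UTC on 2002-10-27).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two US/Eastern offsets (assumption:
# a real program would take them from the tz database).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")
# Assumed transition instant: 02:00 EDT on 2002-10-27 is 06:00 UTC.
FALL_BACK_UTC = datetime(2002, 10, 27, 6, tzinfo=timezone.utc)


def to_eastern(utc_dt):
    """Toy replacement for utc_dt.astimezone(tz) near the 2002 fall-back."""
    return utc_dt.astimezone(EDT if utc_dt < FALL_BACK_UTC else EST)


def add_elapsed(aware_dt, delta):
    utc_dt = aware_dt.astimezone(timezone.utc)  # 1. convert to utc
    utc_dt += delta                             # 2. simple arithmetic in utc
    return to_eastern(utc_dt)                   # 3. convert back


now = datetime(2002, 10, 25, 1, tzinfo=EDT)
then = datetime(2002, 10, 27, 1, tzinfo=EST)

plus_48h = add_elapsed(now, timedelta(2))
print(plus_48h)                            # 2002-10-27 01:00:00-04:00
print((then - now) // timedelta(hours=1))  # 49
```

The result agrees with the pytz output quoted above: +48h lands on the first 01:00 (still -04:00), while the second 01:00 (-05:00) is 49 hours away.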
And you can't pass in an is_dst flag to __add__, so the arithmetic must just pick one, and the sensible one is to keep to the same DST. That means that:

    tz = get_timezone('US/Eastern')
    dt = datetime(2002, 10, 27, 1, 0, tz=tz, is_dst=False)
    dt2 = dt - 420 + 420
    assert dt == dt2

Will fail, which will be unexpected for most people. I think there is no way around this, but I thought I should flag for it. This is a good reason to do all your date time arithmetic in UTC.

//Lennart

It won't fail:

    from datetime import datetime, timedelta
    import pytz

    tz = pytz.timezone('US/Eastern')
    dt = tz.localize(datetime(2002, 10, 27, 1), is_dst=False)
    delta = timedelta(seconds=420)
    assert dt == tz.normalize(tz.normalize(dt - delta) + delta)

The only reason `tz.normalize()` is used is so that tzinfo would be correct for the resulting datetime object; it does not affect the comparison otherwise:

    assert dt == (dt - delta + delta) #XXX tzinfo may be incorrect
    assert dt == tz.normalize(dt - delta + delta) # correct tzinfo for the final result

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Status on PEP-431 Timezones
Isaac Schwabacher ischwabac...@wisc.edu writes:

... I know that you can do datetime.now(tz), and you can do datetime(2013, 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able to add a time zone to an existing naive datetime is painful (and strptime doesn't even let you pass in a time zone).

`.now(tz)` is correct. `datetime(..., tzinfo=tz)` is wrong: if tz is a pytz timezone then you may get a wrong tzinfo (LMT); you should use `tz.localize(naive_dt, is_dst=False|True|None)` instead.

...
Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones
Alexander Belopolsky alexander.belopol...@gmail.com writes: Sorry for a truncated message. Please scroll past the quoted portion. On Thu, Apr 9, 2015 at 10:21 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Thu, Apr 9, 2015 at 4:51 PM, Isaac Schwabacher ischwabac...@wisc.edu wrote: Well, you are right, but at least we do have a localtime utility hidden in the email package: from datetime import * from email.utils import localtime print(localtime(datetime.now())) 2015-04-09 15:19:12.84-04:00 You can read http://bugs.python.org/issue9527 for the reasons it did not make into datetime. But that's restricted to the system time zone. Nothing good ever comes from the system time zone... Let's solve one problem at a time. ... PEP 431 proposes to import zoneinfo into the stdlib, ... I am changing the subject so that we can focus on one question without diverting to PEP-size issues that are better suited for python ideas. I would like to add a functionality to the datetime module that would solve a seemingly simple problem: given a naive datetime instance assumed to be in local time, construct the corresponding aware datetime object with tzinfo set to an appropriate fixed offset datetime.timezone instance. Python 3 has this functionality implemented in the email package since version 3.3, and it appears to work well even in the ambiguous hour from email.utils import localtime from datetime import datetime localtime(datetime(2014,11,2,1,30)).strftime('%c %z %Z') 'Sun Nov 2 01:30:00 2014 -0400 EDT' localtime(datetime(2014,11,2,1,30), isdst=0).strftime('%c %z %Z') 'Sun Nov 2 01:30:00 2014 -0500 EST' However, in a location with a more interesting history, you can get a situation that would look like this in the zoneinfo database: $ zdump -v -c 1992 Europe/Kiev ... 
    Europe/Kiev Sat Mar 24 22:59:59 1990 UTC = Sun Mar 25 01:59:59 1990 MSK isdst=0
    Europe/Kiev Sat Mar 24 23:00:00 1990 UTC = Sun Mar 25 03:00:00 1990 MSD isdst=1
    Europe/Kiev Sat Jun 30 21:59:59 1990 UTC = Sun Jul 1 01:59:59 1990 MSD isdst=1
    Europe/Kiev Sat Jun 30 22:00:00 1990 UTC = Sun Jul 1 01:00:00 1990 EEST isdst=1
    Europe/Kiev Sat Sep 28 23:59:59 1991 UTC = Sun Sep 29 02:59:59 1991 EEST isdst=1
    Europe/Kiev Sun Sep 29 00:00:00 1991 UTC = Sun Sep 29 02:00:00 1991 EET isdst=0
    ...

Look what happened on July 1, 1990. At 2 AM, the clocks in Ukraine were moved back one hour. So times like 01:30 AM happened twice there on that day. Let's see how Python handles this situation

    $ TZ=Europe/Kiev python3
    >>> from email.utils import localtime
    >>> from datetime import datetime
    >>> localtime(datetime(1990,7,1,1,30)).strftime('%c %z %Z')
    'Sun Jul 1 01:30:00 1990 +0400 MSD'

So far so good, I've got the first of the two 01:30AM's. But what if I want the other 01:30AM? Well,

    >>> localtime(datetime(1990,7,1,1,30), isdst=0).strftime('%c %z %Z')
    'Sun Jul 1 01:30:00 1990 +0300 EEST'

gives me the other 01:30AM, but it is counter-intuitive: I have to ask for the standard (winter) time to get the daylight savings (summer) time.

It looks incorrect.
Here's the corresponding pytz code:

    from datetime import datetime
    import pytz

    tz = pytz.timezone('Europe/Kiev')
    print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0400 MSD

See also "Enhance support for end-of-DST-like ambiguous time" [1]

[1] https://bugs.launchpad.net/pytz/+bug/1378150

`email.utils.localtime()` is broken:

    from datetime import datetime
    from email.utils import localtime

    print(localtime(datetime(1990, 7, 1, 1, 30)).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(localtime(datetime(1990, 7, 1, 1, 30), isdst=0).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(localtime(datetime(1990, 7, 1, 1, 30), isdst=1).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST
    print(localtime(datetime(1990, 7, 1, 1, 30), isdst=-1).strftime('%c %z %Z'))
    # -> Sun Jul 1 01:30:00 1990 +0300 EEST

Versions:

    $ ./python -V
    Python 3.5.0a3+
    $ dpkg -s tzdata | grep -i version
    Version: 2015b-0ubuntu0.14.04

The uncertainty about how to deal with the repeated hour was the reason why email.utils.localtime-like interface did not make it to the datetime module.

repeated hour (time jumps back) can be treated like a end-of-DST transition, to resolve ambiguities [1].

The main objection to the isdst flag was that in most situations, determining whether DST is in effect is as hard as finding the UTC offset, so reducing the problem of finding the UTC offset to the one of finding the value for isdst does not solve much. I now realize that the problem is simply in the name for the flag. While we cannot often tell what isdst
Re: [Python-Dev] Status on PEP-431 Timezones
Alexander Belopolsky alexander.belopol...@gmail.com writes:

On Wed, Apr 8, 2015 at 3:57 PM, Isaac Schwabacher ischwabac...@wisc.edu wrote:

On 15-04-08, Alexander Belopolsky wrote:

With datetime, we also have a problem that POSIX APIs don't have to deal with: local time arithmetics. What is t + timedelta(1) when t falls on the day before DST change? How would you set the isdst flag in the result?

It's whatever time comes 60*60*24 seconds after t in the same time zone, because the timedelta class isn't expressive enough to represent anything but absolute time differences (nor should it be, IMO).

This is not what most users expect. They expect

    datetime(y, m, d, 12, tzinfo=New_York) + timedelta(1)

to be

    datetime(y, m, d+1, 12, tzinfo=New_York)

It is incorrect. If you want d+1 for +timedelta(1), use a **naive** datetime. Otherwise +timedelta(1) is +24h:

    tomorrow = tz.localize(aware_dt.replace(tzinfo=None) + timedelta(1), is_dst=None)
    dt_plus24h = tz.normalize(aware_dt + timedelta(1))  # +24h

*tomorrow* and *aware_dt* have the *same* time but it is unknown how many hours have passed if the utc offset has changed in between. *dt_plus24h* may have a different time but exactly 24 hours have passed between *dt_plus24h* and *aware_dt*.

http://stackoverflow.com/questions/441147/how-can-i-subtract-a-day-from-a-python-date
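To make the "same wall-clock time tomorrow" vs. "+24h" distinction concrete, here is a small stdlib-only sketch. It assumes the New York spring-forward of 2015-03-08 and hard-codes the two offsets as fixed `datetime.timezone` objects; choosing EDT for the following day is done by hand here, which is exactly the job `tz.localize()` does in the pytz code above.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for New York on either side of 2015-03-08.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

aware_dt = datetime(2015, 3, 7, 12, tzinfo=EST)  # noon, the day before DST starts

# "Tomorrow": do the arithmetic on the naive wall-clock time, then attach
# the offset that is valid on the resulting day (EDT, chosen manually).
tomorrow = (aware_dt.replace(tzinfo=None) + timedelta(1)).replace(tzinfo=EDT)

# "+24h": elapsed time, expressed in the offset valid at the result.
dt_plus24h = (aware_dt + timedelta(1)).astimezone(EDT)

print(tomorrow)              # 2015-03-08 12:00:00-04:00
print(tomorrow - aware_dt)   # 23:00:00
print(dt_plus24h)            # 2015-03-08 13:00:00-04:00
```

Both answers are defensible; which one is right depends on whether the application means a calendar day or 24 elapsed hours.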
Re: [Python-Dev] Status on PEP-431 Timezones
Isaac Schwabacher ischwabac...@wisc.edu writes:

On 15-04-15, Akira Li 4kir4...@gmail.com wrote:

Isaac Schwabacher ischwabac...@wisc.edu writes:

... I know that you can do datetime.now(tz), and you can do datetime(2013, 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able to add a time zone to an existing naive datetime is painful (and strptime doesn't even let you pass in a time zone).

`.now(tz)` is correct. `datetime(..., tzinfo=tz)` is wrong: if tz is a pytz timezone then you may get a wrong tzinfo (LMT); you should use `tz.localize(naive_dt, is_dst=False|True|None)` instead.

The whole point of this thread is to finalize PEP 431, which fixes the problem for which `localize()` and `normalize()` are workarounds. When this is done, `datetime(..., tzinfo=tz)` will be correct.

ijs

The input time is ambiguous. Even if we assume PEP 431 is implemented in some form, your code is still missing the isdst parameter (or the analog). PEP 431 won't fix it; it can't resolve the ambiguity by itself. Notice the is_dst parameter in the `tz.localize()` call (current API).

.now(tz) works even during end-of-DST transitions (current API) when the local time is ambiguous.
Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones
Alexander Belopolsky alexander.belopol...@gmail.com writes:

... For most world locations past discontinuities are fairly well documented for at least a century and future changes are published with at least 6 months lead time.

It is important to note that the different versions of the tz database may lead to different tzinfo (utc offset, tzname) even for *past* dates, i.e., (lt, tzid, isdst) is not enough because the result for (lt, tzid(2015b), isdst) may be different from (lt, tzid(X), isdst) where

    lt = local time e.g., naive datetime
    tzid = timezone from the tz database e.g., Europe/Kiev
    isdst = a boolean flag for disambiguation
    X != 2015b

In other words, a fixed utc offset might not be sufficient even for past dates.

... Moreover, a program that rejects invalid times on input, but stores them for a long time may see its database silently corrupted after a zoneinfo update. Now it is time to make a specific proposal. I would like to extend datetime.astimezone() method to work on naive datetime instances. Such instances will be assumed to be in local time and discontinuities will be handled as follows:

1. wall(t) == lt has a single solution. This is the trivial case and lt.astimezone(utc) and lt.astimezone(utc, which=i) for i=0,1 should return that solution.

2. wall(t) == lt has two solutions t1 and t2 such that t1 < t2. In this case lt.astimezone(utc) == lt.astimezone(utc, which=0) == t1 and lt.astimezone(utc, which=1) == t2.

In pytz terms: `which = not isdst` (end-of-DST-like transition: isdst changes from True to False in the direction of utc time). It resolves AmbiguousTimeError raised by `tz.localize(naive, is_dst=None)`.

3. wall(t) == lt has no solution. This happens when there is UTC time t0 such that wall(t0) < lt and wall(t0+epsilon) > lt (a positive discontinuity at time t0). In this case lt.astimezone(utc) should return t0 + lt - wall(t0). I.e., we ignore the discontinuity and extend wall(t) linearly past t0.
Obviously, in this case the invariant wall(lt.astimezone(utc)) == lt won't hold. The "which" flag should be handled as follows: lt.astimezone(utc) == lt.astimezone(utc, which=0) and lt.astimezone(utc, which=0) == t0 + lt - wall(t0+eps).

It is inconsistent with the previous case: here `which = isdst` but `which = not isdst` above. `lt.astimezone(utc, which=0) == t0 + lt - wall(t0+eps)` corresponds to:

    result = tz.normalize(tz.localize(lt, is_dst=False))

i.e., `which = isdst` (t0 is at the start of DST and therefore isdst changes from False to True). It resolves NonExistentTimeError raised by `tz.localize(naive, is_dst=None)`.

start-of-DST-like transition (Spring forward). For example,

    from datetime import datetime, timedelta
    import pytz

    tz = pytz.timezone('America/New_York')

    # 2am -- non-existent time
    print(tz.normalize(tz.localize(datetime(2015, 3, 8, 2), is_dst=False)))
    # -> 2015-03-08 03:00:00-04:00 # after the jump (wall(t0+eps))
    print(tz.localize(datetime(2015, 3, 8, 3), is_dst=None))
    # -> 2015-03-08 03:00:00-04:00 # same time, unambiguous

    # 2:01am -- non-existent time
    print(tz.normalize(tz.localize(datetime(2015, 3, 8, 2, 1), is_dst=False)))
    # -> 2015-03-08 03:01:00-04:00
    print(tz.localize(datetime(2015, 3, 8, 3, 1), is_dst=None))
    # -> 2015-03-08 03:01:00-04:00 # same time, unambiguous

    # 2:59am non-existent time
    dt = tz.normalize(tz.localize(datetime(2015, 3, 8, 2, 59), is_dst=True))
    print(dt)
    # -> 2015-03-08 01:59:00-05:00 # before the jump (wall(t0-eps))
    print(tz.normalize(dt + timedelta(minutes=1)))
    # -> 2015-03-08 03:00:00-04:00

With the proposed features in place, one can use the naive code

    t = lt.astimezone(utc)

and get predictable behavior in all cases and no crashes.
A more sophisticated program can be written like this:

    t1 = lt.astimezone(utc, which=0)
    t2 = lt.astimezone(utc, which=1)
    if t1 == t2:
        t = t1
    elif t2 > t1:
        # ask the user to pick between t1 and t2 or raise AmbiguousLocalTimeError
    else:
        t = t1
        # warn the user that time was invalid and changed or raise InvalidLocalTimeError
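The `which` flag is only a proposal, so it cannot be run against the real `datetime.astimezone()`. The classification logic can still be tried on a toy zone. The sketch below is hypothetical throughout: `wall()`, `to_utc()` and `classify()` are illustration names, the zone has just the two 2015 US/Eastern transitions hard-coded, UTC times are naive, and the gap case assumes that `which=0` keeps the pre-transition offset and `which=1` the post-transition one.

```python
from datetime import datetime, timedelta

STD = timedelta(hours=-5)
DST = timedelta(hours=-4)
# Assumed UTC instants of the 2015 transitions in a US/Eastern-like zone.
DST_START_UTC = datetime(2015, 3, 8, 7)
DST_END_UTC = datetime(2015, 11, 1, 6)


def wall(t):
    """Local wall-clock time for the naive UTC time t."""
    return t + (DST if DST_START_UTC <= t < DST_END_UTC else STD)


def to_utc(lt, which=0):
    """Toy version of the proposed lt.astimezone(utc, which=...)."""
    # Interpret lt with each offset; sorted, the DST reading comes first.
    candidates = sorted(lt - offset for offset in (STD, DST))
    solutions = [t for t in candidates if wall(t) == lt]
    if len(solutions) == 2:   # repeated hour: earlier or later instant
        return solutions[which]
    if len(solutions) == 1:   # regular time
        return solutions[0]
    # Gap: no solution. which=0 extends the pre-transition (STD) offset,
    # which=1 the post-transition (DST) offset (assumed reading).
    return candidates[1 - which]


def classify(lt):
    t1, t2 = to_utc(lt, which=0), to_utc(lt, which=1)
    if t1 == t2:
        return "unique"
    if t2 > t1:
        return "ambiguous"
    return "missing"


print(classify(datetime(2015, 6, 1, 12)))       # unique
print(classify(datetime(2015, 11, 1, 1, 30)))   # ambiguous
print(classify(datetime(2015, 3, 8, 2, 30)))    # missing
print(wall(to_utc(datetime(2015, 3, 8, 2, 30))))  # 2015-03-08 03:30:00
```

With this reading, comparing the two results is enough to tell the three cases apart, which is what the quoted program relies on.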
Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github
Larry Hastings la...@hastings.org writes:

On 11/29/2014 04:37 PM, Donald Stufft wrote:

On Nov 29, 2014, at 7:15 PM, Alex Gaynor alex.gay...@gmail.com wrote:

Despite being a regular hg user for years, I have no idea how to create a local-only branch, or a branch which is pushed to a remote (to use the git term).

I also don’t know how to do this.

Instead of collectively scratching your heads, could one of you guys do the research and figure out whether or not hg supports this workflow? One of the following two things must be true:

1. hg supports this workflow (or a reasonable facsimile), which may lessen the need for this PEP.
2. hg doesn't support this workflow, which may strengthen the need for this PEP.

Assuming git's "all work is done in a local branch" workflow, you could use bookmarks with hg:

http://lostechies.com/jimmybogard/2010/06/03/translating-my-git-workflow-with-local-branches-to-mercurial/
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/#branching-with-bookmarks
http://mercurial.selenic.com/wiki/BookmarksExtension#Usage
http://stackoverflow.com/questions/1598759/git-and-mercurial-compare-and-contrast

--
Akira
Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog
Steven D'Aprano st...@pearwood.info writes:

On Wed, Sep 17, 2014 at 11:14:15AM +1000, Chris Angelico wrote:

On Wed, Sep 17, 2014 at 5:29 AM, R. David Murray rdmur...@bitdance.com wrote:

Basically, we are pretending that each smuggled byte is a single character for string parsing purposes...but they don't match any of our parsing constants. They are all "any character" matches in the regexes and what have you.

This is slightly iffy, as you can't be sure that one byte represents one character, but as long as you don't much care about that, it's not going to be an issue.

This discussion would probably be a lot easier to follow, with fewer miscommunications, if there were some examples. Here is my example, perhaps someone can tell me if I'm understanding it correctly. I want to send an email including the header line:

    'Subject: “NOBODY expects the Spanish Inquisition!”'

    >>> from email.header import Header
    >>> h = Header('Subject: “NOBODY expects the Spanish Inquisition!”')
    >>> h.encode('utf-8')
    '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
    >>> h.encode()
    '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
    >>> h.encode('ascii')
    '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='

--
Akira
Re: [Python-Dev] Multiline with statement line continuation
Nick Coghlan ncogh...@gmail.com writes:

On 12 August 2014 22:15, Steven D'Aprano st...@pearwood.info wrote:

Compare the natural way of writing this:

    with open("spam") as spam, open("eggs", "w") as eggs, frobulate("cheese") as cheese:
        # do stuff with spam, eggs, cheese

versus the dynamic way:

    with ExitStack() as stack:
        spam, eggs = [stack.enter_context(open(fname, mode))
                      for fname, mode in zip(("spam", "eggs"), ("r", "w"))]
        cheese = stack.enter_context(frobulate("cheese"))
        # do stuff with spam, eggs, cheese

You wouldn't necessarily switch at three. At only three, you have lots of options, including multiple nested with statements:

    with open("spam") as spam:
        with open("eggs", "w") as eggs:
            with frobulate("cheese") as cheese:
                # do stuff with spam, eggs, cheese

The "multiple context managers in one with statement" form is there *solely* to save indentation levels, and overuse can often be a sign that you may have a custom context manager trying to get out:

    @contextlib.contextmanager
    def dish(spam_file, egg_file, topping):
        with open(spam_file) as spam, open(egg_file, 'w') as eggs, frobulate(topping) as cheese:
            yield spam, eggs, cheese

    with dish("spam", "eggs", "cheese") as (spam, eggs, cheese):
        # do stuff with spam, eggs, cheese

ExitStack is mostly useful as a tool for writing flexible custom context managers, and for dealing with context managers in cases where lexical scoping doesn't necessarily work, rather than being something you'd regularly use for inline code.

"Why do I have so many contexts open at once in this function?" is a question developers should ask themselves in the same way it's worth asking "why do I have so many local variables in this function?"

Multiline with-statement can be useful even with *two* context managers. Two is not many.

Saving indentation levels alone is a worthy goal. It can affect readability and the perceived complexity of the code.
Here's how I'd like the code to look:

    with (open('input filename') as input_file,
          open('output filename', 'w') as output_file):
        # code with list comprehensions to transform input file into output file

Even one additional unnecessary indentation level may force you to split list comprehensions into several lines (less readable) and/or use shorter names (less readable). Or it may force you to move the inline code into a separate named function prematurely, solely to preserve the indentation level (also may be less readable), i.e.,

    with ... as input_file:
        with ... as output_file:
            ... #XXX indentation level is lost for no reason

    with ... as infile, ... as outfile: #XXX shorter names
        ...

    with ... as input_file:
        with ... as output_file:
            transform(input_file, output_file) #XXX unnecessary function

And (nested() can be implemented using ExitStack):

    with nested(open(..), open(..)) as (input_file, output_file):
        ... #XXX less readable

Here's an example where nested() won't help:

    def get_integers(filename):
        with (open(filename, 'rb', 0) as file,
              mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file):
            for match in re.finditer(br'\d+', mmapped_file):
                yield int(match.group())

Here's another:

    with (open('log'+'some expression that generates filename', 'a') as logfile,
          redirect_stdout(logfile)):
        ...

--
Akira
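For reference, the remark that nested() can be implemented using ExitStack could look roughly like the sketch below. `nested` here is a hypothetical helper written for this illustration, not the removed `contextlib.nested`. It shares the limitation noted above: all managers are constructed before `nested()` runs, so a later manager cannot depend on an earlier one, and an early `open()` is not closed if a later `open()` raises during argument evaluation.

```python
import os
import tempfile
from contextlib import ExitStack, contextmanager


@contextmanager
def nested(*managers):
    """Enter the given context managers in order; exit them in reverse."""
    with ExitStack() as stack:
        yield tuple(stack.enter_context(manager) for manager in managers)


# Usage: upper-case a small file into another one.
with tempfile.TemporaryDirectory() as tmp:
    input_name = os.path.join(tmp, "input.txt")
    output_name = os.path.join(tmp, "output.txt")
    with open(input_name, "w") as f:
        f.write("spam\neggs\n")

    with nested(open(input_name), open(output_name, "w")) as (input_file, output_file):
        output_file.writelines(line.upper() for line in input_file)

    with open(output_name) as f:
        result = f.read()

print(result)
```

Both files are closed when the `with nested(...)` block ends, whether or not the body raises.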
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
Schmitt Uwe (ID SIS) uwe.schm...@id.ethz.ch writes:

I discovered a problem using cPickle.loads from CPython 2.7.6. The last line in the following code raises an infinite recursion

    class T(object):

        def __init__(self):
            self.item = list()

        def __getattr__(self, name):
            return getattr(self.item, name)

    import cPickle
    t = T()
    l = cPickle.dumps(t)
    cPickle.loads(l)

... Is this a bug or did I miss something ?

The issue is that your __getattr__ raises RuntimeError (due to infinite recursion) for non-existing attributes instead of AttributeError. To fix it, you could use object.__getattribute__:

    class C:
        def __init__(self):
            self.item = []

        def __getattr__(self, name):
            return getattr(object.__getattribute__(self, 'item'), name)

There were issues in the past due to {get,has}attr silencing non-AttributeError exceptions; therefore it is good that pickle breaks when it gets RuntimeError instead of AttributeError.

--
Akira
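A Python 3 rendering of the suggested fix, as a minimal sketch: `Proxy` is a hypothetical name, and the stdlib `pickle` module stands in for Python 2's `cPickle`. The point it illustrates is that unpickling creates the instance without calling `__init__`, so `__getattr__` may be invoked (for example while pickle looks for `__setstate__`) before `item` exists, and it must then raise AttributeError rather than recurse.

```python
import pickle


class Proxy:
    """Delegates unknown attributes to the wrapped list."""

    def __init__(self):
        self.item = []

    def __getattr__(self, name):
        # object.__getattribute__ raises AttributeError if 'item' is not set
        # yet, instead of re-entering __getattr__ and recursing forever.
        return getattr(object.__getattribute__(self, 'item'), name)


proxy = Proxy()
proxy.append(1)   # delegated to the list
proxy.append(2)

restored = pickle.loads(pickle.dumps(proxy))
print(restored.item)

# A half-constructed instance now reports missing attributes normally.
bare = Proxy.__new__(Proxy)
print(hasattr(bare, 'anything'))
```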
Re: [Python-Dev] os.walk() is going to be *fast* with scandir
Armin Rigo ar...@tunes.org writes:

On 10 August 2014 08:11, Larry Hastings la...@hastings.org wrote:

A small tip from my bzr days - cd into the directory before scanning it

I doubt that's permissible for a library function like os.scandir().

Indeed, chdir() is notably not compatible with multithreading. There would be a non-portable but clean way to do that: the functions openat() and fstatat(). They only exist on relatively modern Linuxes, though.

There is os.fwalk() that could be both safer and faster than os.walk(). It yields rootdir fd that can be used by functions that support dir_fd parameter, see os.supports_dir_fd set. They use *at() functions under the hood.

os.fwalk() could be implemented in terms of os.scandir() if the latter would support fd parameter like os.listdir() does (be in os.supports_fd set (note: it is different from os.supports_dir_fd)).

Victor Stinner suggested [1] to allow scandir(fd) but I don't see it being mentioned in PEP 471 [2]: it neither supports nor rejects the idea.

[1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
[2] http://legacy.python.org/dev/peps/pep-0471/

--
Akira
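As an illustration of the os.fwalk() / dir_fd combination mentioned above, here is a short sketch that sums file sizes relative to the directory fd instead of building full paths. `tree_size` is a hypothetical helper name; the code assumes a Unix platform where `os.fwalk()` is available and `os.stat` supports `dir_fd`.

```python
import os
import tempfile


def tree_size(top):
    """Total size in bytes of the regular entries under top (Unix only)."""
    if os.stat not in os.supports_dir_fd:
        raise NotImplementedError("os.stat() does not support dir_fd here")
    total = 0
    for root, dirs, files, rootfd in os.fwalk(top):
        for name in files:
            # stat relative to the open directory fd (fstatat() underneath)
            total += os.stat(name, dir_fd=rootfd, follow_symlinks=False).st_size
    return total


# Usage on a throwaway tree: 3 bytes at the top level, 5 in a subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"abc")
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"hello")
    size = tree_size(tmp)

print(size)
```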
Re: [Python-Dev] Exposing the Android platform existence to Python modules
Guido van Rossum gu...@python.org writes:

Well, it really does look like checking for the presence of those ANDROID_* environment variables is the best way to recognize the Android platform. Anyone can do that without waiting for a ruling on whether Android is Linux or not (which would be necessary because the docs for sys.platform are quite clear about its value on Linux systems). Googling terms like "is Android Linux" suggests that there is considerable controversy about the issue, so I suggest you don't wait. :-)

I don't see sysconfig mentioned in the discussion (maybe for a reason). It might provide build-time information e.g.,

    built_for_android = 'android' in sysconfig.get_config_var('MULTIARCH')

assuming the complete value is something like 'arm-linux-android'. It says that the python binary is built for android (the current platform may or may not be Android).

--
Akira
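One practical detail about the sysconfig check: `sysconfig.get_config_var()` returns None for variables the build did not define, so the one-liner raises TypeError on such builds. A slightly more defensive sketch follows; `built_for_android` is a hypothetical helper, and whether MULTIARCH actually contains 'android' on an Android build remains the assumption stated above.

```python
import sysconfig


def built_for_android():
    """Guess from build-time config whether this binary targets Android."""
    # MULTIARCH may be undefined (None), e.g. on Windows or some Unix builds.
    multiarch = sysconfig.get_config_var('MULTIARCH') or ''
    return 'android' in multiarch


print(built_for_android())
```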
Re: [Python-Dev] Exposing the Android platform existence to Python modules
Shiz h...@shiz.me writes:

The most obvious change would be to subprocess.Popen(). The reason a generic approach there won't work is also the reason I expect more changes might be needed: the Android file system doesn't abide by any POSIX file system standards. Its shell isn't located at /bin/sh, but at /system/bin/sh. The only directories it provides that are POSIX-standard are /dev and /etc, to my knowledge. You could check to see if /system/bin/sh exists and use that first, but that would break the preferred shell on POSIX systems that happen to have /system for some reason or another.

In short: the preferred shell on POSIX systems is /bin/sh, but on Android it's /system/bin/sh. Simple existence checking might break the preferred shell on either. For more specific stdlib examples I'd have to check the test suite again.

FYI, /bin/sh is not POSIX, see http://bugs.python.org/issue16353#msg224514

--
Akira
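Since POSIX leaves the location of `sh` unspecified and points to the path reported by `getconf PATH` instead, one way to look the shell up rather than hard-coding it might be the sketch below. `find_posix_shell` is a hypothetical helper, not something the stdlib provides, and whether `CS_PATH` is meaningful on Android is not something this example can confirm.

```python
import os
import shutil


def find_posix_shell():
    """Return the path of 'sh' found on the system default path, or None."""
    try:
        # Equivalent of `getconf PATH`; may be missing or undefined.
        search_path = os.confstr("CS_PATH")
    except (AttributeError, ValueError, OSError):
        search_path = None
    if not search_path:
        search_path = os.defpath  # Python's own fallback search path
    return shutil.which("sh", path=search_path)


shell = find_posix_shell()
print(shell)
```

The function returns None when no `sh` is found on that path, so a caller would still need a platform-specific fallback.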
Re: [Python-Dev] Exposing the Android platform existence to Python modules
Shiz h...@shiz.me writes: Hi folks, I’m working on porting CPython to the Android platform, and while making decent progress, I’m currently stuck at a higher-level issue than adding #ifdefs for __ANDROID__ to C extension modules. The idea is, not only CPython extension modules have some assumptions that don’t seem to fit Android’s mold, some default Python-written modules do as well. However, whereas CPython extensions can trivially check if we’re building for Android by checking the __ANDROID__ compiler macro, Python modules can do no such check, and are left wondering how to figure out if the platform they are currently running on is an Android one. To my knowledge there is no reliable way to detect if one is using Android as a vehicle for their journey using any other way. Now, the main question is: what would be the best way to ‘expose’ the indication that Android is being ran on to Python-living modules? My own thought was to add sys.getlinuxuserland(), or platform.linux_userland(), in similar vein to sys.getwindowsversion() and platform.linux_distribution(), which could return information about the userland of running CPython instance, instead of knowing merely the kernel and the distribution. This way, code could trivially check if it ran on the GNU(+associates) userland, or under a BSD-ish userland, or Android… and adjust its behaviour accordingly. I would be delighted to hear comments on this proposal, or better yet, alternative solutions. :) Kind regards, Shiz P.S.: I am well aware that Android might as well never be officially supported in CPython. In that case, consider this a thought experiment of how it /would/ be handled. :) Python uses os.name, sys.platform, and various functions from `platform` module to provide version info: - coarse: os.name is 'posix', 'nt', 'ce', 'java' [1]. It is defined by availability of some builtin modules ('posix', 'nt' in particular) at import time. 
- finer: sys.platform may start with freebsd, linux, win, cygwin, darwin (`uname -s`). It is defined at python build time. - detailed: `platform` module. It provides as much info as possible e.g., platform.uname(), platform.platform(). It may use runtime commands to get it. If Android is posixy enough (would `posix` module work on Android?) then os.name could be left 'posix'. You could set sys.platform to 'android' (like sys.platform may be 'cygwin' on Windows) if Android is not like *any other* Linux distribution (from the point of view of writing a working Python code on it) i.e., if Android is further from other Linux distribution than freebsd, linux, darwin from each other then it might deserve sys.platform slot. If sys.platform is left 'linux' (like sys.platform is 'darwin' on iOS) then platform module could be used to detect Android e.g., platform.linux_distribution() though (it might be removed in Python 3.6) it is unpredictable [2] unless you fix it on your python distribution, e.g., here's an output on my machine: import platform platform.linux_distribution() ('Ubuntu', '14.04', 'trusty') For example: is_android = (platform.linux_distribution()[0] == 'Android') You could also define platform.android_version() that can provide Android specific version details as much as you need: is_android = bool(platform.android_version().release) You could provide an alias android_ver (like existing java_ver, libc_ver, mac_ver, win32_ver). See also, When to use os.name, sys.platform, or platform.system? 
[3]

Unrelated, TIL [4]: Android is a Linux distribution according to the Linux Foundation.

[1] https://docs.python.org/3.4/library/os.html#os.name
[2] http://bugs.python.org/issue1322
[3] http://stackoverflow.com/questions/4553129/when-to-use-os-name-sys-platform-or-platform-system
[4] http://en.wikipedia.org/wiki/Android_(operating_system)

btw, would it help to add an os.get_shell_executable() [5] function, so that os.confstr('CS_PATH') or os.defpath on Android could be defined to include /system/bin, instead of hacking the subprocess module?

[5] http://bugs.python.org/issue16353

-- Akira
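To make the three levels of detail concrete, here is a small sketch that collects them side by side. describe_platform is a hypothetical helper; the ANDROID_* environment check is the heuristic mentioned elsewhere in this thread, not an official API. platform.linux_distribution() is left out on purpose, since it was later removed from the stdlib (Python 3.8).

```python
import os
import platform
import sys


def describe_platform(environ=None):
    """Collect coarse, finer and detailed platform identifiers.

    Hypothetical helper for illustration. 'android_env_hint' only
    reports whether any ANDROID_* environment variable is set.
    """
    if environ is None:
        environ = os.environ
    return {
        "os.name": os.name,                      # coarse: 'posix', 'nt', ...
        "sys.platform": sys.platform,            # finer: fixed at build time
        "platform.system": platform.system(),    # detailed: queried at run time
        "android_env_hint": any(k.startswith("ANDROID_") for k in environ),
    }
```

Usage: print(describe_platform()) on the target device shows at a glance which of the three levels, if any, distinguishes it from an ordinary Linux box.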
Re: [Python-Dev] PEP 471 scandir accepted
Ben Hoyt benh...@gmail.com writes:

> I think if I were doing this from scratch I'd reimplement listdir() in Python as return [e.name for e in scandir(path)].
> ...
> So my basic plan is to have an internal helper function in posixmodule.c that either yields DirEntry objects or strings. And then listdir() would simply be defined something like return list(_scandir(path, yield_strings=True)) in C or in Python.
>
> My reasoning is that then there'll be much less (if any) code duplication between scandir() and listdir().
>
> Does this sound like a reasonable approach?

Note: listdir() accepts an integer path (an open file descriptor that refers to a directory), which is passed to fdopendir() on POSIX [4], i.e., *you can't use scandir() to replace listdir() in this case* (as I've already mentioned in [1]). See the corresponding tests from [2].

[1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
[2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html

From os.listdir() docs [3]:

  This function can also support specifying a file descriptor; the file descriptor must refer to a directory.

[3] https://docs.python.org/3.4/library/os.html#os.listdir
[4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736

-- Akira
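For readers who have not used the file-descriptor form of listdir(), a minimal sketch (listdir_via_fd is a hypothetical wrapper name; the os calls themselves are the documented ones):

```python
import os


def listdir_via_fd(path):
    """List a directory through an open file descriptor (path-fd form).

    Hypothetical wrapper; requires a platform where
    os.listdir is in os.supports_fd.
    """
    if os.listdir not in os.supports_fd:
        raise NotImplementedError("os.listdir(fd) is not supported here")
    fd = os.open(path, os.O_RDONLY)
    try:
        # An integer argument is treated as a directory descriptor.
        return os.listdir(fd)
    finally:
        os.close(fd)
```

A listdir() built on top of scandir() would need to keep accepting such an integer argument to stay backward compatible.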
Re: [Python-Dev] PEP 471 scandir accepted
Ben Hoyt benh...@gmail.com writes: Note: listdir() accepts an integer path (an open file descriptor that refers to a directory) that is passed to fdopendir() on POSIX [4] i.e., *you can't use scandir() to replace listdir() in this case* (as I've already mentioned in [1]). See the corresponding tests from [2]. [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html From os.listdir() docs [3]: This function can also support specifying a file descriptor; the file descriptor must refer to a directory. [3] https://docs.python.org/3.4/library/os.html#os.listdir [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736 Fair point. Yes, I hadn't realized listdir supported dir_fd (must have been looking at 2.x docs), though you've pointed it out at [1] above. and I guess I wasn't thinking about implementation at the time. FYI, dir_fd is related but *different*: compare specifying a file descriptor [1] vs. paths relative to directory descriptors [2]. NOTE: os.supports_fd and os.supports_dir_fd are different sets. [3]: import os os.listdir in os.supports_fd True os.listdir in os.supports_dir_fd False [1] https://docs.python.org/3/library/os.html#path-fd [2] https://docs.python.org/3/library/os.html#dir-fd [3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html To be clear: *listdir() does not support dir_fd* though it can be emulated using os.open(dir_fd=..). You can safely ignore the rest of the e-mail until you want to implement path-fd [1] support for os.scandir() in several months. 
Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:

    import contextlib
    import os

    with contextlib.ExitStack() as stack:
        dir_fd = os.open('/etc', os.O_RDONLY)
        stack.callback(os.close, dir_fd)
        fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd)  # dir-fd [2]
        stack.callback(os.close, fd)
        print("\n".join(os.listdir(fd)))  # path-fd [1]

It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked to refer to another directory after the first os.open('/etc',..) call. See also, os.fwalk(dir_fd=..) [4]

[4] https://docs.python.org/3/library/os.html#os.fwalk

> However, given that we have to support this for listdir() anyway, I think it's worth reconsidering whether scandir()'s directory argument can be an integer FD.

What is entry.path in this case? If the input directory is a file descriptor (an integer) then os.path.join(directory, entry.name) won't work.

PEP 471 should explicitly reject the support for specifying a file descriptor so that code that uses os.scandir may assume that the entry.path attribute is always present (no exceptions due to a failure to read /proc/self/fd/NNN or an error while calling fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see http://stackoverflow.com/q/1188757 ). [5]

[5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html

On the other hand, os.fwalk() [4], which supports both path-fd [1] and dir-fd [2], could be implemented without the entry.path property if os.scandir() supports just path-fd [1]. os.fwalk() provides a safe way to traverse a directory tree without symlink races, e.g., [6]:

    def get_tree_size(directory):
        """Return total size of files in directory and subdirs."""
        return sum(entry.lstat().st_size
                   for root, dirs, files, rootfd in fwalk(directory)
                   for entry in files)

[6] http://legacy.python.org/dev/peps/pep-0471/#examples

where fwalk() is the exact copy of os.fwalk() except that it uses _fwalk(), which is defined in terms of scandir():

    import os

    # adapt os._fwalk() to use scandir() instead of os.listdir()
    def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
        # Note: This uses O(depth of the directory tree) file descriptors:
        # if necessary, it can be adapted to only require O(1) FDs, see
        # http://bugs.python.org/issue13734
        entries = scandir(topfd)
        dirs, nondirs = [], []
        for entry in entries:  #XXX call onerror on OSError on next() and return?
            # report symlinks to directories as directories (like os.walk)
            # but no recursion into symlinked subdirectories unless
            # follow_symlinks is true
            # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
            # raise on broken links)
            try:
                (dirs if entry.is_dir() else nondirs).append(entry)
            except FileNotFoundError:
                continue  # ignore disappeared files

        if topdown:
            yield toppath, dirs, nondirs, topfd

        for entry in dirs:
            try:
                orig_st = entry.stat(follow_symlinks=follow_symlinks)
                #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
                dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
            except OSError as err:
                if onerror is not None:
Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
Nick Coghlan ncogh...@gmail.com writes: On 13 Jul 2014 20:54, Tim Delaney timothy.c.dela...@gmail.com wrote: On 14 July 2014 10:33, Ben Hoyt benh...@gmail.com wrote: If we go with Victor's link-following .is_dir() and .is_file(), then we probably need to add his suggestion of a follow_symlinks=False parameter (defaults to True). Either that or you have to say stat.S_ISDIR(entry.lstat().st_mode) instead, which is a little bit less nice. Absolutely agreed that follow_symlinks is the way to go, disagree on the default value. Given the above arguments for symlink-following is_dir()/is_file() methods (have I missed any, Victor?), what do others think? I would say whichever way you go, someone will assume the opposite. IMO not following symlinks by default is safer. If you follow symlinks by default then everyone has the following issues: 1. Crossing filesystems (including onto network filesystems); 2. Recursive directory structures (symlink to a parent directory); 3. Symlinks to non-existent files/directories; 4. Symlink to an absolutely huge directory somewhere else (very annoying if you just wanted to do a directory sizer ...). If follow_symlinks=False by default, only those who opt-in have to deal with the above. Or the ever popular symlink to . (or a directory higher in the tree). I think os.walk() is a good source of inspiration here: call the flag followlink and default it to False. Let's not multiply entities beyond necessity. There is well-defined *follow_symlinks* parameter https://docs.python.org/3/library/os.html#follow-symlinks e.g., os.access, os.chown, os.link, os.stat, os.utime and many other functions in os module support follow_symlinks parameter, see os.supports_follow_symlinks. os.walk is an exception that uses *followlinks*. It might be because it is an old function e.g., newer os.fwalk uses follow_symlinks. As it has been said: os.path.isdir, pathlib.Path.is_dir in Python File.directory? 
in Ruby, System.Directory.doesDirectoryExist in Haskell, and `test -d` in shell all follow symlinks, i.e., follow_symlinks=True as the default is more familiar for the .is_dir method.

`cd path` in shell, os.chdir(path), `ls path`, os.listdir(path), and os.scandir(path) itself follow symlinks (even on Windows: http://bugs.python.org/issue13772 ). GUI file managers such as `nautilus` also treat symlinks to directories as directories: you can click on them to open the corresponding directories.

Only *recursive* functions such as os.walk and os.fwalk do not follow symlinks by default, to avoid symlink loops. Note: this behavior is consistent with coreutils commands: `cp` follows symlinks for non-recursive actions, while the inherently recursive `du` utility doesn't follow symlinks by default.

follow_symlinks=True as the default for the DirEntry.is_dir method helps avoid easy-to-introduce bugs when replacing old os.listdir/os.path.isdir code or when writing new code with the same mental model.

-- Akira
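The difference between the two defaults is easy to see with existing stdlib calls. A small sketch (classify is a hypothetical helper name) that reports both answers for the same path:

```python
import os
import stat


def classify(path):
    """Report 'is it a directory?' with and without following symlinks.

    Hypothetical helper for illustration; it uses only os.path.isdir
    (follows symlinks) and os.lstat (does not).
    """
    return {
        "isdir_following": os.path.isdir(path),
        "isdir_not_following": stat.S_ISDIR(os.lstat(path).st_mode),
        "islink": os.path.islink(path),
    }
```

For a symlink that points to a directory, the first answer is True and the second is False, which is exactly the choice the follow_symlinks default has to make for DirEntry.is_dir().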
Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
Nick Coghlan ncogh...@gmail.com writes:

> ... definition of floats and the definition of container invariants like assert x in [x])
>
> The current approach means that the lack of reflexivity of NaN's stays confined to floats and similar types - it doesn't leak out and infect the behaviour of the container types.
>
> What we've never figured out is a good place to *document* it. I thought there was an open bug for that, but I can't find it right now.

There was a related issue, "Tuple comparisons with NaNs are broken" http://bugs.python.org/issue21873 , but it was closed as "not a bug" even though the corresponding behavior is *not documented* anywhere.

-- Akira
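For reference, the behavior under discussion can be reproduced with a few expressions. This is a sketch of CPython's observable behavior (containers check identity before calling ==); nan_container_facts is a hypothetical helper name.

```python
def nan_container_facts():
    """Show how NaN's non-reflexive == interacts with containers."""
    x = float("nan")
    y = float("nan")  # a second, distinct NaN object
    return {
        "x == x": x == x,                # False: NaN is not equal to itself
        "x in [x]": x in [x],            # True: same object, identity wins
        "[x] == [x]": [x] == [x],        # True: elements are the same object
        "(x,) == (y,)": (x,) == (y,),    # False: different objects, == is used
        "[x].count(x)": [x].count(x),    # 1: count() also checks identity first
    }
```

So the container invariants hold for the same NaN object, while two separately created NaNs still compare unequal inside containers.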
Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Ben Hoyt benh...@gmail.com writes: ... ``scandir()`` yields a ``DirEntry`` object for each file and directory in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'`` pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each ``DirEntry`` object has the following attributes and methods: * ``name``: the entry's filename, relative to the ``path`` argument (corresponds to the return values of ``os.listdir``) * ``full_name``: the entry's full path name -- the equivalent of ``os.path.join(path, entry.name)`` I suggest renaming .full_name - .path .full_name might be misleading e.g., it implies that .full_name == abspath(.full_name) that might be false. The .path name has no such associations. The semantics of the the .path attribute is defined by these assertions:: for entry in os.scandir(topdir): #NOTE: assume os.path.normpath(topdir) is not called to create .path assert entry.path == os.path.join(topdir, entry.name) assert entry.name == os.path.basename(entry.path) assert entry.name == os.path.relpath(entry.path, start=topdir) assert os.path.dirname(entry.path) == topdir assert (entry.path != os.path.abspath(entry.path) or os.path.isabs(topdir)) # it is absolute only if topdir is assert (entry.path != os.path.realpath(entry.path) or topdir == os.path.realpath(topdir)) # symlinks are not resolved assert (entry.path != os.path.normcase(entry.path) or topdir == os.path.normcase(topdir)) # no case-folding, # unlike PureWindowsPath ... * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never requires a system call on Windows, and usually doesn't on POSIX systems I suggest documenting the implicit follow_symlinks parameter for .is_X methods. Note: lstat == partial(stat, follow_symlinks=False). In particular, .is_dir() should probably use follow_symlinks=True by default as suggested by Victor Stinner *if .is_dir() does it on Windows* MSDN says: GetFileAttributes() does not follow symlinks. 
os.path.isdir docs imply follow_symlinks=True: both islink() and isdir() can be true for the same path. ... Like the other functions in the ``os`` module, ``scandir()`` accepts either a bytes or str object for the ``path`` parameter, and returns the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the same type as ``path``. However, it is *strongly recommended* to use the str type, as this ensures cross-platform support for Unicode filenames. Document when {e.name for e in os.scandir(path)} != set(os.listdir(path)) + e.g., path can be an open file descriptor in os.listdir(path) since Python 3.3 but the PEP doesn't mention it explicitly. It has been discussed already e.g., https://mail.python.org/pipermail/python-dev/2014-July/135296.html PEP 471 should explicitly reject the support for specifying a file descriptor so that a code that uses os.scandir may assume that entry.path (.full_name) attribute is always present (no exceptions due to a failure to read /proc/self/fd/NNN or an error while calling fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see http://stackoverflow.com/q/1188757 ). Reject explicitly in PEP 471 the support for dir_fd parameter + aka the support for paths relative to directory descriptors. Note: it is a *different* (but related) issue. ... Notes on exception handling --- ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods rather than attributes or properties, to make it clear that they may not be cheap operations, and they may do a system call. As a result, these methods may raise ``OSError``. For example, ``DirEntry.lstat()`` will always make a system call on POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a ``stat()`` system call on such systems if ``readdir()`` returns a ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under certain conditions or on certain file systems. 
For this reason, when a user requires fine-grained error handling, it's good to catch ``OSError`` around these method calls and then handle as appropriate. I suggest documenting that next(os.scandir()) may raise OSError e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir Also, document whether os.scandir() itself may raise OSError (whether opendir or other OS functions may be called before the first yield). ... os.scandir() should allow the explicit cleanup ++ :: with closing(os.scandir()) as entries: for _ in entries: break entries.close() is called that frees the resources if necessary, to *avoid relying on garbage-collection for managing file
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Ben Hoyt benh...@gmail.com writes:

> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later.
>
> -Ben

FYI, os.listdir does support file descriptors in Python 3.3+. Try:

    import os
    os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.

See also, https://mail.python.org/pipermail/python-dev/2014-June/135265.html

-- Akira

P.S. Please, don't put your answer on top of the message you are replying to.

> On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner victor.stin...@gmail.com wrote:
>
> Hi,
>
> IMO we must decide whether or not scandir() must support file descriptors. It's an important decision which has an important impact on the API.
>
> To support scandir(fd), the minimum is to store dir_fd in DirEntry: dir_fd would be None for scandir(str).
>
> scandir(fd) must not close the file descriptor, it should be done by the caller. Handling the lifetime of the file descriptor is a difficult problem, it's better to let the user decide how to handle it.
>
> There is the problem of the limit of open file descriptors, usually 1024 but it can be lower. It *can* be an issue for very deep file hierarchies.
>
> If we choose to support scandir(fd), it's probably safer to not use scandir(fd) by default in os.walk() (use scandir(str) instead), wait until the feature is well tested, corner cases are well known, etc.
>
> The second step is to enhance pathlib.Path to support an optional file descriptor. Path already has methods on filenames like chmod(), exists(), rename(), etc.
>
> Example:
>
>     fd = os.open(path, os.O_DIRECTORY)
>     try:
>         for entry in os.scandir(fd):
>             # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
>             path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
>             # ... use path which uses dir_fd ...
>     finally:
>         os.close(fd)
>
> Problem: if the path object is stored somewhere and used after the loop, Path methods will fail because dir_fd was closed. It's even worse if a new directory uses the same file descriptor :-/ (security issue, or at least tricky bugs!)
>
> Victor
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Chris Angelico ros...@gmail.com writes: On Sat, Jun 28, 2014 at 11:05 PM, Akira Li 4kir4...@gmail.com wrote: Have you considered adding support for paths relative to directory descriptors [1] via keyword only dir_fd=None parameter if it may lead to more efficient implementations on some platforms? [1]: https://docs.python.org/3.4/library/os.html#dir-fd Potentially more efficient and also potentially safer (see 'man openat')... but an enhancement that can wait, if necessary. Introducing the feature later creates unnecessary incompatibilities. Either it should be explicitly rejected in the PEP 471 and something-like `os.scandir(os.open(relative_path, dir_fd=fd))` recommended instead (assuming `os.scandir in os.supports_fd` like `os.listdir()`). At C level it could be implemented using fdopendir/openat or scandirat. Here's the function description using Argument Clinic DSL: /*[clinic input] os.scandir path : path_t(allow_fd=True, nullable=True) = '.' *path* can be specified as either str or bytes. On some platforms, *path* may also be specified as an open file descriptor; the file descriptor must refer to a directory. If this functionality is unavailable, using it raises NotImplementedError. * dir_fd : dir_fd = None If not None, it should be a file descriptor open to a directory, and *path* should be a relative string; path will then be relative to that directory. if *dir_fd* is unavailable, using it raises NotImplementedError. Yield a DirEntry object for each file and directory in *path*. Just like os.listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. {parameters} It's an error to use *dir_fd* when specifying *path* as an open file descriptor. 
[clinic start generated code]*/ And corresponding tests (from test_posix:PosixTester), to show the compatibility with os.listdir argument parsing in detail: def test_scandir_default(self): # When scandir is called without argument, # it's the same as scandir(os.curdir). self.assertIn(support.TESTFN, [e.name for e in posix.scandir()]) def _test_scandir(self, curdir): filenames = sorted(e.name for e in posix.scandir(curdir)) self.assertIn(support.TESTFN, filenames) #NOTE: assume listdir, scandir accept the same types on the platform self.assertEqual(sorted(posix.listdir(curdir)), filenames) def test_scandir(self): self._test_scandir(os.curdir) def test_scandir_none(self): # it's the same as scandir(os.curdir). self._test_scandir(None) def test_scandir_bytes(self): # When scandir is called with a bytes object, # the returned entries names are still of type str. # Call `os.fsencode(entry.name)` to get bytes self.assertIn('a', {'a'}) self.assertNotIn(b'a', {'a'}) self._test_scandir(b'.') @unittest.skipUnless(posix.scandir in os.supports_fd, test needs fd support for posix.scandir()) def test_scandir_fd_minus_one(self): # it's the same as scandir(os.curdir). 
self._test_scandir(-1) def test_scandir_float(self): # invalid args self.assertRaises(TypeError, posix.scandir, -1.0) @unittest.skipUnless(posix.scandir in os.supports_fd, test needs fd support for posix.scandir()) def test_scandir_fd(self): fd = posix.open(posix.getcwd(), posix.O_RDONLY) self.addCleanup(posix.close, fd) self._test_scandir(fd) self.assertEqual( sorted(posix.scandir('.')), sorted(posix.scandir(fd))) # call 2nd time to test rewind self.assertEqual( sorted(posix.scandir('.')), sorted(posix.scandir(fd))) @unittest.skipUnless(posix.scandir in os.supports_dir_fd, test needs dir_fd support for os.scandir()) def test_scandir_dir_fd(self): relpath = 'relative_path' with support.temp_dir() as parent: fullpath = os.path.join(parent, relpath) with support.temp_dir(path=fullpath): support.create_empty_file(os.path.join(parent, 'a')) support.create_empty_file(os.path.join(fullpath, 'b')) fd = posix.open(parent, posix.O_RDONLY) self.addCleanup(posix.close, fd) self.assertEqual( sorted(posix.scandir(relpath, dir_fd=fd)), sorted(posix.scandir(fullpath))) # check that fd is still useful self.assertEqual( sorted(posix.scandir(relpath, dir_fd=fd)), sorted(posix.scandir(fullpath))) -- Akira ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev
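For comparison with the proposal above, here is a sketch written against os.scandir() as it later shipped in the stdlib (Python 3.5; usable as a context manager since 3.6). It takes a plain path only and does not use dir_fd. get_tree_size follows the shape of the example in PEP 471; treat it as an illustration, not as the PEP's exact code.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of the files under path.

    Sketch only: symlinks are not followed, so a link to a directory
    is counted as a (small) file and never recursed into.
    """
    total = 0
    with os.scandir(path) as entries:  # closes the directory handle promptly
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += get_tree_size(entry.path)
            else:
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

The with block gives the explicit cleanup asked for elsewhere in this thread, instead of relying on garbage collection to release the directory handle.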
Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Ben Hoyt benh...@gmail.com writes:

> Hi Python dev folks,
>
> I've written a PEP proposing a specific os.scandir() API for a directory iterator that returns the stat-like info from the OS, *the main advantage of which is to speed up os.walk() and similar operations between 4-20x, depending on your OS and file system.*
> ...
> http://legacy.python.org/dev/peps/pep-0471/
> ...
> Specifically, this PEP proposes adding a single function to the ``os`` module in the standard library, ``scandir``, that takes a single, optional string as its argument::
>
>     scandir(path='.') -> generator of DirEntry objects

Have you considered adding support for paths relative to directory descriptors [1] via a keyword-only dir_fd=None parameter, if it may lead to more efficient implementations on some platforms?

[1]: https://docs.python.org/3.4/library/os.html#dir-fd

-- akira
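To illustrate what "paths relative to directory descriptors" means with an existing function, here is a sketch using os.stat(), which already accepts dir_fd on platforms that support it. stat_size_relative is a hypothetical helper name.

```python
import os


def stat_size_relative(dirpath, name):
    """Return the size of dirpath/name, resolving name relative to a dir fd.

    Hypothetical helper; requires os.stat in os.supports_dir_fd.
    """
    if os.stat not in os.supports_dir_fd:
        raise NotImplementedError("dir_fd is not supported for os.stat here")
    dir_fd = os.open(dirpath, os.O_RDONLY)
    try:
        # 'name' is looked up relative to the open directory, so renaming
        # or re-linking dirpath after os.open() does not redirect the lookup.
        return os.stat(name, dir_fd=dir_fd).st_size
    finally:
        os.close(dir_fd)
```

A scandir(path, dir_fd=...) as asked about above would resolve path the same way before iterating over it.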
Re: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character
Florian Bruhin m...@the-compiler.org writes:

> * Nikolaus Rath nikol...@rath.org [2014-06-12 19:11:07 -0700]:
>> R. David Murray rdmur...@bitdance.com writes:
>>> Also notice that using a list with shell=True is using the API incorrectly. It wouldn't even work on Linux, so that torpedoes the cross-platform concern already :)
>>>
>>> This kind of confusion is why I opened http://bugs.python.org/issue7839.
>>
>> Can someone describe a use case where shell=True actually makes sense at all?
>>
>> It seems to me that whenever you need a shell, the arguments that you pass to it will be shell specific. So instead of e.g.
>>
>>     Popen('for i in `seq 42`; do echo $i; done', shell=True)
>>
>> you almost certainly want to do
>>
>>     Popen(['/bin/sh', '-c', 'for i in `seq 42`; do echo $i; done'], shell=False)
>>
>> because if your shell happens to be tcsh or cmd.exe, things are going to break.
>
> My usecase is a spawn-command in a GUI application, which the user can use to spawn an executable. I want the user to be able to use the usual shell features from there. However, I also pass an argument to that command, and that should be escaped.

You should pass the command as a string and use the cmd.exe quoting rules [1]. Note: they are different from the rules implemented by `subprocess.list2cmdline()` [2], which follows the Microsoft C/C++ startup code rules [3]; e.g., `^` is not special there, unlike in cmd.exe.

[1]: http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx
[2]: https://docs.python.org/3.4/library/subprocess.html#converting-an-argument-sequence-to-a-string-on-windows
[3]: http://msdn.microsoft.com/en-us/library/17w5ykft%28v=vs.85%29.aspx

-- akira
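A rough sketch of the two-step quoting that [1] describes: first quote for the C runtime argument parser, then prefix every cmd.exe metacharacter with a caret. escape_for_cmd and CMD_META are hypothetical names, the metacharacter set is an assumption based on that article, and % expansion in particular has corner cases this does not cover.

```python
import subprocess

# Characters that cmd.exe treats specially (assumed set, see lead-in).
CMD_META = '()%!^"<>&|'


def escape_for_cmd(arg):
    """Quote one argument for use inside a cmd.exe command line.

    Hypothetical helper: list2cmdline() handles the C runtime rules,
    then each cmd.exe metacharacter (including the quotes just added)
    is escaped with '^' so that cmd.exe passes it through literally.
    """
    quoted = subprocess.list2cmdline([arg])
    return "".join("^" + ch if ch in CMD_META else ch for ch in quoted)
```

For example, list2cmdline(['a^b', 'c d']) leaves the caret alone and only adds double quotes around the second argument, which is why its output is not safe to hand to cmd.exe unchanged.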
Re: [Python-Dev] should tests be thread-safe?
Victor Stinner victor.stin...@gmail.com writes:

> If you need a well defined environment, run your test in a subprocess. Depending on the random function, your test may be run with more threads. On BSD, it changes for example which thread receives a signal. Importing the tkinter module creates a hidden C thread for the Tk loop.

Does it mean that non-thread-safe tests can't be run using a GUI test runner that is implemented using tkinter?

-- akira
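The "run your test in a subprocess" advice could look roughly like this. run_isolated is a hypothetical helper, not something from the test suite; it simply starts a fresh interpreter so that threads created by the test runner (GUI or otherwise) are not present in the process under test.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a snippet of Python code in a fresh interpreter process.

    Hypothetical helper: the child starts with only its main thread,
    whatever threads the parent (e.g. a tkinter runner) may have.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,   # collect stdout/stderr for assertions
        text=True,
        timeout=timeout,
    )
```

Usage: result = run_isolated("import threading; print(threading.active_count())"), then inspect result.returncode and result.stdout in the parent.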