Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones

2015-04-17 Thread Akira Li
On Thu, Apr 16, 2015 at 1:14 AM, Alexander Belopolsky 
alexander.belopol...@gmail.com wrote:


 On Wed, Apr 15, 2015 at 4:46 PM, Akira Li 4kir4...@gmail.com wrote:

  Look what happened on July 1, 1990.  At 2 AM, the clocks in Ukraine were
  moved back one hour.  So times like 01:30 AM happened twice there on that
  day.  Let's see how Python handles this situation
 
  $ TZ=Europe/Kiev python3
  from email.utils import localtime
  from datetime import datetime
  localtime(datetime(1990,7,1,1,30)).strftime('%c %z %Z')
  'Sun Jul  1 01:30:00 1990 +0400 MSD'
 
  So far so good, I've got the first of the two 01:30AM's.  But what if I
  want the other 01:30AM?  Well,
 
  localtime(datetime(1990,7,1,1,30), isdst=0).strftime('%c %z %Z')
  'Sun Jul  1 01:30:00 1990 +0300 EEST'
 
  gives me the other 01:30AM, but it is counter-intuitive: I have to ask
  for the standard (winter) time to get the daylight savings (summer) time.
 

 It looks incorrect. Here's the corresponding pytz code:

   from datetime import datetime
   import pytz

   tz = pytz.timezone('Europe/Kiev')
   print(tz.localize(datetime(1990, 7, 1, 1, 30),
                     is_dst=False).strftime('%c %z %Z'))
   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
   print(tz.localize(datetime(1990, 7, 1, 1, 30),
                     is_dst=True).strftime('%c %z %Z'))
   # -> Sun Jul  1 01:30:00 1990 +0400 MSD

 See also Enhance support for end-of-DST-like ambiguous time [1]

 [1] https://bugs.launchpad.net/pytz/+bug/1378150

 `email.utils.localtime()` is broken:


 If you think there is a bug in email.utils.localtime - please open an
 issue at bugs.python.org.



Your question below suggests that you believe it is not a bug, i.e., that
`email.utils.localtime()` is broken *by design*, unless you think it is OK
to ignore `+0400 MSD`.

pytz works for me (I can get both `+0300 EEST` and `+0400 MSD`).  I don't
think `localtime()` can be fixed without the tz database. I don't know
whether it should be fixed; let somebody who can't use pytz pioneer
the issue. The purpose of the code example is to **inform** that
`email.utils.localtime()` fails (it returns only +0300 EEST) in this case:


   from datetime import datetime
   from email.utils import localtime

   print(localtime(datetime(1990, 7, 1, 1, 30)).strftime('%c %z %Z'))
   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
   print(localtime(datetime(1990, 7, 1, 1, 30), isdst=0).strftime('%c %z %Z'))
   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
   print(localtime(datetime(1990, 7, 1, 1, 30), isdst=1).strftime('%c %z %Z'))
   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
   print(localtime(datetime(1990, 7, 1, 1, 30), isdst=-1).strftime('%c %z %Z'))
   # -> Sun Jul  1 01:30:00 1990 +0300 EEST


 Versions:

   $ ./python -V
   Python 3.5.0a3+
   $ dpkg -s tzdata | grep -i version
   Version: 2015b-0ubuntu0.14.04

  The uncertainty about how to deal with the repeated hour was the reason why
  email.utils.localtime-like interface did not make it to the datetime
  module.

 A repeated hour (time jumps back) can be treated like an end-of-DST
 transition, to resolve ambiguities [1].


 I don't understand what you are complaining about.  It is quite possible
 that pytz uses is_dst flag differently from the way email.utils.localtime
 uses isdst.

 I was not able to find a good description of what is_dst means in pytz,
 but localtime's isdst is documented as follows:

 a positive or zero value for *isdst* causes localtime to
 presume initially that summer time (for example, Daylight Saving Time)
 is or is not (respectively) in effect for the specified time.

 Can you demonstrate that email.utils.localtime does not behave as
 documented?



No need to be so defensive about it. *A repeated hour (time jumps back)
can be treated like an end-of-DST transition, to resolve ambiguities [1].*
is just *an example* of how to fix the problem in the same way it is
done in pytz:

   >>> from datetime import datetime
   >>> import pytz
   >>> tz = pytz.timezone('Europe/Kiev')
   >>> after = tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False)
   >>> before = tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True)
   >>> before < after
   True
   >>> before
   datetime.datetime(1990, 7, 1, 1, 30, tzinfo=<DstTzInfo 'Europe/Kiev' MSD+4:00:00 DST>)
   >>> after
   datetime.datetime(1990, 7, 1, 1, 30, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>)
   >>> before.astimezone(pytz.utc)
   datetime.datetime(1990, 6, 30, 21, 30, tzinfo=<UTC>)
   >>> after.astimezone(pytz.utc)
   datetime.datetime(1990, 6, 30, 22, 30, tzinfo=<UTC>)
   >>> before.dst()
   datetime.timedelta(0, 3600)
   >>> after.dst()
   datetime.timedelta(0, 3600)
   >>> pytz.OLSON_VERSION
   '2015b'

It is summer time in both cases, i.e., it is not a *true* end-of-DST
transition (that is why I've used the word *like* above).

If we ignore ambiguous times that occur more than twice, then a boolean
flag such as pytz's is_dst is *always* enough to resolve the ambiguity,
assuming we have access to the tz database.

And yes, the example demonstrates that the behavior of pytz's is_dst

Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Lennart Regebro rege...@gmail.com writes:

 OK, so I realized another thing today, and that is that arithmetic
 doesn't necessarily round trip.

 For example, 2002-10-27 01:00 US/Eastern comes both in DST and STD.

 But 2002-10-27 01:00 US/Eastern STD minus two days is 2002-10-25 01:00
 US/Eastern DST

"two days" is ambiguous here. It is incorrect if you mean 48 hours (the
difference is 49 hours):

  #!/usr/bin/env python3
  from datetime import datetime, timedelta
  import pytz

  tz = pytz.timezone('US/Eastern')
  then_isdst = False # STD
  then = tz.localize(datetime(2002, 10, 27, 1), is_dst=then_isdst)
  now =  tz.localize(datetime(2002, 10, 25, 1), is_dst=None) # no utc transition
  print((then - now) // timedelta(hours=1))
  # - 49

 However, 2002-10-25 01:00 US/Eastern DST plus two days is 2002-10-27
 01:00 US/Eastern, but it is ambiguous if you want DST or not DST.

It is not ambiguous if you know what "two days" *in your particular
application* should mean (`day+2` vs. +48h exactly):

  print(tz.localize(now.replace(tzinfo=None) + timedelta(2), is_dst=then_isdst))
  # - 2002-10-27 01:00:00-05:00 # +49h
  print(tz.normalize(now + timedelta(2))) # +48h
  # - 2002-10-27 01:00:00-04:00

Here's a simple mental model that can be used for date arithmetic:

- naive datetime + timedelta(2) == same time, elapsed hours unknown
- aware utc datetime + timedelta(2) == same time, +48h
- aware datetime with timezone that may have different utc offsets at
  different times + timedelta(2) == unknown time, +48h

"unknown" means that you can't tell without knowing the specific
timezone.

It ignores leap seconds.

The 3rd case behaves *as if* the calculations are performed using these
steps (the actual implementation may be different):

1. convert an aware datetime object to utc (dt.astimezone(pytz.utc))
2. do the simple arithmetic using utc time
3. convert the result to the original pytz timezone (utc_dt.astimezone(tz))

you don't need `.localize()`, `.normalize()` calls here.
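
The three steps above can be sketched with the standard library alone. This is only an illustration, not how pytz is implemented: the fixed offsets `EDT`/`EST`, the hard-coded `FALL_BACK_UTC` instant, and the `to_eastern()` helper are assumptions that stand in for a real tz database lookup.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for US/Eastern around 2002-10-27.
EDT = timezone(timedelta(hours=-4), 'EDT')
EST = timezone(timedelta(hours=-5), 'EST')
# Assumption: the only transition modelled is the end of DST at 06:00 UTC.
FALL_BACK_UTC = datetime(2002, 10, 27, 6, tzinfo=timezone.utc)


def to_eastern(utc_dt):
    """Hypothetical helper: express an aware UTC datetime in local time."""
    return utc_dt.astimezone(EST if utc_dt >= FALL_BACK_UTC else EDT)


now = datetime(2002, 10, 25, 1, tzinfo=EDT)

# Steps 1-3: convert to UTC, add the timedelta, convert back.
plus_48h = to_eastern(now.astimezone(timezone.utc) + timedelta(days=2))

# "Same wall-clock time two days later", read as standard time (is_dst=False).
same_wall_std = (now.replace(tzinfo=None) + timedelta(days=2)).replace(tzinfo=EST)

print(plus_48h.isoformat(), (plus_48h - now) // timedelta(hours=1))
print(same_wall_std.isoformat(), (same_wall_std - now) // timedelta(hours=1))
```

Both results read 01:00 on 2002-10-27; they are 48 and 49 hours after `now` respectively, which is the difference between "+48h" and "day+2" in this transition.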

 And you can't pass in a is_dst flag to __add__, so the arithmatic must
 just pick one, and the sensible one is to keep to the same DST.

 That means that:

 tz = get_timezone('US/Eastern')
 dt = datetimedatetime(2002, 10, 27, 1, 0, tz=tz, is_dst=False)
 dt2 = dt - 420 + 420
 assert dt == dt2

 Will fail, which will be unexpected for most people.

 I think there is no way around this, but I thought I should flag for
 it. This is a good reason to do all your date time arithmetic in UTC.

 //Lennart

It won't fail:

  from datetime import datetime, timedelta
  import pytz

  tz = pytz.timezone('US/Eastern')
  dt = tz.localize(datetime(2002, 10, 27, 1), is_dst=False)
  delta = timedelta(seconds=420)

  assert dt == tz.normalize(tz.normalize(dt - delta) + delta)

The only reason `tz.normalize()` is used is so that tzinfo is correct
for the resulting datetime object; it does not affect the comparison otherwise:

  assert dt == (dt - delta + delta) #XXX tzinfo may be incorrect
  assert dt == tz.normalize(dt - delta + delta) # correct tzinfo for the final result
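
Why the comparison is unaffected can be shown without pytz: aware datetimes that carry different tzinfo objects are compared as UTC instants. A small sketch, assuming fixed `EST`/`EDT` offsets (names chosen here for illustration) in place of pytz tzinfo objects:

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), 'EST')
EDT = timezone(timedelta(hours=-4), 'EDT')

dt = datetime(2002, 10, 27, 1, tzinfo=EST)
delta = timedelta(seconds=420)

# Plain arithmetic keeps the EST label, although DST was still in effect
# seven minutes earlier.
unnormalized = dt - delta
# The same instant with the label that was actually in effect (what
# tz.normalize() would produce with pytz).
normalized = unnormalized.astimezone(EDT)

print(unnormalized.isoformat(), normalized.isoformat())
print(unnormalized == normalized, normalized + delta == dt)
```

The two values print different wall-clock times and offsets, yet compare equal, because equality only looks at the instant.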

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Isaac Schwabacher ischwabac...@wisc.edu writes:
 ...

 I know that you can do datetime.now(tz), and you can do datetime(2013,
 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able
 to add a time zone to an existing naive datetime is painful (and
 strptime doesn't even let you pass in a time zone). 

`.now(tz)` is correct. `datetime(..., tzinfo=tz)` is wrong: if tz is a
pytz timezone then you may get a wrong tzinfo (LMT); you should use
`tz.localize(naive_dt, is_dst=False|True|None)` instead.

 ...



Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Alexander Belopolsky alexander.belopol...@gmail.com writes:

 Sorry for a truncated message.  Please scroll past the quoted portion.

 On Thu, Apr 9, 2015 at 10:21 PM, Alexander Belopolsky 
 alexander.belopol...@gmail.com wrote:


 On Thu, Apr 9, 2015 at 4:51 PM, Isaac Schwabacher ischwabac...@wisc.edu
 wrote:

Well, you are right, but at least we do have a localtime utility
 hidden in the email package:
   
 from datetime import *
 from email.utils import localtime
 print(localtime(datetime.now()))
2015-04-09 15:19:12.84-04:00
   
You can read http://bugs.python.org/issue9527 for the reasons it
 did not make into datetime.
  
   But that's restricted to the system time zone. Nothing good ever
 comes from the system time zone...
 
  Let's solve one problem at a time. ...

 PEP 431 proposes to import zoneinfo into the stdlib, ...


 I am changing the subject so that we can focus on one question without
 diverting to PEP-size issues that are better suited for python ideas.

 I would like to add a functionality to the datetime module that would
 solve a seemingly simple problem: given a naive datetime instance assumed
 to be in local time, construct the corresponding aware datetime object with
 tzinfo set to an appropriate fixed offset datetime.timezone instance.

 Python 3 has this functionality implemented in the email package since
 version 3.3, and it appears to work well even in the ambiguous hour

  from email.utils import localtime
  from datetime import datetime
  localtime(datetime(2014,11,2,1,30)).strftime('%c %z %Z')
 'Sun Nov  2 01:30:00 2014 -0400 EDT'
  localtime(datetime(2014,11,2,1,30), isdst=0).strftime('%c %z %Z')
 'Sun Nov  2 01:30:00 2014 -0500 EST'

 However, in a location with a more interesting history, you can get a
 situation that would look like this in the zoneinfo database:

 $ zdump -v  -c 1992 Europe/Kiev
 ...
 Europe/Kiev  Sat Mar 24 22:59:59 1990 UTC = Sun Mar 25 01:59:59 1990 MSK isdst=0
 Europe/Kiev  Sat Mar 24 23:00:00 1990 UTC = Sun Mar 25 03:00:00 1990 MSD isdst=1
 Europe/Kiev  Sat Jun 30 21:59:59 1990 UTC = Sun Jul  1 01:59:59 1990 MSD isdst=1
 Europe/Kiev  Sat Jun 30 22:00:00 1990 UTC = Sun Jul  1 01:00:00 1990 EEST isdst=1
 Europe/Kiev  Sat Sep 28 23:59:59 1991 UTC = Sun Sep 29 02:59:59 1991 EEST isdst=1
 Europe/Kiev  Sun Sep 29 00:00:00 1991 UTC = Sun Sep 29 02:00:00 1991 EET isdst=0
 ...

 Look what happened on July 1, 1990.  At 2 AM, the clocks in Ukraine were
 moved back one hour.  So times like 01:30 AM happened twice there on that
 day.  Let's see how Python handles this situation

 $ TZ=Europe/Kiev python3
 from email.utils import localtime
 from datetime import datetime
 localtime(datetime(1990,7,1,1,30)).strftime('%c %z %Z')
 'Sun Jul  1 01:30:00 1990 +0400 MSD'

 So far so good, I've got the first of the two 01:30AM's.  But what if I
 want the other 01:30AM?  Well,

 localtime(datetime(1990,7,1,1,30), isdst=0).strftime('%c %z %Z')
 'Sun Jul  1 01:30:00 1990 +0300 EEST'

 gives me the other 01:30AM, but it is counter-intuitive: I have to ask
 for the standard (winter)  time to get the daylight savings (summer) time.


It looks incorrect. Here's the corresponding pytz code:

  from datetime import datetime
  import pytz

  tz = pytz.timezone('Europe/Kiev')
  print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0300 EEST
  print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0400 MSD
  
See also Enhance support for end-of-DST-like ambiguous time [1]

[1] https://bugs.launchpad.net/pytz/+bug/1378150

`email.utils.localtime()` is broken:

  from datetime import datetime
  from email.utils import localtime

  print(localtime(datetime(1990, 7, 1, 1, 30)).strftime('%c %z %Z'))
  # - Sun Jul  1 01:30:00 1990 +0300 EEST
  print(localtime(datetime(1990, 7, 1, 1, 30), isdst=0).strftime('%c %z %Z'))
  # - Sun Jul  1 01:30:00 1990 +0300 EEST
  print(localtime(datetime(1990, 7, 1, 1, 30), isdst=1).strftime('%c %z %Z'))
  # - Sun Jul  1 01:30:00 1990 +0300 EEST
  print(localtime(datetime(1990, 7, 1, 1, 30), isdst=-1).strftime('%c %z %Z'))
  # - Sun Jul  1 01:30:00 1990 +0300 EEST
  

Versions:

  $ ./python -V
  Python 3.5.0a3+
  $ dpkg -s tzdata | grep -i version
  Version: 2015b-0ubuntu0.14.04

 The uncertainty about how to deal with the repeated hour was the reason why
 email.utils.localtime-like  interface did not make it to the datetime
 module.

A repeated hour (time jumps back) can be treated like an end-of-DST
transition, to resolve ambiguities [1].

 The main objection to the isdst flag was that in most situations,
 determining whether DST is in effect is as hard as finding the UTC offset,
 so reducing the problem of finding the UTC offset to the one of finding the
 value for isdst does not solve much.

 I now realize that the problem is simply in the name for the flag.  While
 we cannot often tell what isdst 

Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Alexander Belopolsky alexander.belopol...@gmail.com writes:

 On Wed, Apr 8, 2015 at 3:57 PM, Isaac Schwabacher ischwabac...@wisc.edu
 wrote:

 On 15-04-08, Alexander Belopolsky wrote:
  With datetime, we also have a problem that POSIX APIs don't have to
 deal with: local time
  arithmetics. What is t + timedelta(1) when t falls on the day before
 DST change? How would
  you set the isdst flag in the result?

 It's whatever time comes 60*60*24 seconds after t in the same time zone,
 because the timedelta class isn't expressive enough to represent anything
 but absolute time differences (nor should it be, IMO).

 This is not what most users expect.  They expect

 datetime(y, m, d, 12, tzinfo=New_York) + timedelta(1)

 to be

 datetime(y, m, d+1, 12, tzinfo=New_York)

It is incorrect. If you want d+1 for +timedelta(1), use a **naive**
datetime. Otherwise +timedelta(1) is +24h:

  tomorrow = tz.localize(aware_dt.replace(tzinfo=None) + timedelta(1), is_dst=None)
  dt_plus24h = tz.normalize(aware_dt + timedelta(1)) # +24h

*tomorrow* and *aware_dt* have the *same* time but it is unknown how
many hours have passed if the utc offset has changed in between.
*dt_plus24h* may have a different time but exactly 24 hours
have passed between *dt_plus24h* and *aware_dt*.
http://stackoverflow.com/questions/441147/how-can-i-subtract-a-day-from-a-python-date
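
A stdlib-only sketch of the two meanings of `+ timedelta(1)` across the 2015-03-08 start of DST in New York. The fixed offsets `EST`/`EDT` are assumptions that replace the tz database lookup done by `tz.localize()` and `tz.normalize()`; choosing `EDT` by hand below is exactly the step pytz would perform.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), 'EST')
EDT = timezone(timedelta(hours=-4), 'EDT')

# Noon on the day before DST starts.
aware_dt = datetime(2015, 3, 7, 12, tzinfo=EST)

# d+1: do the arithmetic on the naive value, then attach the offset that is
# valid on the new date (assumed to be EDT here).
tomorrow = (aware_dt.replace(tzinfo=None) + timedelta(1)).replace(tzinfo=EDT)

# +24h: do the arithmetic on the instant, then express it in the new offset.
dt_plus24h = (aware_dt + timedelta(1)).astimezone(EDT)

print(tomorrow.isoformat(), tomorrow - aware_dt)
print(dt_plus24h.isoformat(), dt_plus24h - aware_dt)
```

Here `tomorrow` keeps the 12:00 wall-clock time but only 23 hours have elapsed, while `dt_plus24h` is exactly 24 hours later and therefore reads 13:00.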



Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Isaac Schwabacher ischwabac...@wisc.edu writes:

 On 15-04-15, Akira Li 4kir4...@gmail.com wrote:
 Isaac Schwabacher ischwabac...@wisc.edu writes:
  ...
 
  I know that you can do datetime.now(tz), and you can do datetime(2013,
  11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able
  to add a time zone to an existing naive datetime is painful (and
  strptime doesn't even let you pass in a time zone). 
 
 `.now(tz)` is correct. `datetime(..., tzinfo=tz)` is wrong: if tz is a
 pytz timezone then you may get a wrong tzinfo (LMT); you should use
 `tz.localize(naive_dt, is_dst=False|True|None)` instead.

 The whole point of this thread is to finalize PEP 431, which fixes the
 problem for which `localize()` and `normalize()` are workarounds. When
 this is done, `datetime(..., tzinfo=tz)` will be correct.

 ijs

The input time is ambiguous. Even if we assume PEP 431 is implemented in
some form, your code is still missing the isdst parameter (or its
analog). PEP 431 won't fix it; it can't resolve the ambiguity by
itself. Notice the is_dst parameter in the `tz.localize()` call (current
API).

.now(tz) works even during end-of-DST transitions (current API) when the
local time is ambiguous.
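
Why `.now(tz)` is unambiguous can be sketched as follows: it starts from a UTC instant, and every instant maps to exactly one local time, even when two instants share a wall-clock reading. The fixed offsets and the two 2014-11-02 New York instants below are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), 'EDT')
EST = timezone(timedelta(hours=-5), 'EST')

# Two UTC instants, one hour apart, on either side of the 06:00 UTC transition.
first = datetime(2014, 11, 2, 5, 30, tzinfo=timezone.utc).astimezone(EDT)
second = datetime(2014, 11, 2, 6, 30, tzinfo=timezone.utc).astimezone(EST)

print(first.isoformat())
print(second.isoformat())
```

Both values show 01:30 local time, but they stay distinct because each carries its own UTC offset. Going the other way, from the naive 01:30 to an instant, is where a disambiguation flag is needed.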



Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Alexander Belopolsky alexander.belopol...@gmail.com writes:

 ...
 For most world locations past discontinuities are fairly well documented
 for at least a century and future changes are published with at least 6
 months lead time.

It is important to note that the different versions of the tz database
may lead to different tzinfo (utc offset, tzname) even for *past* dates.

i.e., (lt, tzid, isdst) is not enough because the result for (lt,
tzid(2015b), isdst) may be different from (lt, tzid(X), isdst)
where

lt = local time e.g., naive datetime
tzid = timezone from the tz database e.g., Europe/Kiev
isdst = a boolean flag for disambiguation
X != 2015b

In other words, a fixed utc offset might not be sufficient even for past
dates.

...
 Moreover, a program that rejects invalid times on input, but stores them
 for a long time may see its database silently corrupted after a zoneinfo
 update.

 Now it is time to make specific proposal.  I would like to extend
 datetime.astimezone() method to work on naive datetime instances.  Such
 instances will be assumed to be in local time and discontinuities will be
 handled as follows:


 1. wall(t) == lt has a single solution.  This is the trivial case and
 lt.astimezone(utc) and lt.astimezone(utc, which=i)  for i=0,1 should return
 that solution.

 2. wall(t) == lt has two solutions t1 and t2 such that t1 < t2. In this
 case lt.astimezone(utc) == lt.astimezone(utc, which=0) == t1 and
  lt.astimezone(utc, which=1) == t2.

In pytz terms: `which = not isdst` (end-of-DST-like transition: isdst
changes from True to False in the direction of utc time).

It resolves AmbiguousTimeError raised by `tz.localize(naive, is_dst=None)`.

 3. wall(t) == lt has no solution.  This happens when there is UTC time t0
 such that wall(t0) < lt and wall(t0+epsilon) > lt (a positive discontinuity
 at time t0). In this case lt.astimezone(utc) should return t0 + lt -
 wall(t0).  I.e., we ignore the discontinuity and extend wall(t) linearly
 past t0.  Obviously, in this case the invariant wall(lt.astimezone(utc)) ==
 lt won't hold.   The which flag should be handled as follows:
  lt.astimezone(utc) == lt.astimezone(utc, which=0) and lt.astimezone(utc,
 which=0) == t0 + lt - wall(t0+eps).

It is inconsistent with the previous case: here `which = isdst` but
`which = not isdst` above.

`lt.astimezone(utc, which=0) == t0 + lt - wall(t0+eps)` corresponds to:

  result = tz.normalize(tz.localize(lt, is_dst=False))

i.e., `which = isdst` (t0 is at the start of DST and therefore isdst
changes from False to True).

It resolves NonExistentTimeError raised by `tz.localize(naive,
is_dst=None)` at a start-of-DST-like transition (spring forward).

For example,

  from datetime import datetime, timedelta
  import pytz
  
  tz = pytz.timezone('America/New_York')
  # 2am -- non-existent time
  print(tz.normalize(tz.localize(datetime(2015, 3, 8, 2), is_dst=False)))
  # - 2015-03-08 03:00:00-04:00 # after the jump (wall(t0+eps))
  print(tz.localize(datetime(2015, 3, 8, 3), is_dst=None))
  # - 2015-03-08 03:00:00-04:00 # same time, unambiguous
  # 2:01am -- non-existent time
  print(tz.normalize(tz.localize(datetime(2015, 3, 8, 2, 1), is_dst=False)))
  # - 2015-03-08 03:01:00-04:00
  print(tz.localize(datetime(2015, 3, 8, 3, 1), is_dst=None))
  # - 2015-03-08 03:01:00-04:00 # same time, unambiguous
  # 2:59am non-existent time
  dt = tz.normalize(tz.localize(datetime(2015, 3, 8, 2, 59), is_dst=True))
  print(dt)
  # - 2015-03-08 01:59:00-05:00 # before the jump (wall(t0-eps))
  print(tz.normalize(dt + timedelta(minutes=1)))
  # - 2015-03-08 03:00:00-04:00
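
The three cases of the proposal (one, two, or no solutions of wall(t) == lt) can also be sketched without pytz. The toy model below is an assumption built from the 1990 Europe/Kiev zdump lines quoted earlier in the thread; `wall()`, `solutions()` and `classify()` are hypothetical helpers, not part of datetime or pytz.

```python
from datetime import datetime, timedelta

# Assumption: a toy model of Europe/Kiev in 1990. Each entry is
# (first naive UTC instant of the period, UTC offset during the period).
INITIAL_OFFSET = timedelta(hours=3)                        # MSK
PERIODS = [
    (datetime(1990, 3, 24, 23, 0), timedelta(hours=4)),   # MSD
    (datetime(1990, 6, 30, 22, 0), timedelta(hours=3)),   # EEST
]


def wall(t):
    """Local wall-clock time for the naive UTC time t."""
    offset = INITIAL_OFFSET
    for start, period_offset in PERIODS:
        if t >= start:
            offset = period_offset
    return t + offset


def solutions(lt):
    """All UTC times t with wall(t) == lt, earliest first."""
    offsets = {INITIAL_OFFSET} | {offset for _, offset in PERIODS}
    return sorted(lt - offset for offset in offsets if wall(lt - offset) == lt)


def classify(lt):
    """Name the case of the proposal that the local time lt falls into."""
    return {0: 'non-existent', 1: 'unique', 2: 'ambiguous'}[len(solutions(lt))]


for lt in (datetime(1990, 5, 1, 12), datetime(1990, 7, 1, 1, 30),
           datetime(1990, 3, 25, 2, 30)):
    print(lt, classify(lt), solutions(lt))
```

For the ambiguous 01:30 on July 1 the two solutions are 21:30 and 22:30 UTC on June 30, so a single boolean (`which`, or pytz's `is_dst`) is enough to pick one.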


 With the proposed features in place, one can use the naive code

 t =  lt.astimezone(utc)

 and get predictable behavior in all cases and no crashes.

 A more sophisticated program can be written like this:

 t1 = lt.astimezone(utc, which=0)
 t2 = lt.astimezone(utc, which=1)
 if t1 == t2:
     t = t1
 elif t2 > t1:
     # ask the user to pick between t1 and t2 or raise
     # AmbiguousLocalTimeError
 else:
     t = t1
     # warn the user that time was invalid and changed or raise
     # InvalidLocalTimeError



Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github

2014-11-30 Thread Akira Li
Larry Hastings la...@hastings.org writes:

 On 11/29/2014 04:37 PM, Donald Stufft wrote:
 On Nov 29, 2014, at 7:15 PM, Alex Gaynor alex.gay...@gmail.com wrote:
 Despite being a regular hg
 user for years, I have no idea how to create a local-only branch, or a 
 branch
 which is pushed to a remote (to use the git term).
 I also don’t know how to do this.

 Instead of collectively scratching your heads, could one of you guys
 do the research and figure out whether or not hg supports this
 workflow?  One of the following two things must be true:

 1. hg supports this workflow (or a reasonable facsimile), which may
lessen the need for this PEP.
 2. hg doesn't support this workflow, which may strengthen the need for
this PEP.


Assuming git's "all work is done in a local branch" workflow, you could
use bookmarks with hg:

http://lostechies.com/jimmybogard/2010/06/03/translating-my-git-workflow-with-local-branches-to-mercurial/
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/#branching-with-bookmarks
http://mercurial.selenic.com/wiki/BookmarksExtension#Usage
http://stackoverflow.com/questions/1598759/git-and-mercurial-compare-and-contrast


--
Akira



Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-16 Thread Akira Li
Steven D'Aprano st...@pearwood.info writes:

 On Wed, Sep 17, 2014 at 11:14:15AM +1000, Chris Angelico wrote:
 On Wed, Sep 17, 2014 at 5:29 AM, R. David Murray rdmur...@bitdance.com 
 wrote:

  Basically, we are pretending that each smuggled
  byte is a single character for string parsing purposes...but they don't
  match any of our parsing constants.  They are all "any character" matches
  in the regexes and what have you.
 
 This is slightly iffy, as you can't be sure that one byte represents
 one character, but as long as you don't much care about that, it's not
 going to be an issue.

 This discussion would probably be a lot more easy to follow, with fewer 
 miscommunications, if there were some examples. Here is my example, 
 perhaps someone can tell me if I'm understanding it correctly.

 I want to send an email including the header line:

 'Subject: “NOBODY expects the Spanish Inquisition!”'


   >>> from email.header import Header
   >>> h = Header('Subject: “NOBODY expects the Spanish Inquisition!”')
   >>> h.encode('utf-8')
   '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
   >>> h.encode()
   '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
   >>> h.encode('ascii')
   '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
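
In the session above the header name is encoded too (`Subject=3A_`), because the whole line was passed as the value. A minimal sketch of the round trip for the value alone, using `decode_header()` and `make_header()` from the same module; the variable names are illustrative:

```python
from email.header import Header, decode_header, make_header

value = '“NOBODY expects the Spanish Inquisition!”'

# Encode the non-ASCII value as RFC 2047 encoded-words (ASCII only).
encoded = Header(value, charset='utf-8').encode()

# Decode the encoded-words back into the original text.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded == value)
```

The encoded form is plain ASCII and is what would follow `Subject: ` on the wire; decoding it recovers the curly quotes.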


--
Akira



Re: [Python-Dev] Multiline with statement line continuation

2014-08-13 Thread Akira Li
Nick Coghlan ncogh...@gmail.com writes:

 On 12 August 2014 22:15, Steven D'Aprano st...@pearwood.info wrote:
 Compare the natural way of writing this:

 with open("spam") as spam, open("eggs", "w") as eggs, frobulate("cheese") as cheese:
     # do stuff with spam, eggs, cheese

 versus the dynamic way:

 with ExitStack() as stack:
     spam, eggs = [stack.enter_context(open(fname, mode)) for fname, mode in
                   zip(("spam", "eggs"), ("r", "w"))]
     cheese = stack.enter_context(frobulate("cheese"))
     # do stuff with spam, eggs, cheese

 You wouldn't necessarily switch at three. At only three, you have lots
 of options, including multiple nested with statements:

 with open("spam") as spam:
     with open("eggs", "w") as eggs:
         with frobulate("cheese") as cheese:
             # do stuff with spam, eggs, cheese

 The multiple context managers in one with statement form is there
 *solely* to save indentation levels, and overuse can often be a sign
 that you may have a custom context manager trying to get out:

 @contextlib.contextmanager
 def dish(spam_file, egg_file, topping):
     with open(spam_file), open(egg_file, 'w'), frobulate(topping):
         yield

 with dish(spam, eggs, cheese) as spam, eggs, cheese:
     # do stuff with spam, eggs & cheese

 ExitStack is mostly useful as a tool for writing flexible custom
 context managers, and for dealing with context managers in cases where
 lexical scoping doesn't necessarily work, rather than being something
 you'd regularly use for inline code.

 "Why do I have so many contexts open at once in this function?" is a
 question developers should ask themselves in the same way it's worth
 asking "why do I have so many local variables in this function?"

A multiline with-statement can be useful even with *two* context
managers. Two is not many.

Saving indentation levels alone is a worthy goal. It can affect
readability and the perceived complexity of the code.

Here's how I'd like the code to look:

  with (open('input filename') as input_file,
        open('output filename', 'w') as output_file):
      # code with list comprehensions to transform input file into output file

Even one additional unnecessary indentation level may force you to split
list comprehensions into several lines (less readable) and/or use
shorter names (less readable). Or it may force you to move the inline code
into a separate named function prematurely, solely to preserve the
indentation level (which may also be less readable), i.e.,

  with ... as input_file:
      with ... as output_file:
          ... #XXX indentation level is lost for no reason

  with ... as infile, ... as outfile: #XXX shorter names
      ...

  with ... as input_file:
      with ... as output_file:
          transform(input_file, output_file) #XXX unnecessary function

And (nested() can be implemented using ExitStack):

  with nested(open(..),
              open(..)) as (input_file, output_file):
      ... #XXX less readable
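
A minimal sketch of how such a `nested()` could be written on top of ExitStack. This is one possible implementation under that assumption, not the removed `contextlib.nested()`; `tracked()` is a hypothetical context manager used only to show the enter/exit order.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def nested(*managers):
    """Enter the given context managers in order; exit them in reverse."""
    with ExitStack() as stack:
        yield tuple(stack.enter_context(manager) for manager in managers)


# Usage with a hypothetical tracing context manager.
events = []


@contextmanager
def tracked(name):
    events.append('enter ' + name)
    try:
        yield name
    finally:
        events.append('exit ' + name)


with nested(tracked('a'), tracked('b')) as (first, second):
    events.append('body')

print(events)
```

All arguments are constructed before any of them is entered, so a later manager cannot be built from the value an earlier one returns on entry.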

Here's an example where nested() won't help:

  def get_integers(filename):
      with (open(filename, 'rb', 0) as file,
            mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file):
          for match in re.finditer(br'\d+', mmapped_file):
              yield int(match.group())

Here's another:

  with (open('log'+'some expression that generates filename', 'a') as logfile,
        redirect_stdout(logfile)):
      ...


--
Akira



Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Akira Li
Schmitt  Uwe (ID SIS) uwe.schm...@id.ethz.ch writes:

 I discovered a problem using cPickle.loads from CPython 2.7.6.

 The last line in the following code raises an infinite recursion

 class T(object):

     def __init__(self):
         self.item = list()

     def __getattr__(self, name):
         return getattr(self.item, name)

 import cPickle

 t = T()

 l = cPickle.dumps(t)
 cPickle.loads(l)
...
 Is this a bug or did I miss something ?

The issue is that your __getattr__ raises RuntimeError (due to infinite
recursion) for non-existing attributes instead of AttributeError. To fix
it, you could use object.__getattribute__:

  class C:
      def __init__(self):
          self.item = []
      def __getattr__(self, name):
          return getattr(object.__getattribute__(self, 'item'), name)

There were issues in the past due to {get,has}attr silencing
non-AttributeError exceptions; therefore it is good that pickle breaks
when it gets RuntimeError instead of AttributeError.
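
A runnable check of the fix, as a sketch that assumes Python 3, where the `pickle` module takes the place of `cPickle`. The class body is the one suggested above; only the round trip around it is added.

```python
import pickle


class C:
    def __init__(self):
        self.item = []

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Reading 'item' through
        # object.__getattribute__ raises AttributeError instead of recursing
        # when the instance has no 'item' yet (as during unpickling).
        return getattr(object.__getattribute__(self, 'item'), name)


original = C()
original.append(1)          # delegated to the wrapped list
restored = pickle.loads(pickle.dumps(original))

print(type(restored).__name__, restored.item)
```

During loading, pickle probes the freshly created, still empty instance for optional hooks; with this `__getattr__` each probe fails with AttributeError, so pickle falls back to restoring the instance dictionary.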


--
Akira



Re: [Python-Dev] os.walk() is going to be *fast* with scandir

2014-08-11 Thread Akira Li
Armin Rigo ar...@tunes.org writes:

 On 10 August 2014 08:11, Larry Hastings la...@hastings.org wrote:
 A small tip from my bzr days - cd into the directory before scanning it

 I doubt that's permissible for a library function like os.scandir().

 Indeed, chdir() is notably not compatible with multithreading.  There
 would be a non-portable but clean way to do that: the functions
 openat() and fstatat().  They only exist on relatively modern Linuxes,
 though.

There is os.fwalk() that could be both safer and faster than
os.walk(). It yields rootdir fd that can be used by functions that
support dir_fd parameter, see os.supports_dir_fd set. They use *at()
functions under the hood.
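
A short sketch of that pattern, assuming a Unix platform where os.fwalk() is available and os.stat() supports dir_fd; `total_file_size()` is a name made up for this example.

```python
import os


def total_file_size(top):
    """Sum the sizes of non-directory entries under top using directory fds."""
    total = 0
    for root, dirs, files, rootfd in os.fwalk(top):
        for name in files:
            # The name is resolved relative to rootfd, so the full path is
            # neither rebuilt nor looked up again from the top.
            total += os.stat(name, dir_fd=rootfd, follow_symlinks=False).st_size
    return total


print(total_file_size('.'))
```

The file descriptor yielded by os.fwalk() is only valid until the next iteration step, so it is used inside the loop body and not stored.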

os.fwalk() could be implemented in terms of os.scandir() if the latter
supported an fd parameter like os.listdir() does, i.e., if it were in the
os.supports_fd set (note: that set is different from os.supports_dir_fd).

Victor Stinner suggested [1] allowing scandir(fd) but I don't see it
mentioned in PEP 471 [2]: the PEP neither supports nor rejects the
idea.

[1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
[2] http://legacy.python.org/dev/peps/pep-0471/


--
Akira



Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-03 Thread Akira Li
Guido van Rossum gu...@python.org writes:

 Well, it really does look like checking for the presence of those ANDROID_*
 environment variables is the best way to recognize the Android platform.
 Anyone can do that without waiting for a ruling on whether Android is Linux
 or not (which would be necessary because the docs for sys.platform are
 quite clear about its value on Linux systems). Googling terms like "is
 Android Linux" suggests that there is considerable controversy about the
 issue, so I suggest you don't wait. :-)

I don't see sysconfig mentioned in the discussion (maybe for a
reason). It might provide build-time information e.g.,

  built_for_android = 'android' in sysconfig.get_config_var('MULTIARCH')

assuming the complete value is something like 'arm-linux-android'.  It
says that the python binary is built for android (the current platform
may or may not be Android).


--
Akira



Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-03 Thread Akira Li
Shiz h...@shiz.me writes:

 The most obvious change would be to subprocess.Popen(). The reason a
 generic approach there won't work is also the reason I expect more
 changes might be needed: the Android file system doesn't abide by any
 POSIX file system standards. Its shell isn't located at /bin/sh, but at
 /system/bin/sh. The only directories it provides that are POSIX-standard
 are /dev and /etc, to my knowledge. You could check to see if
 /system/bin/sh exists and use that first, but that would break the
 preferred shell on POSIX systems that happen to have /system for some
 reason or another. In short: the preferred shell on POSIX systems is
 /bin/sh, but on Android it's /system/bin/sh. Simple existence checking
 might break the preferred shell on either. For more specific stdlib
 examples I'd have to check the test suite again.

FYI, /bin/sh is not POSIX, see
http://bugs.python.org/issue16353#msg224514


--
Akira



Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-01 Thread Akira Li
Shiz h...@shiz.me writes:

 Hi folks,

 I’m working on porting CPython to the Android platform, and while
 making decent progress, I’m currently stuck at a higher-level issue
 than adding #ifdefs for __ANDROID__ to C extension modules.

 The idea is, not only CPython extension modules have some assumptions
 that don’t seem to fit Android’s mold, some default Python-written
 modules do as well. However, whereas CPython extensions can trivially
 check if we’re building for Android by checking the __ANDROID__
 compiler macro, Python modules can do no such check, and are left
 wondering how to figure out if the platform they are currently running
 on is an Android one. To my knowledge there is no reliable way to
 detect if one is using Android as a vehicle for their journey using
 any other way.

 Now, the main question is: what would be the best way to ‘expose’ the
 indication that Android is being run on to Python-living modules? My
 own thought was to add sys.getlinuxuserland(), or
 platform.linux_userland(), in similar vein to sys.getwindowsversion()
 and platform.linux_distribution(), which could return information
 about the userland of running CPython instance, instead of knowing
 merely the kernel and the distribution.

 This way, code could trivially check if it ran on the GNU(+associates)
 userland, or under a BSD-ish userland, or Android… and adjust its
 behaviour accordingly.

 I would be delighted to hear comments on this proposal, or better yet,
 alternative solutions. :)

 Kind regards,
 Shiz

 P.S.: I am well aware that Android might as well never be officially
 supported in CPython. In that case, consider this a thought experiment
 of how it /would/ be handled. :)

Python uses os.name, sys.platform, and various functions from `platform`
module to provide version info:

- coarse: os.name is 'posix', 'nt', 'ce', 'java' [1]. It is defined by
  availability of some builtin modules ('posix', 'nt' in
  particular) at import time.

- finer: sys.platform may start with freebsd, linux, win, cygwin, darwin
 (`uname -s`). It is defined at python build time.

- detailed: `platform` module. It provides as much info as possible
e.g., platform.uname(), platform.platform().
It may use runtime commands to get it.
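
The three levels above can be inspected side by side. This is only an illustrative sketch; the function name describe_platform is made up, and the values it returns depend on the machine it runs on:

```python
import os
import platform
import sys

def describe_platform():
    """Return the three levels of platform information as a dict."""
    return {
        'os.name': os.name,                        # coarse: 'posix', 'nt', ...
        'sys.platform': sys.platform,              # finer: fixed at build time
        'platform.system': platform.system(),      # detailed: may query the OS
        'platform.platform': platform.platform(),
    }

if __name__ == '__main__':
    for key, value in describe_platform().items():
        print(key, '=', value)
```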

If Android is posixy enough (would `posix` module work on Android?)
then os.name could be left 'posix'.

You could set sys.platform to 'android' (like sys.platform may be
'cygwin' on Windows) if Android is not like *any other* Linux
distribution (from the point of view of writing working Python code for
it), i.e., if Android is further from other Linux distributions than
freebsd, linux, and darwin are from each other, then it might deserve
its own sys.platform slot.

If sys.platform is left 'linux' (like sys.platform is 'darwin' on iOS)
then the platform module could be used to detect Android, e.g., via
platform.linux_distribution(). Note that it might be removed in Python 3.6
and that its result is unpredictable [2] unless you fix it in your Python
distribution, e.g., here's the output on my machine:

  >>> import platform
  >>> platform.linux_distribution()
  ('Ubuntu', '14.04', 'trusty')

For example:

  is_android = (platform.linux_distribution()[0] == 'Android')

You could also define platform.android_version() that can provide Android
specific version details as much as you need:

  is_android = bool(platform.android_version().release)

You could provide an alias android_ver (like existing java_ver, libc_ver,
mac_ver, win32_ver).

See also, "When to use os.name, sys.platform, or platform.system?" [3]

Unrelated, TIL [4]:

  Android is a Linux distribution according to the Linux Foundation

[1] https://docs.python.org/3.4/library/os.html#os.name
[2] http://bugs.python.org/issue1322
[3]
http://stackoverflow.com/questions/4553129/when-to-use-os-name-sys-platform-or-platform-system
[4] http://en.wikipedia.org/wiki/Android_(operating_system)


btw, would it help to add an os.get_shell_executable() [5] function, to
avoid hacking the subprocess module, so that os.confstr('CS_PATH') or
os.defpath on Android could be defined to include /system/bin instead?

[5] http://bugs.python.org/issue16353
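
A rough sketch of what such a helper might look like. This is an assumption for illustration only, not the API proposed in the issue: it looks up sh on the path reported by os.confstr('CS_PATH') and falls back to os.defpath where confstr is missing or unsupported.

```python
import os
import shutil

def get_shell_executable(name='sh'):
    """Hypothetical helper: find *name* on the system's standard path.

    os.confstr('CS_PATH') is only available on some Unix platforms, so
    fall back to os.defpath when it is missing, unknown, or undefined.
    """
    try:
        path = os.confstr('CS_PATH')
    except (AttributeError, ValueError, OSError):
        path = None
    return shutil.which(name, path=path or os.defpath)
```

With this shape, a platform only has to report a suitable CS_PATH or os.defpath; callers would not need to hard-code /bin/sh or /system/bin/sh.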


--
Akira



Re: [Python-Dev] PEP 471 scandir accepted

2014-07-22 Thread Akira Li
Ben Hoyt benh...@gmail.com writes:

 I think if I were doing this from scratch I'd reimplement listdir() in
 Python as "return [e.name for e in scandir(path)]".
...
 So my basic plan is to have an internal helper function in
 posixmodule.c that either yields DirEntry objects or strings. And then
 listdir() would simply be defined something like "return
 list(_scandir(path, yield_strings=True))" in C or in Python.

 My reasoning is that then there'll be much less (if any) code
 duplication between scandir() and listdir().

 Does this sound like a reasonable approach?

Note: listdir() accepts an integer path (an open file descriptor that
refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
*you can't use scandir() to replace listdir() in this case* (as I've
already mentioned in [1]). See the corresponding tests from [2].

[1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
[2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html

From os.listdir() docs [3]:

 This function can also support specifying a file descriptor; the file
 descriptor must refer to a directory.

[3] https://docs.python.org/3.4/library/os.html#os.listdir
[4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736


--
Akira



Re: [Python-Dev] PEP 471 scandir accepted

2014-07-22 Thread Akira Li
Ben Hoyt benh...@gmail.com writes:

 Note: listdir() accepts an integer path (an open file descriptor that
 refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
 *you can't use scandir() to replace listdir() in this case* (as I've
 already mentioned in [1]). See the corresponding tests from [2].

 [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
 [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html

 From os.listdir() docs [3]:

 This function can also support specifying a file descriptor; the file
 descriptor must refer to a directory.

 [3] https://docs.python.org/3.4/library/os.html#os.listdir
 [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736

 Fair point.

 Yes, I hadn't realized listdir supported dir_fd (must have been
 looking at 2.x docs), though you've pointed it out at [1] above, and I
 guess I wasn't thinking about implementation at the time.

FYI, dir_fd is related but *different*: compare specifying a file
descriptor [1] vs. paths relative to directory descriptors [2].

NOTE: os.supports_fd and os.supports_dir_fd are different sets. [3]:

  >>> import os
  >>> os.listdir in os.supports_fd
  True
  >>> os.listdir in os.supports_dir_fd
  False


[1] https://docs.python.org/3/library/os.html#path-fd
[2] https://docs.python.org/3/library/os.html#dir-fd
[3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html

To be clear: *listdir() does not support dir_fd* though it can be
emulated using os.open(dir_fd=..).

You can safely ignore the rest of the e-mail until you want to implement
path-fd [1] support for os.scandir() in several months.

Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:

  import contextlib
  import os

  with contextlib.ExitStack() as stack:
      dir_fd = os.open('/etc', os.O_RDONLY)
      stack.callback(os.close, dir_fd)
      fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd) # dir-fd [2]
      stack.callback(os.close, fd)
      print("\n".join(os.listdir(fd))) # path-fd [1]

It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked
to refer to another directory after the first os.open('/etc',..)
call. See also, os.fwalk(dir_fd=..) [4]

[4] https://docs.python.org/3/library/os.html#os.fwalk

 However, given that we have to support this for listdir() anyway, I
 think it's worth reconsidering whether scandir()'s directory argument
 can be an integer FD.

What is entry.path in this case? If input directory is a file descriptor
(an integer) then os.path.join(directory, entry.name) won't work.

PEP 471 should explicitly reject support for specifying a file
descriptor so that code that uses os.scandir may assume that the
entry.path attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ). [5]

[5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html

On the other hand os.fwalk() [4] that supports both path-fd [1] and
dir-fd [2] could be implemented without entry.path property if
os.scandir() supports just path-fd [1]. os.fwalk() provides a safe way
to traverse a directory tree without symlink races e.g., [6]:

  def get_tree_size(directory):
      """Return total size of files in directory and subdirs."""
      return sum(entry.lstat().st_size
                 for root, dirs, files, rootfd in fwalk(directory)
                 for entry in files)

[6] http://legacy.python.org/dev/peps/pep-0471/#examples

where fwalk() is the exact copy of os.fwalk() except that it uses
_fwalk() which is defined in terms of scandir():

  import os

  # adapt os._fwalk() to use scandir() instead of os.listdir()
  def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
      # Note: This uses O(depth of the directory tree) file descriptors:
      # if necessary, it can be adapted to only require O(1) FDs, see
      # http://bugs.python.org/issue13734

      entries = scandir(topfd)
      dirs, nondirs = [], []
      for entry in entries: #XXX call onerror on OSError on next() and return?
          # report symlinks to directories as directories (like os.walk)
          #  but no recursion into symlinked subdirectories unless
          #  follow_symlinks is true

          # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
          #  raise on broken links)
          try:
              (dirs if entry.is_dir() else nondirs).append(entry)
          except FileNotFoundError:
              continue # ignore disappeared files

      if topdown:
          yield toppath, dirs, nondirs, topfd

      for entry in dirs:
          try:
              orig_st = entry.stat(follow_symlinks=follow_symlinks)
              #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
              dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
          except OSError as err:
              if onerror is not None:
  

Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-14 Thread Akira Li
Nick Coghlan ncogh...@gmail.com writes:

 On 13 Jul 2014 20:54, Tim Delaney timothy.c.dela...@gmail.com wrote:

 On 14 July 2014 10:33, Ben Hoyt benh...@gmail.com wrote:



 If we go with Victor's link-following .is_dir() and .is_file(), then
 we probably need to add his suggestion of a follow_symlinks=False
 parameter (defaults to True). Either that or you have to say
 stat.S_ISDIR(entry.lstat().st_mode) instead, which is a little bit
 less nice.


 Absolutely agreed that follow_symlinks is the way to go, disagree on the
 default value.


 Given the above arguments for symlink-following is_dir()/is_file()
 methods (have I missed any, Victor?), what do others think?


 I would say whichever way you go, someone will assume the opposite. IMO
 not following symlinks by default is safer. If you follow symlinks by
 default then everyone has the following issues:

 1. Crossing filesystems (including onto network filesystems);

 2. Recursive directory structures (symlink to a parent directory);

 3. Symlinks to non-existent files/directories;

 4. Symlink to an absolutely huge directory somewhere else (very annoying
 if you just wanted to do a directory sizer ...).

 If follow_symlinks=False by default, only those who opt-in have to deal
 with the above.

 Or the ever popular symlink to . (or a directory higher in the tree).

 I think os.walk() is a good source of inspiration here: call the flag
 followlink and default it to False.


Let's not multiply entities beyond necessity.

There is a well-defined *follow_symlinks* parameter
https://docs.python.org/3/library/os.html#follow-symlinks
e.g., os.access, os.chown, os.link, os.stat, os.utime and many other
functions in os module support follow_symlinks parameter, see
os.supports_follow_symlinks.

os.walk is an exception that uses *followlinks*. It might be because it
is an old function e.g., newer os.fwalk uses follow_symlinks.
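
The naming difference can be checked by introspection. A small sketch; the helper name symlink_flag is made up, and os.fwalk is guarded because it does not exist on every platform:

```python
import inspect
import os

def symlink_flag(func):
    """Return the name of func's symlink-following parameter, if any."""
    params = inspect.signature(func).parameters
    for name in ('follow_symlinks', 'followlinks'):
        if name in params:
            return name
    return None

print('os.walk:', symlink_flag(os.walk))
if hasattr(os, 'fwalk'):  # os.fwalk() is not available on every platform
    print('os.fwalk:', symlink_flag(os.fwalk))
# Functions whose follow_symlinks argument is honoured on this platform
print('os.stat supported:', os.stat in os.supports_follow_symlinks)
```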



As has been said: os.path.isdir and pathlib.Path.is_dir in Python,
File.directory? in Ruby, System.Directory.doesDirectoryExist in Haskell,
and `test -d` in shell all follow symlinks, i.e., follow_symlinks=True as
the default is more familiar for the .is_dir method.

`cd path` in shell, os.chdir(path), `ls path`, os.listdir(path), and
os.scandir(path) itself follow symlinks (even on Windows:
http://bugs.python.org/issue13772 ). GUI file managers such as
`nautilus` also treat symlinks to directories as directories -- you may
click on them to open corresponding directories.

Only *recursive* functions such as os.walk and os.fwalk do not follow
symlinks by default, to avoid symlink loops. Note: the behavior is
consistent with coreutils commands: `cp` follows symlinks for
non-recursive actions, whereas the `du` utility, which is inherently
recursive, doesn't follow symlinks by default.

Making follow_symlinks=True the default for the DirEntry.is_dir method
helps avoid easy-to-introduce bugs when replacing old
os.listdir/os.path.isdir code or writing new code using the same
mental model.
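
A short sketch of the two behaviours, assuming Python 3.6 or later (where os.scandir() exists and works as a context manager) and a platform that allows creating symlinks; the function name classify is made up:

```python
import os
import tempfile

def classify(top):
    """Map entry name -> (is_dir following links, is_dir not following)."""
    result = {}
    with os.scandir(top) as entries:
        for entry in entries:
            result[entry.name] = (entry.is_dir(),
                                  entry.is_dir(follow_symlinks=False))
    return result

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'real'))
    os.symlink('real', os.path.join(top, 'link'))  # link -> real
    kinds = classify(top)
    # os.path.isdir() follows the link, matching the default is_dir()
    same = os.path.isdir(os.path.join(top, 'link')) == kinds['link'][0]
```

Code ported from os.listdir() plus os.path.isdir() keeps its meaning with the default call; only callers that ask for follow_symlinks=False see the link itself.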


--
Akira



Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Akira Li
Nick Coghlan ncogh...@gmail.com writes:
...
 definition of floats and the definition of container invariants like
 assert x in [x])

 The current approach means that the lack of reflexivity of NaN's stays
 confined to floats and similar types - it doesn't leak out and infect
 the behaviour of the container types.

 What we've never figured out is a good place to *document* it. I
 thought there was an open bug for that, but I can't find it right now.

There was a related issue, "Tuple comparisons with NaNs are broken"
http://bugs.python.org/issue21873
but it was closed as not a bug even though the corresponding behavior is
*not documented* anywhere.
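
The behavior in question can be reproduced in a few lines. This is a sketch of what CPython does: containers test identity before equality.

```python
nan = float('nan')
other_nan = float('nan')            # a distinct NaN object

self_equal = (nan == nan)           # IEEE 754: NaN is not equal to itself
in_list = (nan in [nan])            # identity is checked before ==
tuples_equal = ((nan,) == (nan,))   # tuple comparison also checks identity first
other_in_list = (other_nan in [nan])  # different object, so == decides

print(self_equal, in_list, tuples_equal, other_in_list)
```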


--
Akira



Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Akira Li
Ben Hoyt benh...@gmail.com writes:
...
 ``scandir()`` yields a ``DirEntry`` object for each file and directory
 in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
 pseudo-directories are skipped, and the entries are yielded in
 system-dependent order. Each ``DirEntry`` object has the following
 attributes and methods:

 * ``name``: the entry's filename, relative to the ``path`` argument
   (corresponds to the return values of ``os.listdir``)

 * ``full_name``: the entry's full path name -- the equivalent of
   ``os.path.join(path, entry.name)``

I suggest renaming .full_name -> .path

.full_name might be misleading e.g., it implies that .full_name ==
abspath(.full_name) that might be false. The .path name has no such
associations.

The semantics of the .path attribute are defined by these assertions::

    for entry in os.scandir(topdir):
        #NOTE: assume os.path.normpath(topdir) is not called to create .path
        assert entry.path == os.path.join(topdir, entry.name)
        assert entry.name == os.path.basename(entry.path)
        assert entry.name == os.path.relpath(entry.path, start=topdir)
        assert os.path.dirname(entry.path) == topdir
        assert (entry.path != os.path.abspath(entry.path) or
                os.path.isabs(topdir)) # it is absolute only if topdir is
        assert (entry.path != os.path.realpath(entry.path) or
                topdir == os.path.realpath(topdir)) # symlinks are not resolved
        assert (entry.path != os.path.normcase(entry.path) or
                topdir == os.path.normcase(topdir)) # no case-folding,
                                                    # unlike PureWindowsPath


...
 * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
   requires a system call on Windows, and usually doesn't on POSIX
   systems

I suggest documenting the implicit follow_symlinks parameter for .is_X methods.

Note: lstat == partial(stat, follow_symlinks=False).

In particular, .is_dir() should probably use follow_symlinks=True by
default as suggested by Victor Stinner *if .is_dir() does it on Windows*

MSDN says: GetFileAttributes() does not follow symlinks.

os.path.isdir docs imply follow_symlinks=True: both islink() and
isdir() can be true for the same path.


...
 Like the other functions in the ``os`` module, ``scandir()`` accepts
 either a bytes or str object for the ``path`` parameter, and returns
 the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
 same type as ``path``. However, it is *strongly recommended* to use
 the str type, as this ensures cross-platform support for Unicode
 filenames.

Document when {e.name for e in os.scandir(path)} != set(os.listdir(path))
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

e.g., path can be an open file descriptor in os.listdir(path) since
Python 3.3 but the PEP doesn't mention it explicitly.

It has been discussed already e.g.,
https://mail.python.org/pipermail/python-dev/2014-July/135296.html

PEP 471 should explicitly reject support for specifying a file
descriptor so that code that uses os.scandir may assume that the
entry.path (.full_name) attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ).

Reject explicitly in PEP 471 the support for dir_fd parameter
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

aka the support for paths relative to directory descriptors.

Note: it is a *different* (but related) issue.


...
 Notes on exception handling
 ---------------------------

 ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
 rather than attributes or properties, to make it clear that they may
 not be cheap operations, and they may do a system call. As a result,
 these methods may raise ``OSError``.

 For example, ``DirEntry.lstat()`` will always make a system call on
 POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
 ``stat()`` system call on such systems if ``readdir()`` returns a
 ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
 certain conditions or on certain file systems.

 For this reason, when a user requires fine-grained error handling,
 it's good to catch ``OSError`` around these method calls and then
 handle as appropriate.


I suggest documenting that next(os.scandir()) may raise OSError

e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir

Also, document whether os.scandir() itself may raise OSError (whether
opendir or other OS functions may be called before the first yield).
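
A sketch of error handling that covers both places, written against the os.scandir() that later shipped in the standard library (Python 3.6+ for the context manager), where the call itself can raise; the wrapper name iter_entries and its onerror callback are made up for illustration:

```python
import os

def iter_entries(path, onerror=None):
    """Yield DirEntry objects, reporting OSError instead of raising."""
    try:
        entries = os.scandir(path)      # may raise, e.g. missing directory
    except OSError as err:
        if onerror is not None:
            onerror(err)
        return
    with entries:                       # release the directory handle promptly
        while True:
            try:
                entry = next(entries)   # errors while reading surface here
            except StopIteration:
                return
            except OSError as err:
                if onerror is not None:
                    onerror(err)
                return
            yield entry
```

Usage mirrors the onerror convention of os.walk(): pass a callable such as a list's append method to collect errors, or None to ignore them.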


...
os.scandir() should allow the explicit cleanup
++++++++++++++++++++++++++++++++++++++++++++++

::

    with closing(os.scandir()) as entries:
        for _ in entries:
            break

entries.close() is called that frees the resources if necessary, to
*avoid relying on garbage-collection for managing file 

Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Akira Li
Ben Hoyt benh...@gmail.com writes:

 Thanks, Victor.

 I don't have any experience with dir_fd handling, so unfortunately
 can't really comment here.

 What advantages does it bring? I notice that even os.listdir() on
 Python 3.4 doesn't have anything related to file descriptors, so I'd
 be in favour of not including support. We can always add it later.

 -Ben

FYI, os.listdir does support file descriptors in Python 3.3+; try:

  >>> import os
  >>> os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.

See also,
https://mail.python.org/pipermail/python-dev/2014-June/135265.html


--
Akira


P.S. Please, don't put your answer on top of the message you are
replying to.


 On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner victor.stin...@gmail.com 
 wrote:
 Hi,

 IMO we must decide if scandir() must support or not file descriptor.
 It's an important decision which has an important impact on the API.


 To support scandir(fd), the minimum is to store dir_fd in DirEntry:
 dir_fd would be None for scandir(str).


 scandir(fd) must not close the file descriptor, it should be done by
 the caller. Handling the lifetime of the file descriptor is a
 difficult problem, it's better to let the user decide how to handle
 it.

 There is the problem of the limit of open file descriptors, usually
 1024 but it can be lower. It *can* be an issue for very deep file
 hierarchy.

 If we choose to support scandir(fd), it's probably safer to not use
 scandir(fd) by default in os.walk() (use scandir(str) instead), wait
 until the feature is well tested, corner cases are well known, etc.


 The second step is to enhance pathlib.Path to support an optional file
 descriptor. Path already has methods on filenames like chmod(),
 exists(), rename(), etc.


 Example:

 fd = os.open(path, os.O_DIRECTORY)
 try:
     for entry in os.scandir(fd):
         # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
         path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
         # ... use path which uses dir_fd ...
 finally:
     os.close(fd)

 Problem: if the path object is stored somewhere and used after the
 loop, Path methods will fail because dir_fd was closed. It's even
 worse if a new directory uses the same file descriptor :-/ (security
 issue, or at least tricky bugs!)

 Victor



Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Akira Li
Chris Angelico ros...@gmail.com writes:

 On Sat, Jun 28, 2014 at 11:05 PM, Akira Li 4kir4...@gmail.com wrote:
 Have you considered adding support for paths relative to directory
 descriptors [1] via keyword only dir_fd=None parameter if it may lead to
 more efficient implementations on some platforms?

 [1]: https://docs.python.org/3.4/library/os.html#dir-fd

 Potentially more efficient and also potentially safer (see 'man
 openat')... but an enhancement that can wait, if necessary.


Introducing the feature later creates unnecessary incompatibilities.
It should either be supported from the start or be explicitly rejected in
PEP 471, with something like
`os.scandir(os.open(relative_path, dir_fd=fd))` recommended
instead (assuming `os.scandir in os.supports_fd` like `os.listdir()`).
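
A sketch of that recommendation, assuming a Python and platform where `os.scandir in os.supports_fd` holds (Python 3.7+ on many POSIX systems); the helper name scandir_at is made up:

```python
import os

def scandir_at(relative_path, dir_fd):
    """List entry names of relative_path, resolved against dir_fd."""
    fd = os.open(relative_path, os.O_RDONLY, dir_fd=dir_fd)
    try:
        with os.scandir(fd) as entries:
            return sorted(entry.name for entry in entries)
    finally:
        os.close(fd)  # the caller-owned descriptor is closed explicitly here
```

The dir_fd lookup happens once, in os.open(); scandir() then only needs to accept an open descriptor, which keeps the two features separate.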

At C level it could be implemented using fdopendir/openat or scandirat.

Here's the function description using Argument Clinic DSL:

/*[clinic input]

os.scandir

    path : path_t(allow_fd=True, nullable=True) = '.'

        *path* can be specified as either str or bytes. On some
        platforms, *path* may also be specified as an open file
        descriptor; the file descriptor must refer to a directory.  If
        this functionality is unavailable, using it raises
        NotImplementedError.

    *

    dir_fd : dir_fd = None

        If not None, it should be a file descriptor open to a
        directory, and *path* should be a relative string; path will
        then be relative to that directory.  if *dir_fd* is
        unavailable, using it raises NotImplementedError.

Yield a DirEntry object for each file and directory in *path*.

Just like os.listdir, the '.' and '..' pseudo-directories are skipped,
and the entries are yielded in system-dependent order.

{parameters}
It's an error to use *dir_fd* when specifying *path* as an open file
descriptor.

[clinic start generated code]*/


And corresponding tests (from test_posix:PosixTester), to show the
compatibility with os.listdir argument parsing in detail:

    def test_scandir_default(self):
        # When scandir is called without argument,
        # it's the same as scandir(os.curdir).
        self.assertIn(support.TESTFN, [e.name for e in posix.scandir()])

    def _test_scandir(self, curdir):
        filenames = sorted(e.name for e in posix.scandir(curdir))
        self.assertIn(support.TESTFN, filenames)
        #NOTE: assume listdir, scandir accept the same types on the platform
        self.assertEqual(sorted(posix.listdir(curdir)), filenames)

    def test_scandir(self):
        self._test_scandir(os.curdir)

    def test_scandir_none(self):
        # it's the same as scandir(os.curdir).
        self._test_scandir(None)

    def test_scandir_bytes(self):
        # When scandir is called with a bytes object,
        # the returned entries names are still of type str.
        # Call `os.fsencode(entry.name)` to get bytes
        self.assertIn('a', {'a'})
        self.assertNotIn(b'a', {'a'})
        self._test_scandir(b'.')

    @unittest.skipUnless(posix.scandir in os.supports_fd,
                         "test needs fd support for posix.scandir()")
    def test_scandir_fd_minus_one(self):
        # it's the same as scandir(os.curdir).
        self._test_scandir(-1)

    def test_scandir_float(self):
        # invalid args
        self.assertRaises(TypeError, posix.scandir, -1.0)

    @unittest.skipUnless(posix.scandir in os.supports_fd,
                         "test needs fd support for posix.scandir()")
    def test_scandir_fd(self):
        fd = posix.open(posix.getcwd(), posix.O_RDONLY)
        self.addCleanup(posix.close, fd)
        self._test_scandir(fd)
        self.assertEqual(
            sorted(posix.scandir('.')),
            sorted(posix.scandir(fd)))
        # call 2nd time to test rewind
        self.assertEqual(
            sorted(posix.scandir('.')),
            sorted(posix.scandir(fd)))

    @unittest.skipUnless(posix.scandir in os.supports_dir_fd,
                         "test needs dir_fd support for os.scandir()")
    def test_scandir_dir_fd(self):
        relpath = 'relative_path'
        with support.temp_dir() as parent:
            fullpath = os.path.join(parent, relpath)
            with support.temp_dir(path=fullpath):
                support.create_empty_file(os.path.join(parent, 'a'))
                support.create_empty_file(os.path.join(fullpath, 'b'))
                fd = posix.open(parent, posix.O_RDONLY)
                self.addCleanup(posix.close, fd)
                self.assertEqual(
                    sorted(posix.scandir(relpath, dir_fd=fd)),
                    sorted(posix.scandir(fullpath)))
                # check that fd is still useful
                self.assertEqual(
                    sorted(posix.scandir(relpath, dir_fd=fd)),
                    sorted(posix.scandir(fullpath)))


--
Akira


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-28 Thread Akira Li
Ben Hoyt benh...@gmail.com writes:

 Hi Python dev folks,

 I've written a PEP proposing a specific os.scandir() API for a
 directory iterator that returns the stat-like info from the OS, *the
 main advantage of which is to speed up os.walk() and similar
 operations between 4-20x, depending on your OS and file system.*
 ...
 http://legacy.python.org/dev/peps/pep-0471/
 ...
 Specifically, this PEP proposes adding a single function to the ``os``
 module in the standard library, ``scandir``, that takes a single,
 optional string as its argument::

 scandir(path='.') -> generator of DirEntry objects


Have you considered adding support for paths relative to directory
descriptors [1] via keyword only dir_fd=None parameter if it may lead to
more efficient implementations on some platforms?

[1]: https://docs.python.org/3.4/library/os.html#dir-fd


--
akira



Re: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character

2014-06-13 Thread Akira Li
Florian Bruhin m...@the-compiler.org writes:

 * Nikolaus Rath nikol...@rath.org [2014-06-12 19:11:07 -0700]:
 R. David Murray rdmur...@bitdance.com writes:
  Also notice that using a list with shell=True is using the API
  incorrectly.  It wouldn't even work on Linux, so that torpedoes
  the cross-platform concern already :)
 
  This kind of confusion is why I opened http://bugs.python.org/issue7839.
 
 Can someone describe a use case where shell=True actually makes sense
 at all?
 
 It seems to me that whenever you need a shell, the arguments that you
 pass to it will be shell specific. So instead of e.g.
 
 Popen('for i in `seq 42`; do echo $i; done', shell=True)
 
 you almost certainly want to do
 
 Popen(['/bin/sh', '-c', 'for i in `seq 42`; do echo $i; done'], shell=False)
 
 because if your shell happens to be tcsh or cmd.exe, things are going to
 break.

 My usecase is a spawn-command in a GUI application, which the user can
 use to spawn an executable. I want the user to be able to use the
 usual shell features from there. However, I also pass an argument to
 that command, and that should be escaped.

You should pass the command as a string and use cmd.exe quote rules [1]
(note: they are different from the one provided by
`subprocess.list2cmdline()` [2] that follows Microsoft C/C++ startup
code rules [3] e.g., `^` is not special unlike in cmd.exe case).

[1]: 
http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx

[2]: 
https://docs.python.org/3.4/library/subprocess.html#converting-an-argument-sequence-to-a-string-on-windows

[3]: http://msdn.microsoft.com/en-us/library/17w5ykft%28v=vs.85%29.aspx
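
A simplified sketch of the difference, runnable on any platform because it only builds strings. subprocess.list2cmdline() is an existing but undocumented helper; escape_for_cmd and CMD_METACHARS are made-up names that illustrate the caret-escaping idea from [1], not a complete implementation of the cmd.exe rules:

```python
import subprocess

# Characters that cmd.exe treats specially (per the approach in [1])
CMD_METACHARS = '()%!^"<>&|'

def escape_for_cmd(quoted_arg):
    """Prefix each cmd.exe metacharacter with ^ (a simplified sketch)."""
    return ''.join('^' + ch if ch in CMD_METACHARS else ch
                   for ch in quoted_arg)

arg = 'a^b & c'
msvcrt_quoted = subprocess.list2cmdline([arg])  # quoting for the C runtime only
cmd_quoted = escape_for_cmd(msvcrt_quoted)      # then protect it from cmd.exe

print(msvcrt_quoted)
print(cmd_quoted)
```

The first string leaves `^` and `&` untouched, which is fine for a program started directly but not for one started through cmd.exe.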


--
akira



Re: [Python-Dev] should tests be thread-safe?

2014-05-11 Thread Akira Li
Victor Stinner victor.stin...@gmail.com writes:

 If you need a well defined environment, run your test in a subprocess.
 Depending on the random function, your test may be run with more threads.
 On BSD, it changes for example which thread receives a signal. Importing
 the tkinter module creates a hidden C thread for the Tk loop.

Does it mean that non-thread-safe tests can't be run using a GUI test
runner that is implemented using tkinter?


--
akira
