Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Gregory P. Smith
On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan  wrote:
>
>  * -1 on including Windows specific globbing support in the API
> * -0 on including cross platform globbing support in the initial iteration
> of the API (that could be done later as a separate RFE instead)
>
Agreed.  Globbing or filtering support should not hold this up.  If that
part isn't settled, just don't include it and work out what it should be as
a future enhancement.

> * +1 on a new section in the PEP covering rejected design options (calling
> it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
>
+1.  IMNSHO, one of the most important parts of PEPs: capturing the entire
decision process to document the "why nots".

> * regarding "why not a 2-tuple", we know from experience that operating
> systems evolve and we end up wanting to add additional info to this kind of
> API. A dedicated DirEntry type lets us adjust the information returned over
> time, without breaking backwards compatibility and without resorting to
> ugly hacks like those in some of the time and stat APIs (or even our own
> codec info APIs)
> * it would be nice to see some relative performance numbers for NFS and
> CIFS network shares - the additional network round trips can make excessive
> stat calls absolutely brutal from a speed perspective when using a network
> drive (that's why the stat caching added to the import system in 3.3
> dramatically sped up the case of having network drives on sys.path, and why
> I thought AJ had a point when he was complaining about the fact we didn't
> expose the dirent data from os.listdir)
>
fwiw, I wouldn't wait for benchmark numbers.

A needless stat call when you've got the information from an earlier API
call is already brutal. It is easy to compute from existing ballpark figures:
remote file server / cloud access: ~100ms; local spinning disk seek+read:
~10ms; fetch of stat info cached in memory on a file server on the local
network: ~500us.  You can go down further to local system call overhead,
which can vary wildly but should likely be assumed to be at least 10us.

You don't need a benchmark to tell you that adding needless >= 500us-100ms
blocking operations to your program is bad. :)
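
As a rough illustration of that arithmetic, here is a minimal sketch that multiplies the ballpark latencies quoted above by a number of directory entries. The 10,000-entry workload and the helper name `wasted_seconds` are assumptions made up for the example, not figures from this thread.

```python
# Ballpark per-call latencies quoted in the message above, in seconds.
LATENCY = {
    "remote file server / cloud": 100e-3,
    "local spinning disk seek+read": 10e-3,
    "stat cached on LAN file server": 500e-6,
    "local system call overhead": 10e-6,
}


def wasted_seconds(entries, per_call):
    """Time spent on one redundant stat call per directory entry."""
    return entries * per_call


# Hypothetical workload: a directory tree with 10,000 entries.
for label, per_call in LATENCY.items():
    print(f"{label}: {wasted_seconds(10_000, per_call):.1f} s")
```

Even at the cheapest end of the scale the redundant calls add up, and in the remote case they come to roughly a quarter of an hour for this hypothetical tree.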

-gps
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-27 Thread Victor Stinner
2014-06-26 13:04 GMT+02:00 Antoine Pitrou :
> For the same reason, I agree with Victor that we should ditch the
> threading-disabled builds. It's too much of a hassle for no actual,
> practical benefit. People who want a threadless unicodeless Python can
> install Python 1.5.2 for all I care.

By the way, adding a buildbot for testing Python without thread
support is not enough. The buildbot has been broken for more
than a month and nobody noticed :-p

http://buildbot.python.org/all/builders/AMD64%20Fedora%20without%20threads%203.x/

Ok, I noticed, but I consider that I have already spent too much time on this
minor use case. I prefer to leave such tasks to someone else :-)

Victor


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Nick Coghlan
On 28 Jun 2014 01:27, "Jonas Wielicki"  wrote:
>
> On 27.06.2014 00:59, Ben Hoyt wrote:
> > Specifics of proposal
> > =====================
> > [snip] Each ``DirEntry`` object has the following
> > attributes and methods:
> > [snip]
> > Notes on caching
> > ----------------
> >
> > The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> > is obviously always cached, and the ``is_X`` and ``lstat`` methods
> > cache their values (immediately on Windows via ``FindNextFile``, and
> > on first use on Linux / OS X via a ``stat`` call) and never refetch
> > from the system.
>
> I find this behaviour a bit misleading: using methods and having them
> return cached results. How much (implementation and/or performance
> and/or memory) overhead would be incurred by using property-like access here?
> I think this would underline the static nature of the data.
>
> This would break the semantics with respect to pathlib, but they’re only
> marginally equal anyways -- and as far as I understand it, pathlib won’t
> cache, so I think this has a fair point here.

Indeed - using properties rather than methods may help emphasise the
deliberate *difference* from pathlib in this case (i.e. value when the
result was retrieved from the OS, rather than the value right now). The
main benefit is that switching from using the DirEntry object to a pathlib
Path will require touching all the places where the performance
characteristics switch from "memory access" to "system call". This benefit
is also the main downside, so I'd actually be OK with either decision on
this one.
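
The difference described above can be sketched with `os.scandir()` as it later shipped in Python 3.5. That is an assumption relative to this thread, where the API was still a draft: the released `DirEntry` exposes `stat()` rather than the draft's `lstat()`. The `DirEntry` keeps the value from when it was retrieved, while a pathlib `Path` asks the OS each time.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    with open(target, "wb") as f:
        f.write(b"12345")

    # Take the single DirEntry and read its size once; the result is cached.
    entry = list(os.scandir(tmp))[0]
    cached_size = entry.stat().st_size

    # Grow the file after the DirEntry has cached its stat result.
    with open(target, "ab") as f:
        f.write(b"67890")

    stale_size = entry.stat().st_size                 # cached, no refetch
    fresh_size = pathlib.Path(target).stat().st_size  # new system call

    print(cached_size, stale_size, fresh_size)
```

The second `entry.stat()` call still reports the size from before the write, which is exactly the "value when the result was retrieved" behaviour that properties might signal more clearly than methods.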

Other comments:

* +1 on the general idea
* +1 on scandir() over iterdir, since it *isn't* just an iterator version
of listdir
* -1 on including Windows specific globbing support in the API
* -0 on including cross platform globbing support in the initial iteration
of the API (that could be done later as a separate RFE instead)
* +1 on a new section in the PEP covering rejected design options (calling
it iterdir, returning a 2-tuple instead of a dedicated DirEntry type)
* regarding "why not a 2-tuple", we know from experience that operating
systems evolve and we end up wanting to add additional info to this kind of
API. A dedicated DirEntry type lets us adjust the information returned over
time, without breaking backwards compatibility and without resorting to
ugly hacks like those in some of the time and stat APIs (or even our own
codec info APIs)
* it would be nice to see some relative performance numbers for NFS and
CIFS network shares - the additional network round trips can make excessive
stat calls absolutely brutal from a speed perspective when using a network
drive (that's why the stat caching added to the import system in 3.3
dramatically sped up the case of having network drives on sys.path, and why
I thought AJ had a point when he was complaining about the fact we didn't
expose the dirent data from os.listdir)
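
The "why not a 2-tuple" point can be illustrated with a small sketch. Everything here is hypothetical (`scan_v1`, `scan_v2` and `Entry` are made-up names, not part of any proposed API); it only shows that callers who unpack a tuple break when a field is added, while attribute access on a dedicated type keeps working.

```python
# Hypothetical first version of an API that returns plain 2-tuples.
def scan_v1():
    return [("a.txt", 10)]


# Hypothetical later version that wants to expose one more field.
def scan_v2():
    return [("a.txt", 10, False)]


for name, size in scan_v1():  # existing caller, works
    pass

try:
    for name, size in scan_v2():  # same caller, now fails to unpack
        pass
    unpack_breaks = False
except ValueError:
    unpack_breaks = True


class Entry:
    """Hypothetical stand-in for a dedicated DirEntry-like type."""

    def __init__(self, name, size, is_symlink=False):
        self.name = name
        self.size = size
        self.is_symlink = is_symlink  # added later; old callers never notice


def total_size(entries):
    # Written against the original attributes only; still valid afterwards.
    return sum(e.size for e in entries)


print(unpack_breaks, total_size([Entry("a.txt", 10), Entry("b.txt", 5, True)]))
```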

Regards,
Nick.

>
> regards,
> jwi


Re: [Python-Dev] LZO bug

2014-06-27 Thread Raymond Hettinger

On Jun 27, 2014, at 9:56 AM, MRAB  wrote:

> Is this something that we need to worry about?
> 
> Raising Lazarus - The 20 Year Old Bug that Went to Mars
> http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html



Debunking the LZ4 "20 years old bug" myth
http://fastcompression.blogspot.com/2014/06/debunking-lz4-20-years-old-bug-myth.html


Raymond





[Python-Dev] LZO bug

2014-06-27 Thread MRAB

Is this something that we need to worry about?

Raising Lazarus - The 20 Year Old Bug that Went to Mars
http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html



Re: [Python-Dev] buildbot.python.org down?

2014-06-27 Thread Benjamin Peterson
On Fri, Jun 27, 2014, at 02:14, Ned Deily wrote:
> The buildbot web site seems to have been down for some hours and still 
> is as of 0915 UTC.  I'm not sure who is watching over it but I'll ping 
> the infrastructure team as well.

Fixed. The VM crashed, and Ernest rebooted it.


[Python-Dev] Summary of Python tracker Issues

2014-06-27 Thread Python tracker

ACTIVITY SUMMARY (2014-06-20 - 2014-06-27)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4643 (-12)
  closed 29004 (+72)
  total  33647 (+60)

Open issues with patches: 2162 


Issues opened (50)
==================

#6916: Remove deprecated items from asynchat
http://bugs.python.org/issue6916  reopened by ezio.melotti

#10312: intcatcher() can deadlock
http://bugs.python.org/issue10312  reopened by Claudiu.Popa

#21817: `concurrent.futures.ProcessPoolExecutor` swallows tracebacks
http://bugs.python.org/issue21817  opened by cool-RR

#21818: cookielib documentation references Cookie module, not cookieli
http://bugs.python.org/issue21818  opened by Ajtag

#21820: unittest: unhelpful truncating of long strings.
http://bugs.python.org/issue21820  opened by cjw296

#21821: The function cygwinccompiler.is_cygwingcc leads to FileNotFoun
http://bugs.python.org/issue21821  opened by paugier

#21822: KeyboardInterrupt during Thread.join hangs that Thread
http://bugs.python.org/issue21822  opened by tupl

#21825: Embedding-Python example code from documentation crashes
http://bugs.python.org/issue21825  opened by Pat.Le.Cat

#21826: Performance issue (+fix) AIX ctypes.util with no /sbin/ldconfi
http://bugs.python.org/issue21826  opened by tw.bert

#21827: textwrap.dedent() fails when largest common whitespace is a su
http://bugs.python.org/issue21827  opened by robertjli

#21830: ssl.wrap_socket fails on Windows 7 when specifying ca_certs
http://bugs.python.org/issue21830  opened by David.M.Noriega

#21833: Fix unicodeless build of Python
http://bugs.python.org/issue21833  opened by serhiy.storchaka

#21834: Fix a number of tests in unicodeless build
http://bugs.python.org/issue21834  opened by serhiy.storchaka

#21835: Fix Tkinter in unicodeless build
http://bugs.python.org/issue21835  opened by serhiy.storchaka

#21836: Fix sqlite3 in unicodeless build
http://bugs.python.org/issue21836  opened by serhiy.storchaka

#21837: Fix tarfile in unicodeless build
http://bugs.python.org/issue21837  opened by serhiy.storchaka

#21838: Fix ctypes in unicodeless build
http://bugs.python.org/issue21838  opened by serhiy.storchaka

#21839: Fix distutils in unicodeless build
http://bugs.python.org/issue21839  opened by serhiy.storchaka

#21840: Fix os.path in unicodeless build
http://bugs.python.org/issue21840  opened by serhiy.storchaka

#21841: Fix xml.sax in unicodeless build
http://bugs.python.org/issue21841  opened by serhiy.storchaka

#21842: Fix IDLE in unicodeless build
http://bugs.python.org/issue21842  opened by serhiy.storchaka

#21843: Fix doctest in unicodeless build
http://bugs.python.org/issue21843  opened by serhiy.storchaka

#21844: Fix HTMLParser in unicodeless build
http://bugs.python.org/issue21844  opened by serhiy.storchaka

#21845: Fix plistlib in unicodeless build
http://bugs.python.org/issue21845  opened by serhiy.storchaka

#21846: Fix zipfile in unicodeless build
http://bugs.python.org/issue21846  opened by serhiy.storchaka

#21847: Fix xmlrpc in unicodeless build
http://bugs.python.org/issue21847  opened by serhiy.storchaka

#21848: Fix logging  in unicodeless build
http://bugs.python.org/issue21848  opened by serhiy.storchaka

#21849: Fix multiprocessing for non-ascii data
http://bugs.python.org/issue21849  opened by serhiy.storchaka

#21850: Fix httplib and SimpleHTTPServer in unicodeless build
http://bugs.python.org/issue21850  opened by serhiy.storchaka

#21851: Fix gettext in unicodeless build
http://bugs.python.org/issue21851  opened by serhiy.storchaka

#21852: Fix optparse in unicodeless build
http://bugs.python.org/issue21852  opened by serhiy.storchaka

#21853: Fix inspect in unicodeless build
http://bugs.python.org/issue21853  opened by serhiy.storchaka

#21854: Fix cookielib in unicodeless build
http://bugs.python.org/issue21854  opened by serhiy.storchaka

#21855: Fix decimal in unicodeless build
http://bugs.python.org/issue21855  opened by serhiy.storchaka

#21856: memoryview: no overflow on large slice values (start, stop, st
http://bugs.python.org/issue21856  opened by haypo

#21857: assert that functions clearing the current exception are not c
http://bugs.python.org/issue21857  opened by haypo

#21859: Add Python implementation of FileIO
http://bugs.python.org/issue21859  opened by serhiy.storchaka

#21860: Correct FileIO docstrings
http://bugs.python.org/issue21860  opened by serhiy.storchaka

#21861: io class name are hardcoded in reprs
http://bugs.python.org/issue21861  opened by serhiy.storchaka

#21862: cProfile command-line should accept "-m module_name" as an alt
http://bugs.python.org/issue21862  opened by pitrou

#21863: Display module names of C functions in cProfile
http://bugs.python.org/issue21863  opened by pitrou

#21864: Error in documentation of point 9.8 'Exceptions are classes to
http://bugs.python.org/issue21864  opened by Peib

Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Jonas Wielicki
On 27.06.2014 00:59, Ben Hoyt wrote:
> Specifics of proposal
> =====================
> [snip] Each ``DirEntry`` object has the following
> attributes and methods:
> [snip]
> Notes on caching
> ----------------
> 
> The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> is obviously always cached, and the ``is_X`` and ``lstat`` methods
> cache their values (immediately on Windows via ``FindNextFile``, and
> on first use on Linux / OS X via a ``stat`` call) and never refetch
> from the system.

I find this behaviour a bit misleading: using methods and having them
return cached results. How much (implementation and/or performance
and/or memory) overhead would be incurred by using property-like access here?
I think this would underline the static nature of the data.

This would break the semantics with respect to pathlib, but they’re only
marginally equal anyways -- and as far as I understand it, pathlib won’t
cache, so I think this has a fair point here.

regards,
jwi


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Jonas Wielicki
On 27.06.2014 03:50, MRAB wrote:
> On 2014-06-27 02:37, Ben Hoyt wrote:
>> I don't mind iterdir() and would take it :-), but I'll just say why I
>> chose the name scandir() -- though it wasn't my suggestion originally:
>>
>> iterdir() sounds like just an iterator version of listdir(), kinda
>> like keys() and iterkeys() in Python 2. Whereas in actual fact the
>> return values are quite different (DirEntry objects vs strings), and
>> so the name change reflects that difference a little.
>>
> [snip]
> 
> The re module has 'findall', which returns a list of strings, and
> 'finditer', which returns an iterator that yields match objects, so
> there's a precedent. :-)

A bad precedent in my opinion though -- I was just recently bitten by
that, and I find it very atypical for Python.
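
For readers who have not run into it, the mismatch between the two `re` functions mentioned above looks like this. This is a small sketch with a made-up input string.

```python
import re

text = "a1 b22 c333"

# findall() returns a list of plain strings.
strings = re.findall(r"\d+", text)

# finditer() yields match objects, which carry positions as well as text.
matches = list(re.finditer(r"\d+", text))
spans = [m.span() for m in matches]

print(strings)
print([m.group() for m in matches], spans)
```

Code written for one of the two cannot simply switch to the other, which is the kind of surprise the scandir()/iterdir() naming discussion is about.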

regards,
Jonas


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Paul Sokolovsky
Hello,

On Fri, 27 Jun 2014 12:08:41 +1000
Steven D'Aprano  wrote:

> On Fri, Jun 27, 2014 at 03:07:46AM +0300, Paul Sokolovsky wrote:
> 
> > With my MicroPython hat on, os.scandir() would make things only
> > worse. With current interface, one can either have inefficient
> > implementation (like CPython chose) or efficient implementation
> > (like MicroPython chose) - all transparently. os.scandir()
> > supposedly opens up efficient implementation for everyone, but at
> > the price of bloating API and introducing heavy-weight objects to
> > wrap info. 
> 
> os.scandir is not part of the Python API, it is not a built-in
> function. It is part of the CPython standard library. 

Ok, so the standard library also has an API, and that's the API being
discussed.

> That means (in
> my opinion) that there is an expectation that other Pythons should
> provide it, but not an absolute requirement. Especially for the os
> module, which by definition is platform-specific. 

Yes, that's intuitive, but not strict and formal, so it is subject to
interpretation. As a developer working on an alternative Python
implementation, I'd like to have a better understanding of what needs to
be done to be a compliant implementation (in particular, because I need
to pass that info down to the users). So, I was told that
https://docs.python.org/3/reference/index.html describes Python, not
CPython. The next step is figuring out whether
https://docs.python.org/3/library/index.html describes Python or
CPython, and if the latter, how to separate the essence of Python's stdlib
from the extended library CPython provides?

> In my opinion that
> means you have four options:
> 
> 1. provide os.scandir, with exactly the same semantics as on CPython;
> 
> 2. provide os.scandir, but change its semantics to be more
> lightweight (e.g. return an ordinary tuple, as you already suggest);
> 
> 3. don't provide os.scandir at all; or
> 
> 4. do something different depending on whether the platform is Linux
>    or an embedded system.
> 
> I would consider any of those acceptable for a library feature, but
> not for a language feature.

Good, thanks. If that represents the shared opinion of (C)Python developers
(so, there won't be claims like "MicroPython is not Python because it
doesn't provide os.scandir()" (or hundreds of other missing stdlib
functions ;-) )) that's good enough already.

With that in mind, I would wish for any Python implementation to be as
complete and as efficient as possible, and one way to achieve that is
to not add stdlib entities without real need (be it more API calls or
more data types). So, I'm glad to know that os.scandir() passed through
Occam's Razor in this respect and is specified the way it is really for the
common good.
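
If `os.scandir()` is treated as an optional library feature, as option 3 in the quoted list allows, portable code can probe for it. The sketch below is one way to do that; `iter_names` is a hypothetical helper, and it assumes only that `os.listdir()` is available everywhere.

```python
import os


def iter_names(path="."):
    """Yield entry names, using os.scandir() only where it exists."""
    scandir = getattr(os, "scandir", None)
    if scandir is not None:
        for entry in scandir(path):
            yield entry.name
    else:
        # Implementations without os.scandir() fall back to listdir().
        for name in os.listdir(path):
            yield name
```

A caller sees the same names either way; only the richer `DirEntry` data is lost on the fallback path.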


[]

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Paul Sokolovsky
Hello,

On Thu, 26 Jun 2014 21:52:43 -0400
Ben Hoyt  wrote:

[]

> It's a fair point that os.walk() can be implemented efficiently
> without adding a new function and API. However, often you'll want more
> info, like the file size, which scandir() can give you via
> DirEntry.lstat(), which is free on Windows. So opening up this
> efficient API is beneficial.
> 
> In CPython, I think the DirEntry objects are as lightweight as
> stat_result objects.
> 
> I'm an embedded developer by background, so I know the constraints
> here, but I really don't think Python's development should be tailored
> to fit MicroPython. If os.scandir() is not very efficient on
> MicroPython, so be it -- 99% of all desktop/server users will gain
> from it.

Surely, tailoring Python to MicroPython's needs is not at all what
I suggest. It was an example of an alternative implementation which
optimized os.walk() without the need for any additional public module APIs.
Conversely, the high-level nature of an API call like os.walk() and
the underspecification of low-level details (like which function is
implemented in terms of which others) allow MicroPython to provide an
optimized implementation even with its resource constraints. So, the power
of high-level interfaces and underspecification should not be
underestimated ;-).

But I don't want to argue that os.scandir() is "not needed", because
that's hardly productive. Something I'd like to prototype in uPy and
ideally take further up to PEP status is adding iterator-based string
methods, and I can pretty much expect a "we lived without it" response,
so I don't want to go the same way regarding the addition of other
iterator-based APIs - it's clear that more iterator/generator based APIs
are a good direction for Python to evolve.

> > It would be better if os.scandir() was specified to return a struct
> > (named tuple) compatible with return value of os.stat() (with only
> > fields relevant to underlying readdir()-like system call). The
> > grounds for that are obvious: it's already existing data interface
> > in module "os", which is also based on open standard for operating
> > systems - POSIX, so if one is to expect something about file
> > attributes, it's what one can reasonably base expectations on.
> 
> Yes, we considered this early on (see the python-ideas and python-dev
> threads referenced in the PEP), but decided it wasn't a great API to
> overload stat_result further, and have most of the attributes None or
> not present on Linux.
> 
[]

> 
> However, for scandir() to be useful, you also need the name. My
> original version of this directory iterator returned two-tuples of
> (name, stat_result). But most people didn't like the API, and I don't
> really either. You could overload stat_result with a .name attribute
> in this case, but it still isn't a nice API to have most of the
> attributes None, and then you have to test for that, etc.

Yes, returning (name, stat_result) would be my first proposal too; I
don't see why someone wouldn't like a pair of 2 values, with each value
of obvious type and semantics within the "os" module. Regarding the stat
result, os.stat() provides full information about a file,
and intuitively, one may expect that os.scandir() would provide a subset
of that info, asymptotically reaching the volume of what os.stat() may
provide, depending on OS capabilities. So, if a truly OS-independent
interface is wanted to salvage more data from a dir scan, using the
os.stat struct as the data interface is hard to ignore.


But well, if it was rejected already, what can be said? Perhaps, at
least the PEP could be extended to explicitly mention other approaches
which were discussed and rejected, not just link to a discussion
archive (from experience with reading other PEPs, they oftentimes
contain such subsections, so I hope this suggestion is not ungrounded).

> 
> So basically we tweaked the API to do what was best, and ended up with
> it returning DirEntry objects with is_file() and similar methods.
> 
> Hope that helps give a bit more context. If you haven't read the
> relevant python-ideas and python-dev threads, those are interesting
> too.
> 
> -Ben



-- 
Best regards,
 Paul  mailto:pmis...@gmail.com


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Chris Barker - NOAA Federal
On Jun 26, 2014, at 4:38 PM, Tim Delaney 
wrote:

On 27 June 2014 09:28, MRAB  wrote:

>
> -1 for windows_wildcard (it would be an attractive nuisance to write
> windows-only code)


Could you emulate it on other platforms?

+1 on the rest of it.

-Chris


Re: [Python-Dev] Binary CPython distribution for Linux

2014-06-27 Thread Nick Coghlan
On 27 Jun 2014 17:33, "Bohuslav Kabrda"  wrote:
>
> It's not true that 2.7 wasn't released until a few weeks ago. It was
> released a few weeks ago as part of RHEL 7, but Red Hat has been shipping Red
> Hat Software Collections (RHSCL) 1.0, which contains Python 2.7 and Python
> 3.3, for almost a year now [1] - RHSCL is installable on RHEL 6; RHSCL 1.1
> (also with 2.7 and 3.3) was released a few weeks ago and is supported on
> RHEL 6 and 7. Also, these collections now have their community rebuilds at
> [2], so you can just download them without needing to talk to Red Hat at
> all. But yeah, these are all RPMs, so you have to be root to install them.

Indeed, while there are still some rough edges, software collections look
like the best approach to doing maintainable system installs of Python
runtimes other than the system Python into Fedora/RHEL/CentOS et al (and I
say that while wearing both my upstream and downstream hats).

Collections solve this problem in a general (rather than CPython specific)
way, since they can be used to get upgraded versions of language runtimes,
databases, web servers, etc, all without risking the stability of the OS
itself. I hope to see someone put together collections for PyPy and PyPy3
as well.

The approaches used for runtime isolation of software collections should
also be applicable to Debian systems, but (as far as I am aware) the
tooling to build them as debs rather than RPMs doesn't exist yet.

> Please don't take this as a criticism of your ideas, I see what you're
> trying to solve. I just think the way you're trying to solve it is
> unachievable or would consume so much community resources, that it would
> end up unmaintained and buggy most of the time.

For prebuilt userland installs on Linux, I think "miniconda" is the current
best available approach. It has its challenges (especially around its
handling of security concerns), but it's designed to offer a full cross
platform package management system that makes it well suited to the task of
managing prebuilt language runtimes in user space.

Cheers,
Nick.

>
> --
> Regards,
> Bohuslav "Slavek" Kabrda.
>
> [1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/
> [2] https://www.softwarecollections.org/en/scls/


[Python-Dev] buildbot.python.org down?

2014-06-27 Thread Ned Deily
The buildbot web site seems to have been down for some hours and still 
is as of 0915 UTC.  I'm not sure who is watching over it but I'll ping 
the infrastructure team as well.

-- 
 Ned Deily,
 n...@acm.org



Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-27 Thread Victor Stinner
Hi,

You wrote a great PEP Ben, thanks :-) But it's now time for comments!

> But the underlying system calls -- ``FindFirstFile`` /
> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --

What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir?

You should add a link to FindFirstFile doc:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418%28v=vs.85%29.aspx

It looks like the WIN32_FIND_DATA has a dwFileAttributes field. So we
should mimic stat_result recent addition: the new
stat_result.file_attributes field. Add DirEntry.file_attributes which
would only be available on Windows.

The Windows structure also contains

  FILETIME ftCreationTime;
  FILETIME ftLastAccessTime;
  FILETIME ftLastWriteTime;
  DWORD    nFileSizeHigh;
  DWORD    nFileSizeLow;

It would be nice to expose them as well. I'm no longer surprised that
the exact API is different depending on the OS for functions of the os
module.
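
To make the fields listed above concrete, here is a minimal sketch of how such values are usually turned into Python-friendly numbers. It assumes the standard Windows conventions (a FILETIME counts 100-nanosecond ticks since 1601-01-01 and is split into high and low 32-bit halves, as the file size is); the helper names are hypothetical and nothing here is taken from the scandir implementation.

```python
# Seconds between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
EPOCH_DIFF_SECONDS = 11644473600


def file_size(n_file_size_high, n_file_size_low):
    """Combine the two 32-bit halves into one 64-bit size."""
    return (n_file_size_high << 32) | n_file_size_low


def filetime_to_unix(dw_high_date_time, dw_low_date_time):
    """Convert a FILETIME (two 32-bit halves) to Unix seconds."""
    ticks = (dw_high_date_time << 32) | dw_low_date_time  # 100 ns units
    return ticks / 10_000_000 - EPOCH_DIFF_SECONDS
```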

> * Instead of bare filename strings, it returns lightweight
>   ``DirEntry`` objects that hold the filename string and provide
>   simple methods that allow access to the stat-like data the operating
>   system returned.

Does your implementation use a free list to avoid the cost of memory
allocation? A short free list of 10 or maybe just 1 may help. The free
list may be stored directly in the generator object.

> ``scandir()`` yields a ``DirEntry`` object for each file and directory
> in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
> pseudo-directories are skipped, and the entries are yielded in
> system-dependent order. Each ``DirEntry`` object has the following
> attributes and methods:

Does it also support bytes filenames on UNIX?

Python now supports undecodable filenames thanks to the PEP 383
(surrogateescape). I prefer to use the same type for filenames on
Linux and Windows, so Unicode is better. But some users might prefer
bytes for other reasons.
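
For what it is worth, `os.scandir()` as it eventually shipped in Python 3.5 follows the `os.listdir()` convention here. That is an assumption relative to this thread, since the draft PEP does not say. The sketch below shows that a `str` path yields `str` names and a `bytes` path yields `bytes` names.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # str in, str out (undecodable names use surrogateescape, per PEP 383).
    str_names = [e.name for e in os.scandir(tmp)]

    # bytes in, bytes out.
    bytes_names = [e.name for e in os.scandir(os.fsencode(tmp))]

    print(str_names, bytes_names)
```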

> The ``DirEntry`` attribute and method names were chosen to be the same
> as those in the new ``pathlib`` module for consistency.

Great! That's exactly what I expected :-) Consistency with other modules.

> Notes on caching
> ----------------
>
> The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
> is obviously always cached, and the ``is_X`` and ``lstat`` methods
> cache their values (immediately on Windows via ``FindNextFile``, and
> on first use on Linux / OS X via a ``stat`` call) and never refetch
> from the system.
>
> For this reason, ``DirEntry`` objects are intended to be used and
> thrown away after iteration, not stored in long-lived data structures
> and the methods called again and again.
>
> If a user wants to do that (for example, for watching a file's size
> change), they'll need to call the regular ``os.lstat()`` or
> ``os.path.getsize()`` functions which force a new system call each
> time.

Crazy idea: would it be possible to "convert" a DirEntry object to a
pathlib.Path object without losing the cache? I guess that
pathlib.Path expects a full  stat_result object.

> Or, for getting the total size of files in a directory tree -- showing
> use of the ``DirEntry.lstat()`` method::
>
>     def get_tree_size(path):
>         """Return total size of files in path and subdirs."""
>         size = 0
>         for entry in scandir(path):
>             if entry.is_dir():
>                 sub_path = os.path.join(path, entry.name)
>                 size += get_tree_size(sub_path)
>             else:
>                 size += entry.lstat().st_size
>         return size
>
> Note that ``get_tree_size()`` will get a huge speed boost on Windows,
> because no extra stat calls are needed, but on Linux and OS X the size
> information is not returned by the directory iteration functions, so
> this function won't gain anything there.

I don't understand how you can build a full lstat() result without
really calling stat. I see that WIN32_FIND_DATA contains the size, but
here you call lstat(). If you know that it's not a symlink, you
already know the size, but you still have to call stat() to retrieve
all fields required to build a stat_result, no?
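
For reference, the quoted example can be run against `os.scandir()` as it later shipped in Python 3.5. This sketch assumes the released API, which differs from the draft under discussion: `DirEntry.stat(follow_symlinks=False)` takes the place of `lstat()`, and `DirEntry.path` replaces the manual `os.path.join()`.

```python
import os


def get_tree_size(path):
    """Return total size of files in path and subdirs."""
    size = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            size += get_tree_size(entry.path)
        else:
            # Served from the directory listing on Windows; may cost one
            # stat system call per entry on POSIX systems.
            size += entry.stat(follow_symlinks=False).st_size
    return size
```

As far as the released documentation describes it, the `stat_result` built on Windows is only partially populated (a few fields such as `st_ino` are left as zero), which is one possible answer to the question above.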

> Support
> =======
>
> The scandir module on GitHub has been forked and used quite a bit (see
> "Use in the wild" in this PEP),

Do you plan to continue to maintain your module for Python < 3.5, but
upgrade your module for the final PEP?

> Should scandir be in its own module?
> ------------------------------------
>
> Should the function be included in the standard library in a new
> module, ``scandir.scandir()``, or just as ``os.scandir()`` as
> discussed? The preference of this PEP's author (Ben Hoyt) would be
> ``os.scandir()``, as it's just a single function.

Yes, put it in the os module which is already bloated :-)

> Should there be a way to access the full path?
> ----------------------------------------------
>
> Should ``DirEntry``'s have a way to get the full path without using
> ``os.path.join(path, entry.name)``? This is

Re: [Python-Dev] Binary CPython distribution for Linux

2014-06-27 Thread Bohuslav Kabrda
- Original Message -
> While much of the opposition to dropping Python <2.7 stems from the RHEL
> community (they still have 2.4 in extended support and 2.7 wasn't in a
> release until a few weeks ago), a common objection from the users is "I
> can't install a different Python" or "it's too difficult to install a
> different Python." The former is a legit complaint - if you are on
> shared hosting and don't have root, as easy as it is to add an alternate
> package repository that provides 2.7 (or newer), you don't have the
> permissions so you can't do it.

It's not true that 2.7 wasn't released until a few weeks ago. It was released a
few weeks ago as part of RHEL 7, but Red Hat has been shipping Red Hat Software
Collections (RHSCL) 1.0, which contains Python 2.7 and Python 3.3, for almost a
year now [1]. RHSCL is installable on RHEL 6; RHSCL 1.1 (also with 2.7 and
3.3) was released a few weeks ago and is supported on RHEL 6 and 7. Also,
these collections now have community rebuilds at [2], so you can just
download them without needing to talk to Red Hat at all. But yeah, these are
all RPMs, so you have to be root to install them.

> I'd like to propose a solution to this problem: a pre-built distribution
> of CPython for Linux available via www.python.org in the list of
> downloads for a particular release [5]. This distribution could be
> downloaded and unarchived into the user's home directory and users could
> start running it immediately by setting an environment variable or two,
> creating a symlink, or even running a basic installer script. This would
> hopefully remove the hurdles of obtaining a (sane) Python distribution
> on Linux. This would allow projects to more easily drop end-of-life
> Python versions and would speed adoption of modern Python, including
> Python 3 (because porting is much easier if you only have to target 2.7).
> 
> I understand there may be technical challenges with doing this for some
> distributions and with producing a universal binary distribution. I
> would settle for a binary distribution that was targeted towards RHEL
> users and variant distros, as that is the user population that I
> perceive to be the most conservative and responsible for holding modern
> Python adoption back.

Speaking with my Fedora/RHEL/RHSCL Python maintainer's hat on, prebuilding
Python is not as easy a task as it may seem :) Someone has to write the build
scripts (e.g. a sort of specfile, but rpm/specfile wouldn't really work for you,
since you want to install in users' home dirs). Someone has to update them when
a new Python comes out, so in the worst case you end up with slightly different
build scripts for different versions of Python. Someone has to do rebuilds when
there is a CVE. Or a bug. Or a user requests a feature that makes sense. Someone
has to do that for *each packaged version* - and each packaged version needs to
be maintained for some amount of time so that it all actually makes sense.

Maintaining a prebuilt distribution of Python is a time-consuming task even if
you do it just for one Linux distro. If you want to maintain a *universal*
prebuilt Python distribution, then you'll find out that it's either a) undoable
or b) so resource-hungry and so fragile that it's probably not worth
it. You could just bundle all Python dependencies into your distribution to
make it "easier", but that would just make the result grow in size (perhaps
significantly) and you would then also need to update/bugfix/securityfix the
bundled dependencies (which would consume even more time).

Please don't take this as a criticism of your ideas, I see what you're trying
to solve. I just think the way you're trying to solve it is unachievable or
would consume so many community resources that it would end up unmaintained
and buggy most of the time.

-- 
Regards,
Bohuslav "Slavek" Kabrda.

[1] http://developerblog.redhat.com/2013/09/12/rhscl1-ga/
[2] https://www.softwarecollections.org/en/scls/