[issue45239] email.utils.parsedate_tz raises UnboundLocalError if time has more than 2 dots in it

2021-09-18 Thread Ben Hoyt


Change by Ben Hoyt :


--
keywords: +patch
pull_requests: +26853
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/28452

___
Python tracker 
<https://bugs.python.org/issue45239>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45239] email.utils.parsedate_tz raises UnboundLocalError if time has more than 2 dots in it

2021-09-18 Thread Ben Hoyt


Ben Hoyt  added the comment:

For reference, here's a repro case:

$ python3.10 -c 'import email.utils; \
email.utils.parsedate_tz("Wed, 3 Apr 2002 12.34.56.78+0800")'
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.10/email/_parseaddr.py", line 50, in parsedate_tz
res = _parsedate_tz(data)
  File "/usr/local/lib/python3.10/email/_parseaddr.py", line 134, in 
_parsedate_tz
thh = int(thh)
UnboundLocalError: local variable 'thh' referenced before assignment

--

___
Python tracker 
<https://bugs.python.org/issue45239>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45239] email.utils.parsedate_tz raises UnboundLocalError if time has more than 2 dots in it

2021-09-18 Thread Ben Hoyt


New submission from Ben Hoyt :

In going through some standard library code, I found that the 
email.utils.parsedate_tz() function 
(https://docs.python.org/3/library/email.utils.html#email.utils.parsedate_tz) 
has a bug if the time value is in dotted format and has more than 2 dots in it, 
for example: "12.34.56.78". Per the docs, it should return None in the case of 
invalid input.

This is happening because in the case that handles the '.' separator (instead 
of the normal ':'), there's no else clause to return None when there's not 2 or 
3 segments.

>From 
>https://github.com/python/cpython/blob/dea59cf88adf5d20812edda330e085a4695baba4/Lib/email/_parseaddr.py#L118-L132:

if len(tm) == 2:
[thh, tmm] = tm
tss = '0'
elif len(tm) == 3:
[thh, tmm, tss] = tm
elif len(tm) == 1 and '.' in tm[0]:
# Some non-compliant MUAs use '.' to separate time elements.
tm = tm[0].split('.')
if len(tm) == 2:
[thh, tmm] = tm
tss = 0
elif len(tm) == 3:
[thh, tmm, tss] = tm
# HERE: need "else: return None"
else:
return None

We simply need to include that additional "else: return None" block in the '.' 
handling case.

I'll submit a pull request that fixes this soon (and adds a test for this case).

--
components: Library (Lib)
messages: 402140
nosy: barry, benhoyt
priority: normal
severity: normal
status: open
title: email.utils.parsedate_tz raises UnboundLocalError if time has more than 
2 dots in it
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue45239>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34712] Style fixes in examples of "Input and Output" tutorial section

2018-09-17 Thread Ben Hoyt


Ben Hoyt  added the comment:

GitHub PR link with changes and commit notes attached.

--

___
Python tracker 
<https://bugs.python.org/issue34712>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34712] Style fixes in examples of "Input and Output" tutorial section

2018-09-17 Thread Ben Hoyt


Change by Ben Hoyt :


--
keywords: +patch
pull_requests: +8784
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue34712>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34712] Style fixes in examples of "Input and Output" tutorial section

2018-09-17 Thread Ben Hoyt


New submission from Ben Hoyt :

There are a couple of examples in the "Input and Output" section of the 
tutorial that use an unusual / non-PEP 8 Python style or whitespace. I think 
our examples, especially in the tutorial, should reflect good, PEP 8 style. 
I'll fix these in a PR which I'll attach soon.

--
assignee: docs@python
components: Documentation
messages: 325550
nosy: akuchling, benhoyt, docs@python, rhettinger
priority: normal
severity: normal
status: open
title: Style fixes in examples of "Input and Output" tutorial section
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue34712>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31803] Remove not portable time.clock(), replaced by time.perf_counter() and time.process_time()

2017-10-17 Thread Ben Hoyt

Ben Hoyt <benh...@gmail.com> added the comment:

I don't think this is a good idea. I still use time.clock() out of habit (on 
Windows), and from the PR it's clear that the standard library and tests do 
too. I wouldn't be at all surprised to find others do as well.

I realize it's been deprecated since 3.3, but without a DeprecatingWarning, 
that's easy to miss. What about adding a DeprecatingWarning for 3.7 and 
deprecate it later, maybe in 3.9? (I know that's a long time, but time.clock() 
is an old function so we should be cautious!)

--
nosy: +benhoyt

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31803>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30659] Use ** correctly for "kwargs" in signature of namedtuple._replace()

2017-06-13 Thread Ben Hoyt

New submission from Ben Hoyt:

The documentation for namedtuple._replace() needs a tweak: currently its 
signature shows "kwargs" without the ** notation, indicating it's a single, 
positional argument. It should have the ** to indicate it's actually any number 
of keyword arguments.

See 
https://docs.python.org/3/library/collections.html#collections.somenamedtuple._replace
 and compare with the docs for the signature of Logger.debug() for example: 
https://docs.python.org/3/library/logging.html#logging.Logger.debug

--
assignee: docs@python
components: Documentation
messages: 295949
nosy: benhoyt, docs@python, rhettinger
priority: normal
pull_requests: 2225
severity: normal
status: open
title: Use ** correctly for "kwargs" in signature of namedtuple._replace()
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30659>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29540] Add compact=True flag to json.dump/dumps

2017-03-14 Thread Ben Hoyt

Ben Hoyt added the comment:

Agreed. Seems much the same as other argument constants,
like pickle.HIGHEST_PROTOCOL for the "protocol" argument. These are not
changing the API, just adding a helper constant to avoid the magic values.

-Ben

On Tue, Mar 14, 2017 at 12:24 PM, Brett Cannon <rep...@bugs.python.org>
wrote:

>
> Brett Cannon added the comment:
>
> I agree with David that I don't see how adding a constant to the module is
> really a complication of an API.
>
> --
>
> ___
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue29540>
> ___
>

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29540>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28609] argparse claims '*' positional argument is required in error output

2017-02-27 Thread Ben Hoyt

Ben Hoyt added the comment:

I definitely agree with Reuben here -- I just ran into this issue while 
creating a "production quality" tool, and the help message produced by argparse 
with nargs='*' confused me too. It's pretty clear that this is a simple bug in 
argparse's production of that usage message: it says ARGUMENT is required, but 
it isn't.

However, the workaround of specifying an (unused) default is a reasonable 
workaround in the meantime -- thanks Paul.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28609>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28609] argparse claims '*' positional argument is required in error output

2017-02-27 Thread Ben Hoyt

Changes by Ben Hoyt <benh...@gmail.com>:


--
nosy: +benhoyt

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28609>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29540] Add compact=True flag to json.dump/dumps

2017-02-17 Thread Ben Hoyt

Ben Hoyt added the comment:

I agree with the confusion (PR proposes separators=COMPACT, issue compact=True).

I like the concept but not either of the APIs proposed. I *much* more often 
want to get pretty output -- if I had a dime for every time I've written 
"json.dumps(obj, sort_keys=True, indent=4)" I'd be rich enough to buy an entire 
cup of Starbucks coffee. But then you'd need a pretty=True option as well, 
which would be mutually exclusive with compact=True, so not great.

But what about a style= (or "format="?) parameter, which defaults to 'default' 
(or just None) meaning the same as now. If you pass format='pretty' you get 
"sort_keys=True, indent=4" and if you pass format='compact' you get 
"separators=(',', ':')".

--
nosy: +benhoyt

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29540>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29549] Improve docstring for str.index

2017-02-17 Thread Ben Hoyt

Ben Hoyt added the comment:

Good call. Additionally, it's weird to me that the docstring for str.find() 
says "Return -1 on failure." because the string not being found is not a 
"failure". Like the docs 
(https://docs.python.org/3.5/library/stdtypes.html#str.find) say, the 
str.find() docstring should say "Return -1 if sub is not found."

--
nosy: +benhoyt

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29549>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27719] Misleading note about "args" attribute in "User-defined Exceptions" section of tutorial

2016-08-11 Thread Ben Hoyt

Ben Hoyt added the comment:

Removing that whole example sounds good to me, thanks.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27719>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27719] Misleading note about "args" attribute in "User-defined Exceptions" section of tutorial

2016-08-10 Thread Ben Hoyt

Ben Hoyt added the comment:

Okay, patch attached -- I think it's best to simply remove that sentence.

Side note: this is why GitHub is so good -- click edit, remove the sentence, 
click create pull request (approx time 30 seconds). I've got a new machine 
since I last did Python development, so to create an actual patch I needed to: 
download Mercurial, clone the repo, install Sphinx, edit the file, build the 
docs, create the patch (approx time 30 minutes). On the positive side, each of 
those steps was painless and trouble-free. :-)

--
keywords: +patch
Added file: http://bugs.python.org/file44071/issue27719-1.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27719>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27719] Misleading note about "args" attribute in "User-defined Exceptions" section of tutorial

2016-08-09 Thread Ben Hoyt

Ben Hoyt added the comment:

Note that I added the committers from issue 1692335 to the Nosy List -- 
probably overzealous and probably not the folks who maintain the docs, but oh 
well. :-)

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27719>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27719] Misleading note about "args" attribute in "User-defined Exceptions" section of tutorial

2016-08-09 Thread Ben Hoyt

New submission from Ben Hoyt:

In the official tutorial in the "User-defined Exceptions" section 
(https://docs.python.org/3.5/tutorial/errors.html#user-defined-exceptions) 
there's a note about a user-defined Exception subclass as follows:

"In this example, the default __init__() of Exception has been overridden. The 
new behavior simply creates the value attribute. This replaces the default 
behavior of creating the args attribute."

That last sentence is wrong: it used to be that way (it is in Python 2.x and I 
believe in Python pre 3.2), but now the implementation of BaseException.__new__ 
now sets args, so even when you override __init__ and don't call super() args 
is set. I think that's what you want, so I'm putting this down to a 
documentation bug. I think the sentence "This replaces the default behavior of 
creating the args attribute." should simply be removed (BaseException.__init__ 
basically does nothing now).

This change happened for Python 3.3 and was backported to Python 3.2. See also:

* The relevant part of BaseException.__new__ in the CPython codebase: 
https://github.com/python/cpython/blob/601ee5fab5df81a25611da0667030de531c1cda9/Objects/exceptions.c#L44-L48
* The issue where this behaviour was changed: 
http://bugs.python.org/issue1692335
* The commit where it was changed: 
https://hg.python.org/cpython/rev/68e2690a471d (on GitHub at 
https://github.com/python/cpython/commit/a71a87e695b05a67bd22c6ac74311b1f56f3932e)
* You can repro this and confirm that "args" is definitely set with the 
attached script:

$ python exception_args_test.py 
sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
Error args: ('error!',)

--
assignee: docs@python
components: Documentation
files: exception_args_test.py
messages: 272256
nosy: benhoyt, docs@python, georg.brandl, sbt
priority: normal
severity: normal
status: open
title: Misleading note about "args" attribute in "User-defined Exceptions" 
section of tutorial
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file44058/exception_args_test.py

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27719>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26248] Improve scandir DirEntry docs, especially re symlinks and caching

2016-02-05 Thread Ben Hoyt

Ben Hoyt added the comment:

Seeing this has been merged (thanks Victor), can this issue be closed? Guido, 
are you happy with the changes given your comments at 
http://bugs.python.org/issue26032#msg257665?

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26248>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26248] Improve scandir DirEntry docs, especially re symlinks and caching

2016-02-01 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks, Victor!

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26248>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26032] Use scandir() to speed up pathlib globbing

2016-01-31 Thread Ben Hoyt

Ben Hoyt added the comment:

Guido, I've made some tweaks and improvements to the DirEntry docs here: 
http://bugs.python.org/issue26248 -- the idea is to fix the issues you 
mentioned to clarify when system calls are required with symlinks, mentioning 
that the results are cached separately for follow_symlinks True and False, etc.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26032>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26248] Improve scandir DirEntry docs, especially re symlinks and caching

2016-01-31 Thread Ben Hoyt

New submission from Ben Hoyt:

Per Guido's comment about DirEntry documentation on Issue 26032, especially 
http://bugs.python.org/issue26032#msg257665, it'd be good to improve the docs 
of the scandir DirEntry methods with regard to symlinks and caching.

Attaching my stab at a documentation fix. Changes here are:

1) Clarify that the return values of is_dir()/is_file()/etc are cached 
separately for follow_symlinks True and False.
2) Be more specific about when the functions require a system call, and how it 
relates to caching and follow_symlinks.
3) DRY up common stuff between is_dir and is_file by saying "Caching, system 
calls made, and exceptions raised are as per is_dir" in is_file.
4) Tweak to the first paragraph of docs for is_dir/is_file to simplify: assume 
the follow_symlinks=True default, then note the follow_symlinks=False 
non-default case after.

I think they're all improvements, though I'm not sure about #3. Is it better to 
just repeat those couple of paragraphs verbatim?

I'm also not 100% sure about mentioning the DT_UNKNOWN thing. But you really 
can't get more specific about when system calls are required on Unix without 
mentioning it.

--
assignee: docs@python
components: Documentation
files: direntry_doc_improvements.patch
keywords: patch
messages: 259282
nosy: benhoyt, docs@python, gvanrossum, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Improve scandir DirEntry docs, especially re symlinks and caching
versions: Python 3.5, Python 3.6
Added file: http://bugs.python.org/file41765/direntry_doc_improvements.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26248>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26111] On Windows, os.scandir will keep a handle on the directory until the iterator is exhausted

2016-01-14 Thread Ben Hoyt

Changes by Ben Hoyt <benh...@gmail.com>:


--
nosy: +benhoyt

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26111>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25596] Use scandir() to speed up the glob module

2016-01-11 Thread Ben Hoyt

Ben Hoyt added the comment:

Nice! Good to see scandir() getting used to speed up other functions, functions 
that folks are using a ton already, like glob().

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25596>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26032] Use scandir() to speed up pathlib globbing

2016-01-06 Thread Ben Hoyt

Ben Hoyt added the comment:

Guido, it's true that in almost all cases you get the speedup (no system call), 
and it's very much worth using. But the idea with the docs being non-committal 
is because being specific would make the docs fairly complex. I believe it's as 
follows for is_file/is_dir/is_symlink:

* no system call required on Windows or Unix if the entry is not a symlink
* unless you're on Unix with some different file system (maybe a network FS?) 
where d_type is DT_UNKNOWN
* some other edge case which I've probably forgotten :-)

Do you think the docs should try to make this more specific?

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26032>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25994] File descriptor leaks in os.scandir()

2016-01-02 Thread Ben Hoyt

Ben Hoyt added the comment:

I'm not sure this is actually a leak, because (looking at the code) 
ScandirIterator_close() is called not just in the destructor, but also at the 
end of iterating, just before raising StopIteration. Is the issue that if an 
exception is raised or you stop iterating before the end, then it's less 
determined when the destructor/close is called?

I think we could fairly easily add an explicit close method to the iterator and 
make it a context manager (add __enter__ and __exit__ methods to the iterator). 
Is this what you're thinking of:

# manual close
it = scandir.scandir(path)
first_entry = next(it)
it.close()

with scandir.scandir(path) as it:
first_entry = next(it)

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25994>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25911] Regression: os.walk now using os.scandir() breaks bytes filenames on windows

2015-12-21 Thread Ben Hoyt

Ben Hoyt added the comment:

Just to clarify what Victor and Zachary mentioned: bytes filenames have been 
deprecated only *on Windows* since Python 3.3; bytes filenames are still fine 
on POSIX.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25911>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24013] Improve os.scandir() and DirEntry documentation

2015-04-19 Thread Ben Hoyt

New submission from Ben Hoyt:

Victor Stinner's documentation for os.scandir and DirEntry is a great start 
(https://docs.python.org/dev/library/os.html#os.scandir), however there are a 
few mistakes in it, and a few ways I think it could be improved. Attaching a 
patch with the following overall changes:

1. Tweak the see also note on os.listdir() to mention *why* you'd want to use 
scandir -- namely that it includes file attribute info, and performance.
2. Move that str/bytes note in the scandir() docs down a bit, as I think that's 
really a detail and the other stuff is more important.
3. More details on why you'd want to use scandir -- to increase the 
performance of code that also needs file type or file -- and be more specific 
about what system calls are and are not required.
4. Replace POSIX with Unix, per the rest of the os module docs when talking 
about generic Unix-like operating systems. It seems POSIX is used 
specifically when talking about the POSIX standard.
5. Remove half-true statement in DirEntry docs, if a file is deleted between 
calling scandir and stat, a FileNotFoundError can be raised -- but the method 
docs state that they specifically don't raise FileNotFoundError.
6. Removed unfortunately, the behaviour of errors depends on the platform. 
This is always the case and doesn't add anything.
7. Tried to simplify and clarify the is_dir() and is_file() docs. Not sure I've 
succeeded here. This is hard!
8. Added Availability: Unix, Windows. to scandir docs like listdir and most 
other os functions have.
9. Remove the see also about how the listdir function returns the names of the 
directory entries as that's already stated/implied above.

--
assignee: docs@python
components: Documentation
files: scandir_doc_tweaks.patch
keywords: patch
messages: 241560
nosy: benhoyt, docs@python, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Improve os.scandir() and DirEntry documentation
versions: Python 3.5
Added file: http://bugs.python.org/file39131/scandir_doc_tweaks.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24013
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Yep, I'm good -- go ahead and close!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks for the explanation (and the comment fix).

 What's your point about complexity? Would you like to drop os.scandir
 changes in os.walk(), or do you have a simpler version to propose?

No, not at all! I was just noting it and trying to brainstorm any ways to 
simplify it (while keeping the current behavior). But I'm not sure there is a 
good simplification, so that's fine.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Victor, great work on pushing this out, especially with the modifying the 
directories fix. (And thanks Serhiy for pushing on correctness here.)

Couple of comments/questions about your new os.walk() implementation.

1) The new implementation is more complex. Of course, most of this is necessary 
due to the topdown directory issue. However, one thing I'm not sure about is 
why you create scandir_it manually and use a while True loop, adding complexity 
and making it require two versions of the error handling. Why not a simple for 
entry in scandir(top): ... with a try/except OSError around the whole loop? I 
could well be missing something here though.

2) In this commit http://bugs.python.org/review/23605/diff/14181/Lib/os.py -- 
which is not the final one, I don't quite understand the catch_oserror thing. 
Presumably this turned out to be unnecessary, as it's not in the final version?

3) Really minor thing: in one of the comments, you misspell symbolik. Should 
be symbolic.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Ben Hoyt

Ben Hoyt added the comment:

 I don't understand your benchmark. Do you mean that os.walk() is slower
 with fast_bottom-up.patch because islink() is called or because I replaced
 for entry in scandir(top): with entry = next(scandir_it)?

No, sorry, I was making two separate comments: 1) the code's gotten quite a bit 
more complex (and if it needs to be that way for correctness, I'm okay with 
that), and 2) I'm surprised at how fast it still is.

 Are you testing the top-bottom or bottom-up?

My benchmark.py calls os.walk() with topdown=True, which is the default. I was 
testing the Python 3.4 version of os.walk() via listdir against your 
fast_bottom-up.patch.

I'm keen to look into this a bit further, but it won't be today.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks, Victor.

I haven't quite grokked all the changes here -- it's gotten somewhat more 
complicated with the scandir_it and manual next() call -- but I ran some 
benchmarks (via a hacked version of my scandir project's benchmark.py). The 
results were surprising, and in a good way:

Dev version in hg (no extra islink syscall):

Windows: 13.1x as fast (68.8x as fast in funky caching mode)
Linux: 7.8x as fast

With Victor's fast_bottom_up patch (100% correct behaviour):

Windows: 9.4x as fast (50.2x as fast in funky caching mode)
Linux: 6.5x as fast

So os.walk() will still be 10x as fast on Windows if you apply this patch, and 
6x as fast on my Linux VM. I haven't dug too deeply to know quite why the 
numbers are this good, especially on Linux, but that's what I'm seeing, which 
is great!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt

Ben Hoyt added the comment:

Note specifically in the unsymlink() example Serhiy gave, you probably don't 
*want* os.walk() to recurse into a 
symlink-to-a-directory-that's-changed-to-a-directory, and you probably haven't 
thought about it doing so, so maybe the new behaviour is fine?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt

Ben Hoyt added the comment:

To Victor and Serhiy:

1) Serhiy's point about not needing to build the symlinks set when followlinks 
is True is a good one, because it'll never get used. Just a if not 
followlinks: ... around that try/except would do it. Though this is a small 
optimization, as I expect 95% of people use os.walk() with followlinks=False, 
which is the default.

2) Much as I don't want to admit it :-), I think Serhiy has a point regarding 
the change in behaviour. Can we accept this tiny change? This happens because 
the previous implementation of os.walk() calls path.islink(), which does a real 
syscall, after the yield returns.

So if the caller modifies dirnames and adds a symlink to a directory it won't 
be in the symlinks set. Or if (as Serhiy's example shows) they change a 
symlink-to-a-directory to a directory the new implementation doesn't do another 
syscall to check, so differs from the old one -- in Serhiy's example, the old 
os.walk() will recurse into the changed 
symlink-to-a-directory-that's-now-a-directory, whereas the new os.walk() won't 
recurse.

Arguably the new behaviour is just as good here, but it is different. And 
Serhiy's function unsymlink() is a reasonable example of how this might play 
out.

The os.walk() docs explicitly allow modifying dirnames -- not just removing 
and reordering, but also adding entries, which could surface the difference in 
behaviour: the caller can modify the dirnames list in-place ... even to inform 
walk() about directories the caller creates or renames before it resumes walk() 
again. See here:

https://docs.python.org/3.4/library/os.html#os.walk

So I do think it's a real issue. Will this really be an issue in practice? I 
don't know; I'm tempted to think not.

Obviously if we have to call stat/islink again, it defeats half of the purpose 
of scandir. But I don't see a way around it if we want to keep the exact old 
os.walk() behaviour. We could fix the behaviour if a directory/symlink was 
added to dirnames fairly easily by testing whether it changed, but I don't 
see a way to fix the unsymlinks() example without a syscall.

Thoughts?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt

Ben Hoyt added the comment:

@Scott Dial: just a response about this benchmark: note that this benchmark 
isn't really valid, as it says Using slower ctypes version of scandir, which 
is the slow all-Python version. You want it to be saying Using Python 3.5's 
builtin os.scandir().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-10 Thread Ben Hoyt

Ben Hoyt added the comment:

Hi Victor. I'm attaching a patch to the What's New doc. I know it's a minor 
thing, but I really read the What's New in Python x.y page whenever it comes 
out, so making this as specific as possible is good. The changes are:

1. Be more specific and flesh it out slightly: for example, instead of just 
saying PEP 471 adds a new directory iteration function, say very briefly why 
it was added (to speed up os.walk) and what it achieves.
2. Move PEP and implementation by... under correct correct PEP -- it's 
currently under PEP 475.
3. Be more specific (as per point 1) under os.scandir(), and link to issue
4. Add a note about the os.walk() performance improvement under Optimizations. 
I realize the changes to os.walk() haven't been committed yet, but hopefully 
they will be quite soon.

Any feedback? Are you happy to commit these changes?

I intend to do some review of the scandir/DirEntry docs as well. I'll send 
those in the next few days.

--
Added file: http://bugs.python.org/file38421/whatsnew.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-10 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks. Will do!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23619] Python 3.5.0a2 installer fails

2015-03-09 Thread Ben Hoyt

Ben Hoyt added the comment:

Same exact issue here. I didn't have a Python 3.5 alpha 1 previously installed, 
and I tried running the installer normally and also (after uninstalling) with 
right-click, Run as administrator.

Both do the same thing for me: pop up a dialog box at the end of installation 
that says The program can't start because api-ms-win-crt-runtime-l1-1-0.dll is 
missing from your computer...

I also get the same dialog box when I run C:\Program Files\Python 
3.5\python.exe

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23619
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-07 Thread Ben Hoyt

Ben Hoyt added the comment:

Oops, I'm sorry re previous comment -- looks like I forgot to attach 
scandir-8.patch. Now attached. Please re-read my previous comment and review. 
:-)

--
Added file: http://bugs.python.org/file38374/scandir-8.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread Ben Hoyt

Ben Hoyt added the comment:

Attaching a first cut at this -- basically the implementation I use for walk() 
in scandir.py on GitHub.

One thing that's really weird to me: are the os.walk() unit tests actually 
being run? In test_os.py, I notice everything's in WalkTest.setUp, which is 
kinda weird -- and it's not actually running. At least when I add a 1/0 
inside setUp() to force an error and run python -m test test_os I don't get 
any errors...

--
keywords: +patch
nosy: +benhoyt
Added file: http://bugs.python.org/file38385/os_walk_1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23605
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-03-05 Thread Ben Hoyt

Ben Hoyt added the comment:

Updated version of the patch after Victor's code review of scandir-7 and a 
couple of his comments offline. Full commit log is available on my GitHub 
project in posixmodule_scandir_main.c if anyone's interested. Again, I haven't 
looked closely at the docs or tests yet, but hopefully the C code is getting 
pretty close now!

KNOWN ISSUE: There's a reference leak in the POSIX version (found with python 
-m test -R 3:2 test_os). I'm a Windows guy and this is my first serious Python 
C extension code, so I'm not sure the best way to track this down. Would 
someone like to either point out the bug :-) or help me with how to track this 
down?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23535] os.path.join() wrong concatenation of C: on Windows

2015-02-27 Thread Ben Hoyt

Ben Hoyt added the comment:

Sorry, but this is operating as designed and documented. See the docs here: 
https://docs.python.org/3.4/library/os.path.html#os.path.join

On Windows ... since there is a current directory for each drive, 
os.path.join(c:, foo) represents a path relative to the current directory 
on drive C: (c:foo), not c:\foo.

So I think this issue should be closed as not a bug.

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23535
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-27 Thread Ben Hoyt

Ben Hoyt added the comment:

Okay, here's the next version of the patch. I've updated scandir-2.patch with a 
number of modifications and several fixes to the C code.

This includes Victor's scandir-6.patch test suite (with minor mods) and my 
original documentation. Note that I haven't looked at either the documentation 
or the tests too closely, but I'm uploading the patch in any case so Victor (or 
others) can start reviewing the C code.

--
Added file: http://bugs.python.org/file38259/scandir-7.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13617] Reject embedded null characters in wchar* strings

2015-02-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Note that this (or a very similar issue) also affects os.listdir() on Windows: 
os.listdir(bytes_path_with_nul) raises ValueError as expected, but 
os.listdir(unicode_path_with_nul) does not. Test case:

 import os
 os.mkdir('foo')
 os.listdir(b'foo\x00zzz')
Traceback (most recent call last):
  File stdin, line 1, in module
ValueError: listdir: embedded null character in path
 os.listdir('foo\x00zzz')
[]

However, this is not the case on Linux, as there both calls raise an 
appropriate ValueError.

This needs to be fixed in posixmodule.c's path_converter() function.

I'm in the middle of implementing PEP 471 (os.scandir), so don't want to create 
a proper patch right now, but the fix is to add these lines in posixmodule.c 
path_converter() after the if (length  32767) {...} block:

if ((size_t)length != wcslen(wide)) {
FORMAT_EXCEPTION(PyExc_ValueError, embedded null character in %s);
Py_DECREF(unicode);
return 0;
}

We should also add test to test_os.py like the following:

def test_listdir_nul_in_path(self):
self.assertRaises(ValueError, os.listdir, 'y\x00z')
self.assertRaises(ValueError, os.listdir, b'y\x00z')

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13617
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-18 Thread Ben Hoyt

Ben Hoyt added the comment:

BTW, I replied to Victor privately, but for the thread: no worries, and apology 
accepted! :-) I'm working on updating my C patch with his feedback, and will 
update this issue hopefully in the next few days.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-13 Thread Ben Hoyt

Ben Hoyt added the comment:

Akari: yes, I'm aware of this, and it's definitely a concern to me -- it may be 
that scandir has a bug with counting the size of certain non-file directory 
entries, or my implementation of os.walk() via scandir is not quite correct. 
I'll look at it before too long!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread Ben Hoyt

Ben Hoyt added the comment:

Hi Victor, I thank you for your efforts here, especially your addition of 
DirEntry.inode() and your work on the tests.

However, I'm a bit frustrated that you just re-implemented the whole thing 
without discussion: I've been behind scandir and written the first couple of 
implementations, and I asked if you could review my code, but instead of 
reviewing it or interacting with my fairly clear desire for the all-C version, 
you simply turn up and say I've rewritten it, can you please review?

You did express concern off list about an all-C vs part-Python implementation, 
but you said it's still unclear to me which implementation should be added to 
Python and I don't feel able today to take the right decision. We may use 
python-dev to discuss that. But I didn't see that discussion at all, and I had 
expressed fairly clearly (both here and off list), with benchmarks, why I 
thought it should be the all-C version.

So it was a bit of a let down for a first-time Python contributor. Even a 
simple question would have been helpful here: Ben, I really think this much C 
code for a first version is bug-prone; what do you think about me 
re-implementing it and comparing?

TLDR: I'd like to see os.scandir() in Python even if it means the slower 
implementation, but some discussion would have been nice. :-)

So if it's not too late, I'll continue that discussion in my next message.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-12 Thread Ben Hoyt

Ben Hoyt added the comment:

To continue the actual which implementation discussion: as I mentioned last 
week in http://bugs.python.org/msg235458, I think the benchmarks above show 
pretty clearly we should use the all-C version.

For background: PEP 471 doesn't add any new functionality, and especially with 
the new pathlib module, it doesn't make directory iteration syntax nicer 
either: os.scandir() is all about letting the OS give you whatever info it can 
*for performance*. Most of the Rationale for adding scandir given in PEP 471 is 
because it can be so so much faster than listdir + stat.

My original all-C implementation is definitely more code to review (roughly 800 
lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On 
my Windows 7 SSD just now, running benchmark.py:

Original scandir-2.patch version:
os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast

New scandir-6.patch version:
os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast

So the all-C implementation is literally 2.5x as fast on Windows. (After both 
tests, just for a sanity check, I ran the ctypes version as well, and it said 
about 8x as fast for both runs.)

Then on Linux, not a perfect comparison (different benchmarks) but shows the 
same kind of trend:

Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast

New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note 
that 1.3x faster should actually read 1.3x as fast here:
bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)

So again, the all-C implementation is 2.5x as fast on Linux too.

And on Linux, the incremental improvement provided by scandir-6 over listdir is 
hardly worth it -- I'd use a new directory listing API for 3.2x as fast, but 
not for 1.3x as fast.

Admittedly a 10x speed gain (!) on Windows is still very much worth going for, 
so I'm positive about scandir even with a half-Python implementation, but 
hopefully the above shows fairly clearly why the all-C implementation is 
important, especially on Linux.

Also, if the consensus is in favour of slow but less C code, I think there are 
further tweaks we can make to the Python part of the code to improve things a 
bit more.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2015-02-05 Thread Ben Hoyt

Ben Hoyt added the comment:

Victor, I'd love to push forward with this (I doubt it's going to make alpha 1 
on February 8, but hopefully alpha 2 on March 8).

It seems pretty settled that we need to use the all-C version I've created, 
especially for Linux.

Can you please review scandir-2.patch? It's on Rietveld here: 
http://bugs.python.org/review/22524/#ps13104

You can see what changed between scandir-1.patch and scandir-2.patch more 
easily here: 
https://github.com/benhoyt/scandir/commit/d4e2d7083319ed8988aef6ed02d670fb36a00a33

Also, here is just the scandir C code in its own file: 
https://github.com/benhoyt/scandir/blob/d4e2d7083319ed8988aef6ed02d670fb36a00a33/posixmodule_scandir_main.c

One other open question is: should it go into CPython's posixmodule.c as I have 
it in the patch, or should I try to separate it out?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-11-25 Thread Ben Hoyt

Ben Hoyt added the comment:

Akira, note the Using slower ctypes version of scandir -- this is the older, 
ctypes implementation of scandir. On Linux, depending on file system etc, that 
can often be slower. You need to at least be using the fast C version in 
_scandir.c, which is then half C, half Python. But ideally you'd use the all-C 
version that I've provided as a CPython 3.5 patch.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-18 Thread Ben Hoyt

Ben Hoyt added the comment:

Attaching updated patch (scandir-2.patch) per Victor's code review here: 
http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c

Note that I haven't included test_scandir.py this time, as I haven't made any 
changes there, and it's not really ready for Python 3.5 yet.

--
Added file: http://bugs.python.org/file36963/scandir-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks, Victor and Antone. I'm somewhat surprised at the 2-3x numbers you're 
seeing, as I was consistently getting 4-5x in the Linux tests I did. But it 
does depend quite a bit on what file system you're running, what hardware, 
whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

The numbers are significantly better on Windows, as you can see. Even the 
smallest numbers I've seen with --scandir os are around 12x range on Windows.

In any case, Victor's last tests are right -- I presume we'll have *some* C, 
so what we want to be comparing is benchmark.py --scandir c versus 
benchmark.py --scandir os: the some C version versus the all C version in the 
attached CPython 3.5 patch.

BTW, Victor, Generic isn't really useful. I just used it as a test case that 
calls listdir() and os.stat() to implement the scandir/DirEntry interface. So 
it's going to be strictly slower than listdir + stat due to using listdir and 
creating all those DirEntry objects.

Anyway, where to from here? Are we agreed given the numbers that -- especially 
on Linux -- it makes good performance sense to use an all-C approach?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-09 Thread Ben Hoyt

Ben Hoyt added the comment:

I dunno, I'd be happy if you implement this, but it does mean *more* C code, 
not less. :-) I feel this would be a nice-to-have, but we should get the thing 
working first, and then do the multiple-OS-calls-in-one optimization.

In any case, if you implement this, I think you should do so against the 
CPython 3.5 patch, not the _scandir.c/scandir_helper code.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks for the initial response and code review, Victor. I'll take a look and 
respond further in the next few days.

In the meantime, however, I'm definitely open to splitting scandir out into its 
own C file. This will mean a little refactoring (making some functions 
public/non-static).

Based on the numbers so far, I'm not so keen on implementing just the sys calls 
in C and the rest in Python. I already do basically this with ctypes in the 
scandir module, and it's slowish. I'll send proper numbers through soon, but 
here's what I remember from running benchmark.py on my Windows laptop with SSD 
drive:

ctypes version: os.walk() 9x faster with scandir
CPython 3.5 C version (debug): os.walk() 24x faster with scandir
CPython 3.5 C version (release): os.walk() 55x faster with scandir

So you do get a lot of speedup from just the ctypes version, but you get a lot 
more (55/9 = 6x more here) by using the all-C version. Again, these numbers are 
from memory -- I'll send exact ones later.

One of the problems is that creating the DirEntry objects and calling their 
methods is fairly expensive, and if this is all done in Python, you lose a lot. 
I believe scandir() would end up being slower than listdir() in many cases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-08 Thread Ben Hoyt

Ben Hoyt added the comment:

Here are the actual numbers (instead of just from memory) running on my Windows 
laptop, which happens to have an SSD drive, so os.walk() with scandir is 
particularly good here.

python 3.4: benchmark.py -c python (all implemented in Python using ctypes):

os.walk took 1.011s, scandir.walk took 0.057s -- 17.8x as fast

python 3.4: benchmark.py -c c (using _scandir.c, so the iteration implemented 
in C, the DirEntry object in Python):

os.walk took 1.014s, scandir.walk took 0.039s -- 25.8x as fast

python 3.5: benchmark.py -c os (using the new all-C version in posixmodule.c):

os.walk took 0.983s, scandir.walk took 0.019s -- 52.3x as fast

So as you can see, there's still another 2x speedup to be gained from having 
everything in C. I know it's a bit more to review, but my thinking is if we're 
going to speed this baby up, let's speed it right up!

I haven't done these tests on Linux yet.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-10-06 Thread Ben Hoyt

Ben Hoyt added the comment:

Attaching my first patch here. It includes docs, the implementation of scandir 
and DirEntry in posixmodule.c, and tests in test_scandir.py. It does not yet 
include the changes for os.walk() to use scandir; I'm hoping to do that in a 
separate patch for easier code review.

I'd love some code and documentation review, as well as some thoughts on these 
open questions:

1) posixmodule.c is getting kinda huge and unwieldy; it'd be nice to implement 
this in a separate C file, but that comes with its own problems, as several 
functions would need to be made available externally.

2) While writing the tests, I was getting issues with Windows (apparently) not 
deleting the symlinks immediately in my teardown function, so for the moment I 
just don't call the teardown function at all. Not great; I think we'll want a 
better solution.

3) Partly due to #2, I haven't yet added tests for the file-not-found condition 
if a file is found by the directory scanning but then deleted before a 
DirEntry.is_dir() or DirEntry.is_file() call. I intend to add these tests, and 
there's a TODO comment in test_scandir.py so I remember.

4) test_scandir.py will need some further cleanup for inclusion in CPython; 
it's currently taken straight from my scandir module, which supports Python 
down to Python 2.6.

--
keywords: +patch
Added file: http://bugs.python.org/file36831/scandir-1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Yes, PEP 471 has been accepted, and I've got a mostly-finished C implementation 
of os.scandir() for CPython 3.5, as well as tests and docs. If you want a sneak 
preview, see posixmodule_scandir*.c, test/test_scandir.py, and os.rst here: 
https://github.com/benhoyt/scandir

It's working well on Windows, but the Linux version has a couple of tiny issues 
yet (core dumps ;-).

Given that os.scandir() will solve this issue (as well as the bigger 
performance problem due to listdir throwing away file type info), can we close 
this issue and open another one to track the implementation of os.scandir() / 
PEP 471?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22524] PEP 471 implementation: os.scandir() directory scanning function

2014-09-30 Thread Ben Hoyt

New submission from Ben Hoyt:

Opening this to track the implementation of PEP 471: os.scandir() [1]. This 
supercedes Issue #11406 (and possibly others)?

The implementation is most of the way there, but not yet done as a CPythono 3.5 
patch. Before I have a proper patch, it's available on GitHub [2] -- see 
posixmodule_scandir*.c, test/test_scandir.py, and os.rst.

[1] http://legacy.python.org/dev/peps/pep-0471/
[2] https://github.com/benhoyt/scandir

--
components: Library (Lib)
messages: 227933
nosy: benhoyt, haypo, mmarkk, pitrou, tim.golden
priority: normal
severity: normal
status: open
title: PEP 471 implementation: os.scandir() directory scanning function
type: enhancement
versions: Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22524
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-09-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Okay, I've opened http://bugs.python.org/issue22524, but I don't have the 
permissions to close this one, so could someone with bugs.python.org 
superpowers please do that?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18577] lru_cache enhancement: lru_timestamp helper function

2014-09-03 Thread Ben Hoyt

Ben Hoyt added the comment:

I really like this idea (and am needing this functionality), but I don't think 
this API (or implementation) is very nice:

1) It means you have to change your function signature to use the timeout 
feature.

2) Specifying the interval in minutes seems odd (most similar timeouts in 
Python are specified in seconds).

I would love to see an optional timeout=seconds keyword arg to the lru_cache() 
decorator, or some other way to support this.

Raymond, what do you think would be the simplest way to hook this in?

One way I think would be nice (and also support other neat things) is to allow 
you to specify the dict-like object that's used for the cache (defaults to 
dict, of course). So the lru_cache signature would change to:

def lru_cache(maxsize=100, typed=False, cache=None):
...

From looking at the source, cache would need to support these methods: get, 
clear, __setitem__, __contains__, __len__, __delitem__

Would this just work? Or could there be a race condition if __contains__ (key 
in cache) returned True but then cache.get(key) returned False a bit later?

In any case, this seems nice and general to me, and would mean you could 
implement a simple ExpiringDict() and then pass that as 
cache=ExpiringDict(expiry_time).

Thoughts?

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18577
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10551] mimetypes read from the registry should not overwrite standard mime mappings

2014-07-29 Thread Ben Hoyt

Ben Hoyt added the comment:

Mark, are you referring to part 3 of this issue, the image/pjpeg type of 
problem? This was fixed in Python 2.7.6 -- see changeset 
http://hg.python.org/cpython/rev/e8cead08c556 and 
http://bugs.python.org/issue15207

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10551
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Raymond, there are very compelling timings/benchmarks for this -- not so much 
the original issue here (generator vs list, that's not really an issue) but 
having a scandir() function that returns the stat-like info from the OS so you 
don't need extra stat calls. This speeds up os.walk() by 7-20 times on Windows 
and 4-5 times on Linux. See more at: 
https://github.com/benhoyt/scandir#benchmarks

I've written a draft PEP that I've sent to the PEP editors (if you're 
interested, it's at https://github.com/benhoyt/scandir/blob/master/PEP.txt). If 
any of the PEP editors are listening here ... would love some feedback on that 
at some stage. :-)

Victor -- development has started outside the stdlib here: 
https://github.com/benhoyt/scandir and PyPI module here: 
https://pypi.python.org/pypi/scandir Both are being used by various people.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks! Will post the PEP to python-dev in the next day or two.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2014-06-26 Thread Ben Hoyt

Ben Hoyt added the comment:

Nick -- sorry, already posted to python-dev before seeing your latest. However, 
I think it's the right place, as there's already been a fair bit of hashing 
this idea and API out on python-ideas first and then also python-dev. See links 
in the PEP here: http://legacy.python.org/dev/peps/pep-0471/#previous-discussion

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-19 Thread Ben Hoyt

Ben Hoyt added the comment:

Great, thanks for committing!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-19 Thread Ben Hoyt

Ben Hoyt added the comment:

BTW, thanks for the mention in What's New in Python 3.5: 
https://docs.python.org/3.5/whatsnew/3.5.html#os

Can I make one small suggestion for a tweak there? A link to the docs for 
os.stat() would be good. So maybe instead of mentioning os.stat_result -- 
which isn't really a documented class -- say The return value of 
:func:`os.stat` now has... where os.stat is a link to 
https://docs.python.org/3.5/library/os.html#os.stat with the documentation for 
that.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-14 Thread Ben Hoyt

Ben Hoyt added the comment:

Uploading a (hopefully final! :-) patch to fix Zach Ware's points from the code 
review:

1) use stat.FILE_ATTRIBUTE_DIRECTORY constant in test_os.py
2) break line length in stat.rst doc source

--
Added file: http://bugs.python.org/file35632/issue21719-4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-13 Thread Ben Hoyt

Ben Hoyt added the comment:

Full patch to add this to Python 3.5 attached:

* code changes to posixmodule.c and _stat.c
* tests added in test_os.py and test_stat.py
* docs added to os.rst and stat.rst

--
keywords: +patch
Added file: http://bugs.python.org/file35615/issue21719.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21745] Devguide: mention requirement to install Visual Studio SP1 on Windows

2014-06-13 Thread Ben Hoyt

Ben Hoyt added the comment:

Cool, thanks for applying. Out of curiosity, how often is the online devguide 
HTML updated?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21745
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-13 Thread Ben Hoyt

Ben Hoyt added the comment:

Updated patch attached based on code reviews.

I'm replying to the code review here as when I tried to reply on 
bugs.python.org/review I got a Python exception, AttributeError: NoneType has 
no attribute something or other. FYI, it seems Django is set up in debug mode, 
because it was the full colourful traceback with local vars etc. I didn't save 
this file, sorry!

Zach Ware
=

 Either way, these should probably be defined in stat.py as well (sorry; I 
 should
 have said 'as well as' instead of 'instead of' when I mentioned _stat.c in the
 email thread).

I don't think so -- I thought the purpose of moving them to _stat.c was so you 
could use the windows.h constants rather than hard-coded integers? But if 
they're in stat.py, why do they need to be in _stat.c as well?

http://bugs.python.org/review/21719/diff/12149/Modules/_stat.c#newcode563
 Modules/_stat.c:563: // For some reason FILE_ATTRIBUTE_INTEGRITY_STREAM and
 PEP 7 bans C++ style comments.

Good point.


stoneleaf
=

 'covert' should be 'convert' - can you fix this as part of your patch? ;)

Fixed in latest patch.

 Instead of spelling out FILE_ATTRIBUTE_ each time, can we abbreviate that part
 to FA_ ?

I disagree. I realize FILE_ATTRIBUTE_* is longish, but it's what it's called in 
the Windows API, so will make these constants much more findable and obvious to 
Windows programmers (consistent with the Win32 documentation).

 On non-Windows machines 'st_file_attributes' will be 0?

No, it won't be present at all (like the other non-cross-platform attributes).

 There's no attribute for the value 8?

Correct -- see the Win32 API documentation and loewis's comment.



haypo
=

 It would be nice to document these options here :)

I don't think we should duplicate the Win32 documentation. Ours will just be 
less thorough and potentially get out of date. I agree we should add a link to 
the Win32 docs though.

 Instead of == 0, you can use assertFalse.

Could, but I don't like this when I'm doing bit/integer comparisons. Explicit 
is better.

 The purpose of the _stat module is to only provide data from the OS, to not
 hardcode constants. So -1 to provide these constants on all platforms.

Agreed. See above.

 If the constant is only available on new Visual Studio version, we may 
 hardcode
 the constant to avoid the dependency on a specific Visual Studio version. For
 example, in unicodeobject.c I used:

 #ifndef WC_ERR_INVALID_CHARS
 #  define WC_ERR_INVALID_CHARS 0x0080
 #endif

That's a good way of doing it. I'll use this approach.

 Modules/posixmodule.c:1437: unsigned long st_file_attributes;
 Should we use a DWORD here?

I don't think so. None of the other values in win32_stat (e.g., st_dev) are 
defined using DWORD or Win32-style types. See also loewis's comment.


loewis
==

 We shouldn't replicate Microsoft's documentation (if for no other reason than
 copyright). Instead, linking to
 
 http://msdn.microsoft.com/en-us/library/windows/desktop/gg258117.aspx

 might be helpful (as long as Microsoft preserves this URL).

Agreed. I'll make this change.

--
Added file: http://bugs.python.org/file35621/issue21719-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-13 Thread Ben Hoyt

Ben Hoyt added the comment:

 The idea is that _stat.c will use system-provided values wherever 
 possible, but stat.py should be as accurate as we can make it to
 provide a backup for when _stat isn't around (either when it's just
 not built, which isn't an issue on Windows, or in another Python
 implementation).

Fair call. I've added the constants to stat.py as well. Updated patch attached.

--
Added file: http://bugs.python.org/file35622/issue21719-3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-12 Thread Ben Hoyt

Ben Hoyt added the comment:

I've got a patch for this. Need to finish the docs and add tests, and then I'll 
post here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21745] Devguide: mention requirement to install Visual Studio SP1 on Windows

2014-06-12 Thread Ben Hoyt

New submission from Ben Hoyt:

Per my email on core-mentorship, the instructions for compiling CPython on 
Windows at
https://docs.python.org/devguide/setup.html#windows are good, however I did 
have one issue where the dev guide didn't help. During the link
step, I got this error:

LINK : fatal error LNK1123: failure during conversion to COFF: file
invalid or corrupt

It took me a while to figure out how to fix it. I found this question
on StackOverflow:

http://stackoverflow.com/questions/10888391/error-link-fatal-error-lnk1123-failure-during-conversion-to-coff-file-inval

The first part of the answer didn't help (because incremental linking
is already disabled). But the second part, kinda tucked away there, is
what fixed it for me -- simply installing Visual Studio 2010 SP1 here:

http://www.microsoft.com/en-us/download/details.aspx?id=23691

After this, I restarted my PC and it linked fine.

So I suggest a small addition to the dev guide to mention this -- adding the 
following paragraph after the Python 3.3 and later paragraph:

-
You'll also need to install the Visual Studio `Service Pack 1 (SP1)
http://www.microsoft.com/en-us/download/details.aspx?id=23691`_. If you
don't install this service pack, you may receive errors like the following
during linking: ``LINK : fatal error LNK1123: failure during conversion to
COFF: file invalid or corrupt``.
-

Patch attached.

--
assignee: docs@python
components: Documentation
files: visual_studio_sp1.patch
keywords: patch
messages: 220418
nosy: benhoyt, docs@python
priority: normal
severity: normal
status: open
title: Devguide: mention requirement to install Visual Studio SP1 on Windows
type: enhancement
Added file: http://bugs.python.org/file35607/visual_studio_sp1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21745
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21745] Devguide: mention requirement to install Visual Studio SP1 on Windows

2014-06-12 Thread Ben Hoyt

Changes by Ben Hoyt benh...@gmail.com:


--
nosy: +r.david.murray

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21745
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-11 Thread Ben Hoyt

New submission from Ben Hoyt:

I asked recently on python-dev [1] about adding a st_winattrs attribute to 
stat result objects on Windows, to return the full set of Windows file 
attribute bits, such as hidden or compressed status. Copying from that 
thread for a bit more context here:

Python's os.stat() simply discards most of the file attribute
information fetched via the Win32 system calls. On Windows, os.stat()
calls CreateFile to open the file and get the dwFileAttributes value,
but it throws it all away except the FILE_ATTRIBUTE_DIRECTORY and
FILE_ATTRIBUTE_READONLY bits. See CPython source at [2].

Given that os.stat() returns extended, platform-specific file
attributes on Linux and OS X platforms (for example,
st_blocks, st_rsize, etc), it seems that Windows is something of a
second-class citizen here.

There are several questions on StackOverflow about how to get this
information on Windows, and one has to resort to ctypes. For example,
[3].

To solve this problem, I think we should add a st_winattrs attribute to the 
object returned by os.stat() on
Windows. And we should add the relevant Win32 FILE_ATTRIBUTE_* constants to the 
stat module.

Then, similarly to existing code like hasattr(st, 'st_blocks') on
Linux, you could write a cross-platform function to determine if a
file was hidden, something like so:

def is_hidden(path):
if startswith(os.path.basename(path), '.'):
return True
st = os.stat(path)
return getattr(st, 'st_winattrs', 0)  FILE_ATTRIBUTE_HIDDEN != 0

There was general support for this idea on python-dev (see [4] [5] [6]), so I'd 
love to see this in Python 3.5.

Basically we need to add a st_winattrs integer attribute to the win32_stat 
struct in posixmodule.c and add the FILE_ATTRIBUTE_* constants to _stat.c, as 
well as adding Windows-specific tests and documentation.

I've never compiled CPython on Windows, but I aim to provide a patch for this 
at some stage soonish. Feedback and other patches welcome in the meantime. :-)

[1] https://mail.python.org/pipermail/python-dev/2014-June/134990.html
[2] https://github.com/python/cpython/blob/master/Modules/posixmodule.c#L1462
[3] http://stackoverflow.com/a/6365265
[4] https://mail.python.org/pipermail/python-dev/2014-June/134993.html
[5] https://mail.python.org/pipermail/python-dev/2014-June/135006.html
[6] https://mail.python.org/pipermail/python-dev/2014-June/135007.html

--
components: Library (Lib)
messages: 220266
nosy: benhoyt, ethan.furman, zach.ware
priority: normal
severity: normal
status: open
title: Returning Windows file attribute information via os.stat()
type: enhancement
versions: Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21719] Returning Windows file attribute information via os.stat()

2014-06-11 Thread Ben Hoyt

Ben Hoyt added the comment:

Fair call -- st_file_attributes sounds good to me, and matches the prefix of 
the FILE_ATTRIBUTES_* constants better too.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21719
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2013-08-12 Thread Ben Hoyt

Ben Hoyt added the comment:

All looks great. I like what you've done with default_encoding now. Thanks, Tim 
(and Dave for the original report).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2013-08-11 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks, Tim! Works for me! A couple of code review comments:

1) On 2.7, guess_type(s)[0] is a byte string as usual if the type doesn't exist 
in the registry, but it's a unicode string if it came from the registry. Seems 
like it should be a byte string in all cases (the mime type can only be ASCII 
char). I would say .encode('ascii') and if it raises UnicodeError, ignore that 
key.

2) Would 'subkeyname[0] == .' be better as subkeyname.startswith(.)? More 
idiomatic, and won't bomb out if subkeyname is zero length (though that 
probably can't happen). Relatedly, not subkeyname.startswith() with an 
early-continue would avoid an indent and is what the rest of the code does.

3) I believe the default_encoding variable is not needed anymore. That was used 
in the old registry code.

4) Super-minor: raises EnvironmentError would be the Pythonic way to say 
throws EnvironmentError.

5) Would it be worth adding a test for 'foo.png' as well, as that was another 
super-common type that was wrong?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-10 Thread Ben Hoyt

Ben Hoyt added the comment:

 Please bring this up on python-dev.

Good idea. Thread started: 
http://mail.python.org/pipermail/python-dev/2013-May/126119.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-06 Thread Ben Hoyt

Ben Hoyt added the comment:

 A normal caller would never expect a stat object to be partially populated: 
 if a function has a prototype returning a stat object, then I definitely 
 expect it to be a regular stat object, with all the fields guaranteed by 
 POSIX set (st_size, st_ino, st_dev...).

I don't think that's true in general, or true of how other Python APIs work. 
For instance, many APIs return a file-like object, and you can only do 
certain things on that object, depending on what the documentation says, or 
what EAFP gets you. Some file-like object don't support seek/tell, some don't 
support close, etc. I've seen plenty of walk-like-a-duck checks like this:

if hasattr(f, 'close'):
f.close()

Anyway, my point boils down to:

* scandir() is a new function, so there aren't old trends or things that will 
break
* we clearly document it as returning a tuple of (name, st), where st is a 
stat-like object whose invididual fields are None if they couldn't be 
determined for free with the directory scanning
* in fact, that's kind of the point of the st object in this function, so the 
example could be the one I gave above where you call os.stat() if either of the 
fields you want is None
* if that's clear in the documentation (of this new function) and the first 
example shows you exactly how it's meant to be used, I think that's pretty sane 
and sensible...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Ben Hoyt

Ben Hoyt added the comment:

 I find iterdir_stat() ugly :-) I like the scandir name, which has some 
 precedent with POSIX.

Fair enough. I'm cool with scandir().

 scandir() cannot return (name, stat), because on POSIX, readdir() only 
 returns d_name and d_type (the type of the entry): to return a stat, we would 
 have to call stat() on each entry, which would defeat the performance gain.

Yes, you're right. I solved this in BetterWalk with the solution you propose 
of returning a stat_result object with the fields it could get for free set, 
and the others set to None.

So on Linux, you'd get a stat_result with only st_mode set (or None for 
DT_UNKNOWN), and all the other fields None. However -- st_mode is the one 
you're most likely to use, usually looking just for whether it's a file or 
directory. So calling code would look something like this:

files = []
dirs = []
for name, st in scandir(path):
if st.st_mode is None:
st = os.stat(os.path.join(path, name))
if stat.S_ISDIR(st.st_mode):
dirs.append(name)
else:
files.append(name)

Meaning you'd get the speed improvements 99% of the time (when st_mode) was 
set, but if st_mode is None, you can call stat and handle errors and whatnot 
yourself.

 That's why scandir would be a rather low-level call, whose main user would be 
 walkdir, which only needs to know the entry time and not the whole stat 
 result.

Agreed. This is in the OS module after all, and there's tons of stuff that's 
OS-dependent in there. However, I think that doing something like the above, we 
can make it usable and performant on both Linux and Windows for use cases like 
walking directory trees.

 Also, I don't know which information is returned by the readdir equivalent on 
 Windows, but if we want a consistent API, we have to somehow map d_type and 
 Windows's returned type to a common type, like DT_FILE, DT_DIRECTORY, etc 
 (which could be an enum).

The Windows scan directory functions (FindFirstFile/FindNextFile) return a 
*full* stat (or at least, as much info as you get from a stat in Windows). We 
*could* map them to a common type -- but I'm suggesting that common type might 
as well be stat_result with None meaning not present. That way users don't 
have to learn a completely new type.

 The other approach would be to return a dummy stat object with only st_mode 
 set, but that would be kind of a hack to return a dummy stat result with only 
 part of the attributes set (some people will get bitten by this).

We could document any platform-specific stuff, and places you'd users could get 
bitten. But can you give me an example of where the 
stat_result-with-st_mode-or-None approach falls over completely?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-05 Thread Ben Hoyt

Ben Hoyt added the comment:

Yeah, I very much agree with what Nick says -- we really want a way to expose 
what the platform provides. It's  less important (though still the ideal), to 
expose that in a platform-independent way. Today the only way to get access to 
opendir/readdir on Linux and FindFirst/Next on Windows is by using a bunch of 
messy (and slowish) ctypes code. And yes, os.walk() would be the main 
cross-platform abstraction built on top of this.

Charles gave this example of code that would fall over:

size = 0
for name, st in scandir(path):
if stat.S_ISREG(st.st_mode):
size += st.st_size

I don't see it, though. In this case you need both .st_mode and .st_size, so a 
caller would check that those are not None, like so:

size = 0
for name, st in scandir(path):
if st.st_mode is None or st.st_size is None:
st = os.stat(os.path.join(path, name))
if stat.S_ISREG(st.st_mode):
size += st.st_size

One line of extra code for the caller, but a big performance gain in most cases.

Stinner said, But as usual, a benchmark on a real platform would be more 
convicing. Here's a start: https://github.com/benhoyt/betterwalk#benchmarks -- 
but it's not nearly as good as it gets yet, because those figures are still 
using the ctypes version. I've got a C version that's half-finished, and on 
Windows it makes os.walk() literally 10x the speed of the default version. Not 
sure about Linux/opendir/readdir yet, but I intend to do that too.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-04 Thread Ben Hoyt

Ben Hoyt added the comment:

That's right: if we have a separate scandir() that returns (name, stat) tuples, 
then a plain iterdir() is pretty much unnecessary -- callers just ignore the 
second stat value if they don't care about it.

I'd slightly prefer the name iterdir_stat(), as that almost makes the (name, 
stat) return values explicit in the name. But that's kind of bikeshedding -- 
scandir() works too.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15535] Fix pickling efficiency of named tuples in 2.7.3

2013-05-03 Thread Ben Hoyt

Ben Hoyt added the comment:

2.7 fix works for me, thanks! Just curious -- why the different fix for 3.3 
(addition of __getstate__ instead of removal of __dict__)?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15535
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-03 Thread Ben Hoyt

Ben Hoyt added the comment:

Thanks. I thought about that -- but I think I *want* to benchmark it when
they're cached, so that we're comparing apples with apples, cached system
calls with cached systems calls. The benchmark would almost certainly be a
lot better (BetterWalk would be even faster) if I was comparing the
non-cached results. I'll think about it some more though.

Thoughts?

-Ben

On Fri, May 3, 2013 at 7:03 PM, Charles-François Natali 
rep...@bugs.python.org wrote:


 Charles-François Natali added the comment:

  However, the reason I'm keen on iterdir_stat() is that I'm seeing it
 speed up os.walk() by a factor of 10 in my recent tests (note that I've
 made local mods, so these results aren't reproducible for others yet). This
 is doing a walk on a dir tree with 7800 files and 155 dirs:
 
  Using fast _betterwalk
  Priming the system's cache...
  Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 1/3...
  Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 2/3...
  Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 3/3...
  os.walk took 0.178s, BetterWalk took 0.017s -- 10.5x as fast
 
  Sometimes Windows will go into this I'm really caching stat results
 good mode -- I don't know what heuristic determines this -- and then I'm
 seeing a 40x speed increase. And no, you didn't read that wrong. :-)

 I/O benchmarks shouldn't use timeit or repeated calls: after the first
 run, most of your data is in cache, so subsequent runs are
 meaningless.

 I don't know about Windows, but on Linux you should do something like:
 # echo 3  /proc/sys/vm/drop_caches

 to start out clean.

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue11406
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-02 Thread Ben Hoyt

Ben Hoyt added the comment:

Ah, this is great. I definitely like the idea of a generator version of 
os.listdir(). And I like the name iterdir() -- it fits with iteritems() etc. 
They've gone in 3.x, of course, but listdir didn't change to an iterator, so...

See also Betterwalk, my work-in-progress with very similar goals: 
https://github.com/benhoyt/betterwalk#readme

It implements iterdir(), as well as iterdir_stat() which yields (name, stat) 
tuples. iterdir_stat() is especially important on Windows, where the directory 
iteration functions (FindFirstFile/FindNextFile) already give you full stat 
information.

The intent of Betterwalk is to use these functions to speed up os.walk() by 2-3 
times (and that's with the ctypes version, so it'll only get better in pure C).

So I'm +1 for adding iterdir(), and I'd love to see iterdir_stat() so users can 
write fast os.walk() type functions without resorting to C.

I'll look over the attached patches at some stage, especially the Windows code.

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11406] There is no os.listdir() equivalent returning generator instead of list

2013-05-02 Thread Ben Hoyt

Ben Hoyt added the comment:

Some folks have asked about benchmarks. I don't know about iterdir() vs 
listdir() -- I kind of suspect the speed gains there wouldn't be big. 

However, the reason I'm keen on iterdir_stat() is that I'm seeing it speed up 
os.walk() by a factor of 10 in my recent tests (note that I've made local mods, 
so these results aren't reproducible for others yet). This is doing a walk on a 
dir tree with 7800 files and 155 dirs:

Using fast _betterwalk
Priming the system's cache...
Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 1/3...
Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 2/3...
Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 3/3...
os.walk took 0.178s, BetterWalk took 0.017s -- 10.5x as fast

Sometimes Windows will go into this I'm really caching stat results good mode 
-- I don't know what heuristic determines this -- and then I'm seeing a 40x 
speed increase. And no, you didn't read that wrong. :-)

Sorry, I'm getting carried away. This bug is really more about iterdir. But 
seeing Martin suggested the stat/d_type info...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11406
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15535] Fix pickling efficiency of named tuples in 2.7.3

2013-04-25 Thread Ben Hoyt

Ben Hoyt added the comment:

I just hit this issue in a big way -- would have been nice for this fix to go 
into Python 2.7.4. :-)

It was quite hard to track down (as in, a day or two of debugging :-) because 
the symptoms didn't point directly to namedtuple. In our setup we 
pickle/unpickle some big files, and the symptoms we noticed were extremely high 
memory usage after *un*pickling -- as in, 3x what we were getting before 
upgrading from Python 2.6. We first tracked it down to unpickling, and then 
from there narrowed it down to namedtuple.

The first fix I discovered was that I could use pickletools.optimize() to 
reduce the memory-usage-after-unpickling back down to sensible levels. I don't 
know enough about pickle to know exactly why this is -- perhaps fragmentation 
due to extra unpickling data structures allocated on the heap, that optimize() 
removes?

Here's the memory usage of our Python process after unpickling a ~9MB pickle 
file (protocol 2) which includes a lot of namedtuples. This is on Python 2.7.4 
64-bit. With the original collections.py -- normal means un-optimized pickle, 
optimized means run through pickletools.optimize():

Memory usage after loading normal: 106664 KB
Memory usage after loading optimized: 31424 KB

With collections.py modified so namedtuple's templates include def 
__getstate__(self): return None:

Memory usage after loading normal: 33676 KB
Memory usage after loading optimized: 26392 KB

So you can see the Python 2.7 version of namedtuple makes the process use 
basically 3x the RAM when unpickled (without pickletools.optimize). Note that 
Python 2.6 does *not* do this (it doesn't define __dict__ or use OrderedDict so 
doesn't have this issue). And for some Python 3.3(.1) doesn't have the issue 
either, even though that does define __dict__ and use OrderedDict. I guess 
Python 3.3 does pickling (or garbage collection?) somewhat differently.

You can verify this yourself using the attached unpickletest.py script. Note 
that I'm running on Windows 7, but I presume this would happen on Linux/OS X 
too, as this issue has nothing to do with the OS. The script should work on 
non-Windows OSes, but you have to type in the RAM usage figures manually (using 
top or similar).

Note that I'm doing a gc.collect() just before fetching the memory usage figure 
just in case there's uncollected cyclical garbage floating around, and I didn't 
want that to affect the measurement.

I'm not sure I fully understand the cause (of where all this memory is going), 
or the fix for that matter. The OrderedDict is being pickled along with the 
namedtuple instance, because an OrderedDict is returned by __dict__, and pickle 
uses that. But is that staying in memory on unpickling? Why does optimizing the 
pickle fix the RAM usage issue to a large extent?

In any case, I've made the __getstate__ fix in our code, and that definitely 
fixes the RAM usage for us. (We're also going to be optimizing our pickles from 
now on.)

--
nosy: +benhoyt
Added file: http://bugs.python.org/file30021/unpickletest.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15535
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2013-04-17 Thread Ben Hoyt

Ben Hoyt added the comment:

Okay, I'm looking at the diff between mt-tip-noregistry.txt and 
mt-tip-newregistry.txt, and I've attached a file showing the lines that are 
*different* between the two, as well as the Apache mime.types value for that 
file extension.

In most cases, noregistry gives the right mime type, and newregistry is wrong. 
However, in a few cases, the registry value is right (i.e., according to 
Apache's mime.types). However, I think that's a totally separate issue, and 
maybe we should probably open a bug to update a few of the hard-coded mappings 
in mimetypes.py.

The cases where noregistry is right (according to Apache):

* .aif
* .aifc
* .aiff
* .avi
* .sh
* .wav
* .xsl
*. zip

The cases where noregistry is wrong (according to Apache):

* .bmp is hard-coded as image/x-ms-bmp, but it should be image/bmp
* .dll and .exe are hard-coded as application/octet-stream, but should be 
application/x-msdownload
* .ico is hard-coded as image/vnd.microsoft.icon but should be image/x-icon
* .m3u is hard-coded as application/vnd.apple.mpegurl but should be 
audio/x-mpegurl

None of these are standardized IANA mime types, and they're not particularly 
common for web servers to be serving, so it probably doesn't matter too much 
that the current hard-coded values are wrong. Also, I'm guessing web browsers 
will interpret the older type image/x-ms-bmp as image/bmp anyway. So maybe we 
should open another issue to fix the hard-coded types in mimetypes.py shown 
above, but again, that's another issue.

The other thing here is all the *new types* that the registry adds, such as 
.acrobatsecuritysettings. I don't see that these add much value -- just a 
bunch of types that depend on the programs you have installed. And in my mind 
at least, the behaviour of mimetypes.guess_type() should not change based on 
what programs you have installed.

In short, noregistry gives more accurate results in most cases that matter, 
and I still strongly feel we should revert to that. (The only alternative, in 
my opinion, is to switch to Dave Chambers' version of read_windows_registry(), 
but not call it by default.)

--
Added file: http://bugs.python.org/file29913/different.txt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2013-01-30 Thread Ben Hoyt

Ben Hoyt added the comment:

Any update on this, Tim or other Windows developers?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2012-12-09 Thread Ben Hoyt

Ben Hoyt added the comment:

Either way -- this needs to be reverted or fixed. It's a nasty gotcha for folks 
writing Python web services at the moment. I'm still for reverting, per my 
reasons above.

Dave Chambers, I'm not for faster but broken but for faster and fixed -- 
from what I've shown above, it's the Windows registry that's broken, so 
removing read_windows_registry() entirely would fix this (and as a bonus, be 
faster and simplify the code :-).

Per your suggestion http://bugs.python.org/issue15207#msg177092 -- I don't 
understand how mimetypes.py would know the types that aren't hardcoded.

R. David Murray, I don't understand the advantage of trying to maintain a list 
of Windows fixes. What if this list was wrong, or there was a Windows update 
which broke more mime types? Why can't we just avoid the complication and go 
back to the hardcoded types for Windows?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2012-12-09 Thread Ben Hoyt

Ben Hoyt added the comment:

Actually, I was suggesting using the hardcoded types for Windows only (i.e., 
only removing read_windows_registry). Several bugs have been opened on problems 
with the Windows registry mimetypes, but as far as I know this isn't an issue 
on Linux -- in other words, if Linux/other systems ain't broke, no need to fix 
them.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings

2012-12-05 Thread Ben Hoyt

Ben Hoyt added the comment:

Ah, thanks for making this an issue of its own! As I commented over at 
Issue10551, it's a serious problem, and makes mimetypes.guess_type() unusable 
out of the box on Windows.

Yes, the fix in Issue4969 uses MIME\Database\Content Type, which is a mime 
type - file extension mapping, *not the other way around*.

So while this patch is definitely an improvement (for the most part it doesn't 
produce wrong values!), but I'm not sure it's the way to go, for a few reasons:

1) Many of the most important keys aren't in the Windows registry (in 
HKEY_CLASSES_ROOT, where this patch looks). This includes .png, .jpg, and .gif. 
All of these important types fall back to the hard-coded types_map in 
mimetypes.py anyway.

2) Some that do exist are wrong in the registry (or at the least, different 
from the built-in types_map). This includes .zip, which is 
application/x-zip-compressed (at least in my registry) but should be 
application/zip.

3) It's slowish, as it has to load about 6000 registry keys (and more as you 
install more stuff on your system), but only about 200 of those have the 
Content Type subkey. On my machine (Windows 7, 64 bit CPython) this adds over 
100ms to the startup time even on subsequent runs when cached -- and I think 
100ms is pretty significant. Issue4969's version takes about 25ms, and 
reverting this completely would of course take 0ms.

4) Users and other programs can (and sometimes do!) change the Content Type 
keys in the registry -- whereas one wants mime type mappings to be consistent 
across systems. This last point is debatable for various reasons, and I think 
the above three points should carry the day, but I include it here for 
completeness. ;-)

For these reasons, I think we should revert the fix for Issue4969 and leave 
Windows users to get the default types_map as before, which is at least 
consisent -- and for mimetypes.guess_type(), you want consistency.

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15207
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10551] mimetypes read from the registry should not overwrite standard mime mappings

2012-10-09 Thread Ben Hoyt

Ben Hoyt added the comment:

This is definitely a real issue, and makes mimetypes.guess_type() useless out 
of the box on Windows.

However, I believe the reason it's broken is that the fix for Issue4969 doesn't 
actually work, and I'm not sure this is possible with the Windows registry.

You see, MIME\Database\Content Type in the Windows registry is a mime type - 
file extension mapping, *not the other way around*. But read_windows_registry() 
tries to use it as a file extension - mime type mapping, and bad things 
happen, because there are multiple mime types for certain file extensions.

As far as I can tell, there's nothing in the Windows registry that says which 
is the canonical mime type for a given extension. Again, this is because 
Microsoft intends it (and uses it) as a mime type - extension mapping. See 
more here: http://msdn.microsoft.com/en-us/library/ms775148(v=vs.85).aspx

For example, in my MIME\Database\Content Type we have:

image/jpeg - .jpg
image/jpg - .jpg
image/pjpeg - .jpg

And read_windows_registry() picks the last one for .jpg, which in this case is 
image/pjpeg -- NOT what users expect.

In short, I think the fix for Issue4969 is broken as is, and that you can't 
actually use the mime types database in the Windows registry in this way. I 
suggest reverting the fix for Issue4969.

Or, we could get clever and only use the Windows registry value if there's a 
single mime type - extension mapping for a given extension, and if there's 
more than one (meaning it'd be ambiguous), use the mimetypes default from 
types_map / common_types.

--
nosy: +benhoyt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10551
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6689] subprocess doesn't pass arguments correctly on Linux when shell=True

2009-08-12 Thread Ben Hoyt

Changes by Ben Hoyt benh...@gmail.com:


--
nosy: +benhoyt
type: behavior - feature request

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6689
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6689] subprocess doesn't pass arguments correctly on Linux when shell=True

2009-08-12 Thread Ben Hoyt

Ben Hoyt benh...@gmail.com added the comment:

Oops, didn't intend to change the type, changing back.

--
type: feature request - behavior

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6689
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com