Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-01 Thread Stefan Behnel
Nick Coghlan, 01.09.2013 03:28:
> On 1 Sep 2013 05:18, "Stefan Behnel" wrote:
>> I can't really remember a case where I could afford the
>> runtime overhead of implementing a wrapper in Python and going through
>> something like ctypes or cffi. I mean, testing C libraries with Python
>> tools would be one, but then, you wouldn't want to write an extension
>> module for that and instead want to call it directly from the test code as
>> directly as possible.
>>
>> I'm certainly aware that that use case exists, though, and also the case
>> of just wanting to get things done as quickly and easily as possible.
> 
> Keep in mind I first came to Python as a tool for test automation of custom
> C++ hardware APIs that could be written to be SWIG friendly.

Interesting again. Would you still do it that way? I recently had a
discussion with Holger Krekel of py.test fame about testing C code with
Cython, and we quickly agreed that wrapping the code in an extension module
was both too cumbersome and too inflexible for testing purposes.
Specifically, none of Cython's top selling points fits here: not speed,
not clarity, not API design. It's most likely different for SWIG, which
involves less (not no, just less) manual work and gives you API-wise more
or less exactly what you put in. However, cffi is almost certainly the
better way to do it, because it gives you all sorts of flexibility for your
test code without having to think about the wrapper design all the time.

The situation is also different for C++, where you have fewer options for
wrapping it. I can imagine SWIG still being the tool of choice on that
front when it comes to bare and direct testing of large code bases.


> I now work for an OS vendor where the 3 common languages for system
> utilities are C, C++ and Python.
> 
> For those use cases, dropping a bunch of standard Python objects in a
> module dict is often going to be a quick and easy solution that avoids a
> lot of nasty pointer lifecycle issues at the C level.

That's yet another use case, BTW. When you control the whole application,
then safety doesn't really matter at these points and keeping a bunch of
stuff in a dict will usually work just fine. I'm mainly used to writing
libraries for (sometimes tons of) other people, in which case the
requirements are so diverse on user side that safety is a top thing to care
about. Anything you can keep inside of C code should stay there.
(Especially when dealing with libxml2&friends in lxml which continuously
present their 'interesting' usability characteristics.)


> * PEP 3121 with a size of "0". As above, but avoids the module state APIs
> in order to support reloading. All module state (including type
> cross-references) is stored in hidden state (e.g. an instance of a custom
> type not exposed to Python, with a reference stored on each custom type
> object defined in the module, and any module level "functions" actually
> being methods of a hidden object).

Thanks for elaborating. I had completely failed to make the mental link
that you could simply stick bound methods as functions into the module
dict, i.e. that they don't even have to be methods of the module itself.
That's something that Cython could already use in older CPythons, even as a
preparation for any future import protocol changes. The object that they
are methods of would then eventually become the module instance.
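A pure-Python analogue of that layout may make it concrete. This is only a sketch, and the names `_ModuleState`, `make_module` and `counter` are hypothetical: in C the state object would be an instance of a type that is never exported, and the module-level "functions" would be its methods.

```python
import types


class _ModuleState:
    """Hidden per-module state (hypothetical); never stored as a module attribute."""

    def __init__(self):
        self.calls = 0

    def counter(self):
        """Exposed as a module-level "function", but really a bound method."""
        self.calls += 1
        return self.calls


def make_module(name):
    """Create a module whose only link to its state is through bound methods."""
    state = _ModuleState()
    module = types.ModuleType(name)
    module.counter = state.counter  # bound method; the state is reachable via __self__
    return module


# Two instances of the "same" module keep fully separate state.
first = make_module("demo")
second = make_module("demo")
first.counter()
first.counter()
second.counter()
```

Because nothing refers to the state except the bound methods, creating a second copy of the module cannot clobber the first one's state.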

You'd still suffer a slight performance hit from going from a static global
C variable to a pointer indirection - for everything: string constants,
cached Python objects, all user defined global C variables would have to go
there as Cython cannot know if they are module instance specific state or
not (they usually will be, I guess). But that has to be done anyway if the
goal is to get rid of static state to enable sub-interpreters. I can't wait
to see lxml run threaded in mod_wsgi... ;-)


>> You seemed to be ok with my idea of making the loader return a wrapped
>> extension module instead of the module itself. We should actually try
>> that.
> 
> Sure, that's just a variant of the "hidden state object" idea I described
> above. It should actually work today with the PEP 3121 custom storage size
> set to zero.

True. The only difference is whether you leave it to the extension type
itself or make it a part of the loader architecture.

Anyway, I promise I'll give it a try in Cython. Will be some work, though,
to rewrite Cython's use of global variables, create a module state type,
migrate everything to heap types, ... I had wanted to do that for a couple
of years, but it's clearly not something for a happy afternoon or two.

Plus, it would even have to be optional in the compiler to avoid
performance regressions for modules that want to continue using fast static
globals simply because they cannot support multiple instances anyway (e.g.
due to external C library dependencies). Let's see if we can solve that at
C compilation time by throwing in a couple of macros. That would a

Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-01 Thread Nick Coghlan
On 1 September 2013 18:11, Stefan Behnel  wrote:
> Nick Coghlan, 01.09.2013 03:28:
>> On 1 Sep 2013 05:18, "Stefan Behnel" wrote:
>>> I can't really remember a case where I could afford the
>>> runtime overhead of implementing a wrapper in Python and going through
>>> something like ctypes or cffi. I mean, testing C libraries with Python
>>> tools would be one, but then, you wouldn't want to write an extension
>>> module for that and instead want to call it directly from the test code as
>>> directly as possible.
>>>
>>> I'm certainly aware that that use case exists, though, and also the case
>>> of just wanting to get things done as quickly and easily as possible.
>>
>> Keep in mind I first came to Python as a tool for test automation of custom
>> C++ hardware APIs that could be written to be SWIG friendly.
>
> Interesting again. Would you still do it that way? I recently had a
> discussion with Holger Krekel of py.test fame about testing C code with
> Cython, and we quickly agreed that wrapping the code in an extension module
> was both too cumbersome and too inflexible for testing purposes.
> Specifically, none of Cython's top selling points fits here: not speed,
> not clarity, not API design. It's most likely different for SWIG, which
> involves less (not no, just less) manual work and gives you API-wise more
> or less exactly what you put in. However, cffi is almost certainly the
> better way to do it, because it gives you all sorts of flexibility for your
> test code without having to think about the wrapper design all the time.
>
> The situation is also different for C++, where you have fewer options for
> wrapping it. I can imagine SWIG still being the tool of choice on that
> front when it comes to bare and direct testing of large code bases.

To directly wrap C++, I'd still use SWIG. It makes a huge difference
when you can tweak the C++ side of the API to be SWIG friendly rather
than having to live with whatever a third party C++ library provides.
Having classes in C++ map directly to classes in Python is the main
benefit of doing it this way over using a C wrapper and cffi.

However, for an existing C API, or a custom API where I didn't need
the direct object mapping that C++ can provide, using cffi would be a
more attractive option than SWIG these days (the stuff I was doing
with SWIG was back around 2003 or so).

I think this is getting a little off topic for the list, though :)

>> I now work for an OS vendor where the 3 common languages for system
>> utilities are C, C++ and Python.
>>
>> For those use cases, dropping a bunch of standard Python objects in a
>> module dict is often going to be a quick and easy solution that avoids a
>> lot of nasty pointer lifecycle issues at the C level.
>
> That's yet another use case, BTW. When you control the whole application,
> then safety doesn't really matter at these points and keeping a bunch of
> stuff in a dict will usually work just fine. I'm mainly used to writing
> libraries for (sometimes tons of) other people, in which case the
> requirements are so diverse on user side that safety is a top thing to care
> about. Anything you can keep inside of C code should stay there.
> (Especially when dealing with libxml2&friends in lxml which continuously
> present their 'interesting' usability characteristics.)

I don't think it's a coincidence that it was the etree interface with
expat that highlighted the deficiencies of the current extension
module hooks when it comes to working properly with
test.support.import_fresh_module :)

>> * PEP 3121 with a size of "0". As above, but avoids the module state APIs
>> in order to support reloading. All module state (including type
>> cross-references) is stored in hidden state (e.g. an instance of a custom
>> type not exposed to Python, with a reference stored on each custom type
>> object defined in the module, and any module level "functions" actually
>> being methods of a hidden object).
>
> Thanks for elaborating. I had completely failed to make the mental link
> that you could simply stick bound methods as functions into the module
> dict, i.e. that they don't even have to be methods of the module itself.
> That's something that Cython could already use in older CPythons, even as a
> preparation for any future import protocol changes. The object that they
> are methods of would then eventually become the module instance.
>
> You'd still suffer a slight performance hit from going from a static global
> C variable to a pointer indirection - for everything: string constants,
> cached Python objects, all user defined global C variables would have to go
> there as Cython cannot know if they are module instance specific state or
> not (they usually will be, I guess). But that has to be done anyway if the
> goal is to get rid of static state to enable sub-interpreters. I can't wait
> to see lxml run threaded in mod_wsgi... ;-)

To be honest, I didn't realise that such a trick might already be
possible until I wa

Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-01 Thread Stefan Behnel
Nick Coghlan, 01.09.2013 14:23:
> That means the powers any new extension initialisation API will offer
> will be limited to:
> 
> * letting the module know its own name (and other details)
> * letting the module explicitly block reloading
> * letting the module support loading multiple copies at once by taking
> the initial import out of sys.modules (but keeping a separate
> reference to it alive)

Which, all by themselves, can be considered a huge benefit, IMHO.

Plus, if we design the protocol broadly enough now, specifically as a two-way
interface (info in, module out), we won't have to make any major changes to
it again in the near future, because any incremental changes we need can
simply be integrated into what's there by then. It's sad that we
didn't see these requirements for Py3.0.
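The "info in, module out" shape can be sketched at the Python level with the `importlib` loader API from PEP 451 (Python 3.4+): `create_module(spec)` receives the spec and returns a module, and `exec_module(module)` initialises it. This is a pure-Python illustration only, not the extension-module protocol under discussion; `HiddenStateLoader`, `load_copy` and the `greeting` attribute are hypothetical.

```python
import importlib.abc
import importlib.util
import types


class HiddenStateLoader(importlib.abc.Loader):
    """Hypothetical loader: information flows in via the spec, a module flows out."""

    def create_module(self, spec):
        # The module learns its own name from the spec before any init code runs.
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # Per-instance initialisation; nothing is kept in loader-level globals.
        module.greeting = "hello from " + module.__name__


def load_copy(name):
    """Build an independent module object without touching sys.modules."""
    spec = importlib.util.spec_from_loader(name, HiddenStateLoader())
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


a = load_copy("demo_ext")
b = load_copy("demo_ext")
```

Since every call goes through the spec, loading two copies simply yields two unrelated module objects.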


> In terms of where we go from here - do you mind if I use your pre-PEP
> as the initial basis for a PEP of my own some time in the next week or
> two (listing you as co-author)? Improving extension module
> initialisation has been the driver for most of the PEP 451 feedback
> I've been giving to Eric over on import-sig, so I have some definite
> ideas on how I think that API should look :)

Go for it. I'm not sure how much time I can actively spend on this during
the next weeks anyway, so I'm happy if this continues to get pushed onwards
in the meantime.

Stefan


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-01 Thread Antoine Pitrou
On Sun, 1 Sep 2013 11:28:36 +1000
Nick Coghlan  wrote:
> * PEP 3121 with a size of "0". As above, but avoids the module state APIs
> in order to support reloading. All module state (including type
> cross-references) is stored in hidden state (e.g. an instance of a custom
> type not exposed to Python, with a reference stored on each custom type
> object defined in the module, and any module level "functions" actually
> being methods of a hidden object). Still doesn't support loading a *fresh*
> copy due to the hidden PEP 3121 module cache.

Not sure what you mean by that:

>>> import atexit
>>> id(atexit)
140031896222680
>>> import sys
>>> del sys.modules['atexit']
>>> import atexit
>>> id(atexit)
140031896221400
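The same experiment works for any pure-Python module: removing the entry from `sys.modules` while keeping a separate reference alive makes the next import execute the module code again and produce an independent copy. A minimal sketch, using the stdlib `colorsys` module only as a convenient example:

```python
import importlib
import sys

import colorsys as first_copy  # any pure-Python module without submodules

# Drop the cached entry but keep our own reference to the original alive.
del sys.modules["colorsys"]
second_copy = importlib.import_module("colorsys")

# The import system has now executed the module code a second time.
fresh_is_distinct = first_copy is not second_copy
cache_points_at_new = sys.modules["colorsys"] is second_copy
```

For a pure-Python module this is harmless; the difficulty discussed in this thread is extension modules whose C-level state is shared across such copies.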


> Due to refcounting, all instances of Python objects qualify as mutable
> state.

That's an overly broad definition. Many objects are shared between
subinterpreters without any problems (None, the empty tuple, built-in
types and most C extension types, etc.). As long as the state is
an internal implementation detail, there shouldn't be any problem.

> I wouldn't be willing to make the call about which of stateless vs stateful
> is more common without a lot more research :)
> 
> They're both common enough that I think they should both be well supported,
> and making the "no custom C level state" case as simple as possible.

Agreed.

Regards

Antoine.




Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-01 Thread Nick Coghlan
On 1 September 2013 23:03, Antoine Pitrou  wrote:
> On Sun, 1 Sep 2013 11:28:36 +1000
> Nick Coghlan  wrote:
>> * PEP 3121 with a size of "0". As above, but avoids the module state APIs
>> in order to support reloading. All module state (including type
>> cross-references) is stored in hidden state (e.g. an instance of a custom
>> type not exposed to Python, with a reference stored on each custom type
>> object defined in the module, and any module level "functions" actually
>> being methods of a hidden object). Still doesn't support loading a *fresh*
>> copy due to the hidden PEP 3121 module cache.
>
> Not sure what you mean by that:
>
> >>> import atexit
> >>> id(atexit)
> 140031896222680
> >>> import sys
> >>> del sys.modules['atexit']
> >>> import atexit
> >>> id(atexit)
> 140031896221400

Ah, you're right - I misremembered the exact problem that broke
xml.etree.ElementTree testing. PyModule_GetState is actually fine
(since that pointer is hidden state on the module object), it's only
PyState_FindModule that is broken when you import a second copy. So,
here, when the second import happens, it breaks the original atexit
module's callbacks, even though the two callback registries are
properly isolated:

$ ./python
Python 3.4.0a1+ (default:575071257c92+, Aug 25 2013, 00:42:17)
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import atexit
>>> atexit.register(print, "Hello World!")
<built-in function print>
>>> import sys
>>> del sys.modules["atexit"]
>>> import atexit as atexit2
>>> atexit2.register(print, "Goodbye World!")
<built-in function print>
>>>
Goodbye World!

So I think PEP 3121 is actually as good as we can get on the hidden
state front, but the important point is that it is the
*PyState_FindModule* API that can't handle fresh imports - the second
import will always replace the first one. So anyone affected needs to
find some other way of passing the state, like using bound methods of
a hidden type rather than ordinary callables. If you have to
interoperate with a C API that only accepts a C callback without
allowing additional state arguments, you're going to have trouble.
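The failure mode can be modelled in a few lines of Python. This is a hypothetical sketch, not CPython internals: `_module_by_def` stands in for the per-interpreter table that `PyState_FindModule()` consults, and `load()` for an init function that registers its module there, so the most recent import wins.

```python
import types

# Hypothetical stand-in for the per-interpreter table behind PyState_FindModule():
# one slot per module definition, so a later import replaces the earlier entry.
_module_by_def = {}


class _State:
    """Hidden per-module state (the PEP 3121 "module state" in this model)."""

    def __init__(self):
        self.callbacks = []

    def register(self, func):
        self.callbacks.append(func)
        return func


def load(def_name):
    """Simulate an extension module init that also registers itself globally."""
    state = _State()
    module = types.ModuleType(def_name)
    module.register = state.register   # bound method: carries its own state
    _module_by_def[def_name] = module  # like PyState_AddModule(): last one wins
    return module


def callbacks_via_global_lookup(def_name):
    """What a C callback without a user-data pointer can do: look up "the" module."""
    module = _module_by_def[def_name]
    return [func() for func in module.register.__self__.callbacks]


first = load("atexit_like")
first.register(lambda: "Hello World!")
second = load("atexit_like")
second.register(lambda: "Goodbye World!")
```

Each module keeps its own callback list, but anything that finds its state through the global lookup only ever sees the second copy, which matches the "Goodbye World!"-only output of the session.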

I think atexit serves as a good example, though - that _Py_PyAtExit
call will *always* be destructive (even if you still have a reference
to the original module), so there should be a way for the module to
explicitly indicate to the import system "you can only create this
module once, and then you're committed - unloading it and importing it
again won't work properly due to side effects on the process state".

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Completing the email6 API changes.

2013-09-01 Thread R. David Murray
On Sun, 01 Sep 2013 00:18:59 +0900, "Stephen J. Turnbull"  
wrote:
> R. David Murray writes:
> 
>  > Full validation is something that is currently a "future
>  > objective".
> 
> I didn't mean it to be anything else. :-)
> 
>  > There's infrastructure to do it, but not all of the necessary knowledge
>  > has been coded in yet.
> 
> Well, I assume you already know that there's no way that can ever
> happen (at least until we abandon messaging entirely): new RFCs will
> continue to be published.  So it needs to be an extensible mechanism,
> a "pipeline" of checks (Barry would say a "chain of rules", I think).

My idea was to encode as many of the currently known rules as we have
the stomach for, and to have a validation flag that you turn on if you
want to check your message against those standards.  But without that
flag the code allows you to set arbitrary parameters and headers.

As you say, an extensible mechanism for the validators is a good idea.
So I take it back that the infrastructure is in place, since extensibility
doesn't exist for that feature yet.

--David


Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-01 Thread Antoine Pitrou
On Mon, 2 Sep 2013 00:10:08 +1000
Nick Coghlan  wrote:
> 
> $ ./python
> Python 3.4.0a1+ (default:575071257c92+, Aug 25 2013, 00:42:17)
> [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import atexit
> >>> atexit.register(print, "Hello World!")
> <built-in function print>
> >>> import sys
> >>> del sys.modules["atexit"]
> >>> import atexit as atexit2
> >>> atexit2.register(print, "Goodbye World!")
> <built-in function print>
> >>>
> Goodbye World!

Yeah, atexit is a very particular example, because it interacts with
global state by design (the main interpreter instance), and no amount
of module initialization magic can prevent that :-)

Speaking of which, it also doesn't work (well) with subinterpreters:
http://bugs.python.org/issue18618

Regards

Antoine.


[Python-Dev] 'Subinterpreter' (was Re: Pre-PEP: Redesigning extension modules)

2013-09-01 Thread Terry Reedy



> Speaking of which, it also doesn't work (well) with subinterpreters:


Could someone briefly explain 'subinterpreter' or point me somewhere in 
the docs? It appears throughout this thread but there is no index or 
glossary entry.


--
Terry Jan Reedy



Re: [Python-Dev] 'Subinterpreter' (was Re: Pre-PEP: Redesigning extension modules)

2013-09-01 Thread Antoine Pitrou
On Sun, 01 Sep 2013 16:02:33 -0400
Terry Reedy  wrote:
> 
> > Speaking of which, it also doesn't work (well) with subinterpreters:
> 
> Could someone briefly explain 'subinterpreter' or point me somewhere in 
> the docs? It appears throughout this thread but there is no index or 
> glossary entry.

http://docs.python.org/dev/c-api/init.html#sub-interpreter-support

Subinterpreters are a somewhat borderline feature that allows embedding
applications to host multiple Python programs in a single process.  A
well-known example is mod_wsgi.

Regards

Antoine.




Re: [Python-Dev] cpython (merge 3.3 -> default): Merge fix from 3.3 into default.

2013-09-01 Thread Antoine Pitrou
On Sun,  1 Sep 2013 23:02:17 +0200 (CEST)
tim.peters  wrote:
> http://hg.python.org/cpython/rev/25211a8b
> changeset:   85495:25211a8b
> parent:  85493:267e09700978
> parent:  85494:8efcf3c823f9
> user:Tim Peters 
> date:Sun Sep 01 16:01:46 2013 -0500
> summary:
>   Merge fix from 3.3 into default.
> 
> Fix issue 18889: test_sax: multiple failures on Windows desktop.
> 
> "The fix" is to tell Mercurial that the test files are binary.
> 
> Windows developers:  to get the correct line endings in your checkout,
> delete Lib\test\xmltestdata, and then "hg revert" that directory.
> 
> Why the Windows buildbots didn't fail test_sax remains a mystery :-(

Probably because they don't have the hgeol extension enabled.

Regards

Antoine.




Re: [Python-Dev] 'Subinterpreter' (was Re: Pre-PEP: Redesigning extension modules)

2013-09-01 Thread Stefan Behnel
Antoine Pitrou, 01.09.2013 22:06:
> On Sun, 01 Sep 2013 16:02:33 -0400
> Terry Reedy wrote:
>>> Speaking of which, it also doesn't work (well) with subinterpreters:
>>
>> Could someone briefly explain 'subinterpreter' or point me somewhere in 
>> the docs? It appears throughout this thread but there is no index or 
>> glossary entry.
> 
> http://docs.python.org/dev/c-api/init.html#sub-interpreter-support
> 
> Subinterpreters are a somewhat borderline feature that allows embedding
> applications to host multiple Python programs in a single process.  A
> well-known example is mod_wsgi.

And extension modules usually don't play well with subinterpreters because
each subinterpreter requires its own separate version of the module and
extension modules are rarely designed to keep their state completely local
to an interpreter, let alone prepared for their module init
function being called more than once.

Stefan




Re: [Python-Dev] Completing the email6 API changes.

2013-09-01 Thread R. David Murray
On Sat, 31 Aug 2013 18:57:56 +0900, "Stephen J. Turnbull"  
wrote:
> R. David Murray writes:
> 
>  > But I would certainly appreciate review from anyone so moved, since I
>  > haven't gotten any yet.
> 
> I'll try to make time for a serious (but obviously partial) review by
> Monday.
> 
> I don't know if this is "serious" bikeshedding, but I have a comment
> or two on the example:
> 
>  > from email.message import MIMEMessage
>  > from email.headerregistry import Address
>  > fullmsg = MIMEMessage()
>  > fullmsg['To'] = Address('Foő Bar', '[email protected]')
>  > fullmsg['From'] = "mé¨ "
>  > fullmsg['Subject'] = "j'ai un probléme de python."
> 
> This is very nice!  *I* *love* it.
> 
> But (sorry!) I worry that it's not obvious to "naive" users.  Maybe it
> would be useful to have a Message() factory which has one semantic
> difference from MIMEMessage: it "requires" RFC 822-required headers
> (or optionally RFC 1036 for news).  Eg:
> 
> # This message will be posted and mailed
> # These would conform to the latest Draft Standards
> # and be DKIM-signed
> fullmsg = Message('rfc822', 'rfc1036', 'dmarc')
> 
> I'm not sure how "required" would be implemented (perhaps through a
> .validate() method).  So the signature of the API suggested above is
> Message(*validators, **kw).

Adding new constructor arguments to the existing Message class is
possible.  However, given the new architecture, the more logical way
to do this is to put it in the policy.  So currently the idea would be
for this to be spelled like this:

fullmsg = Message(policy=policy.SMTP+policy.strict)

Then what would happen is that when the message is serialized (be it
via str(), bytes(), by passing it to smtplib.sendmail or
smtplib.sendmessage, or by an explicit call to a Generator), an
error would be raised if the minimum required headers are not
present.

As I said in an earlier message, currently there's no extensibility
mechanism for the validation.  If the parser recognizes a defect, whether
or not an error is raised is controlled by the policy.  But there's
no mechanism for adding new defect checking that the parser doesn't
already know about, or for issues that are not parse-time defects.
(There is currently one non-parsing defect for which there is a custom
control: the maximum number of headers of a given type that are allowed
to be added to a Message object.)

So we need some way to add additional constraints as well.  Probably a
list of validation functions that each take a Message/MIMEPart as their
argument and raise an exception if they want to reject the message.
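A minimal sketch of that idea, assuming validators are plain callables run just before serialization. `require_headers`, `rfc5322_minimum` and `serialize` are hypothetical names, not part of the email package; only `EmailMessage` and `as_string()` are real.

```python
from email.message import EmailMessage


def require_headers(*names):
    """Hypothetical validator factory: reject messages missing any of `names`."""
    def check(msg):
        missing = [name for name in names if name not in msg]
        if missing:
            raise ValueError("missing required headers: " + ", ".join(missing))
    return check


# RFC 5322 makes only the origination date and the originator mandatory.
rfc5322_minimum = require_headers("Date", "From")


def serialize(msg, validators=()):
    """Run every validator, then serialize; any validator may raise to reject."""
    for validate in validators:
        validate(msg)
    return msg.as_string()


msg = EmailMessage()
msg["From"] = "sally@example.com"
```

The checks run only at serialization time, so problems surface late rather than when the message is built.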

The tricky bit is that currently raise_on_defect means you get an error
as soon as a (parsing) defect is discovered.  Likewise, if max_count
is being enforced for headers, the error is raised as soon as the
duplicate header is added.

Generating errors early when building messages was one of our original
design goals, and *only* detecting problems via validators runs counter
to that unless all the validators are called every time an operation
is performed that modifies a message.  Maybe that would be OK, but it
feels ugly.
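Both early-error mechanisms can be seen in current Python 3 releases. A short demonstration, assuming the default (non-`compat32`) policy: the header `max_count` makes a second `To` header fail on assignment, and `raise_on_defect=True` turns a parse-time defect (here a multipart message with no boundary) into an exception.

```python
from email import message_from_string, policy
from email.errors import NoBoundaryInMultipartDefect
from email.message import EmailMessage

# max_count: policy.default allows only one To: header, so the error
# appears as soon as the duplicate is added, not at serialization.
msg = EmailMessage(policy=policy.default)
msg["To"] = "fred@example.com"
duplicate_error = None
try:
    msg["To"] = "sally@example.com"
except ValueError as exc:
    duplicate_error = exc

# raise_on_defect: the parser raises the defect instead of recording it.
strict = policy.default.clone(raise_on_defect=True)
raw = "Content-Type: multipart/mixed\n\nthis multipart has no boundary parameter\n"
parse_error = None
try:
    message_from_string(raw, policy=strict)
except NoBoundaryInMultipartDefect as exc:
    parse_error = exc
```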

For the missing header problem, the custom solution could be to add a
'headers' argument to Message that would allow you to write:

    fullmsg = Message(headers=(
        Header('Date', email.utils.localtime()),
        Header('To', Address('Fred', addr_spec='[email protected]')),
        Header('From', Address('Sally', addr_spec='[email protected]')),
        Header('Subject', 'Foo'),
        ),
        policy=policy.SMTP+policy.strict)

This call could then immediately raise an error if not all of the
required headers are present.  (Header is unfortunately not a good
choice of name here because we already have a 'Header' class that has a
different API).
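Until something like a `headers` argument exists, the same up-front check can live in a small application-level helper. A sketch under that assumption: `build_message` and `REQUIRED` are hypothetical, while `EmailMessage`, `Address` and `email.utils.localtime()` are real.

```python
import email.utils
from email.headerregistry import Address
from email.message import EmailMessage

# Hypothetical application rule: these headers must be supplied up front.
REQUIRED = ("Date", "From", "To", "Subject")


def build_message(headers, body=None):
    """Build an EmailMessage from (name, value) pairs, failing immediately
    if any required header is missing (hypothetical helper)."""
    supplied = {name.lower() for name, _ in headers}
    missing = [name for name in REQUIRED if name.lower() not in supplied]
    if missing:
        raise ValueError("missing required headers: " + ", ".join(missing))
    msg = EmailMessage()
    for name, value in headers:
        msg[name] = value
    if body is not None:
        msg.set_content(body)
    return msg


msg = build_message(
    [
        ("Date", email.utils.localtime()),
        ("To", Address("Fred", "fred", "example.com")),
        ("From", Address("Sally", "sally", "example.com")),
        ("Subject", "Foo"),
    ],
    "This is the text part\n",
)
```

The error is raised before any message object exists, which gives the "fail early" behaviour without running validators on every mutation.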

Aside: I could also imagine adding a 'content' argument that would let
you generate a simple text message via a single call...which means you
could also extend this model to specifying the entire message in a single
call, if you wrote a suitable content manager function for tuples:

    fullmsg = Message(
        policy=policy.SMTP+policy.strict,
        headers=(
            Header('Date', datetime.datetime.now()),
            Header('To', Address('Fred', addr_spec='[email protected]')),
            Header('From', Address('Sally', addr_spec='[email protected]')),
            Header('Subject', 'Foo'),
            ),
        content=(
            (
                'This is the text part',
                (
                    'Here is the html',
                    {'image1': b'image data'},
                    ),
                ),
            b'attachment data',
            ),
        )

But that is probably a little bit crazy...easier to just write a custom
function for your appl

Re: [Python-Dev] [Python-checkins] cpython: Further reduce the cost of hash collisions by inspecting an additional nearby

2013-09-01 Thread Eli Bendersky
On Sat, Aug 31, 2013 at 9:29 PM, raymond.hettinger <
[email protected]> wrote:

> http://hg.python.org/cpython/rev/d40a65658ff0
> changeset:   85486:d40a65658ff0
> parent:  85482:4d604f1f0219
> user:Raymond Hettinger 
> date:Sat Aug 31 21:27:08 2013 -0700
> summary:
>   Further reduce the cost of hash collisions by inspecting an additional
> nearby entry.
>


Hi Raymond,

I'm curious about the benchmarks used to test such micro-optimizations. Do
you use one of the existing Python benchmark suites or do you have a
particular set of micro-benchmarks you're running this on?

Eli




>
> files:
>   Objects/setobject.c |  43 +---
>   1 files changed, 39 insertions(+), 4 deletions(-)
>
>
> diff --git a/Objects/setobject.c b/Objects/setobject.c
> --- a/Objects/setobject.c
> +++ b/Objects/setobject.c
> @@ -65,10 +65,11 @@
>  The initial probe index is computed as hash mod the table size. Subsequent
>  probe indices are computed as explained in Objects/dictobject.c.
>
> -To improve cache locality, each probe is done in pairs.
> -After the probe is examined, an adjacent entry is then examined as well.
> -The likelihood is that an adjacent entry is in the same cache line and
> -can be examined more cheaply than another probe elsewhere in memory.
> +To improve cache locality, each probe inspects nearby entries before
> +moving on to probes elsewhere in memory.  Depending on alignment and the
> +size of a cache line, the nearby entries are cheaper to inspect than
> +other probes elsewhere in memory.  This probe strategy reduces the cost
> +of hash collisions.
>
>  All arithmetic on hash should ignore overflow.
>
> @@ -130,6 +131,26 @@
>          if (entry->key == dummy && freeslot == NULL)
>              freeslot = entry;
>
> +        entry = &table[j ^ 2];
> +        if (entry->key == NULL)
> +            break;
> +        if (entry->key == key)
> +            return entry;
> +        if (entry->hash == hash && entry->key != dummy) {
> +            PyObject *startkey = entry->key;
> +            Py_INCREF(startkey);
> +            cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
> +            Py_DECREF(startkey);
> +            if (cmp < 0)
> +                return NULL;
> +            if (table != so->table || entry->key != startkey)
> +                return set_lookkey(so, key, hash);
> +            if (cmp > 0)
> +                return entry;
> +        }
> +        if (entry->key == dummy && freeslot == NULL)
> +            freeslot = entry;
> +
>          i = i * 5 + perturb + 1;
>          j = i & mask;
>          perturb >>= PERTURB_SHIFT;
> @@ -190,6 +211,17 @@
>          if (entry->key == dummy && freeslot == NULL)
>              freeslot = entry;
>
> +        entry = &table[j ^ 2];
> +        if (entry->key == NULL)
> +            break;
> +        if (entry->key == key
> +            || (entry->hash == hash
> +            && entry->key != dummy
> +            && unicode_eq(entry->key, key)))
> +            return entry;
> +        if (entry->key == dummy && freeslot == NULL)
> +            freeslot = entry;
> +
>          i = i * 5 + perturb + 1;
>          j = i & mask;
>          perturb >>= PERTURB_SHIFT;
> @@ -258,6 +290,9 @@
>          entry = &table[j ^ 1];
>          if (entry->key == NULL)
>              break;
> +        entry = &table[j ^ 2];
> +        if (entry->key == NULL)
> +            break;
>          i = i * 5 + perturb + 1;
>          j = i & mask;
>          perturb >>= PERTURB_SHIFT;
>
> --
> Repository URL: http://hg.python.org/cpython
>
> ___
> Python-checkins mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/python-checkins
>
>
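To make the probe order concrete, here is a small Python model of the loop in the quoted diff. It is a simplification under stated assumptions: the first index is the hash masked to the table size (as the quoted comment says), `PERTURB_SHIFT` is 5 as in CPython's set and dict code, values are masked to 64 bits to mimic unsigned wrap-around, and it only lists which slots are inspected, ignoring key comparisons, dummy entries and early exits. `probe_groups` is a hypothetical name.

```python
PERTURB_SHIFT = 5            # value used by CPython's setobject.c/dictobject.c
MASK64 = (1 << 64) - 1       # emulate size_t wrap-around on a 64-bit build


def probe_groups(hash_value, mask, rounds):
    """Return the slot indices inspected in the first `rounds` iterations.

    Each iteration looks at slot j, then at the neighbours j ^ 1 and j ^ 2,
    which usually share a cache line, before jumping elsewhere.
    """
    i = hash_value & MASK64
    perturb = hash_value & MASK64
    j = i & mask
    groups = []
    for _ in range(rounds):
        groups.append((j, j ^ 1, j ^ 2))
        i = (i * 5 + perturb + 1) & MASK64
        j = i & mask
        perturb >>= PERTURB_SHIFT
    return groups


# An 8-slot table (mask = 7), the smallest size a CPython set uses.
print(probe_groups(10, 7, 2))   # [(2, 3, 0), (5, 4, 7)]
```

Because the table size is a power of two of at least 8, `j ^ 1` and `j ^ 2` always stay inside the table, so the extra probes need no bounds check.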


Re: [Python-Dev] [Python-checkins] cpython: Issue #11798: fix tests for regrtest -R :

2013-09-01 Thread Eli Bendersky
On Sat, Aug 31, 2013 at 9:58 PM, andrew.svetlov
wrote:

> http://hg.python.org/cpython/rev/39781c3737f8
> changeset:   85490:39781c3737f8
> user:Andrew Svetlov 
> date:Sun Sep 01 07:58:41 2013 +0300
> summary:
>   Issue #11798: fix tests for regrtest -R :
>
> files:
>   Lib/test/regrtest.py|  5 +
>   Lib/unittest/suite.py   |  8 ++--
>   Lib/unittest/test/test_suite.py |  8 
>   3 files changed, 19 insertions(+), 2 deletions(-)
>
>
>
Hi Andrew,

It would help if you could add more details into the commit message. This
would make both post-commit reviews and future code archeology simpler.

Eli



> diff --git a/Lib/test/regrtest.py b/Lib/test/regrtest.py
> --- a/Lib/test/regrtest.py
> +++ b/Lib/test/regrtest.py
> @@ -496,6 +496,8 @@
>
>      if ns.slaveargs is not None:
>          args, kwargs = json.loads(ns.slaveargs)
> +        if kwargs.get('huntrleaks'):
> +            unittest.BaseTestSuite._cleanup = False
>          try:
>              result = runtest(*args, **kwargs)
>          except KeyboardInterrupt:
> @@ -528,6 +530,9 @@
>      #gc.set_debug(gc.DEBUG_SAVEALL)
>      found_garbage = []
>
> +    if ns.huntrleaks:
> +        unittest.BaseTestSuite._cleanup = False
> +
>      if ns.single:
>          filename = os.path.join(TEMPDIR, 'pynexttest')
>          try:
> diff --git a/Lib/unittest/suite.py b/Lib/unittest/suite.py
> --- a/Lib/unittest/suite.py
> +++ b/Lib/unittest/suite.py
> @@ -16,6 +16,8 @@
>  class BaseTestSuite(object):
>      """A simple test suite that doesn't provide class or module shared fixtures.
>      """
> +    _cleanup = True
> +
>      def __init__(self, tests=()):
>          self._tests = []
>          self.addTests(tests)
> @@ -61,7 +63,8 @@
>              if result.shouldStop:
>                  break
>              test(result)
> -            self._removeTestAtIndex(index)
> +            if self._cleanup:
> +                self._removeTestAtIndex(index)
>          return result
>
>      def _removeTestAtIndex(self, index):
> @@ -115,7 +118,8 @@
>              else:
>                  test.debug()
>
> -            self._removeTestAtIndex(index)
> +            if self._cleanup:
> +                self._removeTestAtIndex(index)
>
>          if topLevel:
>              self._tearDownPreviousClass(None, result)
> diff --git a/Lib/unittest/test/test_suite.py b/Lib/unittest/test/test_suite.py
> --- a/Lib/unittest/test/test_suite.py
> +++ b/Lib/unittest/test/test_suite.py
> @@ -303,6 +303,9 @@
>          suite.run(unittest.TestResult())
>
>      def test_remove_test_at_index(self):
> +        if not unittest.BaseTestSuite._cleanup:
> +            raise unittest.SkipTest("Suite cleanup is disabled")
> +
>          suite = unittest.TestSuite()
>
>          suite._tests = [1, 2, 3]
> @@ -311,6 +314,9 @@
>          self.assertEqual([1, None, 3], suite._tests)
>
>      def test_remove_test_at_index_not_indexable(self):
> +        if not unittest.BaseTestSuite._cleanup:
> +            raise unittest.SkipTest("Suite cleanup is disabled")
> +
>          suite = unittest.TestSuite()
>          suite._tests = None
>
> @@ -318,6 +324,8 @@
>          suite._removeTestAtIndex(2)
>
>      def assert_garbage_collect_test_after_run(self, TestSuiteClass):
> +        if not unittest.BaseTestSuite._cleanup:
> +            raise unittest.SkipTest("Suite cleanup is disabled")
>
>          class Foo(unittest.TestCase):
>              def test_nothing(self):
>


Re: [Python-Dev] [Python-checkins] cpython: Update whatsnew/3.4.rst wrt. the socket constants switch to IntEnum

2013-09-01 Thread Eli Bendersky
On Sat, Aug 31, 2013 at 4:52 PM, Terry Reedy  wrote:

> On 8/31/2013 6:19 PM, eli.bendersky wrote:
>
>> http://hg.python.org/cpython/rev/4d604f1f0219
>> changeset:   85482:4d604f1f0219
>> user:Eli Bendersky 
>> date:Sat Aug 31 15:18:48 2013 -0700
>> summary:
>>Update whatsnew/3.4.rst wrt. the socket constants switch to IntEnum
>>
>> [issue #18730]
>>
>
>
> Wrong issue number I think.

Oops, yes. Sorry about that.

Eli


Re: [Python-Dev] Completing the email6 API changes.

2013-09-01 Thread Glenn Linderman

On 9/1/2013 3:10 PM, R. David Murray wrote:

> This doesn't work, though, because you could (although you usually
> won't) have more than one 'text/html' part in a single multipart.


I was traveling and your original message is still unread in my queue of 
"things to look at later" :(  I haven't caught up with old stuff yet, 
but am trying to stay current on current stuff...


The quoted issue was mentioned in another message in this thread, though 
in different terms.


I recall being surprised when first seeing messages generated by Apple 
Mail software, that are multipart/related, having a sequence of 
intermixed text/plain and image/jpeg parts. This is apparently how Apple 
Mail generates messages that have inline pictures, without resorting to 
use of HTML mail. Other email clients handle this relatively better or 
worse, depending on the expectations of their authors! Several of them 
treat all the parts after the initial text/html part as attachments; 
some of them display inline attachments if they are text/html or 
image/jpeg and others do not. I can't say for sure if there are other 
ways they are treated; I rather imagine that Apple Mail displays the 
whole message with interspersed pictures quite effectively, without 
annoying the user with attachment "markup", but I'm not an Apple Mail 
user so I couldn't say for sure.


You should, of course, ensure that it is possible to create such a message.

Whether Apple Mail does that with other embedded image/* formats, or 
with other text/* formats, or other non-image, non-text formats, I 
couldn't say. I did attempt to determine if it was non-standard usage: 
it is certainly non-common usage, but I found nothing in the email/MIME 
RFCs that precludes such usage.


Re: [Python-Dev] cpython (merge 3.3 -> default): Merge fix from 3.3 into default.

2013-09-01 Thread Terry Reedy

On 9/1/2013 5:04 PM, Antoine Pitrou wrote:

On Sun,  1 Sep 2013 23:02:17 +0200 (CEST)
tim.peters  wrote:



Windows developers:  to get the correct line endings in your checkout,
delete Lib\test\xmltestdata, and then "hg revert" that directory.


Or, in TortoiseHg Workbench, select the four Working Directory patch
entries, right click, and revert.



Why the Windows buildbots didn't fail test_sax remains a mystery :-(


Probably because they don't have the hgeol extension enabled.


Since the tests also failed on installed Python, it seems that the .msi 
installer is created in a repository with the extension enabled. If so, 
I think that the buildbots should have the extension enabled also, so 
that they are testing Python as it will be distributed.


Or we could stop supporting Notepad and change what we distribute ;-).

--
Terry Jan Reedy



Re: [Python-Dev] 'Subinterpreter' (was Re: Pre-PEP: Redesigning extension modules)

2013-09-01 Thread Terry Reedy

On 9/1/2013 5:13 PM, Stefan Behnel wrote:

Antoine Pitrou, 01.09.2013 22:06:

On Sun, 01 Sep 2013 16:02:33 -0400
Terry Reedy wrote:

Speaking of which, it also doesn't work (well) with subinterpreters:


Could someone briefly explain 'subinterpreter' or point me somewhere in
the docs? It appears throughout this thread but there is no index or
glossary entry.


http://docs.python.org/dev/c-api/init.html#sub-interpreter-support


So cpython specific.


Subinterpreters are a somewhat borderline feature that allows embedding
applications to host multiple Python programs in a single process.  A
well-known example is mod_wsgi.


Thank you for both the link *and* the explanatory example, which is just
what I needed to make the past discussion more intelligible. I imagine
that wsgi uses a sub-interpreter for each user connection.



And extension modules usually don't play well with subinterpreters because
each subinterpreter requires its own separate version of the module and
extension modules are rarely designed to keep their state completely local
to an interpreter, let alone being prepared for having their module init
function be called more than once.


I can see now why this is a bit of a 'hair-puller';-).

--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] cpython: Update whatsnew/3.4.rst wrt. the socket constants switch to IntEnum

2013-09-01 Thread Terry Reedy

On 9/1/2013 6:59 PM, Eli Bendersky wrote:

> On Sat, Aug 31, 2013 at 4:52 PM, Terry Reedy wrote:
>
>> On 8/31/2013 6:19 PM, eli.bendersky wrote:
>>
>>> [issue #18730]
>>
>> Wrong issue number I think.
>
> Oops, yes. Sorry about that.


I unlinked it from 18730.


--
Terry Jan Reedy



[Python-Dev] SEEK_* constants in io and os

2013-09-01 Thread Eli Bendersky
Hello,

I was looking at the possibility of replacing the SEEK_* constants by
IntEnums, and the first thing that catches attention is that these
constants are defined in both Lib/os.py and Lib/io.py; both places also
recently started supporting SEEK_HOLE and SEEK_DATA (though here io refers
to os.SEEK_HOLE and os.SEEK_DATA).

Additional data points: other modules take these constants as arguments -
* mmap: directs to use os.SEEK_*
* chunk and fcntl: spell out the numeric values.

os seems to import io in some functions; can this be done always? If yes,
we can just define the constants once and os.SEEK_* will alias io.SEEK_*?
The other way (io taking from os) is also a possibility (maybe the
preferred one because io already refers to os.SEEK_HOLE/DATA, at least in
the documentation).
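A minimal sketch of what a single IntEnum-based definition could look like, re-exported under the familiar module-level names. The class name `Whence` is hypothetical (neither os nor io defines it); the sketch relies only on the documented values 0, 1 and 2 for SEEK_SET, SEEK_CUR and SEEK_END.

```python
import io
import os
from enum import IntEnum


class Whence(IntEnum):
    """Hypothetical IntEnum for the 'whence' argument of seek() (sketch only)."""
    SEEK_SET = 0  # offset is relative to the start of the stream
    SEEK_CUR = 1  # offset is relative to the current position
    SEEK_END = 2  # offset is relative to the end of the stream


# One definition, re-exported under the existing names.
SEEK_SET, SEEK_CUR, SEEK_END = Whence.SEEK_SET, Whence.SEEK_CUR, Whence.SEEK_END

# IntEnum members are ints, so they compare equal to today's constants...
assert SEEK_END == os.SEEK_END == io.SEEK_END == 2

# ...and can be passed anywhere an int 'whence' is accepted.
buf = io.BytesIO(b"hello world")
print(buf.seek(-5, SEEK_END))   # 6
print(buf.read())               # b'world'
print(repr(SEEK_END))           # <Whence.SEEK_END: 2>
```

Whichever module ends up owning the definition, the other could simply import these names, so `os.SEEK_END is io.SEEK_END` would hold.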

Any ideas and suggestions are welcome,

Eli


Re: [Python-Dev] cpython (merge 3.3 -> default): Merge fix from 3.3 into default.

2013-09-01 Thread David Bolen
Terry Reedy  writes:

> On 9/1/2013 5:04 PM, Antoine Pitrou wrote:
>> Probably because they don't have the hgeol extension enabled.

Yes, I believe that's correct, at least for my Windows buildbots.

> Since the tests also failed on installed Python, it seems that the
> .msi installer is created in a repository with the extension
> enabled. If so, I think that the buildbots should have the extension
> enabled also, so that they are testing Python as it will be
> distributed.

If it should be enabled, I believe it will have to be done globally,
so affecting all buildbot branches (not that it sounds like that would
be an issue).  The individual branch repositories are physically
removed at times (such as during a failure to update, which escalates
to a removal followed by full clone, as well as I believe during a
custom build), so I don't believe persistent local changes to the
repository's own hgrc are possible in a reliable way.

In the event we did want per-repository control, then I think we'd
need to have some sort of additional script that ran as part of the
buildbot build process to ensure the right contents for the local
repository hgrc following a clone, or just prior to the update step.
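A minimal sketch of such a post-clone step, assuming the usual way of enabling the bundled extension (an `eol =` entry under `[extensions]` in the repository's `.hg/hgrc`). The helper name `ensure_eol_enabled` is hypothetical, and the section scan is deliberately crude: it does not understand `%include` or other Mercurial-specific config syntax.

```python
import os
import tempfile


def ensure_eol_enabled(repo_root):
    """Append "[extensions] eol =" to <repo_root>/.hg/hgrc unless already enabled.

    Hypothetical helper for a buildbot step run after a fresh clone and
    before "hg update". Returns True if the file was modified.
    """
    hgrc = os.path.join(repo_root, ".hg", "hgrc")
    try:
        with open(hgrc) as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []

    # Track the current [section] and look for an eol entry in [extensions].
    section = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section == "extensions" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            # A value starting with "!" disables an extension, so it doesn't count.
            if key in ("eol", "hgext.eol") and not value.startswith("!"):
                return False

    # Append a fresh [extensions] section; Mercurial merges repeated
    # section headers, and a later entry overrides an earlier one.
    with open(hgrc, "a") as f:
        f.write("\n[extensions]\neol =\n")
    return True


# Usage against a throwaway directory standing in for a fresh clone.
with tempfile.TemporaryDirectory() as repo:
    os.mkdir(os.path.join(repo, ".hg"))
    print(ensure_eol_enabled(repo))   # True: entry appended
    print(ensure_eol_enabled(repo))   # False: already present
```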

-- David



Re: [Python-Dev] [Python-checkins] cpython: Issue #11798: fix tests for regrtest -R :

2013-09-01 Thread Andrew Svetlov
regrtest -R runs test suites several times. That's why test cleanup
should be disabled for this case.
Details are discussed in the issue.
I'll write more descriptive commit messages next time.
Thanks.
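A minimal sketch of the two behaviours, toggling the private `_cleanup` flag added by this changeset on a throwaway `TestSuite` subclass so that `unittest.BaseTestSuite` itself is left untouched. `_cleanup` and `_tests` are private unittest details, used here only for illustration; `Dummy` and `run_suite` are hypothetical names.

```python
import unittest


class Dummy(unittest.TestCase):
    def test_nothing(self):
        pass


def run_suite(cleanup, times):
    """Run a one-test suite `times` times; return (testsRun, suite slots)."""
    class Suite(unittest.TestSuite):
        _cleanup = cleanup  # overrides the class-level default (True)

    suite = Suite([Dummy("test_nothing")])
    result = unittest.TestResult()
    for _ in range(times):
        suite.run(result)
    return result.testsRun, list(suite._tests)


# Default: after a test runs, its slot is replaced by None so the TestCase
# (and whatever it references) can be garbage-collected.
runs, slots = run_suite(cleanup=True, times=1)
print(runs, slots)                  # 1 [None]

# What regrtest -R needs: the tests stay in place and can run again.
runs, slots = run_suite(cleanup=False, times=3)
print(runs, slots[0] is not None)   # 3 True
```

With cleanup enabled, a second `run()` would find `None` where the test used to be, which is why the repeated runs of `-R` have to switch cleanup off.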


On Mon, Sep 2, 2013 at 1:58 AM, Eli Bendersky  wrote:
>
>
>
> On Sat, Aug 31, 2013 at 9:58 PM, andrew.svetlov 
> wrote:
>>
>> http://hg.python.org/cpython/rev/39781c3737f8
>> changeset:   85490:39781c3737f8
>> user:Andrew Svetlov 
>> date:Sun Sep 01 07:58:41 2013 +0300
>> summary:
>>   Issue #11798: fix tests for regrtest -R :
>>
>> files:
>>   Lib/test/regrtest.py|  5 +
>>   Lib/unittest/suite.py   |  8 ++--
>>   Lib/unittest/test/test_suite.py |  8 
>>   3 files changed, 19 insertions(+), 2 deletions(-)
>>
>>
>
> Hi Andrew,
>
> It would help if you could add more details into the commit message. This
> would make both post-commit reviews and future code archeology simpler.
>
> Eli
>



-- 
Thanks,
Andrew Svetlov


Re: [Python-Dev] Completing the email6 API changes.

2013-09-01 Thread Stephen J. Turnbull
This is getting off-topic IMO; we should probably take this thread to
email-sig.

Glenn Linderman writes:

 > I recall being surprised when first seeing messages generated by
 > Apple Mail software, that are multipart/related, having a sequence
 > of intermixed text/plain and image/jpeg parts. This is apparently
 > how Apple Mail generates messages that have inline pictures,
 > without resorting to use of HTML mail.

(Are you sure you mean "text/plain" above?  I've not seen this form of
message.  And you mention only "text/html" below.)

This practice (like my suggestion) is based on the conjecture that
MUAs that implement multipart/related will treat it as multipart/mixed
if the "main" subpart isn't known to implement links to external
entities.

 > Other email clients handle this relatively better or worse,
 > depending on the expectations of their authors!

Sure.  After all, this is a world in which some MUAs have a long
history of happily executing virus executables.

 > I did attempt to determine if it was non-standard usage: it is
 > certainly non-common usage, but I found nothing in the email/MIME
 > RFCs that precludes such usage.

Clearly RFCs 2046 and 2387 envision a fallback to multipart/mixed, but
are silent on how to do it for MUAs that implement multipart/related.
RFC 2387 says:

MIME User Agents that do recognize Multipart/Related entities but
are unable to process the given type should give the user the
option of suppressing the entire Multipart/Related body part shall
be. [...]  Handling Multipart/Related differs [from handling of
existing composite subtypes] in that processing cannot be reduced
to handling the individual entities.

I think that the sane policy is that when processing multipart/related
internally, the MUA should treat the whole as multipart/mixed, unless
it knows how links are implemented in the "start" part.  But the RFC
doesn't say that.

 > Several of them treat all the parts after the initial text/html
 > part as attachments;

They don't implement RFC 2387 (promoted to draft standard in 1998,
following two others, the earlier being RFC 1872 from 1995).  Too bad
for their users.  But what I'm worried about is a different issue,
which is how to ensure that multipart/alternative messages present all
relevant content entities in both presentations.  For example, the
following hypothetical structure is efficient:

    multipart/alternative
        text/plain
        multipart/related
            text/html
            application/x-opentype-font

because the text/plain can't use the font.  But this

    multipart/alternative
        text/plain
        multipart/related
            text/html
            image/png
            image/png

often costs the text/plain receiver a view of the images, and I don't
see any way to distinguish the two cases.  (The images might be
character glyphs, for example, effectively a "poor man's font".)
OTOH, if the message is structured

    multipart/related
        multipart/alternative
            text/plain
            text/html
        image/png
        image/png

the receiver can infer that the images are related to both text/*
parts and DTRT for each.
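To make this concrete, here is a minimal sketch that builds the first and third structures above with the EmailMessage API (Python 3.6+). It uses placeholder bytes rather than real PNGs and made-up Content-IDs. The add_alternative()/add_related() helpers produce the alternative-outside shape directly. The related-outside shape is assembled here by hand with make_related() and attach().

```python
from email.message import EmailMessage

# Placeholder bytes standing in for real PNG data (not a valid image).
PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16


def content_types(msg):
    """Return the Content-Type of every part, depth-first."""
    return [part.get_content_type() for part in msg.walk()]


# Shape 1: alternative[ text/plain, related[ text/html, image/png ] ]
alt_outside = EmailMessage()
alt_outside.set_content("Plain-text version; the picture is not shown here.")
alt_outside.add_alternative(
    '<p>HTML version: <img src="cid:pic1@example.invalid"></p>', subtype="html")
html_part = alt_outside.get_payload()[1]   # the text/html alternative
html_part.add_related(PNG, "image", "png", cid="<pic1@example.invalid>")

# Shape 3: related[ alternative[ text/plain, text/html ], image/png, image/png ]
related_outside = EmailMessage()
related_outside.make_related()             # empty multipart/related container
alternatives = EmailMessage()
alternatives.set_content("Plain-text version.")
alternatives.add_alternative("<p>HTML version</p>", subtype="html")
related_outside.attach(alternatives)
for n in (1, 2):
    related_outside.add_related(PNG, "image", "png", cid=f"<pic{n}@example.invalid>")

print(content_types(alt_outside))
# ['multipart/alternative', 'text/plain', 'multipart/related', 'text/html', 'image/png']
print(content_types(related_outside))
# ['multipart/related', 'multipart/alternative', 'text/plain', 'text/html', 'image/png', 'image/png']
```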

Steve



Re: [Python-Dev] Completing the email6 API changes.

2013-09-01 Thread Glenn Linderman

On 9/1/2013 8:03 PM, Stephen J. Turnbull wrote:

> This is getting off-topic IMO; we should probably take this thread to
> email-sig.


Probably, but you didn't :)


> Glenn Linderman writes:
>
>   > I recall being surprised when first seeing messages generated by
>   > Apple Mail software, that are multipart/related, having a sequence
>   > of intermixed text/plain and image/jpeg parts. This is apparently
>   > how Apple Mail generates messages that have inline pictures,
>   > without resorting to use of HTML mail.
>
> (Are you sure you mean "text/plain" above?  I've not seen this form of
> message.  And you mention only "text/html" below.)


Yes, I'm sure it was text/plain. I may be able to access the archived
discussion from a non-Python mailing list about it, to verify, if that
becomes important.  But now that you mention multipart/mixed, I'm not
sure if it was multipart/related or multipart/mixed for the grouping
MIME part. Perhaps someone with Apple Mail could produce one... probably
by composing a message as text/plain, and dragging in a picture or two.


The other references to text/html were in error.


> This practice (like my suggestion) is based on the conjecture that
> MUAs that implement multipart/related will treat it as multipart/mixed
> if the "main" subpart isn't known to implement links to external
> entities.
>
>   > Other email clients handle this relatively better or worse,
>   > depending on the expectations of their authors!
>
> Sure.  After all, this is a world in which some MUAs have a long
> history of happily executing virus executables.
>
>   > I did attempt to determine if it was non-standard usage: it is
>   > certainly non-common usage, but I found nothing in the email/MIME
>   > RFCs that precludes such usage.
>
> Clearly RFCs 2046 and 2387 envision a fallback to multipart/mixed, but
> are silent on how to do it for MUAs that implement multipart/related.
> RFC 2387 says:
>
>     MIME User Agents that do recognize Multipart/Related entities but
>     are unable to process the given type should give the user the
>     option of suppressing the entire Multipart/Related body part shall
>     be. [...]  Handling Multipart/Related differs [from handling of
>     existing composite subtypes] in that processing cannot be reduced
>     to handling the individual entities.
>
> I think that the sane policy is that when processing multipart/related
> internally, the MUA should treat the whole as multipart/mixed, unless
> it knows how links are implemented in the "start" part.  But the RFC
> doesn't say that.
>
>   > Several of them treat all the parts after the initial text/html
>   > part as attachments;
>
> They don't implement RFC 2387 (promoted to draft standard in 1998,
> following two others, the earlier being RFC 1872 from 1995).  Too bad
> for their users.


Correct... but since the MUA receiving the Apple Mail message I was talking
about is a text-mostly MUA, that is probably a reasonable way of
handling them.



> But what I'm worried about is a different issue,
> which is how to ensure that multipart/alternative messages present all
> relevant content entities in both presentations.  For example, the
> following hypothetical structure is efficient:
>
>     multipart/alternative
>         text/plain
>         multipart/related
>             text/html
>             application/x-opentype-font
>
> because the text/plain can't use the font.  But this
>
>     multipart/alternative
>         text/plain
>         multipart/related
>             text/html
>             image/png
>             image/png
>
> often costs the text/plain receiver a view of the images, and I don't
> see any way to distinguish the two cases.  (The images might be
> character glyphs, for example, effectively a "poor man's font".)


Yes, that issue is handled by some text MUAs by showing the image/png (or
anything in such a position) as attachments. Again, being text-mostly,
that might be a reasonable way of handling them. Perhaps the standard
says they should be ignored when displaying the text/plain alternative.



> OTOH, if the message is structured
>
>     multipart/related
>         multipart/alternative
>             text/plain
>             text/html
>         image/png
>         image/png
>
> the receiver can infer that the images are related to both text/*
> parts and DTRT for each.

With the images being treated as attachments. Or is there a syntax to 
allow the text/html to embed the images and the text/plain to see them 
as attachments?  I think the text/html wants to refer to things within 
its containing multipart/related, but am not sure if that allows the 
intervening multipart/alternative.