Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-22 Thread Tony Nelson
On 10-01-22 02:53:21, Collin Winter wrote:
> On Thu, Jan 21, 2010 at 11:37 PM, Glyph Lefkowitz
>  wrote:
> >
> > On Jan 21, 2010, at 6:48 PM, Collin Winter wrote:
 ...
> > There's been a recent thread on our mailing list about a patch that
> > dramatically reduces the memory footprint of multiprocess
> > concurrency by separating reference counts from objects. ...

Currently, CPython gets a performance advantage from having reference 
counts hot in the cache when the referenced object is used.  There is 
still the write pressure from updating the counts.  With separate 
reference counts, an extra cache line must be loaded from memory (it is 
unlikely to be in the cache unless the program is trivial).  I see from 
the referenced posting that this is a 10% speed hit (the poster 
attributes the hit to extra instructions).

Perhaps the speed and memory hits could be minimized by only doing this 
for some objects?  Only objects that are fully shared (such as read-only 
data) benefit from this change.  I don't know, but shared objects 
may already be treated separately.

 ...
> The data I've seen comes from
> http://groups.google.com/group/comp.lang.python/msg/c18b671f2c4fef9e:
 ...

-- 

TonyN.:'   
  '  

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 376 : Changing the .egg-info structure

2009-05-15 Thread Tony Nelson
At 13:52 -0400 05/15/2009, P.J. Eby wrote:
>At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote:
>>Agreed. Within FreeBSD's ports the installed package registration
>>gets a MD5 hash per file recorded. Size is less interesting though,
>>since essentially this information is encapsulated within the hash.
>>Remove one byte from the file and your hash is already different.
>
>Which also means that in that case you can skip computing the
>MD5.  The size allows you to easily notice an overwrite/corruption
>without further processing.

In most cases the files will actually match, so the sizes and dates will be
the same and the checksum must be computed to verify the match.

RPM does this when asked to Verify a package.  It is faster than Removing a
package, and Verifying all installed packages takes a reasonable amount of
time.  I don't think Python would be any worse at verifying its own
packages, and it would normally have less data to verify, so it should be
fast enough.
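
A minimal sketch of the size-then-hash check discussed above, assuming each
installed file has a recorded size and MD5.  The names verify_file and
file_md5 are hypothetical, not part of any PEP 376 API:

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_size, expected_md5):
    """Check one installed file against its recorded size and hash.

    Returns "missing", "size-mismatch", "hash-mismatch", or "ok".
    """
    if not os.path.exists(path):
        return "missing"
    # Cheap check first: a different size already proves the file changed.
    if os.path.getsize(path) != expected_size:
        return "size-mismatch"
    # Sizes match (the common case), so the hash must still be computed.
    if file_md5(path) != expected_md5:
        return "hash-mismatch"
    return "ok"
```

The size test only saves work for files that have changed; for an intact
install every file still gets hashed, which is the cost being weighed here.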
-- 

TonyN.:'   
  '  


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Tony Nelson
At 16:09 + 04/27/2009, Antoine Pitrou wrote:
>Stephen J. Turnbull  xemacs.org> writes:
>>
>> I hate to break it to you, but most stages of mail processing have
>> very little to do with SMTP.  In particular, processing MIME
>> attachments often requires dealing with file names.
>
>AFAIK, the file name is only there as an indication for the user when he wants
>to save the file. If it's garbled a bit, no big deal.
 ...

Yep.  In fact, it should be cleaned carefully.  RFC 2183, 2.3:

"It is important that the receiving MUA not blindly use the suggested
filename.  The suggested filename SHOULD be checked (and possibly
changed) to see that it conforms to local filesystem conventions,
does not overwrite an existing file, and does not present a security
problem (see Security Considerations below).

The receiving MUA SHOULD NOT respect any directory path information
that may seem to be present in the filename parameter.  The filename
should be treated as a terminal component only.  Portable
specification of directory paths might possibly be done in the future
via a separate Content Disposition parameter, but no provision is
made for it in this draft."
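
One way the quoted advice might be applied, as a sketch only.  The function
name safe_attachment_name and the conservative character set are my own
assumptions; the RFC does not prescribe either:

```python
import os
import re


def safe_attachment_name(suggested, directory, default="attachment"):
    """Turn a suggested attachment filename into a safe local path."""
    # Keep only the terminal component, whichever separator was used.
    name = re.split(r"[\\/]", suggested or "")[-1]
    # Replace anything outside a conservative set, and drop leading dots
    # so the result is never ".", "..", or a hidden file.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    if not name:
        name = default
    # Never overwrite: add a numeric suffix until the name is free.
    root, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, "%s-%d%s" % (root, counter, ext))
        counter += 1
    return candidate
```

The existence check and the later file creation are not atomic, so a real
MUA would probably want to open the file with exclusive-create semantics
as well.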

-- 

TonyN.:'   
  '  


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Tony Nelson
At 23:39 -0700 04/26/2009, Glenn Linderman wrote:
>On approximately 4/25/2009 5:35 AM, came the following characters from
>the keyboard of Martin v. Löwis:
>>> Because the encoding is not reliably reversible.
>>
>> Why do you say that? The encoding is completely reversible
>> (unless we disagree on what "reversible" means).
>>
>>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
>>> reversible encoding.
>>
>> Then please provide an example for a setup where it is not reversible.
>>
>> Regards,
>> Martin
>
>It is reversible if you know that it is decoded, and apply the encoding.
>  But if you don't know that has been encoded, then applying the reverse
>transform can convert an undecoded str that matches the decoded str to
>the form that it could have, but never did take.
>
>The problem is that there is no guarantee that the str interface
>provides only strictly conforming Unicode, so decoding bytes to
>non-strictly conforming Unicode, can result in a data pun between
>non-strictly conforming Unicode coming from the str interface vs bytes
>being decoded to non-strictly conforming Unicode coming from the bytes
>interface.
 ...

Maybe this is a dumb idea, but some people might be reassured if the
half-surrogates had some particular pattern that is unlikely to occur even
in unreasonable text (as half-surrogates are an error in Unicode).  The
pattern could be some sequence of half-surrogate encoded bytes, framing the
intended data, as is done for RFC 2047 internationalized header fields in
email.  It would take up a few more bytes in the string, but no matter.  It
would also make it easier to diagnose when decoding was not properly done.
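
For reference, a sketch of the round trip and of the "data pun" being
debated.  It assumes the error handler name "surrogateescape", which is how
the PEP's scheme later shipped in Python 3.1; that name postdates this
thread.  It shows the unframed scheme, not the framing pattern suggested
above:

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Decoding smuggles the undecodable byte 0xE9 in as the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9.txt"

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw

# The "data pun": a str that held U+DCE9 for some other reason is
# indistinguishable from one produced by decoding bytes.
pun = "caf" + chr(0xDCE9) + ".txt"
assert pun == name

# A strict encode refuses the lone surrogate, which helps diagnosis.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```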

FWIW, I like the idea in the PEP, now that I think I understand it.

(BTW, gotta love what the email package is doing to the Subject: header
field. ;-')
-- 

TonyN.:'   
  '  


Re: [Python-Dev] #!/usr/bin/env python --> python3 where applicable

2009-04-18 Thread Tony Nelson
At 20:51 -0700 04/18/2009, Steven Bethard wrote:
>On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson 
>wrote:
>> 2009/4/18 Nick Coghlan :
>>> I see a few options:
>>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>>> "python3" now and forever (i.e. actually make the decision that Mitchell
>>> refers to).
>>
>> I believe this was decided on sometime (the sprints?).
>
>That's an unfortunate decision. When the 2.X line stops being
>maintained (after 2.7 maybe?) we're going to be stuck with the "3"
>suffix forever for the "real" Python.
>
>Why doesn't it make more sense to just use "python3" only for
>"altinstall" and "python" for "fullinstall"?

Just use python3 in the shebang lines all the time (where applicable ;), as
it is made by both altinstall and fullinstall.  fullinstall also makes plain
"python", but that is not important.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Needing help to change the grammar

2009-04-12 Thread Tony Nelson
At 16:30 -0400 04/12/2009, Terry Reedy wrote:
 ...
>  Source in .pyb (python-brazil) is parsed with your new parser,
 ...

In case anyone ever does this again, I suggest that the extension be the
language and optionally country code:

.py_pt  or  .py_pt_BR
-- 

TonyN.:'   
  '  


Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-09 Thread Tony Nelson
At 22:26 -0400 04/09/2009, Barry Warsaw wrote:

>There are really two ways to look at an email message.  It's either an
>unstructured blob of bytes, or it's a structured tree of objects.
>Those objects have headers and payload.  The payload can be of any
>type, though I think it generally breaks down into "strings" for text/
>* types and bytes for anything else (not counting multiparts).
>
>The email package isn't a perfect mapping to this, which is something
>I want to improve.  That aside, I think storing a message in a
>database means storing some or all of the headers separately from the
>byte stream (or text?) of its payload.  That's for non-multipart
>types.  It would be more complicated to represent a message tree of
>course.

Storing an email message in a database does mean storing some of the header
fields as database fields, but the set of email header fields is open, so
any "unused" fields in a message must be stored elsewhere.  It isn't useful
to just have a bag of name/value pairs in a table.  General message MIME
payload trees don't map well to a database either, unless one wants to get
very relational.  Sometimes the database needs to represent the entire
email message, header fields and MIME tree, but only if it is an email
program and usually not even then.  Usually, the database has a specific
purpose, and can be designed for the data it cares about; it may choose to
keep the original message as bytes.


>It does seem to make sense to think about headers as text header names
>and text header values.  Of course, header values can contain almost
>anything and there's an encoding to bring it back to 7-bit ASCII, but
>again, you really have two views of a header value.  Which you want
>really depends on your application.

I think of header fields as having text-like names (the set of allowed
characters is more than just text, though defined headers don't make use of
that), but the data is either bytes or it should be parsed into something
appropriate:  text for unstructured fields like Subject:, a list of
addresses for address fields like To:.  Many of the structured header
fields have a reasonable mapping to text; certainly this is true for address
header fields.  Content-Type header fields are barely text, they can be so
convolutedly structured, but I suppose one could flatten one of them to
text instead of bytes if the user wanted.  It's not very useful, though,
except for debugging (either by the programmer or the recipient who wants
to know what was cleaned from the message).


>Maybe you just care about the text of both the header name and value.
>In that case, I think you want the values as unicodes, and probably
>the headers as unicodes containing only ASCII.  So your table would be
>strings in both cases.  OTOH, maybe your application cares about the
>raw underlying encoded data, in which case the header names are
>probably still strings of ASCII-ish unicodes and the values are
>bytes.  It's this distinction (and I think the competing use cases)
>that make a true Python 3.x API for email more complicated.

If a database stores the Subject: header field, it would be as text.  The
various recipient address fields are a one message to many names and
addresses mapping, and need a related table of name/address fields, with
each field being text.  The original message (or whatever part of it one
preserves) should be bytes.  I don't think this complicates the email
package API; rather, it just shows where generality is needed.
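
A rough sketch of that layout, using sqlite3 only because it ships with
Python.  The table and column names and the store_message helper are
illustrative assumptions, not a recommended schema:

```python
import email
import email.utils
import sqlite3

SCHEMA = """
CREATE TABLE message (
    id      INTEGER PRIMARY KEY,
    subject TEXT,            -- unstructured header, stored as text
    raw     BLOB NOT NULL    -- the original message, stored as bytes
);
CREATE TABLE recipient (
    message_id INTEGER NOT NULL REFERENCES message(id),
    field      TEXT NOT NULL,   -- 'to' or 'cc'
    name       TEXT,
    address    TEXT NOT NULL
);
"""


def store_message(conn, raw_bytes):
    """Store one message: queryable fields as text, the original as a BLOB."""
    msg = email.message_from_bytes(raw_bytes)
    subject = msg.get("Subject")
    if subject is not None:
        subject = str(subject)
    cur = conn.execute(
        "INSERT INTO message (subject, raw) VALUES (?, ?)",
        (subject, raw_bytes),
    )
    message_id = cur.lastrowid
    # One message maps to many recipients, hence the related table.
    for field in ("to", "cc"):
        values = [str(v) for v in msg.get_all(field, [])]
        pairs = email.utils.getaddresses(values)
        conn.executemany(
            "INSERT INTO recipient VALUES (?, ?, ?, ?)",
            [(message_id, field, name, addr) for name, addr in pairs],
        )
    return message_id
```

With the default parser policy used here, an RFC 2047 encoded Subject is
stored still encoded; decoding it first is a separate choice.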


>Thinking about this stuff makes me nostalgic for the sloppy happy days
>of Python 2.x

You now have the opportunity to finally unsnarl that mess.  It is not an
insurmountable opportunity.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] [Email-SIG] Dropping bytes "support" in json

2009-04-09 Thread Tony Nelson
At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
 ...
>So, what I'm really asking is this.  Let's say you agree that there
>are use cases for accessing a header value as either the raw encoded
>bytes or the decoded unicode.  What should this return:
>
> >>> message['Subject']
>
>The raw bytes or the decoded unicode?

That's an easy one:  Subject: is an unstructured header, so it must be
text, thus Unicode.  We're looking at a high-level representation of an
email message, with parsed header fields and a MIME message tree.


>Okay, so you've picked one.  Now how do you spell the other way?

message.get_header_bytes('Subject')

Oh, I see that's what you picked.

>The Message class probably has these explicit methods:
>
> >>> Message.get_header_bytes('Subject')
> >>> Message.get_header_string('Subject')
>
>(or better names... it's late and I'm tired ;).  One of those maps to
>message['Subject'] but which is the more obvious choice?

Structured header fields are more of a problem.  Any header with addresses
should return a list of addresses.  I think the default return type should
depend on the data type.  To get an explicit bytes or string or list of
addresses, be explicit; otherwise, for convenience, return the appropriate
type for the particular header field name.
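
For comparison, the email package as it later evolved behaves roughly this
way when parsing with email.policy.default.  That policy postdates this
thread, so the sketch shows where the design ended up, not what was
available at the time:

```python
import email
import email.policy

raw = (
    b"From: Alice <alice@example.com>\r\n"
    b"To: Bob <bob@example.com>, carol@example.com\r\n"
    b"Subject: =?utf-8?q?Gr=C3=BC=C3=9Fe?=\r\n"
    b"\r\n"
    b"Body\r\n"
)

msg = email.message_from_bytes(raw, policy=email.policy.default)

# Unstructured header: indexing yields decoded text.
subject = msg["Subject"]
assert isinstance(subject, str)
assert subject == "Gr\u00fc\u00dfe"

# Address header: the value also exposes a parsed sequence of addresses.
recipients = [addr.addr_spec for addr in msg["To"].addresses]
assert recipients == ["bob@example.com", "carol@example.com"]
assert msg["To"].addresses[0].display_name == "Bob"
```

The get_header_bytes spelling above is only a name floated in this thread;
the sketch does not assume such a method exists.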


>Now, setting headers.  Sometimes you have some unicode thing and
>sometimes you have some bytes.  You need to end up with bytes in the
>ASCII range and you'd like to leave the header value unencoded if so.
>But in both cases, you might have bytes or characters outside that
>range, so you need an explicit encoding, defaulting to utf-8 probably.

Never for header fields.  The default is always RFC 2047, unless it isn't,
say for params.

The Message class should create an object of the appropriate subclass of
Header based on the name (or use the existing object, see other
discussion), and that should inspect its argument and DTRT or complain.
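
As a small illustration of RFC 2047 being the encoding for header text,
using the existing email.header module.  The helper decode_rfc2047 is my
own name, written just for this sketch:

```python
from email.header import Header, decode_header

text = "Gr\u00fc\u00dfe aus K\u00f6ln"

# Non-ASCII header text becomes an ASCII-only RFC 2047 encoded-word.
encoded = Header(text, charset="utf-8").encode()
assert encoded.startswith("=?utf-8?")
encoded.encode("ascii")  # would raise if any non-ASCII character remained

# Plain ASCII text with the default charset is left unencoded.
assert Header("Plain ASCII subject").encode() == "Plain ASCII subject"


def decode_rfc2047(value):
    """Decode a header value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


assert decode_rfc2047(encoded) == text
```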

>
> >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
> >>> Message.set_header('Subject', b'Some bytes')
>
>One of those maps to
>
> >>> message['Subject'] = ???

The expected data type should depend on the header field.  For Subject:, it
should be bytes to be parsed or verbatim text.  For To:, it should be a
list of addresses or bytes or text to be parsed.

The email package should be pythonic, and not require deep understanding of
dozens of RFCs to use properly.  Users don't need to know about the raw
bytes; that's the whole point of MIME and any email package.  It should be
easy to set header fields with their natural data types, and doing it with
bad data should produce an error.  This may require a bit more care in the
message parser, to always produce a parsed message with defects.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] BLOBs in Pg (was: email package Bytes vs Unicode)

2009-04-09 Thread Tony Nelson
At 21:24 +0400 04/09/2009, Oleg Broytmann wrote:
>On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
>> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
>> PostgreSQL has no real support for BLOBs.
>
>   I think it has - BYTEA data type.

So it does; I see that now that I've opened up the PostgreSQL docs.  I
don't find escaping data to be a problem -- I do it for all untrusted data.

So, after all, there isn't an example of a database that makes onerous the
storing of email and other such byte-oriented data, and Python's email
package has no need for workarounds in that area.
-- 

TonyN.:'   <mailto:tonynel...@georgeanelson.com>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping bytes "support" in json)

2009-04-09 Thread Tony Nelson
(email-sig dropped, as I didn't see Steve Holden's message there)

At 12:20 -0400 04/09/2009, Steve Holden wrote:
>Tony Nelson wrote:
 ...
>> If you need the data from the message, by all means extract it and store it
>> in whatever form is useful to the purpose of the database.  If you need the
>> entire message, store it intact in the database, as the bytes it is.  Email
>> isn't Unicode any more than a JPEG or other image types (often payloads in
>> a message) are Unicode.
>
>This is all great, and I did quite quickly realize that the best
>approach was to store the mails in their network byte-stream format as
>bytes. The approach was negated in my own case because of PostgreSQL's
>execrable BLOB-handling capabilities. I took a look at the escaping they
>required, snorted with derision and gave it up as a bad job.
 ...

I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
PostgreSQL has no real support for BLOBs.  I agree that having to import
them from a file is awful.  Also, there appears to be a severe limit on the
size of character data fields, so storing in Base64 is out.  About the only
thing to do then is to use external storage for the BLOBs.

Still, email seems to demand such binary storage, whether all databases
provide it or not.
-- 

TonyN.:'   <mailto:tonynel...@georgeanelson.com>
  '  <http://www.georgeanelson.com/>


[Python-Dev] email package Bytes vs Unicode (was Re: Dropping bytes "support" in json)

2009-04-09 Thread Tony Nelson
(email-sig added)

At 08:07 -0400 04/09/2009, Steve Holden wrote:
>Barry Warsaw wrote:
 ...
>> This is an interesting question, and something I'm struggling with for
>> the email package for 3.x.  It turns out to be pretty convenient to have
>> both a bytes and a string API, both for input and output, but I think
>> email really wants to be represented internally as bytes.  Maybe.  Or
>> maybe just for content bodies and not headers, or maybe both.  Anyway,
>> aside from that decision, I haven't come up with an elegant way to allow
>> /output/ in both bytes and strings (input is I think theoretically
>> easier by sniffing the arguments).
>>
>The real problem I came across in storing email in a relational database
>was the inability to store messages as Unicode. Some messages have a
>body in one encoding and an attachment in another, so the only ways to
>store the messages are either as a monolithic bytes string that gets
>parsed when the individual components are required or as a sequence of
>components in the database's preferred encoding (if you want to keep the
>original encoding most relational databases won't be able to help unless
>you store the components as bytes).
 ...

I found it confusing myself, and did it wrong for a while.  Now, I
understand that messages come over the wire as bytes, either 7-bit US-ASCII
or 8-bit whatever, and are parsed at the receiver.  I think of the database
as a wire to the future, and store the data as bytes (a BLOB), letting the
future receiver parse them as it did the first time, when I cleaned the
message.  Data I care to query is extracted into fields (in UTF-8, what I
usually use for char fields).  I have no need to store messages as Unicode,
and they aren't Unicode anyway.  I have no need ever to flatten a message
to Unicode, only to US-ASCII or, for messages (spam) that are corrupt, raw
8-bit data.

If you need the data from the message, by all means extract it and store it
in whatever form is useful to the purpose of the database.  If you need the
entire message, store it intact in the database, as the bytes it is.  Email
isn't Unicode any more than a JPEG or other image types (often payloads in
a message) are Unicode.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Integrate BeautifulSoup into stdlib?

2009-03-04 Thread Tony Nelson
At 2:56 PM + 3/4/09, Chris Withers wrote:
>Vaibhav Mallya wrote:
>> We do have HTMLParser, but that doesn't handle malformed pages well, and
>> just isn't as nice as BeautifulSoup.
>
>Interesting, given that BeautifulSoup is built on HTMLParser ;-)

In BeautifulSoup >= 3.1, yes.  Before that (<= 3.07a), it was based on the
more robust sgmllib.SGMLParser.  The current BeautifulSoup can't handle
'', while the earlier SGMLParser versions can.  I don't
know quite how common that missing space is in the wild, but I've
personally made HTML with that problem.  Maybe this is the only problem
with using HTMLParser instead of SGMLParser; I don't know.  In the mean
time, if I have a need for BeautifulSoup in Python3.x, I'll port sgmllib
and use the older BeautifulSoup.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Bug in SimpleHTTPRequestHandler.send_head?

2008-09-05 Thread Tony Nelson
At 1:19 PM +0100 9/5/08, Michael Foord wrote:
>Hello Kim,
>
>Thanks for your post. The source code control used for Python is Subversion.
>
>Patches submitted to this list will unfortunately get lost. Please post
>the bug report along with your comments and patch to the Python bug tracker:
>
>http://bugs.python.org/

Patches are usually done with patch, using the output of diff -u.
bugs.python.org links to the Python wiki with Help : Tracker Documentation,
and searching the wiki can turn up some info on bug submission, but I don't
see any step-by-step instructions for newbies.
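
If the diff tool is not handy, the same unified format can be produced from
Python with difflib.  This is only a sketch of what "diff -u" output looks
like; the file names and contents are made up:

```python
import difflib

old = ["spam\n", "eggs\n"]
new = ["spam\n", "ham\n", "eggs\n"]

# unified_diff yields the lines that "diff -u old new" would print.
patch = "".join(
    difflib.unified_diff(old, new, fromfile="a/demo.txt", tofile="b/demo.txt")
)
print(patch)
```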

If you're not yet confident that this is really a bug or don't want to
wrestle with the bug tracker just now, you might get more discussion on
the newsgroup comp.lang.python.  Probably the subject should not say "bug",
or you might only get suggestions to submit a bug, but rather something
like "Should SimpleHTTPRequestHandler.send_head() change text line
endings?", or whatever you think might provoke discussion.

FWIW, Python 2.6 and 3.0 are near release, so any accepted patch would at
the earliest go into the next version after those: 2.7 or 3.1.  Patches
often languish and need a champion to push them through.  Helping review
other patches or bugs is one way to contribute.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] bsddb alternative (was Re: [issue3769] Deprecate bsddb for removal in 3.0)

2008-09-04 Thread Tony Nelson
At 7:37 AM -0700 9/4/08, C. Titus Brown wrote:
>On Thu, Sep 04, 2008 at 10:29:10AM -0400, Tony Nelson wrote:
 ...
>-> Shipping an application to end users is a different problem.  Such packages
>-> should include a private copy of Python as well as of any dependent
>-> libraries, as tested.
>
>Why?  On Mac OS X, for example, Python comes pre-installed -- not sure
>if it comes with Tk yet, but the next version probably will.  On Windows
>there's a handy few-click installer that installs Tk.  Is there some
>reason why I shouldn't be relying on those distributions??

Yes.  An application is tested with one version of Python and one version
of its libraries.  When MOSX updates Python or some other library, you are
relying on their testing of your application.  Unless you are Adobe or
similarly large they didn't do that testing.  Perhaps you have noticed the
threads about installing a new Python release over the Python that came
with an OS, and how bad an idea that is?  This is the same issue, from the
other side.

>Requiring users to install anything at all imposes a barrier to use.
>That barrier rises steeply in height the more packages (with versioning
>issues, etc.) are needed.  This also increases the tech support burden
>dramatically.
 ...

Precisely why one needs to ship a single installer that installs the
complete application, including Python and any other libraries it needs.
-- 

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] bsddb alternative (was Re: [issue3769] Deprecate bsddb for removal in 3.0)

2008-09-04 Thread Tony Nelson
At 6:10 AM -0500 9/4/08, [EMAIL PROTECTED] wrote:
>>> Related but tangential question that we were discussing on the
>>> pygr[0] mailing list -- what is the "official" word on a scalable
>>> object store in Python?  We've been using bsddb, but is there an
>>> alternative?  And what if bsddb is removed?
>
>Brett> Beyond shelve there are no official plans to add a specific
>Brett> object store.
>
>Unless something has changed while I wasn't looking, shelve requires a
>concrete module under the covers: bsddb, gdbm, ndbm, dumbdbm.  It's just a
>thin layer over one of them that makes it appear as if you can have keys
>which aren't strings.

I thought that all that was happening was that BSDDB was becoming a
separate project.  If one needs BSDDB with Python2.6, one installs it.
Aren't there other parts of Python that require external modules, such as
Tk?  Using Tk requires installing it.  Such things are normally packaged by
each distro the same way as Python is packaged ("yum install tk bsddb").

Shipping an application to end users is a different problem.  Such packages
should include a private copy of Python as well as of any dependent
libraries, as tested.
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Further PEP 8 compliance issues in threading and multiprocessing

2008-09-01 Thread Tony Nelson
At 1:04 PM +1200 9/2/08, Greg Ewing wrote:
>Antoine Pitrou wrote:
>
>> I don't see a problem for trivial functional wrappers to classes to be
>> capitalized like classes.
>
>The problem is that the capitalization makes you
>think it's a class, suggesting you can do things
>with it that you actually can't, e.g. subclassing.

Or that it returns a new object of that kind.


>I can't think of any reason to do this. If you
>don't want to promise that something is a class,
>what possible reason is there for naming it like
>one?
 ...

Lower-case names return something about an object.  Capitalized names
return a new object of the named type (more or less), either via a Class
constructor or a Factory object.  That's /a/ reason, anyway.

I suppose the question is what a capitalized name promises.  If it means
only "Class", then how should "Returns a new object", either from a Class
or a Factory, be shown?  Perhaps a new convention is needed for Factories?
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Another Proposal: Run GC less often

2008-06-21 Thread Tony Nelson
At 11:28 PM +0200 6/21/08, none wrote:
>Instead of collecting objects after a fixed number of allocations (700)
 ...

I've seen this asserted several times in this thread:  that GC is done
every fixed number of allocations.  This is not correct.  GC is done when
the surplus of allocations less deallocations exceeds a threshold.  See
Modules/gcmodule.c and look for ".count++" and ".count--".  In normal
operation, allocations and deallocations stay somewhat balanced, but when
creating a large data structure, it's allocations all the way and GC runs
often.
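
The counter can be watched from Python through the gc module.  This is a
sketch assuming a standard CPython build, where gc.get_count()[0] reflects
that surplus; collection is disabled during the measurement so nothing
resets the counter:

```python
import gc


def allocation_surplus_demo(n=200):
    """Return the youngest-generation count before, during, and after
    holding n freshly allocated container objects."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        before = gc.get_count()[0]
        objs = [[] for _ in range(n)]   # allocations push the count up
        during = gc.get_count()[0]
        del objs                        # deallocations pull it back down
        after = gc.get_count()[0]
    finally:
        if was_enabled:
            gc.enable()
    return before, during, after


print("thresholds:", gc.get_threshold())
print("before/during/after:", allocation_surplus_demo())
```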
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Assignment to None

2008-06-09 Thread Tony Nelson
At 4:46 PM +0100 6/9/08, Michael Foord wrote:
>Alex Martelli wrote:
>> The problem is more general: what if a member  (of some external
>> object we're proxying one way or another) is named print (in Python <
>> 3), or class, or...?  To allow foo.print or bar.class would require
>> pretty big changes to Python's parser -- I have vague memories that
>> the issue was discussed ages ago (possibly in conjunction with some
>> early release of Jython) but never went anywhere much (including
>> proposals to automatically append an underscore to such IDs in the
>> proxying layer, etc etc).  Maybe None in particular is enough of a
>> special case (if it just happens to be hugely often used in dotNET
>> libraries)?
>>
>
>'None' as a member does occur particularly frequently in the .NET world.
>
>A halfway house might be to state (something like):
>
>Python as a language disallows you from having names the same as
>keywords or 'None'. An implementation restriction specific to CPython is
>that the same restriction also applies to member names. Alternative
>implementations are free to not implement this restriction, with the
>caveat that code using reserved member names directly will be invalid
>syntax for CPython.
 ...

Or perhaps CPython should just stop trying to detect this at compile time.
Note that while assignment to ".None" is not allowed, setattr(foo, "None",
1) then referencing ".None" is allowed.
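
The same asymmetry can be sketched for Python 3, where None is a full
keyword and the compiler rejects even the plain attribute read.  The class
Foo is just a stand-in:

```python
class Foo:
    pass


f = Foo()

# Attribute syntax with a keyword name is rejected at compile time.
rejected = []
for source in ("f.None = 1", "f.None", "f.class"):
    try:
        compile(source, "example", "exec")
    except SyntaxError:
        rejected.append(source)
print("rejected by the compiler:", rejected)

# The object model itself has no such restriction.
setattr(f, "None", 1)
setattr(f, "class", 2)
assert getattr(f, "None") == 1
assert vars(f) == {"None": 1, "class": 2}
```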

>>> f.None = 1
SyntaxError: assignment to None
>>> f.None
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'Foo' object has no attribute 'None'
>>> setattr(f, 'None', 1)
>>> f.None
1
>>>
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Copying cgi.parse_qs() to the urllib.parse module

2008-05-12 Thread Tony Nelson
At 11:56 PM -0400 5/10/08, Fred Drake wrote:
>On May 10, 2008, at 11:49 PM, Guido van Rossum wrote:
>> Works for me. The other thing I always use from cgi is escape() --
>> will that be available somewhere else too?
>
>
>xml.sax.saxutils.escape() would be an appropriate replacement, though
>the location is a little funky.

At least it's right next to the valuable quoteattr().
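
For anyone who hasn't used the pair, a short sketch; the element and
attribute names in the markup are arbitrary:

```python
from xml.sax.saxutils import escape, quoteattr

# escape() handles text content: &, < and > are replaced.
assert escape("a < b & c") == "a &lt; b &amp; c"

# quoteattr() escapes too, and also picks and adds the surrounding quotes.
assert quoteattr("plain") == '"plain"'
assert quoteattr('say "hi"') == "'say \"hi\"'"

markup = "<item label=%s>%s</item>" % (quoteattr('Tom & "Jerry"'), escape("x > y"))
print(markup)
```

Because quoteattr() supplies its own quotes, the format string must not add
any around the attribute value.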
-- 

TonyN.:'   
  '  


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Tony Nelson
At 1:14 PM -0400 4/21/08, David Wolever wrote:
>On 21-Apr-08, at 12:44 PM, [EMAIL PROTECTED] wrote:
>>
>> David> Is there some sort of text encoding detection module in the
>> David> standard library?  And, if not, is there any reason not
>> to add
>> David> one?
>> No, there's not.  I suspect the fact that you can't correctly
>> determine the
>> encoding of a chunk of text 100% of the time mitigates against it.
>Sorry, I wasn't very clear what I was asking.
>
>I was thinking about making an educated guess -- just like chardet
>(http://chardet.feedparser.org/).
>
>This is useful when you get a hunk of data which _should_ be some
>sort of intelligible text from the Big Scary Internet (say, a posted
>web form or email message), and you want to do something useful with
>it (say, search the content).

Feedparser.org's chardet can't guess 'latin1', so it should be used as a
last resort, just as the docs say.


Re: [Python-Dev] fixing tests on windows

2008-04-03 Thread Tony Nelson
At 3:52 PM -0600 4/3/08, Steven Bethard wrote:
>On Thu, Apr 3, 2008 at 3:09 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
 ...
>Or were you suggesting that there is some programmatic way for the
>test suite to create directories that disallow the Search Service,
>etc.?

I'd think that files and directories created in the TEMP directory would
normally not be indexed on any OS, including MSWindows.  But this is just a
guess.


Re: [Python-Dev] Syntax suggestion for imports

2008-01-03 Thread Tony Nelson
At 3:20 PM +0100 1/3/08, Christian Heimes wrote:
>Raymond Hettinger wrote:
>> How about a new, simpler syntax:
 ...
>> * import readline or emptymodule
>
>The syntax idea has a nice ring to it, except for the last idea. As
>others have already said, the name emptymodule is too magic.
>
>The readline example becomes more readable when you change the import to
>
>import readline or None as readline
>
>
>In my opinion the import or as syntax definition is easy to understand
>if you force the user to always have an "as" statement. The None name is
>optional but must be the last name:
>
>import name[, or name2[, or name3 ...]] [, or None] as target
 ...

At 11:48 AM -0600 1/3/08, Ron Adam wrote:
 ...
>An alternative possibility might be, rather than "or", reuse "else" before
>import.
 ...

I prefer "else" to "or" but with the original single-statement syntax.

If the last clause could be an expression as well as a module name, what
I've done (used with and copied from BeautifulSoup):

try:
    from htmlentitydefs import name2codepoint
except ImportError:
    name2codepoint = {}

could become:

from htmlentitydefs else ({}) import name2codepoint as name2codepoint

Also:

import foo or (None) as foo
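
For comparison, the fallback behaviour these spellings aim at can be approximated today with a small helper.  This is only a sketch in present-day Python: import_or is a made-up name, not an existing or proposed API, and it handles whole modules only, not the "from ... import name" form.

```python
import importlib


def import_or(*names, default=None):
    """Return the first module in `names` that imports, else `default`.

    Hypothetical helper for illustration only.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    return default


# Roughly "import readline or None as readline":
readline = import_or("readline")
```

The caller still has to test for the fallback value before using the name, which is the same burden the proposed syntax would leave in place.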


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Tony Nelson
At 11:17 AM +0100 12/8/07, Johan Dahlin wrote:
>Guido van Rossum wrote:
>> Adam, perhaps at some point (Monday?) we could get together on
>> #python-dev and interact in real time on this issue. Probably even
>> better on the phone. This offer is open to anyone who is serious about
>> getting this resolved. Someone please take it -- I'm offering free
>> consulting here!
>>
>> I'm curious -- is there anyone here who understands why [Py]GTK is
>> using signals anyway? It's not like writing robust signal handling
>> code in C is at all easy or obvious. If instead of a signal a file
>> descriptor could be used, all problems would likely be gone.
>
>The timeout handler was added for KeyboardInterrupt to be able to work when
>you want to Ctrl-C yourself out of the gtk.main() loop.

Is that always required (with threads), or are things better now that
Ctrl-C handling is improved (at least in the Socket module, which doesn't
lose signals anymore)?


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Tony Nelson
At 2:01 AM -0800 12/8/07, Guido van Rossum wrote:
 ...
>I'm curious -- is there anyone here who understands why [Py]GTK is
>using signals anyway? It's not like writing robust signal handling
>code in C is at all easy or obvious. If instead of a signal a file
>descriptor could be used, all problems would likely be gone.

I don't think PyGTK uses Unix signals for GTK2 signal emission -- though
Johan Dahlin is authoritative here.  See
 .


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Tony Nelson
At 3:30 PM -0400 9/14/07, Jean-Paul Calderone wrote:
>On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss <[EMAIL PROTECTED]> wrote:
>>> Your idea can be combined with the maxint/2 initial refcount for
>>> non-disposable objects, which should about eliminate thread-count updates
>>> for them.
>>
>> I don't really like the maxint/2 idea because it requires us to
>> differentiate between globals and everything else. Plus, it's a hack. I'd
>> like a more elegant solution if possible.
>
>It's not really a solution either.  If your program runs for a couple
>minutes and then exits, maybe it won't trigger some catastrophic behavior
>from this hack, but if you have a long running process then you're almost
>certain to be screwed over by this (it wouldn't even have to be *very*
>long running - a month or two could do it on a 32bit platform).

I don't think either of you understands what setting the initial refcount to
maxint/2 for global objects in a thread's refcount vector would do.  It has
/no/ effect on refcounting.  It only prevents the refcount from becoming
zero for objects that can never be released, but which would always have a
zero thread refcount on thread exit, which would cause a useless and
frequent thread count decrement for the object.  As the object can never be
released, its thread count would be initially non-zero, so the thread count
won't be made zero when the thread refcount becomes zero.  The thread count
is shared in the object.  The thread refcount is per thread, and should not
be shared, even at the physical cache line level, if good performance is
desired.

When a new thread is created, part of the thread state would be the
refcount vector.  Hopefully it would mostly be just VM magic, but the
initial part of the vector would contain the immortal objects' refcount,
and those would be set to maxint/2.  Or 1, for that matter.
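
A toy model may make the distinction clearer.  It is only a sketch of the scheme as described above: every name in it is invented for illustration, a real implementation would live in C inside the interpreter, and the shared_writes counter exists only to show how often the shared field is touched.

```python
import sys

IMMORTAL = sys.maxsize // 2  # starting per-thread refcount for immortal objects


class SharedObject:
    """An object with a shared thread count (names are illustrative)."""

    def __init__(self, index, immortal=False):
        self.index = index              # position in every thread's vector
        self.immortal = immortal
        self.thread_count = 1 if immortal else 0  # shared between threads
        self.shared_writes = 0          # how often the shared field was written


class ThreadState:
    """Per-thread refcount vector; never shared with other threads."""

    def __init__(self, objects):
        self.refcounts = [IMMORTAL if o.immortal else 0 for o in objects]

    def incref(self, obj):
        if self.refcounts[obj.index] == 0:
            obj.thread_count += 1       # first reference from this thread
            obj.shared_writes += 1
        self.refcounts[obj.index] += 1

    def decref(self, obj):
        self.refcounts[obj.index] -= 1
        if self.refcounts[obj.index] == 0:
            obj.thread_count -= 1       # last reference from this thread
            obj.shared_writes += 1


none_like = SharedObject(0, immortal=True)
temporary = SharedObject(1)
thread = ThreadState([none_like, temporary])
for obj in (none_like, temporary):
    thread.incref(obj)
    thread.decref(obj)
```

In this model the immortal object's per-thread count never reaches zero, so the shared thread count is never written for it, while the ordinary object costs two shared writes per use cycle.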


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Tony Nelson
At 1:51 AM -0500 9/14/07, Justin Tulloss wrote:
>On 9/14/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
>
>> Could be worth a try. A first step might be to just implement
>> the atomic refcounting, and run that single-threaded to see
>> if it has terribly bad effects on performance.
>
>I've done this experiment.  It was about 12% on my box.  Later, once I
>had everything else setup so I could run two threads simultaneously, I
>found much worse costs.  All those literals become shared objects that
>create contention.
>
>
>It's hard to argue with cold hard facts when all we have is raw
>speculation. What do you think of a model where there is a global "thread
>count" that keeps track of how many threads reference an object? Then
>there are thread-specific reference counters for each object. When a
>thread's refcount goes to 0, it decrefs the object's thread count. If you
>did this right, hopefully there would only be cache updates when you
>update the thread count, which will only be when a thread first references
>an object and when it last references an object.

It's likely that cache line contention is the issue, so don't glom all the
different threads' refcounts for an object into one vector.  Keep each
thread's refcounts in a per-thread vector indexed by object, so only that
thread will cache that vector, or make refcounts so large that each will be
in its own cache line (usu. 64 bytes, not too horrible for testing
purposes).  I don't know everything that would be required for separate
vectors of refcounts, but each object could contain its index into the
vectors, which would all be the same size (Go Virtual Memory!).


>I mentioned this idea earlier and it's growing on me. Since you've
>actually messed around with the code, do you think this would alleviate
>some of the contention issues?
>
>Justin

Your idea can be combined with the maxint/2 initial refcount for
non-disposable objects, which should about eliminate thread-count updates
for them.


Re: [Python-Dev] Adventures with x64, VS7 and VS8 on Windows

2007-05-29 Thread Tony Nelson
At 1:14 PM + 5/29/07, Kristján Valur Jónsson wrote:
>> -Original Message-
>>
>> Microsoft's command line cannot cope with two pathnames that must be
>> quoted, so if the command path itself must be quoted, then no argument
>> to the command can be quoted.  There are tricky hacks that can work
>> around this mind-boggling stupidity, but life is simpler if Python
>> itself doesn't use up the one quoted pathname.  I don't know if
>> Microsoft has had the good sense to fix this in Vista (which I probably
>> will never use, since an alternative exists), but they didn't in XP.
>
>Do you have any references for this claim?
>In my command line on XP sp2, this works just fine:
>
>C:\Program Files\Microsoft Visual Studio 8\VC>"c:\Program Files\TextPad 4\TextPad.exe" "c:\tmp\f a.txt" "c:\tmp\f b.txt"
>
>Both the program, and the two file names are quoted and textpad.exe opens
>them both.

I pounded my head against this issue when working on a .bat file a few
years back, until I read the help for cmd and saw the quote logic (and
switched to VBScript).  It's still there, in "help cmd".  I had once found
references to the same issue for the run command in Microsoft's online help.

Perhaps it is fixed in SP2. If so, just change it and don't worry about
users with earlier versions of Windows.


Re: [Python-Dev] Adventures with x64, VS7 and VS8 on Windows

2007-05-26 Thread Tony Nelson
At 12:20 PM + 5/26/07, Kristján Valur Jónsson wrote:
>> -Original Message-
>> From: Alexey Borzenkov [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, May 23, 2007 20:36
>> To: Kristján Valur Jónsson
>> Cc: Martin v. Löwis; Mark Hammond; [EMAIL PROTECTED]; python-
>> [EMAIL PROTECTED]
>> Subject: Re: [Python-Dev] Adventures with x64, VS7 and VS8 on Windows
>>
>> On 5/23/07, Kristján Valur Jónsson <[EMAIL PROTECTED]> wrote:
>> > > > Install in the ProgramFiles folder.
>> > > Only over my dead body. *This* is silly.
>> > Bill doesn't think so.  And he gets to decide.  I mean we do want
>> > to play nice, don't we?  Nothing installs itself in the root anymore,
>> > not since windows 3.1
>>
>> Maybe installing in the root is not good, but installing to "Program
>> Files" is just asking for trouble. All sorts of development tools
>> might suddenly break because of that space in the middle of the path
>> and requirement to use quotes around it. I thus usually install things
>> to :\Programs. I'm not sure if any packages/programs will break
>> because of that space, but what if some will?
>
>Development tools used on windows already have to cope with this.
>Spaces are not going away, so why not bite the bullet and deal
>with them?  Moving forward sometimes means crossing rivers.
 ...

Microsoft's command line cannot cope with two pathnames that must be
quoted, so if the command path itself must be quoted, then no argument to
the command can be quoted.  There are tricky hacks that can work around
this mind-boggling stupidity, but life is simpler if Python itself doesn't
use up the one quoted pathname.  I don't know if Microsoft has had the good
sense to fix this in Vista (which I probably will never use, since an
alternative exists), but they didn't in XP.


Re: [Python-Dev] Official version support statement

2007-05-11 Thread Tony Nelson
At 12:58 AM +0200 5/12/07, Martin v. Löwis wrote:
>> "The Python Software Foundation officially supports the current
>> stable major release of Python.  By "supports" we mean that the PSF
>> will produce bug fix releases of this version, currently Python 2.5.
>> We may release patches for earlier versions if necessary, such as to
>> fix security problems, but we generally do not make releases of such
>> unsupported versions.  Patch releases of earlier Python versions may
>> be made available through third parties, including OS vendors."
>
>If such an official statement still can be superseded by an even more
>official PEP, it's fine with me.
>
>However, I would prefer to not use the verb "support" at all. We (the
>PSF) don't provide any technical support for *any* version ever
>released: '''PSF is making Python available to Licensee on an "AS IS"
>basis.  PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
>IMPLIED.  BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
>DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
>FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
>INFRINGE ANY THIRD PARTY RIGHTS.'''
>
>The more I think about it: no, there is no official support for the
>current stable release. We will likely produce more bug fix releases,
>but then, we may not if the volunteers doing so lose time or
>interest, and 2.6 comes out earlier than planned.
>
>Why do you need such a statement?

I think Fedora might want it, per recent discussions on fedora-devel-list.

My impertinent attempt:

"The Python Software Foundation maintains the current stable major
release of Python.  By "maintains" we mean that the PSF will produce
bug fix releases of that version, currently Python 2.5.  We have
released patches for earlier versions as necessary, such as to fix
security problems, but we generally do not make releases of such
prior versions.  Patched releases of earlier Python versions may be
made available through third parties, including OS vendors."


Re: [Python-Dev] datetime module enhancements

2007-03-11 Thread Tony Nelson
At 5:45 PM +1300 3/11/07, Greg Ewing wrote:
>Jon Ribbens wrote:
>
>> What do you feel "next Tuesday plus 12 hours" means? ;-)
>
>I would say it's meaningless. My feeling is that subtracting
>two dates should give an integer number of days, and that is
>all you should be allowed to add to a date.

Apple's old MacOS had a very flexible LongDateRecord and date utilities.
Nearly anything one could do to a date had a useful meaning.  Perhaps
Python should be different, but I've found Apple's date calculations and
date parsing to be very useful, in a Pythonic sort of way.

From old New Inside Macintosh, _Operating System Utilities_, Ch. 4 "Date,
Time, and Measurement Utilities":

Calculating Dates
-----------------
In the date-time record and long date-time record, any value in the month,
day, hour, minute, or second field that exceeds the maximum value allowed
for that field, will cause a wraparound to a future date and time when you
modify the date-time format.

*   In the month field, values greater than 12 cause a wraparound
to a future year and month.
*   In the day field, values greater than the number of days in a
given month cause a wraparound to a future month and day.
*   In the hour field, values greater than 23 cause a wraparound to
a future day and hour.
*   In the minute field, values greater than 59 cause a wraparound
to a future hour and minute.
*   In the seconds field, values greater than 59 cause a wraparound
to a future minute and seconds.

You can use these wraparound facts to calculate and retrieve information
about a specific date. For example, you can use a date-time record and the
DateToSeconds and SecondsToDate procedures to calculate the 300th day of
1994. Set the month field of the date-time record to 1 and the year field
to 1994. To find the 300th day of 1994, set the day field of the date-time
record to 300. Initialize the rest of the fields in the record to values
that do not exceed the maximum value allowed for that field. (Refer to the
description of the date-time record on page 4-23 for a complete list of
possible values). To force a wrap-around, first convert the date and time
(in this example, January 1, 1994) to the number of seconds elapsed since
midnight, January 1, 1904 (by calling the DateToSeconds procedure). Once
you have converted the date and time to a number of seconds, you convert
the number of seconds back to a date and time (by calling the SecondsToDate
procedure). The fields in the date-time record now contain the values that
represent the 300th day of 1994. Listing 4-6 shows an application-defined
procedure that calculates the 300th day of the Gregorian calendar year
using a date-time record.

Listing 4-6 Calculating the 300th day of the year

PROCEDURE MyCalculate300Day;
VAR
   myDateTimeRec:  DateTimeRec;
   mySeconds:      LongInt;
BEGIN
   WITH myDateTimeRec DO
   BEGIN
      year := 1994;
      month := 1;
      day := 300;
      hour := 0;
      minute := 0;
      second := 0;
      dayOfWeek := 1;
   END;
   DateToSeconds (myDateTimeRec, mySeconds);
   SecondsToDate (mySeconds, myDateTimeRec);
END;

The DateToSeconds procedure converts the date and time to the number of
seconds elapsed since midnight, January 1, 1904, and the SecondsToDate
procedure converts the number of seconds back to a date and time. After the
conversions, the values in the year, month, day, and dayOfWeek fields of
the myDateTimeRec record represent the year, month, day of the month, and
day of the week for the 300th day of 1994. If the values in the hour,
minute, and second fields do not exceed the maximum value allowed for each
field, the values remain the same after the conversions (in this example,
the time is exactly 12:00 A.M.).

Similarly, you can use a long date-time record and the LongDateToSeconds
and LongSecondsToDate procedures to compute the day of the week
corresponding to a given date. Listing 4-7 shows an application-defined
procedure that computes and retrieves the name of the day for July 4, 1776.
Note that because the year is prior to 1904, it is necessary to use a long
date-time record.
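
For comparison, the same two calculations can be done with Python's existing datetime module, which has no field wraparound but gets there with timedelta arithmetic.  This is a sketch; nth_day_of_year is a name made up for the example.

```python
import calendar
from datetime import date, timedelta


def nth_day_of_year(year, n):
    """Return the date of the n-th day of `year` (n = 1 is January 1)."""
    return date(year, 1, 1) + timedelta(days=n - 1)


day_300 = nth_day_of_year(1994, 300)

# Day of the week for July 4, 1776 (proleptic Gregorian calendar).
independence_day = date(1776, 7, 4)
weekday_name = calendar.day_name[independence_day.weekday()]

print(day_300, weekday_name)
```

Unlike the Mac OS record, date(1994, 1, 300) itself raises ValueError, so the out-of-range value has to be expressed as an offset.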


Re: [Python-Dev] splitext('.cshrc')

2007-03-08 Thread Tony Nelson
At 2:16 PM -0500 3/8/07, Phillip J. Eby wrote:
>At 11:53 AM 3/8/2007 +0100, Martin v. Löwis wrote:
>>That assumes there is a need for the old functionality. I really don't
>>see it (pje claimed he needed it once, but I remain unconvinced, not
>>having seen an actual fragment where the old behavior is helpful).
>
>The code in question was a type association handler that looked up loader
>functions based on file extension.  This was specifically convenient for
>recognizing the difference between .htaccess files and other dotfiles that
>might appear in a web directory tree -- e.g. .htpasswd.  The proposed
>change of splitext() would break that determination, because .htpasswd and
>.htaccess would both be considered files with empty extensions, and would
>be handled by the "empty extension" handler.

So, ".htaccess" and "foo.htaccess" should be treated the same way?  Is that
what Apache does?
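
For reference, a quick check of how os.path.splitext() treats these names in current Python releases, where a leading dot is not taken as the start of an extension (that is, the changed behaviour being debated in this thread):

```python
import os.path

# Dotfiles keep their whole name as the root; only a later dot starts
# an extension.
for name in (".htaccess", ".htpasswd", "foo.htaccess", ".cshrc"):
    print(name, os.path.splitext(name))
```

A handler keyed on the extension therefore sees an empty extension for ".htaccess" and ".htpasswd" alike, and would have to dispatch on the full file name to tell them apart.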


Re: [Python-Dev] Python and the Linux Standard Base (LSB)

2006-12-23 Thread Tony Nelson
At 8:42 PM +0100 12/2/06, Martin v. Löwis wrote:
>Jan Claeys schrieb:
>> Like I said, it's possible to split Python without making things
>> complicated for newbies.
>
>You may have said that, but I don't believe it's true. For example,
>most distributions won't include Tkinter in the "standard" Python
>installation: Tkinter depends on _tkinter depends on Tk depends on
>X11 client libraries. Since distributors want to make X11 client
>libraries optional, they exclude Tkinter. So people wonder why
>they can't run Tkinter applications (search comp.lang.python for
>proof that people wonder about precisely that).
>
>I don't think the current packaging tools can solve this newbie
>problem. It might be solvable if installation of X11 libraries
>would imply installation of Tcl, Tk, and Tkinter: people running
>X (i.e. most desktop users) would see Tkinter installed, yet
>it would be possible to omit Tkinter.

Given the current packaging tools, could Python have stub modules for such
things that would just throw a useful exception giving the name of the
required package?  Perhaps if Python just had an example of such a stub
(and Tkinter comes to mind), packagers would customize it and make any
others they needed?
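
A minimal sketch of what such a stub might contain.  The helper name and the package name "python-tk" are assumptions for illustration; a packager would substitute whatever their distribution calls the real package.

```python
def stub_import_error(module_name, package_name):
    """Build the exception a stub module would raise on import."""
    return ImportError(
        "%s is not installed; install the %r package to use it"
        % (module_name, package_name)
    )


# A stub Tkinter.py shipped in place of the real module would consist of
# little more than this line:
#     raise stub_import_error("Tkinter", "python-tk")
error = stub_import_error("Tkinter", "python-tk")
```

Raising ImportError keeps existing "try: import Tkinter / except ImportError:" code working while giving interactive users a message that names the missing package.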


Re: [Python-Dev] Polling with Pending Calls?

2006-12-04 Thread Tony Nelson
At 12:48 PM -0500 12/4/06, Tony Nelson wrote:
>I think I have a need to handle *nix signals through polling in a library.
>It looks like chaining Pending Calls is almost the way to do it, but I see
>that doing so would make the interpreter edgy.
 ...

Bah.  Sorry to have put noise on the list.  I'm obviously too close to the
problem to see the simple solution of threading.Timer.  Checking once or
twice a second should be good enough.  Sorry to bother you all.
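
A sketch of that approach, assuming a check() callable that stands in for the rpmlib "is a signal pending?" test (the real binding call is not shown here, and poll_every is an invented name):

```python
import threading


def poll_every(interval, check, stop_event):
    """Call check() now and then every `interval` seconds until stopped."""

    def tick():
        if stop_event.is_set():
            return
        check()
        timer = threading.Timer(interval, tick)
        timer.daemon = True  # don't keep the process alive just to poll
        timer.start()

    tick()


calls = []
stop = threading.Event()
poll_every(0.01, lambda: calls.append(1), stop)
stop.wait(0.2)  # let a few polls happen
stop.set()
```

Each Timer is a one-shot thread, so the callback re-arms itself; for once or twice a second that overhead should not matter.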


Re: [Python-Dev] Polling with Pending Calls?

2006-12-04 Thread Tony Nelson
At 6:07 PM + 12/4/06, Gustavo Carneiro wrote:
>This patch may interest you:
>http://www.python.org/sf/1564547
>
>Not sure it completely solves your case, but it's at least close to
>your problem.

I don't think that patch is useful in this case.  This case is not stuck in
some extension module's poll() call.  The signal handler is not Python's
nor is it under my control (so no chance that it would look at some new
pipe), though the rpmlib Python bindings can look at the state bits it
sets.  The Python interpreter is running full-bore when the secret rpmlib
SIGINT state is needed.  I think the patch is for the exact /opposite/ of
my problem.


[Python-Dev] Polling with Pending Calls?

2006-12-04 Thread Tony Nelson
I think I have a need to handle *nix signals through polling in a library.
It looks like chaining Pending Calls is almost the way to do it, but I see
that doing so would make the interpreter edgy.

The RPM library takes (steals) the signal handling away from its client
application.  It has good reason to be so paranoid, but it breaks the
handling of keyboard interrupts, especially if rpmlib is used in the normal
way:  opened at the beginning, various things are done by the app, closed
at the end.  If there is an extended period in the middle where no calls
are made to rpmlib (say, in yum during the downloading of packages or
package headers), then the response to a keyboard interrupt can be delayed
for /minutes/!  Yum is presently doing something awful to work around that
issue.

It is possible to poll rpmlib to find if there is a pending keyboard
interrupt.  Client applications could have such polls sprinkled throughout
them.  I think getting yum, for example, to do that might be politically
difficult.  I'm hoping to propose a patch to rpmlib's Python bindings to do
the polling automagically.

Looking at Python's normal signal handling, I see that Py_AddPendingCall()
and Py_MakePendingCalls(), and PyEval_EvalFrameEx()'s ticker check are how
signals and other async events are done.  I could imagine making rpmlib's
Python bindings add a Pending Call when the library is loaded (or some
such), and that Pending Call would make a quick check of rpmlib's caught
signals flags and then call Py_AddPendingCall() on itself.  It appears that
this would work, and is almost the expected thing to do, but unfortunately
it would cause the ticker check to think that Py_MakePendingCalls() had
failed and needed to be called again ASAP, which would drastically slow
down the interpreter.

Is there a right way to get the Python interpreter to poll something, or
should I look for another approach?

[I hope this message doesn't spend too many days in the grey list limbo.]


[Python-Dev] 2.4.4 fix: Socketmodule Ctl-C patch

2006-10-03 Thread Tony Nelson
I've put a 2.4.4 version of the 2.5 Socketmodule Ctl-C patch at the old
closed bug  .  It passes "make
EXTRAOPS-=unetwork test".

Should I try to put this into the wiki at Python24Fixes?  I haven't used
the wiki before.


Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-30 Thread Tony Nelson
At 12:58 AM -0400 7/31/06, Tony Nelson wrote:
>At 12:39 AM -0400 7/31/06, Tony Nelson wrote:
>
>>popen('"E:\Documents and Settings\Tony Nelson\My
>>Documents\Python\pydev\trunk\PCBuild\python.exe" -c "import
>>sys;sys.version_info"')
>
>Ehh, I must admit that I retyped that.  Obviously what I typed would not
>work, but what I used was:
>
>python = '"' + sys.executable + '"'
>popen(python + ' -c "import sys;sys.version_info"')
>
>So there wasn't a problem with backslashes.  I've also been using raw
>strings.  And, as I said, the file objects looked OK, with backslashes
>where they should be.  Sorry for the mistyping.

OK, I recognize the bug now.  It's that quote parsing bug in MSWindows
(which I can find again if you want) which can be worked around by using an
extra quote at the front (and maybe also the back):

popen('""E:\Documents ...

Not really a bug in Python at all.
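
A sketch of that workaround, on the assumption (from the quote rules in "help cmd") that cmd.exe strips the first and last quote of the command when it starts with a quote and contains more than two.  The function names and the example path are invented, and the quoting is simplified: it does not escape quotes embedded in an argument.

```python
def quote(arg):
    """Quote one argument if it contains a space (simplified, no escaping)."""
    return '"%s"' % arg if " " in arg else arg


def cmd_command_line(executable, *args):
    """Build a command string wrapped in one extra pair of quotes."""
    inner = " ".join(quote(part) for part in (executable,) + args)
    return '"%s"' % inner


exe = r"C:\Program Files\Example\python.exe"  # hypothetical path
line = cmd_command_line(exe, "-c", "import sys")
print(line)
```

The resulting string is what would be handed to popen(); the outer pair is there only for cmd.exe to consume.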



Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-30 Thread Tony Nelson
At 12:39 AM -0400 7/31/06, Tony Nelson wrote:

>popen('"E:\Documents and Settings\Tony Nelson\My
>Documents\Python\pydev\trunk\PCBuild\python.exe" -c "import
>sys;sys.version_info"')

Ehh, I must admit that I retyped that.  Obviously what I typed would not
work, but what I used was:

python = '"' + sys.executable + '"'
popen(python + ' -c "import sys;sys.version_info"')

So there wasn't a problem with backslashes.  I've also been using raw
strings.  And, as I said, the file objects looked OK, with backslashes
where they should be.  Sorry for the mistyping.



Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-30 Thread Tony Nelson
At 4:34 AM +0200 7/31/06, Martin v. Löwis wrote:
>Tony Nelson schrieb:
>>Hmm. Well, it would make the test possible on MSWindows as well as on
>>OS's implementing alarm(2).  If I figure out how to build Python on
>>MSWindows, I might give it a try.  I tried to get MSVC 7.1 via the .Net
>>SDK, but it installed VS 8 instead, so I'm not quite sure how to proceed.
>
>The .NET SDK (any version) is not suitable to build Python.

I do see the warning in the instructions about it not being an optimizing
compiler.  I've managed to build python.exe, and the rt.bat tests mostly
work -- 2 tests fail, test_popen and test_cmd_line, because of popen()
failing.

Hmm, actually, this might be a real problem with the MSWindows version of
posix_popen() in Modules/posixmodule.c.  The path to my built python.exe is:

"E:\Documents and Settings\Tony Nelson\My 
Documents\Python\pydev\trunk\PCBuild\python.exe"

(lots of spaces in it).  It seems properly quoted in the test and when I do
it by hand, but in a call to popen() it doesn't work:

popen('"E:\Documents and Settings\Tony Nelson\My 
Documents\Python\pydev\trunk\PCBuild\python.exe" -c "import 
sys;sys.version_info"')

The returned file object repr resembles one that does work.  If I just use
"python.exe" from within the PCBuild directory:

popen('python.exe -c "import sys;sys.version_info"')

I get the right version, and that's the only 2.5b2 python I've got, so the
built python must be working, but the path, even quoted, isn't accepted by
MSWindows XP SP2.  Should I report a bug?  It may well just be MSWindows
weirdness, and not something that posixmodule.c can do anything about. 
OTOH, it does work from the command line.  I'll bet I wouldn't have seen a
thing if I'd checked out to "E:\pydev" instead.

>You really need VS 2003; if you don't have it anymore, you might be able
>to find a copy of the free version of the VC Toolkit 2003
>(VCToolkitSetup.exe) somewhere.

I really never had VS 2003.  It doesn't appear to be on microsoft.com
anymore.  I'm reluctant to try to steal a copy.


>Of course, just for testing, you can also install VS Express 2005, and
>use the PCbuild8 projects directory; these changes should work the
>same under both versions.

I'll try that if I have any real trouble with the non-optimized python or
if you insist that it's necessary.



Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-30 Thread Tony Nelson
At 7:23 PM -0400 7/30/06, Tony Nelson wrote:
 ...
>...I tried to get MSVC 7.1 via the .Net SDK, but it
>installed VS 8 instead, so I'm not quite sure how to proceed.
 ...

David Murmann suggested off-list that I'd probably installed the 2.0 .Net
SDK, and that I should install the 1.1 .Net SDK, which is the correct one.
Now I can try to build Python on MSWindows.

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-30 Thread Tony Nelson
At 11:42 PM +0200 7/30/06, Martin v. Löwis wrote:
>Tony Nelson schrieb:
>>> You can use GenerateConsoleCtrlEvent to send Ctrl-C to all processes
>>> that share the console of the calling process.
>[...]
>> Martin, your advice is usually spot-on, but I don't always understand it.
>> Maybe using it here is just complicated.
>
>This was really just in response to your remark that you couldn't
>find a way to send Ctrl-C programmatically. I researched (in
>the C library sources) how SIGINT was *generated* (through
>SetConsoleCtrlHandler), and that led me to a way to generate [one.]

Well, fine work there!

>I didn't mean to suggest that you *should* use GenerateConsoleCtrlEvent,
>only that you could if you wanted to.

Hmm.  Well, it would make the test possible on MSWindows as well as on OS's
implementing alarm(2).  If I figure out how to build Python on MSWindows, I
might give it a try.  I tried to get MSVC 7.1 via the .Net SDK, but it
installed VS 8 instead, so I'm not quite sure how to proceed.


>> I expect that
>> GenerateConsoleCtrlEvent() can be called through the ctypes module, though
>> that would make backporting the test to 2.4 a bit more difficult.
>
>Well, if there was general utility to that API, I would prefer exposing
>it in the nt module. It doesn't quite fit into kill(2), as it doesn't
>allow to specify a pid of the target process, so perhaps it doesn't
>have general utility. In any case, that would have to wait for 2.6.

A Process Group ID is the PID of the first process put in it, so it's sort
of a PID.  It just means a collection of processes, probably more than one.
It seems to be mostly applicable to MSWindows, and isn't a suitable way to
implement a form of kill(2).

I hope that the Socket Timeouts patch 1519025 can make it into 2.5, or
2.5.1, as it is a bug fix.  As such, it would probably be better to punt
the test on MSWindows than to do a tricky fancy test that might have its
own issues.

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-30 Thread Tony Nelson
At 9:42 AM +0200 7/30/06, Martin v. Löwis wrote:
>Tony Nelson schrieb:
>> Hmm, OK, darn, thanks.  MSWindows does allow users to press Ctl-C to send a
>> KeyboardInterrupt, so it's just too bad if I can't find a way to test it
>> from a script.
>
>You can use GenerateConsoleCtrlEvent to send Ctrl-C to all processes
>that share the console of the calling process.

That looks like it would work, but it seems prone to overkill.  To avoid
killing all the processes running from a console, the test would need to be
run in a subprocess in a new process group.  If the test simply sends the
event to its own process, all the other processes in its process group
would receive the event as well, and probably die.  I would expect that all
the processes sharing the console would die, but even if they didn't when I
tried it, I couldn't be sure that it wouldn't happen elsewhere, say when
run from a .bat file.

Martin, your advice is usually spot-on, but I don't always understand it.
Maybe using it here is just complicated.  I expect that
GenerateConsoleCtrlEvent() can be called through the ctypes module, though
that would make backporting the test to 2.4 a bit more difficult.  It looks
like the subprocess module can be passed the needed creation flag to make a
new process group.  The subprocess can send the event to itself, and could
return the test result in its result code, so that part isn't so bad.  To
avoid adding a new file to the distribution, test_socket.test_main() could
be modified to look for a command line argument requesting the particular
test action.


>> BTW, I picked SIGALRM because I could do it all with one thread.  Reading
>> POSIX, ISTM that if I sent the signal from another thread, it would bounce
>> off that thread to the main thread during the call to kill(), at which
>> point I got the willies.  OTOH, if kill() is more widely available than
>> alarm(), I'll give it a try, but going by the docs, I'd say it isn't.
>
>Indeed, alarm should be available on any POSIX system.

Well, if alarm() is available, then the test will work.  If not, it will be
silently skipped, as are some other tests already in test_socket.py.  I
can't offhand tell if MSWindows supports alarm(), but RiscOS and OS2 do not.

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-29 Thread Tony Nelson
At 2:38 PM -0700 7/29/06, Josiah Carlson wrote:
>Tony Nelson <[EMAIL PROTECTED]> wrote:
>>
>> I'm trying to write a test for my Socket Timeouts patch [1], which fixes
>> signal handling (notably Ctl-C == SIGINT == KeyboardInterrupt) on socket
>> operations using a timeout.  I don't see a portable way to send a signal,
>> and asking the test runner to press Ctl-C is a non-starter.  A "real"
>> signal is needed to interrupt the select() (or equivalent) call, because
>> that's what wasn't being handled correctly.  The bug should happen on the
>> other platforms I don't know how to test on.
>>
>> Is there a portable way to send a signal?  SIGINT would be best, but
>> another signal (such as SIGALRM) would do, I think.
>
>According to my (limited) research on signals, Windows signal support is
>horrible.  I have not been able to have Python send signals of any kind
>other than SIGABRT, and then only to the currently running process,
>which kills it (regardless of whether you have a signal handler or not).

Hmm, OK, darn, thanks.  MSWindows does allow users to press Ctl-C to send a
KeyboardInterrupt, so it's just too bad if I can't find a way to test it
from a script.


>> If not, should I write the test to only work on systems implementing
>> SIGALRM, the signal I'm using now, or implementing kill(), or what?
>
>I think that most non-Windows platforms should have non-braindead signal
>support, though the signal module seems to be severely lacking in
>sending any signal except for SIGALRM, and the os module has its fingers
>on SIGABRT.

The test now checks "hasattr(signal, 'alarm')" before proceeding, so at
least it won't die horribly.
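
For readers who want to see the shape of such a test, here is a minimal sketch, not the test from the patch. It assumes a modern Python 3, where an exception raised by a signal handler propagates out of a blocking socket call; the names Alarm, _on_alarm and recv_interrupted_by_alarm are invented for the illustration. It must run in the main thread, since that is the only place signal.signal() may be called.

```python
import signal
import socket
import time


class Alarm(Exception):
    """Raised by the SIGALRM handler to break out of a blocking call."""


def _on_alarm(signum, frame):
    raise Alarm()


def recv_interrupted_by_alarm(timeout=3.0):
    """Return the seconds until SIGALRM interrupted recv(), False if the
    socket timeout fired first, or None where alarm() is missing."""
    if not hasattr(signal, "alarm"):
        return None                  # skip silently on such platforms
    a, b = socket.socketpair()
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        a.settimeout(timeout)
        start = time.monotonic()
        signal.alarm(1)              # deliver SIGALRM in about one second
        try:
            a.recv(1)                # nothing is ever sent, so this blocks
        except Alarm:
            return time.monotonic() - start
        except socket.timeout:
            return False
        return False
    finally:
        signal.alarm(0)              # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
        a.close()
        b.close()
```

A result of roughly one second means the signal got through while the socket was waiting on its timeout, which is the behaviour the patch is about.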


>If someone is looking for a project for 2.6 that digs into all sorts of
>platform-specific nastiness, they could add actual signal sending to the
>signal module (at least for unix systems).

Isn't signal sending the province of kill(2) (or os.kill() in Python)?
Not that I know much about it.

BTW, I picked SIGALRM because I could do it all with one thread.  Reading
POSIX, ISTM that if I sent the signal from another thread, it would bounce
off that thread to the main thread during the call to kill(), at which
point I got the willies.  OTOH, if kill() is more widely available than
alarm(), I'll give it a try, but going by the docs, I'd say it isn't.

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


[Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-29 Thread Tony Nelson
I'm trying to write a test for my Socket Timeouts patch [1], which fixes
signal handling (notably Ctl-C == SIGINT == KeyboardInterrupt) on socket
operations using a timeout.  I don't see a portable way to send a signal,
and asking the test runner to press Ctl-C is a non-starter.  A "real"
signal is needed to interrupt the select() (or equivalent) call, because
that's what wasn't being handled correctly.  The bug should happen on the
other platforms I don't know how to test on.

Is there a portable way to send a signal?  SIGINT would be best, but
another signal (such as SIGALRM) would do, I think.

If not, should I write the test to only work on systems implementing
SIGALRM, the signal I'm using now, or implementing kill(), or what?

[1] 

TonyN.:'   
  '  


[Python-Dev] Socket Timeouts patch 1519025

2006-07-23 Thread Tony Nelson
I request a review of my patch (1519025) to get socket timeouts to work
properly with errors and signals.  I don't expect this patch would make it
into 2.5, but perhaps it could be in 2.5.1, as it fixes a long-standing
bug.  I know that people are busy with getting 2.5 out the door, but it
would be helpful for me to know if my current patch is OK before I start on
another patch to make socket timeouts more useful.  There is also a version
of the patch for 2.4, which would make yum nicer in Fedora 4 and 5, and I
think that passing a review would make the patch more acceptable to
Fedora's maintainers.

My next patch will, if it works, make socket timeouts easier to use
per-thread, allow for the timing of entire operations rather than just
timing transaction phases, allow for setting an acceptable rate for file
transfers, and should be completely backward compatible, in that old code
would be unaffected and new code would work as well as possible now on
older unpatched versions.  That's my plan, anyway.  It would build on my
current patch, at least in its principles.

TonyN.:'   
  '  


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-17 Thread Tony Nelson
At 11:56 AM +0200 10/16/05, Martin v. Löwis wrote:
>Tony Nelson wrote:
>> BTW, Martin, if you care to, would you explain to me how a Trie would be
>> used for charmap encoding?  I know a couple of approaches, but I don't know
>> how to do it fast.  (I've never actually had the occasion to use a Trie.)
>
>I currently envision a three-level trie, with 5, 4, and 7 bits. You take
>the Unicode character (only characters below U+ supported), and take
>the uppermost 5 bits, as index in an array. There you find the base
>of a second array, to which you add the next 4 bits, which gives you an
>index into a third array, where you add the last 7 bits. This gives
>you the character, or 0 if it is unmappable.

Umm, 0 (NUL) is a valid output character in most of the 8-bit character
sets.  It could be handled by having a separate "exceptions" string of the
unicode code points that actually map to the exception char.  Usually
"exceptions" would be a string of length 1.  Suggested changes below.


>struct encoding_map{
>   unsigned char level0[32];
>   unsigned char *level1;
>   unsigned char *level2;

Py_UNICODE *exceptions;

>};
>
>struct encoding_map *table;
>Py_UNICODE character;
>int level1 = table->level0[character>>11];
>if(level1==0xFF)raise unmapped;
>int level2 = table->level1[16*level1 + ((character>>7) & 0xF)];
>if(level2==0xFF)raise unmapped;
>int mapped = table->level2[128*level2 + (character & 0x7F)];

change:

>if(mapped==0)raise unmapped;

to:

if(mapped==0) {
    Py_UNICODE *ep;
    for(ep=table->exceptions; *ep; ep++)
        if(*ep==character)
            break;
    if(!*ep)raise unmapped;
}


>Over a hashtable, this has the advantage of not having to deal with
>collisions. Instead, it guarantees you a lookup in a constant time.

OK, I see the benefit.  Your code is about the same amount of work as the
hash table lookup in instructions, indirections, and branches, normally
uses less of the data cache, and has a fixed running time.  It may use one
more branch, but its branches are easily predicted.  Thank you for
explaining it.
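
To make the 5/4/7-bit lookup concrete, here is a rough Python model of it, a sketch rather than anyone's actual implementation. It assumes Python 3; UNMAPPED, charmap_of, build_trie and trie_encode are names invented here. The "exceptions" are kept in a Python set rather than a NUL-terminated array, because U+0000 is normally the code point that maps to byte 0 and could not be stored before a NUL terminator.

```python
UNMAPPED = 0xFF   # marks "no block allocated" in level0 and level1


def charmap_of(codec):
    """Hypothetical helper: {code point: byte} for an 8-bit codec."""
    mapping = {}
    for b in range(256):
        try:
            mapping[ord(bytes([b]).decode(codec))] = b
        except UnicodeDecodeError:
            pass                      # byte undefined in this codec
    return mapping


def build_trie(mapping):
    """Build the three tables from {code point (< 0x10000): byte}."""
    level0 = bytearray([UNMAPPED]) * 32
    level1 = bytearray()              # blocks of 16 entries
    level2 = bytearray()              # blocks of 128 entries
    exceptions = set()                # code points that map to byte 0
    for cp, byte in sorted(mapping.items()):
        if not 0 <= cp < 0x10000:
            raise ValueError("code point out of range: %#x" % cp)
        i0 = cp >> 11                 # top 5 bits
        if level0[i0] == UNMAPPED:
            level0[i0] = len(level1) // 16
            level1.extend([UNMAPPED] * 16)
        i1 = 16 * level0[i0] + ((cp >> 7) & 0xF)   # next 4 bits
        if level1[i1] == UNMAPPED:
            block = len(level2) // 128
            if block >= UNMAPPED:
                raise ValueError("too many level2 blocks")
            level1[i1] = block
            level2.extend(bytes(128))
        level2[128 * level1[i1] + (cp & 0x7F)] = byte   # last 7 bits
        if byte == 0:
            exceptions.add(cp)
    return level0, level1, level2, frozenset(exceptions)


def trie_encode(trie, cp):
    """Return the byte for code point cp, or None if unmappable."""
    level0, level1, level2, exceptions = trie
    if cp >= 0x10000:
        return None
    b1 = level0[cp >> 11]
    if b1 == UNMAPPED:
        return None
    b2 = level1[16 * b1 + ((cp >> 7) & 0xF)]
    if b2 == UNMAPPED:
        return None
    mapped = level2[128 * b2 + (cp & 0x7F)]
    if mapped == 0 and cp not in exceptions:
        return None
    return mapped


if __name__ == "__main__":
    trie = build_trie(charmap_of("iso8859_15"))
    print(trie_encode(trie, 0x20AC))  # the euro sign
```

Every lookup touches exactly three table entries (plus the set only when the result is 0), which is the fixed running time mentioned above.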


>It is also quite space-efficient: all tables use bytes as indices.
>As each level0 deals with 2048 characters, most character maps
>will only use 1 or two level1 blocks, meaning 16 or 32 bytes
>for level1. The number of level2 blocks required depends on
>the number of 128-character rows which the encoding spans;
>for most encodings, 3 or four such blocks will be sufficient
>(with ASCII spanning one such block typically), causing the
>entire memory consumption for many encodings to be less than
>600 bytes.
 ...

As you are concerned about pathological cases for hashing (that would make
the hash chains long), it is worth noting that in such cases this data
structure could take 64K bytes.  Of course, no such case occurs in standard
encodings, and 64K is affordable as long as it is rare.

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-15 Thread Tony Nelson
I have put up a new, packaged version of my fast charmap module at
 .  Hopefully it is packaged properly
and works properly (it works on my FC3 Python 2.3.4 system).  This version
is over 5 times faster than the base codec according to Hye-Shik Chang's
benchmark (mostly from compiling it with -O3).

I bring it up here mostly because I mention in its docs that improved
faster charmap codecs are coming from the Python developers.  Is it OK to
say that, and have I said it right?  I'll take that out if you folks want.

I understand that my module is not favored by Martin v. Löwis, and I don't
advocate it becoming part of Python.  My web page and docs say that it may
be useful until Python has the faster codecs.  It allows changing the
mappings because that is part of the current semantics -- a new version of
Python can certainly change those semantics.

I want to thank you all for so quickly going to work on the problem of
making charmap codecs faster.  It's to the benefit of Python users
everywhere to have faster charmap codecs in Python.  Your quickness
impressed me.

BTW, Martin, if you care to, would you explain to me how a Trie would be
used for charmap encoding?  I know a couple of approaches, but I don't know
how to do it fast.  (I've never actually had the occasion to use a Trie.)

TonyN.:'   
  '  


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-13 Thread Tony Nelson
I have written my fastcharmap decoder and encoder.  It's not meant to be
better than the patch and other changes to come in a future version of
Python, but it does work now with the current codecs.  Using Hye-Shik
Chang's benchmark, decoding is about 4.3x faster than the base, and
encoding is about 2x faster than the base (that's comparing the base and
the fast versions on my machine).  If fastcharmap would be useful, please
tell me where I should make it available, and any changes that are needed.
I would also need to write an installer (distutils I guess).

<http://georgeanelson.com/fastcharmap.d.tar.gz>

Fastcharmap is written in Python and Pyrex 0.9.3, and the .pyx file will
need to be compiled before use.  I used:

pyrexc _fastcharmap.pyx
gcc -c -fPIC -I/usr/include/python2.3/ _fastcharmap.c
gcc -shared _fastcharmap.o -o _fastcharmap.so

To use, hook each codec to be sped up:

import fastcharmap
help(fastcharmap)
fastcharmap.hook('name_of_codec')
u = unicode('some text', 'name_of_codec')
s = u.encode('name_of_codec')

No codecs were rewritten.  It took me a while to learn enough to do this
(Pyrex, more Python, some Python C API), and there were some surprises.
Hooking in is grosser than I would have liked.  I've only used it on Python
2.3 on FC3.  Still, it should work going forward, and, if the dicts are
replaced by something else, fastcharmap will know to leave everything
alone.  There's still a tiny bit of debugging print statements in it.


>At 8:36 AM +0200 10/5/05, Martin v. Löwis wrote:
>>Tony Nelson wrote:
> ...
>>> Encoding can be made fast using a simple hash table with external chaining.
>>> There are max 256 codepoints to encode, and they will normally be well
>>> distributed in their lower 8 bits.  Hash on the low 8 bits (just mask), and
>>> chain to an area with 256 entries.  Modest storage, normally short chains,
>>> therefore fast encoding.
>>
>>This is what is currently done: a hash map with 256 keys. You are
>>complaining about the performance of that algorithm. The issue of
>>external chaining is likely irrelevant: there likely are no collisions,
>>even though Python uses open addressing.
>
>I think I'm complaining about the implementation, though on decode, not
>encode.
>
>In any case, there are likely to be collisions in my scheme.  Over the
>next few days I will try to do it myself, but I will need to learn Pyrex,
>some of the Python C API, and more about Python to do it.
>
>
>>>>...I suggest instead just /caching/ the translation in C arrays stored
>>>>with the codec object.  The cache would be invalidated on any write to the
>>>>codec's mapping dictionary, and rebuilt the next time anything was
>>>>translated.  This would maintain the present semantics, work with current
>>>>codecs, and still provide the desired speed improvement.
>>
>>That is not implementable. You cannot catch writes to the dictionary.
>
>I should have been more clear.  I am thinking about using a proxy object
>in the codec's 'encoding_map' and 'decoding_map' slots, that will forward
>all the dictionary stuff.  The proxy will delete the cache on any call
>which changes the dictionary contents.  There are proxy classes and
>dictproxy (don't know how it's implemented yet) so it seems doable, at
>least as far as I've gotten so far.
>
>
>>> Note that this caching is done by new code added to the existing C
>>> functions (which, if I have it right, are in unicodeobject.c).  No
>>> architectural changes are made; no existing codecs need to be changed;
>>> everything will just work
>>
>>Please try to implement it. You will find that you cannot. I don't
>>see how regenerating/editing the codecs could be avoided.
>
>Will do!

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-06 Thread Tony Nelson
At 8:36 AM +0200 10/5/05, Martin v. Löwis wrote:
>Tony Nelson wrote:
 ...
>> Encoding can be made fast using a simple hash table with external chaining.
>> There are max 256 codepoints to encode, and they will normally be well
>> distributed in their lower 8 bits.  Hash on the low 8 bits (just mask), and
>> chain to an area with 256 entries.  Modest storage, normally short chains,
>> therefore fast encoding.
>
>This is what is currently done: a hash map with 256 keys. You are
>complaining about the performance of that algorithm. The issue of
>external chaining is likely irrelevant: there likely are no collisions,
>even though Python uses open addressing.

I think I'm complaining about the implementation, though on decode, not encode.

In any case, there are likely to be collisions in my scheme.  Over the next
few days I will try to do it myself, but I will need to learn Pyrex, some
of the Python C API, and more about Python to do it.


>>>...I suggest instead just /caching/ the translation in C arrays stored
>>>with the codec object.  The cache would be invalidated on any write to the
>>>codec's mapping dictionary, and rebuilt the next time anything was
>>>translated.  This would maintain the present semantics, work with current
>>>codecs, and still provide the desired speed improvement.
>
>That is not implementable. You cannot catch writes to the dictionary.

I should have been more clear.  I am thinking about using a proxy object in
the codec's 'encoding_map' and 'decoding_map' slots, that will forward all
the dictionary stuff.  The proxy will delete the cache on any call which
changes the dictionary contents.  There are proxy classes and dictproxy
(don't know how it's implemented yet) so it seems doable, at least as far as
I've gotten so far.
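
As a rough illustration of the idea, here is a sketch in modern Python 3. It uses a dict subclass instead of a forwarding proxy, which is simpler to show but is not what the post describes; InvalidatingMap and decode_with_cache are names made up for the example, and the list of overridden methods is not exhaustive (in-place "|=" is not covered, for instance).

```python
class InvalidatingMap(dict):
    """dict that drops a derived 256-entry table whenever it is modified."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._table = None

    def _changed(self):
        self._table = None            # next decode rebuilds the cache

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._changed()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._changed()

    def update(self, *args, **kwargs):
        super().update(*args, **kwargs)
        self._changed()

    def pop(self, *args):
        self._changed()
        return super().pop(*args)

    def popitem(self):
        self._changed()
        return super().popitem()

    def clear(self):
        super().clear()
        self._changed()

    def setdefault(self, key, default=None):
        self._changed()
        return super().setdefault(key, default)

    def table(self):
        """Return the cached byte -> code point list, building it if needed."""
        if self._table is None:
            self._table = [self.get(b) for b in range(256)]
        return self._table


def decode_with_cache(data, mapping):
    """Decode bytes through the cached table of an InvalidatingMap."""
    table = mapping.table()
    out = []
    for b in data:
        cp = table[b]
        if cp is None:
            raise ValueError("byte %#x is unmapped" % b)
        out.append(chr(cp))
    return "".join(out)
```

Changing an entry, for example setting byte 0xF0 to U+F8FF, takes effect on the next call because the write discards the cached table.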


>> Note that this caching is done by new code added to the existing C
>> functions (which, if I have it right, are in unicodeobject.c).  No
>> architectural changes are made; no existing codecs need to be changed;
>> everything will just work
>
>Please try to implement it. You will find that you cannot. I don't
>see how regenerating/editing the codecs could be avoided.

Will do!

TonyN.:'   <mailto:[EMAIL PROTECTED]>
  '  <http://www.georgeanelson.com/>


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-04 Thread Tony Nelson
[Recipient list not trimmed, as my replies must be vetted by a moderator,
which seems to delay them. :]

At 11:48 PM +0200 10/4/05, Walter Dörwald wrote:
>Am 04.10.2005 um 21:50 schrieb Martin v. Löwis:
>
>> Walter Dörwald wrote:
>>
>>> For charmap decoding we might be able to use an array (e.g. a
>>> tuple, or an array.array?) of codepoints instead of a dictionary.
>>>
>>
>> This array would have to be sparse, of course.
>
>For encoding yes, for decoding no.
>
>> Using an array.array would be more efficient, I guess - but we would
>> need a C API for arrays (to validate the type code, and to get ob_item).
>
>For decoding it should be sufficient to use a unicode string of
>length 256. u"\ufffd" could be used for "maps to undefined". Or the
>string might be shorter and byte values greater than the length of
>the string are treated as "maps to undefined" too.

With Unicode using more than 64K codepoints now, it might be more forward
looking to use a table of 256 32-bit values, with no need for tricky
values.  There is no need to add any C code to the codecs; just add some
more code to the existing C function (which, if I have it right, is
PyUnicode_DecodeCharmap() in unicodeobject.c).

 ...
>> For encoding, having a C trie might give considerable speedup. _codecs
>> could offer an API to convert the current dictionaries into
>> lookup-efficient structures, and the conversion would be done when
>> importing the codec.
>>
>> For the trie, two levels (higher and lower byte) would probably be
>> sufficient: I believe most encodings only use 2 "rows" (256 code
>> point blocks), very few more than three.
>
>This might work, although nobody has complained about charmap
>encoding yet. Another option would be to generate a big switch
>statement in C and let the compiler decide about the best data
>structure.

I'm willing to complain. :)  I might allow saving of my (53 MB) MBox file.
(Not that editing received mail makes as much sense as searching it.)

Encoding can be made fast using a simple hash table with external chaining.
There are max 256 codepoints to encode, and they will normally be well
distributed in their lower 8 bits.  Hash on the low 8 bits (just mask), and
chain to an area with 256 entries.  Modest storage, normally short chains,
therefore fast encoding.
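
A small Python model of that table may make the scheme easier to follow. It is only a sketch under my own assumptions: it uses Python lists and -1 as the end-of-chain marker rather than packed bytes, it targets Python 3, and charmap_of, build_hash and hash_encode are invented names.

```python
def charmap_of(codec):
    """Hypothetical helper: {code point: byte} for an 8-bit codec."""
    mapping = {}
    for b in range(256):
        try:
            mapping[ord(bytes([b]).decode(codec))] = b
        except UnicodeDecodeError:
            pass                      # byte undefined in this codec
    return mapping


def build_hash(mapping):
    """Hash on the low 8 bits of the code point, chaining collisions."""
    slots = [-1] * 256                # head of chain per hash, -1 = empty
    entries = []                      # (code point, output byte, next index)
    for cp, byte in mapping.items():
        h = cp & 0xFF                 # "just mask"
        entries.append((cp, byte, slots[h]))
        slots[h] = len(entries) - 1   # new entry becomes the chain head
    return slots, entries


def hash_encode(table, cp):
    """Return the byte for code point cp, or None if unmappable."""
    slots, entries = table
    i = slots[cp & 0xFF]
    while i != -1:
        key, byte, following = entries[i]
        if key == cp:
            return byte
        i = following
    return None


def longest_chain(table):
    """Length of the longest chain, to check the 'short chains' claim."""
    slots, entries = table
    longest = 0
    for i in slots:
        n = 0
        while i != -1:
            n += 1
            i = entries[i][2]
        longest = max(longest, n)
    return longest


if __name__ == "__main__":
    table = build_hash(charmap_of("iso8859_15"))
    print(hash_encode(table, 0x20AC), longest_chain(table))
```

Running longest_chain() over a few real codecs is an easy way to see how well the low 8 bits spread the code points in practice.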


At 12:08 AM +0200 10/5/05, Martin v. Löwis wrote:

>I would try to avoid generating C code at all costs. Maintaining the
>build processes will just be a nightmare.

I agree; also I don't think the generated codecs need to be changed at all.
All the changes can be made to the existing C functions, by adding caching
per a reply of mine that hasn't made it to the list yet.  Well, OK,
something needs to hook writes to the codec's dictionary, but I /think/
that only needs Python code.  I say:

>...I suggest instead just /caching/ the translation in C arrays stored
>with the codec object.  The cache would be invalidated on any write to the
>codec's mapping dictionary, and rebuilt the next time anything was
>translated.  This would maintain the present semantics, work with current
>codecs, and still provide the desired speed improvement.

Note that this caching is done by new code added to the existing C
functions (which, if I have it right, are in unicodeobject.c).  No
architectural changes are made; no existing codecs need to be changed;
everything will just work, and usually work faster, with very modest memory
requirements of one 256 entry array of 32-bit Unicode values and a hash
table with 256 1-byte slots and 256 chain entries, each having a 4 byte
Unicode value, a byte output value, a byte chain index, and probably 2
bytes of filler, for a hash table size of 2304 bytes per codec.
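
The arithmetic behind those sizes can be checked with the struct module. This is only a worked calculation, assuming the field layout described above (4-byte value, two single bytes, 2 bytes of padding); the constant names are made up.

```python
import struct

# Decode side: 256 entries of one 32-bit Unicode value each.
DECODE_TABLE_BYTES = 256 * struct.calcsize("<I")

# Encode side: one chain entry is a 4-byte Unicode value, an output
# byte, a chain-index byte, and 2 bytes of filler.
CHAIN_ENTRY_BYTES = struct.calcsize("<IBBxx")

# 256 one-byte slots plus 256 chain entries.
HASH_TABLE_BYTES = 256 * 1 + 256 * CHAIN_ENTRY_BYTES

if __name__ == "__main__":
    print(DECODE_TABLE_BYTES, CHAIN_ENTRY_BYTES, HASH_TABLE_BYTES)
```

That is 8 bytes per chain entry and 256 + 2048 = 2304 bytes for the hash table, with another 1024 bytes for the decode array.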

TonyN.:'   
  '  


Re: [Python-Dev] Unicode charmap decoders slow

2005-10-04 Thread Tony Nelson
At 9:37 AM +0200 10/4/05, Walter Dörwald wrote:
>Am 04.10.2005 um 04:25 schrieb [EMAIL PROTECTED]:
>
>>As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is
>>very slow compared to encoding or decoding with utf-8. Here I'm working
>>with 53k of data instead of 53 megs. (Note: this is a laptop, so it's
>>possible that thermal or battery management features affected these
>>numbers a bit, but by a factor of 3 at most)
>>
>> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
>> 1000 loops, best of 3: 591 usec per loop
>> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
>> 1000 loops, best of 3: 1.25 msec per loop
>> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
>> 100 loops, best of 3: 13.5 msec per loop
>> $ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
>> 100 loops, best of 3: 13.6 msec per loop
>>
>> With utf-8 encoding as the baseline, we have
>> decode('utf-8')  2.1x as long
>> decode('mac-roman') 22.8x as long
>> decode('iso8859-1') 23.0x as long
>>
>> Perhaps this is an area that is ripe for optimization.
>
>For charmap decoding we might be able to use an array (e.g. a tuple,
>or an array.array?) of codepoints instead of a dictionary.
>
>Or we could implement this array as a C array (i.e. gencodec.py would
>generate C code).

Fine -- as long as it still allows changing code points.  I add the missing
"Apple logo" code point to mac-roman in order to permit round-tripping
(0xF0 <=> 0xF8FF, per Apple docs).  (New bug #1313051.)

If an all-C implementation wouldn't permit changing codepoints, I suggest
instead just /caching/ the translation in C arrays stored with the codec
object.  The cache would be invalidated on any write to the codec's mapping
dictionary, and rebuilt the next time anything was translated.  This would
maintain the present semantics, work with current codecs, and still provide
the desired speed improvement.

But is there really no way to say this fast in pure Python?  The way a
one-to-one byte mapping can be done with "".translate()?

TonyN.:'   
  '  


[Python-Dev] Unicode charmap decoders slow

2005-10-03 Thread Tony Nelson
Is there a faster way to transcode from 8-bit chars (charmaps) to utf-8
than going through unicode()?

I'm writing a small card-file program. As a test, I use a 53 MB MBox file,
in mac-roman encoding.  My program reads and parses the file into messages
in about 3 to 5 seconds (Wow! Go Python!), but takes about 14 seconds to
iterate over the cards and convert them to utf-8:

    for i in xrange(len(cards)):
        u = unicode(cards[i], encoding)
        cards[i] = u.encode('utf-8')

The time is nearly all in the unicode() call.  It's not so much how much
time it takes, but that it takes 4 times as long as the real work, just to
do table lookups.

Looking at the source (which, if I have it right, is
PyUnicode_DecodeCharmap() in unicodeobject.c), I think it is doing a
dictionary lookup for each character.  I would have thought that it would
make and cache a LUT the size of the charmap (and hook the relevant
dictionary stuff to delete the cached LUT if the dictionary is changed).
(You may consider this a request for enhancement. ;)

I thought of using U"".translate(), but the unicode version is defined to
be slow, and anyway I can't find any way to just shove my 8-bit data into a
unicode string without translation.  Is there some similar approach?  I'm
almost (but not quite) ready to try it in Pyrex.
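
One approach along those lines, sketched here for a modern Python 3 rather than the 2.x of the post: decoding as latin-1 maps every byte to the code point of the same value, which amounts to putting the 8-bit data into a unicode string untranslated, and translate() can then apply a 256-entry table. Whether this is faster than the codec itself is not claimed. make_table and decode_via_translate are names invented for the sketch.

```python
def make_table(codec):
    """Build a 256-character string; table[b] is what byte b decodes to.

    Bytes the codec leaves undefined become U+FFFD.
    """
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            chars.append("\ufffd")
    return "".join(chars)


def decode_via_translate(data, table):
    """Decode bytes by a latin-1 pass followed by a table lookup."""
    # latin-1 maps byte N to code point N, so no information is lost;
    # translate() then indexes the table by that code point.
    return data.decode("latin-1").translate(table)


if __name__ == "__main__":
    table = make_table("mac_roman")
    print(decode_via_translate(b"caf\x8e", table))
```

The table is built once per codec and reused, so the per-character work is a single index operation.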

I'm new to Python.  I didn't google anything relevant on python.org or in
groups.  I posted this in comp.lang.python yesterday, got a couple of
responses, but I think this may be too sophisticated a question for that
group.

I'm not a member of this list, so please copy me on replies so I don't have
to hunt them down in the archive.

TonyN.:'   
  '  