[issue12819] PEP 393 - Flexible Unicode String Representation

2011-08-22 Thread Torsten Becker

New submission from Torsten Becker torsten.bec...@gmail.com:

I have started an implementation of PEP 393 -- Flexible String Representation 
[1] on bitbucket [2].  Not all code is ported to use the new API yet, but the 
interpreter starts with the new unicode representation, all unit tests pass, 
and some micro benchmarks show potential.  Please see the related wiki page [3] 
for details of my implementation.

[1]: http://www.python.org/dev/peps/pep-0393/
[2]: https://bitbucket.org/t0rsten/pep-393
[3]: http://wiki.python.org/moin/SummerOfCode/2011/PEP393
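The core idea of PEP 393 can be illustrated with a short Python sketch. Each string is stored with the narrowest fixed width that holds its largest code point: 1 byte per character below U+0100, 2 bytes below U+10000, and 4 bytes otherwise. The helper names pep393_kind() and payload_bytes() are hypothetical. The sketch models only the width selection, not CPython's actual C structures, header overhead, or the separate compact layout for pure-ASCII strings.

```python
def pep393_kind(s):
    """Bytes per code point that PEP 393's width rule selects for s."""
    # An empty string has no code points; treat it as the narrowest case.
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return 1  # fits in Latin-1 (this includes pure ASCII)
    if max_cp < 0x10000:
        return 2  # fits in the Basic Multilingual Plane (UCS-2)
    return 4      # needs a supplementary plane (UCS-4)


def payload_bytes(s):
    """Size of the character data only: no object header, no terminator."""
    return len(s) * pep393_kind(s)


for text in ["hello", "H\xe4ns", "\u20ac10", "\U0001F600"]:
    print(ascii(text), pep393_kind(text), payload_bytes(text))
# 'hello' 1 5
# 'H\xe4ns' 1 4
# '\u20ac10' 2 6
# '\U0001f600' 4 4
```

One non-Latin-1 character widens every character in the string, which is why the euro sign string above costs 2 bytes per character.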

--
components: Unicode
files: pep-393-aug22.diff
keywords: patch
messages: 142741
nosy: torsten.becker
priority: normal
severity: normal
status: open
title: PEP 393 - Flexible Unicode String Representation
type: feature request
versions: Python 3.3
Added file: http://bugs.python.org/file23004/pep-393-aug22.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12819
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue11828] startswith and endswith don't accept None as slice index

2011-04-19 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

Hi Jesús, I merged the patch up into the branches
startswith-slices-issue11828-3.2 [1] and startswith-slices-issue11828-3.3
[2] in my hg repository.

[1]: https://bitbucket.org/t0rsten/cpython/changeset/49028581e43a
[2]: https://bitbucket.org/t0rsten/cpython/changeset/eafafe258362

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-18 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

I pushed my changes to an hg repository; they are in the two branches
startswith-slices-issue11828-2.7 and startswith-slices-issue11828-3.1.

--
hgrepos: +21

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-18 Thread Torsten Becker

Changes by Torsten Becker torsten.bec...@gmail.com:


Added file: http://bugs.python.org/file21706/2b48fd451c85.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-18 Thread Torsten Becker

Changes by Torsten Becker torsten.bec...@gmail.com:


--
hgrepos: +22

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-18 Thread Torsten Becker

Changes by Torsten Becker torsten.bec...@gmail.com:


Removed file: http://bugs.python.org/file21706/2b48fd451c85.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11783] email parseaddr and formataddr should be IDNA aware

2011-04-17 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

Hi, here is my revised patch with email.utils.getaddresses() also decoding IDNs.

I decided to integrate IDN decoding in AddrlistClass.getaddress() instead of
AddrlistClass.getaddrlist() since getaddress() is one level lower: if the
decoding lived in getaddrlist() and somebody ever called getaddress() directly,
the conversion would not happen.

I also fixed a glitch in the docs: the versionchanged directive seems to need
two colons to end up in the generated HTML.


As a follow-up, wouldn't it be helpful if email.Message did the conversions
directly?  That is, when you parse a mail into a Message and access the To
field, you would get a list of properly decoded tuples.

For example, the following test currently still fails because the RFC 2047
encoded header value is decoded by neither email.feedparser.FeedParser nor
email.Message:

    def test_email_decodes_idns_and_unicode(self):
        text = '''\
To: =?utf-8?b?SMOkbnMgV8O8cnN0?= <h...@xn--dm-fka.ain>

Hello World!'''
        msg = Parser().parsestr(text)
        self.assertEqual(utils.getaddresses(msg.get_all('To')),
                         [('H\xe4ns W\xfcrst', 'hans@d\xf6m.ain')])

Am I using the package wrong here or is this actually missing?  
email.header.decode_header seems to be able to do this already but it is not 
used.  Would it be safe to integrate this into the 
email.message._sanitize_header helper?
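The decoding asked about above can be approximated by post-processing the output of email.utils.getaddresses(). A minimal sketch, assuming that approach is acceptable: email.header.decode_header() and make_header() turn the RFC 2047 display name back into text, and the built-in 'idna' codec decodes the domain. decode_address() and decoded_getaddresses() are hypothetical helpers, not part of the email package, and this is not how the parser itself would do it.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_address(name, addr):
    """Decode an RFC 2047 display name and an IDNA-encoded domain."""
    # decode_header() splits out the encoded words, make_header() joins them
    # back together, and str() turns the result into a Unicode string.
    display = str(make_header(decode_header(name)))
    local, sep, domain = addr.rpartition("@")
    if sep and domain:
        # The 'idna' codec decodes bytes, so ASCII-encode the domain first.
        addr = local + sep + domain.encode("ascii").decode("idna")
    return display, addr


def decoded_getaddresses(field_values):
    """Like email.utils.getaddresses(), but with decoded names and domains."""
    return [decode_address(name, addr)
            for name, addr in getaddresses(field_values)]


header = "=?utf-8?b?SMOkbnMgV8O8cnN0?= <hans@xn--dm-fka.ain>"
print(decoded_getaddresses([header]))
```

Doing this inside the parser instead would mean every consumer of the To field sees decoded values without an extra step.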

--
Added file: http://bugs.python.org/file21698/issue-11783-v4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11783
___



[issue11783] email parseaddr and formataddr should be IDNA aware

2011-04-14 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> (The word "anybody" made me think.
> But "fix properly" ... i'm sure you cannot refer to myself.
> :))

"fix properly" referred to my inferior implementation, and "anybody"
should probably have been worded "Steffen or David".  So sure, go
ahead. :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11783
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-13 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> Some comments posted in the review.

I'm not sure if my review reply got mailed as I did not get a copy and nothing 
showed up here.  I added some responses/follow up questions in the review.

> Could you possibly post a patch for 2.7 too?

Sure, I'll write the next version against 3.3 and 2.7.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-13 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> I got your comments, Torsten. I find it funny too that the tracker is
> not notified.
> I wrote new comments too, but not using the right way, so now I am
> the one not sure you got them... :-)

That time I actually got a separate mail. :)

> Better to have a 3.1/2.7 patch. The current workflow requires to
> patch the old version first (3.1), and up-port the change to 3.2 and
> 3.3.
> So, 2.7 and 3.1 would be more useful. At least if the patch applies
> to 3.2 and 3.3 easily. If major surgery is needed, let me know.

I uploaded an improved v4 patch against 2.7 and 3.1.  However, the patch tool
does not apply it cleanly to the 3.2 and 3.3 branches.

This is mostly because Objects/stringlib/find.h has changed too much and the 
#define STRINGLIB_IS_UNICODE (3.3, 3.2) is called FROM_UNICODE in 3.1.  The 
other files work fine.

It should be no problem to merge this up by hand, though.

> PS: If you use mercurial, try to upload the patch directly from it.
> See the "Remote hg repo" box.

I'm using Mercurial, but unfortunately hg hangs forever when trying to push to 
bitbucket, so I am just sticking with patches for now.

--
Added file: http://bugs.python.org/file21655/issue-11828-v4-3.1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-13 Thread Torsten Becker

Changes by Torsten Becker torsten.bec...@gmail.com:


Added file: http://bugs.python.org/file21656/issue-11828-v4-2.7.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11783] email parseaddr and formataddr should be IDNA aware

2011-04-13 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> OK, so when I went to apply this, I figured out that the patch isn't quite
> right.  I've redone the doc updates, and am attaching a version of the patch
> containing them.
>
> The issue is that the place that the IDNA decode support needs to be added
> isn't in parseaddr, it's in _parseaddr.py's AddresslistClass.  Tests are then
> needed to make sure that the IDNA decoding gets done both when parseaddr and
> getaddresslist are used.
>
> Do you want to tackle this, Torsten?

I would like to, but I probably will not get to it before Monday.  So
if anybody wants to work on this before that time, please feel free to
fix it properly. :)

Just two questions for the implementation:
  1. Would it be fine to move the helper _encode_decode_addr() into
_parseaddr.py and then import it in utils.py, so it can be shared
between the two?
  2. Would line 232 in _parseaddr.py (AddrlistClass.getaddrlist) be a
good place to integrate it?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11783
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-12 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

I just realized that part of my v1 patch did not conform to PEP 7; I hope I
fixed that in v2.

Please also excuse the wrong name of the error message patch; it was
supposed to be named issue-11828-error-msg-tests.patch.

--
Added file: http://bugs.python.org/file21626/issue-11828-v2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-12 Thread Torsten Becker

Changes by Torsten Becker torsten.bec...@gmail.com:


Added file: http://bugs.python.org/file21627/issue-11828-error-msg-tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-12 Thread Torsten Becker

Changes by Torsten Becker torsten.bec...@gmail.com:


Removed file: 
http://bugs.python.org/file21623/issue-8282-error-message-tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-12 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

Hi, since nobody stopped me by complaining about the approach or the first 
patch, I now fixed this for bytes and bytearray as well. :)

I renamed the old _ParseTupleFinds function to stringlib_parse_tuple_finds and
added a parameter for the function name and another one that controls whether
it does Unicode conversion.  I now use this helper function throughout all 3
files.

I am new to writing C code for Python, so any comments on how to improve the 
patch are welcome.

--
Added file: http://bugs.python.org/file21629/issue-11828-v3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11783] email parseaddr and formataddr should be IDNA aware

2011-04-11 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> modulo some English wording that I'll fix up when I commit it.

Yeah, sorry about that; I seem to have trouble writing good documentation.
:)  I'll have a look at the documents referenced by [1] to improve my writing.

> The issue with the '@' is that it might not be there.

I added a fix and a test for this in v2.  However, when reading through the RFC 
[2] and Wikipedia [3], it seems like this is not actually allowed.

Is there a way to internationalize the local-part as well?  That is the only 
part which is missing now that domain and real name are covered.


[1]: http://docs.python.org/devguide/docquality.html
[2]: http://tools.ietf.org/html/rfc5322#section-3.4
[3]: http://en.wikipedia.org/wiki/Email_address#Invalid_email_addresses

--
Added file: http://bugs.python.org/file21614/issue-11783-v2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11783
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-11 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

Hi, I started working on a first patch for this.  unicodeobject.c already has a
function, _ParseTupleFinds(), which does the proper parsing for this kind of
argument; I adapted it to be usable for startswith() and endswith() besides
find() and friends.

In issue-11828-v1.patch I fixed this for startswith() and endswith().  count()
suffered from the same behavior, so I updated it as well.
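The intended semantics can be modelled in pure Python, because slice.indices() already maps None to the default bounds and clips negative or out-of-range values. A minimal sketch with hypothetical helper names (normalize_bounds(), startswith_sketch()); it is not the C patch, and it may differ from the built-in methods in corner cases such as an empty prefix with a start beyond the end of the string.

```python
def normalize_bounds(length, start=None, end=None):
    """Map optional start/end, where None means "use the default", to indices."""
    # slice.indices() treats None as "from the beginning" / "to the end" and
    # resolves negative or out-of-range values against the given length.
    start, end, _step = slice(start, end).indices(length)
    return start, end


def startswith_sketch(s, prefix, start=None, end=None):
    """Pure-Python stand-in for str.startswith() with None-tolerant bounds."""
    lo, hi = normalize_bounds(len(s), start, end)
    return s[lo:hi].startswith(prefix)


print(normalize_bounds(5))                               # (0, 5)
print(normalize_bounds(5, -2, None))                     # (3, 5)
print(startswith_sketch("hello world", "world", 6, None))  # True
```

On Python versions that include this fix, the built-in methods accept None directly, for example "hello".startswith("he", None, None).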

--
keywords: +patch
nosy: +torsten.becker
Added file: http://bugs.python.org/file21620/issue-11828-v1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11828] startswith and endswith don't accept None as slice index

2011-04-11 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

While working on this, I discovered another problem.  find() and friends all
use the same parsing function (_ParseTupleFinds()), so when an error occurs,
the exception message always starts with find(), even if index() or rfind()
caused the error:

>>> "asd".index("x", None, None, None)
TypeError: find() takes at most 3 arguments (4 given)

I attached a patch (issue-8282-error-message-tests.patch) which adds test cases 
for the wrong error messages.

I was thinking about fixing this as well but wanted to make sure my approach is
correct first:

  - I would like to add another argument to _ParseTupleFinds(): const char
*function_name
  - in _ParseTupleFinds(): allocate a buffer of 50 chars on the stack to hold
"O|OO:" + function name
  - copy "O|OO:" into the buffer
  - copy min(strlen(function_name), 44) chars from function_name into the
buffer and NUL-terminate it
  - use the buffer as the format argument of PyArg_ParseTuple()
  - change all calls of _ParseTupleFinds to include the function name as first
argument

Would that approach work with Python's C style or are there any Python-specific 
helper functions I could use?
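The proposal can be sketched in Python to check the buffer arithmetic. In PyArg_ParseTuple() format strings, the text after ':' is used as the function name in error messages, so the buffer must hold the 5-character prefix "O|OO:", at most 44 characters of the name, and the terminating NUL (5 + 44 + 1 = 50). build_format(), parse_tuple_finds() and the constants are hypothetical stand-ins, not CPython code; the error text mimics the message shown above.

```python
FORMAT_PREFIX = "O|OO:"
BUFFER_SIZE = 50  # the proposed stack buffer, including the NUL terminator


def build_format(function_name):
    """Return the format string the proposal would build for function_name."""
    # Room left after the prefix and the trailing NUL byte: 50 - 5 - 1 = 44.
    room = BUFFER_SIZE - len(FORMAT_PREFIX) - 1
    # Truncate to min(len, room); copying more would overflow the C buffer.
    return FORMAT_PREFIX + function_name[:room]


def parse_tuple_finds(function_name, args):
    """Accept (sub[, start[, end]]) and report errors under the caller's name."""
    if len(args) > 3:
        raise TypeError("%s() takes at most 3 arguments (%d given)"
                        % (function_name, len(args)))
    if len(args) < 1:
        raise TypeError("%s() takes at least 1 argument (%d given)"
                        % (function_name, len(args)))
    # Missing optional arguments default to None, i.e. "use the default".
    sub, start, end = (tuple(args) + (None, None))[:3]
    return sub, start, end


print(build_format("index"))                  # O|OO:index
print(parse_tuple_finds("index", ("x", 1)))   # ('x', 1, None)
```

Passing the name through keeps one shared parser while each method reports its own name.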

--
Added file: 
http://bugs.python.org/file21623/issue-8282-error-message-tests.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11828
___



[issue11783] email parseaddr and formataddr should be IDNA aware

2011-04-09 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> Have a nice weekend!

Thank you for the wishes, I hope yours is going well, too!

I added IDNA awareness to formataddr() and parseaddr(), updated the docs and 
wrote 2 tests for it.

I wasn't sure whether IDNA awareness should be optional via an argument or
always enabled automatically; I favored the latter.

Also, is it safe to split at "@" and encode/decode the last component?  I am
not familiar with all the unusual forms an email address can take under a
strict reading of the RFCs.
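A minimal sketch of the split-and-convert approach, assuming the domain is everything after the last "@" and that Python's built-in 'idna' codec (which implements IDNA 2003, RFC 3490) is acceptable. rpartition() is used because a quoted local part may itself contain "@". The local part is left untouched, and values without "@" are returned unchanged. idna_encode_addr() and idna_decode_addr() are hypothetical names, not the patch's code.

```python
def idna_encode_addr(addr):
    """Encode the domain after the last '@' to its ASCII (xn--) form."""
    local, sep, domain = addr.rpartition("@")
    if not sep:
        # No '@' at all: nothing that looks like a domain, leave it alone.
        return addr
    return local + sep + domain.encode("idna").decode("ascii")


def idna_decode_addr(addr):
    """Inverse of idna_encode_addr(): turn xn-- labels back into Unicode."""
    local, sep, domain = addr.rpartition("@")
    if not sep:
        return addr
    # The 'idna' codec decodes bytes, so ASCII-encode the domain first.
    return local + sep + domain.encode("ascii").decode("idna")


print(idna_encode_addr("hans@d\xf6m.ain"))      # hans@xn--dm-fka.ain
print(idna_decode_addr("hans@xn--dm-fka.ain"))  # hans@döm.ain
```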

--
keywords: +patch
Added file: http://bugs.python.org/file21595/issue-11783-v1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11783
___



[issue11783] email parseaddr and formataddr should be IDNA aware

2011-04-08 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

I was about to look into this over the weekend, but of course I don't
want to steal your fun, Steffen. :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11783
___



[issue1690608] email.utils.formataddr() should be rfc2047 aware

2011-04-07 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

Hi David, thank you for polishing up the patch and committing it. :)
I am glad I could help, and I was actually about to ask you whether you knew
of any follow-up issues.  I'll definitely continue contributing as time
allows.  I have not submitted the agreement yet, but I'll look into that
ASAP.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1690608
___



[issue8269] Missing return values for PyUnicode C/API functions

2011-04-03 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

Hi, I read through unicodeobject.c and added the (IMO) proper reference counts 
to the missing functions.  I attached a first patch which adds this to 
Doc/data/refcounts.dat.

The patch also fixes two minor glitches in Doc/c-api/unicode.rst:
PyUnicode_DecodeMBCSStateful stated int instead of Py_ssize_t for its
arguments, and PyUnicode_FromString had its return value wrongly formatted.

--
keywords: +patch
nosy: +torsten.becker
Added file: http://bugs.python.org/file21514/issue-8269-v1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8269
___



[issue1690608] email.utils.formataddr() should be rfc2047 aware

2011-03-28 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

I incorporated that change as well.  My rationale behind the previous version
was to be consistent with how Lib/email/header.py handled this; unfortunately,
I did not look around in the other classes and didn't think about that kind of
compatibility.

When formataddr() is called with an object which is not a string and which does
not have a header_encode method, it will now raise the following exception:

 AttributeError: 'CharsetMock' object has no attribute 'header_encode'

Thank you for your patience, and sorry that taking four iterations on this
patch probably cost you more time than implementing it yourself would have.

--
Added file: http://bugs.python.org/file21436/issue-1690608-v4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1690608
___



[issue1690608] email.utils.formataddr() should be rfc2047 aware

2011-03-27 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

I implemented a basic test for the issue and an attempt at a fix.

I am not entirely sure about my implementation; specifically, I would like to
get comments on the following points:

  - Is it OK that formataddr() will now check whether the address is ASCII-safe
and, if not, raise a UnicodeEncodeError?

  - I was not sure about the style for appending new tests to test_email.py.  I
just put the new test into the same spot where all the other formataddr() tests
were; should I put it at the end instead?
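The ASCII check from the first question can be expressed in a few lines. A minimal sketch; is_ascii_address() is a hypothetical helper. Current Python 3 releases of email.utils.formataddr() appear to perform the same check on the address and raise UnicodeEncodeError for non-ASCII input, which the last lines below rely on.

```python
from email.utils import formataddr


def is_ascii_address(addr):
    """Return True if addr contains only ASCII characters."""
    try:
        addr.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii_address("hans@example.com"))  # True
print(is_ascii_address("hans@d\xf6m.ain"))   # False

try:
    formataddr(("Hans", "hans@d\xf6m.ain"))
except UnicodeEncodeError as exc:
    # The non-ASCII domain is rejected instead of being written out raw.
    print("rejected:", exc.reason)
```

Raising early keeps a non-ASCII address from silently ending up in a header; IDNA-encoding the domain first is the way to make such an address acceptable.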


I am submitting this patch as part of my preparation for the Google Summer of
Code to familiarize myself with the contribution process; any feedback on what
I should do differently is very welcome.

--
keywords: +patch
nosy: +torsten.becker
Added file: http://bugs.python.org/file21429/issue-1690608.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1690608
___



[issue1690608] email.utils.formataddr() should be rfc2047 aware

2011-03-27 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

> However, there should be a test for that, and I'm curious to know what
> happens if you use such an address in an address field in the unmodified
> email package.

I added a test to check whether the exceptions get thrown when an address is
invalid.

I also added a small test to check how a resulting message should look; it
looks good to me, but I am not an email specialist.  Do you have any other
ideas how to check that it does not have a negative impact on other parts of
the module?


> Instead of directly calling bencode, you should use the charset module and
> its header_encode method.  Note that you need to turn the charset into a
> Charset instance first.  The advantage of doing this is that it will choose
> the best encoding to use based on the charset and the contents of the
> string.

The code also uses email.charset.Charset now.
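For reference, the Charset-based encoding suggested above looks roughly like this in current Python 3. Charset("utf-8") is configured to pick the shorter of base64 and quoted-printable for headers, which is base64 for this name. The name is taken from the test earlier in this thread, and the address is a placeholder.

```python
from email.charset import Charset
from email.utils import formataddr

name = "H\xe4ns W\xfcrst"

# header_encode() is a method of Charset instances, not of charset names.
charset = Charset("utf-8")
encoded_name = charset.header_encode(name)
print(encoded_name)  # =?utf-8?b?SMOkbnMgV8O8cnN0?=

# formataddr() applies the same encoding to a non-ASCII display name.
print(formataddr((name, "hans@example.com")))
# =?utf-8?b?SMOkbnMgV8O8cnN0?= <hans@example.com>
```

Because formataddr() only calls header_encode() on whatever charset object it is given, any object providing that method works, which is why a mock without it fails with AttributeError.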

--
Added file: http://bugs.python.org/file21431/issue-1690608-v2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1690608
___



[issue1690608] email.utils.formataddr() should be rfc2047 aware

2011-03-27 Thread Torsten Becker

Torsten Becker torsten.bec...@gmail.com added the comment:

I incorporated the changes as you suggested and added the text to the docs.  
Just out of curiosity, why are the docs repeated in email.util.rst when they 
are already in the docstrings?

--
Added file: http://bugs.python.org/file21434/issue-1690608-v3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1690608
___