Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-12 Thread Antoine Pitrou
On Fri, 12 Sep 2014 07:54:56 +0100
Jeff Allen wrote:
> Simply having a block "for private use" seems to create an unmanaged 
> space for conflict, reminiscent of the "other 128 characters" in 
> bilingual programming. I wondered if the way to respect use by 
> applications might be to make it private to a particular sub-class of 
> str, idly however.

It's not private from Python's point of view, it's actually specified
in a PEP. So all Python 3 code has to follow the rule, and there's no
conflict internally.

The characters shouldn't leak out to other applications, unless the
user's code does its I/O very badly :-)
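
To illustrate the rule, here is a minimal sketch assuming CPython 3 and
its built-in "surrogateescape" error handler from PEP 383. The sample
bytes are made up for the example. It shows the round trip, and that a
strict encode refuses the smuggled character rather than writing it out.

```python
# Sketch: PEP 383 "surrogateescape" round trip (sample data is invented).
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", errors="surrogateescape") == raw

# A strict encode rejects the lone surrogate, so it cannot leak silently.
try:
    text.encode("utf-8")
    leaked = True
except UnicodeEncodeError:
    leaked = False
assert not leaked
```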

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-12 Thread Jim J. Jewett



On September 11, 2014, Jeff Allen wrote:

> ... the area of code point
> space used for the smuggling of bytes under PEP-383 is not a 
> "Unicode Private Use Area", but a portion of the trailing surrogate 
> range. This is a code violation, which I imagine is why 
> "surrogateescape" is an error handler, not a codec.

True, but I believe that is a CPython implementation detail.

Other implementations (including Jython) should implement the
surrogateescape API, but I don't think it is important to use the
same internal representation for the invalid bytes.

(Well, unless you want to communicate with external tools (GUIs?)
that directly use that particular internal encoding, effectively as
bytes rather than strings, when communicating with Python.)
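
For reference, a small sketch of what CPython 3 currently exposes (an
observation about CPython only, not a claim about what other
implementations must do): the handler is looked up by name like any
other error handler, and the code points it produces are visible to
ordinary Python code.

```python
import codecs

# "surrogateescape" is registered as an error handler, not as a codec.
handler = codecs.lookup_error("surrogateescape")
assert callable(handler)

# In CPython, every non-ASCII byte decoded this way becomes a code point
# in U+DC80..U+DCFF, which Python code can inspect with ord().
escaped = bytes(range(0x80, 0x100)).decode("ascii", errors="surrogateescape")
code_points = [ord(ch) for ch in escaped]
assert code_points == list(range(0xDC80, 0xDD00))
```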

> lone surrogates preclude a naive use of the platform string library

Invalid input often causes problems.  Are you saying that there are
situations where the platform string library could easily handle
invalid characters in general, but has a problem with the specific
case of lone surrogates?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Summary of Python tracker Issues

2014-09-12 Thread Python tracker

ACTIVITY SUMMARY (2014-09-05 - 2014-09-12)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4652 (+12)
  closed 29509 (+38)
  total  34161 (+50)

Open issues with patches: 2196 


Issues opened (39)
==================

#16662: load_tests not invoked in package/__init__.py
http://bugs.python.org/issue16662  reopened by haypo

#22343: Install bash activate script on Windows when using venv
http://bugs.python.org/issue22343  opened by marfire

#22344: Reorganize unittest.mock docs into linear manner
http://bugs.python.org/issue22344  opened by py.user

#22347: mimetypes.guess_type("//example.com") misinterprets host name 
http://bugs.python.org/issue22347  opened by vadmium

#22348: Documentation of asyncio.StreamWriter.drain()
http://bugs.python.org/issue22348  opened by martius

#22350: nntplib file write failure causes exception from QUIT command
http://bugs.python.org/issue22350  opened by vadmium

#22351: NNTP constructor exception leaves socket for garbage collector
http://bugs.python.org/issue22351  opened by vadmium

#22352: Ensure opcode names and args fit in disassembly output
http://bugs.python.org/issue22352  opened by ncoghlan

#22354: Highlite tabs in the IDLE
http://bugs.python.org/issue22354  opened by Christian.Kleineidam

#22355: inconsistent results with inspect.getsource() / inspect.getsou
http://bugs.python.org/issue22355  opened by isedev

#22356: mention explicitly that stdlib assumes gmtime(0) epoch is 1970
http://bugs.python.org/issue22356  opened by akira

#22357: inspect module documentation makes no reference to __qualname_
http://bugs.python.org/issue22357  opened by isedev

#22359: Remove incorrect uses of recursive make
http://bugs.python.org/issue22359  opened by Sjlver

#22360: Adding manually offset parameter to str/bytes split function
http://bugs.python.org/issue22360  opened by cwr

#22361: Ability to join() threads in concurrent.futures.ThreadPoolExec
http://bugs.python.org/issue22361  opened by dktrkranz

#22362: Warn about octal escapes > 0o377 in re
http://bugs.python.org/issue22362  opened by serhiy.storchaka

#22363: argparse AssertionError with add_mutually_exclusive_group and 
http://bugs.python.org/issue22363  opened by Zacrath

#22364: Unify error messages of re and regex
http://bugs.python.org/issue22364  opened by serhiy.storchaka

#22365: SSLContext.load_verify_locations(cadata) does not accept CRLs
http://bugs.python.org/issue22365  opened by Ralph.Broenink

#22366: urllib.request.urlopen shoudl take a "context" (SSLContext) ar
http://bugs.python.org/issue22366  opened by alex

#22367: Please add F_OFD_SETLK, etc support to fcntl.lockf
http://bugs.python.org/issue22367  opened by Andrew.Lutomirski

#22370: pathlib OS detection
http://bugs.python.org/issue22370  opened by Antony.Lee

#22371: tests failing with -uall and http_proxy and https_proxy set
http://bugs.python.org/issue22371  opened by doko

#22374: Replace contextmanager example and improve explanation
http://bugs.python.org/issue22374  opened by terry.reedy

#22376: urllib2.urlopen().read().splitlines() opening a directory in a
http://bugs.python.org/issue22376  opened by alanoe

#22377: %Z in strptime doesn't match EST and others
http://bugs.python.org/issue22377  opened by cool-RR

#22378: SO_MARK support for Linux
http://bugs.python.org/issue22378  opened by jpv

#22379: Empty exception message of str.join
http://bugs.python.org/issue22379  opened by fossilet

#22382: sqlite3 connection built from apsw connection should raise Int
http://bugs.python.org/issue22382  opened by wtonkin

#22384: Tk.report_callback_exception kills process when run with pytho
http://bugs.python.org/issue22384  opened by Aivar.Annamaa

#22385: Allow 'x' and 'X' to accept bytes-like objects in string forma
http://bugs.python.org/issue22385  opened by ncoghlan

#22387: Making tempfile.NamedTemporaryFile a class
http://bugs.python.org/issue22387  opened by Antony.Lee

#22388: Unify style of "Contributed by" notes
http://bugs.python.org/issue22388  opened by serhiy.storchaka

#22389: Generalize contextlib.redirect_stdout
http://bugs.python.org/issue22389  opened by barry

#22390: test.regrtest should complain if a test doesn't remove tempora
http://bugs.python.org/issue22390  opened by haypo

#22391: MSILIB truncates last character in summary information stream
http://bugs.python.org/issue22391  opened by Kevin.Phillips

#22392: Clarify documentation of __getinitargs__
http://bugs.python.org/issue22392  opened by David.Gilman

#22393: multiprocessing.Pool shouldn't hang forever if a worker proces
http://bugs.python.org/issue22393  opened by dan.oreilly

#22394: Update documentation building to use venv and pip
http://bugs.python.org/issue22394  opened by brett.cannon



Most recent 15 issues with no replies (15)
==========================================

#22394: Update documentation build

Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-12 Thread Stephen J. Turnbull
Jeff Allen writes:

 > Simply having a block "for private use" seems to create an unmanaged 
 > space for conflict,

No.  The uncharted range of human language (including recently-
invented nonsense like "emoticons" and the annual "design a character"
contest run by a newspaper in Taipei, with the grand prize being your
character gets added to the national standard IIRC, but maybe it's
just that newspaper's collection of private space characters) already
contains those conflicts.  Believe me, "private use space, manage it
yourself" was the best they could do.

I've been working with the bureaucratic insanity of the Japanese
national standard -- it took almost 3 decades before every Japanese
citizen could store their names in a computer using government-
approved codes -- and the chaos of the Taiwanese national standard --
which contains hordes of characters with one known use and no known
meaning, many of them duplicates -- for twenty years now.  Neither
approach works as well as Unicode's, despite its design-by-committee
flaws overlaid with national animosities that can flare into
linguicidal vetoes and code-space-stuffing logrolling.

 > reminiscent of the "other 128 characters" in bilingual
 > programming. I wondered if the way to respect use by applications
 > might be to make it private to a particular sub-class of str, idly
 > however.

If I understand your suggestion, that's precisely the intent of PEP
383, to make undecodable bytes in a coded character stream private.
But they need to be in the stream one way or another.  So PEP 383
chose to use a non-Unicode encoding (based on the "lone surrogate"
device invented by Markus Kuhn for utf-8b) to deal with that, and that
does effectively make those elements private to Python (but of course
not in the Unicode sense, as they're not even characters in Unicode).

But I gather the "native" Unicode type in Java doesn't allow you to
use that dodge because it checks for malformed Unicode internally (i.e.,
at a level not controllable by Jython).  So you have to embed such
stream elements in the space of Unicode characters.  You have the
option of the private space or unallocated (reserved) space.  The
latter seems like asking for trouble, and the only way to avoid it
would be to be prepared to move that data around in case of collision.
But that's precisely what I'm suggesting doing in private space.  Same
issue, either way.  Private space with a local registry seems saner.
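
A rough sketch of what "private space with a local registry" could look
like, written in Python for illustration only. Everything here is
hypothetical: the class name, the "puasmuggle" handler name, and the
choice of the BMP Private Use Area (U+E000..U+F8FF) are assumptions of
the sketch, not part of PEP 383 or of Jython. It also does not handle
input that already contains private-use characters, which is exactly
the collision case where data would have to be moved around.

```python
import codecs


class PrivateByteRegistry:
    """Hypothetical local registry: undecodable byte <-> private-use char."""

    def __init__(self, start=0xE000, end=0xF8FF):
        self._next = start
        self._end = end
        self.byte_to_char = {}
        self.char_to_byte = {}

    def char_for(self, byte):
        # Allocate a private-use code point the first time a byte is seen.
        if byte not in self.byte_to_char:
            if self._next > self._end:
                raise ValueError("private use area exhausted")
            ch = chr(self._next)
            self._next += 1
            self.byte_to_char[byte] = ch
            self.char_to_byte[ch] = byte
        return self.byte_to_char[byte]


registry = PrivateByteRegistry()


def _pua_smuggle(exc):
    # Decode-side handler only: replace each bad byte with its registered char.
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(registry.char_for(b) for b in bad), exc.end
    raise exc


codecs.register_error("puasmuggle", _pua_smuggle)


def smuggle(raw, encoding="utf-8"):
    return raw.decode(encoding, errors="puasmuggle")


def unsmuggle(text, encoding="utf-8"):
    # Private-use chars encode without error, so map them back explicitly.
    out = bytearray()
    for ch in text:
        if ch in registry.char_to_byte:
            out.append(registry.char_to_byte[ch])
        else:
            out.extend(ch.encode(encoding))
    return bytes(out)


raw = b"caf\xe9 \xff"
text = smuggle(raw)
assert text == "caf\ue000 \ue001"
assert unsmuggle(text) == raw
```

Unlike lone surrogates, these characters are valid Unicode, so a plain
strict encode will not stop them from reaching other applications; the
registry has to be consulted on every output path.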





Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-12 Thread Jeff Allen

Jim, Stephen:

It seems like we're off topic here, but to answer all as briefly as 
possible:


1. Java does not really have a Unicode type, therefore not one that 
validates. It has a String type that is a sequence of UTF-16 code units. 
There are some String methods and Character methods that deal with code 
points represented as int. I can put any 16-bit values I like in a String.
2. With proper accounting for indices, and as long as surrogates appear 
in pairs, I believe operations like find or endswith give correct 
answers about the Unicode text when applied to the UTF-16 form. This is 
an attractive implementation option, and mostly what we do.
3. I'm fixing some bugs where we get it wrong beyond the BMP, and the 
fix involves banning lone surrogates (completely). At present you can't 
type them in literals but you can sneak them in from Java.
4. I think (with Antoine) if Jython supported PEP-383 byte smuggling, it 
would have to do it the same way as CPython, as it is visible. It's not 
impossible (I think), but is messy. Some are strongly against.
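
The index accounting in point 2 can be sketched as follows. This is a
Python illustration with made-up helper names, not Jython's actual
code: it searches over UTF-16 code units and then converts the unit
index to a code point index, which is only valid while surrogates
appear in well-formed pairs.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def find_units(haystack, needle):
    """Naive search over code units; returns a unit index or -1."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


def unit_index_to_code_point_index(units, unit_index):
    """Count code points before unit_index, one per surrogate pair."""
    count = 0
    i = 0
    while i < unit_index:
        is_pair = (0xD800 <= units[i] <= 0xDBFF
                   and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        i += 2 if is_pair else 1
        count += 1
    return count


text = "a\U0001F600bc"          # one character beyond the BMP
units = utf16_units(text)       # 5 code units for 4 code points
unit_pos = find_units(units, utf16_units("bc"))
cp_pos = unit_index_to_code_point_index(units, unit_pos)
assert unit_pos == 3
assert cp_pos == text.find("bc") == 2
```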


Jeff Allen

On 12/09/2014 16:37, Jim J. Jewett wrote:
> On September 11, 2014, Jeff Allen wrote:
>
>> ... "surrogateescape" is an error handler, not a codec.
>
> True, but I believe that is a CPython implementation detail.
>
> Other implementations (including jython) should implement the
> surrogatescape API, but I don't think it is important to use the
> same internal representation for the invalid bytes.
>
>> lone surrogates preclude a naive use of the platform string library
>
> Invalid input often causes problems.  Are you saying that there are
> situations where the platform string library could easily handle
> invalid characters in general, but has a problem with the specific
> case of lone surrogates?





[Python-Dev] new hg.python.org server

2014-09-12 Thread Benjamin Peterson
I just switched hg.python.org from an OSUOSL VM to a Rackspace VM. The
new VM is a bit beefier and has what I think is better network
connectivity, so hopefully that will improve the speed of repository
operations. We also now support HTTPS for repository browsing and
cloning, so update all your links to https://hg.python.org! IPv6 support
has also returned for those who like that sort of thing.

Note the host keys changed, so you'll probably have to futz with
known_hosts to quiet ssh down. I apologize, but I noticed that the
previous RSA host key was 1024 bits, so I decided to upgrade it to 2048
during the transition.

Thanks to Donald Stufft for helping me set this up.


Re: [Python-Dev] [python-committers] new hg.python.org server

2014-09-12 Thread Raymond Hettinger
On Sep 12, 2014, at 5:34 PM, Benjamin Peterson wrote:

>  The
> new VM is a bit beefier and has what I think is better network
> connectivity, so hopefully that will improving the speed of repository
> operations.

Thanks Benjamin, the repo is noticeably faster.


Raymond



Re: [Python-Dev] [python-committers] new hg.python.org server

2014-09-12 Thread Shorya Raj
Just wondering - are there any sys-adminy sort of tasks that could be
completed? I mean, I have some (note, some) experience doing this, and I
wouldn't mind helping out (I inquired in the buildbot thread as well, but
there wasn't much of a response).


Thanks
Shorya Raj

On Sat, Sep 13, 2014 at 1:02 PM, Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:

> On Sep 12, 2014, at 5:34 PM, Benjamin Peterson 
> wrote:
>
>  The
> new VM is a bit beefier and has what I think is better network
> connectivity, so hopefully that will improving the speed of repository
> operations.
>
>
> Thanks Benjamin, the repo is noticeably faster.
>
>
> Raymond


Re: [Python-Dev] [python-committers] new hg.python.org server

2014-09-12 Thread Benjamin Peterson


On Fri, Sep 12, 2014, at 21:52, Shorya Raj wrote:
> Just wondering - are there any sys-adminy sort of tasks that could be
> completed? I mean, I have some (note, some) experience doing this, and I
> wouldn't mind helping out (I inquired in the buildbot thread as well, but
> there wasn't much of a response).

Well, hg.python.org is basically done now. The main thing now is
understanding how other services (planet.python.org, bugs.python.org)
are set up and moving them to config management.