[Python-Dev] Re: Are "Batteries Included" still a Good Thing? [was: It's now time to deprecate the stdlib urllib module]

2022-03-30 Thread Toshio Kuratomi
On Tue, Mar 29, 2022, 10:55 AM Brett Cannon  wrote:

>
>
> On Tue, Mar 29, 2022 at 8:58 AM Ronald Oussoren 
> wrote:
>
>>
>>
>> On 29 Mar 2022, at 00:34, Brett Cannon  wrote:
>>
>>
>>
>> On Mon, Mar 28, 2022 at 11:52 AM Christopher Barker 
>> wrote:
>>
>>> On Mon, Mar 28, 2022 at 11:29 AM Paul Moore  wrote:
>>>


>> Having such a policy is a good thing and helps in evolving the stdlib,
>> but I wonder if the lack of such a document is the real problem.   IMHO the
>> main problem is that the CPython team is very small and therefore has
>> little bandwidth for maintaining, let alone evolving, large parts of the
>> stdlib.  In that it doesn’t help that some parts of the stdlib have APIs
>> that make it hard to make modifications (such as distutils where
>> effectively everything is part of the public API).  Shrinking the stdlib
>> helps in the maintenance burden, but feels as a partial solution.
>>
>
> You're right that is the fundamental problem. But for me this somewhat
> stems from the fact that we don't have a shared understanding of what the
> stdlib *is*,  and so the stdlib is a bit unbounded in its size and scope.
> That leads to a stdlib which is hard to maintain. It's just like dealing
> with any scarce resource: you try to cut back on your overall use as best
> as you can and then become more efficient with what you must still consume;
> I personally think we don't have an answer to the "must consume" part of
> that sentence that leads us to "cut back" to a size we can actually keep
> maintained so we don't have 1.6K open PRs.
>

One of the things that's often missed in discussions is that a *good*
policy document can also help grow the number of maintainers.

As just one example, I found two interesting items in the discussion
started by Skip about determining what modules don't have maintainers, just
downstream of this. (1) There's a file which matches maintainers to modules
in the stdlib (this is documented but I only found out about it a few years
ago and Skip, who's been around even longer than me, didn't know about it
either... So this means something about how our policy docs are currently
structured could be improved).  (2) Terry brought up that you don't have to
be a core maintainer in order to take up ownership of something in the
stdlib. That's awesome!  But this is definitely something I didn't know.
I've been "focusing"[1] on becoming a core maintainer because I thought
that was a prerequisite to getting anything done in the stdlib. Knowing
that getting involved with stdlib maintenance is different could be vastly
helpful.

[1] focusing is the wrong word... It expresses the feeling of "directed
action" correctly but doesn't convey the lack of activity that sprinkles my
attempts.  Nor does it account for discouragement, helplessness, and
imposter-y feelings which are the reasons for that lack.

-toshio
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GBNDUQXWTBGCP5243L4HUU5UVLKQ7UWB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are "Batteries Included" still a Good Thing? [was: It's now time to deprecate the stdlib urllib module]

2022-03-28 Thread Toshio Kuratomi
On Sun, Mar 27, 2022, 11:07 AM Paul Moore  wrote:

> On Sun, 27 Mar 2022 at 17:11, Christopher Barker 
> wrote:
> > Back to the topic at hand, rather than remove urllib, maybe it could be
> made better -- an as-easy-to-use-as-requests package in the stdlib would be
> really great.
>
> I think that's where the mistake happens, though. Someone who needs
> "best of breed" is motivated (and likely knowledgeable enough) to make
> informed decisions about what's on PyPI. But someone who just wants to
> get the job done probably doesn't - and that's the audience for the
> stdlib. A stdlib module needs to be a good, reliable set of basic
> functionality that non-experts can use successfully. There can be
> better libraries on PyPI, but that doesn't mean the stdlib module is
> unnecessary, nor does it mean that the stdlib has to match the PyPI
> library feature for feature.
>
> So here, specifically, I'd rather see urlllib be the best urlllib it
> can be, and not demand that it turn into requests. Requests is there
> if people need/want it (as is httpx, and urllib3, and aiohttp). But
> urllib is for people who want to get a file from the web, and *not*
> have to deal with dependencies, 3rd party libraries, etc.
>


One thing that makes "make urllib more like requests" different from
comparisons with any of the other libs, though, is that requests aims to be
easier to use than anything else (which I note Chris Barker called out as
why he wanted urllib to be more like it).  I think that's important to
think about because I think ease of use is also the number one thing that
the audience you talk about is looking for.

Of course, whether an API like requests' is actually easier to
use than urllib or merely more featureful is open to debate.

-toshio
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4ZQC4H7HD3UXFT3CONU64YPOQBSPUTVY/


[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Toshio Kuratomi
This is an excellent enumeration of some of the concerns!

One minor comment about the introductory material:

On Mon, Nov 1, 2021 at 5:21 AM Petr Viktorin  wrote:

> >
> > Introduction
> > 
> >
> > Python code is written in `Unicode`_ – a system for encoding and
> > handling all kinds of written language.

Unicode specifies the mapping of characters to code points.  Then a second
mapping from code points to sequences of bytes is what is actually
recorded by the computer.  The second mapping is what programmers
using Python will commonly think of as the encoding while the majority
of what you're writing about has more to do with the first mapping.
I'd try to word this in a way that doesn't lead a reader to conflate
those two mappings.

Maybe something like this?

  `Unicode`_ is a system for handling all kinds of written language.
It aims to allow any character from any human natural language (as
well as a few characters which are not from natural languages) to be
used. Python code may consist of almost all valid Unicode characters.
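
As a small illustration of the two mappings discussed above (character to
code point, then code point to bytes), here is a minimal Python sketch. The
sample character is an arbitrary choice; the point is only that the code
point stays fixed while the bytes depend on which encoding is picked.

```python
# First mapping (Unicode itself): a character corresponds to a code point.
ch = "\u00e9"                    # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)             # the integer code point, 0xE9

# Second mapping (an encoding): a code point becomes a sequence of bytes.
# The same code point yields different bytes under different encodings.
as_utf8 = ch.encode("utf-8")         # two bytes
as_latin1 = ch.encode("latin-1")     # one byte
as_utf16le = ch.encode("utf-16-le")  # two bytes, different from UTF-8

print(hex(code_point), as_utf8, as_latin1, as_utf16le)
```

The security concerns in the pre-PEP mostly live at the first level (which
characters exist and how they look), not at the level of byte encodings.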

> > While this allows programmers from all around the world to express 
> > themselves,
> > it also allows writing code that is potentially confusing to readers.
> >

-Toshio
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Q2T3GKC6R6UH5O7RZJJNREG3XQDDZ6N4/


[Python-Dev] Re: [python-committers] Resignation from Stefan Krah

2020-10-09 Thread Toshio Kuratomi
On Fri, Oct 9, 2020, 5:30 AM Christian Heimes  wrote:

> On 09/10/2020 04.04, Ivan Pozdeev via Python-Dev wrote:
> > I don't see the point of requiring to "write an apology", especially
> > *before a 12-month ban*. If they understand that their behavior is
> > wrong, there's no need for a ban, at least not such a long one; if they
> > don't, they clearly aren't going to write it, at least not now (they
> > might later, after a few weeks or months, having cooled down and thought
> > it over). So all it would achieve is public shaming AFAICS. Same issue
> > with the threat of "zero tolerance policy" -- it's completely
> > unnecessary and only serves to humiliate and alienate the recipient.
>
>
> I have been the victim of Stefan's CoC violations on more than one
> occasion. He added me to nosy list of a ticket just to offend and
> humiliate me. For this reason I personally asked the SC to make a
> sincere apology a mandatory requirement for Stefan's reinstatement as a
> core dev.
>
> I would have been fine with a private apology. However Stefan has also
> verbally attacked non-core contributors. In one case another core dev
> and I contacted the contribute in private to apologize and ensure that
> the contributor was not alienated by Stefan's attitude. Therefore it
> makes sense that the SC has requested a public, general apology.
>
> Why are you more concerned with the reputation of a repeated offender
> and not with the feelings of multiple victims of harassment? As a victim
> of Stefan's behavior I feel that an apology is the first step to
> reconcile and rebuild trust.
>

At the risk of putting my nose in where it doesn't belong... I think that
Ivan has some good general points.  And I think that they could be
distilled as this: if you are looking to correct bad behavior but allow a
contributor to learn about proper behavior and then return to the
community, the steps taken here seem counter-productive (1).  I would add a
second piece to that: If, on the other hand, the goal is to remove a toxic
person from the community who needs to go through major personality-shifting
changes before they can be allowed back, then this may be appropriate (2).

For (1), what I'm getting from Ivan's post is that these measures are at a
level where few (if any) people would be willing to fulfill them and then
come back as a non-bitter contributor. When the requirements are too
costly for the violator to pay, they won't be able to learn and then pay
those costs until they can disavow their former selves.  "I'm sorry I acted
like that; I was a *different person* back then. I'm sorry that *past me*
felt the need to hurt you."

I would think that in general, not necessarily this specific case, the
steering committee would want to try taking steps to get people to learn
proper behavior first and only resort to something which amounts to a de
facto permanent ban when it becomes apparent that the person has to go
through some major personality changes before their behavior will change.

For (2), the steering committee is charged with protecting the community at
large. A toxic person can cause great havoc by themselves and set the tone
of a community such that other people feel that engaging in bad behavior is
the proper thing to do in this community.  With that in mind, at some
point, this kind of action has to be on the table.  It is great that PEP 13
lists banning as a possibility so that people know where their actions can
lead.

One thing I would suggest, though, is documenting and, in general,
following a sequence of progressively more strict interventions by the
steering committee.  I think that just as it is harmful to the community to
let bad behavior slide, it is also harmful to the community to not know
that the steering committee's enforcement is in measured steps which will
telegraph the committee's intentions and the member's responsibilities well
in advance.

This specific case may already have been out of hand by the time it came to
the committee (the steering committee is relatively new and problems could
have festered before they formed and started governing), but a new member of
the community should know that if they step out of line, the committee will
make it apparent to them what the expectations are and whether their
ongoing behavior is putting them onto a disciplinary track well before that
discipline gets to the point of a one year ban and a public apology.

Thanks for reading,
-Toshio

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IDFQDRHRA2JJ6OJAK2265UHCBEI45PIM/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread Toshio Kuratomi
On Mon, Aug 5, 2019 at 6:47 PM  wrote:
>
> I wish people with more product management experience would chime in; 
> otherwise, 3.8 is going to ship with an intentional hard-to-ignore annoyance 
> on the premise that we don't like the way people have been programming and 
> that they need to change their code even if it was working just fine.
>

I was resisting weighing in since I don't know the discussion around
deprecating this language feature in the first place (other than
what's given in this thread).  However, in the product I work on we
made a very similar change in our last release so I'll throw it out
there for people to take what they will from it.

We have a long standing feature which allows people to define groups
of hosts and give them a name.  In the past that name could include
dashes, dots, and other characters which are not legal as Python
identifiers.  When users use those group names in our "DSL" (not truly
a DSL but close enough), they can do it using either dictionary-lookup
syntax (groupvars['groupname']) or using dotted attribute notation
(groupvars.groupname).  We also have a longstanding problem where users
will try to do something like groupvars.group-name using the
dotted attribute notation with group names that aren't proper Python
identifiers.  This causes problems as the name then gets split on the
characters that aren't legal in identifiers and results in something
unexpected (undefined variable, an actual subtraction operation, etc).
In our last release we decided to deprecate and eventually make it
illegal to use non-python-identifiers for the group names.
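
To illustrate the parsing problem described above, here is a minimal sketch
in plain Python (not the actual templating language the product uses). The
GroupVars class and the group names are hypothetical stand-ins; the only
point is that a dash in dotted notation is read as a subtraction, while the
dictionary-lookup form is unaffected.

```python
import ast


class GroupVars(dict):
    """Hypothetical container allowing both groupvars['x'] and groupvars.x."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


groupvars = GroupVars({"webservers": ["host1"], "group-name": ["host2"]})

# Dictionary-lookup syntax works for any name.
by_key = groupvars["group-name"]

# Dotted notation works only for names that are valid identifiers.
by_attr = groupvars.webservers

# "groupvars.group-name" is parsed as (groupvars.group) - (name).
tree = ast.parse("groupvars.group-name", mode="eval")
is_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(
    tree.body.op, ast.Sub
)

print(by_key, by_attr, is_subtraction, "group-name".isidentifier())
```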

At first, product management *did* let us get away with this.  But
after some time and usage of the pre-releases, they came to realize
that this was a major problem.  Users had gotten used to being able
to use these characters in their group names.  They had defined their
group names and gotten used to typing their group names and built up a
whole body of playbooks that used these group names.

Product management still let us get away with this... sort of. The
scope of the change was definitely modified.  Users were now allowed
to select whether invalid group names were disallowed (so they could
port their installations), allowed with a warning (presumably so they
could do work but also see that they were affected) or allowed without a
warning (presumably because they knew not to use these group names
with dotted attribute notation).  This feature was also no longer
allowed to be deprecated... We could have a warning that said "Don't
do this" but not remove the feature in the future.

Now... I said this was a config option.  So what we do have in the
release is that the config option allows but warns by default and *the
config option* has a deprecation warning.  You see... we're planning
on changing from warn by default now to disallowing by default in the
future so the deprecation is flagging the change in config value.
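
The three-way setting described above could be sketched roughly as follows.
This is not the product's real code: the InvalidGroupNames enum, the
check_group_name function, and the messages are all hypothetical names made
up for illustration, and the validity test is simply Python's
str.isidentifier().

```python
import warnings
from enum import Enum


class InvalidGroupNames(Enum):
    """Hypothetical config values for handling non-identifier group names."""

    DISALLOW = "disallow"  # refuse the name outright
    WARN = "warn"          # accept the name but emit a warning
    ALLOW = "allow"        # accept the name silently


def check_group_name(name, mode=InvalidGroupNames.WARN):
    """Return name if acceptable under mode; warn or raise otherwise."""
    if name.isidentifier():
        return name
    if mode is InvalidGroupNames.DISALLOW:
        raise ValueError(f"invalid group name: {name!r}")
    if mode is InvalidGroupNames.WARN:
        warnings.warn(
            f"group name {name!r} is not a valid identifier",
            UserWarning,
            stacklevel=2,
        )
    return name
```

In this sketch the default is WARN, matching the "allows but warns by
default" behavior; switching the default to DISALLOW later is the change
that the deprecation on the config option is announcing.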

And you know what?  Users absolutely hate this.  They don't like the
warning.  They don't like the implication that they're doing something
wrong by using a long-standing feature.  They don't like that we're
going to change the default so that their current group names will
break.  They dislike that it's being warned about because of
attribute-lookup-notation which they can just learn not to use with
their group names.  They dislike this so much that some of us have
talked about abandoning this idea... instead, having a public group
name that users use when they write in the "DSL" and an internal group
name that we use when evaluating the group names. Perhaps that works,
perhaps it doesn't, but I think that's where my story starts being
specific to our feature and no longer applicable to Python and escape
sequences.

Now like I said, I don't know the discussions that led to invalid
escape sequences being deprecated so I don't know whether there are more
compelling reasons for doing it but it seems to me that there's even
less to gain by doing this than what we did in Ansible.  The thing
Ansible is complaining about can do the wrong thing when used in
conjunction with certain other features of our "DSL".  The thing that
the Python escape sequence warning is complaining about is never actually
wrong (as was pointed out, it's complaining when a sequence of two
characters will do what the user intended rather than complaining when a
sequence of two characters will do something that the user did not intend).
Like the Ansible feature, though, the problem is that over time we've
discovered that it is hard to educate users about the exact
characteristic of the feature (\k == k but \n == newline;
groupvars['group-name']  works but groupvars.group-name does not) so
we've both given up on continuing to educate the users in favor of
attempting to nanny the user into not using the feature.  That most
emphatically has not worked for us and has spent a bunch of goodwill
with our users but the python userbase is not 

Re: [Python-Dev] Compile-time resolution of packages [Was: Another update for PEP 394...]

2019-02-27 Thread Toshio Kuratomi
On Tue, Feb 26, 2019 at 2:07 PM Neil Schemenauer 
wrote:

> On 2019-02-26, Gregory P. Smith wrote:
> > On Tue, Feb 26, 2019 at 9:55 AM Barry Warsaw  wrote:
> > For an OS distro provided interpreter, being able to restrict its use to
> > only OS distro provided software would be ideal (so ideal that people who
> > haven't learned the hard distro maintenance lessons may hate me for it).
>
> This idea has some definite problems.  I think enforcing it via convention
is about as much as would be good to do.  Anything more and you make it
hard for people who really need to use the vendor provided interpreter from
being able to do so.

Why might someone need to use the distro provided interpreter?

* Vendor provides some python modules in their system packages which are
not installable from pip (possibly even a proprietary extension module, so
not even buildable from source or copyable from the system location) which
the end user needs to use to do something to their system.
* End user writes a python module which is a plugin to a system tool which
has to be installed into the system python from which that system tool
runs.  The user then wants to write a script which uses the system tool
with the plugin in order to do something to their system outside of the
system tool (perhaps the system tool is GUI-driven and the user wants to
automate a part of it via the python module).  They need their script to
use the system python so that they are using the same code as the system
tool itself would use.

There are probably other scenarios where the costs of locking the user out
of the system python outweigh the benefits but these are the ones that I've
run across lately.

-Toshio
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inclusion of lz4 bindings in stdlib?

2018-11-29 Thread Toshio Kuratomi
On Thu, Nov 29, 2018, 6:56 AM Benjamin Peterson 
>
> On Wed, Nov 28, 2018, at 15:27, Steven D'Aprano wrote:
> > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote:
> >
> > > PyPI makes getting more algorithms easy.
> >
> > Can we please stop over-generalising like this? PyPI makes getting
> > more algorithms easy for *SOME* people. (Sorry for shouting, but you
> > just pressed one of my buttons.)
> >
> > PyPI might as well not exist for those who cannot, for technical or
> > policy reasons, install addition software beyond the std lib on the
> > computers they use. (I hesitate to say "their computers".)
> >
> > In many school or corporate networks, installing unapproved software can
> > get you expelled or fired. And getting approval may be effectively
> > impossible, or take months of considerable effort navigating some
> > complex bureaucratic process.
> >
> > This is not an argument either for or against adding LZ4, I have no
> > opinion either way. But it is a reminder that "just get it from PyPI"
> > represents an extremely privileged position that not all Python users
> > are capable of taking, and we shouldn't be so blase about abandoning
> > those who can't to future std lib improvements.
>
> While I'm sympathetic to users in such situations, I'm not sure how much
> we can really help them. These are the sorts of users who are likely to
> still be stuck using Python 2.6. Any stdlib improvements we discuss and
> implement today are easily a decade away from benefiting users in
> restrictive environments. On that kind of timescale, it's very hard to know
> what to do, especially since, as Paul says, we don't hear much feedback
> from such users.
>

As a developer of software that has to run in such environments, having a
library be in the stdlib is helpful as it is easier to convince the rest of
the team to bundle a backport of something that's in a future stdlib than a
random package from pypi.  Stdlib inclusion gives the library a known
future and a (perhaps illusory, perhaps real) blessing from the core devs
that helps to sell the library as the preferred solution.

-Toshio



Re: [Python-Dev] A fast startup patch (was: Python startup time)

2018-05-05 Thread Toshio Kuratomi
On Sat, May 5, 2018, 10:40 AM Eric Fahlgren <ericfahlg...@gmail.com> wrote:

> On Sat, May 5, 2018 at 10:30 AM, Toshio Kuratomi <a.bad...@gmail.com>
> wrote:
>
>> On Fri, May 4, 2018, 7:00 PM Nathaniel Smith <n...@pobox.com> wrote:
>>
>>> What are the obstacles to including "preloaded" objects in regular .pyc
>>> files, so that everyone can take advantage of this without rebuilding the
>>> interpreter?
>>>
>>
>> Would this make .pyc files arch specific?
>>
>
> Or have parallel "pyh" (Python "heap") files, that are architecture
> specific... (But that would cost more stat calls.)
>

I ask because arch specific byte code files are a big change in consumers'
expectations.  It's not necessarily a bad change but it should be
communicated to downstreams so they can decide how to adjust to it.

Linux distros which ship byte code files will need to build them for each
arch, for instance.  People who ship just the byte code as an obfuscation
of the source code will need to decide whether to ship packages for each
arch they care about or change how they distribute.
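
For context on why this would be a change in expectations: today the name
of a cached byte code file carries an interpreter tag but no architecture
tag. A minimal sketch that shows the current naming (the file name spam.py
is just a placeholder, and the exact tag depends on the interpreter running
the code):

```python
import importlib.util
import sys

# Tag identifying the interpreter implementation and version that wrote
# the byte code; it says nothing about the CPU architecture.
cache_tag = sys.implementation.cache_tag

# Where the byte code for a (hypothetical) spam.py would be cached.
pyc_path = importlib.util.cache_from_source("spam.py")

print(cache_tag)
print(pyc_path)
```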

-Toshio



Re: [Python-Dev] A fast startup patch (was: Python startup time)

2018-05-05 Thread Toshio Kuratomi
On Fri, May 4, 2018, 7:00 PM Nathaniel Smith  wrote:

> What are the obstacles to including "preloaded" objects in regular .pyc
> files, so that everyone can take advantage of this without rebuilding the
> interpreter?
>

Would this make .pyc files arch specific?

-Toshio


Re: [Python-Dev] Deprecate PEP 370 Per user site-packages directory?

2018-01-13 Thread Toshio Kuratomi
On Jan 13, 2018 9:08 AM, "Christian Heimes"  wrote:

Hi,

PEP 370 [1] was my first PEP that got accepted. I created it exactly one
decade and two days ago for Python 2.6 and 3.0.


I didn't know I had you to thank for this!  Thanks Christian!  This is one
of the best features of the python software packaging ecosystem!  I almost
exclusively install into user site packages these days.  It lets me pull in
the latest version of software when I want it for everyday use and revert
to what my system shipped with if the updates break something.  It's let me
install libraries ported to python3 before my distro got around to
packaging the updates.  It's let me perform an install when I want to test
my packages as my users might be using it without touching the system
dirs.  It's been a godsend!


Fast forward 10 years...

Nowadays Python has venv in the standard library. The user-specific
site-packages directory is no longer that useful. I would even say it's
causing more trouble than it's worth. For example it's common for system
script to use "#!/usr/bin/python3" shebang without -s or -I option.


With great power comes great responsibility...

Sure, installing something into user site packages can break system
scripts.  But it can also fix them.  I can recall breaking system scripts
twice by installing something into user site packages (both times, the
tracebacks rapidly led me to the reason that the scripts were failing).
As a counter point to that I can recall *fixing* problems in system scripts
by installing newer libraries into site packages twice in the last two
months.  (I've also fixed system software by installing into user and then
modifying that version but I do that less frequently... Perhaps only a
couple times a year...)

Removing the user site packages also doesn't prevent people from making
local changes that break system scripts (removing the pre-configuration of
user site packages does not stop Python from honoring PYTHONPATH); it only
makes people work a little harder to place their overridden packages into a
location that python will find and leads to nonstandard locations for these
overrides. This will make it harder for people to troubleshoot the problems
other people may be having.  Today we can ask "do you have any libraries
in .local in your tracebacks?" as an easy first troubleshooting step.
Without the user site packages standard we'll be back to trying to
determine which directories are official for the user's install and then
finding any local directories that their site may have defined for
overrides.
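
The "is anything coming from the user site?" troubleshooting step mentioned
above can also be done programmatically. A minimal sketch using the stdlib
site module; the helper name modules_from_user_site is made up for this
example:

```python
import os
import site
import sys

# The standard per-user site-packages location defined by PEP 370.
user_site = site.getusersitepackages()

# True if the user site is enabled, False if disabled by the user
# (for example with -s), None if disabled for security or by an admin.
user_site_enabled = site.ENABLE_USER_SITE


def modules_from_user_site():
    """Return {module name: file} for loaded modules under the user site."""
    prefix = os.path.abspath(user_site) + os.sep
    found = {}
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path and os.path.abspath(path).startswith(prefix):
            found[name] = path
    return found


print(user_site, user_site_enabled)
print(modules_from_user_site())
```

Because the location is standardized, the same few lines work on any
install; with ad hoc PYTHONPATH overrides there is no single place to look.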

I propose to deprecate the feature and remove it in Python 4.0.


Although I don't like the idea of system scripts adding -s and -I because
it prevents me from fixing them for my own use by installing just a newer
or modified library into user site packages (similar to how C programs can
use overridden libraries via LD_LIBRARY_PATH), it seems that if you want to
prevent users from choosing to use their own libraries with system scripts,
the right thing to do is to get changes into setuptools and distutils that
allow adding those flags.  Those flags will do a much more thorough job of
preventing this usage than removing user site packages can.
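
A quick way to see what the -s and -I options discussed above do is to ask
a child interpreter for its flags. This is only a sketch; the helper name
flags_with is made up, and it assumes the running interpreter can be
re-invoked via sys.executable.

```python
import subprocess
import sys


def flags_with(option):
    """Run a child interpreter with option; return (no_user_site, isolated)."""
    result = subprocess.run(
        [
            sys.executable,
            option,
            "-c",
            "import sys; print(sys.flags.no_user_site, sys.flags.isolated)",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return tuple(int(value) for value in result.stdout.split())


# -s only disables the user site directory.
print(flags_with("-s"))
# -I is isolated mode, which implies -s (and -E) among other things.
print(flags_with("-I"))
```

A system script would get the same effect from a shebang line such as
"#!/usr/bin/python3 -s".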

-Toshio


Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

2017-12-10 Thread Toshio Kuratomi
On Dec 9, 2017 8:53 PM, "INADA Naoki"  wrote:

> Earlier versions of PEP 538 thus included "en_US.UTF-8" on the
> candidate target locale list, but that turned out to cause assorted
> problems due to the "C -> en_US" part of the coercion.

Hm, but PEP 538 says:

> this PEP instead proposes to extend the "surrogateescape" default for
stdin and stderr error handling to also apply to the three potential
coercion target locales.

https://www.python.org/dev/peps/pep-0538/#defaulting-to-surrogateescape-error-handling-on-the-standard-io-streams

I don't think en_US.UTF-8 should use surrogateescape error handler.


Could you explain why not? utf-8 seems like the common thread for using
surrogateescape so I'm not sure what would make en_US.UTF-8 different than
C.UTF-8.
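
For readers less familiar with the error handler under discussion, here is
a minimal sketch of what surrogateescape does with UTF-8. The sample bytes
are arbitrary; they are Latin-1 encoded text, which is not valid UTF-8.

```python
raw = b"caf\xe9"  # not valid UTF-8: 0xE9 starts an incomplete sequence

# Strict decoding refuses the data.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_ok, ascii(text), round_tripped)
```

Nothing in this behavior depends on the language or territory part of the
locale name; it is a property of the UTF-8 codec plus the error handler.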

-Toshio


Re: [Python-Dev] Proposal: go back to enabling DeprecationWarning by default

2017-11-07 Thread Toshio Kuratomi
On Nov 7, 2017 5:47 AM, "Paul Moore"  wrote:

On 7 November 2017 at 13:35, Philipp A.  wrote:
> Sorry, I still don’t understand how any of this is a problem.
>
> If you’re an application developer, google “python disable
> DeprecationWarning” and paste the code you found, so your users don’t see
> the warnings.
> If you’re a library developer, and a library you depend on raises
> DeprecationWarnings without it being your fault, file an issue/bug there.
>
> For super-increased convenience in case 2., we could also add a
convenience
> API that blocks deprecation warnings raised from certain module or its
> submodules.
> Best, Philipp

If you're a user and your application developer didn't do (1) or a
library developer developing one of the libraries your application
developer chose to use didn't do (2), you're hosed. If you're a user
who works in an environment where moving to a new version of the
application is administratively complex, you're hosed.

As I say, the proposal prioritises developer convenience over end user
experience.


I don't agree with this characterisation.  Even if we assume a user isn't
going to fix a DeprecationWarning they still benefit: (1) if they're a
sysadmin it will warn them that they need to be careful when upgrading a
dependency. (2) if the developer never hears about the DeprecationWarning
then it is ultimately the user who suffers when the tool they depend on
breaks without warning so seeing and reporting the DeprecationWarning helps
the end user. (3) if DeprecationWarnings are allowed to linger through
multiple releases, it may tell the user about the quality of the software
they're using.

More information is helpful to end users.  Developers are actually the ones
that it inconveniences as we'll be the ones grumbling when an end user who
hasn't evaluated the deprecation cycles of upstream projects as we have
demands immediate changes for deprecations that are still years away from
causing problems.  But unlike end users, we do have the ability to solve
that by turning those deprecations off in our code if we've done our due
diligence (or even if we haven't done our due diligence).
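
As a reminder of how little code "turning those deprecations off" takes,
here is a minimal sketch using the stdlib warnings module. The old_api
function is a hypothetical stand-in for a deprecated call in some upstream
library.

```python
import warnings


def old_api():
    """Hypothetical deprecated function from an upstream library."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Silence DeprecationWarning only around the call we have already evaluated.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    value = old_api()

print(value)
```

An application that wants the filter for its whole run can call
warnings.simplefilter("ignore", DeprecationWarning) once at startup instead
of using the context manager.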

-Toshio


Re: [Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

2017-05-04 Thread Toshio Kuratomi
On Sat, Mar 4, 2017 at 11:50 PM, Nick Coghlan  wrote:
>
> Providing implicit locale coercion only when running standalone
> ---------------------------------------------------------------
>
> Over the course of Python 3.x development, multiple attempts have been made
> to improve the handling of incorrect locale settings at the point where the
> Python interpreter is initialised. The problem that emerged is that this is
> ultimately *too late* in the interpreter startup process - data such as
> command line arguments and the contents of environment variables may have
> already been retrieved from the operating system and processed under the
> incorrect ASCII text encoding assumption well before ``Py_Initialize`` is
> called.
>
> The problems created by those inconsistencies were then even harder to
> diagnose and debug than those created by believing the operating system's
> claim that ASCII was a suitable encoding to use for operating system
> interfaces. This was the case even for the default CPython binary, let alone
> larger C/C++ applications that embed CPython as a scripting engine.
>
> The approach proposed in this PEP handles that problem by moving the locale
> coercion as early as possible in the interpreter startup sequence when
> running standalone: it takes place directly in the C-level ``main()``
> function, even before calling in to the ``Py_Main()`` library function that
> implements the features of the CPython interpreter CLI.
>
> The ``Py_Initialize`` API then only gains an explicit warning (emitted on
> ``stderr``) when it detects use of the ``C`` locale, and relies on the
> embedding application to specify something more reasonable.
>

It feels like having a short section on the caveats of this approach
would help to introduce this section.  Something that says that this
PEP can cause a split in how Python behaves in non-standalone
applications (mod_wsgi, IDEs where libpython is compiled in, etc) vs
standalone (unless the embedders take similar steps as standalone
python is doing).  Then go on to state that this approach was still
chosen as coercing in Py_Initialize is too late, causing the
inconsistencies and problems listed here.
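
A rough model of the "similar steps" an embedder would take before
initialising the interpreter.  This only sketches the decision logic in
Python; the PEP does the real work in C, and the default target locale name
used here is an assumption that a real implementation would have to check
for availability.

```python
def effective_ctype(env):
    """Return the locale name POSIX rules select for LC_CTYPE given env."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def coerce_c_locale(env, target="C.UTF-8"):
    """Return a copy of env with LC_CTYPE overridden when the locale is C/POSIX."""
    new_env = dict(env)
    # LC_ALL overrides LC_CTYPE, so setting LC_CTYPE would have no effect
    # when LC_ALL is present; leave such environments untouched.
    if effective_ctype(env) in ("C", "POSIX") and not env.get("LC_ALL"):
        new_env["LC_CTYPE"] = target
    return new_env
```

An embedding application that skips a step like this keeps the C locale,
which is the behavioural split described above.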

-Toshio


Re: [Python-Dev] Why does base64 return bytes?

2016-06-14 Thread Toshio Kuratomi
On Jun 14, 2016 8:32 AM, "Joao S. O. Bueno"  wrote:
>
> On 14 June 2016 at 12:19, Steven D'Aprano  wrote:
> > Is there
> > a good reason for returning bytes?
>
> What about: it returns 0-255 numeric values for each position in a
> stream, with no clue whatsoever to how those values map to text
> characters beyond the 32-128 range?
>
> Maybe base64.decode could take an "encoding" optional parameter - or
> there could be a separate "decode_to_text" method that would explicitly
> take a text codec name.
> Otherwise, no, you simply can't take a bunch of bytes and say they
> represent text.
>
Although it's not explicit, the question seems to be about the output of
encoding (and for symmetry, the input of decoding).  In both of those
cases, valid output will consist only of ascii characters.

The input to encoding would have to remain bytes (that's the main purpose
of base64... to turn bytes into an ascii string).
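
A short illustration of that point in Python 3, using only the standard
base64 module: the encoder takes bytes and returns bytes, but every byte of
the result is an ASCII character, so converting it to text is always safe.

```python
import base64

raw = bytes(range(256))            # arbitrary binary input
encoded = base64.b64encode(raw)    # returned as bytes in Python 3

# The encoded form uses only ASCII characters, so decoding it with the
# ascii codec cannot fail.
text = encoded.decode("ascii")

# Decoding accepts either the bytes form or the ASCII str form.
assert base64.b64decode(encoded) == raw
assert base64.b64decode(text) == raw
```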

-Toshio


Re: [Python-Dev] Request for pronouncement on PEP 493 (HTTPS verification backport guidance)

2015-11-26 Thread Toshio Kuratomi
On Nov 26, 2015 4:53 PM, "Nick Coghlan"  wrote:
>
> On 27 November 2015 at 03:15, Barry Warsaw  wrote:

>
> > Likewise in Ubuntu, we try to keep deviations from Debian at a minimum,
> > and document them when we must deviate.  Ubuntu is a community driven
> > distro so while Canonical itself has customers, it's much more likely
> > that feedback about the Python stack comes from ordinary users.  Again,
> > my personal goal is to make Python on Ubuntu a pleasant and comfortable
> > environment, as close to installing from source as possible, consistent
> > with the principles and policies of the project.
>
> I'd strongly agree with that description for Fedora and
> softwarecollections.org, but for the RHEL/CentOS system Python I think
> the situation is slightly different: there, the goal is to meet the
> long term support commitments involved in being a base RHEL package.
> As the nominal base version of the package (2.7.5 in the case of RHEL
> 7) doesn't change, there is naturally going to be increasing
> divergence from the nominal version.

I think the goal in rhel/centos is similar, actually.  The maintenance
burden for non upstream changes has been acknowledged as a problem to be
avoided by rhel maintainers before.  The caveat for those distributions is
that they accumulate more *backports*.

However, backports are easier to maintain than non upstream changes.  The
test of the upstream community helps to find and fix bugs in the code; the
downstream maintainer just needs to stay aware of whether fixes are going
into the code they've backported.

> I tried to go down the "upstream first" path with a properly supported
> "off switch" in PEP 476, and didn't succeed (hence the monkeypatch
> compromise). It sounds like several folks would like to see us revisit
> that decision, though.
>
That's the rub.  If there's now enough support to push this upstream I
think everyone downstream will be happier.  If it turns out there's still
enough resistance to keep it from upstream then I suppose you cross that
bridge if it becomes necessary.

-Toshio


Re: [Python-Dev] Request for pronouncement on PEP 493 (HTTPS verification backport guidance)

2015-11-24 Thread Toshio Kuratomi
On Tue, Nov 24, 2015 at 10:08 AM, Paul Moore  wrote:

> I'm not actually sure that it's the place of this PEP to even comment
> on what the long term answer for such environments should be (or
> indeed, any answer, long term or not). I've argued the position that
> in some organisations it feels like security don't actually understand
> the issues of carefully balancing secure operation against flexible
> development practices,

I agree with this.

> but conversely it's certainly true that in many
> organisations, they *have* weighed the various arguments and made an
> informed decision on how to set up their internal network. It's
> entirely possible that self-signed certificates are entirely the right
> decision for their circumstances. Why would a Python PEP be qualified
> to comment on that decision?

I don't quite agree with this but it's probably moot in the face of
the previous and certain cornercases.  Self-signed certs work just
fine with the backports to python-2.7.9+ but you have to add the ca to
the clients.  An organization that has weighed the arguments and made
an informed decision to use self-signed certs should either do this
(to prevent MITM) or they should switch to using http instead of https
(because MITM isn't a feasible attack here).  The cornercases come
into play because you don't always control all of the devices and
services on your network.  The site could evaluate and decide that
MITM isn't a threat to their switch's configuration interface but that
interface might be served over https using a certificate signed by
their network vendor who doesn't give out their ca certificate (simply
stated: your security team knows what they're doing but your vendor's
does not).
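
For the first option (adding the CA to the clients), a minimal sketch using
the standard ssl module.  The ca_bundle argument is a hypothetical path to a
PEM file holding the organization's own CA certificate; nothing in this
thread specifies where such a file would live.

```python
import ssl


def context_for_internal_ca(ca_bundle=None):
    """Build a verifying client context.

    ca_bundle is a hypothetical path to the organization's CA certificate
    in PEM format; None falls back to the system certificate store.
    """
    # create_default_context() turns on certificate and hostname checks.
    return ssl.create_default_context(cafile=ca_bundle)


ctx = context_for_internal_ca()
```

The resulting context can be handed to client code that accepts one, for
example urllib.request.urlopen(url, context=ctx).  This does not help with
the cornercase above, where the vendor's CA certificate is not available.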

> In my opinion, we should take all of the value judgements out of this
> paragraph, and just state the facts. How about:
>
> """
> In order to provide additional flexibility to allow infrastructure
> administrators to provide the appropriate solution for their
> environment, this PEP offers a way for administrators to upgrade to
> later versions of the Python 2.7 series without being forced to update
> their existing security certificate management infrastructure as a
> prerequisite.
> """

Two notes:

* python-2.7.9+ doesn't give you flexibility in this regard so
organizations do have to update their certificate management
infrastructure.  The cornercase described above becomes something that
has to be addressed at the code level.  Environments that are simply
misconfigured have to be fixed.  So in that regard, a value judgement
does seem appropriate here.  the judgement is "Listen guys, this PEP
advises redistributors on how they might provide a migration path for
you but it *does not bandaid the problem indefinitely*.  So long term,
you have to change your practices or you'll be out in the cold when
your redistributor upgrades to python-2.7.9+"

* Your proposed text actually removes the fix that I was adding --
this version of the paragraph implies that if your environment is
compatible with the redistributors' python-2.7.8 (or less) then it
will also be compatible with the redistributors' python-2.7.9+.  That
is not true.  Whether or not we take out any value judgement as to the
user's present environment this paragraph needs to be fixed to make it
clear that this only affects redistributor's packages which have
backported pep 476 to python-2.7.8 or older.  Once the redistributor
updates to a newer python, sites which relied on this crutch will
break.

-Toshio


Re: [Python-Dev] Request for pronouncement on PEP 493 (HTTPS verification backport guidance)

2015-11-24 Thread Toshio Kuratomi
On Mon, Nov 23, 2015 at 5:59 PM, Barry Warsaw  wrote:

> I'm concerned about accepting PEP 493 making a strong recommendation to
> downstreams.  Yes, in an ideal world we all want security by default, but I
> think the backward compatibility concerns of the PEP are understated,
> especially as they relate to a maintenance release of a stable long term
> support version of the OS.  I don't want PEP 493 to be a cudgel that people
> beat us up with instead of having an honest discussion of the difficult
> trade-offs involved.
>
It sounds like the implementation sections of the PEP are acceptable
but that the PEP's general tone seems to assume that distributors are
champing at the bit to backport and that the recommendations here make
it so that backporting is a no-brainer -- which does not seem to
reflect the real-world?

I think the tone could be changed to address that as it doesn't seem
like forcing distros to backport is a real goal of the PEP.  The main
purposes of the PEP seem to be:

* Enumerate several ways that distributors can backport these 2.7.9
features to older releases
* Allow programmers to detect the presence of the features from their code
* Give end-users the ability to choose between backwards compatibility
and enhanced security
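
For the second purpose, detecting the features from code, one possible
check is sketched below.  It relies on the private names that PEP 476
documents (ssl._create_unverified_context and
ssl._create_default_https_context); whether a given backport exposes exactly
these names is an assumption, not something this thread confirms.

```python
import ssl


def has_pep476_escape_hatch(ssl_module=ssl):
    """True if the interpreter exposes PEP 476's unverified-context factory."""
    return hasattr(ssl_module, "_create_unverified_context")


def describe_https_default(ssl_module=ssl):
    """Report which HTTPS default this interpreter appears to ship with."""
    if not has_pep476_escape_hatch(ssl_module):
        return "pre-PEP 476: no verification by default"
    default_factory = getattr(ssl_module, "_create_default_https_context", None)
    if default_factory is ssl_module._create_unverified_context:
        return "PEP 476 present, verification switched off"
    return "PEP 476 present, verification on by default"
```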

Here's some ideas for changing the tone:

  Abstract
  ========

  PEP 476 updated Python's default handling of HTTPS certificates to be
  appropriate for communication over the public internet. The Python 2.7 long
  term maintenance series was judged to be in scope for this change, with the
  new behaviour introduced in the Python 2.7.9 maintenance release.

+ Change to "PEP 476 updated Python's default handling of HTTPS
certificates to validate that the certs belonged to the server".  This
way we're saying what the change is rather than making a value
judgement of whether people who don't choose to backport are
"appropriate" or not.  Appropriate-ness is probably best left as an
argument in the text of PEP 476.

  This PEP provides recommendations to downstream redistributors wishing to
  provide a smoother migration experience when helping their users to manage
  this change in Python's default behaviour.

+ Change to "downstream redistributors wishing to backport the
enhancements in a way that allows users to choose between backwards
compatible behaviour or more secure certificate handling."  As Barry
noted, this PEP doesn't change the amount of work needed to migrate.
It does, however, give users some choice in when they are going to
perform that work.  Additionally, this isn't simply about distributors
who want to make the transition smoother... (there's no downstreams
that want to make it "more painful" are there? ;-)  It's really about
making backporting of the enhancements less painful for users.

  Rationale
  =========

  PEP 476 changed Python's default behaviour to better match the needs and
  expectations of developers operating over the public internet, a category
  which appears to include most new Python developers. It is the position of
  the authors of this PEP that this was a correct decision.

  However, it is also the case that this change *does* cause problems for
  infrastructure administrators operating private intranets that rely on
  self-signed certificates, or otherwise encounter problems with the new default
  certificate verification settings.

+ per Barry's message, it would be good to either devote a paragraph to
the backwards compatibility implications here or link to
https://www.python.org/dev/peps/pep-0476/#backwards-compatibility

  The long term answer for such environments is to update their internal
  certificate management to at least match the standards set by the public
  internet, but in the meantime, it is desirable to offer these administrators
  a way to continue receiving maintenance updates to the Python 2.7 series,
  without having to gate that on upgrades to their certificate management
  infrastructure.

+ The wording here seems to reflect a different scope than merely
backporting by distros.  Perhaps we should change it to: "[...]set by
the public internet.  Distributions may wish to help these sites
transition by backporting the PEP 476 changes to earlier versions of
python in a way that does not require the administrators to upgrade
their certificate management infrastructure immediately.  This would
allow sites to choose to use the distribution supplied python in a
backwards compatible fashion until their certificate management
infrastructure was updated and then toggle their site to utilize the
more secure features provided by PEP 476."

[...]

  These designs are being proposed as a recommendation for redistributors,
  rather than as new upstream features, as they are needed purely to support
  legacy environments migrating from older versions of Python 2.7. Neither
  approach is being proposed as an upstream Python 2.7 feature, nor as a
  feature in any version of Python 3 (whether

Re: [Python-Dev] Request for pronouncement on PEP 493 (HTTPS verification backport guidance)

2015-11-24 Thread Toshio Kuratomi
On Tue, Nov 24, 2015 at 10:56 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On 24 November 2015 at 18:37, Toshio Kuratomi <a.bad...@gmail.com> wrote:

>> The cornercases come
>> into play because you don't always control all of the devices and
>> services on your network.  The site could evaluate and decide that
>> MITM isn't a threat to their switch's configuration interface but that
>> interface might be served over https using a certificate signed by
>> their network vendor who doesn't give out their ca certificate (simply
>> stated: your security team knows what they're doing but your vendor's
>> does not).
>
> This sounds like a similar situation to what I described above. I'm
> not sure I'd see these as corner cases, though - they are pretty much
> day to day business in my experience :-(
>
It sounds like you're coming from a Windows background and I'm coming
from a Linux background which might be a small disconnect here -- we
do seem to be in agreement that what's "right to do" isn't always easy
or possible for the client to accomplish so I think we should probably
leave it at that.

>>> In my opinion, we should take all of the value judgements out of this
>>> paragraph, and just state the facts. How about:
>>>
>>> """
>>> In order to provide additional flexibility to allow infrastructure
>>> administrators to provide the appropriate solution for their
>>> environment, this PEP offers a way for administrators to upgrade to
>>> later versions of the Python 2.7 series without being forced to update
>>> their existing security certificate management infrastructure as a
>>> prerequisite.
>>> """
>>
>> Two notes:
>>
>> * python-2.7.9+ doesn't give you flexibility in this regard so
>> organizations do have to update their certificate management
>> infrastructure.  The cornercase described above becomes something that
>> has to be addressed at the code level.  Environments that are simply
>> misconfigured have to be fixed.  So in that regard, a value judgement
>> does seem appropriate here.  the judgement is "Listen guys, this PEP
>> advises redistributors on how they might provide a migration path for
>> you but it *does not bandaid the problem indefinitely*.  So long term,
>> you have to change your practices or you'll be out in the cold when
>> your redistributor upgrades to python-2.7.9+"
>
> Hmm, maybe I misread the PEP (I only skimmed it - as I say, Linux is
> of limited interest to me). I thought that the environment variable
> gave developers a "get out" clause. Maybe it's not what we want them
> to do (for some value of "we") but isn't that the point of the PEP?
>
> Admittedly if distributions don't *implement* that part of the PEP
> (and I understand Red Hat haven't) then people are still stuck. But
> "this PEP offers a way" is not incompatible with "your vendor didn't
> implement the PEP so you're still stuck, sorry"...
>

Yeah, I think you are correct in your understanding of what actual
changes to the python distribution are being proposed for
redistributors in the PEP.  Reading through the PEP again, I'm not
sure if I'm correct in thinking that this only applies to
backporting... it may be that the environment section of the PEP
applies to any python-2 while the config file section only applies to
backporting.  Nick, could you clarify?

The PEP is clear that it doesn't apply to python-3 or cross-distro.
So that means that sites still can't rely on this long-term (but their
long term would extend to the lifetime of their vendor supporting
python2 rather than when their vendor updated to 2.7.9+) and also that
developers can't depend on this if they're developing portable code.

>> * Your proposed text actually removes the fix that I was adding --
>> this version of the paragraph implies that if your environment is
>> compatible with the redistributors' python-2.7.8 (or less) then it
>> will also be compatible with the redistributors' python-2.7.9+.  That
>> is not true.  Whether or not we take out any value judgement as to the
>> user's present environment this paragraph needs to be fixed to make it
>> clear that this only affects redistributor's packages which have
>> backported pep 476 to python-2.7.8 or older.  Once the redistributor
>> updates to a newer python sites which relied on this crutch will
>> break.
>
> Sorry for that. Certainly getting the facts right is crucial, and it
> looks like my suggestion didn't do that. But hopefully someone can fix
> it up (if people think it's a good way to go).

Could be that I'm wrong -- will wait for Nick to clarify before I
think about what could be done to make this wording better.

-Toshio


[Python-Dev] Fwd: Request for pronouncement on PEP 493 (HTTPS verification backport guidance)

2015-11-24 Thread Toshio Kuratomi
On Nov 24, 2015 6:28 AM, "Laura Creighton"  wrote:
>
> In a message of Tue, 24 Nov 2015 14:05:53 +, Paul Moore writes:
> >Simply adding "people who have no control over their broken
> >infrastructure" with a note that this PEP helps them, would be
> >sufficient here (and actually helps the case for the PEP, so why not?
> >;-))
>
> But does it help them?  Or does it increase the power of those who
> hand out certificates and who are intensely security conscious over
> those who would like to get some work done this afternoon?
>
My reading is that it will help more people but lockdown environments can
still trump their users if they wish.

If a distribution wishes to give users of older python versions the option
of verifying certificates then they will need to backport changes
authorized by previous peps.  By themselves, those changes would make it so
environment owners and application authors are in complete control.  If an
application is coded to do cert verification and the remote end has
certificates that aren't recognized as valid on the client end then the
user would have to change the client application code to be able to use it
in their environment (or figure out how to get the ca for the remote end
into their local certificate store... in extreme cases, this might be
impossible - the ca cert has been lost or belongs to another company).

This pep tells distributions how they might give the client end a bit more
power when they backport.  The settings file allows the client to toggle
verification site wide.  The environment variable allows clients to toggle
it per application invocation.  Both of these situations are better for a
client than having the backport and nothing else.  Both of these can be
shut down by an environment owner with sufficient authority to limit what's
running on the client (not sure the scope of the environment owner's powers
here so I thought I should acknowledge this factor).

So basically: backporting other peps (to increase security) will subtract
power from the clients.  This pep specifies several facilities the
backporters can implement to give some of that power back to the clients.
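
A sketch of how the two facilities could combine on the client.  The
variable name PYTHONHTTPSVERIFY and the [https] / verify configuration key
are assumptions taken from the PEP text as later published, and the
precedence shown (environment variable first, then settings file, then the
distributor's default) is one plausible arrangement rather than what any
particular backport implements.

```python
import configparser


def verification_enabled(env, config_text="", platform_default=True):
    """Resolve whether HTTPS certificates get verified."""
    # Per-invocation toggle: the environment variable wins when set to "0".
    if env.get("PYTHONHTTPSVERIFY") == "0":
        return False

    # Site-wide toggle: a small INI file maintained by the administrator.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    setting = parser.get("https", "verify", fallback="platform_default")
    if setting == "enable":
        return True
    if setting == "disable":
        return False
    # Anything else defers to whatever the redistributor chose to ship.
    return platform_default
```

An environment owner who controls both the process environment and the
settings file controls every input to this function, which is the sense in
which these facilities can still be shut down.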

-Toshio



Re: [Python-Dev] 2.7 is here until 2020, please don't call it a waste.

2015-05-30 Thread Toshio Kuratomi
On May 30, 2015 1:56 AM, Nick Coghlan ncogh...@gmail.com wrote:

> Being ready, willing and able to handle the kind of situation created
> by the Python 2-3 community transition is a large part of what it
> means to offer commercial support for community driven open source
> projects, as it buys customers' time for either migration technologies
> to mature to a point where the cost of migration drops dramatically,
> for the newer version of a platform to move far enough ahead of the
> legacy version for there to be a clear and compelling business case
> for forward porting existing software, or (as is the case we're aiming
> to engineer for Python), both.

Earlier, you said that it had been a surprise that people were against this
change.  I'd just point out that the reason is bound up in what you say
here.  Porting performance features from python 3 to python 2 has the
disadvantage of cutting into a compelling business case for users to move
forward to python 3.[1]  so doing this has a cost to python 3 adoption.
But, the question is whether there is a benefit that outweighs that cost.
I think seeing more steady, reliable contributors to python core is a very
large payment.  Sure, for now that payment is aimed at extending the legs
on the legacy version of python but at some point in the future python 2's
legs will be well and truly exhausted.  When that happens both the
developers who have gained the skill of contributing to cpython and the
companies who have invested money in training people to be cpython
contributors will have to decide whether to give up on all of that or
continue to utilize those skills and investments by bettering python 3.
I'd hope that we can prove ourselves a welcoming enough community that
they'd choose to stay.

-Toshio

[1] In fact, performance differences are a rather safe way to build
compelling business cases for forwards porting.  Safe because it is a
difference (unlike api and feature differences) that will not negatively
affect your ability to incrementally move your code to python 3.


Re: [Python-Dev] [Distutils] Python 3.x Adoption for PyPI and PyPI Download Numbers

2015-04-21 Thread Toshio Kuratomi
On Tue, Apr 21, 2015 at 01:54:55PM -0400, Donald Stufft wrote:
 
> Anyways, I'll have access to the data set for another day or two before I
> shut down the (expensive) server that I have to use to crunch the numbers
> so if there's anything anyone else wants to see before I shut it down,
> speak up soon.
 
Where are curl and wget getting categorized in the User Agent graphs?

Just morbidly curious as to whether they're in with Browser and therefore
mostly unused or Unknown and therefore only slightly less unused ;-)

-Toshio




Re: [Python-Dev] Use ptyhon -s as default shbang for system python executables/daemons

2015-03-23 Thread Toshio Kuratomi
On Mon, Mar 23, 2015 at 03:30:23PM +0100, Antoine Pitrou wrote:
> On Mon, 23 Mar 2015 07:22:56 -0700
> Toshio Kuratomi a.bad...@gmail.com wrote:
> >
> > Building off Nick's idea of a system python vs a python for users to use, I
> > would see a more useful modification to be able to specify SPYTHONPATH (and
> > other env vars) to go along with /usr/bin/spython .  That way the user
> > maintains the capability to override specific libraries globally just like
> > with LD_LIBRARY_PATH, PATH, and similar but you won't accidentally
> > configure your own python to use one set of paths for your five python apps
> > and then have that leak over and affect system tools.
>
> I really think Donald has a good point when he suggests a specific
> virtualenv for system programs using Python.
 
The isolation is what we're seeking but I think the amount of work required
and the added complexity for the distributions will make that hard to get
distributions to sign up for.

If someone had the time to write a front end to install packages into
a single system-wide isolation unit whose backend was a virtualenv we
might be able to get distributions on-board with using that.

The front end would need to install software so that you can still invoke
/usr/bin/system-application and system-application would take care of
activating the virtualenv.  It would need to be about as simple to build
as the present python2 setup.py build/install with the flexibility in
options that the distros need to install into FHS approved paths.  Some
things like man pages, locale files, config files, and possibly other data
files might need to be installed outside of the virtualenv directory.  Many
setup.py's already punt on some of those, though, letting the user choose
to install them manually.  So this might be similar.  It would need to be able
to handle 32bit and 64bit versions of the same library installed on the same
system.  It would need to be able to handle different versions of the same
library installed on the same system (as few of those as possible but it's
an unfortunate cornercase that can't be entirely ignored even for just
system packages).  It would need a mode where it doesn't use the network at
all; only operates with the packages and sources that are present on the
box.
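
To make the first requirement concrete, here is a sketch of what a generated
/usr/bin/system-application wrapper could do.  The virtualenv location and
the module name are hypothetical, chosen only for illustration; no such
front end exists as far as this thread is concerned.

```python
import os


def venv_command(venv_dir, module, argv):
    """Command line that runs module with the virtualenv's interpreter."""
    python = os.path.join(venv_dir, "bin", "python")
    # -m runs the application's entry module.  The virtualenv's interpreter
    # finds its own site-packages, so no separate "activate" step is needed.
    return [python, "-m", module] + list(argv)


def main(argv):
    # Hypothetical location and module name, for illustration only.
    cmd = venv_command("/usr/lib/system-venv", "system_application", argv[1:])
    os.execv(cmd[0], cmd)  # replace the wrapper process with the real tool
```

The wrapper itself is the easy part; the multilib, multi-version, offline
and FHS requirements listed above are where the real work would be.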

And remember these two things: (1) we'd be asking the distros to do
a tremendous amount of work changing their packages to install into
a virtualenv instead of the python setup.py way that is well documented and
everyone's been using for ages.  It'll be a tough sell even with good
tooling.  (2) this theoretical front-end would have to appeal to the distro
maintainers so there would have to be a lot of talk to understand what
capabilities the distro maintainers would need from it.

-Toshio




Re: [Python-Dev] Use ptyhon -s as default shbang for system python executables/daemons

2015-03-23 Thread Toshio Kuratomi
-Toshio
On Mar 19, 2015 3:27 PM, Victor Stinner victor.stin...@gmail.com wrote:

> 2015-03-19 21:47 GMT+01:00 Toshio Kuratomi a.bad...@gmail.com:
> > I think I've found the Debian discussion (October 2012):
> >
> > http://comments.gmane.org/gmane.linux.debian.devel.python/8188
> >
> > Lack of PYTHONWARNINGS was brought up late in the discussion thread
>
> Maybe we need to modify -E or add a new option to only ignore PYTHONPATH.

I think pythonpath is still useful on its own.

Building off Nick's idea of a system python vs a python for users to use, I
would see a more useful modification to be able to specify SPYTHONPATH (and
other env vars) to go along with /usr/bin/spython .  That way the user
maintains the capability to override specific libraries globally just like
with LD_LIBRARY_PATH, PATH, and similar but you won't accidentally
configure your own python to use one set of paths for your five python apps
and then have that leak over and affect system tools.
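
A sketch of the environment handling such an spython launcher would need.
Both SPYTHONPATH and spython are hypothetical names from this proposal; the
real interpreter only understands PYTHONPATH, so the launcher has to
translate one into the other.

```python
def system_python_env(env):
    """Environment for a system tool: honour SPYTHONPATH, ignore PYTHONPATH."""
    # Drop the user's PYTHONPATH so it cannot leak into system tools.
    new_env = {
        key: value
        for key, value in env.items()
        if key not in ("PYTHONPATH", "SPYTHONPATH")
    }
    if "SPYTHONPATH" in env:
        # Map the hypothetical system-only variable onto the one the
        # interpreter actually reads.
        new_env["PYTHONPATH"] = env["SPYTHONPATH"]
    return new_env
```

A user who really wants to override a library for system tools can still do
so, but has to say so explicitly by setting the system-specific variable.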

-Toshio


Re: [Python-Dev] Use ptyhon -s as default shbang for system python executables/daemons

2015-03-23 Thread Toshio Kuratomi
On Mon, Mar 23, 2015 at 04:14:52PM +0100, Antoine Pitrou wrote:
> On Mon, 23 Mar 2015 08:06:13 -0700
> Toshio Kuratomi a.bad...@gmail.com wrote:
> > >
> > > I really think Donald has a good point when he suggests a specific
> > > virtualenv for system programs using Python.
> > >
> > The isolation is what we're seeking but I think the amount of work required
> > and the added complexity for the distributions will make that hard to get
> > distributions to sign up for.
> >
> > If someone had the time to write a front end to install packages into
> > a single system-wide isolation unit whose backend was a virtualenv we
> > might be able to get distributions on-board with using that.
>
> I don't think we're asking distributions anything. We're suggesting a
> possible path, but it's not python-dev's job to dictate distributions
> how they should package Python.
>
> The virtualenv solution has the virtue that any improvement we might
> put in it to help system packagers would automatically benefit everyone.
> A specific system Python would not.
>
> > The front end would need to install software so that you can still invoke
> > /usr/bin/system-application and system-application would take care of
> > activating the virtualenv.  It would need to be about as simple to build
> > as the present python2 setup.py build/install with the flexibility in
> > options that the distros need to install into FHS approved paths.  Some
> > things like man pages, locale files, config files, and possibly other data
> > files might need to be installed outside of the virtualenv directory.
>
> Well, I don't understand what difference a virtualenv would make.
> Using a virtualenv amounts to invoking a different interpreter path.
> The rest of the filesystem (man pages locations, etc.) is still
> accessible in the same way. But I may miss something :-)
 
<nod>  I think people who are saying "The system should just use
virtualenv" aren't realizing all of the reasons that's not as simple as it
sounds for distributions to implement.  Thus the work required to implement
alternate solutions like a system python may seem less to the distros
unless those issues are partially addressed at the virtualenv and
python-packaging level.

-Toshio




Re: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

2014-01-07 Thread Toshio Kuratomi
On Tue, Jan 07, 2014 at 09:26:20PM +0900, Stephen J. Turnbull wrote:
> Is this really a good idea?  PEP 460 proposes rather different
> semantics for bytes.format and the bytes % operator from the str
> versions.  I think this is going to be both confusing and a continuous
> target for further improvement until the two implementations
> converge.


Reading about the proposed differences reminded me of how in older python2
versions unicode() took keyword arguments but str.decode() only took
positional arguments.  I squashed a lot of trivial bugs in people's code
where that difference wasn't anticipated.  In later python2 versions both of
those came to understand how to take their arguments as keywords which saved
me from further unnecessary pain.
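
For comparison, here is a small sketch of the converged behaviour as it
exists in current Python 3, where the constructor form and the method form
accept the same keyword arguments.  It does not reproduce the older python2
mismatch described above; it only shows what "both spellings take keywords"
looks like.

```python
data = "café".encode("utf-8")

# Positional and keyword spellings of the same decode call.
by_position = data.decode("utf-8", "strict")
by_keyword = data.decode(encoding="utf-8", errors="strict")

# The constructor form takes the same keywords as the method form.
via_constructor = str(data, encoding="utf-8", errors="strict")
```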

-Toshio




Re: [Python-Dev] Python 3 to be default in Fedora 22

2013-10-25 Thread Toshio Kuratomi
On Fri, Oct 25, 2013 at 01:32:36PM +1000, Nick Coghlan wrote:
 
 On 25 Oct 2013 09:02, Terry Reedy tjre...@udel.edu wrote:
 
  http://lwn.net/Articles/571528/
  https://fedoraproject.org/wiki/Changes/Python_3_as_Default
 
 Note that unlike Arch, the Fedora devs currently plan to leave /usr/bin/
 python referring to Python 2 (see the User Experience part of the 
 proposal).
 
nod

The tangible changes for this are just that we're hoping to only have
python3, not python2 on our default LiveCD and cloud images.  This has been
a bit hard since many of our core packaging tools (and the large number of
release engineering, package-maintainer, distro installer, etc scripts built
on top of them) were written in python2.  The F22 release is hoping to have
a set of C libraries for those tools with both python3 and python2 bindings.
That will hopefully allow us to port the user-visible tools (installer and
things present on the selected images) to python3 for F22 while leaving the
release-engineering and packager-oriented scripts until a later Fedora
release.

-Toshio




Re: [Python-Dev] non-US zip archives support in zipfile.py

2013-10-17 Thread Toshio Kuratomi
On Tue, Oct 15, 2013 at 03:46:15PM +0200, Martin v. Löwis wrote:
 Am 15.10.13 14:49, schrieb Daniel Holth:
  It is part of the ZIP specification. CP437 or UTF-8 are the two
  official choices, but other encodings happen on Russian, Japanese
  systems.
 
 Indeed. Formally, the other encodings are not supported by the
 ZIP specification, and are thus formally misuse of the format.
 
nod  But the tools in the wild misuse the format in this manner.
CP437 assigns a character to every byte value, so zip and unzip on Linux, for
instance, take the bytes that represent the filename on the filesystem and use
those in the zip file without setting the utf-8 flag.  When the files are
extracted, the same byte sequences are used as the filenames for the new files.
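
A minimal sketch of what that means for a reader of such an archive: if a
tool hands you the name decoded as CP437 (as happens when the utf-8 flag is
not set), the original bytes can be recovered by re-encoding as CP437 and
then decoding with whatever encoding the creating system really used.  The
helper name and the default of UTF-8 are assumptions for illustration.

```python
def recover_filename(name_from_zip, real_encoding="utf-8"):
    """Undo a CP437 decode of a name that was really in another encoding.

    name_from_zip is the text obtained by decoding the stored bytes as
    CP437; real_encoding is a guess about the system that made the archive.
    """
    raw = name_from_zip.encode("cp437")  # lossless: back to the stored bytes
    return raw.decode(real_encoding)


# Simulate a Linux zip tool storing UTF-8 bytes without the utf-8 flag.
stored_bytes = "файл.txt".encode("utf-8")
seen_by_reader = stored_bytes.decode("cp437")
original = recover_filename(seen_by_reader)
```

The guess in real_encoding is the fragile part; nothing in the archive
records it, which is the problem this thread is about.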

 I believe (without having proof) that early versions of the
 specification failed to discuss the file name encoding at all,

These might be helpful:

No mention of file name encodings in this version of the spec:
http://www.pkware.com/documents/APPNOTE/APPNOTE-6.2.2.TXT

Appendix D, Language Encoding, shows up here:
http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT

(Most recent version is 6.3.2)

 making people believe that it is unspecified and always the
 system encoding (which is useless, of course, as you create
 zip files to move them across systems).

Not always.  Backups are another use.  Also it's not useless.  If the files
are being moved within an organization (or in some cases geographical
regions have standardized on an encoding in practice), the same system
encoding could very well be in use on the machines where the files end up.

-Toshio




Re: [Python-Dev] Offtopic: OpenID Providers

2013-09-09 Thread Toshio Kuratomi
On Thu, Sep 5, 2013 at 6:09 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 Barry Warsaw writes:

   We're open source, and I think it benefits our mission to support open,
   decentralized, and free systems like OpenID and Persona.

 Thus speaks an employee of yet another Provider-That-Won't-Accept-My-
 Third-Party-Credentials.  Sorry, Barry, but you see the problem:
 Unfortunately, we can't do it alone.  What needs to happen is there
 needs to be a large network of sites that support login via O-D-F
 systems like OpenID and Persona.  Too many of the sites I use (news
 sources, GMail, etc) don't support them and my browser manages my
 logins to most of them, so why bother learning OpenID, and then
 setting it up site by site?

[snipped lots of observations that I generally agree with]

There's been a lot of negativity towards OpenID in this thread -- I'd
like to say that in Fedora Infrastructure we've found OpenID to be
very very good -- but not at addressing the problem that most people
are after here.  As you've observed, being an OpenID provider is a
relatively easy-to-swallow proposition; accepting OpenID from third
parties is another thing entirely.  As you've also observed, this has
to do with trust.  A site can trust their own account system and
practices and issue OpenID based on those.  It is much riskier for the
site to trust someone else's account system and practices when
deciding whether a user is actually the owner of the account that they
claim.

So OpenID fails as a truly generic SSO method across sites on the
internet... what have we found it good for then?  SSO within our site.
 More and more apps support OpenID out of the box.  Many web
frameworks have modules for the code you write to authenticate against
an OpenID server.  A site configures these apps and modules to only
trust the site's OpenID service and then deploys them with less custom
code.  Sites also get a choice about how much risk they consider
compromised accounts to pose to a particular application.  If they run a web
forum and a build system for instance, they might constrain the build
system to only their OpenID service but let the forum accept
OpenID from other providers. And finally, having an openid service
lets their users sign into more trusting sites like python.org
properties (unlike say, LDAP) :-)
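
The per-application trust choice described above could be sketched as a
simple allowlist.  Everything here is hypothetical: the application names,
the id.example.org host, and the simplification of treating the host of the
claimed identity URL as the provider (real OpenID discovery is more
involved).

```python
from urllib.parse import urlsplit

# Hypothetical policy: None means "accept any provider".
TRUSTED_PROVIDERS = {
    "build-system": {"id.example.org"},  # only the site's own service
    "forum": None,                       # lower risk, so third parties are fine
}


def provider_allowed(app, identity_url):
    """Return True if this app is configured to trust the identity's host."""
    allowed = TRUSTED_PROVIDERS[app]
    if allowed is None:
        return True
    return urlsplit(identity_url).hostname in allowed
```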

-Toshio


Re: [Python-Dev] Offtopic: OpenID Providers

2013-09-05 Thread Toshio Kuratomi
On Thu, Sep 05, 2013 at 02:53:43PM -0400, Barry Warsaw wrote:
 
 This probably isn't the only application of these technologies, but I've
 always thought about OAuth as delegating authority to scripts and programs to
 act on your behalf.  For example, you can write a script to interact with
 Launchpad's REST API, but before you can use the script, you have to interact
 with the web ui once (since your browser is trusted, presumably) to receive a
 token which the script can then use to prove that it's acting on your behalf.
 If at some point you stop trusting that script, you can revoke the token to
 disable its access, without having to reset your password.
 
 To me, OpenID is about logging into web sites using single-sign on.  For
 example, once I've logged into Launchpad, I can essentially go anywhere that
 accepts OpenID, type my OpenID and generally not have to log in again (things
 like two-factor auth and such may change that interaction pattern).
 
 Or to summarize to a rough approximation: OpenID is for logins, OAuth is for
 scripts.
 
 Persona seems to fit the OpenID use case.  You'd still want OAuth for
 scripting.
 
nod  However, in some cases, Persona/OpenID can make more sense for
scripts.  For instance, if you have a script that is primarily interactive
in nature, it may be better to have the user log in via that script than to
have an OAuth token lying around on the filesystem all the time.
(Contrariwise, if the script is primarily run from cron or similar, it's
better to have a token with limited permissions lying around on the
filesystem than your OpenID password ;-)

It's probably also useful to point out that OAuth (because it was developed
to let third party websites have limited permission to act on your behalf)
is more paranoid than strictly required for many scripts where that
third-party is a script that you've written running on a box that you
control.  If that's the main use case for your service, OAuth may not be
a good fit for your authz needs.

-Toshio




Re: [Python-Dev] Offtopic: OpenID Providers

2013-09-05 Thread Toshio Kuratomi
On Thu, Sep 05, 2013 at 10:25:22PM +0400, Oleg Broytman wrote:
 On Thu, Sep 05, 2013 at 02:16:29PM -0400, Donald Stufft don...@stufft.io 
 wrote:
  
  On Sep 5, 2013, at 2:12 PM, Oleg Broytman p...@phdru.name wrote:
 I used to use myOpenID and became my own provider using poit[1].
   These days I seldom use OpenID -- there are too few sites that allow
   full-featured login with OpenID. The future lies in OAuth 2.0.
  
  The Auth in OAuth stands for Authorization not Authentication.
 
There is no authorization without authentication, so OAuth certainly
 performs authentication: http://oauth.net/core/1.0a/#anchor9 ,
 http://tools.ietf.org/html/rfc5849#section-3
 
Sort of.  The way OAuth looks to me, it's designed to prove that a given
client is authorized to perform an action.  It's not designed to prove that
the given client is a specific person.  In some cases, you really want to
know the latter and not merely the former.  So I think in these situations
Donald's separation of Authz and Authn makes sense.

-Toshio




Re: [Python-Dev] Python 3 as a Default in Linux Distros

2013-07-25 Thread Toshio Kuratomi
On Jul 24, 2013 6:37 AM, Brett Cannon br...@python.org wrote:
 The key, though, is adding python2 and getting your code to use that
binary  specifically so that shifting the default name is more of a
convenience than something which might break existing code not ready for
the switch.

Applicable to this, does anyone know whether distutils, setuptools,
distlib, or any of the other standard build+install tools are doing shebang
rewriting?  Are they doing the right thing wrt python vs python2?

-Toshio


Re: [Python-Dev] Python 3 as a Default in Linux Distros

2013-07-25 Thread Toshio Kuratomi
On Thu, Jul 25, 2013 at 10:25:26PM +1000, Nick Coghlan wrote:
 On 25 July 2013 20:38, Toshio Kuratomi a.bad...@gmail.com wrote:
 
  On Jul 24, 2013 6:37 AM, Brett Cannon br...@python.org wrote:
  The key, though, is adding python2 and getting your code to use that
  binary  specifically so that shifting the default name is more of a
  convenience than something which might break existing code not ready for 
  the
  switch.
 
  Applicable to this, does anyone know whether distutils, setuptools, distlib,
  or any of the other standard build+install tools are doing shebang
  rewriting?  Are they doing the right thing wrt python vs python2?
 
 It occurs to me they're almost certainly using sys.executable to set
 the shebang line, so probably not :(
 
 distutils-sig could probably offer a better answer though, most of the
 packaging folks don't hang out here.
 
Thanks!

For other Linux distributors following along, here's my message to
distutils-sig:

http://mail.python.org/pipermail/distutils-sig/2013-July/022001.html
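
To make the concern concrete, here is a sketch of shebang rewriting at
install time.  It is a hypothetical helper, not the distutils or setuptools
implementation: by default it substitutes sys.executable, which is the
behaviour Nick suspects, and a packager who wants an explicit python2 would
have to pass the interpreter path in.

```python
import re
import sys

# Matches a first line such as "#!python" or "#!/usr/bin/env python2.7 -u".
FIRST_LINE_RE = re.compile(r"^#!.*python[0-9.]*([ \t].*)?$")


def rewrite_shebang(first_line, interpreter=None):
    """Point a python shebang at `interpreter`, keeping trailing options.

    Falls back to sys.executable, i.e. whichever interpreter is running
    the install, when no explicit interpreter is given.
    """
    interpreter = interpreter or sys.executable
    match = FIRST_LINE_RE.match(first_line)
    if match is None:
        return first_line  # not a python script; leave it alone
    options = match.group(1) or ""
    return "#!" + interpreter + options
```

With the default, the resulting script is tied to the exact interpreter that
ran the install rather than to a python or python2 name.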

-Toshio




Re: [Python-Dev] Python 3 as a Default in Linux Distros

2013-07-24 Thread Toshio Kuratomi
Note: I'm the opposite number to bkabrda in the discussion on the Fedora
Lists about how quickly we should be breaking end-user expectations of what
python means.

On Wed, Jul 24, 2013 at 09:34:11AM -0400, Brett Cannon wrote:
 
 
 
 On Wed, Jul 24, 2013 at 5:12 AM, Bohuslav Kabrda bkab...@redhat.com wrote:
 
 Hi all,
 in recent days, there has been a discussion on fedora-devel (see thread
 [1]) about moving to Python 3 as a default.
 I'd really love to hear opinions on the matter from the upstream, mainly
 regarding these two points (that are not that clearly defined in my
 original proposal and have been formed during the discussion):
 
Note that the proposal is for Fedora 22.  So the timeframe for making the
switch in development is approximately 8 months from now.  Timeframe for
that release to be public is October 2014.

 - Should we point /usr/bin/python to Python 3 when we make the move?
 I know that pep 394 [2] deals with this and it says that /usr/bin/python
 may refer to Python 3 on some bleeding edge distributions - supposedly,
 this was added to the pep because of what Arch Linux did, not the other 
 way
 round.
 As the pep says, the recommendation of pointing /usr/bin/python to Python 
 2
 may be changed after the Python 3 ecosystem is sufficiently mature. I'm
 wondering if there are any more specific criteria - list of big projects
 migrated/ported or something like that - or will this be judged by what 
 I'd
 call overall spirit in Python community (I hope you know what I mean by
 this)?
 In Fedora, we have two concerns that clash in this decision - being 
 First
 (e.g. actively promote and use new technologies and also suggest them to
 our users) vs. not breaking user expectations. So we figured it'd be a 
 good
 idea to ask upstream to get more opinions on this.
 
 - What should user get after using yum install python?
 There are basically few ways of coping with this:
 1) Just keep doing what we do, eventually far in the future drop python
 package and never provide it again (= go on only with python3/python4/...
 while having yum install python do nothing).
 2) Do what is in 1), but when python is dropped, use virtual provide (*)
 python for python3 package, so that yum install python installs
 python3.
 3), 4) Rename python to python2 and {don't add, add} virtual provide
 python in the same way that is in 1), 2)

4) is my preference: the python package becomes python2, with a Virtual
Provide: python so that asking for python still gets you that package.  That
is what I'd promote for now.  Users still expect python2 when they talk about
python.  At some point in the future, people will come to pycon and talks
will apply to python3 unless otherwise specified.  People writing new blog
posts will say python and the code they use as samples won't run on the
python2 interpreter.  Expecting that to be the case in 12 months seems
premature.

 5) Rename python to python2 and python3 to python at one point. This makes
 sense to me from the traditional one version in distro + possibly compat
 package shipping the old approach in Linux, but some say that Python 2 
 and
 Python 3 are just different languages [3] and this should never be done.
 All of the approaches have their pros and cons, but generally it is all
 about what user should get when he tries to install python - either 
 nothing
 or python2 for now and python3 in future - and how we as a distro cope 
 with
 that on the technical side (and when we should actually do the switch).
 Just as a sidenote, IMO the package that gets installed as python (if
 any) should point to /usr/bin/python, which makes consider these two 
 points
 very closely coupled.
 
 
 A similar discussion broke out when Arch Linux switched python to point to
 python3. This led to http://www.python.org/dev/peps/pep-0394/ which says have
 python2/python3, and have python point at whatever makes the most sense to you
 based on your users and version uptake (option 3/4).

nod  I think bkabrda is looking for some clarification on PEP-394.  My
reading and participation in the previous discussions lead me to believe
that while PEP-394 wants to be diplomatic, the message it wants to get
across is:

1) warn distributions that what Arch was doing was premature.
2) provide a means to get them to switch at roughly the same time (when the
   recommendation in the PEP is flipped to suggest linking /usr/bin/python
   to /usr/bin/python3)

This is especially my reading from the Recommendations section of the PEP.
Unfortunately, we're getting stuck in the Abstract section which has this
bullet point:

* python should refer to the same target as python2 but may refer to python3
on some bleeding edge distributions

Knowing the history, I read this in two parts:
* Recommendation to distributions: python should refer to the same target
  as python2.
* 

Re: [Python-Dev] Python 3 as a Default in Linux Distros

2013-07-24 Thread Toshio Kuratomi
On Wed, Jul 24, 2013 at 12:42:09PM -0400, Barry Warsaw wrote:
 On Jul 25, 2013, at 01:41 AM, Nick Coghlan wrote:
 
 How's this for an updated wording in the abstract:
 
   * for the time being, all distributions should ensure that python
 refers to the same target as python2
   * however, users should be aware that python refers to python3 on at
 least Arch Linux (that change is
 what prompted the creation of this PEP), so python should be
 used in the shebang line only for
 scripts that are source compatible with both Python 2 and 3
 
 +1
 
+1 as well.  Much clearer.
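
For anyone wanting to check what their own system does with these names,
the following sketch reports where python, python2 and python3 resolve on
the current PATH.  It is only a diagnostic aid; the helper name is made up
and the result depends entirely on the machine it runs on.

```python
import os
import shutil


def resolve_commands(names=("python", "python2", "python3")):
    """Map each command name to the real file it resolves to, or None."""
    resolved = {}
    for name in names:
        path = shutil.which(name)
        # Follow symlinks so that python -> python2 -> python2.7 is visible.
        resolved[name] = os.path.realpath(path) if path else None
    return resolved
```

If python and python2 resolve to the same file, the system follows the
recommendation quoted above; if python matches python3 instead, only
scripts that are source compatible with both should use a bare python
shebang.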

-Toshio




Re: [Python-Dev] Bilingual scripts

2013-05-28 Thread Toshio Kuratomi
On Tue, May 28, 2013 at 01:22:01PM -0400, Barry Warsaw wrote:
 On May 27, 2013, at 11:38 AM, Toshio Kuratomi wrote:
 
 - If upstream doesn't deal with it, then we use a python3- prefix.  This
 matches with our package naming so it seemed to make sense.  (But
 Barry's point about locate and tab completion and such would be a reason
 to revisit this... Perhaps standardizing on /usr/bin/foo2-python3
 [pathological case of having both package version and interpreter
 version in the name.]
 
 Note that the Gentoo example also takes into account versions that might act
 differently based on the interpreter's implementation.  So a -python3 suffix
 may not be enough.  Maybe now we're getting into PEP 425 compatibility tag
 territory.
 
nod  This is an interesting, unmapped area in Fedora at the moment... I
was hoping to talk to Nick and the Fedora python maintainer at our next
Fedora conference.

I've been looking at how Fedora's ruby guidelines are implemented wrt
alternate interpreters and wondering if we could do something similar for
python:

https://fedoraproject.org/wiki/Packaging:Ruby#Different_Interpreters_Compatibility

I'm not sure yet how much of that I (or Nick and the python maintainer
[bkabrda, the current python maintainer, is the one who wrote the rubypick
script]) would want to use in python.  Replacing /usr/bin/python with a
script that chooses between CPython and pypy based on user preference gave
me an instinctual feeling of dread the first time I looked at it, but it
seems to be working well for the ruby folks.

My current feeling is that I wouldn't use this same system for interpreters
which are not mostly compatible (for instance, python2 vs python3).  But I
also haven't devoted much actual time to thinking about whether that might
have some advantages.

-Toshio




Re: [Python-Dev] Bilingual scripts

2013-05-27 Thread Toshio Kuratomi
On Sat, May 25, 2013 at 05:57:28PM +1000, Nick Coghlan wrote:
 On Sat, May 25, 2013 at 5:56 AM, Barry Warsaw ba...@python.org wrote:
  Have any other *nix distros addressed this, and if so, how do you solve it?
 
 I believe Fedora follows the lead set by our own makefile and just
 appends a 3 to the script name when there is also a Python 2
 equivalent (thus ``pydoc3`` and ``pyvenv``). (I don't have any other
 system provided Python 3 scripts on this machine, though)
 

Fedora is a bit of a mess... we try to work with upstream's intent when
upstream has realized this problem exists and have a single standard when
upstream does not.  The full guidelines are here:

http://fedoraproject.org/wiki/Packaging:Python#Naming

Here's the summary:

* If the scripts don't care whether they're running on py2 or py3, just use
  the base name and choose python2 as the interpreter for now (since we
  can't currently get rid of python2 on an end user system, that is the
  choice that brings in fewer dependencies).  ex: /usr/bin/pygmentize

* If the script does two different things depending on python2 or python3
  being the interpreter (note: this includes both bilingual scripts and
  scripts which have been modified by 2to3/exist in two separate versions)
  then we have to look at what upstream is doing:

  - If upstream already deals with it (ex: pydoc3, easy_install-3.1) then we
    use upstream's name.  We don't love this from an inter-package
    consistency standpoint as there are other packages which append a
    version for their own usage (is /usr/bin/foo-3.4 for python-3.4 or the
    3.4 version of the foo package?).  (And we sometimes have to do this
    locally if we need to have multiple versions of a package with the
    multiple versions having scripts...)  We decided to use upstream's name
    if they account for this issue because it will match with upstream's
    documentation and nothing else seemed as important in this instance.

  - If upstream doesn't deal with it, then we use a python3- prefix.  This
    matches with our package naming so it seemed to make sense.  (But
    Barry's point about locate and tab completion and such would be a reason
    to revisit this...  Perhaps standardizing on /usr/bin/foo2-python3
    [pathological case of having both package version and interpreter
    version in the name].)

  - (tangent from a different portion of this thread: we've found that this
    is a larger problem than we would hope.  There are some obvious ones
    like
    - ipython (implements a python interpreter so python2 vs python3 is
      understandably important and different).
    - nosetests (the python source being operated on is run through the
      python interpreter so the version has to match).
    - easy_install (needs to install python modules to the correct
      interpreter's site-packages.  It decides the correct interpreter
      according to which interpreter invoked it.)

    But recently we found a new class of problems:  frameworks which are
    bilingual.  For instance, if you have a web framework which has a
    /usr/bin/django-admin script that can be used to quickstart a
    project, run a python shell and automatically load your code, load your
    ORM db schema and operate on it to make modifications to the db then
    that script has to know whether your code is compatible with python2 or
    python3.)
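
The naming summary above can be read as a small decision procedure.  The
sketch below encodes it as a function for the python3 copy of a script; the
function and parameter names are invented for illustration and this is not
tooling that Fedora ships.

```python
def py3_script_name(base, version_sensitive, upstream_py3_name=None):
    """Pick the /usr/bin name for the python3 variant of a script.

    base: the script's plain name, e.g. "pygmentize".
    version_sensitive: True if the script behaves differently on py2 vs py3.
    upstream_py3_name: the name upstream already uses for py3, if any.
    """
    if not version_sensitive:
        return base                # one copy under the base name is enough
    if upstream_py3_name:
        return upstream_py3_name   # match upstream's documentation
    return "python3-" + base       # fall back to the package-style prefix
```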


  It would be nice if we could have some cross-platform recommendations so
  things work the same wherever you go.  To that end, if we can reach some
  consensus, I'd be willing to put together an informational PEP and some
  scripts that might be of general use.
 
 It seems to me the existing recommendation to use ``#!/usr/bin/env
 python`` instead of referencing a particular binary already covers the
 general case. The challenge for the distros is that we want a solution
 that *ignores* user level virtual environments.
 
 I think the simplest thing to do is just append the 3 to the binary
 name (as we do ourselves for pydoc) and then abide by the
 recommendations in PEP 394 to reference the correct system executable.
 
I'd rather not have a bare 3 for the issues noted above.  Something like py3
would be better.

There's still room for confusion when distributions have to push multiple
versions of a package with scripts that fall into this category.  Should the
format be:

/usr/bin/foo2-py3  (My preference as it places the version next to the
thing that it's a version of.)

or

/usr/bin/foo-py3-2  (Confusing as the 2 is bare.  Something like
/usr/bin/foo-py3-v2 is slightly better but still not as nice as the
previous IMHO)

-Toshio




Re: [Python-Dev] Conflicts [was Re: Keyword meanings [was: Accept just PEP-0426]]

2012-12-10 Thread Toshio Kuratomi
On Sun, Dec 09, 2012 at 01:51:09PM +1100, Chris Angelico wrote:
 On Sun, Dec 9, 2012 at 1:11 PM, Steven D'Aprano st...@pearwood.info wrote:
  Why would a software package called Spam install a top-level module called
  Jam rather than Spam? Isn't the whole point of Python packages to solve
  this namespace problem?
 
 That would require/demand that the software package MUST define a
 module with its own name, and MUST NOT define any other top-level
 modules, and also that package names MUST be unique. (RFC 2119
 keywords.) That would work, as long as those restrictions are
 acceptable.
 
/me notes that setuptools itself is an example of a package that violates
this rule (setuptools and pkg_resources).

No objections to "That would work, as long as those restrictions are
acceptable"; that seems to sum up where we're at.
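
A sketch of how a tool could detect the situation under discussion, given
only the lists of files two projects install into site-packages.  The
helper names are made up, and the input is assumed to contain importable
files only (no metadata directories).

```python
def top_level_names(installed_files):
    """Return the top-level module/package names implied by a file list."""
    names = set()
    for path in installed_files:
        first = path.split("/", 1)[0]
        if first.endswith(".py"):
            first = first[:-3]  # single-file module such as pkg_resources.py
        names.add(first)
    return names


def clashing_modules(files_a, files_b):
    """Top-level names that both projects would try to provide."""
    return top_level_names(files_a) & top_level_names(files_b)
```

Under the proposed restriction, top_level_names for a project would always
be a single name equal to the project's own name, and clashing_modules
would always come back empty.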

-Toshio




Re: [Python-Dev] Conflicts [was Re: Keyword meanings [was: Accept just PEP-0426]]

2012-12-10 Thread Toshio Kuratomi
On Sun, Dec 09, 2012 at 12:48:45AM -0500, PJ Eby wrote:
 
 Do any of the distro folks know of a Python project tagged as
 conflicting with another for their distro, where the conflict does
 *not* involve any files in conflict?
 
In Fedora we do work to avoid most types of Conflicts (backporting fixes,
etc) but I can give some examples of where Conflicts could have been used in
the past:

In docutils prior to the latest release, certain portions of docutils were
broken if pyxml was installed (since pyxml replaces certain stdlib xml.*
functionality).  So older docutils versions could have had a Conflicts:
PyXML. Nick has since provided a technique for docutils to use that loads
from the stdlib first and only goes to PyXML if the functionality is not
available there.
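
The general shape of such a "prefer the stdlib, fall back otherwise" import
might look like the sketch below.  This is an illustration of the idea only,
not the code Nick provided to docutils, and the helper name is invented.

```python
import importlib


def first_available(*module_names):
    """Import and return the first module in the list that can be loaded."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of %r could be imported" % (module_names,))


# Prefer the stdlib implementation; the second name is a stand-in for an
# alternative provider and is only tried if the first import fails.
minidom = first_available("xml.dom.minidom", "some_alternative_minidom")
```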

Various libraries in web stacks have had bugs that prevent the proper
functioning of the web framework at the top level.  In case of major issues
(security, unable to startup), these top level frameworks could use
versioned Conflicts to prevent installation.  For instance:  TurboGears
might have a Conflicts: CherryPy  2.3.1 

Note, though, that if parallel installable versions, and selection of the
proper versions from among them, work, then this type of Conflict wouldn't
be necessary.  You'd have versioned Requires: instead.

-Toshio




Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-10 Thread Toshio Kuratomi
On Fri, Dec 7, 2012 at 10:46 PM, PJ Eby p...@telecommunity.com wrote:


 In any case, as I said before, I don't have an issue with the fields
 all being declared as being for informational purposes only.  My issue
 is only with recommendations for automated tool behavior that permit
 one project's author to exercise authority over another project's
 installation.


Skipping over a lot of other replies between you and me because I think that
we disagree on a lot but that's all moot if we agree here.

I have no problems with Obsoletes, Conflicts, Requires, and Provides types
of fields being marked informational.  In fact, there are many cases where
packages are overzealous in their use of Requires right now that cause
distributions to patch the dependency information in the package metadata.

-Toshio


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-07 Thread Toshio Kuratomi
On Fri, Dec 07, 2012 at 01:18:40AM -0500, PJ Eby wrote:
 On Thu, Dec 6, 2012 at 1:49 AM, Toshio Kuratomi a.bad...@gmail.com wrote:
  On Wed, Dec 05, 2012 at 07:34:41PM -0500, PJ Eby wrote:
  On Wed, Dec 5, 2012 at 6:07 PM, Donald Stufft donald.stu...@gmail.com 
  wrote:
 
  Nobody has actually proposed a better one, outside of package renaming
  -- and that example featured an author who could just as easily have
  used an obsoleted-by field.
 
  How about pexpect and pexpect-u as a better example?
 
 Perhaps you could explain?  I'm not familiar with those projects.
 

pexpect was last released in 2008.  Upstream went silent with unanswered
bugs in its tracker and no mailing list.  A fork of pexpect was created that
addressed the issue of unicode type in python2, added a python3 port, and has
slowly evolved since then.

I see that the original upstream has made some commits to their source
repository since the fork was created although there has still been no new
release.

  Note that although well-managed Linux distros attempt to control random
  forking internally, the distro package managers don't prevent people from
  installing from third parties.  So Ubuntu PPAs, upstreams that provide their
  own rpms/debs, and major third party repos (for instance, rpmfusion as
  an add-on repo to Fedora) all have and sometimes (mis)use the ability to
  Obsolete packages in the base repository.
 
 But in each of these cases, the packages are being defined *with
 reference to* some underlying vision of what the distro (or even a
 distro) is.  An Ubuntu PPA, if I understand correctly, is still
 *building an Ubuntu system*.  Python packaging as a whole lacks such
 frames of reference.  A forked distro is still a distro, and it's a
 fork *of something*.  Rpmfusion is defining an enhanced Fedora, not
 slinging random unrelated packages about.
 
Uhm, that's both true and false, as any complex system is.
rpm and deb are just packaging formats.  So:

*) Not all packages are built on top of that system.  There are rpm
packages provided by upstreams that users attempt (to greater and lesser
degrees of success) to install on SuSE, RHEL, Fedora, Mandriva, etc.  There
are debs built for Ubuntu that people attempt to install onto Debian.

*) PPAs and rpmfusion may both build on top of an existing system but they
can change the underlying structure, replacing components that other pieces
of the base system depend on.  You talk about the setuptools and distribute
problem on pypi; there's absolutely nothing that prevents someone from
building a PPA or a package in a third-party rpm repository that packages
a setuptools that Obsoletes: distribute or a distribute package that
Obsoletes: setuptools.

  The install tools can then choose how they wish to deal with those caveats.
  Some example strategies: choose to prompt the user as to which to install,
  choose to always treat the fields as human-informational only, mark some
  repositories as being trusted to contain packages where these fields are
  active and other repositories where the fields are ignored.
 
 A peculiar phenomenon: every defense of these fields seems to refer
 almost exclusively to how the problems could be fixed or why the
 problems aren't that bad, rather than *how useful the fields would be*
 in real-world scenarios.  In some cases, the argument for the fields'
 safety actually runs *counter* to their usefulness, e.g., the fields
 aren't that bad because we could make them have a limited function or
 no function at all.  Isn't lack of usefulness generally considered an
 argument for *not* including a feature?  ;-)

If you constantly forget why the fields are useful, then I suppose you'll
always believe that :-)

-Toshio


pgpL8jo6bw502.pgp
Description: PGP signature
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-05 Thread Toshio Kuratomi
On Wed, Dec 05, 2012 at 02:46:11AM -0500, Donald Stufft wrote:
 On Wednesday, December 5, 2012 at 2:13 AM, PJ Eby wrote:
 
 On Mon, Dec 3, 2012 at 2:43 PM, Daniel Holth dho...@gmail.com wrote:
 
 How to use Obsoletes:
 
 The author of B decides A is obsolete.
 
 A releases an empty version of itself that Requires: B
 
 B Obsoletes: A
 
 The package manager says These packages are obsolete: A. Would you
 like to remove them?
 
 User says OK.
 
 
 Um, no. Even if the author of A and B are the same person, you
 can't remove A if there are other things on the user's system using
 it. The above scenario does not work *at all*, ever, except in the
 case where B is simply an updated version of A (i.e. identical API) --
 in which case, why bother? To change the project name? (Then it
 should be Formerly-named or something like that, not Obsoletes.)
 
 You can automatically uninstall A from B in an automatic dependency
 management system.  I *think* RPM does this, at the very least

This is correct.

 I believe it refuses to install B if A is already there (and the reverse
 as well).*

I'd have to test this but I believe you are correct about the first.  Not
sure about the reverse.

 There's nothing preventing an installer from, during its attempt to
 install B, seeing that it Obsoletes A, looking at what depends on A,
 warning the user what is going to happen, and prompting them.
 
In rpm-land, if something depended on A and nothing besides the actual
A package provided A, rpm will refuse to install B.  But rpm is meant to be
used unattended so different package managers could certainly choose to
prompt.  For package renames, package B would have both an Obsoletes: A =
$OLD_VERSION and a Provides: A = $NEW_VERSION
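
As a rough sketch of the rpm behaviour described above (this is a toy model, not rpm's actual algorithm; versions are ignored and the Package class, can_install function, and package names are all hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    requires: set = field(default_factory=set)
    provides: set = field(default_factory=set)
    obsoletes: set = field(default_factory=set)


def can_install(new, installed):
    """Return (ok, reason) for installing `new` on top of `installed`."""
    # Packages that `new` would push out through its Obsoletes entries.
    removed = {p.name for p in installed if p.name in new.obsoletes}
    remaining = [p for p in installed if p.name not in removed] + [new]
    # Every name still available, as a package name or as a Provides entry.
    available = set()
    for p in remaining:
        available.add(p.name)
        available |= p.provides
    # Refuse if anything left on the system would lose a requirement.
    for p in remaining:
        missing = p.requires - available
        if missing:
            return False, "%s still requires %s" % (p.name, ", ".join(sorted(missing)))
    return True, "ok"


a = Package("A")
app = Package("app", requires={"A"})
plain_b = Package("B", obsoletes={"A"})
renamed_b = Package("B", obsoletes={"A"}, provides={"A"})

print(can_install(plain_b, [a, app]))    # refused: app still requires A
print(can_install(renamed_b, [a, app]))  # accepted: B also Provides A
```

The second call succeeds only because the renamed package carries both fields, which is the rename pattern mentioned above.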

-Toshio




Re: [Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-12-05 Thread Toshio Kuratomi
On Wed, Dec 05, 2012 at 07:34:41PM -0500, PJ Eby wrote:
 On Wed, Dec 5, 2012 at 6:07 PM, Donald Stufft donald.stu...@gmail.com wrote:
 
 Nobody has actually proposed a better one, outside of package renaming
 -- and that example featured an author who could just as easily have
 used an obsoleted-by field.
 
How about pexpect and pexpect-u as a better example?

 
  Very convenient to declare that one of the major use cases for
  Obsoletes over Obsoleted-By is not valid because of your own
  personal opinions.
 
 I didn't say it was invalid, I said:
 
 Note that the author of package X no longer maintains it does not
 equal package Y is entitled to name itself the successor and enforce
 this upon all users
 
 These things are not equal.  AFAIK, well-managed Linux distros do not
 allow random forkers to declare themselves the official successor to a
 defunct package, so any analogy between this use case in the Python
 world and the distro world is strained at *best*.
 
Note that although well-managed Linux distros attempt to control random
forking internally, the distro package managers don't prevent people from
installing from third parties.  So Ubuntu PPAs, upstreams that provide their
own rpms/debs, and major third party repos (for instance, rpmfusion as
an add-on repo to Fedora) all have and sometimes (mis)use the ability to
Obsolete packages in the base repository.

So Donald isn't stretching the relationship quite as far as you make it out.
The ecosystem of packages for a distro carries uncontrolled packages just as
much as pypi.

 
  and merely having the ability to use it when it is the best tool for the job
  isn't going to cause any great issue.
 
 One of the posts I linked presents an instance where it would have
 actually *harmed* things to specify it, and it's quite easy to see how
 the same problem would arise if used for non-file-related conflicts...
 
 And the problem present is *directly* tied to the lack of a
 third-party Z who decides whether X and Y, as configured for release Q
 of distro P, conflict.
 
 This is not a problem that is solvable even in *principle* for an
 automated tool in the absence of party Z, which means that any such
 field's actual function is limited to a heads-up to a human user.
 
And the same goes for Provides (i.e., the latest foo is 0.6c and bar Provides:
foo-0.6d; an automated tool that finds both foo and bar in its dep tree can
choose to install bar and not foo).

The ability for this class of fields to cause harm is not, to me,
a compelling argument not to include them.  It could be an argument to
explicitly tell implementers of install tools that they all have caveats
when used with pypi and similar unpoliced community package repositories.
The install tools can then choose how they wish to deal with those caveats.
Some example strategies: choose to prompt the user as to which to install,
choose to always treat the fields as human-informational only, mark some
repositories as being trusted to contain packages where these fields are
active and other repositories where the fields are ignored.
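
A minimal sketch of those three strategies, assuming an install tool that makes one yes/no decision per package (the constant names, the honor_field function, and the repository names are hypothetical and not taken from any real installer):

```python
PROMPT = "prompt"
INFORMATIONAL = "informational"
TRUSTED_ONLY = "trusted-only"


def honor_field(strategy, repo, trusted_repos=(), ask=None):
    """Decide whether an Obsoletes/Provides-style field should be acted on."""
    if strategy == INFORMATIONAL:
        # Show the field to the user but never act on it.
        return False
    if strategy == TRUSTED_ONLY:
        # Act only when the package comes from a repository marked trusted.
        return repo in trusted_repos
    if strategy == PROMPT:
        # Defer to the user; with no way to ask, fall back to doing nothing.
        return bool(ask and ask(repo))
    raise ValueError("unknown strategy: %r" % (strategy,))


print(honor_field(TRUSTED_ONLY, "distro-base", trusted_repos=("distro-base",)))
print(honor_field(TRUSTED_ONLY, "pypi", trusted_repos=("distro-base",)))
```

Keeping the policy in one function means the metadata format itself stays the same no matter how cautious a given tool wants to be.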

-Toshio





Re: [Python-Dev] Accept just PEP-0426

2012-11-20 Thread Toshio Kuratomi
On Tue, Nov 20, 2012 at 06:43:32PM -0500, Daniel Holth wrote:
 No. We trust the packages we install, including the way they decide to use
 the metadata. A bad package could delete all our files or cause dependency
 resolution to fail. Mostly they won't.
 
Agreed.  And this is closer to the way that distributions' tools have to
operate than they'd want to :-(  Within the distribution we like to pretend
that we only need to care about the packages that we generate.  But we also
know that whether or not we support it, ordinary users will install packages
from outside of our walls.  That means that the packaging tools that we
create will need to deal with things that we might not condone within our
presumed authority.

We trust that people are going to do more or less the right thing with
the tools we offer.  Once in a while they don't but by and large they do.

-Toshio

 Daniel Holth
 
 On Nov 20, 2012, at 5:26 PM, Vinay Sajip vinay_sa...@yahoo.co.uk wrote:
 
  Daniel Holth dholth at gmail.com writes:
  
  They mean pretty much what the same words mean in RPM and do not need
  further bikeshedding.
  
  But isn't it the case that the scenarios are different because in the case
  of RPMs, we have a presumed authority which can determine e.g. what
  obsoletes what, whereas with Python distributions, there's no central
  authority that has this function?
  
  Regards,
  
  Vinay Sajip
  
  


Re: [Python-Dev] Accept just PEP-0426

2012-11-19 Thread Toshio Kuratomi
On Mon, Nov 19, 2012 at 07:49:41PM -0500, Donald Stufft wrote:
 Other languages seem to get along fine without it. Even the OS
 managers which have it don't allow it to be used to masquerade
 as another project, only to make generic virtual packages (e.g. email).
 
I'm not sure this assertion about OS package managers is correct.  I've only
just read:
http://www.python.org/dev/peps/pep-0426/#provides-dist-multiple-use

but the rough rpm analogue seems to be the Provides: tag.

Provides is given a string which is parsed into a name or a name and version
like this:

Provides: python
Provides: python = 3.1.0

rpm has no way at package build time to tell that a particular name given in
a provides in one package is the actual name of another package.

At install time, rpm keeps package names and provides names separately but in
dependency comparisons either one can be used to satisfy a requirement.
What that means is that when asking about information on a package with name
python, you'll get information about the python package with that name and
not about anything else that Provides: python.  But if you are installing
something that has a requirement on python either the package with the
name python or any package that Provides: python can satisfy the requirement.

Package managers with builtin dep solvers can be built on top of rpm.  The
one that I am familiar with is yum.  Since yum is downloading the packages
that are being fed into rpm, yum could choose to prefer the package name
instead of things in Provides when it downloads.  It doesn't, though.  Just
like the underlying rpm, it treats package names and names specified
through Provides: as equivalent.
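
To illustrate the distinction between the two kinds of lookup, here is a toy model (not rpm's or yum's real data structures; the PackageDB class and the "altpython" package are hypothetical):

```python
class PackageDB:
    def __init__(self):
        self._by_name = {}    # package name -> tuple of names it Provides
        self._providers = {}  # provided name -> list of package names

    def add(self, name, provides=()):
        self._by_name[name] = tuple(provides)
        for provided in provides:
            self._providers.setdefault(provided, []).append(name)

    def query(self, name):
        """Information lookup: only a package with that exact name matches."""
        return name if name in self._by_name else None

    def what_satisfies(self, requirement):
        """Dependency lookup: a package name or any Provides entry will do."""
        found = []
        if requirement in self._by_name:
            found.append(requirement)
        for candidate in self._providers.get(requirement, []):
            if candidate not in found:
                found.append(candidate)
        return found


db = PackageDB()
db.add("python")
db.add("altpython", provides=["python"])

print(db.query("python"))           # only the package actually named python
print(db.what_satisfies("python"))  # both packages can satisfy the requirement
```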

-Toshio

 On Monday, November 19, 2012 at 7:43 PM, Daniel Holth wrote:
 
 Does it really have baggage? I think it is necessary, although it doesn't
 do favors to the implementer (and has never been implemented). How else is
 anyone supposed to fork or merge projects?
 
 Daniel Holth
 
 On Nov 19, 2012, at 7:37 PM, PJ Eby p...@telecommunity.com wrote:
 
 
 On Mon, Nov 19, 2012 at 6:53 PM, Daniel Holth dho...@gmail.com wrote:
 
 I think this PEP is a significant improvement from its predecessor.
 It represents features like extras (provides-extra) and build
 requirements (setup-requires-dist) that are in use in the Python
 community but cannot be represented in older versions of the
 format, it finally specifies a UTF-8 encoding, removes RFC 822,
 provides an extension mechanism, and allows the description to be
 placed in the document payload.
 
 
 Can we maybe kill Provides-Dist and its associated baggage first,
 though?
 
 


Re: [Python-Dev] Bumping autoconf from 2.68 to 2.69

2012-10-16 Thread Toshio Kuratomi
On Tue, Oct 16, 2012 at 11:27:24AM +0200, Antoine Pitrou wrote:
 On Tue, 16 Oct 2012 05:05:23 -0400
 Trent Nelson tr...@snakebite.org wrote:
  On Tue, Oct 16, 2012 at 01:43:37AM -0700, Charles-François Natali wrote:
My understanding is that we use a specific version of autoconf.
The reason is that otherwise we end up with useless churn in the repo
as the generated file changes when different committers use different
versions.  In the past we have had issues with a new autoconf version
actually breaking the Python build, so we also need to test a new version
before switching to it.
   
   Well, so I guess all committers will have to use the same
   Linux/FreeBSD/whatever distribution then?
   AFAICT there's no requirement regarding the mercurial version used by
   committers either.
  
  Autoconf is a special case though.  Different versions of autoconf
  produce wildly different outputs for 'configure', making it impossible
  to vet configure.ac changes by reviewing the configure diff.
 
 Isn't it enough to review the configure.ac diff?
 
That's the ideal but it's been wrong in the past and may possibly be wrong
in the future as well.

Anecdotally, in the Linux distribution I package for we have a conversation
about whether we should apply patches to configure.ac and then run
autoreconf (or equivalent) or include the patches to configure about once
a year.  Although the former has been pretty stable for several autoconf
version updates enough people have bad memories of those times when bumping
autoconf revisions that there's always a vocal contingent who advocate
shipping patches to the actual configure scripts (they're under the
impression the package maintainer will actually audit the configure patch
that autoconf generated to see if there's breakage that way).

-Toshio




Re: [Python-Dev] #12982: Should -O be required to *read* .pyo files?

2012-06-13 Thread Toshio Kuratomi
On Wed, Jun 13, 2012 at 01:58:10PM -0400, R. David Murray wrote:
 
 OK, but you didn't answer the question :).  If I understand correctly,
 everything you said applies to *writing* the bytecode, not reading it.
 
 So, is there any reason to not use the .pyo file (if that's all that is
 around) when -O is not specified?
 
 The only technical reason I can see why -O should be required for a .pyo
 file to be used (*if* it is the only thing around) is if it won't *run*
 without the -O switch.  Is there any expectation that that will ever be
 the case?
 
Yes.  For instance, if I create a .pyo with -OO it wouldn't have docstrings.
Another piece of code can legally import that and try to use the docstring
for something.  This would fail if only the .pyo was present.

Of course, it would also fail under the present behaviour since no .py or
.pyc was present to be imported.  The error that's displayed might be
clearer if we fail when attempting to read a .py/.pyc rather than failing
when the docstring is found to be missing, though.
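
The docstring point can be reproduced without any files on disk. This is only a sketch: it uses the optimize argument of the built-in compile() on a current Python 3 as a stand-in for a .pyo produced with -OO, and the greet function is made up for the example:

```python
SOURCE = '''
def greet():
    """Say hello."""
    return "hello"
'''


def load(optimize):
    """Compile SOURCE at the given optimization level and return greet."""
    namespace = {}
    exec(compile(SOURCE, "<example>", "exec", optimize=optimize), namespace)
    return namespace["greet"]


normal = load(0)    # comparable to a plain .pyc
stripped = load(2)  # comparable to bytecode written with -OO

print(normal.__doc__)    # Say hello.
print(stripped.__doc__)  # None
```

Any importer that does something like greet.__doc__.splitlines() works against the first function and raises AttributeError against the second, even though both run fine otherwise.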

-Toshio




Re: [Python-Dev] PEP 411: Provisional packages in the Python standard library

2012-02-11 Thread Toshio Kuratomi
On Sat, Feb 11, 2012 at 04:32:56PM +1000, Nick Coghlan wrote:
 
 This would then be seen by pydoc and help(), as well as being amenable
 to programmatic inspection.
 
Would using

    warnings.warn('This is a provisional API and may change radically from'
                  ' release to release', ProvisionalWarning)

where ProvisionalWarning is a new exception/warning category (a subclass of
FutureWarning?) be considered too intrusive?
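
Spelled out as a runnable sketch (ProvisionalWarning does not exist in the stdlib; it and provisional_function are hypothetical names for the idea floated above):

```python
import warnings


class ProvisionalWarning(FutureWarning):
    """Hypothetical category for APIs that may change between releases."""


def provisional_function():
    # stacklevel=2 attributes the warning to the caller, not to this line.
    warnings.warn(
        "This is a provisional API and may change radically from"
        " release to release",
        ProvisionalWarning,
        stacklevel=2,
    )
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    provisional_function()

print(caught[0].category.__name__)
```

Because it is an ordinary warning category, a user who has accepted the risk could silence it with warnings.simplefilter("ignore", ProvisionalWarning).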

-Toshio




Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread Toshio Kuratomi
On Thu, Jan 05, 2012 at 08:35:57PM +, Paul Moore wrote:
 On 5 January 2012 19:33, David Malcolm dmalc...@redhat.com wrote:
  We have similar issues in RHEL, with the Python versions going much
  further back (e.g. 2.3)
 
  When backporting the fix to ancient python versions, I'm inclined to
  turn the change *off* by default, requiring the change to be enabled via
  an environment variable: I want to avoid breaking existing code, even if
  such code is technically relying on non-guaranteed behavior.  But we
  could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
  That way /usr/bin/python would default to the old behavior, but web apps
  would have some protection.   Any such logic here also suggests the need
  for an attribute in the sys module so that you can verify the behavior.
 
 Uh, surely no-one is suggesting backporting to ancient versions? I
 couldn't find the statement quickly on the python.org website (so this
 is via google), but isn't it true that 2.6 is in security-only mode
 and 2.5 and earlier will never get the fix?

I think when dmalcolm says backporting he means that he'll have to
backport the fix from modern, supported-by-python.org python to the ancient
pythons that he's supporting as part of the Linux distributions where he's
the python package maintainer.

I think he's mentioning it here mainly to see whether anyone can point out
a reason not to diverge from upstream in that manner for those distributions.

 Having a source-only
 release for 2.6 means the fix is off by default in the sense that
 you can choose not to build it. Or add a #ifdef to the source if it
 really matters.
 
I don't think that this would satisfy dmalcolm's needs.  What he's talking
about sounds more like a runtime switch (possibly only when initializing,
though, not on-the-fly).
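
A sketch of what such a switch could look like, read once at start-up with a default that differs per entry point (the variable name EXAMPLE_HASH_RANDOMIZATION and the function are invented for this illustration and are not what any distribution shipped):

```python
import os

# Hypothetical variable name, chosen for this sketch only.
ENV_VAR = "EXAMPLE_HASH_RANDOMIZATION"


def randomization_enabled(environ=None, default=False):
    """Decide at initialization time whether the fix is active."""
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR)
    if value is None:
        # Nothing set explicitly: use the entry point's own default.
        return default
    return value.strip().lower() in ("1", "true", "on", "yes")


# A plain interpreter would default to the old behaviour...
print(randomization_enabled({}, default=False))
# ...while a web-server embedding could default to the protected one.
print(randomization_enabled({}, default=True))
```

The result would then be exposed read-only (for instance as an attribute in the sys module, as suggested in the quoted message) so code can verify which behaviour it got.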

-Toshio




Re: [Python-Dev] Fwd: Anyone still using Python 2.5?

2011-12-21 Thread Toshio Kuratomi
On Thu, Dec 22, 2011 at 02:49:06AM +0100, Victor Stinner wrote:
 
 Do people still have to use this in commercial environments or is
 everyone on 2.6+ nowadays?
 
 At work, we are still using Python 2.5. Six months ago, we started a
 project to upgrade to 2.7, but we have now more urgent tasks, so the
 upgrade is delayed to later. Even if we upgrade new clients to 2.7,
 we will have to continue to support 2.5 for some more months (or
 years?).
 
At my work, I'm on RHEL5 and RHEL6.  So I'm currently supporting python-2.4
and python-2.6.  We're up to 75% RHEL6 (though, not the machines where most
of our deployed, custom written apps are running) so I shouldn't have to
support python-2.4 for much longer.

 In a personal project (the IPy library), I dropped support of Python
 2.5 in February 2011. Recently, I got a mail asking me where the
 previous version of my library (supporting Python 2.4) can be
 downloaded! Someone is still using Python 2.4: I'm stuck with python
 2.4 in my work environment.
 
As part of work, I package for EPEL5 (addon packages for RHEL5).  Sometimes
we need a new version of a package or a new package for RHEL5 and thus need
to have python-2.4 compatible versions of the package and any of its
dependencies.

When I no longer need to maintain python-2.4 stuff for work, I'm hoping to
not have to do quite so much of this but sometimes I know I'll still get
requests to update an existing package to fix a bug or fix a feature and
that will require updates of dependent libraries.  I'll still be stuck
looking for python-2.4 compatible versions of all of these :-(

 What do people feel?
 
 For a new project, try to support Python 2.5, especially if you would
 like to write a portable library. For a new application working on
 Mac OS X, Windows and Linux, you can only support Python 2.6.
 
I agree that libraries have a need to go farther back than applications.
I have one library that I support on python-2.3 (for RHEL4... I'm counting
down the months on that one :-).  Every other library I maintain, I make sure
I support at least python-2.4.

Application-wise, I currently have to support python-2.4+ but given that
Linux distros seem to all have some version out that supports at least
python-2.6, I don't think I'll be developing any applications that
intentionally support less than that once I get moved away from RHEL-5 at my
workplace.

-Toshio




Re: [Python-Dev] Promoting Python 3 [was: PyPy 1.7 - widening the sweet spot]

2011-11-22 Thread Toshio Kuratomi
On Wed, Nov 23, 2011 at 01:41:46AM +0900, Stephen J. Turnbull wrote:
 Barry Warsaw writes:
 
   Hopefully, we're going to be making a dent in that in the next version of
   Ubuntu.
 
 This is still a big mess in Gentoo and MacPorts, though.  MacPorts
 hasn't done anything about creating a transition infrastructure AFAICT.
 Gentoo has its eselect python set VERSION stuff, but it's very
 dangerous to set to a Python 3 version, as many things go permanently
 wonky once you do.  (So far I've been able to work around problems
 this creates, but it's not much fun.)  I have no experience with this
 in Debian, Red Hat (and derivatives) or *BSD, but I have to suspect
 they're no better.  (Well, maybe Red Hat has learned from its 1.5.2
 experience! :-)
 
For Fedora (and currently, Red Hat is based on Fedora -- a little more about
that later, though), we have parallel python2 and python3 stacks.  As time
goes on we've slowly brought more python-3 compatible modules onto the
python3 stack (I believe someone had the goal a year and a half ago to get
a complete pylons web development stack running on python3 on Fedora which
brought a lot of packages forward).

Unlike Barry's work with Ubuntu, though, we're mostly chiselling around the
edges; we're working at the level where there's a module that someone needs
to run something (or run some optional features of something) that runs on
python3.

 I don't have any connections to the distros, so can't really offer to
 help directly.  I think it might be a good idea for users to lobby
 (politely!)  their distros to work on the transition.
 
Where distros aren't working on parallel stacks, there definitely needs to
be some transition plan.  With my experience with parallel stacks, the best
help there is to 1) help upstreams port to py3k (If someone can get PIL's
py3k support finished and into a released package, that would free up a few
things).  2) open bugs or help with creating python3 packages of modules
when the upstream support is there.

Depending on what software Barry's talking about porting to python3, that
could be a big incentive as well.  Just like with the push in Fedora to have
pylons run on python3, I think that having certain applications that run on
python3 and therefore need to have stacks of modules that support it is one
of the prime ways that distros become motivated to provide python3 packages
and support.  This is basically the killer app idea in a new venue :-)

-Toshio




Re: [Python-Dev] Python 3.4 Release Manager

2011-11-22 Thread Toshio Kuratomi
On Tue, Nov 22, 2011 at 08:27:24PM -0800, Raymond Hettinger wrote:
 
 On Nov 22, 2011, at 7:50 PM, Larry Hastings wrote:
  But look!  I'm already practicing: NO YOU CAN'T CHECK THAT IN.  How's that?
  Needs work?
 
 You could try a more positive leadership style:  THAT LOOKS GREAT, I'M SURE
 THE RM FOR PYTHON 3.5 WILL LOVE IT ;-)
 
Wow!  My release engineering team needs to take classes from you guys!

-Toshio




Re: [Python-Dev] Bring new features to older python versions

2011-10-11 Thread Toshio Kuratomi
On Tue, Oct 11, 2011 at 12:22:12AM -0400, Terry Reedy wrote:
 On 10/10/2011 4:21 PM, Giampaolo Rodolà wrote:
 Thanks everybody for your feedback.
 I created a gcode project here:
 http://code.google.com/p/pycompat/
 
 This project will be easier if the test suite for a particular
 function/class/module is up to par. If you find any gaping holes, you
 might file an issue on the tracker.
 
About testsuites... one issue that you'll run into is that while some stdlib
modules are written with backporting to older versions in mind, their
testsuites are not.  For instance, subprocess from python-2.7 runs fine on
python-2.3+.  The testsuite for subprocess in python-2.7 makes use of the
with statement, though, so it has to be ported.
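
The kind of rewrite that porting such a testsuite involves looks roughly like this (a generic sketch, not code taken from the subprocess tests; both function names are made up):

```python
import os
import tempfile


def read_with(path):
    # Form a newer testsuite would typically use.
    with open(path) as handle:
        return handle.read()


def read_without(path):
    # Equivalent form that also parses on interpreters that predate `with`.
    handle = open(path)
    try:
        return handle.read()
    finally:
        handle.close()


fd, sample_path = tempfile.mkstemp()
os.write(fd, b"same result either way\n")
os.close(fd)

print(read_with(sample_path) == read_without(sample_path))
os.remove(sample_path)
```

Each individual change is mechanical, but every with block in the suite needs it, which is why the tests can be more work to backport than the module itself.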

-Toshio




Re: [Python-Dev] Bring new features to older python versions

2011-10-11 Thread Toshio Kuratomi
On Tue, Oct 11, 2011 at 01:49:43PM +0100, Michael Foord wrote:
 On 10/10/2011 21:21, Giampaolo Rodolà wrote:
 
 Toshio Kuratomi a.bad...@gmail.com wrote:
 I have a need to support a small amount of code as far back as python-2.3.
 I don't suppose you're interested in that as well? ;-)
 I'm still not sure; 2.3 version is way too old (it doesn't even have
 decorators).
 It might make sense only in case the lib gets widely used, which I doubt.
 Personally, at some point I deliberately dropped support for 2.3 from
 all of my code/lib, mainly because I couldn't use decorators. so I
 don't have a real need to do this.
 
 Yes, rewriting code from Python 2.7 to support Python 2.3
 (pre-decorators) is a real nuisance. In my projects I'm currently
 supporting Python 2.4+. I'll probably drop support for Python 2.4
 soon which will allow for the use of the with statement.
 
So actually, decorators aren't a big deal when thinking about porting
a limited set of code to python-2.3.  decorators are simply syntactic sugar
after all, so it's only a one-line change::

  @cache()
  def function(arg):
 # do_expensive_something
 return result

becomes::

  def function(arg):
     # do_expensive_something
 return result
  function = cache(function)

This may not be the preferred manner to write decorators but it's fairly
straightforward and easy to remember compared to, say, porting away from the
with statement.

That said, my request was more hopeful finger-crossing than a claim that this
is not worthwhile unless you go back to python-2.3; I wasn't really expecting
anyone else to commit to this as a limitation.  I only have
to bear this burden until February 29 and believe me, I'm anxiously awaiting
that day :-)

-Toshio




Re: [Python-Dev] Bring new features to older python versions

2011-10-08 Thread Toshio Kuratomi

I have some similar code in kitchen:
http://packages.python.org/kitchen/api-overview.html

It wasn't as ambitious as your initial goals sound (I was only working on
pulling out what was necessary for what people requested rather than an
all-inclusive set of changes).  You're welcome to join me and work on this
aspect of kitchen if you'd like or you can go your own way and I'll probably
start pointing people at your library (Like I do with hashlib, bunch,
iterutils, ordereddict, etc).

I have a need to support a small amount of code as far back as python-2.3.
I don't suppose you're interested in that as well? ;-)

On Sat, Oct 08, 2011 at 06:57:47PM +0200, Giampaolo Rodolà wrote:
 functools (2.5)
 any, all builtins (2.5)
 collections.defaultdict (2.5)
 property setters/deleters (2.6)
 abc (2.6)
 fractions (2.6)
 collections.OrderedDict (2.7)
 collections.Counter (2.7)
 unittest2 (2.7)
 functools.lru_cache (3.2)
 functools.total_ordering (3.2)
 itertools.accumulate (3.2)
 reprlib (3.2)
 contextlib.ContextDecorator (3.2)
 

You can also add subprocess to this list.  There are various methods and
functions that were added to subprocess since its first appearance in
python-2.4 (check the library docs page for notes about this [1]_).

hashlib (which has a pypi backport already) is another one.

hmac is a third which you probably won't notice if you're just perusing
docs.  It's an issue because if someone tries to use the stdlib's hmac 
together with the pypi hashlib, hmac will fail unless it's from a recent enough
python.

.. [1] http://docs.python.org/library/subprocess.html


Speaking as someone who works on a Linux distribution, one thing that I'd
appreciate is if you could take care to make it so the copied code doesn't
get used if the stdlib already provides the necessary code.  If you do this,
it makes it easier for people who have to audit the code to do their jobs.
Instead of having to check every consumer of the compat library to make sure
they use something like this::

    try:
        import functools
    except ImportError:
        from pycompat import functools

    import sys

    if sys.version_info >= (2, 5):
        import hmac
    else:
        from pycompat import hmac


You can depend on roughly the same logic having been performed in the
library itself which greatly eases their burden.  You can look at the
kitchen pycompat code for some examples of doing this [2]_.
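
Inside the compat library itself, the same check might look like the following. This is a simplified sketch, not the kitchen code; the fallback only approximates functools.partial and would live in a hypothetical pycompat module:

```python
try:
    # Prefer the standard library whenever it already has what we need.
    from functools import partial
except ImportError:
    # Simplified stand-in, defined only on interpreters that lack it.
    def partial(func, *args, **keywords):
        def wrapper(*more_args, **more_keywords):
            merged = dict(keywords)
            merged.update(more_keywords)
            return func(*(args + more_args), **merged)
        return wrapper


parse_binary = partial(int, base=2)
print(parse_binary("101"))
```

Consumers then import partial from the compat module unconditionally, and an auditor only has to confirm in one place that the copied code is bypassed when the stdlib version exists.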

.. [2] http://bzr.fedorahosted.org/bzr/kitchen/devel/files/head:/kitchen/

-Toshio




Re: [Python-Dev] Using PEP384 Stable ABI for the lzma extension module

2011-10-05 Thread Toshio Kuratomi
On Wed, Oct 05, 2011 at 06:14:08PM +0200, Antoine Pitrou wrote:
 Le mercredi 05 octobre 2011 à 18:12 +0200, Martin v. Löwis a écrit :
   Not sure what you are using it for. If you need to extend the buffer
   in case it is too small, there is absolutely no way this could work
   without copies in the general case because of how computers use
   address space. Even _PyBytes_Resize will copy the data.
  
   That's not a given. Depending on the memory allocator, a copy can be
   avoided. That's why the str += str hack is much more efficient under
   Linux than Windows, AFAIK.
  
  Even Linux will have to copy a block on realloc in certain cases, no?
 
 Probably so. How often is totally unknown to me :)
 
http://www.gnu.org/software/libc/manual/html_node/Changing-Block-Size.html

It depends on whether there's enough free memory after the buffer you
currently have allocated.  I suppose that this becomes a question of what
people consider the general case :-)

-Toshio




Re: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the /usr/bin/python2 symlink upstream)

2011-08-12 Thread Toshio Kuratomi
On Fri, Aug 12, 2011 at 12:19:23PM -0400, Barry Warsaw wrote:
 On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:
 
 1. Accept the reality of that situation, and propose a mechanism that
 minimises the impact of the resulting ambiguity on end users of Python
 by allowing developers to be explicit about their target language.
 This is the approach advocated in PEP 394.
 
 2. Tell the Arch developers (and anyone else inclined to point the
 python name at python3) that they're wrong, and the python symlink
 should, now and forever, always refer to a version of Python 2.x.
 
 FWIW, although I generally support the PEP, I also think that distros
 themselves have a responsibility to ensure their #! lines are correct, for
 scripts they install.  Meaning, if it requires rewriting the #! line on OS
 package install, so be it.
 
+1 with the one caveat... it's nice to upstream fixes.  If there's a simple
rule like python == python-2 and python3 == python-3 everywhere, this is
possible.  If there's something like python2 == python-2 and python3 ==
python-3 everywhere, this is also possible.  The problem is that the latter
is not the case (python from python.org itself doesn't produce a python2
symlink on install), and historically the former was the case, but since
python-dev rejected the notion that python == python-2, that is no longer true.

As long as it's just Arch, there's still time to go with #2.  #1 is not
a complete solution (especially because /usr/bin/python2 will never exist on
some historical systems [not ones I run though, so someone else will need to
beat that horse :-)]) but is better than where we are now where there is no
guidance on what's right and wrong at all.

-Toshio




Re: [Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

2011-06-28 Thread Toshio Kuratomi
On Tue, Jun 28, 2011 at 03:46:12PM +0100, Paul Moore wrote:
 On 28 June 2011 14:43, Victor Stinner victor.stin...@haypocalc.com wrote:
  As discussed before on this list, I propose to set the default encoding
  of open() to UTF-8 in Python 3.3, and add a warning in Python 3.2 if
  open() is called without an explicit encoding and if the locale encoding
  is not UTF-8. Using the warning, you will quickly notice the potential
  problem (using Python 3.2.2 and -Werror) on Windows or by using a
  different locale encoding (.e.g using LANG=C).
 
 -1. This will make things harder for simple scripts which are not
 intended to be cross-platform.
 
 I use Windows, and come from the UK, so 99% of my text files are
 ASCII. So the majority of my code will be unaffected. But in the
 occasional situation where I use a £ sign, I'll get encoding errors,
 where currently things will just work. And the failures will be data
 dependent, and hence intermittent (the worst type of problem). I'll
 write a quick script, use it once and it'll be fine, then use it later
 on some different data and get an error. :-(

I don't think this change would make things harder.  It will just move
where the pain occurs.  Right now, the failures are intermittent A) on
computers other than the one that you're using or B) when run under
a different user than yourself.  Sys admins where I'm at are
constantly writing ad hoc scripts in python that break because you stick
something in a cron job and the locale settings suddenly become C and
therefore the script suddenly only deals with ASCII characters.

I don't know that Victor's proposed solution is the best (I personally would
like it a whole lot more than the current guessing, but I never develop on
Windows so I can certainly see that your environment can lead to the
opposite assumption :-) but something should change here.  Issuing a warning
like "open used without explicit encoding may lead to errors" if open() is
used without an explicit encoding would help a little (at least, people who
get errors would then have an inkling that the culprit might be an open()
call).  If I read Victor's previous email correctly, though, he said this
was previously rejected.
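
To make the failure mode concrete, here is a minimal sketch (the file name
and sample text are made up).  It shows why a script that silently falls back
to ASCII under the C locale breaks on a pound sign, and how passing the
encoding explicitly takes the locale out of the picture:

```python
import os
import tempfile

text = 'Price: \u00a3100\n'  # contains a pound sign

# Roughly what happens when the locale encoding ends up as ASCII:
# the pound sign cannot be represented at all.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# With an explicit encoding the result no longer depends on the locale
# of whoever (or whatever cron job) runs the script.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'prices.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        round_tripped = f.read()
```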

Another brainstorming solution would be to use different default encodings on
different platforms.  For instance, for writing files, utf-8 on *nix systems
(including Mac OS X) and utf-16 on Windows.  For reading files, check for a
utf-16 BOM; if not present, operate as utf-8.  That would seem to address your
issue with detection by vim, etc but I'm not sure about getting £ in your
input stream.  I don't know where your input is coming from and how the
Windows equivalent of locale plays into that.
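
The read side of that brainstorm could be sketched like this.  It is only an
illustration of the idea, not a proposal for how open() would implement it,
and the function name is made up:

```python
import codecs


def decode_text(data):
    """Decode bytes as UTF-16 if they start with a UTF-16 BOM, else UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM to pick the byte order.
        return data.decode('utf-16')
    return data.decode('utf-8')


sample = '\u00a35'  # a pound sign followed by "5"
from_utf16 = decode_text(sample.encode('utf-16'))  # encoder writes a BOM
from_utf8 = decode_text(sample.encode('utf-8'))
```

Note that this simple check does not try to tell a UTF-16 LE BOM apart from
a UTF-32 LE one, which starts with the same two bytes.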

-Toshio




Re: [Python-Dev] PEP 396, Module Version Numbers

2011-04-07 Thread Toshio Kuratomi
On Wed, Apr 06, 2011 at 11:04:08AM +0200, John Arbash Meinel wrote:
 
 
 ...
  #. ``__version_info__`` SHOULD be of the format returned by PEP 386's
 ``parse_version()`` function.
 
 The only reference to parse_version in PEP 386 I could find was the
 setuptools implementation which is pretty odd:
 
  
  In other words, parse_version will return a tuple for each version string, 
  that is compatible with StrictVersion but also accept arbitrary version and 
  deal with them so they can be compared:
  
  >>> from pkg_resources import parse_version as V
  >>> V('1.2')
  ('0001', '0002', '*final')
  >>> V('1.2b2')
  ('0001', '0002', '*b', '0002', '*final')
  >>> V('FunkyVersion')
  ('*funkyversion', '*final')
 
Barry -- I think we want to talk about NormalizedVersion.from_parts() rather
than parse_version().

 bzrlib has certainly used 'version_info' as a tuple indication such as:
 
 version_info = (2, 4, 0, 'dev', 2)
 
 and
 
 version_info = (2, 4, 0, 'beta', 1)
 
 and
 
 version_info = (2, 3, 1, 'final', 0)
 
 etc.
 
 This is mapping what we could sort out from Python's sys.version_info.
 
 The *really* nice bit is that you can do:
 
 if sys.version_info >= (2, 6):
   # do stuff for python 2.6(.0) and beyond
 
*nod*  People like to compare versions and the tuple forms allow that.  Note
that the tuples you give don't compare correctly.  This is the order that
they sort:

(2, 4, 0)
(2, 4, 0, 'beta', 1)
(2, 4, 0, 'dev', 2)
(2, 4, 0, 'final', 0)

So that means, snapshot releases will always sort after the alpha and beta
releases (and release candidate if you use 'c' to mean release candidate).
Since the simple (2, 4, 0) tuple sorts before everything else, a comparison
that doesn't work with the 2.4.0-alpha (or beta or arbitrary dev snapshots)
would need to specify something like:

(2, 4, 0, 'z')

NormalizedVersion.from_parts() uses nested tuples to handle this better.
But I think that even with nested tuples a naive comparison fails since most
of the suffixes are prerelease strings.  ie: ((2, 4, 0),) < ((2, 4, 0),
('beta', 1))

So you can't escape needing a function to compare versions.
(NormalizedVersion does this by letting you compare NormalizedVersions
together).  Barry if this is correct, maybe __version_info__ is useless and
I shouldn't have brought it up at pycon?
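
To illustrate the point about plain tuples, here is a small sketch.  The
stage ranking in it (dev before alpha before beta before release candidate
before final) is an assumption made for the example, not something PEP 386
or bzrlib specifies, and version_key() is a made-up helper, not
NormalizedVersion:

```python
versions = [
    (2, 4, 0, 'final', 0),
    (2, 4, 0, 'dev', 2),
    (2, 4, 0),
    (2, 4, 0, 'beta', 1),
]

# Plain tuple comparison: the short tuple sorts first and the stage names
# sort alphabetically.
naive_order = sorted(versions)

# Assumed release ordering, for illustration only.
STAGE_RANK = {'dev': 0, 'alpha': 1, 'beta': 2, 'c': 3, 'final': 4}


def version_key(info):
    """Build a sort key that ranks the stage instead of comparing its name."""
    release = tuple(info[:3])
    if len(info) >= 5:
        stage, serial = info[3], info[4]
    else:
        # A bare (major, minor, micro) tuple is treated as the final release.
        stage, serial = 'final', 0
    return (release, STAGE_RANK[stage], serial)


release_order = sorted(versions, key=version_key)
```

With the key function, the dev snapshot and the beta sort before the final
release, which the naive comparison does not give you.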

-Toshio




Re: [Python-Dev] Security implications of pep 383

2011-03-30 Thread Toshio Kuratomi
On Wed, Mar 30, 2011 at 08:36:43AM +0200, Lennart Regebro wrote:
 On Wed, Mar 30, 2011 at 07:54, Toshio Kuratomi a.bad...@gmail.com wrote:
  Lennart is missing that you just need to use the same encoding
  + surrogateescape (or stick with bytes) for decoding the byte strings that
  you are comparing.
 
 You lost me here. I need to do this for what?

The lesson here seems to be if you have to use blacklists, and you
use unicode strings for those blacklists, also make sure the string
you compare with doesn't have surrogates.


Really, surrogates are a red herring to this whole issue.  The issue is that
the original code was trying to compare two different transformations of
byte sequences and expecting them to be equal.  Let's say that you have the
following byte value::
  b_test_value = b'\xa4\xaf'

This is something that's stored in a file or the filename of something on
a unix filesystem or stored in a database or any number of other things.
Now you want to compare that to another piece of data that you've read in
from somewhere outside of python.  You'd expect any of the following to
work::
  b_test_value == b_other_byte_value
  b_test_value.decode('utf-8', 'surrogateescape') == b_other_byte_value.decode('utf-8', 'surrogateescape')
  b_test_value.decode('latin-1') == b_other_byte_value.decode('latin-1')
  b_test_value.decode('euc_jp') == b_other_byte_value.decode('euc_jp')

You wouldn't expect this to work::
  b_test_value.decode('latin-1') == b_other_byte_value.decode('euc_jp')

Once you see that, you realize that the following is only a specific case of
the former; surrogateescape doesn't really matter::
  b_test_value.decode('utf-8', 'surrogateescape') == b_other_byte_value.decode('euc_jp')
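
A runnable version of the comparisons, assuming the "other" value read from
outside of python happens to hold the same two bytes:

```python
b_test_value = b'\xa4\xaf'
b_other_byte_value = b'\xa4\xaf'  # assumed: the same bytes, read elsewhere

# Same transformation on both sides: equal input gives equal output.
same_bytes = b_test_value == b_other_byte_value
same_utf8 = (b_test_value.decode('utf-8', 'surrogateescape')
             == b_other_byte_value.decode('utf-8', 'surrogateescape'))
same_latin1 = (b_test_value.decode('latin-1')
               == b_other_byte_value.decode('latin-1'))
same_eucjp = (b_test_value.decode('euc_jp')
              == b_other_byte_value.decode('euc_jp'))

# Different transformations on each side: no reason for them to match,
# with or without surrogateescape.
mixed_plain = (b_test_value.decode('latin-1')
               == b_other_byte_value.decode('euc_jp'))
mixed_surrogate = (b_test_value.decode('utf-8', 'surrogateescape')
                   == b_other_byte_value.decode('euc_jp'))
```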

-Toshio




Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Toshio Kuratomi
On Tue, Mar 29, 2011 at 07:23:25PM +0100, Michael Foord wrote:
 Hey all,
 
 Not sure how real the security risk is here:
 
 http://blog.omega-prime.co.uk/?p=107
 
 Basically  he is saying that if you store a list of blacklisted files
 with names encoded in big-5 (or some other non-utf8 compatible
 encoding) if those names are passed at the command line, or otherwise
 read in and decoded from an assumed-utf8 source with surrogate
 escaping, the surrogate escape decoded names will not match the
 properly decoded blacklisted names.
 
The example is correct.  The security risk is real.  However, there's a flaw
in the program, and whether there's also a flaw in python is not so
certain.

Here's the line I'd say is contentious::
  blacklist = open('blacklist.big5', encoding='big5').read().split()

The blacklist file contains a list of filenames.  However, this code treats
it as a list of strings.  This is a logic error in the program, and he should
really be doing this::
  blacklist = open('blacklist.big5', 'rb').read().split()

Then, when comparing it against the values of sys.argv, either sys.argv gets
converted into bytes (using the system locale since that's what was used to
decode to unicode) or the items in blacklist get converted to unicode with
surrogateescape.
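
Here is a sketch of the bytes-based comparison.  To keep it deterministic it
hard-codes utf-8 + surrogateescape as a stand-in for what python does to
sys.argv under a UTF-8 locale (os.fsencode() would use the real locale), and
the helper names and file name are made up:

```python
import os
import tempfile


def load_blacklist(path):
    """Read the blacklist as raw bytes, one whitespace-separated name each."""
    with open(path, 'rb') as f:
        return set(f.read().split())


def is_blacklisted(arg, blacklist, encoding='utf-8'):
    # Undo the surrogateescape decoding to get the original bytes back,
    # then compare bytes with bytes.
    return arg.encode(encoding, 'surrogateescape') in blacklist


name_bytes = '\u4f60\u597d'.encode('big5')  # a big5-encoded filename

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'blacklist.big5')
    with open(path, 'wb') as f:
        f.write(name_bytes + b'\n')
    blacklist = load_blacklist(path)

# What the program would see for that filename on the command line.
arg = name_bytes.decode('utf-8', 'surrogateescape')

# The blog post's approach: decode the blacklist as big5 and compare text.
text_blacklist = [name.decode('big5') for name in blacklist]
```

The bytes comparison catches the blacklisted name; the text comparison
against the big5-decoded list is the one that misses it.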

The possible flaw in python is this:  Code like the blog poster wrote passes
python3 without an error or a warning.  This gives the programmer no
feedback that they're doing something wrong until it actually bites them in
the foot in deployed code.

-Toshio




Re: [Python-Dev] Security implications of pep 383

2011-03-29 Thread Toshio Kuratomi
On Tue, Mar 29, 2011 at 10:55:47PM +0200, Victor Stinner wrote:
 Le mardi 29 mars 2011 à 22:40 +0200, Lennart Regebro a écrit :
  The lesson here seems to be if you have to use blacklists, and you
  use unicode strings for those blacklists, also make sure the string
  you compare with doesn't have surrogates.
 
 No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which
 doesn't contain any surrogate character.
 
 The lesson is: if you compare Unicode filenames on UNIX, make sure that
 your system is correctly configured (the locale encoding must be the
 filesystem encoding).

You're both wrong :-)

Lennart is missing that you just need to use the same encoding
+ surrogateescape (or stick with bytes) for decoding the byte strings that
you are comparing.

You're missing that on UNIX there is no filesystem encoding so the idea of
locale and filesystem encoding matching is false (and unnecessary -- the
encodings that you use within python just need to be the same.  They don't
even need to match up to the reality of what's used on the filesystem or the
user's locale.)

-Toshio




Re: [Python-Dev] Module version variable

2011-03-18 Thread Toshio Kuratomi
On Fri, Mar 18, 2011 at 07:40:43PM -0700, Guido van Rossum wrote:
 On Fri, Mar 18, 2011 at 7:28 PM, Greg Ewing greg.ew...@canterbury.ac.nz 
 wrote:
  Tres Seaver wrote:
 
  I'm not even sure why you would want __version__ in 99% of modules:  in
  the ordinary cases, a module's version should be either the Python
  version (for a module shipped in the stdlib), or the release of the
  distribution which shipped it.
 
  It's useful to be able to find out the version of a module
  you're using at run time so you can cope with API changes.
 
  I had a case just recently where the behaviour of something
  in pywin32 changed between one release and the next. I looked
  for an attribute called 'version' or something similar to
  test, but couldn't find anything.
 
  +1 on having a standard place to look for version info.
 
 I believe __version__ *is* the standard (like __author__). IIRC it was
 proposed by Ping. I think this convention is so old that there isn't a
 PEP for it. So yes, we might as well write it down. But it's really
 nothing new.
 
There is a section in PEP8 about __version__ but it serves a slightly
different purpose there:


Version Bookkeeping

If you have to have Subversion, CVS, or RCS crud in your source file, do
it as follows.

__version__ = "$Revision: 88433 $"
# $Source$

These lines should be included after the module's docstring, before any
other code, separated by a blank line above and below.


Personally, I've never found a need to access the repository revision
programmatically from my python applications but I have needed to access the
API version so it would make sense to me to change the meaning of
__version__.
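
For what it's worth, this is the kind of use I have in mind.  It's only a
sketch: 'fakelib' stands in for a real third-party module, its version
number is invented, and version_tuple() only copes with purely numeric
dotted versions:

```python
import types


def version_tuple(text):
    """Turn a dotted string such as '2.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


# Hypothetical module; a real one would simply be imported.
fakelib = types.ModuleType('fakelib')
fakelib.__version__ = '2.1.0'

# Pick a code path based on the API version of the module in use.
if version_tuple(fakelib.__version__) >= (2, 0):
    api = 'new'
else:
    api = 'old'
```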

-Toshio




Re: [Python-Dev] [PEPs] Support the /usr/bin/python2 symlink upstream

2011-03-08 Thread Toshio Kuratomi
On Tue, Mar 08, 2011 at 06:43:19PM -0800, Glenn Linderman wrote:
 On 3/8/2011 12:02 PM, Terry Reedy wrote:
 
 On 3/7/2011 9:31 PM, Reliable Domains wrote:
 
 
 The launcher need not be called python.exe, and maybe it would be
 better called #@launcher.exe (or similar, depending on its exact
 function details).
 
 
 I do not know that the '#@' part is about, but pygo would be short and
 expressive.
 
 
 
 If my proposal to make a line starting with #@ to be used instead of the Unix
 #! (#@ could be on the first or second line, to allow cross-platform scripts 
 to
 use both, and Windows only scripts to not have #!

You'd need to allow for it to be on the third line as well.  pep-0263
has already taken the second line if it's in a script that has a Unix
shebang.


 ), then #@launcher.exe (and #
 @launcherw.exe I suppose) would reflect the functionality of the launcher,
 which need not be tightly tied to Python, if it uses a separate line.  But the
 launcher should probably not be the thing invoked from the command line, only
 implicitly when running scripts by naming them as the first thing on the
 command line.
 
 I'm of the opinion that attempting to parse a Unix #! line, and intuit what
 would be the equivalent on Windows is unnecessarily complex and error prone,
 and assumes that the variant systems are configured using the same guidelines
 (which the Python community may espouse, but may not be followed by all
 distributions, sysadmins, or users).

I do not have a Windows system so I don't have a horse in this race but if
the argument is to avoid complexity, be careful that your proposed solution
isn't more complex than what you're avoiding.  ie::

 Now that I've had this idea, one might want to create other 2nd character
 codes after the Unix #! line... one could have
 
 #! Unix command processor
 #@ Windows command processor
 #$ OS/2 command processor
 #% Alternate Windows command processor.
 
 One could even port it to Unix:
 
 #!/usr/bin/#@launcher
 #@c:\python2.6\python.exe
 #^/usr/bin/python2.5
 #/usr/bin/mono/IronPython2.6 for .NET 4.0/ipy.exe
 #  I made up the line above, having no knowledge of Mono, but I think you get
 the idea
 
 Choice of command line would be an environment variable, I suppose, that the
 launcher would look at, or if none, then a system-specific default.  It would
 have to search forward in the file until it finds the appropriate prefix or a
 line not starting with #, or starting with #  or ##, at which point it
 would give up.
 
-Toshio




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-07 Thread Toshio Kuratomi
On Tue, Mar 08, 2011 at 08:25:50AM +1000, Nick Coghlan wrote:
 On Tue, Mar 8, 2011 at 1:30 AM, Barry Warsaw ba...@python.org wrote:
  On Mar 04, 2011, at 12:00 PM, Toshio Kuratomi wrote:
 
 Actually, my post was saying that these two can be decoupled.  ie: It's
 possible to not have /usr/bin/python while still allowing users to type
 python at a shell prompt and get the interpreter.
 
 This is done by either redefining the PATH to include the directory that the
 interpreter named python is in or by creating an alias for python to the
 proper interpreter.
 
  I personally would prefer aliasing rather than $PATH manipulation.
 
 Toshio's suggestion wouldn't work anyway - the /usr/bin/env python
 idiom will pick up a python alias no matter where it lives on $PATH.
 
I thought I pointed out that env wouldn't work with PATH but I guess I just
thought that silently in my head.  Pointing that out was going to live in
the same paragraph as saying that it does work with an alias::

    $ sudo mv /usr/bin/python /usr/bin/python.bak
    $ alias python='/usr/bin/python2.7'
    $ python --version
    Python 2.7
    $ cat test.py
    #! /bin/env python
    print 'hi'
    $ ./test.py
    /bin/env: python: No such file or directory
    $ mv /usr/bin/python.bak /usr/bin/python
    $ ./test.py
    hi


-Toshio




Re: [Python-Dev] PEP 395: Module Aliasing

2011-03-05 Thread Toshio Kuratomi
On Fri, Mar 04, 2011 at 12:56:16PM -0500, Fred Drake wrote:
 On Fri, Mar 4, 2011 at 12:35 PM, Michael Foord
 fuzzy...@voidspace.org.uk wrote:
  That (below) is not distutils it is setuptools. distutils just uses
  `scripts=[...]`, which annoyingly *doesn't* work with setuptools.
 
 Right; distutils scripts are just sad.
 
 OTOH, entry-point based scripts are something setuptools got very,
 very right.  Probably not perfect, but... I've not yet needed anything
 different in practice.
 
Some of them can be annoying as hell when dealing with a system that also
installs multiple versions of a module.  But one could argue that's the
fault of setuptools' version handling rather than the entry-points
handling.

-Toshio




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-04 Thread Toshio Kuratomi
On Fri, Mar 04, 2011 at 01:56:39PM -0500, Barry Warsaw wrote:
 
 I don't agree that /usr/bin/python should not be installed.  The draft PEP
 language hits the right tone IMHO, and I would favor /usr/bin/python pointing
 to /usr/bin/python2 on Debian, but primarily used only for the interactive
 interpreter.
 
 Or IOW, I still want users to be able to type 'python' at a shell prompt and
 get the interpreter.
 
Actually, my post was saying that these two can be decoupled.  ie: It's
possible to not have /usr/bin/python while still allowing users to type
python at a shell prompt and get the interpreter.

This is done by either redefining the PATH to include the directory that the
interpreter named python is in or by creating an alias for python to the
proper interpreter.

Using the environment-modules tools is one solution that operates in this
way.  It also, incidentally, would let each user of a system choose whether
python invoked python2 or python3 (and on Debian, which sub-version of
those).  A more hardcoded approach is to have the python package drop some
configuration into /etc/profile.d/ style directories where the distribution
places files that are run by default by the user's shell with the default
startup files.

-Toshio




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-03 Thread Toshio Kuratomi
On Thu, Mar 03, 2011 at 09:55:25AM +0100, Piotr Ożarowski wrote:
 [Guido van Rossum, 2011-03-02]
  On Wed, Mar 2, 2011 at 4:56 AM, Piotr Ożarowski pi...@debian.org wrote:
   [Sandro Tosi, 2011-03-02]
   On Wed, Mar 2, 2011 at 10:01, Piotr Ożarowski pi...@debian.org wrote:
I co-maintain with Matthias a package that provides /usr/bin/python
symlink in Debian and I can confirm that it will always point to Python
2.X. We also do not plan to add /usr/bin/python2 symlink (and I guess
only accepted PEP can change that)
  
   Can you please explain why you NACK this proposed change?
  
   it encourages people to change /usr/bin/python symlink to point to
   python3.X which I'm strongly against (how can I tell that upstream
   author meant python3.X and not python2.X without checking the code?)
  
  But the same is already true for python2.X vs. python2.Y. Explicit is
  better than implicit etc. Plus, 5 years from now everybody is going to
  be annoyed that python still refers to some ancient unused version
  of Python.
 
 I don't really mind adding /usr/bin/python2 symlink just to clean Arch
 mess, but I do mind changing /usr/bin/python to point to python3 (and I
 can use the same argument - Explicit is better than implicit - if you
 need Python 3, say so in the shebang, right?). What I'm afraid of is
 when we'll add /usr/bin/python2, we'll start getting a lot of scripts
 that will have to be checked manually every time new upstream version is
 released because we cannot assume what upstream author is using at given
 point.
 
 If /usr/bin/python will be disallowed in shebangs on the other hand
 (and all scripts will use /usr/bin/python2, /usr/bin/python3,
 /usr/bin/python4 or /usr/bin/python2.6 etc.) I don't see a problem with
 letting administrators choose /usr/bin/python (right now not only
 changing it from python2.X to python3.X will break the system but also
 changing it from /usr/bin/python2.X to /usr/bin/python2.Y will break it,
 and believe me, I know what I'm talking about (one of the guys at work
 did something like this once))
 
 [all IMHO, dunno if other Debian's python-defaults maintainers agree
 with me]

Thinking outside of the box, I can think of something that would satisfy
your requirements but I don't know how appropriate it is for upstream python
to ship with.  Stop shipping /usr/bin/python.  Ship python in an alternate
location like $LIBEXECDIR/python2.7/bin (I think this would be
/usr/lib/python2.7/bin on Debian and /usr/libexec/python2.7/bin on Fedora
which would both be appropriate) then configure which python version is
invoked by the user typing python by configuring PATH (a shell alias might
also work).  You could configure this with environment-modules[1]_ if Debian
supports using that in packaging.

Coupled with a PEP that recommends against using /usr/bin/python in scripts
and instead using /usr/bin/python$MAJOR, this might be sufficient.  OTOH, my
cynical side doubts that script authors read PEPs so it'll take either
upstream python shipping without /usr/bin/python or consensus among the
distros to ship without /usr/bin/python to reach the point where script
authors realize that they need to use /usr/bin/python{2,3} instead of
/usr/bin/python.

.. [1] http://modules.sourceforge.net/

-Toshio




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-03 Thread Toshio Kuratomi
On Thu, Mar 03, 2011 at 09:11:40PM -0500, Barry Warsaw wrote:
 On Mar 03, 2011, at 02:17 PM, David Malcolm wrote:
 
 On a related note, we have a number of scripts packaged across the
 distributions with a shebang line that reads:
#!/usr/bin/env python
 which AIUI follows upstream recommendations.
 
 Actually, I think this is *not* a good idea for distro provided scripts.  For
 any Python scripts released by the distro, you know exactly which Python it
 should run on, so it's better to hard code it.  That way, if someone installs
 Python from source, or installs an experimental version of a new distro
 Python, it won't break their system.  Yes, this has happened to me.  Also,
 note that distutils/setuptools/distribute rewrite the shebang line when they
 install scripts.
 
 There was a proposal to change these when packaging them to hardcode the
 specific python binary:
 
 https://fedoraproject.org/wiki/Features/SystemPythonExecutablesUseSystemPython
 on the grounds that a packaged system script is expecting (and has been
 tested against) a specific python build.
 
 That proposal has not yet been carried out.  Ideally if we did this,
 we'd implement it as a postprocessing phase within rpmbuild, rather
 than manually patching hundreds of files.
 
 Note that this would only cover shebang lines at the tops of scripts.
 
 JFDI!
 
 FWIW, a quick grep reveals about two dozen such scripts in /usr/bin on
 Ubuntu.  We should fix these. ;)
 
Note, we were unable to pass Guideline changes to do this in Fedora.  Gory
details of the FPC meeting are at 16:15:03 (abadger1999 == me):
http://meetbot.fedoraproject.org/fedora-meeting/2009-08-19/fedora-meeting.2009-08-19-16.01.log.html

The mailing list thread where this was discussed is here:
http://lists.fedoraproject.org/pipermail/packaging/2009-July/006248.html

Note to dmalcolm: IIRC, that also means that the Feature page you point to
isn't going to happen either.  Barry -- if other distros adopted stronger
policies, then that might justify me taking this back to the Packaging
Committee.

-Toshio




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-03 Thread Toshio Kuratomi
On Thu, Mar 03, 2011 at 09:46:23PM -0500, Barry Warsaw wrote:
 On Mar 03, 2011, at 09:08 AM, Toshio Kuratomi wrote:
 
 Thinking outside of the box, I can think of something that would satisfy
 your requirements but I don't know how appropriate it is for upstream python
 to ship with.  Stop shipping /usr/bin/python.  Ship python in an alternate
 location like $LIBEXECDIR/python2.7/bin (I think this would be
 /usr/lib/python2.7/bin on Debian and /usr/libexec/python2.7/bin on Fedora
 which would both be appropriate) then configure which python version is
 invoked by the user typing python by configuring PATH (a shell alias might
 also work).  You could configure this with environment-modules[1]_ if Debian
 supports using that in packaging.
 
 I wonder if Debian's alternatives system would be appropriate for this?
 
 http://wiki.debian.org/DebianAlternatives
 


No, alternatives is really only useful for a very small class of problems
[1]_ and [2]_.  For this discussion there's an additional problem which is
that alternatives works by creating symlinks.  Piotr Ożarowski wants to make
/usr/bin/python not exist so that scripts would have to use either
/usr/bin/python3 or /usr/bin/python2.  If alternatives places a symlink
there, it defeats the purpose of avoiding that path in the package itself.

I will note, though, that scripts that have /usr/bin/env and take the route
of setting the PATH would still fall victim to this.  I think that
environment-modules can also set up aliases.  If so, that would be better
than setting PATH for finding and removing python without a version in
scripts.

One further note on this since one of the other messages here had
a reference to this that kinda rains on this parade:
http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Languages/LSB-Languages/pylocation.html

The LSB is a standard that Linux distributions may or may not follow --
unlike the FHS, the LSB goes beyond encoding what most distros already do to
things that they think people should do.  For instance, Debian derivatives
might find the software installation section of LSB[3]_ to be a bit... hard
to swallow.  Fedora provides a package which aims to make a fedora system
lsb compliant but doesn't install it by default since it drags in gobs of
packages that are otherwise not necessary on many systems.

However, it does specify /usr/bin/python so getting rid of /usr/bin/python
at the Linux distribution level might not reach universal acclaim.  A united
front from upstream python through the python package maintainers on the
Linux distros would probably be needed to get people thinking about making
this change... and we still would likely have the ability to add
/usr/bin/python back onto a system (for instance, as part of that lsb
package I mentioned earlier.)

.. [1] https://fedoraproject.org/wiki/Packaging:EnvironmentModules#Introduction
.. [2] http://fedoraproject.org/wiki/Packaging:Alternatives#Recommended_usage
.. [3] http://refspecs.linux-foundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/swinstall.html

-Toshio




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-01 Thread Toshio Kuratomi
On Wed, Mar 02, 2011 at 01:14:32AM +0100, Martin v. Löwis wrote:
  I think a PEP would help, but in this case I would request that before
  the PEP gets written (it can be a really short one!) somebody actually
  go out and get consensus from a number of important distros. Besides
  Barry, do we have any representatives of distros here?
 
 Matthias Klose represents Debian, Dave Malcolm represents Redhat,
 and Dirkjan Ochtman represents Gentoo.
 
I'm here from Fedora.

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-26 Thread Toshio Kuratomi
On Wed, Jan 26, 2011 at 11:12:02AM +0100, Martin v. Löwis wrote:
 Am 26.01.2011 10:40, schrieb Victor Stinner:
  Le lundi 24 janvier 2011 à 19:26 -0800, Toshio Kuratomi a écrit :
  Why not locale:
  * Relying on locale is simply not portable. (...)
  * Mixing of modules from different locales won't work. (...)
  
  I don't understand what you are talking about.
 
 I think by portability, he means moving files from one computer to
 another. He argues that if Python would mandate UTF-8 for all file
 names on Unix, moving files in such a way would support portability,
 whereas using the locale's filename might not (if the locale use a
 different charset on the target system).
 
 While this is technically true, I don't think it's a helpful way of
 thinking: by mandating that file names are UTF-8 when accessed from
 Python, we make the actual files inaccessible on both the source and
 the target system.
 
  I don't understand the relation between the local filesystem encoding
  and the portability. I suppose that you are talking about the
  distribution of a module to other computers. Here the question is how
  the filenames are stored during the transfer. The user is free to use
  any tool, and try to find a tool handling Unicode correctly :-) But it's
  no more the Python problem.
 
 There are cases where there is no real transfer, in the sense in which
 you are using the word. For example, with NFS, you can access the very
 same file simultaneously on two systems, with no file name conversion
 (unless you are using NFSv4, and unless your NFSv4 implementations
 support the UTF-8 mandate in NFS well).
 
 Also, if two users of the same machine have different locale settings,
 the same file name might be interpreted differently.
 
Thanks Martin, I think that you understand my view even if you don't share
it.

There's one further case that I am worried about that has no real
transfer.  Since people here seem to think that unicode module names are
the future (for instance, the comments about redefining the C locale to
include utf-8 and the comments about archiving tools needing to support
encoding bits), there are eventually going to be unicode modules that become
dependencies of other modules and programs.  These will need to be installed
on systems.  Linux distributions that ship these will need to choose
a filesystem encoding for the filenames of these.  Likely the sensible thing
for them to do is to use utf-8 since all the ones I can think of default to
utf-8.  But, as Stephen and Victor have pointed out, users change their
locale settings to things that aren't utf-8 and save their modules using
filenames in that encoding.  When they update their OS to a version that has
utf-8 python module names, they will find that they have to make a choice.
They can either change their locale settings to a utf-8 encoding and have
the system installed modules work or they can leave their encoding on their
non-utf-8 encoding and have the modules that they've created on-site work.

This is not a good position to put users of these systems in.
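
The choice described above can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions: the directory listing is
simulated as raw bytes, and find_module_file is a hypothetical helper, not
something the import system provides.

```python
# Simulated directory listing as the kernel would return it: raw bytes.
# One module was installed by the distribution with a UTF-8 filename,
# the other was saved on-site by a user working in a Latin-1 locale.
RAW_LISTING = [
    "café_system.py".encode("utf-8"),
    "café_site.py".encode("latin-1"),
]


def find_module_file(module_name, raw_listing, fs_encoding):
    """Return True if module_name + '.py' is found when the name is
    encoded with fs_encoding (hypothetical helper for illustration)."""
    wanted = (module_name + ".py").encode(fs_encoding)
    return wanted in raw_listing


for encoding in ("utf-8", "latin-1"):
    print(encoding,
          find_module_file("café_system", RAW_LISTING, encoding),
          find_module_file("café_site", RAW_LISTING, encoding))
```

Whichever encoding the locale selects, only one of the two modules can be
found by name.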

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-25 Thread Toshio Kuratomi
On Tue, Jan 25, 2011 at 10:22:41AM +0100, Xavier Morel wrote:
 On 2011-01-25, at 04:26 , Toshio Kuratomi wrote:
  
  * If you can pick a set of encodings that are valid (utf-8 for Linux and
   MacOS
 
 HFS+ uses UTF-16 in NFD (actually in an Apple-specific variant of NFD). Right 
 here you've already broken Python modules on OSX.

Others have been saying that Mac OSX's HFS+ uses UTF-8.  But the question is
not whether UTF-16 or UTF-8 is used by HFS+.  It's whether you can sensibly
decide on an encoding from the type of system that is being run on.  This
could be querying the filesystem or a check on sys.platform or some other
method.  I don't know what detection the current code does.

On Linux there's no defined encoding that will work; file names are just
bytes to the Linux kernel so based on people's argument that the convention
is and should be that filenames are utf-8 and anything else is
a misconfigured system -- python should mandate that its module filenames on
Linux are utf-8 rather than using the user's locale settings.
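
A rough sketch of what "decide on an encoding from the type of system"
could look like, assuming a simple check on sys.platform.  The table and
the function name are hypothetical, and the entries (including the ASCII
fallback) are assumptions for illustration, not what the current code does.

```python
import sys

# Hypothetical table: one fixed module-filename encoding per platform
# family, used instead of the user's locale.  The values are assumptions.
MODULE_FILENAME_ENCODINGS = {
    "linux": "utf-8",
    "darwin": "utf-8",
    "win32": "utf-16-le",
}


def module_filename_encoding(platform=None):
    """Pick the module-filename encoding from the platform name alone."""
    platform = sys.platform if platform is None else platform
    for prefix, encoding in MODULE_FILENAME_ENCODINGS.items():
        if platform.startswith(prefix):
            return encoding
    # Unknown platform: fall back to the most conservative choice.
    return "ascii"


print(module_filename_encoding())
```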
 
 And as far as I know, Linux software/FS generally use NFC (I've already seen 
 this issue cause trouble)
 
Linux FS's are bytes with a small blacklist (so you can't use the NULL byte
in a filename, for instance).  Linux software would be free to use any
normal form that they want.  If one software used NFC and another used NFD,
the FS would record two separate files with two separate filenames.  Other
programs might or might not display this correctly.

Example:
zsh$ touch cafe
zsh$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) 
>>> import os
>>> import unicodedata
>>> a=u'café'
>>> b=unicodedata.normalize('NFC', a)
>>> c=unicodedata.normalize('NFD', a)
>>> open(b.encode('utf8'), 'w').close()
>>> open(c.encode('utf8'), 'w').close()
>>> os.listdir(u'.')
[u'people-etc-changes.txt', u'cafe\u0301', u'cafe', 
u'people-etc-changes.sha256sum', u'caf\xe9']
>>> os.listdir('.')
['people-etc-changes.txt', 'cafe\xcc\x81', 'cafe', 
'people-etc-changes.sha256sum', 'caf\xc3\xa9']
>>> ^D

zsh$ ls -al .
drwxrwxr-x.  2 badger badger  4096 Jan 25 07:46 .
drwxr-xr-x. 17 badger badger  4096 Jan 24 18:27 ..
-rw-rw-r--.  1 badger badger 0 Jan 25 07:45 cafe
-rw-rw-r--.  1 badger badger 0 Jan 25 07:46 cafe
-rw-rw-r--.  1 badger badger 0 Jan 25 07:46 café

zsh$ ls -al cafe
-rw-rw-r--.  1 badger badger 0 Jan 25 07:45 cafe
zsh$ ls -al cafe?
-rw-rw-r--.  1 badger badger 0 Jan 25 07:46 cafe

Now in this case, the decomposed form of the filename is being displayed
incorrectly and the shell treats the decomposed character as two characters
instead of one.  However, when you view these files in dolphin (the KDE file
manager) you properly see café repeated twice.
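
The same point can be checked without touching the filesystem.  A minimal
Python 3 sketch showing that the two normal forms compare unequal and
encode to different UTF-8 byte sequences, which is why a bytes-only
filesystem records them as two files:

```python
import unicodedata

name = "caf\u00e9"  # 'café' written with a precomposed e-acute

nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'
print(nfc == nfd)           # False
# Normalizing both sides to one form makes them comparable again.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```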

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-25 Thread Toshio Kuratomi
On Wed, Jan 26, 2011 at 11:24:54AM +0900, Stephen J. Turnbull wrote:
 Toshio Kuratomi writes:
 
   On Linux there's no defined encoding that will work; file names are just
   bytes to the Linux kernel so based on people's argument that the convention
   is and should be that filenames are utf-8 and anything else is
   a misconfigured system -- python should mandate that its module filenames 
 on
   Linux are utf-8 rather than using the user's locale settings.
 
 This isn't going to work where I live (Tsukuba).  At the national
 university alone there are hundreds of pre-existing *nix systems whose
 filesystems were often configured a decade or more ago.  Even if the
 hardware and OS have been upgraded, the filesystems are usually
 migrated as-is, with OS configuration tweaks to accommodate them.  Many
 of them use EUC-JP (and servers often Shift JIS).  That means that you
 won't be able to read module names with ls, and that will make Python
 unacceptable for this purpose.  I imagine that in Russia the same is
 true for the various Cyrillic encodings.
 
Sure ... but with these systems, neither read-modules-as-locale nor
read-modules-as-utf-8 is a workable solution, correct?  Especially if
the OS does get upgraded but the filesystems with user data (and user
created modules) are migrated as-is, you'll run into situations where system
installed modules are in utf-8 and user created modules are shift-jis and so
something will always be broken.
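
To make the mismatch concrete, here is a small sketch of one name encoded
three ways.  The module name is made up for illustration and try_decode is
a hypothetical helper.

```python
name = "日本語"  # hypothetical module name

# The same name as it would be stored under three filesystem encodings.
encoded = {enc: name.encode(enc) for enc in ("utf-8", "euc_jp", "shift_jis")}


def try_decode(raw, encoding):
    """Decode raw with encoding, or return None if the bytes are invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


for enc, raw in encoded.items():
    # Only the UTF-8 bytes survive being read back as UTF-8.
    print(enc, raw, try_decode(raw, "utf-8"))
```

The three byte strings all differ, so a lookup that assumes one encoding
cannot find a file saved under either of the others.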

The only way to make sure that modules work is to restrict them to ASCII-only
on the filesystem.  But because unicode module names are seen as
a necessary feature, the question is which way forward is going to lead to
the least brokenness.  Which could be locale... but from the python2
locale-related bugs that I get to look at, I doubt it.

 I really don't think there is anything that can be done here except to
 warn people that "Kids, these stunts are performed by highly-trained
 professionals.  Don't try this at home!"  Of course they will anyway,
 but at least they will have been warned in sufficiently strong terms
 that they might pay attention and be able to recover when they run
 into bizarre import exceptions.
 
So on the subject of warnings... I think a reason it's better to pick an
encoding for the platform/filesystem rather than to use locale is because
people will get an error or a warning at the appropriate time if that's the
case -- the first time they attempt to create and import a module with
a filename that's not encoded in the correct encoding for the platform.
It's all very well to say: "We wrote in the documentation on
http://docs.python.org/distutils/introduction.html#Choosing-a-name that only
ASCII names should be used when distributing python modules" but if the
interpreter doesn't complain when people use a non-ASCII filename we all
know that they aren't going to look in the documentation; they'll try it and
if it works they'll learn that habit.

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-24 Thread Toshio Kuratomi
On Thu, Jan 20, 2011 at 03:27:08PM -0500, Glyph Lefkowitz wrote:
 
 On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote:
 Same here. *Most* code will never be shared, or will only be shared
 between users in the same community. When it goes wrong it's also a
 learning opportunity. :-)
 
 
 Despite my usual proclivity for being contrarian, I find myself in agreement
 here.  Linux users with locales that don't specify UTF-8 frankly _should_ have
 to deal with all kinds of nastiness until they can transcode their 
 filesystems.
  MacOS and Windows both have a right answer here and your third-party tools
 shouldn't create mojibake in your filenames.
 
However, if this is the consensus, it makes a lot more sense to pick utf-8
as *the* encoding for python module filenames on Linux.

Why UTF-8:

* UTF-8 can cover the whole range of unicode whereas most (all?) other
  locale friendly encodings cannot.
* UTF-8 is becoming a standard for Linux distributions whether or not Linux
  users are adopting it.
* Third party tools are gaining support for UTF-8 even when they aren't
  gaining support for generic encodings (If I read the spec on zip
  correctly, this is actually what's happening there).

Why not locale:
* Relying on locale is simply not portable.  If nothing prevents people from
  distributing a unicode filename then they will go ahead and do so.  If
  the result works (say, because it's utf-8 and 80% of the Linux userbase is
  using utf-8) then it will get packaged and distributed and people won't
  know that it's a problem until someone with a non-utf-8 locale decides to
  use it.
* Mixing of modules from different locales won't work.  Suppose that the
  system python installs the previous module.  The local site has other
  modules that it has installed using a different filename encoding.
  The users at the site will find that either one or the other of the two
  modules won't work.
* Because of the portability problems you have no choice but to tell people
  not to distribute python modules with non-ASCII names.  This makes the use
  of unicode names second class indefinitely (until the kernel devs decide
  that they're wrong to not enforce a filesystem encoding or Linux becomes
  irrelevant as a platform).
* If you can pick a set of encodings that are valid (utf-8 for Linux and
  MacOS, wide unicode for windows [I get the feeling from other parts of the
  conversation that Windows won't be so lucky, though]) tools to convert
  python names become easier to write.  If you restrict it far enough, you
  could even write tools/importers that automatically do the detection.
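
As a sketch of the kind of conversion tool the last point has in mind,
assuming the target encoding is fixed (utf-8 here) and the source encoding
is known or guessed by the caller.  Both function names are hypothetical.

```python
def needs_conversion(raw_name, target_encoding="utf-8"):
    """Return True if the raw filename is not valid in target_encoding."""
    try:
        raw_name.decode(target_encoding)
    except UnicodeDecodeError:
        return True
    return False


def reencode_filename(raw_name, source_encoding, target_encoding="utf-8"):
    """Re-encode a raw filename from source_encoding to target_encoding."""
    return raw_name.decode(source_encoding).encode(target_encoding)


raw = b"caf\xe9.py"  # 'café.py' as saved under a latin-1 locale
if needs_conversion(raw):
    print(reencode_filename(raw, "latin-1"))  # b'caf\xc3\xa9.py'
```

With a single mandated target encoding the detection step is a plain
validity check; with locale-dependent names there is nothing fixed to
check against.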

PS: Sorry for not replying immediately, the team I'm on is dealing with an
issue at my work and I'm also preparing for a conference later this week.

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-20 Thread Toshio Kuratomi
On Thu, Jan 20, 2011 at 12:51:29PM +0100, Victor Stinner wrote:
 Le mercredi 19 janvier 2011 à 20:39 -0800, Toshio Kuratomi a écrit :
  Teaching students to write non-portable code (relying on filesystem encoding
  where your solution is, "don't upload to pypi anything that has non-ascii
  filenames") seems like the exact opposite of how you'd want to shape a young
  student's understanding of good programming practices.
 
 That was already discussed before: see PEP 3131.
 http://www.python.org/dev/peps/pep-3131/#common-objections
 
 If the teacher chooses to use non-ASCII, (s)he is responsible to explain
 the consequences to his/her students :-)
 
It's not discussed in that PEP section.

The PEP section says this: "People claim that they will not be able to use
a library if to do so they have to use characters they cannot type on their
keyboards."

Whether you can type it at your keyboard or not is not the problem here.
The problem is portability.  The students and professors are sharing code
with each other.  But because of a mixture of operating systems (let alone
locale settings), the code written by one partner is unable to run on the
computer of the other.

If non-ascii filenames without a defined encoding are considered a feature,
python cannot even issue a descriptive error when this occurs.  It can only
say that it could not find the module but not why.  A restriction on module
names to ascii only could actually state that module names are not allowed
to be non-ASCII when it encounters the import line.
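
A minimal sketch of how such a descriptive error could be raised, using a
meta path finder from today's importlib rather than a change to the import
statement itself.  AsciiOnlyFinder is hypothetical and only illustrates the
idea of rejecting the name up front with a clear message.

```python
import importlib
import importlib.abc
import sys


class AsciiOnlyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that rejects non-ASCII module names up front."""

    def find_spec(self, fullname, path=None, target=None):
        try:
            fullname.encode("ascii")
        except UnicodeEncodeError:
            raise ImportError(
                "module names must be ASCII-only, got %r" % fullname,
                name=fullname,
            ) from None
        # ASCII names: defer to the normal finders.
        return None


message = None
sys.meta_path.insert(0, AsciiOnlyFinder())
try:
    importlib.import_module("café")
except ImportError as exc:
    message = str(exc)
    print(message)
finally:
    # Remove the finder again so the demonstration has no lasting effect.
    sys.meta_path.pop(0)
```

The error names the actual cause instead of reporting only that the module
could not be found.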

   In a school, you can use the same configuration
   (encoding) on all computers.
   
  In a school computer lab perhaps.  But not on all the students' and
  professors' machines.  How many professors will be cursing python when they
  discover that the example code that they wrote on their Linux workstation
  doesn't work when the students try to use it in their windows computer lab?
 
 Because some students use a stupid or misconfigured OS, Python should
 only accept ASCII names?

Just a note -- you'll get much farther if you refrain from calling names.
It just makes me think that you aren't reading and understanding the issue
I'm raising.  My examples that you're replying to involve two properly
configured OS's.  The Linux workstations are configured with a UTF-8
locale.  The Windows OS's use wide character unicode.  The problem occurs in
that the code that one of the parties develops (either the students or the
professors) is developed on one of those OS's and then used on the other OS.

 So, why does Python 3 support non-ASCII
 filenames: it is very well known that non-ASCII filenames are the root of
 many troubles! Should we simply drop unicode support for all filenames?
 And maybe restrict bytes filenames to bytes in [0; 127]? Or better,
 restrict to [32; 126] (U+007f causes some troubles in some terminals).
 
If you want to argue from the fact that python3 supports non-ascii filenames
in other code, then the logical extension is that the import mechanism should
support importing module names defined by byte sequences.  I happen to think
that import has a lot of differences between it and other filenames as I've
said three times now.

 I think that in 2011, non-ASCII filenames are well supported on all
 (modern) operating systems. Issues with non-ASCII filenames are OS
 specific and should be fixed by the user (the admin of the computer).
 
  Additionally, those other filesystem operations have
  been growing the ability to take byte values and encoding parameters because
  unicode translation via a single filesystem encoding is a good default but
  not a complete solution.
 
 If you are unable to configure correctly your system to decode/encode
 correctly filenames, you should just avoid non-ASCII characters in the
 module names.
 
This seems like an argument to only have unicode versions of all filesystem
operations.  Since you've been spearheading the effort to have bytes
versions of things that access filenames, environment variables, etc,
I don't think that you seriously mean that.  Perhaps there is a language
issue here.

 You only give theoretical arguments: did you at least try to use non-ASCII
 module names on your system with Python 3.2? I suppose that it will just
 work and you will never notice that the unicode module name (on import
 café) is encoded to bytes.
 
Yes I did and I got it to fail in a corner case, as I showed twice with the same
example in other posts.  However, I want to make clear here that the issue
is not that I can create a non-ascii filename and then import it.  The issue
is that I can create a non-ascii filename and then try to share it with the
usual tools and it won't work on the recipient's system.  (A tangent is
whether the recipient's system is physically distinct from mine or only has
a different environment on the same physical host.)

 It fails on OSes using filesystem encodings other than UTF-8 (eg.
 Windows)... because of a Python bug, and I just asked if I have

Re: [Python-Dev] Import and unicode: part two

2011-01-20 Thread Toshio Kuratomi
On Thu, Jan 20, 2011 at 01:43:03PM -0500, Alexander Belopolsky wrote:
 On Thu, Jan 20, 2011 at 12:44 PM, Toshio Kuratomi a.bad...@gmail.com wrote:
  .. My examples that you're replying to involve two properly
  configured OS's.  The Linux workstations are configured with a UTF-8
  locale.  The Windows OS's use wide character unicode.  The problem occurs in
  that the code that one of the parties develops (either the students or the
  professors) is developed on one of those OS's and then used on the other OS.
 
 
 I re-read your posts on this thread, but could not find the examples
 that you refer to.

Examples might be a bad word in this context.  Victor was commenting on the
two brainstorm ideas for alternatives to ascii-only that I had.  One was:

* Mandate that every python module on a platform has a specific encoding
  (rather than the value of the locale)

The other was:
* allow using byte strings for import

I think that both ideas are inferior to mandating that every python module
filename is ascii.  What I'm getting from Victor's posts is that he, at
least, considers the portability problems to be ignorable because dealing
with ambiguous file name encodings is something that he'd like to force
third party tools to deal with.

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Toshio Kuratomi
On Wed, Jan 19, 2011 at 04:40:24PM -0500, Terry Reedy wrote:
 On 1/19/2011 4:05 PM, Simon Cross wrote:
 
 I have no problem with non-ASCII module identifiers being valid
 syntax. It's a question of whether attempting to translate a non-ASCII
 
 If the names are the same, ie, produced with the same sequence of
 keystrokes in the save-as box and importing box, then there is no
 translation, at least from the user's view.
 
 module name into a file name (so the file can be imported) is a good
 idea and whether these sorts of files can be safely transferred among
 diverse filesystems.
 
 I believe we now have the situation that a package that works on *nix
 could fail on Windows, whereas I believe that patch would *improve*
 portability.
 
I'm not so sure about this.  You may have something that works on Windows
and on *NIX under certain circumstances but it seems likely to fail when
moving files between them (for instance, as packages downloaded from pypi).
Additionally, many unix filesystems don't specify a filesystem encoding for
filenames; they deal in legal and illegal bytes which could lead to
troubles.  This problem of which encoding to use is a problem that can be
seen on UNIX systems even now.  Try this:

  echo 'print("hi")' > café.py
  convmv -f utf-8 -t latin1 café.py
  python3 -c 'import café'

ASCII seems very sensible to me when faced with these ambiguities.
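
What goes wrong in that example can be shown at the string level.  This
sketch assumes a utf-8 locale, where Python 3 decodes filename bytes that
are not valid utf-8 with the surrogateescape error handler (PEP 383):

```python
wanted = "café.py"                  # the name the import statement asks for
on_disk = wanted.encode("latin-1")  # the bytes left on disk after convmv

# How a utf-8 locale sees the on-disk name.
seen = on_disk.decode("utf-8", errors="surrogateescape")

print(repr(seen))                         # 'caf\udce9.py'
print(seen == wanted)                     # False
print(wanted.encode("utf-8") == on_disk)  # False
```

The file exists, but no name the import statement can produce in that
locale maps onto its bytes.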

Other options I can brainstorm that could be explored:

* Specify an encoding per platform and stick to that.  (So, for instance,
  all module names on posix platforms would have to be utf-8).  Force
  translation between encoding when installing packages (But that doesn't
  help for people that are creating their modules using their own build
  scripts rather than distutils, copying the files using raw tar, etc.)
* Change import semantics to allow specifying the encoding of the module on
  the filesystem (seems really icky).

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Toshio Kuratomi
On Wed, Jan 19, 2011 at 07:11:52PM -0500, James Y Knight wrote:
 On Jan 19, 2011, at 6:44 PM, Toshio Kuratomi wrote:
  This problem of which encoding to use is a problem that can be
  seen on UNIX systems even now.  Try this:
  
   echo 'print("hi")' > café.py
   convmv -f utf-8 -t latin1 café.py
   python3 -c 'import café'
  
  ASCII seems very sensible to me when faced with these ambiguities.
  
  Other options I can brainstorm that could be explored:
  
  * Specify an encoding per platform and stick to that.  (So, for instance,
   all module names on posix platforms would have to be utf-8).  Force
   translation between encoding when installing packages (But that doesn't
   help for people that are creating their modules using their own build
   scripts rather than distutils, copying the files using raw tar, etc.)
  * Change import semantics to allow specifying the encoding of the module on
   the filesystem (seems really icky).
 
 None of this is unique to import -- the same exact issue occurs with 
 open(u'café'). I don't see any reason why import café should be thought of as 
 more of a problem, or treated any differently.
 
It's unique in several ways:

1) With open, you can specify a byte string::
   open(b'caf\xe9.py').read()

   I don't know of any way to do that with import.
   This is needed when the filename is not compatible with your current
   locale.

2) import assigns a name to the module that it imports whereas open lets the
   programmer assign the name.  So even if you can specify what to use as
   a byte string for this filename on this particular filesystem you'd still
   end up with some ugly pseudo-representation of bytes when attempting to
   access it in code::
   import caf\xe9

   caf\xe9.do_something()
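
Point 1 can be tried out directly.  A small sketch, assuming a POSIX
filesystem that stores names as raw bytes (other platforms may refuse or
translate the name):

```python
import os
import tempfile

raw_name = b"caf\xe9.py"  # latin-1 bytes, not valid utf-8

with tempfile.TemporaryDirectory() as tmp:
    # Build the whole path as bytes so no locale decoding is involved.
    path = os.path.join(os.fsencode(tmp), raw_name)
    with open(path, "w") as handle:
        handle.write("print('hi')\n")
    with open(path) as handle:
        contents = handle.read()
    # Passing bytes to listdir returns the names as bytes as well.
    listing = os.listdir(os.fsencode(tmp))

print(contents)
print(listing)
```

open() and os.listdir() both accept the byte string, whereas the import
statement only accepts an identifier.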

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Toshio Kuratomi
On Thu, Jan 20, 2011 at 01:26:01AM +0100, Victor Stinner wrote:
 Le mercredi 19 janvier 2011 à 15:44 -0800, Toshio Kuratomi a écrit : 
  Additionally, many unix filesystems don't specify a filesystem encoding for
  filenames; they deal in legal and illegal bytes which could lead to
  troubles.  This problem of which encoding to use is a problem that can be
  seen on UNIX systems even now.
 
 If the system is not correctly configured, it is not a bug in Python,
 but a bug in the system config. Python relies on the locale to choose
 the filesystem encoding (sys.getfilesystemencoding()). Python uses this
 encoding to decode and encode all filenames.
 
Saying that multiple encodings on a single system is a misconfiguration
every time it comes up does not make it true.  There have been multiple
examples of how you can end up with multiple encodings of filenames on
a single system listed in past threads: multiple users with different
encodings for their locales, mounting remote filesystems, downloading
a file.  To the existing list I'd add getting a package from pypi --
neither tar nor zip files contain encoding information about the filenames.
Therefore if I create an sdist of a python module using non-ascii filenames
using a locale of latin1 and then upload to pypi, people downloading that
on a system using a utf-8 locale will end up not being able to use the module.
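
A sketch of the sdist scenario using the tarfile module, entirely in
memory.  It deliberately writes the older ustar format, which stores member
names as raw bytes in the writer's encoding; that choice is an assumption
made for illustration, and other archive formats or tools may behave
differently.

```python
import io
import tarfile

buffer = io.BytesIO()

# The uploader's side: member names are encoded with a latin-1 locale.
with tarfile.open(fileobj=buffer, mode="w",
                  format=tarfile.USTAR_FORMAT, encoding="latin-1") as archive:
    archive.addfile(tarfile.TarInfo(name="café.py"))

buffer.seek(0)

# The downloader's side: the same archive read with a utf-8 locale.
with tarfile.open(fileobj=buffer, mode="r", encoding="utf-8") as archive:
    names = archive.getnames()

print(names)  # ['caf\udce9.py'], not 'café.py'
```

Nothing in the archive tells the reader which encoding the writer used, so
the name comes back with an escaped byte instead of the intended character.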

  * Specify an encoding per platform and stick to that.
 
 It doesn't work: on UNIX/BSD, the user chooses its own encoding and all
 programs will use it.
 
The proposal is that you ignore that when talking about loading and creating
python modules.  (I mentioned distutils because my thought was that distutils
could grow the ability to translate from the system locale to a chosen
neutral encoding when running any of the setup.py dist commands, but that
doesn't address the issue when testing a module that you've just written so
perhaps that's not necessary.)  Python modules would have a set of defined
filesystem encodings per system.  This prevents getting a mixture of
encodings of modules and having things work in one location but fail when
used somewhere else.  Instead, you get an upfront failure until you correct
the encoding.

 Anyway, I don't see why it is a problem to have different encodings on
 different systems. Each system can use its own encoding. The bug that
 I'm trying to solve is a Python bug, not an OS bug.
 
There is no OS bug here.  There is perhaps an OS design flaw but it's not
a flaw that will be going away soon (in part, because the present OS
designers do not see it as an OS flaw... to them it's a bug in code that
attempts to build a simpler interface on top of it.)

  * Change import semantics to allow specifying the encoding of the module on
the filesystem (seems really icky).
 
 This is a very bad idea. I introduced PYTHONFSENCODING environment
 variable in Python 3.2, but then quickly removed it, because it
 introduced a lot of inconsistencies.
 
Thanks for getting rid of that, PYTHONFSENCODING is a bad idea because it
doesn't solve the underlying issues.  However, when I say specifying the
encoding of the module on the filesystem, I don't mean something global like
PYTHONFSENCODING -- I mean something at the python code level::

   import café encoded_as('latin1')

After thinking about this one, though, I don't think it will work either.
This takes care of importing modules where the fs encoding of the module is
known but it doesn't where the fs encoding may be translated between
platforms.  I believe that this could arise when untarring a module on
windows using winzip or similar that gives you the option of translating
from utf-8 bytes into bytes that have meaning as characters on that
platform, for instance.

Do you have a solution to the problem?  I haven't looked at your patch so
perhaps you have an ingenious method of translating from the unicode
representation of the module in the import statement to the bytes in
arbitrary encodings on the filesystem that I haven't thought of.  If you
don't, however, then really - ASCII-only seems like the sanest of the three
solutions I can think of.

-Toshio




Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Toshio Kuratomi
On Thu, Jan 20, 2011 at 03:51:05AM +0100, Victor Stinner wrote:
 For a lesson at school, it is nice to write examples in the
 mother language, instead of using raw english with ASCII identifiers
 and filenames.

Then use this::
   import cafe as café

When you do things this way you do not have to translate between unknown
encodings into unicode.  Everything is within python source where you have
a defined encoding.
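
A runnable version of the same idea, using a standard library module as
a stand-in for an ASCII-named cafe.py (which is assumed here, not provided):

```python
# The file on disk keeps an ASCII name; only the name bound inside the
# source file is non-ASCII, and the source file has a declared encoding.
import string as café

print(café.ascii_lowercase[:3])
```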

Teaching students to write non-portable code (relying on filesystem encoding
where your solution is, "don't upload to pypi anything that has non-ascii
filenames") seems like the exact opposite of how you'd want to shape a young
student's understanding of good programming practices.

 In a school, you can use the same configuration
 (encoding) on all computers.
 
In a school computer lab perhaps.  But not on all the students' and
professors' machines.  How many professors will be cursing python when they
discover that the example code that they wrote on their Linux workstation
doesn't work when the students try to use it in their windows computer lab?
How many students will be upset when the code they turn in runs on their
professor's test machine if the lab computers were booted into the Linux
partition but not if they were booted into Windows?

 
* Specify an encoding per platform and stick to that.
   
   It doesn't work: on UNIX/BSD, the user chooses its own encoding and all
   programs will use it.
   
  (...) This prevents getting a mixture of encodings of modules (...)
 
 If you have an issue with encodings, you have to fix it when you create
 a module (on disk), not when you load a module (it is too late).
 
It's not too late to throw a clear error of what's wrong.

  I haven't looked at your patch so
  perhaps you have an ingenious method of translating from the unicode
  representation of the module in the import statement to the bytes in
  arbitrary encodings on the filesystem that I haven't thought of.
 
 On Windows, My patch tries to avoid any conversion: it uses unicode
 everywhere.
 
 On other OSes, it uses the Python filesystem encoding to encode a module
 name (as it is done for any other operation on the filesystem with an
 unicode filename).
 
The other interfaces are somewhat of a red herring here.  As I wrote in
another email, importing modules has ramifications that open(), for
instance, does not.  Additionally, those other filesystem operations have
been growing the ability to take byte values and encoding parameters because
unicode translation via a single filesystem encoding is a good default but
not a complete solution.

I think that this problem demands a complete solution, however, and it seems
to me that limiting the scope of the problem is the most pleasant method to
accomplish this.  Your solution creates modules which aren't portable.  One
of my proposals creates python code which isn't portable.  The other one
suffers some of the same disadvantages as your solution in portability but
allows for tools that could automatically correct modules.

 --
 
 Python 3 supports bytes filename to be able to read/copy/delete
 undecodable filenames, filenames stored in a encoding different than the
 system encoding, broken filenames. It is also possible to access these
 files using PEP 383 (with surrogate characters). This is useful to use
 Python on an old system.
 
  If you don't, however, then really - ASCII-only seems like the sanest 
  of the three solutions I can think of.
 
 But a (Python 3) module is not supposed to have a broken filename. If it
 is the case, you had better fix its name, instead of trying to fix
 the problem later (in Python).
 
We agree that there should not be broken module names.  However it seems we
very hotly disagree about the definition of that.  You think that if
a module is named appropriately on one system but is not portable to another
system, that's fine.  I think that portability between systems is very
important and sacrificing that so that someone can locally use a module with
non-ASCII characters doesn't have a justifiable reward.

 With UTF-8 filesystem encoding (eg. on Mac OS X, and most Linux setups),
 it is already possible to use non-ASCII module names.
 
Tangent: This is not true about Linux.  UTF-8 is a matter of the
interpretation of the filesystem bytes that the user specifies by setting
their system locale.  Setting system locale to ASCII for use in system-wide
scripts, is quite common as is changing locale settings in other parts of
the world (as I can tell you from the bug reports colleagues CC me on to fix
for the problems with unicode support in their python2 programs).  Allowing
module names incompatible with ascii without specifying an encoding will
just lead to bug reports down the line.
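
To make the locale dependence concrete, here is a small sketch.  It assumes
the module name is simply encoded with whatever encoding the locale selects
and ignores interpreter details such as error handlers; the name café is
only an example:

```python
module_name = "café"  # hypothetical non-ASCII module name

# The bytes written to disk depend on the encoding picked by the locale.
utf8_bytes = module_name.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = module_name.encode("latin-1")  # b'caf\xe9'

# Under an ASCII (C/POSIX) locale the name cannot be encoded at all.
try:
    module_name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Three users with three locale settings would therefore look for three
different files (or fail outright) for the same import statement.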

Relatively few programmers understand the difference between the python
unicode abstraction and the byte representations possible for those strings.
Allowing non-ascii characters in module filenames without specifying an

Re: [Python-Dev] Import and unicode: part two

2011-01-19 Thread Toshio Kuratomi
On Wed, Jan 19, 2011 at 09:02:17PM -0800, Glenn Linderman wrote:
 On 1/19/2011 8:39 PM, Toshio Kuratomi wrote:
 
 use this::
 
     import cafe as café
 
 When you do things this way you do not have to translate between unknown
 encodings into unicode.  Everything is within python source where you have
 a defined encoding.
 
 
 This is a great way of converting non-portable module names, if the module 
 ever
 leaves the bounds of its computer, and runs into problems there.
 
You're missing a piece here.  If you mandate ascii you can convert to
a unicode name using import as because python knows that it has ascii text
from the filesystem when it converts it to an abstract unicode string that
you've specified in the program text.  You cannot go the other way because
python lacks the information (the encoding of the filename on the
filesystem) to do the transformation.
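
A short sketch of that asymmetry, using plain byte strings to stand in for
file names read from disk (the names are examples only):

```python
# ASCII-only file name bytes decode to the same text under any
# ASCII-compatible encoding, so no guessing is needed.
ascii_bytes = b"cafe"
ascii_decodings = {
    ascii_bytes.decode(enc) for enc in ("ascii", "utf-8", "latin-1")
}

# Non-ASCII bytes are ambiguous: the result depends on the guessed encoding.
raw_bytes = "café".encode("utf-8")
as_utf8 = raw_bytes.decode("utf-8")      # 'café'
as_latin1 = raw_bytes.decode("latin-1")  # 'cafÃ©', a different module name
```

With an ASCII file name there is exactly one candidate, which is what makes
the import ... as rename safe in that direction only.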

 Your demonstration of such an easy solution to the concerns you raise 
 convinces
 me more than ever that it is acceptable to allow non-ASCII module names.  For
 those programmers in a single locale environment, it'll just work.  And for
 those not in a single locale environment, there is your above simple solution
 to achieve portability without changing large numbers of lines of code.
 
Does my demonstration that you can't do that mean that it's no longer
acceptable?  :-)

/me guesses that the relative merits of being forced to write portable code
vs the convenience of writing a module name in your native script still have
a different balance in your mind than in mine, thus the smiley :-)

-Toshio


pgpVg5DKpRDXA.pgp
Description: PGP signature
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 384 accepted

2010-12-04 Thread Toshio Kuratomi
On Fri, Dec 03, 2010 at 11:52:41PM +0100, Martin v. Löwis wrote:
 Am 03.12.2010 23:48, schrieb Éric Araujo:
  But I'm not interested at all in having it in distutils2. I want the
  Python build itself to use it, and alas, I can't because of the freeze.
  You can’t in 3.2, true.  Neither can you in 3.1, or any previous
  version.  If you implement it in distutils2, you have very good chances
  to get it for 3.3.  Isn’t that a win?
 
 It is, unfortunately, a very weak promise. Until distutils2 is
 integrated in Python, I probably won't spend any time on it.
 
At the language summit it was proposed and seemed generally accepted (maybe
I took silence as consent... it's been almost a year now) that bold new
modules (and bold rewrites of existing modules since it fell out of the
distutils/2 discussion) should get implemented in a module on pypi before
being merged into the python stdlib.  If you wouldn't want to work on any of
those modules until they were actually integrated into Python, it sounds
like you disagree with that as a general practice?

-Toshio




Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Toshio Kuratomi
On Wed, Dec 01, 2010 at 10:06:24PM -0500, Alexander Belopolsky wrote:
 On Wed, Dec 1, 2010 at 9:53 PM, Terry Reedy tjre...@udel.edu wrote:
 ..
  Does Sphinx run on PY3 yet?
 
 It does, but see issue10224 for details.
 
  http://bugs.python.org/issue10224

Also, docutils has an unported module.

/me needs to write a bug report for that as he really doesn't have the time
he thought he did to perform the port.

-Toshio




Re: [Python-Dev] Breaking undocumented API

2010-11-09 Thread Toshio Kuratomi
On Tue, Nov 09, 2010 at 01:49:01PM -0500, Tres Seaver wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 11/08/2010 06:26 PM, Bobby Impollonia wrote:
 
  This does hurt because anyone who was relying on import * to get a
  name which is now omitted from __all__ is going to upgrade and find
  their program failing with NameErrors. This is a backwards incompatible
  change and shouldn't happen without a deprecation warning first.
 
 Outside an interactive prompt, anyone using from foo import * has set
 themselves and their users up to lose anyway.
 
 That syntax is the single worst misfeature in all of Python.  It impairs
 readability and discoverability for *no* benefit beyond one-time typing
 convenience.  Module writers who compound the error by expecting to be
 imported this way, thereby bogarting the global namespace for their own
 purposes, should be fish-slapped. ;)
 
I think there's a valid case for bogarting the namespace in this instance,
but let me know if there's a better way to do it::

# Method to use system libraries if available, otherwise use a bundled copy,
# aka: make both system packagers and developers happy::


Relevant directories and files for this module::

+ foo/
+- __init__.py
++ compat/
 +- __init__.py
 ++ bar/
  +- __init__.py
  +- _bar.py

foo/compat/bar/_bar.py is a bundled module.

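foo/compat/bar/__init__.py has::

    try:
        # Prefer the system-wide copy of bar when it is installed.
        from bar import *
        from bar import __all__
    except ImportError:
        # Otherwise fall back to the bundled copy.
        from foo.compat.bar._bar import *
        from foo.compat.bar._bar import __all__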
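
For comparison, here is one possible alternative that avoids the star import.
This is only a sketch, not something proposed in the thread: the helper name
import_first_available is made up, and the module names are placeholders so
that the example runs anywhere:

```python
import importlib


def import_first_available(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (module_names,))


# "no_such_system_module" stands in for a missing system library and
# "json" stands in for the bundled copy; both names are placeholders.
bar = import_first_available("no_such_system_module", "json")
```

Callers would then write bar.something instead of relying on names copied
into the compat namespace.  Like the try/except version, this also hides an
ImportError raised from inside a system copy that exists but is broken.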

-Toshio




Re: [Python-Dev] Breaking undocumented API

2010-11-08 Thread Toshio Kuratomi
On Tue, Nov 09, 2010 at 11:46:59AM +1100, Ben Finney wrote:
 Ron Adam r...@ronadam.com writes:
 
  def _publicly_documented_private_api():
      """
      Not sure why you would want to do this
      instead of using comments.
      """
      ...
 
 Because the docstring is available at the interpreter via ‘help()’, and
 because it's automatically available to ‘doctest’, and most of the other
 good reasons for docstrings.
 
  The _publicly_documented_private_api() is a problem because people
  *will* use it even though it has a leading underscore. Especially
  those who are new to python.
 
 That isn't an argument against docstrings, since the problem you
 describe isn't dependent on the presence or absence of docstrings.
 
Just wanted to expand a bit here:  as a general practice, you may be
involved in a project where the _private_api() is not intended for use by
people outside of the project but is intended to be used in multiple places
within the project.  If you have different people working on those different
areas, it can be very useful for them to be able to use help(_private_api)
on the other functions from within the interpreter shell.

-Toshio




Re: [Python-Dev] Continuing 2.x

2010-10-29 Thread Toshio Kuratomi
On Fri, Oct 29, 2010 at 11:12:28AM -0700, geremy condra wrote:
 On Thu, Oct 28, 2010 at 11:55 PM, Glyph Lefkowitz
  Let's take PyPI numbers as a proxy.  There are ~8000 packages with a
  Programming Language::Python classifier.  There are ~250 with Programming
  Language::Python::3.  Roughly speaking, we can say that is 3% of Python
  code which has been ported so far.  Python 3.0 was released at the end of
  2008, so people have had roughly 2 years to port, which comes up with 1.5%
  per year.
 Just my two cents:
 
Just one further informational note about using pypi in this way for
statistics... In the porting work we've done within Fedora, I've noticed
that a lot of packages are python3 ready or even officially support python3
but the language classifier on pypi does not reflect this.  Here's just
a few since I looked them up when working on the python porting wiki pages:

http://pypi.python.org/pypi/Beaker/
http://pypi.python.org/pypi/pycairo
http://pypi.python.org/pypi/docutils

-Toshio




Re: [Python-Dev] My work on Python3 and non-ascii paths is done

2010-10-21 Thread Toshio Kuratomi
On Thu, Oct 21, 2010 at 12:00:40PM -0400, Barry Warsaw wrote:
 On Oct 20, 2010, at 02:11 AM, Victor Stinner wrote:
 
 I plan to fix Python documentation: specify the encoding used to decode all 
 byte string arguments of the C API. I already wrote a draft patch: issue 
 #9738. This lack of documentation was a big problem for me, because I had to 
 follow the function calls to get the encoding.
 
This will be truly excellent!

 That's exactly what I was looking for!  Thanks.  I think you've learned a huge
 amount of good information that's difficult to find, so writing it up in a
 more permanent and easy to find location will really help future Python
 developers!
 
One further thing I'd be interested in is if you could document any best
practices from this experience.  Things like: is surrogateescape a good or
bad default in these cases?  When are parallel functions for bytes and str
better than a single polymorphic function?  That way when other modules are
added to the stdlib, things can be more consistent.
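
For readers unfamiliar with the surrogateescape question, here is a minimal
sketch of the trade-off (the file name is made up):

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8 (made-up file name)

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", "surrogateescape")

# The smuggled surrogate makes a strict encode fail later on, for example
# when the name is written to a log or sent over the network.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

So the handler is convenient for round-tripping, but it moves the failure
from the decoding step to whichever later step encodes strictly.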

-Toshio




Re: [Python-Dev] Distutils2 scripts

2010-10-08 Thread Toshio Kuratomi
On Fri, Oct 08, 2010 at 10:26:36AM -0400, Barry Warsaw wrote:
 On Oct 08, 2010, at 03:22 PM, Tarek Ziadé wrote:
 
 Yes that's what I was thinking about -- I am not too worried about this,
 since every Linux  deals with the 'more than one python installed'
 case.
 
 Kind of. wink  but anyway...
 
  I'm in favor of adding a top-level setup module that can be invoked using
  python -m setup   There will be three cases:
 
 Nice idea ! I wouldn't call it setup though, since it does many other
 things. I can't think of a good name yet, but I'd like such a script
 to express the idea that it can be used to:
 
 I like 'python -m setup' too.  It's a small step from the familiar thing
 (python setup.py) to the new and shiny thing, without being confusing.  And
 you won't have to worry about things like version numbers because the Python
 executable will already have that baked in.
 
 - query pypi
 - browse what's installed
 - install/remove projects
 - create releases and upload them
 
 pkg_manager ?
 
 No underscores, please. :)
 
 Actually, a decent wrapper script could just be called 'setup'.  My
 command-not-found on Ubuntu doesn't find a collision, or even close
 similarities.
 
Simple English names like this are almost never a good idea for commands.
A quick google for /usr/bin/setup finds that Fedora-derived distros have
a /usr/bin/setup as a wrapper for all the text-mode configuration tools.
And there's a derivative of opensolaris that has a /usr/bin/setup for
configuring the system the first time.

 I still like 'egg' as a command too.  There are no collisions that I can see.
 I know this has been thrown around for years, and it's always been rejected
 because I think setuptools wanted to claim it, but since it still doesn't
 exist afaict, distutils2 could easily use it.
 
There's a 2D graphics library that provides a /usr/bin/egg command:
  http://www.ir.isas.jaxa.jp/~cyamauch/eggx_procall/
Latest Stable Version 0.93r3 (released 2010/4/14)

In the larger universe of programs, it might make for more intuitive
remembering of the command to use a prefix (either py or python) though.

python-setup  is a lot like python setup.py
pysetup is shorter
pyegg is even shorter :-)
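
A quick local check for collisions can be sketched as follows.  It only sees
what is installed on the machine it runs on (assuming Python 3.3 or later for
shutil.which), so it cannot replace searching other distributions:

```python
import shutil

# Candidate command names from this thread.
candidates = ["setup", "egg", "pysetup", "pyegg", "python-setup"]

# shutil.which returns the path of an existing executable, or None.
collisions = {name: shutil.which(name) for name in candidates}

for name, path in sorted(collisions.items()):
    print("%-14s %s" % (name, path or "no collision on this machine"))
```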

-Toshio




Re: [Python-Dev] Distutils2 scripts

2010-10-08 Thread Toshio Kuratomi
On Fri, Oct 08, 2010 at 05:12:44PM +0200, Antoine Pitrou wrote:
 On Fri, 8 Oct 2010 11:04:35 -0400
 Toshio Kuratomi a.bad...@gmail.com wrote:
  
  In the larger universe of programs, it might make for more intuitive
  remembering of the command to use a prefix (either py or python) though.
  
  python-setup  is a lot like python setup.py
  pysetup is shorter
  pyegg is even shorter :-)
 
 Wouldn't quiche be a better alternative for pyegg?
 
I won't bikeshed as long as we stay away from conflicting names.

-Toshio




Re: [Python-Dev] We should be using a tool for code reviews

2010-09-30 Thread Toshio Kuratomi
On Wed, Sep 29, 2010 at 01:23:24PM -0700, Guido van Rossum wrote:
 On Wed, Sep 29, 2010 at 1:12 PM, Brett Cannon br...@python.org wrote:
  On Wed, Sep 29, 2010 at 12:03, Guido van Rossum gu...@python.org wrote:
  A problem with that is that we regularly make matching improvements to
  upload.py and the server-side code it talks to. While we tend to be
  conservative in these changes (because we don't control what version
  of upload.py people use) it would be a pain to maintain backwards
  compatibility with a version that was distributed in Misc/ two years
  ago -- that's kind of outside our horizon.
 
  Well, I would assume people are working from a checkout. Patches from
  an outdated checkout simply would fail and that's fine by me.
 
 Ok, but that's an extra barrier for contributions. Lots of people when
 asked for a patch just modify their distro in place and you can count
 yourself lucky if they send you a diff from a clean copy.
 
 But maybe with Hg it's less of a burden to ask people to use a checkout.
 
  How often do we even get patches generated from a downloaded copy of
  Python? Is it enough to need to worry about this?
 
 I used to get these frequently. I don't know what the experience of
 the current crop of core developers is though, so maybe my gut
 feelings here are outdated.
 
When helping out on a Linux distribution, dealing with patches against the
latest tarball is a fairly frequent occurrence.  The question would be
whether these patches get filtered through the maintainer of the package
before landing in roundup/rietveld and whether the distro maintainer is
sufficiently in tune with python development that they're maintaining both
patches against the last tarball and a checkout of trunk with the patches
applied intelligently there.

A few other random thoughts:

* hg could be more of a burden in that it may be unfamiliar to the casual
  python user who happens to have found a fix for a bug and wants to submit
  it.  cvs and svn are similar enough that people comfortable with one are
  usually comfortable with the other but hg has different semantics.
* The barrier to entry seems to be higher the less well integrated the tools
  are.  I occasionally try to contribute patches to bzr in launchpad and
  the integration there is horrid.  You end up with two separate streams of
  comments and you don't automatically get subscribed to both.  There's
  several UI elements for associating a branch with a bug but some of them
  are buggy (or else are very strict on what input they're expecting) while
  other ones are hard to find.  Since I only contribute a patch two or three
  times a year, I have to re-figure out the process each time I try to
  contribute.
* I like the idea of patch complexity being a measure of whether the patch
  needs to go into a code review tool in that it keeps simple things simple
  and gives more advanced tools to more advanced cases.  I dislike it in
  that for someone who's just contributing a patch to fix a problem that
  they're encountering which happens to be somewhat complex, they end up
  having to learn a lot about tools that they may never use again.
* It seems like code review will be a great aid to people who submit changes
  or review changes frequently.  The trick will be making it
  non-intimidating for someone who's just going to contribute changes
  infrequently.

-Toshio




Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Toshio Kuratomi
On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote:
 On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
 
 There are some APIs that should be able to handle bytes *or* strings,
 but the current use of string literals in their implementation means
 that bytes don't work. This turns out to be a PITA for some networking
 related code which really wants to be working with raw bytes (e.g.
 URLs coming off the wire).
 
 Note that email has exactly the same problem.  A general solution -- even if
 embodied in *well documented* best-practices and convention -- would really
 help make the stdlib work consistently, and I bet third party libraries too.
 
I too await a solution with bated breath :-) I've been working on
documenting best practices for APIs and Unicode and for this type of
function (take bytes or unicode and output the same type), knowing the
encoding seems like a requirement in most cases:

http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type

I'd love to add another strategy there that shows how you can robustly
operate on bytes without knowing the encoding but from writing that, I think
that anytime you simplify your API you have to accept limitations on the
data you can take in.  (For instance, some simplifications can handle
anything except ASCII-incompatible encodings).
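
As a sketch of the "take either type, output the same type" strategy with an
explicit encoding, consider the following.  The function collapse_slashes and
its parameters are invented for this example and are not taken from kitchen
or the stdlib:

```python
def collapse_slashes(path, encoding="utf-8", errors="strict"):
    """Collapse runs of "/" in path, returning the same type as the input."""
    if isinstance(path, bytes):
        # Decode with the caller-supplied encoding, work on text, re-encode.
        text = path.decode(encoding, errors)
        return collapse_slashes(text).encode(encoding, errors)
    while "//" in path:
        path = path.replace("//", "/")
    return path
```

Because the bytes branch decodes first, it keeps working for an
ASCII-incompatible encoding such as UTF-16, as long as the caller names it.
Operating on the raw bytes with b"//" would silently corrupt such input.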

-Toshio




Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Toshio Kuratomi
On Thu, Sep 16, 2010 at 10:56:56AM -0700, Guido van Rossum wrote:
 On Thu, Sep 16, 2010 at 10:46 AM, Martin (gzlist) gzl...@googlemail.com 
 wrote:
  On 16/09/2010, Guido van Rossum gu...@python.org wrote:
 
  In all cases I can imagine where such polymorphic functions make
  sense, the necessary and sufficient assumption should be that the
  encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
  Latin-N variant, and AFAIK also the popular CJK encodings other than
  UTF-16. This is the same assumption made by Python's byte type when
  you use character-based methods like lower().
 
  Well, depends on what exactly you're doing, it's pretty easy to go wrong:
 
  Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import os, sys
  >>> os.path.split("C:\\十")
  ('C:\\', '十')
  >>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
  (b'C:\\\x8f', b'')
 
  Similar things can catch out web developers once they step outside the
  percent encoding.
 
 Well, that character is not 7-bit ASCII. Of course things will go
 wrong there. That's the whole point of what I said, isn't it?
 
You were talking about encodings that were supersets of 7-bit ASCII.
I think Martin was demonstrating a byte string, in an encoding that is
a superset of 7-bit ASCII, being fed to a stdlib function which went wrong.
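
Martin's result can be reproduced on any platform with ntpath, assuming the
filesystem encoding in his session was cp932 (the usual choice on Japanese
Windows; the transcript itself does not name it):

```python
import ntpath  # Windows path rules, importable on any platform

# cp932 (Shift JIS) is a typical filesystem encoding on Japanese Windows.
encoded = "十".encode("cp932")

# The second byte of the character is 0x5C, the same value as a backslash.
has_backslash_byte = b"\\" in encoded

text_result = ntpath.split("C:\\十")
bytes_result = ntpath.split("C:\\十".encode("cp932"))
```

The str call splits at the real separator, while the bytes call treats the
trail byte of the character as a separator and returns an empty tail.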

-Toshio




Re: [Python-Dev] Fixing #7175: a standard location for Python config files

2010-08-12 Thread Toshio Kuratomi
On Fri, Aug 13, 2010 at 07:48:22AM +1000, Nick Coghlan wrote:
 2010/8/12 Éric Araujo mer...@netwok.org:
  Choosing an arbitrary location we think is good on every system is fine
  and non risky I think, as long as Python let the various distribution
  change those paths though configuration.
 
  Don’t you have a bootstrapping problem? How do you know where to look at
  the sysconfig file that tells where to look at config files?

I'd hardcode a list of locations.
  [os.path.join(os.path.dirname(__file__), 'sysconfig.cfg'),
   os.path.join('/etc', 'sysconfig.cfg')]

The distributor has a limited choice of options on where to look.

A good alternative would be to make the config file overridable.  That way
you can have sysconfig.cfg next to sysconfig.py or in a known config
directory relative to the python stdlib install but also let the
distributions and individual sites override the defaults by making changes
to /etc/python3/sysconfig.cfg for instance.
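
A minimal sketch of such layered config files, using configparser and
temporary files so that it runs anywhere.  The file names, the [paths]
section and its keys are all made up for the illustration:

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-ins for the shipped defaults and a site-level override file.
    default_cfg = os.path.join(tmpdir, "default_sysconfig.cfg")
    site_cfg = os.path.join(tmpdir, "site_sysconfig.cfg")
    missing_cfg = os.path.join(tmpdir, "user_sysconfig.cfg")  # never created

    with open(default_cfg, "w") as f:
        f.write("[paths]\npurelib = /usr/lib/python\nscripts = /usr/bin\n")
    with open(site_cfg, "w") as f:
        f.write("[paths]\npurelib = /opt/site/python\n")

    parser = configparser.ConfigParser()
    # Files are read in order; later files override earlier values and
    # files that do not exist are silently skipped.
    files_read = parser.read([default_cfg, site_cfg, missing_cfg])

    purelib = parser.get("paths", "purelib")
    scripts = parser.get("paths", "scripts")
```

The site file only needs to list the values it changes; everything else
falls through to the shipped defaults.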

 
 Personally, I'm not clear on what a separate sysconfig.cfg file offers
 over clearly separating the directory configuration settings and
 continuing to have distributions patch sysconfig.py directly. The
 bootstrapping problem (which would encourage classifying sysconfig.cfg
 as source code and placing it alongside sysconfig.py) is a major part
 of that point of view.
 
Here's some advantages but some of them are of dubious worth:

* Allows users/site-administrators to change paths and not have packaging
  systems overwrite the changes.
* Makes it conceptually cleaner to make this overridable via user defined
  config files since  it's now a matter of parsing several config files
  instead of having a hardcoded value in the file and overridable values
  outside of it.
* Allows sites to add additional paths to the config file.
* Makes it clear to distributions that the values in the config file are
  available for making changes to rather than having to look for it in code
  and not know the difference between that or say, the encoding parameter
  in python2.
* Documents the format to use for overriding the paths if individual sites
  can override the defaults that are shipped in the system version of
  python.

-Toshio




Re: [Python-Dev] Licensing

2010-07-06 Thread Toshio Kuratomi
On Tue, Jul 06, 2010 at 10:10:09AM +0300, Nir Aides wrote:
 I take ...running off with the good stuff and selling it for profit to mean
 creating derivative work and commercializing it as proprietary code which 
 you
 can not do with GPL licensed code. Also, while the GPL does not prevent 
 selling
 copies for profit it does not make it very practical either.
 
Uhmmm http://finance.yahoo.com/q/is?s=RHTannual

It is very possible to make money with the GPL.  The GPL does, as you say,
prevent you from creating derivative works that are proprietary code.  It
does *not* prevent you from creating derivative works and commercializing
them.

-Toshio




Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Toshio Kuratomi
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
 On Wed, 23 Jun 2010 14:23:33 -0400
 Tres Seaver tsea...@palladion.com wrote:
  - - the slow adoption / porting rate of major web frameworks and libraries
to Python 3.
 
 Some of the major web frameworks and libraries have a ton of
 dependencies, which would explain why they really haven't bothered yet.
 
 I don't think you can claim, though, that Python 3 makes things
 significantly harder for these frameworks. The proof is that many of
 them already give the user unicode strings in Python 2.x. They must
 have somehow got the decoding right.
 
Note that this assumption seems optimistic to me.  I started talking to Graham
Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste
do decoding of bytes to unicode at different layers which caused problems
for application level code that should otherwise run fine when being served
by mod_wsgi or paste httpserver.  That was the beginning of Graham starting
to talk about what the wsgi spec really should look like under python3
instead of the broken way that the appendix to the current wsgi spec states.

-Toshio




Re: [Python-Dev] bytes / unicode

2010-06-23 Thread Toshio Kuratomi
On Wed, Jun 23, 2010 at 11:35:12PM +0200, Antoine Pitrou wrote:
 On Wed, 23 Jun 2010 17:30:22 -0400
 Toshio Kuratomi a.bad...@gmail.com wrote:
  Note that this assumption seems optimistic to me.  I started talking to 
  Graham
  Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste
  do decoding of bytes to unicode at different layers which caused problems
  for application level code that should otherwise run fine when being served
  by mod_wsgi or paste httpserver.  That was the beginning of Graham starting
  to talk about what the wsgi spec really should look like under python3
  instead of the broken way that the appendix to the current wsgi spec states.
 
 Ok, but the reason would be that the WSGI spec is broken. Not Python 3
 itself.
 
Agreed.  Neither python2 nor python3 is broken.  It's the wsgi spec and the
implementation of that spec where things fall down.  From your first post,
I thought you were claiming that python3 was broken since web frameworks got
decoding right on python2 and I just wanted to defend python3 by showing
that python2 wasn't all sunshine and roses.

-Toshio




Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Toshio Kuratomi
On Tue, Jun 22, 2010 at 11:58:57AM +0900, Stephen J. Turnbull wrote:
 Toshio Kuratomi writes:
 
   One comment here -- you can also have uri's that aren't decodable into 
 their
   true textual meaning using a single encoding.
   
   Apache will happily serve out uris that have utf-8, shift-jis, and
   euc-jp components inside of their path but the textual
   representation that was intended will be garbled (or be represented
   by escaped byte sequences).  For that matter, apache will serve
   requests that have no true textual representation as it is working
   on the byte level rather than the character level.
 
 Sure.  I've never seen that combination, but I have seen Shift JIS and
 KOI8-R in the same path.
 
 But in that case, just using 'latin-1' as the encoding allows you to
 use the (unicode) string operations internally, and then spew your
 mess out into the world for someone else to clean up, just as using
 bytes would.
 
This is true.  I'm giving this as a real-world counter example to the
assertion that URIs are text.  In fact, I think you're confusing things
a little by asserting that the RFC says that URIs are text.  I'll address
that in two sections down.

   So a complete solution really should allow the programmer to pass
   in uris as bytes when the programmer knows that they need it.
 
 Other than passing bytes into a constructor, I would argue if a
 complete solution requires, eg, an interface that allows
 urljoin(base,subdir) where the types of base and subdir are not
 required to match, then it doesn't belong in the stdlib.  For stdlib
 usage, that's premature optimization IMO.
 
I'll definitely buy that.  Would urljoin(b_base, b_subdir) => bytes and
urljoin(u_base, u_subdir) => unicode be acceptable though?  (I think, given
other options, I'd rather see two separate functions, though.  It seems more
discoverable and less prone to taking bad input some of the time to have two
functions that clearly only take one type of data apiece.)
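
The two-function variant could be sketched like this.  The names
urljoin_bytes and urljoin_text are hypothetical, and the sketch leans on
urllib.parse.urljoin accepting ASCII-only bytes as it does in Python 3.2 and
later:

```python
from urllib.parse import urljoin


def urljoin_bytes(base, url):
    """Join two URLs given as bytes and return bytes."""
    if not isinstance(base, bytes) or not isinstance(url, bytes):
        raise TypeError("urljoin_bytes() only accepts bytes")
    return urljoin(base, url)


def urljoin_text(base, url):
    """Join two URLs given as str and return str."""
    if not isinstance(base, str) or not isinstance(url, str):
        raise TypeError("urljoin_text() only accepts str")
    return urljoin(base, url)
```

Each function rejects the other type up front, so a caller who mixes bytes
and text gets a TypeError at the call site rather than a surprise later.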

 The RFC says that URIs are text, and therefore they can (and IMO
 should) be operated on as text in the stdlib.

If I'm reading the RFC correctly, you're actually operating on two different
levels here.  Here's the section 2 that you quoted earlier, now in its
entirety::

2.  Characters

   The URI syntax provides a method of encoding data, presumably for the
   sake of identifying a resource, as a sequence of characters.  The URI
   characters are, in turn, frequently encoded as octets for transport or
   presentation.  This specification does not mandate any particular
   character encoding for mapping between URI characters and the octets used
   to store or transmit those characters.  When a URI appears in a protocol
   element, the character encoding is defined by that protocol; without such
   a definition, a URI is assumed to be in the same character encoding as
   the surrounding text.

   The ABNF notation defines its terminal values to be non-negative integers
   (codepoints) based on the US-ASCII coded character set [ASCII].  Because
   a URI is a sequence of characters, we must invert that relation in order
   to understand the URI syntax.  Therefore, the integer values used by the
   ABNF must be mapped back to their corresponding characters via US-ASCII
   in order to complete the syntax rules.

   A URI is composed from a limited set of characters consisting of digits,
   letters, and a few graphic symbols.  A reserved subset of those
   characters may be used to delimit syntax components within a URI while
   the remaining characters, including both the unreserved set and those
   reserved characters not acting as delimiters, define each component's
   identifying data.

So here's some data that matches those terms up to actual steps in the
process::

  # We start off with some arbitrary data that defines a resource.  This is
  # not necessarily text.  It's the data from the first sentence:
  data = b'\xff\xf0\xef\xe0'

  # We encode that into text and combine it with the scheme and host to form
  # a complete uri.  This is the URI characters mentioned in section #2.
  # It's also the sequence of characters mentioned in 1.1 as it is not
  # until this point that we actually have a URI.
  uri = b'http://host/' + percentencoded(data)
  # 
  # Note1: percentencoded() needs to take any bytes or characters outside of
  # the unreserved characters listed in section 2.3 (ALPHA / DIGIT / "-" /
  # "." / "_" / "~") and percent encode them.  The URI can only consist of
  # characters from this set and the reserved character set (2.2).
  #
  # Note2: in this simplistic example, we're only dealing with one piece of
  # data.  With multiple pieces, we'd need to combine them with separators,
  # for instance like this:
  # uri = b'http://host/' + percentencoded(data1) + b'/'
  # + percentencoded(data2)
  #
  # Note3: at this point, the uri could be stored as unicode or bytes in
  # python3.  It doesn't matter.  It will be a subset of ASCII in either
  # case.

  # Then we
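
The percentencoded() helper in the walkthrough above is not a real function.  As a rough sketch, assuming current Python 3, urllib.parse.quote_from_bytes can play that role; the variable names and the choice of safe='' (so that "/" inside a piece of data is also escaped) are assumptions made for illustration.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Arbitrary, non-text data that identifies the resource.
data = b'\xff\xf0\xef\xe0'

# Percent-encode it; the result contains only ASCII characters.
encoded = quote_from_bytes(data, safe='')
uri = 'http://host/' + encoded

# The same URI can be held as bytes without losing anything.
uri_bytes = uri.encode('ascii')

# Several pieces of data are encoded separately, then joined with separators.
pieces = [b'\xff\xf0', b'a/b']
multi = 'http://host/' + '/'.join(quote_from_bytes(p, safe='') for p in pieces)

# Decoding the percent-encoded form gives back the original octets.
assert unquote_to_bytes(encoded) == data

print(uri)
print(multi)
```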

Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Toshio Kuratomi
On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote:
 Toshio Kuratomi writes:
   unicode handling redesign.  I'm stating my reading of the RFC not to defend
   the use case Philip has, but because I think that the outlook that non-text
   uris (before being percentencoded) are violations of the RFC
 
 That's not what I'm saying.  What I'm trying to point out is that
 manipulating a bytes object as an URI sort of presumes a lot about its
 encoding as text.

I think we're more or less in agreement now but here I'm not sure.  What
manipulations are you thinking about?  Which stage of URI construction are
you considering?

I've just taken a quick look at python3.1's urllib module and I see that
there is a bit of confusion there.  But it's not about unicode vs bytes but
about whether a URI should be operated on at the real URI level or the
data-that-makes-a-uri level.

* all functions I looked at take python3 str rather than bytes so there's no
  confusing stuff here
* urllib.request.urlopen takes a strict uri.  That means that you must have
  a percent encoded uri at this point
* urllib.parse.urljoin takes regular string values
* urllib.parse.urlparse and urllib.parse.urlunparse take regular string values
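
To illustrate the two levels, here is a small sketch run against current Python 3 rather than 3.1, so details may differ from what I looked at.  urljoin passes non-ASCII text through untouched, and a separate quoting step is needed to get a strict, percent-encoded URI.  Quoting the path as UTF-8 (the default of urllib.parse.quote) is an assumption about what the data means.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Data-that-makes-a-uri level: urljoin does not percent-encode anything.
joined = urljoin('http://host/dir/', 'café')

# Real URI level: percent-encode the path so only ASCII remains.
parts = urlsplit(joined)
strict = urlunsplit(parts._replace(path=quote(parts.path)))

print(joined)
print(strict)
```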

 Since many of the URIs we deal with are more or
 less textual, why not take advantage of that?

Cool, so to summarize what I think we agree on:

* Percent encoded URIs are text according to the RFC.
* The data that is used to construct the URI is not defined as text by the
  RFC.
* However, it is very often text in an unspecified encoding.
* It is extremely convenient for programmers to be able to treat the data
  that is used to form a URI as text in nearly all common cases.

-Toshio


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 09:57:30AM -0400, Barry Warsaw wrote:
 On Jun 21, 2010, at 09:37 AM, Arc Riley wrote:
 
 Also, under where it mentions that most OS's do not include Python 3, it
 should be noted which have good support for it.  Gentoo (for example) has
 excellent support for Python 3, automatically installing Python packages
 which have Py3 support for both Py2 and Py3, and the python-based Portage
 package system runs cleanly on Py2.6, Py3.1 and Py3.2.
 
 We're trying to get there for Ubuntu (driven also by Debian).  We have Python
 3.1.2 in main for Lucid, though we will probably not get 3.2 into Maverick
 (the October 2010 release).  We're currently concentrating on Python 2.7 as a
 supported version because it'll be released by then, while 3.2 will still be
 in beta.
 
 If you want to help, or have complaints, kudos, suggestions, etc. for Python
 support on Ubuntu, you can contact me off-list.
 
*nod*  Fedora 14 is about the same.  A nice-to-have that goes along with
these would be a table listing the packages that have been ported to python3
and which distributions ship the python3 version of each package.

Once most of the important third party packages are ported to python3 and in
the distributions, this table will likely become out-dated and probably
should be reaped but right now it's a very useful thing to see.

-Toshio




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 11:43:07AM -0400, Barry Warsaw wrote:
 On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
 
 Something that may make sense to ease the porting process is for some
 of these on the boundary I/O related string manipulation functions
 (such as os.path.join) to grow encoding keyword-only arguments. The
 recommended approach would be to provide all strings, but bytes could
 also be accepted if an encoding was specified. (If you want to mix
 encodings - tough, do the decoding yourself).
 
 This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
 for it.
 
 Would it make sense to have encoding-carrying bytes and str types?
 Basically, I'm thinking of types (maybe even the current ones) that carry
 around a .encoding attribute so that they can be automatically encoded and
 decoded where necessary.  This at least would simplify APIs that need to do
 the conversion.
 
 By default, the .encoding attribute would be some marker to indicate "I have
 no idea, do it explicitly", and if you combine ebytes or estrs that have
 incompatible encodings, you'd either throw an exception or reset the .encoding
 to IAmConfuzzled.  But say you had an email header like:
 
 =?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=
 
 And code like the following (made less crappy):
 
 -snip snip-
 class ebytes(bytes):
     encoding = 'ascii'

     def __str__(self):
         s = estr(self.decode(self.encoding))
         s.encoding = self.encoding
         return s


 class estr(str):
     encoding = 'ascii'


 s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa',
         'euc-jp')
 b = bytes(s, 'euc-jp')

 eb = ebytes(b)
 eb.encoding = 'euc-jp'
 es = str(eb)
 print(repr(eb), es, es.encoding)
 -snip snip-
 
 Running this you get:
 
 b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド! euc-jp
 
 Would it be feasible?  Dunno.  Would it help ease the bytes/str confusion?
 Dunno.  But I think it would help make APIs easier to design and use because
 it would cut down on the encoding-keyword function signature infection.
 
I like the idea of having encoding information carried with the data.
I don't think that an ebytes type that can *optionally* have an encoding
attribute makes the situation less confusing, though.  To me the biggest
problem with python-2.x's unicode/bytes handling was not that it threw
exceptions but that it didn't always throw exceptions.  You might test this
in python2::

  t = u'cafe'
  function(t)

And say, "ah, my code works".  Then a user gives it this::

  t = u'café'
  function(t)

And they get a unicode error because the function only works with unicode in
the ascii range.
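
A small sketch of that pitfall, written for Python 3, where the ASCII assumption has to be spelled out explicitly instead of happening through implicit conversion as in python2.  The body of function() is a hypothetical stand-in; any code that quietly assumes ASCII behaves the same way.

```python
def function(t):
    """Hypothetical stand-in: silently assumes its input is ASCII-only."""
    return t.encode('ascii')


# The input used in the tests happens to be ASCII, so everything passes.
function('cafe')

# A user's input exercises the same code path and fails.
try:
    function('café')
except UnicodeEncodeError as exc:
    print('fails only for non-ASCII input:', exc.reason)
```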

ebytes seems to have the same pitfall where the code path exercised by your
tests could work with::

  eb = ebytes(b)
  eb.encoding = 'euc-jp'
  function(eb)

but the user exercises a code path that does this and fails::

  eb = ebytes(b)
  function(eb)

What do you think of making the encoding attribute a mandatory part of
creating an ebytes object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
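
Roughly what I have in mind, as a sketch only.  This is a variation on the ebytes class from your example and not an existing type; the text() method name and the codecs.lookup() validation are my own assumptions.  The header value is the one from your mail, decoded with email.header.decode_header.

```python
import codecs
from email.header import decode_header


class ebytes(bytes):
    """Bytes that always know their encoding (hypothetical sketch)."""

    def __new__(cls, data, encoding):
        # Fail early: unknown codec names raise LookupError here.
        codecs.lookup(encoding)
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def text(self):
        """Decode using the encoding supplied at construction time."""
        return self.decode(self.encoding)


# decode_header() yields (bytes, charset) pairs, which map onto ebytes.
raw, charset = decode_header('=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=')[0]
eb = ebytes(raw, charset)
print(eb.encoding, eb.text())
```

With this shape, ``ebytes(b)`` without an encoding raises TypeError on every code path, so the mistake shows up in the first test run instead of in a user's hands.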

-Toshio



