Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Nick Coghlan
On 19 January 2014 12:34, Ethan Furman  wrote:
> On 01/18/2014 05:21 PM, Neil Schemenauer wrote:
>>
>> Ethan Furman  wrote:
>>>
>>> So, if %a is added it would act like:
>>>
>>> -
>>> "%a" % some_obj
>>> -
>>> tmp = str(some_obj)
>>> res = b''
>>> for ch in tmp:
>>>     if ord(ch) < 256:
>>>         res += bytes([ord(ch)])
>>>     else:
>>>         res += unicode_escape(ch)
>>> -
>>>
>>> where 'unicode_escape' would yield something like "\u0440" ?
>>
>>
>> My patch on the tracker already implements %a, it's simple.
>
>
> Before one implements a patch it is good to know the specifications.

A very sound engineering principle :)

Neil has the resulting semantics right for what I had in mind, but the
faster path to bytes (rather than going through the ASCII builtin) is
to do the C level equivalent of:

repr(obj).encode("ascii", errors="backslashreplace")

That's essentially what the ascii() builtin does, but that operates
entirely in the text domain, so (as Neil found) you still need a
separate encode step at the end.

>>> ascii("è").encode("ascii")
b"'\\xe8'"
>>> repr("è").encode("ascii", errors="backslashreplace")
b"'\\xe8'"

b"%a" % "è" should produce the same result as the two examples above.
(Code points higher up in the Unicode code space would produce \u and
\U escapes as needed, which should already be handled properly by the
backslashreplace error handler)
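
To make the escaping rules concrete, here is a minimal sketch of those
semantics in pure Python. The helper name ascii_bytes is made up for
illustration; it is not part of any patch or of the PEP.

```python
def ascii_bytes(obj):
    """Hypothetical pure-Python model of what b"%a" % obj would produce."""
    # repr() stays in the text domain; the error handler escapes anything
    # outside ASCII while encoding to bytes.
    return repr(obj).encode("ascii", errors="backslashreplace")

print(ascii_bytes("è"))            # Latin-1 range: \x escape
print(ascii_bytes("\u0440"))       # rest of the BMP: \u escape
print(ascii_bytes("\U0001f40d"))   # beyond the BMP: \U escape
```

The three calls exercise the \x, \u and \U forms respectively.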

One nice thing about this definition is that in the specific case of
text input, the transformation can always be reversed by decoding as
ASCII and then applying ast.literal_eval():

>>> import ast
>>> ast.literal_eval(repr("è").encode("ascii",
...     "backslashreplace").decode("ascii"))
'è'

(Please don't use eval() to reverse a transformation like this, as
doing so not only makes security engineers cry, it's also likely to
make your code vulnerable to all kinds of interesting attacks)

As noted earlier in the thread, one key purpose of including this
feature is to reduce the likelihood of people inappropriately adding
__bytes__ implementations for %s compatibility that look like:

    def __bytes__(self):
        # This is unlikely to be a good idea!
        return repr(self).encode("ascii", errors="backslashreplace")

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Nick Coghlan
On 19 January 2014 10:44, Steve Dower  wrote:
> Visual Studio will try to compile them if they end with .c, though this can
> be disabled on a per-file basis in the project file. Files ending in .h
> won't be compiled, though changes should be detected and cause the .c files
> that include them to be recompiled.

That sounds like a rather good argument for .clinic.h over .clinic.c :)

My assessment of the thread is that .clinic.h will give us the best
overall tool compatibility.

I use Eli Bendersky's pss for my command line source searching needs,
and should be able to update that to skip clinic files without much
difficulty (rather than having to exclude them manually from every
search).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 461 updates

2014-01-18 Thread Nick Coghlan
On 19 January 2014 00:39, Oscar Benjamin  wrote:
>
> If you want to draw a relevant lesson from that thread in this one
> then the lesson argues against PEP 461: adding back the bytes
> formatting methods helps people who refuse to understand text
> processing and continue implementing dirty hacks instead of doing it
> properly.

Yes, that's why it has taken so long to even *consider* bringing
binary interpolation support back - one of our primary concerns in the
early days of Python 3 was developers (including core developers!)
attempting to translate bad habits from Python 2 into Python 3 by
continuing to treat binary data as text. Making interpolation a purely
text domain operation helped strongly in enforcing this distinction,
as it generally required thinking about encoding issues in order to
get things into the text domain (or hitting them with the "latin-1"
hammer, in which case... *sigh*).

The reason PEP 460/461 came up is that we *do* acknowledge that there
is a legitimate use case for binary interpolation support when dealing
with binary formats that contain ASCII compatible segments. Now that
people have had a few years to get used to the Python 3 text model,
lowering the barrier to migration from Python 2 and better handling
that use case in Python 3 in general has finally tilted the scales in
favour of providing the feature (assuming Guido is happy with PEP 461
after Ethan finishes the Rationale section).

(Tangent)

While I agree it's not relevant to the PEP 460/461 discussions, so
long as numpy.loadtxt is explicitly documented as only working with
latin-1 encoded files (it currently isn't), there's no problem. If
it's supposed to work with other encodings (but the entire file is
still required to use a consistent encoding), then it just needs
encoding and errors arguments to fit the Python 3 text model (with
"latin-1" documented as the default encoding). If it is intended to
allow S columns to contain text in arbitrary encodings, then that
should also be supported by the current API with an adjustment to the
default behaviour, since passing something like
codecs.getdecoder("utf-8") as a column converter should do the right
thing. However, if you're currently decoding S columns with latin-1
*before* passing the value to the converter, then you'll need to use a
WSGI style decoding dance instead:

    def fix_encoding(text):
        return text.encode("latin-1").decode("utf-8")  # For example

That's more wasteful than just passing the raw bytes through for
decoding, but is the simplest backwards compatible option if you're
doing latin-1 decoding already.
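
The reason this dance is safe can be shown in a few lines. This is only a
sketch: the sample string and the extra encoding parameter are my own
additions for illustration, not anything numpy provides.

```python
raw = "Zoë".encode("utf-8")          # bytes as they sit in the file

# What a latin-1 based loader hands to the converter: decoding never
# fails, but the two UTF-8 bytes of "ë" become two separate characters.
mangled = raw.decode("latin-1")

def fix_encoding(text, encoding="utf-8"):
    # latin-1 maps every byte to exactly one code point, so encoding
    # back recovers the original bytes without loss.
    return text.encode("latin-1").decode(encoding)

assert mangled != "Zoë"
assert mangled.encode("latin-1") == raw
assert fix_encoding(mangled) == "Zoë"
```

The round trip works for any input bytes, which is what makes latin-1
usable as a lossless carrier here.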

If different rows in the *same* column are allowed to have different
encodings, then that's not a valid use of the operation (since the
column converter has no access to the rest of the row to determine
what encoding should be used for the decode operation).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()

2014-01-18 Thread Ryan Smith-Roberts
Ah yes, my apologies, I was thrown off by the first converter declaration
in your class and didn't spot the second, so didn't realize what you were
up to.

I still advise you not to use this solution. time() is a system call on
many operating systems, and so it can be a heavier operation than you'd
think. Best to avoid it unless it's needed (on FreeBSD it seems to add
about 15% overhead to localtime(), for instance).

As for why you're getting that exception, it definitely looks like a bug in
Argument Clinic. I spotted another bug that would have bitten you while I
was looking for this one, so I've opened bugs on both issues, and put you
on the nosy list for them.


On Sat, Jan 18, 2014 at 7:42 PM, Nikolaus Rath  wrote:

> Hi Ryan,
>
>
> Ryan Smith-Roberts  writes:
> > Hi Nikolaus. I also started a conversion of timemodule, but dropped it
> when
> > I saw in the issue that you had taken over that conversion. I also tried
> to
> > turn parse_time_t_args into a converter. However, it won't work. The
> > problem is that parse_time_t_args must be called whether or not the user
> > supplies an argument to the function, but an Argument Clinic converter
> only
> > gets called if the user actually supplies something, and not on the
> default
> > value.
>
> I don't quite follow. My approach was to drop parse_time_t_args()
> completely and use _PyTime_ObjectToTime_t() as the conversion function
> (which only needs to be called if the user supplied something).
>
> In other words, I would have expected
>
> >> ,
> >> | /*[python input]
> >> | class time_t_converter(CConverter):
> >> | type = 'time_t'
> >> | converter = 'time_t_converter'
> >> | default = None
> >> | py_default = 'None'
> >> | c_default = 'time(NULL)'
> >> | converter = '_PyTime_ObjectToTime_t'
> >> | [python start generated code]*/
> >> |
> >> | /*[clinic input]
> >> | time.localtime
> >> |
> >> | seconds: time_t
> >> | /
> >> |
> >> | bla.
> >> | [clinic start generated code]*/
> >> `
>
> to produce something like this:
>
> static PyObject *
> time_localtime(PyObject *self, PyObject *args)
> {
>  PyObject *obj = NULL;
>  time_t seconds;
>  struct tm buf;
>
>  if (!PyArg_ParseTuple(args, "|O:localtime", &obj))
>  return NULL;
>  if (obj == NULL || obj == Py_None)
>  seconds = time(NULL);
>  else {
>  if (_PyTime_ObjectToTime_t(obj, &seconds) == -1)
>  return NULL;
>  }
>  return time_localtime_impl(self, seconds);
> }
>
>
> Apart from getting an error from clinic.py, it seems to me that this
> should in principle be possible.
>
> Best,
> Nikolaus
>
>
> >
> > So, the best idea is to
> >
> > * Remove the PyArg_ParseTuple code from parse_time_t_args
> > * Declare seconds as a plain object in Argument Clinic
> > * Call the modified parse_time_t_args on seconds first thing in the _impl
> > functions
> >
> >
> > On Sat, Jan 18, 2014 at 4:56 PM, Nikolaus Rath 
> wrote:
> >
> >> Hello,
> >>
> >> I'm trying to convert functions using parse_time_t_args() (from
> >> timemodule.c) for argument parsing to argument clinic.
> >>
> >> The function is defined as:
> >>
> >> ,
> >> | static int
> >> | parse_time_t_args(PyObject *args, char *format, time_t *pwhen)
> >> | {
> >> | PyObject *ot = NULL;
> >> | time_t whent;
> >> |
> >> | if (!PyArg_ParseTuple(args, format, &ot))
> >> | return 0;
> >> | if (ot == NULL || ot == Py_None) {
> >> | whent = time(NULL);
> >> | }
> >> | else {
> >> | if (_PyTime_ObjectToTime_t(ot, &whent) == -1)
> >> | return 0;
> >> | }
> >> | *pwhen = whent;
> >> | return 1;
> >> | }
> >> `
> >>
> >> and used like this:
> >>
> >> ,
> >> | static PyObject *
> >> | time_localtime(PyObject *self, PyObject *args)
> >> | {
> >> | time_t when;
> >> | struct tm buf;
> >> |
> >> | if (!parse_time_t_args(args, "|O:localtime", &when))
> >> | return NULL;
> >> | if (pylocaltime(&when, &buf) == -1)
> >> | return NULL;
> >> | return tmtotuple(&buf);
> >> | }
> >> `
> >>
> >> In other words, if any Python object is passed to it, it calls
> >> _PyTime_ObjectToTime_t on it to convert it to time_t, and otherwise uses
> >> time(NULL) as the default value.
> >>
> >> My first attempt to implement something similar in argument clinic was:
> >>
> >> ,
> >> | /*[python input]
> >> | class time_t_converter(CConverter):
> >> | type = 'time_t'
> >> | converter = 'time_t_converter'
> >> | default = None
> >> | py_default = 'None'
> >> | c_default = 'time(NULL)'
> >> | converter = '_PyTime_ObjectToTime_t'
> >> | [python start generated code]*/
> >> |
> >> | /*[clinic input]
> >> | time.localtime
> >> |
> >> | seconds: time_t
> >> | /
> >> |
> >> | bla.
> >> | [clinic start generated code]*/
> >> `
> >>
> >> However, running clinic.py on this file gives:
> >>
> >> ,
> >> | $ Tools/clinic/clinic.py Modul

Re: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()

2014-01-18 Thread Nikolaus Rath
Hi Ryan,


Ryan Smith-Roberts  writes:
> Hi Nikolaus. I also started a conversion of timemodule, but dropped it when
> I saw in the issue that you had taken over that conversion. I also tried to
> turn parse_time_t_args into a converter. However, it won't work. The
> problem is that parse_time_t_args must be called whether or not the user
> supplies an argument to the function, but an Argument Clinic converter only
> gets called if the user actually supplies something, and not on the default
> value.

I don't quite follow. My approach was to drop parse_time_t_args()
completely and use _PyTime_ObjectToTime_t() as the conversion function
(which only needs to be called if the user supplied something).

In other words, I would have expected

>> ,
>> | /*[python input]
>> | class time_t_converter(CConverter):
>> | type = 'time_t'
>> | converter = 'time_t_converter'
>> | default = None
>> | py_default = 'None'
>> | c_default = 'time(NULL)'
>> | converter = '_PyTime_ObjectToTime_t'
>> | [python start generated code]*/
>> |
>> | /*[clinic input]
>> | time.localtime
>> |
>> | seconds: time_t
>> | /
>> |
>> | bla.
>> | [clinic start generated code]*/
>> `

to produce something like this:

static PyObject *
time_localtime(PyObject *self, PyObject *args)
{
    PyObject *obj = NULL;
    time_t seconds;
    struct tm buf;

    if (!PyArg_ParseTuple(args, "|O:localtime", &obj))
        return NULL;
    if (obj == NULL || obj == Py_None)
        seconds = time(NULL);
    else {
        if (_PyTime_ObjectToTime_t(obj, &seconds) == -1)
            return NULL;
    }
    return time_localtime_impl(self, seconds);
}


Apart from getting an error from clinic.py, it seems to me that this
should in principle be possible.

Best,
Nikolaus


>
> So, the best idea is to
>
> * Remove the PyArg_ParseTuple code from parse_time_t_args
> * Declare seconds as a plain object in Argument Clinic
> * Call the modified parse_time_t_args on seconds first thing in the _impl
> functions
>
>
> On Sat, Jan 18, 2014 at 4:56 PM, Nikolaus Rath  wrote:
>
>> Hello,
>>
>> I'm trying to convert functions using parse_time_t_args() (from
>> timemodule.c) for argument parsing to argument clinic.
>>
>> The function is defined as:
>>
>> ,
>> | static int
>> | parse_time_t_args(PyObject *args, char *format, time_t *pwhen)
>> | {
>> | PyObject *ot = NULL;
>> | time_t whent;
>> |
>> | if (!PyArg_ParseTuple(args, format, &ot))
>> | return 0;
>> | if (ot == NULL || ot == Py_None) {
>> | whent = time(NULL);
>> | }
>> | else {
>> | if (_PyTime_ObjectToTime_t(ot, &whent) == -1)
>> | return 0;
>> | }
>> | *pwhen = whent;
>> | return 1;
>> | }
>> `
>>
>> and used like this:
>>
>> ,
>> | static PyObject *
>> | time_localtime(PyObject *self, PyObject *args)
>> | {
>> | time_t when;
>> | struct tm buf;
>> |
>> | if (!parse_time_t_args(args, "|O:localtime", &when))
>> | return NULL;
>> | if (pylocaltime(&when, &buf) == -1)
>> | return NULL;
>> | return tmtotuple(&buf);
>> | }
>> `
>>
>> In other words, if any Python object is passed to it, it calls
>> _PyTime_ObjectToTime_t on it to convert it to time_t, and otherwise uses
>> time(NULL) as the default value.
>>
>> My first attempt to implement something similar in argument clinic was:
>>
>> ,
>> | /*[python input]
>> | class time_t_converter(CConverter):
>> | type = 'time_t'
>> | converter = 'time_t_converter'
>> | default = None
>> | py_default = 'None'
>> | c_default = 'time(NULL)'
>> | converter = '_PyTime_ObjectToTime_t'
>> | [python start generated code]*/
>> |
>> | /*[clinic input]
>> | time.localtime
>> |
>> | seconds: time_t
>> | /
>> |
>> | bla.
>> | [clinic start generated code]*/
>> `
>>
>> However, running clinic.py on this file gives:
>>
>> ,
>> | $ Tools/clinic/clinic.py Modules/timemodule.c
>> | Error in file "Modules/timemodule.c" on line 529:
>> | Exception raised during parsing:
>> | Traceback (most recent call last):
>> |   File "Tools/clinic/clinic.py", line 1445, in parse
>> | parser.parse(block)
>> |   File "Tools/clinic/clinic.py", line 2738, in parse
>> | self.state(None)
>> |   File "Tools/clinic/clinic.py", line 3468, in state_terminal
>> | self.function.docstring = self.format_docstring()
>> |   File "Tools/clinic/clinic.py", line 3344, in format_docstring
>> | s += "".join(a)
>> | TypeError: sequence item 2: expected str instance, NoneType found
>> `
>>
>> What am I doing wrong?
>>
>>
>> Best,
>> Nikolaus
>>
>> --
>> Encrypted emails preferred.
>> PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C
>>
>>  »Time flies like an arrow, fruit flies like a Banana.«
>

Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Ethan Furman

On 01/18/2014 02:01 PM, Ethan Furman wrote:
>
> where 'unicode_escape' would yield something like "\u0440" ?


Just to be clear, "\u0440" is the six bytes b'\\', b'u', b'0', b'4', b'4', b'0'.
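
A quick way to check the byte count, using the existing backslashreplace
error handler as a stand-in for the hypothetical unicode_escape helper:

```python
escaped = "\u0440".encode("ascii", errors="backslashreplace")

print(escaped)        # the escape sequence as bytes
print(len(escaped))   # number of bytes
print(list(escaped))  # integer values of backslash, 'u', '0', '4', '4', '0'
```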

--
~Ethan~


Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Ethan Furman

On 01/18/2014 05:21 PM, Neil Schemenauer wrote:
> Ethan Furman  wrote:
>>
>> So, if %a is added it would act like:
>>
>> -
>> "%a" % some_obj
>> -
>> tmp = str(some_obj)
>> res = b''
>> for ch in tmp:
>>     if ord(ch) < 256:
>>         res += bytes([ord(ch)])
>>     else:
>>         res += unicode_escape(ch)
>> -
>>
>> where 'unicode_escape' would yield something like "\u0440" ?
>
> My patch on the tracker already implements %a, it's simple.

Before one implements a patch it is good to know the specifications.

> Just call PyObject_ASCII() (same as ascii()) then call
> PyUnicode_AsLatin1String(s) to convert it to bytes and stick it in.
> PyObject_ASCII does not return non-ASCII characters, no decode error
> is possible.  We could call _PyUnicode_AsASCIIString(s, "strict")
> instead if we are afraid for non-ASCII bytes coming out of
> PyObject_ASCII.


I appreciate that this is the behavior you want, but I'm not sure it's the 
behavior Nick was describing.

--
~Ethan~


Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-18 Thread Stephen J. Turnbull
Neil Schemenauer writes:

 > That's it.  After sleeping on it, I'm not sure that's enough Python
 > 2.x compatibility to help a lot.  I haven't ported much code to 3.x
 > yet but I imagine the following are major challenges:
 > 
 > - comparisons between str and bytes always return unequal
 > 
 > - indexing/iterating bytes returns integers, not bytes objects
 > 
 > - concatenation of str and bytes fails (not so bad since
 >   a TypeError is generated right away).
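
For readers who have not hit these yet, the three behaviours listed above
look like this in a Python 3 session. This is a minimal illustration of the
language behaviour, not something taken from Neil's patch.

```python
# Comparison never raises, but is always unequal.
# (Running with "python -b" turns this into a BytesWarning.)
print(b"abc" == "abc")

# Indexing and iteration yield integers; slicing yields bytes.
print(b"abc"[0], b"abc"[0:1], list(b"abc"))

# Mixing the types in concatenation fails immediately.
try:
    b"abc" + "def"
except TypeError as exc:
    print("TypeError:", exc)
```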

Experience shows these are rarely major challenges.  The reason we are
having this discussion is that if you are the kind of programmer who
runs into challenges once, you are likely to run into all of the above
and more, repeatedly, and addressing them using features available in
Python up to v3.3 makes your code unreadable.

In other words, it's like unemployment at 5%.  It would be bearable
(just) if the pain were shared by 100% of the people being 5%
unemployed, but rather the burden falls on the 5% who are 100%
unemployed.

Now, the problem that many existing libraries face is that they were
designed for monolingual environments where text encodings are more or
less ASCII compatible[1].  If you stay in the Python 2 world, you can
"internationalize" with the existing design, more or less limp along,
fixing encoding bugs as they arise (not "if" but "when", and it can
take a decade to find them all).  But Python 3 *strongly* discourages
that policy.  From the point of view of design for the modern
environment, such libraries really should have their I/O modules
rewritten from scratch (not a huge job), and the necessary adjustments
made in processing code (few but randomly dispersed through the code,
and each a ticking time bomb for your users).  But I stress that the
problem here is that the design of such libraries is at fault, not
Python 3.  The world has changed.[2]

And then there are the remaining 5% or so that really need to work
mostly in bytes, but want to use string formatting to format their
byte streams.  I used to think that this was just a porting
convenience, but I was wrong.  Code written this way is often more
concise and more readable than code written using .join() or the
struct module.  It *should* be written using string formatting.  And
that's what PEPs 460 and 461 are intended to address.
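
As a small illustration of the readability point, here is the same header
line built both ways. It is a sketch that assumes an interpreter with bytes
interpolation as PEP 461 proposes (Python 3.5 and later); the header name
and value are made up.

```python
name, length = b"Content-Length", 42

# Without bytes formatting: encode each piece by hand and join.
joined = b"".join([name, b": ", str(length).encode("ascii"), b"\r\n"])

# With PEP 461 style interpolation.
formatted = b"%s: %d\r\n" % (name, length)

assert joined == formatted == b"Content-Length: 42\r\n"
```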

We'll see what happens as these PEPs are implemented, but I suspect
that we'll find that there are very few bandaids left that are of much
use.  That is, as I claimed above, for the remaining problematic
libraries a redesign will be needed.

Footnotes: 
[1]  In the technical sense that you can rely on ASCII bytes to mean
ASCII characters, not part of a non-ASCII character.

[2]  And if the world *hasn't* changed for your application, what's
wrong with staying with Python 2?



Re: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()

2014-01-18 Thread Ryan Smith-Roberts
Hi Nikolaus. I also started a conversion of timemodule, but dropped it when
I saw in the issue that you had taken over that conversion. I also tried to
turn parse_time_t_args into a converter. However, it won't work. The
problem is that parse_time_t_args must be called whether or not the user
supplies an argument to the function, but an Argument Clinic converter only
gets called if the user actually supplies something, and not on the default
value.

So, the best idea is to

* Remove the PyArg_ParseTuple code from parse_time_t_args
* Declare seconds as a plain object in Argument Clinic
* Call the modified parse_time_t_args on seconds first thing in the _impl
functions


On Sat, Jan 18, 2014 at 4:56 PM, Nikolaus Rath  wrote:

> Hello,
>
> I'm trying to convert functions using parse_time_t_args() (from
> timemodule.c) for argument parsing to argument clinic.
>
> The function is defined as:
>
> ,
> | static int
> | parse_time_t_args(PyObject *args, char *format, time_t *pwhen)
> | {
> | PyObject *ot = NULL;
> | time_t whent;
> |
> | if (!PyArg_ParseTuple(args, format, &ot))
> | return 0;
> | if (ot == NULL || ot == Py_None) {
> | whent = time(NULL);
> | }
> | else {
> | if (_PyTime_ObjectToTime_t(ot, &whent) == -1)
> | return 0;
> | }
> | *pwhen = whent;
> | return 1;
> | }
> `
>
> and used like this:
>
> ,
> | static PyObject *
> | time_localtime(PyObject *self, PyObject *args)
> | {
> | time_t when;
> | struct tm buf;
> |
> | if (!parse_time_t_args(args, "|O:localtime", &when))
> | return NULL;
> | if (pylocaltime(&when, &buf) == -1)
> | return NULL;
> | return tmtotuple(&buf);
> | }
> `
>
> In other words, if any Python object is passed to it, it calls
> _PyTime_ObjectToTime_t on it to convert it to time_t, and otherwise uses
> time(NULL) as the default value.
>
> My first attempt to implement something similar in argument clinic was:
>
> ,
> | /*[python input]
> | class time_t_converter(CConverter):
> | type = 'time_t'
> | converter = 'time_t_converter'
> | default = None
> | py_default = 'None'
> | c_default = 'time(NULL)'
> | converter = '_PyTime_ObjectToTime_t'
> | [python start generated code]*/
> |
> | /*[clinic input]
> | time.localtime
> |
> | seconds: time_t
> | /
> |
> | bla.
> | [clinic start generated code]*/
> `
>
> However, running clinic.py on this file gives:
>
> ,
> | $ Tools/clinic/clinic.py Modules/timemodule.c
> | Error in file "Modules/timemodule.c" on line 529:
> | Exception raised during parsing:
> | Traceback (most recent call last):
> |   File "Tools/clinic/clinic.py", line 1445, in parse
> | parser.parse(block)
> |   File "Tools/clinic/clinic.py", line 2738, in parse
> | self.state(None)
> |   File "Tools/clinic/clinic.py", line 3468, in state_terminal
> | self.function.docstring = self.format_docstring()
> |   File "Tools/clinic/clinic.py", line 3344, in format_docstring
> | s += "".join(a)
> | TypeError: sequence item 2: expected str instance, NoneType found
> `
>
> What am I doing wrong?
>
>
> Best,
> Nikolaus
>
> --
> Encrypted emails preferred.
> PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C
>
>  »Time flies like an arrow, fruit flies like a Banana.«
>


Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Neil Schemenauer
Steven D'Aprano  wrote:
>> To properly handle int and float subclasses, int(), index(), and float()
>> will be called on the objects intended for (d, i, u), (b, o, x, X), and
>> (e, E, f, F, g, G).
>
>
> -1 on this idea.
>
> This is a rather large violation of the principle of least surprise, and 
> radically different from the behaviour of Python 3 str. In Python 3, 
> '%d' interpolation calls the __str__ method, so if you subclass, you can 
> get the behaviour you want:
>
> py> class HexInt(int):
> ... def __str__(self):
> ... return hex(self)
> ...
> py> "%d" % HexInt(23)
> '0x17'
>
>
> which is exactly what we should expect from a subclass.
>
> You're suggesting that bytes should ignore any custom display 
> implemented by subclasses, and implicitly coerce them to the superclass 
> int. What is the justification for this? You don't define or even 
> describe what you consider "properly handle".

The proposed behavior (at least as I understand it and as I've
implemented in my proposed patch) matches Python 2 str/unicode and
Python 3 str behavior for these codes.  If you want to allow
subclasses to have control or to use duck-typing, you have to use
str and __format__.  I'm okay with the limitation, bytes formatting
can be simple, limited and fast.
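
A short sketch of what that limitation means in practice, reusing the HexInt
class from the quoted example. It assumes an interpreter where bytes
interpolation exists (CPython 3.5 and later), and the behaviour shown is
what I would expect there rather than something the PEP text guarantees
line by line.

```python
class HexInt(int):
    def __str__(self):
        return hex(self)

n = HexInt(23)

print(str(n))       # custom display from the subclass
print(b"%d" % n)    # numeric code: formats the integer value

# Opting in to the custom display has to be explicit.
print(b"%s" % str(n).encode("ascii"))
```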

  Neil



Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Neil Schemenauer
Ethan Furman  wrote:
> So, if %a is added it would act like:
>
> -
>    "%a" % some_obj
> -
>    tmp = str(some_obj)
>    res = b''
>    for ch in tmp:
>        if ord(ch) < 256:
>            res += bytes([ord(ch)])
>        else:
>            res += unicode_escape(ch)
> -
>
> where 'unicode_escape' would yield something like "\u0440" ?

My patch on the tracker already implements %a, it's simple.  Just
call PyObject_ASCII() (same as ascii()) then call
PyUnicode_AsLatin1String(s) to convert it to bytes and stick it in.
PyObject_ASCII does not return non-ASCII characters, no decode error
is possible.  We could call _PyUnicode_AsASCIIString(s, "strict")
instead if we are afraid for non-ASCII bytes coming out of
PyObject_ASCII.
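
At the Python level the same idea can be modelled as below. This is a
sketch, not the C patch: via_ascii and via_repr are made-up names, and the
second function is just the repr()-plus-backslashreplace formulation for
comparison.

```python
def via_ascii(obj):
    # Python-level model of PyObject_ASCII() followed by a Latin-1 encode.
    return ascii(obj).encode("latin-1")

def via_repr(obj):
    # The formulation that goes through repr() and an error handler.
    return repr(obj).encode("ascii", errors="backslashreplace")

samples = ["plain", "è", "\u0440", "\U0001f40d", 3.5, [1, "ü"], None]
for obj in samples:
    text = ascii(obj)
    assert all(ord(ch) < 128 for ch in text)   # ascii() output is pure ASCII
    assert via_ascii(obj) == via_repr(obj)
```

For these samples the two routes agree, and the Latin-1 encode never sees a
character above 127.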

  Neil



Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Ethan Furman

On 01/18/2014 10:49 AM, Larry Hastings wrote:
>
> Later in the thread someone suggests that ".h" would be a better ending; I'm
> willing to consider that.


I'll cast a vote for .clinic.h.  :)

--
~Ethan~


[Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()

2014-01-18 Thread Nikolaus Rath
Hello,

I'm trying to convert functions using parse_time_t_args() (from
timemodule.c) for argument parsing to argument clinic.

The function is defined as:

,
| static int
| parse_time_t_args(PyObject *args, char *format, time_t *pwhen)
| {
|     PyObject *ot = NULL;
|     time_t whent;
|
|     if (!PyArg_ParseTuple(args, format, &ot))
|         return 0;
|     if (ot == NULL || ot == Py_None) {
|         whent = time(NULL);
|     }
|     else {
|         if (_PyTime_ObjectToTime_t(ot, &whent) == -1)
|             return 0;
|     }
|     *pwhen = whent;
|     return 1;
| }
`

and used like this:

,
| static PyObject *
| time_localtime(PyObject *self, PyObject *args)
| {
|     time_t when;
|     struct tm buf;
|
|     if (!parse_time_t_args(args, "|O:localtime", &when))
|         return NULL;
|     if (pylocaltime(&when, &buf) == -1)
|         return NULL;
|     return tmtotuple(&buf);
| }
`

In other words, if any Python object is passed to it, it calls
_PyTime_ObjectToTime_t on it to convert it to time_t, and otherwise uses
time(NULL) as the default value.
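
Expressed in Python, the behaviour being reproduced is roughly the
following. The function name is hypothetical, and int() is only a rough
stand-in for _PyTime_ObjectToTime_t().

```python
import time

def resolve_seconds(seconds=None):
    """Hypothetical Python model of the default handling described above."""
    if seconds is None:
        # Equivalent of time(NULL): only consulted when nothing was passed.
        return int(time.time())
    # Rough stand-in for _PyTime_ObjectToTime_t(); the real C function has
    # its own rounding and overflow rules, which this does not reproduce.
    return int(seconds)

print(resolve_seconds(0))      # explicit value is converted
print(resolve_seconds())       # no argument: current time
print(resolve_seconds(None))   # None behaves like a missing argument
```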

My first attempt to implement something similar in argument clinic was:

,
| /*[python input]
| class time_t_converter(CConverter):
|     type = 'time_t'
|     converter = 'time_t_converter'
|     default = None
|     py_default = 'None'
|     c_default = 'time(NULL)'
|     converter = '_PyTime_ObjectToTime_t'
| [python start generated code]*/
|
| /*[clinic input]
| time.localtime
|
|     seconds: time_t
|     /
|
| bla.
| [clinic start generated code]*/
`

However, running clinic.py on this file gives:

,
| $ Tools/clinic/clinic.py Modules/timemodule.c 
| Error in file "Modules/timemodule.c" on line 529:
| Exception raised during parsing:
| Traceback (most recent call last):
|   File "Tools/clinic/clinic.py", line 1445, in parse
|     parser.parse(block)
|   File "Tools/clinic/clinic.py", line 2738, in parse
|     self.state(None)
|   File "Tools/clinic/clinic.py", line 3468, in state_terminal
|     self.function.docstring = self.format_docstring()
|   File "Tools/clinic/clinic.py", line 3344, in format_docstring
|     s += "".join(a)
| TypeError: sequence item 2: expected str instance, NoneType found
`
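
The last frame can be reproduced in isolation: str.join() raises this
TypeError when the third element of its argument is None. Which value inside
clinic.py ended up as None is not something the traceback shows, and the
list below is made up.

```python
parts = ["time.localtime(", "seconds", None, ")"]   # made-up data

try:
    "".join(parts)
except TypeError as exc:
    message = str(exc)

print(message)
```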

What am I doing wrong?


Best,
Nikolaus

-- 
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Steve Dower
Visual Studio will try to compile them if they end with .c, though this can be 
disabled on a per-file basis in the project file. Files ending in .h won't be 
compiled, though changes should be detected and cause the .c files that include 
them to be recompiled.

.inl is also sometimes used as an extension for this purpose. I don't recall 
whether VS will add file associations for this type.

Cheers,
Steve

Top-posted from my Windows Phone

From: Larry Hastings
Sent: 1/18/2014 10:58
To: python-dev@python.org
Subject: Re: [Python-Dev] .clinic.c vs .c.clinic



On 01/18/2014 01:02 AM, Serhiy Storchaka wrote:

> 1. I very often use global search in the sources. It is my way of navigating
> and investigating. I don't want to get false results from generated files,
> and it is much easier to specify a mask such as '*.[ch]' or '*.c,*.h'
> (depending on the tool) than to specify a mask plus a negative mask. The
> latter is not even always possible: I can write a cumbersome expression for
> the find command, but Midnight Commander doesn't support negative masks at
> all (and perhaps your favorite IDE doesn't support them either).

Apparently you do this at the command-line.  In that case, you can make an 
'alias' to hide the cumbersome expression. Perhaps you've already made one that 
ignores the ".hg" directory tree?

If the generated files didn't end in a standard extension, editors wouldn't
automatically recognize them and wouldn't code-color them.  You tell me
"everyone can easily reconfigure their editors", yet it seems that writing an
alias yourself is unreasonable.



> 2. I don't use any IDE, but if you do, this can be important for you. If the
> IDE shows a source tree, you are unlikely to want to see generated
> *.clinic.c files in it. They would almost double the list of sources.

My experience is that IDEs either show all files in the "project" (which should 
include the generated files anyway) or they show all files in the directory.  
So this concern assumes behavior that isn't true.



3. Pathname expansion works better with unique endings, You can open all
Modules/_io/*.c files, but unlikely you so interested in *.clinic.c files which
are matched by former pattern.

How often do people edit *.c in a directory?  And then, how often do people 
edit *.c in a directory and wouldn't want to see the Argument Clinic generated 
code?



4. .c suffix at the end lies. This is not compilable C source file. This file
should be included in other C source file. This will confuse accidental user
and other tools. Including Argument Clinic itself, this is why it inserts the
"preserve" directive at the start of generated file. But other tools have no
such sign.

This is nonsense.  The contents of the file is 100% C.  If you added the proper 
include files (by hand, not recommended) it would compile standalone.


A lot of your suggestions assume no one would ever want to examine the 
generated code.  But people will still want to look in there:

  *   to set breakpoints
  *   to make sure existing Argument Clinic generated code does what you wanted
  *   when experimenting with Argument Clinic inputs

So I don't see the need to make the generated files totally invisible.


Later in the thread someone suggests that ".h" would be a better ending; I'm 
willing to consider that.  (As in ".clinic.h".)  After all, you do include it, 
and there's some precedent for C code in H files (the already-cited stringlib).

Also, now I'm starting to worry that adding ".clinic.c" files to an IDE would 
mean the IDE would try to compile them.  Can somebody who uses an IDE to 
compile Python code experiment with ".clinic.c" files and report back--is it 
possible to add them to your "project" in such a way that the compiler will 
notice when they changed but won't try to compile them standalone?  I'm 
thinking specifically of MSVS, as that's explicitly supported by CPython, but 
I'm interested in results from other IDEs if people use them with CPython trunk.


Serhiy: I appreciate your contributions, both to Python in general and to 
Argument Clinic specifically.  And you're only doing this because you care.  
Still, I feel like you've never been shown a bikeshed you didn't have an 
opinion on.


/arry


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Richard Oudkerk


On 18/01/2014 05:09 pm, Antoine Pitrou wrote:
>> Or, if this collides with Include/*, one of the following:
>>
>> memoryview_func.h  // public functions
>>
>> memoryview_if.h  // public interface
>
> Objects/memoryview.clinic.h should be fine.

Or maybe have a __clinic__ directory similar to __pycache__.

-- Richard


Re: [Python-Dev] Migration from Python 2.7 and bytes formatting

2014-01-18 Thread Neil Schemenauer
On 2014-01-18, Stephen J. Turnbull wrote:
> The above are descriptions of current behavior (ie, unchanged by PEPs
> 460, 461), and this:
[..]
> is the content of this proposal, is that right?

The proposal is that -2 enables the following:

- %r as an alias for %a (i.e. calls ascii())

- %s will fall back to calling PyObject_Str() and then
  call _PyUnicode_AsASCIIString(obj, "strict") to
  convert to bytes

That's it.  After sleeping on it, I'm not sure that's enough Python
2.x compatibility to help a lot.  I haven't ported much code to 3.x
yet but I imagine the following are major challenges:

- comparisons between str and bytes always return unequal

- indexing/iterating bytes returns integers, not bytes objects

- concatenation of str and bytes fails (not so bad since
  a TypeError is generated right away).
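
For readers who have not hit these yet, the three behaviours listed above look roughly like this in a Python 3 interpreter. This only illustrates existing Python 3 semantics, not what the proposed -2 option would change, and the variable names are made up for the example.

```python
# Python 3 semantics for the three porting hazards (illustration only).

data = b"abc"

# 1. bytes and str never compare equal, even when the content looks alike.
equal = (data == "abc")

# 2. Indexing or iterating bytes gives integers, not length-1 bytes objects.
first = data[0]
items = list(data)
first_slice = data[0:1]   # slicing keeps the bytes type

# 3. Concatenating bytes and str fails immediately with TypeError.
concat_error = None
try:
    data + "def"
except TypeError as exc:
    concat_error = exc
```

The third case is the least dangerous of the three precisely because it fails loudly, whereas the first two silently give different results from Python 2.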


Maybe the -2 command line option could revert to Python 2.x behavior
for the above but I'm worried it might break working 3.x library
code (the %r/%s change is very safe).  I think I'll play with the
idea and see which unit tests get broken.  Ideally, there would be
warnings generated when each backwards compatible behavior kicks in,
that would greatly help when fixing up code.

  Neil


Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Ethan Furman

On 01/18/2014 05:48 AM, Nick Coghlan wrote:
> On 18 Jan 2014 11:52, "Ethan Furman" wrote:
>>
>> I'll admit to being somewhat on the fence about %a.
>>
>> It seems there are two possibilities with %a:
>>
>>   1) have it be ascii(repr(obj))
>>
>>   2) have it be str(obj).encode('ascii', 'strict')
>
> This gets very close to crossing the line into implicit encoding of text
> again. Binary interpolation is being added back for the specific use case
> of working with ASCII compatible segments in binary formats, and it's at
> best arguable that supporting %a will help with that use case.

Agreed.

> However, without it, there may be a greater temptation to inappropriately
> define __bytes__ just to support binary interpolation, rather than because
> a type truly has an appropriate translation directly to bytes.

True.

> By allowing %a, we avoid that temptation. This is also potentially useful
> specifically in the case of binary logging formats and as a quick way to
> request backslash escaping of non-ASCII characters in text.
>
> Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I
> think it will head off a fair bit of potential misuse of __bytes__.


So, if %a is added it would act like:

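---------
  "%a" % some_obj
---------
  tmp = str(some_obj)
  res = b''
  for ch in tmp:
      if ord(ch) < 256:
          # code points below 256 are copied through as a single byte
          res += bytes([ord(ch)])
      else:
          # everything else is replaced by an escape sequence
          res += unicode_escape(ch)
---------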

where 'unicode_escape' would yield something like "\u0440" ?
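
A runnable sketch of the loop described above may help when comparing it with other candidate definitions. It assumes that the hypothetical unicode_escape helper emits \uXXXX for code points up to U+FFFF and \UXXXXXXXX beyond that; neither function name comes from the PEP or from any patch.

```python
def unicode_escape(ch):
    """Hypothetical helper: escape one code point as ASCII bytes."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return "\\u{:04x}".format(cp).encode("ascii")
    return "\\U{:08x}".format(cp).encode("ascii")


def percent_a_sketch(obj):
    """Sketch of the semantics asked about above (not an implementation)."""
    res = b""
    for ch in str(obj):
        if ord(ch) < 256:
            # Code points below 256 pass through as one raw byte.
            res += bytes([ord(ch)])
        else:
            res += unicode_escape(ch)
    return res


ascii_only = percent_a_sketch("abc")
latin1_char = percent_a_sketch("\u00e8")
cyrillic_char = percent_a_sketch("\u0440")
```

Note that under this reading a character such as U+00E8 comes out as the raw byte 0xE8, so the result is not guaranteed to be pure ASCII.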

--
~Ethan~


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Stefan Krah
Serhiy Storchaka  wrote:
> .ac is a well-known suffix for autoconf-related files.

I know, but unless someone writes Objects/configure.c I think this won't be a
problem.


> And a trailing .h has the same disadvantages as .c.

I'm not strongly inconvenienced by those you listed.


Stefan Krah





Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Larry Hastings



On 01/18/2014 01:02 AM, Serhiy Storchaka wrote:

> 1. I very, very often use global search in the sources. It's my way of
> navigating and my way of investigating. I don't want to get false results
> from generated files. And it is much easier to specify a mask like '*.[ch]'
> or '*.c,*.h' (depending on the tool) than to specify a mask and a negative
> mask. The latter is not even always possible: I can write a cumbersome
> expression for the find command, but Midnight Commander doesn't support
> negative masks at all (and perhaps your favorite IDE doesn't support them
> either).


Apparently you do this at the command-line.  In that case, you can make 
an 'alias' to hide the cumbersome expression. Perhaps you've already 
made one that ignores the ".hg" directory tree?
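
As a rough illustration of what such a wrapper would have to do, here is a minimal sketch in Python rather than as a shell alias. The function name iter_sources and its default masks are assumptions made for the example; it is not an existing CPython tool.

```python
import fnmatch
import os


def iter_sources(root, include=("*.c", "*.h"), exclude=("*.clinic.c",),
                 skip_dirs=(".hg",)):
    """Yield hand-written source files under root, skipping generated ones."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            wanted = any(fnmatch.fnmatchcase(name, pat) for pat in include)
            generated = any(fnmatch.fnmatchcase(name, pat) for pat in exclude)
            if wanted and not generated:
                yield os.path.join(dirpath, name)
```

The exclude mask is exactly the "negative mask" under discussion: it is easy to express here, but only because the search tool is being written by hand.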


If the generated files didn't end in a standard extension, editors wouldn't
automatically recognize them and wouldn't code-color them.  You tell me
"everyone can easily reconfigure their editors" but it seems that you writing
an alias is unreasonable.




> 2. I don't use any IDE, but if you do, this can be important for you. If
> the IDE shows a source tree, you are unlikely to want to see generated
> *.clinic.c files in it. They would almost double the list of sources.


My experience is that IDEs either show all files in the "project" (which 
should include the generated files anyway) or they show all files in the 
directory.  So this concern assumes behavior that isn't true.




> 3. Pathname expansion works better with unique endings. You can open all
> Modules/_io/*.c files, but you are unlikely to be interested in the
> *.clinic.c files, which are matched by the same pattern.


How often do people edit *.c in a directory?  And then, how often do 
people edit *.c in a directory and wouldn't want to see the Argument 
Clinic generated code?




> 4. The .c suffix at the end lies. This is not a compilable C source file;
> it is meant to be included in another C source file. This will confuse the
> casual user and other tools, including Argument Clinic itself, which is why
> it inserts the "preserve" directive at the start of the generated file. But
> other tools have no such marker.


This is nonsense.  The contents of the file are 100% C.  If you added the
proper include files (by hand, not recommended) it would compile standalone.



A lot of your suggestions assume no one would ever want to examine the 
generated code.  But people will still want to look in there:


 * to set breakpoints
 * to make sure existing Argument Clinic generated code does what you
   wanted
 * when experimenting with Argument Clinic inputs

So I don't see the need to make the generated files totally invisible.


Later in the thread someone suggests that ".h" would be a better ending; 
I'm willing to consider that.  (As in ".clinic.h".)  After all, you do 
include it, and there's some precedent for C code in H files (the 
already-cited stringlib).


Also, now I'm starting to worry that adding ".clinic.c" files to an IDE 
would mean the IDE would try to compile them.  Can somebody who uses an 
IDE to compile Python code experiment with ".clinic.c" files and report 
back--is it possible to add them to your "project" in such a way that 
the compiler will notice when they changed but won't try to compile them 
standalone?  I'm thinking specifically of MSVS, as that's explicitly 
supported by CPython, but I'm interested in results from other IDEs if 
people use them with CPython trunk.



Serhiy: I appreciate your contributions, both to Python in general and 
to Argument Clinic specifically.  And you're only doing this because you 
care.  Still, I feel like you've never been shown a bikeshed you didn't 
have an opinion on.



/arry


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Zachary Ware
On Sat, Jan 18, 2014 at 12:10 PM, Serhiy Storchaka  wrote:
> 18.01.14 19:09, Antoine Pitrou wrote:
>
>> On Sat, 18 Jan 2014 18:06:06 +0100
>> Stefan Krah  wrote:
>>>>
>>>> I'd rather see memoryview.h than memoryview.clinic.c.
>>>
>>>
>>> Or, if this collides with Include/*, one of the following:
>>>
>>> memoryview_func.h  // public functions
>>>
>>> memoryview_if.h  // public interface
>>
>>
>> Objects/memoryview.clinic.h should be fine.
>
>
> All my objections against .clinic.c are applicable to .clinic.h as well.

Would it be of any help for the clinic files to live in their own
separate directory?  Say, instead of Objects/memoryview.clinic.c,
Clinic/memoryview.clinic.c?

-- 
Zach


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Serhiy Storchaka

18.01.14 19:09, Antoine Pitrou wrote:
> On Sat, 18 Jan 2014 18:06:06 +0100
> Stefan Krah  wrote:
>>> I'd rather see memoryview.h than memoryview.clinic.c.
>>
>> Or, if this collides with Include/*, one of the following:
>>
>> memoryview_func.h  // public functions
>>
>> memoryview_if.h  // public interface
>
> Objects/memoryview.clinic.h should be fine.


All my objections against .clinic.c are applicable to .clinic.h as well.




Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Serhiy Storchaka

18.01.14 19:39, Stefan Krah wrote:
> Right.  Objects/memoryview.ac.h perhaps?  I sort of dislike reading full words
> in filename extensions.

.ac is a well-known suffix for autoconf-related files. And a trailing .h has
the same disadvantages as .c.





Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Serhiy Storchaka

18.01.14 15:28, Nick Coghlan wrote:
> I can argue either side, but the biggest potential problem I see with
> Serhiy's suggestion is the likelihood of breaking automatic cross
> referencing of symbols in most IDEs, as well as causing possible issues
> for interactive debuggers. These are at least valid fragments of C
> files, even if they're not designed to be compiled independently.
> However, if both Visual Studio and gdb can still find the symbols
> correctly, even with the ".clinic" extension, then I would consider that
> a point strongly in favour of Serhiy's suggestion.

Good point. This idea had not occurred to me, and now I am almost ready to
give up my proposals.

But C allows you to include files with any extension (.h, .hpp, .h++, .c,
.cpp, .inc, .gen, etc.), and a powerful tool should follow "#include"s
without paying attention to extensions. On the other hand, simpler tools
work with filename masks, and for them it is much easier to add a new
extension than to set an exclude condition (the latter may not be supported
at all). At least that is the case with the tools that I use.





Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Serhiy Storchaka

18.01.14 11:06, Chris Angelico wrote:
> A point for the contrary side: In any editor or IDE with syntax
> highlighting, a .clinic.c file will be highlighted as C code, but it
> would take extra configuration to handle a .clinic file that way. But
> that's a relatively minor consideration (AIUI most people won't be
> looking at the .clinic files much, and for those who do, configure the
> editor appropriately).

Yes, this was Larry's main objection. And like you, I think this is a
minor consideration (for the same reasons).





Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Larry Hastings

On 01/18/2014 05:28 AM, Nick Coghlan wrote:
> However, if both Visual Studio and gdb can still find the symbols
> correctly, even with the ".clinic" extension, then I would consider
> that a point strongly in favour of Serhiy's suggestion.




No, that would be a lack of a point against Serhiy's suggestion.


/arry


Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Ethan Furman

On 01/18/2014 03:40 AM, Antoine Pitrou wrote:
> On Fri, 17 Jan 2014 08:49:21 -0800
> Ethan Furman  wrote:
>>
>> PEP: 461
>
> There are formatting issues in the HTML rendering, I think the ReST
> code needs a bit massaging:
> http://www.python.org/dev/peps/pep-0461/

I'm not seeing the problems (could be I don't have enough experience to spot
them).

>> .. note::
>>
>>     Because the str type does not have a __bytes__ method, attempts to
>>     directly use 'a string' as a bytes interpolation value will raise an
>>     exception.  To use 'string' values, they must be encoded or otherwise
>>     transformed into a bytes sequence::
>
> s/'string' values/unicode strings/


Fixed, thanks.
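
For context, a small sketch of what the quoted note means in practice, assuming an interpreter that implements PEP 461 (Python 3.5 or later). The variable names are illustrative only.

```python
# Assumes Python 3.5+, where bytes interpolation (PEP 461) is available.
name = "status"   # a str, which has no __bytes__ method

# Passing the str directly to %s in a bytes format is rejected...
error = None
try:
    b"%s: ok" % name
except TypeError as exc:
    error = exc

# ...so the caller has to choose an encoding explicitly.
header = b"%s: ok" % name.encode("ascii")

# Numeric codes such as %d work without any encoding step.
count = b"%d items" % 20
```

The failure is a TypeError based on the argument's type, not on which characters the string happens to contain.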

--
~Ethan~


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Stefan Krah
Antoine Pitrou  wrote:
> > Objects/memoryview.api.h
> > 
> > 
> > That is more neutral and describes what the file contains.
> 
> Disagreed. It's not an API in the sense that it's something that's
> designed to be called directly by third-party code.

Right.  Objects/memoryview.ac.h perhaps?  I sort of dislike reading full words
in filename extensions.


Stefan Krah




Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Antoine Pitrou
On Sat, 18 Jan 2014 18:18:49 +0100
Stefan Krah  wrote:
> Antoine Pitrou  wrote:
> > Objects/memoryview.clinic.h should be fine.
> 
> Last attempt:
> 
> Objects/memoryview.api.h
> 
> 
> That is more neutral and describes what the file contains.

Disagreed. It's not an API in the sense that it's something that's
designed to be called directly by third-party code.

Regards

Antoine.




Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Stefan Krah
Antoine Pitrou  wrote:
> Objects/memoryview.clinic.h should be fine.

Last attempt:

Objects/memoryview.api.h


That is more neutral and describes what the file contains.  IOW, it's easier to
ignore the name (which is good in this case).



Stefan Krah





Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Antoine Pitrou
On Sat, 18 Jan 2014 18:06:06 +0100
Stefan Krah  wrote:
> > I'd rather see memoryview.h than memoryview.clinic.c.
> 
> Or, if this collides with Include/*, one of the following:
> 
>    memoryview_func.h  // public functions
>
>    memoryview_if.h  // public interface

Objects/memoryview.clinic.h should be fine.

Regards

Antoine.




Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Stefan Krah
> I'd rather see memoryview.h than memoryview.clinic.c.

Or, if this collides with Include/*, one of the following:

   memoryview_func.h  // public functions

   memoryview_if.h  // public interface


Stefan Krah





Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Eric V. Smith
On 1/18/2014 11:24 AM, Stefan Krah wrote:
> Serhiy Storchaka  wrote:
>> Generated files currently have the suffix .clinic.c. I think it would be
>> better if they ended with a special suffix (.c.clinic or even just
>> .clinic).
> 
> Can the output not go into a header file with static inline functions?
> 
> I'd rather see memoryview.h than memoryview.clinic.c.

Same here. There's some history for this, but not for generated code. In
Objects/stringlib, all of the files are .h files. They're really C code
designed to be included by other .c files.

Eric.




Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Stefan Krah
Serhiy Storchaka  wrote:
> Generated files currently have the suffix .clinic.c. I think it would be
> better if they ended with a special suffix (.c.clinic or even just
> .clinic).

Can the output not go into a header file with static inline functions?

I'd rather see memoryview.h than memoryview.clinic.c.


Stefan Krah




Re: [Python-Dev] PEP 461 updates

2014-01-18 Thread Oscar Benjamin
On 17 January 2014 21:37, Chris Barker  wrote:
>
> For the record, we've got a pretty good thread (not this good, though!) over
> on the numpy list about how to untangle the mess that has resulted from
> porting text-file-parsing code to py3 (and the underlying issue with the 'S'
> data type in numpy...)
>
> One note from the github issue:
> """
>  The use of asbytes originates only from the fact that b'%d' % (20,) does
> not work.
> """
>
> So yeah PEP 461! (even if too late for numpy...)

The discussion about numpy.loadtxt and the 'S' dtype is not relevant
to PEP 461.  PEP 461 is about facilitating handling ascii/binary
protocols and file formats. The loadtxt function is for reading text
files. Reading text files is already handled very well in Python 3.
The only caveat is that you need to specify the encoding when you open
the file.

The loadtxt function doesn't specify the encoding when it opens the
file so on Python 3 it gets the system default encoding when reading
from the file. Since the 'S' dtype is for an array of bytes the
loadtxt function has to encode the unicode strings before storing them
in the array. The function has no idea what encoding the user wants so
it just uses latin-1 leading to mojibake if the file content and
encoding are not compatible with latin-1 e.g.: utf-8.
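
The effect described above can be reproduced without numpy. The following is only a sketch of the general mechanism (decode with one codec, re-encode with latin-1); it does not claim to mirror loadtxt's actual code, and the sample strings are made up.

```python
# Bytes as they sit in a UTF-8 encoded file.
file_bytes = "caf\u00e9".encode("utf-8")

# Decoded correctly, then re-encoded as latin-1 for a bytes array:
# the stored bytes no longer match what was in the file.
text = file_bytes.decode("utf-8")
stored = text.encode("latin-1")

# Decoding UTF-8 bytes as latin-1 instead produces classic mojibake.
mojibake = file_bytes.decode("latin-1")

# And text outside latin-1 cannot be stored at all.
encode_error = None
try:
    "\u0440".encode("latin-1")
except UnicodeEncodeError as exc:
    encode_error = exc
```

Opening the file with an explicit encoding, for example open(path, encoding="utf-8"), removes the guesswork on the reading side.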

The loadtxt function is a classic example of how *not* to do text and
whoever made it that way probably didn't understand unicode and the
Python 3 text model. If they did understand what they were doing then
they knew that they were implementing a dirty hack.

If you want to draw a relevant lesson from that thread in this one
then the lesson argues against PEP 461: adding back the bytes
formatting methods helps people who refuse to understand text
processing and continue implementing dirty hacks instead of doing it
properly.


Oscar


Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Nick Coghlan
On 18 Jan 2014 11:52, "Ethan Furman"  wrote:
>
> On 01/17/2014 05:27 PM, Steven D'Aprano wrote:
>>
>> On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote:
>>>
>>>
>>> Overriding Principles
>>> =====================
>>>
>>> In order to avoid the problems of auto-conversion and Unicode
>>> exceptions that could plague Py2 code, all object checking will
>>> be done by duck-typing, not by values contained in a Unicode
>>>  representation [3]_.
>>
>>
>> I don't understand this paragraph. What does "values contained in a
>> Unicode representation" mean?
>
>
> Yeah, that is clunky.  I'm trying to convey the idea that we don't want
> errors based on content, i.e. which characters happen to be in a str.
>
>
>
>> [...]
>>>
>>> %s is restricted in what it will accept::
>>>
>>>- input type supports Py_buffer?
>>>  use it to collect the necessary bytes
>>
>>
>> Can you give some examples of what types support Py_buffer? Presumably
>> bytes. Anything else?
>
>
> Anybody?  Otherwise I'll go spelunking in the code.

bytes, bytearray, memoryview, ctypes arrays, array.array, numpy.ndarray

It may actually be clearer to express this in terms of memoryview for the
benefit of those that aren't familiar with the C API, as that is the
closest equivalent Python level API. While there is an open issue regarding
the C only nature of the buffer export API, nobody has volunteered to put
together a PEP and implementation for a Python level follow up to the C
level PEP 3118. The problem is that the original use cases involve C
extensions anyway, so the relevant experts don't have any personal need for
a Python level buffer exporter interface. Instead, it's in the "should be
done for completeness, and would make some of our testing easier, but
doesn't have anyone clamouring for it" bucket.
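
Expressed in terms of memoryview, a Python-level check for "supports Py_buffer" could look roughly like the sketch below. The helper name supports_buffer is an assumption for illustration; it is not part of the PEP or of the standard library.

```python
import array


def supports_buffer(obj):
    """Return True if obj exports the buffer protocol (PEP 3118)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


results = {
    "bytes": supports_buffer(b"abc"),
    "bytearray": supports_buffer(bytearray(b"abc")),
    "array.array": supports_buffer(array.array("b", [1, 2, 3])),
    "str": supports_buffer("abc"),
    "int": supports_buffer(3),
}
```

Creating the memoryview does not copy the underlying data, which is the practical difference from going through __bytes__.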

>
>
>
>>>- input type is something else?
>>>  use its __bytes__ method; if there isn't one, raise a TypeError
>>
>>
>> I think you should explicitly state that this is a new special method,
>> and state which built-in types will grow a __bytes__ method (if any).
>
>
> It's not new.  I know bytes, str, and numbers /do not/ have __bytes__.

Right, it is already used by bytes to convert arbitrary objects to a binary
representation. The difference with Py_buffer/memoryview is that they
provide access to the raw data without necessarily copying anything.

str and numbers don't implement it as there's no obvious default
interpretation (the b'\x00' * n interpretation of integers is part of the
bytes constructor and now a decision we mostly regret - it should have been
a keyword argument or a separate class method)
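
A short sketch of the behaviours mentioned here, using a made-up Packet class as an example of a type that does have an obvious binary form:

```python
class Packet:
    """Hypothetical type with an unambiguous binary representation."""

    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b"PKT:" + self.payload


# bytes() uses __bytes__ when the object defines it.
packet_bytes = bytes(Packet(b"hi"))

# An int is interpreted as a length, giving that many zero bytes.
zeros = bytes(3)

# A str has no default interpretation and needs an explicit encoding.
str_error = None
try:
    bytes("text")
except TypeError as exc:
    str_error = exc
```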

>
>
>
>>> Unsupported codes
>>> -----------------
>>>
>>> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are
>>> not supported.
>>
>>
>> +1 on not supporting b'%r' (i.e. I agree with the PEP).
>>
>> Why not support b'%a'? That seems to be a strange thing to prohibit.
>
>
> I'll admit to being somewhat on the fence about %a.
>
> It seems there are two possibilities with %a:
>
>   1) have it be ascii(repr(obj))
>
>   2) have it be str(obj).encode('ascii', 'strict')

This gets very close to crossing the line into implicit encoding of text
again. Binary interpolation is being added back for the specific use case
of working with ASCII compatible segments in binary formats, and it's at
best arguable that supporting %a will help with that use case.

However, without it, there may be a greater temptation to inappropriately
define __bytes__ just to support binary interpolation, rather than because
a type truly has an appropriate translation directly to bytes.

By allowing %a, we avoid that temptation. This is also potentially useful
specifically in the case of binary logging formats and as a quick way to
request backslash escaping of non-ASCII characters in text.

Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I
think it will head off a fair bit of potential misuse of __bytes__.
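
To make the difference between the two candidate behaviours concrete, here is a sketch that models each as a plain function. The function names are invented for the comparison, and ascii(obj) is used for the first option on the assumption that "ascii(repr(obj))" was meant as "what the ascii() builtin does".

```python
def percent_a_via_ascii(obj):
    # Option 1: repr() with non-ASCII characters backslash-escaped.
    return ascii(obj).encode("ascii")


def percent_a_via_strict(obj):
    # Option 2: str() followed by a strict ASCII encode.
    return str(obj).encode("ascii", "strict")


number_1 = percent_a_via_ascii(42)
number_2 = percent_a_via_strict(42)

# Option 1 never fails on content; it escapes instead.
escaped = percent_a_via_ascii("\u00e8")

# Option 2 fails depending on which characters are present.
strict_error = None
try:
    percent_a_via_strict("\u00e8")
except UnicodeEncodeError as exc:
    strict_error = exc
```

Only the second option can raise depending on the data, which is the kind of content-based error the PEP is trying to avoid.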

Cheers,
Nick.


Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Nick Coghlan
On 18 Jan 2014 19:08, "Chris Angelico"  wrote:
>
> On Sat, Jan 18, 2014 at 8:02 PM, Serhiy Storchaka wrote:
> > 2. I don't use any IDE, but if you do, this can be important for you. If
> > the IDE shows a source tree, you are unlikely to want to see generated
> > *.clinic.c files in it. They would almost double the list of sources.
>
> A point for the contrary side: In any editor or IDE with syntax
> highlighting, a .clinic.c file will be highlighted as C code, but it
> would take extra configuration to handle a .clinic file that way. But
> that's a relatively minor consideration (AIUI most people won't be
> looking at the .clinic files much, and for those who do, configure the
> editor appropriately).

I can argue either side, but the biggest potential problem I see with
Serhiy's suggestion is the likelihood of breaking automatic cross
referencing of symbols in most IDEs, as well as causing possible issues for
interactive debuggers. These are at least valid fragments of C files, even
if they're not designed to be compiled independently. However, if both
Visual Studio and gdb can still find the symbols correctly, even with the
".clinic" extension, then I would consider that a point strongly in favour
of Serhiy's suggestion.

Picking up on a side comment in Serhiy's post, based on my experience
reviewing a patch that included changes to clinic input blocks, I'd also
prefer if a parallel file was the default, and single file was opt in (or
not allowed at all). Getting changes reviewed and merged is one of the
biggest bottlenecks in our workflow, and the inline version of clinic is
much harder to review due to the intermingled diff of clinic input and
generated output.

Cheers,
Nick.

>
> ChrisA


Re: [Python-Dev] PEP 461 Final?

2014-01-18 Thread Antoine Pitrou
On Fri, 17 Jan 2014 08:49:21 -0800
Ethan Furman  wrote:
> 
> PEP: 461

There are formatting issues in the HTML rendering, I think the ReST
code needs a bit massaging:
http://www.python.org/dev/peps/pep-0461/

> .. note::
> 
> Because the str type does not have a __bytes__ method, attempts to
> directly use 'a string' as a bytes interpolation value will raise an
> exception.  To use 'string' values, they must be encoded or otherwise
> transformed into a bytes sequence::

s/'string' values/unicode strings/

Regards

Antoine.




Re: [Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Chris Angelico
On Sat, Jan 18, 2014 at 8:02 PM, Serhiy Storchaka  wrote:
> 2. I don't use any IDE, but if you do, this can be important for you. If
> the IDE shows a source tree, you are unlikely to want to see generated
> *.clinic.c files in it. They would almost double the list of sources.

A point for the contrary side: In any editor or IDE with syntax
highlighting, a .clinic.c file will be highlighted as C code, but it
would take extra configuration to handle a .clinic file that way. But
that's a relatively minor consideration (AIUI most people won't be
looking at the .clinic files much, and for those who do, configure the
editor appropriately).

ChrisA


[Python-Dev] .clinic.c vs .c.clinic

2014-01-18 Thread Serhiy Storchaka
After the latest Argument Clinic updates my patches began to look much better.
Thank you, Larry. Argument Clinic now supports output to a side file (this is
not the default; you have to specify "output preset file" at the start of the
first clinic declaration).

I already wrote about this here, but it seems my post got lost in the middle of
one of the numerous threads and was not noticed. So I am repeating it as a
separate thread.

Generated files currently have the suffix .clinic.c. I think it would be better
if they ended with a special suffix (.c.clinic or even just .clinic).

My reasons:

1. I very, very often use global search in the sources. It's my way of
navigating and my way of investigating. I don't want to get false results from
generated files. And it is much easier to specify a mask like '*.[ch]' or
'*.c,*.h' (depending on the tool) than to specify a mask and a negative mask.
The latter is not even always possible: I can write a cumbersome expression for
the find command, but Midnight Commander doesn't support negative masks at all
(and perhaps your favorite IDE doesn't support them either).

2. I don't use any IDE, but if you do, this can be important for you. If the
IDE shows a source tree, you are unlikely to want to see generated *.clinic.c
files in it. They would almost double the list of sources.

3. Pathname expansion works better with unique endings. You can open all
Modules/_io/*.c files, but you are unlikely to be interested in the *.clinic.c
files, which are matched by the same pattern.

4. The .c suffix at the end lies. This is not a compilable C source file; it is
meant to be included in another C source file. This will confuse the casual
user and other tools, including Argument Clinic itself, which is why it inserts
the "preserve" directive at the start of the generated file. But other tools
have no such marker.

My attempt to convince Larry on IRC failed. He agreed to change his opinion 
only if other core developers persuade him. I ask you to help me convince 
Larry.
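
The mask argument in reasons 1 and 3 can be checked with a few lines of Python. This is only an illustration using fnmatch-style patterns; the file names are examples and the match helper is made up for the purpose.

```python
import fnmatch


def match(names, mask):
    """Return the names matching a shell-style mask (case-sensitive)."""
    return [n for n in names if fnmatch.fnmatchcase(n, mask)]


# The same directory under the current and the proposed naming scheme.
current = ["memoryview.c", "memoryview.h", "memoryview.clinic.c"]
proposed = ["memoryview.c", "memoryview.h", "memoryview.c.clinic"]

# With .clinic.c the generated file is caught by the ordinary source mask.
current_hits = match(current, "*.[ch]")

# With .c.clinic the same mask returns only hand-written files...
proposed_hits = match(proposed, "*.[ch]")

# ...and the generated files can still be selected on their own.
generated_hits = match(proposed, "*.clinic")
```

With the current naming, separating the two groups needs a second, negative pattern, which is the feature not every tool provides.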
