Re: [Python-ideas] Is there any idea about dictionary destructing?

2018-04-10 Thread Thautwarm Zhao
Your library seems to make it difficult to extract values from a nested
dictionary, and it is also awkward when the key is not an identifier.

Certainly we could have a library using GraphQL syntax to extract data from
a dict of any schema, but that's not my point.

I'm focused on the consistency of the language itself.

{key: value_pattern, **_} = {key: value, **_}

The reason this is important is that, when destructuring/constructing
for built-in data structures is not supported completely,
people might ask why "[a, *b] = c" is ok but "{"a": a, **b} = c" is not.
If only multiple assignment were supported, why would "(a, (b, c)) = d"
be ok? It's exactly destructuring!
And once people accept that destructuring data structures is ok, they
might be curious about where the boundary of this feature lies.
All you can tell them is: "oh, it's just special, it just works for
iterables; even though you can partially destructure a dict in argument
passing, you cannot use it in a simple assignment statement!"


>>> def f(a, b, **c):
...     print(a, b, c)
...
>>> f(a=1, **{'b': 2})
1 2 {}

>>> {'a': a, 'b': b, **c} = {'a': 1, **{'b': 2}}
SyntaxError: can't assign to literal

The above example could be somewhat confusing, I think. If we didn't have
so many convenient helpers for function calls,
it might even be better in terms of consistency...
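In the meantime, the flat case being asked for can be approximated with a
small helper. (This is a hypothetical sketch by the editor, not anything
proposed in the thread; the name `destructure` is made up.)

```python
def destructure(mapping, *keys):
    """Return the values for *keys*, so callers can unpack them by name."""
    return tuple(mapping[key] for key in keys)

point = {'x': 1, 'y': 2, 'z': 3}
x, y = destructure(point, 'x', 'y')
```

This covers only the shallow case; nested patterns such as
{key: value_pattern, **_} would still need real syntax support.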


2018-04-10 9:23 GMT+08:00 Joao S. O. Bueno :

> On 9 April 2018 at 22:10, Brett Cannon  wrote:
> >
> >
> > On Mon, 9 Apr 2018 at 05:18 Joao S. O. Bueno 
> wrote:
> >>
>
> >> we could even call this approach a name such as "function call".
> >
> >
> > The harsh sarcasm is not really called for.
>
> Indeed - on rereading, I have to agree on that.
>
> I do apologize for the sarcasm. Really, I not only stand corrected:
> I recognize I was incorrect to start with.
>
> But my argument that this feature is needless language bloat stands.
>
> On the other hand, as for getting variable names out of _shallow_  mappings,
> I've built that feature in a package I authored, using a context manager
> to abuse the import mechanism -
>
> In [96]: from extradict import MapGetter
>
> In [97]: data = {"A": None, "B": 10}
>
> In [98]: with MapGetter(data):
>...: from data import A, B
>...:
>
> In [99]: A, B
> Out[99]: (None, 10)
>
>
> That is on Pypi and can be used by anyone right now.
>
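The real extradict.MapGetter presumably works differently under the hood;
the general idea, though, can be sketched by temporarily replacing
builtins.__import__ so that `from data import A, B` is served from a dict.
(The name `mapgetter` and every detail below are the editor's assumptions,
not extradict's actual implementation.)

```python
import builtins
from contextlib import contextmanager
from types import ModuleType

@contextmanager
def mapgetter(name, mapping):
    """Sketch: let `from <name> import key` pull values from `mapping`."""
    real_import = builtins.__import__

    def fake_import(mod, globals=None, locals=None, fromlist=(), level=0):
        if mod == name:
            # Serve a throwaway module whose namespace is the dict.
            module = ModuleType(mod)
            module.__dict__.update(mapping)
            return module
        return real_import(mod, globals, locals, fromlist, level)

    builtins.__import__ = fake_import
    try:
        yield
    finally:
        builtins.__import__ = real_import  # always restore the real hook

data = {"A": None, "B": 10}
with mapgetter("data", data):
    from data import A, B   # actually reads from the dict above
```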
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Michel Desmoulin


On 10/04/2018 at 00:54, Peter O'Connor wrote:
> Kyle, you sounded so reasonable when you were trashing
> itertools.accumulate (which I now agree is horrible).  But then you go
> and support Serhiy's madness:  "smooth_signal = [average for average in
> [0] for x in signal for average in [(1-decay)*average + decay*x]]" which
> I agree is clever, but reads more like a riddle than readable code.  
> 
> Anyway, I continue to stand by:
> 
>     (y:= f(y, x) for x in iter_x from y=initial_y)
> 
> And, if that's not offensive enough, to its extension:
> 
>     (z, y := f(z, x) -> y for x in iter_x from z=initial_z)
> 
> Which carries state "z" forward but only yields "y" at each iteration. 
> (see proposal: https://github.com/petered/peps/blob/master/pep-.rst
> )
> 
> Why am I so obsessed?  Because it will allow you to conveniently replace
> classes with more clean, concise, functional code.  People who thought
> they never needed such a construct may suddenly start finding it
> indispensable once they get used to it.  
> 
> How many times have you written something of the form?:
> 
>     class StatefulThing(object):
>     
>         def __init__(self, initial_state, param_1, param_2):
>             self._param_1= param_1 
>             self._param_2 = param_2 
>             self._state = initial_state
>     
>         def update_and_get_output(self, new_observation):  # (or just
> __call__)
>             self._state = do_some_state_update(self._state,
> new_observation, self._param_1) 
>             output = transform_state_to_output(self._state, self._param_2)
>             return output
>     
>     processor = StatefulThing(initial_state = initial_state, param_1 =
> 1, param_2 = 4)
>     processed_things = [processor.update_and_get_output(x) for x in x_gen]
>     
> I've done this many times.  Video encoding, robot controllers, neural
> networks, any iterative machine learning algorithm, and probably lots of
> things I don't know about - they all tend to have this general form.  
> 

Personally, I don't have to do that very often. But let's say, for the
sake of the argument, that there is a class of problem a part of the Python
community often solves with this pattern. After all, Python is a
versatile language with a very large and diverse user base.

First, why would a class be a bad thing? It's clear, easy to
understand, debug and extend. Besides, do_some_state_update and
transform_state_to_output may very well be methods.

Second, if you really don't want a class, use a coroutine; that's
exactly what they are for:

def stateful_thing(state, param_1, param_2, output=None):
    while True:
        new_observation = yield output
        state = do_some_state_update(state, new_observation, param_1)
        output = transform_state_to_output(state, param_2)

processor = stateful_thing(1, 1, 4)
next(processor)
processed_things = [processor.send(x) for x in x_gen]
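The priming step (the bare next(processor)) is easy to forget; a decorator
in the spirit of the @coroutine helper mentioned elsewhere in the thread
can hide it. A minimal sketch (running_total is a made-up stand-in, since
do_some_state_update is not defined here):

```python
from functools import wraps

def coroutine(func):
    """Prime a generator-based coroutine so .send() works immediately."""
    @wraps(func)
    def primed(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield
        return gen
    return primed

@coroutine
def running_total(total=0):
    output = None
    while True:
        x = yield output
        total += x
        output = total

processor = running_total()
results = [processor.send(x) for x in [1, 2, 3]]
```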

If you have that much of a complex workflow, you really should not make
that a one-liner.

And before trying to ask for a new syntax in the language, try to solve
the problem with the existing tools.

I know, I get the frustration.

I've been trying to get slicing on generators and inline try/except on
this mailing list for years, and I've been told no again and again. It's
hard. But it's also why Python stayed sane for decades.


Re: [Python-ideas] Is there any idea about dictionary destructing?

2018-04-10 Thread Steven D'Aprano
On Tue, Apr 10, 2018 at 03:29:08PM +0800, Thautwarm Zhao wrote:

> I'm focused on the consistency of the language itself.

Consistency is good, but it is not the only factor to consider. We must 
guard against *foolish* consistency: adding features just for the sake 
of matching some other, often barely related, feature. Each feature must 
justify itself, and consistency with something else is merely one 
possible attempt at justification.


> {key: value_pattern, **_} = {key: value, **_}

If I saw that, I would have no idea what it could even possibly do. 
Let's pick the simplest concrete example I can think of:

{'A': 1, **{}} = {'A': 0, **{}}

I cannot interpret what that should do. Is it some sort of 
pattern-matching? An update? What is the result? It is obviously some 
sort of binding operation, an assignment, but an assignment to what?

Sequence binding and unpacking was obvious the first time I saw it. I 
had no problem guessing what:

a, b, c = 1, 2, 3

meant, and once I had seen that, it wasn't hard to guess what

a, b, c = *sequence

meant. From there it is easy to predict extended unpacking. But I can't 
say the same for this.

I can almost see the point of:

a, b, c, = **{'a': 1, 'b': 2, 'c': 3}

but I'm having trouble thinking of a situation where I would actually 
use it. But your syntax above just confuses me.
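For what it's worth, the near-miss case above is already spellable with
operator.itemgetter. (A sketch by the editor, not part of any proposal in
the thread.)

```python
from operator import itemgetter

d = {'a': 1, 'b': 2, 'c': 3}
# itemgetter with several keys returns a tuple, which unpacks by position.
a, b, c = itemgetter('a', 'b', 'c')(d)
```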


> The reason why it's important is that, when destructing/constructing for
> built-in data structures are not supported completely,
> people might ask why "[a, *b] = c" is ok but "{"a": a, **b} = c" not.

People can ask all sorts of questions. I've seen people ask why Python 
doesn't support line numbers and GOTO. We're allowed to answer "Because 
it is a bad idea", or even "Because we don't think it is good enough to 
justify the cost".


> If only multiple assignment is supported, why "(a, (b, c)) = d" could be
> ok? It's exactly destructing!

That syntax is supported. I don't understand your point here.


>  >> {'a': a, 'b': b, **c} = {'a': 1, **{'b': 2}}
>  SyntaxError: can't assign to literal
> 
> Above example could be confusing in some degree, I think.

I have no idea what you expect it to do. Even something simpler:

{'a': a} = {'a': 2}

leaves me in the dark.




-- 
Steve


Re: [Python-ideas] Is there any idea about dictionary destructing?

2018-04-10 Thread Jacco van Dorp
I must say I can't really see the point either. If you write something like:

>  {'a': a, 'b': b, **c} = {'a': 1, **{'b': 2}}

Do you basically mean:

c =  {'a': 1, **{'b': 2}}
a = c.pop("a")
b = c.pop("b")  # ?

That's the only thing I could think of.

I think most of these problems could be solved with pop and the
occasional list comprehension like this:

a, b, c = [{'a':1,'b':2,'c':3}.pop(key) for key in ('a', 'b', 'c')]

or for your example:

c = {'a': 1, **{'b': 2}}  # I suppose this one would generally
                          # be dynamic, but I need a name here.
a, b = [c.pop(key) for key in ('a', 'b')]

would extract all the keys you need, with less writing, and it has the
advantage that you don't need a hardcoded dict structure if you expand
it to nested dicts. And if you don't actually want to destructure
(tuples and lists aren't destroyed either), just use __getitem__ access
instead of pop.
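The nested-dict case can be handled the same way with a small helper.
(The name `deep_get` and the config example are the editor's hypothetical
illustration, not code from the thread.)

```python
from functools import reduce

def deep_get(mapping, path):
    """Follow a sequence of keys down into nested dicts."""
    return reduce(lambda m, key: m[key], path, mapping)

cfg = {'db': {'host': 'localhost', 'port': 5432}}
host = deep_get(cfg, ('db', 'host'))
port = deep_get(cfg, ('db', 'port'))
```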



Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

2018-04-10 Thread Guido van Rossum
On Mon, Apr 9, 2018 at 11:35 PM, Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> Tim Peters writes:
>
>  > "Sum reduction" and "running-sum accumulation" are primitives in
>  > many peoples' brains.
>
> I wonder what Kahneman would say about that.  He goes to some length
> to explain that people are quite good (as human abilities go) at
> perceiving averages over sets but terrible at summing the same.  Maybe
> they substitute the abstraction of summation for the ability to
> perform the operation?
>

[OT] How is that human ability tested? I am a visual learner and I would
propose that if you have a set of numbers, you can graph it in different
ways to make it easier to perceive one or the other (or maybe both):

- to emphasize the average, draw a line graph -- in my mind I draw a line
through the average (getting the trend for free)
- to emphasize the sum, draw a histogram -- in my mind I add up the sizes
of the bars

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-ideas] Is there any idea about dictionary destructing?

2018-04-10 Thread Guido van Rossum
Here's one argument why sequence unpacking is more important than dict
unpacking.

Without sequence unpacking, you have a long sequence, and to get at a
specific item you need to use indexing, where you often end up having to
remember the indices for each type of information. Say you have points of
the form (x, y, z, t); to get at the t coordinate you'd have to write p[3].
With sequence unpacking you can write

  x, y, z, t = p

and then you can use the individual variables in the subsequent code.

However if your point had the form {'x': x, 'y': y, 'z': z, 't': t}, you
could just write p['t']. This is much more mnemonic than p[3].

All the rest follows -- after a while extended forms of iterable unpacking
start making sense. But for dicts the use case is just much less common.

(If you're doing a lot of JSON you might have a different view on that. You
should probably use some kind of schema-guided parser though.)
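The contrast can be restated in a couple of lines (an editor's
illustration of the argument above, not new material):

```python
# A point as a plain tuple: position is the only handle on each field.
p = (1.0, 2.0, 3.0, 0.5)
x, y, z, t = p            # unpacking turns positions into names

# A point as a dict: the key is already a mnemonic name, so there is
# less pressure for an unpacking syntax.
q = {'x': 1.0, 'y': 2.0, 'z': 3.0, 't': 0.5}
```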

-- 
--Guido van Rossum (python.org/~guido)


[Python-ideas] Add more information in the header of pyc files

2018-04-10 Thread Serhiy Storchaka
The format of the header of pyc files was stable for a long time and 
changed only a few times. It was first changed in 3.3: the size of the 
corresponding source mod 2**32 was added. [1]  It was changed a second 
time in 3.7: a 32-bit flags field and support for hash-based pyc files 
(PEP 552) were added. [2] [3]


I think it is worth making more changes.

1. A more stable file signature. Currently the magic number changes in 
every feature release. Only the third and the fourth bytes are stable 
(b'\r\n'); the first two bytes change unpredictably. The 'py' launcher 
and third-party software like the 'file' command have to maintain the 
list of magic numbers for all existing Python releases, and they can't 
detect pyc files for future versions. There is also a chance that the 
pyc file signature will accidentally match the signature of another 
file type. It would be better if the first 4 bytes of pyc files were 
the same for all Python versions (or at least for all Python versions 
with the same major number).


2. Include the Python version. Currently the 'py' launcher needs to 
maintain a table that maps magic numbers to Python versions. It can 
recognize only Python versions released before the launcher was built. 
If the first two numbers of the Python version were included in the 
header, it would not need such a table.


3. A number for compatible subversions. Currently the interpreter 
supports only a single magic number. If an updated version of the 
compiler produces more optimal or more correct but compatible bytecode 
(like ), there is no way to say that the new bytecode is preferable 
while the old bytecode can still be used. Changing the magic number 
invalidates all pyc files compiled by the old compiler (see [4] for an 
example of problems caused by this). The header could contain two magic 
numbers: the major magic number would be bumped for incompatible 
changes; the minor magic number would be reset to 0 when the major 
magic number is bumped, and bumped when the compiler starts producing 
different but compatible bytecode. If the import system reads a pyc 
file with a minor magic number equal to or greater than the current 
one, it just uses the pyc file. If it reads a pyc file with a minor 
magic number less than the current one, it can regenerate the pyc file 
if it is writable. And the compileall module should regenerate all pyc 
files with minor magic numbers less than the current one.


[1] https://bugs.python.org/issue13645
[2] https://bugs.python.org/issue31650
[3] http://www.python.org/dev/peps/pep-0552/
[4] https://bugs.python.org/issue27286
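For reference, the header being discussed (the PEP 552 layout used since
3.7) can be parsed in a few lines. This is an editor's sketch; the
`read_pyc_header` helper is made up, and it assumes the 16-byte 3.7+
layout described above.

```python
import importlib.util
import py_compile
import struct
import tempfile

def read_pyc_header(path):
    """Sketch: parse the 16-byte pyc header introduced by PEP 552 (3.7+)."""
    with open(path, 'rb') as f:
        magic = f.read(4)                        # only bytes 2-3 (b'\r\n') are stable
        flags, = struct.unpack('<I', f.read(4))  # bit 0: hash-based pyc
        tail = f.read(8)
    if flags & 1:
        return magic, flags, tail                # 8-byte hash of the source
    mtime, size = struct.unpack('<II', tail)     # source size is mod 2**32
    return magic, flags, (mtime, size)

# Compile a throwaway module and inspect its header.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write('x = 1\n')
pyc_path = py_compile.compile(f.name, cfile=f.name + 'c')
magic, flags, tail = read_pyc_header(pyc_path)
```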



Re: [Python-ideas] Add more information in the header of pyc files

2018-04-10 Thread Antoine Pitrou
On Tue, 10 Apr 2018 18:49:36 +0300
Serhiy Storchaka 
wrote:
> 
> 1. More stable file signature. Currently the magic number is changed in 
> every feature release. Only the third and the forth bytes are stable 
> (b'\r\n'), the first bytes are changed non-predicable. The 'py' launcher 
> and third-party software like the 'file' command should support the list 
> of magic numbers for all existing Python releases, and they can't detect 
> pyc file for future versions. There is also a chance the pyc file 
> signature will match the signature of other file type by accident. It 
> would be better if the first 4 bytes of pyc files be same for all Python 
> versions (or at least for all Python versions with the same major number).

+1.

> 2. Include the Python version. Currently the 'py' launcher needs to 
> support the table that maps magic numbers to Python version. It can 
> recognize only Python versions released before building the launcher. If 
> the two major numbers of Python version be included in the version, it 
> would not need such table.

+1.

> 3. The number of compatible subversion. Currently the interpreter 
> supports only a single magic number. If the updated version of the 
> compiler produces more optimal or more correct but compatible bytecode 
> (like ), there is no way to say that the new bytecode is preferable, but 
> the old bytecode can be used too. Changing the magic number causes 
> invalidating all pyc files compiled by the old compiler (see [4] for the 
> example of problems caused by this). The header could contain two magic 
> numbers: the major magic number should be bumped for incompatible 
> changes, the minor magic number should be reset to 0 when the major 
> magic number is bumped, and should be bumped when the compiler become 
> producing different but compatible bytecode.

-1.  This is a risky move (and costly, in maintenance terms).  It's easy
to overlook subtle differences that may translate into
incompatibilities in some production uses.  The rule « one Python
feature release == one bytecode version » is easy to remember and
understand, and is generally very well accepted.

Regards

Antoine.




[Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Serhiy Storchaka
Currently pyc files contain data that is useful mostly for development 
and is not needed in most normal cases in a stable program. There is 
even an option that allows excluding part of this information from pyc 
files. This is expected to save memory, startup time, and disk space 
(or the time of loading from a network). I propose to move this data 
from pyc files into a separate file or files; pyc files would then 
contain only references to those external files. If the corresponding 
external file is absent, or a specific option suppresses them, the 
references are replaced with None or NULL at import time; otherwise 
they are loaded from the external files.


1. Docstrings. They are needed mainly for developing.

2. Line numbers (lnotab). They are helpful for formatting tracebacks, 
for tracing, and for debugging with the debugger. Sources are helpful 
in such cases too. If the program doesn't contain errors ;-) and is 
shipped without sources, they could be removed.


3. Annotations. They are used mainly by third party tools that 
statically analyze sources. They are rarely used at runtime.


Docstrings will be read from the corresponding docstring file unless -OO 
is supplied. This will also allow localizing docstrings: depending on 
locale or other settings, a different docstring file can be used.


New options can be added for suppressing line numbers and annotations.
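For reference, the three kinds of data live in different places on
today's code and function objects. This is a quick CPython-specific
illustration by the editor, not part of the proposal; `sample` is a
made-up function.

```python
def sample(x: int) -> int:
    "A docstring that python -OO would drop."
    return x + 1

code = sample.__code__
# 1. The docstring is stored among the code object's constants
#    (and surfaces as sample.__doc__); -OO replaces it with None.
assert sample.__doc__ in code.co_consts
# 2. Line-number info is encoded compactly on the code object
#    (co_lnotab historically, co_linetable in newer CPythons).
assert code.co_firstlineno > 0
# 3. Annotations live on the function object, not in the bytecode.
assert sample.__annotations__ == {'x': int, 'return': int}
```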



Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Peter O'Connor
>
> First, why a class would be a bad thing ? It's clear, easy to
> understand, debug and extend.


- Lots of redundant-looking "frameworky" lines of code: "self._param_1 =
param_1"
- Potential for opaque state changes: the caller doesn't know whether
"y = my_object.do_something(x)" has any side effect, whereas with ("y,
new_state = do_something(state, x)" / "y = do_something(state, x)") it's
clear that there (is / is not).
- Makes more assumptions about usage (should I add "param_1" as an arg
to "StatefulThing.__init__" or to "StatefulThing.update_and_get_output"?)


> And before trying to ask for a new syntax in the language, try to solve
> the problem with the existing tools.


Oh I have, and of course there are ways but I find them all clunkier than
needed.  I added your coroutine to the freak show:
https://github.com/petered/peters_example_code/blob/master/peters_example_code/ways_to_skin_a_cat.py#L106


> processor = stateful_thing(1, 1, 4)
> next(processor)
> processed_things = [processor.send(x) for x in x_gen]


I *almost* like the coroutine thing but find it unusable because the
peculiarity of having to initialize the generator before you use it (you
do it with next(processor)) is pretty much guaranteed to lead to errors
when people forget to do it.  Earlier in the thread Steven D'Aprano
showed how a @coroutine decorator can get around this:
https://github.com/petered/peters_example_code/blob/master/peters_example_code/ways_to_skin_a_cat.py#L63
- Still, the whole coroutine thing feels a bit magical, hacky and
"clever".  Also, the use of generator.send will probably confuse around
90% of programmers.

> If you have that much of a complex workflow, you really should not make
> that a one-liner.


It's not a complex workflow, it's a moving average.  It just seems complex
because we don't have a nice, compact way to describe it.
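For the record, the moving average in question can be written today with
itertools.accumulate and an explicit seed, one of the "existing tools"
approaches the thread keeps circling. (An editor's sketch; the name
`smooth` is made up.)

```python
from itertools import accumulate, chain

def smooth(signal, decay, initial=0.0):
    """Exponential moving average, seeded with `initial`."""
    averages = accumulate(chain([initial], signal),
                          lambda avg, x: (1 - decay) * avg + decay * x)
    next(averages)          # drop the seed value itself
    return list(averages)

smoothed = smooth([1.0, 1.0, 1.0], decay=0.5)
```

The chain/next dance is exactly the "special initial value" wart the
proposal complains about.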

I've been trying to get slicing on generators and inline try/except on
> this mailing list for years and I've been said no again and again. It's
> hard. But it's also why Python stayed sane for decades.


Hey I'll support your campaign if you support mine.





Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Antoine Pitrou
On Tue, 10 Apr 2018 19:14:58 +0300
Serhiy Storchaka 
wrote:
> Currently pyc files contain data that is useful mostly for developing 
> and is not needed in most normal cases in stable program. There is even 
> an option that allows to exclude a part of this information from pyc 
> files. It is expected that this saves memory, startup time, and disk 
> space (or the time of loading from network). I propose to move this data 
> from pyc files into separate file or files. pyc files should contain 
> only external references to external files. If the corresponding 
> external file is absent or specific option suppresses them, references 
> are replaced with None or NULL at import time, otherwise they are loaded 
> from external files.
> 
> 1. Docstrings. They are needed mainly for developing.

Indeed, it may be nice to find a solution to ship them separately.

> 2. Line numbers (lnotab). They are helpful for formatting tracebacks, 
> for tracing, and debugging with the debugger. Sources are helpful in 
> such cases too. If the program doesn't contain errors ;-) and is sipped 
> without sources, they could be removed.

What is the weight of lnotab arrays?  While docstrings can be large,
I'm somehow skeptical that removing lnotab arrays would bring a
significant improvement.  It would be nice to have more data about this.

> 3. Annotations. They are used mainly by third party tools that 
> statically analyze sources. They are rarely used at runtime.

Even less used than docstrings probably.

Regards

Antoine.




Re: [Python-ideas] Add more information in the header of pyc files

2018-04-10 Thread Serhiy Storchaka

10.04.18 18:58, Antoine Pitrou writes:

On Tue, 10 Apr 2018 18:49:36 +0300
Serhiy Storchaka 
wrote:

3. The number of compatible subversion. Currently the interpreter
supports only a single magic number. If the updated version of the
compiler produces more optimal or more correct but compatible bytecode
(like ), there is no way to say that the new bytecode is preferable, but
the old bytecode can be used too. Changing the magic number causes
invalidating all pyc files compiled by the old compiler (see [4] for the
example of problems caused by this). The header could contain two magic
numbers: the major magic number should be bumped for incompatible
changes, the minor magic number should be reset to 0 when the major
magic number is bumped, and should be bumped when the compiler become
producing different but compatible bytecode.


-1.  This is a risky move (and costly, in maintenance terms).  It's easy
to overlook subtle differencies that may translate into
incompatibilities in some production uses.  The rule « one Python
feature release == one bytecode version » is easy to remember and
understand, and is generally very well accepted.


A bugfix release can fix bugs in bytecode generation. See for example 
issue27286. [1]  The part of issue33041 backported to 3.7 and 3.6 is 
another example. [2]  There were other examples of compatible changes 
to the bytecode. Without bumping the magic number, these fixes simply 
have no effect if existing pyc files were generated by older compilers. 
But bumping the magic number in a bugfix release can lead to rebuilding 
every pyc file (even those unaffected by the fix) in distributions.


[1] https://bugs.python.org/issue27286
[2] https://bugs.python.org/issue33041



Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

2018-04-10 Thread Tim Peters
[Tim]
> Woo hoo!  Another coincidence.  I just happened to be playing with
> this problem today:
>
> You have a large list - xs - of N numbers.  It's necessary to compute slice 
> sums
>
> sum(xs[i:j])
>
> for a great many slices, 0 <= i <= j <= N.

Which brought to mind a different problem:  we have a list of numbers,
`xs`.  For each index position `i`, we want to know the largest sum
among all segments ending at xs[i], and the number of elements in a
maximal-sum slice ending at xs[i].

`accumulate()` is a natural way to approach this, for someone with a
functional language background.  You'll have to trust me on that ;-)

But there are some twists:

- The identity element for max() is minus infinity, which accumulate()
can't know.

- We want to generate a sequence of 2-tuples, even though xs is a
sequence of numbers.

- In this case, we do _not_ want to see the special initial value.

For example, given the input xs

[-10, 3, -1, 7, -9, -7, -9, 7, 4]

we want to generate

(-10, 1), (3, 1), (2, 2), (9, 3), (0, 4), (-7, 1), (-9, 1), (7, 1), (11, 2)

Note:  the largest sum across all non-empty slices is then
max(that_result)[0].  The code below could easily be changed to keep
track of that incrementally too, but this is already so different
from "plain old running sum" that I don't want more novelty than
necessary to make the points (a special initial value is needed, and
it's not necessarily insane to want to produce results of a different
type than the inputs).

The key part is the state updating function:

def update(state, x):
    prevmax, count = state
    newsum = prevmax + x
    if newsum > x:
        return newsum, count + 1
    else:
        return x, 1

That's all there is to it!

Then, e.g.,

>>> from itertools import accumulate, chain
>>> import math
>>> xs = [-10, 3, -1, 7, -9, -7, -9, 7, 4]
>>> initial = (-math.inf, 1)
>>> result = accumulate(chain([initial], xs), update)
>>> next(result) # discard unwanted value
(-inf, 1)
>>> list(result)
[(-10, 1), (3, 1), (2, 2), (9, 3), (0, 4), (-7, 1), (-9, 1), (7, 1), (11, 2)]
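As a sanity check (mine, not part of the original post), a brute-force scan
over all slices ending at each index reproduces the same 2-tuples,
preferring the shortest slice on ties just as update() does:

```python
xs = [-10, 3, -1, 7, -9, -7, -9, 7, 4]

def brute(xs):
    out = []
    for i in range(len(xs)):
        # best (sum, length) over all non-empty slices xs[j:i+1];
        # the -length key prefers the shortest slice among equal sums
        best = max(
            ((sum(xs[j:i + 1]), i + 1 - j) for j in range(i + 1)),
            key=lambda t: (t[0], -t[1]),
        )
        out.append(best)
    return out

print(brute(xs))
```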


Re: [Python-ideas] Add more information in the header of pyc files

2018-04-10 Thread Antoine Pitrou
On Tue, 10 Apr 2018 19:29:18 +0300, Serhiy Storchaka wrote:
> 
> A bugfix release can fix bugs in bytecode generation. See for example 
> issue27286. [1]  The part of issue33041 backported to 3.7 and 3.6 is 
> another example. [2]  There were other examples of compatibly changing the 
> bytecode. Without bumping the magic number these fixes have no 
> effect if the existing pyc files were generated by older compilers. But 
> bumping the magic number in a bugfix release can lead to rebuilding 
> every pyc file (even those unaffected by the fix) in distributions.

Sure, but I don't think rebuilding every pyc file is a significant
problem.  It's certainly less error-prone than cherry-picking which
files need rebuilding.
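Forcing a rebuild is also cheap to automate with the standard compileall
module; a minimal sketch (the temporary module here is illustrative):

```python
import compileall
import pathlib
import tempfile

# Force-recompile a tree so every pyc matches the running interpreter,
# regardless of source timestamps -- the "rebuild everything" option.
with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "mod.py").write_text("x = 1\n")
    ok = compileall.compile_dir(d, force=True, quiet=1)
    pycs = list(pathlib.Path(d).rglob("*.pyc"))
    print(bool(ok), len(pycs))  # success flag, number of pyc files written
```

The same thing is available from the command line as
`python -m compileall -f <dir>`.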

Regards

Antoine.




Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Steven D'Aprano
On Tue, Apr 10, 2018 at 12:18:27PM -0400, Peter O'Connor wrote:

[...]
> I added your coroutine to the freak show:

Peter, I realise that you're a fan of functional programming idioms, and 
I'm very sympathetic to that. I'm a fan of judicious use of FP too, and 
while I'm not keen on your specific syntax, I am interested in the 
general concept and would like it to have the best possible case made 
for it.

But even I find your use of dysphemisms like "freak show" for non-FP 
solutions quite off-putting. (I think this is the second time you've 
used the term.)

Python is not a functional programming language like Haskell, it is a 
multi-paradigm language with strong support for OO and procedural 
idioms. Notwithstanding the problems with OO idioms that you describe, 
many Python programmers find OO "better", simpler to understand, learn 
and maintain than FP. Or at least more familiar.

The rejection or approval of features into Python is not a popularity 
contest, ultimately it only requires one person (Guido) to either reject 
or approve a new feature. But popular opinion is not irrelevant either: 
like all benevolent dictators, Guido has a good sense of what's popular, 
and takes it into account in his considerations. If you put people 
off-side, you hurt your chances of having this feature approved.


[...]
> I *almost* like the coroutine thing but find it unusable because the
> peculiarity of having to initialize the generator when you use it (you do
> it with next(processor)) is pretty much guaranteed to lead to errors when
> people forget to do it.  Earlier in the thread Steven D'Aprano showed how a
> @coroutine decorator can get around this:

I agree that the (old-style, pre-async) coroutine idiom is little known, 
in part because of the awkwardness needed to make it work. Nevertheless, 
I think your argument about it leading to errors is overstated: if you 
forget to initialize the coroutine, you get a clear and obvious failure:

py> def co():
... x = (yield 1)
...
py> a = co()
py> a.send(99)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started generator
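For reference, the @coroutine decorator alluded to above needs only a few
lines; this sketch (the running_max example is invented for illustration)
primes the generator so callers can .send() immediately:

```python
import functools

def coroutine(func):
    """Prime a generator-based coroutine so it is ready for .send()."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield
        return gen
    return start

@coroutine
def running_max():
    best = None
    while True:
        x = yield best
        best = x if best is None else max(best, x)

m = running_max()
print(m.send(3), m.send(1), m.send(5))  # 3 3 5
```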



> - Still, the whole coroutine thing still feels a bit magical, hacky and
> "clever".  Also the use of generator.send will probably confuse around 90%
> of programmers.

In my experience, heavy use of FP idioms will probably confuse about the 
same percentage. Including me: I like FP in moderation, I wouldn't want 
to use a strict 100% functional language, and if someone even says the 
word "Monad" I break out in hives.



> If you have that much of a complex workflow, you really should not make
> > that a one-liner.
> 
> It's not a complex workflow, it's a moving average.  It just seems complex
> because we don't have a nice, compact way to describe it.

Indeed. But it seems to me that itertools.accumulate() with an initial 
value will probably solve that issue.

Besides... moving averages aren't that common that they *necessarily* 
need syntactic support. Wrapping the complexity in a function, then 
calling the function, may be an acceptable solution instead of putting 
the complexity directly into the language itself.

The Conservation Of Complexity Principle suggests that complexity cannot 
be created or destroyed, only moved around. If we reduce the complexity 
of the Python code needed to write a moving average, we invariably 
increase the complexity of the language, the interpreter, and the amount 
of syntax people need to learn in order to be productive with Python.


-- 
Steve


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 2:14 AM, Serhiy Storchaka  wrote:
> Currently pyc files contain data that is useful mostly for developing and is
> not needed in most normal cases in stable program. There is even an option
> that allows to exclude a part of this information from pyc files. It is
> expected that this saves memory, startup time, and disk space (or the time
> of loading from network). I propose to move this data from pyc files into
> separate file or files. pyc files should contain only external references to
> external files. If the corresponding external file is absent or specific
> option suppresses them, references are replaced with None or NULL at import
> time, otherwise they are loaded from external files.
>
> 1. Docstrings. They are needed mainly for developing.
>
> 2. Line numbers (lnotab). They are helpful for formatting tracebacks, for
> tracing, and debugging with the debugger. Sources are helpful in such cases
> too. If the program doesn't contain errors ;-) and is shipped without
> sources, they could be removed.
>
> 3. Annotations. They are used mainly by third party tools that statically
> analyze sources. They are rarely used at runtime.
>
> Docstrings will be read from the corresponding docstring file unless -OO is
> supplied. This will allow also to localize docstrings. Depending on locale
> or other settings different docstring file can be used.
>
> For suppressing line numbers and annotations new options can be added.

A deployed Python distribution generally has .pyc files for all of the
standard library. I don't think people want to lose the ability to
call help(), and unless I'm misunderstanding, that requires
docstrings. So this will mean twice as many files and twice as many
file-open calls to import from the standard library. What will be the
impact on startup time?

ChrisA


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Zachary Ware
On Tue, Apr 10, 2018 at 12:38 PM, Chris Angelico  wrote:
> A deployed Python distribution generally has .pyc files for all of the
> standard library. I don't think people want to lose the ability to
> call help(), and unless I'm misunderstanding, that requires
> docstrings. So this will mean twice as many files and twice as many
> file-open calls to import from the standard library. What will be the
> impact on startup time?

What about instead of separate files turning the single file into a
pseudo-zip file containing all of the proposed files, and provide a
simple tool for removing whatever parts you don't want?
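As a rough sketch of that pseudo-zip idea (the member names are invented,
not a proposed format), both the container and the stripping tool fall out
of the zipfile module:

```python
import io
import zipfile

# Hypothetical single-file container: bytecode, docstrings, and line-number
# data stored as named members of one archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("code", b"\x00bytecode...")
    z.writestr("doc", b"Module docstring")
    z.writestr("lnotab", b"\x01\x02")

# The stripping tool removes unwanted members by rewriting the archive.
out = io.BytesIO()
with zipfile.ZipFile(buf) as src, zipfile.ZipFile(out, "w") as dst:
    for name in src.namelist():
        if name != "doc":  # strip docstrings
            dst.writestr(name, src.read(name))

with zipfile.ZipFile(out) as z:
    print(z.namelist())  # ['code', 'lnotab']
```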

-- 
Zach


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Ethan Furman

On 04/10/2018 10:54 AM, Zachary Ware wrote:

On Tue, Apr 10, 2018 at 12:38 PM, Chris Angelico  wrote:

A deployed Python distribution generally has .pyc files for all of the
standard library. I don't think people want to lose the ability to
call help(), and unless I'm misunderstanding, that requires
docstrings. So this will mean twice as many files and twice as many
file-open calls to import from the standard library. What will be the
impact on startup time?


What about instead of separate files turning the single file into a
pseudo-zip file containing all of the proposed files, and provide a
simple tool for removing whatever parts you don't want?


-O and -OO already do some trimming; perhaps going that route instead of having 
multiple files would be better.

--
~Ethan~



Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Stephan Houben
There are libraries out there like this:

https://docopt.readthedocs.io/en/0.2.0/

which use docstrings for runtime info.

Today we already have -OO which allows us to create docstring-less bytecode
files in case we have, after careful consideration, established that it is
safe to do so.

I think the current way (-OO) to avoid docstring loading is the correct one.
It pushes the responsibility on whoever did the packaging to decide if -OO
is appropriate.

The ability to remove the docstrings after bytecode generation would be
kinda nice
(similar to Unix "strip" command)
but given how fast bytecode compilation is, frankly I don't think it is
very important.

Stephan

2018-04-10 19:54 GMT+02:00 Zachary Ware :

> On Tue, Apr 10, 2018 at 12:38 PM, Chris Angelico  wrote:
> > A deployed Python distribution generally has .pyc files for all of the
> > standard library. I don't think people want to lose the ability to
> > call help(), and unless I'm misunderstanding, that requires
> > docstrings. So this will mean twice as many files and twice as many
> > file-open calls to import from the standard library. What will be the
> > impact on startup time?
>
> What about instead of separate files turning the single file into a
> pseudo-zip file containing all of the proposed files, and provide a
> simple tool for removing whatever parts you don't want?
>
> --
> Zach


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Daniel Moisset
I'm not sure I understand the benefit of this, perhaps you can clarify.
What I see is two scenarios

Scenario A) External files are present

In this case, the data is loaded from the pyc and then from external file,
so there are no savings in memory, startup time, disk space, or network
load time, it's just the same disk information and runtime structure with a
different layout

Scenario B) External files are not present

In this case, you get runtime improvements exactly identical to not having
the data in the pyc which is roughly what you get with -OO.

The only new capability I see this adds is the localization benefit, is
that what this proposal is about?



On 10 April 2018 at 17:14, Serhiy Storchaka  wrote:

> Currently pyc files contain data that is useful mostly for developing and
> is not needed in most normal cases in stable program. There is even an
> option that allows to exclude a part of this information from pyc files. It
> is expected that this saves memory, startup time, and disk space (or the
> time of loading from network). I propose to move this data from pyc files
> into separate file or files. pyc files should contain only external
> references to external files. If the corresponding external file is absent
> or specific option suppresses them, references are replaced with None or
> NULL at import time, otherwise they are loaded from external files.
>
> 1. Docstrings. They are needed mainly for developing.
>
> 2. Line numbers (lnotab). They are helpful for formatting tracebacks, for
> tracing, and debugging with the debugger. Sources are helpful in such cases
> too. If the program doesn't contain errors ;-) and is shipped without
> sources, they could be removed.
>
> 3. Annotations. They are used mainly by third party tools that statically
> analyze sources. They are rarely used at runtime.
>
> Docstrings will be read from the corresponding docstring file unless -OO
> is supplied. This will allow also to localize docstrings. Depending on
> locale or other settings different docstring file can be used.
>
> For suppressing line numbers and annotations new options can be added.
>



-- 
Daniel F. Moisset - UK Country Manager - Machinalis Limited
www.machinalis.co.uk 
Skype: @dmoisset T: + 44 7398 827139

1 Fore St, London, EC2Y 9DT

Machinalis Limited is a company registered in England and Wales. Registered
number: 10574987.


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Antoine Pitrou
On Tue, 10 Apr 2018 11:13:01 -0700
Ethan Furman  wrote:
> On 04/10/2018 10:54 AM, Zachary Ware wrote:
> > On Tue, Apr 10, 2018 at 12:38 PM, Chris Angelico  wrote:  
> >> A deployed Python distribution generally has .pyc files for all of the
> >> standard library. I don't think people want to lose the ability to
> >> call help(), and unless I'm misunderstanding, that requires
> >> docstrings. So this will mean twice as many files and twice as many
> >> file-open calls to import from the standard library. What will be the
> >> impact on startup time?  
> >
> > What about instead of separate files turning the single file into a
> > pseudo-zip file containing all of the proposed files, and provide a
> > simple tool for removing whatever parts you don't want?  
> 
> -O and -OO already do some trimming; perhaps going that route instead of 
> having multiple files would be better.

"python -O" and "python -OO" *do* generate different pyc files.
If you want to trim docstrings with those options, you need to
regenerate pyc files for all your dependencies (including third-party
libraries and standard library modules).

Serhiy's proposal allows "-O" and "-OO" to work without needing a
custom bytecode generation step.

Regards

Antoine.




Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Peter O'Connor
>
> But even I find your use of dysphemisms like "freak show" for non-FP
> solutions quite off-putting.


Ah, I'm sorry, "freak show" was not mean to be disparaging to the authors
or even the code itself, but to describe the variety of strange solutions
(my own included) to this simple problem.

Indeed. But it seems to me that itertools.accumulate() with a initial value
> probably will solve that issue.


Kyle Lahnakoski made a pretty good case for not using
itertools.accumulate() earlier in this thread, and Tim Peters made the
point that its uninitialized behaviour can be extremely unintuitive (try
"print(list(itertools.accumulate([1, 2, 3], lambda x, y: str(x) + str(y))))").
These convinced me that itertools.accumulate should be
avoided altogether.

Alternatively, if anyone has a proposed syntax that does the same thing as
Serhiy Storchaka's:

    smooth_signal = [average for average in [0]
                     for x in signal
                     for average in [(1 - decay) * average + decay * x]]

But in a way that more intuitively expresses the intent of the code, it
would be great to have more options on the market.
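For comparison, the moving average can already be written with accumulate by
seeding it and then discarding the seed; since Python 3.8 the initial=
argument makes the seeding explicit. A sketch (signal and decay values are
arbitrary):

```python
from itertools import accumulate, islice

signal = [1.0, 2.0, 3.0, 4.0]
decay = 0.5

def step(average, x):
    # one exponential-moving-average update
    return (1 - decay) * average + decay * x

# Seed with 0.0, then drop the seed from the output with islice.
smooth_signal = list(islice(accumulate([0.0] + signal, step), 1, None))
print(smooth_signal)  # [0.5, 1.25, 2.125, 3.0625]
```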



On Tue, Apr 10, 2018 at 1:32 PM, Steven D'Aprano 
wrote:

> On Tue, Apr 10, 2018 at 12:18:27PM -0400, Peter O'Connor wrote:
>
> [...]
> > I added your coroutine to the freak show:
>
> Peter, I realise that you're a fan of functional programming idioms, and
> I'm very sympathetic to that. I'm a fan of judicious use of FP too, and
> while I'm not keen on your specific syntax, I am interested in the
> general concept and would like it to have the best possible case made
> for it.
>
> But even I find your use of dysphemisms like "freak show" for non-FP
> solutions quite off-putting. (I think this is the second time you've
> used the term.)
>
> Python is not a functional programming language like Haskell, it is a
> multi-paradigm language with strong support for OO and procedural
> idioms. Notwithstanding the problems with OO idioms that you describe,
> many Python programmers find OO "better", simpler to understand, learn
> and maintain than FP. Or at least more familiar.
>
> The rejection or approval of features into Python is not a popularity
> contest, ultimately it only requires one person (Guido) to either reject
> or approve a new feature. But popular opinion is not irrelevant either:
> like all benevolent dictators, Guido has a good sense of what's popular,
> and takes it into account in his considerations. If you put people
> off-side, you hurt your chances of having this feature approved.
>
>
> [...]
> > I *almost* like the coroutine thing but find it unusable because the
> > peculiarity of having to initialize the generator when you use it (you do
> > it with next(processor)) is pretty much guaranteed to lead to errors when
> > people forget to do it.  Earlier in the thread Steven D'Aprano showed
> how a
> > @coroutine decorator can get around this:
>
> I agree that the (old-style, pre-async) coroutine idiom is little known,
> in part because of the awkwardness needed to make it work. Nevertheless,
> I think your argument about it leading to errors is overstated: if you
> forget to initialize the coroutine, you get a clear and obvious failure:
>
> py> def co():
> ... x = (yield 1)
> ...
> py> a = co()
> py> a.send(99)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: can't send non-None value to a just-started generator
>
>
>
> > - Still, the whole coroutine thing still feels a bit magical, hacky and
> > "clever".  Also the use of generator.send will probably confuse around
> 90%
> > of programmers.
>
> In my experience, heavy use of FP idioms will probably confuse about the
> same percentage. Including me: I like FP in moderation, I wouldn't want
> to use a strict 100% functional language, and if someone even says the
> word "Monad" I break out in hives.
>
>
>
> > If you have that much of a complex workflow, you really should not make
> > > that a one-liner.
> >
> > It's not a complex workflow, it's a moving average.  It just seems
> complex
> > because we don't have a nice, compact way to describe it.
>
> Indeed. But it seems to me that itertools.accumulate() with a initial
> value probably will solve that issue.
>
> Besides... moving averages aren't that common that they *necessarily*
> need syntactic support. Wrapping the complexity in a function, then
> calling the function, may be an acceptible solution instead of putting
> the complexity directly into the language itself.
>
> The Conservation Of Complexity Principle suggests that complexity cannot
> be created or destroyed, only moved around. If we reduce the complexity
> of the Python code needed to write a moving average, we invariably
> increase the complexity of the language, the interpreter, and the amount
> of syntax people need to learn in order to be productive with Python.
>
>
> --
> Steve

Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Rhodri James

On 10/04/18 18:32, Steven D'Aprano wrote:

On Tue, Apr 10, 2018 at 12:18:27PM -0400, Peter O'Connor wrote:

[...]

I added your coroutine to the freak show:

Peter, I realise that you're a fan of functional programming idioms, and
I'm very sympathetic to that. I'm a fan of judicious use of FP too, and
while I'm not keen on your specific syntax, I am interested in the
general concept and would like it to have the best possible case made
for it.

But even I find your use of dysphemisms like "freak show" for non-FP
solutions quite off-putting. (I think this is the second time you've
used the term.)


Thank you for saying that, Steven.  I must admit I was beginning to find 
the implicit insults rather grating.


--
Rhodri James *-* Kynesim Ltd


Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Paul Moore
On 10 April 2018 at 19:25, Peter O'Connor  wrote:
> Kyle Lahnakoski made a pretty good case for not using itertools.accumulate() 
> earlier in this thread

I wouldn't call it a "pretty good case". He argued that writing
*functions* was a bad thing, because the name of a function didn't
provide all the details of what was going on in the same way that
explicitly writing the code inline would do. That seems to me to be a
somewhat bizarre argument - after all, encapsulation and abstraction
are pretty fundamental to programming. I'm not even sure he had any
specific comments about accumulate other than his general point that
as a named function it's somehow worse than writing out the explicit
loop.

> But in a way that more intuitively expresses the intent of the code, it
> would be great to have more options on the market.

It's worth adding a reminder here that "having more options on the
market" is pretty directly in contradiction to the Zen of Python -
"There should be one-- and preferably only one --obvious way to do
it".

Paul


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Antoine Pitrou
On Tue, 10 Apr 2018 19:14:58 +0300, Serhiy Storchaka wrote:
> Currently pyc files contain data that is useful mostly for developing 
> and is not needed in most normal cases in stable program. There is even 
> an option that allows to exclude a part of this information from pyc 
> files. It is expected that this saves memory, startup time, and disk 
> space (or the time of loading from network). I propose to move this data 
> from pyc files into separate file or files. pyc files should contain 
> only external references to external files. If the corresponding 
> external file is absent or specific option suppresses them, references 
> are replaced with None or NULL at import time, otherwise they are loaded 
> from external files.
> 
> 1. Docstrings. They are needed mainly for developing.
> 
> 2. Line numbers (lnotab). They are helpful for formatting tracebacks, 
> for tracing, and debugging with the debugger. Sources are helpful in 
> such cases too. If the program doesn't contain errors ;-) and is shipped 
> without sources, they could be removed.
> 
> 3. Annotations. They are used mainly by third party tools that 
> statically analyze sources. They are rarely used at runtime.
> 
> Docstrings will be read from the corresponding docstring file unless -OO 
> is supplied. This will allow also to localize docstrings. Depending on 
> locale or other settings different docstring file can be used.

An alternate proposal would be to have separate sections in a
single marshal file.  The main section (containing the loadable
module) would have references to the other sections. This way it's easy
for the loader to say "all references to the docstring section and/or
to the annotation section are replaced with None", depending on how
Python is started.  It would also be possible to do it on disk with a
strip-like utility.

I'm not volunteering to do all this, so just my 2 cents ;-)

Regards

Antoine.




Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Eric V. Smith

>> 3. Annotations. They are used mainly by third party tools that 
>> statically analyze sources. They are rarely used at runtime.
> 
> Even less used than docstrings probably.

typing.NamedTuple and dataclasses use annotations at runtime. 

Eric


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Steven D'Aprano
On Wed, Apr 11, 2018 at 03:38:08AM +1000, Chris Angelico wrote:
> On Wed, Apr 11, 2018 at 2:14 AM, Serhiy Storchaka  wrote:
> > Currently pyc files contain data that is useful mostly for developing and is
> > not needed in most normal cases in stable program. There is even an option
> > that allows to exclude a part of this information from pyc files. It is
> > expected that this saves memory, startup time, and disk space (or the time
> > of loading from network). I propose to move this data from pyc files into
> > separate file or files. pyc files should contain only external references to
> > external files. If the corresponding external file is absent or specific
> > option suppresses them, references are replaced with None or NULL at import
> > time, otherwise they are loaded from external files.
> >
> > 1. Docstrings. They are needed mainly for developing.
> >
> > 2. Line numbers (lnotab). They are helpful for formatting tracebacks, for
> > tracing, and debugging with the debugger. Sources are helpful in such cases
> > too. If the program doesn't contain errors ;-) and is shipped without
> > sources, they could be removed.
> >
> > 3. Annotations. They are used mainly by third party tools that statically
> > analyze sources. They are rarely used at runtime.
> >
> > Docstrings will be read from the corresponding docstring file unless -OO is
> > supplied. This will allow also to localize docstrings. Depending on locale
> > or other settings different docstring file can be used.
> >
> > For suppressing line numbers and annotations new options can be added.
> 
> A deployed Python distribution generally has .pyc files for all of the
> standard library. I don't think people want to lose the ability to
> call help(), and unless I'm misunderstanding, that requires
> docstrings. So this will mean twice as many files and twice as many
> file-open calls to import from the standard library. What will be the
> impact on startup time?

I shouldn't think that the number of files on disk is very important, 
now that they're hidden away in the __pycache__ directory where they can 
be ignored by humans. Even venerable old FAT32 has a limit of 65,534 
files in a single folder, and 268,435,437 on the entire volume. So 
unless the std lib expands to 16000+ modules, the number of files in the 
__pycache__ directory ought to be well below that limit.

I think even MicroPython ought to be okay with that. (But it would be 
nice to find out for sure: does it support file systems with *really* 
tiny limits?)

The entire __pycache__ directory is supposed to be a black box except 
under unusual circumstances, so it doesn't matter (at least not to me)
if we have:

__pycache__/spam.cpython-38.pyc

alone or:

__pycache__/spam.cpython-38.pyc
__pycache__/spam.cpython-38-doc.pyc
__pycache__/spam.cpython-38-lno.pyc
__pycache__/spam.cpython-38-ann.pyc

(say). And if the external references are loaded lazily, on need, rather 
than eagerly, this could save startup time, which I think is the 
intention. The doc strings would be still available, just not loaded 
until the first time you try to use them.

However, Python supports byte-code only distribution, using .pyc files 
external to the __pycache__. In that case, it would be annoying and 
inconvenient to distribute four top-level files, so I think that the use 
of external references has to be optional, and there has to be a way to 
either compile to a single .pyc file containing all four parts, or an 
external tool that can take the existing four files and merge them.


-- 
Steve


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 10:03 AM, Steven D'Aprano  wrote:
> On Wed, Apr 11, 2018 at 03:38:08AM +1000, Chris Angelico wrote:
>> A deployed Python distribution generally has .pyc files for all of the
>> standard library. I don't think people want to lose the ability to
>> call help(), and unless I'm misunderstanding, that requires
>> docstrings. So this will mean twice as many files and twice as many
>> file-open calls to import from the standard library. What will be the
>> impact on startup time?
>
> I shouldn't think that the number of files on disk is very important,
> now that they're hidden away in the __pycache__ directory where they can
> be ignored by humans. Even venerable old FAT32 has a limit of 65,534
> files in a single folder, and 268,435,437 on the entire volume. So
> unless the std lib expands to 16000+ modules, the number of files in the
> __pycache__ directory ought to be well below that limit.
>
> I think even MicroPython ought to be okay with that. (But it would be
> nice to find out for sure: does it support file systems with *really*
> tiny limits?)

File system limits aren't usually an issue; as you say, even FAT32 can
store a metric ton of files in a single directory. I'm more interested
in how long it takes to open a file, and whether doubling that time
will have a measurable impact on Python startup time. Part of that
cost can be reduced by using openat(), on platforms that support it,
but even with a directory handle, there's still a definite non-zero
cost to opening and reading an additional file.
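That per-file cost is easy to put a number on for a given machine; a rough
micro-benchmark (file size and iteration count are arbitrary):

```python
import os
import tempfile
import timeit

# Create a small file comparable in size to a typical pyc section.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 8192)
os.close(fd)

def open_and_read():
    with open(path, "rb") as f:
        f.read()

# Average cost of one open+read pair; doubling the number of pyc files
# roughly doubles this component of import time.
per_call = timeit.timeit(open_and_read, number=1000) / 1000
print(f"~{per_call * 1e6:.1f} us per open+read")
os.remove(path)
```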

ChrisA


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Gregory P. Smith
On Tue, Apr 10, 2018 at 12:51 PM Eric V. Smith  wrote:

>
> >> 3. Annotations. They are used mainly by third party tools that
> >> statically analyze sources. They are rarely used at runtime.
> >
> > Even less used than docstrings probably.
>
> typing.NamedTuple and dataclasses use annotations at runtime.
>
> Eric
>

Yep. Everything accessible in any way at runtime is used by something at
runtime. It's a public API, we can't just get rid of it.

Several libraries rely on docstrings being available (an additional case in
point beyond the already-linked CLI tool: ply)

Most of the world never appears to use -O and -OO.  If they do, they either
don't use these libraries, or they jump through special hoops to prevent pyo
compilation of any sources that need them.  (unlikely)

-gps


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Eric Fahlgren
On Tue, Apr 10, 2018 at 5:03 PM, Steven D'Aprano 
wrote:

>
> __pycache__/spam.cpython-38.pyc
> __pycache__/spam.cpython-38-doc.pyc
> __pycache__/spam.cpython-38-lno.pyc
> __pycache__/spam.cpython-38-ann.pyc
>

Our product uses the doc strings for auto-generated help, so we need to
keep those.  We also allow users to write plugins and scripts, so getting
valid feedback in tracebacks is essential for our support people, so we'll
keep the lno files, too.  Annotations can probably go.

Looking at one of our little pyc files, I see:

-rwx--+ 1 efahlgren admins  9252 Apr 10 17:25 ./lm/lib/config.pyc*

Since disk blocks are typically 4096 bytes, that's really a 12k file.
Let's say it's 8k of byte code, 1k of doc, a bit of lno.  So the proposed
layout would give:

config.pyc -> 8k
config-doc.pyc -> 4k
config-lno.pyc -> 4k

So now I've increased disk usage by 25% (yeah yeah, I know, I picked that
small file on purpose to illustrate the point, but it's not unusual).

These files are often opened over a network, at least for user plugins.
This can take a really, really long time on some of our poorly connected
machines, like 1-2 seconds per file (no kidding, it's horrible).  Now
instead of opening just one file in 1-2 seconds, we have increased the time
by 300%, just to do the stat+open, probably another stat to make sure
there's no "ann" file laying about.  Ouch.

-1 from me.


Re: [Python-ideas] Accepting multiple mappings as positional arguments to create dicts

2018-04-10 Thread Mike Miller


On 2018-04-09 04:23, Daniel Moisset wrote:

In which way would this be different to {**mapping1, **mapping2, **mapping3} ?


That's possible now, but I believe the form mentioned previously would be more 
readable:


dict(d1, d2, d3)
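For reference, a quick sketch of what works today versus the proposed call (d1/d2 are just example names):

```python
d1 = {"a": 1}
d2 = {"a": 2, "b": 3}

# Works today: later mappings win on key collisions.
merged = {**d1, **d2}
print(merged)  # {'a': 2, 'b': 3}

# The proposed spelling is currently an error.
try:
    dict(d1, d2)
except TypeError:
    print("dict() takes at most one positional mapping today")
```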

-Mike



Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Steven D'Aprano
On Wed, Apr 11, 2018 at 10:08:58AM +1000, Chris Angelico wrote:

> File system limits aren't usually an issue; as you say, even FAT32 can
> store a metric ton of files in a single directory. I'm more interested
> in how long it takes to open a file, and whether doubling that time
> will have a measurable impact on Python startup time. Part of that
> cost can be reduced by using openat(), on platforms that support it,
> but even with a directory handle, there's still a definite non-zero
> cost to opening and reading an additional file.

Yes, it will double the number of files. Actually quadruple it, if the 
annotations and line numbers are in separate files too. But if most of 
those extra files never need to be opened, then there's no cost to them. 
And whatever extra cost there is, is amortized over the lifetime of the 
interpreter.

The expectation here is that this could reduce startup time, since the 
files read up front are smaller and less data needs to be read and sent 
over the network; the rest can be deferred until it's actually needed.

Serhiy is experienced enough that I think we should assume he's not 
going to push this optimization into production unless it actually does 
reduce startup time. He has proven himself enough that we should assume 
competence rather than incompetence :-)

Here is the proposal as I understand it:

- by default, change .pyc files to store annotations, docstrings
  and line numbers as references to external files which will be
  lazily loaded on-need;

- single-file .pyc files must still be supported, but this won't
  be the default and could rely on an external "merge" tool;

- objects that rely on docstrings or annotations, such as dataclass,
  may experience a (hopefully very small) increase of import time,
  since they may not be able to defer loading the extra files;

- but in general, most modules should (we expect) see a decrease
  in the load time;

- which will (we hope) reduce startup time;

- libraries which make eager use of docstrings and annotations might
  even ship with the single-file .pyc instead (the library installer
  can look after that aspect), and so avoid any extra cost.

Naturally pushing this into production will require benchmarks that 
prove this actually does improve startup time. I believe that Serhiy's 
reason for asking is to determine whether it is worth his while to 
experiment on this. There's no point in implementing these changes and 
benchmarking them, if there's no chance of it being accepted.

So on the assumptions that:

- benchmarking does demonstrate a non-trivial speedup of
  interpreter startup;

- single-file .pyc files are still supported, for the use
  of byte-code only libraries;

- and modules which are particularly badly impacted by this
  change are able to opt-out and use a single .pyc file;

I see no reason not to support this idea if Serhiy (or someone else) is 
willing to put in the work.


-- 
Steve


Re: [Python-ideas] PEP 572: Statement-Local Name Bindings, take three!

2018-04-10 Thread Mike Miller
If anyone is interested, I came across this same subject in a blog post and 
discussion on HN today:


- https://www.hillelwayne.com/post/equals-as-assignment/
- https://news.ycombinator.com/item?id=16803874


On 2018-04-02 15:03, Guido van Rossum wrote:
IIRC Algol-68 (the lesser-known, more complicated version) used 'int x = 0;' to 
declare a constant and 'int x := 0;' to declare a variable. And there was a lot 
more to it; see https://en.wikipedia.org/wiki/ALGOL_68#mode:_Declarations. I'm 
guessing Go reversed this because they want '=' to be the common assignment 
(whereas in Algol-68 the common assignment was ':=').




Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Tim Peters
[Jacco van Dorp ]
> I've sometimes thought that exhaust(iterator) or iterator.exhaust() would be
> a good thing to have - I've often written code doing basically "call this
> function
> for every element in this container, and idc about return values", but find
> myself using a list comprehension instead of generator. I guess it's such an
> edge case that exhaust(iterator) as builtin would be overkill (but perhaps
> itertools could have it ?), and most people don't pass around iterators, so
> (f(x) for x in y).exhaust() might not look natural to most people.

"The standard" clever way to do this is to create a 0-sized deque:

>>> from collections import deque
>>> deque((i for i in range(1000)), 0)
deque([], maxlen=0)

The deque constructor consumes the entire iterable "at C speed", but
throws all the results away because the deque's maximum size is too
small to hold any of them ;-)

> It could return the value for the last() semantics, but I think exhaustion
> would often be more important than the last value.

For last(),

>>> deque((i for i in range(1000)), 1)[0]
999

In that case the deque only has enough room to remember one element,
and so remembers the last one it sees.  Of course this generalizes to
larger values too:

>>> for x in deque((i for i in range(1000)), 5):
... print(x)
995
996
997
998
999

I think I'd like to see itertools add a `drop(iterable, n=None)`
function.  If `n` is not given, it would consume the entire iterable.
Else for an integer n >= 0, it would return an iterator that skips
over the first `n` values of the input iterable.

`drop n xs` has been in Haskell forever, and is also in the Python
itertoolz package:

http://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.drop

I'm not happy about switching the argument order from those, but would
really like to omit `n` as a a way to spell "pretend n is infinity",
so there would be no more need for the "empty deque" trick.
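A minimal sketch of such a `drop()` (the name and `n=None` signature follow Tim's proposal; this is not an actual itertools addition):

```python
from collections import deque
from itertools import islice

def drop(iterable, n=None):
    """Skip the first n values of iterable; if n is None, consume it all."""
    it = iter(iterable)
    if n is None:
        deque(it, maxlen=0)  # exhaust at C speed, discarding everything
        return it            # an already-exhausted iterator
    return islice(it, n, None)

print(list(drop(range(10), 7)))  # [7, 8, 9]
print(list(drop(range(10))))     # []
```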


Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Steven D'Aprano
On Tue, Apr 10, 2018 at 08:12:14PM +0100, Paul Moore wrote:
> On 10 April 2018 at 19:25, Peter O'Connor  wrote:
> > Kyle Lahnakoski made a pretty good case for not using 
> > itertools.accumulate() earlier in this thread
> 
> I wouldn't call it a "pretty good case". He argued that writing
> *functions* was a bad thing, because the name of a function didn't
> provide all the details of what was going on in the same way that
> explicitly writing the code inline would do. That seems to me to be a
> somewhat bizarre argument - after all, encapsulation and abstraction
> are pretty fundamental to programming. I'm not even sure he had any
> specific comments about accumulate other than his general point that
> as a named function it's somehow worse than writing out the explicit
> loop.

I agree with Paul here -- I think that Kyle's argument is idiosyncratic. 
It isn't going to stop me from writing functions :-)


> > But in a way that more intuitively expresses the intent of the code, it
> > would be great to have more options on the market.
> 
> It's worth adding a reminder here that "having more options on the
> market" is pretty directly in contradiction to the Zen of Python -
> "There should be one-- and preferably only one --obvious way to do
> it".

I'm afraid I'm going to (mildly) object here. At least you didn't 
misquote the Zen as "Only One Way To Do It" :-)

The Zen here is not a prohibition against there being multiple ways to 
do something -- how could it, given that Python is a general purpose 
programming language there is always going to be multiple ways to write 
any piece of code? Rather, it exhorts us to make sure that there are one 
or more ways to "do it", at least one of which is obvious.

And since "it" is open to interpretation, we can legitimately wonder 
whether (for example):

- for loops
- list comprehensions
- list(generator expression)

etc are three different ways to do "it", or three different "it"s. If we 
wish to dispute the old slander that Python has Only One Way to do 
anything, then we can emphasise the similarities and declare them three 
ways; if we want to defend the Zen, we can emphasise the differences and 
declare them to be three different "it"s.

So I think Peter is on reasonable ground to suggest this, if he can make 
a good enough case for it.

Personally, I still think the best approach here is a combination of 
itertools.accumulate, and the proposed name-binding as an expression 
feature:

total = 0
running_totals = [(total := total + x) for x in values]
# alternative syntax
running_totals = [(total + x as total) for x in values]

If you don't like the dependency on an external variable (or if that 
turns out not to be practical) then we could have:

running_totals = [(total := total + x) for total in [0] for x in values]
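For comparison, the running-totals case can already be spelled directly with itertools.accumulate:

```python
from itertools import accumulate

values = [1, 2, 3, 4]
running_totals = list(accumulate(values))  # default function is addition
print(running_totals)  # [1, 3, 6, 10]
```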


-- 
Steve


Re: [Python-ideas] PEP 572: Statement-Local Name Bindings, take three!

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 1:15 PM, Mike Miller  wrote:
> If anyone is interested I came across this same subject on a blog post and
> discussion on HN today:
>
> - https://www.hillelwayne.com/post/equals-as-assignment/
> - https://news.ycombinator.com/item?id=16803874

Those people who say "x = x + 1" makes no sense... do they also get
confused by the fact that you can multiply a string by a number?
Programming is not algebra. The ONLY reason that "x = x + 1" can fail
to make sense is if you start by assuming that there is no such thing
as time. That's the case in algebra, but it simply isn't true in
software. Functional programming languages are closer to algebra than
imperative languages are, but that doesn't mean they _are_ algebraic,
and they go to great lengths to lie about how you can have
side-effect-free side effects and such.

Fortunately, Python is not bound by such silly rules, and can do
things because they are useful for real-world work. Thus the question
of ":=" vs "=" vs "==" vs "===" comes down to what is actually worth
doing, not what would look tidiest to someone who is trying to
represent a mathematician's blackboard in ASCII.

ChrisA


Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Brendan Barnwell

On 2018-04-08 10:41, Kyle Lahnakoski wrote:

For example before I read the docs on
itertools.accumulate(list_of_length_N, func), here are the unknowns I see:



	It sounds like you're saying you don't like using functions because you 
have to read documentation.  That may be so, but I don't have much 
sympathy for that position.  One of the most useful features of 
functions is that they exist as defined chunks of code that can be 
explicitly documented.  Snippets of inline code are harder to document 
and harder to "address" in the sense of identifying precisely which 
chunk of code is being documented.


	If the documentation for accumulate doesn't give the information that 
people using it need to know, that's a documentation bug for sure, but 
it doesn't mean we should stop using functions.


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown


Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 1:41 PM, Steven D'Aprano  wrote:
> Personally, I still think the best approach here is a combination of
> itertools.accumulate, and the proposed name-binding as an expression
> feature:
>
> total = 0
> running_totals = [(total := total + x) for x in values]
> # alternative syntax
> running_totals = [(total + x as total) for x in values]
>
> If you don't like the dependency on an external variable (or if that
> turns out not to be practical) then we could have:
>
> running_totals = [(total := total + x) for total in [0] for x in values]

That last one works, but it's not exactly pretty. Using an additional
'for' loop to initialize variables feels like a gross hack.
Unfortunately, the first one is equivalent to this (in a PEP 572
world):

total = 0
def <listcomp>():
    result = []
    for x in values:
        result.append(total := total + x)
    return result
running_totals = <listcomp>()

Problem: it's still happening in a function, which means this bombs
with UnboundLocalError.

Solution 1: Use the extra loop to initialize 'total' inside the
comprehension. Ugly.

Solution 2: Completely redefine comprehensions to use subscopes
instead of a nested function. I used to think this was a good thing,
but after the discussions here, I've found that this creates as many
problems as it solves.

Solution 3: Have some way for a comprehension to request that a name
be imported from the surrounding context. Effectively this:

total = 0
def <listcomp>(total=total):
    result = []
    for x in values:
        result.append(total := total + x)
    return result
running_totals = <listcomp>()

This is how, in a PEP 572 world, the oddities of class scope are
resolved. (I'll be posting a new PEP as soon as I fix up three failing
CPython tests.) It does have its own problems, though. How do you know
which names to import like that? What if 'total' wasn't assigned to
right there, but instead was being lifted from a scope further out?

Solution 4: Have *all* local variables in a comprehension get
initialized to None.

def <listcomp>():
    result = []
    total = x = None
    for x in values:
        result.append(total := (total or 0) + x)
    return result
running_totals = <listcomp>()

running_totals = [(total := (total or 0) + x) for x in values]

That'd add to the run-time cost of every list comp, but probably not
measurably. (Did you know, for instance, that "except Exception as e:"
will set e to None before unbinding it?) It's still not exactly
pretty, though, and having to explain why you have "or 0" in a purely
arithmetic operation may not quite work.

Solution 5: Allow an explicit initializer syntax. Could work, but
you'd have to come up with one that people are happy with.

None of these is truly ideal IMO.

ChrisA


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 1:02 PM, Steven D'Aprano  wrote:
> On Wed, Apr 11, 2018 at 10:08:58AM +1000, Chris Angelico wrote:
>
>> File system limits aren't usually an issue; as you say, even FAT32 can
>> store a metric ton of files in a single directory. I'm more interested
>> in how long it takes to open a file, and whether doubling that time
>> will have a measurable impact on Python startup time. Part of that
>> cost can be reduced by using openat(), on platforms that support it,
>> but even with a directory handle, there's still a definite non-zero
>> cost to opening and reading an additional file.
>
> Yes, it will double the number of files. Actually quadruple it, if the
> annotations and line numbers are in separate files too. But if most of
> those extra files never need to be opened, then there's no cost to them.
> And whatever extra cost there is, is amortized over the lifetime of the
> interpreter.

Yes, if they are actually not needed. My question was about whether
that is truly valid. Consider a very common use-case: an OS-provided
Python interpreter whose files are all owned by 'root'. Those will be
distributed with .pyc files for performance, but you don't want to
deprive the users of help() and anything else that needs docstrings
etc. So... are the docstrings lazily loaded or eagerly loaded? If
eagerly, you've doubled the number of file-open calls to initialize
the interpreter. (Or quadrupled, if you need annotations and line
numbers and they're all separate.) If lazily, things are a lot more
complicated than the original description suggested, and there'd need
to be some semantic changes here.

> Serhiy is experienced enough that I think we should assume he's not
> going to push this optimization into production unless it actually does
> reduce startup time. He has proven himself enough that we should assume
> competence rather than incompetence :-)

Oh, I'm definitely assuming that he knows what he's doing :-) Doesn't
mean I can't ask the question though.

ChrisA


Re: [Python-ideas] Accepting multiple mappings as positional arguments to create dicts

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 11:19 AM, Mike Miller  wrote:
>
> On 2018-04-09 04:23, Daniel Moisset wrote:
>>
>> In which way would this be different to {**mapping1, **mapping2,
>> **mapping3} ?
>
>
> That's possible now, but believe the form mentioned previously would be more
> readable:
>
> dict(d1, d2, d3)
>

That's more readable than {**d1, **d2, **d3} ? Doesn't look materially
different to me.

ChrisA


Re: [Python-ideas] Accepting multiple mappings as positional arguments to create dicts

2018-04-10 Thread Steven D'Aprano
On Wed, Apr 11, 2018 at 02:22:08PM +1000, Chris Angelico wrote:

> > dict(d1, d2, d3)
> 
> That's more readable than {**d1, **d2, **d3} ? Doesn't look materially
> different to me.

It does to me.

On the one hand, we have a function call (okay, technically a type...) 
"dict()" that can be googled on, with three arguments; on the other 
hand, we have syntax that looks like a set {...} and contains the 
obscure ** prefix operator which is hard to google for.


-- 
Steve


Re: [Python-ideas] Accepting multiple mappings as positional arguments to create dicts

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 2:44 PM, Steven D'Aprano  wrote:
> On Wed, Apr 11, 2018 at 02:22:08PM +1000, Chris Angelico wrote:
>
>> > dict(d1, d2, d3)
>>
>> That's more readable than {**d1, **d2, **d3} ? Doesn't look materially
>> different to me.
>
> It does to me.
>
> On the one hand, we have a function call (okay, technically a type...)
> "dict()" that can be googled on, with three arguments; on the other
> hand, we have syntax that looks like a set {...} and contains the
> obscure ** prefix operator which is hard to google for.

True, you can google 'dict'. But the double-star operator is exactly
the same as is used in kwargs, and actually, I *can* search for it.

https://www.google.com.au/search?q=python+**

Lots of results for kwargs, which is a good start. (DuckDuckGo is less
useful here, though it too is capable of searching for "**". It just
gives more results about exponentiation than about packing/unpacking.)
The googleability argument may have been a killer a few years ago,
but search engines get smarter every day [1], and it's most definitely
possible to search for operators. Or at least some of them; Google and
DDG don't give me anything useful for "python @".

ChrisA

[1] and a search engine can help you find SmarterEveryDay, not that he
talks about Python


[Python-ideas] PEP 572: Assignment Expressions (post #4)

2018-04-10 Thread Chris Angelico
Wholesale changes since the previous version. Statement-local name
bindings have been dropped (I'm still keeping the idea in the back of
my head; this PEP wasn't the first time I'd raised the concept), and
we're now focusing primarily on assignment expressions, but also with
consequent changes to comprehensions.

Sorry for the lengthy delays; getting a reference implementation going
took me longer than I expected or intended.

ChrisA

PEP: 572
Title: Assignment Expressions
Author: Chris Angelico 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Feb-2018
Python-Version: 3.8
Post-History: 28-Feb-2018, 02-Mar-2018, 23-Mar-2018, 04-Apr-2018


Abstract


This is a proposal for creating a way to assign to names within an expression.
Additionally, the precise scope of comprehensions is adjusted, to maintain
consistency and follow expectations.


Rationale
=

Naming the result of an expression is an important part of programming,
allowing a descriptive name to be used in place of a longer expression,
and permitting reuse.  Currently, this feature is available only in
statement form, making it unavailable in list comprehensions and other
expression contexts.  Merely introducing a way to assign as an expression
would create bizarre edge cases around comprehensions, though, and to avoid
the worst of the confusions, we change the definition of comprehensions,
causing some edge cases to be interpreted differently, but maintaining the
existing behaviour in the majority of situations.


Syntax and semantics


In any context where arbitrary Python expressions can be used, a **named
expression** can appear. This can be parenthesized for clarity, and is of
the form ``(target := expr)`` where ``expr`` is any valid Python expression,
and ``target`` is any valid assignment target.

The value of such a named expression is the same as the incorporated
expression, with the additional side-effect that the target is assigned
that value::

    # Similar to the boolean 'or' but checking for None specifically
    x = "default" if (eggs := spam().ham) is None else eggs

    # Even complex expressions can be built up piece by piece
    y = ((eggs := spam()), (cheese := eggs.method()), cheese[eggs])


Differences from regular assignment statements
--

An assignment statement can assign to multiple targets::

    x = y = z = 0

To do the same with assignment expressions, they must be parenthesized::

    assert 0 == (x := (y := (z := 0)))

Augmented assignment is not supported in expression form::

    >>> x +:= 1
      File "<stdin>", line 1
        x +:= 1
            ^
    SyntaxError: invalid syntax

Otherwise, the semantics of assignment are unchanged by this proposal.


Alterations to comprehensions
-

The current behaviour of list/set/dict comprehensions and generator
expressions has some edge cases that would behave strangely if an assignment
expression were to be used. Therefore the proposed semantics are changed,
removing the current edge cases, and instead altering their behaviour *only*
in a class scope.

As of Python 3.7, the outermost iterable of any comprehension is evaluated
in the surrounding context, and then passed as an argument to the implicit
function that evaluates the comprehension.

Under this proposal, the entire body of the comprehension is evaluated in
its implicit function. Names not assigned to within the comprehension are
located in the surrounding scopes, as with normal lookups. As one special
case, a comprehension at class scope will **eagerly bind** any name which
is already defined in the class scope.

A list comprehension can be unrolled into an equivalent function. With
Python 3.7 semantics::

    numbers = [x + y for x in range(3) for y in range(4)]
    # Is approximately equivalent to
    def <listcomp>(iterator):
        result = []
        for x in iterator:
            for y in range(4):
                result.append(x + y)
        return result
    numbers = <listcomp>(iter(range(3)))

Under the new semantics, this would instead be equivalent to::

    def <listcomp>():
        result = []
        for x in range(3):
            for y in range(4):
                result.append(x + y)
        return result
    numbers = <listcomp>()

When a class scope is involved, a naive transformation into a function would
prevent name lookups (as the function would behave like a method)::

    class X:
        names = ["Fred", "Barney", "Joe"]
        prefix = "> "
        prefixed_names = [prefix + name for name in names]

With Python 3.7 semantics, this will evaluate the outermost iterable at class
scope, which will succeed; but it will evaluate everything else in a function::

    class X:
        names = ["Fred", "Barney", "Joe"]
        prefix = "> "
        def <listcomp>(iterator):
            result = []
            for name in iterator:
                result.append(prefix + name)
            return result
        prefixed_names = <listcomp>(iter(names))

Re: [Python-ideas] PEP 572: Assignment Expressions (post #4)

2018-04-10 Thread Ethan Furman

On 04/10/2018 10:32 PM, Chris Angelico wrote:


Migration path
==============

The semantic changes to list/set/dict comprehensions, and more so to generator
expressions, may potentially require migration of code. In many cases, the
changes simply make legal what used to raise an exception, but there are some
edge cases that were previously legal and are not, and a few corner cases with
altered semantics.


s/previously legal and are not/previously legal and now are not/

--
~Ethan~


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Steven D'Aprano
On Wed, Apr 11, 2018 at 02:21:17PM +1000, Chris Angelico wrote:

[...]
> > Yes, it will double the number of files. Actually quadruple it, if the
> > annotations and line numbers are in separate files too. But if most of
> > those extra files never need to be opened, then there's no cost to them.
> > And whatever extra cost there is, is amortized over the lifetime of the
> > interpreter.
> 
> Yes, if they are actually not needed. My question was about whether
> that is truly valid.

We're never really going to know the effect on performance without 
implementing and benchmarking the code. It might turn out that, to our 
surprise, three quarters of the std lib relies on loading docstrings 
during startup. But I doubt it.


> Consider a very common use-case: an OS-provided
> Python interpreter whose files are all owned by 'root'. Those will be
> distributed with .pyc files for performance, but you don't want to
> deprive the users of help() and anything else that needs docstrings
> etc. So... are the docstrings lazily loaded or eagerly loaded?

What relevance is it that they're owned by root?


> If eagerly, you've doubled the number of file-open calls to initialize
> the interpreter.

I do not understand why you think this is even an option. Has Serhiy 
said something that I missed that makes this seem to be on the table? 
That's not a rhetorical question -- I may have missed something. But I'm 
sure he understands that doubling or quadrupling the number of file 
operations during startup is not an optimization.


> (Or quadrupled, if you need annotations and line
> numbers and they're all separate.) If lazily, things are a lot more
> complicated than the original description suggested, and there'd need
> to be some semantic changes here.

What semantic change do you expect?

There's an implementation change, of course, but that's Serhiy's problem 
to deal with and I'm sure that he has considered that. There should be 
no semantic change. When you access obj.__doc__, then and only then are 
the compiled docstrings for that module read from the disk.

I don't know the current implementation of .pyc files, but I like 
Antoine's suggestion of laying it out in four separate areas (plus 
header), each one marshalled:

code
docstrings
annotations
line numbers

Aside from code, which is mandatory, the three other sections could be 
None to represent "not available", as is the case when you pass -OO to 
the interpreter, or they could be some other sentinel that means "load 
lazily from the appropriate file", or they could be the marshalled data 
directly in place to support byte-code only libraries.
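A toy sketch of that four-section layout (purely illustrative; real .pyc files have a fixed header and different internals, and the section contents here are placeholders):

```python
import io
import marshal

def dump_sections(f, code, docstrings=None, annotations=None, linenos=None):
    # Each section is marshalled independently; None marks
    # "not available / load lazily from elsewhere".
    for section in (code, docstrings, annotations, linenos):
        f.write(marshal.dumps(section))

def load_sections(f):
    # marshal.load reads exactly one object per call, so consecutive
    # calls walk the concatenated sections in order.
    return [marshal.load(f) for _ in range(4)]

buf = io.BytesIO()
dump_sections(buf, code="<bytecode here>", docstrings={"f": "Does f."})
buf.seek(0)
code, docstrings, annotations, linenos = load_sections(buf)
print(docstrings, annotations)  # {'f': 'Does f.'} None
```

A lazy reader could stop after the code section and come back for the rest only when `__doc__` or `__annotations__` is actually touched.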

As for the in-memory data structures of objects themselves, I imagine 
something like the __doc__ and __annotation__ slots pointing to a table 
of strings, which is not initialised until you attempt to read from the 
table. Or something -- don't pay too much attention to my wild guesses.

The bottom line is, is there some reason *aside from performance* to 
avoid this? Because if the performance is worse, I'm sure Serhiy will be 
the first to dump this idea.


-- 
Steve


Re: [Python-ideas] Python-ideas Digest, Vol 137, Issue 40

2018-04-10 Thread Thautwarm Zhao
I think Guido has given a direct answer about why dict unpacking is not
supported at the syntax level. I can accept that, and I think it's better to
implement a function for dict unpacking in the standard library, just like

from dict_unpack import dict_unpack, pattern as pat

some_dict = {'a': {'b': {'c': 1}, 'd': 2}, 'e': 3}

extracted = dict_unpack(some_dict,
                        schema={'a': {'b': {'c': pat('V1')},
                                      'd': pat('V2')},
                                'e': pat('V3')})
# extract to a flat dictionary

v1, v2, v3 = (extracted[k] for k in ('V1', 'V2', 'V3'))
assert (v1, v2, v3) == (1, 2, 3)
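Such a helper needs no new syntax at all; a minimal sketch of the hypothetical dict_unpack/pat pair proposed above could look like:

```python
class pat:
    # Marker naming the slot a value is extracted into (hypothetical
    # helper mirroring the proposed API above).
    def __init__(self, name):
        self.name = name

def dict_unpack(data, schema):
    # Walk the schema alongside the data; every pat() leaf records the
    # matching value in one flat result dictionary.
    out = {}
    def walk(d, s):
        for key, sub in s.items():
            if isinstance(sub, pat):
                out[sub.name] = d[key]
            else:
                walk(d[key], sub)
    walk(data, schema)
    return out

some_dict = {'a': {'b': {'c': 1}, 'd': 2}, 'e': 3}
extracted = dict_unpack(some_dict,
                        schema={'a': {'b': {'c': pat('V1')},
                                      'd': pat('V2')},
                                'e': pat('V3')})
```

Because the schema is walked recursively, nesting comes for free, which is exactly where the pop-based approach falls down.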


As for Steve's confusion,

> > {key: value_pattern, **_} = {key: value, **_}

> If I saw that, I would have no idea what it could even possibly do.
> Let's pick the simplest concrete example I can think of:
>
> {'A': 1, **{}} = {'A': 0, **{}}
>
> I cannot interpret what that should do. Is it some sort of
> pattern-matching? An update? What is the result? It is obviously some
> sort of binding operation, an assignment, but an assignment to what?

{'A': 1, **{}} = {'A': 0, **{}} should simply be an error, because for any
key-value pair on the LHS the key should be an expression and the value an
unpacking target. {'A': [*a, b]} = {'A': [1, 2, 3]} is welcome, but
{'A': 1} = {'A': '1'} is more like pattern matching, which is outside our topic.

Anyway, this feature will not come true, let's forget it...


I think Jacco is totally correct in following words.

> I think most of these problems could be solved with pop and the
> occasional list comprehension like this:
>
> a, b, c = [{'a':1,'b':2,'c':3}.pop(key) for key in ('a', 'b', 'c')]
>
> or for your example:
>
> c =  {'a': 1, **{'b': 2}}  # I suppose this one would generally
>  # be dynamic, but I need a name here.
> a, b = [c.pop(key) for key in ('a', 'b')]
>
> would extract all the keys you need, and has the advantage that
> you don't need hardcoded dict structure if you expand it to nested
> dicts. It's even less writing, and just as extensible to nested dicts.
> And if you dont actually want to destruct (tuples and lists aren't
> destroyed either), just use __getitem__ access instead of pop.

But pop cannot work for nested cases.

Feel free to end this topic.

thautwarm



2018-04-10 23:20 GMT+08:00 :

> Today's Topics:
>
>1. Re: Is there any idea about dictionary destructing?
>   (Steven D'Aprano)
>2. Re: Is there any idea about dictionary destructing?
>   (Jacco van Dorp)
>3. Re: Start argument for itertools.accumulate() [Was: Proposal:
>   A Reduce-Map Comprehension and a "last" builtin] (Guido van Rossum)
>4. Re: Is there any idea about dictionary destructing?
>   (Guido van Rossum)
>
>
> -- Forwarded message --
> From: "Steven D'Aprano" 
> To: python-ideas@python.org
> Cc:
> Bcc:
> Date: Tue, 10 Apr 2018 19:21:35 +1000
> Subject: Re: [Python-ideas] Is there any idea about dictionary destructing?
> On Tue, Apr 10, 2018 at 03:29:08PM +0800, Thautwarm Zhao wrote:
>
> > I'm focused on the consistency of the language itself.
>
> Consistency is good, but it is not the only factor to consider. We must
> guard against *foolish* consistency: adding features just for the sake
> of matching some other, often barely related, feature. Each feature must
> justify itself, and consistency with something else is merely one
> possible attempt at justification.
>
>
> > {key: value_pattern, **_} = {key: value, **_}
>
> If I saw that, I would have no idea what it could even possibly do.
> Let's pick the simplest concrete example I can think of:
>
> {'A': 1, **{}} = {'A': 0, **{}}
>
> I cannot interpret what that should do. Is it some sort of
> pattern-matching? An update? What is the result? It is obviously some
> sort of binding operation, an assignment, but an assignment to what?
>
> Sequence binding and unpacking was obvious the first time I saw it. I
> had no problem guessing what:
>
> a, b, c = 1, 2, 3
>
> meant, and once I had seen that, it wasn't hard to guess what
>
> a, b, c = *sequence
>
> meant. From there it is easy to predict extended unpacking. But I can't
> say the same for this.
>
> I can almost see the point of:
>
> a, b, c, = **{'a': 1, 'b': 2, 'c': 3}
>
> but I'm having trouble thinking of a situation where I would actually
> use it. But your syntax above just confuses me.
>
>
> > The reason why it's important is that, when destructing/constructing for
> > built-in data structures are not support

Re: [Python-ideas] PEP 572: Assignment Expressions (post #4)

2018-04-10 Thread Chris Angelico
On Wed, Apr 11, 2018 at 3:54 PM, Ethan Furman  wrote:
> On 04/10/2018 10:32 PM, Chris Angelico wrote:
>
>> Migration path
>> ==
>>
>> The semantic changes to list/set/dict comprehensions, and more so to
>> generator
>> expressions, may potentially require migration of code. In many cases, the
>> changes simply make legal what used to raise an exception, but there are
>> some
>> edge cases that were previously legal and are not, and a few corner cases
>> with
>> altered semantics.
>
>
> s/previously legal and are not/previously legal and now are not/
>

Trivial change, easy fix. Thanks.

ChrisA


Re: [Python-ideas] Move optional data out of pyc files

2018-04-10 Thread Steve Barnes


On 10/04/2018 18:54, Zachary Ware wrote:
> On Tue, Apr 10, 2018 at 12:38 PM, Chris Angelico  wrote:
>> A deployed Python distribution generally has .pyc files for all of the
>> standard library. I don't think people want to lose the ability to
>> call help(), and unless I'm misunderstanding, that requires
>> docstrings. So this will mean twice as many files and twice as many
>> file-open calls to import from the standard library. What will be the
>> impact on startup time?
> 
> What about instead of separate files turning the single file into a
> pseudo-zip file containing all of the proposed files, and provide a
> simple tool for removing whatever parts you don't want?
> 

Personally I quite like the idea of having the docstrings, and possibly 
other optional components, in a zipped section after a marker for the end 
of the operational code. The loader could stop reading at that point 
(reducing load time and memory impact) and only load and unzip the rest 
on demand.

Zipping the docstrings should give a significant reduction in file 
sizes, but it is worth remembering a couple of things:

  - Python is already one of the most compact languages for what it can 
do - I have had experts demanding to know where the rest of the program 
is hidden, and how it is being downloaded, when they noticed the size of 
the installed code versus the functionality provided.
  - File size != disk space consumed - on most file systems each file 
occupies ceil(file_size / allocation_size) clusters of the drive, and 
with increasing disk sizes the allocation_size is generally growing: 
both of my NTFS drives currently have 4096 byte allocation sizes, but I 
am offered up to 2 MB. Splitting a 10,052 byte .pyc file (picking a 
random example from my drive) into 5,052 and 5,000 byte files will 
change the disk space occupied from 3*4,096 to 4*4,096, plus the extra 
directory entry.
  - Where absolute file size is critical (such as embedded systems), 
you can always use the -O & -OO flags.
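The cluster arithmetic in the second point is easy to check; a quick sketch, assuming the 4096-byte allocation unit mentioned above:

```python
def clusters(file_size, allocation_size=4096):
    # Ceiling division: a file occupies ceil(size / allocation) clusters.
    return -(-file_size // allocation_size)

whole = clusters(10052)                     # the single original file
split = clusters(5052) + clusters(5000)     # the same bytes as two files
```

So the split version costs one extra cluster (4 versus 3) before even counting the extra directory entry.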
-- 
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect 
those of my employer.

