Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-06 Thread Chris Angelico
On Sun, Nov 6, 2016 at 12:50 AM, Arthur Havlicek
 wrote:
> 2016-11-05 12:47 GMT+01:00 Chris Angelico :
>
>> On Sat, Nov 5, 2016 at 9:50 PM, Arthur Havlicek
>>
>> But here's the thing. For everyone who writes a decorator function,
>> there could be dozens who use it.
>>
>
> The day that one guy leaves the team, suddenly you have code that's become
> a bit tricky to maintain.

True, and this can become a problem when those dozens have no
comprehension of how these features work. But fortunately, all it
takes is for one person to step up and learn how decorators are
written, and the problem is solved. (And it's not that hard. We teach
decorator authorship in our Web Programming In Python course. Not in a
huge amount of detail, but enough that a student will be able to
carefully tease apart the code of a decorator function and figure out
what it's actually doing.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-06 Thread Arthur Havlicek
2016-11-05 12:47 GMT+01:00 Chris Angelico :

> On Sat, Nov 5, 2016 at 9:50 PM, Arthur Havlicek
>
> But here's the thing. For everyone who writes a decorator function,
> there could be dozens who use it.
>

The day that one guy leaves the team, suddenly you have code that's become
a bit tricky to maintain. Our philosophy is that everyone should be able to
contribute anywhere, although I agree with you: there is such a thing as
using a decorator as a black box, and even diving into one isn't that hard.
Notice I didn't say we don't use these features at all; we just tend to
avoid them when a more familiar approach exists. Even so, maybe we are
wrong to do so, and the code would be clearer if we used more features.
Maybe we just lack the expertise and are indeed stuck in an old mindset,
after all. Anyway, I have some directives to make my code easy to access,
and I must contend with what I would call industrial constraints.

As a gambler I can't refuse that bet proposal, no sir :) Expect me to come
back with a piece of code and/or questions about the CPython engine.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-05 Thread Chris Angelico
On Sat, Nov 5, 2016 at 9:50 PM, Arthur Havlicek
 wrote:
> Pick 10 programmers for hire and count how many know how to write a
> decorator. If you have specified you needed Python specialists, you may
> have 3-4. If not, you are lucky to find even one.

By "write a decorator", I presume you mean implement the decorator
function, because anyone who's used Flask will have used "@app.route",
for instance.
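
Using one takes no knowledge of how it works; a minimal Flask sketch
(assuming Flask is installed):

from flask import Flask
app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, world"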

But here's the thing. For everyone who writes a decorator function,
there could be dozens who use it. How many people *on this mailing
list* know how to implement namedtuple, vs how many know how to
use it? How many know how to properly implement a cache, vs how many
can slap "@lru_cache()" on top of a function? For the most utterly
extreme example possible, how many people in the world know how to
encrypt data securely, vs how many know how to put "https:" into their
browsers? When you look at "programmers for hire", you get a small
number of brilliant experts with huge amounts of experience, and the
rest are split between mid-level people looking for a new job, and
entry-level people straight out of college. (Depending on your
environment, the stats could easily be skewed either direction.) Most
entry-level programmers are not going to have experience with writing
decorator functions - but they don't need it. (Not that it's that
hard. They're functions that accept and return functions, that's all.)
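
For instance, here's a minimal sketch of one, using nothing but the
standard library:

import functools

def logged(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

add(1, 2)  # prints "calling add", returns 3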

> This is easy but time-consuming; I could roll my implementation and
> showcase a few benchmarks. I am very confident that is the case. I would
> have bet that I would need to do it at some point, but I wanted to poll
> opinions a bit beforehand.
>
> About your bench: I don't really know why you are surprised the for loop is
> slower. That's the result of the comprehension being native while the for
> loop isn't. That does not mean writing to a copy saves time in exchange for
> memory. In my opinion, the fact that we will win something is guaranteed,
> because we save a copy call and do the exact same operation. There is no
> cache-magic optimization that could happen, because the mapping needs to
> read the first list anyway. Nor do we need a temporary buffer to cache
> operations, since they are independent. Really, I am ready to make a
> serious bet.

Okay! *Do* make that bet. Gamble the most precious currency there is:
developer time. How much are you prepared to bet? Two hours? Ten
hours? Spend that much time writing the implementation and
benchmarking it. If you're confident that this really is such a win,
the time won't be lost - you'll end up with a viable patch for
inclusion. If you lose the bet, well, that's what betting is like.
Sometimes you lose.

I may sound cynical and critical from the above ("show me"), but in
reality, I would love to see the results of your benchmark. Improving
performance is a good thing. I just want to know that it's actually
going to happen.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-05 Thread Arthur Havlicek
2016-11-05 9:42 GMT+01:00 Steve D'Aprano :

>
> I don't know who you are quoting there. It is considered impolite to quote
> people without giving attribution, and makes it harder to respond.
>

My bad. I was unaware of that. This was quoted from Ned Batchelder's mail.


2016-11-05 9:42 GMT+01:00 Steve D'Aprano :

> I trust that isn't actually the case, but by describing decorators and
> lambda as too fancy to use, you're giving a good impression of somebody who
> doesn't know what they're doing.
>

I was being a bit humorous here, making fun of the fact that I do not write
anything sophisticated.

However, the fact that we do avoid them is entirely true! We do so because
these concepts are foreign to the average developer (lambda is a bit
apart: it's easily understood, but functional constructs may not be, so
it's a result of us avoiding functional constructs as well).

Pick 10 programmers for hire and count how many know how to write a
decorator. If you have specified you needed Python specialists, you may
have 3-4. If not, you are lucky to find even one. And where I live we don't
have the luxury of finding competent Python experts by kicking a tree. In
our open space, we are 6 devs (+ one ex-dev being my manager), and only 2
had previous Python experience: me and the CEO. And the other 4 are doing
fine! Locking ourselves into Python-specific semantics will, over time,
make us lose precious dev time. The best code is the one everyone can
understand. That is why I call this "complex, non-obvious", and my manager
wouldn't use such kind words (but he is a Java fan, so he doesn't count.)


2016-11-05 9:42 GMT+01:00 Steve D'Aprano :

> - you would have to demonstrate a good reason to think that this new
> feature
> will actually be more efficient and faster
>

This is easy but time-consuming; I could roll my implementation and
showcase a few benchmarks. I am very confident that is the case. I would
have bet that I would need to do it at some point, but I wanted to poll
opinions a bit beforehand.


2016-11-05 9:42 GMT+01:00 Steve D'Aprano :

> In other words, in my opinion:
>
> - you would need to prove that there is a widespread need for in-place
> modification of lists, where the lists are big enough that using a
> temporary list is not practical
>

However, this is tricky; I may very well be an exception.

About your bench: I don't really know why you are surprised the for loop is
slower. That's the result of the comprehension being native while the for
loop isn't. That does not mean writing to a copy saves time in exchange for
memory. In my opinion, the fact that we will win something is guaranteed,
because we save a copy call and do the exact same operation. There is no
cache-magic optimization that could happen, because the mapping needs to
read the first list anyway. Nor do we need a temporary buffer to cache
operations, since they are independent. Really, I am ready to make a
serious bet.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-05 Thread Steve D'Aprano
I've been giving your proposal a bit more thought, and while I can't say I'm
really keen on the idea, I have warmed slightly to it.


On Fri, 4 Nov 2016 07:29 am, Arthur Havlicek wrote:

> I understand that the cost of change is such that it's very unlikely
> something like this ever goes into Python, but I feel like the interest of
> the proposal is being underestimated here; that's why I'm going to
> argue a few points and give a bit more context as needed.
> 
>> While mapping and filtering are common operations, I'm not sure mapping
>> and filtering and then reassigning back to the original sequence is
>> especially common.
> 
> It depends on your context. In the last 3 months, I stumbled across this
> at least 3 times,

I don't know who you are quoting there. It is considered impolite to quote
people without giving attribution, and makes it harder to respond. But for
what it's worth, I agree with this person.

In my code, it is quite common for me to map back to the same *variable*,
for example I might write something like this:

def process(alist):
    alist = [float(x) for x in alist]
    ...


But that is not the same as *in-place* modification of the list. I would
very rarely modify the list in place, but if I did, I would write it like
this:

alist[:] = [float(x) for x in alist]
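
The difference shows up whenever the list has other references; a quick
sketch contrasting the two spellings:

def rebind(alist):
    alist = [float(x) for x in alist]   # rebinds the local name only

def inplace(alist):
    alist[:] = [float(x) for x in alist]  # mutates the caller's list

data = [1, 2, 3]
rebind(data)
print(data)   # [1, 2, 3] -- unchanged
inplace(data)
print(data)   # [1.0, 2.0, 3.0] -- modified in place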


In my experience, wanting to modify a list in-place without creating a
temporary version first is unusual. And *needing* to do so (rather than
doing so as premature optimization) is even rarer.

That's my experience. But if you can demonstrate the need for in-place
modifications, that changes the situation somewhat.


> which is 3 times more than I used a lambda or a
> metaclass or a decorator or other fancy language feature that we simply
> avoid whenever possible.

You aren't helping your case by describing "lambda or ... decorator" as
fancy language features to be avoided.

The impression that gives (hopefully this is the wrong impression!) is that
you are a barely competent Python developer, fearful and suspicious of some
rather simple, powerful and widely-used features, stuck in an ancient 1980s
programming paradigm.

I trust that isn't actually the case, but by describing decorators and
lambda as too fancy to use, you're giving a good impression of somebody who
doesn't know what they're doing.


[...]
> The reason I'm especially impacted by this is that I am the
> maintainer of a decent-sized Python application (~50-100K lines of code)
> that extensively uses lists and dictionaries. We value "low level" data
> manipulation and efficiency a lot more than complex, non-obvious
> constructs.

And what do you consider "complex, non-obvious"?


[...]
> Like most Python programmers, I'm not in the position of needing a
> performance boost very badly, but that does not mean I disregard
> performance entirely. The need for performance is not so binary that it
> either doesn't matter at all or is enough to motivate a rewrite.

Indeed.

But you're arguing for a new feature on the basis that it will boost
performance, without giving any reason to think that it actually will lead
to a performance improvement!

I know that there is a chicken-and-egg problem here. Nobody wants to spend
time writing a new feature if there's no possible chance for it to be
accepted, but when your sole argument for a new feature is that it will be
more efficient (in both memory and time!), then you need to prove that is
the case!

In other words, in my opinion:

- you would need to prove that there is a widespread need for in-place
modification of lists, where the lists are big enough that using a
temporary list is not practical;

- you would have to demonstrate a good reason to think that this new feature
will actually be more efficient and faster

before this feature would be seriously considered for addition to the
language.

You might think that it is obvious that in-place modification MUST be faster
since it avoids making a temporary copy, but that's not obviously true at
all! Not for a high-level language like Python. It's often the case that
algorithms can exchange space for time (use more memory to save time, or
use more time to save memory). It may be that this is one of those times.

Even when doing (nearly) everything in pure Python, making a new list and
then doing a slice assignment is virtually as fast as writing directly to
the original list:

py> from timeit import Timer
py> setup = "data = list(range(1))"
py> t1 = Timer("""for i, a in enumerate(data):
...     data[i] = a+1
... """, setup=setup)
py> 
py> t2 = Timer("""tmp = [None]*len(data)
... for i, a in enumerate(data):
...     tmp[i] = a+1
... data[:] = tmp
... """, setup=setup)
py>
py> min(t1.repeat(number=1, repeat=7))
36.63875983003527
py>
py> min(t2.repeat(number=1, repeat=7))
37.70047474466264


Taking the plain Python for-loop, modifying the list directly in place, we
see that making a temporary list and then copying it over the original list
is just 3% slower.

Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-04 Thread arthurhavlicek
> I don't think that justifies the claim of "especially
> bad", which to me implies something much worse.

Quicksort built its popularity by performing "a mere factor of two" better 
than mergesort and heapsort. It became the reference sorting algorithm 
even though its worst-case complexity is worse than its competitors'.

You can take issue with my calling a factor of two especially bad, and 
you are right that it is hyperbole in the general case, but I am still 
right in the specific context of a frequently used linear algorithm.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-04 Thread Chris Angelico
On Sat, Nov 5, 2016 at 11:42 AM, Steve D'Aprano
 wrote:
> On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:
>
> [...]
>> List comps themselves involve one function call (zero in Py2). What
>> you do inside the expression is your business. Do you agree that list
>> comps don't have the overhead of opening and closing files?
>
> /tongue firmly in cheek
>
> I'd like to see you run a Python script containing a list comprehension
> without opening and closing the .py file.
>
> :-P

You got me! Let's call it a night.
-- Kristoff (nearly my namesake)

> Okay, I see what you are getting at now: in CPython 3, list comprehensions
> are implemented in such a way that the list comprehension requires a
> minimum of one function call, while list(map(func, iterable))) requires a
> minimum of O(N)+2 function calls: a call to list, a call to map, and a call
> to func for each of N values.
>
> That's *technically* true, but it is an implementation detail. I'm sure that
> some day PyPy or Nuitka or maybe even CPython itself will start in-lining
> at least some functions (if PyPy doesn't already do so). That would be an
> obvious optimization to apply to map() and filter().

Mmmm, interesting. The fact still remains that map depends on some
kind of "object representation" of a block of code (since it's being
passed to some other function), where a comprehension can be
implemented as an actual expression. So either Python-the-language
needs a way to pass around lightweight blocks of code, or the
interpreter (for some instance of 'the interpreter') needs to
recognize the function calls and optimize them away, or list comps
will always have an inherent advantage over map.
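
One rough way to see this in CPython (a sketch; the exact bytecode varies
by version) is to disassemble both spellings:

import dis
dis.dis("[x*5 for x in data]")             # x*5 is compiled inline into the comp
dis.dis("list(map(lambda x: x*5, data))")  # map receives a function object; the
                                           # per-element calls happen at runtime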

> As far as the speed of map() versus list comps, my micro-benchmarks show
> that at least on my computer, map() can be marginally but consistently
> faster than a list comp once you equalise the cost of function calls, that
> is, if the list comp explicitly calls a function. I don't think that speed
> difference is significant, so let's just call them "equally fast" when
> comparing similar cases:
>
> [func(obj) for obj in iterable]
>
> map(func, iterable)

Right, which is why a lot of style guides recommend against the first
form, *in this specific instance*. Using map with a lambda function,
or a comprehension with nothing but a function call, is rightly called
out in code review.

> I didn't read the OP as making a specific claim about these two *specific*
> map and filter examples:
>
> lst = map (lambda x: x*5, lst)
> lst = filter (lambda x: x%3 == 1, lst)
>
> I read these as mere examples of a general claim that map and
> filter "perform especially bad in CPython compared to a comprehension".
> ... I don't think that justifies the claim of "especially
> bad", which to me implies something much worse. If you're going to describe
> a mere factor of two as "especially bad", what words do we have left for
> something that is ten thousand times slower?
>
> As the wisest man in the universe once said, hyperbole is the most terrible,
> awful crime against humanity.
>
> *wink*

Ah, now we get to the point where we disagreed. I was responding to a
misunderstanding of your position - I thought you disagreed that the
performance difference could even be significant, but you were arguing
against the "especially bad". Gotcha. In that case, I believe we're in
agreement; even a two-to-one difference isn't "especially bad" here,
and that would be an extreme case.

>> But this conclusion I agree with. There is a performance difference,
>> but it is not overly significant. Compared to the *actual work*
>> involved in the task (going through one list and doing some operation
>> on each element), the difference between map and a comprehension is
>> generally going to be negligible.
>
> I wouldn't go quite so far as to say "negligible" -- a factor of two speed
> up on a large list is not something to be sneezed at. But I think we're
> converging on agreement: list comps and map/filter typically have
> comparable performance, and as the cost of the work done increases, the
> extra overhead of a function call becomes less and less significant.

I said negligible because the factor of two disappears almost
completely when you add a large constant factor to it. What's the
performance of these?

def lie(n):
    """it's not a fib, honest"""
    if n < 2: return 3
    return lie(n-1) + lie(n-2)

mapped = list(map(lie, range(50)))
comprehended = [lie(x) for x in range(50)]
badly_mapped = list(map(lambda x: lie(x), range(50)))

By any reasonable metric, I would expect all three to have extremely
comparable performance. The difference between map's best case
(passing an existing function), a list comp, and map's worst case
(wrapping the function in a useless lambda function) might be two to
one, but it's negligible compared to the surpassing cost of those
massively-nested lies. Oh, what a tangled web we weave...

ChrisA
-- 

Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-04 Thread Steve D'Aprano
On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:

[...]
> List comps themselves involve one function call (zero in Py2). What
> you do inside the expression is your business. Do you agree that list
> comps don't have the overhead of opening and closing files?

/tongue firmly in cheek

I'd like to see you run a Python script containing a list comprehension
without opening and closing the .py file.

:-P


Okay, I see what you are getting at now: in CPython 3, list comprehensions
are implemented in such a way that the list comprehension requires a
minimum of one function call, while list(map(func, iterable)) requires a
minimum of O(N)+2 function calls: a call to list, a call to map, and a call
to func for each of N values.

That's *technically* true, but it is an implementation detail. I'm sure that
some day PyPy or Nuitka or maybe even CPython itself will start in-lining
at least some functions (if PyPy doesn't already do so). That would be an
obvious optimization to apply to map() and filter().

As far as the speed of map() versus list comps, my micro-benchmarks show
that at least on my computer, map() can be marginally but consistently
faster than a list comp once you equalise the cost of function calls, that
is, if the list comp explicitly calls a function. I don't think that speed
difference is significant, so let's just call them "equally fast" when
comparing similar cases:

[func(obj) for obj in iterable]

map(func, iterable)


But of course you're right that list comprehensions give you the opportunity
to avoid that function call -- at least in CPython. I already agreed with
that, but to emphasise what I've already agreed, I'll say it again :-)

If you can manually in-line func() as a single expression inside the list
comprehension:

[(spam + len(obj.eggs))*2 for obj in iterable]

then you can expect to save the cost of N function calls, which may be
significant. As I said earlier, that's why we have list comps.

(But on the other hand, if the expression is expensive enough, the cost of
an extra function call may be utterly insignificant.)

The point that I am trying to make is that none of these facts justifies the
claim that map() performs "especially bad" compared to list comprehensions.
According to my tests, *at worst* map() will be a bit better than half as
fast as a list comprehension, and at best just as fast if not slightly
faster.


>> Here's some timing results using 3.5 on my computer. For simplicity, so
>> folks can replicate the test themselves, here's the timing code:
>>
>>
>> from timeit import Timer
>> setup = """data = list(range(1))
>> def func(x):  # simulate some calculation
>>     return {x+1: x**2}
>> """
>> t1 = Timer('[func(a) for a in data]', setup=setup)
>> t2 = Timer('list(map(func, data))', setup=setup)
> 
> This is very different from the original example, about which the OP
> said that map performs badly, and you doubted it.

I didn't read the OP as making a specific claim about these two *specific*
map and filter examples:

lst = map (lambda x: x*5, lst)
lst = filter (lambda x: x%3 == 1, lst)

I read these as mere examples of a general claim that map and
filter "perform especially bad in CPython compared to a comprehension".

But just for the exercise, I repeated my benchmarks with these specific
examples, comparing:

list(map(lambda x: x*5, data))
[x*5 for x in data]

and 

list(filter(lambda x: x%3 == 1, data))
[x for x in data if x%3 == 1]

and again got a roughly factor of two performance difference, with the list
comp being faster. I don't think that justifies the claim of "especially
bad", which to me implies something much worse. If you're going to describe
a mere factor of two as "especially bad", what words do we have left for
something that is ten thousand times slower?

As the wisest man in the universe once said, hyperbole is the most terrible,
awful crime against humanity.

*wink*


[...]
> Thing is, this is extremely common. How often do you actually use a
> comprehension with something that is absolutely exactly a function
> call on the element in question?

"This" being something that can be in-lined in the body of the list comp.

Sure. I cheerfully acknowledge that list comps where you can write an
in-line expression are very common. That's the beauty of list comps!


[...]
> But this conclusion I agree with. There is a performance difference,
> but it is not overly significant. Compared to the *actual work*
> involved in the task (going through one list and doing some operation
on each element), the difference between map and a comprehension is
> generally going to be negligible.

I wouldn't go quite so far as to say "negligible" -- a factor of two speed
up on a large list is not something to be sneezed at. But I think we're
converging on agreement: list comps and map/filter typically have
comparable performance, and as the cost of the work done increases, the
extra overhead of a function call becomes less and less significant.

Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-04 Thread Arthur Havlicek
> If slice assignment is done as I hope, it will optimize remaining memory
> operations.

Bad news.

http://stackoverflow.com/questions/4948293/python-slice-assignment-memory-usage/4948508#4948508

> If you want something like C++ move semantics, use C++.

I don't see anything like this in my proposal. If any in-place operation is
"C++ semantics", how do you explain that there is already a bunch of in-place
operators stuffed into Python that have even less justification for their
existence than map or filter do, such as sort/sorted and reverse/reversed?
Especially troubling since the optimisation in a sort operation is likely
to be less significant than in a linear algorithm.
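
These existing pairs already embody the in-place/copy choice I am asking
for; a quick sketch:

lst = [3, 1, 2]
new = sorted(lst)   # copy: lst is untouched
lst.sort()          # in-place: lst itself is reordered
lst.reverse()       # in-place counterpart of reversed(lst)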

2016-11-04 1:03 GMT+01:00 Terry Reedy :

> On 11/3/2016 2:56 AM, arthurhavli...@gmail.com wrote:
>
>> lst = [ item for item in lst if predicate(item) ]
>> lst = [ f(item) for item in lst ]
>>
>> Both these expressions feature redundancy, lst occurs twice and item at
>> least twice. Additionally, the readability is hurt, because one has to dive
>> through the semantics of the comprehension to truly understand whether I am
>> filtering the list or remapping its values.
>>
> ...
>
>> Language support for these operations to be made in-place could improve
>> the efficiency of these operations through reduced use of memory.
>>
>
> We already have that: slice assignment with an iterator.
>
> lst[:] = (item for item in lst if predicate(item))
> lst[:] = map(f, lst)  # iterator in 3.x.
>
> To save memory, stop using unneeded temporary lists and use iterators
> instead.  If slice assignment is done as I hope, it will optimize remaining
> memory operations.  (But I have not read the code.) It should overwrite
> existing slots until either a) the iterator is exhausted or b) existing
> memory is used up.  When lst is both source and destination, only case a)
> can happen.  When it does, the list can be finalized with its new contents.
>
> As for timings.
>
> from timeit import Timer
> setup = """data = list(range(1))
> def func(x):
>     return x
> """
> t1a = Timer('data[:] = [func(a) for a in data]', setup=setup)
> t1b = Timer('data[:] = (func(a) for a in data)', setup=setup)
> t2a = Timer('data[:] = list(map(func, data))', setup=setup)
> t2b = Timer('data[:] = map(func, data)', setup=setup)
>
> print('t1a', min(t1a.repeat(number=500, repeat=7)))
> print('t1b', min(t1b.repeat(number=500, repeat=7)))
> print('t2a', min(t2a.repeat(number=500, repeat=7)))
> print('t2b', min(t2b.repeat(number=500, repeat=7)))
> #
> t1a 0.5675313005414555
> t1b 0.7034254675598604
> t2a 0.518128598520
> t2b 0.5196112759726024
>
> If f does more work, the % difference among these will decrease.
>
>
>
> --
> Terry Jan Reedy
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Terry Reedy

On 11/3/2016 2:56 AM, arthurhavli...@gmail.com wrote:


lst = [ item for item in lst if predicate(item) ]
lst = [ f(item) for item in lst ]

Both these expressions feature redundancy, lst occurs twice and item at least 
twice. Additionally, the readability is hurt, because one has to dive through 
the semantics of the comprehension to truly understand whether I am filtering 
the list or remapping its values.

...

Language support for these operations to be made in-place could improve the 
efficiency of these operations through reduced use of memory.


We already have that: slice assignment with an iterator.

lst[:] = (item for item in lst if predicate(item))
lst[:] = map(f, lst)  # iterator in 3.x.

To save memory, stop using unneeded temporary lists and use iterators 
instead.  If slice assignment is done as I hope, it will optimize remaining 
memory operations.  (But I have not read the code.) It should overwrite 
existing slots until either a) the iterator is exhausted or b) existing 
memory is used up.  When lst is both source and destination, only case 
a) can happen.  When it does, the list can be finalized with its new 
contents.


As for timings.

from timeit import Timer
setup = """data = list(range(1))
def func(x):
    return x
"""
t1a = Timer('data[:] = [func(a) for a in data]', setup=setup)
t1b = Timer('data[:] = (func(a) for a in data)', setup=setup)
t2a = Timer('data[:] = list(map(func, data))', setup=setup)
t2b = Timer('data[:] = map(func, data)', setup=setup)

print('t1a', min(t1a.repeat(number=500, repeat=7)))
print('t1b', min(t1b.repeat(number=500, repeat=7)))
print('t2a', min(t2a.repeat(number=500, repeat=7)))
print('t2b', min(t2b.repeat(number=500, repeat=7)))
#
t1a 0.5675313005414555
t1b 0.7034254675598604
t2a 0.518128598520
t2b 0.5196112759726024

If f does more work, the % difference among these will decrease.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Paul Rubin
arthurhavli...@gmail.com writes:
> I would gladly appreciate your feedback on this, regarding:
> 1 - Whether a similar proposition has been made
> 2 - If you find this of any interest at all
> 3 - If you have a suggestion for improving the proposal

Bleccch.  Might be ok as a behind-the-scenes optimization by the
compiler.  If you want something like C++ move semantics, use C++.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Chris Angelico
On Fri, Nov 4, 2016 at 4:00 AM, Steve D'Aprano
 wrote:
> On Fri, 4 Nov 2016 01:05 am, Chris Angelico wrote:
>
>> On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano
>>  wrote:
>>>> lst = map (lambda x: x*5, lst)
>>>> lst = filter (lambda x: x%3 == 1, lst)
>>>> And perform especially bad in CPython compared to a comprehension.
>>>
>>> I doubt that.
>>>
>>
>> It's entirely possible. A list comp involves one function call (zero
>> in Py2), but map/lambda involves a function call per element in the
>> list. Function calls have overhead.
>
> I don't know why you think that list comps don't involve function calls.

List comps themselves involve one function call (zero in Py2). What
you do inside the expression is your business. Do you agree that list
comps don't have the overhead of opening and closing files?

files = "/etc/hostname", "/etc/resolv.conf", ".bashrc"
contents = [open(fn).read() for fn in files]

In a comparison between comprehensions and map, this cost is
irrelevant, unless your point is that "they're all pretty quick".

> Here's some timing results using 3.5 on my computer. For simplicity, so
> folks can replicate the test themselves, here's the timing code:
>
>
> from timeit import Timer
> setup = """data = list(range(1))
> def func(x):  # simulate some calculation
>     return {x+1: x**2}
> """
> t1 = Timer('[func(a) for a in data]', setup=setup)
> t2 = Timer('list(map(func, data))', setup=setup)

This is very different from the original example, about which the OP
said that map performs badly, and you doubted it. In that example, the
list comp includes the expression directly, whereas map (by its
nature) must use a function. The "inline expression" of a
comprehension is more efficient than the "inline expression" of a
lambda function given to map.

> And here's the timing results on my machine:
>
> py> min(t1.repeat(number=1000, repeat=7))  # list comp
> 18.2571472954005
> py> min(t2.repeat(number=1000, repeat=7))  # map
> 18.157311914488673
>
> So there's hardly any difference, but map() is about 0.5% faster in this
> case.

Right. As has often been stated, map is perfectly efficient, *if* the
body of the comprehension would be simply a function call, nothing
more. You can map(str, stuff) to stringify a bunch of things. Nice.
But narrow, and not the code that was being discussed.

> Now, it is true that *if* you can write the list comprehension as a direct
> calculation in an expression instead of a function call:
>
> [a+1 for a in data]
>
> *and* compare it to map using a function call:
>
> map(lambda a: a+1, data)
>
>
> then the overhead of the function call in map() may be significant.

Thing is, this is extremely common. How often do you actually use a
comprehension with something that is absolutely exactly a function
call on the element in question?

> But the
> extra cost is effectively just a multiplier. It isn't going to change
> the "Big Oh" behaviour.

Sure it doesn't. In each case, the cost is linear. But the work is
linear, so I would expect the time to be linear.

> So map() here is less than a factor of two slower. I wouldn't call
> that "especially bad" -- often times, a factor of two is not important.
> What really hurts is O(N**2) performance, or worse.
>
> So at worst, map() is maybe half as fast as a list comprehension, and at
> best, it's perhaps a smidgen faster. I would expect around the same
> performance, differing only by a small multiplicative factor: I wouldn't
> expect one to be thousands or even tens of times slower than the other.

But this conclusion I agree with. There is a performance difference,
but it is not overly significant. Compared to the *actual work*
involved in the task (going through one list and doing some operation
> on each element), the difference between map and a comprehension is
generally going to be negligible.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Arthur Havlicek
I understand that the cost of change is such that it's very unlikely
something like this ever goes into Python, but I feel like the interest of
the proposal is being underestimated here; that's why I'm going to argue
a few points and give a bit more context as needed.

> While mapping and filtering are common operations, I'm not sure mapping
> and filtering and then reassigning back to the original sequence is
> especially common.

It depends on your context. In the last 3 months, I stumbled across this at
least 3 times, which is 3 times more than I used a lambda or a metaclass or
a decorator or other fancy language feature that we simply avoid whenever
possible. It also happened to my colleague. I remember these examples
because we had a bit of humour about how nice inlined ternaries inside
comprehensions can be, but I could be missing a lot more.

The reason I'm especially impacted by this is that I am the maintainer
of a decent-sized Python application (~50-100K lines of code) that
extensively uses lists and dictionaries. We value "low level" data
manipulation and efficiency a lot more than complex, non-obvious
constructs. In other contexts, it may be very different. Note that my
context is only relevant for illustration here; I don't expect a feature to
save me, since we are currently shipping to CentOS 6 and thus will not see
the light of Python 3.7 in the next 10 years (optimistic estimation).

> Arthur, I would suggest looking at what numpy and pandas do.

In my example context, their benefits can't apply, because I'm not going to
rewrite code that uses lists to use np.array instead, for example. Although
the performance boost is likely to be bigger if used properly, I would have
preferred a lesser boost that comes for (almost) free.

Like most Python programmers, I'm not in the position of needing a
performance boost very badly, but that does not mean I disregard performance
entirely. The need for performance is not so binary that it either doesn't
matter at all or is enough to motivate a rewrite.

> To my eyes, this proposed syntax is completely foreign

I must admit I don't have much imagination for syntax proposals... all that
mattered to me here was to make it clear you are doing an in-place
modification. Feel free to completely ignore that part. Any proposal is
welcome, of course.

About Readability & Redundancy

I have misused the terms here, but I wasn't expecting so much nitpicking. I
should have used the term maintainability, because that one is bland and
subjective enough that nobody would have noticed :D

How about "I find that cooler." Good enough?

In a less sarcastic tone:

What I truly meant here is that when you contain the behavior of your code
inside a specific keyword or syntax, you are making your intentions clear
to the reader. It may be harder for him to gain access to the knowledge in
the first place, but it makes things easier over time.

Famous example:

When you learned programming, you may have had no idea what "+=" was doing,
but now that you do, you probably rate the "a += 2" syntax as much better
than "a = a + 2". You save a token, but more importantly, you make your
intentions clearer, because "+=" rings a bell, while "=" is a more generic
syntax with a broader meaning.
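
And for lists, "+=" is already an in-place operation, which is exactly the
kind of spelling I am after. A quick check:

a = [1, 2]
b = a
a += [3]      # in-place: list.__iadd__, equivalent to a.extend([3])
print(b)      # [1, 2, 3] -- the same object was modified
a = a + [4]   # rebinding: builds a new list and rebinds the name
print(b)      # [1, 2, 3] -- b still refers to the old object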

> So map() here is less than a factor of two slower. I wouldn't call
> that "especially bad" -- often times, a factor of two is not important.
> What really hurts is O(N**2) performance, or worse.

When you evaluate your application's bottleneck, or in your average daily
algorithmic work, perhaps. But when we are talking about language core
features, we are not on the same scale. If a language X is 2 times faster
than a language Y at the same task, that's a huge selling point, and of
real importance.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Steve D'Aprano
On Fri, 4 Nov 2016 01:05 am, Chris Angelico wrote:

> On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano
>  wrote:
>>> lst = map (lambda x: x*5, lst)
>>> lst = filter (lambda x: x%3 == 1, lst)
>>> And perform especially bad in CPython compared to a comprehension.
>>
>> I doubt that.
>>
> 
> It's entirely possible. A list comp involves one function call (zero
> in Py2), but map/lambda involves a function call per element in the
> list. Function calls have overhead.

I don't know why you think that list comps don't involve function calls.

Here's some timing results using 3.5 on my computer. For simplicity, so
folks can replicate the test themselves, here's the timing code:


from timeit import Timer
setup = """data = list(range(1))
def func(x):  # simulate some calculation
    return {x+1: x**2}
"""
t1 = Timer('[func(a) for a in data]', setup=setup)
t2 = Timer('list(map(func, data))', setup=setup)



And here's the timing results on my machine:

py> min(t1.repeat(number=1000, repeat=7))  # list comp
18.2571472954005
py> min(t2.repeat(number=1000, repeat=7))  # map
18.157311914488673

So there's hardly any difference, but map() is about 0.5% faster in this
case.

Now, it is true that *if* you can write the list comprehension as a direct
calculation in an expression instead of a function call:

[a+1 for a in data]

*and* compare it to map using a function call:

map(lambda a: a+1, data)


then the overhead of the function call in map() may be significant. But the
extra cost is effectively just a multiplier. It isn't going to change
the "Big Oh" behaviour. Here's an example:

t3 = Timer('[a+1 for a in data]', setup=setup)
t4 = Timer('list(map(lambda a: a+1, data))', setup=setup)
extra = """from functools import partial
from operator import add
"""
t5 = Timer('list(map(partial(add, 1), data))', setup=setup+extra)

And the timing results:

py> min(t3.repeat(number=1000, repeat=7))  # list comp with expression
2.6977453008294106
py> min(t4.repeat(number=1000, repeat=7))  # map with function
4.280411267653108
py> min(t5.repeat(number=1000, repeat=7))  # map with partial
3.446241058409214

So map() here is less than a factor of two slower. I wouldn't call
that "especially bad" -- often times, a factor of two is not important.
What really hurts is O(N**2) performance, or worse.

So at worst, map() is maybe half as fast as a list comprehension, and at
best, it's perhaps a smidgen faster. I would expect around the same
performance, differing only by a small multiplicative factor: I wouldn't
expect one to be thousands or even tens of times slower than the other.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Terry Reedy

On 11/3/2016 4:29 AM, Steven D'Aprano wrote:


Nonsense. It is perfectly readable because it is explicit about what is being
done, unlike some magic method that you have to read the docs to understand
what it does.


Agreed.


A list comprehension or for-loop is more general and can be combined so you can
do both:

alist[:] = [func(x) for x in alist if condition(x)]


The qualifier 'list' is not needed.  The right hand side of slice 
assignment just has to be an iterable.  So a second iteration to build 
an intermediate list is not required.


alist[:] = (func(x) for x in alist if condition(x))

The parentheses around the generator expression are required here. 
(Steven, I know you know that, but not everyone else will.)


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Chris Angelico
On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano
 wrote:
>> lst = map (lambda x: x*5, lst)
>> lst = filter (lambda x: x%3 == 1, lst)
>> And perform especially bad in CPython compared to a comprehension.
>
> I doubt that.
>

It's entirely possible. A list comp involves one function call (zero
in Py2), but map/lambda involves a function call per element in the
list. Function calls have overhead.

Arthur, I would suggest looking at what numpy and pandas do. When
you're working with ridiculously large data sets, they absolutely
shine; and if you're not working with that much data, the performance
of map or a list comp is unlikely to be significant. If the numpy
folks have a problem that can't be solved without new syntax, then a
proposal can be brought to the core (like matmul, which was approved
and implemented).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread arthurhavlicek
Hi everybody,

I have an enhancement proposal for Python and, as suggested by PEP 1, am 
exposing a stub to the mailing list before possibly starting to write a PEP. 
This is my first message to the Python mailing list. I hope you will find 
this content of interest.

Python features a powerful and fast way to create lists through comprehensions. 
Because of their ease of use and efficiency through native implementation, they 
are an advantageous alternative to map, filter, and more. However, when used as 
a replacement for an in-place version of these functions, they are sub-optimal, 
and Python offers no alternative.

This lack of implementation has already been pointed out:
http://stackoverflow.com/questions/4148375/is-there-an-in-place-equivalent-to-map-in-python
Notice the OP's concerns in his comments on the replies to this one:
http://stackoverflow.com/questions/3000461/python-map-in-place
In this answer, developers are wondering about performance issues 
regarding both loop iteration and comprehension:
http://stackoverflow.com/a/1540069/3323394

I would suggest implementing a language-level, in-place version of them. There 
are several motivations for this:

1 - Code readability and reduced redundancy

lst = [ item for item in lst if predicate(item) ]
lst = [ f(item) for item in lst ]

Both these expressions feature redundancy, lst occurs twice and item at least 
twice. Additionally, the readability is hurt, because one has to dive through 
the semantics of the comprehension to truly understand whether I am filtering 
the list or remapping its values.

Map and filter, although they are more explicit, also feature redundancy. They 
look OK with a plain function:

lst = map (f, lst)
lst = filter (predicate, lst)

But they are less elegant when using an expression, as one has to wrap it in 
a lambda:

lst = map (lambda x: x*5, lst)
lst = filter (lambda x: x%3 == 1, lst)

And perform especially bad in CPython compared to a comprehension.

2 - Efficiency

Language support for these operations to be made in-place could improve the 
efficiency of these operations through reduced use of memory.


I would propose this syntax. (TODO: find appropriate keywords I guess):

lst.map x: x*5
lst.filter x: x%3 == 1

They can work for dictionaries too.

dct.map k,v: v*5
dct.filter k,v: k+v == 10
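
For comparison, a sketch of what these would replace today (rebinding, since
dicts have no slice assignment):

lst = [x*5 for x in lst]
lst = [x for x in lst if x % 3 == 1]
dct = {k: v*5 for k, v in dct.items()}
dct = {k: v for k, v in dct.items() if k + v == 10}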

The reasoning behind the need for a language-level approach is the need for an 
efficient implementation that would support giving an arbitrary expression and 
not only a function. Expressions are shorter and, I guess, would be more 
efficient.


I would gladly appreciate your feedback on this, regarding:
1 - Whether a similar proposition has been made
2 - If you find this of any interest at all
3 - If you have a suggestion for improving the proposal

Thanks for reading. Have a nice day
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Ned Batchelder
On Thursday, November 3, 2016 at 4:30:00 AM UTC-4, Steven D'Aprano wrote:
> On Thursday 03 November 2016 17:56, arthurhavli...@gmail.com wrote:
> > I would propose this syntax. (TODO: find appropriate keywords I guess):
> > 
> > lst.map x: x*5
> > lst.filter x: x%3 == 1
> 
> I think the chances of Guido accepting new syntax for something as trivial as 
> this with three existing solutions is absolutely zero.
> 
> I think the chances of Guido accepting new list/dict methods for in place map 
> and/or filter is a tiny bit higher than zero.


To my eyes, this proposed syntax is completely foreign. "lst.map" looks
like attribute or method access on lst, but there's no operation on the
result, or function call.  They implicitly assign back to lst, with no
recognizable syntax to indicate that they do (= or "as" is the usual
marker).

While mapping and filtering are common operations, I'm not sure mapping
and filtering and then reassigning back to the original sequence is
especially common.  It's certainly not common enough to deserve completely
new syntax when the existing methods already exist.

Part of the problem here is that you value explicitness, but also are
trying to reduce redundancy.  When you say that lst occurs twice in
your examples, what I see is that it occurs twice because it's being
used for two different things: it is the source of the data, and it is
also the target for the result. I think it is good to have it appear
twice in this case, especially since there's no reason to think it will
usually be used for both purposes.  How do I do exactly the same data
manipulation, but then assign it to lst2 because I belatedly realized
I wanted both the before and after list?  Under your proposed syntax,
I would have to completely rewrite the line to use a different filtering
mechanism because now I need to unbundle the filtering and the assignment.
That seems unfortunate.
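
To make that concrete, with the existing spelling the change is a one-name
edit (a sketch; f stands for whatever the mapping is):

lst = [f(item) for item in lst]    # source and target happen to coincide
lst2 = [f(item) for item in lst]   # want before and after? rename the target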

You've done the right thing by bringing the proposal here.  I think it
is useful to see how people approach the language, and where they want
changes.  Discussing the pros and cons is a good way to get a deeper
appreciation both for the language and the rationale for its design.
But I don't think this proposal has a chance of moving forward.

--Ned.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pre-pep discussion material: in-place equivalents to map and filter

2016-11-03 Thread Steven D'Aprano
On Thursday 03 November 2016 17:56, arthurhavli...@gmail.com wrote:

[...]
> Python features a powerful and fast way to create lists through
> comprehensions. Because of their ease of use and efficiency through native
> implementation, they are an advantageous alternative to map, filter, and
> more. However, when used as a replacement for an in-place version of these
> functions, they are sub-optimal, and Python offers no alternative.


Of course Python offers an alternative: the basic for loop.


for i, x in enumerate(alist):
    alist[i] = func(x)


Not enough of a one-liner for you? Write a helper function:

def inplace_map(func, alist):
    for i, x in enumerate(alist):
        alist[i] = func(x)


In practice, you may even find that except for the most enormous lists (so big 
that memory becomes an issue, so we're talking tens of millions of items) it 
will probably be faster to use a slice and a comprehension:

alist[:] = [func(x) for x in alist]


[...]
> Notice the concerns of OPs in his comments to replies in this one:
> http://stackoverflow.com/questions/3000461/python-map-in-place

What I saw was the OP's *premature optimization*:

"I wanted to use map for performance gains"

"I figured map would be quicker than the list comprehension way"

Although in fairness he does also say:

"I'm working on a big list, and the times we're talking about are seconds in 
difference, which clearly matters"

I'm not necessarily sure that seconds always matter -- if your code takes 90 
seconds, or 96 seconds, who is going to even notice? But let's assume he's 
right and a few seconds difference makes a real difference.

He has three obvious alternatives to waiting until Python 3.7 (the earliest 
such a new feature could be added), each sketched below:

- modify the list in place with a for-loop;
- slice assignment using map;
- slice assignment using list comprehension.
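
Concretely (a sketch, with func standing in for whatever the mapping is):

# 1. modify the list in place with a for-loop
for i, x in enumerate(lst):
    lst[i] = func(x)

# 2. slice assignment using map
lst[:] = map(func, lst)

# 3. slice assignment using a list comprehension
lst[:] = [func(x) for x in lst]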


> 1 - Code readability and reduced redundancy
> 
> lst = [ item for item in lst if predicate(item) ]
> lst = [ f(item) for item in lst ]
> 
> Both these expressions feature redundancy, lst occurs twice and item at least
> twice. 

That's not what redundancy means. "Redundancy" doesn't refer to the re-use of 
any arbitrary token. It means doing the same thing twice in two different 
places.


> Additionally, the readability is hurt, because one has to dive through
> the semantics of the comprehension to truly understand whether I am
> filtering the list or remapping its values.

Nonsense. It is perfectly readable because it is explicit about what is being 
done, unlike some magic method that you have to read the docs to understand 
what it does.

A list comprehension or for-loop is more general and can be combined so you can 
do both:

alist[:] = [func(x) for x in alist if condition(x)]



> Map and filter, although they are more explicit, 

*Less* explicit.

To most people, "map" means the thing that you follow when you are travelling 
in unfamiliar territory, and "filter" means the paper doohickey you put in your 
coffee machine to keep the ground up coffee from ending up in your cup.


> also feature redundancy.
> They look OK with a plain function:
> 
> lst = map (f, lst)
> lst = filter (predicate, lst)
> 
> But they are less elegant when using an expression, as one has to wrap it
> in a lambda:
> 
> lst = map (lambda x: x*5, lst)
> lst = filter (lambda x: x%3 == 1, lst)

And that's why we have list comprehensions.


> And perform especially bad in CPython compared to a comprehension.

I doubt that.



> 2 - Efficiency
> 
> Language support for these operations to be made in-place could improve the
> efficiency of these operations through reduced use of memory.

*shrug*

Saving memory sometimes costs time.


> I would propose this syntax. (TODO: find appropriate keywords I guess):
> 
> lst.map x: x*5
> lst.filter x: x%3 == 1

I think the chances of Guido accepting new syntax for something as trivial as 
this with three existing solutions is absolutely zero.

I think the chances of Guido accepting new list/dict methods for in place map 
and/or filter is a tiny bit higher than zero.


> The reasoning behind the need for a language-level approach is the need for
> an efficient implementation that would support giving an arbitrary
> expression and not only a function. 

We already have three of those: for-loops, list comprehensions, and map.




-- 
Steven
299792.458 km/s — not just a good idea, it’s the law!

-- 
https://mail.python.org/mailman/listinfo/python-list