Re: Pre-pep discussion material: in-place equivalents to map and filter
On Sun, Nov 6, 2016 at 12:50 AM, Arthur Havlicek wrote:
> 2016-11-05 12:47 GMT+01:00 Chris Angelico :
>> On Sat, Nov 5, 2016 at 9:50 PM, Arthur Havlicek
>>
>> But here's the thing. For everyone who writes a decorator function,
>> there could be dozens who use it.
>
> The day that one guy leaves the team, suddenly you have code that's become
> a bit tricky to maintain.

True, and this can become a problem when those dozens have no comprehension of how these features work. But fortunately, all it takes is for one person to step up and learn how decorators are written, and the problem is solved. (And it's not that hard. We teach decorator authorship in our Web Programming In Python course. Not in a huge amount of detail, but enough that a student will be able to carefully tease apart the code of a decorator function and figure out what it's actually doing.)

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
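[For readers following along: a decorator really is just a function that accepts and returns a function, as Chris says elsewhere in the thread. A minimal sketch of my own, not from the course material mentioned above:]

```python
import functools

def counted(func):
    """A minimal decorator: wraps func and counts how often it is called."""
    @functools.wraps(func)  # preserve func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def double(x):
    return x * 2

print(double(3), double(4))  # 6 8
print(double.calls)          # 2
```

Teasing this apart, as a student would: `@counted` just means `double = counted(double)`, so the name `double` ends up bound to `wrapper`.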
Re: Pre-pep discussion material: in-place equivalents to map and filter
2016-11-05 12:47 GMT+01:00 Chris Angelico :
> On Sat, Nov 5, 2016 at 9:50 PM, Arthur Havlicek
>
> But here's the thing. For everyone who writes a decorator function,
> there could be dozens who use it.

The day that one guy leaves the team, suddenly you have code that's become a bit tricky to maintain. Our philosophy is that everyone should be able to contribute anywhere, although I agree with you: there is such a thing as using a decorator as a black box, and even diving into it isn't that hard.

Notice I didn't say we don't use these features at all; we just tend to avoid them when a more familiar approach exists. Even so, maybe we are wrong to do so, and the code would be clearer if we used more features. Maybe we just lack the expertise and we are indeed stuck in an old mindset, after all. Anyway, I have some directives to make my code easy to access and must work within what I would call industrial constraints.

As a gambler I can't refuse that bet proposal, no sir :) Expect me to come back with a piece of code and/or questions about the CPython engine.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Sat, Nov 5, 2016 at 9:50 PM, Arthur Havlicek wrote:
> Pick 10 programmers for hire and count how many know how to write a
> decorator. If you have specified you needed python specialists, you may
> have 3-4. If not, you are lucky to find even one.

By "write a decorator", I presume you mean implement the decorator function, because anyone who's used Flask will have used "@app.route", for instance. But here's the thing. For everyone who writes a decorator function, there could be dozens who use it. How many people *on this mailing list* know how to implement namedtuple, vs how many know how to use it? How many know how to properly implement a cache, vs how many can slap "@lru_cache()" on top of a function? For the most utterly extreme example possible, how many people in the world know how to encrypt data securely, vs how many know how to put "https:" into their browsers?

When you look at "programmers for hire", you get a small number of brilliant experts with huge amounts of experience, and the rest are split between mid-level people looking for a new job and entry-level people straight out of college. (Depending on your environment, the stats could easily be skewed either direction.) Most entry-level programmers are not going to have experience with writing decorator functions - but they don't need it. (Not that it's that hard. They're functions that accept and return functions, that's all.)

> This is easy but time-consuming, I could roll my implementation and
> showcase a few benchs. I am very confident that is the case. I would have
> bet that I would need to do it at some point, but I wanted to poll opinions
> a bit beforehand.
>
> About your bench: I don't really know why you are surprised the for loop is
> slower. That's the result of the comprehension being native while the for
> loop isn't. That does not mean writing to a copy would save time in
> exchange for memory.
> In my opinion, the fact that we will win something is guaranteed
> because we save a copy call and do the exact same operation. There is
> nothing like cache magic optimization that could happen, because the mapping
> needs to read the first list anyway. Nor do we need a temporary buffer to
> cache operations, since they are independent. Really, I am ready to make a
> serious bet.

Okay! *Do* make that bet. Gamble the most precious currency there is: developer time. How much are you prepared to bet? Two hours? Ten hours? Spend that much time writing the implementation and benchmarking it. If you're confident that this really is such a win, the time won't be lost - you'll end up with a viable patch for inclusion. If you lose the bet, well, that's what betting is like. Sometimes you lose.

I may sound cynical and critical from the above ("show me"), but in reality, I would love to see the results of your benchmark. Improving performance is a good thing. I just want to know that it's actually going to happen.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
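[As a concrete instance of the use-versus-implement gap Chris describes (my example, not his): slapping "@lru_cache()" on a function is one line, while implementing that cache correctly would take considerably more care.]

```python
from functools import lru_cache

# Using a cache someone else implemented: one decorator line.
@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion, made fast purely by the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040 -- instant; without the cache this is ~10^6 calls
```

The caller never needs to know how eviction, hashing of arguments, or thread-safety are handled inside `lru_cache`.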
Re: Pre-pep discussion material: in-place equivalents to map and filter
2016-11-05 9:42 GMT+01:00 Steve D'Aprano :
> I don't know who you are quoting there. It is considered impolite to quote
> people without giving attribution, and makes it harder to respond.

My bad. I was unaware of that. This was quoted from Ned Batchelder's mail.

2016-11-05 9:42 GMT+01:00 Steve D'Aprano :
> I trust that isn't actually the case, but by describing decorators and
> lambda as too fancy to use, you're giving a good impression of somebody who
> doesn't know what they're doing.

I was a bit humorous here, making fun of the fact that I do not write anything sophisticated. However, the fact that we do avoid them is entirely true! We do so because these concepts are foreign to the average developer (lambda being a bit apart: it's easily understood, but functional constructs may not be, so it's a result of us avoiding functional constructs as well).

Pick 10 programmers for hire and count how many know how to write a decorator. If you have specified you needed Python specialists, you may have 3-4. If not, you are lucky to find even one. And where I live we don't have the luxury of finding competent Python experts by kicking a tree. In our open space, we are 6 devs (+ one ex-dev being my manager); only 2 had previous Python experience: me and the CEO. And the other 4 are doing fine! Locking ourselves into Python-specific semantics will, over time, make us lose precious dev time. The best code is the one everyone can understand. That is why I describe this as "complex, non-obvious", and my manager wouldn't have such kind words (but he is a Java fan, so he doesn't count.)

2016-11-05 9:42 GMT+01:00 Steve D'Aprano :
> - you would have to demonstrate a good reason to think that this new
> feature will actually be more efficient and faster

This is easy but time-consuming; I could roll my implementation and showcase a few benchmarks. I am very confident that is the case.
I would have bet that I would need to do it at some point, but I wanted to poll opinions a bit beforehand.

2016-11-05 9:42 GMT+01:00 Steve D'Aprano :
> In other words, in my opinion:
>
> - you would need to prove that there is a widespread need for in-place
> modification of lists, where the lists are big enough that using a
> temporary list is not practical

However this is tricky; I may very well be an exception.

About your bench: I don't really know why you are surprised the for loop is slower. That's the result of the comprehension being native while the for loop isn't. That does not mean writing to a copy would save time in exchange for memory. In my opinion, the fact that we will win something is guaranteed because we save a copy call and do the exact same operation. There is nothing like cache magic optimization that could happen, because the mapping needs to read the first list anyway. Nor do we need a temporary buffer to cache operations, since they are independent. Really, I am ready to make a serious bet.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pre-pep discussion material: in-place equivalents to map and filter
I've been giving your proposal a bit more thought, and while I can't say I'm really keen on the idea, I have warmed slightly to it.

On Fri, 4 Nov 2016 07:29 am, Arthur Havlicek wrote:
> I understand that, the cost of change is such that it's very unlikely
> something like this ever goes into Python, but I feel like the interest of
> the proposition is being underestimated here, that's why I'm going to
> argue a few points and give a bit more context as needed.
>
>> While mapping and filtering are common operations, I'm not sure mapping
>> and filtering and then reassigning back to the original sequence is
>> especially common.
>
> It depends on your context. Over the last 3 months, I stumbled across this
> at least 3 times,

I don't know who you are quoting there. It is considered impolite to quote people without giving attribution, and makes it harder to respond.

But for what it's worth, I agree with this person. In my code, it is quite common for me to map back to the same *variable*, for example I might write something like this:

    def process(alist):
        alist = [float(x) for x in alist]
        ...

But that is not the same as *in-place* modification of the list. I would very rarely modify the list in place, but if I did, I would write it like this:

    alist[:] = [float(x) for x in alist]

In my experience, wanting to modify a list in-place without creating a temporary version first is unusual. And *needing* to do so (rather than doing so as premature optimization) is even rarer. That's my experience. But if you can demonstrate the need for in-place modifications, that changes the situation somewhat.

> which is 3 times more than I used a lambda or a
> metaclass or a decorator or other fancy language feature that we simply
> avoid whenever possible.

You aren't helping your case by describing "lambda or ... decorator" as fancy language features to be avoided. The impression that gives (hopefully this is the wrong impression!)
is that you are a barely competent Python developer, fearful and suspicious of some rather simple, powerful and widely-used features, stuck in an ancient 1980s programming paradigm. I trust that isn't actually the case, but by describing decorators and lambda as too fancy to use, you're giving a good impression of somebody who doesn't know what they're doing.

[...]
> The reason I'm being especially impacted by this is because I am
> maintainer of a decent-sized Python application (~50-100K lines of code)
> that extensively uses lists and dictionaries. We value "low level" data
> manipulation and efficiency a lot more than complex, non-obvious
> constructs.

And what do you consider "complex, non-obvious"?

[...]
> Like most Python programmers, I'm not in the case of needing a performance
> boost very badly, but that does not mean I disregard performance entirely.
> The need for performance is not so binary that it either doesn't matter at
> all or is enough to motivate a rewrite.

Indeed. But you're arguing for a new feature on the basis that it will boost performance, without giving any reason to think that it actually will lead to a performance improvement!

I know that there is a chicken-and-egg problem here. Nobody wants to spend time writing a new feature if there's no possible chance for it to be accepted, but when your sole argument for a new feature is that it will be more efficient (in both memory and time!), then you need to prove that is the case!

In other words, in my opinion:

- you would need to prove that there is a widespread need for in-place modification of lists, where the lists are big enough that using a temporary list is not practical;

- you would have to demonstrate a good reason to think that this new feature will actually be more efficient and faster

before this feature would be seriously considered for addition to the language.
You might think that it is obvious that in-place modification MUST be faster since it avoids making a temporary copy, but that's not obviously true at all! Not for a high-level language like Python. It's often the case that algorithms can exchange space for time (use more memory to save time, or use more time to save memory). It may be that this is one of those times.

Even when doing (nearly) everything in pure Python, making a new list and then doing a slice assignment is virtually as fast as writing directly to the original list:

py> from timeit import Timer
py> setup = "data = list(range(1))"
py> t1 = Timer("""for i, a in enumerate(data):
...     data[i] = a+1
... """, setup=setup)
py>
py> t2 = Timer("""tmp = [None]*len(data)
... for i, a in enumerate(data):
...     tmp[i] = a+1
... data[:] = tmp
... """, setup=setup)
py>
py> min(t1.repeat(number=1, repeat=7))
36.63875983003527
py>
py> min(t2.repeat(number=1, repeat=7))
37.70047474466264

Taking the plain Python for-loop, modifying the list directly in place, as the baseline, we see that making a temporary list and then copying it over the original list is just 3% slower.
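[Steve's distinction between rebinding a variable and modifying the list in place can be demonstrated directly. A small sketch of my own, not from his post; the identity checks show which list object the caller ends up with:]

```python
def rebind(alist):
    # Rebinds the *local name* only; the caller's list is untouched.
    alist = [float(x) for x in alist]
    return alist

def in_place(alist):
    # Slice assignment mutates the same list object the caller holds.
    alist[:] = [float(x) for x in alist]
    return alist

data = [1, 2, 3]
result = rebind(data)
print(data)            # [1, 2, 3] -- unchanged
print(result is data)  # False -- a new list was created

result = in_place(data)
print(data)            # [1.0, 2.0, 3.0] -- modified in place
print(result is data)  # True -- same object
```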
Re: Pre-pep discussion material: in-place equivalents to map and filter
> I don't think that justifies the claim of "especially
> bad", which to me implies something much worse.

Quicksort built its popularity on performing better than mergesort and heapsort by "a mere factor of two". It became the reference sorting algorithm even though its worst-case complexity is worse than its competitors'. You can object to the fact that I call a factor of two especially bad, and you are right that it is hyperbole in the general case, but I am still right in the specific context of a frequently used linear algorithm.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Sat, Nov 5, 2016 at 11:42 AM, Steve D'Aprano wrote:
> On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:
>
> [...]
>> List comps themselves involve one function call (zero in Py2). What
>> you do inside the expression is your business. Do you agree that list
>> comps don't have the overhead of opening and closing files?
>
> /tongue firmly in cheek
>
> I'd like to see you run a Python script containing a list comprehension
> without opening and closing the .py file.
>
> :-P

You got me! Let's call it a night. -- Kristoff (nearly my namesake)

> Okay, I see what you are getting at now: in CPython 3, list comprehensions
> are implemented in such a way that the list comprehension requires a
> minimum of one function call, while list(map(func, iterable)) requires a
> minimum of N+2 function calls: a call to list, a call to map, and a call
> to func for each of N values.
>
> That's *technically* true, but it is an implementation detail. I'm sure that
> some day PyPy or Nuitka or maybe even CPython itself will start in-lining
> at least some functions (if PyPy doesn't already do so). That would be an
> obvious optimization to apply to map() and filter().

Mmmm, interesting. The fact still remains that map depends on some kind of "object representation" of a block of code (since it's being passed to some other function), where a comprehension can be implemented as an actual expression. So either Python-the-language needs a way to pass around lightweight blocks of code, or the interpreter (for some instance of 'the interpreter') needs to recognize the function calls and optimize them away, or list comps will always have an inherent advantage over map.

> As far as the speed of map() versus list comps, my micro-benchmarks show
> that at least on my computer, map() can be marginally but consistently
> faster than a list comp once you equalise the cost of function calls, that
> is, if the list comp explicitly calls a function.
> I don't think that speed
> difference is significant, so let's just call them "equally fast" when
> comparing similar cases:
>
> [func(obj) for obj in iterable]
>
> map(func, iterable)

Right, which is why a lot of style guides recommend against the first form, *in this specific instance*. Using map with a lambda function, or a comprehension with nothing but a function call, is rightly called out in code review.

> I didn't read the OP as making a specific claim about these two *specific*
> map and filter examples:
>
> lst = map(lambda x: x*5, lst)
> lst = filter(lambda x: x%3 == 1, lst)
>
> I read these as mere examples of a general claim that map and
> filter "perform especially bad in CPython compared to a comprehension".
> ... I don't think that justifies the claim of "especially
> bad", which to me implies something much worse. If you're going to describe
> a mere factor of two as "especially bad", what words do we have left for
> something that is ten thousand times slower?
>
> As the wisest man in the universe once said, hyperbole is the most terrible,
> awful crime against humanity.
>
> *wink*

Ah, now we get to the point where we disagreed. I was responding to a misunderstanding of your position - I thought you disagreed that the performance difference could even be significant, but you were arguing against the "especially bad". Gotcha. In that case, I believe we're in agreement; even a two-to-one difference isn't "especially bad" here, and that would be an extreme case.

>> But this conclusion I agree with. There is a performance difference,
>> but it is not overly significant. Compared to the *actual work*
>> involved in the task (going through one list and doing some operation
>> on each element), the difference between map and a comprehension is
>> generally going to be negligible.
>
> I wouldn't go quite so far as to say "negligible" -- a factor of two speed
> up on a large list is not something to be sneezed at.
> But I think we're
> converging on agreement: list comps and map/filter typically have
> comparable performance, and as the cost of the work done increases, the
> extra overhead of a function call becomes less and less significant.

I said negligible because the factor of two disappears almost completely when you add a large constant factor to it. What's the performance of these?

def lie(n):
    """it's not a fib, honest"""
    if n < 2: return 3
    return lie(n-1) + lie(n-2)

mapped = list(map(lie, range(50)))
comprehended = [lie(x) for x in range(50)]
badly_mapped = list(map(lambda x: lie(x), range(50)))

By any reasonable metric, I would expect all three to have extremely comparable performance. The difference between map's best case (passing an existing function), a list comp, and map's worst case (wrapping the function in a useless lambda function) might be two to one, but it's negligible compared to the surpassing cost of those massively-nested lies. Oh, what a tangled web we weave...

ChrisA
--
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:

[...]
> List comps themselves involve one function call (zero in Py2). What
> you do inside the expression is your business. Do you agree that list
> comps don't have the overhead of opening and closing files?

/tongue firmly in cheek

I'd like to see you run a Python script containing a list comprehension without opening and closing the .py file.

:-P

Okay, I see what you are getting at now: in CPython 3, list comprehensions are implemented in such a way that the list comprehension requires a minimum of one function call, while list(map(func, iterable)) requires a minimum of N+2 function calls: a call to list, a call to map, and a call to func for each of N values.

That's *technically* true, but it is an implementation detail. I'm sure that some day PyPy or Nuitka or maybe even CPython itself will start in-lining at least some functions (if PyPy doesn't already do so). That would be an obvious optimization to apply to map() and filter().

As far as the speed of map() versus list comps, my micro-benchmarks show that at least on my computer, map() can be marginally but consistently faster than a list comp once you equalise the cost of function calls, that is, if the list comp explicitly calls a function. I don't think that speed difference is significant, so let's just call them "equally fast" when comparing similar cases:

[func(obj) for obj in iterable]

map(func, iterable)

But of course you're right that list comprehensions give you the opportunity to avoid that function call -- at least in CPython. I already agreed with that, but to emphasise what I've already agreed, I'll say it again :-) If you can manually in-line func() as a single expression inside the list comprehension:

[(spam + len(obj.eggs))*2 for obj in iterable]

then you can expect to save the cost of N function calls, which may be significant. As I said earlier, that's why we have list comps.
(But on the other hand, if the expression is expensive enough, the cost of an extra function call may be utterly insignificant.)

The point that I am trying to make is that none of these facts justifies the claim that map() performs "especially bad" compared to list comprehensions. According to my tests, *at worst* map() will be a bit better than half as fast as a list comprehension, and at best just as fast if not slightly faster.

>> Here's some timing results using 3.5 on my computer. For simplicity, so
>> folks can replicate the test themselves, here's the timing code:
>>
>> from timeit import Timer
>> setup = """data = list(range(1))
>> def func(x):  # simulate some calculation
>>     return {x+1: x**2}
>> """
>> t1 = Timer('[func(a) for a in data]', setup=setup)
>> t2 = Timer('list(map(func, data))', setup=setup)
>
> This is very different from the original example, about which the OP
> said that map performs badly, and you doubted it.

I didn't read the OP as making a specific claim about these two *specific* map and filter examples:

lst = map(lambda x: x*5, lst)
lst = filter(lambda x: x%3 == 1, lst)

I read these as mere examples of a general claim that map and filter "perform especially bad in CPython compared to a comprehension". But just for the exercise, I repeated my benchmarks with these specific examples, comparing:

list(map(lambda x: x*5, data))
[x*5 for x in data]

and

list(filter(lambda x: x%3 == 1, data))
[x for x in data if x%3 == 1]

and again got a roughly factor of two performance difference, with the list comp being faster. I don't think that justifies the claim of "especially bad", which to me implies something much worse. If you're going to describe a mere factor of two as "especially bad", what words do we have left for something that is ten thousand times slower?

As the wisest man in the universe once said, hyperbole is the most terrible, awful crime against humanity.

*wink*

[...]
> Thing is, this is extremely common.
> How often do you actually use a
> comprehension with something that is absolutely exactly a function
> call on the element in question?

"This" being something that can be in-lined in the body of the list comp. Sure. I cheerfully acknowledge that list comps where you can write an in-line expression are very common. That's the beauty of list comps!

[...]
> But this conclusion I agree with. There is a performance difference,
> but it is not overly significant. Compared to the *actual work*
> involved in the task (going through one list and doing some operation
> on each element), the difference between map and a comprehension is
> generally going to be negligible.

I wouldn't go quite so far as to say "negligible" -- a factor of two speed up on a large list is not something to be sneezed at. But I think we're converging on agreement: list comps and map/filter typically have comparable performance, and as the cost of the work done increases, the extra overhead of a function call becomes less and less significant.
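[For anyone wanting to replicate the factor-of-two comparison discussed above, here is a small harness of my own; the list size and repeat counts are my choices, not Steve's, and absolute timings will vary by machine:]

```python
from timeit import Timer

setup = "data = list(range(10000))"  # list size is an assumption of mine

t_map  = Timer("list(map(lambda x: x*5, data))", setup=setup)
t_comp = Timer("[x*5 for x in data]", setup=setup)

# Sanity check: both spellings compute the same result.
sample = list(range(10))
assert list(map(lambda x: x*5, sample)) == [x*5 for x in sample]

print("map+lambda:", min(t_map.repeat(number=100, repeat=5)))
print("list comp: ", min(t_comp.repeat(number=100, repeat=5)))
```

On the machines discussed in this thread, the comprehension version came out roughly twice as fast, since map here pays one lambda call per element.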
Re: Pre-pep discussion material: in-place equivalents to map and filter
> If slice assignment is done as I hope, it will optimize remaining memory
> operations.

Bad news. http://stackoverflow.com/questions/4948293/python-slice-assignment-memory-usage/4948508#4948508

> If you want something like C++ move semantics, use C++.

I don't see anything like this in my proposal. If any in-place operation is "C++ semantics", how do you explain that there is already a bunch of in-place operations stuffed into Python that have even less justification for their existence than map or filter does, such as sort/sorted and reverse/reversed? Especially troubling since the optimisation in a sort operation is likely to be less significant than in a linear algorithm.

2016-11-04 1:03 GMT+01:00 Terry Reedy :
> On 11/3/2016 2:56 AM, arthurhavli...@gmail.com wrote:
>
>> lst = [ item for item in lst if predicate(item) ]
>> lst = [ f(item) for item in lst ]
>>
>> Both these expressions feature redundancy: lst occurs twice and item at
>> least twice. Additionally, the readability is hurt, because one has to dive
>> through the semantics of the comprehension to truly understand whether I am
>> filtering the list or remapping its values.
> ...
>> A language support for these operations to be made in-place could improve
>> the efficiency of these operations through reduced use of memory.
>
> We already have that: slice assignment with an iterator.
>
> lst[:] = (item for item in lst if predicate(item))
> lst[:] = map(f, lst)  # iterator in 3.x.
>
> To save memory, stop using unneeded temporary lists and use iterators
> instead. If slice assignment is done as I hope, it will optimize remaining
> memory operations. (But I have not read the code.) It should overwrite
> existing slots until either a) the iterator is exhausted or b) existing
> memory is used up. When lst is both source and destination, only case a)
> can happen. When it does, the list can be finalized with its new contents.
>
> As for timings.
>
> from timeit import Timer
> setup = """data = list(range(1))
> def func(x):
>     return x
> """
> t1a = Timer('data[:] = [func(a) for a in data]', setup=setup)
> t1b = Timer('data[:] = (func(a) for a in data)', setup=setup)
> t2a = Timer('data[:] = list(map(func, data))', setup=setup)
> t2b = Timer('data[:] = map(func, data)', setup=setup)
>
> print('t1a', min(t1a.repeat(number=500, repeat=7)))
> print('t1b', min(t1b.repeat(number=500, repeat=7)))
> print('t2a', min(t2a.repeat(number=500, repeat=7)))
> print('t2b', min(t2b.repeat(number=500, repeat=7)))
> #
> t1a 0.5675313005414555
> t1b 0.7034254675598604
> t2a 0.518128598520
> t2b 0.5196112759726024
>
> If f does more work, the % difference among these will decrease.
>
> --
> Terry Jan Reedy
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: Pre-pep discussion material: in-place equivalents to map and filter
On 11/3/2016 2:56 AM, arthurhavli...@gmail.com wrote:

> lst = [ item for item in lst if predicate(item) ]
> lst = [ f(item) for item in lst ]
>
> Both these expressions feature redundancy: lst occurs twice and item at
> least twice. Additionally, the readability is hurt, because one has to dive
> through the semantics of the comprehension to truly understand whether I am
> filtering the list or remapping its values.
...
> A language support for these operations to be made in-place could improve
> the efficiency of these operations through reduced use of memory.

We already have that: slice assignment with an iterator.

lst[:] = (item for item in lst if predicate(item))
lst[:] = map(f, lst)  # iterator in 3.x.

To save memory, stop using unneeded temporary lists and use iterators instead. If slice assignment is done as I hope, it will optimize remaining memory operations. (But I have not read the code.) It should overwrite existing slots until either a) the iterator is exhausted or b) existing memory is used up. When lst is both source and destination, only case a) can happen. When it does, the list can be finalized with its new contents.

As for timings:

from timeit import Timer
setup = """data = list(range(1))
def func(x):
    return x
"""
t1a = Timer('data[:] = [func(a) for a in data]', setup=setup)
t1b = Timer('data[:] = (func(a) for a in data)', setup=setup)
t2a = Timer('data[:] = list(map(func, data))', setup=setup)
t2b = Timer('data[:] = map(func, data)', setup=setup)

print('t1a', min(t1a.repeat(number=500, repeat=7)))
print('t1b', min(t1b.repeat(number=500, repeat=7)))
print('t2a', min(t2a.repeat(number=500, repeat=7)))
print('t2b', min(t2b.repeat(number=500, repeat=7)))
#
t1a 0.5675313005414555
t1b 0.7034254675598604
t2a 0.518128598520
t2b 0.5196112759726024

If f does more work, the % difference among these will decrease.

--
Terry Jan Reedy
--
https://mail.python.org/mailman/listinfo/python-list
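[Terry's slice-assignment suggestion in action. A minimal sketch of my own; the identity check confirms the same list object is reused, which is the whole point of the idiom:]

```python
def predicate(item):
    return item % 2 == 0

lst = [1, 2, 3, 4, 5, 6]
before = id(lst)

# Filter in place: the generator is consumed into the existing list object.
lst[:] = (item for item in lst if predicate(item))
print(lst)                # [2, 4, 6]

# Map in place: map() returns an iterator in Python 3, no temporary list.
lst[:] = map(lambda x: x * 10, lst)
print(lst)                # [20, 40, 60]
print(id(lst) == before)  # True -- still the same list object throughout
```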
Re: Pre-pep discussion material: in-place equivalents to map and filter
arthurhavli...@gmail.com writes:
> I would gladly appreciate your returns on this, regarding:
> 1 - Whether a similar proposition has been made
> 2 - If you find this of any interest at all
> 3 - If you have a suggestion for improving the proposal

Bleccch. Might be ok as a behind-the-scenes optimization by the compiler. If you want something like C++ move semantics, use C++.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Fri, Nov 4, 2016 at 4:00 AM, Steve D'Aprano wrote:
> On Fri, 4 Nov 2016 01:05 am, Chris Angelico wrote:
>
>> On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano wrote:
>>>> lst = map (lambda x: x*5, lst)
>>>> lst = filter (lambda x: x%3 == 1, lst)
>>>> And perform especially bad in CPython compared to a comprehension.
>>>
>>> I doubt that.
>>
>> It's entirely possible. A list comp involves one function call (zero
>> in Py2), but map/lambda involves a function call per element in the
>> list. Function calls have overhead.
>
> I don't know why you think that list comps don't involve function calls.

List comps themselves involve one function call (zero in Py2). What you do inside the expression is your business. Do you agree that list comps don't have the overhead of opening and closing files?

files = "/etc/hostname", "/etc/resolv.conf", ".bashrc"
contents = [open(fn).read() for fn in files]

In a comparison between comprehensions and map, this cost is irrelevant, unless your point is that "they're all pretty quick".

> Here's some timing results using 3.5 on my computer. For simplicity, so
> folks can replicate the test themselves, here's the timing code:
>
> from timeit import Timer
> setup = """data = list(range(1))
> def func(x):  # simulate some calculation
>     return {x+1: x**2}
> """
> t1 = Timer('[func(a) for a in data]', setup=setup)
> t2 = Timer('list(map(func, data))', setup=setup)

This is very different from the original example, about which the OP said that map performs badly, and you doubted it. In that example, the list comp includes the expression directly, whereas map (by its nature) must use a function. The "inline expression" of a comprehension is more efficient than the "inline expression" of a lambda function given to map.
> And here's the timing results on my machine:
>
> py> min(t1.repeat(number=1000, repeat=7))  # list comp
> 18.2571472954005
> py> min(t2.repeat(number=1000, repeat=7))  # map
> 18.157311914488673
>
> So there's hardly any difference, but map() is about 0.5% faster in this
> case.

Right. As has often been stated, map is perfectly efficient, *if* the body of the comprehension would be simply a function call, nothing more. You can map(str, stuff) to stringify a bunch of things. Nice. But narrow, and not the code that was being discussed.

> Now, it is true that *if* you can write the list comprehension as a direct
> calculation in an expression instead of a function call:
>
> [a+1 for a in data]
>
> *and* compare it to map using a function call:
>
> map(lambda a: a+1, data)
>
> then the overhead of the function call in map() may be significant.

Thing is, this is extremely common. How often do you actually use a comprehension with something that is absolutely exactly a function call on the element in question?

> But the
> extra cost is effectively just a multiplier. It isn't going to change
> the "Big Oh" behaviour.

Sure it doesn't. In each case, the cost is linear. But the work is linear, so I would expect the time to be linear.

> So map() here is less than a factor of two slower. I wouldn't call
> that "especially bad" -- often times, a factor of two is not important.
> What really hurts is O(N**2) performance, or worse.
>
> So at worst, map() is maybe half as fast as a list comprehension, and at
> best, it's perhaps a smidgen faster. I would expect around the same
> performance, differing only by a small multiplicative factor: I wouldn't
> expect one to be thousands or even tens of times slower than the other.

But this conclusion I agree with. There is a performance difference, but it is not overly significant.
Compared to the *actual work* involved in the task (going through one
list and doing some operation on each element), the difference between
map and a comprehension is generally going to be negligible.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
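[Editorial aside: the per-element call overhead Chris describes is easy to measure directly. A minimal timeit sketch, for anyone who wants to reproduce it; the list size and the `x * 5` operation are arbitrary choices, and the absolute numbers will vary by machine and Python version:]

```python
from timeit import Timer

setup = "data = list(range(10000))"

# Comprehension: the expression x * 5 is inlined in the loop body.
t_comp = Timer('[x * 5 for x in data]', setup=setup)

# map + lambda: one Python-level function call per element.
t_map = Timer('list(map(lambda x: x * 5, data))', setup=setup)

print("comprehension:", min(t_comp.repeat(number=1000, repeat=5)))
print("map + lambda: ", min(t_map.repeat(number=1000, repeat=5)))
```

Both snippets build identical lists; whatever gap shows up is purely the function-call overhead under discussion.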
Re: Pre-pep discussion material: in-place equivalents to map and filter
I understand that: the cost of change is such that it's very unlikely
something like this ever goes into Python. But I feel the interest of the
proposition is being underestimated here, so I'm going to argue a few
points and give a bit more context as needed.

> While mapping and filtering are common operations, I'm not sure mapping
> and filtering and then reassigning back to the original sequence is
> especially common.

It depends on your context. In the last 3 months, I stumbled across this
at least 3 times, which is 3 times more than I used a lambda or a
metaclass or a decorator or any other fancy language feature that we
simply avoid whenever possible. It also happened to my colleague. I
remember these examples because we had a bit of humour about how nice
inlined ternaries inside comprehensions can be, but I could be missing a
lot more.

The reason I'm especially impacted by this is that I am the maintainer of
a decent-sized Python application (~50-100K lines of code) that
extensively uses lists and dictionaries. We value "low level" data
manipulation and efficiency a lot more than complex, non-obvious
constructs. In other contexts, it may be very different. Note that my
context is only relevant for illustration here; I don't expect a feature
to save me, since we are currently shipping to CentOS 6 and thus will not
see the light of Python 3.7 for the next 10 years (optimistic
estimation).

> Arthur, I would suggest looking at what numpy and pandas do.

In my example context, their benefits can't apply, because I'm not going
to rewrite code that uses lists to use np.array instead, for example.
Although the performance boost is likely to be bigger if used properly, I
would have preferred a lesser boost that comes for (almost) free. Like
most Python programmers, I don't need a performance boost very badly, but
that does not mean I disregard performance entirely.
The need for performance is not so binary that it either doesn't matter
at all or is enough to motivate a rewrite.

> To my eyes, this proposed syntax is completely foreign

I must admit I don't have much imagination for syntax proposals... all
that mattered to me here was to make it clear you are doing an in-place
modification. Feel free to completely ignore that part. Any proposal is
welcome, of course.

About Readability & Redundancy

I misused the terms here, but I wasn't expecting so much nitpicking. I
should have used the term "maintainability", because that one is bland
and subjective enough that nobody would have noticed :D How about "I find
that cooler." Good enough?

In a less sarcastic tone: what I truly meant here is that when you
contain the behavior of your code inside a specific keyword or syntax,
you are making your intentions clear to the reader. It may be harder for
him to gain access to the knowledge in the first place, but it makes
things easier over time.

Famous example: when you learned programming, you may have had no idea
what "+=" was doing, but now you do, and you probably rate the "a += 2"
syntax as much better than "a = a + 2". You save a token, but more
importantly, you make your intentions clearer, because "+=" rings a bell,
while "=" is a more generic syntax with a broader meaning.

> So map() here is less than a factor of two slower. I wouldn't call
> that "especially bad" -- often times, a factor of two is not important.
> What really hurts is O(N**2) performance, or worse.

When you evaluate your application bottleneck or your average daily
algorithmic work, perhaps. When regarding language core features, we are
not on the same scale. If a language X is 2 times faster than a language
Y at the same task, that's a huge seller, and is of real importance.
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Fri, 4 Nov 2016 01:05 am, Chris Angelico wrote:

> On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano wrote:
>>> lst = map (lambda x: x*5, lst)
>>> lst = filter (lambda x: x%3 == 1, lst)
>>> And perform especially bad in CPython compared to a comprehension.
>>
>> I doubt that.
>>
> It's entirely possible. A list comp involves one function call (zero
> in Py2), but map/lambda involves a function call per element in the
> list. Function calls have overhead.

I don't know why you think that list comps don't involve function calls.

Here's some timing results using 3.5 on my computer. For simplicity, so
folks can replicate the test themselves, here's the timing code:

from timeit import Timer
setup = """data = list(range(1))
def func(x):  # simulate some calculation
    return {x+1: x**2}
"""
t1 = Timer('[func(a) for a in data]', setup=setup)
t2 = Timer('list(map(func, data))', setup=setup)

And here's the timing results on my machine:

py> min(t1.repeat(number=1000, repeat=7))  # list comp
18.2571472954005
py> min(t2.repeat(number=1000, repeat=7))  # map
18.157311914488673

So there's hardly any difference, but map() is about 0.5% faster in this
case.

Now, it is true that *if* you can write the list comprehension as a
direct calculation in an expression instead of a function call:

[a+1 for a in data]

*and* compare it to map using a function call:

map(lambda a: a+1, data)

then the overhead of the function call in map() may be significant. But
the extra cost is effectively just a multiplier. It isn't going to change
the "Big Oh" behaviour.
Here's an example:

t3 = Timer('[a+1 for a in data]', setup=setup)
t4 = Timer('list(map(lambda a: a+1, data))', setup=setup)
extra = """from functools import partial
from operator import add
"""
t5 = Timer('list(map(partial(add, 1), data))', setup=setup+extra)

And the timing results:

py> min(t3.repeat(number=1000, repeat=7))  # list comp with expression
2.6977453008294106
py> min(t4.repeat(number=1000, repeat=7))  # map with function
4.280411267653108
py> min(t5.repeat(number=1000, repeat=7))  # map with partial
3.446241058409214

So map() here is less than a factor of two slower. I wouldn't call that
"especially bad" -- often times, a factor of two is not important. What
really hurts is O(N**2) performance, or worse.

So at worst, map() is maybe half as fast as a list comprehension, and at
best, it's perhaps a smidgen faster. I would expect around the same
performance, differing only by a small multiplicative factor: I wouldn't
expect one to be thousands or even tens of times slower than the other.

--
Steve

“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Re: Pre-pep discussion material: in-place equivalents to map and filter
On 11/3/2016 4:29 AM, Steven D'Aprano wrote:

> Nonsense. It is perfectly readable because it is explicit about what is
> being done, unlike some magic method that you have to read the docs to
> understand what it does.

Agreed.

> A list comprehension or for-loop is more general and can be combined so
> you can do both:
>
> alist[:] = [func(x) for x in alist if condition(x)]

The qualifier 'list' is not needed. The right hand side of a slice
assignment just has to be an iterable. So a second iteration to build an
intermediate list is not required:

alist[:] = (func(x) for x in alist if condition(x))

The parentheses around the generator expression are required here.
(Steven, I know you know that, but not everyone else will.)

--
Terry Jan Reedy
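[Editorial aside: to make the in-place behaviour concrete, here is a quick sketch, with `func` and `condition` as stand-ins, showing that slice assignment keeps the same list object, unlike rebinding the name:]

```python
def func(x):
    return x * 5

def condition(x):
    return x % 3 == 1

alist = list(range(10))
alias = alist                     # a second reference to the same list
original_id = id(alist)

# Slice assignment from a generator expression: alist is mutated
# in place rather than replaced by a new object.
alist[:] = (func(x) for x in alist if condition(x))

assert id(alist) == original_id   # still the same list object
assert alist == [5, 20, 35]       # 1, 4, 7 pass the filter, then *5
assert alias == [5, 20, 35]       # every alias sees the update
```

That last assertion is the whole point of doing it in place: other references to the list observe the change, which a plain `alist = [...]` rebinding would not provide.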
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano wrote:
>> lst = map (lambda x: x*5, lst)
>> lst = filter (lambda x: x%3 == 1, lst)
>> And perform especially bad in CPython compared to a comprehension.
>
> I doubt that.
>
It's entirely possible. A list comp involves one function call (zero in
Py2), but map/lambda involves a function call per element in the list.
Function calls have overhead.

Arthur, I would suggest looking at what numpy and pandas do. When you're
working with ridiculously large data sets, they absolutely shine; and if
you're not working with that much data, the performance of map or a list
comp is unlikely to be significant. If the numpy folks have a problem
that can't be solved without new syntax, then a proposal can be brought
to the core (like matmul, which was approved and implemented).

ChrisA
Pre-pep discussion material: in-place equivalents to map and filter
Hi everybody,

I have an enhancement proposal for Python and, as suggested by PEP 1, am
exposing a stub to the mailing list before possibly starting to write a
PEP. This is my first message to the python mailing list. I hope you will
find this content of interest.

Python features a powerful and fast way to create lists through
comprehensions. Because of their ease of use and efficiency through
native implementation, they are an advantageous alternative to map,
filter, and more. However, when used as a replacement for an in-place
version of these functions, they are sub-optimal, and Python offers no
alternative.

This lack of implementation has already been pointed out:
http://stackoverflow.com/questions/4148375/is-there-an-in-place-equivalent-to-map-in-python

Notice the concerns of the OP in his comments to replies in this one:
http://stackoverflow.com/questions/3000461/python-map-in-place

In this answer, developers are wondering about performance issues
regarding both loop iteration and comprehension:
http://stackoverflow.com/a/1540069/3323394

I would suggest implementing a language-level, in-place version of them.
There are several motivations for this:

1 - Code readability and reduced redundancy

lst = [ item for item in lst if predicate(item) ]
lst = [ f(item) for item in lst ]

Both these expressions feature redundancy: lst occurs twice and item at
least twice. Additionally, readability is hurt, because one has to dive
through the semantics of the comprehension to truly understand whether I
am filtering the list or remapping its values.

Map and filter, although they are more explicit, also feature redundancy.
They look OK with a function predicate:

lst = map (f, lst)
lst = filter (predicate, lst)

But they are less elegant when using an expression, when one has to
convert through a lambda:

lst = map (lambda x: x*5, lst)
lst = filter (lambda x: x%3 == 1, lst)

And perform especially bad in CPython compared to a comprehension.
2 - Efficiency

Language support for these operations to be made in-place could improve
their efficiency through reduced use of memory.

I would propose this syntax. (TODO: find appropriate keywords I guess):

lst.map x: x*5
lst.filter x: x%3 == 1

They can work for dictionaries too.

dct.map k,v: v*5
dct.filter k,v: k+v == 10

The reasoning for the need of a language-level approach is the need for
an efficient implementation that would support giving an arbitrary
expression and not only a function. Expressions are shorter and, I guess,
would be more efficient.

I would gladly appreciate your feedback on this, regarding:

1 - Whether a similar proposition has been made
2 - If you find this of any interest at all
3 - If you have a suggestion for improving the proposal

Thanks for reading. Have a nice day
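[Editorial aside: the proposed semantics can be approximated today with ordinary helper functions. This is only a sketch, with `inplace_map` and `inplace_filter` as made-up names, and it still pays a Python-level call per element, which is precisely the overhead the syntax proposal hopes to eliminate; only the memory behaviour matches:]

```python
def inplace_map(func, lst):
    """Replace each element of lst with func(element), mutating lst."""
    for i, x in enumerate(lst):
        lst[i] = func(x)

def inplace_filter(pred, lst):
    """Keep only the elements satisfying pred, mutating lst."""
    lst[:] = [x for x in lst if pred(x)]

lst = list(range(10))
inplace_map(lambda x: x * 5, lst)          # 0, 5, 10, ..., 45
inplace_filter(lambda x: x % 3 == 0, lst)  # keep multiples of 3
assert lst == [0, 15, 30, 45]
```

Unlike the proposed `lst.map x: x*5`, the helpers must take a function rather than a bare expression, so the lambda overhead remains.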
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Thursday, November 3, 2016 at 4:30:00 AM UTC-4, Steven D'Aprano wrote:
> On Thursday 03 November 2016 17:56, arthurhavli...@gmail.com wrote:
>> I would propose this syntax. (TODO: find appropriate keywords I guess):
>>
>> lst.map x: x*5
>> lst.filter x: x%3 == 1
>
> I think the chances of Guido accepting new syntax for something as
> trivial as this with three existing solutions is absolutely zero.
>
> I think the chances of Guido accepting new list/dict methods for in
> place map and/or filter is a tiny bit higher than zero.

To my eyes, this proposed syntax is completely foreign. "lst.map" looks
like attribute or method access on lst, but there's no operation on the
result, or function call. They implicitly assign back to lst, with no
recognizable syntax to indicate that they do (= or "as" is the usual
marker).

While mapping and filtering are common operations, I'm not sure mapping
and filtering and then reassigning back to the original sequence is
especially common. It's certainly not common enough to deserve completely
new syntax when the existing methods already exist.

Part of the problem here is that you value explicitness, but also are
trying to reduce redundancy. When you say that lst occurs twice in your
examples, what I see is that it occurs twice because it's being used for
two different things: it is the source of the data, and it is also the
target for the result. I think it is good to have it appear twice in this
case, especially since there's no reason to think it will usually be used
for both purposes.

How do I do exactly the same data manipulation, but then assign it to
lst2 because I belatedly realized I wanted both the before and after
list? Under your proposed syntax, I would have to completely rewrite the
line to use a different filtering mechanism, because now I need to
unbundle the filtering and the assignment. That seems unfortunate.

You've done the right thing by bringing the proposal here.
I think it is useful to see how people approach the language, and where
they want changes. Discussing the pros and cons is a good way to get a
deeper appreciation both for the language and the rationale for its
design. But I don't think this proposal has a chance of moving forward.

--Ned.
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Thursday 03 November 2016 17:56, arthurhavli...@gmail.com wrote:

[...]

> Python features a powerful and fast way to create lists through
> comprehensions. Because of their ease of use and efficiency through
> native implementation, they are an advantageous alternative to map,
> filter, and more. However, when used in replacement for an in-place
> version of these functions, they are sub-optimal, and Python offer no
> alternative.

Of course Python offers an alternative: the basic for loop.

for i, x in enumerate(alist):
    alist[i] = func(x)

Not enough of a one-liner for you? Write a helper function:

def inplace_map(func, alist):
    for i, x in enumerate(alist):
        alist[i] = func(x)

In practice, you may even find that except for the most enormous lists
(so big that memory becomes an issue, so we're talking tens of millions
of items) it will probably be faster to use a slice and a comprehension:

alist[:] = [func(x) for x in alist]

[...]

> Notice the concerns of OPs in his comments to replies in this one:
> http://stackoverflow.com/questions/3000461/python-map-in-place

What I saw was the OP's *premature optimization*:

"I wanted to use map for performance gains"

"I figured map would be quicker than the list comprehension way"

Although in fairness he does also say:

"I'm working on a big list, and the times we're talking about are seconds
in difference, which clearly matters"

I'm not necessarily sure that seconds always matter -- if your code takes
90 seconds, or 96 seconds, who is going to even notice? But let's assume
he's right and a few seconds' difference makes a real difference. He has
three obvious alternatives to waiting until Python 3.7 (the earliest such
a new feature can be added):

- modify the list in place with a for-loop;
- slice assignment using map;
- slice assignment using list comprehension.
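[Editorial aside: the three alternatives listed above all produce the same final list; here is a minimal check, with `func` as a stand-in for whatever per-element computation is needed:]

```python
def func(x):
    return x + 1

base = list(range(100))

a = base[:]                  # 1. modify the list in place with a for-loop
for i, x in enumerate(a):
    a[i] = func(x)

b = base[:]                  # 2. slice assignment using map
b[:] = map(func, b)

c = base[:]                  # 3. slice assignment using a list comprehension
c[:] = [func(x) for x in c]

assert a == b == c == list(range(1, 101))
```

They differ mainly in whether a second list is materialised during the update, which is what drives the memory argument in the rest of the thread.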
> 1 - Code readability and reduced redundancy
>
> lst = [ item for item in lst if predicate(item) ]
> lst = [ f(item) for item in lst ]
>
> Both these expressions feature redundancy, lst occurs twice and item at
> least twice.

That's not what redundancy means. "Redundancy" doesn't refer to the
re-use of any arbitrary token. It means doing the same thing twice in two
different places.

> Additionally, the readability is hurt, because one has to dive through
> the semantics of the comprehension to truely understand I am filtering
> the list or remapping its values.

Nonsense. It is perfectly readable because it is explicit about what is
being done, unlike some magic method that you have to read the docs to
understand what it does.

A list comprehension or for-loop is more general and can be combined so
you can do both:

alist[:] = [func(x) for x in alist if condition(x)]

> Map and filter, although they are more explicit,

*Less* explicit. To most people, "map" means the thing that you follow
when you are travelling in unfamiliar territory, and "filter" means the
paper doohickey you put in your coffee machine to keep the ground up
coffee from ending up in your cup.

> also feature redundancy. They look OK with functional predicate:
>
> lst = map (f, lst)
> lst = filter (predicate, lst)
>
> But are less elegant when using an expression, than one has to convert
> through a lambda:
>
> lst = map (lambda x: x*5, lst)
> lst = filter (lambda x: x%3 == 1, lst)

And that's why we have list comprehensions.

> And perform especially bad in CPython compared to a comprehension.

I doubt that.

> 2 - Efficiency
>
> A language support for these operations to be made in-place could
> improve the efficiency of this operations through reduced use of memory.

*shrug* Saving memory sometimes costs time.

> I would propose this syntax.
> (TODO: find appropriate keywords I guess):
>
> lst.map x: x*5
> lst.filter x: x%3 == 1

I think the chances of Guido accepting new syntax for something as
trivial as this with three existing solutions is absolutely zero.

I think the chances of Guido accepting new list/dict methods for in place
map and/or filter is a tiny bit higher than zero.

> The reasonning for the need of a language-level approach is the need
> for an efficient implementation that would support giving an arbitrary
> expression and not only a function.

We already have three of those: for-loops, list comprehensions, and map.

--
Steven

299792.458 km/s — not just a good idea, it’s the law!