Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Cameron Simpson

On 19Oct2018 23:07, Alan Gauld  wrote:

On 19/10/18 17:12, Pat Martin wrote:
TLDR; How do you figure out if code is inefficient (if it isn't 
necessarily obvious) and how do you find a more efficient solution?


Others have addressed most of the issues but I'd just add the caveat
that you can spend forever trying to make your code "more efficient".
Usually, in the real world, you don't need to.

So how do you figure out if your code is inefficient?
Run it and if it doesn't work fast enough for your purpose then
try to speed it up. If it does then move onto the next project.


how do I go about finding a more efficient solution?


Google, a good reference book and experience.


Particularly, get some knowledge of the cost of various algorithms and 
data structures. There are outright inefficient things to do, but many 
others have costs and benefits: things fast in time but costly in space, 
and vice versa, and things efficient for some kinds of data but known to 
be bad for others.


Have a quick read of this:

 https://en.wikipedia.org/wiki/Big_O_notation

which talks about a common notation for costs. Skip the formal section 
with the math and go to the Orders of Common Functions section.


 https://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions

You'll find people talk about big O notation a lot with algorithms and 
their efficiency. For example, a dict _lookup_ is O(1): constant time 
per lookup.  But that doesn't mean always use a dict, because there's a 
cost to creating one in the first place. What you use it for matters.



But always, always, measure because there has been more
time wasted on unnecessary optimisations than on any other
feature of programming.


I just want to add a little nuance to all of this, because it to me it 
reads a little like (a) if it works and you get things done in a timely 
manner it is efficient and (b) if it isn't fast enough you can always 
make it faster. Alan hasn't actually said that, but one could get that 
impression.


To my mind:

Alan's point about "fast enough" is perfectly valid: in the real world, 
if your problem has been solved then further work on efficiency might 
cost more to implement than you gain after doing it.


His other point about measurement is also very important. It is possible 
to guess incorrectly about where performance problems lie, particularly 
with higher level languages because their internals has costs not 
apparent to the eye (because you can't see the internals). So from an 
engineering point of view, it is important to measure where a 
problematic programme spends its time and devote effort to those parts 
consuming the most time.


The standard example is the tight loop:

 for a in sequence1:
   for b in sequence2:
 do thing A
 do thing B

The cost of "do thing A" is more important than the cost of "do thing B" 
because likely A runs many many times more often than B. So even if B 
has some known inefficiency you can see, which will take some work to 
improve, effort spent on A will probably be more useful (if that's 
feasible).


With a complex programme this isn't always obvious; in the example above 
the 2 pieces of code are side by side.


The point about measurement is that profiling tools are useful here: 
when you _don't_ have an obvious primary target to improve but do need 
to improve efficiency, a profiling tool can measure where a programme 
spends its time in aggregate. That can point the way to code more 
beneficial to inspect.


It at least gets you objective information about where your programme 
spends its time. It is limited by the data you give your programme: toy 
example input data are not as good as real world data.


Finally, some things are as efficient as they get. You _can't_ always 
make things even more efficient.


Cheers,
Cameron Simpson 
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Alan Gauld via Tutor
On 19/10/18 17:12, Pat Martin wrote:

> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?

Others have addressed most of the issues but I'd just add the caveat
that you can spend forever trying to make your code "more efficient".
Usually, in the real world, you don't need to.

So how do you figure out if your code is inefficient?
Run it and if it doesn't work fast enough for your purpose then
try to speed it up. If it does then move onto the next project.

> how do I go about finding a more efficient solution?

Google, a good reference book and experience.

But always, always, measure because there has been more
time wasted on unnecessary optimisations than on any other
feature of programming.


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] (no subject)

2018-10-19 Thread Albert-Jan Roskam



On 19 Oct 2018 18:09, Pat Martin  wrote:

TLDR; How do you figure out if code is inefficient (if it isn't necessarily
obvious) and how do you find a more efficient

-
Hi, check this: https://docs.python.org/3/library/profile.html. Also, the 
timeit module is handy for smaller snippets.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Peter Otten
Pat Martin wrote:

> So should we always use sets or dictionaries if possible? Are these more
> efficient because of not keeping track of their position?

Not always. If you want to know how often one entry/char occurs in a linear 
storage like a list, a string, or a file, then you can loop over the data:

num_c = some_string.count(c)  # the loop is hidden in the count() method


pairs = (line.partition("=")[::2] for line in file)
value = next(value for key, value in pairs if key == wanted_key)

However, if you expect frequent similar questions go with a lookup table:

freq = Counter(some_string)
for c in strings.ascii_lowercase:
print(c, "occurs", freq[c], "times")


lookup = dict(pairs)
for key in wanted_keys:
print(key, "is", lookup.get(key, "unknown"))

Before you start measure if the effort may be worthwhile, i. e. whether the 
linear approach causes a relevant delay. 

When you're done doublecheck if you got the desired improvement, i. e. 
whether you optimised the right place.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Pat Martin
So should we always use sets or dictionaries if possible? Are these more
efficient because of not keeping track of their position?

On Fri, Oct 19, 2018 at 10:47 AM Pat Martin  wrote:

> That's correct sorry, I reassigned with replace back to string2, I forgot
> that in my retyping of the snippet.
>
> That makes sense, when it comes to strings and it taking so long. Thanks
> for explaining that.
>
> On Fri, Oct 19, 2018 at 10:09 AM Peter Otten <__pete...@web.de> wrote:
>
>> Mats Wichmann wrote:
>>
>> > On 10/19/2018 10:12 AM, Pat Martin wrote:
>> >> Sorry my first email didn't have a subject line
>> >>
>> >> TLDR; How do you figure out if code is inefficient (if it isn't
>> >> necessarily obvious) and how do you find a more efficient solution?
>> >
>> > I think you've hit it in your last sentence ("except maybe write more
>> > code and get more experience"): experience will let you recognize
>> > patterns.
>> >
>> >> I use code wars sometimes to get some practice with Python, there was a
>> >> challenge to compare two strings and if string1 had enough characters
>> to
>> >> be rearranged to make string2 return True, otherwise return False.
>> >>
>> >> I wrote a script that was like this:
>> >>
>> >> for i in string1:
>> >> if i not in string2:
>> >> return False
>> >> string2.replace(i,"",1)
>> >> return True
>> >>
>> >> This worked but I kept getting that my function was too inefficient and
>> >> it took too long. I did a search for the problem and found someone was
>> >> using collections.Counter. This basically takes the string and returns
>> >> the number of times each character occurs in it. Then just compare the
>> >> count of one string to another to see if there is enough of each letter
>> >> to make the other string. This seems like an elegant way to do it.
>> >
>> > notwithstanding that the challenge is a little contrived... here's
>> > something you will come to recognize.  You use the string replace
>> > method, which is described thus, pasting from the official docs:
>> >
>> > """
>> > str.replace(old, new[, count])
>> >
>> > Return a copy of the string with all occurrences of substring old
>> > replaced by new. If the optional argument count is given, only the first
>> > count occurrences are replaced.
>> > """
>> >
>> > Strings are not modifiable, so there is no replace-in-place.  Each time
>> > you call string2.replace you build a new string... and then do nothing
>> > with that string, as you never assign a name to it. So that line clearly
>> > makes your code inefficient - you do some work that is just thrown away.
>> > And it's not clear what the purpose of this replacement is, anyway.
>>
>> This is probably retyped from memory. If the line were
>>
>> string2 = string2.replace(i,"",1)
>>
>> it would address your concern below about repeated chars.
>>
>> >>> def contains(s, t):
>> ... for c in s:
>> ... if c not in t: return False
>> ... t = t.replace(c, "", 1)  # remove one occurence of c from t
>> ... return True
>> ...
>> >>> contains("ata", "attax")
>> True
>> >>> contains("tata", "attax")
>> True
>> >>> contains("tatat", "attax")  # not enough 't's
>> False
>>
>> > Further it's not clear that you have the right answer.  What if string1
>> > contains one 'r' and string2 contains three?  For the one 'r', the test
>> > that it is also in string2 succeeds... but nowhere do you find out that
>> > you actually needed to have three in order to be able to "rearrange to
>> > make string2".
>> >
>> > Solving this problem might benefit from thinking about tests first: if
>> > you write a test where you know the answer is going to be negative, a
>> > second where it is going to be positive, and a third that is a
>> > non-trivial instance of positive (like the multiple-letter count), then
>> > you have something to code your solution against. After it works, you
>> > can then think about refactoring: is there a better way? This will kind
>> > of naturally lead you to thinking in terms of efficiency.
>> >
>> > Lesson 2: for very many tasks, someone has already done it, and you can
>> > benefit by using some existing code, from the standard library or a
>> > module installed separately.  That's often the difference between a
>> > programming challenge, which may want you to code it yourself to show
>> > you can, and real-world problem solving, where you are rewarded (in
>> > time) by efficiently reusing existing code.
>> >
>> >
>> >
>> > ___
>> > Tutor maillist  -  Tutor@python.org
>> > To unsubscribe or change subscription options:
>> > https://mail.python.org/mailman/listinfo/tutor
>>
>>
>> ___
>> Tutor maillist  -  Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:

Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Pat Martin
That's correct sorry, I reassigned with replace back to string2, I forgot
that in my retyping of the snippet.

That makes sense, when it comes to strings and it taking so long. Thanks
for explaining that.

On Fri, Oct 19, 2018 at 10:09 AM Peter Otten <__pete...@web.de> wrote:

> Mats Wichmann wrote:
>
> > On 10/19/2018 10:12 AM, Pat Martin wrote:
> >> Sorry my first email didn't have a subject line
> >>
> >> TLDR; How do you figure out if code is inefficient (if it isn't
> >> necessarily obvious) and how do you find a more efficient solution?
> >
> > I think you've hit it in your last sentence ("except maybe write more
> > code and get more experience"): experience will let you recognize
> > patterns.
> >
> >> I use code wars sometimes to get some practice with Python, there was a
> >> challenge to compare two strings and if string1 had enough characters to
> >> be rearranged to make string2 return True, otherwise return False.
> >>
> >> I wrote a script that was like this:
> >>
> >> for i in string1:
> >> if i not in string2:
> >> return False
> >> string2.replace(i,"",1)
> >> return True
> >>
> >> This worked but I kept getting that my function was too inefficient and
> >> it took too long. I did a search for the problem and found someone was
> >> using collections.Counter. This basically takes the string and returns
> >> the number of times each character occurs in it. Then just compare the
> >> count of one string to another to see if there is enough of each letter
> >> to make the other string. This seems like an elegant way to do it.
> >
> > notwithstanding that the challenge is a little contrived... here's
> > something you will come to recognize.  You use the string replace
> > method, which is described thus, pasting from the official docs:
> >
> > """
> > str.replace(old, new[, count])
> >
> > Return a copy of the string with all occurrences of substring old
> > replaced by new. If the optional argument count is given, only the first
> > count occurrences are replaced.
> > """
> >
> > Strings are not modifiable, so there is no replace-in-place.  Each time
> > you call string2.replace you build a new string... and then do nothing
> > with that string, as you never assign a name to it. So that line clearly
> > makes your code inefficient - you do some work that is just thrown away.
> > And it's not clear what the purpose of this replacement is, anyway.
>
> This is probably retyped from memory. If the line were
>
> string2 = string2.replace(i,"",1)
>
> it would address your concern below about repeated chars.
>
> >>> def contains(s, t):
> ... for c in s:
> ... if c not in t: return False
> ... t = t.replace(c, "", 1)  # remove one occurence of c from t
> ... return True
> ...
> >>> contains("ata", "attax")
> True
> >>> contains("tata", "attax")
> True
> >>> contains("tatat", "attax")  # not enough 't's
> False
>
> > Further it's not clear that you have the right answer.  What if string1
> > contains one 'r' and string2 contains three?  For the one 'r', the test
> > that it is also in string2 succeeds... but nowhere do you find out that
> > you actually needed to have three in order to be able to "rearrange to
> > make string2".
> >
> > Solving this problem might benefit from thinking about tests first: if
> > you write a test where you know the answer is going to be negative, a
> > second where it is going to be positive, and a third that is a
> > non-trivial instance of positive (like the multiple-letter count), then
> > you have something to code your solution against. After it works, you
> > can then think about refactoring: is there a better way? This will kind
> > of naturally lead you to thinking in terms of efficiency.
> >
> > Lesson 2: for very many tasks, someone has already done it, and you can
> > benefit by using some existing code, from the standard library or a
> > module installed separately.  That's often the difference between a
> > programming challenge, which may want you to code it yourself to show
> > you can, and real-world problem solving, where you are rewarded (in
> > time) by efficiently reusing existing code.
> >
> >
> >
> > ___
> > Tutor maillist  -  Tutor@python.org
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Peter Otten
Mats Wichmann wrote:

> On 10/19/2018 10:12 AM, Pat Martin wrote:
>> Sorry my first email didn't have a subject line
>> 
>> TLDR; How do you figure out if code is inefficient (if it isn't
>> necessarily obvious) and how do you find a more efficient solution?
> 
> I think you've hit it in your last sentence ("except maybe write more
> code and get more experience"): experience will let you recognize
> patterns.
> 
>> I use code wars sometimes to get some practice with Python, there was a
>> challenge to compare two strings and if string1 had enough characters to
>> be rearranged to make string2 return True, otherwise return False.
>> 
>> I wrote a script that was like this:
>> 
>> for i in string1:
>> if i not in string2:
>> return False
>> string2.replace(i,"",1)
>> return True
>> 
>> This worked but I kept getting that my function was too inefficient and
>> it took too long. I did a search for the problem and found someone was
>> using collections.Counter. This basically takes the string and returns
>> the number of times each character occurs in it. Then just compare the
>> count of one string to another to see if there is enough of each letter
>> to make the other string. This seems like an elegant way to do it.
> 
> notwithstanding that the challenge is a little contrived... here's
> something you will come to recognize.  You use the string replace
> method, which is described thus, pasting from the official docs:
> 
> """
> str.replace(old, new[, count])
> 
> Return a copy of the string with all occurrences of substring old
> replaced by new. If the optional argument count is given, only the first
> count occurrences are replaced.
> """
> 
> Strings are not modifiable, so there is no replace-in-place.  Each time
> you call string2.replace you build a new string... and then do nothing
> with that string, as you never assign a name to it. So that line clearly
> makes your code inefficient - you do some work that is just thrown away.
> And it's not clear what the purpose of this replacement is, anyway.

This is probably retyped from memory. If the line were

string2 = string2.replace(i,"",1)

it would address your concern below about repeated chars.

>>> def contains(s, t):
... for c in s:
... if c not in t: return False
... t = t.replace(c, "", 1)  # remove one occurence of c from t
... return True
... 
>>> contains("ata", "attax")
True
>>> contains("tata", "attax")
True
>>> contains("tatat", "attax")  # not enough 't's
False

> Further it's not clear that you have the right answer.  What if string1
> contains one 'r' and string2 contains three?  For the one 'r', the test
> that it is also in string2 succeeds... but nowhere do you find out that
> you actually needed to have three in order to be able to "rearrange to
> make string2".
> 
> Solving this problem might benefit from thinking about tests first: if
> you write a test where you know the answer is going to be negative, a
> second where it is going to be positive, and a third that is a
> non-trivial instance of positive (like the multiple-letter count), then
> you have something to code your solution against. After it works, you
> can then think about refactoring: is there a better way? This will kind
> of naturally lead you to thinking in terms of efficiency.
> 
> Lesson 2: for very many tasks, someone has already done it, and you can
> benefit by using some existing code, from the standard library or a
> module installed separately.  That's often the difference between a
> programming challenge, which may want you to code it yourself to show
> you can, and real-world problem solving, where you are rewarded (in
> time) by efficiently reusing existing code.
> 
> 
> 
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Peter Otten
Pat Martin wrote:

> Sorry my first email didn't have a subject line
> 
> TLDR; How do you figure out if code is inefficient (if it isn't
> necessarily obvious) and how do you find a more efficient solution?
> 
> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to
> be rearranged to make string2 return True, otherwise return False.
> 
> I wrote a script that was like this:
> 
> for i in string1:
> if i not in string2:
> return False
> string2.replace(i,"",1)
> return True
> 
> This worked but I kept getting that my function was too inefficient and it
> took too long. I did a search for the problem and found someone was using
> collections.Counter. This basically takes the string and returns the
> number of times each character occurs in it. Then just compare the count
> of one string to another to see if there is enough of each letter to make
> the other string. This seems like an elegant way to do it.
> 
> My question is, how do I know something is inefficient and more
> importantly how do I go about finding a more efficient solution?
> 
> I have just discovered dir() and it has really helped in finding methods
> for items but that doesn't help when finding actual packages especially if
> I don't know exactly what I am looking for.
> 
> Sorry about the long email, not sure if there is even an answer to this
> question except maybe write more code and get more experience?

In Python my rule of thumb is: replace lists and loops with dict and set 
lookups whenever possible. One small challenge with this pattern is that not 
all loops look like a loop:

char in some_string  # "slow"; have to loop through the whole string
char in set_of_chars # "fast"; calculate some hash value and then likely
 # jump to the right set entry



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Mats Wichmann
On 10/19/2018 10:12 AM, Pat Martin wrote:
> Sorry my first email didn't have a subject line
> 
> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?

I think you've hit it in your last sentence ("except maybe write more
code and get more experience"): experience will let you recognize patterns.

> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to be
> rearranged to make string2 return True, otherwise return False.
> 
> I wrote a script that was like this:
> 
> for i in string1:
> if i not in string2:
> return False
> string2.replace(i,"",1)
> return True
> 
> This worked but I kept getting that my function was too inefficient and it
> took too long. I did a search for the problem and found someone was using
> collections.Counter. This basically takes the string and returns the number
> of times each character occurs in it. Then just compare the count of one
> string to another to see if there is enough of each letter to make the
> other string. This seems like an elegant way to do it.

notwithstanding that the challenge is a little contrived... here's
something you will come to recognize.  You use the string replace
method, which is described thus, pasting from the official docs:

"""
str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the first
count occurrences are replaced.
"""

Strings are not modifiable, so there is no replace-in-place.  Each time
you call string2.replace you build a new string... and then do nothing
with that string, as you never assign a name to it. So that line clearly
makes your code inefficient - you do some work that is just thrown away.
And it's not clear what the purpose of this replacement is, anyway.

Further it's not clear that you have the right answer.  What if string1
contains one 'r' and string2 contains three?  For the one 'r', the test
that it is also in string2 succeeds... but nowhere do you find out that
you actually needed to have three in order to be able to "rearrange to
make string2".

Solving this problem might benefit from thinking about tests first: if
you write a test where you know the answer is going to be negative, a
second where it is going to be positive, and a third that is a
non-trivial instance of positive (like the multiple-letter count), then
you have something to code your solution against. After it works, you
can then think about refactoring: is there a better way? This will kind
of naturally lead you to thinking in terms of efficiency.

Lesson 2: for very many tasks, someone has already done it, and you can
benefit by using some existing code, from the standard library or a
module installed separately.  That's often the difference between a
programming challenge, which may want you to code it yourself to show
you can, and real-world problem solving, where you are rewarded (in
time) by efficiently reusing existing code.



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] How to find optimisations for code

2018-10-19 Thread Pat Martin
Sorry my first email didn't have a subject line

TLDR; How do you figure out if code is inefficient (if it isn't necessarily
obvious) and how do you find a more efficient solution?

I use code wars sometimes to get some practice with Python, there was a
challenge to compare two strings and if string1 had enough characters to be
rearranged to make string2 return True, otherwise return False.

I wrote a script that was like this:

for i in string1:
if i not in string2:
return False
string2.replace(i,"",1)
return True

This worked but I kept getting that my function was too inefficient and it
took too long. I did a search for the problem and found someone was using
collections.Counter. This basically takes the string and returns the number
of times each character occurs in it. Then just compare the count of one
string to another to see if there is enough of each letter to make the
other string. This seems like an elegant way to do it.

My question is, how do I know something is inefficient and more importantly
how do I go about finding a more efficient solution?

I have just discovered dir() and it has really helped in finding methods
for items but that doesn't help when finding actual packages especially if
I don't know exactly what I am looking for.

Sorry about the long email, not sure if there is even an answer to this
question except maybe write more code and get more experience?

Pat
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] (no subject)

2018-10-19 Thread Pat Martin
TLDR; How do you figure out if code is inefficient (if it isn't necessarily
obvious) and how do you find a more efficient solution?

I use code wars sometimes to get some practice with Python, there was a
challenge to compare two strings and if string1 had enough characters to be
rearranged to make string2 return True, otherwise return False.

I wrote a script that was like this:

for i in string1:
if i not in string2:
return False
string2.replace(i,"",1)
return True

This worked but I kept getting that my function was too inefficient and it
took too long. I did a search for the problem and found someone was using
collections.Counter. This basically takes the string and returns the number
of times each character occurs in it. Then just compare the count of one
string to another to see if there is enough of each letter to make the
other string. This seems like an elegant way to do it.

My question is, how do I know something is inefficient and more importantly
how do I go about finding a more efficient solution?

I have just discovered dir() and it has really helped in finding methods
for items but that doesn't help when finding actual packages especially if
I don't know exactly what I am looking for.

Sorry about the long email, not sure if there is even an answer to this
question except maybe write more code and get more experience?

Pat
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor