Re: [Tutor] How to find optimisations for code

2018-10-20 Thread Avi Gross
There are many ways to do some things. I present a fairly straightforward
method for consideration. 

 

Synopsis. Use two dictionaries to keep track of how many times a letter can
be found in the source word and the target word.

 

Loop over the target letters and as soon as you find a letter that needs
more copies of that letter than are available in the source, return False.

 

Otherwise, return True.

 

Slight optimization, return False immediately if the target is too long.

 

Note: Remove the print statements used to show what it does when trying for
efficiency.

 

Note I do not claim this is fast. But for longer strings this requires one
pass over each string to set the dictionary counters and often less than one
pass over the target string to determine Truth. I can visualize many
variants but efficiency of each may vary with the data, version of Python
and so on. 

 

Python CODE using version 3.7.0 :

 

"""

Function to check if string str_target is a strict subset

of str_source - but with multiple copies of a letter being unique

in what is thus not a set. In English, can the letters in str_target

be found in the required quantities with possibly some left over

when examining str_source. For example, 'level' has two l's and two e's

and can be made from 'multilevel' but not from 'evil' which has the needed

e v and l but not two copies.

"""

 

def sub_word(str_source, str_target):

"""

Return True/False if letters in source are enough to make target

Method: make dictionaries of number of letters in each. See if enough.

"""

 

# Check for trivial False condition: word too long

if len(str_target) > len(str_source) : return False

 

# Initialize empty dictionaries to hold leter counts.

source = {}

target = {}

 

# Count letters in source thgen target, initializing

# to zero if not yet present.

for letter in str_source.lower():

   source[letter] = source.get(letter, 0) + 1

 

   

for letter in str_target.lower():

   target[letter] = target.get(letter, 0) + 1

 

# During bebug phase, show what the dictionaries contain

print("SOURCE dict:", source)

print("TARGET dict:", target)

 

# Iterate over letters in target and return False

# if the number of instances needed exceeds what

# can be found in the source.

for key in target:

if target[key] > source.get(key, 0) : return False

 

# if you reached here, return True as all letters needed

# have been found.

return True

 

Examples of use:

 

>>> sub_word("multilevel", "level")

SOURCE dict: {'m': 1, 'u': 1, 'l': 3, 't': 1, 'i': 1, 'e': 2, 'v': 1}

TARGET dict: {'l': 2, 'e': 2, 'v': 1}

True

 

>>> sub_word("shorT", "MuchTooLong")

False

>>> sub_word("supercalifragilisticexpialidocious", "california")

SOURCE dict: {'s': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'c': 3, 'a': 3, 'l':
3, 'i': 7, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1, 'o': 2}

TARGET dict: {'c': 1, 'a': 2, 'l': 1, 'i': 2, 'f': 1, 'o': 1, 'r': 1, 'n':
1}

False

 

>>> sub_word("supercalifragilisticexpialidocious", "fragile")

SOURCE dict: {'s': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'c': 3, 'a': 3, 'l':
3, 'i': 7, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1, 'o': 2}

TARGET dict: {'f': 1, 'r': 1, 'a': 1, 'g': 1, 'i': 1, 'l': 1, 'e': 1}

True

 

>>> sub_word("devil", "vile")

SOURCE dict: {'d': 1, 'e': 1, 'v': 1, 'i': 1, 'l': 1}

TARGET dict: {'v': 1, 'i': 1, 'l': 1, 'e': 1}

True

 

 

-Original Message-
From: Tutor  On Behalf Of
Peter Otten
Sent: Saturday, October 20, 2018 3:02 AM
To: tutor@python.org
Subject: Re: [Tutor] How to find optimisations for code

 

Steven D'Aprano wrote:

 

> We don't need to check that the individual letters are the same, 

> because checking the counts will suffice. If they are not the same, 

> one string will have (let's say) two A's while the other will have 

> none, and the counts will be different.

 

Another great optimisation is solving the actual problem ;)

 

The OP isn't looking for anagrams, he wants to know if string2 can be spelt
with the characters in string1:

 

abba, abacus --> True (don't care about the extra b, c, u, s) abba, baab
--> True (don't care about character order) abba, bab --> False (bab is
missing an a)

 

 

___

Tutor maillist  -   <mailto:Tutor@python.org> Tutor@python.org

To unsubscribe or change subscription options:

 <https://mail.python.org/mailman/listinfo/tutor>
https://mail.python.org/mailman/listinfo/tutor

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-20 Thread Peter Otten
Steven D'Aprano wrote:

> We don't need to check that the individual letters are the same, because
> checking the counts will suffice. If they are not the same, one string
> will have (let's say) two A's while the other will have none, and the
> counts will be different.

Another great optimisation is solving the actual problem ;)

The OP isn't looking for anagrams, he wants to know if string2 can be spelt 
with the characters in string1:

abba, abacus --> True (don't care about the extra b, c, u, s)
abba, baab --> True (don't care about character order)
abba, bab --> False (bab is missing an a)


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-20 Thread Steven D'Aprano
On Fri, Oct 19, 2018 at 09:12:54AM -0700, Pat Martin wrote:
> Sorry my first email didn't have a subject line
> 
> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?

You know it is inefficient if it takes longer to run than it ought to.

The tricky questions are:

- if you don't *measure* the time, how do you know how long it takes?

- how do you know how long it ought to take? other than wishful thinking?

- how do you find a more efficient solution?

That's comes from experience, practice, and thinking about the problem 
you are trying to solve. The last one is especially tricky, because it 
may involve coming up with a solution to a problem that has never been 
solved before. That requires creativity and insight.

But before you do any of that, remember the two critical rules of 
optimization, from the classic 1975 book "Principles of Program Design" 
by Michael A. Jackson:

Rule 1: Don’t do it.
Rule 2 (for experts only): Don’t do it yet.


Donald Knuth wrote:

"We should forget about small efficiencies, say about 97% of the
time: premature optimization is the root of all evil. Yet we
should not pass up our opportunities in that critical 3%."


And to quote W.A. Wulf:

"More computing sins are committed in the name of efficiency
(without necessarily achieving it) than for any other single
reason — including blind stupidity."



> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to be
> rearranged to make string2 return True, otherwise return False.

Think about the problem to be solved. How do we know that one string 
could be rearranged to the other? Each string much have:

- the same individual letters, e.g. "binary" and "brainy";
- the same total number of letters;
- the same number of *each* letter.

We don't need to check that the individual letters are the same, because 
checking the counts will suffice. If they are not the same, one string 
will have (let's say) two A's while the other will have none, and the 
counts will be different.

If the two strings have different length, we know they can't be anagrams 
of each other:

if len(string1) != len(string2):
return False

Only then do we need to compare the counts of each letter:

for c in string1:
if string1.count(c) != string2.count(c):
return False

Can that be improved? Perhaps -- for short strings, that ought to be 
pretty speedy, but for very long strings it has to inspect every letter 
over and over again...

for the first letter in string1:
count how many times it appears:
- compare it to the first letter;
- compare it to the second letter;
- compare it to the third letter; etc
for the second letter in string1:
count how many times it appears:
- compare it to the first letter;
- compare it to the second letter;
- compare it to the third letter; etc
etc.

So you can see, each letter gets inspected over and over again. If the 
strings are long, that does a lot of work. True, it is done at the speed 
of optimized C code (the string.count() method is pretty fast), but we 
can do better by using the Counter, which only needs to inspect each 
letter once:

if collections.Counter(string1) != collections.Counter(string2):
return False

Can we do better? Maybe... building the counters only needs to inspect 
each letter once. But Counters have a bit of overhead, they're not 
trivial code. There's a conceptually simpler way which *might* (or might 
not!) be faster in practice:

if sorted(string1) != sorted(string2):
return False

Think about why that test works.

If you have truly huge strings, then the Counter will be better, for 
sure. But for moderate-sized, or small, strings, I wouldn't want to 
guess which is faster. Even though technically sorted does more work, it 
probably has less overhead. It might very well win for some cases.


> Sorry about the long email, not sure if there is even an answer to this
> question except maybe write more code and get more experience?

Indeed. Nothing beats experience.

But you can also stand on the shoulders of those who came before you, in 
the words of Sir Isaac Newton, one of the greatest mathematicians and 
physicists of any time:

"If I have seen further, it is because I have stood on the 
shoulders of giants"

which was a not-at-all veiled dig against his rival and equal genius, 
Sir Robert Hooke, who was short and bent. (Newton was a genius, but he 
was not a nice guy, and he had many rivalries and feuds. In fairness, so 
did Hooke.)

You should start here:

https://www.joelonsoftware.com/2001/12/11/back-to-basics/

and think about how that relates to your original algorithm.

As far as programming in Python goes, some general advice:

- whenever you can do something by calling a built-in 

Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Cameron Simpson

On 19Oct2018 23:07, Alan Gauld  wrote:

On 19/10/18 17:12, Pat Martin wrote:
TLDR; How do you figure out if code is inefficient (if it isn't 
necessarily obvious) and how do you find a more efficient solution?


Others have addressed most of the issues but I'd just add the caveat
that you can spend forever trying to make your code "more efficient".
Usually, in the real world, you don't need to.

So how do you figure out if your code is inefficient?
Run it and if it doesn't work fast enough for your purpose then
try to speed it up. If it does then move onto the next project.


how do I go about finding a more efficient solution?


Google, a good reference book and experience.


Particularly, get some knowledge of the cost of various algorithms and 
data structures. There are outright inefficient things to do, but many 
others have costs and benefits: things fast in time but costly in space, 
and vice versa, and things efficient for some kinds of data but known to 
be bad for others.


Have a quick read of this:

 https://en.wikipedia.org/wiki/Big_O_notation

which talks about a common notation for costs. Skip the formal section 
with the math and go to the Orders of Common Functions section.


 https://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions

You'll find people talk about big O notation a lot with algorithms and 
their efficiency. For example, a dict _lookup_ is O(1): constant time 
per lookup.  But that doesn't mean always use a dict, because there's a 
cost to creating one in the first place. What you use it for matters.



But always, always, measure because there has been more
time wasted on unnecessary optimisations than on any other
feature of programming.


I just want to add a little nuance to all of this, because it to me it 
reads a little like (a) if it works and you get things done in a timely 
manner it is efficient and (b) if it isn't fast enough you can always 
make it faster. Alan hasn't actually said that, but one could get that 
impression.


To my mind:

Alan's point about "fast enough" is perfectly valid: in the real world, 
if your problem has been solved then further work on efficiency might 
cost more to implement than you gain after doing it.


His other point about measurement is also very important. It is possible 
to guess incorrectly about where performance problems lie, particularly 
with higher level languages because their internals has costs not 
apparent to the eye (because you can't see the internals). So from an 
engineering point of view, it is important to measure where a 
problematic programme spends its time and devote effort to those parts 
consuming the most time.


The standard example is the tight loop:

 for a in sequence1:
   for b in sequence2:
 do thing A
 do thing B

The cost of "do thing A" is more important than the cost of "do thing B" 
because likely A runs many many times more often than B. So even if B 
has some known inefficiency you can see, which will take some work to 
improve, effort spent on A will probably be more useful (if that's 
feasible).


With a complex programme this isn't always obvious; in the example above 
the 2 pieces of code are side by side.


The point about measurement is that profiling tools are useful here: 
when you _don't_ have an obvious primary target to improve but do need 
to improve efficiency, a profiling tool can measure where a programme 
spends its time in aggregate. That can point the way to code more 
beneficial to inspect.


It at least gets you objective information about where your programme 
spends its time. It is limited by the data you give your programme: toy 
example input data are not as good as real world data.


Finally, some things are as efficient as they get. You _can't_ always 
make things even more efficient.


Cheers,
Cameron Simpson 
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Alan Gauld via Tutor
On 19/10/18 17:12, Pat Martin wrote:

> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?

Others have addressed most of the issues but I'd just add the caveat
that you can spend forever trying to make your code "more efficient".
Usually, in the real world, you don't need to.

So how do you figure out if your code is inefficient?
Run it and if it doesn't work fast enough for your purpose then
try to speed it up. If it does then move onto the next project.

> how do I go about finding a more efficient solution?

Google, a good reference book and experience.

But always, always, measure because there has been more
time wasted on unnecessary optimisations than on any other
feature of programming.


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Peter Otten
Pat Martin wrote:

> So should we always use sets or dictionaries if possible? Are these more
> efficient because of not keeping track of their position?

Not always. If you want to know how often one entry/char occurs in a linear 
storage like a list, a string, or a file, then you can loop over the data:

num_c = some_string.count(c)  # the loop is hidden in the count() method


pairs = (line.partition("=")[::2] for line in file)
value = next(value for key, value in pairs if key == wanted_key)

However, if you expect frequent similar questions go with a lookup table:

freq = Counter(some_string)
for c in strings.ascii_lowercase:
print(c, "occurs", freq[c], "times")


lookup = dict(pairs)
for key in wanted_keys:
print(key, "is", lookup.get(key, "unknown"))

Before you start measure if the effort may be worthwhile, i. e. whether the 
linear approach causes a relevant delay. 

When you're done doublecheck if you got the desired improvement, i. e. 
whether you optimised the right place.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Pat Martin
So should we always use sets or dictionaries if possible? Are these more
efficient because of not keeping track of their position?

On Fri, Oct 19, 2018 at 10:47 AM Pat Martin  wrote:

> That's correct sorry, I reassigned with replace back to string2, I forgot
> that in my retyping of the snippet.
>
> That makes sense, when it comes to strings and it taking so long. Thanks
> for explaining that.
>
> On Fri, Oct 19, 2018 at 10:09 AM Peter Otten <__pete...@web.de> wrote:
>
>> Mats Wichmann wrote:
>>
>> > On 10/19/2018 10:12 AM, Pat Martin wrote:
>> >> Sorry my first email didn't have a subject line
>> >>
>> >> TLDR; How do you figure out if code is inefficient (if it isn't
>> >> necessarily obvious) and how do you find a more efficient solution?
>> >
>> > I think you've hit it in your last sentence ("except maybe write more
>> > code and get more experience"): experience will let you recognize
>> > patterns.
>> >
>> >> I use code wars sometimes to get some practice with Python, there was a
>> >> challenge to compare two strings and if string1 had enough characters
>> to
>> >> be rearranged to make string2 return True, otherwise return False.
>> >>
>> >> I wrote a script that was like this:
>> >>
>> >> for i in string1:
>> >> if i not in string2:
>> >> return False
>> >> string2.replace(i,"",1)
>> >> return True
>> >>
>> >> This worked but I kept getting that my function was too inefficient and
>> >> it took too long. I did a search for the problem and found someone was
>> >> using collections.Counter. This basically takes the string and returns
>> >> the number of times each character occurs in it. Then just compare the
>> >> count of one string to another to see if there is enough of each letter
>> >> to make the other string. This seems like an elegant way to do it.
>> >
>> > notwithstanding that the challenge is a little contrived... here's
>> > something you will come to recognize.  You use the string replace
>> > method, which is described thus, pasting from the official docs:
>> >
>> > """
>> > str.replace(old, new[, count])
>> >
>> > Return a copy of the string with all occurrences of substring old
>> > replaced by new. If the optional argument count is given, only the first
>> > count occurrences are replaced.
>> > """
>> >
>> > Strings are not modifiable, so there is no replace-in-place.  Each time
>> > you call string2.replace you build a new string... and then do nothing
>> > with that string, as you never assign a name to it. So that line clearly
>> > makes your code inefficient - you do some work that is just thrown away.
>> > And it's not clear what the purpose of this replacement is, anyway.
>>
>> This is probably retyped from memory. If the line were
>>
>> string2 = string2.replace(i,"",1)
>>
>> it would address your concern below about repeated chars.
>>
>> >>> def contains(s, t):
>> ... for c in s:
>> ... if c not in t: return False
>> ... t = t.replace(c, "", 1)  # remove one occurence of c from t
>> ... return True
>> ...
>> >>> contains("ata", "attax")
>> True
>> >>> contains("tata", "attax")
>> True
>> >>> contains("tatat", "attax")  # not enough 't's
>> False
>>
>> > Further it's not clear that you have the right answer.  What if string1
>> > contains one 'r' and string2 contains three?  For the one 'r', the test
>> > that it is also in string2 succeeds... but nowhere do you find out that
>> > you actually needed to have three in order to be able to "rearrange to
>> > make string2".
>> >
>> > Solving this problem might benefit from thinking about tests first: if
>> > you write a test where you know the answer is going to be negative, a
>> > second where it is going to be positive, and a third that is a
>> > non-trivial instance of positive (like the multiple-letter count), then
>> > you have something to code your solution against. After it works, you
>> > can then think about refactoring: is there a better way? This will kind
>> > of naturally lead you to thinking in terms of efficiency.
>> >
>> > Lesson 2: for very many tasks, someone has already done it, and you can
>> > benefit by using some existing code, from the standard library or a
>> > module installed separately.  That's often the difference between a
>> > programming challenge, which may want you to code it yourself to show
>> > you can, and real-world problem solving, where you are rewarded (in
>> > time) by efficiently reusing existing code.
>> >
>> >
>> >
>> > ___
>> > Tutor maillist  -  Tutor@python.org
>> > To unsubscribe or change subscription options:
>> > https://mail.python.org/mailman/listinfo/tutor
>>
>>
>> ___
>> Tutor maillist  -  Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:

Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Pat Martin
That's correct sorry, I reassigned with replace back to string2, I forgot
that in my retyping of the snippet.

That makes sense, when it comes to strings and it taking so long. Thanks
for explaining that.

On Fri, Oct 19, 2018 at 10:09 AM Peter Otten <__pete...@web.de> wrote:

> Mats Wichmann wrote:
>
> > On 10/19/2018 10:12 AM, Pat Martin wrote:
> >> Sorry my first email didn't have a subject line
> >>
> >> TLDR; How do you figure out if code is inefficient (if it isn't
> >> necessarily obvious) and how do you find a more efficient solution?
> >
> > I think you've hit it in your last sentence ("except maybe write more
> > code and get more experience"): experience will let you recognize
> > patterns.
> >
> >> I use code wars sometimes to get some practice with Python, there was a
> >> challenge to compare two strings and if string1 had enough characters to
> >> be rearranged to make string2 return True, otherwise return False.
> >>
> >> I wrote a script that was like this:
> >>
> >> for i in string1:
> >> if i not in string2:
> >> return False
> >> string2.replace(i,"",1)
> >> return True
> >>
> >> This worked but I kept getting that my function was too inefficient and
> >> it took too long. I did a search for the problem and found someone was
> >> using collections.Counter. This basically takes the string and returns
> >> the number of times each character occurs in it. Then just compare the
> >> count of one string to another to see if there is enough of each letter
> >> to make the other string. This seems like an elegant way to do it.
> >
> > notwithstanding that the challenge is a little contrived... here's
> > something you will come to recognize.  You use the string replace
> > method, which is described thus, pasting from the official docs:
> >
> > """
> > str.replace(old, new[, count])
> >
> > Return a copy of the string with all occurrences of substring old
> > replaced by new. If the optional argument count is given, only the first
> > count occurrences are replaced.
> > """
> >
> > Strings are not modifiable, so there is no replace-in-place.  Each time
> > you call string2.replace you build a new string... and then do nothing
> > with that string, as you never assign a name to it. So that line clearly
> > makes your code inefficient - you do some work that is just thrown away.
> > And it's not clear what the purpose of this replacement is, anyway.
>
> This is probably retyped from memory. If the line were
>
> string2 = string2.replace(i,"",1)
>
> it would address your concern below about repeated chars.
>
> >>> def contains(s, t):
> ... for c in s:
> ... if c not in t: return False
> ... t = t.replace(c, "", 1)  # remove one occurence of c from t
> ... return True
> ...
> >>> contains("ata", "attax")
> True
> >>> contains("tata", "attax")
> True
> >>> contains("tatat", "attax")  # not enough 't's
> False
>
> > Further it's not clear that you have the right answer.  What if string1
> > contains one 'r' and string2 contains three?  For the one 'r', the test
> > that it is also in string2 succeeds... but nowhere do you find out that
> > you actually needed to have three in order to be able to "rearrange to
> > make string2".
> >
> > Solving this problem might benefit from thinking about tests first: if
> > you write a test where you know the answer is going to be negative, a
> > second where it is going to be positive, and a third that is a
> > non-trivial instance of positive (like the multiple-letter count), then
> > you have something to code your solution against. After it works, you
> > can then think about refactoring: is there a better way? This will kind
> > of naturally lead you to thinking in terms of efficiency.
> >
> > Lesson 2: for very many tasks, someone has already done it, and you can
> > benefit by using some existing code, from the standard library or a
> > module installed separately.  That's often the difference between a
> > programming challenge, which may want you to code it yourself to show
> > you can, and real-world problem solving, where you are rewarded (in
> > time) by efficiently reusing existing code.
> >
> >
> >
> > ___
> > Tutor maillist  -  Tutor@python.org
> > To unsubscribe or change subscription options:
> > https://mail.python.org/mailman/listinfo/tutor
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Peter Otten
Mats Wichmann wrote:

> On 10/19/2018 10:12 AM, Pat Martin wrote:
>> Sorry my first email didn't have a subject line
>> 
>> TLDR; How do you figure out if code is inefficient (if it isn't
>> necessarily obvious) and how do you find a more efficient solution?
> 
> I think you've hit it in your last sentence ("except maybe write more
> code and get more experience"): experience will let you recognize
> patterns.
> 
>> I use code wars sometimes to get some practice with Python, there was a
>> challenge to compare two strings and if string1 had enough characters to
>> be rearranged to make string2 return True, otherwise return False.
>> 
>> I wrote a script that was like this:
>> 
>> for i in string1:
>> if i not in string2:
>> return False
>> string2.replace(i,"",1)
>> return True
>> 
>> This worked but I kept getting that my function was too inefficient and
>> it took too long. I did a search for the problem and found someone was
>> using collections.Counter. This basically takes the string and returns
>> the number of times each character occurs in it. Then just compare the
>> count of one string to another to see if there is enough of each letter
>> to make the other string. This seems like an elegant way to do it.
> 
> notwithstanding that the challenge is a little contrived... here's
> something you will come to recognize.  You use the string replace
> method, which is described thus, pasting from the official docs:
> 
> """
> str.replace(old, new[, count])
> 
> Return a copy of the string with all occurrences of substring old
> replaced by new. If the optional argument count is given, only the first
> count occurrences are replaced.
> """
> 
> Strings are not modifiable, so there is no replace-in-place.  Each time
> you call string2.replace you build a new string... and then do nothing
> with that string, as you never assign a name to it. So that line clearly
> makes your code inefficient - you do some work that is just thrown away.
> And it's not clear what the purpose of this replacement is, anyway.

This is probably retyped from memory. If the line were

string2 = string2.replace(i,"",1)

it would address your concern below about repeated chars.

>>> def contains(s, t):
... for c in s:
... if c not in t: return False
... t = t.replace(c, "", 1)  # remove one occurence of c from t
... return True
... 
>>> contains("ata", "attax")
True
>>> contains("tata", "attax")
True
>>> contains("tatat", "attax")  # not enough 't's
False

> Further it's not clear that you have the right answer.  What if string1
> contains one 'r' and string2 contains three?  For the one 'r', the test
> that it is also in string2 succeeds... but nowhere do you find out that
> you actually needed to have three in order to be able to "rearrange to
> make string2".
> 
> Solving this problem might benefit from thinking about tests first: if
> you write a test where you know the answer is going to be negative, a
> second where it is going to be positive, and a third that is a
> non-trivial instance of positive (like the multiple-letter count), then
> you have something to code your solution against. After it works, you
> can then think about refactoring: is there a better way? This will kind
> of naturally lead you to thinking in terms of efficiency.
> 
> Lesson 2: for very many tasks, someone has already done it, and you can
> benefit by using some existing code, from the standard library or a
> module installed separately.  That's often the difference between a
> programming challenge, which may want you to code it yourself to show
> you can, and real-world problem solving, where you are rewarded (in
> time) by efficiently reusing existing code.
> 
> 
> 
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Peter Otten
Pat Martin wrote:

> Sorry my first email didn't have a subject line
> 
> TLDR; How do you figure out if code is inefficient (if it isn't
> necessarily obvious) and how do you find a more efficient solution?
> 
> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to
> be rearranged to make string2 return True, otherwise return False.
> 
> I wrote a script that was like this:
> 
> for i in string1:
> if i not in string2:
> return False
> string2.replace(i,"",1)
> return True
> 
> This worked but I kept getting that my function was too inefficient and it
> took too long. I did a search for the problem and found someone was using
> collections.Counter. This basically takes the string and returns the
> number of times each character occurs in it. Then just compare the count
> of one string to another to see if there is enough of each letter to make
> the other string. This seems like an elegant way to do it.
> 
> My question is, how do I know something is inefficient and more
> importantly how do I go about finding a more efficient solution?
> 
> I have just discovered dir() and it has really helped in finding methods
> for items but that doesn't help when finding actual packages especially if
> I don't know exactly what I am looking for.
> 
> Sorry about the long email, not sure if there is even an answer to this
> question except maybe write more code and get more experience?

In Python my rule of thumb is: replace lists and loops with dict and set 
lookups whenever possible. One small challenge with this pattern is that not 
all loops look like a loop:

char in some_string  # "slow"; have to loop through the whole string
char in set_of_chars # "fast"; calculate some hash value and then likely
 # jump to the right set entry



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How to find optimisations for code

2018-10-19 Thread Mats Wichmann
On 10/19/2018 10:12 AM, Pat Martin wrote:
> Sorry my first email didn't have a subject line
> 
> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?

I think you've hit it in your last sentence ("except maybe write more
code and get more experience"): experience will let you recognize patterns.

> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to be
> rearranged to make string2 return True, otherwise return False.
> 
> I wrote a script that was like this:
> 
> for i in string1:
> if i not in string2:
> return False
> string2.replace(i,"",1)
> return True
> 
> This worked but I kept getting that my function was too inefficient and it
> took too long. I did a search for the problem and found someone was using
> collections.Counter. This basically takes the string and returns the number
> of times each character occurs in it. Then just compare the count of one
> string to another to see if there is enough of each letter to make the
> other string. This seems like an elegant way to do it.

notwithstanding that the challenge is a little contrived... here's
something you will come to recognize.  You use the string replace
method, which is described thus, pasting from the official docs:

"""
str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the first
count occurrences are replaced.
"""

Strings are not modifiable, so there is no replace-in-place.  Each time
you call string2.replace you build a new string... and then do nothing
with that string, as you never assign a name to it. So that line clearly
makes your code inefficient - you do some work that is just thrown away.
And it's not clear what the purpose of this replacement is, anyway.

Further it's not clear that you have the right answer.  What if string1
contains one 'r' and string2 contains three?  For the one 'r', the test
that it is also in string2 succeeds... but nowhere do you find out that
you actually needed to have three in order to be able to "rearrange to
make string2".

Solving this problem might benefit from thinking about tests first: if
you write a test where you know the answer is going to be negative, a
second where it is going to be positive, and a third that is a
non-trivial instance of positive (like the multiple-letter count), then
you have something to code your solution against. After it works, you
can then think about refactoring: is there a better way? This will kind
of naturally lead you to thinking in terms of efficiency.

Lesson 2: for very many tasks, someone has already done it, and you can
benefit by using some existing code, from the standard library or a
module installed separately.  That's often the difference between a
programming challenge, which may want you to code it yourself to show
you can, and real-world problem solving, where you are rewarded (in
time) by efficiently reusing existing code.



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] How to find optimisations for code

2018-10-19 Thread Pat Martin
Sorry my first email didn't have a subject line

TLDR; How do you figure out if code is inefficient (if it isn't necessarily
obvious) and how do you find a more efficient solution?

I use code wars sometimes to get some practice with Python, there was a
challenge to compare two strings and if string1 had enough characters to be
rearranged to make string2 return True, otherwise return False.

I wrote a script that was like this:

for i in string1:
if i not in string2:
return False
string2.replace(i,"",1)
return True

This worked but I kept getting that my function was too inefficient and it
took too long. I did a search for the problem and found someone was using
collections.Counter. This basically takes the string and returns the number
of times each character occurs in it. Then just compare the count of one
string to another to see if there is enough of each letter to make the
other string. This seems like an elegant way to do it.

My question is, how do I know something is inefficient and more importantly
how do I go about finding a more efficient solution?

I have just discovered dir() and it has really helped in finding methods
for items but that doesn't help when finding actual packages especially if
I don't know exactly what I am looking for.

Sorry about the long email, not sure if there is even an answer to this
question except maybe write more code and get more experience?

Pat
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor