Re: [Tutor] How to find optimisations for code
There are many ways to do some things. I present a fairly straightforward method for consideration. Synopsis. Use two dictionaries to keep track of how many times a letter can be found in the source word and the target word. Loop over the target letters and as soon as you find a letter that needs more copies of that letter than are available in the source, return False. Otherwise, return True. Slight optimization, return False immediately if the target is too long. Note: Remove the print statements used to show what it does when trying for efficiency. Note I do not claim this is fast. But for longer strings this requires one pass over each string to set the dictionary counters and often less than one pass over the target string to determine Truth. I can visualize many variants but efficiency of each may vary with the data, version of Python and so on. Python CODE using version 3.7.0 : """ Function to check if string str_target is a strict subset of str_source - but with multiple copies of a letter being unique in what is thus not a set. In English, can the letters in str_target be found in the required quantities with possibly some left over when examining str_source. For example, 'level' has two l's and two e's and can be made from 'multilevel' but not from 'evil' which has the needed e v and l but not two copies. """ def sub_word(str_source, str_target): """ Return True/False if letters in source are enough to make target Method: make dictionaries of number of letters in each. See if enough. """ # Check for trivial False condition: word too long if len(str_target) > len(str_source) : return False # Initialize empty dictionaries to hold leter counts. source = {} target = {} # Count letters in source thgen target, initializing # to zero if not yet present. for letter in str_source.lower(): source[letter] = source.get(letter, 0) + 1 for letter in str_target.lower(): target[letter] = target.get(letter, 0) + 1 # During bebug phase, show what the dictionaries contain print("SOURCE dict:", source) print("TARGET dict:", target) # Iterate over letters in target and return False # if the number of instances needed exceeds what # can be found in the source. for key in target: if target[key] > source.get(key, 0) : return False # if you reached here, return True as all letters needed # have been found. return True Examples of use: >>> sub_word("multilevel", "level") SOURCE dict: {'m': 1, 'u': 1, 'l': 3, 't': 1, 'i': 1, 'e': 2, 'v': 1} TARGET dict: {'l': 2, 'e': 2, 'v': 1} True >>> sub_word("shorT", "MuchTooLong") False >>> sub_word("supercalifragilisticexpialidocious", "california") SOURCE dict: {'s': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'c': 3, 'a': 3, 'l': 3, 'i': 7, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1, 'o': 2} TARGET dict: {'c': 1, 'a': 2, 'l': 1, 'i': 2, 'f': 1, 'o': 1, 'r': 1, 'n': 1} False >>> sub_word("supercalifragilisticexpialidocious", "fragile") SOURCE dict: {'s': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'c': 3, 'a': 3, 'l': 3, 'i': 7, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1, 'o': 2} TARGET dict: {'f': 1, 'r': 1, 'a': 1, 'g': 1, 'i': 1, 'l': 1, 'e': 1} True >>> sub_word("devil", "vile") SOURCE dict: {'d': 1, 'e': 1, 'v': 1, 'i': 1, 'l': 1} TARGET dict: {'v': 1, 'i': 1, 'l': 1, 'e': 1} True -Original Message- From: Tutor On Behalf Of Peter Otten Sent: Saturday, October 20, 2018 3:02 AM To: tutor@python.org Subject: Re: [Tutor] How to find optimisations for code Steven D'Aprano wrote: > We don't need to check that the individual letters are the same, > because checking the counts will suffice. If they are not the same, > one string will have (let's say) two A's while the other will have > none, and the counts will be different. Another great optimisation is solving the actual problem ;) The OP isn't looking for anagrams, he wants to know if string2 can be spelt with the characters in string1: abba, abacus --> True (don't care about the extra b, c, u, s) abba, baab --> True (don't care about character order) abba, bab --> False (bab is missing an a) ___ Tutor maillist - <mailto:Tutor@python.org> Tutor@python.org To unsubscribe or change subscription options: <https://mail.python.org/mailman/listinfo/tutor> https://mail.python.org/mailman/listinfo/tutor ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
Steven D'Aprano wrote: > We don't need to check that the individual letters are the same, because > checking the counts will suffice. If they are not the same, one string > will have (let's say) two A's while the other will have none, and the > counts will be different. Another great optimisation is solving the actual problem ;) The OP isn't looking for anagrams, he wants to know if string2 can be spelt with the characters in string1: abba, abacus --> True (don't care about the extra b, c, u, s) abba, baab --> True (don't care about character order) abba, bab --> False (bab is missing an a) ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
On Fri, Oct 19, 2018 at 09:12:54AM -0700, Pat Martin wrote: > Sorry my first email didn't have a subject line > > TLDR; How do you figure out if code is inefficient (if it isn't necessarily > obvious) and how do you find a more efficient solution? You know it is inefficient if it takes longer to run than it ought to. The tricky questions are: - if you don't *measure* the time, how do you know how long it takes? - how do you know how long it ought to take? other than wishful thinking? - how do you find a more efficient solution? That's comes from experience, practice, and thinking about the problem you are trying to solve. The last one is especially tricky, because it may involve coming up with a solution to a problem that has never been solved before. That requires creativity and insight. But before you do any of that, remember the two critical rules of optimization, from the classic 1975 book "Principles of Program Design" by Michael A. Jackson: Rule 1: Don’t do it. Rule 2 (for experts only): Don’t do it yet. Donald Knuth wrote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." And to quote W.A. Wulf: "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity." > I use code wars sometimes to get some practice with Python, there was a > challenge to compare two strings and if string1 had enough characters to be > rearranged to make string2 return True, otherwise return False. Think about the problem to be solved. How do we know that one string could be rearranged to the other? Each string much have: - the same individual letters, e.g. "binary" and "brainy"; - the same total number of letters; - the same number of *each* letter. We don't need to check that the individual letters are the same, because checking the counts will suffice. If they are not the same, one string will have (let's say) two A's while the other will have none, and the counts will be different. If the two strings have different length, we know they can't be anagrams of each other: if len(string1) != len(string2): return False Only then do we need to compare the counts of each letter: for c in string1: if string1.count(c) != string2.count(c): return False Can that be improved? Perhaps -- for short strings, that ought to be pretty speedy, but for very long strings it has to inspect every letter over and over again... for the first letter in string1: count how many times it appears: - compare it to the first letter; - compare it to the second letter; - compare it to the third letter; etc for the second letter in string1: count how many times it appears: - compare it to the first letter; - compare it to the second letter; - compare it to the third letter; etc etc. So you can see, each letter gets inspected over and over again. If the strings are long, that does a lot of work. True, it is done at the speed of optimized C code (the string.count() method is pretty fast), but we can do better by using the Counter, which only needs to inspect each letter once: if collections.Counter(string1) != collections.Counter(string2): return False Can we do better? Maybe... building the counters only needs to inspect each letter once. But Counters have a bit of overhead, they're not trivial code. There's a conceptually simpler way which *might* (or might not!) be faster in practice: if sorted(string1) != sorted(string2): return False Think about why that test works. If you have truly huge strings, then the Counter will be better, for sure. But for moderate-sized, or small, strings, I wouldn't want to guess which is faster. Even though technically sorted does more work, it probably has less overhead. It might very well win for some cases. > Sorry about the long email, not sure if there is even an answer to this > question except maybe write more code and get more experience? Indeed. Nothing beats experience. But you can also stand on the shoulders of those who came before you, in the words of Sir Isaac Newton, one of the greatest mathematicians and physicists of any time: "If I have seen further, it is because I have stood on the shoulders of giants" which was a not-at-all veiled dig against his rival and equal genius, Sir Robert Hooke, who was short and bent. (Newton was a genius, but he was not a nice guy, and he had many rivalries and feuds. In fairness, so did Hooke.) You should start here: https://www.joelonsoftware.com/2001/12/11/back-to-basics/ and think about how that relates to your original algorithm. As far as programming in Python goes, some general advice: - whenever you can do something by calling a built-in
Re: [Tutor] How to find optimisations for code
On 19Oct2018 23:07, Alan Gauld wrote: On 19/10/18 17:12, Pat Martin wrote: TLDR; How do you figure out if code is inefficient (if it isn't necessarily obvious) and how do you find a more efficient solution? Others have addressed most of the issues but I'd just add the caveat that you can spend forever trying to make your code "more efficient". Usually, in the real world, you don't need to. So how do you figure out if your code is inefficient? Run it and if it doesn't work fast enough for your purpose then try to speed it up. If it does then move onto the next project. how do I go about finding a more efficient solution? Google, a good reference book and experience. Particularly, get some knowledge of the cost of various algorithms and data structures. There are outright inefficient things to do, but many others have costs and benefits: things fast in time but costly in space, and vice versa, and things efficient for some kinds of data but known to be bad for others. Have a quick read of this: https://en.wikipedia.org/wiki/Big_O_notation which talks about a common notation for costs. Skip the formal section with the math and go to the Orders of Common Functions section. https://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions You'll find people talk about big O notation a lot with algorithms and their efficiency. For example, a dict _lookup_ is O(1): constant time per lookup. But that doesn't mean always use a dict, because there's a cost to creating one in the first place. What you use it for matters. But always, always, measure because there has been more time wasted on unnecessary optimisations than on any other feature of programming. I just want to add a little nuance to all of this, because it to me it reads a little like (a) if it works and you get things done in a timely manner it is efficient and (b) if it isn't fast enough you can always make it faster. Alan hasn't actually said that, but one could get that impression. To my mind: Alan's point about "fast enough" is perfectly valid: in the real world, if your problem has been solved then further work on efficiency might cost more to implement than you gain after doing it. His other point about measurement is also very important. It is possible to guess incorrectly about where performance problems lie, particularly with higher level languages because their internals has costs not apparent to the eye (because you can't see the internals). So from an engineering point of view, it is important to measure where a problematic programme spends its time and devote effort to those parts consuming the most time. The standard example is the tight loop: for a in sequence1: for b in sequence2: do thing A do thing B The cost of "do thing A" is more important than the cost of "do thing B" because likely A runs many many times more often than B. So even if B has some known inefficiency you can see, which will take some work to improve, effort spent on A will probably be more useful (if that's feasible). With a complex programme this isn't always obvious; in the example above the 2 pieces of code are side by side. The point about measurement is that profiling tools are useful here: when you _don't_ have an obvious primary target to improve but do need to improve efficiency, a profiling tool can measure where a programme spends its time in aggregate. That can point the way to code more beneficial to inspect. It at least gets you objective information about where your programme spends its time. It is limited by the data you give your programme: toy example input data are not as good as real world data. Finally, some things are as efficient as they get. You _can't_ always make things even more efficient. Cheers, Cameron Simpson ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
On 19/10/18 17:12, Pat Martin wrote: > TLDR; How do you figure out if code is inefficient (if it isn't necessarily > obvious) and how do you find a more efficient solution? Others have addressed most of the issues but I'd just add the caveat that you can spend forever trying to make your code "more efficient". Usually, in the real world, you don't need to. So how do you figure out if your code is inefficient? Run it and if it doesn't work fast enough for your purpose then try to speed it up. If it does then move onto the next project. > how do I go about finding a more efficient solution? Google, a good reference book and experience. But always, always, measure because there has been more time wasted on unnecessary optimisations than on any other feature of programming. -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
Pat Martin wrote: > So should we always use sets or dictionaries if possible? Are these more > efficient because of not keeping track of their position? Not always. If you want to know how often one entry/char occurs in a linear storage like a list, a string, or a file, then you can loop over the data: num_c = some_string.count(c) # the loop is hidden in the count() method pairs = (line.partition("=")[::2] for line in file) value = next(value for key, value in pairs if key == wanted_key) However, if you expect frequent similar questions go with a lookup table: freq = Counter(some_string) for c in strings.ascii_lowercase: print(c, "occurs", freq[c], "times") lookup = dict(pairs) for key in wanted_keys: print(key, "is", lookup.get(key, "unknown")) Before you start measure if the effort may be worthwhile, i. e. whether the linear approach causes a relevant delay. When you're done doublecheck if you got the desired improvement, i. e. whether you optimised the right place. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
So should we always use sets or dictionaries if possible? Are these more efficient because of not keeping track of their position? On Fri, Oct 19, 2018 at 10:47 AM Pat Martin wrote: > That's correct sorry, I reassigned with replace back to string2, I forgot > that in my retyping of the snippet. > > That makes sense, when it comes to strings and it taking so long. Thanks > for explaining that. > > On Fri, Oct 19, 2018 at 10:09 AM Peter Otten <__pete...@web.de> wrote: > >> Mats Wichmann wrote: >> >> > On 10/19/2018 10:12 AM, Pat Martin wrote: >> >> Sorry my first email didn't have a subject line >> >> >> >> TLDR; How do you figure out if code is inefficient (if it isn't >> >> necessarily obvious) and how do you find a more efficient solution? >> > >> > I think you've hit it in your last sentence ("except maybe write more >> > code and get more experience"): experience will let you recognize >> > patterns. >> > >> >> I use code wars sometimes to get some practice with Python, there was a >> >> challenge to compare two strings and if string1 had enough characters >> to >> >> be rearranged to make string2 return True, otherwise return False. >> >> >> >> I wrote a script that was like this: >> >> >> >> for i in string1: >> >> if i not in string2: >> >> return False >> >> string2.replace(i,"",1) >> >> return True >> >> >> >> This worked but I kept getting that my function was too inefficient and >> >> it took too long. I did a search for the problem and found someone was >> >> using collections.Counter. This basically takes the string and returns >> >> the number of times each character occurs in it. Then just compare the >> >> count of one string to another to see if there is enough of each letter >> >> to make the other string. This seems like an elegant way to do it. >> > >> > notwithstanding that the challenge is a little contrived... here's >> > something you will come to recognize. You use the string replace >> > method, which is described thus, pasting from the official docs: >> > >> > """ >> > str.replace(old, new[, count]) >> > >> > Return a copy of the string with all occurrences of substring old >> > replaced by new. If the optional argument count is given, only the first >> > count occurrences are replaced. >> > """ >> > >> > Strings are not modifiable, so there is no replace-in-place. Each time >> > you call string2.replace you build a new string... and then do nothing >> > with that string, as you never assign a name to it. So that line clearly >> > makes your code inefficient - you do some work that is just thrown away. >> > And it's not clear what the purpose of this replacement is, anyway. >> >> This is probably retyped from memory. If the line were >> >> string2 = string2.replace(i,"",1) >> >> it would address your concern below about repeated chars. >> >> >>> def contains(s, t): >> ... for c in s: >> ... if c not in t: return False >> ... t = t.replace(c, "", 1) # remove one occurence of c from t >> ... return True >> ... >> >>> contains("ata", "attax") >> True >> >>> contains("tata", "attax") >> True >> >>> contains("tatat", "attax") # not enough 't's >> False >> >> > Further it's not clear that you have the right answer. What if string1 >> > contains one 'r' and string2 contains three? For the one 'r', the test >> > that it is also in string2 succeeds... but nowhere do you find out that >> > you actually needed to have three in order to be able to "rearrange to >> > make string2". >> > >> > Solving this problem might benefit from thinking about tests first: if >> > you write a test where you know the answer is going to be negative, a >> > second where it is going to be positive, and a third that is a >> > non-trivial instance of positive (like the multiple-letter count), then >> > you have something to code your solution against. After it works, you >> > can then think about refactoring: is there a better way? This will kind >> > of naturally lead you to thinking in terms of efficiency. >> > >> > Lesson 2: for very many tasks, someone has already done it, and you can >> > benefit by using some existing code, from the standard library or a >> > module installed separately. That's often the difference between a >> > programming challenge, which may want you to code it yourself to show >> > you can, and real-world problem solving, where you are rewarded (in >> > time) by efficiently reusing existing code. >> > >> > >> > >> > ___ >> > Tutor maillist - Tutor@python.org >> > To unsubscribe or change subscription options: >> > https://mail.python.org/mailman/listinfo/tutor >> >> >> ___ >> Tutor maillist - Tutor@python.org >> To unsubscribe or change subscription options: >> https://mail.python.org/mailman/listinfo/tutor >> > ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options:
Re: [Tutor] How to find optimisations for code
That's correct sorry, I reassigned with replace back to string2, I forgot that in my retyping of the snippet. That makes sense, when it comes to strings and it taking so long. Thanks for explaining that. On Fri, Oct 19, 2018 at 10:09 AM Peter Otten <__pete...@web.de> wrote: > Mats Wichmann wrote: > > > On 10/19/2018 10:12 AM, Pat Martin wrote: > >> Sorry my first email didn't have a subject line > >> > >> TLDR; How do you figure out if code is inefficient (if it isn't > >> necessarily obvious) and how do you find a more efficient solution? > > > > I think you've hit it in your last sentence ("except maybe write more > > code and get more experience"): experience will let you recognize > > patterns. > > > >> I use code wars sometimes to get some practice with Python, there was a > >> challenge to compare two strings and if string1 had enough characters to > >> be rearranged to make string2 return True, otherwise return False. > >> > >> I wrote a script that was like this: > >> > >> for i in string1: > >> if i not in string2: > >> return False > >> string2.replace(i,"",1) > >> return True > >> > >> This worked but I kept getting that my function was too inefficient and > >> it took too long. I did a search for the problem and found someone was > >> using collections.Counter. This basically takes the string and returns > >> the number of times each character occurs in it. Then just compare the > >> count of one string to another to see if there is enough of each letter > >> to make the other string. This seems like an elegant way to do it. > > > > notwithstanding that the challenge is a little contrived... here's > > something you will come to recognize. You use the string replace > > method, which is described thus, pasting from the official docs: > > > > """ > > str.replace(old, new[, count]) > > > > Return a copy of the string with all occurrences of substring old > > replaced by new. If the optional argument count is given, only the first > > count occurrences are replaced. > > """ > > > > Strings are not modifiable, so there is no replace-in-place. Each time > > you call string2.replace you build a new string... and then do nothing > > with that string, as you never assign a name to it. So that line clearly > > makes your code inefficient - you do some work that is just thrown away. > > And it's not clear what the purpose of this replacement is, anyway. > > This is probably retyped from memory. If the line were > > string2 = string2.replace(i,"",1) > > it would address your concern below about repeated chars. > > >>> def contains(s, t): > ... for c in s: > ... if c not in t: return False > ... t = t.replace(c, "", 1) # remove one occurence of c from t > ... return True > ... > >>> contains("ata", "attax") > True > >>> contains("tata", "attax") > True > >>> contains("tatat", "attax") # not enough 't's > False > > > Further it's not clear that you have the right answer. What if string1 > > contains one 'r' and string2 contains three? For the one 'r', the test > > that it is also in string2 succeeds... but nowhere do you find out that > > you actually needed to have three in order to be able to "rearrange to > > make string2". > > > > Solving this problem might benefit from thinking about tests first: if > > you write a test where you know the answer is going to be negative, a > > second where it is going to be positive, and a third that is a > > non-trivial instance of positive (like the multiple-letter count), then > > you have something to code your solution against. After it works, you > > can then think about refactoring: is there a better way? This will kind > > of naturally lead you to thinking in terms of efficiency. > > > > Lesson 2: for very many tasks, someone has already done it, and you can > > benefit by using some existing code, from the standard library or a > > module installed separately. That's often the difference between a > > programming challenge, which may want you to code it yourself to show > > you can, and real-world problem solving, where you are rewarded (in > > time) by efficiently reusing existing code. > > > > > > > > ___ > > Tutor maillist - Tutor@python.org > > To unsubscribe or change subscription options: > > https://mail.python.org/mailman/listinfo/tutor > > > ___ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > https://mail.python.org/mailman/listinfo/tutor > ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
Mats Wichmann wrote: > On 10/19/2018 10:12 AM, Pat Martin wrote: >> Sorry my first email didn't have a subject line >> >> TLDR; How do you figure out if code is inefficient (if it isn't >> necessarily obvious) and how do you find a more efficient solution? > > I think you've hit it in your last sentence ("except maybe write more > code and get more experience"): experience will let you recognize > patterns. > >> I use code wars sometimes to get some practice with Python, there was a >> challenge to compare two strings and if string1 had enough characters to >> be rearranged to make string2 return True, otherwise return False. >> >> I wrote a script that was like this: >> >> for i in string1: >> if i not in string2: >> return False >> string2.replace(i,"",1) >> return True >> >> This worked but I kept getting that my function was too inefficient and >> it took too long. I did a search for the problem and found someone was >> using collections.Counter. This basically takes the string and returns >> the number of times each character occurs in it. Then just compare the >> count of one string to another to see if there is enough of each letter >> to make the other string. This seems like an elegant way to do it. > > notwithstanding that the challenge is a little contrived... here's > something you will come to recognize. You use the string replace > method, which is described thus, pasting from the official docs: > > """ > str.replace(old, new[, count]) > > Return a copy of the string with all occurrences of substring old > replaced by new. If the optional argument count is given, only the first > count occurrences are replaced. > """ > > Strings are not modifiable, so there is no replace-in-place. Each time > you call string2.replace you build a new string... and then do nothing > with that string, as you never assign a name to it. So that line clearly > makes your code inefficient - you do some work that is just thrown away. > And it's not clear what the purpose of this replacement is, anyway. This is probably retyped from memory. If the line were string2 = string2.replace(i,"",1) it would address your concern below about repeated chars. >>> def contains(s, t): ... for c in s: ... if c not in t: return False ... t = t.replace(c, "", 1) # remove one occurence of c from t ... return True ... >>> contains("ata", "attax") True >>> contains("tata", "attax") True >>> contains("tatat", "attax") # not enough 't's False > Further it's not clear that you have the right answer. What if string1 > contains one 'r' and string2 contains three? For the one 'r', the test > that it is also in string2 succeeds... but nowhere do you find out that > you actually needed to have three in order to be able to "rearrange to > make string2". > > Solving this problem might benefit from thinking about tests first: if > you write a test where you know the answer is going to be negative, a > second where it is going to be positive, and a third that is a > non-trivial instance of positive (like the multiple-letter count), then > you have something to code your solution against. After it works, you > can then think about refactoring: is there a better way? This will kind > of naturally lead you to thinking in terms of efficiency. > > Lesson 2: for very many tasks, someone has already done it, and you can > benefit by using some existing code, from the standard library or a > module installed separately. That's often the difference between a > programming challenge, which may want you to code it yourself to show > you can, and real-world problem solving, where you are rewarded (in > time) by efficiently reusing existing code. > > > > ___ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > https://mail.python.org/mailman/listinfo/tutor ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
Pat Martin wrote: > Sorry my first email didn't have a subject line > > TLDR; How do you figure out if code is inefficient (if it isn't > necessarily obvious) and how do you find a more efficient solution? > > I use code wars sometimes to get some practice with Python, there was a > challenge to compare two strings and if string1 had enough characters to > be rearranged to make string2 return True, otherwise return False. > > I wrote a script that was like this: > > for i in string1: > if i not in string2: > return False > string2.replace(i,"",1) > return True > > This worked but I kept getting that my function was too inefficient and it > took too long. I did a search for the problem and found someone was using > collections.Counter. This basically takes the string and returns the > number of times each character occurs in it. Then just compare the count > of one string to another to see if there is enough of each letter to make > the other string. This seems like an elegant way to do it. > > My question is, how do I know something is inefficient and more > importantly how do I go about finding a more efficient solution? > > I have just discovered dir() and it has really helped in finding methods > for items but that doesn't help when finding actual packages especially if > I don't know exactly what I am looking for. > > Sorry about the long email, not sure if there is even an answer to this > question except maybe write more code and get more experience? In Python my rule of thumb is: replace lists and loops with dict and set lookups whenever possible. One small challenge with this pattern is that not all loops look like a loop: char in some_string # "slow"; have to loop through the whole string char in set_of_chars # "fast"; calculate some hash value and then likely # jump to the right set entry ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How to find optimisations for code
On 10/19/2018 10:12 AM, Pat Martin wrote: > Sorry my first email didn't have a subject line > > TLDR; How do you figure out if code is inefficient (if it isn't necessarily > obvious) and how do you find a more efficient solution? I think you've hit it in your last sentence ("except maybe write more code and get more experience"): experience will let you recognize patterns. > I use code wars sometimes to get some practice with Python, there was a > challenge to compare two strings and if string1 had enough characters to be > rearranged to make string2 return True, otherwise return False. > > I wrote a script that was like this: > > for i in string1: > if i not in string2: > return False > string2.replace(i,"",1) > return True > > This worked but I kept getting that my function was too inefficient and it > took too long. I did a search for the problem and found someone was using > collections.Counter. This basically takes the string and returns the number > of times each character occurs in it. Then just compare the count of one > string to another to see if there is enough of each letter to make the > other string. This seems like an elegant way to do it. notwithstanding that the challenge is a little contrived... here's something you will come to recognize. You use the string replace method, which is described thus, pasting from the official docs: """ str.replace(old, new[, count]) Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced. """ Strings are not modifiable, so there is no replace-in-place. Each time you call string2.replace you build a new string... and then do nothing with that string, as you never assign a name to it. So that line clearly makes your code inefficient - you do some work that is just thrown away. And it's not clear what the purpose of this replacement is, anyway. Further it's not clear that you have the right answer. What if string1 contains one 'r' and string2 contains three? For the one 'r', the test that it is also in string2 succeeds... but nowhere do you find out that you actually needed to have three in order to be able to "rearrange to make string2". Solving this problem might benefit from thinking about tests first: if you write a test where you know the answer is going to be negative, a second where it is going to be positive, and a third that is a non-trivial instance of positive (like the multiple-letter count), then you have something to code your solution against. After it works, you can then think about refactoring: is there a better way? This will kind of naturally lead you to thinking in terms of efficiency. Lesson 2: for very many tasks, someone has already done it, and you can benefit by using some existing code, from the standard library or a module installed separately. That's often the difference between a programming challenge, which may want you to code it yourself to show you can, and real-world problem solving, where you are rewarded (in time) by efficiently reusing existing code. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
[Tutor] How to find optimisations for code
Sorry my first email didn't have a subject line TLDR; How do you figure out if code is inefficient (if it isn't necessarily obvious) and how do you find a more efficient solution? I use code wars sometimes to get some practice with Python, there was a challenge to compare two strings and if string1 had enough characters to be rearranged to make string2 return True, otherwise return False. I wrote a script that was like this: for i in string1: if i not in string2: return False string2.replace(i,"",1) return True This worked but I kept getting that my function was too inefficient and it took too long. I did a search for the problem and found someone was using collections.Counter. This basically takes the string and returns the number of times each character occurs in it. Then just compare the count of one string to another to see if there is enough of each letter to make the other string. This seems like an elegant way to do it. My question is, how do I know something is inefficient and more importantly how do I go about finding a more efficient solution? I have just discovered dir() and it has really helped in finding methods for items but that doesn't help when finding actual packages especially if I don't know exactly what I am looking for. Sorry about the long email, not sure if there is even an answer to this question except maybe write more code and get more experience? Pat ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor