Re: anagram finder / dict mapping question
On May 9, 11:19 pm, dave [EMAIL PROTECTED] wrote: On 2008-05-09 18:53:19 -0600, George Sakkis [EMAIL PROTECTED] said: On May 9, 5:19 pm, [EMAIL PROTECTED] wrote: What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? import heapq help(heapq.nlargest) Help on function nlargest in module heapq: nlargest(n, iterable, key=None) Find the n largest elements in a dataset. Equivalent to: sorted(iterable, key=key, reverse=True)[:n] HTH, George I both the 'nlargest' and the 'sorted' methods.. I could only get the sorted to work.. however it would only return values like (3, edam) not the list of words.. Here is what I have now.. Am I over-analyzing this? It doesn't seem like it should be this hard to get the program to print the largest set of anagrams first... def anafind(): fin = open('text.txt') mapdic = {} finalres = [] # top answers go here for line in fin: line = line.strip() alphaword = ''.join(sorted(line)) #sorted word as key if alphaword not in mapdic: mapdic[alphaword] = [line] else: mapdic[alphaword].append(line) topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:] #top 3 answers for key, value in topans: # finalres.append(mapdic[value]) print finalres Here is a working, cleaned up version: from heapq import nlargest from collections import defaultdict def anagrams(words, top=None): key2words = defaultdict(set) for word in words: word = word.strip() key = ''.join(sorted(word)) key2words[key].add(word) if top is None: return sorted(key2words.itervalues(), key=len, reverse=True) else: return nlargest(top, key2words.itervalues(), key=len) if __name__ == '__main__': wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo'] for words in anagrams(wordlist,2): print words By the way, try to generalize your functions (and rest of the code for that matter) so that it can be reused easily. For example, hardcoding the input file name in the function's body is usually undesirable. Similarly for other constants like 'get top 3 answers'; it doesn't cost you anything to replace 3 with 'top' and pass it as an argument (or set it as default top=3 if that's the default case). HTH, George -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
Kam-Hung Soh [EMAIL PROTECTED] writes: On Sat, 10 May 2008 07:19:38 +1000, [EMAIL PROTECTED] wrote: What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? Built-in max() function only returns one result. My solution is to make a sorted list and return last three items: sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:] Using the key= parameter of sorted() beats explicit DSU: sorted(anagrams.itervalues(), key=len)[-3:] Or even sorted(anagrams.itervalues(), key=len, reverse=True)[:3] -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On Sat, 10 May 2008 18:06:17 +1000, Arnaud Delobelle [EMAIL PROTECTED] wrote: Kam-Hung Soh [EMAIL PROTECTED] writes: On Sat, 10 May 2008 07:19:38 +1000, [EMAIL PROTECTED] wrote: What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? Built-in max() function only returns one result. My solution is to make a sorted list and return last three items: sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:] Using the key= parameter of sorted() beats explicit DSU: sorted(anagrams.itervalues(), key=len)[-3:] Or even sorted(anagrams.itervalues(), key=len, reverse=True)[:3] Nice! I just found out that DSU = Decorate-Sort-Undecorate idiom, from http://wiki.python.org/moin/HowTo/Sorting. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
Kam-Hung Soh [EMAIL PROTECTED] writes: I avoid creating a temporary variable if it's only used once immediately, so I would have written: newtlist = ''.join(sorted(list(line))) Or ''.join(sorted(line)) as sorted() works with any iterable. -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
key = ''.join(sorted(word)) I tend to strip and lower the word as well, otherwise Hello and hello do not compare...depends on what you want though! Plus you might get a lot of word\n as keys... My technique is the this way def anagram_finder(words): anagrams = {} for word in words: word = word.strip() key = ''.join(sorted(word.lower())) anagrams.setdefault(key, []).append(word) return anagrams -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On May 9, 1:45 am, [EMAIL PROTECTED] wrote: key = ''.join(sorted(word)) I tend to strip and lower the word as well, otherwise Hello and hello do not compare...depends on what you want though! Plus you might get a lot of word\n as keys... My technique is the this way def anagram_finder(words): anagrams = {} for word in words: word = word.strip() key = ''.join(sorted(word.lower())) anagrams.setdefault(key, []).append(word) return anagrams What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
[EMAIL PROTECTED] writes: On May 9, 1:45 am, [EMAIL PROTECTED] wrote: key = ''.join(sorted(word)) I tend to strip and lower the word as well, otherwise Hello and hello do not compare...depends on what you want though! Plus you might get a lot of word\n as keys... My technique is the this way def anagram_finder(words): anagrams = {} for word in words: word = word.strip() key = ''.join(sorted(word.lower())) anagrams.setdefault(key, []).append(word) return anagrams What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On Sat, 10 May 2008 07:19:38 +1000, [EMAIL PROTECTED] wrote: What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? Built-in max() function only returns one result. My solution is to make a sorted list and return last three items: sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:] -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On May 9, 5:19 pm, [EMAIL PROTECTED] wrote: What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? import heapq help(heapq.nlargest) Help on function nlargest in module heapq: nlargest(n, iterable, key=None) Find the n largest elements in a dataset. Equivalent to: sorted(iterable, key=key, reverse=True)[:n] HTH, George -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On 2008-05-09 18:53:19 -0600, George Sakkis [EMAIL PROTECTED] said: On May 9, 5:19 pm, [EMAIL PROTECTED] wrote: What would be the best method to print the top results, the one's that had the highest amount of anagrams?? Create a new histogram dict? You can use the max() function to find the biggest list of anagrams: top_results = max(anagrams.itervalues(), key=len) -- Arnaud That is the biggest list of anagrams, what if I wanted the 3 biggest lists? Is there a way to specific top three w/ a max command?? import heapq help(heapq.nlargest) Help on function nlargest in module heapq: nlargest(n, iterable, key=None) Find the n largest elements in a dataset. Equivalent to: sorted(iterable, key=key, reverse=True)[:n] HTH, George I both the 'nlargest' and the 'sorted' methods.. I could only get the sorted to work.. however it would only return values like (3, edam) not the list of words.. Here is what I have now.. Am I over-analyzing this? It doesn't seem like it should be this hard to get the program to print the largest set of anagrams first... def anafind(): fin = open('text.txt') mapdic = {} finalres = [] # top answers go here for line in fin: line = line.strip() alphaword = ''.join(sorted(line)) #sorted word as key if alphaword not in mapdic: mapdic[alphaword] = [line] else: mapdic[alphaword].append(line) topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:] #top 3 answers for key, value in topans: # finalres.append(mapdic[value]) print finalres -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On Thu, 08 May 2008 15:42:07 +1000, dave [EMAIL PROTECTED] wrote: This is what i've came up with. My problem is that I can't get them to properly evaluate.. when comparewords() runs it finds itself... Should I have the keys of mapdict iterate over itself? Is that possible? def annafind(): fin = open('text.txt') # file has one word per line mapdic = {} # each word gets sorted goes in here for line in fin: rawword = line.strip() word = list(rawword) word.sort() mapdic[''.join(word)] = 0 return mapdic def comparewords(): ***not working as intended fin = open('text.txt') for line in fin: line = line.strip() word = list(line) word.sort() sortedword = (''.join(word)) if sortedword in mapdic: print line On 2008-05-07 19:25:53 -0600, Kam-Hung Soh [EMAIL PROTECTED] said: On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED] wrote: Hi All, I wrote a program that takes a string sequence and finds all the words inside a text file (one word per line) and prints them: def anagfind(letters): #find anagrams of these letters fin = open('text.txt') #one word per line file wordbox = [] #this is where the words will go for line in fin: word = line.strip() count = 0 for char in letters: if char not in word: break else: count += 1 if count == len(word): wordbox.append(word) return wordbox Now I'd like to modify the code to naturally find all anagrams inside a wordlist. What would be the best way to do this? Using Hints? Is it possible to iterate over dict keys? How can I make a dict that maps from a set of letters to a list of words that are spelled from those letters? Wouldn't I need to make the set of letters a key in a dict? As always - Thanks for helping someone trying to learn... Dave Suggestion: for each word, sort their characters and use them as the dictionary key. If two words have the same combination of characters, then they are anagrams. For example: edam and made are anagrams because they have the letters 'a', 'd', 'e' and 'm'. Refer Programming Pearls by Jon Bentley. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/ a Your code is always going to return the same list because every word is an anagram of itself. Tip: Create a list for each dictionary key, then add a word to the list if that word is not in the list. So: mapdic('adem') -- [edam, made] P.S. When replying, the convention is to add your response to the bottom, not top of the message. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
Kam-Hung Soh [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | On Thu, 08 May 2008 15:42:07 +1000, dave [EMAIL PROTECTED] | wrote: | Your code is always going to return the same list because every word is an | anagram of itself. | | Tip: Create a list for each dictionary key, then add a word to the list if | that word is not in the list. So: | | mapdic('adem') -- [edam, made] Unless you plan to keep the list in alphabetical order, a set might be slightly better. -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
Been there, done that: http://polyglot-anagrams.googlecode.com/svn/trunk/python/ -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On 2008-05-07 19:25:53 -0600, Kam-Hung Soh [EMAIL PROTECTED] said: On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED] t wrote: Hi All, I wrote a program that takes a string sequence and finds all the wo rds inside a text file (one word per line) and prints them: def anagfind(letters): #find anagrams of these letters fin = open('text.txt') #one word per line file wordbox = [] #this is where the words will go for line in fin: word = line.strip() count = 0 for char in letters: if char not in word: break else: count += 1 if count == len(word): wordbox.append(word) return wordbox Now I'd like to modify the code to naturally find all anagrams insi de a wordlist. What would be the best way to do this? Using Hints? Is it possible to iterate over dict keys? How can I make a dict that maps from a set of letters to a list of words that are spelled from those letters? Wouldn't I need to make the set of letters a key in a dict ? As always - Thanks for helping someone trying to learn... Dave Suggestion: for each word, sort their characters and use them as the dictionary key. If two words have the same combination of characters , then they are anagrams. For example: edam and made are anagrams because they have the letters 'a', 'd', 'e' and 'm'. Refer Programming Pearls by Jon Bentley. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salarima n/ a Your code is always going to return the same list because every word is an anagram of itself. Tip: Create a list for each dictionary key, then add a word to the list if that word is not in the list. So: mapdic('adem') -- [edam, made] P.S. When replying, the convention is to add your response to the bottom , not top of the message. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/ a I got it! Thanks for all your help!!! Please tell me what you think: def anafind(): fin = open('short.txt') #one word per line mapdic = {} #this dic maps letters to anagrams for line in fin: line = line.strip() templist = sorted(list(line))#sort letters newtlist = (''.join(templist)) #join letters if newtlist not in mapdic: mapdic[newtlist] = [line] if line not in mapdic[newtlist]: mapdic[newtlist].append(line) for value in mapdic.values(): if len(value) = 1: pass else: print value -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
dave [EMAIL PROTECTED] writes: if newtlist not in mapdic: mapdic[newtlist] = [line] if line not in mapdic[newtlist]: mapdic[newtlist].append(line) I'd use the defaultdict module to avoid the above. for value in mapdic.values(): if len(value) = 1: pass else: print value I'd write: for value in mapdic.values(): if len(value) 1: print value I'd actually use itervalues rather than values, but that's a little bit fancy, you don't have to worry about it til later. -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On Fri, 09 May 2008 09:52:53 +1000, dave [EMAIL PROTECTED] wrote: I got it! Thanks for all your help!!! Please tell me what you think: def anafind(): fin = open('short.txt') #one word per line mapdic = {} #this dic maps letters to anagrams for line in fin: line = line.strip() templist = sorted(list(line)) #sort letters newtlist = (''.join(templist)) #join letters if newtlist not in mapdic: mapdic[newtlist] = [line] if line not in mapdic[newtlist]: mapdic[newtlist].append(line) for value in mapdic.values(): if len(value) = 1: pass else: print value I avoid creating a temporary variable if it's only used once immediately, so I would have written: newtlist = ''.join(sorted(list(line))) Looks pretty good, and I found the other tips in parallel responses useful to me too! -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On Thu, May 8, 2008 at 7:52 PM, dave [EMAIL PROTECTED] wrote: I got it! Thanks for all your help!!! Please tell me what you think: Here's yet another version of the same thing, using defaultdicts and sets. This way we don't have to test for membership in either the dictionary or the collection of anagrams. The code is shorter, but no less readable in my opinion. from collections import defaultdict def anafind(words): Given a sequence of words, return a dictionary with groups of letters as keys and sets of anagrams as values anagrams = defaultdict(set) for word in words: key = ''.join(sorted(word)) anagrams[key].add(word) return anagrams if __name__ == __main__: wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo'] anagrams = anafind(wordlist) for letters, words in anagrams.items(): print %s: %s % (letters, words) -- Jerry -- http://mail.python.org/mailman/listinfo/python-list
anagram finder / dict mapping question
Hi All, I wrote a program that takes a string sequence and finds all the words inside a text file (one word per line) and prints them: def anagfind(letters): #find anagrams of these letters fin = open('text.txt') #one word per line file wordbox = [] #this is where the words will go for line in fin: word = line.strip() count = 0 for char in letters: if char not in word: break else: count += 1 if count == len(word): wordbox.append(word) return wordbox Now I'd like to modify the code to naturally find all anagrams inside a wordlist. What would be the best way to do this? Using Hints? Is it possible to iterate over dict keys? How can I make a dict that maps from a set of letters to a list of words that are spelled from those letters? Wouldn't I need to make the set of letters a key in a dict? As always - Thanks for helping someone trying to learn... Dave -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED] wrote: Hi All, I wrote a program that takes a string sequence and finds all the words inside a text file (one word per line) and prints them: def anagfind(letters): #find anagrams of these letters fin = open('text.txt') #one word per line file wordbox = [] #this is where the words will go for line in fin: word = line.strip() count = 0 for char in letters: if char not in word: break else: count += 1 if count == len(word): wordbox.append(word) return wordbox Now I'd like to modify the code to naturally find all anagrams inside a wordlist. What would be the best way to do this? Using Hints? Is it possible to iterate over dict keys? How can I make a dict that maps from a set of letters to a list of words that are spelled from those letters? Wouldn't I need to make the set of letters a key in a dict? As always - Thanks for helping someone trying to learn... Dave Suggestion: for each word, sort their characters and use them as the dictionary key. If two words have the same combination of characters, then they are anagrams. For example: edam and made are anagrams because they have the letters 'a', 'd', 'e' and 'm'. Refer Programming Pearls by Jon Bentley. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a -- http://mail.python.org/mailman/listinfo/python-list
Re: anagram finder / dict mapping question
This is what i've came up with. My problem is that I can't get them to properly evaluate.. when comparewords() runs it finds itself... Should I have the keys of mapdict iterate over itself? Is that possible? def annafind(): fin = open('text.txt') # file has one word per line mapdic = {} # each word gets sorted goes in here for line in fin: rawword = line.strip() word = list(rawword) word.sort() mapdic[''.join(word)] = 0 return mapdic def comparewords(): ***not working as intended fin = open('text.txt') for line in fin: line = line.strip() word = list(line) word.sort() sortedword = (''.join(word)) if sortedword in mapdic: print line On 2008-05-07 19:25:53 -0600, Kam-Hung Soh [EMAIL PROTECTED] said: On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED] wrote: Hi All, I wrote a program that takes a string sequence and finds all the words inside a text file (one word per line) and prints them: def anagfind(letters): #find anagrams of these letters fin = open('text.txt') #one word per line file wordbox = [] #this is where the words will go for line in fin: word = line.strip() count = 0 for char in letters: if char not in word: break else: count += 1 if count == len(word): wordbox.append(word) return wordbox Now I'd like to modify the code to naturally find all anagrams inside a wordlist. What would be the best way to do this? Using Hints? Is it possible to iterate over dict keys? How can I make a dict that maps from a set of letters to a list of words that are spelled from those letters? Wouldn't I need to make the set of letters a key in a dict? As always - Thanks for helping someone trying to learn... Dave Suggestion: for each word, sort their characters and use them as the dictionary key. If two words have the same combination of characters, then they are anagrams. For example: edam and made are anagrams because they have the letters 'a', 'd', 'e' and 'm'. Refer Programming Pearls by Jon Bentley. -- Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/ a -- http://mail.python.org/mailman/listinfo/python-list