Re: anagram finder / dict mapping question

2008-05-10 Thread George Sakkis
On May 9, 11:19 pm, dave [EMAIL PROTECTED] wrote:
 On 2008-05-09 18:53:19 -0600, George Sakkis [EMAIL PROTECTED] said:



  On May 9, 5:19 pm, [EMAIL PROTECTED] wrote:
  What would be the best method to print the top results, the one's that

  had the highest amount of anagrams??  Create a new histogram dict?

  You can use the max() function to find the biggest list of anagrams:

  top_results = max(anagrams.itervalues(), key=len)

  --
  Arnaud

  That is the biggest list of anagrams, what if I wanted the 3 biggest
  lists?  Is there a way to specific top three w/ a max command??

  import heapq
  help(heapq.nlargest)
  Help on function nlargest in module heapq:

  nlargest(n, iterable, key=None)
      Find the n largest elements in a dataset.

      Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]

  HTH,
  George

 I both the 'nlargest' and the 'sorted' methods.. I could only get the
 sorted to work.. however it would only return values like (3,  edam)
 not the list of words..

 Here is what I have now.. Am I over-analyzing this?  It doesn't seem
 like it should be this hard to get the program to print the largest set
 of anagrams first...

 def anafind():
         fin = open('text.txt')
         mapdic = {}
         finalres = []                   # top answers go here
         for line in fin:
                 line = line.strip()
                 alphaword = ''.join(sorted(line))       #sorted word as key
                 if alphaword not in mapdic:
                         mapdic[alphaword] = [line]
                 else:
                         mapdic[alphaword].append(line)
         topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:]
   #top 3 answers
         for key, value in topans:       #
                 finalres.append(mapdic[value])
         print finalres

Here is a working, cleaned up version:

from heapq import nlargest
from collections import defaultdict

def anagrams(words, top=None):
key2words = defaultdict(set)
for word in words:
word = word.strip()
key = ''.join(sorted(word))
key2words[key].add(word)
if top is None:
return sorted(key2words.itervalues(), key=len, reverse=True)
else:
return nlargest(top, key2words.itervalues(), key=len)

if __name__ == '__main__':
wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
for words in anagrams(wordlist,2):
print words


By the way, try to generalize your functions (and rest of the code for
that matter) so that it can be reused easily. For example, hardcoding
the input file name in the function's body is usually undesirable.
Similarly for other constants like 'get top 3 answers'; it doesn't
cost you anything to replace 3 with 'top' and pass it as an argument
(or set it as default top=3 if that's the default case).

HTH,
George
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-10 Thread Arnaud Delobelle
Kam-Hung Soh [EMAIL PROTECTED] writes:

 On Sat, 10 May 2008 07:19:38 +1000, [EMAIL PROTECTED] wrote:

  What would be the best method to print the top results, the one's that
  had the highest amount of anagrams??  Create a new histogram dict?

 You can use the max() function to find the biggest list of anagrams:

 top_results = max(anagrams.itervalues(), key=len)

 --
 Arnaud

 That is the biggest list of anagrams, what if I wanted the 3 biggest
 lists?  Is there a way to specific top three w/ a max command??


 Built-in max() function only returns one result.

 My solution is to make a sorted list and return last three items:

 sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:]

Using the key= parameter of sorted() beats explicit DSU:

sorted(anagrams.itervalues(), key=len)[-3:]

Or even

sorted(anagrams.itervalues(), key=len, reverse=True)[:3]

-- 
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-10 Thread Kam-Hung Soh
On Sat, 10 May 2008 18:06:17 +1000, Arnaud Delobelle  
[EMAIL PROTECTED] wrote:



Kam-Hung Soh [EMAIL PROTECTED] writes:


On Sat, 10 May 2008 07:19:38 +1000, [EMAIL PROTECTED] wrote:

 What would be the best method to print the top results, the one's  
that

 had the highest amount of anagrams??  Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud


That is the biggest list of anagrams, what if I wanted the 3 biggest
lists?  Is there a way to specific top three w/ a max command??



Built-in max() function only returns one result.

My solution is to make a sorted list and return last three items:

sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:]


Using the key= parameter of sorted() beats explicit DSU:

sorted(anagrams.itervalues(), key=len)[-3:]

Or even

sorted(anagrams.itervalues(), key=len, reverse=True)[:3]



Nice!

I just found out that DSU = Decorate-Sort-Undecorate idiom, from  
http://wiki.python.org/moin/HowTo/Sorting.


--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread Arnaud Delobelle
Kam-Hung Soh [EMAIL PROTECTED] writes:

 I avoid creating a temporary variable if it's only used once
 immediately,  so I would have written:

 newtlist = ''.join(sorted(list(line)))

Or ''.join(sorted(line))

as sorted() works with any iterable.

-- 
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread cokofreedom
 key = ''.join(sorted(word))

I tend to strip and lower the word as well, otherwise Hello and
hello do not compare...depends on what you want though!
Plus you might get a lot of word\n as keys...

My technique is the this way

def anagram_finder(words):
anagrams = {}
for word in words:
word = word.strip()
key = ''.join(sorted(word.lower()))
anagrams.setdefault(key, []).append(word)
return anagrams
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread umpsumps
On May 9, 1:45 am, [EMAIL PROTECTED] wrote:
  key = ''.join(sorted(word))

 I tend to strip and lower the word as well, otherwise Hello and
 hello do not compare...depends on what you want though!
 Plus you might get a lot of word\n as keys...

 My technique is the this way

 def anagram_finder(words):
 anagrams = {}
 for word in words:
 word = word.strip()
 key = ''.join(sorted(word.lower()))
 anagrams.setdefault(key, []).append(word)
 return anagrams

What would be the best method to print the top results, the one's that
had the highest amount of anagrams??  Create a new histogram dict?
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread Arnaud Delobelle
[EMAIL PROTECTED] writes:

 On May 9, 1:45 am, [EMAIL PROTECTED] wrote:
  key = ''.join(sorted(word))

 I tend to strip and lower the word as well, otherwise Hello and
 hello do not compare...depends on what you want though!
 Plus you might get a lot of word\n as keys...

 My technique is the this way

 def anagram_finder(words):
 anagrams = {}
 for word in words:
 word = word.strip()
 key = ''.join(sorted(word.lower()))
 anagrams.setdefault(key, []).append(word)
 return anagrams

 What would be the best method to print the top results, the one's that
 had the highest amount of anagrams??  Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

-- 
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread umpsumps
  What would be the best method to print the top results, the one's that
  had the highest amount of anagrams??  Create a new histogram dict?

 You can use the max() function to find the biggest list of anagrams:

 top_results = max(anagrams.itervalues(), key=len)

 --
 Arnaud

That is the biggest list of anagrams, what if I wanted the 3 biggest
lists?  Is there a way to specific top three w/ a max command??

--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread Kam-Hung Soh

On Sat, 10 May 2008 07:19:38 +1000, [EMAIL PROTECTED] wrote:


 What would be the best method to print the top results, the one's that
 had the highest amount of anagrams??  Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud


That is the biggest list of anagrams, what if I wanted the 3 biggest
lists?  Is there a way to specific top three w/ a max command??



Built-in max() function only returns one result.

My solution is to make a sorted list and return last three items:

sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:]

--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread George Sakkis
On May 9, 5:19 pm, [EMAIL PROTECTED] wrote:
   What would be the best method to print the top results, the one's that
   had the highest amount of anagrams??  Create a new histogram dict?

  You can use the max() function to find the biggest list of anagrams:

  top_results = max(anagrams.itervalues(), key=len)

  --
  Arnaud

 That is the biggest list of anagrams, what if I wanted the 3 biggest
 lists?  Is there a way to specific top three w/ a max command??

 import heapq
 help(heapq.nlargest)
Help on function nlargest in module heapq:

nlargest(n, iterable, key=None)
Find the n largest elements in a dataset.

Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]


HTH,
George
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-09 Thread dave

On 2008-05-09 18:53:19 -0600, George Sakkis [EMAIL PROTECTED] said:


On May 9, 5:19 pm, [EMAIL PROTECTED] wrote:

What would be the best method to print the top results, the one's that



had the highest amount of anagrams??  Create a new histogram dict?



You can use the max() function to find the biggest list of anagrams:



top_results = max(anagrams.itervalues(), key=len)



--
Arnaud


That is the biggest list of anagrams, what if I wanted the 3 biggest
lists?  Is there a way to specific top three w/ a max command??



import heapq
help(heapq.nlargest)

Help on function nlargest in module heapq:

nlargest(n, iterable, key=None)
Find the n largest elements in a dataset.

Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]


HTH,
George


I both the 'nlargest' and the 'sorted' methods.. I could only get the 
sorted to work.. however it would only return values like (3,  edam) 
not the list of words..


Here is what I have now.. Am I over-analyzing this?  It doesn't seem 
like it should be this hard to get the program to print the largest set 
of anagrams first...



def anafind():
fin = open('text.txt')
mapdic = {}
finalres = []   # top answers go here
for line in fin:
line = line.strip()
alphaword = ''.join(sorted(line))   #sorted word as key
if alphaword not in mapdic:
mapdic[alphaword] = [line]
else:
mapdic[alphaword].append(line)
	topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:] 
 #top 3 answers

for key, value in topans:   #
finalres.append(mapdic[value])
print finalres

--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread Kam-Hung Soh
On Thu, 08 May 2008 15:42:07 +1000, dave [EMAIL PROTECTED]  
wrote:


This is what i've came up with.  My problem is that I can't get them to  
properly evaluate.. when  comparewords() runs it finds itself...  Should  
I have the keys of mapdict iterate over itself?  Is that possible?


def annafind():
fin = open('text.txt')  # file has one word per line
mapdic = {} # each word gets sorted  goes in here
for line in fin:
rawword = line.strip()
word = list(rawword)
word.sort()
mapdic[''.join(word)] = 0
return mapdic


def comparewords(): ***not working as intended
fin = open('text.txt')
for line in fin:
line = line.strip()
word = list(line)
word.sort()
sortedword = (''.join(word))
if sortedword in mapdic:
print line




On 2008-05-07 19:25:53 -0600, Kam-Hung Soh [EMAIL PROTECTED]  
said:



On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED]
wrote:


Hi All,
 I wrote a program that takes a string sequence and finds all the words



inside a text file (one word per line) and prints them:
 def anagfind(letters): #find anagrams of these letters
fin = open('text.txt')  #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox
 Now I'd like to modify the code to naturally find all anagrams inside

a

wordlist.  What would be the best way to do this?  Using Hints?  Is it



possible to iterate over dict keys?  How can I make a dict that maps



from a set of letters to a list of words that are spelled from those



letters?  Wouldn't I need to make the set of letters a key in a dict?
 As always - Thanks for helping someone trying to learn...
 Dave


 Suggestion: for each word, sort their characters and use them as the
dictionary key.  If two words have the same combination of characters,
then they are anagrams.  For example: edam and made are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.
 Refer Programming Pearls by Jon Bentley.
 --
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/  
a





Your code is always going to return the same list because every word is an  
anagram of itself.


Tip: Create a list for each dictionary key, then add a word to the list if  
that word is not in the list.  So:


mapdic('adem') -- [edam, made]

P.S. When replying, the convention is to add your response to the bottom,  
not top of the message.


--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread Terry Reedy

Kam-Hung Soh [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
| On Thu, 08 May 2008 15:42:07 +1000, dave [EMAIL PROTECTED]
| wrote:
| Your code is always going to return the same list because every word is 
an
| anagram of itself.
|
| Tip: Create a list for each dictionary key, then add a word to the list 
if
| that word is not in the list.  So:
|
| mapdic('adem') -- [edam, made]

Unless you plan to keep the list in alphabetical order, a set might be 
slightly better.



--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread offby1
Been there, done that:
http://polyglot-anagrams.googlecode.com/svn/trunk/python/
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread dave

On 2008-05-07 19:25:53 -0600, Kam-Hung Soh [EMAIL PROTECTED]



said:


On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED]

t

wrote:


Hi All,
 I wrote a program that takes a string sequence and finds all the wo

rds



inside a text file (one word per line) and prints them:
 def anagfind(letters): #find anagrams of these letters
fin = open('text.txt')  #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox
 Now I'd like to modify the code to naturally find all anagrams insi

de

a

wordlist.  What would be the best way to do this?  Using Hints?  Is

it



possible to iterate over dict keys?  How can I make a dict that maps





from a set of letters to a list of words that are spelled from those





letters?  Wouldn't I need to make the set of letters a key in a dict

?

 As always - Thanks for helping someone trying to learn...
 Dave


 Suggestion: for each word, sort their characters and use them as the



dictionary key.  If two words have the same combination of characters

,

then they are anagrams.  For example: edam and made are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.
 Refer Programming Pearls by Jon Bentley.
 --
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salarima

n/

a





Your code is always going to return the same list because every word is  an
anagram of itself.

Tip: Create a list for each dictionary key, then add a word to the list  if
that word is not in the list.  So:

mapdic('adem') -- [edam, made]

P.S. When replying, the convention is to add your response to the bottom ,
not top of the message.

--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/ a


I got it!  Thanks for all your help!!!  Please tell me what you think:

def anafind():
   fin = open('short.txt')  #one word per line
   mapdic = {}  #this dic maps letters 
to anagrams
   for line in fin:
   line = line.strip()
   templist = sorted(list(line))#sort letters
   newtlist = (''.join(templist))   #join letters
   if newtlist not in mapdic:
   mapdic[newtlist] = [line]
   if line not in mapdic[newtlist]:
   mapdic[newtlist].append(line)
   for value in mapdic.values():
   if len(value) = 1:
   pass
   else:
   print value


--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread Paul Rubin
dave [EMAIL PROTECTED] writes:
 if newtlist not in mapdic:
 mapdic[newtlist] = [line]
 if line not in mapdic[newtlist]:
 mapdic[newtlist].append(line)

I'd use the defaultdict module to avoid the above.

 for value in mapdic.values():
 if len(value) = 1:
 pass
 else:
 print value

I'd write:

   for value in mapdic.values():
  if len(value)  1:
print value

I'd actually use itervalues rather than values, but that's a little
bit fancy, you don't have to worry about it til later.
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread Kam-Hung Soh
On Fri, 09 May 2008 09:52:53 +1000, dave [EMAIL PROTECTED]  
wrote:



I got it!  Thanks for all your help!!!  Please tell me what you think:

def anafind():
fin = open('short.txt') #one word per line
mapdic = {} #this dic maps letters 
to anagrams
for line in fin:
line = line.strip()
templist = sorted(list(line))   #sort letters
newtlist = (''.join(templist))  #join letters
if newtlist not in mapdic:
mapdic[newtlist] = [line]
if line not in mapdic[newtlist]:
mapdic[newtlist].append(line)
for value in mapdic.values():
if len(value) = 1:
pass
else:
print value




I avoid creating a temporary variable if it's only used once immediately,  
so I would have written:


newtlist = ''.join(sorted(list(line)))

Looks pretty good, and I found the other tips in parallel responses useful  
to me too!


--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-08 Thread Jerry Hill
On Thu, May 8, 2008 at 7:52 PM, dave [EMAIL PROTECTED] wrote:
   I got it!  Thanks for all your help!!!  Please tell me what you think:

 Here's yet another version of the same thing, using defaultdicts and
 sets.  This way we don't have to test for membership in either the
 dictionary or the collection of anagrams.  The code is shorter, but no
 less readable in my opinion.

 from collections import defaultdict

 def anafind(words):

Given a sequence of words, return a dictionary with groups
of letters as keys and sets of anagrams as values

anagrams = defaultdict(set)
for word in words:
key = ''.join(sorted(word))
anagrams[key].add(word)
return anagrams

 if __name__ == __main__:
wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
anagrams = anafind(wordlist)
for letters, words in anagrams.items():
print %s: %s % (letters, words)

 --
 Jerry
--
http://mail.python.org/mailman/listinfo/python-list


anagram finder / dict mapping question

2008-05-07 Thread dave

Hi All,

I wrote a program that takes a string sequence and finds all the words 
inside a text file (one word per line) and prints them:


def anagfind(letters):  #find anagrams of these letters
fin = open('text.txt')  #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox

Now I'd like to modify the code to naturally find all anagrams inside a 
wordlist.  What would be the best way to do this?  Using Hints?  Is it 
possible to iterate over dict keys?  How can I make a dict that maps 
from a set of letters to a list of words that are spelled from those 
letters?  Wouldn't I need to make the set of letters a key in a dict?


As always - Thanks for helping someone trying to learn...

Dave

--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-07 Thread Kam-Hung Soh
On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED]  
wrote:



Hi All,

I wrote a program that takes a string sequence and finds all the words  
inside a text file (one word per line) and prints them:


def anagfind(letters):  #find anagrams of these letters
fin = open('text.txt')  #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox

Now I'd like to modify the code to naturally find all anagrams inside a  
wordlist.  What would be the best way to do this?  Using Hints?  Is it  
possible to iterate over dict keys?  How can I make a dict that maps  
from a set of letters to a list of words that are spelled from those  
letters?  Wouldn't I need to make the set of letters a key in a dict?


As always - Thanks for helping someone trying to learn...

Dave



Suggestion: for each word, sort their characters and use them as the  
dictionary key.  If two words have the same combination of characters,  
then they are anagrams.  For example: edam and made are anagrams  
because they have the letters 'a', 'd', 'e' and 'm'.


Refer Programming Pearls by Jon Bentley.

--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/a
--
http://mail.python.org/mailman/listinfo/python-list


Re: anagram finder / dict mapping question

2008-05-07 Thread dave
This is what i've came up with.  My problem is that I can't get them to 
properly evaluate.. when  comparewords() runs it finds itself...  
Should I have the keys of mapdict iterate over itself?  Is that 
possible?


def annafind():
fin = open('text.txt')  # file has one word per line
mapdic = {} # each word gets sorted  goes in here
for line in fin:
rawword = line.strip()
word = list(rawword)
word.sort()
mapdic[''.join(word)] = 0
return mapdic


def comparewords(): ***not working as intended
fin = open('text.txt')
for line in fin:
line = line.strip()
word = list(line)
word.sort()
sortedword = (''.join(word))
if sortedword in mapdic:
print line




On 2008-05-07 19:25:53 -0600, Kam-Hung Soh [EMAIL PROTECTED] said:


On Thu, 08 May 2008 11:02:12 +1000, dave [EMAIL PROTECTED]
wrote:


Hi All,

I wrote a program that takes a string sequence and finds all the words



inside a text file (one word per line) and prints them:

def anagfind(letters):  #find anagrams of these letters
fin = open('text.txt')  #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox

Now I'd like to modify the code to naturally find all anagrams inside

a

wordlist.  What would be the best way to do this?  Using Hints?  Is it



possible to iterate over dict keys?  How can I make a dict that maps



from a set of letters to a list of words that are spelled from those



letters?  Wouldn't I need to make the set of letters a key in a dict?

As always - Thanks for helping someone trying to learn...

Dave



Suggestion: for each word, sort their characters and use them as the
dictionary key.  If two words have the same combination of characters,
then they are anagrams.  For example: edam and made are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.

Refer Programming Pearls by Jon Bentley.

--
Kam-Hung Soh a href=http://kamhungsoh.com/blog;Software Salariman/ a



--
http://mail.python.org/mailman/listinfo/python-list