On Sat, 23 Nov 2013 04:23:42 +0000 MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 23/11/2013 00:58, John O'Hagan wrote: > > On Thu, 21 Nov 2013 12:59:26 -0800 > > Dan Stromberg <drsali...@gmail.com> wrote: > > > >> On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan > >> <resea...@johnohagan.com>wrote: > >> > >> > > >> > Short story: the subject says it all, so if you have an answer > >> > already, fire away. Below is the long story of what I'm using it > >> > for, and why I think it needs to be recursive. It may even be of > >> > more general interest in terms of filtering the results of > >> > generators. > >> > > >> > >> I think you probably need permutations rather than combinations. > >> > >> Also, I think you'll need to form a word (partitioned off by > >> spaces), and then check it against a set > >> containing /usr/share/dict/words before recursing for the > >> remainder of the sentence - this should speed things up a LOT. > > > > Thanks for the reply. If I understand you correctly, you are > > suggesting permuting the input _characters_ to form words and then > > seeing if they exist, as opposed to my approach of combining known > > words and seeing if they are anagrams. (Permutations of words would > > not help find anagrams as they merely change the word order). Here > > is an attempt at that: > > > > def anagrams(partition, input_string): > > """Find anagrams which fit given partition of input string > > length""" if not partition: > > yield (), input_string > > return > > for words, checkstring in anagrams(partition[:-1], > > input_string): for word in itertools.permutations(checkstring, > > partition[-1]): word = ''.join(word) > > if word in WORDS: #WORDS is collection of dictionary > > words newstring = checkstring > > for l in word: > > newstring = newstring.replace(l, '' , 1) > > yield words + (word,), newstring > > > > There are two problems with this. If there are repeated characters > > in the input, redundant results are produced; a multiset-permutation > > algorithm would fix this. But the main problem is it is incredibly > > slow: on my run-of-the-mill laptop, it chokes on anything longer > > than about 10 characters, spending most of its time rejecting > > non-words. > > > > Or have I misunderstood your suggestion? > > > If you want to know how to get unique permutations, have a look here: > > http://mail.python.org/pipermail/python-ideas/2013-October/023610.html > For this particular problem I don't need multiset permutations but multiset combinations (see original post). But that thread was a good read. This is a little OT, but I did need multiset permutations a couple of years back for a project involving the generation of musical structures. There was zero interest here at the time (fair enough!) and I ended up implementing the C++ function "next_permutation" in Python. So it was good to read in that thread that there seems to be some interest in incorporating multiset combinatorics into itertools (including your excellent contribution). IMHO the scepticism there about non-exotic use-cases was unjustified. Leaving aside my probably atypical uses, it crops in many ordinary situations dealing with arrangements of multiple items of several types where each instance of a type is equivalent. Take stock control: when stacking a warehouse shelf it doesn't matter which particular box goes where, only how many of each size. Or timetabling: if Mrs Smith teaches the same students maths on Tuesdays and Thursdays, swapping the classes does nothing. The same goes for cyclic and cyclic-multiset ("necklaces") combinatorics, where the beginning and end of an arrangement is not significant, eg. 24-hour rostering, laying tiles, etc. And musical scales. Disclaimer: I am far from expert on combinatorics but seem to end up using it a lot. -- John -- https://mail.python.org/mailman/listinfo/python-list