[algogeeks] Re: longest chain

Gene Sun, 27 Jun 2010 19:48:19 -0700


On Jun 20, 6:20 am, sharad <sharad20073...@gmail.com> wrote:
> Given a dictionary of words, find out the longest chain of 'ancestors'
> in it. Ancestors are defined in the following way: A word is said to
> be the parent of another word if
>      1) It is a valid word in the dictionary
>      2) It can be obtained by deleting one character of the word and
> permuting the remaining characters.
>      eg. run is a parent of urns
>     A possible chain of ancestors could be:
>     a
>     an
>     ban
>     nabs
>     bands
> This chain is of length 4. What is the length of the longest chain in
> a given dictionary?


I solved this using an array of hash tables, C, where C[n] is for
strings of length n.  The key of each hash table entry is a string
formed by sorting the lowercased characters of the word it
represents.  The corresponding value is a triple of the form [ word,
chain length, subkey ].  The subkey is just the key for the next
shortest word in the chain.  It effectively serves as a pointer so you
can get the final answer.

Process the dictionary in sorted order (ignoring case).  For word W,
compute key(W).  Let W_i be W with the i'th character deleted.  Then
also compute all possible subkeys S_i = key(W_i).  For each subkey
S_i, check the hash table C[length(W) - 1] to see if S_i is there.  If
not and C[length(W)] is not already defined for W, then add [W, 1,
null] to C[length] with key key(W).  If there is already a value for
S_i in the table C[length(W) - 1], say it is [X, L, S], then add [W, L
+ 1, S_i] to C[length(W)] with key key(W) if L+1 is bigger than any
previously stored value.

Note that you could save memory by throwing away all the information
about strings that are more than one letter shorter than the current
W.  But then you would need a real pointer rather than a subkey in the
hash table in order to retrieve the final answer.

I implemented all the above in perl.  Below is the result using the
dictionary on my Macbook.  It takes about 7 seconds to run including
reading the dictionary.

Read 234936 words.
Polystomatidae
Platystomidae
Platysomidae
plasmodiate
metapodial
diplomate
diplomat
diploma
amidol
amido
amid
aim
ai
A

Here is the code.  It was very quickly written. No claim that it's
correct.

sub find_chains {
    my $dict = shift;
    my @chains;
    my $longest;
    foreach my $word (@$dict) {
        my $n = length($word);
        my $key = join '', sort split '', lc($word);
        foreach my $i (0 .. length($key) - 1) {
            my $subkey = substr($key, 0, $i) . substr($key, $i + 1);
            my $chain = $n > 1 && $chains[$n - 1] && $chains[$n - 1]-
>{$subkey};
            my $new_chain;
            my $existing = $chains[$n]->{$key};
            if ($chain) {
                if (!$existing || $existing->[1] < $chain->[1] + 1) {
                    $new_chain = $chains[$n]->{$key} = [$word, $chain->[1] + 1,
$subkey];
                }
            }
            else {
                if (!$existing) {
                    $new_chain = $chains[$n]->{$key} = [$word, 1, ''];
                }
            }
            if ($new_chain && (!$longest || $new_chain->[1] > $longest->[1]))
{
                $longest = $new_chain;
            }
        }
    }
    my $p = $longest;
    while ($p) {
        print "$p->[0]\n";
        $p = $chains[$p->[1] - 1]->{$p->[2]};
    }
}

sub main {
    open F, "/usr/share/dict/words" or die $!;
    my @words = map { chomp; $_ } <F>;
    close F;
    print "Read " . scalar(@words) . " words.\n";
    find_chains(\...@words);
}

main;

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algoge...@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

[algogeeks] Re: longest chain

Reply via email to