[gcj] My analysis of Alien Rhyme

Ronald van Loon Mon, 15 Apr 2019 17:56:16 -0700

Hi,

tried to participate in the first round, but it was at 3am local time. Had my 
alarmclock at 2:45, duly turned it off, nodded of for five minutes, then 
suddenly it was 4:30. Don't know how that happened.

Anyway, here is my analysis of the problem:

This is a problem about suffixes. Unfortunately, these are at the end of the
words, so it makes sense to put these at the front.

So, while reading, reverse the words - we do not have to list the pairs, after
all, only count them. This is a O(N) operation.

After we read all the words, we sort them. (Note: you could actually do the
compares without sorting or even while sorting, but sorting them puts the
longer matches closer together and is usually optimized if you use the standard
library). This is an O(N log N) operation.

We now compare every word against each other, noting the common suffix. We put
the result of the comparison in a multimap, meaning that we sort on
<suffixlength, <word1-index, word2-index> >. This is O(N^2). The problem states
that a suffix is of at least length 1, so if two words do not have a common
suffix we do not need to put them in this map.

Once this is done, we no longer need the words themselves, as we can now use
transitive properties (if the suffix of a certain word W1, matches that of
another word with W2, and W1 == W3, then W2 == W3, provided they all have a
suffixlength in common that is the same for all three of them).

We now have a map with all pairs, grouped together by suffix length. We now
iterate over this map in the following fashion:

- get the largest suffixlength left in the map
- get all pairs with that suffixlength from the map
- process all pairs, but only process those pairs that have two words in them
that have not been picked in a previous pass as rhyming. (there could be a
match which was processed earlier with a larger suffixlength, causing a word to
be marked as rhyming, while it has a smaller suffixlength, being processed now,
with another word - we could weed it out, but then our datastructure needs to
be more complex, and this is a simple O(log N) verification)

By using the transitive properties, we now know that if a certain word in this
range has a common suffixlength to another, they rhyme. So we can partition the
pairs we have at this suffixlength into distinct sets. If a word is in one set,
it cannot be in another set. (Or we would get a contradiction with the
transitive properties). This is essentially a graph algorithm to determine
distinct subgraphs.

So, all words in a set rhyme. However, we have the restriction that it has to
be pairwise. If there is only one pair, we are done with the set, we mark each
word as handled (ie. part of a pair). If we have more than one pair, we just
pick the first one, mark those as handled. However, we cannot ignore the other
pairs - we already know they rhyme, and will also rhyme if we shorten their
common length by one. So, we reinsert them into the map at a shorter length (of
course, we don't have to that if the length we are processing is equal to 1, as
we have at least one letter to put an accent on).

When all sets have been processed in this fashion, we erase all the pairs we
just processed for the main multimap. This ensures that our loop ends - we know
that once a certain length L has been processed, we will process L - 1. As we
do not reinsert length L pairs, but only L-1 length pairs, the map will
diminish in size. We break at length 1, so there is a termination point.

This algorithm has a worst case of O(P). Worst case this would be O(N^2), but
depending on the sparsity of the original words, this would amortize to O(N).

Note that original analysis uses a trie and recursion. The above does not, as
it only needs the pairs, not the words themselves, once processed. In essence,
all we need to is compare integers, once we have processed all the pairwise
comparisons. This might be a bit faster.

I can post my actual C++ code for those interested.

Hope this was useful.

Regards,

Ronald

--
You received this message because you are subscribed to the Google Groups
"Google Code Jam" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/google-code/ca46d64d-aaaf-4d47-be0d-8aa5b8948e09%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[gcj] My analysis of Alien Rhyme

Reply via email to