Interesting. This problem is like compression, but the first thing I
thought of was pattern recognition. If your tokens are just characters
in a string, you can let regular expressions do all the heavy lifting:
(defn rep-groups [s] (re-seq #"(.+)\1+|." s))
Then (rep-groups ".,ababcababc.,cc,.abab.") for example, gives
(["." nil] ["," nil] ["ababcababc" "ababc"] ["." nil] ["," nil] ["cc"
"c"] ["," nil] ["." nil] ["abab" "ab"] ["." nil])
Now we just need to convert each of those match vectors into the form
you want, like this:
(map (fn [[mtch subst]] (if subst
[(/ (count mtch) (count subst)) subst]
mtch))
*1) ;; Where *1 is the match results, above.
Gives
("." "," [2 "ababc"] "." "," [2 "c"] "," "." [2 "ab"] ".")
Cool! Pretty close to what you need. But you wanted to use arbitrary
tokens, like :a, :b, etc. So let's just represent each token by a
unique character. (There are LOTS of characters available.) Here's a
function that creates two 1-to-1 maps -- one that converts tokens to
characters, and the other its inverse, converting characters to their
tokens:
(defn token-to-char-map-and-inverse [token-collection]
(let [tokens (seq (set token-collection))
chars (map char (range (count tokens)))]
[(zipmap tokens chars) (zipmap chars tokens)]))
We'll need a presentation function like the (fn ...) above. Since it
will also convert the characters back to tokens, it will also depend
upon the char-to-token map. So the following function takes such a map
and returns such a function:
(defn count-subst-fn [c-to-t]
(fn [[mtch subst]]
(if subst
[(/ (count mtch) (count subst)) (map c-to-t subst)]
(apply c-to-t mtch))))
Finally, put it all together:
(defn parse-reps [token-seq]
(let
[[t-to-c c-to-t] (token-to-char-map-and-inverse token-seq)
s (apply str (map t-to-c token-seq))]
(map (count-subst-fn c-to-t) (rep-groups s))))
For example, (parse-reps [:_ :- :a :b :c :a :b :c :a :a :_]) gives
(:_ :- [2 (:a :b :c)] [2 (:a)] :_)
Notice that the input itself is used to define the set of tokens we're
using. We could even us a string as input. (parse-reps
".,ababcababc.,cc,.abab.") gives
(\. \, [2 (\a \b \a \b \c)] \. \, [2 (\c)] \, \. [2 (\a \b)] \.)
These four (defn ...) provide a simple solution to your problem. Using
regular expressions also yields a nice separation of concerns, I
think, and it allows for experimentation with the pattern. As defined
above, (parse-reps [:a :b :a :b :c :a :b :a :b :c]) gives
([2 (:a :b :a :b :c)])
But let's try redefining the regular expression (note the "?"):
(defn rep-groups [s] (re-seq #"(.+?)\1+|." s))
Then the same (parse-reps [:a :b :a :b :c :a :b :a :b :c]) gives the
less greedy result,
([2 (:a :b)] :c [2 (:a :b)] :c)
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en