On 10 March 2015 at 17:58, Sam Raker <sam.ra...@gmail.com> wrote:
> I more meant deciding on a maximum size and storing them qua ngrams--it
> seems limiting. On the other hand, after a certain size, they stop being
> ngrams and start being something else--"texts," possibly.
Exactly. When I first read your post, I almost suggested you model this in a
graph database like Neo4j or Titan. Each word would be a node in the graph,
with an edge linking it to the next word in the sentence. You could define an
index on the words (so retrieving all nodes for a given word would be fast),
then follow edges to find and count particular n-grams.

This is more complicated than the relational model I proposed, and it will be
somewhat slower to query. But if you don't want to put an upper bound on the
length of the n-grams when you index the data, it might be the way to go.

Ray.

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
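P.S. To make the idea concrete, here's a minimal sketch of the traversal. It's
in Python with a plain dict standing in for the graph store (not the Neo4j or
Titan API), and all names are mine: one node per token occurrence, an edge to
the next token, and an index from each word to its nodes. Counting an n-gram
of any length is then "start at the indexed nodes for the first word and
follow edges".

```python
from collections import defaultdict

def build_graph(tokens):
    """One node per token occurrence, an edge to the next token,
    and an index from each word to all of its nodes."""
    words = list(tokens)
    nxt = {i: i + 1 for i in range(len(words) - 1)}  # edge: node -> next node
    index = defaultdict(list)                        # word -> [node, ...]
    for i, w in enumerate(words):
        index[w].append(i)
    return words, nxt, index

def count_ngram(graph, ngram):
    """Count an n-gram of arbitrary length: start at the indexed
    nodes for its first word, then follow edges and compare words."""
    words, nxt, index = graph
    count = 0
    for node in index.get(ngram[0], []):
        cur, ok = node, True
        for w in ngram[1:]:
            cur = nxt.get(cur)
            if cur is None or words[cur] != w:
                ok = False
                break
        if ok:
            count += 1
    return count

g = build_graph("the cat sat on the cat mat".split())
print(count_ngram(g, ["the", "cat"]))  # 2
print(count_ngram(g, ["cat", "mat"]))  # 1
```

Note that no n-gram length is fixed at indexing time, which is the whole
point: the same index answers queries for bigrams, trigrams, or anything
longer.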