On 10 March 2015 at 17:58, Sam Raker <sam.ra...@gmail.com> wrote:
> I more meant deciding on a maximum size and storing them qua ngrams--it
> seems limiting. On the other hand, after a certain size, they stop being
> ngrams and start being something else--"texts," possibly.

Exactly. When I first read your post, I almost suggested you model
this in a graph database like Neo4j or Titan. Each word would be a
node in the graph with an edge linking it to the next word in the
sentence. You could define an index on the words (so retrieving all
nodes for a given word would be fast), then follow edges to find and
count particular n-grams. This is more complicated than the relational
model I proposed, and will be a bit slower to query. But if you don't
want to put an upper bound on the length of the n-gram when you index
the data, it might be the way to go.
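
To make the idea concrete, here's a rough in-memory sketch in plain
Clojure (no Neo4j/Titan — just maps standing in for the graph, so the
names and shape are illustrative, not a real graph-DB API). Each word
maps to the words that follow it, with a count on each edge; the map
keys play the role of the word index, and walking edges enumerates
n-grams of any length without fixing a maximum up front:

```clojure
;; Build the "graph": word -> {next-word -> edge count}.
(defn index-sentence [graph words]
  (reduce (fn [g [a b]]
            (update-in g [a b] (fnil inc 0)))
          graph
          (partition 2 1 words)))

;; Enumerate all n-grams starting from a word by following edges.
(defn ngrams-from [graph word n]
  (if (= n 1)
    [[word]]
    (for [[nxt _] (get graph word)
          tail    (ngrams-from graph nxt (dec n))]
      (cons word tail))))
```

One caveat with this flattened version: the edge counts give you exact
bigram counts, but because all occurrences of a word collapse into one
node, counts recovered for longer paths are only an upper bound. In a
real graph database you'd sidestep that by storing one edge (or a
position property) per occurrence.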

Ray.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en