Re: Building Trees

Jonah Benton Mon, 16 Sep 2013 19:21:34 -0700

I find your description interesting, but I'm confused about what the actual
underlying problem is. Is indexing and searching for data in documents
involved, or is that just an example? If the real problem is about
augmenting searches over a large document set with potentially relevant
taxonomic or semantic terms... OK, but I'm not sure what that necessarily
has to do with frequent patterns in sequences/event streams, e.g. because
taxonomic or semantic associations have richer relationships than just
their location in a sequence...


On the other hand, if this is just about extracting sequences from event
streams: if it has to be done live, where you want to incrementally record
distinct sequences of events for a series of window lengths, that sounds
like you're talking about maintaining what are essentially lots of tries,
one for each distinct root? But if you can do it statically, then you're
basically just generating n-grams from the event sequence data and indexing
those?
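As a rough illustration of those two options, here is a minimal Clojure sketch. It assumes events are keywords. It also uses a hypothetical trie node shape of {:count n, :next {event node}}. The function names (ngram-counts, node-path, trie-insert, build-trie) are made up for this example. Only partition, partition-all, frequencies, update-in and fnil come from clojure.core.

```clojure
;; Hypothetical sketch: function names and the {:count n, :next {...}}
;; node shape are assumptions for illustration, not an existing library.

(defn ngram-counts
  "Static approach: count every contiguous window of exactly n events."
  [n events]
  (frequencies (map vec (partition n 1 events))))

(defn node-path
  "Key path to the trie node reached by `prefix`,
  e.g. [:a :b] -> [:a :next :b]."
  [prefix]
  (vec (interpose :next prefix)))

(defn trie-insert
  "Adds one window to the trie, bumping the :count of every node on its
  path (i.e. every prefix of the window)."
  [trie window]
  (reduce (fn [t k]
            (update-in t (conj (node-path (take k window)) :count) (fnil inc 0)))
          trie
          (range 1 (inc (count window)))))

(defn build-trie
  "Folds each window of up to max-n events into one map whose top-level
  keys are the distinct roots (first events), i.e. one trie per root.
  In a live setting you would call trie-insert as each window completes."
  [max-n events]
  (reduce trie-insert {} (partition-all max-n 1 events)))
```

For example, (get-in (build-trie 3 [:a :b :c :a :b :d]) [:a :next :b :count]) returns 2, because A-B starts two of the windows. The trie is an ordinary persistent map, so each update-in returns a new version that shares structure with the previous one. That structural sharing is what keeps the incremental variant cheap to update.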




On Mon, Sep 16, 2013 at 12:08 PM, Peter Mancini <peter.manc...@gmail.com> wrote:

> That is along the lines of my thinking. I am starting to look at zippers
> again. However, much of what I need them to do either isn't documented at
> all, or they were never intended to construct trees over time.
>
> The purpose of this is to look at sequenced events and detect frequent
> patterns. These patterns can have variations. It is built from the top
> down, with the most frequent events added first, the second most frequent
> added second, and so on. What ends up happening is that I see A-B-C-D a
> lot, for example, but I also see A-B-D-F sometimes. I see B-D-E-F-G rarely.
> So what I want to do is keep track of each of these. The first two are
> variations of each other branching after A-B. What I want to do is to be
> able to look at D and see what events often precede it. Across all of my
> data, I see that A, B and C precede it. However, just by examining the
> counts I can determine that C does not precede D more often than random
> chance would predict, so I can drop it and return B, A as my set, in that
> order, because I see B preceding D more often than A. Big whoop, you say.
> But now let's apply this to some interesting situations. Let's say that
> instead of letters we have topics. D is part of some greater topic concept.
> I want to know what other topics predict the existence of D, such that if
> I see them I think it's possible they imply D. According to this analysis,
> B and A do this. So when I tell my search engine I'm interested in D, it
> will also search for B and A, knowing that while D isn't mentioned by name,
> it may be implied when B or A are present.
>
> Concrete example: I want documents that talk about "automobiles."
> "Automobile" is associated with things like "driver's seat", "steering
> wheel" and "rear-view mirror", but also with "car", "car wash", "Prius",
> "Ford" and other terms. Why is this important? Let's say I don't want a
> general set of results but am looking for a very specific document. If I
> don't have the right keyword I have to guess. A lot. I may never guess
> right. So I need to be sure my search is augmented properly. That is what
> this will do. However, the number of events I will be looking at will be
> in the hundreds of millions, and for some data sets in the billions or
> more. The problem will be distributed. For today, though, I just need to
> get this to work on my laptop, with an eye to how I will later get it to
> work on a large network of machines. So far I have loved Functional
> Programming... right up to this problem. This one is killing me. I am
> going to examine the other suggestions here for how to do it. I am certain
> that Clojure is extremely good at solving this kind of problem, and I am
> becoming more certain I am missing some FP concept that is preventing me
> from tapping into that power!
>
> Solve this problem and Clojure becomes the best platform for large-scale
> machine learning out there. My company already has 3 complex applications
> up and making money, and 99% of the bugs are design issues rather than
> something not working as expected. This is really fantastic and has sold
> me on Clojure being a good choice.
>
> --Pete
>
>
> On Monday, September 9, 2013 4:06:41 PM UTC-5, Laurent PETIT wrote:
>>
>> This is quite an interesting post, with the underlying question in my
>> mind: where to draw the line between pure data structure manipulation
>> with clojure.core only, and a Datalog (Datomic)/SQL engine?
>>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
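Pete's "what tends to precede D" step from the quoted message could be sketched as follows. The quoted text doesn't say how "more than random chance" is measured, so this sketch assumes a simple lift ratio. The numerator is the fraction of D's occurrences that have x within the preceding w events. The denominator is the fraction of all positions whose preceding w events contain x. The lookback size w, the min-lift threshold and the function names (lookback-pairs, predictors) are all hypothetical.

```clojure
;; Hypothetical sketch: rank the events that tend to precede `target`.
;; "More than chance" is ASSUMED to mean lift > min-lift, where
;; lift = P(x in lookback | event is target) / P(x in lookback).

(defn lookback-pairs
  "For every position in `events`, pair the event there with the set of
  distinct events among the w positions before it."
  [w events]
  (let [v (vec events)]
    (for [i (range (count v))]
      [(nth v i) (set (subvec v (max 0 (- i w)) i))])))

(defn predictors
  "Events whose lift as predecessors of `target` exceeds min-lift,
  ordered by how often they precede `target` (most frequent first)."
  [target w min-lift sequences]
  (let [pairs  (mapcat #(lookback-pairs w %) sequences)
        n-all  (count pairs)
        ;; x -> number of positions (any event) whose lookback contains x
        base   (frequencies (mapcat second pairs))
        hits   (filter #(= target (first %)) pairs)
        n-hit  (count hits)
        ;; x -> number of occurrences of target whose lookback contains x
        before (frequencies (mapcat second hits))]
    (->> before
         (keep (fn [[x c]]
                 ;; observed rate before target / background rate (exact ratios)
                 (let [lift (/ (/ c n-hit) (/ (base x) n-all))]
                   (when (> lift min-lift) [x c]))))
         (sort-by second >)
         (map first))))
```

As an example, take the sequences [:a :b :c :d], [:a :b :d :f], [:c :e :c :g], [:c :f :c :e] and [:b :d]. For these, (predictors :d 3 1.5 seqs) returns (:b :a). C precedes D once, but it appears in most lookback windows anyway (lift 6/7), so it is dropped. B is ranked ahead of A because it precedes D three times versus twice. The intermediate results are plain frequencies maps. That means counts computed on separate chunks of the data can be combined with (merge-with + ...), along with the summed totals, before the lift step. This is one way the counting could later be split across machines.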
