That is along the lines of my thinking. I am starting to look at zippers
again. However, much of what I need them to do either isn't documented at
all, or they were never intended for constructing trees over time.

The purpose of this is to look at sequenced events and detect frequent 
patterns. These patterns can have variations. It is built from the top down 
with the most frequent events added first, the 2nd most added second, etc. 
What ends up happening is that I see A-B-C-D a lot, for example, but I also 
see A-B-D-F sometimes. I see B-D-E-F-G rarely. So what I want to do is keep 
track of each of these. The first two are variations of each other stemming 
after A-B. What I want to do is to be able to look at D and see what events 
often precede it. I see that in all of my data A, B and C precede it. 
However, just by examining the counts I can determine that C does not occur
more often than random chance would predict, so I can drop it and return
B, A as my set, in that order because I see B preceding D more often than A
does. Big whoop, you say. But now let's apply this to some interesting
situations. Let's say that instead of letters we have topics. D is part of
some greater topic concept. I want to know what other topics predict the
existence of D, such that if I see them I think it's possible they imply D.
B and A do this, according to this analysis. So when I tell my search
engine I'm interested in D, it will
also search for B and A knowing that while D isn't mentioned by name, there 
is a possibility it is implied when B or A are present.
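
To make the counting concrete, here is a minimal sketch of that analysis in
Clojure. Everything in it is an assumption made for illustration: the names
events-before and predictors, the sample data, and above all the "better
than chance" test. Here that test is a simple lift ratio: how often an event
precedes the target in sequences that contain the target, divided by how
often the event shows up in any sequence. The real test may well be
something else.

```clojure
;; Hypothetical sample data: the three patterns from the text, plus three
;; made-up sequences in which C turns up without D.
(def sample-sequences
  [[:a :b :c :d]
   [:a :b :d :f]
   [:b :d :e :f :g]
   [:c :e :g]
   [:c :f :g]
   [:e :c :f]])

(defn events-before
  "Set of events seen before the first occurrence of target in events,
  or nil when target never occurs."
  [target events]
  (when (some #{target} events)
    (set (take-while #(not= % target) events))))

(defn predictors
  "Events that precede target more often than their overall frequency would
  suggest (lift above min-lift), most frequent predecessor first."
  [sequences target min-lift]
  (let [total         (count sequences)
        ;; number of sequences that contain each event at least once
        containing    (frequencies (mapcat distinct sequences))
        ;; one predecessor set per sequence that contains target
        befores       (keep #(events-before target %) sequences)
        with-target   (count befores)
        before-counts (frequencies (apply concat befores))]
    (->> before-counts
         (filter (fn [[event n]]
                   (> (/ (/ n with-target)
                         (/ (containing event) total))
                      min-lift)))
         (sort-by val >)
         (mapv key))))
```

With this made-up data, (predictors sample-sequences :d 1) returns [:b :a]:
C is dropped because it appears before D less often than its overall
frequency would suggest, and B ranks ahead of A because it precedes D in
three sequences rather than two.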

Concrete example: I want documents that talk about "automobiles."
"Automobile" associates with things like "driver's seat," "steering wheel,"
and "rear view mirror," but also with "car," "car wash," "Prius," "Ford," and other
terms. Why is this important? Let's say I don't want a general return of 
information but I am looking for a very specific document. If I don't have 
the right keyword I have to guess. A lot. I may never guess right. So I 
need to be sure my search is augmented properly. That is what this will do. 
However, the number of events I will be looking at will be in the hundreds
of millions, and for some data sets in the billions or more. The work will
have to be distributed. However, for today, I just need to get this to
work on my laptop, with an eye to the future as to how I will get this to 
work on a large network of machines. So far I have loved Functional 
Programming... right up to this problem. This one is killing me. I am going 
to examine the other suggestions here for how to do it. I am certain that 
Clojure is extremely excellent at solving this kind of problem and I am 
becoming more certain I am missing some FP concept that is preventing me 
from tapping into that power!
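
On the laptop-today, cluster-later worry: one FP idea that may help is that
plain count maps can be built per chunk and combined afterwards, because
adding counts is associative and the order of merging does not matter. A
rough sketch follows, with hypothetical names (count-pairs, merge-counts,
count-pairs-chunked) and an arbitrary notion of "chunk". It is not a claim
about how the real system should be partitioned.

```clojure
(defn count-pairs
  "Counts [predecessor event] pairs in one chunk of sequences. A predecessor
  is any distinct event that occurs earlier in the same sequence."
  [sequences]
  (frequencies
    (for [events    sequences
          [i event] (map-indexed vector events)
          pred      (distinct (take i events))]
      [pred event])))

(defn merge-counts
  "Combines count maps from separate chunks (or machines) by adding counts."
  [& partials]
  (apply merge-with + partials))

(defn count-pairs-chunked
  "Counts pairs one chunk at a time, then merges the partial maps. The map
  call is a stand-in: pmap, or a real distributed job, could replace it
  without changing the result."
  [chunk-size sequences]
  (apply merge-counts
         (map count-pairs (partition-all chunk-size sequences))))
```

Counting everything in one pass and counting in chunks give the same map,
which is the property that would let the same code move from one machine to
many.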

Solve this problem and Clojure becomes the best platform for large scale 
machine learning out there. My company has 3 complex applications up 
already making money and the bugs are 99% design issues instead of 
something not working in an expected manner. This is really fantastic and 
has me sold that Clojure was a good choice.

--Pete

On Monday, September 9, 2013 4:06:41 PM UTC-5, Laurent PETIT wrote:
>
> This is quite an interesting post, with the underlying question in my
> mind: where to draw the line between pure data structure manipulation
> with clojure.core only, and a datalog (Datomic)/SQL engine?
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en