That is along the lines of my thinking. I am starting to look at zippers again. However, much of what I need them to do either isn't documented at all, or zippers were simply never intended to construct trees incrementally over time.
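For what it's worth, `clojure.zip` can grow a tree over time; the docs just don't show it. Here is a minimal sketch, assuming a tree of vectors where the first element is a label. The `grow` helper is hypothetical, not part of any library:

```clojure
(require '[clojure.zip :as zip])

;; Start with a bare root and extend it with zipper edits.
(def t0 (zip/vector-zip [:root]))

;; Append a child under the current location, then move into that new
;; child so the next call nests one level deeper.
(defn grow [loc label]
  (-> loc
      (zip/append-child [label])
      zip/down
      zip/rightmost))

(def t1 (-> t0 (grow :A) (grow :B)))

(zip/root t1)  ;; => [:root [:A [:B]]]
```

Whether this scales to the sizes discussed below is a separate question, but it shows zippers are not limited to editing trees that already exist.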
The purpose of this is to look at sequenced events and detect frequent patterns. These patterns can have variations. The tree is built from the top down, with the most frequent event added first, the second most frequent added second, and so on. What ends up happening is that I see A-B-C-D a lot, for example, but I also see A-B-D-F sometimes, and B-D-E-F-G rarely. I want to keep track of each of these. The first two are variations of each other, diverging after A-B.

What I want is to be able to look at D and see which events often precede it. In all of my data, A, B, and C precede it. However, just by examining the counts I can determine that C does not occur more often than random chance, so I can drop it and return B, A as my set, in that order, because I see B preceding D more often than A.

Big whoop, you say. But now let's apply this to some interesting situations. Say that instead of letters we have topics, and D is part of some greater topic concept. I want to know which other topics predict the existence of D, such that if I see them, it's possible they imply D. According to this analysis, B and A do. So when I tell my search engine I'm interested in D, it will also search for B and A, knowing that even when D isn't mentioned by name, it may be implied when B or A are present.

Concrete example: I want documents that talk about "automobiles." Automobile associates with terms like "driver's seat," "steering wheel," and "rear view mirror," but also with "car," "car wash," "Prius," "Ford," and others. Why is this important? Suppose I don't want a general return of information but am looking for a very specific document. If I don't have the right keyword, I have to guess. A lot. I may never guess right. So I need to be sure my search is augmented properly. That is what this will do.

However, the number of events I will be looking at will be in the hundreds of millions, and for some data sets it will be in the billions or greater.
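The counting part of the above can be sketched in a few lines of plain Clojure. This is only an illustration under assumed names (`preceding-counts`, `frequent-predecessors`, and the `observed` data are all made up for this example), and the fixed threshold is a stand-in for a real "more than random chance" test:

```clojure
;; Events are keywords; each observed sequence is a vector like [:A :B :C :D].

;; For one target event, count every event that appears before it,
;; across all sequences that contain it, most frequent first.
(defn preceding-counts [seqs event]
  (->> seqs
       (filter #(some #{event} %))                   ;; only sequences with the event
       (mapcat #(take-while (complement #{event}) %)) ;; everything before it
       frequencies
       (sort-by val >)))

;; Toy data mirroring the post: A-B-C-D often, A-B-D-F sometimes,
;; B-D-E-F-G rarely.
(def observed
  (concat (repeat 5 [:A :B :C :D])
          (repeat 2 [:A :B :D :F])
          (repeat 1 [:B :D :E :F :G])))

(preceding-counts observed :D)
;; => ([:B 8] [:A 7] [:C 5])

;; Drop predecessors below a significance threshold (placeholder for a
;; proper statistical test), leaving B and A in that order.
(defn frequent-predecessors [seqs event threshold]
  (filter #(>= (val %) threshold) (preceding-counts seqs event)))

(frequent-predecessors observed :D 6)
;; => ([:B 8] [:A 7])
```

At hundreds of millions of events you would of course ingest into a count structure incrementally rather than re-scanning sequences per query, but the shape of the answer, ranked predecessors minus the chance-level ones, is the same.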
The problem will be distributed. For today, though, I just need to get this working on my laptop, with an eye to the future as to how I will get it working on a large network of machines.

So far I have loved functional programming... right up to this problem. This one is killing me. I am going to examine the other suggestions here for how to do it. I am certain that Clojure is extremely well suited to solving this kind of problem, and I am becoming more certain that I am missing some FP concept that is preventing me from tapping into that power! Solve this problem and Clojure becomes the best platform for large-scale machine learning out there.

My company already has 3 complex applications up and making money, and the bugs are 99% design issues rather than something not working in an expected manner. This is really fantastic and has me sold that Clojure was a good choice.

--Pete

On Monday, September 9, 2013 4:06:41 PM UTC-5, Laurent PETIT wrote:
>
> This is quite an interesting post, with the underlying question in my
> mind : where to put the line between pure datastructure manipulation
> with clojure.core only, and a datalog (datomic)/sql engine ?

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en