Hi Marcel,

doesn't that mean I can never be sure I'll get a correct result when searching for the path of a node?
Best regards,
Dominik

On Tue, Jun 2, 2009 at 1:43 PM, Marcel Reutegger <[email protected]> wrote:

> Hi,
>
> 2009/5/21 Dominik Süß <[email protected]>:
> > Hi everybody,
> >
> > after some time of indirect contact with JCR through Sling and Day
> > CRX/CQ, I now think it's time to get in touch with Jackrabbit directly.
> > As the subject says, I do this after having an idea which I'd like to
> > share and need some help to realize (since my Lucene experience is close
> > to nothing beyond pure usage and theory). I tried to start with a proof
> > of concept, but as I looked into the current implementations of search
> > in JCR I had to realize I need someone who could give me a jumpstart and
> > take the first steps together with me. So here I go with my idea:
> >
> > I recently had some thoughts about something I'd call semantic distance
> > in multidimensional hierarchies (content structures + hierarchical
> > tagging like in CQ 5 [1]).
> >
> > The task I would like to fulfill: find the semantically closest nodes
> > for a given node.
> >
> > I postulate that structure represents the semantic relation, and that
> > the referenced tags are in a hierarchy that represents semantic
> > relations as well. Furthermore, I postulate that subnodes are
> > semantically a subset of the "type" of the parent node (not thinking of
> > JCR types but of semantic classifications).
> > This leads to the following thesis: the distance to the closest shared
> > parent node represents the unidirectional distance of a node to another
> > node. The result is that a whole branch has the same distance to a node
> > (which should be correct, since the subnode in the branch belongs to the
> > parent node which connects the branches we have to look at).
> >
> > Figuring out a good way to produce an index for this really seems to be
> > hard, so I rethought my assumptions and came up with the following way
> > of determining the distance without indexing the explicit distance (I
> > came up with this thought after reading a bit about Analyzers and
> > stemming).
> >
> > 1. For indexing, all referenced tag handles and the node's own handle
> > are taken into account.
> > 2. An analyzer produces string tokens out of each handle. Each handle is
> > split up into multiple handles by removing the last node until the root
> > node is reached (so the node and every parent node is indexed for this
> > node as well as for each referenced tag).
>
> this will only work as long as you don't move nodes. moving a node in
> jackrabbit is a lightweight operation, which means only the moved
> node is re-indexed. all descendant nodes are kept untouched even
> though their path (handle) changed!
>
> regards
> marcel
>
> > 3. The query has to be built based on a given handle, since I want to
> > search for the semantically closest nodes.
> > 4. The query is built the same way the Analyzer works: the handle is
> > split into all of its parent handles.
> > Result: a 100% match can only be produced for the same node (for all
> > other nodes at least the node's own handle is missing). The
> > "semantically" closer a node is, the more handles will match, which
> > results in the ordering I intended. Et voilà, we have all we need to
> > search for semantically close pages in a proper sorting order.
> >
> > I might have a gap in my conclusions that I haven't noticed yet. I'd
> > love to have some feedback and would appreciate some help to get started
> > with the mentioned proof of concept.
> >
> > WDYT?
> >
> > Best regards,
> > Dominik
> >
> > [1] http://dev.day.com/microsling/content/blogs/main/cq5tags.html
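
For illustration, steps 1 to 4 of the quoted proposal could be sketched roughly as follows in plain Java. This is only a sketch under assumptions, not Jackrabbit or Lucene code: the class `HandleTokens` and its methods are hypothetical names, the root handle "/" is left out of the tokens (every node would share it), and the number of shared tokens stands in for whatever score the real index would compute. It also ignores the move problem Marcel describes, since it always works on the current handle.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jackrabbit or Lucene.
final class HandleTokens {

    // Step 2: "/a/b/c" becomes ["/a/b/c", "/a/b", "/a"].
    // Assumption: the root "/" is not emitted as a token.
    public static List<String> ancestors(String handle) {
        List<String> tokens = new ArrayList<>();
        String current = handle;
        while (current.length() > 1) {
            tokens.add(current);
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // next step would be the root, so stop here
            }
            current = current.substring(0, slash);
        }
        return tokens;
    }

    // Step 1: tokens for a node come from its own handle
    // and from every referenced tag handle.
    public static Set<String> tokensFor(String ownHandle, List<String> tagHandles) {
        Set<String> tokens = new LinkedHashSet<>(ancestors(ownHandle));
        for (String tag : tagHandles) {
            tokens.addAll(ancestors(tag));
        }
        return tokens;
    }

    // Steps 3 and 4: the query uses the same tokens as the index;
    // here the "score" is simply the number of tokens both sides share.
    public static int sharedTokens(Set<String> query, Set<String> indexed) {
        int count = 0;
        for (String token : query) {
            if (indexed.contains(token)) {
                count++;
            }
        }
        return count;
    }
}
```

With made-up handles, a node /content/site/en/products/a tagged /etc/tags/color/red shares more tokens with a sibling /content/site/en/products/b tagged /etc/tags/color/blue than with /content/site/de/news tagged /etc/tags/topic, and only the node itself matches every token.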
