Re: Semantic distance search

Alexander Klimetschek Fri, 05 Jun 2009 12:24:42 -0700

No, the search will work, because the path information is not stored in the Lucene index - hence no reindex is needed upon a move - and path location steps are handled without the Lucene index.


Regards,
Alex


--
Alexander Klimetschek @iPhone


On 05.06.2009 at 11:58, Dominik Süß <[email protected]> wrote:

Hi Marcel,
doesn't that mean I can never be sure I'll get a proper result when searching for the path of a node?
Best regards,
Dominik
On Tue, Jun 2, 2009 at 1:43 PM, Marcel Reutegger <[email protected]> wrote:
Hi,

2009/5/21 Dominik Süß <[email protected]>:
> Hi everybody,
>
> after some time of indirect contact with JCR through Sling and Day
> CRX/CQ, I now think it's time to get in touch with Jackrabbit directly. As
> the subject says, I do this after having an idea which I'd like to share and
> need some help to realize (since my Lucene experience is close to nothing
> but pure usage & theory). I did try to start with a proof of concept, but as
> I looked into the current implementations of search in JCR I had to realize I
> need someone who could give me a jumpstart and take the first steps together
> with me. So here I go with my idea:
>
> I recently had some thoughts about something I'd call semantic distance in
> multidimensional hierarchies (content structures + hierarchical tagging like
> in CQ 5 [1]).
>
> The task I would like to fulfill: Find the semantically closest nodes for a
> given node.
>
> I postulate that structure represents the semantic relation, and likewise
> that the referenced tags are in a hierarchy that represents semantic relations.
> Furthermore I postulate that subnodes are semantically a subset of the "type" of
> the parent node (not thinking of JCR types but of semantic classifications).
> This leads to the following thesis: The distance to the closest shared
> parent node represents the unidirectional distance of a node to another node.
> The result is that a whole branch has the same distance to a node (which
> should be correct, since the subnode in the branch belongs to the parent node
> which connects the branches we have to look at).
>
> Figuring out a good way to produce an index for this really seems to
> be hard, so I rethought my assumptions and came up with the following way of
> determining the distance without indexing the explicit distance (I came up
> with this thought after reading a bit about Analyzers and stemming).
>
> 1. For indexing, all referenced tag handles and the node's own handle will be
> taken into account.
> 2. An analyzer produces string tokens out of each handle. Each handle will be
> split up into multiple handles by removing the last node until the root node is
> reached (so the node and every parent node is indexed for this node as well
> as for each referenced tag).

this will only work as long as you don't move nodes. moving a node in
jackrabbit is a light weight operation, which means only the moved
node is re-indexed. all descendant nodes are kept untouched even
though their path (handle) changed!

regards
 marcel
> 3. The query has to be built based on a given handle, since I want to search for
> the semantically closest nodes.
> 4. The query is built the same way as the Analyzer has to split the handle
> into all parent handles.
> Result: A 100% match can only be produced for the same node (for all other
> nodes at least the own handle of the node is missing). The "semantically"
> closer a node is, the more handles will match, which will result in an ordering
> as I intended. Et voilà, we have all we need to search for
> semantically close pages in a proper sorting order.
>
> I might have a gap in my conclusions but didn't realize it yet. I'd love to
> have some feedback and would appreciate some help to get started with the
> mentioned proof of concept.
>
> WDYT?
>
> Best regards,
> Dominik
>
> [1] http://dev.day.com/microsling/content/blogs/main/cq5tags.html
>
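
For readers following the thread, steps 1 to 4 of the quoted proposal can be sketched roughly as below. This is only an illustration in plain Python, not Jackrabbit or Lucene code: the function names and the idea of ranking by a raw count of shared handles are assumptions made for the sketch (a real Lucene query would apply its own scoring), and no index is involved.

```python
def ancestor_handles(handle):
    """Return the handle itself plus every ancestor handle, ending at the root.

    Example shape: "/a/b/c" yields "/a/b/c", "/a/b", "/a" and "/".
    """
    parts = [p for p in handle.split("/") if p]
    handles = ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    handles.append("/")
    return handles


def index_tokens(handle, tag_handles=()):
    """Steps 1 and 2: collect tokens from the node's own handle and from
    every referenced tag handle, each expanded to all of its ancestors."""
    tokens = set()
    for h in (handle, *tag_handles):
        tokens.update(ancestor_handles(h))
    return tokens


def rank_closest(query_handle, query_tags, nodes):
    """Steps 3 and 4: build the query tokens the same way as the index
    tokens, then order nodes by how many tokens they share with the query.

    `nodes` maps a node handle to the list of tag handles it references
    (a hypothetical in-memory stand-in for the indexed content).
    """
    query = index_tokens(query_handle, query_tags)
    scored = [(len(query & index_tokens(h, tags)), h) for h, tags in nodes.items()]
    # Highest overlap first; ties broken by handle for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored
```

With this sketch only the query node itself shares every token with the query, and nodes in a sibling branch or with sibling tags share progressively fewer, which is the ordering the proposal aims for. Marcel's caveat still applies: tokens derived from a node's path become stale for descendants once an ancestor is moved, because only the moved node is re-indexed.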
