> -----Original Message-----
> From: xing jiang [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, 19 January 2006 13:11
> To: java-user@lucene.apache.org
> Subject: Re: Use the lucene for searching in the Semantic Web.
> 
> Hi,
> 
> I am not sure whether my understanding is correct.
> 
> In your application, a concept "document" should first be defined as a
> class in the ontology? Then each document is an instance of this class.
> It uses its contents as its features. Also, the related concepts will be
> added to the feature vector.

Yes, that's it in general. You decide which classes are the ones to index and 
select all instances of each such class or its subclasses. 
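
A minimal sketch of that selection step, assuming a toy in-memory ontology. All names here (SUBCLASS_OF, INSTANCE_OF, CONTENT, RELATED, LABEL and the sample data) are hypothetical and only illustrate the idea; a real setup would read them from an RDF store and hand the resulting text to the indexer.

```python
# Toy ontology; every name and value below is hypothetical.
SUBCLASS_OF = {"Report": "Document", "Memo": "Document", "Person": "Agent"}
INSTANCE_OF = {"doc1": "Report", "doc2": "Memo", "alice": "Person"}
CONTENT = {"doc1": "lucene index tuning", "doc2": "meeting notes"}
RELATED = {"doc1": ["alice"], "doc2": []}
LABEL = {"alice": "Alice Smith"}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it via SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def instances_of(ancestor):
    """All instances whose class is ancestor or one of its subclasses."""
    return sorted(i for i, c in INSTANCE_OF.items() if is_subclass(c, ancestor))


def feature_text(instance):
    """The instance's own content plus the labels of directly related nodes."""
    parts = [CONTENT.get(instance, "")]
    parts += [LABEL.get(r, r) for r in RELATED.get(instance, [])]
    return " ".join(p for p in parts if p)
```

Calling instances_of("Document") picks up doc1 and doc2 through their subclasses, and feature_text("doc1") appends the label of the related node to the document's own text.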

> I think besides how to select the features, another problem is how to
> define the similarity measure. Given a submitted query, how do you define
> the similarity between the query and the result? One document is
> characterized by its keywords and the ontological annotations.

The similarity measure is term-based, tf*idf weighted in its simple form. 
A further enhancement would be a "weighting" of nodes, e.g. based on information 
content (see e.g. Rodriguez, M.A. & Egenhofer, M.J. (2003)), where a test 
corpus helps to weight the importance of nodes based on their labels. But this 
is just a direction, not tested yet. 
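
To make "term-based, tf*idf weighted" concrete, here is a rough sketch of the textbook form with cosine similarity. It is not Lucene's actual scoring formula, which adds its own normalization and boost factors; the function names and the smoothed idf (log(N/df) + 1) are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({term: tf*idf} per document, idf table) for a list of strings."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Score every document against the query, best first, as (score, index)."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    q = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)),
                  reverse=True)
```

Because the labels of related concepts are part of each document's text, a query for a concept label matches the documents annotated with it without any extra machinery.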

I'm afraid that introducing path-based similarity with Lucene is, in my opinion, 
impossible :) What someone (if not me) could try is to use the structural 
context of a node, based on paths, instead of the textual context, as I've done 
with MPEG-7. This should be quite easy, as RDF shares most characteristics with 
MPEG-7 semantic graphs, e.g. unique node labels (URIs, by definition, 
in RDF), a limited set of possible relations (limited by the number of nodes in 
RDF, but that should also do) and so on.
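
One possible reading of "structural context based on paths", sketched under my own assumptions and not taken from the MPEG-7 work: enumerate the outgoing paths of a node up to a fixed length and serialize each path as a single token, so that a term-based index can treat paths like ordinary terms. The triples, the "|" separator and the function name path_terms are all hypothetical.

```python
# Hypothetical RDF-like triples (subject, predicate, object).
TRIPLES = [
    ("ex:doc1", "ex:author", "ex:alice"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:doc1", "ex:topic", "ex:search"),
]


def path_terms(node, triples, max_len=2):
    """Return outgoing paths of up to max_len edges, each as one string token."""
    # Adjacency list: subject -> [(predicate, object), ...]
    out = {}
    for s, p, o in triples:
        out.setdefault(s, []).append((p, o))

    terms = []
    frontier = [(node, node, 0)]  # (current node, path so far, edges used)
    while frontier:
        current, path, depth = frontier.pop()
        if depth == max_len:
            continue  # the length bound also keeps cycles from looping forever
        for p, o in out.get(current, []):
            new_path = f"{path}|{p}|{o}"
            terms.append(new_path)
            frontier.append((o, new_path, depth + 1))
    return sorted(terms)
```

For ex:doc1 with max_len=2 this yields three tokens: the two direct edges and the two-step path via ex:alice to ex:acme. Two nodes then count as similar when they share such tokens, which fits the term-based measure above.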

- mathias
