> -----Ursprüngliche Nachricht----- > Von: xing jiang [mailto:[EMAIL PROTECTED] > Gesendet: Donnerstag, 19. Jänner 2006 13:11 > An: java-user@lucene.apache.org > Betreff: Re: Use the lucene for searching in the Semantic Web. > > Hi, > > I am not sure whether my understanding is correct. > > In your application, A concept "document" first should be > defined as a class > in the ontology? Then, each document is an instance of this > class. It uses > its contents as its features. Also, the related concepts will > be added into > the feature vector.
Yes, thats it in general. You decide which classes are the ones to index and select all instances from this class or ist subclasses. > I think besides how to select the features, another problem > is how to define > the similarity measure. Given a query submitted. How do you define the > similarity between the query and the result? One document is > featured by its > keywords and the ontological annotations. the similarity measure is term based, tf*idf weighted in ist simple form. Further enhancement would be a "weighting" of nodes e.g. based on information content (see e.g. Rodriguez, M.A. & Egenhofer, M.J. (2003)), where a test corpus helps to weight the importance of nodes based on their labels. But this is just a direction, not tested yet. To introduce path based similarity using lucene I'm afraid is in my opinion impossible :) What someone - if not me - could try is to use the structural context of a node instead of the textual context based on paths as I've done with MPEG-7. This should be quite easy as RDF shares most characteristics with MPEG-7 semantic graphs, having e.g. unique node labels (URIs per definitionem in RDF), a limited set of possible relations (limited by the number of nodes in RDF, but that should do also) and so on. - mathias --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]