On 1/19/06, Mathias Lux <[EMAIL PROTECTED]> wrote:
>
> > -----Original Message-----
> > From: xing jiang [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, 19 January 2006 13:11
> > To: java-user@lucene.apache.org
> > Subject: Re: Use the lucene for searching in the Semantic Web.
> >
> > Hi,
> >
> > I am not sure whether my understanding is correct.
> >
> > In your application, a concept "document" should first be defined as a
> > class in the ontology? Then each document is an instance of this class.
> > It uses its contents as its features. The related concepts are also
> > added to the feature vector.
>
> Yes, that's it in general. You decide which classes are the ones to index
> and select all instances of this class or its subclasses.
>
> > I think that besides how to select the features, another problem is how
> > to define the similarity measure. Given a submitted query, how do you
> > define the similarity between the query and the result? A document is
> > characterized by its keywords and the ontological annotations.
>
> The similarity measure is term based, tf*idf weighted in its simple form.
> A further enhancement would be a "weighting" of nodes, e.g. based on
> information content (see e.g. Rodriguez, M.A. & Egenhofer, M.J. (2003)),
> where a test corpus helps to weight the importance of nodes based on their
> labels. But this is just a direction, not tested yet.

Actually, my problem is this: for a document d, the feature vector may contain both keywords and concepts, and I don't know how to weight the two kinds of items. Right now I use a very naive method. Given a document d, I obtain a rank D from the keyword method. The document is also annotated with a concept c (the simplest case). The concepts in the domain ontology can be ranked as well, giving a rank C, with the most relevant concepts at the top of the concept list. Finally, the document's rank is determined by the sum C + D.

> Introducing path-based similarity using Lucene is, I'm afraid, in my
> opinion impossible :) What someone - if not me - could try is to use the
> structural context of a node instead of the textual context based on
> paths, as I've done with MPEG-7. This should be quite easy, as RDF shares
> most characteristics with MPEG-7 semantic graphs, having e.g. unique node
> labels (URIs by definition in RDF), a limited set of possible relations
> (limited by the number of nodes in RDF, but that should do also) and so on.
>
> - mathias

--
Regards
Jiang Xing
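
For illustration only, the rank-sum scheme described in the reply above (keyword rank D plus concept rank C) could be sketched roughly as follows. This is a minimal sketch in plain Python, not Lucene code. The function name `combined_ranking`, its parameters, the handling of documents without a ranked concept, and the tie-breaking rule are all assumptions that the thread does not specify.

```python
def combined_ranking(keyword_ranked_docs, concept_ranked, doc_concept):
    """Order documents by the sum D + C of two 1-based rank positions.

    keyword_ranked_docs: document ids, best keyword match first (gives D).
    concept_ranked: concept ids, most relevant concept first (gives C).
    doc_concept: maps a document id to the single concept annotating it.
    """
    # Rank position 1 is the most relevant entry in each list.
    doc_rank = {doc: i for i, doc in enumerate(keyword_ranked_docs, start=1)}
    concept_rank = {c: i for i, c in enumerate(concept_ranked, start=1)}

    # Assumption: a document with no ranked concept gets a rank one
    # position worse than the last concept in the list.
    worst = len(concept_ranked) + 1

    totals = {
        doc: d + concept_rank.get(doc_concept.get(doc), worst)
        for doc, d in doc_rank.items()
    }

    # A lower sum means a better final position.
    # Assumption: ties are broken by the keyword rank D.
    return sorted(totals, key=lambda doc: (totals[doc], doc_rank[doc]))


if __name__ == "__main__":
    docs = ["d1", "d2", "d3"]           # keyword ranking, best first
    concepts = ["c_a", "c_b", "c_c"]    # concept ranking, best first
    annotations = {"d1": "c_c", "d2": "c_a", "d3": "c_b"}
    print(combined_ranking(docs, concepts, annotations))
```

Because the two ranks are simply added, keywords and concepts count equally here. The open question in the message, how to weight one against the other, is left open by this sketch.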