On 1/19/06, Mathias Lux <[EMAIL PROTECTED]> wrote:
>
>
>
> > -----Ursprüngliche Nachricht-----
> > Von: xing jiang [mailto:[EMAIL PROTECTED]
> > Gesendet: Donnerstag, 19. Jänner 2006 13:11
> > An: java-user@lucene.apache.org
> > Betreff: Re: Use the lucene for searching in the Semantic Web.
> >
> > Hi,
> >
> > I am not sure whether my understanding is correct.
> >
> > In your application, A concept "document" first should be
> > defined as a class
> > in the ontology? Then, each document is an instance of this
> > class. It uses
> > its contents as its features. Also, the related concepts will
> > be added into
> > the feature vector.
>
> Yes, thats it in general. You decide which classes are the ones to index
> and select all instances from this class or ist subclasses.
>
> > I think besides how to select the features, another problem
> > is how to define
> > the similarity measure. Given a query submitted. How do you define the
> > similarity between the query and the result? One document is
> > featured by its
> > keywords and the ontological annotations.
> the similarity measure is term based, tf*idf weighted in ist simple form.
> Further enhancement would be a "weighting" of nodes e.g. based on
> information content (see e.g. Rodriguez, M.A. & Egenhofer, M.J. (2003)),
> where a test corpus helps to weight the importance of nodes based on their
> labels. But this is just a direction, not tested yet.


Actually, my problem is that, for instance, for a document d, Its feature
vector may be keywords and concepts. I don't know how to weight the two
items. Right now, i used a stupid method, given a document d, i can obtain a
rank D based on keyword method. Also, it is annotated with a concept c (The
most simple example) . People can have a rank  C of these concepts in the
domain ontology, where the most relevant concepts should be the at top of
this concept list. Finally, document's rank is decided by the sum of (C +
D).


To introduce path based similarity using lucene I'm afraid is in my opinion
> impossible :) What someone - if not me - could try is to use the structural
> context of a node instead of the textual context based on paths as I've done
> with MPEG-7. This should be quite easy as RDF shares most characteristics
> with MPEG-7 semantic graphs, having e.g. unique node labels (URIs per
> definitionem in RDF), a limited set of possible relations (limited by the
> number of nodes in RDF, but that should do also) and so on.
>
> - mathias
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


--
Regards

Jiang Xing

Reply via email to