It's for both, ontology + contents (Word, PDF, PPT, always the same candidates). The main disadvantage of this approach is that the "main" nodes in the ontology have to be defined.
Imagine the following use case: An ontology describes a company's content and knowledge management system. Persons, hierarchies, projects, documents and concepts (project manager, author, requirements document and so on) are within this ontology.

If you, for instance, choose to index all documents, projects and persons, you identify all nodes that represent the documents, projects and persons. For persons you can then index the name (given name, surname, title and so on), department, age, ... For documents you can take the contents (from the document file itself), all metadata, the author's name, the category and type and so on. For projects you can extract from the ontology e.g. the description, name, participants, ...

In the Lucene index there are then only 3 types of nodes indexed. The main point in this is: How do you define the features of a node (in this use case document, person and project), and which neighbouring literals describe the concept of the ontology best? The selection of concepts / classes / node types (whatever :) depends on the use case.

Hope this helps a bit,
Mathias

> -----Original Message-----
> From: xing jiang [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 19 January 2006 12:14
> To: java-user@lucene.apache.org
> Subject: Re: Use the lucene for searching in the Semantic Web.
>
> Hi Mathias,
>
> Can you give more details? Is your application for text + ontology, or
> ontology only?
>
> regards
>
> jiang xing
>
> On 1/19/06, Mathias Lux <[EMAIL PROTECTED]> wrote:
> >
> > Hi!
> >
> > (1) I'm working on a similar problem, but based on MPEG-7 Semantic
> > Description Graphs. I already have a prototype for path-based matching
> > within Lucene integrated in my sf project Caliph & Emir
> > (http://caliph-emir.sf.net). I've already adapted the approach to an
> > ontology, which had to be searched.
> >
> > My approach works roughly like this:
> > * index all paths up to a certain length in a graph as strings in
> >   Lucene
> > * index all node descriptions in another index
> > * within the query graph, nodes are Lucene queries -> query expansion to
> >   node ids based on the node index
> > * search for all expanded query graphs and merge results.
> >
> > Unfortunately I haven't had time yet to do a full evaluation, but
> > preliminary results are promising.
> >
> > The evaluation and a more comprehensive description of the approach can
> > be found in the proceedings of the TIR 05 (Text Information Retrieval
> > Workshop 2005 in Koblenz, Germany):
> > http://www-ai.upb.de/aisearch/tir-05/proceedings/lux05-fast-and-simple-path-index-based-retrieval.pdf
> >
> > The prototype is available @ http://caliph-emir.sf.net.
> >
> > I'm open to comments and ideas on the approach as it is part of my PhD
> > and I'm working on a method without query expansion :-)
> >
> > (2) A second thing is the feature-based retrieval of nodes within an
> > ontology, which allows really fast indexing and retrieval as no
> > path walking takes place.
> > Works like this:
> > * nodes being the documents / entities being searched for in the
> >   ontology are extracted
> > * surrounding nodes / literals are used as characteristic features
> > * with some heuristics and some runtime configuration, classifications,
> >   text & keyword fields are separated
> > * retrieval is purely based on text and keywords, the same with
> >   similarity search
> > * additional clustering is done on snippets from search results.
> > I already have a prototype running with this approach, but no evaluation
> > yet, sorry! For more information on this one please contact me. A
> > publication on this is currently in review, so I cannot give a link here
> > ;(
> >
> > References:
> > - Rodriguez, M.A. & Egenhofer, M.J.
> >   (2003), 'Determining Semantic
> >   Similarity among Entity Classes from Different Ontologies', IEEE
> >   Transactions on Knowledge and Data Engineering 15(2), 442--456.
> > - Varelas, G.; Voutsakis, E.; Raftopoulou, P.; Petrakis, E.G. & Milios,
> >   E.E. (2005), 'Semantic similarity methods in WordNet and their
> >   application to information retrieval on the web', in 'WIDM '05:
> >   Proceedings of the 7th annual ACM international workshop on Web
> >   information and data management', ACM Press, New York, NY, USA, pp.
> >   10--16.
> >
> > regards,
> > Mathias
> >
> > ============================================================
> > DI Mathias Lux
> > Know-Center & Graz University of Technology, Austria
> > Institute for Knowledge Management (IWM)
> > Inffeldgasse 21a, 8010 Graz, Austria
> > Email : [EMAIL PROTECTED]
> > Tel: +43 316 873 9274 Fax: +43 316 873 9252
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> --
> Regards
>
> Jiang Xing
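
To make the document / person / project use case from the top of this mail a bit more concrete, here is a rough sketch of the feature extraction step in plain Java. It is only an illustration under assumptions, not the code of the prototype: the class FeatureExtractionSketch, the Triple type, the predicate names "type" and "name", and the one-hop resolution of node ids to names are all made up for this example. No Lucene calls are used; each resulting field map is what one would then hand to Lucene as one document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and structure are assumptions, not the prototype's code.
class FeatureExtractionSketch {

    // One ontology statement: subject --predicate--> object (node id or literal).
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Builds one flat field map (field name -> values) per node of a "main" type.
    static Map<String, Map<String, List<String>>> extract(List<Triple> triples,
                                                          Set<String> mainTypes) {
        // Remember the "name" literal of every node so that references to
        // other nodes (e.g. a document's author) can be indexed as text.
        Map<String, String> names = new HashMap<>();
        for (Triple t : triples) {
            if (t.predicate.equals("name")) names.put(t.subject, t.object);
        }
        // Pass 1: select the nodes whose type is one of the chosen main types.
        Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (t.predicate.equals("type") && mainTypes.contains(t.object)) {
                docs.computeIfAbsent(t.subject, k -> new LinkedHashMap<>())
                    .computeIfAbsent("type", k -> new ArrayList<>()).add(t.object);
            }
        }
        // Pass 2: every other statement about a main node becomes a field value.
        for (Triple t : triples) {
            Map<String, List<String>> fields = docs.get(t.subject);
            if (fields != null && !t.predicate.equals("type")) {
                String value = names.getOrDefault(t.object, t.object);
                fields.computeIfAbsent(t.predicate, k -> new ArrayList<>()).add(value);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Triple> triples = Arrays.asList(
            new Triple("p1", "type", "Person"),
            new Triple("p1", "name", "Alice"),
            new Triple("d1", "type", "Document"),
            new Triple("d1", "author", "p1"),
            new Triple("c1", "type", "Concept"));
        Set<String> mainTypes = new HashSet<>(Arrays.asList("Person", "Document"));
        // Only p1 and d1 are kept; c1 is not a main node in this configuration.
        System.out.println(extract(triples, mainTypes));
    }
}
```

Which types go into mainTypes and which predicates count as features is exactly the use-case dependent choice discussed above; the sketch simply takes every direct neighbour.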
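
The first step of approach (1) in the quoted mail, "index all paths up to a certain length in a graph as strings", could look roughly like the following. This is a minimal sketch under assumptions and not taken from Caliph & Emir or the TIR 05 paper: the Edge type, the path string format ("node label node ...") and the rule that a path never revisits a node are my own choices for illustration. Each returned string would then be added to the Lucene path index.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and path format are assumptions, not the prototype's code.
class PathIndexSketch {

    // A directed, labelled edge of the semantic graph.
    static final class Edge {
        final String from, label, to;
        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Returns every path with 1..maxLength edges, rendered as one string each.
    static List<String> pathsUpTo(List<Edge> edges, int maxLength) {
        Map<String, List<Edge>> outgoing = new LinkedHashMap<>();
        for (Edge e : edges) {
            outgoing.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
        }
        List<String> paths = new ArrayList<>();
        for (String start : outgoing.keySet()) {
            Deque<String> visited = new ArrayDeque<>();
            visited.push(start);
            extend(start, start, maxLength, outgoing, visited, paths);
        }
        return paths;
    }

    // Depth-first extension of the current path by one edge at a time.
    private static void extend(String node, String prefix, int remaining,
                               Map<String, List<Edge>> outgoing,
                               Deque<String> visited, List<String> out) {
        if (remaining == 0) return;
        for (Edge e : outgoing.getOrDefault(node, Collections.<Edge>emptyList())) {
            if (visited.contains(e.to)) continue; // do not walk in circles
            String path = prefix + " " + e.label + " " + e.to;
            out.add(path);
            visited.push(e.to);
            extend(e.to, path, remaining - 1, outgoing, visited, out);
            visited.pop();
        }
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge("a", "knows", "b"),
            new Edge("b", "wrote", "c"));
        for (String p : pathsUpTo(edges, 2)) {
            System.out.println(p);
        }
    }
}
```

The number of paths grows quickly with maxLength, which is presumably why the paths are only indexed "up to a certain length".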