Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files.
I'm indexing TEI P4 XML files using Lucene 2.x. Currently, each TEI XML file corresponds to a Lucene document. I extract the data from each XML file using XPath expressions e.g. for the body text: "/TEI.2/text//p". I also extract and store various meta data e.g. author, title, publishing data etc. per document. The issue is that TEI documents can be very large and contain several chapters. Ideally, search terms would return references to chapter(s) in which the terms were found. The user would then follow a hyperlink to a particular subsection rather than retrieving the entire file. I think it is possible to transform TEI files into chapterised sections using XSLT although I have not managed this yet. The final system is likely to use Apache Cocoon to present documents in various formats but that is a separate issue. I'm tending towards a solution involving indexing each section as a document (possibly with only the front-matter being associated with the meta data e.g. title) and then maybe using XPointer to associate the source document. Any comments/approaches taken to similar issues appreciated. Thanks, Aodh Ó Lionáird. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]