sounds easy (I said sounds :), e.g. your Statement becomes Document in Lucene lingo, you make it with 3-4 Lucene fields, CONTENT (Tokenized, not stored) OFFSET(not indexed, stored) - offset in file of the first byte of your statement DOC_LENGTH(not indexed, stored) - if you have no END-OF-Statement to identify end of it PAGE_COUNT(not indexed, stored) - you have to know it before add(Document), but that is your domain. Lucene has no idea what your "pages" are
you can search on CONTENT and than fetch OFFSET/LENGTH for hits, so you can easily identify your statement as an optimization, you could pack offset/length/page count to one binary field... but that would be to early :) have fun and be surprised how fast this can be e. ----- Original Message ---- From: Brad Harper <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, 13 June, 2007 9:39:02 PM Subject: Investigating Lucene's Applicability to [Unusual?] Use Case Hello: [This is not an intentional cross-posting. I posted this same question to the 'general' lucene list; replies there suggested that I'd have better/quicker responses using this list instead.] I'm investigating Lucene as a replacement for a special-purpose search technology that was developed long before Lucene (or any of the current IR libraries) became available. The use case involves so-called print streams. Imagine 20,000 statements concatenated into one large file suitable for delivery to a print system. The document formats vary, but include AFP (an IBM printer format), PCL (an HP format), Postscript, PDF, and even "plain-text". The indexing application must track the total page count of the embedded statements. On a hit, the search application must extract and return the [possibly multi-page] statement embedded within the larger print-stream file. How would the search application know (be informed by the Lucene/indexer) the extent of the internal document(s)? I'm not seeing this scenario discussed in forums or books. Does anyone have comments or thoughts on Lucene's applicability as a solution? Thanks. Brad -- View this message in context: http://www.nabble.com/Investigating-Lucene%27s-Applicability-to--Unusual---Use-Case-tf3917320.html#a11107323 Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] ___________________________________________________________ Yahoo! Answers - Got a question? Someone out there knows the answer. Try it now. http://uk.answers.yahoo.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]