As a log4j developer, I've been toying with the idea of what Lucene could do for me, maybe as an excuse to play around with Lucene.

I've started creating a LoggingEvent->Document converter, and was thinking through how I'd like this utility to work when I came across a question I wasn't sure how to answer.

When scanning/searching through logging events, one is usually looking for a particular matching event, which Lucene does excellently, but what a person usually also needs is the context around that matching event.

With grep, one can use the "-C<contextSize>" argument to show that many lines on either side of the matching entry. I'd like to be able to do the same thing with Lucene.

Now, I could add a Field holding a sequence # to the LoggingEvent Document and, once a user has chosen an appropriate matching event, do another search for the documents whose sequence # lies within +/- the context size of that event's sequence #.
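
To make the second search concrete, here is a minimal sketch of just the window arithmetic, with no Lucene calls. The class name ContextWindow and its fields are hypothetical, and it assumes sequence numbers are non-negative and start at 0.

```java
// Hypothetical helper (not part of Lucene or log4j): the inclusive range of
// sequence numbers to fetch around a matching event, like grep's -C option.
final class ContextWindow {
    final long from; // lowest sequence # to include
    final long to;   // highest sequence # to include

    ContextWindow(long matchSeq, int contextSize) {
        if (matchSeq < 0 || contextSize < 0) {
            throw new IllegalArgumentException("matchSeq and contextSize must be non-negative");
        }
        // Clamp at 0 so a match near the start of the log does not go negative.
        this.from = Math.max(0L, matchSeq - contextSize);
        this.to = matchSeq + contextSize;
    }
}
```

For a match at sequence # 100 with a context size of 5, this gives the range 95 to 105, which would then become the lower and upper bounds of the follow-up range search.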

My question is, is that going to be an efficient way to do this? The sequence # would be treated as text, wouldn't it? Would the range search on an int be the most efficient way to do this?
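
If the sequence # does end up indexed as text, a range over it would be compared character by character, so "10" would sort before "9". One common workaround is to zero-pad the number to a fixed width before indexing it. The sketch below shows only that encoding; SequenceKey and encode are hypothetical names, and the width of 19 digits is an assumption chosen to fit any non-negative long.

```java
// Hypothetical helper (not a Lucene API): encodes a sequence # as fixed-width
// text so that lexicographic order matches numeric order.
final class SequenceKey {
    static String encode(long seq) {
        if (seq < 0) {
            throw new IllegalArgumentException("sequence # must be non-negative");
        }
        // 19 digits covers Long.MAX_VALUE; Locale.ROOT keeps the digits ASCII.
        return String.format(java.util.Locale.ROOT, "%019d", seq);
    }
}
```

With this encoding, encode(9) sorts before encode(10), so the same string would be used both when indexing the field and when building the lower and upper bounds of the range search.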

I know from the Hits documentation that one can retrieve the Document ID of a matching entry. What is the contract on this Document ID? Is each Document added to the Index given an increasing number? Can one search an index by Document ID? Could one search for Document IDs within a range? (Hope you can see where I'm going here).

If you have any other recommendations about "Context" searching I would appreciate any thoughts.

Many thanks for an excellent API, and kudos to Erik & Otis for a great eBook btw.

regards,

Paul Smith