On 19/05/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
i assume when you say this... : 1. I need to temporarilly index sets of documents on the Fly say 100 at a : Time. you mean that you'll have lots of temporary indexes of a few hundrad documents and then you'll do a bunch of queries and throw the index away. Even if i'm wrong most of the rest of my advice will wtill be usefull, but its' good to clarify.
Correct I will throw them away! : My problem is that for these queries I need to know which Documents hit. I
: also need to know which terms hit and if possible : the location of the hits for each term in the hit Document. knowing which docs match your is easy. knowing where in a document a particular term matches can be done using the TermPositions APIs ... but it does you that info as a number of "terms" which for HTML content may be confusing depending on how your analyzer deals with that HTML.
Okay based on your answer and a little testing just to see what it gives me - I assume Lucene only stores the Term Offset (which is Analyser Dependent) and not the Actual Offset as retrieved from the Plain Text Stream for the Term. if you have complex boolean queries and you need to know which individual
pat of the query matched that's not really trivial. you didn't mention anything about "score" or "relevancy" in your email, so i'm guessing all you care about is boolean "did it match or not" logic .. in that case using Filters directly (without ever searching) is your friend. You can build a Filter for each individual clause, intersect/union the bitsets to get the final set of matching documents for your whole query, but inspect the individual bitsets to know he specifics about which ones match which documents.
Score/Relavence is not Important. I need the Yes/No logic with the what caused the Match Info. Could you mayby explain the intersect/union the bitsets and the interogating to know what matched? some people don't like Filters because of how much space they take up for
really large indexes, but if you've only got 100 docs ... there's no reason not to use them
Nope will never have any really large Indexes here 100 to 200 docs at the most. -Hoss
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] Thanx for the Relpy much appreciated.