It must be time to eat lunch, since the more I stare at this code, the less sense it makes to me. Which is a sure sign that I need a break <G>.
But a couple of things..... 1> my test cases throw some exceptions with the code as-is. The spans.get(0) is a problem in that it's not guaranteed that the spans returned will have anything in them. Also, I don't think that the test for reqSpans.get(0).next in queryClauses[i].isRequired is correct (even if it doesn't throw exceptions). Isn't the sense there that we want to include the spans if we *do* have entries?? 2> But more importantly, I think this throws things in the "span bucket" across documents. Consicer two documents with text "a b c d e f" is in one document, and "x y z" is in another, and we query on "a AND z", it seems like extractSpansFromTermQuery would return one span from each document, which would satisfy the tests in getSpansFromBooleanQuery inappropriately. Is it just me or is working with Spans really intended to be "one pass through and only forward"? There are several places in the SpansExtractor code where we want to ask "are there any spans in here?". But to ask that, you have to call next(). Which changes the state of the Spans such that you have to be really careful when you use any Spans that have had this test performed already and do a do..while (spans.next()); rather than a while ( spans.next()) {}..... Ditto with skipTo. I'm finally realizing that I need to write more custom stuff here than is probably useful for the community at large, since I only want to count spans for a single document. But this is a great start for me since it puts a bunch of the code in place for me and the rest should probably be just keeping some lists..... I'll let y'all know if I come up with anything really interesting.... Erick