I've solved the problem, thanks to tips from Mark Miller and Ard Schrijvers, and am recording the fix here so that someone else searching the archives might get some benefit.
A while ago I had been working on a case-sensitive version of Lucene in which a prefix symbol indicates whether you are interested in the token as-is or in the case-insensitive version. (For instance, 'LET' is a real acronym, whereas 'let' is a stop word.) I did this by writing a filter that injects two tokens for every one.

The problem was that token.setPositionIncrement(0) wasn't being called on the duplicate token. As a result, it appeared that there were intermediate tokens between the real ones, when each original and its duplicate should have occupied the same logical position. Adding that one call made phrase searches work perfectly, even in light of all the modified functionality I've added. Gotta love Lucene!

A helpful debugging tool would have been for Luke to dump the token positions. Clearly it can reconstruct a document in this manner, but it would have been extra nice to see the numerical positions the tokens reside at, not just the sequence of tokens.

Hope this helps someone else in the future,
-Walt Stoneburner, http://www.wwco.com/~wls/blog/
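
P.S. For anyone who wants to see the effect in numbers, here is a minimal sketch in plain Java. It deliberately does not use Lucene's classes, so it is a model of the idea rather than my actual filter: Tok, expand and positions are hypothetical names made up for illustration, the prefix symbol is left out, and the sketch assumes a token's position is the running sum of position increments with the first token landing at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PositionDemo {
    // Hypothetical stand-in for a token: its text plus a position increment.
    public static final class Tok {
        public final String text;
        public final int posInc;
        public Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Emits two tokens per word: the word as-is, then a lower-cased duplicate.
    // When stackDuplicate is true the duplicate gets increment 0, which is the
    // effect of calling setPositionIncrement(0) on it; otherwise it keeps 1.
    public static List<Tok> expand(String[] words, boolean stackDuplicate) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            out.add(new Tok(w.toLowerCase(Locale.ROOT), stackDuplicate ? 0 : 1));
        }
        return out;
    }

    // Absolute positions as a running sum of increments (assumed to start at
    // -1 so that a first token with increment 1 lands at position 0).
    public static List<Integer> positions(List<Tok> toks) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Tok t : toks) {
            p += t.posInc;
            pos.add(p);
        }
        return pos;
    }

    public static void main(String[] args) {
        String[] words = {"LET", "it", "be"};
        for (boolean fix : new boolean[] {false, true}) {
            List<Tok> toks = expand(words, fix);
            List<Integer> pos = positions(toks);
            System.out.println(fix ? "duplicate increment 0:" : "duplicate increment 1:");
            for (int i = 0; i < toks.size(); i++) {
                System.out.println("  " + pos.get(i) + "  " + toks.get(i).text);
            }
        }
    }
}
```

With an increment of 1 on the duplicates, the six tokens land at positions 0 through 5, so every original word is separated from the next one by an extra slot. With an increment of 0, they land at 0, 0, 1, 1, 2, 2, and each duplicate shares a position with its original. A dump like this is roughly what I was wishing Luke could show.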