Not sure how to write that subject line. I'm getting some weird behavior out of the highlighter in Solr. It seems like an edge case, but I'm curious to hear whether this is a known issue, or whether it's worth looking into further.
Background: I'm using Solr's highlighting facility to tag words found in content crawled via Nutch. I split up the content based on those tags, and the pieces are later fed into a moderation process.

Sample data (snippet from larger content):

    [url=\"http://www.sampleurl.com/baffle_prices.html\"]baffle[/url]

(My "hl.simple.pre" is set to "TEST_KEYWORD_START" and my "hl.simple.post" is set to "TEST_KEYWORD_END".)

When I query for "baffle", Solr highlights it like this:

    TEST_KEYWORD_STARTbaffle_prices.html\"]baffleTEST_KEYWORD_END

What should be happening is this:

    TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END_prices.html\"]TEST_KEYWORD_STARTbaffleTEST_KEYWORD_END

Is there something about this data that makes the highlighter not want to split it up? Do I have to have Solr tokenize the words on some character that I somehow excluded?

Thank you,
Scott Gonyea
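
P.S. To make the expected output concrete, here is a minimal Python sketch of the wrapping I have in mind. It is only an illustration of the desired result, not how Solr's highlighter works internally; the function name wrap_keyword and the "not surrounded by letters or digits" rule are my own assumptions about what should count as a separate word.

```python
import re

# Same markers as my hl.simple.pre / hl.simple.post settings.
PRE = "TEST_KEYWORD_START"
POST = "TEST_KEYWORD_END"


def wrap_keyword(text, keyword, pre=PRE, post=POST):
    """Wrap each standalone occurrence of keyword in pre/post markers.

    Assumption: an occurrence is "standalone" when it is not directly
    preceded or followed by an ASCII letter or digit, so "_" and "]"
    both act as word boundaries here.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(keyword) + r"(?![A-Za-z0-9])"
    )
    # A function is used as the replacement so that backslashes in the
    # markers or the match are never interpreted as escape sequences.
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


# The fragment from the sample data, including the literal backslash.
sample = 'baffle_prices.html\\"]baffle'
print(wrap_keyword(sample, "baffle"))
```

With the fragment above, each "baffle" is wrapped on its own, which is the result I was expecting from the highlighter.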