[ https://issues.apache.org/jira/browse/LUCENE-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130661#comment-14130661 ]
suleman mubarik edited comment on LUCENE-5943 at 9/11/14 8:33 PM:
------------------------------------------------------------------

Here is another example. If the input is "I love <pizza hut>", then I get the tokens "i", "love", "pizza", "hut" with offsets (0,1), (2,6), (7,11), (12,14).
If HTMLStripCharFilter removes the text between angle brackets, then I should get "i", "love" and not "i", "love", "pizza", "hut".

Here is another example: for the input "I love <html>", the tokens I get are "i", "love", "html".

I am on Lucene 4.8.


was (Author: sulemanmubarik):
Here is another example. If the input is "I love <pizza hut>", then I get the tokens "i", "love", "pizza", "hut" with offsets (0,1), (2,6), (7,11), (12,14).
If HTMLStripCharFilter removes the text between angle brackets, then I should get "i", "love" and not "i", "love", "pizza", "hut".

I am on Lucene 4.8.

> HTML strip filter removes text between < and >
> ----------------------------------------------
>
>                 Key: LUCENE-5943
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5943
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>         Environment: Production
>            Reporter: suleman mubarik
>
> If I have this as input: "I love <pizza hut> so much"
> When I apply the HTML stripper, it removes "pizza hut" and I get the tokens "i",
> "love", "so", "much".
> These are the offsets I get back: (0,1), (2,6), (20,22), (23,27)
> The HTML strip filter should return "i", "love", "pizza", "hut", "so", "much".
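
For anyone trying to reason about the two behaviours described above, here is a minimal, stdlib-only sketch. It is NOT Lucene's HTMLStripCharFilter and does not call any Lucene API. The class name TagStripSketch and its tokenize method are hypothetical, and the rule "drop everything from '<' to the next '>'" is a simplifying assumption. It only shows how tokens keep offsets into the original input when a bracketed span is skipped, so the offsets it produces need not match the ones reported in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for "strip <...> spans, then tokenize"; not Lucene code.
final class TagStripSketch {

    /** A lowercased token with start/end offsets into the ORIGINAL input. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "(" + start + "," + end + ")";
        }
    }

    /**
     * Splits input on non-alphanumeric characters. When stripTags is true,
     * everything from '<' up to and including the next '>' is skipped
     * (assumption: any such span is treated as a tag).
     */
    static List<Token> tokenize(String input, boolean stripTags) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int tokenStart = -1;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (stripTags && c == '<') {
                int close = input.indexOf('>', i);
                if (close >= 0) {
                    // End any token in progress, then jump past the closing '>'.
                    flush(tokens, current, tokenStart, i);
                    i = close + 1;
                    continue;
                }
            }
            if (Character.isLetterOrDigit(c)) {
                if (current.length() == 0) {
                    tokenStart = i;
                }
                current.append(c);
            } else {
                flush(tokens, current, tokenStart, i);
            }
            i++;
        }
        flush(tokens, current, tokenStart, input.length());
        return tokens;
    }

    // Emits the pending token, if any, and resets the buffer.
    private static void flush(List<Token> out, StringBuilder buf, int start, int end) {
        if (buf.length() > 0) {
            out.add(new Token(buf.toString().toLowerCase(Locale.ROOT), start, end));
            buf.setLength(0);
        }
    }
}
```

Calling TagStripSketch.tokenize("I love <pizza hut> so much", true) drops "pizza" and "hut", which corresponds to the behaviour in the original description. Passing false keeps them, which corresponds to the behaviour described in the comment.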