[ 
https://issues.apache.org/jira/browse/LUCENE-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130661#comment-14130661
 ] 

suleman mubarik edited comment on LUCENE-5943 at 9/11/14 8:33 PM:
------------------------------------------------------------------

Here is another example.
If the input is "I love <pizza  hut>",
then I get the tokens "i", "love", "pizza", "hut" with offsets (0,1), (2,6), (7,11), 
(12,14).
If HTMLStripCharFilter removes the text between angle brackets, then I should get 
"i", "love" and not "i", "love", "pizza", "hut".

Here is another example: "I love <html>"
The tokens I get are "i", "love", "html".
I am on Lucene 4.8.


was (Author: sulemanmubarik):
Here is another example.
If the input is "I love <pizza  hut>",
then I get the tokens "i", "love", "pizza", "hut" with offsets (0,1), (2,6), (7,11), 
(12,14).
If HTMLStripCharFilter removes the text between angle brackets, then I should get 
"i", "love" and not "i", "love", "pizza", "hut".
I am on Lucene 4.8.

> HTML strip filter removes text between < and >
> ----------------------------------------------
>
>                 Key: LUCENE-5943
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5943
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>         Environment: Production
>            Reporter: suleman mubarik
>
> If I have this as input: “I love <pizza  hut> so much”
> When I apply the HTML strip filter, it removes “pizza  hut” and I get the tokens "i", 
> "love", "so", "much"
> with these offsets: (0,1), (2,6), (20,22), (23,27)
> The HTML strip filter should return "i", "love", "pizza", "hut", "so", "much"
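
As a sanity check on the offsets quoted in the description, here is a minimal plain-Java sketch. It is not Lucene code and does not reproduce HTMLStripCharFilter's actual rules: the class name OffsetSketch and the method tokenize are hypothetical, and the sketch simply assumes that every <...> span is skipped, that tokens are split on whitespace and lowercased, and that offsets refer to positions in the original, unstripped input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in, NOT Lucene code. Assumption: every <...> span is
// skipped unconditionally; the real HTMLStripCharFilter applies more rules.
class OffsetSketch {

    // Returns entries such as "love(2,6)": lowercased term plus its
    // start/end offsets in the ORIGINAL input (end is exclusive).
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean inTag = false; // true from '<' up to and including '>'
        int start = -1;        // start offset of the current token, -1 if none

        for (int i = 0; i <= input.length(); i++) {
            // A trailing sentinel space flushes the last token.
            char c = i < input.length() ? input.charAt(i) : ' ';
            if (c == '<') {
                inTag = true;
            }
            boolean skip = inTag || Character.isWhitespace(c);
            if (skip) {
                if (start >= 0) {
                    String term = input.substring(start, i).toLowerCase(Locale.ROOT);
                    tokens.add(term + "(" + start + "," + i + ")");
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
            if (c == '>') {
                inTag = false;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I love <pizza  hut> so much"));
    }
}
```

Under those assumptions, "so" and "much" keep their positions in the original string, which matches the offsets (20,22) and (23,27) reported in the description.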



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
