Hi All!

 

Let's say I have a filter that produces new tokens based on the original ones.

How bad would it be if my filter set the start offset of each new token to 0
and the end offset to the length of that token?

An example (based on the phrase "How are you?"):

Original token:

[you?] (8,12)

New tokens:

[you] (0,3)

[?] (0,1)

 

Calculating the correct offsets wouldn't be too hard for left-to-right
languages, and it is a bit more challenging for right-to-left ones, but for
mixed text it is quite hard.
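
For reference, here is a rough sketch of the calculation I have in mind for the simple case. It assumes that offsets are plain indices into the text as it is stored (not as it is displayed) and that each new token is a substring of the original one, in order. The function name split_with_offsets and the tuple layout are made up for illustration; this is not any library's API.

```python
from typing import List, Tuple

# Hypothetical token representation: (text, start_offset, end_offset)
Token = Tuple[str, int, int]


def split_with_offsets(text: str, start: int, pieces: List[str]) -> List[Token]:
    """Give each piece offsets relative to the whole input, not to the token.

    `text` is the original token's text and `start` is its start offset.
    Assumes the pieces occur in `text` in the given order.
    """
    result: List[Token] = []
    cursor = 0  # position inside the original token where the search resumes
    for piece in pieces:
        idx = text.find(piece, cursor)
        if idx < 0:
            raise ValueError(f"{piece!r} not found in {text!r}")
        # Shift the position inside the token by the token's own start offset.
        result.append((piece, start + idx, start + idx + len(piece)))
        cursor = idx + len(piece)
    return result


# Original token [you?] (8,12) from the phrase "How are you?"
tokens = split_with_offsets("you?", 8, ["you", "?"])
print(tokens)  # [('you', 8, 11), ('?', 11, 12)]
```

With these numbers the new tokens would be [you] (8,11) and [?] (11,12) instead of (0,3) and (0,1).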

 

Thanks.
