uschindler commented on PR #15900:
URL: https://github.com/apache/lucene/pull/15900#issuecomment-4175906875
Here is the test for the above code:
```java
public void testTruncating() throws Exception {
  // Note: also truncates one more char when the final char would otherwise be half of a
  // surrogate pair
  TokenStream stream =
      whitespaceMockTokenizer(
          "abcdefg 1234567 ABCDEFG abcde abc 12345 123 1234😃5 😃12345 😃😃 "
              + "😃😃😃 😃😃😃😃 😃😃😃😃😃 😃😃😃😃😃😃");
  stream = new TruncateTokenFilter(stream, 5);
  assertTokenStreamContents(
      stream,
      new String[] {
        "abcde",
        "12345",
        "ABCDE",
        "abcde",
        "abc",
        "12345",
        "123",
        "1234😃",
        "😃1234",
        "😃😃",
        "😃😃😃",
        "😃😃😃😃",
        "😃😃😃😃😃",
        "😃😃😃😃😃"
      });
}
```
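The expected tokens above look consistent with a limit of 5 counted in code points rather than UTF-16 chars: "1234😃" is 6 chars but 5 code points. As a rough illustration of that behaviour on plain strings, here is a minimal standalone sketch. It is not the filter code from this PR; the class `TruncateSketch` and the method `truncateToCodePoints` are hypothetical names, and only standard `java.lang.String` methods are used.

```java
// Hypothetical helper for illustration only; not part of Lucene or this PR.
final class TruncateSketch {
  private TruncateSketch() {}

  /** Returns at most maxCodePoints code points of s, never splitting a surrogate pair. */
  static String truncateToCodePoints(String s, int maxCodePoints) {
    if (maxCodePoints < 0) {
      throw new IllegalArgumentException("maxCodePoints must be >= 0");
    }
    // Short inputs are returned unchanged.
    if (s.codePointCount(0, s.length()) <= maxCodePoints) {
      return s;
    }
    // offsetByCodePoints steps over whole code points, so the resulting
    // char index always falls on a code point boundary.
    int end = s.offsetByCodePoints(0, maxCodePoints);
    return s.substring(0, end);
  }

  public static void main(String[] args) {
    System.out.println(truncateToCodePoints("abcdefg", 5)); // abcde
    System.out.println(truncateToCodePoints("1234😃5", 5)); // 1234😃
    System.out.println(truncateToCodePoints("😃12345", 5)); // 😃1234
    System.out.println(truncateToCodePoints("abc", 5)); // abc
  }
}
```

A real token filter would work on the term attribute's char buffer instead of allocating strings, so treat this only as a way to sanity-check the expected values in the test.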