[
https://issues.apache.org/jira/browse/OPENNLP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801610#comment-17801610
]
ASF GitHub Bot commented on OPENNLP-1531:
-----------------------------------------
kinow commented on code in PR #581:
URL: https://github.com/apache/opennlp/pull/581#discussion_r1439085946
##########
opennlp-tools/src/test/java/opennlp/tools/tokenize/TokenizerFactoryTest.java:
##########
@@ -194,7 +199,10 @@ void checkCustomPatternForTokenizerME(String lang, String pattern, String senten
String[] tokens = tokenizer.tokenize(sentence);
Assertions.assertEquals(expectedNumTokens, tokens.length);
- String[] sentSplit = sentence.replaceAll("'", " '").split(" ");
+ String[] sentSplit = sentence
+ .replaceAll("'", " '")
+ .replaceAll(",", " ,")
Review Comment:
One of my examples failed, returning a token as `word,`, so I added this
extra replace to this test :+1:
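For illustration, here is a minimal standalone sketch of what the extra replace does to the expected split. The class name, method name and sample sentence are hypothetical (not from the PR), and the trailing `.split(" ")` is assumed to carry over from the removed line, since the hunk above is cut off before it:

```java
import java.util.Arrays;

// Hypothetical helper, not part of TokenizerFactoryTest.
public class ExpectedSplitSketch {

    // Puts a space before every apostrophe and every comma, then splits
    // on single spaces, so "word," becomes the two entries "word" and ",".
    // Assumption: the chain ends in split(" ") as on the removed line.
    static String[] expectedSplit(String sentence) {
        return sentence
            .replaceAll("'", " '")
            .replaceAll(",", " ,")
            .split(" ");
    }

    public static void main(String[] args) {
        // Made-up sample sentence, chosen only to contain both characters.
        String[] parts = expectedSplit("He said, it's fine");
        // prints [He, said, ,, it, 's, fine]
        System.out.println(Arrays.toString(parts));
    }
}
```

Without the second `replaceAll`, the same input would keep `said,` as a single entry in the expected array.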
> Add Portuguese abbreviation dictionary
> --------------------------------------
>
> Key: OPENNLP-1531
> URL: https://issues.apache.org/jira/browse/OPENNLP-1531
> Project: OpenNLP
> Issue Type: Improvement
> Affects Versions: 2.3.1
> Reporter: Bruno P. Kinoshita
> Priority: Minor
>
> Similar to the addition in OPENNLP-570 and OPENNLP-1526, an abbreviation
> dictionary for Portuguese sentence detection and tokenisation might be
> beneficial.
> Aims:
> - Create and add a new file {{abb_PT.xml}} to _opennlp-tools/lang/pt_
> - Add basic set of test cases
> Other:
> - Confirm if European/Brazilian/African/Creole Portuguese have the same
> abbreviations or if we need different languages...