[ https://issues.apache.org/jira/browse/OPENNLP-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697111#comment-17697111 ]
ASF GitHub Bot commented on OPENNLP-1474:
-----------------------------------------

mawiesne merged PR #516:
URL: https://github.com/apache/opennlp/pull/516

> Create tokenizer factories for other langs (Spanish, Italian, ...)
> ------------------------------------------------------------------
>
>                 Key: OPENNLP-1474
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1474
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Tokenizer
>    Affects Versions: 2.1.1
>            Reporter: Bruno P. Kinoshita
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.1.2
>
>
> From [https://github.com/apache/opennlp/pull/506#issuecomment-1445849746]
> We can create more factories for languages such as Spanish and Italian. For example:
> {noformat}
> // From: https://it.wikipedia.org/wiki/Alfabeto_italiano
> private static final Pattern ITALIAN =
>     Pattern.compile("^[0-9a-zàèéìîíòóùüA-ZÀÈÉÌÎÍÒÓÙÜ]+$");
> // From: https://en.wikiversity.org/wiki/Alphabet/Spanish_alphabet
> // & https://en.wikipedia.org/wiki/Spanish_orthography#Alphabet_in_Spanish
> // & https://www.fundeu.es/consulta/tilde-en-la-y-y-griega-o-ye-24786/
> private static final Pattern SPANISH =
>     Pattern.compile("^[0-9a-záéíóúüýñA-ZÁÉÍÓÚÝÑ]+$");
> {noformat}
> Community feedback would be appreciated.
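For illustration only, here is a minimal sketch of how the two patterns from the issue description could be looked up by language code and applied to a token. It is not the code merged in PR #516 and it does not use the OpenNLP API. The class `AlphanumericPatterns`, its methods `forLanguage` and `isAlphanumeric`, the ISO 639-1 keys `"it"` and `"es"`, and the ASCII-only `DEFAULT` fallback are all hypothetical names and assumptions introduced for this example. The two regular expressions are copied unchanged from the issue text.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of OpenNLP): maps a language code to an alphanumeric pattern. */
final class AlphanumericPatterns {

    // Copied unchanged from the issue description.
    static final Pattern ITALIAN =
        Pattern.compile("^[0-9a-zàèéìîíòóùüA-ZÀÈÉÌÎÍÒÓÙÜ]+$");

    // Copied unchanged from the issue description.
    // Note: the uppercase half lists no 'Ü', although the lowercase half lists 'ü'.
    static final Pattern SPANISH =
        Pattern.compile("^[0-9a-záéíóúüýñA-ZÁÉÍÓÚÝÑ]+$");

    // Assumption: languages without a dedicated pattern fall back to plain ASCII.
    static final Pattern DEFAULT = Pattern.compile("^[A-Za-z0-9]+$");

    // Assumption: patterns are keyed by lowercase ISO 639-1 codes.
    private static final Map<String, Pattern> BY_LANGUAGE =
        Map.of("it", ITALIAN, "es", SPANISH);

    private AlphanumericPatterns() {
    }

    /** Returns the pattern registered for the code, or DEFAULT if there is none. */
    static Pattern forLanguage(String code) {
        if (code == null) {
            return DEFAULT;
        }
        return BY_LANGUAGE.getOrDefault(code.toLowerCase(Locale.ROOT), DEFAULT);
    }

    /** True if every character of the token is in the language's character class. */
    static boolean isAlphanumeric(String code, String token) {
        return forLanguage(code).matcher(token).matches();
    }
}
```

With this sketch, `AlphanumericPatterns.isAlphanumeric("it", "città")` and `AlphanumericPatterns.isAlphanumeric("es", "niño")` both accept the token, while the same tokens are rejected under the ASCII fallback. The patterns assume precomposed (NFC) characters; a token in decomposed form would need normalizing first.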