[ https://issues.apache.org/jira/browse/OPENNLP-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798177#comment-17798177 ]
ASF GitHub Bot commented on OPENNLP-1526:
-----------------------------------------

kinow commented on code in PR #566:
URL: https://github.com/apache/opennlp/pull/566#discussion_r1430072960

##########
opennlp-tools/lang/es/abb_ES.xml:
##########
@@ -0,0 +1,254 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+<dictionary case_sensitive="false">
+  <entry>
+    <token>a.C.</token>
+  </entry>
+  <entry>
+    <token>a. de C.</token>
+  </entry>
+  <entry>
+    <token>a.J.C.</token>
+  </entry>
+  <entry>
+    <token>a. de J.C.</token>
+  </entry>
+  <entry>
+    <token>a. m.</token>
+  </entry>
+  <entry>
+    <token>apdo.</token>
+  </entry>
+  <entry>
+    <token>apdo.</token>
+  </entry>
+  <entry>
+    <token>aprox.</token>
+  </entry>
+  <entry>
+    <token>Av.</token>
+  </entry>
+  <entry>
+    <token>Avda.</token>
+  </entry>
+  <entry>
+    <token>Bs. As.</token>
+  </entry>
+  <entry>
+    <token>c.c.</token>
+  </entry>
+  <entry>
+    <token>cap.</token>
+  </entry>
+  <entry>
+    <token>D.</token>
+  </entry>
+  <entry>
+    <token>Da.</token>
+  </entry>
+  <entry>
+    <token>Dña.</token>
+  </entry>
+  <entry>
+    <token>d.C.</token>
+  </entry>
+  <entry>
+    <token>d. de C.</token>
+  </entry>
+  <entry>
+    <token>d.J.C.</token>
+  </entry>
+  <entry>
+    <token>d. de J.C</token>

Review Comment:
   Good catch. I wonder if it will detect it as "después de jesuscristo" or "don de Jesus Cristo" :thinking:

> Add Spanish abbreviation dictionary
> -----------------------------------
>
>                 Key: OPENNLP-1526
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1526
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector, Tokenizer
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: abb_ES.xml
>
>          Time Spent: 1h
>  Remaining Estimate: 1h
>
> Similar to the addition in OPENNLP-570, an abbreviation dictionary for
> Spanish sentence detection and tokenisation might be beneficial.
> Aims:
> - Create and add a new file {{abb_ES.xml}} to _opennlp-tools/lang/es_
> - Add basic set of test cases

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
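
For the "basic set of test cases" aim, a small lint pass over the dictionary file could flag the kinds of problems visible in the hunk above: the repeated `apdo.` entry and the `d. de J.C` token without a final period. The following is only a sketch using the Python standard library, not OpenNLP's own loader. The function name `lint_abbreviations` and the inline `SAMPLE` excerpt are hypothetical, and the "must end with a period" rule is an assumed heuristic; the comment shown here does not say what the catch actually was.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical excerpt in the same shape as abb_ES.xml (not the full file).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
  <entry><token>apdo.</token></entry>
  <entry><token>apdo.</token></entry>
  <entry><token>d. de J.C</token></entry>
  <entry><token>Av.</token></entry>
</dictionary>
"""


def lint_abbreviations(xml_text):
    """Return (duplicate entries, entries not ending with a period)."""
    root = ET.fromstring(xml_text.encode("utf-8"))

    # Assumption: only an explicit case_sensitive="true" makes the
    # duplicate comparison case-sensitive.
    case_sensitive = root.get("case_sensitive") == "true"

    # An entry may hold several <token> children; join them with a space.
    entries = []
    for entry in root.iter("entry"):
        tokens = [t.text or "" for t in entry.findall("token")]
        entries.append(" ".join(tokens))

    def normalise(text):
        return text if case_sensitive else text.lower()

    counts = Counter(normalise(e) for e in entries)
    duplicates = sorted(k for k, n in counts.items() if n > 1)

    # Heuristic only: flag abbreviations without a trailing period.
    missing_period = [e for e in entries if not e.endswith(".")]
    return duplicates, missing_period


if __name__ == "__main__":
    dups, missing = lint_abbreviations(SAMPLE)
    print("duplicates:", dups)
    print("missing trailing period:", missing)
```

Run against the real file, the same function would take the contents of `opennlp-tools/lang/es/abb_ES.xml` instead of `SAMPLE`. Whether every flagged token is a genuine mistake is a judgment for the reviewer.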