[ 
https://issues.apache.org/jira/browse/OPENNLP-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798177#comment-17798177
 ] 

ASF GitHub Bot commented on OPENNLP-1526:
-----------------------------------------

kinow commented on code in PR #566:
URL: https://github.com/apache/opennlp/pull/566#discussion_r1430072960


##########
opennlp-tools/lang/es/abb_ES.xml:
##########
@@ -0,0 +1,254 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+-->
+
+<dictionary case_sensitive="false">
+  <entry>
+    <token>a.C.</token>
+  </entry>
+  <entry>
+    <token>a. de C.</token>
+  </entry>
+  <entry>
+    <token>a.J.C.</token>
+  </entry>
+  <entry>
+    <token>a. de J.C.</token>
+  </entry>
+  <entry>
+    <token>a. m.</token>
+  </entry>
+  <entry>
+    <token>apdo.</token>
+  </entry>
+  <entry>
+    <token>apdo.</token>
+  </entry>
+  <entry>
+    <token>aprox.</token>
+  </entry>
+  <entry>
+    <token>Av.</token>
+  </entry>
+  <entry>
+    <token>Avda.</token>
+  </entry>
+  <entry>
+    <token>Bs. As.</token>
+  </entry>
+  <entry>
+    <token>c.c.</token>
+  </entry>
+  <entry>
+    <token>cap.</token>
+  </entry>
+  <entry>
+    <token>D.</token>
+  </entry>
+  <entry>
+    <token>Da.</token>
+  </entry>
+  <entry>
+    <token>Dña.</token>
+  </entry>
+  <entry>
+    <token>d.C.</token>
+  </entry>
+  <entry>
+    <token>d. de C.</token>
+  </entry>
+  <entry>
+    <token>d.J.C.</token>
+  </entry>
+  <entry>
+    <token>d. de J.C</token>

Review Comment:
   Good catch. I wonder whether it will be detected as "después de Jesucristo"
or "don de Jesucristo" :thinking:

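A check along the following lines could flag the two slips visible in the hunk above: the repeated "apdo." entry and the "d. de J.C" token without its final period. This is only a minimal sketch. The function name `lint_abbreviations` and the `SAMPLE` excerpt are hypothetical and not part of the PR, it assumes one `<token>` per `<entry>`, and the trailing-period rule is a heuristic, since not every abbreviation has to end in a period.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical excerpt in the same layout as abb_ES.xml (not the full file).
SAMPLE = """<dictionary case_sensitive="false">
  <entry><token>apdo.</token></entry>
  <entry><token>apdo.</token></entry>
  <entry><token>d.J.C.</token></entry>
  <entry><token>d. de J.C</token></entry>
</dictionary>"""


def lint_abbreviations(xml_text):
    """Return (duplicates, missing_period) for the <token> values.

    Assumes one <token> per <entry>, as in the excerpt above.
    """
    root = ET.fromstring(xml_text)
    tokens = [(tok.text or "").strip() for tok in root.iter("token")]

    # Compare case-insensitively when the dictionary declares itself so.
    fold = root.get("case_sensitive", "true") == "false"
    keys = [tok.casefold() if fold else tok for tok in tokens]

    counts = Counter(keys)
    duplicates = sorted(key for key, n in counts.items() if n > 1)

    # Heuristic only: report tokens that do not end in a period.
    missing_period = sorted({tok for tok in tokens if not tok.endswith(".")})
    return duplicates, missing_period


duplicates, missing_period = lint_abbreviations(SAMPLE)
print("duplicates:", duplicates)
print("missing trailing period:", missing_period)
```

Run against the real file, the same function would take the file's text in place of `SAMPLE`; the reported tokens are candidates for review, not confirmed errors.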




> Add Spanish abbreviation dictionary
> -----------------------------------
>
>                 Key: OPENNLP-1526
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1526
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector, Tokenizer
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: abb_ES.xml
>
>          Time Spent: 1h
>  Remaining Estimate: 1h
>
> Similar to the addition in OPENNLP-570, an abbreviation dictionary for 
> Spanish sentence detection and tokenisation might be beneficial.
> Aims:
>  - Create and add a new file {{abb_ES.xml}} to _opennlp-tools/lang/es_
>  - Add a basic set of test cases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
