[https://issues.apache.org/jira/browse/OPENNLP-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894820#comment-17894820]
ASF GitHub Bot commented on OPENNLP-1226:
-----------------------------------------
mawiesne opened a new pull request, #678:
URL: https://github.com/apache/opennlp/pull/678
Change
------
- adds `NameFinderMEWithDatesTest`, which verifies German (and English) date
formats using custom training data per language, as a reproducer for OPENNLP-1226
- adds `RandomGermanNewsGenerator` to generate synthetic news corpora with
annotated dates typical of the DE locale
- adds `RandomEnglishNewsGenerator` to generate synthetic news corpora with
annotated dates typical of the EN (US/UK/AUS) locales
- extracts reusable code into `AbstractNameFinderTest` to avoid copy-and-paste duplication
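To illustrate the idea behind a synthetic date corpus, here is a rough sketch of such a generator. It is hypothetical: the class name `GermanDateCorpusSketch`, the sentence templates and the date range are illustrative assumptions, not the contents of `RandomGermanNewsGenerator`. It emits one pre-tokenized sentence per line with each date wrapped in `<START:date> ... <END>`, mixing the numeric `dd.MM.yyyy` form with a long form such as `5. Mai 2024`. Note that in `java.time` patterns `MM` is the month, while `mm` (as written in the issue title) would mean minutes.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch in the spirit of RandomGermanNewsGenerator; not the PR's code.
public class GermanDateCorpusSketch {

  // Numeric form from the issue. In java.time, "MM" is month; "mm" would be minutes.
  static final DateTimeFormatter NUMERIC =
      DateTimeFormatter.ofPattern("dd.MM.yyyy", Locale.GERMAN);
  // Long form with a German month name, e.g. "5. Mai 2024" (a multi-token date).
  static final DateTimeFormatter LONG =
      DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);

  // Illustrative, pre-tokenized sentence templates; %s is where the tagged date goes.
  static final String[] TEMPLATES = {
      "Die Sitzung findet am %s in Berlin statt .",
      "Am %s meldete das Unternehmen einen Rekordgewinn .",
      "Das Gesetz tritt zum %s in Kraft ."
  };

  // Wraps a date in name-finder markup; tags must be whitespace-separated from tokens.
  static String annotate(String date) {
    return "<START:date> " + date + " <END>";
  }

  // Returns n training lines (one sentence per line), reproducible for a given seed.
  static List<String> generate(int n, long seed) {
    Random rnd = new Random(seed);
    LocalDate base = LocalDate.of(2000, 1, 1);
    List<String> lines = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      LocalDate d = base.plusDays(rnd.nextInt(365 * 25));
      // Mix both surface forms so the model sees each of them during training.
      String date = d.format(rnd.nextBoolean() ? NUMERIC : LONG);
      String template = TEMPLATES[rnd.nextInt(TEMPLATES.length)];
      lines.add(String.format(template, annotate(date)));
    }
    return lines;
  }

  public static void main(String[] args) {
    generate(5, 42L).forEach(System.out::println);
  }
}
```

Seeding the `Random` keeps a generated corpus reproducible across test runs, which matters when the corpus feeds a unit test rather than a one-off training job.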
Tasks
-----
Thank you for contributing to Apache OpenNLP.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
- [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically main)?
- [x] Is your initial contribution a single, squashed commit?
### For code changes:
- [x] Have you ensured that the full suite of tests is executed via `mvn
clean install` in the root opennlp folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main
NOTICE file found in opennlp folder?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which
it is rendered?
### Note:
Please ensure that once the PR is submitted, you check GitHub Actions for
build issues and submit an update to your PR as soon as possible.
> Training an NER model for dates with 'dd.mm.yyyy' as Date format
> ----------------------------------------------------------------
>
> Key: OPENNLP-1226
> URL: https://issues.apache.org/jira/browse/OPENNLP-1226
> Project: OpenNLP
> Issue Type: Question
> Components: Name Finder
> Reporter: Olga
> Assignee: Martin Wiesner
> Priority: Minor
> Labels: newbie
> Fix For: 2.4.1
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> My txt file for model training has date tags in the <START:date> dd.mm.yyyy <END>
> format. But when I try to use the trained .bin file, the dates are not
> extracted as they should be. My tagged txt file has one sentence per line.
> I was wondering whether the format, and the full stops in this date format,
> make it difficult for the model to learn. In the official OpenNLP documentation
> I can see there is a .bin model for date extraction, but I can't find the txt
> file containing the tags it was trained on.
> I tried to open this .bin file as text, but I read on Stack Overflow that I
> can't do that.
> https://stackoverflow.com/questions/26140492/how-can-i-view-the-content-of-a-bin-file-in-opennlp
>
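On the question of the full stops: OpenNLP's name-finder training format is one sentence per line with whitespace-separated tokens, so a date written as ` 12.03.2024 ` with spaces around it should be read as a single token. The sketch below is a simplified, hypothetical stand-in for OpenNLP's own reader (`NameSampleDataStream`), meant only to show how a training line maps to tokens and a date span. The class `TrainingLineSketch` and its records are assumptions for illustration, it recognises only typed tags such as `<START:date>`, and it needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical reader for name-finder training lines; not OpenNLP's code.
public class TrainingLineSketch {

  // Half-open token span [start, end) with its entity type.
  record Span(int start, int end, String type) {}

  // The sentence's plain tokens plus its annotated spans.
  record Parsed(List<String> tokens, List<Span> spans) {}

  static Parsed parse(String line) {
    List<String> tokens = new ArrayList<>();
    List<Span> spans = new ArrayList<>();
    String openType = null;
    int openStart = -1;
    // Whitespace tokenization: "12.03.2024" stays one token; its dots are not split.
    for (String t : line.trim().split("\\s+")) {
      if (t.startsWith("<START:") && t.endsWith(">")) {
        // Only typed tags such as <START:date> are recognised in this sketch.
        if (openType != null) throw new IllegalArgumentException("nested <START>");
        openType = t.substring("<START:".length(), t.length() - 1);
        openStart = tokens.size();
      } else if (t.equals("<END>")) {
        if (openType == null) throw new IllegalArgumentException("<END> without <START>");
        spans.add(new Span(openStart, tokens.size(), openType));
        openType = null;
      } else {
        tokens.add(t);
      }
    }
    if (openType != null) throw new IllegalArgumentException("unclosed <START>");
    return new Parsed(tokens, spans);
  }

  public static void main(String[] args) {
    Parsed p = parse("Die Sitzung findet am <START:date> 12.03.2024 <END> statt .");
    System.out.println(p.tokens()); // [Die, Sitzung, findet, am, 12.03.2024, statt, .]
    System.out.println(p.spans());  // [Span[start=4, end=5, type=date]]
  }
}
```

If the training side keeps `12.03.2024` as one token, one plausible thing to check is that text passed to the trained model at prediction time is tokenized the same way; a tokenizer that splits on the dots would present the model with token sequences it never saw in training.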
--
This message was sent by Atlassian Jira
(v8.20.10#820010)