[ https://issues.apache.org/jira/browse/OPENNLP-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wiesner updated OPENNLP-1702:
------------------------------------
Description:
With the recent addition of {{BratNameSampleStreamFactoryTest}} via
OPENNLP-1695, it became obvious (during an Eval test run) that the code in
{{BratDocumentStream}} is prone to non-determinism. This stems from the fact
that {{java.io.File#listFiles(..)}} does not guarantee any order of the
returned elements, so the files in bratCorpusDir can be processed in a
different order depending on the platform or file system.
A potential fix for restoring determinism is to sort the result of
listFiles(..) alphabetically in ascending order.
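A minimal sketch of the proposed fix, assuming a plain, non-recursive scan of a single directory. {{SortedListing}} and {{listFilesSorted}} are hypothetical names used for illustration only; this is not OpenNLP's actual {{BratDocumentStream}} code.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class SortedListing {

  /**
   * Hypothetical helper: returns the files in {@code dir} accepted by
   * {@code filter}, sorted by file name in ascending order, so callers
   * see the same iteration order on every run and platform.
   */
  static List<File> listFilesSorted(File dir, FileFilter filter) throws IOException {
    File[] files = dir.listFiles(filter);
    if (files == null) {
      // listFiles(..) returns null if dir is not a directory or an I/O error occurs
      throw new IOException("Cannot list directory: " + dir);
    }
    List<File> sorted = new ArrayList<>(Arrays.asList(files));
    // listFiles(..) makes no ordering guarantee, so impose one explicitly:
    // String ordering of the plain file name is case-sensitive everywhere
    sorted.sort(Comparator.comparing(File::getName));
    return sorted;
  }

  public static void main(String[] args) throws IOException {
    // Create a throwaway directory with files in non-alphabetical creation order
    File dir = Files.createTempDirectory("brat").toFile();
    for (String name : new String[] {"c.ann", "a.ann", "b.txt", "b.ann"}) {
      new File(dir, name).createNewFile();
    }
    // Example filter: keep only .ann files
    List<File> annFiles = listFilesSorted(dir, f -> f.getName().endsWith(".ann"));
    for (File f : annFiles) {
      System.out.println(f.getName());
    }
  }
}
```

Running {{main}} prints a.ann, b.ann and c.ann in that order. The sketch compares {{File#getName()}} strings rather than relying on {{File#compareTo}}, because the latter is case-insensitive on Windows and case-sensitive on UNIX, which would reintroduce platform-dependent ordering. A recursive scan would need to compare paths relative to bratCorpusDir instead of bare names.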
was:
With the recent addition of {{BratNameSampleStreamFactoryTest}} via
OPENNLP-1695, it became obvious (Eval test run), that the code in
BratDocumentStream is prone to non-determinism. This stems from the fact that
{{java.io.File#listFiles(..)}} does not guarantee any order of the returned
elements.
A potential fix for achieving determinism again, is to sort the result of
listFiles(..) alphabetically in ASC order.
> BratDocumentStream should process files in bratCorpusDir deterministically
> --------------------------------------------------------------------------
>
> Key: OPENNLP-1702
> URL: https://issues.apache.org/jira/browse/OPENNLP-1702
> Project: OpenNLP
> Issue Type: Bug
> Components: Build, Packaging and Test
> Affects Versions: 2.5.3
> Reporter: Martin Wiesner
> Assignee: Martin Wiesner
> Priority: Minor
> Fix For: 2.5.4
--
This message was sent by Atlassian Jira
(v8.20.10#820010)