[jira] [Commented] (OPENNLP-1482) Document the OpenNLP eval test data

ASF GitHub Bot (Jira) Mon, 24 Apr 2023 08:39:30 -0700


    [ https://issues.apache.org/jira/browse/OPENNLP-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715881#comment-17715881 ]


ASF GitHub Bot commented on OPENNLP-1482:
-----------------------------------------

kinow commented on code in PR #531:
URL: https://github.com/apache/opennlp/pull/531#discussion_r1175464856


##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+                       The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+                       This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+                       and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+               </para>
+       </section>
+       <section id="opennlp.evaltest.whereisit">
+               <title>Where is it?</title>
+               <para>
+                       OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">

Review Comment:
   Maybe link to the folder? <https://nightlies.apache.org/opennlp/>
   
   That way there are no accidental clicks on a 3.2 GB file :sweat_smile:, and the link also stays valid if we rename the file or add anything else.
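   
   A rough sketch of what that could look like in the DocBook source, assuming the folder URL above is the one we want and keeping the `<ulink>` element the file already uses (the wording is only a suggestion):

   ```xml
   <para>
           OpenNLP Eval Test Data is available at
           <ulink url="https://nightlies.apache.org/opennlp/">https://nightlies.apache.org/opennlp/</ulink>
   </para>
   ```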



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.

Review Comment:
   Also, do we have some formatting in the `opennlp-docs` for variables? Does it work if we use `<i>k</i>` or `<pre>k</pre>`, or some other markup for variables like `k`, `k-1`, etc.?
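   
   For reference, `<i>` and `<pre>` are HTML rather than DocBook 4.4 elements. A minimal sketch using an element DocBook does define, assuming the `opennlp-docs` stylesheets render it sensibly (not checked against the generated manual):

   ```xml
   <para>
           This technique works by dividing the eval-data into
           <emphasis>k</emphasis> equally sized parts or folds.
           The algorithm is then trained on <emphasis>k-1</emphasis> of the folds
           and tested on the remaining fold.
   </para>
   ```

   `<varname>k</varname>` or `<literal>k</literal>` would be alternatives if a monospaced look is preferred.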



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.

Review Comment:
   s/eval-data/evaluation data



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.

Review Comment:
   Eval test... evaluation tests to evaluate. Some repetition there. Maybe something like
   
   "The evaluation test data is the data used in the tests that evaluate the functionality and performance of OpenNLP."?



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+                       The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+                       This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+                       and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+               </para>
+       </section>
+       <section id="opennlp.evaltest.whereisit">
+               <title>Where is it?</title>
+               <para>
+                       OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">
+                       https://nightlies.apache.org/opennlp/opennlp-data.zip</ulink>
+                       Here's a link to the eval-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+                       https://builds.apache.org/job/OpenNLP/</ulink>
+               </para>
+        </section>
+       <section id="opennlp.evaltest.howtouseit">
+               <title>How to use the eval-data to run test?</title>
+               <para>
+                       The Eval Test Data can be downloaded and saved in the desired directory and can be used to run
+                       OpenNLP Eval Test as below:
+               <screen>
+                       <![CDATA[
+mvn test -DOPENNLP_DATA_DIR=/path/to/opennlp-eval-test-data/ -Peval-tests
+                       ]]>
+               </screen>
+               </para>
+       </section>
+       <section id="opennlp.evaltest.howtochangeit">
+               <title>How to change eval-data?</title>

Review Comment:
   s/eval-data/evaluation data



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+                       The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+                       This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+                       and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+               </para>
+       </section>
+       <section id="opennlp.evaltest.whereisit">
+               <title>Where is it?</title>
+               <para>
+                       OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">
+                       https://nightlies.apache.org/opennlp/opennlp-data.zip</ulink>
+                       Here's a link to the eval-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+                       https://builds.apache.org/job/OpenNLP/</ulink>
+               </para>
+        </section>
+       <section id="opennlp.evaltest.howtouseit">
+               <title>How to use the eval-data to run test?</title>
+               <para>
+                       The Eval Test Data can be downloaded and saved in the desired directory and can be used to run

Review Comment:
   Maybe we should use just Evaluation Test Data, to avoid eval-data, eval test data, etc.? Same for Eval Tests or eval-tests: use just Evaluation Tests? The short forms should be fine amongst ourselves, but for the docs it might be helpful to be more uniform, I think.





> Document the OpenNLP eval test data
> -----------------------------------
>
>                 Key: OPENNLP-1482
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1482
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Documentation
>            Reporter: Jeff Zemerick
>            Assignee: Atita Arora
>            Priority: Major
>
> Document the OpenNLP eval test data. Include things like what it is, where it 
> is, how to use it, how to change it, etc.
> [https://nightlies.apache.org/opennlp/opennlp-data.zip]
> How to change files on nightlies: https://nightlies.apache.org/authoring.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
