[GitHub] [opennlp] kinow commented on a diff in pull request #531: OPENNLP-1482 : Documentation for OpenNLP Eval Test Data

via GitHub Mon, 24 Apr 2023 08:39:21 -0700


kinow commented on code in PR #531:
URL: https://github.com/apache/opennlp/pull/531#discussion_r1175464856



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+                       The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+                       This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+                       and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+               </para>
+       </section>
+       <section id="opennlp.evaltest.whereisit">
+               <title>Where is it?</title>
+               <para>
+                       OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">

Review Comment:
   Maybe link to the folder instead? <https://nightlies.apache.org/opennlp/>
   
   That avoids accidental clicks on a 3.2 GB file :sweat_smile: and keeps the link valid if we rename the file or add anything else.



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.

Review Comment:
   Also, do we have some formatting in the `opennlp-docs` for variables? Does it work if we use `<i>k</i>` or `<pre>k</pre>`, or some other markup, for variables like `k`, `k-1`, etc.?
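
   For reference, the k-fold procedure that the quoted paragraph describes can be sketched as below. This is only an illustrative Python sketch, not how the OpenNLP evaluation tests are implemented: the names `k_fold_splits`, `cross_validate`, `train_fn` and `score_fn` are hypothetical, and the folds are allowed to differ in size by one sample when the data does not divide evenly by `k`.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs, one pair per fold."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        # Everything outside the held-out fold is used for training (k-1 folds).
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def cross_validate(samples: Sequence, k: int,
                   train_fn: Callable, score_fn: Callable) -> float:
    """Train on k-1 folds, score on the remaining fold, average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(samples), k):
        model = train_fn([samples[i] for i in train_idx])
        scores.append(score_fn(model, [samples[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

   Each sample lands in the test fold exactly once, which is the property the documentation text relies on when it talks about `k` and `k-1`.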



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.

Review Comment:
   s/eval-data/evaluation data



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.

Review Comment:
   Eval test... evaluation tests to evaluate. Some repetition there. Maybe something like
   
   "The evaluation test data is the data used in the tests that evaluate the functionality and performance of OpenNLP."?



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+                       The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+                       This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+                       and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+               </para>
+       </section>
+       <section id="opennlp.evaltest.whereisit">
+               <title>Where is it?</title>
+               <para>
+                       OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">
+                       https://nightlies.apache.org/opennlp/opennlp-data.zip</ulink>
+                       Here's a link to the eval-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+                       https://builds.apache.org/job/OpenNLP/</ulink>
+               </para>
+        </section>
+       <section id="opennlp.evaltest.howtouseit">
+               <title>How to use the eval-data to run test?</title>
+               <para>
+                       The Eval Test Data can be downloaded and saved in the desired directory and can be used to run
+                       OpenNLP Eval Test as below:
+               <screen>
+                       <![CDATA[
+mvn test -DOPENNLP_DATA_DIR=/path/to/opennlp-eval-test-data/ -Peval-tests
+                       ]]>
+               </screen>
+               </para>
+       </section>
+       <section id="opennlp.evaltest.howtochangeit">
+               <title>How to change eval-data?</title>

Review Comment:
   s/eval-data/evaluation data



##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+       <section id="opennlp.evaltest.whatisit">
+               <title>What is it ?</title>
+               <para>
+                       Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+                       performance of OpenNLP.
+                       These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+               </para>
+               <para>
+                       The evaluation tests leverage the k-fold cross-validation procedure.
+                       This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+                       The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+                       This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+                       and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+               </para>
+       </section>
+       <section id="opennlp.evaltest.whereisit">
+               <title>Where is it?</title>
+               <para>
+                       OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">
+                       https://nightlies.apache.org/opennlp/opennlp-data.zip</ulink>
+                       Here's a link to the eval-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+                       https://builds.apache.org/job/OpenNLP/</ulink>
+               </para>
+        </section>
+       <section id="opennlp.evaltest.howtouseit">
+               <title>How to use the eval-data to run test?</title>
+               <para>
+                       The Eval Test Data can be downloaded and saved in the desired directory and can be used to run

Review Comment:
   Maybe we should use just "Evaluation Test Data", to avoid mixing eval-data, eval test data, etc.? Same for Eval Tests or eval-tests: use just "Evaluation Tests"? The short forms are fine amongst ourselves, but for the docs it would help to be more uniform, I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
