[ https://issues.apache.org/jira/browse/OPENNLP-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715881#comment-17715881 ]
ASF GitHub Bot commented on OPENNLP-1482:
-----------------------------------------

kinow commented on code in PR #531:
URL: https://github.com/apache/opennlp/pull/531#discussion_r1175464856

##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+ <section id="opennlp.evaltest.whatisit">
+ <title>What is it ?</title>
+ <para>
+ Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+ performance of OpenNLP.
+ These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+ </para>
+ <para>
+ The evaluation tests leverage the k-fold cross-validation procedure.
+ This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+ The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+ This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+ and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+ </para>
+ </section>
+ <section id="opennlp.evaltest.whereisit">
+ <title>Where is it?</title>
+ <para>
+ OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">

Review Comment:
Maybe link to the folder instead? <https://nightlies.apache.org/opennlp/> That way there are no accidental clicks on a 3.2 GB file :sweat_smile: and the link stays valid if we change the name of the file or add anything else.

##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+ <section id="opennlp.evaltest.whatisit">
+ <title>What is it ?</title>
+ <para>
+ Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+ performance of OpenNLP.
+ These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+ </para>
+ <para>
+ The evaluation tests leverage the k-fold cross-validation procedure.
+ This technique works by dividing the eval-data into 'k' equally sized parts or folds.

Review Comment:
Also, do we have some formatting in the `opennlp-docs` for variables? Does it work if we use `<i>k</i>` or `<pre>k</pre>`, or some other markup for variables like `k`, `k-1`, etc.?

##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+ <section id="opennlp.evaltest.whatisit">
+ <title>What is it ?</title>
+ <para>
+ Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+ performance of OpenNLP.
+ These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+ </para>
+ <para>
+ The evaluation tests leverage the k-fold cross-validation procedure.
+ This technique works by dividing the eval-data into 'k' equally sized parts or folds.

Review Comment:
s/eval-data/evaluation data

##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+ <section id="opennlp.evaltest.whatisit">
+ <title>What is it ?</title>
+ <para>
+ Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+ performance of OpenNLP.

Review Comment:
"Eval test... evaluation tests to evaluate": some repetition there. Maybe something like "The evaluation test data is the data used in the tests that evaluate functionality and performance of OpenNLP."?

##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+ <section id="opennlp.evaltest.whatisit">
+ <title>What is it ?</title>
+ <para>
+ Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+ performance of OpenNLP.
+ These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+ </para>
+ <para>
+ The evaluation tests leverage the k-fold cross-validation procedure.
+ This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+ The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+ This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+ and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+ </para>
+ </section>
+ <section id="opennlp.evaltest.whereisit">
+ <title>Where is it?</title>
+ <para>
+ OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">
+ https://nightlies.apache.org/opennlp/opennlp-data.zip</ulink>
+ Here's a link to the eval-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+ https://builds.apache.org/job/OpenNLP/</ulink>
+ </para>
+ </section>
+ <section id="opennlp.evaltest.howtouseit">
+ <title>How to use the eval-data to run test?</title>
+ <para>
+ The Eval Test Data can be downloaded and saved in the desired directory and can be used to run
+ OpenNLP Eval Test as below:
+ <screen>
+ <![CDATA[
+mvn test -DOPENNLP_DATA_DIR=/path/to/opennlp-eval-test-data/ -Peval-tests
+ ]]>
+ </screen>
+ </para>
+ </section>
+ <section id="opennlp.evaltest.howtochangeit">
+ <title>How to change eval-data?</title>

Review Comment:
s/eval-data/evaluation data

##########
opennlp-docs/src/docbkx/evaltest.xml:
##########
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<chapter id="opennlp.evaltest">
+<title>Evaluation Test Data</title>
+ <section id="opennlp.evaltest.whatisit">
+ <title>What is it ?</title>
+ <para>
+ Eval test data is the data that helps with the evaluation tests to evaluate the functionality and
+ performance of OpenNLP.
+ These tests ensure reliability and can help identify potential bugs, errors, or performance issues.
+ </para>
+ <para>
+ The evaluation tests leverage the k-fold cross-validation procedure.
+ This technique works by dividing the eval-data into 'k' equally sized parts or folds.
+ The algorithm is then trained on 'k-1' of the folds and tested on the remaining fold.
+ This process is repeated 'k' times, so that each of the k-folds is used exactly once as the test data,
+ and the results of each fold are combined to produce an overall estimate of the algorithm's performance.
+ </para>
+ </section>
+ <section id="opennlp.evaltest.whereisit">
+ <title>Where is it?</title>
+ <para>
+ OpenNLP Eval Test Data is available at <ulink url="https://nightlies.apache.org/opennlp/opennlp-data.zip">
+ https://nightlies.apache.org/opennlp/opennlp-data.zip</ulink>
+ Here's a link to the eval-tests build on Jenkins:<ulink url="https://builds.apache.org/job/OpenNLP/">
+ https://builds.apache.org/job/OpenNLP/</ulink>
+ </para>
+ </section>
+ <section id="opennlp.evaltest.howtouseit">
+ <title>How to use the eval-data to run test?</title>
+ <para>
+ The Eval Test Data can be downloaded and saved in the desired directory and can be used to run

Review Comment:
Maybe we should just use Evaluation Test Data, to avoid eval-data, eval test data, etc.? Same for Eval Tests or eval-tests: just use Evaluation Tests? The short forms should be fine amongst ourselves, but for docs it might be helpful to be more uniform, I think.

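For readers following the quoted paragraph on k-fold cross-validation, the procedure it describes can be sketched as below. This is a minimal standalone illustration in Python, not OpenNLP's evaluation code; the function name `k_fold_splits` and the toy `samples` list are hypothetical, and the round-robin fold assignment is just one simple way to get near-equal folds.

```python
def k_fold_splits(items, k):
    """Yield (train, test) pairs so that each of the k folds is the test set once."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    # Assign items round-robin so fold sizes differ by at most one.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Train on the remaining k-1 folds.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


samples = list(range(10))  # hypothetical toy data standing in for evaluation data
splits = list(k_fold_splits(samples, 5))
for train, test in splits:
    print(f"train on {len(train)} items, test on {test}")
```

In a real evaluation, a model would be trained on each `train` list and scored on the matching `test` list, and the k scores would then be combined (for example, averaged) into the overall estimate.
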
> Document the OpenNLP eval test data
> -----------------------------------
>
> Key: OPENNLP-1482
> URL: https://issues.apache.org/jira/browse/OPENNLP-1482
> Project: OpenNLP
> Issue Type: Task
> Components: Documentation
> Reporter: Jeff Zemerick
> Assignee: Atita Arora
> Priority: Major
>
> Document the OpenNLP eval test data. Include things like what it is, where it
> is, how to use it, how to change it, etc.
> [https://nightlies.apache.org/opennlp/opennlp-data.zip]
> How to change files on nightlies: https://nightlies.apache.org/authoring.html

--
This message was sent by Atlassian Jira
(v8.20.10#820010)