[ https://issues.apache.org/jira/browse/OPENNLP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654044#comment-17654044 ]

ASF GitHub Bot commented on OPENNLP-1182:
-----------------------------------------

rzo1 commented on code in PR #482:
URL: https://github.com/apache/opennlp/pull/482#discussion_r1060658665


##########
opennlp-tools/src/test/java/opennlp/tools/formats/leipzig/LeipzigLanguageSampleStreamTest.java:
##########
@@ -34,6 +35,9 @@ public class LeipzigLanguageSampleStreamTest {
   private static String testDataPath = LeipzigLanguageSampleStreamTest.class
       .getClassLoader().getResource("opennlp/tools/formats/leipzig/samples").getPath();
 
+  @TempDir

Review Comment:
   Might cause trouble for GH Actions Windows builds under certain conditions, but it looks like it worked this time.
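The review does not say which conditions it means. One plausible cause is that JUnit 5 deletes a @TempDir directory after the test, and on Windows that deletion can fail while a file inside it is still open. This is an assumption, not something stated in the thread. The plain-Java sketch below approximates that lifecycle: create a directory, write a file with its handle closed, then delete the tree recursively. The class TempDirSketch and its methods are hypothetical; this is not JUnit's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper approximating the create/use/delete lifecycle that
// JUnit 5 manages for a @TempDir field. Not JUnit's actual code.
class TempDirSketch {

  // Writes a small file and closes the writer before returning, so no
  // handle stays open when the directory is later deleted.
  static Path writeSample(Path dir, String name, String content) throws IOException {
    Path file = dir.resolve(name);
    try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
      w.write(content);
    }
    return file;
  }

  // Deletes a directory tree. Reverse path order visits children before parents.
  static void deleteRecursively(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) { // the walk stream must be closed too
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("leipzig-test");
    try {
      Path f = writeSample(dir, "sample.txt", "eng\tHello world.\n");
      System.out.println("wrote " + Files.readString(f, StandardCharsets.UTF_8).length() + " chars");
    } finally {
      deleteRecursively(dir); // roughly what JUnit does after the test method
    }
    System.out.println("exists after cleanup: " + Files.exists(dir));
  }
}
```

If a test leaves a stream open on a file under the temp directory, the equivalent of deleteRecursively is the step that can fail on Windows. That is why closing readers and writers inside the test matters more there than on Linux or macOS.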





> LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise
> ---------------------------------------------------------------------------
>
>                 Key: OPENNLP-1182
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1182
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Language Detector
>    Affects Versions: 1.8.4
>            Reporter: Steven Rowe
>            Assignee: Atita Arora
>            Priority: Minor
>
> Contrary to the docs (see below), LanguageDetectorConverterTool doesn't 
> actually do anything at all; the class is empty.
> {quote}
> The following sequence of commands shows how to convert the Leipzig Corpora 
> collection in folder leipzig-train/ to the default Language Detector format, 
> by creating groups of 5 sentences as documents and limiting to 10000 
> documents per language. Then, it shuffles the result and selects the first 
> 100000 lines as the training corpus and the last 20000 as the evaluation corpus:
> {noformat}                                    
> $ bin/opennlp LanguageDetectorConverter leipzig -sentencesDir leipzig-train/ -sentencesPerSample 5 -samplesPerLanguage 10000 > leipzig.txt
> $ perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < leipzig.txt > leipzig_shuf.txt
> $ head -100000 < leipzig_shuf.txt > leipzig.train
> $ tail -20000 < leipzig_shuf.txt > leipzig.eval
> {noformat}
> {quote}
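Because LanguageDetectorConverterTool is empty, the first quoted command produces nothing to shuffle. For illustration only, the documented steps can be sketched in plain Java:

1. Group sentences into samples.
2. Cap the number of samples per language.
3. Shuffle the samples.
4. Take a head/tail train/eval split.

This is not OpenNLP's code. The class LeipzigConversionSketch and its methods are hypothetical, and several details are assumptions:

* The output line format `lang<TAB>text` is assumed to match the default Language Detector format.
* Reading the Leipzig sentence files is omitted.
* A trailing group with fewer than sentencesPerSample sentences is simply dropped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the documented conversion and split steps.
// Not OpenNLP's LanguageDetectorConverterTool, which is empty.
class LeipzigConversionSketch {

  // Joins consecutive sentences into samples of sentencesPerSample sentences,
  // keeping at most samplesPerLanguage samples. Each sample becomes one line
  // "lang<TAB>text" (assumed format). A short trailing group is dropped.
  static List<String> toSamples(String lang, List<String> sentences,
                                int sentencesPerSample, int samplesPerLanguage) {
    List<String> samples = new ArrayList<>();
    for (int i = 0;
         i + sentencesPerSample <= sentences.size() && samples.size() < samplesPerLanguage;
         i += sentencesPerSample) {
      String text = String.join(" ", sentences.subList(i, i + sentencesPerSample));
      samples.add(lang + "\t" + text);
    }
    return samples;
  }

  // Shuffles the lines, then takes the first trainSize lines as the training set
  // and the last evalSize lines as the evaluation set, like head and tail.
  static List<List<String>> shuffleAndSplit(List<String> lines, int trainSize,
                                            int evalSize, long seed) {
    List<String> shuffled = new ArrayList<>(lines);
    Collections.shuffle(shuffled, new Random(seed)); // seeded for reproducibility
    int n = shuffled.size();
    List<String> train = new ArrayList<>(shuffled.subList(0, Math.min(trainSize, n)));
    List<String> eval = new ArrayList<>(shuffled.subList(Math.max(0, n - evalSize), n));
    return List.of(train, eval);
  }

  public static void main(String[] args) {
    List<String> eng = List.of("A.", "B.", "C.", "D.", "E.");
    // Groups of 2 sentences: "eng\tA. B." and "eng\tC. D."; "E." is dropped.
    List<String> samples = toSamples("eng", eng, 2, 10);
    List<List<String>> split = shuffleAndSplit(samples, 1, 1, 42L);
    System.out.println("train=" + split.get(0).size() + " eval=" + split.get(1).size());
  }
}
```

As with head and tail in the quoted commands, the train and eval lists overlap if the input has fewer than trainSize + evalSize lines. The fixed seed makes the shuffle reproducible, which the perl one-liner is not.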



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
