Hi. It might be useful to implement a check like this as a separate step that turns the corpus into a cleaner one before feeding it to LT. That way the cleaning is only done once.
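A one-off cleaning pass like this could look roughly as follows. This is a minimal Python sketch, not part of LanguageTool. It assumes that per-sentence spelling and total error counts have already been collected from a LanguageTool run (how they are collected is out of scope here). The names `SentenceReport`, `split_corpus`, `max_spelling` and `max_total` are hypothetical, and the default limits only echo the example figures from the quoted proposal. Any sentence that exceeds either limit goes to a separate "discarded" list. The kept sentences can then be written out once as the cleaned corpus.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SentenceReport:
    """Hypothetical per-sentence result, assumed to come from a prior LanguageTool run."""
    text: str
    spelling_errors: int
    total_errors: int


def split_corpus(
    reports: Iterable[SentenceReport],
    max_spelling: int = 3,  # hypothetical limit: more spelling errors than this => discard
    max_total: int = 5,     # hypothetical limit: more errors of any kind than this => discard
) -> Tuple[List[str], List[str]]:
    """Partition sentences into (kept, discarded).

    A sentence is discarded when it exceeds either limit, on the assumption
    that many errors usually signal a passage in another language
    (quotations, dialogues, bibliography entries).
    """
    kept: List[str] = []
    discarded: List[str] = []
    for r in reports:
        if r.spelling_errors > max_spelling or r.total_errors > max_total:
            discarded.append(r.text)
        else:
            kept.append(r.text)
    return kept, discarded


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    sample = [
        SentenceReport("The cat sat on the mat.", 0, 0),
        SentenceReport("Lorem ipsum dolor sit amet, consectetur.", 6, 6),
    ]
    kept, discarded = split_corpus(sample)
    print("Kept:")
    for s in kept:
        print("  " + s)
    print("Discarded:")
    for s in discarded:
        print("  " + s)
```

Because the cleaning is run once, later LanguageTool checks never see the foreign-language passages. The discarded list can also be reviewed by hand to tune the two limits.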
Ruud

> Hi,
>
> When analyzing a long text or a corpus with LanguageTool on the command
> line, it would be useful to discard sentences in a language other than the
> expected one (quotations, dialogs, bibliography...). That way we could
> remove a lot of annoying alarms. Some kind of threshold should be set (e.g.
> 3 or 4 spelling mistakes, or 5 or 6 total mistakes per sentence), and the
> sentences that exceed the threshold should be marked somehow as discarded
> and printed separately.
>
> This could be a command-line option similar to this one:
>
>   -u, --list-unknown    also print a summary of words from the input that
>                         LanguageTool doesn't know.
>
> What do you think about implementing this option?
>
> Regards,
> Jaume Ortolà

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
