Hi,

The results are much better tonight.

I have applied this simple rule: "When neither the current sentence nor the
previous one has an ending punctuation mark, don't require an uppercase
sentence start".
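
To make the rule concrete, here is a rough sketch in Python. It is not the
actual LanguageTool code: the function names, the set of ending marks, and the
handling of the very first sentence (no previous sentence, so the uppercase
start is still required) are assumptions made only for illustration.

```python
from typing import List, Optional

# Assumed set of sentence-ending marks; "..." is covered by ".".
END_PUNCTUATION = (".", "?", "!", "…")


def has_end_punctuation(sentence: str) -> bool:
    """Return True if the sentence ends with one of the assumed ending marks."""
    return sentence.rstrip().endswith(END_PUNCTUATION)


def requires_uppercase_start(previous: Optional[str], current: str) -> bool:
    """Apply the rule: skip the check only when neither sentence ends with punctuation."""
    if previous is None:
        # Assumption: the first sentence of a text is always checked.
        return True
    neither_ends = not has_end_punctuation(previous) and not has_end_punctuation(current)
    return not neither_ends


def flag_lowercase_starts(sentences: List[str]) -> List[int]:
    """Return the indices of sentences that start lowercase and are not exempt."""
    flagged = []
    previous = None
    for index, sentence in enumerate(sentences):
        text = sentence.strip()
        if text and text[0].islower() and requires_uppercase_start(previous, sentence):
            flagged.append(index)
        previous = sentence
    return flagged
```

With this sketch, consecutive list items or table cells without ending
punctuation are not flagged, while a lowercase start right after a properly
terminated sentence still is.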

For example, in Polish, 3816 false alarms were generated on July 2, and
3531 alarms were removed on July 3. I think that the remaining alarms
are probably related to Wikipedia encoding issues.

Daniel, could you generate the differences between July 1 and July 3 so we
can see exactly what these problems are?

Regards,
Jaume



2013/7/3 Marcin Miłkowski <list-addr...@wp.pl>

> Hi,
>
> On 2013-07-03 00:03, Jaume Ortolà i Font wrote:
> > Hi,
> >
> > You can see what has happened in the Wikipedia checks. See the links
> > below.
> >
> > In some languages, there are false alarms removed: French, Breton and
> > Catalan. That looks good.
> > Other languages have added alarms: English, German, Russian, Polish and
> > Italian. The reason is that these languages previously had special
> > treatment that has now been removed.
>
> And all alarms are false. No false alarm has been removed. I think this
> is a clear case of a regression.
>
> >
> > The question is what to do with sentences that end with no ending
> > punctuation mark (.?!...). If we don't require uppercase sentence start
> > in these sentences, we avoid a lot of false alarms in lists, tables,
> > etc., as you can see in the Wikipedia check. On the other hand, we can
> > get false negatives, as in the reported bug, in titles, etc., when (by
> > mistake or not) there is no punctuation mark at the sentence end.
>
> Polish has special norms about titles: you can use a question mark but
> not a dot at the end (even if it's a complete sentence).
>
> >
> > We can try a midway solution: don't require upper case sentence start
> > when both the previous and the current sentence have no ending
> > punctuation mark. This situation is what we can find in a list or a
> > table, and we can surmise it isn't an accumulation of mistakes.
> >
> > What do you think? Any ideas?
>
> Well, I'm not sure but your current solution definitely does not work
> for Polish.
>
> Regards,
> Marcin
>
> >
> > Regards,
> > Jaume Ortolà
> >
> >
> > LanguageTool Nightly Diff Overview 2013-07-02 22:20
> >
> >     This page lists the results of our automatic nightly testing against
> >     a fixed Wikipedia corpus with 1000 articles per language.
> >
> >     Changes 2013-07-01 22:20 to 2013-07-02 22:20
> >     Version: 2.3-SNAPSHOT (2013-07-02 22:02)
> >     [1]Changed: en
> >     [2]Changed: de
> >     [3]Changed: fr
> >     [4]Changed: ru
> >     [5]Changed: br
> >     [6]Changed: ca
> >     [7]Changed: pl
> >     [8]Changed: it
> >
> >     Total runtime: 2013-07-02 22:20 to 2013-07-02 23:10
> >
> > References
> >
> >     1. http://languagetool.org/regression-tests/20130702/result_en_20130702.html
> >     2. http://languagetool.org/regression-tests/20130702/result_de_20130702.html
> >     3. http://languagetool.org/regression-tests/20130702/result_fr_20130702.html
> >     4. http://languagetool.org/regression-tests/20130702/result_ru_20130702.html
> >     5. http://languagetool.org/regression-tests/20130702/result_br_20130702.html
> >     6. http://languagetool.org/regression-tests/20130702/result_ca_20130702.html
> >     7. http://languagetool.org/regression-tests/20130702/result_pl_20130702.html
> >     8. http://languagetool.org/regression-tests/20130702/result_it_20130702.html
> >
> >
> > 2013/7/2 Jaume Ortolà i Font <jaumeort...@gmail.com>
> >
> >     Hi,
> >
> >     There is a bug report about the behavior of
> >     UppercaseSentenceStartRule:
> >
> >     https://sourceforge.net/p/languagetool/bugs/185/
> >
> >     I think that the only situation in which we can safely prevent the
> >     rule from matching is when the previous sentence ends with a comma or
> >     semicolon. So I propose to implement this for all languages.
> >
> >     Perhaps we can do the same when the previous sentence ends with no
> >     punctuation mark at all. This could be useful for table cells, but
> >     sometimes there will be ambiguities. I am not sure.
> >
> >     The current implementation looks at the sentence end to decide what
> >     to do at the start of the same sentence. I think this makes no sense
> >     and causes false negatives.
> >
> >     I can make some changes and we'll be able to see what happens in the
> >     Wikipedia checks.
> >
> >     Regards,
> >     Jaume Ortolà
> >
> >
> >
> >
> >
> ------------------------------------------------------------------------------
> > This SF.net email is sponsored by Windows:
> >
> > Build for Windows Store.
> >
> > http://p.sf.net/sfu/windows-dev2dev
> >
> >
> >
> > _______________________________________________
> > Languagetool-devel mailing list
> > Languagetool-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel
> >
>