Re: The SENT_END challenge

Juan Martorell Sun, 10 Aug 2014 09:51:06 -0700

Thank you for your answer, Jaume.


On 9 August 2014 11:30, Jaume Ortolà i Font <jaumeort...@gmail.com> wrote:

> Hi,
>
> A possible and simple solution is to write two rules. One for sentences
> with ending punctuation:
>
>     <pattern>
>         <marker>
>             <token regexp="yes">(you|thei|ou)r</token>
>         </marker>
>         <token regexp="yes">[.?!]</token>
>     </pattern>
>
> And another one for sentences without ending punctuation:
>
>     <pattern>
>         <marker>
>             <token postag="SENT_END" regexp="yes">(you|thei|ou)r</token>
>         </marker>
>     </pattern>
>
>
> They are in fact two different patterns, so it is logical to use two
> different rules.
>
>

Actually, they are the same issue; the two cases differ only in that the second one lacks an EOS
symbol. The pattern differs only because of tokenization, but
the fact is that every rule involving SENT_END must be duplicated. Since
there are many potential rules based on the end of the sentence, I think it is
worth thinking about a way to avoid this duplication. BTW, duplicated code
<http://en.wikipedia.org/wiki/Duplicate_code> is generally considered a code
smell <http://en.wikipedia.org/wiki/Code_smell> and thus should be avoided.
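To make the duplication concrete, here is a rough sketch of Jaume's two patterns packaged side by side in a single `<rulegroup>`, assuming the usual grammar.xml layout. The rulegroup `id`, `name`, and the message text are hypothetical placeholders, not taken from any existing rule. Even when grouped like this, each rule still spells out its own pattern and message:

```xml
<!-- Sketch only: id, name and message wording are hypothetical placeholders. -->
<rulegroup id="POSSESSIVE_AT_SENT_END" name="Possessive determiner at end of sentence">
    <!-- Case 1: the sentence ends with explicit punctuation -->
    <rule>
        <pattern>
            <marker>
                <token regexp="yes">(you|thei|ou)r</token>
            </marker>
            <token regexp="yes">[.?!]</token>
        </pattern>
        <message>Possible error: possessive determiner at the end of a sentence.</message>
    </rule>
    <!-- Case 2: no ending punctuation, so the word itself carries SENT_END -->
    <rule>
        <pattern>
            <marker>
                <token postag="SENT_END" regexp="yes">(you|thei|ou)r</token>
            </marker>
        </pattern>
        <message>Possible error: possessive determiner at the end of a sentence.</message>
    </rule>
</rulegroup>
```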

Regards,

Juan Martorell

------------------------------------------------------------------------------

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
