On 6 June 2014 23:00, Dominique Pellé <dominique.pe...@gmail.com> wrote:

>
> Glancing at the Spanish regexp, I spot those that could
> perhaps be written in a more efficient way:
>
> In rule "PP_V_QUE":
>
>  postag="VM.*3.*"   ->  postag="VM..3.*"
>
>  "N.*|C.*"   ->  "[NC].*"
>
> But I doubt it will make a big difference overall.
>
>
You're right, no noticeable improvement.
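
For reference, here is a minimal Python sketch comparing the patterns
discussed above. It assumes the patterns are matched against the whole tag,
and the sample tags are hypothetical strings made up for illustration, not
taken from the Spanish tag set. Under those assumptions, "N.*|C.*" and
"[NC].*" accept the same tags, while "VM.*3.*" and "VM..3.*" are not strictly
equivalent: the second one only accepts a 3 in the fifth position.

```python
import re

# Hypothetical tags, made up for illustration only.
SAMPLE_TAGS = ["NCMS000", "CC", "VMIP3S0", "VMIP1S0", "VMN0000", "VMX3", "AQ0CS0"]


def accepted(pattern, tags):
    """Return the tags that the pattern matches in full."""
    return [t for t in tags if re.fullmatch(pattern, t)]


# The alternation and the character class accept the same tags.
same_nc = accepted(r"N.*|C.*", SAMPLE_TAGS) == accepted(r"[NC].*", SAMPLE_TAGS)

# "VM.*3.*" accepts a 3 anywhere after "VM"; "VM..3.*" needs it in position 5.
loose = accepted(r"VM.*3.*", SAMPLE_TAGS)
strict = accepted(r"VM..3.*", SAMPLE_TAGS)
only_loose = [t for t in loose if t not in strict]
```

So the VM rewrite is only safe if no tag starting with "VM" carries a 3
anywhere other than the fifth position.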


> In French grammar.xml, I have some very long regexps,
> and LT is still fast.  I prefer to find as many errors as I can
> with as few false positives as possible.  For speed, I try
> to optimize regexps if I can, but not at the price of removing
> rules and finding fewer errors.
>
>
I agree, as long as it is worth it. That is, I don't want a rule with
lots of exceptions that triggers only once every million sentences.


> On another topic, I'm not sure whether you saw those
> warnings for Spanish when running tests:
>
> The Spanish rule: QUE:1 has exception word [no] which cannot match the
> regexp token [2] [[Qq]ue] so exception seems useless, or did you
> forget skip="..." or scope="previous"?
>
> The Spanish rule: TE2:11 has exception regexp [de|verde|perla|negro]
> in token word [1] [te] which seems useless, or did you forget
> skip="..." or scope="previous"?
>

I saw them recently. These are unmanaged contributions from another
collaborator. The first one is proposed for deletion, along with others,
because they are "lottery" rules (30-40% chance of a false positive). The
latter needs refinement, but that will be addressed later.
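
As I read them, both warnings say that the exception can never apply to the
token it is attached to. A rough Python sketch of that reasoning follows. It
is not LT's actual check: the helper name is hypothetical, matching is assumed
to be case-sensitive and against the whole word, and the candidate words are
hand-picked from the two warnings.

```python
import re


def exception_can_fire(token_pattern, exception_pattern, candidates):
    """Return the candidate words matched in full by both patterns.

    An empty result suggests the exception never applies to this token.
    """
    return [
        w for w in candidates
        if re.fullmatch(token_pattern, w) and re.fullmatch(exception_pattern, w)
    ]


# QUE:1, token 2: regexp [Qq]ue with exception word "no".
que_hits = exception_can_fire(r"[Qq]ue", r"no", ["que", "Que", "no"])

# TE2:11, token 1: word "te" with exception regexp de|verde|perla|negro.
te_hits = exception_can_fire(
    r"te", r"de|verde|perla|negro", ["te", "de", "verde", "perla", "negro"]
)
```

Both lists come out empty, which is consistent with the exceptions being
useless unless they were meant for a neighbouring token (skip="..." or
scope="previous").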


>
> The Spanish rule: D_AN:1, token [3], marked as negated but is empty so
> the negation is useless. Did you mix up negate="yes" and
> negate_pos="yes"?
>
>
That seems to be fixed now, though all I did was save the file using IDEA
instead of Eclipse.

The section is

      <marker>
        <and>
          <token postag="N.*" postag_regexp="yes"/>
          <token postag="A.*" postag_regexp="yes"/>
          <token negate_pos="yes" postag="V.*" postag_regexp="yes"/>
        </and>
      </marker>

meaning "catch tokens whose POS tags include noun and adjective but not verb".

It doesn't trigger that warning any more, but it would be interesting to hear
about alternatives. I did not manage to use exceptions here.
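
To make the intended behaviour concrete, here is a small Python model of what
I want the section to match. It only models the intent stated above (some noun
reading, some adjective reading, no verb reading). How LT actually evaluates
negate_pos inside <and> may differ. The function name and the sample readings
are hypothetical.

```python
import re


def noun_adjective_not_verb(readings):
    """Model the stated intent: some N reading, some A reading, no V reading."""
    has_noun = any(re.fullmatch(r"N.*", tag) for tag in readings)
    has_adjective = any(re.fullmatch(r"A.*", tag) for tag in readings)
    has_verb = any(re.fullmatch(r"V.*", tag) for tag in readings)
    return has_noun and has_adjective and not has_verb


# Hypothetical POS readings for three tokens.
ambiguous_noun_adj = noun_adjective_not_verb(["NCMS000", "AQ0MS0"])
with_verb_reading = noun_adjective_not_verb(["NCMS000", "AQ0MS0", "VMIP3S0"])
noun_only = noun_adjective_not_verb(["NCMS000"])
```

Only the first token qualifies; the other two are rejected because of the
extra verb reading and the missing adjective reading respectively.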


Best regards,
Juan
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
