Okay, thanks. Good to know.
However, this is not the time to do that; right now there is a lot more to do to improve the quality of what is already in the Dutch LT. I will keep it in mind for later, when I have a clearer view of the remaining spelling issues. (Currently, 10% of the sentences collected from internet sources have at least one spelling error.)

Ruud

On 16-09-14 at 15:31, Jaume Ortolà i Font wrote:
2014-09-16 14:43 GMT+02:00 R.Baars <baar...@xs4all.nl>:

    How is that done?

    Ruud


Do you mean ignoring tagged words in spellchecking (even if they are not in the dictionary)? It's a configurable option of the speller (at least in the Morfologik speller rule). A line of Java code.
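
To illustrate what such an option does, here is a minimal, self-contained Java sketch. It is not LanguageTool code: the class `TaggedWordSpellerSketch`, the `Token` type and the `ignoreTaggedWords` flag are hypothetical names chosen for this example, and the real option in the Morfologik speller rule may be named and wired differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: a speller that can skip tokens that already carry a POS tag.
class TaggedWordSpellerSketch {

    // A token with an optional POS tag (null means the tagger did not tag it).
    static final class Token {
        final String text;
        final String posTag;

        Token(String text, String posTag) {
            this.text = text;
            this.posTag = posTag;
        }
    }

    // Returns the words that would be reported as possible spelling errors.
    static List<String> misspelled(List<Token> tokens, Set<String> dictionary,
                                   boolean ignoreTaggedWords) {
        List<String> result = new ArrayList<>();
        for (Token token : tokens) {
            if (ignoreTaggedWords && token.posTag != null) {
                continue; // tagged words are trusted, even if not in the dictionary
            }
            if (!dictionary.contains(token.text.toLowerCase(Locale.ROOT))) {
                result.add(token.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("is", "nice"));
        List<Token> tokens = Arrays.asList(
            new Token("Abu", "<NPCNG00>"),
            new Token("Dhabi", "</NPCNG00>"),
            new Token("is", null),
            new Token("nize", null));
        System.out.println(misspelled(tokens, dictionary, true));
        System.out.println(misspelled(tokens, dictionary, false));
    }
}
```

With the flag on, only the untagged unknown word is reported; with it off, "Abu" and "Dhabi" are reported too, since neither is in the dictionary.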

Jaume




    On 16-09-14 at 13:23, Jaume Ortolà i Font wrote:
    2014-09-16 13:03 GMT+02:00 R.Baars <baar...@xs4all.nl>:

        I see. This is probably of no use for spellchecking, but it
        is for POS tagging.


    It gives no suggestions, but it can be used to avoid false
    positives in spellchecking, if you configure the speller to
    ignore tagged words.


        Does
        Abu Dhabi NPCNG00
        cause both words to be tagged with that tag, or are they
        considered one token with that POS tag?


    Tokenization is not changed. In this case:

    <token postag="<NPCNG00>">Abu</token>
    <token postag="</NPCNG00>">Dhabi</token>

    If there are more than two tokens, the inside tokens are not
    tagged. Perhaps this should be optionally changed (i.e., tag the
    inside tokens too).

    Regards,
    Jaume



        (Might come in handy for just this tagging..)

        Ruud

        On 16-09-14 at 12:56, Jaume Ortolà i Font wrote:
        Hi, Ruud.

        I can't find any documentation. It is used in Polish,
        French, Catalan, Russian, Ukrainian and Spanish.

        Implementation:

        Enable it (Java).
        Create a "multiwords.txt" in your resources folder like
        these [1]. The tokens are separated by white space and the
        tag is separated by a tab.

        Result:

        The first token of the multiword is tagged with "<POSTAG>"
        and the last token is tagged with "</POSTAG>".

        The MultiwordChunker is case-insensitive. I would like to
        make that configurable, especially for words with an
        uppercase first letter.
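
The file format and tagging behaviour described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual MultiwordChunker: the class `MultiwordTaggerSketch` and its methods are hypothetical, the input is assumed to be a list of word tokens without whitespace tokens, and every entry is assumed to have at least two words.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the multiword tagging described in the message above.
class MultiwordTaggerSketch {

    // Parses lines such as "Abu Dhabi<TAB>NPCNG00": words separated by
    // white space, tag separated by a tab. Words are lowercased so that
    // matching is case-insensitive.
    static Map<List<String>, String> parse(List<String> lines) {
        Map<List<String>, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // no tag on this line
            }
            String phrase = line.substring(0, tab).trim();
            if (phrase.isEmpty()) {
                continue;
            }
            String[] words = phrase.toLowerCase(Locale.ROOT).split("\\s+");
            entries.put(Arrays.asList(words), line.substring(tab + 1).trim());
        }
        return entries;
    }

    // Returns one tag per token, or null where a token is left untagged.
    // Only the first and last token of a multiword receive a tag.
    static String[] tag(List<String> tokens, Map<List<String>, String> entries) {
        String[] tags = new String[tokens.size()];
        for (int start = 0; start < tokens.size(); start++) {
            for (Map.Entry<List<String>, String> entry : entries.entrySet()) {
                List<String> words = entry.getKey();
                int end = start + words.size();
                if (end <= tokens.size() && matches(tokens, start, words)) {
                    tags[start] = "<" + entry.getValue() + ">";
                    tags[end - 1] = "</" + entry.getValue() + ">";
                }
            }
        }
        return tags;
    }

    private static boolean matches(List<String> tokens, int start, List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            if (!tokens.get(start + i).toLowerCase(Locale.ROOT).equals(words.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<List<String>, String> entries = parse(Arrays.asList("Abu Dhabi\tNPCNG00"));
        String[] tags = tag(Arrays.asList("in", "abu", "DHABI"), entries);
        System.out.println(Arrays.toString(tags)); // prints [null, <NPCNG00>, </NPCNG00>]
    }
}
```

Tokenization is left untouched; the sketch only attaches tags to existing tokens, and for an entry of three or more words the inside tokens stay untagged.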

        Regards,
        Jaume


        [1]
        https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/multiwords.txt
        https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/ca/src/main/resources/org/languagetool/resource/ca/multiwords.txt

        2014-09-16 12:33 GMT+02:00 R.Baars <baar...@xs4all.nl>:

            Jaume, thanks, but I am not sure.

            It depends on the implementation, I think.

            Where can I find more info?

            Ruud

            On 16-09-14 at 12:26, Jaume Ortolà i Font wrote:
            2014-09-16 11:21 GMT+02:00 R.J. Baars <r.j.ba...@xs4all.nl>:

                We don't agree. There is a spellchecker, but also a
                single-word ignore list for it.
                There are XML rules, but also a Simplereplace rule and
                a compounding rule.

                So apart from the hammer and the screwdriver, there
                are more tools.


            There is indeed another tool for multi-words. It seems
            that Ruud doesn't know about it.

            We can enable a HybridDisambiguator and add a
            MultiwordChunker to the disambiguation. With this you
            can write a list of "multi-words", each with its
            corresponding tag, in a plain text file (multiwords.txt).

            I use the MultiwordChunker with two objectives: to improve
            disambiguation and to avoid spelling matches in multiwords.

            Would it be useful for you, Ruud?

            Regards,
            Jaume



                But anyway, adding the most frequent ones to the
                disambiguator works.

                Getting rid of wrong POS tags and of the 10% reported
                possible spelling errors on the entire corpus is a
                higher priority.
                And fixing false positives. Having almost doubled
                the number of rules is enough for this month.

                Ruud



                > On 2014-09-16 at 09:03, R.J. Baars wrote:
                >> A word like 'Aviv' is not correct unless 'Tel' comes
                >> before it.
                >> So it is best to leave Tel and Aviv out of the
                >> spell checker.
                >> That results in the spell checker reporting errors
                >> for Aviv.
                >>
                >> In the disambiguator, there is the option to
                >> block that by making an immunizing rule:
                >>
                >> <!-- Tel Aviv-->
                >> <rule id="TEL_AVIV" name="Tel Aviv">
                >> <pattern>
                >> <token>Tel</token>
                >> <token>Aviv</token>
                >> </pattern>
                >> <disambig action="ignore_spelling"/>
                >> </rule>
                >>
                >> That works perfectly. But then, there are a lot
                >> of these word combinations. Wouldn't it be better
                >> to have a multi-word ignore list for the spell
                >> checker?
                >>
                >> (Or even a multi-word spell checker, not just
                >> knowing 'correct' and 'not in list', but 'correct',
                >> 'incorrect' and 'not in list')
                >
                > It would not be an enhancement, as this would not
                > give new functionality but cripple the existing one.
                > Also, the ability to use all XML syntax is extremely
                > important to me (I use POS tags and regular
                > expressions), so I wouldn't make use of the
                > multi-word spell checker anyway. So we'd have to
                > introduce a crippled syntax that would look a little
                > bit different for a human being but with no
                > meaningful functional change. I don't think it's
                > worth our time.
                >
                > The spell checker is best for checking individual
                > words. Just like a hammer, it's good for nails, and
                > not for screws. For screws, we have a screwdriver.
                > For multi-word entities, we have more refined tools,
                > like tagging and disambiguation and special
                > attributes.
                >
                > Best,
                > Marcin
                >
                >
                
------------------------------------------------------------------------------
                > Want excitement?
                > Manually upgrade your production database.
                > When you want reliability, choose Perforce.
                > Perforce version control. Predictably reliable.
                >
                
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
                > _______________________________________________
                > Languagetool-devel mailing list
                > Languagetool-devel@lists.sourceforge.net
                <mailto:Languagetool-devel@lists.sourceforge.net>
                >
                https://lists.sourceforge.net/lists/listinfo/languagetool-devel
                >



                