https://bugs.freedesktop.org/show_bug.cgi?id=46950

             Bug #: 46950
           Summary: Incorrect word breaking for spell-checking
    Classification: Unclassified
           Product: LibreOffice
           Version: unspecified
          Platform: Other
        OS/Version: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Linguistic component
        AssignedTo: libreoffice-bugs@lists.freedesktop.org
        ReportedBy: n...@math.technion.ac.il


It appears that before LibreOffice passes text to the spell-checker, it breaks
the text into separate words. The problem is that it apparently does this using
general, language-agnostic rules, whereas languages differ in which characters
may be part of a word and which characters break words.

My problem is specifically with Hebrew spell-checking. In Hebrew, the quote
characters ' and " are used not just for quoting, but also have an additional,
unrelated use as in-word characters:
1. The single-quote is used to mark foreign sounds. E.g., the word ג'ירפה has a
single-quote character after the gimmel, which means it should be pronounced
"j", not "g".
2. The double-quote is used inside acronyms, to mark them as such. For example,
מנכ"ל is the acronym for CEO, and מנכ"לים is its plural. Both contain a
double-quote in the middle of the word, and both are listed in the dictionary
with that quote included.

Because of this, the Hebrew hunspell dictionary includes the following lines in
he_IL.aff:

   BREAK 3
   BREAK ^"
   BREAK "$
   BREAK ^'

This means that " breaks words only at the beginning or end of a word (and '
only at the beginning); in the middle of a word, these characters never mark a
word break in Hebrew. With this setting, hunspell correctly word-breaks and
spell-checks Hebrew text.
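
To make the intent of these three rules concrete, here is a rough sketch in
Python. It is only a simplified model of what the anchored patterns mean, not
hunspell's actual algorithm; the names BREAK_RULES and strip_breaks are made up
for this illustration.

```python
# Simplified model of the three BREAK patterns from he_IL.aff.
# "^" anchors a pattern to the start of a token, "$" to the end.
# This is an illustration only, not hunspell's real implementation.
BREAK_RULES = ['^"', '"$', "^'"]


def strip_breaks(token, rules=BREAK_RULES):
    """Repeatedly remove anchored break characters from the token edges."""
    changed = True
    while changed and token:
        changed = False
        for rule in rules:
            if rule.startswith("^") and token.startswith(rule[1:]):
                # Drop the break character(s) at the start of the token.
                token = token[len(rule) - 1:]
                changed = True
            elif rule.endswith("$") and token.endswith(rule[:-1]):
                # Drop the break character(s) at the end of the token.
                token = token[:-(len(rule) - 1)]
                changed = True
    return token
```

Under this model, a quoted acronym such as "מנכ"ל" loses only its outer quotes,
while the quote inside the word is left alone, so the word can still be found
in the dictionary.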

Unfortunately, LibreOffice doesn't respect these instructions. It appears to
break up the words incorrectly before sending them to hunspell. The end result
is that all Hebrew words that are acronyms or contain foreign sounds are
wrongly marked as errors, which is very annoying.
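
The suspected behaviour can be reproduced with a small Python sketch. This is
an assumption about the kind of splitting involved, not LibreOffice's actual
word-breaking code; naive_words and hebrew_aware_words are hypothetical names
used only here.

```python
import re


def naive_words(text):
    """Language-agnostic split: every non-alphanumeric character,
    including ' and ", is treated as a word boundary."""
    return re.findall(r"[^\W_]+", text)


def hebrew_aware_words(text):
    """Keep ' and " when they sit between two word characters, so that
    in-word quotes survive while leading/trailing quotes are dropped."""
    return re.findall(r"[^\W_]+(?:['\"][^\W_]+)*", text)
```

With the naive split, מנכ"ל is handed to the spell-checker as the two fragments
מנכ and ל, neither of which is the dictionary word; the second function keeps
it as the single word מנכ"ל.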

