On 03.12.2010 at 19:38, Enrico Forestieri wrote:

> On Fri, Dec 03, 2010 at 06:40:39PM +0100, Georg Baum wrote:
> 
>> Stephan Witt wrote:
>> 
>>> On 03.12.2010 at 04:36, Enrico Forestieri wrote:
>>> 
>>>> 2) replace all occurrences of isDigit() with isDigitASCII()
>> 
>> I don't know. Unless somebody goes through all usages and confirms that 
>> indeed an ASCII digit is wanted I would rather keep it as it is.
> 
> Regardless of whether an ASCII digit is wanted or not, isDigit() actually
> performs the same test as isDigitASCII(). From the Qt docs:
> 
> bool QChar::isDigit () const
> Returns true if the character is a decimal digit (Number_DecimalDigit);
> otherwise returns false.
> 
> And the other unicode categories addressed by Qt are: Number_Letter and
> Number_Other (http://doc.trolltech.com/4.7/qchar.html#Category-enum)
> So, it is pretty clear that isDigit() only tests for 0..9.
> This is confirmed by the description of isNumber():
> 
> bool QChar::isNumber () const
> Returns true if the character is a number (Number_* categories, not just
> 0-9); otherwise returns false.
> See also isDigit().
> 
> So, I think that using isDigitASCII() in place of isDigit() is clearer.
> 
>>>> 3) use QChar::isNumber() instead of iswdigit().
>>> 
>>> I'd have no problem with this. My goal was to get the thing fixed.
>>> IMHO, a wrapper around QChar::isNumber() in textutils.h and lstrings.cpp
>>> should be added then.
>> 
>> If you want to use QChar::isNumber() for the spell checking test: Why not. 
>> Unless somebody comes up with an example I believe that it basically does 
>> not matter whether isDigit(), isDigitASCII() or QChar::isNumber() is used 
>> there. Whatever you decide, please ensure that hasDigit() gets renamed 
>> properly.
> 
> isDigit() and isDigitASCII() perform the exact same test. Now, I don't
> know whether QChar::isNumber() is appropriate for the spell checking test,
> but it is certainly not equivalent to the other two. Indeed, the first two
> only test for the Number_DecimalDigit category, while the latter covers
> all Number_* categories.
> 
> Quoting http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters :
> 
> * Decimal digit (Nd)
> * Letter (Nl) — Numerals composed of letters or letterlike symbols
>   (e.g., Roman numerals)
> * Other (No) — Includes vulgar fractions and superscript and subscript
>   digits.
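
To make the three categories concrete, here is a small standalone sketch. It does not use Qt; the category table is a hand-written sample of a few code points, and all names in it (NumCat, sampleCategory, the *Sketch predicates) are made up for illustration. Note that Nd also contains decimal digits of other scripts, such as the Arabic-Indic digits, so a test on Nd is not the same as a test on 0..9.

```cpp
#include <map>

// Unicode general categories relevant here (enum name is illustrative).
enum class NumCat { None, Nd, Nl, No };

// Hand-written sample table, NOT a real Unicode database.
inline NumCat sampleCategory(char32_t c)
{
    if (c >= U'0' && c <= U'9')
        return NumCat::Nd;
    static const std::map<char32_t, NumCat> samples = {
        {U'\u0663', NumCat::Nd}, // ARABIC-INDIC DIGIT THREE
        {U'\u2163', NumCat::Nl}, // ROMAN NUMERAL FOUR
        {U'\u00BD', NumCat::No}, // VULGAR FRACTION ONE HALF
        {U'\u00B2', NumCat::No}, // SUPERSCRIPT TWO
    };
    auto const it = samples.find(c);
    return it == samples.end() ? NumCat::None : it->second;
}

// ASCII range only, like isDigitASCII() in the patch.
inline bool isAsciiDigitSketch(char32_t c)
{
    return c >= U'0' && c <= U'9';
}

// Decimal digits (Nd) only.
inline bool isDecimalDigitSketch(char32_t c)
{
    return sampleCategory(c) == NumCat::Nd;
}

// Any of Nd, Nl, No.
inline bool isNumeralSketch(char32_t c)
{
    return sampleCategory(c) != NumCat::None;
}
```

With this table, '7' passes all three tests, U+0663 passes the Nd and numeral tests but not the ASCII one, and U+2163 and U+00BD pass only the numeral test.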

I made another patch for this topic.
Enrico, Georg, do you have any plans of your own?

The Paragraph.cpp part now uses QChar::isNumber().
If a configuration option gets added, it could be made tri-state:
* ignore words with digits
* ignore words with numerals
* ignore none
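
A rough sketch of how such a tri-state setting might look. This is not part of the patch below: the enum IgnoreMode, the function ignoreWordSketch and the CharTest alias are hypothetical names, and the two character tests are passed in as parameters so that the sketch stays independent of Qt. In LyX they would presumably be isDigit() and isNumber() from textutils.h.

```cpp
#include <functional>
#include <string>

// Hypothetical tri-state setting; not part of the patch.
enum class IgnoreMode { None, WordsWithDigits, WordsWithNumerals };

// Character predicate, e.g. a wrapper around isDigit() or isNumber().
using CharTest = std::function<bool(char32_t)>;

// Return true if the spell checker should skip this word.
inline bool ignoreWordSketch(std::u32string const & word, IgnoreMode mode,
                             CharTest const & isDigit,
                             CharTest const & isNumber)
{
    if (mode == IgnoreMode::None)
        return false;
    // Pick the character test that matches the selected mode.
    CharTest const & test =
        (mode == IgnoreMode::WordsWithDigits) ? isDigit : isNumber;
    for (char32_t c : word)
        if (test(c))
            return true;
    return false;
}
```

The current behaviour of the patch corresponds to the WordsWithNumerals mode.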

Stephan

Index: src/support/textutils.h
===================================================================
--- src/support/textutils.h     (Revision 36720)
+++ src/support/textutils.h     (Arbeitskopie)
@@ -44,6 +44,9 @@
 /// return true if a unicode char is a digit.
 bool isDigit(char_type c);
 
+/// return true if a unicode char is a numeral.
+bool isNumber(char_type c);
+
 /// return whether \p c is a digit in the ASCII range
 bool isDigitASCII(char_type c);
 
Index: src/support/lstrings.cpp
===================================================================
--- src/support/lstrings.cpp    (Revision 36720)
+++ src/support/lstrings.cpp    (Arbeitskopie)
@@ -157,6 +157,16 @@
 }
 
 
+bool isNumber(char_type c)
+{
+       if (!is_utf16(c))
+               // assume that no non-utf16 character is a numeral
+               // c outside the UCS4 range is caught as well
+               return false;
+       return ucs4_to_qchar(c).isNumber();
+}
+
+
 bool isDigitASCII(char_type c)
 {
        return '0' <= c && c <= '9';
@@ -165,8 +175,7 @@
 
 bool isAlnumASCII(char_type c)
 {
-       return ('0' <= c && c <= '9')
-               || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
+       return isAlphaASCII(c) || isDigitASCII(c);
 }
 
 
@@ -266,7 +275,7 @@
 
        string::const_iterator end = tmpstr.end();
        for (; cit != end; ++cit)
-               if (!isdigit((*cit)))
+               if (!isDigitASCII(*cit))
                        return false;
 
        return true;
@@ -286,7 +295,7 @@
        string::const_iterator cit = tmpstr.begin();
        string::const_iterator end = tmpstr.end();
        for (; cit != end; ++cit)
-               if (!isdigit((*cit)))
+               if (!isDigitASCII(*cit))
                        return false;
 
        return true;
@@ -310,7 +319,7 @@
                ++cit;
        string::const_iterator end = tmpstr.end();
        for (; cit != end; ++cit) {
-               if (!isdigit(*cit) && *cit != '.')
+               if (!isDigitASCII(*cit) && *cit != '.')
                        return false;
                if ('.' == (*cit)) {
                        if (found_dot)
@@ -324,17 +333,11 @@
 
 bool hasDigit(docstring const & str)
 {
-       if (str.empty())
-               return false;
-
        docstring::const_iterator cit = str.begin();
        docstring::const_iterator const end = str.end();
-       for (; cit != end; ++cit) {
-               if (*cit == ' ')
-                       continue;
-               if (isdigit((*cit)))
+       for (; cit != end; ++cit)
+               if (isDigit(*cit))
                        return true;
-       }
        return false;
 }
 
Index: src/Paragraph.cpp
===================================================================
--- src/Paragraph.cpp   (Revision 36720)
+++ src/Paragraph.cpp   (Arbeitskopie)
@@ -358,6 +358,8 @@
                return speller_change_number > speller_state_.currentChangeNumber();
        }
 
+       bool ignoreWord(docstring const & word) const;
+
        void setMisspelled(pos_type from, pos_type to, SpellChecker::Result state)
        {
                pos_type textsize = owner_->size();
@@ -3545,6 +3547,21 @@
 }
 
 
+bool Paragraph::Private::ignoreWord(docstring const & word) const
+{
+       // Ignore words with numerals (any Number_* category)
+       // FIXME: make this customizable
+       // (note that some checkers ignore words with digits by default)
+       docstring::const_iterator cit = word.begin();
+       docstring::const_iterator const end = word.end();
+       for (; cit != end; ++cit) {
+               if (isNumber((*cit)))
+                       return true;
+       }
+       return false;
+}
+
+
 SpellChecker::Result Paragraph::spellCheck(pos_type & from, pos_type & to,
        WordLangTuple & wl, docstring_list & suggestions,
        bool do_suggestion, bool check_learned) const
@@ -3570,10 +3587,7 @@
                return result;
 
        if (needsSpellCheck() || check_learned) {
-               // Ignore words with digits
-               // FIXME: make this customizable
-               // (note that some checkers ignore words with digits by default)
-               if (!hasDigit(word)) {
+               if (!d->ignoreWord(word)) {
                        bool const trailing_dot = to < size() && d->text_[to] == '.';
                        result = speller->check(wl);
                        if (SpellChecker::misspelled(result) && trailing_dot) {
