On 03.12.2010 at 19:38, Enrico Forestieri wrote:

> On Fri, Dec 03, 2010 at 06:40:39PM +0100, Georg Baum wrote:
>
>> Stephan Witt wrote:
>>
>>> On 03.12.2010 at 04:36, Enrico Forestieri wrote:
>>>
>>>> 2) replace all occurrences of isDigit() with isDigitASCII()
>>
>> I don't know. Unless somebody goes through all usages and confirms that
>> indeed an ASCII digit is wanted I would rather keep it as it is.
>
> Independent of the fact that an ASCII digit is wanted or not, actually
> isDigit() performs the same test as isDigitAscii(). From the Qt docs:
>
> bool QChar::isDigit () const
> Returns true if the character is a decimal digit (Number_DecimalDigit);
> otherwise returns false.
>
> And the other unicode categories addressed by Qt are: Number_Letter and
> Number_Other (http://doc.trolltech.com/4.7/qchar.html#Category-enum)
> So, it is pretty clear that isDigit() only tests for 0..9.
> This is reassured by the description of isNumber():
>
> bool QChar::isNumber () const
> Returns true if the character is a number (Number_* categories, not just
> 0-9); otherwise returns false.
> See also isDigit().
>
> So, I think that using isDigitASCII() in place of isDigit() is clearer.
>
>>>> 3) use QChar::isNumber() instead of iswdigit().
>>>
>>> I'd have no problem with this. My goal was to get the thing fixed.
>>> IMHO, a wrapper around QChar::isNumber() in textutils.h and lstrings.cpp
>>> should be added then.
>>
>> If you want to use QChar::isNumber() for the spell checking test: Why not.
>> Unless somebody comes up with an example I believe that it basically does
>> not matter whether isDigit(), isDigitASCII() or QChar::isNumber() is used
>> there. Whatever you decide, please ensure that hasDigit() gets renamed
>> properly.
>
> IsDigit() and isDigitASCII() perform the exact same test. Now, I don't
> know whether QChar::isNumber() is appropriate for the spell checking test,
> but it for sure is not equivalent to the other two. Indeed, the first two
> only care for the Number_DecimalDigit category, while the latter is for
> all Number_* categories.
>
> Quoting http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters :
>
> * Decimal digit (Nd)
> * Letter (Nl) — Numerals composed of letters or letterlike symbols
> (e.g., Roman numerals)
> * Other (No) — Includes vulgar fractions and superscript and subscript
> digits.

I made another patch for this topic. Enrico, Georg, do you have any plans of your own?

The Paragraph.cpp part now uses QChar::isNumber(). If the configure option gets added, one can make it three-state:

* ignore words with digits
* ignore words with numerals
* ignore none

Stephan

Index: src/support/textutils.h
===================================================================
--- src/support/textutils.h	(Revision 36720)
+++ src/support/textutils.h	(Arbeitskopie)
@@ -44,6 +44,9 @@
 /// return true if a unicode char is a digit.
 bool isDigit(char_type c);
 
+/// return true if a unicode char is a numeral.
+bool isNumber(char_type c);
+
 /// return whether \p c is a digit in the ASCII range
 bool isDigitASCII(char_type c);
 
Index: src/support/lstrings.cpp
===================================================================
--- src/support/lstrings.cpp	(Revision 36720)
+++ src/support/lstrings.cpp	(Arbeitskopie)
@@ -157,6 +157,16 @@
 }
 
 
+bool isNumber(char_type c)
+{
+	if (!is_utf16(c))
+		// assume that no non-utf16 character is a numeral
+		// c outside the UCS4 range is catched as well
+		return false;
+	return ucs4_to_qchar(c).isNumber();
+}
+
+
 bool isDigitASCII(char_type c)
 {
 	return '0' <= c && c <= '9';
@@ -165,8 +175,7 @@
 
 bool isAlnumASCII(char_type c)
 {
-	return ('0' <= c && c <= '9')
-		|| ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
+	return isAlphaASCII(c) || isDigitASCII(c);
 }
 
 
@@ -266,7 +275,7 @@
 
 	string::const_iterator end = tmpstr.end();
 	for (; cit != end; ++cit)
-		if (!isdigit((*cit)))
+		if (!isDigitASCII(*cit))
 			return false;
 
 	return true;
@@ -286,7 +295,7 @@
 	string::const_iterator cit = tmpstr.begin();
 	string::const_iterator end = tmpstr.end();
 	for (; cit != end; ++cit)
-		if (!isdigit((*cit)))
+		if (!isDigitASCII(*cit))
 			return false;
 
 	return true;
@@ -310,7 +319,7 @@
 		++cit;
 	string::const_iterator end = tmpstr.end();
 	for (; cit != end; ++cit) {
-		if (!isdigit(*cit) && *cit != '.')
+		if (!isDigitASCII(*cit) && *cit != '.')
 			return false;
 		if ('.' == (*cit)) {
 			if (found_dot)
@@ -324,17 +333,11 @@
 
 bool hasDigit(docstring const & str)
 {
-	if (str.empty())
-		return false;
-
 	docstring::const_iterator cit = str.begin();
 	docstring::const_iterator const end = str.end();
-	for (; cit != end; ++cit) {
-		if (*cit == ' ')
-			continue;
-		if (isdigit((*cit)))
+	for (; cit != end; ++cit)
+		if (isDigit(*cit))
 			return true;
-	}
 	return false;
 }
 
Index: src/Paragraph.cpp
===================================================================
--- src/Paragraph.cpp	(Revision 36720)
+++ src/Paragraph.cpp	(Arbeitskopie)
@@ -358,6 +358,8 @@
 		return speller_change_number > speller_state_.currentChangeNumber();
 	}
 
+	bool ignoreWord(docstring const & word) const ;
+
 	void setMisspelled(pos_type from, pos_type to, SpellChecker::Result state)
 	{
 		pos_type textsize = owner_->size();
@@ -3545,6 +3547,21 @@
 }
 
 
+bool Paragraph::Private::ignoreWord(docstring const & word) const
+{
+	// Ignore words with digits
+	// FIXME: make this customizable
+	// (note that some checkers ignore words with digits by default)
+	docstring::const_iterator cit = word.begin();
+	docstring::const_iterator const end = word.end();
+	for (; cit != end; ++cit) {
+		if (isNumber((*cit)))
+			return true;
+	}
+	return false;
+}
+
+
 SpellChecker::Result Paragraph::spellCheck(pos_type & from, pos_type & to,
 	WordLangTuple & wl, docstring_list & suggestions,
 	bool do_suggestion, bool check_learned) const
@@ -3570,10 +3587,7 @@
 		return result;
 
 	if (needsSpellCheck() || check_learned) {
-		// Ignore words with digits
-		// FIXME: make this customizable
-		// (note that some checkers ignore words with digits by default)
-		if (!hasDigit(word)) {
+		if (!d->ignoreWord(word)) {
 			bool const trailing_dot = to < size() && d->text_[to] == '.';
 			result = speller->check(wl);
 			if (SpellChecker::misspelled(result) && trailing_dot) {
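
For illustration only, here is a rough sketch of how the three-state option mentioned in the message could look. It is not part of the patch. The enum IgnoreMode, its enumerators, and isNumeralSample() are hypothetical names; isNumeralSample() is only a stand-in that covers ASCII digits plus a few sample code points from the Nl and No categories, where real code would presumably call the isNumber() wrapper around QChar::isNumber(). The sketch uses std::u32string instead of docstring so that it compiles without LyX or Qt.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-state setting; the names are illustrative only.
enum class IgnoreMode { None, WordsWithDigits, WordsWithNumerals };

// Same test as isDigitASCII() in the patch.
bool isDigitASCII(char32_t c)
{
	return U'0' <= c && c <= U'9';
}

// Stand-in for the isNumber() wrapper (assumption: real code would
// delegate to QChar::isNumber()). Only a few sample ranges are covered.
bool isNumeralSample(char32_t c)
{
	return isDigitASCII(c)
		|| (0x2160 <= c && c <= 0x217F)  // Roman numerals (Nl)
		|| (0x00BC <= c && c <= 0x00BE)  // vulgar fractions 1/4, 1/2, 3/4 (No)
		|| c == 0x00B2 || c == 0x00B3 || c == 0x00B9;  // superscript 2, 3, 1 (No)
}

// Return true if the spell checker should skip this word under the given mode.
bool ignoreWord(std::u32string const & word, IgnoreMode mode)
{
	if (mode == IgnoreMode::None)
		return false;
	for (char32_t c : word) {
		bool const hit = (mode == IgnoreMode::WordsWithDigits)
			? isDigitASCII(c) : isNumeralSample(c);
		if (hit)
			return true;
	}
	return false;
}
```

With this sketch, a word containing a superscript two would still be spell checked in the "digits" mode but skipped in the "numerals" mode, which is the distinction between the first two states.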