** Description changed:
pyenchant's SpellChecker class has a problem with special characters,
for example when using a Portuguese dictionary. Checking single words
works with special characters:
- $ python
- Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
- [GCC 4.5.2] on linux2
- Type "help", "copyright", "credits" or "license" for more information.
- >>> from enchant.checker import *
- >>> c = SpellChecker('pt_BR.UTF-8')
- >>> c.check('enrolação')
- True
- >>> c.check('enrolaçao')
- False
+ $ python
+ Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
+ [GCC 4.5.2] on linux2
+ Type "help", "copyright", "credits" or "license" for more information.
+ >>> from enchant.checker import *
+ >>> c = SpellChecker('pt_BR.UTF-8')
+ >>> c.check('enrolação')
+ True
+ >>> c.check('enrolaçao')
+ False
but when checking sentences, it works only when no special characters
- are present
+ are present:
- >>> c.set_text('alegria totall')
- >>> for error in c:
- ... print(error.word) # this works
- ...
- totall
+ >>> c.set_text('alegria totall')
+ >>> for error in c:
+ ... print(error.word) # this works
+ ...
+ totall
- However, it does not work with special charactes like ç or ã:
+ However, it does not work with special characters like ç or ã, due
+ to a failure of the default (English) tokenizer:
- >>> c.set_text('enrolação totall')
- >>> for error in c:
- ... print(error.word) # this does not
- ...
- Traceback (most recent call last):
-File "<stdin>", line 1, in <module>
-File "/usr/lib/pymodules/python2.7/enchant/checker/__init__.py", line 243, in next
- (word,pos) = self._tokens.next()
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 344, in next
- return self.next()
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 335, in next
- (word,pos) = self._curtok.next()
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 149, in next
- incr = self._consume_alpha(text,offset)
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 98, in _consume_alpha_b
- return self._consume_alpha_utf8(text,offset)
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 107, in _consume_alpha_utf8
- u = text[offset:offset+incr].decode("utf8")
- AttributeError: 'array.array' object has no attribute 'decode'
+ >>> c.set_text('enrolação totall')
+ >>> for error in c:
+ ... print(error.word) # this does not
+ ...
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "/usr/lib/pymodules/python2.7/enchant/checker/__init__.py", line 243, in next
+ (word,pos) = self._tokens.next()
+ File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 344, in next
+ return self.next()
+ File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 335, in next
+ (word,pos) = self._curtok.next()
+ File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 149, in next
+ incr = self._consume_alpha(text,offset)
+ File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 98, in _consume_alpha_b
+ return self._consume_alpha_utf8(text,offset)
+ File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 107, in _consume_alpha_utf8
+ u = text[offset:offset+incr].decode("utf8")
+ AttributeError: 'array.array' object has no attribute 'decode'
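
The final AttributeError in the traceback above can be reproduced without pyenchant. The following is only a minimal sketch, written for Python 3 rather than the Python 2.7 used in the report. It assumes, based on the error message alone, that the checker holds byte text in an array.array; the names data, buf and chunk are illustrative and do not come from the library.

```python
import array

# UTF-8 bytes of the failing input; "ç" and "ã" each occupy two bytes.
data = "enrolação totall".encode("utf-8")

# Assumption: the text is held in an array.array of bytes, as the
# error message in the traceback suggests.
buf = array.array("B", data)

# Slicing an array.array yields another array.array, not a byte string,
# so calling .decode() on the slice raises AttributeError.
chunk = buf[6:8]  # the two bytes that encode "ç"
try:
    chunk.decode("utf-8")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Converting the slice to bytes first lets the decode succeed.
decoded = chunk.tobytes().decode("utf-8")

print(error_message)
print(decoded)
```

On Python 2 the equivalent conversion would be tostring() instead of tobytes(). Whether the tokenizer should convert the slice this way is a guess; the sketch only shows why the slice cannot be decoded directly.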
If you need further information, please let me know.
Cheers, Ulf
ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: python-enchant 1.5.3-2
ProcVersionSignature: Ubuntu 2.6.38-10.46-generic 2.6.38.7
Uname: Linux 2.6.38-10-generic x86_64
Architecture: amd64
Date: Fri Aug 19 06:47:11 2011
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 10.10 Maverick Meerkat - Release amd64 (20101007)
PackageArchitecture: all
ProcEnviron:
- LANGUAGE=en_GB:en
- PATH=(custom, user)
- LANG=de_DE.UTF-8
- LC_MESSAGES=en_GB.UTF-8
- SHELL=/bin/bash
+ LANGUAGE=en_GB:en
+ PATH=(custom, user)
+ LANG=de_DE.UTF-8
+ LC_MESSAGES=en_GB.UTF-8
+ SHELL=/bin/bash
SourcePackage: pyenchant
UpgradeStatus: Upgraded to natty on 2011-04-29 (111 days ago)
--
https://bugs.launchpad.net/bugs/829288
Title:
pyenchant's SpellChecker has problems with UTF-8 locale