[Bug 829288] Re: pyenchant's SpellChecker has problems with UTF-8 locale

2013-11-23 Thread Harald Sitter
Please report to the upstream developer.

http://pythonhosted.org/pyenchant/


** Changed in: pyenchant (Ubuntu)
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/829288

Title:
  pyenchant's SpellChecker has problems with UTF-8 locale

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pyenchant/+bug/829288/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 829288] Re: pyenchant's SpellChecker has problems with UTF-8 locale

2011-08-19 Thread Ulf Mehlig


[Bug 829288] Re: pyenchant's SpellChecker has problems with UTF-8 locale

2011-08-19 Thread Ulf Mehlig
** Description changed:

  pyenchant's SpellChecker class has a problem with special characters,
  for example when using a Portuguese dictionary. Checking single words
  works with special characters:
  
-  $ python 
-  Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) 
-  [GCC 4.5.2] on linux2 
-  Type "help", "copyright", "credits" or "license" for more information.
-  >>> from enchant.checker import *
-  >>> c = SpellChecker('pt_BR.UTF-8')
-  >>> c.check('enrolação')
-  True
-  >>> c.check('enrolaçao')
-  False
+  $ python
+  Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
+  [GCC 4.5.2] on linux2
+  Type "help", "copyright", "credits" or "license" for more information.
+  >>> from enchant.checker import *
+  >>> c = SpellChecker('pt_BR.UTF-8')
+  >>> c.check('enrolação')
+  True
+  >>> c.check('enrolaçao')
+  False
  
  but when checking sentences, it works only when no special characters
- are present
+ are present:
  
-  >>> c.set_text('alegria totall')
-  >>> for error in c:
-  ...     print(error.word) # this works
-  ... 
-  totall
+  >>> c.set_text('alegria totall')
+  >>> for error in c:
+  ...     print(error.word) # this works
+  ...
+  totall
  
- However, it does not work with special charactes like ç or ã:
+ However, it does not work with special characters like ç or ã, due
+ to a failure of the default (English) tokenizer:
  
-  >>> c.set_text('enrolação totall')
-  >>> for error in c:
-  ...     print(error.word) # this does not
-  ... 
-  Traceback (most recent call last):
-File "<stdin>", line 1, in <module>
-File "/usr/lib/pymodules/python2.7/enchant/checker/__init__.py", line 243, in next
-  (word,pos) = self._tokens.next()
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 344, in next
-  return self.next()
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 335, in next
-  (word,pos) = self._curtok.next()
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 149, in next
-  incr = self._consume_alpha(text,offset)
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 98, in _consume_alpha_b
-  return self._consume_alpha_utf8(text,offset)
-File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 107, in _consume_alpha_utf8
-  u = text[offset:offset+incr].decode("utf8")
-  AttributeError: 'array.array' object has no attribute 'decode'
+  >>> c.set_text('enrolação totall')
+  >>> for error in c:
+  ...     print(error.word) # this does not
+  ...
+  Traceback (most recent call last):
+    File "<stdin>", line 1, in <module>
+    File "/usr/lib/pymodules/python2.7/enchant/checker/__init__.py", line 243, in next
+      (word,pos) = self._tokens.next()
+    File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 344, in next
+      return self.next()
+    File "/usr/lib/pymodules/python2.7/enchant/tokenize/__init__.py", line 335, in next
+      (word,pos) = self._curtok.next()
+    File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 149, in next
+      incr = self._consume_alpha(text,offset)
+    File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 98, in _consume_alpha_b
+      return self._consume_alpha_utf8(text,offset)
+    File "/usr/lib/pymodules/python2.7/enchant/tokenize/en.py", line 107, in _consume_alpha_utf8
+      u = text[offset:offset+incr].decode("utf8")
+  AttributeError: 'array.array' object has no attribute 'decode'
  
  If you need further information, please let me know.
  
  Cheers, Ulf
  
  ProblemType: Bug
  DistroRelease: Ubuntu 11.04
  Package: python-enchant 1.5.3-2
  ProcVersionSignature: Ubuntu 2.6.38-10.46-generic 2.6.38.7
  Uname: Linux 2.6.38-10-generic x86_64
  Architecture: amd64
  Date: Fri Aug 19 06:47:11 2011
  EcryptfsInUse: Yes
  InstallationMedia: Ubuntu 10.10 Maverick Meerkat - Release amd64 (20101007)
  PackageArchitecture: all
  ProcEnviron:
-  LANGUAGE=en_GB:en
-  PATH=(custom, user)
-  LANG=de_DE.UTF-8
-  LC_MESSAGES=en_GB.UTF-8
-  SHELL=/bin/bash
+  LANGUAGE=en_GB:en
+  PATH=(custom, user)
+  LANG=de_DE.UTF-8
+  LC_MESSAGES=en_GB.UTF-8
+  SHELL=/bin/bash
  SourcePackage: pyenchant
  UpgradeStatus: Upgraded to natty on 2011-04-29 (111 days ago)
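
The AttributeError at the end of the traceback can be reproduced without
pyenchant: slicing an array.array returns another array.array, and that
type has no decode() method. The following is only a minimal sketch of
that failure mode and of one possible way around it, not the upstream
fix. It is written for Python 3 (the report itself is against Python
2.7, where array.array.tobytes() is spelled tostring()), and the helper
name decode_utf8_slice is hypothetical.

```python
import array


def decode_utf8_slice(buf, offset, incr):
    """Decode buf[offset:offset+incr] as UTF-8.

    Hypothetical helper, not part of pyenchant. Accepts bytes, bytearray,
    or array.array; an array slice is converted to bytes first because
    array.array has no decode() method.
    """
    chunk = buf[offset:offset + incr]
    if isinstance(chunk, array.array):
        chunk = chunk.tobytes()  # Python 3 name; Python 2.7 uses tostring()
    return chunk.decode("utf8")


raw = "enrolação totall".encode("utf8")
as_array = array.array("B", raw)

# Slicing an array.array yields another array.array, which cannot be
# decoded directly; this is the condition the traceback reports.
assert not hasattr(as_array[6:8], "decode")

# In the UTF-8 encoding, "ç" occupies the two bytes at offsets 6 and 7.
print(decode_utf8_slice(as_array, 6, 2))
```

Whether a conversion like this belongs in the tokenizer or in the code
that hands it the text is a question for the upstream developer.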
