hi,
On Monday 29 March 2010 18:38:12 Alexander Artemenko wrote:
> >> On 2010/03/29 16:05 - svetlyak40wt wrote :
> >> I've solved this annoying problem. Here is the patch:
> >> http://gist.github.com/347854
> >
> > On 2010/03/29 16:57 - sthenault wrote :
> > would you please add a test case to the functional suite ?
> >
> > see test/input and test/messages or search ml archives for more
> > details
>
> Hi Sylvain, I've updated the patch and added tests.
>
>
> by: Alexander Artemenko
> url: http://www.logilab.org/ticket/4683
I tried out your patch, but unfortunately it generated a
UnicodeDecodeError in our test suite.
I fixed it, without really understanding why, by splitting your lambda
declaration into two lines:
+ decode = stream.readline().decode
+ line_generator = lambda: decode(encoding)
instead of:
+ line_generator = lambda: stream.readline().decode(encoding)
Can somebody explain to me what happened?
Anyhow, I have appended my new patch (we use the func_noerror_* prefix
when we don't want the message triggered).
Is that ok?
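For reference, here is a minimal standalone sketch (the byte stream and
encoding are made up, this is not the pylint code itself) contrasting the
two forms. Note that in the split form, readline() is called once, at the
moment `decode` is bound:

```python
from io import BytesIO

encoding = 'utf-8'

# One-line form: readline() runs on every call of the lambda,
# so each call decodes the next line of the stream.
stream = BytesIO(b'first\nsecond\n')
line_generator = lambda: stream.readline().decode(encoding)
lazy = [line_generator(), line_generator()]

# Split form: readline() runs exactly once, here, and `decode` is
# bound to that first line's decode method, so every later call of
# the lambda decodes that same first line again.
stream = BytesIO(b'first\nsecond\n')
decode = stream.readline().decode
line_generator = lambda: decode(encoding)
eager = [line_generator(), line_generator()]
```

So the two variants are not equivalent: `lazy` walks through the stream,
while `eager` keeps returning the first line.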
--
Emile Anclin <[email protected]>
http://www.logilab.fr/ http://www.logilab.org/
Scientific computing & knowledge management
fix #4683: Non-ASCII characters count double if utf8
diff -r cdd571901fea checkers/format.py
--- a/checkers/format.py Mon Mar 29 11:27:19 2010 +0200
+++ b/checkers/format.py Tue Mar 30 11:13:09 2010 +0200
@@ -31,6 +31,7 @@
from pylint.interfaces import IRawChecker, IASTNGChecker
from pylint.checkers import BaseRawChecker
+from pylint.checkers.misc import guess_encoding, is_ascii
MSGS = {
'C0301': ('Line too long (%s/%s)',
@@ -178,6 +179,25 @@
self._lines = None
self._visited_lines = None
+ def process_module(self, stream):
+ """extracts the encoding from the stream and
+ decodes each line, so that the length of
+ international text is properly calculated.
+ """
+ data = stream.read()
+ line_generator = stream.readline
+
+ ascii, lineno = is_ascii(data)
+ if not ascii:
+ encoding = guess_encoding(data)
+ if encoding is not None:
+ decode = stream.readline().decode
+ line_generator = lambda: decode(encoding)
+ del data
+
+ stream.seek(0)
+ self.process_tokens(tokenize.generate_tokens(line_generator))
+
def new_line(self, tok_type, line, line_num, junk):
"""a new line has been encountered, process it if necessary"""
if not tok_type in junk:
diff -r cdd571901fea test/input/func_noerror_long_utf8_line.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test/input/func_noerror_long_utf8_line.py Tue Mar 30 11:13:09 2010 +0200
@@ -0,0 +1,8 @@
+# -*- coding: utf-8 -*-
+"""this utf-8 docstring has some non-ASCII characters like 'é', or '¢»ß'"""
+### also check comments with some more non-ASCII characters like 'é' or '¢»ß'
+
+__revision__ = 1100
+print "------------------------------------------------------------------------"
+print "-----------------------------------------------------------------------é"
+
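For context (not part of the patch): tokenize.generate_tokens expects a
readline-style callable that returns one line of source per call, which is
why the patch wraps the decoding in a line_generator. A minimal sketch with
a made-up one-line source:

```python
import tokenize
from io import StringIO

# generate_tokens takes any zero-argument callable returning one
# line per call; a StringIO's bound readline method works fine.
source = StringIO(u"x = 1\n")
tokens = list(tokenize.generate_tokens(source.readline))

# Each token's first field is its type; map it to a readable name.
names = [tokenize.tok_name[tok[0]] for tok in tokens]
```

Anything that honours this one-line-per-call contract can be passed in,
which is what makes swapping in a decoding wrapper possible at all.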
_______________________________________________
Python-Projects mailing list
[email protected]
http://lists.logilab.org/mailman/listinfo/python-projects