Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-22 Thread Bernhard Bauch
hey all,

this python3 script triggers the error:


import httplib2
import urllib.parse

# Encode the parameter as GB2312 (not UTF-8), as the spider apparently does
somestr = '深入 so what'
encodedstr = somestr.encode('gb2312')
url = 'http://myappdomain.com/search'
body = {encodedstr: encodedstr}
headers = {
    'Content-type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, '
              'image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh;q=0.9,en;q=0.8',
}
http = httplib2.Http()
# urlencode() percent-encodes the raw GB2312 bytes; no charset is declared
response, content = http.request(url, 'POST', headers=headers,
                                 body=urllib.parse.urlencode(body))
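To see why the server has nothing to go on, you can compare the body this script sends with the body the same text would produce as UTF-8. This is a minimal sketch using only the standard library. It builds the two bodies and sends nothing.

```python
import urllib.parse

text = '深入 so what'
raw = text.encode('gb2312')

# What the reproduction script sends: percent-escapes of raw GB2312 bytes
gb_body = urllib.parse.urlencode({raw: raw})
# The same text as a UTF-8 client would send it (str keys/values are encoded as UTF-8)
utf8_body = urllib.parse.urlencode({text: text})

print(gb_body)
print(utf8_body)
```

Both bodies are plain ASCII percent-escapes, and the form content type carries no charset. A server that assumes UTF-8 therefore has no way to notice it was handed GB2312.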


now it's possible to reproduce the error :)

any ideas how to solve this?
the ruby people solved this by adding a utf8 sanitizer to the middleware..
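Here is a rough sketch of what such a sanitizer could do. It is a hypothetical Python illustration, not the Ruby middleware and not anything Catalyst provides. It re-encodes a urlencoded form body so that every field is valid UTF-8, replacing undecodable bytes with U+FFFD.

```python
from urllib.parse import parse_qsl, unquote_plus, urlencode


def sanitize_form_body(body: str) -> str:
    """Hypothetical sanitizer: return a form body in which every field is valid UTF-8.

    Undecodable byte sequences become U+FFFD instead of raising, so the app behind
    the sanitizer never sees invalid input (at the cost of losing the original text).
    """
    pairs = parse_qsl(body, keep_blank_values=True, encoding='utf-8', errors='replace')
    return urlencode(pairs)  # str pairs are percent-encoded as UTF-8 again


# A GB2312-encoded field name, like the one the reproduction script sends
gb_body = urlencode({'深入 so what'.encode('gb2312'): '1'})
clean = sanitize_form_body(gb_body)
print(clean)
print(unquote_plus(clean))
```

The trade-off: requests stop failing, but the GB2312 text is silently replaced rather than recovered.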

bye, bernhard


On 21 Jul 2014, at 22:19, Bernhard Bauch ba...@zsi.at wrote:


Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-22 Thread Bernhard Bauch
here’s also a perl script that does the same:

--
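use strict;
use warnings;
use utf8;    # the string literal below is UTF-8 source text
use Encode qw(decode encode);
use LWP::UserAgent;

my $str = '深入 so what';
my $oct = encode('gb2312', $str);    # GB2312 bytes, not UTF-8
my $url = 'http://wbc-inco.net/object/event/past';
my $ua   = LWP::UserAgent->new();
# post the GB2312 bytes as both field name and value
my $response = $ua->post( $url, { $oct => $oct } );
my $content  = $response->decoded_content();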
--

On 22 Jul 2014, at 11:33, Bernhard Bauch ba...@zsi.at wrote:


Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-22 Thread Mark Ellis
I don't think there's anything you can do: your app wants utf8 and
they're sending something else, which doesn't map. and since you can't know
what encoding it is in, all you can do is die if it doesn't map, which
is what the plugin does.

as far as i can tell, the ruby middleware i found handles this by returning
a 400 bad request, which catalyst does as well. so there's no effect, other
than the noise in the logs.
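Mark's approach (refuse what does not decode) can be sketched as a small WSGI middleware. This is a hypothetical Python illustration of the idea only. It is not Catalyst or Plack code, and it assumes form data arrives as application/x-www-form-urlencoded. It checks the query string and form body before the application runs, and answers 400 Bad Request if either is not valid UTF-8 after percent-decoding.

```python
import io
from urllib.parse import unquote_to_bytes


def is_utf8_form(data: bytes) -> bool:
    """True if percent-decoded form/query data is valid UTF-8.

    '&' and '=' are ASCII, so checking the whole string is equivalent to checking
    every field separately.
    """
    try:
        unquote_to_bytes(data).decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


class RejectNonUtf8:
    """Hypothetical WSGI middleware: answer 400 before the app sees undecodable input."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI (PEP 3333) environ strings are latin-1 "bytes as str"; recover the bytes
        query = environ.get('QUERY_STRING', '').encode('latin-1')
        body = b''
        if environ.get('CONTENT_TYPE', '').startswith('application/x-www-form-urlencoded'):
            length = int(environ.get('CONTENT_LENGTH') or 0)
            body = environ['wsgi.input'].read(length)
            environ['wsgi.input'] = io.BytesIO(body)  # let the wrapped app read it again
        if not (is_utf8_form(query) and is_utf8_form(body)):
            start_response('400 Bad Request', [('Content-Type', 'text/plain; charset=UTF-8')])
            return [b'Bad Request: parameters are not valid UTF-8\n']
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real application."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


app = RejectNonUtf8(demo_app)
```

The client sees the same 400 as when the plugin dies. The difference is only that the rejection happens in one place, which may make it easier to log quietly.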


On 22 July 2014 11:21, Bernhard Bauch ba...@zsi.at wrote:


Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-22 Thread Roman Jurkov
Bernhard,

To stop the exception, you can modify $CHECK in Catalyst::Plugin::Unicode::Encoding
by removing “FB_CROAK”. That way it won’t throw an exception and will let the
request go through. However, it will not decode the content correctly (malformed
bytes end up as U+FFFD replacement characters), but since in this case it is a
spider, i don’t know if that matters to you.

you can just add this to your catalyst application:

use Encode ();
use Catalyst::Plugin::Unicode::Encoding;
# CHECK without FB_CROAK: don't die on malformed input
$Catalyst::Plugin::Unicode::Encoding::CHECK = Encode::LEAVE_SRC;

a more isolated test that illustrates this issue:

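use strict;
use warnings;
use utf8;    # the string literal below is UTF-8 source text

use Encode qw(decode encode);

# mirrors the plugin's $CHECK (FB_CROAK | LEAVE_SRC)
our $CHECK = Encode::FB_CROAK | Encode::LEAVE_SRC;

my $str = '深入 so what';
my $oct = encode('euc-cn', $str);    # EUC-CN, the usual byte encoding of GB2312

my $obj = Encode::find_encoding('UTF-8');
my $res = $obj->decode($oct, $CHECK);    # dies here: these bytes are not valid UTF-8
warn $res;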
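In Python terms, Roman's test and his suggested change map roughly onto a strict decode and a replacing decode. This is an analogy, not the plugin's code. When CHECK has no FB_CROAK bit, Perl's Encode substitutes U+FFFD for malformed input (LEAVE_SRC alone only stops it from modifying the source string). That is roughly what errors='replace' does below.

```python
text = '深入 so what'
raw = text.encode('euc-cn')  # in Python, 'euc-cn' is an alias of the gb2312 codec

# Roughly FB_CROAK: a strict decode raises on the first malformed byte
try:
    raw.decode('utf-8', errors='strict')
    croaked = False
except UnicodeDecodeError:
    croaked = True

# Roughly CHECK without FB_CROAK: malformed bytes become U+FFFD and decoding goes on
lenient = raw.decode('utf-8', errors='replace')
print(croaked, lenient)
```

The second result shows why the content will not be decoded correctly: the Chinese characters are replaced, not recovered.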


-roman

On Jul 22, 2014, at 7:31 AM, Mark Ellis m...@rkellis.com wrote:


[Catalyst] UTF8 problems with plugin::encoding

2014-07-21 Thread Bernhard Bauch
Hey all,

on most of my websites (running the latest catalyst, 5.90065) i keep getting
utf8-related errors.
they usually appear when a spider
Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
comes across.

the error is:
Caught exception in engine UTF8 Error: utf8 \x98 does not map to 
Unicode at /usr/local/…./lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 167.

It took me a while to get the actual parameters the spider sends, because
catalyst's debug messages do not tell that much:

—
[2014/07/16 15:08:47] [5.255.253.218] [INFO] vim 
/usr/local/…./lib/perl5/Catalyst.pm +2016: *** Request 164 (0.032/s) [10682] 
[Wed Jul 16 15:08:47 2014] ***
[2014/07/16 15:08:47] [5.255.253.218] [DEBUG] vim 
/usr/local/…./lib/perl5/Catalyst.pm +2309: Response Code: 400; Content-Type: 
text/plain; charset=UTF-8; Content-Length: unknown
[2014/07/16 15:08:47] [5.255.253.218] [INFO] vim 
/usr/local/.../lib/perl5/Catalyst.pm +1880: Request took 0.006491s (154.059/s)
.--------+-------.
| Action | Time  |
+--------+-------+
'--------+-------'
—

i changed the Plugin::Unicode::Encoding plugin a bit to find out what the client
sends. the results:
UTF8 trash arrives, and the module seems unable to deal with it…


Caught exception in engine UTF8 Error: utf8 \x98 does not map to Unicode at 
/usr/local/…../lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 170.
 -

URL: notice/list

PARAMS:$VAR1 = {
  'X*Ö^K^@^@^@^@¸®ä
^@^@^@^@883^H^K^@^@^@^@h¡ä
^@^@^@^@Hµä
^@^@^@^@X^Z^N^Q^@^@^@^@ø91^F^Q^@^@^@^@Ø^F^N^Q^@^@^@^@¸92^F^Q^@^@^@^@(^K^N^Q^@^@^@^@88^B^N^Q^@^@^@^@¸úÝ^P^@^@^@^@^X%q^G^@^@^@^@اñ^O^@^@^@^@ØøB.^@^@^@^@èâÝ^P^@^@^@^@XÛ_^L^@^@^@^@ÈíÝ^P^@^@^@^@¸~P^S^@^@^@^@èåÝ^P^@^@^@^@Øný^O^@^@^@^@88úÝ^P^@^@^@^@^Xá(
 ^@^@^@^@ئÆ
^@^@^@^@Øï*^Q^@^@^@^@^X' = 
'^F^L^@^@^@^@98Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@98=H^@^@^@^@ø99ó^K^@^@^@^@hÔu^R^@^@^@^@¸8eó^K^@^@^@^@^Xä_^L^@^@^@^@Ø90a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû98^Q^@^@^@^@x¦h^H^@^@^@^@Xý98^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå98^Q^@^@^@^@ø¤h^H^@^@^@^@Xé98^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf82^Q^@^@^@^@^XéH^@^@^@^@xv82^Q^@^@^@^@X6éH^@^@^@^@xl82^Q^@^@^@^@83Ì^G^@^@^@^@Xl82^Q^@^@^@^@¸Ñý^M^@^@^@^@xr82^Q^@^@^@^@H[^H^Q^@^@^@^@^X|82^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u82^Q^@^@^@^@98Á¢^K^@^@^@^@Øp82^Q^@^@^@^@8Í¢^K^@^@^@^@Øl82^Q^@^@^@^@XË¢^K^@^@^@^@Xq82^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc82^Q^@^@^@^@¸Å¢^K^@^@^@^@8h82^Q^@^@^@^@98Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC95^M^@^@^@^@°S95^M^@^@^@^@^PI95^M^@^@^@^@À\\95^M^@^@^@^@ðE95^M^@^@^@^@80B95^M^@^@^@^@@P95^M^@^@^@^@80Q95^M^@^@^@^@
 
J95^M^@^@^@^@p\\95^M^@^@^@^@àU95^M^@^@^@^@àF95^M^@^@^@^@àA95^M^@^@^@^@^@9eô^P^@^@^@^@°9dô^P^@^@^@^@091ô^P^@^@^@^@
 9eô^P^@^@^@^@^P8eô^P^@^@^@^@ 88ô^P^@^@^@^@Ð82ô^P^@^@^@^@ 
8dô^P^@^@^@^@9095ô^P^@^@^@^@à90ô^P^@^@^@^@@95ô^P^@^@^@^@P8fô^P^@^@^@^@9081ô^P^@^@^@^@
 
97ô^P^@^@^@^@Ð8cô^P^@^@^@^@p88ô^P^@^@^@^@P99ô^P^@^@^@^@9090ô^P^@^@^@^@@9aô^P^@^@^@^@09bô^P^@^@^@'
};
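Raw dumps like the one above are hard to read because the bytes land in the log as control characters. A hypothetical helper may be easier to work with. It is written in Python with only the standard library, and it is not part of Catalyst. It reports where UTF-8 decoding fails and shows the surrounding bytes in hex. The usage line feeds it a lone 0x98, the byte from the error message. That byte is a UTF-8 continuation byte and can never start a character.

```python
from typing import Optional


def describe_bad_utf8(raw: bytes, context: int = 8) -> Optional[str]:
    """Hypothetical debug helper: None if `raw` is valid UTF-8, else a one-line
    report of where decoding fails, with the surrounding bytes in hex."""
    try:
        raw.decode('utf-8')
        return None
    except UnicodeDecodeError as exc:
        lo = max(0, exc.start - context)
        hi = min(len(raw), exc.end + context)
        window = raw[lo:hi].hex(' ')  # space-separated hex (Python 3.8+)
        return (f'{exc.reason} at offset {exc.start} (0x{raw[exc.start]:02X}); '
                f'bytes {lo}-{hi - 1}: {window}')


print(describe_bad_utf8(b'ab\x98cd'))
```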



Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-21 Thread Karol Bujaček

On 07/21/2014 12:57 PM, Bernhard Bauch wrote:

Hey all,

on most of my website running on (latest catalyst: 5.90065) i always get
utf8 related errors.



Hello,

I have no answer to your question; I just want to add something related.
Several days ago I created an application with Catalyst, TT, FormFu and
PostgreSQL. Immediately after I put [% form %] into my template file, the
encoding broke. I spent several hours trying to fix it. Also, there is
Catalyst::Plugin::Unicode, which seems to be deprecated, and
Catalyst::Plugin::Unicode::Encoding. Maybe it is just because I'm
relatively new to Catalyst and related modules, but I think it is confusing.


It was necessary to 'use utf8', add 'ENCODING UTF-8' to the config file for
both Controller::HTML::FormFu and View::HTML, and set pg_enable_utf8 = 1 for
DBIC. Maybe it would be possible to change Catalyst so that the encoding is
configured only once and used for the whole site (templates, forms, etc.)?




Karol

___
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/


Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-21 Thread Roman Winfinit
Hello,

How are you running your application, i.e. mod_perl, fcgi, fcgi +
httpd/nginx, plack + ...? Also, what version of perl are you using, and on
what OS?

-roman
On Jul 21, 2014 6:58 AM, Bernhard Bauch ba...@zsi.at wrote:


Re: [Catalyst] UTF8 problems with plugin::encoding

2014-07-21 Thread Bernhard Bauch
more news..

the crawler/search engine that triggers these errors is
http://easou.com

this search engine delivers its pages not in UTF8 but in “gb2312”, a
Simplified Chinese encoding.
if i open the “wrong utf8” parameters from the faulty requests as “gb2312”,
some readable characters appear.
 this leads me to: catalyst does not handle requests with gb2312-encoded
 parameters (because they are not utf8), and the request does not declare
 that it is encoded in anything other than utf8.
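Bernhard's observation is easy to confirm with Python's built-in codecs (a minimal sketch). The same bytes are rejected as UTF-8 but decode cleanly as GB2312.

```python
sent = '深入 so what'.encode('gb2312')  # bytes as a GB2312-based client would send them

try:
    sent.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # roughly the failure the plugin reports

# Decoding with the charset the sender actually used gives readable text again
recovered = sent.decode('gb2312')
print(utf8_ok, recovered)
```

This only works because we know in advance what the spider uses. Nothing in the request says so, and that is the core of the problem.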

any ideas what to do?

bye, bernhard



On 21 Jul 2014, at 14:36, Roman Winfinit winfi...@gmail.com wrote:
