hey all, this python3 script triggers the error:
--------------------------------
import httplib2
import urllib.parse

somestr = '深入 so what'
encodedstr = somestr.encode('gb2312')

url = 'http://myappdomain.com/search'
body = { encodedstr:encodedstr }
headers = {
    'Content-type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh;q=0.9,en;q=0.8'
}

http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.parse.urlencode(body))
--------------------------------

now it's possible to reproduce the error :)

any ideas how to solve this?
ruby people did this by adding a utf8-sanitizer in the middleware..

bye, bernhard

On 21 Jul 2014, at 22:19, Bernhard Bauch <ba...@zsi.at> wrote:

> more news..
>
> the crawler/search engine that triggers these errors is
> http://easou.com
>
> this search engine delivers its pages not in UTF8 but in “gb2312”, which is
> “simplified chinese”.
> if i open the “wrong utf8” parameters from the faulty requests as “gb2312”,
> some readable characters appear.
>
> this leads me to: catalyst does not handle requests with gb2312 encoded
> parameters (because they are not utf8), and the request does not announce
> that it is encoded in anything other than utf8.
>
> any ideas what to do ?
>
> bye, bernhard
>
> On 21 Jul 2014, at 14:36, Roman Winfinit <winfi...@gmail.com> wrote:
>
>> Hello,
>>
>> How are you running your application? Ie: mod_perl, fcgi, fcgi +
>> httpd/nginx, plack + ... also what version of perl are you using and what os?
>>
>> -roman
>>
>> On Jul 21, 2014 6:58 AM, "Bernhard Bauch" <ba...@zsi.at> wrote:
>> Hey all,
>>
>> on most of my websites running on (latest catalyst: 5.90065) i always get
>> utf8 related errors.
>> they usually appear if a spider
>> Mozilla/5.0 (compatible; EasouSpider;
>> +http://www.easou.com/search/spider.html)
>> comes across.
>>
>> the error is:
>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to
>> Unicode at /usr/local/…./lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line
>> 167.
>>
>> It took me a while to get the actual parameters the spider sends because the
>> debug messages of catalyst do not tell that much :...
>>
>> -------------
>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim /usr/local/…./lib/perl5/Catalyst.pm +2016: *** Request 164 (0.032/s) [10682] [Wed Jul 16 15:08:47 2014] ***
>> [2014/07/16 15:08:47] [5.255.253.218] [DEBUG] vim /usr/local/…./lib/perl5/Catalyst.pm +2309: Response Code: 400; Content-Type: text/plain; charset=UTF-8; Content-Length: unknown
>> [2014/07/16 15:08:47] [5.255.253.218] [INFO] vim /usr/local/.../lib/perl5/Catalyst.pm +1880: Request took 0.006491s (154.059/s)
>> .------------------------------------------+-----------.
>> | Action                                   | Time      |
>> +------------------------------------------+-----------+
>> '------------------------------------------+-----------'
>> -------------
>>
>> i changed the Plugin::Unicode::Encoding plugin a bit to find out what the
>> client sends … the results are these:
>> UTF8 trash arrives, and the module seems unable to deal with it…
>>
>> ------------
>> Caught exception in engine "UTF8 Error: utf8 "\x98" does not map to Unicode
>> at /usr/local/…../lib/perl5/Catalyst/Plugin/Unicode/Encoding.pm line 170.
>> - >> >> URL: notice/list >> >> PARAMS:$VAR1 = { >> 'X*Ö^K^@^@^@^@¸®ä >> ^@^@^@^@8<83>^H^K^@^@^@^@h¡ä >> ^@^@^@^@Hµä >> ^@^@^@^@X^Z^N^Q^@^@^@^@ø<91>^F^Q^@^@^@^@Ø^F^N^Q^@^@^@^@¸<92>^F^Q^@^@^@^@(^K^N^Q^@^@^@^@<88>^B^N^Q^@^@^@^@¸úÝ^P^@^@^@^@^X%q^G^@^@^@^@اñ^O^@^@^@^@ØøB.^@^@^@^@èâÝ^P^@^@^@^@XÛ_^L^@^@^@^@ÈíÝ^P^@^@^@^@¸~P^S^@^@^@^@èåÝ^P^@^@^@^@Øný^O^@^@^@^@<88>úÝ^P^@^@^@^@^Xá( >> ^@^@^@^@Ø¦Æ >> ^@^@^@^@Øï*^Q^@^@^@^@^X' => >> '^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@ >> >> J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ >> <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@ >> <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ >> >> <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@' >> }; >> >> >> // value: $VAR1 = >> 
'^F^L^@^@^@^@<98>Ûø^O^@^@^@^@Ø~^A^N^@^@^@^@<98>=H>^@^@^@^@ø<99>ó^K^@^@^@^@hÔu^R^@^@^@^@¸<8e>ó^K^@^@^@^@^Xä_^L^@^@^@^@Ø<90>a^G^@^@^@^@hðÉ^O^@^@^@^@8ã*^G^@^@^@^@ØØý^M^@^@^@^@Xùë^F^@^@^@^@^HÜý^M^@^@^@^@8W6^H^@^@^@^@øÐý^M^@^@^@^@xÿÃ^K^@^@^@^@X]i^O^@^@^@^@8^Mÿ^H^@^@^@^@Xû<98>^Q^@^@^@^@x¦h^H^@^@^@^@Xý<98>^Q^@^@^@^@^X=5^H^@^@^@^@^X¦ú^K^@^@^@^@^XVQ^P^@^@^@^@^H^Yû^N^@^@^@^@x¤h^H^@^@^@^@^Xå<98>^Q^@^@^@^@ø¤h^H^@^@^@^@Xé<98>^Q^@^@^@^@X¼h^H^@^@^@^@Ø¡h^H^@^@^@^@øf<82>^Q^@^@^@^@^X>éH^@^@^@^@xv<82>^Q^@^@^@^@X6éH^@^@^@^@xl<82>^Q^@^@^@^@83Ì^G^@^@^@^@Xl<82>^Q^@^@^@^@¸Ñý^M^@^@^@^@xr<82>^Q^@^@^@^@H[^H^Q^@^@^@^@^X|<82>^Q^@^@^@^@¸Ë¢^K^@^@^@^@¸u<82>^Q^@^@^@^@<98>Á¢^K^@^@^@^@Øp<82>^Q^@^@^@^@8Í¢^K^@^@^@^@Øl<82>^Q^@^@^@^@XË¢^K^@^@^@^@Xq<82>^Q^@^@^@^@^Xi^W^H^@^@^@^@Xc<82>^Q^@^@^@^@¸Å¢^K^@^@^@^@8h<82>^Q^@^@^@^@<98>Т^K^@^@^@^@¨fÐ^Q^@^@^@^@ØÉ=^R^@^@^@^@ÀC<95>^M^@^@^@^@°S<95>^M^@^@^@^@^PI<95>^M^@^@^@^@À\\<95>^M^@^@^@^@ðE<95>^M^@^@^@^@<80>B<95>^M^@^@^@^@@P<95>^M^@^@^@^@<80>Q<95>^M^@^@^@^@ >> >> J<95>^M^@^@^@^@p\\<95>^M^@^@^@^@àU<95>^M^@^@^@^@àF<95>^M^@^@^@^@àA<95>^M^@^@^@^@^@<9e>ô^P^@^@^@^@°<9d>ô^P^@^@^@^@0<91>ô^P^@^@^@^@ >> <9e>ô^P^@^@^@^@^P<8e>ô^P^@^@^@^@ <88>ô^P^@^@^@^@Ð<82>ô^P^@^@^@^@ >> <8d>ô^P^@^@^@^@<90><95>ô^P^@^@^@^@à<90>ô^P^@^@^@^@@<95>ô^P^@^@^@^@P<8f>ô^P^@^@^@^@<90><81>ô^P^@^@^@^@ >> >> <97>ô^P^@^@^@^@Ð<8c>ô^P^@^@^@^@p<88>ô^P^@^@^@^@P<99>ô^P^@^@^@^@<90><90>ô^P^@^@^@^@@<9a>ô^P^@^@^@^@0<9b>ô^P^@^@^@'; >> >> >> headers: Connection: close >> Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, >> image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1 >> Accept-Encoding: gzip, deflate >> Accept-Language: zh;q=0.9,en;q=0.8 >> Host: wbc-inco.net >> User-Agent: Mozilla/5.0 (compatible; EasouSpider; >> +http://www.easou.com/search/spider.html) >> Content-Length: 927 >> Content-Type: application/x-www-form-urlencoded >> REFER: http://b------.net/“ >> >> ———————————— >> >> to understand the logging above: this is what i added /changed in the >> Catalyst::Plugin::Unicode::Encoding 
>>
>> --------------------
>> around line 168:
>>
>> my $val;
>> eval {
>>     $val = Encode::is_utf8( $value ) ? $value : $enc->decode( $value, $CHECK );
>> };
>> if ($@){
>>     # UPS !
>>     # get request infos
>>     use Data::Dumper;
>>     my $params = $self->req->parameters;
>>     my $headers= $self->req->headers->as_string;
>>     die "UTF8 Error: $@ - \n\nURL: " . $self->req->path . "\n\nPARAMS:" .
>>         Dumper( $params ) . "\n\n // value: " . Dumper($value) . "\n\nheaders: " .
>>         $headers;
>>     ….
>> --------------------
>>
>> I guess my Catalyst Apps are not the only ones with these errors ?
>>
>> about my App settings / config:
>>
>> app-config has
>> encoding UTF-8
>>
>> App.pm does not load Unicode::Encoding anymore (since this is not needed when
>> using latest Catalyst: 5.90065)
>>
>> i am using postgres with
>> pg_enable_utf8 1
>> (but the error above is far away from any DB related problem i guess)
>>
>> using Catalyst::Plugin::Unicode::Encoding version 2.1 (coming with catalyst)
>>
>> i just checked out the tracker for catalyst on cpan, there is a UTF8 issue
>> ticket
>> https://rt.cpan.org/Public/Bug/Display.html?id=94957
>> but it does not look as if it was this problem ...
>>
>> Any ideas what to do ?
>> Add an issue/ticket ?
>>
>> thanks for feedback,
>> bernhard bauch
>>
>> --
>> Bernhard Bauch, Webdevelopment
>> ZSI - Zentrum für soziale Innovation
>> ba...@zsi.at
>> Skype: berni-zsi
>>
>> _______________________________________________
>> List: Catalyst@lists.scsys.co.uk
>> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
>> Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
>> Dev site: http://dev.catalyst.perl.org/
--
Bernhard Bauch, Webdevelopment
ZSI - Zentrum für soziale Innovation
ba...@zsi.at
Skype: berni-zsi
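A minimal sketch of what seems to go wrong with the request from the python3 script at the top, and of the kind of check a "utf8-sanitizer" middleware might do before the framework decodes the parameters. This is not Catalyst code: the helper names `is_valid_utf8` and `form_body_is_utf8` are made up for illustration, and Python is used only because the repro script is Python. The Ruby middleware mentioned above and the Catalyst plugin may well work differently.

```python
from urllib.parse import urlencode, unquote_to_bytes


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def form_body_is_utf8(body: str) -> bool:
    """Hypothetical sanitizer check for an
    application/x-www-form-urlencoded body: percent-decode every
    key and value to raw bytes and verify each one is valid UTF-8."""
    for pair in body.split("&"):
        for part in pair.split("=", 1):
            raw = unquote_to_bytes(part.replace("+", " "))
            if not is_valid_utf8(raw):
                return False
    return True


# Same construction as the repro script: gb2312 bytes as key and value.
encodedstr = "深入 so what".encode("gb2312")
gb2312_body = urlencode({encodedstr: encodedstr})

# The same text sent as UTF-8, for comparison ("q" is an arbitrary key).
utf8_body = urlencode({"q": "深入 so what"})

print(form_body_is_utf8(gb2312_body))
print(form_body_is_utf8(utf8_body))
```

The gb2312 body should be reported as not valid UTF-8 and the UTF-8 body as valid. A middleware built on such a check could answer with a 400 (or replace the offending bytes) instead of letting the strict decode die deep inside the request handling; which of the two is preferable is a design choice this sketch does not settle.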