Re: [Catalyst] Does uri_for() URL-escape arguments correctly ?

2012-12-09 Thread Marc SCHAEFER
On Tue, Dec 04, 2012 at 03:37:31PM -0800, Bill Moseley wrote:
> I've always used href="[% c.uri_for( ... ) | html  %]"

I can see a few issues when using ?a=b&c=d type of URL parameters[1], but
this is not usually what you do with Catalyst, so let's set this aside
for the moment. Your suggestion is XHTML compatible, which is good, but
maybe not enough.

My problem is quite simple, let's use your approach (which will avoid
XHTML warnings).

My template:

[% v = a | url %]
test 1: [% a | html %]
test 2: [% a | html %]

The 'a' stash variable is set as:

sub toto :Global {
   my ($self, $c) = @_;

   $c->stash('a' => "a=b&c%34 '"); # ends with space, apostrophe
}

The result I get (stray HTML excerpt -- your mail client might
corrupt this):

test 1: http://192.168.99.121:3001/directory/a=b&c%2534%20'/object">a=b&c%34
 '

test 2: http://192.168.99.121:3001/directory/a=b&c%34%20'/object">a=b&c%34
 '

From above:

   - only doing | url manually and then | html encodes the % character
 correctly; just doing | html after uri_for() is not enough. Oddly,
 the space character *is* encoded correctly by uri_for(), but the %
 character is not.

   - none of uri_for(), url or html does anything for the apostrophe, which
 was already mentioned on the mailing list -- I never use ' as an
 HTML quote anyway.
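
To illustrate the first point, here is a minimal sketch in plain Perl of
what a strict escaper would do with this string. The helper name
escape_segment is hypothetical (it is not Catalyst or Template Toolkit
code), and it assumes a byte string such as ISO-8859-1 data:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Catalyst or TT: percent-encode every
# byte that is not an RFC 3986 "unreserved" character. Assumes a byte
# string (every character below 256), e.g. ISO-8859-1 data.
sub escape_segment {
    my ($segment) = @_;
    $segment =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $segment;
}

my $value = "a=b&c%34 '";              # same value as in the stash
print escape_segment($value), "\n";    # prints a%3Db%26c%2534%20%27
```

The important part is that % itself becomes %25. If it is left alone,
the receiving side decodes the literal "%34" as the character "4" and
the original value is lost.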

Can you reproduce this with your version of Catalyst ?  Maybe mine has
a specific bug and I should upgrade. This is not a security problem, it's
more a data passthrough issue.

PS: Aristotle Pagaltzis's idea of uri_for() could be a work-around for the
non-encoding of some of the dangerous characters such as %, however
a more general solution would be handy, e.g. fixing uri_for() ?

[1] specifically if you want a to be b&c=d. Should be either encoded as
?a=b%26c=d or double-encoded as ?a=b&c=d. I would prefer
the former.
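
A small sketch of the first option, in plain Perl. The helper
build_query is hypothetical and only meant to show the idea; it assumes
byte strings and takes the separator as its first argument:

```perl
use strict;
use warnings;

# Hypothetical helper: build a query string from name/value pairs,
# percent-encoding both sides so that "&" and "=" inside a value
# cannot be mistaken for separators. Assumes byte strings.
sub build_query {
    my ($separator, @pairs) = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        for ($name, $value) {
            s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
        }
        push @parts, "$name=$value";
    }
    return join $separator, @parts;
}

print build_query('&',     a => 'b&c=d', x => '1'), "\n";  # a=b%26c%3Dd&x=1
print build_query('&amp;', a => 'b&c=d', x => '1'), "\n";  # a=b%26c%3Dd&amp;x=1
```

This sketch also escapes the "=" inside the value (%3D), which is
stricter than strictly needed but decodes to the same value. The second
call shows the separator written as it would have to appear inside an
HTML attribute.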


___
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/


[Catalyst] Does uri_for() URL-escape arguments correctly ?

2012-12-04 Thread Marc SCHAEFER
Hi,

for some time I have been writing things like this in my templates:



where file is something which can contain a lot of dangerous characters.

I assumed (and after experimenting a bit it seemed to be the case) that
it would escape spaces, quotes, slashes, etc. using the %XX URL-escapes.
It seems to do it, even for / e.g.

However, it does not escape the % character itself. Yes, I do have filenames
with % in them :)

The url filter in the Template Toolkit does, so the following work-around
works (because already %-encoded sequences are untouched by uri_for())

   [% file = path _ video | url %]

Am I mistaken to think that c.uri_for(x, y) automatically escapes y
as required ?

I might also have a question regarding the priority of operations in
path _ video | url. In my case it works, because path contains no %,
only slashes.




[Catalyst] Issues with HTML::FormFu and ISO-8859-1

2012-01-15 Thread Marc SCHAEFER
Hi,

I am trying to port an old Catalyst application I wrote in 2009 to the
current framework.  Apart from quite a few obsolete methods I needed
to rewrite, I stumbled on a very bizarre problem which seems to prevent
loading any of my templates that use HTML::FormFu.

Basically, I always run in an ISO-8859-1 environment, for various
KISS reasons (the number of bugs you see in the UNICODE stuff is
really impressive, and I don't need that additional complexity).

Thus, my application is completely and only ISO-8859-1.
However I get an error:

Caught exception in livres::Controller::livres->del "Error parsing 
/home/catalyst/catalyst-deploy/root/forms/livres/del.yml: YAML::XS::Load Error: 
The problem:

Invalid trailing UTF-8 octet

was found at document: 0
 at /usr/local/share/perl/5.10.1/HTML/FormFu/ObjectUtil.pm line 151"

From the manual of YAML::XS it seems only UNICODE is supported.
Is this a dead end ?

For the time being I have recoded all the templates using HTML entities
(good old ASCII always works) with recode. I however would really like
to get back my "clean ISO-8859-1" path, if it is still possible.
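
For what it is worth, the error can be reproduced without YAML::XS. The
following is only a sketch using the core Encode module on a made-up
sample line; it shows why ISO-8859-1 bytes are rejected as UTF-8, and
what the same text looks like once re-encoded. Whether feeding
UTF-8 form files to FormFu fits the rest of an ISO-8859-1 application
is not something this sketch verifies:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up sample: one line of a form config, stored as ISO-8859-1 bytes.
my $latin1_bytes = "label: \xE9diteur\n";

# Strict UTF-8 decoding rejects it: 0xE9 announces a multi-byte sequence,
# but the next byte ("d") is not a continuation byte.
my $copy = $latin1_bytes;    # FB_CROAK may modify its argument
my $is_valid_utf8 =
    eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;

# Re-encoding the same text as UTF-8 gives bytes a UTF-8-only parser accepts.
my $utf8_bytes = encode('UTF-8', decode('iso-8859-1', $latin1_bytes));

printf "valid as UTF-8: %d, latin1 bytes: %d, utf8 bytes: %d\n",
    $is_valid_utf8, length($latin1_bytes), length($utf8_bytes);
```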




Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Marc SCHAEFER
On Mon, Nov 23, 2009 at 05:43:25PM +0100, Aristotle Pagaltzis wrote:
> If you use the `html` filter instead of `html_entity`, it will
> escape only the five characters that have to be.

Thank you. It works like a charm.
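
For readers who want to see the difference outside of a template, here
is a rough Perl stand-in for such a minimal escaper. It is not the
Template Toolkit implementation, and the name escape_html_minimal is
made up; it only replaces the five markup-significant characters and
leaves ISO-8859-1 letters alone:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a minimal HTML escaper (not TT's own code):
# replace only the markup-significant characters, leave everything else,
# including ISO-8859-1 letters such as \xE9, untouched.
my %entity_for = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_html_minimal {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$entity_for{$1}/g;
    return $text;
}

print escape_html_minimal(qq{<b>caf\xE9 & "th\xE9"</b>}), "\n";
```

Note that this sketch also escapes the apostrophe, which is stricter
than what the 2012 message above observed for the html filter.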

> I had an IRC convo with Tomas Doran last night and explained the
> problem to him. He knocked out some tests for the broken

Thank you for your time!  It's nice to see the responsiveness of the
project.




Re: [Catalyst] Avoiding UTF8 in Catalyst

2009-11-23 Thread Marc SCHAEFER
On Mon, Nov 23, 2009 at 07:42:06AM -0800, Bill Moseley wrote:
> Still not following.   You are talking about Catalyst::View::TT?

It appears that the latin1 -> HTML entities conversion is done by
the html_entity filter in my View::TT templates, e.g.:

[% FOREACH h IN cols %][% b.$h | html_entity %][% END %]

This is perfectly OK, even if not strictly required.  I thought it was
something else doing that, but it isn't.

> BTW -- when looking at C::V::TT I see where you got that DEFAULT_ENCODING
> from -- it's documented in C::V::TT.

The simple fact that html_entity above changes é (iso-8859-1) into &eacute;
means that something must have understood I am using iso-8859-1, which
is good. But you seem to be right:

> As far as I know there's no such setting in Template Toolkit.  There's
> "ENCODING" to specify the encoding of your templates.

I am using:

   package MyApp::View::TT;

   use strict;
   use base 'Catalyst::View::TT';

   __PACKAGE__->config(TEMPLATE_EXTENSION => '.tt',
                       FILTERS            => { 'latex' => \&latex },
                       DEFAULT_ENCODING   => 'iso-8859-1',
                       WRAPPER            => 'wrapper.tt');

You are however right that removing the DEFAULT_ENCODING above
doesn't change anything. Replacing it by ENCODING => 'utf-8'
creates a charset conversion bug (which is expected). Replacing with
ENCODING => 'iso-8859-1' doesn't change anything. So I can safely
assume that as usually expected, iso-8859-1 is the default.  I now
removed this specification altogether.

> If your templates are 8859-1 with 8 bit characters my suggestion would be to
> convert them to utf-8 and set ENCODING to utf8 for the templates, and move
> toward utf8 everywhere.  Make sure you use the plugin to decode and
> encode.

Again, utf8 is out of the question here: be it in the source file, the
database, or the output. UTF-8 is unacceptable in our environment.

My problem (Catalyst sending iso-8859-1 data to the browser, but with
a wrong Content-Length:, as if it counted the bytes of the UTF-8
equivalent, or of the Perl Unicode upgraded string mentioned in a separate
mail by Aristotle Pagaltzis) was solved by adding the following to MyApp.pm:

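# Runs before Catalyst computes the response headers.
before 'finalize_headers' => sub {
    my $c = shift;

    if ($c->response) {
        my $s = $c->response->body;
        # Force one byte per character internally (iso-8859-1 data).
        utf8::downgrade($s);
        $c->response->body($s);
    }
};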

There is still apparently something wrong: there is absolutely no reason
why a Perl Unicode string should be used, but I was unable to determine
why it was created (upgraded) in the first place.

The fact is that counting bytes from the Perl Unicode upgraded string is
wrong when using ISO-8859-1.

Maybe Catalyst dropped support for non-UTF-8 charsets. By doing that
it apparently also dropped support for any charset whose byte size
differs from the Perl Unicode upgraded string internal format.

But I am no expert on this.
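
The size difference itself is easy to show in isolation. This is only a
sketch of the Perl behaviour; that Catalyst counted the internal bytes
in this way at the time is an assumption, not something verified here:

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

my $body     = "caf\xE9";       # 4 characters, all below 256
my $upgraded = $body;
utf8::upgrade($upgraded);       # same characters, UTF-8 internal storage

printf "characters: %d, internal bytes: %d\n",
    length($upgraded), bytes::length($upgraded);

utf8::downgrade($upgraded);     # back to one byte per character
printf "after downgrade: %d internal bytes\n", bytes::length($upgraded);
```

The upgraded string still compares equal to the original, but its
internal representation needs one extra byte for the é, which matches
the kind of offset seen in the Content-Length: header.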





[Catalyst] Avoiding UTF8 in Catalyst

2009-11-21 Thread Marc SCHAEFER
Hi,

my goal: no UTF8, in short:

   - all the perl code, all the data files, all the template files and
 the UNIX locale are all in ISO-8859-1

   - the HTML result should be in ISO-8859-1
 (Content-Type: text/html; charset=iso-8859-1)

   - the Content-Length: should be correct.

First, I modified lib/MyApp/View/TT.pm as follows:

   __PACKAGE__->config(TEMPLATE_EXTENSION => '.tt',
                       DEFAULT_ENCODING   => 'ISO-8859-1',
                       WRAPPER            => 'wrapper.tt');

Apparently all diacritic characters are expanded into HTML entities.
Which is functional, but not optimal.  However, with FormFu, this
unnecessary expansion doesn't happen, which is fine. 

I got the following result:

   - the HTML data is in ISO-8859-1 (or as HTML entities, which is
 acceptable as a work-around) as wanted
   - however the HTTP header charset is UTF8

After looking at line 45 of
   /usr/local/share/perl/5.8.8/Catalyst/Action/RenderView.pm
it looks like the utf-8 charset HTTP header is hardcoded. I have thus modified
my lib/MyApp/Controller/Root.pm to do the following in
end : ActionClass('RenderView'):

   $c->response->content_type('text/html; charset=iso-8859-1');

With this, I got the following result:

   - the HTML data is in ISO-8859-1 as wanted (no change, logical)
   - the HTTP header charset is now the correct iso-8859-1
   - however, the Content-Length: sent is wrong.

After investigating, the Content-Length: is one off per non 7-bit
character. It is as if the standard iso-8859-1 byte stream was sent as
is, but was internally converted to UTF-8 just to generate
a wrong byte count. Very strange.  Normally that process should really
output something wrong or generate an error in the conversion. It
doesn't.
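
The arithmetic can be checked on a made-up sample with the core Encode
module. This is only a sketch of the byte counts, not of what Catalyst
does internally:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $page = "r\xE9sum\xE9 d\xE9j\xE0 vu";    # made-up sample text

# Count the characters outside 7-bit ASCII, then compare both encodings.
my $high_chars   = () = $page =~ /[\x80-\xFF]/g;
my $latin1_bytes = length encode('iso-8859-1', $page);
my $utf8_bytes   = length encode('UTF-8', $page);

printf "non 7-bit characters: %d\n", $high_chars;
printf "iso-8859-1: %d bytes, UTF-8: %d bytes, difference: %d\n",
    $latin1_bytes, $utf8_bytes, $utf8_bytes - $latin1_bytes;
```

For ISO-8859-1 text, every character above 0x7F takes two bytes in
UTF-8, so the difference is exactly the number of such characters.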

My questions:

   - is there a better way to use the standard charset than to do all
 of the above hacks ?

   - if not, how can I work around the content length in
  end : ActionClass('RenderView') ?  Unfortunately, it looks like
 $c->response->body is undefined at this point, and that
 $c->finalize_body() doesn't do anything useful.

Version info:
 Catalyst 5.80007 and 5.80013

PS: I wouldn't have noticed the Content-Length: issue if I hadn't used a
reverse proxy.  With that reverse proxy, and the standalone Catalyst
server, you get 5-10 seconds hangs if the Content-Length is too big,
which is what happens with this strange UTF8 behaviour. Without it,
the size is wrong (as seen by wireshark != PageInfo Firefox), but
the WWW client seems to compensate.

PS/2: the http://www.catb.org/~esr/faqs/smart-questions.html URL doesn't
  work currently, so maybe my question is unsmart.
