On Thu, Jun 4, 2009 at 3:39 PM, Gisle Aas <gi...@aas.no> wrote:

> On Thu, Jun 4, 2009 at 11:46, Alex Kapranoff <k...@nadoby.ru> wrote:
> > Oh, I see. I suppose this can break old scripts that expect 8 bit
> > characters without UTF-8 flag come through unaltered.
>
> They would then have to set $form->accept_charset("latin-1") as a
> workaround.  I think that's acceptable.  I would hate to make this the
> default for backwards compatibility.


This will also affect people who use koi8-r, cp1251, or any of the hundreds
of 8-bit encodings that were in wide use :)
I think this point should at least be documented in POD and Changes, then.

Something like this:
HTML::Form now always encodes the data from its inputs into the destination
encoding when generating HTTP::Request objects. To specify the destination
encoding, use the accept_charset() method, or the standard "accept-charset"
attribute of the <form> tag if you create the HTML::Form instance with the
parse() constructor. The destination encoding defaults to UTF-8 (imitating
modern browsers). If you want your 8-bit data to come through unchanged, you
have two choices: 1) decode it from $your_charset into Perl's internal
Unicode representation using Encode::decode() before feeding it into
HTML::Form, and then call accept_charset($your_charset), or 2) call
accept_charset("latin1"). The latter method is not recommended unless you
really use latin1.
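
To make the two choices concrete, here is a rough sketch that uses only
Encode. The encode_for_form() helper is hypothetical: it stands in for the
encoding step described above and is not HTML::Form's actual code. The
koi8-r sample data is also just an assumption for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: stands in for the encoding step described above.
# It is not HTML::Form's actual code, just Encode::encode() on one value.
sub encode_for_form {
    my ($value, $charset) = @_;
    return encode($charset, $value);
}

# Six koi8-r bytes with no UTF-8 flag, as an old script would hold them
# (built here from the code points of the Russian word "privet").
my $koi8_bytes = encode("koi8-r", "\x{43F}\x{440}\x{438}\x{432}\x{435}\x{442}");

# Default destination (UTF-8): each high byte is treated as a Latin-1
# character and becomes two bytes, so the data is altered.
my $as_utf8 = encode_for_form($koi8_bytes, "UTF-8");

# Choice 1: decode to Unicode first, then name the real charset.
my $choice1 = encode_for_form(decode("koi8-r", $koi8_bytes), "koi8-r");

# Choice 2: claim latin1, so every byte maps back to itself.
my $choice2 = encode_for_form($koi8_bytes, "latin1");

printf "utf-8: %d bytes, choice 1 intact: %s, choice 2 intact: %s\n",
    length($as_utf8),
    ($choice1 eq $koi8_bytes ? "yes" : "no"),
    ($choice2 eq $koi8_bytes ? "yes" : "no");
```

Under these assumptions the UTF-8 result is 12 bytes instead of 6, while
both choices give back the original bytes. Choice 2 only works because the
string is never interpreted as text, which is why it is the less honest of
the two.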

> We should also find a way to propagate the original charset of the
> HTML document that's parsed.  This should then be the default
> accept_charset(), what you get when the attribute is still 'UNKNOWN'.
> For the HTML::Form->parse($response) case this should happen
> automatically.


Yes, totally! That was my original intention behind the patch :) We do
exactly that in our WWW::Mechanize scripts now: we manually set
accept_charset() on forms from the document charset. It would be awesome to
have some automatic propagation.

> --Gisle
