Re: [Israel.pm] Regular expression for special html characters

Mikhael Goikhman Thu, 03 Feb 2011 16:12:24 -0800

On 03 Feb 2011 12:46:04 +0200, sawyer x wrote:
> 
> use strict;
> use warnings;
> use utf8;
> use HTML::Entities;
> use Encode 'decode';
> 
> my $string = "&#8220; test &#8221; ניסיון ";
> my $converted = decode_entities( decode( 'UTF-8', $string ) );


Just a minor clarification about this code. I think Sawyer made a
copy-paste error from his previous example. :)

I.e. use either "use utf8;" or decode('UTF-8', ...), but not both:
under "use utf8;" the literal is already a character string, so there
are no bytes left to decode.

I would also recommend decode_utf8 instead: it makes the code slightly
shorter and slightly more tolerant of input errors (the 'UTF-8'
encoding is stricter in perl than 'utf8' and croaks more).
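
The difference between the two encoding names can be checked directly.
This is a minimal sketch, assuming a reasonably recent Encode; the
helper name decodes_ok and the sample byte strings are made up for
illustration:

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK);

# 'UTF-8' and 'utf8' resolve to two different encoding objects.
my $strict_name = find_encoding('UTF-8')->name;   # the strict variant
my $lax_name    = find_encoding('utf8')->name;    # perl's lax variant

# Hypothetical helper: returns 1 if $bytes decodes without croaking.
sub decodes_ok {
    my ($encoding, $bytes) = @_;
    my $copy = $bytes;    # with a CHECK argument decode() may modify its input
    return eval { decode($encoding, $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $valid     = "\xD7\xA0";        # U+05E0, HEBREW LETTER NUN
my $surrogate = "\xED\xA0\x80";    # would be U+D800, not a legal character

print "names: $strict_name vs $lax_name\n";
printf "valid input, strict:     %d\n", decodes_ok('UTF-8', $valid);
printf "surrogate input, strict: %d\n", decodes_ok('UTF-8', $surrogate);
printf "surrogate input, lax:    %d\n", decodes_ok('utf8',  $surrogate);
```

Note that the croak only happens when a CHECK argument such as
FB_CROAK is passed. Without one, both variants replace malformed input
with U+FFFD instead of dying.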

The complete alternatives then are:

  use HTML::Entities;
  use utf8;
  print decode_entities("&#8220; test &#8221; ניסיון\n");

or:

  use HTML::Entities;
  use Encode;
  print decode_entities(decode_utf8("&#8220; test &#8221; ניסיון\n"));
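
To see that the two alternatives agree, and why mixing them fails,
here is a small sketch. It assumes HTML::Entities is installed and
writes the Hebrew word with escapes, so the example does not depend on
how the file itself is saved; the variable names are illustrative only.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use Encode qw(decode decode_utf8);

# What a "use utf8;" literal holds: characters (U+05E0, U+05D9, ...).
my $chars = "&#8220; test &#8221; \x{5E0}\x{5D9}\x{5E1}\x{5D9}\x{5D5}\x{5DF}";

# What the same literal holds without "use utf8;": raw UTF-8 bytes.
my $bytes = "&#8220; test &#8221; \xD7\xA0\xD7\x99\xD7\xA1\xD7\x99\xD7\x95\xD7\x9F";

my $from_chars = decode_entities($chars);                # first alternative
my $from_bytes = decode_entities(decode_utf8($bytes));   # second alternative

print "same result\n" if $from_chars eq $from_bytes;

# Mixing both: decoding a string that is already characters croaks.
my $mixed = eval { decode('UTF-8', $chars) };
print "mixing failed: $@" if $@;
```

The failure in the last step is typically reported as "Cannot decode
string with wide characters", though the exact wording may depend on
the Encode version.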

> And one more example. If you're going to print this string to a terminal,
> you need to make sure that STDOUT (which is what you're actually printing
> to) is in binmode, so it can show Hebrew properly:
> 
> binmode STDOUT, ':utf8';

An alternative to this binmode line (and to "use utf8;") can be:

  use encoding 'utf8', STDOUT => 'utf8';
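
What such an output layer does can be observed without a terminal by
printing to an in-memory filehandle. This is a sketch only; the handle
and buffer names are illustrative. Note also that the encoding pragma
has been deprecated since perl 5.18, so on current perls the binmode
form is preferable.

```perl
use strict;
use warnings;

# Open a write handle onto a scalar instead of the terminal.
open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";

# Same layer as in "binmode STDOUT, ':utf8';" (':encoding(UTF-8)' is the
# stricter spelling of the same idea).
binmode $fh, ':utf8';

# U+201C and U+201D are the characters behind &#8220; and &#8221;.
print {$fh} "\x{201C} test \x{201D}\n";
close $fh;

# The buffer now holds the encoded bytes that a terminal would receive.
printf "%d bytes written\n", length $buffer;
```

Without the binmode line, perl would emit a "Wide character in print"
warning for the same print statement.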

Regards,
Mikhael.

-- 
perl -e 'print+chr(64+hex)for+split//,d9b815c07f9b8d1e'
