Hi Meir.

So, what do you want me to write in this chapter?

I'm trying to explain a bit about Unicode and encodings, rather than 
discuss specific modules.


About the module itself, I agree with you. Especially when the 
developers are native English speakers, the need to support other 
encodings does not come naturally to them.

Did you open a bug for this module about this issue? I think you should.


Well, at least in that case there *is* a workaround. Sometimes there 
isn't, and you need to hack the module to make it do the right thing...


Shmuel.


On 2011/01/17 14:59, Meir Guttman wrote:

> Dear Shmuel,
>
> My comment here relates to your Chapter 9, "Hebrew in Perl".
>
> Unicode text will often force us to avoid many shortcuts. As an 
> example, I am attaching a (slightly modified) version of an actual 
> Perl snippet I am using to parse HTML files with the 
> HTML::TreeBuilder module. This immensely helpful module has a very 
> nice shortcut that can build a parse tree directly, given a file 
> name. Unfortunately, this often breaks down if the file is not 
> UTF-8 encoded. So here one must use the long route: first create a 
> "new" /empty/ parse tree; then "open" the HTML file to be parsed 
> using the three-argument form of "open", specifying the file's 
> encoding; and finally, use this file's /handle/ (rather than the 
> file's /name/) to populate the tree.
>
> # The HTML::TreeBuilder->new_from_file(...) method cannot be used
> # here because the site's HTML files are not UTF-8 encoded. Rather,
> # we first create a "new" empty tree object "$root". We then "open"
> # the file, thus giving us the opportunity to specify its encoding.
> # This root is then populated with the file's contents using its
> # handle rather than name in the "parse_file" method.
> use HTML::TreeBuilder;
>
> my $root = HTML::TreeBuilder->new();
> open(my $fh, "<:encoding(windows-1255)", $filename)
>     or die "Cannot open $filename: $!";
> $root->parse_file($fh);
> close $fh;
>
> I found this long route to be required in many other modules too. I 
> would therefore suggest that module /writers/ be aware of this and 
> always provide a way to support files having all kinds of text-encoding.
>
> This is my 2¢ contribution…
>
> Meir
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Shmuel Fomberg
> Sent: Sunday, January 16, 2011 11:25 PM
> To: Perl in Israel
> Subject: [Israel.pm] My Perl-Hebrew-Tutorial
>
> Hi All.
>
> I have some time on my hands, so I'm trying to improve my
> Perl-Hebrew-Tutorial a bit.
>
> http://code.semuel.co.il/perlhebtut/index.html
>
> I have now improved chapter 6, about handling files. What do you think?
>
> Also, are there any other points in the tutorial that need updating?
>
> Shmuel.
>
> _______________________________________________
>
> Perl mailing list
>
> [email protected]
>
> http://mail.perl.org.il/mailman/listinfo/perl
