Hi Meir.
So, what do you want me to write in this chapter? I'm trying to explain a bit about Unicode and encodings, not to discuss specific modules.

About the module itself, I agree with you. Especially when the developers are native English speakers, the need to support other encodings does not come naturally to them. Did you open a bug for this module about this issue? I think you should.

Well, at least in that case there *is* a workaround. Sometimes there isn't, and you need to hack the module to make it do the right thing...

Shmuel.

On 2011/01/17 14:59, Meir Guttman wrote:
> Dear Shmuel,
>
> My comment here relates to your Chapter 9, "Hebrew in Perl".
>
> Unicode text will often force us to avoid many shortcuts. As an
> example, I am attaching a (slightly modified) version of an actual
> Perl snippet I am using to parse HTML files with the LWP module. This
> immensely helpful module has a very nice shortcut that can build a
> parse tree directly, given a file name. Unfortunately, this often
> breaks down if the file is not Unicode encoded. So here one must use
> the long route: first create a "new" /empty/ parse tree; then "open"
> the HTML file to be parsed using the three-argument version of "open",
> specifying the file's encoding; and finally, use this file's /handle/
> (rather than the file's /name/) to populate the tree.
>
> # The HTML::TreeBuilder->new_from_file(...) method cannot be used
> # here because the site's HTML files are not UTF-8 encoded. Rather,
> # we first create a "new" empty tree object "$root". We then "open"
> # the file, thus giving us the opportunity to specify its encoding.
> # This root is then populated with the file's contents using its
> # handle rather than its name in the "parse_file" method.
> use LWP;
> use HTML::TreeBuilder;
> my $root = HTML::TreeBuilder->new();
> open(my $fh, "<:encoding(windows-1255)", $filename) or die "$!";
> $root->parse_file($fh);
> close $fh;
>
> I found this long route to be required in many other modules too. I
> would therefore suggest that module /writers/ be aware of this and
> always provide a way to support files in all kinds of text encodings.
>
> This is my 2¢ contribution…
>
> Meir
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Shmuel Fomberg
> Sent: Sunday, January 16, 2011 11:25 PM
> To: Perl in Israel
> Subject: [Israel.pm] My Perl-Hebrew-Tutorial
>
> Hi All.
>
> I have some time on my hands, so I'm trying to improve my
> Perl-Hebrew-Tutorial a bit.
>
> http://code.semuel.co.il/perlhebtut/index.html
>
> I have now improved chapter 6, about handling files. What do you think?
>
> Also, are there any more points in the tutorial that need updating?
>
> Shmuel.
>
> _______________________________________________
> Perl mailing list
> [email protected]
> http://mail.perl.org.il/mailman/listinfo/perl
