Dear Shmuel,

 

My comment here relates to your Chapter 9, "Hebrew in Perl".

 

Unicode text often forces us to give up convenient shortcuts. As an example,
I am attaching a slightly modified version of an actual Perl snippet I use
to parse HTML files with the HTML::TreeBuilder module. This immensely
helpful module has a very nice shortcut that builds a parse tree directly
from a file name. Unfortunately, this often breaks down if the file is not
UTF-8 encoded. So here one must take the long route: first create a new,
empty parse tree; then open the HTML file to be parsed using the
three-argument form of "open", specifying the file's encoding; and finally,
use this file's handle (rather than the file's name) to populate the tree.

 


# HTML::TreeBuilder->new_from_file(...) cannot be used here because
# the site's HTML files are not UTF-8 encoded. Instead, we first
# create a new, empty tree object, $root. We then open the file
# ourselves, which gives us the opportunity to specify its encoding.
# Finally, the tree is populated from the file's handle (rather than
# its name) with the parse_file method.

use HTML::TreeBuilder;

my $root = HTML::TreeBuilder->new();

open(my $fh, "<:encoding(windows-1255)", $filename)
    or die "Cannot open $filename: $!";

$root->parse_file($fh);

close $fh;

 

I have found this long route to be necessary with many other modules too. I
would therefore suggest that module writers be aware of this and always
provide a way to handle files in any text encoding.

 

This is my 2¢ contribution…

Meir

 

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf
Of Shmuel Fomberg
Sent: Sunday, January 16, 2011 11:25 PM
To: Perl in Israel
Subject: [Israel.pm] My Perl-Hebrew-Tutorial

 

Hi All.

 

 

I have some time on my hands, so I'm trying to improve my
Perl-Hebrew-Tutorial a bit.

 

http://code.semuel.co.il/perlhebtut/index.html

 

I have now improved chapter 6, about handling files. What do you think?

 

Also, any more points in the tutorial that need some updating?

 

 

Shmuel.

 

_______________________________________________

Perl mailing list

[email protected]

http://mail.perl.org.il/mailman/listinfo/perl
