Hi Fabrizio, see below for my response.
On Fri, 6 Apr 2012 03:30:46 -0700 (PDT)
Fabrizio Di Carlo <dicarlo.fabri...@gmail.com> wrote:

> Hello to all,
>
> I'm very newbie of Perl but every I'm understanding how is powerful this
> language, but I have a problem:
>
> I'm using Perl with Selenium for scraping data (for a job) the code looks
> like this
>
> [code]
> use strict;
> use warnings;
> use Time::HiRes qw(sleep);
> use Test::WWW::Selenium;
> use Test::More "no_plan";
> use Test::Exception;
>
>
> open (INFO, '>>database.csv') or die "$!";
> print INFO ("titolo\;descrizione\;schedaTecnica\;URLFoto\n");
>
> my $sel = Test::WWW::Selenium->new( host => "localhost",
>                                     port => 4444,
>                                     browser => "*chrome",
>                                     browser_url =>
>                                     "http://www.example.com/it/page.html" );
>
> sub estrai{
>     $sel->wait_for_page_to_load_ok("30000");
>     my $titolo = $sel->get_text("//h1");
>     my $schedaTecnica = $sel->get_text("//td[3]/ul");
>     my $img = $sel->get_attribute("//div/img\@src");
>     my $descrizione = $sel->get_text("//td[2]");
>     print INFO ("$titolo\;$descrizione\;$schedaTecnica\;$img\n");
>     $sel->go_back_ok();
>     $sel->wait_for_page_to_load_ok("30000");
> }
>
> $sel->open_ok("/it/page.html");
> $sel->click_ok("//div[2]/div/div/div[2]/h3/a");
> $sel->wait_for_page_to_load_ok("30000");
> $sel->click_ok("//div[2]/div/div/div[2]/h3/a");
> $sel->wait_for_page_to_load_ok("30000");
> estrai($sel);
> ...
> close (INFO);
> [/code]
>
> Unfortunately my CSV is very bad because (sometimes) when I extract data from
> "//ul" my file looks like:
>
> [code]
> Art. S500 Set Yoga "Siddhartha";Idea regalo ?SET YOGA Siddhartha? Elegante
> scatola in cartone lucido contenente:
> 2 mattoni in legno naturale mis. cm 20 x 12,5 x 7
>
> 1 cinghia in cotone mis. cm 4 x 235
>
> 1 stuoia in cotone mis. cm 70 x 170
>
> 1 manuale di introduzione allo yoga stampato
>
>
>
> Tutto rigorosamente realizzato con materiali naturali;€
> 82,50;../images/S500%20(Custom).jpg
> [/code]
> So when I extract data I need to implement UTF8 encoding and to eliminate
> spaces between lines, how is possible?
>

You should experiment with the encoding layers of filehandles (e.g.
«binmode $myfh, ":encoding(utf8)"») and with the decode() and encode()
functions from Encode.pm. At least for me, getting this right usually takes
some trial and error.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
List of Portability Libraries - http://shlom.in/port-libs

Chuck Norris wrote a complete Perl 6 implementation in a day, but then
destroyed all evidence with his bare hands, so no‐one will know his secrets.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
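The binmode/Encode suggestion can be made concrete. Here is a minimal sketch of both fixes the question asks for. It writes the file through a UTF-8 encoding layer, and it flattens the blank lines that the //td[3]/ul text brings along.

The sketch rests on several assumptions:

* The helper names clean_field() and record_line() are hypothetical.
* The field values are made-up samples modeled on the output in the question.
* Replacing ';' inside a field with ',' is just one possible choice.
* I have not verified whether get_text() in your Test::WWW::Selenium version returns decoded characters or raw UTF-8 bytes. The decode() call is only needed in the second case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Flatten one scraped field for the ';'-separated file.
sub clean_field {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/^\s+//;     # trim leading whitespace
    $text =~ s/\s+$//;     # trim trailing whitespace, including trailing blank lines
    $text =~ s/\s+/ /g;    # collapse inner newlines, tabs and space runs to one space
    $text =~ s/;/,/g;      # hypothetical choice: a ';' inside a field would shift the columns
    return $text;
}

# Build one record line (with trailing newline) from raw fields.
sub record_line {
    return join(';', map { clean_field($_) } @_) . "\n";
}

# Only needed if the scraper returns raw UTF-8 *bytes* instead of Perl
# characters (assumption: check what your WWW::Selenium version returns).
my $price_bytes = "\xE2\x82\xAC 82,50";          # UTF-8 bytes for the euro sign + " 82,50"
my $price       = decode('UTF-8', $price_bytes);  # now a character string starting with U+20AC

# Made-up sample values modeled on the output shown in the question.
my @fields = (
    'Art. S500 Set Yoga "Siddhartha"',
    "Elegante scatola:\n2 mattoni in legno naturale\n\n1 cinghia in cotone\n\n\n",
    $price,
    '../images/S500%20(Custom).jpg',
);

# Three-argument open with a lexical handle, then the encoding layer,
# so characters such as the euro sign are written out as UTF-8.
open(my $info, '>>', 'database.csv') or die "Cannot open database.csv: $!";
binmode($info, ':encoding(UTF-8)');
print {$info} "titolo;descrizione;schedaTecnica;URLFoto\n";
print {$info} record_line(@fields);
close($info) or die "Cannot close database.csv: $!";
```

In the original script, each get_text() result would pass through clean_field() before printing. Alternatively, the whole record could go through record_line(). INFO would get the same binmode call right after its open. For proper CSV quoting instead of substitution, the Text::CSV module from CPAN (with sep_char set to ';') is the more robust option.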