Hi Fabrizio,

See below for my response.

On Fri, 6 Apr 2012 03:30:46 -0700 (PDT)
Fabrizio Di Carlo <dicarlo.fabri...@gmail.com> wrote:

> Hello to all,
> 
> I'm very new to Perl, but every day I understand better how powerful this
> language is. However, I have a problem:
> 
> I'm using Perl with Selenium to scrape data (for a job). The code looks
> like this:
> 
> [code]
> use strict;
> use warnings;
> use Time::HiRes qw(sleep);
> use Test::WWW::Selenium;
> use Test::More "no_plan";
> use Test::Exception;
> 
> 
> # Append to the CSV file (created if missing) and write the header row.
> open (INFO, '>>', 'database.csv') or die "Cannot open database.csv: $!";
> print INFO "titolo;descrizione;schedaTecnica;URLFoto\n";
>                                                 
> my $sel = Test::WWW::Selenium->new( host        => "localhost",
>                                     port        => 4444,
>                                     browser     => "*chrome",
>                                     browser_url => "http://www.example.com/it/page.html" );
> 
> # Scrape one product page, append it to the CSV as one row, then go back.
> sub estrai {
>       my ($sel) = @_;
>       $sel->wait_for_page_to_load_ok("30000");
>       my $titolo = $sel->get_text("//h1");
>       my $schedaTecnica = $sel->get_text("//td[3]/ul");
>       # "locator@attribute" form; \@ stops Perl from interpolating @src
>       my $img = $sel->get_attribute("//div/img\@src");
>       my $descrizione = $sel->get_text("//td[2]");
>       print INFO "$titolo;$descrizione;$schedaTecnica;$img\n";
>       $sel->go_back_ok();
>       $sel->wait_for_page_to_load_ok("30000");
> }
>                                                                       
> $sel->open_ok("/it/page.html");
> $sel->click_ok("//div[2]/div/div/div[2]/h3/a");
> $sel->wait_for_page_to_load_ok("30000");
> $sel->click_ok("//div[2]/div/div/div[2]/h3/a");
> $sel->wait_for_page_to_load_ok("30000");
> estrai($sel);
> ...
> close (INFO);
> [/code]
> 
> Unfortunately my CSV comes out malformed: sometimes, when I extract data from
> "//ul", the file looks like this:
> 
> [code]
> Art. S500 Set Yoga "Siddhartha";Idea regalo ?SET YOGA Siddhartha? Elegante 
> scatola in cartone lucido contenente:
>  2 mattoni in legno naturale mis. cm 20 x 12,5 x 7
>  
>  1 cinghia in cotone mis. cm 4 x 235
>  
>  1 stuoia in cotone mis. cm 70 x 170
>  
>  1 manuale di introduzione allo yoga stampato
>  
>  
>  
>  Tutto rigorosamente realizzato con materiali naturali;€ 
> 82,50;../images/S500%20(Custom).jpg
> [/code]
> So when I extract the data I need to write it as UTF-8 and remove the blank
> lines and line breaks inside each field. How can I do that?
> 

You should play with the encoding layer of file handles (e.g. «binmode $myfh,
":encoding(UTF-8)"») and with Encode.pm's decode() and encode() functions. For
me at least, it usually takes some trial and error.
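
A minimal sketch of both approaches, assuming the strings returned by
get_text() are already Perl character strings (if they turn out to be raw
UTF-8 bytes, decode() them first). The sample string is made up for
illustration and reuses the question's file name:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: a character string with curly quotes and a euro sign,
# like the text that came out garbled in the question's CSV.
my $titolo = "Idea regalo \x{201C}SET YOGA Siddhartha\x{201D} \x{20AC} 82,50";

# 1) Encoding layer on the file handle: characters are encoded to UTF-8
#    bytes automatically on output, so no manual encode() is needed.
open(my $out, '>>:encoding(UTF-8)', 'database.csv')
    or die "Cannot open database.csv: $!";
print {$out} "$titolo\n";
close($out) or die "Cannot close database.csv: $!";

# 2) The same conversion by hand with Encode: encode() turns characters
#    into bytes, decode() turns bytes back into characters.
my $bytes = encode('UTF-8', $titolo);
my $chars = decode('UTF-8', $bytes);
print length($bytes), " bytes, ", length($chars), " characters\n";
```

Use one approach or the other for a given handle: printing already-encoded
bytes through an :encoding layer would encode them a second time.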

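For the blank lines inside a field, a minimal sketch, assuming it is
acceptable to collapse all whitespace (newlines included) into single spaces.
clean_field() is a hypothetical helper, not part of Test::WWW::Selenium:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace (newlines, blank
# lines, tabs, repeated spaces) into one space and trim both ends.
sub clean_field {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/\s+/ /g;    # any whitespace run -> a single space
    $text =~ s/^ //;       # trim leading space
    $text =~ s/ $//;       # trim trailing space
    return $text;
}

# Sample shaped like the //td[3]/ul text from the question.
my $raw = "contenente:\n 2 mattoni in legno\n \n 1 cinghia in cotone\n \n \n Tutto rigorosamente";
print clean_field($raw), "\n";
```

Inside estrai(), each value could then be wrapped, e.g.
my $schedaTecnica = clean_field($sel->get_text("//td[3]/ul"));. A field that
itself contains a ';' will still break the file; a CSV module such as
Text::CSV takes care of quoting in that case.
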
Regards,

        Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
List of Portability Libraries - http://shlom.in/port-libs

Chuck Norris wrote a complete Perl 6 implementation in a day, but then
destroyed all evidence with his bare hands, so no‐one will know his secrets.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
