Re: [Pharo-users] Web scrapping with Pharo Chrome

Alistair Grant Thu, 15 Feb 2018 09:52:25 -0800

Hi Offray,

On 14 February 2018 at 20:29, Offray Vladimir Luna Cárdenas
<offray.l...@mutabit.com> wrote:
> Yes. Me too. Alistair, any starting points with this example? I will take
> it from there and we could get visibility at the upcoming Open Data Day.


I'm not sure that I understand what you're after, but maybe the
following will help.

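This collects the text of every h4 heading found inside a cell (a div
whose class includes 'mdl-cell'). The last line returns all the
intermediate results, so you can inspect each step: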


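| rootNode divs cells cellTitleNodes cellTitles |

"Load the page in Chrome and keep the node it answers as the search root."
rootNode := GoogleChrome get: 'http://mutabit.com/grafoscopio/index.en.html'.
"Collect every div, then keep those whose class list contains 'mdl-cell'."
divs := rootNode findAllTags: 'div'.
cells := divs select: [ :each |
    (' ' split: (each attributeAt: 'class')) includes: 'mdl-cell' ].
"Find the h4 nodes inside each cell and take the value of each one's first string."
cellTitleNodes := cells flatCollect: [ :each | each findAllTags: 'h4' ].
cellTitles := cellTitleNodes collect: [ :each |
    (each findAllStrings: true) first nodeValue ].
"Answer every intermediate result so that each step can be inspected."
{ rootNode. divs. cells. cellTitleNodes. cellTitles }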
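
A side note on the class test in the select: block: the class attribute
is split on spaces and compared token by token, rather than searched as
a substring, because Material Design Lite uses modifier classes that
contain 'mdl-cell' as a prefix. Here is a minimal sketch of the
difference. It uses plain strings only, so it runs without Chrome; the
attribute values are made up for illustration and are not taken from
the page:

```smalltalk
| classAttribute tokens |
"Hypothetical class attribute value, for illustration only."
classAttribute := 'mdl-grid mdl-cell--4-col'.
tokens := ' ' split: classAttribute.

"A substring test matches by accident: 'mdl-cell' occurs inside 'mdl-cell--4-col'."
classAttribute includesSubstring: 'mdl-cell'.   "true"

"Comparing whole tokens does not: no token equals 'mdl-cell'."
tokens includes: 'mdl-cell'.                    "false"

"A div that really carries the class is still found."
(' ' split: 'mdl-cell mdl-cell--4-col') includes: 'mdl-cell'.   "true"
```

Evaluate each expression separately in a Playground (Print it) to see
the results.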




> Of course, we're going to document everything and share back (I have already
> proposed some improvements in documentation via PR on the Git repo).

Thanks very much for improving the readme.  I've merged the PR.

Cheers,
Alistair


> Cheers,
>
> Offray
>
>
> On 14/02/18 14:13, Stephane Ducasse wrote:
>
> I would love to have a little how-to that we can turn into a
> document.
>
> Stef
>
> On Wed, Feb 14, 2018 at 5:25 PM, Offray Vladimir Luna Cárdenas
> <offray.l...@mutabit.com> wrote:
>>
>> Hi,
>>
>> I have finally been able to install and properly use Pharo Chrome. The
>> issues I reported in the other thread were caused by conflicts between
>> OSProcess and OSSubProcess on Linux. Now I'm able to launch Chrome, point it
>> to particular addresses and get some info from there.
>>
>> I would like to continue the conversation about web scraping with
>> Pharo Chrome, which comes in handy now that React and other technologies
>> are making the web more and more opaque for digital citizenship and
>> data activism endeavors. So, as a starting example, I would like to scrape
>> Grafoscopio's own page [1]. It makes use of Material Design Lite [2].
>>
>> [1] http://mutabit.com/grafoscopio/index.en.html
>> [2] http://getmdl.io/
>>
>> A starting example would be enough to kickstart me (from the ones I'm
>> reading, there is still something I don't get). Let's say I want to get all
>> cards in [1], as shown in the screenshot below. I know the div class of
>> each one, and the div class where they are located. What would be a minimal
>> example of a scraper to start with?
>>
>> Thanks,
>>
>> Offray
>>
>>
>
>
