Hi Offray,

On 14 February 2018 at 20:29, Offray Vladimir Luna Cárdenas
<offray.l...@mutabit.com> wrote:
> Yes. Me too. Alistair, any starting points for this example? I will take
> it from there, and we could get some visibility at the upcoming Open Data Day.
I'm not sure that I understand what you're after, but maybe the following
will help. This simply returns a collection of all the h4 headings that are
in cells:

| rootNode divs cells cellTitleNodes cellTitles |
"Fetch the page through Chrome and keep its root DOM node"
rootNode := GoogleChrome get: 'http://mutabit.com/grafoscopio/index.en.html'.
divs := rootNode findAllTags: 'div'.
"Keep the divs whose space-separated class list includes 'mdl-cell'"
cells := divs select: [ :each |
    (' ' split: (each attributeAt: 'class')) includes: 'mdl-cell' ].
"Gather every <h4> element inside those cells"
cellTitleNodes := cells flatCollect: [ :each | each findAllTags: 'h4' ].
"Take the value of the first string node found in each heading"
cellTitles := cellTitleNodes collect: [ :each |
    (each findAllStrings: true) first nodeValue ].
"Answer all intermediate results together so each one can be inspected"
{ rootNode. divs. cells. cellTitleNodes. cellTitles }

> Of course, we're going to document everything and share it back (I have
> already proposed some documentation improvements via a PR on the Git repo).

Thanks very much for improving the readme. I've merged the PR.

Cheers,
Alistair

> Cheers,
>
> Offray
>
>
> On 14/02/18 14:13, Stephane Ducasse wrote:
> > I would love to have a little how-to that we can turn into a
> > document.
> >
> > Stef
> >
> > On Wed, Feb 14, 2018 at 5:25 PM, Offray Vladimir Luna Cárdenas
> > <offray.l...@mutabit.com> wrote:
> >>
> >> Hi,
> >>
> >> I have finally been able to install and properly use Pharo Chrome. The
> >> issues I reported in another thread were caused by conflicts between
> >> OSProcess and OSSubProcess on Linux. Now I'm able to launch Chrome, point
> >> it to particular addresses and get some info from there.
> >>
> >> I would like to continue the conversation about web scraping with Pharo
> >> Chrome, which comes in handy now that React and other technologies are
> >> making the web increasingly opaque to digital citizenship and data
> >> activism efforts. So, as a starting example, I would like to scrape
> >> Grafoscopio's own page [1]. It uses Material Design Lite [2].
> >>
> >> [1] http://mutabit.com/grafoscopio/index.en.html
> >> [2] http://getmdl.io/
> >>
> >> A starting example would help me get going (from the examples I'm
> >> reading, there is still something I don't get). Let's say I want to get
> >> all the cards in [1], as shown in the screenshot below. I know the div
> >> class of each card and the div class of the container they are in. What
> >> would a minimal scraper for that look like, to start with?
> >>
> >> Thanks,
> >>
> >> Offray