Hi Torsten and All,

A quick introduction for those not familiar with Pharo-Chrome:

Pharo-Chrome enables Pharo to control and query Chrome / Chromium, in
particular to retrieve the DOM of a page.  This is useful because many
modern pages are just a template that loads some JavaScript to build the
DOM asynchronously, so the ZnEasy / Soup combination doesn't get the
bulk of the information on the page.

Pharo-Chrome is now mostly working, i.e. it is possible to open a
connection to Chrome, navigate to a requested URL, wait for it to load,
retrieve the DOM and then navigate the DOM using a subset of the Soup
API, e.g. #findAllStrings:, #findAllTags:, #attributeAt:, etc.

GoogleChrome class>>exampleNavigation has been updated to retrieve the
DOM from http://pharo.org.

GoogleChrome class>>get: is analogous to ZnEasy class>>get:, except that
it returns a ChromeNode rather than an HTML string.

I wasn't able to get rid of the delay while waiting for the page to
finish loading.  This actually makes sense since, as mentioned above,
many modern pages build the DOM asynchronously, so there's no clear
indication of when it is complete.  The default delay is currently 2000
milliseconds, roughly twice the longest load time I observed (983 ms),
but it can be changed with ChromeTabPage>>pageLoadDelay:.
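For comparison with the fixed delay, here is a minimal sketch of the event-based alternative Torsten suggests in the quoted message below. It is written in Python purely for illustration and is not Pharo-Chrome code. It assumes the Chrome DevTools Protocol events Page.frameStartedLoading and Page.frameStoppedLoading are being received (after Page.enable) and have already been decoded from JSON. PageLoadTracker, quiet_ms and max_wait_ms are hypothetical names. Because frames can keep starting asynchronously, the sketch still can't know for certain that the DOM is complete. It only treats the page as loaded once no frame is loading and nothing has happened for a quiet period, with max_wait_ms acting as an upper bound, much like today's pageLoadDelay.

```python
import time


class PageLoadTracker:
    """Hypothetical helper deciding when a multi-frame page has settled.

    Feed it decoded Chrome DevTools Protocol event messages (dicts). The page
    counts as loaded once no frame is still loading and no frame event has
    arrived for quiet_ms; max_wait_ms caps the wait like a fixed delay.
    """

    def __init__(self, quiet_ms=500, max_wait_ms=2000, clock=time.monotonic):
        self.quiet_ms = quiet_ms
        self.max_wait_ms = max_wait_ms
        self.clock = clock                 # injectable for testing
        self.loading_frames = set()        # frameIds that started but not stopped
        self.started_at = clock()
        self.last_event_at = self.started_at

    def handle_event(self, message):
        method = message.get("method")
        frame_id = message.get("params", {}).get("frameId")
        if method == "Page.frameStartedLoading":
            self.loading_frames.add(frame_id)
        elif method == "Page.frameStoppedLoading":
            self.loading_frames.discard(frame_id)
        else:
            return                         # unrelated events don't reset the timer
        self.last_event_at = self.clock()

    def is_loaded(self):
        # Assumes tracking starts when navigation starts; if no frame event
        # ever arrives, this reports "loaded" after quiet_ms of silence.
        now = self.clock()
        if (now - self.started_at) * 1000 >= self.max_wait_ms:
            return True                    # give up, as the fixed delay does
        quiet_for_ms = (now - self.last_event_at) * 1000
        return not self.loading_frames and quiet_for_ms >= self.quiet_ms
```

A caller would poll is_loaded() between reads from the socket. The quiet period is a heuristic, not a guarantee, and it would still need tuning, which is exactly what the logging mentioned below could help with.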

I had three use cases for this library: one that works with
ZnEasy+Soup, one that used to work with ZnEasy+Soup but no longer does
because of a page redesign, and one that I hadn't got working before.
All three are working now.

Unlike Soup, I've currently modelled the nodes as a single class, and
have only implemented a subset of Soup's methods, but it is enough for
what I need.
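To illustrate what a single node class over Chrome's DOM can look like, here is a hypothetical Python sketch. It is not the actual ChromeNode implementation. It is built from the DOM.Node objects that the Chrome DevTools Protocol method DOM.getDocument returns when called with depth -1, so that children are populated. The method names only imitate the Soup selectors mentioned above. CDP delivers attributes as a flat [name, value, ...] list, and the contents of an iframe sit under contentDocument rather than children, so this sketch does not descend into frames.

```python
class ChromeNodeSketch:
    """Hypothetical single-class node wrapping a CDP DOM.Node dict.

    Element, text and document nodes are all represented by this one class;
    nodeType distinguishes them (1 = element, 3 = text, 9 = document).
    """

    ELEMENT_NODE = 1
    TEXT_NODE = 3

    def __init__(self, data):
        self.node_type = data.get("nodeType")
        self.local_name = data.get("localName", "")   # lowercase tag name
        self.node_value = data.get("nodeValue", "")   # text for text nodes
        flat = data.get("attributes", [])
        # CDP flattens attributes as [name1, value1, name2, value2, ...]
        self.attributes = dict(zip(flat[0::2], flat[1::2]))
        # Frame contents (contentDocument) are deliberately not followed.
        self.children = [ChromeNodeSketch(c) for c in data.get("children", [])]

    def attribute_at(self, name):
        return self.attributes.get(name)

    def descendants(self):
        # Depth-first, document order.
        for child in self.children:
            yield child
            yield from child.descendants()

    def find_all_tags(self, tag):
        return [n for n in self.descendants()
                if n.node_type == self.ELEMENT_NODE and n.local_name == tag]

    def find_all_strings(self, containing=""):
        # Illustrative filter: text nodes whose value contains `containing`.
        return [n.node_value for n in self.descendants()
                if n.node_type == self.TEXT_NODE and containing in n.node_value]
```

Keeping one class means the DOM JSON maps directly onto objects without a per-node-type hierarchy. The cost is that element-only and text-only behaviour has to check node_type, as find_all_tags and find_all_strings do here.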

I've introduced a dependency on the Beacon logging framework.  I find it
useful, but can remove it if you don't want the additional dependency.
(I'm planning to add some GoogleChrome-specific logging classes and use
them to better understand what pageLoadDelay should be.)

I was focussed on trying to understand the events that Chrome generates,
so documentation is still lacking (read "missing" :-)).

I'll generate a pull request after some more testing, tweaking and
documenting, but if you would like to take a look, the code is available
at:

https://github.com/akgrant43/Pharo-Chrome/tree/development

I haven't yet updated BaselineOfChrome with the Beacon dependency.  I
did merge in your two commits from May 23.

If you, or anyone else, finds this useful, I welcome any feedback.

P.S.  I've just realised that I need to tidy up #sendMessage:,
#sendMessageDictionary and #sendMessageDictionary:wait:.  I'll do that
as part of the general tidy-up.
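One possible shape for that tidy-up, which also touches on the asynchronous-reading question in the quoted message below, is sketched here in Python for illustration only. It relies on the Chrome DevTools Protocol convention that each command carries an integer "id", its response echoes that id with a "result" or "error", and events carry a "method" but no id. CdpDispatcher, send_text and the listener list are hypothetical names, not Pharo-Chrome API. The WebSocket reader loop itself is assumed and not shown.

```python
import json
import threading
from concurrent.futures import Future


class CdpDispatcher:
    """Hypothetical sketch: one send path plus one incoming-message handler.

    Each outgoing command gets a fresh integer id. A reader loop (not shown)
    is assumed to call handle_incoming() with every WebSocket text frame:
    messages with an "id" complete the matching Future, messages without
    one are events and are passed to every listener.
    """

    def __init__(self, send_text):
        self.send_text = send_text    # writes one text frame to the socket
        self.next_id = 1
        self.pending = {}             # id -> Future awaiting its response
        self.listeners = []           # callables receiving event dicts
        self.lock = threading.Lock()  # reader may run on another thread

    def send_message(self, method, params=None):
        future = Future()
        with self.lock:
            msg_id = self.next_id
            self.next_id += 1
            # Register before sending so a fast reply cannot be missed.
            self.pending[msg_id] = future
        self.send_text(json.dumps(
            {"id": msg_id, "method": method, "params": params or {}}))
        return future                 # callers that must block use .result()

    def handle_incoming(self, text):
        message = json.loads(text)
        msg_id = message.get("id")
        if msg_id is None:            # an event, e.g. Page.frameStoppedLoading
            for listener in list(self.listeners):
                listener(message)
            return
        with self.lock:
            future = self.pending.pop(msg_id, None)
        if future is None:
            return                    # unknown or already abandoned id
        if "error" in message:
            future.set_exception(RuntimeError(message["error"]))
        else:
            future.set_result(message.get("result", {}))
```

With this split, sending never blocks, and a waiting variant (in the spirit of #sendMessageDictionary:wait:) would simply block on the returned future with a timeout. Mapping it onto the existing selectors this way is only a suggestion.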

Cheers,
Alistair


On Sun, May 21, 2017 at 09:37:56PM +0000, Alistair Grant wrote:
> Hi Torsten,
> 
> On Fri, May 19, 2017 at 09:20:48PM +0000, Alistair Grant wrote:
> > 
> > On Fri, May 19, 2017 at 10:50:41PM +0200, Torsten Bergmann wrote:
> > > Hi Alistair,
> > > 
> > > can't look right now but two things:
> > > 
> > >   - there are also events in the protocol - if we could hook Pharo into
> > >     them this would solve the problem without abusing delay (because
> > >     then you will get informed when the page loading is finished)
> > 
> > That would be great.  It will be a while before I get a chance to look
> > at this (I want to finish some proposed changes to the FileSystem
> > packages first), but I'll try and include it then.
> 
> I've got basic event listening working.  It requires that all messages
> are read asynchronously, so I'll need to change the interface to handle
> that.
> 
> Knowing when a page has finished loading isn't quite as simple as
> looking for an event - a page can consist of multiple frames, and 
> notifications are delivered for each frame.  The page I'm interested in
> has around 25 frames.
> 
> If anyone has a good design pattern for writing an asynchronous
> WebSocket client, please let me know; I don't have anything concrete
> in mind.
> 
> Thanks,
> Alistair
> 
