[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635389#comment-14635389 ]
Michael Joyce commented on NUTCH-2062:
--------------------------------------

Hi folks,

Just wanted to elaborate a bit on what this does at the moment and what the point of it is. This plugin is effectively the protocol-selenium plugin, but it allows a "handler" to interact with the WebDriver before the page content is returned. Handlers implement a simple interface. Which handlers run is determined by a comma-separated list of handler class names in the config. For each URL, all the handlers are run in the config-specified order. The content produced by each handler is appended together and returned as the page content.

> Add Plugin for interacting with Selenium WebDriver
> --------------------------------------------------
>
>                 Key: NUTCH-2062
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2062
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>    Affects Versions: 1.10
>            Reporter: Michael Joyce
>             Fix For: 1.11
>
>
> The protocol-selenium plugin is great for pulling webpages that dynamically
> load content. However, I've run into use cases where I need to actively
> interact with a page in Selenium before it becomes useful. For instance, I
> may need to paginate through a table to get all results that I'm interested
> in. This plugin will handle that use case.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
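
For anyone trying to picture the handler mechanism described in the comment above, here is a rough, self-contained sketch. It is not the plugin's actual code: `PageDriver` is a hypothetical stand-in for Selenium's WebDriver, and `Handler`, `PassThroughHandler`, `MarkerHandler`, `FakeDriver`, `loadHandlers` and `fetch` are all made-up names for illustration. It also assumes the URL is loaded once and every handler then works on the same driver, which the comment does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class HandlerChainSketch {

    /** Hypothetical stand-in for Selenium's WebDriver, reduced to what this sketch needs. */
    public interface PageDriver {
        void get(String url);
        String getPageSource();
    }

    /** Hypothetical handler interface: interact with the driver, then return content. */
    public interface Handler {
        String process(PageDriver driver);
    }

    /** Returns the page source unchanged, with no interaction. */
    public static class PassThroughHandler implements Handler {
        @Override
        public String process(PageDriver driver) {
            return driver.getPageSource();
        }
    }

    /** Placeholder for a handler that would click or paginate before reading the page. */
    public static class MarkerHandler implements Handler {
        @Override
        public String process(PageDriver driver) {
            return "<!-- after interaction -->" + driver.getPageSource();
        }
    }

    /** In-memory driver so the sketch runs without a browser. */
    public static class FakeDriver implements PageDriver {
        private String current = "";

        @Override
        public void get(String url) {
            current = "<html>" + url + "</html>";
        }

        @Override
        public String getPageSource() {
            return current;
        }
    }

    /** Instantiates handlers from a comma-separated list of class names, keeping the listed order. */
    public static List<Handler> loadHandlers(String commaSeparatedClassNames)
            throws ReflectiveOperationException {
        List<Handler> handlers = new ArrayList<>();
        for (String name : commaSeparatedClassNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas in the config value
            }
            handlers.add((Handler) Class.forName(trimmed).getDeclaredConstructor().newInstance());
        }
        return handlers;
    }

    /** Assumption: load the URL once, run every handler in order, append their output. */
    public static String fetch(String url, PageDriver driver, List<Handler> handlers) {
        driver.get(url);
        StringBuilder content = new StringBuilder();
        for (Handler handler : handlers) {
            content.append(handler.process(driver));
        }
        return content.toString();
    }

    public static void main(String[] args) throws Exception {
        // The config value is just class names separated by commas.
        String config = PassThroughHandler.class.getName() + ", " + MarkerHandler.class.getName();
        System.out.println(fetch("http://example.com/", new FakeDriver(), loadHandlers(config)));
    }
}
```

Loading handlers by class name keeps the choice of interactions in configuration rather than code, so a pagination handler could be added for one crawl without touching the fetch path.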