[ 
https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959659#comment-14959659
 ] 

Michael Joyce commented on NUTCH-2141:
--------------------------------------

Cool, makes sense. Do you have any examples? I'd like to poke at it as well. 
You're going to need to handle the screenshot functionality differently too, 
since getHTMLContent does more than just return the body content. We probably 
don't really need the DefalultMultiInteractionHandler example either if this 
basically replaces it. [~asitang] might have some ideas as well.

> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently 
> provides methods to manipulate the page content, and the HTTPResponse class 
> reads the page content from the driver. This limits the amount of HTML 
> content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is 
> particularly helpful in cases such as handling pagination, where the content 
> of multiple pages can be appended and returned from the handler. 
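
For illustration only, here is a rough sketch of what the proposal quoted above
could look like for the pagination case. It is not the plugin's actual code:
PagedDriver, ContentReturningHandler, PaginationHandler and FakeDriver are
hypothetical names, and PagedDriver is a cut-down stand-in for the Selenium
driver so that the sketch compiles without Selenium on the classpath. It also
assumes the driver always has at least one page loaded.

```java
import java.util.List;

// Hypothetical stand-in for the Selenium driver, reduced to what this sketch needs.
interface PagedDriver {
    String getPageSource();   // HTML of the page currently loaded
    boolean clickNext();      // move to the next page; false when there is none
}

// Hypothetical handler shape: processDriver returns the content it collected.
interface ContentReturningHandler {
    String processDriver(PagedDriver driver);
    boolean shouldProcessURL(String url);
}

// Appends the content of every page and returns the combined result.
class PaginationHandler implements ContentReturningHandler {
    @Override
    public String processDriver(PagedDriver driver) {
        StringBuilder content = new StringBuilder();
        do {
            content.append(driver.getPageSource());
        } while (driver.clickNext());
        return content.toString();
    }

    @Override
    public boolean shouldProcessURL(String url) {
        return true; // assumption: handle every URL in this sketch
    }
}

// In-memory driver used only to exercise the handler without a browser.
class FakeDriver implements PagedDriver {
    private final List<String> pages;
    private int index = 0;

    FakeDriver(List<String> pages) {
        this.pages = pages;
    }

    @Override
    public String getPageSource() {
        return pages.get(index);
    }

    @Override
    public boolean clickNext() {
        if (index + 1 < pages.size()) {
            index++;
            return true;
        }
        return false;
    }
}
```

With a handler shaped like this, the caller would use the returned String as
the page content rather than reading it from the driver after the handler has
run. How the screenshot step fits in is left open here.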



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
