[ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958233#comment-14958233 ]
ASF GitHub Bot commented on NUTCH-2141:
---------------------------------------

GitHub user balajig17 opened a pull request:

    https://github.com/apache/nutch/pull/77

    fix for NUTCH-2141 contributed by Balaji Gurumurthy

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/balajig17/nutch NUTCH-2141

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/77.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #77

----
commit d9486a5567ceb9a6c77e6fe3994350f37a433510
Author: Balaji <balaji...@gmail.com>
Date:   2015-10-15T03:10:16Z

    fix for NUTCH-2141 contributed by Balaji Gurumurthy

----

> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently
> provides methods to manipulate the page content, and the HTTPResponse class
> reads the page content from the driver. This limits the amount of HTML
> content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is
> particularly helpful in cases such as handling pagination, where the content
> of multiple pages can be appended and returned from the handler.
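
For illustration only, the idea in the issue description (a processDriver
method that returns the accumulated page content as a String) might look
roughly like the Java sketch below. It is not the code from the pull request.
PageDriver, Handler, PaginationHandler, FakeDriver and the maxPages limit are
hypothetical names made up for this example; PageDriver is a small stand-in so
the sketch runs without Selenium, and it is not Selenium's WebDriver API.

```java
import java.util.Arrays;
import java.util.List;

public class PaginationSketch {

    /** Hypothetical stand-in for a browser driver; NOT Selenium's WebDriver API. */
    interface PageDriver {
        /** Returns the HTML of the page currently loaded. */
        String pageSource();

        /** Moves to the next page; returns false when there is none. */
        boolean goToNextPage();
    }

    /** Hypothetical handler shape: processDriver returns a String, as the issue proposes. */
    interface Handler {
        String processDriver(PageDriver driver);
    }

    /** Appends the content of each visited page and returns the combined HTML. */
    static final class PaginationHandler implements Handler {
        private final int maxPages; // assumed safety cap, not taken from the issue

        PaginationHandler(int maxPages) {
            this.maxPages = maxPages;
        }

        @Override
        public String processDriver(PageDriver driver) {
            // The first page is always included.
            StringBuilder content = new StringBuilder(driver.pageSource());
            int visited = 1;
            // Keep following pagination until the cap is hit or no page is left.
            while (visited < maxPages && driver.goToNextPage()) {
                content.append(driver.pageSource());
                visited++;
            }
            return content.toString();
        }
    }

    /** In-memory fake driver so the sketch can run without a browser. */
    static final class FakeDriver implements PageDriver {
        private final List<String> pages;
        private int index = 0;

        FakeDriver(List<String> pages) {
            this.pages = pages;
        }

        @Override
        public String pageSource() {
            return pages.get(index);
        }

        @Override
        public boolean goToNextPage() {
            if (index + 1 >= pages.size()) {
                return false;
            }
            index++;
            return true;
        }
    }

    public static void main(String[] args) {
        PageDriver driver = new FakeDriver(
                Arrays.asList("<p>one</p>", "<p>two</p>", "<p>three</p>"));
        String html = new PaginationHandler(10).processDriver(driver);
        System.out.println(html);
    }
}
```

Under this shape the caller takes the returned String as the page content
instead of reading it from the driver afterwards, so a handler can return more
than the single page the driver happens to be showing when it finishes.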