[ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959627#comment-14959627 ]
Balaji Gurumurthy commented on NUTCH-2141:
------------------------------------------

When we concatenate the content from multiple pages and then try to load it back into the browser using JavascriptExecutor, more often than not we get exceptions ("Unterminated string literal" and "Missing ; before statement", to name a few) while executing the JavaScript string. Debugging these errors across the concatenated content of all the pages is a pain. Instead of concatenating the content, loading it back into the driver, and then reading it from the driver again in the HTTPResponse class, simply returning the concatenated result to Nutch seemed better.

> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently
> provides methods to manipulate the page content, and the HTTPResponse class
> reads the page content from the driver. This limits the amount of HTML
> content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is
> particularly helpful in cases such as handling pagination, where multiple
> pages' content can be appended and returned from the handler.
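
For illustration only, here is a minimal sketch of what a handler with a String-returning processDriver might look like for the pagination case. It is not the actual patch. PagedDriver, StringReturningHandler, PaginationHandler, and FakePagedDriver are hypothetical names invented for this sketch; PagedDriver stands in for Selenium's WebDriver so that the example compiles without Selenium on the classpath, and goToNextPage() is an assumed helper, not a WebDriver method.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Selenium's WebDriver (assumption, not the real API).
interface PagedDriver {
    String getPageSource();   // HTML of the page currently loaded
    boolean goToNextPage();   // assumed helper: true if a next page was loaded
}

// Assumed shape of the handler interface after the proposed change:
// processDriver returns the page content instead of returning nothing.
interface StringReturningHandler {
    String processDriver(PagedDriver driver);
}

// Appends the content of every page and returns the result directly,
// so nothing has to be injected back into the browser via JavaScript.
class PaginationHandler implements StringReturningHandler {
    @Override
    public String processDriver(PagedDriver driver) {
        StringBuilder accumulated = new StringBuilder();
        do {
            accumulated.append(driver.getPageSource());
        } while (driver.goToNextPage());
        return accumulated.toString();
    }
}

// In-memory fake used only to exercise the sketch; expects at least one page.
class FakePagedDriver implements PagedDriver {
    private final List<String> pages;
    private int index = 0;

    FakePagedDriver(List<String> pages) {
        this.pages = pages;
    }

    @Override
    public String getPageSource() {
        return pages.get(index);
    }

    @Override
    public boolean goToNextPage() {
        if (index + 1 < pages.size()) {
            index++;
            return true;
        }
        return false;
    }
}

class PaginationHandlerSketch {
    public static void main(String[] args) {
        PagedDriver driver =
            new FakePagedDriver(Arrays.asList("<p>page 1</p>", "<p>page 2</p>"));
        String content = new PaginationHandler().processDriver(driver);
        System.out.println(content);
    }
}
```

With this shape, the caller (HTTPResponse in the plugin, per the description above) could use the returned String as the page content rather than reading it back from the driver.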