[ 
https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959627#comment-14959627
 ] 

Balaji Gurumurthy commented on NUTCH-2141:
------------------------------------------

When we concatenate the content from multiple pages and then try to load it
back into the browser using JavascriptExecutor, more often than not we get
JavaScript exceptions ("Unterminated string literal" and "Missing ; before
statement", to name a few) while executing the script string. Debugging these
errors across the concatenated content of all the pages is a pain.
Instead of concatenating the content, loading it back into the driver, and then
reading it from the driver again in the HTTPResponse class, it seemed better to
return the concatenated result directly to Nutch.
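
A rough sketch of the idea, for illustration only. Only the processDriver
method name comes from the plugin; Browser, ContentReturningHandler,
PaginationHandler and FakeBrowser are hypothetical names, and Browser merely
stands in for the Selenium WebDriver calls a real handler would make. The
point is that the appended page content is handed back as a plain String and
never has to be embedded in a JavaScript string literal, so quotes and
newlines in the HTML cannot break anything.

```java
import java.util.Arrays;
import java.util.List;

public class PaginationSketch {

    /** Hypothetical stand-in for the WebDriver operations a handler needs. */
    interface Browser {
        String getPageSource();

        /** Moves to the next page; returns false when there is none. */
        boolean clickNext();
    }

    /** Assumed handler shape: processDriver returns the content it collected. */
    interface ContentReturningHandler {
        String processDriver(Browser browser);
    }

    /** Appends the source of each page and returns the combined result. */
    static class PaginationHandler implements ContentReturningHandler {
        private final int maxPages;

        PaginationHandler(int maxPages) {
            this.maxPages = maxPages;
        }

        @Override
        public String processDriver(Browser browser) {
            StringBuilder content = new StringBuilder();
            int pages = 0;
            do {
                content.append(browser.getPageSource());
                pages++;
            } while (pages < maxPages && browser.clickNext());
            // No JavascriptExecutor round trip: the raw text goes straight back.
            return content.toString();
        }
    }

    /** In-memory fake so the sketch runs without a real browser. */
    static class FakeBrowser implements Browser {
        private final List<String> pages;
        private int index = 0;

        FakeBrowser(List<String> pages) {
            this.pages = pages;
        }

        @Override
        public String getPageSource() {
            return pages.get(index);
        }

        @Override
        public boolean clickNext() {
            if (index + 1 >= pages.size()) {
                return false;
            }
            index++;
            return true;
        }
    }

    public static void main(String[] args) {
        // The first page contains quotes and a newline, the kind of content
        // that tends to break a naively built JavaScript string literal.
        Browser browser = new FakeBrowser(Arrays.asList(
                "<p>it's \"page one\"</p>\n",
                "<p>page two</p>"));
        String content = new PaginationHandler(10).processDriver(browser);
        System.out.println(content);
    }
}
```

With this shape, the caller (the HTTPResponse class in the plugin) would use
the returned String as the page content instead of reading it from the driver.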

> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently
> provides methods to manipulate the page content, and the HTTPResponse class
> reads the page content from the driver. This limits the amount of HTML
> content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is
> particularly helpful in cases such as handling pagination, where the content
> of multiple pages can be appended and returned from the handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
