[ 
https://issues.apache.org/jira/browse/HTTPCORE-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602926#action_12602926
 ] 

Sam Berlin commented on HTTPCORE-162:
-------------------------------------

I think this stems from a misunderstanding (and perhaps incomplete 
documentation) of how ThrottlingHttpClientHandler works.  The throttling, as I 
understand it, is to keep the in-memory buffered contents of the pages down -- 
but I think the throttling is per-connection, not over all connections.  It 
looks like you're doing a semi-crawl, scanning each page for more links and 
spawning more connects.  Since you're using an unbounded threadpool, this means 
each connect is going to spawn even more threads, and each of those is going to 
spawn even more threads... each of which is going to create its own throttled 
buffer with a limited size.  Eventually, there are going to be so many threads 
running and so many buffers created that it triggers an OOM.

There are a few ways to work around this.

One way is to use AsyncNHttpClientHandler (only available in httpcore-nio 
snapshots right now), but that requires a pretty extensive change to the way 
you're parsing links -- you'd have to parse the results piecemeal instead of 
a whole page at a time.  (The async handler notifies you when any bit of data 
is available, but you aren't guaranteed that all of it is.)
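To give an idea of what piecemeal parsing could look like, here is a minimal 
sketch.  It does not use the httpcore-nio API at all: IncrementalLinkScanner 
and its feed() method are hypothetical names, the chunks are plain Strings 
rather than decoded ByteBuffers, and it only recognizes the literal form 
href="...".  The point is just that unfinished matches are carried over from 
one chunk to the next.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HttpCore: collects href="..." values
// from text that arrives in arbitrary chunks.
final class IncrementalLinkScanner {
    private static final String MARKER = "href=\"";
    private final StringBuilder pending = new StringBuilder();
    private final List<String> links = new ArrayList<>();

    // Call once per chunk of decoded content, in arrival order.
    void feed(String chunk) {
        pending.append(chunk);
        int pos = 0;
        while (true) {
            int start = pending.indexOf(MARKER, pos);
            if (start < 0) {
                // No marker left: keep only a short tail that might be
                // the first part of a marker split across two chunks.
                int keepFrom = Math.max(pos, pending.length() - (MARKER.length() - 1));
                pending.delete(0, keepFrom);
                return;
            }
            int valueStart = start + MARKER.length();
            int end = pending.indexOf("\"", valueStart);
            if (end < 0) {
                // Marker seen, closing quote not yet received:
                // keep everything from the marker onwards.
                pending.delete(0, start);
                return;
            }
            links.add(pending.substring(valueStart, end));
            pos = end + 1;
        }
    }

    List<String> links() {
        return links;
    }
}
```

Note that the carried-over text grows without limit if a closing quote never 
arrives, so a real version would want a cap on the pending length.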

Another approach is to use a fixed-size thread pool.  This is the easiest, but 
is going to significantly reduce speed if there are some lagging, slower 
connections.
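As a rough sketch of the fixed-size approach: swap Executors.newCachedThreadPool() 
for Executors.newFixedThreadPool(n).  The class name BoundedPoolSketch, the 
method peakConcurrency() and the sleep standing in for a page fetch are all made 
up for illustration; only the Executors calls are real JDK API.  The method 
reports the highest number of tasks that ran at once, which never exceeds the 
pool size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: shows that a fixed pool caps concurrent work.
final class BoundedPoolSketch {

    // Runs `tasks` dummy jobs on `poolSize` threads and returns the peak
    // number of jobs observed running at the same time.
    static int peakConcurrency(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for fetching and parsing a page
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

Keep in mind that newFixedThreadPool queues waiting tasks in an unbounded queue, 
so the list of pending URLs can still grow; it is only the number of threads 
(and so the number of live per-connection buffers) that is capped.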

Another approach would be to hack into ThrottlingHttpClientHandler and make the 
total buffer size a shared resource among all connections.  That'd be a 
significant change, and would have implications beyond slower connections: it 
might lead to some worker threads starving others of the ability to read.  
Throttling over multiple connections in a fair, non-blocking way is very 
difficult.
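For what it's worth, the shared-budget idea might look something like the 
sketch below.  This is not how ThrottlingHttpClientHandler works today, and 
SharedBufferBudget, tryReserve() and release() are invented names; the only 
real API used is java.util.concurrent.Semaphore.  Each connection would reserve 
bytes before buffering and give them back once the content has been consumed.

```java
import java.util.concurrent.Semaphore;

// Hypothetical shared budget: one instance shared by all connections.
final class SharedBufferBudget {
    private final Semaphore permits; // one permit per byte of buffer space

    SharedBufferBudget(int totalBytes) {
        this.permits = new Semaphore(totalBytes);
    }

    // Non-blocking: returns false at once if the budget cannot cover the
    // request, so an I/O thread is never parked waiting for memory.
    boolean tryReserve(int bytes) {
        return permits.tryAcquire(bytes);
    }

    // Hands bytes back once the buffered content has been consumed.
    void release(int bytes) {
        permits.release(bytes);
    }

    int available() {
        return permits.availablePermits();
    }
}
```

The sketch also shows the starvation problem: tryAcquire() takes permits 
whenever they are available with no regard for who asked first, so one 
connection holding most of the budget can keep the others from reading.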

> Out of Memory when using ThrottlingHttpClientHandler 
> -----------------------------------------------------
>
>                 Key: HTTPCORE-162
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-162
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore NIO
>    Affects Versions: 4.0-beta1
>            Reporter: maomaode
>         Attachments: 162-testcase.patch
>
>
> I'm hitting an Out Of Memory error when using ThrottlingHttpClientHandler 
> <http://hc.apache.org/httpcomponents-core/httpcore-nio/apidocs/org/apache/http/nio/protocol/ThrottlingHttpClientHandler.html>
> with Executors.newCachedThreadPool(). Will provide a testcase later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

