[ https://issues.apache.org/jira/browse/JCLOUDS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrea Turli updated JCLOUDS-1187:
----------------------------------
Fix Version/s: (was: 1.9.3)
               2.0.0

Avoid excessive memory usage when processing massive http response message
--------------------------------------------------------------------------

Key: JCLOUDS-1187
URL: https://issues.apache.org/jira/browse/JCLOUDS-1187
Project: jclouds
Issue Type: Improvement
Components: jclouds-core
Affects Versions: 1.9.2
Reporter: Aled Sage
Fix For: 2.0.0

With jclouds 1.9.3-SNAPSHOT and 2.0.0-SNAPSHOT (and all GA versions)...

If jclouds receives a crazily big http response (e.g. see https://github.com/jclouds/jclouds/pull/1020), then jclouds consumes a huge amount of memory while processing that response: it holds multiple copies of the response in memory at the same time.

As reported in https://issues.apache.org/jira/browse/BROOKLYN-364, the memory usage (according to {{jmap -histo:live <pid>}}) is mostly char arrays (which is what backs StringBuilder and String). When processing a 154MB response payload, the memory used by char arrays goes something like this: 66MB -> 104MB -> 323MB -> 625MB -> 939MB -> 461MB -> 633MB -> 327MB -> 21MB. (I don't know how much to trust {{:live}}: is that definitely the size of char arrays that could not be GC'ed?)

There are two main areas where this memory is consumed.

*Wire Log*

In the jclouds wire log, the following happens (approximately):
* We read the response input stream, writing it to a {{FileBackedOutputStream}} - nice!
* In {{Wire.wire(String, InputStream)}}, we read the {{FileBackedOutputStream}} into a {{StringBuilder}}, and then call {{getWireLog().debug(buffer.toString())}}. The StringBuilder holds the {{char[]}}; the {{toString()}} duplicates it - so two copies in memory. Unfortunately, in the SoftLayer example it is all one huge line, so we log it all in one go.
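The double-copy in the bullet above can be illustrated with a minimal sketch. This is a simplification for illustration only, not the actual jclouds source: the class and method names below are assumed, and the real {{Wire.wire}} reads from a {{FileBackedOutputStream}} and hands the result to {{getWireLog().debug(...)}}.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Simplified sketch (not the actual jclouds code) of the pattern described:
// the response is accumulated in a StringBuilder, then toString() is called,
// so the payload exists twice as char[] at the same time.
public class WireSketch {

    static String wire(InputStream instream) throws IOException {
        StringBuilder buffer = new StringBuilder();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(instream, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            buffer.append(line);      // copy #1: the StringBuilder's char[]
        }
        return buffer.toString();     // copy #2: the String's own char[]
        // (the real code passes this String to getWireLog().debug(...),
        // at which point the logging framework may copy it again)
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "one huge single-line response"
                .getBytes(StandardCharsets.UTF_8);
        String logged = wire(new ByteArrayInputStream(payload));
        System.out.println(logged.length());
    }
}
```

With a one-line 154MB payload, both copies cover the full payload, which matches the char-array spikes seen in the {{jmap}} histogram.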
* I think (but need to dig into it more) that the logging framework (slf4j -> log4j in my case) ends up with multiple copies as well while processing the call to {{log.debug(String)}} - hence peaking at 939MB of char arrays in memory. When the method returns, all these copies can be GC'ed.
* The response payload has now been switched to the {{FileBackedOutputStream}}, so that will be used subsequently.

*Json Parsing*

To parse the HTTP response:
* The response is passed to {{org.jclouds.http.functions.ParseJson.apply(HttpResponse)}}.
* This calls {{json.fromJson(Strings2.toStringAndClose(stream), type)}}.
* The {{Strings2.toStringAndClose}} calls {{CharStreams.toString(new InputStreamReader(input, Charsets.UTF_8))}}. This reads the stream into a StringBuilder, then calls toString - so it holds two copies in memory. This explains the second spike in memory usage (though I'm surprised it went as high as 633MB of char arrays).
* When the method returns, we have our one String.

*Possible Improvements to Wire Log*

{{org.jclouds.logging.internal.Wire}} could be configurable to log only the first X bytes of a response, so that crazily long messages would be truncated in the log.

Alternatively/additionally, {{Wire.wire}} could force splitting a huge line into multiple log messages when calling {{getWireLog().debug(buffer.toString())}}. Again, this could be configurable.

In production usage, I personally would always configure it to truncate: better to miss the end of the response than to risk an {{OutOfMemoryError}}. Note this particular part isn't an issue if jclouds.wire is set to INFO or higher.

*Possible Improvements to Json Parsing*

I think {{ParseJson.apply(InputStream stream, Type type)}} should pass the {{new InputStreamReader(inputStream)}} to {{org.jclouds.json.Json.fromJson()}} (in a try-finally that closes the stream, obviously).
This would require adding a new method to the {{org.jclouds.json.Json}} interface. The {{GsonWrapper}} implementation can handle that easily: it can call {{gson.fromJson(Reader, Type)}} instead of {{gson.fromJson(String, Type)}}. It looks to me like that will parse the stream as it reads it, rather than having to hold the whole string in memory at once.

If we make these improvements, I don't think we'll ever hold the full 154MB char array in memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
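The difference between the two Json-parsing paths described above can be shown with a toy illustration. This is not Gson and not jclouds code - the class, method names, and the minimal "parser" below are invented for the sketch - but it shows why handing the parser a Reader (as {{gson.fromJson(Reader, Type)}} does) avoids the extra full-payload copy that the String-based path makes.

```java
import java.io.*;

// Toy illustration (not Gson, not jclouds code). The string-based path
// materialises the whole payload before parsing (two char[] copies); the
// Reader-based path consumes the stream incrementally with no full copy.
public class StreamingParseSketch {

    // String-based path: analogous to Strings2.toStringAndClose(stream)
    // followed by fromJson(String, Type).
    static int sumViaString(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();            // copy #1
        Reader r = new InputStreamReader(in, "UTF-8");
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        String whole = sb.toString();                      // copy #2
        return sumArray(new StringReader(whole));
    }

    // Reader-based path: analogous to gson.fromJson(Reader, Type).
    static int sumViaReader(InputStream in) throws IOException {
        return sumArray(new InputStreamReader(in, "UTF-8"));
    }

    // Stand-in "parser": sums a JSON array of non-negative ints, e.g. "[1,2,3]".
    static int sumArray(Reader r) throws IOException {
        int sum = 0, num = 0, c;
        boolean inNum = false;
        while ((c = r.read()) != -1) {
            if (c >= '0' && c <= '9') { num = num * 10 + (c - '0'); inNum = true; }
            else if (inNum) { sum += num; num = 0; inNum = false; }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "[1,2,3,4]".getBytes("UTF-8");
        System.out.println(sumViaString(new ByteArrayInputStream(json)));
        System.out.println(sumViaReader(new ByteArrayInputStream(json)));
    }
}
```

Both paths produce the same result, but only the first one ever holds the entire payload in memory - for a 154MB response that is the difference between two 154MB char arrays and a small read buffer.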