[ https://issues.apache.org/jira/browse/JCLOUDS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600074#comment-15600074 ]
Ignasi Barrera commented on JCLOUDS-1187: ----------------------------------------- Thanks for the detailed report [~aled.sage]. It is pretty much appreciated. I agree with Andrea and would say that wire logs at debug level are not meant for production. They are too verbose and often carry sensitive information. Apart from that, they may be convenient for compute apis, but when you use the blob store providers, payloads usually contain binary data that just pollutes the logs. Wire logs are fine for debugging purposes during short amounts of time, but should be generally disabled. Most logging frameworks support hot reloading of the configuration file, so enabling the wire logs can be done in most cases even without stopping the application. I've opened a PR with your suggested improvement to the default json serialization: https://github.com/jclouds/jclouds/pull/1025 Could you give it a try and report back if disabling the wire logs and applying the patch fix the issue? This would be a nice to have for 2.0. > Avoid excessive memory usage when processing massive http response message > -------------------------------------------------------------------------- > > Key: JCLOUDS-1187 > URL: https://issues.apache.org/jira/browse/JCLOUDS-1187 > Project: jclouds > Issue Type: Improvement > Components: jclouds-core > Affects Versions: 1.9.2 > Reporter: Aled Sage > > With jclouds 1.9.3-SNAPSHOT and 2.0.0-SNAPSHOT (and all GA versions)... > If jclouds receives a crazily big http response (e.g. see > https://github.com/jclouds/jclouds/pull/1020), then jclouds consumes a huge > amount of memory while processing that response. It holds multiple copies of > the response in-memory at the same time. > As reported in https://issues.apache.org/jira/browse/BROOKLYN-364, the memory > usage (according to {{jmap -histo:live <pid>}}) is most char arrays (which is > what backs the StringBuilder and String). When processing a 154MB response > payload, the memory used by char arrays goes something like this: 66MB -> > 104MB -> 323MB -> 625MB -> 939MB -> 461MB -> 633MB -> 327MB -> 21MB. > (I don't know how much to believe the {{:live}}: is that definitely the size > of char arrays that could not be GC'ed?) > There are two main areas where this memory is consumed. > *Wire Log* > In the jclouds wire log, the following happens (approximately?!): > * We read the response input stream, writing it to a > {{FileBackedOutputStream}} - nice! > * In {{Wire.wire(String, InputStream)}}, we read the > {{FileBackedOutputStream}} into a {{StringBuilder}}, and then call > {{getWireLog().debug(buffer.toString())}}: > The StringBuilder holds the {{char[]}}; the toString() duplicates it - so > two copies in memory. > Unfortunately in the Softlayer example, it's all one huge line, so we > logged it all in one go. > I think (but need to dig into it more) that the logging framework (slf4j -> > log4j in my case) ends up with multiple copies as well, while processing the > call to {{log.debug(String)}}. (Hence peaking at 939MB of char arrays in > memory). > When the method returns, all these copies can be GC'ed. > * The response payload has now been switched to the > {{FileBackedOutputStream}}, so that will be used subsequently. > *Json Parsing* > To parse the HTTP response: > * The response is passed to > {{org.jclouds.http.functions.ParseJson.apply(HttpResponse)}} > * This calls {{json.fromJson(Strings2.toStringAndClose(stream), type)}}. > * The {{Strings2.toStringAndClose}} calls {{CharStreams.toString(new > InputStreamReader(input, Charsets.UTF_8))}}. > This reads the stream into a StringBuilder, then calls toString - so holds > two copies in memory. > This explains the second spike in memory usage (though I'm surprised it > went as high as 633MB of char arrays in memory). > When the method returns, we have our one String. > *Possible Improvements to Wire Log* > {{org.jclouds.logging.internal.Wire}} could be configurable to only log the > first X bytes of a response (so crazily long messages would be truncated in > the log). > Alternatively/additionally, {{Wire.wire}} could force splitting a huge line > into multiple log messages when calling > {{getWireLog().debug(buffer.toString())}}. Again this could be configurable. > In production usage, I personally would always configure it to truncate: > better to miss the end of the response rather than risk an > {{OutOfMemoryError}}. Note this particular part isn't an issue if > jclouds.wire is set to INFO or higher. > *Possible Improvements to Json Parsing* > I think {{ParseJson.apply(InputStream stream, Type type)}} should pass the > {{new InputStreamReader(inputStream)}} to > {{org.jclouds.json.Json.fromJson()}} (in a try-finally to close the stream, > obviously). > This would require adding a new method to the interface > {{org.jclouds.json.Json}}. > The implementation of {{GsonWrapper}} can handle that easily: it can call > {{gson.fromJson(Reader, Type)}} instead of {{gson.fromJson(String, Type)}}. > It looks to me like that will parse the stream as it reads it, rather than > having to hold the whole string in memory at once. > If we do these improvements, I don't think we'll ever hold the full 154MB > char array in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)