[ https://issues.apache.org/jira/browse/JCLOUDS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrea Turli updated JCLOUDS-1187:
----------------------------------
Fix Version/s: (was: 1.9.3)
               2.0.0

Avoid excessive memory usage when processing massive http response message
--------------------------------------------------------------------------

Key: JCLOUDS-1187
URL: https://issues.apache.org/jira/browse/JCLOUDS-1187
Project: jclouds
Issue Type: Improvement
Components: jclouds-core
Affects Versions: 1.9.2
Reporter: Aled Sage
Fix For: 2.0.0

With jclouds 1.9.3-SNAPSHOT and 2.0.0-SNAPSHOT (and all GA versions)...

If jclouds receives a crazily big http response (e.g. see https://github.com/jclouds/jclouds/pull/1020), then jclouds consumes a huge amount of memory while processing that response: it holds multiple copies of the response in memory at the same time.

As reported in https://issues.apache.org/jira/browse/BROOKLYN-364, the memory usage (according to {{jmap -histo:live <pid>}}) is mostly char arrays (which is what backs StringBuilder and String). When processing a 154MB response payload, the memory used by char arrays goes something like this: 66MB -> 104MB -> 323MB -> 625MB -> 939MB -> 461MB -> 633MB -> 327MB -> 21MB. (I don't know how much to trust {{:live}}: is that definitely the size of char arrays that could not be GC'ed?)

There are two main areas where this memory is consumed.

*Wire Log*

In the jclouds wire log, the following happens (approximately):
* We read the response input stream, writing it to a {{FileBackedOutputStream}} - nice!
* In {{Wire.wire(String, InputStream)}}, we read the {{FileBackedOutputStream}} into a {{StringBuilder}}, and then call {{getWireLog().debug(buffer.toString())}}. The StringBuilder holds the {{char[]}}; the {{toString()}} duplicates it - so two copies in memory. Unfortunately, in the SoftLayer example it is all one huge line, so we log it all in one go.
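The double-copy in the bullet above can be illustrated with a minimal sketch. This is a simplification for illustration only, not the actual jclouds source: the class and method names below are assumed, and the real {{Wire.wire}} reads from a {{FileBackedOutputStream}} and hands the result to {{getWireLog().debug(...)}}.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Simplified sketch (not the actual jclouds code) of the pattern described:
// the response is accumulated in a StringBuilder, then toString() is called,
// so the payload exists twice as char[] at the same time.
public class WireSketch {

    static String wire(InputStream instream) throws IOException {
        StringBuilder buffer = new StringBuilder();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(instream, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            buffer.append(line);      // copy #1: the StringBuilder's char[]
        }
        return buffer.toString();     // copy #2: the String's own char[]
        // (the real code passes this String to getWireLog().debug(...),
        // at which point the logging framework may copy it again)
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "one huge single-line response"
                .getBytes(StandardCharsets.UTF_8);
        String logged = wire(new ByteArrayInputStream(payload));
        System.out.println(logged.length());
    }
}
```

With a one-line 154MB payload, both copies cover the full payload, which matches the char-array spikes seen in the {{jmap}} histogram.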
* I think (but need to dig into it more) that the logging framework (slf4j -> log4j in my case) ends up with multiple copies as well while processing the call to {{log.debug(String)}} - hence peaking at 939MB of char arrays in memory. When the method returns, all these copies can be GC'ed.
* The response payload has now been switched to the {{FileBackedOutputStream}}, so that will be used subsequently.

*Json Parsing*

To parse the HTTP response:
* The response is passed to {{org.jclouds.http.functions.ParseJson.apply(HttpResponse)}}.
* This calls {{json.fromJson(Strings2.toStringAndClose(stream), type)}}.
* The {{Strings2.toStringAndClose}} calls {{CharStreams.toString(new InputStreamReader(input, Charsets.UTF_8))}}. This reads the stream into a StringBuilder, then calls toString - so it holds two copies in memory. This explains the second spike in memory usage (though I'm surprised it went as high as 633MB of char arrays).
* When the method returns, we have our one String.

*Possible Improvements to Wire Log*

{{org.jclouds.logging.internal.Wire}} could be configurable to log only the first X bytes of a response, so that crazily long messages would be truncated in the log.

Alternatively/additionally, {{Wire.wire}} could force splitting a huge line into multiple log messages when calling {{getWireLog().debug(buffer.toString())}}. Again, this could be configurable.

In production usage, I personally would always configure it to truncate: better to miss the end of the response than to risk an {{OutOfMemoryError}}. Note this particular part isn't an issue if jclouds.wire is set to INFO or higher.

*Possible Improvements to Json Parsing*

I think {{ParseJson.apply(InputStream stream, Type type)}} should pass the {{new InputStreamReader(inputStream)}} to {{org.jclouds.json.Json.fromJson()}} (in a try-finally that closes the stream, obviously).
This would require adding a new method to the {{org.jclouds.json.Json}} interface. The {{GsonWrapper}} implementation can handle that easily: it can call {{gson.fromJson(Reader, Type)}} instead of {{gson.fromJson(String, Type)}}. It looks to me like that will parse the stream as it reads it, rather than having to hold the whole string in memory at once.

If we make these improvements, I don't think we'll ever hold the full 154MB char array in memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
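The difference between the two Json-parsing paths described above can be shown with a toy illustration. This is not Gson and not jclouds code - the class, method names, and the minimal "parser" below are invented for the sketch - but it shows why handing the parser a Reader (as {{gson.fromJson(Reader, Type)}} does) avoids the extra full-payload copy that the String-based path makes.

```java
import java.io.*;

// Toy illustration (not Gson, not jclouds code). The string-based path
// materialises the whole payload before parsing (two char[] copies); the
// Reader-based path consumes the stream incrementally with no full copy.
public class StreamingParseSketch {

    // String-based path: analogous to Strings2.toStringAndClose(stream)
    // followed by fromJson(String, Type).
    static int sumViaString(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();            // copy #1
        Reader r = new InputStreamReader(in, "UTF-8");
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        String whole = sb.toString();                      // copy #2
        return sumArray(new StringReader(whole));
    }

    // Reader-based path: analogous to gson.fromJson(Reader, Type).
    static int sumViaReader(InputStream in) throws IOException {
        return sumArray(new InputStreamReader(in, "UTF-8"));
    }

    // Stand-in "parser": sums a JSON array of non-negative ints, e.g. "[1,2,3]".
    static int sumArray(Reader r) throws IOException {
        int sum = 0, num = 0, c;
        boolean inNum = false;
        while ((c = r.read()) != -1) {
            if (c >= '0' && c <= '9') { num = num * 10 + (c - '0'); inNum = true; }
            else if (inNum) { sum += num; num = 0; inNum = false; }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "[1,2,3,4]".getBytes("UTF-8");
        System.out.println(sumViaString(new ByteArrayInputStream(json)));
        System.out.println(sumViaReader(new ByteArrayInputStream(json)));
    }
}
```

Both paths produce the same result, but only the first one ever holds the entire payload in memory - for a 154MB response that is the difference between two 154MB char arrays and a small read buffer.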