[ https://issues.apache.org/jira/browse/AVRO-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White resolved AVRO-1881.
-----------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
 Fix Version/s: 1.9.0

I just committed this. Thanks Nandor!

> Avro (Java) Memory Leak when reusing JsonDecoder instance
> ---------------------------------------------------------
>
>                 Key: AVRO-1881
>                 URL: https://issues.apache.org/jira/browse/AVRO-1881
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.1
>         Environment: Ubuntu 15.04
>                      Oracle 1.8.0_91 and OpenJDK 1.8.0_45
>            Reporter: Matt Allen
>            Assignee: Nandor Kollar
>             Fix For: 1.9.0
>
>
> {{JsonDecoder}} maintains state for each record decoded, leading to a memory leak if the same instance is used for multiple inputs. Using {{JsonDecoder.configure}} to change the input does not correctly clean up the state stored in {{JsonDecoder.reorderBuffers}}, which leads to an unbounded number of {{ReorderBuffer}} instances being accumulated. If a new {{JsonDecoder}} is created for each input there is no memory leak, but it is significantly more expensive than reusing the same instance.
> This problem seems to only occur when the input schema contains a record, which is consistent with the {{reorderBuffers}} being the source of the leak. My first look at the {{JsonDecoder}} code leads me to believe that the {{reorderBuffers}} stack should be empty after a record is fully processed, so there may be other behavior at play here.
> The following is a minimal example which will exhaust a 50MB heap (-Xmx50m) after about 5.25 million iterations. The first section demonstrates that no memory leak is encountered when creating a fresh {{JsonDecoder}} instance for each input.
> {code:title=JsonDecoderMemoryLeak.java|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.*;
> import org.apache.avro.generic.*;
> import java.io.IOException;
>
> public class JsonDecoderMemoryLeak {
>     public static DecoderFactory decoderFactory = DecoderFactory.get();
>
>     public static JsonDecoder createDecoder(String input, Schema schema) throws IOException {
>         return decoderFactory.jsonDecoder(schema, input);
>     }
>
>     public static Object decodeAvro(String input, Schema schema, JsonDecoder decoder) throws IOException {
>         if (decoder == null) {
>             decoder = createDecoder(input, schema);
>         } else {
>             decoder.configure(input);
>         }
>         GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>         return reader.read(null, decoder);
>     }
>
>     public static Schema.Parser parser = new Schema.Parser();
>     public static Schema schema = parser.parse("{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");
>
>     public static String record(long i) {
>         StringBuilder builder = new StringBuilder("{\"field1\": ");
>         builder.append(i);
>         builder.append("}");
>         return builder.toString();
>     }
>
>     public static void main(String[] args) throws IOException {
>         // No memory issues when creating a new decoder for each record
>         System.out.println("Running with fresh JsonDecoder instances for 6000000 iterations");
>         for (long i = 0; i < 6000000; i++) {
>             decodeAvro(record(i), schema, null);
>         }
>
>         // Runs out of memory after ~5250000 records
>         System.out.println("Running with a single reused JsonDecoder instance");
>         long count = 0;
>         try {
>             JsonDecoder decoder = createDecoder(record(0), schema);
>             while (true) {
>                 decodeAvro(record(count), schema, decoder);
>                 count++;
>             }
>         } catch (OutOfMemoryError e) {
>             System.out.println("Out of memory after " + count + " records");
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:title=Output|borderStyle=solid}
> $ java -Xmx50m -jar json-decoder-memory-leak.jar
> Running with fresh JsonDecoder instances for 6000000 iterations
> Running with a single reused JsonDecoder instance
> Out of memory after 5242880 records
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:3210)
>     at java.util.Arrays.copyOf(Arrays.java:3181)
>     at java.util.Vector.grow(Vector.java:266)
>     at java.util.Vector.ensureCapacityHelper(Vector.java:246)
>     at java.util.Vector.addElement(Vector.java:620)
>     at java.util.Stack.push(Stack.java:67)
>     at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:487)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
>     at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:178)
>     at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:162)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>     at com.spiceworks.App.decodeAvro(App.java:25)
>     at com.spiceworks.App.main(App.java:52)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
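The leak pattern described above amounts to a reusable decoder that pushes per-record bookkeeping onto a stack but never discards it when pointed at a new input, so the fix is to reset that state in {{configure}}. The following is a minimal, self-contained sketch of that pattern only; the class and method names ({{ReusableDecoderSketch}}, {{pendingState}}) are illustrative, and this is not the actual Avro source or the committed patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the state-reset pattern behind the fix (illustrative, not Avro code).
// A reusable decoder that stacks per-record state must clear it when it is
// reconfigured for a new input; otherwise stale entries accumulate without bound,
// analogous to JsonDecoder.reorderBuffers growing across configure() calls.
public class ReusableDecoderSketch {
    // Stands in for JsonDecoder.reorderBuffers.
    private final Deque<Object> reorderBuffers = new ArrayDeque<>();

    // Stands in for JsonDecoder.configure(String): point at a new input
    // and discard any state left over from the previous one.
    public void configure(String input) {
        reorderBuffers.clear(); // without this, the stack grows on every reuse
    }

    // Simulates decoding one record that leaves an entry behind,
    // as the buggy JsonDecoder did for record schemas.
    public void decodeRecord() {
        reorderBuffers.push(new Object());
    }

    // Number of stale per-record entries currently retained.
    public int pendingState() {
        return reorderBuffers.size();
    }

    public static void main(String[] args) {
        ReusableDecoderSketch decoder = new ReusableDecoderSketch();
        for (long i = 0; i < 1_000_000; i++) {
            decoder.configure("{\"field1\": " + i + "}");
            decoder.decodeRecord();
        }
        // Because configure() clears the stack, at most one record's state is live,
        // regardless of how many inputs the decoder has been reused for.
        System.out.println("pending state after 1000000 records: " + decoder.pendingState());
    }
}
```

Dropping the {{clear()}} call reproduces the unbounded growth the reporter observed: the stack ends up with one entry per decoded record instead of at most one in total.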