[ https://issues.apache.org/jira/browse/AVRO-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553598#comment-17553598 ]
tansion commented on AVRO-3531:
-------------------------------

It is true that the issue is not easy to reproduce (the same code had been running well for more than two years before I met this issue).

{code:java}
/**
 * Returns index for Object x.
 */
private static int hash(Object x, int length) {
  int h = System.identityHashCode(x);
  // Multiply by -127, and left-shift to use least bit as part of hash
  return ((h << 1) - (h << 8)) & (length - 1);
}

/**
 * Circularly traverses table of size len.
 */
private static int nextKeyIndex(int i, int len) {
  return (i + 2 < len ? i + 2 : 0);
}

public V put(K key, V value) {
  final Object k = maskNull(key);

  retryAfterResize: for (;;) {
    final Object[] tab = table;
    final int len = tab.length;
    int i = hash(k, len);

    for (Object item; (item = tab[i]) != null; i = nextKeyIndex(i, len)) {
      if (item == k) {
        @SuppressWarnings("unchecked")
        V oldValue = (V) tab[i + 1];
        tab[i + 1] = value;
        return oldValue;
      }
    }

    final int s = size + 1;
    // Use optimized form of 3 * s.
    // Next capacity is len, 2 * current capacity.
    if (s + (s << 1) > len && resize(len))
      continue retryAfterResize;

    modCount++;
    tab[i] = k;
    tab[i + 1] = value;
    size = s;
    return null;
  }
}
{code}

The initial capacity is 32, which gives a backing table of 64 slots (each entry occupies two). Only when enough key-value pairs (32, or past the 2/3 resize threshold) are inserted into the map concurrently, and each key's hash happens to land on a different slot, can the resize operation be skipped and every key slot end up occupied. From then on, every get() for a missing key goes into an infinite loop, because the probe only terminates on a null slot.
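For what it's worth, the race window can be made concrete with a small standalone sketch (hypothetical class and method names of mine, with the probing logic copied from the JDK code quoted above):

```java
// Hypothetical standalone sketch (my names, not Avro/JDK source): it copies
// the probing logic quoted above to show why a completely full table makes
// get() spin forever.
public class IdentityProbeSketch {

    // Same hash as the quoted IdentityHashMap code; key slots are the even
    // indices of the table, and the result is always even.
    static int hash(Object x, int length) {
        int h = System.identityHashCode(x);
        return ((h << 1) - (h << 8)) & (length - 1);
    }

    // Same circular probe: steps through every even slot, then wraps to 0.
    static int nextKeyIndex(int i, int len) {
        return (i + 2 < len ? i + 2 : 0);
    }

    // Smallest size s at which put() resizes a table of length len
    // (the quoted condition: s + (s << 1) > len, i.e. 3 * s > len).
    static int resizeThreshold(int len) {
        int s = 1;
        while (s + (s << 1) <= len) {
            s++;
        }
        return s;
    }

    public static void main(String[] args) {
        // Expected size 32 means a table of length 64 (two slots per entry);
        // a correctly serialized put() resizes at the 22nd insertion.
        System.out.println(resizeThreshold(64)); // prints 22

        // The probe sequence from any start visits all 32 key slots and then
        // cycles back to the start; with no null slot left (a state only
        // reachable through a lost resize under racing puts), the probe loop
        // in get() has no exit condition.
        int len = 64;
        int start = hash(new Object(), len);
        int visited = 0;
        int i = start;
        do {
            visited++;
            i = nextKeyIndex(i, len);
        } while (i != start);
        System.out.println(visited); // prints 32
    }
}
```

So a single lost resize is enough to turn every subsequent lookup of an absent key into a busy loop, exactly matching the RUNNABLE threads pinned in IdentityHashMap.get().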
> GenericDatumReader in multithread lead to infinite loop cause misused of IdentityHashMap
> ----------------------------------------------------------------------------------------
>
>                 Key: AVRO-3531
>                 URL: https://issues.apache.org/jira/browse/AVRO-3531
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: tansion
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi,
> I am working on a Java project that uses Kafka with Avro serialization/deserialization in a messaging platform.
> In the production environment, we hit a serious issue in the deserialization process: the GenericDatumReader somehow gets into an infinite loop, and it happens occasionally.
> When the issue happens, the thread stack looks like this:
>
> {code:java}
> "DmqFixedRateConsumer-Thread-17" #453 daemon prio=5 os_prio=0 tid=0x00007f2ae1832800 nid=0xef49 runnable [0x00007f2a743fc000]
>    java.lang.Thread.State: RUNNABLE
>     at java.util.IdentityHashMap.get(IdentityHashMap.java:337)
>     at org.apache.avro.generic.GenericDatumReader.getStringClass(GenericDatumReader.java:503)
>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:454)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>     at com.xxx.xxx.xxx.xxx.xxx.XXX.deserialize(XXX.java:252)
>     at com.xxx.xxx.xxx.xxx.xxx.ZZZ.deserialize(ZZZ.java:216)
>     at com.xxx.xxx.xxx.xxx.xxx.SSS.processMessage(SSS.java:152)
>     at com.xxx.xxx.xxx.xxx.xxx.SSS.loopProcess(SSS.java:127)
>     at com.xxx.xxx.xxx.xxx.xxx.SSS$$Lambda$172/367082698.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> We created 30 threads, and all of them show the same stack as above! They are all stuck in the IdentityHashMap.get() method.
>
> According to this mail, [1.7.6 Slow Deserialization|https://www.mail-archive.com/user@avro.apache.org/msg02902.html], the Reader is thread-safe. But actually, it seems it is not.
> Why?
> org.apache.avro.generic.GenericDatumReader#getStringClass
>
> {code:java}
> /**
>  * Called to read strings. Subclasses may override to use a different string
>  * representation. By default, this calls {@link #readString(Object,Decoder)}.
>  */
> protected Object readString(Object old, Schema expected, Decoder in) throws IOException {
>   Class stringClass = getStringClass(expected);
>   if (stringClass == String.class) {
>     return in.readString();
>   }
>   if (stringClass == CharSequence.class) {
>     return readString(old, in);
>   }
>   return newInstanceFromString(stringClass, in.readString());
> }
>
> private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
>
> private Class getStringClass(Schema s) {
>   Class c = stringClassCache.get(s);
>   if (c == null) {
>     c = findStringClass(s);
>     stringClassCache.put(s, c);
>   }
>   return c;
> }
> {code}
>
> The IdentityHashMap is not thread-safe, which its Javadoc states clearly! Just like the well-known HashMap infinite-loop issue under multithreaded use, the same problem happens with IdentityHashMap, too.
> My question is: can the class GenericDatumReader be fixed so that it is really thread-safe? Or do we need to avoid sharing a single instance of GenericDatumReader across threads?
> Thanks a lot,
> Xtsong.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
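On the workaround question above, one possible mitigation is to keep the identity-keyed cache but make access to it thread-safe. A minimal sketch, assuming nothing about the actual Avro fix (`SchemaKey`, `SafeIdentityCache`, and `findStringClass` are all stand-in names of mine):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative only: SchemaKey stands in for org.apache.avro.Schema, and
// findStringClass for the real lookup. The point is the wrapped map.
public class SafeIdentityCache {
    static final class SchemaKey { }

    // synchronizedMap serializes all access, including the computeIfAbsent
    // default method, so a half-finished put is never visible to readers
    // and the table can never be left full without a resize.
    private final Map<SchemaKey, Class<?>> cache =
        Collections.synchronizedMap(new IdentityHashMap<>());

    Class<?> getStringClass(SchemaKey s) {
        return cache.computeIfAbsent(s, this::findStringClass);
    }

    private Class<?> findStringClass(SchemaKey s) {
        return String.class; // stand-in result
    }
}
```

Another common mitigation is to not share the reader at all, e.g. one GenericDatumReader per thread (via a ThreadLocal or a per-thread field), which sidesteps the cache race entirely.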