[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated SPARK-47959:
-------------------------------
Description:
We have a Spark executor that runs 32 worker threads in parallel. The query is a simple SELECT with several `GET_JSON_OBJECT` UDF calls. We noticed that 80+% of the worker threads were blocked with the following stack trace:
{code:java}
com.fasterxml.jackson.core.util.InternCache.intern(InternCache.java:50) - blocked on java.lang.Object@7529fde1
com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.addName(ByteQuadsCanonicalizer.java:947)
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.addName(UTF8StreamJsonParser.java:2482)
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.findName(UTF8StreamJsonParser.java:2339)
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseMediumName(UTF8StreamJsonParser.java:1870)
com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1825)
com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:798)
com.fasterxml.jackson.core.base.ParserMinimalBase.skipChildren(ParserMinimalBase.java:240)
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:383)
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.evaluatePath(jsonExpressions.scala:287)
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4(jsonExpressions.scala:198)
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase.$anonfun$eval$4$adapted(jsonExpressions.scala:196)
org.apache.spark.sql.catalyst.expressions.GetJsonObjectBase$$Lambda$8585/1316745697.apply(Unknown Source)
...
{code}
Apparently jackson-core has had this performance bug from version 2.3 through at least 2.15; it is not fixed until version 2.18 (unreleased):
[https://github.com/FasterXML/jackson-core/blob/fc51d1e13f4ba62a25a739f26be9e05aaad88c3e/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L50]
{code:java}
synchronized (lock) {
    if (size() >= MAX_ENTRIES) {
        clear();
    }
}
{code}
instead of
[https://github.com/FasterXML/jackson-core/blob/8b87cc1a96f649a7e7872c5baa8cf97909cabf6b/src/main/java/com/fasterxml/jackson/core/util/InternCache.java#L59]
{code:java}
/* As of 2.18, the limit is not strictly enforced, but we do try to
 * clear entries if we have reached the limit. We do not expect to
 * go too much over the limit, and if we do, it's not a huge problem.
 * If some other thread has the lock, we will not clear but the lock should
 * not be held for long, so another thread should be able to clear in the near future.
 */
if (lock.tryLock()) {
    try {
        if (size() >= DEFAULT_MAX_ENTRIES) {
            clear();
        }
    } finally {
        lock.unlock();
    }
}
{code}

> Improve GET_JSON_OBJECT performance on executors running multiple tasks
> -----------------------------------------------------------------------
>
> Key: SPARK-47959
> URL: https://issues.apache.org/jira/browse/SPARK-47959
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.5.1
> Reporter: Zheng Shao
> Priority: Major
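The 2.18 change replaces the blocking critical section with a non-blocking `tryLock()`: a thread that loses the race simply skips the cleanup instead of queuing behind every other parser thread. Below is a minimal, stdlib-only sketch contrasting the two patterns; the class, field, and method names are illustrative stand-ins, not jackson's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Toy stand-in for jackson-core's InternCache: a bounded canonicalizing map.
 * Contrasts the pre-2.18 blocking size check with the 2.18-style
 * non-blocking one. Names are illustrative, not jackson's actual code.
 */
class BoundedInternCache extends ConcurrentHashMap<String, String> {
    private static final int MAX_ENTRIES = 180; // small bound, so the check fires often
    private final ReentrantLock lock = new ReentrantLock();

    /** Pre-2.18 pattern: every thread that reaches the size check queues on the lock. */
    public String internBlocking(String input) {
        String result = get(input);
        if (result != null) {
            return result;
        }
        lock.lock(); // all worker threads serialize here under load
        try {
            if (size() >= MAX_ENTRIES) {
                clear();
            }
        } finally {
            lock.unlock();
        }
        result = input.intern();
        put(result, result);
        return result;
    }

    /** 2.18-style pattern: whoever wins tryLock() clears; everyone else skips the cleanup. */
    public String internNonBlocking(String input) {
        String result = get(input);
        if (result != null) {
            return result;
        }
        if (lock.tryLock()) { // non-blocking: contended threads just move on
            try {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            } finally {
                lock.unlock();
            }
        }
        result = input.intern();
        put(result, result);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedInternCache cache = new BoundedInternCache();
        // More distinct keys than MAX_ENTRIES, so the size check fires constantly.
        Thread[] workers = new Thread[8];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    cache.internNonBlocking("field_" + (i % 500));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("done; size=" + cache.size());
    }
}
```

With many parser threads and more distinct field names than the cache bound, the blocking variant serializes every cache miss on a single lock, which matches the stack traces above; the `tryLock()` variant lets contended threads proceed and intern without waiting for the cleanup.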
--
This message was sent by Atlassian Jira
(v8.20.10#820010)