[ https://issues.apache.org/jira/browse/SPARK-34607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294566#comment-17294566 ]
Takeshi Yamamuro commented on SPARK-34607: ------------------------------------------ okay, I'll dig into this tomorrow. > NewInstance.resolved should not throw malformed class name error > ---------------------------------------------------------------- > > Key: SPARK-34607 > URL: https://issues.apache.org/jira/browse/SPARK-34607 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.4.7, 3.0.2, 3.1.1 > Reporter: Kris Mok > Priority: Major > > I'd like to seek for community help on fixing this issue: > Related to SPARK-34596, one of our users had hit an issue with > {{ExpressionEncoder}} when running Spark code in a Scala REPL, where > {{NewInstance.resolved}} was throwing {{"Malformed class name"}} error, with > the following kind of stack trace: > {code} > java.lang.InternalError: Malformed class name > at java.lang.Class.getSimpleBinaryName(Class.java:1450) > at java.lang.Class.isMemberClass(Class.java:1433) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved$lzycompute(objects.scala:447) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved(objects.scala:441) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionBottomUp(Analyzer.scala:1935) > ... > Caused by: sbt.ForkMain$ForkError: > java.lang.StringIndexOutOfBoundsException: String index out of range: -83 > at java.lang.String.substring(String.java:1931) > at java.lang.Class.getSimpleBinaryName(Class.java:1448) > at java.lang.Class.isMemberClass(Class.java:1433) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved$lzycompute(objects.scala:447) > at > org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved(objects.scala:441) > ... > {code} > The most important point in the stack trace is this: > {code} > java.lang.InternalError: Malformed class name > at java.lang.Class.getSimpleBinaryName(Class.java:1450) > at java.lang.Class.isMemberClass(Class.java:1433) > {code} > The most common way to hit the {{"Malformed class name"}} issue in Spark is > via {{java.lang.Class.getSimpleName}}. But as this stack trace demonstrates, > it can happen via other code paths from the JDK as well. > If we want to fix it in a similar fashion as {{Utils.getSimpleName}}, we'd > have to emulate {{java.lang.Class.isMemberClass}} in Spark's {{Utils}}, and > then use it in the {{NewInstance.resolved}} code path. > Here's a reproducer test case (in diff form against Spark master's > {{ExpressionEncoderSuite}} ), which uses explicit nested classes to emulate > the code structure that'd be generated by Scala's REPL: > (the level of nesting can be further reduced and still reproduce the issue) > {code} > diff --git > a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala > > b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala > index 2635264..fd1b23d 100644 > --- > a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala > +++ > b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala > @@ -217,6 +217,95 @@ class ExpressionEncoderSuite extends > CodegenInterpretedPlanTest with AnalysisTes > "nested Scala class should work") > } > > + object OuterLevelWithVeryVeryVeryLongClassName1 { > + object OuterLevelWithVeryVeryVeryLongClassName2 { > + object OuterLevelWithVeryVeryVeryLongClassName3 { > + object OuterLevelWithVeryVeryVeryLongClassName4 { > + object OuterLevelWithVeryVeryVeryLongClassName5 { > + object OuterLevelWithVeryVeryVeryLongClassName6 { > + object OuterLevelWithVeryVeryVeryLongClassName7 { > + object OuterLevelWithVeryVeryVeryLongClassName8 { > + object OuterLevelWithVeryVeryVeryLongClassName9 { > + object OuterLevelWithVeryVeryVeryLongClassName10 { > + object OuterLevelWithVeryVeryVeryLongClassName11 { > + object OuterLevelWithVeryVeryVeryLongClassName12 { > + object OuterLevelWithVeryVeryVeryLongClassName13 { > + object OuterLevelWithVeryVeryVeryLongClassName14 > { > + object > OuterLevelWithVeryVeryVeryLongClassName15 { > + object > OuterLevelWithVeryVeryVeryLongClassName16 { > + object > OuterLevelWithVeryVeryVeryLongClassName17 { > + object > OuterLevelWithVeryVeryVeryLongClassName18 { > + object > OuterLevelWithVeryVeryVeryLongClassName19 { > + object > OuterLevelWithVeryVeryVeryLongClassName20 { > + case class > MalformedNameExample2(x: Int) > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + } > + > + { > + OuterScopes.addOuterScope( > + OuterLevelWithVeryVeryVeryLongClassName1 > + .OuterLevelWithVeryVeryVeryLongClassName2 > + .OuterLevelWithVeryVeryVeryLongClassName3 > + .OuterLevelWithVeryVeryVeryLongClassName4 > + .OuterLevelWithVeryVeryVeryLongClassName5 > + .OuterLevelWithVeryVeryVeryLongClassName6 > + .OuterLevelWithVeryVeryVeryLongClassName7 > + .OuterLevelWithVeryVeryVeryLongClassName8 > + .OuterLevelWithVeryVeryVeryLongClassName9 > + .OuterLevelWithVeryVeryVeryLongClassName10 > + .OuterLevelWithVeryVeryVeryLongClassName11 > + .OuterLevelWithVeryVeryVeryLongClassName12 > + .OuterLevelWithVeryVeryVeryLongClassName13 > + .OuterLevelWithVeryVeryVeryLongClassName14 > + .OuterLevelWithVeryVeryVeryLongClassName15 > + .OuterLevelWithVeryVeryVeryLongClassName16 > + .OuterLevelWithVeryVeryVeryLongClassName17 > + .OuterLevelWithVeryVeryVeryLongClassName18 > + .OuterLevelWithVeryVeryVeryLongClassName19 > + .OuterLevelWithVeryVeryVeryLongClassName20) > + encodeDecodeTest( > + OuterLevelWithVeryVeryVeryLongClassName1 > + .OuterLevelWithVeryVeryVeryLongClassName2 > + .OuterLevelWithVeryVeryVeryLongClassName3 > + .OuterLevelWithVeryVeryVeryLongClassName4 > + .OuterLevelWithVeryVeryVeryLongClassName5 > + .OuterLevelWithVeryVeryVeryLongClassName6 > + .OuterLevelWithVeryVeryVeryLongClassName7 > + .OuterLevelWithVeryVeryVeryLongClassName8 > + .OuterLevelWithVeryVeryVeryLongClassName9 > + .OuterLevelWithVeryVeryVeryLongClassName10 > + .OuterLevelWithVeryVeryVeryLongClassName11 > + .OuterLevelWithVeryVeryVeryLongClassName12 > + .OuterLevelWithVeryVeryVeryLongClassName13 > + .OuterLevelWithVeryVeryVeryLongClassName14 > + .OuterLevelWithVeryVeryVeryLongClassName15 > + .OuterLevelWithVeryVeryVeryLongClassName16 > + .OuterLevelWithVeryVeryVeryLongClassName17 > + .OuterLevelWithVeryVeryVeryLongClassName18 > + .OuterLevelWithVeryVeryVeryLongClassName19 > + .OuterLevelWithVeryVeryVeryLongClassName20 > + .MalformedNameExample2(42), > + "deeply nested Scala class should work") > + } > + > productTest(PrimitiveData(1, 1, 1, 1, 1, 1, true)) > > productTest( > {code} > The reason why the test case above is so convoluted is in the way Scala > generates the class name for nested classes. In general, Scala generates a > class name for a nested class by inserting the dollar-sign ( {{$}} ) in > between each level of class nesting. The problem is that this format can > concatenate into a very long string that goes beyond certain limits, so Scala > will change the class name format beyond certain length threshold. > For the example above, we can see that the first two levels of class nesting > have class names that look like this: > {code} > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$ > {code} > If we leave out the fact that Scala uses a dollar-sign ( {{$}} ) suffix for > the class name of the companion object, > {{OuterLevelWithVeryVeryVeryLongClassName1}}'s full name is a prefix > (substring) of {{OuterLevelWithVeryVeryVeryLongClassName2}}. > But if we keep going deeper into the levels of nesting, you'll find names > that look like: > {code} > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$OuterLevelWithVeryVeryVeryLongClassName11$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$OuterLevelWithVeryVeryVeryLongClassName14$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$OuterLevelWithVeryVeryVeryLongClassName17$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$ > org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$OuterLevelWithVeryVeryVeryLongClassName20$ > {code} > with a hash code in the middle and various levels of nesting omitted. > The {{java.lang.Class.isMemberClass}} method is implemented in JDK8u as: > http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/tip/src/share/classes/java/lang/Class.java#l1425 > {code:java} > /** > * Returns {@code true} if and only if the underlying class > * is a member class. > * > * @return {@code true} if and only if this class is a member class. > * @since 1.5 > */ > public boolean isMemberClass() { > return getSimpleBinaryName() != null && !isLocalOrAnonymousClass(); > } > /** > * Returns the "simple binary name" of the underlying class, i.e., > * the binary name without the leading enclosing class name. > * Returns {@code null} if the underlying class is a top level > * class. > */ > private String getSimpleBinaryName() { > Class<?> enclosingClass = getEnclosingClass(); > if (enclosingClass == null) // top level class > return null; > // Otherwise, strip the enclosing class' name > try { > return getName().substring(enclosingClass.getName().length()); > } catch (IndexOutOfBoundsException ex) { > throw new InternalError("Malformed class name", ex); > } > } > {code} > and the problematic code is > {{getName().substring(enclosingClass.getName().length())}} -- if a class's > enclosing class's full name is *longer* than the nested class's full name, > this logic would end up going out of bounds. > The bug has been fixed in JDK9 by > https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8057919 , but still > exists in the latest JDK8u release. So from the Spark side we'd need to do > something to avoid hitting this problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org