[ 
https://issues.apache.org/jira/browse/SPARK-34607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294566#comment-17294566
 ] 

Takeshi Yamamuro commented on SPARK-34607:
------------------------------------------

okay, I'll dig into this tomorrow.

> NewInstance.resolved should not throw malformed class name error
> ----------------------------------------------------------------
>
>                 Key: SPARK-34607
>                 URL: https://issues.apache.org/jira/browse/SPARK-34607
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.2, 3.1.1
>            Reporter: Kris Mok
>            Priority: Major
>
> I'd like to seek for community help on fixing this issue:
> Related to SPARK-34596, one of our users had hit an issue with 
> {{ExpressionEncoder}} when running Spark code in a Scala REPL, where 
> {{NewInstance.resolved}} was throwing {{"Malformed class name"}} error, with 
> the following kind of stack trace:
> {code}
> java.lang.InternalError: Malformed class name
>       at java.lang.Class.getSimpleBinaryName(Class.java:1450)
>       at java.lang.Class.isMemberClass(Class.java:1433)
>       at 
> org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved$lzycompute(objects.scala:447)
>       at 
> org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved(objects.scala:441)
>       at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionBottomUp(Analyzer.scala:1935)
>       ...
>  Caused by: sbt.ForkMain$ForkError: 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -83
>       at java.lang.String.substring(String.java:1931)
>       at java.lang.Class.getSimpleBinaryName(Class.java:1448)
>       at java.lang.Class.isMemberClass(Class.java:1433)
>       at 
> org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved$lzycompute(objects.scala:447)
>       at 
> org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved(objects.scala:441)
>   ...
> {code}
> The most important point in the stack trace is this:
> {code}
> java.lang.InternalError: Malformed class name
>       at java.lang.Class.getSimpleBinaryName(Class.java:1450)
>       at java.lang.Class.isMemberClass(Class.java:1433)
> {code}
> The most common way to hit the {{"Malformed class name"}} issue in Spark is 
> via {{java.lang.Class.getSimpleName}}. But as this stack trace demonstrates, 
> it can happen via other code paths from the JDK as well.
> If we want to fix it in a similar fashion as {{Utils.getSimpleName}}, we'd 
> have to emulate {{java.lang.Class.isMemberClass}} in Spark's {{Utils}}, and 
> then use it in the {{NewInstance.resolved}} code path.
> Here's a reproducer test case (in diff form against Spark master's 
> {{ExpressionEncoderSuite}} ), which uses explicit nested classes to emulate 
> the code structure that'd be generated by Scala's REPL:
> (the level of nesting can be further reduced and still reproduce the issue)
> {code}
> diff --git 
> a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
>  
> b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
> index 2635264..fd1b23d 100644
> --- 
> a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
> +++ 
> b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
> @@ -217,6 +217,95 @@ class ExpressionEncoderSuite extends 
> CodegenInterpretedPlanTest with AnalysisTes
>        "nested Scala class should work")
>    }
>  
> +  object OuterLevelWithVeryVeryVeryLongClassName1 {
> +    object OuterLevelWithVeryVeryVeryLongClassName2 {
> +      object OuterLevelWithVeryVeryVeryLongClassName3 {
> +        object OuterLevelWithVeryVeryVeryLongClassName4 {
> +          object OuterLevelWithVeryVeryVeryLongClassName5 {
> +            object OuterLevelWithVeryVeryVeryLongClassName6 {
> +              object OuterLevelWithVeryVeryVeryLongClassName7 {
> +                object OuterLevelWithVeryVeryVeryLongClassName8 {
> +                  object OuterLevelWithVeryVeryVeryLongClassName9 {
> +                    object OuterLevelWithVeryVeryVeryLongClassName10 {
> +                      object OuterLevelWithVeryVeryVeryLongClassName11 {
> +                        object OuterLevelWithVeryVeryVeryLongClassName12 {
> +                          object OuterLevelWithVeryVeryVeryLongClassName13 {
> +                            object OuterLevelWithVeryVeryVeryLongClassName14 
> {
> +                              object 
> OuterLevelWithVeryVeryVeryLongClassName15 {
> +                                object 
> OuterLevelWithVeryVeryVeryLongClassName16 {
> +                                  object 
> OuterLevelWithVeryVeryVeryLongClassName17 {
> +                                    object 
> OuterLevelWithVeryVeryVeryLongClassName18 {
> +                                      object 
> OuterLevelWithVeryVeryVeryLongClassName19 {
> +                                        object 
> OuterLevelWithVeryVeryVeryLongClassName20 {
> +                                          case class 
> MalformedNameExample2(x: Int)
> +                                        }
> +                                      }
> +                                    }
> +                                  }
> +                                }
> +                              }
> +                            }
> +                          }
> +                        }
> +                      }
> +                    }
> +                  }
> +                }
> +              }
> +            }
> +          }
> +        }
> +      }
> +    }
> +  }
> +
> +  {
> +    OuterScopes.addOuterScope(
> +      OuterLevelWithVeryVeryVeryLongClassName1
> +        .OuterLevelWithVeryVeryVeryLongClassName2
> +        .OuterLevelWithVeryVeryVeryLongClassName3
> +        .OuterLevelWithVeryVeryVeryLongClassName4
> +        .OuterLevelWithVeryVeryVeryLongClassName5
> +        .OuterLevelWithVeryVeryVeryLongClassName6
> +        .OuterLevelWithVeryVeryVeryLongClassName7
> +        .OuterLevelWithVeryVeryVeryLongClassName8
> +        .OuterLevelWithVeryVeryVeryLongClassName9
> +        .OuterLevelWithVeryVeryVeryLongClassName10
> +        .OuterLevelWithVeryVeryVeryLongClassName11
> +        .OuterLevelWithVeryVeryVeryLongClassName12
> +        .OuterLevelWithVeryVeryVeryLongClassName13
> +        .OuterLevelWithVeryVeryVeryLongClassName14
> +        .OuterLevelWithVeryVeryVeryLongClassName15
> +        .OuterLevelWithVeryVeryVeryLongClassName16
> +        .OuterLevelWithVeryVeryVeryLongClassName17
> +        .OuterLevelWithVeryVeryVeryLongClassName18
> +        .OuterLevelWithVeryVeryVeryLongClassName19
> +        .OuterLevelWithVeryVeryVeryLongClassName20)
> +    encodeDecodeTest(
> +      OuterLevelWithVeryVeryVeryLongClassName1
> +        .OuterLevelWithVeryVeryVeryLongClassName2
> +        .OuterLevelWithVeryVeryVeryLongClassName3
> +        .OuterLevelWithVeryVeryVeryLongClassName4
> +        .OuterLevelWithVeryVeryVeryLongClassName5
> +        .OuterLevelWithVeryVeryVeryLongClassName6
> +        .OuterLevelWithVeryVeryVeryLongClassName7
> +        .OuterLevelWithVeryVeryVeryLongClassName8
> +        .OuterLevelWithVeryVeryVeryLongClassName9
> +        .OuterLevelWithVeryVeryVeryLongClassName10
> +        .OuterLevelWithVeryVeryVeryLongClassName11
> +        .OuterLevelWithVeryVeryVeryLongClassName12
> +        .OuterLevelWithVeryVeryVeryLongClassName13
> +        .OuterLevelWithVeryVeryVeryLongClassName14
> +        .OuterLevelWithVeryVeryVeryLongClassName15
> +        .OuterLevelWithVeryVeryVeryLongClassName16
> +        .OuterLevelWithVeryVeryVeryLongClassName17
> +        .OuterLevelWithVeryVeryVeryLongClassName18
> +        .OuterLevelWithVeryVeryVeryLongClassName19
> +        .OuterLevelWithVeryVeryVeryLongClassName20
> +        .MalformedNameExample2(42),
> +      "deeply nested Scala class should work")
> +  }
> +
>    productTest(PrimitiveData(1, 1, 1, 1, 1, 1, true))
>  
>    productTest(
> {code}
> The reason why the test case above is so convoluted is in the way Scala 
> generates the class name for nested classes. In general, Scala generates a 
> class name for a nested class by inserting the dollar-sign ( {{$}} ) in 
> between each level of class nesting. The problem is that this format can 
> concatenate into a very long string that goes beyond certain limits, so Scala 
> will change the class name format beyond certain length threshold.
> For the example above, we can see that the first two levels of class nesting 
> have class names that look like this:
> {code}
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$
> {code}
> If we leave out the fact that Scala uses a dollar-sign ( {{$}} ) suffix for 
> the class name of the companion object, 
> {{OuterLevelWithVeryVeryVeryLongClassName1}}'s full name is a prefix 
> (substring) of {{OuterLevelWithVeryVeryVeryLongClassName2}}.
> But if we keep going deeper into the levels of nesting, you'll find names 
> that look like:
> {code}
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$OuterLevelWithVeryVeryVeryLongClassName11$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$OuterLevelWithVeryVeryVeryLongClassName14$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$OuterLevelWithVeryVeryVeryLongClassName17$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$OuterLevelWithVeryVeryVeryLongClassName20$
> {code}
> with a hash code in the middle and various levels of nesting omitted.
> The {{java.lang.Class.isMemberClass}} method is implemented in JDK8u as:
> http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/tip/src/share/classes/java/lang/Class.java#l1425
> {code:java}
>     /**
>      * Returns {@code true} if and only if the underlying class
>      * is a member class.
>      *
>      * @return {@code true} if and only if this class is a member class.
>      * @since 1.5
>      */
>     public boolean isMemberClass() {
>         return getSimpleBinaryName() != null && !isLocalOrAnonymousClass();
>     }
>     /**
>      * Returns the "simple binary name" of the underlying class, i.e.,
>      * the binary name without the leading enclosing class name.
>      * Returns {@code null} if the underlying class is a top level
>      * class.
>      */
>     private String getSimpleBinaryName() {
>         Class<?> enclosingClass = getEnclosingClass();
>         if (enclosingClass == null) // top level class
>             return null;
>         // Otherwise, strip the enclosing class' name
>         try {
>             return getName().substring(enclosingClass.getName().length());
>         } catch (IndexOutOfBoundsException ex) {
>             throw new InternalError("Malformed class name", ex);
>         }
>     }
> {code}
> and the problematic code is 
> {{getName().substring(enclosingClass.getName().length())}} -- if a class's 
> enclosing class's full name is *longer* than the nested class's full name, 
> this logic would end up going out of bounds.
> The bug has been fixed in JDK9 by 
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8057919 , but still 
> exists in the latest JDK8u release. So from the Spark side we'd need to do 
> something to avoid hitting this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to