[ 
https://issues.apache.org/jira/browse/FLINK-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522143#comment-16522143
 ] 

ASF GitHub Bot commented on FLINK-9654:
---------------------------------------

GitHub user zsolt-donca opened a pull request:

    https://github.com/apache/flink/pull/6206

    [FLINK-9654] [core] Changed the check for anonymous classes to avoid 
InternalError

    …SI-2034.
    
    ## What is the purpose of the change
    
    This pull request avoids triggering 
[SI-2034](https://issues.scala-lang.org/browse/SI-2034) for Scala classes that 
are defined inside of Scala objects. The underlying problem will go away only 
once Scala officially supports Java 9, because the JDK bug behind it is fixed 
in Java 9: https://bugs.openjdk.java.net/browse/JDK-8057919.
    
    As explained in 
[FLINK-9654](https://issues.apache.org/jira/browse/FLINK-9654), whenever there 
is a custom `TypeSerializer` implementation that, when serialized, has in its 
object graph a reference to a class that triggers 
[SI-2034](https://issues.scala-lang.org/browse/SI-2034), it makes the task 
manager instance fail, potentially bringing down the entire Flink cluster.
    
    ## Brief change log
      - moved the class-name checks *before* the call to 
`Class.isAnonymousClass`, since there is no reason to call it when the class 
name alone already gives the answer;
      - added a check for "$macro$" to the name checks, since 
macro-generated classes are always anonymous;
      - wrapped the call to `isAnonymousClass` in a try-catch block to catch 
the `InternalError` that the issue might trigger.
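
    The three changes above can be sketched roughly as follows. This is a
    minimal, standalone illustration and not the actual patch: the class name
    `AnonymousClassCheck` and the helper names are hypothetical, the "$anon$"
    and "$anonfun" markers are assumed to be among the existing name checks,
    and returning `false` when `InternalError` is caught is also an assumption.

```java
// Hypothetical standalone sketch; names below are illustrative, not Flink's API.
final class AnonymousClassCheck {

    private AnonymousClassCheck() {}

    /** Checks the class name first and only then falls back to reflection. */
    static boolean isAnonymousClass(Class<?> clazz) {
        return looksAnonymousByName(clazz.getName()) || safeIsAnonymousClass(clazz);
    }

    /** Name-based check; needs no reflection, so it cannot hit SI-2034. */
    static boolean looksAnonymousByName(String name) {
        // "$anon$" and "$anonfun" are assumed pre-existing markers;
        // "$macro$" is the marker this change adds.
        return name.contains("$anon$")
                || name.contains("$anonfun")
                || name.contains("$macro$");
    }

    /** Guards the JDK call, which may throw InternalError on affected JDKs. */
    static boolean safeIsAnonymousClass(Class<?> clazz) {
        try {
            return clazz.isAnonymousClass();
        } catch (InternalError e) {
            // Assumption: treat a class with a malformed name as not anonymous.
            return false;
        }
    }
}
```

    Doing the string checks first means that a class whose name already
    identifies it as anonymous never reaches the JDK call at all, and the
    try-catch only covers the remaining classes.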
    
    ## Verifying this change
    
    This change is a trivial rework / code cleanup without any test coverage.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? not applicable


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsolt-donca/flink 
FLINK-9654-internal-error-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6206
    
----
commit fbad06a398fe58f8d312e0ed4dc6bdd31ac65d08
Author: Zsolt Donca <zsolt.donca@...>
Date:   2018-06-25T07:43:29Z

    FLINK-9654 Changed the way we check if a class is anonymous to avoid 
SI-2034.

----


> Internal error while deserializing custom Scala TypeSerializer instances
> ------------------------------------------------------------------------
>
>                 Key: FLINK-9654
>                 URL: https://issues.apache.org/jira/browse/FLINK-9654
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Zsolt Donca
>            Priority: Major
>              Labels: pull-request-available
>
> When you are using custom `TypeSerializer` instances implemented in Scala, 
> the Scala issue [SI-2034|https://issues.scala-lang.org/browse/SI-2034] can 
> manifest itself when a Flink job is restored from a checkpoint or started 
> with a savepoint.
> The reason is that during such a restore, Flink uses 
> `InstantiationUtil.FailureTolerantObjectInputStream` to deserialize the type 
> serializers and their configurations. The deserialization walks through the 
> entire corresponding object graph, and for each class it calls 
> `isAnonymousClass`, which, in turn, calls `getSimpleName` (a mechanism put in 
> place for FLINK-6869). If there is an internal class defined in a Scala 
> object for which `getSimpleName` fails (see the Scala issue), then a 
> `java.lang.InternalError` is thrown, which causes the task manager to restart. 
> Flink then tries to restart the job on another task manager, which fails in 
> the same way, so all the task managers end up restarting and the entire Flink 
> cluster is disrupted.
> There are some alternative type information derivation mechanisms that rely 
> on anonymous classes and, most importantly, classes generated by macros, that 
> can easily trigger the above problem. I am personally working on 
> [https://github.com/zsolt-donca/flink-alt], and there is also 
> [https://github.com/joroKr21/flink-shapeless].
> I prepared a pull request that fixes the issue. 



