[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock

2021-02-24 Thread Tom Bentley (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289998#comment-17289998
 ] 

Tom Bentley commented on KAFKA-12308:
-

I think this is caused by the fact the {{DelegatingClassLoader}} is not 
registered as parallel capable, but should be. It should be because, according 
to https://docs.oracle.com/javase/7/docs/technotes/guides/lang/cl-mt.html, to 
qualify for the acyclic delegation model "If the class is not found, the class 
loader asks its parent to locate the class. If the parent cannot find the 
class, the class loader attempts to locate the class itself.", but 
{{DelegatingClassLoader}} may actually ask the {{PluginClassLoader}} to load a 
class before it's tried {{super}}.

From the stack dump provided:
{noformat}
"StartAndStopExecutor-connect-1-5":
at java.lang.ClassLoader.loadClass(ClassLoader.java:398)
  // wait for DCL getClassLoadingLock
- waiting to lock <0x0006c222db00> (a 
org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
at 
org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:397)
 // delegate to super
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
  // super delegates to parent (DCL)
- locked <0x00077b9bf3c0> (a java.lang.Object)  
  // lock PCLY+name (super's getClassLoadingLock)
at 
org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
- locked <0x00077b9bf3c0> (a java.lang.Object)  
  // lock PCLY+name (getClassLoadingLock)
- locked <0x0006c25b4e38> (a 
org.apache.kafka.connect.runtime.isolation.PluginClassLoader)// lock 
PCLY (synchronized)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
{noformat}

and 

{noformat}
"StartAndStopExecutor-connect-1-6":
at 
org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91)
 // lock PCLX (synchronized)
- waiting to lock <0x0006c25b4e38> (a 
org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
at 
org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:394)
 // delegated to PCL
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
 // ClassLoader.loadClass(String name) calling 
PCL.loadClass(String,
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
{noformat}

It also says 

{noformat}
"StartAndStopExecutor-connect-1-5":
  waiting to lock monitor 0x0203a553b6f8 (object 0x0006c222db00, a 
org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader),
  which is held by "StartAndStopExecutor-connect-1-6"
{noformat}

The {{0x0006c222db00}} monitor doesn't show up as locked in either stack trace. 
I think that's because it's [held by the JVM 
itself|https://github.com/openjdk/jdk/blob/06170b7cbf6129274747b4406562184802d4ff07/src/hotspot/share/classfile/systemDictionary.cpp#L695].
 

If {{DelegatingClassLoader}} is registered as parallel capable, this won't happen:

{noformat}
at java.lang.ClassLoader.loadClass(ClassLoader.java:398)
  // wait for DCL getClassLoadingLock
- waiting to lock <0x0006c222db00> (a 
org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
{noformat}

That is because {{DCL.getClassLoadingLock}} would then return an object specific 
to the class being loaded, rather than the DCL instance itself, which is locked 
by the JVM.
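
To make the difference in {{getClassLoadingLock}} concrete, here is a minimal sketch. The class names ({{LockSketch}}, {{PlainLoader}}, {{ParallelLoader}}) and the helper {{lockFor}} are made up for illustration and are not Kafka classes; only {{ClassLoader.registerAsParallelCapable()}} and {{getClassLoadingLock(String)}} are real JDK methods.

```java
// Illustration only: LockSketch, PlainLoader, ParallelLoader and lockFor are
// hypothetical names, not part of Kafka Connect.
class LockSketch {

    /** Not registered as parallel capable: the class loading lock is the loader itself. */
    static class PlainLoader extends ClassLoader {
        PlainLoader(ClassLoader parent) {
            super(parent);
        }

        // Exposes the protected getClassLoadingLock so the lock can be inspected.
        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    /** Registered as parallel capable: the class loading lock is a per-class-name object. */
    static class ParallelLoader extends ClassLoader {
        static {
            // Must be called from the static initializer of the loader class itself.
            ClassLoader.registerAsParallelCapable();
        }

        ParallelLoader(ClassLoader parent) {
            super(parent);
        }

        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    public static void main(String[] args) {
        ClassLoader parent = LockSketch.class.getClassLoader();
        PlainLoader plain = new PlainLoader(parent);
        ParallelLoader parallel = new ParallelLoader(parent);

        // The unregistered loader locks on its own instance.
        System.out.println(plain.lockFor("a.B") == plain);                      // true
        // The registered loader hands out a dedicated lock object per class name.
        System.out.println(parallel.lockFor("a.B") == parallel);                // false
        System.out.println(parallel.lockFor("a.B") == parallel.lockFor("a.C")); // false
    }
}
```

With the per-name lock, a thread inside {{ClassLoader.loadClass}} no longer needs the monitor of the loader instance, which is the monitor thread 5 is waiting on in the dump above.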

Does this seem plausible to you [~kkonstantine] [~ChrisEgerton]?

> ConfigDef.parseType deadlock
> 
>
> Key: KAFKA-12308
> URL: https://issues.apache.org/jira/browse/KAFKA-12308
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 2.5.0
> Environment: kafka 2.5.0
> centos7
> java version "1.8.0_231"
>Reporter: cosmozhu
>Priority: Major
> Attachments: deadlock.log
>
>
> hi,
>  the problem was found, when I restarted *ConnectDistributed*
> I restart ConnectDistributed in the single node for the test, with not delete 
> connectors.
>  sometimes the process stopped when creating connectors.
> I add some logger and found it had a deadlock in `ConfigDef.parseType`.My 
> connectors always have the same transforms. I guess when connector startup 
> (in startAndStopExecutor which default 8 threads) and load the same class 
> file it has something wrong.
> I attached the jstack log file.
> thanks for any help.

[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock

2021-02-24 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290756#comment-17290756
 ] 

Konstantine Karantasis commented on KAFKA-12308:


[~tombentley] I actually think that the initial suggestion in 
https://issues.apache.org/jira/browse/KAFKA-7421 regarding the removal of the 
method lock is correct. 

The `DelegatingClassLoader` doesn't seem to need to be parallel because it 
delegates loading to either `PluginClassLoader` instances that are parallel 
capable or the parent which normally is the system classloader and should also 
be parallel. 

Note that the loading sequence you mention above is inverted on purpose 
to actually implement classloading isolation. First we attempt to load the 
class from the "child" `PluginClassLoader` of the designated plugin, and if it 
is not found, the parent classloader of the `DelegatingClassLoader` is consulted. 
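
As a rough illustration of that inverted order (not the actual Connect code), the sketch below uses made-up names (`ChildFirstSketch`, `RecordingPluginLoader`, `ChildFirstLoader`, `demo`). It assumes a single plugin loader that owns no classes and simply records what it was asked for, so the fallback to the parent is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: all class and method names here are hypothetical.
class ChildFirstSketch {

    /** Stand-in for a plugin loader: records every lookup and finds nothing. */
    static class RecordingPluginLoader extends ClassLoader {
        final List<String> asked = new ArrayList<>();

        RecordingPluginLoader() {
            super(null);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            asked.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    /** Asks the plugin loader first and only then falls back to the usual parent delegation. */
    static class ChildFirstLoader extends ClassLoader {
        private final ClassLoader plugin;

        ChildFirstLoader(ClassLoader parent, ClassLoader plugin) {
            super(parent);
            this.plugin = plugin;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            try {
                return plugin.loadClass(name);         // "child" first
            } catch (ClassNotFoundException e) {
                return super.loadClass(name, resolve); // then the parent, via inheritance
            }
        }
    }

    /** Loads java.lang.String and reports which names the plugin loader saw. */
    static String demo() {
        RecordingPluginLoader plugin = new RecordingPluginLoader();
        ChildFirstLoader loader =
                new ChildFirstLoader(ChildFirstSketch.class.getClassLoader(), plugin);
        try {
            Class<?> loaded = loader.loadClass("java.lang.String");
            return plugin.asked + " -> " + loaded.getName();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [java.lang.String] -> java.lang.String
    }
}
```

Note that the fallback goes through `super.loadClass`, so whatever lock `ClassLoader.loadClass` takes at that point belongs to the `ChildFirstLoader` instance, not to its parent.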

I have updated the PR that had added a test for this type of deadlock 
originally submitted by [~gharris1727] in: 
[https://github.com/apache/kafka/pull/8259]

cc [~rhauch]




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock

2021-02-25 Thread Tom Bentley (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290793#comment-17290793
 ] 

Tom Bentley commented on KAFKA-12308:
-

[~kkonstantine] I'm not an expert in classloaders but I'm still not sure that 
DCL shouldn't be considered parallel. The referred class loader guide 
explicitly says that an acyclic CL should delegate to {{super}} _first_. I 
understand that delegating to PCL first is intentional, but it doesn't fit with 
the definition given AFAICS. The fact that the CLs it delegates to are both 
parallel doesn't seem to be relevant. Also, the parent of the PCL is the DCL, 
which looks like a cycle to me (but, as I said, I'm no expert, so happy to be 
corrected). 

Assuming the {{synchronized}} was removed from PCL {{loadClass}}, then 
{noformat}
"StartAndStopExecutor-connect-1-6":
at 
org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91)
 // lock PCLX (synchronized)
{noformat}
wouldn't get blocked, but there would still be two threads contending for two 
locks when racing to load the same class. Those locks would be the 
{{getClassLoadingLock()}} on the PCL and the monitor of the DCL instance 
itself, so I think perhaps a deadlock would still be possible, just on 
different monitors. 
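
To illustrate the kind of cycle I mean, here is a small sketch. It is an assumption-laden stand-in: two {{ReentrantLock}} instances play the roles of the PCL class loading lock and the DCL monitor, and a timed {{tryLock}} replaces the real blocking monitor enter so that the demo terminates instead of hanging. The names {{LockOrderSketch}}, {{run}} and {{acquire}} are made up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: the locks are stand-ins, not the real class loader monitors.
class LockOrderSketch {

    /** Returns how many of the two threads could not obtain their second lock. */
    static int run() {
        ReentrantLock pclLock = new ReentrantLock(); // stand-in for the PCL class loading lock
        ReentrantLock dclLock = new ReentrantLock(); // stand-in for the DCL instance monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // The two threads take the same pair of locks in opposite order.
        Thread t5 = new Thread(() -> acquire(pclLock, dclLock, bothHoldFirst, bothTried, blocked));
        Thread t6 = new Thread(() -> acquire(dclLock, pclLock, bothHoldFirst, bothTried, blocked));
        t5.start();
        t6.start();
        try {
            t5.join();
            t6.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    static void acquire(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A real monitor would block here forever; the timeout lets the demo finish.
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until the other thread has also tried.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on their second lock: " + run());
    }
}
```

Both threads end up unable to take their second lock, which is the same shape as the cycle in the thread dump, only with different lock objects.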

> ConfigDef.parseType deadlock
> 
>
> Key: KAFKA-12308
> URL: https://issues.apache.org/jira/browse/KAFKA-12308
> Project: Kafka
>  Issue Type: Bug
>  Components: config, KafkaConnect
>Affects Versions: 2.5.0
> Environment: kafka 2.5.0
> centos7
> java version "1.8.0_231"
>Reporter: cosmozhu
>Priority: Major
> Attachments: deadlock.log
>
>
> hi,
>  the problem was found, when I restarted *ConnectDistributed*
> I restart ConnectDistributed in the single node for the test, with not delete 
> connectors.
>  sometimes the process stopped when creating connectors.
> I add some logger and found it had a deadlock in `ConfigDef.parseType`.My 
> connectors always have the same transforms. I guess when connector startup 
> (in startAndStopExecutor which default 8 threads) and load the same class 
> file it has something wrong.
> I attached the jstack log file.
> thanks for any help.





[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock

2021-07-14 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380772#comment-17380772
 ] 

Konstantine Karantasis commented on KAFKA-12308:


Adding the comment that I added in the PR here as well: 



The idea that the {{DelegatingClassLoader}} did not have to be parallel capable 
originated from the fact that it doesn't load classes directly. It delegates 
loading either to the appropriate PluginClassLoader directly via composition, 
or to the parent by calling {{super.loadClass}}.

The latter is the key point of why we need to make the 
{{DelegatingClassLoader}} parallel capable as well, even though it doesn't load 
classes itself. Inheritance is used (via a call to {{super.loadClass}}) rather 
than composition (via a hypothetical call to {{parent.loadClass}}, which is not 
possible because {{parent}} is a private member of the abstract base class 
{{ClassLoader}}). As a result, when {{getClassLoadingLock}} is called in 
{{super.loadClass}}, it sees that the derived class (here an instance of 
{{DelegatingClassLoader}}) is not parallel capable and therefore does not 
apply fine-grained locking during classloading, even though the parent 
classloader is what actually loads the classes.

Based on the above, the {{DelegatingClassLoader}} needs to be parallel capable 
too in order for the parent loader to load classes in parallel. 

I've tested with both classloader types being parallel capable in a variety of 
scenarios with multiple connectors, SMTs and converters, and a deadlock did not 
reproduce. Of course, reproducing the issue is difficult without the specifics 
of the jar layout to begin with. The possibility of a deadlock is still not 
zero, but it is also probably not exacerbated compared to the current code. 
Connectors are the only plugin type that depends on other plugins being loaded 
while loading its own classes, and there are no inter-connector 
dependencies (a connector requiring another connector's classes to be loaded 
while loading its own). With that in mind, a deadlock should be even less 
likely now. In the future we could consider introducing deadlock recovery 
methods to get out of this type of situation if necessary.



