[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock
[ https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289998#comment-17289998 ]

Tom Bentley commented on KAFKA-12308:

I think this is caused by the fact that the {{DelegatingClassLoader}} is not registered as parallel capable, but should be. It should be because, according to https://docs.oracle.com/javase/7/docs/technotes/guides/lang/cl-mt.html, to qualify for the acyclic delegation model "If the class is not found, the class loader asks its parent to locate the class. If the parent cannot find the class, the class loader attempts to locate the class itself.", but {{DelegatingClassLoader}} may actually ask the {{PluginClassLoader}} to load a class before it has tried {{super}}.

From the stack dump provided

{noformat}
"StartAndStopExecutor-connect-1-5":
  at java.lang.ClassLoader.loadClass(ClassLoader.java:398) // wait for DCL getClassLoadingLock
  - waiting to lock <0x0006c222db00> (a org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
  at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:397) // delegate to super
  at java.lang.ClassLoader.loadClass(ClassLoader.java:405) // super delegates to parent (DCL)
  - locked <0x00077b9bf3c0> (a java.lang.Object) // lock PCLY+name (super's getClassLoadingLock)
  at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
  - locked <0x00077b9bf3c0> (a java.lang.Object) // lock PCLY+name (getClassLoadingLock)
  - locked <0x0006c25b4e38> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader) // lock PCLY (synchronized)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
{noformat}

and

{noformat}
"StartAndStopExecutor-connect-1-6":
  at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91) // lock PCLX (synchronized)
  - waiting to lock <0x0006c25b4e38> (a org.apache.kafka.connect.runtime.isolation.PluginClassLoader)
  at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.loadClass(DelegatingClassLoader.java:394) // delegated to PCL
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351) // ClassLoader.loadClass(String name) calling PCL.loadClass(String,
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
{noformat}

It also says

{noformat}
"StartAndStopExecutor-connect-1-5":
  waiting to lock monitor 0x0203a553b6f8 (object 0x0006c222db00, a org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader),
  which is held by "StartAndStopExecutor-connect-1-6"
{noformat}

The {{0x0006c222db00}} doesn't appear as locked in either stack trace. I think that's because it's [held by the JVM itself|https://github.com/openjdk/jdk/blob/06170b7cbf6129274747b4406562184802d4ff07/src/hotspot/share/classfile/systemDictionary.cpp#L695].

If {{DelegatingClassLoader}} is registered as parallel capable, this won't happen:

{noformat}
  at java.lang.ClassLoader.loadClass(ClassLoader.java:398) // wait for DCL getClassLoadingLock
  - waiting to lock <0x0006c222db00> (a org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
{noformat}

because {{DCL.getClassLoadingLock}} will return an object specific to the class being loaded, rather than the DCL instance itself, which is locked by the JVM.

Does this seem plausible to you [~kkonstantine] [~ChrisEgerton]?

> ConfigDef.parseType deadlock
>
> Key: KAFKA-12308
> URL: https://issues.apache.org/jira/browse/KAFKA-12308
> Project: Kafka
> Issue Type: Bug
> Components: config
> Affects Versions: 2.5.0
> Environment: kafka 2.5.0
> centos7
> java version "1.8.0_231"
> Reporter: cosmozhu
> Priority: Major
> Attachments: deadlock.log
>
> hi,
> the problem was found when I restarted *ConnectDistributed*.
> I restarted ConnectDistributed on a single node for the test, without deleting
> the connectors.
> Sometimes the process stopped when creating connectors.
> I added some logging and found it had a deadlock in `ConfigDef.parseType`. My
> connectors always have the same transforms. I guess that when the connectors start up
> (in startAndStopExecutor, which defaults to 8 threads) and load the same class
> file, something goes wrong.
> I attached the jstack log file.
> thanks for any help.
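The difference Tom describes between the two kinds of lock object can be seen with a pair of toy loaders. This is only a sketch: {{PlainLoader}}, {{ParallelLoader}} and {{lockFor}} are hypothetical names made up for the illustration and are not Kafka classes. Only {{ClassLoader.registerAsParallelCapable()}} and {{getClassLoadingLock(String)}} are real JDK methods.

```java
// Hypothetical loaders for illustration only; these are not Kafka classes.
class PlainLoader extends ClassLoader {
    PlainLoader(ClassLoader parent) {
        super(parent);
    }

    // Expose the protected lock lookup so the demo can inspect it.
    Object lockFor(String className) {
        return getClassLoadingLock(className);
    }
}

class ParallelLoader extends ClassLoader {
    static {
        // Registration has to happen in the class's own static initializer,
        // before any instance is created.
        ClassLoader.registerAsParallelCapable();
    }

    ParallelLoader(ClassLoader parent) {
        super(parent);
    }

    Object lockFor(String className) {
        return getClassLoadingLock(className);
    }
}

class ClassLoadingLockDemo {
    public static void main(String[] args) {
        ClassLoader parent = ClassLoadingLockDemo.class.getClassLoader();
        PlainLoader plain = new PlainLoader(parent);
        ParallelLoader parallel = new ParallelLoader(parent);

        // Not registered: the lock for any class name is the loader instance itself.
        System.out.println("plain lock is the loader:    " + (plain.lockFor("a.B") == plain));
        // Registered: the lock is a dedicated object per class name.
        System.out.println("parallel lock is the loader: " + (parallel.lockFor("a.B") == parallel));
        System.out.println("same lock for a.B and a.C:   " + (parallel.lockFor("a.B") == parallel.lockFor("a.C")));
    }
}
```

With the unregistered loader, every class load serializes on the one loader instance, which is the monitor the thread dump shows thread 1-5 waiting for.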
[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock
[ https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290756#comment-17290756 ]

Konstantine Karantasis commented on KAFKA-12308:

[~tombentley] I actually think that the initial suggestion in https://issues.apache.org/jira/browse/KAFKA-7421 regarding the removal of the method lock is correct. The `DelegatingClassLoader` doesn't seem to need to be parallel capable, because it delegates loading either to `PluginClassLoader` instances, which are parallel capable, or to the parent, which is normally the system classloader and should also be parallel capable.

Note that the loading sequence you mention above is inverted on purpose, to implement classloading isolation. First we attempt to load the class from the "child" `PluginClassLoader` of the designated plugin, and only if it is not found there is the parent classloader of the `DelegatingClassLoader` consulted.

I have updated the PR that added a test for this type of deadlock, originally submitted by [~gharris1727] in: [https://github.com/apache/kafka/pull/8259]

cc [~rhauch]
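The inverted, plugin-first loading order described in this comment can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Kafka Connect code: {{PluginFirstLoader}}, {{registerPlugin}} and the prefix-based lookup are invented for the example, and locking is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of plugin-first delegation; the class, its fields and the
// prefix-based lookup are assumptions, not the Kafka Connect implementation.
class PluginFirstLoader extends ClassLoader {
    private final Map<String, ClassLoader> pluginsByPrefix = new ConcurrentHashMap<>();

    PluginFirstLoader(ClassLoader parent) {
        super(parent);
    }

    void registerPlugin(String classNamePrefix, ClassLoader pluginLoader) {
        pluginsByPrefix.put(classNamePrefix, pluginLoader);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        for (Map.Entry<String, ClassLoader> entry : pluginsByPrefix.entrySet()) {
            if (name.startsWith(entry.getKey())) {
                try {
                    // Inverted order: the designated plugin's loader is asked first.
                    return entry.getValue().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the plugin: fall through to the inherited parent-first path.
                }
            }
        }
        return super.loadClass(name, resolve);
    }
}

class PluginFirstDemo {
    public static void main(String[] args) throws ClassNotFoundException {
        // A stand-in plugin loader that never finds anything, to show the fallback.
        ClassLoader emptyPlugin = new ClassLoader((ClassLoader) null) {
            @Override
            protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
                throw new ClassNotFoundException(name);
            }
        };
        PluginFirstLoader loader = new PluginFirstLoader(ClassLoader.getSystemClassLoader());
        loader.registerPlugin("java.util.", emptyPlugin);
        System.out.println(loader.loadClass("java.util.ArrayList"));
    }
}
```

Asking the plugin loader before the parent is what gives each plugin its own view of its dependencies. It is also the step that departs from the parent-first order in the class loader guide quoted earlier in the thread.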
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock
[ https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290793#comment-17290793 ]

Tom Bentley commented on KAFKA-12308:

[~kkonstantine] I'm not an expert in classloaders, but I'm still not sure that DCL shouldn't be parallel capable. The class loader guide referred to above explicitly says that an acyclic CL should delegate to {{super}} _first_. I understand that delegating to PCL first is intentional, but it doesn't fit the definition given, AFAICS. The fact that the CLs it delegates to are both parallel capable doesn't seem to be relevant. Also, the parent of the PCL is the DCL, which looks like a cycle to me (but, as I said, I'm no expert, so I'm happy to be corrected).

Assuming the {{synchronized}} was removed from PCL {{loadClass}}, then

{noformat}
"StartAndStopExecutor-connect-1-6":
  at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:91) // lock PCLX (synchronized)
{noformat}

wouldn't get blocked, but there would still be two threads contending for two locks when racing to load the same class. Those locks would be the {{getClassLoadingLock()}} on the PCL and the monitor of the DCL instance itself, so I think a deadlock would perhaps still be possible, just on different monitors.
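The two-threads, two-locks situation described in this comment can be mimicked outside of classloading. The sketch below is only an analogy: it uses {{ReentrantLock}} with timed attempts instead of the real monitors (which never time out), and {{pluginLock}} and {{delegatingLock}} are hypothetical stand-ins for the PCL class-loading lock and the DCL instance monitor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderDemo {
    // Takes `first`, then makes a timed attempt at `second`.
    // Returns true only if the second lock was obtained.
    static boolean takeInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                               long timeoutMs) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean gotSecond = second.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            // Intrinsic monitors never back off, so keep holding `first`
            // until both threads have finished their attempt.
            bothTried.countDown();
            bothTried.await();
            return gotSecond;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Runs two threads that take the same two locks in opposite orders.
    static boolean[] race(long timeoutMs) throws InterruptedException {
        ReentrantLock pluginLock = new ReentrantLock();      // stand-in for the PCL lock
        ReentrantLock delegatingLock = new ReentrantLock();  // stand-in for the DCL monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];
        Thread a = new Thread(() -> gotSecond[0] =
                takeInOrder(pluginLock, delegatingLock, bothHoldFirst, bothTried, timeoutMs));
        Thread b = new Thread(() -> gotSecond[1] =
                takeInOrder(delegatingLock, pluginLock, bothHoldFirst, bothTried, timeoutMs));
        a.start();
        b.start();
        a.join();
        b.join();
        return gotSecond;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = race(100);
        System.out.println("thread A got its second lock: " + result[0]);
        System.out.println("thread B got its second lock: " + result[1]);
    }
}
```

Neither thread can obtain its second lock while the other still holds it. With {{synchronized}} monitors there is no timeout, so the same interleaving would block both threads forever.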
[jira] [Commented] (KAFKA-12308) ConfigDef.parseType deadlock
[ https://issues.apache.org/jira/browse/KAFKA-12308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380772#comment-17380772 ]

Konstantine Karantasis commented on KAFKA-12308:

Adding the comment that I added in the PR here as well:

The idea that the {{DelegatingClassLoader}} did not have to be parallel capable originated from the fact that it doesn't load classes directly. It delegates loading either to the appropriate PluginClassLoader directly via composition, or to the parent by calling {{super.loadClass}}.

The latter is the key reason why we need to make the {{DelegatingClassLoader}} parallel capable as well, even though it doesn't load classes itself. Because inheritance is used (via a call to {{super.loadClass}}) and not composition (via a hypothetical call to {{parent.loadClass}}, which is not possible because {{parent}} is a private member of the base abstract class {{ClassLoader}}), when {{getClassLoadingLock}} is called in {{super.loadClass}} it finds that the derived class (here an instance of {{DelegatingClassLoader}}) is not parallel capable. It therefore ends up not applying fine-grained locking during classloading, even though the parent classloader is what actually loads the classes.

Based on the above, the {{DelegatingClassLoader}} needs to be parallel capable too in order for the parent loader to load classes in parallel.

I've tested with both classloader types being parallel capable in a variety of scenarios with multiple connectors, SMTs and converters, and a deadlock did not reproduce. Of course, reproducing the issue is difficult without the specifics of the jar layout to begin with.

The possibility of a deadlock is still not zero, but it is also probably not exacerbated compared to the current code. The only plugin type that depends on other plugins being loaded while it is loading its own classes is the connector, and there are no inter-connector dependencies (a connector requiring another connector's classes to be loaded while loading its own). With that in mind, a deadlock should be even less likely now. In the future we could consider introducing deadlock recovery methods to get out of this type of situation if necessary.
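The inheritance point made in this comment can be observed directly with a small loader that records which lock {{super.loadClass}} asks for. This is a minimal sketch: {{RecordingLoader}} and {{lastLock}} are hypothetical names for the illustration, and the loader is deliberately not registered as parallel capable.

```java
// Hypothetical loader for illustration; it is not registered as parallel capable.
class RecordingLoader extends ClassLoader {
    // The lock object most recently handed out to the base loadClass.
    volatile Object lastLock;

    RecordingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Object getClassLoadingLock(String className) {
        lastLock = super.getClassLoadingLock(className);
        return lastLock;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Inheritance, not composition: the base implementation synchronizes on
        // this loader's getClassLoadingLock(name) before it asks the parent.
        return super.loadClass(name, resolve);
    }
}

class InheritedLockDemo {
    public static void main(String[] args) throws ClassNotFoundException {
        RecordingLoader loader = new RecordingLoader(ClassLoader.getSystemClassLoader());
        Class<?> loaded = loader.loadClass("java.util.ArrayList");
        // The class is defined further up the delegation chain, not by this loader...
        System.out.println("defined by this loader: " + (loaded.getClassLoader() == loader));
        // ...yet the lock taken for the load was this loader's own instance.
        System.out.println("locked on this loader:  " + (loader.lastLock == loader));
    }
}
```

So a delegating loader that is not registered still serializes every load on its own instance, even though it never defines a class itself.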