[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365002#comment-16365002 ]

ASF GitHub Bot commented on KAFKA-6503:
---------------------------------------

ewencp closed pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
index 345d7ef011d..b21cdcbfab4 100644
--- a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
+++ b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
@@ -20,7 +20,10 @@
 import org.apache.kafka.connect.storage.Converter;
 import org.apache.kafka.connect.storage.HeaderConverter;
 import org.apache.kafka.connect.transforms.Transformation;
+import org.reflections.Configuration;
 import org.reflections.Reflections;
+import org.reflections.ReflectionsException;
+import org.reflections.scanners.SubTypesScanner;
 import org.reflections.util.ClasspathHelper;
 import org.reflections.util.ConfigurationBuilder;
 import org.slf4j.Logger;
@@ -269,7 +272,10 @@ private PluginScanResult scanPluginPath(
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.setClassLoaders(new ClassLoader[]{loader});
         builder.addUrls(urls);
-        Reflections reflections = new Reflections(builder);
+        builder.setScanners(new SubTypesScanner());
+        builder.setExpandSuperTypes(false);
+        builder.useParallelExecutor();
+        Reflections reflections = new InternalReflections(builder);
 
         return new PluginScanResult(
                 getPluginDesc(reflections, Connector.class, loader),
@@ -353,4 +359,25 @@ private void addAllAliases() {
             }
         }
     }
+
+    private static class InternalReflections extends Reflections {
+
+        public InternalReflections(Configuration configuration) {
+            super(configuration);
+        }
+
+        // When Reflections is used for parallel scans, it has a bug where it propagates ReflectionsException
+        // as RuntimeException. Override the scan behavior to emulate the singled-threaded logic.
+        @Override
+        protected void scan(URL url) {
+            try {
+                super.scan(url);
+            } catch (ReflectionsException e) {
+                Logger log = Reflections.log;
+                if (log != null && log.isWarnEnabled()) {
+                    log.warn("could not create Vfs.Dir from url. ignoring the exception and continuing", e);
+                }
+            }
+        }
+    }
 }

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Assignee: Robert Yokota
>            Priority: Critical
>             Fix For: 1.1.0, 1.2.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it
> probably would not be a big issue. But it does not. Right after starting the
> Kafka-Connect instance I try to create a connector via the /connectors
> endpoint, but it will not succeed before the plugin scanning has finished (4
> minutes)
> I am not even sure why scanning is necessary. Is it not always true that
> connectors, converters etc are mentioned by name, so to see if it exists,
> just try to load the class - the classloader will tell if it is available.
> Hmmm, there is probably a reason.
> Anyway, either it
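
The code comment in the diff above says the `scan(URL)` override exists so that a `ReflectionsException` raised during a parallel scan is logged and ignored, as in the single-threaded path, instead of propagating. As a rough, library-free sketch of that pattern (not the Kafka or Reflections code), the following catches one specific exception type inside each task, so a single bad location does not abort the whole scan. `TolerantScanSketch`, `ScanException`, `scanTolerantly`, the pool size of 2, and the sample location names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class TolerantScanSketch {

    // Hypothetical stand-in for the one failure type that should be tolerated.
    static class ScanException extends RuntimeException {
        ScanException(String message) {
            super(message);
        }
    }

    // Scans each location on a small pool; a ScanException is logged and the
    // location is skipped, while any other exception still fails the scan.
    static <T, R> List<R> scanTolerantly(List<T> locations, Function<T, R> scanOne) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // assumed size
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T location : locations) {
                Callable<R> guarded = () -> {
                    try {
                        return scanOne.apply(location);
                    } catch (ScanException e) {
                        System.err.println("skipping " + location + ": " + e.getMessage());
                        return null; // marks "nothing found here"
                    }
                };
                futures.add(pool.submit(guarded));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                R result = future.get();
                if (result != null) {
                    results.add(result);
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> found = scanTolerantly(
                Arrays.asList("a.jar", "broken", "b.jar"),
                name -> {
                    if (name.equals("broken")) {
                        throw new ScanException("unreadable");
                    }
                    return "scanned " + name;
                });
        System.out.println(found);
    }
}
```

Catching inside the task, rather than around `Future.get()`, keeps the failure local to the location that caused it, which is the same placement the override in the diff uses.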
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363139#comment-16363139 ]

Robert Yokota commented on KAFKA-6503:
--------------------------------------

With my setup where plugin.path was pointing to share/java, the parallel
scanning introduced with this change reduced worker initialization from 32s
to 12s on my Mac, and from 38s to 14s on Docker/Linux.

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it
> probably would not be a big issue. But it does not. Right after starting the
> Kafka-Connect instance I try to create a connector via the /connectors
> endpoint, but it will not succeed before the plugin scanning has finished (4
> minutes)
> I am not even sure why scanning is necessary. Is it not always true that
> connectors, converters etc are mentioned by name, so to see if it exists,
> just try to load the class - the classloader will tell if it is available.
> Hmmm, there is probably a reason.
> Anyway, either it should be made much faster, or at least Kafka-Connect
> should be fully functional (or as functional as possible) while scanning is
> going on.
> {code}
> [2018-01-30 13:52:26,834] INFO Scanning for plugin classes. This might take a
> moment ... (org.apache.kafka.connect.cli.ConnectDistributed)
> [2018-01-30 13:52:27,218] INFO Loading plugin from:
> /usr/share/java/kafka-connect-file-streamer-client
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,037] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-file-streamer-client/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,038] INFO Added plugin
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerStreamSourceConnectorManager'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,039] INFO Added plugin
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerFilesStreamerServerSourceConnectorManager'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,040] INFO Added plugin
> 'com.tlt.common.files.streamer.client.kafka.connect.KafkaConnectByteArrayConverter'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,049] INFO Loading plugin from:
> /usr/share/java/kafka-connect-elasticsearch
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,595] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-elasticsearch/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,611] INFO Added plugin
> 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,651] INFO Loading plugin from:
> /usr/share/java/kafka-connect-jdbc
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-jdbc/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Added plugin
> 'io.confluent.connect.jdbc.JdbcSinkConnector'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,492] INFO Added plugin
> 'io.confluent.connect.jdbc.JdbcSourceConnector'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,663] INFO Loading plugin from:
> /usr/share/java/kafka-connect-s3
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:53:51,055] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-s3/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:53:51,055] INFO Added plugin
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362933#comment-16362933 ]

ASF GitHub Bot commented on KAFKA-6503:
---------------------------------------

rayokota opened a new pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561

This is a one-liner to parallelize plugin scanning. This may help in some
environments where otherwise plugin scanning is slow.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it
> probably would not be a big issue. But it does not. Right after starting the
> Kafka-Connect instance I try to create a connector via the /connectors
> endpoint, but it will not succeed before the plugin scanning has finished (4
> minutes)
> I am not even sure why scanning is necessary.
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362930#comment-16362930 ]

ASF GitHub Bot commented on KAFKA-6503:
---------------------------------------

rayokota opened a new pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561

This is a one-liner to parallelize plugin scanning. This may help in some
environments where otherwise plugin scanning is slow.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it
> probably would not be a big issue. But it does not. Right after starting the
> Kafka-Connect instance I try to create a connector via the /connectors
> endpoint, but it will not succeed before the plugin scanning has finished (4
> minutes)
> I am not even sure why scanning is necessary.
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362931#comment-16362931 ]

ASF GitHub Bot commented on KAFKA-6503:
---------------------------------------

rayokota closed pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
index 345d7ef011d..e4a54d621c5 100644
--- a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
+++ b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
@@ -20,7 +20,10 @@
 import org.apache.kafka.connect.storage.Converter;
 import org.apache.kafka.connect.storage.HeaderConverter;
 import org.apache.kafka.connect.transforms.Transformation;
+import org.reflections.Configuration;
 import org.reflections.Reflections;
+import org.reflections.ReflectionsException;
+import org.reflections.scanners.SubTypesScanner;
 import org.reflections.util.ClasspathHelper;
 import org.reflections.util.ConfigurationBuilder;
 import org.slf4j.Logger;
@@ -269,7 +272,10 @@ private PluginScanResult scanPluginPath(
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.setClassLoaders(new ClassLoader[]{loader});
         builder.addUrls(urls);
-        Reflections reflections = new Reflections(builder);
+        builder.setScanners(new SubTypesScanner());
+        builder.setExpandSuperTypes(false);
+        builder.useParallelExecutor();
+        Reflections reflections = new InternalReflections(builder);
 
         return new PluginScanResult(
                 getPluginDesc(reflections, Connector.class, loader),
@@ -353,4 +359,24 @@ private void addAllAliases() {
             }
         }
     }
+
+    private static class InternalReflections extends Reflections {
+
+        public InternalReflections(final Configuration configuration) {
+            super(configuration);
+        }
+
+        // When Reflections is used for parallel scans, it has a bug where it propagates ReflectionsException
+        // as RuntimeException. Override the scan behavior to emulate the singled-threaded logic.
+        protected void scan(URL url) {
+            try {
+                super.scan(url);
+            } catch (ReflectionsException e) {
+                Logger log = Reflections.log;
+                if (log != null && log.isWarnEnabled()) {
+                    log.warn("could not create Vfs.Dir from url. ignoring the exception and continuing", e);
+                }
+            }
+        }
+    }
 }

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it
> probably would not be a big issue. But it does not.
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361812#comment-16361812 ]

ASF GitHub Bot commented on KAFKA-6503:
---------------------------------------

rayokota closed pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
index 345d7ef011d..c57caac9cba 100644
--- a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
+++ b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
@@ -269,6 +269,7 @@ private PluginScanResult scanPluginPath(
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.setClassLoaders(new ClassLoader[]{loader});
         builder.addUrls(urls);
+        builder.useParallelExecutor();
         Reflections reflections = new Reflections(builder);
 
         return new PluginScanResult(

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced.
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
    [ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347234#comment-16347234 ]

Randall Hauch commented on KAFKA-6503:
--------------------------------------

Specifically, we're not using an executor when scanning:
{code:java}
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.setClassLoaders(new ClassLoader[]{loader});
builder.addUrls(urls);
Reflections reflections = new Reflections(builder);
{code}
When no executor is used, then scanning is done in a single thread. However,
we are supplying more than a few URLs, so we should be able to parallelize
this by providing an executor. Reflections actually has a method that will
create an executor sized with the number of processors:
{code:java}
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.setClassLoaders(new ClassLoader[]{loader});
builder.addUrls(urls);
builder.useParallelExecutor();
Reflections reflections = new Reflections(builder);
{code}
The challenge is that on Docker with older JRE 8 releases, the JVM won't
report the correct number of processors. We might just have to deal with
that, though.

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Critical
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
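
The comment in this message notes that an executor sized from the processor count can be mis-sized when the JVM reports the wrong number of processors under Docker. As a rough, library-free sketch of the underlying idea (not the Reflections or Kafka implementation), the following runs a per-location scan on a fixed-size pool whose size the caller passes in, so the size does not have to come from `Runtime.availableProcessors()`. `ParallelScanSketch`, `scanAll`, the `Function` standing in for the real per-URL scan, the cap of 4 threads, and the sample location names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelScanSketch {

    // Runs scanOne for every location on a pool of the given size and
    // returns the results in the same order as the input list.
    static <T, R> List<R> scanAll(List<T> locations, Function<T, R> scanOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T location : locations) {
                Callable<R> task = () -> scanOne.apply(location);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // waits for that task to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // What the JVM reports; inside a container this may not match the CPU limit.
        int reported = Runtime.getRuntime().availableProcessors();
        // Assumed policy for this sketch only: never use more than 4 threads.
        int threads = Math.min(reported, 4);

        List<String> locations = Arrays.asList("plugin-a", "plugin-b", "plugin-c");
        List<Integer> nameLengths = scanAll(locations, String::length, threads);
        System.out.println("threads=" + threads + " results=" + nameLengths);
    }
}
```

Taking the thread count as a parameter keeps the sizing decision with the caller, which is one way to "deal with" an unreliable processor count; whether a cap is appropriate depends on the deployment.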
[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow
[ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347021#comment-16347021 ]

Randall Hauch commented on KAFKA-6503:
--------------------------------------

Very similar to KAFKA-6208, but the latter actually proposes using the ServiceLoader mechanism that requires connector, transform, and converter implementations to supply an additional file in their JARs. That will take a number of releases to do, so that should be our long-term direction. In the meantime, we should be able to speed up the scanning by either doing it in parallel or by scanning once for a set of interfaces (rather than one scan per interface). This is a short-term win.

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Major
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is
> very slow - see logs from starting my Kafka-Connect instance at the bottom.
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to
> /usr/share/java. The only thing I have added is a 13MB jar in
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it
> probably would not be a big issue. But it does not. Right after starting the
> Kafka-Connect instance I try to create a connector via the /connectors
> endpoint, but it will not succeed before the plugin scanning has finished (4
> minutes)
> I am not even sure why scanning is necessary. Is it not always true that
> connectors, converters etc are mentioned by name, so to see if it exists,
> just try to load the class - the classloader will tell if it is available.
> Hmmm, there is probably a reason.
> Anyway, either it should be made much faster, or at least Kafka-Connect
> should be fully functional (or as functional as possible) while scanning is
> going on.
> {code}
> [2018-01-30 13:52:26,834] INFO Scanning for plugin classes. This might take a
> moment ... (org.apache.kafka.connect.cli.ConnectDistributed)
> [2018-01-30 13:52:27,218] INFO Loading plugin from:
> /usr/share/java/kafka-connect-file-streamer-client
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,037] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-file-streamer-client/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,038] INFO Added plugin
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerStreamSourceConnectorManager'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,039] INFO Added plugin
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerFilesStreamerServerSourceConnectorManager'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,040] INFO Added plugin
> 'com.tlt.common.files.streamer.client.kafka.connect.KafkaConnectByteArrayConverter'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,049] INFO Loading plugin from:
> /usr/share/java/kafka-connect-elasticsearch
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,595] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-elasticsearch/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,611] INFO Added plugin
> 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,651] INFO Loading plugin from:
> /usr/share/java/kafka-connect-jdbc
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Registered loader:
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-jdbc/}
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Added plugin
> 'io.confluent.connect.jdbc.JdbcSinkConnector'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,492] INFO Added plugin
> 'io.confluent.connect.jdbc.JdbcSourceConnector'
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,663] INFO Loading plugin from:
> /usr/share/java/kafka-connect-s3
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30