[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365002#comment-16365002
 ] 

ASF GitHub Bot commented on KAFKA-6503:
---

ewencp closed pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
index 345d7ef011d..b21cdcbfab4 100644
--- a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
+++ b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
@@ -20,7 +20,10 @@
 import org.apache.kafka.connect.storage.Converter;
 import org.apache.kafka.connect.storage.HeaderConverter;
 import org.apache.kafka.connect.transforms.Transformation;
+import org.reflections.Configuration;
 import org.reflections.Reflections;
+import org.reflections.ReflectionsException;
+import org.reflections.scanners.SubTypesScanner;
 import org.reflections.util.ClasspathHelper;
 import org.reflections.util.ConfigurationBuilder;
 import org.slf4j.Logger;
@@ -269,7 +272,10 @@ private PluginScanResult scanPluginPath(
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.setClassLoaders(new ClassLoader[]{loader});
         builder.addUrls(urls);
-        Reflections reflections = new Reflections(builder);
+        builder.setScanners(new SubTypesScanner());
+        builder.setExpandSuperTypes(false);
+        builder.useParallelExecutor();
+        Reflections reflections = new InternalReflections(builder);
 
         return new PluginScanResult(
                 getPluginDesc(reflections, Connector.class, loader),
@@ -353,4 +359,25 @@ private void addAllAliases() {
             }
         }
     }
+
+    private static class InternalReflections extends Reflections {
+
+        public InternalReflections(Configuration configuration) {
+            super(configuration);
+        }
+
+        // When Reflections is used for parallel scans, it has a bug where it propagates ReflectionsException
+        // as RuntimeException.  Override the scan behavior to emulate the singled-threaded logic.
+        @Override
+        protected void scan(URL url) {
+            try {
+                super.scan(url);
+            } catch (ReflectionsException e) {
+                Logger log = Reflections.log;
+                if (log != null && log.isWarnEnabled()) {
+                    log.warn("could not create Vfs.Dir from url. ignoring the exception and continuing", e);
+                }
+            }
+        }
+    }
 }
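
The InternalReflections override in the diff above follows a general
pattern: when scans run on an executor, each task handles its own failure,
so one unreadable location is logged and skipped instead of failing the
whole scan. A minimal, self-contained sketch of that pattern follows. It
does not use Reflections; TolerantParallelScan, Scanner, and scanAll are
hypothetical names chosen for illustration, and the sketch catches
RuntimeException where the patch catches only ReflectionsException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: not Kafka or Reflections code.
public class TolerantParallelScan {

    // Hypothetical per-location scanner; returns the plugin names it found.
    public interface Scanner {
        List<String> scan(String location);
    }

    // Scans every location on a fixed-size pool. A location whose scan throws
    // is logged and contributes no results; the other locations still complete.
    public static List<String> scanAll(List<String> locations, Scanner scanner, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String location : locations) {
                Callable<List<String>> task = () -> {
                    try {
                        return scanner.scan(location);
                    } catch (RuntimeException e) {
                        // Mirrors the "log a warning and continue" behaviour of the patch.
                        System.err.println("could not scan " + location + ", ignoring: " + e);
                        return new ArrayList<String>();
                    }
                };
                futures.add(executor.submit(task));
            }
            // Collect in submission order so the result is deterministic.
            List<String> found = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                try {
                    found.addAll(future.get());
                } catch (ExecutionException e) {
                    // Only reachable for errors the task did not catch itself.
                    throw new IllegalStateException(e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            return found;
        } finally {
            executor.shutdown();
        }
    }
}
```

Catching inside the task, rather than around Future.get(), keeps the
failure local to the location that caused it, which is the same effect the
scan(URL) override aims for.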


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Connect: Plugin scan is very slow
> -
>
> Key: KAFKA-6503
> URL: https://issues.apache.org/jira/browse/KAFKA-6503
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 1.0.0
> Reporter: Per Steffensen
> Assignee: Robert Yokota
> Priority: Critical
> Fix For: 1.1.0, 1.2.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is 
> very slow - see logs from starting my Kafka-Connect instance at the bottom. 
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker 
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to 
> /usr/share/java. The only thing I have added is a 13MB jar in 
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors 
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it 
> probably would not be a big issue. But it does not. Right after starting the 
> Kafka-Connect instance I try to create a connector via the /connectors 
> endpoint, but it will not succeed before the plugin scanning has finished (4 
> minutes)
> I am not even sure why scanning is necessary. Is it not always true that 
> connectors, converters etc are mentioned by name, so to see if it exists, 
> just try to load the class - the classloader will tell if it is available. 
> Hmmm, there is probably a reason.
> Anyway, either it 

[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-02-13 Thread Robert Yokota (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363139#comment-16363139
 ] 

Robert Yokota commented on KAFKA-6503:
--

With my setup where plugin.path was pointing to share/java, the parallel 
scanning introduced with this change reduced worker initialization from 32s to 
12s on my Mac, and from 38s to 14s on Docker/Linux.

> Connect: Plugin scan is very slow
> -
>
> Key: KAFKA-6503
> URL: https://issues.apache.org/jira/browse/KAFKA-6503
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 1.0.0
> Reporter: Per Steffensen
> Priority: Critical
> Fix For: 1.1.0
>
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is 
> very slow - see logs from starting my Kafka-Connect instance at the bottom. 
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker 
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to 
> /usr/share/java. The only thing I have added is a 13MB jar in 
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors 
> and a converter. That one alone seems to take 20 secs.
> If it was just scanning in the background, and everything was working it 
> probably would not be a big issue. But it does not. Right after starting the 
> Kafka-Connect instance I try to create a connector via the /connectors 
> endpoint, but it will not succeed before the plugin scanning has finished (4 
> minutes)
> I am not even sure why scanning is necessary. Is it not always true that 
> connectors, converters etc are mentioned by name, so to see if it exists, 
> just try to load the class - the classloader will tell if it is available. 
> Hmmm, there is probably a reason.
> Anyway, either it should be made much faster, or at least Kafka-Connect 
> should be fully functional (or as functional as possible) while scanning is 
> going on.
> {code}
> [2018-01-30 13:52:26,834] INFO Scanning for plugin classes. This might take a 
> moment ... (org.apache.kafka.connect.cli.ConnectDistributed)
> [2018-01-30 13:52:27,218] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-file-streamer-client 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,037] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-file-streamer-client/}
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,038] INFO Added plugin 
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerStreamSourceConnectorManager'
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,039] INFO Added plugin 
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerFilesStreamerServerSourceConnectorManager'
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,040] INFO Added plugin 
> 'com.tlt.common.files.streamer.client.kafka.connect.KafkaConnectByteArrayConverter'
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,049] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-elasticsearch 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,595] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-elasticsearch/}
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,611] INFO Added plugin 
> 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector' 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,651] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-jdbc 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-jdbc/} 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Added plugin 
> 'io.confluent.connect.jdbc.JdbcSinkConnector' 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,492] INFO Added plugin 
> 'io.confluent.connect.jdbc.JdbcSourceConnector' 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,663] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-s3 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:53:51,055] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-s3/} 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:53:51,055] INFO Added plugin 
> 

[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362933#comment-16362933
 ] 

ASF GitHub Bot commented on KAFKA-6503:
---

rayokota opened a new pull request #4561: KAFKA-6503: Parallelize plugin 
scanning
URL: https://github.com/apache/kafka/pull/4561
 
 
   This is a one-liner to parallelize plugin scanning.  This may help in some 
environments where otherwise plugin scanning is slow.
   
   





[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362931#comment-16362931
 ] 

ASF GitHub Bot commented on KAFKA-6503:
---

rayokota closed pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
index 345d7ef011d..e4a54d621c5 100644
--- a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
+++ b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
@@ -20,7 +20,10 @@
 import org.apache.kafka.connect.storage.Converter;
 import org.apache.kafka.connect.storage.HeaderConverter;
 import org.apache.kafka.connect.transforms.Transformation;
+import org.reflections.Configuration;
 import org.reflections.Reflections;
+import org.reflections.ReflectionsException;
+import org.reflections.scanners.SubTypesScanner;
 import org.reflections.util.ClasspathHelper;
 import org.reflections.util.ConfigurationBuilder;
 import org.slf4j.Logger;
@@ -269,7 +272,10 @@ private PluginScanResult scanPluginPath(
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.setClassLoaders(new ClassLoader[]{loader});
         builder.addUrls(urls);
-        Reflections reflections = new Reflections(builder);
+        builder.setScanners(new SubTypesScanner());
+        builder.setExpandSuperTypes(false);
+        builder.useParallelExecutor();
+        Reflections reflections = new InternalReflections(builder);
 
         return new PluginScanResult(
                 getPluginDesc(reflections, Connector.class, loader),
@@ -353,4 +359,24 @@ private void addAllAliases() {
             }
         }
     }
+
+    private static class InternalReflections extends Reflections {
+
+        public InternalReflections(final Configuration configuration) {
+            super(configuration);
+        }
+
+        // When Reflections is used for parallel scans, it has a bug where it propagates ReflectionsException
+        // as RuntimeException.  Override the scan behavior to emulate the singled-threaded logic.
+        protected void scan(URL url) {
+            try {
+                super.scan(url);
+            } catch (ReflectionsException e) {
+                Logger log = Reflections.log;
+                if (log != null && log.isWarnEnabled()) {
+                    log.warn("could not create Vfs.Dir from url. ignoring the exception and continuing", e);
+                }
+            }
+        }
+    }
 }


 





[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-02-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361812#comment-16361812
 ] 

ASF GitHub Bot commented on KAFKA-6503:
---

rayokota closed pull request #4561: KAFKA-6503: Parallelize plugin scanning
URL: https://github.com/apache/kafka/pull/4561
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
index 345d7ef011d..c57caac9cba 100644
--- a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
+++ b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java
@@ -269,6 +269,7 @@ private PluginScanResult scanPluginPath(
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.setClassLoaders(new ClassLoader[]{loader});
         builder.addUrls(urls);
+        builder.useParallelExecutor();
         Reflections reflections = new Reflections(builder);
 
         return new PluginScanResult(


 





[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-01-31 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347234#comment-16347234
 ] 

Randall Hauch commented on KAFKA-6503:
--

Specifically, we're not using an executor when scanning:

{code:java}
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.setClassLoaders(new ClassLoader[]{loader});
builder.addUrls(urls);
Reflections reflections = new Reflections(builder);
{code}

When no executor is used, then scanning is done in a single thread. However, we 
are supplying more than a few URLs, so we should be able to parallelize this by 
providing an executor. Reflections actually has a method that will create an 
executor sized with the number of processors:
{code:java}
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.setClassLoaders(new ClassLoader[]{loader});
builder.addUrls(urls);
builder.useParallelExecutor();
Reflections reflections = new Reflections(builder);
{code}

The challenge is that on Docker with older JRE 8 releases, the JVM won't report 
the correct number of processors. We might just have to deal with that, though.
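
To illustrate the sizing concern: an executor sized from the processor
count depends on Runtime.getRuntime().availableProcessors(), which is the
value that may be wrong inside a container on older JRE 8 builds. The sketch
below is plain JDK code, not the Reflections implementation;
ScanExecutorSizing, threadCount, and the override parameter are hypothetical
and only show one way an operator-supplied value could take precedence over
the reported count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: not Kafka or Reflections code.
public class ScanExecutorSizing {

    // Returns the override when it is positive; otherwise falls back to the
    // processor count reported by the JVM (always at least 1).
    public static int threadCount(int override) {
        return override > 0 ? override : Runtime.getRuntime().availableProcessors();
    }

    // Builds a fixed-size pool for scanning, sized by threadCount(override).
    public static ExecutorService newScanExecutor(int override) {
        return Executors.newFixedThreadPool(threadCount(override));
    }

    public static void main(String[] args) {
        // Print what the JVM reports, which is useful to compare against the
        // CPU limit configured for a container.
        System.out.println("JVM-reported processors: " + threadCount(0));
    }
}
```

Whether such an executor can be handed to Reflections depends on the
version in use and should be checked against its ConfigurationBuilder; that
part is an assumption here, not something this thread confirms.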


> Connect: Plugin scan is very slow
> -
>
> Key: KAFKA-6503
> URL: https://issues.apache.org/jira/browse/KAFKA-6503
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 1.0.0
> Reporter: Per Steffensen
> Priority: Critical
>

[jira] [Commented] (KAFKA-6503) Connect: Plugin scan is very slow

2018-01-31 Thread Randall Hauch (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347021#comment-16347021 ]

Randall Hauch commented on KAFKA-6503:
--

This is very similar to KAFKA-6208, but the latter actually proposes using the 
ServiceLoader mechanism, which requires connector, transform, and converter 
implementations to supply an additional file in their JARs. That will take a 
number of releases to do, so it should be our long-term direction.
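
As a rough illustration of what that additional file means in practice, here 
is a sketch that uses only the JDK. DemoPlugin and DemoSinkPlugin are 
hypothetical stand-ins, not Connect classes, and the provider file is written 
to a temporary directory only to keep the example self-contained; a real 
plugin would ship that file inside its JAR under META-INF/services/.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ServiceLoaderSketch {

    /** Hypothetical stand-in for a plugin interface (not a Connect type). */
    public interface DemoPlugin {
        String name();
    }

    /** Hypothetical implementation that a plugin JAR would contain. */
    public static class DemoSinkPlugin implements DemoPlugin {
        public DemoSinkPlugin() { }

        @Override
        public String name() {
            return "demo-sink";
        }
    }

    /** Writes a provider file, then lets ServiceLoader find the implementation by name. */
    static List<String> discover() {
        try {
            // Assumption: a temp directory plays the role of the plugin JAR's root.
            Path root = Files.createTempDirectory("plugin-root");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));

            // The "additional file": named after the interface, listing implementation class names.
            Files.write(services.resolve(DemoPlugin.class.getName()),
                    DemoSinkPlugin.class.getName().getBytes(StandardCharsets.UTF_8));

            List<String> names = new ArrayList<>();
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()}, DemoPlugin.class.getClassLoader())) {
                // ServiceLoader reads the provider file; it does not inspect every class.
                for (DemoPlugin plugin : ServiceLoader.load(DemoPlugin.class, loader)) {
                    names.add(plugin.name());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```

The lookup cost here depends on reading one small file per location rather 
than on the number of classes in the JAR, which is the appeal of this 
approach.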

In the meantime, we should be able to speed up the scanning either by doing it 
in parallel or by scanning once for the whole set of interfaces (rather than 
running one scan per interface). Either would be a short-term win.
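
Both short-term ideas can be modelled with the JDK alone. The sketch below is 
an illustration under stated assumptions, not the Reflections library or 
Connect's actual scanning code: the candidate classes for each plugin location 
are passed in directly instead of being discovered from JARs, and the names 
ParallelScanSketch, scanOnce, and scanAll are hypothetical. scanOnce checks 
every target interface in a single pass over the candidates, and scanAll runs 
one such pass per location on a thread pool.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelScanSketch {

    /** One pass over the candidates, bucketing each concrete class under every target it implements. */
    static Map<String, List<String>> scanOnce(List<Class<?>> candidates, List<Class<?>> targets) {
        Map<String, List<String>> found = new LinkedHashMap<>();
        for (Class<?> target : targets) {
            found.put(target.getSimpleName(), new ArrayList<>());
        }
        for (Class<?> candidate : candidates) {
            // Interfaces and abstract classes cannot be instantiated as plugins, so skip them.
            if (candidate.isInterface() || Modifier.isAbstract(candidate.getModifiers())) {
                continue;
            }
            for (Class<?> target : targets) {
                if (target.isAssignableFrom(candidate)) {
                    found.get(target.getSimpleName()).add(candidate.getSimpleName());
                }
            }
        }
        return found;
    }

    /** Scans every location concurrently, one task per location, preserving input order in the result. */
    static Map<String, Map<String, List<String>>> scanAll(
            Map<String, List<Class<?>>> locations, List<Class<?>> targets) {
        int threads = Math.max(1,
                Math.min(locations.size(), Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Map<String, List<String>>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<Class<?>>> location : locations.entrySet()) {
                pending.put(location.getKey(),
                        pool.submit(() -> scanOnce(location.getValue(), targets)));
            }
            Map<String, Map<String, List<String>>> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Map<String, List<String>>>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scanning", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("Scan failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for plugin classes; the location names are made up.
        Map<String, List<Class<?>>> locations = new LinkedHashMap<>();
        locations.put("plugin-a", Arrays.<Class<?>>asList(Thread.class, String.class));
        locations.put("plugin-b", Arrays.<Class<?>>asList(StringBuilder.class, Object.class));
        List<Class<?>> targets = Arrays.<Class<?>>asList(Runnable.class, CharSequence.class);
        System.out.println(scanAll(locations, targets));
    }
}
```

How much either change helps in a real deployment would depend on how many 
plugin locations there are and how the time is split between reading JARs and 
matching types, so the gain would need to be measured.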

> Connect: Plugin scan is very slow
> ---------------------------------
>
>                 Key: KAFKA-6503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6503
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Per Steffensen
>            Priority: Major
>
> Just upgraded to 1.0.0. It seems some plugin scan has been introduced. It is 
> very slow - see logs from starting my Kafka-Connect instance at the bottom. 
> It takes almost 4 minutes scanning. I am running Kafka-Connect in docker 
> based on confluentinc/cp-kafka-connect:4.0.0. I set plugin.path to 
> /usr/share/java. The only thing I have added is a 13MB jar in 
> /usr/share/java/kafka-connect-file-streamer-client containing two connectors 
> and a converter. That one alone seems to take 20 secs.
> If it were just scanning in the background and everything else was working, 
> it probably would not be a big issue. But that is not the case. Right after 
> starting the Kafka-Connect instance I try to create a connector via the 
> /connectors endpoint, but it will not succeed before the plugin scanning has 
> finished (4 minutes).
> I am not even sure why scanning is necessary. Is it not always true that 
> connectors, converters, etc. are mentioned by name, so to see whether one 
> exists, you can just try to load the class - the classloader will tell you 
> whether it is available? Hmmm, there is probably a reason.
> Anyway, either it should be made much faster, or at least Kafka-Connect 
> should be fully functional (or as functional as possible) while scanning is 
> going on.
> {code}
> [2018-01-30 13:52:26,834] INFO Scanning for plugin classes. This might take a 
> moment ... (org.apache.kafka.connect.cli.ConnectDistributed)
> [2018-01-30 13:52:27,218] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-file-streamer-client 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,037] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-file-streamer-client/}
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,038] INFO Added plugin 
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerStreamSourceConnectorManager'
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,039] INFO Added plugin 
> 'com.tlt.common.files.streamer.client.kafka.connect.OneTaskPerFilesStreamerServerSourceConnectorManager'
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,040] INFO Added plugin 
> 'com.tlt.common.files.streamer.client.kafka.connect.KafkaConnectByteArrayConverter'
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:43,049] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-elasticsearch 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,595] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-elasticsearch/}
>  (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,611] INFO Added plugin 
> 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector' 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:47,651] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-jdbc 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Registered loader: 
> PluginClassLoader{pluginLocation=file:/usr/share/java/kafka-connect-jdbc/} 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,491] INFO Added plugin 
> 'io.confluent.connect.jdbc.JdbcSinkConnector' 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,492] INFO Added plugin 
> 'io.confluent.connect.jdbc.JdbcSourceConnector' 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30 13:52:49,663] INFO Loading plugin from: 
> /usr/share/java/kafka-connect-s3 
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> [2018-01-30