On Wed, May 3, 2017 at 11:17 AM, Konstantine Karantasis < konstant...@confluent.io> wrote:
> Thanks Ewen. I'm replying inline as well. > > On Tue, May 2, 2017 at 11:24 AM, Ewen Cheslack-Postava <e...@confluent.io> > wrote: > > > Thanks for the KIP. > > > > A few responses inline, followed by additional comments. > > > > On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis < > > konstant...@confluent.io> wrote: > > > > > Gwen, Randall thank you for your very insightful observations. I'm glad > > you > > > find this first draft to be an adequate platform for discussion. > > > > > > I'll attempt replying to your comments in order. > > > > > > Gwen, I also debated exactly the same two options: a) interpreting > > absence > > > of module path as a user's intention to turn off isolation and b) > > > explicitly using an additional boolean property. A few reasons why I > went > > > with b) in this first draft are: > > > 1) As Randall mentions, to leave the option of using a default value > > open. > > > If not immediately in the first version of isolation, maybe in the > > future. > > > 2) I didn't like the implicit character of the choice of interpreting > an > > > empty string as a clear intention to turn isolation off by the user. > Half > > > the time could be just that users forget to set a location, although > > they'd > > > like to use class loading isolation. > > > 3) There's a slim possibility that in rare occasions a user might want > to > > > avoid even the slightest increase in memory consumption due to class > > > loading duplication. I admit this should be very rare, but given the > > other > > > concerns and that we would really like to keep the isolation > > implementation > > > simple, the option to turn off this feature by using only one > additional > > > config property might not seem too excessive. At least at the start of > > this > > > discussion. > > > 4) Debugging during development might be simpler in some cases. > > > 5) Finally, as you mention, this could allow for smoother upgrades. > > > > > > > I'm not sure any of these keep you from removing the extra config. Is > there > > any reason you couldn't have clean support for relying on the CLASSPATH > > while still supporting the classloaders? Then getting people onto the new > > classloaders does require documentation for how to install connectors, > but > > that's pretty minimal. And we don't break existing installations where > > people are just adding to the CLASSPATH. It seems like this: > > > > 1. Allows you to set a default. Isolation is always enabled, but we won't > > include any paths/directories we already use. Setting a default just > > requires specifying a new location where we'd hold these directories. > > 2. It doesn't require the implicit choice -- you actually never turn off > > isolation, but still support the regular CLASSPATH with an empty list of > > isolated loaders > > 3. The user can still use CLASSPATH if they want to minimize classloader > > overhead > > 4. Debugging can still use CLASSPATH > > 5. Upgrades just work. > > > > Falling back to CLASSPATH for non-isolated mode makes sense. The extra > config property was suggested proactively, as well as for clarity and > handling of defaults. But it's much better if we can do without it. Will be > removed. > > Thank you (and I'm sure Confluent's support team also thanks you - one configuration should be harder to accidentally screw up than two...) > > > > > > > > Randall, regarding your comments: > > > 1) To keep its focus narrow, this KIP, as well as the first > > implementation > > > of isolation in Connect, assume filesystem based discovery. With > careful > > > implementation, transitioning to discovery schemes that support broader > > > URIs I believe should be easy in the future. > > > > > > > Maybe just mention a couple of quick examples in the KIP. When described > > inline it might be more obvious that it will extend cleanly. > > > > > There's an example for a filesystem-based structure. I will enhance it. > > > > > > 2) The example you give makes a good point. However I'm inclined to say > > > that such cases should be addressed more as exceptions rather than as > > being > > > the common case. Therefore, I wouldn't see all dependencies imported by > > the > > > framework as required to be filtered out, because in that case we lose > > the > > > advantage of isolation between the framework and the connectors (and we > > are > > > left only with isolation between connectors). > > > > 3) I tried to abstract implementation details in this the KIP, but you > are > > > right. Even though filtering here is mainly used semantically rather > than > > > literally, it gives an implementation hint that we could avoid. > > > > > > > I think we're missing another option -- don't do filtering and require > that > > those dependencies are correctly filtered out of the modules. If we want > to > > be nicer about this, we could also detect maybe 2 or 3 classes while > > scanning for Connectors/Converters/Transformations that indicate the > > classloader has jars that it shouldn't and warn about it. I can't think > of > > that many that would be an issue -- basically connect-api, > connect-runtime > > if they really mess it up, and maybe slf4j. > > > > > This sounds a bit more strict, which is not necessarily a bad thing. Still, > it seems to also keep the subject under the category of implementation > decisions. > > > 4) In the same spirit as in 3) I believe we should reserve enough > > > flexibility to the implementation to discover and load classes, when > they > > > appear in multiple locations under the general module location. > > > > > > > > And a couple of addition comments: > > > > - module.path should be module.paths if it accepts multiple paths > > > > The configuration property was named in a way that resembles classpath. I > felt this might offer some intuition, even if the word "path" is too > filesystem oriented. Users are used to give a list of paths in classpath, > so I didn't find singular too confusing. Additionally, it will work when > there's only one "path", or I should say URI, given. But I don't feel very > strong about the name of this property. (Alternatives that I tried such as > module.uris, module.locations didn't seem satisfying) > > > - I think the description of the module paths is a bit confusing because > I > > think for simpler configuration we actually want to specify the *parent* > > directory of module paths. I definitely prefer this since it's simpler > > although I am not certain how it will mix with future extensions to other > > URL handlers (where a zip file or http URL wouldn't do the same discovery > > within subdirectories) > > > > My impression is that the 2nd paragraph below the introduction of > module.path is where the "parent" convention is attempted to be explained. > I will attempt to improve this text, but I feel that, again, the filesystem > convention will follow the generic definition of the property (a list of > locations/paths). > I was also confused while reading the description. It clicked at some point, but it wasn't immediately clear that we load from sub-directories too. > > -Konstantine > > > > > > -Ewen > > > > > > > Thanks again! Let me know what you think. > > > Konstantine > > > > > > > > > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rha...@gmail.com> > wrote: > > > > > > > Very nice work, Konstantine. Conflicting dependencies of connectors > is > > > > indeed a big issue that makes it hard to manage installed connectors. > > > > > > > > I do like Gwen's idea about removing the 'module.isolation.enabled' > > > > property. However, I would have anticipated always using classpath > > > > isolation for *only* those components registered under the module > path > > > and > > > > not really for anything else already on the normal classpath. So, > > people > > > > could continue to place custom connector JARs onto the classpath, > > though > > > > this would become deprecated in favor of installing custom connector > > > JARs / > > > > modules via the module path. This keeps configuration simple, gives > > > people > > > > time to migrate, but let's people that need classpath isolation get > it > > to > > > > install a variety of connectors each with their dependencies that > > > > potentially conflict with other components. > > > > > > > > The challenge is whether there should be a default for 'module.path'. > > > > Ideally there would be so that users know where they can install > their > > > > connectors. However, I suspect that this might be difficult to do > > unless > > > it > > > > can make use of system properties such as "${kafka.home}" so that > > > relative > > > > directories can be specified. > > > > > > > > A few other questions/comments: > > > > > > > > 1) Does the KIP have to specify how are components / modules > installed, > > > > discovered, or recognized by Kafka Connect? Or perhaps the KIP needs > to > > > > just specify the semantics of the file system module path (e.g., the > > > > directories below those specified in the module path are to be unique > > and > > > > identify an installed component). > > > > > > > > 2) Will the module classloader filtering also have to exclude Kafka > > > Connect > > > > dependencies? The only one that I can think of is the SLF4J API, > which > > > > can't be loaded from the module's classloader if the connector is to > > send > > > > its log messages to the same logging system. > > > > > > > > 3) Rather than specify filtering, would be it a bit more flexible to > > > simply > > > > say that the implementation will need to ensure that Java, Kafka > > Connect, > > > > and other third party APIs (e.g., SLF4J API) will not be loaded from > > the > > > > module classloaders? It'd be better to avoid specifying how it will > be > > > > done, just in case the implementation needs to evolve or use a > > different > > > > technique (e.g., load the Java and public Kafka Connect APIs via one > > > > classloader that is reused and that always appears before the module > > > > classloader, while Kafka Connect implementation JARs appear after the > > > > component's classloader. > > > > > > > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could > explicitly > > > > specify the classloader order for a deployed connector. For example, > > > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', > ..., > > > > 'kafka-connect-impls', where 'connector-module' is the classloader > for > > > the > > > > (first) module where the connector is found, 'smt-module-1' is the > > > > classloader for the (first) module where the first SMT class is found > > (if > > > > specified and found in a separate module), 'smt-module-2' is the > > > > classloader .... Might also need to say that the KIP does not specify > > how > > > > the implementation will pick the module if a specified class if found > > in > > > > more than one module. > > > > > > > > Thoughts? > > > > > > > > Randall > > > > > > > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <g...@confluent.io> > > wrote: > > > > > > > > > Hi Konstantine, > > > > > > > > > > Thank you so much for driving this! The connector classpath mess is > > > > driving > > > > > me nuts (or worse, driving me to use Docker). > > > > > > > > > > I like the proposal for micro-benchmarks to test the context > > switching > > > > > overhead. > > > > > > > > > > I have a difficult time figuring out the module.isolation.enabled. > > > > > Especially with a default to false. I can't think of a reason that > > > anyone > > > > > will not want classpath isolation. "No! I want my connectors to > mess > > up > > > > > each other's dependencies" said no one ever. > > > > > > > > > > So it looks like this is mostly for upgrade purpose? Because the > > > initial > > > > > upgrade will not have the module.path set and therefore classpath > > > > isolation > > > > > will simply not work by default? > > > > > > > > > > In that case, why don't we simply use the existence of non-empty > > > > > module.path as an indicator of whether isolation should work or > not? > > > seem > > > > > simpler and intuitive to me. > > > > > > > > > > Thanks! > > > > > > > > > > Gwen > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis < > > > > > konstant...@confluent.io> wrote: > > > > > > > > > > > * Because of KIP number collision, please disregard my previous > KIP > > > > > > announcement and use this thread for discussion instead * > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > we aim to address dependency conflicts in Kafka Connect soon by > > > > applying > > > > > > class loading isolation. > > > > > > > > > > > > Feel free to take a look at KIP-146 here: > > > > > > > > > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > 146+-+Classloading+Isolation+in+Connect > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > 146+-+Classloading+Isolation+in+Connect>* > > > > > > > > > > > > which describes minimal required changes to public interfaces and > > the > > > > > > general implementation approach. > > > > > > > > > > > > This is a much wanted feature for Kafka Connect. Your feedback is > > > > highly > > > > > > appreciated. > > > > > > > > > > > > -Konstantine > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > *Gwen Shapira* > > > > > Product Manager | Confluent > > > > > 650.450.2760 | @gwenshap > > > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog > > > > > <http://www.confluent.io/blog> > > > > > > > > > > > > > > > -- *Gwen Shapira* Product Manager | Confluent 650.450.2760 | @gwenshap Follow us: Twitter <https://twitter.com/ConfluentInc> | blog <http://www.confluent.io/blog>