Thanks, Stephen, this is really helpful!

On Tue, Apr 28, 2020 at 6:24 AM Stephen Mallette <spmalle...@gmail.com> wrote:
> > To step out of the weeds a bit - other than the Zookeeper / Curator
> > example, does anyone know of any other apache projects that have either
> > subprojects or complementary sideprojects they're interdependent upon in
> > their ecosystems?
>
> Every Apache project is different, so it's quite possible that the
> experience I have in this area doesn't apply much here, but I'll offer some
> words on the matter in the event that some of it is helpful.
>
> For many years even prior to joining Apache, TinkerPop was quite against
> bringing in driver-style sub-projects. Our main concern was one that I
> think was voiced here in this thread in some fashion, where core developers
> would have to be knowledgeable of the incoming body of work and maintain
> that going forward. For core contributors who were primarily Java
> developers it was difficult to think that we'd suddenly be responsible for
> reviews/VOTEs on Python code, for example. It was with a bit of
> trepidation that we eventually decided it a good idea and opened the
> project to them. For our purposes we brought all such projects directly
> into our core repository as the thinking was that we wanted to keep all
> aspects of the project unified (testing, release, etc) to ensure that for a
> particular release tag you could be sure that everything worked together.
> We initially started with just Python and developed that as our model for
> how new drivers would arrive (there were already other disparate projects
> out there in other languages).
>
> We wanted a model that ensured a reasonably high bar for acceptance and
> created a rough set of minimum criteria we wanted to have for adding a new
> driver to our release lines. The core of that criteria was a common
> language agnostic test suite that needed to pass for us to consider it
> "ready" in any sense and the project needed to build, test and release
> using Maven (which is our build tool for the project).
> The former ensured
> that we had a reasonable level of common tested functionality among drivers
> and the latter ensured an easy and consistent way to manage build/release
> practices (which fed nicely into our Docker infrastructure for both full
> builds and for giving non-JVM developers a nice way to develop drivers
> against the latest code without having to be Java experts). Once we
> established this approach with Python, we successfully brought in .NET and
> JavaScript.
>
> I think there were a number of nice upsides to deciding to bring in drivers
> in the first place and then in the model for acceptance that we chose:
>
> + We saw a greater diversity of folks contributing in general as the
> ecosystem opened up beyond just the JVM.
> + We saw that the general community coalesced around the "official"
> drivers, contributing as one to them, rather than going off and creating
> one-off projects. I'm not really aware of any third-party drivers right now
> for the languages we support, but if you look at something like Go, there
> are three or more choices. I suppose Go would be our next target for
> official inclusion.
> + Release day was pretty simple despite the complexity of the environment
> with that mixed ecosystem because of our unified build model using Maven
> and there wasn't a lot of disparate tooling exposed to the release manager
> directly.
> + I can't say that we really saw problems with core project developers (who
> mostly knew Java) having to review Python/C#/JavaScript. For the most part,
> the contribution quality was high and we managed and became more
> knowledgeable as we went.
> + As we released drivers and core together, we no longer had situations
> where some third-party driver lagged behind some feature in core - if you
> wanted to use the latest core functionality you just used the latest
> release of core and driver and you could be assured they worked together
> and we felt confident saying so.
>
> Doing it over again, I think I would still consider going single repo for
> this situation but I think I might not place the requirement that the
> projects build with Maven. I think Maven has turned off some contributors
> from those language ecosystems who don't know the JVM. They would have been
> much more comfortable just working more directly with the tool systems that
> they were familiar with. Of course, to get rid of local Maven builds
> completely we would have to build a "latest" Docker image so that folks
> didn't need to do that themselves like they do now (also with Maven).
>
> Aside from TinkerPop experiences I will offer that, while I'm not
> completely sure, I think that for a contribution like this one where the
> bulk of the code has been developed outside of the ASF, the DS drivers
> would need to go through an IP Clearance process:
>
> https://incubator.apache.org/ip-clearance/
>
> On Mon, Apr 27, 2020 at 12:57 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>
> > To step out of the weeds a bit - other than the Zookeeper / Curator
> > example, does anyone know of any other apache projects that have either
> > subprojects or complementary sideprojects they're interdependent upon in
> > their ecosystems? I'd like to reach out to some other pmc's for advice and
> > feedback on this topic since there's no sense in reinventing the wheel if
> > other projects have wisdom to share on this.
> >
> > On Mon, Apr 27, 2020 at 12:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> >
> > > re: ML noise, how hard would it be to filter out JIRA updates w/component
> > > "Drivers"? Or from JIRA queries?
> > >
> > > For governance, I see it cutting both ways. If we have two separate
> > > projects and ML's for drivers and C*, how do we keep a coherent view of new
> > > features and roadmap stuff? Do we have CEP's for both projects and tie them
> > > together?
> > > Do we drive changes in the driver feature ecosystem via CEP's in
> > > C*?
> > >
> > > In the Venn diagram of overlap vs. non between the two projects, I see
> > > there being more overlap than not.
> > >
> > > On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <djo...@apache.org> wrote:
> > >
> > >> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <lebre...@gmail.com> wrote:
> > >> >
> > >> > Fwiw, I agree with the concerns raised by Benedict, and think we should
> > >> > carefully think about how this is handled. Which isn't a rejection of
> > >> > the donation in any way.
> > >> >
> > >> > Drivers are not small projects, and the majority of their day to day
> > >> > maintenance is unrelated to the server (and the reverse is true).
> > >> >
> > >> > From the user point of view, I think it would be fabulous that Cassandra
> > >> > appears like one project with a server and some official drivers, with one
> > >> > coherent website and documentation for all. I'm all for striving for that.
> > >>
> > >> +1
> > >>
> > >> > Behind the scenes however, I feel things should be set up so that some amount
> > >> > of separation remains between server and whichever drivers are donated and
> > >> > accepted, or I'm fairly sure things would get messy very quickly[1]. In my
> > >>
> > >> Can you say more about what "getting messy very quickly" means here?
> > >>
> > >> > mind that means *at a minimum*:
> > >> > - separate JIRA projects.
> > >> > - dedicated _dev_ (and commits) mailing lists.
> > >>
> > >> If we're thinking through how this would be set up, initially we had the
> > >> same Jira project for sidecar but now there is a separate one to track
> > >> sidecar specific jiras. At the moment we do not have a separate mailing
> > >> list. I think Cassandra dev mailing list's volume is low enough to keep
> > >> using the same ML.
> > >> There is an added value that it gives visibility and
> > >> developers don't need to go between multiple mailing lists.
> > >>
> > >> > But it's also worth thinking whether a single pool of committers/PMC
> > >> > members is desirable.
> > >> >
> > >> > Tbc, I'm not sure what is the best way to achieve this within the
> > >> > constraint of the Apache foundation, and maybe I'm just stating the
> > >> > obvious here.
> > >> >
> > >> > [1] fwiw, I say this as someone that at some points in time was
> > >> > simultaneously somewhat actively involved in both Cassandra and the
> > >> > DataStax Java driver.
> > >> >
> > >> > --
> > >> > Sylvain
> > >> >
> > >> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> > >> >
> > >> >> Do you have some examples of issues?
> > >> >>
> > >> >> So, to explain my thinking: I believe there is value in most contributors
> > >> >> being able to know and understand a majority of what the project
> > >> >> undertakes. Many people track a wide variety of activity on the project,
> > >> >> and whether they express an opinion they probably form one and will involve
> > >> >> themselves if they consider it important to do so. I worry that importing
> > >> >> several distinct and only loosely related projects to the same governance
> > >> >> and communication structures has a strong potential to undermine that
> > >> >> capability, as people begin to assume that activity and decision-making is
> > >> >> unrelated to them - and if that happens I think something important is lost.
> > >> >>
> > >> >> The sidecar challenges this already but seems hopefully manageable: it is
> > >> >> a logical extension of Cassandra, existing primarily to plug gaps in
> > >> >> Cassandra's own functionality, and features may migrate to Cassandra over
> > >> >> time.
> > >> >> It is likely to have releases closely tied to Cassandra itself.
> > >> >> Other subprojects are so far exclusively for consumption by the Cassandra
> > >> >> project itself, and are all naturally coupled.
> > >> >>
> > >> >> Drivers however are inherently arms-length endeavours: we publish a
> > >> >> protocol specification, and driver maintainers implement it. They are
> > >> >> otherwise fairly independent, and while a dialogue is helpful it does not
> > >> >> need to be controlled by a single entity. Many drivers will continue to be
> > >> >> controlled by others, as they have been until now. We're of course able to
> > >> >> ensure there's a strong overlap of governance, which I think would be very
> > >> >> helpful, and something Curator and Zookeeper seem not to have managed.
> > >> >>
> > >> >> Looking at the Curator website, it also seems to pitch itself as a
> > >> >> relatively opinionated product, and much more than a driver. I hope the
> > >> >> recipe for conflict in our case is much more limited given the functional
> > >> >> scope of a driver - and anyway better avoided with more integrated, but
> > >> >> still distinct governance.
> > >> >>
> > >> >> That's not to say I don't see some value in the project controlling the
> > >> >> driver directly, I just worry about the above.
> > >> >>
> > >> >> On 22/04/2020, 21:25, "Nate McCall" <zznat...@gmail.com> wrote:
> > >> >>
> > >> >> On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> > >> >>
> > >> >>> I welcome the donation, and hope we are able to accept all of the
> > >> >>> drivers. This is really great news IMO.
> > >> >>>
> > >> >>> I do however wonder if the project may be accumulating too many
> > >> >>> sub-projects? I wonder if it's time to think about splitting, and perhaps
> > >> >>> incubating a project for the drivers?
> > >> >>>
> > >> >>
> > >> >> This is a legit concern and good question, but I think this is more a
> > >> >> natural evolution of growing a project. There is precedent for this in
> > >> >> Spark, Beam, Hadoop and others who have a number of different repositories
> > >> >> under the general project umbrella.
> > >> >>
> > >> >> What I would like to avoid is a situation like with Apache Curator and
> > >> >> Apache Zookeeper. The former being a zookeeper client donation from Netflix
> > >> >> that came in as a top level project. From the peanut gallery, it seems like
> > >> >> that has been less than ideal a couple of times in the past coordinating
> > >> >> releases, trademarks and such with separate project management.
> > >> >>
> > >> >> ---------------------------------------------------------------------
> > >> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> >> For additional commands, e-mail: dev-h...@cassandra.apache.org