There are definitely pros and cons to splitting vs. remaining a single
project. The biggest pain points for me so far have been the following:

Digging Solr failures

The theory is that Solr failures can help find Lucene bugs that Lucene tests
wouldn't catch, and while this has occurred a couple of times, I have found
the benefit-cost ratio unattractive: Solr tests can be especially hard to
debug, being integration tests that use threading and usually don't
reproduce with the failing seed.

Synchronized releases

Releasing at the same time has created interesting situations in the past.
For instance, we have had several Lucene patch releases with empty
changelogs. Another problem is that the more changes go into a release, the
more likely you are to find last-minute blockers and need to respin.
Splitting would help keep the scope of each release smaller and reduce
chances of needing to respin.

The argument has been made that we would still need to coordinate major
releases because of backward compatibility guarantees, but to be clear
we're talking about something that would be much more lightweight than the
coordination that we require today. It would be totally fine if Solr
released new major versions several months after Lucene; the only
requirement is to have the same cadence.

Adapting Solr to Lucene changes

Lucene embraces its N-1 backward compatibility policy to move forward, but
Solr is reluctant to. For instance, uninverting and numeric fields were
deprecated more than 4 years ago (in favor of doc values and points
respectively), but they are still used in Solr, and the task of finding a way
to keep the functionality working after removing the feature from Lucene
fell on the plate of the person who drove the deprecation/removal in
Lucene. While some might argue that a full split might move the cursor too
far in the other direction, I feel that this is something we should work on
addressing, even if the final decision is to not split.

In the end, I think that Lucene and Solr should keep a close relationship,
but reducing coupling would help. I wish the two projects had a
relationship that looked more like the relationship that we have with
OpenJDK, testing early-access builds, embracing new features and
deprecations, but without forcing tight coupling. I don't have strong
feelings about splitting PMCs and making Solr a TLP vs. remaining the same
project, but I wish we would at least make Solr depend on Lucene JARs and
decouple builds/releases.


On Mon, May 4, 2020 at 11:11 AM Dawid Weiss <dawid.we...@gmail.com> wrote:

> Dear Lucene and Solr developers!
>
> A few days ago, I initiated a discussion among PMC members about
> potential pros and cons of splitting the project into separate Lucene
> and Solr entities by promoting Solr to its own top-level Apache
> project (TLP). Let me share with you the motivation for such an action
> and some follow-up thoughts I heard from other PMC members so far.
>
> Please read this e-mail carefully. Both the PMC and I look forward to
> hearing your opinion. This is a DISCUSS thread and it will be followed
> next week by a VOTE thread. This is our shared project and we should
> all shape its future responsibly.
>
> The big question is this: “Is this the right time to split Solr and
> Lucene into two independent projects?”.
>
> Here are several technical considerations that drove me to ask the
> question above (in no order of priorities):
>
> 1) Precommit/ test times. These are crazy high. If we split into two
> projects we can pretty much cut all of Lucene testing out of Solr (and
> likewise), making development a bit more fun again.
>
> 2) Build system itself and source release packaging. The current
> combined codebase is a *beast* to maintain. Working with gradle on
> both projects at once made me realise how little the two have in
> common. The code layout, the dependencies, even the workflow of people
> working on these projects... The build (both ant and gradle) is full
> of Solr and Lucene-specific exceptions and hooks that could be more
> elegantly solved if moved to each project independently.
>
> 3) Packaging. There is no single source distribution package for
> Solr+Lucene. They are already "independent" there. Why should Lucene
> and Solr always be released at the same pace? Does it always make
> sense?
>
> 4) Solr is essentially taking in Lucene and its dependencies as a
> whole (so is Elasticsearch and many other projects). In my opinion
> this makes Lucene eligible for refactoring and
> maintenance as a separate component. The learning curve for people
> coming to each project separately is going to be gentler than trying
> to dive into the combined codebase.
>
> 5) Mailing lists, build servers. Mailing lists for users are already
> separated. I think this is yet another indication that Solr is
> something more than a component within Lucene. It is perceived as an
> independent entity and used as an independent product. I would really
> like to have separate mailing lists for these two projects (this
> includes build and test results) as it would make life easier: if your
> focus is more on Lucene (or Solr), you would only need to track half
> of the current traffic.
>
>
> As I already mentioned, the discussion among PMC members highlighted
> some initial concerns and reasons why the project should perhaps
> remain glued together. These are outlined below with some of the
> counter-arguments presented under each concern to avoid repetition of
> the same content from the PMC mailing list (they’re copied from the
> private discussion list).
>
> 1) Both projects may gradually split their ways after the separation
> and even develop “against” each other like it used to be before the
> merge.
>
> Whether this is a legitimate concern is hard to tell. If Solr goes TLP
> then all existing Lucene committers will automatically become Solr
> committers (unless they opt not to) so there will be both procedural
> ways to prevent this from happening (vetoes) as well as common-sense
> reasons to just cooperate.
>
> 2) Some people like parallel version numbering (concurrent Solr and
> Lucene releases) as it gives instant clarity which Solr version uses
> which version of Lucene.
>
> This can still be done on Solr side (it is Solr’s decision to adopt
> any versioning scheme the project feels comfortable with). I
> personally (DW) think this kind of versioning is actually more
> confusing than helpful; Solr should have its own cadence of releases
> driven by features, not sub-component changes. If the “backwards
> compatibility” is a factor then a solution might be to sync on major
> version releases only (e.g., this is how Elasticsearch is handling
> this).
>
> 3) Solr tests are the first “battlefield” test zone for Lucene changes
> - if it becomes TLP this part will be gone.
>
> Yes, true. But realistically Solr will have to adopt some kind of
> snapshot-based dependency on Lucene anyway (whether as a git submodule
> or a maven snapshot dependency). So if there are bugs in Lucene they
> will still be detected by Solr tests (and fairly early).
>
> 4) Why split now if we merged in the first place?
>
> Some of you may wonder why split the project that was initially
> *merged* from two independent codebases (around 10 years ago). In
> short, there was a lot of code duplication and interaction between
> Solr and Lucene back then, with patches flying back and forth.
> Integration into a single codebase seemed like a great idea to clean
> things up and make things easier. In many ways this is exactly what
> did happen: we have cleaned up code dependencies and reusable
> components (on Lucene side) consumed by not just Solr but also other
> projects (downstream from Lucene).
>
> The situation we find ourselves in now is different to what it was
> before: recent and ongoing development for the most part falls within
> Solr or Lucene exclusively.
>
>
> This e-mail is for discussing the idea and presenting arguments/
> counter-arguments for or against the split. It will be followed by a
> separate VOTE thread e-mail next Monday. If the vote passes then there
> are many questions about how this process should be arranged and
> orchestrated. There are past examples even within Lucene [1] that we
> can learn from, and there are people who know how to do it - the
> actual process is of lesser concern at the moment, what we mostly want
> to do is to reach out to you, signal the idea and ask about your
> opinion. Let us know what you think.
>
> [1]
> https://lists.apache.org/thread.html/15bf2dc6d6ccd25459f8a43f0122751eedd3834caa31705f790844d7%401270142638%40%3Cuser.nutch.apache.org%3E
>

-- 
Adrien
