davido commented on issue #12307:
URL: https://github.com/apache/lucene/issues/12307#issuecomment-1807097946
@uschindler
Thank you for clarifying and for the links to the specifications.
What confused me is that the problem showed up after two changes at once:
using JDK 21 at runtime and updating to Lucene 9.8.0.
Apparently, starting with Lucene 9.x, the JARs are Multi-Release JAR (MR-JAR) files:
```bash
davido@wizball:~/projects/gerrit/junk/WEB-INF/lib (jdk_21_support %)$ unzip -t lucene-core-9.8.0.jar | grep MemorySegmentIndexInputProvider
testing: META-INF/versions/19/org/apache/lucene/store/MemorySegmentIndexInputProvider.class OK
testing: META-INF/versions/20/org/apache/lucene/store/MemorySegmentIndexInputProvider.class OK
testing: META-INF/versions/21/org/apache/lucene/store/MemorySegmentIndexInputProvider.class OK
```
Clearly, because Bazel currently has no MR-JAR support [1], merging
`lucene-core.jar` and `lucene-backward-codecs.jar` breaks the MR-JAR
format, which explains the `NoClassDefFoundError` we are seeing.
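To check whether a merged JAR is still a valid MR-JAR, two things matter as far as I understand the JAR specification: the main section of `META-INF/MANIFEST.MF` must contain `Multi-Release: true`, and the versioned classes must still be under `META-INF/versions/N/`. A minimal Python sketch that reports both. The function name `inspect_mr_jar` is made up for illustration, and the manifest parsing is deliberately simplified:

```python
import re
import zipfile

# Matches entries such as META-INF/versions/21/org/apache/lucene/store/Foo.class
VERSIONED = re.compile(r"^META-INF/versions/(\d+)/(.+)$")


def inspect_mr_jar(jar):
    """Return (declares_multi_release, {java_version: [entry names]}).

    `jar` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
        declared = False
        if "META-INF/MANIFEST.MF" in names:
            manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
            # The main attributes end at the first blank line.
            main_section = re.split(r"\r?\n\r?\n", manifest, maxsplit=1)[0]
            declared = any(
                line.strip().lower() == "multi-release: true"
                for line in main_section.splitlines()
            )
        versions = {}
        for name in names:
            match = VERSIONED.match(name)
            if match and not name.endswith("/"):
                versions.setdefault(int(match.group(1)), []).append(match.group(2))
    return declared, versions
```

If the first value is `False` while the second is non-empty, the versioned classes are present in the archive but the runtime will not treat the JAR as multi-release.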
The Gerrit Code Review project started merging `lucene-core` and
`backward-codecs` 8 years ago so that it could "understand" index formats
created by older Lucene releases, and a Gerrit site could be reindexed by a
new Gerrit release that ships a new Lucene version. The commit message [2]
gave this explanation:
```
Merge Lucene core and backward-codecs jars

Both of these jars provide a provider-configuration file in
META-INF/services/org.apache.lucene.codecs.Codec registering their
respective implementations as providers of this codec. The proper way
to merge these files is to concatenate them, but the normal Buck build
process would otherwise choose one arbitrarily.

Add a new custom rule merge_maven_jars to merge multiple Maven jars
together using a simple Python script. The script concatenates all the
entries in two zip files, preferring the entry found in the first file
on the command line, which is still arbitrary but at least
deterministic. It specially handles files in the META-INF/services
directory by concatenating them.

Use this new rule to merge the old :core and :backward-codecs rules
into a single :core-and-backward-codecs rule.
```
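For illustration, the merge strategy described in that commit message can be sketched roughly as follows. This is not the actual Gerrit script, only a reconstruction from the description above; the name `merge_jars` and the decision to skip directory entries are my assumptions:

```python
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def merge_jars(inputs, output):
    """Merge JARs: the first occurrence of an entry wins, except for
    provider-configuration files, which are concatenated in input order."""
    seen = set()
    services = {}  # entry name -> list of file contents (bytes)
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as out:
        for jar in inputs:
            with zipfile.ZipFile(jar) as zf:
                for info in zf.infolist():
                    name = info.filename
                    if info.is_dir():
                        continue  # assumption: directory entries are not needed
                    if name.startswith(SERVICES_PREFIX):
                        services.setdefault(name, []).append(zf.read(info))
                        continue
                    if name in seen:
                        continue  # prefer the entry from the earlier JAR
                    seen.add(name)
                    out.writestr(info, zf.read(info))
        for name, chunks in services.items():
            # Make sure every chunk ends with a newline before joining.
            merged = b"".join(c if c.endswith(b"\n") else c + b"\n" for c in chunks)
            out.writestr(name, merged)
```

Note that in this sketch `META-INF/MANIFEST.MF` is handled like any other entry, so only the manifest of the first JAR on the command line ends up in the merged file.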
In fact, this is still true for Lucene 9.8.0, where
`lucene-backward-codecs/META-INF/services/org.apache.lucene.codecs.Codec` contains:
```
org.apache.lucene.backward_codecs.lucene80.Lucene80Codec
org.apache.lucene.backward_codecs.lucene84.Lucene84Codec
org.apache.lucene.backward_codecs.lucene86.Lucene86Codec
org.apache.lucene.backward_codecs.lucene87.Lucene87Codec
org.apache.lucene.backward_codecs.lucene70.Lucene70Codec
org.apache.lucene.backward_codecs.lucene90.Lucene90Codec
org.apache.lucene.backward_codecs.lucene91.Lucene91Codec
org.apache.lucene.backward_codecs.lucene92.Lucene92Codec
org.apache.lucene.backward_codecs.lucene94.Lucene94Codec
```
and `lucene-core/META-INF/services/org.apache.lucene.codecs.Codec` contains:
```
org.apache.lucene.codecs.lucene95.Lucene95Codec
```
However, the better question is: why do those files need to be merged at all?
To avoid messing with the MR-JAR file format, wouldn't it be sufficient to just
put `lucene-backward-codecs` and `lucene-core` AS-IS on the classpath?
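My understanding is that Java's `ServiceLoader` reads every provider-configuration file with the same name that is visible on the classpath, not just the first one. A rough Python sketch that mimics this lookup across separate JARs, to see which codecs would be registered without merging. The helper name `list_providers` is hypothetical, and this only approximates the real lookup:

```python
import zipfile


def list_providers(jars, service):
    """Collect provider class names for `service` from every JAR, in order."""
    entry = "META-INF/services/" + service
    providers = []
    for jar in jars:
        with zipfile.ZipFile(jar) as zf:
            if entry not in zf.namelist():
                continue
            for line in zf.read(entry).decode("utf-8").splitlines():
                # Provider-configuration files allow '#' comments and blank lines.
                name = line.split("#", 1)[0].strip()
                if name and name not in providers:
                    providers.append(name)
    return providers
```

Calling it with `["lucene-core-9.8.0.jar", "lucene-backward-codecs-9.8.0.jar"]` and `"org.apache.lucene.codecs.Codec"` should list the codecs from both files shown above.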
So I stopped merging the JARs and kept the original JARs. Now the tests
pass, and with the latest Lucene release 9.8.0 I was able to reindex a Gerrit
site whose index was created with the previous Lucene release 8.11.2:
```
davido@wizball:~/projects/gerrit (jdk_21_support %)$ unzip -t bazel-bin/gerrit.war | grep lucene | grep 9.8.0
testing: WEB-INF/lib/lucene-core-9.8.0.jar OK
testing: WEB-INF/lib/lucene-backward-codecs-9.8.0.jar OK
testing: WEB-INF/lib/lucene-queryparser-9.8.0.jar OK
testing: WEB-INF/lib/lucene-analysis-common-9.8.0.jar OK
testing: WEB-INF/lib/lucene-misc-9.8.0.jar OK
```
Am I understanding correctly that with recent Lucene releases, merging the
`lucene-backward-codecs` and `lucene-core` JARs is no longer necessary?
[1] https://github.com/bazelbuild/bazel/issues/5947
[2] https://gerrit-review.googlesource.com/c/gerrit/+/69850