Hi,

 

I am currently on Easter vacation, but just wanted to give some feedback:

They wy you are doing it works at the moment, because you don’t have anything 
to do with service providers. You don’t use custom codecs (outside 
lucene-core.jar) and you just ignored the META-IF/services folder. Your example 
works, as Lucene finds everything that is needed in the core.jar, so the 
“legacy” service loading works (lucene-core.jar find its own services in 
lucene-core.jar). As soon as you would add lucene-backwards-codecs.jar, it 
would break unfixable, unless you copy the whole backwards codecs into core. 
This is completely unfixable in Lucene 8.x, as the new service-loader interface 
requires Java 9+.

 

But you may not yet have noticed that I already started to do preparatory work 
and migrated in master (Lucene 9.0) the service provider to use the Java 
Runtime classes and we have thrown away our own service loader (which cannot 
cross module boundaries). The fix was merged not long ago:  
<https://issues.apache.org/jira/browse/LUCENE-9281> 
https://issues.apache.org/jira/browse/LUCENE-9281 - this cannot be backported 
to Java 8 / Lucene 8, as the reason for the home-made SPIClassIterator was 
exactly the missing features (and some classpath ordering issues in 3rd party 
JVMs like IBM J9). Those issues are fixed in Java 11 (also allowing to 
instantiate the classes on your own). When you are testing your “hack patch” 
with Lucene’s master branch you may succeed with also loading classes from the 
backwards codecs – no guarantees yet!)

 

About StandardAnalyzer: Unfortunately I aggressively complained a while back 
when Mike McCandless wanted to move standard analyzer out of the analysis 
package into core (“for convenience”). This was a bad step, and IMHO we should 
revert that or completely rename the packages and everything. The problem here 
is: As the analysis services are only part of lucene-analyzers, we had to leave 
the factory classes there, but move the implementation classes in core. The 
package has to be the same. The only way around that is to move the analysis 
factory framework also to core (I would not be against that). This would 
include all factory base classes and the service loading stuff. Then we can 
move standard analyzer and some of the filters/tokenizers including their 
factories to core an that problem would be solved.

 

My plan to move to modules is the following:

 

(1) Do preparatory work:

*       Retire SPIClassIterator (DONE).
*       Add some preparatory issues to cleanup class hierarchy: Move Analysis 
SPI to core / remove StandardAnalyzer and related classes out of core back to 
anaysis
*       Fix Codec API to no rely on package-private classes, so we can have a 
completely public API with abstract classes for codecs, so stuff in 
backwards-codecs does not need to have access to package private stuff in core.
*       Cleanup sandbox to prefix all classes there with “sandbox” package and 
where needed remove package-private access. If it’s needed for internal access, 
WTF: Just move the stuff to core! We have a new version 9.0, so either 
retire/delete Sandbox stuff or make it part of Lucene core.

(2) Wait for the Gradle Build to be finalized, because including Module stuff 
into the current Ant build won’t work

*       Ant build must be retired!

(3) Make Lucene real modules

*       As we are on Java 11 already, add module-info.java everywhere
*       Fix gradle build to create and test modules (Latest Gradle needed)
*       Migrate all META-INF/services/* to module-info.java (before doing this, 
figure out of the META-INF files must stays for non-module usage, or if Java is 
clever enough to also look into module descriptor to find services). We may 
need all services at both locations (for module or classpath usage; we need a 
build helper to check that it’s in-line)

 

I don’t want Lucene work with automodules in Java 11, it should be fully 
modularized.

 

If you want to help and express interest: Please open an issue for the 
preparatory work listing the cases of same-package in different jar files. I 
would split this up as described above: Analysis issues with standardanalyzer, 
codecs pkg-private apis, sandbox.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: [email protected]

 

From: David Ryan <[email protected]> 
Sent: Saturday, April 11, 2020 9:30 AM
To: [email protected]
Subject: Re: Lucene 9.0 Java module system support

 

 

Thanks, I had probably missed some of what Uwe was saying in regards to the 
limitation of what would be possible even if the suggested changes were made. 
Given your points and the fact that it would take a while to have any of the 
changes filter into a release version, I decided to develop a patch-lucene.sh 
script to validate if the changes would allow me to access the required parts 
of Lucene.  While it may not provide the full functionality provided by SPI 
including analysis chains and codecs, I worked out it will allow the use of the 
basics.

 

The patch script below adds Automatic module names to the four Lucene libraries 
I needed (not strictly required, but I was interested to check if it would 
work).  For now, any duplicate packages I identified have been moved into the 
core library (as identified in suggested changes). Given, I'm not using the SPI 
functionality, I simply deleted the standard analysers from the services list 
(required or the module system complains they are missing).  Thankfully Gradle 
doesn't validate jars in the cache so the patch script only needs to be run 
once.

 

Once the patch script was applied, I was able to use Ngram analysers and the 
query parser without issue. This will provide the essentials of what we needed 
for an embedded solution (location indexing, short text indexing with ngrams 
tokenizer and lower case filter, searching by location with nearest, bounding 
box and radius search, and partial text searches using the query parser). Of 
course, I've only scratched the surface of the APIs and right now it isn't 
clear if/when the SPI functionality might cause further issues or be required.  

 

So, I've validated that the suggested changes would allow basic functionality 
to work under the Java module system. If there's anything else I can do to help 
progress the suggested changes please let me know.

 

Regards,

David.

 

 

--------- patch-lucene.sh -- works on osx ------

mkdir -p patch-lucene/core
mkdir -p patch-lucene/analyzers
mkdir -p patch-lucene/sandbox
mkdir -p patch-lucene/queryparser
cd patch-lucene

# copy the jars from the gradle cache.
cp 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-core/8.5.0/3f9ea85fff4fc3f7c83869dddb9b0ef7818c0cae/lucene-core-8.5.0.jar
 .
cp 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-analyzers-common/8.5.0/7156f2e545fd6e7faaee4781d15eb60cf5f07646/lucene-analyzers-common-8.5.0.jar
 .
cp 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-sandbox/8.5.0/2b275921f2fd92b15b4f1a2a565467c3fa221ef9/lucene-sandbox-8.5.0.jar
 .
cp 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-queryparser/8.5.0/13c38f39b1a7d10c4749ba789fa95da5868d4885/lucene-queryparser-8.5.0.jar
 .

# Add automatic module name to core.
cd core
jar -xf ../lucene-core-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: 
org.apache.lucene.core'$'\n' META-INF/MANIFEST.MF

# Add automiatc module name, move standard analysis classes into core. Remove 
any classes in standard from service lists. 
cd ../analyzers
jar -xf ../lucene-analyzers-common-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: 
org.apache.lucene.analyzers.common'$'\n' META-INF/MANIFEST.MF
sed -i '' -e '/standard/d' META-INF/services/*
mv org/apache/lucene/analysis/standard/* 
../core/org/apache/lucene/analysis/standard
rmdir org/apache/lucene/analysis/standard
jar -cfm ../lucene-analyzers-common-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Add automatic module name, move search and document packages into core. move 
Java 9 specific search into core. 
cd ../sandbox
jar -xf ../lucene-sandbox-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: 
org.apache.lucene.sandbox'$'\n' META-INF/MANIFEST.MF
mv org/apache/lucene/search/* ../core/org/apache/lucene/search
rmdir org/apache/lucene/search
mv org/apache/lucene/document/* ../core/org/apache/lucene/document
rmdir org/apache/lucene/document
mv META-INF/versions/9/org/apache/lucene/search/* 
../core/META-INF/versions/9/org/apache/lucene/search
rm -rf META-INF/versions
jar -cfm ../lucene-sandbox-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Package up core with changes.
cd ../core
jar -cfm ../lucene-core-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Add automatic module name.
cd ../queryparser
jar -xf ../lucene-queryparser-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: 
org.apache.lucene.queryparser'$'\n' META-INF/MANIFEST.MF
jar -cfm ../lucene-queryparser-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Copy the fixed versions back into gradle cache.
cd ..
cp lucene-core-8.5.0-fix.jar 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-core/8.5.0/3f9ea85fff4fc3f7c83869dddb9b0ef7818c0cae/lucene-core-8.5.0.jar
cp lucene-analyzers-common-8.5.0-fix.jar 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-analyzers-common/8.5.0/7156f2e545fd6e7faaee4781d15eb60cf5f07646/lucene-analyzers-common-8.5.0.jar
cp lucene-sandbox-8.5.0-fix.jar 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-sandbox/8.5.0/2b275921f2fd92b15b4f1a2a565467c3fa221ef9/lucene-sandbox-8.5.0.jar
cp lucene-queryparser-8.5.0-fix.jar 
~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-queryparser/8.5.0/13c38f39b1a7d10c4749ba789fa95da5868d4885/lucene-queryparser-8.5.0.jar

------------

 

On Sat, Apr 11, 2020 at 11:21 AM Chris Hostetter <[email protected] 
<mailto:[email protected]> > wrote:


: If the changes I proposed are still viewed as having too many downstream
: impacts, my fallback position will be to patch the jars. This involves
: using the gradle import system to get the jars from Maven, then using a
: patch script to manually unzip the jars, move the offending classes into
: other jars which share the same package name and rezip. So far, I've been

I'm no expert here, but i trust that Uwe is, and i feel like your followup 
questions/suggests have still avoided his primary point about *why* 
Lucene/Solr hasn't attempted jigsaw modulariation...

: >>> There is currently some preparatory things to move forward with modules,
: >>> so although you might be able to actually compile Lucene with module 
system
: >>> (by limiting to a subset of JAR files), it currently won’t work
: >>> cross-module due to the way how it handles ServiceLoader interfaces
: >>> (codecs, postings formats, analyzers, see
: >>> https://issues.apache.org/jira/browse/LUCENE-9281). The only way to
: >>> make it work at runtime is to put all of Lucene into one module.

...so, IIUC, even if you patch the *current* jars, any Lucene code you use 
that depends on SPI (like analysis chains, codecs, etc...) isn't going to 
work unless follow Uwe's primary suggestion for folks who care about 
modules...

: >>> Th general recommendation is to combine all required Lucene libraries
: >>> into a separate JAR file during the maven / gradle build (e.g. using the
: >>> Maven Shade plugin). Keep in mind that Lucene is also not suitable for use


-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] 
<mailto:[email protected]> 
For additional commands, e-mail: [email protected] 
<mailto:[email protected]> 

Reply via email to