Hi Karl,

I put together a quick alpha version of a Solr 9 connector as a test, and I can confirm that it works with older Solr versions!

HOWEVER, in SolrJ 9 the Solr client has been reimplemented, and it no longer allows easy customization of the httpClient or of the way it performs requests. This makes it very challenging, at least for me, to port all of the custom code from the current Solr connector for the multipart POST requests, as well as for the basic and preemptive auth! Who knows, maybe with the new SolrJ client that custom code has become unnecessary and multipart/basic/preemptive auth now work OOTB... Unfortunately, I don't have time to test whether those features work OOTB, and I don't have a test environment to try them in anyway. Maybe the MCF committers behind these Solr-related updates could take a look if I commit a final version of the connector on a dedicated branch?
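
For context, "preemptive" basic auth just means sending the Authorization header on the very first request instead of waiting for a 401 challenge from the server. Below is a minimal sketch of that idea using only the JDK's java.net.http API. It is not SolrJ code and not the connector's actual code; the class name, URL and credentials are placeholders made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only (not part of ManifoldCF or SolrJ).
class PreemptiveBasicAuth {

    // Builds the value of the Authorization header for HTTP Basic auth:
    // "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Attaches the header when the request is built, so the credentials are
    // sent with the first request rather than after a 401 challenge.
    static HttpRequest buildRequest(String url, String user, String password) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL and credentials; nothing is sent over the network here.
        HttpRequest request = buildRequest(
                "http://localhost:8983/solr/admin/info/system", "user", "pass");
        System.out.println(request.headers().firstValue("Authorization").orElse("<none>"));
    }
}
```

Whether the new SolrJ 9 client does the equivalent out of the box is exactly the open question above; this only shows what the behavior amounts to at the HTTP level.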

Julien

On 29/11/2022 22:35, Karl Wright wrote:
Hi Julien,

Sorry for the delay; I've been under intense pressure at work of late and
just saw this email now.

Regarding library updates: we should generally go ahead and do those
FIRST.  There are custom fixes for httpclient checked into the ManifoldCF
code base so we may need to work a little to get those to build properly.
But I'm reasonably sure it can be done.  Libraries are backwards compatible
at the minor version level so all is good there.  When somebody wants to go
to HttpClient 5, though, we are in trouble.
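
The rule of thumb above can be sketched as a small check that classifies each proposed upgrade by the first version component that changes. This is only an illustration: it assumes plain numeric major.minor.patch version strings and a semantic-versioning reading of "minor version level", and the class and method names are hypothetical. The version numbers are the ones from Julien's list quoted below.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class UpgradeCheck {

    enum Level { SAME, PATCH, MINOR, MAJOR }

    // Parses "major.minor.patch" into three ints; missing parts count as 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] nums = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    // Reports the most significant version component that differs.
    static Level classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return Level.MAJOR;
        if (a[1] != b[1]) return Level.MINOR;
        if (a[2] != b[2]) return Level.PATCH;
        return Level.SAME;
    }

    public static void main(String[] args) {
        // Library name -> {current version, proposed version}.
        Map<String, String[]> upgrades = new LinkedHashMap<>();
        upgrades.put("zookeeper", new String[] {"3.4.10", "3.8.0"});
        upgrades.put("httpclient", new String[] {"4.5.3", "4.5.13"});
        upgrades.put("httpcore", new String[] {"4.4.6", "4.4.15"});
        upgrades.put("httpmime", new String[] {"4.5.3", "4.5.13"});
        for (Map.Entry<String, String[]> e : upgrades.entrySet()) {
            String[] v = e.getValue();
            System.out.println(e.getKey() + " " + v[0] + " -> " + v[1]
                    + ": " + classify(v[0], v[1]));
        }
    }
}
```

Under this reading, the three httpcomponents upgrades are patch-level and the zookeeper one is minor-level, while a move to HttpClient 5 would be classified as MAJOR.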

AFTER that is done we should evaluate whether the 9.x Solr library is
backwards compatible enough with 8.x to work.  We had to do very little to
go from 7.x to 8.x, so unless the Solr people suddenly changed their
philosophy dramatically, it should be possible to do this too.  But we will
see.

Karl


On Tue, Nov 29, 2022 at 9:59 AM Julien Massiera <
julien.massi...@francelabs.com> wrote:

Hi Karl,

the Solr output connector does not seem to work with Solr 9.x according
to our tests. We are going to either update or develop a new connector
but there is a problem concerning the libraries required. A solr 9.x
connector will of course involve a solrj 9.x lib but also the update of
the following libs in MCF:

- zookeeper from 3.4.10 to >= 3.7.0 (current 3.8.0)
- httpcomponent.httpclient.version from 4.5.3 to 4.5.13
- httpcomponent.httpcore.version from 4.4.6 to 4.4.15
- httpcomponent.httpmime.version from 4.5.3 to 4.5.13

Those updates should not cause problems for other connectors in MCF. The
real problem concerns the current Solr connector, as I am not sure
that an updated version would be compatible with Solr < 9.x.
There are also the modified Solr clients using the custom multipart
HTTP POST methods, which in my opinion will be troublesome to port to
SolrJ 9.x.

If I am not wrong, historically those custom clients were developed to
avoid errors with the embedded Tika of Solr for some documents. But
IMHO, it has become a challenge that is not worth the effort: the way to
go should be to have the documents processed by Tika BEFORE they are
indexed in Solr. Not to mention that the Tika embedded in Solr is too old
(1.28.1) and will most certainly be removed someday (as stated in this
ticket: https://issues.apache.org/jira/browse/SOLR-13973). Thus, I think
it is not worth porting the custom Solr clients to the new connector.
This would ease the creation of the Solr 9 output connector.

Whatever happens, if we want to maintain output connectors for different
versions of Solr, and IF the Solr 9 output connector is not compatible
with previous versions of Solr (still needs to be checked), we'll end up
with several versions of the libs in ManifoldCF. To be honest, I do not
see a proper way to deal with the library conflicts between the two
connectors...

What do you think?

Regards,
Julien

--
Julien MASSIERA
Directeur développement produit
France Labs – Les experts du Search
Datafari – Vainqueur du trophée Big Data 2018 au Digital Innovation Makers Summit
www.francelabs.com
