I've committed a hack to trunk. It has been tested for Solr Cell documents, deletions, and for Tika-connector-extracted documents that don't have a lot of metadata. I'm asking Julien to test it with his specific image that has lots of metadata to see if the pathway for that case works properly. If it does, I'll spin another RC.
Long term, since I'm a Lucene/Solr committer, I think I'm going to have to take SolrJ under my wing if we expect it to work for ManifoldCF. I don't have a lot of time to do stuff like this anymore but clearly neither does the Solr team.

Karl

On Tue, Sep 25, 2018 at 6:14 AM Karl Wright <[email protected]> wrote:

> The back-and-forth is not going well. Mr. Noble is needing to be
> convinced that it is a valid use case for Solr to have metadata longer than
> 4096 characters. In fact it seems like the Solr folks have deliberately
> been trying to get rid of support for multipart posts for a while, because
> they don't see the need for them. I'm still hoping to convince them
> otherwise but I'm not getting a positive feel.
>
> I'm still trying to figure out if multipart posts have any fundamental
> conflict with their RequestWriter architecture. If not I can perhaps
> override the RequestWriter implementation and add multipart support that
> way. But it's not going to be a quick process by any means.
>
>
> On Mon, Sep 24, 2018 at 12:13 PM Karl Wright <[email protected]> wrote:
>
>> Hi Julien,
>>
>> This has nothing to do with the new Tika.
>>
>> It is not normal; it means that UpdateRequests are not being sent as
>> multipart form posts. It's going to require work from the Solr team to fix
>> this problem, however, because everything I do to work around the issue
>> nonetheless seems to fail. :-(
>>
>> I'm having a back-and-forth with Paul Noble right now. I'll update
>> accordingly when I know more.
>>
>> Karl
>>
>>
>> On Mon, Sep 24, 2018 at 11:33 AM Julien Massiera <[email protected]> wrote:
>>
>>> After testing it, it is a +1 for me
>>>
>>> However, I found a new interesting issue coming with the new Tika
>>> version. I had a jpg file for which some metadata were not extracted
>>> before, like the RedTRC, BlueTRC and GreenTRC which contain
>>> approximately 2048 bytes of data each. As the metadata are passed to
>>> Solr through the URI, I get the following error: URI is too large >8192
>>>
>>> Do we consider it as a "normal issue" or is it worth checking the
>>> metadata length before sending the ingest request?
>>>
>>>
>>> On 24/09/2018 16:43, Karl Wright wrote:
>>> > Please vote on whether to release ManifoldCF 2.11, RC3. This release
>>> > contains a number of fixes/improvements/additions, described in the
>>> > CHANGES.txt file. In addition, it includes Tika 1.19, which has a number
>>> > of fixes for classpath issues specifically requested by ManifoldCF.
>>> >
>>> > This completely fixes a SolrJ-related problem with the Solr Connector found
>>> > in RC3. All tests pass.
>>> >
>>> > The release artifact can be found at:
>>> >
>>> > https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
>>> >
>>> > There is also a tag at:
>>> >
>>> > https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC3
>>> >
>>> > Thanks again,
>>> > Karl Wright
>>> >
>>>
>>> --
>>> Julien MASSIERA
>>> Product Development Director
>>> France Labs – The Search Experts
>>> Join us at the Enterprise Search & Discovery Summit in Washington DC
>>> www.francelabs.com
>>>
>>>
