> I'd really like to get rid of the "TIKA runs in Solr" process stuff
+1

>I think we can improve the REST endpoint so Solr only needs to pass the 
>documents once to get both metadata and text contents (HTML).

The "new" /rmeta endpoint accomplishes this. The output is a JSON list of 
metadata objects, with the content stored as a value under X-TIKA-Content (or 
similar).  The primary document is the first metadata object in the list, and 
each attachment is another metadata object in the list.  (See SOLR-7229.)
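
To make the shape concrete, here is a minimal Python sketch of how a client 
might consume that list in one pass. It is an illustration under assumptions, 
not the Solr or Tika implementation: split_rmeta and the sample payload are 
hypothetical, and since the exact name of the content key is not pinned down 
above ("X-TIKA-Content (or similar)"), it is passed in as a parameter with an 
assumed default.

```python
import json


def split_rmeta(rmeta_json, content_key="X-TIKA:content"):
    """Split /rmeta-style output into (primary, attachments).

    content_key is an assumption; pass whatever key the server really uses.
    """
    records = json.loads(rmeta_json)
    if not isinstance(records, list) or not records:
        raise ValueError("expected a non-empty JSON list of metadata objects")

    docs = []
    for record in records:
        metadata = dict(record)  # copy so the parsed input is not mutated
        content = metadata.pop(content_key, None)  # None if no content stored
        docs.append({"content": content, "metadata": metadata})

    # First object is the primary document; the rest are attachments.
    return docs[0], docs[1:]


# Hypothetical payload: one container document with one attachment.
sample = json.dumps([
    {"Content-Type": "message/rfc822", "X-TIKA:content": "main body text"},
    {"Content-Type": "application/pdf", "X-TIKA:content": "attachment text"},
])

primary, attachments = split_rmeta(sample)
print(primary["content"])
print(len(attachments))
```

The point of the single list is that text and metadata for the container and 
every attachment arrive in one response, so the document is only sent once.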

Whether Solr goes this route or not, we need to harden the server to handle OOM 
and permanent hangs robustly.



-----Original Message-----
From: Uwe Schindler [mailto:[email protected]] 
Sent: Friday, June 17, 2016 11:20 AM
To: [email protected]
Subject: RE: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when it is 
available

Hi,

I'd really like to get rid of the "TIKA runs in Solr" process stuff. I think we 
can improve the REST endpoint so Solr only needs to pass the documents once to 
get both metadata and text contents (HTML). Currently it would need to pass 2 
times, which is lots of overhead.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Allison, Timothy B. [mailto:[email protected]]
> Sent: Friday, June 17, 2016 5:04 PM
> To: [email protected]
> Subject: RE: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when 
> it is available
> 
> Thank _you_ Lewis!  Yes, they're shipping with 1.7 currently.
> 
> What would be great is if we could help them isolate Tika in case of 
> catastrophic failures via a hardened tika-server and SOLR-7632, perhaps?
> 
> Erik Hatcher's recommendation of a Tika pass-through to Solr would 
> also be great to add...CORS? or perhaps allow users to configure an 
> endpoint ...then we could contribute a Solr handler that includes most 
> of the current DIH functionality.
> 
> 
> 
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Friday, June 17, 2016 10:47 AM
> To: [email protected]
> Subject: Fwd: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when 
> it is available
> 
> Hi Folks,
> Pretty cool news. It seems like Tim and I managed to win the hearts 
> and minds of Uwe and Solr devs. Solr 6.2 will run with Tika 1.13. 
> Hopefully this sets a precedent for us breaking down the barriers 
> which have meant that until now Tika has been upgraded sparingly in Solr.
> Nice work Tim.
> Lewis
> 
> ---------- Forwarded message ----------
> From: *Uwe Schindler (JIRA)* <[email protected]>
> Date: Friday, June 17, 2016
> Subject: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when it 
> is available
> To: [email protected]
> 
> 
> 
>     [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336224#comment-15336224 ]
> 
> Uwe Schindler commented on SOLR-8981:
> -------------------------------------
> 
> Thanks! I will merge the PR later this evening! For 6.1 it is now too 
> late, but 6.2 will have this :-)
> 
> Uwe
> 
> > Upgrade to Tika 1.13 when it is available
> > -----------------------------------------
> >
> >                 Key: SOLR-8981
> >                 URL: https://issues.apache.org/jira/browse/SOLR-8981
> >             Project: Solr
> >          Issue Type: Improvement
> >            Reporter: Tim Allison
> >            Assignee: Uwe Schindler
> >            Priority: Minor
> >
> > Tika 1.13 should be out within a month.  This includes PDFBox 2.0.0 
> > and a number of other upgrades and improvements.
> > If there are any showstoppers in 1.13 from Solr's side or requests 
> > before we roll 1.13, let us know.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
> 
> 
> 
> --
> *Lewis*
