> I'd really like to get rid of the "TIKA runs in Solr" process stuff

+1

> I think we can improve the REST endpoint so Solr only needs to pass the
> documents once to get both metadata and text contents (HTML).

The "new" /rmeta endpoint accomplishes this. The output is a JSON list of metadata objects, with the content stored as a value in X-TIKA-Content (or similar). The primary document is the first metadata object in the list, and each attachment is another metadata object in the list. (See SOLR-7229.)

Whether Solr goes this route or not, we need to harden the server to handle OOM and permanent hangs robustly.

-----Original Message-----
From: Uwe Schindler [mailto:[email protected]]
Sent: Friday, June 17, 2016 11:20 AM
To: [email protected]
Subject: RE: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when it is available

Hi,

I'd really like to get rid of the "TIKA runs in Solr" process stuff. I think we can improve the REST endpoint so Solr only needs to pass the documents once to get both metadata and text contents (HTML). Currently it would need to pass 2 times, which is lots of overhead.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Allison, Timothy B. [mailto:[email protected]]
> Sent: Friday, June 17, 2016 5:04 PM
> To: [email protected]
> Subject: RE: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when
> it is available
>
> Thank _you_ Lewis! Y, they're shipping with 1.7 currently.
>
> What would be great is if we could help them isolate Tika in case of
> catastrophic failures via a hardened tika-server and SOLR-7632, perhaps?
>
> Erik Hatcher's recommendation of a Tika pass-through to Solr would
> also be great to add...CORS? or perhaps allow users to configure an
> endpoint ...then we could contribute a Solr handler that includes most
> of the current DIH functionality.
>
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Friday, June 17, 2016 10:47 AM
> To: [email protected]
> Subject: Fwd: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when
> it is available
>
> Hi Folks,
> Pretty cool news. It seems like Tim and I managed to win the hearts
> and minds of Uwe and Solr devs. Solr 6.2 will run with Tika 1.13.
> Hopefully this sets a precedent for us breaking down the barriers
> which have meant that until now Tika has been upgraded sparingly in Solr.
> Nice work Tim.
> Lewis
>
> ---------- Forwarded message ----------
> From: *Uwe Schindler (JIRA)* <[email protected]>
> Date: Friday, June 17, 2016
> Subject: [jira] [Commented] (SOLR-8981) Upgrade to Tika 1.13 when it
> is available
> To: [email protected]
>
> [
> https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336224#comment-15336224
> ]
>
> Uwe Schindler commented on SOLR-8981:
> -------------------------------------
>
> Thanks! I will merge the PR later this evening! For 6.1 it is now too
> late, but 6.2 will have this :-)
>
> Uwe
>
> > Upgrade to Tika 1.13 when it is available
> > -----------------------------------------
> >
> > Key: SOLR-8981
> > URL: https://issues.apache.org/jira/browse/SOLR-8981
> > Project: Solr
> > Issue Type: Improvement
> > Reporter: Tim Allison
> > Assignee: Uwe Schindler
> > Priority: Minor
> >
> > Tika 1.13 should be out within a month. This includes PDFBox 2.0.0
> > and a number of other upgrades and improvements.
> > If there are any showstoppers in 1.13 from Solr's side or requests
> > before we roll 1.13, let us know.
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> --
> *Lewis*
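
For reference, a rough sketch of how a client might consume the /rmeta output described at the top of this thread: a JSON list of metadata objects, the first being the primary document and the rest its attachments. The function name `split_rmeta`, the sample payload, and the default content key are assumptions for illustration only. The thread says just "X-TIKA-Content (or similar)", so check what your tika-server actually emits and pass that key in.

```python
import json

# Assumption: the thread only says "X-TIKA-Content (or similar)".
# "X-TIKA:content" is used here as a default; override it to match your server.
DEFAULT_CONTENT_KEY = "X-TIKA:content"


def split_rmeta(payload, content_key=DEFAULT_CONTENT_KEY):
    """Split an /rmeta-style JSON list into (primary, attachments).

    Each returned record is {"metadata": {...}, "content": "..."}; the
    extracted text is pulled out of the metadata object so the two can be
    indexed separately after a single pass over the document.
    """
    docs = json.loads(payload)
    if not isinstance(docs, list) or not docs:
        raise ValueError("expected a non-empty JSON list of metadata objects")

    def unpack(meta):
        meta = dict(meta)  # copy so the parsed input is left untouched
        text = meta.pop(content_key, "")
        return {"metadata": meta, "content": text}

    records = [unpack(m) for m in docs]
    return records[0], records[1:]


# Hypothetical payload shaped like the description above (not real server output).
sample = json.dumps([
    {"Content-Type": "message/rfc822", "X-TIKA:content": "<html>body</html>"},
    {"Content-Type": "application/pdf", "X-TIKA:content": "<html>attachment</html>"},
])

primary, attachments = split_rmeta(sample)
print(primary["metadata"]["Content-Type"], len(attachments))
```

The point of the single list is that the caller gets metadata and text for the container and every attachment from one request, rather than sending the document twice.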
