As long as we have a switch and a warning (and pointer to CVE URL with that warning), I’m +1 to re-enable it.

On 9/14/16, 4:40 AM, "Nick Burch" <apa...@gagravarr.org> wrote:

On Wed, 14 Sep 2016, Allison, Timothy B. wrote:
> Would it be as much of a disaster to require the user to allow the
> fileUrl capability on the commandline at server startup? We could add
> some menacing "all bets are off, we hope you know what you're doing"
> warning.

With a special switch, and a warning, enabling file:/// again wouldn't be
too bad in my view. I'm not sure about arbitrary URLs though - there's the
security + dos stuff, plus the fact that we won't be doing robots checking
/ niceness / etc. For anyone doing remote URLs, I think they do need to be
using a proper + safe + server-friendly crawler, then passing the result
of a successful fetch to the Tika server

>> My main concern in accessing the Tika libraries via TikaJAXRS is the
>> performance overheads associated with going through sockets (and
>> possible the additional memory/file copying of file data if fileUrl is
>> not available).
>
> In my experience, depending on the file types, y, there's definitely
> some overhead, but the bottleneck is in the parsers (esp for complex
> document formats -- msoffice, pdf, etc), not data sloshing.

I agree - for almost all formats, the slow bit isn't byte shuffling it's
parsing

Nick