As long as we have a switch and a warning (with a pointer to the CVE URL in that
warning), I’m +1 on re-enabling it.
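
A minimal sketch of what such a gate might look like, purely as an illustration. The flag name `--enableFileUrl`, the class `FileUrlGate`, and the warning text are all hypothetical and are not taken from the Tika server code. Remote URLs stay refused even when the switch is on, in line with the concern about arbitrary URLs in the quoted message below.

```java
import java.net.URI;
import java.util.Arrays;

/** Sketch: file: URLs are refused unless the operator opted in at startup. */
final class FileUrlGate {
    // Hypothetical switch name, chosen for illustration only.
    static final String FLAG = "--enableFileUrl";

    // Placeholder wording; a real warning would point at the relevant CVE URL.
    static final String WARNING =
        "WARNING: fileUrl is enabled. The server can read any local file "
        + "its user can read. All bets are off; see the CVE advisory.";

    private final boolean fileUrlEnabled;

    FileUrlGate(String[] args) {
        // The capability is off unless the switch is passed explicitly.
        this.fileUrlEnabled = Arrays.asList(args).contains(FLAG);
        if (fileUrlEnabled) {
            System.err.println(WARNING);
        }
    }

    /** Allows only file: URLs, and only when the switch was given. */
    boolean isAllowed(URI uri) {
        // getScheme() is null for relative URIs; equalsIgnoreCase(null) is false.
        return fileUrlEnabled && "file".equalsIgnoreCase(uri.getScheme());
    }
}
```

Building the gate once from the startup arguments and consulting `isAllowed` on each request keeps the default safe: with no switch, every fileUrl request is refused.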

On 9/14/16, 4:40 AM, "Nick Burch" <apa...@gagravarr.org> wrote:

    On Wed, 14 Sep 2016, Allison, Timothy B. wrote:
    > Would it be as much of a disaster to require the user to allow the 
    > fileUrl capability on the commandline at server startup?  We could add 
    > some menacing "all bets are off, we hope you know what you're doing" 
    > warning.
    
    With a special switch, and a warning, enabling file:/// again wouldn't be 
    too bad in my view.
    
    I'm not sure about arbitrary URLs though - there's the security + dos 
    stuff, plus the fact that we won't be doing robots checking / niceness / 
    etc. For anyone doing remote URLs, I think they do need to be using a 
    proper + safe + server-friendly crawler, then passing the result of a 
    successful fetch to the Tika server.
    
    >> My main concern in accessing the Tika libraries via TikaJAXRS is the
    >> performance overhead associated with going through sockets (and
    >> possibly the additional memory/file copying of file data if fileUrl is
    >> not available).
    >
    > In my experience, depending on the file types, yes, there's definitely
    > some overhead, but the bottleneck is in the parsers (esp for complex 
    > document formats -- msoffice, pdf, etc), not data sloshing.
    
    I agree - for almost all formats, the slow bit isn't byte shuffling, it's
    parsing.
    
    Nick
    


