+1. Great idea, Konstantin.
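
For illustration, the two-flag gate Konstantin proposes in the quoted message below could be sketched roughly as follows. This is only a sketch under stated assumptions, not tika-server's actual option handling: the class `FileUrlGate` and its methods are hypothetical, the flag names are the ones floated in this thread, and the CVE reference is left as a placeholder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not tika-server's real command-line handling.
final class FileUrlGate {
    // Flag names as proposed in this thread; assumed, not existing options.
    static final String GENERAL_FLAG = "--enable-unsecure-features";
    static final String FEATURE_FLAG = "--enable-fileurl";

    /** fileUrl is enabled only when BOTH flags appear on the command line. */
    static boolean isFileUrlEnabled(String[] args) {
        List<String> argList = Arrays.asList(args);
        return argList.contains(GENERAL_FLAG) && argList.contains(FEATURE_FLAG);
    }

    /** Warning to print at startup; the CVE reference is a placeholder. */
    static String startupWarning() {
        return "WARNING: fileUrl is enabled; the server is running in insecure mode. "
             + "See <CVE ID> before exposing it to untrusted clients.";
    }

    public static void main(String[] args) {
        if (isFileUrlEnabled(args)) {
            System.err.println(startupWarning());
        } else {
            System.out.println("fileUrl disabled (requires " + GENERAL_FLAG
                    + " and " + FEATURE_FLAG + ")");
        }
    }
}
```

Requiring both flags means a single copied option never turns the feature on by accident, and both flags stay visible to anyone inspecting the process list or init script.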

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 


On 9/14/16, 8:35 AM, "Konstantin Gribov" <gros...@gmail.com> wrote:

    +1 to re-enabling fileUrl with a warning (with the CVE ID at least) and at
    least a special flag to enable it.
    
    IMHO, even better would be to require two flags (something like
    `--enable-dangerous-features/--enable-unsecure-features` plus the actual
    `--enable-fileurl`, like Sun/Oracle use for commercial features). It will
    force the user to think twice before starting tika-server with fileUrl
    enabled, and it will clearly state that the server is running in insecure
    mode for anyone looking in ps/htop/initscript/et cetera.
    
    On Wed, 14 Sep 2016 at 17:15, Chris Mattmann <mattm...@apache.org> wrote:
    
    > As long as we have a switch and a warning (and pointer to CVE URL with that
    > warning), I’m +1 to re-enable it.
    >
    > On 9/14/16, 4:40 AM, "Nick Burch" <apa...@gagravarr.org> wrote:
    >
    >     On Wed, 14 Sep 2016, Allison, Timothy B. wrote:
    >     > Would it be as much of a disaster to require the user to allow the
    >     > fileUrl capability on the commandline at server startup?  We could add
    >     > some menacing "all bets are off, we hope you know what you're doing"
    >     > warning.
    >
    >     With a special switch, and a warning, enabling file:/// again wouldn't be
    >     too bad in my view.
    >
    >     I'm not sure about arbitrary URLs though - there's the security + DoS
    >     stuff, plus the fact that we won't be doing robots checking / niceness /
    >     etc. For anyone doing remote URLs, I think they do need to be using a
    >     proper + safe + server-friendly crawler, then passing the result of a
    >     successful fetch to the Tika server.
    >
    >     >> My main concern in accessing the Tika libraries via TikaJAXRS is the
    >     >> performance overheads associated with going through sockets (and
    >     >> possibly the additional memory/file copying of file data if fileUrl is
    >     >> not available).
    >     >
    >     > In my experience, depending on the file types, yes, there's definitely
    >     > some overhead, but the bottleneck is in the parsers (esp for complex
    >     > document formats -- msoffice, pdf, etc), not data sloshing.
    >
    >     I agree - for almost all formats, the slow bit isn't byte shuffling, it's
    >     parsing.
    >
    >     Nick
    >
    >
    >
    >
    > --
    
    Best regards,
    Konstantin Gribov
    

