Should we require that the user enter a key, or have tika-server spit out a 
random UUID that clients have to include in their calls?

Or will Konstantin's two flags be sufficient?
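
For illustration, the random-UUID option could look roughly like the sketch below. This is only a sketch under assumptions: the `ServerToken` class and its methods are hypothetical names, not part of tika-server, and how the token would travel with each request (header, query parameter) is left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.UUID;

// Hypothetical helper: holds the key that clients must include in their calls.
class ServerToken {
    private final String token;

    // Server-generated variant: a fresh random (type 4) UUID at startup.
    ServerToken() {
        this(UUID.randomUUID().toString());
    }

    // User-supplied variant: the operator enters the key.
    ServerToken(String token) {
        this.token = token;
    }

    String value() {
        return token;
    }

    // Compare the presented key with the expected one using
    // MessageDigest.isEqual rather than String.equals.
    boolean matches(String presented) {
        if (presented == null) {
            return false;
        }
        return MessageDigest.isEqual(
                token.getBytes(StandardCharsets.UTF_8),
                presented.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ServerToken serverToken = new ServerToken();
        // The server would print this once so the operator can hand it to clients.
        System.out.println("Clients must include this key: " + serverToken.value());
    }
}
```

Both constructors are shown so the two alternatives in the question (user-entered key versus server-generated UUID) can be compared side by side.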

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Wednesday, September 14, 2016 11:49 AM
To: dev@tika.apache.org
Cc: jle...@lightblue.com
Subject: Re: Query on correct use of 'fileUrl' in TikaJAXRS Server to extract 
document at remote url - my request is not working

+1 Great idea Konstantin

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398) 
Manager, Open Source Projects Formulation and Development Office (8212) NASA 
Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate 
Professor, Computer Science Department University of Southern California, Los 
Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 


On 9/14/16, 8:35 AM, "Konstantin Gribov" <gros...@gmail.com> wrote:

    +1 to re-enable fileUrl with a warning (including the CVE ID at least) and
    at least a special flag to enable it.
    
    IMHO, even better would be to require two flags (something like
    `--enable-dangerous-features/--enable-unsecure-features` plus the actual
    `--enable-fileurl`, like Sun/Oracle use for commercial features). It will
    force the user to think twice before starting tika-server with fileUrl
    enabled, and it clearly states that the server is running in insecure mode
    to anyone looking in ps/htop/initscript/et cetera.
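
A minimal sketch of the two-flag gate described above, assuming plain string flags on the command line. The `FileUrlGate` class is hypothetical, and the flag names are taken from the suggestion in this message; they are not existing tika-server options.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical startup check: fileUrl is only enabled when BOTH flags are given.
class FileUrlGate {
    // Illustrative flag names from the proposal, not real tika-server options.
    static final String UNSECURE = "--enable-unsecure-features";
    static final String FILE_URL = "--enable-fileurl";

    // Returns true if fileUrl should be enabled; refuses to start if the
    // feature flag is given without the general "unsecure" flag.
    static boolean fileUrlEnabled(String[] args) {
        Set<String> flags = new HashSet<>(Arrays.asList(args));
        boolean wantsFileUrl = flags.contains(FILE_URL);
        if (wantsFileUrl && !flags.contains(UNSECURE)) {
            throw new IllegalArgumentException(
                    FILE_URL + " requires " + UNSECURE);
        }
        return wantsFileUrl;
    }

    public static void main(String[] args) {
        if (fileUrlEnabled(args)) {
            // The real warning would also point at the relevant CVE.
            System.err.println(
                    "WARNING: fileUrl is enabled; the server is running in unsecure mode.");
        }
    }
}
```

Because both flags stay on the command line, they remain visible in ps/htop output and in init scripts, which is the point of the proposal.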
    
    On Wed, 14 Sep 2016 at 17:15, Chris Mattmann <mattm...@apache.org> wrote:
    
    > As long as we have a switch and a warning (and pointer to CVE URL with
    > that warning), I’m +1 to re-enable it.
    >
    > On 9/14/16, 4:40 AM, "Nick Burch" <apa...@gagravarr.org> wrote:
    >
    >     On Wed, 14 Sep 2016, Allison, Timothy B. wrote:
    >     > Would it be as much of a disaster to require the user to allow the
    >     > fileUrl capability on the commandline at server startup?  We could
    >     > add some menacing "all bets are off, we hope you know what you're
    >     > doing" warning.
    >
    >     With a special switch, and a warning, enabling file:/// again wouldn't
    >     be too bad in my view.
    >
    >     I'm not sure about arbitrary URLs though - there's the security + DoS
    >     stuff, plus the fact that we won't be doing robots checking / niceness
    >     / etc. For anyone doing remote URLs, I think they do need to be using a
    >     proper + safe + server-friendly crawler, then passing the result of a
    >     successful fetch to the Tika server.
    >
    >     >> My main concern in accessing the Tika libraries via TikaJAXRS is
    >     >> the performance overheads associated with going through sockets (and
    >     >> possibly the additional memory/file copying of file data if fileUrl
    >     >> is not available).
    >     >
    >     > In my experience, depending on the file types, yes, there's
    >     > definitely some overhead, but the bottleneck is in the parsers (esp
    >     > for complex document formats -- msoffice, pdf, etc), not data sloshing.
    >
    >     I agree - for almost all formats, the slow bit isn't byte shuffling,
    >     it's parsing
    >
    >     Nick
    >
    >
    >
    >
    > --
    
    Best regards,
    Konstantin Gribov
    

