Thank you, Nick. For the reasons you listed, I'm averse to adding fileUrl back, but I'm not entirely against it.
Would it be as much of a disaster to require the user to allow the fileUrl capability on the command line at server startup? We could add some menacing "all bets are off, we hope you know what you're doing" warning. If we went with something like this, we could allow all URLs, and users wouldn't have to ship the bytes via the network; tika-server could read local files from the file share.

This might still be a remarkably bad idea...

Cheers,
Tim

P.S.
> My main concern in accessing the Tika libraries via TikaJAXRS is the
> performance overheads associated with going through sockets (and possibly
> the additional memory/file copying of file data if fileUrl is not
> available).

In my experience, depending on the file types, there's definitely some overhead, but the bottleneck is in the parsers (esp. for complex document formats -- msoffice, pdf, etc.), not data sloshing.

-----Original Message-----
From: John Dougrez-Lewis [mailto:jle...@lightblue.com]
Sent: Wednesday, September 14, 2016 2:35 AM
To: dev@tika.apache.org
Cc: 'Nick Burch' <apa...@gagravarr.org>
Subject: RE: Query on correct use of 'fileUrl' in TikaJAXRS Server to extract document at remote url - my request is not working

Thanks for the insight.

My interest (as a developer) in TikaJAXRS is that it provides a nice encapsulation of Tika functionality which is accessible across language boundaries. The fact that it can then also cross network boundaries is of secondary importance to me.

I'm developing code in C++ and I'd like to be able to access Tika's capabilities. TikaJAXRS offers an easy way in.

If the fileUrl functionality were in place, and TikaJAXRS were running on the same box as the client and restricted to listening on 127.0.0.1, with the file:// check as well, this would limit some of the dangers listed below: an attacker would then need access to your host box itself, in which case you would have already lost.
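
To make the combined restriction concrete (explicit opt-in at startup, server bound to loopback only, and only local file:// URLs accepted), here is a minimal sketch in Python. It is not Tika code: the function name `file_url_allowed` and the parameters `bind_host` and `file_url_enabled` are hypothetical, and the real server would need to apply whatever checks it chooses in its own request handling.

```python
import ipaddress
from urllib.parse import urlsplit


def file_url_allowed(url: str, bind_host: str, file_url_enabled: bool) -> bool:
    """Return True only if all three conditions discussed above hold.

    Hypothetical policy check, not part of tika-server:
      1. the operator opted in at startup (file_url_enabled),
      2. the server listens on a loopback address (bind_host),
      3. the URL is a file:// URL naming a local path.
    """
    if not file_url_enabled:
        return False

    try:
        bound_to_loopback = ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # bind_host is a hostname, not an IP literal; reject rather than resolve it.
        return False

    parts = urlsplit(url)
    # An empty netloc (file:///path) or "localhost" means the local machine.
    is_local_file = parts.scheme == "file" and parts.netloc in ("", "localhost")

    return bound_to_loopback and is_local_file
```

For example, `file_url_allowed("file:///share/doc.pdf", "127.0.0.1", True)` passes, while the same URL is refused if the server is bound to `0.0.0.0` or the opt-in is missing.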
My main concern in accessing the Tika libraries via TikaJAXRS is the performance overheads associated with going through sockets (and possibly the additional memory/file copying of file data if fileUrl is not available).

Short of the Herculean task of porting the entirety of Tika from Java to C++, are there any better, well-established, more performant ways of interfacing from C++ to the Java Tika code?

Regards,
John