Hi Dirk,

The Tika documentation is not very clear [1]. tika-app has a simple server mode. tika-server, which I am using, is a different jar [2].

[1] http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
[2] http://mvnrepository.com/artifact/org.apache.tika/tika-server/1.4

On Sun, Jan 5, 2014 at 3:39 PM, Dirk Kirsten <d...@basex.org> wrote:

> Hello,
>
> You can also simply get all the request headers using the -v flag when
> running curl. Or you could use wireshark, which (at least to me) seems
> easier than using tcpdump.
>
> I'd like to reproduce your problem, but I seem to be too stupid to get
> the Tika server up and running.
> When running
>   java -jar tika-app-1.4.jar -s 9999
>
> (or even with the verbose flag) I simply don't get anything (but a
> running process) and the server seems to me not properly started, e.g.
> if I do
>   curl -X GET http://localhost:9998/tika
>
> I simply get nothing (I don't get any response, the server seems not to
> send any response).
>
> However, I would suggest trying to look at the request sent by curl, as
> curl sets some headers automatically and I also experienced similar
> problems before (i.e. for some servers not setting some obscure headers
> seems to be fatal...)
>
> Cheers,
> Dirk
>
>
> On 05/01/14 15:00, Florent Georges wrote:
> > On 5 January 2014 00:57, Andy Bunce wrote:
> >
> > Hi,
> >
> >> curl -X PUT -T aa.pdf http://localhost:9998/tika
> >> [...]
> >> I have tried:
> >> let $file:="C:\tmp\aa.pdf"
> >> let $request :=
> >>   <http:request method='PUT' >
> >>     <http:body media-type="application/octet-stream">{
> >>       fetch:binary($file)
> >>     }</http:body>
> >>   </http:request>
> >
> > I do not know Tika, I do not have BaseX on this machine, and you did
> > not give a lot of details about what is not working nor error messages,
> > so it is a bit difficult to help here. All I can say is that I would
> > use the following as the EXPath HTTP Client equivalent to the above
> > CURL command:
> >
> >   <http:request method="put">
> >     <http:body media-type="application/pdf" src="file:/c:/tmp/aa.pdf"/>
> >   </http:request>
> >
> > The @media-type is mandatory. You do not set any explicitly with
> > CURL, so you should probably find which MIME type works with CURL in
> > the first place. The @src lets the processor handle the details of
> > accessing the binary file, which makes things easier and then you are
> > sure the problem is not with fetch:binary() or with the analysis of
> > the binary content of http:body.
> >
> > If you find a MIME type that works with CURL (you can use the -H
> > option like the following: -H "Content-Type: application/pdf"), and it
> > is still failing, tcpdump can help as well. Open a terminal window,
> > and execute the following:
> >
> >   sudo tcpdump -s 0 -A -i any tcp and host localhost and port 9998
> >
> > This will dump all traffic to localhost:9998. Then go to another
> > terminal window (because tcpdump is still running) and execute the
> > CURL command. After the completion, go back to the first window and
> > press Ctrl-C (to kill tcpdump). In between, tcpdump has output to the
> > console a dump of the request. It will as well if you keep it running
> > when you test your query in BaseX. So you can compare both requests
> > and see what is different (or post it here so we can see what is
> > happening).
> >
> > Regards,
> >
>
> --
> Dirk Kirsten, BaseX GmbH, http://basex.org
> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
> _______________________________________________
> BaseX-Talk mailing list
> BaseX-Talk@mailman.uni-konstanz.de
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
>
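For reference, the two suggestions quoted above (set the MIME type explicitly with -H, and look at the headers with -v) could be combined roughly as in the sketch below. The function name put_pdf and the variable TIKA_URL are only illustrative and do not come from the thread; the URL assumes the server is listening on port 9998, as in the curl command quoted above.

```shell
# Assumption: the Tika endpoint is reachable at the URL used earlier in the thread.
TIKA_URL="http://localhost:9998/tika"

# Hypothetical helper: PUT one file with an explicit Content-Type.
# -v prints the request and response headers, so the request can be
# compared with the one sent by the EXPath HTTP Client.
put_pdf() {
  curl -v -X PUT -H "Content-Type: application/pdf" -T "$1" "$TIKA_URL"
}
```

Calling put_pdf aa.pdf then sends the file; if tcpdump is running in another terminal as described above, the same request also shows up there.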