Hi all,

again:

curl -v -X PUT -T some.pdf http://localhost:9998/tika \
  --header "Content-Type: application/pdf"

... and Tika returns plain text as it should, so 'application/pdf' is a
MIME type that works with curl.


*Now off to BaseX:*

 let
 $request :=
  <http:request  method='PUT'    >
    <http:body media-type="application/pdf" src="some.pdf"/>
  </http:request>
return
 http:send-request($request,"http://localhost:9998/tika")

*For this, Tika returns 415* - unsupported media type. Although the MIME
type is specified this time, the content that BaseX sends does not look
like what Tika expects.

let
  $file:="some.pdf",
  $request :=
<http:request  method='PUT'>
 <http:body media-type="application/pdf">{
  fetch:binary($file)
 }</http:body>
</http:request>
return
 http:send-request($request,"http://localhost:9998/tika")

*For this, Tika returns 500* - processing error. The media type is set to
'application/pdf', which works with curl (see above) but not with BaseX.
The tcpdump output also differs for the BaseX requests, as expected. So
either we're doing something really wrong, or BaseX sends the content in a
way it's not supposed to. In the latter case I'm not the one to look into
this issue, and we have to wait for someone to take a proper look at it.
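
In case it helps whoever picks this up, here is a minimal sketch for
pulling the request headers out of the 'curl -v' output, so they can be
laid next to the tcpdump of the BaseX request. The function name
sent_headers and the file names curl.log and tika.txt are made up for
illustration. It assumes curl's usual verbose format, in which every
outgoing header line starts with "> ".

```shell
#!/bin/sh
# Print the request line and headers that curl actually sent,
# taken from a saved `curl -v` log (verbose output goes to stderr).
# Assumption: outgoing lines are prefixed with "> ".
sent_headers() {
    # $1: file holding the stderr output of `curl -v`
    # Keep only "> " lines, drop the prefix, strip carriage returns.
    sed -n 's/^> //p' "$1" | tr -d '\r'
}

# Hypothetical usage against the Tika server from this thread:
#   curl -v -X PUT -T some.pdf http://localhost:9998/tika \
#     --header "Content-Type: application/pdf" -o tika.txt 2> curl.log
#   sent_headers curl.log
```

Any header that shows up here but not in the BaseX dump (or the other way
round) would be the first thing to check.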

Regards,
Lukas


On Sun, Jan 5, 2014 at 5:06 PM, Andy Bunce <bunce.a...@gmail.com> wrote:

> Hi Dirk,
> The Tika documentation is not very clear[1]. tika-app has a simple server
> mode. tika-server, which I am using, is a different jar [2].
>
> [1]
> http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
> [2] http://mvnrepository.com/artifact/org.apache.tika/tika-server/1.4
>
>
> On Sun, Jan 5, 2014 at 3:39 PM, Dirk Kirsten <d...@basex.org> wrote:
>
>> Hello,
>>
>> You can also simply get all the request headers using the -v flag when
>> running curl. Or you could use wireshark, which (at least to me) seems
>> easier than using tcpdump.
>>
>> I'd like to reproduce your problem, but I seem to be too stupid to get
>> the Tika server up and running.
>> When running
>>   java -jar tika-app-1.4.jar -s 9999
>>
>> (or even with the verbose flag) I simply don't get anything (but a
>> running process) and the server does not seem to have started properly, e.g.
>> if I do
>>   curl -X GET http://localhost:9998/tika
>>
>> I simply get nothing (the server does not seem to send any
>> response).
>>
>> However, I would suggest looking at the request sent by curl, as
>> curl sets some headers automatically and I have run into similar
>> problems before (i.e. for some servers, not setting some obscure
>> headers seems to be fatal...)
>>
>> Cheers,
>> Dirk
>>
>>
>> On 05/01/14 15:00, Florent Georges wrote:
>> > On 5 January 2014 00:57, Andy Bunce wrote:
>> >
>> >   Hi,
>> >
>> >> curl -X PUT -T aa.pdf http://localhost:9998/tika
>> >> [...]
>> >> I have tried:
>> >> let $file:="C:\tmp\aa.pdf"
>> >> let $request :=
>> >>   <http:request  method='PUT'    >
>> >>     <http:body media-type="application/octet-stream">{
>> >>       fetch:binary($file)
>> >>     }</http:body>
>> >>     </http:request>
>> >
>> >   I do not know Tika, I do not have BaseX on this machine, and you did
>> > not give a lot of details about what is not working nor error messages,
>> > so it is a bit difficult to help here.  All I can say is that I would
>> > use the following as the EXPath HTTP Client equivalent to the above
>> > CURL command:
>> >
>> >     <http:request method="put">
>> >        <http:body media-type="application/pdf"
>> >                   src="file:/c:/tmp/aa.pdf"/>
>> >     </http:request>
>> >
>> >   The @media-type is mandatory.  You do not set any explicitly with
>> > CURL, so you should probably find which MIME type works with CURL in
>> > the first place.  The @src lets the processor handle the details of
>> > accessing the binary file, which makes things easier and then you are
>> > sure the problem is not with fetch:binary() or with the analysis of
>> > the binary content of http:body.
>> >
>> >   If you find a MIME type that works with CURL (you can use the -H
>> > option like the following: -H "Content-Type: application/pdf"), and it
>> > is still failing, tcpdump can help as well.  Open a terminal window,
>> > and execute the following:
>> >
>> >     sudo tcpdump -s 0 -A -i any tcp and host localhost and port 9998
>> >
>> >   This will dump all traffic to localhost:9998.  Then go to another
>> > terminal window (because tcpdump is still running) and execute the
>> > CURL command.  After the completion, go back to the first window and
>> > press Ctrl-C (to kill tcpdump).  In between, tcpdump has output to the
>> > console a dump of the request.  It will as well if you keep it running
>> > when you test your query in BaseX.  So you can compare both requests
>> > and see what is different (or post it here so we can see what is
>> > happening).
>> >
>> >   Regards,
>> >
>>
>> --
>> Dirk Kirsten, BaseX GmbH, http://basex.org
>> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
>> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
>> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
>> `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
>> _______________________________________________
>> BaseX-Talk mailing list
>> BaseX-Talk@mailman.uni-konstanz.de
>> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
>>
>
>
>
>