Hi Andriy,

For the Enhancer RESTful API:

The MediaType is taken from the "MediaType mediaType" parameter as passed
by JAX-RS to the "readFrom(..)" method of the "MessageBodyReader". This
should be equal to the 'Content-Type' header sent in the request. The
uploaded content is stored as a Blob in the created ContentItem. If you
are sending "multipart/form-data" requests, then you need to follow the
specification documented in the "Multipart MIME serialization" section
of [1].

For the Tika Engine:

The MimeType is read from ContentItem#getBlob()#getMimeType() (see also
[1]). If the mime type cannot be parsed or is application/octet-stream,
then Tika is used to detect the correct MimeType. Otherwise the content
type as set in the Blob is used.

BTW: plain text files are not processed by the Tika engine.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/contentitem.html

On Fri, Nov 23, 2012 at 9:18 AM, Andriy Nikolov <[email protected]> wrote:
> Dear all,
>
> I have another question about the use of Stanbol enhancer REST API
> (apologies if it is already covered in the documentation, i didn't
> find it).
> Is there some default content type which is expected by the enhancer?
> For instance, if I send a PDF file to the dbpedia-spotlight chain
> without specifying its content type, it gets processed correctly:
> curl -X POST -H "Accept: text/turtle" -T test.pdf
> http://localhost:8080/enhancer/chain/dbpedia-spotlight?uri=urn:testItem
> However, if I send a plain text file instead, nothing is returned:
> curl -X POST -H "Accept: text/turtle" -T dummy.txt
> http://localhost:8080/enhancer/chain/dbpedia-spotlight
> I have to set "Content-type: text/plain" in the header.
> Similarly, when I send PDF content from Java client via
> HttpURLConnection, if I don't set "Content-type:
> application/octet-stream" explicitly, it gets interpreted as plain
> text.
>
> I guess, Tika engine is able to recognise both plain text and
> different binary formats, so can I set some "default" content type,
> which will just defer the recognition of input format to the Tika
> engine?
> That will allow me sending any file to the service without first doing
> some "pre-guessing" on the client side.
>
> Best regards,
>
> Andriy Nikolov
>
> R&D Engineer
>
> F +49 6227 3849-565
>
> [email protected]
>
> http://www.fluidops.com
>
> fluid Operations AG
>
> Altrottstr. 31
>
> 69190 Walldorf, Germany
>
> Geschäftsführer/Managing Directors: Vasu Chandrasekhara, Dr. Andreas
> Eberhart, Dr. Stefan Kraus, Dr. Ulrich Walther
>
> Beirat/Advisory Board: Prof. Dr. Andreas Reuter, Thomas Reinhart
>
> Registergericht/Commercial Register: Mannheim, HRB 704027
>
> USt-Id Nr./VAT-No.: DE258759786
>
> This e-mail may contain confidential and/or privileged information. If
> you are not the intended recipient (or have received this e-mail in
> error) please notify the sender immediately and destroy this e-mail.
> Any unauthorised copying, disclosure or distribution of the material
> in this e-mail is strictly forbidden.

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
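
P.S. A minimal client-side sketch of the behaviour described above, in
case it helps. It only builds a POST request that falls back to
"application/octet-stream" when the caller does not know the content
type, so that detection is left to the Tika engine. The helper name
build_enhancer_request is hypothetical, the URL is simply the one from
the curl examples in the quoted mail, and this assumes the chain you
call actually includes the Tika engine. For plain text it is probably
still safest to send "text/plain" explicitly.

```python
import urllib.request

# Assumption: endpoint copied from the curl examples in the quoted mail.
ENHANCER_URL = "http://localhost:8080/enhancer/chain/dbpedia-spotlight"


def build_enhancer_request(content, content_type=None, url=ENHANCER_URL):
    """Build (but do not send) a POST request for the enhancer.

    Hypothetical helper: if no content type is given, fall back to
    application/octet-stream, which (per the description above) makes
    the Tika engine detect the actual mime type of the Blob.
    """
    headers = {
        "Accept": "text/turtle",
        "Content-Type": content_type or "application/octet-stream",
    }
    return urllib.request.Request(
        url, data=content, headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Example usage; requires a running Stanbol instance and a local test.pdf.
    with open("test.pdf", "rb") as f:
        request = build_enhancer_request(f.read())
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))
```

With HttpURLConnection the equivalent would be to set the
"Content-Type" request property to "application/octet-stream" before
writing the body, as you already do for PDF content.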
