Wow, Hoss, this post was so long ago I barely remember writing it. ;-)

The problem we were having is not that the content type is not set in SolrJ - 
it's that SolrCell does not discover it as it did when we used multipart posts 
and ran with Solr 3.6.  We still aren't sure where the change is that broke the 
Tika content-type-discovery functionality, or whether it is in Tika or in Solr, 
but we did set the content type in the content stream from the source, where 
possible, and that helped enormously.

The specific test case we had was an SJIS text file, which in Solr 3.6 is 
properly discovered to be SJIS, while in Solr 4.1 it is only discovered to be 
SJIS if we set a content type other than application/octet-stream.
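
For anyone reading this later, here is a rough sketch of the defaulting rule Hoss describes in the quoted message below: a null content type on the stream ends up on the wire as application/octet-stream. It is only an illustration. The Stream interface and headerFor() are hypothetical stand-ins written for this note, not SolrJ classes, and "text/plain" is just an assumed example of an explicit type.

```java
public class ContentTypeSketch {
    // Hypothetical stand-in for a content stream; NOT the SolrJ ContentStream interface.
    public interface Stream {
        String getContentType(); // may return null when the caller sets nothing
    }

    // The fallback value named in the thread.
    public static final String DEFAULT_TYPE = "application/octet-stream";

    // Models the rule from the thread: use the stream's type, else the default.
    public static String headerFor(Stream stream) {
        String type = stream.getContentType();
        return (type == null) ? DEFAULT_TYPE : type;
    }

    public static void main(String[] args) {
        Stream unset = () -> null;            // caller never set a content type
        Stream explicit = () -> "text/plain"; // assumed example of an explicit type

        System.out.println(headerFor(unset));    // prints application/octet-stream
        System.out.println(headerFor(explicit)); // prints text/plain
    }
}
```

In our case the first situation is the one where the SJIS detection failed on 4.1; supplying any explicit type from the source avoided it.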

Karl


-----Original Message-----
From: ext Chris Hostetter [mailto:[email protected]] 
Sent: Wednesday, February 13, 2013 2:53 PM
To: [email protected]
Subject: RE: Solrj/Tika question about content types


: questions still apply: since Tika apparently cares deeply about
: content-type now, what content-type can I supply through SolrJ to tell
: it 'please discover the document type on your own'?  And how do I do
: that through SolrJ?

SolrJ sets the Content-Type header based on what is returned by the 
"getContentType()" of the ContentStream -- the default behavior is 
"application/octet-stream" if getContentType() returns null.

: (1) Does the getContentType() method actually even get used on Solrj?  
: When I looked at wire logging, it seemed that Solrj just posts a generic
: "application/xml; charset=UTF-8" content type, and does not transmit
: anything else.  It uses standard POST, not multipart/form POST, also.

Even in the case of a single ContentStream (so no multi-part) it still uses 
ContentStream.getContentType() ... can you provide a test case (or quick and 
dirty sample code) that demonstrates what you are seeing with "application/xml; 
charset=UTF-8" getting sent over the wire even though you explicitly provide a 
different content-type in the ContentStream?


-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


