Hi,

On Wed, Aug 29, 2012 at 6:02 PM, chraj007 <chraj.k...@gmail.com> wrote:
> http://lucene.472066.n3.nabble.com/file/n4004078/test.html test.html

Looks like that file has an incorrect http-equiv declaration:

    <META http-equiv="Content-Type" content="text/html; charset=utf-16">

The encoding of the file is not UTF-16.

Can you file a TIKA issue about this? Tika should be able to
automatically detect the correct encoding and use it if the declared
one is obviously incorrect.

BR,

Jukka Zitting
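
To illustrate the kind of check meant by "obviously incorrect", here is a minimal Python sketch. It is not Tika's detection logic. The helper names (`declared_charset`, `looks_like_utf16`, `choose_charset`), the 1024-byte sample size, the NUL-byte threshold, and the UTF-8 fallback are all assumptions made for illustration. The idea is that real UTF-16 text either starts with a byte order mark or contains many NUL bytes, so a page that declares utf-16 but shows neither is probably mislabeled.

```python
import re

# Hypothetical helpers for illustration only; not part of Tika.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charset(raw: bytes):
    """Return the charset named in the first 1024 bytes, lowercased, or None."""
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def looks_like_utf16(raw: bytes) -> bool:
    """Heuristic: UTF-16 data starts with a BOM or is dense with NUL bytes."""
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return True
    sample = raw[:1024]
    # Assumed threshold: more than a quarter of the sample is NUL.
    return len(sample) > 0 and sample.count(b"\x00") > len(sample) // 4


def choose_charset(raw: bytes, fallback: str = "utf-8") -> str:
    """Pick a charset label, ignoring a utf-16 declaration the bytes contradict."""
    if looks_like_utf16(raw):
        return "utf-16"
    declared = declared_charset(raw)
    if declared is None or declared.startswith("utf-16"):
        # Nothing declared, or utf-16 declared on bytes that are not UTF-16.
        return fallback
    return declared
```

With a page like the one in the report, `choose_charset` ignores the declared utf-16 and returns the assumed fallback. A real detector would examine the byte content more thoroughly than this.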