Hi Shai,
with XML parsers you should generally avoid Readers unless you know for certain that the underlying XML really is in the encoding the Reader was given. A Reader should only be passed for sources that are independent of the encoding (like Java Strings containing XML, and without an encoding declaration!).

Good examples of correctly using a Reader are:

- new InputSource(new StringReader("<tag>..</tag>")); // no xml declaration
- An XML stream serialized from SAX/DOM to a Writer (so it carries no encoding), e.g. stored in a Lucene stored String.

But documents from an unknown source should always be handled as byte streams. The XML parser must be able to switch the encoding according to the declaration it finds in the XML header, and this is not possible with Readers.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

_____

From: Shai Erera [mailto:ser...@gmail.com]
Sent: Friday, April 10, 2009 8:47 AM
To: java-dev@lucene.apache.org
Subject: Benchmark: EnwikiDocMaker does not use fileIn (BufferedReader)

I started working on the patch for 1591, and noticed EnwikiDocMaker uses the FileInputStream instance from LineDocMaker and not the BufferedReader. I don't see any reason for this, as InputSource accepts a Reader. I can change it as part of 1591, unless you think I'm missing something.
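
For illustration, here is a minimal sketch of the encoding problem described in the reply, assuming the JDK's built-in JAXP parser. The class name ReaderVsStreamDemo, the sample document, and the deliberately wrong UTF-8 Reader are hypothetical and made up for this example; none of it is taken from EnwikiDocMaker or the patch for 1591.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReaderVsStreamDemo {

    // Hypothetical sample: a document that is declared and encoded as ISO-8859-1.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the source and returns the text content of the root element.
    static String parse(InputSource source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    // Byte stream: the parser reads the XML declaration and picks the encoding itself.
    static String parseAsByteStream(byte[] bytes) throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Reader: the bytes are already decoded (here with the wrong charset, on purpose)
    // before the parser sees them, so the declaration can no longer correct anything.
    static String parseWithReader(byte[] bytes) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = sampleBytes();
        System.out.println("byte stream: " + parseAsByteStream(bytes));
        System.out.println("reader:      " + parseWithReader(bytes));
    }
}
```

With the byte stream the parser honours the declared ISO-8859-1 encoding; with the Reader the accented character has already been decoded incorrectly and the parser has no way to recover it.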