Henry, can you review this please? I think this is an area you have worked in.
I suspect the duplication is partly due to too many different people working on these various substrates and none taking ownership. IIRC, Raju (cc-ed) did the CSS implementation, so he may want to have a look too. On 2011-03-27, at 09:57, André Bargull wrote: > Change bargull-20110327-wHm by bargull@Bargull02 on 2011-03-27 13:48:00 > in /home/anba/src/svn/openlaszlo/trunk > for http://svn.openlaszlo.org/openlaszlo/trunk > > Summary: Support UTF-16 encoding for files > > Bugs Fixed: LPP-1657 (UTF16 data broken) > > Technical Reviewer: ptw > QA Reviewer: (pending) > > Overview: > UTF-16 encoding support is a bit inconsistent, it isn't supported for some > files (e.g. lzx or proxied xml from server), while for other files it is > (e.g. css files). In addition to that, I've also encountered "duplicated" > code for BOM detection or encoding detection from the xml-declaration (in > "<?xml encoding=... ?>"). "duplicated" is put in quotes here, because even > though the code was used for the same purpose, it was slightly different > leading to other results depending on which method was called. Here's an > example for BOM detection from FileUtils: > > FileUtils contained several methods to read the BOM, but all of them were > kind of broken: > - detectBOMEncoding() used mark() to mark the current position in the stream, > but never called reset() > - stripByteOrderMark(PushbackInputStream) mixed up UTF-16LE and UTF-16BE, and > it may unread too much bytes (if the input stream contains less than three > bytes) > - stripByteOrderMark(byte[]) was more or less a plain copy of the other > stripByteOrderMark() method, duplicate code, yuk... > > > Details: > FileUtils: > #Encoding > -- Simple enum for the encoding information (utf-8 and utf-16 only) > #readString(Reader) > -- New utility method to return the content of a Reader > #getXMLStreamReader(InputStream, String) > -- no longer read the complete InputStream into memory, instead use mark() to > reset the position resp. SequenceInputStream > -- use BOM to deduce proper charset to read xml-prolog, if BOM not present, > guess possible encoding per auto-detection described in xml specification > #readFully(InputStream, byte[]) > -- helper method to read from InputStream into byte array > #skipFully(InputStream, long) > -- helper method to skip() bytes more reliably > #getEncoding(byte[]) > -- helper method to return encoding from BOM > #guessXMLEncoding(byte[]) > -- helper method to guess possible encoding by reading first bytes of input > #getBOMEncoding(BufferedInputStream) > -- use getEncoding() method to retrieve encoding from BOM > #makeXMLReaderForFile(String, String) > -- use toByteArrayInputStream() to read complete input into memory now that > getXMLStreamReader() no longer uses in-memory streams > #toByteArrayInputStream(InputStream) > -- reads InputStream content into memory and returns ByteArrayInputStream > with the same content > #stripByteOrderMark(PushbackInputStream) > -- use getEncoding() to read BOM > -- don't unread bytes after EOF > #stripByteOrderMark(byte[]) > -- use getEncoding() to read BOM > > CSSHandler: > - use new methods from FileUtils to avoid duplicating code > > HttpData: > - use encoding detection from FileUtils > > XMLGrabber: > - the PullParser we're using doesn't perform any encoding detection, > therefore fallback to the utility methods in FileUtils > > ResponderXML: > - don't read files unconditionally in a certain encoding (here: UTF-8) > instead just return the plain bytes > > FileUtils_Test2: > - test case for the new resp. changed method from FileUtils > > WholeFile_Test: > - initialize ReprocessComments.setOptions() in order to run test case > > build.xml: > - re-enable FileUtils_Test, according to LPP-2054 it is currently failing, > but I couldn't reproduce the failure > - add FileUtils_Test2 > - remove stale EDU/** entry from the oswego concurrent classes > - profiler properties for visual vm, they're not enabled by default but > useful for things like LPP-9841 > > > Tests: > test case from LPP-1657, junit test case > > Files: > M WEB-INF/lps/server/build.xml > M WEB-INF/lps/server/src/org/openlaszlo/js2doc/WholeFile_Test.java > M WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java > M WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java > A WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils_Test2.java > M WEB-INF/lps/server/src/org/openlaszlo/servlets/responders/ResponderXML.java > M WEB-INF/lps/server/src/org/openlaszlo/data/HttpData.java > M WEB-INF/lps/server/src/org/openlaszlo/data/XMLGrabber.java > > Changeset: > http://svn.openlaszlo.org/openlaszlo/patches/bargull-20110327-wHm.tar >
