Henry, can you review this please?  I think this is an area you have worked in.

I suspect the duplication is partly due to too many different people working on 
these various substrates and none taking ownership.

IIRC, Raju (cc-ed) did the CSS implementation, so he may want to have a look 
too.

On 2011-03-27, at 09:57, André Bargull wrote:

> Change bargull-20110327-wHm by bargull@Bargull02 on 2011-03-27 13:48:00
> in /home/anba/src/svn/openlaszlo/trunk
> for http://svn.openlaszlo.org/openlaszlo/trunk
> 
> Summary: Support UTF-16 encoding for files
> 
> Bugs Fixed: LPP-1657 (UTF16 data broken)
> 
> Technical Reviewer: ptw
> QA Reviewer: (pending)
> 
> Overview:
> UTF-16 encoding support is a bit inconsistent, it isn't supported for some 
> files (e.g. lzx or proxied xml from server), while for other files it is 
> (e.g. css files). In addition to that, I've also encountered "duplicated" 
> code for BOM detection or encoding detection from the xml-declaration (in 
> "<?xml encoding=... ?>"). "duplicated" is put in quotes here, because even 
> though the code was used for the same purpose, it was slightly different 
> leading to other results depending on which method was called. Here's an 
> example for BOM detection from FileUtils:
> 
> FileUtils contained several methods to read the BOM, but all of them were 
> kind of broken:
> - detectBOMEncoding() used mark() to mark the current position in the stream, 
> but never called reset()
> - stripByteOrderMark(PushbackInputStream) mixed up UTF-16LE and UTF-16BE, and 
> it may unread too much bytes (if the input stream contains less than three 
> bytes)
> - stripByteOrderMark(byte[]) was more or less a plain copy of the other 
> stripByteOrderMark() method, duplicate code, yuk...
> 
> 
> Details:
> FileUtils:
> #Encoding
> -- Simple enum for the encoding information (utf-8 and utf-16 only)
> #readString(Reader)
> -- New utility method to return the content of a Reader
> #getXMLStreamReader(InputStream, String)
> -- no longer read the complete InputStream into memory, instead use mark() to 
> reset the position resp. SequenceInputStream
> -- use BOM to deduce proper charset to read xml-prolog, if BOM not present, 
> guess possible encoding per auto-detection described in xml specification
> #readFully(InputStream, byte[])
> -- helper method to read from InputStream into byte array
> #skipFully(InputStream, long)
> -- helper method to skip() bytes more reliably
> #getEncoding(byte[])
> -- helper method to return encoding from BOM
> #guessXMLEncoding(byte[])
> -- helper method to guess possible encoding by reading first bytes of input
> #getBOMEncoding(BufferedInputStream)
> -- use getEncoding() method to retrieve encoding from BOM
> #makeXMLReaderForFile(String, String)
> -- use toByteArrayInputStream() to read complete input into memory now that 
> getXMLStreamReader() no longer uses in-memory streams
> #toByteArrayInputStream(InputStream)
> -- reads InputStream content into memory and returns ByteArrayInputStream 
> with the same content
> #stripByteOrderMark(PushbackInputStream)
> -- use getEncoding() to read BOM
> -- don't unread bytes after EOF
> #stripByteOrderMark(byte[])
> -- use getEncoding() to read BOM
> 
> CSSHandler:
> - use new methods from FileUtils to avoid duplicating code
> 
> HttpData:
> - use encoding detection from FileUtils
> 
> XMLGrabber:
> - the PullParser we're using doesn't perform any encoding detection, 
> therefore fallback to the utility methods in FileUtils
> 
> ResponderXML:
> - don't read files unconditionally in a certain encoding (here: UTF-8) 
> instead just return the plain bytes
> 
> FileUtils_Test2:
> - test case for the new resp. changed method from FileUtils
> 
> WholeFile_Test:
> - initialize ReprocessComments.setOptions() in order to run test case
> 
> build.xml:
> - re-enable FileUtils_Test, according to LPP-2054 it is currently failing, 
> but I couldn't reproduce the failure
> - add FileUtils_Test2
> - remove stale EDU/** entry from the oswego concurrent classes
> - profiler properties for visual vm, they're not enabled by default but 
> useful for things like LPP-9841
> 
> 
> Tests:
> test case from LPP-1657, junit test case
> 
> Files:
> M WEB-INF/lps/server/build.xml
> M WEB-INF/lps/server/src/org/openlaszlo/js2doc/WholeFile_Test.java
> M WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
> M WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
> A WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils_Test2.java
> M WEB-INF/lps/server/src/org/openlaszlo/servlets/responders/ResponderXML.java
> M WEB-INF/lps/server/src/org/openlaszlo/data/HttpData.java
> M WEB-INF/lps/server/src/org/openlaszlo/data/XMLGrabber.java
> 
> Changeset: 
> http://svn.openlaszlo.org/openlaszlo/patches/bargull-20110327-wHm.tar
> 


Reply via email to