Change bargull-20110327-wHm by bargull@Bargull02 on 2011-03-27 13:48:00
in /home/anba/src/svn/openlaszlo/trunk
for http://svn.openlaszlo.org/openlaszlo/trunk

Summary: Support UTF-16 encoding for files

Bugs Fixed: LPP-1657 (UTF16 data broken)

Technical Reviewer: ptw
QA Reviewer: (pending)

Overview:
UTF-16 encoding support is a bit inconsistent, it isn't supported for some files (e.g. lzx or proxied xml from server), while for other files it is (e.g. css files). In addition to that, I've also encountered "duplicated" code for BOM detection or encoding detection from the xml-declaration (in "<?xml encoding=... ?>"). "duplicated" is put in quotes here, because even though the code was used for the same purpose, it was slightly different leading to other results depending on which method was called. Here's an example for BOM detection from FileUtils:

FileUtils contained several methods to read the BOM, but all of them were kind of broken: - detectBOMEncoding() used mark() to mark the current position in the stream, but never called reset() - stripByteOrderMark(PushbackInputStream) mixed up UTF-16LE and UTF-16BE, and it may unread too much bytes (if the input stream contains less than three bytes) - stripByteOrderMark(byte[]) was more or less a plain copy of the other stripByteOrderMark() method, duplicate code, yuk...


Details:
FileUtils:
#Encoding
-- Simple enum for the encoding information (utf-8 and utf-16 only)
#readString(Reader)
-- New utility method to return the content of a Reader
#getXMLStreamReader(InputStream, String)
-- no longer read the complete InputStream into memory, instead use mark() to reset the position resp. SequenceInputStream -- use BOM to deduce proper charset to read xml-prolog, if BOM not present, guess possible encoding per auto-detection described in xml specification
#readFully(InputStream, byte[])
-- helper method to read from InputStream into byte array
#skipFully(InputStream, long)
-- helper method to skip() bytes more reliably
#getEncoding(byte[])
-- helper method to return encoding from BOM
#guessXMLEncoding(byte[])
-- helper method to guess possible encoding by reading first bytes of input
#getBOMEncoding(BufferedInputStream)
-- use getEncoding() method to retrieve encoding from BOM
#makeXMLReaderForFile(String, String)
-- use toByteArrayInputStream() to read complete input into memory now that getXMLStreamReader() no longer uses in-memory streams
#toByteArrayInputStream(InputStream)
-- reads InputStream content into memory and returns ByteArrayInputStream with the same content
#stripByteOrderMark(PushbackInputStream)
-- use getEncoding() to read BOM
-- don't unread bytes after EOF
#stripByteOrderMark(byte[])
-- use getEncoding() to read BOM

CSSHandler:
- use new methods from FileUtils to avoid duplicating code

HttpData:
- use encoding detection from FileUtils

XMLGrabber:
- the PullParser we're using doesn't perform any encoding detection, therefore fallback to the utility methods in FileUtils

ResponderXML:
- don't read files unconditionally in a certain encoding (here: UTF-8) instead just return the plain bytes

FileUtils_Test2:
- test case for the new resp. changed method from FileUtils

WholeFile_Test:
- initialize ReprocessComments.setOptions() in order to run test case

build.xml:
- re-enable FileUtils_Test, according to LPP-2054 it is currently failing, but I couldn't reproduce the failure
- add FileUtils_Test2
- remove stale EDU/** entry from the oswego concurrent classes
- profiler properties for visual vm, they're not enabled by default but useful for things like LPP-9841


Tests:
test case from LPP-1657, junit test case

Files:
M WEB-INF/lps/server/build.xml
M WEB-INF/lps/server/src/org/openlaszlo/js2doc/WholeFile_Test.java
M WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
M WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
A WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils_Test2.java
M WEB-INF/lps/server/src/org/openlaszlo/servlets/responders/ResponderXML.java
M WEB-INF/lps/server/src/org/openlaszlo/data/HttpData.java
M WEB-INF/lps/server/src/org/openlaszlo/data/XMLGrabber.java

Changeset: http://svn.openlaszlo.org/openlaszlo/patches/bargull-20110327-wHm.tar

Reply via email to