stevedlawrence commented on a change in pull request #529:
URL: https://github.com/apache/daffodil/pull/529#discussion_r614256960
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/GZipTransformer.scala
##########
@@ -69,3 +69,84 @@ object GZIPTransformerFactory
xformer
}
}
+
+
+object GZIPFixedOutputStream {
+
+ private val fixIsNeeded = !scala.util.Properties.isJavaAtLeast("16")
+
+ /**
+ * Create a GZIPOutputStream that, if necessary, proxies writes through an
+ * OutputStream that fixes inconsistencies between Java versions
+ */
+ def apply(os: java.io.OutputStream) = {
+ val fixedOS = if (fixIsNeeded) new GZIPFixedOutputStream(os) else os
+ new java.util.zip.GZIPOutputStream(fixedOS)
+ }
+}
+
+/**
+ * Prior to Java 16, the java.util.zip.GZIPOutputStream wrote a value of zero
for
+ * the OS field in the header (byte index 9). In Java 16, this was changed to a
+ * value of 255 to better abide by the GZIP specification. Unfortunately, this
+ * means unparsed data using a GZIP layer might have a single byte difference,
+ * depending on the Java version used. This can lead to inconsistent behavior
of
+ * test failures that expect a certain byte value.
+ *
+ * To resolve this issue, we create this GZIPFixedOutputStream. This should
wrap
+ * the underlying OutputStream and be passed as the OutputStream to the
+ * GZIPOutputStream. When the GZIPOutputStream writes the 9th byte to this
+ * GZIPFixedOutputStream, this will always write a value of 255, making all
Java
+ * versions prior to 16 consistent with Java 16+ behavior.
+ */
+class GZIPFixedOutputStream private (os: java.io.OutputStream) extends
java.io.OutputStream {
+
+ /**
+ * The next byte position that byte will be written to. If this is negative,
+ * that means we have already fixed the output and everything should just
+ * pass straight through.
+ */
+ private var bytePosition = 0
+
+ override def close(): Unit = os.close()
+ override def flush(): Unit = os.flush()
+
+ override def write(b: Array[Byte], off: Int, len: Int): Unit = {
+ if (bytePosition < 0) {
+ // The bad byte has been fixed, pass all writes directly through to the
+ // underlying OutputStream. This may be more efficient than the default
+ // OutputStream write() function, which writes the bytes from his array
+ // one at a time
+ os.write(b, off, len)
+ } else {
+ // The bad byte has not been fixed yet. Unless a newer version of Java
+ // has made changes, the GZIPOutputStreamm will have passed in a 10 byte
+ // array to this function that includes the bad byte. Let's just write
+ // that array using the default write(array) method that writes these
+ // bytes one at a time and will call the write(int) method that will fix
+ // that byte. Calling write() one at a time is maybe inefficient but for
+ // such a small array it should not have a noticeable effect.
+ super.write(b, off, len)
+ }
+ }
+
+ override def write(b: Int): Unit = {
+ if (bytePosition < 0) {
+ // The bad byte has already been fixed, simply pass this byte through to
+ // the underlying OutputStream
+ os.write(b)
Review comment:
I think I have an idea of what's going on.
The first thing GZIPOutputStream does is write the 10 byte gzip header. To
do this, it first calls ``write(array)`` with an Array of size 10. The last of
these bytes (index 9) is the bad byte we need to fix. BytePosition is 0, so we
call ``super.write(b, off, len)``. That will call this ``write`` function for
each byte, one by one.
For bytes 0-8 we'll fall into the ``if (bytePosition < 9)`` case. In that
case, we'll send the byte through to the underlying OutputStream unchanged, and
increment ``bytePosition``.
For last byte at index 9 (the bad byte), we'll fall into the ``if
(bytePosition == 9)`` case. In this case, we will fix the bad byte and we'll
set ``bytePositio``n to -1.
And now, because this was the last index in the array we called
``super.write`` on, we won't call this ``write(int)`` function again. And
because the ``bytePosition`` is now -1, all future calls to ``write(array)``
will go straight to the underlying stream. So the only way this can get hit is
if the GZIPOutputStream calls ``write(int)``. And looking through the code of
the GZIPOutputSTream, I think this just never happens. It always writes from an
array, so we always just pass it through directly.
So, because GZIPOutputStream calls ``write(array)`` for the header, and only
the last byte of that array needs to be fixed, and because it never directly
calls ``write(int)`` for anything else, we never hit this case. If the
GZIPOutputStream ever changes so either of these aren't the case, we will be
able to hit this case. So technically this isn't possible, but only because of
the current behavior of GZIPOutputStream. It's possible that could change in
the future.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]