Hi,

Could this not be handled like any other third party provided binary which
can run in our process? Say, a jvmti agent, any JNI library, or any system
library? For all these cases we expect the user to know what he is doing,
resp. the system library not to be buggy. Why not here? This seems like an
arbitrary distinction.

Cheers, Thomas




On Thu 23. Jul 2020 at 21:35, Lance Andersen <lance.ander...@oracle.com>
wrote:

> Hi Volker,
>
> A change such as this I believe would require a JEP to fully define the
> changes/risks/benefits
>
> I do have concerns about using multiple implementations for
> compression/decompression.  From a testing and support perspective, this
> adds additional burden.  What type of testing matrix are you proposing?
>
> Thought has to be given as to what it would mean to be compatible which
> could require updates to the compatibility rules for JCK testing.
>
> I am also concerned whether the proposed change adds enough value for the
> additional complexity (and support burden)  it brings.
>
> Perhaps a proposal to replace  the current zlib with an alternative might
> be an easier path forward
>
> I believe to move forward some, if not all of the above, including a JEP
> needs to be further flushed out.
>
> Best,
> Lance
>
> > On Jul 23, 2020, at 1:18 PM, Volker Simonis <volker.simo...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > can I please get some reviews for the following small enhancement
> > which will allow you to configure different zlib implementations at VM
> > start-up and get up to 100% better throughput for compression and
> > about 50% better throughput for decompression (depending on the
> > selected zlib implementation). We're using a similar change
> > productively in Amazon since quite a while with good results on
> > compression/decompression heavy workloads.
> >
> > As usual, my write-up is moch longer and much more complex than the
> > change itself :)
> >
> > http://cr.openjdk.java.net/~simonis/webrevs/2020/8249963/
> > https://bugs.openjdk.java.net/browse/JDK-8249963
> >
> > I've tested these changes locally on Linux/Windows/Mac without any
> > problems but got a Mach5 build error for Windows after sending them to
> > the submit repo (Job:
> > mach5-one-simonis-JDK-8249963-20200723-1236-12890347, BuildId:
> > 2020-07-23-1235507.volker.simonis.source). Would be great if somebody
> > from Oracle could shed some light on that issue.
> >
> > Thank you and best regards,
> > Volker
> >
> > Following some background, benchmarks and implementation details for the
> change:
> >
> > The JDK bundles the original zlib implementation [1] almost from the
> > very beginning, starting with zlib 1.0.4 in JDK 1.1. In the meantime,
> > the bundled zlib implementation was updated to the latest released
> > version 1.2.11. However, the innovation velocity in the original zlib
> > version has considerably slowed down in the last years.
> >
> > Lately, new projects have picked up the original zlib implementation
> > and considerably improved its deflate and inflate speed. Testing shows
> > that compression throughput can be improved between 50 and 100% with
> > Cloudflare's zlib version [2] and decompression throughput between 30
> > and 60% using the zlib implementation from the Chromium project [3].
> >
> > Overview
> > --------
> >
> > This enhancement proposes some simple changes to the JDK's zlib
> > wrapper libzip which make it possible to choose alternative zlib
> > implementations selectively for deflation and/or inflation at VM
> > startup based on a system property. As an additional benefit it makes
> > it possible to use the system zlib version even if the JDK was
> > configured with a bundled zlib.
> >
> > Already now, the JDK can be configured at build time to either use the
> > bundled zlib version or to dynamically link against the system zlib.
> > If linked dynamically, LD_LIBRARY_PATH or LD_PRELOAD can be used to
> > run the JDK with an alternative zlib implementation. This approach is
> > however not very flexible because:
> > - once configured with a "bundled" zlib it's not possible to use
> > another implementation any more. Some platforms like Windows don't
> > have a default, system provided zlib version so they are always built
> > with "--with-zlib=bundled" and therefore can't change the
> > implementation at runtime.
> > - even if configured with "--with-zlib=system", it is only possible a
> > single alternative version as a substitute for the system zlib. But
> > we've found (see benchmarks [4]) that there's no single zlib
> > implementation which improves both compression and decompression
> > equally well. It may therefore make sense to have the possibility to
> > selectively use one implementation for compression and another one for
> > decompression.
> >
> > This enhancement proposes three new system properties:
> >
> > org.openjdk.zlib.implementation
> > org.openjdk.zlib.implementation.inflate
> > org.openjdk.zlib.implementation.deflate
> >
> > which can be used to either replace the full zlib functionality or
> > selectively just the deflate or inflate part with a new
> > implementation. As an argument they take an absolute or relative path
> > to an API-compatible, shared zlib library. E.g.
> >
> > -Dorg.openjdk.zlib.implementation=zlib-cloudflare.so
> >
> -Dorg.openjdk.zlib.implementation.deflate=/Git/zlib-bench/lib/x86_64/zlib-cloudflare.dylib
> >
> -Dorg.openjdk.zlib.implementation.inflate='D:\Git\zlib-bench\lib\x86_64\zlib-chromium.dll`
> >
> > An argument with the value "system" can be used to instruct the JDK to
> > load the default zlib version on the corresponding system. Notice that
> > this option now also works in the case where the JDK was configured
> > with a bundled zlib.
> >
> > The naming of the properties is of course subject to discussion. If we
> > get an agreement on this change and the naming, I'll be happy to open
> > a CSR for the introduction of the new properties.
> >
> > For testing purpose I've made some precompiled version of
> > zlib-cloudflare and zlib-chromium for Linux/x86_64, Linux/aarch64,
> > MacOS/x86_64 and Windows/x86_64 available in my GitHub repository [6].
> >
> > The implementation of this whole enhancement is fairly simple. Instead
> > of calling right into zlib from libzip I've replaced all direct calls
> > by indirect calls through function pointers. By default, these
> > function pointers are pointing either to the corresponding functions
> > in the bundled zlib implementation or to the corresponding functions
> > in the dynamically linked system zlib. If none of the above system
> > properties will be used, the only change to the current implementation
> > is the replacement of some direct calls by indirect calls. Because
> > these function calls are usually made through JNI and because the
> > underlying code is usually quite compute intensive, I couldn't measure
> > any performance regression because of this change.
> >
> > If one of the above system properties will be set, the selected shared
> > library will be dynamically loaded and the corresponding function
> > pointers will be redirected to point to the new implementation. I have
> > verified that this works on Linux, Windows and Mac OS X. You can find
> > some more implementation details at the end of this write-up.
> >
> > Benchmarks
> > ----------
> >
> > I've run native benchmarks to compare the compression/decompression
> > throughput and the compression ratio at various compression levels for
> > zlib-adler [7], zlib-chromium [8], zlib-ng [9], zlib-ipp [10],
> > zlib-cloudflare [11] and the isa-l [12] library. The results and more
> > details can be found in my zlib-bench GitHub repository [5].
> >
> > After that I've used precompiled versions of zlib-cloudflare,
> > zlib-chromium and the changes proposed by this enhancement to rerun
> > these benchmarks in Java on Linux/x86_64 and Linux/aarch64. I think
> > the results are quite impressive [4] showing around 100% higher
> > throughput for compression and around 50% higher throughput for
> > decompression on both architectures.
> >
> > Implementation details
> > ----------------------
> >
> > Before change "8237750: Load libzip.so only if necessary" [13] libzip
> > (and transitively libz) was loaded early at VM startup in
> > "init_globals() -> ClassLoader::initialize()". For this change I've
> > re-enabled this early loading if and only if one of the new system
> > properties was given on the command line. I think these new properties
> > will only be used if heavy compression/decompression usage is expected
> > and in such a case it doesn't harm to load libzip early at startup.
> > Loading libzip later and on demand would otherwise make the
> > synchronization of the function pointer initialization unnecessary
> > hard because libzip can be loaded independently from the VM as well as
> > from the class library.
> >
> > An alternative approach would be to use __attribute(constructor) (on
> > Linux) or DLLMain() (on Windows) to perform the function pointer
> > initialization once when libzip gets loaded. But first, this approach
> > is quite system dependent and I'm not sure it is available on all
> > platforms (I'm pretty sure it doesn't work on AIX :). Second, I think
> > it is not possible to dynamically load DLLs from DLLMain() on Windows
> > which would be required in this case.
> >
> > A third possibility would be to define a JNI OnLoad function for
> > libzip. But unfortunately, libzip can not only be loaded from
> > java.util.zip where we have a JNIEnv but also from HotSpot or directly
> > with dlopen() from libjimage.
> >
> > I've also removed the following, OpenJDK-specific patch to "zconf.h"
> > in the bundled zlib version which seems to have been there "since
> > ever" but which I don't think is required any more:
> >
> > src/java.base/share/native/libzip/zlib/zconf.h
> > -#ifdef _LP64
> > -typedef unsigned int  uLong;  /* 32 bits or more */
> > -#else
> > typedef unsigned long  uLong; /* 32 bits or more */
> > -#endif
> >
> > The change is needed to make the bundled zlib compatible with the
> > system or the other zlib implementations which all do not have this
> > change. Notice that if we're building with the system zlib, we are
> > already using this setting today because in that case "zconf.h" is
> > taken from the system include path.
> >
> > Finally, it has to be noticed that although we are loading libzip
> > early there are still two call sites of zlib functionality which won't
> > be able to benefit from the new implementations because they either
> > statically link in parts of the bundled zlib version or directly and
> > dynamically link against the system zlib version. That's
> > libsplashscreen and libjli. Both are only used at startup and both
> > only make limited use of zlib (if they use it at all) so I don't think
> > that they are relevant.
> >
> > Risks
> > -----
> >
> > Unfortunately ZipInputStream/ZipOutputStream have been designed such
> > that it is very easy to use them in an unsupported way. It is in
> > general not possible to read a ZipEntry from a ZipInputStream and
> > write that exact ZipEntry into a ZipOutputStream followed by the
> > inflated and re-deflated data of that entry. However this naive
> > approach is sometimes used to copy zip entries from a zip input file
> > into another zip output file. This approach will only work in the case
> > where the exact same deflate implementation and compression was used
> > for the initial compression like for the recompression.
> >
> > This is because the ZipEntry will contain the compressed size of the
> > data belonging to that entry. But the Deflate format doesn't mandate a
> > specific, fixed compression algorithm which will always result in the
> > same compressed size. Instead, different implementations are free to
> > use different approaches for compression which can result in valid
> > entries but potentially with a different compressed size.
> >
> > Actually, even if just a single implementation is used for both
> > compression and decompression, it is still not possible to detect the
> > compression level from the compressed data. It is  therefore possible
> > that decompressing and re-compressing that data into a ZipOutputStream
> > will result in a different compressed size. However, ZipOutputStream
> > will throw an exception if the compressed size entry in a ZipEntry is
> > different from the actual size of the compressed data.
> >
> > I've recently fixed two such cases in the OpenJDK (JDK-8240333 [14],
> > JDK-8240235 [15]) itself, but there may be other such use cases in the
> > wild which may throw an "ZipException: invalid entry compressed size
> > (expected 66 but got 67 bytes)". If that happens, the user can either
> > fix his code (which is trivial and advised because of the problems
> > explained before) or simply remove the new system properties in order
> > to fall back to the bundled or system implementation.
> >
> > [1] https://www.zlib.net/
> > [2] https://github.com/cloudflare/zlib
> > [3] https://chromium.googlesource.com/chromium/src/third_party/zlib/
> > [4] https://github.com/simonis/zlib-bench/blob/master/Results-ojdk.md
> > [5] https://github.com/simonis/zlib-bench/blob/master/Results.md
> > [6] https://github.com/simonis/zlib-bench/tree/master/lib
> > [7] https://github.com/madler/zlib
> > [8] https://chromium.googlesource.com/chromium/src/third_party/zlib/
> > [9] https://github.com/zlib-ng/zlib-ng
> > [10]
> https://software.intel.com/en-us/articles/intel-ipp-zlib-coding-functions
> > [11] https://github.com/cloudflare/zlib
> > [12] https://github.com/intel/isa-l
> > [13] https://bugs.openjdk.java.net/browse/JDK-8237750
> > [14] https://bugs.openjdk.java.net/browse/JDK-8240333
> > [15] https://bugs.openjdk.java.net/browse/JDK-8240235
>
>  <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>  <http://oracle.com/us/design/oracle-email-sig-198324.gif> <
> http://oracle.com/us/design/oracle-email-sig-198324.gif>
>  <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
> Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>
>
>
>
>

Reply via email to