Thank you Roger. Comments inline. In general, I will hold off changes until
after merge so as not to destabilize. Note that ImageBufferCache and
Decompression are not currently used.
On Mar 8, 2016, at 2:07 PM, Roger Riggs <roger.ri...@oracle.com> wrote:
Hi,
Review comments for the jimage code in java.base, mostly cleanup but perhaps a
bug or two.
General: inconsistent style and not following guidelines, copyright dates may
need an update.
Use of assert for error checking is not going to catch or help isolate errors
in production builds.
Noted.
dir: jdk/src/java.base/share/classes/jdk/internal/jimage/
BasicImageReader:
- readHeader() exception message should say not a jimage or wrong version
Other IOExceptions can occur if the file is corrupted or is too small, array
index out of bounds, etc.
Replaced with specific exceptions.
- findLocation() should use a more explicit computation for the redirection
than rehashing; this would allow the hash seed to be encapsulated.
As part of the review change set, I've documented the Minimal Perfect Hash
Method (see PerfectHashBuilder.) The algorithm requires a one time one rehash
when there is a collision. A new seed forces a different distribution of the
colliding strings. If you started with the existing hashCode, you would not
get a new distribution, merely a shift of all the colliding elements. Random
distribution is the essence of algorithm.
- slice() synchronizes on the ByteBuffer; does every other access?
Byte buffer use here is immutable, however slicing of the ByteBuffer does
require modifying position and limit. Wrapping or cloning the buffers would be
of little benefit and would take a performance hit. The ByteBuffer API might
have been well served with an atomic non-modifying slice operation.
- getAllLocations() is unused and does not respect its sorted argument;
Best to remove the method for now.
Removed.
- getResourceBuffer(): unused - remove
Removed.
- getResourceStream(): unused - remove
Removed.
ImageStringsReader:
- Is Modified UTF-8 really needed here; using normal UTF-8 would allow
re-use and better performance. Alternatively, Modified UTF-8 should be
added as a supported encoding.
Modified UTF-8 is used to be more compact and to simplify use in native code.
The jimage string table has overlapping use with constant pool compaction so is
really an older revision of the Modified UTF-8 standard. Not sure why Modifed
UTF-8 is not offered as an encoding. It would have made constant pool
management easier. And note that relying on a particular encoding is
problematic. Encoding standards do change, affecting the stability of the hash
algorithm and may prevent the reading of older jimage files.
- The ImageStringsReader.hashCode() should be defined so it can take advantage
of 32 or 64 bit accesses; since it is used frequently. And it needs to be
stable across revisions that may be accessing jimages from different JDK
versions.
Reasonable and will consider. Not sure what you mean by stable across
revisions. This is not string hashCode. This is a hashCode that is specific to
MPHM and is deterministic.
- stringFromMUTF8(byte[]) - there may be some optimization to avoid expanding
ascii to chars and then re-encoding in String.
Will touch base with the Aleksey.
- charsFromByteBufferLength/charsFromByteBuffer should throw an exception if no
terminating null, not just when assertions are enabled
Will throw InternalError instead.
- inconsistent assert errors vs exceptions for malformed mutf8.
Use of assert will not detect errors in production builds
Will throw InternalError instead.
- mutf8FromChars allocates a new working buffer (byte[8]) potentially for every
UNICODE char.
Will hoist the buffer allocation and reuse.
- mutf8FromString: should take advantage of new String(bytes, offset, length,
charsetName) to avoid extra allocations.
(would need a Modified utf-8 charset).
Modified UTF-8 not available.
- hashCode(byte[0, seed) should be private; the hashcode algorithm should not
be variable.
This is not String hashCode. see above.
ImageLocation:
- Merge ImageLocation functions into ImageLocationBase and rename.
ImageLocation does not have enough unique functionality to be a separate
class.
Merged.
- It is a poor design to create an instance that may not be valid.
It would be better to have an ImageLocation factory method that encapsulated
the creation and checking
Noted.
ImageLocationBase:
- Failing asserts will not cause a runtime error in production.
But will degenerate into unpredictable other exceptions
Will throw InternalError instead.
ImageStream:
- compress(): Too heavyweight an object to be used and discarded just for
compress / decompress of small values.
compress()? I don’t see ImageStream used anywhere except for large buffers.
As a note, dynamic ByteBuffers really should be part of the JDK.
ImageBufferCache:
- Comments on each method would make the intent clear
Will add comments.
- getBuffer: mixing for loop and explicit indexes is fragile
- releaseBuffer: check buffer.capacity before threadLocal.get() for early
return
- Logic of i,j is flawed, i is never changed; j is/becomes 0
The first buffer is removed if there are more than max
Its not clear what algorithm was intended.
A buffer that is in use can be removed.
Rewrote using lambdas to clarify the algorithm.
- ImageBufferCache constructor: the isUsed flag should be set in getBuffer()
when it is added to the list (not in the constructor)
It keeps it consistent with when it is set for a re-used buffer.
Modified.
ImageHeader:
- BADMAGIC is not used
Removed.
- readFrom could throw BufferUnderflowException - undocumented
get(i) can IndexOutOfBoundsException [and get() BufferUnderflowException] Will
add explicit check.
ImageModuleData:
- ImageModuleData seems to be unused; allModuleNames() is unreferenced
Removed.
ImageReader:
- ImageReader(path): unused - remove
Removed.
- Use of assert is non-reassuring, ineffective in production builds
Added explicit check.
- In a long running application the per thread buffers will get to the point
wehere they are unused and should be released.
There is no support for soft or weak references to handle that case.
Alternatively, if the buffers were not per thread then they would have a
lesser impact.
The core libs team was concerned about the performance of class loading. The
pattern used here is the one followed by NIO. We will probably come up with a
shared NIO cache at some point.
Also, why should there be more than one buffer per thread?
The image cache is only used by 32 bit systems and decompression. It is
possible for a channel read buffer and a decompression buffer to be allocated
at the same time.
ImageReader.Resource:
- unused create method
Removed.
- unused parent argument (0) in Resource constructor (dir, loc, attr)
Removed.
ImageReader.LinkNode:
- constructor has unused parent arg
Removed
decompress/CompressIndexes:
- readInt: wasteful allocation of a single byte array and then another small
byte array;
and then another heap buffer allocation in the ByteBuffer.
Removed all allocation.
- decompress is hugely wasteful of allocations of ByteBuffers of small sizes.
Removed all allocation.
decompress/CompressedResourceHeader:
- MAGIC does not need to be public
decompress/Decompressor:
- IOException "Plugin name not found" should include the missing name
- StringsProvider is an unnecessary interface; could use Supplier<String>
- Redundant public modifiers in interface declarations
decompress/SignatureParser:
- Comments describing the input and output formats needed
- style around operators does not consistently include spaces; and "if("...
decompress/StringSharingDecompressor:
- Wasteful use of ByteBuffers and small arrays without re-use (safeAdd...)
- StringSharingDecompressor constructor with unused argument - remove
decompress/ZipDecompressor:
- Should be more specific about the zip algorithms supported
since it needs to be stable across releases
- decompress(): use try-with-resources for safer inflation
I think decompression needs significant reworking..
src/java.base/share/native/libjimage/imageDecompressor.cpp
- "// If ptr is not aligned, sparc will fail." implies input argument must be
aligned
- General doubt about using assert; for example to check on malloc allocation
failure
- In production environment will exhibit as segv
Noted.