Re: RFR: 8317678: Fix up hashCode() for ZipFile.Source.Key [v4]

Jaikiran Pai Mon, 16 Oct 2023 00:50:43 -0700

On Fri, 13 Oct 2023 12:18:41 GMT, Sean Coffey <coff...@openjdk.org> wrote:


>> Fix up java.util.zip.ZipFile$Source hashCode() impl so that duplicate Source 
>> objects aren't created for the same zip file.
>
> Sean Coffey has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   minor test edits and comment updates

I had a look at the linked JBS and the new jtreg test case in this PR. Is my 
understanding correct that the issue that we are trying to solve here is the 
case where the cache would end up having 2 separate entries for the same jar 
file, if one `JarFile` was constructed using an absolute path and the other 
`JarFile` (for the same underlying file on the filesystem) was constructed 
using a relative path?

The mention of fixing hashCode() makes me wonder if there is/was some other 
issue that was triggered which lead to creation of that JBS issue.

Coming to the proposed implementation in this PR, I see that we first rely on 
`BasicFileAttributes.fileKey()` and if it is available for a given `File` 
instance, we store that `File` instance as a component of the `Key`. Looking at 
the javadoc of `BasicFileAttributes.fileKey()` it states:


* Returns an object that uniquely identifies the given file, or {@code
* null} if a file key is not available. On some platforms or file systems
* it is possible to use an identifier, or a combination of identifiers to
* uniquely identify a file. Such identifiers are important for operations
* such as file tree traversal in file systems that support <a
* href="../package-summary.html#links">symbolic links</a> or file systems
* that allow a file to be an entry in more than one directory. On UNIX file
* systems, for example, the <em>device ID</em> and <em>inode</em> are
* commonly used for such purposes.

It's unclear to me (and I haven't yet read up on inode management in Unix) if 2 
different symbolic links pointing to the same target (jar) file on the 
filesystem would have 2 different/unique `fileKey()`. If they do, then do you 
think that even with the current proposed solution of using `fileKey()`, we 
would still end up with 2 different entries in this cache, each with a 
different `Key`? Would that then bring us back to the original issue?

As for the use of `File.getCanonicalFile()` in the absence of a `fileKey()`, 
there used to be performance considerations with the use of 
`File.getCanonicalFile()` (and `File.getCanonicalPath()`) to the extent that 
there used to be a cache involved. Recent versions of Java have removed that 
cache because it was no longer necessary as detailed in 
https://bugs.openjdk.org/browse/JDK-8301001. I don't have historical knowledge 
behind the motivation of that cache (JDK-8301001 does provide some necessary 
history), however JDK-8301001 also says that one of the reasons why the cache 
is no longer needed is because:

> The work in JDK 9 to remove canonicalization from FilePermission 
> (JDK-8164705) mostly eliminated the need for this caching.

Would introducing this `File.getCanonicalFile()` in this `JarFile` area now 
reintroduce performance considerations (for startup?) on filesystems that don't 
return a `fileKey()`? I see that the implementation of 
`File.getCanonicalFile()` now ends up being a native call.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16115#issuecomment-1763912405

Re: RFR: 8317678: Fix up hashCode() for ZipFile.Source.Key [v4]

Reply via email to