On 9/19/18, 2:02 PM, Roger Riggs wrote:
Hi Alan,
On 9/19/18 4:14 PM, Alan Bateman wrote:
On 19/09/2018 19:48, Joe Wang wrote:
Hi,
After much discussion and 10 iterations of reviews, this proposal
has evolved from what was the original isSameContent method to a
mismatch method. API-wise, a compare method was also considered as
it looked like just a short step forward from mismatch, however, it
was eventually dropped since there is no convincing use case
comparing files lexicographically by contents. Impl-wise, extensive
performance benchmarking has been done to compare a buffered reading
vs memory mapping, the result was that a simple buffered reading
performed better among small files, and those with the mismatched
byte closer to the beginning of files. Since the proposed method's
targeted files are small ones, the impl currently does a buffered
reading only.
Please review.
JBS: https://bugs.openjdk.java.net/browse/JDK-8202285
specdiff:
http://cr.openjdk.java.net/~joehw/jdk12/8202285/specdiff/java/nio/file/Files.html
webrev: http://cr.openjdk.java.net/~joehw/jdk12/8202285/webrev/
Starting out using buffered I/O is probably okay but I assume we will
want to change this in the future to having it use memory mapped I/O
beyond a certain threshold.
This came up in off-line discussions, it seems unlikely that two files
will differ only in the last of 100Mb
and it will require a separate code path that will very infrequently
be exercised. So I'd still to a single
technique even if it is slightly slower for very large files to keep
the size of the code in check.
If it shows up later as a performance problem it can be added.
+1
$.02, Roger
Can you explain the use of toRealPath and comparing the names? That
shouldn't be needed. Also the catching of ProviderMismatchExcepiton
seems a bit strange too. Can you replace mismatchByAttrs with
isSameFile? You could call this from Files.mismatch and then use the
supporting implementation for the case that the files are not the same.
The purpose of toRealPath is so that we can do a quick Path comparison
by following symlinks if any, I'll add a test to compare a file and its
symbolic link. ProviderMismatchExcepiton was added in one of the
iterations when we defined UOE, shall be removed in the next update
(I'll have an update after some performance work to compare with direct
buffer is done).
mismatchByAttrs cannot be replaced with isSameFile, it does one edge
case check more than isSameFile when the size of one of the files is
zero. mismatchByAttrs opens the files once vs twice if isSameFile and
size(path) are called.
Thanks,
Joe
-Alan