On 9/19/18, 2:02 PM, Roger Riggs wrote:
Hi Alan,

On 9/19/18 4:14 PM, Alan Bateman wrote:
On 19/09/2018 19:48, Joe Wang wrote:
Hi,

After much discussion and 10 iterations of reviews, this proposal has evolved from what was the original isSameContent method to a mismatch method. API-wise, a compare method was also considered as it looked like just a short step forward from mismatch, however, it was eventually dropped since there is no convincing use case comparing files lexicographically by contents. Impl-wise, extensive performance benchmarking has been done to compare a buffered reading vs memory mapping, the result was that a simple buffered reading performed better among small files, and those with the mismatched byte closer to the beginning of files. Since the proposed method's targeted files are small ones, the impl currently does a buffered reading only.

Please review.

JBS: https://bugs.openjdk.java.net/browse/JDK-8202285

specdiff: http://cr.openjdk.java.net/~joehw/jdk12/8202285/specdiff/java/nio/file/Files.html

webrev: http://cr.openjdk.java.net/~joehw/jdk12/8202285/webrev/
Starting out using buffered I/O is probably okay but I assume we will want to change this in the future to having it use memory mapped I/O beyond a certain threshold.
This came up in off-line discussions, it seems unlikely that two files will differ only in the last of 100Mb and it will require a separate code path that will very infrequently be exercised. So I'd still to a single technique even if it is slightly slower for very large files to keep the size of the code in check.
If it shows up later as a performance problem it can be added.

+1

$.02, Roger


Can you explain the use of toRealPath and comparing the names? That shouldn't be needed. Also the catching of ProviderMismatchExcepiton seems a bit strange too. Can you replace mismatchByAttrs with isSameFile? You could call this from Files.mismatch and then use the supporting implementation for the case that the files are not the same.

The purpose of toRealPath is so that we can do a quick Path comparison by following symlinks if any, I'll add a test to compare a file and its symbolic link. ProviderMismatchExcepiton was added in one of the iterations when we defined UOE, shall be removed in the next update (I'll have an update after some performance work to compare with direct buffer is done).

mismatchByAttrs cannot be replaced with isSameFile, it does one edge case check more than isSameFile when the size of one of the files is zero. mismatchByAttrs opens the files once vs twice if isSameFile and size(path) are called.

Thanks,
Joe


-Alan

Reply via email to