* Fay Stegerman <f...@obfusk.net> [2024-04-11 04:28]: > * Holger Levsen <hol...@layer-acht.org> [2024-04-11 02:14]: > > > unzip does seem to extract all the files, though it errors out. Not sure > > > what > > > diffoscope should do here. This is definitely a broken ZIP file. That > > > bug > > > should probably be reported against libscout or whatever tooling it used > > > to > > > create that JAR. > > > > I agree it's more complicated, but fundamentally, diffoscope should *not* > > crash > > here! (but rather report the broken zip file.) > > I think we all agree it shouldn't crash :) > > What I meant is that I'm not sure it should simply catch the error, report the > file as broken, and not attempt extraction, or if it makes sense to attempt to > work around this issue, at least in cases like this specific one where the > entries are exact duplicates and the files can presumably be safely extracted. > I think my workaround (which could be implemented slightly differently as > well, > without modifying the ZipFile, but processing it differently in diffoscope) > would accomplish that for this JAR at least. I could make an MR for that. > Though as I said I will also report this upstream to cpython, probably > tomorrow. > > - Fay
The attached patch avoids the crash in this case, FWIW. I would still recommend catching the error for other cases. - Fay
diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py index 2a27042a..4bfb1527 100644 --- a/diffoscope/comparators/zip.py +++ b/diffoscope/comparators/zip.py @@ -182,7 +182,12 @@ class ZipDirectory(Directory, ArchiveMember): class ZipContainer(Archive): def open_archive(self): - return zipfile.ZipFile(self.source.path, "r") + zf = zipfile.ZipFile(self.source.path, "r") + self.name_to_info = {} + for info in zf.infolist(): + if info.filename not in self.name_to_info: + self.name_to_info[info.filename] = info + return zf def close_archive(self): self.archive.close() @@ -199,7 +204,8 @@ class ZipContainer(Archive): ).encode(sys.getfilesystemencoding(), errors="replace") try: - with self.archive.open(member_name) as source, open( + info = self.name_to_info[member_name] + with self.archive.open(info) as source, open( targetpath, "wb" ) as target: shutil.copyfileobj(source, target)