* Fay Stegerman <f...@obfusk.net> [2024-04-11 04:28]:
> * Holger Levsen <hol...@layer-acht.org> [2024-04-11 02:14]:
> > > unzip does seem to extract all the files, though it errors out.  Not sure 
> > > what
> > > diffoscope should do here.  This is definitely a broken ZIP file.  That 
> > > bug
> > > should probably be reported against libscout or whatever tooling it used 
> > > to
> > > create that JAR.
> > 
> > I agree it's more complicated, but fundamentally, diffoscope should *not* 
> > crash
> > here! (but rather report the broken zip file.)
> 
> I think we all agree it shouldn't crash :)
> 
> What I meant is that I'm not sure it should simply catch the error, report the
> file as broken, and not attempt extraction, or if it makes sense to attempt to
> work around this issue, at least in cases like this specific one where the
> entries are exact duplicates and the files can presumably be safely extracted.
> I think my workaround (which could be implemented slightly differently as 
> well,
> without modifying the ZipFile, but processing it differently in diffoscope)
> would accomplish that for this JAR at least.  I could make an MR for that.
> Though as I said I will also report this upstream to cpython, probably 
> tomorrow.
> 
> - Fay

The attached patch avoids the crash in this case, FWIW.  I would still recommend
catching the error for other cases.

- Fay
diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py
index 2a27042a..4bfb1527 100644
--- a/diffoscope/comparators/zip.py
+++ b/diffoscope/comparators/zip.py
@@ -182,7 +182,12 @@ class ZipDirectory(Directory, ArchiveMember):
 
 class ZipContainer(Archive):
     def open_archive(self):
-        return zipfile.ZipFile(self.source.path, "r")
+        zf = zipfile.ZipFile(self.source.path, "r")
+        self.name_to_info = {}
+        for info in zf.infolist():
+            if info.filename not in self.name_to_info:
+                self.name_to_info[info.filename] = info
+        return zf
 
     def close_archive(self):
         self.archive.close()
@@ -199,7 +204,8 @@ class ZipContainer(Archive):
         ).encode(sys.getfilesystemencoding(), errors="replace")
 
         try:
-            with self.archive.open(member_name) as source, open(
+            info = self.name_to_info[member_name]
+            with self.archive.open(info) as source, open(
                 targetpath, "wb"
             ) as target:
                 shutil.copyfileobj(source, target)

Reply via email to