Bug#669427: apt segfaults on s390x
Hi David, On Sat, May 5, 2012 at 02:16:18 +0200, David Kalnischkies wrote: [Raising severity and therefore blocking transition as depending on dangling pointers isn't a great idea, even if it seems to work for all but not-yet release architectures… Upload next week or so] Any chance this could happen soon? Cheers, Julien signature.asc Description: Digital signature
Bug#669427: apt segfaults on s390x
On Sat, May 05, 2012 at 02:16:18AM +0200, David Kalnischkies wrote: package apt forcemerge 669427 669243 severity 669427 serious tag 669427 + patch thanks On Wed, May 2, 2012 at 5:23 PM, David Kalnischkies kalnischkies+deb...@gmail.com wrote: We are missing a bit of error checking here (callers of NewDescription() do not check if return is != 0 and IsDuplicateDescription doesn't check if the given Description is valid), but both shouldn't be a problem as NewDescription can only really fail if new memory can't be allocated and as each version has at least one description you shouldn't hit a problem in the dup check either. Both wouldn't be limited to s390x either way: We seem to have a similar bugreport from ppc64 (#669243), if i understand right it's also bigendian 64bit, but no other report. Lesson learned today: If you know you have a bug in your code, don't put it on the todolist, just fix it! (or at least the parts which are trivial to fix) The assumption that each version has a description is correct for all but one version: In line 442ff we iterate over all packages with the same name and all versions for these packages to check if we already have a version with this description. The problem: We iterate also over the version we have added just a few lines above which has no description yet as we are in the process of (maybe) adding one for it. Result: The duplication check will use a dangling pointer to a string which should be a md5sum but properly is whatever it wants to be. On the pro side this usually has the intended effect as a random string properly doesn't fit the constraints for an md5sum (yet alone that it matches). Still, i am really fascinated that this worked for months here and everywhere else (expect s390x and ppc64). I would have expected a segfault at least once in a while as this is not done once or twice but for every version, so more like 100.000 times. Amazing. I am going to play in the lottery now, maybe this segfault prevention luck is transitive⦠(properly more like: I wasted all my luck on this one here) [Raising severity and therefore blocking transition as depending on dangling pointers isn't a great idea, even if it seems to work for all but not-yet release architectures⦠Upload next week or so] Thanks to both of you for debugging this and sorry for the inconvenience! I have just tried, and I confirm the patch fixes the issue, at least on s390x. Aurelien -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#669427: apt segfaults on s390x
On Wed, May 02, 2012 at 05:28:16PM +0200, Philipp Kern wrote: On Wed, May 02, 2012 at 05:23:30PM +0200, David Kalnischkies wrote: 2129.4.17  kalnisc |   /* Record the Description (it is not translated) */          |   MD5SumValue CurMd5 = List.Description_md5();          |   if (CurMd5.Value().empty() == true)          |    return true;          |   std::string CurLang = List.DescriptionLanguage();          |          |   /* Before we add a new description we first search in the group for          |    a version with a description of the same MD5 - if so we reuse this          |    description group instead of creating our own for this version */          |   for (pkgCache::PkgIterator P = Grp.PackageList();          |   P.end() == false; P = Grp.NextPkg(P))          |   {          |    for (pkgCache::VerIterator V = P.VersionList();          |    V.end() == false; ++V)          |    {          |   if (IsDuplicateDescription(V.DescriptionList(), CurMd5, ) == false)          |     continue;          |   Ver-DescriptionList = V-DescriptionList;          |   return true;          |    }          |   } When IsDuplicateDescription is called, calling md5() on the V.DescriptionList() points to unallocated memory.  Any idea why that could be? So you mean the Description struct is invalid (V.DescriptionList()) or the V.DescriptionList().md5() char* ? The latter does not point to any initialized memory. We are missing a bit of error checking here (callers of NewDescription() do not check if return is != 0 and IsDuplicateDescription doesn't check if the given Description is valid), but both shouldn't be a problem as NewDescription can only really fail if new memory can't be allocated and as each version has at least one description you shouldn't hit a problem in the dup check either. Both wouldn't be limited to s390x either way: We seem to have a similar bugreport from ppc64 (#669243), if i understand right it's also bigendian 64bit, but no other report. Yes. So would be sparc64. s390x is the only in-archive arch that's 64bit be. The code in pkgcachegen.cc was reworked for multi-arch and especially the dup check is new and the code as such works wih pointer left and right, but non of it should be architecture dependent⦠Somehow i fear that it's more related to our checksum changes. We had way to many problems with sha1 and sha2 to assume md5 would be okay (the code for md5 itself is not new, but the code warping around it). At least the problem was not in constructing the checksum object, but in retrieving the content for it (i.e. md5() in IsDuplicateDescription). You mentioned bisecting? Any insigns which revisions are (not) effected? (bzr has no bisect included by default, and last time i check the plugin was⦠lets say suboptimal for us as we tend to have big merges) Either the plugin was broken or the tree it acted upon was useless to bisect. I don't know. I have been able to bisect the issue, by first converting the bzr repository to git, and then with the usual method. The commit that has broken apt on s390x is the following: commit 22f07fc5e77bcedbc774a4b60d305da847fab287 Author: David Kalnischkies kalnischk...@gmail.com Date: Wed Oct 12 15:47:56 2011 +0200 a version can have only a single md5 for descriptions, so we can optimize the merging with this knowledge a bit and by correctly sharing the lists we only need to have a single description list for possibly many different versions. This also means that description translations are shared between different sources diff --git a/apt-pkg/pkgcachegen.cc b/apt-pkg/pkgcachegen.cc index 3545517..3b2c08e 100644 --- a/apt-pkg/pkgcachegen.cc +++ b/apt-pkg/pkgcachegen.cc @@ -279,33 +279,36 @@ bool pkgCacheGenerator::MergeListPackage(ListParser List, pkgCache::PkgIterator for (Ver = Pkg.VersionList(); Ver.end() == false; ++Ver) { pkgCache::DescIterator Desc = Ver.DescriptionList(); - DynamicpkgCache::DescIterator DynDesc(Desc); - map_ptrloc *LastDesc = Ver-DescriptionList; + + // a version can only have one md5 describing it + if (MD5SumValue(Desc.md5()) != CurMd5) +continue; // don't add a new description if we have one for the given // md5 language if (IsDuplicateDescription(Desc, CurMd5, CurLang) == true) continue;
Bug#669427: apt segfaults on s390x
package apt forcemerge 669427 669243 severity 669427 serious tag 669427 + patch thanks On Wed, May 2, 2012 at 5:23 PM, David Kalnischkies kalnischkies+deb...@gmail.com wrote: We are missing a bit of error checking here (callers of NewDescription() do not check if return is != 0 and IsDuplicateDescription doesn't check if the given Description is valid), but both shouldn't be a problem as NewDescription can only really fail if new memory can't be allocated and as each version has at least one description you shouldn't hit a problem in the dup check either. Both wouldn't be limited to s390x either way: We seem to have a similar bugreport from ppc64 (#669243), if i understand right it's also bigendian 64bit, but no other report. Lesson learned today: If you know you have a bug in your code, don't put it on the todolist, just fix it! (or at least the parts which are trivial to fix) The assumption that each version has a description is correct for all but one version: In line 442ff we iterate over all packages with the same name and all versions for these packages to check if we already have a version with this description. The problem: We iterate also over the version we have added just a few lines above which has no description yet as we are in the process of (maybe) adding one for it. Result: The duplication check will use a dangling pointer to a string which should be a md5sum but properly is whatever it wants to be. On the pro side this usually has the intended effect as a random string properly doesn't fit the constraints for an md5sum (yet alone that it matches). Still, i am really fascinated that this worked for months here and everywhere else (expect s390x and ppc64). I would have expected a segfault at least once in a while as this is not done once or twice but for every version, so more like 100.000 times. Amazing. I am going to play in the lottery now, maybe this segfault prevention luck is transitive… (properly more like: I wasted all my luck on this one here) [Raising severity and therefore blocking transition as depending on dangling pointers isn't a great idea, even if it seems to work for all but not-yet release architectures… Upload next week or so] Thanks to both of you for debugging this and sorry for the inconvenience! Best regards David Kalnischkies apt-669427-dangling-description-duplication-check-segfaults.diff Description: Binary data
Bug#669427: apt segfaults on s390x
On Tue, May 1, 2012 at 4:53 PM, Philipp Kern pk...@debian.org wrote: On Fri, Apr 20, 2012 at 11:26:57PM +0200, Philipp Kern wrote: On Fri, Apr 20, 2012 at 10:04:28PM +0200, Philipp Kern wrote: On Thu, Apr 19, 2012 at 09:12:09PM +0200, Mehdi Dogguy wrote: On 19/04/12 20:21, Faidon Liambotis wrote: Package: apt Version: 0.9.1 Severity: serious apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Does it also segfault on s390? (s390x is not a release arch yet so it doesn't warrant an RC severity, unless the maintainer thinks so). It breaks many package builds on s390x and it's broken on an architecture where it built before. And wrt s390x I can pull a Hurd: The current progress is however encouraging. It's in the construction of the MD5SumValue. But I'm not prepared to curse at bzr for the remainder of the evening, about a bisect plugin that frankly doesn't do what it's supposed to do. 2129.4.17 kalnisc | /* Record the Description (it is not translated) */ | MD5SumValue CurMd5 = List.Description_md5(); | if (CurMd5.Value().empty() == true) | return true; | std::string CurLang = List.DescriptionLanguage(); | | /* Before we add a new description we first search in the group for | a version with a description of the same MD5 - if so we reuse this | description group instead of creating our own for this version */ | for (pkgCache::PkgIterator P = Grp.PackageList(); | P.end() == false; P = Grp.NextPkg(P)) | { | for (pkgCache::VerIterator V = P.VersionList(); | V.end() == false; ++V) | { | if (IsDuplicateDescription(V.DescriptionList(), CurMd5, ) == false) | continue; | Ver-DescriptionList = V-DescriptionList; | return true; | } | } When IsDuplicateDescription is called, calling md5() on the V.DescriptionList() points to unallocated memory. Any idea why that could be? So you mean the Description struct is invalid (V.DescriptionList()) or the V.DescriptionList().md5() char* ? We are missing a bit of error checking here (callers of NewDescription() do not check if return is != 0 and IsDuplicateDescription doesn't check if the given Description is valid), but both shouldn't be a problem as NewDescription can only really fail if new memory can't be allocated and as each version has at least one description you shouldn't hit a problem in the dup check either. Both wouldn't be limited to s390x either way: We seem to have a similar bugreport from ppc64 (#669243), if i understand right it's also bigendian 64bit, but no other report. The code in pkgcachegen.cc was reworked for multi-arch and especially the dup check is new and the code as such works wih pointer left and right, but non of it should be architecture dependent… Somehow i fear that it's more related to our checksum changes. We had way to many problems with sha1 and sha2 to assume md5 would be okay (the code for md5 itself is not new, but the code warping around it). You mentioned bisecting? Any insigns which revisions are (not) effected? (bzr has no bisect included by default, and last time i check the plugin was… lets say suboptimal for us as we tend to have big merges) Best regards David Kalnischkies -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#669427: apt segfaults on s390x
On Wed, May 02, 2012 at 05:23:30PM +0200, David Kalnischkies wrote: 2129.4.17 kalnisc | /* Record the Description (it is not translated) */ | MD5SumValue CurMd5 = List.Description_md5(); | if (CurMd5.Value().empty() == true) | return true; | std::string CurLang = List.DescriptionLanguage(); | | /* Before we add a new description we first search in the group for | a version with a description of the same MD5 - if so we reuse this | description group instead of creating our own for this version */ | for (pkgCache::PkgIterator P = Grp.PackageList(); | P.end() == false; P = Grp.NextPkg(P)) | { | for (pkgCache::VerIterator V = P.VersionList(); | V.end() == false; ++V) | { | if (IsDuplicateDescription(V.DescriptionList(), CurMd5, ) == false) | continue; | Ver-DescriptionList = V-DescriptionList; | return true; | } | } When IsDuplicateDescription is called, calling md5() on the V.DescriptionList() points to unallocated memory. Any idea why that could be? So you mean the Description struct is invalid (V.DescriptionList()) or the V.DescriptionList().md5() char* ? The latter does not point to any initialized memory. We are missing a bit of error checking here (callers of NewDescription() do not check if return is != 0 and IsDuplicateDescription doesn't check if the given Description is valid), but both shouldn't be a problem as NewDescription can only really fail if new memory can't be allocated and as each version has at least one description you shouldn't hit a problem in the dup check either. Both wouldn't be limited to s390x either way: We seem to have a similar bugreport from ppc64 (#669243), if i understand right it's also bigendian 64bit, but no other report. Yes. So would be sparc64. s390x is the only in-archive arch that's 64bit be. The code in pkgcachegen.cc was reworked for multi-arch and especially the dup check is new and the code as such works wih pointer left and right, but non of it should be architecture dependent… Somehow i fear that it's more related to our checksum changes. We had way to many problems with sha1 and sha2 to assume md5 would be okay (the code for md5 itself is not new, but the code warping around it). At least the problem was not in constructing the checksum object, but in retrieving the content for it (i.e. md5() in IsDuplicateDescription). You mentioned bisecting? Any insigns which revisions are (not) effected? (bzr has no bisect included by default, and last time i check the plugin was… lets say suboptimal for us as we tend to have big merges) Either the plugin was broken or the tree it acted upon was useless to bisect. I don't know. Kind regards Philipp Kern signature.asc Description: Digital signature
Bug#669427: apt segfaults on s390x
On Fri, Apr 20, 2012 at 11:26:57PM +0200, Philipp Kern wrote: On Fri, Apr 20, 2012 at 10:04:28PM +0200, Philipp Kern wrote: On Thu, Apr 19, 2012 at 09:12:09PM +0200, Mehdi Dogguy wrote: On 19/04/12 20:21, Faidon Liambotis wrote: Package: apt Version: 0.9.1 Severity: serious apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Does it also segfault on s390? (s390x is not a release arch yet so it doesn't warrant an RC severity, unless the maintainer thinks so). It breaks many package builds on s390x and it's broken on an architecture where it built before. And wrt s390x I can pull a Hurd: The current progress is however encouraging. It's in the construction of the MD5SumValue. But I'm not prepared to curse at bzr for the remainder of the evening, about a bisect plugin that frankly doesn't do what it's supposed to do. 2129.4.17 kalnisc |/* Record the Description (it is not translated) */ |MD5SumValue CurMd5 = List.Description_md5(); |if (CurMd5.Value().empty() == true) | return true; |std::string CurLang = List.DescriptionLanguage(); | |/* Before we add a new description we first search in the group for | a version with a description of the same MD5 - if so we reuse this | description group instead of creating our own for this version */ |for (pkgCache::PkgIterator P = Grp.PackageList(); |P.end() == false; P = Grp.NextPkg(P)) |{ | for (pkgCache::VerIterator V = P.VersionList(); | V.end() == false; ++V) | { | if (IsDuplicateDescription(V.DescriptionList(), CurMd5, ) == false) |continue; | Ver-DescriptionList = V-DescriptionList; | return true; | } |} When IsDuplicateDescription is called, calling md5() on the V.DescriptionList() points to unallocated memory. Any idea why that could be? Kind regards Philipp Kern signature.asc Description: Digital signature
Bug#669427: apt segfaults on s390x
On Thu, Apr 19, 2012 at 09:12:09PM +0200, Mehdi Dogguy wrote: On 19/04/12 20:21, Faidon Liambotis wrote: Package: apt Version: 0.9.1 Severity: serious apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Does it also segfault on s390? (s390x is not a release arch yet so it doesn't warrant an RC severity, unless the maintainer thinks so). It breaks many package builds on s390x and it's broken on an architecture where it built before. And wrt s390x I can pull a Hurd: The current progress is however encouraging. Kind regards Philipp Kern signature.asc Description: Digital signature
Bug#669427: apt segfaults on s390x
On Fri, Apr 20, 2012 at 10:04:28PM +0200, Philipp Kern wrote: On Thu, Apr 19, 2012 at 09:12:09PM +0200, Mehdi Dogguy wrote: On 19/04/12 20:21, Faidon Liambotis wrote: Package: apt Version: 0.9.1 Severity: serious apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Does it also segfault on s390? (s390x is not a release arch yet so it doesn't warrant an RC severity, unless the maintainer thinks so). It breaks many package builds on s390x and it's broken on an architecture where it built before. And wrt s390x I can pull a Hurd: The current progress is however encouraging. It's in the construction of the MD5SumValue. But I'm not prepared to curse at bzr for the remainder of the evening, about a bisect plugin that frankly doesn't do what it's supposed to do. Kind regards Philipp Kern signature.asc Description: Digital signature
Bug#669427: apt segfaults on s390x
Package: apt Version: 0.9.1 Severity: serious apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Backtrace follows, even though it probably isn't that helpful, since apt doesn't have a debugging symbols package from what I could see. For your convienience, I went on and installed apt's build-deps on sid_s390x so that you rebuild and debug. Let me know if you need anything else. Thanks, Faidon #0 0x0229f386 in std::basic_stringchar, std::char_traitschar, std::allocatorchar ::basic_string(char const*, std::allocatorchar const) () from /usr/lib/s390x-linux-gnu/libstdc++.so.6 #1 0x020fa17a in IsDuplicateDescription(pkgCache::DescIterator, HashSumValue128 const, std::string const) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #2 0x020ff8b4 in pkgCacheGenerator::MergeListVersion(pkgCacheGenerator::ListParser, pkgCache::PkgIterator, std::string const, pkgCache::VerIterator*) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #3 0x02100034 in pkgCacheGenerator::MergeList(pkgCacheGenerator::ListParser, pkgCache::VerIterator*) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #4 0x021664ec in debPackagesIndex::Merge(pkgCacheGenerator, OpProgress*) const () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #5 0x020f7390 in ?? () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #6 0x020fae6a in pkgCacheGenerator::MakeStatusCache(pkgSourceList, OpProgress*, MMap**, bool) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #7 0x020ee74e in pkgCacheFile::BuildCaches(OpProgress*, bool) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #8 0x020eec02 in pkgCacheFile::Open(OpProgress*, bool) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #9 0x8001f03e in ?? () #10 0x02096b22 in CommandLine::DispatchArg(CommandLine::Dispatch*, bool) () from /usr/lib/s390x-linux-gnu/libapt-pkg.so.4.12 #11 0x8000cc4e in ?? () #12 0x023bea40 in __libc_start_main (main=optimized out, argc=optimized out, ubp_av=optimized out, init=0x80029440, fini=0x8002943c, rtld_fini=0x2010778, stack_end=0x3da2630) at libc-start.c:228 #13 0x8000cf82 in ?? () -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#669427: apt segfaults on s390x
On 19/04/12 20:21, Faidon Liambotis wrote: Package: apt Version: 0.9.1 Severity: serious apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Does it also segfault on s390? (s390x is not a release arch yet so it doesn't warrant an RC severity, unless the maintainer thinks so). Regards, -- Mehdi -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#669427: apt segfaults on s390x
severity 669427 important thanks On Thu, Apr 19, 2012 at 09:12:09PM +0200, Mehdi Dogguy wrote: apt 0.9.1 currently segfaults on the zelenka (our s390/s390x porterbox) sid_s390x chroot. Downgrading apt to 0.8.15.10 makes it work again. Does it also segfault on s390? (s390x is not a release arch yet so it doesn't warrant an RC severity, unless the maintainer thinks so). Good point, didn't think much about that at the time, sorry. It seems to work on s390, downgrading. Thanks, Faidon -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org