Bug#474947: the state of Bug#474947
clone 474947 -1
reassign -1 release-notes
retitle -1 Update information about apt MMap problem in release notes
thanks

On Tue, Oct 21, 2008 at 12:03:49PM +0200, A Mennucc wrote:
> hi bug, hi people, hi d-release
>
> I did some study on bug 474947, that is grave/RC, and is posted against APT.
>
> Since I was told that the APT team is understaffed, I decided to take
> action myself.

Firstly, thank you for all your hard work on this problem.

> So my conclusion is that the forthcoming release notes do address the
> problem some people may encounter in upgrading from Etch to Lenny.

I agree. This issue isn't RC for lenny.

> I propose the attached patch, though, since it is funny to suggest a
> value of 1250 (bytes) when the internal value in Lenny is 20MB.
> I hope someone in the d-relase team can apply it.

Thanks, I'll clone this and re-assign.

Neil
-- 
* stockholm bangs head against budget
outsch h01ger: it is still very soft, i did not hurt myself stockholm: But you bled on the budget, and now it's red again!
Bug#474947: the state of Bug#474947
Elliott Mitchell wrote:
>> Yes. So, If you claim this have to be fixed before Lenny, go ahead and ask
>> Debian release team what they think about changes in internals of apt and
>> additional month(s) of testing.
>
> I thought that was the point of copy the messages to their list was.

And, nevertheless, there has been no answer yet. I assume you should ask
them directly by mail to receive a 'yes' or 'no' answer.

>> Yes, it will change ABI and API. This will cause recompiling packages that
>> rely on apt against new apt, and would cause breakage of some apt-dependent
>> tools (such as aptitude, perl and python bindings). Another big pain for
>> other developers.
>
> Adding a level of indirection isn't a very big change. Yes, it has
> effects all over the place, but 95% of those are pretty simple (can
> mostly be done with `sed`). The difficult part is change the allocator,
> which I presume is the portion you did?

Rather simple, but in many places. Yes, it's exactly what I did. But this
part may contain bugs too, as I cannot test it without all the redirections
implemented.

>> My conclusion: please not force fixing this bug before Lenny until release
>> team agree to change internals of apt at this stage.
>
> My point with the above is to keep working on it. Even if slight, there
> is a very small chance it might be possible to complete in time.

I hope apt in Squeeze will have significantly fewer bugs than in Lenny. But
apt in Lenny is better than apt in Etch. And, again, this bug is important,
but not so important that it must be fixed before the Lenny release.

> As for combining bug reports, #474947 is distinct from the #380509,
> #413024, #429171, #431410 and #451526. None of those includes a
[some investigations snipped]

I assume this investigation will also be done post-Lenny.

Thanks for the triaging and attention.

-- 
Eugene V. Lyubimkin aka JackYF, Ukrainian C++ developer.
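To make the "level of indirection" in the exchange above a little more concrete, here is a minimal Python sketch. It is not the APT patch and does not use APT's data structures: `GrowableArena`, `alloc`, `read` and `capacity` are hypothetical names, and the sketch assumes only the general shape of the problem, namely a fixed-size buffer that cannot grow because its contents are addressed directly. If callers hold offsets from the start of the buffer instead, the buffer can be replaced by a larger one and the offsets stay valid, at the cost of one extra lookup per access.

```python
class GrowableArena:
    """Byte buffer that grows on demand; callers hold offsets, not views."""

    def __init__(self, initial_size: int) -> None:
        self._buf = bytearray(initial_size)
        self._used = 0

    @property
    def capacity(self) -> int:
        """Current size of the underlying buffer in bytes."""
        return len(self._buf)

    def alloc(self, payload: bytes) -> int:
        """Store payload with a 4-byte length prefix and return its offset."""
        needed = self._used + 4 + len(payload)
        if needed > len(self._buf):
            # Grow by switching to a new, larger buffer. A direct reference
            # into the old buffer would now be stale; an offset is not.
            new_buf = bytearray(max(needed, 2 * len(self._buf)))
            new_buf[: self._used] = self._buf[: self._used]
            self._buf = new_buf
        offset = self._used
        self._buf[offset : offset + 4] = len(payload).to_bytes(4, "little")
        self._buf[offset + 4 : needed] = payload
        self._used = needed
        return offset

    def read(self, offset: int) -> bytes:
        """Resolve an offset against the current buffer (the indirection)."""
        length = int.from_bytes(self._buf[offset : offset + 4], "little")
        return bytes(self._buf[offset + 4 : offset + 4 + length])


if __name__ == "__main__":
    arena = GrowableArena(initial_size=16)
    first = arena.alloc(b"apt")
    arena.alloc(b"aptitude, python-apt")  # does not fit in 16 bytes: grows
    print(arena.capacity, arena.read(first))
```

The "many places" part of the work corresponds to every caller going through something like `read(offset)` instead of holding a direct reference; the allocator part corresponds to the growth branch in `alloc`.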
Bug#474947: the state of Bug#474947
>From: Eugene V. Lyubimkin <[EMAIL PROTECTED]>
> Elliott Mitchell wrote:
> > I have made no such claims. I am merely stating that this is a serious
> > bug. Severe enough to seriously consider delaying the release. This is
> > what the release team gets to decide, which is worse (neither option is
> > good)?
> Yes. So, If you claim this have to be fixed before Lenny, go ahead and ask
> Debian release team what they think about changes in internals of apt and
> additional month(s) of testing.

I thought that was the point of copying the messages to their list.

> > It might be found that fixing it isn't anywhere near as bad
> > as you thought. Even though it changes the API/ABI, if no one has ever
> > touched that field, the impact on other packages will be zero.
> Yes, it will change ABI and API. This will cause recompiling packages that
> rely on apt against new apt, and would cause breakage of some apt-dependent
> tools (such as aptitude, perl and python bindings). Another big pain for
> other developers.

Adding a level of indirection isn't a very big change. Yes, it has
effects all over the place, but 95% of those are pretty simple (can
mostly be done with `sed`). The difficult part is changing the allocator,
which I presume is the portion you did?

> > Perhaps
> > the release team will decide it is worth delaying the release, in which
> > case a head start in testing will be of great value. Perhaps some other
> > issue will force a delay of the release, in which case the extra time
> > might allow sufficient testing.
> Perhaps. And perhaps not.
>
> My conclusion: please not force fixing this bug before Lenny until release
> team agree to change internals of apt at this stage.

My point with the above is to keep working on it. Even if slight, there
is a chance it might be possible to complete in time.

As for combining bug reports, #474947 is distinct from #380509,
#413024, #429171, #431410 and #451526.
None of those includes a segmentation violation. #474947 might get fixed
simply because fixing the little core piece will prevent it from being
tickled, or the rewrite might just squash it; but I still think it is
distinct.

I also think #429173 should be separate from that grouping. Again, this
one might never show up if not for the MMap issue; but this issue is that
locks are left behind on error, not the MMap issue.

On the flip side, #474947 might be the same as #443564. The MMap issue
shows up, followed by a segmentation violation. Similar fixes work, but
that is due to aggravation by the MMap issue; without that bug these might
be clearly distinct.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| [EMAIL PROTECTED] PGP F6B23DE0 |) /
\_CS\ | _ -O #include O- _ | / _/
2477\___\_|_/DC21 03A0 5D61 985B <-PGP-> F2BE 6526 ABD2 F6B2\_|_/___/3DE0

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#474947: the state of Bug#474947
Dropped debian-release from CC.

Elliott Mitchell wrote:
>> - this patch reduces apt speed (not serious though, as I see) on most
>> operations with the cache;
>
> I guess I should ask, do you have an less issues less relevant waiting in
> the wings? While more speed is good, that is worthless if it is badly
> broken.

Yes.

>> - fix requires a big patch (small part of it was written by me, see #474947
>> thread);
>> - this patch have to change internals of apt;
>> - this patch can break apt API and ABI (don't checked);
>
>> - this patch definitely requires thorough review and testing;
>
> Here it is a matter of the weightings. The flipside is without this
> fixed:
>
> - Almost certainly a significant number of users will run into this
> issue during the lifetime of Lenny (the history is 5-10 bugs/year; plus
> an unknown and likely large number of people who do not report it since
> they see it has already been reported and therefore presume work has
> already begun on fixing it).

The release notes already have an item about upgrading APT first and setting
Cache-Limit, if I recall correctly.

> - This complicates debugging, as it escalates otherwise harmless issues
> to major severity (see #400768; while certainly an otherwise unrelated
> bug, if the MMap issue wasn't present, this bug would never have caused
> any problems).

It seems that #400768 was caused by other limits, though I may be wrong.

> - It is quite likely that upon upgrading to a version of Debian after
> Lenny, APT will again break due to this issue and again have to include
> a major warning in the release notes.

Yes :(

> - As far as actual impact of the change we still do not know. Despite
> knowing about the first problem for at least 5 years (#178623 is the
> oldest report I have found), and knowing that it was still very
> definitely an issue for a minimum of nearly 2 years (#400768); we still
> do not have anything but rough guestimates.
> It might be that this is the
> time your estimate is wildly wrong, but we do not know since no patch has
> ever been tried. Why not try this in experimental? Then we would have
> real experience to judge how much work it will take to fix.

Please, leave this until after Lenny. Testing in experimental is definitely
not enough. Yes, it's bad that this bug wasn't touched for a long time.

>> I don't think this would be acceptable by release team.
>
> Too a point I think I can summarize your position as: It is too dangerous
> to fix this in Lenny.

Yes.

> Correspondingly I can summarize my position as: It is too dangerous NOT
> to fix in Lenny.
>
> There is no clearly right answer here. The issue is which will damage
> Debian more; delaying the release, or releasing with another serious
> issue?

Well, an old bug with clear instructions on how to fix it is better than a
bunch of undiscovered new ones which, if they are not discovered before
Lenny, will affect users during the whole Lenny lifetime. You also know that
Debian had its full freeze in July 2008, and the freeze of core components
was several months before that.

>> Elliott, reason for this bug is apt architecture. Do you think we can easily
>> change architecture of the core package at freeze stage?
>
> I have made no such claims. I am merely stating that this is a serious
> bug. Severe enough to seriously consider delaying the release. This is
> what the release team gets to decide, which is worse (neither option is
> good)?

Yes. So, if you claim this has to be fixed before Lenny, go ahead and ask the
Debian release team what they think about changes to the internals of apt
and additional month(s) of testing.

> Yet, since you've got an initial patch, why not put that out in
> experimental?

I have not written the initial patch. I've written a small part of the patch.

> It might be found that fixing it isn't anywhere near as bad
> as you thought.
> Even though it changes the API/ABI, if no one has ever
> touched that field, the impact on other packages will be zero.

Yes, it will change the ABI and API. This will require recompiling packages
that rely on apt against the new apt, and would cause breakage of some
apt-dependent tools (such as aptitude, perl and python bindings). Another
big pain for other developers.

> Perhaps
> the release team will decide it is worth delaying the release, in which
> case a head start in testing will be of great value. Perhaps some other
> issue will force a delay of the release, in which case the extra time
> might allow sufficient testing.

Perhaps. And perhaps not.

My conclusion: please do not force fixing this bug before Lenny until the
release team agrees to change the internals of apt at this stage.

-- 
Eugene V. Lyubimkin aka JackYF, Ukrainian C++ developer.
Bug#474947: the state of Bug#474947
>From: Eugene V. Lyubimkin <[EMAIL PROTECTED]>
> A Mennucc wrote:
> > IMHO one way to decide if to accept a patch during the freeze is to
> > see how large and "important" it is. Does anybody have an example
> > patch, or a description of what code changes would be necessary?
> I had a look on this bug and, thus seems I have. I don't understand why
> Elliott ignored my previous 2 mails about it. So, I am repeating my humble
> look here. Reasons for not touching this bug anymore before Lenny release are:

I ignored nothing. This does not mean I will come to the same conclusion.

> - this patch reduces apt speed (not serious though, as I see) on most
> operations with the cache;

I guess I should ask, do you have any less relevant issues waiting in
the wings? While more speed is good, that is worthless if it is badly
broken.

> - fix requires a big patch (small part of it was written by me, see #474947
> thread);
> - this patch have to change internals of apt;
> - this patch can break apt API and ABI (don't checked);
> - this patch definitely requires thorough review and testing;

Here it is a matter of the weightings. The flipside is without this
fixed:

- Almost certainly a significant number of users will run into this
  issue during the lifetime of Lenny (the history is 5-10 bugs/year; plus
  an unknown and likely large number of people who do not report it since
  they see it has already been reported and therefore presume work has
  already begun on fixing it).

- This complicates debugging, as it escalates otherwise harmless issues
  to major severity (see #400768; while certainly an otherwise unrelated
  bug, if the MMap issue wasn't present, this bug would never have caused
  any problems).

- It is quite likely that upon upgrading to a version of Debian after
  Lenny, APT will again break due to this issue and again have to include
  a major warning in the release notes.

- As far as actual impact of the change we still do not know.
  Despite knowing about the first problem for at least 5 years (#178623 is
  the oldest report I have found), and knowing that it was still very
  definitely an issue for a minimum of nearly 2 years (#400768), we still
  do not have anything but rough guesstimates. It might be that this is the
  time your estimate is wildly wrong, but we do not know since no patch has
  ever been tried. Why not try this in experimental? Then we would have
  real experience to judge how much work it will take to fix.

> I don't think this would be acceptable by release team.

To a point I think I can summarize your position as: It is too dangerous
to fix this in Lenny.

Correspondingly I can summarize my position as: It is too dangerous NOT
to fix in Lenny.

There is no clearly right answer here. The issue is which will damage
Debian more: delaying the release, or releasing with another serious
issue?

> Elliott, reason for this bug is apt architecture. Do you think we can easily
> change architecture of the core package at freeze stage?

I have made no such claims. I am merely stating that this is a serious
bug. Severe enough to seriously consider delaying the release. This is
what the release team gets to decide, which is worse (neither option is
good)?

Yet, since you've got an initial patch, why not put that out in
experimental? It might be found that fixing it isn't anywhere near as bad
as you thought. Even though it changes the API/ABI, if no one has ever
touched that field, the impact on other packages will be zero. Perhaps
the release team will decide it is worth delaying the release, in which
case a head start in testing will be of great value. Perhaps some other
issue will force a delay of the release, in which case the extra time
might allow sufficient testing.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| [EMAIL PROTECTED] PGP F6B23DE0 |) /
\_CS\ | _ -O #include O- _ | / _/
2477\___\_|_/DC21 03A0 5D61 985B <-PGP-> F2BE 6526 ABD2 F6B2\_|_/___/3DE0
Bug#474947: the state of Bug#474947
A Mennucc wrote:
> On Wed, Oct 22, 2008 at 10:09:58PM -0700, Elliott Mitchell wrote:
>> I must therefore suggest that at the very least, the first part of this
>> bug is too severe to allow to continue on to yet another release. Despite
>> the pain now, that it is better to solve this issue and avoid yet more
>> pain down the road.
>
> IMHO one way to decide if to accept a patch during the freeze is to
> see how large and "important" it is. Does anybody have an example
> patch, or a description of what code changes would be necessary?

I had a look at this bug, so it seems I have. I don't understand why
Elliott ignored my previous 2 mails about it. So, I am repeating my humble
view here. Reasons for not touching this bug any more before the Lenny
release are:

- the fix requires a big patch (a small part of it was written by me, see
  the #474947 thread);
- this patch has to change the internals of apt;
- this patch can break the apt API and ABI (not checked);
- this patch reduces apt speed (not seriously though, as far as I can see)
  on most operations with the cache;
- this patch definitely requires thorough review and testing;

I don't think this would be accepted by the release team.

Elliott, the reason for this bug is the apt architecture. Do you think we
can easily change the architecture of a core package at the freeze stage?

-- 
Eugene V. Lyubimkin aka JackYF
Bug#474947: the state of Bug#474947
On Wed, Oct 22, 2008 at 10:09:58PM -0700, Elliott Mitchell wrote:
> I must therefore suggest that at the very least, the first part of this
> bug is too severe to allow to continue on to yet another release. Despite
> the pain now, that it is better to solve this issue and avoid yet more
> pain down the road.

IMHO one way to decide whether to accept a patch during the freeze is to
see how large and "important" it is. Does anybody have an example
patch, or a description of what code changes would be necessary?

a.
Bug#474947: the state of Bug#474947
>From: A Mennucc <[EMAIL PROTECTED]>
> In this same bug there were reported two different issues, a "Dynamic
> MMap error" and a segmentation fault; moreover some people were using
> APT in Etch, and some other in Lenny.

I believe this is a good estimation of how it breaks down.

> The only way to trigger a segmentation fault was instead to set
> Cache-Limit to a ridiculously low value.

Perhaps I was lucky. It is difficult to probe deeper with the first bug
complicating #474947. Bugs #409336 and #443564 may be similar, or perhaps
the exact same issue.

> So my conclusion is that the forthcoming release notes do address the
> problem some people may encounter in upgrading from Etch to Lenny.
>
> I propose the attached patch, though, since it is funny to suggest a
> value of 1250 (bytes) when the internal value in Lenny is 20MB.
> I hope someone in the d-relase team can apply it.

One thing comes to mind here: is this an amount of space reserved or an
amount allocated? If the latter, then smaller (embedded) systems will have
problems.

> Some months go by; in Sept. JackYF offers to help fixing the problem by
> changing the code.
> (Unfortunately we are already in deep freeze, and I am afraid deep
> changes to APT would not be accepted by the release team.)

I must suggest that the release team think very carefully about this. The
earliest manifestation of the former (MMap problem without the core dump)
is bug #178623. For the past 5 years, every release has had this bug
multiple times as an important issue and had to have it documented in
release notes. I'm estimating 25-50 reports, plus a large number of people
who found the workaround via Google or simply looking at the reports and
decided that since it was reported and a workaround was known, a fix must
be under way. This affects a rather large number of people.

Given past history, no matter how high the limit (workaround) is set now, a
significant number of people will encounter it within the lifespan of Lenny.
Worse, when the next version beyond Lenny is produced, upgrades will very
likely trigger this bug yet again and a horde of people will encounter it.

I must therefore suggest that at the very least, the first part of this
bug is too severe to allow to continue on to yet another release. Despite
the pain now, it is better to solve this issue and avoid yet more
pain down the road.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
\BS (| [EMAIL PROTECTED] PGP F6B23DE0 |) /
\_CS\ | _ -O #include O- _ | / _/
2477\___\_|_/DC21 03A0 5D61 985B <-PGP-> F2BE 6526 ABD2 F6B2\_|_/___/3DE0
Bug#474947: the state of Bug#474947
retitle 474947 "fix Dynamic MMaps error"
severity 474947 important
tag 474947 -unreproducible
thanks

hi bug, hi people, hi d-release

I did some study on bug 474947, that is grave/RC, and is posted against APT.

Since I was told that the APT team is understaffed, I decided to take
action myself.

---

First of all, I tried to bring the problem into focus.

In this same bug there were reported two different issues, a "Dynamic
MMap error" and a segmentation fault; moreover some people were using
APT in Etch, and some other in Lenny.

Let's summarize it briefly.

The first poster is Elliott Mitchell, in Apr 08; he is using APT 0.6.46
inside Etch. He claims that he cannot work around a "Dynamic MMap"
error; he then sets severity to grave.

It turns out that he is using the wrong option: he is using
'-o APT::Cache-File=2' instead of '-o APT::Cache-Limit=2'.
(And that confuses "Joe Nahmias" and me as well; the mistake is noted by
JackYF quite a bit later.) So the first report is flawed.

jasen reports a similar problem, but I don't see enough details to comment.

Then Elliott Mitchell again also posts a report about a segmentation fault.

Some months go by; in Sept. JackYF offers to help fixing the problem by
changing the code.
(Unfortunately we are already in deep freeze, and I am afraid deep
changes to APT would not be accepted by the release team.)

I did some more research and tests, and this is what I found out.

The default in APT (inside the code) for Cache-Limit is 20MB in Lenny
and 8MB in Etch.

Note also that in the Lenny release notes
http://svn.debian.org/viewsvn/ddp/manuals/branches/release-notes/lenny/en/upgrading.dbk
it is suggested to set Cache-Limit to 1250 in case of a Dynamic MMap error.

I tested APT 0.6.46 in Etch. I tried with 3 different sources.list files,
see attachment. The "fat" list is the union of all lists of all reporters
(minus obsolete and duplicates).
The two smaller files work perfectly well; the "fat" sources file does
trigger the DynamicMMap problem, which I can work around, though, by using
'apt-get -o APT::Cache-Limit=1 update'

The only way to trigger a segmentation fault was instead to set
Cache-Limit to a ridiculously low value.

So my conclusion is that the forthcoming release notes do address the
problem some people may encounter in upgrading from Etch to Lenny.

I propose the attached patch, though, since it is funny to suggest a
value of 1250 (bytes) when the internal value in Lenny is 20MB.
I hope someone in the d-release team can apply it.

a.

Index: manuals/branches/release-notes/lenny/en/upgrading.dbk
===
--- manuals/branches/release-notes/lenny/en/upgrading.dbk (revisione 5426)
+++ manuals/branches/release-notes/lenny/en/upgrading.dbk (copia locale)
@@ -957,10 +957,11 @@
 to a value that should be sufficient for the upgrade:
-# echo 'APT::Cache-Limit 1250;' >> /etc/apt/apt.conf
+# echo 'APT::Cache-Limit 2100;' >> /etc/apt/apt.conf
-This assumes that you do not yet have this variable set in that file.
+This assumes that you do not yet have this variable set in that file;
+otherwise you may manually edit the file to set the above variable.
 Sometimes it's necessary to enable the APT::Force-LoopBreak

deb http://debian.oregonstate.edu/debian/ stable main contrib non-free
deb http://security.debian.org stable/updates main contrib non-free
deb http://www.debian-multimedia.org stable main
deb http://volatile.debian.org/debian-volatile stable/volatile main contrib non-free
deb http://debian.oregonstate.edu/debian/ testing main contrib non-free
deb-src http://debian.oregonstate.edu/debian/ stable main contrib non-free
deb-src http://security.debian.org stable/updates main contrib non-free
deb-src http://www.debian-multimedia.org stable main
deb-src http://volatile.debian.org/debian-volatile stable/volatile main contrib non-free
deb-src http://debian.oregonstate.edu/debian/ testing main contrib non-free

#-
deb http://ftp.egr.msu.edu/debian/ unstable main contrib
deb-src http://ftp.egr.msu.edu/debian/ unstable main contrib
deb http://ftp.us.debian.org/debian/ unstable main contrib non-free
deb-src http://ftp.us.debian.org/debian/ unstable main contrib non-free
deb http://ftp.us.debian.org/debian/ etch main contrib non-free
deb http://ftp.us.debian.org/debian/ lenny main contrib non-free
deb-src http://mentors.debian.net/debian unstable main contrib

# deb ftp://ftp.nz.debian.org/debian etch main non-free contrib
deb-src http://ftp.nz.debian.org/debian stable main non-free contrib
deb http://ftp.nz.debian.org/debian lenny main contrib non-free
deb http://ftp.nz.debian.org/debian unstable main contrib
deb http://www.debian-multimedia.org etch main
deb ftp://ftp.au.debian.org/debian etch main non-free contrib
deb-src http://ftp.au.debian.org/debian etch main non-free contrib
deb-src http://ftp.nz.debian.org/debian testing main contrib
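
The release-notes command in the patch appends an `APT::Cache-Limit` line with `>>`, which, as the patch text itself says, assumes the variable is not already set in the file. As a rough illustration of the "otherwise edit the file" case, the following Python sketch updates an existing setting in place and appends one only when none is found. The function name `set_cache_limit` is hypothetical, the sketch works on the text of the file rather than on `/etc/apt/apt.conf` itself, and it assumes only the flat one-line form `APT::Cache-Limit <value>;` is in use (a nested `APT { ... };` block is not handled). The value is taken to be a size in bytes, as stated in the message above; the number used below is illustrative, not a recommendation.

```python
import re

# Matches a flat, one-line "APT::Cache-Limit <digits>;" setting, with or
# without double quotes around the value. Nested APT { ... } blocks are
# deliberately not handled in this sketch.
_CACHE_LIMIT_RE = re.compile(
    r'^[ \t]*APT::Cache-Limit[ \t]+"?(\d+)"?[ \t]*;[ \t]*$', re.MULTILINE
)


def set_cache_limit(conf_text: str, limit: int) -> str:
    """Return conf_text with APT::Cache-Limit set to `limit` (in bytes)."""
    if limit <= 0:
        raise ValueError("limit must be a positive number of bytes")
    new_line = f"APT::Cache-Limit {limit};"
    # Rewrite every existing setting so that no stale value remains.
    updated, count = _CACHE_LIMIT_RE.subn(new_line, conf_text)
    if count:
        return updated
    # Nothing to rewrite: append, making sure the previous line is terminated.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


if __name__ == "__main__":
    illustrative_limit = 32 * 1024 * 1024  # 32 MiB, chosen only for the demo
    print(set_cache_limit('APT::Cache-Limit "1250";\n', illustrative_limit))
```

Running the function twice with the same value leaves the text unchanged, which is the property the plain `echo ... >>` approach lacks.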