Bug#1060251: Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-12 Thread Emanuele Rocca
Hi Julian!

On 2024-01-12 01:47, Julian Andres Klode wrote:
> Either way, these are harmless

I'm not saying they're harmful, what I'm saying is:

1) the errors you see on armhf when building apt without
   stack-clash-protection are the same valgrind reports on amd64 as
   well. Hence, you could consider rebuilding apt without
   stack-clash-protection on armhf and dropping the armhf conditional in
   test/integration/test-apt-update-simple

2) if you run valgrind in CI, I think you should probably use
   --error-exitcode. Otherwise, what's the point of using valgrind at
   all, if you don't fail when it reports errors? :-)
   This is not directly relevant to #1060251 of course, I'm happy to
   file a separate bug against apt if you find that useful.



Bug#1060251: Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-12 Thread Julian Andres Klode
On Thu, Jan 11, 2024 at 10:18:58PM +0100, Emanuele Rocca wrote:
> Hi Julian,
> 
> On 2024-01-11 05:46, Julian Andres Klode wrote:
> > And there aren't any hard errors. We could zero initialize
> > those or add supressions to make things look nicer I suppose.
> 
> Mmmh no, they are all actual errors as far as valgrind is concerned.
> 
> The thing is, you're running valgrind without --error-exitcode. By doing
> so, the exit code of your tests is the exit code of apt-get, not of
> valgrind.
> 
> Try this instead:
> 
> (sid-amd64)root@ariel:~# valgrind --error-exitcode=1 apt-get update
> [...]
> ==308534== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)
> (sid-amd64)root@ariel:~# echo $?
> 1

That's all just strawman arguments.

As said before, these aren't the issue. These all occur in locations
where we read the entire memory arena of our allocator, either for
hashing or writing it to a file.

While they _should_ be zero-initialized if you look at the code, we
actually initialize things with a header object. The problem is the
header object looks like:

uint32_t magic;
uint16_t major;
uint16_t minor;
bool dirty;
uint16_t boo;

And the padding bytes are never initialized but will inevitably
be read by the hashing and write back of the entire arena.

We can fix these errors by either zero initializing the padding in
the object constructor: 


diff --git a/apt-pkg/pkgcache.cc b/apt-pkg/pkgcache.cc
index 76336b9b3..b845cb13c 100644
--- a/apt-pkg/pkgcache.cc
+++ b/apt-pkg/pkgcache.cc
@@ -52,6 +52,7 @@ using APT::StringView;
 /* Simply initialize the header */
 pkgCache::Header::Header()
 {
+   memset(this, 0, sizeof(*this));
 #define APT_HEADER_SET(X,Y) X = Y; 
static_assert(std::numeric_limits::max() > Y, "Size violation 
detected in pkgCache::Header")
APT_HEADER_SET(Signature, 0x98FE76DC);
 


or by not creating a header on the stack (with uninitialized padding) and then 
copying
it in, by using placement new operator:


diff --git a/apt-pkg/pkgcachegen.cc b/apt-pkg/pkgcachegen.cc
index 9e47ef369..3a85a9585 100644
--- a/apt-pkg/pkgcachegen.cc
+++ b/apt-pkg/pkgcachegen.cc
@@ -89,7 +89,7 @@ bool pkgCacheGenerator::Start()
   
Map.UsePools(*Cache.HeaderP->Pools,sizeof(Cache.HeaderP->Pools)/sizeof(Cache.HeaderP->Pools[0]));
 
   // Starting header
-  *Cache.HeaderP = pkgCache::Header();
+  new (Cache.HeaderP) pkgCache::Header();
 
   // make room for the hashtables for packages and groups
   if (Map.RawAllocate(2 * (Cache.HeaderP->GetHashTableSize() * 
sizeof(map_pointer))) == 0)


Either way, these are harmless
-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en



Bug#1060251: Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-11 Thread Emanuele Rocca
Hi Julian,

On 2024-01-11 05:46, Julian Andres Klode wrote:
> And there aren't any hard errors. We could zero initialize
> those or add supressions to make things look nicer I suppose.

Mmmh no, they are all actual errors as far as valgrind is concerned.

The thing is, you're running valgrind without --error-exitcode. By doing
so, the exit code of your tests is the exit code of apt-get, not of
valgrind.

Try this instead:

(sid-amd64)root@ariel:~# valgrind --error-exitcode=1 apt-get update
[...]
==308534== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)
(sid-amd64)root@ariel:~# echo $?
1



Bug#1060251: Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-11 Thread Julian Andres Klode
On Thu, Jan 11, 2024 at 03:53:17PM +0100, Emanuele Rocca wrote:
> Hi Julian,
> 
> On 2024-01-08 10:28, Julian Andres Klode wrote:
> > (in Ubuntu we have partially recovered by disabling stack clash
> > protection but it crashes on invalid writes there, I suppose we need
> > to rebuild some more apt dependencies without the flag...).
> 
> The 'invalid writes' issue seems unrelated to armhf and 
> stack-clash-protection,
> I can reproduce it on my x86 workstation. It would be interesting to see if
> once these problems are fixed valgrind on armhf still segfaults.
> 
> (sid-amd64)root@ariel:~# valgrind apt-get update
> ==194196== Memcheck, a memory error detector
> ==194196== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==194196== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
> ==194196== Command: apt-get update
> ==194196== 
> Hit:1 http://127.0.0.1:3142/debian sid InRelease
> ==194196== Conditional jump or move depends on uninitialised value(s)
> ==194196==at 0x4A89B3B: pkgCache::ReMap(bool const&) (in 
> /usr/lib/x86_64-linux-gnu/libapt-pkg.so.6.0.0)
> [... more errors follow]

The uninitialized values in ReMap are actually normal and
correct behavior, not errors. It happens because we need to
grow the array/map without having written all bytes of it first.

The same applies to uninitalized bytes passed to write from
pkgCacheFile::BuildCaches(), it's writing the partially
initialized memory pool to the file.

And there aren't any hard errors. We could zero initialize
those or add supressions to make things look nicer I suppose.

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en



Bug#1060251: Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-11 Thread Emanuele Rocca
Hi Julian,

On 2024-01-08 10:28, Julian Andres Klode wrote:
> (in Ubuntu we have partially recovered by disabling stack clash
> protection but it crashes on invalid writes there, I suppose we need
> to rebuild some more apt dependencies without the flag...).

The 'invalid writes' issue seems unrelated to armhf and stack-clash-protection,
I can reproduce it on my x86 workstation. It would be interesting to see if
once these problems are fixed valgrind on armhf still segfaults.

(sid-amd64)root@ariel:~# valgrind apt-get update
==194196== Memcheck, a memory error detector
==194196== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==194196== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==194196== Command: apt-get update
==194196== 
Hit:1 http://127.0.0.1:3142/debian sid InRelease
==194196== Conditional jump or move depends on uninitialised value(s)
==194196==at 0x4A89B3B: pkgCache::ReMap(bool const&) (in 
/usr/lib/x86_64-linux-gnu/libapt-pkg.so.6.0.0)
[... more errors follow]



Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-08 Thread Emanuele Rocca
Control: retitle -1 valgrind: Access not within mapped region on armhf

Hello Paul and Julian,

On 2023-12-24 07:31, Paul Gevers wrote:
> On 23-12-2023 20:40, Julian Andres Klode wrote:
> > the logs for well known bugs so that you don't end up filing bugs
> > against every package broken by valgrind's missing support for the new
> > NOP sequences in binutils.
> 
> I do *some* minor triaging, but I didn't know this to be a common problem.

Note that there are three distinct valgrind problems mentioned here:

- on armhf, flagging accesses below the stack pointer as illegal. This
  came up after enabling stack-clash-protection on armhf. I added a
  suppression for this problem in valgrind 1:3.20.0-2, so we shouldn't
  be seeing those anymore. For reference:
  https://lists.debian.org/debian-arm/2023/12/msg7.html

- on i386, valgrind was failing due to new nop patterns that it didn't
  know about. This is https://bugs.debian.org/1057693 and was fixed by
  Aurelien in 1:3.20.0-2.1

- on armhf, a segfault when testing apt. I don't think this issue is
  widespread at all: in fact I'm not aware of any other instances except
  for apt. After going through all armhf CI failures caused by segfaults, apt
  seems to be the only one with valgrind in the picture. Just to be
  clear: I'm not saying that the problem is trivial, just that AFAIK it's
  not widespread. I'm including the relevant CI logs below for
  reference. Also I'm attaching the horrible script I've written to fetch
  all armhf CI failures in case anyone wants to poke around.

722s ==221367== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
722s ==221367== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright 
info
722s ==221367== Command: /usr/bin/apt-get update -o Acquire::Queue-Mode=access
722s ==221367== 
722s ==221367== 
722s ==221367== Process terminating with default action of signal 11 (SIGSEGV)
722s ==221367==  Access not within mapped region at address 0xFE7F2A9C
722s ==221367==at 0x4938838: ReadMessages(int, 
std::vector, 
std::allocator >, std::allocator, std::allocator > > >&) (in 
/usr/lib/arm-linux-gnueabihf/libapt-pkg.so.6.0.0)
722s ==221367==  If you believe this happened as a result of a stack
722s ==221367==  overflow in your program's main thread (unlikely but
722s ==221367==  possible), you can try to increase the size of the
722s ==221367==  main thread stack using the --main-stacksize= flag.
722s ==221367==  The main thread stack size used in this run was 8388608.
722s ==221367== 
722s ==221367== HEAP SUMMARY:
722s ==221367== in use at exit: 119,915 bytes in 1,882 blocks
722s ==221367==   total heap usage: 11,673 allocs, 9,791 frees, 868,849 bytes 
allocated
722s ==221367== 
722s ==221367== LEAK SUMMARY:
722s ==221367==definitely lost: 0 bytes in 0 blocks
722s ==221367==indirectly lost: 0 bytes in 0 blocks
722s ==221367==  possibly lost: 0 bytes in 0 blocks
722s ==221367==still reachable: 119,915 bytes in 1,882 blocks
722s ==221367== suppressed: 0 bytes in 0 blocks
722s ==221367== Rerun with --leak-check=full to see details of leaked memory
722s ==221367== 
722s ==221367== For lists of detected and suppressed errors, rerun with: -s
722s ==221367== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 42821 from 
965)
722s Segmentation fault
722s FAIL: exitcode 139


armhf-ci-segfaults.sh
Description: Bourne shell script


Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2024-01-08 Thread Julian Andres Klode
Control: clone -1 -2
Control: reassign -2 valgrind
Control: retitle -2 valgrind: armhf is broken

On Sat, Dec 23, 2023 at 08:46:27AM +0100, Paul Gevers wrote:
> Source: apt
> Version: 2.7.6
> Severity: serious
> Control: close -1 2.7.7
> Tags: sid trixie
> User: release.debian@packages.debian.org
> Usertags: out-of-sync
> 
> Dear maintainer(s),
> 
> The Release Team considers packages that are out-of-sync between testing and
> unstable for more than 30 days as having a Release Critical bug in testing
> [1]. Your package src:apt has been trying to migrate for 31 days [2]. Hence,
> I am filing this bug. The version in unstable fails its own autopkgtest on
> armhf.
> 
> If a package is out of sync between unstable and testing for a longer
> period, this usually means that bugs in the package in testing cannot be
> fixed via unstable. Additionally, blocked packages can have impact on other
> packages, which makes preparing for the release more difficult. Finally, it
> often exposes issues with the package and/or
> its (reverse-)dependencies. We expect maintainers to fix issues that hamper
> the migration of their package in a timely manner.
> 
> This bug will trigger auto-removal when appropriate. As with all new bugs,
> there will be at least 30 days before the package is auto-removed.
> 
> I have immediately closed this bug with the version in unstable, so if that
> version or a later version migrates, this bug will no longer affect testing.
> I have also tagged this bug to only affect sid and trixie, so it doesn't
> affect (old-)stable.
> 
> If you believe your package is unable to migrate to testing due to issues
> beyond your control, don't hesitate to contact the Release Team.
> 
> Paul
> 
> [1] https://lists.debian.org/debian-devel-announce/2023/06/msg1.html
> [2] https://qa.debian.org/excuses.php?package=apt
> 

This is a release critical bug in valgrind. valgrind does not understand
the new toolchain defaults like stack clash protection on armhf and just
segfaults (in Ubuntu we have partially recovered by disabling stack
clash protection but it crashes on invalid writes there, I suppose we
need to rebuild some more apt dependencies without the flag...).

As there doesn't appear to be any progress on the valgrind or toolchain
sides to fix these issues, I'll go and disable valgrind on armhf in the
next apt upload, but I am cloning this to valgrind to make it clear that
it's horribly borked.

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en


signature.asc
Description: PGP signature


Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2023-12-23 Thread Paul Gevers

Hi Julian,

On 23-12-2023 20:40, Julian Andres Klode wrote:

I know this is automated but I feel lije it would be sensible to scan


It's *mostly* automated.


the logs for well known bugs so that you don't end up filing bugs
against every package broken by valgrind's missing support for the new
NOP sequences in binutils.


I do *some* minor triaging, but I didn't know this to be a common 
problem. Where do you think I should have picked this up?


Anyways, this bug is already closed (when I submitted) and because apt 
is a key package, it doesn't even do anything against apt, so the bug 
serves mostly for tracking purposes.


Paul


OpenPGP_signature.asc
Description: OpenPGP digital signature


Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2023-12-23 Thread Julian Andres Klode
On Sat, Dec 23, 2023 at 08:46:27AM +0100, Paul Gevers wrote:
> Source: apt
> Version: 2.7.6
> Severity: serious
> Control: close -1 2.7.7
> Tags: sid trixie
> User: release.debian@packages.debian.org
> Usertags: out-of-sync
> 
> Dear maintainer(s),
> 
> The Release Team considers packages that are out-of-sync between testing and
> unstable for more than 30 days as having a Release Critical bug in testing
> [1]. Your package src:apt has been trying to migrate for 31 days [2]. Hence,
> I am filing this bug. The version in unstable fails its own autopkgtest on
> armhf.
> 
> If a package is out of sync between unstable and testing for a longer
> period, this usually means that bugs in the package in testing cannot be
> fixed via unstable. Additionally, blocked packages can have impact on other
> packages, which makes preparing for the release more difficult. Finally, it
> often exposes issues with the package and/or
> its (reverse-)dependencies. We expect maintainers to fix issues that hamper
> the migration of their package in a timely manner.
> 
> This bug will trigger auto-removal when appropriate. As with all new bugs,
> there will be at least 30 days before the package is auto-removed.
> 
> I have immediately closed this bug with the version in unstable, so if that
> version or a later version migrates, this bug will no longer affect testing.
> I have also tagged this bug to only affect sid and trixie, so it doesn't
> affect (old-)stable.
> 
> If you believe your package is unable to migrate to testing due to issues
> beyond your control, don't hesitate to contact the Release Team.

I know this is automated but I feel lije it would be sensible to scan
the logs for well known bugs so that you don't end up filing bugs
against every package broken by valgrind's missing support for the new
NOP sequences in binutils.

Like this work is tracked in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057693

and normally we would forcemerge this into it.

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en



Bug#1059352: src:apt: fails to migrate to testing for too long: autopkgtest regression on armhf

2023-12-22 Thread Paul Gevers

Source: apt
Version: 2.7.6
Severity: serious
Control: close -1 2.7.7
Tags: sid trixie
User: release.debian@packages.debian.org
Usertags: out-of-sync

Dear maintainer(s),

The Release Team considers packages that are out-of-sync between testing 
and unstable for more than 30 days as having a Release Critical bug in 
testing [1]. Your package src:apt has been trying to migrate for 31 days 
[2]. Hence, I am filing this bug. The version in unstable fails its own 
autopkgtest on armhf.


If a package is out of sync between unstable and testing for a longer 
period, this usually means that bugs in the package in testing cannot be 
fixed via unstable. Additionally, blocked packages can have impact on 
other packages, which makes preparing for the release more difficult. 
Finally, it often exposes issues with the package and/or
its (reverse-)dependencies. We expect maintainers to fix issues that 
hamper the migration of their package in a timely manner.


This bug will trigger auto-removal when appropriate. As with all new 
bugs, there will be at least 30 days before the package is auto-removed.


I have immediately closed this bug with the version in unstable, so if 
that version or a later version migrates, this bug will no longer affect 
testing. I have also tagged this bug to only affect sid and trixie, so 
it doesn't affect (old-)stable.


If you believe your package is unable to migrate to testing due to 
issues beyond your control, don't hesitate to contact the Release Team.


Paul

[1] https://lists.debian.org/debian-devel-announce/2023/06/msg1.html
[2] https://qa.debian.org/excuses.php?package=apt



OpenPGP_signature.asc
Description: OpenPGP digital signature