I've been worrying a bit about pseudo. Since we made it stricter about inode mismatches, we have seen a trickle of reports of pseudo aborts (fakeroot tasks showing exit 134, which is SIGABRT).
The issue occurs when a file in the pseudo database is removed outside of pseudo's context. The inode stored in the database can then be reused for a new file, which triggers path mismatch errors.

Since pseudo works via LD_PRELOAD, even getting a sensible error to the user is hard. The error occurs in the pseudo server process (which has the database) and is reported back over a connection to the library code wrapping some libc call in some user application. All we can really do is abort(); we can't print to stdout/stderr since we don't even know whether that is available or where it might go.

One of the worries is about build determinism. Rather than randomly hitting these issues, could we hit them more consistently? There are two and a half ideas I've had there:

a) Adding in a startup DB integrity check. I have a patch which does this, i.e. when the server loads, it just exits if the DB inodes don't match those on disk. The trouble is the server is usually spawned through some application making a glibc call, so reporting any sensible error is near impossible; we can just abort(). We can put a decent error in pseudo.log but that isn't something seen on the console, which is particularly problematic for CI. Locally in testing, I do see occasional issues with missing files in /tmp/ with this. The second issue here is the server startup retry code. It takes pseudo about 80s to time out when starting a server, due to the backoff+retry algorithm it understandably has. bitbake sits looking confused during this time (no tasks running) as the worker processes never report in.

b) We could add a new command to pseudo to run an integrity check on the DB. If we do that, we would then be able to show the user a decent error and avoid the timeout issue. The question is where/when to trigger it and whether races could occur against the check (e.g. where multiple fakeroot tasks are running in parallel against the same WORKDIR).
c) We could add specialist code to bitbake such that when a fakeroot worker exits with 134, we dump the tail end of the pseudo log if present. That doesn't directly fix the issue but would help users debug problems. This does come at the cost of making the bitbake code pseudo specific.

Unfortunately the position of pseudo maintainer is effectively open. I know some people have expressed interest but nobody is really working on issues like this. I am open to people's thoughts on the ideas above or whether there is some other approach anyone can see...

Cheers,
Richard
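
To make the integrity check in (a) and (b) a little more concrete, here is a rough Python sketch of the comparison itself. It is not the patch mentioned above and not pseudo's actual code: the function name is made up, and it assumes files.db is a SQLite database with a "files" table holding path, dev and ino columns, which should be checked against the real schema before relying on it.

```python
import os
import sqlite3


def find_inode_mismatches(db_path):
    """Return (path, reason) tuples for DB entries that disagree with disk.

    Assumption: a table files(path, dev, ino). Adjust to the real schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT path, dev, ino FROM files").fetchall()
    finally:
        conn.close()

    problems = []
    for path, dev, ino in rows:
        try:
            # lstat so that symlinks are checked themselves, not their targets
            st = os.lstat(path)
        except (FileNotFoundError, NotADirectoryError):
            problems.append((path, "missing on disk"))
            continue
        if (st.st_dev, st.st_ino) != (dev, ino):
            problems.append((path, "inode mismatch"))
    return problems
```

Run from a standalone command as in (b), the returned list could be printed straight to the console. It does nothing about the races against parallel fakeroot tasks, which would still need thought.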
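
As an illustration of (c), something along these lines is roughly what the bitbake side could look like. This is only a sketch under assumptions: the function names are hypothetical, nothing here is existing bitbake API, and the caller is assumed to already know where the relevant pseudo.log lives.

```python
import signal
from collections import deque

# A process killed by signal N is conventionally reported as 128 + N,
# so SIGABRT (6 on Linux) shows up as 134.
ABORT_EXIT_CODE = 128 + signal.SIGABRT


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file, or [] if unreadable."""
    try:
        with open(path, errors="replace") as f:
            # deque with maxlen keeps only the newest `count` lines in memory
            return [line.rstrip("\n") for line in deque(f, maxlen=count)]
    except OSError:
        return []


def describe_worker_exit(exit_code, pseudo_log, count=20):
    """Build an extra diagnostic for a fakeroot worker that aborted.

    Returns None when the exit code is not an abort or no log is available.
    """
    if exit_code != ABORT_EXIT_CODE:
        return None
    lines = tail_lines(pseudo_log, count)
    if not lines:
        return None
    header = "Worker exited with %d (SIGABRT); last lines of %s:" % (
        exit_code, pseudo_log)
    return "\n".join([header] + lines)
```

Returning None for every other exit code keeps the pseudo-specific part confined to one small helper, which limits how much pseudo knowledge leaks into the rest of the worker handling.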