Re: [systemd-devel] Monotonic time went backwards, rotating log
Pekka Paalanen writes:
> have you checked your boot ID, maybe it's often the same as the previous
> boot?

Good thought, but it doesn't look like it:

IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
-20 c2a5e3af1f044d79805c4fbdd120beec Wed 2023-05-10 11:17:40 EDT Wed 2023-05-17 08:59:48 EDT
-19 3f20878d92714d09a7928b1d1260074c Wed 2023-05-17 09:30:55 EDT Wed 2023-05-17 09:33:36 EDT
-18 22944fc66fc949048c14a3e9e559824e Wed 2023-05-17 09:34:13 EDT Mon 2023-05-22 15:41:05 EDT
-17 28c59dbf8d13407c8aa89ef2d3b3024c Tue 2023-05-23 08:34:38 EDT Fri 2023-05-26 14:46:36 EDT
-16 e6804a377e984bc499fbc44dd9a14f40 Tue 2023-05-30 08:53:41 EDT Tue 2023-05-30 09:35:00 EDT
-15 d3f4946a0f4e4951961ce62ae88d390c Tue 2023-05-30 09:35:36 EDT Thu 2023-06-01 13:00:15 EDT
-14 9bd458cbf57d458e869d5405c534d549 Thu 2023-06-01 13:01:06 EDT Thu 2023-06-01 13:01:35 EDT
-13 df67fe939f4a434f9beadfe81101e10e Thu 2023-06-01 13:40:22 EDT Tue 2023-06-06 10:56:06 EDT
-12 b860691a4da841e6bd223a4035536ef6 Tue 2023-06-06 10:57:13 EDT Wed 2023-06-21 11:16:51 EDT
-11 1f72e13c3d2542e69abfd5c38d8050fe Fri 2023-06-23 11:41:45 EDT Mon 2023-06-26 15:48:31 EDT
-10 50821dde5780459bbf05d5dffc52ac37 Fri 2023-07-28 15:08:56 EDT Mon 2023-07-31 16:15:21 EDT
 -9 444f76e5a93b422583e2a8089816aafe Mon 2023-07-31 16:16:13 EDT Fri 2023-08-04 13:26:22 EDT
 -8 3ff965b69adb4c51b058cba5dcaa4c09 Tue 2023-08-08 12:50:10 EDT Tue 2023-08-08 12:50:19 EDT
 -7 448ca4a0ef024be0a0dd7ec2b58b1015 Tue 2023-08-08 12:54:43 EDT Thu 2023-08-10 10:17:08 EDT
 -6 68f66c75cf674dd48b6216cb05c9278c Thu 2023-08-24 10:00:56 EDT Thu 2023-08-24 15:09:49 EDT
 -5 184ac41dd9164e1786edc74d19e4cef9 Fri 2023-09-01 09:34:58 EDT Tue 2023-09-12 15:28:40 EDT
 -4 0f7b2c0b1d244769bff218e2933ba46d Mon 2023-09-25 12:04:21 EDT Tue 2023-09-26 10:58:30 EDT
 -3 f1369263334a4c6db183fa7fa61074c6 Tue 2023-09-26 10:59:10 EDT Thu 2023-10-05 13:02:18 EDT
 -2 dea7a07fe6d24ad49ce1841e0260b42e Thu 2023-10-05 13:03:49 EDT Thu 2023-10-05 13:11:00 EDT
 -1 b2d29b9e947942a79303ad6944d7ad31 Thu 2023-10-05 13:11:45 EDT Thu 2023-10-05 13:22:45 EDT
  0 4539fd6a1ddf471e8795345cc3965f44 Thu 2023-10-05 13:23:22 EDT Fri 2023-10-06 13:38:45 EDT
Re: [systemd-devel] Monotonic time went backwards, rotating log
Phillip Susi writes:
> Lennart Poettering writes:
>
>> It actually checks that first:
>>
>> https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-journal/journal-file.c#L2201
>
> That's what I'm saying: it should have noticed that FIRST and not gotten
> to the monotonic time check, but it didn't.

I decided to try looking into this again. It seems it is also my system log that is rotated on each boot with this message about the monotonic clock, even though it should already have been rotated simply because it is a new boot. There are some debug prints that might shed light on what is happening, but I can't seem to get them enabled. I tried setting Environment=SYSTEMD_DEBUG_LEVEL=debug on systemd-journald.service, but I still don't get them.
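For reference, the environment variable that systemd components generally honor is $SYSTEMD_LOG_LEVEL, not SYSTEMD_DEBUG_LEVEL, so a drop-in along these lines may be what is needed (the file path is illustrative and this is an untested sketch):

```ini
# /etc/systemd/system/systemd-journald.service.d/debug.conf (illustrative path)
# Assumption: journald honors $SYSTEMD_LOG_LEVEL like other systemd components do.
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
```

followed by `systemctl daemon-reload` and a restart of systemd-journald.service.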
Re: [systemd-devel] Monotonic time went backwards, rotating log
Lennart Poettering writes:
> It actually checks that first:
>
> https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-journal/journal-file.c#L2201

That's what I'm saying: it should have noticed that FIRST and not gotten to the monotonic time check, but it didn't.
Re: [systemd-devel] Monotonic time went backwards, rotating log
Lennart Poettering writes:
> We want that within each file all records are strictly ordered by all
> clocks, so that we can find specific entries via bisection.

Why *all* clocks? Even if you want to search on the monotonic time, you first have to specify a boot ID within which that monotonic time is valid, don't you? So the first step in your search would be to find the boot records, then bisect from there.

> The message is debug level, no?

It's log_ratelimit_info(), which appears to be printed by default when I log in, and I presume my session systemd instance is started. I guess that's the problem: it should be debug. Also, though: why doesn't it first notice that the boot ID changed?
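To illustrate the two-step lookup I mean, here is a toy sketch. This is not journald's actual on-disk index (the real journal locates boots via hash tables and object offsets); the data, helper name, and the assumption that records are sorted by boot ID are all made up for illustration:

```python
import bisect

# Toy journal: each entry carries (boot_id, monotonic_usec), sorted so
# that all records of one boot form a contiguous, monotonic-ordered run.
entries = [
    ("boot-a", 1_000), ("boot-a", 2_000), ("boot-a", 9_000),
    ("boot-b",   500), ("boot-b", 4_000), ("boot-b", 8_000),
]

def find_entry(entries, boot_id, monotonic):
    """Two-step lookup: restrict to one boot, then bisect on monotonic time."""
    # Step 1: find the contiguous run of records belonging to this boot.
    boots = [e[0] for e in entries]
    lo = bisect.bisect_left(boots, boot_id)
    hi = bisect.bisect_right(boots, boot_id)
    # Step 2: monotonic time is only meaningful within that boot, so
    # bisect on it inside the run only.
    monos = [e[1] for e in entries[lo:hi]]
    return lo + bisect.bisect_left(monos, monotonic)

idx = find_entry(entries, "boot-b", 4_000)  # index of ("boot-b", 4000)
```

The point being: monotonic ordering only ever needs to hold *within* one boot's run of records for bisection to work.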
[systemd-devel] Monotonic time went backwards, rotating log
Every time I reboot, when I first log in, journald (253.3-r1) complains that the monotonic time went backwards and rotates the log file. This appears to happen because journal_file_append_entry_internal() wishes to enforce strict time ordering within the log file. I'm not sure why it cares about the *monotonic* time being in strict order, though, since that will always go backwards when you reboot. I'm also not sure why the earlier check that the boot ID has changed did not trigger. If it is intentional that journals be rotated after a reboot, could it at least be done without complaining about it?
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Michael Chapman writes:
> What specifically is the difference between:
>
> * swap does not exist at all;
> * swap is full of data that will not be swapped in for weeks or months;

That's the wrong question. The question is: what is the difference between having NO swap, and having some swap that you don't use much of? The answer is that there will be a non-zero amount of anonymous memory allocated to processes that hardly ever touch it, and that can be tossed out to swap to provide more memory to use for, if nothing else, caching files that ARE being accessed. That amount may not be much if you usually have plenty of free RAM, but it won't be zero. I too have long gone without a swap partition, because the small benefit of having a little more RAM to cache files did not justify the risk of going into thrashing mode when some process went haywire. But if that problem has been solved, and you want a swap partition for hibernation anyhow, then you may as well keep it mounted all the time, since unmounting it when you aren't about to hibernate costs *something* and gains *nothing*.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Lennart Poettering writes:
> oomd/PSI looks at memory allocation latencies to determine memory
> pressure. Since you disallow anonymous memory to be paged out and thus
> increase IO on file backed memory you increase the latencies
> unnecessarily, thus making oomd trigger earlier.

Did this get changed in the last few years? I'm sure it used to be based on the total commit limit, so the OOM killer wouldn't start killing until your swap was full, which didn't happen until the system had already been thrashing itself to uselessness for 20 minutes. If this has been fixed, then I guess it's time for me to start using swap again. What happens if you use zswap? Will hibernation try to save things there instead of to a real on-disk swap? It might be nice to have zswap for normal use and the on-disk swap for hibernation.
Re: [systemd-devel] .local searches not working
Silvio Knizek writes:
> So in fact your network is not standard conform. You have to define
> .local as search and routing domain in the configuration of sd-
> resolved.

Interesting... so what are you supposed to name your local, private domains? I believe Microsoft used to (or still does?) recommend using .local to name your domain if you don't have a public domain name, so surely I'm not the first person to run into this? Why does systemd-resolved not fall back to DNS if it can't first resolve the name using mDNS? That appears to be allowed by the RFC.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
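For anyone hitting the same thing: the configuration Silvio describes can be sketched as a resolved drop-in along these lines (the file name is illustrative, and this is my reading of the advice, not verified against every resolved version):

```ini
# /etc/systemd/resolved.conf.d/local-domain.conf (illustrative file name)
[Resolve]
# "~" makes .local a routing-only domain, so queries for it are sent
# to regular unicast DNS servers instead of being handled as mDNS.
Domains=~local
```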
[systemd-devel] .local searches not working
What special treatment does systemd-resolved give to .local domains? The corporate Windows network uses a .local domain, and even when I point systemd-resolved at the domain controller, it fails the query without bothering to ask the DC, saying:

resolve call failed: No appropriate name servers or networks for name found
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Colin Guthrie writes:
> I think the defaults are more complex than just "each journal file can
> grow to 128M" no?

Not as far as I can see.

> I mean there is SystemMaxUse= which defaults to 10% of the partition on
> which journal files live (this is for all journal files, not just the
> SystemMaxFileSize= which refers to just one file).

That controls when to delete old journals, not when to rotate a journal. It looks like you can manually request a rotation, and you can set time-based rotation, but that defaults to off, so that leaves rotating once the file reaches the maximum size (128M).
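For reference, the knobs under discussion live in journald.conf; a sketch of the relevant section as I read it (values left empty mean "use the built-in default", and the annotations are my understanding, not gospel):

```ini
# /etc/systemd/journald.conf (sketch; empty values fall back to built-in defaults)
[Journal]
# Total disk usage cap across all journal files. Hitting it triggers
# deletion (vacuuming) of the oldest archived files, not rotation.
# Defaults to 10% of the backing filesystem.
SystemMaxUse=
# Per-file size at which the active journal is rotated; this is the
# 128M cap discussed above.
SystemMaxFileSize=
```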
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Colin Guthrie writes:
> Are those journal files suffixed with a ~. Only ~ suffixed journals
> represent a dirty journal file (i.e. from an unexpected shutdown).

Nope.

> Journals rotate for other reason too (e.g. user request, overall space
> requirements etc.) which might explain this wasted space?

I've made no requests to rotate, and the config is default, which afaics means only rotating when the log hits the maximum size of 128MB. Thus I wouldn't expect to see any holes in the log, especially in the middle.
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Phillip Susi writes:
> Wait, what do you mean the inode nr changes? I thought the whole point
> of the block donating thing was that you get a contiguous set of blocks
> in the new file, then transfer those blocks back to the old inode so
> that the inode number and timestamps of the file don't change.

I just tested this with e4defrag and the inode number does not change. Oddly, it refused to improve my archived journals, which had 12-15 fragments. I finally found /var/log/btmp.1, which despite being less than 8MB had several hundred fragments. e4defrag got it down to 1 fragment, but for some reason it is still described by 3 separate entries in the extent tree. Looking at the archived journals, though, I wonder why I am seeing so many unwritten areas? Just the last extent of this file has nearly 4MB that were never written to. This system has never had an unexpected shutdown. Attached is the extent map.

Filesystem type is: ef53
File size of system@13a67b4b418d4869b37247eda6ebe494-00151338-0005b9ee46d7d4a9.journal is 117440512 (28672 blocks of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0:     0..     0: 1712667.. 1712667:     1:
   1:     1..  2047: 1591168.. 1593214:  2047: 1712668:
   2:  2048..  2132: 3012608.. 3012692:    85: 1593215:
   3:  2133..  2139: 3012693.. 3012699:     7:           unwritten
   4:  2140..  4095: 3012700.. 3014655:  1956:
   5:  4096..  6143: 3041280.. 3043327:  2048: 3014656:
   6:  6144..  8191: 3010560.. 3012607:  2048: 3043328:
   7:  8192..  9011: 3002368.. 3003187:   820: 3012608:
   8:  9012..  9013: 3003188.. 3003189:     2:           unwritten
   9:  9014.. 10239: 3003190.. 3004415:  1226:
  10: 10240.. 11255: 3024896.. 3025911:  1016: 3004416:
  11: 11256.. 11268: 3025912.. 3025924:    13:           unwritten
  12: 11269.. 11348: 3025925.. 3026004:    80:
  13: 11349.. 11352: 3026005.. 3026008:     4:           unwritten
  14: 11353.. 11360: 3026009.. 3026016:     8:
  15: 11361.. 11364: 3026017.. 3026020:     4:           unwritten
  16: 11365.. 11373: 3026021.. 3026029:     9:
  17: 11374.. 11376: 3026030.. 3026032:     3:           unwritten
  18: 11377.. 11642: 3026033.. 3026298:   266:
  19: 11643.. 11688: 3026299.. 3026344:    46:           unwritten
  20: 11689.. 11961: 3026345.. 3026617:   273:
  21: 11962.. 11962: 3026618.. 3026618:     1:           unwritten
  22: 11963.. 12287: 3026619.. 3026943:   325:
  23: 12288.. 12347: 3033088.. 3033147:    60: 3026944:
  24: 12348.. 12381: 3033148.. 3033181:    34:           unwritten
  25: 12382.. 12466: 3033182.. 3033266:    85:
  26: 12467.. 12503: 3033267.. 3033303:    37:           unwritten
  27: 12504.. 13007: 3033304.. 3033807:   504:
  28: 13008.. 13024: 3033808.. 3033824:    17:           unwritten
  29: 13025.. 13044: 3033825.. 3033844:    20:
  30: 13045.. 13061: 3033845.. 3033861:    17:           unwritten
  31: 13062.. 13081: 3033862.. 3033881:    20:
  32: 13082.. 13098: 3033882.. 3033898:    17:           unwritten
  33: 13099.. 13642: 3033899.. 3034442:   544:
  34: 13643.. 13648: 3034443.. 3034448:     6:           unwritten
  35: 13649.. 13655: 3034449.. 3034455:     7:
  36: 13656.. 13660: 3034456.. 3034460:     5:           unwritten
  37: 13661.. 13667: 3034461.. 3034467:     7:
  38: 13668.. 13673: 3034468.. 3034473:     6:           unwritten
  39: 13674.. 13680: 3034474.. 3034480:     7:
  40: 13681.. 13685: 3034481.. 3034485:     5:           unwritten
  41: 13686.. 13692: 3034486.. 3034492:     7:
  42: 13693.. 13698: 3034493.. 3034498:     6:           unwritten
  43: 13699.. 14276: 3034499.. 3035076:   578:
  44: 14277.. 14277: 3035077.. 3035077:     1:           unwritten
  45: 14278.. 14458: 3035078.. 3035258:   181:
  46: 14459.. 14529: 3035259.. 3035329:    71:           unwritten
  47: 14530.. 14570: 3035330.. 3035370:    41:
  48: 14571.. 14641: 3035371.. 3035441:    71:           unwritten
  49: 14642.. 14928: 3035442.. 3035728:   287:
  50: 14929.. 15002: 3035729.. 3035802:    74:           unwritten
  51: 15003.. 15837
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Lennart Poettering writes:
> inode, and then donate the old blocks over. This means the inode nr
> changes, which is something I don't like. Semantically it's only
> marginally better than just creating a new file from scratch.

Wait, what do you mean the inode nr changes? I thought the whole point of the block-donating thing was that you get a contiguous set of blocks in the new file, then transfer those blocks back to the old inode, so that the inode number and timestamps of the file don't change.
Re: [systemd-devel] [EXT] Re: consider dropping defrag of journals on btrfs
Chris Murphy writes:
> It's not interleaving. It uses delayed allocation to make random
> writes into sequential writes. It's tries harder to keep file blocks

Yes, and when you do that, you are interleaving data from multiple files into a single stream, which you really shouldn't be doing. IIRC, XFS has special I/O streaming modes specifically designed to *prevent* this from happening and to record multiple video streams simultaneously to different parts of the disk, to keep them from being fragmented to hell like that.
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Chris Murphy writes:
> And I agree 8MB isn't a big deal. Does anyone complain about journal
> fragmentation on ext4 or xfs? If not, then we come full circle to my
> second email in the thread which is don't defragment when nodatacow,
> only defragment when datacow. Or use BTRFS_IOC_DEFRAG_RANGE and
> specify 8MB length. That does seem to consistently no op on nodatacow
> journals which have 8MB extents.

Ok, I agree there.

> The reason I'm dismissive is because the nodatacow fragment case is
> the same as ext4 and XFS; the datacow fragment case is both
> spectacular and non-deterministic. The workload will matter where

Your argument seems to be that it's no worse than ext4, so if we don't defrag there, why do it on btrfs? Lennart seems to be arguing that the only reason systemd doesn't defrag on ext4 is that the ioctl is harder to use there; maybe ext4 should defrag as well, so he's asking for actual performance data to evaluate whether the defrag is pointless or whether ext4 should also start doing one. At least I think that's his point. Personally, I agree (and showed the calculations in a previous post) that 8 MB per fragment is only going to have a negligible impact on performance and so isn't worth bothering with a defrag, but he has asked for real-world data...

> And also, only defragmenting on rotation strikes me as leaving
> performance on the table, right? If there is concern about fragmented

No, because fragmentation only causes additional latency on HDDs, not SSDs.

> But it sounds to me like you want to learn what the performance is of
> journals defragmented with BTFS_IOC_DEFRAG specifically? I don't think
> it's interesting because you're still better off leaving nodatacow
> journals alone, and something still has to be done in the datacow

Except that you're not. Your definition of "better off" appears to apply only on SSDs, and only because it is preferable to have fewer writes rather than less fragmentation. On HDDs, defragmenting is a good thing. Lennart seems to want real-world performance data to evaluate just *how* good and whether it's worth the bother, at least for HDDs. For SSDs, I believe he agreed that it may as well be shut off, since it provides no benefit there, but your patch kills it on HDDs as well.

> Is there a test mode for journald to just dump a bunch of random stuff
> into the journal to age it? I don't want to wait weeks to get a dozen
> journal files.

The cause of the fragmentation is slowly appending to the file over time, so if you dump a bunch of data in too quickly, you would eliminate the fragmentation. You might try:

while true ; do logger "This is a test log message to act as filler" ; sleep 1 ; done

to speed things up a little.
Re: [systemd-devel] [EXT] Re: consider dropping defrag of journals on btrfs
Chris Murphy writes:
> Basically correct. It will merge random writes such that they become
> sequential writes. But it means inserts/appends/overwrites for a file
> won't be located with the original extents.

Wait, I thought that was only true for metadata, not normal file data blocks? Well, maybe it becomes true for normal data if you enable compression, or for small files that get leaf-packed into the metadata chunk. If it's really combining streaming writes from two different files into a single interleaved write to the disk, that would be really silly.
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Chris Murphy writes:
> I showed that the archived journals have way more fragmentation than
> active journals. And the fragments in active journals are
> insignificant, and can even be reduced by fully allocating the journal

Then clearly this is a problem with btrfs: it absolutely should not be making files more fragmented when asked to defrag them.

> file to final size rather than appending - which has a good chance of
> fragmenting the file on any file system, not just Btrfs.

And yet, you just said the active journal had minimal fragmentation. That seems to mean that the 8MB fallocates that journald does are working well. Sure, you could probably get fewer fragments by fallocating the whole 128MB at once, but there are tradeoffs to that which are not worth it. One fragment per 8MB isn't a big deal. Ideally a filesystem will manage to do better than that (didn't btrfs have a persistent reservation system for this purpose?), but it certainly should not commonly do worse.

> Further, even *despite* this worse fragmentation of the archived
> journals, bcc-tools fileslower shows no meaningful latency as a
> result. I wrote this in the previous email. I don't understand what
> you want me to show you.

*Of course* it showed no meaningful latency, because you did the test on an SSD, which has no meaningful latency penalty from fragmentation. The question is how bad it is on an HDD.

> And since journald offers no ability to disable the defragment on
> Btrfs, I can't really do a longer term A/B comparison can I?

You proposed a patch to disable it. Test before and after the patch.

> I did provide data. That you don't like what the data shows: archived
> journals have more fragments than active journals, is not my fault.
> The existing "optimization" is making things worse, in addition to
> adding a pile of unnecessary writes upon journal rotation.

If it is making things worse, that is definitely a bug in btrfs. It might be nice to avoid the writes on SSDs, though, since there is no benefit there.

> Conversely, you have not provided data proving that nodatacow
> fallocated files on Btrfs are any more fragmented than fallocated
> files on ext4 or XFS.

That's a fair point: if btrfs isn't any worse than other filesystems, then why is it the only one that gets a defrag?
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Chris Murphy writes:
>> It sounds like you are arguing that it is better to do the wrong thing
>> on all SSDs rather than do the right thing on ones that aren't broken.
>
> No I'm suggesting there isn't currently a way to isolate
> defragmentation to just HDDs.

Yes, but it sounded like you were suggesting that we shouldn't even try, not just that it isn't 100% accurate. Sure, some SSDs will be stupid and report that they are rotational, but most aren't, so it's a good idea to disable the defragmentation on drives that report themselves as non-rotational.
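For what it's worth, the flag in question is readable from sysfs; a minimal sketch of the kind of check being discussed (the helper names are mine, and as Chris notes, the flag can lie):

```python
from pathlib import Path

def parse_rotational(flag_text: str) -> bool:
    """Interpret the contents of /sys/block/<dev>/queue/rotational:
    "1" means the kernel believes the device is rotating media."""
    return flag_text.strip() == "1"

def is_rotational(dev: str) -> bool:
    # Caveat from the thread: some SSDs (behind USB bridges, or with lazy
    # firmware) report 1 here, so this is a heuristic, not ground truth.
    return parse_rotational(
        Path(f"/sys/block/{dev}/queue/rotational").read_text())
```

The argument above is that this heuristic being imperfect for a minority of devices is not a reason to skip it for the majority that report correctly.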
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Lennart Poettering writes:
> journalctl gives you one long continues log stream, joining everything
> available, archived or not into one big interleaved stream.

If you ask for everything, yes... but if you run journalctl -b, shouldn't it only read back until it finds the start of the current boot?
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Maksim Fomin writes:
> I would say it depends on whether defragmentation issues are feature
> of btrfs. As Chris mentioned, if root fs is snapshotted,
> 'defragmenting' the journal can actually increase fragmentation. This
> is an example when the problem is caused by a feature (not a bug) in
> btrfs. For example, my 'system.journal' file is currently 16 MB and
> according to filefrag it has 1608 extents (consequence of snapshotted
> rootfs?). It looks too much, if I am not missing some technical

Holy smokes! How did btrfs manage to butcher that poor file that badly? It shouldn't be possible for it to be *that* bad. I mean, that's an average of only about 10 KB per fragment!
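For the record, that back-of-the-envelope figure comes straight from the numbers quoted above (16 MB file, 1608 extents):

```python
# Average fragment size for the system.journal described in the thread.
size_bytes = 16 * 1024 * 1024      # 16 MB file
extents = 1608                     # extent count reported by filefrag
avg_per_fragment_kb = size_bytes / extents / 1024  # ~10.2 KiB per fragment
```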
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Dave Howorth writes:
> PS I'm subscribed to the list. I don't need a copy.

FYI, rather than asking others to go out of their way when replying to you, you should configure your mail client to set the Reply-To: header to point to the mailing list address, so that other people's mail clients do what you want automatically.
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Lennart Poettering writes:
> Nope. We always interleave stuff. We currently open all journal files
> in parallel. The system one and the per-user ones, the current ones
> and the archived ones.

Wait... every time you look at the journal at all, it has to read back through ALL of the archived journals, even if you are only interested in information since the last boot, which happened just 5 minutes ago?
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Lennart Poettering writes:
> You are focussing only on the one-time iops generated during archival,
> and are ignoring the extra latency during access that fragmented files
> cost. Show me that the iops reduction during the one-time operation
> matters and the extra latency during access doesn't matter and we can
> look into making changes. But without anything resembling any form of
> profiling we are just blind people in the fog...

I'm curious why you seem to think that latency when accessing old logs is so important. I would think that old logs are accessed very rarely, and on such a rare occasion a few extra ms don't seem very important to me. Even on a 5400 rpm drive, typical access latency is what, 8 ms? Even with a fragment every 8 MB, that only adds up to an extra 128 ms to read and parse a 128 MB log file. With no fragments at all it's going to take over 1 second to read that file, so we're only talking about a ~11% slowdown here, on an operation that is rare, and you're going to spend far more time actually looking at the log than it took to read it off the disk.
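To spell out the arithmetic (the 8 ms seek and one fragment per 8 MB come from the text above; the ~100 MB/s sequential rate is my assumption, and the thread's ~11% figure implies a slightly slower drive):

```python
# Worst-case extra seek cost of a fragment every 8 MB in a 128 MB journal,
# read end to end on a 5400 rpm HDD.
file_mb = 128
seek_ms = 8                            # assumed per-fragment seek penalty
frags = file_mb // 8                   # 16 fragment boundaries
extra_ms = frags * seek_ms             # 128 ms of added seek time
sequential_ms = file_mb / 100 * 1000   # ~1280 ms to stream the file at 100 MB/s
slowdown = extra_ms / sequential_ms    # ~0.10, i.e. roughly a 10% penalty
```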
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Chris Murphy writes:
> But it gets worse. The way systemd-journald is submitting the journals
> for defragmentation is making them more fragmented than just leaving
> them alone.

Wait, doesn't it just create a new file, fallocate the whole thing, copy the contents, and delete the original? How can that possibly make fragmentation *worse*?

> All of those archived files have more fragments (post defrag) than
> they had when they were active. And here is the FIEMAP for the 96MB
> file which has 92 fragments.

How the heck did you end up with nearly 1 fragment per MB?

> If you want an optimization that's actually useful on Btrfs,
> /var/log/journal/ could be a nested subvolume. That would prevent any
> snapshots above from turning the nodatacow journals into datacow
> journals, which does significantly increase fragmentation (it would in
> the exact same case if it were a reflink copy on XFS for that matter).

Wouldn't that mean that when you take snapshots, they don't include the logs? That seems like an anti-feature that violates the principle of least surprise. If I make a snapshot of my root, I *expect* it to contain my logs.

> I don't get the iops thing at all. What we care about in this case is
> latency. A least noticeable latency of around 150ms seems reasonable
> as a starting point, that's where users realize a delay between a key
> press and a character appearing. However, if I check for 10ms latency
> (using bcc-tools fileslower) when reading all of the above journals at
> once:
>
> $ sudo journalctl -D
> /mnt/varlog33/journal/b51b4a725db84fd286dcf4a790a50a1d/ --no-pager
>
> Not a single report. None. Nothing took even 10ms. And those journals
> are more fragmented than your 20 in a 100MB file.
>
> I don't have any hard drives to test this on. This is what, 10% of the
> market at this point? The best you can do there is the same as on SSD.

The above sounded like great data, but not if it was gathered on an SSD. Of course fragmentation doesn't cause latency on an SSD. I don't know about market trends, but I stopped trusting my data to SSDs a few years ago, when my ext4 filesystem kept being corrupted and it appeared that the FTL of the drive was randomly swapping the contents of different sectors around: I found things like the contents of a text file in a block of the inode table or a directory.

> You can't depend on sysfs to conditionally do defragmentation on only
> rotational media, too many fragile media claim to be rotating.

It sounds like you are arguing that it is better to do the wrong thing on all SSDs than to do the right thing on the ones that aren't broken.

> Looking at the two original commits, I think they were always in
> conflict with each other, happening within months of each other. They
> are independent ways of dealing with the same problem, where only one
> of them is needed. And the best of the two is fallocate+nodatacow
> which makes the journals behave the same as on ext4 where you also
> don't do defragmentation.

This makes sense.
Re: [systemd-devel] consider dropping defrag of journals on btrfs
Lennart Poettering writes:
> Well, at least on my system here there are still like 20 fragments per
> file. That's not nothin?

In a 100 MB file? It could be better, but I very much doubt you're going to notice a difference after defragmenting that. I may be the nut who rescued the old ext2 defrag utility from the dustbin of history, but even I have to admit that it isn't really important to use, and there is a reason why the Linux community abandoned it.
Re: [systemd-devel] Antw: [EXT] emergency shutdown, don't wait for timeouts
Reindl Harald writes:
> i have seen "user manager" instances hanging for way too long and way
> more than 3 minutes over the last 10 years

The default timeout is 3 minutes, iirc, so at that point it should be forcibly killed.
Re: [systemd-devel] Antw: [EXT] emergency shutdown, don't wait for timeouts
Reindl Harald writes:
> topic missed - it makes no difference if it can hold the power 3
> minutes, 3 hours or even 3 days at the point where it decides "i need to
> shutdown everything because the battery goes empty"

It is that point which really should be at least 3 minutes before power fails. As long as the battery lasts for at least 3 minutes, the monitoring daemon should easily be able to begin the shutdown when 3 minutes remain. I'm not sure that forcibly killing services in order to shut down quickly is really much better than the sudden power loss you are trying to avoid.
Re: [systemd-devel] ssh.service in rescue.target
Simon McVittie writes:
> The Debian/Ubuntu package for systemd already masks various services
> that are superseded by something in systemd, such as procps.service and
> rcS.service. It used to also mask all the services from initscripts,
> but that seems to have been dropped in version 243-5.

Ahh, that explains why it seems to be implicitly masked on 18.04.

> Perhaps the systemd Debian/Ubuntu package still needs to mask rc1 services
> like killprocs, or perhaps the initscripts package should take over

Sounds like it.

>> initramfs-tools does not depend on initscripts, but *breaks* it, which
>> should mean it is not possible for both packages to be installed at the
>> same time.
>
> initramfs-tools only Breaks initscripts (<< 2.88dsf-59.3~), which means
> it is possible for both to be installed at the same time, as long as
> initscripts is at a sufficiently new version.

Yes, but why is it listed as *depending* on initscripts when it only Breaks it (and only an older version at that)?
Re: [systemd-devel] ssh.service in rescue.target
Michael Biebl writes:
> Are you sure?
> Which Ubuntu version is that?
> At least in Debian, /etc/init.d/killprocs is shipped by "initscripts"
> which is no longer installed by default.

20.04. apt-cache rdepends shows:

Reverse Depends:
  sysv-rc
  util-linux
  hostapd
  wpasupplicant
  util-linux
  initramfs-tools
  base-files
  hostapd
  wpasupplicant
  sysvinit-utils
  initramfs-tools
  base-files
  console-setup-linux

So it looks like it's a required package. I guess I'll try masking it. Hrm... odd... I wondered why util-linux would depend on initscripts... apt-cache depends util-linux says that it does not *depend* on it but *replaces* it. Doesn't that mean that when util-linux is installed, initscripts should be removed? And initramfs-tools does not depend on initscripts, but *breaks* it, which should mean it is not possible for both packages to be installed at the same time. WTF, over?
Re: [systemd-devel] ssh.service in rescue.target
Lennart Poettering writes:
> Are you running systemd? If so, please get rid of "killproc". It will
> interfere with systemd's service management.

I see... apparently Ubuntu still has it around. How does systemd handle it? For instance, if a user logged in and forked off a background process, how does systemd make sure it gets killed when isolating to rescue.target? Does it decide that it is still connected to ssh.service and so won't kill it when isolating? I'd like to make sure anything like that is killed, and maybe restart sshd if needed.
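For what it's worth, a background process forked from an SSH login does not live in ssh.service at all: logind places it in the user's session scope (session-N.scope under user.slice), so systemd tracks it separately from sshd. Whether such processes are killed when the session ends is a logind setting. A minimal sketch, assuming a logind new enough to read drop-ins (the drop-in file name is illustrative):

```ini
# /etc/systemd/logind.conf.d/kill-user-procs.conf  (hypothetical file name)
# Kill every process in a user's session scope when the session ends,
# so stray background jobs do not outlive the SSH login.
[Login]
KillUserProcesses=yes
```

Isolating to rescue.target also stops user.slice unless it is wanted by that target, which takes care of session scopes independently of this setting.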
Re: [systemd-devel] ssh.service in rescue.target
Lennart Poettering writes:
> What is "killprocs"?
>
> Is something killing services behind systemd's back? What's that
> about?

It's the thing that kills all remaining processes right before shutdown that we've had since the sysvinit days. And also when isolating, I suppose.
Re: [systemd-devel] ssh.service in rescue.target
Lennart Poettering writes:
> Look at the logs?
>
> If they are not immediately helpful, consider turning on debug logging
> in systemd first, and then redoing the action and then looking at the
> logs. You can use "systemd-analyze log-level debug" to turn on debug
> logging in PID 1 any time.

It appears that systemd decides that ssh.service should remain running and removes the redundant start job since it is already running, but killprocs sends sshd a SIGTERM, so it shuts down, and systemd decides not to restart it. IIRC, there was a list of PIDs that would NOT be killed at that stage... it appears that the PID for ssh.service isn't getting placed in that list. How did that work again?
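The list being remembered is, on Debian-derived systems, the omit-pid directory read by the sysvinit sendsigs/killprocs machinery: the stop scripts collect one PID file per daemon from /run/sendsigs.omit.d and pass each PID to killall5 with -o, so those processes survive the mass SIGTERM/SIGKILL. A minimal sketch of that layout, staged under a temporary directory so it is safe to run anywhere (the entry name "sshd" and the PID value are illustrative only):

```python
import pathlib
import tempfile

# Debian's sendsigs script scans /run/sendsigs.omit.d; each file there is
# named after a daemon and contains a single PID to hand to "killall5 -o".
# We stage the same layout under a temp dir instead of touching /run.
omitdir = pathlib.Path(tempfile.mkdtemp()) / "sendsigs.omit.d"
omitdir.mkdir()
(omitdir / "sshd").write_text("12345\n")  # one file per daemon; content is its PID

# This mirrors what the stop script does before invoking killall5:
omitted = [int(p.read_text()) for p in omitdir.iterdir()]
print(omitted)  # [12345]
```

So if sshd (or whatever starts it) never drops its PID into that directory, killprocs will happily signal it along with everything else.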
[systemd-devel] ssh.service in rescue.target
I used to just have to add-wants ssh.service to rescue.target and I could isolate to rescue mode for remote system maintenance without losing remote access to the server. After an upgrade, even though ssh.service is wanted by rescue.target, it is still killed when I isolate. How can I figure out why?

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
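For reference, besides the add-wants symlink there is a unit setting that tells systemd itself never to stop a unit on an isolate request. A sketch as a drop-in, assuming systemd's standard drop-in paths (the drop-in file name is illustrative; this only covers systemd's own behavior and will not help if something outside systemd is sending the SIGTERM):

```ini
# /etc/systemd/system/ssh.service.d/rescue.conf  (hypothetical file name)
[Unit]
# Keep ssh.service running across "systemctl isolate rescue.target".
IgnoreOnIsolate=yes
```

After adding the drop-in, run `systemctl daemon-reload` so PID 1 picks it up.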
Re: [systemd-devel] Hotplug auto mounting and masked mount units
Lennart Poettering writes:
> Can you file a bug about this? Sounds like something to fix.

Sure.
[systemd-devel] Hotplug auto mounting and masked mount units
Someone in #debian mentioned to me that they were getting some odd errors in their logs when running gparted. It seems that several years ago someone had a problem caused by systemd auto-mounting filesystems in response to udev events triggered by gparted, so as a workaround, gparted masks all mount units. Curtis Gedeck and I can't figure out now why this was needed, because we can't seem to get systemd to automatically mount a filesystem just because its device is hot-plugged. Are there any circumstances under which systemd will mount a filesystem when its device is hotplugged?

Also, I'm pretty sure this part is a bug in systemd: any service that depends on -.mount (so most of them) will refuse to start while -.mount is masked. It shouldn't matter that it's masked if it is already mounted, should it? Only if it isn't mounted, then it can't be mounted to satisfy the dependency.
Re: [systemd-devel] Inhibiting plug and play
On 7/16/2013 1:23 PM, Lennart Poettering wrote:
> So, Kay suggested we should use BSD file locks for this. i.e. all tools
> which want to turn off events for a device would take one on that
> specific device fd. As long as it is taken udev would not generate
> events. As soon as the BSD lock is released again it would recheck the
> device.
>
> To me this sounds like a pretty clean thing to do. Locks usually suck,
> but for this purpose they appear to do exactly what they should, and
> most of the problematic things with them don't apply in this specific
> case. Doing things this way would be quite robust, as we have clean
> synchronization and the kernel will release the locks automatically
> when the owner dies.
>
> Opinions?

Sounds like it might work.
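This BSD-lock protocol is the one udev later adopted: a tool takes an exclusive flock() on the whole-disk device node, udev skips event processing while the lock is held, and re-probes the device when it is released. A minimal sketch of the tool side in Python, run against a temporary file rather than a real /dev node so it is safe to execute:

```python
import fcntl
import os
import tempfile

# Stand-in for a block device node such as /dev/sdX; a temp file behaves
# identically for flock() purposes.
path = tempfile.mkstemp()[1]

# Partitioner side: hold an exclusive BSD lock for the duration of the
# on-disk surgery. Cooperating consumers (later udev releases) check this
# lock before probing the device.
fd = os.open(path, os.O_RDONLY)
fcntl.flock(fd, fcntl.LOCK_EX)

# While held, a second open file description cannot take the lock:
fd2 = os.open(path, os.O_RDONLY)
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    contended = False
except BlockingIOError:
    contended = True
print(contended)  # True: the device looks "busy" to cooperating readers

# Releasing the lock (or the holder dying) lets consumers re-probe; the
# kernel drops BSD locks automatically when the last owner fd closes.
fcntl.flock(fd, fcntl.LOCK_UN)
os.close(fd)
os.close(fd2)
os.unlink(path)
```

The kernel-enforced cleanup on process death is what makes this scheme robust against crashed partitioners, which is exactly the property argued for above.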
[systemd-devel] Inhibiting plug and play
Various tools, but most notably partitioners, manipulate disks in such a way that they need to prevent the rest of the system from racing with them while they are in the middle of manipulating the disk. Presently this is done with a hodgepodge of hacks that involve running some script or executable to temporarily hold off on some aspects (typically only auto-mounting) of plug-and-play processing. Which one depends on whether you are running hal, udisks, udisks2, or systemd. There really needs to be a proper way, at a lower level, either in udev or maybe in the kernel, to inhibit processing events until the tool changing the device has finished completely. The question is: should this be in the kernel or in udev, and what should the interface be?
Re: [systemd-devel] Inhibiting plug and play
On 6/18/2013 2:03 PM, David Zeuthen wrote:
> When I was younger I used to think things like this was a good idea
> and, in fact, did a lot of work to add complex interfaces for this in
> the various components you mention. These interfaces didn't really work
> well, someone would always complain that this or that edge-case didn't
> work. Or some other desktop environment ended up not using the
> interfaces. Or some kernel hacker running twm (with carefully selected
> bits of GNOME or KDE to get automounting) ran into problems. It was
> awful. Just awful.

I can't really extract any meaning from this without knowing what was tried and what problems it caused. I also don't see why it can't be something as simple as opening the device with O_EXCL.

> What _did_ turn out to work really well - and what GNOME is using today
> and has been for the last couple of years - is that the
> should_automount flag [1] is set only if, and only if, the device the
> volume is on has been added within the last five seconds [2]. It's
> incredibly simple (and low-tech). And judging from bug reports, it
> works really well.

I don't follow. You mean udisks delays auto-mounting by five seconds? That's not going to help if, for instance, you use gparted to move a partition to the right. It first enlarges the partition, which generates a remove/add event, then starts moving data. Five seconds later udisks tries to mount the partition, which may very well succeed, with horrible consequences.

The problem also goes beyond udisks and auto-mounting, which is why I say it really needs to be done at either the udev or the kernel level. For instance, a udev script may identify the new volume as part of a raid (leftover metadata) and try to attach mdadm to it at the same time you're running mkfs. I'm also pretty sure that I have seen the mdadm udev script race with mdadm itself while you are trying to create a new raid volume.