Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!
On Tue, Feb 24, 2015 at 12:57 AM, Noel Torres wrote:
> We have RAID tools like mdadm for RAID, and filesystems like ext4 or Reiserfs
> for file storage.
>
> Why would I want a tool combining both?

You'd want one so you can, for instance, avoid a RAID5 write hole. ZFS seems
pretty cool; the only downside I see is perhaps more fragmentation than other
systems.

Cheers,
Nuno
___
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
Re: [Dng] OT - It may be only one file, but it does point to the bigger problem
On Mon, 23 Feb 2015 16:46:34 -0600 "T.J. Duchene" wrote:
> > My philosophy as a free software author is this: The buck stops
> > with me. If my software screws up, it's my fault and my
> > responsibility to fix, regardless of whether the actual root cause
> > is in code I wrote or a tool I use.
> >
> > If I were having problems with two different compilers treating my
> > code two different ways, I'd #ifdef the hell out of it to kludge it
> > back to working order on both.
> >
> > But that's just me. I've seen a lot of free software authors say
> > "hey, it's not my fault, it's the __ library or tool." Doesn't
> > help the user a heck of a lot.
> >
> > SteveT
>
> That's a fair point, in an overall sense, Steve. I'm afraid as a
> matter of practicality, I must disagree.
>
> Debugging on a compiler is a very specific skill-set. Asking someone
> who doesn't do that every day to fix what is probably a compiler bug
> is asking a lot - especially when you may have to venture into the
> realm of processor mnemonics and specific registers to fix the
> problem.
>
> In my opinion, that is especially relevant when dealing with ARM
> because there are so many makers of ARM processors with specific
> tweaks.
>
> T.J.

Ahhh, now we're on my turf: Troubleshooting.

If ARM restricts your choice of compilers, then I'll agree with you vis-a-vis
ARM, sort of.

For the wider application of my philosophy, it's amazing how little subject
matter expertise (in this case tracing a compiler all the way down to
instructions and registers) one needs in order to troubleshoot very
effectively. Just as one example, in my classes I teach the power of having
one system malfunctioning and one not malfunctioning. You can continue making
each like the other until you can toggle the symptom with one statement. I
call it "exploit the differences", and it's very powerful.

So, let's say that I can narrow it down to (just to pull an imaginary example
out of the air) clang crashing on memset() while gcc doesn't. Obviously, I'd
better be sure the locale is the same on both. The next step could be writing
a simplest case that does nothing but a memset, and seeing if it still crashes
on memset(). If so, then I could write my own memset, see if that crashes, and
investigate why. Eventually perhaps, on clang, I could ifdef in my own
memset(). Or, if I have the skills, I could trace memset into assembler.

Perhaps a single memset wouldn't reproduce the symptom. I can then keep
reducing the program until I get the smallest program that can reproduce the
symptom, and experiment with that.

And of course, the most likely scenario will be that it's *my* bad code, not
the compiler's, but even if I can prove it's the compiler's, I can work around
it while I wait for the compiler guys to fix their compiler now that I've
reported the problem.

When I find a situation of unexpected behavior with a library or tool, I
usually just work around it and report it to the library devs. The last thing
the user needs is me and the library devs pointing fingers at each other.

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance
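The reduce-then-work-around steps described above can be sketched in C. This is only an illustration of the imaginary clang/memset scenario from the message, not a report of a real compiler bug. The names `my_memset`, `SAFE_MEMSET` and `memset_case_ok` are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written replacement for memset(): a plain byte loop.
 * The volatile pointer discourages the compiler from turning the loop
 * back into a call to the library memset(). */
void *my_memset(void *dst, int value, size_t len)
{
    volatile unsigned char *p = dst;
    while (len--)
        *p++ = (unsigned char)value;
    return dst;
}

/* The workaround: only clang builds take the replacement; every other
 * compiler keeps using the library function. */
#if defined(__clang__)
#define SAFE_MEMSET my_memset
#else
#define SAFE_MEMSET memset
#endif

/* The "simplest case": nothing but one memset and a check of the result.
 * Returns 1 if every byte was set, 0 otherwise. */
int memset_case_ok(size_t len, int value)
{
    unsigned char buf[256];

    assert(len <= sizeof buf);
    SAFE_MEMSET(buf, value, len);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != (unsigned char)value)
            return 0;
    return 1;
}
```

Building the same file with both compilers and calling `memset_case_ok(64, 0xAB)` from a small `main()` gives the one-statement toggle the message talks about: if the symptom follows the compiler, the `#if` line is the single difference left to investigate.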
Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!
On Sunday, 22 February 2015 18:28:06 Jim Murphy wrote:
[...]
> If I have a btrfs mirror and I didn't mess with it by setting FS_NOCOW,
> shouldn't I be able to recover the file? I would sure hope so. He
> creates this "better" way of logging, then he seems to not even care if
> you can use it.

Isn't btrfs the opposite of KISS?

We have RAID tools like mdadm for RAID, and filesystems like ext4 or Reiserfs
for file storage.

Why would I want a tool combining both?

er Envite
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
OpenPGP key: 1586 50C8 7DBF B050 DE62 EA12 70B4 00F3 EEC7 C372
Spiral galaxies always have at least TWO arms.
Re: [Dng] OT - It may be only one file, but it does point to the bigger problem
On Monday, February 23, 2015 04:46:34 PM you wrote:
> > My philosophy as a free software author is this: The buck stops with
> > me. If my software screws up, it's my fault and my responsibility to
> > fix, regardless of whether the actual root cause is in code I wrote
> > or a tool I use.
> >
> > If I were having problems with two different compilers treating my code
> > two different ways, I'd #ifdef the hell out of it to kludge it back to
> > working order on both.
> >
> > But that's just me. I've seen a lot of free software authors say "hey,
> > it's not my fault, it's the __ library or tool." Doesn't help the
> > user a heck of a lot.
> >
> > SteveT
>
> That's a fair point, in an overall sense, Steve. I'm afraid as a matter of
> practicality, I must disagree.
>
> Debugging on a compiler is a very specific skill-set. Asking someone who
> doesn't do that every day to fix what is probably a compiler bug is asking a
> lot - especially when you may have to venture into the realm of processor
> mnemonics and specific registers to fix the problem.
>
> In my opinion, that is especially relevant when dealing with ARM because
> there are so many makers of ARM processors with specific tweaks.
>
> T.J.

I realize I should have spoken more clearly, and for that I apologize. I'll
endeavor to be clearer in the future.

What I was trying to say is that, while I agree that you should make every
effort to make sure your code works, ultimately you are somewhat hostage to
the compiler. The average programmer has no skills in that area, and they
should simply not make a greater mess by altering their design to accommodate
someone else's flaw. These "chains of flaws" go on for years. What is really
scary is that eventually people's code *depends* upon the flaw, and that - to
me at least - is unacceptable.

As a matter of personal pride, I refuse to "kludge" up my code to fix bugs in
other people's code. Readable code is "un-kludged" code. If possible, I will
hunt down the bug and fix it. If that is not possible, I will either rewrite
the code to not trigger the bug, or a patch will be placed in a separate file
to check for processor type.

Have a great day!
T.J.
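One way to read "a patch placed in a separate file to check for processor type" is sketched below. It assumes the compiler predefines `__arm__` or `__aarch64__` on ARM targets, as GCC and Clang do. The macro `ARCH_IS_ARM`, the function `copy_bytes` and the idea that `memcpy` is the suspect call are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Contents of a hypothetical arch_quirks.c: the only file that knows
 * which processor family is being targeted. */
#if defined(__arm__) || defined(__aarch64__)
#define ARCH_IS_ARM 1
#else
#define ARCH_IS_ARM 0
#endif

/* Hypothetical entry point. The main code calls this and never sees the
 * #if, so it stays readable. Returns the number of bytes copied. */
size_t copy_bytes(void *dst, const void *src, size_t len)
{
#if ARCH_IS_ARM
    /* Workaround path (assumption: the library call misbehaves here). */
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
#else
    memcpy(dst, src, len);
#endif
    return len;
}
```

The design point is the one made in the message: the kludge lives in one small file that can be deleted once the toolchain is fixed, instead of being scattered through the main code.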
[Dng] OT - It may be only one file, but it does point to the bigger problem
> My philosophy as a free software author is this: The buck stops with
> me. If my software screws up, it's my fault and my responsibility to
> fix, regardless of whether the actual root cause is in code I wrote or
> a tool I use.
>
> If I were having problems with two different compilers treating my code
> two different ways, I'd #ifdef the hell out of it to kludge it back to
> working order on both.
>
> But that's just me. I've seen a lot of free software authors say "hey,
> it's not my fault, it's the __ library or tool." Doesn't help the
> user a heck of a lot.
>
> SteveT

That's a fair point, in an overall sense, Steve. I'm afraid as a matter of
practicality, I must disagree.

Debugging on a compiler is a very specific skill-set. Asking someone who
doesn't do that every day to fix what is probably a compiler bug is asking a
lot - especially when you may have to venture into the realm of processor
mnemonics and specific registers to fix the problem.

In my opinion, that is especially relevant when dealing with ARM because
there are so many makers of ARM processors with specific tweaks.

T.J.
Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!
On Mon, Feb 23, 2015 at 11:47:16AM +0100, Didier Kryn wrote:
> As far as I understand, COW means that the whole file is
> rewritten every time you change a single byte in it (or is it only
> some "extent"?). That's a real mess when you are continuously
> appending to files hundreds of megabytes large, which is the job of
> a log server.

No, only a single block. This is sometimes unwanted as it causes
fragmentation -- your nice contiguous extents will split into small
page/leaf-sized blocks all around, but NOCOW is still a terrible idea.

It breaks pretty much all reasons one might want btrfs over an old-style
filesystem (other than compression and checksums). NOCOW breaks the semantics
behind reflinks and snapshots, which means you can't use them for cloning
stuff, backups, etc., anymore. Thus, every single program that uses NOCOW
without an explicit request from the admin is broken and shouldn't be used
anywhere near btrfs.

> If you happen to lose the log files, you don't lose precious data.

If you have two clones, writing to one will overwrite the other. If you try
to roll back to an old snapshot, whether for forensic or data recovery
reasons, the log is lost.

> Nevertheless I would rather use a different filesystem for /var for
> example and keep btrfs for /usr and /home.

Having all dpkg-managed files (i.e., / except /home, /srv, perhaps /var/cache
and friends if you micromanage) on a single btrfs subvolume is required for
proper atomic snapshots.

--
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets. Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable and Non-Discriminatory prices.
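The two claims in the message above ("only a single block" is rewritten, and with NOCOW "writing to one clone will overwrite the other") can be illustrated with a toy in-memory model. It is a sketch of the behaviour as the message describes it, not of btrfs internals, which may handle shared extents differently. Every name here (`struct pool`, `struct file`, `file_clone`, `file_write_byte`, and so on) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE  8
#define POOL_BLOCKS 16
#define FILE_BLOCKS 4

/* Toy block store with one reference count per block. */
struct pool {
    char data[POOL_BLOCKS][BLOCK_SIZE];
    int  refs[POOL_BLOCKS];
};

/* A "file" is a list of block indices plus a per-file NOCOW flag. */
struct file {
    int block[FILE_BLOCKS];
    int nocow;
};

/* Returns the index of a free, zeroed block, or -1 if the pool is full. */
int pool_alloc(struct pool *p)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (p->refs[i] == 0) {
            p->refs[i] = 1;
            memset(p->data[i], 0, BLOCK_SIZE);
            return i;
        }
    }
    return -1;
}

int file_create(struct pool *p, struct file *f, int nocow)
{
    f->nocow = nocow;
    for (int i = 0; i < FILE_BLOCKS; i++) {
        f->block[i] = pool_alloc(p);
        if (f->block[i] < 0)
            return -1;
    }
    return 0;
}

/* A clone shares every block with its source; no data is copied. */
void file_clone(struct pool *p, const struct file *src, struct file *dst)
{
    *dst = *src;
    for (int i = 0; i < FILE_BLOCKS; i++)
        p->refs[src->block[i]]++;
}

/* COW: copy the one affected block to a new location, then modify it.
 * NOCOW: modify the block in place, even if a clone still points at it. */
int file_write_byte(struct pool *p, struct file *f, int blk, int off, char c)
{
    assert(blk >= 0 && blk < FILE_BLOCKS && off >= 0 && off < BLOCK_SIZE);
    int b = f->block[blk];
    if (!f->nocow) {
        int nb = pool_alloc(p);
        if (nb < 0)
            return -1;
        memcpy(p->data[nb], p->data[b], BLOCK_SIZE);
        p->refs[b]--;            /* drop this file's reference to the old block */
        f->block[blk] = nb;
        b = nb;
    }
    p->data[b][off] = c;
    return 0;
}

char file_read_byte(const struct pool *p, const struct file *f, int blk, int off)
{
    return p->data[f->block[blk]][off];
}
```

In this model a COW write moves exactly one block to a new index, which is also where the fragmentation mentioned in the message comes from, while a NOCOW write changes what every clone sharing that block reads.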
Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!
On 22/02/2015 19:28, Jim Murphy wrote:
> Hi,
>
> First let me make it clear I'm not a fan of either systemd or journald.
>
> I've been watching the "linux-btrfs" mailing list, when the following
> subject popped up a few days ago:
>
> Systemd 219 now sets the special FS_NOCOW file flag for its journal
> files, possibly breaking RAID repairs.[1]
>
> From what I can glean from the thread and from "[systemd-devel]
> [ANNOUNCE] systemd 219"[2] the concern is for the ability of btrfs to
> recover the systemd-journald file if it becomes corrupted.
>
> Poettering seems to be concerned about write speed, the reason for
> setting FS_NOCOW in the first place. I wonder if the speed issue is
> due to the fact that his team are all developing on systems with SSDs.
>
> There was also the statement that the way FS_NOCOW is set, it only
> involves the one file and not the filesystem itself. I didn't see
> anything that contradicted that statement, but I could have missed it.
>
> Part of the discussion:
>
> >> btrfs checksumming theoretically allows you to transparently recover
> >> after media corruption if filesystem has redundancy (more than one
> >> copy of data). Journald checksum will probably detect corruption, but
> >> can it repair it?
> > No it cannot.
> > But btrfs checksumming cannot fix things for you either if you lose
> > non-trivial amounts of data. It might be able to fix a few bits of
> > errors, but not non-trivial amounts. I mean, that's a simple property
> > of error correction codes: the more you want to be able to correct the
> > longer must your checksum be. Neither btrfs' nor journald's are
> > substantial enough to correct even a sector...
> > Lennart
>
> If I have a btrfs mirror and I didn't mess with it by setting FS_NOCOW,
> shouldn't I be able to recover the file? I would sure hope so. He
> creates this "better" way of logging, then he seems to not even care if
> you can use it.
>
> Systemd, to me, is a horror story. The more I read the scarier it gets.
> At the very beginning of the 219 Lennart announcement you find this:
>
> > Note that this version is not available in Fedora F22/F23 yet. The
> > linker on ARM segfaults. Since the i386 and x86_64 versions built
> > fine, I decided to release 219 anyway.
>
> Onward no matter what. Ready or not here systemd comes. We can only
> hope that, sooner rather than later, it catches up with them and bites
> them, you know where.
>
> [1] The archive for the thread starts here:
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/43187
>
> [2] The actual Systemd 219 announcement and LONG discussion can be
> found here:
> http://lists.freedesktop.org/archives/systemd-devel/2015-February/028447.html
>
> Just another 2¢ in the pot. Has anyone been keeping track of how much
> is in the pot? :-)
>
> Jim

Hi Jim.

As far as I understand, COW means that the whole file is rewritten every time
you change a single byte in it (or is it only some "extent"?). That's a real
mess when you are continuously appending to files hundreds of megabytes
large, which is the job of a log server.

I have played with the filesystem bits, wanting to try automatic compression
without forcing it by default, and, for sure, they can be set per file. And I
doubt it would affect the filesystem's journal.

The NOCOW bit for log files therefore makes sense. If you happen to lose the
log files, you don't lose precious data.

Nevertheless I would rather use a different filesystem for /var for example
and keep btrfs for /usr and /home.

Didier
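For readers wondering what "set per file" looks like in code, the following is a minimal Linux-only sketch. It uses the generic inode-flag ioctls `FS_IOC_GETFLAGS` and `FS_IOC_SETFLAGS` and the `FS_NOCOW_FL` bit from `<linux/fs.h>`, which is the bit that `chattr +C` toggles. The helper name `set_nocow` is hypothetical, and this does not claim to be how systemd itself sets the flag. On filesystems without the attribute the call is expected to fail, and on btrfs the flag is generally documented as effective only for new or empty files.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Hypothetical helper: turn on the per-file NOCOW attribute for `path`.
 * Returns 0 on success, -1 on failure with errno left from the failing call. */
int set_nocow(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int flags = 0;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);   /* read current inode flags */
    if (rc == 0) {
        flags |= FS_NOCOW_FL;                      /* add the NOCOW bit */
        rc = ioctl(fd, FS_IOC_SETFLAGS, &flags);   /* write them back */
    }

    int saved = errno;                             /* close() may change errno */
    close(fd);
    errno = saved;
    return rc == 0 ? 0 : -1;
}
```

Only the named file's inode is touched, which matches the statement in the thread that the flag involves the one file and not the filesystem as a whole.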
[Dng] OT - It may be only one file, but it does point to the bigger problem!
> Systemd, to me, is a horror story. The more I read the scarier it gets.
> At the very beginning of the 219 Lennart announcement you find this:
> > Note that this version is not available in Fedora F22/F23 yet. The
> > linker on ARM segfaults. Since the i386 and x86_64 versions built
> > fine, I decided to release 219 anyway.

Systemd has its problems, I agree. However, that said, before you take
anyone - even Lennart - to task on such a comment, please consider
objectively that it may not be a code problem, but in fact a compiler
problem.

I am not familiar with the specifics of the situation, but I felt compelled
to mention that GCC has a long history of processor-specific problems, which
I have experienced firsthand. The only truth that I can be certain of from
reading this is that GCC works best only on x86 processors, and that has not
changed in nearly 2 decades. It is also true that a lot of open source code,
even the Linux kernel, presently only compiles properly on GCC, rather than
others such as Clang/LLVM.

> Onward no matter what. Ready or not here systemd comes. We can only
> hope that, sooner rather than later, it catches up with them and bites
> them, you know where.

In keeping with common sense and not hysteria, I hope they do fix things
eventually, but the truth is that compilers - regardless of language - can
be finicky beasts from one processor family to the next.

T.J.
[Dng] OT - It may be only one file, but it does point to the bigger problem!
Hi,

First let me make it clear I'm not a fan of either systemd or journald.

I've been watching the "linux-btrfs" mailing list, when the following subject
popped up a few days ago:

Systemd 219 now sets the special FS_NOCOW file flag for its journal files,
possibly breaking RAID repairs.[1]

From what I can glean from the thread and from "[systemd-devel] [ANNOUNCE]
systemd 219"[2] the concern is for the ability of btrfs to recover the
systemd-journald file if it becomes corrupted.

Poettering seems to be concerned about write speed, the reason for setting
FS_NOCOW in the first place. I wonder if the speed issue is due to the fact
that his team are all developing on systems with SSDs.

There was also the statement that the way FS_NOCOW is set, it only involves
the one file and not the filesystem itself. I didn't see anything that
contradicted that statement, but I could have missed it.

Part of the discussion:

>> btrfs checksumming theoretically allows you to transparently recover
>> after media corruption if filesystem has redundancy (more than one
>> copy of data). Journald checksum will probably detect corruption, but
>> can it repair it?
> No it cannot.
> But btrfs checksumming cannot fix things for you either if you lose
> non-trivial amounts of data. It might be able to fix a few bits of
> errors, but not non-trivial amounts. I mean, that's a simple property
> of error correction codes: the more you want to be able to correct the
> longer must your checksum be. Neither btrfs' nor journald's are
> substantial enough to correct even a sector...
> Lennart

If I have a btrfs mirror and I didn't mess with it by setting FS_NOCOW,
shouldn't I be able to recover the file? I would sure hope so. He creates
this "better" way of logging, then he seems to not even care if you can use
it.

Systemd, to me, is a horror story. The more I read the scarier it gets.
At the very beginning of the 219 Lennart announcement you find this:

> Note that this version is not available in Fedora F22/F23 yet. The
> linker on ARM segfaults. Since the i386 and x86_64 versions built
> fine, I decided to release 219 anyway.

Onward no matter what. Ready or not here systemd comes. We can only hope
that, sooner rather than later, it catches up with them and bites them, you
know where.

[1] The archive for the thread starts here:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/43187

[2] The actual Systemd 219 announcement and LONG discussion can be found
here:
http://lists.freedesktop.org/archives/systemd-devel/2015-February/028447.html

Just another 2¢ in the pot. Has anyone been keeping track of how much is in
the pot? :-)

Jim
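The disagreement quoted in this message turns on the difference between a checksum that corrects errors and a checksum that merely identifies which of two copies is still good. The sketch below illustrates the second idea for a mirrored block. It is a toy model under stated assumptions: the checksum is a simple Fletcher-style stand-in and not the one btrfs uses, and `struct mirror`, `mirror_write` and `mirror_read` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Stand-in checksum (Fletcher-16 style). It can detect that a block
 * changed, but it carries far too little information to rebuild one. */
uint16_t block_checksum(const unsigned char *data, size_t len)
{
    uint16_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (uint16_t)((a + data[i]) % 255);
        b = (uint16_t)((b + a) % 255);
    }
    return (uint16_t)((b << 8) | a);
}

/* One logical block stored twice, plus the checksum recorded at write time. */
struct mirror {
    unsigned char copy[2][BLOCK_SIZE];
    uint16_t sum;
};

void mirror_write(struct mirror *m, const unsigned char *data)
{
    memcpy(m->copy[0], data, BLOCK_SIZE);
    memcpy(m->copy[1], data, BLOCK_SIZE);
    m->sum = block_checksum(data, BLOCK_SIZE);
}

/* Return the first copy whose checksum still matches and overwrite the
 * other copy with it. Returns 0 on success, -1 if both copies are bad. */
int mirror_read(struct mirror *m, unsigned char *out)
{
    for (int i = 0; i < 2; i++) {
        if (block_checksum(m->copy[i], BLOCK_SIZE) == m->sum) {
            memcpy(m->copy[1 - i], m->copy[i], BLOCK_SIZE);  /* repair */
            memcpy(out, m->copy[i], BLOCK_SIZE);
            return 0;
        }
    }
    return -1;
}
```

In this model the repair comes entirely from the redundant copy, with the checksum only acting as the referee. With a single copy, or with both copies damaged, the checksum alone can report the corruption but cannot undo it.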