Re: [Dng] OT - It may be only one file, but it does point to the bigger problem

2015-02-24 Thread Steve Litt
On Mon, 23 Feb 2015 16:46:34 -0600
T.J. Duchene t.j.duch...@gmail.com wrote:

 
 
  My philosophy as a free software author is this: The buck stops
  with me. If my software screws up, it's my fault and my
  responsibility to fix, regardless of whether the actual root cause
  is in code I wrote or a tool I use.
  
  If I were having problems with two different compilers treating my
  code two different ways, I'd #ifdef the hell out of it to kludge it
  back to working order on both.
  
  But that's just me. I've seen a lot of free software authors say
  "hey, it's not my fault, it's the __ library or tool." Doesn't
  help the user a heck of a lot.
  
  SteveT
  
 
 That's a fair point, in an overall sense, Steve.  I'm afraid as a
 matter of practicality, I must disagree.
 
 Debugging on a compiler is a very specific skill-set.  Asking someone
 who doesn't do that every day to fix what is probably a compiler bug
 is asking a lot - especially when you may have to venture into the
 realm of processor mnemonics and specific registers to fix the
 problem.
 
 In my opinion, that is especially relevant when dealing with ARM
 because there are so many makers of ARM processors with specific
 tweaks.
 
 T.J.

Ahhh, now we're on my turf: Troubleshooting. If ARM restricts your
choice of compilers, then I'll agree with you vis-a-vis ARM, sort of.

For the wider application of my philosophy, it's amazing how little
subject matter expertise (in this case tracing a compiler all the way
down to instructions and registers) one needs in order to troubleshoot
very effectively. 

Just as one example, in my classes I teach the power of having one
system malfunctioning and one not malfunctioning. You can keep making
each more like the other until you can toggle the symptom with one
statement. I call it "exploit the differences", and it's very powerful.

So, let's say that I can narrow it down to (just to pull an imaginary
example out of the air) clang crashing on memset() while gcc doesn't.
Obviously, I'd better be sure the locale is the same on both. The next
step could be writing the simplest case that does nothing but a memset(),
and seeing if it still crashes on memset(). If so, then I could write my
own memset(), see if that crashes, and investigate why. Eventually
perhaps, on clang, I could #ifdef in my own memset(). Or, if I have the
skills, I could trace memset() into assembler.
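
A minimal sketch of that simplest case and the #ifdef workaround might
look like the following. The memset() crash is the imaginary example
from above, and my_memset, SAFE_MEMSET and simplest_case are
hypothetical names made up for illustration; __clang__ is the macro
that clang predefines.

```c
/* Sketch only: a "simplest case" plus an #ifdef workaround.
 * my_memset, SAFE_MEMSET and simplest_case are hypothetical names. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written replacement: a plain byte loop. The volatile pointer keeps
 * the compiler from turning the loop back into a call to memset(). */
void *my_memset(void *dst, int value, size_t len)
{
    volatile unsigned char *p = dst;
    for (size_t i = 0; i < len; i++)
        p[i] = (unsigned char)value;
    return dst;
}

/* Toggle the symptom with one statement: clang gets the workaround,
 * every other compiler keeps the library routine. */
#ifdef __clang__
#define SAFE_MEMSET(dst, value, len) my_memset((dst), (value), (len))
#else
#define SAFE_MEMSET(dst, value, len) memset((dst), (value), (len))
#endif

/* The simplest case: nothing but one fill and a check of the result.
 * Returns 1 if every byte holds the expected value, 0 otherwise. */
int simplest_case(void)
{
    unsigned char buf[64];
    SAFE_MEMSET(buf, 0xAB, sizeof buf);
    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] != 0xAB)
            return 0;
    return 1;
}
```

Calling simplest_case() from main() under each compiler, then flipping
the #ifdef, shows whether the symptom follows the library routine or
stays with the surrounding code.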

Perhaps a single memset wouldn't reproduce the symptom. I can then keep
reducing the program until I get the smallest program that can
reproduce the symptom, and experiment with that. And of course, the
most likely scenario will be that it's *my* bad code, not the
compiler's, but even if I can prove it's the compiler's, I can report
the problem and work around it while I wait for the compiler guys to
fix their compiler.

When I find a situation of unexpected behavior with a library or tool,
I usually just work around it and report it to the library devs. The
last thing the user needs is me and the library devs pointing fingers
at each other.

SteveT

Steve Litt  *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance

___
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng


Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!

2015-02-24 Thread Nuno Magalhães
On Tue, Feb 24, 2015 at 12:57 AM, Noel Torres env...@rolamasao.org wrote:
 We have RAID tools like mdadm for RAID, and filesystems like ext4 or Reiserfs
 for file storage.

 Why would I want a tool combining both?

You'd want one so you can, for instance, avoid a RAID5 write hole. ZFS
seems pretty cool; the only downside I see is perhaps more
fragmentation than other systems.

Cheers,
Nuno


Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!

2015-02-23 Thread Adam Borowski
On Mon, Feb 23, 2015 at 11:47:16AM +0100, Didier Kryn wrote:
 As far as I understand, COW means that the whole file is
 rewritten every time you change a single byte in it (or is it only
 some extent?). That's a real mess when you are continuously
 appending to files hundreds of megabytes large, which is the job of
 a log server.

No, only a single block.  This is sometimes unwanted as it causes
fragmentation -- your nice contiguous extents will split into small
page/leaf-sized blocks all around, but NOCOW is still a terrible idea.
It breaks pretty much all reasons one might want btrfs over an old-style
filesystem (other than compression and checksums).
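
To put a number on "only a single block", here is a rough sketch of the
arithmetic. It assumes fixed-size 4096-byte blocks (an assumption for
illustration; the real size depends on the filesystem) and ignores the
metadata that a COW filesystem also has to update. blocks_touched is a
hypothetical helper, not a btrfs interface.

```c
/* Sketch only: how many data blocks a write touches.
 * BLOCK_SIZE is an assumed value; blocks_touched is a hypothetical name. */
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u

/* Number of blocks that must be copied when `len` bytes are written
 * starting at byte `offset`. A zero-length write touches nothing. */
uint64_t blocks_touched(uint64_t offset, uint64_t len)
{
    if (len == 0)
        return 0;
    uint64_t first = offset / BLOCK_SIZE;
    uint64_t last = (offset + len - 1) / BLOCK_SIZE;
    return last - first + 1;
}
```

Under these assumptions, changing one byte in the middle of a 300 MB log
rewrites one block, and a two-byte write that straddles a block boundary
rewrites two.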

NOCOW breaks the semantics behind reflinks and snapshots, which means you
can't use them for cloning stuff, backups, etc., anymore.  Thus, every single
program that uses NOCOW without an explicit request from the admin is broken
and shouldn't be used anywhere near btrfs.

 If you happen to lose the log files, you don't lose precious data.

If you have two clones, writing to one will overwrite the other.  If you try
to roll back to an old snapshot, whether for forensic or data recovery
reasons, the log is lost.

 Nevertheless I would rather use a different filesystem for /var for
 example and keep btrfs for /usr and /home.

Having all dpkg-managed files (i.e., / except /home, /srv, perhaps /var/cache
and friends if you micromanage) on a single btrfs subvolume is required for
proper atomic snapshots.

-- 
// If you believe in so-called intellectual property, please immediately
// cease using counterfeit alphabets.  Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable and Non-Discriminatory prices.


Re: [Dng] OT - It may be only one file, but it does point to the bigger problem!

2015-02-23 Thread Noel Torres
On Sunday, 22 February 2015 18:28:06 Jim Murphy wrote:
[...]
 If I have a btrfs mirror and I didn't mess with it by setting FS_NOCOW,
 shouldn't I be able to recover the file?  I would sure hope so.  He
 creates this better way of logging, then he seems to not even care if
 you can use it.

Isn't btrfs contrary to KISS?

We have RAID tools like mdadm for RAID, and filesystems like ext4 or Reiserfs 
for file storage.

Why would I want a tool combining both?

er Envite
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

OpenPGP key: 1586 50C8 7DBF B050 DE62  EA12 70B4 00F3 EEC7 C372

Spiral galaxies always have at least TWO arms.




Re: [Dng] OT - It may be only one file, but it does point to the bigger problem

2015-02-23 Thread T.J. Duchene
On Monday, February 23, 2015 04:46:34 PM you wrote:
  My philosophy as a free software author is this: The buck stops with
  me. If my software screws up, it's my fault and my responsibility to
  fix, regardless of whether the actual root cause is in code I wrote or
  a tool I use.
  
  If I were having problems with two different compilers treating my code
  two different ways, I'd #ifdef the hell out of it to kludge it back to
  working order on both.
  
  But that's just me. I've seen a lot of free software authors say "hey,
  it's not my fault, it's the __ library or tool." Doesn't help the
  user a heck of a lot.
  
  SteveT
 
 That's a fair point, in an overall sense, Steve.  I'm afraid as a matter of
 practicality, I must disagree.
 
 Debugging on a compiler is a very specific skill-set.  Asking someone who
 doesn't do that every day to fix what is probably a compiler bug is asking a
 lot - especially when you may have to venture into the realm of processor
 mnemonics and specific registers to fix the problem.
 
 In my opinion, that is especially relevant when dealing with ARM because
 there are so many makers of ARM processors with specific tweaks.
 
 T.J.


I realize I should have spoken more clearly, and for that I apologize.  I'll
endeavor to be clearer in the future.

What I was trying to say is that, while I agree that you should make every
effort to make sure your code works, ultimately you are somewhat hostage to
the compiler.  The average programmer has no skills in that area, and they
should simply not make a greater mess by altering their design to accommodate
someone else's flaw.  These chains of flaws go on for years.  What is really
scary is that eventually people's code *depends* upon the flaw, and that - to
me at least - is unacceptable.


As a matter of personal pride, I refuse to kludge up my code to fix bugs in 
other people's code.  Readable code is un-kludged code.


If possible, I will hunt down the bug and fix it.  If that is not possible, I 
will either rewrite the code to not trigger the bug, or a patch will be placed 
in a separate file to check for processor type.  
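
One way to keep such a processor check in a separate file, so the main
code stays un-kludged, is sketched below. It assumes GCC or Clang, which
predefine __arm__, __aarch64__, __i386__ and __x86_64__; the file name
arch_quirks.c and the names TARGET_IS_ARM, target_family and
check_target_consistency are hypothetical.

```c
/* arch_quirks.c (hypothetical file name): all processor checks live here
 * so the rest of the code base never tests architecture macros directly. */
#include <assert.h>
#include <string.h>

#if defined(__arm__) || defined(__aarch64__)
#define TARGET_IS_ARM 1
#else
#define TARGET_IS_ARM 0
#endif

/* Name of the processor family this file was compiled for. */
const char *target_family(void)
{
#if TARGET_IS_ARM
    return "arm";
#elif defined(__i386__) || defined(__x86_64__)
    return "x86";
#else
    return "other";
#endif
}

/* Self-check: the flag and the family name must agree. */
void check_target_consistency(void)
{
    assert(TARGET_IS_ARM == (strcmp(target_family(), "arm") == 0));
}
```

Any workaround for a processor-specific bug can then be guarded by
TARGET_IS_ARM inside this one file and deleted in one place once the
toolchain is fixed.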

Have a great day!
T.J.



[Dng] OT - It may be only one file, but it does point to the bigger problem!

2015-02-22 Thread Jim Murphy
Hi,

First let me make it clear I'm not a fan of either systemd or journald.

I've been watching the linux-btrfs mailing list, when the following
subject popped up a few days ago:

"Systemd 219 now sets the special FS_NOCOW file flag for its journal
   files, possibly breaking RAID repairs."[1]

From what I can glean from the thread and from "[systemd-devel]
[ANNOUNCE] systemd 219"[2], the concern is for the ability of btrfs to
recover the systemd-journald file if it becomes corrupted.  Poettering
seems to be concerned about write speed, the reason for setting
FS_NOCOW in the first place.  I wonder if the speed issue is due to the
fact that his team are all developing on systems with SSDs.  There was
also the statement that the way FS_NOCOW is set, it only involves the
one file and not the filesystem itself.  I didn't see anything that
contradicted that statement, but I could have missed it.
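
For reference, a per-file flag like this is set on Linux through the
FS_IOC_GETFLAGS / FS_IOC_SETFLAGS ioctls, with the constant spelled
FS_NOCOW_FL in <linux/fs.h>. The sketch below is not systemd's code:
set_nocow and create_nocow_file are hypothetical helpers, and treating a
failed ioctl as non-fatal is a choice made for this example. The flag is
set right after creation because chattr(1) advises that on btrfs it
should be applied to new or empty files.

```c
/* Sketch only: setting the per-file NOCOW attribute on Linux.
 * set_nocow and create_nocow_file are hypothetical helper names. */
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Returns 0 on success, -1 on failure with errno set by ioctl().
 * Filesystems without this attribute may reject the request. */
int set_nocow(int fd)
{
    int flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
        return -1;
    flags |= FS_NOCOW_FL;               /* touches this one file only */
    if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
        return -1;
    return 0;
}

/* Creates a new, empty file and tries to mark it NOCOW before any data
 * is written. Returns the open descriptor, or -1 if creation failed.
 * A failed set_nocow() is ignored here (best effort, by assumption). */
int create_nocow_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0640);
    if (fd < 0)
        return -1;
    (void)set_nocow(fd);
    return fd;
}
```

Nothing in this sketch changes mount options or other files, which is
consistent with the statement that only the one file is involved.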

Part of the discussion:

 btrfs checksumming theoretically allows you to transparently recover
 after media corruption if filesystem has redundancy (more than one
 copy of data). Journald checksum will probably detect corruption, but
 can it repair it?

 No it cannot.

 But btrfs checksumming cannot fix things for you either if you lose
 non-trivial amounts of data. It might be able to fix a few bits of
 errors, but not non-trivial amounts. I mean, that's a simple property
 of error correction codes: the more you want to be able to correct the
 longer must your checksum be. Neither btrfs' nor journald's are
 substantial enough to correct even a sector...

 Lennart

If I have a btrfs mirror and I didn't mess with it by setting FS_NOCOW,
shouldn't I be able to recover the file?  I would sure hope so.  He
creates this better way of logging, then he seems to not even care if
you can use it.
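
The distinction in that exchange can be sketched in a few lines: a
checksum only says which copy is bad, and the redundant copy is what
does the repairing. Everything here is a toy under stated assumptions:
toy_checksum is an Adler-32-style stand-in rather than the checksum
btrfs uses, a "mirror" is two in-memory copies of one 16-byte block, and
repair_mirror is a hypothetical helper.

```c
/* Sketch only: detect a bad copy with a checksum, repair it from a mirror.
 * toy_checksum, repair_mirror and BLOCK_LEN are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 16

/* Adler-32-style checksum: it can detect corruption, not correct it. */
uint32_t toy_checksum(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Compares both copies against the stored checksum and overwrites a bad
 * copy with the good one. Returns 0 if at least one copy was good (both
 * are good afterwards), -1 if neither copy matches `expected`. */
int repair_mirror(unsigned char *copy_a, unsigned char *copy_b,
                  uint32_t expected)
{
    int a_ok = toy_checksum(copy_a, BLOCK_LEN) == expected;
    int b_ok = toy_checksum(copy_b, BLOCK_LEN) == expected;

    if (a_ok && !b_ok)
        memcpy(copy_b, copy_a, BLOCK_LEN);
    else if (b_ok && !a_ok)
        memcpy(copy_a, copy_b, BLOCK_LEN);

    return (a_ok || b_ok) ? 0 : -1;
}
```

If neither copy matches the stored checksum, nothing can be repaired,
which is the case Lennart describes; with one intact copy the mirror can
be healed, which is the case I am asking about.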

Systemd, to me, is a horror story.  The more I read the scarier it gets.
At the very beginning of the 219 Lennart announcement you find this:

 Note that this version is not available in Fedora F22/F23 yet. The
 linker on ARM segfaults. Since the i386 and x86_64 versions built
 fine, I decided to release 219 anyway.

Onward no matter what.  Ready or not here systemd comes.  We can only
hope that, sooner rather than later, it catches up with them and bites
them, you know where.

[1] The archive for the thread starts here:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/43187

[2] The actual Systemd 219  announcement and LONG discussion can be
  found here:
http://lists.freedesktop.org/archives/systemd-devel/2015-February/028447.html

Just another 2¢ in the pot.  Has anyone been keeping track of how much
is in the pot? :-)

Jim


[Dng] OT - It may be only one file, but it does point to the bigger problem!

2015-02-22 Thread T.J. Duchene

 
 Systemd, to me, is a horror story.  The more I read the scarier it gets.
 
 At the very beginning of the 219 Lennart announcement you find this:
  Note that this version is not available in Fedora F22/F23 yet. The
  linker on ARM segfaults. Since the i386 and x86_64 versions built
  fine, I decided to release 219 anyway.



Systemd has its problems, I agree.

However, that said, before you take anyone - even Lennart - to task on such a 
comment, please consider objectively that it may not be a code problem, but in 
fact a compiler problem.  I am not familiar with the specifics of the 
situation, but I felt compelled to mention that GCC has a long history of 
processor specific problems, which I have experienced firsthand.  

The only truth that I can be certain of from reading this is that GCC works 
best only on x86 processors, and that has not changed in nearly 2 decades.
It is also true that a lot of open source code, even the Linux kernel, 
presently only compiles properly on GCC, rather than others such as 
Clang/LLVM. 



 
 Onward no matter what.  Ready or not here systemd comes.  We can only
 hope that, sooner rather than later, it catches up with them and bites
 them, you know where.

In keeping with common sense and not hysteria, I hope they do fix things
eventually, but the truth is that compilers - regardless of language - can be
finicky beasts from one processor family to the next.

T.J.
