Ian Jackson <ijack...@chiark.greenend.org.uk> wrote:
> Hi.  I'm sorry that something I had a hand in is causing you an
> inconvenience.
>
> I'm afraid it's not clear to me what "working" and "non-working"
> behaviour from your example program is.  I don't feel I can reply
> comprehensively, so I will comment on some of the details in your
> message.
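To clarify, the example boils down to a loop like this (a simplified
sketch for illustration, not the exact script attached to the bug
report):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # tail a log file which another process appends to
  open(my $fh, '<', $ARGV[0]) or die "open: $!";
  while (1) {
      while (defined(my $line = readline($fh))) {
          print $line;
      }
      # transient EOF: the writer may append more data shortly.
      # With the #1016369 change, readline keeps returning undef
      # here even though -s reports the file growing:
      warn 'file size is now ', -s $fh, " bytes\n";
      sleep 1;
  }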
With each iteration of the loop in my example, the file size
increases as shown by `-s', yet readline isn't returning any data
after it sees a transient EOF.

> Eric Wong writes ("Re: Bug#1040947: perl: readline fails to detect updated
> file"):
> > Both can be data loss bugs, but checking a log file which is
> > being written to by another process is a much more common
> > occurrence than FS failures[1] (or attempting readline on a
> > directory (as given in the #1016369 example))
>
> AFAICT you are saying that the fix to #1016369 broke a program which
> was tailing a logfile.  I agree that one should be able to tail a
> logfile in perl.  I don't think I have a complete opinion about
> precisely what set of calls ought to be used to do that, but I would
> expect them to mirror the calls needed in C with stdio.

Right, and C stdio.h is similarly tricky when it comes to properly
dealing with errors, too.  I often end up using unistd.h read(2) or
Perl sysread directly for stuff I really care about, combined with
checking stat(2) st_size to ensure I've read everything.

Data I care about tends to have checksumming built into its data
format (git objects/packs, gzipped texts/tarballs, FLAC audio).
Uncompressed log files are transient data that ceases to be relevant
after a short time.

> > Since this is Perl and TIMTOWTDI, I've never used IO::Handle->error;
> > instead I always check defined-ness on each critical return value and
> > also enable Perl warnings to catch undefined return values.
> > I've never used `eof' checks, either; checking the `chomp' result
> > can ensure proper termination of lines to detect truncated reads.
>
> AFAICT you are saying that you have always treated an undef value
> from line-reading operations as EOF, and never checked for error.
> I think that is erroneous.

Maybe so, though most of the reads I do are less critical than
writes.  A failed read means data is *already* lost and there's
nothing one can do about it; a failed write can be retried on a
different FS or rolled back.  IME, write errors are far more common
(but perhaps that's because I have errors=remount-ro in all my
fstabs).

In the case of an application's log file, the application is already
toast if there's any I/O error; thus any monitoring of
application-level log files would cease to be relevant.

> That IO errors are rare doesn't mean they oughtn't to be checked for.
> Reliable software must check for IO errors and not assume that undef
> means EOF.
>
> I believe perl's autodie gets this wrong, which is very unfortunate.

Right, autodie doesn't appear to handle readline at all.

> > [1] yes, my early (by my standards) upgrade to bookworm was triggered
> >     by an SSD failure, but SSD failures aren't a common occurrence
> >     compared to tailing a log file.
>
> I don't think this is the right tradeoff calculus.
>
> *With* the fix to #1016369 it is *possible* to write a reliable
> program, but some buggy programs lose data more often.
>
> *Without* the fix to #1016369 it is completely impossible to write a
> reliable program.

For reliable programs (e.g. file servers), one must check expected
vs. actual bytes read; that pattern (sketched below) can be applied
regardless of #1016369.
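For instance, a sketch of the sysread + st_size pattern I mean,
assuming the file isn't being appended to mid-read (the sub name is
mine, purely illustrative):

  use strict;
  use warnings;

  # read a whole file via sysread and verify the byte count
  # against stat(2) st_size:
  sub slurp_checked {
      my ($path) = @_;
      open(my $fh, '<', $path) or die "open $path: $!";
      my $expect = (stat($fh))[7]; # st_size
      my ($buf, $total) = ('', 0);
      while (1) {
          my $r = sysread($fh, $buf, 65536, $total);
          die "read $path: $!" unless defined $r; # error, not EOF
          last if $r == 0; # real EOF
          $total += $r;
      }
      $total == $expect or die "short read: $total != $expect";
      return \$buf;
  }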
> Having said all that, I don't see why the *eof* indicator ought to
> have to persist.  It is only the *errors* that mustn't get lost.  So
> I think it might be possible for perl to have behaviour that would
> make it possible to write reliable programs, while still helping
> buggy programs fail less often.

Right; EOF indicators should be transient for regular files.  It's
wrong to consider EOF a permanent condition on regular files.

> But, even if that's possible, I'm not sure that it's a good idea.
> Buggy programs that lose data only in exceptional error conditions
> are a menace.  Much better to make such buggy programs malfunction
> all the time - then they will be found and fixed.

This mentality of breaking imperfect-but-practically-working code in
favor of some perfect ideal is damaging to Perl (based on my
experience with changes to Ruby driving away users).

Fwiw, using `strace -P $PATH -e inject=syscall...' to inject errors
for certain paths, both gawk and mawk fail as expected when a read
from STDIN fails:

  echo hello >x   # create a file named `x'
  strace -P x -e inject=read:error=EIO gawk '{ print }' <x
  # exits with 2

However, neither Perl 5.36 nor 5.32.1 detects EIO on STDIN:

  strace -P x -e inject=read:error=EIO perl -ane '{ print $_ }' <x
  # exits with 0, even on Perl 5.36 in bookworm

At this point (given Perl's maturity), it would be less surprising if
it kept its lack of error detection in all cases.
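That said, I believe a Perl reader can still detect the injected EIO
itself by checking IO::Handle->error once readline returns undef;
something like the following sketch should fail under the same strace
invocation (the exit code merely mimics gawk; I haven't audited every
PerlIO layer for how reliably the error flag gets set):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use IO::Handle; # for the ->error method

  while (defined(my $line = readline(STDIN))) {
      print $line;
  }
  # readline returned undef: distinguish real EOF from a read error
  if (STDIN->error) {
      warn "read error on STDIN: $!\n";
      exit 2; # exit code chosen to mirror gawk, arbitrarily
  }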