bug#35939: version sort is incorrect with hyphen-minus
Vincent Lefevre writes ("Re: bug#35939: version sort is incorrect with hyphen-minus"):
> On 2019-06-26 18:40:50 -0700, Paul Eggert wrote:
> > Perhaps the coreutils manual could be improved to make this all clearer, and
> > perhaps it should refer to the Debian manual if it doesn't already.
>
> In this case, there should be a new ordering option to provide
> true numeric sort with strings mixing non-negative integers and
> characters.

I think the Debian algorithm is such an algorithm, but it has a
wrinkle which you are not expecting.  Here is the specification:
  https://www.debian.org/doc/debian-policy/ch-controlfields.html#version

Note in particular
 | The lexical comparison is a comparison of ASCII values modified so
 | that all the letters sort earlier than all the non-letters and so
 | that a tilde sorts before anything, even the end of a part

So in the Debian algorithm, `-' sorts after `a'.

I specified this rule.  I did it mainly because of versions like
`1.0beta3', which is probably a prerelease of `1.0' and therefore
earlier than `1.0.3'.  So `b' has to sort before `.' and my rule
seemed the simplest one to achieve that.  (The version comparison
algorithm is a tradeoff between complexity, and breadth of support for
people's then-existing practices.)

Nowadays Debian invariably writes `1.0~beta3' but when I invented this
scheme I did not include the (invaluable) `~' feature.

When this is extended to UTF-8, presumably the ordering should be an
ordering of Unicode scalar values, with the rule about letters
interpreted as referring to anything which Unicode considers a letter.

If you want to test the Debian algorithm and have access to a copy of
dpkg, you can append -1 to both strings to be the "Debian revision",
and prepend "1:" to be the "epoch", and then the middle part should be
compared the same way as sort -V etc.

Vincent, what is your use case for a comparison algorithm which is
like the Debian one but which sorts letters after punctuation ?

Ian.
-- 
Ian Jackson
These opinions are my own.  If I emailed you from an address @fyvzl.net or
@evade.org.uk, that is a private address which bypasses my fierce spamfilter.
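The ordering rule described in this message can be illustrated with a short Python sketch. It is not dpkg's code: the names `compare_upstream`, `_order` and `_is_digit` are made up for illustration, only the upstream part of a version is handled (no epoch, no Debian revision), and "letter" is assumed to mean an ASCII letter, since the extension to Unicode is only conjectured above.

```python
import functools


def _is_digit(c):
    # ASCII digits only; the empty string (end of input) is not a digit.
    return "0" <= c <= "9"


def _order(c):
    """Weight of one character inside a non-digit part ('' = end of part)."""
    if c == "~":
        return -1                 # tilde sorts before anything, even the end
    if c == "" or _is_digit(c):
        return 0                  # end of the non-digit part
    if c.isascii() and c.isalpha():
        return ord(c)             # letters sort earlier than all non-letters
    return ord(c) + 256           # everything else, e.g. '-' and '.'


def compare_upstream(a, b):
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit part: compare character by character with modified weights.
        while (i < len(a) and not _is_digit(a[i])) or (
            j < len(b) and not _is_digit(b[j])
        ):
            wa = _order(a[i] if i < len(a) else "")
            wb = _order(b[j] if j < len(b) else "")
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Digit part: compare numerically; a missing number counts as 0.
        si, sj = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


versions = ["1.0.3", "1.0", "1.0beta3", "1.0~beta3"]
print(sorted(versions, key=functools.cmp_to_key(compare_upstream)))
# prints ['1.0~beta3', '1.0', '1.0beta3', '1.0.3']
```

Under these assumptions `compare_upstream("a", "-")` is negative, which is the "`-' sorts after `a'" behaviour discussed in this bug.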
bug#35939: version sort is incorrect with hyphen-minus
Ian Jackson writes ("Re: bug#35939: version sort is incorrect with hyphen-minus"):
> Paul Eggert writes ("Re: bug#35939: version sort is incorrect with hyphen-minus"):
> > GNU sort uses the same algorithm as glibc strverscmp, and this algorithm has
> > changed only once since strverscmp was added to glibc in 1997. The change was
> > made in 2009, to fix this bug:
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=9913
> >
> > Has the Debian version-comparison algorithm changed since 1997? If so, could you
> > give details about the changes to the Debian algorithm? Perhaps glibc should be
> > changed to stay consistent with Debian.
>
> Debian introduced a special (and very useful) meaning for ~, many
> years ago now.
>
> I checked the Debian policy manual and according to its upgrading
> checklist this change was made in 2007.

I have just checked the manpage I have here for strverscmp and it is
far from clear to me that the algorithm described there, and the dpkg
algorithm, produce the same answers.  (Even disregarding ~, and the
fact that the specification of the dpkg algorithm is defined only over
a subset of possible strings even though the unique extension to UTF-8
strings is fairly obvious.)
bug#35939: version sort is incorrect with hyphen-minus
Paul Eggert writes ("Re: bug#35939: version sort is incorrect with hyphen-minus"):
> GNU sort uses the same algorithm as glibc strverscmp, and this algorithm has
> changed only once since strverscmp was added to glibc in 1997. The change was
> made in 2009, to fix this bug:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=9913
>
> Has the Debian version-comparison algorithm changed since 1997? If so, could you
> give details about the changes to the Debian algorithm? Perhaps glibc should be
> changed to stay consistent with Debian.

Debian introduced a special (and very useful) meaning for ~, many
years ago now.

I checked the Debian policy manual and according to its upgrading
checklist this change was made in 2007.

Ian.
bug#35939: version sort is incorrect with hyphen-minus
Assaf Gordon writes ("Re: bug#35939: version sort is incorrect with hyphen-minus"):
> Thanks for the report and the clear details.

Hi.  I haven't read the original report, but everything you say about
the behaviour of GNU coreutils and dpkg sounds correct.

This is perhaps an unfortunate wrinkle but I think it is right of
coreutils to use the "upstream part" of the dpkg algorithm.

> I hope this helps explain the differences (I also hope this explanation is
> correct, and I invite others to chime in).

I wonder if this could go in some manual somewhere.

Regards,
Ian.
bug#21760: timeout: Feature Request: --verbose ==> output if timeout
Ian Jackson writes ("Re: bug#21760: timeout: Feature Request: --verbose ==> output if timeout"): > Pádraig Brady writes ("Re: bug#21760: timeout: Feature Request: --verbose ==> > output if timeout"): > > timeout: aborting command 'blah' with signal SIGTERM > > timeout: aborting command 'blah' with signal SIGKILL > > Yes, please, that would do nicely. I guess we can have -v for this, as well as --verbose ? Ian.
bug#21760: timeout: Feature Request: --verbose ==> output if timeout
Pádraig Brady writes ("Re: bug#21760: timeout: Feature Request: --verbose ==> output if timeout"): > Thanks for detailing your arguments, and +2 for the phrase: > "runic shell circumlocutions will proliferate" :) YW :-). > So I'm leaning towards supporting --verbose which would output something like: > > timeout: aborting command 'blah' with signal SIGTERM > timeout: aborting command 'blah' with signal SIGKILL Yes, please, that would do nicely. Regards, Ian.
bug#21760: timeout: Feature Request: --verbose ==> output if timeout
I have to say that I find this bug thread quite perplexing.

It is completely normal for a GNU/Unix command line utility to print a
message to stderr in error cases.  Almost every program that exits
nonzero prints a message to stderr.

The normal convention in shell scripts (and other contexts where
commands are invoked) is to:
 * use the exit status to decide whether to continue executing
 * rely on the failing command to print a message to the script's
   stderr

The stderr error message from a failing command appears on the user's
terminal in a script run interactively; it appears in emailed logs
from cron; it can appear in logfiles; etc.

When I first discovered that GNU timeout(1) does not print an error
message when the timeout occurs, I was astonished.  IMO that ought to
have been the default behaviour.  Unfortunately that is too late to
fix now but we should at least have a one-letter option to request
behaviour compatible with normal shell programming conventions.

The alternative is that at most times when use of timeout is added to
some program or config file, the programmer/administrator will have to
write a clumsy shell circumlocution to arrange that an appropriate
message is sent to stderr.

These runic shell circumlocutions will proliferate.  They will have
bugs.  The bugs will propagate by cut-and-paste, followed by fixes for
the bugs.  Everyone's commands will become verbose and hard to
understand.

All of this could be prevented by simply providing a way to make
timeout print a message to stderr.

I guess I need to dispose of some of the potential problems which have
been advanced as counterarguments, even though to my mind they are
extremely weak.

A key observation I would make is that the arguments against
timeout(1) printing a message are fully general counterarguments
against _any_ program printing _any_ error message.  Surely that shows
that they can't be right.

> For example I don't like the N seconds, or N.012 more detailed
> output.  As soon as this is produced there will be other people
> trying to parse it.

Most of the people who are asking for this feature don't care exactly
what the message is.  It should mention the program which was invoked
and the fact that there was a timeout.  The exact format is
immaterial.

The purpose is not for it to be parsed, but for it to be read by
humans who are trying to debug something.  This is generally true of
error messages.

If anyone complains that they are trying to parse this error message
you can tell them not to be so silly.  There will be many fewer of
those than there will be people inconvenienced by the lack of a
message at all.

Likewise, if someone sends a patch to add more information to the
message, that is not a problem.  You can just accept it, or not, as
you like.

> BTW: timeout shares stdout/stderr with its child; therefore,
> wouldn't the interleaved output be problematic?

No.  The purpose is precisely to have the error report from timeout(1)
go to the same place as errors from the command are reported.

This is not a problem with any other adverbial command, of which there
are very many nowadays.  See for example xargs, fakeroot, faketime,
authbind, etc. etc.

> A good example of a possible problem due to the law of unintended
> consequences.

How bogglesome.  This "interleaving" is precisely the intended
consequence.  (Actually, what will normally happen is that the message
from timeout will follow all of the program's output.)

> And if this leads to the request for --output-fd=N to
> reroute file descriptors just to work around it then that is much too
> much and shouldn't be done.

Other adverbial commands have not had such requests and in general I
agree that they should be rejected.

If this is a problem then a shell rune can be used to replumb the fds.
That is a hypothetical
   timeout -v --output-fd=42 blah blah
can be replaced with
   timeout 3>&2 2>&42 -v sh -ec 'exec 2>&3 3>&- "$@"' x blah blah
(assuming fd 3 is not used for something else in $@).  This is a fully
general technique which can be deployed to implement any such minority
use case.

The main point is that "want it to print an error message if there is
an error" is not a minority use case.

Ian.
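To make the requested behaviour concrete, here is a rough Python sketch of a wrapper that reports the timeout on its own stderr, which is the same stream the child writes to. It is not how timeout(1) is implemented: the function name `run_with_timeout` is hypothetical, the message text is borrowed from the wording proposed earlier in this thread, and the exit status 124 is chosen to match what GNU timeout returns when the command times out.

```python
import subprocess
import sys


def run_with_timeout(argv, seconds):
    """Run argv; on timeout, say so on stderr, stop the child, return 124."""
    # The child inherits our stdout/stderr, so our message and the child's
    # own error output end up in the same place.
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        print(
            f"timeout: aborting command {argv[0]!r} with signal SIGTERM",
            file=sys.stderr,
        )
        proc.terminate()  # sends SIGTERM on POSIX systems
        proc.wait()       # reap the child; a real tool might escalate to SIGKILL
        return 124


if __name__ == "__main__":
    # Example: a 30 second sleep limited to half a second.
    status = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], 0.5
    )
    sys.exit(status)
```

A caller then only needs the usual convention described above: check the exit status, and let the message reach whatever stderr the script already has.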
bug#8700: Simple way to switch user/group permissions without requiring PAM sessions
Pádraig Brady writes (Re: bug#8700: Simple way to switch user/group permissions without requiring PAM sessions): On 05/19/2011 03:22 PM, Jim Meyering wrote: Colin Watson wrote: Every so often I wish that there existed (preferably in the Debian base system) a tool analogous to chroot that drops privileges from root to a nominated user, group, etc. and runs a given program. chiark-really (Source: chiark-utils) has really which can do this, but of course it's not in Debian base and being set-id for its other purpose it's probably not suitable. OTOH the code is trivial and the behaviour is I think exactly as desired. Ian.
Re: willing to contribute verrevcmp to gnulib?
AJ: the FSF want to put an implementation of the dpkg version comparison algorithm into gnulib. I approve of this, but the git logs say that the current version was supplied by you. There are some copyright questions. Jim Meyering writes (Re: willing to contribute verrevcmp to gnulib?): Bruno was noting that you are listed as the copyright holder. Listed by who ? In the source tarball ? I see the copyright notice at the top of the individual file has not been updated but this is not unusual. Obviously I'm more meticulous about this with my own projects but I'm not currently one of the dpkg maintainers. According to http://git.debian.org/?p=dpkg/dpkg.git;a=commitdiff;h=dba844bd36a18e6abb9e8fc6bb7eff5cb0de4347;hp=cb324560fd600a1a20cb7c930c025879c543e43a the implementation you want was written by Anthony Towns. I know it can seem silly, but it's better if the FSF is the copyright holder. Would you be willing to assign copyright to the FSF? As I understand it the purpose of assigning the copyright to the FSF is so that the FSF can take enforcement action against violators without needing to contact all the authors. But I think such an assignment would prevent the original author from enforcing the copyright themselves ? Is there some other arrangement that would allow either the author or the FSF to enforce the copyright ? For example the author could appoint the FSF or the SFLC as their attorney. This all seems quite a lot of fuss over 20 lines of code! Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: willing to contribute verrevcmp to gnulib?
Kamil Dudka writes (Re: willing to contribute verrevcmp to gnulib?):
> we use the extended version (handling ~ character), even slightly modified to
> work better with file names (with suffixes). The new (gnulib) filevercmp
> code is in attachment.

Sorry, I overlooked this message before.

Now I'm really confused.  You seem to have already rewritten it.  So
why are people asking for permission and/or copyright assignment ?

I haven't reviewed your implementation for correctness but I will if
you want me to.  Did you base your version on the Debian specification
at
  http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version
?

> /* TODO: copyright?

Who actually wrote that code ?  Actually, looking at it it seems to
have been created by copying and pasting Anthony Towns's verrevcmp and
reformatting it !

This is surely a ridiculous approach.  What if it needs to be extended
or modified in the future ?  You won't be able to just apply patches
from dpkg.  Reformatting code just for the sake of it is always wrong.

NB that this is a technical objection.  I'm sure that Anthony Towns,
just like any member of the Free Software world, would not try to use
copyright law to prevent someone from making stupid modifications to
their code.

On to a more useful conversation:

Can you explain in some more detail why you took the approach that you
did for file extensions ?  I'm not sure I understand the specific
effect.  Is it exactly equivalent to adjusting the lexical value of
the rightmost `.' in each string to be just greater than `~' ?

What about filenames like
   alice_1.63.orig.tar.gz     or    linux-2.6.16.21.tar.bz2
   alice_1.63-2.dsc                 linux-2.6.16.21.tar.bz2.sign
   alice_1.63-2.diff.gz             linux-2.6.16.21.tar.gz
                                    linux-2.6.16.21.tar.gz.sign
?

If you were to just compare whole filenames you get:
   alice_1.63-2.diff.gz       and   linux-2.6.16.21.tar.bz2
   alice_1.63-2.dsc                 linux-2.6.16.21.tar.bz2.sign
   alice_1.63.orig.tar.gz           linux-2.6.16.21.tar.gz
                                    linux-2.6.16.21.tar.gz.sign
which is for alice perhaps not ideal but your algorithm does this
   alice_1.63-2.diff .gz      and   linux-2.6.16.21.tar .bz2
   alice_1.63-2 .dsc                linux-2.6.16.21.tar .gz
   alice_1.63.orig.tar .gz          linux-2.6.16.21.tar.bz2 .sign
                                    linux-2.6.16.21.tar.gz .sign
which is just bizarre (spaces put in to show the cut point) and in the
RHS leads to odd results.

So I think this wrinkle may be doing more harm than good.  We might be
better off with a simpler algorithm.

Ian.
Re: willing to contribute verrevcmp to gnulib?
Ian Jackson writes (Re: willing to contribute verrevcmp to gnulib?):
> Did you base your version on the Debian specification
> at
>   http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version
> ?

And another thing: you should put a comment in it saying

 /*
  * This implements the algorithm for comparison of version strings
  * specified by Debian and now widely adopted.  The detailed
  * specification can be found in the Debian Policy Manual in the
  * section on the `Version' control field.  This version of the code
  * implements that from s5.6.12 of Debian Policy v3.8.0.1
  *   http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version
  */

or the moral equivalent.

Ian.
Re: willing to contribute verrevcmp to gnulib?
Jim Meyering writes (Re: willing to contribute verrevcmp to gnulib?):
> Ben Pfaff wrote:
> > Hi Ian.  The GNU gnulib project is considering adding a function
> > that would compare strings in the same way that dpkg compares
> > version numbers, as you can see in the thread here:
> >   http://permalink.gmane.org/gmane.comp.lib.gnulib.bugs/14693
> >
> > Is there a chance that you would be willing to contribute to the
> > FSF the actual code from dpkg for this?  (Otherwise, we will
> > probably have one person write a formal specification for the
> > version comparison algorithm, and then another person implement
> > something equivalent from that specification.)
> >
> > Thanks,
> >
> > Ben.
>
> Hi Ian,
>
> Perhaps you didn't see the message quoted above?
> We're about to re-implement verrevcmp from scratch,
> but you might be able to save us the trouble.

You're right, I didn't see it.  Thanks for the chase.

What is the licence on gnulib ?  I'd very probably be happy to
relicence.

Ian.
Re: willing to contribute verrevcmp to gnulib?
Ian Jackson writes (Re: willing to contribute verrevcmp to gnulib?):
> What is the licence on gnulib ?  I'd very probably be happy to
> relicence.

I see that there is a mixture of licenses in gnulib.

What precisely is the problem ?  Is it just that the version in dpkg
is GPLv2+ and you want something more liberal ?

I certainly think it is silly for the FSF to rewrite code for
licensing reasons when that code was written by a GNU Project member!

Ian.
Re: willing to contribute verrevcmp to gnulib?
Ian Jackson writes (Re: willing to contribute verrevcmp to gnulib?):
> What precisely is the problem ?  Is it just that the version in dpkg
> is GPLv2+ and you want something more liberal ?

And which files are we talking about ?

NB that the dpkg comparison algorithm was recently extended to support
a new character ~ which sorts before the empty string.  This work
wasn't done by me - but I approve of it and it should be in the gnulib
version too.

Ian.
Re: willing to contribute verrevcmp to gnulib?
Jim Meyering writes (Re: willing to contribute verrevcmp to gnulib?): Depending on the code: GPL, LGPL, LGPLv2+. For this function, LGPL or LGPLv2+ would be appropriate, since it's intended to replace glibc's strverscmp. Right. I hereby permit, insofar as I am able, the relicensing of dpkg's lib/vercmp.c, as LGPLv2+. The copy of the verrevcmp function (which is the one I think you probably want as you probably don't want the epoch and revision stuff which is rather Debian-package-specific) in dpkg 1.13.25 appears to have been rewritten. I don't remember rewriting it so I presume that the new implementation (the one currently in use) was written by someone else. Probably it was written by whoever did the `~' support. dpkg's git history should say. Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: willing to contribute verrevcmp to gnulib?
Also, ...

> Is there a chance that you would be willing to contribute to the
> FSF the actual code from dpkg for this?  (Otherwise, we will
> probably have one person write a formal specification for the
> version comparison algorithm, and then another person implement
> something equivalent from that specification.)

... even if someone does need to rewrite it, there is already a
specification in the Debian Policy Manual.

Ian.
Re: making GNU ls -i (--inode) work around the linux readdir bug
Jim Meyering writes (Re: making GNU ls -i (--inode) work around the linux readdir bug):
> Phillip Susi wrote:
> > EVERY application that invokes ls -i is affected.
>
> Please name one.

magicmirror

Ian.
Re: making GNU ls -i (--inode) work around the linux readdir bug
Paul Eggert writes (Re: making GNU ls -i (--inode) work around the linux readdir bug):
> Tony Finch writes:
> > Also, readdir(3) is not the only part of POSIX that needs clarifying.
>
> I participated in the discussion that resulted in this new d_ino
> wording being added to POSIX, and my recollection is that the common
> behavior where readdir returns the inode number of the underlying
> mount point is now considered to be a bug,

_Why_ is it considered a bug ?  Is it just that the members of the
relevant committee didn't understand what d_ino was for, and its
inherent limitations in the absence of the corresponding dev ?

Were there any examples of applications that are broken near a
mountpoint with the traditional behaviour but correct with the new
behaviour ?

> If there's any further question about this it might help to track
> down the relevant Austin Group discussion.

Do you have a reference ?

Ian.
Re: making GNU ls -i (--inode) work around the linux readdir bug
Jim Meyering writes (making GNU ls -i (--inode) work around the linux readdir bug): With a Linux-based kernel, GNU ls -i can list the wrong inode for a mount point. Ian Jackson raised this issue two years ago with http://bugs.debian.org/369822, and Wayne Pollock reported it last week via http://bugzilla.redhat.com/453709 This is not the issue I am complaining about. What I was complaining about is that ls -i was very slow because the optimisation had been disabled. That is to say you are proposing to fix my complaint by entrenching the thing I was complaining about. The plan is to test each non-root mount point at configure time by running a C program that calls readdir and lstat and compares the resulting inode numbers. If they ever mismatch, or the test fails for any other reason, disable the optimization whereby ls.c relies on readdir's POSIX-specified d_ino value rather than calling lstat for each directory entry. Note that this applies only to implicit arguments, i.e., not to names listed on the ls command-line. I think this is quite wrong. You should never disable this optimisation. Note that since ls -i does not print device numbers, the output is not really meaningful near mountpoints, since inode numbers are only unique within a device. All systems have traditionally behaved the way I want: that is, to return the inode number of the underlying masked mountpoint directory. Really, I don't care what number is returned and neither should anyone else. Are there _any_ even arguably correct programs which depend on the inode number there being `right' ? What I care about is that ls -i should be as fast as readdir. It always used to be. Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: making GNU ls -i (--inode) work around the linux readdir bug
Jim Meyering writes (Re: making GNU ls -i (--inode) work around the linux readdir bug):
> Ian Jackson wrote:
> > That is to say you are proposing to fix my complaint by entrenching
> > the thing I was complaining about.
>
> Yes, but only on a system where readdir- and lstat-reported
> inode numbers differ.

That is all systems.  All UN*X systems since the dawn of time have
behaved this way.

> I think correctness is important enough to sacrifice the optimization
> in this unusual corner-case usage of ls.

This is the whole _point_ of ls -i !

> (how often do you use ls -i? of those times, how often are there
> enough implicitly-listed files that you would notice a longer run time?)

Nearly all of the invocations of ls -i on my systems are by automated
processes which do not list all of the arguments and which depend on
the optimisation.

> I consider the stat-reported inode numbers to be authoritative.
> That ls -i prints other numbers _as a result of an optimization_
> feels disconcertingly like a bug.

I think you should get over this feeling.  It's not a bug.  It's
permitted by the specs and it is the way ls -i has always behaved in
every system not using the (broken) coreutils behaviour.

> > Note that since ls -i does not print device numbers, the output is
> > not really meaningful near mountpoints, since inode numbers are only
> > unique within a device.
>
> Perhaps no program relies on 'ls -i'-reported inode numbers (for
> implicitly-listed files) matching those reported by stat.  Not unlikely.
> But this is a subtle enough issue that I can imagine it causing
> trouble some day.

Those programs are already broken on every system which doesn't use
GNU ls.

The kind of program which uses ls -i is the kind which wants to
efficiently determine which files are the same as which other files.
Such programs depend on the knowledge that they aren't going to be
interfered with by mountpoints.  They also depend on the optimisation
for acceptable performance.
All systems have traditionally behaved the way I want: that is, to return the inode number of the underlying masked mountpoint directory. I've run experiments on Solaris 10 and FreeBSD 6, and see that they exhibit the same undesirable behavior, so this is not Linux-specific. So behaviour you consider `undesirable' is in fact the standard. Really, I don't care what number is returned and neither should anyone else. Are there _any_ even arguably correct programs which depend on the inode number there being `right' ? What I care about is that ls -i should be as fast as readdir. Why? and more importantly, Why should performance trump correctness? It's only incorrect in situations where using the inode number is incorrect anyway. You've failed to respond to my comments about the lack of the device number. Do you know of an application that uses ls -i and requires the performance of the stat-avoiding optimization? Yes! My Debian bug report even describes one! How do you think I discovered this problem ? It always used to be. No, GNU ls has not always used this optimization. Then GNU ls has always had this bug. By `always' I meant in UN*X. Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: making GNU ls -i (--inode) work around the linux readdir bug
Jim Meyering writes (Re: making GNU ls -i (--inode) work around the linux readdir bug): Ian Jackson [EMAIL PROTECTED] wrote: That is all systems. All UN*X systems since the dawn of time have behaved this way. Just because everyone does it doesn't make it right. In fact, since you yourself are referring to standards documents (which are supposed to document existing practice) to prove your point, yes, it does! Furthermore it _is_ right even in absolute terms. You have failed _again_ to respond to my point about the device number. Let me repeat it: When files are only in a single filesystem the inum is sufficient to uniquely identify a file. But when we consider more than one filesystem, the inum and device are needed together because inums on different filesystems are unrelated and may be (often are) the same. Thus any program which uses _only_ inums to tell files apart is broken if it works near a mountpoint. Either the documentation for or filesystem layout used by that program must ensure that mountpoints are not relevant, or the program must take extra special care somehow itself. ls -i prints only inums. So by using ls -i a program promises that mountpoints are not relevant. On many conventional filesystems, readdir is O(n) in the size of the directory but so is stat. So ls -i which does stat is O(n^2). Even on more recent filesystems with tree-structured directories, stat is O(log n) so a statting ls -i is O(n.log n) whereas a traditional ls -i is O(n). ls -i is the _only_ way to get coreutils to give you this listing in O(n). Even if a new interface was introduced to get the old behaviour it would not be backward compatible with existing software. Besides, according to this, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=369822#60 at least Cygwin 1.5.20 provides a readdir function that works the way I expect. I can't believe it! You're holding up Cygwin as an example of what the UN*X API should be! Am I in some kind of bad dream mirror universe ?! 
I'm sorry if my tone is rather strident but I'm just boggling. I think correctness is important enough to sacrifice the optimization in this unusual corner-case usage of ls. This is the whole _point_ of ls -i ! Being fast and inaccurate was never the point of ls -i. It's not `inaccurate'. It's perfectly accurate. Any existing correct program will behave correctly with the traditional ls -i. Besides, ls had the -i option long before d_ino was invented. I think you will find that this is false. d_ino has nearly always been there, although some libcs suppressed it. If there are tools for which the optimization is important enough, and ls gets a new option, then people will eventually update them to use the new option to enable the fast-and-loose behavior that is currently the default. I wrote such a tool myself, magicmirror. I'm sure there are others but I don't have any references right now. There are *no existing programs* and *no plausible correct programs* which depend on your new behaviour. Why break existing software for no benefit ? Or better still, maybe someone will fix Linux's getdents (the syscall behind readdir) to do the right thing even in the presence of mount points. This would be slow for many of the same reasons (although maybe not _as_ slow as doing stat for each entry). I hope and trust that kernel developers are more aware of the proper behaviour of the API than implied by your suggestion. Can you name _one_ UN*X system (Cygwin does _not_ count) which behaves the way you think is correct ? I insist: it is a bug. If I weren't convinced I wouldn't be spending time on it, now. If there is nothing I can say to change your mind then why are we having this conversation ? It's the permitted by the specs The old POSIX spec permitted anything. The soon-to-be-current version of POSIX has new wording: The value of the structure's d_ino member shall be set to the file serial number of the file named by the d_name member. 
If there is no caveat (I don't have the text here) then this is wrong. But adding new options to ls is a big deal, requiring more justification than I've seen so far. If you provide some actual details, like names of applications, along with performance comparisons, that may be enough. My own application magicmirror runs perfectly well without this alleged `fix'. The ls -i takes a negligible time compared to the rest of the program. With a stat on each call, the ls -i did not complete within the time I was willing to let it have (several hours IIRC). I don't presume to know all usage scenarios, so want the default behavior to favor correctness. What correct programs are broken by the traditional behaviour ? So behaviour you consider `undesirable' is in fact the standard. Ha! No. It just means they're all wrong. *boggle* Even if POSIX is adjusted or interpreted to allow their legacy misbehavior, I prefer
Re: Bug#369822: ls -i stats unnecessarily
Paul Eggert writes (Re: Bug#369822: ls -i stats unnecessarily): Ian Jackson [EMAIL PROTECTED] writes: This behaviour is expected: if you readdir the directory containing a mountpoint, you get the inode number of the directory in the underlying filesystem; That's not the behavior that I expected. Also, it's not useful behavior--at least, it's not useful for the vast majority of real-world applications. In contrast, it is useful for 'ls -i' to print the inode number of the root of the mounted file system, for 'find -inum' to use that inode number, and so forth. If you know it's a mounted filesystem, you can ask ls -id /mount/point/. instead of ls -id /mount/point If you don't know it's a mounted filesystem then the inode number of the mounted filesystem is useless to you. Eg, inum=`ls -id /unexpected/mount/point/.` find / -xdev -inum $inum isn't going to work properly ! I can understand why readdir might have the behavior that you describe: it might be more efficient internally. But that doesn't make it correct, or even expected. It's a bug in readdir. You might say that it's a deficiency in the readdir interface, as well as in ls -i, etc. etc., that it doesn't provide the dev as well as the inum. But however you look at it, the inum on its own isn't useful if you don't know either (a) there are no mountpoints here or (b) exactly where the mountpoints are. In case (a) you don't care about the distinction; in case (b) you can compensate with stat. Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Bug#369822: ls -i stats unnecessarily
Jim Meyering writes (Re: Bug#369822: ls -i stats unnecessarily): So at least Solaris 8 and some glibc are affected. Err, what's glibc got to do with it ? This behaviour is expected: if you readdir the directory containing a mountpoint, you get the inode number of the directory in the underlying filesystem; if you then stat the mountpoint, you get the inode number of the root of the filesystem mounted on top. There are I think two approaches to this problem: * find a list of mountpoints in some system-specific way for each one stat mountpoint/.. compare device and inode with those of the directory we're readdir'ing * provide an option to allow the user to specify that they don't mind the inode numbers of mountpoints being wrong The 2nd is easier and certainly less fragile, and since this no-stat optimisation is only necessary in some specialised applications (of which I happen to have an application where it's absolutely essential because statting each file takes far far too long), it's not that unreasonable to demand a special option. unless I find a better approach, I'll turn off this optimization by default, and add an option to turn it back on. Right. Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Bug#369822: ls -i stats unnecessarily
Ian Jackson writes (Re: Bug#369822: ls -i stats unnecessarily): There are I think two approaches to this problem: * find a list of mountpoints in some system-specific way for each one stat mountpoint/.. compare device and inode with those of the directory we're readdir'ing * provide an option to allow the user to specify that they don't mind the inode numbers of mountpoints being wrong Someone has just pointed out to me that no matter what you do, you don't get the dev for the covering filesystem. So returning the inum of the root of the covering fs is definitely wrong and should never be done. Think about it: if you ls -i anywhere near a mount point you're _inevitably_ going to get useless data because the output doesn't contain devs. So anyone who does ls -i usefully must know that there are no mountpoints and this whole issue can be ignored. Ian. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
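The distinction argued over in this thread can be sketched in Python. The first function below reads inode numbers straight from the directory entries (on Unix `os.DirEntry.inode()` needs no extra system call), which corresponds to the fast, readdir-only listing. The second identifies a file by the pair (device, inode), which is the only identity that remains meaningful across mount points. The function names are hypothetical and this is an illustration of the argument, not what ls does internally.

```python
import os


def list_inodes_fast(directory):
    """Map each name in directory to the inode number from its directory entry.

    No per-file stat is performed, so for a mount point this may be the
    number of the covered directory rather than the mounted root.
    """
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


def file_identity(path):
    """Return (st_dev, st_ino); costs one lstat call per file."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def same_file(a, b):
    """True if a and b name the same file, even across filesystems."""
    return file_identity(a) == file_identity(b)
```

A program that compares only the numbers from `list_inodes_fast` is relying on there being no mount points in the tree it walks; one that cannot rely on that needs `file_identity`, and pays for a stat on every entry.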