bug#14231: mv, rm, cp -i need super clear explanation of -i...
$ rm -i *
rm: remove regular file `Ph_D_Thesis'? you had better ask my mother
$ ls
$

I think the info pages should make very clear what is going on in this case, to avoid legal threats one day.
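For anyone puzzled by the empty `ls` above: rm reads one line from stdin and, in the C locale, treats any answer beginning with 'y' or 'Y' as affirmative, so the flippant reply confirmed the deletion. A minimal sketch of the trap:

```shell
cd "$(mktemp -d)"
touch Ph_D_Thesis

# In the C locale any answer whose first character is 'y' or 'Y'
# counts as "yes", so this sarcastic reply confirms the deletion.
echo 'you had better ask my mother' | LC_ALL=C rm -i Ph_D_Thesis

ls   # the file is gone
```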
bug#14229: invalid TZ and /bin/date
On 04/18/13 13:24, Donald Berry wrote:
> If an invalid TZ argument is passed to /bin/date,
> it silently fails but prints the UTC result

In the GNU system there is no such thing as an invalid TZ string. Every TZ string has some interpretation (typically as UTC). This is true not just for /bin/date, but for every other program.
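A quick check of this point (a sketch assuming GNU date on a glibc system): an unrecognized TZ value is not rejected; it is interpreted as a zone abbreviation at offset +0000:

```shell
# "foo" is not an error: it becomes a zone named "foo" at UTC+00:00,
# so the abbreviation is echoed back and the numeric offset is zero.
TZ=foo date -d @0 '+%Z %z'
```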
bug#14229: invalid TZ and /bin/date
If an invalid TZ argument is passed to /bin/date, it silently fails but prints the UTC result:

[dberry@dberry ~]$ TZ=EDT date -d @0
Thu Jan  1 00:00:00 EDT 1970
[dberry@dberry ~]$ TZ=foo date -d @0
Thu Jan  1 00:00:00 foo 1970

It works correctly if using no argument or a valid argument:

[dberry@dberry ~]$ date -d @0
Wed Dec 31 19:00:00 EST 1969
[dberry@dberry ~]$ TZ=EST5EDT date -d @0
Wed Dec 31 19:00:00 EST 1969
[dberry@dberry ~]$ TZ=UTC date -d @0
Thu Jan  1 00:00:00 UTC 1970
[dberry@dberry ~]$ rpm -q coreutils
coreutils-8.4-19.el6.x86_64
[dberry@dberry ~]$ uname -a
Linux dberry.csb 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[dberry@dberry ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.3 (Santiago)
[dberry@dberry ~]$ date
Thu Apr 18 16:23:46 EDT 2013

Donald Berry, RHCE
Technical Account Manager
Red Hat Canada Ltd.
mobile: 647-338-6329
bug#14226: Sort -c takes in account fields that were outside sorting scope
tag 14226 notabug
thanks

On 04/18/2013 09:04 AM, Camion SPAM wrote:
> The following commands report an error on equals lines because field outside
> sorting scope were not sorted

How refreshing to get a non-FAQ report on sort - you made me actually do some research! The fact that you used LANG=C to pin the locale is also nice (most people aren't aware that most reported non-bugs in sort are due to locale issues). However, I still think sort is doing the right thing.

> $ cat <<'.' |
>> AAA AAA
>> BBB BBB
>> ZZZ CCC
>> DDD DDD
>> BBC EEE
>> BBD EEE
>> BBC EEE
>> BBE EEE
>> CCC FFF
>> DDD GGG
>> EEE HHH
>> .
>> LANG=C sort -k 2,2 -c
> sort: -:7: disorder: BBC EEE

POSIX says:

  "Except when the -u option is specified, lines that otherwise compare
  equal shall be ordered as if none of the options -d, -f, -i, -n, or -k
  were present (but with -r still in effect, if it was specified) and
  with all bytes in the lines significant to the comparison."

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

In your example, you did not use -u, and the key you specified was duplicated between two rows, so POSIX requires sort to break the tie by comparing the entire line, and the entire lines are indeed different.

For comparison purposes, I checked out /usr/bin/sort on Solaris 10; it has the same behavior of declaring your input unsorted. /usr/xpg4/bin/sort on the same machine is not POSIX compliant, in that it lacks -C and treats -c like the POSIX -C; but it also had a non-zero exit status on your sample.

If you don't like the POSIX behavior of a mandated entire-line comparison as a sort key of last resort, then you should use the GNU extension of -s; I tested that 'LC_ALL=C sort -k2,2 -c -s' has no problems with your example.

To see the difference between using and not using the entire line as the final sort key, replace -c by --debug, both with and without -s (you can't use -c and --debug at the same time, unfortunately).

However, remember that not all sort implementations have -s, so there is no standard way to get the behavior you are after.

I'm closing this as not a bug, although you may continue to add comments or questions to this topic.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
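Eric's diagnosis is easy to reproduce (a sketch assuming GNU sort): the -c check fails without -s because of the last-resort whole-line comparison, and passes with it:

```shell
# The reporter's data: the -k2 key "EEE" is duplicated across lines.
cat > data.txt <<'EOF'
AAA AAA
BBB BBB
ZZZ CCC
DDD DDD
BBC EEE
BBD EEE
BBC EEE
BBE EEE
CCC FFF
DDD GGG
EEE HHH
EOF

# Without -s the whole line is the tie-breaker of last resort, so
# "BBD EEE" followed by "BBC EEE" is reported as disorder (exit 1).
LC_ALL=C sort -k2,2 -c data.txt || echo "disorder detected"

# With -s (stable), only the named key is compared; the check passes.
LC_ALL=C sort -k2,2 -c -s data.txt && echo "sorted"
```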
bug#14226: Sort -c takes in account fields that were outside sorting scope
The following commands report an error on equal lines because fields outside the sorting scope were not sorted:

$ cat <<'.' |
> AAA AAA
> BBB BBB
> ZZZ CCC
> DDD DDD
> BBC EEE
> BBD EEE
> BBC EEE
> BBE EEE
> CCC FFF
> DDD GGG
> EEE HHH
> .
> LANG=C sort -k 2,2 -c
sort: -:7: disorder: BBC EEE
bug#14189: ls -d bug ??
Bernhard Voelker wrote:
> Bob Proulx wrote:
> > `-d'
> > `--directory'
> >      List only the name of directories, not the contents.  This is
> >      most typically used with `-l' to list the information for the
> >      directory itself instead of its contents.  Do not follow symbolic
> >      links unless the `--dereference-command-line' (`-H'),
> >      `--dereference' (`-L'), or
> >      `--dereference-command-line-symlink-to-dir' options are
> >      specified.  Overrides `--recursive' (`-R').
>
> Not bad, but I'm still missing the point that `-d' changes ls's behavior
> for *directory arguments* only.

Hmm...

  "If an argument is a directory then list only the name of the
  directory, not the contents.  Otherwise list the name of the file
  normally."

Showing this around, I had one person who was shocked to learn that directories were files. They really wanted this written so that it acted as if directories and files were completely different things. I countered that since directories are files (special files, but files), we shouldn't make the documentation lead people astray just to fit a wrong model of the machine.

> Furthermore, I don't think mentioning `-l' is of much relevance here.
> So this would melt down the first two sentences as follows:
>
> `-d'
> `--directory'
>      For directory arguments, list only the information for the
>      directory itself instead of its contents.  Do not follow symbolic
>      links unless the `--dereference-command-line' (`-H'),
>      `--dereference' (`-L'), or
>      `--dereference-command-line-symlink-to-dir' options are
>      specified.  Overrides `--recursive' (`-R').

That comment made the person I worked with on wordsmithing that line very sad. She was adamant that the tidbit about -l was the only useful part of the option description. And I think I agree: for someone reading the documentation and learning about the option, the connection between -d and -l is important to point out explicitly. Maybe not this way, but in some way I think we need to tie those two concepts together.
> And what about the usage() string?  I'd bet this is still where 95% of
> users are looking.  Something like the following perhaps?
>
> -  -d, --directory        list directory entries instead of contents,
> -                           and do not dereference symbolic links
> +  -d, --directory        for directory arguments, list the entry itself
> +                           instead of contents, and do not dereference
> +                           symbolic links

I think that is definitely an improvement. "Entries" in the original isn't descriptive enough, I think, and makes people think of contents instead of just the argument name. But frankly I still don't think it flows very well. If we are already pushed into three lines then let's make use of them.

  -d, --directory        for directory arguments, list the name instead
                           of contents, and do not dereference symbolic
                           links

Or perhaps better is:

  -d, --directory        for directory arguments, list the directory name
                           instead of directory contents, and do not
                           dereference symbolic links

Looking through other options for style I see:

  -L, --dereference      when showing file information for a symbolic
                           link, show information for the file the link
                           references rather than for the link itself

That entry has the same challenge. It is much wordier. The "when showing file information for a symbolic link" is the same task as our "when showing file information for a directory". I like the shorter "for symbolic link arguments" form. Perhaps as a separate improvement we could change it to:

  -L, --dereference      for symbolic link arguments, show information
                           for the target instead of for the link itself

And then, while gaining consistency of description without decreasing the usefulness of either entry, we would gain back the line that we used above.

Bob
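For readers following along, the behavior both wordings try to describe can be shown in a few lines (a sketch; the directory and file names are invented):

```shell
cd "$(mktemp -d)"
mkdir somedir
touch somedir/inner

# Without -d, a directory argument is expanded to its contents.
ls somedir      # prints: inner

# With -d, only the directory name itself is listed, which is most
# useful combined with -l to see the directory's own metadata.
ls -d somedir   # prints: somedir
```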
bug#14224: Feature request for the `cut`: record delimiter
Eric Blake wrote:
> Should we patch README to include this URL to current HACKING contents,
> since we don't ship HACKING in our tarballs?  Or, should we reconsider
> our position and start shipping HACKING in the tarballs?  Of the
> statements currently in README:
>
> > If you obtained this file as part of a "git clone", then see the
> > README-hacking file.  If this file came to you as part of a tar archive,
> > then see the file INSTALL for compilation and installation instructions.
>
> This one makes sense (HACKING won't be present unless you are working
> from git), except that you are not told _how_ to do a "git clone".
>
> > If you would like to suggest a patch, see the files README-hacking
> > and HACKING for tips.
>
> But this one doesn't mention anything about the files being git-only.

I think it would definitely make sense to include some information about the preferred method of getting the source in the main README file. That file is usually the one included in downstream distributions. It would enable people to bootstrap themselves to the source. And GNU is all about access to the source. So I think that would make a lot of sense.

Bob
bug#14224: Feature request for the `cut`: record delimiter
George Brink wrote:
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]),
> \"\002\");" data.dat >new_data.dat
> It works fine,

I was thinking of Perl's -0 option when I asked if you would say a few words about the file and task. But since you hadn't described it yet I was hesitant to suggest it.

> but I am a little concerned of the speed. I have over three
> hundreds of such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.

I always recommend benchmarking before optimizing. Knuth is quoted as saying, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

Don't forget programmer productivity either. You might shave 10% off of something now but make it incomprehensible to the future maintainers who need to understand it later. Simply upgrading the hardware might give a 50% increase in performance. In which case I would leave the algorithm simple and more easily understood and not worry about the performance. Simple and easy to understand is better than raw speed.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is a extract from the README:
> > Mail suggestions and bug reports for these programs to
> > the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

As Pádraig said, no worries. I didn't mean it to sound mean or snarky. But I can see that my last sentence did come out that way. Sorry.

But if I hadn't said anything then you wouldn't have said anything, and then we wouldn't have been reminded that the contact address hadn't been updated in your version. So it ended well. The way to get the word out is by continuing to talk about it. If people even just read it in passing then they might be informed for the future.

Bob
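Bob's "benchmark first" advice is cheap to follow (a sketch; the sample file and the cut command are placeholders for the real data and candidate pipeline):

```shell
# Build a reproducible sample roughly the shape of one real file,
# then time each candidate command on it before choosing.
seq 1000000 > sample.txt
time cut -c1-3 sample.txt > /dev/null
```

Run each contender on the same sample and compare the wall-clock times before rewriting anything for speed.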
bug#14224: Feature request for the `cut`: record delimiter
On Thu, Apr 18, 2013 at 12:18 PM, Pádraig Brady wrote:
>
> awk is often suggested too as an alternative to cut.

No, I looked at awk, but it does not have a convenient way to specify lists of printed fields:

  awk 'BEGIN { FS="☺"; RS="☻"; OFS=FS; ORS=RS } { print $1,$2,$3,$15,$16,$17 ... }'

You get the picture... It is possible to repeat a cut in awk (and documentation for awk does show how), but this would be a creation of an external application, not a one-liner with a tool from the box.
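For comparison, here is roughly what the awk version looks like with ordinary separators (a sketch with made-up data): it works, but each wanted field must be named individually, unlike cut's -f range lists:

```shell
# Two records separated by ';', fields separated by ','.
printf 'a,b,c,d;e,f,g,h;' > recs.txt

# awk handles custom record/field separators, but every wanted field
# must be listed explicitly; there is no cut-style range like -f1,3.
awk 'BEGIN { FS=","; RS=";"; OFS=FS; ORS=RS } NF { print $1, $3 }' recs.txt
```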
bug#14224: Feature request for the `cut`: record delimiter
On 04/18/2013 08:41 AM, George Brink wrote:
> On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady wrote:
>
>> On 04/17/2013 02:26 PM, George Brink wrote:
>>> Hello,
>>>
>>> I have a task of extracting several "fields" from the text file. The
>>> standard `cut` tool could be a perfect tool for a job, but...
>>> In my file the '\n' character is a legal symbol inside fields and therefore
>>> the text file uses other symbol for record-separator. And the `cut` has a
>>> hard-coded '\n' for record separator (I just checked the source from the
>>> coreutils-8.21 package).
>>
>> The patch would be simple but not without compatibility cost.
>> I.E. scripts using this would immediately become incompatible
>> with any systems without this feature.
>>
>> So you'd like something like tac -s, --separator
>> However cut -s is taken, so we'd have to avoid the short -s at least.
>> Also tac -s takes a string rather than a character, so
>> that gives some extra credence (and complexity) to that option there.
>>
>> Also related would be to support the -z, --zero-terminated option.
>> join, sort and uniq all have this option to use NUL as the record separator,
>> however they're all closely related sort dependent utilities
>> and we're trying to unify options between them.
>>
>> If it is just a character you want to separate on,
>> then you can always use tr to convert before processing,
>> albeit with associated data copying overhead.
>>
>> SEP=^
>> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>>
>> So given that cut is not special here among the text filters,
>> and there is a workaround available, I'm 60:40 against
>> adding this feature.
>>
>> thanks,
>> Pádraig.
>
> Pádraig,
>
> Thank you for alternative suggestions.
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]),
> \"\002\");" data.dat >new_data.dat
> It works fine, but I am a little concerned of the speed. I have over three
> hundreds of such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.
>
> Originally I thought of adding "-r, --record-delimiter=DELIM" and
> "--output-record-delimiter=DELIM" keys to cut.
> Then the example above could be done with
> cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47
> data.dat >new_data.dat
> I think it is feasible and would be more convenient (and hopefully faster)
> than using a whole perl or two calls to tr.

Yes, those are the tradeoffs. awk is often suggested too as an alternative to cut.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is an extract from the README:
>> Mail suggestions and bug reports for these programs to
>> the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different contexts.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

No worries. I saw no issue with your mails. In future cut --help will just point at the following URL which hopefully is easier to follow:
http://www.gnu.org/software/coreutils/

thanks,
Pádraig.
bug#14224: Feature request for the `cut`: record delimiter
Pádraig,

Thank you for the alternative suggestions.
Actually I just found yet another way to solve my problem:

  perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat

It works fine, but I am a little concerned about the speed. I have over three hundred such files, from 3Mb to 30Mb each. And this process should be run every day... I thought that by using cut (which just looks for delimiters) I could gain a few minutes on the whole process.

Originally I thought of adding "-r, --record-delimiter=DELIM" and "--output-record-delimiter=DELIM" keys to cut. Then the example above could be done with

  cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat

I think it is feasible and would be more convenient (and hopefully faster) than using a whole perl or two calls to tr.

Bob,
I understand your desire to receive a discussion of features not inside the bug related mail list, but here is an extract from the README:

> Mail suggestions and bug reports for these programs to
> the address on the last line of --help output.

And guess what, the `cut --help` has the bug-coreutils email in the last line! The coreutils email is not mentioned inside README at all. And bug-coreutils is mentioned several times in different contexts. I apologize for using this mail-list inappropriately, but I did not know about any other mail-lists.

On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady wrote:
> On 04/17/2013 02:26 PM, George Brink wrote:
> > Hello,
> >
> > I have a task of extracting several "fields" from the text file. The
> > standard `cut` tool could be a perfect tool for a job, but...
> > In my file the '\n' character is a legal symbol inside fields and therefore
> > the text file uses other symbol for record-separator. And the `cut` has a
> > hard-coded '\n' for record separator (I just checked the source from the
> > coreutils-8.21 package).
>
> The patch would be simple but not without compatibility cost.
> I.E. scripts using this would immediately become incompatible
> with any systems without this feature.
>
> So you'd like something like tac -s, --separator
> However cut -s is taken, so we'd have to avoid the short -s at least.
> Also tac -s takes a string rather than a character, so
> that gives some extra credence (and complexity) to that option there.
>
> Also related would be to support the -z, --zero-terminated option.
> join, sort and uniq all have this option to use NUL as the record separator,
> however they're all closely related sort dependent utilities
> and we're trying to unify options between them.
>
> If it is just a character you want to separate on,
> then you can always use tr to convert before processing,
> albeit with associated data copying overhead.
>
> SEP=^
> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>
> So given that cut is not special here among the text filters,
> and there is a workaround available, I'm 60:40 against
> adding this feature.
>
> thanks,
> Pádraig.
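Pádraig's tr round-trip can be demonstrated end to end (a sketch; SEP and the data are invented): swap the record separator with newline, let cut work on ordinary lines, then swap back:

```shell
SEP='^'
# Two records separated by '^'; fields within a record are ','-separated.
printf 'a,b,c^d,e,f^' > in.dat

# Swap '^' and '\n' so cut sees one record per line, keep fields 1 and 3,
# then swap back to restore the original record separator.
tr "$SEP"'\n' '\n'"$SEP" < in.dat \
  | cut -d, -f1,3 \
  | tr "$SEP"'\n' '\n'"$SEP" > out.dat

cat out.dat
```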