bug#14231: mv, rm, cp -i need super clear explanation of -i...

2013-04-18 Thread jidanni
$ rm -i *
rm: remove regular file `Ph_D_Thesis'? you had better ask my mother
$ ls
$

I think the info pages should make it very clear what is going on in this
case, to avoid legal threats one day.
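A minimal sketch of what happened here, assuming GNU rm and the C locale. rm -i accepts any reply that matches the locale's affirmative pattern, which in the C locale means any reply beginning with "y" or "Y". So "you had better ask my mother" counts as yes. The temporary directory and the second file name are illustrative.

```shell
#!/bin/sh
# Sketch: how rm -i interprets a free-form reply (GNU rm assumed).
# LC_ALL=C pins the affirmative pattern to "begins with y or Y".
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
touch "$dir/Ph_D_Thesis" "$dir/notes"

# Begins with "y", so it counts as "yes" and the file is removed.
printf 'you had better ask my mother\n' |
  LC_ALL=C rm -i "$dir/Ph_D_Thesis" 2>/dev/null

# Does not begin with "y" or "Y", so the file is kept.
printf 'no way\n' | LC_ALL=C rm -i "$dir/notes" 2>/dev/null

ls "$dir"   # only "notes" is left
```

In other locales the pattern comes from the locale data, so the set of replies that count as "yes" can differ.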





bug#14229: invalid TZ and /bin/date

2013-04-18 Thread Paul Eggert
On 04/18/13 13:24, Donald Berry wrote:
> If an invalid TZ argument is passed to /bin/date,
> it silently fails but prints the UTC result

In the GNU system there is no such thing
as an invalid TZ string.  Every TZ string has
some interpretation (typically as UTC).
This is true not just for /bin/date, but for
every other program.
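A small sketch of Paul's point, assuming GNU date (for -d @SECONDS) on a glibc-based system: none of these TZ values makes date fail. As in Donald's report, "foo" and "EDT" (which, judging by the report's output, is not recognized either) come out as midnight UTC on 1970-01-01, while EST5EDT gives 19:00 on the previous day.

```shell
#!/bin/sh
# Sketch: TZ values that name no known zone are still interpreted,
# typically as UTC, and date reports no error.
for tz in foo EDT EST5EDT UTC; do
  printf '%-8s %s\n' "$tz" "$(TZ=$tz date -d @0 '+%Y-%m-%d %H:%M %z')"
done
```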






bug#14229: invalid TZ and /bin/date

2013-04-18 Thread Donald Berry
If an invalid TZ argument is passed to /bin/date, it silently fails but prints 
the UTC result:
[dberry@dberry ~]$ TZ=EDT date -d @0
Thu Jan  1 00:00:00 EDT 1970
[dberry@dberry ~]$ TZ=foo date -d @0
Thu Jan  1 00:00:00 foo 1970

It works correctly if using no argument or a valid argument:
[dberry@dberry ~]$ date -d @0
Wed Dec 31 19:00:00 EST 1969
[dberry@dberry ~]$ TZ=EST5EDT date -d @0
Wed Dec 31 19:00:00 EST 1969
[dberry@dberry ~]$ TZ=UTC date -d @0
Thu Jan  1 00:00:00 UTC 1970

[dberry@dberry ~]$ rpm -q coreutils
coreutils-8.4-19.el6.x86_64
[dberry@dberry ~]$ uname -a
Linux dberry.csb 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[dberry@dberry ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux Workstation release 6.3 (Santiago)
[dberry@dberry ~]$ date
Thu Apr 18 16:23:46 EDT 2013

Donald Berry, RHCE
Technical Account Manager
Red Hat Canada Ltd.
mobile: 647-338-6329






bug#14226: Sort -c takes in account fields that were outside sorting scope

2013-04-18 Thread Eric Blake
tag 14226 notabug
thanks

On 04/18/2013 09:04 AM, Camion SPAM wrote:
> The following commands report an error on lines with equal keys because fields
> outside the sorting scope were not sorted

How refreshing to get a non-FAQ report on sort - you made me actually do
some research!  The fact that you used LANG=C to pin the locale is also
nice (most people aren't aware that most reported non-bugs in sort are
due to locale issues).  However, I still think sort is doing the right
thing.

> 
> $ cat <<'.' |
>> AAA AAA
>> BBB BBB
>> ZZZ CCC
>> DDD DDD
>> BBC EEE
>> BBD EEE
>> BBC EEE
>> BBE EEE
>> CCC FFF
>> DDD GGG
>> EEE HHH
>> .
>> LANG=C sort -k 2,2 -c
> sort: -:7: disorder: BBC EEE

POSIX says:
"Except when the -u option is specified, lines that otherwise compare
equal shall be ordered as if none of the options -d, -f, -i, -n, or -k
were present (but with -r still in effect, if it was specified) and with
all bytes in the lines significant to the comparison."
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

In your example, you did not use -u, and the key you specified was
duplicated between two rows, so POSIX requires sort to break the tie by
comparing the entire line, and the entire line is indeed different.

For comparison purposes, I checked out /usr/bin/sort on Solaris 10; it
has the same behavior of declaring your input unsorted.
/usr/xpg4/bin/sort on the same machine is not POSIX compliant, in that
it lacks -C, and treats -c like the POSIX -C; but it also had non-zero
exit status on your sample.

If you don't like the POSIX behavior of a mandated entire line as a sort
key of final resort, then you should use the GNU extension -s; I
tested that 'LC_ALL=C sort -k2,2 -c -s' has no problems with your
example.  To see the difference of using or not using the entire line as
the final sort key, replace -c by --debug, both with and without -s (you
can't use -c and --debug at the same time, unfortunately).  However,
remember that not all sort implementations have -s, so there is no
standard way to get the behavior you are after.
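The reported input can be used to check this, assuming GNU sort (the -s and --debug options are GNU extensions, as noted above). sort -c exits with status 1 for the unsorted case and 0 once -s disables the whole-line comparison of last resort.

```shell
#!/bin/sh
# Sketch: reproduce bug#14226 and compare the POSIX last-resort
# comparison with GNU sort's -s (stable) extension.
input='AAA AAA
BBB BBB
ZZZ CCC
DDD DDD
BBC EEE
BBD EEE
BBC EEE
BBE EEE
CCC FFF
DDD GGG
EEE HHH'

# Lines 5-8 tie on the key "EEE", so sort compares whole lines;
# "BBC EEE" after "BBD EEE" is then out of order (reported on stderr).
printf '%s\n' "$input" | LC_ALL=C sort -k2,2 -c
status_default=$?

# With -s the last-resort comparison is disabled; tied keys are accepted.
printf '%s\n' "$input" | LC_ALL=C sort -k2,2 -s -c
status_stable=$?

echo "default: $status_default, with -s: $status_stable"

# To see which part of each line is compared, use --debug instead of -c.
printf '%s\n' "$input" | LC_ALL=C sort -k2,2 --debug
```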

I'm closing this as not a bug, although you may continue to add comments
or questions to this topic.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





bug#14226: Sort -c takes in account fields that were outside sorting scope

2013-04-18 Thread Camion SPAM
The following commands report an error on lines with equal keys because fields
outside the sorting scope were not sorted

$ cat <<'.' |
> AAA AAA
> BBB BBB
> ZZZ CCC
> DDD DDD
> BBC EEE
> BBD EEE
> BBC EEE
> BBE EEE
> CCC FFF
> DDD GGG
> EEE HHH
> .
> LANG=C sort -k 2,2 -c
sort: -:7: disorder: BBC EEE



bug#14189: ls -d bug ??

2013-04-18 Thread Bob Proulx
Bernhard Voelker wrote:
> Bob Proulx wrote:
> >   `-d'
> >   `--directory'
> >  List only the name of directories, not the contents.  This is
> >  most typically used with `-l' to list the information for the
> >  directory itself instead of its contents.  Do not follow symbolic
> >  links unless the `--dereference-command-line' (`-H'),
> >  `--dereference' (`-L'), or
> >  `--dereference-command-line-symlink-to-dir' options are
> >  specified.  Overrides `--recursive', (`-R').
> 
> Not bad, but I'm still missing the point that `-d' changes ls's behavior
> for *directory arguments* only.

Hmm...  "If an argument is a directory then list only the name of the
directory not the contents.  Otherwise list the name of the file
normally."

Showing this around I had one person who was shocked to learn that
directories were files.  They really wanted this written so that it
acted as if directories and files were completely different things.
I countered that since directories are files, special files, we
shouldn't make the documentation lead people astray just to make it
fit a wrong model of the machine.

> Furthermore, I don't think mentioning `-l' is of much relevance here.
> So this would melt down the first two sentences as follows:
> 
>   `-d'
>   `--directory'
>  For directory arguments, list only the information for the
>  directory itself instead of its contents.  Do not follow symbolic
>  links unless the `--dereference-command-line' (`-H'),
>  `--dereference' (`-L'), or
>  `--dereference-command-line-symlink-to-dir' options are
>  specified.  Overrides `--recursive', (`-R').
> 

That comment made the person I worked with on wordsmithing that line
very sad.  She was adamant that the tidbit about -l was the only
useful part of the option description.  And I think I agree that, for
someone reading the documentation and learning about it, the
connection between -d and -l is important to point out explicitly.
Maybe not this way, but in some way I think we need to tie those two
concepts together.
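A minimal sketch of the behavior being documented, assuming GNU ls; the directory and file names are illustrative. It shows that -d affects only directory arguments, and that pairing it with -l lists the details of the directory itself, which is the -d/-l connection argued for above.

```shell
#!/bin/sh
# Sketch: -d changes how directory arguments are listed, nothing else.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir project
touch project/main.c README

ls project       # directory argument: its contents (main.c)
ls -d project    # directory argument with -d: the name itself (project)
ls -d README     # non-directory argument: listed as usual (README)
ls -ld project   # with -l: details of the directory itself (mode begins with d)
```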

> And what about the usage() string?  I'd bet this is still 95% where
> users are looking for. Something like the following perhaps?
> 
> -  -d, --directory            list directory entries instead of contents,
> -                               and do not dereference symbolic links
> +  -d, --directory            for directory arguments, list the entry itself
> +                               instead of contents, and do not dereference
> +                               symbolic links

I think that is definitely an improvement, because "entries" in the
original isn't descriptive enough and makes people think of the
contents instead of just the argument name.  But frankly I still don't
think it flows very well.

If we are already pushed into three lines then let's make use of them.

  -d, --directory            for directory arguments, list the name
                               instead of contents, and do not
                               dereference symbolic links

Or perhaps better is:

  -d, --directory            for directory arguments, list the directory
                               name instead of directory contents,
                               and do not dereference symbolic links

Looking through other options for style I see:

  -L, --dereference          when showing file information for a symbolic
                               link, show information for the file the link
                               references rather than for the link itself

That entry has the same challenge.  It is much wordier.  The "when
showing file information for a symbolic link" is the same task as our
"when showing file information for a directory".  I like the shorter
"for symbolic link arguments" form.  Perhaps as a separate
improvement we could change it to:

  -L, --dereference          for symbolic link arguments, show information
                               for the target instead of for the link itself
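In the same spirit, a sketch of the -L behavior that both versions of that entry describe, again assuming GNU ls and illustrative names:

```shell
#!/bin/sh
# Sketch: -L reports on the target of a symbolic link, not the link.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir target
ln -s target link

ls -ld link     # the link itself: mode begins with l
ls -ldL link    # the target, a directory: mode begins with d
```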

And then, while gaining consistency of description without
decreasing the usefulness of either entry, we would perhaps gain back
the line that we used above.

Bob





bug#14224: Feature request for the `cut`: record delimiter

2013-04-18 Thread Bob Proulx
Eric Blake wrote:
> Should we patch README to include this URL to current HACKING contents,
> since we don't ship HACKING in our tarballs?  Or, should we reconsider
> our position and start shipping HACKING in the tarballs?  Of the
> statements currently in README:
> 
> > If you obtained this file as part of a "git clone", then see the
> > README-hacking file.  If this file came to you as part of a tar archive,
> > then see the file INSTALL for compilation and installation instructions.
> 
> This one makes sense (HACKING won't be present unless you are working
> from git), except that you are not told _how_ to do a "git clone".
> 
> > If you would like to suggest a patch, see the files README-hacking
> > and HACKING for tips.
> 
> But this one doesn't mention anything about the files being git-only.

I think it would definitely make sense to include some information
about the preferred method of getting the source in the main README
file.  That file is usually the one included in downstream
distributions.  It would enable people to bootstrap themselves to the
source.  And GNU is all about access to the source.  So I think that
would make a lot of sense.

Bob





bug#14224: Feature request for the `cut`: record delimiter

2013-04-18 Thread Bob Proulx
George Brink wrote:
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
> It works fine,

I was thinking of Perl's -0 option when I asked if you would say a few
words about the file and task.  But since you hadn't described it yet I
was hesitant to suggest it.

> but I am a little concerned about the speed. I have over three
> hundred such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.

I always recommend benchmarking before optimizing.  Knuth is quoted as saying:
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil".

Don't forget programmer productivity either.  You might shave 10% off
of something now but make it incomprehensible to future admin
maintainers who need to understand it later.  Simply upgrading the
hardware might give a 50% increase in performance, in which case I
would leave the algorithm simple and easily understood and not
worry about the performance.  Simple and easy to understand is better
than raw speed.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is an extract from the README:
> > Mail suggestions and bug reports for these programs to
> > the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

As Pádraig said, no worries.  I didn't mean it to sound mean or
snarky.  But I can see that my last sentence did come out that way.
Sorry.

But if I didn't say anything then you wouldn't have said anything and
then we wouldn't have been reminded that the contact address hadn't
been updated in your version.  So it ended well.  The way to get the
word out is by continuing to talk about it.  If people even just read
it in passing then they might be informed for the future.

Bob





bug#14224: Feature request for the `cut`: record delimiter

2013-04-18 Thread George Brink
On Thu, Apr 18, 2013 at 12:18 PM, Pádraig Brady  wrote:

>
> awk is often suggested too as an alternative to cut.
>
No, I looked at awk, but it does not have a convenient way to specify lists
of printed fields.
awk -e "BEGIN{FS="☺"; RS="☻"; OFS=FS; ORS=RS;}; {print $1,$2,$3,$15,$16,$17
??? ) }
You get the picture...
It is possible to reimplement cut's field lists in awk (and the awk
documentation does show how), but that means writing a separate program,
not a one-liner with a standard tool.
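For comparison, a sketch of how cut-style field lists can be written in awk once, as a reusable shell function rather than a one-liner. pick_fields is a hypothetical helper, not part of any package. It assumes a POSIX awk, where a single-character FS (other than a space) or RS is taken literally and escape sequences in -v assignments are processed, so George's separators could be passed as '\001' and '\002'.

```shell
#!/bin/sh
# Sketch: cut-style field lists in awk with custom field and record
# separators. pick_fields is a hypothetical helper, not a standard tool.
# Usage: pick_fields FIELD_SEP RECORD_SEP LIST [FILE...]
# LIST holds comma-separated fields or closed ranges, e.g. 1-3,15-47
# (open ranges such as "15-" are not handled in this sketch).
pick_fields() {
  fs=$1 rs=$2 spec=$3
  shift 3
  awk -v FS="$fs" -v OFS="$fs" -v RS="$rs" -v ORS="$rs" -v spec="$spec" '
    BEGIN {
      n = split(spec, parts, ",")
      for (i = 1; i <= n; i++) {
        if (split(parts[i], r, "-") == 2) { lo[i] = r[1] + 0; hi[i] = r[2] + 0 }
        else { lo[i] = parts[i] + 0; hi[i] = lo[i] }
      }
    }
    {
      # Join the selected fields of this record with OFS.
      out = ""; sep = ""
      for (i = 1; i <= n; i++)
        for (f = lo[i]; f <= hi[i] && f <= NF; f++) { out = out sep $f; sep = OFS }
      print out
    }' "$@"
}
```

For the data in this thread that would be something like `pick_fields '\001' '\002' 1-3,15-47 data.dat >new_data.dat`, untested against the real files.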


bug#14224: Feature request for the `cut`: record delimiter

2013-04-18 Thread Pádraig Brady
On 04/18/2013 08:41 AM, George Brink wrote:
> On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady  wrote:
> 
>> On 04/17/2013 02:26 PM, George Brink wrote:
>>> Hello,
>>>
>>> I have a task of extracting several "fields" from the text file. The
>>> standard `cut` tool could be a perfect tool for a job, but...
>>> In my file the '\n' character is a legal symbol inside fields and
>> therefore
>>> the text file uses other symbol for record-separator. And the `cut` has a
>>> hard-coded '\n' for record separator (I just checked the source from the
>>> coreutils-8.21 package).
>>
>> The patch would be simple but not without compatibility cost.
>> I.E. scripts using this would immediately become incompatible
>> with any systems without this feature.
>>
>> So you'd like something like tac -s, --separator
>> However cut -s is taken, so we'd have to avoid the short -s at least.
>> Also tac -s takes a string rather than a character, so
>> that gives some extra credence (and complexity) to that option there.
>>
>> Also related would be to support the -z, --zero-terminated option.
>> join, sort and uniq all have this option to use NUL as the record
>> separator,
>> however they're all closely related sort dependent utilities
>> and we're trying to unify options between them.
>>
>> If it is just a character you want to separate on,
>> then you can always use tr to convert before processing,
>> albeit with associated data copying overhead.
>>
>> SEP=^
>> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>>
>> So given that cut is not special here among the text filters,
>> and there is a workaround available, I'm 60:40 against
>> adding this feature.
>>
>> thanks,
>> Pádraig.
>>
> 
> Pádraig,
>
> Thank you for alternative suggestions.
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
> It works fine, but I am a little concerned about the speed. I have over three
> hundred such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.
>
> Originally I thought of adding "-r, --record-delimiter=DELIM" and
> "--output-record-delimiter=DELIM" options to cut.
> Then the example above could be done with
> cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
> I think it is feasible and would be more convenient (and hopefully faster)
> than using a whole perl or two calls to tr.

Yes, those are the tradeoffs.
awk is often suggested too as an alternative to cut.

> Bob,
> I understand your desire to receive a discussion of features not inside the
> bug related mail list, but here is an extract from the README:
>> Mail suggestions and bug reports for these programs to
>> the address on the last line of --help output.
> And guess what, the `cut --help` has the bug-coreutils email in the last
> line! The coreutils email is not mentioned inside README at all. And
> bug-coreutils is mentioned several times in different context.
> I apologize for using this mail-list inappropriately, but I did not know
> about any other mail-lists

No worries.  I saw no issue with your mails.
In future cut --help will just point at the
following URL which hopefully is easier to follow:
http://www.gnu.org/software/coreutils/

thanks,
Pádraig.





bug#14224: Feature request for the `cut`: record delimiter

2013-04-18 Thread George Brink
Pádraig,

Thank you for alternative suggestions.
Actually I just found yet another way to solve my problem:
perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]), \"\002\");" data.dat >new_data.dat
It works fine, but I am a little concerned about the speed. I have over three
hundred such files, from 3Mb to 30Mb each. And this process should be
run every day... I thought that by using cut (which just looks for
delimiters) I can gain a few minutes on the whole process.

Originally I thought of adding "-r, --record-delimiter=DELIM" and
"--output-record-delimiter=DELIM" options to cut.
Then the example above could be done with
cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47 data.dat >new_data.dat
I think it is feasible and would be more convenient (and hopefully faster)
than using a whole perl or two calls to tr.
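Until such options exist, the proposed invocation can be approximated with the tr swap from Pádraig's reply. The sketch below wraps it in a shell function. cut_records is a hypothetical name, and both separators must be single characters that tr does not treat specially (a leading '-' or a backslash, for example, would need extra care).

```shell
#!/bin/sh
# Sketch: emulate a record-delimiter option for cut with tr | cut | tr.
# cut_records is a hypothetical helper, not an existing cut option.
# Usage: cut_records FIELD_CHAR RECORD_CHAR LIST [FILE...]
cut_records() {
  fsep=$1 rsep=$2 list=$3
  shift 3
  # Swap RECORD_CHAR and newline so cut sees one record per line.
  # Newlines inside fields temporarily become RECORD_CHAR, which cannot
  # otherwise occur inside a record; the second tr swaps them back.
  cat -- "$@" |
    tr "$rsep"'\n' '\n'"$rsep" |
    cut -d "$fsep" -f "$list" |
    tr "$rsep"'\n' '\n'"$rsep"
}
```

For this data, something like `cut_records "$(printf '\001')" "$(printf '\002')" 1-3,15-47 data.dat >new_data.dat` should correspond to the proposed cut command. Note that cut -f passes through unchanged any record containing no field separator unless -s is given.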




Bob,
I understand your desire to receive a discussion of features not inside the
bug related mail list, but here is an extract from the README:
> Mail suggestions and bug reports for these programs to
> the address on the last line of --help output.
And guess what, the `cut --help` has the bug-coreutils email in the last
line! The coreutils email is not mentioned inside README at all. And
bug-coreutils is mentioned several times in different context.
I apologize for using this mail-list inappropriately, but I did not know
about any other mail-lists



On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady  wrote:

> On 04/17/2013 02:26 PM, George Brink wrote:
> > Hello,
> >
> > I have a task of extracting several "fields" from the text file. The
> > standard `cut` tool could be a perfect tool for a job, but...
> > In my file the '\n' character is a legal symbol inside fields and
> therefore
> > the text file uses other symbol for record-separator. And the `cut` has a
> > hard-coded '\n' for record separator (I just checked the source from the
> > coreutils-8.21 package).
>
> The patch would be simple but not without compatibility cost.
> I.E. scripts using this would immediately become incompatible
> with any systems without this feature.
>
> So you'd like something like tac -s, --separator
> However cut -s is taken, so we'd have to avoid the short -s at least.
> Also tac -s takes a string rather than a character, so
> that gives some extra credence (and complexity) to that option there.
>
> Also related would be to support the -z, --zero-terminated option.
> join, sort and uniq all have this option to use NUL as the record
> separator,
> however they're all closely related sort dependent utilities
> and we're trying to unify options between them.
>
> If it is just a character you want to separate on,
> then you can always use tr to convert before processing,
> albeit with associated data copying overhead.
>
> SEP=^
> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>
> So given that cut is not special here among the text filters,
> and there is a workaround available, I'm 60:40 against
> adding this feature.
>
> thanks,
> Pádraig.
>
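The -z, --zero-terminated option mentioned above can be illustrated with a short sketch, assuming GNU sort and uniq. Records end in NUL rather than newline, so a record may itself contain newlines; tr is used only to make the result printable.

```shell
#!/bin/sh
# Sketch: NUL as the record separator with sort -z and uniq -z.
# The record "beta<newline>line2" appears twice and contains a newline.
out=$(printf 'beta\nline2\0alpha\0beta\nline2\0' |
  LC_ALL=C sort -z |
  uniq -z |
  tr '\0' '|')
printf '%s\n' "$out"
```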