Bug#886173: general: ls quotes results with special characters in its output

2018-01-03 Thread Sean Whitton
Hello,

On Tue, Jan 02 2018, Ben Hutchings wrote:

> The output of ls on a terminal has never, in general, been an accurate
> reflection of the contents of a directory.  Consider that filenames
> can contain *any* byte value other than '\0' or '/', so including
> carriage return, newline, backspace and escape characters.
>
> ls also uses multiple columns by default, but without quoting you
> can't generally tell where the columns are, e.g. is:
>
> aa  ba  ca
> ab  bb  cb
>
> a list of 6 two-letter filenames, or 2 filenames with spaces in, or
> something else again?

More points here: https://mywiki.wooledge.org/ParsingLs
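The usual alternative suggested on that page is to let the shell's own globbing enumerate the directory instead of parsing `ls` output. Below is a minimal POSIX sh sketch. It assumes a scratch directory created with `mktemp -d` and uses hypothetical file names, two of which contain a space:

```sh
#!/bin/sh
# Work in a throwaway directory so nothing else is touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical names: 'aa ba' and 'ca ab' each contain a space.
touch 'aa ba' 'ca ab' bb cb

# A glob hands each filename to the loop as one word, whatever bytes it
# contains, so there are no columns or separators to guess at.
count=0
for f in *; do
  printf '<%s>\n' "$f"
  count=$((count + 1))
done

cd / && rm -rf "$dir"
```

Each name is printed between angle brackets, so embedded spaces are unambiguous. The order follows the locale's collation.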

> Be careful what you wish for.  But I think this will revert the recent
> change:
>
> alias ls='ls --literal'

Thank you for this tip.

-- 
Sean Whitton




2018-01-02 Thread Russ Allbery
mqu...@neosmart.net writes:

> In particular, `ls` output (both in regular and `-l` modes) wraps in
> single quotes the names of files that contain special characters (or, at
> least, a parenthesis), meaning its output is not an accurate reflection
> of the actual contents of the directory.

This was an upstream change in coreutils 8.25, not something specific to
Debian:

  ls now quotes file names unambiguously and appropriate for use in a shell,
  when outputting to a terminal.

Whatever one's opinion of the merits of that upstream change, I think it's
unlikely Debian will want to diverge from upstream behavior for a package
as central as coreutils and a command as central as ls.

-- 
Russ Allbery (r...@debian.org)   



2018-01-02 Thread Mahmoud Al-Qudsi
On Tue, Jan 2, 2018 at 3:26 PM, Ben Hutchings  wrote:
> The output of ls on a terminal has never, in general, been an accurate
> reflection of the contents of a directory.  Consider that filenames can
> contain *any* byte value other than '\0' or '/', so including carriage
> return, newline, backspace and escape characters.

Yes, but those are not characters one would reasonably expect to encounter in
a filename, or at least one would only run into them when expecting to find
them there, whereas one may often run `ls` to check whether the output of a
command was correctly escaped (for instance). Both `(` and `'` are commonly
found in document names, too.

> ls also uses multiple columns by default, but without quoting you can't
> generally tell where the columns are, e.g. is:
>
> aa  ba  ca
> ab  bb  cb
>
> a list of 6 two-letter filenames, or 2 filenames with spaces in, or
> something else again?

This is a carefully chosen example where all entries have the same length; in
a more realistic context, column spacing can easily be determined in the
presence of entries of varying widths. Note that I specifically pointed out
the broken behavior of `ls -l`, including the revered `ls -ahltr`.

> I think that the behaviour of single-quoting is very consistent across
> shells, in part because it is specified by POSIX.

You are correct; I should have used double quotes in my example. We're all
C/C++ programmers here, single quotes are for individual characters only ;-)

> Be careful what you wish for.  But I think this will revert the recent
> change:
>
> alias ls='ls --literal'

Thanks for sharing that option, and I shall certainly make heavy use of it.

My final example, which demonstrates precisely why this behavior only makes
things worse for beginners and newcomers to the terminal alike, remains
unaddressed:

```
mqudsi@buster ~> touch \'\test\(\'
mqudsi@buster ~> ls -l
-rw-r--r-- 1 mqudsi mqudsi 0 Jan  2 14:22 ''\''test('\'''
```

And with regard to the following from your later message:
> You're looking for the ls --quoting-style=WORD option of ls. ls defaults
> to shell-escape if the output is a terminal, and literal otherwise.
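The terminal check described in the quote above is easy to observe. Below is a minimal POSIX sh sketch that reuses the file name `test(` from the original report. It assumes GNU coreutils `ls` 8.25 or later for the explicit `--quoting-style=shell-escape` line:

```sh
#!/bin/sh
# Work in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch 'test('

# Inside $(...) ls writes to a pipe, not a terminal, so it prints the
# name literally: test(
piped=$(ls)
printf '%s\n' "$piped"

# Requesting the terminal style explicitly (GNU ls assumed) shows the
# quoted form seen interactively: 'test('
ls --quoting-style=shell-escape

cd / && rm -rf "$dir"
```

Because `$(ls)` captures output through a pipe, `piped` holds the bare name. The explicit option reproduces the quoted form shown in the report.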

If I may be so bold as to disagree: the output of `ls` executed in a terminal,
and not piped to any other process, is primarily meant for human consumption.
How often does "human consumption" manifest as copy-and-paste versus "show me
the contents of this directory so I can see what it contains"?

Modern shells have solved the escaping problem by using bracketed paste mode to
trigger escaping of content pasted to the terminal. For everyone else that
doesn't need to copy-and-paste and knows offhand that typing certain
characters in a shell requires them to be escaped, that quoting behavior is
only a source of confusion. If you don't have a mouse to copy-and-paste with
(and you're not using vim/tmux or some other esoteric method of selecting text
to copy and paste at the prompt, which would take more time than actually
typing out what you see on the screen), how does this change in any way make
things better or easier?

I really do appreciate that time and effort has gone into the implementation
of this feature, and that dismissing it out of hand like this is bound to
induce a defensive reply, but I ask you to consider that I ran into this bug
within an hour of deploying debian-testing, and in almost two decades of
using everything ranging from the most popular to the most esoteric unix-like
platforms out there, I have never had reason to spend so long or put this much
effort into understanding the output of `ls` or figuring out what files I
actually had on my filesystem. I am sure that I'm not going to be the only one
that is this flummoxed when buster is released with this breaking change - and,
seriously, if the filenames output by `ls` are not held sacrosanct, what is?

Mahmoud Al-Qudsi
NeoSmart Technologies



2018-01-02 Thread Ben Hutchings
Control: reassign -1 coreutils

On Tue, 2018-01-02 at 20:38 +, mqu...@neosmart.net wrote:
[...]
> In particular, `ls` output (both in regular and `-l` modes) wraps in
> single quotes the names of files that contain special characters (or, at
> least, a parenthesis), meaning its output is not an accurate reflection
> of the actual contents of the directory.

The output of ls on a terminal has never, in general, been an accurate
reflection of the contents of a directory.  Consider that filenames can
contain *any* byte value other than '\0' or '/', so including carriage
return, newline, backspace and escape characters.

ls also uses multiple columns by default, but without quoting you can't
generally tell where the columns are, e.g. is:

aa  ba  ca
ab  bb  cb

a list of 6 two-letter filenames, or 2 filenames with spaces in, or
something else again?

[...]
> Additionally, the output cannot be copied-and-pasted as it is. What is
> the point of injecting quotes if they don't actually escape/quote their
> content? To illustrate with an example:
> 
> ```
> mqudsi@buster ~> touch \$\(test\)
> mqudsi@buster ~> ls -l
> -rw-r--r-- 1 mqudsi mqudsi 0 Jan  2 14:27 '$(test)'
> ```
> 
> That's not shell-safe, but the quotes might lead you to think it were.[...]

It is shell-safe.  No $-expansion is done within single-quoted text.
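This can be checked directly. Below is a minimal POSIX sh sketch that assumes a scratch directory created with `mktemp -d`. It pastes the name exactly as `ls -l` displayed it and confirms that `rm` removes the file without running any command:

```sh
#!/bin/sh
# Work in a throwaway directory so nothing else is touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Create a file whose name is the literal text $(test).
touch '$(test)'

# Paste the name exactly as ls -l printed it. Inside single quotes the
# shell performs no expansion, so rm receives the literal name.
rm '$(test)'

# Record what is left (nothing), then clean up.
remaining=$(ls -A)
cd / && rmdir "$dir"
```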

> There's no good way to account for all inputs or to account for the
> idiosyncrasies of the quoting behavior of all the different shells.

I think that the behaviour of single-quoting is very consistent across
shells, in part because it is specified by POSIX.

[...]
> Hopefully this behavior can be changed to simply listing the contents of
> the specified directory as-is.

Be careful what you wish for.  But I think this will revert the recent
change:

alias ls='ls --literal'

and this will disable all transformation of filenames (the same as if
output is not sent to a terminal):

alias ls='ls --literal --color=never --show-control-chars -1'
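A related setting, assuming GNU coreutils `ls`: it reads the `QUOTING_STYLE` environment variable as the default for `--quoting-style`. Setting that variable therefore has a similar effect to the first alias, while an explicit `--quoting-style=...` on the command line still takes precedence. Placing it in `~/.profile` is only a hypothetical choice; any shell startup file would do:

```sh
# Hypothetical ~/.profile line (GNU ls assumed): make literal names the
# default quoting style, even when ls writes to a terminal.
export QUOTING_STYLE=literal
```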

Ben.

-- 
Ben Hutchings
Lowery's Law:
If it jams, force it. If it breaks, it needed replacing anyway.




2018-01-02 Thread mqudsi
Package: general
Severity: normal

Dear Maintainer,

On a clean installation of debian buster, I ran into an issue where the
output of `ls` in various modes left me confused, trying to figure out
where my scripting had gone wrong.

In particular, `ls` output (both in regular and `-l` modes) wraps in
single quotes the names of files that contain special characters (or, at
least, a parenthesis), meaning its output is not an accurate reflection
of the actual contents of the directory.

Observe:

```
mqudsi@buster ~> touch test\(
mqudsi@buster ~> ls -l
-rw-r--r-- 1 mqudsi mqudsi 0 Jan  2 14:22 'test('
```

In my case, I had downloaded a file via `wget` that contained a
parenthesis in its filename. After running `ls`, I presumed that there
was a bug in my shell that passed in the parentheses to `wget`, as I
assumed that `ls` was showing me what was actually there.

Additionally, the output cannot be copied-and-pasted as it is. What is
the point of injecting quotes if they don't actually escape/quote their
content? To illustrate with an example:

```
mqudsi@buster ~> touch \$\(test\)
mqudsi@buster ~> ls -l
-rw-r--r-- 1 mqudsi mqudsi 0 Jan  2 14:27 '$(test)'
```

That's not shell-safe, but the quotes might lead you to think it is.

Moreover, the quoting behavior is... weird. I'm not sure what is meant
to be quoted below -- actually, after seeing the output of ls, I'm no
longer sure of anything and have no clue what actually exists on my
filesystem and what doesn't:

```
mqudsi@buster ~> touch \'\test\(\'
mqudsi@buster ~> ls -l
-rw-r--r-- 1 mqudsi mqudsi 0 Jan  2 14:22 ''\''test('\'''
```

I think we can all agree that's a bit confusing?
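The output above follows the standard POSIX single-quoting technique. The name is wrapped in single quotes, and each embedded `'` is written as `'\''` (close the quote, add an escaped quote, reopen). Below is a minimal POSIX sh sketch of that technique using a hypothetical helper `sq`. It is not coreutils' own quoting code, which handles more cases, such as control characters:

```sh
#!/bin/sh
# Hypothetical helper: wrap one argument in single quotes, turning each
# embedded ' into '\'' so the result is a single literal shell word.
sq() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Quote the name 'test(' (quotes at both ends are part of the name).
quoted=$(sq "'test('")
printf '%s\n' "$quoted"        # prints: ''\''test('\'''

# Feeding the quoted form back through the shell recovers the name.
decoded=$(eval "printf '%s' $quoted")
printf '%s\n' "$decoded"       # prints: 'test('
```

Decoding the listing therefore gives a file named `'test('`, with a literal quote at each end. The command substitution inside `sq` strips trailing newlines, so names that end in a newline would need extra care.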

(I'm not clear on whether this is simply a bug or by design, so I'm
hedging my bets and pointing out why it's not a good idea just in case
this behavior was intentional.)

There's no good way to account for all inputs or to account for the
idiosyncrasies of the quoting behavior of all the different shells. It's
not `ls`'s job to insulate the user from making quoting mistakes when
referencing files, if only because in attempting to do so it makes
things a lot worse:

```
mqudsi@buster ~> ls -l
-rw-r--r-- 1 mqudsi mqudsi 0 Jan  2 14:27 '$(test)'
mqudsi@buster ~> rm '$(test)' #copy-and-pasted unsafely w/ side-effects
# operation fails, with potentially dangerous side effects
mqudsi@buster ~> rm \'\$(test\)\' #presuming ls output is not quoted
# operation fails because file does not exist
mqudsi@buster ~> rm test\(
# all it took to do it right
```

Hopefully this behavior can be changed to simply listing the contents of
the specified directory as-is.

Thank you,

Mahmoud Al-Qudsi
NeoSmart Technologies

-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.13.0-1-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)