bug#60989: [PATCH] rm: fail on duplicate input if force not enabled

2023-01-22 Thread Pádraig Brady

On 22/01/2023 18:18, Philip Rowlands wrote:

On Sat, 21 Jan 2023, at 13:05, Łukasz Sroka wrote:

When the input files contain duplicates, then the rm fails. Because
 duplicates occur most often when the * is used and the shell unwraps it.
 There is a very common scenario when a user accidentally enters space
 after a filename, or enters space instead of forward slash.


To fail on duplicate FILE args, this bash function would do (lightly tested, 
doesn't attempt getopt processing):

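function safe_rm {
  local -A seen
  local file
  for file in "$@"; do
    # Test the stored value; `-v ${seen[$file]}` would test a variable
    # named after that value rather than the array element.
    if [[ -n ${seen[$file]:-} ]]; then
      echo "error: duplicate name '$file'" 1>&2
      return 1
    fi
    seen[$file]=1
  done

  # no dupes seen
  command rm "$@"
}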

and could be used today, without waiting for the next coreutils release.


That's informative, thanks.



As an aside, I could be reading it wrong but the coreutils manual suggests the 
file arguments are optional
rm [option]… [file]…


Right, with the -f option rm will not fail if no arguments are specified
(in the presence of nullglob etc.), which is POSIX compliant.
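
To illustrate the nullglob case, here is a minimal bash sketch. clean_tmp is a
hypothetical helper and the *.tmp pattern is only an example; it assumes an rm
that accepts -f with no operands, as GNU rm does:

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove every *.tmp file in the directory given as $1.
# The body runs in a subshell so the nullglob setting does not leak out.
clean_tmp() (
  shopt -s nullglob      # bash-specific: unmatched globs expand to nothing
  rm -f -- "$1"/*.tmp    # with no matches this runs as plain `rm -f --`
)
```

Without -f, the same call on an empty directory would end up as rm with no
operands, which is reported as an error.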

cheers,
Pádraig





bug#60989: [PATCH] rm: fail on duplicate input if force not enabled

2023-01-22 Thread Philip Rowlands
On Sat, 21 Jan 2023, at 13:05, Łukasz Sroka wrote:
> When the input files contain duplicates, then the rm fails. Because
> duplicates occur most often when the * is used and the shell unwraps it.
> There is a very common scenario when a user accidentally enters space
> after a filename, or enters space instead of forward slash.

To fail on duplicate FILE args, this bash function would do (lightly tested, 
doesn't attempt getopt processing):

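function safe_rm {
  local -A seen
  local file
  for file in "$@"; do
    # Test the stored value; `-v ${seen[$file]}` would test a variable
    # named after that value rather than the array element.
    if [[ -n ${seen[$file]:-} ]]; then
      echo "error: duplicate name '$file'" 1>&2
      return 1
    fi
    seen[$file]=1
  done

  # no dupes seen
  command rm "$@"
}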

and could be used today, without waiting for the next coreutils release.
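
A possible variant, sketched under assumptions: a plain string comparison
treats `tmp/` and `tmp` as different names, so the `rm tmp/ *` case would only
be caught if trailing slashes are ignored. has_duplicate_names below is a
hypothetical helper (not part of rm or bash) that only does the detection and
needs bash 4 or later for associative arrays:

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if two arguments name the same path once
# trailing slashes are ignored, 1 otherwise.
has_duplicate_names() {
  local -A seen=()
  local arg key
  for arg in "$@"; do
    key=$arg
    # Strip trailing slashes so "tmp/" and "tmp" compare equal; keep "/" itself.
    while [[ $key == ?*/ ]]; do
      key=${key%/}
    done
    # Prefix the key so that an empty argument is still a valid subscript.
    if [[ -n ${seen["x$key"]:-} ]]; then
      printf "error: duplicate name '%s'\n" "$arg" >&2
      return 0
    fi
    seen["x$key"]=1
  done
  return 1
}
```

It could be called as `has_duplicate_names "$@" || command rm "$@"` inside a
wrapper. It does not resolve symlinks or `./` prefixes, so it is a heuristic
rather than a complete check.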

As an aside, I could be reading it wrong but the coreutils manual suggests the 
file arguments are optional
   rm [option]… [file]…


Cheers,
Phil





bug#60989: [PATCH] rm: fail on duplicate input if force not enabled

2023-01-21 Thread Łukasz Sroka
On 21/01/2023 15:53, Pádraig Brady  wrote:
>
> On 21/01/2023 13:05, Łukasz Sroka wrote:
> >  When the input files contain duplicates, then the rm fails. Because
> >  duplicates occur most often when the * is used and the shell unwraps 
> > it.
> >  There is a very common scenario when a user accidentally enters space
> >  after a filename, or enters space instead of forward slash.
> >  Example:
> >
> >    rm prefix_ *
> >
> >  The user intended to remove all files with a `prefix_` but removed all
> >  of the files in cwd.
> >  The program quits immediately when a duplicate is detected, to prevent
> >  pressing `y` because user expected a prompt regarding removing multiple
> >  files.
> >  The force option disables this function to enable scripts to work
> >  without modifying them.
> >
> > ```
> > diff --git a/src/rm.c b/src/rm.c
> > index 354e2b0df..e4f9949f0 100644
> > --- a/src/rm.c
> > +++ b/src/rm.c
> > @@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
> >   }
> >   }
> >
> > +static bool
> > +find_duplicates (int n_files, char **files)
> > +{
> > +  for (int l = 0; l < n_files-1; l++)
> > +    for (int r = l+1; r < n_files; r++)
> > +      if (strcmp(files[l], files[r]) == 0)
> > +        return true;
> > +  return false;
> > +}
> > +
> >   void
> >   usage (int status)
> >   {
> > @@ -211,6 +221,7 @@ main (int argc, char **argv)
> > bool preserve_root = true;
> > struct rm_options x;
> > bool prompt_once = false;
> > +  bool force_rm = false;
> > int c;
> >
> > initialize_main (&argc, &argv);
> > @@ -238,6 +249,7 @@ main (int argc, char **argv)
> > x.interactive = RMI_NEVER;
> > x.ignore_missing_files = true;
> > prompt_once = false;
> > +  force_rm = true;
> > break;
> >
> >   case 'i':
> > @@ -352,6 +364,17 @@ main (int argc, char **argv)
> > uintmax_t n_files = argc - optind;
> > char **file =  argv + optind;
> >
> > +  if (!force_rm && find_duplicates(n_files, file))
> > +    {
> > +      /* Because usually when the input files are duplicated it means that the user
> > +         submitted both a directory and an * as separate arguments, probably by accident */
> > +      fprintf (stderr,
> > +               "%s: input contains duplicates, most likely you've put "
> > +               "both * and a file from the same directory.\n",
> > +               program_name);
> > +      return EXIT_FAILURE;
> > +    }
> > +
> > if (prompt_once && (x.recursive || 3 < n_files))
> >   {
> > fprintf (stderr,
> > ```
>
> An interesting proposal.
> The main protection would be for `dir/ *` rather than `file_prefix_ *`.
> The former would be unusual for a user to type, while the latter more usual, 
> but wouldn't trigger the protection AFAICS.
> This adds O(N^2) work on each invocation, so if it were to be included it
> should probably only be enabled with --interactive.
>
> cheers,
> Pádraig

Yeah, true. I implemented it quickly and focused on the `dir/ *`
scenario at first, because that has happened to me -.-
It was because I tried to do something quickly while editing a command
from backsearch and ended up with `rm tmp/ *` instead of `rm tmp/*`.
The prefix_ protection won't be possible without checking
whether a "prefix" exists on disk (e.g. file and file1), which would be
more expensive in time.
I don't like the --interactive approach, as when you're trying to do
something quickly, you probably won't opt for doing it interactively
and you've probably either disabled shell warnings or are ignoring them.
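
As a rough illustration of the prefix check described above, here is a shell
sketch. find_dangling_prefixes is a hypothetical helper, not something rm or
the patch provides, and the heuristic (an argument that does not exist on disk
but is a proper prefix of another argument) is only an assumption about how
such a check might look:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each argument that does not exist on disk but
# is a proper prefix of another argument, as in `rm prefix_ *`.
find_dangling_prefixes() {
  local candidate other
  for candidate in "$@"; do
    [ -n "$candidate" ] || continue
    # Skip names that exist; -L also accepts dangling symlinks.
    if [ -e "$candidate" ] || [ -L "$candidate" ]; then
      continue
    fi
    for other in "$@"; do
      case $other in
        # The quoted part matches literally; ?* demands at least one more char.
        "$candidate"?*)
          printf '%s\n' "$candidate"
          break
          ;;
      esac
    done
  done
}
```

This costs one file test per argument plus a scan of the argument list for
each missing name, which is the extra work mentioned above.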

The more I think about this problem, the more I tilt towards
implementing it in shell. If you'd've written "rm prefix_ *" and had
10k files there, it would be so much easier to detect it on the shell
side than checking every filename with the other and triggering
syscalls for every one...
Either way, if you have other solutions in mind, please update me, as
a non-shell solution wouldn't require configuring the shell in Docker
containers or on remote machines, thus making it more widely available.





bug#60989: [PATCH] rm: fail on duplicate input if force not enabled

2023-01-21 Thread Pádraig Brady

On 21/01/2023 15:51, Łukasz Sroka wrote:

On 21/01/2023 15:53, Pádraig Brady  wrote:


On 21/01/2023 13:05, Łukasz Sroka wrote:

  When the input files contain duplicates, then the rm fails. Because
  duplicates occur most often when the * is used and the shell unwraps it.
  There is a very common scenario when a user accidentally enters space
  after a filename, or enters space instead of forward slash.
  Example:

    rm prefix_ *

  The user intended to remove all files with a `prefix_` but removed all
  of the files in cwd.
  The program quits immediately when a duplicate is detected, to prevent
  pressing `y` because user expected a prompt regarding removing multiple
  files.
  The force option disables this function to enable scripts to work
  without modifying them.

```
diff --git a/src/rm.c b/src/rm.c
index 354e2b0df..e4f9949f0 100644
--- a/src/rm.c
+++ b/src/rm.c
@@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
   }
   }

+static bool
+find_duplicates (int n_files, char **files)
+{
+  for (int l = 0; l < n_files-1; l++)
+    for (int r = l+1; r < n_files; r++)
+      if (strcmp(files[l], files[r]) == 0)
+        return true;
+  return false;
+}
+
   void
   usage (int status)
   {
@@ -211,6 +221,7 @@ main (int argc, char **argv)
 bool preserve_root = true;
 struct rm_options x;
 bool prompt_once = false;
+  bool force_rm = false;
 int c;

 initialize_main (&argc, &argv);
@@ -238,6 +249,7 @@ main (int argc, char **argv)
 x.interactive = RMI_NEVER;
 x.ignore_missing_files = true;
 prompt_once = false;
+  force_rm = true;
 break;

   case 'i':
@@ -352,6 +364,17 @@ main (int argc, char **argv)
 uintmax_t n_files = argc - optind;
 char **file =  argv + optind;

+  if (!force_rm && find_duplicates(n_files, file))
+    {
+      /* Because usually when the input files are duplicated it means that the user
+         submitted both a directory and an * as separate arguments, probably by accident */
+      fprintf (stderr,
+               "%s: input contains duplicates, most likely you've put "
+               "both * and a file from the same directory.\n",
+               program_name);
+      return EXIT_FAILURE;
+    }
+
 if (prompt_once && (x.recursive || 3 < n_files))
   {
 fprintf (stderr,
```


An interesting proposal.
The main protection would be for `dir/ *` rather than `file_prefix_ *`.
The former would be unusual for a user to type, while the latter more usual, 
but wouldn't trigger the protection AFAICS.
This adds O(N^2) work on each invocation, so if it were to be included it should
probably only be enabled with --interactive.

cheers,
Pádraig


Yeah, true. I implemented it quickly and focused on the `dir/ *`
scenario at first, because that has happened to me -.-
It was because I tried to do something quickly while editing a command
from backsearch and ended up with `rm tmp/ *` instead of `rm tmp/*`.
The prefix_ protection won't be possible without checking
whether a "prefix" exists on disk (e.g. file and file1), which would be
more expensive in time.
I don't like the --interactive approach, as when you're trying to do
something quickly, you probably won't opt for doing it interactively
and you've probably either disabled shell warnings or are ignoring them.


True, but at a high level -I can be seen as
an unintrusive "be more careful" option,
and so is often enabled by default with an alias.
Red Hat flavored systems even enable the more intrusive
-i option by default for the root user for example.


The more I think about this problem, the more I tilt towards
implementing it in shell. If you'd've written "rm prefix_ *" and had
10k files there, it would be so much easier to detect it on the shell
side than checking every filename with the other and triggering
syscalls for every one...
Either way, if you have other solutions in mind, please update me, as
a non-shell solution wouldn't require configuring the shell in Docker
containers or on remote machines, thus making it more widely available.


A shell option like `shopt -s uniqueglob` would indeed be more general.
There are various other globbing options, like the similar "failglob"
which fails to execute the command if a glob doesn't match anything.
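
For reference, the existing failglob behaviour can be tried with a small bash
sketch. glob_demo is a hypothetical helper and the directory name is made up
so that the pattern matches nothing; `uniqueglob` itself does not exist in
bash and is not shown here:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `echo <unmatched glob>` in a fresh bash so the
# option does not leak into the calling shell.
# $1: "on" to enable failglob, anything else to leave it off.
glob_demo() {
  local setup=':'
  if [ "$1" = on ]; then
    setup='shopt -s failglob'
  fi
  # The option must be set on an earlier line than the glob it affects.
  bash -c "$setup"'
echo /nonexistent-dir-for-demo/*'
}
```

With `glob_demo off` the pattern is passed through literally to echo; with
`glob_demo on` bash reports the failed match on stderr and echo is not run.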

cheers,
Pádraig





bug#60989: [PATCH] rm: fail on duplicate input if force not enabled

2023-01-21 Thread Pádraig Brady

On 21/01/2023 13:05, Łukasz Sroka wrote:

 When the input files contain duplicates, then the rm fails. Because
 duplicates occur most often when the * is used and the shell unwraps it.
 There is a very common scenario when a user accidentally enters space
 after a filename, or enters space instead of forward slash.
 Example:

   rm prefix_ *

 The user intended to remove all files with a `prefix_` but removed all
 of the files in cwd.
 The program quits immediately when a duplicate is detected, to prevent
 pressing `y` because user expected a prompt regarding removing multiple
 files.
 The force option disables this function to enable scripts to work
 without modifying them.

```
diff --git a/src/rm.c b/src/rm.c
index 354e2b0df..e4f9949f0 100644
--- a/src/rm.c
+++ b/src/rm.c
@@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
  }
  }

+static bool
+find_duplicates (int n_files, char **files)
+{
+  for (int l = 0; l < n_files-1; l++)
+    for (int r = l+1; r < n_files; r++)
+      if (strcmp(files[l], files[r]) == 0)
+        return true;
+  return false;
+}
+
  void
  usage (int status)
  {
@@ -211,6 +221,7 @@ main (int argc, char **argv)
bool preserve_root = true;
struct rm_options x;
bool prompt_once = false;
+  bool force_rm = false;
int c;

initialize_main (&argc, &argv);
@@ -238,6 +249,7 @@ main (int argc, char **argv)
x.interactive = RMI_NEVER;
x.ignore_missing_files = true;
prompt_once = false;
+  force_rm = true;
break;

  case 'i':
@@ -352,6 +364,17 @@ main (int argc, char **argv)
uintmax_t n_files = argc - optind;
char **file =  argv + optind;

+  if (!force_rm && find_duplicates(n_files, file))
+    {
+      /* Because usually when the input files are duplicated it means that the user
+         submitted both a directory and an * as separate arguments, probably by accident */
+      fprintf (stderr,
+               "%s: input contains duplicates, most likely you've put "
+               "both * and a file from the same directory.\n",
+               program_name);
+      return EXIT_FAILURE;
+    }
+
if (prompt_once && (x.recursive || 3 < n_files))
  {
fprintf (stderr,
```


An interesting proposal.
The main protection would be for `dir/ *` rather than `file_prefix_ *`.
The former would be unusual for a user to type, while the latter more usual, 
but wouldn't trigger the protection AFAICS.
This adds O(N^2) work on each invocation, so if it were to be included it should
probably only be enabled with --interactive.
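
For comparison, a sort-based check avoids the pairwise comparison. The
following is only a shell sketch of that idea, not the patch's method;
has_duplicates_sorted is a hypothetical name, and it assumes GNU sort and uniq
for the -z option:

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if any argument occurs more than once.
has_duplicates_sorted() {
  # Fewer than two names cannot contain a duplicate.
  [ "$#" -gt 1 ] || return 1
  # NUL-delimit the names so embedded newlines are handled, sort them so
  # equal names become adjacent, and let `uniq -d` keep only repeated ones.
  # Any output at all means a duplicate was found.
  [ "$(printf '%s\0' "$@" | LC_ALL=C sort -z | LC_ALL=C uniq -zd | wc -c)" -gt 0 ]
}
```

Sorting makes the check O(N log N) in the number of arguments, at the cost of
spawning extra processes.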

cheers,
Pádraig





bug#60989: [PATCH] rm: fail on duplicate input if force not enabled

2023-01-21 Thread Łukasz Sroka
When the input files contain duplicates, then the rm fails. Because
duplicates occur most often when the * is used and the shell unwraps it.
There is a very common scenario when a user accidentally enters space
after a filename, or enters space instead of forward slash.
Example:

  rm prefix_ *

The user intended to remove all files with a `prefix_` but removed all
of the files in cwd.
The program quits immediately when a duplicate is detected, to prevent
pressing `y` because user expected a prompt regarding removing multiple
files.
The force option disables this function to enable scripts to work
without modifying them.

```
diff --git a/src/rm.c b/src/rm.c
index 354e2b0df..e4f9949f0 100644
--- a/src/rm.c
+++ b/src/rm.c
@@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
 }
 }

+static bool
+find_duplicates (int n_files, char **files)
+{
+  for (int l = 0; l < n_files-1; l++)
+    for (int r = l+1; r < n_files; r++)
+      if (strcmp(files[l], files[r]) == 0)
+        return true;
+  return false;
+}
+
 void
 usage (int status)
 {
@@ -211,6 +221,7 @@ main (int argc, char **argv)
   bool preserve_root = true;
   struct rm_options x;
   bool prompt_once = false;
+  bool force_rm = false;
   int c;

   initialize_main (&argc, &argv);
@@ -238,6 +249,7 @@ main (int argc, char **argv)
   x.interactive = RMI_NEVER;
   x.ignore_missing_files = true;
   prompt_once = false;
+  force_rm = true;
   break;

 case 'i':
@@ -352,6 +364,17 @@ main (int argc, char **argv)
   uintmax_t n_files = argc - optind;
   char **file =  argv + optind;

+  if (!force_rm && find_duplicates(n_files, file))
+    {
+      /* Because usually when the input files are duplicated it means that the user
+         submitted both a directory and an * as separate arguments, probably by accident */
+      fprintf (stderr,
+               "%s: input contains duplicates, most likely you've put "
+               "both * and a file from the same directory.\n",
+               program_name);
+      return EXIT_FAILURE;
+    }
+
   if (prompt_once && (x.recursive || 3 < n_files))
 {
   fprintf (stderr,
```