Re: ksh tab completion bug

2020-11-10 Thread Michael Forney
Hi Anton,

Thanks for providing those links to the past discussion.

Anton Lindqvist  wrote:
> As I would address this, the numbers of arguments passed to the
> completion related routines is already painfully^W long. I would rather
> take a step back and introduce a `struct globstate' (just an example,
> could be renamed) which includes all the necessary parameters. Getting
> such refactoring in place first would make the diff even smaller.

After thinking about this some more, I think the word length is not
really the right thing to check, even if you account for escape
characters. What we really want to know is whether or not the current
word was modified by completing up to a common prefix.

Here is another (slightly contrived) case where completion is not
performed, even when backslashes are not involved:

Say you have a user somelonguser, whose home directory is /a, and
there are two files in that directory, /a/test1 and /a/test2. ksh
will not complete ~somelonguser/t to /a/test since the expanded
word is shorter than the original.

This example works on vi completion for the same reason as before.

I took a stab at changing do_complete to use a similar approach as
vi completion (diff below), and it is a bit simpler (only modifies
emacs.c and net removal). However, I still have some concerns:

* There is a slight behavior change when you have test1 and test2,
  and manually type the entire common prefix. Previously, the options
  would be displayed on the first , but now they are only
  displayed on the second .
* The use of a global variable to track whether the word was modified
  since the last completion is a bit messy. vi.c contains a bunch
  of `expanded = NONE` scattered throughout the file. It is quite
  possible that the two I added to emacs.c are not sufficient for
  some editing operations.

Here's another idea for a fix that might address those issues:

* After the call to x_cf_glob, save the part of the buffer between
  start and end in some temporary location.
* Always replace start through end with the longest common prefix.
* After replacement, compare the escaped word with the saved original
  word. If it matches exactly, print the replacement options.

This would avoid the global variable, but I'm not sure how to
determine the new end position after x_escape() since it just calls
x_do_ins() directly.

Though not quite related, I noticed that x_expand does not call
x_free_words before it returns, so it leaks memory in ATEMP. But
since that gets reclaimed after each command, it's probably not
worth worrying about.

> Also, emacs mode has its own regress suite located in
> regress/bin/ksh/edit/emacs.sh. Tricky logic like this should be covered
> in my opinion. Below is a rough diff how to test file name completion.

I'm using OpenBSD's ksh through oksh, which does not contain the
regress tests. I think I'd have to get an OpenBSD VM up and running
in order to contribute a test case.

> Michael, is this something you would like to continue work on?

I would like to see this issue fixed, but now that I have a local
patch that works for me, my motivation is dwindling. I'm happy to
fix any details with my diff, but if there is substantial refactoring
involved in order to merge, then probably not.

I also have no particular attachment to the diffs I wrote. If anyone
has their own idea about a better solution, feel free to go ahead
with that.

diff --git a/emacs.c b/emacs.c
index 2b70992..2ec4fc4 100644
--- a/emacs.c
+++ b/emacs.c
@@ -130,6 +130,7 @@ static  int x_arg_set;
 static char*macro_args;
 static int prompt_skip;
 static int prompt_redraw;
+static int completed;
 
 static int x_ins(char *);
 static voidx_delete(int, int);
@@ -460,6 +461,7 @@ x_ins(char *s)
 
if (x_do_ins(s, strlen(s)) < 0)
return -1;
+   completed = 0;
/*
 * x_zots() may result in a call to x_adjust()
 * we want xcp to reflect the new position.
@@ -566,7 +568,7 @@ x_delete(int nc, int push)
for (cp = x_lastcp(); cp > xcp; )
x_bs(*--cp);
 
-   return;
+   completed = 0;
 }
 
 static int
@@ -1749,7 +1751,6 @@ do_complete(int flags,/* 
XCF_{COMMAND,FILE,COMMAND_FILE} */
int nwords;
int start, end, nlen, olen;
int is_command;
-   int completed = 0;
 
nwords = x_cf_glob(flags, xbuf, xep - xbuf, xcp - xbuf,
&start, &end, &words, &is_command);
@@ -1759,7 +1760,7 @@ do_complete(int flags,/* 
XCF_{COMMAND,FILE,COMMAND_FILE} */
return;
}
 
-   if (type == CT_LIST) {
+   if (type == CT_LIST || type == CT_COMPLIST && completed) {
x_print_expansions(nwords, words, is_command);
x_redraw(0);
x_free_words(nwords, words);
@@ -1769,27 +1770,20 @@ do_complete(int flags,  /* 
XCF_{COMMAND,FILE,COMMAND_FILE} */
olen = end - start;
nlen = x_longest_prefix(nwords, words);

Re: ksh tab completion bug

2020-11-10 Thread Anton Lindqvist
On Tue, Nov 10, 2020 at 06:49:28PM +0100, Sven M. Hallberg wrote:
> Apologies for jumping in as a bystander, but if I may comment:
> 
> Anton Lindqvist on Tue, Nov 10 2020:
> > Been brought up before[1] and rejected[2][3].
> 
> Anton Lindqvist on Sun, Jul 02, 2017:
> > diff below in which slashes are discarded when comparing the length. I
> > don't know if any other character should be discarded as well, if true
> > then it might be worth passing the input buffer through ksh's own
> > lexer and [...]
> 
> The patch then seems to have been rejected for further complicating some
> already hairy code and the above does sound rather unsafe.
> 
> Michael's aim here appears to be to simplify things:
> 
> Michael Forney on Mon, Nov 09, 2020:
> > emacs.c:do_complete() using `end - start` for olen (which counts the
> > backslashes), but nlen (the longest prefix length) does not. [...]
> >
> > I don't believe vi.c:complete_word() has this issue [...]
> > 
> > [my diff] is longer than I had hoped. Perhaps there is a simpler patch
> > to make emacs.c:do_complete() use the same approach as vi completion.
> 
> Maybe some iteration is all that is needed.

I not saying the diff is bad in anyway, just sharing a related data
point. I do think this would be an improvement and the approach looks
better than mine.

As I would address this, the numbers of arguments passed to the
completion related routines is already painfully^W long. I would rather
take a step back and introduce a `struct globstate' (just an example,
could be renamed) which includes all the necessary parameters. Getting
such refactoring in place first would make the diff even smaller.

Also, emacs mode has its own regress suite located in
regress/bin/ksh/edit/emacs.sh. Tricky logic like this should be covered
in my opinion. Below is a rough diff how to test file name completion.

Michael, is this something you would like to continue work on?

Index: emacs.sh
===
RCS file: /cvs/src/regress/bin/ksh/edit/emacs.sh,v
retrieving revision 1.11
diff -u -p -r1.11 emacs.sh
--- emacs.sh3 Apr 2019 14:59:34 -   1.11
+++ emacs.sh10 Nov 2020 19:53:30 -
@@ -139,7 +139,11 @@ testseq "a\0033#" " # a\r # #a \0010\001
 
 # XXX ^[^[: complete
 # XXX ^X^[: complete-command
-# XXX ^[^X: complete-file
+
+# ^[^X: complete-file
+: >abc
+testseq "ls \0033\0030" " # abc"
+
 # XXX ^I, ^[=: complete-list
 
 # [n] ERASE, ^?, ^H: delete-char-backward
Index: subr.sh
===
RCS file: /cvs/src/regress/bin/ksh/edit/subr.sh,v
retrieving revision 1.8
diff -u -p -r1.8 subr.sh
--- subr.sh 21 Nov 2017 19:25:46 -  1.8
+++ subr.sh 10 Nov 2020 19:53:30 -
@@ -18,7 +18,7 @@
 testseq() {
stdin=$1
exp=$(echo "$2")
-   act=$(echo -n "$stdin" | ./edit -p "$PS1" ${KSH:-/bin/ksh} -r 2>&1)
+   act=$(echo -n "$stdin" | "$EDIT" -p "$PS1" ${KSH:-/bin/ksh} -r 2>&1)
[ $? = 0 ] && [ "$exp" = "$act" ] && return 0
 
dump input "$stdin"
@@ -32,3 +32,9 @@ dump() {
printf '%s:\n>>>%s<<<\n' "$1" "$(echo -n "$2" | vis -ol)"
echo -n "$2" | hexdump -Cv
 }
+
+EDIT="${PWD}/edit"
+
+TMP="$(mktemp -dt ksh.XX)"
+trap 'rm -r $TMP' EXIT
+cd "$TMP"



Re: ksh tab completion bug

2020-11-10 Thread Sven M . Hallberg
Apologies for jumping in as a bystander, but if I may comment:

Anton Lindqvist on Tue, Nov 10 2020:
> Been brought up before[1] and rejected[2][3].

Anton Lindqvist on Sun, Jul 02, 2017:
> diff below in which slashes are discarded when comparing the length. I
> don't know if any other character should be discarded as well, if true
> then it might be worth passing the input buffer through ksh's own
> lexer and [...]

The patch then seems to have been rejected for further complicating some
already hairy code and the above does sound rather unsafe.

Michael's aim here appears to be to simplify things:

Michael Forney on Mon, Nov 09, 2020:
> emacs.c:do_complete() using `end - start` for olen (which counts the
> backslashes), but nlen (the longest prefix length) does not. [...]
>
> I don't believe vi.c:complete_word() has this issue [...]
> 
> [my diff] is longer than I had hoped. Perhaps there is a simpler patch
> to make emacs.c:do_complete() use the same approach as vi completion.

Maybe some iteration is all that is needed.

Regards,
pesco



Re: ksh tab completion bug

2020-11-09 Thread Anton Lindqvist
On Mon, Nov 09, 2020 at 11:15:36PM -0800, Michael Forney wrote:
> I noticed some strange behavior of ksh in emacs mode when completing
> file names that contain spaces (or other characters that need
> escaping).
> 
> To illustrate the problem, consider two files 'a b c test1' and
> 'a b c test2'. ksh will correctly complete `a` and `a\ b\ c\ ` to
> the common prefix `a\ b\ c\ test`. However, all of the following
> do not get completed:
> 
> * `a\ b\ c\ t`
> * `a\ b\ c\ te`
> * `a\ b\ c\ tes`
> 
> I did some debugging, and I think this is due to emacs.c:do_complete()
> using `end - start` for olen (which counts the backslashes), but
> nlen (the longest prefix length) does not. So the completion only
> occurs when the current word is shorter than the common prefix
> length minus the number of backslashes in the word.
> 
> I don't believe vi.c:complete_word() has this issue since it always
> replaces the word with the longest prefix.
> 
> I wrote a (only lightly tested) patch to make x_cf_glob return the
> unescaped length as well as the start and end positions, and use
> that for olen instead of `end - start`. This seems to fix the issue,
> but it is longer than I had hoped. Perhaps there is a simpler patch
> to make emacs.c:do_complete() use the same approach as vi completion.

Been brought up before[1] and rejected[2][3].

[1] https://marc.info/?l=openbsd-bugs&m=149902839123905&w=2
[2] https://marc.info/?l=openbsd-bugs&m=149925960003395&w=2
[3] https://marc.info/?l=openbsd-bugs&m=149925991603581&w=2



ksh tab completion bug

2020-11-09 Thread Michael Forney
I noticed some strange behavior of ksh in emacs mode when completing
file names that contain spaces (or other characters that need
escaping).

To illustrate the problem, consider two files 'a b c test1' and
'a b c test2'. ksh will correctly complete `a` and `a\ b\ c\ ` to
the common prefix `a\ b\ c\ test`. However, all of the following
do not get completed:

* `a\ b\ c\ t`
* `a\ b\ c\ te`
* `a\ b\ c\ tes`

I did some debugging, and I think this is due to emacs.c:do_complete()
using `end - start` for olen (which counts the backslashes), but
nlen (the longest prefix length) does not. So the completion only
occurs when the current word is shorter than the common prefix
length minus the number of backslashes in the word.

I don't believe vi.c:complete_word() has this issue since it always
replaces the word with the longest prefix.

I wrote a (only lightly tested) patch to make x_cf_glob return the
unescaped length as well as the start and end positions, and use
that for olen instead of `end - start`. This seems to fix the issue,
but it is longer than I had hoped. Perhaps there is a simpler patch
to make emacs.c:do_complete() use the same approach as vi completion.

diff --git a/bin/ksh/edit.c b/bin/ksh/edit.c
index 3089d195d20..d44afc258ba 100644
--- a/bin/ksh/edit.c
+++ b/bin/ksh/edit.c
@@ -29,7 +29,7 @@ static void check_sigwinch(void);
 
 static int x_file_glob(int, const char *, int, char ***);
 static int x_command_glob(int, const char *, int, char ***);
-static int x_locate_word(const char *, int, int, int *, int *);
+static int x_locate_word(const char *, int, int, int *, int *, int *);
 
 
 /* Called from main */
@@ -524,15 +524,16 @@ x_command_glob(int flags, const char *str, int slen, char 
***wordsp)
(c) == '`' || (c) == '=' || (c) == ':' )
 
 static int
-x_locate_word(const char *buf, int buflen, int pos, int *startp,
+x_locate_word(const char *buf, int buflen, int pos, int *startp, int *endp,
 int *is_commandp)
 {
int p;
-   int start, end;
+   int start, end, len;
 
/* Bad call?  Probably should report error */
if (pos < 0 || pos > buflen) {
*startp = pos;
+   *endp = pos;
*is_commandp = 0;
return 0;
}
@@ -546,10 +547,13 @@ x_locate_word(const char *buf, int buflen, int pos, int 
*startp,
(start > 1 && buf[start-2] == '\\'); start--)
;
/* Go forwards to end of word */
+   len = 0;
for (end = start; end < buflen && IS_WORDC(buf[end]); end++) {
if (buf[end] == '\\' && (end+1) < buflen)
end++;
+   len++;
}
+   *endp = end;
 
if (is_commandp) {
int iscmd;
@@ -574,7 +578,7 @@ x_locate_word(const char *buf, int buflen, int pos, int 
*startp,
 
*startp = start;
 
-   return end - start;
+   return len;
 }
 
 static int
@@ -654,14 +658,14 @@ x_try_array(const char *buf, int buflen, const char 
*want, int wantlen,
 
 int
 x_cf_glob(int flags, const char *buf, int buflen, int pos, int *startp,
-int *endp, char ***wordsp, int *is_commandp)
+int *endp, int *lenp, char ***wordsp, int *is_commandp)
 {
-   int len;
+   int len, esclen;
int nwords;
char **words = NULL;
int is_command;
 
-   len = x_locate_word(buf, buflen, pos, startp, &is_command);
+   len = x_locate_word(buf, buflen, pos, startp, endp, &is_command);
if (!(flags & XCF_COMMAND))
is_command = 0;
/* Don't do command globing on zero length strings - it takes too
@@ -671,10 +675,11 @@ x_cf_glob(int flags, const char *buf, int buflen, int 
pos, int *startp,
if (len == 0 && is_command)
return 0;
 
+   esclen = *endp - *startp;
if (is_command)
-   nwords = x_command_glob(flags, buf + *startp, len, &words);
-   else if (!x_try_array(buf, buflen, buf + *startp, len, &nwords, &words))
-   nwords = x_file_glob(flags, buf + *startp, len, &words);
+   nwords = x_command_glob(flags, buf + *startp, esclen, &words);
+   else if (!x_try_array(buf, buflen, buf + *startp, esclen, &nwords, 
&words))
+   nwords = x_file_glob(flags, buf + *startp, esclen, &words);
if (nwords == 0) {
*wordsp = NULL;
return 0;
@@ -683,7 +688,7 @@ x_cf_glob(int flags, const char *buf, int buflen, int pos, 
int *startp,
if (is_commandp)
*is_commandp = is_command;
*wordsp = words;
-   *endp = *startp + len;
+   *lenp = len;
 
return nwords;
 }
diff --git a/bin/ksh/edit.h b/bin/ksh/edit.h
index 0b604cd64fb..f988d92c4b2 100644
--- a/bin/ksh/edit.h
+++ b/bin/ksh/edit.h
@@ -43,7 +43,7 @@ bool  x_mode(bool);
 intpromptlen(const char *, const char **);
 intx_do_comment(char *, int, int *);
 void   x_print_expansions(int, char *const *, i