[Bug other/86904] Column numbers ignore tab characters

richard.earnshaw at arm dot com Fri, 10 Aug 2018 02:44:41 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86904


--- Comment #1 from richard.earnshaw at arm dot com ---
On 09/08/18 21:08, dmalcolm at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86904
> 
>             Bug ID: 86904
>            Summary: Column numbers ignore tab characters
>            Product: gcc
>            Version: unknown
>             Status: UNCONFIRMED
>           Keywords: diagnostic
>           Severity: normal
>           Priority: P3
>          Component: other
>           Assignee: unassigned at gcc dot gnu.org
>           Reporter: dmalcolm at gcc dot gnu.org
>   Target Milestone: ---
> 
> As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19165#c21 :
> 
> /* Both gcc and emacs number source *lines* starting at 1, but
>    they have differing conventions for *columns*.
> 
>    GCC uses a 1-based convention for source columns,
>    whereas Emacs's M-x column-number-mode uses a 0-based convention.
> 
>    For example, an error in the initial, left-hand
>    column of source line 3 is reported by GCC as:
> 
>       some-file.c:3:1: error: ...etc...
> 
>    On navigating to the location of that error in Emacs
>    (e.g. via "next-error"),
>    the locus is reported in the Mode Line
>    (assuming M-x column-number-mode) as:
> 
>      some-file.c   10%   (3, 0)
> 
>    i.e. "3:1:" in GCC corresponds to "(3, 0)" in Emacs.  */
> 
> In terms of 0 vs 1, GCC complies with the GNU standards here:
> https://www.gnu.org/prep/standards/html_node/Errors.html
> 
> However our "column numbers" are also simply a 1-based byte-count, so a tab
> character is treated by us as simply an increment of 1 right now.
> 
> (see also PR 49973, which covers the case of multibyte characters).
> 
> It turns out that we convert tab characters to *single* space characters when
> printing source code.
> 
> This behavior has been present since Manu first implemented
> -fdiagnostics-show-caret in r186305 (aka
> 5a9830842f69ebb059061e26f8b0699cbd85121e, PR 24985), where it was this logic
> (there in diagnostic.c's diagnostic_show_locus):
>       char c = *line == '\t' ? ' ' : *line;
>       pp_character (context->printer, c);
> 
> (that logic is now in diagnostic-show-locus.c in layout::print_source_line)
> 
> This is arguably a bug, but it's intimately linked to the way in which we 
> track
> "column numbers".
> 
> Our "column numbers" are currently simply a 1-based byte-count, I believe, so 
> a
> tab character is treated by us as simply an increment of 1 right now.
> 
> There are similar issues with encodings that aren't 1 byte per character (e.g.
> non-ASCII unicode characters), which are being tracked in PR 49973.
> 
> Presumably, when we print source lines containing tab characters, we should
> emit a number of spaces to reach a tab stop.
> 
> Consider a diagnostic with a multiline range covering the
> following source (and beyond):
> 
>       indent: 6 (tabs: 0, spaces: 6)
>        indent: 7 (tabs: 0, spaces: 7)
>         indent: 8 (tabs: 1, spaces: 0)
>          indent: 9 (tabs: 1, spaces: 1)
> 
> i.e.:
> 
>   "      indent: 6 (tabs: 0, spaces: 6)\n"
>   "       indent: 7 (tabs: 0, spaces: 7)\n"
>   "\tindent: 8 (tabs: 1, spaces: 0)\n"
>   "\t indent: 9 (tabs: 1, spaces: 1)\n"
> 
> Currently diagnostic_show_locus prints:
> 
>        indent: 6 (tabs: 0, spaces: 6)
>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>         indent: 7 (tabs: 0, spaces: 7)
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   indent: 8 (tabs: 1, spaces: 0)
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    indent: 9 (tabs: 1, spaces: 1)
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> i.e:
>   "       indent: 6 (tabs: 0, spaces: 6)\n"
>   "       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
>   "        indent: 7 (tabs: 0, spaces: 7)\n"
>   "        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
>   "  indent: 8 (tabs: 1, spaces: 0)\n"
>   "  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
>   "   indent: 9 (tabs: 1, spaces: 1)\n"
>   "   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
> 
> which misrepresents the indentation of the user's code.
> 
> It should respect tabstops, and print:
> 
>        indent: 6 (tabs: 0, spaces: 6)
>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>         indent: 7 (tabs: 0, spaces: 7)
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>          indent: 8 (tabs: 1, spaces: 0)
>          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>           indent: 9 (tabs: 1, spaces: 1)
>           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> i.e.:
> 
>   "       indent: 6 (tabs: 0, spaces: 6)\n"
>   "       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
>   "        indent: 7 (tabs: 0, spaces: 7)\n"
>   "        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
>   "         indent: 8 (tabs: 1, spaces: 0)\n"
>   "         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
>   "          indent: 9 (tabs: 1, spaces: 1)\n"
>   "          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"
> 
> We should also handle erroneous leading spaces before a tab, so that e.g.
> 
>   "  \tfoo"
> 
> should be printed as if it were:
> 
>  "\tfoo"
> 
> (given that that's what the user's editor is probably printing it as).
> 
> Similarly, we should presumably print "8" for the column number for the 'f' of
> "foo".  However, IDEs are expecting GCC's existing behavior, so we should
> probably add a command-line option for controlling this.
> 
> Adding a left margin with line numbers (as of r263450) doesn't change this 
> bug,
> but makes fixing it slightly more complicated.
> 
> Maybe:
>   -fdiagnostics-x-coord=bytes : count of bytes
>   -fdiagnostics-x-coord=characters : count of characters (not special-casing
> tab)
>   -fdiagnostics-x-coord=columns : count of columns: as per characters, but 
> with
> tabs doing tabstops

how about -fdiagnostics-x-coord=visual-[n]

Where n is the size of a hard tab?  Some folks change the tab stop to 4,
for example.  Or maybe ...coord=tab[-n], where -n defaults to "-8".

R.

> (currently we use "bytes"  Not sure if we need "characters")
> 
> I'm thinking that internally, we should continue to track byte offsets, but
> make the option affect the presentation of the number in diagnostics*.c.
> 
> (should it affect -fdiagnostics-parseable-fixits ?
> see also the Emacs RFE for parseable fixits:
>   https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25987 )
>

[Bug other/86904] Column numbers ignore tab characters

Reply via email to