> On Wed, Jun 10, 2020 at 12:11:00PM -0400, David Malcolm wrote:
> > Thanks for the patch; sorry about the delay in reviewing it.
> > 
> > Some high-level review points
> > 
> > - I like the patch overall
> > 
> > - This will deserve an item in the release notes
> > 
> > - I don't like adding "global_tabstop" (I don't like global
> > variables).  Is there nowhere else we can handle this? I believe
> > there's a cluster of functions in the callgraph that make use of
> > it; can we simply pass around the tabstop value instead?  "tabstop"
> > seems to have several meanings.  If I'm reading the patch correctly
> >   * "tabstop > 0" means to expand tabs so that column numbers are a
> > multiple of tabstop
> >   * "tabstop == 0" means "don't expand tabs"
> >   * "tabstop < 0" in some places means: use the global_tabstop
value
> > Is it possible to eliminate global_tabstop value?  Or is there some
> > deep reason I'm missing?
> > 
> > I'll do a more thorough review once that's addressed/resolved
(since
> > eliminating global_tabstop might touch a few places).
> >
> 
> Thanks for the feedback! The attached updated patch addresses these
> concerns. Regarding tabstop, I have removed the new static variable
> global_tabstop in charset.c. FWIW, the usage of "tabstop" arguments
in the
> various new APIs did previously work a bit more consistently than you
> described. In all cases "tabstop <= 0" meant to use the default
value,
> otherwise it specified the tabstop to use (with tabstop=1 naturally
> restoring the old behavior of changing tabs to a single space). In
order
> for libcpp to provide this feature (callers can pass tabstop <= 0 to
get a
> default, and the default can in turn by configured when processing
the
> -ftabstop option), it does need to remember the default, and this has
to
> be a file-level static variable because the routines need to work
> independent of any cpp_reader instance. (Some frontends don't use
> libcpp to read their input, for instance.) Anyway, I see the point
that
> this file-level static, being accessible with cpp_set_tabstop() and
> cpp_get_tabstop(), is effectively just a global variable, so I have
> removed this feature, which just means that all callers need to pass
the
> tabstop they want to use. I am now rather using the
diagnostic_context
> object to remember the value passed to -ftabstop. The only place this
> involves global variables is now in c-family/c-indentation.c, where
if I
> understood correctly, the only diagnostic_context available is
global_dc,
> so I am getting the tabstop value from there. Please let me know if
> there's a better way to handle that? Prior to my patch, the tabstop
was
> obtained from a different global variable (extern cpp_options
*cpp_opts),
> so at least conservation of total globals is maintained. :)

Thanks.  That sounds like a good approach.


> Compared to the previous version, this one is a bit longer, since 25
or
> so call sites had to be modified to know the value of -ftabstop. Most
of
> the churn is in diagnostic-show-locus.c, because there are a fair
number of
> static helper functions and helper classes there, which just needed
to
> receive the diagnostic_context object from their callers. I could
> have made this simpler by letting the tabstop argument default to
> something like 8 in all functions that require it... this would
remove the
> need to pass it in all the selftests that are indifferent to it. I
figured
> it would be better to force this argument to be passed, though, or
else in
> the future it may be easy to forget to pass it where it is needed. 

Thanks.

> > Thanks for adding docs; some nits on them:
> > 
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > 
> > [...snip...]
> > 
> > > +@item -fdiagnostics-column-unit=@var{UNIT}
> > > +@opindex fdiagnostics-column-unit
> > > +Select the units for the column number.  This affects
traditional diagnostics
> > > +(in the absence of @option{-fno-show-column}), as well as JSON
format
> > > +diagnostics if requested.
> > > +
> > > +The default @var{UNIT}, @samp{display}, considers the number of
display columns
> > > +occupied by each character.  This may be larger than the number
of bytes
> > > +occupied, in the case of tab characters, or it may be smaller,
in the case of
> > > +multibyte characters.  For example, the UTF-8 character ``@U{03C
0}'' occupies
> > > +two bytes and one display column, while the character ``@U{1F642
}'' occupies
> > > +four bytes and two display columns.
> > 
> > This is imprecise.  A unicode code point occupies some number of
display columns,
> > and its *UTF-8 encoding* occupies some number of bytes.
> > 
> > [and my inner pedant is now thinking: what about combining
diacritics? 
> > But I don't think we can ever issue a diagnostic on a diacritic; I
> > *think* we only ever care about the per-glyph level]
> > 
> > > +Setting @var{UNIT} to @samp{byte} changes the column number to
the
> > raw byte
> > > +count in all cases, as was traditionally output by GCC prior to
version 11.1.0.
> > > +
> > > +@item -fdiagnostics-column-origin=@var{ORIGIN}
> > > +@opindex fdiagnostics-column-origin
> > > +Select the origin for column numbers, i.e. the column number
assigned to the
> > > +first column.  The default value of 1 corresponds to traditional
GCC
> > > +behavior and to the GNU style guide.  Some utilities may perform
better with an
> > > +origin of 0; any non-negative value may be specified.
> > > +
> > >  @item -fdiagnostics-format=@var{FORMAT}
> > >  @opindex fdiagnostics-format
> > >  Select a different format for printing diagnostics.
> > 
> > [...snip...]
> > 
> > > +A diagnostic can contain zero or more locations.  Each location
has an
> > > +optional @code{label} string and up to three positions within
it: a
> > > +@code{caret} position and optional @code{start} and
@code{finish} positions.
> > > +A position is described by a @code{file} name, a @code{line}
number, and
> > > +three numbers indicating a column position: @code{display-
column} counts
> > > +display columns, accounting for tabs and multibyte characters;
> > > +@code{byte-column} counts raw bytes; and @code{column} is equal
to one of
> > > +the previous two, as dictated by the @option{-fdiagnostics-
column-unit}
> > > +option.
> > 
> > Might be clearer to use an unordered list here for the three kinds
of column.
> > 
> > > All three columns are relative to the origin specified by
> > > +@option{-fdiagnostics-column-origin}, which is typically equal
to 1 but may
> > > +be set, for instance, to 0 for compatibility with other
utilities that
> > > +number columns from 0.  The column origin is recorded in the
JSON output in
> > > +the @code{column-origin} tag.  In the remaining examples below,
the extra
> > > +column number outputs have been omitted for brevity.
> > 
> > [...snip...]
> > 
> 
> I improved the docs along these lines.
> 
> > Thanks again for the patch; hope this is constructive
> > Dave
> >
> 
> Thanks for your time! BTW, I did bootstrap + regtest this version as
well on
> x86-64 Linux, it looks good, new tests pass and others are the same:
> 
> FAIL 97 97
> PASS 476837 477297
> UNRESOLVED 7 7
> UNSUPPORTED 11726 11726
> UNTESTED 195 195
> XFAIL 1807 1807
> XPASS 37 37
> 
> -Lewis
>

Thanks for the updated patch.

This looks (almost) ready; I have a few nits inline below....

> From 7729ce3334b6768a25967a6dd4a0a5a2ed0923cc Mon Sep 17 00:00:00 2001
> From: Lewis Hyatt <lhy...@gmail.com>
> Date: Wed, 10 Jun 2020 22:04:07 -0400
> Subject: [PATCH] diagnostics: Support conversion of tabs to spaces [PR49973] 
> [PR86904]
> 
> Supports conversion of tabs to spaces when outputting diagnostics. Also
> adds -fdiagnostics-column-unit and -fdiagnostics-column-origin options to
> control how the column number is output, thereby resolving the two PRs.
> 
> gcc/c-family/ChangeLog:
> 
>         PR other/86904
>         * c-indentation.c (should_warn_for_misleading_indentation): Get
>         global tabstop from the new source.
>         * c-opts.c (c_common_handle_option): Remove handling of -ftabstop, 
> which
>         is now a common option.
>         * c.opt: Likewise.
> 
> gcc/ChangeLog:
> 
>         PR preprocessor/49973
>         PR other/86904
>         * common.opt: Handle -ftabstop here instead of in c-family
>         options.  Add -fdiagnostics-column-unit= and
>         -fdiagnostics-column-origin= options.
>         * opts.c (common_handle_option): Handle the new options.
>         * diagnostic-format-json.cc (json_from_expanded_location): Add
>         diagnostic_context argument.  Use it to convert column numbers as per
>         the new options.
>         (json_from_location_range): Likewise.
>         (json_from_fixit_hint): Likewise.
>         (json_end_diagnostic): Pass the new context argument to helper
>         functions above.  Add "column-origin" field to the output.
>         (test_unknown_location): Add the new context argument to calls to
>         helper functions.
>         (test_bad_endpoints): Likewise.
>         * diagnostic-show-locus.c
>         (exploc_with_display_col::exploc_with_display_col): Support
>         tabstop parameter.
>         (layout_point::layout_point): Make use of class
>         exploc_with_display_col.
>         (layout_range::layout_range): Likewise.
>         (struct line_bounds): Clarify that the units are now always
>         display columns.  Rename members accordingly.  Add constructor.
>         (layout::print_source_line): Add support for tab expansion.
>         (make_range): Adapt to class layout_range changes.
>         (layout::maybe_add_location_range): Likewise.
>         (layout::layout): Adapt to class exploc_with_display_col changes.
>         (layout::calculate_x_offset_display): Support tabstop parameter.
>         (layout::print_annotation_line): Adapt to struct line_bounds changes.
>         (layout::print_line): Likewise.
>         (line_label::line_label): Add diagnostic_context argument.
>         (get_affected_range): Likewise.
>         (get_printed_columns): Likewise.
>         (layout::print_any_labels): Adapt to struct line_label changes.
>         (class correction): Add m_tabstop member.
>         (correction::correction): Add tabstop argument.
>         (correction::compute_display_cols): Use m_tabstop.
>         (class line_corrections): Add m_context member.
>         (line_corrections::line_corrections): Add diagnostic_context argument.
>         (line_corrections::add_hint): Use m_context to handle tabstops.
>         (layout::print_trailing_fixits): Adapt to class line_corrections
>         changes.
>         (test_layout_x_offset_display_utf8): Support tabstop parameter.
>         (test_layout_x_offset_display_tab): New selftest.
>         (test_one_liner_colorized_utf8): Likewise.
>         (test_tab_expansion): Likewise.
>         (test_diagnostic_show_locus_one_liner_utf8): Call the new tests.
>         (diagnostic_show_locus_c_tests): Likewise.
>         (test_overlapped_fixit_printing): Adapt to helper class and
>         function changes.
>         (test_overlapped_fixit_printing_utf8): Likewise.
>         (test_overlapped_fixit_printing_2): Likewise.
>         * diagnostic.h (enum diagnostics_column_unit): New enum.
>         (struct diagnostic_context): Add members for the new options.
>         (diagnostic_converted_column): Declare.
>         (json_from_expanded_location): Add new context argument.
>         * diagnostic.c (diagnostic_initialize): Initialize new members.
>         (diagnostic_converted_column): New function.
>         (maybe_line_and_column): Be willing to output a column of 0.
>         (diagnostic_get_location_text): Convert column number as per the new
>         options.
>         (diagnostic_report_current_module): Likewise.
>         (assert_location_text): Add origin and column_unit arguments for
>         testing the new functionality.
>         (test_diagnostic_get_location_text): Test the new functionality.
>         * doc/invoke.texi: Document the new options and behavior.
>         * input.h (location_compute_display_column): Add tabstop argument.
>         * input.c (location_compute_display_column): Likewise.
>         (test_cpp_utf8): Add selftests for tab expansion.
>         * tree-diagnostic-path.cc (default_tree_make_json_for_path): Pass the
>         new context argument to json_from_expanded_location().
> 
> libcpp/ChangeLog:
> 
>         PR preprocessor/49973
>         PR other/86904
>         * include/cpplib.h (struct cpp_options):  Removed support for 
> -ftabstop,
>         which is now handled by diagnostic_context.
>         (class cpp_display_width_computation): New class.
>         (cpp_byte_column_to_display_column): Add optional tabstop argument.
>         (cpp_display_width): Likewise.
>         (cpp_display_column_to_byte_column): Likewise.
>         * charset.c
>         (cpp_display_width_computation::cpp_display_width_computation): New
>         function.
>         (cpp_display_width_computation::advance_display_cols): Likewise.
>         (compute_next_display_width): Removed and implemented this
>         functionality in a new function...
>         (cpp_display_width_computation::process_next_codepoint): ...here.
>         (cpp_byte_column_to_display_column): Added tabstop argument.
>         Reimplemented in terms of class cpp_display_width_computation.
>         (cpp_display_column_to_byte_column): Likewise.
>         * init.c (cpp_create_reader): Remove handling of -ftabstop, which is 
> now
>         handled by diagnostic_context.
> 
> gcc/testsuite/ChangeLog:
> 
>         PR preprocessor/49973
>         PR other/86904
>         * c-c++-common/Wmisleading-indentation-3.c: Adjust expected output
>         for new defaults.
>         * c-c++-common/Wmisleading-indentation.c: Likewise.
>         * c-c++-common/diagnostic-format-json-1.c: Likewise.
>         * c-c++-common/diagnostic-format-json-2.c: Likewise.
>         * c-c++-common/diagnostic-format-json-3.c: Likewise.
>         * c-c++-common/diagnostic-format-json-4.c: Likewise.
>         * c-c++-common/diagnostic-format-json-5.c: Likewise.
>         * c-c++-common/missing-close-symbol.c: Likewise.
>         * g++.dg/diagnostic/bad-binary-ops.C: Likewise.
>         * g++.dg/parse/error4.C: Likewise.
>         * g++.old-deja/g++.brendan/crash11.C: Likewise.
>         * g++.old-deja/g++.pt/overload2.C: Likewise.
>         * g++.old-deja/g++.robertl/eb109.C: Likewise.
>         * gcc.dg/analyzer/malloc-paths-9.c: Likewise.
>         * gcc.dg/bad-binary-ops.c: Likewise.
>         * gcc.dg/format/branch-1.c: Likewise.
>         * gcc.dg/format/pr79210.c: Likewise.
>         * gcc.dg/plugin/diagnostic-test-expressions-1.c: Likewise.
>         * gcc.dg/plugin/diagnostic-test-string-literals-1.c: Likewise.
>         * gcc.dg/redecl-4.c: Likewise.
>         * gfortran.dg/diagnostic-format-json-1.F90: Likewise.
>         * gfortran.dg/diagnostic-format-json-2.F90: Likewise.
>         * gfortran.dg/diagnostic-format-json-3.F90: Likewise.
>         * go.dg/arrayclear.go: Add a comment explaining why adding a
>         comment was necessary to work around a dejagnu bug.
>         * c-c++-common/diagnostic-units-1.c: New test.
>         * c-c++-common/diagnostic-units-2.c: New test.
>         * c-c++-common/diagnostic-units-3.c: New test.
>         * c-c++-common/diagnostic-units-4.c: New test.
>         * c-c++-common/diagnostic-units-5.c: New test.
>         * c-c++-common/diagnostic-units-6.c: New test.
>         * c-c++-common/diagnostic-units-7.c: New test.
>         * c-c++-common/diagnostic-units-8.c: New test.

[...snip...]

> diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
> index 307dbcfb34a..75706c5f4d8 100644
> --- a/gcc/diagnostic.h
> +++ b/gcc/diagnostic.h
> @@ -24,6 +24,20 @@ along with GCC; see the file COPYING3.  If not see
>  #include "pretty-print.h"
>  #include "diagnostic-core.h"
>  
> +/* An enum for controlling what units to use for the column number
> +   when diagnostics are output, used by the -fdiagnostics-column-unit option.
> +   Tabs will be expanded or not according to the value of -ftabstop.  The 
> origin
> +   (default 1) is controlled by -fdiagnostics-column-origin.  */
> +

"New" and "historical" can get out of date, so how about:

> +enum diagnostics_column_unit
> +{
> +  /* The new default: display columns.  */

     /* The default from GCC 11 onwards: display column.  */
     
> +  DIAGNOSTICS_COLUMN_UNIT_DISPLAY,
> +
> +  /* The historical behavior: simple bytes.  */

     /* The behavior in GCC 10 and earlier: simple bytes.  */
     
> +  DIAGNOSTICS_COLUMN_UNIT_BYTE
> +};

?

[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 06a04e3d7dd..f463275bc8b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

[...snip...]

> @@ -4729,6 +4731,31 @@ Do not print column numbers in diagnostics.  This may 
> be necessary if
>  diagnostics are being scanned by a program that does not understand the
>  column numbers, such as @command{dejagnu}.
>  
> +@item -fdiagnostics-column-unit=@var{UNIT}
> +@opindex fdiagnostics-column-unit
> +Select the units for the column number.  This affects traditional diagnostics
> +(in the absence of @option{-fno-show-column}), as well as JSON format
> +diagnostics if requested.
> +
> +The default @var{UNIT}, @samp{display}, considers the number of display
> +columns occupied by each character.  This may be larger than the number
> +of bytes required to encode the character, in the case of tab
> +characters, or it may be smaller, in the case of multibyte characters.
> +For example, the character ``@U{03C0}'' occupies one display column,
> +and its UTF-8 encoding requires two bytes; the character ``@U{1F642}''
> +occupies two display columns, and its UTF-8 encoding requires four
> +bytes.

Thanks for reworking this.

I'm wary of those @U commands - does the generated HTML from "make
html" work? (similarly for the man page).  Would it be safer to express
them in the following form?

 For example, the character ``GREEK SMALL LETTER PI (U+03C0)'' occupies one
 display column, and its UTF-8 encoding requires two bytes; the character
 ``SLIGHTLY SMILING FACE' (U+1F642)'' occupies two display columns, and
 its UTF-8 encoding requires four bytes.

or somesuch?

> +Setting @var{UNIT} to @samp{byte} changes the column number to the raw byte
> +count in all cases, as was traditionally output by GCC prior to version 
> 11.1.0.

[...snip...]

>  @item -fdiagnostics-format=@var{FORMAT}
>  @opindex fdiagnostics-format
>  Select a different format for printing diagnostics.
> @@ -4764,11 +4791,15 @@ might be printed in JSON form (after formatting) like 
> this:
>          "locations": [
>              @{
>                  "caret": @{
> +                   "display-column": 3,
> +                   "byte-column": 3,
>                      "column": 3,
>                      "file": "misleading-indentation.c",
>                      "line": 15
>                  @},
>                  "finish": @{
> +                   "display-column": 4,
> +                   "byte-column": 4,
>                      "column": 4,
>                      "file": "misleading-indentation.c",
>                      "line": 15

Nit: the new fields don't line up with the old ones.

> @@ -4784,6 +4815,8 @@ might be printed in JSON form (after formatting) like 
> this:
>                  "locations": [
>                      @{
>                          "caret": @{
> +                           "display-column": 5,
> +                           "byte-column": 5,
>                              "column": 5,
>                              "file": "misleading-indentation.c",
>                              "line": 17

Likewise.

[...snip...]

> diff --git a/gcc/opts.c b/gcc/opts.c
> index 340d99434b3..525f44d079f 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "opt-suggestions.h"
>  #include "diagnostic-color.h"
>  #include "selftest.h"
> +#include "cpplib.h"

Is this new #include still needed?

[...snip...]


OK for trunk with those nits fixed.

Dave


Reply via email to