expanding TABs in indentation: the aftermath can be easy

Jim Meyering Fri, 11 Dec 2009 04:57:51 -0800

Jim Meyering wrote:
> Bruno Haible wrote:
>> What should I write in the NEWS file, about recommendations for people
>> who have patches on top of gnulib?
>
> We also need a way to keep things in order going forward.
> I.e., a syntax-check style rule that enforces this style.
>
> To that end, please prepare a file like the one below,
> to be committed along with your other changes,
> or as part of a subsequent change that enforces policy.
> I started based on your earlier outline.
>
> These are extended regular expressions that match
> any file that must retain TAB-based indentation.
> For now, let's not worry about TABs elsewhere.
> --------------------------
> # These contain Makefile snippets.
> ^modules/
>
> # The regex module is the only major source code for which we still
> # have bidirectional propagation between gnulib and glibc.
> ^lib/regcomp\.c$
> ^lib/regex\.[ch]$
> ^lib/regex_internal\.[ch]$
> ^lib/regexec\.c$
>
> # This is special.
> ^lib/.*\.charset$
>
> # This is a binary file.
> ^lib/.*\.class$
> --------------------------
>
>> What are the tricks?
>
> I'll try to post details tomorrow.
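
A syntax-check along the lines quoted above could be sketched roughly as
follows.  This is only an illustration, not an existing gnulib rule: the
function name leading_tab_files is hypothetical, and the exemption file is
assumed to hold extended regular expressions, one per line, with optional
"#" comments and blank lines as in the quoted sample.

```shell
# Sketch: read file names on stdin and print those that still have a TAB
# in their indentation, skipping names matched by the exemption file.
# Hypothetical usage: git ls-files | leading_tab_files leading-blank.exempt
leading_tab_files ()
{
  exempt=$1
  tab=$(printf '\t')

  # Strip comments and blank lines first: an empty pattern would match
  # every file name and thereby exempt everything.
  pats=$(mktemp) || return 1
  grep -v -e '^#' -e '^$' "$exempt" > "$pats"

  # Keep only non-exempt names, then report each file that contains a
  # line starting with optional spaces followed by a TAB.
  grep -E -v -f "$pats" | while IFS= read -r f; do
    grep -l "^ *$tab" "$f"
  done

  rm -f "$pats"
}
```

The "^ *TAB" test is the same notion of "leading TAB" that the script
below uses when deciding which patch lines to rewrite.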


The first part is the "patch-xform" script below.
I'll put it in gnulib's build-aux soon.

For example, I've just used it in coreutils-with-latest-gnulib
to confirm that it can transform the two gl/lib/*.diff files that
no longer apply:

  cd coreutils/gl/lib &&
  for i in c h; do f=tempname.$i.diff; patch-xform $f > k && mv k $f; done

#!/usr/bin/perl
# Expand leading TABs in the context and modified lines of git unidiff patches.
# If --exclude=FILE is specified, do not modify the patches of any file whose
# name matches any of the perl regular expressions (one per line) in that file.
# The regular expressions are matched against each full, relative file name, as
# found in git unidiff headers, but without the typical "a/", "b/", etc. prefix.
# Here is a useful set of regular expressions:
#
# (?:^|\/)ChangeLog[^/]*$
# (?:^|\/)(?:GNU)?[Mm]akefile[^/]*$
# \.(?:am|mk)$
#
# Only lines to consider:
#
#   /^[ +-]/  modified and context lines, when in a diff
#
#   /^diff --git/  this is a git diff: ignore a/ and b/ file name prefix
#   /^--- (.*)/    use the file name in $1
#   /^\+\+\+ /     ignore
#
# Currently makes no attempt to detect the end of the final patch,
# so it may convert TABs to spaces on anything there that resembles
# a unidiff-context/modified line.

use strict;
use warnings;
use Text::Tabs;
use Getopt::Long;

(my $ME = $0) =~ s|.*/||;
my $VERSION = '0.1';
my $verbose;

sub usage ($)
{
  my ($exit_code) = @_;
  my $STREAM = ($exit_code == 0 ? *STDOUT : *STDERR);
  if ($exit_code != 0)
    {
      print $STREAM "Try `$ME --help' for more information.\n";
    }
  else
    {
      my $example_regexp = <<\EOF;
(?:^|\/)ChangeLog[^/]*$
(?:^|\/)(?:GNU)?[Mm]akefile[^/]*$
\.(?:am|mk)$
EOF
      print $STREAM <<EOF;
Usage: $ME [OPTIONS] [FILE]
Filter FILE (containing git unidiff output), expanding leading TABs
in the context and modified lines.

OPTIONS:

   --exclude=RE_FILE  if RE_FILE is specified, do not modify the patches of
                        any file whose name matches any of the perl regular
                        expressions (one per line) in that file.
   --help             display this help and exit
   --version          output version information and exit

With no FILE, or when FILE is -, read standard input.

Sample content for a RE_FILE:

$example_regexp
Be sure to exclude any binary files, e.g., .jpg, .pdf, etc. too.
EOF
    }
  exit $exit_code;
}

sub build_regexp ($)
{
  my ($file) = @_;

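  # Read regexps from $file, one per line, skipping blank lines and
  # '#' comments (an empty alternative would match every file name),
  # then 'OR' them together and wrap the result in (?:...).
  open my $in, '<', $file
    or die "$ME: $file: cannot open for reading: $!\n";
  my @lines = grep { !/^\s*(?:#.*)?$/ } <$in>;
  close $in;
  chomp @lines;
  # With no usable regexp, exempt nothing.
  @lines
    or return undef;
  my $re = join '|', @lines;
  return "(?:$re)";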
}

{
  my $exclude_regexp_file;
  GetOptions
    (
     'exclude=s' => \$exclude_regexp_file,
     help => sub { usage 0 },
     verbose => \$verbose,
     version => sub { print "$ME version $VERSION\n"; exit },
    ) or usage 1;
  my $exempt_file_re;

  defined $exclude_regexp_file
    and $exempt_file_re = build_regexp $exclude_regexp_file;

  my $xform_tabs;
  while (defined (my $line = <>))
    {
      my $xformed;
      if ($line =~ /^--- [a-z]\/(.*)/) # use the file name in $1
        {
          my $file_name = $1;
          $xform_tabs = (defined $exempt_file_re
                         ? $file_name !~ /$exempt_file_re/o
                         : 1);
          $verbose
            and warn "info: $file_name: " . ($xform_tabs ? 1 : 0) . "\n";
        }
      elsif ($line
               =~ /^(?:\@\@[ ]
                    |(copy|rename)[ ]
                    |[ ]\d{6}$
                    |diff[ ]--git[ ]
                    |index[ ]
                    )
                  /x)
        {
          # ignore
        }
      elsif ($line =~ /^(?:$|[ +-])/)
        {
          $verbose
            and warn "info: $.\n";
          # Process or not, depending on name.
          if ($xform_tabs)
            {
              $verbose
                and warn "info: $line\n";
              my $match = $line =~ /^([ +-])( *\t[ \t]*)(.*)/;
              print $match ? $1 . expand($2) . $3 . "\n" : $line;
              $xformed = 1;
              $verbose && $match
                and warn "info: MATCHED!\n";
            }
        }
      else
        {
          # warn "$ME: unrecognized line: $line\n";
          $xform_tabs = 0;
        }

      ! $xformed
        and print $line;
    }
}

END { # use File::Coda; # http://meyering.net/code/Coda/
  defined fileno STDOUT or return;
  close STDOUT and return;
  warn "$ME: failed to close standard output: $!\n";
  $? ||= 1;
}

# Local variables:
#  indent-tabs-mode: nil
# End:

You can do the same thing to a topic branch in git.
Here is pseudo-texinfo:

Let's assume that just after transforming @samp{master},
you tagged the result with @samp{tab}
and the changes you want to rebase are on the @samp{topic} branch.
With that, you would run these commands to rebase that branch:

@example
git checkout topic                                          [1]
git rebase tab^                                             [2]
git format-patch --stdout master \
  | patch-xform --exclude=leading-blank.exempt \
  > topic.xformed                                           [3]

git checkout -b topic2 tab                                  [4]
git am topic.xformed                                        [5]

git diff --ignore-space-change topic topic2                 [6]

git branch -D topic                                         [7]
git branch -m topic2 topic                                  [8]
git rebase master                                           [9]
@end example

Step 1 ensures that @samp{topic} is the current branch, which [2]
rebases to @samp{tab^}, the change-set just before the problematic one.
The third step prints the patch series on @samp{topic}, filters it through
our patch-transforming script and saves the result in a temporary file.
Step 4 creates and makes current our temporary branch, @samp{topic2},
with its base at @samp{tab}, and [5] then applies the transformed
patch set to that new branch.
[6] is an optional cross-check to ensure that the only differences
between the two branches are safely ignorable.
Steps 7 and 8 clean up by removing the original @samp{topic} branch
and replacing it with the temporary one.
Finally, step 9 rebases our new branch to @samp{master}.

We can perform the same task more efficiently and concisely,
with the advantage of no temporary file, but perhaps at the
expense of readability, depending on your familiarity with
these @command{git} commands.  You be the judge:

@example
git rebase tab^ topic                                       [a]
git checkout -b topic2 tab                                  [b]
git format-patch --stdout master..topic \
  | patch-xform --exclude=leading-blank.exempt \
  | git am                                                  [c]
git diff --ignore-space-change topic topic2                 [d]
git branch -D topic                                         [e]
git branch -m topic2 topic                                  [f]
git rebase master                                           [g]
@end example

Step [a] combines [1] and [2], since there is no need to change the
current branch.
Since [c]'s use of @samp{git am} will modify the current branch
(contrast with [3], which just writes a temporary file),
step [b] must first create and switch to the destination branch, @samp{topic2}.
Step [c] forms the patch series for everything on the @samp{topic} branch,
filters it through our @command{patch-xform} script, and applies the
result to the current branch via @command{git am}.
The remaining steps are identical to 6@dots{}9.

However, none of the above qualifies as ``easy enough''
for most people.  There are too many variables and interdependencies.
Note that [a] and [g] may evoke merge conflicts, so they delineate
the non-interactive core: [b]@dots{}[f].

Even for so few steps, there are four inputs:
@itemize
@item @var{P} parent branch name [master]
@item @var{T} tag marking the transition point on @var{P} [tab]
@item @var{B} name of branch to move [topic] (forked off of @var{P}
  prior to @var{T})
@item file name blacklist: [leading-blank.exempt]
@end itemize

@c note that the list of branch names from "git branch --contains @var{T}"
@c must include @var{P}
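
With those four inputs, steps [a]@dots{}[g] could be wrapped up roughly as
in the sketch below.  This is an illustration under stated assumptions, not
a tested tool: the function name move_branch, the PATCH_XFORM override and
the temporary branch name B.tmp (standing in for topic2) are all
hypothetical, and patch-xform is assumed to be on your PATH.

```shell
# Sketch: move branch B across the whitespace-only commit tagged T on P.
# Hypothetical usage: move_branch master tab topic leading-blank.exempt
move_branch ()
{
  p=$1 t=$2 b=$3 exempt=$4
  xform=${PATCH_XFORM:-patch-xform}   # assumed to be on PATH
  tmp=$b.tmp                          # plays the role of topic2

  # T must be on P (cf. "git branch --contains T").
  git merge-base --is-ancestor "$t" "$p" \
    || { echo "move_branch: $t is not on $p" >&2; return 1; }

  git rebase "$t^" "$b" || return 1          # [a] may stop on conflicts
  git checkout -b "$tmp" "$t" || return 1    # [b]
  git format-patch --stdout "$p..$b" \
    | "$xform" --exclude="$exempt" \
    | git am || return 1                     # [c]

  # [d] Refuse to continue unless the branches differ only in whitespace.
  if [ -n "$(git diff --ignore-space-change "$b" "$tmp")" ]; then
    echo "move_branch: $b and $tmp differ in more than whitespace" >&2
    return 1
  fi

  git branch -D "$b" || return 1             # [e]
  git branch -m "$tmp" "$b" || return 1      # [f]
  git rebase "$p"                            # [g] may stop on conflicts
}
```

Here the optional cross-check [d] becomes a hard stop, since the next step
deletes the original branch.  If [a] or [g] stops on a conflict, you still
have to resolve it by hand, as before.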

You can also think of the type of transformation as an input:
trailing-blank-removal or leading-TAB-to-space, or even both.
If you make that the fifth input, verify that @var{T} contains
only changes implied by this type.
Actually, there's an even better way:
automatically derive the type from @var{T}'s change set.
If this command prints no changes, then @var{T} is a trailing-blank-removal
delta:

@example
git diff --ignore-space-at-eol T^..T
@end example

Otherwise, if @var{T}'s delta transforms @kbd{TAB}s to spaces in indentation,
this command will print no diffs:

@example
git diff --ignore-space-change T^..T
@end example
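
Those two checks could be combined roughly as in this sketch.  The function
name classify_delta and the labels it prints are hypothetical.  Note that
an empty --ignore-space-change diff only shows that the delta changes
nothing but amounts of whitespace: that is consistent with a TAB-to-space
conversion, but does not prove it.

```shell
# Sketch: guess the type of the whitespace-only commit T from which
# "ignore" option makes its diff vanish.
# Hypothetical usage: classify_delta tab
classify_delta ()
{
  t=$1

  # A bad ref would yield empty diff output and be misclassified below,
  # so require that T has a parent we can resolve.
  git rev-parse --verify --quiet "$t^" > /dev/null || return 2

  if [ -z "$(git diff --ignore-space-at-eol "$t^..$t")" ]; then
    echo trailing-blank-removal
  elif [ -z "$(git diff --ignore-space-change "$t^..$t")" ]; then
    # Necessary, not sufficient: only amounts of whitespace changed.
    echo leading-TAB-to-space
  else
    echo other
  fi
}
```

The order of the tests matters: a trailing-blank-removal delta also passes
the --ignore-space-change test, so it has to be recognized first.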
