Re: [PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Derrick Stolee

On 9/13/2018 1:40 PM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:

+   then
+   line_num=$(($line_num + 1))
+   fi
+   fi
+   done

I have a feeling that a single Perl script instead of a shell loop
that runs many grep and awk as subprocesses performs better even on
Windows, and it would be more readable and maintainable.

perl -e '
my $line_num;
while (<>) {
# Hunk header?  Grab the beginning in postimage.
if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
$line_num = $1;
next;
}

# Have we seen a hunk?  Ignore "diff --git" etc.
next unless defined $line_num;

# Deleted line? Ignore.
if (/^-/) {
next;
}

# Show only the line number of added lines.
if (/^\+/) {
print "$line_num\n";
}
# Either common context or added line appear in
# the postimage.  Count it.
$line_num++;
}
'

or something like that, given that you seem to only need line
numbers in new_lines.txt anyway?


Thanks for the deep dive here, especially with the perl assistance. I've 
never written any perl, but it seems like the right tool here. I'll have 
time to revisit this next week.


Thanks,

-Stolee



Re: [PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Eric Sunshine
On Thu, Sep 13, 2018 at 10:56 AM Derrick Stolee via GitGitGadget
 wrote:
> There have been a few bugs in recent patches what would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.

The bit about die() blocks is perhaps a bit too general. While it's
true that some die()'s signal very unlikely (or near-impossible)
conditions, others are merely reporting invalid user or other input to
the program. The latter category is often very much worth testing, as
the number of test_must_fail() invocations in the test suite shows.
68a6b3a1bd (worktree: teach 'move' to override lock when --force given
twice, 2018-08-28), which was highlighted in your cover letter,
provides a good example of legitimately testing that a die() is
covered. So, perhaps the above can be toned-down a bit by saying
something like:

...but die() blocks covering very unlikely (or near-impossible)
situations may not warrant coverage.

> It is important to not measure the coverage of the codebase by what old code
> is not covered.


Re: [PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> From: Derrick Stolee 
>
> We have coverage targets in our Makefile for using gcov to display line
> coverage based on our test suite. The way I like to do it is to run:
>
> make coverage-test
> make coverage-report
>
> This leaves the repo in a state where every X.c file that was covered has
> an X.c.gcov file containing the coverage counts for every line, and "#"
> at every uncovered line.
>
> There have been a few bugs in recent patches what would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.
>
> It is important to not measure the coverage of the codebase by what old code
> is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
> After creating the coverage statistics at a version (say, 'topic') you can
> then run
>
> contrib/coverage-diff.sh base topic
>
> to see the lines added between 'base' and 'topic' that are not covered by the
> test suite. The output uses 'git blame -c' format so you can find the commits
> responsible and view the line numbers for quick access to the context.
>
> Signed-off-by: Derrick Stolee 
> ---
>  contrib/coverage-diff.sh | 63 
>  1 file changed, 63 insertions(+)
>  create mode 100755 contrib/coverage-diff.sh
>
> diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
> new file mode 100755
> index 00..0f226f038c
> --- /dev/null
> +++ b/contrib/coverage-diff.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +
> +# Usage: 'contrib/coverage-diff.sh  
> +# Outputs a list of new lines in version2 compared to version1 that are
> +# not covered by the test suite. Assumes you ran
> +# 'make coverage-test coverage-report' from root first, so .gcov files exist.

I presume that it is "from root first while you have a checkout of
version2, so .gcov files for version2 exist in the working tree"?

> +V1=$1
> +V2=$2
> +
> +diff_lines () {
> + while read line
> + do
> + if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? 
> \\+([0-9]+)(,[0-9]+)? @@.*"

As you know you are reading from diff output, you do not have to be
so strict in this if condition.  It's not like you try to carefully
reject a line that began with "@@" but did not match this pattern in
the corresponding else block.

"read line" does funny things to backslashes and whitespaces in the
payload ("read -r line" sometimes helps), and echoing $line without
quoting will destroy its contents here and in the line below (but
you do not care because all you care about is if it begins with a
dash).

> + then
> + line_num=$(echo $line \
> + | awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? 
> \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
> + else
> + echo "$line_num:$line"
> + if ! echo $line | grep -q -e "^-"

If a patch removes a line with a solitary 'n' on it, possibly
followed by a SP and something else, such a line would say "-n
something else", and some implementation of "echo -n something else"
would do what this line does not expect.  A safer way to do this,
as you only care if it is a deletion line, is to do

case "$line" in -*) ;; *) line_num=$(( $line_num + 1 ));; esac

Also you can make the echoing of "$line_num:$line" above
conditional, which will allow you to lose "grep ':+'" in the
pipeline that consumes output from this thing, e.g.

case "$line" in +*) echo "$line_num:$line";; esac

iff you must write this in shell (but see below).

> + then
> + line_num=$(($line_num + 1))
> + fi
> + fi
> + done

I have a feeling that a single Perl script instead of a shell loop
that runs many grep and awk as subprocesses performs better even on
Windows, and it would be more readable and maintainable.

perl -e '
my $line_num;
while (<>) {
# Hunk header?  Grab the beginning in postimage.
if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
$line_num = $1;
next;
}

# Have we seen a hunk?  Ignore "diff --git" etc.
next unless defined $line_num;

# Deleted line? Ignore.
if (/^-/) {
next;
}

# Show only the line number of added lines.
if (/^\+/) {
print "$line_num\n";
}
# Either common context or added line appear in
# the postimage.  Count it.
$line_num++;
}
'

or something like that, given that you seem to only need line

[PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#"
at every uncovered line.

There have been a few bugs in recent patches what would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -c' format so you can find the commits
responsible and view the line numbers for quick access to the context.

Signed-off-by: Derrick Stolee 
---
 contrib/coverage-diff.sh | 63 
 1 file changed, 63 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 00..0f226f038c
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+# Usage: 'contrib/coverage-diff.sh  
+# Outputs a list of new lines in version2 compared to version1 that are
+# not covered by the test suite. Assumes you ran
+# 'make coverage-test coverage-report' from root first, so .gcov files exist.
+
+V1=$1
+V2=$2
+
+diff_lines () {
+   while read line
+   do
+   if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? 
\\+([0-9]+)(,[0-9]+)? @@.*"
+   then
+   line_num=$(echo $line \
+   | awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? 
\\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
+   else
+   echo "$line_num:$line"
+   if ! echo $line | grep -q -e "^-"
+   then
+   line_num=$(($line_num + 1))
+   fi
+   fi
+   done
+}
+
+files=$(git diff --raw $V1 $V2 \
+   | grep \.c$ \
+   | awk 'NF>1{print $NF}')
+
+for file in $files
+do
+   git diff $V1 $V2 -- $file \
+   | diff_lines \
+   | grep ":+" \
+   | sed 's/:/ /g' \
+   | awk '{print $1}' \
+   | sort \
+   >new_lines.txt
+
+   hash_file=$(echo $file | sed "s/\//\#/")
+   cat "$hash_file.gcov" \
+   | grep \#\#\#\#\# \
+   | sed 's/#: //g' \
+   | sed 's/\:/ /g' \
+   | awk '{print $1}' \
+   | sort \
+   >uncovered_lines.txt
+
+   comm -12 uncovered_lines.txt new_lines.txt \
+   | sed -e 's/$/\)/' \
+   | sed -e 's/^/\t/' \
+   >uncovered_new_lines.txt
+
+   grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
+   echo $file && \
+   git blame -c $file \
+   | grep -f uncovered_new_lines.txt
+
+   rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
+done
+
-- 
gitgitgadget