[PATCH 2/2] sha1-lookup: fix handling of duplicates in sha1_pos()

2014-10-01 Thread René Scharfe
If the first 18 bytes of the SHA1's of all entries are the same then
sha1_pos() dies and reports that the lower and upper limits of the
binary search were the same and that this wasn't supposed to happen.  This
is wrong because the remaining two bytes could still differ.

Furthermore: It wouldn't be a problem if they actually were the same,
i.e. if all entries have the same SHA1.  The code already handles
duplicates just fine otherwise.  Simply remove the erroneous check.

Signed-off-by: Rene Scharfe 
---
 sha1-lookup.c         |  2 --
 t/t0064-sha1-array.sh | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/sha1-lookup.c b/sha1-lookup.c
index 2dd8515..5f06921 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -84,8 +84,6 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
 				die("BUG: assertion failed in binary search");
 			}
 		}
-		if (18 <= ofs)
-			die("cannot happen -- lo and hi are identical");
 	}
 
 	do {
diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh
index bd68789..3fcb8d8 100755
--- a/t/t0064-sha1-array.sh
+++ b/t/t0064-sha1-array.sh
@@ -61,4 +61,24 @@ test_expect_success 'lookup with duplicates' '
test "$n" -le 3
 '
 
+test_expect_success 'lookup with almost duplicate values' '
+   {
+   echo "append " &&
+   echo "append 555f" &&
+   echo20 "lookup " 55
+   } | test-sha1-array >actual &&
+   n=$(cat actual) &&
+   test "$n" -eq 0
+'
+
+test_expect_success 'lookup with single duplicate value' '
+   {
+   echo20 "append " 55 55 &&
+   echo20 "lookup " 55
+   } | test-sha1-array >actual &&
+   n=$(cat actual) &&
+   test "$n" -ge 0 &&
+   test "$n" -le 1
+'
+
 test_done
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] sha1-lookup: fix handling of duplicates in sha1_pos()

2014-10-01 Thread Jeff King
On Wed, Oct 01, 2014 at 11:43:21AM +0200, René Scharfe wrote:

> If the first 18 bytes of the SHA1's of all entries are the same then
> sha1_pos() dies and reports that the lower and upper limits of the
> binary search were the same and that this wasn't supposed to happen.  This
> is wrong because the remaining two bytes could still differ.
> 
> Furthermore: It wouldn't be a problem if they actually were the same,
> i.e. if all entries have the same SHA1.  The code already handles
> duplicates just fine otherwise.  Simply remove the erroneous check.

Yeah, I agree that assertion is just wrong.
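
To make the failure mode concrete, here is a simplified sketch. It is not
the actual sha1_pos() code: the function names, the flat 20-byte layout and
the 2-byte stepping are assumptions for illustration only. It scans the
first 18 bytes of two entries and shows that reaching offset 18 says
nothing about bytes 18 and 19:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

/* Read two bytes at p as one big-endian number (illustrative helper). */
static unsigned take2(const unsigned char *p)
{
	return ((unsigned)p[0] << 8) | p[1];
}

/*
 * Return the first even offset below 18 at which the 2-byte chunks of
 * lo and hi differ, or 18 if the first 18 bytes are all equal.
 */
static size_t first_diff_ofs(const unsigned char *lo, const unsigned char *hi)
{
	size_t ofs;

	assert(lo && hi);
	for (ofs = 0; ofs < 18; ofs += 2)
		if (take2(lo + ofs) != take2(hi + ofs))
			break;
	return ofs;
}
```

With one entry made of twenty 0x55 bytes and another that differs only in
its last byte, first_diff_ofs() returns 18 even though memcmp() over all
20 bytes reports a difference, so "18 <= ofs" cannot be read as "lo and hi
are identical".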

Regarding duplicates: in sha1_entry_pos, we had to handle the "not
found" case specially, because we may have found the left-hand or
right-hand side of a run of duplicates, and we want to return the
correct slot where the new item would go (see the comment added by
171bdac). I think we don't have to deal with that here, because we are
just dealing with the initial "mi" selection. The actual binary search
is plain-vanilla, which handles that case just fine.
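
As a rough sketch of what "plain-vanilla" means here (an illustration
under assumptions, not the code in sha1-lookup.c): the table is taken to
be a flat, sorted array of 20-byte entries, and plain_pos() and
fill_entry() are hypothetical names. A miss is reported as
-1 - (insertion index), which is the convention the -1 and -5 values in
the tests below rely on:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

/* Hypothetical helper: fill one entry with a repeated byte, like echo20. */
static void fill_entry(unsigned char *entry, unsigned char byte)
{
	memset(entry, byte, HASH_LEN);
}

/*
 * Plain binary search over a sorted, flat table of 20-byte entries.
 * Returns the index of a matching entry, or -1 - (insertion index)
 * if the key is absent. Duplicates need no special handling: any
 * matching slot is a valid answer, and on a miss lo ends up at the
 * slot where the key would be inserted.
 */
static int plain_pos(const unsigned char *key, const unsigned char *table,
		     size_t nr)
{
	size_t lo = 0, hi = nr;

	assert(table || !nr);
	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;
		int cmp = memcmp(table + mi * HASH_LEN, key, HASH_LEN);

		if (!cmp)
			return (int)mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1 - (int)lo;
}
```

For the sorted table 44 44 55 55 88 88 aa aa (each byte repeated 20
times), looking up 66 narrows lo to index 4 and yields -5, while looking
up 55 returns one of the two matching slots.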

I wonder if it is worth adding a test (you test only that "not found"
produces a negative index, but not which index). Like:

diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh
index 3fcb8d8..7781129 100755
--- a/t/t0064-sha1-array.sh
+++ b/t/t0064-sha1-array.sh
@@ -42,12 +42,12 @@ test_expect_success 'lookup' '
 '
 
 test_expect_success 'lookup non-existing entry' '
+   echo -1 >expect &&
{
echo20 "append " 88 44 aa 55 &&
echo20 "lookup " 33
} | test-sha1-array >actual &&
-   n=$(cat actual) &&
-   test "$n" -lt 0
+   test_cmp expect actual
 '
 
 test_expect_success 'lookup with duplicates' '
@@ -61,6 +61,17 @@ test_expect_success 'lookup with duplicates' '
test "$n" -le 3
 '
 
+test_expect_success 'lookup non-existing entry with duplicates' '
+   echo -5 >expect &&
+   {
+   echo20 "append " 88 44 aa 55 &&
+   echo20 "append " 88 44 aa 55 &&
+   echo20 "lookup " 66
+   } | test-sha1-array >actual &&
+   test_cmp expect actual
+'
+
+
 test_expect_success 'lookup with almost duplicate values' '
{
echo "append " &&


Re: [PATCH 2/2] sha1-lookup: fix handling of duplicates in sha1_pos()

2014-10-01 Thread René Scharfe

On 01.10.2014 at 12:50, Jeff King wrote:
> On Wed, Oct 01, 2014 at 11:43:21AM +0200, René Scharfe wrote:
>
> > If the first 18 bytes of the SHA1's of all entries are the same then
> > sha1_pos() dies and reports that the lower and upper limits of the
> > binary search were the same and that this wasn't supposed to happen.  This
> > is wrong because the remaining two bytes could still differ.
> >
> > Furthermore: It wouldn't be a problem if they actually were the same,
> > i.e. if all entries have the same SHA1.  The code already handles
> > duplicates just fine otherwise.  Simply remove the erroneous check.
>
> Yeah, I agree that assertion is just wrong.
>
> Regarding duplicates: in sha1_entry_pos, we had to handle the "not
> found" case specially, because we may have found the left-hand or
> right-hand side of a run of duplicates, and we want to return the
> correct slot where the new item would go (see the comment added by
> 171bdac). I think we don't have to deal with that here, because we are
> just dealing with the initial "mi" selection. The actual binary search
> is plain-vanilla, which handles that case just fine.
>
> I wonder if it is worth adding a test (you test only that "not found"
> produces a negative index, but not which index). Like:


api-sha1-array.txt says about sha1_array_lookup: "If not found, returns 
a negative integer", and that's what the test checks.


I actually like that the value is not specified for that case because no 
existing caller actually uses it and it leaves room to implement the 
function e.g. using bsearch(3).
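
A minimal sketch of such a bsearch(3)-based lookup, assuming a flat,
sorted table of 20-byte entries; lookup_bsearch() and cmp_hash() are
hypothetical names, not existing git functions. bsearch() only returns
NULL on a miss, so the sketch can report "some negative integer" but not
an insertion position:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20

/* bsearch() passes the key first and a table element second. */
static int cmp_hash(const void *key, const void *entry)
{
	return memcmp(key, entry, HASH_LEN);
}

/*
 * Return the index of an entry equal to key, or a negative integer
 * if there is none. With duplicates, bsearch() may return any of the
 * matching entries.
 */
static int lookup_bsearch(const unsigned char *key,
			  const unsigned char *table, size_t nr)
{
	const unsigned char *hit;

	assert(table || !nr);
	if (!nr)
		return -1;
	hit = bsearch(key, table, nr, HASH_LEN, cmp_hash);
	if (!hit)
		return -1;
	return (int)((hit - table) / HASH_LEN);
}
```

A test that only checks for a negative result passes with this variant
as well, whereas a test comparing against an exact value such as -5
would not.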


I agree that adding a "lookup non-existing entry with duplicates" test 
would make t0064 more complete, though.



> diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh
> index 3fcb8d8..7781129 100755
> --- a/t/t0064-sha1-array.sh
> +++ b/t/t0064-sha1-array.sh
> @@ -42,12 +42,12 @@ test_expect_success 'lookup' '
>  '
>
>  test_expect_success 'lookup non-existing entry' '
> +   echo -1 >expect &&
> {
> echo20 "append " 88 44 aa 55 &&
> echo20 "lookup " 33
> } | test-sha1-array >actual &&
> -   n=$(cat actual) &&
> -   test "$n" -lt 0
> +   test_cmp expect actual
>  '
>
>  test_expect_success 'lookup with duplicates' '
> @@ -61,6 +61,17 @@ test_expect_success 'lookup with duplicates' '
> test "$n" -le 3
>  '
>
> +test_expect_success 'lookup non-existing entry with duplicates' '
> +   echo -5 >expect &&
> +   {
> +   echo20 "append " 88 44 aa 55 &&
> +   echo20 "append " 88 44 aa 55 &&
> +   echo20 "lookup " 66
> +   } | test-sha1-array >actual &&
> +   test_cmp expect actual
> +'
> +
>
>  test_expect_success 'lookup with almost duplicate values' '
> {
> echo "append " &&




Re: [PATCH 2/2] sha1-lookup: fix handling of duplicates in sha1_pos()

2014-10-01 Thread Jeff King
On Wed, Oct 01, 2014 at 01:10:12PM +0200, René Scharfe wrote:

> >I wonder if it is worth adding a test (you test only that "not found"
> >produces a negative index, but not which index). Like:
> 
> api-sha1-array.txt says about sha1_array_lookup: "If not found, returns a
> negative integer", and that's what the test checks.

Hmm. I do not recall intentionally leaving the value unspecified; I
think it is more that I was simply not thorough when writing the
documentation. That being said...

> I actually like that the value is not specified for that case because no
> existing caller actually uses it and it leaves room to implement the
> function e.g. using bsearch(3).

Yeah, if no callers actually care right now, that is a reasonable
argument for leaving the exact return value unspecified (and testing
only what the documentation claims).

> I agree that adding a "lookup non-existing entry with duplicates" test would
> make t0064 more complete, though.

Agreed.

-Peff


Re: [PATCH 2/2] sha1-lookup: fix handling of duplicates in sha1_pos()

2014-10-01 Thread Eric Sunshine
On Wed, Oct 1, 2014 at 5:43 AM, René Scharfe wrote:
> If the first 18 bytes of the SHA1's of all entries are the same then
> sha1_pos() dies and reports that the lower and upper limits of the
> binary search were the same and that this wasn't supposed to happen.  This
> is wrong because the remaining two bytes could still differ.
>
> Furthermore: It wouldn't be a problem if they actually were the same,
> i.e. if all entries have the same SHA1.  The code already handles
> duplicates just fine otherwise.  Simply remove the erroneous check.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1-lookup.c         |  2 --
>  t/t0064-sha1-array.sh | 20 ++++++++++++++++++++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/sha1-lookup.c b/sha1-lookup.c
> index 2dd8515..5f06921 100644
> --- a/sha1-lookup.c
> +++ b/sha1-lookup.c
> @@ -84,8 +84,6 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
>  				die("BUG: assertion failed in binary search");
>  			}
>  		}
> -		if (18 <= ofs)
> -			die("cannot happen -- lo and hi are identical");
>  	}
>
>  	do {
> diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh
> index bd68789..3fcb8d8 100755
> --- a/t/t0064-sha1-array.sh
> +++ b/t/t0064-sha1-array.sh
> @@ -61,4 +61,24 @@ test_expect_success 'lookup with duplicates' '
> test "$n" -le 3
>  '
>
> +test_expect_success 'lookup with almost duplicate values' '
> +   {
> +   echo "append " &&
> +   echo "append 555f" &&
> +   echo20 "lookup " 55
> +   } | test-sha1-array >actual &&
> +   n=$(cat actual) &&
> +   test "$n" -eq 0
> +'
> +
> +test_expect_success 'lookup with single duplicate value' '
> +   {
> +   echo20 "append " 55 55 &&
> +   echo20 "lookup " 55
> +   } | test-sha1-array >actual &&
> +   n=$(cat actual) &&
> +   test "$n" -ge 0 &&
> +   test "$n" -le 1
> +'

An alternative would be to introduce these two tests in patch 1/2 as
test_expect_failure and flip them to test_expect_success in this patch
which fixes the problem.

> +
>  test_done
> --
> 2.1.2