Re: Consequences of CRLF in index?

2017-10-25 Thread Junio C Hamano
Stefan Beller  writes:

>> diff --git a/diff.c b/diff.c
>> index 6fd288420b..eeca0fd3b8 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>>
>> if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>> DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
>> -   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
>> +   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
>> +   DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
>
> This highlights another part of the flag macros, that could be made nicer.
> All these flags combined are XDF_WHITESPACE_FLAGS, so this
> if could be written without the macros as
>
>   if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

Yes, and I think the codepath that matters most already uses that
form.  Perhaps it is a good idea to do the rewrite without a macro
(XDF_WHITESPACE_FLAGS is already a macro enough).

> (1<<5) is taken twice now.

Good eyes.  I think we use bits #1-#8 now (bit #0 is vacant, so are
#9-#31).

>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 04d7b32e4e..8720e272b9 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long 
>> flags)
>> return (i == size);
>>  }
>>
>> +/*
>> + * Have we eaten everything on the line, except for an optional
>> + * CR at the very end?
>> + */
>> +static int ends_with_optional_cr(const char *l, long s, long i)
>> +{
>> +   if (s && l[s-1] == '\n')
>> +   s--;
>
> so first we cut off the '\n',

> That seems correct after some thought;

I added the "trim the LF at the end" to the beginning of the
function at the last minute to cheat, and it is debatable if it is
entirely correct on an incomplete line.

The byte at the end of line, i.e. l[s-1], could be either '\n' or
something else, and the latter is an incompete line at the end of
the file.  When we trimmed the LF and decremented s, CR at l[s-1]
is the CR in CRLF, which we do want to ignore.  If we didn't, then
what is CR sitting at l[s-1]?  It is a lone CR at the end of file,
not a part of CRLF.  Do we really want to ignore it?

If we take the name of the option "ignore-cr-at-eol" literally, yes,
it is a CR sitting at the end of a line, which happens to be an
incomplete one, so we do want to ignore.  But if we think about the
reason why we are adding the option (i.e. to help conversion between
CRLF and LF), it is somewhat iffy.  The lone CR at the end of file
cannot be something that came from CRLF<->LF conversion, and ignoring
it would hide possible problems in conversion or the original data.

> I might offer
> another easier to understand (for me) solution,
> ...
> Though this seems even more complicated
> after having it written down.

This happens to me quite often and my solution to it is to remove
the alternative I tried to formulate after convincing myself that it
is not that much of an improvement ;-).


Re: [PATCH 0/6] Create Git/Packet.pm

2017-10-25 Thread Junio C Hamano
Johannes Schindelin  writes:

> Note that the correct blib path starts with `C:\BuildAgent\_work` and
> the line
>
>   use lib (split(/:/, $ENV{GITPERLLIB}));
>
> splits off the drive letter from the rest of the path. Obviously, this
> fails to Do The Right Thing, and simply points to Yet Another Portability
> Problem with Git's reliance on Unix scripting.

In our C code, we have "#define PATH_SEP ';'", and encourage our
code to be careful and use it.  Is there something similar for Perl
scripts, I wonder.

I notice that t/{t0202,t9000,t9700}/test.pl share the same
split(/:/, $ENV{GITPERLLIB}); forcing this specific variable to use
the non-platform convention to accomodate the use of split(/:/)
certainly is a workaround, but it does feel dirty.

It is hard to imagine that we were the first people who wants to
split the value of a variable into a list, where the value is a list
of paths, concatenated into a single string with a delimiter that
may be platform specific.  I wonder if we are going against a best
practice established in the Perl world, simply because we don't know
about it (i.e. basically, it would say "don't split at a colon
because not all world is Unix; use $this_module instead", similar to
"don't split at a slash, use File::Spec instead to extract path
components").



Re: [PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects

2017-10-25 Thread Junio C Hamano
Jeff Hostetler  writes:

> Sorry, I meant a later commit in this patch series.  It is used by
> commits 4, 5, 6, and 10 to actually do the filtering and collect a
> list of omitted or missing objects.

I know you meant "later commits in the series" ;-).  

It does not change the fact that readers of 02/13 haven't seen them
yet to understand patch 02/13, if the changes that drove the design
of this step is in the same series or if they are not yet posted.

> I think of a "set" as a member? or not-member? class.
> I think of a "map" as a member? or not-member? class but where each
> member also has a value.  Sometimes map lookups just want to know
> membership and sometimes the lookup wants the value.
>
> Granted, having the key and value data stuffed into the same entry
> (from hashmap's point of view, rather than a key having a pointer
> to a value) does kind of blur the line, but I was thinking about
> a map here.  (And I was building on oidmap which builds on hashmap,
> so it seemed appropriate.)

My question was mostly about "if this is a map, then a caller that
queries the map with an oid does so because it wants to know the
data associated to the oid; if this is just a set, it is mostly
interested in the membership" and "I cannot quite tell which was
meant without the caller".  

It seems that some callers do care about the "path" name from your
response above, so calling this "map" sounds more appropriate.

The answer "it can be used to speed up 'is this path excluded?'
check" is a bit worrisome, though.  A blob can appear at more than
one path, and unless all the appearances of it are in an excluded
path, omitting the blob from the repository would lead to an aborted
"rev-list --objects" run, and this "map" can record at most one path
per each object; we need to wait until seeing the optimization code
to actually see how effectively this data helps optimization and
comment on the code ;-)

>>> +   len = ((pathname && *pathname) ? strlen(pathname) : 0);
>>> +   size = (offsetof(struct list_objects_filter_map_entry, pathname) + len 
>>> + 1);
>>> +   e = xcalloc(1, size);
>>> +
>>> +   oidcpy(>entry.oid, oid);
>>> +   e->type = type;
>>> +   if (pathname && *pathname)
>>> +   strcpy(e->pathname, pathname);
>>> +
>>> +   oidmap_put(map, e);
>>> +   return 0;
>>> +}
>>
>> The return value from the function needs to be documented in the
>> header to help callers.  It is not apparent why "we did already have
>> one" and "we now newly added" is interesting to the callers, for
>> example.  An obvious alternative implementation of this function
>> would return the pointer to an entry that records the object id
>> (i.e. either the one that was already there, or the one we created
>> because we saw this object for the first time), so that the caller
>> can do something interesting to it---again, because the reason why
>> we want this "filter map" is not explained at this stage, it is hard
>> to tell what that "sometehing interesting" would be.
>
> good point.  thanks.

I am more confused by the response ;-) But as we established that
this is a map (not a set that borrows the implementation of map),
where the data recorded in 'e' is quite useful to the caller, it
probably makes sense to make 'e' available to the caller?  It is
still unclear if the caller finds "it is the first time I saw the
object you gave me" vs "I've seen that object before already"
useful.

>>> +   for (k = 0; k < nr; k++)
>>> +   cb(k, nr, array[k], cb_data);
>>
>> Also it is not clear if you wanted to expose the type of the
>> entry to the callback function.
>
> The thought was that we would sort the OIDs so that things
> like rev-list could print the omitted/missing objects in OID
> order.  Not critical that we do it here, but I thought it would
> help callers.

I can foresee some callers would want sorted, while others do not.
I was primarily wondering why "my_cmp" is not a parameter that can
be NULL (in which case we do not sort at all).

>> An obvious alternative
>>
>>  fn([k].entry.oid, cb_data);
>>
>> would allow you to keep the type of map-entry private to the map,
>> and also the callback does not need to know about k or nr.
>> ...
> I included the {k, nr} so that the callback could dump header/trailer
> information when reporting the results or pre-allocate an array.
> I'll look at refactoring this -- I never quite liked how it turned
> out anyway -- especially with the oidmap simplifications.

And as we established that this is a map, where the data associated
with each oid is interesting to the caller, we do not want to hide
the type of array[] element by passing only [k].entry.oid, I
guess?

Thanks.


Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file

2017-10-25 Thread Junio C Hamano
Jeff Hostetler  writes:

> The existing code handles use cases where you want to read the
> exclusion list from a pathname in the worktree -- or from blob
> named in the index when the pathname is not populated (presumably
> because of the skip-worktree bit).
>
> I was wanting to add a more general case (and perhaps my commit
> message should be improved).  I want to be able to read it from
> a blob not necessarily associated with the current commit or
> not necessarily available on the local client, but yet known to
> exist.  

Oh, I understand the above two paragraphs perfectly well, and I
agree with you that such a helper to read from an arbitrary blob is
a worthy thing to have.  I was merely commenting on the fact that
such a helper that is meant to be able to handle more general cases
is not used to help the more specific case that we already have,
which was a bit curious.

I guess the reason why it is not done is (besides expediency)
because the model the new helper operates in would not fit well with
the existing logic flow, where everything is loaded into core
(either from the filesystem or from a blob) and then a common code
parses and registers; the helper wants to do the reading (only) from
the blob, the parsing and the registration all by itself, so there
is not much that can be shared even if the existing code wanted to
reuse what the helper offers.

The new helper mimicks the read_skip_worktree_file_from_index()
codepath to massage the data it reads from the blob to buf[] but not
really (e.g. even though it copies and pastes a lot, it forgets to
call skip_utf8_bom(), for example).  We may still want to see if we
can share more so that we do not have to worry about these tiny
differences between codepaths.

> With my "add_excludes_from_blob_to_list()", we can request a
> blob-ish expression, such as "master:enlistments/foo".  In my
> later commits associated with clone and fetch, we can use this
> mechanism to let the client ask the server to filter using the
> blob associated with this blob-ish.  If the client has the blob
> (such as during a later fetch) and can resolve it, then it can
> and send the server the OID, but it can also send the blob-ish
> to the server and let it resolve it.

Security-minded people may want to keep an eye or two open for these
later patches---extended SHA-1 expressions is a new attack surface
we would want to carefully polish and protect.


Re: [PATCH v2 1/1] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Junio C Hamano
Johannes Schindelin  writes:

> We meticulously pass the `exclude` flag to the `treat_directory()`
> function so that we can indicate that files in it are excluded rather
> than untracked when recursing.
>
> But we did not yet treat submodules the same way.
>
> Because of that, `git status --ignored --untracked` with a submodule
> `submodule` in a gitignored `tracked/` would show the submodule in the
> "Untracked files" section, e.g.
>
>   On branch master
>   Untracked files:
> (use "git add ..." to include in what will be committed)
>
>   tracked/submodule/
>
>   Ignored files:
> (use "git add -f ..." to include in what will be committed)
>
>   tracked/submodule/initial.t
>
> Instead, we would want it to show the submodule in the "Ignored files"
> section:

Makes sense.  Also listing the paths in the embedded working tree
like initial.t as if it were part of our project is utterly wrong,
especially because we are not doing any --recurse-submodules thing.

Both the change and the updated description looks good.  Thanks.


Re: [PATCH 2/2] diff.c: get rid of duplicate implementation

2017-10-25 Thread Junio C Hamano
Stefan Beller  writes:

> The implementations in diff.c to detect moved lines needs to compare
> strings and hash strings, which is implemented in that file, as well
> as in the xdiff library.
>
> Remove the rather recent implementation in diff.c and rely on the well
> exercised code in the xdiff lib.
> ...
>  static int moved_entry_cmp(const struct diff_options *diffopt,
>  const struct moved_entry *a,
>  const struct moved_entry *b,
>  const void *keydata)
>  {
> - const char *ap = a->es->line, *ae = a->es->line + a->es->len;
> - const char *bp = b->es->line, *be = b->es->line + b->es->len;
> - ...
> + return !xdiff_compare_lines(a->es->line, a->es->len,
> + b->es->line, b->es->len,
> + diffopt->xdl_opts);
>  }

OK, xdiff's xdl_recmatch() takes counted strings, and line[len] is
one byte beyond the end of the line (where LF typically sits), which
is the same convention as how "emitted_symbol" represents a line, so
not just the implementation replaced with a known-working one, but
also the code calls the helper with correct calling convention ;-)

> - ret->ent.hash = get_string_hash(l, o);
> + ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);

Likewise.  Looks good.

Will queue.


[PATCH] blame: add --fuzzy-lines command line option

2017-10-25 Thread Isabella Stephens
If the -L option is used to specify a line range in git blame, and the
end of the range is past the end of the file, git will fail with a fatal
error. It may instead be desirable to perform a git blame for the line
numbers in the intersection of the file and the specified line range.
This patch adds a --fuzzy-lines command line option to allow this.

Signed-off-by: Isabella Stephens 
---
 Documentation/blame-options.txt | 5 +
 builtin/blame.c | 4 
 t/t8003-blame-corner-cases.sh   | 6 ++
 3 files changed, 15 insertions(+)

diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
index dc41957af..664cd8f8b 100644
--- a/Documentation/blame-options.txt
+++ b/Documentation/blame-options.txt
@@ -19,6 +19,11 @@
 +
 include::line-range-format.txt[]
 
+--fuzzy-lines::
+   Use fuzzy line ranges. If a range specified with -L starts on a line
+   within the file but ends past the end of the file, display the blame
+   for the existing lines rather than failing.
+
 -l::
Show long rev (Default: off).
 
diff --git a/builtin/blame.c b/builtin/blame.c
index 67adaef4d..d25b39d40 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -661,6 +661,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
struct string_list range_list = STRING_LIST_INIT_NODUP;
int output_option = 0, opt = 0;
+   int fuzzy_lines = 0;
int show_stats = 0;
const char *revs_file = NULL;
const char *contents_from = NULL;
@@ -670,6 +671,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
OPT_BOOL(0, "root", _root, N_("Do not treat root commits 
as boundaries (Default: off)")),
OPT_BOOL(0, "show-stats", _stats, N_("Show work cost 
statistics")),
OPT_BOOL(0, "progress", _progress, N_("Force progress 
reporting")),
+   OPT_BOOL(0, "fuzzy-lines", _lines, N_("Use fuzzy line 
ranges")),
OPT_BIT(0, "score-debug", _option, N_("Show output score 
for blame entries"), OUTPUT_SHOW_SCORE),
OPT_BIT('f', "show-name", _option, N_("Show original 
filename (Default: auto)"), OUTPUT_SHOW_NAME),
OPT_BIT('n', "show-number", _option, N_("Show original 
linenumber (Default: off)"), OUTPUT_SHOW_NUMBER),
@@ -878,6 +880,8 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
nth_line_cb, , lno, anchor,
, , sb.path))
usage(blame_usage);
+   if (fuzzy_lines && lno < top)
+   top = lno;
if (lno < top || ((lno || bottom) && lno < bottom))
die(Q_("file %s has only %lu line",
   "file %s has only %lu lines",
diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index 661f9d430..6e7657df2 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -220,6 +220,12 @@ test_expect_success 'blame -L with invalid end' '
test_i18ngrep "has only 2 lines" errors
 '
 
+test_expect_success 'blame -L with invalid end and fuzzy-lines' '
+   git blame -L1,5 tres --fuzzy-lines >out &&
+   cat out &&
+   test $(wc -l < out) -eq 2
+'
+
 test_expect_success 'blame parses  part of -L' '
git blame -L1,1 tres >out &&
cat out &&
-- 
2.14.1



Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Junio C Hamano
Jeff Hostetler  writes:

> A question of mailing-list etiquette: in patch 9, I took Jonathan's
> ideas for adding the "extensions.partialclone" setting and extended it
> with some helper functions.  His change was part of a larger change
> with other code (fsck, IIRC) that I wasn't ready for.  What is the
> preferred way to give credit for something like this?

I think the note you left in the proposed log message

This patch is part of a patch originally authored by:
Jonathan Tan 

was a bit misleading.  The phrasing makes it sound as if it is
more-or-less verbatim copy (either of the whole thing or just a
subset) of Jonathan's patch, in which case, keeping the authorship
intact, i.e.

From: Jonathan Tan 

... log message taken from the original, with additional
... note to describe any adjustment you did

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 

would have been more appropriate.  But if you just were inspired by
the idea in his patch and wrote a one that is similar to but
different from it that suits the need of your series better, then a
note left in the log that instead does s/is part of/was inspired by/
would have been perfectly fine.

Thanks.





Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Junio C Hamano
Jonathan Tan  writes:

> On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:
>> OK, thanks for working well together.  So does this (1) build on
>> Jonathan's fsck-squelching series, or (2) ignores that and builds
>> filtering first, potentially leaving the codebase to a broken state
>> where it can create fsck-unclean repository until Jonathan's series
>> is rebased on top of this, or (3) something else?  [*1*]
>
> Excluding the partialclone patch (patch 9), I think that the answer is
> (2), but I don't think that it leaves the codebase in a broken state.
> In particular, none of the code modifies the repo, so it can't create
> a fsck-unclean one.

OK.  It is not dangerous enough to matter until we start using the
updated features in repack->rev-list|pack-objects ;-)  As I said, I
was mostly interested in learning what the big-picture direction was
and also seeing you two were more-or-less in agreement.

> The above is relevant only if we can exclude the partialclone patch,
> but I think that we can and we should, as I wrote in my reply to Jeff
> Hostetler [1].

OK.

> As for how this patch set (excluding the partialclone patch) interacts
> with my fsck series, they are relatively independent, as far as I can
> tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
> the fetch and clone parts, which we plan to instead adapt from Jeff
> Hostetler's patches, as far as I know) on top of these and resend
> those out once discussion on this has settled.

OK.  Thanks, I think tht is a reasonable way forward.

> [1] 
> https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/
>
>> I also saw a patch marked as "this is from Jonathan's earlier work",
>> taking the authorship (which to me implies that the changes were
>> extensive enough), so I am a bit at loss envisioning how this piece
>> fits in the bigger picture together with the other piece.
>
> The patch you mentioned is the partialclone patch, which I think can
> be considered separately from the rest (as I said above).

Good, that lets us sidestep Jeff's question about "how should the
credits for this change attributed?", too.


Re: [PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-25 Thread Alex Vandiver


On Wed, 25 Oct 2017, Alex Vandiver wrote:
> The fsmonitor command inherits the PWD of its caller, which may be
> anywhere in the working copy.  This makes is difficult for the
> fsmonitor command to operate on the whole repository.  Specifically,
> for the watchman integration, this causes each subdirectory to get its
> own watch entry.
> 
> Set the CWD to the top of the working directory, for consistency.
> 
> Signed-off-by: Alex Vandiver 
> ---
>  fsmonitor.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 7c1540c05..0d26ff34f 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
> last_update, struct strbuf *que
>   argv[3] = NULL;
>   cp.argv = argv;
>   cp.use_shell = 1;
> +cp.dir = get_git_work_tree();

Looks like my editor swapped out a tab on me.  I'll hold off on
sending a revised version to collect any other comments.
 - Alex


[PATCH v2 1/4] fsmonitor: Set the PWD to the top of the working tree

2017-10-25 Thread Alex Vandiver
The fsmonitor command inherits the PWD of its caller, which may be
anywhere in the working copy.  This makes is difficult for the
fsmonitor command to operate on the whole repository.  Specifically,
for the watchman integration, this causes each subdirectory to get its
own watch entry.

Set the CWD to the top of the working directory, for consistency.

Signed-off-by: Alex Vandiver 
---
 fsmonitor.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsmonitor.c b/fsmonitor.c
index 7c1540c05..0d26ff34f 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que
argv[3] = NULL;
cp.argv = argv;
cp.use_shell = 1;
+cp.dir = get_git_work_tree();
 
return capture_command(, query_result, 1024);
 }
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-25 Thread Alex Vandiver
If the fsmonitor extension is used in conjunction with the split index
extension, the set of entries in the index when it is first loaded is
only a subset of the real index.  This leads to only the non-"base"
index being marked as CE_FSMONITOR_VALID.

Delay the expansion of the ewah bitmap until after tweak_split_index
has been called to merge in the base index as well.

The new fsmonitor_dirty is kept from being leaked by dint of being
cleaned up in post_read_index_from, which is guaranteed to be called
after do_read_index in read_index_from.

Signed-off-by: Alex Vandiver 
---
 cache.h |  1 +
 fsmonitor.c | 39 ---
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 25adcf681..0a4f43ec2 100644
--- a/cache.h
+++ b/cache.h
@@ -348,6 +348,7 @@ struct index_state {
unsigned char sha1[20];
struct untracked_cache *untracked;
uint64_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty;
 };
 
 extern struct index_state the_index;
diff --git a/fsmonitor.c b/fsmonitor.c
index 0d26ff34f..fad9c6b13 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -49,20 +49,7 @@ int read_fsmonitor_extension(struct index_state *istate, 
const void *data,
ewah_free(fsmonitor_dirty);
return error("failed to parse ewah bitmap reading fsmonitor 
index extension");
}
-
-   if (git_config_get_fsmonitor()) {
-   /* Mark all entries valid */
-   for (i = 0; i < istate->cache_nr; i++)
-   istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
-
-   /* Mark all previously saved entries as dirty */
-   ewah_each_bit(fsmonitor_dirty, fsmonitor_ewah_callback, istate);
-
-   /* Now mark the untracked cache for fsmonitor usage */
-   if (istate->untracked)
-   istate->untracked->use_fsmonitor = 1;
-   }
-   ewah_free(fsmonitor_dirty);
+   istate->fsmonitor_dirty = fsmonitor_dirty;
 
trace_printf_key(_fsmonitor, "read fsmonitor extension 
successful");
return 0;
@@ -239,7 +226,29 @@ void remove_fsmonitor(struct index_state *istate)
 
 void tweak_fsmonitor(struct index_state *istate)
 {
-   switch (git_config_get_fsmonitor()) {
+   int i;
+   int fsmonitor_enabled = git_config_get_fsmonitor();
+
+   if (istate->fsmonitor_dirty) {
+   if (fsmonitor_enabled) {
+   /* Mark all entries valid */
+   for (i = 0; i < istate->cache_nr; i++) {
+   istate->cache[i]->ce_flags |= 
CE_FSMONITOR_VALID;
+   }
+
+   /* Mark all previously saved entries as dirty */
+   ewah_each_bit(istate->fsmonitor_dirty, 
fsmonitor_ewah_callback, istate);
+
+   /* Now mark the untracked cache for fsmonitor usage */
+   if (istate->untracked)
+   istate->untracked->use_fsmonitor = 1;
+   }
+
+   ewah_free(istate->fsmonitor_dirty);
+   istate->fsmonitor_dirty = NULL;
+   }
+
+   switch (fsmonitor_enabled) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 3/4] fsmonitor: Document GIT_TRACE_FSMONITOR

2017-10-25 Thread Alex Vandiver
Signed-off-by: Alex Vandiver 
---
 Documentation/git.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 1fca63634..720db196e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -594,6 +594,10 @@ into it.
 Unsetting the variable, or setting it to empty, "0" or
 "false" (case insensitive) disables trace messages.
 
+`GIT_TRACE_FSMONITOR`::
+   Enables trace messages for the filesystem monitor extension.
+   See `GIT_TRACE` for available trace output options.
+
 `GIT_TRACE_PACK_ACCESS`::
Enables trace messages for all accesses to any packs. For each
access, the pack file name and an offset in the pack is
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-25 Thread Alex Vandiver
This provides modest performance savings.  Benchmarking with the
following program, with and without `--no-pretty`, we find savings of
23% (0.316s -> 0.242s) in the git repository, and savings of 8% (5.24s
-> 4.86s) on a large repository with 580k files in the working copy.

#!/usr/bin/perl

use strict;
use warnings;
use IPC::Open2;

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV")
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

my $query = qq|["query", "$ENV{PWD}", {}]|;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; };

my $json_pkg;
eval {
require JSON::XSomething;
$json_pkg = "JSON::XSomething";
1;
} or do {
require JSON::PP;
$json_pkg = "JSON::PP";
};

my $o = $json_pkg->new->utf8->decode($response);

Signed-off-by: Alex Vandiver 
---
 t/t7519/fsmonitor-watchman | 2 +-
 templates/hooks--fsmonitor-watchman.sample | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
index a3e30bf54..79f24325c 100755
--- a/t/t7519/fsmonitor-watchman
+++ b/t/t7519/fsmonitor-watchman
@@ -50,7 +50,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
index 9eba8a740..9a082f278 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -49,7 +49,7 @@ launch_watchman();
 
 sub launch_watchman {
 
-   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty')
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";
 
-- 
2.15.0.rc1.413.g76aedb451



[PATCH v2 0/4] fsmonitor fixes

2017-10-25 Thread Alex Vandiver
Updated based on comments from Dscho and Ben.  Thanks for those!
 - Alex



Re: [PATCH 4/4] fsmonitor: Delay updating state until after split index is merged

2017-10-25 Thread Alex Vandiver
On Fri, 20 Oct 2017, Johannes Schindelin wrote:
> From the diff, it is not immediately clear that fsmonitor_dirty is not
> leaked in any code path.
> 
> Could you clarify this in the commit message, please?

Will do!

> > @@ -238,6 +225,29 @@ void remove_fsmonitor(struct index_state *istate)
> >  
> >  void tweak_fsmonitor(struct index_state *istate)
> >  {
> > +   int i;
> > +
> > +   if (istate->fsmonitor_dirty) {
> > +   /* Mark all entries valid */
> > +   trace_printf_key(_fsmonitor, "fsmonitor is enabled; cache 
> > is %d", istate->cache_nr);
> 
> Sadly, a call to trace_printf_key() is not really a noop when tracing is
> disabled. [snip]

Apologies -- I'd meant to remove the tracing before committing.  I
think we're all on the same page that it would be nice to lower the
impact of tracing to let it be more prevalent, but I'd rather not
block these changes on that.

Thanks for the comments!
 - Alex


Re: [RFC] protocol version 2

2017-10-25 Thread Junio C Hamano
Brandon Williams  writes:

> On 10/24, Junio C Hamano wrote:
>> Brandon Williams  writes:
>> 
>> >   : When specified, only references matching the given patterns
>> >  are displayed.
>> 
>> I do not think you meant  here.
>> 
>> The side that is listing what it has has no reason to know what the
>> recipient plans to do with the result, so you must be only sending
>> the LHS of a refspec.  If your explanation says "given patterns",
>> then replace  with .  Do not abuse a term that has
>> specific and established meaning for something else.
>
> Yes, you're right i intended that to mean  instead so that the
> client could send "refs/heads/*" or some other such pattern and have the
> server limit its output.

Speaking of limiting the bandwidth consumed by the ref
advertisement, I think another trick that came up in past
discussions may be worth considering, which is to allow the
requestor to say, "oh by the way, I made exactly the same request as
this one earlier to you, and your response hashed down to this
value".  The responder may choose to give an incremental response
relative to the known result the requestor claims to have.

So for example, a requestor may have made an earlier ls-refs request
and can still recall that it got:

refs/heads/maintObjectID A
refs/heads/master   ObjectID B
refs/tags/v1.0  ObjectId C

Also assume that these three lines together (textually) hashes to Z.

When the requestor asks about the two hierarchies, it may say "I know
you gave a result that hashes to Z" with an additional parameter:

command=ls-ref
refspec=refs/heads/*
refspec=refs/tags/*
known=Z

If the current response for refs/heads/* and refs/tags/* when fully
spelt out were like this (i.e. we updated a ref, gained another, and
lost one):

refs/heads/master   ObjectID D
refs/tags/v1.0  ObjectId C
refs/tags/v1.1  ObjectID E

then the responder can send the fully spelt version, or it can
choose to say, "It's good that you know the state Z; relative to
that, refs/heads/maint no loner exists, refs/heads/master is now at
D and refs/tags/v1.1 at E has been added", if the latter results in
a shorter response (and if it recognises Z and map it back to the
set of refs and their values it responded with).

The "known" request parameter could further be refined (I do not
think this possibility was discussed in the past) to say "among the
values I received earlier from you, the ones that match this pattern
hashes to this", e.g. the earlier example request might become

command=ls-ref
refspec=refs/heads/*
refspec=refs/tags/*
known=X for refs/heads/*
known=Y for refs/tags/*



Re: [PATCH 2/4] fsmonitor: Don't bother pretty-printing JSON from watchman

2017-10-25 Thread Alex Vandiver
On Fri, 20 Oct 2017, Ben Peart wrote:
> > While I am very much infavor of this change (I was not aware of the
> > --no-pretty option), I would like to see some statistics on that. Could
> > you measure the impact, please, and include the results in the commit
> > message?
> > 
> > Ciao,
> > Johannes
> > 
> 
> I was also unaware of the --no-pretty option. I've tested this on Windows
> running version 4.9.0 of Watchman and verified that it does work correctly.
> I'm also curious if it produces any measurable difference in performance.

On a repository with ~160k files, the following test harness, which
requests all files inside the repository and parses that output:

--8<---
#!/usr/bin/perl

use strict;
use warnings;
use IPC::Open2;

my $pid = open2(\*CHLD_OUT, \*CHLD_IN, "watchman -j @ARGV")
or die "open2() failed: $!\n" .
"Falling back to scanning...\n";

my $query = qq|["query", "$ENV{PWD}", {}]|;

print CHLD_IN $query;
close CHLD_IN;
my $response = do {local $/; };

my $json_pkg;
eval {
require JSON::XS;
$json_pkg = "JSON::XS";
1;
} or do {
require JSON::PP;
$json_pkg = "JSON::PP";
};

my $o = $json_pkg->new->utf8->decode($response);
--8<---

...run with dumbbench[1], produces:

$ dumbbench -- ./test.pl
cmd: Ran 22 iterations (2 outliers).
cmd: Rounded run time per iteration: 5.240e+00 +/- 1.1e-02 (0.2%)
$ dumbbench -- ./test.pl --no-pretty
cmd: Ran 21 iterations (1 outliers).
cmd: Rounded run time per iteration: 4.866e+00 +/- 1.3e-02 (0.3%)

...so a modest 8% speedup.  I note that those numbers are for a perl
with JSON::XS installed; without it installed, the runtime is so long
that I gave up waiting for it.

Anyways, I'll put that in the commit message in the re-roll.
 - Alex


[1] https://metacpan.org/release/Dumbbench


Re: [PATCH 1/4] fsmonitor: Watch, and ask about, the top of the repo, not the CWD

2017-10-25 Thread Alex Vandiver
On Fri, 20 Oct 2017, Johannes Schindelin wrote:
> This is super expensive, as it means a full-blown new process instead of
> just a simple environment variable expansion.
> 
> The idea behind using `PWD` instead was that Git will already have done
> all of the work of figuring out the top-level directory and switched to
> there before calling the fsmonitor hook.

I'm not seeing that PWD has been at all altered.  The following does
seem like a better solution:
--8<-
diff --git a/fsmonitor.c b/fsmonitor.c
index 7c1540c05..4ea44dcc6 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -121,6 +121,7 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que
argv[3] = NULL;
cp.argv = argv;
cp.use_shell = 1;
+   cp.dir = get_git_work_tree();

return capture_command(, query_result, 1024);
 }
--8<-

I'll re-roll with that.

> Did you see any case where the script was *not* called from the top-level
> directory?

Merely calling `git status` inside a subdirectory is enough to for the
stock watchman config to report that it's in a "new" directory:

$ watchman watch-list
{
"roots": [],
"version": "4.7.0"
}
$ git status
Adding '/Users/alexmv/src/git' to watchman's watch list.
On branch test
nothing to commit, working tree clean
$ cd builtin/
$ git status
Adding '/Users/alexmv/src/git/builtin' to watchman's watch list.
On branch test
nothing to commit, working tree clean
$ watchman watch-list
{
"roots": [
"/Users/alexmv/src/git/builtin",
"/Users/alexmv/src/git"
],
"version": "4.7.0"
}

As I understand it, that means that it then loses all performance
gains in the new directory, as it spits out "all dirty."

 - Alex


Re: [PATCH 0/6] Create Git/Packet.pm

2017-10-25 Thread Johannes Schindelin
Hi Christian,

On Thu, 19 Oct 2017, Christian Couder wrote:

>   Add Git/Packet.pm from parts of t0021/rot13-filter.pl
> 
>  perl/Git/Packet.pm  | 118 
> 

This change, together with forcing t0021/rot13-filter.pl to use
Git/Packet.pm, breaks the test suite on Windows:

https://travis-ci.org/git/git/jobs/292461846

There are actually two problems, one of which is fixed by

-- snip --
diff --git a/perl/Makefile b/perl/Makefile
index 15d96fcc7a5..f657de20e39 100644
--- a/perl/Makefile
+++ b/perl/Makefile
@@ -30,6 +30,7 @@ instdir_SQ = $(subst ','\'',$(prefix)/lib)
 modules += Git
 modules += Git/I18N
 modules += Git/IndexInfo
+modules += Git/Packet
 modules += Git/SVN
 modules += Git/SVN/Memoize/YAML
 modules += Git/SVN/Fetcher
-- snap --

You will want to pick this up in the next iteration. (You simply did not
notice because you did not build with NO_PERL_MAKEMAKER.)

However, that *still* does not fix the test for me: note how in the
verbose output recorded in Travis (see the link above), Perl fails to find
the Perl modules and then says:

Can't locate Git/Packet.pm in @INC (you may need to install the Git::Packet 
module) (@INC contains: C \BuildAgent\_work\5\s\perl\blib\lib;C 
\BuildAgent\_work\5\s\perl\blib\arch\auto\Git /usr/lib/perl5/site_perl [..]

Note that the correct blib path starts with `C:\BuildAgent\_work` and
the line

use lib (split(/:/, $ENV{GITPERLLIB}));

splits off the drive letter from the rest of the path. Obviously, this
fails to Do The Right Thing, and simply points to Yet Another Portability
Problem with Git's reliance on Unix scripting.

But why is it a Windows path in the first place? We take pains at using
only Unix-style paths in Git's test suite after all.

Well, this one is easy. We call git.exe, which is a pure Win32 executable
(i.e. it *wants* Windows paths, even in the environment passed to it) and
git.exe in turn calls Perl to interpret rot13-filter.pl. So on the way to
git.exe, GITPERLLIB is converted by the MSYS2 runtime into a Windows-style
path list. And then it is not converted back when we call Perl.

As a workaround, I used a trick to exclude GITPERLLIB from being converted
by MSYS2: setting the environment variable MSYS2_ENV_CONV_EXCL=GITPERLLIB
"fixed" the test for me (with above patch thrown in). I also set the test
in the Visual Studio Team Services build definition that runs those tests
triggered by Travis.

If your patch makes it into Git's `master`, we may have to hardcode that
MSYS2_ENV_CONV_EXCL=GITPERLLIB (or augment any existing
MSYS2_ENV_CONV_EXCL), so that other Windows developers do not have to
stumble over the same thing and then spend 3 hours to debug it.

Ciao,
Dscho


[PATCH 2/2] diff.c: ignore all white space changes by default in the move detection

2017-10-25 Thread Stefan Beller
The coloring of moved lines is currently only related to the presentation,
as there are no options to export the move detection information into
the patch format. Hence we can be very loose about the default, so let's
ignore any white space change for the move detection. If a user really
cares about move detection detecting only lines moved byte-for-byte,
they can exercise the --color-moved-no-ignore-* options.

Signed-off-by: Stefan Beller 
---
 Documentation/diff-options.txt | 4 +++-
 diff.c | 1 +
 t/t4015-diff-whitespace.sh | 6 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 1000b53b84..fdf40cbefc 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -269,14 +269,16 @@ dimmed_zebra::
Ignore whitespace when comparing lines when performing the move
detection for --color-moved.  This ignores differences even if
one line has whitespace where the other line has none.
+   This is on by default.
 --color-moved-[no-]ignore-space-change::
Ignore changes in amount of whitespace when performing the move
detection for --color-moved.  This ignores whitespace
at line end, and considers all other sequences of one or
more whitespace characters to be equivalent.
+   This is on by default.
 --color-moved-[no-]ignore-space-at-eol::
Ignore changes in whitespace at EOL when performing the move
-   detection for --color-moved.
+   detection for --color-moved. This is on by default.
 
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
diff --git a/diff.c b/diff.c
index 2580315ab9..fef4874169 100644
--- a/diff.c
+++ b/diff.c
@@ -4105,6 +4105,7 @@ void diff_setup(struct diff_options *options)
}
 
options->color_moved = diff_color_moved_default;
+   options->color_moved_ignore_space = XDF_WHITESPACE_FLAGS;
 }
 
 void diff_setup_done(struct diff_options *options)
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 4ef4d6934a..a4ffc84f41 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1813,6 +1813,12 @@ test_expect_success 'move detection only ignores white 
spaces' '
 Qbar();
 Q// more unrelated stuff
EOF
+   test_cmp expected actual &&
+
+   # test that this is the default:
+   git diff --color --color-moved |
+   grep -v "index" |
+   test_decode_color >actual &&
test_cmp expected actual
 '
 
-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 1/2] diff: decouple white space treatment for move detection from generic option

2017-10-25 Thread Stefan Beller
In the original implementation of the move dection logic we assumed that
the choice for ignoring white space changes is the same for the move
detection as it is for the generic diff.

It turns out, users want to have different choices regarding white spaces
for the move detection and the generic diff.  For example when reviewing
a patch that is sent to the mailing list (which doesn't ignore white space
changes at all; it is an exact patch) refactors some nested code out
into a separate function, we want to have the move detection kick in
despite different indentation levels of the old and new code.

This patch enables the user to have a different choice for ignoring
white spaces in the move detection.

The example given is covered in a test. Convert the existing tests
to be more explicit on their choice of white space behavior in the
move detection as we'll change the default shortly.

Signed-off-by: Stefan Beller 
---
 Documentation/diff-options.txt |  13 
 diff.c |  17 -
 diff.h |   1 +
 t/t4015-diff-whitespace.sh | 150 +++--
 4 files changed, 172 insertions(+), 9 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index a88c76741e..1000b53b84 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -265,6 +265,19 @@ dimmed_zebra::
blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-[no-]ignore-all-space::
+   Ignore whitespace when comparing lines when performing the move
+   detection for --color-moved.  This ignores differences even if
+   one line has whitespace where the other line has none.
+--color-moved-[no-]ignore-space-change::
+   Ignore changes in amount of whitespace when performing the move
+   detection for --color-moved.  This ignores whitespace
+   at line end, and considers all other sequences of one or
+   more whitespace characters to be equivalent.
+--color-moved-[no-]ignore-space-at-eol::
+   Ignore changes in whitespace at EOL when performing the move
+   detection for --color-moved.
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index e6814b9e9c..2580315ab9 100644
--- a/diff.c
+++ b/diff.c
@@ -714,7 +714,7 @@ static int moved_entry_cmp(const struct diff_options 
*diffopt,
 {
return !xdiff_compare_lines(a->es->line, a->es->len,
b->es->line, b->es->len,
-   diffopt->xdl_opts);
+   diffopt->color_moved_ignore_space);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -723,7 +723,8 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
struct moved_entry *ret = xmalloc(sizeof(*ret));
struct emitted_diff_symbol *l = >emitted_symbols->buf[line_no];
 
-   ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+   ret->ent.hash = xdiff_hash_string(l->line, l->len,
+ o->color_moved_ignore_space);
ret->es = l;
ret->next_line = NULL;
 
@@ -4592,6 +4593,18 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg, "--color-moved-no-ignore-all-space"))
+   options->color_moved_ignore_space &= ~XDF_IGNORE_WHITESPACE;
+   else if (!strcmp(arg, "--color-moved-no-ignore-space-change"))
+   options->color_moved_ignore_space &= 
~XDF_IGNORE_WHITESPACE_CHANGE;
+   else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
+   options->color_moved_ignore_space &= 
~XDF_IGNORE_WHITESPACE_AT_EOL;
+   else if (!strcmp(arg, "--color-moved-ignore-all-space"))
+   options->color_moved_ignore_space |= XDF_IGNORE_WHITESPACE;
+   else if (!strcmp(arg, "--color-moved-ignore-space-change"))
+   options->color_moved_ignore_space |= 
XDF_IGNORE_WHITESPACE_CHANGE;
+   else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
+   options->color_moved_ignore_space |= 
XDF_IGNORE_WHITESPACE_AT_EOL;
else if (!strcmp(arg, "--indent-heuristic"))
DIFF_XDL_SET(options, INDENT_HEURISTIC);
else if (!strcmp(arg, "--no-indent-heuristic"))
diff --git a/diff.h b/diff.h
index aca150ba2e..6ba3f53bbd 100644
--- a/diff.h
+++ b/diff.h
@@ -196,6 +196,7 @@ struct diff_options {
} color_moved;
#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+   int color_moved_ignore_space;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const 

[PATCH 0/2] color-moved: ignore all space changes by default

2017-10-25 Thread Stefan Beller
On Mon, Oct 23, 2017 at 7:52 PM, Stefan Beller  wrote[1]:
> On Mon, Oct 23, 2017 at 6:54 PM, Junio C Hamano  wrote:
>>
>>  * As moved-lines display is mostly a presentation thing, I wonder
>>if it makes sense to always match loosely wrt whitespace
>>differences. 
>
> Well, sometimes the user wants to know if it is byte-for-byte identical
> (unlikely to be code, but maybe column oriented data for input;
> think of all our FORTRAN users. ;)

... and this is the implementation and the flip of the default setting
to ignore all white space for the move detection.

It applies on "v3 (x)diff cleanup: remove duplicate code" [2].

Thanks,
Stefan

[1] 
https://public-inbox.org/git/CAGZ79kbba9KuDX7HsxhW3jXs7ocWfZg=kshe-gsxjntnt4p...@mail.gmail.com
[2] https://public-inbox.org/git/20171025184912.21657-1-sbel...@google.com/

Stefan Beller (2):
  diff: decouple white space treatment for move detection from generic
option
  diff.c: ignore all white space changes by default in the move
detection

 Documentation/diff-options.txt |  15 
 diff.c |  18 -
 diff.h |   1 +
 t/t4015-diff-whitespace.sh | 156 +++--
 4 files changed, 181 insertions(+), 9 deletions(-)

-- 
2.15.0.rc2.6.g953226eb5f



Re: [PATCH 4/3] t/Makefile: introduce TEST_SHELL_PATH

2017-10-25 Thread Jeff King
On Wed, Oct 25, 2017 at 11:35:44PM +0200, Johannes Schindelin wrote:

> > > Or alternatively we could prefix the assignment by
> > > 
> > >   test -n "$TEST_SHELL_PATH" ||
> > > 
> > > or use the pattern
> > > 
> > >   TEST_SHELL_PATH="${TEST_SHELL_PATH:-[...]}"
> > 
> > I'm not quite sure what this is fixing.  Is there a case where we
> > wouldn't have TEST_SHELL_PATH set when running the tests? I think there
> > are already other bits that assume that "make" has been run (including
> > the existing reference to $SHELL_PATH, I think).
> 
> The way I read your patch, setting the environment variable differnently
> at test time than at build time would be ignored: GIT-BUILD-OPTIONS is
> sourced and would override whatever you told the test suite to use.
> 
> I guess it does not really matter all that much in practice.

Right. I find that behavior mildly irritating at times, but it's
consistent with other items like NO_PERL, etc. E.g., you cannot do:

  make NO_PERL=
  cd t
  NO_PERL=Nope ./t3701-*

and disable perl. It's testing what got built.

-Peff


Re: [PATCH 4/3] t/Makefile: introduce TEST_SHELL_PATH

2017-10-25 Thread Johannes Schindelin
Hi Peff,

On Mon, 23 Oct 2017, Jeff King wrote:

> On Mon, Oct 23, 2017 at 01:01:42PM +0200, Johannes Schindelin wrote:
> 
> > On Fri, 20 Oct 2017, Jeff King wrote:
> > 
> > > @@ -2350,6 +2357,7 @@ GIT-LDFLAGS: FORCE
> > >  # and the first level quoting from the shell that runs "echo".
> > >  GIT-BUILD-OPTIONS: FORCE
> > >   @echo SHELL_PATH=\''$(subst ','\'',$(SHELL_PATH_SQ))'\' >$@+
> > > + @echo TEST_SHELL_PATH=\''$(subst ','\'',$(TEST_SHELL_PATH_SQ))'\' >$@+
> > 
> > Do we really want to force the test shell path to be hardcoded at runtime?
> > It may be a better idea not to write this into GIT-BUILD-OPTIONS.
> 
> My intent was to make it work "out of the box" in the same way as
> SHELL_PATH does now. So that:
> 
>   echo TEST_SHELL_PATH=whatever >>config.mak
>   make test
>   cd t && ./t1234-*
> 
> both respect it. Without going through BUILD-OPTIONS, I don't think it
> makes it into the environment via config.mak (it _does_ if you specify
> it on the command-line of "make", though).
> 
> For my purposes it would be fine to just use the environment, but I was
> trying to have it match the other variables for consistency.

Makes sense.

> > Or alternatively we could prefix the assignment by
> > 
> > test -n "$TEST_SHELL_PATH" ||
> > 
> > or use the pattern
> > 
> > TEST_SHELL_PATH="${TEST_SHELL_PATH:-[...]}"
> 
> I'm not quite sure what this is fixing.  Is there a case where we
> wouldn't have TEST_SHELL_PATH set when running the tests? I think there
> are already other bits that assume that "make" has been run (including
> the existing reference to $SHELL_PATH, I think).

The way I read your patch, setting the environment variable differnently
at test time than at build time would be ignored: GIT-BUILD-OPTIONS is
sourced and would override whatever you told the test suite to use.

I guess it does not really matter all that much in practice.

Ciao,
Dscho


[PATCH v2 0/1] Do not handle submodules in excluded directories as untracked

2017-10-25 Thread Johannes Schindelin
Anything in an excluded directory should be ignored, not only files and
directories but also submodules.

Changes since v1:

- simplified the test case, as suggested by Kevin

- added explicit output to the commit message to demonstrate what is fixed


Johannes Schindelin (1):
  status: do not get confused by submodules in excluded directories

 dir.c  |  2 +-
 t/t7061-wtstatus-ignore.sh | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)


base-commit: ba78f398be65e941b93276680f68a81075716472
Published-As: https://github.com/dscho/git/releases/tag/submodule-in-excluded-v2
Fetch-It-Via: git fetch https://github.com/dscho/git submodule-in-excluded-v2

Interdiff vs v1:
 diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
 index 8c849a4cd2f..0c394cf995c 100755
 --- a/t/t7061-wtstatus-ignore.sh
 +++ b/t/t7061-wtstatus-ignore.sh
 @@ -278,10 +278,7 @@ EOF
  
  test_expect_success 'status ignores submodule in excluded directory' '
git init tracked/submodule &&
 -  (
 -  cd tracked/submodule &&
 -  test_commit initial
 -  ) &&
 +  test_commit -C tracked/submodule initial &&
git status --porcelain --ignored -u tracked/submodule >actual &&
test_cmp expected actual
  '
-- 
2.14.3.windows.1



[PATCH v2 1/1] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Johannes Schindelin
We meticulously pass the `exclude` flag to the `treat_directory()`
function so that we can indicate that files in it are excluded rather
than untracked when recursing.

But we did not yet treat submodules the same way.

Because of that, `git status --ignored --untracked` with a submodule
`submodule` in a gitignored `tracked/` would show the submodule in the
"Untracked files" section, e.g.

On branch master
Untracked files:
  (use "git add ..." to include in what will be committed)

tracked/submodule/

Ignored files:
  (use "git add -f ..." to include in what will be committed)

tracked/submodule/initial.t

Instead, we would want it to show the submodule in the "Ignored files"
section:

On branch master
Ignored files:
  (use "git add -f ..." to include in what will be committed)

tracked/submodule/

Signed-off-by: Johannes Schindelin 
---
 dir.c  |  2 +-
 t/t7061-wtstatus-ignore.sh | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 1d17b800cf3..9987011da57 100644
--- a/dir.c
+++ b/dir.c
@@ -1392,7 +1392,7 @@ static enum path_treatment treat_directory(struct 
dir_struct *dir,
if (!(dir->flags & DIR_NO_GITLINKS)) {
unsigned char sha1[20];
if (resolve_gitlink_ref(dirname, "HEAD", sha1) == 0)
-   return path_untracked;
+   return exclude ? path_excluded : path_untracked;
}
return path_recurse;
}
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index fc6013ba3c8..0c394cf995c 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -272,4 +272,15 @@ test_expect_success 'status ignored tracked directory with 
uncommitted file in t
test_cmp expected actual
 '
 
+cat >expected <<\EOF
+!! tracked/submodule/
+EOF
+
+test_expect_success 'status ignores submodule in excluded directory' '
+   git init tracked/submodule &&
+   test_commit -C tracked/submodule initial &&
+   git status --porcelain --ignored -u tracked/submodule >actual &&
+   test_cmp expected actual
+'
+
 test_done
-- 
2.14.3.windows.1


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Johannes Schindelin
Hi,

On Wed, 25 Oct 2017, Heiko Voigt wrote:

> On Wed, Oct 25, 2017 at 10:28:25AM +0900, Junio C Hamano wrote:
> > Heiko Voigt  writes:
> > 
> > > On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> > >> Johannes Schindelin  writes:
> > >> 
> > >> > We meticulously pass the `exclude` flag to the `treat_directory()`
> > >> > function so that we can indicate that files in it are excluded rather
> > >> > than untracked when recursing.
> > >> >
> > >> > But we did not yet treat submodules the same way.
> > >> 
> > >> ... "because of that, we ended up showing < > >> what situation>>" would be a nice thing to have here, so that it can
> > >> be copied to the release notes for the bugfix.  
> > >
> > > Yes I agree that would be nice here. It was not immediately obvious that
> > > this only applies when using both flags: -u and --ignored.
> > 
> > Does any of you care to fill in the <> then? ;-)
> 
> How about:
> 
> Because of that, we ended up showing the submodule as untracked and its
> content as ignored files when using the --ignored and -u flags with git
> status.
> 
> ? But maybe Dscho also has some more information to add about his
> situation?

He has... as part of v2, a substantially more detailed commit message will
reach your inbox Real Soon Now.

Ciao,
Dscho


Re: [PATCH 0/2] Avoid rewriting "packed-refs" unnecessarily

2017-10-25 Thread Eric Sunshine
On Wed, Oct 25, 2017 at 5:53 AM, Michael Haggerty  wrote:
> Since commit dc39e09942 (files_ref_store: use a transaction to update
> packed refs, 2017-09-08), we've been rewriting the `packed-refs` file
> unnecessarily when deleting a loose reference that has no packed
> counterpart. Add some tests demonstrating this problem, then fix it by
> teaching `files_backend` to see whether any references being deleted
> have packed versions, and if not, skipping the packed_refs
> transaction.
>
> This is a mild regression since 2.14, which avoided rewriting the
> `packed-refs` file in these cases [...]

Should the above information (that this fixes a regression) be
mentioned in the commit message of at least one of the patches
(probably 2/2)? Without such context, it may not be clear to someone
later reading those commit message -- someone who wasn't following
this email discussion -- that this behavior had been lost and is now
being restored.


HELLO

2017-10-25 Thread Mrs.Nicole Maoris


--
I am Mrs Nicole  i have a pending project of fulfillment to put in your 
hand, i will need your support to make this ream come through, could 
you le me know your interest to enable me give you further information, 
and I hereby advice that you send the below mentioned information I 
decided to will/donate the sum of  2.7 million Euro to you for the good 
work of god, and also to help the motherless and less privilege and 
also forth assistance of the widows.

At the moment I cannot take an telephone calls right now due to the 
fact that my relatives (that have squandered the funds agave them for 
this purpose before) are around me and my health status also. I have 
adjusted my will and my lawyer is aware.

I have willed those properties to you by quoting my personal file 
routing and account information. And I have also notified the bank that 
I am willing that properties to you for a good, effective and 
prudent\work. I know I don't know you but I have been directed to do 
this by god.ok Please contact this woman for more details you might not 
get me on line in time contact this

 Your full name..
 Your private telephone number..
 Your passport or identity card
 Your country... ...
 Your occupation..
 Thank you as i wait your reply.

 Yoursfaithful friand,

 Mrs. Nicole Maoris
--



Re: [PATCH 10/13] rev-list: add list-objects filtering support

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 12:41 AM, Jonathan Tan wrote:

On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:

  static void finish_object(struct object *obj, const char *name, void *cb_data)
  {
 struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(>oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(>oid)) {
+   if (arg_print_missing) {
+   list_objects_filter_map_insert(
+   _objects, >oid, name, obj->type);
+   return;
+   }
+
+   /*
+* Relax consistency checks when we expect missing
+* objects because of partial-clone or a previous
+* partial-fetch.
+*
+* Note that this is independent of any filtering that
+* we are doing in this run.
+*/
+   if (is_partial_clone_registered())
+   return;
+
 die("missing blob object '%s'", oid_to_hex(>oid));


I'm fine with arg_print_missing suppressing lazy fetching (when I
rebase my patches on this, I'll have to ensure that fetch_if_missing
is set to 0 if arg_print_missing is true), but I think that the
behavior when arg_print_missing is false should be the opposite - we
should let has_object_file() perform the lazy fetching, and die if it
returns false (that is, if the fetching failed).


Right. This is a point where our different approaches need
to come together.  My "is_partial_clone_registered" is essentially
a placeholder for your lazy fetching.  so we can delete this call
when your changes are in.  Basically, you set:
fetch_if_missing = !arg_print_missing
at the top.




+   }
 if (info->revs->verify_objects && !obj->parsed && obj->type != 
OBJ_COMMIT)
 parse_object(>oid);
  }


Re: [PATCH 08/13] list-objects: add traverse_commit_list_filtered method

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 12:24 AM, Jonathan Tan wrote:

On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:

+void traverse_commit_list_filtered(
+   struct list_objects_filter_options *filter_options,
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   list_objects_filter_map_foreach_cb print_omitted_object,
+   void *show_data);


So the function call chain, if we wanted a filtered traversal, is:
traverse_commit_list_filtered -> traverse_commit_list__sparse_path
(and friends, and each algorithm is in its own file) ->
traverse_commit_list_worker

This makes the implementation of each algorithm more easily understood
(since they are all in their own files), but also increases the number
of global functions and code files. I personally would combine the
traverse_commit_list__* functions into one file
(list-objects-filtered.c), make them static, and also put
traverse_commit_list_filtered in there, but I understand that other
people in the Git project may differ on this.



I'll do a round of refactoring to include your suggestion of
a default null filter.  Then with that see what collapsing this
looks like.

Thanks,
Jeff


Re: [PATCH 07/13] list-objects-filter-options: common argument parsing

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 12:14 AM, Jonathan Tan wrote:

On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:

+ *  ::= blob:none
+ *   blob:limit:[kmg]
+ *   sparse:oid:
+ *   sparse:path:


I notice in the code below that there are some usages of "=" instead
of ":" - could you clarify which one it is? (Ideally this would point
to one point of documentation which serves as both user and technical
documentation.)


good catch.  thanks!
 

+ */
+int parse_list_objects_filter(struct list_objects_filter_options 
*filter_options,
+ const char *arg)
+{
+   struct object_context oc;
+   struct object_id sparse_oid;
+   const char *v0;
+   const char *v1;
+
+   if (filter_options->choice)
+   die(_("multiple object filter types cannot be combined"));
+
+   /*
+* TODO consider rejecting 'arg' if it contains any
+* TODO injection characters (since we might send this
+* TODO to a sub-command or to the server and we don't
+* TODO want to deal with legacy quoting/escaping for
+* TODO a new feature).
+*/
+
+   filter_options->raw_value = strdup(arg);
+
+   if (skip_prefix(arg, "blob:", ) || skip_prefix(arg, "blobs:", )) {


I know that some people prefer leniency, but I think it's better to
standardize on one form ("blob" instead of both "blob" and "blobs").


I could go either way on this.  (I kept mistyping it during interactive testing,
so I added both cases...)




+   if (!strcmp(v0, "none")) {
+   filter_options->choice = LOFC_BLOB_NONE;
+   return 0;
+   }
+
+   if (skip_prefix(v0, "limit=", ) &&
+   git_parse_ulong(v1, _options->blob_limit_value)) {
+   filter_options->choice = LOFC_BLOB_LIMIT;
+   return 0;
+   }
+   }
+   else if (skip_prefix(arg, "sparse:", )) {
+   if (skip_prefix(v0, "oid=", )) {
+   filter_options->choice = LOFC_SPARSE_OID;
+   if (!get_oid_with_context(v1, GET_OID_BLOB,
+ _oid, )) {
+   /*
+* We successfully converted the 
+* into an actual OID.  Rewrite the raw_value
+* in canonoical form with just the OID.
+* (If we send this request to the server, we
+* want an absolute expression rather than a
+* local-ref-relative expression.)
+*/
+   free((char *)filter_options->raw_value);
+   filter_options->raw_value =
+   xstrfmt("sparse:oid=%s",
+   oid_to_hex(_oid));
+   filter_options->sparse_oid_value =
+   oiddup(_oid);
+   } else {
+   /*
+* We could not turn the  into an
+* OID.  Leave the raw_value as is in case
+* the server can parse it.  (It may refer to
+* a branch, commit, or blob we don't have.)
+*/
+   }
+   return 0;
+   }
+
+   if (skip_prefix(v0, "path=", )) {
+   filter_options->choice = LOFC_SPARSE_PATH;
+   filter_options->sparse_path_value = strdup(v1);
+   return 0;
+   }
+   }
+
+   die(_("invalid filter expression '%s'"), arg);
+   return 0;
+}
+
+int opt_parse_list_objects_filter(const struct option *opt,
+ const char *arg, int unset)
+{
+   struct list_objects_filter_options *filter_options = opt->value;
+
+   assert(arg);
+   assert(!unset);
+
+   return parse_list_objects_filter(filter_options, arg);
+}
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
new file mode 100644
index 000..23bd68e
--- /dev/null
+++ b/list-objects-filter-options.h
@@ -0,0 +1,50 @@
+#ifndef LIST_OBJECTS_FILTER_OPTIONS_H
+#define LIST_OBJECTS_FILTER_OPTIONS_H
+
+#include "parse-options.h"
+
+/*
+ * Common declarations and utilities for filtering objects (such as omitting
+ * large blobs) in list_objects:traverse_commit_list() and git-rev-list.
+ */
+
+enum list_objects_filter_choice {
+   LOFC_DISABLED = 0,
+   LOFC_BLOB_NONE,
+   LOFC_BLOB_LIMIT,
+   LOFC_SPARSE_OID,
+   LOFC_SPARSE_PATH,
+};
+
+struct list_objects_filter_options {
+   /*
+* The raw argument value given on the 

Re: [PATCH 03/13] list-objects: filter objects in traverse_commit_list

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 12:05 AM, Jonathan Tan wrote:

On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:


+enum list_objects_filter_result {
+   LOFR_ZERO  = 0,
+   LOFR_MARK_SEEN = 1<<0,


Probably worth documenting, something like /* Mark this object so that
it is skipped for the rest of the traversal. */


+   LOFR_SHOW  = 1<<1,


And something like /* Invoke show_object_fn on this object. This
object may be revisited unless LOFR_MARK_SEEN is also set. */


+};
+
+/* See object.h and revision.h */
+#define FILTER_REVISIT (1<<25)


I think this should be declared closer to its use - in the sparse
filter code or in the file that uses it. Wherever it is, also update
the chart in object.h to indicate that we're using this 25th bit.


+
+enum list_objects_filter_type {
+   LOFT_BEGIN_TREE,
+   LOFT_END_TREE,
+   LOFT_BLOB
+};
+
+typedef enum list_objects_filter_result list_objects_filter_result;
+typedef enum list_objects_filter_type list_objects_filter_type;


I don't think we typedef enums in Git code.


+
+typedef list_objects_filter_result (*filter_object_fn)(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data);
+
+void traverse_commit_list_worker(
+   struct rev_info *,
+   show_commit_fn, show_object_fn, void *show_data,
+   filter_object_fn filter, void *filter_data);


I think things would be much clearer if a default filter was declared
(matching the behavior as of this patch when filter == NULL), say:
static inline default_filter(args) { switch(filter_type) { case
LOFT_BEGIN_TREE: return LOFR_MARK_SEEN | LOFR_SHOW; case
LOFT_END_TREE: return LOFT_ZERO; ...

And inline traverse_commit_list() instead of putting it in the .c file.

This would reduce or eliminate the need to document
traverse_commit_list_worker, including what happens if filter is NULL,
and explain how a user would make their own filter_object_fn.


+
+#endif /* LIST_OBJECTS_H */
--
2.9.3



I'll give that a try.  Thanks!

Jeff


Re: [PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 3:10 AM, Junio C Hamano wrote:

Jeff Hostetler  writes:


From: Jeff Hostetler 

Create helper class to extend oidmap to collect a list of
omitted or missing objects during traversal.


The reason why oidmap itself cannot be used is because the code
wants to record not just the object name but something else about
the object.  And attributes that the code may care about we can see
in this patch are the object type and the path it found.


I recently simplified the code in this version to not completely
sub-class oidmap, but to just use it along with a custom
_insert method that takes care of allocating the _entry
data.  I should update the commit message to reflect that.



Is the plan to extend this set of attributes over time as different
"omitter"s are added?  Why was "path" chosen as a member of the
initial set and how it will be useful (also, what path would we
record for tags and commits)?


I envisioned this to let rev-list print the pathname of omitted
objects -- like "rev-list --objects" does for regular blobs.
I would leave the pathname NULL for tags and commits.

The pathname helps with debugging and testing, but also is
used by the sparse filter to avoid some expensive duplicate
is-excluded lookups.

Currently the 3 filters I have defined all use the same extra
data.  I suppose a future filter could want additional fields,
so maybe it would be better to refactor my "map-entry" to be
per-filter specific.



These "future plans" needs revealed upfront, instead of (or in
addition to) "will be used in a later commit".  As it is hard to
judge if "filter map" is an appropriate name for this thing without
knowing _how_ it is envisioned to be used.  "filter map" sounds more
like a map function that is consulted when we decide if we want to
drop the object, but from the looks of the code, it is used more to
record what was done to these objects.


Sorry, I meant a later commit in this patch series.  It is used by
commits 4, 5, 6, and 10 to actually do the filtering and collect a
list of omitted or missing objects.



Is it really a "map" (i.e. whose primary focus is to find out what
an object name is "mapped to" when we get an object name---e.g. we
notice an otherwise connected object is missing, and consult this
"map" to learn what the type/path is because we want to do X)?  Or
is it more like a "set of known-to-be-missing object" (i.e. whose
primary point is to serve as a set of object names and what a name
maps to is primarily for debugging)?  These are easier to answer if
we know how it will be used.


I think of a "set" as a member? or not-member? class.
I think of a "map" as a member? or not-member? class but where each
member also has a value.  Sometimes map lookups just want to know
membership and sometimes the lookup wants the value.

Granted, having the key and value data stuffed into the same entry
(from hashmap's point of view, rather than a key having a pointer
to a value) does kind of blur the line, but I was thinking about
a map here.  (And I was building on oidmap which builds on hashmap,
so it seemed appropriate.)




This will be used in a later commit by the list-object filtering
code.

Signed-off-by: Jeff Hostetler 
---
diff --git a/list-objects-filter-map.c b/list-objects-filter-map.c
new file mode 100644
index 000..7e496b3
--- /dev/null
+++ b/list-objects-filter-map.c
@@ -0,0 +1,63 @@
+#include "cache.h"
+#include "list-objects-filter-map.h"
+
+int list_objects_filter_map_insert(struct oidmap *map,
+  const struct object_id *oid,
+  const char *pathname, enum object_type type)
+{
+   size_t len, size;
+   struct list_objects_filter_map_entry *e;
+
+   if (oidmap_get(map, oid))
+   return 1;


It is OK for the existing entry to record a path that is totally
different from what the caller has.  It is hard to judge without
knowing what pathname the callers are expected to call this function
with, but I am guessing that it is similar to the path shown in the
output from "rev-list --objects"---and if that is the case, it is
correct that the same object may be reached at different paths
depending on what tree the traversal begins at, so pathname recorded
in the map is merely "there is one tree somewhere that has this
object at this path".


Right, the first observed pathname is as good as any.



For that matter, the caller may have a completely different type
from the object we saw earlier; not checking and flagging it as a
possible error makes me feel somewhat uneasy, but there probably is
little you can do at this layer of the code if you noticed such a
discrepancy so it may be OK to punt.


I could assert() that the types match, but right there's not much
we can do about it at this layer.




+   len = ((pathname && *pathname) ? strlen(pathname) : 0);
+   size = (offsetof(struct 

Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only

2017-10-25 Thread Brandon Williams
On 10/25, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > Brandon Williams  writes:
> >
> >> One simple idea would be to convert the single 'flag' into various bit
> >> fields themselves, that way if you need to add a new flag you would just
> >> make a new bit field.  I'm unaware of any downsides of doing so (though
> >> i may be missing something) but doing so would probably cause a bit of
> >> code churn.
> >
> > The reason why people want to have their own bit in the flags word
> > is because they want to use DIFF_OPT_{SET,CLR,TST,TOUCHED} but they
> > do not want to do the work to extend them beyond a single word.  
> >
> > I think it is doable by making everything a 1-bit wide bitfield
> > without affecting existing users.
> 
> ... but the "touched" thing may be harder---I haven't thought it
> through.

>From what I can tell the 'touched' thing is implemented as a parallel
flag field so we would just need to have each flag use 2-bits, one
for the flag itself and one for the 'touched' field.  Then when using
those macros it would just need to update the corresponding 'touched'
field as well as what ever happens with the flag itself.  It may be a
little more involved than the current scheme but it should be doable if
we need to extend the flag space past 32 bits.

-- 
Brandon Williams


[PATCH 1/2] xdiff-interface: export comparing and hashing strings

2017-10-25 Thread Stefan Beller
This will turn out to be useful in a later patch.

xdl_recmatch is exported in xdiff/xutils.h, to be used by various
xdiff/*.c files, but not outside of xdiff/. This one makes it available
to the outside, too.

While at it, add documentation.

Signed-off-by: Stefan Beller 
---
 xdiff-interface.c | 12 
 xdiff-interface.h | 16 
 2 files changed, 28 insertions(+)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 018e033089..770e1f7f81 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -5,6 +5,7 @@
 #include "xdiff/xdiffi.h"
 #include "xdiff/xemit.h"
 #include "xdiff/xmacros.h"
+#include "xdiff/xutils.h"
 
 struct xdiff_emit_state {
xdiff_emit_consume_fn consume;
@@ -296,6 +297,17 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg)
}
 }
 
+unsigned long xdiff_hash_string(const char *s, size_t len, long flags)
+{
+   return xdl_hash_record(, s + len, flags);
+}
+
+int xdiff_compare_lines(const char *l1, long s1,
+   const char *l2, long s2, long flags)
+{
+   return xdl_recmatch(l1, s1, l2, s2, flags);
+}
+
 int git_xmerge_style = -1;
 
 int git_xmerge_config(const char *var, const char *value, void *cb)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 6f6ba9095d..135fc05d72 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -29,4 +29,20 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg);
 extern int git_xmerge_config(const char *var, const char *value, void *cb);
 extern int git_xmerge_style;
 
+/*
+ * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
+ * Returns 1 if the strings are deemed equal, 0 otherwise.
+ * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
+ * are treated for the comparision.
+ */
+extern int xdiff_compare_lines(const char *l1, long s1,
+  const char *l2, long s2, long flags);
+
+/*
+ * Returns a hash of the string s of length len.
+ * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
+ * are treated for the hash.
+ */
+extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
+
 #endif
-- 
2.15.0.rc2.6.g953226eb5f



[PATCHv3 0/2] (x)diff cleanup: remove duplicate code

2017-10-25 Thread Stefan Beller
V3:
* changed parameter names back to xdiff standard

v2:
* I realized that we don't have to change the hashing function of xdiff;
  we can rather change the hashing function for the move detection,
  which is less fundamental.
  (That way I can shrink the series down to 2 patches)
* commented and renamed function parameters in the exposed xdiff functions.
* applies on top of jk/diff-color-moved-fix.

Thanks,
Stefan

v1:
Junio wrote:

>  * I was hoping that the next_byte() and string_hash() thing, once
>they are cleaned up, will eventually be shared with the xdiff/
>code at the lower layer, which needs to do pretty much the same
>in order to implement various whitespace ignoring options.  I am
>not sure how well the approach taken by the WIP patch meshes with
>the needs of the lower layer.

This series does exactly this; although I chose to reuse the code in
xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that
code has seen more exercise already (hence I assume it has fewer bugs
if any as well as its performance implications are well understood).

However to do so, we need to pollute xdiff/xutils.c and include
hashmap.h there (which also requires cache.h as that header has
an inline function using BUG()), which I find a bit unfortunate but
worth the trade off of using better tested code.

Thanks,
Stefan

Stefan Beller (2):
  xdiff-interface: export comparing and hashing strings
  diff.c: get rid of duplicate implementation

 diff.c| 82 +++
 xdiff-interface.c | 12 
 xdiff-interface.h | 16 +++
 3 files changed, 32 insertions(+), 78 deletions(-)

-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 2/2] diff.c: get rid of duplicate implementation

2017-10-25 Thread Stefan Beller
The implementations in diff.c to detect moved lines needs to compare
strings and hash strings, which is implemented in that file, as well
as in the xdiff library.

Remove the rather recent implementation in diff.c and rely on the well
exercised code in the xdiff lib.

With this change the hash used for bucketing the strings for the moved
line detection changes from FNV32 (that is provided via the hashmaps
memhash) to DJB2 (which is used internally in xdiff).  Benchmarks found
on the web[1] do not indicate that these hashes are different in
performance for readable strings.

[1] 
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Signed-off-by: Stefan Beller 
---
 diff.c | 82 --
 1 file changed, 4 insertions(+), 78 deletions(-)

diff --git a/diff.c b/diff.c
index c4a669ffa8..e6814b9e9c 100644
--- a/diff.c
+++ b/diff.c
@@ -707,88 +707,14 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-static int next_byte(const char **cp, const char **endp,
-const struct diff_options *diffopt)
-{
-   int retval;
-
-   if (*cp >= *endp)
-   return -1;
-
-   if (isspace(**cp)) {
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
-   while (*cp < *endp && isspace(**cp))
-   (*cp)++;
-   /*
-* After skipping a couple of whitespaces,
-* we still have to account for one space.
-*/
-   return (int)' ';
-   }
-
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
-   while (*cp < *endp && isspace(**cp))
-   (*cp)++;
-   /*
-* return the first non-ws character via the usual
-* below, unless we ate all of the bytes
-*/
-   if (*cp >= *endp)
-   return -1;
-   }
-   }
-
-   retval = (unsigned char)(**cp);
-   (*cp)++;
-   return retval;
-}
-
 static int moved_entry_cmp(const struct diff_options *diffopt,
   const struct moved_entry *a,
   const struct moved_entry *b,
   const void *keydata)
 {
-   const char *ap = a->es->line, *ae = a->es->line + a->es->len;
-   const char *bp = b->es->line, *be = b->es->line + b->es->len;
-
-   if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS))
-   return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
-
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
-   while (ae > ap && isspace(ae[-1]))
-   ae--;
-   while (be > bp && isspace(be[-1]))
-   be--;
-   }
-
-   while (1) {
-   int ca, cb;
-   ca = next_byte(, , diffopt);
-   cb = next_byte(, , diffopt);
-   if (ca != cb)
-   return 1;
-   if (ca < 0)
-   return 0;
-   }
-}
-
-static unsigned get_string_hash(struct emitted_diff_symbol *es, struct 
diff_options *o)
-{
-   if (o->xdl_opts & XDF_WHITESPACE_FLAGS) {
-   static struct strbuf sb = STRBUF_INIT;
-   const char *ap = es->line, *ae = es->line + es->len;
-   int c;
-
-   strbuf_reset();
-   while (ae > ap && isspace(ae[-1]))
-   ae--;
-   while ((c = next_byte(, , o)) >= 0)
-   strbuf_addch(, c);
-
-   return memhash(sb.buf, sb.len);
-   } else {
-   return memhash(es->line, es->len);
-   }
+   return !xdiff_compare_lines(a->es->line, a->es->len,
+   b->es->line, b->es->len,
+   diffopt->xdl_opts);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -797,7 +723,7 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
struct moved_entry *ret = xmalloc(sizeof(*ret));
struct emitted_diff_symbol *l = >emitted_symbols->buf[line_no];
 
-   ret->ent.hash = get_string_hash(l, o);
+   ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
ret->es = l;
ret->next_line = NULL;
 
-- 
2.15.0.rc2.6.g953226eb5f



Re: [PATCHv2 0/2] (x)diff cleanup: remove duplicate code

2017-10-25 Thread Stefan Beller
On Tue, Oct 24, 2017 at 10:11 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> v2:
>> * I realized that we don't have to change the hashing function of xdiff;
>>   we can rather change the hashing function for the move detection,
>>   which is less fundamental.
>>   (That way I can shrink the series down to 2 patches)
>> * commented and renamed function parameters in the exposed xdiff functions.
>> * applies on top of jk/diff-color-moved-fix.
>
> Yes, by reusing the line hashing and comparison from xdiff/ we can
> ensure that we will use consistent comparison function, and the
> thing we need to focus on will become how correctly the caller uses
> the xdiff interface.  This looks much better than the previous one.
>
> Eric's comment on the function parameters is right.  We keep them in
> sync with the naming convention of xdiff/ as long as they are still
> part of xdiff layer, and the convention there is that the lines
> being compared are l1[] and l2[] whose lengths are s1 and s2, if I
> am not mistaken (well, I am not, as I just touched the function
> there during my lunch break ;-).
>

Yes, I am torn on the naming as Rene commented on it and did not like
the (l1, s1) and rather would see a (a, alen) / (b, blen) approach.
And I thought that was a good idea initially as that patch is explicitly about
exposing the internals of xdiff to the world. So when the world copes better
with (a, alen), then the function added in xdiff-interface.c should take that
parameter as to not confuse the outside world. And one could argue
that xdiff-interface is not part of the xdiff layer any more as that is the
glue-gap to the rest of Git.

However I agree that we may want to keep l1[], s1 at that place, as the
rest of the xdiff interface looks like it is kept in line with xdiff.

Thanks,
Stefan


Re: [RFC] protocol version 2

2017-10-25 Thread Brandon Williams
On 10/25, Derrick Stolee wrote:
> On 10/20/2017 1:18 PM, Brandon Williams wrote:
> >  Overview
> >==
> >
> >This document presents a specification for a version 2 of Git's wire
> >protocol.  Protocol v2 will improve upon v1 in the following ways:
> >
> >   * Instead of multiple service names, multiple commands will be
> > supported by a single service
> >   * Easily extendable as capabilities are moved into their own section
> > of the protocol, no longer being hidden behind a NUL byte and
> > limited by the size of a pkt-line (as there will be a single
> > capability per pkt-line).
> >   * Separate out other information hidden behind NUL bytes (e.g. agent
> > string as a capability and symrefs can be requested using 'ls-ref')
> >   * Ref advertisement will be omitted unless explicitly requested
> >   * ls-ref command to explicitly request some refs
> 
> Hi Brandon,
> 
> I'm very interested in your protocol as a former server-side dev for
> the VSTS Git server, and understand some of these headaches. We

Always happy to hear that someone is excited about the area I'm working
on :)

> built limited refs specifically to target the problem you are
> solving with ls-ref, but it requires knowledge about the
> authenticated user in order to work. I believe your suggestion is a
> better solution for the Git protocol.

Yes one of the big issues we've run into is the ref advertisement, 1
because it makes it difficult to add any additional features to the
protocol and 2 because it doesn't scale well because all refs are
blasted at the client (unless you use a separate system like you
implemented to reduce that number).  So I'm hoping we can solve (1) by
doing some sort of capability advertisement instead of a ref
advertisement upfront and (2) by allowing clients to express a way of
limiting the ref advertisement server side.

> 
> The "easily extendable" part has specifically caught my interest, as
> we (Microsoft) would like to move most of the GVFS protocol into
> core Git, and this is a great way to do it. Even if not all features
> are accepted by upstream, we could use our GVFS-specific fork of Git
> to communicate to our servers without breaking normal users'
> interactions.

This is one thing I'm excited about too and hope there's enough desire
for such a capability to extend the protocol.  I first was interested in
building such a system when i looked at some of the previous work done
by stephan trying to introduce a protocol v2
(https://public-inbox.org/git/1429658342-5295-1-git-send-email-sbel...@google.com/)
and this idea was further reinforced when I sat down and talked with
some mercurial developers about their protocol and what they did to
migrate to a v2.  I also discovered that their v2 protocol is
service/command oriented and it makes it very simple and easy for
extensions to be added server and client side which allow for additional
commands to be executed during a server/client exchange without breaking
the normal fetch/push interaction.

So I'm hoping that what we decided on for v2 will enable exactly what
you want.  It may even allow for doing things like server side log and
grep when a client has a partial clone or a repo which is so big it
would be difficult to do some of those operations locally.  Of course
you identified some of the issues which such operations below :)

> 
> Please CC me in future versions of this proposal. Let me know if you
> want to chat directly about the "TODO" items below.

Of course, I'll make sure to keep you updated.

> 
> Speaking of TODOs, how much of this concept do you have working in a
> prototype? Do you have code that performs this version 2 handshake
> and communicates the ls-refs result?

Most of the TODOs are areas where more thought and design needs to
happen.  My main goal with this RFC is to see if a modular design like
this would have support from the community and if so, to nail down the
design of the exchanges outside of individual commands since that
infrastructure would be harder to change after the fact.

As for the commands themselves, since this design is meant to be module
we could implement them separately or one at a time (though a base set
of commands would need to be implemented and designed before v2 could
roll out).  That being said I've begun working up a rough prototype of
the basic initial capability/command exchange and the ls-refs command.
Since its pretty rough and not integrated into the actual transport code
yet its no where near ready to be sent out though I wouldn't mind
pushing the WIP code out so people can see/play with what's there.

And if you have any comments or suggestions about any parts of the
design or the TODOs I would love to chat about it :)

> 
> >  Ls-refs
> >-
> >
> >Ls-refs can be looked at as the equivalent of the current ls-remote as
> >it is a way to query a remote for the references that it has.  Unlike
> >the current ls-remote, the filtering of the output is done on 

Re: Consequences of CRLF in index?

2017-10-25 Thread Jonathan Nieder
Hi again,

Lars Schneider wrote:
>> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:

>> In any event, you also probably want to declare what you're doing
>> using .gitattributes.  By checking in the files as CRLF, you are
>> declaring that you do *not* want Git to treat them as text files
>> (i.e., you do not want Git to change the line endings), so something
>> as simple as
>>
>>  * -text
>
> That's sounds good. Does "-text" have any other implications?
> For whatever reason I always thought this is the way to tell
> Git that a particular file is binary with the implication that
> Git should not attempt to diff it.

No other implications.  You're thinking of "-diff".  There is also a
shortcut "binary" which simply means "-text -diff".

Ideas for wording improvements to gitattributes(5) on this subject?

Thanks,
Jonathan


Re: Consequences of CRLF in index?

2017-10-25 Thread Lars Schneider

> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:
> 
> Hi,
> 
> Lars Schneider wrote:
> 
>> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
>> and a lot of users (200+) to Git. Unfortunately, all text files in the index
>> of the repo have CRLF line endings. In general this seems not to be a problem
>> as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
>> However, I wonder if there are any "hidden consequences" of this setup?
>> If there are consequences, then I see two options. Either I rebase the repo
>> and fix the line endings for all commits or I add a single commit that fixes
>> the line endings for all files. Both actions require coordination with the
>> users to avoid repo trouble/merge conflicts. The "single fixup commit" 
>> options
>> would also make diffs into the past look bad. Would a single large commit 
>> have
>> any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
>   * -text

That's sounds good. Does "-text" have any other implications?
For whatever reason I always thought this is the way to tell
Git that a particular file is binary with the implication that
Git should not attempt to diff it.

Thanks,
Lars

Re: Consequences of CRLF in index?

2017-10-25 Thread Stefan Beller
On Tue, Oct 24, 2017 at 9:53 PM, Junio C Hamano  wrote:
>
> Here is a lunch-time hack to add that option.  As you can see from
> the lack of docs, tests and a proper log message, I haven't played
> with it long enough to know how buggy it is, though.  I am not all
> that interested in polishing it to completion myself and prefer to
> leave it as #leftoverbits ;-)

Ok, nevertheless a review pointing out a couple things would be
useful for those who pick it up later, I assume.

> Also I didn't bother teaching this to Stefan's color-moved thing, as
> the line comparator it uses will hopefully be unified with the one I
> am touching in xdiff/ with this patch.

which will be rerolled shortly fixing just the parameter names as Eric
mentioned.

>  diff.c|  5 -
>  merge-recursive.c |  2 ++
>  xdiff/xdiff.h |  3 ++-
>  xdiff/xutils.c| 34 --
>  4 files changed, 40 insertions(+), 4 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..eeca0fd3b8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>
> if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
> DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
> -   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
> +   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
> +   DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))

This highlights another part of the flag macros, that could be made nicer.
All these flags combined are XDF_WHITESPACE_FLAGS, so this
if could be written without the macros as

  if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

which we might want to hide in a macro

  DIFF_XDL_MASK_TST(options, mask)

or such?


>  #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
> -#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
> XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
> +#define XDF_IGNORE_CR_AT_EOL (1 << 5)
> +#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
> XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | 
> XDF_IGNORE_CR_AT_EOL)
>
>  #define XDF_PATIENCE_DIFF (1 << 5)

(1<<5) is taken twice now. Currently there is only one
unused free bit (but that was used once upon a time);
so we have to think how we revamp the flag system to
support more than 32 bits.

See also the answers to
https://public-inbox.org/git/20171024000931.14814-1-sbel...@google.com/
as that started this discussion already.

>  #define XDF_HISTOGRAM_DIFF (1 << 6)
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 04d7b32e4e..8720e272b9 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long 
> flags)
> return (i == size);
>  }
>
> +/*
> + * Have we eaten everything on the line, except for an optional
> + * CR at the very end?
> + */
> +static int ends_with_optional_cr(const char *l, long s, long i)
> +{
> +   if (s && l[s-1] == '\n')
> +   s--;

so first we cut off the '\n',

> +   if (s == i)
> +   return 1;

then we either have an ending without

> +   if (s == i + 1 && l[i] == '\r')
> +   return 1;

or with a '\r' before.

That seems correct after some thought; I might offer
another easier to understand (for me) solution,
which is
{
   /* cut of ending LF */
   if (s && l[s-1] == '\n')
   s--;
  /* do we only need to cut LF? */
  if (i == s)
return 1;

   /* cut of ending CR */
   if (s && l[s-1] == '\r')
   s--;
  /* was this everything to cut? */
  return i == s
}

Though this seems even more complicated
after having it written down.

>  * Each flavor of ignoring needs different logic to skip whitespaces
>  * while we have both sides to compare.
> @@ -204,6 +220,14 @@ int xdl_recmatch(const char *l1, long s1, const char 
> *l2, long s2, long flags)
> i1++;
> i2++;
> }
> +   } else if (flags & XDF_IGNORE_CR_AT_EOL) {
> +   /* Find the first difference and see how the line ends */
> +   while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
> +   i1++;
> +   i2++;
> +   }
> +   return (ends_with_optional_cr(l1, s1, i1) &&
> +   ends_with_optional_cr(l2, s2, i2));
> }
>
> /*
> @@ -230,9 +254,15 @@ static unsigned long 
> xdl_hash_record_with_whitespace(char const **data,
> char const *top, long flags) {
> unsigned long ha = 5381;
> char const *ptr = *data;
> +   int cr_at_eol_only = (flags & XDF_WHITESPACE_FLAGS) == 
> XDF_IGNORE_CR_AT_EOL;
>
> for (; ptr < top && *ptr != '\n'; ptr++) {
> -   if (XDL_ISSPACE(*ptr)) {
> +   if (cr_at_eol_only) {
> +   if (*ptr == '\r' &&
> +   

Re: [PATCH 3/5] ssh: 'auto' variant to select between 'ssh' and 'simple'

2017-10-25 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Wed, Oct 25, 2017 at 5:51 AM, Johannes Schindelin
>  wrote:

>> This breaks on Windows. And it is not immediately obvious how so, so let
>> me explain:
[nice explanation snipped]
>> As a consequence, the test fails. Could you please squash in this, to fix
>> the test on Windows?
>
> This explanation is in detail and would even make a good commit message
> for a follow up commit. (Squashing just that line would loose the explanation
> as I suspect the original commit will not dedicate so much text to
> this single line)

I have other changes to make when rerolling anyway (from Junio's
review), so no need for a followup patch.  Will fix this in the
reroll today.

Thanks for catching and diagnosing this, Dscho!

Jonathan


Re: [PATCH 3/5] ssh: 'auto' variant to select between 'ssh' and 'simple'

2017-10-25 Thread Stefan Beller
On Wed, Oct 25, 2017 at 5:51 AM, Johannes Schindelin
 wrote:
> Hi Jonathan,
>
> [I only saw that you replied to 3/5 with v2 after writing this reply, but
> it would apply to v2's 3/5, too]
>
> On Mon, 23 Oct 2017, Jonathan Nieder wrote:
>
>> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
>> index 86811a0c35..fd94dd40d2 100755
>> --- a/t/t5601-clone.sh
>> +++ b/t/t5601-clone.sh
>> @@ -384,6 +384,20 @@ test_expect_success 'uplink is treated as simple' '
>>   expect_ssh myhost src
>>  '
>>
>> +test_expect_success 'OpenSSH-like uplink is treated as ssh' '
>> + write_script "$TRASH_DIRECTORY/uplink" <<-EOF &&
>> + if test "\$1" = "-G"
>> + then
>> + exit 0
>> + fi &&
>> + exec "\$TRASH_DIRECTORY/ssh$X" "\$@"
>> + EOF
>> + GIT_SSH="$TRASH_DIRECTORY/uplink" &&
>> + export GIT_SSH &&
>> + git clone "[myhost:123]:src" ssh-bracket-clone-sshlike-uplink &&
>> + expect_ssh "-p 123" myhost src
>> +'
>> +
>>  test_expect_success 'plink is treated specially (as putty)' '
>>   copy_ssh_wrapper_as "$TRASH_DIRECTORY/plink" &&
>>   git clone "[myhost:123]:src" ssh-bracket-clone-plink-0 &&
>
> This breaks on Windows. And it is not immediately obvious how so, so let
> me explain:
>
> As you know, on Windows there is no executable flag. There is the .exe
> file extension to indicate an executable (and .com and .bat and .cmd are
> also handled as executable, at least as executable script).
>
> Now, what happens if you call "abc" in the MSYS2 Bash and there is no
> script called "abc" but an executable called "abc.exe" in the PATH? Why,
> of course we execute that executable. It has to, because if "abc.exe"
> would be renamed into "abc", it would not work any longer.
>
> That is also the reason why that "copy_ssh_wrapper" helper function
> automatically appends that .exe file suffix on Windows: it has to.
>
> Every workaround breaks down at some point, and this workaround is no
> exception. What should the MSYS2 Bash do if asked to overwrite "abc" and
> there is only an "abc.exe"? It actually overwrites "abc.exe" and moves on.
>
> And this is where we are here: the previous test case copied the ssh
> wrapper as "uplink". Except on Windows, it is "uplink.exe". And your newly
> introduced test case overwrites it. And then it tells Git specifically to
> look for "uplink", and Git does *not* append that .exe suffix
> automatically as the MSYS2 Bash would do, because git.exe is not intended
> to work MSYS2-like.
>
> As a consequence, the test fails. Could you please squash in this, to fix
> the test on Windows?

This explanation is in detail and would even make a good commit message
for a follow up commit. (Squashing just that line would loose the explanation
as I suspect the original commit will not dedicate so much text to
this single line)

>
> -- snipsnap --
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index ec4b17bca62..1afcbd00617 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -393,6 +393,7 @@ test_expect_success 'simple does not support port' '
>
>  test_expect_success 'uplink is treated as simple' '
> copy_ssh_wrapper_as "$TRASH_DIRECTORY/uplink" &&
> +   test_when_finished "rm \"$TRASH_DIRECTORY/uplink$X\"" &&
> test_must_fail git clone "[myhost:123]:src" ssh-bracket-clone-uplink 
> &&
> git clone "myhost:src" ssh-clone-uplink &&
> expect_ssh myhost src
>


Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 2:46 AM, Jonathan Tan wrote:

On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:

OK, thanks for working well together.  So does this (1) build on
Jonathan's fsck-squelching series, or (2) ignores that and builds
filtering first, potentially leaving the codebase to a broken state
where it can create fsck-unclean repository until Jonathan's series
is rebased on top of this, or (3) something else?  [*1*]


Excluding the partialclone patch (patch 9), I think that the answer is
(2), but I don't think that it leaves the codebase in a broken state.
In particular, none of the code modifies the repo, so it can't create
a fsck-unclean one.


My part 1 series starts with filtering, rev-list, and pack-objects.
So, yes, it add new features that no one will use yet.  But it is useful
by itself.  For example, you can use rev-list to ask for the set of
missing objects that you would need to do a checkout (assuming you had
commits and trees, but no blobs or no large blobs) *before* actually
starting the checkout.

I was then going to lay Jonathan's fsck/gc/dynamic object fetch
on top of that.  I started that here:
https://github.com/jeffhostetler/git/pull/7

Patch 9 just adds the "extensions.partialclone*" fields and is prep
for my rev-list and his fsck changes.  I included it sooner rather than
later so I can test rev-list on a repo with hand-deleted blobs.
But yes, it can be omitted from this series and included with the fsck
changes.




Maybe one could say that this leaves the codebase with features that
no one will ever use in the absence of partial clone, but I don't
think that's true - rev-list with blob-size/sparse-specification
filter seems independently useful, at least (for example, when
desiring to operate on a repo in a sparse way without going through a
workdir), and if we're planning to allow listing of objects, we
probably should allow packing as well (especially since this doesn't
add much code).

The above is relevant only if we can exclude the partialclone patch,
but I think that we can and we should, as I wrote in my reply to Jeff
Hostetler [1].

As for how this patch set (excluding the partialclone patch) interacts
with my fsck series, they are relatively independent, as far as I can
tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
the fetch and clone parts, which we plan to instead adapt from Jeff
Hostetler's patches, as far as I know) on top of these and resend
those out once discussion on this has settled.


Yes, I want to get Jonathan's fsck/gc/lazy changes built into part 2.
They came over easily and are independent of how/why there are missing
objects.

For part 3, I'd like to take my version of clone, fetch, fetch-pack,
and upload-pack (that talks with the filters from part 1) and incorporate
Jonathan's promisor concepts.  That merge is a little messier, so I'd
like to parts 1 and 2 a chance to get some feedback first.



[1] 
https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/


I also saw a patch marked as "this is from Jonathan's earlier work",
taking the authorship (which to me implies that the changes were
extensive enough), so I am a bit at loss envisioning how this piece
fits in the bigger picture together with the other piece.


The patch you mentioned is the partialclone patch, which I think can
be considered separately from the rest (as I said above).


A question of mailing-list etiquette: in patch 9, I took Jonathan's
ideas for adding the "extensions.partialclone" setting and extended it
with some helper functions.  His change was part of a larger change
with other code (fsck, IIRC) that I wasn't ready for.  What is the
preferred way to give credit for something like this?


Thanks
Jeff




Hello

2017-10-25 Thread Marbela Vincent
Hello, How are you doing, i will really like to discuss something very urgent 
with you.
Regards,
Marbela


Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 2:43 AM, Junio C Hamano wrote:

Jeff Hostetler  writes:


+static int add_excludes_from_buffer(char *buf, size_t size,
+   const char *base, int baselen,
+   struct exclude_list *el);
+
  /*
   * Given a file with name "fname", read it (either from disk, or from
   * an index if 'istate' is non-null), parse it and store the
@@ -754,9 +758,9 @@ static int add_excludes(const char *fname, const char 
*base, int baselen,
struct sha1_stat *sha1_stat)
  {
struct stat st;
-   int fd, i, lineno = 1;
+   int fd;
size_t size = 0;
-   char *buf, *entry;
+   char *buf;
  
  	fd = open(fname, O_RDONLY);

if (fd < 0 || fstat(fd, ) < 0) {


The post-context of this hunk is quite interesting in that there is
a call to read_skip_worktree_file_from_index(); which essentially
pretends as if we read from the filesystem but in fact it grabs the
blob object name registered in the index and reads from it.

The reason why it is interesting is because this patch adds yet
nother "let's instead read from a blob object" function and there is
no sign to make the existing one take advantage of the new function
seen in this patch.



The existing code handles use cases where you want to read the
exclusion list from a pathname in the worktree -- or from blob
named in the index when the pathname is not populated (presumably
because of the skip-worktree bit).

I was wanting to add a more general case (and perhaps my commit
message should be improved).  I want to be able to read it from
a blob not necessarily associated with the current commit or
not necessarily available on the local client, but yet known to
exist.  I'm thinking of the case the client could ask the server
to do a partial clone using a sparse-checkout specification stored
in a well-known location on the server.  The reason for this is
that, in this case, the client is pre-clone and doesn't have a
worktree or index.

With my "add_excludes_from_blob_to_list()", we can request a
blob-ish expression, such as "master:enlistments/foo".  In my
later commits associated with clone and fetch, we can use this
mechanism to let the client ask the server to filter using the
blob associated with this blob-ish.  If the client has the blob
(such as during a later fetch) and can resolve it, then it can
and send the server the OID, but it can also send the blob-ish
to the server and let it resolve it.

Jeff




Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Heiko Voigt
On Wed, Oct 25, 2017 at 10:28:25AM +0900, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> >> Johannes Schindelin  writes:
> >> 
> >> > We meticulously pass the `exclude` flag to the `treat_directory()`
> >> > function so that we can indicate that files in it are excluded rather
> >> > than untracked when recursing.
> >> >
> >> > But we did not yet treat submodules the same way.
> >> 
> >> ... "because of that, we ended up showing < >> what situation>>" would be a nice thing to have here, so that it can
> >> be copied to the release notes for the bugfix.  
> >
> > Yes I agree that would be nice here. It was not immediately obvious that
> > this only applies when using both flags: -u and --ignored.
> 
> Does any of you care to fill in the <> then? ;-)

How about:

Because of that, we ended up showing the submodule as untracked and its
content as ignored files when using the --ignored and -u flags with git
status.

? But maybe Dscho also has some more information to add about his
situation?

Cheers Heiko


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-25 Thread Johannes Schindelin
Hi Kevin,

On Tue, 24 Oct 2017, Kevin Daudt wrote:

> On Tue, Oct 17, 2017 at 03:10:11PM +0200, Johannes Schindelin wrote:
> > diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> > index fc6013ba3c8..8c849a4cd2f 100755
> > --- a/t/t7061-wtstatus-ignore.sh
> > +++ b/t/t7061-wtstatus-ignore.sh
> > @@ -272,4 +272,18 @@ test_expect_success 'status ignored tracked directory 
> > with uncommitted file in t
> > test_cmp expected actual
> >  '
> >  
> > +cat >expected <<\EOF
> > +!! tracked/submodule/
> > +EOF
> > +
> > +test_expect_success 'status ignores submodule in excluded directory' '
> > +   git init tracked/submodule &&
> > +   (
> > +   cd tracked/submodule &&
> > +   test_commit initial
> > +   ) &&
> 
> Could this use test_commit -C tracked/submodule initial?

Yes! Thanks. For some reason, I did not even think that test_commit would
accept the -C option.

Ciao,
Dscho


Re: [RFC] protocol version 2

2017-10-25 Thread Derrick Stolee

On 10/20/2017 1:18 PM, Brandon Williams wrote:

  Overview
==

This document presents a specification for a version 2 of Git's wire
protocol.  Protocol v2 will improve upon v1 in the following ways:

   * Instead of multiple service names, multiple commands will be
 supported by a single service
   * Easily extendable as capabilities are moved into their own section
 of the protocol, no longer being hidden behind a NUL byte and
 limited by the size of a pkt-line (as there will be a single
 capability per pkt-line).
   * Separate out other information hidden behind NUL bytes (e.g. agent
 string as a capability and symrefs can be requested using 'ls-ref')
   * Ref advertisement will be omitted unless explicitly requested
   * ls-ref command to explicitly request some refs


Hi Brandon,

I'm very interested in your protocol as a former server-side dev for the 
VSTS Git server, and understand some of these headaches. We built 
limited refs specifically to target the problem you are solving with 
ls-ref, but it requires knowledge about the authenticated user in order 
to work. I believe your suggestion is a better solution for the Git 
protocol.


The "easily extendable" part has specifically caught my interest, as we 
(Microsoft) would like to move most of the GVFS protocol into core Git, 
and this is a great way to do it. Even if not all features are accepted 
by upstream, we could use our GVFS-specific fork of Git to communicate 
to our servers without breaking normal users' interactions.


Please CC me in future versions of this proposal. Let me know if you 
want to chat directly about the "TODO" items below.


Speaking of TODOs, how much of this concept do you have working in a 
prototype? Do you have code that performs this version 2 handshake and 
communicates the ls-refs result?



  Ls-refs
-

Ls-refs can be looked at as the equivalent of the current ls-remote as
it is a way to query a remote for the references that it has.  Unlike
the current ls-remote, the filtering of the output is done on the server
side by passing a number of parameters to the server-side command
instead of the filtering occurring on the client.

Ls-ref takes in the following parameters:

   --head, --tags: Limit to only refs/heads or refs/tags


Nit: It would be better to use "--heads" to match refs/heads and your 
use of "--tags" for refs/tags.



   --refs: Do not show peeled tags or pseudorefs like HEAD


Assuming we are in the case where the server has a HEAD ref, why would 
that ever be advertised? Also, does this imply that without the --refs 
option we would peel annotated tags until we find non-tag OIDs? Neither 
of these functions seem useful as default behavior.



   --symref: In addition to the object pointed by it, show the underlying
 ref pointed by it when showing a symbolic ref
   : When specified, only references matching the given patterns
  are displayed.


Can you be specific about the patterns? For instance, it is not a good 
idea to allow the client to submit a regex for the server to compute. 
Instead, can we limit this pattern-matching to a prefix-set, such as the 
following list of prefixes:


    refs/heads/master
    refs/releases/*
    refs/heads/user/me/*

  Fetch
---

Fetch will need to be a modified version of the v1 fetch protocol.  Some
potential areas for improvement are: Ref-in-want, CDN offloading,
Fetch-options.

Since we'll have an 'ls-ref' service we can eliminate the need of fetch
to perform a ref-advertisement, instead a client can run the 'ls-refs'
service first, in order to find out what refs the server has, and then
request those refs directly using the fetch service.

//TODO Flush out the design

  Fetch-object
--

This service could be used by partial clones in order to request missing
objects.

//TODO Flush out the design


As you flesh our these "fetch" and "fetch-object" commands, keep in mind 
that partial clones could mean any of the following:


 * fetch all reachable objects except for blobs.

 * fetch all reachable objects except for blobs above a
   certain size.

 * fetch all commits, trees, (and blobs?) within a certain
   "cone" of the file system.


  Push
--

Push will need to be a modified version of the v1 push protocol.  Some
potential areas for improvement are: Fix push-options, Negotiation for
force push.


Negotiation is something to keep in mind for all pushes, especially in 
an ecosystem full of fork-based workflows. If you are working across 
forks and someone else syncs data between your remotes, you may re-push 
a large chunk of objects that are already present in a fork. Adding an 
ls-refs step before push would be a step in the right direction.

  Other Considerations
==

   * Move away from pkt-line framing?
   * Have responses structured in well known formats (e.g. JSON)
   * Eliminate initial round-trip using 'GIT_PROTOCOL' side-channel
   * Additional 

Re: [PATCH v3] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
Get 'GIT_MERGE_VERBOSITY' environment variable only once in
init_merge_options() and store the pointer to its value for the further check.
No intervening calls to getenv(), putenv(), setenv() or unsetenv() are done
between the initial getenv() call and the consequential result pass to strtol()
as these environment related functions could modify the string pointer returned
by the initial getenv() call.

Signed-off-by: Andrey Okoshkin 
---
I tried to rework the commit message.
 merge-recursive.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb8..60084e3a0 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options 
*o)
 
 void init_merge_options(struct merge_options *o)
 {
+   const char *merge_verbosity;
memset(o, 0, sizeof(struct merge_options));
o->verbosity = 2;
o->buffer_output = 1;
@@ -2171,9 +2172,9 @@ void init_merge_options(struct merge_options *o)
o->renormalize = 0;
o->detect_rename = 1;
merge_recursive_config(o);
-   if (getenv("GIT_MERGE_VERBOSITY"))
-   o->verbosity =
-   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
+   merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
+   if (merge_verbosity)
+   o->verbosity = strtol(merge_verbosity, NULL, 10);
if (o->verbosity >= 5)
o->buffer_output = 0;
strbuf_init(>obuf, 0);
-- 
2.14.3



Re: [PATCH 3/5] ssh: 'auto' variant to select between 'ssh' and 'simple'

2017-10-25 Thread Johannes Schindelin
Hi Jonathan,

[I only saw that you replied to 3/5 with v2 after writing this reply, but
it would apply to v2's 3/5, too]

On Mon, 23 Oct 2017, Jonathan Nieder wrote:

> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 86811a0c35..fd94dd40d2 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -384,6 +384,20 @@ test_expect_success 'uplink is treated as simple' '
>   expect_ssh myhost src
>  '
>  
> +test_expect_success 'OpenSSH-like uplink is treated as ssh' '
> + write_script "$TRASH_DIRECTORY/uplink" <<-EOF &&
> + if test "\$1" = "-G"
> + then
> + exit 0
> + fi &&
> + exec "\$TRASH_DIRECTORY/ssh$X" "\$@"
> + EOF
> + GIT_SSH="$TRASH_DIRECTORY/uplink" &&
> + export GIT_SSH &&
> + git clone "[myhost:123]:src" ssh-bracket-clone-sshlike-uplink &&
> + expect_ssh "-p 123" myhost src
> +'
> +
>  test_expect_success 'plink is treated specially (as putty)' '
>   copy_ssh_wrapper_as "$TRASH_DIRECTORY/plink" &&
>   git clone "[myhost:123]:src" ssh-bracket-clone-plink-0 &&

This breaks on Windows. And it is not immediately obvious how so, so let
me explain:

As you know, on Windows there is no executable flag. There is the .exe
file extension to indicate an executable (and .com and .bat and .cmd are
also handled as executable, at least as executable script).

Now, what happens if you call "abc" in the MSYS2 Bash and there is no
script called "abc" but an executable called "abc.exe" in the PATH? Why,
of course we execute that executable. It has to, because if "abc.exe"
would be renamed into "abc", it would not work any longer.

That is also the reason why that "copy_ssh_wrapper" helper function
automatically appends that .exe file suffix on Windows: it has to.

Every workaround breaks down at some point, and this workaround is no
exception. What should the MSYS2 Bash do if asked to overwrite "abc" and
there is only an "abc.exe"? It actually overwrites "abc.exe" and moves on.

And this is where we are here: the previous test case copied the ssh
wrapper as "uplink". Except on Windows, it is "uplink.exe". And your newly
introduced test case overwrites it. And then it tells Git specifically to
look for "uplink", and Git does *not* append that .exe suffix
automatically as the MSYS2 Bash would do, because git.exe is not intended
to work MSYS2-like.

As a consequence, the test fails. Could you please squash in this, to fix
the test on Windows?

-- snipsnap --
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index ec4b17bca62..1afcbd00617 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -393,6 +393,7 @@ test_expect_success 'simple does not support port' '
 
 test_expect_success 'uplink is treated as simple' '
copy_ssh_wrapper_as "$TRASH_DIRECTORY/uplink" &&
+   test_when_finished "rm \"$TRASH_DIRECTORY/uplink$X\"" &&
test_must_fail git clone "[myhost:123]:src" ssh-bracket-clone-uplink &&
git clone "myhost:src" ssh-clone-uplink &&
expect_ssh myhost src



Re: [PATCH v2] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
Thanks, Eric, indeed it's better to change the commit message.

25.10.2017 14:53, Eric Sunshine wrote:
> On Wed, Oct 25, 2017 at 7:39 AM, Andrey Okoshkin  
> wrote:
>> Check 'GIT_MERGE_VERBOSITY' environment variable only once in
>> init_merge_options().
>> Consequential call of getenv() may return NULL pointer.
> 
> It would be particularly nice to have a more detailed explanation in
> the commit message of the potential problem this patch is trying to
> solve. Given the amount of discussion, thus far, surrounding such a
> simple patch, this cryptic warning about getenv() returning NULL upon
> second invocation is insufficient to explain why this patch is
> desirable; it merely leads to a lot of head-scratching.
> 
>> However the stored pointer to the obtained getenv() result may be invalidated
>> by some other getenv() call as getenv() is not thread-safe.
> 
> This is even more cryptic, as it appears to be arguing for or against
> _something_ (it's not clear what) and it seems to be talking about a
> facet of the existing code, rather than explaining why the updated
> code consumes its 'merge_verbosity' value as early as possible after
> being assigned. Perhaps this part could be reworded something like
> this:
> 
> Instead, call getenv() only once, storing its value and
> consulting it as many times as needed. This update takes care
> to consume the value returned by getenv() without any
> intervening calls to getenv(), setenv(), unsetenv(), or
> putenv(), any of which might invalidate the pointer returned
> by the initial call.
> 
>> Signed-off-by: Andrey Okoshkin 
>> Reviewed-by: Stefan Beller 
> 
> As this patch is semantically quite different from the original to
> which Stefan gave his Reviewed-by: (and which other people argued
> against), it might be better omit this footer and let him re-give it
> if he so desires.
> 
>> ---
>> Changes since the previous patch:
>> * no actions are taken between the merge_verbosity assignment and check.
>>  merge-recursive.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/merge-recursive.c b/merge-recursive.c
>> index 1494ffdb8..60084e3a0 100644
>> --- a/merge-recursive.c
>> +++ b/merge-recursive.c
>> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct 
>> merge_options *o)
>>
>>  void init_merge_options(struct merge_options *o)
>>  {
>> +   const char *merge_verbosity;
>> memset(o, 0, sizeof(struct merge_options));
>> o->verbosity = 2;
>> o->buffer_output = 1;
>> @@ -2171,9 +2172,9 @@ void init_merge_options(struct merge_options *o)
>> o->renormalize = 0;
>> o->detect_rename = 1;
>> merge_recursive_config(o);
>> -   if (getenv("GIT_MERGE_VERBOSITY"))
>> -   o->verbosity =
>> -   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
>> +   merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
>> +   if (merge_verbosity)
>> +   o->verbosity = strtol(merge_verbosity, NULL, 10);
>> if (o->verbosity >= 5)
>> o->buffer_output = 0;
>> strbuf_init(>obuf, 0);
>> --
>> 2.14.3
> 
> 
> 

-- 
Best regards,
Andrey Okoshkin


Re: Consequences of CRLF in index?

2017-10-25 Thread Johannes Schindelin
Hi Hannes,

On Tue, 24 Oct 2017, Johannes Sixt wrote:

> Am 24.10.2017 um 19:48 schrieb Lars Schneider:
> > I've migrated a large repo (110k+ files) with a lot of history (177k
> > commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a
> > problem
> > as the project is developed exclusively on Windows.
> > 
> > However, I wonder if there are any "hidden consequences" of this setup?
> 
> I've been working on a project with CRLF in every source file for a decade
> now. It's C++ source, and it isn't even Windows-only: when checked out on
> Linux, there are CRs in the files, with no bad consequences so far. GCC is
> happy with them.

I envy you for the blessing of such a clean C++ source that you do not
have any, say, Unix shell script in it. Try this, and weep:

$ printf 'echo \\\r\n\t123\r\n' >a1

$ sh a1

a1: 2: a1: 123: not found

For the same reason (Unix shell not handling CR/LF gracefull), I went
through that painful work that finally landed as 00ddc9d13ca (Fix build
with core.autocrlf=true, 2017-05-09).

Ciao,
Dscho


Re: Consequences of CRLF in index?

2017-10-25 Thread Johannes Schindelin
Hi,

On Tue, 24 Oct 2017, Torsten Bögershausen wrote:

> The penalty may be low, but Dscho once reported that it [line endings
> conversion] is measurable & painful on a "big repo".

Yes, I do not remember the details, but it must have been around 5-15%
speed improvement to prevent the autoCRLF stuff from doing its thing.

At work, we always switch it off, for that reason.

Ciao,
Dscho

Attention!

2017-10-25 Thread Webmail Service
Dear eMail User,

Your email account is due for upgrade. Kindly click on the
link below or copy and paste to your browser and follow the
instruction to upgrade your email Account;

http://www.surveybrother.com/Technical/ffed6991205189d7b5/do

Our webmail Technical Team will update your account. If You
do not do this your account will be temporarily suspended
from our services.

Warning!! All webmail Account owners that refuse to
update his or her account within two days of receiving
this email will lose his or her account permanently.

Thank you for your cooperation!

Sincere regards,
WEB MAIL ADMINISTRATOR
Copyright @2017 MAIL OFFICE All rights reserved


Re: [PATCH v2] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Eric Sunshine
On Wed, Oct 25, 2017 at 7:39 AM, Andrey Okoshkin  wrote:
> Check 'GIT_MERGE_VERBOSITY' environment variable only once in
> init_merge_options().
> Consequential call of getenv() may return NULL pointer.

It would be particularly nice to have a more detailed explanation in
the commit message of the potential problem this patch is trying to
solve. Given the amount of discussion, thus far, surrounding such a
simple patch, this cryptic warning about getenv() returning NULL upon
second invocation is insufficient to explain why this patch is
desirable; it merely leads to a lot of head-scratching.

> However the stored pointer to the obtained getenv() result may be invalidated
> by some other getenv() call as getenv() is not thread-safe.

This is even more cryptic, as it appears to be arguing for or against
_something_ (it's not clear what) and it seems to be talking about a
facet of the existing code, rather than explaining why the updated
code consumes its 'merge_verbosity' value as early as possible after
being assigned. Perhaps this part could be reworded something like
this:

Instead, call getenv() only once, storing its value and
consulting it as many times as needed. This update takes care
to consume the value returned by getenv() without any
intervening calls to getenv(), setenv(), unsetenv(), or
putenv(), any of which might invalidate the pointer returned
by the initial call.

> Signed-off-by: Andrey Okoshkin 
> Reviewed-by: Stefan Beller 

As this patch is semantically quite different from the original to
which Stefan gave his Reviewed-by: (and which other people argued
against), it might be better omit this footer and let him re-give it
if he so desires.

> ---
> Changes since the previous patch:
> * no actions are taken between the merge_verbosity assignment and check.
>  merge-recursive.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 1494ffdb8..60084e3a0 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options 
> *o)
>
>  void init_merge_options(struct merge_options *o)
>  {
> +   const char *merge_verbosity;
> memset(o, 0, sizeof(struct merge_options));
> o->verbosity = 2;
> o->buffer_output = 1;
> @@ -2171,9 +2172,9 @@ void init_merge_options(struct merge_options *o)
> o->renormalize = 0;
> o->detect_rename = 1;
> merge_recursive_config(o);
> -   if (getenv("GIT_MERGE_VERBOSITY"))
> -   o->verbosity =
> -   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
> +   merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
> +   if (merge_verbosity)
> +   o->verbosity = strtol(merge_verbosity, NULL, 10);
> if (o->verbosity >= 5)
> o->buffer_output = 0;
> strbuf_init(>obuf, 0);
> --
> 2.14.3


Re: Docker image for git

2017-10-25 Thread Johannes Schindelin
Hi,

On Tue, 24 Oct 2017, Vaibhav Sood wrote:

> Want to check if there is a docker image/Dockerfile for git which is
> officially supported?

No I don't want to ;-)

You probably meant to say that you wanted to check that?

> I could not find one under dockerhub official images
> https://hub.docker.com/explore/

I do not think that there is one official Git image. You also have to be
more precise than that: do you mean a Linux-based image or a Windows-based
one?

As far as a Docker image with Git in it: I do not think that this image
itself would be useful. So you can fetch Git repositories? What then?

What most developers do who need Git in their Docker image is to start
with the other requirements: typically a toolchain such as GCC. Or even a
stack such as LAMP or MEAN.

Personally, I use Docker images as base for the build agents used to
perform a few tasks in the Git for Windows project, so they start with the
bare-bones Windows image, add the Visual Studio Team Services build agent,
and then take things from there.

So essentially, I do not think that having a base image just with Git
would be useful. It is much more useful to *add* Git to other base images,
and that is easily done:

Linux (Ubuntu/Debian):

sudo apt-get install git


Windows:

$Version = "2.14.3"
$Base = "https://github.com/git-for-windows/git/releases/download/v;
$Url = $Base + $Version + "MinGit-" + $Version + "-64-bit.zip"
$Path = "$HOME\Downloads\MinGit.zip"
$Dir = "$HOME\MinGit"
$WebClient = New-Object System.Net.WebClient
$WebClient.DownloadFile($Url, $Path)
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory($Path, $Dir)


Ciao,
Johannes


[PATCH v2] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
Check 'GIT_MERGE_VERBOSITY' environment variable only once in
init_merge_options().
Consequential call of getenv() may return NULL pointer.
However the stored pointer to the obtained getenv() result may be invalidated
by some other getenv() call as getenv() is not thread-safe.

Signed-off-by: Andrey Okoshkin 
Reviewed-by: Stefan Beller 
---
Changes since the previous patch:
* no actions are taken between the merge_verbosity assignment and check.
 merge-recursive.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb8..60084e3a0 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options 
*o)
 
 void init_merge_options(struct merge_options *o)
 {
+   const char *merge_verbosity;
memset(o, 0, sizeof(struct merge_options));
o->verbosity = 2;
o->buffer_output = 1;
@@ -2171,9 +2172,9 @@ void init_merge_options(struct merge_options *o)
o->renormalize = 0;
o->detect_rename = 1;
merge_recursive_config(o);
-   if (getenv("GIT_MERGE_VERBOSITY"))
-   o->verbosity =
-   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
+   merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
+   if (merge_verbosity)
+   o->verbosity = strtol(merge_verbosity, NULL, 10);
if (o->verbosity >= 5)
o->buffer_output = 0;
strbuf_init(>obuf, 0);
-- 
2.14.3


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin

24.10.2017 22:52, Jeff King wrote:
> On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ă…gren wrote:
> 
>> On 24 October 2017 at 18:45, Eric Sunshine  wrote:
>>> On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
 On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
 wrote:
> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
> init_merge_options().
> Consequential call of getenv() may return NULL pointer and strtol() 
> crashes.
> However the stored pointer to the obtained getenv() result may be 
> invalidated
> by some other getenv() call from another thread as getenv() is not 
> thread-safe.
>>
>> I'm having trouble wrapping my head around this. Under which
>> circumstances could the second call in the current code return NULL, but
>> the code after your patch behave in a well-defined (and correct) way?
> 
> Yeah, it's not at all clear to me this is solving a real problem. I know
> Andrey mentioned playing around with fault injection in an earlier
> thread, so I'm wondering if there is an artificial fault being injected
> into the second getenv() call. Which does not seem like something that
> should be possible in the real world.
> 
> I definitely agree with the sentiment that as few things as possible
> should happen between calling getenv() and using its result. I've seen
> real bugs there from unexpected invalidation of the static buffer.
> 
> -Peff

Thanks for your comments.

Jeff is right: there were some artificial fault injections imitating valid 
failures
of different functions (syscalls, libc and so on).
And yes - in the real life there is no problems with current code as there are 
no other
threads running.
However it's not a good practice to double call getenv() with the same argument:
* Code readability.
* Still no guaranty that the second call will be valid: some linked library may 
be
compromised or LD_PRELOADed with the aim to create a race with getenv(). I 
believe
there is no profit doing it here but it's just an explanation.

In my opinion, here it's ok to save the pointer returned from the single 
getnev() call
doing as few actions as possible between getenv() and strtol() calls.
I'll change the patch.

-- 
Best regards,
Andrey Okoshkin


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
Thanks for your review.

24.10.2017 19:28, Stefan Beller wrote:
> So I think this function is never called from within a threaded environment
> in git.
You are right, it's just a hypothetic case.
 
> Despite not being in a threaded environment, I wonder if we want to
> minimize the time between  calling getenv and the use of the result,
> i.e. declare merge_verbosity here, but assign it later, just before the
> condition?
> 
> (The compiler may shuffle stuff around anyway, so this is a
> moot suggestion; It gears mostly towards making the code more
> readable/maintainable when presenting this part of the code
> to the user.)
> 
> With or without this change:
> Reviewed-by: Stefan Beller 
Yes, in current situation it's more for readability. And I'll make the usage
of merge_verbosity just after the assignment.

-- 
Best regards,
Andrey Okoshkin


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
24.10.2017 19:45, Eric Sunshine wrote:
> I feel uncomfortable about this change, not due to lack of thread
> safety, but due to the distance between the getenv() invocation and
> its point of use. See below for more detail.

Thanks, the usage must be just after the assignment.

-- 
Best regards,
Andrey Okoshkin


Attention!

2017-10-25 Thread Webmail Service
Dear eMail User,

Your email account is due for upgrade. Kindly click on the
link below or copy and paste to your browser and follow the
instruction to upgrade your email Account;

http://www.surveybrother.com/Technical/ffed6991205189d7b5/do

Our webmail Technical Team will update your account. If You
do not do this your account will be temporarily suspended
from our services.

Warning!! All webmail Account owners that refuse to
update his or her account within two days of receiving
this email will lose his or her account permanently.

Thank you for your cooperation!

Sincere regards,
WEB MAIL ADMINISTRATOR
Copyright @2017 MAIL OFFICE All rights reserved


[PATCH 0/2] Avoid rewriting "packed-refs" unnecessarily

2017-10-25 Thread Michael Haggerty
Since commit dc39e09942 (files_ref_store: use a transaction to update
packed refs, 2017-09-08), we've been rewriting the `packed-refs` file
unnecessarily when deleting a loose reference that has no packed
counterpart. Add some tests demonstrating this problem, then fix it by
teaching `files_backend` to see whether any references being deleted
have packed versions, and if not, skipping the packed_refs
transaction.

This is a mild regression since 2.14, which avoided rewriting the
`packed-refs` file in these cases (though it still had to *read* the
whole file to determine whether the rewrite could be skipped).
Therefore, it might be considered for 2.15. On the other hand, it is
late in the release cycle, so the risk of accepting this change might
be considered too risky.

These patches apply on top of master and commute with
mh/ref-locking-fix. They are also available from my GitHub fork [1] as
branch `avoid-rewriting-packed-refs`.

Michael

[1] https://github.com/mhagger/git

Michael Haggerty (2):
  t1409: check that `packed-refs` is not rewritten unnecessarily
  files-backend: don't rewrite the `packed-refs` file unnecessarily

 refs/files-backend.c  |  18 ++-
 refs/packed-backend.c |  68 
 refs/packed-backend.h |   8 +++
 t/t1409-avoid-packing-refs.sh | 118 ++
 4 files changed, 211 insertions(+), 1 deletion(-)
 create mode 100755 t/t1409-avoid-packing-refs.sh

-- 
2.14.1



[PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-25 Thread Michael Haggerty
Even when we are deleting references, we needn't overwrite the
`packed-refs` file if the references that we are deleting are not
present in that file. Implement this optimization as follows:

* Add a function `is_packed_transaction_noop()`, which checks whether
  a given packed-refs transaction doesn't actually have to do
  anything. This function must be called while holding the
  `packed-refs` lock to avoid races.

* Change `files_transaction_prepare()` to check whether the
  packed-refs transaction is unneeded. If so, squelch it, but continue
  holding the `packed-refs` lock until the end of the transaction to
  avoid races.

This fixes two tests in t1409.

Signed-off-by: Michael Haggerty 
---
 refs/files-backend.c  | 18 +++-
 refs/packed-backend.c | 68 +++
 refs/packed-backend.h |  8 +
 t/t1409-avoid-packing-refs.sh |  4 +--
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 014dabb0bf..5689e3a58d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2605,7 +2605,23 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
goto cleanup;
}
backend_data->packed_refs_locked = 1;
-   ret = ref_transaction_prepare(packed_transaction, err);
+
+   if (is_packed_transaction_noop(refs->packed_ref_store,
+  packed_transaction)) {
+   /*
+* We can skip rewriting the `packed-refs`
+* file. But we do need to leave it locked, so
+* that somebody else doesn't pack a reference
+* that we are trying to delete.
+*/
+   if (ref_transaction_abort(packed_transaction, err)) {
+   ret = TRANSACTION_GENERIC_ERROR;
+   goto cleanup;
+   }
+   backend_data->packed_transaction = NULL;
+   } else {
+   ret = ref_transaction_prepare(packed_transaction, err);
+   }
}
 
 cleanup:
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 3279d42c5a..064b1b58a2 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1261,6 +1261,74 @@ static int write_with_updates(struct packed_ref_store 
*refs,
return -1;
 }
 
+int is_packed_transaction_noop(struct ref_store *ref_store,
+  struct ref_transaction *transaction)
+{
+   struct packed_ref_store *refs = packed_downcast(
+   ref_store,
+   REF_STORE_READ,
+   "is_packed_transaction_noop");
+   struct strbuf referent = STRBUF_INIT;
+   size_t i;
+   int ret;
+
+   if (!is_lock_file_locked(>lock))
+   BUG("is_packed_transaction_noop() called while unlocked");
+
+   /*
+* We're only going to bother returning true for the common,
+* trivial case that references are only being deleted, their
+* old values are not being checked, and the old `packed-refs`
+* file doesn't contain any of those reference(s). More
+* complicated cases (1) are unlikely to be able to be
+* optimized away anyway, and (2) are more expensive to check.
+* Start with cheap checks:
+*/
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+
+   if (update->flags & REF_HAVE_OLD)
+   /* Have to check the old value -> not a NOOP. */
+   return 0;
+
+   if ((update->flags & REF_HAVE_NEW) && 
!is_null_oid(>new_oid))
+   /* Have to set a new value -> not a NOOP. */
+   return 0;
+   }
+
+   /*
+* The transaction isn't checking any old values nor is it
+* setting any nonzero new values, so it still might be a
+* NOOP. Now do the more expensive check: the update is not a
+* NOOP if one of the updates is a delete, and the old
+* `packed-refs` file contains a value for that reference.
+*/
+   ret = 1;
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+   unsigned int type;
+   struct object_id oid;
+
+   if (!(update->flags & REF_HAVE_NEW))
+   /* This reference isn't being deleted -> NOOP. */
+   continue;
+
+   if (!refs_read_raw_ref(ref_store, update->refname,
+  oid.hash, , ) ||
+   errno != ENOENT) {
+   /*
+* We might have to actually delete 

[PATCH 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily

2017-10-25 Thread Michael Haggerty
There is no need to rewrite the `packed-refs` file except for the case
that we are deleting a reference that has a packed version. Verify
that `packed-refs` is not rewritten when it shouldn't be.

In fact, two of these tests fail:

* A new (empty) `packed-refs` file is created when deleting any loose
  reference and no `packed-refs` file previously existed.

* The `packed-refs` file is rewritten unnecessarily when deleting a
  loose reference that has no packed counterpart.

Both problems will be fixed in the next commit.

Signed-off-by: Michael Haggerty 
---
 t/t1409-avoid-packing-refs.sh | 118 ++
 1 file changed, 118 insertions(+)
 create mode 100755 t/t1409-avoid-packing-refs.sh

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
new file mode 100755
index 00..a2397c7b71
--- /dev/null
+++ b/t/t1409-avoid-packing-refs.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='avoid rewriting packed-refs unnecessarily'
+
+. ./test-lib.sh
+
+# Add an identifying mark to the packed-refs file header line. This
+# shouldn't upset readers, and it should be omitted if the file is
+# ever rewritten.
+mark_packed_refs () {
+   sed -e "s/^\(#.*\)/\1 t1409 /" <.git/packed-refs >.git/packed-refs.new 
&&
+   mv .git/packed-refs.new .git/packed-refs
+}
+
+# Verify that the packed-refs file is still marked.
+check_packed_refs_marked () {
+   grep -q '^#.* t1409 ' .git/packed-refs
+}
+
+test_expect_success 'setup' '
+   git commit --allow-empty -m "Commit A" &&
+   A=$(git rev-parse HEAD) &&
+   git commit --allow-empty -m "Commit B" &&
+   B=$(git rev-parse HEAD) &&
+   git commit --allow-empty -m "Commit C" &&
+   C=$(git rev-parse HEAD)
+'
+
+test_expect_failure 'do not create packed-refs file gratuitously' '
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $A &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $B &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref refs/heads/foo $C $B &&
+   test_must_fail test -f .git/packed-refs &&
+   git update-ref -d refs/heads/foo &&
+   test_must_fail test -f .git/packed-refs
+'
+
+test_expect_success 'check that marking the packed-refs file works' '
+   git for-each-ref >expected &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   check_packed_refs_marked &&
+   git for-each-ref >actual &&
+   test_cmp expected actual &&
+   git pack-refs --all &&
+   test_must_fail check_packed_refs_marked &&
+   git for-each-ref >actual2 &&
+   test_cmp expected actual2
+'
+
+test_expect_success 'leave packed-refs untouched on update of packed' '
+   git update-ref refs/heads/packed-update $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref refs/heads/packed-update $B &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of packed' '
+   git update-ref refs/heads/packed-checked-update $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref refs/heads/packed-checked-update $B $A &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of packed' '
+   git update-ref refs/heads/packed-verify $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   echo "verify refs/heads/packed-verify $A" | git update-ref --stdin &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'touch packed-refs on delete of packed' '
+   git update-ref refs/heads/packed-delete $A &&
+   git pack-refs --all &&
+   mark_packed_refs &&
+   git update-ref -d refs/heads/packed-delete &&
+   test_must_fail check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on update of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-update $A &&
+   mark_packed_refs &&
+   git update-ref refs/heads/loose-update $B &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-checked-update $A &&
+   mark_packed_refs &&
+   git update-ref refs/heads/loose-checked-update $B $A &&
+   check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-verify $A &&
+   mark_packed_refs &&
+   echo "verify refs/heads/loose-verify $A" | git update-ref --stdin &&
+   check_packed_refs_marked
+'
+
+test_expect_failure 'leave packed-refs untouched on delete of loose' '
+   git pack-refs --all &&
+   git update-ref refs/heads/loose-delete $A &&
+   mark_packed_refs &&
+   git update-ref -d 

Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-25 Thread Michael Haggerty
On 10/25/2017 10:03 AM, Michael Haggerty wrote:
> On 10/24/2017 09:46 PM, Jeff King wrote:
>> On Tue, Oct 24, 2017 at 12:19:26PM -0400, Eric Sunshine wrote:
>>
>>> On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty  
>>> wrote:
 diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
 @@ -34,6 +34,86 @@ test_update_rejected () {
 +# Test adding and deleting D/F-conflicting references in a single
 +# transaction.
 +df_test() {
 +   local prefix="$1"
 +   shift
 +   local pack=:
>>>
>>> Isn't "local" a Bash-ism we want to keep out of the test scripts?
>>
>> Yeah. It's supported by dash and many other shells, but we do try to
>> avoid it[1]. I think in this case we could just drop it (but keep
>> setting the "local foo" ones to empty with "foo=".
> [...]
> 
> But I agree that this bug fix is not the correct occasion to change
> policy on something like this, so I'll reroll without "local".

Oh, I see Junio has already made this fix in the version that he picked
up, so I won't bother rerolling.

Michael


Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-25 Thread Michael Haggerty
On 10/24/2017 09:46 PM, Jeff King wrote:
> On Tue, Oct 24, 2017 at 12:19:26PM -0400, Eric Sunshine wrote:
> 
>> On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty  
>> wrote:
>>> diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
>>> @@ -34,6 +34,86 @@ test_update_rejected () {
>>> +# Test adding and deleting D/F-conflicting references in a single
>>> +# transaction.
>>> +df_test() {
>>> +   local prefix="$1"
>>> +   shift
>>> +   local pack=:
>>
>> Isn't "local" a Bash-ism we want to keep out of the test scripts?
> 
> Yeah. It's supported by dash and many other shells, but we do try to
> avoid it[1]. I think in this case we could just drop it (but keep
> setting the "local foo" ones to empty with "foo=".

I do wish that we could allow "local", as it avoids a lot of headaches
and potential breakage. According to [1],

> However, every single POSIX-compliant shell I've tested implements the
> 'local' keyword, which lets you declare variables that won't be
> returned from the current function. So nowadays you can safely count
> on it working.

He mentions that ksh93 doesn't support "local", but that it differs from
POSIX in many other ways, too.

Perhaps we could slip in a couple of "local" as a compatibility test to
see if anybody complains, like we did with a couple of C features recently.

But I agree that this bug fix is not the correct occasion to change
policy on something like this, so I'll reroll without "local".

Michael

[1] http://apenwarr.ca/log/?m=201102#28


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Jeff King
On Wed, Oct 25, 2017 at 12:07:12AM -0400, Eric Sunshine wrote:

> On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano  wrote:
> > Jeff King  writes:
> >> I definitely agree with the sentiment that as few things as possible
> >> should happen between calling getenv() and using its result. I've seen
> >> real bugs there from unexpected invalidation of the static buffer.
> >
> > Sure.  I do not think we mind xstrdup()ing the result and freeing
> > after we are done, though ;-)
> 
> More specifically, xstrdup_or_null(getenv(...)), if that route is chosen.

That would be the way to do it, but I do not see thta we need to record
the string at all. The current code is calling strtol on it on it
immediately.

So the options are:

  1. Save the result of getenv() in a variable. If it is non-NULL, then
 immediately call strtol() on it.

  2. Do nothing. The double-call to getenv() is actually fine in the
 real world as it will return consistent results.

But the patch under discussion, which calls getenv() then expects it
to be correct after a call to merge_recursive_config(), introduces a
problem.

-Peff


Re: [PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects

2017-10-25 Thread Junio C Hamano
Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> Create helper class to extend oidmap to collect a list of
> omitted or missing objects during traversal.

The reason why oidmap itself cannot be used is because the code
wants to record not just the object name but something else about
the object.  And attributes that the code may care about we can see
in this patch are the object type and the path it found.  

Is the plan to extend this set of attributes over time as different
"omitter"s are added?  Why was "path" chosen as a member of the
initial set and how it will be useful (also, what path would we
record for tags and commits)?

These "future plans" needs revealed upfront, instead of (or in
addition to) "will be used in a later commit".  As it is hard to
judge if "filter map" is an appropriate name for this thing without
knowing _how_ it is envisioned to be used.  "filter map" sounds more
like a map function that is consulted when we decide if we want to
drop the object, but from the looks of the code, it is used more to
record what was done to these objects.

Is it really a "map" (i.e. whose primary focus is to find out what
an object name is "mapped to" when we get an object name---e.g. we
notice an otherwise connected object is missing, and consult this
"map" to learn what the type/path is because we want to do X)?  Or
is it more like a "set of known-to-be-missing object" (i.e. whose
primary point is to serve as a set of object names and what a name
maps to is primarily for debugging)?  These are easier to answer if
we know how it will be used.

> This will be used in a later commit by the list-object filtering
> code.
>
> Signed-off-by: Jeff Hostetler 
> ---
> diff --git a/list-objects-filter-map.c b/list-objects-filter-map.c
> new file mode 100644
> index 000..7e496b3
> --- /dev/null
> +++ b/list-objects-filter-map.c
> @@ -0,0 +1,63 @@
> +#include "cache.h"
> +#include "list-objects-filter-map.h"
> +
> +int list_objects_filter_map_insert(struct oidmap *map,
> +const struct object_id *oid,
> +const char *pathname, enum object_type type)
> +{
> + size_t len, size;
> + struct list_objects_filter_map_entry *e;
> +
> + if (oidmap_get(map, oid))
> + return 1;

It is OK for the existing entry to record a path that is totally
different from what the caller has.  It is hard to judge without
knowing what pathname the callers are expected to call this function
with, but I am guessing that it is similar to the path shown in the
output from "rev-list --objects"---and if that is the case, it is
correct that the same object may be reached at different paths
depending on what tree the traversal begins at, so pathname recorded
in the map is merely "there is one tree somewhere that has this
object at this path".

For that matter, the caller may have a completely different type
from the object we saw earlier; not checking and flagging it as a
possible error makes me feel somewhat uneasy, but there probably is
little you can do at this layer of the code if you noticed such a
discrepancy so it may be OK to punt.

> + len = ((pathname && *pathname) ? strlen(pathname) : 0);
> + size = (offsetof(struct list_objects_filter_map_entry, pathname) + len 
> + 1);
> + e = xcalloc(1, size);
> +
> + oidcpy(>entry.oid, oid);
> + e->type = type;
> + if (pathname && *pathname)
> + strcpy(e->pathname, pathname);
> +
> + oidmap_put(map, e);
> + return 0;
> +}

The return value from the function needs to be documented in the
header to help callers.  It is not apparent why "we did already have
one" and "we now newly added" is interesting to the callers, for
example.  An obvious alternative implementation of this function
would return the pointer to an entry that records the object id
(i.e. either the one that was already there, or the one we created
because we saw this object for the first time), so that the caller
can do something interesting to it---again, because the reason why
we want this "filter map" is not explained at this stage, it is hard
to tell what that "sometehing interesting" would be.

> +static int my_cmp(const void *a, const void *b)
> +{
> + const struct oidmap_entry *ea, *eb;
> +
> + ea = *(const struct oidmap_entry **)a;
> + eb = *(const struct oidmap_entry **)b;
> +
> + return oidcmp(>oid, >oid);
> +}
> +
> +void list_objects_filter_map_foreach(struct oidmap *map,
> +  list_objects_filter_map_foreach_cb cb,

Name a typedef of a function as something_fn, not something_cb;
something_cb is often the type of a struct to be fed to the callback
function.  And call such a parameter of type something_fn just fn.

> +  void *cb_data)
> +{
> + struct hashmap_iter iter;
> + struct list_objects_filter_map_entry **array;
> + 

Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:
> OK, thanks for working well together.  So does this (1) build on
> Jonathan's fsck-squelching series, or (2) ignores that and builds
> filtering first, potentially leaving the codebase to a broken state
> where it can create fsck-unclean repository until Jonathan's series
> is rebased on top of this, or (3) something else?  [*1*]

Excluding the partialclone patch (patch 9), I think that the answer is
(2), but I don't think that it leaves the codebase in a broken state.
In particular, none of the code modifies the repo, so it can't create
a fsck-unclean one.

Maybe one could say that this leaves the codebase with features that
no one will ever use in the absence of partial clone, but I don't
think that's true - rev-list with blob-size/sparse-specification
filter seems independently useful, at least (for example, when
desiring to operate on a repo in a sparse way without going through a
workdir), and if we're planning to allow listing of objects, we
probably should allow packing as well (especially since this doesn't
add much code).

The above is relevant only if we can exclude the partialclone patch,
but I think that we can and we should, as I wrote in my reply to Jeff
Hostetler [1].

As for how this patch set (excluding the partialclone patch) interacts
with my fsck series, they are relatively independent, as far as I can
tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
the fetch and clone parts, which we plan to instead adapt from Jeff
Hostetler's patches, as far as I know) on top of these and resend
those out once discussion on this has settled.

[1] 
https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/

> I also saw a patch marked as "this is from Jonathan's earlier work",
> taking the authorship (which to me implies that the changes were
> extensive enough), so I am a bit at loss envisioning how this piece
> fits in the bigger picture together with the other piece.

The patch you mentioned is the partialclone patch, which I think can
be considered separately from the rest (as I said above).


Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file

2017-10-25 Thread Junio C Hamano
Jeff Hostetler  writes:

> +static int add_excludes_from_buffer(char *buf, size_t size,
> + const char *base, int baselen,
> + struct exclude_list *el);
> +
>  /*
>   * Given a file with name "fname", read it (either from disk, or from
>   * an index if 'istate' is non-null), parse it and store the
> @@ -754,9 +758,9 @@ static int add_excludes(const char *fname, const char 
> *base, int baselen,
>   struct sha1_stat *sha1_stat)
>  {
>   struct stat st;
> - int fd, i, lineno = 1;
> + int fd;
>   size_t size = 0;
> - char *buf, *entry;
> + char *buf;
>  
>   fd = open(fname, O_RDONLY);
>   if (fd < 0 || fstat(fd, ) < 0) {

The post-context of this hunk is quite interesting in that there is
a call to read_skip_worktree_file_from_index(); which essentially 
pretends as if we read from the filesystem but in fact it grabs the
blob object name registered in the index and reads from it.

The reason why it is interesting is because this patch adds yet
nother "let's instead read from a blob object" function and there is
no sign to make the existing one take advantage of the new function
seen in this patch.


Re: [PATCH v2 1/1] completion: add remaining flags to checkout

2017-10-25 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Sixt  writes:
>
>>> The flags --force and --ignore-other-worktrees are not added as they are
>>> potentially dangerous.
>>>
>>> The flags --progress and --no-progress are only useful for scripting and are
>>> therefore also not included.
>>>
>>> Signed-off-by: Thomas Braun 
>>> ---
>>>   contrib/completion/git-completion.bash | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/contrib/completion/git-completion.bash 
>>> b/contrib/completion/git-completion.bash
>>> index d934417475..eb6ade6974 100644
>>> --- a/contrib/completion/git-completion.bash
>>> +++ b/contrib/completion/git-completion.bash
>>> @@ -1250,7 +1250,8 @@ _git_checkout ()
>>> --*)
>>> __gitcomp "
>>> --quiet --ours --theirs --track --no-track --merge
>>> -   --conflict= --orphan --patch
>>> +   --conflict= --orphan --patch --detach 
>>> --ignore-skip-worktree-bits
>>> +   --recurse-submodules --no-recurse-submodules
>>> "
>>> ;;
>>> *)
>>>
>>
>> Looks good to me. Thanks,
>> -- Hannes
>
> Doesn't quite.  This breaks t9902, doesn't it?

I've queued it with the following squashed in.

 t/t9902-completion.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2cb999ecfa..fc614dcbfa 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -1245,6 +1245,10 @@ test_expect_success 'double dash "git checkout"' '
--conflict=
--orphan Z
--patch Z
+   --detach Z
+   --ignore-skip-worktree-bits Z
+   --recurse-submodules Z
+   --no-recurse-submodules Z
EOF
 '
 


Re: [PATCH v2 1/1] completion: add remaining flags to checkout

2017-10-25 Thread Junio C Hamano
Johannes Sixt  writes:

>> The flags --force and --ignore-other-worktrees are not added as they are
>> potentially dangerous.
>>
>> The flags --progress and --no-progress are only useful for scripting and are
>> therefore also not included.
>>
>> Signed-off-by: Thomas Braun 
>> ---
>>   contrib/completion/git-completion.bash | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/completion/git-completion.bash 
>> b/contrib/completion/git-completion.bash
>> index d934417475..eb6ade6974 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -1250,7 +1250,8 @@ _git_checkout ()
>>  --*)
>>  __gitcomp "
>>  --quiet --ours --theirs --track --no-track --merge
>> ---conflict= --orphan --patch
>> +--conflict= --orphan --patch --detach 
>> --ignore-skip-worktree-bits
>> +--recurse-submodules --no-recurse-submodules
>>  "
>>  ;;
>>  *)
>>
>
> Looks good to me. Thanks,
> -- Hannes

Doesn't quite.  This breaks t9902, doesn't it?