date:20181110

Re: [PATCH 04/10] fast-export: avoid dying when filtering by paths and old tags exist

2018-11-10 Thread Elijah Newren

On Sat, Nov 10, 2018 at 10:44 PM Jeff King  wrote:
>
> On Sat, Nov 10, 2018 at 10:23:06PM -0800, Elijah Newren wrote:
>
> > If --tag-of-filtered-object=rewrite is specified along with a set of
> > paths to limit what is exported, then any tags pointing to old commits
> > that do not contain any of those specified paths cause problems.  Since
> > the old tagged commit is not exported, fast-export attempts to rewrite
> > such tags to an ancestor commit which was exported.  If no such commit
> > exists, then fast-export currently die()s.  Five years after the tag
> > rewriting logic was added to fast-export (see commit 2d8ad4691921,
> > "fast-export: Add a --tag-of-filtered-object  option for newly dangling
> > tags", 2009-06-25), fast-import gained the ability to delete refs (see
> > commit 4ee1b225b99f, "fast-import: add support to delete refs",
> > 2014-04-20), so now we do have a valid option to rewrite the tag to.
> > Delete these tags instead of dying.
>
> Hmm. That's the right thing to do if we're considering the export to be
> an independent unit. But what if I'm just rewriting a portion of history
> like:
>
>   git fast-export HEAD~5..HEAD | some_filter | git fast-import
>
> ? If I have a tag pointing to HEAD~10, will this delete that? Ideally I
> think it would be left alone.

A couple things:
  * This code path only triggers in a very specific case: If a tag is
requested for export but points to a commit which is filtered out by
something else (e.g. path limiters and the commit in question didn't
modify any of the relevant paths), AND the user explicitly specified
--tag-of-filtered-object=rewrite (so that the tag in question can be
rewritten to the nearest non-filtered ancestor).
  * You didn't specify to export any tags, only HEAD, so this
situation isn't relevant (the tag wouldn't be exported or deleted).
  * You didn't specify --tag-of-filtered-object=rewrite, so this
situation isn't relevant (even if you had specified a tag to filter,
you'd get an abort instead)

But let's say you do modify the example some:
   git fast-export --tag-of-filtered-object=rewrite
--signed-tags=strip --tags master -- relatively_recent_subdirectory/ |
some_filter | git fast-import

The user asked that all tags and master be exported but only for the
history that touched relatively_recent_subdirectory/, and if any tags
point at commits that are pruned by only asking for commits touching
relatively_recent_subdirectory/, then rewrite what those tags point to
so that they instead point to the nearest non-filtered ancestor.  What
about a commit like v0.1.0 that likely pre-dated the introduction of
relatively_recent_subdirectory/?  It has no nearest ancestor to
rewrite to.  The previous answer was to abort, which is really bad,
especially since the user was clearly asking us to do whatever smart
rewriting we can (--signed-tags=strip and
--tag-of-filtered-object=rewrite).

Perhaps there's a different answer that's workable as well, but this
one, in these circumstances, seemed the most reasonable to me.

> > +test_expect_success 'rewrite tag predating pathspecs to nothing' '
> > + test_create_repo rewrite_tag_predating_pathspecs &&
> > + (
> > + cd rewrite_tag_predating_pathspecs &&
> > +
> > + touch ignored &&
>
> We usually prefer ">ignored" to create an empty file rather than
> "touch".

Will fix.

>
> > + git add ignored &&
> > + test_commit initial &&
>
> What do we need this "ignored" for? test_commit should create a file
> "initial.t".

I think I original had plain "git commit", then switched to
test_commit, then didn't recheck.  Thanks, will fix.

> > + echo foo >bar &&
> > + git add bar &&
> > + test_commit add-bar &&
>
> Likewise, "test_commit bar" should work by itself (though note the
> filename is "bar.t" in your fast-export command).
>
> > + git fast-export --tag-of-filtered-object=rewrite --all -- bar 
> > >output &&
> > + grep -A 1 refs/tags/v0.0.0.0.0.0.1 output | grep -E 
> > ^from.0{40}
>
> I don't think "grep -A" is portable (and we don't seem to otherwise use
> it). You can probably do something similar with sed.
>
> Use $ZERO_OID instead of hard-coding 40, which future-proofs for the
> hash transition (though I suppose the hash is not likely to get
> _shorter_ ;) ).

Will fix these up as well...after waiting for more feedback on
possible alternate suggestions.

Re: [PATCH v2 1/1] Upcast size_t variables to uintmax_t when printing

2018-11-10 Thread Jeff King

On Sun, Nov 11, 2018 at 08:05:04AM +0100, tbo...@web.de wrote:

> From: Torsten Bögershausen 
> 
> When printing variables which contain a size, today "unsigned long"
> is used at many places.
> In order to be able to change the type from "unsigned long" into size_t
> some day in the future, we need to have a way to print 64 bit variables
> on a system that has "unsigned long" defined to be 32 bit, like Win64.
> 
> Upcast all those variables into uintmax_t before they are printed.
> This is to prepare for a bigger change, when "unsigned long"
> will be converted into size_t for variables which may be > 4Gib.

I like the overall direction. I feel a little funny doing this step now,
and not as part of a series to convert individual variables. But I
cannot offhand think of any reason that it would behave badly even if
the other part does not materialize

-Peff

Re: [PATCH 00/10] fast export and import fixes and features

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:02PM -0800, Elijah Newren wrote:

> This is a series of ten patches representing two doc corrections, one
> pedantic fix, three real bug fixes, one micro code refactor, and three
> new features.  Each of these ten changes is relatively small in size.
> These changes predominantly affect fast-export, but there's a couple
> small changes for fast-import as well.
> 
> I could potentially split these patches up, but I'd just end up
> chaining them sequentially since otherwise there'd be lots of
> conflicts; having 10 different single patch series with lots of
> dependencies sounded like a bigger pain to me, but let me know if you
> would prefer I split them up and how you suggest doing so.

I think it's fine to put them in sequence when there's a textual
dependency.  If it turns out that one of them needs more discussion and
we don't want it to hold later patches hostage, we can always re-roll at
that point.

(I also think it's fine to lump together thematically similar patches
even when they aren't strictly dependent, even textually. It's less work
for the maintainer to consider 1 group of 10 than 10 groups of 1).

> These patches were driven by the needs of git-repo-filter[1], but most
> if not all of them should be independently useful.

I left lots of comments. Some of the earlier ones may just be showing my
confusion about fast-export works (some of which was cleared up by your
later patches). But I like the overall direction for sure.

-Peff

Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:12PM -0800, Elijah Newren wrote:

> fast-export output is traditionally used as an input to a fast-import
> program, but it is also useful to help gather statistics about the
> history of a repository (particularly when --no-data is also passed).
> For example, two of the types of information we may want to collect
> could include:
>   1) general information about renames that have occurred
>   2) what the biggest objects in a repository are and what names
>  they appear under.
> 
> The first bit of information can be gathered by just passing -M to
> fast-export.  The second piece of information can partially be gotten
> from running
> git cat-file --batch-check --batch-all-objects
> However, that only shows what the biggest objects in the repository are
> and their sizes, not what names those objects appear as or what commits
> they were introduced in.  We can get that information from fast-export,
> but when we only see
> R oldname newname
> instead of
> R oldname newname
> M 100644 $SHA1 newname
> then it makes the job more difficult.  Add an option which allows us to
> force the latter output even when commits have exact renames of files.

fast-export seems like a funny tool to look up paths. What about "git
log --find-object=$SHA1" ?

-Peff

Re: [PATCH 09/10] fast-export: add a --show-original-ids option to show original names

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:11PM -0800, Elijah Newren wrote:

> Knowing the original names (hashes) of commits, blobs, and tags can
> sometimes enable post-filtering that would otherwise be difficult or
> impossible.  In particular, the desire to rewrite commit messages which
> refer to other prior commits (on top of whatever other filtering is
> being done) is very difficult without knowing the original names of each
> commit.
> 
> This commit teaches a new --show-original-ids option to fast-export
> which will make it add a 'originally ' line to blob, commits, and
> tags.  It also teaches fast-import to parse (and ignore) such lines.

Makes sense as a feature; I think filter-branch can make its mappings
available, too.

Do we need to worry about compatibility with other fast-import programs?
I think no, because this is not enabled by default (so if sending the
extra lines to another importer hurts, the answer is "don't do that").

I have a vague feeling that there might be some way to combine this with
--export-marks or --no-data, but I can't really think of a way. They
seem related, but not quite.

> ---
>  Documentation/git-fast-export.txt |  7 +++
>  builtin/fast-export.c | 20 +++-
>  fast-import.c | 17 +
>  t/t9350-fast-export.sh| 17 +
>  4 files changed, 56 insertions(+), 5 deletions(-)

The fast-import format is documented in Documentation/git-fast-import.txt.
It might need an update to cover the new format.

> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -121,6 +121,13 @@ marks the same across runs.
>   used by a repository which already contains the necessary
>   parent commits.
>  
> +--show-original-ids::
> + Add an extra directive to the output for commits and blobs,
> + `originally `.  While such directives will likely be
> + ignored by importers such as git-fast-import, it may be useful
> + for intermediary filters (e.g. for rewriting commit messages
> + which refer to older commits, or for stripping blobs by id).

I'm not quite sure how a blob ends up being rewritten by fast-export (I
get that commits may change due to dropping parents).

The name "originally" doesn't seem great to me. Probably because I would
continually wonder if it has one "l" or two. ;) Perhaps something like
"original-oid" might be better. That's well into bikeshed territory,
though.

-Peff

Re: [PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-10 Thread Elijah Newren

On Sat, Nov 10, 2018 at 10:36 PM Jeff King  wrote:
>
> On Sat, Nov 10, 2018 at 10:23:04PM -0800, Elijah Newren wrote:
>
> > Signed-off-by: Elijah Newren 
> > ---
> >  Documentation/git-fast-export.txt | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git-fast-export.txt 
> > b/Documentation/git-fast-export.txt
> > index ce954be532..677510b7f7 100644
> > --- a/Documentation/git-fast-export.txt
> > +++ b/Documentation/git-fast-export.txt
> > @@ -119,7 +119,8 @@ marks the same across runs.
> >   'git rev-list', that specifies the specific objects and references
> >   to export.  For example, `master~10..master` causes the
> >   current master reference to be exported along with all objects
> > - added since its 10th ancestor commit.
> > + added since its 10th ancestor commit and all files common to
> > + master\~9 and master~10.
>
> Do you need to backslash the second tilde?  Maybe `master~9` and
> `master~10` instead of escaping?

Oops, yeah, that needs to be consistent.

> I'm not sure what this is trying to say. I guess that we'd always show
> all of the blobs necessary to reconstruct the first non-negative commit
> (i.e., `master~9` here)?

For someone familiar with fast-export or fast-import, sure, you'd
guess that it'd show all the blobs necessary to reconstruct the first
non-negative commit.  But it's not clear to first time users and
readers of the docs that the first non-negative commit becomes a root
commit; by comparison, filter-branch suggests using a very similar
construction and yet behaves quite differently -- it does not turn the
first non-negative commit into a root but retains the original
parent(s) of the first non-negative commit without rewriting those
earlier commits.  The text as previously written, "along with all
objects added since its 10th ancestor commit", seems to suggest
behavior similar to how filter-branch behaves (particularly the
"Acked-by example"), i.e. it implies that files not touched in the
last 10 commits are not included.  My wording in this patch was an
attempt to fix that.  Was my attempt perhaps too clumsy, or was it
just the case that you had sufficient knowledge of fast-export that
the previous text didn't mislead you?

Re: [PATCH 08/10] fast-export: add --reference-excluded-parents option

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:10PM -0800, Elijah Newren wrote:

> git filter-branch has a nifty feature allowing you to rewrite, e.g. just
> the last 8 commits of a linear history
>   git filter-branch $OPTIONS HEAD~8..HEAD
> 
> If you try the same with git fast-export, you instead get a history of
> only 8 commits, with HEAD~7 being rewritten into a root commit.  There
> are two alternatives:

Ah, I think this maybe answers some of my earlier questions, too. You
cannot use fast-import as it stands to do a partial rewrite.

>   1) Don't use the negative revision specification, and when you're
>  filtering the output to make modifications to the last 8 commits,
>  just be careful to not modify any earlier commits somehow.
> 
>   2) First run 'git fast-export --export-marks=somefile HEAD~8', then
>  run 'git fast-export --import-marks=somefile HEAD~8..HEAD'.
> 
> Both are more error prone than I'd like (the first for obvious reasons;
> with the second option I have sometimes accidentally included too many
> revisions in the first command and then found that the corresponding
> extra revisions were not exported by the second command and thus were
> not modified as I expected).  Also, both are poor from a performance
> perspective.

Yeah, this should be O(commits you're touching), and it the current code
does not allow that at all. So I think this feature makes a lot of sense
(it probably _should_ have been the default, but it's a bit late for
that now).

> @@ -638,13 +640,21 @@ static void handle_commit(struct commit *commit, struct 
> rev_info *rev,
>   unuse_commit_buffer(commit, commit_buffer);
>  
>   for (i = 0, p = commit->parents; p; p = p->next) {
> - int mark = get_object_mark(>item->object);
> - if (!mark)
> + struct object *obj = >item->object;
> + int mark = get_object_mark(obj);
> +
> + if (!mark && !reference_excluded_commits)
>   continue;
>   if (i == 0)
> - printf("from :%d\n", mark);
> + printf("from ");
> + else
> + printf("merge ");
> + if (mark)
> + printf(":%d\n", mark);
>   else
> - printf("merge :%d\n", mark);
> + printf("%s\n", sha1_to_hex(anonymize ?
> +anonymize_sha1(>oid) :
> +obj->oid.hash));
>   i++;
>   }

OK, so this just teaches us to start with the sensible "from" directive.
I think we might be able to do a little more optimization here. If we're
exporting HEAD^..HEAD and there's an object in HEAD^ which is unchanged
in HEAD, I think we'd still print it (because it would not be marked
SHOWN), but we could omit it (by walking the tree of the boundary
commits and marking them shown).

I don't think it's a blocker for what you're doing here, but just a
possible future optimization.

> @@ -925,13 +935,22 @@ static void handle_tags_and_duplicates(struct 
> string_list *extras)
>   /*
>* Getting here means we have a commit which
>* was excluded by a negative refspec (e.g.
> -  * fast-export ^master master).  If the user
> +  * fast-export ^master master).  If we are
> +  * referencing excluded commits, set the ref
> +  * to the exact commit.  Otherwise, the user
>* wants the branch exported but every commit
> -  * in its history to be deleted, that sounds
> -  * like a ref deletion to me.
> +  * in its history to be deleted, which basically
> +  * just means deletion of the ref.
>*/
> - printf("reset %s\nfrom %s\n\n",
> -name, sha1_to_hex(null_sha1));
> + if (!reference_excluded_commits) {
> + /* delete the ref */
> + printf("reset %s\nfrom %s\n\n",
> +name, sha1_to_hex(null_sha1));
> + continue;
> + }
> + /* set ref to commit using oid, not mark */
> + printf("reset %s\nfrom %s\n\n", name,
> +sha1_to_hex(commit->object.oid.hash));

OK, and this is basically answering my earlier questions again: yes, you
_would_ want to keep old tags pointing at their commits. But only in
this much more sensible mode.

-Peff

[PATCH v2 1/1] Upcast size_t variables to uintmax_t when printing

2018-11-10 Thread tboegi

From: Torsten Bögershausen 

When printing variables which contain a size, today "unsigned long"
is used at many places.
In order to be able to change the type from "unsigned long" into size_t
some day in the future, we need to have a way to print 64 bit variables
on a system that has "unsigned long" defined to be 32 bit, like Win64.

Upcast all those variables into uintmax_t before they are printed.
This is to prepare for a bigger change, when "unsigned long"
will be converted into size_t for variables which may be > 4Gib.

Signed-off-by: Torsten Bögershausen 
---

Changes since V1:
- fixed typos in the commit message, thanks to Eric  Sunshime for careful 
reading

Applying it on pu  gives 1 conflict from the index/repo changes,
Should be easy to fix.

archive-tar.c  |  2 +-
 builtin/cat-file.c |  4 ++--
 builtin/fast-export.c  |  2 +-
 builtin/index-pack.c   |  9 +
 builtin/ls-tree.c  |  2 +-
 builtin/pack-objects.c | 12 ++--
 diff.c |  2 +-
 fast-import.c  |  4 ++--
 http-push.c|  2 +-
 ref-filter.c   |  2 +-
 sha1-file.c|  6 +++---
 11 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 7a535cba24..a58e1a8ebf 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -202,7 +202,7 @@ static void prepare_header(struct archiver_args *args,
   unsigned int mode, unsigned long size)
 {
xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 0);
-   xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? 
size : 0);
+   xsnprintf(header->size, sizeof(header->size), "%011"PRIoMAX , 
S_ISREG(mode) ? (uintmax_t)size : (uintmax_t)0);
xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned 
long) args->time);
 
xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8d97c84725..05decee33f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -92,7 +92,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
oi.sizep = 
if (oid_object_info_extended(the_repository, , , flags) 
< 0)
die("git cat-file: could not get object info");
-   printf("%lu\n", size);
+   printf("%"PRIuMAX"\n", (uintmax_t)size);
return 0;
 
case 'e':
@@ -238,7 +238,7 @@ static void expand_atom(struct strbuf *sb, const char 
*atom, int len,
if (data->mark_query)
data->info.sizep = >size;
else
-   strbuf_addf(sb, "%lu", data->size);
+   strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size);
} else if (is_atom("objectsize:disk", atom, len)) {
if (data->mark_query)
data->info.disk_sizep = >disk_size;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 456797c12a..5790f0d554 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -253,7 +253,7 @@ static void export_blob(const struct object_id *oid)
 
mark_next_object(object);
 
-   printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size);
+   printf("blob\nmark :%"PRIu32"\ndata %"PRIuMAX"\n", last_idnum, 
(uintmax_t)size);
if (size && fwrite(buf, size, 1, stdout) != 1)
die_errno("could not write blob '%s'", oid_to_hex(oid));
printf("\n");
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2004e25da2..2a8ada432b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -450,7 +450,8 @@ static void *unpack_entry_data(off_t offset, unsigned long 
size,
int hdrlen;
 
if (!is_delta_type(type)) {
-   hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), 
size) + 1;
+   hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX,
+  type_name(type),(uintmax_t)size) + 1;
the_hash_algo->init_fn();
the_hash_algo->update_fn(, hdr, hdrlen);
} else
@@ -1628,10 +1629,10 @@ static void show_pack_info(int stat_only)
chain_histogram[obj_stat[i].delta_depth - 1]++;
if (stat_only)
continue;
-   printf("%s %-6s %lu %lu %"PRIuMAX,
+   printf("%s %-6s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
   oid_to_hex(>idx.oid),
-  type_name(obj->real_type), obj->size,
-  (unsigned long)(obj[1].idx.offset - obj->idx.offset),
+  type_name(obj->real_type), (uintmax_t)obj->size,
+  (uintmax_t)(obj[1].idx.offset - obj->idx.offset),
   (uintmax_t)obj->idx.offset);
if (is_delta_type(obj->type)) {
struct object_entry *bobj = 
[obj_stat[i].base_object_no];

Re: [PATCH 07/10] fast-export: ensure we export requested refs

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:09PM -0800, Elijah Newren wrote:

> If file paths are specified to fast-export and a ref points to a commit
> that does not touch any of the relevant paths, then that ref would
> sometimes fail to be exported.  (This depends on whether any ancestors
> of the commit which do touch the relevant paths would be exported with
> that same ref name or a different ref name.)  To avoid this problem,
> put *all* specified refs into extra_refs to start, and then as we export
> each commit, remove the refname used in the 'commit $REFNAME' directive
> from extra_refs.  Then, in handle_tags_and_duplicates() we know which
> refs actually do need a manual reset directive in order to be included.
> 
> This means that we do need some special handling for excluded refs; e.g.
> if someone runs
>git fast-export ^master master
> then they've asked for master to be exported, but they have also asked
> for the commit which master points to and all of its history to be
> excluded.  That logically means ref deletion.  Previously, such refs
> were just silently omitted from being exported despite having been
> explicitly requested for export.

Hmm. Reading this it makes sense to me, but I remember from discussion
long ago that there were a lot of funny corner cases around "which refs
to include" and possibly even some ambiguous cases. Maybe that is all
sorted these days, with --refspec.

> ---
> NOTE: I was hoping the strmap API proposal would materialize, but I either
> missed it or it hasn't shown up.  The usage of string_list in this patch
> would be better replaced by what Peff suggested.

You didn't miss it. Junio did some manual conversions using hashmap,
which weren't too bad.  It's not entirely clear to me how often we'd be
able to use strmap instead of a full-on hashmap, so I haven't really
pursued it.

It looks like you generate the list here via append, and then sort at
the end. That's at least not quadratic. I think the string_list_remove()
is, though.

-Peff

Re: [PATCH 06/10] fast-export: when using paths, avoid corrupt stream with non-existent mark

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:08PM -0800, Elijah Newren wrote:

> If file paths are specified to fast-export and multiple refs point to a
> commit that does not touch any of the relevant file paths, then
> fast-export can hit problems.  fast-export has a list of additional refs
> that it needs to explicitly set after exporting all blobs and commits,
> and when it tries to get_object_mark() on the relevant commit, it can
> get a mark of 0, i.e. "not found", because the commit in question did
> not touch the relevant paths and thus was not exported.  Trying to
> import a stream with a mark corresponding to an unexported object will
> cause fast-import to crash.
> 
> Avoid this problem by taking the commit the ref points to and finding an
> ancestor of it that was exported, and make the ref point to that commit
> instead.

As with the earlier tag commit, I wonder if this might depend on the
context in which you're using fast-export. I suppose that if you did not
feed the ref on the command line that we would not be dealing with it at
all (and maybe that is the answer to my question about the tag thing,
too).

It does seem funny that the behavior for the earlier case (bounded
commits) and this case (skipping some commits) are different. Would you
ever want to keep walking backwards to find an ancestor in the earlier
case? Or vice versa, would you ever want to simply delete a tag in a
case like this one?

I'm not sure sure, but I suspect you may have thought about it a lot
harder than I have. :)

> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index a3c044b0af..5648a8ce9c 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -900,7 +900,18 @@ static void handle_tags_and_duplicates(void)
>   if (anonymize)
>   name = anonymize_refname(name);
>   /* create refs pointing to already seen commits */
> - commit = (struct commit *)object;
> + commit = rewrite_commit((struct commit *)object);
> + if (!commit) {
> + /*
> +  * Neither this object nor any of its
> +  * ancestors touch any relevant paths, so
> +  * it has been filtered to nothing.  Delete
> +  * it.
> +  */
> + printf("reset %s\nfrom %s\n\n",
> +name, sha1_to_hex(null_sha1));
> + continue;
> + }

This hunk makes sense.

> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -386,6 +386,30 @@ test_expect_success 'path limiting with import-marks 
> does not lose unmodified fi
>   grep file0 actual
>  '
>  
> +test_expect_success 'avoid corrupt stream with non-existent mark' '
> + test_create_repo avoid_non_existent_mark &&
> + (
> + cd avoid_non_existent_mark &&
> +
> + touch important-path &&
> + git add important-path &&
> + test_commit initial &&
> +
> + touch ignored &&
> + git add ignored &&
> + test_commit whatever &&
> +
> + git branch A &&
> + git branch B &&
> +
> + echo foo >>important-path &&
> + git add important-path &&
> + test_commit more changes &&
> +
> + git fast-export --all -- important-path | git fast-import 
> --force
> + )
> +'

Similar comments apply about "touch" and "test_commit" to what I wrote
for the earlier patch.

-Peff

Re: [PATCH 05/10] fast-export: move commit rewriting logic into a function for reuse

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:07PM -0800, Elijah Newren wrote:

> Logic to replace a filtered commit with an unfiltered ancestor is useful
> elsewhere; put it into a function we can call.

OK. I had to stare at it for a minute to make sure there was not an
edge case with looking at "p" versus "p->parents", but I think it is a
faithful conversion.

-Peff

Re: [PATCH 04/10] fast-export: avoid dying when filtering by paths and old tags exist

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:06PM -0800, Elijah Newren wrote:

> If --tag-of-filtered-object=rewrite is specified along with a set of
> paths to limit what is exported, then any tags pointing to old commits
> that do not contain any of those specified paths cause problems.  Since
> the old tagged commit is not exported, fast-export attempts to rewrite
> such tags to an ancestor commit which was exported.  If no such commit
> exists, then fast-export currently die()s.  Five years after the tag
> rewriting logic was added to fast-export (see commit 2d8ad4691921,
> "fast-export: Add a --tag-of-filtered-object  option for newly dangling
> tags", 2009-06-25), fast-import gained the ability to delete refs (see
> commit 4ee1b225b99f, "fast-import: add support to delete refs",
> 2014-04-20), so now we do have a valid option to rewrite the tag to.
> Delete these tags instead of dying.

Hmm. That's the right thing to do if we're considering the export to be
an independent unit. But what if I'm just rewriting a portion of history
like:

  git fast-export HEAD~5..HEAD | some_filter | git fast-import

? If I have a tag pointing to HEAD~10, will this delete that? Ideally I
think it would be left alone.

> +test_expect_success 'rewrite tag predating pathspecs to nothing' '
> + test_create_repo rewrite_tag_predating_pathspecs &&
> + (
> + cd rewrite_tag_predating_pathspecs &&
> +
> + touch ignored &&

We usually prefer ">ignored" to create an empty file rather than
"touch".

> + git add ignored &&
> + test_commit initial &&

What do we need this "ignored" for? test_commit should create a file
"initial.t".

> + echo foo >bar &&
> + git add bar &&
> + test_commit add-bar &&

Likewise, "test_commit bar" should work by itself (though note the
filename is "bar.t" in your fast-export command).

> + git fast-export --tag-of-filtered-object=rewrite --all -- bar 
> >output &&
> + grep -A 1 refs/tags/v0.0.0.0.0.0.1 output | grep -E ^from.0{40}

I don't think "grep -A" is portable (and we don't seem to otherwise use
it). You can probably do something similar with sed.

Use $ZERO_OID instead of hard-coding 40, which future-proofs for the
hash transition (though I suppose the hash is not likely to get
_shorter_ ;) ).

-Peff

Re: [PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:04PM -0800, Elijah Newren wrote:

> Signed-off-by: Elijah Newren 
> ---
>  Documentation/git-fast-export.txt | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git-fast-export.txt 
> b/Documentation/git-fast-export.txt
> index ce954be532..677510b7f7 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -119,7 +119,8 @@ marks the same across runs.
>   'git rev-list', that specifies the specific objects and references
>   to export.  For example, `master~10..master` causes the
>   current master reference to be exported along with all objects
> - added since its 10th ancestor commit.
> + added since its 10th ancestor commit and all files common to
> + master\~9 and master~10.

Do you need to backslash the second tilde?  Maybe `master~9` and
`master~10` instead of escaping?

I'm not sure what this is trying to say. I guess that we'd always show
all of the blobs necessary to reconstruct the first non-negative commit
(i.e., `master~9` here)?

-Peff

Re: [PATCH 03/10] fast-export: use value from correct enum

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:05PM -0800, Elijah Newren wrote:

> ABORT and ERROR happen to have the same value, but come from differnt
> enums.  Use the one from the correct enum.

Yikes. :)

This is a good argument for naming these SIGNED_TAG_ABORT, etc. But this
is obviously an improvement in the meantime.

-Peff

Re: [PATCH 01/10] git-fast-import.txt: fix documentation for --quiet option

2018-11-10 Thread Jeff King

On Sat, Nov 10, 2018 at 10:23:03PM -0800, Elijah Newren wrote:

> Signed-off-by: Elijah Newren 
> ---
>  Documentation/git-fast-import.txt | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-fast-import.txt 
> b/Documentation/git-fast-import.txt
> index e81117d27f..7ab97745a6 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -40,9 +40,10 @@ OPTIONS
>   not contain the old commit).
>  
>  --quiet::
> - Disable all non-fatal output, making fast-import silent when it
> - is successful.  This option disables the output shown by
> - --stats.
> + Disable the output shown by --stats, making fast-import usually
> + be silent when it is successful.  However, if the import stream
> + has directives intended to show user output (e.g. `progress`
> + directives), the corresponding messages will still be shown.

Makes sense. I think one could argue that it should disable those
messages, too, but probably the right answer is that the export side
should be told to be `--quiet` as well.

-Peff

[PATCH 03/10] fast-export: use value from correct enum

2018-11-10 Thread Elijah Newren

ABORT and ERROR happen to have the same value, but come from differnt
enums.  Use the one from the correct enum.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 456797c12a..1a299c2a21 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -752,7 +752,7 @@ static void handle_tag(const char *name, struct tag *tag)
tagged_mark = get_object_mark(tagged);
if (!tagged_mark) {
switch(tag_of_filtered_mode) {
-   case ABORT:
+   case ERROR:
die("tag %s tags unexported object; use "
"--tag-of-filtered-object= to handle it",
oid_to_hex(>object.oid));
-- 
2.19.1.866.g82735bcbde

[PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-10 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index ce954be532..677510b7f7 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -119,7 +119,8 @@ marks the same across runs.
'git rev-list', that specifies the specific objects and references
to export.  For example, `master~10..master` causes the
current master reference to be exported along with all objects
-   added since its 10th ancestor commit.
+   added since its 10th ancestor commit and all files common to
+   master\~9 and master~10.
 
 EXAMPLES
 
-- 
2.19.1.866.g82735bcbde

[PATCH 05/10] fast-export: move commit rewriting logic into a function for reuse

2018-11-10 Thread Elijah Newren

Logic to replace a filtered commit with an unfiltered ancestor is useful
elsewhere; put it into a function we can call.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 89de9d6400..a3c044b0af 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -187,6 +187,22 @@ static int get_object_mark(struct object *object)
return ptr_to_mark(decoration);
 }
 
+static struct commit *rewrite_commit(struct commit *p)
+{
+   for (;;) {
+   if (p->parents && p->parents->next)
+   break;
+   if (p->object.flags & UNINTERESTING)
+   break;
+   if (!(p->object.flags & TREESAME))
+   break;
+   if (!p->parents)
+   return NULL;
+   p = p->parents->item;
+   }
+   return p;
+}
+
 static void show_progress(void)
 {
static int counter = 0;
@@ -766,21 +782,12 @@ static void handle_tag(const char *name, struct tag *tag)
oid_to_hex(>object.oid),
type_name(tagged->type));
}
-   p = (struct commit *)tagged;
-   for (;;) {
-   if (p->parents && p->parents->next)
-   break;
-   if (p->object.flags & UNINTERESTING)
-   break;
-   if (!(p->object.flags & TREESAME))
-   break;
-   if (!p->parents) {
-   printf("reset %s\nfrom %s\n\n",
-  name, sha1_to_hex(null_sha1));
-   free(buf);
-   return;
-   }
-   p = p->parents->item;
+   p = rewrite_commit((struct commit *)tagged);
+   if (!p) {
+   printf("reset %s\nfrom %s\n\n",
+  name, sha1_to_hex(null_sha1));
+   free(buf);
+   return;
}
tagged_mark = get_object_mark(>object);
}
-- 
2.19.1.866.g82735bcbde

[PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-10 Thread Elijah Newren

fast-export output is traditionally used as an input to a fast-import
program, but it is also useful to help gather statistics about the
history of a repository (particularly when --no-data is also passed).
For example, two of the types of information we may want to collect
could include:
  1) general information about renames that have occurred
  2) what the biggest objects in a repository are and what names
 they appear under.

The first bit of information can be gathered by just passing -M to
fast-export.  The second piece of information can partially be gotten
from running
git cat-file --batch-check --batch-all-objects
However, that only shows what the biggest objects in the repository are
and their sizes, not what names those objects appear as or what commits
they were introduced in.  We can get that information from fast-export,
but when we only see
R oldname newname
instead of
R oldname newname
M 100644 $SHA1 newname
then it makes the job more difficult.  Add an option which allows us to
force the latter output even when commits have exact renames of files.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 11 ++
 builtin/fast-export.c |  7 +-
 t/t9350-fast-export.sh| 36 +++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index 4e40f0b99a..946a5aee1f 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -128,6 +128,17 @@ marks the same across runs.
for intermediary filters (e.g. for rewriting commit messages
which refer to older commits, or for stripping blobs by id).
 
+--always-show-modify-after-rename::
+   When a rename is detected, fast-export normally issues both a
+   'R' (rename) and a 'M' (modify) directive.  However, if the
+   contents of the old and new filename match exactly, it will
+   only issue the rename directive.  Use this flag to have it
+   always issue the modify directive after the rename, which may
+   be useful for tools which are using the fast-export stream as
+   a mechanism for gathering statistics about a repository.  Note
+   that this option only has effect when rename detection is
+   active (see the -M option).
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index cc01dcc90c..db606d1fd0 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int always_show_modify_after_rename;
 static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
@@ -407,7 +408,8 @@ static void show_filemodify(struct diff_queue_struct *q,
putchar('\n');
 
if (oideq(>oid, >oid) &&
-   ospec->mode == spec->mode)
+   ospec->mode == spec->mode &&
+   !always_show_modify_after_rename)
break;
}
/* fallthrough */
@@ -1099,6 +1101,9 @@ int cmd_fast_export(int argc, const char **argv, const 
char *prefix)
 _excluded_commits, N_("Reference parents 
which are not in fast-export stream by sha1sum")),
OPT_BOOL(0, "show-original-ids", _original_ids,
N_("Show original sha1sums of blobs/commits")),
+   OPT_BOOL(0, "always-show-modify-after-rename",
+   _show_modify_after_rename,
+N_("Always provide 'M' directive after 'R'")),
 
OPT_END()
};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5ad6669910..d0c30672ac 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -638,4 +638,40 @@ test_expect_success 'merge commit gets exported with 
--import-marks' '
)
 '
 
+test_expect_success 'rename detection and --always-show-modify-after-rename' '
+   test_create_repo renames &&
+   (
+   cd renames &&
+   test_seq 0  9  >single_digit &&
+   test_seq 10 98 >double_digit &&
+   git add . &&
+   git commit -m initial &&
+
+   echo 99 >>double_digit &&
+   git mv single_digit single-digit &&
+   git mv double_digit double-digit &&
+   git add double-digit &&
+   git commit -m renames &&
+
+   # First, check normal fast-export -M output
+   git fast-export -M --no-data master >out &&
+
+

[PATCH 04/10] fast-export: avoid dying when filtering by paths and old tags exist

2018-11-10 Thread Elijah Newren

If --tag-of-filtered-object=rewrite is specified along with a set of
paths to limit what is exported, then any tags pointing to old commits
that do not contain any of those specified paths cause problems.  Since
the old tagged commit is not exported, fast-export attempts to rewrite
such tags to an ancestor commit which was exported.  If no such commit
exists, then fast-export currently die()s.  Five years after the tag
rewriting logic was added to fast-export (see commit 2d8ad4691921,
"fast-export: Add a --tag-of-filtered-object  option for newly dangling
tags", 2009-06-25), fast-import gained the ability to delete refs (see
commit 4ee1b225b99f, "fast-import: add support to delete refs",
2014-04-20), so now we do have a valid option to rewrite the tag to.
Delete these tags instead of dying.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  |  9 ++---
 t/t9350-fast-export.sh | 20 
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 1a299c2a21..89de9d6400 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -774,9 +774,12 @@ static void handle_tag(const char *name, struct tag *tag)
break;
if (!(p->object.flags & TREESAME))
break;
-   if (!p->parents)
-   die("can't find replacement commit for 
tag %s",
-oid_to_hex(>object.oid));
+   if (!p->parents) {
+   printf("reset %s\nfrom %s\n\n",
+  name, sha1_to_hex(null_sha1));
+   free(buf);
+   return;
+   }
p = p->parents->item;
}
tagged_mark = get_object_mark(>object);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 6a392e87bc..5bf21b4908 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -325,6 +325,26 @@ test_expect_success 'rewriting tag of filtered out object' 
'
 )
 '
 
+test_expect_success 'rewrite tag predating pathspecs to nothing' '
+   test_create_repo rewrite_tag_predating_pathspecs &&
+   (
+   cd rewrite_tag_predating_pathspecs &&
+
+   touch ignored &&
+   git add ignored &&
+   test_commit initial &&
+
+   git tag -a -m "Some old tag" v0.0.0.0.0.0.1 &&
+
+   echo foo >bar &&
+   git add bar &&
+   test_commit add-bar &&
+
+   git fast-export --tag-of-filtered-object=rewrite --all -- bar 
>output &&
+   grep -A 1 refs/tags/v0.0.0.0.0.0.1 output | grep -E ^from.0{40}
+   )
+'
+
 cat > limit-by-paths/expected << EOF
 blob
 mark :1
-- 
2.19.1.866.g82735bcbde

[PATCH 06/10] fast-export: when using paths, avoid corrupt stream with non-existent mark

2018-11-10 Thread Elijah Newren

If file paths are specified to fast-export and multiple refs point to a
commit that does not touch any of the relevant file paths, then
fast-export can hit problems.  fast-export has a list of additional refs
that it needs to explicitly set after exporting all blobs and commits,
and when it tries to get_object_mark() on the relevant commit, it can
get a mark of 0, i.e. "not found", because the commit in question did
not touch the relevant paths and thus was not exported.  Trying to
import a stream with a mark corresponding to an unexported object will
cause fast-import to crash.

Avoid this problem by taking the commit the ref points to and finding an
ancestor of it that was exported, and make the ref point to that commit
instead.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 13 -
 t/t9350-fast-export.sh | 24 
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index a3c044b0af..5648a8ce9c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -900,7 +900,18 @@ static void handle_tags_and_duplicates(void)
if (anonymize)
name = anonymize_refname(name);
/* create refs pointing to already seen commits */
-   commit = (struct commit *)object;
+   commit = rewrite_commit((struct commit *)object);
+   if (!commit) {
+   /*
+* Neither this object nor any of its
+* ancestors touch any relevant paths, so
+* it has been filtered to nothing.  Delete
+* it.
+*/
+   printf("reset %s\nfrom %s\n\n",
+  name, sha1_to_hex(null_sha1));
+   continue;
+   }
printf("reset %s\nfrom :%d\n\n", name,
   get_object_mark(>object));
show_progress();
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5bf21b4908..dbb560c110 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -386,6 +386,30 @@ test_expect_success 'path limiting with import-marks does 
not lose unmodified fi
grep file0 actual
 '
 
+test_expect_success 'avoid corrupt stream with non-existent mark' '
+   test_create_repo avoid_non_existent_mark &&
+   (
+   cd avoid_non_existent_mark &&
+
+   touch important-path &&
+   git add important-path &&
+   test_commit initial &&
+
+   touch ignored &&
+   git add ignored &&
+   test_commit whatever &&
+
+   git branch A &&
+   git branch B &&
+
+   echo foo >>important-path &&
+   git add important-path &&
+   test_commit more changes &&
+
+   git fast-export --all -- important-path | git fast-import 
--force
+   )
+'
+
 test_expect_success 'full-tree re-shows unmodified files''
git checkout -f simple &&
git fast-export --full-tree simple >actual &&
-- 
2.19.1.866.g82735bcbde

[PATCH 07/10] fast-export: ensure we export requested refs

2018-11-10 Thread Elijah Newren

If file paths are specified to fast-export and a ref points to a commit
that does not touch any of the relevant paths, then that ref would
sometimes fail to be exported.  (This depends on whether any ancestors
of the commit which do touch the relevant paths would be exported with
that same ref name or a different ref name.)  To avoid this problem,
put *all* specified refs into extra_refs to start, and then as we export
each commit, remove the refname used in the 'commit $REFNAME' directive
from extra_refs.  Then, in handle_tags_and_duplicates() we know which
refs actually do need a manual reset directive in order to be included.

This means that we do need some special handling for excluded refs; e.g.
if someone runs
   git fast-export ^master master
then they've asked for master to be exported, but they have also asked
for the commit which master points to and all of its history to be
excluded.  That logically means ref deletion.  Previously, such refs
were just silently omitted from being exported despite having been
explicitly requested for export.

Signed-off-by: Elijah Newren 
---
NOTE: I was hoping the strmap API proposal would materialize, but I either
missed it or it hasn't shown up.  The usage of string_list in this patch
would be better replaced by what Peff suggested.

 builtin/fast-export.c  | 48 +++---
 t/t9350-fast-export.sh | 16 +++---
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 5648a8ce9c..0d0bbd9445 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
+static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
 static int anonymize;
 static struct revision_sources revision_sources;
@@ -611,6 +612,7 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
export_blob(_queued_diff.queue[i]->two->oid);
 
refname = *revision_sources_at(_sources, commit);
+   string_list_remove(_refs, refname, 0);
if (anonymize) {
refname = anonymize_refname(refname);
anonymize_ident_line(, _end);
@@ -814,7 +816,7 @@ static struct commit *get_commit(struct rev_cmdline_entry 
*e, char *full_name)
/* handle nested tags */
while (tag && tag->object.type == OBJ_TAG) {
parse_object(the_repository, >object.oid);
-   string_list_append(_refs, full_name)->util = tag;
+   string_list_append(_refs, full_name)->util = tag;
tag = (struct tag *)tag->tagged;
}
if (!tag)
@@ -873,25 +875,30 @@ static void get_tags_and_duplicates(struct 
rev_cmdline_info *info)
}
 
/*
-* This ref will not be updated through a commit, lets make
-* sure it gets properly updated eventually.
+* Make sure this ref gets properly updated eventually, whether
+* through a commit or manually at the end.
 */
-   if (*revision_sources_at(_sources, commit) ||
-   commit->object.flags & SHOWN)
+   if (e->item->type != OBJ_TAG)
string_list_append(_refs, full_name)->util = 
commit;
+
if (!*revision_sources_at(_sources, commit))
*revision_sources_at(_sources, commit) = 
full_name;
}
+
+   string_list_sort(_refs);
+   string_list_remove_duplicates(_refs, 0);
 }
 
-static void handle_tags_and_duplicates(void)
+static void handle_tags_and_duplicates(struct string_list *extras)
 {
struct commit *commit;
int i;
 
-   for (i = extra_refs.nr - 1; i >= 0; i--) {
-   const char *name = extra_refs.items[i].string;
-   struct object *object = extra_refs.items[i].util;
+   for (i = extras->nr - 1; i >= 0; i--) {
+   const char *name = extras->items[i].string;
+   struct object *object = extras->items[i].util;
+   int mark;
+
switch (object->type) {
case OBJ_TAG:
handle_tag(name, (struct tag *)object);
@@ -912,8 +919,24 @@ static void handle_tags_and_duplicates(void)
   name, sha1_to_hex(null_sha1));
continue;
}
-   printf("reset %s\nfrom :%d\n\n", name,
-  get_object_mark(>object));
+
+   mark = get_object_mark(>object);
+   if (!mark) {
+   /*
+* Getting here means we have a commit which
+

[PATCH 08/10] fast-export: add --reference-excluded-parents option

2018-11-10 Thread Elijah Newren

git filter-branch has a nifty feature allowing you to rewrite, e.g. just
the last 8 commits of a linear history
  git filter-branch $OPTIONS HEAD~8..HEAD

If you try the same with git fast-export, you instead get a history of
only 8 commits, with HEAD~7 being rewritten into a root commit.  There
are two alternatives:

  1) Don't use the negative revision specification, and when you're
 filtering the output to make modifications to the last 8 commits,
 just be careful to not modify any earlier commits somehow.

  2) First run 'git fast-export --export-marks=somefile HEAD~8', then
 run 'git fast-export --import-marks=somefile HEAD~8..HEAD'.

Both are more error prone than I'd like (the first for obvious reasons;
with the second option I have sometimes accidentally included too many
revisions in the first command and then found that the corresponding
extra revisions were not exported by the second command and thus were
not modified as I expected).  Also, both are poor from a performance
perspective.

Add a new --reference-excluded-parents option which will cause
fast-export to refer to commits outside the specified rev-list-args
range by their sha1sum.  Such a stream will only be useful in a
repository which already contains the necessary commits (much like the
restriction imposed when using --no-data).

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 16 ++--
 builtin/fast-export.c | 42 +++
 t/t9350-fast-export.sh| 11 
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index 677510b7f7..2916096bdd 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -110,6 +110,17 @@ marks the same across runs.
the shape of the history and stored tree.  See the section on
`ANONYMIZING` below.
 
+--reference-excluded-parents::
+   By default, running a command such as `git fast-export
+   master~5..master` will not include the commit master\~5 and
+   will make master\~4 no longer have master\~5 as a parent (though
+   both the old master\~4 and new master~4 will have all the same
+   files).  Use --reference-excluded-parents to instead have the
+   the stream refer to commits in the excluded range of history
+   by their sha1sum.  Note that the resulting stream can only be
+   used by a repository which already contains the necessary
+   parent commits.
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
@@ -119,8 +130,9 @@ marks the same across runs.
'git rev-list', that specifies the specific objects and references
to export.  For example, `master~10..master` causes the
current master reference to be exported along with all objects
-   added since its 10th ancestor commit and all files common to
-   master\~9 and master~10.
+   added since its 10th ancestor commit and (unless the
+   --reference-excluded-parents option is specified) all files
+   common to master\~9 and master~10.
 
 EXAMPLES
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 0d0bbd9445..ea9c5b1c00 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -37,6 +37,7 @@ static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
 static int full_tree;
+static int reference_excluded_commits;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -596,7 +597,8 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
message += 2;
 
if (commit->parents &&
-   get_object_mark(>parents->item->object) != 0 &&
+   (get_object_mark(>parents->item->object) != 0 ||
+reference_excluded_commits) &&
!full_tree) {
parse_commit_or_die(commit->parents->item);
diff_tree_oid(get_commit_tree_oid(commit->parents->item),
@@ -638,13 +640,21 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
unuse_commit_buffer(commit, commit_buffer);
 
for (i = 0, p = commit->parents; p; p = p->next) {
-   int mark = get_object_mark(>item->object);
-   if (!mark)
+   struct object *obj = >item->object;
+   int mark = get_object_mark(obj);
+
+   if (!mark && !reference_excluded_commits)
continue;
if (i == 0)
-   printf("from :%d\n", mark);
+   printf("from ");
+   else
+   printf("merge ");
+   if (mark)
+   printf(":%d\n", mark);
else
-

[PATCH 09/10] fast-export: add a --show-original-ids option to show original names

2018-11-10 Thread Elijah Newren

Knowing the original names (hashes) of commits, blobs, and tags can
sometimes enable post-filtering that would otherwise be difficult or
impossible.  In particular, the desire to rewrite commit messages which
refer to other prior commits (on top of whatever other filtering is
being done) is very difficult without knowing the original names of each
commit.

This commit teaches a new --show-original-ids option to fast-export
which will make it add a 'originally ' line to blob, commits, and
tags.  It also teaches fast-import to parse (and ignore) such lines.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt |  7 +++
 builtin/fast-export.c | 20 +++-
 fast-import.c | 17 +
 t/t9350-fast-export.sh| 17 +
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index 2916096bdd..4e40f0b99a 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -121,6 +121,13 @@ marks the same across runs.
used by a repository which already contains the necessary
parent commits.
 
+--show-original-ids::
+   Add an extra directive to the output for commits and blobs,
+   `originally `.  While such directives will likely be
+   ignored by importers such as git-fast-import, it may be useful
+   for intermediary filters (e.g. for rewriting commit messages
+   which refer to older commits, or for stripping blobs by id).
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index ea9c5b1c00..cc01dcc90c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -271,7 +272,10 @@ static void export_blob(const struct object_id *oid)
 
mark_next_object(object);
 
-   printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size);
+   printf("blob\nmark :%"PRIu32"\n", last_idnum);
+   if (show_original_ids)
+   printf("originally %s\n", oid_to_hex(oid));
+   printf("data %lu\n", size);
if (size && fwrite(buf, size, 1, stdout) != 1)
die_errno("could not write blob '%s'", oid_to_hex(oid));
printf("\n");
@@ -628,8 +632,10 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
reencoded = reencode_string(message, "UTF-8", encoding);
if (!commit->parents)
printf("reset %s\n", refname);
-   printf("commit %s\nmark :%"PRIu32"\n%.*s\n%.*s\ndata %u\n%s",
-  refname, last_idnum,
+   printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
+   if (show_original_ids)
+   printf("originally %s\n", oid_to_hex(>object.oid));
+   printf("%.*s\n%.*s\ndata %u\n%s",
   (int)(author_end - author), author,
   (int)(committer_end - committer), committer,
   (unsigned)(reencoded
@@ -807,8 +813,10 @@ static void handle_tag(const char *name, struct tag *tag)
 
if (starts_with(name, "refs/tags/"))
name += 10;
-   printf("tag %s\nfrom :%d\n%.*s%sdata %d\n%.*s\n",
-  name, tagged_mark,
+   printf("tag %s\nfrom :%d\n", name, tagged_mark);
+   if (show_original_ids)
+   printf("originally %s\n", oid_to_hex(>object.oid));
+   printf("%.*s%sdata %d\n%.*s\n",
   (int)(tagger_end - tagger), tagger,
   tagger == tagger_end ? "" : "\n",
   (int)message_size, (int)message_size, message ? message : "");
@@ -1089,6 +1097,8 @@ int cmd_fast_export(int argc, const char **argv, const 
char *prefix)
OPT_BOOL(0, "anonymize", , N_("anonymize output")),
OPT_BOOL(0, "reference-excluded-parents",
 _excluded_commits, N_("Reference parents 
which are not in fast-export stream by sha1sum")),
+   OPT_BOOL(0, "show-original-ids", _original_ids,
+   N_("Show original sha1sums of blobs/commits")),
 
OPT_END()
};
diff --git a/fast-import.c b/fast-import.c
index 95600c78e0..232b6a8b8d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -14,11 +14,13 @@ Format of STDIN stream:
 
   new_blob ::= 'blob' lf
 mark?
+originally?
 file_content;
   file_content ::= data;
 
   new_commit ::= 'commit' sp ref_str lf
 mark?
+originally?
 ('author' (sp name)? sp '<' email '>' sp when lf)?
 'committer' (sp name)? sp '<' email '>' sp when lf

[PATCH 01/10] git-fast-import.txt: fix documentation for --quiet option

2018-11-10 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index e81117d27f..7ab97745a6 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -40,9 +40,10 @@ OPTIONS
not contain the old commit).
 
 --quiet::
-   Disable all non-fatal output, making fast-import silent when it
-   is successful.  This option disables the output shown by
-   --stats.
+   Disable the output shown by --stats, making fast-import usually
+   be silent when it is successful.  However, if the import stream
+   has directives intended to show user output (e.g. `progress`
+   directives), the corresponding messages will still be shown.
 
 --stats::
Display some basic statistics about the objects fast-import has
-- 
2.19.1.866.g82735bcbde

[PATCH 00/10] fast export and import fixes and features

2018-11-10 Thread Elijah Newren

This is a series of ten patches representing two doc corrections, one
pedantic fix, three real bug fixes, one micro code refactor, and three
new features.  Each of these ten changes is relatively small in size.
These changes predominantly affect fast-export, but there's a couple
small changes for fast-import as well.

I could potentially split these patches up, but I'd just end up
chaining them sequentially since otherwise there'd be lots of
conflicts; having 10 different single patch series with lots of
dependencies sounded like a bigger pain to me, but let me know if you
would prefer I split them up and how you suggest doing so.

These patches were driven by the needs of git-repo-filter[1], but most
if not all of them should be independently useful.

Elijah Newren (10):
  git-fast-import.txt: fix documentation for --quiet option
  git-fast-export.txt: clarify misleading documentation about rev-list
args
  fast-export: use value from correct enum
  fast-export: avoid dying when filtering by paths and old tags exist
  fast-export: move commit rewriting logic into a function for reuse
  fast-export: when using paths, avoid corrupt stream with non-existent
mark
  fast-export: ensure we export requested refs
  fast-export: add --reference-excluded-parents option
  fast-export: add a --show-original-ids option to show original names
  fast-export: add --always-show-modify-after-rename

 Documentation/git-fast-export.txt |  33 ++-
 Documentation/git-fast-import.txt |   7 +-
 builtin/fast-export.c | 156 +++---
 fast-import.c |  17 
 t/t9350-fast-export.sh| 124 +++-
 5 files changed, 293 insertions(+), 44 deletions(-)

[1] https://github.com/newren/git-repo-filter if you're really
curious, but * IT HAS SEVERAL SHARP EDGES *.  It isn't really
ready for review/testing/usage/announcing/etc; in fact, it's not quite
WIP/RFC ready.  (Further, it's not clear if it should somehow become
part of core git, should go into contrib, or just remain separate
indefinitely.)  Anyway, please do not attempt to use it for anything
real yet.  I'll send out an email when I think it's closer to ready.

-- 
2.19.1.866.g82735bcbde

Can I Trust You?

2018-11-10 Thread Abel Brent

Dear friend,

I am Abel Brent, a NATO soldier serving in Afghanistan. I and my 
Comrades, we are seeking your assistance to help us 
receive/invest our funds in your country in any lucrative 
business. Please if this proposal is acceptable by you, kindly 
respond back to me for more details.

Thanks and waiting to hear from you


Abel.

Re: Git Reference Manual enhance

2018-11-10 Thread Eric Sunshine

On Sat, Nov 10, 2018 at 7:21 PM Fredi Fowler  wrote:
> Is there any way to create pull request to git man (https://git-scm.com/docs)?

That website is maintained as a project separate from Git, so you can
report issues specific to the website, or create pull requests, at its
project page (https://github.com/git/git-scm.com), however...

> I found there some inconsistencies. For example, almost in all pages
> are using [no-], but at https://git-scm.com/docs/git-merge each
> command (with [no-] or without) write separately.
>
> There are some same inconsistencies that a easy to fix. So, if I can
> sent a pull-request for such fix – inform me.

Those manual pages are generated from documentation sources in the Git
project, thus those fixes should be submitted to Git itself, not to
the website project. Changes to Git are sent to this mailing list as
patches (see Documentation/SubmittingPatches), although see
gitgitgadget[1], which acts as a Github
pull-request-to-Git-mailing-list gateway, as a possible alternative.

[1]: https://github.com/gitgitgadget/gitgitgadget/blob/master/README.md

Git Reference Manual enhance

2018-11-10 Thread Fredi Fowler

Is there any way to create pull request to git man (https://git-scm.com/docs)?

I found there some inconsistencies. For example, almost in all pages
are using [no-], but at https://git-scm.com/docs/git-merge each
command (with [no-] or without) write separately.

There are some same inconsistencies that a easy to fix. So, if I can
sent a pull-request for such fix – inform me.

Re: [PATCH] p3400: replace calls to `git checkout -b' by `git checkout -B'

2018-11-10 Thread Johannes Schindelin

Hi Alban,

On Fri, 9 Nov 2018, Alban Gruin wrote:

> p3400 makes a copy of the current repository to test git-rebase
> performance, and creates new branches in the copy with `git checkout
> -b'.  If the original repository has branches with the same name as the
> script is trying to create, this operation will fail.
> 
> This replaces these calls by `git checkout -B' to force the creation and
> update of these branches.

Both the explanation and the patch make sense to me.

Thanks,
Dscho

> 
> Signed-off-by: Alban Gruin 
> ---
>  t/perf/p3400-rebase.sh | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/t/perf/p3400-rebase.sh b/t/perf/p3400-rebase.sh
> index ce271ca4c1..d202aaed06 100755
> --- a/t/perf/p3400-rebase.sh
> +++ b/t/perf/p3400-rebase.sh
> @@ -6,9 +6,9 @@ test_description='Tests rebase performance'
>  test_perf_default_repo
>  
>  test_expect_success 'setup rebasing on top of a lot of changes' '
> - git checkout -f -b base &&
> - git checkout -b to-rebase &&
> - git checkout -b upstream &&
> + git checkout -f -B base &&
> + git checkout -B to-rebase &&
> + git checkout -B upstream &&
>   for i in $(seq 100)
>   do
>   # simulate huge diffs
> @@ -35,8 +35,8 @@ test_perf 'rebase on top of a lot of unrelated changes' '
>  
>  test_expect_success 'setup rebasing many changes without split-index' '
>   git config core.splitIndex false &&
> - git checkout -b upstream2 to-rebase &&
> - git checkout -b to-rebase2 upstream
> + git checkout -B upstream2 to-rebase &&
> + git checkout -B to-rebase2 upstream
>  '
>  
>  test_perf 'rebase a lot of unrelated changes without split-index' '
> -- 
> 2.19.1
> 
>

Re: [PATCH] coccicheck: introduce 'pending' semantic patches

2018-11-10 Thread Martin Ågren

On Sat, 10 Nov 2018 at 01:10, Stefan Beller  wrote:
> I dialed back on the workflow, as we may want to explore it first
> before writing it down.

Makes sense.

FWIW, this iteration looks good to me.

Martin

[PATCH] Fix broken command-list.h generation with core.autocrlf

2018-11-10 Thread Nguyễn Thái Ngọc Duy

The script generate-cmdlist.sh needs input text files in UNIX line
ending to work correctly. It's been fine even with core.autocrlf set
because Documentation/git-*.txt is forced LF conversion.

But this leaves out gitk.txt and also Documentation/*config.txt that
recently becomes new input for this script. Update the attribute file
to force LF on all *.txt files to be on the safe side.

For more details, please see 00ddc9d13c (Fix build with
core.autocrlf=true - 2017-05-09)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Some tests must have broken out of the test repo and set
 core.autocrlf on my $GIT_DIR/config. Fun was not had.

 .gitattributes | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitattributes b/.gitattributes
index 49b3051641..acf853e029 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -5,7 +5,7 @@
 *.pl eof=lf diff=perl
 *.pm eol=lf diff=perl
 *.py eol=lf diff=python
-/Documentation/git-*.txt eol=lf
+/Documentation/**/*.txt eol=lf
 /command-list.txt eol=lf
 /GIT-VERSION-GEN eol=lf
 /mergetools/* eol=lf
-- 
2.19.1.1231.g84aef82467

Re: [PATCH v5 10/12] Add a base implementation of SHA-256 support

2018-11-10 Thread Ævar Arnfjörð Bjarmason



On Wed, Nov 07 2018, brian m. carlson wrote:

> On Mon, Nov 05, 2018 at 12:39:14PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> On Sun, Nov 04 2018, brian m. carlson wrote:
>> > +  {
>> > +  "sha256",
>> > +  /* "s256", big-endian */
>>
>> The existing entry/comment for sha1 is:
>>
>>  "sha1",
>>  /* "sha1", big-endian */
>>
>> So why the sha256/s256 difference in the code/comment? Wondering if I'm
>> missing something and we're using "s256" for something.
>
> Ah, good question.  The comment refers to the format_id field which
> follows this comment.  The value is the big-endian representation of
> "s256" as a 32-bit value.  I picked that over "sha2" to avoid confusion
> in case we add SHA-512 in the future, since that's also an SHA-2
> algorithm.
>
> Config files and command-line interfaces will use "sha1" or "sha256",
> and binary formats will use those 32-bit values ("sha1" or "s256").

Okey.

>> >  const char *empty_tree_oid_hex(void)
>> > diff --git a/sha256/block/sha256.c b/sha256/block/sha256.c
>> > [...]
>>
>> I had a question before about whether we see ourselves perma-forking
>> this implementation based off libtomcrypt, as I recall you said yes.
>
> Yes.
>
>> Still, I think it would be better to introduce this in at least two-four
>> commits where the upstream code is added as-is, then trimmed down to
>> size, then adapted to our coding style, and finally we add our own
>> utility functions.
>
> At this point, the only code that's actually used from libtomcrypt is
> the block transform.  The upstream code is split over multiple files in
> multiple directories and won't compile in our codebase without many
> files and a lot of work, so I don't feel good about either including
> code that doesn't compile or including large numbers of files that don't
> meet our coding standards (and that may still not compile because of
> platform issues).
>
>> It'll make it easier to forward-port any future upstream changes.
>
> I don't foresee many, if any, changes to this code.  It either
> implements the specification or it doesn't, and it's objectively easy to
> determine which.  There's not even an argument to port performance
> improvements, since almost everyone will be using a crypto library to
> provide this code because libraries perform so dramatically better.
> I've tried to not make the code perform worse than it did originally,
> but that's it.
>
> Furthermore, the modified code carries a relatively small amount of
> resemblance to the original, so if we did port changes forward, we'd
> probably have conflicts.
>
> It seems like you really want to include the upstream code as a separate
> commit and I understand where you're coming from with wanting to have
> this split out into logical commits, but due to the specific nature of
> this code, I see a lot of downsides and not a lot of upsides.

Yeah sorry to keep bringing this up. Your way makes sense, and I'd
forgotten the details since last time . I'll shut up about it:)

>> > +  perl -E "for (1..10) { print q{aa}; }" | \
>> > +  test-tool sha256 >actual &&
>> > +  grep cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0 
>> > actual &&
>> > +  perl -E "for (1..10) { print q{abcdefghijklmnopqrstuvwxyz}; }" | \
>> > +  test-tool sha256 >actual &&
>>
>> I've been wanting to make use depend on perl >= 5.10 (previous noises
>> about that on-list), but for now we claim to support >=5.8, which
>> doesn't have the -E switch.
>
> Good point.  I'll fix that.  After having written a lot of one-liners,
> I always write -E, and this was originally a one-liner.
>
>> But most importantly you aren't even using -E features here, and this
>> isn't very idoimatic Perl. Instead do, respectively:
>>
>> perl -e 'print q{aa} x 10'
>> perl -e "print q{abcdefghijklmnopqrstuvwxyz} x 10"
>
> I considered the more idiomatic version originally, but the latter could
> allocate a decent amount of memory in one chunk, and I wanted to avoid
> that.

~2.5MB for the latter, so trivial.

> I think what I'd like to do, actually, is turn on autoflush and
> use a postfix for, which would be more idiomatic and could potentially
> provide better testing of the chunking code.  I'll add a comment to that
> effect.

Okey. Maybe better to use syswrite() instead, or maybe print with
autoflush is more idiomatic, I don't know.

Re: [PATCH v2 12/16] parse-options: replace opterror() with optname()

2018-11-10 Thread Ramsay Jones




On 10/11/2018 04:55, Duy Nguyen wrote:
> On Tue, Nov 6, 2018 at 3:07 PM Ramsay Jones  
> wrote:
>> Also, this patch does not replace opterror() calls outside of
>> the 'parse-options.c' file with optname(). This tickles my
>> static-check.pl script, since optname() is an external function
>> which is only called from 'parse-options.c'.
>>
>> So, at present, optname() could be marked as a local 'static'
>> symbol. However, I could also imagine it being used by new callers
>> outside of 'parse-options.c' in the future. (maybe) Your call. ;-)
> 
> I was making it static, but the compiler complained about undefined
> function and I would need to either move optname() up in
> parse-options.c or add a forward declaration (but I was going to
> _remove_ the declaration!)
> 
> Since it could be potentially used by Jeff's series (and I think it
> does have potential in parse-options-cb.c), I'll just leave it
> exported and caress your static-check.pl script

Fair enough.

>(how did it not catch
> optbug() not being used outside parse-options.c either)?

It did, some time ago (presumably, I haven't checked). Which is to
say: the output from the master branch notes it:

$ grep parse-options sc
parse-options.o - optbug
$ 

... but if it gets to the master branch I tend to forget it. (I have
been meaning to go through the 'sc' file and clean out some of the
'easy' cases).

So, if optname() doesn't get any new callers, I will watch it go
from 'pu' to 'next' and then to 'master' and ... ;-)

ATB,
Ramsay Jones

Re: [RFC PATCH] index-pack: improve performance on NFS

2018-11-10 Thread Ævar Arnfjörð Bjarmason

On Fri, Nov 09 2018, Duy Nguyen wrote:

> On Fri, Nov 9, 2018 at 2:46 PM Ævar Arnfjörð Bjarmason  
> wrote:
>> I'm planning to re-submit mine with some minor changes after the great
>> Documentation/config* move lands.
>>
>> As noted in
>> https://public-inbox.org/git/87bm7clf4o@evledraar.gmail.com/ and
>> https://public-inbox.org/git/87h8gq5zmc@evledraar.gmail.com/ I think
>> it's regardless of Jeff's optimization is. O(nothing) is always faster
>> than O(something), particularly (as explained in that E-Mail) on NFS.
>
> Is it really worth adding more code to maintain just to shave a couple
> seconds (or a few percent clone time)?

Yeah I think so, because (in rough priority order):

a) The maintenance burden of carrying core.checkCollisions is trivial,
   and it's hard to imagine a scenario where it'll be difficult to
   selectively turn off some does_this_collide() function.

b) I think I need to worry more about a meteorite colliding with the
   datacenter than the threat this check is trying to guard against.

c) I think we should just turn it off by default on SHA-1, but don't
   expect that argument will carry the day. But I expect even those who
   think we still need it will have a hard time making that argument in
   the case of SHA-256. So having the codepath to disable it is helpful.

d) As shown in the linked E-Mails of mine you sometimes pay a 2-3 second
   *fixed* cost even for a very small (think ~100-200 objects) push/fetch
   that would otherwise take milliseconds with Jeff's version of this
   optimization (and not with mine). This can be a hundred/thousands of
   percent slowdown.

   Is that a big deal in itself in terms of absolute time spent? No. But
   I'm also thinking about this from the perspective of getting noise
   out of performance metrics. Some of this slowdown is also "user
   waiting for the terminal to be usable again" not just some machine
   somewhere wasting its own time.

e) As shown in the patch I have this direction as a very beneficial
   side-effect makes it much easier to repair corrupt
   repositories. Something I'm hoping to pursue even further. I've had
   cases where core.checkCollisions=false + stuff on top would have made
   repairing a broken repo much easier.

Anyway, I'm in no rush to send my patch. I'm happily using it in
production, but will wait for Jeff's be ready and to land before picking
it up again. Just wanted to do a braindump of the benefits.

[PATCH/RFC] checkout: print something when checking out paths

2018-11-10 Thread Nguyễn Thái Ngọc Duy

One of the problems with "git checkout" is that it does so many
different things and could confuse people specially when we fail to
handle ambiguation correctly.

One way to help with that is tell the user what sort of operation is
actually carried out. When switching branches, we always print
something [1].  Checking out paths however is silent. Print something
so that if we got the user intention wrong, they won't waste too much
time to find that out.

Since the purpose of printing this is to help disambiguate. Only do it
when "--" is missing (the actual reason though is many tests check
empty stderr to see that no error is raised and I'm too lazy to fix
all the test cases).

[1] Knowing the number of paths updated could also be useful even in
normal case.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 This is related to another patch in
 
 https://public-inbox.org/git/20181110120707.25846-1-pclo...@gmail.com/T/#u

 While writing that patch I thought printing something would also
 help. But if we get ambiguation right (in that particular case) then
 we don't actually need this. But it still seems a good idea...

 apply.c  |  3 ++-
 builtin/checkout-index.c |  6 --
 builtin/checkout.c   | 30 ++
 builtin/difftool.c   |  2 +-
 cache.h  |  4 ++--
 entry.c  | 10 ++
 unpack-trees.c   |  6 +++---
 7 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/apply.c b/apply.c
index 073d5f0451..5876b02197 100644
--- a/apply.c
+++ b/apply.c
@@ -3352,7 +3352,8 @@ static int checkout_target(struct index_state *istate,
 
costate.refresh_cache = 1;
costate.istate = istate;
-   if (checkout_entry(ce, , NULL) || lstat(ce->name, st))
+   if (checkout_entry(ce, , NULL, NULL) ||
+   lstat(ce->name, st))
return error(_("cannot checkout %s"), ce->name);
return 0;
 }
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 88b86c8d9f..bada491f58 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -67,7 +67,8 @@ static int checkout_file(const char *name, const char *prefix)
continue;
did_checkout = 1;
if (checkout_entry(ce, ,
-   to_tempfile ? topath[ce_stage(ce)] : NULL) < 0)
+  to_tempfile ? topath[ce_stage(ce)] : NULL,
+  NULL) < 0)
errs++;
}
 
@@ -111,7 +112,8 @@ static void checkout_all(const char *prefix, int 
prefix_length)
write_tempfile_record(last_ce->name, prefix);
}
if (checkout_entry(ce, ,
-   to_tempfile ? topath[ce_stage(ce)] : NULL) < 0)
+  to_tempfile ? topath[ce_stage(ce)] : NULL,
+  NULL) < 0)
errs++;
last_ce = ce;
}
diff --git a/builtin/checkout.c b/builtin/checkout.c
index acdafc6e4c..13ed041dc1 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -44,6 +44,7 @@ struct checkout_opts {
int ignore_skipworktree;
int ignore_other_worktrees;
int show_progress;
+   int count_checkout_paths;
/*
 * If new checkout options are added, skip_merge_working_tree
 * should be updated accordingly.
@@ -165,12 +166,13 @@ static int check_stages(unsigned stages, const struct 
cache_entry *ce, int pos)
 }
 
 static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
- const struct checkout *state)
+ const struct checkout *state, int *nr_checkouts)
 {
while (pos < active_nr &&
   !strcmp(active_cache[pos]->name, ce->name)) {
if (ce_stage(active_cache[pos]) == stage)
-   return checkout_entry(active_cache[pos], state, NULL);
+   return checkout_entry(active_cache[pos], state,
+ NULL, nr_checkouts);
pos++;
}
if (stage == 2)
@@ -179,7 +181,7 @@ static int checkout_stage(int stage, const struct 
cache_entry *ce, int pos,
return error(_("path '%s' does not have their version"), 
ce->name);
 }
 
-static int checkout_merged(int pos, const struct checkout *state)
+static int checkout_merged(int pos, const struct checkout *state, int 
*nr_checkouts)
 {
struct cache_entry *ce = active_cache[pos];
const char *path = ce->name;
@@ -242,7 +244,7 @@ static int checkout_merged(int pos, const struct checkout 
*state)
ce = make_transient_cache_entry(mode, , path, 2);
if (!ce)
die(_("make_cache_entry failed for path '%s'"), path);
-   status = checkout_entry(ce, state, NULL);
+   status = checkout_entry(ce, state, NULL, nr_checkouts);

[PATCH] checkout: disambiguate dwim tracking branches and local files

2018-11-10 Thread Nguyễn Thái Ngọc Duy

When checkout dwim is added in [1], it is restricted to only dwim when
certain conditions are met and fall back to default checkout behavior
otherwise. It turns out falling back could be confusing. One of the
conditions to turn

git checkout frotz

to

git checkout -b frotz origin/frotz

is that frotz must not exist as a file. But when the user comes to
expect "git checkout frotz" to create the branch "frotz" and there
happens to be a file named "frotz", git's silently reverting "frotz"
file content is not helping. This is reported in Git mailing list [2]
and even used as an example of "Git is bad" elsewhere [3].

We normally try to do the right thing, but when there are multiple
"right things" to do, it's best to leave it to the user to decide.
Check this case, ask the user to use "--" to disambiguate.

[1] 70c9ac2f19 (DWIM "git checkout frotz" to "git checkout -b frotz
origin/frotz" - 2009-10-18)
[2] 
https://public-inbox.org/git/CACsJy8B2TVr1g+k+eSQ=pbeo3wn4_ltglo9gpur8x7z9gof...@mail.gmail.com/
[3] https://news.ycombinator.com/item?id=18230655

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/checkout.c   | 15 ---
 t/t2024-checkout-dwim.sh | 31 +++
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index acdafc6e4c..17d48166d1 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1079,9 +1079,15 @@ static int parse_branchname_arg(int argc, const char 
**argv,
 */
int recover_with_dwim = dwim_new_local_branch_ok;
 
-   if (!has_dash_dash &&
-   (check_filename(opts->prefix, arg) || !no_wildcard(arg)))
-   recover_with_dwim = 0;
+   /*
+* If both refs/remotes/origin/master and the file
+* 'master'. Don't blindly go for 'master' file
+* because it's ambiguous. Leave it for the user to
+* decide.
+*/
+   int disambi_local_file = !has_dash_dash &&
+   (check_filename(opts->prefix, arg) || 
!no_wildcard(arg));
+
/*
 * Accept "git checkout foo" and "git checkout foo --"
 * as candidates for dwim.
@@ -1094,6 +1100,9 @@ static int parse_branchname_arg(int argc, const char 
**argv,
const char *remote = unique_tracking_name(arg, rev,
  
dwim_remotes_matched);
if (remote) {
+   if (disambi_local_file)
+   die(_("'%s' could be both a local file 
and a tracking branch.\n"
+ "Please use -- to disambiguate"), 
arg);
*new_branch = arg;
arg = remote;
/* DWIMmed to create local branch, case (3).(b) 
*/
diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh
index 69b6774d10..fa0718c730 100755
--- a/t/t2024-checkout-dwim.sh
+++ b/t/t2024-checkout-dwim.sh
@@ -278,4 +278,35 @@ test_expect_success 'loosely defined local base branch is 
reported correctly' '
test_cmp expect actual
 '
 
+test_expect_success 'reject when arg could be part of dwim branch' '
+   git remote add foo file://non-existent-place &&
+   git update-ref refs/remotes/foo/dwim-arg HEAD &&
+   echo foo >dwim-arg &&
+   git add dwim-arg &&
+   echo bar >dwim-arg &&
+   test_must_fail git checkout dwim-arg &&
+   test_must_fail git rev-parse refs/heads/dwim-arg -- &&
+   grep bar dwim-arg
+'
+
+test_expect_success 'disambiguate dwim branch and checkout path (1)' '
+   git update-ref refs/remotes/foo/dwim-arg1 HEAD &&
+   echo foo >dwim-arg1 &&
+   git add dwim-arg1 &&
+   echo bar >dwim-arg1 &&
+   git checkout -- dwim-arg1 &&
+   test_must_fail git rev-parse refs/heads/dwim-arg1 -- &&
+   grep foo dwim-arg1
+'
+
+test_expect_success 'disambiguate dwim branch and checkout path (2)' '
+   git update-ref refs/remotes/foo/dwim-arg2 HEAD &&
+   echo foo >dwim-arg2 &&
+   git add dwim-arg2 &&
+   echo bar >dwim-arg2 &&
+   git checkout dwim-arg2 -- &&
+   git rev-parse refs/heads/dwim-arg2 -- &&
+   grep bar dwim-arg2
+'
+
 test_done
-- 
2.19.1.1231.g84aef82467

I need a partner

2018-11-10 Thread Khalid Rehman

I need urgent partner for urgent business

My Greetings

2018-11-10 Thread Mrs. Marianne Jeanne

Beloved,
I am writing this mail to you with heavy tears in my eyes and great
sorrow in my heart.  As I informed you earlier, I am (Mrs.)Marianne 
Jeanne, 
suffering from long time Cancer. From all indications 
my condition is really deteriorating and it's quite obvious 
that I won't live more than 2 months according to my doctors.

I have some funds I inherited from my late loving husband Mr. Jeanne
Smith, the sum of ($8.5000.000) which he deposited in a Bank. I need
a very honest and God fearing person that can use these funds for
Charity work, helping the Less Privileges, and 20% of this money will
be for your time and expenses, while 80% goes to charities.

Please let me know if I can TRUST YOU ON THIS to carry out this favour
for me.  I look forward to your prompt reply for more details.

Yours sincerely
Marianne Jeanne

39 matches

Mail list logo