Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-24 Thread Junio C Hamano
Fengguang Wu  writes:

>> This starts to sound more like something you would want to write in
>> the cover letter, or the trailer block next to Signed-off-by: at the
>> end of the first patch in the series.
>
> Yes, that's roughly what the current patch does, except in the latter
> case we add new info after diffstat.
>
>> Or even after the mail
>> signature at the very end of the message (incidentally that would
>> probably minimize the damage to the Git codebase needed for this
>> addition--you should be able to do this without touching anything
>> other than builtin/log.c).
>
> That's an interesting place. It looks worth trying. 

Here is an outline of such a structure.  The idea is for a history
of this shape, where "P" is the well-known public commit (e.g. one
in Linus's tree), "X", "Y", "Z" are prerequisite patches in flight,
and "A", "B", "C" are the work being sent out for testing,

 ---P---X---Y---Z---A---B---C

the submitter would say "format-patch --base=P -3 C" (or variants
thereof, e.g. with "--cover" or using "Z..C" instead of "-3 C" to
specify the range), and the identifiers for P, X, Y, Z are appended
at the end of the _first_ message (either the cover letter or the
first patch in the series).

You are welcome to steal this as a skeleton and take authorship, as
I do not plan to spend too much more time on actually implementing
prepare_bases() function myself unless I run out of things to do and
get bored, but I think there are already sufficient helper functions
to do this (you may want to make commit_patch_id() in patch-ids.c a
public helper function).

 builtin/log.c | 65 +++
 1 file changed, 65 insertions(+)

diff --git a/builtin/log.c b/builtin/log.c
index 069bd3a..8f6cc17 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1182,6 +1182,59 @@ static int from_callback(const struct option *opt, const 
char *arg, int unset)
return 0;
 }
 
+struct base_tree_info {
+   struct object_id base_commit;
+   int nr_patch_id, alloc_patch_id;
+   struct object_id *patch_id;
+};
+
+static void prepare_bases(struct base_tree_info *bases,
+ const char *base_commit,
+ struct commit **list)
+{
+   /*
+* If you are given --base=$commit command line argument, that
+* names commit 'P' when you are formatting A, B and C in this
+* history (i.e. "format-patch --base=P Z..C"):
+*
+* ---P---X---Y---Z---A---B---C
+*
+* where "P" is the well-known public commit (e.g. one in
+* Linus's tree), "X", "Y", "Z" are prerequisite patches in
+* flight whose patch-ids are well-known, and "A", "B", "C"
+* are the work being sent out for testing, you would compute
+* base_commit=P, patch_ids[] = map(commit_patch_id, [X, Y,
+* Z]) here and stuff them in bases structure.
+*
+* Be sure to check error (e.g. invalid base_commit) and die()
+* here as necessary.
+*/
+   
+}
+
+static void print_bases(struct base_tree_info *bases)
+{
+   int i;
+
+   /* Only do this once, either for the cover or for the first one */
+   if (is_null_oid(>base_commit))
+   return;
+
+   printf("** base-commit-info **\n");
+
+   /* Show the base commit */
+   printf("base-commit: %s\n", oid_to_hex(>base_commit));
+
+   /* Show the prerequisite patches */
+   for (i = 0; i < bases->nr_patch_id; i++)
+   printf("base-patch-id: %s\n", oid_to_hex(>patch_id[i]));
+
+   free(>patch_id);
+   bases->nr_patch_id = 0;
+   bases->alloc_patch_id = 0;
+   oidclr(>base_commit);
+}
+
 int cmd_format_patch(int argc, const char **argv, const char *prefix)
 {
struct commit *commit;
@@ -1205,6 +1258,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
int reroll_count = -1;
char *branch_name = NULL;
char *from = NULL;
+   char *base_commit = NULL;
+   struct base_tree_info bases;
+
const struct option builtin_format_patch_options[] = {
{ OPTION_CALLBACK, 'n', "numbered", , NULL,
N_("use [PATCH n/m] even with a single patch"),
@@ -1265,6 +1321,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
PARSE_OPT_OPTARG, thread_callback },
OPT_STRING(0, "signature", , N_("signature"),
N_("add a signature")),
+   OPT_STRING(0, "base", _commit, N_("base-commit"),
+  N_("add prerequisite tree info to the patch 
series")),
OPT_FILENAME(0, "signature-file", _file,
N_("add a signature from a file")),
OPT__QUIET(, N_("don't print the patch filenames")),
@@ -1496,6 +1554,11 @@ int cmd_format_patch(int argc, const char **argv, 

Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-24 Thread Michael J Gruber
Stefan Beller venit, vidit, dixit 23.02.2016 23:21:
> On Tue, Feb 23, 2016 at 12:46 PM, H. Peter Anvin  wrote:
>> On February 23, 2016 12:35:12 PM PST, Junio C Hamano  
>> wrote:
>>> ebied...@xmission.com (Eric W. Biederman) writes:
>>>
 Junio C Hamano  writes:

> It is valuable for a testing organization to say "We tested this
> series on top of version X.  We know it works, we have tested on a
> lot more hardware than the original developer had, we know this is
> good to go."  It is a valuable service.
>
> But that is valuable only if version X is still relevant, isn't it?
>
> Is the relevance of a version something that is decided by a
> developer who submits a patch series, or is it more of an attribute
> of the project and where the current integration is happening?
> Judging from the responses from Dan to this thread, I think the
> answer is the latter, and for the purpose of identifying the
> relevant version(s), the project does not even care about the exact
> commit, but it wants to know more about which branch the series is
> targetted to.
>
> With that understanding, I find it hard to believe that it buys the
> project much for the "base" commit to be recorded in a patch series
> and automated testing is done by applying the patches to that exact
> commit, which possibly is no-longer-relevant, even though it may
> help shielding the testing machinery from "you tested with a wrong
> version" complaints.
>
> Isn't it more valuable for the test robot to say "this may or may
> not have worked well with whatever old version the patch series was
> based on, but it no longer is useful to the current tip of the
> 'master'"?  If you consider what benefit the project would gain by
> having such a robot, that is the conclusion I have to draw.
>
> So I still am not convinced that this "record base commit" is a
> useful thing to do.

 So I don't know what value this has to the git project.
>>>
>>> Oh, don't worry, I wasn't talking about value this may have to the
>>> Git project at all.  "The value to the project" I mentioned in my
>>> response was all about the value to the kernel project.
>>>
 The value of Fengguag's automated testing to the kernel project is to
 smoke out really stupid things.
>>>
>>> I'll snip your bullet points, but as I wrote, I do understand the
>>> value of prescreening test.
>>>
>>> What I was questioning was what value it gives to that testing to
>>> use "the developer based this patch on this exact commit" added to
>>> each incoming patch, and have Fengguag's testing machinery test a
>>> patch with a base version that may no longer be relevant in the
>>> context of the project.  Compared to that, wouldn't "this no longer
>>> applies to the branch the series targets" or "this still applies
>>> cleanly, but no longer compiles because the surrounding API has
>>> changed" be much more valuable?
>>>
>>> In your other message, you mentioned the "index $old..$new" lines,
>>> and it is possible to use them to find a tree that the patch cleanly
>>> applies to, but it will not uniquely identify _the_ version the
>>> patch submitter used.  IMHO, finding such _a_ tree from the recent
>>> history of the branch that the patch targets and testing the patch
>>> on top of that tree (as opposed to testing the patch in the exact
>>> context the developer worked on) would give the project a better
>>> value.
>>
>> Personally, as a maintainer, I would love to see the tree ID and ideally 
>> also the commit ID a series is based on.  The commit ID is in some ways less 
>> useful since it is non-recreatable (and therefore will never match for 
>> anything but the first patch of a series), but could be useful to the 
>> maintainer.
> 
> As a contributor a commit id would also add value I would think. When
> it is unclear
> where a series is headed, contributors in Git land say things like:
> 
> This applies cleanly on origin/master.
> 
> And usually this is the master branch from yesterday as Junio pushes
> once a day. origin/master being a moving target, so it may not apply any more,
> so a commit sha1 would help for finding out what to do in the maintainer role.
> 
> This discussion sounds to me as if we'd want to have some advantages of
> the (kernel pull style, not github style) pull-request here for patch
> submissions.
> 
> I don't remember the exact quote, but Linus said once upon a time about the
> pull request workflow roughly:
> 
> "Please pull from ... And by the way I tested this software for 2
> month during development"
> (For kernels that makes sense as the contributor run the kernel
> and it worked)
> 
> as pull requests have the new patches on top of the exact parents the
> contributor put them
> on, there can be more faith in the process to divide between the
> problems the 

Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
On Tue, Feb 23, 2016 at 10:30:04PM -0800, Junio C Hamano wrote:
> Fengguang Wu  writes:
> 
> > The necessary lines for the robot are
> >
> > base commit:
> > base patch-id:
> > or
> > base tree-id:
> > base patch-id:
> 
> I will not repeat why a commit object name would be more appropriate
> than a tree object name here (please see my response to HPA).

Yes I see that reasoning in your other email.

> > The "base tree-id" will be useful if the submitted patchset is based
> > on a public (maintainer) commit.
> >
> > The "base patch-id" will be useful if the submitted patchset is based
> > on another patchset someone (likely the developer himself) posted to
> > the mailing list.
> 
> Is there a database of in-flight patches indexed by their patch-ids
> with a large enough coverage (hopefully those who maintain such a

Yes, the 0day robot internally maintains such a patch-id => commit-id
(of the below git tree) database for in-flight patches.

We exported a git tree which holds all in-flight patches, where each
patchset maps to a new branch:

https://github.com/0day-ci/linux/branches

We monitor dozens of linux kernel mailing lists, the coverage is
pretty good for the linux kernel project.

> database are using the --stable version of the patch-id for indexing
> the patches)?

Right, we do use the --stable option.

> I am wondering how well this scales, especially if a
> well-known commit named by "base commit" needs to be checked out and
> then many in-flight patches identified by "base patch-id"s need to
> be applied on top of it, to prepare the tree-ish the patch being
> evaluated can be applied to.

The database is effectively a key-value store, in the scale of 1000
new mappings per day. If we only keep 100 days data, there will be
100k mappings, which could be hold in 10MB memory.

> This starts to sound more like something you would want to write in
> the cover letter, or the trailer block next to Signed-off-by: at the
> end of the first patch in the series.

Yes, that's roughly what the current patch does, except in the latter
case we add new info after diffstat.

> Or even after the mail
> signature at the very end of the message (incidentally that would
> probably minimize the damage to the Git codebase needed for this
> addition--you should be able to do this without touching anything
> other than builtin/log.c).

That's an interesting place. It looks worth trying. 

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Junio C Hamano
Fengguang Wu  writes:

> The necessary lines for the robot are
>
> base commit:
> base patch-id:
> or
> base tree-id:
> base patch-id:

I will not repeat why a commit object name would be more appropriate
than a tree object name here (please see my response to HPA).

> The "base tree-id" will be useful if the submitted patchset is based
> on a public (maintainer) commit.
>
> The "base patch-id" will be useful if the submitted patchset is based
> on another patchset someone (likely the developer himself) posted to
> the mailing list.

Is there a database of in-flight patches indexed by their patch-ids
with a large enough coverage (hopefully those who maintain such a
database are using the --stable version of the patch-id for indexing
the patches)?  I am wondering how well this scales, especially if a
well-known commit named by "base commit" needs to be checked out and
then many in-flight patches identified by "base patch-id"s need to
be applied on top of it, to prepare the tree-ish the patch being
evaluated can be applied to.

This starts to sound more like something you would want to write in
the cover letter, or the trailer block next to Signed-off-by: at the
end of the first patch in the series.  Or even after the mail
signature at the very end of the message (incidentally that would
probably minimize the damage to the Git codebase needed for this
addition--you should be able to do this without touching anything
other than builtin/log.c).


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Junio C Hamano
"H. Peter Anvin"  writes:

> Personally, as a maintainer, I would love to see the tree ID and
> ideally also the commit ID a series is based on.  The commit ID is
> in some ways less useful since it is non-recreatable (and
> therefore will never match for anything but the first patch of a
> series), but could be useful to the maintainer.

I admit that the very first "applies-to" proposal I made long time
ago was based on a tree object name, not a commit object name like
the proposal under discussion here, but I doubt that a tree object
name is that much more useful than a commit object name in this
context.

Below, I assume that you are envisioning that the "base tree"
recorded in a patch does not necessarily name a public, well-known
tree (e.g. a tree-ish that already appears in Linus's tree for those
who work with his tree, or other relevant trees like linux-next or
net tree) [*1*]. It would name an unknown tree that results by
applying a set of well-known patches in-flight on a public
well-known commit.  In that set-up, because you cannot guess
committer identity and timestamp that are used by the patch
submitter when these in-flight patches are applied to prepare the
base for these private commits, a commit object name is useless, but
it may still be possible for you to independently compute these
trees that would result from set of well-known in-flight patches.

But I do not think "it may be possible" above translates to
usefulness in practice.

Suppose we have only three well-known in-flight patches that are
unrelated and independent, and you somehow know that the patch
submitter built the first patch in the series by working on either a
recently tagged commit (say v4.4) or a result of applying some of
these in-flight patches on top of that commit.  Even with these
three commits, the base tree the patch submitter based his or her
work on could be v4.4 itself, v4.4 plus one of the three patches
(v4.4+A, v4.4+B, v4.4+C, three possibilities in total), v4.4 plus
two of the three patches in some order (v4.4+AB, v4.4+BC, v4.4+CA,
three possibilities in total) or v4.4 plus all of the three patches,
so there are 8 possible top-level tree objects in total.  Unless you
are doing something unusual [*2*], even if you have all of these
three well-known in-flight patches in your repository, you would
have only a subset of them (you would certainly have v4.4, and v4.4
plus all three patches, but you would likely to have only one path
between these two points, that's four commits recording four trees,
out of possible 8).

In the real world, of course you have far more than three well-known
in-flight patches, so even though in theory trees may have better
chance to be "figured out", I do not think it is practical to even
attempt to "figure out" an unknown state given a tree object name.

So assuming that it is a good idea to add some information to a
patch that identifies the whole tree it applies to, I think it is
sensible to (1) limit that identifiable set of tree-ishes only to
well-known public ones, and (2) use the commit object name, not the
tree object name, for the purpose of identifying these tree-ishes.

If I understand Fengguang's plan correctly, a new work based on a
public well-known base tree-ish plus other patches in-flight are to
be accompanied by the identifier for that well-known base tree-ish
and some identifiers for these in-flight patches, i.e. the robot
will be told to check out the well-known base tree-ish, apply the
prerequisite patches and then the patches for the work are applied
on top to be evaluated.  So the above two limitations I placed in
the previous paragraph would not hurt the identifiability of the
"base" tree-ish, I would think.


[Footnote]

*1* If you limit the bases to these well known ones, then there is
no practical difference between commit and tree, because we can
assume as a maintainer you would have these commits so you would
have both, and once located, a commit would be easier to reason
about (e.g. run "log" to see what changes there are between it and a
well known tags).

*2* By "something unusual" I mean you prepare the permutations of
in-flight patches in your repository, to make it possible to find
any of the 8 tree objects in this senario.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
On Tue, Feb 23, 2016 at 12:35:12PM -0800, Junio C Hamano wrote:
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > Junio C Hamano  writes:
> >
> >> It is valuable for a testing organization to say "We tested this
> >> series on top of version X.  We know it works, we have tested on a
> >> lot more hardware than the original developer had, we know this is
> >> good to go."  It is a valuable service.
> >>
> >> But that is valuable only if version X is still relevant, isn't it?
> >>
> >> Is the relevance of a version something that is decided by a
> >> developer who submits a patch series, or is it more of an attribute
> >> of the project and where the current integration is happening?
> >> Judging from the responses from Dan to this thread, I think the
> >> answer is the latter, and for the purpose of identifying the
> >> relevant version(s), the project does not even care about the exact
> >> commit, but it wants to know more about which branch the series is
> >> targetted to.
> >>
> >> With that understanding, I find it hard to believe that it buys the
> >> project much for the "base" commit to be recorded in a patch series
> >> and automated testing is done by applying the patches to that exact
> >> commit, which possibly is no-longer-relevant, even though it may
> >> help shielding the testing machinery from "you tested with a wrong
> >> version" complaints.
> >>
> >> Isn't it more valuable for the test robot to say "this may or may
> >> not have worked well with whatever old version the patch series was
> >> based on, but it no longer is useful to the current tip of the
> >> 'master'"?  If you consider what benefit the project would gain by
> >> having such a robot, that is the conclusion I have to draw.
> >>
> >> So I still am not convinced that this "record base commit" is a
> >> useful thing to do.
> >
> > So I don't know what value this has to the git project.
> 
> Oh, don't worry, I wasn't talking about value this may have to the
> Git project at all.  "The value to the project" I mentioned in my
> response was all about the value to the kernel project.
> 
> > The value of Fengguag's automated testing to the kernel project is to
> > smoke out really stupid things.
> 
> I'll snip your bullet points, but as I wrote, I do understand the
> value of prescreening test.
> 
> What I was questioning was what value it gives to that testing to
> use "the developer based this patch on this exact commit" added to
> each incoming patch, and have Fengguag's testing machinery test a
> patch with a base version that may no longer be relevant in the
> context of the project.  Compared to that, wouldn't "this no longer
> applies to the branch the series targets" or "this still applies
> cleanly, but no longer compiles because the surrounding API has
> changed" be much more valuable?
> 
> In your other message, you mentioned the "index $old..$new" lines,
> and it is possible to use them to find a tree that the patch cleanly
> applies to, but it will not uniquely identify _the_ version the
> patch submitter used.  IMHO, finding such _a_ tree from the recent
> history of the branch that the patch targets and testing the patch

Problem is, it's typically not clear what the patch "targets". Linux
is such a big project, there are dozens of maintainer trees and an
order more branches in their trees. The robot does a hard work trying
to guess what are the patches' targets, based on various hints like the
files being modified, the TO/CC list, the subject, etc. However it's
error prone -- even big maintainers cannot make a certain judgement
at times: "shall me or you take the patch?". Not to mention the
semi-private rules of co-maintainers, sub-maintainers and topic
branches. Sometimes people are backporting patches, hence "don't
test my patch on new RC kernels"; Or people may be working on cool
futurism patches that depend on more changes than the latest RC
kernel. To a robot, it's really hard to avoid stupid actions in such
a versatile bazaar. The "base commit/tree-id" under discussion would
be a perfect compass to guide the bot.

Thanks,
Fengguang

> on top of that tree (as opposed to testing the patch in the exact
> context the developer worked on) would give the project a better
> value.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
On Tue, Feb 23, 2016 at 11:51:31AM -0800, Junio C Hamano wrote:
> Fengguang Wu  writes:
> 
> >> >> I have a mixed feeling about this one, primarily because this was
> >> >> already tried quite early in the life of "format-patch" command.
> >> >> 
> >> >> 
> >> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757
> >> >> 
> >> >> Only the name is different (it was called "applies-to" and named a
> >> >> tree object).
> >> >
> >> > Either commit or tree object will work for us. We can use it in
> >> > v2 if you prefer tree object.
> >> 
> >> Sorry, I think you misunderstood.  By "only the name is different", I
> >> didn't mean to say that the tree object name should be shown as the
> >> old proposal did.  What I meant but didn't explicitly say, as I
> >> thought it was sufficient to point at an old discussion thread, was
> >> that this was already tried and rejected.  This round uses different
> >> name but does essentially the same thing as the old proposal, and I
> >> do not think I heard anything new that supports this patch against
> >> earlier rejection by Linus.  That is what gave me a mixed feeling.
> >
> > I can understand the rejection by Linus in development process POV.
> >
> > However we are facing a new situation: in test robot POV, IMHO there
> > are values to test exactly the same tree as the patch submitter.
> > Otherwise the robot risks
> >
> > - false negative: failing to apply and test some patches
> > - false positive: sending wrong bug reports due to guessed wrong base tree
> 
> I always get negatives and positives confused, so let me think aloud
> with an example.  Let's say that somebody worked on adding a new
> feature based on v4.2 codebase and sent in a patch series.  The
> series touched files in quiescent part of the system, these files
> are identical between v4.2 and the current codebase at v4.5-rc5, and
> the series applies cleanly to a "wrong" base tree at the tip of
> 'master'.  But it turns out that the series uses an old API that was
> removed in the meantime.  The test robot may say "the result of
> applying the series does not even build" and the developer would
> complain to you saying "You tested with a wrong version".
> 
> I've already said that I can see the value this approach has for
> you.  By having the developer state which commit the series was
> based on, it will shield you from such a complaint, because you
> would not use closer-to-tip 'master' as the base, but instead use
> v4.2 codebase for the test.
> 
> As I said, what is unclear to me is what value this apporach gives
> to the project.

Problem arises when a developer based his work on a maintainer's topic
branch. The robot doesn't know that and tests the patch on v4.5-rc5,
which may trigger a false error because the patch depends on some
changes in that maintainer's topic branch. In that case, the error
report will be pure noise.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
On Tue, Feb 23, 2016 at 04:31:35PM +0300, Dan Carpenter wrote:
> Blergh...  You want it machine readable and I want it human readable.  I

Yeah. It's kind of tasting which may differ among people. I'll leave
the judgments to Junio and others, and only add necessary comments to
your points.

> don't care so much about the cover letter but for the first patch then I
> really want something minimal (one line) and human readable.
> 
> base tree/branch: 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
> base patch-id: a849260a843115dbac4b1a330d44256ee6b16d7b
> base patch-subject: Linux 4.4
> base tag: v4.4

The necessary lines for the robot are

base commit:
base patch-id:
or
base tree-id:
base patch-id:

The "base tree-id" will be useful if the submitted patchset is based
on a public (maintainer) commit.

The "base patch-id" will be useful if the submitted patchset is based
on another patchset someone (likely the developer himself) posted to
the mailing list.

> To me that looks like an unparseable wall of text.  My version of that
> is:
> 
> Applies-to: afd2ff9b7e1b+ origin
> 
> As a human all I really want to know is the tree to apply this to.  If
> it doesn't apply then I don't debug it, I just send an automatic note
> "This doesn't apply to staging-next.  Please redo."
> 
> I think that Applies-to is a better name and also that grepping for
> "^base " is less reliable than grepping for ^Applies-to.

Grep reliability should be the same, if you use "^base tree-id" and
"^base patch-id". If necessary, we can avoid white space by naming the
keys base-tree-id and base-patch-id.

> I used "origin" because that's the name in Next/Trees.  The + means
> private patches are applied.  That's what we already do in naming the
> kernel.  If the + matters, then I would include a cover letter.
> 
> I have no idea what a "base patch-id" is so that doesn't work at all.

It'll come from this command 

man git patch-id

It'll be useful if the patchset's base commit is a private one -- not
in any public maintainer tree, however the developer may have posted
it to LKML before.

The "base patch-id" can more reliably track different versions of
patches than "base patch-subject", and do not have the risk of
information leaking in case it's a confidential patch.

> Including the tag is just duplicative since we already have the hash.

That's right. Just in case it's more human readable.

> In my email, I proposed that we list all the other private patches in a
> cover letter, but I think you are saying that we only need to know the
> most recent private patch?

Yes in test robot POV. However it's a general git feature, so I guess
there will be more potential use cases and requirements.

> Another idea would be to list them newest
> to oldest (git log order instead of email order) in the cover letter.
> 
> Btw, I always work against linux-next and Dave M is always getting
> annoyed with me for not marking which patches go to net and which go to
> net-next.  I don't use git format-patch, but I will probably start using
> "Applies-to: net" or "Applies-to: net-next".

As for now, I see the netdev ML has the convention

[PATCH net]
[PATCH net-next]

to tell Dave the target tree.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
Hi Eric,

On Tue, Feb 23, 2016 at 01:56:07PM -0600, Eric W. Biederman wrote:
> 
> Fengguag Wu, Xiaolong Ye, have you attempted to use the truncated
> sha1 of the file the patch applies to?  Git already places a file sha1
> at the top of a patch.  See the index line?
> 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index eccd925c6e82..3c3f8172c734 100644

Yes we've evaluated to make use of that index. The conclusion is,
it helps make a better guess, however it's still a guessing work
and far from perfect.

A simple accounting shows only 1/5 files will be changed between
two major kernel releases:

wfg /c/linux% git ls-files |wc -l
52915
wfg /c/linux% git diff --name-only v4.3 v4.4|wc -l  
 
10606

That means a huge number candidate base tree IDs matching the given
blob IDs.

> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> 
> As I understand it you are aiming for making a good guess what the patch
> or patches apply to, having a set of file hashes looks like it would
> give you that.
> 
> All it should take is to iterate over a patchset and for each file in
> the patchset capture the first file hash.  Then in the smallish set of
> maintainer trees see if that set of file hashes matches any of their
> recent commits.  You should be able to prune the set of possible
> maintainer trees even more by looking at the mailling list or lists
> the patch was submitted to.
 
We actually start with the above thinking half year ago. Yes it'll
help narrow down the list of candidate maintainer trees. And the
chance will be increased if the patchset modifies multiple files,
and the fact some files are modified more frequently than the others.
However it's still fundamentally a guess work. The best choice is to
ask for explicit "base tree ID".

> Before we talk about adding anything more I think we need a clear
> picture of what you have tried with what already exists.  A decade ago
> part of the problem was that not everyone used git.  At best it will
> take a little while before everyone upgrades to a version of git diff
> containing your changes, and if possibly even longer if they have to
> start specifying an additional option when a diff is generated.

That's a good concern. It may take year long delay before reaching
reasonable population of the new feature.

To speedup the process, we could advocate the new git option in 0day
robot's error reports. Since we catch errors in ~10 LKML patches each
day, within months most kernel developers should get the tips on how
to set it up and enable the feature by default.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread H. Peter Anvin
On 02/23/2016 01:49 PM, Eric W. Biederman wrote:
> 
> So I could really respect a patch header line that said:
> tree abcdef0123456789...0123456789abcdef
> 
> Where the numbers where the truncated tree hash before and after a patch
> was applied.  That would seem to give a little bit of extra sanity
> checking in the application of git diffs as well.
> 

I would rather have the untruncated base tree ID.  The truncated file
IDs already provide the verification component, and it is *way* cheaper
to search for an untruncated ID.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Stefan Beller
On Tue, Feb 23, 2016 at 12:46 PM, H. Peter Anvin  wrote:
> On February 23, 2016 12:35:12 PM PST, Junio C Hamano  
> wrote:
>>ebied...@xmission.com (Eric W. Biederman) writes:
>>
>>> Junio C Hamano  writes:
>>>
 It is valuable for a testing organization to say "We tested this
 series on top of version X.  We know it works, we have tested on a
 lot more hardware than the original developer had, we know this is
 good to go."  It is a valuable service.

 But that is valuable only if version X is still relevant, isn't it?

 Is the relevance of a version something that is decided by a
 developer who submits a patch series, or is it more of an attribute
 of the project and where the current integration is happening?
 Judging from the responses from Dan to this thread, I think the
 answer is the latter, and for the purpose of identifying the
 relevant version(s), the project does not even care about the exact
 commit, but it wants to know more about which branch the series is
 targetted to.

 With that understanding, I find it hard to believe that it buys the
 project much for the "base" commit to be recorded in a patch series
 and automated testing is done by applying the patches to that exact
 commit, which possibly is no-longer-relevant, even though it may
 help shielding the testing machinery from "you tested with a wrong
 version" complaints.

 Isn't it more valuable for the test robot to say "this may or may
 not have worked well with whatever old version the patch series was
 based on, but it no longer is useful to the current tip of the
 'master'"?  If you consider what benefit the project would gain by
 having such a robot, that is the conclusion I have to draw.

 So I still am not convinced that this "record base commit" is a
 useful thing to do.
>>>
>>> So I don't know what value this has to the git project.
>>
>>Oh, don't worry, I wasn't talking about value this may have to the
>>Git project at all.  "The value to the project" I mentioned in my
>>response was all about the value to the kernel project.
>>
>>> The value of Fengguag's automated testing to the kernel project is to
>>> smoke out really stupid things.
>>
>>I'll snip your bullet points, but as I wrote, I do understand the
>>value of prescreening test.
>>
>>What I was questioning was what value it gives to that testing to
>>use "the developer based this patch on this exact commit" added to
>>each incoming patch, and have Fengguag's testing machinery test a
>>patch with a base version that may no longer be relevant in the
>>context of the project.  Compared to that, wouldn't "this no longer
>>applies to the branch the series targets" or "this still applies
>>cleanly, but no longer compiles because the surrounding API has
>>changed" be much more valuable?
>>
>>In your other message, you mentioned the "index $old..$new" lines,
>>and it is possible to use them to find a tree that the patch cleanly
>>applies to, but it will not uniquely identify _the_ version the
>>patch submitter used.  IMHO, finding such _a_ tree from the recent
>>history of the branch that the patch targets and testing the patch
>>on top of that tree (as opposed to testing the patch in the exact
>>context the developer worked on) would give the project a better
>>value.
>
> Personally, as a maintainer, I would love to see the tree ID and ideally also 
> the commit ID a series is based on.  The commit ID is in some ways less 
> useful since it is non-recreatable (and therefore will never match for 
> anything but the first patch of a series), but could be useful to the 
> maintainer.

As a contributor a commit id would also add value I would think. When
it is unclear
where a series is headed, contributors in Git land say things like:

This applies cleanly on origin/master.

And usually this is the master branch from yesterday as Junio pushes
once a day. origin/master being a moving target, so it may not apply any more,
so a commit sha1 would help for finding out what to do in the maintainer role.

This discussion sounds to me as if we'd want to have some advantages of
the (kernel pull style, not github style) pull-request here for patch
submissions.

I don't remember the exact quote, but Linus said once upon a time about the
pull request workflow roughly:

"Please pull from ... And by the way I tested this software for 2
month during development"
(For kernels that makes sense as the contributor run the kernel
and it worked)

as pull requests have the new patches on top of the exact parents the
contributor put them
on, there can be more faith in the process to divide between the
problems the contributor
overlooked/introduced and problems as introduced by the merge of the maintainer.

Now when applying patches at another parent than the contributor
developed on, some
subtle problems may arise, 

Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Eric W. Biederman
"H. Peter Anvin"  writes:

> On February 23, 2016 12:35:12 PM PST, Junio C Hamano  
> wrote:
>>ebied...@xmission.com (Eric W. Biederman) writes:
>>
>>> Junio C Hamano  writes:
>>>
 It is valuable for a testing organization to say "We tested this
 series on top of version X.  We know it works, we have tested on a
 lot more hardware than the original developer had, we know this is
 good to go."  It is a valuable service.

 But that is valuable only if version X is still relevant, isn't it?

 Is the relevance of a version something that is decided by a
 developer who submits a patch series, or is it more of an attribute
 of the project and where the current integration is happening?
 Judging from the responses from Dan to this thread, I think the
 answer is the latter, and for the purpose of identifying the
 relevant version(s), the project does not even care about the exact
 commit, but it wants to know more about which branch the series is
 targetted to.

 With that understanding, I find it hard to believe that it buys the
 project much for the "base" commit to be recorded in a patch series
 and automated testing is done by applying the patches to that exact
 commit, which possibly is no-longer-relevant, even though it may
 help shielding the testing machinery from "you tested with a wrong
 version" complaints.

 Isn't it more valuable for the test robot to say "this may or may
 not have worked well with whatever old version the patch series was
 based on, but it no longer is useful to the current tip of the
 'master'"?  If you consider what benefit the project would gain by
 having such a robot, that is the conclusion I have to draw.

 So I still am not convinced that this "record base commit" is a
 useful thing to do.
>>>
>>> So I don't know what value this has to the git project.
>>
>>Oh, don't worry, I wasn't talking about value this may have to the
>>Git project at all.  "The value to the project" I mentioned in my
>>response was all about the value to the kernel project.
>>
>>> The value of Fengguag's automated testing to the kernel project is to
>>> smoke out really stupid things.
>>
>>I'll snip your bullet points, but as I wrote, I do understand the
>>value of prescreening test.
>>
>>What I was questioning was what value it gives to that testing to
>>use "the developer based this patch on this exact commit" added to
>>each incoming patch, and have Fengguag's testing machinery test a
>>patch with a base version that may no longer be relevant in the
>>context of the project.  Compared to that, wouldn't "this no longer
>>applies to the branch the series targets" or "this still applies
>>cleanly, but no longer compiles because the surrounding API has
>>changed" be much more valuable?
>>
>>In your other message, you mentioned the "index $old..$new" lines,
>>and it is possible to use them to find a tree that the patch cleanly
>>applies to, but it will not uniquely identify _the_ version the
>>patch submitter used.  IMHO, finding such _a_ tree from the recent
>>history of the branch that the patch targets and testing the patch
>>on top of that tree (as opposed to testing the patch in the exact
>>context the developer worked on) would give the project a better
>>value.
>
> Personally, as a maintainer, I would love to see the tree ID and
> ideally also the commit ID a series is based on.  The commit ID is in
> some ways less useful since it is non-recreatable (and therefore will
> never match for anything but the first patch of a series), but could
> be useful to the maintainer.
>
> As far as putting in a bunch of metainformation that has to be
> manually or semimanually obtained, there are a lot of issues with
> that, as it way too easily turns into wrong information or potential
> information leaks that might make corporate contributors
> uncomfortable.

I would really love to hear from Fengguag about where this is a
practical problem, but I expect the flock of x86 topic tress is probably
one of those cases where there are problems for automation to figure out
which tree a patch applies to in practice.  The mailing list for x86
patches is linux-kernel.  Even if you filter by x...@kernel.org to detect
it is an x86 patch which topic branch a patch goes to was not clear to
me from a quick skim of recent lkml x86 patches.

So I could really respect a patch header line that said:
tree abcdef0123456789...0123456789abcdef

Where the numbers where the truncated tree hash before and after a patch
was applied.  That would seem to give a little bit of extra sanity
checking in the application of git diffs as well.

Eric
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread H. Peter Anvin
On February 23, 2016 12:35:12 PM PST, Junio C Hamano  wrote:
>ebied...@xmission.com (Eric W. Biederman) writes:
>
>> Junio C Hamano  writes:
>>
>>> It is valuable for a testing organization to say "We tested this
>>> series on top of version X.  We know it works, we have tested on a
>>> lot more hardware than the original developer had, we know this is
>>> good to go."  It is a valuable service.
>>>
>>> But that is valuable only if version X is still relevant, isn't it?
>>>
>>> Is the relevance of a version something that is decided by a
>>> developer who submits a patch series, or is it more of an attribute
>>> of the project and where the current integration is happening?
>>> Judging from the responses from Dan to this thread, I think the
>>> answer is the latter, and for the purpose of identifying the
>>> relevant version(s), the project does not even care about the exact
>>> commit, but it wants to know more about which branch the series is
>>> targetted to.
>>>
>>> With that understanding, I find it hard to believe that it buys the
>>> project much for the "base" commit to be recorded in a patch series
>>> and automated testing is done by applying the patches to that exact
>>> commit, which possibly is no-longer-relevant, even though it may
>>> help shielding the testing machinery from "you tested with a wrong
>>> version" complaints.
>>>
>>> Isn't it more valuable for the test robot to say "this may or may
>>> not have worked well with whatever old version the patch series was
>>> based on, but it no longer is useful to the current tip of the
>>> 'master'"?  If you consider what benefit the project would gain by
>>> having such a robot, that is the conclusion I have to draw.
>>>
>>> So I still am not convinced that this "record base commit" is a
>>> useful thing to do.
>>
>> So I don't know what value this has to the git project.
>
>Oh, don't worry, I wasn't talking about value this may have to the
>Git project at all.  "The value to the project" I mentioned in my
>response was all about the value to the kernel project.
>
>> The value of Fengguag's automated testing to the kernel project is to
>> smoke out really stupid things.
>
>I'll snip your bullet points, but as I wrote, I do understand the
>value of prescreening test.
>
>What I was questioning was what value it gives to that testing to
>use "the developer based this patch on this exact commit" added to
>each incoming patch, and have Fengguag's testing machinery test a
>patch with a base version that may no longer be relevant in the
>context of the project.  Compared to that, wouldn't "this no longer
>applies to the branch the series targets" or "this still applies
>cleanly, but no longer compiles because the surrounding API has
>changed" be much more valuable?
>
>In your other message, you mentioned the "index $old..$new" lines,
>and it is possible to use them to find a tree that the patch cleanly
>applies to, but it will not uniquely identify _the_ version the
>patch submitter used.  IMHO, finding such _a_ tree from the recent
>history of the branch that the patch targets and testing the patch
>on top of that tree (as opposed to testing the patch in the exact
>context the developer worked on) would give the project a better
>value.

Personally, as a maintainer, I would love to see the tree ID and ideally also 
the commit ID a series is based on.  The commit ID is in some ways less useful 
since it is non-recreatable (and therefore will never match for anything but 
the first patch of a series), but could be useful to the maintainer.

As far as putting in a bunch of metainformation that has to be manually or 
semimanually obtained, there are a lot of issues with that, as it way too 
easily turns into wrong information or potential information leaks that might 
make corporate contributors uncomfortable.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Junio C Hamano
ebied...@xmission.com (Eric W. Biederman) writes:

> Junio C Hamano  writes:
>
>> It is valuable for a testing organization to say "We tested this
>> series on top of version X.  We know it works, we have tested on a
>> lot more hardware than the original developer had, we know this is
>> good to go."  It is a valuable service.
>>
>> But that is valuable only if version X is still relevant, isn't it?
>>
>> Is the relevance of a version something that is decided by a
>> developer who submits a patch series, or is it more of an attribute
>> of the project and where the current integration is happening?
>> Judging from the responses from Dan to this thread, I think the
>> answer is the latter, and for the purpose of identifying the
>> relevant version(s), the project does not even care about the exact
>> commit, but it wants to know more about which branch the series is
>> targetted to.
>>
>> With that understanding, I find it hard to believe that it buys the
>> project much for the "base" commit to be recorded in a patch series
>> and automated testing is done by applying the patches to that exact
>> commit, which possibly is no-longer-relevant, even though it may
>> help shielding the testing machinery from "you tested with a wrong
>> version" complaints.
>>
>> Isn't it more valuable for the test robot to say "this may or may
>> not have worked well with whatever old version the patch series was
>> based on, but it no longer is useful to the current tip of the
>> 'master'"?  If you consider what benefit the project would gain by
>> having such a robot, that is the conclusion I have to draw.
>>
>> So I still am not convinced that this "record base commit" is a
>> useful thing to do.
>
> So I don't know what value this has to the git project.

Oh, don't worry, I wasn't talking about value this may have to the
Git project at all.  "The value to the project" I mentioned in my
response was all about the value to the kernel project.

> The value of Fengguag's automated testing to the kernel project is to
> smoke out really stupid things.

I'll snip your bullet points, but as I wrote, I do understand the
value of prescreening test.

What I was questioning was what value it gives to that testing to
use "the developer based this patch on this exact commit" added to
each incoming patch, and have Fengguag's testing machinery test a
patch with a base version that may no longer be relevant in the
context of the project.  Compared to that, wouldn't "this no longer
applies to the branch the series targets" or "this still applies
cleanly, but no longer compiles because the surrounding API has
changed" be much more valuable?

In your other message, you mentioned the "index $old..$new" lines,
and it is possible to use them to find a tree that the patch cleanly
applies to, but it will not uniquely identify _the_ version the
patch submitter used.  IMHO, finding such _a_ tree from the recent
history of the branch that the patch targets and testing the patch
on top of that tree (as opposed to testing the patch in the exact
context the developer worked on) would give the project a better
value.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Eric W. Biederman
Junio C Hamano  writes:

> Fengguang Wu  writes:
>
>>> >> I have a mixed feeling about this one, primarily because this was
>>> >> already tried quite early in the life of "format-patch" command.
>>> >> 
>>> >> 
>>> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757
>>> >> 
>>> >> Only the name is different (it was called "applies-to" and named a
>>> >> tree object).
>>> >
>>> > Either commit or tree object will work for us. We can use it in
>>> > v2 if you prefer tree object.
>>> 
>>> Sorry, I think you misunderstood.  By "only the name is different", I
>>> didn't mean to say that the tree object name should be shown as the
>>> old proposal did.  What I meant but didn't explicitly say, as I
>>> thought it was sufficient to point at an old discussion thread, was
>>> that this was already tried and rejected.  This round uses different
>>> name but does essentially the same thing as the old proposal, and I
>>> do not think I heard anything new that supports this patch against
>>> earlier rejection by Linus.  That is what gave me a mixed feeling.
>>
>> I can understand the rejection by Linus in development process POV.
>>
>> However we are facing a new situation: in test robot POV, IMHO there
>> are values to test exactly the same tree as the patch submitter.
>> Otherwise the robot risks
>>
>> - false negative: failing to apply and test some patches
>> - false positive: sending wrong bug reports due to guessed wrong base tree
>
> I always get negatives and positives confused, so let me think aloud
> with an example.  Let's say that somebody worked on adding a new
> feature based on v4.2 codebase and sent in a patch series.  The
> series touched files in quiescent part of the system, these files
> are identical between v4.2 and the current codebase at v4.5-rc5, and
> the series applies cleanly to a "wrong" base tree at the tip of
> 'master'.  But it turns out that the series uses an old API that was
> removed in the meantime.  The test robot may say "the result of
> applying the series does not even build" and the developer would
> complain to you saying "You tested with a wrong version".
>
> I've already said that I can see the value this approach has for
> you.  By having the developer state which commit the series was
> based on, it will shield you from such a complaint, because you
> would not use closer-to-tip 'master' as the base, but instead use
> v4.2 codebase for the test.
>
> As I said, what is unclear to me is what value this apporach gives
> to the project.
>
>>> I can see that recording the exact commit object name allows you to
>>> claim that you identified the exact commit to apply the patch, and
>>> that you tested the exact tree contents.  It however is unclear what
>>> the value of such a claim would be to the project or to the
>>> integrator.
>>
>> The value of base commit info is: providing a solid ground to the
>> tester, to reliably avoid false positive/negatives.
>
> It is valuable for a testing organization to say "We tested this
> series on top of version X.  We know it works, we have tested on a
> lot more hardware than the original developer had, we know this is
> good to go."  It is a valuable service.
>
> But that is valuable only if version X is still relevant, isn't it?
>
> Is the relevance of a version something that is decided by a
> developer who submits a patch series, or is it more of an attribute
> of the project and where the current integration is happening?
> Judging from the responses from Dan to this thread, I think the
> answer is the latter, and for the purpose of identifying the
> relevant version(s), the project does not even care about the exact
> commit, but it wants to know more about which branch the series is
> targetted to.
>
> With that understanding, I find it hard to believe that it buys the
> project much for the "base" commit to be recorded in a patch series
> and automated testing is done by applying the patches to that exact
> commit, which possibly is no-longer-relevant, even though it may
> help shielding the testing machinery from "you tested with a wrong
> version" complaints.
>
> Isn't it more valuable for the test robot to say "this may or may
> not have worked well with whatever old version the patch series was
> based on, but it no longer is useful to the current tip of the
> 'master'"?  If you consider what benefit the project would gain by
> having such a robot, that is the conclusion I have to draw.
>
> So I still am not convinced that this "record base commit" is a
> useful thing to do.

So I don't know what value this has to the git project.

The value of Fengguag's automated testing to the kernel project is to
smoke out really stupid things.

- A developer forgot to actually compile test their code.
  An error report that the code does not even compile from Fengguag
  allows the patch to be dropped without spending any time on it.

- A developer failed to test some 

Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Eric W. Biederman

Fengguag Wu, Xiaolong Ye, have you attempted to use the truncated
sha1 of the file the patch applies to?  Git already places a file sha1
at the top of a patch.  See the index line?

> diff --git a/fs/namespace.c b/fs/namespace.c
> index eccd925c6e82..3c3f8172c734 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c

As I understand it you are aiming for making a good guess what the patch
or patches apply to, having a set of file hashes looks like it would
give you that.

All it should take is to iterate over a patchset and for each file in
the patchset capture the first file hash.  Then in the smallish set of
maintainer trees see if that set of file hashes matches any of their
recent commits.  You should be able to prune the set of possible
maintainer trees even more by looking at the mailling list or lists
the patch was submitted to.

Before we talk about adding anything more I think we need a clear
picture of what you have tried with what already exists.  A decade ago
part of the problem was that not everyone used git.  At best it will
take a little while before everyone upgrades to a version of git diff
containing your changes, and if possibly even longer if they have to
start specifying an additional option when a diff is generated.

Eric
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Junio C Hamano
Fengguang Wu  writes:

>> >> I have a mixed feeling about this one, primarily because this was
>> >> already tried quite early in the life of "format-patch" command.
>> >> 
>> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757
>> >> 
>> >> Only the name is different (it was called "applies-to" and named a
>> >> tree object).
>> >
>> > Either commit or tree object will work for us. We can use it in
>> > v2 if you prefer tree object.
>> 
>> Sorry, I think you misunderstood.  By "only the name is different", I
>> didn't mean to say that the tree object name should be shown as the
>> old proposal did.  What I meant but didn't explicitly say, as I
>> thought it was sufficient to point at an old discussion thread, was
>> that this was already tried and rejected.  This round uses different
>> name but does essentially the same thing as the old proposal, and I
>> do not think I heard anything new that supports this patch against
>> earlier rejection by Linus.  That is what gave me a mixed feeling.
>
> I can understand the rejection by Linus in development process POV.
>
> However we are facing a new situation: in test robot POV, IMHO there
> are values to test exactly the same tree as the patch submitter.
> Otherwise the robot risks
>
> - false negative: failing to apply and test some patches
> - false positive: sending wrong bug reports due to guessed wrong base tree

I always get negatives and positives confused, so let me think aloud
with an example.  Let's say that somebody worked on adding a new
feature based on v4.2 codebase and sent in a patch series.  The
series touched files in quiescent part of the system, these files
are identical between v4.2 and the current codebase at v4.5-rc5, and
the series applies cleanly to a "wrong" base tree at the tip of
'master'.  But it turns out that the series uses an old API that was
removed in the meantime.  The test robot may say "the result of
applying the series does not even build" and the developer would
complain to you saying "You tested with a wrong version".

I've already said that I can see the value this approach has for
you.  By having the developer state which commit the series was
based on, it will shield you from such a complaint, because you
would not use closer-to-tip 'master' as the base, but instead use
v4.2 codebase for the test.

As I said, what is unclear to me is what value this apporach gives
to the project.

>> I can see that recording the exact commit object name allows you to
>> claim that you identified the exact commit to apply the patch, and
>> that you tested the exact tree contents.  It however is unclear what
>> the value of such a claim would be to the project or to the
>> integrator.
>
> The value of base commit info is: providing a solid ground to the
> tester, to reliably avoid false positive/negatives.

It is valuable for a testing organization to say "We tested this
series on top of version X.  We know it works, we have tested on a
lot more hardware than the original developer had, we know this is
good to go."  It is a valuable service.

But that is valuable only if version X is still relevant, isn't it?

Is the relevance of a version something that is decided by a
developer who submits a patch series, or is it more of an attribute
of the project and where the current integration is happening?
Judging from the responses from Dan to this thread, I think the
answer is the latter, and for the purpose of identifying the
relevant version(s), the project does not even care about the exact
commit, but it wants to know more about which branch the series is
targetted to.

With that understanding, I find it hard to believe that it buys the
project much for the "base" commit to be recorded in a patch series
and automated testing is done by applying the patches to that exact
commit, which possibly is no-longer-relevant, even though it may
help shielding the testing machinery from "you tested with a wrong
version" complaints.

Isn't it more valuable for the test robot to say "this may or may
not have worked well with whatever old version the patch series was
based on, but it no longer is useful to the current tip of the
'master'"?  If you consider what benefit the project would gain by
having such a robot, that is the conclusion I have to draw.

So I still am not convinced that this "record base commit" is a
useful thing to do.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Dan Carpenter
Blergh...  You want it machine readable and I want it human readable.  I
don't care so much about the cover letter but for the first patch then I
really want something minimal (one line) and human readable.

base tree/branch: 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
base patch-id: a849260a843115dbac4b1a330d44256ee6b16d7b
base patch-subject: Linux 4.4
base tag: v4.4

To me that looks like an unparseable wall of text.  My version of that
is:

Applies-to: afd2ff9b7e1b+ origin

As a human all I really want to know is the tree to apply this to.  If
it doesn't apply then I don't debug it, I just send an automatic note
"This doesn't apply to staging-next.  Please redo."

I think that Applies-to is a better name and also that grepping for
"^base " is less reliable than grepping for ^Applies-to.

I used "origin" because that's the name in Next/Trees.  The + means
private patches are applied.  That's what we already do in naming the
kernel.  If the + matters, then I would include a cover letter.

I have no idea what a "base patch-id" is so that doesn't work at all.

Including the tag is just duplicative since we already have the hash.

In my email, I proposed that we list all the other private patches in a
cover letter, but I think you are saying that we only need to know the
most recent private patch?  Another idea would be to list them newest
to oldest (git log order instead of email order) in the cover letter.

Btw, I always work against linux-next and Dave M is always getting
annoyed with me for not marking which patches go to net and which go to
net-next.  I don't use git format-patch, but I will probably start using
"Applies-to: net" or "Applies-to: net-next".

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
Hi Dan,

On Tue, Feb 23, 2016 at 01:32:53PM +0300, Dan Carpenter wrote:
> So this is the format for the first patch?
> 
> base commit: 0233b800c838ddda41db318ee396320b3c21a560

What's in my mind is lines like

base tree/branch: 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
base patch-id: a849260a843115dbac4b1a330d44256ee6b16d7b

The point is one piece of information per line, so that new lines can
be added trivially in future, like

base patch-subject: Linux 4.4
base tag: v4.4

The exact format can be improved wherever suitable. For example, use
more suitable key name part (eg. "base commit" => "base-commit") or
value part (eg. "$tree_url $branch" to "$tree_url#$branch").

> Can we change it to include the name of the public tree we are starting
> from?
> 
> applies-to: 0233b800c838 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git#master

No problem, just that I'd prefer breaking up such information into
multi "key: value" lines.

> Of course, my absolute prefered format would be:
> 
> applies-to: net-next 0233b800c838
> 
> I don't think that's possible though?  I often write that sort of a line
> in my emails to Dave already.

Yeah, that'd be most human readable. It does require people (and
scripts) to reach consensus on the tree/branch name, which may only be
possible for well known trees.

> Fengguang was suggesting something like this if we have to include
> unmerged patches:
> 
> applies-to: net-next 0233b800c838
> private patchset 1
> private patchset 2
> 
> I don't think git knows what a patchset is.

Git may not need to have patchset concept. Suppose a developer's local
branch has

v4.4
private commit 1, subject: do aaa
private commit 2, subject: do bbb
private commit 3, subject: do ccc
private commit 4, subject: do ddd
private commit 5, subject: do eee

If he decided to send commits 1-2 as one patchset, and 3-5 as another
patchset to LKML. The 2 cover letters would look like (only showing
useful fields):

$ git format-patch commit 1..commit 2
[PATCH 0/2]
base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc

$ git format-patch commit 3..commit 5
[PATCH 0/3]
base patch-subject: do bbb

The 0day robot will be able to find the suitable base and re-create
exactly the same tree object for both the above 2 patchsets based on
the first one's "base commit" and the second one's "base patch-subject".

> We would have to include the subject line for each unmerged patch.

That's a good idea!

> I think we should only do that if there is a cover letter, otherwise
> the it's too noisy.

Or if no cover letter, the information can be included in the first
patch, ie. [PATCH 1/N].

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Dan Carpenter
So this is the format for the first patch?

base commit: 0233b800c838ddda41db318ee396320b3c21a560

Can we change it to include the name of the public tree we are starting
from?

applies-to: 0233b800c838 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git#master

Of course, my absolute prefered format would be:

applies-to: net-next 0233b800c838

I don't think that's possible though?  I often write that sort of a line
in my emails to Dave already.

Fengguang was suggesting something like this if we have to include
unmerged patches:

applies-to: net-next 0233b800c838
private patchset 1
private patchset 2

I don't think git knows what a patchset is.  We would have to include
the subject line for each unmerged patch.  I think we should only do
that if there is a cover letter, otherwise the it's too noisy.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
On Tue, Feb 23, 2016 at 01:23:19AM -0800, H. Peter Anvin wrote:
> On 02/23/16 01:17, Fengguang Wu wrote:
> > 
> > However we are facing a new situation: in test robot POV, IMHO there
> > are values to test exactly the same tree as the patch submitter.
> > Otherwise the robot risks
> > 
> > - false negative: failing to apply and test some patches
> > - false positive: sending wrong bug reports due to guessed wrong base tree
> > 
> 
> Wouldn't the important part here be the git hash, rather than the tree?
>  If you have the same hash then it by definition is the same contents?

Yes. Sorry for the partial wording! We should be talking about the
same thing: the hash of the tree object. The commit SHA1 will also
do the work.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread H. Peter Anvin
On 02/23/16 01:17, Fengguang Wu wrote:
> 
> However we are facing a new situation: in test robot POV, IMHO there
> are values to test exactly the same tree as the patch submitter.
> Otherwise the robot risks
> 
> - false negative: failing to apply and test some patches
> - false positive: sending wrong bug reports due to guessed wrong base tree
> 

Wouldn't the important part here be the git hash, rather than the tree?
 If you have the same hash then it by definition is the same contents?

-hpa


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-23 Thread Fengguang Wu
Hi Junio and All,

CC more relevant people. FYI this thread starts here:

http://thread.gmane.org/gmane.comp.version-control.git

On Mon, Feb 22, 2016 at 10:54:38PM -0800, Junio C Hamano wrote:
> Fengguang Wu  writes:
> 
> > Hi Junio,
> >
> > On Sun, Feb 21, 2016 at 08:19:56PM -0800, Junio C Hamano wrote:
> >> Xiaolong Ye  writes:
> >> 
> >> > It would be helpful for maintainers or reviewers to know the base tree
> >> > info of the patches created by git format-patch. Teach git format-patch
> >> > a --base-tree-info option to record these info.
> >> >
> >> > Signed-off-by: Xiaolong Ye 
> >> > ---
> >> 
> >> I have a mixed feeling about this one, primarily because this was
> >> already tried quite early in the life of "format-patch" command.
> >> 
> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757
> >> 
> >> Only the name is different (it was called "applies-to" and named a
> >> tree object).
> >
> > Either commit or tree object will work for us. We can use it in
> > v2 if you prefer tree object.
> 
> Sorry, I think you misunderstood.  By only the name is different, I
> didn't mean to say that the tree object name should be shown as the
> old proposal did.  What I meant but didn't explicitly say, as I
> thought it was sufficient to point at an old discussion thread, was
> that this was already tried and rejected.  This round uses different
> name but does essentially the same thing as the old proposal, and I
> do not think I heard anything new that supports this patch against
> earlier rejection by Linus.  That is what gave me a mixed feeling.

I can understand the rejection by Linus in development process POV.

However we are facing a new situation: in test robot POV, IMHO there
are values to test exactly the same tree as the patch submitter.
Otherwise the robot risks

- false negative: failing to apply and test some patches
- false positive: sending wrong bug reports due to guessed wrong base tree

> >> Is it your goal to insist on one exact commit the patch is applied
> >> to?
> >
> > Right. Our goal is fully automated patch testing, where the base tree
> > info is required for *reliably* avoid reporting false positives.
> >
> > A clean git-apply does not guarantee the resulted code is logically
> > consistent and hence testable by 3rd party. For a 3rd party tester to
> > provide useful and trustable test reports, he must apply the patch to
> > exactly the same base as the patch submitter.
> 
> The patch submitter (or you as a third party tester) is not in the
> position to dictate the integrator to apply the patch to one
> specific commit and use it from there.  The integrator would pick an
> appropriate base that would be different from the commit where the
> patch was taken from, apply it there, and merge the result to the
> tip of the mainline, or apply the patch directly to the tip of the
> mainline.  Even if the integrator picked the commit the patch was
> taken from, the result would not be used alone without any other
> changes, i.e. before getting merged into the integration branch.

Yeah. Per my understanding the base commit info will be mainly parsed
by test robots instead of integrators.

> So in that sense, any test that is done by the patch submitter and
> the third party tester would not be what will be released to the
> wild *anyway*.  The resulting code will be exercised in a context
> that *is* different from the context the original author had.

That's right. But no worry, when the patch is merged by maintainer,
we'll test it once again in the maintainer tree.

Pre-merge patch testing is useful in 2 ways:

- shift left testing to early review stage

- maintainer trees are typically not rebaseable. When errors are
  discovered there, it's a bit too late: the error will likely remain
  in git history for ever. Which will hurt bisects.

> I can see that recording the exact commit object name allows you to
> claim that you identified the exact commit to apply the patch, and
> that you tested the exact tree contents.  It however is unclear what
> the value of such a claim would be to the project or to the
> integrator.

The value of base commit info is: providing a solid ground to the
tester, to reliably avoid false positive/negatives.

> So I dunno.

FYI, the 0day test robot will be able to work better if provided the
base commit info. It'll work a bit more sophisticated than simply
relying on the base commit info: if it's sure about the tree the patch
is targeted for (or the maintainer would apply to), it'll use that as
base tree[*]; otherwise it'll fall back to using the base commit info
included in the patchset.

[*] For examples,

[PATCH -mm] ...
[PATCH net] ...

For such patches we are sure they are targeted for the well known
mm/net trees.

Anyway the worst case of not adopting the discussed patch is, the 0day
test robot continue to work in current heuristic 

Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-22 Thread Junio C Hamano
Fengguang Wu  writes:

> Hi Junio,
>
> On Sun, Feb 21, 2016 at 08:19:56PM -0800, Junio C Hamano wrote:
>> Xiaolong Ye  writes:
>> 
>> > It would be helpful for maintainers or reviewers to know the base tree
>> > info of the patches created by git format-patch. Teach git format-patch
>> > a --base-tree-info option to record these info.
>> >
>> > Signed-off-by: Xiaolong Ye 
>> > ---
>> 
>> I have a mixed feeling about this one, primarily because this was
>> already tried quite early in the life of "format-patch" command.
>> 
>> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757
>> 
>> Only the name is different (it was called "applies-to" and named a
>> tree object).
>
> Either commit or tree object will work for us. We can use it in
> v2 if you prefer tree object.

Sorry, I think you misunderstood.  By only the name is different, I
didn't mean to say that the tree object name should be shown as the
old proposal did.  What I meant but didn't explicitly say, as I
thought it was sufficient to point at an old discussion thread, was
that this was already tried and rejected.  This round uses different
name but does essentially the same thing as the old proposal, and I
do not think I heard anything new that supports this patch against
earlier rejection by Linus.  That is what gave me a mixed feeling.

>> Is it your goal to insist on one exact commit the patch is applied
>> to?
>
> Right. Our goal is fully automated patch testing, where the base tree
> info is required for *reliably* avoid reporting false positives.
>
> A clean git-apply does not guarantee the resulted code is logically
> consistent and hence testable by 3rd party. For a 3rd party tester to
> provide useful and trustable test reports, he must apply the patch to
> exactly the same base as the patch submitter.

The patch submitter (or you as a third party tester) is not in the
position to dictate the integrator to apply the patch to one
specific commit and use it from there.  The integrator would pick an
appropriate base that would be different from the commit where the
patch was taken from, apply it there, and merge the result to the
tip of the mainline, or apply the patch directly to the tip of the
mainline.  Even if the integrator picked the commit the patch was
taken from, the result would not be used alone without any other
changes, i.e. before getting merged into the integration branch.

So in that sense, any test that is done by the patch submitter and
the third party tester would not be what will be released to the
wild *anyway*.  The resulting code will be exercised in a context
that *is* different from the context the original author had.

I can see that recording the exact commit object name allows you to
claim that you identified the exact commit to apply the patch, and
that you tested the exact tree contents.  It however is unclear what
the value of such a claim would be to the project or to the
integrator.

So I dunno.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-22 Thread Fengguang Wu
Hi Junio,

On Sun, Feb 21, 2016 at 08:19:56PM -0800, Junio C Hamano wrote:
> Xiaolong Ye  writes:
> 
> > It would be helpful for maintainers or reviewers to know the base tree
> > info of the patches created by git format-patch. Teach git format-patch
> > a --base-tree-info option to record these info.
> >
> > Signed-off-by: Xiaolong Ye 
> > ---
> 
> I have a mixed feeling about this one, primarily because this was
> already tried quite early in the life of "format-patch" command.
> 
> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757
> 
> Only the name is different (it was called "applies-to" and named a
> tree object).

Either commit or tree object will work for us. We can use it in
v2 if you prefer tree object.

> The approach the patch I am responding to takes, which is to touch
> diff.[ch], is probably flawed.  Even if we assume, for the sake of
> reviewing, that "which commit (or tree) this patch applies to" is a
> good thing to add, this is useful only to the very first patch in
> the series.  Hence its output should come from some function in
> builtin/log.c that iterates over commits and done for only the first
> one in a series, not by the diff machinery that is called by that
> loop iterating over commits.

For patchset with a cover letter, builtin/log.c is the best place to
put the base tree info:

[PATCH 0/3] add base tree info here
[PATCH 1/3]
[PATCH 2/3]
[PATCH 3/3]

For the others, it looks the best place for the new info is right
after the diffstat of the first patch:

[PATCH 1/3] add base tree info here
[PATCH 2/3]
[PATCH 3/3]

Since the diffstat is shown in diff.c, we add a few lines of code
after diff.c calls diff_summary().

> It is unclear what your goal is, and "would be helpful" is just as
> convincing as saying it would be helpful to record the phase of the
> moon when the commit was made.  A typical patch rarely touches all
> the files in the project, so there will be multiple commits in the
> existing history of the project to which the patch would apply
> cleanly.
>
> Is it your goal to insist on one exact commit the patch is applied
> to?

Right. Our goal is fully automated patch testing, where the base tree
info is required for *reliably* avoid reporting false positives.

A clean git-apply does not guarantee the resulted code is logically
consistent and hence testable by 3rd party. For a 3rd party tester to
provide useful and trustable test reports, he must apply the patch to
exactly the same base as the patch submitter.

For example, suppose a developer posted a patch to file A to add a
call to func_foo(). The test robot could apply it cleanly to latest
kernel git tree, eg. v4.5-rc5, and continue to build test it. Suppose
v4.5-rc5 included another commit to rename func_foo() to func_bar() in
file B. As a result, the robot will catch and report a build error
"func_foo() not defined" to the developer, who will then complain "but
my patch is based on v4.4, you tested my patch on the wrong base".

Without the base tree info, what we do now is a lot of dirty guess
works, trying the same patch on some "maybe suitable" bases and see if
they'll trigger the same error. For example, this is one of the error
reports the 0day robot auto sent to LKML:

https://lists.01.org/pipermail/kbuild-all/2016-February/017385.html
Re: [PATCH 2/2] kernel: sched: fix preempt_disable_ip recording for 
preempt_disable()

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.5-rc3 next-20160211]
[if your patch is applied to the wrong git tree, please drop us a note 
to help improving the system]

We sent 300+ "/maybe/ your patch has problem" reports like that in
each month. Whose quality can be greatly improved if we can be 100%
sure about the test errors.

> Or are you OK as long as the patch is applied to the
> same set of blobs the diff was taken from?  A developer may have a
> few unrelated and private commits on top of Linus's released
> version, on top of which she builds a series.  As long as the paths
> touched by the patches in this series do not overlap with the paths

As shown in the above example, "no overlap paths" is not a sufficient
condition for "patches are irrelevant". Because patch A may change
definition of a function in one file while patch B may reference that
function in another file.

So there is no easy mechanical way for a script to remove the several
non-Linus commits and can still provide useful base commit info.

However if we teach this patch to include one more line:

base patch-id: HASH

And the developer's local patches are

mainline tree
private patchset 1
private patchset 2

The developer will likely have already post patchset 2's base
patchset 1 to the mailing list, then our robot already know about
patchset 1's patch ids, therefore being able to apply 

Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-21 Thread Jacob Keller
On Sun, Feb 21, 2016 at 8:19 PM, Junio C Hamano  wrote:
> If you are OK with accepting a patch application to a tree with the
> same blobs the diff was taken from, then the format-patch output
> already has all the necessary information.  For each "diff --git"
> part, there is the "index $old..$new" line that records preimage and
> postimage blob object ID, so you should be able to find a tree that
> has the matching blobs in the paths mentioned in the patch, and it
> is guaranteed that the patch would apply cleanly to such a tree.
>
> Of course, that requires the recipient of the patch to have the all
> the blobs mentioned as the preimage in the patch, but it would be a
> reasonable assumption, as your patch assumes that the recipient has
> the commit, and if he has a commit by definition he would have all
> the blobs recorded by that commit.
>

Sorry for a sort of tangent here, but what method would you recommend
for asking git to locate a commit which matches all those blobs? Is
there any already existing plumbing to do such a thing?

Regards,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info

2016-02-21 Thread Junio C Hamano
Xiaolong Ye  writes:

> It would be helpful for maintainers or reviewers to know the base tree
> info of the patches created by git format-patch. Teach git format-patch
> a --base-tree-info option to record these info.
>
> Signed-off-by: Xiaolong Ye 
> ---

I have a mixed feeling about this one, primarily because this was
already tried quite early in the life of "format-patch" command.

http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757

Only the name is different (it was called "applies-to" and named a
tree object).

The approach the patch I am responding to takes, which is to touch
diff.[ch], is probably flawed.  Even if we assume, for the sake of
reviewing, that "which commit (or tree) this patch applies to" is a
good thing to add, this is useful only to the very first patch in
the series.  Hence its output should come from some function in
builtin/log.c that iterates over commits and done for only the first
one in a series, not by the diff machinery that is called by that
loop iterating over commits.

It is unclear what your goal is, and "would be helpful" is just as
convincing as saying it would be helpful to record the phase of the
moon when the commit was made.  A typical patch rarely touches all
the files in the project, so there will be multiple commits in the
existing history of the project to which the patch would apply
cleanly.  Is it your goal to insist on one exact commit the patch is
applied to?  Or are you OK as long as the patch is applied to the
same set of blobs the diff was taken from?  A developer may have a
few unrelated and private commits on top of Linus's released
version, on top of which she builds a series.  As long as the paths
touched by the patches in this series do not overlap with the paths
touched by the initial few unrelated and private commits, such a
series should be testable directly on top of Linus's released
version, without forcing her to first rebase the series to remove
these initial few unrelated and private commits from her history
before running "format-patch --base-commit" (your patch is recording
the commit object name, and it shouldn't call it tree-info), but you
are forcing her to do so because these private commits will not be
available to those who apply her patches.

If you are OK with accepting a patch application to a tree with the
same blobs the diff was taken from, then the format-patch output
already has all the necessary information.  For each "diff --git"
part, there is the "index $old..$new" line that records preimage and
postimage blob object ID, so you should be able to find a tree that
has the matching blobs in the paths mentioned in the patch, and it
is guaranteed that the patch would apply cleanly to such a tree.

Of course, that requires the recipient of the patch to have the all
the blobs mentioned as the preimage in the patch, but it would be a
reasonable assumption, as your patch assumes that the recipient has
the commit, and if he has a commit by definition he would have all
the blobs recorded by that commit.

Incidentally, the "index $old..$new" lines are what make "am -3"
possible.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html