Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Fengguang Wuwrites: >> This starts to sound more like something you would want to write in >> the cover letter, or the trailer block next to Signed-off-by: at the >> end of the first patch in the series. > > Yes, that's roughly what the current patch does, except in the latter > case we add new info after diffstat. > >> Or even after the mail >> signature at the very end of the message (incidentally that would >> probably minimize the damage to the Git codebase needed for this >> addition--you should be able to do this without touching anything >> other than builtin/log.c). > > That's an interesting place. It looks worth trying. Here is an outline of such a structure. The idea is for a history of this shape, where "P" is the well-known public commit (e.g. one in Linus's tree), "X", "Y", "Z" are prerequisite patches in flight, and "A", "B", "C" are the work being sent out for testing, ---P---X---Y---Z---A---B---C the submitter would say "format-patch --base=P -3 C" (or variants thereof, e.g. with "--cover" or using "Z..C" instead of "-3 C" to specify the range), and the identifiers for P, X, Y, Z are appended at the end of the _first_ message (either the cover letter or the first patch in the series). You are welcome to steal this as a skeleton and take authorship, as I do not plan to spend too much more time on actually implementing prepare_bases() function myself unless I run out of things to do and get bored, but I think there are already sufficient helper functions to do this (you may want to make commit_patch_id() in patch-ids.c a public helper function). builtin/log.c | 65 +++ 1 file changed, 65 insertions(+) diff --git a/builtin/log.c b/builtin/log.c index 069bd3a..8f6cc17 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1182,6 +1182,59 @@ static int from_callback(const struct option *opt, const char *arg, int unset) return 0; } +struct base_tree_info { + struct object_id base_commit; + int nr_patch_id, alloc_patch_id; + struct object_id *patch_id; +}; + +static void prepare_bases(struct base_tree_info *bases, + const char *base_commit, + struct commit **list) +{ + /* +* If you are given --base=$commit command line argument, that +* names commit 'P' when you are formatting A, B and C in this +* history (i.e. "format-patch --base=P Z..C"): +* +* ---P---X---Y---Z---A---B---C +* +* where "P" is the well-known public commit (e.g. one in +* Linus's tree), "X", "Y", "Z" are prerequisite patches in +* flight whose patch-ids are well-known, and "A", "B", "C" +* are the work being sent out for testing, you would compute +* base_commit=P, patch_ids[] = map(commit_patch_id, [X, Y, +* Z]) here and stuff them in bases structure. +* +* Be sure to check error (e.g. invalid base_commit) and die() +* here as necessary. +*/ + +} + +static void print_bases(struct base_tree_info *bases) +{ + int i; + + /* Only do this once, either for the cover or for the first one */ + if (is_null_oid(>base_commit)) + return; + + printf("** base-commit-info **\n"); + + /* Show the base commit */ + printf("base-commit: %s\n", oid_to_hex(>base_commit)); + + /* Show the prerequisite patches */ + for (i = 0; i < bases->nr_patch_id; i++) + printf("base-patch-id: %s\n", oid_to_hex(>patch_id[i])); + + free(>patch_id); + bases->nr_patch_id = 0; + bases->alloc_patch_id = 0; + oidclr(>base_commit); +} + int cmd_format_patch(int argc, const char **argv, const char *prefix) { struct commit *commit; @@ -1205,6 +1258,9 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) int reroll_count = -1; char *branch_name = NULL; char *from = NULL; + char *base_commit = NULL; + struct base_tree_info bases; + const struct option builtin_format_patch_options[] = { { OPTION_CALLBACK, 'n', "numbered", , NULL, N_("use [PATCH n/m] even with a single patch"), @@ -1265,6 +1321,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) PARSE_OPT_OPTARG, thread_callback }, OPT_STRING(0, "signature", , N_("signature"), N_("add a signature")), + OPT_STRING(0, "base", _commit, N_("base-commit"), + N_("add prerequisite tree info to the patch series")), OPT_FILENAME(0, "signature-file", _file, N_("add a signature from a file")), OPT__QUIET(, N_("don't print the patch filenames")), @@ -1496,6 +1554,11 @@ int cmd_format_patch(int argc, const char **argv,
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Stefan Beller venit, vidit, dixit 23.02.2016 23:21: > On Tue, Feb 23, 2016 at 12:46 PM, H. Peter Anvinwrote: >> On February 23, 2016 12:35:12 PM PST, Junio C Hamano >> wrote: >>> ebied...@xmission.com (Eric W. Biederman) writes: >>> Junio C Hamano writes: > It is valuable for a testing organization to say "We tested this > series on top of version X. We know it works, we have tested on a > lot more hardware than the original developer had, we know this is > good to go." It is a valuable service. > > But that is valuable only if version X is still relevant, isn't it? > > Is the relevance of a version something that is decided by a > developer who submits a patch series, or is it more of an attribute > of the project and where the current integration is happening? > Judging from the responses from Dan to this thread, I think the > answer is the latter, and for the purpose of identifying the > relevant version(s), the project does not even care about the exact > commit, but it wants to know more about which branch the series is > targetted to. > > With that understanding, I find it hard to believe that it buys the > project much for the "base" commit to be recorded in a patch series > and automated testing is done by applying the patches to that exact > commit, which possibly is no-longer-relevant, even though it may > help shielding the testing machinery from "you tested with a wrong > version" complaints. > > Isn't it more valuable for the test robot to say "this may or may > not have worked well with whatever old version the patch series was > based on, but it no longer is useful to the current tip of the > 'master'"? If you consider what benefit the project would gain by > having such a robot, that is the conclusion I have to draw. > > So I still am not convinced that this "record base commit" is a > useful thing to do. So I don't know what value this has to the git project. >>> >>> Oh, don't worry, I wasn't talking about value this may have to the >>> Git project at all. "The value to the project" I mentioned in my >>> response was all about the value to the kernel project. >>> The value of Fengguag's automated testing to the kernel project is to smoke out really stupid things. >>> >>> I'll snip your bullet points, but as I wrote, I do understand the >>> value of prescreening test. >>> >>> What I was questioning was what value it gives to that testing to >>> use "the developer based this patch on this exact commit" added to >>> each incoming patch, and have Fengguag's testing machinery test a >>> patch with a base version that may no longer be relevant in the >>> context of the project. Compared to that, wouldn't "this no longer >>> applies to the branch the series targets" or "this still applies >>> cleanly, but no longer compiles because the surrounding API has >>> changed" be much more valuable? >>> >>> In your other message, you mentioned the "index $old..$new" lines, >>> and it is possible to use them to find a tree that the patch cleanly >>> applies to, but it will not uniquely identify _the_ version the >>> patch submitter used. IMHO, finding such _a_ tree from the recent >>> history of the branch that the patch targets and testing the patch >>> on top of that tree (as opposed to testing the patch in the exact >>> context the developer worked on) would give the project a better >>> value. >> >> Personally, as a maintainer, I would love to see the tree ID and ideally >> also the commit ID a series is based on. The commit ID is in some ways less >> useful since it is non-recreatable (and therefore will never match for >> anything but the first patch of a series), but could be useful to the >> maintainer. > > As a contributor a commit id would also add value I would think. When > it is unclear > where a series is headed, contributors in Git land say things like: > > This applies cleanly on origin/master. > > And usually this is the master branch from yesterday as Junio pushes > once a day. origin/master being a moving target, so it may not apply any more, > so a commit sha1 would help for finding out what to do in the maintainer role. > > This discussion sounds to me as if we'd want to have some advantages of > the (kernel pull style, not github style) pull-request here for patch > submissions. > > I don't remember the exact quote, but Linus said once upon a time about the > pull request workflow roughly: > > "Please pull from ... And by the way I tested this software for 2 > month during development" > (For kernels that makes sense as the contributor run the kernel > and it worked) > > as pull requests have the new patches on top of the exact parents the > contributor put them > on, there can be more faith in the process to divide between the > problems the
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Tue, Feb 23, 2016 at 10:30:04PM -0800, Junio C Hamano wrote: > Fengguang Wuwrites: > > > The necessary lines for the robot are > > > > base commit: > > base patch-id: > > or > > base tree-id: > > base patch-id: > > I will not repeat why a commit object name would be more appropriate > than a tree object name here (please see my response to HPA). Yes I see that reasoning in your other email. > > The "base tree-id" will be useful if the submitted patchset is based > > on a public (maintainer) commit. > > > > The "base patch-id" will be useful if the submitted patchset is based > > on another patchset someone (likely the developer himself) posted to > > the mailing list. > > Is there a database of in-flight patches indexed by their patch-ids > with a large enough coverage (hopefully those who maintain such a Yes, the 0day robot internally maintains such a patch-id => commit-id (of the below git tree) database for in-flight patches. We exported a git tree which holds all in-flight patches, where each patchset maps to a new branch: https://github.com/0day-ci/linux/branches We monitor dozens of linux kernel mailing lists, the coverage is pretty good for the linux kernel project. > database are using the --stable version of the patch-id for indexing > the patches)? Right, we do use the --stable option. > I am wondering how well this scales, especially if a > well-known commit named by "base commit" needs to be checked out and > then many in-flight patches identified by "base patch-id"s need to > be applied on top of it, to prepare the tree-ish the patch being > evaluated can be applied to. The database is effectively a key-value store, in the scale of 1000 new mappings per day. If we only keep 100 days data, there will be 100k mappings, which could be hold in 10MB memory. > This starts to sound more like something you would want to write in > the cover letter, or the trailer block next to Signed-off-by: at the > end of the first patch in the series. Yes, that's roughly what the current patch does, except in the latter case we add new info after diffstat. > Or even after the mail > signature at the very end of the message (incidentally that would > probably minimize the damage to the Git codebase needed for this > addition--you should be able to do this without touching anything > other than builtin/log.c). That's an interesting place. It looks worth trying. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Fengguang Wuwrites: > The necessary lines for the robot are > > base commit: > base patch-id: > or > base tree-id: > base patch-id: I will not repeat why a commit object name would be more appropriate than a tree object name here (please see my response to HPA). > The "base tree-id" will be useful if the submitted patchset is based > on a public (maintainer) commit. > > The "base patch-id" will be useful if the submitted patchset is based > on another patchset someone (likely the developer himself) posted to > the mailing list. Is there a database of in-flight patches indexed by their patch-ids with a large enough coverage (hopefully those who maintain such a database are using the --stable version of the patch-id for indexing the patches)? I am wondering how well this scales, especially if a well-known commit named by "base commit" needs to be checked out and then many in-flight patches identified by "base patch-id"s need to be applied on top of it, to prepare the tree-ish the patch being evaluated can be applied to. This starts to sound more like something you would want to write in the cover letter, or the trailer block next to Signed-off-by: at the end of the first patch in the series. Or even after the mail signature at the very end of the message (incidentally that would probably minimize the damage to the Git codebase needed for this addition--you should be able to do this without touching anything other than builtin/log.c). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
"H. Peter Anvin"writes: > Personally, as a maintainer, I would love to see the tree ID and > ideally also the commit ID a series is based on. The commit ID is > in some ways less useful since it is non-recreatable (and > therefore will never match for anything but the first patch of a > series), but could be useful to the maintainer. I admit that the very first "applies-to" proposal I made long time ago was based on a tree object name, not a commit object name like the proposal under discussion here, but I doubt that a tree object name is that much more useful than a commit object name in this context. Below, I assume that you are envisioning that the "base tree" recorded in a patch does not necessarily name a public, well-known tree (e.g. a tree-ish that already appears in Linus's tree for those who work with his tree, or other relevant trees like linux-next or net tree) [*1*]. It would name an unknown tree that results by applying a set of well-known patches in-flight on a public well-known commit. In that set-up, because you cannot guess committer identity and timestamp that are used by the patch submitter when these in-flight patches are applied to prepare the base for these private commits, a commit object name is useless, but it may still be possible for you to independently compute these trees that would result from set of well-known in-flight patches. But I do not think "it may be possible" above translates to usefulness in practice. Suppose we have only three well-known in-flight patches that are unrelated and independent, and you somehow know that the patch submitter built the first patch in the series by working on either a recently tagged commit (say v4.4) or a result of applying some of these in-flight patches on top of that commit. Even with these three commits, the base tree the patch submitter based his or her work on could be v4.4 itself, v4.4 plus one of the three patches (v4.4+A, v4.4+B, v4.4+C, three possibilities in total), v4.4 plus two of the three patches in some order (v4.4+AB, v4.4+BC, v4.4+CA, three possibilities in total) or v4.4 plus all of the three patches, so there are 8 possible top-level tree objects in total. Unless you are doing something unusual [*2*], even if you have all of these three well-known in-flight patches in your repository, you would have only a subset of them (you would certainly have v4.4, and v4.4 plus all three patches, but you would likely to have only one path between these two points, that's four commits recording four trees, out of possible 8). In the real world, of course you have far more than three well-known in-flight patches, so even though in theory trees may have better chance to be "figured out", I do not think it is practical to even attempt to "figure out" an unknown state given a tree object name. So assuming that it is a good idea to add some information to a patch that identifies the whole tree it applies to, I think it is sensible to (1) limit that identifiable set of tree-ishes only to well-known public ones, and (2) use the commit object name, not the tree object name, for the purpose of identifying these tree-ishes. If I understand Fengguang's plan correctly, a new work based on a public well-known base tree-ish plus other patches in-flight are to be accompanied by the identifier for that well-known base tree-ish and some identifiers for these in-flight patches, i.e. the robot will be told to check out the well-known base tree-ish, apply the prerequisite patches and then the patches for the work are applied on top to be evaluated. So the above two limitations I placed in the previous paragraph would not hurt the identifiability of the "base" tree-ish, I would think. [Footnote] *1* If you limit the bases to these well known ones, then there is no practical difference between commit and tree, because we can assume as a maintainer you would have these commits so you would have both, and once located, a commit would be easier to reason about (e.g. run "log" to see what changes there are between it and a well known tags). *2* By "something unusual" I mean you prepare the permutations of in-flight patches in your repository, to make it possible to find any of the 8 tree objects in this senario. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Tue, Feb 23, 2016 at 12:35:12PM -0800, Junio C Hamano wrote: > ebied...@xmission.com (Eric W. Biederman) writes: > > > Junio C Hamanowrites: > > > >> It is valuable for a testing organization to say "We tested this > >> series on top of version X. We know it works, we have tested on a > >> lot more hardware than the original developer had, we know this is > >> good to go." It is a valuable service. > >> > >> But that is valuable only if version X is still relevant, isn't it? > >> > >> Is the relevance of a version something that is decided by a > >> developer who submits a patch series, or is it more of an attribute > >> of the project and where the current integration is happening? > >> Judging from the responses from Dan to this thread, I think the > >> answer is the latter, and for the purpose of identifying the > >> relevant version(s), the project does not even care about the exact > >> commit, but it wants to know more about which branch the series is > >> targetted to. > >> > >> With that understanding, I find it hard to believe that it buys the > >> project much for the "base" commit to be recorded in a patch series > >> and automated testing is done by applying the patches to that exact > >> commit, which possibly is no-longer-relevant, even though it may > >> help shielding the testing machinery from "you tested with a wrong > >> version" complaints. > >> > >> Isn't it more valuable for the test robot to say "this may or may > >> not have worked well with whatever old version the patch series was > >> based on, but it no longer is useful to the current tip of the > >> 'master'"? If you consider what benefit the project would gain by > >> having such a robot, that is the conclusion I have to draw. > >> > >> So I still am not convinced that this "record base commit" is a > >> useful thing to do. > > > > So I don't know what value this has to the git project. > > Oh, don't worry, I wasn't talking about value this may have to the > Git project at all. "The value to the project" I mentioned in my > response was all about the value to the kernel project. > > > The value of Fengguag's automated testing to the kernel project is to > > smoke out really stupid things. > > I'll snip your bullet points, but as I wrote, I do understand the > value of prescreening test. > > What I was questioning was what value it gives to that testing to > use "the developer based this patch on this exact commit" added to > each incoming patch, and have Fengguag's testing machinery test a > patch with a base version that may no longer be relevant in the > context of the project. Compared to that, wouldn't "this no longer > applies to the branch the series targets" or "this still applies > cleanly, but no longer compiles because the surrounding API has > changed" be much more valuable? > > In your other message, you mentioned the "index $old..$new" lines, > and it is possible to use them to find a tree that the patch cleanly > applies to, but it will not uniquely identify _the_ version the > patch submitter used. IMHO, finding such _a_ tree from the recent > history of the branch that the patch targets and testing the patch Problem is, it's typically not clear what the patch "targets". Linux is such a big project, there are dozens of maintainer trees and an order more branches in their trees. The robot does a hard work trying to guess what are the patches' targets, based on various hints like the files being modified, the TO/CC list, the subject, etc. However it's error prone -- even big maintainers cannot make a certain judgement at times: "shall me or you take the patch?". Not to mention the semi-private rules of co-maintainers, sub-maintainers and topic branches. Sometimes people are backporting patches, hence "don't test my patch on new RC kernels"; Or people may be working on cool futurism patches that depend on more changes than the latest RC kernel. To a robot, it's really hard to avoid stupid actions in such a versatile bazaar. The "base commit/tree-id" under discussion would be a perfect compass to guide the bot. Thanks, Fengguang > on top of that tree (as opposed to testing the patch in the exact > context the developer worked on) would give the project a better > value. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Tue, Feb 23, 2016 at 11:51:31AM -0800, Junio C Hamano wrote: > Fengguang Wuwrites: > > >> >> I have a mixed feeling about this one, primarily because this was > >> >> already tried quite early in the life of "format-patch" command. > >> >> > >> >> > >> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 > >> >> > >> >> Only the name is different (it was called "applies-to" and named a > >> >> tree object). > >> > > >> > Either commit or tree object will work for us. We can use it in > >> > v2 if you prefer tree object. > >> > >> Sorry, I think you misunderstood. By "only the name is different", I > >> didn't mean to say that the tree object name should be shown as the > >> old proposal did. What I meant but didn't explicitly say, as I > >> thought it was sufficient to point at an old discussion thread, was > >> that this was already tried and rejected. This round uses different > >> name but does essentially the same thing as the old proposal, and I > >> do not think I heard anything new that supports this patch against > >> earlier rejection by Linus. That is what gave me a mixed feeling. > > > > I can understand the rejection by Linus in development process POV. > > > > However we are facing a new situation: in test robot POV, IMHO there > > are values to test exactly the same tree as the patch submitter. > > Otherwise the robot risks > > > > - false negative: failing to apply and test some patches > > - false positive: sending wrong bug reports due to guessed wrong base tree > > I always get negatives and positives confused, so let me think aloud > with an example. Let's say that somebody worked on adding a new > feature based on v4.2 codebase and sent in a patch series. The > series touched files in quiescent part of the system, these files > are identical between v4.2 and the current codebase at v4.5-rc5, and > the series applies cleanly to a "wrong" base tree at the tip of > 'master'. But it turns out that the series uses an old API that was > removed in the meantime. The test robot may say "the result of > applying the series does not even build" and the developer would > complain to you saying "You tested with a wrong version". > > I've already said that I can see the value this approach has for > you. By having the developer state which commit the series was > based on, it will shield you from such a complaint, because you > would not use closer-to-tip 'master' as the base, but instead use > v4.2 codebase for the test. > > As I said, what is unclear to me is what value this apporach gives > to the project. Problem arises when a developer based his work on a maintainer's topic branch. The robot doesn't know that and tests the patch on v4.5-rc5, which may trigger a false error because the patch depends on some changes in that maintainer's topic branch. In that case, the error report will be pure noise. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Tue, Feb 23, 2016 at 04:31:35PM +0300, Dan Carpenter wrote: > Blergh... You want it machine readable and I want it human readable. I Yeah. It's kind of tasting which may differ among people. I'll leave the judgments to Junio and others, and only add necessary comments to your points. > don't care so much about the cover letter but for the first patch then I > really want something minimal (one line) and human readable. > > base tree/branch: > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master > base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc > base patch-id: a849260a843115dbac4b1a330d44256ee6b16d7b > base patch-subject: Linux 4.4 > base tag: v4.4 The necessary lines for the robot are base commit: base patch-id: or base tree-id: base patch-id: The "base tree-id" will be useful if the submitted patchset is based on a public (maintainer) commit. The "base patch-id" will be useful if the submitted patchset is based on another patchset someone (likely the developer himself) posted to the mailing list. > To me that looks like an unparseable wall of text. My version of that > is: > > Applies-to: afd2ff9b7e1b+ origin > > As a human all I really want to know is the tree to apply this to. If > it doesn't apply then I don't debug it, I just send an automatic note > "This doesn't apply to staging-next. Please redo." > > I think that Applies-to is a better name and also that grepping for > "^base " is less reliable than grepping for ^Applies-to. Grep reliability should be the same, if you use "^base tree-id" and "^base patch-id". If necessary, we can avoid white space by naming the keys base-tree-id and base-patch-id. > I used "origin" because that's the name in Next/Trees. The + means > private patches are applied. That's what we already do in naming the > kernel. If the + matters, then I would include a cover letter. > > I have no idea what a "base patch-id" is so that doesn't work at all. It'll come from this command man git patch-id It'll be useful if the patchset's base commit is a private one -- not in any public maintainer tree, however the developer may have posted it to LKML before. The "base patch-id" can more reliably track different versions of patches than "base patch-subject", and do not have the risk of information leaking in case it's a confidential patch. > Including the tag is just duplicative since we already have the hash. That's right. Just in case it's more human readable. > In my email, I proposed that we list all the other private patches in a > cover letter, but I think you are saying that we only need to know the > most recent private patch? Yes in test robot POV. However it's a general git feature, so I guess there will be more potential use cases and requirements. > Another idea would be to list them newest > to oldest (git log order instead of email order) in the cover letter. > > Btw, I always work against linux-next and Dave M is always getting > annoyed with me for not marking which patches go to net and which go to > net-next. I don't use git format-patch, but I will probably start using > "Applies-to: net" or "Applies-to: net-next". As for now, I see the netdev ML has the convention [PATCH net] [PATCH net-next] to tell Dave the target tree. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Hi Eric, On Tue, Feb 23, 2016 at 01:56:07PM -0600, Eric W. Biederman wrote: > > Fengguag Wu, Xiaolong Ye, have you attempted to use the truncated > sha1 of the file the patch applies to? Git already places a file sha1 > at the top of a patch. See the index line? > > > diff --git a/fs/namespace.c b/fs/namespace.c > > index eccd925c6e82..3c3f8172c734 100644 Yes we've evaluated to make use of that index. The conclusion is, it helps make a better guess, however it's still a guessing work and far from perfect. A simple accounting shows only 1/5 files will be changed between two major kernel releases: wfg /c/linux% git ls-files |wc -l 52915 wfg /c/linux% git diff --name-only v4.3 v4.4|wc -l 10606 That means a huge number candidate base tree IDs matching the given blob IDs. > > --- a/fs/namespace.c > > +++ b/fs/namespace.c > > As I understand it you are aiming for making a good guess what the patch > or patches apply to, having a set of file hashes looks like it would > give you that. > > All it should take is to iterate over a patchset and for each file in > the patchset capture the first file hash. Then in the smallish set of > maintainer trees see if that set of file hashes matches any of their > recent commits. You should be able to prune the set of possible > maintainer trees even more by looking at the mailling list or lists > the patch was submitted to. We actually start with the above thinking half year ago. Yes it'll help narrow down the list of candidate maintainer trees. And the chance will be increased if the patchset modifies multiple files, and the fact some files are modified more frequently than the others. However it's still fundamentally a guess work. The best choice is to ask for explicit "base tree ID". > Before we talk about adding anything more I think we need a clear > picture of what you have tried with what already exists. A decade ago > part of the problem was that not everyone used git. At best it will > take a little while before everyone upgrades to a version of git diff > containing your changes, and if possibly even longer if they have to > start specifying an additional option when a diff is generated. That's a good concern. It may take year long delay before reaching reasonable population of the new feature. To speedup the process, we could advocate the new git option in 0day robot's error reports. Since we catch errors in ~10 LKML patches each day, within months most kernel developers should get the tips on how to set it up and enable the feature by default. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On 02/23/2016 01:49 PM, Eric W. Biederman wrote: > > So I could really respect a patch header line that said: > tree abcdef0123456789...0123456789abcdef > > Where the numbers where the truncated tree hash before and after a patch > was applied. That would seem to give a little bit of extra sanity > checking in the application of git diffs as well. > I would rather have the untruncated base tree ID. The truncated file IDs already provide the verification component, and it is *way* cheaper to search for an untruncated ID. -hpa -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Tue, Feb 23, 2016 at 12:46 PM, H. Peter Anvinwrote: > On February 23, 2016 12:35:12 PM PST, Junio C Hamano > wrote: >>ebied...@xmission.com (Eric W. Biederman) writes: >> >>> Junio C Hamano writes: >>> It is valuable for a testing organization to say "We tested this series on top of version X. We know it works, we have tested on a lot more hardware than the original developer had, we know this is good to go." It is a valuable service. But that is valuable only if version X is still relevant, isn't it? Is the relevance of a version something that is decided by a developer who submits a patch series, or is it more of an attribute of the project and where the current integration is happening? Judging from the responses from Dan to this thread, I think the answer is the latter, and for the purpose of identifying the relevant version(s), the project does not even care about the exact commit, but it wants to know more about which branch the series is targetted to. With that understanding, I find it hard to believe that it buys the project much for the "base" commit to be recorded in a patch series and automated testing is done by applying the patches to that exact commit, which possibly is no-longer-relevant, even though it may help shielding the testing machinery from "you tested with a wrong version" complaints. Isn't it more valuable for the test robot to say "this may or may not have worked well with whatever old version the patch series was based on, but it no longer is useful to the current tip of the 'master'"? If you consider what benefit the project would gain by having such a robot, that is the conclusion I have to draw. So I still am not convinced that this "record base commit" is a useful thing to do. >>> >>> So I don't know what value this has to the git project. >> >>Oh, don't worry, I wasn't talking about value this may have to the >>Git project at all. "The value to the project" I mentioned in my >>response was all about the value to the kernel project. >> >>> The value of Fengguag's automated testing to the kernel project is to >>> smoke out really stupid things. >> >>I'll snip your bullet points, but as I wrote, I do understand the >>value of prescreening test. >> >>What I was questioning was what value it gives to that testing to >>use "the developer based this patch on this exact commit" added to >>each incoming patch, and have Fengguag's testing machinery test a >>patch with a base version that may no longer be relevant in the >>context of the project. Compared to that, wouldn't "this no longer >>applies to the branch the series targets" or "this still applies >>cleanly, but no longer compiles because the surrounding API has >>changed" be much more valuable? >> >>In your other message, you mentioned the "index $old..$new" lines, >>and it is possible to use them to find a tree that the patch cleanly >>applies to, but it will not uniquely identify _the_ version the >>patch submitter used. IMHO, finding such _a_ tree from the recent >>history of the branch that the patch targets and testing the patch >>on top of that tree (as opposed to testing the patch in the exact >>context the developer worked on) would give the project a better >>value. > > Personally, as a maintainer, I would love to see the tree ID and ideally also > the commit ID a series is based on. The commit ID is in some ways less > useful since it is non-recreatable (and therefore will never match for > anything but the first patch of a series), but could be useful to the > maintainer. As a contributor a commit id would also add value I would think. When it is unclear where a series is headed, contributors in Git land say things like: This applies cleanly on origin/master. And usually this is the master branch from yesterday as Junio pushes once a day. origin/master being a moving target, so it may not apply any more, so a commit sha1 would help for finding out what to do in the maintainer role. This discussion sounds to me as if we'd want to have some advantages of the (kernel pull style, not github style) pull-request here for patch submissions. I don't remember the exact quote, but Linus said once upon a time about the pull request workflow roughly: "Please pull from ... And by the way I tested this software for 2 month during development" (For kernels that makes sense as the contributor run the kernel and it worked) as pull requests have the new patches on top of the exact parents the contributor put them on, there can be more faith in the process to divide between the problems the contributor overlooked/introduced and problems as introduced by the merge of the maintainer. Now when applying patches at another parent than the contributor developed on, some subtle problems may arise,
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
"H. Peter Anvin"writes: > On February 23, 2016 12:35:12 PM PST, Junio C Hamano > wrote: >>ebied...@xmission.com (Eric W. Biederman) writes: >> >>> Junio C Hamano writes: >>> It is valuable for a testing organization to say "We tested this series on top of version X. We know it works, we have tested on a lot more hardware than the original developer had, we know this is good to go." It is a valuable service. But that is valuable only if version X is still relevant, isn't it? Is the relevance of a version something that is decided by a developer who submits a patch series, or is it more of an attribute of the project and where the current integration is happening? Judging from the responses from Dan to this thread, I think the answer is the latter, and for the purpose of identifying the relevant version(s), the project does not even care about the exact commit, but it wants to know more about which branch the series is targetted to. With that understanding, I find it hard to believe that it buys the project much for the "base" commit to be recorded in a patch series and automated testing is done by applying the patches to that exact commit, which possibly is no-longer-relevant, even though it may help shielding the testing machinery from "you tested with a wrong version" complaints. Isn't it more valuable for the test robot to say "this may or may not have worked well with whatever old version the patch series was based on, but it no longer is useful to the current tip of the 'master'"? If you consider what benefit the project would gain by having such a robot, that is the conclusion I have to draw. So I still am not convinced that this "record base commit" is a useful thing to do. >>> >>> So I don't know what value this has to the git project. >> >>Oh, don't worry, I wasn't talking about value this may have to the >>Git project at all. "The value to the project" I mentioned in my >>response was all about the value to the kernel project. >> >>> The value of Fengguag's automated testing to the kernel project is to >>> smoke out really stupid things. >> >>I'll snip your bullet points, but as I wrote, I do understand the >>value of prescreening test. >> >>What I was questioning was what value it gives to that testing to >>use "the developer based this patch on this exact commit" added to >>each incoming patch, and have Fengguag's testing machinery test a >>patch with a base version that may no longer be relevant in the >>context of the project. Compared to that, wouldn't "this no longer >>applies to the branch the series targets" or "this still applies >>cleanly, but no longer compiles because the surrounding API has >>changed" be much more valuable? >> >>In your other message, you mentioned the "index $old..$new" lines, >>and it is possible to use them to find a tree that the patch cleanly >>applies to, but it will not uniquely identify _the_ version the >>patch submitter used. IMHO, finding such _a_ tree from the recent >>history of the branch that the patch targets and testing the patch >>on top of that tree (as opposed to testing the patch in the exact >>context the developer worked on) would give the project a better >>value. > > Personally, as a maintainer, I would love to see the tree ID and > ideally also the commit ID a series is based on. The commit ID is in > some ways less useful since it is non-recreatable (and therefore will > never match for anything but the first patch of a series), but could > be useful to the maintainer. > > As far as putting in a bunch of metainformation that has to be > manually or semimanually obtained, there are a lot of issues with > that, as it way too easily turns into wrong information or potential > information leaks that might make corporate contributors > uncomfortable. I would really love to hear from Fengguag about where this is a practical problem, but I expect the flock of x86 topic tress is probably one of those cases where there are problems for automation to figure out which tree a patch applies to in practice. The mailing list for x86 patches is linux-kernel. Even if you filter by x...@kernel.org to detect it is an x86 patch which topic branch a patch goes to was not clear to me from a quick skim of recent lkml x86 patches. So I could really respect a patch header line that said: tree abcdef0123456789...0123456789abcdef Where the numbers where the truncated tree hash before and after a patch was applied. That would seem to give a little bit of extra sanity checking in the application of git diffs as well. Eric -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On February 23, 2016 12:35:12 PM PST, Junio C Hamanowrote: >ebied...@xmission.com (Eric W. Biederman) writes: > >> Junio C Hamano writes: >> >>> It is valuable for a testing organization to say "We tested this >>> series on top of version X. We know it works, we have tested on a >>> lot more hardware than the original developer had, we know this is >>> good to go." It is a valuable service. >>> >>> But that is valuable only if version X is still relevant, isn't it? >>> >>> Is the relevance of a version something that is decided by a >>> developer who submits a patch series, or is it more of an attribute >>> of the project and where the current integration is happening? >>> Judging from the responses from Dan to this thread, I think the >>> answer is the latter, and for the purpose of identifying the >>> relevant version(s), the project does not even care about the exact >>> commit, but it wants to know more about which branch the series is >>> targetted to. >>> >>> With that understanding, I find it hard to believe that it buys the >>> project much for the "base" commit to be recorded in a patch series >>> and automated testing is done by applying the patches to that exact >>> commit, which possibly is no-longer-relevant, even though it may >>> help shielding the testing machinery from "you tested with a wrong >>> version" complaints. >>> >>> Isn't it more valuable for the test robot to say "this may or may >>> not have worked well with whatever old version the patch series was >>> based on, but it no longer is useful to the current tip of the >>> 'master'"? If you consider what benefit the project would gain by >>> having such a robot, that is the conclusion I have to draw. >>> >>> So I still am not convinced that this "record base commit" is a >>> useful thing to do. >> >> So I don't know what value this has to the git project. > >Oh, don't worry, I wasn't talking about value this may have to the >Git project at all. "The value to the project" I mentioned in my >response was all about the value to the kernel project. > >> The value of Fengguag's automated testing to the kernel project is to >> smoke out really stupid things. > >I'll snip your bullet points, but as I wrote, I do understand the >value of prescreening test. > >What I was questioning was what value it gives to that testing to >use "the developer based this patch on this exact commit" added to >each incoming patch, and have Fengguag's testing machinery test a >patch with a base version that may no longer be relevant in the >context of the project. Compared to that, wouldn't "this no longer >applies to the branch the series targets" or "this still applies >cleanly, but no longer compiles because the surrounding API has >changed" be much more valuable? > >In your other message, you mentioned the "index $old..$new" lines, >and it is possible to use them to find a tree that the patch cleanly >applies to, but it will not uniquely identify _the_ version the >patch submitter used. IMHO, finding such _a_ tree from the recent >history of the branch that the patch targets and testing the patch >on top of that tree (as opposed to testing the patch in the exact >context the developer worked on) would give the project a better >value. Personally, as a maintainer, I would love to see the tree ID and ideally also the commit ID a series is based on. The commit ID is in some ways less useful since it is non-recreatable (and therefore will never match for anything but the first patch of a series), but could be useful to the maintainer. As far as putting in a bunch of metainformation that has to be manually or semimanually obtained, there are a lot of issues with that, as it way too easily turns into wrong information or potential information leaks that might make corporate contributors uncomfortable. -- Sent from my Android device with K-9 Mail. Please excuse brevity and formatting. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
ebied...@xmission.com (Eric W. Biederman) writes: > Junio C Hamanowrites: > >> It is valuable for a testing organization to say "We tested this >> series on top of version X. We know it works, we have tested on a >> lot more hardware than the original developer had, we know this is >> good to go." It is a valuable service. >> >> But that is valuable only if version X is still relevant, isn't it? >> >> Is the relevance of a version something that is decided by a >> developer who submits a patch series, or is it more of an attribute >> of the project and where the current integration is happening? >> Judging from the responses from Dan to this thread, I think the >> answer is the latter, and for the purpose of identifying the >> relevant version(s), the project does not even care about the exact >> commit, but it wants to know more about which branch the series is >> targetted to. >> >> With that understanding, I find it hard to believe that it buys the >> project much for the "base" commit to be recorded in a patch series >> and automated testing is done by applying the patches to that exact >> commit, which possibly is no-longer-relevant, even though it may >> help shielding the testing machinery from "you tested with a wrong >> version" complaints. >> >> Isn't it more valuable for the test robot to say "this may or may >> not have worked well with whatever old version the patch series was >> based on, but it no longer is useful to the current tip of the >> 'master'"? If you consider what benefit the project would gain by >> having such a robot, that is the conclusion I have to draw. >> >> So I still am not convinced that this "record base commit" is a >> useful thing to do. > > So I don't know what value this has to the git project. Oh, don't worry, I wasn't talking about value this may have to the Git project at all. "The value to the project" I mentioned in my response was all about the value to the kernel project. > The value of Fengguag's automated testing to the kernel project is to > smoke out really stupid things. I'll snip your bullet points, but as I wrote, I do understand the value of prescreening test. What I was questioning was what value it gives to that testing to use "the developer based this patch on this exact commit" added to each incoming patch, and have Fengguag's testing machinery test a patch with a base version that may no longer be relevant in the context of the project. Compared to that, wouldn't "this no longer applies to the branch the series targets" or "this still applies cleanly, but no longer compiles because the surrounding API has changed" be much more valuable? In your other message, you mentioned the "index $old..$new" lines, and it is possible to use them to find a tree that the patch cleanly applies to, but it will not uniquely identify _the_ version the patch submitter used. IMHO, finding such _a_ tree from the recent history of the branch that the patch targets and testing the patch on top of that tree (as opposed to testing the patch in the exact context the developer worked on) would give the project a better value. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Junio C Hamanowrites: > Fengguang Wu writes: > >>> >> I have a mixed feeling about this one, primarily because this was >>> >> already tried quite early in the life of "format-patch" command. >>> >> >>> >> >>> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 >>> >> >>> >> Only the name is different (it was called "applies-to" and named a >>> >> tree object). >>> > >>> > Either commit or tree object will work for us. We can use it in >>> > v2 if you prefer tree object. >>> >>> Sorry, I think you misunderstood. By "only the name is different", I >>> didn't mean to say that the tree object name should be shown as the >>> old proposal did. What I meant but didn't explicitly say, as I >>> thought it was sufficient to point at an old discussion thread, was >>> that this was already tried and rejected. This round uses different >>> name but does essentially the same thing as the old proposal, and I >>> do not think I heard anything new that supports this patch against >>> earlier rejection by Linus. That is what gave me a mixed feeling. >> >> I can understand the rejection by Linus in development process POV. >> >> However we are facing a new situation: in test robot POV, IMHO there >> are values to test exactly the same tree as the patch submitter. >> Otherwise the robot risks >> >> - false negative: failing to apply and test some patches >> - false positive: sending wrong bug reports due to guessed wrong base tree > > I always get negatives and positives confused, so let me think aloud > with an example. Let's say that somebody worked on adding a new > feature based on v4.2 codebase and sent in a patch series. The > series touched files in quiescent part of the system, these files > are identical between v4.2 and the current codebase at v4.5-rc5, and > the series applies cleanly to a "wrong" base tree at the tip of > 'master'. But it turns out that the series uses an old API that was > removed in the meantime. The test robot may say "the result of > applying the series does not even build" and the developer would > complain to you saying "You tested with a wrong version". > > I've already said that I can see the value this approach has for > you. By having the developer state which commit the series was > based on, it will shield you from such a complaint, because you > would not use closer-to-tip 'master' as the base, but instead use > v4.2 codebase for the test. > > As I said, what is unclear to me is what value this apporach gives > to the project. > >>> I can see that recording the exact commit object name allows you to >>> claim that you identified the exact commit to apply the patch, and >>> that you tested the exact tree contents. It however is unclear what >>> the value of such a claim would be to the project or to the >>> integrator. >> >> The value of base commit info is: providing a solid ground to the >> tester, to reliably avoid false positive/negatives. > > It is valuable for a testing organization to say "We tested this > series on top of version X. We know it works, we have tested on a > lot more hardware than the original developer had, we know this is > good to go." It is a valuable service. > > But that is valuable only if version X is still relevant, isn't it? > > Is the relevance of a version something that is decided by a > developer who submits a patch series, or is it more of an attribute > of the project and where the current integration is happening? > Judging from the responses from Dan to this thread, I think the > answer is the latter, and for the purpose of identifying the > relevant version(s), the project does not even care about the exact > commit, but it wants to know more about which branch the series is > targetted to. > > With that understanding, I find it hard to believe that it buys the > project much for the "base" commit to be recorded in a patch series > and automated testing is done by applying the patches to that exact > commit, which possibly is no-longer-relevant, even though it may > help shielding the testing machinery from "you tested with a wrong > version" complaints. > > Isn't it more valuable for the test robot to say "this may or may > not have worked well with whatever old version the patch series was > based on, but it no longer is useful to the current tip of the > 'master'"? If you consider what benefit the project would gain by > having such a robot, that is the conclusion I have to draw. > > So I still am not convinced that this "record base commit" is a > useful thing to do. So I don't know what value this has to the git project. The value of Fengguag's automated testing to the kernel project is to smoke out really stupid things. - A developer forgot to actually compile test their code. An error report that the code does not even compile from Fengguag allows the patch to be dropped without spending any time on it. - A developer failed to test some
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Fengguag Wu, Xiaolong Ye, have you attempted to use the truncated sha1 of the file the patch applies to? Git already places a file sha1 at the top of a patch. See the index line? > diff --git a/fs/namespace.c b/fs/namespace.c > index eccd925c6e82..3c3f8172c734 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c As I understand it you are aiming for making a good guess what the patch or patches apply to, having a set of file hashes looks like it would give you that. All it should take is to iterate over a patchset and for each file in the patchset capture the first file hash. Then in the smallish set of maintainer trees see if that set of file hashes matches any of their recent commits. You should be able to prune the set of possible maintainer trees even more by looking at the mailling list or lists the patch was submitted to. Before we talk about adding anything more I think we need a clear picture of what you have tried with what already exists. A decade ago part of the problem was that not everyone used git. At best it will take a little while before everyone upgrades to a version of git diff containing your changes, and if possibly even longer if they have to start specifying an additional option when a diff is generated. Eric -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Fengguang Wuwrites: >> >> I have a mixed feeling about this one, primarily because this was >> >> already tried quite early in the life of "format-patch" command. >> >> >> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 >> >> >> >> Only the name is different (it was called "applies-to" and named a >> >> tree object). >> > >> > Either commit or tree object will work for us. We can use it in >> > v2 if you prefer tree object. >> >> Sorry, I think you misunderstood. By "only the name is different", I >> didn't mean to say that the tree object name should be shown as the >> old proposal did. What I meant but didn't explicitly say, as I >> thought it was sufficient to point at an old discussion thread, was >> that this was already tried and rejected. This round uses different >> name but does essentially the same thing as the old proposal, and I >> do not think I heard anything new that supports this patch against >> earlier rejection by Linus. That is what gave me a mixed feeling. > > I can understand the rejection by Linus in development process POV. > > However we are facing a new situation: in test robot POV, IMHO there > are values to test exactly the same tree as the patch submitter. > Otherwise the robot risks > > - false negative: failing to apply and test some patches > - false positive: sending wrong bug reports due to guessed wrong base tree I always get negatives and positives confused, so let me think aloud with an example. Let's say that somebody worked on adding a new feature based on v4.2 codebase and sent in a patch series. The series touched files in quiescent part of the system, these files are identical between v4.2 and the current codebase at v4.5-rc5, and the series applies cleanly to a "wrong" base tree at the tip of 'master'. But it turns out that the series uses an old API that was removed in the meantime. The test robot may say "the result of applying the series does not even build" and the developer would complain to you saying "You tested with a wrong version". I've already said that I can see the value this approach has for you. By having the developer state which commit the series was based on, it will shield you from such a complaint, because you would not use closer-to-tip 'master' as the base, but instead use v4.2 codebase for the test. As I said, what is unclear to me is what value this apporach gives to the project. >> I can see that recording the exact commit object name allows you to >> claim that you identified the exact commit to apply the patch, and >> that you tested the exact tree contents. It however is unclear what >> the value of such a claim would be to the project or to the >> integrator. > > The value of base commit info is: providing a solid ground to the > tester, to reliably avoid false positive/negatives. It is valuable for a testing organization to say "We tested this series on top of version X. We know it works, we have tested on a lot more hardware than the original developer had, we know this is good to go." It is a valuable service. But that is valuable only if version X is still relevant, isn't it? Is the relevance of a version something that is decided by a developer who submits a patch series, or is it more of an attribute of the project and where the current integration is happening? Judging from the responses from Dan to this thread, I think the answer is the latter, and for the purpose of identifying the relevant version(s), the project does not even care about the exact commit, but it wants to know more about which branch the series is targetted to. With that understanding, I find it hard to believe that it buys the project much for the "base" commit to be recorded in a patch series and automated testing is done by applying the patches to that exact commit, which possibly is no-longer-relevant, even though it may help shielding the testing machinery from "you tested with a wrong version" complaints. Isn't it more valuable for the test robot to say "this may or may not have worked well with whatever old version the patch series was based on, but it no longer is useful to the current tip of the 'master'"? If you consider what benefit the project would gain by having such a robot, that is the conclusion I have to draw. So I still am not convinced that this "record base commit" is a useful thing to do. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Blergh... You want it machine readable and I want it human readable. I don't care so much about the cover letter but for the first patch then I really want something minimal (one line) and human readable. base tree/branch: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc base patch-id: a849260a843115dbac4b1a330d44256ee6b16d7b base patch-subject: Linux 4.4 base tag: v4.4 To me that looks like an unparseable wall of text. My version of that is: Applies-to: afd2ff9b7e1b+ origin As a human all I really want to know is the tree to apply this to. If it doesn't apply then I don't debug it, I just send an automatic note "This doesn't apply to staging-next. Please redo." I think that Applies-to is a better name and also that grepping for "^base " is less reliable than grepping for ^Applies-to. I used "origin" because that's the name in Next/Trees. The + means private patches are applied. That's what we already do in naming the kernel. If the + matters, then I would include a cover letter. I have no idea what a "base patch-id" is so that doesn't work at all. Including the tag is just duplicative since we already have the hash. In my email, I proposed that we list all the other private patches in a cover letter, but I think you are saying that we only need to know the most recent private patch? Another idea would be to list them newest to oldest (git log order instead of email order) in the cover letter. Btw, I always work against linux-next and Dave M is always getting annoyed with me for not marking which patches go to net and which go to net-next. I don't use git format-patch, but I will probably start using "Applies-to: net" or "Applies-to: net-next". regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Hi Dan, On Tue, Feb 23, 2016 at 01:32:53PM +0300, Dan Carpenter wrote: > So this is the format for the first patch? > > base commit: 0233b800c838ddda41db318ee396320b3c21a560 What's in my mind is lines like base tree/branch: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc base patch-id: a849260a843115dbac4b1a330d44256ee6b16d7b The point is one piece of information per line, so that new lines can be added trivially in future, like base patch-subject: Linux 4.4 base tag: v4.4 The exact format can be improved wherever suitable. For example, use more suitable key name part (eg. "base commit" => "base-commit") or value part (eg. "$tree_url $branch" to "$tree_url#$branch"). > Can we change it to include the name of the public tree we are starting > from? > > applies-to: 0233b800c838 > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git#master No problem, just that I'd prefer breaking up such information into multi "key: value" lines. > Of course, my absolute prefered format would be: > > applies-to: net-next 0233b800c838 > > I don't think that's possible though? I often write that sort of a line > in my emails to Dave already. Yeah, that'd be most human readable. It does require people (and scripts) to reach consensus on the tree/branch name, which may only be possible for well known trees. > Fengguang was suggesting something like this if we have to include > unmerged patches: > > applies-to: net-next 0233b800c838 > private patchset 1 > private patchset 2 > > I don't think git knows what a patchset is. Git may not need to have patchset concept. Suppose a developer's local branch has v4.4 private commit 1, subject: do aaa private commit 2, subject: do bbb private commit 3, subject: do ccc private commit 4, subject: do ddd private commit 5, subject: do eee If he decided to send commits 1-2 as one patchset, and 3-5 as another patchset to LKML. The 2 cover letters would look like (only showing useful fields): $ git format-patch commit 1..commit 2 [PATCH 0/2] base commit: afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc $ git format-patch commit 3..commit 5 [PATCH 0/3] base patch-subject: do bbb The 0day robot will be able to find the suitable base and re-create exactly the same tree object for both the above 2 patchsets based on the first one's "base commit" and the second one's "base patch-subject". > We would have to include the subject line for each unmerged patch. That's a good idea! > I think we should only do that if there is a cover letter, otherwise > the it's too noisy. Or if no cover letter, the information can be included in the first patch, ie. [PATCH 1/N]. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
So this is the format for the first patch? base commit: 0233b800c838ddda41db318ee396320b3c21a560 Can we change it to include the name of the public tree we are starting from? applies-to: 0233b800c838 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git#master Of course, my absolute prefered format would be: applies-to: net-next 0233b800c838 I don't think that's possible though? I often write that sort of a line in my emails to Dave already. Fengguang was suggesting something like this if we have to include unmerged patches: applies-to: net-next 0233b800c838 private patchset 1 private patchset 2 I don't think git knows what a patchset is. We would have to include the subject line for each unmerged patch. I think we should only do that if there is a cover letter, otherwise the it's too noisy. regards, dan carpenter -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Tue, Feb 23, 2016 at 01:23:19AM -0800, H. Peter Anvin wrote: > On 02/23/16 01:17, Fengguang Wu wrote: > > > > However we are facing a new situation: in test robot POV, IMHO there > > are values to test exactly the same tree as the patch submitter. > > Otherwise the robot risks > > > > - false negative: failing to apply and test some patches > > - false positive: sending wrong bug reports due to guessed wrong base tree > > > > Wouldn't the important part here be the git hash, rather than the tree? > If you have the same hash then it by definition is the same contents? Yes. Sorry for the partial wording! We should be talking about the same thing: the hash of the tree object. The commit SHA1 will also do the work. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On 02/23/16 01:17, Fengguang Wu wrote: > > However we are facing a new situation: in test robot POV, IMHO there > are values to test exactly the same tree as the patch submitter. > Otherwise the robot risks > > - false negative: failing to apply and test some patches > - false positive: sending wrong bug reports due to guessed wrong base tree > Wouldn't the important part here be the git hash, rather than the tree? If you have the same hash then it by definition is the same contents? -hpa -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Hi Junio and All, CC more relevant people. FYI this thread starts here: http://thread.gmane.org/gmane.comp.version-control.git On Mon, Feb 22, 2016 at 10:54:38PM -0800, Junio C Hamano wrote: > Fengguang Wuwrites: > > > Hi Junio, > > > > On Sun, Feb 21, 2016 at 08:19:56PM -0800, Junio C Hamano wrote: > >> Xiaolong Ye writes: > >> > >> > It would be helpful for maintainers or reviewers to know the base tree > >> > info of the patches created by git format-patch. Teach git format-patch > >> > a --base-tree-info option to record these info. > >> > > >> > Signed-off-by: Xiaolong Ye > >> > --- > >> > >> I have a mixed feeling about this one, primarily because this was > >> already tried quite early in the life of "format-patch" command. > >> > >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 > >> > >> Only the name is different (it was called "applies-to" and named a > >> tree object). > > > > Either commit or tree object will work for us. We can use it in > > v2 if you prefer tree object. > > Sorry, I think you misunderstood. By only the name is different, I > didn't mean to say that the tree object name should be shown as the > old proposal did. What I meant but didn't explicitly say, as I > thought it was sufficient to point at an old discussion thread, was > that this was already tried and rejected. This round uses different > name but does essentially the same thing as the old proposal, and I > do not think I heard anything new that supports this patch against > earlier rejection by Linus. That is what gave me a mixed feeling. I can understand the rejection by Linus in development process POV. However we are facing a new situation: in test robot POV, IMHO there are values to test exactly the same tree as the patch submitter. Otherwise the robot risks - false negative: failing to apply and test some patches - false positive: sending wrong bug reports due to guessed wrong base tree > >> Is it your goal to insist on one exact commit the patch is applied > >> to? > > > > Right. Our goal is fully automated patch testing, where the base tree > > info is required for *reliably* avoid reporting false positives. > > > > A clean git-apply does not guarantee the resulted code is logically > > consistent and hence testable by 3rd party. For a 3rd party tester to > > provide useful and trustable test reports, he must apply the patch to > > exactly the same base as the patch submitter. > > The patch submitter (or you as a third party tester) is not in the > position to dictate the integrator to apply the patch to one > specific commit and use it from there. The integrator would pick an > appropriate base that would be different from the commit where the > patch was taken from, apply it there, and merge the result to the > tip of the mainline, or apply the patch directly to the tip of the > mainline. Even if the integrator picked the commit the patch was > taken from, the result would not be used alone without any other > changes, i.e. before getting merged into the integration branch. Yeah. Per my understanding the base commit info will be mainly parsed by test robots instead of integrators. > So in that sense, any test that is done by the patch submitter and > the third party tester would not be what will be released to the > wild *anyway*. The resulting code will be exercised in a context > that *is* different from the context the original author had. That's right. But no worry, when the patch is merged by maintainer, we'll test it once again in the maintainer tree. Pre-merge patch testing is useful in 2 ways: - shift left testing to early review stage - maintainer trees are typically not rebaseable. When errors are discovered there, it's a bit too late: the error will likely remain in git history for ever. Which will hurt bisects. > I can see that recording the exact commit object name allows you to > claim that you identified the exact commit to apply the patch, and > that you tested the exact tree contents. It however is unclear what > the value of such a claim would be to the project or to the > integrator. The value of base commit info is: providing a solid ground to the tester, to reliably avoid false positive/negatives. > So I dunno. FYI, the 0day test robot will be able to work better if provided the base commit info. It'll work a bit more sophisticated than simply relying on the base commit info: if it's sure about the tree the patch is targeted for (or the maintainer would apply to), it'll use that as base tree[*]; otherwise it'll fall back to using the base commit info included in the patchset. [*] For examples, [PATCH -mm] ... [PATCH net] ... For such patches we are sure they are targeted for the well known mm/net trees. Anyway the worst case of not adopting the discussed patch is, the 0day test robot continue to work in current heuristic
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Fengguang Wuwrites: > Hi Junio, > > On Sun, Feb 21, 2016 at 08:19:56PM -0800, Junio C Hamano wrote: >> Xiaolong Ye writes: >> >> > It would be helpful for maintainers or reviewers to know the base tree >> > info of the patches created by git format-patch. Teach git format-patch >> > a --base-tree-info option to record these info. >> > >> > Signed-off-by: Xiaolong Ye >> > --- >> >> I have a mixed feeling about this one, primarily because this was >> already tried quite early in the life of "format-patch" command. >> >> http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 >> >> Only the name is different (it was called "applies-to" and named a >> tree object). > > Either commit or tree object will work for us. We can use it in > v2 if you prefer tree object. Sorry, I think you misunderstood. By only the name is different, I didn't mean to say that the tree object name should be shown as the old proposal did. What I meant but didn't explicitly say, as I thought it was sufficient to point at an old discussion thread, was that this was already tried and rejected. This round uses different name but does essentially the same thing as the old proposal, and I do not think I heard anything new that supports this patch against earlier rejection by Linus. That is what gave me a mixed feeling. >> Is it your goal to insist on one exact commit the patch is applied >> to? > > Right. Our goal is fully automated patch testing, where the base tree > info is required for *reliably* avoid reporting false positives. > > A clean git-apply does not guarantee the resulted code is logically > consistent and hence testable by 3rd party. For a 3rd party tester to > provide useful and trustable test reports, he must apply the patch to > exactly the same base as the patch submitter. The patch submitter (or you as a third party tester) is not in the position to dictate the integrator to apply the patch to one specific commit and use it from there. The integrator would pick an appropriate base that would be different from the commit where the patch was taken from, apply it there, and merge the result to the tip of the mainline, or apply the patch directly to the tip of the mainline. Even if the integrator picked the commit the patch was taken from, the result would not be used alone without any other changes, i.e. before getting merged into the integration branch. So in that sense, any test that is done by the patch submitter and the third party tester would not be what will be released to the wild *anyway*. The resulting code will be exercised in a context that *is* different from the context the original author had. I can see that recording the exact commit object name allows you to claim that you identified the exact commit to apply the patch, and that you tested the exact tree contents. It however is unclear what the value of such a claim would be to the project or to the integrator. So I dunno. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Hi Junio, On Sun, Feb 21, 2016 at 08:19:56PM -0800, Junio C Hamano wrote: > Xiaolong Yewrites: > > > It would be helpful for maintainers or reviewers to know the base tree > > info of the patches created by git format-patch. Teach git format-patch > > a --base-tree-info option to record these info. > > > > Signed-off-by: Xiaolong Ye > > --- > > I have a mixed feeling about this one, primarily because this was > already tried quite early in the life of "format-patch" command. > > http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 > > Only the name is different (it was called "applies-to" and named a > tree object). Either commit or tree object will work for us. We can use it in v2 if you prefer tree object. > The approach the patch I am responding to takes, which is to touch > diff.[ch], is probably flawed. Even if we assume, for the sake of > reviewing, that "which commit (or tree) this patch applies to" is a > good thing to add, this is useful only to the very first patch in > the series. Hence its output should come from some function in > builtin/log.c that iterates over commits and done for only the first > one in a series, not by the diff machinery that is called by that > loop iterating over commits. For patchset with a cover letter, builtin/log.c is the best place to put the base tree info: [PATCH 0/3] add base tree info here [PATCH 1/3] [PATCH 2/3] [PATCH 3/3] For the others, it looks the best place for the new info is right after the diffstat of the first patch: [PATCH 1/3] add base tree info here [PATCH 2/3] [PATCH 3/3] Since the diffstat is shown in diff.c, we add a few lines of code after diff.c calls diff_summary(). > It is unclear what your goal is, and "would be helpful" is just as > convincing as saying it would be helpful to record the phase of the > moon when the commit was made. A typical patch rarely touches all > the files in the project, so there will be multiple commits in the > existing history of the project to which the patch would apply > cleanly. > > Is it your goal to insist on one exact commit the patch is applied > to? Right. Our goal is fully automated patch testing, where the base tree info is required for *reliably* avoid reporting false positives. A clean git-apply does not guarantee the resulted code is logically consistent and hence testable by 3rd party. For a 3rd party tester to provide useful and trustable test reports, he must apply the patch to exactly the same base as the patch submitter. For example, suppose a developer posted a patch to file A to add a call to func_foo(). The test robot could apply it cleanly to latest kernel git tree, eg. v4.5-rc5, and continue to build test it. Suppose v4.5-rc5 included another commit to rename func_foo() to func_bar() in file B. As a result, the robot will catch and report a build error "func_foo() not defined" to the developer, who will then complain "but my patch is based on v4.4, you tested my patch on the wrong base". Without the base tree info, what we do now is a lot of dirty guess works, trying the same patch on some "maybe suitable" bases and see if they'll trigger the same error. For example, this is one of the error reports the 0day robot auto sent to LKML: https://lists.01.org/pipermail/kbuild-all/2016-February/017385.html Re: [PATCH 2/2] kernel: sched: fix preempt_disable_ip recording for preempt_disable() [auto build test ERROR on tip/sched/core] [also build test ERROR on v4.5-rc3 next-20160211] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] We sent 300+ "/maybe/ your patch has problem" reports like that in each month. Whose quality can be greatly improved if we can be 100% sure about the test errors. > Or are you OK as long as the patch is applied to the > same set of blobs the diff was taken from? A developer may have a > few unrelated and private commits on top of Linus's released > version, on top of which she builds a series. As long as the paths > touched by the patches in this series do not overlap with the paths As shown in the above example, "no overlap paths" is not a sufficient condition for "patches are irrelevant". Because patch A may change definition of a function in one file while patch B may reference that function in another file. So there is no easy mechanical way for a script to remove the several non-Linus commits and can still provide useful base commit info. However if we teach this patch to include one more line: base patch-id: HASH And the developer's local patches are mainline tree private patchset 1 private patchset 2 The developer will likely have already post patchset 2's base patchset 1 to the mailing list, then our robot already know about patchset 1's patch ids, therefore being able to apply
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
On Sun, Feb 21, 2016 at 8:19 PM, Junio C Hamanowrote: > If you are OK with accepting a patch application to a tree with the > same blobs the diff was taken from, then the format-patch output > already has all the necessary information. For each "diff --git" > part, there is the "index $old..$new" line that records preimage and > postimage blob object ID, so you should be able to find a tree that > has the matching blobs in the paths mentioned in the patch, and it > is guaranteed that the patch would apply cleanly to such a tree. > > Of course, that requires the recipient of the patch to have the all > the blobs mentioned as the preimage in the patch, but it would be a > reasonable assumption, as your patch assumes that the recipient has > the commit, and if he has a commit by definition he would have all > the blobs recorded by that commit. > Sorry for a sort of tangent here, but what method would you recommend for asking git to locate a commit which matches all those blobs? Is there any already existing plumbing to do such a thing? Regards, Jake -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH 1/1] format-patch: add an option to record base tree info
Xiaolong Yewrites: > It would be helpful for maintainers or reviewers to know the base tree > info of the patches created by git format-patch. Teach git format-patch > a --base-tree-info option to record these info. > > Signed-off-by: Xiaolong Ye > --- I have a mixed feeling about this one, primarily because this was already tried quite early in the life of "format-patch" command. http://thread.gmane.org/gmane.comp.version-control.git/9694/focus=9757 Only the name is different (it was called "applies-to" and named a tree object). The approach the patch I am responding to takes, which is to touch diff.[ch], is probably flawed. Even if we assume, for the sake of reviewing, that "which commit (or tree) this patch applies to" is a good thing to add, this is useful only to the very first patch in the series. Hence its output should come from some function in builtin/log.c that iterates over commits and done for only the first one in a series, not by the diff machinery that is called by that loop iterating over commits. It is unclear what your goal is, and "would be helpful" is just as convincing as saying it would be helpful to record the phase of the moon when the commit was made. A typical patch rarely touches all the files in the project, so there will be multiple commits in the existing history of the project to which the patch would apply cleanly. Is it your goal to insist on one exact commit the patch is applied to? Or are you OK as long as the patch is applied to the same set of blobs the diff was taken from? A developer may have a few unrelated and private commits on top of Linus's released version, on top of which she builds a series. As long as the paths touched by the patches in this series do not overlap with the paths touched by the initial few unrelated and private commits, such a series should be testable directly on top of Linus's released version, without forcing her to first rebase the series to remove these initial few unrelated and private commits from her history before running "format-patch --base-commit" (your patch is recording the commit object name, and it shouldn't call it tree-info), but you are forcing her to do so because these private commits will not be available to those who apply her patches. If you are OK with accepting a patch application to a tree with the same blobs the diff was taken from, then the format-patch output already has all the necessary information. For each "diff --git" part, there is the "index $old..$new" line that records preimage and postimage blob object ID, so you should be able to find a tree that has the matching blobs in the paths mentioned in the patch, and it is guaranteed that the patch would apply cleanly to such a tree. Of course, that requires the recipient of the patch to have the all the blobs mentioned as the preimage in the patch, but it would be a reasonable assumption, as your patch assumes that the recipient has the commit, and if he has a commit by definition he would have all the blobs recorded by that commit. Incidentally, the "index $old..$new" lines are what make "am -3" possible. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html