Re: Libtaskotron - allow non-cli data input
On Thu, 2017-02-09 at 00:29 +0100, Josef Skladanka wrote:
> On Wed, Feb 8, 2017 at 8:06 PM, Adam Williamson wrote:
> > Wouldn't it be great if we had a brand new project which would be the
> > ideal place to represent such conventions, so the bit of taskotron
> > which reported the results could construct them conveniently? :P
>
> https://xkcd.com/684/ :) (I mean no offense just really reminded me of that)

Hmm, clearly we need a *** CONVENTION *** for quoting xkcd ;)
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
___
qa-devel mailing list -- qa-devel@lists.fedoraproject.org
To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org
Re: Libtaskotron - allow non-cli data input
On Wed, Feb 8, 2017 at 8:06 PM, Adam Williamson wrote:
> Wouldn't it be great if we had a brand new project which would be the
> ideal place to represent such conventions, so the bit of taskotron
> which reported the results could construct them conveniently? :P

https://xkcd.com/684/ :) (I mean no offense just really reminded me of that)
Re: Libtaskotron - allow non-cli data input
On Wed, Feb 8, 2017 at 7:39 PM, Kamil Paral wrote:
> > I mentioned this in IRC but why not have a bit of both and allow input
> > as either a file or on the CLI. I don't think that json would be too
> > bad to type on the command line as an option for when you're running
> > something manually:
> >
> > runtask sometask.yml -e "{'namespace':'someuser',\
> > 'module':'somemodule', 'commithash': 'abc123df980'}"
>
> I probably misunderstood you on IRC. In my older response here, I actually
> suggested something like this - having "--datafile data.json", which can
> also be used like "--datafile -" meaning stdin. You can then use "echo
> | runtask --datafile - ". But your solution is probably easier to look at.

I honestly like the `--datafile [fname, -]` approach a lot. We could sure name the param better, but that's about it. I like it better than necessarily having a long cmdline, and you can still use "echo " if you wanted to have a cmdline example, or "cat " for the common usage.

> > There would be some risk of running into the same problems we had with
> > AutoQA where depcheck commands were too long for bash to parse but
> > that's when I'd say "you need to use a file for that"
>
> Definitely.

And that's why I'd rather stay away from long cmdlines :)

> > > I'm a bit torn between providing as much useful data as we can when
> > > scheduling (because a) yaml formulas are very limited and you can't
> > > do stuff like string parsing/splitting b) might save you a lot of
> > > work/code to have this data presented to you right from the start),
> > > and the easy manual execution (when you need to gather and provide
> > > all that data manually). It's probably about finding the right
> > > balance. We can't avoid having structured multi-data input, I don't
> > > think.
> >
> > If we did something along the lines of allowing input on the CLI, we
> > could have both, no? We'd need to be clear on the precedence of file vs
> > CLI input but that seems to me like something that could solve the
> > issue of dealing with more complicated inputs without requiring users
> > to futz with a file when running tasks locally.
>
> That's not the worry I had. Creating a file or writing json to a command
> line is a bit more work than the current state, but not a problem. What I'm
> a bit afraid of is that we'll start adding many keyvals into the json just
> because it is useful or convenient. As an artificial example, let's say for
> a koji_build FOO we supply NVR, name, epoch, owner, build_id and
> build_timestamp. And if we receive all of that in the fedmsg (or from some
> koji query that we'll need to do anyway for some reason), it makes sense to
> pass that data, it's free for us and it's less work for the task (it
> doesn't have to do its own queries). However, running the task manually as
> a task developer (and I don't mean re-running an existing task on FOO by
> copy-pasting the existing data json from a log file, but running it on a
> fresh new koji build BAR) makes it much more difficult for the developer,
> because he needs to figure out (manually) all those values for BAR just to
> be able to run his task.
>
> Even more extreme (deliberately, to illustrate the point) example would be
> to pass the whole koji buildinfo dict structure that you get when running
> koji.getBuild(). Which could be actually easier for the developer to
> emulate, because we could document a single command that retrieves exactly
> that. Unless we start adding additional data to it...
>
> So on one hand, I'd like to pass as much data as we have to make task
> formulas simpler, but on the other hand, I'm afraid task development
> (manual task execution, without having a trigger to get all this data by
> magic) will get harder. (I hope I managed to explain it better this time:))

As I mentioned in one of the other emails - the dev (while developing) should really only need to provide the data that is relevant for the task/formula. Why have a ton of stuff that you never use in the "testing data" - it is unnecessary work, and even makes it more prone to error IMO. If I had a task that only needs NVR, name and build_timestamp, I'd (while developing/testing) just pass a structure containing these. Or do you think that is a bad idea?

I sure can see how (e.g.) the resultsdb directive could be spitting warnings out about missing data, but that is why we have the different profiles - the resultsdb directive could fail in production mode if data was missing (and that probably means some serious error), or just warn you in development mode. If you wanted to "test it thoroughly" you'd better use some real data anyway - and if we store the "input data structure" in logs for the tasks, then there even is a good source of those, should you want to copy-paste it.

I hope I understood what you meant.

joza
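For what it's worth, the `--datafile [fname, -]` idea plus the production/development profile behaviour discussed above could look roughly like this. (A minimal sketch; `load_input_data`, `get_value`, and the profile names are invented for illustration, not actual libtaskotron API.)

```python
import json
import sys
import warnings


def load_input_data(datafile):
    """Load the structured task input from a JSON file, or stdin if '-'."""
    if datafile == "-":
        return json.load(sys.stdin)
    with open(datafile) as f:
        return json.load(f)


def get_value(data, key, profile="development"):
    """Look up a key from the task input data.

    In the hypothetical 'production' profile a missing key is a hard
    error (the trigger failed to provide promised data); in
    'development' we only warn, so a task author can pass a minimal
    dict containing just the keys their formula actually uses.
    """
    if key in data:
        return data[key]
    if profile == "production":
        raise KeyError("input data is missing required key: %s" % key)
    warnings.warn("input data is missing key %r, returning None" % key)
    return None
```

Usage would then be e.g. `echo '{"nvr": "htop-1.0-2.fc25"}' | runtask --datafile - task.yml`, handing the parsed dict to the formula.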
Re: Libtaskotron - allow non-cli data input
On Wed, Feb 8, 2017 at 4:11 PM, Tim Flink wrote:
> On Wed, 8 Feb 2017 08:26:30 -0500 (EST) Kamil Paral wrote:
>
> I think another question is whether we want to keep assuming that the
> *user supplies the item* that is used as a UID in resultsdb. As you say,
> it seems a bit odd to require people to munge stuff together like
> "namespace/module#commithash" at the same time that it can be separated
> out into a dict-like data structure for easy access.

Emphasis mine. I think that we should not really be assuming that at all. In most cases, the item should be provided by the trigger automagically, the same with the type. With what I'd like to see for the structured input, the conventions module could/should take that data into account while constructing the "default" results. Keep in mind that one result can also have multiple "items" (as it can have multiples of any extra data field), if it makes sense. One, the "auto-provided", and the second could be user-added. That would make it both consistent (the trigger-generated item) and flexible, if a different "item" makes sense.

> Would it make more sense to just pass in the dict and have semi-coded
> conventions for reporting to resultsdb based on the item_type which
> could be set during the task instead of requiring that to be known
> before task execution time?
>
> Something along the lines of enabling some common kinds of input for
> the resultsdb directive - module commit, dist-git rpm change, etc. so
> that you could specify the item_type to the resultsdb directive and it
> would know to look for certain bits to construct the UID item that's
> reported to resultsdb.

Yup, I think that setting some conventions, and making sure we keep the same (or at least very similar) set of metadata for the relevant type, is key.

I mentioned this in the previous email, but I am, in the past few days, thinking about making the types a bit more general - the pretty specific types we have now made sense when we first designed stuff and had a very narrow usecase. Now that we want to make the stack usable in stuff like Platform CI, I think it would make sense to abstract a bit more, so we don't have `koji_build`, `brew_build`, `copr_build`, which are essentially the same but differ in minor details. We can specify those classes/details in extradata, or could even use multiple types - having the common set of information guaranteed for all the 'build' types, and adding other kinds of data to `koji_build`, `brew_build` or `whatever_build` as needed.

> Using Kamil's example, assume that we have a task for a module and the
> following data is passed in:
>
> {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}
>
> Neither item nor type is specified on the CLI at execution time. The
> task executes using that input data and when it comes time to report to
> resultsdb:
>
> - name: report results to resultsdb
>   resultsdb:
>       results: ${some_task_output}
>       type: module
>
> By passing in that type of module, the directive would look through the
> input data and construct the "item" from input.namespace, input.module
> and input.commithash.
>
> I'm not sure if it makes more sense to have a set of "types" that the
> resultsdb directive understands natively or to actually require item
> but allow variable names in it along the lines of
>
> "item":"${namespace}/${module}#${commithash}"

I'd rather have that in "conventions" than the resultsdb directive, but I guess it is essentially the same thing, once you think about it.

> > > My take on this is, that we will say which variables are provided
> > > by the trigger for each type. If a variable is missing, the
> > > formula/execution should just crash when it tries to access it.
> >
> > Sounds reasonable.
>
> +1 from me as well. Assume everything is there, crash if there's
> something requested that isn't available (missing data etc.)

yup, that's what I have in mind.

> > We'll probably end up having a mix of necessary and convenience
> > values in the inputdata. "name" is probably a convenience value here,
> > so that tasks don't have to parse if they need to use it in a certain
> > directive. "epoch" might be an important value for some test cases,
> > and let's say we learn the value in trigger during scheduling
> > investigation, so we decide to pass it down. But that information is
> > not that easy to get manually. If you know what to do, you'll open up
> > a particular koji page and see it. But you can also be clueless about
> > how to figure it out. The same goes for build_id, again can be
> > important, but also can be retrieved later, so more of a convenience
> > data (saving you from writing a koji query). This is just an example
> > for illustration, might not match real-world use cases.
>
> I mentioned this in IRC but why not have a bit of both and allow input
> as either a file or on the CLI. I don't think that json would be too
> bad to type on the command line a
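The "more general types" idea above could be sketched like this (purely hypothetical convention code, assuming a result is a plain dict and the generic-type-plus-source split proposed in the thread, e.g. `type=koji_build` becoming `type=build, source=koji`):

```python
# Hypothetical mapping from today's specific types to a generic type
# plus a 'source' stored in extradata.
SPECIFIC_TYPES = {
    "koji_build": ("build", "koji"),
    "brew_build": ("build", "brew"),
    "copr_build": ("build", "copr"),
}


def generalize(result):
    """Return a copy of a result dict with the generic type,
    moving the build system into extradata['source']."""
    out = dict(result)
    generic, source = SPECIFIC_TYPES.get(out["type"], (out["type"], None))
    out["type"] = generic
    if source is not None:
        extradata = dict(out.get("extradata", {}))
        extradata["source"] = source
        out["extradata"] = extradata
    return out
```

Consumers could then query at the level they care about: "all build results" via the generic type, or narrow down via `source` when the origin matters.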
Re: Libtaskotron - allow non-cli data input
On Wed, Feb 8, 2017 at 2:26 PM, Kamil Paral wrote:
> This is what I meant - keeping item as is, but being able to pass another
> structure to the formula, which can then be used from it. I'd still like to
> keep the item to a single string, so it can be queried easily in the
> resultsdb. The item should still represent what was tested. It's just that
> I want to be able to pass arbitrary data to the formulae, without the need
> for ugly hacks like we have seen with the git commits lately.
>
> So, the question is now how much we want the `item` to uniquely identify
> the item under test. Currently we mostly do (rpmlint, rpmgrill) and
> sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA,
> and we store arch in the results extradata section).

I still kind of believe that the `item` should be chosen with great respect to what actually is the item under test, but it also really depends on what you want to do with it later on. Note that the `item` is actually a convention (yay, more water to adamw's "if we only had some awesome new project" mill), and is not enforced in any way. I believe that there should be firm rules (once again - conventions) on what the item is for each "well known" item type, so you can kind-of assume that if you query for `item=foo&type=koji_build` you are getting the results related to that build.

As we were discussing privately with the item types (I'm not going to go into much detail here, but for the rest of you guys - I'm contemplating making the types more general, and using more of the 'metadata' to store additional specifics - like replacing `type=koji_build` with `type=build, source=koji`, or `type=build, source=brew` - on the high level, you know that a package/build was tested, and you don't really care where it came from, but you sometimes might care, and so there is the additional metadata stored. We could even have more types stored for one result, or I don't know... It's complicated), the idea behind item is that it should be a reasonable value that carries the "what was tested" information, and you will use the other "extra-data" fields to provide more details (like we kind-of want to do with arch, but we don't really..). The reason for it to be a "reasonable value" and not a "superset of all values that we have" is to make the general querying a bit more straightforward.

> If we have structured input data, what happens to `item` for
> check_modulemd? Currently it is "namespace/module#commithash". Will it stay
> the same, and they'll just avoid parsing it because we'll also provide
> ${data.namespace}, ${data.module} and ${data.hash}? Or will the `item` be
> perhaps just "module" (and the rest will be stored as extradata)? What
> happens when we have a generic git_commit type, and the source can be an
> arbitrary service? Will we have some convention to use item as
> "giturl#commithash"?

Once again - whatever makes sense as the item. For me that would be the Repo/SHA combo, with server, repo, branch, and commit in extradata. And it comes to "storing as much relevant metadata as possible" once again. The thing is, that as long as stuff is predictable, it almost does not matter what it is, and it once again points out how good of an idea the conventions stuff is.

I believe that we are now storing much less metadata in resultsdb than we should, and it is caused mostly by the fact that
- we did not really need to use the results much so far
- it is pretty hard to pass data into libtaskotron, and querying all the services all the time to get the metadata is/was deemed a bad idea - why do it ourselves, if the consumer can get it himself. They know that it is koji_build, so they can query koji.

There is a fine balance to be struck, IMO, so we don't end up storing "all the data" in resultsdb. But I believe that the stuff relevant for the result consumption should be there.

> Because the ideal way would be to store the whole item+data structure as
> item in resultsdb. But that's hard to query for humans, so we want a simple
> string as an identifier.

This, for me, is once again about being predictable. As I said above, I still think that `item` should be a reasonable identifier, but not necessarily a superset of all the info. That is what the extra data is for. Talking about...

> But sometimes there can be a lot of data points which uniquely identify
> the thing under test only when you specify it all (for example what Dan
> wrote, sometimes the ID is the old NVR *plus* the new NVR). Will we want to
> somehow combine them into a single item value? We should give some
> directions how people should construct their items.

My gut feeling here would be storing the "new NVR" (the thing that actually caused the test to be executed) as item, and adding 'old nvr' to extra data. But I'm not that familiar with the specific usecase. To me, this would make sense, because when you query for "this NVR related results" you'd get the results too. If you wan
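To make the "reasonable item, details in extradata" convention concrete, here is a sketch of what the depcheck and git-commit results discussed above could look like under it, and why a plain-string item keeps querying simple. (The field names and values are illustrative, extrapolated from the thread, not an actual resultsdb schema.)

```python
from urllib.parse import urlencode

# depcheck: item is the NVR, and the arch (the 'A' of the full NEVRA)
# lives in extradata rather than being munged into the item string.
depcheck_result = {
    "item": "htop-1.0-2.fc25",
    "type": "koji_build",
    "outcome": "PASSED",
    "extradata": {"arch": "x86_64"},
}

# git commit: item is the repo/SHA combo, with the full details broken
# out into extradata, as suggested above. Server URL is made up.
git_commit_result = {
    "item": "somemodule#abc123df980",
    "type": "git_commit",
    "outcome": "PASSED",
    "extradata": {
        "server": "https://pagure.io",
        "repo": "somemodule",
        "branch": "master",
        "commit": "abc123df980",
    },
}

# Because item stays a single string, the common query stays trivial:
query = urlencode({"item": depcheck_result["item"],
                   "type": depcheck_result["type"]})
```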
Re: Libtaskotron - allow non-cli data input
On Wed, 2017-02-08 at 08:11 -0700, Tim Flink wrote:
> Would it make more sense to just pass in the dict and have semi-coded
> conventions for reporting to resultsdb based on the item_type which
> could be set during the task instead of requiring that to be known
> before task execution time?

Wouldn't it be great if we had a brand new project which would be the ideal place to represent such conventions, so the bit of taskotron which reported the results could construct them conveniently? :P
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
Re: Libtaskotron - allow non-cli data input
> I think another question is whether we want to keep assuming that the
> user supplies the item that is used as a UID in resultsdb. As you say,
> it seems a bit odd to require people to munge stuff together like
> "namespace/module#commithash" at the same time that it can be separated
> out into a dict-like data structure for easy access.
>
> Would it make more sense to just pass in the dict and have semi-coded
> conventions for reporting to resultsdb based on the item_type which
> could be set during the task instead of requiring that to be known
> before task execution time?
>
> Something along the lines of enabling some common kinds of input for
> the resultsdb directive - module commit, dist-git rpm change, etc. so
> that you could specify the item_type to the resultsdb directive and it
> would know to look for certain bits to construct the UID item that's
> reported to resultsdb.
>
> Using Kamil's example, assume that we have a task for a module and the
> following data is passed in:
>
> {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}
>
> Neither item nor type is specified on the CLI at execution time. The
> task executes using that input data and when it comes time to report to
> resultsdb:
>
> - name: report results to resultsdb
>   resultsdb:
>       results: ${some_task_output}
>       type: module
>
> By passing in that type of module, the directive would look through the
> input data and construct the "item" from input.namespace, input.module
> and input.commithash.

I'll have to think about this, maybe sketch up some examples.

> I mentioned this in IRC but why not have a bit of both and allow input
> as either a file or on the CLI. I don't think that json would be too
> bad to type on the command line as an option for when you're running
> something manually:
>
> runtask sometask.yml -e "{'namespace':'someuser',\
> 'module':'somemodule', 'commithash': 'abc123df980'}"

I probably misunderstood you on IRC. In my older response here, I actually suggested something like this - having "--datafile data.json", which can also be used like "--datafile -" meaning stdin. You can then use "echo | runtask --datafile - ". But your solution is probably easier to look at.

> There would be some risk of running into the same problems we had with
> AutoQA where depcheck commands were too long for bash to parse but
> that's when I'd say "you need to use a file for that"

Definitely.

> > I'm a bit torn between providing as much useful data as we can when
> > scheduling (because a) yaml formulas are very limited and you can't
> > do stuff like string parsing/splitting b) might save you a lot of
> > work/code to have this data presented to you right from the start),
> > and the easy manual execution (when you need to gather and provide
> > all that data manually). It's probably about finding the right
> > balance. We can't avoid having structured multi-data input, I don't
> > think.
>
> If we did something along the lines of allowing input on the CLI, we
> could have both, no? We'd need to be clear on the precedence of file vs
> CLI input but that seems to me like something that could solve the
> issue of dealing with more complicated inputs without requiring users
> to futz with a file when running tasks locally.

That's not the worry I had. Creating a file or writing json to a command line is a bit more work than the current state, but not a problem. What I'm a bit afraid of is that we'll start adding many keyvals into the json just because it is useful or convenient. As an artificial example, let's say for a koji_build FOO we supply NVR, name, epoch, owner, build_id and build_timestamp. And if we receive all of that in the fedmsg (or from some koji query that we'll need to do anyway for some reason), it makes sense to pass that data, it's free for us and it's less work for the task (it doesn't have to do its own queries). However, running the task manually as a task developer (and I don't mean re-running an existing task on FOO by copy-pasting the existing data json from a log file, but running it on a fresh new koji build BAR) makes it much more difficult for the developer, because he needs to figure out (manually) all those values for BAR just to be able to run his task.

Even more extreme (deliberately, to illustrate the point) example would be to pass the whole koji buildinfo dict structure that you get when running koji.getBuild(). Which could be actually easier for the developer to emulate, because we could document a single command that retrieves exactly that. Unless we start adding additional data to it...

So on one hand, I'd like to pass as much data as we have to make task formulas simpler, but on the other hand, I'm afraid task development (manual task execution, without having a trigger to get all this data by magic) will get harder. (I hope I managed to explain it better this time:))
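On the "whole koji buildinfo dict" idea: the structure returned by `koji.getBuild()` is large, so a middle ground might be trimming it to a documented subset before handing it to the task. A sketch (the subset and helper name are invented; the sample dict below only hand-writes a few of the keys a real buildinfo contains):

```python
# In a real run you would fetch buildinfo from koji, roughly:
#   session = koji.ClientSession("https://koji.fedoraproject.org/kojihub")
#   buildinfo = session.getBuild("htop-1.0-2.fc25")
# Here we use a hand-written sample instead of a network call.
SAMPLE_BUILDINFO = {
    "nvr": "htop-1.0-2.fc25",
    "name": "htop",
    "version": "1.0",
    "release": "2.fc25",
    "epoch": None,
    "owner_name": "somebody",
    "build_id": 482735,
    # ... getBuild() returns many more fields
}

# Hypothetical documented subset the trigger would pass to tasks, so a
# developer knows exactly which keys to provide for a manual run.
INPUT_KEYS = ("nvr", "name", "epoch", "owner_name", "build_id")


def trim_buildinfo(buildinfo):
    """Keep only the documented input keys, dropping the rest."""
    return {k: buildinfo[k] for k in INPUT_KEYS if k in buildinfo}
```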
Re: Libtaskotron - allow non-cli data input
On Wed, 8 Feb 2017 08:26:30 -0500 (EST) Kamil Paral wrote:
> > This is what I meant - keeping item as is, but being able to pass
> > another structure to the formula, which can then be used from it.
> > I'd still like to keep the item to a single string, so it can be
> > queried easily in the resultsdb. The item should still represent
> > what was tested. It's just that I want to be able to pass arbitrary
> > data to the formulae, without the need for ugly hacks like we have
> > seen with the git commits lately.
>
> So, the question is now how much we want the `item` to uniquely
> identify the item under test. Currently we mostly do (rpmlint,
> rpmgrill) and sometimes don't (depcheck, because item is NVR, but the
> full ID is NEVRA, and we store arch in the results extradata
> section).
>
> If we have structured input data, what happens to `item` for
> check_modulemd? Currently it is "namespace/module#commithash". Will
> it stay the same, and they'll just avoid parsing it because we'll
> also provide ${data.namespace}, ${data.module} and ${data.hash}? Or
> will the `item` be perhaps just "module" (and the rest will be stored
> as extradata)? What happens when we have a generic git_commit type,
> and the source can be an arbitrary service? Will we have some
> convention to use item as "giturl#commithash"?

I think another question is whether we want to keep assuming that the user supplies the item that is used as a UID in resultsdb. As you say, it seems a bit odd to require people to munge stuff together like "namespace/module#commithash" at the same time that it can be separated out into a dict-like data structure for easy access.

Would it make more sense to just pass in the dict and have semi-coded conventions for reporting to resultsdb based on the item_type which could be set during the task instead of requiring that to be known before task execution time?

Something along the lines of enabling some common kinds of input for the resultsdb directive - module commit, dist-git rpm change, etc. so that you could specify the item_type to the resultsdb directive and it would know to look for certain bits to construct the UID item that's reported to resultsdb.

Using Kamil's example, assume that we have a task for a module and the following data is passed in:

{'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}

Neither item nor type is specified on the CLI at execution time. The task executes using that input data and when it comes time to report to resultsdb:

- name: report results to resultsdb
  resultsdb:
      results: ${some_task_output}
      type: module

By passing in that type of module, the directive would look through the input data and construct the "item" from input.namespace, input.module and input.commithash.

I'm not sure if it makes more sense to have a set of "types" that the resultsdb directive understands natively or to actually require item but allow variable names in it along the lines of

"item":"${namespace}/${module}#${commithash}"

> Because the ideal way would be to store the whole item+data structure
> as item in resultsdb. But that's hard to query for humans, so we want
> a simple string as an identifier. But sometimes there can be a lot of
> data points which uniquely identify the thing under test only when
> you specify it all (for example what Dan wrote, sometimes the ID is
> the old NVR *plus* the new NVR). Will we want to somehow combine them
> into a single item value? We should give some directions how people
> should construct their items.
>
> > > I guess it depends whether the extra data will be mandatory and
> > > exactly defined ("this item type provides these input values") or
> > > not (what will formulas do when they're not there?). Also whether
> > > we want to make it still possible to execute a task with simple
> > > `--item string` in some kind of fallback mode, to keep local
> > > execution on dev machines still easy and simple.
> >
> > My take on this is, that we will say which variables are provided
> > by the trigger for each type. If a variable is missing, the
> > formula/execution should just crash when it tries to access it.
>
> Sounds reasonable.

+1 from me as well. Assume everything is there, crash if there's something requested that isn't available (missing data etc.)

> > Not sure about the fallback mode, but my take on this is, that if
> > the user will want to run the task, he will have to just write the
> > "extra data" once to a file, and then it will be just passed in as
> > usual.
>
> Once, for each item :) If a task developer wants to execute his task
> on a NVR, he'll need to prepare the structured input data for each
> NVR he wants to test. That might not be difficult, but it's more work
> than it is currently, so we should know whether we're fine with that.
> I guess we are, but there can be some gotchas, e.g. sometimes it
> might not be obvious how to get some extra data. For example:
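The second option Tim floats above - `"item":"${namespace}/${module}#${commithash}"` - maps directly onto Python's `string.Template`, which uses exactly that `${var}` syntax. A sketch of how the resultsdb directive (or a conventions module) could render it (`render_item` is a hypothetical helper, not existing libtaskotron code):

```python
from string import Template


def render_item(template, input_data):
    """Build the resultsdb 'item' string from the task's input data.

    Template.substitute() raises KeyError when the input data lacks a
    referenced variable, which matches the "assume everything is there,
    crash if it isn't" behaviour agreed on in the thread.
    """
    return Template(template).substitute(input_data)
```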
Re: Libtaskotron - allow non-cli data input
> This is what I meant - keeping item as is, but being able to pass
> another structure to the formula, which can then be used from it. I'd
> still like to keep the item to a single string, so it can be queried
> easily in the resultsdb. The item should still represent what was
> tested. It's just that I want to be able to pass arbitrary data to the
> formulae, without the need for ugly hacks like we have seen with the git
> commits lately.

So, the question is now how much we want the `item` to uniquely identify the item under test. Currently we mostly do (rpmlint, rpmgrill) and sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA, and we store arch in the results extradata section).

If we have structured input data, what happens to `item` for check_modulemd? Currently it is "namespace/module#commithash". Will it stay the same, and they'll just avoid parsing it because we'll also provide ${data.namespace}, ${data.module} and ${data.hash}? Or will the `item` be perhaps just "module" (and the rest will be stored as extradata)? What happens when we have a generic git_commit type, and the source can be an arbitrary service? Will we have some convention to use item as "giturl#commithash"?

Because the ideal way would be to store the whole item+data structure as item in resultsdb. But that's hard to query for humans, so we want a simple string as an identifier. But sometimes there can be a lot of data points which uniquely identify the thing under test only when you specify it all (for example what Dan wrote, sometimes the ID is the old NVR *plus* the new NVR). Will we want to somehow combine them into a single item value? We should give some directions how people should construct their items.

> > I guess it depends whether the extra data will be mandatory and
> > exactly defined ("this item type provides these input values") or not
> > (what will formulas do when they're not there?). Also whether we want
> > to make it still possible to execute a task with simple `--item
> > string` in some kind of fallback mode, to keep local execution on dev
> > machines still easy and simple.
>
> My take on this is, that we will say which variables are provided by the
> trigger for each type. If a variable is missing, the formula/execution
> should just crash when it tries to access it.

Sounds reasonable.

> Not sure about the fallback mode, but my take on this is, that if the
> user will want to run the task, he will have to just write the "extra
> data" once to a file, and then it will be just passed in as usual.

Once, for each item :) If a task developer wants to execute his task on a NVR, he'll need to prepare the structured input data for each NVR he wants to test. That might not be difficult, but it's more work than it is currently, so we should know whether we're fine with that. I guess we are, but there can be some gotchas, e.g. sometimes it might not be obvious how to get some extra data. For example:

  nvr = htop-1.0-2.fc25
  inputdata = {name: htop; epoch: 0; build_id: 482735}

We'll probably end up having a mix of necessary and convenience values in the inputdata. "name" is probably a convenience value here, so that tasks don't have to parse it if they need to use it in a certain directive. "epoch" might be an important value for some test cases, and let's say we learn the value in trigger during scheduling investigation, so we decide to pass it down. But that information is not that easy to get manually. If you know what to do, you'll open up a particular koji page and see it. But you can also be clueless about how to figure it out. The same goes for build_id: again it can be important, but it can also be retrieved later, so it's more of a convenience value (saving you from writing a koji query). This is just an example for illustration, might not match real-world use cases.
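As an aside on the convenience values above: "name" really is derivable locally from the NVR (unlike epoch or build_id, which need a koji lookup). A minimal sketch of that derivation - parse_nvr is a hypothetical helper here, not an existing API:

```python
# Sketch of deriving "convenience" input-data values from an NVR, as in
# the htop example above. Real tooling (e.g. koji) has its own parsing,
# and values like epoch or build_id would have to come from a koji query.

def parse_nvr(nvr):
    """Split 'name-version-release' into its three parts.

    The name may itself contain dashes, so split from the right.
    """
    name, version, release = nvr.rsplit("-", 2)
    return {"name": name, "version": version, "release": release}

print(parse_nvr("htop-1.0-2.fc25"))
# -> {'name': 'htop', 'version': '1.0', 'release': '2.fc25'}
```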
> We could even make some templates for each item_type (I guess trigger
> docs are the place for it?), so people can easily just copy-paste it,
> and make changes.
> I also think that providing a sample json file to the existing tasks
> (that are using it) is a best practice we should strive for.

That is a very good idea, because it could help with the problem described above. It could also document how to retrieve values manually, if needed.

Also, we have to make sure that a user can copy&paste a command to run the same production task in his local environment. We should either have it in the log file, or in execdb (or both), but it needs to include full inputdata, so that you can still easily run the same thing locally for debugging.

> Makes sense?

I'm a bit torn between providing as much useful data as we can when scheduling (because a) yaml formulas are very limited and you can't do stuff like string parsing/splitting, and b) it might save you a lot of work/code to have this data presented to you right from the start), and the easy manual execution (when you need to gather and provide all that data yourself).
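The "copy&paste a command" idea could be as simple as dumping the full inputdata next to a logged invocation. A sketch under the --data-file syntax proposed earlier in this thread (the flags follow that proposal and are not a final CLI; reproduction_command is a hypothetical name):

```python
# Sketch of the "reproducible command in the log" idea: write the full
# input data to a json file and produce a copy-pastable runtask command.
import json
import shlex

def reproduction_command(item, item_type, input_data,
                         formula="runtask.yaml", data_path="inputdata.json"):
    # Persist the exact input data the production run used...
    with open(data_path, "w") as f:
        json.dump(input_data, f, indent=2, sort_keys=True)
    # ...and emit a shell-safe command line referencing it.
    return "runtask --item %s --type %s --data-file %s %s" % (
        shlex.quote(item), shlex.quote(item_type),
        shlex.quote(data_path), shlex.quote(formula))

cmd = reproduction_command("htop-2.0-1.fc25", "koji_build",
                           {"name": "htop", "epoch": 0, "build_id": 482735})
print(cmd)
# -> runtask --item htop-2.0-1.fc25 --type koji_build --data-file inputdata.json runtask.yaml
```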
Re: Libtaskotron - allow non-cli data input
Excerpts from Josef Skladanka's message of 2017-02-07 11:23 +01:00:

> On Mon, Feb 6, 2017 at 6:49 PM, Kamil Paral wrote:
>
> > The formulas already provide a way to 'query' structured data via the
> > dot-format, so we could do with as much as passing some variable like
> > 'task_data' that would contain the parsed json/yaml.
> >
> > Or are you proposing we add another variable with these extra values,
> > like this?
> >
> > echo " {'branch': 'master', 'commit': '6e4fc7'} " | runtask --item
> > libtaskotron --type pagure_git_commit --data-file - runtask.yaml
> >
> > or this:
> >
> > echo " {'name': 'htop'} " | runtask --item htop-2.0-1.fc25 --type
> > koji_build --data-file - runtask.yaml
> >
> > and then use ${item} and ${data.branch}, ${data.commit}, or
> > ${data.name} ?
>
> This is what I meant - keeping item as is, but being able to pass
> another structure to the formula, which can then be used from it. I'd
> still like to keep the item to a single string, so it can be queried
> easily in the resultsdb. The item should still represent what was
> tested. It's just that I want to be able to pass arbitrary data to the
> formulae, without the need for ugly hacks like we have seen with the git
> commits lately.

We ran into a similar problem with how to represent RPMDiff results in ResultsDB. The problem for RPMDiff is that the results are not for a single build NVR, but for a pair of NVRs (the build being analyzed plus the older "baseline" build it is comparing against).

The solution we came up with (not yet implemented) was to use a new item type, with two extra data keys "oldnvr" and "newnvr" to specify the two builds in the comparison. We would also store "newnvr" as "item" as well, for consistency with other result types.
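For illustration, a result built along those lines might look like this - the "oldnvr"/"newnvr" field names follow Dan's proposal, but the payload shape and the type name are only assumptions, not the actual ResultsDB API:

```python
# Sketch of an RPMDiff-style comparison result: a pair of NVRs stored
# under a single item, as described above. Not the actual ResultsDB API.

def rpmdiff_result(newnvr, oldnvr, outcome):
    return {
        "item": newnvr,             # item == newnvr, consistent with other types
        "type": "koji_build_pair",  # hypothetical new item type
        "outcome": outcome,
        # extra data keys carrying the full identity of the comparison:
        "oldnvr": oldnvr,           # the baseline build
        "newnvr": newnvr,           # the build under test
    }

result = rpmdiff_result("htop-2.0-1.fc25", "htop-1.0-2.fc25", "PASSED")
print(result["item"])  # -> htop-2.0-1.fc25
```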
--
Dan Callaghan
Senior Software Engineer, Products & Technologies Operations
Red Hat

___
qa-devel mailing list -- qa-devel@lists.fedoraproject.org
To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org
Re: Libtaskotron - allow non-cli data input
On Mon, Feb 6, 2017 at 6:49 PM, Kamil Paral wrote:

> The formulas already provide a way to 'query' structured data via the
> dot-format, so we could do with as much as passing some variable like
> 'task_data' that would contain the parsed json/yaml.
>
> Or are you proposing we add another variable with these extra values,
> like this?
>
> echo " {'branch': 'master', 'commit': '6e4fc7'} " | runtask --item
> libtaskotron --type pagure_git_commit --data-file - runtask.yaml
>
> or this:
>
> echo " {'name': 'htop'} " | runtask --item htop-2.0-1.fc25 --type
> koji_build --data-file - runtask.yaml
>
> and then use ${item} and ${data.branch}, ${data.commit}, or
> ${data.name} ?

This is what I meant - keeping item as is, but being able to pass another structure to the formula, which can then be used from it. I'd still like to keep the item to a single string, so it can be queried easily in the resultsdb. The item should still represent what was tested. It's just that I want to be able to pass arbitrary data to the formulae, without the need for ugly hacks like we have seen with the git commits lately.

> I guess it depends whether the extra data will be mandatory and exactly
> defined ("this item type provides these input values") or not (what will
> formulas do when they're not there?). Also whether we want to make it
> still possible to execute a task with simple `--item string` in some
> kind of fallback mode, to keep local execution on dev machines still
> easy and simple.

My take on this is, that we will say which variables are provided by the trigger for each type. If a variable is missing, the formula/execution should just crash when it tries to access it.

Not sure about the fallback mode, but my take on this is, that if the user will want to run the task, he will have to just write the "extra data" once to a file, and then it will be just passed in as usual.

We could even make some templates for each item_type (I guess trigger docs are the place for it?), so people can easily just copy-paste it, and make changes.

I also think that providing a sample json file to the existing tasks (that are using it) is a best practice we should strive for.

Makes sense?

Joza
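For the record, the dot-format lookup Josef mentions could resolve ${data.branch}-style references against the parsed json/yaml roughly like this - an illustration of the mechanism only, not libtaskotron's actual template engine:

```python
# Sketch of dot-format variable resolution over a nested dict, as used
# in formulas (${item}, ${data.branch}, ${data.commit}, ...).
import re

def render(template, variables):
    """Substitute ${a.b.c} references with values from a nested dict.

    Raises KeyError for a missing variable, i.e. the formula "crashes
    when it tries to access it", as agreed earlier in the thread.
    """
    def lookup(match):
        value = variables
        for part in match.group(1).split("."):
            value = value[part]  # KeyError on missing data
        return str(value)
    return re.sub(r"\$\{([\w.]+)\}", lookup, template)

ctx = {"item": "libtaskotron",
       "data": {"branch": "master", "commit": "6e4fc7"}}
print(render("git clone -b ${data.branch} ${item}", ctx))
# -> git clone -b master libtaskotron
```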
Re: Libtaskotron - allow non-cli data input
> Chaps,
> we were discussing this many times in the past, and as with the
> type-restriction, I think this is the right time to get this done,
> actually. It sure ties to the fact, that I'm trying to put
> Taskotron-continuously-testing-Taskotron together - the idea here being
> that on each commit to a devel branch on any of the Taskotron
> components, we will spin up a testing instance of the whole stack, and
> run some integration tests.
> To do this, I added a new consumer to Trigger
> ( https://phab.qa.fedoraproject.org/D1110 ) that eats Pagure.io commits,
> and spins jobs based on that.
> This means, that I want to have the repo, branch and commit id as input
> for the job, thus making yet-another-nasty-hack to pass the combined
> data into the job ( https://phab.qa.fedoraproject.org/D1110#C16697NL18 )
> so I can hack it apart later on either in the formula or in the task
> itself.

Ewww :) That's the same approach modularity folks need to use for dist_git_commit. We need to improve this, agreed.

> It would be very helpful to be able to pass some structured data into
> the task instead.
> I kind of remember that we agreed on json/yaml.

Yes, that seems reasonable. I guess I'd prefer using a yaml parser for that, because it can also understand classic json. In infra, I guess json will be used (coming from fedmsgs), but in documentation we can use more readable yaml examples.

> The possibilities were either reading it from stdin or file. I don't
> really care that much either way, but would probably feel a bit better
> about having a cli-param to pass the filename there.

I propose using a cli-param with a file path, with a special case for "-" meaning stdin. That is the common syntax for many unix tools.

> The formulas already provide a way to 'query' structured data via the
> dot-format, so we could do with as much as passing some variable like
> 'task_data' that would contain the parsed json/yaml.

Here I'm a bit unclear. Are you proposing that "item" itself will be structured, e.g. like this?

echo " {'repo': 'libtaskotron', 'branch': 'master', 'commit': '6e4fc7'} " | runtask --item-file - --type pagure_git_commit runtask.yaml

or this:

echo " {'nvr': 'htop-2.0-1.fc25', 'name': 'htop'} " | runtask --item-file - --type koji_build runtask.yaml

In this case we would be able to use ${item.repo}, ${item.branch}, ${item.commit} or ${item.nvr}, ${item.name} directly in the formula. Would we still keep `--item string`, or would it no longer have any use?

Or are you proposing we add another variable with these extra values, like this?

echo " {'branch': 'master', 'commit': '6e4fc7'} " | runtask --item libtaskotron --type pagure_git_commit --data-file - runtask.yaml

or this:

echo " {'name': 'htop'} " | runtask --item htop-2.0-1.fc25 --type koji_build --data-file - runtask.yaml

and then use ${item} and ${data.branch}, ${data.commit}, or ${data.name} ?

I guess it depends whether the extra data will be mandatory and exactly defined ("this item type provides these input values") or not (what will formulas do when they're not there?). Also whether we want to make it still possible to execute a task with simple `--item string` in some kind of fallback mode, to keep local execution on dev machines still easy and simple.
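The "-" convention proposed above could be sketched like this. The thread prefers a yaml parser (which also accepts plain json), but this illustration uses the stdlib json module to stay dependency-free; load_input_data is a hypothetical name, not an existing libtaskotron function:

```python
# Sketch of the proposed --data-file handling: a path argument where
# "-" means "read from stdin", the common unix convention.
import io
import json
import sys

def load_input_data(path, stdin=None):
    """Parse structured input data from a file, or from stdin if path is '-'."""
    stdin = stdin if stdin is not None else sys.stdin
    if path == "-":
        return json.load(stdin)
    with open(path) as f:
        return json.load(f)

# Simulate `echo '{"branch": "master", "commit": "6e4fc7"}' | runtask --data-file -`
data = load_input_data("-", stdin=io.StringIO('{"branch": "master", "commit": "6e4fc7"}'))
print(data["branch"])  # -> master
```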