Re: Libtaskotron - allow non-cli data input

2017-02-14 Thread Kamil Paral
> On Wed, Feb 8, 2017 at 2:26 PM, Kamil Paral <kpa...@redhat.com> wrote:

> > > This is what I meant - keeping item as is, but being able to pass another
> > > structure to the formula, which can then be used from it. I'd still like
> > > to keep the item to a single string, so it can be queried easily in the
> > > resultsdb. The item should still represent what was tested. It's just
> > > that I want to be able to pass arbitrary data to the formulae, without
> > > the need for ugly hacks like we have seen with the git commits lately.

> > So, the question is now how much we want the `item` to uniquely identify
> > the item under test. Currently we mostly do (rpmlint, rpmgrill) and
> > sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA,
> > and we store arch in the results extradata section).

> I still kind of believe that the `item` should be chosen with great respect
> to what actually is the item under test, but it also really depends on what
> you want to do with it later on. Note that the `item` is actually a
> convention (yay, more water to adamw's "if we only had some awesome new
> project" mill), and is not enforced in any way. I believe that there should
> be firm rules (once again - conventions) on what the item is for each "well
> known" item type, so you can kind-of assume that if you query for
> `item=foo&type=koji_build` you are getting the results related to that
> build.
> As we were discussing privately with the item types (I'm not going to go into
> much detail here, but for the rest of you guys - I'm contemplating making
> the types more general, and using more of the 'metadata' to store additional
> specifics - like replacing `type=koji_build` with `type=build, source=koji`,
> or `type=build, source=brew` - on the high level, you know that a
> package/build was tested, and you don't really care where it came from, but
> you sometimes might care, and so there is the additional metadata stored. We
> could even have more types stored for one result, or I don't know... It's
> complicated), the idea behind item is that it should be a reasonable value,
> that carries the "what was tested" information, and you will use the other
> "extra-data" fields to provide more details (like we kind-of want to do with
> arch, but we don't really...). The reason for it to be "reasonable value"
> not "superset of all values that we have" is to make the general querying a
> bit more straightforward.

> > If we have structured input data, what happens to `item` for
> > check_modulemd? Currently it is "namespace/module#commithash". Will it
> > stay the same, and they'll just avoid parsing it because we'll also
> > provide ${data.namespace}, ${data.module} and ${data.hash}? Or will the
> > `item` be perhaps just "module" (and the rest will be stored as
> > extradata)? What happens when we have a generic git_commit type, and the
> > source can be an arbitrary service? Will we have some convention to use
> > item as "giturl#commithash"?

> Once again - whatever makes sense as the item. For me that would be the
> Repo/SHA combo, with server, repo, branch, and commit in extradata.
> And it comes to "storing as much relevant metadata as possible" once again.
> The thing is, that as long as stuff is predictable, it almost does not
> matter what it is, and it once again points out how good of an idea is the
> conventions stuff. I believe that we are now storing much less metadata in
> resultsdb than we should, and it is caused mostly by the fact that
> - we did not really need to use the results much so far
> - it is pretty hard to pass data into libtaskotron, and querying all the
> services all the time, to get the metadata, is/was deemed a bad idea - why
> do it ourselves, if the consumer can get it himself. They know that it is
> koji_build, so they can query koji.

> There is a fine balance to be struck, IMO, so we don't end up storing "all
> the data" in resultsdb. But I believe that the stuff relevant for the result
> consumption should be there.

> > Because the ideal way would be to store the whole item+data structure as
> > item in resultsdb. But that's hard to query for humans, so we want a
> > simple string as an identifier.

> This, for me, is once again about being predictable. As I said above, I still
> think that `item` should be a reasonable identifier, but not necessarily a
> superset of all the info. That is what the extra data is for. Talking
> about...

> > But sometimes there can be a lot of data points which uniquely identify
> > the thing under test only when you specify it all (for example what Dan
> > wrote, sometimes the ID is the old NVR *plus* the new NVR). Will we want
> > to somehow combine them into a single item value? We should give some
> > directions on how people should construct their items.

> My gut feeling here would be storing the "new NVR" (the thing that actually
> caused the test to be executed) as item, and adding 'old nvr' to extra
> data.

Re: Libtaskotron - allow non-cli data input

2017-02-09 Thread Adam Williamson
On Thu, 2017-02-09 at 00:29 +0100, Josef Skladanka wrote:
On Wed, Feb 8, 2017 at 8:06 PM, Adam Williamson wrote:
> 
> > Wouldn't it be great if we had a brand new project which would be the
> > ideal place to represent such conventions, so the bit of taskotron
> > which reported the results could construct them conveniently? :P
> 
> 
https://xkcd.com/684/ :) (I mean no offense, it just really reminded me of that)

Hmm, clearly we need a *** CONVENTION *** for quoting xkcd ;)
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
___
qa-devel mailing list -- qa-devel@lists.fedoraproject.org
To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org


Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Josef Skladanka
On Wed, Feb 8, 2017 at 8:06 PM, Adam Williamson wrote:

> Wouldn't it be great if we had a brand new project which would be the
> ideal place to represent such conventions, so the bit of taskotron
> which reported the results could construct them conveniently? :P


https://xkcd.com/684/ :) (I mean no offense, it just really reminded me of that)


Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Josef Skladanka
On Wed, Feb 8, 2017 at 7:39 PM, Kamil Paral  wrote:

> > I mentioned this in IRC but why not have a bit of both and allow input
> > as either a file or on the CLI. I don't think that json would be too
> > bad to type on the command line as an option for when you're running
> > something manually:
> >
> >   runtask sometask.yml -e '{"namespace": "someuser",
> > "module": "somemodule", "commithash": "abc123df980"}'
>
> I probably misunderstood you on IRC. In my older response here, I actually
> suggested something like this - having "--datafile data.json", which can
> also be used like "--datafile -" meaning stdin. You can then use "echo
> <data> | runtask --datafile - <task>". But your solution is probably
> easier to look at.
>

I honestly like the `--datafile [fname, -]` approach a lot. We could sure
name the param better, but that's about it. I like it better than
necessarily having a long cmdline, and you can still use "echo <data>" if
you wanted to have a cmdline example, or "cat <file>" for the common usage.



> > There would be some risk of running into the same problems we had with
> > AutoQA where depcheck commands were too long for bash to parse but
> > that's when I'd say "you need to use a file for that"
>
> Definitely.
>

And that's why I'd rather stay away from long cmdlines :)
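Sketching the idea: a `--datafile` option that accepts either a filename or `-` for stdin could look roughly like this (an illustrative argparse sketch under the option name proposed in this thread, not the actual runtask CLI):

```python
import argparse
import json
import sys


def load_input_data(datafile):
    """Load structured input data from a JSON file, or from stdin for '-'."""
    if datafile is None:
        return {}
    if datafile == "-":
        return json.load(sys.stdin)
    with open(datafile) as infile:
        return json.load(infile)


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="runtask")
    parser.add_argument("formula", help="task formula (YAML file) to execute")
    parser.add_argument("--datafile", metavar="FILE",
                        help="JSON file with input data, or '-' to read stdin")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(sys.argv[1:])
    print(load_input_data(args.datafile))
```

Both styles discussed above then work: `runtask task.yml --datafile data.json`, or `echo '{"module": "httpd"}' | runtask task.yml --datafile -`.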


>
> > > I'm a bit torn between providing as much useful data as we can when
> > > scheduling (because a) yaml formulas are very limited and you can't
> > > do stuff like string parsing/splitting b) might save you a lot of
> > > work/code to have this data presented to you right from the start),
> > > and the easy manual execution (when you need to gather and provide
> > > all that data manually). It's probably about finding the right
> > > balance. We can't avoid having structured multi-data input, I don't
> > > think.
> >
> > If we did something along the lines of allowing input on the CLI, we
> > could have both, no? We'd need to be clear on the precedence of file vs
> > CLI input but that seems to me like something that could solve the
> > issue of dealing with more complicated inputs without requiring users
> > to futz with a file when running tasks locally.
>
> That's not the worry I had. Creating a file or writing json to a command
> line is a bit more work than the current state, but not a problem. What I'm
> a bit afraid of is that we'll start adding many keyvals into the json just
> because it is useful or convenient. As an artificial example, let's say for
> a koji_build FOO we supply NVR, name, epoch, owner, build_id and
> build_timestamp. And if we receive all of that in the fedmsg (or from some
> koji query that we'll need to do anyway for some reason), it makes sense to
> pass that data, it's free for us and it's less work for the task (it
> doesn't have to do its own queries). However, running the task manually as
> a task developer (and I don't mean re-running an existing task on FOO by
> copy-pasting the existing data json from a log file, but running it on a
> fresh new koji build BAR) makes it much more difficult for the developer,
> because he needs to figure out (manually) all those values for BAR just to
> be able to run his task.
>

> Even more extreme (deliberately, to illustrate the point) example would be
> to pass the whole koji buildinfo dict structure that you get when running
> koji.getBuild(). Which could be actually easier for the developer to
> emulate, because we could document a single command that retrieves exactly
> that. Unless we start adding additional data to it...
>
> So on one hand, I'd like to pass as much data as we have to make task
> formulas simpler, but on the other hand, I'm afraid task development
> (manual task execution, without having a trigger to get all this data by
> magic) will get harder. (I hope I managed to explain it better this time:))


As I mentioned in one of the other emails - the dev (while developing)
should really only need to provide the data that is relevant for the
task/formula. Why have a ton of stuff that you never use in the "testing
data" - it is unnecessary work, and even makes it more prone to error IMO.
If I had a task that only needs NVR, name and build_timestamp, I'd (while
developing/testing) just pass a structure containing these.

Or do you think that is a bad idea? I sure can see how (e.g.) the resultsdb
directive could be spitting warnings out about missing data, but that is
why we have the different profiles - the resultsdb could fail in production
mode, if data was missing (and that probably means some serious error) or
just warn you in development mode.
If you wanted to "test it thoroughly" you'd better use some real data
anyway - and if we store the "input data structure" in logs for the tasks,
then there is even a good source of those, should you want to copy-paste it.
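The development/production distinction described here could be sketched like this (the helper name and profile strings are made up for illustration; this is not libtaskotron code):

```python
import warnings


def check_required_data(data, required, profile="development"):
    """Verify that the required keys are present in the input data.

    In the production profile, missing keys are a hard error; in the
    development profile they only produce a warning, so a task author
    can pass a minimal dict containing just the keys the task uses.
    """
    missing = [key for key in required if key not in data]
    if not missing:
        return True
    message = "missing input data: %s" % ", ".join(missing)
    if profile == "production":
        raise ValueError(message)
    warnings.warn(message)
    return False


# A task that only needs NVR and name can be tested with just those keys;
# the absent build_id only warns in the development profile:
dev_data = {"nvr": "httpd-2.4.25-1.fc26", "name": "httpd"}
check_required_data(dev_data, ["nvr", "name", "build_id"])
```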

I hope I understood what you meant.

joza

Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Josef Skladanka
On Wed, Feb 8, 2017 at 4:11 PM, Tim Flink  wrote:

> On Wed, 8 Feb 2017 08:26:30 -0500 (EST)
> Kamil Paral  wrote:
>
> I think another question is whether we want to keep assuming that the
> *user supplies the item* that is used as a UID in resultsdb. As you say,
> it seems a bit odd to require people to munge stuff together like
> "namespace/module#commithash" at the same time that it can be separated
> out into a dict-like data structure for easy access.
>
>
Emphasis mine. I think that we should not really be assuming that at all.
In most cases, the item should be provided by the trigger automagically,
the same with the type. With what I'd like to see for the structured input,
the conventions module could/should take that data into account while
constructing the "default" results.
Keep in mind that one result can also have multiple "items" (as it can
have multiples of any extra data field), if it makes sense. One, the
"auto-provided", and the second could be user-added. That would make it both
consistent (the trigger-generated item) and flexible, if a different "item"
makes sense.

> Would it make more sense to just pass in the dict and have semi-coded
> conventions for reporting to resultsdb based on the item_type which
> could be set during the task instead of requiring that to be known
> before task execution time?
>
> Something along the lines of enabling some common kinds of input for
> the resultsdb directive - module commit, dist-git rpm change, etc. so
> that you could specify the item_type to the resultsdb directive and it
> would know to look for certain bits to construct the UID item that's
> reported to resultsdb.
>

Yup, I think that setting some conventions, and making sure we keep the
same (or at least very similar) set of metadata for the relevant type, is
key.
I mentioned this in the previous email, but in the past few days I have
been thinking about making the types a bit more general - the pretty
specific types we have now made sense when we first designed stuff, and had
a very narrow usecase.
Now that we want to make the stack usable in stuff like Platform CI, I
think it would make sense to abstract a bit more, so we don't have
`koji_build`, `brew_build`, `copr_build` which are essentially the same, but
differ in minor details. We can specify those classes/details in extradata,
or could even use multiple types - having the common set of information
guaranteed for all the 'build' types, and adding other kinds of data to
`koji_build`, `brew_build` or `whatever_build` as needed.
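To make the `type=build, source=koji` idea concrete, here is a purely illustrative sketch of how the current specific types could collapse into a general type plus a source field in the extra data:

```python
# Hypothetical mapping from today's specific item types to the more
# general "type + source" scheme sketched above.
SPECIFIC_TYPES = {
    "koji_build": ("build", "koji"),
    "brew_build": ("build", "brew"),
    "copr_build": ("build", "copr"),
}


def generalize_type(item_type):
    """Split a specific type like 'koji_build' into (general type, source)."""
    if item_type in SPECIFIC_TYPES:
        return SPECIFIC_TYPES[item_type]
    return (item_type, None)


def result_extradata(item, item_type):
    """Build the extra-data dict for a result under the general scheme.

    Consumers that only care that "a build was tested" can filter on
    type=build; those that care where it came from can also filter on
    source=koji/brew/copr.
    """
    general, source = generalize_type(item_type)
    data = {"item": item, "type": general}
    if source is not None:
        data["source"] = source
    return data
```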


> Using Kamil's example, assume that we have a task for a module and the
> following data is passed in:
>
>   {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}
>
> Neither item nor type is specified on the CLI at execution time. The
> task executes using that input data and when it comes time to report to
> resultsdb:
>
>   - name: report results to resultsdb
>     resultsdb:
>       results: ${some_task_output}
>       type: module
>
> By passing in that type of module, the directive would look through the
> input data and construct the "item" from input.namespace, input.module
> and input.commithash.
>
> I'm not sure if it makes more sense to have a set of "types" that the
> resultsdb directive understands natively or to actually require item
> but allow variable names in it along the lines of
>
>   "item":"${namespace}/${module}#${commithash}"
>

I'd rather have that in "conventions" than the resultsdb directive, but I
guess it is essentially the same thing, once you think about it.
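Tim's `"${namespace}/${module}#${commithash}"` idea maps directly onto Python's `string.Template` substitution; a sketch of what such a conventions helper might do (the template table is hypothetical):

```python
from string import Template

# Hypothetical per-type conventions for constructing the resultsdb `item`
# from structured input data.
ITEM_TEMPLATES = {
    "module": Template("${namespace}/${module}#${commithash}"),
    "koji_build": Template("${nvr}"),
}


def construct_item(item_type, data):
    """Construct the resultsdb item string from structured input data.

    Template.substitute() raises KeyError if the input data lacks a field
    the template needs - matching the "crash on missing data" behavior
    agreed on in this thread.
    """
    return ITEM_TEMPLATES[item_type].substitute(data)


data = {"namespace": "someuser", "module": "httpd",
        "commithash": "abc123df980"}
print(construct_item("module", data))  # someuser/httpd#abc123df980
```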


>
> > > My take on this is, that we will say which variables are provided
> > > by the trigger for each type. If a variable is missing, the
> > > formula/execution should just crash when it tries to access it.
> >
> > Sounds reasonable.
>
> +1 from me as well. Assume everything is there, crash if there's
> something requested that isn't available (missing data etc.)
>
>
yup, that's what I have in mind.


> > We'll probably end up having a mix of necessary and convenience
> > values in the inputdata. "name" is probably a convenience value here,
> > so that tasks don't have to parse it if they need to use it in a certain
> > directive. "epoch" might be an important value for some test cases,
> > and let's say we learn the value in trigger during scheduling
> > investigation, so we decide to pass it down. But that information is
> > not that easy to get manually. If you know what to do, you'll open up
> > a particular koji page and see it. But you can also be clueless about
> > how to figure it out. The same goes for build_id, again can be
> > important, but also can be retrieved later, so more of a convenience
> > data (saving you from writing a koji query). This is just an example
> > for illustration, might not match real-world use cases.
>
> I mentioned this in IRC but why not have a bit of both and allow input
> as either a file or on the CLI. I don't think that json would be too
> bad to type on the command line as an option for when you're running
> something manually.

Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Josef Skladanka
On Wed, Feb 8, 2017 at 2:26 PM, Kamil Paral  wrote:

> This is what I meant - keeping item as is, but being able to pass another
> structure to the formula, which can then be used from it. I'd still like to
> keep the item to a single string, so it can be queried easily in the
> resultsdb. The item should still represent what was tested. It's just that
> I want to be able to pass arbitrary data to the formulae, without the need
> for ugly hacks like we have seen with the git commits lately.
>
>
> So, the question is now how much we want the `item` to uniquely identify
> the item under test. Currently we mostly do (rpmlint, rpmgrill) and
> sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA,
> and we store arch in the results extradata section).
>
>
I still kind of believe that the `item` should be chosen with great respect
to what actually is the item under test, but it also really depends on what
you want to do with it later on. Note that the `item` is actually a
convention (yay, more water to adamw's "if we only had some awesome new
project" mill), and is not enforced in any way. I believe that there should
be firm rules (once again - conventions) on what the item is for each "well
known" item type, so you can kind-of assume that if you query for
`item=foo&type=koji_build` you are getting the results related to that
build.
As we were discussing privately with the item types (I'm not going to go
into much detail here, but for the rest of you guys - I'm contemplating
making the types more general, and using more of the 'metadata' to store
additional specifics - like replacing `type=koji_build` with `type=build,
source=koji`, or `type=build, source=brew` - on the high level, you know
that a package/build was tested, and you don't really care where it came
from, but you sometimes might care, and so there is the additional metadata
stored. We could even have more types stored for one result, or I don't
know... It's complicated), the idea behind item is that it should be a
reasonable value, that carries the "what was tested" information, and you
will use the other "extra-data" fields to provide more details (like we
kind-of want to do with arch, but we don't really...). The reason for it to
be "reasonable value" and not "superset of all values that we have" is to
make the general querying a bit more straightforward.
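A query such as `item=foo&type=koji_build` is just an HTTP GET against resultsdb; a stdlib sketch (the instance URL and the exact v2.0 parameter handling are assumptions here - check the deployed resultsdb API):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed resultsdb API root (Fedora's Taskotron deployment); the v2.0
# path follows resultsdb's REST API, but treat it as an assumption.
RESULTSDB_URL = "https://taskotron.fedoraproject.org/resultsdb_api/api/v2.0"


def results_query_url(item, item_type, limit=20):
    """Build the query URL for 'all results for this item and type'."""
    params = urlencode({"item": item, "type": item_type, "limit": limit})
    return "%s/results?%s" % (RESULTSDB_URL, params)


def fetch_results(item, item_type):
    """Fetch matching results (requires network access to the instance)."""
    with urlopen(results_query_url(item, item_type)) as resp:
        return json.load(resp)["data"]


print(results_query_url("httpd-2.4.25-1.fc26", "koji_build"))
```

This is exactly why keeping `item` a single "reasonable value" matters: the common query stays one flat key-value filter, with the extra data available for narrower queries.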


> If we have structured input data, what happens to `item` for
> check_modulemd? Currently it is "namespace/module#commithash". Will it stay
> the same, and they'll just avoid parsing it because we'll also provide
> ${data.namespace}, ${data.module} and ${data.hash}? Or will the `item` be
> perhaps just "module" (and the rest will be stored as extradata)? What
> happens when we have a generic git_commit type, and the source can be an
> arbitrary service? Will we have some convention to use item as
> "giturl#commithash"?
>
>
Once again - whatever makes sense as the item. For me that would be the
Repo/SHA combo, with server, repo, branch, and commit in extradata.
And it comes to "storing as much relevant metadata as possible" once again.
The thing is, that as long as stuff is predictable, it almost does not
matter what it is, and it once again points out how good of an idea is the
conventions stuff. I believe that we are now storing much less metadata in
resultsdb than we should, and it is caused mostly by the fact that
 - we did not really need to use the results much so far
 - it is pretty hard to pass data into libtaskotron, and querying all the
services all the time, to get the metadata, is/was deemed a bad idea - why
do it ourselves, if the consumer can get it himself. They know that it is
koji_build, so they can query koji.

There is a fine balance to be struck, IMO, so we don't end up storing "all
the data" in resultsdb. But I believe that the stuff relevant for the
result consumption should be there.


> Because the ideal way would be to store the whole item+data structure as
> item in resultsdb. But that's hard to query for humans, so we want a simple
> string as an identifier.
>

This, for me, is once again about being predictable. As I said above, I
still think that `item` should be a reasonable identifier, but not
necessarily a superset of all the info. That is what the extra data is for.
Talking about...


> But sometimes there can be a lot of data points which uniquely identify
> the thing under test only when you specify it all (for example what Dan
> wrote, sometimes the ID is the old NVR *plus* the new NVR). Will we want to
> somehow combine them into a single item value? We should give some
> directions on how people should construct their items.
>
>
My gut feeling here would be storing the "new NVR" (the thing that actually
caused the test to be executed) as item, and adding 'old nvr' to extra
data. But I'm not that familiar with the specific usecase. To me, this
would make sense, because when you query for "this NVR related results"
you'd get the results too. If you wan

Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Adam Williamson
On Wed, 2017-02-08 at 08:11 -0700, Tim Flink wrote:
> Would it make more sense to just pass in the dict and have semi-coded
> conventions for reporting to resultsdb based on the item_type which
> could be set during the task instead of requiring that to be known
> before task execution time?

Wouldn't it be great if we had a brand new project which would be the
ideal place to represent such conventions, so the bit of taskotron
which reported the results could construct them conveniently? :P
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net


Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Kamil Paral
> I think another question is whether we want to keep assuming that the
> user supplies the item that is used as a UID in resultsdb. As you say,
> it seems a bit odd to require people to munge stuff together like
> "namespace/module#commithash" at the same time that it can be separated
> out into a dict-like data structure for easy access.
> 
> Would it make more sense to just pass in the dict and have semi-coded
> conventions for reporting to resultsdb based on the item_type which
> could be set during the task instead of requiring that to be known
> before task execution time?
> 
> Something along the lines of enabling some common kinds of input for
> the resultsdb directive - module commit, dist-git rpm change, etc. so
> that you could specify the item_type to the resultsdb directive and it
> would know to look for certain bits to construct the UID item that's
> reported to resultsdb.
> 
> Using Kamil's example, assume that we have a task for a module and the
> following data is passed in:
> 
>   {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}
> 
> Neither item nor type is specified on the CLI at execution time. The
> task executes using that input data and when it comes time to report to
> resultsdb:
> 
>   - name: report results to resultsdb
>     resultsdb:
>       results: ${some_task_output}
>       type: module
> 
> By passing in that type of module, the directive would look through the
> input data and construct the "item" from input.namespace, input.module
> and input.commithash.

I'll have to think about this, maybe sketch up some examples.

> I mentioned this in IRC but why not have a bit of both and allow input
> as either a file or on the CLI. I don't think that json would be too
> bad to type on the command line as an option for when you're running
> something manually:
> 
>   runtask sometask.yml -e '{"namespace": "someuser",
> "module": "somemodule", "commithash": "abc123df980"}'

I probably misunderstood you on IRC. In my older response here, I actually
suggested something like this - having "--datafile data.json", which can also
be used like "--datafile -" meaning stdin. You can then use "echo <data> |
runtask --datafile - <task>". But your solution is probably easier to look
at.

> 
> There would be some risk of running into the same problems we had with
> AutoQA where depcheck commands were too long for bash to parse but
> that's when I'd say "you need to use a file for that" 

Definitely.

> > I'm a bit torn between providing as much useful data as we can when
> > scheduling (because a) yaml formulas are very limited and you can't
> > do stuff like string parsing/splitting b) might save you a lot of
> > work/code to have this data presented to you right from the start),
> > and the easy manual execution (when you need to gather and provide
> > all that data manually). It's probably about finding the right
> > balance. We can't avoid having structured multi-data input, I don't
> > think.
> 
> If we did something along the lines of allowing input on the CLI, we
> could have both, no? We'd need to be clear on the precedence of file vs
> CLI input but that seems to me like something that could solve the
> issue of dealing with more complicated inputs without requiring users
> to futz with a file when running tasks locally.

That's not the worry I had. Creating a file or writing json to a command line 
is a bit more work than the current state, but not a problem. What I'm a bit 
afraid of is that we'll start adding many keyvals into the json just because it 
is useful or convenient. As an artificial example, let's say for a koji_build 
FOO we supply NVR, name, epoch, owner, build_id and build_timestamp. And if we 
receive all of that in the fedmsg (or from some koji query that we'll need to 
do anyway for some reason), it makes sense to pass that data, it's free for us 
and it's less work for the task (it doesn't have to do its own queries). 
However, running the task manually as a task developer (and I don't mean 
re-running an existing task on FOO by copy-pasting the existing data json from 
a log file, but running it on a fresh new koji build BAR) makes it much more 
difficult for the developer, because he needs to figure out (manually) all 
those values for BAR just to be able to run his task.

Even more extreme (deliberately, to illustrate the point) example would be to 
pass the whole koji buildinfo dict structure that you get when running 
koji.getBuild(). Which could be actually easier for the developer to emulate, 
because we could document a single command that retrieves exactly that. Unless 
we start adding additional data to it...
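For reference, the buildinfo dict mentioned here comes from the koji Python API's `getBuild()`; the reduction helper below is hypothetical, showing how a developer could trim the full dict down to only the keys a task needs:

```python
def get_buildinfo(nvr, hub="https://koji.fedoraproject.org/kojihub"):
    """Fetch the full koji buildinfo dict for an NVR.

    Uses the real koji Python API (needs the koji package and network
    access); imported lazily so the rest of this sketch loads without it.
    """
    import koji
    session = koji.ClientSession(hub)
    return session.getBuild(nvr)


def dev_input_data(buildinfo, needed=("nvr", "name", "epoch")):
    """Reduce a full buildinfo dict to just the keys a task needs."""
    return {key: buildinfo[key] for key in needed if key in buildinfo}


# Example with a hand-written stand-in for a real buildinfo dict:
full = {"nvr": "httpd-2.4.25-1.fc26", "name": "httpd", "epoch": None,
        "build_id": 123456, "owner_name": "someone"}
print(dev_input_data(full))
```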

So on one hand, I'd like to pass as much data as we have to make task formulas 
simpler, but on the other hand, I'm afraid task development (manual task 
execution, without having a trigger to get all this data by magic) will get 
harder. (I hope I managed to explain it better this time:))

Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Tim Flink
On Wed, 8 Feb 2017 08:26:30 -0500 (EST)
Kamil Paral  wrote:

> > This is what I meant - keeping item as is, but being able to pass
> > another structure to the formula, which can then be used from it.
> > I'd still like to keep the item to a single string, so it can be
> > queried easily in the resultsdb. The item should still represent
> > what was tested. It's just that I want to be able to pass arbitrary
> > data to the formulae, without the need for ugly hacks like we have
> > seen with the git commits lately.  
> 
> So, the question is now how much we want the `item` to uniquely
> identify the item under test. Currently we mostly do (rpmlint,
> rpmgrill) and sometimes don't (depcheck, because item is NVR, but the
> full ID is NEVRA, and we store arch in the results extradata
> section). 
> 
> If we have structured input data, what happens to `item` for
> check_modulemd? Currently it is "namespace/module#commithash". Will
> it stay the same, and they'll just avoid parsing it because we'll
> also provide ${data.namespace}, ${data.module} and ${data.hash}? Or
> will the `item` be perhaps just "module" (and the rest will be stored
> as extradata)? What happens when we have a generic git_commit type,
> and the source can be an arbitrary service? Will we have some convention
> to use item as "giturl#commithash"? 

I think another question is whether we want to keep assuming that the
user supplies the item that is used as a UID in resultsdb. As you say,
it seems a bit odd to require people to munge stuff together like
"namespace/module#commithash" at the same time that it can be separated
out into a dict-like data structure for easy access.

Would it make more sense to just pass in the dict and have semi-coded
conventions for reporting to resultsdb based on the item_type which
could be set during the task instead of requiring that to be known
before task execution time?

Something along the lines of enabling some common kinds of input for
the resultsdb directive - module commit, dist-git rpm change, etc. so
that you could specify the item_type to the resultsdb directive and it
would know to look for certain bits to construct the UID item that's
reported to resultsdb.

Using Kamil's example, assume that we have a task for a module and the
following data is passed in:

  {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}

Neither item nor type is specified on the CLI at execution time. The
task executes using that input data and when it comes time to report to
resultsdb:

  - name: report results to resultsdb
    resultsdb:
      results: ${some_task_output}
      type: module

By passing in that type of module, the directive would look through the
input data and construct the "item" from input.namespace, input.module
and input.commithash.

I'm not sure if it makes more sense to have a set of "types" that the
resultsdb directive understands natively or to actually require item
but allow variable names in it along the lines of

  "item":"${namespace}/${module}#${commithash}"

> Because the ideal way would be to store the whole item+data structure
> as item in resultsdb. But that's hard to query for humans, so we want
> a simple string as an identifier. But sometimes there can be a lot of
> data points which uniquely identify the thing under test only when
> you specify it all (for example what Dan wrote, sometimes the ID is
> the old NVR *plus* the new NVR). Will we want to somehow combine them
> into a single item value? We should give some directions on how people
> should construct their items. 
> 
> > > I guess it depends whether the extra data will be mandatory and
> > > exactly defined ("this item type provides these input values") or
> > > not (what will formulas do when they're not there?). Also whether
> > > we want to make it still possible to execute a task with simple
> > > `--item string` in some kind of fallback mode, to keep local
> > > execution on dev machines still easy and simple.  
> >   
> 
> > My take on this is, that we will say which variables are provided
> > by the trigger for each type. If a variable is missing, the
> > formula/execution should just crash when it tries to access it.  
> 
> Sounds reasonable. 

+1 from me as well. Assume everything is there, crash if there's
something requested that isn't available (missing data etc.)

> > Not sure about the fallback mode, but my take on this is, that if
> > the user will want to run the task, he will have to just write the
> > "extra data" once to a file, and then it will be just passed in as
> > usual.  
> 
> Once, for each item :) If a task developer wants to execute his task
> on a NVR, he'll need to prepare the structured input data for each
> NVR he wants to test. That might not be difficult, but it's more work
> than it is currently, so we should know whether we're fine with that.
> I guess we are, but there can be some gotchas, e.g. sometimes it
> might not be obvious how to get some extra data. For example: 

Re: Libtaskotron - allow non-cli data input

2017-02-08 Thread Kamil Paral
> This is what I meant - keeping item as is, but being able to pass another
> structure to the formula, which can then be used from it. I'd still like to
> keep the item to a single string, so it can be queried easily in the
> resultsdb. The item should still represent what was tested. It's just that I
> want to be able to pass arbitrary data to the formulae, without the need for
> ugly hacks like we have seen with the git commits lately.

So, the question is now how much we want the `item` to uniquely identify the 
item under test. Currently we mostly do (rpmlint, rpmgrill) and sometimes don't 
(depcheck, because item is NVR, but the full ID is NEVRA, and we store arch in 
the results extradata section). 

If we have structured input data, what happens to `item` for check_modulemd? 
Currently it is "namespace/module#commithash". Will it stay the same, and 
they'll just avoid parsing it because we'll also provide ${data.namespace}, 
${data.module} and ${data.hash}? Or will the `item` be perhaps just "module" 
(and the rest will be stored as extradata)? What happens when we have a generic 
git_commit type, and the source can be an arbitrary service? Will we have some 
convention to use item as "giturl#commithash"? 

Because the ideal way would be to store the whole item+data structure as item 
in resultsdb. But that's hard to query for humans, so we want a simple string 
as an identifier. But sometimes there can be a lot of data points which 
uniquely identify the thing under test only when you specify it all (for 
example what Dan wrote, sometimes the ID is the old NVR *plus* the new NVR). 
Will we want to somehow combine them into a single item value? We should give 
some direction on how people should construct their items. 

> > I guess it depends whether the extra data will be mandatory and exactly
> > defined ("this item type provides these input values") or not (what will
> > formulas do when they're not there?). Also whether we want to make it still
> > possible to execute a task with simple `--item string` in some kind of
> > fallback mode, to keep local execution on dev machines still easy and
> > simple.
> 

> My take on this is, that we will say which variables are provided by the
> trigger for each type. If a variable is missing, the formula/execution
> should just crash when it tries to access it.

Sounds reasonable. 

> Not sure about the fallback mode, but my take on this is, that if the user
> will want to run the task, he will have to just write the "extra data" once
> to a file, and then it will be just passed in as usual.

Once, for each item :) If a task developer wants to execute his task on a NVR, 
he'll need to prepare the structured input data for each NVR he wants to test. 
That might not be difficult, but it's more work than it is currently, so we 
should know whether we're fine with that. I guess we are, but there can be some 
gotchas, e.g. sometimes it might not be obvious how to get some extra data. For 
example: 

nvr = htop-1.0-2.fc25 
inputdata = {name: htop, epoch: 0, build_id: 482735} 

We'll probably end up having a mix of necessary and convenience values in the 
inputdata. "name" is probably a convenience value here, so that tasks don't 
have to parse the NVR if they need to use the name in a certain directive. "epoch" might be 
an important value for some test cases, and let's say we learn the value in 
trigger during scheduling investigation, so we decide to pass it down. But that 
information is not that easy to get manually. If you know what to do, you'll 
open up a particular koji page and see it. But you can also be clueless about 
how to figure it out. The same goes for build_id, again can be important, but 
also can be retrieved later, so more of a convenience data (saving you from 
writing a koji query). This is just an example for illustration, might not 
match real-world use cases. 
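
For the Koji case specifically, the "hard to get manually" values above can be derived from one `getBuild` call. A sketch under that assumption; the exact key set a trigger would pass down is hypothetical:

```python
def inputdata_from_build(build):
    """Derive task input data from a Koji build-info dict, shaped like
    the result of koji's getBuild() XML-RPC call."""
    return {
        "name": build["name"],
        # koji reports a missing epoch as None; normalize to 0
        "epoch": build["epoch"] if build["epoch"] is not None else 0,
        "build_id": build["id"],
    }

# Against a real hub this would be roughly (requires the koji package):
#   import koji
#   session = koji.ClientSession("https://koji.fedoraproject.org/kojihub")
#   build = session.getBuild("htop-1.0-2.fc25")
build = {"name": "htop", "epoch": None, "id": 482735}
print(inputdata_from_build(build))
# {'name': 'htop', 'epoch': 0, 'build_id': 482735}
```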

> We could even make some templates for each item_type (I guess trigger docs
> are the place for it?), so people can easily just copy-paste it, and make
> changes.
> I also think that providing a sample json file to the existing tasks (that
> are using it) is a best practice we should strive for.

That is a very good idea, because it could help with the problem described 
above. It could also document how to retrieve values manually, if needed. Also, 
we have to make sure that a user can copy&paste a command to run the same 
production task in his local environment. We should either have it in the log 
file, or in execdb (or both), but it needs to include full inputdata, so that 
you can still easily run the same thing locally for debugging. 
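
Logging a reproducible command could be as simple as serializing the inputdata back to JSON and quoting it for the shell. A sketch; the `--data-file -` flag follows the proposal in this thread, not a released runtask interface:

```python
import json
import shlex

def reproduce_command(item, item_type, inputdata, formula="runtask.yaml"):
    """Build the copy&paste-able local command a production run could log."""
    payload = shlex.quote(json.dumps(inputdata))
    return "echo %s | runtask --item %s --type %s --data-file - %s" % (
        payload, shlex.quote(item), shlex.quote(item_type), formula)

print(reproduce_command("htop-2.0-1.fc25", "koji_build", {"name": "htop"}))
```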

> Makes sense?

I'm a bit torn between providing as much useful data as we can when scheduling 
(because a) yaml formulas are very limited and you can't do stuff like string 
parsing/splitting b) might save you a lot of work/code to have this data 
presented to you right from the start), and the easy manual execution (when you 
need to gather and provide all this data yourself).

Re: Libtaskotron - allow non-cli data input

2017-02-07 Thread Dan Callaghan
Excerpts from Josef Skladanka's message of 2017-02-07 11:23 +01:00:
> On Mon, Feb 6, 2017 at 6:49 PM, Kamil Paral  wrote:
> 
> > The formulas already provide a way to 'query' structured data via the
> > dot-format, so we could do with as much as passing some variable like
> > 'task_data' that would contain the parsed json/yaml.
> >
> >
> > Or are you proposing we add another variable with these extra values, like
> > this?
> >
> > echo " {'branch': 'master', 'commit': '6e4fc7'} " | runtask --item
> > libtaskotron --type pagure_git_commit --data-file - runtask.yaml
> >
> > or this:
> >
> > echo " {'name': 'htop'} " | runtask --item htop-2.0-1.fc25 --type
> > koji_build --data-file - runtask.yaml
> >
> > and then use ${item} and ${data.branch}, ${data.commit}, or ${data.name} ?
> >
> >
> >
> This is what I meant - keeping item as is, but being able to pass another
> structure to the formula, which can then be used from it. I'd still like to
> keep the item to a single string, so it can be queried easily in the
> resultsdb. The item should still represent what was tested. It's just that
> I want to be able to pass arbitrary data to the formulae, without the need
> for ugly hacks like we have seen with the git commits lately.

We ran into a similar problem with how to represent RPMDiff results in 
ResultsDB. The problem for RPMDiff is that the results are not for 
a single build NVR, but for a pair of NVRs (the build being analyzed 
plus the older "baseline" build it is comparing against).

The solution we came up with (not yet implemented) was to use a new item 
type, with two extra data keys "oldnvr" and "newnvr" to specify the two 
builds in the comparison. We would also store "newnvr" as "item" as 
well, for consistency with other result types.
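
Expressed as a ResultsDB result payload, the scheme above might look like the following sketch. The testcase name and the `rpmdiff_comparison` type value are hypothetical placeholders, not an implemented convention:

```python
def comparison_result(outcome, old_nvr, new_nvr,
                      testcase="dist.rpmdiff.analysis"):
    """Build a result payload for a two-build comparison: "item"
    duplicates "newnvr" for consistency with other result types, and
    both NVRs are kept as extra data keys."""
    return {
        "testcase": testcase,
        "outcome": outcome,
        "data": {
            "item": new_nvr,
            "type": "rpmdiff_comparison",
            "newnvr": new_nvr,
            "oldnvr": old_nvr,
        },
    }

print(comparison_result("PASSED", "htop-2.0-1.fc25", "htop-2.0-2.fc25"))
```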

-- 
Dan Callaghan 
Senior Software Engineer, Products & Technologies Operations
Red Hat


___
qa-devel mailing list -- qa-devel@lists.fedoraproject.org
To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org


Re: Libtaskotron - allow non-cli data input

2017-02-07 Thread Josef Skladanka
On Mon, Feb 6, 2017 at 6:49 PM, Kamil Paral  wrote:

> The formulas already provide a way to 'query' structured data via the
> dot-format, so we could do with as much as passing some variable like
> 'task_data' that would contain the parsed json/yaml.
>
>
> Or are you proposing we add another variable with these extra values, like
> this?
>
> echo " {'branch': 'master', 'commit': '6e4fc7'} " | runtask --item
> libtaskotron --type pagure_git_commit --data-file - runtask.yaml
>
> or this:
>
> echo " {'name': 'htop'} " | runtask --item htop-2.0-1.fc25 --type
> koji_build --data-file - runtask.yaml
>
> and then use ${item} and ${data.branch}, ${data.commit}, or ${data.name} ?
>
>
>
This is what I meant - keeping item as is, but being able to pass another
structure to the formula, which can then be used from it. I'd still like to
keep the item to a single string, so it can be queried easily in the
resultsdb. The item should still represent what was tested. It's just that
I want to be able to pass arbitrary data to the formulae, without the need
for ugly hacks like we have seen with the git commits lately.



> I guess it depends whether the extra data will be mandatory and exactly
> defined ("this item type provides these input values") or not (what will
> formulas do when they're not there?). Also whether we want to make it still
> possible to execute a task with simple `--item string` in some kind of
> fallback mode, to keep local execution on dev machines still easy and
> simple.
>
>
My take on this is, that we will say which variables are provided by the
trigger for each type. If a variable is missing, the formula/execution
should just crash when it tries to access it.
Not sure about the fallback mode, but my take on this is, that if the user
will want to run the task, he will have to just write the "extra data" once
to a file, and then it will be just passed in as usual.
We could even make some templates for each item_type (I guess trigger docs
are the place for it?), so people can easily just copy-paste it, and make
changes.
I also think that providing a sample json file to the existing tasks (that
are using it) is a best practice we should strive for.

Makes sense?

Joza


Re: Libtaskotron - allow non-cli data input

2017-02-06 Thread Kamil Paral
> Chaps,

> we were discussing this many times in the past, and as with the
> type-restriction, I think this is the right time to get this done, actually.

> It sure ties to the fact that I'm trying to put
> Taskotron-continuously-testing-Taskotron together - the idea here being that
> on each commit to a devel branch on any of the Taskotron components, we will
> spin-up a testing instance of the whole stack, and run some integration
> tests.

> To do this, I added a new consumer to Trigger (
> https://phab.qa.fedoraproject.org/D1110 ) that eats Pagure.io commits, and
> spins jobs based on that.
> This means, that I want to have the repo, branch and commit id as input for
> the job, thus making yet-another-nasty-hack to pass the combined data into
> the job ( https://phab.qa.fedoraproject.org/D1110#C16697NL18 ) so I can hack
> it apart later on either in the formula or in the task itself.

Ewww :) That's the same approach modularity folks need to use for 
dist_git_commit. We need to improve this, agreed. 

> It would be very helpful to be able to pass some structured data into the
> task instead.

> I kind of remember that we agreed on json/yaml.

Yes, that seems reasonable. I guess I'd prefer using a YAML parser for that, 
because it can also understand classic JSON. In infra, I guess JSON will be 
used (coming from fedmsgs), but in documentation we can use more readable 
YAML examples. 

> The possibilities were either reading it from stdin or file. I don't really
> care that much either way, but would probably feel a bit better about having
> a cli-param to pass the filename there.

I propose using a cli-param with a file path, with a special case for "-" 
meaning stdin. That is the common syntax for many unix tools. 
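
The special-cased "-" is a couple of lines of code. A minimal sketch of the proposed `--data-file` handling, using JSON for self-containedness (with PyYAML, `yaml.safe_load` would cover both YAML and plain JSON):

```python
import json
import sys

def load_input_data(path):
    """Read structured input data from a file path, with "-" meaning
    stdin, the common unix convention."""
    if path == "-":
        return json.load(sys.stdin)
    with open(path) as f:
        return json.load(f)
```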

> The formulas already provide a way to 'query' structured data via the
> dot-format, so we could do with as much as passing some variable like
> 'task_data' that would contain the parsed json/yaml.

Here I'm a bit unclear. Are you proposing that "item" itself will be 
structured, e.g. like this? 

echo " {'repo': 'libtaskotron', 'branch': 'master', 'commit': '6e4fc7'} " | 
runtask --item-file - --type pagure_git_commit runtask.yaml 

or this: 

echo " {'nvr': 'htop-2.0-1.fc25', 'name': 'htop'} " | runtask --item-file - 
--type koji_build runtask.yaml 

In this case we would be able to use ${item.repo}, ${item.branch}, 
${item.commit} or ${item.nvr}, ${item.name} directly in the formula. Would we 
still keep `--item string`, or would it no longer have any use? 

Or are you proposing we add another variable with these extra values, like 
this? 

echo " {'branch': 'master', 'commit': '6e4fc7'} " | runtask --item libtaskotron 
--type pagure_git_commit --data-file - runtask.yaml 

or this: 

echo " {'name': 'htop'} " | runtask --item htop-2.0-1.fc25 --type koji_build 
--data-file - runtask.yaml 

and then use ${item} and ${data.branch}, ${data.commit}, or ${data.name} ? 

I guess it depends whether the extra data will be mandatory and exactly defined 
("this item type provides these input values") or not (what will formulas do 
when they're not there?). Also whether we want to make it still possible to 
execute a task with simple `--item string` in some kind of fallback mode, to 
keep local execution on dev machines still easy and simple. 