> This is what I meant - keeping item as is, but being able to pass another
> structure to the formula, which can then be used from it. I'd still like to
> keep the item to a single string, so it can be queried easily in the
> resultsdb. The item should still represent what was tested. It's just that I
> want to be able to pass arbitrary data to the formulae, without the need for
> ugly hacks like we have seen with the git commits lately.

So the question now is how much we want the `item` to uniquely identify the
item under test. Currently it mostly does (rpmlint, rpmgrill) and sometimes
doesn't (depcheck, where item is the NVR but the full ID is a NEVRA, and we
store arch in the results extradata section).
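
To make the depcheck case concrete, a result in resultsdb looks roughly like
this (fields approximated for illustration, not copied from a real record):

    item: htop-1.0-2.fc25        # the NVR, not unique across arches
    outcome: PASSED
    data:
      type: koji_build
      arch: x86_64               # the piece that completes the NEVRA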

If we have structured input data, what happens to `item` for check_modulemd? 
Currently it is "namespace/module#commithash". Will it stay the same, and 
they'll just avoid parsing it because we'll also provide ${data.namespace}, 
${data.module} and ${data.hash}? Or will the `item` be perhaps just "module" 
(and the rest will be stored as extradata)? What happens when we have a generic 
git_commit type, and the source can be an arbitrary service? Will we have some
convention to use item as "giturl#commithash"? 
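
For the sake of discussion, the structured variant could look something like
this (the ${data.*} keys match the ones mentioned above, the values are just
the placeholders from the current item format):

    item: namespace/module#commithash
    data:
      namespace: namespace
      module: module
      hash: commithash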

Ideally we would store the whole item+data structure as the item in resultsdb.
But that's hard for humans to query, so we want a simple string as an
identifier. Sometimes, though, there are several data points which uniquely
identify the thing under test only when taken together (as Dan wrote,
sometimes the ID is the old NVR *plus* the new NVR). Will we want to somehow
combine them into a single item value? We should give some direction on how
people should construct their items.
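
To illustrate the two extremes on Dan's update example (the separator and the
key name here are invented, not an agreed convention):

    # everything packed into the item string
    item: htop-1.0-1.fc25..htop-1.0-2.fc25

    # versus a simple item, with the rest as extradata
    item: htop-1.0-2.fc25
    data:
      old_nvr: htop-1.0-1.fc25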

> > I guess it depends whether the extra data will be mandatory and exactly
> > defined ("this item type provides these input values") or not (what will
> > formulas do when they're not there?). Also whether we want to make it still
> > possible to execute a task with simple `--item string` in some kind of
> > fallback mode, to keep local execution on dev machines still easy and
> > simple.
> 

> My take on this is that we will say which variables are provided by the
> trigger for each type. If a variable is missing, the formula/execution
> should just crash when it tries to access it.

Sounds reasonable. 
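
So a formula snippet along these lines (directive names are illustrative, not
necessarily the real ones) would simply blow up during parsing if the trigger
didn't provide ${data.giturl} for this item type:

    actions:
      - name: clone the tested repository
        shell:
          - git clone ${data.giturl} repo
          - git -C repo checkout ${data.hash}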

> Not sure about the fallback mode, but my take on this is that if the user
> wants to run the task, he will just have to write the "extra data" once
> to a file, and then it will be passed in as usual.

Once, for each item :) If a task developer wants to execute his task on an
NVR, he'll need to prepare the structured input data for each NVR he wants to
test. That might not be difficult, but it's more work than it is currently,
so we should know whether we're fine with that. I guess we are, but there can
be some gotchas, e.g. sometimes it might not be obvious how to get some of
the extra data. For example:

    nvr = htop-1.0-2.fc25
    inputdata = {"name": "htop", "epoch": 0, "build_id": 482735}

We'll probably end up having a mix of necessary and convenience values in the
inputdata. "name" is probably a convenience value here, so that tasks don't
have to parse the NVR when they need just the package name in some directive.
"epoch" might be an important value for some test cases; let's say we learn
it in the trigger during scheduling investigation, so we decide to pass it
down. But that information is not that easy to get manually. If you know what
to do, you'll open up a particular koji page and see it, but you can just as
well be clueless about how to figure it out. The same goes for build_id:
again, it can be important, but it can also be retrieved later, so it's more
of a convenience value (saving you from writing a koji query). This is just
an example for illustration and might not match real-world use cases.
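
For completeness, the manual way to dig out those two values would probably
be something like this (real koji commands, though I'm not sure they show
everything in all cases):

    koji buildinfo htop-1.0-2.fc25       # prints the build ID, among others
    koji rpminfo htop-1.0-2.fc25.x86_64  # per-rpm details, including epoch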

> We could even make some templates for each item_type (I guess trigger docs
> are the place for it?), so people can easily just copy-paste it, and make
> changes.
> I also think that providing a sample json file to the existing tasks (that
> are using it) is a best practice we should strive for.

That is a very good idea, because it could help with the problem described
above. It could also document how to retrieve the values manually, if needed.
We also have to make sure that a user can copy&paste a command to run the
same production task in his local environment. We should have it either in
the log file or in execdb (or both), and it needs to include the full
inputdata, so that you can still easily run the same thing locally for
debugging.
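
Something along these lines, i.e. the usual runtask invocation plus a
hypothetical option (--data is invented here, whatever we end up calling it)
pointing to the stored inputdata:

    runtask -i htop-1.0-2.fc25 -t koji_build --data inputdata.json task.yml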

> Makes sense?

I'm a bit torn between providing as much useful data as we can when
scheduling (because a) yaml formulas are very limited and you can't do stuff
like string parsing/splitting, and b) having this data available right from
the start might save you a lot of work/code), and keeping manual execution
easy (when you need to gather and provide all that data yourself). It's
probably about finding the right balance. I don't think we can avoid having
structured multi-data input, though.