Re: Libtaskotron - allow non-cli data input

Josef Skladanka Wed, 08 Feb 2017 15:16:13 -0800

On Wed, Feb 8, 2017 at 2:26 PM, Kamil Paral <kpa...@redhat.com> wrote:


> This is what I meant - keeping item as is, but being able to pass another
> structure to the formula, which can then be used from it. I'd still like to
> keep the item to a single string, so it can be queried easily in the
> resultsdb. The item should still represent what was tested. It's just that
> I want to be able to pass arbitrary data to the formulae, without the need
> for ugly hacks like we have seen with the git commits lately.
>
>
> So, the question is now how much we want the `item` to uniquely identify
> the item under test. Currently we mostly do (rpmlint, rpmgrill) and
> sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA,
> and we store arch in the results extradata section).
>
>
I still kind of believe that the `item` should be chosen with great respect
to what actually is the item under test, but it also really depends on what
you want to do with it later on. Note that the `item` is actually a
convention (yay, more water to adamw's "if we only had some awesome new
project" mill), and is not enforced in any way. I believe that there should
be firm rules (once again - conventions) on what the item is for each "well
known" item type, so you can kind-of assume that if you query for
`item=foo&type=koji_build` you are getting the results related to that
build.
As we were discussing privately with the item types (I'm not going to go
into much detail here, but for the rest of you guys - I'm contemplating
making the types more general, and using more of the 'metadata' to store
additional specifics - like replacing `type=koji_build` with `type=build,
source=koji`, or `type=build, source=brew` - on the high level, you know
that a package/build was tested, and you don't really care where it came
from, but you sometimes might care, and so there is the additional metadata
stored. We could even have more types stored for one result, or I don't
know... It's complicated), the idea behind item is that it should be a
reasonable value, that carries the "what was tested" information, and you
will use the other "extra-data" fields to provide more details (like we
kind-of want to do with arch, but we don't really..). The reason for it to
be a "reasonable value" and not a "superset of all values that we have" is to
make the general querying a bit more straightforward.
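
To make that concrete, here is a minimal sketch of the idea in plain Python. It is not resultsdb code: the in-memory `RESULTS` list, the `query` helper, and the `source`/`arch` extra-data keys are assumptions made up for illustration. The point is only that `item` plus `type` gives you the broad answer, and the extra data narrows it down when you care.

```python
# Hypothetical in-memory stand-in for stored results; NOT the resultsdb API.
RESULTS = [
    {"item": "htop-1.0-2.fc25", "type": "build", "outcome": "PASSED",
     "data": {"source": "koji", "arch": "x86_64"}},
    {"item": "htop-1.0-2.fc25", "type": "build", "outcome": "FAILED",
     "data": {"source": "brew", "arch": "x86_64"}},
    {"item": "foo-2.1-1.fc25", "type": "build", "outcome": "PASSED",
     "data": {"source": "koji", "arch": "armhfp"}},
]


def query(results, item=None, item_type=None, **extra):
    """Return results matching item/type, narrowed by any extra-data key-vals."""
    matches = []
    for result in results:
        if item is not None and result["item"] != item:
            continue
        if item_type is not None and result["type"] != item_type:
            continue
        # Every requested extra-data key must be present with the same value.
        if any(result["data"].get(key) != value for key, value in extra.items()):
            continue
        matches.append(result)
    return matches


# Broad question: "was this build tested?" (no matter where it came from)
broad = query(RESULTS, item="htop-1.0-2.fc25", item_type="build")
# Narrow question: same build, but only the results that came from koji
narrow = query(RESULTS, item="htop-1.0-2.fc25", item_type="build", source="koji")
print(len(broad), len(narrow))
```

The general query stays simple because `item` is a single reasonable string, while the "where did it come from" detail lives in the extra data.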


> If we have structured input data, what happens to `item` for
> check_modulemd? Currently it is "namespace/module#commithash". Will it stay
> the same, and they'll just avoid parsing it because we'll also provide
> ${data.namespace}, ${data.module} and ${data.hash}? Or will the `item` be
> perhaps just "module" (and the rest will be stored as extradata)? What
> happens when we have a generic git_commit type, and the source can be an
> arbitrary service? Will we have some convention to use item as
> "giturl#commithash"?
>
>
Once again - whatever makes sense as the item. For me that would be the
Repo/SHA combo, with server, repo, branch, and commit in extradata.
And it comes to "storing as much relevant metadata as possible" once again.
The thing is, that as long as stuff is predictable, it almost does not
matter what it is, and it once again points out how good an idea the
conventions stuff is. I believe that we are now storing much less metadata in
resultsdb than we should, and it is caused mostly by the fact that
 - we did not really need to use the results much so far
 - it is pretty hard to pass data into libtaskotron, and querying all the
services all the time, to get the metadata, is/was deemed a bad idea - why
do it ourselves, if the consumers can get it themselves? They know that it is
koji_build, so they can query koji.
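
As a rough sketch of the Repo/SHA convention mentioned above (plain Python, purely illustrative): the `git_commit_result` helper, the `repo#commit` separator, and the example server and repository names are all assumptions, not an agreed format.

```python
def git_commit_result(server, repo, branch, commit):
    """Build a result record for a tested git commit (illustrative only)."""
    return {
        # Short, predictable identifier that is easy to query for.
        "item": f"{repo}#{commit}",
        "type": "git_commit",
        # The rest of the metadata, so consumers need not re-query services.
        "data": {
            "server": server,
            "repo": repo,
            "branch": branch,
            "commit": commit,
        },
    }


record = git_commit_result(
    server="git.example.org",  # placeholder values, not a real service
    repo="modules/example",
    branch="master",
    commit="0123abc",
)
print(record["item"])
```

As long as every producer builds the item the same way, the exact separator matters much less than the fact that it is predictable.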

There is a fine balance to be struck, IMO, so we don't end up storing "all
the data" in resultsdb. But I believe that the stuff relevant for the
result consumption should be there.


> Because the ideal way would be to store the whole item+data structure as
> item in resultsdb. But that's hard to query for humans, so we want a simple
> string as an identifier.
>

This, for me, is once again about being predictable. As I said above, I
still think that `item` should be a reasonable identifier, but not
necessarily a superset of all the info. That is what the extra data is for.
Talking about...


> But sometimes there can be a lot of data points which uniquely identify
> the thing under test only when you specify it all (for example what Dan
> wrote, sometimes the ID is the old NVR *plus* the new NVR). Will we want to
> somehow combine them into a single item value? We should give some
> directions how people should construct their items.
>
>
My gut feeling here would be storing the "new NVR" (the thing that actually
caused the test to be executed) as item, and adding 'old nvr' to extra
data. But I'm not that familiar with the specific usecase. To me, this
would make sense, because when you query for "this NVR related results"
you'd get the results too. If you wanted to have the specifics, then you
could narrow that search by using the relevant extra-data key-vals.


> Once, for each item :)  If a task developer wants to execute his task on a
> NVR, he'll need to prepare the structured input data for each NVR he wants
> to test. That might not be difficult, but it's more work than it is
> currently, so we should know whether we're fine with that. I guess we are,
> but there can be some gotchas, e.g. sometimes it might not be obvious how
> to get some extra data. For example:
>
> nvr = htop-1.0-2.fc25
> inputdata = {name: htop; epoch: 0; build_id: 482735}
>
>
I think that you are, once again, being overly defensive. The task does not
_need_ to use/access all the data that the 'item'(type) can possibly have
associated with it. And the values the task needs, the task-developer
should be able to provide. I mean - the developer knows, what he wants to
test, right? And if not, is it a good idea for him to put the test together
in the first place?



> We'll probably end up having a mix of necessary and convenience values in
> the inputdata. "name" is probably a convenience value here, so that tasks
> don't have to parse if they need to use it in a certain directive. "epoch"
> might be an important value for some test cases, and let's say we learn the
> value in trigger during scheduling investigation, so we decide to pass it
> down. But that information is not that easy to get manually. If you know
> what to do, you'll open up a particular koji page and see it. But you can
> also be clueless about how to figure it out. The same goes for build_id,
> again can be important, but also can be retrieved later, so more of a
> convenience data (saving you from writing a koji query). This is just an
> example for illustration, might not match real-world use cases.
>
>
If you are clueless about getting information necessary for your test to
run, you (IMO) should not be writing the test. Because that most probably
means that you do not have the necessary expertise to design the test in
the first place.

>
> Makes sense?
>
>
> I'm a bit torn between providing as much useful data as we can when
> scheduling (because a) yaml formulas are very limited and you can't do
> stuff like string parsing/splitting b) might save you a lot of work/code to
> have this data presented to you right from the start), and the easy manual
> execution (when you need to gather and provide all that data manually).
> It's probably about finding the right balance. We can't avoid having
> structured multi-data input, I don't think.
>
>
As I said already, I firmly believe that we should be providing as much
reasonable, useful data as possible. Most of the tests will not need it
all, and if they do, the maintainers/developers should know what the
possible values are anyway.
It sure can be a minor pita to put together the first "testing" datafiles,
but you only need to do it once.
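
For illustration, a one-off testing datafile and the way a task might consume it could look roughly like this. The JSON layout, the `DATAFILE` contents, and the `load_input` helper are assumptions (the actual libtaskotron input format is not settled in this thread); the values come from the htop example quoted above.

```python
import json

# Contents of a hypothetical data file, written once by the task developer.
DATAFILE = """
{
  "item": "htop-1.0-2.fc25",
  "type": "koji_build",
  "data": {"name": "htop", "epoch": 0, "build_id": 482735}
}
"""


def load_input(text):
    """Parse structured input into the item string and its extra data."""
    parsed = json.loads(text)
    return parsed["item"], parsed.get("data", {})


item, data = load_input(DATAFILE)

# A task that only cares about the epoch simply ignores the other values.
epoch = data.get("epoch", 0)
print(item, epoch)
```

The task reads only the keys it needs, so providing more data up front costs it nothing.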

Joza

_______________________________________________
qa-devel mailing list -- qa-devel@lists.fedoraproject.org
To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org
