Re: Libtaskotron - allow non-cli data input

Kamil Paral Tue, 14 Feb 2017 08:34:53 -0800

> On Wed, Feb 8, 2017 at 2:26 PM, Kamil Paral < [email protected] > wrote:


> > > This is what I meant - keeping item as is, but being able to pass another
> > > structure to the formula, which can then be used from it. I'd still like
> > > to keep the item to a single string, so it can be queried easily in the
> > > resultsdb. The item should still represent what was tested. It's just
> > > that I want to be able to pass arbitrary data to the formulae, without
> > > the need for ugly hacks like we have seen with the git commits lately.
> > 
> 

> > So, the question is now how much we want the `item` to uniquely identify
> > the item under test. Currently we mostly do (rpmlint, rpmgrill) and
> > sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA,
> > and we store arch in the results extradata section).
> 

> I still kind of believe that the `item` should be chosen with great respect
> to what actually is the item under test, but it also really depends on what
> you want to do with it later on. Note that the `item` is actually a
> convention (yay, more water to adamw's "if we only had some awesome new
> project" mill), and is not enforced in any way. I believe that there should
> be firm rules (once again - conventions) on what the item is for each "well
> known" item type, so you can kind-of assume that if you query for
> `item=foo&type=koji_build` you are getting the results related to that
> build.
> As we were discussing privately with the item types (I'm not going to go into
> much detail here, but for the rest of you guys - I'm contemplating making
> the types more general, and using more of the 'metadata' to store additional
> specifics - like replacing `type=koji_build` with `type=build, source=koji`,
> or `type=build, source=brew` - on the high level, you know that a
> package/build was tested, and you don't really care where it came from, but
> you sometimes might care, and so there is the additional metadata stored. We
> could even have more types stored for one result, or I don't know... It's
> complicated), the idea behind item is that it should be a reasonable value,
> that carries the "what was tested" information, and you will use the other
> "extra-data" fields to provide more details (like we kind-of want to do with
> arch, but we don't really..). The reason for it to be a "reasonable value" and
> not "superset of all values that we have" is to make the general querying a
> bit more straightforward.
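
As a rough illustration of the querying convention described above, here is a minimal sketch of a broad query by `item` and type, and a narrower one that adds an extra-data key. The base URL, the `build_query` helper and the `arch` filter are assumptions made up for the example, not the actual ResultsDB API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; the real service URL
# and its supported parameters may differ.
BASE_URL = "https://resultsdb.example.org/api/results"


def build_query(item, item_type, **extra):
    """Build a query URL from the item, its type, and optional extra-data filters."""
    params = {"item": item, "type": item_type}
    params.update(extra)  # extra-data key-vals narrow the search
    return BASE_URL + "?" + urlencode(params)


# Broad query: everything related to the build.
broad = build_query("foo", "koji_build")

# Narrowed query: same item, restricted by an extra-data field.
narrow = build_query("foo", "koji_build", arch="x86_64")
```

The point is only that a simple, predictable `item` keeps the broad query short, while the details live in optional extra fields.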

> > If we have structured input data, what happens to `item` for
> > check_modulemd? Currently it is "namespace/module#commithash". Will it
> > stay the same, and they'll just avoid parsing it because we'll also
> > provide ${data.namespace}, ${data.module} and ${data.hash}? Or will the
> > `item` be perhaps just "module" (and the rest will be stored as
> > extradata)? What happens when we have a generic git_commit type, and the
> > source can be an arbitrary service? Will we have some convention to use
> > item as "giturl#commithash"?
> 

> Once again - whatever makes sense as the item. For me that would be the
> Repo/SHA combo, with server, repo, branch, and commit in extradata.
> And it comes to "storing as much relevant metadata as possible" once again.
> The thing is, that as long as stuff is predictable, it almost does not
> matter what it is, and it once again points out how good of an idea the
> conventions stuff is. I believe that we are now storing much less metadata in
> resultsdb than we should, and it is caused mostly by the fact that
> - we did not really need to use the results much so far
> - it is pretty hard to pass data into libtaskotron, and querying all the
> services all the time, to get the metadata, is/was deemed a bad idea - why
> do it ourselves, if the consumer can get it himself. They know that it is
> koji_build, so they can query koji.

> There is a fine balance to be struck, IMO, so we don't end up storing "all
> the data" in resultsdb. But I believe that the stuff relevant for the result
> consumption should be there.
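
To make the Repo/SHA suggestion above concrete, here is a small sketch of how such an item and its extradata could be put together, plus how the current "namespace/module#commithash" form can be split. The function names, the dictionary keys and the sample values are all assumptions for illustration; no such convention is fixed anywhere yet.

```python
def make_git_result_keys(server, repo, branch, commit):
    """Return (item, extradata) following a Repo/SHA item convention (assumed)."""
    item = f"{repo}#{commit}"  # short, predictable identifier
    extradata = {
        "server": server,
        "repo": repo,
        "branch": branch,
        "commit": commit,
    }
    return item, extradata


def split_module_item(item):
    """Split a 'namespace/module#commithash' item into its three parts."""
    path, _, commit = item.partition("#")
    namespace, _, module = path.partition("/")
    return {"namespace": namespace, "module": module, "hash": commit}


# Sample values are made up for the example.
item, extradata = make_git_result_keys(
    "git.example.org", "modules/testmodule", "master", "abc123"
)
parts = split_module_item(item)
```

With the parts also passed in as structured data, a formula would never need to do the splitting itself.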

> > Because the ideal way would be to store the whole item+data structure as
> > item in resultsdb. But that's hard to query for humans, so we want a
> > simple string as an identifier.
> 

> This, for me, is once again about being predictable. As I said above, I still
> think that `item` should be a reasonable identifier, but not necessarily a
> superset of all the info. That is what the extra data is for. Talking
> about...

> > But sometimes there can be a lot of data points which uniquely identify the
> > thing under test only when you specify it all (for example what Dan wrote,
> > sometimes the ID is the old NVR *plus* the new NVR). Will we want to
> > somehow combine them into a single item value? We should give some
> > directions how people should construct their items.
> 

> My gut feeling here would be storing the "new NVR" (the thing that actually
> caused the test to be executed) as item, and adding 'old nvr' to extra data.
> But I'm not that familiar with the specific usecase. To me, this would make
> sense, because when you query for "this NVR related results" you'd get the
> results too. If you wanted to have the specifics, then you could narrow that
> search by using the relevant extra-data key-vals.
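
A minimal in-memory sketch of that idea: the new NVR is the `item`, the old NVR sits in extra data, and a query by item alone still finds the result. The record layout, the `old_nvr` key and the old NVR value are assumptions invented for the example.

```python
# Two made-up results for the same new NVR; only one carries an old NVR.
results = [
    {"item": "htop-1.0-2.fc25", "type": "koji_build",
     "extradata": {"old_nvr": "htop-1.0-1.fc25"}},
    {"item": "htop-1.0-2.fc25", "type": "koji_build",
     "extradata": {}},
]


def find(results, item, **extra):
    """Return results for an item, optionally narrowed by extra-data key-vals."""
    return [
        r for r in results
        if r["item"] == item
        and all(r["extradata"].get(k) == v for k, v in extra.items())
    ]


all_for_nvr = find(results, "htop-1.0-2.fc25")
only_upgrade = find(results, "htop-1.0-2.fc25", old_nvr="htop-1.0-1.fc25")
```

Here `all_for_nvr` holds both records and `only_upgrade` holds just the one that names the old NVR.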

> > Once, for each item :) If a task developer wants to execute his task on a
> > NVR, he'll need to prepare the structured input data for each NVR he wants
> > to test. That might not be difficult, but it's more work than it is
> > currently, so we should know whether we're fine with that. I guess we are,
> > but there can be some gotchas, e.g. sometimes it might not be obvious how
> > to get some extra data. For example:
> 

> > nvr = htop-1.0-2.fc25
> 
> > inputdata = {name: htop; epoch: 0; build_id: 482735}
> 

> I think that you are, once again, being overly defensive. The task does not
> _need_ to use/access all the data that the 'item'(type) can possibly have
> associated with it. And the values the task needs, the task-developer should
> be able to provide. I mean - the developer knows, what he wants to test,
> right? And if not, is it a good idea for him to put the test together in the
> first place?

> > We'll probably end up having a mix of necessary and convenience values in
> > the inputdata. "name" is probably a convenience value here, so that tasks
> > don't have to parse if they need to use it in a certain directive. "epoch"
> > might be an important value for some test cases, and let's say we learn
> > the value in trigger during scheduling investigation, so we decide to pass
> > it down. But that information is not that easy to get manually. If you
> > know what to do, you'll open up a particular koji page and see it. But you
> > can also be clueless about how to figure it out. The same goes for
> > build_id, again can be important, but also can be retrieved later, so more
> > of a convenience data (saving you from writing a koji query). This is just
> > an example for illustration, might not match real-world use cases.
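
To show the difference between convenience and necessary values in the htop example above, here is a short sketch. It assumes the usual name-version-release layout of an NVR; the `split_nvr` helper is made up for the illustration. The name can be derived from the string itself, while epoch and build_id cannot and would have to come from elsewhere (the trigger or a koji lookup).

```python
def split_nvr(nvr):
    """Split 'name-version-release' on the last two dashes.

    The name itself may contain dashes, so split from the right.
    """
    name, version, release = nvr.rsplit("-", 2)
    return {"name": name, "version": version, "release": release}


parsed = split_nvr("htop-1.0-2.fc25")

# Values copied from the example in the mail; these are not derivable
# from the NVR string and must be supplied by whoever prepares the data.
inputdata = dict(parsed, epoch=0, build_id=482735)
```

Since yaml formulas cannot do this kind of splitting, passing `name` in the input data saves the task from needing any code for it.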
> 

> If you are clueless about getting information necessary for your test to run,
> you (IMO) should not be writing the test. Because that most probably means
> that you do not have the necessary expertise to design the test in the first
> place.

> > > Makes sense?
> > 
> 

> > I'm a bit torn between providing as much useful data as we can when
> > scheduling (because a) yaml formulas are very limited and you can't do
> > stuff like string parsing/splitting b) might save you a lot of work/code
> > to have this data presented to you right from the start), and the easy
> > manual execution (when you need to gather and provide all that data
> > manually). It's probably about finding the right balance. We can't avoid
> > having structured multi-data input, I don't think.
> 

> As I said already, I firmly believe that we should be providing as much
> reasonable, useful data as possible. Most of the tests will not need it all,
> and if they do, the maintainers/developers should know what the possible
> values are anyway.
> It sure can be a minor pita to put together the first "testing" datafiles,
> but you only need to do it once.

> Joza

> _______________________________________________
> qa-devel mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

All of this sounds reasonable.

