> On Wed, Feb 8, 2017 at 2:26 PM, Kamil Paral <kpa...@redhat.com> wrote:
> > > This is what I meant - keeping item as is, but being able to pass another structure to the formula, which can then be used from it. I'd still like to keep the item to a single string, so it can be queried easily in the resultsdb. The item should still represent what was tested. It's just that I want to be able to pass arbitrary data to the formulae, without the need for ugly hacks like we have seen with the git commits lately.

> > So, the question is now how much we want the `item` to uniquely identify the item under test. Currently we mostly do (rpmlint, rpmgrill) and sometimes don't (depcheck, because item is NVR, but the full ID is NEVRA, and we store arch in the results extradata section).

> I still kind of believe that the `item` should be chosen with great respect to what actually is the item under test, but it also really depends on what you want to do with it later on. Note that the `item` is actually a convention (yay, more water to adamw's "if we only had some awesome new project" mill), and is not enforced in any way. I believe that there should be firm rules (once again - conventions) on what the item is for each "well known" item type, so you can kind-of assume that if you query for `item=foo&type=koji_build` you are getting the results related to that build.
>
> As we were discussing privately with the item types (I'm not going to go into much detail here, but for the rest of you guys - I'm contemplating making the types more general, and using more of the 'metadata' to store additional specifics - like replacing `type=koji_build` with `type=build, source=koji`, or `type=build, source=brew` - on the high level, you know that a package/build was tested, and you don't really care where it came from, but you sometimes might care, and so there is the additional metadata stored.
> We could even have more types stored for one result, or I don't know... It's complicated), the idea behind item is that it should be a reasonable value, that carries the "what was tested" information, and you will use the other "extra-data" fields to provide more details (like we kind-of want to do with arch, but we don't really..). The reason for it to be a "reasonable value" and not a "superset of all values that we have" is to make the general querying a bit more straightforward.

> > If we have structured input data, what happens to `item` for check_modulemd? Currently it is "namespace/module#commithash". Will it stay the same, and they'll just avoid parsing it because we'll also provide ${data.namespace}, ${data.module} and ${data.hash}? Or will the `item` be perhaps just "module" (and the rest will be stored as extradata)? What happens when we have a generic git_commit type, and the source can be an arbitrary service? Will we have some convention to use item as "giturl#commithash"?

> Once again - whatever makes sense as the item. For me that would be the Repo/SHA combo, with server, repo, branch, and commit in extradata.
>
> And it comes to "storing as much relevant metadata as possible" once again. The thing is, that as long as stuff is predictable, it almost does not matter what it is, and it once again points out how good of an idea the conventions stuff is. I believe that we are now storing much less metadata in resultsdb than we should, and it is caused mostly by the fact that
> - we did not really need to use the results much so far
> - it is pretty hard to pass data into libtaskotron, and querying all the services all the time, to get the metadata, is/was deemed a bad idea - why do it ourselves, if the consumer can get it himself. They know that it is koji_build, so they can query koji.
>
> There is a fine balance to be struck, IMO, so we don't end up storing "all the data" in resultsdb.
> But I believe that the stuff relevant for the result consumption should be there.

> > Because the ideal way would be to store the whole item+data structure as item in resultsdb. But that's hard to query for humans, so we want a simple string as an identifier.

> This, for me, is once again about being predictable. As I said above, I still think that `item` should be a reasonable identifier, but not necessarily a superset of all the info. That is what the extra data is for. Talking about...

> > But sometimes there can be a lot of data points which uniquely identify the thing under test only when you specify it all (for example what Dan wrote, sometimes the ID is the old NVR *plus* the new NVR). Will we want to somehow combine them into a single item value? We should give some directions how people should construct their items.

> My gut feeling here would be storing the "new NVR" (the thing that actually caused the test to be executed) as item, and adding 'old nvr' to extra data. But I'm not that familiar with the specific usecase. To me, this would make sense, because when you query for "this NVR related results" you'd get the results too. If you wanted to have the specifics, then you could narrow that search by using the relevant extra-data key-vals.

> > Once, for each item :) If a task developer wants to execute his task on a NVR, he'll need to prepare the structured input data for each NVR he wants to test. That might not be difficult, but it's more work than it is currently, so we should know whether we're fine with that. I guess we are, but there can be some gotchas, e.g. sometimes it might not be obvious how to get some extra data. For example:
> >
> > nvr = htop-1.0-2.fc25
> > inputdata = {name: htop; epoch: 0; build_id: 482735}

> I think that you are, once again, being overly defensive.
> The task does not _need_ to use/access all the data that the 'item'(type) can possibly have associated with it. And the values the task needs, the task-developer should be able to provide. I mean - the developer knows what he wants to test, right? And if not, is it a good idea for him to put the test together in the first place?

> > We'll probably end up having a mix of necessary and convenience values in the inputdata. "name" is probably a convenience value here, so that tasks don't have to parse if they need to use it in a certain directive. "epoch" might be an important value for some test cases, and let's say we learn the value in trigger during scheduling investigation, so we decide to pass it down. But that information is not that easy to get manually. If you know what to do, you'll open up a particular koji page and see it. But you can also be clueless about how to figure it out. The same goes for build_id, again can be important, but also can be retrieved later, so more of a convenience data (saving you from writing a koji query). This is just an example for illustration, might not match real-world use cases.

> If you are clueless about getting information necessary for your test to run, you (IMO) should not be writing the test. Because that most probably means that you do not have the necessary expertise to design the test in the first place.

> > > Makes sense?

> > I'm a bit torn between providing as much useful data as we can when scheduling (because a) yaml formulas are very limited and you can't do stuff like string parsing/splitting b) might save you a lot of work/code to have this data presented to you right from the start), and the easy manual execution (when you need to gather and provide all that data manually). It's probably about finding the right balance. We can't avoid having structured multi-data input, I don't think.
> As I said already, I firmly believe that we should be providing as much reasonable, useful data as possible. Most of the tests will not need it all, and if they do, the maintainers/developers should know what the possible values are anyway.
>
> It sure can be a minor pita to put together the first "testing" datafiles, but you only need to do it once.
>
> Joza
> _______________________________________________
> qa-devel mailing list -- qa-devel@lists.fedoraproject.org
> To unsubscribe send an email to qa-devel-le...@lists.fedoraproject.org

All of this sounds reasonable.
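
For illustration only, here is a minimal Python sketch of the convention discussed in this thread: `item` stays a single, easily queried string, and everything else goes into extra data. The function names and the `data` key are hypothetical and are not the resultsdb or libtaskotron API; the values come from the htop example quoted above.

```python
from urllib.parse import urlencode


def split_module_item(item):
    """Split a 'namespace/module#commithash' item into its parts."""
    path, _, commit = item.partition("#")
    namespace, _, module = path.rpartition("/")
    return {"namespace": namespace, "module": module, "hash": commit}


def make_result(item, item_type, **extradata):
    """Keep `item` a single string; store the details as extra data.

    The 'data' key is an assumption made for this sketch.
    """
    return {"item": item, "type": item_type, "data": dict(extradata)}


def query_string(item, item_type, **filters):
    """Build a query of the shape mentioned above (item=...&type=...)."""
    return urlencode({"item": item, "type": item_type, **filters})


# The item identifies what was tested; the rest is convenience/extra data.
result = make_result("htop-1.0-2.fc25", "koji_build",
                     name="htop", epoch=0, build_id=482735)
print(result["item"])
print(query_string("foo", "koji_build"))
```

With something like this, a query for the item alone still finds the result, and the extra-data key-vals can narrow the search when the specifics matter.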