On Wed, 8 Feb 2017 08:26:30 -0500 (EST)
Kamil Paral <kpa...@redhat.com> wrote:

> > This is what I meant - keeping item as is, but being able to pass
> > another structure to the formula, which can then be used from it.
> > I'd still like to keep the item to a single string, so it can be
> > queried easily in the resultsdb. The item should still represent
> > what was tested. It's just that I want to be able to pass arbitrary
> > data to the formulae, without the need for ugly hacks like we have
> > seen with the git commits lately.  
> 
> So, the question is now how much we want the `item` to uniquely
> identify the item under test. Currently we mostly do (rpmlint,
> rpmgrill) and sometimes don't (depcheck, because item is NVR, but the
> full ID is NEVRA, and we store arch in the results extradata
> section). 
> 
> If we have structured input data, what happens to `item` for
> check_modulemd? Currently it is "namespace/module#commithash". Will
> it stay the same, and they'll just avoid parsing it because we'll
> also provide ${data.namespace}, ${data.module} and ${data.hash}? Or
> will the `item` be perhaps just "module" (and the rest will be stored
> as extradata)? What happens when we have a generic git_commit type,
> and the source can be an arbitrary service? Will we have some
> convention to use item as "giturl#commithash"?

I think another question is whether we want to keep assuming that the
user supplies the item that is used as a UID in resultsdb. As you say,
it seems a bit odd to require people to munge stuff together like
"namespace/module#commithash" at the same time that it can be separated
out into a dict-like data structure for easy access.

Would it make more sense to just pass in the dict and have semi-coded
conventions for reporting to resultsdb based on the item_type, which
could be set during the task instead of having to be known before
task execution time?

Something along the lines of teaching the resultsdb directive about
some common kinds of input (module commit, dist-git rpm change, etc.),
so that you could specify the item_type and it would know which bits
to look for when constructing the UID item that gets reported to
resultsdb.

Using Kamil's example, assume that we have a task for a module and the
following data is passed in:

  {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}

Neither item nor type is specified on the CLI at execution time. The
task executes using that input data and when it comes time to report to
resultsdb:

  - name: report results to resultsdb
    resultsdb:
      results: ${some_task_output}
      type: module

With the type set to "module", the directive would look through the
input data and construct the "item" from input.namespace, input.module
and input.commithash.
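
Roughly what I'm picturing, as a purely hypothetical sketch (nothing
like this exists in the resultsdb directive today, and the type names
and field recipes are made up):

  # hypothetical sketch: per-type recipes for building the resultsdb
  # "item" string out of the structured input data
  ITEM_BUILDERS = {
      'module': lambda d: '{namespace}/{module}#{commithash}'.format(**d),
      'koji_build': lambda d: d['nvr'],
  }

  def build_item(item_type, data):
      """Construct the item string reported to resultsdb."""
      try:
          return ITEM_BUILDERS[item_type](data)
      except KeyError as err:
          # unknown type or missing field -> fail loudly
          raise ValueError('cannot build item for %r: %s' % (item_type, err))

  # build_item('module', {'namespace': 'someuser', 'module': 'httpd',
  #                       'commithash': 'abc123df980'})
  # -> 'someuser/httpd#abc123df980'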

I'm not sure if it makes more sense to have a set of "types" that the
resultsdb directive understands natively or to actually require item
but allow variable names in it along the lines of

  "item":"${namespace}/${module}#${commithash}"

> Because the ideal way would be to store the whole item+data structure
> as item in resultsdb. But that's hard to query for humans, so we want
> a simple string as an identifier. But sometimes there can be a lot of
> data points which uniquely identify the thing under test only when
> you specify it all (for example what Dan wrote, sometimes the ID is
> the old NVR *plus* the new NVR). Will we want to somehow combine them
> into a single item value? We should give some direction on how
> people should construct their items.
> 
> > > I guess it depends whether the extra data will be mandatory and
> > > exactly defined ("this item type provides these input values") or
> > > not (what will formulas do when they're not there?). Also whether
> > > we want to make it still possible to execute a task with simple
> > > `--item string` in some kind of fallback mode, to keep local
> > > execution on dev machines still easy and simple.  
> >   
> 
> > My take on this is that we will say which variables are provided
> > by the trigger for each type. If a variable is missing, the
> > formula/execution should just crash when it tries to access it.  
> 
> Sounds reasonable. 

+1 from me as well. Assume everything is there, and crash if something
requested isn't available (missing data, etc.).
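
In other words, something like this tiny hypothetical sketch (the
resolver and the dotted ${data.epoch}-style references are only
illustrative):

  # resolve a dotted reference like 'data.epoch' against the input
  # structure; let the error propagate instead of substituting a default
  def resolve(ref, context):
      value = context
      for part in ref.split('.'):
          value = value[part]  # KeyError -> the task run crashes
      return value

  # resolve('data.epoch', {'data': {'epoch': 0}})  -> 0
  # resolve('data.epoch', {'data': {}})            -> KeyError: 'epoch'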

> > Not sure about the fallback mode, but my take on this is that if
> > the user wants to run the task, he will just have to write the
> > "extra data" to a file once, and then it will be passed in as
> > usual.
> 
> Once, for each item :) If a task developer wants to execute his task
> on an NVR, he'll need to prepare the structured input data for each
> NVR he wants to test. That might not be difficult, but it's more work
> than it is currently, so we should know whether we're fine with that.
> I guess we are, but there can be some gotchas, e.g. sometimes it
> might not be obvious how to get some extra data. For example: 
> 
> nvr = htop-1.0-2.fc25 
> inputdata = {name: htop, epoch: 0, build_id: 482735}
> 
> We'll probably end up having a mix of necessary and convenience
> values in the inputdata. "name" is probably a convenience value here,
> so that tasks don't have to parse it out of the NVR if they need to
> use it in a certain directive. "epoch" might be an important value
> for some test cases, and let's say we learn its value in the trigger
> during scheduling investigation, so we decide to pass it down. But
> that information is
> not that easy to get manually. If you know what to do, you'll open up
> a particular koji page and see it. But you can also be clueless about
> how to figure it out. The same goes for build_id: again, it can be
> important, but it can also be retrieved later, so it's more of a
> convenience value (saving you from writing a koji query). This is
> just an example for illustration and might not match real-world use
> cases.
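
(As an aside, the koji bits at least aren't too bad to script; a rough
sketch using the koji Python API, assuming the Fedora hub URL:)

  # rough sketch: look up epoch and build_id for an NVR via koji
  import koji

  session = koji.ClientSession('https://koji.fedoraproject.org/kojihub')
  build = session.getBuild('htop-1.0-2.fc25')
  # build is a dict with (among other things) 'name', 'epoch' and 'id'
  inputdata = {'name': build['name'],
               'epoch': build['epoch'] or 0,
               'build_id': build['id']}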

I mentioned this in IRC, but why not have a bit of both and allow
input either from a file or on the CLI? I don't think that JSON would
be too bad to type on the command line when you're running something
manually:

  runtask sometask.yml -e '{"namespace": "someuser", "module": "somemodule",
                            "commithash": "abc123df980"}'

There would be some risk of running into the same problem we had with
AutoQA, where depcheck commands became too long for bash to parse, but
that's when I'd say "you need to use a file for that", or we'd look
for another solution for whatever input was too long for bash.
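
To make that concrete, the option handling could be as simple as this
hypothetical sketch (the option names here are just illustrative):

  # hypothetical sketch of an -e/--extra-data option accepting either
  # inline JSON or a path to a JSON file
  import argparse
  import json
  import os

  def parse_extra_data(value):
      if os.path.isfile(value):
          with open(value) as f:
              return json.load(f)
      return json.loads(value)

  parser = argparse.ArgumentParser(prog='runtask')
  parser.add_argument('taskfile')
  parser.add_argument('-e', '--extra-data', type=parse_extra_data,
                      help='input data as inline JSON or a JSON file path')
  # args = parser.parse_args()
  # args.extra_data is then the structured input handed to the formula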

> > We could even make some templates for each item_type (I guess
> > trigger docs are the place for them?), so people can easily just
> > copy-paste them and make changes.
> > I also think that providing a sample json file with the existing
> > tasks (those that use it) is a best practice we should strive for.  
> 
> That is a very good idea, because it could help with the problem
> described above. It could also document how to retrieve values
> manually, if needed. Also, we have to make sure that a user can
> copy&paste a command to run the same production task in his local
> environment. We should either have it in the log file, or in execdb
> (or both), but it needs to include full inputdata, so that you can
> still easily run the same thing locally for debugging. 

Templates, at least. I think it'd be nice to have a bit more than a
suggestion for things, but then again, maybe resultsdb's requirement
of an "ITEM" is enough consistency, and any further requirements
(instead of suggestions) would end up being too much.
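
E.g. a sample input file shipped with a task (the file name and exact
keys here are purely illustrative) could be as small as a
sample.input.json containing:

  {
      "namespace": "someuser",
      "module": "httpd",
      "commithash": "abc123df980"
  }

plus a note in the docs on where each value comes from (koji page,
dist-git URL, ...) for people filling it in by hand.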

> > Makes sense?  
> 
> I'm a bit torn between providing as much useful data as we can when
> scheduling (because a) yaml formulas are very limited and you can't
> do stuff like string parsing/splitting, and b) it might save you a
> lot of work/code to have this data presented to you right from the
> start),
> and the easy manual execution (when you need to gather and provide
> all that data manually). It's probably about finding the right
> balance. We can't avoid having structured multi-data input, I don't
> think. 

If we did something along the lines of allowing input on the CLI, we
could have both, no? We'd need to be clear on the precedence of file
vs. CLI input, but that seems to me like something that could solve
the issue of dealing with more complicated inputs without requiring
users to futz with a file when running tasks locally.
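
E.g. if we decided that CLI values win on a per-key basis, the merge
itself is trivial (hypothetical sketch):

  # merge file-provided and CLI-provided input data, CLI winning per key
  def merge_input(file_data, cli_data):
      merged = dict(file_data or {})
      merged.update(cli_data or {})
      return merged

  # merge_input({'module': 'httpd', 'commithash': 'abc123'},
  #             {'commithash': 'def456'})
  # -> {'module': 'httpd', 'commithash': 'def456'}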

Tim
