> > We talked with Martin about this at length some time ago, and I
> > raised the question of different consumers. I see two groups here -
> > machines and humans. If I understand you correctly, what you propose
> > up there is to hardcode the system to fit human preferences. If I
> > misunderstood it, then the whole rest of the mail is based on wrong
> > assumptions, but it's still an interesting topic :)
> 
> I think of it more as trying to ensure consistent behavior in the data
> that we emit to external entities, whether that's a machine or a human.

I misunderstood, then; I thought it was more than that. In that case we can just 
discuss deduplication of depcheck-like results and forget about the rest of my 
email (or keep it at hand for future discussions of that topic).

> 
> In my mind, the core of the issue is that we're taking what are
> essentially per-repo checks and hacking them until they emulate a
> per-build or per-update check as closely as we can make it.
> 
> This means that we end up with a lot of "duplicate" results every time
> that the check is run because we have to run it as check(repo) instead
> of check(build) or check(update).
> 
> Since we're trying to emulate a per-build or per-update check (in the
> case of upgradepath and depcheck) I don't see a reason why a human or
> machine consumer would want or need the "duplicate" results beyond how
> we emulate the per-build or per-update check that external entities are
> expecting.

I think the most confusing part is that depcheck is really a repo check, but we 
try to _report it_ per build (let's disregard updates, we plan to get rid of 
them). I think that's the same thing you're saying.
I understand the motivation for not duplicating those results, even though I 
consider both ways to be "consistent", just in slightly different ways.

Let's consider a different repo check, say "valid_metadata". We would run this 
check every day, whenever a new repo is pushed. The "item" would be e.g. 
"fedora-22-update" with some arch; only the time of running/reporting would 
differ from day to day.
We don't want to deduplicate this (there's nothing to deduplicate, there's one 
result per day). But running our naive deduplication algorithm (suppress a 
result when the same check already reported the same outcome for the same item) 
would incorrectly ignore everything from the second execution onward, until the 
outcome changed.
How do we plan to distinguish valid_metadata from depcheck, so we know which 
results to deduplicate and which not to?
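
To make the problem concrete, here is a minimal sketch of that naive rule; the 
field names are just illustrative, not the real resultsdb schema:

    # Minimal sketch of the naive deduplication rule; field names
    # (testcase, item, arch, outcome) are illustrative, not the real
    # resultsdb schema.
    def should_emit_fedmsg(new_result, previous_results):
        """Emit a fedmsg only if the outcome differs from the latest
        previous result for the same (testcase, item, arch)."""
        key = (new_result["testcase"], new_result["item"], new_result["arch"])
        for old in reversed(previous_results):  # newest first
            if (old["testcase"], old["item"], old["arch"]) == key:
                return old["outcome"] != new_result["outcome"]
        return True  # no previous record, always emit

    # For depcheck this is what we want: repeated PASSED results for the
    # same build stay quiet. For valid_metadata the item and arch are the
    # same every day, so every daily result after the first one would be
    # silenced too, even though each run is genuinely new information.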

I can imagine adding a timestamp to "item", or hardcoding per-check differences 
into resultsdb, but both options seem very unclean and unmaintainable. Am I 
missing some simple and clean solution?

> 
> FWIW, I don't think we're going to have many more of these emulation
> checks. I also think that we can get away with an "emulation mode" for
> resultsdb that handles checks like upgradepath and depcheck differently
> than other results so that they behave like other "one result per
> change" checks but I'm not the one writing that code. 

I'd very much like to avoid this; it just adds a bunch of complexity and makes 
task writing much harder, because the documentation ends up full of "special" 
modes and exceptions.

> If martin thinks
> that's not a tractable solution, I'm game for other options.
> 
> > When targeting humans, I believe we will cut off some use cases for
> > machines, which can benefit from duplicated (and thus very
> > up-to-date) information. Some ideas from top of my head describing
> > what duplicated messages allow:
> > * For some checks like depcheck, the machine (i.e. Bodhi) can not
> > only display the outcome for a certain package, but also the time
> > when this package was last tested (might be a very interesting piece
> > of information, was it OK yesterday, or 14 days ago and not run
> > since?).
> > * Or maybe show a graph of all the outcomes in the last week - so
> > that you can see that it passed 10 times and failed once, and decide
> > that the failure was probably a random fluke which is not worth
> > investigating further.
> 
> What would another interpretation of failing once and then passing for
> all subsequent runs be? I can't think of a situation other than bugs in
> the check that would be anything other than pass.

Race conditions, perhaps. Issues with particular input data. Depends on the 
check.

> 
> > * If the message passes through another system (e.g. Bodhi, Koji),
> > the system in question can e.g. allow users to configure how they
> > want to receive results - whether duplicated or deduplicated, how
> > much deduplicated, how often, etc. This is mostly true for email, RSS
> > or some other communication channels, because fedmsg bus itself is
> > not configurable per individual users' needs.
> 
> I'm not sure there would be enough people interested in receiving that
> kind of data to make it worth the effort supplying it through bodhi or
> fedmsg would entail. If we were going to have a bunch of checks that
> were emulating per-event checks, then maybe but for upgradepath and
> depcheck? I just can't see the desire or demand.

That's true. I was thinking in general, not just about upgradepath and 
depcheck. OTOH, things can be changed when needed.

> 
> > * It's possible to create some kind of package testing stats
> > overview, live and without regular queries.
> 
> If I'm understanding you correctly, this has been in the back of my
> mind for a while but I think it can be done without the duplicated data
> - just showing current check state.
> 
> > You can argue that most of this is achievable without duplicated
> > messages, by querying the ResultsDB. Yes, but it often means
> > increased performance hit and you lose the "live" status. For
> > example, in order to display the graph from the second point, you can
> > choose to query ResultsDB for every page view, but that means a lot
> > of computing demand. Or you can cache it and refresh it once an hour,
> > but that loses the live status. With notifications, you can have it
> > always perfectly up-to-date and you don't need to refresh it
> > needlessly. You can put in a safeguard against lost fedmsgs like
> > "refresh the graph if older than a week, just to be safe", but that's
> > it.
> > 
> > So, for machine processing, I see duplicated messages as a benefit. I
> > don't insist we need to have it, but it seems to allow interesting
> > tools to be written. (A different question is whether the volume
> > won't be too high for fedmsg bus to process it, but that is a
> > separate and a technical issue.)
> > 
> > If some machine didn't want to see duplicated messages and wanted to
> > be able to easily filter them out without keeping its own database or
> > querying ours, we can add something like "duplicate=True" into the
> > message body? Simple solution, for machines.
> > 
> > 
> > Now, let's imagine we still decide for message deduplication and we
> > chose the human as our primary notification target. There are further
> > issues with it. Let's imagine a simple scenario:
> > 
> > 1. A maintainer submits update U1 consisting of builds B1 and B2.
> > 2. Depcheck x86_64 runs on U1, reports results.
> > 3. Maintainer receives two fedmsg notifications, one for B1 and one
> > for B2, from FMN (email or irc).
> > 4. Depcheck i386 runs on U1, reports results.
> > 5. Maintainer receives two fedmsg notifications, one for B1
> > and one for B2, from FMN (email or irc).
> > 6. Depcheck armhfp runs on U1, reports results.
> > 7. Maintainer receives two fedmsg notifications, one for B1 and one
> > for B2, from FMN (email or irc).
> > 8. Upgradepath noarch runs on U1, reports results.
> > 9. Maintainer receives two fedmsg notifications, one for B1 and one
> > for B2, from FMN (email or irc).
> > 
> > As you can see, the maintainer receives "number of builds x number of
> > architectures (except for noarch checks) x number of checks" results.
> > And the notifications are distributed in time, not sent together at
> > once.
> > 
> > So, if we really want to do a good job in informing the maintainer
> > here, deduplication of future results is just one part of the story.
> > We also need to combine:
> > * individual build results, if they are part of a bigger object
> > (update)
> > * architecture results, for checks which are architecture dependent
> > * individual check results, if we run multiple checks
> > 
> > So that ideally:
> > 1. A maintainer submits update U1 consisting of builds B1 and B2.
> > 2. Depcheck x86_64 runs on U1, reports results.
> > 3. Depcheck i386 runs on U1, reports results.
> > 4. Depcheck armhfp runs on U1, reports results.
> > 5. Upgradepath noarch runs on U1, reports results.
> > 6. Maintainer receives a single fedmsg notification about U1, from
> > FMN (email or irc).
> > 
> > Unfortunately, this means we would have to implement a lot of
> > external logic (i.e. Bodhi's "what is an update" logic), which is
> > something we're trying to get away from (we have our unpleasant
> > experience with the bodhi comments feature, which deals with lots of this
> > stuff).
> 
> Yeah, this would be great but that's the exact thinking that gave us
> the current bodhi comment code that needs to die in a fire :)
> 
> IIRC, one of the things that we agreed on a while back was that we
> wouldn't try to do any of that by ourselves in the future.

And I completely agree.

> 
> > Taking all of this into account, it seems easier and more sensible to
> > me to target machines with taskotron fedmsgs. Let's see:
> > 
> > 1. A maintainer submits update U1 consisting of builds B1 and B2.
> > 2. Taskotron gradually executes all available checks on B1 and B2.
> > 3. Taskotron emits fedmsgs for every completed check, for every
> > architecture, for every build.
> > 4. Bodhi listens for Taskotron fedmsgs, marks internally (and
> > possibly in the web UI) which builds were tested with what result,
> > adds/updates links to logs.
> > 5. Once results for all builds x archs x checks have been received, or
> > once some timeout has occurred (e.g. "wait at most 8 hours for test
> > results"), Bodhi sends its own fedmsg.
> > 6. Maintainer listens for _Bodhi_ fedmsgs and receives a single
> > notification that U1 testing is complete.
> > 
> > Now, because of the fact that Bodhi is designed for publishing
> > updates, it can tailor the messaging behavior nicely. It can either
> > notify after all testing is complete, or it can notify immediately
> > after the first failure. It can have timeouts in case some tests get
> > stuck. I'm not sure if it can make some of these things configurable
> > for a particular maintainer; I think that is no longer possible
> > when using fedmsgs instead of emails. But it can publish under
> > different topics (e.g. first failure vs testing complete) and
> > maintainers can subscribe to what suits them. (And if they're feeling
> > particularly tough, they can of course also subscribe to the flood of
> > core taskotron fedmsgs).
> > 
> > Furthermore, Bodhi can put additional logic into this, splitting
> > checks into an essential and a non-essential group, i.e. depcheck +
> > upgradepath vs rpmlint + rpmgrill. The notifications can fire off
> > after the essential testing is complete, or maybe they can wait for
> > all testing but ignore potential failures in the non-essential group
> > (and set the overall outcome to something like INFO, if e.g. only
> > rpmlint failed).
> > 
> > With this approach, I like that the Bodhi logic is configured in
> > Bodhi, and we're not trying to emulate it, we just supply raw data.
> > People subscribe to Bodhi notifications. The same approach can be
> > used with Koji or any other service - we're supplying data, they're
> > deciding what to do with it, what is important and what is not, and
> > they're sending final result notifications (or even partial ones, if
> > they want and it makes sense).
> 
> Assuming that the bodhi devs are onboard with taking on most of the
> maintenance of all that, I like the idea of keeping most of that in
> bodhi or at least not in resultsdb. From the start, Josef and I
> (maybe more folks, don't recall) agreed that resultsdb should not care
> about whether a given item is a global pass/fail or overridden.
> Resultsdb should only care that item $x was run with result $r.
> Anything beyond that needs to be in a different system that is
> capable of using resultsdb as input.
> 
> The thing I don't understand is how deduplicating our emulated results
> from upgradepath or depcheck prevents any of that you've described. As
> long as we're emitting results on state change, an external system
> would be plenty capable of doing what you describe. Am I missing
> something?

No, you're not missing anything, you're right. I started out talking about 
deduplicating depcheck-like results, but because I thought that was just the 
first step in a general direction of tailoring our notification system towards 
human consumption (people subscribing to taskotron messages directly), I 
started comparing these two approaches and showing that it might work better if 
we sent our messages to middle-man systems with their own logic (Bodhi) and let 
people subscribe to those.

In other words, you started talking about deduplicating depcheck-like results, 
and I ended up talking about fully collapsing the results of all builds, 
architectures and checks run on an update into one notification per update.

So, sorry for the confusion. I went big :)

> 
> > But what about results which don't have a specific service, you ask?
> > What if new glibc is submitted and existing firefox is tested against
> > it using a firefox-regression-suite check, where do these results go?
> > Great question.
> > 
> > I think the raw Taskotron fedmsgs are the answer here. Hopefully most
> > of these checks will be one-shot executions (unlike continuous
> > execution like depcheck). So if maintainers subscribe to our
> > messages, they should receive one result per arch at worst,
> > i.e. 3 separate notifications for a single execution. Or, if they
> > have some really special kind of check, they'd process the
> > notifications on their own. Once we're there and checks like these
> > are more common, we can talk about providing services for further
> > deduplication. But still, even if we really need to do this in some
> > specific cases, I think the general approach should be the one
> > outlined above, where we don't notify people directly but send it
> > through middle-man services with their own logic and special needs.
> 
> Yeah, I think that for the near future after we have package-specific
> checks we'll be limited to taskotron fedmsgs. Once we have everything
> in place to support stuff like this, we can see if there are other
> reporting/status mechanisms that make sense.
> 
> > Now, after seeing the wall of text I've written, I wonder, have I
> > actually kept to the original topic, or strayed away into a
> > completely different area? :-)
> 
> I think most of it is broadly on the topic of how notifications and
> results are presented to users. Not sure that it's all required for
> fedmsg emission from resultsdb but I could be missing something :)


I talked to Josef yesterday about the resultsdb implementation and he said this 
could be pretty easy. After each result submission, resultsdb would search for 
the last record of $check with $item and $arch. If one is found, it would send 
the fedmsg only if the outcome has changed; if none is found, it would always 
send it. This means that the first result submission would actually be slower 
(nothing found = the whole db gets searched) than any "duplicate" submission 
later on (something found = only part of the db is searched before stopping).
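
Roughly something like this, just to show what I mean; the Result model and the 
fedmsg helper below are simplified stand-ins, not the actual resultsdb code:

    # Rough sketch of the lookup described above, against a simplified
    # model; this is not the real resultsdb schema or code.
    from sqlalchemy import Column, Integer, String, desc
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Result(Base):  # illustrative model only
        __tablename__ = "result"
        id = Column(Integer, primary_key=True)
        testcase = Column(String)
        item = Column(String)
        arch = Column(String)
        outcome = Column(String)

    def publish_fedmsg(result):
        """Stand-in for the real fedmsg publishing call."""
        print("fedmsg:", result.testcase, result.item, result.arch, result.outcome)

    def maybe_emit_fedmsg(session, new_result):
        """Send a fedmsg only when the outcome changed since the last
        record of the same check on the same item and arch."""
        last = (
            session.query(Result)
            .filter_by(testcase=new_result.testcase,
                       item=new_result.item,
                       arch=new_result.arch)
            .order_by(desc(Result.id))  # most recent matching record first
            .first()
        )
        if last is None or last.outcome != new_result.outcome:
            publish_fedmsg(new_result)
        # otherwise the outcome is unchanged -> duplicate, stay quiet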

Performance-wise, he says it should be pretty fast. But we don't have any 
sharding (i.e. keeping recent records separately from old records), so we can't 
time-limit this search. The performance might therefore degrade over time as the 
db fills up, unless we regularly clean old records out of the database.

I realize now that we could perhaps use the task formula for depcheck and 
upgradepath, and add something like "filter_duplicates: True" to the resultsdb 
directive. That would allow us to a) limit the performance hit to only those 
submissions that really need it, and b) recognize which repo checks need to be 
deduplicated and which don't. I think it's better to use the formula for this 
than to hardcode it into resultsdb, as suggested above. We still need to explain 
this in the documentation and it makes task writing more complex... but I don't 
see any simpler way to do this, if we really want to have it.
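
Purely as a sketch of what I mean (the "filter_duplicates" key, the 
${depcheck_output} variable, the "_filter_duplicates" field and the directive 
plumbing below are all made up, nothing like this exists yet):

    # Sketch only: neither the "filter_duplicates" formula key nor this
    # directive plumbing exists today.
    #
    # The depcheck/upgradepath formula could carry something like:
    #
    #     - name: report results to resultsdb
    #       resultsdb:
    #           results: ${depcheck_output}
    #           filter_duplicates: True
    #
    # and the resultsdb directive would just forward the flag with each
    # submission, so the server only pays for the "last record" lookup
    # when a check actually asks for deduplication.

    def submit_results(results, filter_duplicates=False):
        """Forward each result to resultsdb, flagging whether the server
        should run the 'emit fedmsg only on outcome change' lookup."""
        for result in results:
            payload = dict(result)
            payload["_filter_duplicates"] = filter_duplicates
            post_to_resultsdb(payload)

    def post_to_resultsdb(payload):
        """Stand-in for the actual HTTP submission to resultsdb."""
        print("submitting:", payload)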
