Mike Bonnet wrote:
On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
If the remote_repo_url data is going to be inherited (and I tend to
think it should be), then I think it should be in a separate table. I'd
like to reserve tag_config for data that is local to individual tags.
This will also make it easier to represent multiple remote repos.

I don't have any problem with this, though it does mean we'll need to
duplicate quite a bit of the inheritance-walking code, or make it
configurable as to which inheritance it's walking.  This new table would
also have to be versioned, the same way the tag_config table is.

Walking inheritance is just a matter of determining the inheritance order and scanning data on the parent tags in sequence. Currently, nothing scans tag_config in this way because no data in tag_config is inherited. (Well, in a sense tag_changed_since_event() does walk tag_config, but that's a little different.)

We need to figure out how we'll deal with multiplicity for the external repos. If tag A uses repo X and inherits from tag B which uses repo Y, then does tag A use both X and Y, or does the X entry override Y?
A (+repo X)
 +- B (+repo Y)

My inclination is that it should override, because I think we'll want some way to do overrides, and that mechanism seems easiest.
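The inheritance walk and the override rule can be sketched together. This is a minimal sketch that assumes two hypothetical in-memory maps, `parents` (each tag's ordered parent list) and `ext_repos` (repos configured directly on a tag). The plain depth-first order stands in for Koji's real inheritance ordering, which also involves priorities, depth limits and filters, so treat it only as an illustration of the override semantics.

```python
from typing import Dict, List

# Hypothetical in-memory data: each tag's ordered parents, and the external
# repos configured directly on each tag (not including inherited ones).
Parents = Dict[str, List[str]]
ExtRepos = Dict[str, List[str]]


def inheritance_order(tag: str, parents: Parents) -> List[str]:
    """Return the tag followed by its ancestors, depth-first, each once."""
    order: List[str] = []
    seen = set()

    def visit(t: str) -> None:
        if t in seen:
            return
        seen.add(t)
        order.append(t)
        for parent in parents.get(t, []):
            visit(parent)

    visit(tag)
    return order


def effective_ext_repos(tag: str, parents: Parents, ext_repos: ExtRepos) -> List[str]:
    """Override semantics: the first tag in inheritance order that has any
    external repos configured supplies the whole list; later ones are ignored."""
    for t in inheritance_order(tag, parents):
        repos = ext_repos.get(t)
        if repos:
            return list(repos)
    return []


parents = {"A": ["B"]}
ext_repos = {"A": ["X"], "B": ["Y"]}
print(effective_ext_repos("A", parents, ext_repos))  # ['X']
print(effective_ext_repos("B", parents, ext_repos))  # ['Y']
```

With the union semantics instead, `effective_ext_repos` would concatenate every tag's list rather than returning at the first hit.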

Also, I think we'll probably want to allow multiple external repos per tag, something which will be much easier to represent in an external table. We can include an explicit priority field to make a sane uniqueness condition (and to provide a clear ordering for the repo merge).
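A rough sketch of such a separate table, using SQLite only so it runs standalone. The table name `tag_external_repos` and its columns are hypothetical, and the versioning columns it would need (the same scheme tag_config uses) are left out to keep it short. The explicit priority provides both the uniqueness key and the merge order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the versioning columns tag_config would need are omitted.
conn.execute(
    """
    CREATE TABLE tag_external_repos (
        tag_id   INTEGER NOT NULL,
        repo_url TEXT    NOT NULL,
        priority INTEGER NOT NULL,
        UNIQUE (tag_id, priority),  -- the explicit priority is the uniqueness key
        UNIQUE (tag_id, repo_url)   -- and a repo appears at most once per tag
    )
    """
)
conn.executemany(
    "INSERT INTO tag_external_repos VALUES (?, ?, ?)",
    [(1, "http://example.com/repo-y", 20), (1, "http://example.com/repo-x", 10)],
)


def repos_for_tag(tag_id: int) -> list:
    """External repos for a tag, in merge order (lowest priority value first)."""
    cur = conn.execute(
        "SELECT repo_url FROM tag_external_repos WHERE tag_id = ? ORDER BY priority",
        (tag_id,),
    )
    return [row[0] for row in cur]


print(repos_for_tag(1))  # ['http://example.com/repo-x', 'http://example.com/repo-y']

try:
    # A second repo at the same priority on the same tag is rejected.
    conn.execute("INSERT INTO tag_external_repos VALUES (1, 'http://example.com/repo-z', 10)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```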

The big win here is that the methods and tools that query rpminfo for
information about what was present in the buildroot at build time
-snip-

I see all that, and I'm almost convinced. The flipside is that by default all the code will treat these external rpms the same as the local ones, which will not be correct for a number of cases. Obviously, part of this will involve changing code to behave differently for the external ones; I'm just worried about how much we might have to change, or what we might miss.

Yes, I realize that the "not null" constraint should exist now, and in
fact all rpms in the Fedora database do reference builds.  However, I
think logically having a remote rpm not reference a local build makes
sense.  The alternative is to create the build object from the srpm info
in the repodata (along with some namespacing similar to rpminfo).
However, this would significantly clutter the build table with
information that is pretty non-essential.

The idea of grouping them into builds appeals to me, but I don't think it's possible in general (though maybe we could fake it well enough somehow). The only data we're (mostly) guaranteed to have to work with is the sourcerpm header field. The catch is that in the case of an nvr collision we can't determine which build an rpm belongs to (or indeed whether we should create a new build with the same nvr).
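A rough sketch of "faking it" from the sourcerpm field alone: group binary rpms by their srpm nvr, and flag the collision case instead of guessing. The dict-shaped rpm records and the `local_nvrs` set are hypothetical stand-ins for parsed repodata and the build table; rpms whose header lacks a sourcerpm are set aside, since that field is only *mostly* guaranteed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

NVR = Tuple[str, str, str]


def parse_srpm(sourcerpm: str) -> NVR:
    """Split 'name-version-release.src.rpm' into (name, version, release)."""
    suffix = ".src.rpm"
    if not sourcerpm.endswith(suffix):
        raise ValueError(f"not a source rpm filename: {sourcerpm!r}")
    name, version, release = sourcerpm[: -len(suffix)].rsplit("-", 2)
    return name, version, release


def group_by_srpm(rpms: Iterable[dict], local_nvrs: Set[NVR]):
    """Group external rpm headers into pseudo-builds keyed by sourcerpm nvr.

    Returns (groups, ambiguous, ungrouped):
      groups    -- srpm nvr -> list of binary nvras built from it
      ambiguous -- srpm nvrs that collide with an existing local build, where
                   we cannot tell which build the rpms really belong to
      ungrouped -- nvras whose header had no sourcerpm field
    """
    groups: Dict[NVR, List[str]] = defaultdict(list)
    ungrouped: List[str] = []
    for rpm in rpms:
        srpm = rpm.get("sourcerpm")
        if not srpm:
            ungrouped.append(rpm["nvra"])
            continue
        groups[parse_srpm(srpm)].append(rpm["nvra"])
    ambiguous = {nvr for nvr in groups if nvr in local_nvrs}
    return dict(groups), ambiguous, ungrouped


rpms = [
    {"nvra": "foo-1.0-1.x86_64", "sourcerpm": "foo-1.0-1.src.rpm"},
    {"nvra": "foo-devel-1.0-1.x86_64", "sourcerpm": "foo-1.0-1.src.rpm"},
    {"nvra": "bar-2.1-3.noarch", "sourcerpm": "bar-2.1-3.src.rpm"},
    {"nvra": "baz-0.1-1.noarch"},
]
groups, ambiguous, ungrouped = group_by_srpm(rpms, local_nvrs={("bar", "2.1", "3")})
```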

I'm open to suggestions on how to modify the uniqueness constraint to
handle this case.  We care about ensuring that a locally-built rpm
doesn't have the same n-v-r as another locally-built rpm.  I don't think
we care at all about n-v-r uniqueness amongst remote rpms.  However, we
probably want to avoid creating 2 rpminfo entries when the same remote
rpm is used in 2 different buildroots.  Using the sigmd5 is a good way
to avoid that.

Agreed. same sigmd5 ==> same rpm.

However, what happens if a remote rpm with the same
n-v-r and sigmd5 gets pulled in from 2 different remote repos?

This gets into part of what bugs me about this and why I'm somewhat inclined to keep the ext repo data a step removed. It's so potentially dirty. Koji has all these consistency constraints that an external repo (much less many of them in aggregate) lacks.

It's quite possible that an external repo might respin a package keeping the same nvr, so we don't even need 2 external repos to hit this possibility.

Perhaps
the "origin" field should be pushed down to the buildroot_listing table,
so the buildroots can reference the same rpminfo object, but indicate
that it came from a different repo in each buildroot?

Interesting. Yeah, I think that is probably the right answer.

Also, I'm thinking we need to have some sort of rpm_origin table so that all these references can be managed cleanly.
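A rough SQLite sketch pulling these pieces together: rpminfo deduplicated by sigmd5, n-v-r uniqueness enforced only for local rpms (via a partial unique index, which needs SQLite 3.8.0 or later), a nullable build reference, and the origin pushed down onto buildroot_listing via an rpm_origin table. Only the table names come from the discussion; every column and the `rpm_id_for` helper are hypothetical simplifications, not Koji's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per place an rpm can come from (an external repo, a local tag, ...).
    CREATE TABLE rpm_origin (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );

    -- build_id is nullable: external rpms need not reference a local build.
    CREATE TABLE rpminfo (
        id       INTEGER PRIMARY KEY,
        nvra     TEXT NOT NULL,
        sigmd5   TEXT NOT NULL UNIQUE,  -- same sigmd5 ==> same rpm
        build_id INTEGER
    );

    -- n-v-r-a uniqueness is enforced only among locally built rpms.
    CREATE UNIQUE INDEX rpminfo_local_nvra ON rpminfo (nvra)
        WHERE build_id IS NOT NULL;

    -- The origin lives on the listing, so one rpminfo row can be recorded as
    -- coming from a different repo in each buildroot.
    CREATE TABLE buildroot_listing (
        buildroot_id INTEGER NOT NULL,
        rpm_id       INTEGER NOT NULL REFERENCES rpminfo (id),
        origin_id    INTEGER NOT NULL REFERENCES rpm_origin (id),
        PRIMARY KEY (buildroot_id, rpm_id)
    );

    INSERT INTO rpm_origin (id, name) VALUES (1, 'repo-x'), (2, 'repo-y');
    """
)


def rpm_id_for(nvra, sigmd5, build_id=None):
    """Return the rpminfo id for this sigmd5, inserting a row only if needed.

    A plain INSERT is used so a local n-v-r-a clash still raises IntegrityError."""
    row = conn.execute("SELECT id FROM rpminfo WHERE sigmd5 = ?", (sigmd5,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO rpminfo (nvra, sigmd5, build_id) VALUES (?, ?, ?)",
        (nvra, sigmd5, build_id),
    )
    return cur.lastrowid


# The same remote rpm pulled into two buildroots from two different repos.
rid_a = rpm_id_for("foo-1.0-1.x86_64", "abc123")
rid_b = rpm_id_for("foo-1.0-1.x86_64", "abc123")
conn.execute("INSERT INTO buildroot_listing VALUES (?, ?, ?)", (100, rid_a, 1))
conn.execute("INSERT INTO buildroot_listing VALUES (?, ?, ?)", (101, rid_b, 2))
print(rid_a == rid_b)  # True: one rpminfo row, two listings with different origins
```

Remote rpms that share an n-v-r-a but differ in sigmd5 simply get separate rows, while two local rpms with the same n-v-r-a are still rejected.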

Also, what happens when we find 2 remote rpms with the same n-v-r but
different sigmd5s?  Should that be an error?

Certainly we have to allow the possibility that two origins might have overlapping nvras. Within a single origin, I'm not so sure. I suppose we can get away with some small consistency demands. As long as we're only enforcing unique nvra for local builds and indexing by sigmd5/similar, I don't think we /have/ to make this an error condition.

In the same vein, what happens when an external repo has an nvra+sigmd5 matching a /local/ rpm? Maybe it doesn't matter, though I guess technically we want to record the origin properly when it gets into a buildroot via external repo vs internal tag.
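One way to encode this relaxed policy, including the local-match case, is sketched below. The `KnownRpm` record, the origin labels and the three outcome names are all hypothetical; the point is that only a same-origin respin is worth a warning, and nothing here is an error.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class KnownRpm:
    nvra: str
    sigmd5: str
    origin: str  # hypothetical origin label, e.g. an external repo or "local"


def classify(nvra: str, sigmd5: str, origin: str, known: Iterable[KnownRpm]) -> str:
    """Classify an incoming rpm against what has already been recorded.

    'same-rpm'  -- sigmd5 already known: reuse that entry (and record the new
                   origin on the buildroot listing, even if the match is local)
    'respin'    -- same nvra, different sigmd5, same origin: worth a warning,
                   but deliberately not treated as an error
    'new'       -- anything else, including nvra overlap across origins
    """
    known = list(known)
    if any(k.sigmd5 == sigmd5 for k in known):
        return "same-rpm"
    if any(k.nvra == nvra and k.origin == origin for k in known):
        return "respin"
    return "new"


known = [
    KnownRpm("foo-1.0-1.x86_64", "aaa", "repo-x"),
    KnownRpm("bar-2-1.noarch", "bbb", "local"),
]
print(classify("foo-1.0-1.x86_64", "ccc", "repo-y", known))  # new
print(classify("foo-1.0-1.x86_64", "ccc", "repo-x", known))  # respin
print(classify("bar-2-1.noarch", "bbb", "repo-y", known))    # same-rpm
```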

First, I'd like to be able to support external koji servers (or rather a
...
I agree that this is a desirable goal.  I believe this is more the
domain of the Koji secondary-arch daemon.  It would be talking directly

Well, it has some similarities to 2nd arch, but it's still quite different.

The more I think about it, the more I think that supporting an external koji server will probably be much different from the ext repo business. Most of the issues with rpminfo will carry over, but with a koji server we will be able to determine build data and can probably actually pull off something like "inherit from tag X on koji server Y."

The tag content may be managed by build, but when it's time for it to
actually get used (in the form of a yum repo) it gets unfolded into a
big list of rpms.  And what gets associated with a buildroot is simply a
big list of rpms.  Conceptually I don't really have a problem with the
idea of a tag as a big list of rpms, that we happen to group by srpm
within Koji because it's more convenient for us.  So adding the external
repo information to tag_config is just an extension of the big list of
rpms model.

Yeah, I almost wish I hadn't made the build structure quite the way I did.

However, we will already be parsing the remote repodata, which contains
information like the srpm name for each rpm, so we could do something
more sophisticated here.
-snipsnip-
...
The repomerge tool seems like it solves the problem better, and would be
more useful in general.

If we're going to have our fingers in the repodata, we'll probably want to have them in the merge too. Perhaps we can get createrepo and/or this repomerge tool usefully libified?
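To make "a clear ordering for the repo merge" concrete, here is one hypothetical merge policy keyed on package name. It is not a claim about what the repomerge tool or createrepo actually does. Repos are taken in priority order, and once a higher-priority repo provides a package name, same-named packages from lower-priority repos are dropped.

```python
from typing import Dict, List, Tuple

Pkg = Tuple[str, str]  # (package name, nvra)


def merge_repos(repos: List[List[Pkg]]) -> List[str]:
    """Merge repos listed in priority order (highest first).

    Once a package name is provided by some repo, same-named packages from
    lower-priority repos are dropped; all entries for that name within the
    winning repo (e.g. several arches) are kept."""
    owner: Dict[str, int] = {}  # package name -> index of the repo that won it
    merged: List[str] = []
    for idx, repo in enumerate(repos):
        for name, nvra in repo:
            if owner.setdefault(name, idx) == idx:
                merged.append(nvra)
    return merged


local_tag = [("foo", "foo-1.0-1.x86_64"), ("foo", "foo-1.0-1.i386")]
external = [("foo", "foo-2.0-1.x86_64"), ("bar", "bar-3-1.noarch")]
print(merge_repos([local_tag, external]))
# ['foo-1.0-1.x86_64', 'foo-1.0-1.i386', 'bar-3-1.noarch']
```

A union-style merge would just concatenate the lists and leave version selection to yum; the by-name rule is what gives the override behavior discussed earlier.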

--
Fedora-buildsys-list mailing list
Fedora-buildsys-list@redhat.com
https://www.redhat.com/mailman/listinfo/fedora-buildsys-list
