I think adding a separate UUID column seems like a feasible workaround for Content without a natural key. But I can understand the desire not to do so in order to create an artificial natural key.
I guess we can weigh the performance results of using UUID PKs against how often we expect plugin Content to have no natural key. David On Thu, Jan 3, 2019 at 2:28 PM Simon Baatz <gmbno...@gmail.com> wrote: > On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote: > > I don't think that using integer ids with bulk_create and supporting > > mysql/mariadb are necessarily mutually exclusive. I think there might > > be a way to find the records created using bulk_create if we know the > > natural key. It might be more performant than using UUIDs as well. > > This assumes that there is a natural key. For content types with no > digest information in the meta data, there may be a natural key > for content within a repo version only, but no natural key for the > overall content. (If we want to support non-immediate modes for such > content. In immediate mode, a digest can be computed from the > associated artifact(s)). > > Of course, there are ways around that (use a UUID as the "natural" key, > or add a UUID to the repo version key fields), but I would like to > avoid that. > > > On Thu, Jan 3, 2019 at 11:04 AM Dennis Kliban <[1]dkli...@redhat.com> > > wrote: > > > > Thank you Daniel for the explanation and for filing an issue[0] to do > > performance analysis of UUIDs. > > I really hope that we can switch back to using UUIDs so we can bring > > back MariaDB support for Pulp 3. > > [0] [2]https://pulp.plan.io/issues/4290 > > > > On Wed, Dec 5, 2018 at 1:35 PM Daniel Alley <[3]dal...@redhat.com> > > wrote: > > > > To rephrase the problem a little bit: > > We need to bulk_create() a bunch of objects, and then after we do that > > we want to immediately be able to relate them with other objects, > which > > means we need their PKs of the objects that were just created. > > In the case of auto-increment integer PKs, we can't know that PK value > > before it gets saved into the database. Luckily, PostgreSQL (and > > Oracle) support a "RETURNING" keyword that does provides this > > information. The raw SQL would look something like this: > > > > INSERT INTO items (name) values ('bear') RETURNING id; > > > > Django uses this feature to set the PK field on the model objects it > > returns when you call bulk_create() on a list of unsaved model > objects. > > Unfortunately, MySQL doesn't support this, so there's no way to figure > > out what the PKs of the objects you just saved were, so the ORM can't > > set that information on the returned model objects. > > UUID PKs circumvent this because the PK gets created outside of the > > database, prior to being saved in the database, and so Django *can* > > know what the PK will be when it gets saved. > > > > On Wed, Dec 5, 2018 at 12:11 PM Brian Bouterse <[4] > bbout...@redhat.com> > > wrote: > > > > +1 to experimentation and also making sure that we understand the > > performance implications of the decision. I'm replying to this earlier > > note to restate my observations of the problem a bit more. > > More ideas and thoughts are welcome. This is a decision with a lot of > > aspects to consider. > > On Tue, Nov 20, 2018 at 10:00 AM Patrick Creech <[5] > pcre...@redhat.com> > > wrote: > > > > On Mon, 2018-11-19 at 17:08 -0500, Brian Bouterse wrote: > > > When we switched from UUID to integers for the PK > > > with databases other than PostgreSQL [0]. > > > > > > With a goal of database agnosticism for Pulp3, if plugin writers > > plan to use bulk_create with any object inherited > > > from one of ours, they can't will get different behaviors on > > different databases and they won't have PKs that they may > > > require. bulk_create is a normal django thing, so plugin writers > > making a django plugin should be able to use it. This > > > concerned me already, but today it was also brought up by non-RH > > plugin writers also [1] in a PR. > > > > > > The tradeoffs bteween UUIDs versus PKs are pretty well summed up > > in our ticket where we discussed that change [2]. > > > Note, we did not consider this bulk_create downside at that time, > > which I think is the most significant downside to > > > consider. > > > > > > Having bulk_create effectively not available for plugin writers > > (since we can't rely on its pks being returned) I > > > think is a non-starter for me. I love how short the UUIDs made our > > URLs so that's the tradeoff mainly in my mind. > > > Those balanced against each other, I think we should switch back. > > > > > > Another option is to become PostgreSQL only which (though I love > > psql) I think would be the wrong choice for Pulp from > > > what I've heard from its users. > > > > > > What do you think? What should we do? > > So, my mind immediately goes to this question, which might be > > usefull for others to help make decisions, so I'll ask: > > When you say: > > "we lost the ability to have the primary key set during bulk_create" > > Can you clarify what you mean by this? > > My mind immediately goes to this chain of events: > > When you use bulk_create, the existing in-memory model > > objects representing the data to create do not get > > updated with the primary key values that are created in the > > database. > > Upon a subsequent query of the database, for the exact same > > set of objects just added, those objects _will_ have > > the primary key populated. > > In other words, > > The database records themselves get the auto-increment IDs > > added, they just don't get reported back in that > > query to the ORM layer, therefore it takes a subsequent query to get > > those ids out. > > Does that about sum it up? > > > > Yes this describes the situation, but there is a bit more to tell. > > Since PostgreSQL does return the ids the subsequent query that could > be > > done to get the ids isn't written in code today. We didn't need to > > because we developed it against PostgreSQL. I'm pretty sure that if > you > > configure Pulp against MySQL Pulp won't work, which I think is a > > problem. So I'm observing two things here. 1) This is a hazard that > > causes code to unexpectedly be only compliant with PostgreSQL. 2) Pulp > > itself fell into this hazard and we need to fix that too > > Do you also see these two issues? What should be done about these? > > > > > > > > [0]: > > [6] > https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk- > > create > > > [1]: > > [7]https://github.com/pulp/pulp/pull/3764#discussion_r234780702 > > > [2]: [8]https://pulp.plan.io/issues/3848 > > > _______________________________________________ > > > Pulp-dev mailing list > > > [9]Pulp-dev@redhat.com > > > [10]https://www.redhat.com/mailman/listinfo/pulp-dev > > _______________________________________________ > > Pulp-dev mailing list > > [11]Pulp-dev@redhat.com > > [12]https://www.redhat.com/mailman/listinfo/pulp-dev > > > > _______________________________________________ > > Pulp-dev mailing list > > [13]Pulp-dev@redhat.com > > [14]https://www.redhat.com/mailman/listinfo/pulp-dev > > > > _______________________________________________ > > Pulp-dev mailing list > > [15]Pulp-dev@redhat.com > > [16]https://www.redhat.com/mailman/listinfo/pulp-dev > > > > _______________________________________________ > > Pulp-dev mailing list > > [17]Pulp-dev@redhat.com > > [18]https://www.redhat.com/mailman/listinfo/pulp-dev > > > > References > > > > 1. mailto:dkli...@redhat.com > > 2. https://pulp.plan.io/issues/4290 > > 3. mailto:dal...@redhat.com > > 4. mailto:bbout...@redhat.com > > 5. mailto:pcre...@redhat.com > > 6. > https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create > > 7. https://github.com/pulp/pulp/pull/3764#discussion_r234780702 > > 8. https://pulp.plan.io/issues/3848 > > 9. mailto:Pulp-dev@redhat.com > > 10. https://www.redhat.com/mailman/listinfo/pulp-dev > > 11. mailto:Pulp-dev@redhat.com > > 12. https://www.redhat.com/mailman/listinfo/pulp-dev > > 13. mailto:Pulp-dev@redhat.com > > 14. https://www.redhat.com/mailman/listinfo/pulp-dev > > 15. mailto:Pulp-dev@redhat.com > > 16. https://www.redhat.com/mailman/listinfo/pulp-dev > > 17. mailto:Pulp-dev@redhat.com > > 18. https://www.redhat.com/mailman/listinfo/pulp-dev > > > _______________________________________________ > > Pulp-dev mailing list > > Pulp-dev@redhat.com > > https://www.redhat.com/mailman/listinfo/pulp-dev > > > -- > Simon Baatz <gmbno...@gmail.com> >
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev