Thank you Daniel for the explanation and for filing an issue[0] to do performance analysis of UUIDs.
I really hope that we can switch back to using UUIDs so we can bring back MariaDB support for Pulp 3. [0] https://pulp.plan.io/issues/4290 On Wed, Dec 5, 2018 at 1:35 PM Daniel Alley <dal...@redhat.com> wrote: > To rephrase the problem a little bit: > > We need to bulk_create() a bunch of objects, and then after we do that we > want to immediately be able to relate them with other objects, which means > we need their PKs of the objects that were just created. > > In the case of auto-increment integer PKs, we can't know that PK value > before it gets saved into the database. Luckily, PostgreSQL (and Oracle) > support a "RETURNING" keyword that does provides this information. The raw > SQL would look something like this: > >> INSERT INTO items (name) values ('bear') RETURNING id; >> >> Django uses this feature to set the PK field on the model objects it > returns when you call bulk_create() on a list of unsaved model objects. > > Unfortunately, MySQL doesn't support this, so there's no way to figure out > what the PKs of the objects you just saved were, so the ORM can't set that > information on the returned model objects. > > UUID PKs circumvent this because the PK gets created outside of the > database, prior to being saved in the database, and so Django *can* know > what the PK will be when it gets saved. > > On Wed, Dec 5, 2018 at 12:11 PM Brian Bouterse <bbout...@redhat.com> > wrote: > >> +1 to experimentation and also making sure that we understand the >> performance implications of the decision. I'm replying to this earlier note >> to restate my observations of the problem a bit more. >> >> More ideas and thoughts are welcome. This is a decision with a lot of >> aspects to consider. >> >> >> On Tue, Nov 20, 2018 at 10:00 AM Patrick Creech <pcre...@redhat.com> >> wrote: >> >>> On Mon, 2018-11-19 at 17:08 -0500, Brian Bouterse wrote: >>> > When we switched from UUID to integers for the PK >>> > with databases other than PostgreSQL [0]. >>> > >>> > With a goal of database agnosticism for Pulp3, if plugin writers plan >>> to use bulk_create with any object inherited >>> > from one of ours, they can't will get different behaviors on different >>> databases and they won't have PKs that they may >>> > require. bulk_create is a normal django thing, so plugin writers >>> making a django plugin should be able to use it. This >>> > concerned me already, but today it was also brought up by non-RH >>> plugin writers also [1] in a PR. >>> > >>> > The tradeoffs bteween UUIDs versus PKs are pretty well summed up in >>> our ticket where we discussed that change [2]. >>> > Note, we did not consider this bulk_create downside at that time, >>> which I think is the most significant downside to >>> > consider. >>> > >>> > Having bulk_create effectively not available for plugin writers (since >>> we can't rely on its pks being returned) I >>> > think is a non-starter for me. I love how short the UUIDs made our >>> URLs so that's the tradeoff mainly in my mind. >>> > Those balanced against each other, I think we should switch back. >>> > >>> > Another option is to become PostgreSQL only which (though I love psql) >>> I think would be the wrong choice for Pulp from >>> > what I've heard from its users. >>> > >>> > What do you think? What should we do? >>> >>> So, my mind immediately goes to this question, which might be usefull >>> for others to help make decisions, so I'll ask: >>> >>> When you say: >>> >>> "we lost the ability to have the primary key set during bulk_create" >>> >>> Can you clarify what you mean by this? >>> >>> My mind immediately goes to this chain of events: >>> >>> When you use bulk_create, the existing in-memory model objects >>> representing the data to create do not get >>> updated with the primary key values that are created in the database. >>> >>> Upon a subsequent query of the database, for the exact same set >>> of objects just added, those objects _will_ have >>> the primary key populated. >>> >>> In other words, >>> >>> The database records themselves get the auto-increment IDs >>> added, they just don't get reported back in that >>> query to the ORM layer, therefore it takes a subsequent query to get >>> those ids out. >>> >>> Does that about sum it up? >>> >> >> Yes this describes the situation, but there is a bit more to tell. Since >> PostgreSQL does return the ids the subsequent query that could be done to >> get the ids isn't written in code today. We didn't need to because we >> developed it against PostgreSQL. I'm pretty sure that if you configure Pulp >> against MySQL Pulp won't work, which I think is a problem. So I'm observing >> two things here. 1) This is a hazard that causes code to unexpectedly be >> only compliant with PostgreSQL. 2) Pulp itself fell into this hazard and we >> need to fix that too >> >> Do you also see these two issues? What should be done about these? >> >> >>> >>> > >>> > [0]: >>> https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create >>> > [1]: https://github.com/pulp/pulp/pull/3764#discussion_r234780702 >>> > [2]: https://pulp.plan.io/issues/3848 >>> > _______________________________________________ >>> > Pulp-dev mailing list >>> > Pulp-dev@redhat.com >>> > https://www.redhat.com/mailman/listinfo/pulp-dev >>> >>> _______________________________________________ >>> Pulp-dev mailing list >>> Pulp-dev@redhat.com >>> https://www.redhat.com/mailman/listinfo/pulp-dev >>> >> _______________________________________________ >> Pulp-dev mailing list >> Pulp-dev@redhat.com >> https://www.redhat.com/mailman/listinfo/pulp-dev >> > _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com > https://www.redhat.com/mailman/listinfo/pulp-dev >
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev