I don't think that using integer IDs with bulk_create and supporting MySQL/MariaDB are necessarily mutually exclusive. I think there might be a way to find the records created by bulk_create if we know their natural key. It might be more performant than using UUIDs as well.
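
A minimal sketch of that lookup, not Pulp code: it assumes the `items` table from Daniel's SQL example below, and it assumes `name` is a unique natural key. The function name `bulk_insert_and_fetch_ids` and the chunk size are hypothetical, and an in-memory SQLite database stands in for MySQL/MariaDB so the example runs on its own. With the Django ORM the second step would presumably be a `filter(name__in=...)` query after `bulk_create()`.

```python
import sqlite3


def bulk_insert_and_fetch_ids(conn, names, chunk_size=500):
    """Bulk-insert rows, then recover their auto-increment ids by natural key.

    Assumes `name` is unique; otherwise the lookup could match older rows.
    """
    # Step 1: bulk insert without relying on RETURNING.
    conn.executemany("INSERT INTO items (name) VALUES (?)", [(n,) for n in names])

    # Step 2: one follow-up SELECT per chunk, keyed on the natural key.
    # Chunking keeps the number of bound parameters per statement small.
    ids = {}
    for start in range(0, len(names), chunk_size):
        batch = names[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in batch)
        rows = conn.execute(
            f"SELECT name, id FROM items WHERE name IN ({placeholders})", batch
        )
        ids.update(rows.fetchall())  # each row is a (name, id) pair
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE NOT NULL)"
)
ids = bulk_insert_and_fetch_ids(conn, ["bear", "wolf", "lynx"])
```

The cost is one extra round trip per chunk, which is what would need to be measured against UUID PKs.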
David

On Thu, Jan 3, 2019 at 11:04 AM Dennis Kliban <[email protected]> wrote:

> Thank you Daniel for the explanation and for filing an issue [0] to do
> performance analysis of UUIDs.
>
> I really hope that we can switch back to using UUIDs so we can bring back
> MariaDB support for Pulp 3.
>
> [0] https://pulp.plan.io/issues/4290
>
> On Wed, Dec 5, 2018 at 1:35 PM Daniel Alley <[email protected]> wrote:
>
>> To rephrase the problem a little bit:
>>
>> We need to bulk_create() a bunch of objects, and then immediately
>> relate them to other objects, which means we need the PKs of the
>> objects that were just created.
>>
>> In the case of auto-increment integer PKs, we can't know the PK value
>> before it gets saved into the database. Luckily, PostgreSQL (and Oracle)
>> support a "RETURNING" keyword that provides this information. The raw
>> SQL would look something like this:
>>
>>> INSERT INTO items (name) values ('bear') RETURNING id;
>>
>> Django uses this feature to set the PK field on the model objects it
>> returns when you call bulk_create() on a list of unsaved model objects.
>>
>> Unfortunately, MySQL doesn't support this, so there's no way to figure
>> out what the PKs of the objects you just saved were, and the ORM can't
>> set that information on the returned model objects.
>>
>> UUID PKs circumvent this because the PK gets created outside of the
>> database, prior to being saved in the database, and so Django *can* know
>> what the PK will be when it gets saved.
>>
>> On Wed, Dec 5, 2018 at 12:11 PM Brian Bouterse <[email protected]>
>> wrote:
>>
>>> +1 to experimentation and also making sure that we understand the
>>> performance implications of the decision. I'm replying to this earlier
>>> note to restate my observations of the problem a bit more.
>>>
>>> More ideas and thoughts are welcome. This is a decision with a lot of
>>> aspects to consider.
>>>
>>>
>>> On Tue, Nov 20, 2018 at 10:00 AM Patrick Creech <[email protected]>
>>> wrote:
>>>
>>>> On Mon, 2018-11-19 at 17:08 -0500, Brian Bouterse wrote:
>>>> > When we switched from UUID to integers for the PK, we lost the
>>>> > ability to have the primary key set during bulk_create with
>>>> > databases other than PostgreSQL [0].
>>>> >
>>>> > With a goal of database agnosticism for Pulp3, if plugin writers
>>>> > plan to use bulk_create with any object inherited from one of ours,
>>>> > they will get different behaviors on different databases and they
>>>> > won't have PKs that they may require. bulk_create is a normal Django
>>>> > thing, so plugin writers making a Django plugin should be able to
>>>> > use it. This concerned me already, but today it was also brought up
>>>> > by non-RH plugin writers [1] in a PR.
>>>> >
>>>> > The tradeoffs between UUIDs versus PKs are pretty well summed up in
>>>> > our ticket where we discussed that change [2]. Note, we did not
>>>> > consider this bulk_create downside at that time, which I think is
>>>> > the most significant downside to consider.
>>>> >
>>>> > Having bulk_create effectively not available for plugin writers
>>>> > (since we can't rely on its PKs being returned) I think is a
>>>> > non-starter for me. I love how short the UUIDs made our URLs so
>>>> > that's the tradeoff mainly in my mind. Those balanced against each
>>>> > other, I think we should switch back.
>>>> >
>>>> > Another option is to become PostgreSQL only which (though I love
>>>> > psql) I think would be the wrong choice for Pulp from what I've
>>>> > heard from its users.
>>>> >
>>>> > What do you think? What should we do?
>>>>
>>>> So, my mind immediately goes to this question, which might be useful
>>>> for others to help make decisions, so I'll ask:
>>>>
>>>> When you say:
>>>>
>>>> "we lost the ability to have the primary key set during bulk_create"
>>>>
>>>> Can you clarify what you mean by this?
>>>>
>>>> My mind immediately goes to this chain of events:
>>>>
>>>> When you use bulk_create, the existing in-memory model objects
>>>> representing the data to create do not get updated with the primary
>>>> key values that are created in the database.
>>>>
>>>> Upon a subsequent query of the database, for the exact same set of
>>>> objects just added, those objects _will_ have the primary key
>>>> populated.
>>>>
>>>> In other words,
>>>>
>>>> The database records themselves get the auto-increment IDs added,
>>>> they just don't get reported back in that query to the ORM layer,
>>>> therefore it takes a subsequent query to get those ids out.
>>>>
>>>> Does that about sum it up?
>>>>
>>>
>>> Yes this describes the situation, but there is a bit more to tell. Since
>>> PostgreSQL does return the ids the subsequent query that could be done to
>>> get the ids isn't written in code today. We didn't need to because we
>>> developed it against PostgreSQL. I'm pretty sure that if you configure
>>> Pulp against MySQL Pulp won't work, which I think is a problem. So I'm
>>> observing two things here. 1) This is a hazard that causes code to
>>> unexpectedly be only compliant with PostgreSQL. 2) Pulp itself fell into
>>> this hazard and we need to fix that too.
>>>
>>> Do you also see these two issues? What should be done about these?
>>>
>>>> >
>>>> > [0]: https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create
>>>> > [1]: https://github.com/pulp/pulp/pull/3764#discussion_r234780702
>>>> > [2]: https://pulp.plan.io/issues/3848
>>>> > _______________________________________________
>>>> > Pulp-dev mailing list
>>>> > [email protected]
>>>> > https://www.redhat.com/mailman/listinfo/pulp-dev
