Re: [Pulp-dev] Concerns about bulk_create and PostgreSQL

Jeff Ortel Tue, 08 Jan 2019 08:16:24 -0800



On 1/3/19 1:28 PM, Simon Baatz wrote:

On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote:

    I don't think that using integer ids with bulk_create and supporting
    mysql/mariadb are necessarily mutually exclusive. I think there might
    be a way to find the records created using bulk_create if we know the
    natural key. It might be more performant than using UUIDs as well.

This assumes that there is a natural key.  For content types with no
digest information in the meta data, there may be a natural key
for content within a repo version only, but no natural key for the
overall content.  (If we want to support non-immediate modes for such
content.  In immediate mode, a digest can be computed from the
associated artifact(s)).


Can you give some examples of Content without a natural key?


Of course, there are ways around that (use a UUID as the "natural" key,
or add a UUID to the repo version key fields), but I would like to
avoid that.

    On Thu, Jan 3, 2019 at 11:04 AM Dennis Kliban <[1][email protected]>
    wrote:

    Thank you Daniel for the explanation and for filing an issue[0] to do
    performance analysis of UUIDs.
    I really hope that we can switch back to using UUIDs so we can bring
    back MariaDB support for Pulp 3.
    [0] [2]https://pulp.plan.io/issues/4290

    On Wed, Dec 5, 2018 at 1:35 PM Daniel Alley <[3][email protected]>
    wrote:

    To rephrase the problem a little bit:
    We need to bulk_create() a bunch of objects, and then after we do that
    we want to immediately be able to relate them with other objects, which
    means we need their PKs of the objects that were just created.
    In the case of auto-increment integer PKs, we can't know that PK value
    before it gets saved into the database.  Luckily, PostgreSQL (and
    Oracle) support a "RETURNING" keyword that does provides this
    information.  The raw SQL would look something like this:

INSERT INTO items (name) values ('bear') RETURNING id;

    Django uses this feature to set the PK field on the model objects it
    returns when you call bulk_create() on a list of unsaved model objects.
    Unfortunately, MySQL doesn't support this, so there's no way to figure
    out what the PKs of the objects you just saved were, so the ORM can't
    set that information on the returned model objects.
    UUID PKs circumvent this because the PK gets created outside of the
    database, prior to being saved in the database, and so Django *can*
    know what the PK will be when it gets saved.

    On Wed, Dec 5, 2018 at 12:11 PM Brian Bouterse <[4][email protected]>
    wrote:

    +1 to experimentation and also making sure that we understand the
    performance implications of the decision. I'm replying to this earlier
    note to restate my observations of the problem a bit more.
    More ideas and thoughts are welcome. This is a decision with a lot of
    aspects to consider.
    On Tue, Nov 20, 2018 at 10:00 AM Patrick Creech <[5][email protected]>
    wrote:

      On Mon, 2018-11-19 at 17:08 -0500, Brian Bouterse wrote:
      > When we switched from UUID to integers for the PK
      > with databases other than PostgreSQL [0].
      >
      > With a goal of database agnosticism for Pulp3, if plugin writers
      plan to use bulk_create with any object inherited
      > from one of ours, they can't will get different behaviors on
      different databases and they won't have PKs that they may
      > require. bulk_create is a normal django thing, so plugin writers
      making a django plugin should be able to use it. This
      > concerned me already, but today it was also brought up by non-RH
      plugin writers also [1] in a PR.
      >
      > The tradeoffs bteween UUIDs versus PKs are pretty well summed up
      in our ticket where we discussed that change [2].
      > Note, we did not consider this bulk_create downside at that time,
      which I think is the most significant downside to
      > consider.
      >
      > Having bulk_create effectively not available for plugin writers
      (since we can't rely on its pks being returned) I
      > think is a non-starter for me. I love how short the UUIDs made our
      URLs so that's the tradeoff mainly in my mind.
      > Those balanced against each other, I think we should switch back.
      >
      > Another option is to become PostgreSQL only which (though I love
      psql) I think would be the wrong choice for Pulp from
      > what I've heard from its users.
      >
      > What do you think? What should we do?
      So, my mind immediately goes to this question, which might be
      usefull for others to help make decisions, so I'll ask:
      When you say:
      "we lost the ability to have the primary key set during bulk_create"
      Can you clarify what you mean by this?
      My mind immediately goes to this chain of events:
              When you use bulk_create, the existing in-memory model
      objects representing the data to create do not get
      updated with the primary key values that are created in the
      database.
              Upon a subsequent query of the database, for the exact same
      set of objects just added, those objects _will_ have
      the primary key populated.
      In other words,
              The database records themselves get the auto-increment IDs
      added, they just don't get reported back in that
      query to the ORM layer, therefore it takes a subsequent query to get
      those ids out.
      Does that about sum it up?

    Yes this describes the situation, but there is a bit more to tell.
    Since PostgreSQL does return the ids the subsequent query that could be
    done to get the ids isn't written in code today. We didn't need to
    because we developed it against PostgreSQL. I'm pretty sure that if you
    configure Pulp against MySQL Pulp won't work, which I think is a
    problem. So I'm observing two things here. 1) This is a hazard that
    causes code to unexpectedly be only compliant with PostgreSQL. 2) Pulp
    itself fell into this hazard and we need to fix that too
    Do you also see these two issues? What should be done about these?

      >
      > [0]:
      [6]https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-
      create
      > [1]:
      [7]https://github.com/pulp/pulp/pull/3764#discussion_r234780702
      > [2]: [8]https://pulp.plan.io/issues/3848
      > _______________________________________________
      > Pulp-dev mailing list
      > [9][email protected]
      > [10]https://www.redhat.com/mailman/listinfo/pulp-dev
      _______________________________________________
      Pulp-dev mailing list
      [11][email protected]
      [12]https://www.redhat.com/mailman/listinfo/pulp-dev

      _______________________________________________
      Pulp-dev mailing list
      [13][email protected]
      [14]https://www.redhat.com/mailman/listinfo/pulp-dev

      _______________________________________________
      Pulp-dev mailing list
      [15][email protected]
      [16]https://www.redhat.com/mailman/listinfo/pulp-dev

      _______________________________________________
      Pulp-dev mailing list
      [17][email protected]
      [18]https://www.redhat.com/mailman/listinfo/pulp-dev

References

    1. mailto:[email protected]
    2. https://pulp.plan.io/issues/4290
    3. mailto:[email protected]
    4. mailto:[email protected]
    5. mailto:[email protected]
    6. https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create
    7. https://github.com/pulp/pulp/pull/3764#discussion_r234780702
    8. https://pulp.plan.io/issues/3848
    9. mailto:[email protected]
   10. https://www.redhat.com/mailman/listinfo/pulp-dev
   11. mailto:[email protected]
   12. https://www.redhat.com/mailman/listinfo/pulp-dev
   13. mailto:[email protected]
   14. https://www.redhat.com/mailman/listinfo/pulp-dev
   15. mailto:[email protected]
   16. https://www.redhat.com/mailman/listinfo/pulp-dev
   17. mailto:[email protected]
   18. https://www.redhat.com/mailman/listinfo/pulp-dev
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev


_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] Concerns about bulk_create and PostgreSQL

Reply via email to