I'm +1 on grooming that ticket and nominating it for the sprint. I commented on a question there about how to handle RQ.
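In case it helps whoever picks up the ticket reproduce the comparison locally, here is a minimal sketch of the kind of insert benchmark discussed in the quoted thread. It is not Daniel's benchmark: it assumes SQLite through Python's standard sqlite3 module rather than Django on Postgres, and the table name `content` and the function `time_inserts` are made up for illustration. Timings from it should not be compared with the numbers quoted in this thread.

```python
import sqlite3
import time
import uuid


def time_inserts(use_uuid_pk: bool, rows: int = 50_000) -> float:
    """Bulk-insert `rows` rows into an in-memory table; return elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    # Build the data up front so key generation is not part of the timing.
    paths = [uuid.uuid4().hex for _ in range(rows)]
    if use_uuid_pk:
        # Random 16-byte UUID keys; WITHOUT ROWID makes the UUID the table's
        # actual B-tree key instead of a secondary index over a hidden rowid.
        conn.execute(
            "CREATE TABLE content (id BLOB PRIMARY KEY, relative_path TEXT) "
            "WITHOUT ROWID"
        )
        data = [(uuid.uuid4().bytes, path) for path in paths]
        sql = "INSERT INTO content (id, relative_path) VALUES (?, ?)"
    else:
        # Auto-assigned, monotonically increasing integer keys.
        conn.execute(
            "CREATE TABLE content (id INTEGER PRIMARY KEY, relative_path TEXT)"
        )
        data = [(path,) for path in paths]
        sql = "INSERT INTO content (relative_path) VALUES (?)"

    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany(sql, data)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]
    conn.close()
    assert count == rows
    return elapsed


if __name__ == "__main__":
    print(f"uuid pk: {time_inserts(True):.3f}s")
    print(f"int pk:  {time_inserts(False):.3f}s")
```

Running each variant several times and comparing medians would give a steadier picture than a single run.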
On Wed, Jul 11, 2018 at 4:53 PM, Dennis Kliban <dkli...@redhat.com> wrote:

> Thanks David. I am in favor of this change.
>
> On Wed, Jul 11, 2018 at 4:39 PM, David Davis <davidda...@redhat.com> wrote:
>
>> There is now:
>>
>> https://pulp.plan.io/issues/3848
>>
>> David
>>
>> On Wed, Jul 11, 2018 at 4:23 PM Brian Bouterse <bbout...@redhat.com> wrote:
>>
>>> A 30% improvement I think is a good case for integers over uuids.
>>>
>>> Is there a ticket tracking that change?
>>>
>>> On Wed, Jul 11, 2018 at 3:55 PM, Daniel Alley <dal...@redhat.com> wrote:
>>>
>>>> w/ creating 400,000 units, the non-uuid PK is 30% faster at 42.22
>>>> seconds vs. 55.98 seconds.
>>>>
>>>> w/ searching through the same 400,000 units, performance is still about
>>>> 30% faster. Doing a filter for file content units that have a
>>>> relative_path__startswith={some random letter} (I put UUIDs in all the
>>>> fields) takes about 0.44 seconds if the model has a UUID pk and about
>>>> 0.33 seconds if the model has a default Django auto-incrementing PK.
>>>>
>>>> On Wed, Jul 11, 2018 at 11:03 AM, Daniel Alley <dal...@redhat.com> wrote:
>>>>
>>>>> So, since I've already been working on some Pulp 3 benchmarking I
>>>>> decided to go ahead and benchmark this to get some actual data.
>>>>>
>>>>> Disclaimer: The following data is using bulk_create() with a
>>>>> modified, flat, non-inheriting content model, not the current
>>>>> multi-table inherited content model we're currently using. It's also
>>>>> using bulk_create() which we are not currently using in Pulp 3, but
>>>>> likely will end up using eventually.
>>>>>
>>>>> Using normal IDs instead of UUIDs was between 13% and 25% faster with
>>>>> 15,000 units. 15,000 units isn't really a sufficient value to actually
>>>>> test index performance, so I'm rerunning it with a few hundred thousand
>>>>> units, but that will take a substantial amount of time to run. I'll
>>>>> follow up later.
>>>>>
>>>>> As far as search/update performance goes, that probably has better
>>>>> margins than just insert performance, but I'll need to write new code
>>>>> to benchmark that properly.
>>>>>
>>>>> On Thu, May 24, 2018 at 11:52 AM, David Davis <davidda...@redhat.com> wrote:
>>>>>
>>>>>> Agreed on performance. Doing some more Googling seems to have mixed
>>>>>> opinions on whether UUID performance is worse or not. If this is a
>>>>>> significant reason to switch, I agree we should test out the
>>>>>> performance.
>>>>>>
>>>>>> Regarding the disk size, I think using UUIDs is cumulative. Larger
>>>>>> PKs mean bigger index sizes, bigger FKs, etc. I agree that it’s
>>>>>> probably not a major concern but I wouldn’t say it’s trivial.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Thu, May 24, 2018 at 11:27 AM, Sean Myers <sean.my...@redhat.com> wrote:
>>>>>>
>>>>>>> Responses inline.
>>>>>>>
>>>>>>> On 05/23/2018 02:26 PM, David Davis wrote:
>>>>>>> > Before the release of Pulp 3.0 GA, I think it’s worth just checking
>>>>>>> > in to make sure we want to use UUIDs over integer based IDs.
>>>>>>> > Changing from UUIDs to ints would be a very easy change at this
>>>>>>> > point (1-2 lines of code) but after GA ships, it would be hard if
>>>>>>> > not impossible to switch.
>>>>>>> >
>>>>>>> > I think there are a number of reasons why we might want to consider
>>>>>>> > integer IDs:
>>>>>>> >
>>>>>>> > - Better performance all around for inserts[0], searches, indexing,
>>>>>>> >   etc
>>>>>>>
>>>>>>> I don't really care either way, but it's worth pointing out that
>>>>>>> UUIDs are integers (in the sense that the entire internet can be
>>>>>>> reduced to a single integer since it's all just bits). To the best of
>>>>>>> my knowledge they are equally performant to integers and stored in
>>>>>>> similar ways in Postgres.
>>>>>>>
>>>>>>> You linked a MySQL experiment, done using a version of MySQL that is
>>>>>>> nearly 10 years old. If there are concerns about the performance of
>>>>>>> UUID PKs vs. int PKs in Pulp, we should compare apples to apples and
>>>>>>> profile Pulp using UUID PKs, profile Pulp using integer PKs, and then
>>>>>>> compare the two.
>>>>>>>
>>>>>>> In my small-scale testing (100,000 randomly generated content rows of
>>>>>>> a proto-RPM content model, 1000 repositories randomly related to
>>>>>>> each, no db funny business beyond enforced uniqueness constraints),
>>>>>>> there was either no difference, or what difference there was fell
>>>>>>> into the margin of error.
>>>>>>>
>>>>>>> > - Less storage required (4 bytes for int vs 16 bytes for UUIDs)
>>>>>>>
>>>>>>> Well, okay...UUIDs are *huge* integers. But it's the length of an
>>>>>>> IPv6 address vs. the length of an IPv4 address. While it's true that
>>>>>>> 4 < 16, both are still pretty small. Trivially so, I think.
>>>>>>>
>>>>>>> Without taking relations into account, a table with a million rows
>>>>>>> should be a little less than twelve mega(mebi)bytes larger. Even at
>>>>>>> scale, the size difference is negligible, especially when compared to
>>>>>>> the size on disk of the actual content you'd need to be storing that
>>>>>>> those million rows represent.
>>>>>>>
>>>>>>> > - Hrefs would be shorter (e.g. /pulp/api/v3/repositories/1/)
>>>>>>> > - In line with other apps like Katello
>>>>>>>
>>>>>>> I think these two are definitely worth considering, though.
>>>>>>>
>>>>>>> > There are some downsides to consider though:
>>>>>>> >
>>>>>>> > - Integer ids expose info like how many records there are
>>>>>>>
>>>>>>> This was the main intent, if I recall correctly. UUID PKs are not:
>>>>>>> - monotonically increasing
>>>>>>> - variably sized (string length, not bit length)
>>>>>>>
>>>>>>> So an object's PK doesn't give you any indication of how many other
>>>>>>> objects may be in the same collection, and while the Hrefs are long,
>>>>>>> for any given resource they will always be a predictable size.
>>>>>>>
>>>>>>> The major downside is really that they're a pain in the butt to type
>>>>>>> out when compared to int PKs, so if users are in a situation where
>>>>>>> they do have to type these things out, I think something has gone
>>>>>>> wrong.
>>>>>>>
>>>>>>> If users typing in PKs can't be avoided, UUIDs probably should be
>>>>>>> avoided. I recognize that this is effectively a restatement of "Hrefs
>>>>>>> would be shorter" in the context of how that impacts the user.
>>>>>>>
>>>>>>> > - Can’t support sharding or multiple dbs (are we ever going to need
>>>>>>> >   this?)
>>>>>>>
>>>>>>> A very good question. To the best of my recollection this was never
>>>>>>> stated as a hard requirement; it was only ever mentioned like it is
>>>>>>> here, as a potential positive side-effect of UUID keys. If
>>>>>>> collision-avoidance is not desired, and will certainly never be
>>>>>>> desired, then a normal integer field would likely be a less
>>>>>>> astonishing[0] user experience, and therefore a better user
>>>>>>> experience.
>>>>>>>
>>>>>>> [0]: https://en.wikipedia.org/wiki/Principle_of_least_astonishment
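
For reference, the "a little less than twelve mebibytes" figure in Sean's message follows from the 4-byte vs. 16-byte key sizes quoted above. A quick sketch of the arithmetic, counting only the primary key column and ignoring indexes, foreign keys, and per-row overhead (the constant and function names here are hypothetical):

```python
UUID_BYTES = 16  # size of a UUID key, as quoted in the thread
INT_BYTES = 4    # size of a 4-byte integer key, as quoted in the thread


def extra_mebibytes(rows: int, wide: int = UUID_BYTES, narrow: int = INT_BYTES) -> float:
    """Extra space, in MiB, taken by the wider key across `rows` rows."""
    return rows * (wide - narrow) / 2**20


if __name__ == "__main__":
    # 1,000,000 rows * 12 extra bytes = 12,000,000 bytes
    print(f"{extra_mebibytes(1_000_000):.2f} MiB")  # prints "11.44 MiB"
```

As David notes in the thread, the real difference is cumulative, since every index and foreign key that carries the PK grows by the same amount per entry.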
_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev