Agreed on performance. After some more Googling, opinions seem mixed on whether UUID performance is worse or not. If this is a significant reason to switch, I agree we should test the performance.
Regarding the disk size, I think the cost of using UUIDs is cumulative: larger PKs mean bigger indexes, bigger FKs, etc. I agree that it’s probably not a major concern, but I wouldn’t say it’s trivial.

David

On Thu, May 24, 2018 at 11:27 AM, Sean Myers <sean.my...@redhat.com> wrote:
> Responses inline.
>
> On 05/23/2018 02:26 PM, David Davis wrote:
> > Before the release of Pulp 3.0 GA, I think it’s worth just checking in to make sure we want to use UUIDs over integer-based IDs. Changing from UUIDs to ints would be a very easy change at this point (1-2 lines of code), but after GA ships, it would be hard if not impossible to switch.
> >
> > I think there are a number of reasons why we might want to consider integer IDs:
> >
> > - Better performance all around for inserts[0], searches, indexing, etc.
>
> I don't really care either way, but it's worth pointing out that UUIDs are integers (in the sense that the entire internet can be reduced to a single integer since it's all just bits). To the best of my knowledge they are as performant as integers and stored in similar ways in Postgres.
>
> You linked a MySQL experiment, done using a version of MySQL that is nearly 10 years old. If there are concerns about the performance of UUID PKs vs. int PKs in Pulp, we should compare apples to apples and profile Pulp using UUID PKs, profile Pulp using integer PKs, and then compare the two.
>
> In my small-scale testing (100,000 randomly generated content rows of a proto-RPM content model, 1000 repositories randomly related to each, no db funny business beyond enforced uniqueness constraints), there was either no difference, or what difference there was fell into the margin of error.
>
> > - Less storage required (4 bytes for int vs 16 bytes for UUIDs)
>
> Well, okay...UUIDs are *huge* integers. But it's the length of an IPv6 address vs. the length of an IPv4 address. While it's true that 4 < 16, both are still pretty small.
> Trivially so, I think.
>
> Without taking relations into account, a table with a million rows should be a little less than twelve mega(mebi)bytes larger. Even at scale, the size difference is negligible, especially when compared to the size on disk of the actual content you'd need to be storing that those million rows represent.
>
> > - Hrefs would be shorter (e.g. /pulp/api/v3/repositories/1/)
> > - In line with other apps like Katello
>
> I think these two are definitely worth considering, though.
>
> > There are some downsides to consider though:
> >
> > - Integer ids expose info like how many records there are
>
> This was the main intent, if I recall correctly. UUID PKs are not:
> - monotonically increasing
> - variably sized (string length, not bit length)
>
> So an object's PK doesn't give you any indication of how many other objects may be in the same collection, and while the Hrefs are long, for any given resource they will always be a predictable size.
>
> The major downside is really that they're a pain in the butt to type out when compared to int PKs, so if users are in a situation where they do have to type these things out, I think something has gone wrong.
>
> If users typing in PKs can't be avoided, UUIDs probably should be avoided. I recognize that this is effectively a restatement of "Hrefs would be shorter" in the context of how that impacts the user.
>
> > - Can’t support sharding or multiple dbs (are we ever going to need this?)
>
> A very good question. To the best of my recollection this was never stated as a hard requirement; it was only ever mentioned like it is here, as a potential positive side-effect of UUID keys. If collision-avoidance is not desired, and will certainly never be desired, then a normal integer field would likely be a less astonishing[0] user experience, and therefore a better user experience.
>
> [0]: https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
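To put rough numbers on the disk-size point in this thread, here's a back-of-the-envelope sketch in Python. It assumes a 4-byte integer key (PostgreSQL `integer`) vs. a 16-byte UUID, and `copies` is a hypothetical knob for how many places the key is stored (PK column, PK index, FK columns). It ignores row headers, alignment padding and index fill factor, so the real on-disk difference won't match these raw numbers exactly. The `href()` helper only mimics the path shape from the example above; it is not Pulp code.

```python
import uuid

# Raw key sizes: 4 bytes matches a PostgreSQL `integer` column; 16 bytes is
# the binary size of a UUID (PostgreSQL's `uuid` type stores the same 16 bytes).
INT_PK_BYTES = 4
UUID_PK_BYTES = len(uuid.uuid4().bytes)  # always 16

MIB = 1024 ** 2


def extra_bytes(rows: int, copies: int = 1) -> int:
    """Extra raw key bytes when a 16-byte UUID replaces a 4-byte int.

    `copies` (hypothetical, for this sketch only) counts every place the key
    is stored: the PK column, its index, and each FK column referencing it.
    Per-row overhead and alignment padding are deliberately ignored.
    """
    return rows * copies * (UUID_PK_BYTES - INT_PK_BYTES)


# Sean's case: one million rows, PK column only -> 12,000,000 bytes, about 11.44 MiB
print(extra_bytes(1_000_000) / MIB)

# The cumulative case: PK column + PK index + two hypothetical FK columns
# -> 48,000,000 bytes, about 45.78 MiB
print(extra_bytes(1_000_000, copies=4) / MIB)


def href(pk) -> str:
    # Hypothetical helper; path shape copied from /pulp/api/v3/repositories/1/
    return f"/pulp/api/v3/repositories/{pk}/"


# str(uuid4()) is always 36 characters, so UUID hrefs have a single fixed length...
print({len(href(uuid.uuid4())) for _ in range(1000)})
# ...while integer hrefs grow with the table and hint at how many rows exist
print(href(1), href(123456))
```

Even the cumulative figure stays in the tens of MiB per million rows, which is consistent with both views here: small next to the content itself, but not zero, and it grows with every FK and index that carries the key.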