I think a 30% improvement makes a good case for integers over UUIDs. Is there a ticket tracking that change?
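For reference, "30% faster" can be read two ways, and the timings quoted below fall on either side of it. Here is a minimal sketch of the arithmetic, using only the numbers Daniel reported; `time_saved` and `throughput_gain` are hypothetical helper names for this illustration, not anything from Pulp.

```python
def time_saved(slow: float, fast: float) -> float:
    """Fraction of the slower run's wall time that the faster run avoids."""
    return (slow - fast) / slow


def throughput_gain(slow: float, fast: float) -> float:
    """Fractional increase in work done per second for the faster run."""
    return slow / fast - 1.0


# Timings in seconds as reported in the quoted message: (UUID PK, integer PK).
timings = {
    "create 400,000 units": (55.98, 42.22),
    "startswith filter": (0.44, 0.33),
}

for label, (uuid_s, int_s) in timings.items():
    print(
        f"{label}: {time_saved(uuid_s, int_s):.1%} less time, "
        f"{throughput_gain(uuid_s, int_s):.1%} more throughput"
    )
```

By either reading the integer PK comes out ahead: roughly a quarter less wall time, or roughly a third more throughput, for both the insert and the search case.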
On Wed, Jul 11, 2018 at 3:55 PM, Daniel Alley <dal...@redhat.com> wrote:

> w/ creating 400,000 units, the non-uuid PK is 30% faster at 42.22 seconds
> vs. 55.98 seconds.
>
> w/ searching through the same 400,000 units, performance is still about
> 30% faster. Doing a filter for file content units that have a
> relative_path__startswith={some random letter} (I put UUIDs in all the
> fields) takes about 0.44 seconds if the model has a UUID pk and about 0.33
> seconds if the model has a default Django auto-incrementing PK.
>
> On Wed, Jul 11, 2018 at 11:03 AM, Daniel Alley <dal...@redhat.com> wrote:
>
>> So, since I've already been working on some Pulp 3 benchmarking I decided
>> to go ahead and benchmark this to get some actual data.
>>
>> Disclaimer: The following data is using bulk_create() with a modified,
>> flat, non-inheriting content model, not the current multi-table inherited
>> content model we're currently using. It's also using bulk_create() which
>> we are not currently using in Pulp 3, but likely will end up using
>> eventually.
>>
>> Using normal IDs instead of UUIDs was between 13% and 25% faster with
>> 15,000 units. 15,000 units isn't really a sufficient value to actually
>> test index performance, so I'm rerunning it with a few hundred thousand
>> units, but that will take a substantial amount of time to run. I'll follow
>> up later.
>>
>> As far as search/update performance goes, that probably has better
>> margins than just insert performance, but I'll need to write new code to
>> benchmark that properly.
>>
>> On Thu, May 24, 2018 at 11:52 AM, David Davis <davidda...@redhat.com>
>> wrote:
>>
>>> Agreed on performance. Doing some more Googling seems to have mixed
>>> opinions on whether UUID performance is worse or not. If this is a
>>> significant reason to switch, I agree we should test out the performance.
>>>
>>> Regarding the disk size, I think using UUIDs is cumulative. Larger PKs
>>> mean bigger index sizes, bigger FKs, etc. I agree that it’s probably not a
>>> major concern but I wouldn’t say it’s trivial.
>>>
>>> David
>>>
>>> On Thu, May 24, 2018 at 11:27 AM, Sean Myers <sean.my...@redhat.com>
>>> wrote:
>>>
>>>> Responses inline.
>>>>
>>>> On 05/23/2018 02:26 PM, David Davis wrote:
>>>> > Before the release of Pulp 3.0 GA, I think it’s worth just checking in
>>>> > to make sure we want to use UUIDs over integer based IDs. Changing from
>>>> > UUIDs to ints would be a very easy change at this point (1-2 lines of
>>>> > code) but after GA ships, it would be hard if not impossible to switch.
>>>> >
>>>> > I think there are a number of reasons why we might want to consider
>>>> > integer IDs:
>>>> >
>>>> > - Better performance all around for inserts[0], searches, indexing, etc
>>>>
>>>> I don't really care either way, but it's worth pointing out that UUIDs are
>>>> integers (in the sense that the entire internet can be reduced to a single
>>>> integer since it's all just bits). To the best of my knowledge they are
>>>> equally performant to integers and stored in similar ways in Postgres.
>>>>
>>>> You linked a MySQL experiment, done using a version of MySQL that is
>>>> nearly 10 years old. If there are concerns about the performance of UUID
>>>> PKs vs. int PKs in Pulp, we should compare apples to apples and profile
>>>> Pulp using UUID PKs, profile Pulp using integer PKs, and then compare the
>>>> two.
>>>>
>>>> In my small-scale testing (100,000 randomly generated content rows of a
>>>> proto-RPM content model, 1000 repositories randomly related to each, no db
>>>> funny business beyond enforced uniqueness constraints), there was either
>>>> no difference, or what difference there was fell into the margin of error.
>>>>
>>>> > - Less storage required (4 bytes for int vs 16 bytes for UUIDs)
>>>>
>>>> Well, okay...UUIDs are *huge* integers. But it's the length of an IPv6
>>>> address vs. the length of an IPv4 address. While it's true that 4 < 16,
>>>> both are still pretty small. Trivially so, I think.
>>>>
>>>> Without taking relations into account, a table with a million rows should
>>>> be a little less than twelve mega(mebi)bytes larger. Even at scale, the
>>>> size difference is negligible, especially when compared to the size on
>>>> disk of the actual content you'd need to be storing that those million
>>>> rows represent.
>>>>
>>>> > - Hrefs would be shorter (e.g. /pulp/api/v3/repositories/1/)
>>>> > - In line with other apps like Katello
>>>>
>>>> I think these two are definitely worth considering, though.
>>>>
>>>> > There are some downsides to consider though:
>>>> >
>>>> > - Integer ids expose info like how many records there are
>>>>
>>>> This was the main intent, if I recall correctly. UUID PKs are not:
>>>> - monotonically increasing
>>>> - variably sized (string length, not bit length)
>>>>
>>>> So an object's PK doesn't give you any indication of how many other
>>>> objects may be in the same collection, and while the Hrefs are long, for
>>>> any given resource they will always be a predictable size.
>>>>
>>>> The major downside is really that they're a pain in the butt to type out
>>>> when compared to int PKs, so if users are in a situation where they do
>>>> have to type these things out, I think something has gone wrong.
>>>>
>>>> If users typing in PKs can't be avoided, UUIDs probably should be avoided.
>>>> I recognize that this is effectively a restatement of "Hrefs would be
>>>> shorter" in the context of how that impacts the user.
>>>>
>>>> > - Can’t support sharding or multiple dbs (are we ever going to need
>>>> > this?)
>>>>
>>>> A very good question. To the best of my recollection this was never stated
>>>> as a hard requirement; it was only ever mentioned like it is here, as a
>>>> potential positive side-effect of UUID keys. If collision-avoidance is not
>>>> desired, and will certainly never be desired, then a normal integer field
>>>> would likely be a less astonishing[0] user experience, and therefore a
>>>> better user experience.
>>>>
>>>> [0]: https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev@redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
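For reference, the "a little less than twelve mega(mebi)bytes" estimate in Sean's reply can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the key sizes are the ones quoted in the thread, while `extra_bytes` and its `copies` parameter are hypothetical names introduced here to illustrate David's point that every index or FK holding the key repeats the cost. Row, page, and index overhead are ignored.

```python
UUID_BYTES = 16  # size of a UUID key, as stated in the thread
INT_BYTES = 4    # size of a 4-byte integer key, as stated in the thread
MIB = 1024 * 1024


def extra_bytes(rows: int, copies: int = 1) -> int:
    """Extra raw key bytes when every stored copy of the key is a UUID.

    `copies` is a hypothetical knob: 1 counts only the PK column itself;
    each index or foreign-key column that also stores the key adds roughly
    one more copy. Storage-engine overhead is not modelled.
    """
    return rows * copies * (UUID_BYTES - INT_BYTES)


rows = 1_000_000
for copies in (1, 2, 4):
    size = extra_bytes(rows, copies)
    print(f"key stored {copies}x per row: {size / MIB:.2f} MiB extra")
```

With a single copy of the key this gives 12,000,000 bytes for a million rows, which is just under 12 MiB and matches Sean's figure; the total grows linearly with each additional place the key is stored.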