Yes, I used the "started_at" and "finished_at" timestamps. And there are definitely things we can do to speed up syncs, since sync performance has dropped by a sizable amount since the last time I did this testing. I'm not sure where that slowdown came from, but I'm sure we can figure it out.
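
For anyone who wants to reproduce the numbers, here is a minimal sketch of turning a task's "started_at" and "finished_at" values into a duration and a relative slowdown. It assumes the timestamps are ISO 8601 strings; the helper names and the sample values are hypothetical and are not taken from the linked results.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat() rejects a bare 'Z' suffix on older Pythons,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def task_duration_seconds(started_at: str, finished_at: str) -> float:
    """Wall-clock duration of a task, in seconds."""
    delta = parse_timestamp(finished_at) - parse_timestamp(started_at)
    return delta.total_seconds()


def percent_slower(baseline_seconds: float, candidate_seconds: float) -> float:
    """How much slower the candidate run is than the baseline, in percent."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    autoincrement = task_duration_seconds(
        "2019-02-27T01:00:00+00:00", "2019-02-27T01:01:40+00:00"
    )
    uuid_pk = task_duration_seconds(
        "2019-02-27T02:00:00+00:00", "2019-02-27T02:02:15+00:00"
    )
    print(f"autoincrement: {autoincrement:.0f}s, uuid: {uuid_pk:.0f}s")
    print(f"slowdown: {percent_slower(autoincrement, uuid_pk):.1f}%")
```

Measured this way, the figure covers the whole task from start to finish, not one phase of the sync.
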

On Wed, Feb 27, 2019 at 4:10 AM David Davis <[email protected]> wrote:

> Daniel,
>
> Thanks for the work on this. I'm wondering where you got the times from.
> The task timestamps? I'm asking because when you say 30-40% slow down, I am
> wondering if that's the overall time it takes to sync or if that's just
> part of the sync. I think it's the former which I do find a bit troubling.
> That said, I think I agree with your conclusion that we should probably
> switch to UUIDs anyway. Perhaps we can find other ways to speed up sync
> times.
>
> David
>
>
> On Wed, Feb 27, 2019 at 1:23 AM Daniel Alley <[email protected]> wrote:
>
>> Hello all,
>>
>> We've had an ongoing discussion about whether Pulp would be able to
>> perform acceptably if we switched back to UUID primary keys. I've finished
>> doing the performance testing and I *think* the answer is yes. Although to
>> be honest, I'm not sure that I understand why, in the case of MariaDB.
>>
>> I linked my testing methodology and results here:
>> https://pulp.plan.io/issues/4290#note-18
>>
>> To summarize, I tested the following:
>>
>> * How long it takes to perform subsequent large (lazy) syncs, with lots
>> of content in the database (100-400k content units)
>> * How long it takes to perform various small but important database
>> queries
>>
>> The results were weirdly in contrast in some cases.
>>
>> The first four syncs (202,000 content total) behaved mostly the same on
>> PostgreSQL whether it used an autoincrement or UUID primary key.
>> Subsequent syncs had a performance drop of between 30-40%. Likewise, the
>> code snippets performed 30+% worse. Sync time scaled linearly"ish" with
>> the amount of content in the repository in both cases, which was a bit
>> surprising to me. The size of the database at the end was 30-40% larger
>> with UUID primary keys, 736 MB vs 521 MB. The gap would be smaller in
>> typical usage when you consider that most content types have more metadata
>> than FileContent (what I was testing).
>>
>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form
>> https://www.diffchecker.com/40AF8vvM
>>
>> With MariaDB the first sync was almost 80% slower than the first sync w/
>> PostgreSQL, but every subsequent sync was as fast or faster, despite the
>> tests of specific queries performing multiple times worse. Additionally
>> the sync performance did not decrease as rapidly as it did under
>> PostgreSQL. With MariaDB, one of my test queries that worked fine when
>> backed by PostgreSQL ended up hanging endlessly and I had to cut it off
>> after 25 or so minutes. [0] I would consider that a blocker to claiming we
>> support MariaDB / MySQL.
>>
>> But overall I'm not sure how to interpret the fact that on one hand the
>> real-usage performance is equal or better, and on the other the
>> performance of some of the underlying queries is noticeably worse. Maybe
>> there's some weird caching going on in the backend, or the generated
>> indexes are different?
>>
>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form
>> https://www.diffchecker.com/W1nnIQgj
>>
>> I'd like to invite some discussion on this, but nothing I've mentioned
>> seems like it would be a problem for going forwards with using UUID primary
>> keys in a general sense. If we're all in agreement about that engineering
>> decision then we can move forwards with that work.
>>
>> [0] for *some* but not all repository versions. No idea what's up there.
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> [email protected]
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
