Re: [Pulp-dev] Performance testing results, autoincrement ID vs UUID primary keys

Daniel Alley Wed, 27 Feb 2019 05:48:58 -0800

Yes, I used the "started_at" and "finished_at" timestamps.  And there are
definitely things we can do to speed up sync times, since syncs have slowed
down by a sizable amount since the last time I did this testing. I'm not sure
where that slowdown could have come from, but I'm sure we can figure it out.
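
For anyone who wants to redo the arithmetic, here is a minimal sketch of how a
duration and a percent difference can be derived from a pair of task
timestamps. The function names and the sample timestamps are made up for
illustration and are not part of Pulp; only the 736 MB / 521 MB figures come
from this thread.

```python
from datetime import datetime


def task_duration_seconds(started_at: str, finished_at: str) -> float:
    """Elapsed seconds between two ISO 8601 timestamps (hypothetical helper)."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat(),
    # so normalize it to an explicit UTC offset first.
    start = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(finished_at.replace("Z", "+00:00"))
    return (end - start).total_seconds()


def percent_change(baseline: float, measured: float) -> float:
    """How much larger `measured` is than `baseline`, in percent."""
    return (measured - baseline) * 100.0 / baseline


if __name__ == "__main__":
    # Sample timestamps are invented; real values would come from the task's
    # "started_at" and "finished_at" fields.
    baseline = task_duration_seconds("2019-02-27T01:00:00Z", "2019-02-27T01:02:30Z")
    slower = task_duration_seconds("2019-02-27T02:00:00Z", "2019-02-27T02:03:30Z")
    print(baseline, slower, percent_change(baseline, slower))

    # Database sizes reported in this thread: 736 MB (UUID) vs 521 MB (autoincrement).
    print(round(percent_change(521, 736), 1))
```

Measuring from the task timestamps this way gives the whole sync time as seen
by the task, not the time spent in any one query.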


On Wed, Feb 27, 2019 at 4:10 AM David Davis wrote:

> Daniel,
>
> Thanks for the work on this. I'm wondering where you got the times from.
> The task timestamps? I'm asking because when you say 30-40% slowdown, I am
> wondering if that's the overall time it takes to sync or if that's just
> part of the sync. I think it's the former which I do find a bit troubling.
> That said, I think I agree with your conclusion that we should probably
> switch to UUIDs anyway. Perhaps we can find other ways to speed up sync
> times.
>
> David
>
>
> On Wed, Feb 27, 2019 at 1:23 AM Daniel Alley wrote:
>
>> Hello all,
>>
>> We've had an ongoing discussion about whether Pulp would be able to
>> perform acceptably if we switched back to UUID primary keys.  I've finished
>> doing the performance testing and I *think* the answer is yes.  Although to
>> be honest, I'm not sure that I understand why, in the case of MariaDB.
>>
>> I linked my testing methodology and results here:
>> https://pulp.plan.io/issues/4290#note-18
>>
>> To summarize, I tested the following:
>>
>> * How long it takes to perform subsequent large (lazy) syncs, with lots
>> of content in the database (100-400k content units)
>> * How long it takes to perform various small but important database
>> queries
>>
>> The results were oddly contradictory in some cases.
>>
>> The first four syncs (202,000 content units total) behaved mostly the same
>> on PostgreSQL whether it used an autoincrement or UUID primary key.
>> Subsequent syncs had a performance drop of 30-40% with UUIDs.  Likewise,
>> the code snippets performed 30+% worse.  Sync time scaled roughly linearly
>> with the amount of content in the repository in both cases, which was a
>> bit surprising to me.  The size of the database at the end was about 40%
>> larger with UUID primary keys, 736 MB vs 521 MB.  The gap would be smaller
>> in typical usage when you consider that most content types have more
>> metadata than FileContent (what I was testing).
>>
>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form
>> https://www.diffchecker.com/40AF8vvM
>>
>> With MariaDB the first sync was almost 80% slower than the first sync w/
>> PostgreSQL, but every subsequent sync was as fast or faster, despite the
>> tests of specific queries performing several times worse.  Additionally,
>> the sync performance did not decrease as rapidly as it did under
>> PostgreSQL.  With MariaDB, one of my test queries that worked fine when
>> backed by PostgreSQL ended up hanging and I had to cut it off after 25 or
>> so minutes. [0]  I would consider that a blocker to claiming we support
>> MariaDB / MySQL.
>>
>> But overall I'm not sure how to interpret the fact that on one hand the
>> real-usage performance is equal or better, and on the other hand the
>> performance of some of the underlying queries is noticeably worse.  Maybe
>> there's some weird caching going on in the backend, or the generated
>> indexes are different?
>>
>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form
>> https://www.diffchecker.com/W1nnIQgj
>>
>> I'd like to invite some discussion on this, but nothing I've mentioned
>> seems like it would be a problem for going forward with UUID primary keys
>> in a general sense.  If we're all in agreement about that engineering
>> decision then we can move forward with that work.
>>
>> [0] for *some* but not all repository versions.  No idea what's up there.
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> [email protected]
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>
