Re: [Pulp-dev] Performance testing results, autoincrement ID vs UUID primary keys

Dana Walker Fri, 01 Mar 2019 11:52:00 -0800

As I brought up on IRC, I don't know how much this would complicate
maintenance going forward, but I would prefer that we use some sort of
setting to choose UUID or ID primary keys depending on whether the backend
is MariaDB or PostgreSQL.  I want us to work everywhere, but I'm really
concerned about the impact on our users of a 30-40% drop in speed and
storage efficiency.
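
To make the idea concrete, here is a minimal sketch of what such a setting
might look like.  It is only an illustration, not existing Pulp code: the
names PK_BY_ENGINE, choose_pk_type, and make_pk_factory are hypothetical, and
the pairing of UUID with MariaDB and integer ID with PostgreSQL is an
assumption about how the choice would be made.

```python
import itertools
import uuid

# Assumption: Django-style ENGINE strings; MariaDB goes through Django's
# MySQL backend. The pairing below is an illustrative guess, not an agreed
# design.
PK_BY_ENGINE = {
    "django.db.backends.postgresql": "autoincrement",
    "django.db.backends.mysql": "uuid",
}


def choose_pk_type(engine, default="uuid"):
    """Return the key style for a database engine, falling back to UUID."""
    return PK_BY_ENGINE.get(engine, default)


def make_pk_factory(pk_type):
    """Return a zero-argument callable that produces new primary keys."""
    if pk_type == "uuid":
        return uuid.uuid4
    if pk_type == "autoincrement":
        # Stand-in for a database sequence; a real backend assigns these.
        counter = itertools.count(1)
        return lambda: next(counter)
    raise ValueError(f"unknown primary key type: {pk_type!r}")
```

The lookup itself is the easy part.  The maintenance cost would more likely
come from everything that depends on the key type, such as migrations and
foreign keys that would differ per backend.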


David wrote up a quick proof of concept after I brought this up, but wasn't
necessarily advocating it himself.  I think Daniel and Dennis expressed
some concerns.  I'd like to see more people discussing it here, with
reasoning/examples on how doable something like this could be.

If it's not on the table, I understand, but I want to make sure we've
considered all reasonable options, and the answer might not be a simple
either/or.

Thanks,

--Dana

Dana Walker

Associate Software Engineer

Red Hat

<https://www.redhat.com>
<https://red.ht/sig>


On Fri, Mar 1, 2019 at 9:15 AM David Davis <davidda...@redhat.com> wrote:

> I just want to bump this thread. If we hope to make the Pulp 3 RC date, we
> need feedback today.
>
> David
>
>
> On Wed, Feb 27, 2019 at 5:09 PM Matt Pusateri <mpusa...@redhat.com> wrote:
>
>> Not sure if Monyog (https://www.webyog.com/) offers a free license for
>> open source projects, but it might help diagnose the MariaDB performance.
>> Monyog is really nice; I wish it supported Postgres.
>>
>> Matt P.
>>
>> On Tue, Feb 26, 2019 at 7:23 PM Daniel Alley <dal...@redhat.com> wrote:
>>
>>> Hello all,
>>>
>>> We've had an ongoing discussion about whether Pulp would be able to
>>> perform acceptably if we switched back to UUID primary keys.  I've finished
>>> doing the performance testing and I *think* the answer is yes.  Although to
>>> be honest, I'm not sure that I understand why, in the case of MariaDB.
>>>
>>> I linked my testing methodology and results here:
>>> https://pulp.plan.io/issues/4290#note-18
>>>
>>> To summarize, I tested the following:
>>>
>>> * How long it takes to perform subsequent large (lazy) syncs, with lots
>>> of content in the database (100-400k content units)
>>> * How long it takes to perform various small but important database
>>> queries
>>>
>>> The results were weirdly in contrast in some cases.
>>>
>>> The first four syncs (202,000 content units total) behaved mostly the
>>> same on PostgreSQL whether it used an autoincrement or UUID primary key.
>>> Subsequent syncs had a performance drop of 30-40%.  Likewise, the
>>> code snippets performed 30+% worse.  Sync time scaled roughly linearly
>>> with the amount of content in the repository in both cases, which was a
>>> bit surprising to me.  The database at the end was about 40% larger
>>> with UUID primary keys (736 MB vs 521 MB).  The gap would be smaller in
>>> typical usage when you consider that most content types have more metadata
>>> than FileContent (what I was testing).
>>>
>>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form
>>> https://www.diffchecker.com/40AF8vvM
>>>
>>> With MariaDB, the first sync was almost 80% slower than the first sync
>>> with PostgreSQL, but every subsequent sync was as fast or faster, despite
>>> the tests of specific queries performing several times worse.
>>> Additionally, the sync performance did not decrease as rapidly as it did
>>> under PostgreSQL.  With MariaDB, one of my test queries that worked fine
>>> when backed by PostgreSQL ended up hanging endlessly and I had to cut it
>>> off after 25 or so minutes. [0]  I would consider that a blocker to
>>> claiming we support MariaDB / MySQL.
>>>
>>> But overall I'm not sure how to interpret the fact that on one hand the
>>> real-usage performance is equal or better, and on the other the
>>> performance of some of the underlying queries is noticeably worse.  Maybe
>>> there's some weird caching going on in the backend, or the generated
>>> indexes are different?
>>>
>>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form
>>> https://www.diffchecker.com/W1nnIQgj
>>>
>>> I'd like to invite some discussion on this, but nothing I've mentioned
>>> seems like it would be a problem for going forward with UUID primary
>>> keys in a general sense.  If we're all in agreement about that engineering
>>> decision, then we can move forward with that work.
>>>
>>> [0] for *some* but not all repository versions.  No idea what's up there.
>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev@redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
