As I brought up on irc, I don't know how feasible the complications to maintenance would be going forward, but I would prefer if we could use some sort of settings in order to choose uuid or id based on MariaDB or PostgreSQL. I want us to work everywhere, but I'm really concerned at the impact to our users of a 30-40% efficiency drop in speed and storage.
David wrote up a quick Proof of Concept after I brought this up but wasn't necessarily advocating it himself. I think Daniel and Dennis expressed some concerns. I'd like to see more people discussing it here with reasoning/examples on how doable something like this could be? If it's not on the table, I understand, but want to make sure we've considered all reasonable options, and that might not be a simple binary of either/or. Thanks, --Dana Dana Walker Associate Software Engineer Red Hat <https://www.redhat.com> <https://red.ht/sig> On Fri, Mar 1, 2019 at 9:15 AM David Davis <davidda...@redhat.com> wrote: > I just want to bump this thread. If we hope to make the Pulp 3 RC date, we > need feedback today. > > David > > > On Wed, Feb 27, 2019 at 5:09 PM Matt Pusateri <mpusa...@redhat.com> wrote: > >> Not sure if https://www.webyog.com/ Monyog will give a free opensource >> project license. But that might help diagnose the MariaDB performance. >> Monyog is really nice, I wish it supported Postgres. >> >> Matt P. >> >> On Tue, Feb 26, 2019 at 7:23 PM Daniel Alley <dal...@redhat.com> wrote: >> >>> Hello all, >>> >>> We've had an ongoing discussion about whether Pulp would be able to >>> perform acceptably if we switched back to UUID primary keys. I've finished >>> doing the performance testing and I *think* the answer is yes. Although to >>> be honest, I'm not sure that I understand why, in the case of MariaDB. >>> >>> I linked my testing methodology and results here: >>> https://pulp.plan.io/issues/4290#note-18 >>> >>> To summarize, I tested the following: >>> >>> * How long it takes to perform subsequent large (lazy) syncs, with lots >>> of content in the database (100-400k content units) >>> * How long it takes to perform various small but important database >>> queries >>> >>> The results were weirdly in contrast in some cases. >>> >>> The first four syncs (202,000 content total) behaved mostly the same on >>> PostgreSQL whether it used an autoincrement or UUID primary key. >>> Subsequent syncs had a performance drop of between 30-40%. Likewise, the >>> code snippets performed 30+% worse. Sync time scaled linearly"ish" with >>> the amont of content in the repository in both cases, which was a bit >>> surprising to me. The size of the database at the end was 30-40% larger >>> with UUID primary keys, 736 MB vs 521 MB. The gap would be smaller in >>> typical usage when you consider that most content types have more metadata >>> than FileContent (what I was testing). >>> >>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form >>> https://www.diffchecker.com/40AF8vvM >>> >>> With MariaDB the first sync was almost 80% slower than the first sync w/ >>> PostgreSQL, but every subsequent sync was as fast or faster, despite the >>> tests of specific queries performing multiple times worse. Additionally >>> the sync performance did not decrease as rapidly as it did under >>> PostgreSQL. With MariaDB, one of my test queries that worked fine when >>> backed by PostgreSQL ended up hanging endlessly and I had to cut it off >>> after 25 or so minutes. [0] I would consider that a blocker to claiming we >>> support MariaDB / MySQL. >>> >>> But overall I'm not sure how to interpret the fact that on one hand the >>> real-usage performance is equal or better better, and on the performance of >>> some of the underlying queries is noticably worse. Maybe there's some >>> weird caching going on in the backend, or the generated indexes are >>> different? >>> >>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form >>> https://www.diffchecker.com/W1nnIQgj >>> >>> I'd like to invite some discussion on this, but nothing I've mentioned >>> seems like it would be a problem for going forwards with using UUID primary >>> keys in a general sense. If we're all in agreement about that engineering >>> decision then we can move forwards with that work. >>> >>> [0] for *some* but not all repository versions. No idea what's up there. >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Pulp-dev mailing list >>> Pulp-dev@redhat.com >>> https://www.redhat.com/mailman/listinfo/pulp-dev >>> >> _______________________________________________ >> Pulp-dev mailing list >> Pulp-dev@redhat.com >> https://www.redhat.com/mailman/listinfo/pulp-dev >> > _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com > https://www.redhat.com/mailman/listinfo/pulp-dev >
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev