The changes to switch to UUIDs have been merged. I opened issues against all the Pulp 3 plugins I could think of to update their docs. There may be some other changes needed too though.
David On Wed, Mar 6, 2019 at 9:18 AM David Davis <[email protected]> wrote: > Since there seems to be no objections to switching to UUIDs, I’d like to > propose we merge the PRs[0][1] that will switch core to use UUID PKs > tomorrow (in 24 hours). After that, we'll open redmine issues to update > plugins to use UUIDs. > > [0] https://github.com/pulp/pulpcore/pull/16 > [1] https://github.com/pulp/pulpcore-plugin/pull/69 > > David > > > On Tue, Mar 5, 2019 at 5:15 PM Jeff Ortel <[email protected]> wrote: > >> +1 to switching back to UUIDs for the reasons Brian gave. >> >> On 3/1/19 2:23 PM, Brian Bouterse wrote: >> >> I've finally gotten to read through the numbers and this thread. It is a >> tradeoff but I am +1 for switching to UUIDs. I focus on the PostgreSQL UUID >> vs int case because that is our default database. I don't think too much >> about how things perform on MariaDB because they can improve their own >> performance to catch up to PostgreSQL which regularly is performing better >> afaict. I agree with the assessment of 30% ish slowdown in the large unit >> cases for PostgreSQL. Still, I believe the advantages of switching to UUIDs >> are worth it. Two main reasons stick out in my mind. >> >> 1. Our core code and all plugin code will always be compatible with >> common db backends even when using bulk_create() >> 2. We get database sharding with postgresql which you can only do with >> UUID pks. I was advised this years ago by jcline. >> >> Performance and compatibility are a pretty classic trade-off. Overall >> I've found that initial releases launch with less performance and improve >> (often significantly) overtime. Consider the interpreter pypy (not pypi). >> It started "roughly 2000x slower [at initial launch] than CPython, to >> roughly 7x faster [now]" [0]. Launching Pulp 3.0 that is 30% slower in the >> worst-case but runs everywhere with zero "db-behavior surprises" I think is >> worth it. Also conversely, if we don't adopt UUIDs, how will we address >> item 1 pre RC? >> >> @dawalker for the "can we have both" option, we probably can have some >> db-specific codepaths, but I don't think doing an application wide PK type >> change as a setting is feasible to support. The db specific codepaths are >> one way performance improves over time. For the initial release, to keep >> things simple I hope we don't have conditional database codepaths (for now). >> >> More discussion on this change is encouraged. Thanks @dalley so much for >> all the detailed investigation! >> >> [0]: >> https://morepypy.blogspot.com/2018/09/the-first-15-years-of-pypy.html >> >> Thank you, >> Brian >> >> On Fri, Mar 1, 2019 at 2:51 PM Dana Walker <[email protected]> wrote: >> >>> As I brought up on irc, I don't know how feasible the complications to >>> maintenance would be going forward, but I would prefer if we could use some >>> sort of settings in order to choose uuid or id based on MariaDB or >>> PostgreSQL. I want us to work everywhere, but I'm really concerned at the >>> impact to our users of a 30-40% efficiency drop in speed and storage. >>> >>> David wrote up a quick Proof of Concept after I brought this up but >>> wasn't necessarily advocating it himself. I think Daniel and Dennis >>> expressed some concerns. I'd like to see more people discussing it here >>> with reasoning/examples on how doable something like this could be? >>> >>> If it's not on the table, I understand, but want to make sure we've >>> considered all reasonable options, and that might not be a simple binary of >>> either/or. >>> >>> Thanks, >>> >>> --Dana >>> >>> Dana Walker >>> >>> Associate Software Engineer >>> >>> Red Hat >>> >>> <https://www.redhat.com> >>> <https://red.ht/sig> >>> >>> >>> On Fri, Mar 1, 2019 at 9:15 AM David Davis <[email protected]> >>> wrote: >>> >>>> I just want to bump this thread. If we hope to make the Pulp 3 RC date, >>>> we need feedback today. >>>> >>>> David >>>> >>>> >>>> On Wed, Feb 27, 2019 at 5:09 PM Matt Pusateri <[email protected]> >>>> wrote: >>>> >>>>> Not sure if https://www.webyog.com/ Monyog will give a free >>>>> opensource project license. But that might help diagnose the MariaDB >>>>> performance. Monyog is really nice, I wish it supported Postgres. >>>>> >>>>> Matt P. >>>>> >>>>> On Tue, Feb 26, 2019 at 7:23 PM Daniel Alley <[email protected]> >>>>> wrote: >>>>> >>>>>> Hello all, >>>>>> >>>>>> We've had an ongoing discussion about whether Pulp would be able to >>>>>> perform acceptably if we switched back to UUID primary keys. I've >>>>>> finished >>>>>> doing the performance testing and I *think* the answer is yes. Although >>>>>> to >>>>>> be honest, I'm not sure that I understand why, in the case of MariaDB. >>>>>> >>>>>> I linked my testing methodology and results here: >>>>>> https://pulp.plan.io/issues/4290#note-18 >>>>>> >>>>>> To summarize, I tested the following: >>>>>> >>>>>> * How long it takes to perform subsequent large (lazy) syncs, with >>>>>> lots of content in the database (100-400k content units) >>>>>> * How long it takes to perform various small but important database >>>>>> queries >>>>>> >>>>>> The results were weirdly in contrast in some cases. >>>>>> >>>>>> The first four syncs (202,000 content total) behaved mostly the same >>>>>> on PostgreSQL whether it used an autoincrement or UUID primary key. >>>>>> Subsequent syncs had a performance drop of between 30-40%. Likewise, the >>>>>> code snippets performed 30+% worse. Sync time scaled linearly"ish" with >>>>>> the amont of content in the repository in both cases, which was a bit >>>>>> surprising to me. The size of the database at the end was 30-40% larger >>>>>> with UUID primary keys, 736 MB vs 521 MB. The gap would be smaller in >>>>>> typical usage when you consider that most content types have more >>>>>> metadata >>>>>> than FileContent (what I was testing). >>>>>> >>>>>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff >>>>>> form >>>>>> https://www.diffchecker.com/40AF8vvM >>>>>> >>>>>> With MariaDB the first sync was almost 80% slower than the first sync >>>>>> w/ PostgreSQL, but every subsequent sync was as fast or faster, despite >>>>>> the >>>>>> tests of specific queries performing multiple times worse. Additionally >>>>>> the sync performance did not decrease as rapidly as it did under >>>>>> PostgreSQL. With MariaDB, one of my test queries that worked fine when >>>>>> backed by PostgreSQL ended up hanging endlessly and I had to cut it off >>>>>> after 25 or so minutes. [0] I would consider that a blocker to claiming >>>>>> we >>>>>> support MariaDB / MySQL. >>>>>> >>>>>> But overall I'm not sure how to interpret the fact that on one hand >>>>>> the real-usage performance is equal or better better, and on the >>>>>> performance of some of the underlying queries is noticably worse. Maybe >>>>>> there's some weird caching going on in the backend, or the generated >>>>>> indexes are different? >>>>>> >>>>>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form >>>>>> https://www.diffchecker.com/W1nnIQgj >>>>>> >>>>>> I'd like to invite some discussion on this, but nothing I've >>>>>> mentioned seems like it would be a problem for going forwards with using >>>>>> UUID primary keys in a general sense. If we're all in agreement about >>>>>> that >>>>>> engineering decision then we can move forwards with that work. >>>>>> >>>>>> [0] for *some* but not all repository versions. No idea what's up >>>>>> there. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Pulp-dev mailing list >>>>>> [email protected] >>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>> >>>>> _______________________________________________ >>>>> Pulp-dev mailing list >>>>> [email protected] >>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>> >>>> _______________________________________________ >>>> Pulp-dev mailing list >>>> [email protected] >>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>> >>> _______________________________________________ >>> Pulp-dev mailing list >>> [email protected] >>> https://www.redhat.com/mailman/listinfo/pulp-dev >>> >> >> _______________________________________________ >> Pulp-dev mailing >> [email protected]https://www.redhat.com/mailman/listinfo/pulp-dev >> >> >> _______________________________________________ >> Pulp-dev mailing list >> [email protected] >> https://www.redhat.com/mailman/listinfo/pulp-dev >> >
_______________________________________________ Pulp-dev mailing list [email protected] https://www.redhat.com/mailman/listinfo/pulp-dev
