Thanks to @dalley and @daviddavis for their help investigating the performance issues related to epic #3770. Here's an update on the changes made so far and the ones still to come.

https://pulp.plan.io/issues/3812 - 50x speedup on saving Content units when not using multi-table inheritance. I believe this means Content units should not use Master/Detail. I believe plugin writers are waiting on this change, so I think we should prioritize it for Pulp3 and add it to a sprint.

https://pulp.plan.io/issues/3813 - Resolved via documentation on how to use bulk_create safely with Artifacts. Needs to be added to a sprint. A 15x - 20x speedup was shown experimentally.

https://pulp.plan.io/issues/3814 - The interfaces for add_content and remove_content now take only a QuerySet. At least a 10x speedup. Already merged, coming in the next Pulp3 beta. This also sped up the API calls that use these interfaces. :)

We may have a lingering performance issue when moving a large number of files into place across filesystems, but we'll wait until we have a clear reproducer to optimize against before handling that case. We also believe we know how to resolve that file-saving issue should it arise.

On Mon, Jul 2, 2018 at 4:41 PM, Brian Bouterse <bbout...@redhat.com> wrote:

> As described in 3770, pulp_file syncs 2.4x slower than pulp2 [0]. I
> believe we want Pulp3 to sync at least as fast as Pulp2. I think we should
> consider making the goal of "have pulp3 sync as fast as pulp2" a Pulp3 GA
> requirement. The reasoning for me is twofold: (a) users aren't going to
> switch to something over twice as slow, and (b) we will likely have to make
> some non-trivial database changes, so we should do them now.
>
> How do you feel about this goal/need?
>
> In terms of tackling the problems themselves, I've separated the
> performance issue into 3 different performance problems:
>
> https://pulp.plan.io/issues/3812
> https://pulp.plan.io/issues/3813
> https://pulp.plan.io/issues/3814
>
> Any feedback or discussion on these is welcome. I plan to help organize
> ideas as we explore possible solutions. Once some more info and a few
> vetted ideas are available, I plan to bring it back to the list.
> If anyone wants to talk through them before then, feel free to reach out to
> me.
>
> [0]: https://pulp.plan.io/issues/3770#note-5
>
> -Brian
>
>
> On Thu, Jun 21, 2018 at 4:50 PM, Brian Bouterse <bbout...@redhat.com>
> wrote:
>
>> I just tried an implementation of DeclarativeVersion that uses
>> bulk_create for all content units, content artifacts, and remote artifacts.
>>
>> The content units are incompatible with bulk_create(). Trying to save
>> a batch of content units with it raises: ValueError: Can't bulk
>> create a multi-table inherited model
>>
>> On Thu, Jun 21, 2018 at 4:19 PM, Brian Bouterse <bbout...@redhat.com>
>> wrote:
>>
>>> I'm only considering these changes for the plugin writer API to help
>>> resolve the performance issues.
>>>
>>> On Thu, Jun 21, 2018 at 4:11 PM, Austin Macdonald <amacd...@redhat.com>
>>> wrote:
>>>
>>>> For models, bulk_create seems good to me. Endpoints to kick off tasks
>>>> like sync that use bulk_create seem fine.
>>>>
>>>> Are you also proposing we have bulk_create for non-task REST API calls?
>>>> Should a user be able to POST a list of dictionaries that becomes a set of
>>>> Content? I'm open to it, but it seems like it could get ugly.
>>>>
>>>> On Thu, Jun 21, 2018 at 3:54 PM, Brian Bouterse <bbout...@redhat.com>
>>>> wrote:
>>>>
>>>>> I've run cProfile on some of the sync code for Pulp3 and I've noticed
>>>>> that we may have some problems with bulk_create on some of the object
>>>>> types.
>>>>>
>>>>> Here is a small analysis I did:
>>>>> https://pulp.plan.io/issues/3770#note-2
>>>>>
>>>>> As an aside, we don't have a bulk add option for
>>>>> RepositoryVersion.add_content, which means each round trip to the db
>>>>> will be for one unit. When you're processing 70K units, that's a lot
>>>>> of trips. I don't think we have to add this right now, but to resolve
>>>>> an issue like 3770 we may need to.
>>>>>
>>>>> I do think we should make our models compatible with bulk_create now
>>>>> either way.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> -Brian
>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev@redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
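
For anyone following the "one round trip per unit" point in this thread, here is a rough sketch of the difference between adding units one at a time and adding them in batches. This is not Pulp code. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the table name (repo_content) and the helper names (make_db, add_one_at_a_time, add_in_batches, count_rows) are made up for illustration. With an in-memory SQLite database there is no network round trip, so the number of statements issued is used as a proxy for trips to the db.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical association table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repo_content (repo_version INTEGER, content_id INTEGER)"
    )
    return conn


def add_one_at_a_time(conn, repo_version, content_ids):
    """Issue one INSERT per content unit and return the statement count."""
    statements = 0
    for content_id in content_ids:
        conn.execute(
            "INSERT INTO repo_content (repo_version, content_id) VALUES (?, ?)",
            (repo_version, content_id),
        )
        statements += 1
    conn.commit()
    return statements


def add_in_batches(conn, repo_version, content_ids, batch_size=400):
    """Insert many units per statement and return the statement count.

    batch_size is kept at 400 rows (800 bound parameters) to stay under the
    999-parameter default limit of older SQLite builds.
    """
    statements = 0
    for start in range(0, len(content_ids), batch_size):
        batch = content_ids[start:start + batch_size]
        # Build a multi-row VALUES list: (?, ?), (?, ?), ...
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for cid in batch for value in (repo_version, cid)]
        conn.execute(
            "INSERT INTO repo_content (repo_version, content_id) VALUES "
            + placeholders,
            params,
        )
        statements += 1
    conn.commit()
    return statements


def count_rows(conn):
    """Return how many association rows exist."""
    return conn.execute("SELECT COUNT(*) FROM repo_content").fetchone()[0]


if __name__ == "__main__":
    ids = list(range(70_000))
    print("per-unit statements:", add_one_at_a_time(make_db(), 1, ids))
    print("batched statements:", add_in_batches(make_db(), 1, ids))
```

For 70K units the per-unit version issues 70,000 statements while the batched version issues 175. Against a real database server each of those statements is a network round trip, which is the cost that a bulk interface avoids.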