On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel <[email protected]> wrote:
> On 04/06/2018 09:15 AM, Brian Bouterse wrote:
>
> > Several plugins have started using the Changesets, including pulp_ansible,
> > pulp_python, pulp_file, and perhaps others. The Changesets provide several
> > distinct points of value, which are great, but there are two challenges I
> > want to bring up. I want to focus only on the problem statements first.
> >
> > 1. There is redundant "differencing" code in all plugins. The Changeset
> > interface requires the plugin writer to determine which units need to be
> > added and which need to be removed. This requires all plugin writers to
> > write the same non-trivial differencing code over and over. For example,
> > you can see the same non-trivial differencing code present in pulp_ansible
> > <https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
> > pulp_file
> > <https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
> > and pulp_python
> > <https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
> > Line-wise, this "differencing" code makes up a large portion (maybe 50%)
> > of the sync code itself in each plugin.
>
> Ten lines of trivial set logic hardly seems like a big deal, but any
> duplication is worth exploring.

It's more than ten lines. Take pulp_ansible, for example. By my count, the
linked-to section is 89 lines, which, out of 306 lines of plugin sync code,
is 29% extra redundant code. The other plugins have similar numbers. So with
those numbers in mind, what do you think?

> > 2. Plugins can't do end-to-end stream processing. The Changesets
> > themselves do stream processing, but when you call into
> > changeset.apply_and_drain() you have to have fully parsed the metadata
> > already. Currently, fetching all metadata from Galaxy takes pulp_ansible
> > about 380 seconds (6+ min).
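(Aside on point 1, to make the duplication concrete: the differencing each
plugin re-implements is roughly the set arithmetic below. All the names, and
the mirror flag, are hypothetical; this is a sketch of the shape of the code,
not code lifted from any plugin.)

```python
# Hypothetical sketch of the "differencing" step each plugin currently
# re-implements before handing additions/removals to the Changeset.
# Only the set arithmetic mirrors what the plugins do.

def find_delta(remote_keys, local_keys, mirror=True):
    """Return (additions, removals) as sets of natural keys.

    remote_keys: natural keys of content in the remote's metadata.
    local_keys:  natural keys of content already in the Pulp repository.
    mirror:      when True, content absent from the remote is removed.
    """
    additions = remote_keys - local_keys
    removals = local_keys - remote_keys if mirror else set()
    return additions, removals


remote = {("pkg-a", "1.0"), ("pkg-b", "2.0")}
local = {("pkg-b", "2.0"), ("pkg-c", "3.0")}
additions, removals = find_delta(remote, local)
# additions == {("pkg-a", "1.0")}, removals == {("pkg-c", "3.0")}
```

The arithmetic itself is small; the bulk of the per-plugin line count is the
surrounding code that builds these key sets and turns the delta back into
content units.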
> > This means that the actual Changeset content downloading starts 380
> > seconds later than it could. At the heart of the problem, the
> > fetching+parsing of the metadata is not part of the stream processing.
>
> The additions/removals can be any iterable (like a generator), and by
> using ChangeSet.apply() and iterating the returned object, the plugin can
> "turn the crank" while downloading and processing the metadata. The
> ChangeSet.apply_and_drain() is just a convenience method. I don't see how
> this is a limitation of the ChangeSet.

That is new info for me (and maybe everyone). OK, so Changesets have two
interfaces: apply() and apply_and_drain(). Why do we have two interfaces
when apply() can support all existing use cases (that I know of) and do
end-to-end stream processing, but apply_and_drain() cannot? I see all of our
examples (and all of our new plugins) using apply_and_drain().

> > Do you see the same challenges I do? Are these the right problem
> > statements? I think with clear problem statements, a solution will be
> > easy to see and agree on.
>
> I'm not convinced that these are actual problems/challenges that need to
> be addressed in the near term.

> > Thanks!
> > Brian
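P.S. To illustrate Jeff's point about apply(): because the additions argument
can be any iterable, a generator that parses metadata incrementally lets the
changeset start working before parsing finishes. A self-contained sketch
below; FakeChangeSet and parse_metadata_pages are stand-ins for illustration,
not pulpcore's real ChangeSet API.

```python
# Sketch of end-to-end streaming: because `additions` is a *generator*, the
# changeset can start processing units before all metadata pages have been
# fetched and parsed. FakeChangeSet is a stand-in, not pulpcore's ChangeSet.

def parse_metadata_pages(pages):
    """Yield one unit at a time as metadata pages arrive."""
    for page in pages:          # each page would be fetched on demand
        for unit in page:
            yield unit          # downstream work can start immediately


class FakeChangeSet:
    def __init__(self, additions):
        self.additions = additions

    def apply(self):
        """Lazily iterate additions, yielding a 'report' per unit applied."""
        for unit in self.additions:
            yield f"added {unit}"


pages = [["unit-1", "unit-2"], ["unit-3"]]   # simulated paginated metadata
changeset = FakeChangeSet(parse_metadata_pages(pages))
reports = list(changeset.apply())            # the crank turns as pages parse
# reports == ["added unit-1", "added unit-2", "added unit-3"]
```

An apply_and_drain()-style call, by contrast, would consume the whole
iterator internally before returning, which is why it cannot interleave with
a caller that is still producing metadata.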
_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev
