On 04/11/2018 10:59 AM, Brian Bouterse wrote:


On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel <[email protected] <mailto:[email protected]>> wrote:





        

        

        

        


    On 04/06/2018 09:15 AM, Brian Bouterse wrote:
    Several plugins have started using the Changesets including
    pulp_ansible, pulp_python, pulp_file, and perhaps others. The
    Changesets provide several distinct points of value which are
    great, but there are two challenges I want to bring up. I want to
    focus only on the problem statements first.

    1. There is redundant "differencing" code in all plugins. The
    Changeset interface requires the plugin writer to determine what
    units need to be added and those to be removed. This requires all
    plugin writers to write the same non-trivial differencing code
    over and over. For example, you can see the same non-trivial
    differencing code present in pulp_ansible
    
<https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
    pulp_file
    
<https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
    and pulp_python
    
<https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
    Line-wise, this "differencing" code makes up a large portion
    (maybe 50%) of the sync code itself in each plugin.

    Ten lines of trivial set logic hardly seems like a big deal but
    any duplication is worth exploring.

It's more than ten lines. Take pulp_ansible for example. By my count (the linked to section) it's 89 lines, which out of 306 lines of plugin code for sync is 29% of extra redundant code. The other plugins have similar numbers. So with those numbers in mind, what do you think?

I was counting the lines (w/o comments) in find_delta() based on the linked code.  Which functions are you counting?



    2. Plugins can't do end-to-end stream processing. The Changesets
    themselves do stream processing, but when you call into
    changeset.apply_and_drain() you have to have fully parsed the
    metadata already. Currently when fetching all metadata from
    Galaxy, pulp_ansible takes about 380 seconds (6+ min). This means
    that the actual Changeset content downloading starts 380 seconds
    later than it could. At the heart of the problem, the
    fetching+parsing of the metadata is not part of the stream
    processing.

    The additions/removals can be any interable (like generator) and
    by using ChangeSet.apply() and iterating the returned object, the
    pluign can "turn the crank" while downloading and processing the
    metadata.  The ChangeSet.apply_and_drain() is just a convenience
    method.  I don't see how this is a limitation of the ChangeSet.


That is new info for me (and maybe everyone). OK so Changesets have two interfaces. apply() and apply_and_drain(). Why do we have two interfaces when apply() can support all existing use cases (that I know of) and do end-to-end stream processing but apply_and_drain() cannot? I see all of our examples (and all of our new plugins) using apply_and_drain().

The ChangeSet.apply() was how I designed (and documented) it.  Not sure when/who added the apply_and_drain().  +1 for removing it.



    Do you see the same challenges I do? Are these the right problem
    statements? I think with clear problem statements a solution will
    be easy to see and agree on.

    I'm not convinced that these are actual problems/challenges that
    need to be addressed in the near term.


    Thanks!
    Brian


    _______________________________________________
    Pulp-dev mailing list
    [email protected] <mailto:[email protected]>
    https://www.redhat.com/mailman/listinfo/pulp-dev
    <https://www.redhat.com/mailman/listinfo/pulp-dev>


    _______________________________________________
    Pulp-dev mailing list
    [email protected] <mailto:[email protected]>
    https://www.redhat.com/mailman/listinfo/pulp-dev
    <https://www.redhat.com/mailman/listinfo/pulp-dev>



_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev

Reply via email to