On 11/22/2011 07:43 AM, Jay Dobies wrote:
> http://blog.pulpproject.org/2011/11/21/importer-sync-apis/
>
> I know the week of Thanksgiving isn't the best time to ask for deep
> thought, but I'm asking anyway.

No Thanksgiving over here, so no holiday distractions for me :)

> I also know there are at least two other teams interested in writing
> plugins that I'd like to give some feedback on how this will meet their
> needs.

1. The new sync log API looks pretty good. What I'll do is set up my sync commands to log to a file on disk (since some of them run in a different process), then when everything is done, read that file and pass the contents back in the final report.

However, it would be nice to be able to store a "stats" mapping in addition to the raw log data.
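To make that concrete, here's roughly the shape I have in mind (the log path, function names and report keys are placeholders of my own, not anything Pulp defines):

    import logging

    # Placeholder path and report keys - just illustrating the pattern of
    # logging to a shared file and folding it into the final sync report.
    LOG_PATH = "/tmp/importer-sync.log"

    def run_sync_commands():
        logging.basicConfig(filename=LOG_PATH, level=logging.INFO)
        logging.info("sync started")
        # ... launch the external sync commands (separate processes) ...
        logging.info("sync finished")

    def build_final_report():
        log_file = open(LOG_PATH)
        try:
            log_text = log_file.read()
        finally:
            log_file.close()
        # A "stats" mapping alongside the raw log is what I'm asking for
        return {"log": log_text, "stats": {"units_added": 0, "errors": 0}}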

2. I *think* the 'working directory' API is the 'get_repo_storage_directory()' call on the conduit. However, I'm not entirely clear on that, nor on what the benefits are over using Python's own tempfile module (although that may be an artefact of the requirement for 2.4 compatibility in Pulp - with 2.5+, the combination of context managers, tempfile.mkdtemp() and shutil.rmtree() means that cleaning up temporary directories is a *lot* easier than it used to be).
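For reference, this is the 2.5+ idiom I mean (with 2.5 itself you also need 'from __future__ import with_statement'):

    import contextlib
    import shutil
    import tempfile

    # 2.5+ idiom: a context manager that creates a scratch directory and
    # guarantees cleanup, even if the sync blows up part way through.
    @contextlib.contextmanager
    def temp_working_dir():
        path = tempfile.mkdtemp()
        try:
            yield path
        finally:
            shutil.rmtree(path)

    # Usage:
    #     with temp_working_dir() as workdir:
    #         ... download and unpack content under workdir ...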

3. The "request_unit_filename" and "add_or_update_content_unit" APIs seem oddly asymmetrical, and the "get_unit_keys_for_repo" naming includes quite a bit of redundancy.

To be consistent, flexible and efficient, I suggest an API based around a "ContentUnitData" class with the following attributes (a rough code sketch follows the list):
  - type_id
  - unit_id (may be None when defining a new unit to be added to Pulp)
  - key_data
  - other_data
  - storage_path (may be None if no bits are stored for the content type - perhaps whether or not bits are stored should be part of the content type definition?)
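In code, the data holder I'm picturing is roughly this (a sketch only, not existing Pulp code):

    class ContentUnitData(object):
        # Sketch only - attribute names match the list above.
        def __init__(self, type_id, key_data, other_data,
                     unit_id=None, storage_path=None):
            self.type_id = type_id            # content type identifier
            self.unit_id = unit_id            # None until Pulp assigns one
            self.key_data = key_data          # dict of unit key fields
            self.other_data = other_data      # dict of non-key metadata
            self.storage_path = storage_path  # None if no bits are stored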

The content management API itself could then look like this (a signatures-only sketch follows the list):

- get_units() -> two-level mapping {type_id: {unit_id: ContentUnitData}}
  Replacement for get_unit_keys_for_repo()
  Note that if you're concerned about exposing 'unit_id', the existing APIs already expose it as the return value from 'add_or_update_content_unit'. I think you're right to avoid exposing a "single lookup" API, at least initially - that's a performance problem waiting to happen.

- new_unit(type_id, key_data, other_data, relative_path) -> ContentUnitData
  Does *not* assign a unit ID (or touch the database at all)
  Does fill in absolute path in storage_path based on relative_path
  Replaces any use of "request_unit_filename"

- save_unit(ContentUnitData) -> ContentUnitData
  Assigns a unit ID to the unit and stores it in the database
  Associates the unit with the repo
  Batching will be tricky due to error handling if the save fails
  Replaces any use of 'add_or_update_content_unit' and 'associate_content_unit'

- remove_unit(type_id, pulp_id) -> bool
  True if removed, False if association retained
  Replaces any use of 'unassociate_content_unit'
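Pulling those together, the conduit side would look something like this (signatures only, purely illustrative, not an existing Pulp class):

    class ImporterConduit(object):
        # Signatures only - a sketch of the proposed interface.

        def get_units(self):
            """Return {type_id: {unit_id: ContentUnitData}} for the repo."""

        def new_unit(self, type_id, key_data, other_data, relative_path):
            """Build a ContentUnitData with storage_path filled in.

            Does not assign a unit ID or touch the database.
            """

        def save_unit(self, unit):
            """Assign a unit ID, store the unit and associate it with the repo."""

        def remove_unit(self, type_id, pulp_id):
            """Drop the importer's reference to the unit.

            Returns True if the unit was removed, False if another
            association kept it alive.
            """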

For the content unit lifecycle, I suggest adopting a reference counting model where the importer owns one set of references (controlled via save_unit/remove_unit on the importer conduit) and manual association owns a second set of references (which the importer conduit can't touch). A reference through either mechanism would then keep the content unit alive and associated with the repository (the repo should present a unified interface to other code, so client code doesn't need to care if it is an importer association or a manual association that is keeping the content unit alive).
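To make the reference counting idea concrete (again, a sketch of the concept rather than proposed Pulp code):

    class UnitAssociations(object):
        # A unit stays associated with the repo while *either* reference
        # set still holds it.
        def __init__(self):
            self.importer_refs = set()   # managed via the importer conduit
            self.manual_refs = set()     # managed via manual association

        def is_associated(self, unit_id):
            return unit_id in self.importer_refs or unit_id in self.manual_refs

        def remove_importer_ref(self, unit_id):
            self.importer_refs.discard(unit_id)
            # Mirrors remove_unit(): True if the unit is now gone from the
            # repo, False if a manual association retained it.
            return not self.is_associated(unit_id)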

4. It's probably worth also discussing the scratchpad API that lets the importer store additional state between syncs. I like having this as a convenience API (rather than having to store everything as explicit metadata on the repo), but for debugging purposes, it would probably be good to publish "_importer_scratchpad" as a metadata attribute on the repo that is accessible via REST.
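The usage pattern I have in mind is something like the following (the conduit method names here are just my guess at the shape, not confirmed API):

    def sync(conduit):
        # Retrieve whatever state the previous sync left behind
        state = conduit.get_scratchpad() or {}
        last_sync = state.get("last_sync_time")
        # ... only fetch content that changed since last_sync ...
        state["last_sync_time"] = "2011-11-22T00:00:00Z"
        conduit.set_scratchpad(state)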

> This is too big and ambitious for me to get right on my own.

Definitely headed in the right direction, but I think it's worth pushing the "structured data" approach even further. You've already started doing this on the configuration side of things, I think it makes sense on the metadata side as well.

Cheers,
Nick.

--
Nick Coghlan
Red Hat Engineering Operations, Brisbane
