Re: [Pulp-dev] Performance testing results, autoincrement ID vs UUID primary keys

2019-03-05 Thread Jeff Ortel

+1 to switching back to UUIDs for the reasons Brian gave.

On 3/1/19 2:23 PM, Brian Bouterse wrote:
I've finally gotten to read through the numbers and this thread. It is 
a tradeoff but I am +1 for switching to UUIDs. I focus on the 
PostgreSQL UUID vs int case because that is our default database. I 
don't think too much about how things perform on MariaDB because they 
can improve their own performance to catch up to PostgreSQL which 
regularly is performing better afaict. I agree with the assessment of a 
30%-ish slowdown in the large unit cases for PostgreSQL. Still, I 
believe the advantages of switching to UUIDs are worth it. Two main 
reasons stick out in my mind.


1. Our core code and all plugin code will always be compatible with 
common db backends even when using bulk_create()
2. We get database sharding with postgresql which you can only do with 
UUID pks. I was advised this years ago by jcline.


Performance and compatibility are a pretty classic trade-off. Overall 
I've found that initial releases launch with less performance and 
improve (often significantly) over time. Consider the interpreter pypy 
(not pypi). It went from "roughly 2000x slower [at initial launch] than 
CPython, to roughly 7x faster [now]" [0]. Launching a Pulp 3.0 that is 
30% slower in the worst case but runs everywhere with zero 
"db-behavior surprises" is, I think, worth it. Conversely, if we 
don't adopt UUIDs, how will we address item 1 pre-RC?


@dawalker for the "can we have both" option, we probably can have some 
db-specific codepaths, but I don't think doing an application-wide PK 
type change as a setting is feasible to support. The db-specific 
codepaths are one way performance improves over time. For the initial 
release, to keep things simple I hope we don't have conditional 
database codepaths (for now).


More discussion on this change is encouraged. Thanks @dalley so much 
for all the detailed investigation!


[0]: https://morepypy.blogspot.com/2018/09/the-first-15-years-of-pypy.html

Thank you,
Brian

On Fri, Mar 1, 2019 at 2:51 PM Dana Walker wrote:


As I brought up on irc, I don't know how manageable the
maintenance complications would be going forward, but I would
prefer if we could use some sort of setting in order to choose
uuid or id based on MariaDB or PostgreSQL.  I want us to work
everywhere, but I'm really concerned about the impact to our users
of a 30-40% efficiency drop in speed and storage.

David wrote up a quick Proof of Concept after I brought this up
but wasn't necessarily advocating it himself.  I think Daniel and
Dennis expressed some concerns.  I'd like to see more people
discussing it here, with reasoning/examples of how doable something
like this could be.

If it's not on the table, I understand, but want to make sure
we've considered all reasonable options, and that might not be a
simple binary of either/or.

Thanks,

--Dana

Dana Walker

Associate Software Engineer

Red Hat

On Fri, Mar 1, 2019 at 9:15 AM David Davis <davidda...@redhat.com> wrote:

I just want to bump this thread. If we hope to make the Pulp 3
RC date, we need feedback today.

David


On Wed, Feb 27, 2019 at 5:09 PM Matt Pusateri <mpusa...@redhat.com> wrote:

Not sure if Monyog (https://www.webyog.com/) will give a free
open-source project license, but it might help diagnose the
MariaDB performance.  Monyog is really nice; I wish it supported
Postgres.

Matt P.

On Tue, Feb 26, 2019 at 7:23 PM Daniel Alley <dal...@redhat.com> wrote:

Hello all,

We've had an ongoing discussion about whether Pulp
would be able to perform acceptably if we switched
back to UUID primary keys.  I've finished doing the
performance testing and I *think* the answer is yes, although to
be honest, I'm not sure that I understand why in the case of
MariaDB.

I linked my testing methodology and results here:
https://pulp.plan.io/issues/4290#note-18

To summarize, I tested the following:

* How long it takes to perform subsequent large (lazy)
syncs, with lots of content in the database (100-400k
content units)
* How long it takes to perform various small but
important database queries

The results were oddly contradictory in some cases.

The first four syncs (202,000 content total) behaved
mostly the same on PostgreSQL whether it used an
autoincrement or UUID primary key. Subsequent syncs
had a performance drop of 

Re: [Pulp-dev] Pulp 2.18.1 GA (delayed)

2019-02-21 Thread Jeff Ortel
An additional packaging adjustment to deal with Celery (and related) 
dependencies provided by EPEL.  As a result, the 2.18.1 GA is now 
scheduled for Feb 22.


https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule


On 2/14/19 12:32 PM, Jeff Ortel wrote:
To ensure that Pulp packaging adjustments related to dependencies 
(celery) provided by EPEL can be completed and tested, the GA date 
needs to be pushed.

The GA is now scheduled for February 20.


https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule




[Pulp-dev] Pulp 2.18.1 GA (delayed)

2019-02-14 Thread Jeff Ortel
To ensure that Pulp packaging adjustments related to dependencies 
(celery) provided by EPEL can be completed and tested, the GA date needs 
to be pushed.

The GA is now scheduled for February 20.


https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule



[Pulp-dev] pulpcore-plugin 0.1.0b20

2019-02-11 Thread Jeff Ortel

The following packages are now available on PyPI:
   - pulpcore-plugin 0.1.0b20 [1] with its release notes here [2]

Note: The management of remote artifacts has been pulled out of the 
ContentSaver stage and is now provided by a *new* RemoteArtifactSaver stage.

  Plugins creating custom pipelines should include this stage.
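
For plugin writers wondering where the stage fits, here is a rough sketch 
of a custom pipeline that includes it. The stage list, ordering, and the 
first_stage placeholder are illustrative assumptions, not the verified 
0.1.0b20 API; the plugin docs [3] are authoritative.

    # Hedged sketch only: stage names/ordering are assumptions.
    from pulpcore.plugin.stages import (
        ArtifactDownloader,
        ArtifactSaver,
        ContentSaver,
        RemoteArtifactSaver,
    )

    def build_pipeline(first_stage):
        """Assemble a custom stages pipeline including RemoteArtifactSaver."""
        return [
            first_stage,            # plugin stage emitting DeclarativeContent
            ArtifactDownloader(),   # fetch artifacts for immediate-mode content
            ArtifactSaver(),        # persist downloaded artifacts
            ContentSaver(),         # persists content; no longer saves remote artifacts
            RemoteArtifactSaver(),  # the new stage announced above
        ]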

The beta documentation is available here[3].

[1]: https://pypi.org/project/pulpcore-plugin/0.1.0b20/ 

[2]: 
https://docs.pulpproject.org/en/pulpcore-plugin/nightly/release-notes/index.html#b20 


[3]: https://docs.pulpproject.org/en/3.0/beta/


[Pulp-dev] 2.18.1 Beta 1 available

2019-02-07 Thread Jeff Ortel
Pulp 2.18.1 Beta 1 is now available, and can be downloaded from the 2.18 
beta repositories:


https://repos.fedorapeople.org/repos/pulp/pulp/beta/2.18/

Upgrading
=========

The Pulp 2.18 beta repository is included in the pulp repo files:
https://repos.fedorapeople.org/repos/pulp/pulp/fedora-pulp.repo for Fedora
https://repos.fedorapeople.org/repos/pulp/pulp/rhel-pulp.repo for RHEL 7

Package dependencies provided in the pulp beta repositories must be 
used instead of EPEL.


After enabling the pulp-beta repository, you'll want to follow the 
standard upgrade path with migrations:

$ sudo systemctl stop httpd pulp_workers pulp_resource_manager 
pulp_celerybeat pulp_streamer goferd

$ sudo yum upgrade
$ sudo -u apache pulp-manage-db
$ sudo systemctl start httpd pulp_workers pulp_resource_manager 
pulp_celerybeat pulp_streamer goferd


The pulp_streamer and goferd services should be omitted if those 
services are not installed.


Issues Addressed


OSTree:
  3999    Publishing incorrect branch head.

RPM:
  4333    repo syncs fail causing worker to crash with 
/lib64/libmodulemd.so.1: undefined symbol: g_log_structured_standard

  4225    0029_applicability_schema_change.py fails for some users
  4152    Regression Pulp 2.17.1: recursive copy of RPMs does not copy 
partially resolvable dependencies
  4252    modules.yaml file is generated on repository with no 
modularity information
  4253    modules.yaml reference in repomd.xml does not use selected 
checksum
  4309    Vendor field migration fails with 'NoneType' object has no 
attribute 'text'

  4375    Recursive copy doesn't solve rich dependencies correctly




Re: [Pulp-dev] 2.18.1 Release Schedule

2019-02-06 Thread Jeff Ortel
Still working to get an outstanding issue verified.  The beta needs to 
slip (1) additional day.  See the updated schedule[0].


On 2/5/19 5:17 PM, Jeff Ortel wrote:
An issue found during 2.18.1 testing has not yet been completely 
resolved/verified.  As of now, we need to slip (1) additional day.


On 1/29/19 2:41 PM, Jeff Ortel wrote:
Testing on 2.18.1 has found a few issues.  As a result, the schedule 
has slipped ~1 week.


On 1/16/19 10:28 AM, Jeff Ortel wrote:
A 2.18.1 is being planned with some features and recent fixes. Here 
[0] is a release schedule page which outlines the dates, starting 
with a dev freeze on January 21, 2019 @ 22:00 UTC.


If this schedule needs to be adjusted, please reply with alternate 
dates.


[0] https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule








Re: [Pulp-dev] 2.18.1 Release Schedule

2019-02-05 Thread Jeff Ortel
An issue found during 2.18.1 testing has not yet been completely 
resolved/verified.  As of now, we need to slip (1) additional day.


On 1/29/19 2:41 PM, Jeff Ortel wrote:
Testing on 2.18.1 has found a few issues.  As a result, the schedule 
has slipped ~1 week.


On 1/16/19 10:28 AM, Jeff Ortel wrote:
A 2.18.1 is being planned with some features and recent fixes. Here 
[0] is a release schedule page which outlines the dates, starting 
with a dev freeze on January 21, 2019 @ 22:00 UTC.


If this schedule needs to be adjusted, please reply with alternate dates.

[0] https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule






Re: [Pulp-dev] 2.18.1 Release Schedule

2019-01-29 Thread Jeff Ortel
Testing on 2.18.1 has found a few issues.  As a result, the schedule has 
slipped ~1 week.


On 1/16/19 10:28 AM, Jeff Ortel wrote:
A 2.18.1 is being planned with some features and recent fixes. Here 
[0] is a release schedule page which outlines the dates, starting with 
a dev freeze on January 21, 2019 @ 22:00 UTC.


If this schedule needs to be adjusted, please reply with alternate dates.

[0] https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule




Re: [Pulp-dev] Removing pulp/relational-pulp from Pulp org on Jan 22nd

2019-01-21 Thread Jeff Ortel

+1, delete.

On 1/17/19 12:44 PM, Brian Bouterse wrote:
This repo [0] was an early-on repo design of Master/Detail which has 
been moved into the Pulp3 codebase for several years now. Now github 
identified several issues in it via static analysis and we're getting 
email notification about fixing them.


The author of it, @smyers, confirmed via private email that it is no 
longer used and can be deleted. I second that it should be deleted.


I am planning to delete it on Tues Jan 22nd. You can fork until then 
if you want to. If there are any concerns or problems with this, 
feedback is welcome.


[0]: https://github.com/pulp/relational-pulp

Thanks,
Brian



[Pulp-dev] Reminder: 2.18.1 Dev Freeze Today.

2019-01-21 Thread Jeff Ortel
Any Pulp2 core or plugin code that you want included in the 2.18.1 
release must be:


- Merged to master by 22:00 UTC today.
- Associated with a bugfix issue; stories, refactors, and tasks are 
not included in z-stream releases.


Thanks!



[Pulp-dev] 2.18.1 Release Schedule

2019-01-16 Thread Jeff Ortel
A 2.18.1 is being planned with some features and recent fixes. Here [0] 
is a release schedule page which outlines the dates, starting with a dev 
freeze on January 21, 2019 @ 22:00 UTC.


If this schedule needs to be adjusted, please reply with alternate dates.

[0] https://pulp.plan.io/projects/pulp/wiki/2181_Release_Schedule 



Re: [Pulp-dev] Namespacing plugins, looking for feedback

2019-01-11 Thread Jeff Ortel



On 1/9/19 11:30 AM, Tatiana Tereshchenko wrote:

To summarize where we are so far:
*All* master/detail related endpoints will be automatically prepended 
with Django app *label* [0]

    - concerns: 'pulp_' in the label
    - options to address concerns:
      * introduce a new attribute to the AppConfig class to use in the 
        endpoints construction (not supported by majority so far)
      * drop the 'pulp_' part from a *plugin's* app label (supported by 
        majority so far)


Questions/concerns about dropping the 'pulp_' from the plugins' app label:

# Table names in the DB are prepended using the app label. We need to 
be sure to avoid collisions with other applications for pulpcore and 
for pulp plugins. Are they already in the "pulp" database?

Yes, all pulpcore and pulp plugin tables are in "pulp" database.

# The names in the list of installed plugins would then not be the 
same as the packages themselves.

It's probably ok. The status would look like this:
    {
        "component": "file",
        "version": "0.0.1b6"
    },
    {
        "component": "rpm",
        "version": "3.0.0b1"
    }

# What about the label for the core? (not discussed)
It stays as is: 'pulp_app'.


Why?  Seems like 'core' would be more descriptive.


[0] https://docs.djangoproject.com/en/2.1/ref/applications/#django.apps.AppConfig.label
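
To make the option concrete, a minimal sketch of a hypothetical plugin 
AppConfig with the 'pulp_' part dropped from the label (class and module 
names are illustrative):

    from django.apps import AppConfig

    class PulpFileConfig(AppConfig):
        name = "pulp_file.app"  # the Python package keeps its 'pulp_' prefix
        label = "file"          # the label drives endpoint prefixes and DB
                                # table names, e.g. file_filecontent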


On Tue, Jan 8, 2019 at 8:22 PM Daniel Alley wrote:


I'm not opposed to this plan, I just want to point out that it
would make the status API make slightly less sense.  The names in
the list of installed plugins would then not be the same as the
packages themselves.  It's probably close enough as to not be a
problem though.

On Tue, Jan 8, 2019 at 12:23 PM Austin Macdonald <amacd...@redhat.com> wrote:



On Tue, Jan 8, 2019 at 12:12 PM Brian Bouterse <bbout...@redhat.com> wrote:

My understanding is that it's for both. It would be
dropped from app_label and that will automatically be used
in master/detail urls. Is that what others thought?

This seems like the simplest approach to me. My only concern
with this approach is making sure that the database will be
properly namespaced so there won't be collisions with other
applications that use postgres like Katello. AFAIK, the plugin
tables don't need to be namespaced since they are already in
the "pulp" database. Is that correct? If so, +1.


Re: [Pulp-dev] Concerns about bulk_create and PostgreSQL

2019-01-08 Thread Jeff Ortel




On 1/3/19 1:28 PM, Simon Baatz wrote:

On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote:

I don't think that using integer ids with bulk_create and supporting
mysql/mariadb are necessarily mutually exclusive. I think there might
be a way to find the records created using bulk_create if we know the
natural key. It might be more performant than using UUIDs as well.

This assumes that there is a natural key.  For content types with no
digest information in the metadata, there may be a natural key
for content within a repo version only, but no natural key for the
overall content.  (That is, if we want to support non-immediate modes
for such content; in immediate mode, a digest can be computed from the
associated artifact(s).)


Can you give some examples of Content without a natural key?



Of course, there are ways around that (use a UUID as the "natural" key,
or add a UUID to the repo version key fields), but I would like to
avoid that.


On Thu, Jan 3, 2019 at 11:04 AM Dennis Kliban <dkli...@redhat.com>
wrote:

Thank you Daniel for the explanation and for filing an issue[0] to do
performance analysis of UUIDs.
I really hope that we can switch back to using UUIDs so we can bring
back MariaDB support for Pulp 3.
[0] https://pulp.plan.io/issues/4290

On Wed, Dec 5, 2018 at 1:35 PM Daniel Alley <dal...@redhat.com>
wrote:

To rephrase the problem a little bit:
We need to bulk_create() a bunch of objects, and then after we do that
we want to immediately be able to relate them with other objects, which
means we need the PKs of the objects that were just created.
In the case of auto-increment integer PKs, we can't know that PK value
before it gets saved into the database.  Luckily, PostgreSQL (and
Oracle) support a "RETURNING" keyword that provides this
information.  The raw SQL would look something like this:

INSERT INTO items (name) values ('bear') RETURNING id;

Django uses this feature to set the PK field on the model objects it
returns when you call bulk_create() on a list of unsaved model objects.
Unfortunately, MySQL doesn't support this, so there's no way to figure
out what the PKs of the objects you just saved were, so the ORM can't
set that information on the returned model objects.
UUID PKs circumvent this because the PK gets created outside of the
database, prior to being saved in the database, and so Django *can*
know what the PK will be when it gets saved.
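
A minimal sketch of that last point (model and field names are
illustrative, not Pulp's):

    import uuid
    from django.db import models

    class Unit(models.Model):
        # The PK is generated in Python when the instance is constructed,
        # so it is known before the INSERT and never depends on the
        # database echoing generated ids back.
        id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
        name = models.TextField()

    units = Unit.objects.bulk_create([Unit(name="bear"), Unit(name="walrus")])
    # units[0].pk is usable immediately on any backend, e.g. to create
    # rows relating the new units to a repository version.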

On Wed, Dec 5, 2018 at 12:11 PM Brian Bouterse <bbout...@redhat.com>
wrote:

+1 to experimentation and also making sure that we understand the
performance implications of the decision. I'm replying to this earlier
note to restate my observations of the problem a bit more.
More ideas and thoughts are welcome. This is a decision with a lot of
aspects to consider.
On Tue, Nov 20, 2018 at 10:00 AM Patrick Creech <pcre...@redhat.com>
wrote:

  On Mon, 2018-11-19 at 17:08 -0500, Brian Bouterse wrote:
  > When we switched from UUID to integers for the PK, we lost the
  > ability to have the primary key set during bulk_create with
  > databases other than PostgreSQL [0].
  >
  > With a goal of database agnosticism for Pulp3, if plugin writers
  > plan to use bulk_create with any object inherited from one of ours,
  > they will get different behaviors on different databases and they
  > won't have PKs that they may require. bulk_create is a normal
  > django thing, so plugin writers making a django plugin should be
  > able to use it. This concerned me already, but today it was also
  > brought up by non-RH plugin writers [1] in a PR.
  >
  > The tradeoffs between UUIDs versus integer PKs are pretty well
  > summed up in our ticket where we discussed that change [2]. Note,
  > we did not consider this bulk_create downside at that time, which I
  > think is the most significant downside to consider.
  >
  > Having bulk_create effectively not available for plugin writers
  > (since we can't rely on its pks being returned) I think is a
  > non-starter for me. I love how short the integer PKs made our URLs,
  > so that's the tradeoff mainly in my mind. Those balanced against
  > each other, I think we should switch back.
  >
  > Another option is to become PostgreSQL only which (though I love
  > psql) I think would be the wrong choice for Pulp from what I've
  > heard from its users.
  >
  > What do you think? What should we do?
  So, my mind immediately goes to this question, which might be
  useful for others to help make decisions, so I'll ask:
  When you say:
  "we lost the ability to have the primary key set during bulk_create"
  Can you clarify what you mean by this?
  My mind immediately goes to this chain of events:
  When you use bulk_create, the existing in-memory model
  objects representing the 

Re: [Pulp-dev] Single-Table Content API Changes, Performance Discussion

2018-12-12 Thread Jeff Ortel

On 12/10/18 1:06 PM, Jeff Ortel wrote:
+1 to counts instead of URLs.  The URLs are documented and can be 
constructed, so listing them on the serialized version does not seem to 
add much value.  The counts would likely provide more useful 
information and be consistent with the summary counts.


Just thought of something.  The URLs for specific content types are at 
the discretion of the plugin writer, so now I'm not convinced the user 
has a way to reliably construct the URLs themselves.




On 12/7/18 1:30 PM, Dennis Kliban wrote:
What if instead the API returned the number of each content type 
added or removed. So a repository version response would look like:


{'base_version': None,
 'content_added': {'pulp_file.file': 4},
 'content_removed': {'pulp_file.file': 1},
 'content_summary': {'pulp_file.file': '3'},
 'created': datetime.datetime(2018, 12, 5, 23, 34, 26, 948749, tzinfo=tzlocal()),
 'href': '/pulp/api/v3/repositories/4/versions/1/',
 'number': 1}

Thoughts?






Re: [Pulp-dev] Single-Table Content API Changes, Performance Discussion

2018-12-10 Thread Jeff Ortel
+1 to counts instead of URLs.  The URLs are documented and can be 
constructed, so listing them on the serialized version does not seem to 
add much value.  The counts would likely provide more useful information 
and be consistent with the summary counts.


On 12/7/18 1:30 PM, Dennis Kliban wrote:
What if instead the API returned the number of each content type added 
or removed. So a repository version response would look like:


{'base_version': None,
 'content_added': {'pulp_file.file': 4},
 'content_removed': {'pulp_file.file': 1},
 'content_summary': {'pulp_file.file': '3'},
 'created': datetime.datetime(2018, 12, 5, 23, 34, 26, 948749, tzinfo=tzlocal()),
 'href': '/pulp/api/v3/repositories/4/versions/1/',
 'number': 1}

Thoughts?




Re: [Pulp-dev] Possible Pulp3 RC Blocker issues from backlog

2018-12-07 Thread Jeff Ortel

Decisions look good to me.

On 12/5/18 11:36 AM, Brian Bouterse wrote:
I commented on the jwt one that I think it can be closed and why: 
https://pulp.plan.io/issues/3248#note-6


On Wed, Dec 5, 2018 at 8:54 AM David Davis wrote:


Awesome, thanks!

David


On Wed, Dec 5, 2018 at 8:44 AM Austin Macdonald <aus...@redhat.com> wrote:

For those with ambiguity, I added the RC blocker to force
discussion and [acceptance | closing].

Added RC Blocker:

  * Add task names: https://pulp.plan.io/issues/2889

  * Determine mutable fields: https://pulp.plan.io/issues/2635
  * pulp-manager migrate order: https://pulp.plan.io/issues/3062
  o @david - https://pulp.plan.io/issues/4067#note-5
  * Asynchronous Distribution update/delete:
https://pulp.plan.io/issues/3044
  * Distribution base_path model validation:
https://pulp.plan.io/issues/3051

Closed:

  * Viewable status endpoint w/out database running:
https://pulp.plan.io/issues/2850
  * Port Dependencies to Python3: https://pulp.plan.io/issues/2247
  * Plugins can specify plugin API version:
https://pulp.plan.io/issues/2656

No action:

  * jwt: https://pulp.plan.io/issues/3248
  * Add Publication.created (MODIFIED, david++):
https://pulp.plan.io/issues/2989


On Mon, Dec 3, 2018 at 3:21 PM David Davis <davidda...@redhat.com> wrote:

Thanks for digging through older issues to find potential
RC blockers.

2889 - +1 to making it an RC blocker
2635 - +1 here as well
2850 - I spent some time working on this and didn’t get
far. I think we should just require the db to be running.
I vote to close it out.
2989 - +1 to RC blocker
3044 - I guess we should revisit 3051 and decide on a
design before the RC which will determine if the
distribution endpoints need to be async?
2247 - Agreed on closing. Seems like we open issues on an
as-needed basis
2656 - Seems like this is done or am I missing something?
3062 - Will checking in migrations to source control not
solve this problem?
3248 - I haven’t heard anyone asking for jwt so I would
say we don’t need it. We can just leave the issue open I
think.

David


On Mon, Dec 3, 2018 at 2:41 PM Austin Macdonald <aus...@redhat.com> wrote:

To be on the safe side, I'd like to highlight issues
that *might* need to be RC blockers. Please reply
directly onto the issue, I'll update this thread
periodically if necessary.

REST API, backwards incompatible changes:

  * Add Task Names:
  o https://pulp.plan.io/issues/2889
  o IMO: We should make this an RC Blocker,
because this will be an additional requirement
for every task in every plugin.
  * Determine mutable fields
  o https://pulp.plan.io/issues/2635
  o IMO: someone (or a group) should take this as
assigned and audit the mutability of fields.
If we find one that needs to change, it will
be a backwards incompatible change to the REST
API, so this should have the RC blocker tag.
  * Status API without db connection
  o https://pulp.plan.io/issues/2850
  o IMO: RC blocker or close. As it is the db
connection field is not useful, and later
removal would be backwards incompatible.
  * Add new field, Publication.created
  o https://pulp.plan.io/issues/2989
  o IMO: RC blocker or close, this would be a
backwards incompatible change.
  * Asynchronous Distribution update/delete
  o https://pulp.plan.io/issues/3044
  o IMO: RC blocker or close, this would be a
backwards incompatible change.

Packaging

  * Port dependencies to Python 3
  o https://pulp.plan.io/issues/2247
  o IMO: It seems like if this weren't done, we'd
be having problems. Anyone mind if I close
this one? If we do need to keep it open,
should it be an RC blocker?
  * Plugins can declare PluginAPI version
 

Re: [Pulp-dev] Proposal to remove 'notes' fields from the Pulp 3 RC

2018-12-04 Thread Jeff Ortel

no objection

On 12/3/18 10:32 PM, Daniel Alley wrote:

*Background:*

"Notes" are a generic key value store where data can be attached to 
repositories and content and publications and so forth.  The eventual 
plan is to use this to enable adding tags to those sorts of objects, 
which is important for Katello.


Most of the code for this is located in pulp/app/models/generic.py

*Motivation:*

"Notes" have been in Pulp 3 for a very very long time and are 
completely unchanged for the last 12 months (the git history doesn't 
go back further because the file was moved). The data model behind it 
is extremely complex and while we have a few unit tests around it, we 
have no functional tests for it whatsoever, and (to my knowledge) we 
haven't been using/exercising this functionality manually in a 
meaningful way (if at all).  I could be wrong here, but I haven't seen 
it discussed  or any issues related to it filed in quite some time.


*Proposal:*

We should pull out all of the "notes" code (models/generic.py + the 
fields on the aforementioned models) until we've had a chance to 
properly evaluate our needs and whether the current design fits them.




Re: [Pulp-dev] Distributing Pulp3 Plans

2018-12-03 Thread Jeff Ortel

+1

On 12/1/18 6:01 AM, David Davis wrote:

+1 from me.

David


On Fri, Nov 30, 2018 at 10:26 AM Dennis Kliban wrote:


No objections from me.

On Thu, Nov 29, 2018 at 7:50 AM Brian Bouterse <bbout...@redhat.com> wrote:

The plan about 12-24 months ago was to distribute Pulp3 with
Pulp itself on a machine hosted in the osci.io community
environment. We have this ticket tracking that work [0] (still at NEW).

I commented [1] that I think our distribution plans now
involve mainly PyPI releases, and we probably won't self-host
our release infrastructure. Is that what others think?

If we aren't self-hosting with Pulp, can we close this ticket
[0], clean up the infra wiki [2], and ask OSCI to deprovision
their machine?

[0]: https://pulp.plan.io/issues/2325
[1]: https://pulp.plan.io/issues/2325#note-32
[2]:

https://pulp.plan.io/projects/pulp/wiki/Infrastructure_&_Hosting#Distribute-Pulp-with-Pulp

Thanks!
Brian


[Pulp-dev] Proposal: merge the content-app & streamer

2018-11-30 Thread Jeff Ortel

_BACKGROUND_

The pulp3 content app and the streamer (in-progress) currently have a 
lot of duplicate code and functionality.  At the very least, I think 
there is an opportunity to refactor both and share code.  But this would 
leave us with two components with significant overlap in functionality.


The functionality exclusive to the content-app:
  - Optionally delegate file serving to a web server (e.g. mod_xsendfile).
  - Optional redirect to the streamer.

The functionality exclusive to the streamer:
  - Using the Remote & RemoteArtifact to download the file and stream 
on demand.


Not much difference, which raises the question: "Why do we have both?"  I 
think the answer may be that we don't.


_PROPOSAL_

Let's pull the content-app out and merge it with the streamer.  The new 
content (app) would have the *streamer* architecture & functionality.  When 
a requested artifact has not been downloaded, it would be downloaded and 
streamed on demand instead of redirected.  This does mean that deployments 
and development environments would need to run an additional service to 
serve content.  The /pulp/content endpoint would be on a different port 
than the API.  I see this separation as a healthy thing.  There is 
significant efficiency to be gained as well.  Let's start with eliminating 
the REDIRECTs.  Cutting the GET requests in half is a win for the client, 
the network, and the Pulp web stack.  Next is database queries.  Since both 
applications need to perform many of the same queries, combining the 
applications will roughly cut those in half as well.  The streamer is based 
on asyncio, and so would be the merged app.


There are probably lots of other pros/cons I have not considered, but it 
seems relatively straightforward.


I'm thinking the new content app/service would be named *pulp-content*.

Thoughts?



Re: [Pulp-dev] Auto-distribution

2018-11-28 Thread Jeff Ortel



On 11/27/18 11:45 PM, James Cassell wrote:

On Tue, Nov 27, 2018, at 4:22 PM, Jeff Ortel wrote:


On 11/27/18 3:20 PM, Jeff Ortel wrote:


On 11/27/18 8:29 AM, Austin Macdonald wrote:

Yes, and AFAIK this is already complete. There are 2 fields on the
Distribution that allow auto-distribution. These fields must both be
set, and when they are, new publications will automatically update
the distribution.

The Auto-distribution feature is not the same as auto-publish in pulp2
which automatically triggered a publish at the end of a sync.  The
auto-distribution feature automatically makes a newly created
publication "live" after it has been created.  This is done by
updating distributions (per configuration) with the newly created
publication. As a result, the publication will be served by the
distribution.  This is different than auto-publish in pulp2.

Currently, there are no plans to support pulp2 auto-publish in pulp3.


How would one achieve the same behaviour?  Is this a big functionality loss?


By not providing auto-publish, the responsibility for publishing after 
sync just shifts to the user.  For use cases where the user is manually 
triggering the sync, a subsequent API call would be needed to also 
trigger the publish.  For scheduled sync use cases, or other cases where 
the sync is triggered through external automation, the automation could 
implement auto-publish by triggering a publish following a successful 
sync.  This seems straightforward enough and puts the responsibility 
for making the decision to publish in the hands of the user.


The decision to not provide auto-publish in the pulp 3.0 core does not 
imply that the auto-publish flow has been dismissed as insignificant.  
Instead, it is intended to promote implementations outside the core, 
because that seemed more appropriate.  If this approach proves to be a 
significant burden on the user, we've had some preliminary discussions 
on mitigation; for example, some tooling could be provided, such as libs 
for python and bash scripting.  But in the end, if users end up 
wanting/needing this back in core, that can be discussed as well.
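
As a rough illustration of what that user-side automation could look like 
(the endpoint paths and task fields below are assumptions about the REST 
API shape, not exact):

    import time
    import requests

    BASE = "http://localhost:8000"

    def wait_for(task_href):
        """Poll a task href until it reaches a final state (field names assumed)."""
        while True:
            task = requests.get(BASE + task_href).json()
            if task["state"] in ("completed", "failed", "canceled"):
                return task
            time.sleep(2)

    # "Auto-publish" outside the core: sync, then publish only on success.
    sync = requests.post(BASE + "/pulp/api/v3/remotes/file/1/sync/",
                         data={"repository": "/pulp/api/v3/repositories/1/"}).json()
    if wait_for(sync["task"])["state"] == "completed":
        requests.post(BASE + "/pulp/api/v3/publishers/file/1/publish/",
                      data={"repository": "/pulp/api/v3/repositories/1/"})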




V/r,
James Cassell




Re: [Pulp-dev] Auto-distribution

2018-11-27 Thread Jeff Ortel



On 11/27/18 3:20 PM, Jeff Ortel wrote:



On 11/27/18 8:29 AM, Austin Macdonald wrote:
Yes, and AFAIK this is already complete. There are 2 fields on the 
Distribution that allow auto-distribution. These fields must both be 
set, and when they are, new publications will automatically update 
the distribution.


The Auto-distribution feature is not the same as auto-publish in pulp2 
which automatically triggered a publish at the end of a sync.  The 
auto-distribution feature automatically makes a newly created 
publication "live" after it has been created.  This is done by 
updating distributions (per configuration) with the newly created 
publication. As a result, the publication will be served by the 
distribution.  This is different than auto-publish in pulp2.


Currently, there are no plans to support pulp2 auto-publish in pulp3.





https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/serializers/repository.py#L281-L282
https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/serializers/repository.py#L301-L302

On Tue, Nov 27, 2018 at 9:14 AM Kersom <ker...@redhat.com> wrote:


Pulp 2 has the concept of auto-publish.[0]

Are we creating something like "auto-distribution" or something
like that for Pulp 3?

I could not find any related issue.

[0]

https://docs.pulpproject.org/dev-guide/integration/rest-api/repo/cud.html?highlight=auto_publish

Regards,


Re: [Pulp-dev] Auto-distribution

2018-11-27 Thread Jeff Ortel



On 11/27/18 8:29 AM, Austin Macdonald wrote:
Yes, and AFAIK this is already complete. There are 2 fields on the 
Distribution that allow auto-distribution. These fields must both be 
set, and when they are, new publications will automatically update the 
distribution.


The Auto-distribution feature is not the same as auto-publish in pulp2 
which automatically triggered a publish at the end of a sync. The 
auto-distribution feature automatically makes a newly created 
publication "live" after it has been created.  This is done by updating 
distributions (per configuration) with the newly created publication. As 
a result, the publication will be served by the distribution.  This is 
different than auto-publish in pulp2.




https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/serializers/repository.py#L281-L282
https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/serializers/repository.py#L301-L302

On Tue, Nov 27, 2018 at 9:14 AM Kersom wrote:


Pulp 2 has the concept of auto-publish.[0]

Are we creating something like "auto-distribution" or something
like that for Pulp 3?

I could not find any related issue.

[0]

https://docs.pulpproject.org/dev-guide/integration/rest-api/repo/cud.html?highlight=auto_publish

Regards,


[Pulp-dev] SSL/OID content-guard

2018-11-26 Thread Jeff Ortel
To support content protection using an X.509 certificate containing OID 
extensions, a concrete ContentGuard needs to be developed.  The question 
is: in which plugin does this belong?  Issue #4009 [1] suggests the RPM 
plugin.  I'm not convinced that this is specific only to protecting 
RPMs.  Is it?



[1] https://pulp.plan.io/issues/4009
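
For context, a hypothetical sketch of what such a concrete guard might 
check, wherever it ends up living (class and method names are illustrative, 
not the pulpcore plugin API):

    from cryptography import x509
    from cryptography.hazmat.backends import default_backend

    class OidContentGuard:
        """Permit a request only if the client certificate carries a given OID."""

        def __init__(self, oid):
            self.oid = x509.ObjectIdentifier(oid)

        def permit(self, pem_cert_bytes):
            """Raise if the required OID extension is absent from the cert."""
            cert = x509.load_pem_x509_certificate(pem_cert_bytes, default_backend())
            try:
                cert.extensions.get_extension_for_oid(self.oid)
            except x509.ExtensionNotFound:
                raise PermissionError("required OID extension not present")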



Re: [Pulp-dev] Single-Table Content API Changes, Performance Discussion

2018-11-26 Thread Jeff Ortel



On 11/20/18 11:31 AM, Dennis Kliban wrote:
On Mon, Nov 19, 2018 at 6:20 PM Daniel Alley wrote:



Some of the API changes that are required by single-table-content
would be beneficial even if we didn't go forwards with the
modelling changes. For instance, currently we have single
endpoints for each of repository_version/.../content/,
.../added_content/, and .../removed_content/ which mix content of
all types together.  This makes it impossible for clients to
expect the data returned to conform to any particular schema.  What
the single-table-content does is to provide separate query urls
for each content type present in the repository version, which I
believe is a usability win for us, and it's something we could
implement without using any of the modelling changes.


The current behavior of the 'content' APIs is already causing a 
problem for our OpenAPI 2.0 schema. OpenAPI 2.0 does not support 
polymorphic responses. We are currently tracking this problem with a 
bug [0]. The only way to resolve this problem is to provide APIs that 
return homogeneous types.


[0] https://pulp.plan.io/issues/4052

Besides being a general update, I'd like to start a discussion to
understand:  is changing the Pulp 3 API so that it's organized
around content type URLs OK with everyone? This resolves the
usability issues of returning mixed types. Are there any downsides
with this approach?

To clarify what I mean on that last point -- by "content type
URLs" I mean that where you currently get back the url
"/pulp/api/v3/repository_version/.../content/" under the
"_content" field on a repoversion, you would instead get back
something like

{ "pulp_file.filecontent":
"/pulp/api/v3/content/file/files/?repository_version=.. }


I am +1 to making this change to our REST API.


+1





Re: [Pulp-dev] Basic Lazy Streaming for Pulp3 Ready for Rough Testing

2018-11-26 Thread Jeff Ortel
The initial planning for lazy omitted content protection for 3.0. Since 
then, we have pulled content protection back into 3.0 re: 
content-guards.  In pulp2, the content app redirected using a signed-url 
so that clients could not circumvent content protection. Currently in 
3.0, there is nothing to keep clients from circumventing content 
protection by going directly to the streamer. Isn't this a gap?



On 11/20/18 3:51 PM, Brian Bouterse wrote:
I've been developing the streamer functionality, and it's correctly 
working (in my testing) as driven from the Remote.policy attribute. It 
correctly works with 'immediate', 'on_demand', and 'cache_only'. Read 
more about the expected behaviors in the epic [0].


# Try it out!
Here is the core commit needed: https://github.com/pulp/pulp/pull/3738
Here is the streamer you should pip install from master: 
https://github.com/bmbouter/pulp_streamer
Here is what it looks like to port a plugin using DeclarativeVersion, 
e.g. pulp_file to support lazy: https://github.com/pulp/pulp_file/pull/132


You'll need to configure Pulp's webserver for streaming. I did this by 
exporting an environment var to dynaconf in the same bash environment 
as my django run server. Specifically I configured Pulp to redirect to 
port localhost:8080/streamer/ with this command:


export PULP_CONTENT='@json {"HOST": null, "WEB_SERVER": "django", 
"REDIRECT": {"ENABLED": true, "PORT": 8080, "HOST": "localhost", 
"PATH_PREFIX": "/streamer/"}}'


Then I run the streamer (after pip installed) with gunicorn which you 
also need to pip install. Run it with:


gunicorn pulpcore.streamer:server --bind localhost:8080 --worker-class 
aiohttp.GunicornWebWorker -w 2


Then sync a pulp_file repo with policy='on_demand' or 
policy='cache_only' and see how Pulp behaves.


Feedback, ideas, concerns are welcome in any form. Note this is still 
rough, and the following are known things to be done:


* fix tests to get Travis passing
* docs for the streamer and for pulpcore
* an installer role to install the streamer
* integration with squid to cache lots of data at the streamer
* transfer the pulp_streamer to the Pulp org on github
* publish an initial release to PyPI for users to use it
* write a blog post about porting to it and using it
* make a demo

[0]: https://pulp.plan.io/issues/3693

Thanks!
Brian



Re: [Pulp-dev] To integrate Fus or not to....

2018-10-12 Thread Jeff Ortel



On 10/12/2018 11:37 AM, Milan Kovacik wrote:



On Fri, Oct 12, 2018 at 5:17 PM Jeff Ortel <jor...@redhat.com> wrote:




On 10/12/2018 09:53 AM, Milan Kovacik wrote:



On Fri, Oct 12, 2018 at 3:59 PM Jeff Ortel <jor...@redhat.com> wrote:



On 10/10/2018 08:59 AM, Milan Kovacik wrote:

...that might be the question we should ask ourselves once
again when it comes to recursive copying of units between
repositories.

I'd like to poll folks opinions about the possibilities that
we may have when it comes to integrating third party solvers
in Pulp. My yesterday's chat with the #fedora-modularity
folks about us integrating the Fus[1] solver in order to
reuse the Fus algorithm ran into a couple of bumps:

* it would be laborious to create a programmatic Python API
between Fus and Pulp because we can't directly use the
libsolv thingies (pools, solvables and friends) in such an
API because Fus is written utilizing GObject, which is
incompatible with Swig, which in turn is used in libsolv to
expose the python bindings. One would have to either re-wrap
libsolv code in Fus to work with pygobject or submit PRs
against libsolv to support GObject introspection. I dunno
the details of either approach (yet) but from the sad faces
on the IRC and the Fus PR[1] it seemed like a lot of work
but it's still an option

* we still should be able to integrate thru a pipe into Fus,
that would make it possible to dump modular and ursine
metadata into Fus to perform the dependency solving in a
separate subprocess. We should probably re-check the reasons
behind our previous decision not to do the same with DNF[2].


How is integration with Fus via pipe (CLI) easier than with
gobject?  Either way, you "can't directly use the libsolv
thingies (pools, solvables and friends)".  Right?  What am I
missing?


Right, a publish-like operation would be required every time, for
all repositories involved in the copy, to dump the metadata to the
pipe(s); a sample of this interface can be found in Pungi:
https://pagure.io/pungi/blob/master/f/pungi/wrappers/fus.py (the
"query" is passed thru the command line).
I just learnt Fedora will keep modules and their ursine deps in
separate repos, so the source repo won't necessarily be closed on
dependencies, thus multiple source repos would be needed.
I just learnt Fedora will keep modules and their ursine deps in
separate repos, so the source repo won't necessarily be closed on
dependencies thus multiple source repos would be needed.


Can this be done using the Fus gobject interface as well?


we'd just dump the XML (and YAML) metadata and run: fus --repo 
source1,1,/path/to/pipe1 --repo source2,2,/path/to/pipe2 --repo 
target,system,/path/to/target_pipe  "module(walrus)" "penguin:1-2.3" etc

then parse the textual output of fus such as:


Can't this ^ be done with Fus through gobject as well, inspecting the 
objects returned instead of parsing textual output?




# ---%>-
  - nothing provides policycoreutils-python-utils needed by 
container-selinux-2:2.69-3.git452b90d.module_2040+0e96cf1b.noarch

Problem 1 / 1:
  - conflicting requests
  - nothing provides libpthread.so.0(GLIBC_2.2.5)(64bit) needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64
  - nothing provides libc.so.6(GLIBC_2.2.5)(64bit) needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64
  - nothing provides libpthread.so.0(GLIBC_2.3.2)(64bit) needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64
  - nothing provides /bin/bash needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64
  - nothing provides /usr/bin/python3 needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64
  - nothing provides python3-dateutil needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64
  - nothing provides dbus needed by 
atomic-1.22.1-2.module_1637+1872e86a.x86_64

# >%--
(fus:8524): fus-WARNING **: 15:13:09.350: Can't resolve all solvables
module:docker:2017.0:20180816194539:3ff668f0.x86_64@f29
module:container-tools:2017.0:20180816194450:80bd9113.x86_64@f29
*docker-devel-2:1.13.1-61.git9cb56fd.module_2109+7c83ead1.noarch@f29
*containers-common-0.1.31-14.dev.gitb0b750d.module_2040+0e96cf1b.x86_64@f29
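
A rough subprocess sketch of that pipe-based flow (paths, the solvable 
spec, and the output handling are illustrative only):

    import subprocess

    # The dumped metadata would be written to the named pipes as fus reads them.
    cmd = [
        "fus",
        "--repo", "source1,1,/path/to/pipe1",
        "--repo", "target,system,/path/to/target_pipe",
        "module(walrus)",
    ]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    # A real integration would parse fus's textual output (like the sample
    # above) rather than just printing it.
    print(proc.stdout)
    print(proc.stderr)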





* we should be able to extend current libsolv solver in
Pulp, reimplementing the algorithm from Fus. This might be
as laborious as the first option. It would probably give us
more flexibility as well as more room for screwing things up
but the responsibility would be ours alone.

Please let me know what option seems more appealing to you;
other option suggestion are welcome  too.

Cheers,
milan

[1] https://github.com/fedora-modularity/fus/pull/46
[2] https://pulp.plan.io/issues/3528#note-7



Re: [Pulp-dev] To integrate Fus or not to....

2018-10-12 Thread Jeff Ortel



On 10/12/2018 09:53 AM, Milan Kovacik wrote:



On Fri, Oct 12, 2018 at 3:59 PM Jeff Ortel <jor...@redhat.com> wrote:




On 10/10/2018 08:59 AM, Milan Kovacik wrote:

...that might be the question we should ask ourselves once again
when it comes to recursive copying of units between repositories.

I'd like to poll folks opinions about the possibilities that we
may have when it comes to integrating third party solvers in
Pulp. My yesterday's chat with the #fedora-modularity folks about
us integrating the Fus[1] solver in order to reuse the Fus
algorithm ran into a couple of bumps:

* it would be laborious to create a programmatic Python API
between Fus and Pulp because we can't directly use the libsolv
thingies (pools, solvables and friends) in such an API because
Fus is written utilizing GObject, which is incompatible with
Swig, which in turn is used in libsolv to expose the python
bindings. One would have to either re-wrap libsolv code in Fus to
work with pygobject or submit PRs against libsolv to support
GObject introspection. I dunno the details of either approach
(yet) but from the sad faces on the IRC and the Fus PR[1] it
seemed like a lot of work but it's still an option

* we still should be able to integrate thru a pipe into Fus, that
would make it possible to dump modular and ursine metadata into
Fus to perform the dependency solving in a separate subprocess.
We should probably re-check the reasons behind our previous
decision not to do the same with DNF[2].


How is integration with Fus via pipe (CLI) easier than with
gobject?  Either way, you "can't directly use the libsolv thingies
(pools, solvables and friends)". Right?  What am I missing?


Right, a publish-like operation would be required every time, for all 
repositories involved in the copy to dump the metadata to the pipe(s); 
a sample of this interface can be found in Pungi: 
https://pagure.io/pungi/blob/master/f/pungi/wrappers/fus.py the 
"query" is passed thru command line.
I just learnt Fedora will keep modules and their ursine deps in 
separate repos, so the source repo won't necessarily be closed on 
dependencies thus multiple source repos would be needed.


Can this be done using the Fus gobject interface as well?



* we should be able to extend current libsolv solver in Pulp,
reimplementing the algorithm from Fus. This might be as laborous
as the first option. It would probably give us more flexibility
as well as more room for screwing things up but the
responsibility would be ours alone.

Please let me know what option seems more appealing to you; other
option suggestion are welcome  too.

Cheers,
milan

[1] https://github.com/fedora-modularity/fus/pull/46
[2] https://pulp.plan.io/issues/3528#note-7




[Pulp-dev] content guards

2018-10-12 Thread Jeff Ortel
I'd like to get broader input on #3968, from this comment onward: 
https://pulp.plan.io/issues/3968#note-14.


I don't think this falls within the classic discussion of thin/fat 
models.  Mainly because the methods included in what is considered a 
*fat* model by django [1] are still confined within the concerns of the 
data model.  We already have a few *fat* models such as 
RepositoryVersion.  What's being discussed for content-guards (and 
previously discussed for sync and publish) is different.



[1] https://django-best-practices.readthedocs.io/en/latest/applications.html#make-em-fat


Re: [Pulp-dev] To integrate Fus or not to....

2018-10-12 Thread Jeff Ortel



On 10/10/2018 08:59 AM, Milan Kovacik wrote:
...that might be the question we should ask ourselves once again when 
it comes to recursive copying of units between repositories.


I'd like to poll folks opinions about the possibilities that we may 
have when it comes to integrating third party solvers in Pulp. My 
yesterday's chat with the #fedora-modularity folks about us 
integrating the Fus[1] solver in order to reuse the Fus algorithm ran 
into a couple of bumps:


* it would be laborious to create a programmatic Python API between Fus 
and Pulp because we can't directly use the libsolv thingies (pools, 
solvables and friends) in such an API because Fus is written utilizing 
GObject, which is incompatible with Swig, which in turn is used in 
libsolv to expose the python bindings. One would have to either 
re-wrap libsolv code in Fus to work with pygobject or submit PRs 
against libsolv to support GObject introspection. I dunno the details 
of either approach (yet) but from the sad faces on the IRC and the Fus 
PR[1] it seemed like a lot of work but it's still an option


* we still should be able to integrate thru a pipe into Fus, that 
would make it possible to dump modular and ursine metadata into Fus to 
perform the dependency solving in a separate subprocess. We should 
probably re-check the reasons behind our previous decision not to do 
the same with DNF[2].


How is integration with Fus via pipe (CLI) easier than with gobject?  
Either way, you "can't directly use the libsolv thingies (pools, 
solvables and friends)".  Right?  What am I missing?




* we should be able to extend current libsolv solver in Pulp, 
reimplementing the algorithm from Fus. This might be as laborious as 
the first option. It would probably give us more flexibility as well 
as more room for screwing things up but the responsibility would be 
ours alone.


Please let me know what option seems more appealing to you; other 
option suggestion are welcome  too.


Cheers,
milan

[1] https://github.com/fedora-modularity/fus/pull/46
[2] https://pulp.plan.io/issues/3528#note-7




Re: [Pulp-dev] Changeset code in Pulp 3

2018-10-11 Thread Jeff Ortel
Looks like we're fully invested in stages.  I don't think it makes sense 
to maintain both.  We can always resurrect it later (in some form) as 
needed.


On 10/10/2018 01:17 PM, David Davis wrote:
As part of the upcoming RC release, there was a question as to whether 
the Changeset code could be removed. AFAIK, there is only one plugin 
still using it (pulp_ansible), although there’s a ticket to update it 
to use the Stages code. I wanted to ask though if we were planning to 
keep the Changeset code in Pulp 3?


David




[Pulp-dev] URL Word Separators

2018-09-17 Thread Jeff Ortel

What is the project policy on word separators in URLs?

My take on 3 most common options:

1. Words run together are hard to read. Example: /contentguard/
2. Hyphens are easy to type and read; most common and recommended based 
on my limited search. Example: /content-guard/
3. Underscores strike me as odd outside of programming languages, and 
harder to type. Example: /content_guard/


Does django have a recommendation/limitation?
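
AFAIK django itself imposes no limitation here: a URLconf route is a 
free-form string, so hyphens work fine. An illustrative snippet:

    from django.http import HttpResponse
    from django.urls import path

    def content_guard_detail(request, pk):
        return HttpResponse("content guard %s" % pk)

    urlpatterns = [
        path("content-guards/<int:pk>/", content_guard_detail),  # hyphens OK
    ]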

Thoughts?





[Pulp-dev] Triage of #3915

2018-09-11 Thread Jeff Ortel
Issue 3915 has been skipped several times during triage with a request 
for discussion on the issue.  I feel this issue identifies a serious 
concern.  Investigating it has also raised questions about our approach 
to supporting bulk_create() of Artifact.


Please review and comment before the next triage.

Thanks!

https://pulp.plan.io/issues/3915



Re: [Pulp-dev] Proposal to drop support of Python 3.5 for Pulp 3

2018-09-11 Thread Jeff Ortel

+1

On 09/07/2018 01:09 AM, Simon Baatz wrote:

I had a discussion on IRC with Brian yesterday which led to the
question whether we can drop support for Python 3.5. I think there are
good reasons for this, see the rationale below.

Brian proposed to initiate a vote on this topic (and find out whether
this "community thing" works :-) ).

Please send feedback by Friday Sept 14th. Especially, let me know if
there are specific reasons for depending on Python 3.5. The
corresponding issue is 3984 [7].

Cheers,
Simon


Rationale:

The trigger for the discussion was to get rid of boilerplate code like
this [0], [1] to handle batches in the stages API. This becomes a
single line [2] when using an asynchronous generator [3]. Adding the
`batches()` async generator to Pulp core would simplify existing
stages and ease implementation of stages in plugins.
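
A rough sketch of such a generator (the name matches the proposal; 
treating the queue as an asyncio.Queue terminated by a None sentinel is 
an assumption):

    async def batches(in_q, minsize=50):
        """Read from an asyncio.Queue and yield items in lists of ~minsize."""
        batch = []
        while True:
            item = await in_q.get()
            if item is None:  # assumed end-of-stream sentinel
                break
            batch.append(item)
            if len(batch) >= minsize:
                yield batch
                batch = []
        if batch:
            yield batch

A stage body then reduces to: async for batch in batches(in_q): ...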

Async generators have been introduced in Python 3.6. Thus, to make the
`batches` generator available in the Pulp core plugin API, we either

- have to drop support for Python 3.5 or

- reimplement the async generator as an async iterator (which would be
   more convoluted but looks doable)


I prefer to drop 3.5, since this will allow us to use additional language
features [4]. Among them:

- As said, async generators/async comprehensions. Async generators are
   simpler to write and understand than async iterators.

- String interpolation "f-Strings" [5]

- dict objects preserve insertion-order (officially declared part of
   the language with Python 3.7). Eliminates a source of subtle
   "works on 3.6, sometimes works on 3.5" bugs.

- One version less to support is always a good thing (provided nobody
   really requires it)

- Type annotations are currently not used by the Pulp project, but if
   the project decides to use them in the future: IMHO type annotations
   (which are great btw.) began to feel “right” with 3.6. Working with
   them in 3.5 can be clumsy at times.

- And of course: [6]


Python 3.6 has the OS/distribution support we need:

- Python 3.6 SCL is available for RHEL 7 / CentOS 7
- It is part of Fedora as of Fedora 26

For Ubuntu, it is part of 18.04 LTS. Debian does not have Python 3.6 in stable 
yet.



[0] 
https://github.com/pulp/pulp/blob/631031e38270c5c7c2b2289ff4ab87a058447c5e/plugin/pulpcore/plugin/stages/content_unit_stages.py#L47-L59
[1] 
https://github.com/pulp/pulp/blob/631031e38270c5c7c2b2289ff4ab87a058447c5e/plugin/pulpcore/plugin/stages/artifact_stages.py#L48-L60
[2] 
https://github.com/gmbnomis/pulp_cookbook/blob/ca4882cecab16995c5713d27131da8112a5f5a0c/pulp_cookbook/app/tasks/synchronizing.py#L98
[3] 
https://github.com/gmbnomis/pulp_cookbook/blob/d44ed593925b78c046e1b568810b15acbdad5ac4/pulp_cookbook/app/tasks/synchronizing.py#L26
[4] https://docs.python.org/3/whatsnew/3.6.html
[5] 
https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals
[6] https://twitter.com/raymondh/status/844955415259463681
[7] https://pulp.plan.io/issues/3984

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Settings merge proposal.

2018-08-21 Thread Jeff Ortel
This issue[1] highlighted a shortcoming in how server.yaml is applied to 
settings.py.  Mainly that the merge algorithm does not support 
removing unwanted properties such as the logging handlers.  The change 
proposed on #3879 is to replace entire top-level properties (trees) 
instead of doing a fine-grained merge.  The "logging" section is somewhat 
unique and the original problem has been mitigated by #3883 [2], which 
changed the logging default to CONSOLE.
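To illustrate the difference (a sketch only, not the actual settings 
loader): the current behavior recurses into nested dicts, so keys can be 
added or overridden but never removed, while the #3879 proposal swaps 
whole top-level trees:

    def fine_grained_merge(defaults, overrides):
        # Current behavior: recursive merge; cannot drop unwanted keys
        # such as an inherited logging handler.
        merged = dict(defaults)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = fine_grained_merge(merged[key], value)
            else:
                merged[key] = value
        return merged

    def top_level_replace(defaults, overrides):
        # #3879 proposal: a top-level key in server.yaml replaces the
        # entire default tree under that key.
        return {**defaults, **overrides}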


Do we still want/need to change the way server.yaml is applied to 
settings.py as proposed on #3879?  There are pros/cons to either 
approach and I thought it should be discussed before moving forward.



[1] https://pulp.plan.io/issues/3879
[2] https://pulp.plan.io/issues/3883 
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Requiring 2FA in Github

2018-08-20 Thread Jeff Ortel

+1

On 08/15/2018 01:10 PM, David Davis wrote:
Thanks everyone for the feedback. I have opened a PR for PUP-7 which 
(if approved) will require 2FA for the Pulp organization in Github:


https://github.com/pulp/pups/pull/14

Feedback welcome. Also, I'd like to call for a vote by August 27, 
2018. Per PUP-1[0], the voting options are:


+1: "Will benefit the project and should definitely be adopted."
+0: "Might benefit the project and is acceptable."
-0: "Might not be the right choice but is acceptable."
-1: "I have serious reservations that need to be thought through and 
addressed."


[0] https://github.com/pulp/pups/blob/master/pup-0001.md

David


On Wed, Aug 1, 2018 at 3:00 PM David Davis wrote:


+1 to opening a PUP. Seems like that’s the best way to document
the policy. I will start working on this.

David


On Mon, Jul 30, 2018 at 2:21 PM Brian Bouterse
<bbout...@redhat.com> wrote:

+1 to requiring it. I also already have it enabled. Would it
be possible to either (a) turn this into a short pup and call
for a vote or (b) add a date to close this email thread
decision by?

Let me know if I should help write/review any.

On Sat, Jul 28, 2018 at 6:09 AM, Tatiana Tereshchenko
<ttere...@redhat.com> wrote:

+1, enabled.

On Fri, Jul 27, 2018 at 12:02 AM, Dennis Kliban
<dkli...@redhat.com> wrote:

+1, but I already have it enabled.

On Thu, Jul 26, 2018 at 3:53 PM, David Davis
<davidda...@redhat.com> wrote:

I got a notification from another organization I
am a member of on Github[0] that they are going to
require Two Factor Authentication[1] in response
to recent news about some malicious code being
shipped in a compromised npm package[2].

We are vulnerable to having malicious code
deployed to PyPI if one of our Github accounts is
compromised. Thus, I wonder if we should also
require that people with a commit bit have Two
Factor Authentication enabled.

Thoughts?

[0]

https://community.theforeman.org/t/require-2fa-for-github-organization-members/10404
[1]

https://help.github.com/articles/requiring-two-factor-authentication-in-your-organization/
[2]
https://www.theregister.co.uk/2018/07/12/npm_eslint/

David

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-08-13 Thread Jeff Ortel



On 08/07/2018 11:47 AM, Jeff Ortel wrote:
After long consideration, I have had a change of heart about this.  I 
think.  In short, Pulp's data model has unique requirements that make 
it acceptable to deviate from common convention regarding ID as the 
PK.  Mainly that the schema is extensible by plugin writers.  Given 
the plugin architecture, I think it's reasonable to think of "core" 
fields like: ID, CREATED and LAST_MODIFIED as metadata.  Although the 
ID is harder to fit in this semantic, I think it's reasonable to do 
for consistency and to support the user query use-case re: content 
having a natural ID attribute. Taking this further, the /href/ 
attributes /could/ be thought of in the same category.


With this in mind, I'm thinking that the leading underscore (_) could 
be used broadly to denote /generated/ /or metadata/ fields and the 
following would be reasonable:


_id
_created
_last_updated


I'm convinced that all tables should have _created.  Knowing when 
something is created helps fulfill many common use cases and is 
essential for troubleshooting.  I am open to including _last_updated 
only on mutable entities.  Depending on the number (ratio) of 
mutable/immutable entities, we could support this with either an 
additional Model class, eg: MutableModel, or just add _last_updated on 
concrete models.  Either way, the column (attribute) needs to be named 
consistently.
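A sketch of how that could look (class and field names as floated above, 
not committed code):

    from django.db import models

    class Model(models.Model):
        """Base for all tables: records when each row was created."""
        _created = models.DateTimeField(auto_now_add=True)

        class Meta:
            abstract = True

    class MutableModel(Model):
        """Opt-in base for mutable entities only."""
        _last_updated = models.DateTimeField(auto_now=True)

        class Meta:
            abstract = True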




_href
_added_href
_removed_href
_content_href

I highly value consistency so this applies to the entire schema.

This will introduce a few fairly odd things into the schema that I 
/suppose/ we can live with.


- Two fields on /some/ tables named (ID, _ID).  To mitigate 
confusion, we should serialize the *pk* and not *_id*.  This will also 
be consistent with *pk* parameters passed in.
- I expect django will generate foreign key fields with double 
underscores.  Eg: content__id


I'm still -1 for using a /pulp_/ prefix.

Thoughts?


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Revisit: sync modes

2018-08-09 Thread Jeff Ortel



On 08/09/2018 01:29 PM, Daniel Alley wrote:


It's possible we could want additional sync_modes in the future.
To me, sync mode deals with the contents of the repo during the
sync. There are other ways you would want to have a sync associate
content with a repository. Consider a retention behavior that
retains 5 versions of each unit, e.g. rpms, ansible modules, etc;
that behavior is somewhere in between mirror and additive. If we
make mirror a boolean then to introduce this retention feature we
would have to add it as an additional option. This creates the
downside I hope to avoid which is that interaction between options
becomes complicated.

For example, a call with both (mirror=False, retention=True) now
becomes more complicated to think about. Is it mirroring or using
the retention policy? How do these interact? At that point, it
seems more complicated than what we have now. The way to avoid
this is by keeping them together as one option, but that can only
be done if it stays as a string.


These are all good points but I think "retention" would likely need to 
be a configurable parameter, probably one that you would have to pass 
in.  The default value could mean "unlimited retention", i.e.  "additive".


So what you could do is:

(mirror=False)               # normal additive mode, retain everything;
                             # say default retention=0 maps to this

(mirror=False, retention=5)  # retain at most 5 versions of any given unit

(mirror=False, retention=1)  # *almost* like mirror mode, except that you
                             # would still keep one historical copy of units
                             # no longer present in the upstream repository


Maybe it even makes sense to have retention be able to modify "mirror" 
mode, although this would make the concept of "mirror" more difficult 
to understand as you point out.  Maybe we could find a name that would 
be less misleading.


(mirror=True, retention=5)   # retain at most 5 versions of any given
unit, /but purge units that are no longer present in the upstream
repo entirely/



This ^^ matches what I was thinking as well.



I don't have a specific use case in mind for that one, but maybe 
someone can think of one?



On Thu, Aug 9, 2018 at 12:53 PM, Brian Bouterse <bbout...@redhat.com> wrote:


It's possible we could want additional sync_modes in the future.
To me, sync mode deals with the contents of the repo during the
sync. There are other ways you would want to have a sync associate
content with a repository. Consider a retention behavior that
retains 5 versions of each unit, e.g. rpms, ansible modules, etc;
that behavior is somewhere in between mirror and additive. If we
make mirror a boolean then to introduce this retention feature we
would have to add it as an additional option. This creates the
downside I hope to avoid which is that interaction between options
becomes complicated.

For example, a call with both (mirror=False, retention=True) now
becomes more complicated to think about. Is it mirroring or using
the retention policy? How do these interact? At that point, it
seems more complicated than what we have now. The way to avoid
this is by keeping them together as one option, but that can only
be done if it stays as a string.

On Thu, Aug 9, 2018 at 9:04 AM, Milan Kovacik <mkova...@redhat.com> wrote:



On Wed, Aug 8, 2018 at 7:54 PM, Jeff Ortel <jor...@redhat.com> wrote:

I'm not convinced that /named/ sync mode is a good
approach.  I doubt it will ever be anything besides
(additive|mirror) which really boils down to mirror (or
not).  Perhaps the reasoning behind a /named/ mode is that
it is potentially more extensible in that the API won't be
impacted when a new mode is needed. The main problem with
this approach is that the mode names are validated and
interpreted in multiple places. Adding another mode will
require coordinated changes in both the core and most
plugins.  Generally, I'm an advocate of named things like
/modes/ and /policies/ but given the orthogonal nature of
the two modes we currently support _and_ that no /real/ or
anticipated use cases for additional modes are known, I'm
not convinced it's a good fit.  Are there any /real/ or
anticipated use cases I'm missing?


Looking at the code[1] we're actually talking about almost a
(pipeline) factory that has exactly 2 modes of operation with
limited possibilities of extending; unsure that the
possibility to extend was a goal though.
Moreover 

[Pulp-dev] Revisit: sync modes

2018-08-08 Thread Jeff Ortel
I'm not convinced that /named/ sync mode is a good approach. I doubt it 
will ever be anything besides (additive|mirror) which really boils down 
to mirror (or not).  Perhaps the reasoning behind a /named/ mode is that 
it is potentially more extensible in that the API won't be impacted when 
a new mode is needed.  The main problem with this approach is that the 
mode names are validated and interpreted in multiple places. Adding 
another mode will require coordinated changes in both the core and most 
plugins.  Generally, I'm an advocate of named things like /modes/ and 
/policies/ but given the orthogonal nature of the two modes we currently 
support _and_ that no /real/ or anticipated use cases for additional 
modes are known, I'm not convinced it's a good fit. Are there any /real/ 
or anticipated use cases I'm missing?


I propose we replace the (str)sync_mode="" with (bool)mirror=False 
anywhere stored or passed.
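In other words (illustrative signatures, not actual plugin API):

    # Before: a validated, string-valued mode interpreted by core and
    # by most plugins.
    def synchronize(remote, repository, sync_mode='additive'):
        ...

    # After: a plain boolean anywhere it is stored or passed.
    def synchronize(remote, repository, mirror=False):
        ...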



Thoughts?

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Installing RQ

2018-08-07 Thread Jeff Ortel
It has been my experience that /usr/bin/rq is only installed by 'pip 
install rq' and it's not installed by 'pip3 install rq'.


The only work around I have found is to:

1. pip install rq
2. pip3 install rq
3. edit /usr/bin/rq to use python3 (sketched below).
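For reference, the generated /usr/bin/rq is a thin entry-point wrapper, so 
step 3 amounts to pointing its first line at python3.  A sketch of the 
edited script (assuming rq's console entry point is rq.cli:main):

    #!/usr/bin/python3
    import sys
    from rq.cli import main

    if __name__ == '__main__':
        sys.exit(main())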

How is this handled in the vagrant environment?  It's not obvious to me 
looking at pulp-devel.


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-08-07 Thread Jeff Ortel
After long consideration, I have had a change of heart about this. I 
think.  In short, Pulp's data model has unique requirements that make it 
acceptable to deviate from common convention regarding ID as the PK.  
Mainly that the schema is extensible by plugin writers. Given the plugin 
architecture, I think it's reasonable to think of "core" fields like: 
ID, CREATED and LAST_MODIFIED as metadata. Although the ID is harder to 
fit in this semantic, I think it's reasonable to do for consistency and 
to support the user query use-case re: content having a natural ID 
attribute.  Taking this further, the /href/ attributes /could/ be thought 
of in the same category.


With this in mind, I'm thinking that the leading underscore (_) could be 
used broadly to denote /generated/ /or metadata/ fields and the 
following would be reasonable:


_id
_created
_last_updated

_href
_added_href
_removed_href
_content_href

I highly value consistency so this applies to the entire schema.

This will introduce a few fairly odd things into the schema that I 
/suppose/ we can live with.


- Two fields on /some/ tables named (ID, _ID).  To mitigate confusion, 
we should serialize the *pk* and not *_id*. This will also be consistent 
with *pk* parameters passed in.
- I expect django will generate foreign key fields with double 
underscores.  Eg: content__id


I'm still -1 for using a /pulp_/ prefix.

Thoughts?


On 06/18/2018 01:15 PM, Daniel Alley wrote:
I'm -1 on going the underscore idea, partly because of the 
aforementioned confusion issue, but also partly because I've 
noticed that in our API, the "underscore" basically has a semantic 
meaning of "href, [which is] generated on the fly, not stored in the db".


Specifically:

  * '_href'
  * '_added_href'
  * '_removed_href'
  * '_content_href'

So I think if we use a prefix, we should avoid using one that already 
has a semantic meaning (I don't know whether we actually planned for 
that to be the case, but I think it's a useful pattern / distinction 
and I don't think we should mess with it).


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Branch protection

2018-07-10 Thread Jeff Ortel

+1

On 07/10/2018 02:30 PM, David Davis wrote:
We noticed in Pulp that the 2-master branch has branch protection but 
only to prevent force pushes and deletion. I was wondering if we 
should also add these checks:


- Require an approving review
- Require status checks (e.g. unit tests, docs test, flake8)

If so, I think we should also do this for all master and 2-master 
branches for all Pulp core repos (where applicable). Does anyone have 
any thoughts or objections?


I’ll leave this discussion open until July 22, 2018.

David


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Task groups in the Pulp 3 MVP

2018-07-03 Thread Jeff Ortel

+1

On 03/08/2018 05:07 PM, Dennis Kliban wrote:

+1 but we should also remove this[0] from the code.

[0] 
https://github.com/pulp/pulp/blob/3.0-dev/pulpcore/pulpcore/app/models/task.py#L215


On Thu, Mar 8, 2018 at 5:45 PM, Brian Bouterse wrote:


+1 to removing it from the MVP area. There is already a gap in
several ways with Katello and I think this was one of them.
Additional input from other would be good.

On Thu, Mar 8, 2018 at 4:59 PM, David Davis <davidda...@redhat.com> wrote:

There’s a section of the MVP doc about task groups:


https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product#Task-Group



I haven’t heard any talk about supporting them in the MVP.
Should we remove this section?

David

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Branching Pulp 2 for plugins

2018-07-02 Thread Jeff Ortel

pulp-ostree has been updated.

- 2-master created from master
- master reset to 3.0-dev
- 3.0-dev deleted

On 07/02/2018 09:47 AM, David Davis wrote:
In order to conform to the pulp/pulp repository, I propose we update 
our branches for our plugins. This would include:


1. Moving master to 2-master
2. Moving 3.0-dev to master (and removing 3.0-dev)
3. Letting @pcreech know that the branches have changed

I was thinking we could do aim to do so by July 9th. Here are the 
plugins we need to update and some volunteers I have picked randomly 
from a hat to handle updating the branches:


- puppet - @bizhang
- rpm - @daviddavis
- python - @bizhang
- docker - @dkliban
- ostree - @jortel

I’m not sure what to do about our debian plugin though since it 
doesn’t have a 3.0-dev branch.


Any feedback is welcome. Thanks.

David


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pagination in Pulp

2018-06-26 Thread Jeff Ortel



On 06/26/2018 08:48 AM, Dennis Kliban wrote:
The user should be able to specify a page size at request time. The 
user should also be able to specify which page they are requesting.


+1


On Tue, Jun 26, 2018 at 9:31 AM, David Davis wrote:


I was looking at the pagination code this morning and there were
two things I wanted to discuss.

First, there’s no way to override the number of results per
request. Instead, page size has to be configured for the whole
app. Allowing users to override page size is trivial[0] so I
wonder if we should enable it.

The second topic is a bit more complex. We currently use cursor
based pagination where pages must be fetched sequentially as
opposed to the default DRF pagination method of using page
numbers. Cursors work great for large data sets as you don’t have
to figure out things like the number of pages.

The first problem is that in Pulp we parallelize web requests for
things like fetching metadata. See our Ansible plugin as an
example[1]. If we want to support things like syncing content from
one Pulp server to another, we probably have to use
offset/page-based pagination for certain endpoints.

Another consideration is Katello. In Katello’s UI they show the
number of pages and allow users to jump to arbitrary pages or the
last page. If we want Katello to stop indexing Pulp data and
instead query Pulp directly, we’ll need to allow them to use page
numbers somehow.

Thoughts?

[0]
https://gist.github.com/daviddavis/56a0b86629cd675d57aac61583c01944

[1]

https://github.com/pulp/pulp_ansible/blob/master/pulp_ansible/app/tasks/synchronizing.py#L147-L171
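For context, the override in [0] is along these lines: a sketch (class 
name illustrative; page_size_query_param exists on both the cursor and 
page-number pagination classes in DRF):

    from rest_framework.pagination import PageNumberPagination

    class PulpPagination(PageNumberPagination):
        page_size = 100                      # application-wide default
        page_size_query_param = 'page_size'  # per-request client override
        max_page_size = 1000                 # cap to protect the server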



David

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-06-14 Thread Jeff Ortel



On 06/14/2018 12:19 PM, Jeff Ortel wrote:



On 06/14/2018 10:37 AM, Daniel Alley wrote:
I will make one more suggestion.  What about naming "id" -> "uuid"?  
This carries the clear connotation that it is a unique identifier so 
it is less likely to be confusing a la "id and _id", and is still 
less likely to have a namespace conflict.


Appreciate the suggestion but this would only be marginally less 
confusing.


Reconsidering this suggestion for the reasons you outlined.

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-06-14 Thread Jeff Ortel

Thanks for your comment, Simon.

This introduces a perspective that is helpful to the discussion.  
Filtering on an 'ID' natural key field (such as errata_ID) in a way that 
is intuitive to the user is a significant use case.


On 06/14/2018 12:32 PM, Simon Baatz wrote:

My 2 cents (in my role as a user, not plugin writer): I think the most
important argument in the entire discussion is this (not sure who
said this):


* plugin users (not writers) who are familiar with 'id' as part of the
erratum data type would then have to also understand this field name
renaming that Pulp arbitrarily introduces. This could get confusing
when the user submits a filter with id='ID-2115858' and they find
nothing because 'id' is matching on the primary key not on the 'id'
attribute of the errata like they expect. Those users would also be
Pulp users so they'll understand that _id means the pk.

By the same logic, if Pulp users know that id means pk, wouldn’t they
therefore understand that the id is not the erratum id?

Yes by that logic they probably would know, but the actual errata field
is named 'id', so it's more about a correctness problem than
confusion. A correctness problem that passes along to users. If we're
going to have confusing names, let's pick names that allow for
alignment with the names already chosen by content types which commonly
do use 'id'. Plugin writers aren't in control of those names; they
already are chosen by content types.


That Pulp users are aware of a pk named 'id' is a strong
assumption.  If the user is just managing entire repositories and
searches content from time to time when troubleshooting (using a CLI
for example), she/he could not care less that there is a field called
"id" that is not what it seems to be.

I think the entire discussion is focused on plugin writers too much.
The user visible consequences of this decision are more important from
my point of view.

The situation is not directly comparable, but I already had fun with
confusing id names [0] in the CLI.  I must have been rather annoyed
at the time, since I still remember ;-)


[0] https://www.redhat.com/archives/pulp-list/2016-March/msg00048.html

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-06-14 Thread Jeff Ortel

On 06/14/2018 08:08 AM, Brian Bouterse wrote:
Jeff, can you elaborate more on your -1. I want to understand it. I'm 
struggling to appreciate an "it's a convention" argument without 
sources like an RFC or similar. I don't believe internet articles are 
credible sources because any viewpoint can be validated by an internet 
post.


RFCs typically define standards, not conventions. Agreed on internet 
articles being available to support most any viewpoint. FWIW, I didn't 
introduce the aforementioned article.  Conventions are typically 
established through example.  IMHO, most articles, tutorials, textbooks, 
etc. use ID (or TABLE_ID) for the primary key. Also, this convention has 
been applied on /every/ project I have worked on.




To recap my interests here, it's about being responsive to the 
community. We ask plugin writers for feedback and from two independent 
plugin writers (not me) we received feedback that this name wasn't 
ideal. I want us to be responsive to that. It's not only because I 
think their technical feedback is legit (albeit small), but also 
because it's our strategy during the beta/RC of Pulp3 core is to make 
adjustments based on plugin writer feedback. To receive feedback and 
choose to not follow the recommendation they suggested feels like not 
the way I want to interact with plugin writers. This is my main 
concern with not making a change in this area.


I am sensitive to plugin writer requests but changing the name of the 
primary key field for every table in the core because 2 plugin writers 
said that it "wasn't ideal" seems rash.  I'm not convinced that this is 
a correctness concern but rather a minor inconvenience for what seems 
like (so far) a small percentage of plugins.  Plugin writers will always 
need to contend with naming conflicts and I believe the plugin is the 
proper place to resolve them.  I also want to be responsive to feedback 
but I think it's reasonable for the answer to be "no" when the request 
is not in the best interest of the project as a whole.


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-06-13 Thread Jeff Ortel



On 06/12/2018 05:03 PM, David Davis wrote:
I do think the most compelling case for renaming the field is having 
feedback from plugin writers to do so and also the desire to reduce 
complexity for plugin writers. Honestly, I am on the fence about 
renaming the field.


Just to clarify, is anyone a hard -1 on renaming id?


-1




David

On Tue, Jun 12, 2018 at 5:32 PM, Brian Bouterse <bbout...@redhat.com> wrote:




On Tue, Jun 12, 2018 at 5:11 PM, David Davis
<davidda...@redhat.com> wrote:

On Tue, Jun 12, 2018 at 4:50 PM, Brian Bouterse
<bbout...@redhat.com> wrote:

Silly question, but could we just call our 'id' 'pk'
instead? Since that is a fully reserved value in Django
for the primary key it seems clearest to just use that?
What about that?


Are you recommending we rename the id field to pk in the
database? I’m not sure if that would work.


I'm wondering if its possible yes. #django says it is but they've
been wrong before. I haven't had a chance to test it.


On Tue, Jun 12, 2018 at 3:44 PM, Jeff Ortel
<jor...@redhat.com> wrote:

On 06/08/2018 02:57 PM, Brian Bouterse wrote:


@jortel: We're blocked on your -1 vote expressed
for 3704. We have practical plugin writer issues
with the current state. Can you elaborate on why
we shouldn't go forward with
https://pulp.plan.io/issues/3704


The 'ID' column is reserved for the primary key and is
inappropriate for natural keys.  This is well-established
convention and best practice.



I don't understand this reasoning. Earlier in the thread
we discussed how the sources recommending these
conventions also mention that if we have a practical
reason or problem with that convention to do something
differently. We received complaints on this name about
collisions so I don't follow how we should still follow
the convention.

Plugin writers specify natural keys.  Also, introducing a
'_' prefix (or any prefix) means a table
could have both 'ID' and '_ID' columns, which is
especially confusing since the 'ID' column would not
be the primary key.


We have two concepts here that are similar, so I think
that problem is mostly unrelated to this decision. For
example, if we leave the names as-is we have this problem
only now it's named id and errata_id and in addition we'll
have the problems listed below.


How does naming the natural key for an rpm as 'rpm_id'
cause a significant problem for plugin writers?


It's a good question because it's the whole motivation for
this change. It's not an rpm, it's an erratum which
doesn't have nevra like a package. It's also the problem
from another content type I heard about at Config
Management Camp.

It causes problems in two ways:

* plugin users (not writers) who are familiar with 'id' as
part of the erratum data type would then have to also
understand this field name renaming that Pulp arbitrarily
introduces. This could get confusing when the user submits
a filter with id='ID-2115858' and they find nothing
because 'id' is matching on the primary key not on the
'id' attribute of the errata like they expect. Those users
would also be Pulp users so they'll understand that _id
means the pk.


By the same logic, if Pulp users know that id means pk,
wouldn’t they therefore understand that the id is not the
erratum id?


Yes by that logic they probably would know, but the actual errata
field is named 'id', so it's more about a correctness problem
than confusion. A correctness problem that passes along to users.
If we're going to have confusing names, let's pick names that
allow for alignment with the names already chosen by content types
which commonly do use 'id'. Plugin writers aren't in control of
those names; they already are chosen by content types.


* plugins specifically may wrap other tools and now they
have to maintain mappings as well. This is specifically
the case with errata, whose data model is designed to be
name-for-name identical to the createrepo_c interface

Mapping one field to another seems rather minor. Or am I
missing something?


After 22 emails on this th

Re: [Pulp-dev] Lazy for Pulp3

2018-05-31 Thread Jeff Ortel



On 05/31/2018 04:39 PM, Brian Bouterse wrote:
I updated the epic (https://pulp.plan.io/issues/3693) to use this new 
language.


policy=immediate  -> downloads now while the task runs (no lazy). Also 
the default if unspecified.
policy=cache-and-save   -> All the steps in the diagram. Content that 
is downloaded is saved so that it's only ever downloaded once.
policy=cache -> All the steps in the diagram except step 14. If 
squid pushes the bits out of the cache, it will be re-downloaded again 
to serve to other clients requesting the same bits.
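Concretely, the proposal amounts to a choice field on the Remote, something 
like this sketch (names per this thread, not merged code):

    from django.db import models

    class Remote(models.Model):
        IMMEDIATE = 'immediate'
        CACHE_AND_SAVE = 'cache-and-save'
        CACHE = 'cache'
        POLICY_CHOICES = (
            (IMMEDIATE, 'Download while the sync task runs'),
            (CACHE_AND_SAVE, 'Download on first request and keep the artifact'),
            (CACHE, 'Download on request; only squid caches the bits'),
        )
        policy = models.TextField(choices=POLICY_CHOICES, default=IMMEDIATE)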


These policy names strike me as an odd, non-intuitive mixture. I think 
we need to brainstorm on policy names and/or additional attributes to 
best capture this.  Suggest the epic be updated to describe the "modes" 
or use cases without the names for now.  I'll try to follow up with 
other suggestions.




Also @milan, see inline for answers to your question.

On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik wrote:


On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse
<bbout...@redhat.com> wrote:
>
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay
> <thomasmc...@redhat.com> wrote:
>>
>> I think there is a usecase for "proxy only" like is being described here.
>> Several years ago there was a project called thumbslug[1] that was used in a
>> version of katello instead of pulp. Its job was to check entitlements and
>> then proxy content from a cdn. The same functionality could be implemented
>> in pulp. (Perhaps it's even as simple as telling squid not to cache anything
>> so the content would never make it from cache to pulp in current pulp-2.)
>
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>>
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>
>
> I understand describing this as a "only-squid" version, but for clarity, the
> streamer would still be required because it is what requests the bits with
> the correctly configured downloader (certs, proxy, etc). The streamer
> streams the bits into squid which provides caching and client multiplexing.

I have to admit it's just now I'm reading

https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy


again because of the SSL termination. So the new plan is to use the
streamer to terminate the SSL instead of the Apache reverse proxy?


The plan for right now is to not use a reverse proxy and have the 
client's connection terminate at squid directly either via http or 
https depending on how squid is configured. The Reverse proxy in 
pulp2's design served to validate the signed urls and rewrite them for 
squid. This first implementation won't use signed urls. I believe that 
means we don't need a reverse proxy here yet.



W/r the construction of the URL of an artifact, I thought it would be
stored in the DB, so the Remote would create it during the sync.


This is correct. The inbound URL from the client after the redirect 
will still be a reference that the "Pulp content app" will resolve to 
a RemoteArtifact. Then the streamer will use that RemoteArtifact data 
to correctly build the downloader. That's the gist of it at least.
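A rough sketch of that flow (method names assumed from the plugin API, not 
a definitive implementation):

    # Streamer side: given the RemoteArtifact resolved by the content
    # app, build the remote's configured downloader (certs, proxy, etc.)
    # and fetch the bits on demand.
    async def stream_artifact(remote_artifact):
        downloader = remote_artifact.remote.get_downloader(url=remote_artifact.url)
        result = await downloader.run()  # downloads to a temporary file
        return result.path               # handed to squid / the client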



>
> To confirm my understanding this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?
yup
>
>>
>>
>> [1] https://github.com/candlepin/thumbslug

>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkova...@redhat.com>
>> wrote:
>>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkli...@redhat.com>
>>> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkova...@redhat.com>
>>> > wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkli...@redhat.com>
>>> >> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
>>> >> > <mkova...@redhat.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that
>>> >> >> but first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to the Squid all the time (for the
>>> >> >> lazy repos)?
>>> >> >
>>> >> 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-29 Thread Jeff Ortel

Looks good.

Made a few minor edits.

On 05/25/2018 02:11 PM, Brian Bouterse wrote:
A mini-team of core devs** met to talk through lazy use cases for 
Pulp3. It's effectively the same lazy from Pulp2 except:


* it's now built into core (not just RPM)
* It excludes repo protection use cases because we haven't added 
repo protection to Pulp3 yet
* It excludes the "background" policy, which based on feedback from 
stakeholders provided very little value
* it will no longer depend on Twisted as a dependency. It will 
use asyncio instead.


While it is being built into core, it will require minimal support by 
a plugin writer to add support for it. Details in the epic below.


The current use cases along with a technical plan are written on this 
epic: https://pulp.plan.io/issues/3693 


We're putting it out for comment, questions, and feedback before we 
start into the code. I hope we are able to add this into our next sprint.


** ipanova, jortel, ttereshc, dkliban, bmbouter

Thanks!
Brian



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-05-29 Thread Jeff Ortel

On 05/29/2018 08:24 AM, Brian Bouterse wrote:
On Fri, May 25, 2018 at 7:39 PM, Dana Walker wrote:


I'm basically -1 for the reasons Jeff enumerated but if he is ok
with this, I'm happy to go ahead with it.

[Jeff]:
In classic relational modeling, using ID as the primary key is
common practice.  Especially when ORMs are involved.  The "id"
added by plugin writers is a natural key so naming it ID goes
against convention.


This is echoed here, for further reading (though perhaps this
article is overly simplified for our needs) in the sections "Key
Fields" and "Prefixes and Suffixes (are bad)":
https://launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions/



That is true, but this article also talks about avoiding reserved 
words as well. I think we're hearing 'id' is a commonly reserved word 
for content types being modeled by plugin writers.




The article[1] you mentioned states that 'ID' /should/ be used for the 
PK, which means it is inappropriate for natural key fields defined by 
plugin writers.  The reserved-words caution in the article concerns 
DDL/DML reserved words ("Ex: Avoid using words like |user|, |lock|, or 
|table|."), not names reserved by plugins.


[1] 
https://launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions/#primary-keys
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-24 Thread Jeff Ortel



On 05/17/2018 07:46 AM, Daniel Alley wrote:
Some content types are not going to be compatible with the normal 
sync/publish/distribute Pulp workflows, and will need to be live 
API-only.  To what degree should Pulp accommodate these use cases?


Example:

Pulp makes the assumptions that

A) the metadata for a repository can be generated in its entirety by 
the known set of content in a RepositoryVersion, and


B) the client wouldn't care if you point it at an older version of the 
same repository.


Cargo, the package manager for the Rust programming language, expects 
the registry url to be a git repository. When a user does a "cargo 
update", cargo essentially does a "git pull" to update a local copy of 
the registry.


Both of those assumptions are false in this case. You cannot generate 
the git history just from the set of content, and you cannot "roll 
back" the state of the repository without either breaking it for 
clients, or adding new commits on top.


A theoretical Pulp plugin that worked with Cargo would need to ignore 
almost all of the existing Pulp primitives and very little (if any) of 
the normal Pulp workflow could be used.


Should Pulp attempt to cater to plugins like these?  What could Pulp 
do to provide a benefit for such plugins over writing something from 
scratch from the ground up?  To what extent would such plugins be able 
to integrate with the rest of Pulp, if at all?


I think OSTree and Ansible plugins will be in the same boat as Cargo.  
In the case of OSTree, libostree does the heavy lifting for sync and 
publishing and I suspect the same is true for Git based repositories.  
We should consider ways to best support distributing (serving) content in 
core for these content types.  I suspect this will mainly entail 
something in the content app and perhaps a new component of a 
Publication like PublishedDirectory that references an OSTree/Git 
repository created in /var/lib/pulp/published.  This may benefit Maven 
as well.




We don't have to commit to anything pre-GA but it is a good thing to 
keep in mind.  I'm sure there are other content types out there (not 
just Cargo) which would face similar problems. pulp_git was inquired 
about a few months ago, it seems like it would share a few of them.



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] 'id' versus 'pulp_id' on Content

2018-05-23 Thread Jeff Ortel
In classic relational modeling, using ID as the primary key is common 
practice.  Especially when ORMs are involved.  The "id" added by plugin 
writers is a natural key so naming it ID goes against convention.  Every 
field in base models used by plugins has potential for name collisions.  
Where does it end?  Every column having a pulp_ or _ prefix?  Plugins 
create relatively few tables and it doesn't seem unreasonable for plugin 
writers to select other names to resolve naming conflicts.


On 05/23/2018 07:50 AM, Brian Bouterse wrote:
Currently the Content model [0] has 'id' as its primary key, which is 
inherited from MasterModel here [1]. By naming our pk 'id', we are 
preventing plugin writers from also using that field. That field name 
is common for content types. For example: both RPM and Nuget content 
also expect to use the 'id' field to store data about the content type 
itself (not Pulp's pk). We learned about the Nuget incompatibility at 
ConfigMgmgtCamp from a community member. I learned about this issue 
with RPM from @dalley.


The only workaround a plugin writer has is to call their field 
'rpm_id' or something like that. I don't see how this renaming can 
avoid being passed directly onto the user for things like 
filtering, creating units, etc. I think that is an undesirable outcome 
just so that the Pulp pk can be named 'id'.
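To make the collision concrete, a sketch (the Erratum model, field name, 
and import path are illustrative):

    from django.db import models
    from pulpcore.plugin.models import Content

    class Erratum(Content):
        TYPE = 'erratum'
        # Upstream errata metadata names this field literally 'id', but
        # that name is taken by the inherited primary key, forcing a
        # rename such as this one.
        errata_id = models.TextField()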


One option would be to rename 'id' to 'pulp_id' at the MasterModel. 
This is also somewhat ugly for Pulp developers, but it would be (a) 
crystal clear to the user in all cases and (b) allow Content writers 
to model their content types correctly.


Another option would be to rename the pk for 'Content' specifically 
and not at the MasterModel level. I think that would create more 
confusion than benefit so I recommend doing it at the MasterModel level.


What do you all think?

[0]: 
https://github.com/pulp/pulp/blob/6f492ee8fac94b8562dc62d87e6886869e052e7e/pulpcore/pulpcore/app/models/content.py#L106
[1]: 
https://github.com/pulp/pulp/blob/d1dc089890f167617fe9917af087d5587708296b/pulpcore/pulpcore/app/models/base.py#L25


-Brian


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] is 3.0-dev branch ready to become master?

2018-05-23 Thread Jeff Ortel



On 05/23/2018 06:20 AM, Brian Bouterse wrote:
It sounds like there isn't much blocking this, but does that mean the 
devs should go ahead with planning and making the branching changes?


Also I want to confirm: is the scope of this planned change only for 
pulp/pulp and pulp/devel repos for now?


Agreed.




On Tue, May 22, 2018 at 10:56 AM, Patrick Creech wrote:


On Mon, 2018-05-21 at 19:51 -0400, Dennis Kliban wrote:
> We need to start planning the creation of  a "2.17-dev" branch
from the current master and merging "3.0-dev" into "master". We
would then create new "2.Y-dev" branch after each "2.Y.0" release. All
> 3.0 work would then land on master.

Might I suggest a y-version agnostic 2-dev or 2-master or similar
branch instead?  This would better reflect the state of the branch
as "Pulp 2 master" and will prevent us from having to rename a lot
of items each release.

+1 to this naming.

+1



This would also help enforce our cherry-pick model of 'merge to
master, pick back to -release branches for releases' and will
provide us a feature branch to branch off our '2.y-release' branches
without adding in confusion each .y cycle.


> Do our release engineering tools support this change? If not,
what would it take to support it?

Yes.  There'd be some small changes required to use the new master
branch instead of 'master', but that's it.

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Core Commit Bit Process

2018-05-22 Thread Jeff Ortel

Thanks for the proposal, Brian.  Looks fine to me.

On 05/21/2018 04:48 PM, Brian Bouterse wrote:
For core and it's related tools, we don't have a written process to 
describe giving the commit bit to a contributor. We've been wanting to 
agree on and document that process for a while, so I'm facilitating 
thread gathering ideas to inform the writing of a PUP.


This starter email gives a brief history of what we've done and 
outlines a simple proposal to get us started. We can throw that 
proposal away in favor of any other idea.


# History

Historically if you were hired onto the Pulp team at Red Hat you 
received the commit bit day 0. In Oct 2017 we decided to stop doing 
that and instead document an open process. Engineers hired onto the Pulp 
team since Oct '17 have not received the commit bit. We have not yet 
documented an open process by which to give it to them or any other 
proven contributor.


# Current State

The current core devs as shown on github are: asmacdo, bizhang, 
bmbouter, daviddavis, dkliban, dalley, ipanova, jortel, pcreech, ttereshc


# Scope of this discussion

pulp/pulp, pulp/devel, and any repos for the Pulp3 Ansible installer. 
It applies to both Pulp2 and Pulp3. Plugins will do what they want.


# Process Idea

One process idea is to add a new core committer upon a vote with +1's 
received from all current core developers. The thinking is that all 
current core devs need to be 100% comfortable for the new person to 
handle any issue in place of themselves.


# Criteria

Overall I believe it should be someone who has demonstrated commitment and 
care towards the needs of all Pulp users and not only their own interests. 
Also they must have the experience to be trusted with major aspects of 
Pulp functionality.


These requirements are somewhat vague by design. Any process with hard 
requirements will be gamed so I believe leaving it to the judgement of 
the existing devs is a safe approach. Anyone who specifically wants to 
get more involved should approach the core devs about mentorship. I 
think the right time will be obvious, and if there are doubts those 
can be expressed ahead of time or at vote time.


# Code owners

This commit bit vote could be for entire core repos, or it could be 
for a subsystem of Pulp enforced using github's "code owners" feature 
(https://blog.github.com/2017-07-06-introducing-code-owners/).



^ is starter content, please send ideas and discussion that will be 
incorporated into a first draft PUP at some point.


-Brian



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp CLI MVP User Stories

2018-05-18 Thread Jeff Ortel
The main goal of the CLI is to make it easier (than using the REST API 
and http) for admins to perform routine tasks.  It seems likely that a 
CLI /could/ provide an improvement by reducing the REST syntax 
complexity without combining the steps to complete a task.  Also, I 
think we should consider the frequency of tasks & steps.  For example: I 
would anticipate that work flows involving creating and deleting of 
repositories, remotes, publishers and distributions are performed with 
relatively low frequency.  And, each of those will likely happen with 
different frequency.  Work flows involving sync & publish would be 
performed with relatively high frequency. Updating resources (perhaps 
except Distribution) don't seem likely to be a regular thing.


My point is: I think that providing high level, task oriented, 
combination CLI commands won't provide enough value to justify the 
effort.  An admin running (1) complex command (think pulp-admin) rather 
than 3-4 simple commands does not seem like an improvement.  Especially 
since the CLI would probably need to provide the simple commands as well 
for work flows not accounted for.  For example: Edit a remote.  Or, 
would the admin need to use the REST API for that?


The pulp-admin CLI provided value not because it combined things but 
because it managed auth and reduced syntax complexity.



On 05/17/2018 10:52 AM, Dennis Kliban wrote:
The use cases we outlined earlier provide very little value over using 
httpie to interact with the REST API directly. I'd like to propose 5 
new use cases:


  * As a CLI user, I can create a repository, a remote, a publisher,
and a distribution with a single command.
  * As a CLI user, I can create a repository version, a publication,
and update the distribution with a single command.
  * As a CLI user, I can list remote types available on the Pulp server.
  * As a CLI user, I can list publisher types available on the Pulp
server.
  * As a CLI user, I can list all repositories available on the Pulp
server.


The use cases proposed at the beginning on this thread require the 
user to perform 4 steps before any content can be synced:


1) Create repository
2) Create remote
3) Create publisher
4) Create distribution

The goal for the CLI should be to reduce this to a single step. The 
CLI will need to make some assumptions for the user: publisher name, 
distribution name, auto publish, auto distribute, and maybe others. 
However, this will allow the user to use a single command to create a 
repository that's ready for sync/publish.
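As a sketch of what such a single command might look like (the endpoint 
paths, port, and defaults below are assumptions, not the real REST API):

    import click
    import requests

    BASE = 'http://localhost:8000/pulp/api/v3'  # assumed API root

    def post(path, data):
        response = requests.post(BASE + path, json=data, auth=('admin', 'admin'))
        response.raise_for_status()
        return response.json()['_href']

    @click.command()
    @click.option('--name', required=True)
    @click.option('--url', required=True, help='Feed URL for the remote')
    def create(name, url):
        """Create repository, remote, publisher, distribution in one shot."""
        repo = post('/repositories/', {'name': name})
        post('/remotes/file/', {'name': name, 'url': url})
        post('/publishers/file/', {'name': name})
        post('/distributions/', {'name': name, 'repository': repo})

    if __name__ == '__main__':
        create()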


Sync/Publish/Distribute workflow can also be 3 steps:

1) Create a new repository version
2) Create a new publication
3) Update distribution

The goal here is to also reduce this to a single step.

The other use cases are auxiliary.

Questions? Thoughts? Ideas?

-Dennis






On Mon, May 14, 2018 at 11:58 AM, Dana Walker wrote:


+1

Dana Walker

Associate Software Engineer

Red Hat






On Tue, May 8, 2018 at 10:31 AM, Jeremy Audet wrote:

A configuration file in the user's home dir, right?


Yes, exactly.


Can we make sure to avoid placing configuration files directly
in users' home directories, and instead place them into
directories like ~/.config? This is in line with the XDG Base
Directory Specification.
The spec is pretty straightforward, but Pulp Smash uses pyxdg
to avoid mistakes. There are two big benefits to doing this:

  * Less clutter in home directories.
  * Guidance on what to do with other types of files, such as
cached files and runtime files.

Projects such as git, htop, lftp, mpd, neovim, tmuxinator,
boybo, and more do this.
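A minimal sketch of what pyxdg buys you (the 'pulp' resource name is 
illustrative):

    from xdg import BaseDirectory

    # First existing config dir for the resource, honoring
    # $XDG_CONFIG_HOME and $XDG_CONFIG_DIRS; None if none exists yet.
    config_dir = BaseDirectory.load_first_config('pulp')

    # Create (if needed) and return $XDG_CONFIG_HOME/pulp for writing.
    save_dir = BaseDirectory.save_config_path('pulp')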


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Composed Repositories

2018-05-16 Thread Jeff Ortel



On 05/15/2018 11:59 AM, Brian Bouterse wrote:
I agree these are specific cases for a few content types that are used 
by multiple plugins. I think the most productive thing would be for us 
to talk in specific only about kickstart trees being shared between 
RPM and ostree. It would be much easier to generalize after building 
something specific once (I think).


This discussion wasn't about generalization or abstraction.  It's about 
dealing with remote repositories that are different combinations of 
common content types.  That said, while searching for concrete examples 
(use cases), it turns out these combinations don't really exist.  In 
pulp2, the RPM plugin is used to sync ISO repositories but they are not 
combined with other content types in the same repository.  Kickstart 
trees are only combined with YUM repositories.  Combination 
OSTree/KS-tree repositories aren't really a thing.


I think this thread can end here.



A mentor I had once told all software that lives long enough goes 
through 3 phases. (1) A concrete implementation (2) generalizing that 
implementation, and then (3) rewriting that implementation because of 
everything you didn't know before. I'm advocating for us to think 
about the problem as a specific plugin problem (step 1) and then after 
that is done, to look at generalizing it (step 2).


On Tue, May 15, 2018 at 11:27 AM, Bryan Kearney
<bkear...@redhat.com> wrote:


On 05/14/2018 03:44 PM, Jeff Ortel wrote:
> Let's brainstorm on something.
>
> Pulp needs to deal with remote repositories that are composed of
> multiple content types which may span the domain of a single
plugin.
> Here are a few examples.  Some Red Hat RPM repositories are
composed of:
> RPMs, DRPMs, , ISOs and Kickstart Trees.  Some OSTree
repositories are
> composed of OSTrees & Kickstart Trees. This raises a question:
>
> How can pulp3 best support syncing with remote repositories that are
> composed of multiple (unrelated) content types in a way that doesn't
> result in plugins duplicating support for content types?
>


Both these examples are cases of RPM repos, yes? If so, does this
require a general purpose solution?

-- bk



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Composed Repositories

2018-05-15 Thread Jeff Ortel



On 05/15/2018 10:41 AM, Jeff Ortel wrote:



On 05/15/2018 10:27 AM, Bryan Kearney wrote:

On 05/14/2018 03:44 PM, Jeff Ortel wrote:

Let's brainstorm on something.

Pulp needs to deal with remote repositories that are composed of
multiple content types which may span the domain of a single plugin.
Here are a few examples.  Some Red Hat RPM repositories are composed 
of:

RPMs, DRPMs, ISOs and Kickstart Trees.  Some OSTree repositories are
composed of OSTrees & Kickstart Trees. This raises a question:

How can pulp3 best support syncing with remote repositories that are
composed of multiple (unrelated) content types in a way that doesn't
result in plugins duplicating support for content types?



Both these examples are cases of RPM repos, yes? If so, does this
require a general purpose solution?


The example in the thread is mainly RPM but there are other 
repositories with shared content types.  Eg: OSTree repositories also 
containing Kickstart Trees.


I also think there is value in not having the RPM plugin be a /mega/ 
plugin that knows how to deal with several complicated types of content 
(like in pulp2).  Making each plugin responsible for specific closely 
related types of content would make them more maintainable.






-- bk






___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Composed Repositories

2018-05-15 Thread Jeff Ortel



On 05/15/2018 09:29 AM, Milan Kovacik wrote:

Hi,

On Tue, May 15, 2018 at 3:22 PM, Dennis Kliban <dkli...@redhat.com> wrote:

On Mon, May 14, 2018 at 3:44 PM, Jeff Ortel <jor...@redhat.com> wrote:

Let's brainstorm on something.

Pulp needs to deal with remote repositories that are composed of multiple
content types which may span the domain of a single plugin.  Here are a few
examples.  Some Red Hat RPM repositories are composed of: RPMs, DRPMs,
ISOs and Kickstart Trees.  Some OSTree repositories are composed of OSTrees
& Kickstart Trees. This raises a question:

How can pulp3 best support syncing with remote repositories that are
composed of multiple (unrelated) content types in a way that doesn't result
in plugins duplicating support for content types?

Few approaches come to mind:

1. Multiple plugins (Remotes) participate in the sync flow to produce a
new repository version.
2. Multiple plugins (Remotes) are sync'd successively each producing a new
version of a repository.  Only the last version contains the fully sync'd
composition.
3. Plugins share code.
4. Other?


Option #1: Sync would be orchestrated by core or the user so that multiple
plugins (Remotes) participate in populating a new repository version.  For
example: the RPM plugin (Remote) and the Kickstart Tree plugin (Remote)
would both be sync'd against the same remote repository that is composed of
both types.  The new repository version would be composed of the result of
both plugin (Remote) syncs.  To support this, we'd need to provide a way for
each plugin to operate seamlessly on the same (new) repository version.
Perhaps something internal to the RepositoryVersion.  The repository version
would not be marked "complete" until the last plugin (Remote) sync has
succeeded.  More complicated than #2 but results in only creating truly
complete versions or nothing.  No idea how this would work with current REST
API whereby plugins provide sync endpoints.


I like this approach because it allows the user to perform a single call to
the REST API and specify multiple "sync methods" to use to create a single
new repository version.

Same here, esp. if the goal is an all-or-nothing behavior w/r/t the
mixed-in remotes; i.e. an atomic sync.
This has the benefit of a clear start and end of the sync procedure
that the user might want to refer to.


Option #2: Sync would be orchestrated by core or the user so that multiple
plugins (Remotes) create successive repository versions.  For example: the
RPM plugin (Remote) and the Kickstart Tree plugin (Remote) would both be
sync'd against the same remote repository that is a composition including
both types.  The intermediate versions would be incomplete.  Only the last
version contains the fully sync'd composition.  This approach can be
supported by core today :) but will produce incomplete repository versions
that are marked complete=True.  This /seems/ undesirable, right?  This may
not be a problem for distribution since I would imagine that only the last
(fully composed) version would be published.  But what about other usages of
the repository's "latest" version?

I'm afraid I don't see the use of a middle version, esp. in case of
failures; e.g. ostree failed to sync while rpm and kickstart both
managed to; is the sync OK as a whole? What to do with the versions
created? Should I merge the successes into one and retry the failure?
How many versions would this introduce?


(option 2) The partial versions would be created in both normal and 
failure scenarios.  The normal scenario creates them because each plugin 
(Remote) creates a new version and only the last one is complete.  The 
intermediate versions are always partial.
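Reusing the hypothetical names from the option #1 sketch earlier in this
digest, the option #2 flow would look roughly like this; note that every
intermediate version ends up marked complete=True:

    def successive_sync(repository, remotes):
        # Option #2: each Remote creates and completes its own version.
        # (A real implementation would carry previous content forward.)
        for remote in remotes:
            with repository.new_version() as version:
                remote.sync(version)
        # Every intermediate version was marked complete=True even though
        # only this last one holds the full composition.
        return repository.versions[-1]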





Option #3: requires a plugin to be aware of specific repository
composition(s) and of other plugins, and creates a code dependency between
plugins.  For example, the RPM plugin could delegate ISOs to the File plugin
and Kickstart Trees to the KickStart Tree plugin.

Do you mean that the RPM plug-in would directly call into the File plug-in?
If that's the case then I don't like it much; it would be a pain every
time a new plug-in was introduced (O(len(plugins)^2) updates)
or whenever the API of a plug-in changed (O(len(plugins)) updates).
Esp. keeping the plugin code aware of other plugins' updates would be ugly.


Agreed.  The plugins could install libs into site-packages which would 
at least mitigate the complexity of calling into each other through the 
pulp plugin framework but I don't think it helps much. Even the rpm 
dependency is undesirable.





For all options, plugins (Remotes) need to limit sync to affect only those
content types within their domain.  For example, the RPM (Remote) sync
cannot add/remove ISO or KS Trees.

I am an advocate of some form of options #1 or #2.  Combining plugins
(Remotes) as needed to deal with arbitrary combinations within remote
repositories seems very powerful; does not impose complexity on plugin
writers; and does not introduce code dependencies between plugins.
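A sketch of what that domain limit could mean for the removals
computation (illustrative names):

    def build_removals(version, remote_units, my_types):
        # Constrain the diff to this plugin's own content types: ISOs or
        # KS Trees added by other plugins are never candidates for removal.
        mine = {unit for unit in version.content if unit.type in my_types}
        return mine - remote_units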

Re: [Pulp-dev] Composed Repositories

2018-05-15 Thread Jeff Ortel



On 05/15/2018 05:58 AM, Austin Macdonald wrote:

Here's another complexity: how do 2 plugins create a single publication?


The plugin API could make this seamless.

We basically have the same problem of 2 parallel operations creating 
content from a single source.


I don't think so.  Plugins should not manipulate content outside of 
their domain (other plugins' content) so either serial or parallel should 
be safe.




On Tue, May 15, 2018, 06:27 Ina Panova <ipan...@redhat.com 
<mailto:ipan...@redhat.com>> wrote:


+1 on not introducing dependencies between plugins.

What will be the behavior in case there is a composed repo of rpm
and ks trees but just the rpm plugin is installed?

Do we fail and say we cannot sync this repo at all or we just sync
the rpm part?


Assuming plugins do not depend on each other, I think that when each 
plugin looks at the upstream repo, they will only "see" the content of 
that type. Conceptually, we will have 2 remotes, so it will feel like 
we are syncing from 2 totally distinct repositories.


The solution I've been imagining is a lot like 2. Each plugin would 
sync to a *separate repository.* These separate repositories are then 
published creating *separate publications*. This approach allows the 
plugins to live completely in ignorance of each other.


The final step is to associate *both publications to one 
distribution*, which composes the publications as they are served.


The downside is that we have to sync and publish twice, and that the 
resulting versions and publications aren't locked together. But I 
think this is better than leaving versions and publications unfinished 
with the assumption that another plugin will finish the job. Maybe 
linking them together could be a good use of the notes field.
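A rough sketch of the serve-time composition, assuming a hypothetical 
Publication.find() lookup (nothing here is the actual content app):

    def serve(distribution, rel_path):
        # Try each associated publication in order; first match wins.
        for publication in distribution.publications:
            artifact = publication.find(rel_path)
            if artifact is not None:
                return artifact
        raise FileNotFoundError(rel_path)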


Pulp should support repositories with composed (mixed) content for the 
same reason RH does.  The repository is a collection of content that 
users want to manage together.  Consider the promotion cases: dev, test, 
prod.





Depends how we plan this ^ I guess we'll decide which option 1 or
2 fits better.

Don't want to go wild, but what if the notion of composed repos
becomes so popular in the future that their number increases? I
think we do want to be able to at least partially sync them rather
than take an all-or-nothing approach?

#2 speaks to me more for now.





Regards,

Ina Panova
Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."

On Mon, May 14, 2018 at 9:44 PM, Jeff Ortel <jor...@redhat.com
<mailto:jor...@redhat.com>> wrote:

Let's brainstorm on something.

Pulp needs to deal with remote repositories that are composed
of multiple content types which may span the domain of a
single plugin.  Here are a few examples.  Some Red Hat RPM
repositories are composed of: RPMs, DRPMs, ISOs and
Kickstart Trees.  Some OSTree repositories are composed of
OSTrees & Kickstart Trees. This raises a question:

How can pulp3 best support syncing with remote repositories
that are composed of multiple (unrelated) content types in a
way that doesn't result in plugins duplicating support for
content types?

A few approaches come to mind:

1. Multiple plugins (Remotes) participate in the sync flow to
produce a new repository version.
2. Multiple plugins (Remotes) are sync'd successively each
producing a new version of a repository.  Only the last
version contains the fully sync'd composition.
3. Plugins share code.
4. Other?


Option #1: Sync would be orchestrated by core or the user so
that multiple plugins (Remotes) participate in populating a
new repository version.  For example: the RPM plugin (Remote)
and the Kickstart Tree plugin (Remote) would both be sync'd
against the same remote repository that is composed of both
types.  The new repository version would be composed of the
result of both plugin (Remote) syncs.  To support this, we'd
need to provide a way for each plugin to operate seamlessly on
the same (new) repository version.  Perhaps something internal
to the RepositoryVersion.  The repository version would not be
marked "complete" until the last plugin (Remote) sync has
succeeded.  More complicated than #2 but results in only
creating truly complete versions or nothing.  No idea how this
would work with current REST API whereby plugins provide sync
endpoints.

Option #2: Sync would be orchestrated by core or the user so
that multiple plugins (Remotes) create successive repository
versions.  For example: the RPM plugin (Remote) and the
Kickstart Tree plugin (Remote) would both be sync'd against the
same remote repository that is a composition including both types.

Re: [Pulp-dev] Composed Repositories

2018-05-15 Thread Jeff Ortel



On 05/15/2018 05:26 AM, Ina Panova wrote:

+1 on not introducing dependencies between plugins.

What will be the behavior in case there is a composed repo of rpm and 
ks trees but just the rpm plugin is installed?


I would expect the result would be to only sync the rpm content into the 
pulp repository.


Do we fail and say we cannot sync this repo at all or we just sync the 
rpm part?


No, I think it would be expected to succeed since the user has only 
installed the rpm plugin and requested that only rpm content be sync'd.  
The remote repository is composed of multiple content types out of 
convenience for managing the content.  Pulp should not be bound to the 
organization of remote repositories.




Depends how we plan this ^ I guess we'll decide which option 1 or 2 
fits better.


Don't want to go wild, but what if the notion of composed repos becomes so 
popular in the future that their number increases? I think we do 
want to be able to at least partially sync them rather than take an 
all-or-nothing approach?


#2 speaks to me more for now.


#2 will create repository versions with partial content which are 
marked complete=True.  Given users can choose which version to publish, do you 
see this as a problem?  What about cases where the "latest" version is, 
at times, partial?








Regards,

Ina Panova
Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."

On Mon, May 14, 2018 at 9:44 PM, Jeff Ortel <jor...@redhat.com 
<mailto:jor...@redhat.com>> wrote:


Let's brainstorm on something.

Pulp needs to deal with remote repositories that are composed of
multiple content types which may span the domain of a single
plugin.  Here are a few examples. Some Red Hat RPM repositories
are composed of: RPMs, DRPMs, ISOs and Kickstart Trees.  Some
OSTree repositories are composed of OSTrees & Kickstart Trees.
This raises a question:

How can pulp3 best support syncing with remote repositories that
are composed of multiple (unrelated) content types in a way that
doesn't result in plugins duplicating support for content types?

A few approaches come to mind:

1. Multiple plugins (Remotes) participate in the sync flow to
produce a new repository version.
2. Multiple plugins (Remotes) are sync'd successively each
producing a new version of a repository.  Only the last version
contains the fully sync'd composition.
3. Plugins share code.
4. Other?


Option #1: Sync would be orchestrated by core or the user so that
multiple plugins (Remotes) participate in populating a new
repository version.  For example: the RPM plugin (Remote) and the
Kickstart Tree plugin (Remote) would both be sync'd against the
same remote repository that is composed of both types.  The new
repository version would be composed of the result of both plugin
(Remote) syncs.  To support this, we'd need to provide a way for
each plugin to operate seamlessly on the same (new) repository
version.  Perhaps something internal to the RepositoryVersion. 
The repository version would not be marked "complete" until the
last plugin (Remote) sync has succeeded.  More complicated than #2
but results in only creating truly complete versions or nothing. 
No idea how this would work with current REST API whereby plugins
provide sync endpoints.

Option #2: Sync would be orchestrated by core or the user so that
multiple plugins (Remotes) create successive repository versions. 
For example: the RPM plugin (Remote) and the Kickstart Tree plugin
(Remote) would both be sync'd against the same remote repository
that is a composition including both types.  The intermediate
versions would be incomplete. Only the last version contains the
fully sync'd composition.  This approach can be supported by core
today :) but will produce incomplete repository versions that are
marked complete=True.  This /seems/ undesirable, right? This may
not be a problem for distribution since I would imagine that only
the last (fully composed) version would be published.  But what
about other usages of the repository's "latest" version?

Option #3: requires a plugin to be aware of specific repository
composition(s) and of other plugins, and creates a code dependency
between plugins.  For example, the RPM plugin could delegate ISOs
to the File plugin and Kickstart Trees to the KickStart Tree plugin.

For all options, plugins (Remotes) need to limit sync to affect
only those content types within their domain. For example, the RPM
(Remote) sync cannot add/remove ISO or KS Trees.

I am an advocate of some form of options #1 or #2. Combining
plugins (Remotes) as needed to deal with arbitrary combinations
within remote repositories seems very powerful; does not impose
complexity on plugin writers; and does not introduce code
dependencies between plugins.

[Pulp-dev] Composed Repositories

2018-05-14 Thread Jeff Ortel

Let's brainstorm on something.

Pulp needs to deal with remote repositories that are composed of 
multiple content types which may span the domain of a single plugin.  
Here are a few examples.  Some Red Hat RPM repositories are composed of: 
RPMs, DRPMs, ISOs and Kickstart Trees.  Some OSTree repositories are 
composed of OSTrees & Kickstart Trees. This raises a question:


How can pulp3 best support syncing with remote repositories that are 
composed of multiple (unrelated) content types in a way that doesn't 
result in plugins duplicating support for content types?


A few approaches come to mind:

1. Multiple plugins (Remotes) participate in the sync flow to produce a 
new repository version.
2. Multiple plugins (Remotes) are sync'd successively each producing a 
new version of a repository.  Only the last version contains the fully 
sync'd composition.

3. Plugins share code.
4. Other?


Option #1: Sync would be orchestrated by core or the user so that 
multiple plugins (Remotes) participate in populating a new repository 
version.  For example: the RPM plugin (Remote) and the Kickstart Tree 
plugin (Remote) would both be sync'd against the same remote repository 
that is composed of both types.  The new repository version would be 
composed of the result of both plugin (Remote) syncs.  To support this, 
we'd need to provide a way for each plugin to operate seamlessly on the 
same (new) repository version.  Perhaps something internal to the 
RepositoryVersion. The repository version would not be marked "complete" 
until the last plugin (Remote) sync has succeeded.  More complicated 
than #2 but results in only creating truly complete versions or nothing. 
No idea how this would work with current REST API whereby plugins 
provide sync endpoints.


Option #2: Sync would be orchestrated by core or the user so that 
multiple plugins (Remotes) create successive repository versions.  For 
example: the RPM plugin (Remote) and the Kickstart Tree plugin (Remote) 
would both be sync'd against the same remote repository that is a 
composition including both types.  The intermediate versions would be 
incomplete. Only the last version contains the fully sync'd 
composition.  This approach can be supported by core today :) but will 
produce incomplete repository versions that are marked complete=True.  
This /seems/ undesirable, right?  This may not be a problem for 
distribution since I would imagine that only the last (fully composed) 
version would be published.  But what about other usages of the 
repository's "latest" version?


Option #3: requires a plugin to be aware of specific repository 
composition(s) and of other plugins, and creates a code dependency between 
plugins.  For example, the RPM plugin could delegate ISOs to the File 
plugin and Kickstart Trees to the KickStart Tree plugin.


For all options, plugins (Remotes) need to limit sync to affect only 
those content types within their domain.  For example, the RPM (Remote) 
sync cannot add/remove ISO or KS Trees.


I am an advocate of some form of options #1 or #2.  Combining plugins 
(Remotes) as needed to deal with arbitrary combinations within remote 
repositories seems very powerful; does not impose complexity on plugin 
writers; and does not introduce code dependencies between plugins.


Thoughts?
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp3 "auto-distribute" Feature?

2018-05-08 Thread Jeff Ortel



On 05/07/2018 10:19 AM, Kersom Moura Oliveira wrote:
Is there a specific use case that requires a publisher and a 
publication to be part of a distribution?


Just the publisher & publication? No.

The Distribution.publication links a publication to the Distribution and 
is completely independent from Publisher.  It determines which 
publication the Distribution is distributing.


The Distribution.publication and Distribution.repository are only set 
(always together) to support auto-distribute.  That is "when this 
publisher creates a new publication for this repository, update the 
Distribution.publication = ".
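In effect, something like this hypothetical hook (illustrative only, not
the actual implementation):

    def on_publication_created(publication, distributions):
        # Auto-distribute: any Distribution configured with both this
        # repository and this publisher is pointed at the new publication.
        for dist in distributions:
            if (dist.repository == publication.repository and
                    dist.publisher == publication.publisher):
                dist.publication = publication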




Thanks,



On Mon, May 7, 2018 at 11:11 AM, Austin Macdonald wrote:



On Mon, May 7, 2018 at 10:55 AM, Brian Bouterse wrote:

I'm confused about the feature claim of the auto-distribute
feature for Pulp 3.0 GA. The distribution object takes both
'repository' and 'publisher' as options currently...

As a user, can I create a distribution that will
auto-distribute any new repository version for repo 'foo'?

As a user, can I create a distribution that will
auto-distribute any publication produced by a specific
publisher, e.g. 'baz'?


Repository and publisher must both be set at the same time. Any
time the publisher is used to create a new publication of the
repository it is auto distributed.

Not sure why the help text isn't showing up in the autogenerated docs.

https://github.com/pulp/pulp/blob/3.0-dev/pulpcore/pulpcore/app/serializers/repository.py#L203-L223



Heres how it works:

https://github.com/pulp/pulp/blob/3.0-dev/pulpcore/pulpcore/app/models/publication.py#L107-L112



Thanks,
Brian

___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev




___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev





___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp CLI MVP User Stories

2018-05-02 Thread Jeff Ortel
Thanks for putting this together.  Seems like the devil will be in the 
details pending REST API decisions.


On 05/02/2018 01:09 PM, David Davis wrote:
The CLI team has identified a set of user stories that we think we 
should try to accomplish for the MVP. These would be the minimum set 
of requirements for the MVP CLI.


- As a user, I have a set of CLI commands that match the REST API for 
my Pulp server
  - I have parameters for each command that correspond to API resource 
parameters

  - I also have a CLI filter for every API resource filter
  - I have CLI commands for core and installed plugins
  - CLI commands for plugins that aren’t installed don’t show up
- As a user, I can configure a file with the Pulp API URI, username, 
password


A configuration file in the user's home dir, right?
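Something like a hypothetical ~/.pulp/cli.conf, perhaps (name, location
and keys are all illustrative, nothing settled):

    [server]
    base_url = https://pulp.example.com
    username = admin
    password = secret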


- As a user, I can view all results even if they are paginated


"View" as in - press any key for the next page?


- As a user, I can view a help screen with help text from the API schema


Like "$ pulp -h"  and "$ pulp repository -h"?



Also, we’re investigating an autocomplete story. We might get it for 
free with some of the tools we’re using, and it might be worth 
supporting from the start rather than trying to retrofit something later:


- As a user, I have autocompletion for all commands.

We’ve also identified stories that we were thinking of not handling 
with the MVP:


- As a user, I can see progress for async tasks and when they complete.
- As a user, I have CLI enabled "workflows" (i.e. commands that hit 
multiple API endpoints) for pulpcore and plugins


Lastly, the issue around how to identify objects from the cli 
(href/UUID/name/etc) is still an ongoing discussion that involves the 
bindings as well. We plan to add a user story later for that.


David


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Pulp api seemingly incompatible with generated bindings

2018-04-30 Thread Jeff Ortel



On 04/30/2018 09:05 AM, David Davis wrote:
So what I’d probably propose is exposing the UUIDs in the response and 
then extending HyperlinkedRelatedFields to accept UUID or href. Then 
third parties like Katello could store and just use UUIDs (and not 
worry about hrefs).


+1 to exposing/supporting both the PKs and hrefs.  Btw: We should be 
talking in terms of resource PKs (primary keys) or IDs instead of UUIDs 
for clarity.
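A rough sketch of the proposed field, assuming DRF's
HyperlinkedRelatedField (illustrative, not an agreed design):

    from rest_framework import serializers

    class HrefOrPkRelatedField(serializers.HyperlinkedRelatedField):
        def to_internal_value(self, data):
            # Accept either a full href or a bare PK/UUID.
            if "/" not in str(data):
                return self.get_queryset().get(pk=data)
            return super().to_internal_value(data)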




Regarding hrefs though, hostname and port don’t matter. The app just 
looks at the relative path. It looks like changing the deployment path 
causes problems though.



David

On Mon, Apr 30, 2018 at 9:58 AM, Justin Sherrill wrote:




On 04/27/2018 07:18 PM, David Davis wrote:

I’m not sure how returning UUIDs in our responses helps Katello.
In our previous conversation, it was concluded that Katello
should use the hrefs[0]. Why expose UUIDs if Katello is not going
to store them?


And that's fine, but bindings are pointless at that point, so pulp
shouldn't really advertise them as a feature.   This seems to
have been 'talked up' quite a bit as a feature, but is completely
unusable.



Katello could store/use UUIDs but then it's going to run into
problems when dealing with parameters that are hrefs (such as
repository_version for publishing[1]).

[0]
https://www.redhat.com/archives/pulp-dev/2018-January/msg4.html

[1]

https://github.com/pulp/pulp_file/blob/5ffb33d8c70ffbb247aba8bf5b45633eba414b79/pulp_file/app/viewsets.py#L54




Could you explain a bit about this?

In order to use pulp 3 then, I'd guess we would either need to:

1) store ALL hrefs about all objects
2) fetch an object before we can do anything with it

Or am I missing an option 3?

On a side note, the hrefs seem to include
hostname/port/deployment path.  This seems incompatible with
things like hostname changes.  We can fairly easily just chomp off
only the path, but if I were a user and had stored all these
hrefs, I would be very unhappy if I had all the full hrefs stored.

Justin





David

On Fri, Apr 27, 2018 at 4:29 PM, Dennis Kliban wrote:

I can't remember why we decided to remove UUID from the
responses. It sounds like we should add them back.

On Fri, Apr 27, 2018 at 12:26 PM, Justin Sherrill wrote:

Hi All!

I started playing around with pulp 3 and generated
bindings via https://pulp.plan.io/issues/3580 and it results
somewhat in what you would expect.  Here's an example:

    # @param id A UUID string identifying this repository.
    # @param [Hash] opts the optional parameters
    # @return [Repository]
    def repositories_read(id, opts = {})
      data, _status_code, _headers = repositories_read_with_http_info(id, opts)
      return data
    end


Notice that the UUID is to be passed in.  When creating a
repository, I only get the _href:

{
    "_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/",
    "_latest_version_href": null,
    "_versions_href": "http://localhost:8000/pulp/api/v3/repositories/bfc61565-89b1-4b7b-9c4a-2ec91f299aca/versions/",
    "created": "2018-04-27T15:26:03.546956Z",
    "description": "",
    "name": "test",
    "notes": {}
}

Meaning, there's really no way to use this specific
binding with the return format for pulp.   I imagine most
binding generation would be expecting the user to know
the ID of the objects and not work off of _hrefs.  Any
reason to not include the IDs in the response?

Justin

___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev




___
Pulp-dev mailing list
Pulp-dev@redhat.com 

Re: [Pulp-dev] Fwd: Re: Changesets Challenges

2018-04-16 Thread Jeff Ortel

Thanks for the proposal, Brian.  I also commented on the issue.

On 04/16/2018 09:41 AM, Brian Bouterse wrote:
I wrote up a description of the opportunity I see here [0]. I put a 
high level pro/con analysis below. I would like feedback on (a) if 
this adequately addresses the problem statements, (b) if there are 
alternatives, and (c) does this improve the plugin writer's experience 
enough to adopt this?


pros:
* significantly less plugin code to write. Compare the Thing example 
code versus the current docs.

+1

* Higher performing with metadata downloading and parsing being 
included in stream processing. This causes syncs for pulp_ansible to 
start 6+ min earlier.


This could also be done currently with the ChangeSet as-is.



cons:
* Progress reporting doesn't know how many things it's processing 
(it's a stream). So users would see progress as "X things completed", 
not "X of Y things completed". Y can't be known until just before the 
stream processing completes, otherwise it's not stream processing.


I'm not a fan of the SizedIterator either.
I contemplated this when designing the ChangeSet.  An alternative I 
considered was to report progress like OSTree does.  It reports progress 
by periodically updating the expected TOTAL.  It's better than nothing.
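A sketch of that OSTree-style reporting (illustrative names, not an
actual Pulp API):

    class StreamProgress:
        def __init__(self):
            self.done = 0
            self.total = None  # unknown while the stream is being parsed

        def advance(self):
            self.done += 1

        def revise_total(self, new_total):
            # Periodically bumped as more metadata is parsed.
            self.total = new_total

        def __str__(self):
            return "%s of %s things completed" % (self.done, self.total or "?")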




[0]: https://pulp.plan.io/issues/3570

Thanks!
Brian



On Thu, Apr 12, 2018 at 7:12 PM, Jeff Ortel <jor...@redhat.com 
<mailto:jor...@redhat.com>> wrote:




On 04/12/2018 04:00 PM, Brian Bouterse wrote:


On Thu, Apr 12, 2018 at 11:53 AM, Jeff Ortel <jor...@redhat.com
<mailto:jor...@redhat.com>> wrote:



On 04/12/2018 10:01 AM, Brian Bouterse wrote:



    On Wed, Apr 11, 2018 at 6:07 PM, Jeff Ortel
<jor...@redhat.com <mailto:jor...@redhat.com>> wrote:



On 04/11/2018 03:29 PM, Brian Bouterse wrote:

I think we should look into this in the near-term.
Changing an interface on an object used by all plugins
will be significantly easier, earlier.


On Wed, Apr 11, 2018 at 12:25 PM, Jeff Ortel
<jor...@redhat.com <mailto:jor...@redhat.com>> wrote:



On 04/11/2018 10:59 AM, Brian Bouterse wrote:



On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel
<jor...@redhat.com <mailto:jor...@redhat.com>> wrote:














On 04/06/2018 09:15 AM, Brian Bouterse wrote:

Several plugins have started using the
Changesets including pulp_ansible,
pulp_python, pulp_file, and perhaps others.
The Changesets provide several distinct
points of value which are great, but there
are two challenges I want to bring up. I want
to focus only on the problem statements first.

1. There is redundant "differencing" code in
all plugins. The Changeset interface requires
the plugin writer to determine what units
need to be added and those to be removed.
This requires all plugin writers to write the
same non-trivial differencing code over and
over. For example, you can see the same
non-trivial differencing code present in
pulp_ansible

<https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
pulp_file

<https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
and pulp_python

<https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
Line-wise, this "differencing" code makes up
a large portion (maybe 50%) of the sync code
itself in each plugin.


Ten lines of trivial set logic hardly seems
like a big deal but any duplication is worth
exploring.

It's more than ten lines. Take pulp_ansible for
example. By my count (the linked to section) it's
89 lines, which out of 306 lines of plugin code
for sync is 29% of extra redundant code. The other
plugins have similar numbers. So with those
numbers in mind, what do you think?


I was counting the lines (w/o comments) in
find_delta() based on the linked code. Which
functions are you counting?

Re: [Pulp-dev] Pulp 3 REST API Challenges

2018-04-15 Thread Jeff Ortel



On 04/12/2018 04:49 PM, Dennis Kliban wrote:
On Thu, Apr 12, 2018 at 2:49 PM, Jeff Ortel <jor...@redhat.com 
<mailto:jor...@redhat.com>> wrote:




On 04/11/2018 01:13 PM, Dennis Kliban wrote:

On Tue, Apr 10, 2018 at 6:44 PM, Jeff Ortel <jor...@redhat.com
<mailto:jor...@redhat.com>> wrote:



On 04/10/2018 04:15 PM, Dennis Kliban wrote:

On Tue, Apr 10, 2018 at 2:04 PM, Brian Bouterse
<bbout...@redhat.com <mailto:bbout...@redhat.com>> wrote:

These are good problem statements. I didn't understand
all of the aspects of it, so I put some inline questions.

My overall question is: are these related problems? To
share my answer to this, I believe the first two
problems are related and the third is separate. The
classic divide and conquer approach we could use here is
to confirm that the problems are unrelated and focus on
resolving one of them first.


I don't think all 3 are related problems. The motivation for
grouping all together is that a subset of the action
endpoints from problem 1 are used to create repository
versions and Problem 3 is a problem with the repository
version creation API.


On Mon, Apr 9, 2018 at 3:18 PM, Austin Macdonald
<aus...@redhat.com <mailto:aus...@redhat.com>> wrote:

Folks,

Austin, Dennis, and Milan have identified the
following issues with current Pulp3 REST API design:

  * Action endpoints are problematic.
  o Example POST@/importers//sync/
  o They are non-RESTful and would make client
code tightly coupled with the server code.
  o These endpoints are inconsistent with the
other parts of the REST API.

Is self-consistency really a goal? I think it's a
placeholder for consistency for REST since the "rest" of
the API is RESTful. After reading parts of Roy
Fielding's writeup of the definition of REST I believe
"action endpoints are not RESTful" to be a true
statement. Maybe "Action endpoints are problematic"
should be replaced with "Action endpoints are not
RESTful" perhaps and have the self-consistency bullet
removed?


+1 to "Action endpoints are not RESTful"
+1 to removing the self-consistency language

  o DRF is not being used as intended for action
endpoints so we have to implement extra
code. (against the grain)

I don't know much about this. Where is the extra code?

  * We don't have a convention for where
plug-in-specific, custom repository version
creation endpoints.
  o example POST@/api/v3/<where?>/docker/add/
  o needs to be discoverable through the schema

What does discoverable via the schema ^ mean? Aren't all
urls listed in the schema?

I think of ^ problem somewhat differently. Yes all urls
need to be discoverable (a REST property), but isn't it
more of an issue that the urls which produce repo
versions can't be identified distinctly from any other
plugin-contributed url? To paraphrase this perspective:
making a repo version is strewn about throughout the API
in random places which is a bad user experience. Is that
what is motivating url discovery?


Yes. I envision a CLI that can discover new plugin
repository-version-creating functionality without having to
install new client packages. Allowing plugin writers to add
endpoints in arbitrary places for creating repository
versions will make it impossible for the client to know what
all the possible ways of creating a repository version are.

  * For direct repository version creation, plugins
are not involved.
  o validation correctness problem:
https://pulp.plan.io/issues/3541
<https://pulp.plan.io/issues/3541>
  o example:
POST@/api/v3/repositories//versions/

I agree with this problem statement. In terms of scope
it affects some plugin writers but not all.


I think it affects all plugin writers. Even the File plugin
needs to provide some validation when creating a repository
version. Right now you can add a FileContent with the same
relative path as another FileContent in the repository version. This
should not be possible because it's not a valid combination of
FileContent units in the same repository version.

Re: [Pulp-dev] Publication delete, sync or async?

2018-04-15 Thread Jeff Ortel



On 04/11/2018 10:34 AM, Austin Macdonald wrote:
From our checkin meeting, there was an MVP doc question that needed 
some discussion:

*Publications:* https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product#Publications



  o /As a user, As an authenticated user, I can delete publications./
  + /asynchronously with a lock on the repository version. /
  + /prevented if associated with a distribution./
  + /single object only./

In the code, Publication deletes are synchronous, not asynchronous
like the MVP doc says. I think the code is correct, so we should
remove this line. If we do not remove this line, we should write a
story to make this call async.

In the code, Publication deletes are not blocked by association to
distributions. Should we write a story or remove this line?



I propose we leave the publication delete synchronous (non locking) for 
now.  The code should be setting the Distribution.publication = NULL and 
this seems like a reasonable thing.  If we make publication DELETE 
asynchronous (with locking), we'd need to also make setting the 
Distribution.publication asynchronous (with locking) to prevent race 
conditions.  This all seems more complicated than necessary.




"single object only". What does that mean? If it means 1
publication at a time, that is how all our objects work, so I
think we can delete this line.



+0 delete that line.





___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Fwd: Re: Changesets Challenges

2018-04-15 Thread Jeff Ortel



On 04/11/2018 10:59 AM, Brian Bouterse wrote:



On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel <jor...@redhat.com 
<mailto:jor...@redhat.com>> wrote:















On 04/06/2018 09:15 AM, Brian Bouterse wrote:

Several plugins have started using the Changesets including
pulp_ansible, pulp_python, pulp_file, and perhaps others. The
Changesets provide several distinct points of value which are
great, but there are two challenges I want to bring up. I want to
focus only on the problem statements first.

1. There is redundant "differencing" code in all plugins. The
Changeset interface requires the plugin writer to determine what
units need to be added and those to be removed. This requires all
plugin writers to write the same non-trivial differencing code
over and over. For example, you can see the same non-trivial
differencing code present in pulp_ansible

<https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
pulp_file

<https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
and pulp_python

<https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
Line-wise, this "differencing" code makes up a large portion
(maybe 50%) of the sync code itself in each plugin.


Ten lines of trivial set logic hardly seems like a big deal but
any duplication is worth exploring.

It's more than ten lines. Take pulp_ansible for example. By my count 
(the linked to section) it's 89 lines, which out of 306 lines of 
plugin code for sync is 29% of extra redundant code. The other plugins 
have similar numbers. So with those numbers in mind, what do you think?


I was counting the lines (w/o comments) in find_delta() based on the 
linked code.  Which functions are you counting?






2. Plugins can't do end-to-end stream processing. The Changesets
themselves do stream processing, but when you call into
changeset.apply_and_drain() you have to have fully parsed the
metadata already. Currently when fetching all metadata from
Galaxy, pulp_ansible takes about 380 seconds (6+ min). This means
that the actual Changeset content downloading starts 380 seconds
later than it could. At the heart of the problem, the
fetching+parsing of the metadata is not part of the stream
processing.


The additions/removals can be any iterable (like a generator) and
by using ChangeSet.apply() and iterating the returned object, the
plugin can "turn the crank" while downloading and processing the
metadata.  The ChangeSet.apply_and_drain() is just a convenience
method.  I don't see how this is a limitation of the ChangeSet.


That is new info for me (and maybe everyone). OK so Changesets have 
two interfaces. apply() and apply_and_drain(). Why do we have two 
interfaces when apply() can support all existing use cases (that I 
know of) and do end-to-end stream processing but apply_and_drain() 
cannot? I see all of our examples (and all of our new plugins) using 
apply_and_drain().


The ChangeSet.apply() was how I designed (and documented) it.  Not sure 
when/who added the apply_and_drain().  +1 for removing it.
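To make the apply() usage concrete, a sketch of feeding a ChangeSet from
a generator so metadata parsing joins the stream processing (parse_record
and the commented-out ChangeSet calls are illustrative):

    def parse_record(line):
        # Hypothetical stand-in for real metadata parsing.
        name, url = line.split()
        return {"name": name, "url": url}

    def additions(metadata_lines):
        # Yield each record as soon as it is read so artifact downloading
        # can begin before the metadata is fully parsed.
        for line in metadata_lines:
            yield parse_record(line)

    # changeset = ChangeSet(remote, additions=additions(stream))
    # for report in changeset.apply():  # "turn the crank"
    #     ...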






Do you see the same challenges I do? Are these the right problem
statements? I think with clear problem statements a solution will
be easy to see and agree on.


I'm not convinced that these are actual problems/challenges that
need to be addressed in the near term.



Thanks!
Brian


___
Pulp-dev mailing list
Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com>
https://www.redhat.com/mailman/listinfo/pulp-dev
<https://www.redhat.com/mailman/listinfo/pulp-dev>



___
Pulp-dev mailing list
Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com>
https://www.redhat.com/mailman/listinfo/pulp-dev
<https://www.redhat.com/mailman/listinfo/pulp-dev>




___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Fwd: Re: Changesets Challenges

2018-04-12 Thread Jeff Ortel



On 04/12/2018 04:00 PM, Brian Bouterse wrote:


On Thu, Apr 12, 2018 at 11:53 AM, Jeff Ortel <jor...@redhat.com 
<mailto:jor...@redhat.com>> wrote:




On 04/12/2018 10:01 AM, Brian Bouterse wrote:



On Wed, Apr 11, 2018 at 6:07 PM, Jeff Ortel <jor...@redhat.com
<mailto:jor...@redhat.com>> wrote:



On 04/11/2018 03:29 PM, Brian Bouterse wrote:

I think we should look into this in the near-term. Changing
an interface on an object used by all plugins will be
significantly easier, earlier.


On Wed, Apr 11, 2018 at 12:25 PM, Jeff Ortel
<jor...@redhat.com <mailto:jor...@redhat.com>> wrote:



On 04/11/2018 10:59 AM, Brian Bouterse wrote:



    On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel
<jor...@redhat.com <mailto:jor...@redhat.com>> wrote:














On 04/06/2018 09:15 AM, Brian Bouterse wrote:

Several plugins have started using the Changesets
including pulp_ansible, pulp_python, pulp_file,
and perhaps others. The Changesets provide several
distinct points of value which are great, but
there are two challenges I want to bring up. I
want to focus only on the problem statements first.

1. There is redundant "differencing" code in all
plugins. The Changeset interface requires the
plugin writer to determine what units need to be
added and those to be removed. This requires all
plugin writers to write the same non-trivial
differencing code over and over. For example, you
can see the same non-trivial differencing code
present in pulp_ansible

<https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
pulp_file

<https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
and pulp_python

<https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
Line-wise, this "differencing" code makes up a
large portion (maybe 50%) of the sync code itself
in each plugin.


Ten lines of trivial set logic hardly seems like a
big deal but any duplication is worth exploring.

It's more than ten lines. Take pulp_ansible for
example. By my count (the linked to section) it's 89
lines, which out of 306 lines of plugin code for sync
is 29% of extra redundant code. The other plugins have
similar numbers. So with those numbers in mind, what do
you think?


I was counting the lines (w/o comments) in find_delta()
based on the linked code. Which functions are you counting?


I was counting the find_delta, build_additions, and
build_removals methods. Regardless of how the lines are
counted, that differencing code is the duplication I'm
talking about. There isn't a way to use the changesets
without duplicating that differencing code in a plugin.


The differencing code is limited to find_delta() and perhaps
build_removals().  Agreed, the line count is less useful than
specifically identifying duplicate code.  Outside of
find_delta(), I see similar code (in part because it got
copied from file plugin) but not seeing actual duplication. 
Can you be more specific?


Very similar code or identical code, I think it begs the question
why are we having plugin writers do this at all? What value are
they creating with it? I don't have a reasonable answer to that
question, so the requirement for plugin writers to write that
code brings me back to the problem statement: "plugin writers
have redundant differencing code when using Changesets". More
info on why it is valuable for the plugin writer to do the
differencing code versus the Changesets would be helpful.


The ChangeSet abstraction (and API) is based on following division
of responsibility:

The plugin  (with an understanding of the remote and its content):
  - Download metadata.
  - Parse metadata
  - Based on the metadata:
    - determine content to be added to the repository.
      - define how artifacts are downloaded.
      - construct content.
    - determine content to be removed from the repository.

Core (without understanding of specific remote or its content):
  - Provide low level API for plugin to affect the changes it has
determined need to be made to the repository.  This is downloaders,
models etc.
  - Provide high(er) level API for plugin to affect the changes it has
determined need to be made to the repository.  This is the ChangeSet.

Re: [Pulp-dev] Fwd: Re: Changesets Challenges

2018-04-12 Thread Jeff Ortel



On 04/12/2018 10:01 AM, Brian Bouterse wrote:



On Wed, Apr 11, 2018 at 6:07 PM, Jeff Ortel <jor...@redhat.com 
<mailto:jor...@redhat.com>> wrote:




On 04/11/2018 03:29 PM, Brian Bouterse wrote:

I think we should look into this in the near-term. Changing an
interface on an object used by all plugins will be significantly
easier, earlier.


On Wed, Apr 11, 2018 at 12:25 PM, Jeff Ortel <jor...@redhat.com
<mailto:jor...@redhat.com>> wrote:



On 04/11/2018 10:59 AM, Brian Bouterse wrote:



On Tue, Apr 10, 2018 at 10:43 AM, Jeff Ortel
<jor...@redhat.com <mailto:jor...@redhat.com>> wrote:














On 04/06/2018 09:15 AM, Brian Bouterse wrote:

Several plugins have started using the Changesets
including pulp_ansible, pulp_python, pulp_file, and
perhaps others. The Changesets provide several distinct
points of value which are great, but there are two
challenges I want to bring up. I want to focus only on
the problem statements first.

1. There is redundant "differencing" code in all
plugins. The Changeset interface requires the plugin
writer to determine what units need to be added and
those to be removed. This requires all plugin writers
to write the same non-trivial differencing code over
and over. For example, you can see the same non-trivial
differencing code present in pulp_ansible

<https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306>,
pulp_file

<https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193>,
and pulp_python

<https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223>.
Line-wise, this "differencing" code makes up a large
portion (maybe 50%) of the sync code itself in each plugin.


Ten lines of trivial set logic hardly seems like a big
deal but any duplication is worth exploring.

It's more than ten lines. Take pulp_ansible for example. By
my count (the linked to section) it's 89 lines, which out of
306 lines of plugin code for sync is 29% of extra redundant
code. The other plugins have similar numbers. So with those
numbers in mind, what do you think?


I was counting the lines (w/o comments) in find_delta() based
on the linked code.  Which functions are you counting?


I was counting the find_delta, build_additions, and
build_removals methods. Regardless of how the lines are counted,
that differencing code is the duplication I'm talking about.
There isn't a way to use the changesets without duplicating that
differencing code in a plugin.


The differencing code is limited to find_delta() and perhaps
build_removals().  Agreed, the line count is less useful than
specifically identifying duplicate code.  Outside of find_delta(),
I see similar code (in part because it got copied from file
plugin) but not seeing actual duplication.  Can you be more specific?


Very similar code or identical code, I think it begs the question why 
are we having plugin writers do this at all? What value are they 
creating with it? I don't have a reasonable answer to that question, 
so the requirement for plugin writers to write that code brings me 
back to the problem statement: "plugin writers have redundant 
differencing code when using Changesets". More info on why it is 
valuable for the plugin writer to do the differencing code versus the 
Changesets would be helpful.


The ChangeSet abstraction (and API) is based on following division of 
responsibility:


The plugin  (with an understanding of the remote and its content):
  - Download metadata.
  - Parse metadata
  - Based on the metadata:
    - determine content to be added to the repository.
      - define how artifacts are downloaded.
      - construct content.
    - determine content to be removed from the repository.

Core (without understanding of specific remote or its content):
  - Provide low level API for plugin to affect the changes it has 
determined need to be made to the repository.  This is downloaders, 
models etc.
  - Provide high(er) level API for plugin to affect the changes it has 
determined need to be made to the repository.  This is the ChangeSet.


Are you proposing that this is not the correct division?
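For reference, the differencing code under discussion reduces to set
logic over natural keys; a condensed sketch of the pattern repeated
across plugins (illustrative names):

    def find_delta(remote_keys, local_keys, mirror=True):
        # What the remote has that the repository lacks...
        additions = remote_keys - local_keys
        # ...and, when mirroring, what the repository has that the
        # remote lacks.
        removals = local_keys - remote_keys if mirror else set()
        return additions, removals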





So a shorter, simpler problem statement is: "to use the
changesets plugin writers have to do extra work to compute
additions and removals".

Re: [Pulp-dev] Pulp 3 REST API Challenges

2018-04-10 Thread Jeff Ortel



On 04/10/2018 04:15 PM, Dennis Kliban wrote:
On Tue, Apr 10, 2018 at 2:04 PM, Brian Bouterse wrote:


These are good problem statements. I didn't understand all of the
aspects of it, so I put some inline questions.

My overall question is: are these related problems? To share my
answer to this, I believe the first two problems are related and
the third is separate. The classic divide and conquer approach we
could use here is to confirm that the problems are unrelated and
focus on resolving one of them first.


I don't think all 3 are related problems. The motivation for grouping 
all together is that a subset of the action endpoints from problem 1 
are used to create repository versions and Problem 3 is a problem with 
the repository version creation API.



On Mon, Apr 9, 2018 at 3:18 PM, Austin Macdonald wrote:

Folks,

Austin, Dennis, and Milan have identified the following issues
with current Pulp3 REST API design:

  * Action endpoints are problematic.
  o Example POST@/importers//sync/
  o They are non-RESTful and would make client code
tightly coupled with the server code.
  o These endpoints are inconsistent with the other parts
of the REST API.

Is self-consistency really a goal? I think it's a placeholder for
consistency for REST since the "rest" of the API is RESTful. After
reading parts of Roy Fielding's writeup of the definition of REST
I believe "action endpoints are not RESTful" to be a true
statement. Maybe "Action endpoints are problematic" should be
replaced with "Action endpoints are not RESTful" perhaps and have
the self-consistency bullet removed?


+1 to "Action endpoints are not RESTful"
+1 to removing the self-consistency language

  o DRF is not being used as intended for action endpoints
so we have to implement extra code. (against the grain)

I don't know much about this. Where is the extra code?

  * We don't have a convention for where plug-in-specific,
custom repository version creation endpoints.
  o example POST@/api/v3//docker/add/
  o needs to be discoverable through the schema

What does discoverable via the schema ^ mean? Aren't all urls
listed in the schema?

I think of ^ problem somewhat differently. Yes all urls need to be
discoverable (a REST property), but isn't it more of an issue that
the urls which produce repo versions can't be identified
distinctly from any other plugin-contributed url? To paraphrase
this perspective: making a repo version is strewn about throughout
the API in random places which is a bad user experience. Is that
what is motivating url discovery?


Yes. I envision a CLI that can discover new plugin 
repository-version-creating functionality without having to install 
new client packages. Allowing plugin writers to add endpoints in 
arbitrary places for creating repository versions will make it 
impossible for the client to know what all the possible ways of 
creating a repository version are.


  * For direct repository version creation, plugins are not
involved.
  o validation correctness problem:
https://pulp.plan.io/issues/3541

  o example:
POST@/api/v3/repositories//versions/

I agree with this problem statement. In terms of scope it affects
some plugin writers but not all.


I think it affects all plugin writers. Even the File plugin needs to 
provide some validation when creating a repository version. Right now 
you can add a FileContent with the same relative path as another 
FileContent in the repository version. This should not be possible 
because it's not a valid combination of FileContent units in the same 
repository version.


Not necessarily.  Two files with the same relative path will have 
different digests (different content).  The assumption that they both 
cannot be in the same repository makes assumptions about how the 
repository is used which I don't think is a good idea.  Imagine two 
different versions of abc.iso.
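For concreteness, the contested validation would look something like
this sketch; whether a differing digest should exempt duplicates is
exactly the disagreement above:

    def validate_relative_paths(units):
        # Strict rule: no two units may claim the same relative_path in
        # one repository version (this would reject the abc.iso example).
        seen = set()
        for unit in units:
            if unit.relative_path in seen:
                raise ValueError(
                    "duplicate relative_path: %s" % unit.relative_path)
            seen.add(unit.relative_path)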




We would like to get feedback on these issues being sound and
worth resolving before we resume particular solution
discussion[1].

Thanks,
Austin, Dennis, and Milan

[1]
https://www.redhat.com/archives/pulp-dev/2018-March/msg00066.html



___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] Pulp 3 REST API Challenges

2018-04-10 Thread Jeff Ortel



On 04/10/2018 04:15 PM, Dennis Kliban wrote:
On Tue, Apr 10, 2018 at 2:04 PM, Brian Bouterse wrote:


These are good problem statements. I didn't understand all of the
aspects of it, so I put some inline questions.

My overall question is: are these related problems? To share my
answer to this, I believe the first two problems are related and
the third is separate. The classic divide and conquer approach we
could use here is to confirm that the problems are unrelated and
focus on resolving one of them first.


I don't think all 3 are related problems. The motivation for grouping 
all together is that a subset of the action endpoints from problem 1 
are used to create repository versions and Problem 3 is a problem with 
the repository version creation API.



On Mon, Apr 9, 2018 at 3:18 PM, Austin Macdonald wrote:

Folks,

Austin, Dennis, and Milan have identified the following issues
with current Pulp3 REST API design:

  * Action endpoints are problematic.
  o Example POST@/importers//sync/
  o They are non-RESTful and would make client code
tightly coupled with the server code.
  o These endpoints are inconsistent with the other parts
of the REST API.

Is self-consistency really a goal? I think it's a placeholder for
consistency for REST since the "rest" of the API is RESTful. After
reading parts of Roy Fielding's writeup of the definition of REST
I believe "action endpoints are not RESTful" to be a true
statement. Maybe "Action endpoints are problematic" should be
replaced with "Action endpoints are not RESTful" perhaps and have
the self-consistency bullet removed?


+1 to "Action endpoints are not RESTful"
+1 to removing the self-consistency language

  o DRF is not being used as intended for action endpoints
so we have to implement extra code. (against the grain)

I don't know much about this. Where is the extra code?

  * We don't have a convention for where plug-in-specific,
custom repository version creation endpoints.
  o example POST@/api/v3//docker/add/
  o needs to be discoverable through the schema

What does discoverable via the schema ^ mean? Aren't all urls
listed in the schema?

I think of ^ problem somewhat differently. Yes all urls need to be
discoverable (a REST property), but isn't it more of an issue that
the urls which produce repo versions can't be identified
distinctly from any other plugin-contributed url? To paraphrase
this perspective: making a repo version is strewn about throughout
the API in random places which is a bad user experience. Is that
what is motivating url discovery?


Yes. I envision a CLI that can discover new plugin 
repository-version-creating functionality without having to install 
new client packages. Allowing plugin writers to add endpoints in 
arbitrary places for creating repository versions will make it 
impossible for the client to know what all the possible ways of 
creating a repository version are.


Currently plugins can provide one (or more) arbitrary endpoints to 
perform sync which is one form of creating a repository version. How is 
the endpoint for creating a version by adding content any different?




  * For direct repository version creation, plugins are not
involved.
  o validation correctness problem:
https://pulp.plan.io/issues/3541

  o example:
POST@/api/v3/repositories//versions/

I agree with this problem statement. In terms of scope it affects
some plugin writers but not all.


I think it affects all plugin writers. Even the File plugin needs to 
provide some validation when creating a repository version. Right now 
you can add a FileContent with the same relative path as another 
FileContent in the repository version. This should not be possible 
because it's not a valid combination of FileContent units in the same 
repository version.



We would like to get feedback on these issues being sound and
worth resolving before we resume particular solution
discussion[1].

Thanks,
Austin, Dennis, and Milan

[1]
https://www.redhat.com/archives/pulp-dev/2018-March/msg00066.html



___
Pulp-dev mailing list
Pulp-dev@redhat.com 
https://www.redhat.com/mailman/listinfo/pulp-dev




___

Re: [Pulp-dev] Roadmap Challenges

2018-04-10 Thread Jeff Ortel

From a tooling perspective:

We have had good success in the past with fully defining and designing a 
feature in a Redmine Story.  The story (description) provides a good way 
to capture (and edit) the overall design and the (comments) support a 
discussion history.  Then, the implementation can be broken down and 
tracked by related sub-tasks which are aligned to sprints and cross 
core/plugin boundaries.  The feature is complete when all the 
implementation tasks are complete.


On 04/06/2018 02:00 PM, Robin Chan wrote:

Brian,

To bring this back to your original question, here are some comments 
in line.


Agree w/#1 - I have observed a few different ways that this problem 
has been solved by developers. The requirement here is "I need a way 
to understand all the work and deliverables associated with a 
feature." This question comes down to how do we track of deliverables. 
This is I think secondary and not as much of a problem as the next 
question.


#2 - This is essentially a question of planning deliverables. Your 
description is "how will someone know if a feature is committed to"? 
I think full planning is not necessary for commitment. I believe the 
"full planning" part could go in #1 in terms of tracking status. I 
think the question is actually "how will someone know if a feature is 
committed to and when it is committed by" - the addition of a time or time 
frame.


In my experience, feature work generally went like this:
1. Define feature/problem to be solved.
2. Investigate:
    - refine requirements/problem definition
    - do enough design or planning of tasks to come up with an estimate 
of work

3. Commit to work or not
4. execute along list of tasks, refine list as you learn.

Steps 1-3 are part of roadmap planning (higher level planning) and steps 3-4 
are sprint planning.



I think the problem with using the sprint field as we have used it is 
that if you add something to a sprint, the Scrum definition would lead 
people to assume that the team is committing to deliver it at the end of a 
defined sprint period. We do not. This major departure from industry 
standard does not serve us well in my opinion. We have kept items on 
sprints for many months and then removed them. Even if we were able to 
convince folks that our definition of sprint was "our next few 
sprints" of work, we don't have any accountability that we are 
actually keeping our commitment here, and the folks wanting something 
on the sprint don't have any idea if something added to a sprint will 
be there in 3 weeks or 12 weeks. I think others in software are 
reasonable in understanding that software deliveries aren't going to 
be there until they are, but I think our immediate focus should be on what 
is in process (impending delivery/next build) and on some of the larger 
deliveries.


Robin

On Thu, Mar 29, 2018 at 3:13 PM, Brian Bouterse wrote:


I want to start a discussion around how Pulp does roadmap planning
and some of our current challenges. This is the pre-discussion
part of a future PUP. I want to collaborate on characterizing the
problems first before we discuss solutions.

# The surface problem statement

It is very difficult for external stakeholders to answer some simple
questions about any given feature:

* How would a user use this feature exactly?
* Is it completed? If not, how much is left to do?
* Which release is this going in?
* Has this feature been fully planned and accepted as a committed
to piece of work?

# deeper problems

I believe there are two deeper problems contributing to the
problem above.

1. Any given feature is typically spread across multiple Redmine
tickets. There may be the first implementation, followup changes,
adjustments, bugfixes, reworks, docs followups, etc. This makes it
practically hard to have the information necessary to answer the
first 3 questions ^.

2. Devs of core or a plugin have no clear way to signal that a
feature has been fully planned and is committed to. The 'sprint'
field has been used heretofore, but the recent feedback suggests
that mechanism may not be the best way to indicate that work has
been fully planned and accepted. We need a clear way to answer the
last question ^.

Do you agree or disagree with these problem statements? Do you
have anything to add about the problem statements?

Thanks!
Brian









Re: [Pulp-dev] Pulp 3 MVP Issue Cleanup

2018-04-10 Thread Jeff Ortel



On 04/10/2018 11:28 AM, Brian Bouterse wrote:
OK so to bring it back to how to manage this in Redmine. We've been 
talking about 'Tags' and I've read some +1s for their use to track 
these things and no -1s. Still, I think it would be better to use the 
'Category' built-in field of Redmine. Tags can be multi-selected, but 
I don't think a single issue will need to be a CLI and an Installer 
and a $other_tag all at once (multi-selected). Categories are a single 
selection, which seems more appropriate. We also make little use of 
them today, and they are built into all Issues Redmine has.


+1



Should I make Categories for 'Ansible Installer', 'CLI' and 
'Migration Tool' in the Pulp project on Redmine?


Other suggestions and ideas are welcome.




On Tue, Apr 10, 2018 at 11:23 AM, Jeff Ortel <jor...@redhat.com> wrote:




On 04/10/2018 10:15 AM, Jeff Ortel wrote:



On 04/04/2018 05:09 PM, Dennis Kliban wrote:

Anything that is going to have its own release cadence should
be tracked in its own project. That way we can assign issues
related to a specific release of that project to the particular
release.

Are we going to release the CLI, Ansible Installer, and the
Migration tool as part of one version of Pulp or will these all
be versioned separately?


Separately.


Meant to clarify.  The CLI can be released separately but I think
the migration tool needs to be released in step with Pulp.  As for
the installer .. seems like that also needs to be released in step
with Pulp.






On Wed, Apr 4, 2018 at 5:41 PM, Austin Macdonald
<aus...@redhat.com> wrote:


I'm hoping to continue the "Infrastructure" Redmine
project for things like website hosting. I see what you
mean though because it will be developed and released
separately. I think we're in a similar situation for 3
things: the ansible installer, the migration tool, and
CLI, and for each of them we should either make their
own Redmine projects or a tag under Pulp. We already
have many Redmine projects and they are kind of a pain
so I want to float a tags based approach for feedback.
Perhaps keeping them out of "Pulp" means that we remove
all the existing tags from them and tag them with new
tags like 'Ansible Installer', '2to3 Migration' and 'CLI'?


I had hoped that someday there would be a separate group of
committers for pulp/devel or wherever we keep it. Also, I
wouldn't want potential users/PMs to see a "bug count" that
includes non-user facing issues. These concerns are trivial
though, and if projects are a pain, I'm fine with keeping Tags.

Since projects are a pain, can we get rid of the "external"
project? https://pulp.plan.io/projects/external/issues
<https://pulp.plan.io/projects/external/issues>

















Re: [Pulp-dev] Pulp 3 MVP Issue Cleanup

2018-04-10 Thread Jeff Ortel



On 04/10/2018 10:15 AM, Jeff Ortel wrote:



On 04/04/2018 05:09 PM, Dennis Kliban wrote:
Anything that is going to have its own release cadence should be 
tracked in its own project. That way we can assign issues related to 
a specific release of that project to the particular release.


Are we going to release the CLI, Ansible Installer, and the Migration 
tool as part of one version of Pulp or will these all be versioned 
separately?


Separately.


Meant to clarify.  The CLI can be released separately but I think the 
migration tool needs to be released in step with Pulp.  As for the 
installer .. seems like that also needs to be released in step with Pulp.







On Wed, Apr 4, 2018 at 5:41 PM, Austin Macdonald <aus...@redhat.com> wrote:



I'm hoping to continue the "Infrastructure" Redmine project
for things like website hosting. I see what you mean though
because it will be developed and released separately. I think
we're in a similar situation for 3 things: the ansible
installer, the migration tool, and CLI, and for each of them
we should either make their own Redmine projects or a tag
under Pulp. We already have many Redmine projects and they
are kind of a pain so I want to float a tags based approach
for feedback. Perhaps keeping them out of "Pulp" means that
we remove all the existing tags from them and tag them with
new tags like 'Ansible Installer', '2to3 Migration' and 'CLI'?


I had hoped that someday there would be a separate group of
committers for pulp/devel or wherever we keep it. Also, I wouldn't
want potential users/PMs to see a "bug count" that includes
non-user facing issues. These concerns are trivial though, and if
projects are a pain, I'm fine with keeping Tags.

Since projects are a pain, can we get rid of the "external"
project? https://pulp.plan.io/projects/external/issues
<https://pulp.plan.io/projects/external/issues>












Re: [Pulp-dev] Pulp 3 MVP Issue Cleanup

2018-04-10 Thread Jeff Ortel

Looks great.  Thanks for putting this together!

On 04/04/2018 02:23 PM, Austin Macdonald wrote:
David and I went through all the pulpcore issues that have the "Pulp3 
MVP" tag.


We added this one to the sprint:

  * https://pulp.plan.io/issues/3545


These two need to be updated before we can move forward:

  * @dalley https://pulp.plan.io/issues/3505
  * @asmacdo https://pulp.plan.io/issues/3546


We marked these as groomed; unless someone says "no", I plan to add 
all of these to the sprint.


  * https://pulp.plan.io/issues/3082
  * https://pulp.plan.io/issues/3081
  * https://pulp.plan.io/issues/3220
  * https://pulp.plan.io/issues/3298
  * https://pulp.plan.io/issues/3395

We have some vagrant/ansible issues. I don't think these really belong 
in the "Pulp" project tracker. Mind if we move them to the 
"Infrastructure" project? (BTW, there are a lot more, just without the 
MVP tag).


  * https://pulp.plan.io/issues/3439
  * https://pulp.plan.io/issues/2922










Re: [Pulp-dev] Pulp 3 REST API Challenges

2018-04-10 Thread Jeff Ortel



On 04/09/2018 02:18 PM, Austin Macdonald wrote:

Folks,

Austin, Dennis, and Milan have identified the following issues with 
the current Pulp3 REST API design:


  * Action endpoints are problematic.
  o Example POST@/importers//sync/
  o They are non-RESTful and would make client code tightly
coupled with the server code.
  o These endpoints are inconsistent with the other parts of the
REST API.
  o DRF is not being used as intended for action endpoints so we
have to implement extra code. (against the grain)


+1


  * We don't have a convention for where plugin-specific, custom
repository version creation endpoints should live.
  o example POST@/api/v3//docker/add/
  o needs to be discoverable through the schema



+1


  * For direct repository version creation, plugins are not involved.
  o validation correctness problem:
https://pulp.plan.io/issues/3541

  o example: POST@/api/v3/repositories//versions/



Looks like half of the plugins will need to participate in creating 
repository versions (outside of sync). The API design should take a 
consistent approach to creating repository versions (/add/ _and_ /sync/).


We would like to get feedback on these issues being sound and worth 
resolving before we resume particular solution discussion[1].


Thanks,
Austin, Dennis, and Milan

[1] https://www.redhat.com/archives/pulp-dev/2018-March/msg00066.html 









[Pulp-dev] Fwd: Re: Changesets Challenges

2018-04-10 Thread Jeff Ortel

On 04/06/2018 09:15 AM, Brian Bouterse wrote:
Several plugins have started using the Changesets including 
pulp_ansible, pulp_python, pulp_file, and perhaps others. The 
Changesets provide several distinct points of value which are great, 
but there are two challenges I want to bring up. I want to focus only 
on the problem statements first.


1. There is redundant "differencing" code in all plugins. The 
Changeset interface requires the plugin writer to determine what units 
need to be added and those to be removed. This requires all plugin 
writers to write the same non-trivial differencing code over and over. 
For example, you can see the same non-trivial differencing code 
present in pulp_ansible 
, 
pulp_file 
, 
and pulp_python 
. 
Line-wise, this "differencing" code makes up a large portion (maybe 
50%) of the sync code itself in each plugin.


Ten lines of trivial set logic hardly seems like a big deal but any 
duplication is worth exploring.
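
For context, the duplicated logic in question is roughly the following
(a sketch only: the natural-key dicts and helper name are illustrative,
not any plugin's actual code):

    def find_delta(remote_units, local_units):
        """Set difference over natural keys: what to add, what to remove."""
        remote_keys = set(remote_units)  # dicts keyed by natural key
        local_keys = set(local_units)
        additions = [remote_units[k] for k in remote_keys - local_keys]
        removals = [local_units[k] for k in local_keys - remote_keys]
        return additions, removals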




2. Plugins can't do end-to-end stream processing. The Changesets 
themselves do stream processing, but when you call into 
changeset.apply_and_drain() you have to have fully parsed the metadata 
already. Currently when fetching all metadata from Galaxy, 
pulp_ansible takes about 380 seconds (6+ min). This means that the 
actual Changeset content downloading starts 380 seconds later than it 
could. At the heart of the problem, the fetching+parsing of the 
metadata is not part of the stream processing.


The additions/removals can be any iterable (like a generator) and by 
using ChangeSet.apply() and iterating the returned object, the plugin 
can "turn the crank" while downloading and processing the metadata.  The 
ChangeSet.apply_and_drain() is just a convenience method.  I don't see 
how this is a limitation of the ChangeSet.
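
A sketch of that pattern, where additions is a generator so metadata
parsing interleaves with downloading (parse_metadata and
build_pending_content are illustrative helpers; the ChangeSet call is
approximated from this discussion, not quoted from the API):

    def additions(metadata_stream):
        """Yield pending content as each metadata record is parsed."""
        for record in parse_metadata(metadata_stream):  # still downloading
            yield build_pending_content(record)

    changeset = ChangeSet(importer, additions=additions(stream))
    for report in changeset.apply():  # "turn the crank" while parsing
        pass  # downloads proceed as the generator produces items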




Do you see the same challenges I do? Are these the right problem 
statements? I think with clear problem statements a solution will be 
easy to see and agree on.


I'm not convinced that these are actual problems/challenges that need to 
be addressed in the near term.




Thanks!
Brian






Re: [Pulp-dev] Importer Name

2018-03-15 Thread Jeff Ortel
In pulp3, users need to keep track of a number of things.  For example, 
without auto publish, users need to keep track of which importer(s) and 
publishers need to be used for sync/publish workflows.  I fully expect 
that users of the API will be maintaining some kind of 
automation/orchestration on their end (shell scripts, ansible). So, 
keeping track of sync and download policies does not seem like much of a 
burden.  Also, after further consideration, I don't think that storing 
either the sync (mode) or download policy on the repository is appropriate.



On 03/13/2018 04:59 PM, David Davis wrote:
Can you elaborate on what made you reconsider? Asking because I still 
see the point that you and Justin raised about dropping the fields as 
an issue.



David

On Mon, Mar 12, 2018 at 12:31 PM, Jeff Ortel <jor...@redhat.com> wrote:




On 03/12/2018 10:28 AM, Jeff Ortel wrote:

On 03/08/2018 10:13 AM, Austin Macdonald wrote:

Motivation:
The name "importer" carries some inaccurate implications.
1) Importers should "import". Tasks like "sync" will do the
actual importing. The object only holds the configuration that
happens to be used by sync tasks.
2) Sync tasks on mirror mode remove content as well as add it,
so "import" isn't quite right.

Proposed name: Remote

The inspiration for remote is "git remote". In git, remotes
represent external repositories, which is almost exactly what
our importers do.


+1, The git/ostree "remote" concept applies very well to most of
what an "importer" defines in pulp.



---
Part 2: Trim the fields

Currently, Importers have settings that can be categorized in 2
ways. I am proposing removing the "sync settings" from the
Remote model:

External Source information
    name
    feed_url
    validate
    ssl_ca_certificate
    ssl_client_certificate
    ssl_client_key
    ssl_validation
    proxy_url
    username
    password

Sync settings
    download_policy
    sync_mode

This had some advantages when Importers were related to
Repositories. For example, having a repository.importer that
always used the same sync mode made sense. However, the "how" to
sync settings don't make much sense when importers and
repositories are not linked. It seems very reasonable that a
user might have 2 repositories that sync from the same source
(e.g. EPEL). It does not make sense for them to have to create an
Importer for the EPEL repository twice or more just to change
sync_mode or download policy. Instead of modeling these fields,
I propose that they should be POST body parameters.


I, as a user, don't like having to specify download_policy & 
sync_mode on every request.  The burden on the user of passing
these consistently seems unnecessary and prone to error.  And it
seems like something that pulp should store as part of its value
proposition.   Imagine an organization with tons of repositories
and admins.  They would need to maintain a spreadsheet, notes, or
scripts for these settings so that admin A is syncing using the
same settings as admin B.

Perhaps download_policy & sync_mode should be attributes of the
repository.  Thoughts on moving them there?  The sync_mode
(mirror/additive) may need to be renamed in a way that changes it
from describing how the importer is syncing to something that
defines the type of repository.  Like that the repository is
intended to be a mirror or not. Perhaps just a "mirror" (bool)
attribute.


I have reconsidered this.  Disregard.



example
POST v3/remotes/1234/sync/ repository=myrepo_href
sync_mode=additive, dl_policy=immediate
POST v3/remotes/1234/sync/ repository=myother_href
sync_mode=mirror, dl_policy=deferred














Re: [Pulp-dev] Importer Name

2018-03-12 Thread Jeff Ortel

On 03/08/2018 10:13 AM, Austin Macdonald wrote:

Motivation:
The name "importer" carries some inaccurate implications.
1) Importers should "import". Tasks like "sync" will do the actual 
importing. The object only holds the configuration that happens to be 
used by sync tasks.
2) Sync tasks on mirror mode remove content as well as add it, so 
"import" isn't quite right.


Proposed name: Remote

The inspiration for remote is "git remote". In git, remotes represent 
external repositories, which is almost exactly what our importers do.


+1, The git/ostree "remote" concept applies very well to most of what an 
"importer" defines in pulp.




---
Part 2: Trim the fields

Currently, Importers have settings that can be categorized in 2 ways. 
I am proposing removing the "sync settings" from the Remote model:


External Source information
    name
    feed_url
    validate
    ssl_ca_certificate
    ssl_client_certificate
    ssl_client_key
    ssl_validation
    proxy_url
    username
    password

Sync settings
    download_policy
    sync_mode

This had some advantages when Importers were related to Repositories. 
For example, having a repository.importer that always used the same 
sync mode made sense. However, the "how" to sync settings don't make 
much sense when importers and repositories are not linked. It seems 
very reasonable that a user might have 2 repositories that sync from 
the same source (e.g. EPEL). It does not make sense for them to have to
create an Importer for the EPEL repository twice or more just to
change sync_mode or download policy. Instead of modeling these fields,
I propose that they should be POST body parameters.


I, as a user, don't like having to specify download_policy & sync_mode 
on every request.  The burden on the user of passing these consistently 
seems unnecessary and prone to error.  And it seems like something that 
pulp should store as part of its value proposition.   Imagine an 
organization with tons of repositories and admins. They would need to 
maintain a spreadsheet, notes, or scripts for these settings so that 
admin A is syncing using the same settings as admin B.


Perhaps download_policy & sync_mode should be attributes of the 
repository.  Thoughts on moving them there?  The sync_mode 
(mirror/additive) may need to be renamed in a way that changes it from 
describing how the importer is syncing to something that defines the type 
of repository.  Like that the repository is intended to be a mirror or 
not.  Perhaps just a "mirror" (bool) attribute.


example
POST v3/remotes/1234/sync/ repository=myrepo_href sync_mode=additive, 
dl_policy=immediate
POST v3/remotes/1234/sync/ repository=myother_href sync_mode=mirror, 
dl_policy=deferred








Re: [Pulp-dev] Github Required Checks

2018-03-05 Thread Jeff Ortel



On 03/02/2018 03:20 PM, Brian Bouterse wrote:
I had neglected to write up the temporary enable/disable part of the 
issue, so I just updated it here: https://pulp.plan.io/issues/3379


In short, one of the pulp org owners (ipanova, ttereshc, rchan, 
jortel, bmbouter) can temporarily enable/disable required checks. This 
issue would also add this process to both the pulp2 and pulp3 docs.


What do you all think about an idea like this?


+1



On Fri, Feb 16, 2018 at 1:33 PM, Brian Bouterse wrote:


+1 to enabling checks for the 'pulp' and 'pulp_file' repos in
Github with the ability to temporarily disable them. I wrote up
this issue here to do that: https://pulp.plan.io/issues/3379


I think we should enable these because we have a human-enforced
policy that expects failed checks to not be merged, but in
practice code that is merged breaks things that quality checks
also identified. I think Pulp would benefit from a stronger
pre-merge enforcement of our existing checks. In the case where
our quality checks are failing, I'm hoping we can focus on fixing
them before continuing on with the merge in all but exceptional cases.

On Thu, Feb 15, 2018 at 8:55 PM, Daniel Alley wrote:

+0 on required github-enforcement, +1 to a strict
human-enforced policy about tests passing for PR merges

Reason being, an issue has occurred which would block valid
PRs twice within the last month.  The first being the test
certs expiring on January 25th, the second being when we
switched the PR unittest runners over to new versions of
Fedora this morning.

I'm not against the idea by any means, I'm just not entirely
convinced that the exceptions requiring intervention will be
very infrequent, and I can imagine it leading to a fair amount
of frustration.

On Thu, Feb 15, 2018 at 7:34 PM, David Davis wrote:

+1 to enabling the checks for the core pulp repos in
Github. The only concern I have is that perhaps something
happens outside of our control (e.g. Travis goes down) and
we can’t merge PRs. In those cases though, we can
temporarily disable checks.


David

On Thu, Feb 15, 2018 at 4:38 PM, Brian Bouterse wrote:

I want to adjust my proposal to only be for core, and
not a requirement for any plugin. I think the plugin
policy is something the committers should decide along
with their users. I overall believe enabling these
kinds of checks is a good idea so I encourage plugins
do it. We should make sure each team has a github
admin in place who could make such a change.

I like option 1, which to retell my understanding
means that we'll enable github to require the checks
to pass and you can't merge or push without them
passing. Is that good, would there be any -1's for a
change on core like this?

To share my perspective about plugins being in the
Pulp organization, they are there only for a clear
association with Pulp on github. Any open source
plugin that creates value with Pulp and does it with a
debatable level of responsibility towards its users I
think is probably ok to include. I don't expect them
to give up any control or autonomy if they do that.
The benefit of bringing these different plugin
communities closer together through the organization
is hopefully towards common services like automated
testing and such over time.



On Tue, Feb 13, 2018 at 8:28 AM, Milan Kovacik wrote:

> Option 1:  Nothing merges without passing PR runner 
tests, ever, even if the issue is
rooted in the configuration or infrastructure of
the test runners or an expired certificate etc. 
This would light a fire to get these issues
resolved ASAP because nothing can happen without them.
I like this option for the same reasons Daniel
mentioned; it also implies an up-to-date
infrastructure and better reliability: both false
negative and false positive (test/build) failures
will still happen in all three options.

Re: [Pulp-dev] Migrating Sprint to Custom Field

2018-03-05 Thread Jeff Ortel

+1

On 03/02/2018 04:17 PM, Brian Bouterse wrote:
Redmine's milestone feature allows for roadmap pages to be published 
for each project on pulp.plan.io  like this one 
[0]. Currently all projects use a single set of milestones from the 
main 'Pulp' project on Redmine which is what defines the Sprints, e.g. 
'Sprint 22'. This creates a few problems:


1. Redmine projects can't use the milestone feature of Redmine for 
release planning. This is unfortunate since the milestone feature is a 
good release planning and roadmapping tool.


2. Any project that does use milestones can't have issues associated 
with milestones also on a sprint. Like pulp_rpm pulp3 issues [0].


I'm interested in hearing any solution on resolving these issues, but 
I also have one to share:


1. We could create a custom field called 'Sprint' and make that 
available to all projects.

2. Populate the custom field with all existing Sprint values
3. We then port the existing historic sprint issues to the correct 
custom field which preserves all past sprints.

4. Clear the milestone for all issues on plan.io
5. Delete the old, unused milestones
6. enjoy

At least one issue at the upcoming sprint planning meeting won't be 
able to be added because of this so I'm hoping we can resolve in the 
next few days.


[0]: https://pulp.plan.io/versions/50

Thanks!
Brian






Re: [Pulp-dev] Plugin Writer's Coding Workshop Feedback

2018-02-13 Thread Jeff Ortel

Thanks for providing this feedback, Brian!  Good stuff.


On 02/12/2018 03:57 PM, Brian Bouterse wrote:
At the Foreman Construction day [0] last Wednesday, we had our first 
code focused plugin writer's workshop. About 6 people were actively 
engaged as we talked through the plugin API, example code, and then 
tried to install Pulp3. All of this happened over about 4-5 hours. In 
contrast to the devconf workshop which was planning focused, this was 
a "let's look at and write some code together" workshop. Two attendees 
came to both, and they got all the way to calling their own sync code.


We got a lot of feedback, which I will try to group into some areas. 
(my feedback in parens)


[installation issues]
- the pypi install commands are missing the migrations and they 
produce broken installations

- the vagrantcloud boxes couldn't have a plugin installed on them :(
- the dev environments worked great but we didn't recommend them until 
we realized all of these other methods were broken
- we assume the user 'pulp' in a lot of places, e.g. systemd file, 
ansible, etc
- assumptions about Fedora both in ansible, but also the copy/paste 
commands
- some users who copied and pasted commands didn't realize they 
weren't for their OS


[desire for simpler things]
- there is a strong desire to use sqlite as the default db not postgresql


Very interesting.  Can you elaborate on why?


- desire to not install a message bus. (I think this is unavoidable)

[need examples]
- pulp_file is our example, but it's laid out into different functions 
and classes. People were confused by this because they thought the 
classes and function names are meaningful when they aren't. For 
example we were asked "what is a synchronizer" 
https://github.com/pulp/pulp_file/blob/master/pulp_file/app/tasks.py#L139


The Synchronizer used to be the FileImporter and got renamed as part of 
early mitigation of the "circular import" problem.  I plan to do some 
final refactoring as soon as the plugin API stabilizes (really soon).  I 
suspect the Synchronizer class (at least the name) will go away.  That 
said, I'm a little puzzled as to what led to actual "confusion" about a 
class named Synchronizer that was used to synchronize a repository.  You 
also mentioned that some of the function names were somehow confusing - 
can you name them and explain why they were confusing?


- pulp_file doesn't provide a good example because changesets do 
everything for you. (The main pulp_file should be a simple, direct 
example of the objects they have to save).


True, but it does provide a good example of how to use the ChangeSet.

- people found pulp_example via github and thought "oh here is what I 
needed to find!" only to base their code on outdated code (we need to 
delete pulp_example)
- a database picture would be helpful to show off the data layer 
objects, foreign keys, and attributes.


Yes!  We really need to publish an ER diagram.  I'm overdue on an action 
item to produce one.




[specific things]
- 'id' on the inherited content unit conflicted with a content unit 
which also uses 'id'.
- qpid vs rabbitmq defaults confusion. The settings.yaml says we 
default to qpid so they installed qpid, but really in settings.py it's 
rabbitmq. (this is a 1 line fix)



In terms of the installation challenges, we should consider 
consolidating onto a single installation method of pip with 
virtualenv. Of all the methods we offer [1] that is the one everyone 
could use and no one minded. We could remove the other options from 
the install page so that for for now (pre-GA) everyone is doing the 
same, simple thing. I think we should consolidate our effort and not 
focus on end-user installations as the main thing right now.


I also think we should do these things:

* switch pulp to use sqlite3 by default. An ansible installer can both 
install postgres and configure it, later.

* rewrite pulp_file to be a really really simple example


The file-plugin is already a "really really simple example". Rewriting 
it without the ChangeSet will significantly increase code line count 
and complexity.  As you know, the file-plugin supports managing 
FileContent like .img and .iso files.  The primary goal of the pulp-file 
project is to support real use cases.  Because it's the only plugin, it 
has taken on a secondary goal of being an example.  I'm opposed to 
increasing complexity in support of the secondary "example" goal at the 
expense of its primary goal.


The file-plugin currently provides a good example of how to use the 
ChangeSet.  I have no doubt that plugin writers want additional examples 
but I think that if we intend to continue to rely on "real" plugins as 
natural examples, we should identify a plugin on the roadmap that has 
made the design choice to be implemented without the ChangeSet and 
prioritize it.  Another choice could be to refactor the example plugin 
to support a broader range of examples and continue to maintain it.




Re: [Pulp-dev] Adding Model.created.

2018-02-12 Thread Jeff Ortel

FYI: https://github.com/pulp/pulp/pull/3325

On 02/12/2018 03:01 PM, Dennis Kliban wrote:

+1 to created and +1 to updated.

On Mon, Feb 12, 2018 at 3:52 PM, David Davis <davidda...@redhat.com 
<mailto:davidda...@redhat.com>> wrote:


+1. Also, wondering if we should add a Model.last_updated field as
well.


David

On Mon, Feb 12, 2018 at 12:22 PM, Jeff Ortel <jor...@redhat.com
<mailto:jor...@redhat.com>> wrote:

A few of our models have a field:

created = models.DateTimeField(auto_now_add=True)

To support ordering needed by a FilePlugin use case, I'm
planning to add Content.created as it seems generally useful
and I believe will be needed by most plugins.  This raises a
more general question: should we add Model.created instead? 
Knowing when most things get created seems generally useful. 
For example, knowing when an artifact got created tells us
when it got downloaded.  Things like that.

Thoughts?












[Pulp-dev] Adding Model.created.

2018-02-12 Thread Jeff Ortel

A few of our models have a field:

created = models.DateTimeField(auto_now_add=True)

To support ordering needed by a FilePlugin use case, I'm planning to add 
Content.created as it seems generally useful and I believe will be 
needed by most plugins. This raises a more general question: should we 
add Model.created instead?  Knowing when most things get created seems 
generally useful.  For example, knowing when an artifact got created 
tells us when it got downloaded.  Things like that.


Thoughts?
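
A sketch of the idea as a shared abstract base, using Django's
auto_now_add/auto_now semantics (the last_updated field follows David's
suggestion in the reply above; the base class name is illustrative):

    from django.db import models

    class Model(models.Model):
        """Hypothetical common base class for Pulp models."""
        created = models.DateTimeField(auto_now_add=True)   # set once on INSERT
        last_updated = models.DateTimeField(auto_now=True)  # refreshed on save()

        class Meta:
            abstract = True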




[Pulp-dev] Downloading Decision

2018-01-09 Thread Jeff Ortel
The pulp 3 alpha included two solutions for providing downloading 
support in the plugin API.  Each solution was based on different 
concurrency technologies, protocol support libs and abstractions.  The 
goal was to test drive each, get community feedback and make a second 
pass at documenting the /minimal/ requirements and characteristics 
deemed desirable for download support in the API.  These criteria appear 
at the end of this email.


Agreement has been reached to proceed with the /asyncio/-based solution.  
The underlying asyncio/coroutine technology has several advantages and 
none of the disadvantages imposed by python threading.  The abstractions 
in the proposed solution are sufficient to meet the minimal criteria 
we've documented.


Unless there are objections, the next steps will include writing stories to:

 * Remove the /futures/ solution.
 * Move (promote) the /asyncio/ solution to the /download/ package.
 * Update the ChangeSet to work with /asyncio/ based downloaders.
 * Update the File plugin to work with /asyncio/ downloaders.
 * Update the Example plugin imports.


The feature set and a few characteristics of the /asyncio/ solution need 
(some) further discussion and may necessitate follow up stories.  But, 
this set of stories will get us most of the way there.
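
For readers unfamiliar with the coroutine model, here is a minimal
sketch of concurrent downloading with asyncio and aiohttp; it
approximates the approach, not the actual plugin API:

    import asyncio
    import aiohttp

    async def download(session, url, path):
        """Stream one URL to a file, yielding control while waiting on I/O."""
        async with session.get(url) as response:
            response.raise_for_status()  # surface 400+ responses as errors
            with open(path, "wb") as fp:
                async for chunk in response.content.iter_chunked(8192):
                    fp.write(chunk)

    async def download_all(urls):
        """Download a group of files concurrently over pooled connections."""
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(
                *(download(session, url, "unit-%d" % i)
                  for i, url in enumerate(urls)))

    # asyncio.get_event_loop().run_until_complete(download_all([...]))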







_Requirements_

 * support concurrent downloading by changesets

 * uses a /Downloader/

 * support concurrent downloading by streamer

 * uses a /Downloader/

 * importer provides the right, configured downloader (defined interface)

 * streamer and changeset does not choose


_Desirable_

 * Platform supplies downloading support for known protocols.

 * Support concurrent downloading

 * single file

 * groups of files

 * HTTP downloader

 * size verification

 * hash verification

 * use the importer settings

 * automatically retry

 * automatically follow HTTP redirects

 * raise 400+ HTTP responses as foreground errors

 * exposes underlying lib settings

 * connection pooling (concurrent)

 * connection reuse (concurrent)

 * connection keep-alive support (concurrent)

 * extensible

 * must signal that headers are available prior to flowing data

 * raises fatal exceptions

 * FILE downloader

 * size verification

 * hash verification

 * use the importer settings

 * extensible

 * raises fatal Exceptions

 * Factory to build a downloader from the importer settings

 * Leverage library/language as much as possible


/Downloader/ - characteristics

 * Start the download

 * Do something with the downloaded content.

 * Write to file

 * Write to output stream (Eg: streamer)

 * Calculate and provide the size and digests

 * Verify size and/or digest(s) values  (requirement)





[Pulp-dev] Stop distributing gofer

2018-01-08 Thread Jeff Ortel
The gofer package is distributed in Fedora and Copr[1] for EL6 & EL7.  
Gofer is an external (to the project) dependency much like Celery.  I 
propose we stop distributing gofer and update our documentation to refer 
users to the existing sources.


Thoughts?


[1] https://copr.fedorainfracloud.org/coprs/jortel/gofer/


Re: [Pulp-dev] repository version stories

2018-01-04 Thread Jeff Ortel

Looking at 3209, I don't see the actual design documented.

On 01/03/2018 04:58 PM, Dennis Kliban wrote:
@bmbouter, @daviddavis, and I have put together a plan for 
implementing repository version use cases. The overall design is 
captured in issue 3209[0]. The individual use cases are captured in 
the child stories.


Please take a look at these stories and provide feedback ASAP. We'd 
like to add most of these stories to the sprint during planning on 
Friday.



[0] https://pulp.plan.io/issues/3209






Re: [Pulp-dev] Partially constructed data in the DB

2017-12-14 Thread Jeff Ortel



On 12/14/2017 12:55 PM, Brian Bouterse wrote:
The behavior brings me back to an attribute name like 'user_visible' 
and it would default to False. Thus you have to explicitly take the 
step to make it user visible. Whatever the name, I think this would 
apply to both RepoVersion and Publication objects. Plugin writers 
who produce these objects also need docs that identify that they need 
to set user_visible=True.


Agreed, except for field name.

My concern with the name user_visible is that rather than describing the 
incomplete state of the resource, it describes only one aspect of how the 
resource should be handled.  That is, that a non-visible resource 
should be hidden from the user.  But there's more to it. For example, 
associating a publication to a distribution should be prevented by the 
viewset - not based on user visibility, but on the incomplete state of 
the publication.




If an exception is raised while creating the repo_version or 
publication, or from the plugin code, the core catches it, deletes the 
repo_version/publication and re-raises the exception. This will cause 
the task the user is tracking to error and report the error.


Agreed.



We had some challenges on irc in finding a working design for the 
crash case. If a crash occurs, though, the db record will just be there 
with user_visible=False. We need some way to clean those up. We can't 
assume that there will be just one outstanding one for us to clean up 
next time, for a variety of reasons I won't recap here. During the irc 
convo, @jortel suggested we consider if the tasking system can help 
clean up the work like it cleans up other areas, and I think that is a 
good idea. We could record on the Task model a list of objects to be 
deleted if the tasking system cleans up a task that crashed while 
running. For example, when a publication is made, the first thing done 
is to associate it with the running task as an object that needs to be 
deleted if the task crashes. We would also hide this objects_to_delete 
list from the user in the Task serializer, which would omit this data. 
If we don't omit that data from a Task serialization, when the user 
tries to load the url they will get a 404 because that object has 
user_visible=False.


I think it would be best to omit from the task serializer.

This all seems reasonable, but I want to note that for this to be crash 
proof, it is imperative that the resource insert and the insert into 
/things-to-be-deleted-when-the-task-crashes/ be committed in the same 
transaction.  The same is true for when the task completes 
successfully.  Updating the (valid|visible|?) field on the resource, 
inserting into CreatedResources and deleting from 
/things-to-be-deleted-when-the-task-crashes/ needs to be done in the 
same transaction.  This is trivial for the core because it can be done 
in the task code.  Relying on plugin writers to do this is a little 
concerning.
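
To make the same-transaction point concrete, a sketch using Django's
transaction.atomic (the ToDelete bookkeeping model and current_task
helper are hypothetical names for
/things-to-be-deleted-when-the-task-crashes/):

    from django.db import transaction

    def create_publication(publisher):
        # Either both rows are committed or neither is, so a crash can
        # never leave a publication that cleanup does not know about.
        with transaction.atomic():
            publication = Publication.objects.create(publisher=publisher)
            ToDelete.objects.create(task=current_task(), resource=publication)
        return publication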


Perhaps we can do something simpler.  Given the frequency of crash or 
worker restart, I wonder if we could delete incomplete things based on 
another event that ensures that no tasks are running.  Kind of like how 
/tmp is cleaned up on system reboot.  I don't think having some of 
these things hanging around in the DB is a problem.  It's mainly that we 
don't want to leak them indefinitely. Any ideas?




What are thoughts on these approaches, behaviors, and the attribute 
name? Should this be moved into Redmine?





On Thu, Dec 14, 2017 at 11:14 AM, Jeff Ortel <jor...@redhat.com> wrote:




On 12/13/2017 01:54 PM, Brian Bouterse wrote:

Defining the field's behavior a bit more could help us with the
name. Will it actually be shown to the user in viewsets and
filter results?

I think the answer should be no, not until it's fully finished. I
can't think of a reason why a user would want to see inconsistent
content during a sync or publish.


Agreed.


There are some downsides when users think things are done when
they aren't. For instance, the user could mistakenly think the
publish is done when it's not, trigger package updates, and many
machines will still receive the old content because it hasn't
been switched over to auto-publish for the expected distribution.

Also how is this related to when the 'created_resources' field is
set on a Task? I had imagined core would set that as the last
thing it does so that when the user sees it everything is
"consistent" and "live" already.


Agreed.




-Brian

On Wed, Dec 13, 2017 at 2:42 PM, David Davis <davidda...@redhat.com> wrote:

Thanks for answering my questions. I agree on not using an
“is_” prefix and avoiding “visible.”

Your suggestion of “valid” sounds fine. Maybe some other
options: finished, complete[d], ready.


David

 

Re: [Pulp-dev] Partially constructed data in the DB

2017-12-14 Thread Jeff Ortel



On 12/13/2017 01:54 PM, Brian Bouterse wrote:
Defining the field's behavior a bit more could help us with the name. 
Will it actually be shown to the user in viewsets and filter results?


I think the answer should be no, not until it's fully finished. I 
can't think of a reason why a user would want to see inconsistent 
content during a sync or publish.


Agreed.

There are some downsides when users think things are done when they 
aren't. For instance, the user could mistakenly think the publish is 
done when it's not, trigger package updates, and many machines will 
still receive the old content because it hasn't been switched over to 
auto-publish for the expected distribution.


Also how is this related to when the 'created_resources' field is set 
on a Task? I had imagined core would set that as the last thing it 
does so that when the user sees it everything is "consistent" and 
"live" already.


Agreed.



-Brian

On Wed, Dec 13, 2017 at 2:42 PM, David Davis <davidda...@redhat.com> wrote:


Thanks for answering my questions. I agree on not using an “is_”
prefix and avoiding “visible.”

Your suggestion of “valid” sounds fine. Maybe some other options:
finished, complete[d], ready.


David

On Wed, Dec 13, 2017 at 2:15 PM, Jeff Ortel <jor...@redhat.com> wrote:



On 12/13/2017 12:46 PM, David Davis wrote:

A few questions. First, what is meant by incomplete? I’m
assuming it refers to a version in the process of being
created or one that was not successfully created?


Both.



Also, what’s the motivation behind storing this information?
Is there something in Pulp that needs to know this or is this
so that the user can know?


There may be others but an importer needs to be passed the new
version so it can add/remove content.  It needs to exist in
the DB so that it can add/remove content in separate
transaction(s).



Lastly, I imagine that a task will be associated with the
creation of a version. Does knowing its state not suffice for
determining if a version is visible/valid?


IMHO, absolutely not.  That is not what task records in the
DB are for. Completed task records can be deleted at any time.




David

On Wed, Dec 13, 2017 at 12:16 PM, Jeff Ortel <jor...@redhat.com> wrote:

There has been discussion on IRC about a matter related
to versioned repositories that needs to be broadened.  It
dealt with situations whereby a new repository version
exists in the DB in an incomplete state.  The incomplete
state exists because conceptually a repository version
includes both the version record itself and all of the DB
records that associate content.  For several reasons, the
entire version cannot be constructed in the DB in a
single DB transaction.  The problem of /Incomplete State/
is not unique to repository versions.  It applies to
publications as well.  I would like to discuss and decide
on a standard approach to resolving this throughout the
data model.

The IRC discussion (as related to me) suggested we use a
common approach of having a field in the DB that
indicates this state.  This seems reasonable to me.  As
noted, it's a common approach.  Thoughts?

Assume we did use a field, let's discuss name.  It's my
understanding that a field named /is_visible/ or just
/visible/ was discussed.  I would argue two things.  1)
the is_ prefix is redundant to the fact it's a boolean
field and we have not used this convention anywhere else
in the model.  2) Historically, the term /"visible"/ has
strong ties to user interfaces and is used to mask fields
or records from being displayed to users.  I propose we
use a more appropriate field name.  Perhaps /"valid"/.
Thoughts?











Re: [Pulp-dev] Partially constructed data in the DB

2017-12-13 Thread Jeff Ortel



On 12/13/2017 12:46 PM, David Davis wrote:
A few questions. First, what is meant by incomplete? I’m assuming it 
refers to a version in the process of being created or one that was 
not successfully created?


Both.



Also, what’s the motivation behind storing this information? Is there 
something in Pulp that needs to know this or is this so that the user 
can know?


There may be others but an importer needs to be passed the new version 
so it can add/remove content.  It needs to exist in the DB so that it 
can add/remove content in separate transaction(s).




Lastly, I imagine that a task will be associated with the creation of 
a version. Does knowing its state not suffice for determining if a 
version is visible/valid?


IMHO, absolutely not.  That is not what task records in the DB are 
for.  Completed task records can be deleted at any time.





David

On Wed, Dec 13, 2017 at 12:16 PM, Jeff Ortel <jor...@redhat.com> wrote:


There has been discussion on IRC about a matter related to
versioned repositories that needs to be broadened. It dealt with
situations whereby a new repository version exists in the DB in an
incomplete state.  The incomplete state exists because
conceptually a repository version includes both the version record
itself and all of the DB records that associate content.  For
several reasons, the entire version cannot be constructed in the
DB in a single DB transaction. The problem of /Incomplete State/
is not unique to repository versions.  It applies to publications
as well.  I would like to discuss and decide on a standard
approach to resolving this throughout the data model.

The IRC discussion (as related to me) suggested we use a common
approach of having a field in the DB that indicates this state. 
This seems reasonable to me.  As noted, it's a common approach. 
Thoughts?

Assume we did use a field, let's discuss name.  It's my
understanding that a field named /is_visible/ or just /visible/
was discussed.  I would argue two things.  1) the is_ prefix is
redundant to the fact it's a boolean field and we have not used
this convention anywhere else in the model.  2) Historically, the
term /"visible"/ has strong ties to user interfaces and is used to
mask fields or records from being displayed to users.  I propose
we use a more appropriate field name.  Perhaps /"valid"/. Thoughts?








[Pulp-dev] Partially constructed data in the DB

2017-12-13 Thread Jeff Ortel
There has been discussion on IRC about a matter related to versioned 
repositories that needs to be broadened.  It dealt with situations 
whereby a new repository version exists in the DB in an incomplete 
state.  The incomplete state exists because conceptually a repository 
version includes both the version record itself and all of the DB 
records that associate content.  For several reasons, the entire version 
cannot be constructed in the DB in a single DB transaction.  The problem 
of /Incomplete State/ is not unique to repository versions.  It applies 
to publications as well.  I would like to discuss and decide on a 
standard approach to resolving this throughout the data model.


The IRC discussion (as related to me) suggested we use a common approach 
of having a field in the DB that indicates this state. This seems 
reasonable to me.  As noted, it's a common approach. Thoughts?


Assume we did use a field, let's discuss name.  It's my understanding 
that a field named /is_visible/ or just /visible/ was discussed.  I 
would argue two things.  1) the is_ prefix is redundant to the fact it's 
a boolean field and we have not used this convention anywhere else in 
the model.  2) Historically, the term /"visible"/ has strong ties to 
user interfaces and is used to mask fields or records from being 
displayed to users.  I propose we use a more appropriate field name.  
Perhaps /"valid"/. Thoughts?




Re: [Pulp-dev] Deferring 3 things for Pulp3 to 3.1+

2017-12-13 Thread Jeff Ortel

+1


On 12/12/2017 10:47 AM, Brian Bouterse wrote:
As we get to the end of the MVP planning for Pulp3, I want to check-in 
about deferring 3 areas of Pulp functionality to the 3.1+ page [0]. 
I'm looking for feedback, especially -1s, about deferring the 
following 3 things from the Pulp 3.0 release. This would finalize a 
few still-red or totally missing areas of the MVP [1].


- Consumer Applicability. Pulp3 won't manage consumers, but Pulp is 
still in a good position to offer applicability. Katello uses it 
significantly, but they won't be using the 3.0 release.


- Lazy downloading. I think this should be a top 3.1 priority. It will 
take a significant effort to update/test/release the streamer so I 
don't think we can include it in 3.0 for practical timeline reasons.


- Content Protection. I believe we want both basic auth and key based 
verification of content served by the Pulp content app. This is an 
easy feature to add, but not one I think we should plan fully or do as 
part of the 3.0 MVP.


Please send thoughts or ideas on these changes soon, so we can 
finalize the MVP document in the next few days.


[0]: https://pulp.plan.io/projects/pulp/wiki/31+_Ideas_(post_MVP) 

[1]: 
https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product/


Thank you,
Brian






Re: [Pulp-dev] Proposal and feedback request: un-nest urls

2017-11-30 Thread Jeff Ortel


On 11/29/2017 04:32 PM, Brian Bouterse wrote:
> For deletes, the db relationships are all there, so I expect deletes to 
> cascade to other objects with any url
> structure. I believe closer to the release, we'll have to look at the 
> cascading delete relationships to see if
> the behaviors that we have are correct.
> 
> Overall, I'm +1 on un-nesting. I think it would result in a good user 
> experience. I know it goes against the
> logical composition arguments, which have been well laid out. We want Pulp to 
> be really simple, and the nested
> URL in the top of this thread is anything but simple. Consider another 
> project like Ansible Galaxy (who also
> uses Django and DRF). Their API is very flat and as an outsider I find it 
> very approachable: 
> https://galaxy.ansible.com/api/v1/  Pulp could be that simple.

Clicking through the Galaxy API, there seems to be a good bit of nesting.

> 
> My main concern in keeping the nesting is that this is going to be difficult 
> for plugin writers. Making plugin
> writing easy is a primary goal if not the primary goal of Pulp3. If core devs 
> are spending lots of time on it,
> a person doing this in their free time may not bother.
> 
> I also see practical reasons motivating us to un-nest. We have been adding 
> custom code regularly in this area,
> and it's been highly complex and slow going. I think Austin described it 
> well. Getting the viewsets working
> and to be simpler would allow us to move forward in many areas.
> 
> So overall, un-nesting would give a better user experience (I think), a 
> simpler plugin writer experience, and
> it would unblock a lot of work.
> 
> 
> 
> On Wed, Nov 29, 2017 at 3:29 PM, Bihan Zhang <bizh...@redhat.com> wrote:
> 
> I have a question about repository delete with the un-nested model. 
> When a repository is deleted does the DELETE cascade to the 
> importers/publishers that are linked to the
> repo? In an un-nested world I don't think they would. It would be odd for 
> an object with its own endpoint
> to vanish without the user calling DELETE on the model. 
> 
> When nested it makes sense to cascade the delete so if /repo/1/ is 
> deleted, everything thereafter
> (/repo/1/importer/2) should also be removed.
> 
> Austin, I do see your point about it being a lot more complicated, but I 
> think modeling things the right
> way is worth carrying the extra code and complexity. 
> 
> Anyways, maybe I'm wrong and importer/publishers should exist without a 
> repository, in which case I can
> definitely see the value in un-nesting the URLs.
> 
> 
> On Wed, Nov 29, 2017 at 2:21 PM, Jeff Ortel <jor...@redhat.com> wrote:
> 
> Austin makes a compelling argument.
> 
> 
> On 11/28/2017 02:16 PM, Austin Macdonald wrote:
> > When I look at this, the most important point is that we have a 
> hyperlinked REST API, which means
> that the
> > urls are specifically not going to be built by users.
> >
> > For a user to retrieve an importer, they would first GET the 
> importers for a repository. The next
> call would
> > be the exact href returned by pulp. This workflow is exactly the 
> same whether we nest or not. The only
> > difference is that we no longer convey the information in the href, 
> which seems fine to me since
> they aren't
> > particularly readable anyway.
> >
> > It has already been discussed that filtering can make up for the 
> use cases that use nesting, and
> that filters
> > would be more flexible.
> >
> > So for me, nesting costs in (1) extra code to carry (2) extra 
> dependency (3) complexity to use.
> >
> > To elaborate on the complexity, the problem is in declaring fields 
> on the serializer. The serializer is
> > responsible for building the urls, which requires all of the uuids 
> for the entire nested structure.
> This is
> > further complicated by master/detail, which is an entirely Pulp 
> concept.
> >
> > Because of this, anyone working on the API (likely including plugin 
> writers) will need to understand
> > parent_lookup_kwargs and how to use them with:
> > DetailNestedHyperlinkedRelatedField
> > DetailNestedHyperlinkedidentityField
> > DetailwritableNestedUrlRelatedField
> > DetailRelatedField
> > DetailIdentity

Re: [Pulp-dev] Proposal and feedback request: un-nest urls

2017-11-29 Thread Jeff Ortel
Austin makes a compelling argument.


On 11/28/2017 02:16 PM, Austin Macdonald wrote:
> When I look at this, the most important point is that we have a hyperlinked 
> REST API, which means that the
> urls are specifically not going to be built by users.
> 
> For a user to retrieve an importer, they would first GET the importers for a 
> repository. The next call would
> be the exact href returned by pulp. This workflow is exactly the same whether 
> we nest or not. The only
> difference is that we no longer convey the information in the href, which 
> seems fine to me since they aren't
> particularly readable anyway.
> 
> It has already been discussed that filtering can make up for the use cases 
> that use nesting, and that filters
> would be more flexible.
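
A rough sketch of that filtering alternative, assuming django-filter on an
un-nested viewset (the model and field names here are illustrative):

    from django_filters import rest_framework as filters
    from myapp.models import Importer  # illustrative import

    class ImporterFilter(filters.FilterSet):
        # GET /importers/?repository=<uuid> stands in for the nested
        # GET /repositories/<uuid>/importers/ and composes with other filters.
        repository = filters.UUIDFilter(field_name='repository__pk')

        class Meta:
            model = Importer
            fields = ['repository']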
> 
> So for me, nesting costs in (1) extra code to carry (2) extra dependency (3) 
> complexity to use.
> 
> To elaborate on the complexity, the problem is in declaring fields on the 
> serializer. The serializer is
> responsible for building the urls, which requires all of the uuids for the 
> entire nested structure. This is
> further complicated by master/detail, which is an entirely Pulp concept. 
> 
> Because of this, anyone working on the API (likely including plugin writers) 
> will need to understand
> parent_lookup_kwargs and how to use them with:
> DetailNestedHyperlinkedRelatedField
> DetailNestedHyperlinkedidentityField
> DetailwritableNestedUrlRelatedField
> DetailRelatedField
> DetailIdentityField
> NestedHyperlinkedRelatedField
> HyperlinkedRelatedField.
> 
> The complexity seems inherent, so I doubt we will be able to simplify this 
> much. So, is all this code and
> complexity worth the implied relationship in non-human-friendly urls? As 
> someone who has spent a lot of time
> on this code, I don't think so.
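
To make the cost concrete, this is roughly what the nested href plumbing looks
like with drf-nested-routers -- a sketch with illustrative names, not our
actual serializer:

    from rest_framework import serializers
    from rest_framework_nested.relations import NestedHyperlinkedIdentityField
    from myapp.models import Importer  # illustrative import

    class ImporterSerializer(serializers.ModelSerializer):
        # Building the nested href means the serializer must know the lookup
        # for every ancestor in the URL, not just the importer's own pk.
        _href = NestedHyperlinkedIdentityField(
            view_name='importers-detail',
            parent_lookup_kwargs={'repository_pk': 'repository__pk'},
        )

        class Meta:
            model = Importer
            fields = ('_href', 'name')

Un-nested, the same field collapses to DRF's stock HyperlinkedIdentityField
with a single lookup.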
> 
> 
> 
> On Nov 28, 2017 06:12, "Patrick Creech" <pcre...@redhat.com 
> <mailto:pcre...@redhat.com>> wrote:
> 
> On Mon, 2017-11-27 at 16:10 -0600, Jeff Ortel wrote:
> > On 11/27/2017 12:19 PM, Jeff Ortel wrote:
> > >
> > >
> > > On 11/17/2017 08:55 AM, Patrick Creech wrote:
> > > > One of the things I like to think about in these types of 
> situations is, "what is good rest
> > > > api
> > > > design".  Nesting resources under other resources is a necessary 
> part of good api design, and
> > > > has
> > > > its place.  To borrow some terms from domain driven development:
> > > >
> > > > Collections of objects are called aggregates.  Think 'an order and 
> its line items'.  Line
> > > > items make
> > > > no sense without having the order context, so they are an aggregate 
> that is accessed under an
> > > > Order.  This is called the aggregate root.  The rest api design for 
> such an object, using
> > > > order as
> > > > the aggregate root, would look like:
> > > >
> > > > '/orders/' -- all orders
> > > > '/orders/{order_key}/' -- a specific order with key.
> > > > '/orders/{order_key}/items/' -- All of the order's items.
> > > > '/orders/{order_key}/items/{item_key}/' -- a specific line item of 
> the order
> > > >
> > > > When it comes to order items themselves, it isn't helpful to start 
> with them as their own
> > > > aggregate
> > > > root in one large collection:
> > > >
> > > > '/items/'   -- all order items in the system
> > >
> > > The order/items is a good example of aggregation (or composition) and 
> I agree it makes a strong
> > > case for
> > > nesting.  In pulp, a repository is easily thought of as a collection 
> or aggregation of content.
> > >
> > > >
> > > > Because you lose the order context. Based on api design, this 
> endpoint will need to respond
> > > > with all
> > > > order items across all orders and resort to parameter filtering to 
> provide the context you
> > > > need.
> > > >
> > > > A quote borrowed from Martin Fowler [0]
> > > >
> > > > "An aggregate will have one of its component objects be the 
> aggregate root. Any references
> > > > from
> > > > outside the aggregate should only go to the aggregate root. The 
> root can thus ensure the
> > > > integrity
> > > > of the aggregate as a whole."

Re: [Pulp-dev] Pulp 3: using JWT to request a JWT

2017-11-29 Thread Jeff Ortel
+1

On 11/28/2017 04:34 PM, Dennis Kliban wrote:
> Our MVP doc currently states "As an API user, I can authenticate any API call 
> (except to request a JWT) with a
> JWT. (not certain if this should be the behavior) [in progress]"
> 
> The uncertainty was due to the "except to request a JWT" clause.
> 
> I propose that Pulp 3 should support requesting a new JWT by using an 
> existing JWT. Automated systems that
> integrate with Pulp would benefit from being able to renew tokens using an 
> existing token.
> 
> Enabling this feature with django-rest-framework-jwt also requires selecting 
> the maximum amount of time,
> measured from when the original token was issued, during which the token can 
> be refreshed. The default is 7 days. Pulp users should be able to
> supply this value. They should also be able to specify how long each token is 
> good for.
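
The knobs Dennis describes map onto django-rest-framework-jwt settings roughly
like this (a sketch: the concrete deltas and the URL path are deployment
choices, and 7 days is that library's refresh default):

    import datetime
    from django.conf.urls import url
    from rest_framework_jwt.views import refresh_jwt_token

    # settings.py
    JWT_AUTH = {
        'JWT_ALLOW_REFRESH': True,  # permit JWT-for-JWT renewal
        'JWT_EXPIRATION_DELTA': datetime.timedelta(minutes=30),      # token lifetime
        'JWT_REFRESH_EXPIRATION_DELTA': datetime.timedelta(days=7),  # window since original issue
    }

    # urls.py -- the refresh view accepts an existing, unexpired token
    urlpatterns = [url(r'^jwt/refresh/$', refresh_jwt_token)]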
> 
> 
> What do others think?
> 
> 
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
> 



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] repository versions update

2017-11-29 Thread Jeff Ortel


On 11/29/2017 07:22 AM, David Davis wrote:
> I think we could design an API in 3.0 that would support versioned repos in 
> 3.1+. However, our current API
> does not. For example, the /repositorycontents/ endpoint doesn't make sense 
> with versioned repos as no one
> would want to add/remove content units one-by-one when doing so would 
> generate a new repo version each time.
> Imagine that we end up with an endpoint in 3.0 that’s not compatible with 
> versioned repos. What would we do? I
> think this is a strong argument for adding versioned repos now.

agreed.

> 
> Of course the main drawback is that it might delay the beta. But I wonder by 
> how much. It might be good to
> groom the versioned repo user stories so that (a) we can see how much value 
> they provide to end users and (b)
> how closely they align with the work @mhrivnak has done.

agreed.

> 
> 
> David
> 
> On Tue, Nov 28, 2017 at 4:00 PM, Brian Bouterse wrote:
> 
> In reading back over the last email thread in May, it ended with us 
> looking at URL options to ensure we
> could release 3.0 and add in repo versions in 3.1+. We definitely want 
> repo versions in the 3.y line, so
> we wanted to make sure that was possible. If it wasn't, then we may have 
> to add it into 3.0.
> 
> That question is a lot easier now given how firm the API is. I think we 
> can add in versioned repos in
> 3.1+, in a natural way. Just like a user creates a Publication which 
> triggers a publish, a user would
> create a RepoVersion which would trigger a sync to produce that new 
> RepoVersion. The repo versions work
> needs to continue, but first I hope we prioritize getting to Beta 1 for 
> core. There are a lot of use cases
> in black on the MVP which are not implemented or written in Redmine. I 
> believe closing that gap would be a
> better use of time given that we can add this later.
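
To illustrate -- the endpoint shape here is hypothetical, not a settled API --
the client workflow could be as small as:

    import requests

    # Hypothetical: creating a version is what triggers the sync that
    # produces it, mirroring how creating a Publication triggers a publish.
    response = requests.post(
        'https://pulp.example.com/api/v3/repositories/<repo-uuid>/versions/',
        auth=('admin', 'admin'),
    )
    task = response.json()  # hypothetical: poll the spawned task to completion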
> 
> What do others think?
> 
> 
> On Tue, Nov 28, 2017 at 2:24 PM, Dennis Kliban wrote:
> 
> I have a hard objection to including versioned repositories in 3.0. 
> We agreed to make sure that our
> current design would not prevent us from adding versioned 
> repositories in the future. We did NOT agree
> to including versioned repositories in 3.0 release. This is a big 
> code change that did not go through
> our regular planning process. I greatly appreciate your effort in 
> driving this feature forward, but we
> should take a step back and go through our regular process. I am also 
> concerned that adding such a big
> change at this time will delay the beta.
> 
> -Dennis
> 
> 
> On Tue, Nov 28, 2017 at 10:10 AM, Michael Hrivnak wrote:
> 
> Following up on previous discussions, I did an analysis of how 
> repository versioning would impact
> Pulp 3's current REST API and plugin API. A lot has changed since 
> we last discussed the topic (in
> May 2017), such as how we handle publications, and how the REST 
> API is laid out. You can read the
> analysis here:
> 
> https://pulp.plan.io/projects/pulp/wiki/Repository_Versions
> 
> 
> We previously discussed and vetted the mechanics at great length. 
> While there was broad agreement
> on the value to Pulp 3, there was uncertainty about the details 
> of how it would impact REST
> clients and plugin writers, and also uncertainty about how long 
> it would take to fully implement.
> 
> In the course of my recent analysis, two things became clear. 1) 
> neither of the current APIs is
> compatible as-is; both would have to change. Details are on the wiki page 
> above. 2) the PoC from earlier
> this year indeed covers the hard parts, leaving mostly DRF 
> details to sort out.
> 
> 
> I don't agree with your assessment that the current REST API is not 
> compatible with adding repository
> versions. A repository version is its own resource that can be added
>  
> 
> 
> I started rebasing the PoC onto current 3.0-dev, and within an 
> hour I had it working with the
> updated REST endpoints. With that having been so easy, I threw 
> caution to the wind, and within a
> few hours I had a fully functional branch that covered all the 
> key use cases.
> 
> - sync creates a new version
> - versions and their content sets are visible through the REST API
> - each version shows what content was added and removed
> - versions can be deleted, which queues a task that squashes 
> changes as previously discussed
> - the ChangeSet and pulp_file were updated 

Re: [Pulp-dev] Proposal and feedback request: un-nest urls

2017-11-27 Thread Jeff Ortel


On 11/17/2017 08:55 AM, Patrick Creech wrote:
> One of the things I like to think about in these types of situations is, 
> "what is good rest api
> design".  Nesting resources under other resources is a necessary part of good 
> api design, and has
> its place.  To borrow some terms from domain driven development:
> 
> Collections of objects are called aggregates.  Think 'an order and its line 
> items'.  Line items make
> no sense without having the order context, so they are an aggregate that is 
> accessed under an
> Order.  This is called the aggregate root.  The rest api design for such an 
> object, using order as
> the aggregate root, would look like:
> 
> '/orders/' -- all orders
> '/orders/{order_key}/' -- a specific order with key.
> '/orders/{order_key}/items/' -- All of the order's items.
> '/orders/{order_key}/items/{item_key}/' -- a specific line item of the order
> 
> When it comes to order items themselves, it isn't helpful to start with them 
> as their own aggregate
> root in one large collection:
> 
> '/items/'   -- all order items in the system

The order/items is a good example of aggregation (or composition) and I agree 
it makes a strong case for
nesting.  In pulp, a repository is easily thought of as a collection or 
aggregation of content.
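
For reference, the order/items layout maps onto drf-nested-routers roughly like
this (a sketch; the viewset names are placeholders):

    from rest_framework_nested import routers

    router = routers.SimpleRouter()
    router.register(r'orders', OrderViewSet)  # /orders/ and /orders/{pk}/

    # Each nested level needs its own router keyed on the parent lookup,
    # yielding /orders/{order_pk}/items/ and /orders/{order_pk}/items/{pk}/.
    items_router = routers.NestedSimpleRouter(router, r'orders', lookup='order')
    items_router.register(r'items', OrderItemViewSet, base_name='order-items')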

> 
> Because you lose the order context. Based on api design, this endpoint will 
> need to respond with all
> order items across all orders and resort to parameter filtering to provide 
> the context you need.
> 
> A quote borrowed from Martin Fowler [0]
> 
> "An aggregate will have one of its component objects be the aggregate root. 
> Any references from
> outside the aggregate should only go to the aggregate root. The root can thus 
> ensure the integrity
> of the aggregate as a whole."
> 
> Publishers, importers, and publications are all aggregates that don't make 
> much sense outside of
> their aggregate root of Repository.  They are dependent on the Repository 
> context, and from a domain
> view, should be accessed starting with their specific Repository endpoint.

I don't think the aggregation relationship exists between repository and 
importer/publisher.  There is a
strong association between repository and importer/publisher which /could/ even 
be characterized as
"ownership".  However, I don't think there is an aggregation (or composition) 
relationship.  The same for
publisher & publication.  A publication is associated to its creating publisher 
but the publisher isn't an
aggregation of publications.  The relationship mainly provides linkage to the 
repository.

> 
> --
> Specific items rebuttals:
> 
> Yes, using the primary key uuids as the immutable key adds some 
> human-readability challenges to
> the API.  That sounds more like a point to discuss in the human readable vs. 
> not human readable
> immutable key debate.

Agreed.

Also, I don't think nesting impacts URL readability.

> 
> One of the challenges in software engineering is ensuring the tools you 
> are using don't limit
> your choices.  DRF limited the choices for pulp's rest API design, and 
> drf-nested-routers was
> introduced to help remove that limit.  If working around these limitations is 
> complex, take
> advantage of open source here and help improve the upstream dependencies for 
> your workflow.
> 
> As far as making things simpler for plugin writers, perhaps there are 
> ways you can simplify it
> for them by providing some encapsulation in pulp's core instead.  Abstract 
> away the nasty bits
> behind the scenes, and provide them with a simpler interface to do what they 
> need.
> 
> With respect to the invested time already in making this work, I agree 
> with jeremy that it
> should be considered part of the sunk cost fallacy.  What does need to be 
> evaluated though is how
> much time re-architecting at this point will cost you (discussion, planning, 
> and development) vs the
> amount of time it will save, and weigh that against any planned milestones 
> for pulp to see if it
> will push them out as well.
> 
> I'm also in agreement that it is moot if pulp3 has a different api 
> structure than pulp2.  Major
> version boundaries are the perfect time for evaluating and moving such things 
> around.
> 
> [0] https://martinfowler.com/bliki/DDD_Aggregate.html
> 
> 
> 
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
> 



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Task tagging in Pulp 3

2017-11-14 Thread Jeff Ortel
On 11/14/2017 01:24 PM, Brian Bouterse wrote:
> Thanks for all the discussion! I agree there are improvements to be made here.
> 
> I don't think either of these proposals solves all the problems without 
> creating a few new ones. Rather than
> saying +1 to either, I want to talk about the goals and use cases a bit more. 
> Here is a list of 3 related use
> cases that Pulp currently *can't* do, along with some commentary on the state 
> of why. Once we decide on what
> we think users need, figuring out how to make it should be straightforward.
> 
> 1) As a user, I can know if a task is currently locking a worker or not. Say 
> they need to take a worker offline.
> They have no way of knowing if that is safe to do at this moment without this 
> info. Pulp internally knows this
> information, but I don't believe this is visible to the user currently. This 
> is useful info for debugging that
> we regularly have to pull from the db. Regardless of either proposal, we 
> still need to decide if this will be
> included in the Task viewset. I'm +1 to adding this use case to the MVP 
> explicitly. Do people feel this use
> case should be added to the MVP?

+1

> 
> 2) As a user, I can filter for tasks by the resource locked, e.g. repo 'foo', 
> without forming a special search
> string to search by. Currently the 'resource' field in the TaskTag model 
> stores a string like
> 'repository:foo'. Even if you know the name 'foo' you need to search via 
> substring (inefficient and maybe
> dangerous). You also can't search by other properties like the UUID, feed, 
> etc., because to Pulp it's just a
> string 'repository:foo'. It doesn't know that is actually repository 
> x---zzz with a name='foo'.

I would like to see this use case expanded (into several cases) to include a 
description of why a user wants
to do this.  What are they trying to accomplish?  Like: "As a user, I want to 
search for tasks pending for a
repository because I'm trying to understand why my task isn't running yet."

This applies to #3 as well.
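
One sketch of how the lock could become queryable without substring matching,
assuming a generic foreign key in place of the 'repository:foo' string
(illustrative, not the current model):

    from django.contrib.contenttypes.fields import GenericForeignKey
    from django.contrib.contenttypes.models import ContentType
    from django.db import models

    class ReservedResource(models.Model):
        task = models.ForeignKey('Task', on_delete=models.CASCADE)
        # A generic FK keeps the lock pointing at a real row, so tasks can be
        # filtered by a repository's pk or name instead of parsing strings.
        content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
        object_id = models.UUIDField()
        resource = GenericForeignKey('content_type', 'object_id')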

> 
> 3) As a user, I can filter for tasks by operation type (e.g. sync). Currently 
> we have no way to do this. The
> data model doesn't even have a field to capture this information. This 
> feature seems simple from a high level,
> but determining the specific taxonomy of those operation types can get 
> messy. We have 'sync' and 'publish',
> those are pretty clear. What about 'update' like a publisher/importer/repo 
> attribute update? How about 'add'
> and 'remove' content? What if both add and remove happen in the same 
> operation? Is that two tags or a new one?
> If we're going to talk about this feature we need to call out the use cases 
> more specifically. A series of use
> cases like: "As a user, I can filter for tasks labeled with the 'sync' 
> operation" could work.
> 
> Another way to accomplish use case (3) is to record the actual task name as a 
> string, e.g.
> 'pulpcore.app.tasks.importer.sync'. This won't work well either because we 
> DRY up our tasks, especially
> update, so I think the simple taxonomy is still the way forward for that 
> feature. https://pulp.plan.io/issues/3038

Agreed.  Exposing and relying on an implementation detail like 
"pulpcore.app.tasks.importer.sync" would be bad.

> 
> Each proposal affects these use cases, but neither of them totally enables 
> all of them. Aside from solving
> them, what do others think about these use cases and the current state of 
> Pulp3 w.r.t. them? Thanks for all the
> discussion.
> 
> -Brian
> 
> 
> On Thu, Nov 9, 2017 at 3:43 PM, Dennis Kliban wrote:
> 
> On Mon, Nov 6, 2017 at 2:17 PM, David Davis wrote:
> 
> Originally I scheduled a meeting for tomorrow but on second thought, 
> I figured a pulp-dev thread would
> be more inclusive than a meeting. I hope to get this resolved by the 
> end of this week and if not then
> maybe we can have a meeting. 
> 
> This is to design out the replacement of task tags in Pulp 3. I’ve 
> got feedback from a few other
> developers in terms of how to do that so I wrote up a sort of outline 
> of the problem and two possible
> proposals. Looking for feedback/questions/etc on what people prefer.
> 
> 
> Background
> ---
> 
> In Pulp 2, tasks have tags that either provide a task 
> name/description or info on what resources a
> task acts on. Tasks also have reserved resources, which provide a way 
> for tasks to lock a particular
> resource.
> 
> In Pulp 3, we have models TaskTag and ReservedResource[0]. Tasks are 
> associated with the resources
> they work on via TaskTag. If a resource is locked, a ReservedResource 
> record is created in the db and
> then removed from the db once the resource is unlocked.
> 
> 
> Problem
> ---
> 
> 
