[Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-17 Thread Daniel Alley
Some content types are not going to be compatible with the normal
sync/publish/distribute Pulp workflows, and will need to be live API-only.
To what degree should Pulp accommodate these use cases?

Example:

Pulp makes the assumptions that

A) the metadata for a repository can be generated in its entirety by the
known set of content in a RepositoryVersion, and

B) the client wouldn't care if you point it at an older version of the same
repository.

Cargo, the package manager for the Rust programming language, expects the
registry url to be a git repository.  When a user does a "cargo update",
cargo essentially does a "git pull" to update a local copy of the registry.

Both of those assumptions are false in this case. You cannot generate the
git history just from the set of content, and you cannot "roll back" the
state of the repository without either breaking it for clients, or adding
new commits on top.
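
To make assumption A concrete, here is a minimal sketch (not Pulp code; the
function names and the package names are made up for illustration).  It
treats repository metadata as a pure function of the content set, and
contrasts that with an ordered history log, which is closer to what a git
registry exposes.  Two different sequences of operations end with the same
content and therefore the same metadata, but the logs are not the same.

```python
def generate_metadata(content_set):
    """Metadata as a pure function of the content set (assumption A)."""
    return "\n".join(sorted(content_set))


def apply_operations(operations):
    """Replay add/remove operations and return (content_set, history_log)."""
    content = set()
    log = []
    for action, unit in operations:
        if action == "add":
            content.add(unit)
        elif action == "remove":
            content.discard(unit)
        else:
            raise ValueError(f"unknown action: {action}")
        log.append(f"{action} {unit}")
    return frozenset(content), log


# Two different sequences of operations that end with the same content.
path_one = [("add", "foo-1.0"), ("add", "bar-2.0")]
path_two = [
    ("add", "bar-2.0"),
    ("add", "foo-0.9"),
    ("remove", "foo-0.9"),
    ("add", "foo-1.0"),
]

set_one, log_one = apply_operations(path_one)
set_two, log_two = apply_operations(path_two)

# Same content set, hence identical generated metadata ...
same_metadata = generate_metadata(set_one) == generate_metadata(set_two)
# ... but the ordered histories differ, and a git-backed registry exposes
# something much closer to the log than to the set.
same_history = log_one == log_two
```

A set-based RepositoryVersion can only ever reproduce the first kind of
output, which is why the git history cannot be regenerated from it.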

A theoretical Pulp plugin that worked with Cargo would need to ignore
almost all of the existing Pulp primitives and very little (if any) of the
normal Pulp workflow could be used.

Should Pulp attempt to cater to plugins like these?  What could Pulp do to
provide a benefit for such plugins over writing something from scratch?
To what extent would such plugins be able to integrate with the rest of
Pulp, if at all?

We don't have to commit to anything pre-GA, but it is a good thing to keep
in mind.  I'm sure there are other content types out there (not just Cargo)
which would face similar problems.  Someone inquired about pulp_git a few
months ago, and it seems like it would share a few of them.
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-24 Thread Jeff Ortel



On 05/17/2018 07:46 AM, Daniel Alley wrote:
> Some content types are not going to be compatible with the normal
> sync/publish/distribute Pulp workflows, and will need to be live
> API-only.  To what degree should Pulp accommodate these use cases?
>
> Example:
>
> Pulp makes the assumptions that
>
> A) the metadata for a repository can be generated in its entirety by
> the known set of content in a RepositoryVersion, and
>
> B) the client wouldn't care if you point it at an older version of the
> same repository.
>
> Cargo, the package manager for the Rust programming language, expects
> the registry url to be a git repository. When a user does a "cargo
> update", cargo essentially does a "git pull" to update a local copy of
> the registry.
>
> Both of those assumptions are false in this case. You cannot generate
> the git history just from the set of content, and you cannot "roll
> back" the state of the repository without either breaking it for
> clients, or adding new commits on top.
>
> A theoretical Pulp plugin that worked with Cargo would need to ignore
> almost all of the existing Pulp primitives and very little (if any) of
> the normal Pulp workflow could be used.
>
> Should Pulp attempt to cater to plugins like these?  What could Pulp
> do to provide a benefit for such plugins over writing something from
> scratch from the ground up?  To what extent would such plugins be able
> to integrate with the rest of Pulp, if at all?

I think OSTree and Ansible plugins will be in the same boat as Cargo.
In the case of OSTree, libostree does the heavy lifting for sync and
publishing, and I suspect the same is true for Git-based repositories.
We should consider how best to support distributing (serving) content in
core for these content types.  I suspect this will mainly entail
something in the content app and perhaps a new component of a
Publication, like a PublishedDirectory, that references an OSTree/Git
repository created in /var/lib/pulp/published.  This may benefit Maven
as well.

> We don't have to commit to anything pre-GA but it is a good thing to
> keep in mind.  I'm sure there are other content types out there (not
> just Cargo) which would face similar problems. pulp_git was inquired
> about a few months ago, it seems like it would share a few of them.





Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-25 Thread Brian Bouterse
I think Pulp does have enough of a value proposition over a script-based
alternative to make it worthwhile for all of those types of plugins. Here
are a few points I think about:

* Scalability. A common story users tell is that scripts work well up to
a point. Once it has to work for an entire organization, or content comes
from many places, or more than a few people are involved in maintaining
the content, it becomes unmaintainable.

* Stacks of content. Often a group of content goes together, but each piece
of content is updated separately. For instance, with Ansible roles you may
use many of them together to deploy something, but each role may receive
changes separately. I think of all this content together as a "stack".
Keeping everything up to date can be challenging. Managing that change with
scripts can be hard and fragile. Also, the ability to roll back quickly and
confidently is something Pulp can offer.

* Organizing content is easier. Having an API that you can use to organize
content is easier than doing lots and lots of git yourself or with scripts.

* Tasking. Long-running tasks (and a lot of them) can be unwieldy, and Pulp
keeps them organized and running well.

* Static and vulnerability analysis. We're seeing interest in using
analysis projects like Clair (https://github.com/arminc/clair-scanner) to
scan content in Pulp. By bringing all the content into one place, and with
that place having a tasking system, plugin writers can control how their
content is analyzed continuously.

Also +1 to jortel's idea. I think that's a great idea and exactly what we
need.




Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-25 Thread Daniel Alley
@Brian

I agree with a lot of those points, but I would say that we're not just
competing against hodgepodge collections of "scripts", but also against
writing small microservice-y Flask apps that only implement the API for one
content type.

Also, rollback is not something Pulp would necessarily be able to offer
with respect to history-sensitive content and metadata, like git
repositories, or the Cargo example I provided.  It's still something the
plugin writer would have to implement themselves in this case.

@Jeff

> perhaps a new component of a Publication like PublishedDirectory that
> references an OSTree/Git repository created in /var/lib/pulp/published.

I like the idea generally, but I don't think it would be able to be a
component of a Publication.  I think it would need to be an alternative to
a Publication which fulfills a similar function.

The fundamental problem is this scenario:

   1. You upload a git repository with a git repository plugin
   2. You publish and distribute version 1 of the git repository
   3. You publish and distribute version 2 of the git repository
   4. A client downloads the git repository
   5. You notice a problem and decide to roll back to version 1.  A
   publication of version 1 already exists, which you distribute.
   6. Clients have a broken git history.  New clients can download the old
   version but anyone who has already downloaded version 2 will not be able to
   roll back to version 1 by pulling from Pulp

We need to prevent step 5 from happening.
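
The scenario above can be sketched with a toy model of commit ancestry.
This is a simplified illustration and not how git or Pulp is implemented:
the Commit class and the pull() helper are hypothetical, and pull() only
models a fast-forward-only update.  It shows that a client already on
version 2 gets "already up to date" when the server goes back to serving
version 1, whereas a revert commit on top of version 2 fast-forwards
cleanly.

```python
class Commit:
    """A commit that only knows its name and its parent (toy model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def is_ancestor(older, newer):
    """Return True if `older` is reachable from `newer` by following parents."""
    node = newer
    while node is not None:
        if node is older:
            return True
        node = node.parent
    return False


def pull(local_head, remote_head):
    """Simplified fast-forward-only pull; returns (new_local_head, outcome)."""
    if is_ancestor(remote_head, local_head):
        # The remote has nothing new: the client keeps what it already has.
        return local_head, "already up to date"
    if is_ancestor(local_head, remote_head):
        return remote_head, "fast-forward"
    return local_head, "diverged"


v1 = Commit("v1")
v2 = Commit("v2", parent=v1)

# Steps 3-4: version 2 is distributed and a client downloads it.
client_head = v2

# Step 5: the admin distributes the existing publication of version 1.
served_head = v1
client_head, outcome = pull(client_head, served_head)
# The client still sits on v2, so the rollback never reaches it.

# Alternative: a revert commit on top of v2 that restores v1's content.
v3 = Commit("v3 (revert v2)", parent=v2)
reverted_head, revert_outcome = pull(v2, v3)
```

In this model the only way to move existing clients is to give them
something newer than what they have, which is the "adding new commits on
top" case.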

There are a few possible solutions to this problem:

   1. As a Pulp admin, you ignore Pulp's rollback functionality.  Instead of
   using Pulp to roll back, you manually revert the commits using git, and
   upload a new version of the repository to Pulp as "version 3".  You then
   distribute version 3 instead of version 1.  You understand that if you were
   to publish an old version using Pulp, it would misbehave for clients that
   tried to pull / update instead of cloning.

   2. As a Pulp admin / plugin writer / user, you know that the client for
   the content type will never try to pull or update, only clone.  Therefore
   it is not a problem for you and can be ignored.

   3. As a plugin writer, whenever you publish a new version of the git
   repository, you delete or invalidate every publication for previous
   versions for the distribution base path.  If a Pulp admin wants to roll
   back, they need to create a new Publication.  The plugin knows to apply
   revert commits on top of the repository to keep history linear.
      - But really we've just pushed the problem forwards.  What happens when
      you want to upload future versions?  Now the history of the git
      repository in Pulp is different from the Pulp admin's git repo history.
      - This is only acceptable for content types where the history is
      immaterial to the content itself. Probably viable for Cargo, but probably
      not for a Git content type.

   4. As a plugin writer, you ignore publications entirely.  You don't make
   it possible to do the wrong thing. You have something along the lines of a
   "PluginManagedDirectory" which core does not try to mess with.  If you want
   to implement rollback functionality, you do it through your own API where
   the side effects are more easily controlled and reasoned about.

I have doubts about whether Option 3 is viable - it seems like making it
work reliably would be difficult.
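
For reference, the bookkeeping half of the "invalidate every previous
publication" approach could look roughly like the sketch below.  The
Publication and Distribution classes here are hypothetical stand-ins
written for this example, not Pulp's actual models, and the sketch leaves
out the hard part (applying the revert commits to the git repository).

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical record of one published repository version."""

    repository_version: int
    valid: bool = True


@dataclass
class Distribution:
    """Hypothetical distribution that tracks publications for a base path."""

    base_path: str
    publications: list = field(default_factory=list)

    def publish(self, repository_version):
        """Publish a version and invalidate every earlier publication."""
        for publication in self.publications:
            publication.valid = False
        new_publication = Publication(repository_version)
        self.publications.append(new_publication)
        return new_publication

    def servable(self):
        """Return the publications that may still be distributed."""
        return [p for p in self.publications if p.valid]


distribution = Distribution(base_path="git/my-repo")
distribution.publish(1)
distribution.publish(2)
# Rolling back means publishing again (version 3 carries revert commits),
# not re-serving the old publication of version 1.
distribution.publish(3)
servable_versions = [p.repository_version for p in distribution.servable()]
```

Because only the newest publication stays servable, the step 5 mistake
cannot be made through this interface, at the cost of losing instant
rollback to an existing publication.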


Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-28 Thread Milan Kovacik
On Sat, May 26, 2018 at 2:23 AM, Daniel Alley  wrote:
> @Brian
>
> I agree with a lot of those points, but I would say that we're not just
> competing against hodgepodge collections of "scripts", but also against
> writing small microservice-y Flask apps that only implement the API for one
> content type.
>
> Also, rollback is not something Pulp would necessarily be able to offer with
> respect to history-sensitive content and metadata, like git repositories, or
> the Cargo example I provided.  It's still something the plugin writer would
> have to implement themselves in this case.
>
> @Jeff
>
>> perhaps a new component of a Publication like PublishedDirectory that
>> references an OSTree/Git repository created in /var/lib/pulp/published.
>
>
> I like the idea generally, but I don't think it would be able to be a
> component of a Publication.  I think it would need to be an alternative to a
> Publication which fulfills a similar function.
>
> The fundamental problem is this scenario:
>
> 1. You upload a git repository with a git repository plugin
> 2. You publish and distribute version 1 of the git repository
> 3. You publish and distribute version 2 of the git repository
> 4. A client downloads the git repository
> 5. You notice a problem and decide to roll back to version 1.  A publication
>    of version 1 already exists, which you distribute.
> 6. Clients have a broken git history.  New clients can download the old
>    version but anyone who has already downloaded version 2 will not be able
>    to roll back to version 1 by pulling from Pulp

Just trying to understand the situation:
Is that because the rollback actually creates version #3, which is
"newer" but lacks the rolled-back commits?
So there is a "merge" conflict if folks who cloned #2 want to
pull from version #3, but their branch contains a commit the origin
now lacks?
Or rather is it that the published bits of version #2 don't exist
anymore at all?

>
> We need to prevent step 5 from happening.
>
> There are a couple of possible solutions to this problem:
>
> As a Pulp admin, you ignore Pulp's rollback functionality.  Instead of using
> Pulp to roll back, you manually revert the commits using git, and upload a
> new version of the repository to Pulp as "version 3".  You then distribute
> version 3 instead of version 1.  You understand that if you were to publish
> an old version using Pulp, it would misbehave for clients that tried to
> pull / update instead of cloning.

In my opinion folks needing Pulp to track a git(-like) repo are
probably interested in more workflows than just the clone.

>
> As a Pulp admin / plugin writer / user, you know that the client for the
> content type will never try to pull or update, only clone.  Therefore it is
> not a problem for you and can be ignored.

Cloning might be equivalent to just snapshotting the tree at a
particular commit and publishing a plain tar.gz without the git
structures.
Limiting but clean?

>
> As a Plugin writer, whenever you publish a new version of the git
> repository, you delete or invalidate every publication for previous versions
> for the distribution base path.  If a Pulp admin wants to roll back, they
> need to create a new Publication.  The Plugin knows to apply revert commits
> on top of the repository to keep history linear.
>
> But really we've just pushed the problem forwards.  What happens when you
> want to upload future versions?  Now history of the git repository in Pulp
> is different from the Pulp admin's git repo history
> This is only acceptable for content types where the history is immaterial to
> the content itself. Probably viable for Cargo, but probably not a Git
> content type.
>

Does it mean a publication directory git tree is built anew every time
a rollback happens?
So Pulp history and the original project history are meant to be different?
Can there ever be conflicts?


> As a Plugin writer, you ignore publications entirely.  You don't make it
> possible to do the wrong thing. You have something along the lines of a
> "PluginManagedDirectory" which core does not try to mess with.  If you want
> to implement rollback functionality, you do it through your own API where
> the side effects are more easily controlled and reasoned about.

+1 seems like the cleanest way to me

>
> I have doubts about whether Option 3 is viable - it seems like making it
> work reliably would be difficult.

I'd say options #1 and #3 are the same, with #3 adding the complexity of
automating the rollback in Pulp.
Options #2 and #4 are the same in the sense that Pulp stays away from
the incompatible workflow a content type has, while providing a limited
functionality subset to the consumer. In addition, #4 allows the Pulp
service/host to provide both the Pulp-specific, limited functionality
and the incompatible, content-type-specific workflows from a
"single" point. This might be a benefit to some folks.


Option #5: somehow make core Pulp (content versioning) compatible with
the Git model ;)


Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-28 Thread Daniel Alley
>
> Is that because of the rollback actually creates version #3 that's
> "newer" but lacks the rolled-back commits?
> So there are some "merge" conflict if folks, that cloned #2, want to
> pull from version #3 but their branch contains a commit the origin
> lacks now?
> Or rather that the published bits of the version #2 doesn't exist
> anymore at all?


The first one.  It would be like if someone force-pushed to the git
repository, removing the last couple of commits of history.  It's basically
the same problem.

> Does it mean a publication directory git tree is built anew every time
> a rollback happens?

What it would have to do is take the existing git tree and apply new
commits on top to return the contents of the repository to the state you
want to roll it back to.

> So Pulp history and the original project history are meant to be different?
> Can there be ever conflicts?

It's not that they're *meant* to be different, but I think it is an
unavoidable problem if you want to do rollbacks in Pulp.

The source git repository for the project, whether it's on github or the
admin's machine, is separate from Pulp's copy. The second you add a commit
to one and not the other (by doing rollback w/ linear git history from the
client's perspective), the histories will diverge.  It's unavoidable,
that's just how git works.  You can keep the content of the files in the
repo identical but the history will never be equivalent again.

Basically, you cannot have all four of these at the same time:

* Pulp is not the "master" git repository, e.g. the admin is syncing /
uploading it from somewhere else
* git history stays linear
* rollbacks can be done in Pulp
* git history stays identical between Pulp and the git repository being
synced / uploaded into Pulp

One of them has to give.
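
The divergence can be illustrated with a small sketch.  It assumes only
that a commit id is derived from the tree content plus the parent id; real
git commit ids also cover author, timestamps and the message, so the
commit_id() helper below is a deliberate simplification, and the
"content-A" style strings are placeholders.

```python
import hashlib


def commit_id(tree, parent_id):
    """Derive an id from the tree content and the parent id (simplified)."""
    payload = f"tree:{tree}\nparent:{parent_id}".encode()
    return hashlib.sha1(payload).hexdigest()


# Upstream history: A -> B
upstream_a = commit_id("content-A", parent_id="")
upstream_b = commit_id("content-B", parent_id=upstream_a)

# Pulp's copy starts out identical ...
pulp_a = commit_id("content-A", parent_id="")
pulp_b = commit_id("content-B", parent_id=pulp_a)
# ... then rolls back to A's content by committing on top of B.
pulp_rollback = commit_id("content-A", parent_id=pulp_b)

# Upstream moves on to C.  Replaying the same content on Pulp's side gives
# a different id, because the parent chain is no longer the same.
upstream_c = commit_id("content-C", parent_id=upstream_b)
pulp_c = commit_id("content-C", parent_id=pulp_rollback)
```

The rollback commit has the same tree as A but a different id, and every
later commit inherits that difference, which is why the two histories can
never be made equivalent again.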



Re: [Pulp-dev] Content types which are not compatible with the normal pulp workflow

2018-05-28 Thread Milan Kovacik
Thanks for the explanation!

On Mon, May 28, 2018 at 4:17 PM, Daniel Alley  wrote:
>> Is that because of the rollback actually creates version #3 that's
>> "newer" but lacks the rolled-back commits?
>> So there are some "merge" conflict if folks, that cloned #2, want to
>> pull from version #3 but their branch contains a commit the origin
>> lacks now?
>> Or rather that the published bits of the version #2 doesn't exist
>> anymore at all?
>
>
> The first one.  It would be like if someone force-pushed to the git
> repository, removing the last couple of commits of history.  It's basically
> the same problem.
>
>> Does it mean a publication directory git tree is built anew every time
>> a rollback happens?
>
>
> What it would have to do is take the existing git tree and apply new commits
> on top to return the contents of the repository to the state you want to
> roll it back to.
>
>> So Pulp history and the original project history are meant to be
>> different?
>> Can there be ever conflicts?
>
>
> It's not that they're meant to be different, but I think it is an
> unavoidable problem if you want to do rollbacks in Pulp.
>
> The source git repository for the project, whether it's on github or the
> admin's machine, is separate from Pulp's copy. The second you add a commit
> to one and not the other (by doing rollback w/ linear git history from the
> client's perspective), the histories will diverge.  It's unavoidable, that's
> just how git works.  You can keep the content of the files in the repo
> identical but the history will never be equivalent again.

...impairing the usability of Pulp as the "master" repository

>
> Basically, it is mutually exclusive to have:
>
> * Pulp not be the "master" git repository e.g. the admin is syncing /
> uploading it from somewhere else
> * maintain linear git history
> * be able to do rollbacks in Pulp
> * keep identical git history between Pulp and the git repository being
> synced / uploaded into Pulp
>
> One of them has to give.

+1

I believe any content type/plug-in with its own idea of content
versioning will have the same conflict.
Wrapping/translating from a content-specific versioning scheme to the Pulp
versioning scheme sounds like a headache, even if Pulp supports a
non-linear history one day.

Let's forget about it and give the plug-in the ability to opt out of
the core versioning scheme instead?

Cheers,
milan
