Re: [Pulp-dev] Lazy for Pulp3

2018-06-13 Thread Brian Bouterse
@ipanova, +1 to your names, I updated the epic.

FYI, I updated the epic in several ways to allow for the "cache_only"
option in the design.

I added a new task to add "policy" to ContentUnit as well, so the streamer can
know what to do: https://pulp.plan.io/issues/3763

Other updates to allow for "cache_only":
https://pulp.plan.io/issues/3695#note-2
https://pulp.plan.io/issues/3699#note-3
https://pulp.plan.io/issues/3693
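
As a rough sketch of that ContentUnit change (nothing settled; field and
choice names here are illustrative, not the final design):

    # Illustrative sketch of https://pulp.plan.io/issues/3763
    from django.db import models

    POLICY_CHOICES = (
        ('immediate', 'download now, while the sync task runs'),
        ('on_demand', 'download on first request and save the Artifact'),
        ('cache_only', 'stream through squid, never save the Artifact'),
    )

    class ContentUnit(models.Model):
        # The streamer consults this to decide whether to save the bits
        # it fetched (on_demand) or only stream them through (cache_only).
        policy = models.CharField(
            max_length=16, choices=POLICY_CHOICES, default='immediate')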



On Thu, Jun 7, 2018 at 5:10 AM, Ina Panova  wrote:

> we could try to go with:
>
> policy=immediate  -> downloads now while the task runs (no lazy). Also the
> default if unspecified.
> policy=on_demand   -> All the steps in the diagram. Content that is
> downloaded is saved so that it's only ever downloaded once.
> policy=cache_only -> All the steps in the diagram except step 14. If
> squid pushes the bits out of the cache, it will be re-downloaded again to
> serve to other clients requesting the same bits.
>
>
>
> 
> Regards,
>
> Ina Panova
> Software Engineer| Pulp| Red Hat Inc.
>
> "Do not go where the path may lead,
>  go instead where there is no path and leave a trail."
>
> On Fri, Jun 1, 2018 at 12:36 AM, Jeff Ortel  wrote:
>
>>
>>
>> On 05/31/2018 04:39 PM, Brian Bouterse wrote:
>>
>> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
>> language.
>>
>> policy=immediate  -> downloads now while the task runs (no lazy). Also
>> the default if unspecified.
>> policy=cache-and-save   -> All the steps in the diagram. Content that is
>> downloaded is saved so that it's only ever downloaded once.
>> policy=cache -> All the steps in the diagram except step 14. If squid
>> pushes the bits out of the cache, it will be re-downloaded again to serve
>> to other clients requesting the same bits.
>>
>>
>> These policy names strike me as an odd, non-intuitive mixture. I think we
>> need to brainstorm on policy names and/or additional attributes to best
>> capture this.  Suggest the epic be updated to describe the "modes" or use
>> cases without the names for now.  I'll try to follow up with other
>> suggestions.
>>
>>
>>
>> Also @milan, see inline for answers to your question.
>>
>> On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik 
>> wrote:
>>
>>> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse 
>>> wrote:
>>> >
>>> >
>>> > On Wed, May 30, 2018 at 8:57 AM, Tom McKay 
>>> wrote:
>>> >>
>>> >> I think there is a use case for "proxy only" like is being described
>>> here.
>>> >> Several years ago there was a project called thumbslug[1] that was
>>> used in a
>>> >> version of katello instead of pulp. Its job was to check
>>> entitlements and
>>> >> then proxy content from a cdn. The same functionality could be
>>> implemented
>>> >> in pulp. (Perhaps it's even as simple as telling squid not to cache
>>> anything
>>> >> so the content would never make it from cache to pulp in current
>>> pulp-2.)
>>> >
>>> >
>>> > What would you call this policy?
>>> > policy=proxy?
>>> > policy=stream-dont-save?
>>> > policy=stream-no-save?
>>> >
>>> > Are the names 'on-demand' and 'immediate' clear enough? Are there
>>> better
>>> > names?
>>> >>
>>> >>
>>> >> Overall I'm +1 to the idea of an only-squid version, if others think
>>> it
>>> >> would be useful.
>>> >
>>> >
>>> > I understand describing this as an "only-squid" version, but for
>>> clarity, the
>>> > streamer would still be required because it is what requests the bits
>>> with
>>> > the correctly configured downloader (certs, proxy, etc). The streamer
>>> > streams the bits into squid which provides caching and client
>>> multiplexing.
>>>
>>> I have to admit it's just now I'm reading
>>> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
>>> again because of the SSL termination. So the new plan is to use the
>>> streamer to terminate the SSL instead of the Apache reverse proxy?
>>>
>>
>> The plan for right now is to not use a reverse proxy and have the
>> client's connection terminate at squid directly either via http or https
>> depending on how squid is configured. The Reverse proxy in pulp2's design
>> served to validate the signed urls and rewrite them for squid. This first
>> implementation won't use signed urls. I believe that means we don't need a
>> reverse proxy here yet.
>>
>>
>>> W/r the construction of the URL of an artifact, I thought it would be
>>> stored in the DB, so the Remote would create it during the sync.
>>>
>>
>> This is correct. The inbound URL from the client after the redirect will
>> still be a reference that the "Pulp content app" will resolve to a
>> RemoteArtifact. Then the streamer will use that RemoteArtifact data to
>> correctly build the downloader. That's the gist of it at least.
>>
>>
>>> >
>>> > To confirm my understanding this "squid-only" policy would be the same
>>> as
>>> > on-demand except that it would *not* perform step 14 from the diagram
>>> here
>>> > (https://pulp.plan.io/issues/3693). Is that right?
>>> yup
>>> >
>>> >>
>>> 

Re: [Pulp-dev] Lazy for Pulp3

2018-06-07 Thread Ina Panova
we could try to go with:

policy=immediate  -> downloads now while the task runs (no lazy). Also the
default if unspecified.
policy=on_demand   -> All the steps in the diagram. Content that is
downloaded is saved so that it's only ever downloaded once.
policy=cache_only -> All the steps in the diagram except step 14. If
squid pushes the bits out of the cache, it will be re-downloaded to
serve other clients requesting the same bits.
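
For illustration, the policy would then just be a field the user sets when
creating a remote. A sketch against a hypothetical endpoint (the path,
payload shape, and credentials are assumptions, not the settled API):

    # Hypothetical request; endpoint path and fields are assumptions.
    import requests

    payload = {
        'name': 'centos-updates',
        'url': 'https://mirror.example.com/centos/7/updates/x86_64/',
        'policy': 'on_demand',  # or 'immediate' (default) or 'cache_only'
    }
    response = requests.post(
        'https://pulp.example.com/pulp/api/v3/remotes/file/',
        json=payload,
        auth=('admin', 'password'),
    )
    response.raise_for_status()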




Regards,

Ina Panova
Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."

On Fri, Jun 1, 2018 at 12:36 AM, Jeff Ortel  wrote:

>
>
> On 05/31/2018 04:39 PM, Brian Bouterse wrote:
>
> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
> language.
>
> policy=immediate  -> downloads now while the task runs (no lazy). Also the
> default if unspecified.
> policy=cache-and-save   -> All the steps in the diagram. Content that is
> downloaded is saved so that it's only ever downloaded once.
> policy=cache -> All the steps in the diagram except step 14. If squid
> pushes the bits out of the cache, it will be re-downloaded again to serve
> to other clients requesting the same bits.
>
>
> These policy names strike me as an odd, non-intuitive mixture. I think we
> need to brainstorm on policy names and/or additional attributes to best
> capture this.  Suggest the epic be updated to describe the "modes" or use
> cases without the names for now.  I'll try to follow up with other
> suggestions.
>
>
>
> Also @milan, see inline for answers to your question.
>
> On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik 
> wrote:
>
>> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse 
>> wrote:
>> >
>> >
>> > On Wed, May 30, 2018 at 8:57 AM, Tom McKay 
>> wrote:
>> >>
>> >> I think there is a use case for "proxy only" like is being described
>> here.
>> >> Several years ago there was a project called thumbslug[1] that was
>> used in a
>> >> version of katello instead of pulp. Its job was to check entitlements
>> and
>> >> then proxy content from a cdn. The same functionality could be
>> implemented
>> >> in pulp. (Perhaps it's even as simple as telling squid not to cache
>> anything
>> >> so the content would never make it from cache to pulp in current
>> pulp-2.)
>> >
>> >
>> > What would you call this policy?
>> > policy=proxy?
>> > policy=stream-dont-save?
>> > policy=stream-no-save?
>> >
>> > Are the names 'on-demand' and 'immediate' clear enough? Are there better
>> > names?
>> >>
>> >>
>> >> Overall I'm +1 to the idea of an only-squid version, if others think it
>> >> would be useful.
>> >
>> >
>> > I understand describing this as an "only-squid" version, but for
>> clarity, the
>> > streamer would still be required because it is what requests the bits
>> with
>> > the correctly configured downloader (certs, proxy, etc). The streamer
>> > streams the bits into squid which provides caching and client
>> multiplexing.
>>
>> I have to admit it's just now I'm reading
>> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
>> again because of the SSL termination. So the new plan is to use the
>> streamer to terminate the SSL instead of the Apache reverse proxy?
>>
>
> The plan for right now is to not use a reverse proxy and have the client's
> connection terminate at squid directly either via http or https depending
> on how squid is configured. The Reverse proxy in pulp2's design served to
> validate the signed urls and rewrite them for squid. This first
> implementation won't use signed urls. I believe that means we don't need a
> reverse proxy here yet.
>
>
>> W/r the construction of the URL of an artifact, I thought it would be
>> stored in the DB, so the Remote would create it during the sync.
>>
>
> This is correct. The inbound URL from the client after the redirect will
> still be a reference that the "Pulp content app" will resolve to a
> RemoteArtifact. Then the streamer will use that RemoteArtifact data to
> correctly build the downloader. That's the gist of it at least.
>
>
>> >
>> > To confirm my understanding this "squid-only" policy would be the same
>> as
>> > on-demand except that it would *not* perform step 14 from the diagram
>> here
>> > (https://pulp.plan.io/issues/3693). Is that right?
>> yup
>> >
>> >>
>> >>
>> >> [1] https://github.com/candlepin/thumbslug
>> >>
>> >> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
>> >> wrote:
>> >>>
>> >>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
>> >>> wrote:
>> >>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <
>> mkova...@redhat.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> > On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban
>> >>> >> wrote:
>> >>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
>> >>> >> > 
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> Good point!
>> >>> >> >> More the second; it might be a bit crazy to utilize Squid for
>> that
>> >>> >> >> but
>> >>> >> >> 

Re: [Pulp-dev] Lazy for Pulp3

2018-06-06 Thread Brian Bouterse
On Mon, Jun 4, 2018 at 12:45 PM, Bryan Kearney  wrote:

> On 05/31/2018 06:36 PM, Jeff Ortel wrote:
> >
> >
> > On 05/31/2018 04:39 PM, Brian Bouterse wrote:
> >> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
> >> language.
> >>
> >> policy=immediate  -> downloads now while the task runs (no lazy). Also
> >> the default if unspecified.
> >> policy=cache-and-save   -> All the steps in the diagram. Content that
> >> is downloaded is saved so that it's only ever downloaded once.
> >> policy=cache -> All the steps in the diagram except step 14. If
> >> squid pushes the bits out of the cache, it will be re-downloaded again
> >> to serve to other clients requesting the same bits.
>
> If this became a requirement, another implementation of what Tom is
> asking for is a bulk job to clean out old cached content. I assume
> cache-and-save with a 2-week purge would give the same end result and not
> require a lot of net-new coding.
>

I believe these features would layer on top of the current plan reasonably
well. I want to describe the user experience of this concept.

Purging content would probably be a process where the associated
ContentUnit is updated to no longer be associated with the saved Artifact;
this would cause the Artifact to become an orphaned Artifact. After that, the
user would need to run orphan cleanup to actually delete that Artifact from
the db and the storage system. This two-step process is due to a correctness
requirement: orphaned Artifact deletion has to run without any other
Pulp jobs executing. The existing orphan cleanup already does this correctly,
so purging would probably work like this for now.

Would ^ type of execution work for this type of purging use case?
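
A rough sketch of that two-step flow, assuming Django ORM models along the
lines of the epic (model names, relations, and the import path are
assumptions):

    import os

    # Import path is illustrative; the real model layout was not settled.
    from pulpcore.app.models import Artifact

    # Step 1: disassociate, which turns the saved Artifacts into orphans.
    def purge(content_units):
        for unit in content_units:
            # assumed reverse relation from ContentUnit to ContentArtifact
            unit.contentartifact_set.update(artifact=None)

    # Step 2: orphan cleanup; must run with no other Pulp jobs executing.
    def orphan_cleanup():
        for artifact in Artifact.objects.filter(contentartifact__isnull=True):
            path = artifact.file.path
            artifact.delete()  # remove the db row
            os.remove(path)    # remove the bits from storage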


> -- bk


Re: [Pulp-dev] Lazy for Pulp3

2018-06-06 Thread Brian Bouterse
The config we used in pulp2 can be seen here:
https://docs.pulpproject.org/user-guide/deferred-download.html#squid

In that scenario we used a reverse proxy to do TLS termination, but I think
squid will do the TLS termination in this case. We haven't configured squid
like that before, so we'll have to make sure (a) that it can and (b) that it
will also cache data that flows over that TLS link.

If we can't have squid do the TLS termination, we'll need a reverse proxy
to do it like we did in pulp2.
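
One way to verify both points, once squid is set up with an https_port,
could be a probe like this (host, port, path, and CA bundle are assumptions
about the eventual setup; squid reports cache hits in its X-Cache header):

    # Fetch the same object twice over TLS; the second response should be
    # a cache HIT if squid caches what flowed over the TLS link.
    import requests

    url = 'https://squid.example.com:3128/content/repo/pkg.rpm'
    ca = '/etc/pki/tls/certs/squid-ca.pem'

    first = requests.get(url, verify=ca)   # (a) TLS termination works
    second = requests.get(url, verify=ca)

    print(first.headers.get('X-Cache'))    # expect: MISS from <squid host>
    print(second.headers.get('X-Cache'))   # expect: HIT from <squid host>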


On Mon, Jun 4, 2018 at 12:43 PM, Milan Kovacik  wrote:

> On Thu, May 31, 2018 at 11:39 PM, Brian Bouterse 
> wrote:
> > I updated the epic (https://pulp.plan.io/issues/3693) to use this new
> > language.
> >
> > policy=immediate  -> downloads now while the task runs (no lazy). Also
> the
> > default if unspecified.
> > policy=cache-and-save   -> All the steps in the diagram. Content that is
> > downloaded is saved so that it's only ever downloaded once.
> > policy=cache -> All the steps in the diagram except step 14. If squid
> > pushes the bits out of the cache, it will be re-downloaded again to
> serve to
> > other clients requesting the same bits.
> >
> > Also @milan, see inline for answers to your question.
> >
> > On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik 
> wrote:
> >>
> >> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse 
> >> wrote:
> >> >
> >> >
> >> > On Wed, May 30, 2018 at 8:57 AM, Tom McKay 
> >> > wrote:
> >> >>
> >> >> I think there is a use case for "proxy only" like is being described
> >> >> here.
> >> >> Several years ago there was a project called thumbslug[1] that was
> used
> >> >> in a
> >> >> version of katello instead of pulp. Its job was to check
> entitlements
> >> >> and
> >> >> then proxy content from a cdn. The same functionality could be
> >> >> implemented
> >> >> in pulp. (Perhaps it's even as simple as telling squid not to cache
> >> >> anything
> >> >> so the content would never make it from cache to pulp in current
> >> >> pulp-2.)
> >> >
> >> >
> >> > What would you call this policy?
> >> > policy=proxy?
> >> > policy=stream-dont-save?
> >> > policy=stream-no-save?
> >> >
> >> > Are the names 'on-demand' and 'immediate' clear enough? Are there
> better
> >> > names?
> >> >>
> >> >>
> >> >> Overall I'm +1 to the idea of an only-squid version, if others think
> it
> >> >> would be useful.
> >> >
> >> >
> >> > I understand describing this as an "only-squid" version, but for
> clarity,
> >> > the
> >> > streamer would still be required because it is what requests the bits
> >> > with
> >> > the correctly configured downloader (certs, proxy, etc). The streamer
> >> > streams the bits into squid which provides caching and client
> >> > multiplexing.
> >>
> >> I have to admit it's just now I'm reading
> >>
> >> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
> >> again because of the SSL termination. So the new plan is to use the
> >> streamer to terminate the SSL instead of the Apache reverse proxy?
> >
> >
> > The plan for right now is to not use a reverse proxy and have the
> client's
> > connection terminate at squid directly either via http or https
> depending on
> > how squid is configured. The Reverse proxy in pulp2's design served to
> > validate the signed urls and rewrite them for squid. This first
> > implementation won't use signed urls. I believe that means we don't need
> a
> > reverse proxy here yet.
>
> I don't think I understand; so Squid will be used to terminate TLS but
> it won't be used as a reverse proxy?
>
>
>
>
> >
> >>
> >> W/r the construction of the URL of an artifact, I thought it would be
> >> stored in the DB, so the Remote would create it during the sync.
> >
> >
> > This is correct. The inbound URL from the client after the redirect will
> > still be a reference that the "Pulp content app" will resolve to a
> > RemoteArtifact. Then the streamer will use that RemoteArtifact data to
> > correctly build the downloader. That's the gist of it at least.
>
>
>
> >
> >>
> >> >
> >> > To confirm my understanding this "squid-only" policy would be the same
> >> > as
> >> > on-demand except that it would *not* perform step 14 from the diagram
> >> > here
> >> > (https://pulp.plan.io/issues/3693). Is that right?
> >> yup
> >> >
> >> >>
> >> >>
> >> >> [1] https://github.com/candlepin/thumbslug
> >> >>
> >> >> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
> >> >> wrote:
> >> >>>
> >> >>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
> >> >>> wrote:
> >> >>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik
> >> >>> > 
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <
> dkli...@redhat.com>
> >> >>> >> wrote:
> >> >>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
> >> >>> >> > 
> >> >>> >> > wrote:
> >> >>> >> >>
> >> >>> >> >> Good point!
> >> >>> >> >> More the second; it might be a bit crazy to utilize Squid for
> >> >>> >> >> that
> >> >>> >> >> but
> >> >>> >> 

Re: [Pulp-dev] Lazy for Pulp3

2018-06-04 Thread Bryan Kearney
On 05/31/2018 06:36 PM, Jeff Ortel wrote:
> 
> 
> On 05/31/2018 04:39 PM, Brian Bouterse wrote:
>> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
>> language.
>>
>> policy=immediate  -> downloads now while the task runs (no lazy). Also
>> the default if unspecified.
>> policy=cache-and-save   -> All the steps in the diagram. Content that
>> is downloaded is saved so that it's only ever downloaded once.
>> policy=cache -> All the steps in the diagram except step 14. If
>> squid pushes the bits out of the cache, it will be re-downloaded again
>> to serve to other clients requesting the same bits.

If this became a requirement, another implementation of what Tom is
asking for is a bulk job to clean out old cached content. I assume
cache-and-save with a 2-week purge would give the same end result and not
require a lot of net-new coding.
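
That kind of purge could be a small periodic task. A sketch, assuming the
Artifact rows grew a "last_accessed" timestamp (a field that does not exist
yet) and the usual orphan cleanup runs afterwards:

    from datetime import timedelta

    from django.utils import timezone

    # Artifact and ContentArtifact are the assumed epic models.
    def purge_older_than(days=14):
        cutoff = timezone.now() - timedelta(days=days)
        stale = Artifact.objects.filter(last_accessed__lt=cutoff)
        # Disassociate only; the bits are reclaimed by orphan cleanup later.
        ContentArtifact.objects.filter(artifact__in=stale).update(artifact=None)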

-- bk







Re: [Pulp-dev] Lazy for Pulp3

2018-06-04 Thread Milan Kovacik
On Thu, May 31, 2018 at 11:39 PM, Brian Bouterse  wrote:
> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
> language.
>
> policy=immediate  -> downloads now while the task runs (no lazy). Also the
> default if unspecified.
> policy=cache-and-save   -> All the steps in the diagram. Content that is
> downloaded is saved so that it's only ever downloaded once.
> policy=cache -> All the steps in the diagram except step 14. If squid
> pushes the bits out of the cache, it will be re-downloaded again to serve to
> other clients requesting the same bits.
>
> Also @milan, see inline for answers to your question.
>
> On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik  wrote:
>>
>> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse 
>> wrote:
>> >
>> >
>> > On Wed, May 30, 2018 at 8:57 AM, Tom McKay 
>> > wrote:
>> >>
>> >> I think there is a use case for "proxy only" like is being described
>> >> here.
>> >> Several years ago there was a project called thumbslug[1] that was used
>> >> in a
>> >> version of katello instead of pulp. Its job was to check entitlements
>> >> and
>> >> then proxy content from a cdn. The same functionality could be
>> >> implemented
>> >> in pulp. (Perhaps it's even as simple as telling squid not to cache
>> >> anything
>> >> so the content would never make it from cache to pulp in current
>> >> pulp-2.)
>> >
>> >
>> > What would you call this policy?
>> > policy=proxy?
>> > policy=stream-dont-save?
>> > policy=stream-no-save?
>> >
>> > Are the names 'on-demand' and 'immediate' clear enough? Are there better
>> > names?
>> >>
>> >>
>> >> Overall I'm +1 to the idea of an only-squid version, if others think it
>> >> would be useful.
>> >
>> >
>> > I understand describing this as an "only-squid" version, but for clarity,
>> > the
>> > streamer would still be required because it is what requests the bits
>> > with
>> > the correctly configured downloader (certs, proxy, etc). The streamer
>> > streams the bits into squid which provides caching and client
>> > multiplexing.
>>
>> I have to admit it's just now I'm reading
>>
>> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
>> again because of the SSL termination. So the new plan is to use the
>> streamer to terminate the SSL instead of the Apache reverse proxy?
>
>
> The plan for right now is to not use a reverse proxy and have the client's
> connection terminate at squid directly either via http or https depending on
> how squid is configured. The Reverse proxy in pulp2's design served to
> validate the signed urls and rewrite them for squid. This first
> implementation won't use signed urls. I believe that means we don't need a
> reverse proxy here yet.

I don't think I understand; so Squid will be used to terminate TLS but
it won't be used as a reverse proxy?




>
>>
>> W/r the construction of the URL of an artifact, I thought it would be
>> stored in the DB, so the Remote would create it during the sync.
>
>
> This is correct. The inbound URL from the client after the redirect will
> still be a reference that the "Pulp content app" will resolve to a
> RemoteArtifact. Then the streamer will use that RemoteArtifact data to
> correctly build the downloader. That's the gist of it at least.



>
>>
>> >
>> > To confirm my understanding this "squid-only" policy would be the same
>> > as
>> > on-demand except that it would *not* perform step 14 from the diagram
>> > here
>> > (https://pulp.plan.io/issues/3693). Is that right?
>> yup
>> >
>> >>
>> >>
>> >> [1] https://github.com/candlepin/thumbslug
>> >>
>> >> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
>> >> wrote:
>> >>>
>> >>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
>> >>> wrote:
>> >>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik
>> >>> > 
>> >>> > wrote:
>> >>> >>
>> >>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
>> >>> >> wrote:
>> >>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
>> >>> >> > 
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> Good point!
>> >>> >> >> More the second; it might be a bit crazy to utilize Squid for
>> >>> >> >> that
>> >>> >> >> but
>> >>> >> >> first, let's answer the why ;)
>> >>> >> >> So why does Pulp need to store the content here?
>> >>> >> >> Why don't we point the users to the Squid all the time (for the
>> >>> >> >> lazy
>> >>> >> >> repos)?
>> >>> >> >
>> >>> >> >
>> >>> >> > Pulp's Streamer needs to fetch and store the content because
>> >>> >> > that's
>> >>> >> > Pulp's
>> >>> >> > primary responsibility.
>> >>> >>
>> >>> >> Maybe not that much the storing but rather the content views
>> >>> >> management?
>> >>> >> I mean the partitioning into repositories, promoting.
>> >>> >>
>> >>> >
>> >>> > Exactly this. We want Pulp users to be able to reuse content that
>> >>> > was
>> >>> > brought in using the 'on_demand' download policy in other
>> >>> > repositories.
>> >>> I see.
>> >>>
>> >>> >
>> >>> >>
>> >>> >> If some of the content lived in Squid and some 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-31 Thread Jeff Ortel



On 05/31/2018 04:39 PM, Brian Bouterse wrote:
I updated the epic (https://pulp.plan.io/issues/3693) to use this new 
language.


policy=immediate  -> downloads now while the task runs (no lazy). Also 
the default if unspecified.
policy=cache-and-save   -> All the steps in the diagram. Content that 
is downloaded is saved so that it's only ever downloaded once.
policy=cache -> All the steps in the diagram except step 14. If 
squid pushes the bits out of the cache, it will be re-downloaded again 
to serve to other clients requesting the same bits.


These policy names strike me as an odd, non-intuitive mixture. I think 
we need to brainstorm on policy names and/or additional attributes to 
best capture this.  Suggest the epic be updated to describe the "modes" 
or use cases without the names for now.  I'll try to follow up with 
other suggestions.




Also @milan, see inline for answers to your question.

On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik wrote:


On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmc...@redhat.com> wrote:
>>
>> I think there is a use case for "proxy only" like is being described here.
>> Several years ago there was a project called thumbslug[1] that was used in a
>> version of katello instead of pulp. Its job was to check entitlements and
>> then proxy content from a cdn. The same functionality could be implemented
>> in pulp. (Perhaps it's even as simple as telling squid not to cache anything
>> so the content would never make it from cache to pulp in current pulp-2.)
>
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>>
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>
>
> I understand describing this as an "only-squid" version, but for clarity, the
> streamer would still be required because it is what requests the bits with
> the correctly configured downloader (certs, proxy, etc). The streamer
> streams the bits into squid which provides caching and client multiplexing.

I have to admit it's just now I'm reading

https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy


again because of the SSL termination. So the new plan is to use the
streamer to terminate the SSL instead of the Apache reverse proxy?


The plan for right now is to not use a reverse proxy and have the 
client's connection terminate at squid directly either via http or 
https depending on how squid is configured. The Reverse proxy in 
pulp2's design served to validate the signed urls and rewrite them for 
squid. This first implementation won't use signed urls. I believe that 
means we don't need a reverse proxy here yet.



W/r the construction of the URL of an artifact, I thought it would be
stored in the DB, so the Remote would create it during the sync.


This is correct. The inbound URL from the client after the redirect 
will still be a reference that the "Pulp content app" will resolve to 
a RemoteArtifact. Then the streamer will use that RemoteArtifact data 
to correctly build the downloader. That's the gist of it at least.



>
> To confirm my understanding this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?
yup
>
>>
>>
>> [1] https://github.com/candlepin/thumbslug

>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that but
>>> >> >> first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to the Squid all the time (for the lazy
>>> >> >> repos)?
>>> >> >
>>> >> 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-31 Thread Brian Bouterse
I updated the epic (https://pulp.plan.io/issues/3693) to use this new
language.

policy=immediate  -> downloads now while the task runs (no lazy). Also the
default if unspecified.
policy=cache-and-save   -> All the steps in the diagram. Content that is
downloaded is saved so that it's only ever downloaded once.
policy=cache -> All the steps in the diagram except step 14. If squid
pushes the bits out of the cache, it will be re-downloaded to serve
other clients requesting the same bits.

Also @milan, see inline for answers to your question.

On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik  wrote:

> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse 
> wrote:
> >
> >
> > On Wed, May 30, 2018 at 8:57 AM, Tom McKay 
> wrote:
> >>
> >> I think there is a use case for "proxy only" like is being described
> here.
> >> Several years ago there was a project called thumbslug[1] that was used
> in a
> >> version of katello instead of pulp. Its job was to check entitlements
> and
> >> then proxy content from a cdn. The same functionality could be
> implemented
> >> in pulp. (Perhaps it's even as simple as telling squid not to cache
> anything
> >> so the content would never make it from cache to pulp in current
> pulp-2.)
> >
> >
> > What would you call this policy?
> > policy=proxy?
> > policy=stream-dont-save?
> > policy=stream-no-save?
> >
> > Are the names 'on-demand' and 'immediate' clear enough? Are there better
> > names?
> >>
> >>
> >> Overall I'm +1 to the idea of an only-squid version, if others think it
> >> would be useful.
> >
> >
> > I understand describing this as an "only-squid" version, but for clarity,
> the
> > streamer would still be required because it is what requests the bits
> with
> > the correctly configured downloader (certs, proxy, etc). The streamer
> > streams the bits into squid which provides caching and client
> multiplexing.
>
> I have to admit it's just now I'm reading
> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
> again because of the SSL termination. So the new plan is to use the
> streamer to terminate the SSL instead of the Apache reverse proxy?
>

The plan for right now is to not use a reverse proxy and have the client's
connection terminate at squid directly, via either http or https depending
on how squid is configured. The reverse proxy in pulp2's design served to
validate the signed urls and rewrite them for squid. This first
implementation won't use signed urls. I believe that means we don't need a
reverse proxy here yet.


> W/r the construction of the URL of an artifact, I thought it would be
> stored in the DB, so the Remote would create it during the sync.
>

This is correct. The inbound URL from the client after the redirect will
still be a reference that the "Pulp content app" will resolve to a
RemoteArtifact. Then the streamer will use that RemoteArtifact data to
correctly build the downloader. That's the gist of it at least.
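
A sketch of that gist (the path-lookup helper, model fields, and downloader
factory are assumptions about the design, not real code):

    # Hypothetical streamer-side handling of a redirected client request.
    def stream(request_path):
        # The content app resolves the path to a content unit's artifact.
        content_artifact = resolve(request_path)  # assumed helper
        remote_artifact = RemoteArtifact.objects.get(
            content_artifact=content_artifact)
        # The RemoteArtifact carries the upstream url plus the Remote whose
        # settings (certs, proxy) configure the downloader correctly.
        downloader = remote_artifact.remote.get_downloader(remote_artifact.url)
        return downloader.run()  # the bits stream back through squid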


> >
> > To confirm my understanding this "squid-only" policy would be the same as
> > on-demand except that it would *not* perform step 14 from the diagram
> here
> > (https://pulp.plan.io/issues/3693). Is that right?
> yup
> >
> >>
> >>
> >> [1] https://github.com/candlepin/thumbslug
> >>
> >> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
> >> wrote:
> >>>
> >>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
> >>> wrote:
> >>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik
> >>> > wrote:
> >>> >>
> >>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
> >>> >> wrote:
> >>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
> >>> >> > 
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> Good point!
> >>> >> >> More the second; it might be a bit crazy to utilize Squid for
> that
> >>> >> >> but
> >>> >> >> first, let's answer the why ;)
> >>> >> >> So why does Pulp need to store the content here?
> >>> >> >> Why don't we point the users to the Squid all the time (for the
> >>> >> >> lazy
> >>> >> >> repos)?
> >>> >> >
> >>> >> >
> >>> >> > Pulp's Streamer needs to fetch and store the content because
> that's
> >>> >> > Pulp's
> >>> >> > primary responsibility.
> >>> >>
> >>> >> Maybe not that much the storing but rather the content views
> >>> >> management?
> >>> >> I mean the partitioning into repositories, promoting.
> >>> >>
> >>> >
> >>> > Exactly this. We want Pulp users to be able to reuse content that was
> >>> > brought in using the 'on_demand' download policy in other
> repositories.
> >>> I see.
> >>>
> >>> >
> >>> >>
> >>> >> If some of the content lived in Squid and some lived
> >>> >> > in Pulp, it would be difficult for the user to know what content
> is
> >>> >> > actually
> >>> >> > available in Pulp and what content needs to be fetched from a
> remote
> >>> >> > repository.
> >>> >>
> >>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
> >>> >> so not that difficult.
> >>> >> Maybe Pulp could have a concept of Origin, where folks upload 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-30 Thread Milan Kovacik
On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse  wrote:
>
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay  wrote:
>>
>> I think there is a use case for "proxy only" like is being described here.
>> Several years ago there was a project called thumbslug[1] that was used in a
>> version of katello instead of pulp. Its job was to check entitlements and
>> then proxy content from a cdn. The same functionality could be implemented
>> in pulp. (Perhaps it's even as simple as telling squid not to cache anything
>> so the content would never make it from cache to pulp in current pulp-2.)
>
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>>
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>
>
> I understand describing this as an "only-squid" version, but for clarity, the
> streamer would still be required because it is what requests the bits with
> the correctly configured downloader (certs, proxy, etc). The streamer
> streams the bits into squid which provides caching and client multiplexing.

I have to admit it's just now I'm reading
https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
again because of the SSL termination. So the new plan is to use the
streamer to terminate the SSL instead of the Apache reverse proxy?

W/r/t the construction of the URL of an artifact, I thought it would be
stored in the DB, so the Remote would create it during the sync.
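
For illustration, a sync-time step along those lines (model and field names
are assumptions):

    from urllib.parse import urljoin

    # Hypothetical: during sync the Remote records each artifact's upstream
    # URL, plus the expected digest and size, for the streamer to use later.
    def record_remote_artifacts(remote, entries):
        for entry in entries:  # entries parsed from the repo metadata
            RemoteArtifact.objects.create(
                remote=remote,
                url=urljoin(remote.url, entry.relative_path),
                sha256=entry.sha256,
                size=entry.size,
            )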

>
> To confirm my understanding this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?
yup
>
>>
>>
>> [1] https://github.com/candlepin/thumbslug
>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
>> wrote:
>>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
>>> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik 
>>> > wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
>>> >> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
>>> >> > 
>>> >> > wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that
>>> >> >> but
>>> >> >> first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to the Squid all the time (for the
>>> >> >> lazy
>>> >> >> repos)?
>>> >> >
>>> >> >
>>> >> > Pulp's Streamer needs to fetch and store the content because that's
>>> >> > Pulp's
>>> >> > primary responsibility.
>>> >>
>>> >> Maybe not that much the storing but rather the content views
>>> >> management?
>>> >> I mean the partitioning into repositories, promoting.
>>> >>
>>> >
>>> > Exactly this. We want Pulp users to be able to reuse content that was
>>> > brought in using the 'on_demand' download policy in other repositories.
>>> I see.
>>>
>>> >
>>> >>
>>> >> If some of the content lived in Squid and some lived
>>> >> > in Pulp, it would be difficult for the user to know what content is
>>> >> > actually
>>> >> > available in Pulp and what content needs to be fetched from a remote
>>> >> > repository.
>>> >>
>>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
>>> >> so not that difficult.
>>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>>> >>
>>> >
>>> > Squid removes things from the cache at some point. You can probably
>>> > configure it to never remove anything from the cache, but then we would
>>> > need
>>> > to implement orphan cleanup that would work across two systems: pulp
>>> > and
>>> > squid.
>>>
>>> Actually "remote" units wouldn't need orphan cleaning from the disk,
>>> just dropping them from the DB would suffice.
>>>
>>> >
>>> > Answering that question would still be difficult. Not all content that
>>> > is in
>>> > the repository that was synced using on_demand download policy will be
>>> > in
>>> > Squid - only the content that has been requested by clients. So it's
>>> > still
>>> > hard to know which of the content units have been downloaded and which
>>> > have
>>> > not been.
>>>
>>> But the beauty is exactly in that: we don't have to track whether the
>>> content is downloaded if it is reverse-proxied[1][2].
>>> Moreover, this would work both with and without a proxy between Pulp
>>> and the Origin of the remote unit.
>>> A "remote" content artifact might just need to carry it's URL in a DB
>>> column for this to work; so the async artifact model, instead of the
>>> "policy=on-demand"  would have a mandatory remote "URL" attribute; I
>>> wouldn't say it's more complex than tracking the "policy" attribute.
>>>
>>> >
>>> >
>>> >>
>>> >> >
>>> >> > As Pulp downloads an Artifact, it calculates all the checksums and its
>>> >> > 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-30 Thread Tom McKay
No opinion on the name; foreman will name it whatever it wants in the
front-end user experience. Devs working on the pulp-2 to pulp-3 foreman
transition may want to keep the existing names.

Yes, I'd say everything but step 14 in that diagram. In addition, I would
ensure that the squid cache size is configurable down to zero so that it is
effectively a straight pull-through.

I assume that all pulp-3 content types will have this as an option as well,
if the type supports it? I want a straight proxy of container images, for
example, or a straight proxy of files, etc.

On Wed, May 30, 2018 at 11:34 AM, Brian Bouterse 
wrote:

> Actually, what about these as names?
>
> policy=immediate  -> downloads now while the task runs (no lazy). Also the
> default if unspecified.
> policy=cache-and-save   -> All the steps in the diagram. Content that is
> downloaded is saved so that it's only ever downloaded once.
> policy=cache -> All the steps in the diagram except step 14. If squid
> pushes the bits out of the cache, it will be re-downloaded again to serve
> to other clients requesting the same bits.
>
> If ^ is better I can update the stories. Other naming ideas and use cases
> are welcome.
>
> Thanks,
> Brian
>
> On Wed, May 30, 2018 at 10:50 AM, Brian Bouterse 
> wrote:
>
>>
>>
>> On Wed, May 30, 2018 at 8:57 AM, Tom McKay 
>> wrote:
>>
>>> I think there is a use case for "proxy only" like is being described
>>> here. Several years ago there was a project called thumbslug[1] that was
>>> used in a version of katello instead of pulp. Its job was to check
>>> entitlements and then proxy content from a cdn. The same functionality
>>> could be implemented in pulp. (Perhaps it's even as simple as telling squid
>>> not to cache anything so the content would never make it from cache to pulp
>>> in current pulp-2.)
>>>
>>
>> What would you call this policy?
>> policy=proxy?
>> policy=stream-dont-save?
>> policy=stream-no-save?
>>
>> Are the names 'on-demand' and 'immediate' clear enough? Are there better
>> names?
>>
>>>
>>> Overall I'm +1 to the idea of an only-squid version, if others think it
>>> would be useful.
>>>
>>
>> I understand describing this as an "only-squid" version, but for clarity,
>> the streamer would still be required because it is what requests the bits
>> with the correctly configured downloader (certs, proxy, etc). The streamer
>> streams the bits into squid which provides caching and client multiplexing.
>>
>> To confirm my understanding this "squid-only" policy would be the same as
>> on-demand except that it would *not* perform step 14 from the diagram here (
>> https://pulp.plan.io/issues/3693). Is that right?
>>
>>
>>>
>>> [1] https://github.com/candlepin/thumbslug
>>>
>>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
>>> wrote:
>>>
 On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
 wrote:
 > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik 
 wrote:
 >>
 >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
 wrote:
 >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <
 mkova...@redhat.com>
 >> > wrote:
 >> >>
 >> >> Good point!
 >> >> More the second; it might be a bit crazy to utilize Squid for
 that but
 >> >> first, let's answer the why ;)
 >> >> So why does Pulp need to store the content here?
 >> >> Why don't we point the users to the Squid all the time (for the
 lazy
 >> >> repos)?
 >> >
 >> >
 >> > Pulp's Streamer needs to fetch and store the content because that's
 >> > Pulp's
 >> > primary responsibility.
 >>
 >> Maybe not that much the storing but rather the content views
 management?
 >> I mean the partitioning into repositories, promoting.
 >>
 >
 > Exactly this. We want Pulp users to be able to reuse content that was
 > brought in using the 'on_demand' download policy in other
 repositories.
 I see.

 >
 >>
 >> If some of the content lived in Squid and some lived
 >> > in Pulp, it would be difficult for the user to know what content is
 >> > actually
 >> > available in Pulp and what content needs to be fetched from a
 remote
 >> > repository.
 >>
 >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
 >> so not that difficult.
 >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
 >> a Pulp repo, vs. Proxy for its repo storage policy?
 >>
 >
 > Squid removes things from the cache at some point. You can probably
 > configure it to never remove anything from the cache, but then we
 would need
 > to implement orphan cleanup that would work across two systems: pulp
 and
 > squid.

 Actually "remote" units wouldn't need orphan cleaning from the disk,
 just dropping them from the DB would suffice.

 >
 > Answering that question would still be difficult. Not all content
 that is in
 > the 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-30 Thread Brian Bouterse
Actually, what about these as names?

policy=immediate  -> downloads now while the task runs (no lazy). Also the
default if unspecified.
policy=cache-and-save   -> All the steps in the diagram. Content that is
downloaded is saved so that it's only ever downloaded once.
policy=cache -> All the steps in the diagram except step 14. If squid
pushes the bits out of the cache, it will be re-downloaded to serve
other clients requesting the same bits.

If ^ is better I can update the stories. Other naming ideas and use cases
are welcome.

Thanks,
Brian

On Wed, May 30, 2018 at 10:50 AM, Brian Bouterse 
wrote:

>
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay  wrote:
>
>> I think there is a use case for "proxy only" like is being described here.
>> Several years ago there was a project called thumbslug[1] that was used in
>> a version of katello instead of pulp. Its job was to check entitlements
>> and then proxy content from a cdn. The same functionality could be
>> implemented in pulp. (Perhaps it's even as simple as telling squid not to
>> cache anything so the content would never make it from cache to pulp in
>> current pulp-2.)
>>
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>>
>
> I understand describing this as an "only-squid" version, but for clarity,
> the streamer would still be required because it is what requests the bits
> with the correctly configured downloader (certs, proxy, etc). The streamer
> streams the bits into squid which provides caching and client multiplexing.
>
> To confirm my understanding this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here (
> https://pulp.plan.io/issues/3693). Is that right?
>
>
>>
>> [1] https://github.com/candlepin/thumbslug
>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
>> wrote:
>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
>>> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik 
>>> wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
>>> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <
>>> mkova...@redhat.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that
>>> but
>>> >> >> first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to the Squid all the time (for the
>>> lazy
>>> >> >> repos)?
>>> >> >
>>> >> >
>>> >> > Pulp's Streamer needs to fetch and store the content because that's
>>> >> > Pulp's
>>> >> > primary responsibility.
>>> >>
>>> >> Maybe not that much the storing but rather the content views
>>> management?
>>> >> I mean the partitioning into repositories, promoting.
>>> >>
>>> >
>>> > Exactly this. We want Pulp users to be able to reuse content that was
>>> > brought in using the 'on_demand' download policy in other repositories.
>>> I see.
>>>
>>> >
>>> >>
>>> >> If some of the content lived in Squid and some lived
>>> >> > in Pulp, it would be difficult for the user to know what content is
>>> >> > actually
>>> >> > available in Pulp and what content needs to be fetched from a remote
>>> >> > repository.
>>> >>
>>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
>>> >> so not that difficult.
>>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>>> >>
>>> >
>>> > Squid removes things from the cache at some point. You can probably
>>> > configure it to never remove anything from the cache, but then we
>>> would need
>>> > to implement orphan cleanup that would work across two systems: pulp
>>> and
>>> > squid.
>>>
>>> Actually "remote" units wouldn't need orphan cleaning from the disk,
>>> just dropping them from the DB would suffice.
>>>
>>> >
>>> > Answering that question would still be difficult. Not all content that
>>> is in
>>> > the repository that was synced using on_demand download policy will be
>>> in
>>> > Squid - only the content that has been requested by clients. So it's
>>> still
>>> > hard to know which of the content units have been downloaded and which
>>> have
>>> > not been.
>>>
>>> But the beauty is exactly in that: we don't have to track whether the
>>> content is downloaded if it is reverse-proxied[1][2].
>>> Moreover, this would work both with and without a proxy between Pulp
>>> and the Origin of the remote unit.
>>> A "remote" content artifact might just need to carry it's URL in a DB
>>> column for this to work; so the async artifact model, instead of the
>>> "policy=on-demand"  would have a mandatory remote "URL" attribute; I
>>> wouldn't say it's more complex than tracking the 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-30 Thread Brian Bouterse
On Wed, May 30, 2018 at 8:57 AM, Tom McKay  wrote:

> I think there is a use case for "proxy only" like is being described here.
> Several years ago there was a project called thumbslug[1] that was used in
> a version of katello instead of pulp. Its job was to check entitlements
> and then proxy content from a cdn. The same functionality could be
> implemented in pulp. (Perhaps it's even as simple as telling squid not to
> cache anything so the content would never make it from cache to pulp in
> current pulp-2.)
>

What would you call this policy?
policy=proxy?
policy=stream-dont-save?
policy=stream-no-save?

Are the names 'on-demand' and 'immediate' clear enough? Are there better
names?

>
> Overall I'm +1 to the idea of an only-squid version, if others think it
> would be useful.
>

I understand describing this as an "only-squid" version, but for clarity,
the streamer would still be required because it is what requests the bits
with the correctly configured downloader (certs, proxy, etc). The streamer
streams the bits into squid which provides caching and client multiplexing.
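
As a sketch of what "correctly configured downloader" could mean in
practice, with aiohttp standing in for the real downloader machinery (the
Remote attributes here are assumptions):

    import hashlib

    import aiohttp

    # Hypothetical: fetch one RemoteArtifact with the Remote's TLS and
    # proxy settings, then validate the bits before streaming them on.
    async def fetch(remote_artifact):
        remote = remote_artifact.remote
        connector = aiohttp.TCPConnector(ssl=remote.ssl_context)  # client certs
        async with aiohttp.ClientSession(connector=connector) as session:
            async with session.get(remote_artifact.url,
                                   proxy=remote.proxy_url) as response:
                data = await response.read()
        if hashlib.sha256(data).hexdigest() != remote_artifact.sha256:
            raise ValueError('downloaded bits failed digest validation')
        return data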

To confirm my understanding this "squid-only" policy would be the same as
on-demand except that it would *not* perform step 14 from the diagram here (
https://pulp.plan.io/issues/3693). Is that right?


>
> [1] https://github.com/candlepin/thumbslug
>
> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik 
> wrote:
>
>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban 
>> wrote:
>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik 
>> wrote:
>> >>
>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
>> wrote:
>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
>> >> > wrote:
>> >> >>
>> >> >> Good point!
>> >> >> More the second; it might be a bit crazy to utilize Squid for that
>> but
>> >> >> first, let's answer the why ;)
>> >> >> So why does Pulp need to store the content here?
>> >> >> Why don't we point the users to the Squid all the time (for the lazy
>> >> >> repos)?
>> >> >
>> >> >
>> >> > Pulp's Streamer needs to fetch and store the content because that's
>> >> > Pulp's
>> >> > primary responsibility.
>> >>
>> >> Maybe not that much the storing but rather the content views
>> management?
>> >> I mean the partitioning into repositories, promoting.
>> >>
>> >
>> > Exactly this. We want Pulp users to be able to reuse content that was
>> > brought in using the 'on_demand' download policy in other repositories.
>> I see.
>>
>> >
>> >>
>> >> If some of the content lived in Squid and some lived
>> >> > in Pulp, it would be difficult for the user to know what content is
>> >> > actually
>> >> > available in Pulp and what content needs to be fetched from a remote
>> >> > repository.
>> >>
>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
>> >> so not that difficult.
>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>> >>
>> >
>> > Squid removes things from the cache at some point. You can probably
>> > configure it to never remove anything from the cache, but then we would
>> need
>> > to implement orphan cleanup that would work across two systems: pulp and
>> > squid.
>>
>> Actually "remote" units wouldn't need orphan cleaning from the disk,
>> just dropping them from the DB would suffice.
>>
>> >
>> > Answering that question would still be difficult. Not all content that
>> is in
>> > the repository that was synced using on_demand download policy will be
>> in
>> > Squid - only the content that has been requested by clients. So it's
>> still
>> > hard to know which of the content units have been downloaded and which
>> have
>> > not been.
>>
>> But the beauty is exactly in that: we don't have to track whether the
>> content is downloaded if it is reverse-proxied[1][2].
>> Moreover, this would work both with and without a proxy between Pulp
>> and the Origin of the remote unit.
>> A "remote" content artifact might just need to carry it's URL in a DB
>> column for this to work; so the async artifact model, instead of the
>> "policy=on-demand"  would have a mandatory remote "URL" attribute; I
>> wouldn't say it's more complex than tracking the "policy" attribute.
>>
>> >
>> >
>> >>
>> >> >
>> >> > As Pulp downloads an Artifact, it calculates all the checksums and its
>> >> > size. It then performs validation based on information that was provided
>> >> > from the RemoteArtifact. After validation is performed, the Artifact is
>> >> > saved to the database and its final place in
>> >> > /var/lib/content/artifacts/.
>> >>
>> >> This could be still achieved by storing the content just temporarily
>> >> in the Squid proxy i.e use Squid as the content source, not the disk.
>> >>
>> >> > Once this information is in the database, Pulp's web server can serve
>> >> > the
>> >> > content without having to involve the Streamer or Squid.
>> >>
>> >> Pulp might serve just the API and the metadata, the content might be
>> 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-30 Thread Tom McKay
I think there is a use case for "proxy only" like is being described here.
Several years ago there was a project called thumbslug[1] that was used in
a version of katello instead of pulp. Its job was to check entitlements
and then proxy content from a cdn. The same functionality could be
implemented in pulp. (Perhaps it's even as simple as telling squid not to
cache anything so the content would never make it from cache to pulp in
current pulp-2.)

Overall I'm +1 to the idea of an only-squid version, if others think it
would be useful.


[1] https://github.com/candlepin/thumbslug

On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik  wrote:

> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban  wrote:
> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik 
> wrote:
> >>
> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban 
> wrote:
> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik 
> >> > wrote:
> >> >>
> >> >> Good point!
> >> >> More the second; it might be a bit crazy to utilize Squid for that
> but
> >> >> first, let's answer the why ;)
> >> >> So why does Pulp need to store the content here?
> >> >> Why don't we point the users to the Squid all the time (for the lazy
> >> >> repos)?
> >> >
> >> >
> >> > Pulp's Streamer needs to fetch and store the content because that's
> >> > Pulp's
> >> > primary responsibility.
> >>
> >> Maybe not that much the storing but rather the content views management?
> >> I mean the partitioning into repositories, promoting.
> >>
> >
> > Exactly this. We want Pulp users to be able to reuse content that was
> > brought in using the 'on_demand' download policy in other repositories.
> I see.
>
> >
> >>
> >> If some of the content lived in Squid and some lived
> >> > in Pulp, it would be difficult for the user to know what content is
> >> > actually
> >> > available in Pulp and what content needs to be fetched from a remote
> >> > repository.
> >>
> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
> >> so not that difficult.
> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
> >> a Pulp repo, vs. Proxy for its repo storage policy?
> >>
> >
> > Squid removes things from the cache at some point. You can probably
> > configure it to never remove anything from the cache, but then we would
> need
> > to implement orphan cleanup that would work across two systems: pulp and
> > squid.
>
> Actually "remote" units wouldn't need orphan cleaning from the disk,
> just dropping them from the DB would suffice.
>
> >
> > Answering that question would still be difficult. Not all content that
> is in
> > the repository that was synced using on_demand download policy will be in
> > Squid - only the content that has been requested by clients. So it's
> still
> > hard to know which of the content units have been downloaded and which
> have
> > not been.
>
> But the beauty is exactly in that: we don't have to track whether the
> content is downloaded if it is reverse-proxied[1][2].
> Moreover, this would work both with and without a proxy between Pulp
> and the Origin of the remote unit.
> A "remote" content artifact might just need to carry it's URL in a DB
> column for this to work; so the async artifact model, instead of the
> "policy=on-demand"  would have a mandatory remote "URL" attribute; I
> wouldn't say it's more complex than tracking the "policy" attribute.
>
> >
> >
> >>
> >> >
> >> > As Pulp downloads an Artifact, it calculates all the checksums and its
> >> > size. It then performs validation based on information that was provided
> >> > from the RemoteArtifact. After validation is performed, the Artifact is
> >> > saved to the database and its final place in
> >> > /var/lib/content/artifacts/.
> >>
> >> This could be still achieved by storing the content just temporarily
> >> in the Squid proxy i.e use Squid as the content source, not the disk.
> >>
> >> > Once this information is in the database, Pulp's web server can serve
> >> > the
> >> > content without having to involve the Streamer or Squid.
> >>
> >> Pulp might serve just the API and the metadata; the content might be
> >> redirected to the Proxy all the time, correct?
> >> Doesn't Crane do that btw?
> >
> >
> > Theoretically we could do this, but in practice we would run into
> > problems when we needed to scale out the Content app. Right now when the
> > Content app needs to be scaled, a user can launch another machine that
> > will run the Content app. Squid does not support that kind of scaling.
> > Squid can only take advantage of additional cores in a single machine.
>
> I don't think I understand; proxies are actually designed to scale[1]
> and are used as tools to scale the web too.
>
> This is all about the How question, but when it comes to my original
> Why, please correct me if I'm wrong, the answer so far has been:
>  Pulp always downloads the content because that's what it is supposed to
> do.
>
> Cheers,
> milan
>
> [1] https://en.wikipedia.org/wiki/Reverse_proxy
> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA

Re: [Pulp-dev] Lazy for Pulp3

2018-05-30 Thread Milan Kovacik
On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban  wrote:
> On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik  wrote:
>>
>> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban  wrote:
>> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik 
>> > wrote:
>> >>
>> >> Good point!
>> >> More the second; it might be a bit crazy to utilize Squid for that but
>> >> first, let's answer the why ;)
>> >> So why does Pulp need to store the content here?
>> >> Why don't we point the users to the Squid all the time (for the lazy
>> >> repos)?
>> >
>> >
>> > Pulp's Streamer needs to fetch and store the content because that's
>> > Pulp's
>> > primary responsibility.
>>
>> Maybe it's not so much the storing but rather the content view management?
>> I mean the partitioning into repositories, promoting.
>>
>
> Exactly this. We want Pulp users to be able to reuse content that was
> brought in using the 'on_demand' download policy in other repositories.
I see.

>
>>
>> > If some of the content lived in Squid and some lived
>> > in Pulp, it would be difficult for the user to know what content is
>> > actually
>> > available in Pulp and what content needs to be fetched from a remote
>> > repository.
>>
>> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp,
>> so not that difficult.
>> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>> a Pulp repo, vs. Proxy for its repo storage policy?
>>
>
> Squid removes things from the cache at some point. You can probably
> configure it to never remove anything from the cache, but then we would need
> to implement orphan cleanup that would work across two systems: pulp and
> squid.

Actually "remote" units wouldn't need orphan cleaning from the disk,
just dropping them from the DB would suffice.
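
To illustrate (a rough sketch only, with made-up Django model names, not
the actual Pulp3 schema), such a cleanup could be a single ORM call:

    # Hypothetical sketch: "orphaned" remote units are just rows that no
    # repository references anymore; no filesystem walk is needed because
    # the bits only ever live in the proxy cache.
    from myapp.models import RemoteUnit  # illustrative model, not real

    def cleanup_remote_orphans():
        RemoteUnit.objects.filter(repositories__isnull=True).delete()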

>
> Answering that question would still be difficult. Not all content that is in
> the repository that was synced using on_demand download policy will be in
> Squid - only the content that has been requested by clients. So it's still
> hard to know which of the content units have been downloaded and which have
> not been.

But the beauty is exactly in that: we don't have to track whether the
content is downloaded if it is reverse-proxied[1][2].
Moreover, this would work both with and without a proxy between Pulp
and the Origin of the remote unit.
A "remote" content artifact might just need to carry it's URL in a DB
column for this to work; so the async artifact model, instead of the
"policy=on-demand"  would have a mandatory remote "URL" attribute; I
wouldn't say it's more complex than tracking the "policy" attribute.
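
Concretely, it could look something like this hand-wavy Django sketch
(illustrative names only, not the actual Pulp3 models):

    from django.db import models

    class RemoteArtifact(models.Model):
        # Mandatory origin URL: enough for the streamer/proxy to fetch
        # the bits on demand; no "policy" flag and no downloaded-state
        # tracking required.
        url = models.TextField()
        # Validation data promised by the remote's metadata.
        expected_size = models.BigIntegerField(null=True)
        expected_sha256 = models.CharField(max_length=64, null=True)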

>
>
>>
>> >
>> > As Pulp downloads an Artifact, it calculates all the checksums and its
>> > size. It then performs validation based on information that was provided
>> > from the RemoteArtifact. After validation is performed, the Artifact is
>> > saved to the database and moved to its final place in
>> > /var/lib/content/artifacts/.
>>
>> This could still be achieved by storing the content just temporarily
>> in the Squid proxy, i.e. use Squid as the content source, not the disk.
>>
>> > Once this information is in the database, Pulp's web server can serve
>> > the
>> > content without having to involve the Streamer or Squid.
>>
>> Pulp might serve just the API and the metadata; the content might be
>> redirected to the Proxy all the time, correct?
>> Doesn't Crane do that btw?
>
>
> Theoretically we could do this, but in practice we would run into problems
> when we needed to scale out the Content app. Right now when the Content app
> needs to be scaled, a user can launch another machine that will run the
> Content app. Squid does not support that kind of scaling. Squid can only
> take advantage of additional cores in a single machine.

I don't think I understand; proxies are actually designed to scale[1]
and are used as tools to scale the web too.

This is all about the How question, but when it comes to my original
Why, please correct me if I'm wrong, the answer so far has been:
 Pulp always downloads the content because that's what it is supposed to do.

Cheers,
milan

[1] https://en.wikipedia.org/wiki/Reverse_proxy
[2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
[3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29

>
>>
>>
>> Cheers,
>> milan
>>
>> >
>> > -dennis
>> >
>> >
>> >
>> >
>> >
>> >>
>> >>
>> >> --
>> >> cheers
>> >> milan
>> >>
>> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse 
>> >> wrote:
>> >> >
>> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik 
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> Looking at the diagram[1] I'm wondering what's the reasoning behind
>> >> >> Pulp having to actually fetch the content locally?
>> >> >
>> >> >
>> >> > Is the question "why is Pulp doing the fetching and not Squid?" or
>> >> > "why
>> >> > is
>> >> > Pulp storing the content after fetching it?" or both?
>> >> >
> >> >> Couldn't Pulp just rely on the proxy with regards to the content
> >> >> streaming?

Re: [Pulp-dev] Lazy for Pulp3

2018-05-29 Thread Jeff Ortel

Looks good.

Made a few minor edits.

On 05/25/2018 02:11 PM, Brian Bouterse wrote:
A mini-team of core devs** met to talk through lazy use cases for 
Pulp3. It's effectively the same lazy from Pulp2 except:


* it's now built into core (not just RPM)
* It excludes repo protection use cases because we haven't added 
repo protection to Pulp3 yet
* It excludes the "background" policy which, based on feedback from 
stakeholders, provided very little value
* it will no longer depend on Twisted as a dependency. It will 
use asyncio instead.


While it is being built into core, it will require only minimal work by 
a plugin writer to add support for it. Details in the epic below.


The current use cases along with a technical plan are written on this 
epic: https://pulp.plan.io/issues/3693 


We're putting it out for comment, questions, and feedback before we 
start into the code. I hope we are able to add this into our next sprint.


** ipanova, jortel, ttereshc, dkliban, bmbouter

Thanks!
Brian



___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Lazy for Pulp3

2018-05-29 Thread Dennis Kliban
On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik  wrote:

> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban  wrote:
> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik 
> wrote:
> >>
> >> Good point!
> >> More the second; it might be a bit crazy to utilize Squid for that but
> >> first, let's answer the why ;)
> >> So why does Pulp need to store the content here?
> >> Why don't we point the users to the Squid all the time (for the lazy
> >> repos)?
> >
> >
> > Pulp's Streamer needs to fetch and store the content because that's
> Pulp's
> > primary responsibility.
>
> Maybe it's not so much the storing but rather the content view management?
> I mean the partitioning into repositories, promoting.
>
>
Exactly this. We want Pulp users to be able to reuse content that was
brought in using the 'on_demand' download policy in other repositories.


> > If some of the content lived in Squid and some lived
> > in Pulp, it would be difficult for the user to know what content is
> actually
> > available in Pulp and what content needs to be fetched from a remote
> > repository.
>
> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp,
> so not that difficult.
> Maybe Pulp could have a concept of Origin, where folks upload stuff to
> a Pulp repo, vs. Proxy for its repo storage policy?
>
>
Squid removes things from the cache at some point. You can probably
configure it to never remove anything from the cache, but then we would
need to implement orphan cleanup that would work across two systems: pulp
and squid.

Answering that question would still be difficult. Not all content that is
in the repository that was synced using on_demand download policy will be
in Squid - only the content that has been requested by clients. So it's
still hard to know which of the content units have been downloaded and
which have not been.



> >
> > As Pulp downloads an Artifact, it calculates all the checksums and its
> > size. It then performs validation based on information that was provided
> > from the RemoteArtifact. After validation is performed, the Artifact is
> > saved to the database and moved to its final place in
> > /var/lib/content/artifacts/.
>
> This could still be achieved by storing the content just temporarily
> in the Squid proxy, i.e. use Squid as the content source, not the disk.
>
> > Once this information is in the database, Pulp's web server can serve the
> > content without having to involve the Streamer or Squid.
>
> Pulp might serve just the API and the metadata; the content might be
> redirected to the Proxy all the time, correct?
> Doesn't Crane do that btw?
>

Theoretically we could do this, but in practice we would run into problems
when we needed to scale out the Content app. Right now when the Content app
needs to be scaled, a user can launch another machine that will run the
Content app. Squid does not support that kind of scaling. Squid can only
take advantage of additional cores in a single machine.


>
> Cheers,
> milan
>
> >
> > -dennis
> >
> >
> >
> >
> >
> >>
> >>
> >> --
> >> cheers
> >> milan
> >>
> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse 
> >> wrote:
> >> >
> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik 
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Looking at the diagram[1] I'm wondering what's the reasoning behind
> >> >> Pulp having to actually fetch the content locally?
> >> >
> >> >
> >> > Is the question "why is Pulp doing the fetching and not Squid?" or
> >> > "why is Pulp storing the content after fetching it?" or both?
> >> >
> >> >> Couldn't Pulp just rely on the proxy with regards to the content
> >> >> streaming?
> >> >>
> >> >> Thanks,
> >> >> milan
> >> >>
> >> >>
> >> >> [1] https://pulp.plan.io/attachments/130957
> >> >>
> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse  >
> >> >> wrote:
> >> >> > A mini-team of core devs** met to talk through lazy use cases for
> >> >> > Pulp3.
> >> >> > It's effectively the same lazy from Pulp2 except:
> >> >> >
> >> >> > * it's now built into core (not just RPM)
> >> >> > * It excludes repo protection use cases because we haven't added
> >> >> > repo protection to Pulp3 yet
> >> >> > * It excludes the "background" policy which, based on feedback from
> >> >> > stakeholders, provided very little value
> >> >> > * it will no longer depend on Twisted as a dependency. It will use
> >> >> > asyncio instead.
> >> >> >
> >> >> > While it is being built into core, it will require only minimal work
> >> >> > by a plugin writer to add support for it. Details in the epic below.
> >> >> >
> >> >> > The current use cases along with a technical plan are written on
> >> >> > this epic:
> >> >> > https://pulp.plan.io/issues/3693
> >> >> >
> >> >> > We're putting it out for comment, questions, and feedback before we
> >> >> > start
> >> >> > into the code. I hope we are able to add this into our next sprint.
> >> >> >
> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
> >> 

Re: [Pulp-dev] Lazy for Pulp3

2018-05-29 Thread Dennis Kliban
On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik  wrote:

> Good point!
> More the second; it might be a bit crazy to utilize Squid for that but
> first, let's answer the why ;)
> So why does Pulp need to store the content here?
> Why don't we point the users to the Squid all the time (for the lazy
> repos)?
>

Pulp's Streamer needs to fetch and store the content because that's Pulp's
primary responsibility. If some of the content lived in Squid and some
lived in Pulp, it would be difficult for the user to know what content is
actually available in Pulp and what content needs to be fetched from a
remote repository.

As Pulp downloads an Artifact, it calculates all the checksums and its
size. It then performs validation based on information that was provided
from the RemoteArtifact. After validation is performed, the Artifact is
saved to the database and moved to its final place in /var/lib/content/artifacts/.
Once this information is in the database, Pulp's web server can serve the
content without having to involve the Streamer or Squid.
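
A minimal sketch of that download-and-validate step (illustrative only,
using just the Python standard library; the real downloaders differ):

    import hashlib
    import urllib.request

    def fetch_and_validate(url, expected_size, expected_sha256, dest_path):
        hasher = hashlib.sha256()
        size = 0
        with urllib.request.urlopen(url) as response:
            with open(dest_path, "wb") as dest:
                while True:
                    chunk = response.read(8192)
                    if not chunk:
                        break
                    hasher.update(chunk)
                    size += len(chunk)
                    dest.write(chunk)
        # Validate against what the RemoteArtifact promised before saving
        # the Artifact row and moving the file to its final place.
        if size != expected_size or hasher.hexdigest() != expected_sha256:
            raise ValueError("downloaded bits failed validation")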

-dennis






>
> --
> cheers
> milan
>
> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse 
> wrote:
> >
> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik 
> wrote:
> >>
> >> Hi,
> >>
> >> Looking at the diagram[1] I'm wondering what's the reasoning behind
> >> Pulp having to actually fetch the content locally?
> >
> >
> > Is the question "why is Pulp doing the fetching and not Squid?" or "why
> > is Pulp storing the content after fetching it?" or both?
> >
> >> Couldn't Pulp just rely on the proxy with regards to the content
> >> streaming?
> >>
> >> Thanks,
> >> milan
> >>
> >>
> >> [1] https://pulp.plan.io/attachments/130957
> >>
> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse 
> >> wrote:
> >> > A mini-team of core devs** met to talk through lazy use cases for
> >> > Pulp3. It's effectively the same lazy from Pulp2 except:
> >> >
> >> > * it's now built into core (not just RPM)
> >> > * It excludes repo protection use cases because we haven't added
> >> > repo protection to Pulp3 yet
> >> > * It excludes the "background" policy which, based on feedback from
> >> > stakeholders, provided very little value
> >> > * it will no longer depend on Twisted as a dependency. It will use
> >> > asyncio instead.
> >> >
> >> > While it is being built into core, it will require only minimal work
> >> > by a plugin writer to add support for it. Details in the epic below.
> >> >
> >> > The current use cases along with a technical plan are written on this
> >> > epic:
> >> > https://pulp.plan.io/issues/3693
> >> >
> >> > We're putting it out for comment, questions, and feedback before we
> >> > start
> >> > into the code. I hope we are able to add this into our next sprint.
> >> >
> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
> >> >
> >> > Thanks!
> >> > Brian
> >> >
> >> >
> >> > ___
> >> > Pulp-dev mailing list
> >> > Pulp-dev@redhat.com
> >> > https://www.redhat.com/mailman/listinfo/pulp-dev
> >> >
> >
> >
>
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Lazy for Pulp3

2018-05-29 Thread Milan Kovacik
Good point!
More the second; it might be a bit crazy to utilize Squid for that but
first, let's answer the why ;)
So why does Pulp need to store the content here?
Why don't we point the users to the Squid all the time (for the lazy repos)?

--
cheers
milan

On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse  wrote:
>
> On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik  wrote:
>>
>> Hi,
>>
>> Looking at the diagram[1] I'm wondering what's the reasoning behind
>> Pulp having to actually fetch the content locally?
>
>
> Is the question "why is Pulp doing the fetching and not Squid?" or "why is
> Pulp storing the content after fetching it?" or both?
>
>> Couldn't Pulp just rely on the proxy with regards to the content
>> streaming?
>>
>> Thanks,
>> milan
>>
>>
>> [1] https://pulp.plan.io/attachments/130957
>>
>> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse 
>> wrote:
>> > A mini-team of core devs** met to talk through lazy use cases for Pulp3.
>> > It's effectively the same lazy from Pulp2 except:
>> >
>> > * it's now built into core (not just RPM)
>> > * It excludes repo protection use cases because we haven't added repo
>> > protection to Pulp3 yet
>> > * It excludes the "background" policy which, based on feedback from
>> > stakeholders, provided very little value
>> > * it will no longer depend on Twisted as a dependency. It will use
>> > asyncio instead.
>> >
>> > While it is being built into core, it will require only minimal work by
>> > a plugin writer to add support for it. Details in the epic below.
>> >
>> > The current use cases along with a technical plan are written on this
>> > epic:
>> > https://pulp.plan.io/issues/3693
>> >
>> > We're putting it out for comment, questions, and feedback before we
>> > start
>> > into the code. I hope we are able to add this into our next sprint.
>> >
>> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
>> >
>> > Thanks!
>> > Brian
>> >
>> >
>> > ___
>> > Pulp-dev mailing list
>> > Pulp-dev@redhat.com
>> > https://www.redhat.com/mailman/listinfo/pulp-dev
>> >
>
>

___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Lazy for Pulp3

2018-05-29 Thread Brian Bouterse
On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik  wrote:

> Hi,
>
> Looking at the diagram[1] I'm wondering what's the reasoning behind
> Pulp having to actually fetch the content locally?
>

Is the question "why is Pulp doing the fetching and not Squid?" or "why is
Pulp storing the content after fetching it?" or both?

> Couldn't Pulp just rely on the proxy with regards to the content streaming?
>
> Thanks,
> milan
>
>
> [1] https://pulp.plan.io/attachments/130957
>
> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse 
> wrote:
> > A mini-team of core devs** met to talk through lazy use cases for Pulp3.
> > It's effectively the same lazy from Pulp2 except:
> >
> > * it's now built into core (not just RPM)
> > * It excludes repo protection use cases because we haven't added repo
> > protection to Pulp3 yet
> > * It excludes the "background" policy which, based on feedback from
> > stakeholders, provided very little value
> > * it will no longer depend on Twisted as a dependency. It will use
> > asyncio instead.
> >
> > While it is being built into core, it will require only minimal work by a
> > plugin writer to add support for it. Details in the epic below.
> >
> > The current use cases along with a technical plan are written on this
> > epic: https://pulp.plan.io/issues/3693
> >
> > We're putting it out for comment, questions, and feedback before we start
> > into the code. I hope we are able to add this into our next sprint.
> >
> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
> >
> > Thanks!
> > Brian
> >
> >
> > ___
> > Pulp-dev mailing list
> > Pulp-dev@redhat.com
> > https://www.redhat.com/mailman/listinfo/pulp-dev
> >
>
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Lazy for Pulp3

2018-05-25 Thread Brian Bouterse
A mini-team of core devs** met to talk through lazy use cases for Pulp3.
It's effectively the same lazy from Pulp2 except:

* it's now built into core (not just RPM)
* It excludes repo protection use cases because we haven't added repo
protection to Pulp3 yet
* It excludes the "background" policy which, based on feedback from
stakeholders, provided very little value
* it will no longer depend on Twisted as a dependency. It will use
asyncio instead.
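
For a feel of the asyncio flavor, here is a minimal sketch of a streamer
download (assuming the aiohttp client library; illustrative, not the
planned implementation):

    import asyncio
    import aiohttp

    async def stream(url, dest_path):
        # Stream the response body to disk chunk by chunk.
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                with open(dest_path, "wb") as dest:
                    async for chunk in response.content.iter_chunked(8192):
                        dest.write(chunk)

    async def stream_many(urls_and_paths):
        # Concurrency comes from the event loop rather than Twisted
        # reactors, e.g. asyncio.run(stream_many([...])) on Python 3.7+.
        await asyncio.gather(*(stream(u, p) for u, p in urls_and_paths))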

While it is being built into core, it will require only minimal work by a
plugin writer to add support for it. Details in the epic below.

The current use cases along with a technical plan are written on this epic:
https://pulp.plan.io/issues/3693

We're putting it out for comment, questions, and feedback before we start
into the code. I hope we are able to add this into our next sprint.

** ipanova, jortel, ttereshc, dkliban, bmbouter

Thanks!
Brian
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Lazy for Pulp3

2018-05-17 Thread Tatiana Tereshchenko
Thanks Justin!

+1 to considering this use case; I can confirm that many users have asked for it.

Tanya

On Wed, May 16, 2018 at 8:52 PM, Justin Sherrill 
wrote:

>
>
> On 05/16/2018 01:02 PM, Brian Bouterse wrote:
>
> A mini-team of @jortel, @ttereshc, @ipanova, @dkliban, and @bmbouter met
> today to discuss Lazy use-cases for Pulp3. The "initial" use cases would
> all be delivered together as the first implementation to be used with Pulp
> 3.0. The ones for "later" are blocked on other gaps in core's functionality
> (specifically content protection) which should come with 3.1+.
>
> Feedback on these use cases is welcome. We are meeting again on this
> upcoming Monday, after which, we will writeup the work into Redmine. We'll
> email this thread with links to the Redmine plan when it's available for
> comment.
>
> Initial use cases are:
>
>- pull-through caching of packages (squid)
>
>
>- parallel streaming of bits to multiple clients (squid)
>
>
>- Pulp redirects to squid when content is not already downloaded (pulp)
>
>
>- streaming data and headers (streamer)
>
>
>- After streamer downloads the content, the new Artifact is created
>and associated with the correct ContentArtifact (downloader)
>
>
>- to use a configured downloader, configured by the correct remote.
>This would correctly configure authentication, proxy, mirrorlists, etc.
>when fetching content (streamer)
>
>
> Use cases to be implemented later. Currently blocked because Pulp itself
> doesn't yet verify client entitlement for content.
>
>- authentication of the client to verify they are entitled to the
>content
>
>
>
> Could I suggest considering:
>
> * The ability to delete all downloaded content in a repository (basically
> null out the content).  I've been using this rhel7 repo for years, and
> likely all the old content is not needed anymore.
>
> I've seen this requested from time to time over the past couple years.
>
>
>
>
>
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
>
> ___
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


Re: [Pulp-dev] Lazy for Pulp3

2018-05-16 Thread Justin Sherrill



On 05/16/2018 01:02 PM, Brian Bouterse wrote:
A mini-team of @jortel, @ttereshc, @ipanova, @dkliban, and @bmbouter 
met today to discuss Lazy use-cases for Pulp3. The "initial" use cases 
would all be delivered together as the first implementation to be used 
with Pulp 3.0. The ones for "later" are blocked on other gaps in 
core's functionality (specifically content protection) which should 
come with 3.1+.


Feedback on these use cases is welcome. We are meeting again on this 
upcoming Monday, after which we will write up the work into Redmine. 
We'll email this thread with links to the Redmine plan when it's 
available for comment.


Initial use cases are:

  * pull-through caching of packages (squid)

  * parallel streaming of bits to multiple clients (squid)

  * Pulp redirects to squid when content is not already downloaded (pulp)

  * streaming data and headers (streamer)

  * After streamer downloads the content, the new Artifact is created
and associated with the correct ContentArtifact (downloader)

  * to use a configured downloader, configured by the correct remote.
This would correctly configure authentication, proxy, mirrorlists,
etc. when fetching content (streamer)


Use cases to be implemented later. Currently blocked because Pulp 
itself doesn't yet verify client entitlement for content.


  * authentication of the client to verify they are entitled to the
content




Could I suggest considering:

* The ability to delete all downloaded content in a repository 
(basically null out the content).  I've been using this rhel7 repo for 
years, and likely all the old content is not needed anymore.


I've seen this requested from time to time over the past couple years.
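
To make the idea concrete, "nulling out" could mean deleting the stored
bits while keeping the records, so units could be fetched lazily again if
ever requested. A rough Django-style sketch (made-up model names, not the
real Pulp API):

    def null_out_content(artifacts):
        # 'artifacts' is an iterable of rows with a Django FileField.
        for artifact in artifacts:
            if artifact.file:
                # FieldFile.delete() removes the file from storage and
                # persists the now-empty field on the model row.
                artifact.file.delete(save=True)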






___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev


[Pulp-dev] Lazy for Pulp3

2018-05-16 Thread Brian Bouterse
A mini-team of @jortel, @ttereshc, @ipanova, @dkliban, and @bmbouter met
today to discuss Lazy use-cases for Pulp3. The "initial" use cases would
all be delivered together as the first implementation to be used with Pulp
3.0. The ones for "later" are blocked on other gaps in core's functionality
(specifically content protection) which should come with 3.1+.

Feedback on these use cases is welcome. We are meeting again on this
upcoming Monday, after which we will write up the work into Redmine. We'll
email this thread with links to the Redmine plan when it's available for
comment.

Initial use cases are:

   - pull-through caching of packages (squid)


   - parallel streaming of bits to multiple clients (squid)


   - Pulp redirects to squid when content is not already downloaded, as
   sketched after this list (pulp)


   - streaming data and headers (streamer)


   - After streamer downloads the content, the new Artifact is created and
   associated with the correct ContentArtifact (downloader)


   - to use a configured downloader, configured by the correct remote. This
   would correctly configure authentication, proxy, mirrorlists, etc. when
   fetching content (streamer)
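
A hand-wavy sketch of the "Pulp redirects to squid" decision above (made-up
names in a Django-style handler; not the actual implementation):

    from django.http import FileResponse, HttpResponseRedirect

    STREAMER_BASE = "http://squid.internal:3128"  # assumed squid endpoint

    def serve(request, artifact):
        if artifact.downloaded:
            # Bits already live in /var/lib/content/artifacts/, so serve
            # them directly from disk.
            return FileResponse(open(artifact.file.path, "rb"))
        # Otherwise redirect the client to squid; on a cache miss the
        # streamer fetches the bits and they flow back through squid.
        return HttpResponseRedirect(STREAMER_BASE + request.path)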


Use cases to be implemented later. Currently blocked because Pulp itself
doesn't yet verify client entitlement for content.

   - authentication of the client to verify they are entitled to the content
___
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev