On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <[email protected]> wrote:
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <[email protected]> wrote:
>>
>> I think there is a use case for "proxy only" like is being described here.
>> Several years ago there was a project called thumbslug[1] that was used in a
>> version of katello instead of pulp. Its job was to check entitlements and
>> then proxy content from a CDN. The same functionality could be implemented
>> in pulp. (Perhaps it's even as simple as telling squid not to cache anything
>> so the content would never make it from cache to pulp in current pulp-2.)
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>
> I understand describing this as an "only-squid" version, but for clarity, the
> streamer would still be required because it is what requests the bits with
> the correctly configured downloader (certs, proxy, etc). The streamer
> streams the bits into squid, which provides caching and client multiplexing.

I have to admit I'm only now reading
https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
again because of the SSL termination. So the new plan is to use the
streamer to terminate the SSL instead of the Apache reverse proxy?
With regard to the construction of the URL of an artifact, I thought it
would be stored in the DB, so the Remote would create it during the sync.

>
> To confirm my understanding, this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?

yup

>
>>
>> [1] https://github.com/candlepin/thumbslug
>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <[email protected]> wrote:
>>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <[email protected]> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <[email protected]> wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <[email protected]> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <[email protected]> wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that
>>> >> >> but first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to the Squid all the time (for the
>>> >> >> lazy repos)?
>>> >> >
>>> >> > Pulp's Streamer needs to fetch and store the content because that's
>>> >> > Pulp's primary responsibility.
>>> >>
>>> >> Maybe not that much the storing but rather the content views
>>> >> management?
>>> >> I mean the partitioning into repositories, promoting.
>>> >>
>>> >
>>> > Exactly this. We want Pulp users to be able to reuse content that was
>>> > brought in using the 'on_demand' download policy in other repositories.
>>>
>>> I see.
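
To make the policy question above concrete, here is a minimal sketch of how the three behaviours could be told apart. The enum, its member names and both helper functions are hypothetical and are not Pulp's actual API; the sketch also assumes that step 14 in the diagram is the step where Pulp stores the streamed content.

```python
from enum import Enum


class DownloadPolicy(Enum):
    """Hypothetical policy names borrowed from the thread, not Pulp's API."""

    IMMEDIATE = "immediate"            # content is fetched and stored at sync time
    ON_DEMAND = "on-demand"            # fetched on first client request, then stored
    STREAM_NO_SAVE = "stream-no-save"  # fetched on request, never stored in Pulp


def downloads_at_sync(policy: DownloadPolicy) -> bool:
    """Only 'immediate' brings the content in during the sync itself."""
    return policy is DownloadPolicy.IMMEDIATE


def saves_after_streaming(policy: DownloadPolicy) -> bool:
    """Assumed meaning of step 14: Pulp stores what was streamed to a client.

    'immediate' already has the content, and 'stream-no-save' skips the step.
    """
    return policy is DownloadPolicy.ON_DEMAND
```

Under these assumptions the proposed policy differs from on-demand in exactly one answer: saves_after_streaming() is False for it.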
>>>
>>> >> > If some of the content lived in Squid and some lived
>>> >> > in Pulp, it would be difficult for the user to know what content is
>>> >> > actually available in Pulp and what content needs to be fetched from
>>> >> > a remote repository.
>>> >>
>>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
>>> >> so not that difficult.
>>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>>> >>
>>> >
>>> > Squid removes things from the cache at some point. You can probably
>>> > configure it to never remove anything from the cache, but then we would
>>> > need to implement orphan cleanup that would work across two systems:
>>> > pulp and squid.
>>>
>>> Actually "remote" units wouldn't need orphan cleaning from the disk,
>>> just dropping them from the DB would suffice.
>>>
>>> >
>>> > Answering that question would still be difficult. Not all content that
>>> > is in the repository that was synced using on_demand download policy
>>> > will be in Squid - only the content that has been requested by clients.
>>> > So it's still hard to know which of the content units have been
>>> > downloaded and which have not been.
>>>
>>> But the beauty is exactly in that: we don't have to track whether the
>>> content is downloaded if it is reverse-proxied[1][2].
>>> Moreover, this would work both with and without a proxy between Pulp
>>> and the Origin of the remote unit.
>>> A "remote" content artifact might just need to carry its URL in a DB
>>> column for this to work; so the async artifact model, instead of the
>>> "policy=on-demand", would have a mandatory remote "URL" attribute; I
>>> wouldn't say it's more complex than tracking the "policy" attribute.
>>>
>>> >
>>> >> >
>>> >> > As Pulp downloads an Artifact, it calculates all the checksums and
>>> >> > its size.
>>> >> > It then performs validation based on information that was provided
>>> >> > from the RemoteArtifact. After validation is performed, the Artifact
>>> >> > is saved to the database and to its final place in
>>> >> > /var/lib/content/artifacts/.
>>> >>
>>> >> This could still be achieved by storing the content just temporarily
>>> >> in the Squid proxy, i.e. use Squid as the content source, not the disk.
>>> >>
>>> >> > Once this information is in the database, Pulp's web server can
>>> >> > serve the content without having to involve the Streamer or Squid.
>>> >>
>>> >> Pulp might serve just the API and the metadata, the content might be
>>> >> redirected to the Proxy all the time, correct?
>>> >> Doesn't Crane do that btw?
>>> >
>>> > Theoretically we could do this, but in practice we would run into
>>> > problems when we needed to scale out the Content app. Right now when
>>> > the Content app needs to be scaled, a user can launch another machine
>>> > that will run the Content app. Squid does not support that kind of
>>> > scaling. Squid can only take advantage of additional cores in a single
>>> > machine.
>>>
>>> I don't think I understand; proxies are actually designed to scale[1]
>>> and are used as tools to scale the web too.
>>>
>>> This is all about the How question, but when it comes to my original
>>> Why, please correct me if I'm wrong, the answer so far has been:
>>> Pulp always downloads the content because that's what it is supposed to
>>> do.
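
The download-time validation described above (checksums and size calculated while the Artifact is downloaded, then compared with what the RemoteArtifact provides) could look roughly like the sketch below. It uses only the Python standard library. The function names, the ValidationError class and the expected_size / expected_digests parameters are assumptions made for illustration and are not Pulp's actual code.

```python
import hashlib
from typing import BinaryIO, Dict, Iterable, Optional, Tuple


class ValidationError(Exception):
    """Raised when downloaded bits do not match the expected size or digest."""


def digest_stream(
    stream: BinaryIO, algorithms: Iterable[str], chunk_size: int = 65536
) -> Tuple[int, Dict[str, str]]:
    """Read the stream once, computing its size and every requested digest."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        size += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)
    return size, {name: hasher.hexdigest() for name, hasher in hashers.items()}


def validate(
    size: int,
    digests: Dict[str, str],
    expected_size: Optional[int] = None,
    expected_digests: Optional[Dict[str, str]] = None,
) -> None:
    """Compare computed values with the (hypothetical) remote metadata."""
    if expected_size is not None and size != expected_size:
        raise ValidationError(f"size mismatch: got {size}, expected {expected_size}")
    for name, expected in (expected_digests or {}).items():
        # A digest that was expected but never computed also counts as a mismatch.
        if digests.get(name) != expected:
            raise ValidationError(f"{name} digest mismatch")
```

Only after validate() returns without raising would the artifact be recorded and moved to its final location. In a proxy-only setup nothing is stored, so this step would have no place to run.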
>>>
>>> Cheers,
>>> milan
>>>
>>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>>
>>> >>
>>> >> Cheers,
>>> >> milan
>>> >>
>>> >> >
>>> >> > -dennis
>>> >> >
>>> >> >>
>>> >> >> --
>>> >> >> cheers
>>> >> >> milan
>>> >> >>
>>> >> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <[email protected]> wrote:
>>> >> >> >
>>> >> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <[email protected]> wrote:
>>> >> >> >>
>>> >> >> >> Hi,
>>> >> >> >>
>>> >> >> >> Looking at the diagram[1] I'm wondering what's the reasoning
>>> >> >> >> behind Pulp having to actually fetch the content locally?
>>> >> >> >
>>> >> >> > Is the question "why is Pulp doing the fetching and not Squid?"
>>> >> >> > or "why is Pulp storing the content after fetching it?" or both?
>>> >> >> >
>>> >> >> >> Couldn't Pulp just rely on the proxy with regards to the content
>>> >> >> >> streaming?
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> milan
>>> >> >> >>
>>> >> >> >> [1] https://pulp.plan.io/attachments/130957
>>> >> >> >>
>>> >> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse <[email protected]> wrote:
>>> >> >> >> > A mini-team of core devs** met to talk through lazy use cases
>>> >> >> >> > for Pulp3.
>>> >> >> >> > It's effectively the same lazy from Pulp2 except:
>>> >> >> >> >
>>> >> >> >> > * It's now built into core (not just RPM)
>>> >> >> >> > * It excludes repo protection use cases because we haven't
>>> >> >> >> >   added repo protection to Pulp3 yet
>>> >> >> >> > * It excludes the "background" policy which, based on feedback
>>> >> >> >> >   from stakeholders, provided very little value
>>> >> >> >> > * It will no longer depend on Twisted. It will use asyncio
>>> >> >> >> >   instead.
>>> >> >> >> >
>>> >> >> >> > While it is being built into core, it will require minimal
>>> >> >> >> > support by a plugin writer to add support for it. Details in
>>> >> >> >> > the epic below.
>>> >> >> >> >
>>> >> >> >> > The current use cases along with a technical plan are written
>>> >> >> >> > on this epic:
>>> >> >> >> > https://pulp.plan.io/issues/3693
>>> >> >> >> >
>>> >> >> >> > We're putting it out for comment, questions, and feedback
>>> >> >> >> > before we start into the code. I hope we are able to add this
>>> >> >> >> > into our next sprint.
>>> >> >> >> >
>>> >> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>> >> >> >> >
>>> >> >> >> > Thanks!
>>> >> >> >> > Brian
>>> >> >> >> >
>>> >> >> >> > _______________________________________________
>>> >> >> >> > Pulp-dev mailing list
>>> >> >> >> > [email protected]
>>> >> >> >> > https://www.redhat.com/mailman/listinfo/pulp-dev
