Re: New Job Cacher plugin to cache dependencies of builds on docker based executors
Ok. Thanks.

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/97781456-775b-4141-ae09-4b9f710c76b9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: New Job Cacher plugin to cache dependencies of builds on docker based executors
Probably you are looking for the External Workspace Manager plugin. Last I checked, it had not been extended to really support clouds. I suggest you raise this as an RFE with the PSE team, rather than discussing it here—unless you intend to try developing such an extension yourself, in which case you would likely want to hang out in https://gitter.im/jenkinsci/external-workspace-manager-plugin and ask for advice.
Re: New Job Cacher plugin to cache dependencies of builds on docker based executors
Continuing to think on this a bit more: the FilePath abstraction doesn't look like it would work, as it assumes a computer on the other end.

What if there were an "External Storage" extension point that could be backed by S3 and leveraged by other plugins for managing large files associated with jobs? Ideally it would share the job lifecycle, so that when jobs are renamed or deleted, the related external storage area for those jobs would be managed as well. Is there an extension point for something like that?

On Wednesday, November 30, 2016 at 3:36:43 PM UTC-5, Peter Hayes wrote:
> [quoted text trimmed]
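No such extension point exists in core; as a thought experiment, the shape Peter is describing might look something like the sketch below. Every name here (`JobStorage`, `InMemoryJobStorage`, the lifecycle methods) is invented for illustration — a real version would be a Jenkins `ExtensionPoint` with an `ItemListener` driving the rename/delete callbacks, and an S3 implementation instead of the in-memory stand-in used here so the sketch is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed "external storage" extension point.
 * None of these types exist in Jenkins; the in-memory map stands in for a
 * pluggable backend such as S3.
 */
interface JobStorage {
    void put(String jobName, String path, byte[] data);
    byte[] get(String jobName, String path);
    // Lifecycle hooks so the storage area follows the job around:
    void onJobRenamed(String oldName, String newName);
    void onJobDeleted(String jobName);
}

class InMemoryJobStorage implements JobStorage {
    private final Map<String, Map<String, byte[]>> store = new HashMap<>();

    public void put(String jobName, String path, byte[] data) {
        store.computeIfAbsent(jobName, k -> new HashMap<>()).put(path, data);
    }

    public byte[] get(String jobName, String path) {
        Map<String, byte[]> files = store.get(jobName);
        return files == null ? null : files.get(path);
    }

    public void onJobRenamed(String oldName, String newName) {
        // Move the whole storage area when the job is renamed.
        Map<String, byte[]> files = store.remove(oldName);
        if (files != null) {
            store.put(newName, files);
        }
    }

    public void onJobDeleted(String jobName) {
        store.remove(jobName);
    }
}
```

The point of the lifecycle methods is the part Peter raises: without them, renaming or deleting a job would orphan its cache in the external store.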
Re: New Job Cacher plugin to cache dependencies of builds on docker based executors
Thanks for the insight. I do see that this will cause a burden on the master node. Since we are using CJP-PSE, that is mitigated somewhat, as we will be running quite a few masters, so the ratio of jobs to masters won't be terribly high.

Reusing workspaces isn't an option for us due to the architecture of CJP-PSE at the moment. I actually did start using an externally mounted volume, but as you note, we will run into concurrency issues with shared caches on the host instance, and there is no reliable way to separate the caches while still getting the benefit of them, as there is no distinct executor number (always 1). If there were some enhancement to CJP to transparently manage workspaces across executors (and support parallel build execution), then we could look at that. I did raise this with the PSE team a while back in any event, and I imagine it will need to be addressed, as it is a step back in performance from classic persistent Jenkins executors.

The other thought that crossed my mind, since we are running in AWS, is to leverage a more scalable file store within AWS like S3. Both artifact archiving and dependency caching could be good candidates. It would be cool if there were an S3 backing of the FilePath abstraction, so plugin developers could seamlessly access it via Project.getStoragePath() or something like that. Then a plugin like I am proposing could provide a more scalable solution without hardwiring to S3. I'm guessing I'm not the first to think of it, so there are likely challenges in doing so.

On Wednesday, November 30, 2016 at 2:04:03 PM UTC-5, Jesse Glick wrote:
> [quoted text trimmed]
Re: New Job Cacher plugin to cache dependencies of builds on docker based executors
On Wed, Nov 30, 2016 at 10:18 AM, Peter Hayes wrote:
> each time you run a job, you start with a fresh container without any previously cached dependencies (we use gradle generally). This increases the length of the build and adds network traffic to our Artifactory instance. I looked around for existing plugins but didn't find any so I have started a plugin[1] based on SimpleBuildWrapper that stores a configured set of files on the master at the end of the build and then on the next build downloads them to master in the original location.

This seems like a poor approach; rather than overloading Artifactory, you will be overloading the Jenkins master. Archiving artifacts via the Remoting channel can already wreck performance; you are talking about potentially orders of magnitude more traffic than that.

There are two basic approaches to this kind of problem. One, which assumes that the agents reuse workspaces between builds, is to set the local repository/cache location to a workspace location. The `docker-workflow` demo does this:

https://github.com/jenkinsci/docker-workflow-plugin/blob/46432bbe36af17dac93cfedcc93ffa51beba1343/demo/repo/flow.groovy#L20-L22

The other approach is to mount a volume containing the cache, letting the Docker daemon handle the storage, which the `parallel-test-executor` demo does:

https://github.com/jenkinsci/parallel-test-executor-plugin/blob/3961df3784045df1f6f285bc2b685ead4bc8593b/demo/Makefile#L3-L27

The volume-based approach is probably the more scalable, though there are two points to beware: at least Maven’s `install:install` will dump locally built artifacts into the repository alongside downloaded releases (probably Gradle does something similar); and Maven’s Aether repository manager is by default not thread-safe (Takari fixes this). Maven 5 may allow the cache to be properly separated (again I am not sure how Gradle fares here); in the meantime you may need to ensure that there is a distinct volume for every potentially concurrent build, for example keyed by `${JOB_NAME}/${EXECUTOR_NUMBER}`.

At any rate the exact solution chosen is going to depend on details of how agents are provisioned and workspaces managed, so at root this might simply be an RFE for CJP-PSE.

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/CANfRfr3pkMzJ9MzMFhUPNRkePJCM3EeyDd3C1KrgAtXvnnZnWg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
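Jesse's suggestion of a distinct volume per potentially concurrent build amounts to deriving a stable volume name from `${JOB_NAME}/${EXECUTOR_NUMBER}`. A minimal illustration of that keying follows; the class and method names are invented, and the sanitization rule reflects Docker's constraint that volume names match `[a-zA-Z0-9][a-zA-Z0-9_.-]*`, so slashes and spaces in the job name must be mapped to safe characters.

```java
/**
 * Illustrative only: derive a Docker volume name for a per-build dependency
 * cache, keyed by job name and executor number as suggested above, so that
 * two concurrent builds never share a cache volume.
 */
class CacheVolumes {
    static String volumeNameFor(String jobName, int executorNumber) {
        // Replace anything outside Docker's allowed volume-name characters.
        String safe = jobName.replaceAll("[^a-zA-Z0-9_.-]", "_");
        return "cache_" + safe + "_" + executorNumber;
    }
}
```

A build step could then pass `-v $(volumeName):/home/user/.gradle` (or the Maven equivalent) when launching the container, letting the Docker daemon own the storage.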
New Job Cacher plugin to cache dependencies of builds on docker based executors
Hi,

We are using CloudBees Private SaaS Edition, which utilizes docker containers as executors. A side effect of this is that each time you run a job, you start with a fresh container without any previously cached dependencies (we use gradle generally). This increases the length of the build and adds network traffic to our Artifactory instance.

I looked around for existing plugins but didn't find any, so I have started a plugin[1] based on SimpleBuildWrapper that stores a configured set of files on the master at the end of the build and then, on the next build, restores them from the master to the original location. I still have more work remaining, but prior to investing more time I wanted to check with this group to see if it makes sense to complete this or if there is a better option. I also had seen a post[2] on the users list a few months ago looking for a similar capability that didn't come up with anything.

Thanks,
Pete

[1] https://github.com/petehayes/jobcacher-plugin
[2] https://groups.google.com/forum/#!topic/jenkinsci-users/n0A1qBLe2Is

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/381ee609-3568-4b4d-9930-978ec2378c7f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
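The save-then-restore behaviour described here can be modelled without any Jenkins APIs. The standalone sketch below is a simplified illustration: in the actual plugin this logic would run from SimpleBuildWrapper's setUp and Disposer callbacks, with the copies going over the Remoting channel via FilePath rather than plain NIO, and a real cache would handle whole directory trees rather than individual files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Standalone model of the job-cacher idea: after a build, copy configured
 * files from the workspace into a per-job cache directory (standing in for
 * storage on the master); before the next build, copy them back into a
 * fresh workspace.
 */
class JobCache {
    private final Path cacheRoot;

    JobCache(Path cacheRoot) {
        this.cacheRoot = cacheRoot;
    }

    /** Called at the end of a build: store the configured files. */
    void save(String jobName, Path workspace, String... relativePaths) throws IOException {
        for (String rel : relativePaths) {
            Path src = workspace.resolve(rel);
            Path dst = cacheRoot.resolve(jobName).resolve(rel);
            Files.createDirectories(dst.getParent());
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Called at the start of the next build: restore into a fresh workspace. */
    void restore(String jobName, Path workspace, String... relativePaths) throws IOException {
        for (String rel : relativePaths) {
            Path src = cacheRoot.resolve(jobName).resolve(rel);
            if (!Files.exists(src)) {
                continue; // first build: nothing cached yet
            }
            Path dst = workspace.resolve(rel);
            Files.createDirectories(dst.getParent());
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

As Jesse notes upthread, the cost of this design is that every cached byte transits the master twice per build, which is the main argument for the workspace-reuse or volume-based alternatives.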