[JIRA] (JENKINS-57795) Orphaned EC2 instances after Jenkins restart
Pierson Yieh edited a comment on JENKINS-57795 Re: Orphaned EC2 instances after Jenkins restart We've identified the cause of our issue. The orphan re-attachment logic is tied to the EC2Cloud's provision method, but the issue occurs when the actual number of existing AWS nodes has hit an instance cap (i.e. no more nodes can be provisioned). Because we've hit the instance cap, provisioning isn't even attempted and the orphan re-attachment logic is never triggered. We've submitted a PR that addresses this issue here: https://github.com/jenkinsci/ec2-plugin/pull/448 This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
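The fix described above can be sketched as pure logic: run orphan re-attachment before the instance-cap check rather than only inside the provisioning path, so hitting the cap no longer skips it. All names here (`simulate`, `INSTANCE_CAP`, the instance IDs) are hypothetical stand-ins, not the actual ec2-plugin API.

```java
import java.util.ArrayList;
import java.util.List;

public class ProvisionSketch {
    // Hypothetical cap; stands in for the plugin's configured instance cap.
    static final int INSTANCE_CAP = 2;

    /** Returns {attachedCount, orphansLeft} after one provisioning pass. */
    static int[] simulate() {
        List<String> attached = new ArrayList<>();
        List<String> orphans = new ArrayList<>(List.of("i-0abc", "i-0def"));

        // The fix: re-attach orphans BEFORE the instance-cap check, so the
        // logic still runs when the cap prevents provisioning new nodes.
        attached.addAll(orphans);
        orphans.clear();

        if (attached.size() < INSTANCE_CAP) {
            attached.add("i-new"); // provision only when under the cap
        }
        return new int[] { attached.size(), orphans.size() };
    }

    public static void main(String[] args) {
        int[] r = simulate();
        // Orphans are re-attached even though the cap blocks provisioning.
        System.out.println(r[0] + " attached, " + r[1] + " orphans left");
    }
}
```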
[JIRA] (JENKINS-57795) Orphaned EC2 instances after Jenkins restart
Pierson Yieh commented on JENKINS-57795 Re: Orphaned EC2 instances after Jenkins restart Jakub Bochenski: Was the problem that was solved the issue of orphan nodes not getting reconnected, or of agents dying during launch? Our issue is orphan nodes not getting re-attached to their respective Jenkins Masters.
[JIRA] (JENKINS-57795) Orphaned EC2 instances after Jenkins restart
Pierson Yieh edited a comment on JENKINS-57795 Re: Orphaned EC2 instances after Jenkins restart We've also seen this behavior before, though we're not sure how to reproduce the problem. We saw it when we'd hit our max AWS request limit and Jenkins started losing track of nodes and couldn't spin up new ones, because the orphaned nodes were still being counted towards the max instance count but weren't showing up in the Jenkins UI. I'm able to "simulate" the "losing track of nodes" by running a Groovy script on the Jenkins Master to manually remove the node from the Jenkins object. And we're looking into implementing a feature to automatically re-attach these orphaned nodes to Jenkins. Update: it seems SlaveTemplate.checkInstance() finds our orphan nodes and we were able to re-attach them to the Jenkins Master. Not sure why in the past they weren't getting re-attached.
[JIRA] (JENKINS-61314) EC2 Plugin: Windows java.io.IOException: Pipe Closed
Pierson Yieh created an issue Jenkins / JENKINS-61314 EC2 Plugin: Windows java.io.IOException: Pipe Closed Issue Type: Bug Assignee: FABRIZIO MANFREDI Attachments: Screen Shot 2020-03-03 at 1.14.50 PM.png, Screen Shot 2020-03-03 at 1.15.48 PM.png Components: ec2-plugin Created: 2020-03-03 21:17 Environment: Jenkins 2.204.2, Amazon EC2 plugin 1.49.1 Labels: Windows Jenkins ec2-plugin Priority: Minor Reporter: Pierson Yieh We are seeing two occurrences of a java.io.IOException when using Windows nodes as agents. 1. When new nodes are being spun up for Windows jobs, Jenkins will assign a newly spun-up node (after establishing a connection) to service the build, but the connection gets terminated before the build even begins. Started by user some_user Running as SYSTEM Building remotely on EC2 (amazon-ec2) - windows-label (i-0de1740c2107602b3) (windows-label) in workspace D:\dev\jenkins\workspace\WindowsStressTests\SimpleWindowsBuild11 FATAL: java.io.IOException: Pipe is already closed hudson.remoting.FastPipedInputStream$ClosedBy: The pipe was closed at... at hudson.remoting.FastPipedOutputStream.error(FastPipedOutputStream.java:100) at
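The stack trace above comes from Jenkins remoting's FastPipedOutputStream, but the same failure mode can be reproduced with the plain java.io pipe classes it mirrors: once the reading side of a pipe closes (as when an agent channel drops mid-build), any further write fails with an IOException whose message is "Pipe closed". A minimal illustration, not the remoting code itself:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeClosedDemo {
    /** Returns the IOException message seen when writing to a closed pipe. */
    static String writeAfterReaderCloses() {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);
            out.write('a'); // fine: the reader side is still open
            in.close();     // reader goes away, like a dropped agent channel
            out.write('b'); // now the write has nowhere to go
            return "no error";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterReaderCloses()); // prints "Pipe closed"
    }
}
```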
[JIRA] (JENKINS-34044) EC2 Plugin: Windows: java.io.IOException: Pipe closed
Pierson Yieh updated JENKINS-34044 Jenkins / JENKINS-34044 EC2 Plugin: Windows: java.io.IOException: Pipe closed Change By: Pierson Yieh Resolution: Fixed Status: Resolved → In Review
[JIRA] (JENKINS-34044) EC2 Plugin: Windows: java.io.IOException: Pipe closed
Pierson Yieh edited a comment on JENKINS-34044 Re: EC2 Plugin: Windows: java.io.IOException: Pipe closed Requesting this to be reopened. We're seeing this behavior in our Jenkins builds (the Jenkins Master will lose/close the connection to the Windows agent nodes at the beginning or in the middle of a build). We're using the latest release of the ec2-plugin, 1.49.1, and Jenkins version 2.204.2. We're happy to contribute a fix, but could use some assistance in pointing us in the right direction of where to start looking. CC: [~narayanan]
[JIRA] (JENKINS-59160) Cannot Override workingDir for jnlp container
Pierson Yieh commented on JENKINS-59160 Re: Cannot Override workingDir for jnlp container We also noticed that `workingDir` was only being ignored when declared from a podTemplate / pipeline in a Jenkinsfile. Working directories set in pod templates saved on the Jenkins Master configuration page were respected.
[JIRA] (JENKINS-53790) Kubernetes plugin shows failing templates to only admins
Pierson Yieh commented on JENKINS-53790 Re: Kubernetes plugin shows failing templates to only admins UPDATE: Here is the PR with our suggested changes: https://github.com/jenkinsci/kubernetes-plugin/pull/440
[JIRA] (JENKINS-50429) Shell command are really slower than before
Pierson Yieh edited a comment on JENKINS-50429 Re: Shell command are really slower than before [~fabricepipart] That is correct (100 exports = 100 threads). We considered doing some calculations to clump smaller export statements together into a single writer thread, but felt the gain wouldn't outweigh the effort. Realistically, most of these threads end as soon as the writing is done, so not all the threads would be running at once. The kubernetes-plugin already has ExecutorService classes that would manage the threads, and we're currently exploring that. We also considered changing the buffer size; the two problems with this are: * The number of export statements (i.e. the amount of data you're trying to write) depends on many factors (e.g. global environment variables set on the Jenkins Master, environment variables declared in the pipeline, etc.), so there's no compelling argument for any particular larger buffer size if the required size can be volatile. * We don't know if we even have control over the buffer size. We have not fully explored this option, but it seems the buffer size is being set on the Jenkins / CloudBees side, so changing it would require a change to the actual Jenkins code, and we wanted to limit changes to the kubernetes-plugin itself.
[JIRA] (JENKINS-53790) Kubernetes plugin shows failing templates to only admins
Pierson Yieh edited a comment on JENKINS-53790 Re: Kubernetes plugin shows failing templates to only admins We implemented a change to the kubernetes-plugin that checks the message of containers in the Waiting state. The message will contain the String "Back-off pulling image" when Kubernetes can't locate the docker image (e.g. a badly defined docker image). We then grab the corresponding build job from the Jenkins Queue, print a message to the build's console output to notify users that they've specified a bad docker image, then cancel the build. Canceling the job, rather than simply labeling it as failed, was necessary because otherwise Jenkins would continuously retry creating the Kubernetes pod using the bad docker image and fail. Our solution solves the problem of customers not knowing why their job is stuck in perpetual limbo due to a bad docker image (and not knowing about / having permissions to view the Kubernetes logs), as well as the problem of jobs being stuck in the aforementioned perpetual waiting state due to a malformed docker image. We are currently refining it and will submit a formal PR once that's ready. Any suggestions and comments would be appreciated.
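The core of the check described above is just a string match on the container's waiting message. A minimal sketch of that decision (plain string logic only; the real change would read the message from the Kubernetes client API, which is not shown here):

```java
public class ImagePullCheck {
    /**
     * Decide whether a pod's waiting-state message indicates an image that
     * can't be pulled. Kubernetes reports "Back-off pulling image ..." when
     * it repeatedly fails to pull the requested image.
     */
    static boolean shouldCancelBuild(String waitingMessage) {
        return waitingMessage != null
                && waitingMessage.contains("Back-off pulling image");
    }

    public static void main(String[] args) {
        // A badly defined image: the build should be cancelled.
        System.out.println(shouldCancelBuild(
                "Back-off pulling image \"example.com/no-such-image:latest\"")); // true
        // A normal transient state: leave the build alone.
        System.out.println(shouldCancelBuild("ContainerCreating")); // false
    }
}
```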
[JIRA] (JENKINS-50429) Shell command are really slower than before
Pierson Yieh commented on JENKINS-50429 Re: Shell command are really slower than before We have isolated the cause of the slowdown to the 1s wait in java.io.PipedOutputStream's write() method. The symptom is that whenever the process tries to write to the buffer (namely all the export environment variable statements), a number of the write() calls block for 1+ seconds because the buffer is full. Our solution was, instead of having the main thread write, to have the main thread delegate the write calls to asynchronous writer threads (each one in charge of writing an export statement to the buffer), then ensure all the writer threads have finished at the end. This dramatically reduced our overhead time for `sh` calls from 3-4 seconds down to less than 1 second. We're currently refining it and will then submit a formal PR with our changes, but if there are any comments or suggestions please let us know. Also a note: we noticed that the slow `sh` behavior only occurred when `sh` was called within a `container` block, not when the `sh` calls were simply using the default container. However, even using the same container as the default container produced slow `sh` calls. Example (a declarative pipeline allows only one top-level `stages` block, so both stages go in a single block):

```
pipeline {
  agent {
    kubernetes {
      label "pod-name"
      defaultContainer "jnlp"
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: jnlp
    ...
"""
    }
  }
  stages {
    stage("Loop in Default") {
      steps {
        script {
          for (i = 0; i < 10; i++) {
            sh "which jq"
          }
        }
      }
    }
    stage("Loop in JNLP") {
      steps {
        container("jnlp") {
          script {
            for (i = 0; i < 10; i++) {
              sh "which jq"
            }
          }
        }
      }
    }
  }
}
```
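The delegated-writer idea above can be sketched with the plain java.io pipe classes: a PipedInputStream has a bounded buffer (1024 bytes by default), so a single thread writing many export statements blocks once the buffer fills; handing each write to an ExecutorService lets the submitting thread continue while a reader drains the pipe. The buffer size, thread count, and variable names below are illustrative, not the plugin's actual code:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncPipeWriters {
    /**
     * Writes `count` export lines through a deliberately tiny pipe using one
     * writer task per line, and returns how many complete lines the reader saw.
     */
    static int writeExports(int count) {
        try {
            PipedInputStream in = new PipedInputStream(64); // tiny buffer on purpose
            PipedOutputStream out = new PipedOutputStream(in);
            ExecutorService writers = Executors.newFixedThreadPool(4);

            for (int i = 0; i < count; i++) {
                final int n = i;
                // Each export statement is written by its own task, so the
                // submitting thread never blocks on a full pipe buffer.
                writers.submit(() -> {
                    try {
                        synchronized (out) { // pipes aren't safe for concurrent writes
                            out.write(("export VAR_" + n + "=value\n")
                                    .getBytes(StandardCharsets.UTF_8));
                        }
                    } catch (IOException ignored) { }
                });
            }

            // Drain the pipe on this thread while the writer tasks fill it.
            writers.shutdown();
            int lines = 0, c;
            while ((c = in.read()) != -1) {
                if (c == '\n' && ++lines == count) break; // all exports received
            }
            writers.awaitTermination(5, TimeUnit.SECONDS);
            out.close();
            in.close();
            return lines;
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeExports(20)); // all 20 lines arrive
    }
}
```

Without the executor, the same loop writing from one thread would stall each time the 64-byte buffer filled, which is the blocking behavior observed in PipedOutputStream.write().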