Sorry for the inconvenience. I disabled the worker. I'll need more time to restore it.
On Fri, Jun 28, 2019 at 3:56 PM Daniel Oliveira <[email protected]> wrote:

> Any updates to this issue today? It seems like this (or a similar bug) is
> still happening across many Pre- and Postcommits.
>
> On Fri, Jun 28, 2019 at 12:33 AM Yifan Zou <[email protected]> wrote:
>
>> I did the prune on beam15. The disk was freed, but all jobs fail with
>> other weird problems. It looks like the docker prune removed too much,
>> but I don't have evidence. Will look further in the AM.
>>
>> On Thu, Jun 27, 2019 at 11:20 PM Udi Meiri <[email protected]> wrote:
>>
>>> See how the HDFS IT already avoids tag collisions.
>>>
>>> On Thu, Jun 27, 2019, 20:42 Yichi Zhang <[email protected]> wrote:
>>>
>>>> For flakiness, I guess a tag is needed to keep concurrent builds
>>>> apart.
>>>>
>>>> On Thu, Jun 27, 2019 at 8:39 PM Yichi Zhang <[email protected]> wrote:
>>>>
>>>>> Maybe a cron job on the Jenkins node that does a docker prune every
>>>>> day?
>>>>>
>>>>> On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> This highlights the race condition caused by using a single docker
>>>>>> registry on a machine.
>>>>>> If two tests create "jenkins-docker-apache.bintray.io/beam/python"
>>>>>> one after another, the second one will replace the first and cause
>>>>>> flakiness.
>>>>>>
>>>>>> Is there a way to dynamically create and destroy a docker repository
>>>>>> on a machine and clean all the relevant data?
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 3:15 PM Yifan Zou <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> The problem was caused by the large quantity of stale docker images
>>>>>>> generated by the Python portable tests and the HDFS IT.
>>>>>>>
>>>>>>> Dumping the docker disk usage gives me:
>>>>>>>
>>>>>>> TYPE           TOTAL  ACTIVE  SIZE     RECLAIMABLE
>>>>>>> Images         1039   356     424GB    384.2GB (90%)
>>>>>>> Containers     987    2       2.042GB  2.041GB (99%)
>>>>>>> Local Volumes  126    0       392.8MB  392.8MB (100%)
>>>>>>>
>>>>>>> REPOSITORY                                                 TAG     IMAGE ID      CREATED       SIZE     SHARED SIZE  UNIQUE SIZE  CONTAINERS
>>>>>>> jenkins-docker-apache.bintray.io/beam/python3              latest  ff1b949f4442  22 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>> jenkins-docker-apache.bintray.io/beam/python               latest  1dda7b9d9748  22 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>> <none>                                                     <none>  05458187a0e3  22 hours ago  732.9MB  625.1MB      107.8MB      4
>>>>>>> <none>                                                     <none>  896f35dd685f  23 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>> <none>                                                     <none>  db4d24ca9f2b  23 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>> <none>                                                     <none>  547df4d71c31  23 hours ago  732.9MB  625.1MB      107.8MB      4
>>>>>>> <none>                                                     <none>  dd7d9582c3e0  23 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>> <none>                                                     <none>  664aae255239  23 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>> <none>                                                     <none>  b528fedf9228  23 hours ago  732.9MB  625.1MB      107.8MB      4
>>>>>>> <none>                                                     <none>  8e996f22435e  25 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-818_test  latest  24b73b3fec06  25 hours ago  1.305GB  965.7MB      339.5MB      0
>>>>>>> <none>                                                     <none>  096325fb48de  25 hours ago  732.9MB  625.1MB      107.8MB      2
>>>>>>> jenkins-docker-apache.bintray.io/beam/java                 latest  c36d8ff2945d  25 hours ago  685.6MB  625.1MB      60.52MB      0
>>>>>>> <none>                                                     <none>  11c86ebe025f  26 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>> <none>                                                     <none>  2ecd69c89ec1  26 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8590_test    latest  3d1d589d44fe  2 days ago    1.305GB  965.7MB      339.5MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-801_test  latest  d1cc503ebe8e  2 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8577_test    latest  8582c6ca6e15  3 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8576_test    latest  4591e0948170  3 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8575_test    latest  ab181c49d56e  4 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8573_test    latest  2104ba0a6db7  4 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>> ...
>>>>>>> <1000+ images>
>>>>>>>
>>>>>>> I removed the unused images and beam15 is back now.
>>>>>>>
>>>>>>> Opened https://issues.apache.org/jira/browse/BEAM-7650.
>>>>>>> Ankur, I assigned the issue to you. Feel free to reassign it if
>>>>>>> needed.
>>>>>>>
>>>>>>> Thank you.
>>>>>>> Yifan
>>>>>>>
>>>>>>> On Thu, Jun 27, 2019 at 11:29 AM Yifan Zou <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Something was eating the disk. I disconnected the worker so jobs
>>>>>>>> could be allocated to other nodes. Will look deeper.
>>>>>>>>
>>>>>>>> Filesystem  Size  Used  Avail  Use%  Mounted on
>>>>>>>> /dev/sda1   485G  485G  96K    100%  /
>>>>>>>>
>>>>>>>> On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm on it.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 27, 2019 at 10:17 AM Udi Meiri <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Opened a bug here:
>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-7648
>>>>>>>>>>
>>>>>>>>>> Can someone investigate what's going on?
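For reference, the per-image disk-usage table quoted in the thread is the output of `docker system df -v`, and the daily prune Yichi suggests could be a crontab entry like the sketch below. The 3 AM schedule, the `until=24h` filter window, and the log path are illustrative assumptions, not what Beam actually deployed; note that `-a` also removes tagged-but-unused images, so any image a running build still needs must be in use (or newer than the filter window) when the prune fires.

```
# Hypothetical crontab entry for a Jenkins worker (schedule, filter, and
# log path are assumptions). "docker system prune -af --filter until=24h"
# removes stopped containers and all unused images created more than 24
# hours ago, reclaiming the kind of space shown by "docker system df -v".
# m h dom mon dow  command
0 3 * * *  docker system prune -af --filter "until=24h" >> /var/log/docker-prune.log 2>&1
```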

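The per-build tagging that Udi and Yichi suggest (and that the HDFS IT reportedly already does) could be sketched as below. This is a hypothetical illustration: in a real Jenkins job the job name and build number would come from the standard `JOB_NAME` and `BUILD_NUMBER` environment variables, and the registry prefix is taken from the thread; they are hard-coded here only to show the tag construction.

```shell
# Hypothetical sketch: give each build its own image tag so two concurrent
# builds cannot overwrite each other's
# "jenkins-docker-apache.bintray.io/beam/python:latest" image.
JOB_NAME="beam_postcommit_python_verify"   # would be $JOB_NAME in Jenkins
BUILD_NUMBER="8590"                        # would be $BUILD_NUMBER in Jenkins
TAG="${JOB_NAME}-${BUILD_NUMBER}"
IMAGE="jenkins-docker-apache.bintray.io/beam/python:${TAG}"
echo "${IMAGE}"

# docker build -t "${IMAGE}" .   # each build works against its own tag
# docker rmi -f "${IMAGE}"       # remove the tagged image when the build ends
```

Cleaning up the tag at the end of each build also keeps the stale-image pile from growing back.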