Thank you, Yifan and Ankur, for triaging this and working out a solution.

Ankur, based on what you discovered, we should fix this race condition;
otherwise the same problem will happen again. Is there a JIRA issue
tracking this?
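
For the fix, here is a minimal sketch in Python of the per-build tag idea Yichi and Udi suggest below. The goal is that concurrent builds on one worker stop overwriting each other's "latest" image. It assumes the build runs under Jenkins, which sets BUILD_TAG to jenkins-${JOB_NAME}-${BUILD_NUMBER}. The hdfs_it-jenkins-beam_postcommit_python_verify-8590_test image names in the thread suggest the HDFS IT already derives its names from that value. The helpers unique_image_tag, image_ref and cleanup_command are hypothetical (not existing Beam code), and the random fallback for runs outside Jenkins is my own assumption.

```python
import os
import re
import uuid
from typing import List, Mapping, Optional

# Docker tags may use letters, digits, '_', '.' and '-', must not start with
# '.' or '-', and are limited to 128 characters. We also lowercase everything.
_INVALID_TAG_CHARS = re.compile(r"[^a-z0-9_.-]")


def unique_image_tag(env: Optional[Mapping[str, str]] = None) -> str:
    """Builds a tag that differs between concurrent builds on one worker."""
    env = os.environ if env is None else env
    # Jenkins sets BUILD_TAG to "jenkins-${JOB_NAME}-${BUILD_NUMBER}".
    # Outside Jenkins (assumption): fall back to a random suffix.
    raw = env.get("BUILD_TAG") or "local-" + uuid.uuid4().hex[:12]
    tag = _INVALID_TAG_CHARS.sub("_", raw.lower()).lstrip(".-")
    return tag[:128] or "local"


def image_ref(repository: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Full image reference, e.g. 'repo/python:jenkins-job-42'."""
    return f"{repository}:{unique_image_tag(env)}"


def cleanup_command(image: str) -> List[str]:
    """docker argv that removes this build's tagged image; other builds' tags are untouched."""
    return ["docker", "rmi", "--force", image]


if __name__ == "__main__":
    # Compute the reference once and reuse it for both build and cleanup.
    ref = image_ref("jenkins-docker-apache.bintray.io/beam/python")
    print("build with: docker build -t", ref, ".")
    print("clean up with:", " ".join(cleanup_command(ref)))
```

A build script would compute the reference once, build with docker build -t <ref> ., and pass cleanup_command(ref) to subprocess.run in a finally step so that a failed run does not leave its image behind. Computing the reference only once matters: outside Jenkins the fallback is random, so two calls to unique_image_tag return different tags.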

On Fri, Jun 28, 2019 at 4:56 PM Yifan Zou <[email protected]> wrote:

> Sorry for the inconvenience. I disabled the worker. I'll need more time to
> restore it.
>
> On Fri, Jun 28, 2019 at 3:56 PM Daniel Oliveira <[email protected]>
> wrote:
>
>> Any updates on this issue today? It seems like this (or a similar bug) is
>> still happening across many PreCommits and PostCommits.
>>
>> On Fri, Jun 28, 2019 at 12:33 AM Yifan Zou <[email protected]> wrote:
>>
>>> I ran the prune on beam15. It freed the disk, but all jobs then failed with
>>> other strange errors. It looks like the docker prune was too aggressive, but
>>> I don't have evidence yet. Will look further in the morning.
>>>
>>> On Thu, Jun 27, 2019 at 11:20 PM Udi Meiri <[email protected]> wrote:
>>>
>>>> See how the hdfs IT already avoids tag collisions.
>>>>
>>>> On Thu, Jun 27, 2019, 20:42 Yichi Zhang <[email protected]> wrote:
>>>>
>>>>> For the flakiness, I guess a tag is needed to keep concurrent builds
>>>>> apart.
>>>>>
>>>>> On Thu, Jun 27, 2019 at 8:39 PM Yichi Zhang <[email protected]> wrote:
>>>>>
>>>>>> Maybe a cron job on the Jenkins nodes that runs docker prune every day?
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> This highlights the race condition caused by using a single Docker
>>>>>>> registry on a machine.
>>>>>>> If two tests create "jenkins-docker-apache.bintray.io/beam/python" one
>>>>>>> after another, the second one will replace the first one and cause
>>>>>>> flakiness.
>>>>>>>
>>>>>>> Is there a way to dynamically create and destroy a Docker repository
>>>>>>> on a machine and clean up all the relevant data?
>>>>>>>
>>>>>>> On Thu, Jun 27, 2019 at 3:15 PM Yifan Zou <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The problem was caused by a large number of stale Docker images
>>>>>>>> generated by the Python portable tests and the HDFS IT.
>>>>>>>>
>>>>>>>> Dumping the docker disk usage gives me:
>>>>>>>>
>>>>>>>> TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
>>>>>>>> *Images             1039                356                 424GB               384.2GB (90%)*
>>>>>>>> Containers          987                 2                   2.042GB             2.041GB (99%)
>>>>>>>> Local Volumes       126                 0                   392.8MB             392.8MB (100%)
>>>>>>>>
>>>>>>>> REPOSITORY                                                 TAG       IMAGE ID      CREATED       SIZE     SHARED SIZE  UNIQUE SIZE  CONTAINERS
>>>>>>>> jenkins-docker-apache.bintray.io/beam/python3              latest    ff1b949f4442  22 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>>> jenkins-docker-apache.bintray.io/beam/python               latest    1dda7b9d9748  22 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>>> <none>                                                     <none>    05458187a0e3  22 hours ago  732.9MB  625.1MB      107.8MB      4
>>>>>>>> <none>                                                     <none>    896f35dd685f  23 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>>> <none>                                                     <none>    db4d24ca9f2b  23 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>>> <none>                                                     <none>    547df4d71c31  23 hours ago  732.9MB  625.1MB      107.8MB      4
>>>>>>>> <none>                                                     <none>    dd7d9582c3e0  23 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>>> <none>                                                     <none>    664aae255239  23 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>>> <none>                                                     <none>    b528fedf9228  23 hours ago  732.9MB  625.1MB      107.8MB      4
>>>>>>>> <none>                                                     <none>    8e996f22435e  25 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-818_test  latest    24b73b3fec06  25 hours ago  1.305GB  965.7MB      339.5MB      0
>>>>>>>> <none>                                                     <none>    096325fb48de  25 hours ago  732.9MB  625.1MB      107.8MB      2
>>>>>>>> jenkins-docker-apache.bintray.io/beam/java                 latest    c36d8ff2945d  25 hours ago  685.6MB  625.1MB      60.52MB      0
>>>>>>>> <none>                                                     <none>    11c86ebe025f  26 hours ago  1.639GB  922.3MB      716.9MB      0
>>>>>>>> <none>                                                     <none>    2ecd69c89ec1  26 hours ago  1.624GB  913.7MB      710.3MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8590_test    latest    3d1d589d44fe  2 days ago    1.305GB  965.7MB      339.5MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-801_test  latest    d1cc503ebe8e  2 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8577_test    latest    8582c6ca6e15  3 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8576_test    latest    4591e0948170  3 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8575_test    latest    ab181c49d56e  4 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8573_test    latest    2104ba0a6db7  4 days ago    1.305GB  965.7MB      339.2MB      0
>>>>>>>> ...
>>>>>>>> <1000+ images>
>>>>>>>>
>>>>>>>> I removed the unused images, and beam15 is back now.
>>>>>>>>
>>>>>>>> Opened https://issues.apache.org/jira/browse/BEAM-7650.
>>>>>>>> Ankur, I assigned the issue to you. Feel free to reassign it if
>>>>>>>> needed.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Yifan
>>>>>>>>
>>>>>>>> On Thu, Jun 27, 2019 at 11:29 AM Yifan Zou <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Something was eating the disk. I disconnected the worker so that jobs
>>>>>>>>> could be allocated to other nodes. Will look deeper.
>>>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>>>> /dev/sda1       485G  485G   96K 100% /
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm on it.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 27, 2019 at 10:17 AM Udi Meiri <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Opened a bug here:
>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-7648
>>>>>>>>>>>
>>>>>>>>>>> Can someone investigate what's going on?
>>>>>>>>>>>
>>>>>>>>>>
