@Matthias Dellweg <dell...@atix.de> did you restart Pulp after upgrading redis-py? Are you seeing this on fresh boxes, or does it require some modification to reproduce? I don't think any of us have experienced this thus far.
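For what it's worth, here is a minimal sketch for checking which redis-py a process actually has loaded versus what is installed on disk, both in the parent and in a forked child. The helper names (`describe_package`, `describe_in_fork`) are made up for illustration, and it assumes Python 3.8+ (for `importlib.metadata`) on a Unix box where `os.fork()` is available. It is not how rq or Pulp spawn their work horses, just a standalone diagnostic.

```python
import importlib
import json
import os
from importlib import metadata  # assumption: Python 3.8 or newer


def describe_package(dist_name, module_name=None):
    """Report the on-disk version and the version/path of the imported module.

    Hypothetical helper for illustration only.
    """
    module_name = module_name or dist_name
    info = {"pid": os.getpid(), "installed": None, "loaded": None, "path": None}
    try:
        # Version recorded in the installed distribution's metadata on disk.
        info["installed"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass
    try:
        # Returns the already-imported module if there is one in sys.modules.
        module = importlib.import_module(module_name)
    except ImportError:
        return info
    info["loaded"] = getattr(module, "__version__", None)
    info["path"] = getattr(module, "__file__", None)
    return info


def describe_in_fork(dist_name, module_name=None):
    """Run describe_package() in a forked child and return its report (Unix only).

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the report to the pipe, then exit without cleanup handlers.
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                json.dump(describe_package(dist_name, module_name), out)
        finally:
            os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as src:
        report = json.load(src)
    os.waitpid(pid, 0)
    return report


if __name__ == "__main__":
    print("parent:", describe_package("redis"))
    print("child: ", describe_in_fork("redis"))
```

A child created by plain `os.fork()` inherits the parent's already-imported modules, so the two reports would normally match. A difference between `installed` and `loaded` would rather suggest that the process was started before the upgrade and has not been restarted since.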
The only thing I can think of is that maybe when the worker process forks it ends up using a different version of redis-py than the parent worker.

On Tue, Jan 21, 2020 at 4:49 AM Matthias Dellweg <dell...@atix.de> wrote:

> [@ Brian:
> https://github.com/pulp/pulpcore/commit/e36e7b5f0eccc176a6e6298df29293b014f4710c
> ]
>
> Hi Daniel,
> thank you for looking into this.
> What i am seeing is:
>
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com gunicorn[23274]: 127.0.0.1 - admin [21/Jan/2020:09:23:10 +0000] "PATCH /pulp/api/v3/repositories/file/file/3a31ed13-585d-4a36-8398-7df40560ffa4/ HTTP/1.1" 202 67 "-" "python-requests/2.22.0"
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: pulp: rq.worker:INFO: 23...@pulp3-source-fedora30.anubis.example.com: 3a776d4d-ff0c-4b44-afae-08dc7c6cc415
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23270]: pulp: rq.worker:INFO: resource-manager: Job OK (aa22aed0-0363-45ae-8254-4ab14893a4a1)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: pulp: rq.worker:ERROR: Worker rq:worker:23...@pulp3-source-fedora30.anubis.example.com: found an unhandled exception, quitting...
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: Traceback (most recent call last):
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 782, in prepare_job_execution
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     pipeline.execute()
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py", line 3707, in execute
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.reset()
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py", line 3476, in reset
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.connection_pool.release(self.connection)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/connection.py", line 1114, in release
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self._in_use_connections.remove(connection)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: KeyError: Connection<host=localhost,port=6379,db=0>
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: During handling of the above exception, another exception occurred:
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: Traceback (most recent call last):
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 515, in work
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.execute_job(job, queue)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/home/vagrant/devel/pulpcore/pulpcore/tasking/worker.py", line 72, in execute_job
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     super().execute_job(*args, **kwargs)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 727, in execute_job
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.fork_work_horse(job, queue)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 667, in fork_work_horse
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.main_work_horse(job, queue)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 744, in main_work_horse
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     raise e
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 741, in main_work_horse
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.perform_job(job, queue)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/home/vagrant/devel/pulpcore/pulpcore/tasking/worker.py", line 103, in perform_job
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     return super().perform_job(job, queue)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 866, in perform_job
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.prepare_job_execution(job, heartbeat_ttl)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/rq/worker.py", line 782, in prepare_job_execution
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     pipeline.execute()
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py", line 3445, in __exit__
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.reset()
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/client.py", line 3476, in reset
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self.connection_pool.release(self.connection)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:   File "/usr/local/lib/pulp/lib64/python3.7/site-packages/redis/connection.py", line 1114, in release
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]:     self._in_use_connections.remove(connection)
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com rq[23269]: KeyError: Connection<host=localhost,port=6379,db=0>
> Jan 21 09:23:10 pulp3-source-fedora30.anubis.example.com gunicorn[23274]: 127.0.0.1 - admin [21/Jan/2020:09:23:10 +0000] "GET /pulp/api/v3/tasks/3a776d4d-ff0c-4b44-afae-08dc7c6cc415/ HTTP/1.1" 200 477 "-" "python-requests/2.22.0"
>
> and that last line repeats forever.
>
> On Fri, 17 Jan 2020 09:09:01 -0500
> Daniel Alley <dal...@redhat.com> wrote:
>
> > Different issue perhaps? Are you seeing anything in the logs that
> > looks like this? https://github.com/rq/rq/issues/1044
> >
> > On Fri, Jan 17, 2020 at 9:06 AM Daniel Alley <dal...@redhat.com>
> > wrote:
> >
> > > Strange, I'm pretty sure an issue like this is why we pinned
> > > originally, but upstream said that that particular issue was
> > > (supposedly) fixed.
> > >
> > > https://github.com/andymccurdy/redis-py/issues/1136#issuecomment-571168161
> > >
> > > On Fri, Jan 17, 2020 at 4:15 AM Matthias Dellweg <dell...@atix.de>
> > > wrote:
> > >> Hello all,
> > >> I believe i have found a new incarnation of hanging tasks (tm).
> > >> This time it is pulp3 and as hard to nail down as ever.
> > >> I think, it is introduced by
> > >> e36e7b5f0eccc176a6e6298df29293b014f4710c. Where the dependency on
> > >> redis was dropped with the result that 3.3.smth instead of
> > >> 3.1.smth was installed.
> > >>
> > >> Before filing an issue, is there anyone out there to share that
> > >> experience?
> > >>
> > >> Also as a thought protocol of how to reproduce:
> > >> I have seen tasks hanging in both "waiting" and "running" state
> > >> when using the command `prestart; django-admin test pulp_deb` or
> > >> `<...> test pulpcore`. All the tasks I have seen were `sync`,
> > >> `general_create` or `general_delete` and looked like they never
> > >> started to do anything for real. To have consistent results, i had
> > >> the impression that i needed to rebuild the vagrant boxes for
> > >> every bisecting step. Also updating the python-redis package on a
> > >> box that worked, produced a hanging task in the next run.
> > >>
> > >> Have a good day,
> > >> Matthias
>
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev