Didi - we just saw a report of a similar failure on engine-4.1's OST. Could you please backport these patches there too?
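For reference, the quoted thread below suggests running something like 'ss -anp' when a service fails to start. Here is a rough standalone sketch of that idea in Python 3. It is not the code from the otopi patches and does not use otopi's API; the function names (port_in_use, listening_sockets_report, debug_start_failure) and the default host are made up for illustration.

```python
import errno
import shutil
import socket
import subprocess


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding (host, port) fails with EADDRINUSE.

    Hypothetical helper: it only probes the port, it does not keep it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        sock.close()
    return False


def listening_sockets_report():
    """Return the output of 'ss -anp', or a note if 'ss' is not installed."""
    ss = shutil.which("ss")
    if ss is None:
        return "ss not found; cannot list sockets"
    result = subprocess.run(
        [ss, "-anp"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def debug_start_failure(port, host="0.0.0.0"):
    """Build a debug message to log after a service failed to start."""
    if port_in_use(port, host):
        return "port %d is already bound\n%s" % (
            port, listening_sockets_report())
    return "port %d is free; the failure likely has another cause" % port
```

Something like debug_start_failure(port) could then be logged by whatever detects the failed start. Note that 'ss -p' usually needs root to show processes owned by other users, so the report may be incomplete otherwise.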
On Mon, Nov 27, 2017 at 2:57 PM, Yedidyah Bar David <d...@redhat.com> wrote:

> On Mon, Nov 27, 2017 at 10:38 AM, Yedidyah Bar David <d...@redhat.com> wrote:
>
>> On Sun, Nov 26, 2017 at 7:24 PM, Nir Soffer <nsof...@redhat.com> wrote:
>> > I think we need to check and report which process is listening on a port
>> > when starting a server on that port fails.
>>
>> How do you know that a server was "started on that port", and that
>> it failed specifically because it failed to bind?
>>
>> There is no standardized (Unix) way to mark that a service wants to
>> listen on a specific port, or that it failed because a specific port
>> was bound by some other process.
>>
>> There are various classical *inetd* daemons, and modern systemd.socket,
>> that listen *instead* of some service. Then they can manage the port
>> resources and perhaps do something intelligent about them.
>>
>> >
>> > Didi, do you think we can integrate this in the deploy code, or this
>> > should be implemented in each server?
>>
>> It should be quite easy to patch otopi's services.state to run something
>> if start fails, e.g. 'ss -anp' or whatever you want.
>>
>> It should even be not-too-hard to do this in a self-contained plugin,
>> so can be part of otopi-debug-plugins.
>>
>> If we decide that something needs to be implemented by each server,
>> perhaps "something" should be to be controlled by a systemd.socket unit.
>> Didn't try, though, to see what this actually buys us.
>>
>> >
>> > Maybe when deployment fails, the deploy code can report all the
>> > listening sockets and the processes bound to these sockets?
>>
>> Pushed now:
>>
>> https://gerrit.ovirt.org/84699 core: Name TRANSACTION_INIT
>> https://gerrit.ovirt.org/84700 plugins: debug: Add debug_failure
>> https://gerrit.ovirt.org/84701 automation: Test failure
>>
>> Will merge soon, if all goes well.
>>
>
> Merged them.
>
> Pushed to OST:
>
> https://gerrit.ovirt.org/84710
>
> Dafna - thanks for opening the bug on ovirt-imageio, but I am not
> sure anyone can do much about it without more info, such as might
> be provided by above patches. When I suggested below to open BZ
> I meant on otopi or host-deploy to provide more debugging info,
> not for imageio - obviously no harm in opening it, and it's good
> to have it even if only for reference.
>
>>
>> Feel free to open BZ for other things discussed above, if relevant.
>>
>> >
>> > Nir
>> >
>> > On Sun, Nov 26, 2017 at 7:11 PM Gal Ben Haim <gbenh...@redhat.com> wrote:
>> >>
>> >> The failure is not consistent.
>> >>
>> >> On Sun, Nov 26, 2017 at 5:33 PM, Yaniv Kaul <yk...@redhat.com> wrote:
>> >>>
>> >>> On Sun, Nov 26, 2017 at 4:53 PM, Gal Ben Haim <gbenh...@redhat.com> wrote:
>> >>>>
>> >>>> We still see this issue on the upgrade suite from latest release to
>> >>>> master [1].
>> >>>> I don't see any evidence in "/var/log/messages" [2] that
>> >>>> "ovirt-imageio-proxy" was started twice.
>> >>>
>> >>> Since it's not a registered port and a high port, could it be used by
>> >>> something else (what are the odds though)?
>> >>> Is it consistent?
>> >>> Y.
>> >>>
>> >>>>
>> >>>> [1]
>> >>>> http://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/ovirt-master_change-queue-tester/runs/4153/nodes/123/steps/241/log/?start=0
>> >>>>
>> >>>> [2]
>> >>>> http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/4153/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
>> >>>>
>> >>>> On Fri, Nov 24, 2017 at 8:16 PM, Dafna Ron <d...@redhat.com> wrote:
>> >>>>>
>> >>>>> There were two different patches reported as failing cq today with the
>> >>>>> ovirt-imageio-proxy service failing to start.
>> >>>>>
>> >>>>> Here is the latest failure:
>> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4130/artifact
>> >>>>>
>> >>>>> On 11/23/2017 03:39 PM, Allon Mureinik wrote:
>> >>>>>
>> >>>>> Daniel/Nir?
>> >>>>>
>> >>>>> On Thu, Nov 23, 2017 at 5:29 PM, Dafna Ron <d...@redhat.com> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> We have a failure on test
>> >>>>>> 001_initialize_engine.test_initialize_engine.
>> >>>>>>
>> >>>>>> This is failing with error Failed to start service
>> >>>>>> 'ovirt-imageio-proxy'
>> >>>>>>
>> >>>>>> Link and headline of suspected patches:
>> >>>>>>
>> >>>>>> build: Make resulting RPMs architecture-specific -
>> >>>>>> https://gerrit.ovirt.org/#/c/84534/
>> >>>>>>
>> >>>>>> Link to Job:
>> >>>>>>
>> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055
>> >>>>>>
>> >>>>>> Link to all logs:
>> >>>>>>
>> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/
>> >>>>>>
>> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
>> >>>>>>
>> >>>>>> (Relevant) error snippet from the log:
>> >>>>>>
>> >>>>>> <error>
>> >>>>>>
>> >>>>>> from lago log:
>> >>>>>>
>> >>>>>> Failed to start service 'ovirt-imageio-proxy'
>> >>>>>>
>> >>>>>> messages logs:
>> >>>>>>
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting Session 8 of user root.
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
>> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting oVirt ImageIO Proxy...
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: start request repeated too quickly for ovirt-imageio-proxy.service
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>> >>>>>>
>> >>>>>> </error>
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Infra mailing list
>> >>>>>> in...@ovirt.org
>> >>>>>> http://lists.ovirt.org/mailman/listinfo/infra
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Devel mailing list
>> >>>>> Devel@ovirt.org
>> >>>>> http://lists.ovirt.org/mailman/listinfo/devel
>> >>>>
>> >>>> --
>> >>>> GAL bEN HAIM
>> >>>> RHV DEVOPS
>> >>
>> >> --
>> >> GAL bEN HAIM
>> >> RHV DEVOPS
>>
>> --
>> Didi
>
> --
> Didi