On 24 February 2016 at 09:29, Sandro Tosi <[email protected]> wrote: > Hey Felipe, thanks a lot for your help! > > > On Tue, Feb 23, 2016 at 10:16 PM, Felipe Sateler <[email protected]> > wrote: >> >> On 23 February 2016 at 12:12, Sandro Tosi <[email protected]> wrote: >> > On Tue, Feb 23, 2016 at 9:19 AM, Sandro Tosi <[email protected]> wrote: >> >> quick update: we had a couple of (real) nfs issues and >> >> misconfiguration (meeh) that made the script fail even if it shouldnt >> >> have, so no news yet; the reboot loop just restarted and will >> >> periodically check it and report back if something comes up. >> > >> > so well, we just needed to wait a bit :) >> > >> > here it is: >> > >> > # journalctl -alb | grep -nE "cron|NFSSERVER" >> > 1012:Feb 23 06:50:42 SERVER systemd[1]: Installed new job >> > mnt-NFSSERVER_VOL.mount/start as 99 >> > 1014:Feb 23 06:50:42 SERVER systemd[1]: Installed new job >> > cron.service/start >> > as 101 >> > 1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER_VOL... >> > 1716:Feb 23 06:50:43 SERVER systemd[1]: About to execute: /bin/mount -n >> > XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o >> > ro,intr,nolock,tcp,rdirplus,noatime,_netdev >> > 1718:Feb 23 06:50:43 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed >> > dead >> > -> mounting >> > 1720:Feb 23 06:50:43 SERVER systemd[574]: Executing: /bin/mount -n >> > XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o >> > ro,intr,nolock,tcp,rdirplus,noatime,_netdev >> > 1905:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mounting >> > timed out. Stopping. >> > 1906:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed >> > mounting -> mounting-sigterm >> > 1915:Feb 23 06:52:13 SERVER systemd[1]: Child 574 belongs to >> > mnt-NFSSERVER_VOL.mount >> > 1916:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mount >> > process exited, code=killed status=15 >> > 1917:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed >> > mounting-sigterm -> mounted >> > 1918:Feb 23 06:52:13 SERVER systemd[1]: Job >> > mnt-NFSSERVER_VOL.mount/start >> > finished, result=done >> > 1919:Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER_VOL. >> > 2025:Feb 23 06:52:13 SERVER systemd[1]: About to execute: /usr/sbin/cron >> > -f >> > $EXTRA_OPTS >> > 2026:Feb 23 06:52:13 SERVER systemd[1]: Forked /usr/sbin/cron as 786 >> > 2027:Feb 23 06:52:13 SERVER systemd[1]: cron.service changed dead -> >> > running >> > 2028:Feb 23 06:52:13 SERVER systemd[1]: Job cron.service/start finished, >> > result=done >> > 2029:Feb 23 06:52:13 SERVER systemd[786]: Executing: /usr/sbin/cron -f >> > 2038:Feb 23 06:52:13 SERVER cron[786]: (CRON) INFO (pidfile fd = 3) >> > 2128:Feb 23 06:52:13 SERVER cron[786]: (CRON) INFO (Running @reboot >> > jobs) >> > 2300:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed >> > mounted -> failed >> > 2301:Feb 23 06:52:13 SERVER systemd[1]: Failed to destroy cgroup >> > /system.slice/mnt-NFSSERVER_VOL.mount: Device or resource busy >> > 2302:Feb 23 06:52:13 SERVER systemd[1]: Unit mnt-NFSSERVER_VOL.mount >> > entered >> > failed state. >> > 2303:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal >> > sender=n/a >> > destination=n/a >> > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount >> > interface=org.freedesktop.DBus.Properties member=PropertiesChanged >> > cookie=30 >> > reply_cookie=0 error=n/a >> > 2304:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal >> > sender=n/a >> > destination=n/a >> > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount >> > interface=org.freedesktop.DBus.Properties member=PropertiesChanged >> > cookie=31 >> > reply_cookie=0 error=n/a >> > >> > so 1m30s passed and the mount didnt come up, which is (one of) the >> > original >> > issue (usually, running mount -t nfs -a will bring it up, even just >> > right >> > after the failure at boot, so it seems like it's not retried?). i check >> > in >> > the journalctl output around those lines, there was additional msg >> > relevant >> > to this. >> >> Is the network up at this point? > > i think so (and the other 11 NFS mounts were correctly mounted at this point > anyway): here's some logs of the network and the failing mount: > > # journalctl -alb | grep -inE "network|NFSSERVER" > 653:Feb 23 06:50:42 SERVER kernel: igb: Intel(R) Gigabit Ethernet Network > Driver - version 5.2.15-k > 656:Feb 23 06:50:42 SERVER kernel: igb 0000:01:00.0: Intel(R) Gigabit > Ethernet Network Connection > 661:Feb 23 06:50:42 SERVER kernel: igb 0000:01:00.1: Intel(R) Gigabit > Ethernet Network Connection > 678:Feb 23 06:50:42 SERVER kernel: drop_monitor: Initializing network drop > monitor service > 962:Feb 23 06:50:42 SERVER systemd[1]: Installed new job > networking.service/start as 46 > 963:Feb 23 06:50:42 SERVER systemd[1]: Installed new job > network.target/start as 47 > 969:Feb 23 06:50:42 SERVER systemd[1]: Installed new job > network-online.target/start as 53 > 1012:Feb 23 06:50:42 SERVER systemd[1]: Installed new job > mnt-NFSSERVER_VOL.mount/start as 99 > 1548:Feb 23 06:50:42 SERVER systemd[1]: Starting LSB: Raise network > interfaces.... > 1550:Feb 23 06:50:42 SERVER systemd[1]: About to execute: > /etc/init.d/networking start > 1551:Feb 23 06:50:42 SERVER systemd[1]: Forked /etc/init.d/networking as 407 > 1552:Feb 23 06:50:42 SERVER systemd[1]: networking.service changed dead -> > start > 1553:Feb 23 06:50:42 SERVER systemd[407]: Executing: /etc/init.d/networking > start > 1591:Feb 23 06:50:43 SERVER networking[407]: Configuring network > interfaces...Cannot get EEE settings: Operation not supported > 1648:Feb 23 06:50:43 SERVER networking[407]: done. > 1649:Feb 23 06:50:43 SERVER systemd[1]: Received SIGCHLD from PID 407 > (networking). > 1650:Feb 23 06:50:43 SERVER systemd[1]: Child 407 (networking) died > (code=exited, status=0/SUCCESS) > 1651:Feb 23 06:50:43 SERVER systemd[1]: Child 407 belongs to > networking.service > 1652:Feb 23 06:50:43 SERVER systemd[1]: networking.service: control process > exited, code=exited status=0 > 1653:Feb 23 06:50:43 SERVER systemd[1]: networking.service got final SIGCHLD > for state start > 1654:Feb 23 06:50:43 SERVER systemd[1]: networking.service changed start -> > running > 1655:Feb 23 06:50:43 SERVER systemd[1]: Job networking.service/start > finished, result=done > 1656:Feb 23 06:50:43 SERVER systemd[1]: Started LSB: Raise network > interfaces.. > 1657:Feb 23 06:50:43 SERVER systemd[1]: Starting Network. > 1658:Feb 23 06:50:43 SERVER systemd[1]: network.target changed dead -> > active > 1659:Feb 23 06:50:43 SERVER systemd[1]: Job network.target/start finished, > result=done > 1660:Feb 23 06:50:43 SERVER systemd[1]: Reached target Network. > 1661:Feb 23 06:50:43 SERVER systemd[1]: Starting Network is Online. > 1662:Feb 23 06:50:43 SERVER systemd[1]: network-online.target changed dead > -> active > 1663:Feb 23 06:50:43 SERVER systemd[1]: Job network-online.target/start > finished, result=done > 1664:Feb 23 06:50:43 SERVER systemd[1]: Reached target Network is Online. > 1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER_VOL... > 1716:Feb 23 06:50:43 SERVER systemd[1]: About to execute: /bin/mount -n > XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > 1718:Feb 23 06:50:43 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed dead > -> mounting > 1720:Feb 23 06:50:43 SERVER systemd[574]: Executing: /bin/mount -n > XXX.YYY.32.75:/vol/VOL /mnt/NFSSERVER_VOL -t nfs -o > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > 1877:Feb 23 06:50:58 SERVER systemd[1]: Child 489 belongs to > networking.service > 1879:Feb 23 06:50:58 SERVER systemd[1]: Child 501 belongs to > networking.service > 1891:Feb 23 06:51:19 SERVER systemd[1]: Child 539 belongs to > networking.service
This is weird. Why would there be lingering processes of networking.service? Note that these processes exit after the nfs mounts are started. > 1892:Feb 23 06:51:19 SERVER systemd[1]: networking.service: cgroup is empty > 1893:Feb 23 06:51:19 SERVER systemd[1]: networking.service changed running > -> exited > 1905:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mounting > timed out. Stopping. > 1906:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed > mounting -> mounting-sigterm > 1915:Feb 23 06:52:13 SERVER systemd[1]: Child 574 belongs to > mnt-NFSSERVER_VOL.mount > 1916:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount mount > process exited, code=killed status=15 > 1917:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed > mounting-sigterm -> mounted This is definitely weird. > 1918:Feb 23 06:52:13 SERVER systemd[1]: Job mnt-NFSSERVER_VOL.mount/start > finished, result=done > 1919:Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER_VOL. > 2300:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER_VOL.mount changed > mounted -> failed > 2301:Feb 23 06:52:13 SERVER systemd[1]: Failed to destroy cgroup > /system.slice/mnt-NFSSERVER_VOL.mount: Device or resource busy > 2302:Feb 23 06:52:13 SERVER systemd[1]: Unit mnt-NFSSERVER_VOL.mount entered > failed state. > 2303:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal sender=n/a > destination=n/a > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount > interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=30 > reply_cookie=0 error=n/a > 2304:Feb 23 06:52:13 SERVER systemd[1]: Sent message type=signal sender=n/a > destination=n/a > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER_5fVOL_2emount > interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=31 > reply_cookie=0 error=n/a > 2449:Feb 23 06:52:13 SERVER systemd[1]: Starting Host and Network Name > Lookups. > 2452:Feb 23 06:52:13 SERVER systemd[1]: Reached target Host and Network Name > Lookups. > > in particular > > 1664:Feb 23 06:50:43 SERVER systemd[1]: Reached target Network is Online. > 1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER_VOL... > > so the net online target is reached (if we believe systemd :) from man > systemd.special for network-online.target, "What precisely this requires is > left to the implementation of the network managing service.", how can we > know how this is implemented, and so what is meant here as 'online'?) before > attempting the mount. > > I checked at what point the /mnt/NFSSERVER_VOL is executed, and it was the > very last one NFS mount attempted (dunno if it's useful, i was expecting to > be the first, so with a possibly network still unstable), here below what I > called before /mnt/NFSSERVER_VOL is /mnt/NFSSERVER1_VOL (as i included also > the other NFS mounts): > > 1665:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER12_VOL... > 1669:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER7_VOL... > 1674:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER10... > 1679:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER7_VOL... > 1684:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER8... > 1689:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER7_VOL... > 1694:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER5... > 1699:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER4... > 1704:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER3... > 1709:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER2... > 1714:Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER1_VOL... > 1757:Feb 23 06:50:46 SERVER systemd[1]: mnt-NFSSERVER12_VOL.mount changed > mounting -> mounting-done > 1759:Feb 23 06:50:46 SERVER systemd[1]: Mounted /mnt/NFSSERVER12_VOL. > 1777:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER8.mount changed > mounting -> mounting-done > 1779:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER8. > 1780:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER7_VOL.mount changed > mounting -> mounting-done > 1782:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER7_VOL. > 1783:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER3.mount changed > mounting -> mounting-done > 1785:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER3. > 1786:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER5.mount changed > mounting -> mounting-done > 1788:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER5. > 1789:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER7_VOL.mount changed > mounting -> mounting-done > 1791:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER7_VOL. > 1792:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER7_VOL.mount changed > mounting -> mounting-done > 1794:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER7_VOL. > 1795:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER10.mount changed > mounting -> mounting-done > 1797:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER10. > 1830:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER4.mount changed > mounting -> mounted > 1832:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER4. > 1836:Feb 23 06:50:47 SERVER systemd[1]: mnt-NFSSERVER2.mount changed > mounting -> mounted > 1838:Feb 23 06:50:47 SERVER systemd[1]: Mounted /mnt/NFSSERVER2. > 1906:Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed > mounting -> mounting-sigterm > 1919:Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER1_VOL. > > but it is the first one defined in /etc/fstab: > > XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL nfs > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > XXX.YYY.26.132:/ifs /mnt/NFSSERVER2 nfs > rw,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.20.176:/ifs /mnt/NFSSERVER3 nfs > ro,tcp,rdirplus,noatime,_netdev > XXX.YYY.16.226:/ifs /mnt/NFSSERVER4 nfs > rw,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.21.22:/ifs/scratcho /mnt/NFSSERVER5 nfs > rw,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.27.74:/vol/vol5 /mnt/NFSSERVER7_VOL nfs > rw,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.32.111:/buildcache /mnt/NFSSERVER8 nfs > ro,soft,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.27.74:/vol/vol1 /mnt/NFSSERVER7_VOL nfs > ro,soft,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.20.176:/ifs /mnt/NFSSERVER10 nfs > ro,nocto,actimeo=600,tcp,rdirplus,noatime,_netdev > XXX.YYY.27.74:/vol/vol6 /mnt/NFSSERVER7_VOL nfs > ro,soft,intr,tcp,rdirplus,noatime,_netdev > XXX.YYY.20.14:/vol/vol3 /mnt/NFSSERVER12_VOL nfs > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > > something else that sounds fishy is: > > Feb 23 06:50:43 SERVER systemd[1]: Mounting /mnt/NFSSERVER1_VOL... > Feb 23 06:50:43 SERVER systemd[1]: About to execute: /bin/mount -n > XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > Feb 23 06:50:43 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed dead -> > mounting > Feb 23 06:50:43 SERVER systemd[574]: Executing: /bin/mount -n > XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount mounting timed > out. Stopping. > Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed mounting > -> mounting-sigterm > Feb 23 06:52:13 SERVER systemd[1]: Child 574 belongs to > mnt-NFSSERVER1_VOL.mount > Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount mount process > exited, code=killed status=15 > Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed > mounting-sigterm -> mounted > Feb 23 06:52:13 SERVER systemd[1]: Job mnt-NFSSERVER1_VOL.mount/start > finished, result=done > Feb 23 06:52:13 SERVER systemd[1]: Mounted /mnt/NFSSERVER1_VOL. > Feb 23 06:52:13 SERVER systemd[1]: Starting Remote File Systems. > Feb 23 06:52:13 SERVER systemd[1]: remote-fs.target changed dead -> active > Feb 23 06:52:13 SERVER systemd[1]: Job remote-fs.target/start finished, > result=done > Feb 23 06:52:13 SERVER systemd[1]: Reached target Remote File Systems. > Feb 23 06:52:13 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed mounted > -> failed > > so it is actually marked as done/mounted (?!?): "mnt-NFSSERVER1_VOL.mount > changed mounting-sigterm -> mounted", which triggers "Job > remote-fs.target/start finished, result=done" - but that seems wrong, it was > sigterm'ed, why systemd thinks this is actually completed correctly? at the > end it doesnt, as it's marked as failed, but that "mounting-sigterm -> > mounted" transition appears to be a mistake Yes, that is definitely weird. > > >> >> > >> > also note like cron.service is started, even if we configured: >> > >> > # grep remote-fs /etc/systemd/system/cron.service >> > Requires=remote-fs.target >> > After=remote-fs.target >> >> What does `systemctl show remote-fs.target` say? > > Id=remote-fs.target > Names=remote-fs.target > Requires=mnt-NFSSERVER12_VOL.mount mnt-NFSSERVER7_vol6.mount > mnt-NFSSERVER32.mount mnt-NFSSERVER9_vol1.mount mnt-NFSSERVER8.mount > mnt-NFSSERVER11_vol5.mount mnt-NFSSERVER5.mount mnt-NFSSERVER4.mount > mnt-gonzo.mount mnt-NFSSERVER2.mount mnt-NFSSERVER1_vol3.mount > Wants=remote-fs-pre.target > RequiredBy=cron.service > WantedBy=multi-user.target > Conflicts=shutdown.target > Before=console-setup.service kbd.service systemd-journal-flush.service > systemd-user-sessions.service mcelog.service ntp.service ipmidetectd.service > kexec-load.service nagios-nrpe-server.service nutcracker.service > lsyncd.service irqbalance.service ipwatchd.service ganglia-monitor.service > unbound.service bmc-watchdog.service setup_hugetlbfs.service sysstat.service > postfix.service cron.service > After=remote-fs-pre.target local-fs.target mnt-NFSSERVER1_VOL.mount > mnt-NFSSERVER2.mount mnt-NFSSERVER3.mount mnt-NFSSERVER4.mount > mnt-NFSSERVER5.mount mnt-NFSSERVER7_vol5.mount mnt-NFSSERVER8.mount > mnt-NFSSERVER9_vol1.mount mnt-NFSSERVER10.mount mnt-NFSSERVER11_vol6.mount > mnt-NFSSERVER12_vol3.mount > Documentation=man:systemd.special(7) > Description=Remote File Systems > LoadState=loaded > ActiveState=active > SubState=active > FragmentPath=/lib/systemd/system/remote-fs.target > DropInPaths=/run/systemd/generator/remote-fs.target.d/50-insserv.conf.conf > UnitFileState=enabled > InactiveExitTimestamp=Tue 2016-02-23 06:52:13 EST > InactiveExitTimestampMonotonic=104052746 > ActiveEnterTimestamp=Tue 2016-02-23 06:52:13 EST > ActiveEnterTimestampMonotonic=104052746 > ActiveExitTimestampMonotonic=0 > InactiveEnterTimestampMonotonic=0 > CanStart=yes > CanStop=yes > CanReload=no > CanIsolate=no > StopWhenUnneeded=no > RefuseManualStart=no > RefuseManualStop=no > AllowIsolate=no > DefaultDependencies=no > OnFailureJobMode=replace > IgnoreOnIsolate=no > IgnoreOnSnapshot=no > NeedDaemonReload=no > JobTimeoutUSec=0 > ConditionResult=yes > ConditionTimestamp=Tue 2016-02-23 06:52:13 EST > ConditionTimestampMonotonic=104052734 > Transient=no > >> >> > >> > checking the status of that target: >> > >> > # systemctl status remote-fs.target >> > ● remote-fs.target - Remote File Systems >> > Loaded: loaded (/lib/systemd/system/remote-fs.target; enabled) >> > Drop-In: /run/systemd/generator/remote-fs.target.d >> > └─50-insserv.conf.conf >> > Active: active since Tue 2016-02-23 06:52:13 EST; 3h 15min ago >> > Docs: man:systemd.special(7) >> > >> > Feb 23 06:52:13 SERVER systemd[1]: Starting Remote File Systems. >> > Feb 23 06:52:13 SERVER systemd[1]: Job remote-fs.target/start finished, >> > result=done >> > Feb 23 06:52:13 SERVER systemd[1]: Reached target Remote File Systems. >> > >> > so at the same time when mnt-NFSSERVER_VOL.mount is marked as failed, >> > the >> > remote-fs.target is marked as loaded successfully (which seems the wrong >> > status to me), and in fact the only failed unit is: >> > >> > # systemctl --failed >> > UNIT LOAD ACTIVE SUB DESCRIPTION >> > ● mnt-NFSSERVER_VOL.mount loaded failed failed /mnt/NFSSERVER_VOL >> > >> > 1 loaded units listed. Pass --all to see loaded but inactive units, too. >> >> Looks like remote-fs.target does not Require its mounts, only Wants. > > from the 'show' command above it seems it actually Requires the mount, where > did you find the info about only 'Want'ing the mounts? I was guessing, but your output suggests the relation is correct. I am very confused... > >> >> > >> > >> > can I provide more logs/info? do you see anything wrong on this >> > configuration that we might want to change? >> >> It seems strange that the mount fails. > > yeah I agree it's strange, but there might be situations when the particular > nfs server is overloaded, and/or the network path is a bit congested, and > the mount might take a while to come up; we consider a slow mount acceptable > (we dont reboot machines that often, and servers are already slow beasts to > start, a few secs more is not an issue), we just feel like if something goes > a bit out of the usual, the mount is never retried and it's just left it > there, dying Maybe the timeout is just too short. Maybe adding x-systemd.device-timeout=90s helps? A completely different alternative is to setup the nfs mounts as automounts instead of real mounts (ie, set x-systemd.automount option). This would have the aditional benefit of removing the need to specify dependency against remote-fs.target. This option will cause systemd to attempt to mount whenever a subpath is accessed and the mount is not yet active. > >> Any logs which might shed some >> light on why the mount command fails (especially network status)? > > sadly i cant find anything relevant: dmesg has nothing, journalctl contains > nothing helpful/related, ad so do syslog or messages - any other place you > want me to look? > >> Does >> the mount complete successfully if started via systemd post-boot ? >> that is, `systemctl start mnt-NFSSERVER_VOL.mount`. > > it usually does, up to the point that we setup a "mount -t nfs -a" as an > @reboot cronjob just to get the mounts up > > on this specific machine. I'm running that command now, so 24hrs after when > that happened, and it completes just fine (but i have no doubts it would > have succeeded even just after the boot failure): > > # systemctl --failed --no-legend > mnt-NFSSERVER_VOL.mount loaded failed failed /mnt/NFSSERVER_VOL > # systemctl start mnt-NFSSERVER_VOL.mount > > Feb 24 07:24:21 SERVER systemd[1]: Trying to enqueue job > mnt-NFSSERVER1_VOL.mount/start/replace > Feb 24 07:24:21 SERVER systemd[1]: Installed new job > mnt-NFSSERVER1_VOL.mount/start as 565 > Feb 24 07:24:21 SERVER systemd[1]: Enqueued job > mnt-NFSSERVER1_VOL.mount/start as 565 > Feb 24 07:24:21 SERVER systemd[1]: Mounting /mnt/NFSSERVER1_VOL... > Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount: Directory > /mnt/majestix_vol3 to mount over is not empty, mounting anyway. > Feb 24 07:24:21 SERVER systemd[1]: About to execute: /bin/mount -n > XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed failed > -> mounting > Feb 24 07:24:21 SERVER systemd[1]: Sent message type=signal sender=n/a > destination=n/a > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER1_5fvol3_2emount > interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3 > reply_cookie=0 error=n/a > Feb 24 07:24:21 SERVER systemd[1]: Sent message type=signal sender=n/a > destination=n/a > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER1_5fvol3_2emount > interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=4 > reply_cookie=0 error=n/a > Feb 24 07:24:21 SERVER systemd[1]: Got message type=method_call sender=n/a > destination=org.freedesktop.systemd1 > object=/org/freedesktop/systemd1/unit/mnt_2dNFSSERVER1_5fvol3_2emount > interface=org.freedesktop.DBus.Properties member=Get cookie=3 reply_cookie=0 > error=n/a > Feb 24 07:24:21 SERVER systemd[27575]: Executing: /bin/mount -n > XXX.YYY.32.75:/vol/vol3 /mnt/NFSSERVER1_VOL -t nfs -o > ro,intr,nolock,tcp,rdirplus,noatime,_netdev > Feb 24 07:24:21 SERVER systemd[1]: Child 27575 belongs to > mnt-NFSSERVER1_VOL.mount > Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount mount process > exited, code=exited status=0 > Feb 24 07:24:21 SERVER systemd[1]: mnt-NFSSERVER1_VOL.mount changed mounting > -> mounted > Feb 24 07:24:21 SERVER systemd[1]: Job mnt-NFSSERVER1_VOL.mount/start > finished, result=done > Feb 24 07:24:21 SERVER systemd[1]: Mounted /mnt/NFSSERVER1_VOL. > > # systemctl status | head > ● SERVER > State: running > Jobs: 0 queued > Failed: 0 units > OK, so no systemd-specific environment snafu, and more likely just a timing issue. I have to confess I don't have much more ideas on where to look... -- Saludos, Felipe Sateler _______________________________________________ Pkg-systemd-maintainers mailing list [email protected] http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-systemd-maintainers
