Re: Passwordless accounts via ports!
On Thu, 11 Aug 2016 15:29:03 +1000 Dewayne Geraghty wrote:

> Olivier,
> I've checked my 10.3Stable systems and they all have '*' as their password,
> which is consistent with /usr/ports/Mk/UIDs. You might like to check the
> age of the latter.
> Regards, Dewayne.
> PS Both ports and src were built from updated src and ports from 2016-08-09

The system is a very recent CURRENT, last compiled yesterday. The ports tree
is also up to date and updated on a daily basis, and so are the ports.

Interestingly, the problem shows up on only one box so far, although all other
systems are also CURRENT and updated the very same way. On another system, only
user "bacula" has an empty password, whereas this user is set correctly with a
"*" password on yet another system, on which I installed bacula months earlier.

I checked the installation of the ports and the resulting password entries
again, and all I tested (polkit, bacula, sane) did set the "*" as expected (I
had manually deleted the password entry via vipw beforehand).

I guess this "problem" is due to the fact that I install ports and world on a
daily basis on such systems, so the likelihood of hitting an interim bug is
very high.

Regards,
Oliver
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Passwordless accounts via ports!
Hi!

> I just checked the security scanning outputs of FreeBSD and found this
> surprising result:
>
> [...]
> Checking for passwordless accounts:
> polkitd::565:565::0:0:Polkit Daemon User:/var/empty:/usr/sbin/nologin
> pulse::563:563::0:0:PulseAudio System User:/nonexistent:/usr/sbin/nologin
> saned::194:194::0:0:SANE Scanner Daemon:/nonexistent:/bin/sh
> clamav::106:106::0:0:Clamav Antivirus:/nonexistent:/usr/sbin/nologin
> bacula::910:910::0:0:Bacula Daemon:/var/db/bacula:/usr/sbin/nologin
> [...]
>
> Obviously, some ports install accounts but do not secure them as there is an
> empty password.
>
> I consider this not a feature, but a bug.

Indeed, but I can't reproduce it on my hosts. There must be some reason for
this to happen?

--
p...@opsec.eu  +49 171 3101372  4 years to go !
Re: Passwordless accounts via ports!
> On Aug 10, 2016, at 22:05, O. Hartmann wrote:
>
> I just checked the security scanning outputs of FreeBSD and found this
> surprising result:
>
> [...]
> Checking for passwordless accounts:
> polkitd::565:565::0:0:Polkit Daemon User:/var/empty:/usr/sbin/nologin
> pulse::563:563::0:0:PulseAudio System User:/nonexistent:/usr/sbin/nologin
> saned::194:194::0:0:SANE Scanner Daemon:/nonexistent:/bin/sh
> clamav::106:106::0:0:Clamav Antivirus:/nonexistent:/usr/sbin/nologin
> bacula::910:910::0:0:Bacula Daemon:/var/db/bacula:/usr/sbin/nologin
> [...]
>
> Obviously, some ports install accounts but do not secure them as there is an
> empty password.
>
> I consider this not a feature, but a bug.

saned is the only one that might concern me because the login shell isn't
nologin(1).

Cheers,
-Ngie
Passwordless accounts via ports!
I just checked the security scanning outputs of FreeBSD and found this
surprising result:

[...]
Checking for passwordless accounts:
polkitd::565:565::0:0:Polkit Daemon User:/var/empty:/usr/sbin/nologin
pulse::563:563::0:0:PulseAudio System User:/nonexistent:/usr/sbin/nologin
saned::194:194::0:0:SANE Scanner Daemon:/nonexistent:/bin/sh
clamav::106:106::0:0:Clamav Antivirus:/nonexistent:/usr/sbin/nologin
bacula::910:910::0:0:Bacula Daemon:/var/db/bacula:/usr/sbin/nologin
[...]

Obviously, some ports install accounts but do not secure them as there is an
empty password.

I consider this not a feature, but a bug.

Regards,
Oliver
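For anyone who wants to repeat this check by hand, here is a minimal Python sketch. It is not the code behind the security scan; it only assumes the ten colon-separated fields visible in the report above (name, password, uid, gid, class, change, expire, gecos, home, shell). The function names are made up for illustration, and the `bacula:*:` sample line is an invented "healthy" entry for contrast.

```python
def passwordless_accounts(lines):
    """Return (name, shell) for every entry whose password field is empty."""
    found = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 10:
            continue  # not a master.passwd-style entry, skip it
        name, password, shell = fields[0], fields[1], fields[9]
        if password == "":
            found.append((name, shell))
    return found


def allows_login(shell):
    """True when the shell is something other than nologin."""
    return not shell.endswith("/nologin")


# Two lines taken from the report above, plus one illustrative entry
# that has "*" in the password field and should not be reported.
sample = [
    "polkitd::565:565::0:0:Polkit Daemon User:/var/empty:/usr/sbin/nologin",
    "saned::194:194::0:0:SANE Scanner Daemon:/nonexistent:/bin/sh",
    "bacula:*:910:910::0:0:Bacula Daemon:/var/db/bacula:/usr/sbin/nologin",
]

for name, shell in passwordless_accounts(sample):
    note = "has a login shell" if allows_login(shell) else "nologin"
    print(f"{name}: empty password ({note})")
```

An entry with an empty password and a real login shell deserves more attention than one whose shell is nologin, so the sketch reports the shell as well.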
Re: Possible zpool online, resilvering issue
> A new transaction group (TXG) is created at LEAST every
> vfs.zfs.txg.timeout (defaults to 5) seconds.
> If you offline a drive for hours or more, it must have all blocks with a
> 'birth time' newer than the last transaction that was recorded on the
> offlined drive replayed to catch that drive up to the other drives in
> the pool.
> As long as you have enough redundancy, the checksum errors can be
> corrected without concern.
> In the end, the checksum errors can be written off as being caused by
> the bad hardware. After you finish the scrub and everything is OK, do:
> 'zpool clear poolname', and it will reset all of the error and checksum
> counts to 0, so you can track if any more ever show up.

Thanks Allan, can always count on you for crystal clear answers =]. I'm
surprised tho that it would be concluded as bad hardware (assuming you mean
hd?). Just seems like it's too much of a coincidence. I always ran zpool clear
each time after the resilver/scrub was completed.

> Perhaps on or more of the drives running out of Realloc Sectors?
> I had once a case where smartctl showed no issues but zfs scrubbing showed
> a defect, some weeks later smartctl was showing some reallocated sectors
> and one week later the HD was out of spare sectors.
> Have you already tested every single HD for smart issues?

smartd is set to run a short test weekly on Tuesday, Thursday and Saturday.
An extended test is performed weekly on Tuesday, an hour after the short test.
This occurs on all 24 drives. A scrub is performed once per month on Saturday,
an hour after the short test.

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

This is the value of reallocated sectors on all the drives (I think this is
the normal value?).

This drive's SMART output looks like the worst of the lot.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 592) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 491) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   072   063   044    Pre-fail  Always       -       20189561
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       188
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   085   030    Pre-fail  Always       -       1802626788
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       17457
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       158
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       65537
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   055   045   045    Old_age   Always   In_the_past 45 (Min/Max 34/51)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       157
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       867
194 Temperature_Celsius     0x0022   045   055   000    Old_age   Always       -       45 (0 22 0 0 0)
195 Hardware_ECC_Recovered  0x001a   053   011   000
Re: kernel panic caused by virtualbox(?)
On 10 Aug, Jung-uk Kim wrote:
> On 08/09/16 05:12 AM, Konstantin Belousov wrote:
>> On Mon, Aug 08, 2016 at 04:44:20PM -0700, Don Lewis wrote:
>>> On 8 Aug, Konstantin Belousov wrote:
>>>> On Mon, Aug 08, 2016 at 10:22:44AM -0700, John Baldwin wrote:
>>>>> On Thursday, August 04, 2016 05:10:29 PM Don Lewis wrote:
>>>>>> Reposted to -current to get some more eyes on this ...
>>>>>>
>>>>>> I just got a kernel panic when I started up a CentOS 7 VM in virtualbox.
>>>>>> The host is:
>>>>>> FreeBSD 12.0-CURRENT #17 r302500 GENERIC amd64
>>>>>> The virtualbox version is:
>>>>>> virtualbox-ose-5.0.26
>>>>>> virtualbox-ose-kmod-5.0.26_1
>>>>>>
>>>>>> The panic message is:
>>>>>>
>>>>>> panic: Unregistered use of FPU in kernel
>>>>>> cpuid = 1
>>>>>> KDB: stack backtrace:
>>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>>>>> 0xfe085a55d030
>>>>>> vpanic() at vpanic+0x182/frame 0xfe085a55d0b0
>>>>>> kassert_panic() at kassert_panic+0x126/frame 0xfe085a55d120
>>>>>> trap() at trap+0x7ae/frame 0xfe085a55d330
>>>>>> calltrap() at calltrap+0x8/frame 0xfe085a55d330
>>>>>> --- trap 0x16, rip = 0x827dd3a9, rsp = 0xfe085a55d408, rbp =
>>>>>> 0xfe085a55d430 ---
>>>>>> g_pLogger() at 0x827dd3a9/frame 0xfe085a55d430
>>>>>> g_pLogger() at 0x8274e5c7/frame 0x3
>>>>>> KDB: enter: panic
>>>>>>
>>>>>> Since g_pLogger is a symbol in vboxdrv.ko, it looks like virtualbox is
>>>>>> the trigger.
>>>>>>
>>>>>> There are no symbols for the virtualbox kmods, possibly because I
>>>>>> installed them as an upgrade using packages (built with the same source
>>>>>> tree version) instead of by using PORTS_MODULES in make.conf, so ports
>>>>>> kgdb didn't have anything useful to say about what happened before the
>>>>>> trap.
>>>>>>
>>>>>> This panic is very repeatable. I just got another one when starting the
>>>>>> same VM., but this time the two calls before the trap were
>>>>>> null_bug_bypass(). Hmn, that symbol is in nullfs ...
>>>>>>
>>>>>> I don't see this with a Windows 7 VM.
>>>>>>
>>>>>> All of the virtualbox kmod files are compiled with -mno-mmx -mno-sse
>>>>>> -msoft-float -mno-aes -mno-avx
>>>> Your disassemble listed fxrstor instruction that failing, or did I
>>>> mis-remembered ? This is most likely some context switch code, either
>>>> by virtual machine or erronously executed guest code. It is not a
>>>> spontaneous use of FPU, but more likely something different. Can you
>>>> confirm ?
>>>>
>>>> In either case, I do not remember any KBI changes around PCB layout or
>>>> fpu_enter() KPI recently.
>>>>>
>>>>> I suspect head packages are quite likely built against the a "wrong" KBI
>>>>> and are too fragile to use for kmods vs compiling from ports. :-/ I would
>>>>> try a built-from-ports kmod to see if the panics go away.
>>>>
>>>> FWIW, I will commit the following change shortly. Since third-party
>>>> modules break the invariant, either due to bugs (ndis wrappers) or
>>>> possibly due to KBI breakage, it is worth to have the detection enabled
>>>> for production kernels.
>>>
>>> Interesting ... I tried running virtualbox on recent 10.3-STABLE with a
>>> GENERIC kernel and the guest seemed to operate properly. Then I enabled
>>> INVARIANTS and got the panic. I suspect that is why nobody has stumbled
>>> across this before.
>>>
>> This is yet another reason to promote KASSERT to the full panic.
>> I expect that the vbox source lacks fpu_kern_enter() calls around the
>> FPU state restoration.
>
> Unfortunately, the code is in MI source as it is unnecessary for
> supported OSes (read: FreeBSD is not supported) and it's not easy to
> inject fpu_kern_enter()/fpu_kern_leave() calls there. :-(

It's a headache, but our ports can use patch files for that sort of thing ...
Re: kernel panic caused by virtualbox(?)
On 08/09/16 05:12 AM, Konstantin Belousov wrote:
> On Mon, Aug 08, 2016 at 04:44:20PM -0700, Don Lewis wrote:
>> On 8 Aug, Konstantin Belousov wrote:
>>> On Mon, Aug 08, 2016 at 10:22:44AM -0700, John Baldwin wrote:
>>>> On Thursday, August 04, 2016 05:10:29 PM Don Lewis wrote:
>>>>> Reposted to -current to get some more eyes on this ...
>>>>>
>>>>> I just got a kernel panic when I started up a CentOS 7 VM in virtualbox.
>>>>> The host is:
>>>>> FreeBSD 12.0-CURRENT #17 r302500 GENERIC amd64
>>>>> The virtualbox version is:
>>>>> virtualbox-ose-5.0.26
>>>>> virtualbox-ose-kmod-5.0.26_1
>>>>>
>>>>> The panic message is:
>>>>>
>>>>> panic: Unregistered use of FPU in kernel
>>>>> cpuid = 1
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>>>> 0xfe085a55d030
>>>>> vpanic() at vpanic+0x182/frame 0xfe085a55d0b0
>>>>> kassert_panic() at kassert_panic+0x126/frame 0xfe085a55d120
>>>>> trap() at trap+0x7ae/frame 0xfe085a55d330
>>>>> calltrap() at calltrap+0x8/frame 0xfe085a55d330
>>>>> --- trap 0x16, rip = 0x827dd3a9, rsp = 0xfe085a55d408, rbp =
>>>>> 0xfe085a55d430 ---
>>>>> g_pLogger() at 0x827dd3a9/frame 0xfe085a55d430
>>>>> g_pLogger() at 0x8274e5c7/frame 0x3
>>>>> KDB: enter: panic
>>>>>
>>>>> Since g_pLogger is a symbol in vboxdrv.ko, it looks like virtualbox is
>>>>> the trigger.
>>>>>
>>>>> There are no symbols for the virtualbox kmods, possibly because I
>>>>> installed them as an upgrade using packages (built with the same source
>>>>> tree version) instead of by using PORTS_MODULES in make.conf, so ports
>>>>> kgdb didn't have anything useful to say about what happened before the
>>>>> trap.
>>>>>
>>>>> This panic is very repeatable. I just got another one when starting the
>>>>> same VM., but this time the two calls before the trap were
>>>>> null_bug_bypass(). Hmn, that symbol is in nullfs ...
>>>>>
>>>>> I don't see this with a Windows 7 VM.
>>>>>
>>>>> All of the virtualbox kmod files are compiled with -mno-mmx -mno-sse
>>>>> -msoft-float -mno-aes -mno-avx
>>> Your disassemble listed fxrstor instruction that failing, or did I
>>> mis-remembered ? This is most likely some context switch code, either
>>> by virtual machine or erronously executed guest code. It is not a
>>> spontaneous use of FPU, but more likely something different. Can you
>>> confirm ?
>>>
>>> In either case, I do not remember any KBI changes around PCB layout or
>>> fpu_enter() KPI recently.
>>>
>>>> I suspect head packages are quite likely built against the a "wrong" KBI
>>>> and are too fragile to use for kmods vs compiling from ports. :-/ I would
>>>> try a built-from-ports kmod to see if the panics go away.
>>>
>>> FWIW, I will commit the following change shortly. Since third-party
>>> modules break the invariant, either due to bugs (ndis wrappers) or
>>> possibly due to KBI breakage, it is worth to have the detection enabled
>>> for production kernels.
>>
>> Interesting ... I tried running virtualbox on recent 10.3-STABLE with a
>> GENERIC kernel and the guest seemed to operate properly. Then I enabled
>> INVARIANTS and got the panic. I suspect that is why nobody has stumbled
>> across this before.
>>
> This is yet another reason to promote KASSERT to the full panic.
> I expect that the vbox source lacks fpu_kern_enter() calls around the
> FPU state restoration.

Unfortunately, the code is in MI source as it is unnecessary for
supported OSes (read: FreeBSD is not supported) and it's not easy to
inject fpu_kern_enter()/fpu_kern_leave() calls there. :-(

Jung-uk Kim
Re: PORTS_MODULES breakage on HEAD
On 8/7/16 5:44 PM, Don Lewis wrote:
> Adding PORTS_MODULES=emulators/virtualbox-ose-kmod recently broke on
> HEAD. When I do that I get this failure:
>
> ===> Ports module emulators/virtualbox-ose-kmod (all)
> cd ${PORTSDIR:-/usr/ports}/emulators/virtualbox-ose-kmod;
> PATH=/usr/obj/usr/src/
> tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/leg
> acy/bin:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/sbin:/bin:/u
> sr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin SRC_BASE=/usr/src
> OSVERSION=12
> 0 WRKDIRPREFIX=/usr/obj/usr/src/sys/ make -B clean all
> ===> Cleaning for virtualbox-ose-kmod-5.0.26_1
> ===> License GPLv2 accepted by the user
> ===> Found saved configuration for virtualbox-ose-kmod-4.3.34
> ===> virtualbox-ose-kmod-5.0.26_1 depends on file: /usr/local/sbin/pkg -
> found
> ===> Fetching all distfiles required by virtualbox-ose-kmod-5.0.26_1 for
> buildin
> g
> ===> Extracting for virtualbox-ose-kmod-5.0.26_1
> => SHA256 Checksum OK for VirtualBox-5.0.26.tar.bz2.
> ===> Patching for virtualbox-ose-kmod-5.0.26_1
> ===> Applying FreeBSD patches for virtualbox-ose-kmod-5.0.26_1
> ===> virtualbox-ose-kmod-5.0.26_1 depends on executable: kmk - found
> ===> Configuring for virtualbox-ose-kmod-5.0.26_1
> Checking for environment: Determined build machine: freebsd.amd64, target
> machin
> e: freebsd.amd64, OK.
> Checking for kBuild: found, OK.
> Checking for gcc:
> ** cc -target x86_64-unknown-freebsd12.0 --sysroot (variable CC) not found!
> Check
> /usr/obj/usr/src/sys/usr/ports/emulators/virtualbox-ose-kmod/work/VirtualB
> ox-5.0.26/configure.log for details
> ===> Script "configure" failed unexpectedly.
> Please report the problem to v...@freebsd.org [maintainer] and attach the
> "/usr/obj/usr/src/sys//usr/ports/emulators/virtualbox-ose-kmod/work/VirtualBox-5
> .0.26/config.log"
>
>
> It appears that the problem is due to CC being set to:
> cc -target x86_64-unknown-freebsd12.0 --sysroot
> and the Makefile for the port passes this:
> --with-gcc="${CC}"
> to configure. The configure script passes $CC to check_avail, which
> does a -z test on it.
>
> I think that CC should just be set to "cc" and the rest should get added
> to CFLAGS. I suspect this got broken by the recent crossbuild changes.
>

It's a SYSTEM_COMPILER bug. I'll look into fixing it.

For now you can try passing WITHOUT_SYSTEM_COMPILER=yes as a workaround.

--
Regards,
Bryan Drewery
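The failure mode described in the quoted diagnosis (a CC value that carries flags being treated as one program name) can be illustrated with a small Python sketch. This is not the port's configure script or its check_avail function, whose exact logic is not shown in the thread. `split_cc` is a hypothetical helper, and the only input taken from the thread is the CC string itself.

```python
import shlex
import shutil


def split_cc(cc_value):
    """Split a CC value into (compiler, extra_flags)."""
    words = shlex.split(cc_value)
    if not words:
        raise ValueError("CC is empty")
    return words[0], words[1:]


# The CC value reported in the build log above.
cc = "cc -target x86_64-unknown-freebsd12.0 --sysroot"

# Looking up the whole string as a single program name finds nothing,
# because no executable is called "cc -target ... --sysroot".
print(shutil.which(cc))

# Splitting first yields the compiler name plus the words that, as the
# quoted message suggests, would belong in CFLAGS instead.
compiler, flags = split_cc(cc)
print(compiler, flags)
```

This is in line with the suggestion quoted above that CC should be just "cc" with the remaining words moved to CFLAGS.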
Re: Mosh regression between 10.x and 11-stable
On 8/10/16 4:18 AM, Peter Jeremy wrote:
> I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
> r303811 and mosh to that host from my Linux laptop stopped working. All
> I get on the laptop is:
> $ mosh remotehost
> Connection to remotehost closed.
> /usr/bin/mosh: Did not find mosh server startup message.
>
> I've tried rebuilding mosh (and all dependencies) on the host to no avail.

I'm a mosh maintainer. mosh 1.2.5 (from ports) and mosh master (just
last night tagged as 1.2.6, alas) work fine for me on my two 11.0-BETA4
systems, one local and one remote.

> This isn't the DSA change that's been discussed elsewhere: I can SSH from my
> laptop to the host without problem. I can also manually invoke mosh-client
> and mosh-server and it works. Unfortunately, mosh has no provision for
> debugging. I've tried hacking the mosh perl script to make it more verbose
> and that shows that:
> 1) the "MOSH CONNECT" message isn't making it out of the local ssh process.

Do you know if the message is getting out of mosh-server? into sshd?
Do you know if mosh-server is actually running? (It will log utmp
entries on startup.)

Mosh's debugging/logging isn't very good, but 'mosh-server new -v 2>
logfile' does produce some useful info (mostly logging of network traffic).

> 2) it's racy because I can get it from "always fails" to "sometimes works".

How do you get it there?

> My suspicion is that something has changed in either sshd or TCP that
> is resulting in the connection going away before the stdout from the
> remote mosh-server makes it out from the local ssh process.

mosh does 'ssh -t' and uses ptys. That's another potential point where the
message could get dropped.

> I've looked at tcpdump's of both successful and failed SSH sessions
> but don't see anything obviously different (encryption makes it
> difficult to decode the session).
>
> Has anyone else seen this behaviour or have any ideas what might be
> causing it?

Common suspects include issues with shell login/invocation of mosh (are
you making sure it's reachable in /usr/local/bin with $PATH or
'--server=/usr/local/bin/mosh'? are your login shell and its login
scripts unusual?)

On Linux we've had issues with ecryptfs and systemd breaking mosh-server
when the ssh session ends, but I don't think that applies here.

regards,
--jh
Re: Possible zpool online, resilvering issue
On 2016-08-04 07:22, Ultima wrote:
> Hello,
>
> I recently had some issue with a PSU and ran several scrubs on a pool with
> around 35T. Random drives would drop and require a zpool online, this found
> checksum errors. (as expected) However, after all the scrubs I ran, I think
> I may have found a bug with zpool online resilvering process.
>
> 24 disks total, 4 vdevs raidz2 (6 drives each).
>
> Before this next part... I had a backup PSU, however it was also going bad
> and waiting for RMA. The current one seemed to be dieing but ran fine with
> less drives. So I decided I would run the server short 4 drives.
>
> Started by offline(or already removed from psu) 4 drives from different
> vdevs, then ran a scrub to verify everything. Many sum errors were present
> on some of the drives, but this was expected due to faulty psu. Then
> offlined 4 different drives and onlined the other 4 and scrubbed once
> again. After resilver, again, many sum errors on these drives as expected.
>
> After the scrub completed, I decided to offline 4 different drives, then
> online the ones that were out of pool for awhile. During the resilver,
> checksum errors were once again found. I was surprised due to the recent
> scrub, So I decided to run another scrub, and it found even more checksum
> errors on these recently onlined drives. I didn't think much about it,
> however after the replacement PSU arrived, I onlined all the drives out of
> pool and again, resilver had checksum errors as well as another scrub with
> more sum errors.
>
> Is this issue known? Is it common for a scrub to be required after onlining
> a disk that was out of pool for some time?
>
> The drives are ST4000NM0033, and until recent have never had a single
> checksum error in they're lifetime.(at least with zfs)
> FreeBSD S1 12.0-CURRENT FreeBSD 12.0-CURRENT #19 r303224: Sat Jul 23
> 10:41:12 EDT 2016
> root@S1:/usr/src/head/obj/usr/src/head/src/sys/MYKERNEL-NODEBUG
> amd64
>
>
> Sorry for the wall of text, but I hope this helps in tracking down this
> possible bug.
>

Perhaps one or more of the drives is running out of reallocatable sectors?
I once had a case where smartctl showed no issues but zfs scrubbing showed
a defect; some weeks later smartctl was showing some reallocated sectors,
and one week later the HD was out of spare sectors.
Have you already tested every single HD for SMART issues?

--
olli
Re: Possible zpool online, resilvering issue
On 10.08.2016 at 18:53, Ultima wrote:
> Hello,
>
>> I didn't see any reply on the list, so I thought I might let you know
>
> Sorry, never received this reply (till now) xD
>
>> what I assume is happening:
>
>> ZFS never updates data in place, which affects inode updates, e.g. if
>> a file has been read and access times must be updated. (For that reason,
>> many ZFS file systems are configured to ignore access time updates).
>
>> Even if there were only R/O accesses to files in the pool, there will
>> have been updates to the inodes, which were missed by the offlined
>> drives (unless you ignore atime updates).
>
>> But even if there are no access time updates, ZFS might have written
>> new uberblocks and other meta information. Check the POOL history and
>> see if there were any TXGs created during the scrub.
>
>> If you scrub the pooll while it is off-line, it should stay stable
>> (but if any information about the scrub, the offlining of drives etc.
>> is recorded in the pool's history log, differences are to be expected).
>
>> Just my $.02 ...
>
>> Regards, STefan
>
> Thanks for the reply, I'm not completely sure what would be considered a
> TXG. Maintained normal operations during most this noise and this pool
> has quite a bit of activity during normal operations. My zpool history
> looks like it gos on forever and the last scrub is showing it repaired
> 9.48G. That was for all these access time updates? I guess that would be
> a little less then 2.5G per disk worth.
>
> The zpool history looks like it gos on forever (733373 lines). This pool
> has much of this activity with poudriere. All the entries I see are
> clone, destroy, rollback and snapshotting. I can't really say how much
> but at least 500 (prob much more than that) entries between the last two
> scrubs. Atime is off on all datasets.
>
> So to be clear, this is expected behavior with atime=off + TXGs during
> offline time? I had thought that the resilver after onlining the disk
> would bring that disk up-to-date with the pool. I guess my understanding
> was a bit off.

Sorry, you'll have to ask somebody more familiar with ZFS internals
than me. I just wanted to point out that a scrub might change the state
of the drives, even though no file data is modified.

Some 10 GB "repaired" on a 35000 GB pool is not much; it is about what
I'd expect to be required for meta-data.

BTW: The pool history is chronologically sorted, so you need only check
the last few lines (written after the start time of the scrub, or rather
written after offlining some of the disk drives).

Regards, STefan
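Because the history is chronological, picking out only the entries written after a given moment (for instance the time the drives were offlined) is easy to script. The Python sketch below assumes each history line starts with a `YYYY-MM-DD.HH:MM:SS` timestamp followed by the command, which is how `zpool history` normally prints its entries. The function name, the pool name `tank` and the sample lines are invented for illustration.

```python
from datetime import datetime


def entries_since(history_lines, start):
    """Return (timestamp, command) pairs logged at or after `start`."""
    recent = []
    for line in history_lines:
        stamp, _, command = line.strip().partition(" ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S")
        except ValueError:
            continue  # header or blank line without a timestamp
        if when >= start:
            recent.append((when, command))
    return recent


# Hypothetical history excerpt; real output would come from `zpool history`.
sample = [
    "History for 'tank':",
    "2016-08-09.23:59:59 zfs snapshot tank/a@x",
    "2016-08-10.00:00:05 zpool scrub tank",
    "2016-08-10.01:00:00 zfs destroy tank/a@x",
]

for when, command in entries_since(sample, datetime(2016, 8, 10)):
    print(when.isoformat(), command)
```

With a history of several hundred thousand lines, filtering by time like this is more practical than reading the log from the top.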
Re: Possible zpool online, resilvering issue
On 2016-08-10 12:53, Ultima wrote:
> Hello,
>
>> I didn't see any reply on the list, so I thought I might let you know
>
> Sorry, never received this reply (till now) xD
>
>> what I assume is happening:
>
>> ZFS never updates data in place, which affects inode updates, e.g. if
>> a file has been read and access times must be updated. (For that reason,
>> many ZFS file systems are configured to ignore access time updates).
>
>> Even if there were only R/O accesses to files in the pool, there will
>> have been updates to the inodes, which were missed by the offlined
>> drives (unless you ignore atime updates).
>
>> But even if there are no access time updates, ZFS might have written
>> new uberblocks and other meta information. Check the POOL history and
>> see if there were any TXGs created during the scrub.
>
>> If you scrub the pooll while it is off-line, it should stay stable
>> (but if any information about the scrub, the offlining of drives etc.
>> is recorded in the pool's history log, differences are to be expected).
>
>> Just my $.02 ...
>
>> Regards, STefan
>
> Thanks for the reply, I'm not completely sure what would be considered a
> TXG. Maintained normal operations during most this noise and this pool has
> quite a bit of activity during normal operations. My zpool history looks
> like it gos on forever and the last scrub is showing it repaired 9.48G.
> That was for all these access time updates? I guess that would be a little
> less then 2.5G per disk worth.
>
> The zpool history looks like it gos on forever (733373 lines). This pool
> has much of this activity with poudriere. All the entries I see are clone,
> destroy, rollback and snapshotting. I can't really say how much but at
> least 500 (prob much more than that) entries between the last two scrubs.
> Atime is off on all datasets.
>
> So to be clear, this is expected behavior with atime=off + TXGs during
> offline time? I had thought that the resilver after onlining the disk would
> bring that disk up-to-date with the pool. I guess my understanding was a
> bit off.
>
> Ultima
>

A new transaction group (TXG) is created at LEAST every
vfs.zfs.txg.timeout (defaults to 5) seconds.

If you offline a drive for hours or more, it must have all blocks with a
'birth time' newer than the last transaction that was recorded on the
offlined drive replayed to catch that drive up to the other drives in
the pool.

As long as you have enough redundancy, the checksum errors can be
corrected without concern.

In the end, the checksum errors can be written off as being caused by
the bad hardware. After you finish the scrub and everything is OK, do:
'zpool clear poolname', and it will reset all of the error and checksum
counts to 0, so you can track if any more ever show up.

--
Allan Jude
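To put rough numbers on the explanation above, here is a toy Python model. It is not how ZFS implements resilvering. It only takes the two statements at face value (one TXG at least every 5 seconds, and replay of every block born after the last TXG the offlined drive saw). Both function names and the sample TXG numbers are made up.

```python
def min_txgs_while_offline(offline_seconds, txg_timeout=5):
    """Lower bound on TXGs created while a drive was offline."""
    return offline_seconds // txg_timeout


def blocks_to_replay(block_birth_txgs, last_txg_on_drive):
    """Birth TXGs of the blocks the returning drive has not seen yet."""
    return [txg for txg in block_birth_txgs if txg > last_txg_on_drive]


# One hour offline at the default 5 second timeout.
print(min_txgs_while_offline(3600))

# A drive that last saw TXG 150 must catch up on everything newer.
print(blocks_to_replay([100, 150, 200, 250], last_txg_on_drive=150))
```

Even a single hour offline therefore means hundreds of transaction groups to catch up on, which fits the observation that a busy pool leaves a returning drive with a lot to resilver.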
Re: Possible zpool online, resilvering issue
Hello,

> I didn't see any reply on the list, so I thought I might let you know

Sorry, never received this reply (till now) xD

> what I assume is happening:
>
> ZFS never updates data in place, which affects inode updates, e.g. if
> a file has been read and access times must be updated. (For that reason,
> many ZFS file systems are configured to ignore access time updates).
>
> Even if there were only R/O accesses to files in the pool, there will
> have been updates to the inodes, which were missed by the offlined
> drives (unless you ignore atime updates).
>
> But even if there are no access time updates, ZFS might have written
> new uberblocks and other meta information. Check the POOL history and
> see if there were any TXGs created during the scrub.
>
> If you scrub the pool while it is off-line, it should stay stable
> (but if any information about the scrub, the offlining of drives etc.
> is recorded in the pool's history log, differences are to be expected).
>
> Just my $.02 ...
>
> Regards, Stefan

Thanks for the reply, I'm not completely sure what would be considered a
TXG. I maintained normal operations during most of this noise, and this
pool has quite a bit of activity during normal operations. My zpool history
looks like it goes on forever and the last scrub is showing it repaired
9.48G. That was for all these access time updates? I guess that would be a
little less than 2.5G per disk worth.

The zpool history looks like it goes on forever (733373 lines). Much of
this pool's activity comes from poudriere. All the entries I see are clone,
destroy, rollback and snapshot operations. I can't really say how many, but
at least 500 (probably many more) entries between the last two scrubs.
Atime is off on all datasets.

So to be clear, this is expected behavior with atime=off + TXGs during
offline time? I had thought that the resilver after onlining the disk would
bring that disk up-to-date with the pool. I guess my understanding was a
bit off.

Ultima
Re: Signal 12 on make update (or any target in /usrc/src)
On Wed, Aug 10, 2016 at 10:49:40AM -0400, Matteo Riondato wrote:
>
> > On Aug 10, 2016, at 10:41 AM, Konstantin Belousov wrote:
> >
> > On Wed, Aug 10, 2016 at 10:33:23AM -0400, Matteo Riondato wrote:
> >> Hi all,
> >>
> >> I recently upgraded from a late June (pre 11-branch, as far as I can tell)
> >> revision to r303771.
> >>
> >> Now, running “make update” (or buildworld, …) in /usr/src fails with
> >> a signal 12:
> >>
> >> matteo@triton:/usr/src$ sudo make update
> >> Password:
> >> *** Signal 12
> >
> > You did not update, I think. You most likely only updated the kernel,
> > but left the old userspace in place, at least libc.
>
> That would be surprising but it may have happened, as I don’t remember
> for certain having run installworld :/
>
> > Signal 12 is SIGSYS, which means that the program tries to use a syscall
> > not implemented by the kernel. My guess is that your kernel lacks option
> > COMPAT_FREEBSD10, and the failing syscall is pipe(2).
>
> Indeed I do not have COMPAT_FREEBSD10, because I believed my previous world
> revision was >302092, as noted by the entry about pipe(2) in UPDATING.
>
> Any suggestion on how to fix this?
> Boot the old kernel, add COMPAT_FREEBSD10 to kernel config, and
> rebuild/install world and kernel perhaps?

If the old kernel works, then this would allow you to recover.

Otherwise, take libc.so.7 from BETA-4 and put it into /lib, backing up
your current libc first. I suspect this is the easiest route if the old
kernel does not match your world.
Re: Signal 12 on make update (or any target in /usrc/src)
> On Aug 10, 2016, at 10:41 AM, Konstantin Belousov wrote:
>
> On Wed, Aug 10, 2016 at 10:33:23AM -0400, Matteo Riondato wrote:
>> Hi all,
>>
>> I recently upgraded from a late June (pre 11-branch, as far as I can tell)
>> revision to r303771.
>>
>> Now, running “make update” (or buildworld, …) in /usr/src fails with a
>> signal 12:
>>
>> matteo@triton:/usr/src$ sudo make update
>> Password:
>> *** Signal 12
>
> You did not updated, I think. You, most likely, inly updated the kernel,
> but left the old userspace in place, at least libc.

That would be surprising but it may have happened, as I don’t remember
for certain having run installworld :/

> Signal 12 is SIGSYS, which means that the program tries to use a syscall
> not implemented by the kernel. My guess is that your kernel lacks option
> COMPAT_FREEBSD10, and the failing syscall is pipe(2).

Indeed I do not have COMPAT_FREEBSD10, because I believed my previous world
revision was >302092, as noted by the entry about pipe(2) in UPDATING.

Any suggestion on how to fix this?
Boot the old kernel, add COMPAT_FREEBSD10 to kernel config, and
rebuild/install world and kernel perhaps?

Thanks for the help!
Matteo
Re: Signal 12 on make update (or any target in /usrc/src)
On Wed, Aug 10, 2016 at 10:33:23AM -0400, Matteo Riondato wrote:
> Hi all,
>
> I recently upgraded from a late June (pre 11-branch, as far as I can tell)
> revision to r303771.
>
> Now, running “make update” (or buildworld, …) in /usr/src fails with a
> signal 12:
>
> matteo@triton:/usr/src$ sudo make update
> Password:
> *** Signal 12

You did not update, I think. You most likely only updated the kernel,
but left the old userspace in place, at least libc.

Signal 12 is SIGSYS, which means that the program tries to use a syscall
not implemented by the kernel. My guess is that your kernel lacks option
COMPAT_FREEBSD10, and the failing syscall is pipe(2).
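For readers who want to decode a bare "Signal N" message themselves, here is a minimal sketch using Python's standard signal module. The helper names signal_name and describe_exit are made up for this illustration. Signal numbers are platform-specific: 12 is SIGSYS on FreeBSD, but the same number is SIGUSR2 on Linux, so the lookup has to run on the system that produced the message.

```python
import signal


def signal_name(signum: int) -> str:
    """Map a signal number to its symbolic name on the current platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


def describe_exit(returncode: int) -> str:
    """Describe a subprocess-style return code.

    Python's subprocess module reports death by signal N as -N.
    """
    if returncode < 0:
        return f"killed by {signal_name(-returncode)}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # On FreeBSD this prints SIGSYS; on Linux it prints SIGUSR2.
    print("signal 12 here is", signal_name(12))
    print("SIGSYS here is number", int(signal.SIGSYS))
```

Looking the number up this way only names the signal; which syscall triggered it still has to be worked out separately, as in the guess about pipe(2) above.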
Signal 12 on make update (or any target in /usrc/src)
Hi all,

I recently upgraded from a late June (pre 11-branch, as far as I can tell)
revision to r303771.

Now, running “make update” (or buildworld, …) in /usr/src fails with a
signal 12:

matteo@triton:/usr/src$ sudo make update
Password:
*** Signal 12

Stop.
make: stopped in /usr/src
.ERROR_TARGET='update'
.ERROR_META_FILE=''
.MAKE.LEVEL='0'
MAKEFILE=''
.MAKE.MODE='normal'
.CURDIR='/usr/src'
.MAKE='make'
.OBJDIR='/usr/obj/usr/src'
.TARGETS='update'
DESTDIR=''
LD_LIBRARY_PATH=''
MACHINE='amd64'
MACHINE_ARCH='amd64'
MAKEOBJDIRPREFIX='/usr/obj'
MAKESYSPATH='/usr/src/share/mk'
MAKE_VERSION='20160606'
PATH='/sbin:/bin:/usr/sbin:/usr/bin'
SRCTOP='/usr/src'
OBJTOP='/usr/obj/usr/src

Installing ports using “make install” works.

Relevant (?) section of src.conf:
WITH_CCACHE_BUILD=y
WITH_SYSTEM_COMPILER=y

src-env.conf:
WITH_META_MODE=yes

make.conf:
KERNCONF=TRITON
CPUTYPE?=k8-sse3
SVN_UPDATE=y
COPTFLAGS=-O2 -pipe
MALLOC_PRODUCTION=y

Any hints?

Thanks,
Matteo
Mosh regression between 10.x and 11-stable
I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
r303811 and mosh to that host from my Linux laptop stopped working. All
I get on the laptop is:

$ mosh remotehost
Connection to remotehost closed.
/usr/bin/mosh: Did not find mosh server startup message.

I've tried rebuilding mosh (and all dependencies) on the host to no avail.

This isn't the DSA change that's been discussed elsewhere: I can SSH from
my laptop to the host without problem. I can also manually invoke
mosh-client and mosh-server and it works.

Unfortunately, mosh has no provision for debugging. I've tried hacking
the mosh perl script to make it more verbose and that shows that:
1) the "MOSH CONNECT" message isn't making it out of the local ssh process.
2) it's racy because I can get it from "always fails" to "sometimes works".

My suspicion is that something has changed in either sshd or TCP that is
resulting in the connection going away before the stdout from the remote
mosh-server makes it out from the local ssh process. I've looked at
tcpdumps of both successful and failed SSH sessions but don't see
anything obviously different (encryption makes it difficult to decode the
session).

Has anyone else seen this behaviour or have any ideas what might be
causing it?

-- 
Peter Jeremy
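The failure mode described above can be pictured with a small sketch: the local wrapper scans whatever the ssh process printed for the server's startup line, and if the connection closes before that line arrives, nothing is found. The post only names the "MOSH CONNECT" message; the layout assumed here (a UDP port followed by a session key) and the function name find_startup_message are assumptions for illustration, and the key in the usage note is a made-up placeholder.

```python
import re
from typing import Optional, Tuple

# Assumed layout of the startup line: "MOSH CONNECT <port> <key>".
_STARTUP_RE = re.compile(r"^MOSH CONNECT (\d+) (\S+)\s*$", re.MULTILINE)


def find_startup_message(ssh_output: str) -> Optional[Tuple[int, str]]:
    """Return (port, key) if the startup line is present, else None."""
    match = _STARTUP_RE.search(ssh_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    complete = "\r\nMOSH CONNECT 60001 PLACEHOLDERKEY\r\n"
    truncated = "Connection to remotehost closed.\n"
    print(find_startup_message(complete))   # startup line found
    print(find_startup_message(truncated))  # None: line never arrived
```

Saving the raw ssh output from failing and working runs and passing it through a check like this would at least show whether the line is missing entirely or arrives incomplete, which bears on the suspected race.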