from:"Eggert, Lars"

Re: FYI: SVN to GIT converter currently broken, github is falling behind

2015-11-11 Thread Eggert, Lars

Hi,

I just got this error when fetching from remote; related?

[elars@laurel: ~/src] git fetch --all
Fetching origin
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Fetching upstream
remote: Counting objects: 557, done.
remote: Compressing objects: 100% (543/543), done.
remote: Total 557 (delta 213), reused 2 (delta 2), pack-reused 0
Receiving objects: 100% (557/557), 1.15 MiB | 433.00 KiB/s, done.
Resolving deltas: 100% (213/213), completed with 2 local objects.
From github.com:/freebsd/freebsd
   b4eb11a..3eb0ea4  master -> upstream/master
   f147893..9c319c0  stable/10  -> upstream/stable/10
   e901edd..b3c9fd2  stable/8   -> upstream/stable/8
   81ab2b1..2fc7a9a  stable/9   -> upstream/stable/9
   c2c933c..cc76737  svn_head   -> upstream/svn_head
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
error: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log.
Automatic cleanup will not be performed until the file is removed.

fatal: bad object refs/remotes/origin/HEAD
error: failed to run repack

Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
error: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log.
Automatic cleanup will not be performed until the file is removed.

fatal: bad object refs/remotes/origin/HEAD
error: failed to run repack

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: env functionality of config(5) has no effect?

2015-10-22 Thread Eggert, Lars

amd64


> On Oct 22, 2015, at 21:08, Ian Lepore <i...@freebsd.org> wrote:
> 
>> On Wed, 2015-10-21 at 08:09 +, Eggert, Lars wrote:
>> Hi,
>> 
>> I'm trying to include some loader tunables in the kernel, via the
>> "env" functionality described in config(5).
>> 
>> When I look at the compiled kernel binary with strings(1), I see that
>> the tunables are compiled in.
>> 
>> However, they don't seem to take any effect when booting the kernel,
>> and they also don't show up when running kenv(1) after boot.
>> 
>> Any ideas?
>> 
>> Thanks,
>> Lars
> 
> I finally found a few minutes to look into this today.  You didn't say
> what platform you're working with.  It appears that this has only ever
> worked on i386 and a handful of old arm and mips platforms.
> 
> -- Ian
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

env functionality of config(5) has no effect?

2015-10-21 Thread Eggert, Lars

Hi,

I'm trying to include some loader tunables in the kernel, via the "env" 
functionality described in config(5).

When I look at the compiled kernel binary with strings(1), I see that the 
tunables are compiled in.

However, they don't seem to take any effect when booting the kernel, and they 
also don't show up when running kenv(1) after boot.

Any ideas?

Thanks,
Lars



Difference between pkg 1.5.2 and 1.5.4

2015-06-18 Thread Eggert, Lars

Hi,

I'm netbooting with a read-only rootfs. Up until version 1.5.2 of pkg, that 
sometimes caused some errors when installing various packages, but the install 
continued even if some files couldn't be written.

That seems to have changed with 1.5.4. Specifically, upgrading ca_root_nss from 
3.19 to 3.19.1_1 now aborts in archive_read_extract(), as shown below.

This regression makes it difficult to run read-only; any chance this abort can 
be turned into a warning instead?

Lars


Updating FreeBSD repository catalogue...
FreeBSD repository is up-to-date.
All repositories are up-to-date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be UPGRADED:
ca_root_nss: 3.19 -> 3.19.1_1

The process will require 42 B more space.

Proceed with this action? [y/N]: y
[1/1] Upgrading ca_root_nss from 3.19 to 3.19.1_1...
You may need to manually remove /usr/local/etc/ssl/cert.pem if it's no longer needed.
You may need to manually remove /usr/local/openssl/cert.pem if it's no longer needed.
pkg: unlinkat(usr/local/share/licenses/ca_root_nss-3.19/LICENSE): No such file or directory
pkg: unlinkat(usr/local/share/licenses/ca_root_nss-3.19/MPL): No such file or directory
pkg: unlinkat(usr/local/share/licenses/ca_root_nss-3.19/catalog.mk): No such file or directory
[1/1] Extracting ca_root_nss-3.19.1_1:  71%
pkg: archive_read_extract(): Can't create '/etc/ssl/cert.pem.pXkDjkwDtvyq'
[1/1] Extracting ca_root_nss-3.19.1_1: 100%
[1/1] Deleting files for ca_root_nss-3.19.1_1: 100%
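The request above is to report a failed write on a read-only root as a warning instead of aborting. That behaviour can be sketched roughly as follows. This is not pkg's code. `install_file_tolerant()` is a hypothetical helper, and the temporary-file-plus-rename pattern is an assumption based only on the `cert.pem.pXkDjkwDtvyq` name in the error message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not pkg's implementation: install `len` bytes of
 * `data` at `path` by writing a temporary file next to it and renaming it
 * into place. Any failure (for example EROFS on a read-only root) is
 * reported as a warning and returned as -1, so the caller can continue
 * with the remaining files instead of aborting the whole upgrade.
 */
static int
install_file_tolerant(const char *path, const char *data, size_t len)
{
	char tmp[4096];
	ssize_t n;
	int fd, r, cerr;

	assert(path != NULL && (data != NULL || len == 0));
	r = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
	if (r < 0 || (size_t)r >= sizeof(tmp)) {
		fprintf(stderr, "warning: path too long, skipping '%s'\n", path);
		return -1;
	}
	fd = mkstemp(tmp);	/* creates the file with mode 0600 */
	if (fd == -1) {
		fprintf(stderr, "warning: can't create '%s': %s\n", tmp,
		    strerror(errno));
		return -1;
	}
	/* A real installer would also set the file's mode and ownership. */
	n = write(fd, data, len);
	cerr = close(fd);
	if (n != (ssize_t)len || cerr == -1 || rename(tmp, path) == -1) {
		fprintf(stderr, "warning: can't install '%s', skipping\n", path);
		(void)unlink(tmp);	/* best-effort cleanup */
		return -1;
	}
	return 0;
}
```

A caller looping over a package's files could count the -1 returns and summarize them at the end, rather than stopping at the first unwritable file.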



Re: ixl and BOOTP

2015-05-20 Thread Eggert, Lars

On 2015-5-20, at 17:42, Ryan Stone <ryst...@gmail.com> wrote:
> Oh, I bet that you have a bunch of CPUs and ixl is consuming all of your
> interrupt vectors.   Does setting this tunable fix the issue?
>
> hw.ixl.max_queues=1

Yeah, this box has 40 cores, but unfortunately that tunable doesn't change 
things in terms of BOOTP (I do see 2 vectors assigned now instead of 41, though).

Lars
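The vector counts in this exchange are 41 with 40 cores and 2 with `hw.ixl.max_queues=1`. Both fit a simple model: one MSI-X vector per queue plus one extra vector. The arithmetic below is only that model, inferred from those two numbers and not taken from the ixl driver. `ixl_vectors_model()` is hypothetical, and the extra vector being an admin-queue vector is an assumption.

```c
#include <assert.h>

/*
 * Hypothetical model inferred from the numbers in this thread, not from
 * the ixl sources: one MSI-X vector per queue, with the queue count
 * capped by both the CPU count and the max_queues tunable, plus one
 * extra vector (assumed here to serve the admin queue).
 */
static int
ixl_vectors_model(int ncpus, int max_queues)
{
	int queues;

	assert(ncpus > 0 && max_queues > 0);
	queues = ncpus < max_queues ? ncpus : max_queues;
	return queues + 1;
}
```

Under this model, each ixl port on a 40-core machine would want 41 vectors, which makes the vector-exhaustion theory plausible when there are several ports. It does not explain why BOOTP still fails once the cap is 1.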



Re: ixl and BOOTP

2015-05-19 Thread Eggert, Lars

On 2015-5-18, at 19:22, Ryan Stone <ryst...@gmail.com> wrote:
> Hm, I'm unable to reproduce this on the latest -CURRENT (r283059).  My 
> hardware is a little different from yours -- my CPU is a Haswell Xeon, and I 
> have only 1 igb port and no ixgbe.  Also, I was just booting GENERIC.  I 
> didn't have Xen or anything running.

Happens also without Xen. I will dig a bit further. Thanks for testing!

Lars



Re: ixl and BOOTP

2015-05-18 Thread Eggert, Lars

On 2015-5-18, at 16:08, Ryan Stone <ryst...@gmail.com> wrote:
> This is very strange.  I have successfully netbooted -CURRENT in a very
> similar environment (ixl compiled into kernel and booting over igb).  I
> can't remember when the last time I did this but it was probably within the
> last couple of weeks.  I routinely netboot an 8.2 derivative in this kind
> of environment and I've never seen this kind of problem.

It used to work here, too. Something must have broken it recently.

> Could it be related to the size of the kernel, and not ixl specifically?

I don't know. Why would the size of the kernel matter?

> Also, do you have any indication as to where the hang happens?  Is it still
> in the BIOS, or in pxeloader, or in the kernel itself?  Are you booting in
> legacy mode or EFI?

Legacy mode, and it hangs in the kernel.

Without if_ixl in loader.conf, it does the usual BOOTP logic:

ses0 at ahciem0 bus 0 scbus7 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
ada0: <INTEL SSDSC2BW180A3F 400i> ACS-2 ATA SATA 3.x device
ada0: Serial Number CVCV3102050X180EGN
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 171705MB (351651888 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad4
Sending DHCP Discover packet from interface igb0 (00:25:90:9b:73:2e)
Sending DHCP Discover packet from interface igb1 (00:25:90:9b:73:2f)
Sending DHCP Discover packet from interface ix0 (90:e2:ba:77:d4:9c)
Sending DHCP Discover packet from interface ix1 (90:e2:ba:77:d4:9d)
uhub1: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
ugen0.2: <vendor 0x8087> at usbus0
uhub2: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
ugen1.2: <vendor 0x8087> at usbus1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
uhub2: 6 ports with 6 removable, self powered
uhub3: 8 ports with 8 removable, self powered
ugen0.3: <American Megatrends Inc.> at usbus0
ukbd0: <Keyboard Interface> on usbus0
ums0: <Mouse Interface> on usbus0
ums0: 3 buttons and [Z] coordinates ID=0
igb0: link state changed to UP
Received DHCP Offer packet on igb0 from 192.168.0.2 (accepted) (no root path) (boot_file)
Received DHCP Offer packet on igb0 from 192.168.0.2 (ignored) (no root path) (boot_file)
Received DHCP Offer packet on igb0 from 192.168.0.2 (ignored) (no root path) (boot_file)
Sending DHCP Request packet from interface igb0 (00:25:90:9b:73:2e)
Received DHCP Ack packet on igb0 from 192.168.0.2 (accepted) (got root path)
DHCP timeout for interface igb1
DHCP timeout for interface ix0
DHCP timeout for interface ix1
Wired loader interface (IP 192.168.11.1) is igb0
igb0 at 192.168.11.1 server 192.168.0.2 boot file /pxe/pxelinux.0
subnet mask 255.255.0.0 router 192.168.0.2 rootfs 192.168.0.10:/home/elars/dst hostname phobos2
Adjusted interface igb0
Shutdown interface igb1
Shutdown interface ix0
Shutdown interface ix1
...

And then, later on, it mounts the rootfs correctly:

Trying to mount root from nfs: []...
NFS ROOT: 192.168.0.10:/home/elars/dst
Interface igb0 IP-Address 192.168.11.1 Broadcast 192.168.255.255
Setting hostuuid: ----0025909b732e.
Setting hostid: 0xe85d6456.
...

If I enable if_ixl in loader.conf, I don't see any "Sending DHCP Discover 
packet" messages in the log at all, and consequently the NFS mount fails. See 
the attached diff; on the left is a boot without if_ixl, and on the right, with 
if_ixl.

Lars



[Attached: side-by-side diff of the two boot logs; the left column ("ok") is the boot without if_ixl, the right column ("nok") the boot with if_ixl. The diff was garbled in the archive. Recoverable lines:]
(XEN) Xen version 4.6-unstable (r...@netapp.com) (gcc47 (FreeBSD Ports Collection) 4.7.4) debug=y Mon May 18 14:50:17 CEST 2015
(XEN) Bootloader: FreeBSD Loader
(XEN) Command line: dom0_mem=4096M dom0pvh=1 com1=115200,8n1 console=com1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16

ixl and BOOTP

2015-05-18 Thread Eggert, Lars

Hi,

when I have the ixl driver compiled into my -CURRENT kernel (or loaded as a 
module via loader.conf), the boot seems to hang (or silently crash) when BOOTP 
starts bringing up interfaces to send out probes. (I'm not netbooting over an 
ixl; the boot interface is an igb.)

What works is building the kernel without the ixl driver and then loading it 
manually once the system is up. That way, BOOTP via igb succeeds.

Any ideas what could be causing this?

Lars



FreeBSD FUSE calls truncate() on read-only files

2015-02-25 Thread Eggert, Lars

Hi,

this came up when trying to port tup (https://github.com/gittup/tup) to FreeBSD.

Even though we are opening the file read-only with cat, FUSE calls truncate() 
on it, which modifies its mtime, and this screws up tup. See 
https://github.com/gittup/tup/issues/198

Anyone know why FreeBSD's FUSE is doing this?

Thanks,
Lars
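One quick way to check for the behaviour described above is to compare a file's mtime before and after opening and reading it read-only. The sketch below is a hypothetical test helper and is not part of tup or FUSE. It relies only on the POSIX.1-2008 `st_mtim` field and would be pointed at a file on the FUSE mount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical test helper: open `path` read-only, read it to the end
 * (as cat(1) would), and report whether its modification time changed.
 * Returns 1 if the mtime is unchanged, 0 if it changed, -1 on error.
 */
static int
readonly_open_preserves_mtime(const char *path)
{
	struct stat before, after;
	char buf[4096];
	int fd;

	assert(path != NULL);
	if (stat(path, &before) == -1)
		return -1;
	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	while (read(fd, buf, sizeof(buf)) > 0)
		;	/* discard the data; only side effects matter */
	(void)close(fd);
	if (stat(path, &after) == -1)
		return -1;
	return before.st_mtim.tv_sec == after.st_mtim.tv_sec &&
	    before.st_mtim.tv_nsec == after.st_mtim.tv_nsec;
}
```

On a local file system this should return 1. If a read-only open really triggers truncate() on the FUSE mount, it would presumably return 0 there.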



Re: HEADS UP: Upgraded clang, llvm and lldb to 3.5.0

2015-01-07 Thread Eggert, Lars

On 2015-1-7, at 16:28, Garrett Cooper <yaneurab...@gmail.com> wrote:
> Please open a bug and cc hselasky@ on it.

Done: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196597

Lars



Re: HEADS UP: Upgraded clang, llvm and lldb to 3.5.0

2015-01-07 Thread Eggert, Lars

Hi,

On 2014-12-31, at 21:41, Dimitry Andric <d...@freebsd.org> wrote:
> I just committed an upgrade of clang, llvm and lldb to 3.5.0 to head, in
> r276479.

there seem to be issues when building with -DWITH_OFED:

--- contrib/ofed.all__D ---
/usr/home/elars/src/contrib/ofed/usr.bin/opensm/../../management/opensm/opensm/osm_ucast_ftree.c:2996:8: error: taking the absolute value of unsigned type 'unsigned int' has no effect [-Werror,-Wabsolute-value]
if (abs(p_sw->rank - p_remote_sw->rank) != 1) {

Lars
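Clang flags this line because `p_sw->rank - p_remote_sw->rank` is computed in unsigned arithmetic. The difference wraps around instead of going negative, so `abs()` on it looks like a no-op. On common platforms the implicit conversion back to `int` may still give the intended result, but that conversion is implementation-defined. A warning-free way to say "the ranks differ by exactly one" is to avoid forming a negative difference at all. `ranks_adjacent()` is a hypothetical helper, not the fix that was actually committed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the committed fix: true if two unsigned ranks
 * differ by exactly one. Subtracting the smaller value from the larger
 * keeps the arithmetic in range, so neither abs() nor a signed/unsigned
 * conversion is involved.
 */
static bool
ranks_adjacent(unsigned int a, unsigned int b)
{
	unsigned int diff = a > b ? a - b : b - a;

	assert(diff <= (a > b ? a : b));	/* no wrap-around occurred */
	return diff == 1;
}
```

With this helper the check would read `if (!ranks_adjacent(p_sw->rank, p_remote_sw->rank))`. Casting both operands to `int` before subtracting would also silence the warning, provided the ranks are known to fit in an `int`.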



Re: [PATCH] nscd

2014-09-30 Thread Eggert, Lars

Hi,

I've been seeing the same issues with nscd not caching, but unfortunately your 
patch doesn't seem to change things, for better or worse.

My nsswitch.conf looks as follows:

group: cache files nis
hosts: cache files dns
networks: cache files
passwd: cache files nis
shells: files
services: cache files nis
protocols: cache files
rpc: cache files

When I start nscd -n -s -t and then run top in another shell, top takes ~10 
seconds to start up every time; if nscd did its thing, repeat invocations 
should be much faster. nscd doesn't seem to see any activity either, based on 
its log:

[elars@one: ~] sudo nscd -n -s -t
M1 from main: request agents registered successfully
M2 from cache: cache was successfully initialized
M2 from runtime environment: using socket /var/run/nscd
M2 from runtime environment: successfully initialized
M1 from main: working in single-threaded mode
<no further output>

Lars

On 2014-9-30, at 5:40, David Shane Holden <dpej...@yahoo.com> wrote:

> So, I've noticed nscd hasn't worked right for a while now.  Since I
> upgraded to 10.0 it never seemed to cache properly but I never bothered
> to really dig into it until recently and here's what I've found.  In my
> environment I have nsswitch set to use caching and LDAP as such:
>
> group: files cache ldap
> passwd: files cache ldap
>
> The LDAP part works fine, but caching didn't on 10.0 for some reason.
> On my 9.2 machines it works as expected though.  What I've found is in
> usr.sbin/nscd/query.c
>
> struct query_state *
> init_query_state(int sockfd, size_t kevent_watermark, uid_t euid, gid_t
> egid)
> {
>  ...
>   memcpy(&retval->timeout, &s_configuration->query_timeout,
>   sizeof(struct timeval));
>  ...
> }
>
> s_configuration->query_timeout is an 'int' which is being memcpy'd into
> a 'struct timeval' causing it to grab other parts of the s_configuration
> struct along with the query_timeout value and polluting retval->timeout.
> In this case it appears to be grabbing s_configuration->threads_num and
> shoving that into timeout.tv_sec along with the query_timeout. This ends
> up confusing nscd later on (instead of being 8 it ends up being set to
> 34359738376) and breaks its ability to cache.  I've attached a patch to
> set the retval->timeout properly and gets nscd working again.  I'm
> guessing gcc was handling this differently from clang which is why it
> wasn't a problem before 10.0.
> nscd.patch




Re: nscd not caching

2014-08-19 Thread Eggert, Lars

On 2014-8-18, at 20:23, John-Mark Gurney <j...@funkthat.com> wrote:
> Why not run a local slave on your server?

I am trying to get one set up. It requires a change request to our 
organization's IT, which is, ahem, not always lightning fast.

Lars



Re: nscd not caching

2014-08-19 Thread Eggert, Lars

On 2014-8-19, at 13:54, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>
> I know that this a bit late but have you ever considered Hesiod? it uses 
> DNS/txt.
> we have been using it since the days when BSDi had no NIS support and haven’t
> seen a ypserver not responding since :-)

I don't control the master NIS infrastructure, I just want to use it (with a 
server that is unfortunately 25ms away). We will move to LDAP at some time, but 
in the meantime, a functioning nscd would be nice.

Lars



Re: nscd not caching

2014-08-18 Thread Eggert, Lars

Hi,

On 2014-8-17, at 18:10, Adam McDougall <mcdou...@egr.msu.edu> wrote:
> We were using +: type entries in the local password and group
> tables and I believe we used an unmodified /etc/nsswitch.conf (excluding
> cache lines while testing nscd):

I tried that setup too, and it doesn't seem to be caching any NIS lookups 
either.

The current NIS server is 25ms away, which is a pain. I'm trying to get a local 
slave set up, which will make the need for nscd go away, but it would sure be 
nice if it worked in the meantime.

> At our site, we never had enough load to outright require nscd on
> FreeBSD, although there were some areas where caching had a usability
> benefit.

Load is not an issue, latency is (see above).

> top was slow to open since it would load the whole passwd
> table first, but top -u was a workaround.

Right, I see that issue too.

> As a workaround until we retired NIS, I wrote a hack of a script to
> merge NIS groups into my local /etc/group files periodically from cron.
> Aside from bugs in my script, that worked well.

I may end up doing this, too.

Given all this, maybe it's time to retire nscd?

Lars





Re: nscd not caching

2014-08-17 Thread Eggert, Lars

Nobody using nscd? Really?





nscd not caching

2014-08-14 Thread Eggert, Lars

[Resending to current@, since I can't get it to work on -CURRENT either.]

Hi,

anyone have an idea why nscd would not be caching NIS lookups?

My nsswitch.conf looks as follows:

group: cache files nis
hosts: cache files dns
networks: cache files
passwd: cache files nis
shells: files
services: cache files nis
protocols: cache files
rpc: cache files

nisdomain is set and ypbind is started, and I see lots of NIS traffic going in 
and out.

But nothing is cached; running nscd with -t just prints this and then 
nothing, ever:

M1 from main: successfully daemonized
M1 from main: request agents registered successfully
M2 from cache: cache was successfully initialized
M2 from runtime environment: using socket /var/run/nscd
M2 from runtime environment: successfully initialized
M1 from main: thread #0 was successfully created
M1 from main: thread #1 was successfully created
M1 from main: thread #2 was successfully created
M1 from main: thread #3 was successfully created
M1 from main: thread #4 was successfully created
M1 from main: thread #5 was successfully created
M1 from main: thread #6 was successfully created
M1 from main: thread #7 was successfully created

Lars
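A more direct test than watching how long top(1) takes to start is to time two identical lookups back to back. If caching works, the second one should be much cheaper than a network round trip to the NIS server. Below is a minimal sketch. `time_getpwnam_ms()` is a hypothetical helper, and it measures the whole getpwnam(3) path through nsswitch, not nscd in isolation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <time.h>

/*
 * Hypothetical diagnostic helper: time a single getpwnam(3) lookup.
 * Returns the elapsed wall-clock time in milliseconds, or -1.0 if the
 * user was not found. This covers the whole nsswitch path (cache,
 * files, nis), not nscd alone.
 */
static double
time_getpwnam_ms(const char *user)
{
	struct timespec t0, t1;
	struct passwd *pw;

	assert(user != NULL);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	pw = getpwnam(user);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	if (pw == NULL)
		return -1.0;
	return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Call it twice for a user that exists only in NIS and compare the two results. If both take roughly a network round trip, the lookups are not being answered from the cache.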




Re: HEADS UP: Updated llvm/clang to 3.4 in r261991

2014-02-18 Thread Eggert, Lars

Hi,

On 2014-2-16, at 21:06, Dimitry Andric <d...@freebsd.org> wrote:
> I have just upgraded our copy of llvm/clang to 3.4 release, in r261991.

I just did a git pull followed by a buildworld. Shouldn't I have 
version 3.4?

root@six:~ # dmesg | grep clang
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610

root@six:~ # clang -v
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
Target: x86_64-unknown-freebsd11.0
Thread model: posix

root@six:~ # uname -a
FreeBSD six 11.0-CURRENT FreeBSD 11.0-CURRENT #14 df7b691(fas3270): Tue Feb  4 
13:28:37 CET 2014 
el...@stanley.muccbc.hq.netapp.com:/usr/home/elars/obj/usr/home/elars/src/sys/FAS3270
  amd64

Lars



Re: HEADS UP: Updated llvm/clang to 3.4 in r261991

2014-02-18 Thread Eggert, Lars

Disregard - pilot error. (It's not Feb 4...)

Lars





Re: using ConnectX card as Ethernet (mlxen)

2014-01-21 Thread Eggert, Lars

Hi,

On 2014-1-20, at 21:59, John Baldwin <j...@freebsd.org> wrote:
> I believe this should work, yes.  Getting a crashdump or the panic messages 
> would be really helpful in figuring out why it isn't.  Thanks.

I rebuilt the kernel, and see no crashes anymore. So that's good.

But there are a bunch of other issues that maybe someone has some ideas about:


(1) Late attach

The ConnectX-3 attaches very late during the boot process, after the system is 
already in single-user mode. See the attached dmesg; pci17 and pci18 (there are 
two identical cards in this system) first show up as "no driver attached" during 
the PCI bus enumeration. Only after the system is in single-user mode does 
mlx4_core attach to the cards.

That means that, e.g., setting sysctls for these cards in /etc/sysctl.conf or 
configuring their IP addresses via rc.conf is not possible. At the moment, I 
work around this by sleeping in rc.local and then doing the assignments there, 
but that's a hack.

Any clues why these cards attach so late?
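A less fragile alternative to a fixed sleep in rc.local is to poll until the interface actually exists, with an upper bound on the wait. Below is a minimal sketch using if_nametoindex(3). `wait_for_interface()` is a hypothetical helper. Because the device numbers can change (see (2)), the interface name to wait for may not be stable either.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <time.h>

/*
 * Hypothetical helper: poll every 100 ms until the named interface
 * exists or roughly timeout_ms has elapsed. Returns 1 if the interface
 * appeared, 0 on timeout.
 */
static int
wait_for_interface(const char *ifname, int timeout_ms)
{
	const struct timespec step = { 0, 100 * 1000 * 1000 };	/* 100 ms */
	int waited;

	assert(ifname != NULL && timeout_ms >= 0);
	for (waited = 0; ; waited += 100) {
		if (if_nametoindex(ifname) != 0)
			return 1;
		if (waited >= timeout_ms)
			return 0;
		(void)nanosleep(&step, NULL);
	}
}
```

A small boot-time utility built around this could, for example, wait up to 30 s for `mlxen0` before applying its settings, instead of always sleeping a fixed amount.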


(2) Device numbers change

After booting, these cards show up in InfiniBand mode:

ib0: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d1.21
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib1: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d1.22
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib2: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d1
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib3: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d2
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Then I force one into Ethernet mode:

# sysctl sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port1: auto (ib) -> eth

and the device numbers on the ib devices change: ib1 is now ib4, and I have a 
new mlxen0 device.

ib2: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d1
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib3: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d2
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=d05bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
	ether f4:52:14:10:d1:21
	inet6 fe80::f652:14ff:fe10:d121%mlxen0 prefixlen 64 scopeid 0xe
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
ib4: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.4a.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d1.22
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

When I change another port into Ethernet mode:

# sysctl sys.device.mlx4_core0.mlx4_port2=eth
sys.device.mlx4_core0.mlx4_port2: auto (ib) -> eth

device numbers change again. Now mlxen0 disappears and becomes mlxen1, and I 
have a new mlxen2 device:

ib2: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d1
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib3: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
	options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
	lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d2
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mlxen1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=d05bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
	ether f4:52:14:10:d1:21
	inet6 fe80::f652:14ff:fe10:d121%mlxen1 prefixlen 64 scopeid 0xe
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
mlxen2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=d05bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
	ether f4:52:14:10:d1:22
	inet6 fe80::f652:14ff:fe10:d122%mlxen2 prefixlen 64 scopeid 0xf
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier

Changing the other two ports (on the second card) to Ethernet mode 

# sysctl sys.device.mlx4_core1.mlx4_port1=eth
sys.device.mlx4_core1.mlx4_port1: auto (ib) -

Re: using ConnectX card as Ethernet (mlxen)

2014-01-21 Thread Eggert, Lars

On 2014-1-21, at 10:04, Lars Eggert <l...@netapp.com> wrote:
> See the attached dmesg

which I of course forgot to attach (sigh). See below.

Lars

GDB: no debug ports present970
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-CURRENT #8 ab08c30(fas3270)-dirty: Tue Jan 21 09:07:36 CET 2014
    el...@stanley.muccbc.hq.netapp.com:/usr/home/elars/obj/usr/home/elars/src/sys/FAS3270 amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
CPU: Intel(R) Xeon(R) CPU   E5240  @ 3.00GHz (3000.17-MHz K8-class CPU)
  Origin=GenuineIntel  Id=0x1067a  Family=0x6  Model=0x17  Stepping=10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xc0ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,OSXSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
  TSC: P-state invariant, performance statistics
real memory  = 18253611008 (17408 MB)
avail memory = 16599695360 (15830 MB)
MPTable: <NETAPP   SB_XVI  >
Event timer LAPIC quality 400
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
random device not loaded; using insecure entropy
ioapic0: Assuming intbase of 0
ioapic0 <Version 2.0> irqs 0-23 on motherboard
netmap: loaded module
random: <Software, Yarrow> initialized
smbios0: <System Management BIOS> at iomem 0xf6c00-0xf6c1e on motherboard
smbios0: Version: 2.5
cryptosoft0: <software crypto> on motherboard
pcib0: <MPTable Host-PCI bridge> pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
pcib1: <MPTable PCI-PCI bridge> at device 2.0 on pci0
pci1: <PCI bus> on pcib1
cxgbc0: <Carnegie T3 onboard SR KR, 2 ports> mem 0xdd001000-0xdd001fff,0xdc80-0xdcff,0xdd00-0xdd000fff irq 16 at device 0.0 on pci1
cxgbc0: AD8158 0xf=0x3 0x1=0xf
cxgbc0: using MSI-X interrupts (9 vectors)
cxgb0: <Port 0 10GBASE-R> on cxgbc0
cxgb0: Ethernet address: 00:a0:98:30:c2:2a
cxgb1: <Port 1 10GBASE-R> on cxgbc0
cxgb1: Ethernet address: 00:a0:98:30:c2:2b
cxgbc0: Firmware Version 7.11.0
pcib2: <PCI-PCI bridge> at device 3.0 on pci0
pci2: <PCI bus> on pcib2
pcib3: <MPTable PCI-PCI bridge> at device 4.0 on pci0
pci3: <PCI bus> on pcib3
pcib4: <PCI-PCI bridge> mem 0xdd30-0xdd31 irq 16 at device 0.0 on pci3
pci4: <PCI bus> on pcib4
pcib3: unable to route slot 0 INTB
pcib5: <PCI-PCI bridge> irq 16 at device 4.0 on pci4
pci5: <PCI bus> on pcib5
pcib6: <MPTable PCI-PCI bridge> irq 10 at device 5.0 on pci4
pci6: <PCI bus> on pcib6
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> mem 0xdd40-0xdd47,0xdd50-0xdd503fff irq 17 at device 0.0 on pci6
ix0: Using MSIX interrupts with 5 vectors
ix0: Ethernet address: 90:e2:ba:37:d5:b4
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
001.08 [2141] netmap_attach success for ix0
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> mem 0xdd48-0xdd4f,0xdd504000-0xdd507fff irq 18 at device 0.1 on pci6
ix1: Using MSIX interrupts with 5 vectors
ix1: Ethernet address: 90:e2:ba:37:d5:b5
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
001.09 [2141] netmap_attach success for ix1
pcib7: <PCI-PCI bridge> irq 16 at device 8.0 on pci4
pci7: <PCI bus> on pcib7
pcib8: <PCI-PCI bridge> at device 0.0 on pci7
pci8: <PCI bus> on pcib8
pcib9: <MPTable PCI-PCI bridge> at device 0.0 on pci8
pci9: <PCI bus> on pcib9
em0: <Intel(R) PRO/1000 Network Connection 7.3.8> mem 0xdd62-0xdd63,0xdd60-0xdd61 irq 16 at device 0.0 on pci9
em0: Using an MSI interrupt
em0: Ethernet address: 00:1b:21:a8:a5:34
001.10 [2141] netmap_attach success for em0
em1: <Intel(R) PRO/1000 Network Connection 7.3.8> mem 0xdd66-0xdd67,0xdd64-0xdd65 irq 17 at device 0.1 on pci9
em1: Using an MSI interrupt
em1: Ethernet address: 00:1b:21:a8:a5:35
001.11 [2141] netmap_attach success for em1
pcib10: <MPTable PCI-PCI bridge> at device 1.0 on pci8
pci10: <PCI bus> on pcib10
em2: <Intel(R) PRO/1000 Network Connection 7.3.8> mem 0xdd72-0xdd73,0xdd70-0xdd71 irq 17 at device 0.0 on pci10
em2: Using an MSI interrupt
em2: Ethernet address: 00:1b:21:a8:a5:36
001.12 [2141] netmap_attach success for em2
em3: <Intel(R) PRO/1000 Network Connection 7.3.8> mem 0xdd76-0xdd77,0xdd74-0xdd75 irq 18 at device 0.1 on pci10
em3: Using an MSI interrupt
em3: Ethernet address: 00:1b:21:a8:a5:37
001.13 [2141] netmap_attach success for em3
pcib11: <PCI-PCI bridge> at device 5.0 on pci0
pci11: <PCI bus> on pcib11
pcib12: <PCI-PCI bridge> at device 6.0 on pci0
pci12: <PCI bus> on pcib12
pcib0: unable to

Re: using ConnectX card as Ethernet (mlxen)

2014-01-21 Thread Eggert, Lars

Last follow-up: I just saw that there are some additional messages (errors?) on 
the serial console when changing the device from IB to Ethernet. Maybe they 
mean something to someone:

root@one:~ # sysctl sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port1: auto (ib) -> eth
<7>ib0: stopping interface
<7>ib0: downing ib_dev
<7>ib0: stopping multicast thread
<7>ib0: flushing multicast list
<7>qpn 0x48: invalid attribute mask specified  for transition 0 to 6. 
qp_type 4,  attr_mask 0x1\n<4>ib0: Failed to modify QP to ERROR state
<7>ib0: All sends and receives done.
<7>ib0: cleaning up ib_dev
<7>ib0: stopping multicast thread
<7>ib0: flushing multicast list
<7>ib0: Cleanup ipoib connected mode.
<7>ib1: stopping interface
<7>ib1: downing ib_dev
<7>ib1: stopping multicast thread
<7>ib1: flushing multicast list
<7>qpn 0x49: invalid attribute mask specified  for transition 0 to 6. 
qp_type 4,  attr_mask 0x1\n<4>ib1: Failed to modify QP to ERROR state
<7>ib1: All sends and receives done.
<7>ib1: cleaning up ib_dev
<7>ib1: stopping multicast thread
<7>ib1: flushing multicast list
<7>ib1: Cleanup ipoib connected mode.
<6>mlx4_en mlx4_core0: Using 5 tx rings for port:1
<6>mlx4_en mlx4_core0: Defaulting to 4 rx rings for port:1
<6>mlx4_en mlx4_core0: Activating port:1
mlxen0: Ethernet address: f4:52:14:10:d1:21
<4>mlx4_en: mlx4_core0: Port 1: Using 5 TX rings
<4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
<6>mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
<7>ib4: max_srq_sge=31
<7>ib4: max_cm_mtu = 0x1, num_frags=16
ib4: Attached to mlx4_0 port 2
Jan 21 09:21:31 one kernel: mlx4_en: mlx4_core0: Port 1: Using 5 TX rings
Jan 21 09:21:31 one kernel: mlx4_en: mlx4_core0: Port 1: Using 4 RX rings

Lars



Re: using ConnectX card as Ethernet (mlxen)

2014-01-20 Thread Eggert, Lars

Hi,

On 2013-7-9, at 22:08, John Nielsen li...@jnielsen.net wrote:
On Jul 9, 2013, at 9:58 AM, John Baldwin j...@freebsd.org wrote:
 So this was just fixed (finally) in HEAD in r253048.  You can now use the
 sysctls to change this.
 
 I saw the commit. Thanks! I'll give it a try at some point (whenever my 
 schedule and hardware availability align).

is this supposed to work at the moment? When I try, the machine seems to crash:

root@one:~ # sysctl sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port1: auto (ib)
Write failed: Broken pipe
Shared connection to xxx closed.

Unfortunately I don't have serial console access at the moment, so I can't 
access any messages that may have gotten dumped.

The cards in question are:

mlx4_core0@pci0:17:0:0: class=0x028000 card=0x005015b3 chip=0x100315b3 rev=0x00 
hdr=0x00
vendor = 'Mellanox Technologies'
device = 'MT27500 Family [ConnectX-3]'
class  = network

Lars



Re: using ConnectX card as Ethernet (mlxen)

2014-01-20 Thread Eggert, Lars

Hi,

if I leave the mlx4ib device out of the kernel (i.e., only compile in mlxen), 
doing the sysctl switch to Ethernet mode works fine.

Lars

On 2014-1-20, at 13:08, Eggert, Lars l...@netapp.com wrote:

 Hi,
 
 On 2013-7-9, at 22:08, John Nielsen li...@jnielsen.net wrote:
 On Jul 9, 2013, at 9:58 AM, John Baldwin j...@freebsd.org wrote:
 So this was just fixed (finally) in HEAD in r253048.  You can now use the
 sysctls to change this.
 
 I saw the commit. Thanks! I'll give it a try at some point (whenever my 
 schedule and hardware availability align).
 
 is this supposed to work at the moment? When I try, the machine seems to 
 crash:
 
 root@one:~ # sysctl sys.device.mlx4_core0.mlx4_port1=eth
 sys.device.mlx4_core0.mlx4_port1: auto (ib)
 Write failed: Broken pipe
 Shared connection to xxx closed.
 
 Unfortunately I don't have serial console access at the moment, so I can't 
 access any messages that may have gotten dumped.
 
 The cards in question are:
 
 mlx4_core0@pci0:17:0:0:   class=0x028000 card=0x005015b3 chip=0x100315b3 
 rev=0x00 hdr=0x00
vendor = 'Mellanox Technologies'
device = 'MT27500 Family [ConnectX-3]'
class  = network
 
 Lars




nfsd server cache flooded, try to increase nfsrc_floodlevel

2013-08-08 Thread Eggert, Lars

Hi,

every few days or so, my -STABLE NFS server (v3 and v4) gets wedged with a ton 
of "nfsd server cache flooded, try to increase nfsrc_floodlevel" messages 
in the log, and nfsstat shows TCPPeak at 16385. It requires a reboot to 
unwedge; restarting the server does not help.

The clients are (mostly) six -CURRENT NFSv4 boxes that netboot from the server 
and mount all drives from there.

I googled around and saw that others have hit this issue, but I haven't seen 
any resolution posted. I guess I could increase NFSRVCACHE_FLOODLEVEL in the 
source, but I wonder whether I would simply hit the increased value a little 
while later...

Lars



Re: NFSv4 console messages (locks lost etc.)

2013-07-01 Thread Eggert, Lars

Hi,

On Jun 29, 2013, at 2:45, Rick Macklem rmack...@uoguelph.ca wrote:
 Btw, a NFSv4 mounted root fs will not work correctly, because the client
 name is generated from the host uuid, which isn't set when the root fs
 is mounted. I'm not sure what the client would use as its client name,
 but this will definitely break things badly if multiple clients use the
 same name. (And this might explain the lease expiry problem.)

ah, now that explains a lot. Since these are diskless clients, I had set 
hostid_enable="NO" in order to turn off the "/etc/rc: WARNING: could not store 
hostuuid in /etc/hostid" warning.

Turning this back on seems to have fixed the issue.

(It might make sense to have the NFSv4 code throw a warning when the hostid 
isn't set, if it depends on it?)

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: NFSv4 console messages (locks lost etc.)

2013-06-29 Thread Eggert, Lars

Hi,

I should have mentioned that the server is FreeBSD -STABLE running newnfs, and 
the network isn't partitioned (I can access the box over SSH at the same 
time as I see these messages).

They only appear under heavy NFS load (a portmaster build of math/R in this case).

Lars

On Jun 29, 2013, at 2:32, Rick Macklem rmack...@uoguelph.ca wrote:

 Lars Eggert wrote:
 Hi,
 
 on a -CURRENT client, I get quite a number of console messages under
 heavy NFSv4 load, such as:
 
 nfsv4 expired locks lost
 Means the lease expired on the NFSv4 server somehow. Lease
 expiry is bad news and there is no way to recover locks
 lost because of it.
 nfscl: never fnd open
 Usually, opens can be recovered after a lease expiry, but it
 might be broken. Since lease expiry should never happen during
 normal operation (see below), it doesn't get a lot of testing.
 
 nfscl: never fnd open
 nfscl: never fnd open
 nfsv4 expired locks lost
 nfscl: never fnd open
 nfscl: never fnd open
 nfsv4 expired locks lost
 nfsv4 expired locks lost
 nfsv4 expired locks lost
 nfsv4 expired locks lost
 nfsv4 expired locks lost
 nfscl: never fnd open
 
 Can I ignore them? Can I turn them off?
 
 Well, these should never happen during normal, correct operation. The
 nfsv4 expired locks lost implies lease expiry. This should only happen
 when the client is network partitioned from the server for more than
 a lease duration (chosen by the server, but typically about 1 minute).
 The client does a Renew Op every 1/2 lease durations to avoid this. Also,
 any state related operation (open/lock/locku/close/etc) is supposed to
 renew the lease implicitly.
 
 If you are getting network partitions happening, then you really need
 to fix the network.
 
 If not, then if you watch network traffic with something like wireshark
 and see Renew Ops happening at regular intervals, then I can only suggest
 that the server is somehow broken for NFSv4. You should also look for
 NFS4ERR_EXPIRED error replies to operations related to state 
 (open/lock/locku/close).
 That is the server reply which indicates the lease expiry. If the server is
 never returning this, I have no idea how the client would generate the above
 messages, but it does indicate a client NFSv4 bug if that is the case.
 
 Switching all mounts to NFSv3 will get rid of the above, although it is
 not exactly a fix;-)
 
 rick
 
 Thanks,
 Lars
 


Re: NFSv4 console messages (locks lost etc.)

2013-06-29 Thread Eggert, Lars

Thanks, Rick! I will check these on Monday.

Lars

On Jun 29, 2013, at 2:45, Rick Macklem rmack...@uoguelph.ca wrote:

 Lars Eggert wrote:
 Hi,
 
 On Jun 28, 2013, at 16:37, Eggert, Lars l...@netapp.com wrote:
 On Jun 28, 2013, at 16:14, Eggert, Lars l...@netapp.com wrote:
 on a -CURRENT client, I get quite a number of console messages
 under heavy NFSv4 load, such as:
 
 nfsv4 expired locks lost
 nfscl: never fnd open
 The never fnd open message is generated by the NFSv4 client when
 a close can't find an extant open to close. I suspect the open was
 not recovered after lease expiry. Since Close Ops only matter to the
 NFSv4 server, this doesn't imply a problem unless the NFSv4 server
 thinks the client still has an Open (which would not be the case after
 an NFSv4 server expires a lease, since it assumes all state such as opens
 are lost when a lease is expired).
 
 
 actually, not sure if the nfscl message is from an NFSv4 mount
 point or not, because the box mounts root via BOOTP, so with NFSv3
 (or v2?) and some other mounts with NFSv4.
 
 and another data point: the nfscl messages seem to disappear when I
 remove the BOOTP_NFSV3 flag from the kernel. The client hangs that
 made me dig into these messages seem to also disappear, fingers
 crossed.
 
 Hmm, weird, since NFSv3 should never generate these messages. I think
 that a root fs is remounted using the / entry in the /etc/fstab in the
 NFS mounted root fs. Did this entry specify nfsv4 by any chance?
 
 Btw, a NFSv4 mounted root fs will not work correctly, because the client
 name is generated from the host uuid, which isn't set when the root fs
 is mounted. I'm not sure what the client would use as its client name,
 but this will definitely break things badly if multiple clients use the
 same name. (And this might explain the lease expiry problem.)
 
 If the root fs is mounted NFSv3 (or NFSv2) it shouldn't generate the
 messages or have any effect on the NFSv4 client, so I have no idea
 why removing BOOTP_NFSV3 would have any effect on this?
 
 Oh, and if you are using a pretty up to date system, you can use nfsstat -m
 to find out what mount options are actually in use. If nfsv4 is listed
 for your root fs, that is a serious problem that you need to fix.
 
 rick
 
 (I still get a bunch of nfsv4 expired locks lost messages, but no
 hangs.)
 
 Lars
 


NFSv4 console messages (locks lost etc.)

2013-06-28 Thread Eggert, Lars

Hi,

on a -CURRENT client, I get quite a number of console messages under heavy 
NFSv4 load, such as:

nfsv4 expired locks lost
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfsv4 expired locks lost
nfscl: never fnd open
nfscl: never fnd open
nfsv4 expired locks lost
nfsv4 expired locks lost
nfsv4 expired locks lost
nfsv4 expired locks lost
nfsv4 expired locks lost
nfscl: never fnd open

Can I ignore them? Can I turn them off?

Thanks,
Lars

Re: NFSv4 console messages (locks lost etc.)

2013-06-28 Thread Eggert, Lars

Hi,

On Jun 28, 2013, at 16:14, Eggert, Lars l...@netapp.com wrote:
 on a -CURRENT client, I get quite a number of console messages under heavy 
 NFSv4 load, such as:
 
 nfsv4 expired locks lost
 nfscl: never fnd open

actually, I'm not sure whether the nfscl messages come from an NFSv4 mount 
point, because the box mounts root via BOOTP (so with NFSv3, or v2?) and 
some other file systems with NFSv4.

Lars


Re: NFSv4 console messages (locks lost etc.)

2013-06-28 Thread Eggert, Lars

Hi,

On Jun 28, 2013, at 16:37, Eggert, Lars l...@netapp.com wrote:
 On Jun 28, 2013, at 16:14, Eggert, Lars l...@netapp.com wrote:
 on a -CURRENT client, I get quite a number of console messages under heavy 
 NFSv4 load, such as:
 
 nfsv4 expired locks lost
 nfscl: never fnd open
 
 actually, not sure if the nfscl message is from an NFSv4 mount point or 
 not, because the box mounts root via BOOTP, so with NFSv3 (or v2?) and some 
 other mounts with NFSv4.

and another data point: the nfscl messages seem to disappear when I remove 
the BOOTP_NFSV3 flag from the kernel. The client hangs that made me dig into 
these messages also seem to have disappeared, fingers crossed.

(I still get a bunch of nfsv4 expired locks lost messages, but no hangs.)

Lars

reboot hanging?

2013-06-17 Thread Eggert, Lars

Hi,

something changed in the last 2-3 weeks on -CURRENT that causes reboots to hang 
after this line:

Waiting (max 60 seconds) for system process `vnlru' to stop...done

I need to manually cycle the power to reboot.

Any clues?

Thanks,
Lars

Re: ccache issues during buildworld on recent -CURRENT

2013-06-17 Thread Eggert, Lars

Hi,

any further ideas? This issue still exists when building -CURRENT on -STABLE as 
of today.

Thanks,
Lars


On May 23, 2013, at 12:33, Eggert, Lars l...@netapp.com wrote:

 Hi,
 
 On May 22, 2013, at 13:37, Dimitry Andric d...@freebsd.org wrote:
 Can you try to figure out which copy of clang ccache finds and runs?
 
 I enabled CCACHE_LOGFILE, and it seems that it runs /usr/bin/clang:
 
 [2013-05-23T12:25:36.810346 48913] Command line: 
 /usr/local/libexec/ccache/clang 
 --sysroot=/home/elars/obj/usr/home/elars/src/tmp 
 -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M 
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include 
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include
  
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic
  -I. 
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include
  -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS 
 -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER 
 -DCLANG_ENABLE_STATIC_ANALYZER 
 -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 
 -DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= 
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp
  /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/CharInfo.cpp 
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Module.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/ObjCRuntime.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OpenMPKinds.cpp
  /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OperatorPrecedence.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManer.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TargetInfo.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Targets.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TokenKinds.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Version.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/VersionTuple.cpp
 [2013-05-23T12:25:36.810373 48913] Hostname: stanley.muccbc.hq.netapp.com
 [2013-05-23T12:25:36.810380 48913] Working directory: (null)
 [2013-05-23T12:25:36.810399 48913] Failed; falling back to running the real 
 compiler
 [2013-05-23T12:25:36.810405 48913] Executing /usr/bin/clang 
 --sysroot=/home/elars/obj/usr/home/elars/src/tmp 
 -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M 
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include 
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include
  
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic
  -I. 
 -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include
  -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS 
 -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER 
 -DCLANG_ENABLE_STATIC_ANALYZER 
 -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 
 -DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= 
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/CharInfo.cpp 
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp
  
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp
  
 /usr/home

Re: reboot hanging?

2013-06-17 Thread Eggert, Lars

On Jun 17, 2013, at 15:18, Luiz Otavio O Souza lists...@gmail.com wrote:
 It was a change on alq. The fix was committed in r251838.

Awesome! I can confirm it's fixed.

Lars

Re: ccache issues during buildworld on recent -CURRENT

2013-06-17 Thread Eggert, Lars

Hi,

On Jun 17, 2013, at 14:51, Bryan Drewery bdrew...@freebsd.org wrote:
 ccache is known to be broken with clang [1]. I would recommend not using it.

Pity! We do buildworld often enough that that's an inconvenience.

 Sometimes CCACHE_CPP2=1 in make.conf can help. CCACHE_CPP2=1 is
 hardcoded in version 3.1.9_2 of devel/ccache on CURRENT. There are other
 classes of problems that are not fixed in upstream ccache yet.

doesn't seem to help, unfortunately.

 There are a few fixes in the ccache development repository that I have
 been meaning to bring to our devel/ccache port.

I'd be happy to test once there is something to test!

Lars

Re: mounting root from NFS via ROOTDEVNAME

2013-06-03 Thread Eggert, Lars

Hi,

On Jun 3, 2013, at 1:57, Rick Macklem rmack...@uoguelph.ca wrote:
 Cool. Thanks. Would you like to review and/or test the above?

it'd be great if folks would test this a bit. It certainly works for me, but I 
can't say that I have done very thorough testing.

 I'll be happy to commit it if Lars doesn't have a src commit bit. (I've
 seen his posts, but can't remember if he is a committer?)

I'm not a committer.

Lars

Re: mounting root from NFS via ROOTDEVNAME

2013-05-28 Thread Eggert, Lars

Hi,

to conclude this thread, the patch below allows one to specify an NFS rootfs 
via the ROOTDEVNAME kernel option; it is mounted when BOOTP does not 
return a root-path option.

Lars


diff --git a/sys/nfs/bootp_subr.c b/sys/nfs/bootp_subr.c
index 2c57a91..972fb12 100644
--- a/sys/nfs/bootp_subr.c
+++ b/sys/nfs/bootp_subr.c
@@ -45,6 +45,7 @@ __FBSDID("$FreeBSD$");
 
 #include "opt_bootp.h"
 #include "opt_nfs.h"
+#include "opt_rootdevname.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -870,8 +871,20 @@ bootpc_call(struct bootpc_globalcontext *gctx, struct thread *td)
 					rtimo = time_second +
 						BOOTP_SETTLE_DELAY;
 					printf(" (got root path)");
-				} else
+				} else {
 					printf(" (no root path)");
+#ifdef ROOTDEVNAME
+					/*
+					 * If we'll mount rootfs from
+					 * ROOTDEVNAME, we can accept
+					 * offers without root paths.
+					 */
+					gotrootpath = 1;
+					rtimo = time_second +
+						BOOTP_SETTLE_DELAY;
+					printf(" (ROOTDEVNAME)");
+#endif
+				}
 				printf("\n");
 			}
 		} /* while secs */
@@ -1440,6 +1453,16 @@ bootpc_decode_reply(struct nfsv3_diskless *nd, struct bootpc_ifcontext *ifctx,
 
 	p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen,
 		       TAG_ROOT);
+#ifdef ROOTDEVNAME
+	/*
+	 * If there was no root path in BOOTP, use the one in ROOTDEVNAME.
+	 */
+	if (p == NULL) {
+		p = strdup(ROOTDEVNAME, M_TEMP);
+		if (strcmp(strsep(&p, ":"), "nfs") != 0)
+			panic("ROOTDEVNAME is not an NFS mount point");
+	}
+#endif
 	if (p != NULL) {
 		if (gctx->setrootfs != NULL) {
 			printf("rootfs %s (ignored) ", p);


port building warnings about compat.ia32.maxvmem

2013-05-27 Thread Eggert, Lars

Hi,

when I try to build ports on -CURRENT, I've been seeing tons of these messages 
for the past week or so:

make: "/usr/ports/Mk/bsd.port.mk" line 1633: warning: Couldn't read shell's 
output for "if /sbin/sysctl -n compat.ia32.maxvmem >/dev/null 2>&1; then echo 
YES; fi"

The amd64 kernel I'm building this on doesn't have COMPAT_FREEBSD32 defined.

Lars

Re: ccache issues during buildworld on recent -CURRENT

2013-05-23 Thread Eggert, Lars

Hi,

On May 22, 2013, at 13:37, Dimitry Andric d...@freebsd.org wrote:
 Can you try to figure out which copy of clang ccache finds and runs?

I enabled CCACHE_LOGFILE, and it seems that it runs /usr/bin/clang:

[2013-05-23T12:25:36.810346 48913] Command line: 
/usr/local/libexec/ccache/clang 
--sysroot=/home/elars/obj/usr/home/elars/src/tmp 
-B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include
 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic
 -I. 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include
 -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS 
-D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER 
-DCLANG_ENABLE_STATIC_ANALYZER 
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 
-DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/CharInfo.cpp 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Module.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/ObjCRuntime.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OpenMPKinds.cpp
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OperatorPrecedence.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManer.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TargetInfo.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Targets.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TokenKinds.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Version.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/VersionTuple.cpp
[2013-05-23T12:25:36.810373 48913] Hostname: stanley.muccbc.hq.netapp.com
[2013-05-23T12:25:36.810380 48913] Working directory: (null)
[2013-05-23T12:25:36.810399 48913] Failed; falling back to running the real 
compiler
[2013-05-23T12:25:36.810405 48913] Executing /usr/bin/clang 
--sysroot=/home/elars/obj/usr/home/elars/src/tmp 
-B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include
 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic
 -I. 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include
 -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS 
-D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER 
-DCLANG_ENABLE_STATIC_ANALYZER 
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 
-DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/CharInfo.cpp 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp

ccache issues during buildworld on recent -CURRENT

2013-05-22 Thread Eggert, Lars

Hi,

my buildworlds using ccache have recently begun failing with the message below. 
Buildworld without ccache works fine. Any ideas?

CC='/usr/local/libexec/ccache/world/clang 
--sysroot=/home/elars/obj/usr/home/elars/src/tmp 
-B/home/elars/obj/usr/home/elars/src/tmp/usr/bin' mkdep -f .depend -a
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include
 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic
 -I. 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include
 -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS 
-D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER 
-DCLANG_ENABLE_STATIC_ANALYZER 
-DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd10.0\" 
-DLLVM_HOSTTRIPLE=\"x86_64-unknown-freebsd10.0\" -DDEFAULT_SYSROOT=\"\"
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/CharInfo.cpp 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Module.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/ObjCRuntime.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OpenMPKinds.cpp
 /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OperatorPrecedence.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TargetInfo.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Targets.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TokenKinds.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Version.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/VersionTuple.cpp
 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp:1100:10:
 fatal error: 'emmintrin.h' file not found
#include emmintrin.h
 ^
1 error generated.
mkdep: compile failed

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: newnfs pkgng database corruption?

2013-04-23 Thread Eggert, Lars

Hi,

On Apr 22, 2013, at 2:56, Baptiste Daroussin <b...@freebsd.org> wrote:
> As anyone been able to test this patch?

I've been running with it for a few days. I've done a reinstall of all ports 
plus a few portmaster -a runs without pkgng database corruption. I've not 
tested it for very long, but so far, things look good.

Lars

Re: newnfs pkgng database corruption?

2013-04-12 Thread Eggert, Lars

Hi,

On Apr 12, 2013, at 1:10, Rick Macklem <rmack...@uoguelph.ca> wrote:
> Well, I have no idea why an NFS server would reply errno 70 if the file
> still exists, unless the client has somehow sent a bogus file handle
> to the server. (I am not aware of any client bug that might do that. I
> am almost suspicious that there might be a memory problem or something
> that corrupts bits in the network layer. Do you have TSO enabled for your
> network interface by any chance? If so, I'd try disabling that on the
> network interface. Same goes for checksum offload.)
> 
> rick
> ps: If you can capture packets between the client and server at the
>     time this error occurs, looking at them in wireshark might be
>     useful?

I will try all of those things.

But first, a question that someone who understands pkgng will be able to 
answer: Is this fake-pkg process even running on the NFS mount? The WRKDIR 
is /tmp, which is an mfs mount.

Lars

Re: newnfs pkgng database corruption?

2013-04-11 Thread Eggert, Lars

Hi,

On Apr 11, 2013, at 10:30, Baptiste Daroussin <b...@freebsd.org> wrote:
> First, I think you can recover your database.

that would be great.

> Can you try the following command:
> 
> # mv /var/db/pkg/local.sqlite /var/db/pkg/backup.sqlite
> # echo '.dump' | pkg shell /var/db/pkg/backup.sqlite | pkg shell

That step doesn't quite work:

[root@stanley /usr/home/elars/local/db]# echo '.dump' | pkg shell backup.sqlite | pkg shell
Error: near line 15927: column path is not unique
Error: near line 15928: column path is not unique
Error: near line 15929: column path is not unique
Error: near line 15930: column path is not unique
Error: near line 15931: column path is not unique
Error: near line 15932: column path is not unique
Error: near line 15933: column path is not unique
Error: near line 15934: column path is not unique
Error: near line 15935: column path is not unique
Error: near line 15936: column path is not unique
Error: near line 15937: column path is not unique

[root@stanley /usr/home/elars/local/db]# ll local.sqlite 
-rw-r--r--  1 root  wheel  0 Apr 11 10:42 local.sqlite

I can send you the database off-list, if you like.
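Why the reload reports "column path is not unique" and keeps going, rather than failing outright, can be sketched with Python's standard sqlite3 module. The sketch assumes (hypothetically) that the damaged database still holds duplicate rows for a column declared UNIQUE or PRIMARY KEY, which its broken indexes no longer enforce; replaying the dump into a fresh database then rejects each duplicate INSERT while the other statements still run, much as the sqlite3 shell reports and skips failing statements read from a pipe unless it is started with -bail. The table and rows are a simplified illustration, not pkg's real schema, and newer SQLite versions word the error as "UNIQUE constraint failed" instead.

```python
import sqlite3


def replay_dump(statements, dst):
    """Execute SQL dump statements one by one, continuing past failures.

    Returns (statement, error message) pairs for every statement that was
    rejected, e.g. an INSERT that violates a UNIQUE/PRIMARY KEY constraint.
    """
    rejected = []
    for stmt in statements:
        try:
            dst.execute(stmt)
        except sqlite3.Error as exc:
            rejected.append((stmt, str(exc)))
    return rejected


if __name__ == "__main__":
    # Hypothetical miniature dump in which the same path was recorded twice.
    # For a real file the statements could come from
    # sqlite3.connect("backup.sqlite").iterdump(), although reading a badly
    # damaged database may itself raise sqlite3.DatabaseError.
    dump = [
        "BEGIN TRANSACTION;",
        "CREATE TABLE files (path TEXT PRIMARY KEY, package_id INTEGER);",
        "INSERT INTO files VALUES('/usr/local/sbin/pkg', 1);",
        "INSERT INTO files VALUES('/usr/local/sbin/pkg', 2);",
        "COMMIT;",
    ]
    # isolation_level=None: the dump's own BEGIN/COMMIT control the transaction.
    dst = sqlite3.connect(":memory:", isolation_level=None)
    for stmt, err in replay_dump(dump, dst):
        print("rejected:", stmt, "->", err)
    print(dst.execute("SELECT count(*) FROM files").fetchone()[0], "row(s) restored")
```

The duplicate is dropped and the first copy survives, so the restored database is consistent but may silently miss whichever row lost the race.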

> I think the corruption you get are due to the synchronous pragma. I need to
> dig in that direction.
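For readers unfamiliar with the setting mentioned above: PRAGMA synchronous controls how often SQLite waits for written data to reach stable storage before continuing. With OFF, an operating-system crash or power loss can leave the database corrupted; FULL is the conservative choice. The setting applies per connection and is not stored in the database file. The Python sketch below only shows how to set and read it back; which mode pkg actually uses is not stated in this thread, and the helper names are hypothetical.

```python
import sqlite3

# Numeric values reported by "PRAGMA synchronous" (per the SQLite documentation).
SYNC_NAMES = {0: "OFF", 1: "NORMAL", 2: "FULL", 3: "EXTRA"}


def get_synchronous(conn):
    """Return the current synchronous mode of this connection as a name."""
    return SYNC_NAMES[conn.execute("PRAGMA synchronous").fetchone()[0]]


def set_synchronous(conn, mode):
    """Set PRAGMA synchronous on this connection and return the mode read back."""
    if mode not in ("OFF", "NORMAL", "FULL", "EXTRA"):
        raise ValueError(f"unsupported synchronous mode: {mode!r}")
    # Safe to interpolate: mode has been checked against a fixed list above.
    conn.execute(f"PRAGMA synchronous = {mode}")
    return get_synchronous(conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("default:", get_synchronous(conn))
    print("now:", set_synchronous(conn, "FULL"))
```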

Thanks for looking into this!

Lars

newnfs pkgng database corruption?

2013-04-10 Thread Eggert, Lars

Hi,

on a diskless server, I keep the ports tree and pkgng databases on a newnfs 
NFSv4 mount. After a bunch of portmaster -a runs, the pkgng sqlite database 
appears to get corrupted. For example, when I try to update an existing port, 
this happens:

root@five:~ # portmaster ports-mgmt/pkg
...
===>   Registering installation for pkg-1.0.11
Installing pkg-1.0.11...pkg: sqlite: database disk image is malformed (pkgdb.c:925)
pkg: sqlite: database disk image is malformed (pkgdb.c:1914)
*** [fake-pkg] Error code 70

I have removed all ports and the pkgng databases and reinstalled, but the 
corruption seems to return after a few days or weeks of installing and 
deinstalling ports.

On another system that has a disk, that corruption of the pkgng database has 
not happened over six months or so. I therefore wonder if storing the sqlite 
database on an NFS-mount is triggering some sort of bug, either in pkgng or in 
newnfs. AFAIK, pkgng is using locks on the database quite liberally, could that 
be where a bug is lurking?

I'm happy to help debug this, but someone would need to let me know what to try.

Thanks,
Lars

Re: newnfs pkgng database corruption?

2013-04-10 Thread Eggert, Lars

Hi,

On Apr 10, 2013, at 10:02, Baptiste Daroussin <b...@freebsd.org> wrote:
> This can usually happen when a user do not have the nfs lock system started.
> Are you sure that nfs lock is correctly started?

with NFSv4, the locking system is integrated with the main protocol, it's no 
longer separate.

> If that is the case, there is anyway a bug in pkgng that should catch the
> problem and refuse to operate in such situation, I know sqlite to provide a
> mechanism that allow us to be able to catch this, I'm not sure yet to use it.

Not sure about that.

In case anyone wonders, the corruption is quite substantial:

[elars@stanley ~]$ sqlite3 local/db/local.sqlite 
SQLite version 3.7.14.1 2012-10-04 19:37:12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> PRAGMA integrity_check;
*** in database main ***
On tree page 1238 cell 17: 2nd reference to page 1237
On tree page 1238 cell 17: Child page depth differs
On tree page 1238 cell 18: Child page depth differs
On tree page 1241 cell 6: Rowid 17518 out of order (max larger than parent max of 12550)
On tree page 1242 cell 3: Rowid 17566 out of order (max larger than parent max of 12557)
On tree page 1243 cell 6: Rowid 12558 out of order (min less than parent min of 17566)
On tree page 2867 cell 28: 2nd reference to page 1241
On tree page 2867 cell 28: Child page depth differs
On tree page 2867 cell 29: 2nd reference to page 1242
On tree page 2867 cell 30: Child page depth differs
On tree page 1417 cell 66: 2nd reference to page 1239
On tree page 1417 cell 66: Child page depth differs
On tree page 1417 cell 67: 2nd reference to page 1240
On tree page 1417 cell 68: Child page depth differs
rowid 62 missing from index sqlite_autoindex_packages_1
wrong # of entries in index sqlite_autoindex_packages_1
rowid 96 missing from index scripts_package_id
rowid 96 missing from index sqlite_autoindex_scripts_1
rowid 97 missing from index scripts_package_id
rowid 97 missing from index sqlite_autoindex_scripts_1
rowid 98 missing from index scripts_package_id
rowid 98 missing from index sqlite_autoindex_scripts_1
wrong # of entries in index scripts_package_id
wrong # of entries in index sqlite_autoindex_scripts_1
rowid 12509 missing from index sqlite_autoindex_files_1
rowid 12510 missing from index sqlite_autoindex_files_1
rowid 12511 missing from index sqlite_autoindex_files_1
rowid 12512 missing from index sqlite_autoindex_files_1
rowid 86 missing from index files_package_id
rowid 86 missing from index sqlite_autoindex_files_1
rowid 87 missing from index files_package_id
rowid 87 missing from index sqlite_autoindex_files_1
Error: database disk image is malformed
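The damage is easier to see when the "rowid N missing from index X" messages are grouped by index. A small sketch using Python's standard sqlite3, re and collections modules: it runs PRAGMA integrity_check (which returns the single row "ok" for a healthy database) and counts those messages per index. The regular expression is written against the wording in the SQLite 3.7.14 output above; other versions may phrase messages differently, and the function names are hypothetical.

```python
import re
import sqlite3
from collections import Counter

# Matches messages such as "rowid 96 missing from index scripts_package_id".
MISSING_ROW = re.compile(r"rowid (\d+) missing from index (\S+)")


def integrity_messages(conn):
    """Run PRAGMA integrity_check and return its messages (["ok"] if healthy)."""
    messages = []
    for (text,) in conn.execute("PRAGMA integrity_check"):
        # A single result row may hold several newline-separated messages.
        messages.extend(text.splitlines())
    return messages


def missing_rows_by_index(messages):
    """Count the 'rowid N missing from index X' messages for each index X."""
    counts = Counter()
    for msg in messages:
        match = MISSING_ROW.search(msg)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    healthy = sqlite3.connect(":memory:")
    healthy.execute("CREATE TABLE files (path TEXT PRIMARY KEY)")
    print(integrity_messages(healthy))
    excerpt = [  # lines copied from the integrity_check output above
        "rowid 96 missing from index scripts_package_id",
        "rowid 96 missing from index sqlite_autoindex_scripts_1",
        "rowid 12509 missing from index sqlite_autoindex_files_1",
        "wrong # of entries in index scripts_package_id",
    ]
    print(missing_rows_by_index(excerpt))
```

Applied to the full output above, every affected index belongs to the packages, scripts or files tables, with sqlite_autoindex_files_1 (the index behind the unique path column) hit hardest.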

Lars

Re: newnfs pkgng database corruption?

2013-04-10 Thread Eggert, Lars

Hi,

On Apr 11, 2013, at 1:28, Rick Macklem <rmack...@uoguelph.ca> wrote:
> Error code 70 is ESTALE (or NFSERR_STALE, if you prefer). The server
> replies with that when the file no longer exists.
> 
> File locking doesn't stop a file from being removed, as far as I know.

but the file is still there.

Lars

Re: newnfs pkgng database corruption?

2013-04-10 Thread Eggert, Lars

Hi,

On Apr 11, 2013, at 0:16, Baptiste Daroussin <b...@freebsd.org> wrote:
> Will you be able to test it?

yes. (But I will be traveling for the next two weeks and so the turnaround may 
be a bit longer than normal.)

Lars

Re: NewNFS vs. oldNFS for 10.0?

2013-03-15 Thread Eggert, Lars

Hi,

this reminds me that I ran into an issue lately with the new NFS and locking 
for NFSv3 mounts on a client that ran -CURRENT and a server that ran -STABLE.

When I ran portmaster -a on the client, which mounted /usr/ports and 
/usr/local, as well as the location of the respective sqlite databases over 
NFSv3, the client network stack became unresponsive on all interfaces for 30 or 
so seconds and e.g. SSH connections broke. The serial console remained active 
throughout, and the system didn't crash. About a minute after the wedgie I 
could SSH into the box again, too.

The issue went away when I killed lockd on the client, but that caused the 
sqlite database to become corrupted over time. The workaround for me was to 
move to NFSv4, which has been working fine. (One more reason to make it the 
default...)
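The locking involved here is POSIX advisory byte-range locking: on Unix, SQLite takes its database locks with fcntl(), and on an NFSv3 mount the client passes such lock requests to the server through the separate NLM lock manager (rpc.lockd, together with rpc.statd), whereas NFSv4 carries locking in the protocol itself. The Python sketch below (standard fcntl and os modules, Unix only) illustrates the property the database relies on: while one process holds an exclusive lock on a byte range, another process's non-blocking request for the same range is refused. The temporary file, byte range and helper names are arbitrary; this is not how pkg or SQLite actually invokes locking.

```python
import fcntl
import os
import tempfile


def try_lock(fd, start=0, length=1):
    """Try a non-blocking exclusive (write) lock on bytes [start, start+length)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
    except OSError:  # EACCES or EAGAIN: another process holds a conflicting lock
        return False
    return True


def unlock(fd, start=0, length=1):
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)


def locked_by_someone_else(path, start=0, length=1):
    """Probe the range from a forked child: POSIX locks never conflict with
    other locks held by the same process, so the probe must be another process."""
    pid = os.fork()
    if pid == 0:  # child process
        code = 2
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if try_lock(fd, start, length) else 1
        finally:
            os._exit(code)  # exiting also releases any lock the child took
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        print("parent locked:", try_lock(fd))
        print("child refused:", locked_by_someone_else(tmp.name))
        unlock(fd)
        print("child refused after unlock:", locked_by_someone_else(tmp.name))
        os.close(fd)
```

On a network file system this guarantee only holds if the client actually forwards the lock to the server; locks kept purely local to one client cannot protect a database shared with other machines.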

I'm not really sure how to debug this further, but would be willing to work 
with someone off-list who'd tell me what tests to run.

Lars

Re: serial console not accepting input?

2013-03-05 Thread Eggert, Lars

On Mar 4, 2013, at 20:59, Doug Ambrisko <ambri...@ambrisko.com> wrote:
> Try to do a {Ctrl}D to see if works.  We've seen that the TX on reset
> hangs but input works fine.  I'm not sure if we ran into this with
> uart(4) but had a problem with sio(4).

No change.

Lars

Re: Dtrace: Module is no longer loaded

2013-02-21 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 16:03, Andriy Gapon <a...@freebsd.org> wrote:
> Couple of thoughts:
> - is your kernel installed in the typical location?

yup.

> - what does the following produce?
> readelf -a -W /boot/kernel/kernel | fgrep shstrtab
> readelf -a -W /boot/kernel/kernel | fgrep SUNW_ctf

# readelf -a -W /boot/kernel/kernel | fgrep shstrtab
  [24] .shstrtab STRTAB   7975ee 000124 00  0   0  1

# readelf -a -W /boot/kernel/kernel | fgrep SUNW_ctf
  [22] .SUNW_ctf PROGBITS 74ed68 048872 00  0   0  4

And then:

# dtrace -n 'syscall:::'
dtrace: invalid probe specifier syscall /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded

Lars

Re: system 20% busy at all times?

2013-02-20 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 17:58, Adrian Chadd <adr...@freebsd.org> wrote:
> Try top -HS .. to try and break down the kernel threads.

ACPI is eating the cycles, according to top:

0 root 80 0K   496K -   2   1:13 27.88% kernel{acpi_task_2}
0 root 80 0K   496K -   0   1:13 25.68% kernel{acpi_task_1}
0 root 80 0K   496K CPU11   1:07 23.68% kernel{acpi_task_0}
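A quick sanity check of these numbers: top -HS reports each thread's share of a single CPU, and the dmesg in this thread shows 4 CPUs, so the three ACPI task threads together occupy roughly

$$\frac{27.88\% + 25.68\% + 23.68\%}{4} = \frac{77.24\%}{4} \approx 19.3\%$$

of the machine's total CPU capacity, in line with the ~20% system time reported at the start of the thread.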

I got an off-list hint that the machine in question requires device mptable 
instead of relying on ACPI. I will try that.

As for dtrace, a complete buildworld/installworld cycle didn't change things, I 
still get:

# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: invalid probe specifier syscall:::entry { @num[execname] = count(); }: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded

Thanks for all the help!

Lars

system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

I have a system running -CURRENT that in top(1) is showing ~20% CPU usage for 
the system at all times. Any ideas what could be causing this, or how I would 
go about diagnosing this further? Nothing in the logs.

Thanks,
Lars

PS: dmesg attached, in case it helps:

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-CURRENT #11 r+2fc9b3d: Tue Feb 12 19:32:15 CET 2013
    el...@stanley.muccbc.hq.netapp.com:/home/elars/obj/usr/home/elars/src/sys/FAS3270 amd64
FreeBSD clang version 3.2 (tags/RELEASE_32/final 170710) 20121221
CPU: Intel(R) Xeon(R) CPU   E5240  @ 3.00GHz (3000.17-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Family = 0x6  Model = 0x17  Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xc0ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,OSXSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 18253611008 (17408 MB)
avail memory = 16526143488 (15760 MB)
Event timer LAPIC quality 400
ACPI APIC Table: PTLTD  CARNEGIE
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0 Version 2.0 irqs 0-23 on motherboard
kbd0 at kbdmux0
ctl: CAM Target Layer loaded
smbios0: System Management BIOS at iomem 0xf6c00-0xf6c1e on motherboard
smbios0: Version: 2.5
cryptosoft0: software crypto on motherboard
acpi0: PTLTD CARNEGIE on motherboard
acpi0: Power Button (fixed)
cpu0: ACPI CPU on acpi0
ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560)
ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._PDC] (Node 0xfe0007630c40), AE_NOT_FOUND (20130117/psparse-560)
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0
Event timer RTC frequency 32768 Hz quality 0
attimer0: AT timer port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter i8254 frequency 1193182 Hz quality 0
Event timer i8254 frequency 1193182 Hz quality 100
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: network, ethernet at device 0.0 (no driver attached)
pcib2: PCI-PCI bridge at device 3.0 on pci0
pci2: PCI bus on pcib2
pcib3: ACPI PCI-PCI bridge at device 4.0 on pci0
pci3: ACPI PCI bus on pcib3
pcib4: ACPI PCI-PCI bridge mem 0xdeb0-0xdeb1 irq 16 at device 0.0 on pci3
pci4: ACPI PCI bus on pcib4
pcib4: no PRT entry for 4.4.INTA
pcib4: no PRT entry for 4.5.INTA
pcib4: no PRT entry for 4.8.INTA
pcib5: PCI-PCI bridge irq 5 at device 4.0 on pci4
pci5: PCI bus on pcib5
pcib6: PCI-PCI bridge irq 10 at device 5.0 on pci4
pci6: PCI bus on pcib6
pcib4: no PRT entry for 4.5.INTA
pcib4: no PRT entry for 4.5.INTB
ix0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.0 mem 0xdec0-0xdec7,0xded0-0xded03fff irq 10 at device 0.0 on pci6
ix0: Using MSIX interrupts with 5 vectors
ix0: Ethernet address: 90:e2:ba:2b:3b:6c
ix0: PCI Express Bus: Speed 5.0Gb/s Width x8
ix1: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.0 mem 0xdec8-0xdecf,0xded04000-0xded07fff irq 11 at device 0.1 on pci6
ix1: Using MSIX interrupts with 5 vectors
ix1: Ethernet address: 90:e2:ba:2b:3b:6d
ix1: PCI Express Bus: Speed 5.0Gb/s Width x8
pcib7: ACPI PCI-PCI bridge irq 5 at device 8.0 on pci4
pci7: ACPI PCI bus on pcib7
pcib8: PCI-PCI bridge at device 5.0 on pci0
pci8: PCI bus on pcib8
pcib9: PCI-PCI bridge at device 6.0 on pci0
pci9: PCI bus on pcib9
pcib10: PCI-PCI bridge mem 0xdee0-0xdee1 irq 16 at device 0.0 on pci9
pci10: PCI bus on pcib10
pcib11: PCI-PCI bridge irq 16 at device 0.0 on pci10
pci11: PCI bus on pcib11
pcib12: PCI-PCI bridge mem 0xdef0-0xdef1 irq 16 at device 0.0 on pci11
pci12: PCI bus on pcib12
pcib13: PCI-PCI bridge irq 17 at device 1.0 on pci12
pci13: PCI bus on pcib13
pcib14: PCI-PCI bridge irq 16 at device 4.0 on pci12
pci14: PCI bus on pcib14
pcib15: PCI-PCI bridge irq 17 at device 5.0 on pci12

Re: system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 10:40, Fleuriot Damien <m...@my.gd> wrote:
> What about reviewing top(1) ?

top shows the ~20% I mentioned:

last pid:  3176;  load averages:  0.79,  0.80,  0.84
 up 0+14:49:49  09:43:51
17 processes:  1 running, 16 sleeping
CPU:  0.0% user,  0.0% nice, 18.7% system,  0.0% interrupt, 81.3% idle
Mem: 32M Active, 9456K Inact, 196M Wired, 19M Buf, 15G Free
Swap: 

  PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME   WCPU COMMAND
 3002 root  1  200 14264K  1664K select  0   0:02  0.00% powerd
 2999 root  1  200 25120K  3304K select  3   0:01  0.00% ntpd
 3084 root  1  200 81420K  6120K select  0   0:00  0.00% sshd
 3094 root  1  200 17180K  3956K pause   1   0:00  0.00% csh
 3062 root  1  210 17180K  3900K ttyin   2   0:00  0.00% csh
 2867 root  1  200 14296K  2028K select  1   0:00  0.00% syslogd
 2959 root  1  520 20500K  7512K rpcsvc  3   0:00  0.00% rpc.lockd
 2943 root  1  200 16376K  2064K select  2   0:00  0.00% rpcbind
 3061 root  1  200 47504K  2604K wait3   0:00  0.00% login
 2945 root  1  200   274M  7448K select  1   0:00  0.00% rpc.statd
 3176 root  1  200 19608K  2996K CPU31   0:00  0.00% top
 2676 root  1  200  9016K  4652K select  0   0:00  0.00% devd
 3014 root  1  200 56152K  4964K select  2   0:00  0.00% sshd
 2562 root  1  290 14416K  2224K select  1   0:00  0.00% dhclient
 2629 _dhcp 1  200 14416K  2240K select  2   0:00  0.00% dhclient
 3065 root  1  200 14528K  1708K select  2   0:00  0.00% netserver
 2708 root  1  200 14232K  1568K select  2   0:00  0.00% rtsold


> or possibly ps(1) aufx

# ps -aufx
USER   PID  %CPU %MEMVSZ  RSS TT  STAT STARTED   TIME COMMAND
root10 346.8  0.0  0   64  -  RL6:54PM 2862:46.43 [idle]
root 0  64.1  0.0  0  496  -  DLs   6:54PM  694:47.32 [kernel]
root 1   0.0  0.0   9344  792  -  ILs   6:54PM0:00.09 /sbin/init --
root 2   0.0  0.0  0   16  -  DL6:54PM0:00.00 [crypto]
root 3   0.0  0.0  0   16  -  DL6:54PM0:00.00 [crypto returns]
root 4   0.0  0.0  0   16  -  DL6:54PM0:00.00 [ctl_thrd]
root 5   0.0  0.0  0   16  -  DL6:54PM0:00.00 [xpt_thrd]
root 6   0.0  0.0  0   16  -  DL6:54PM0:00.00 [ipmi0: kcs]
root 7   0.0  0.0  0   16  -  DL6:54PM0:00.04 [pagedaemon]
root 8   0.0  0.0  0   16  -  DL6:54PM0:00.00 [pagezero]
root 9   0.0  0.0  0   16  -  DL6:54PM0:00.15 [bufdaemon]
root11   0.0  0.0  0  416  -  WL6:54PM0:10.47 [intr]
root12   0.0  0.0  0   48  -  DL6:54PM0:00.03 [geom]
root13   0.0  0.0  0   16  -  DL6:54PM0:01.87 [yarrow]
root14   0.0  0.0  0  256  -  DL6:54PM0:00.50 [usb]
root15   0.0  0.0  0   16  -  DL6:54PM0:00.17 [vnlru]
root16   0.0  0.0  0   16  -  DL6:54PM0:00.56 [syncer]
root17   0.0  0.0  0   16  -  DL6:54PM0:00.21 [softdepflush]
root42   0.0  0.0  0   16  -  DL6:54PM0:00.03 [md0]
root53   0.0  0.0  0   16  -  DL6:54PM0:00.00 [md1]
root   120   0.0  0.0  0   16  -  DL6:54PM0:00.00 [md2]
root   125   0.0  0.0  0   16  -  DL6:54PM0:00.00 [md3]
root  2562   0.0  0.0  14416 2224  -  Is6:54PM0:00.00 dhclient: em4 [priv] (dhclient)
_dhcp 2629   0.0  0.0  14416 2240  -  Is6:54PM0:00.00 dhclient: em4 (dhclient)
root  2676   0.0  0.0   9016 4652  -  Is6:54PM0:00.01 /sbin/devd
root  2708   0.0  0.0  14232 1568  -  Is6:54PM0:00.00 /usr/sbin/rtsold -a
root  2867   0.0  0.0  14296 2028  -  Ss6:54PM0:00.04 /usr/sbin/syslogd -s
root  2943   0.0  0.0  16376 2064  -  Ss6:54PM0:00.03 /usr/sbin/rpcbind
root  2945   0.0  0.0 280472 7448  -  Ss6:54PM0:00.02 /usr/sbin/rpc.statd
root  2959   0.0  0.0  20500 7512  -  Ss6:54PM0:00.04 /usr/sbin/rpc.lockd
root  2999   0.0  0.0  25120 3304  -  Ss6:54PM0:00.85 /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
root  3002   0.0  0.0  14264 1664  -  Ss6:54PM0:01.61 /usr/sbin/powerd
root  3014   0.0  0.0  56152 4964  -  Is6:54PM0:00.00 /usr/sbin/sshd -o PermitRootLogin=without-password
root  3065   0.0  0.0  14528 1708  -  Is6:54PM0:00.00 netserver
root  3084   0.0  0.0  81420 6120  -  Ss9:21AM0:00.09 sshd: root@pts/0 (sshd)
root  3061   0.0  0.0  47504 2604 u0  Is6:54PM0:00.02 login [pam] (login)
root  3062   0.0  0.0  17180 3900 u0  I+6:54PM0:00.05 -csh (csh)
root  3094   0.0  0.0  17180 3956  0  Ss9:32AM0:00.05 -csh (csh)
root  3177   0.0  0.0  16436 1900  0  R+9:44AM0:00.00 ps -aufx


> At least you

Re: system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 10:54, Fleuriot Damien <m...@my.gd> wrote:
> And indeed we find your answer here, acpi0 firing up a lot of interrupts.
> 
> Don't you get any message about that in dmesg -a or /var/log/messages ?
> 
> I'd expect something like interrupt storm blabla… source throttled blabla..

nope. The only odd ACPI-related messages I see in dmesg are these:

ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560)
ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._PDC] (Node 0xfe0007630c40), AE_NOT_FOUND (20130117/psparse-560)

Nothing in syslog.

> From man 4 acpi , in /boot/loader.conf :
> hint.acpi.0.disabled=1
> Set this to 1 to disable all of ACPI.  If ACPI has been disabled
> on your system due to a blacklist entry for your BIOS, you can
> set this to 0 to re-enable ACPI for testing.
> 
> Any chance you could reboot the host with ACPI disabled ?

If I do that, I get an early kernel crash:

Loading 10.11.12.13/~elars/kernel/kernel:0x20/7634255 0xb47d50/473552 0xbbb720/890736 Entry at 0x802746f0
Closing network.
Starting program at 0x802746f0
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
panic: running without device atpic requires a local APIC
cpuid = 0
KDB: stack backtrace:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x805c2973
stack pointer   = 0x28:0x80c9a960
frame pointer   = 0x28:0x80c9aa80
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 0 ()
[ thread pid 0 tid 0 ]
Stopped at  0x805c2973: movzbl  (%rdi),%ecx


> If that helps your CPU load, try setting this in /boot/loader.conf :
> hw.acpi.verbose=1
>   Turn on verbose debugging information about what ACPI is doing.

Done, but it doesn't really result in any additional messages:

# dmesg | grep -i acpi
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
ACPI APIC Table: PTLTD  CARNEGIE
acpi0: PTLTD CARNEGIE on motherboard
acpi0: Power Button (fixed)
cpu0: ACPI CPU on acpi0
ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560)
ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560)
ACPI Error: Method parse/execution failed [\134_PR_.CPU0._PDC] (Node 0xfe0007630c40), AE_NOT_FOUND (20130117/psparse-560)
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0
attimer0: AT timer port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter ACPI-fast frequency 3579545 Hz quality 900
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pcib3: ACPI PCI-PCI bridge at device 4.0 on pci0
pci3: ACPI PCI bus on pcib3
pcib4: ACPI PCI-PCI bridge mem 0xdeb0-0xdeb1 irq 16 at device 0.0 on pci3
pci4: ACPI PCI bus on pcib4
pcib7: ACPI PCI-PCI bridge irq 5 at device 8.0 on pci4
pci7: ACPI PCI bus on pcib7
pcib29: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0
pci29: ACPI PCI bus on pcib29
pcib30: ACPI PCI-PCI bridge irq 16 at device 28.4 on pci0
pci30: ACPI PCI bus on pcib30
pcib31: ACPI PCI-PCI bridge irq 17 at device 28.5 on pci0
pci31: ACPI PCI bus on pcib31
pcib32: ACPI PCI-PCI bridge at device 30.0 on pci0
pci32: ACPI PCI bus on pcib32
acpi_button0: Power Button on acpi0
uart0: 16550 or compatible port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: 16550 or compatible port 0x2f8-0x2ff irq 3 on acpi0

Lars


Re: system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 11:21, Fleuriot Damien <m...@my.gd> wrote:
> What about a newly build kernel without the line device acpi and without 
> the options ACPI_DEBUG ?
> Hoping that this kernel:
> 1/ won't crash on boot
> 2/ will make the 20% cpu load and high interrupt rates disappear

I added device atpic to my kernel config and rebooted with 
hint.acpi.0.disabled=1 in the loader. I get further during boot, but then get a 
panic: "No usable event timer found!" Also, my ix devices showed errors trying 
to allocate bus resources.

Lars

Re: system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

thanks for looking into this!

On Feb 19, 2013, at 12:14, Andriy Gapon <a...@freebsd.org> wrote:
> Please try to run the following DTrace script (dtrace -s script-file) and
> capture its output.

I get this error:

# dtrace -s x
dtrace: failed to compile script x: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded

(New to dtrace, so no clue what this means.)

Lars

Re: system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 13:15, Lars Engels <lars.eng...@0x20.net> wrote:
> You need to recompile your Kernel to use DTrace:
> https://wiki.freebsd.org/DTrace

I did. But I still get that error, even with the sample from the wiki:

# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: invalid probe specifier syscall:::entry { @num[execname] = count(); }: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded

I cross-compile the -CURRENT world and kernel under -STABLE for netbooting. 
Could doing that cause this issue?

Lars

Re: system 20% busy at all times?

2013-02-19 Thread Eggert, Lars

Hi,

On Feb 19, 2013, at 13:37, Eggert, Lars <l...@netapp.com> wrote:
> On Feb 19, 2013, at 13:15, Lars Engels <lars.eng...@0x20.net> wrote:
>> You need to recompile your Kernel to use DTrace:
>> https://wiki.freebsd.org/DTrace
> 
> I did. But I still get that error, even with the sample from the wiki:
> 
> # dtrace -n 'syscall:::entry { @num[execname] = count(); }'
> dtrace: invalid probe specifier syscall:::entry { @num[execname] = count(); }: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded
> 
> I cross-compile the -CURRENT world and kernel under -STABLE for netbooting. 
> Could doing that cause this issue?

FWIW, a full buildworld/installworld of the latest -CURRENT also didn't help, 
the error remains.

Lars

Re: Dtrace: Module is no longer loaded

2013-02-19 Thread Eggert, Lars

Hi,

did you ever figure this out? I'm seeing the same thing.

Lars

Re: mounting root from NFS via ROOTDEVNAME

2013-01-31 Thread Eggert, Lars

Hi,

On Jan 30, 2013, at 22:43, Craig Rodrigues <rodr...@crodrigues.org> wrote:
> What you need to do is, before the FreeBSD kernel boots, your
> loader needs to export some environment variables.  This will trigger
> the various behaviors in the FreeBSD mount code.

the loader can export some environment variables (this is how I get the serial 
console working.)

> So as I suggested before, you should continue with:
> 
> (1)  Have /usr/home/elars/dst/etc/fstab with:
> #  Options  Dump Pass
> 10.11.12.13:/usr/home/elars/dst/   / nfs  ro  0  0

Done.

> (2)  From your loader, you need to export this environment variable, so
> that the kernel can get it with getenv().  You need at least:
> 
> vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst

Done.

> Now, there are some other environment variables you need to export from the
> loader.  
> 
> boot.netif.ip
> boot.netif.netmask
> boot.netif.gateway
> boot.nfsroot.server
> boot.nfsroot.path

Done. I also ripped out all the BOOTP* options from the kernel.
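Since several of these variables describe the same server and network, a quick consistency check can at least rule out a typo. As a small illustration (not something the loader or kernel does), the Python sketch below uses the standard ipaddress module to check that the gateway lies in the subnet given by boot.netif.ip and boot.netif.netmask, and that vfs.root.mountfrom names the same server and path as boot.nfsroot.server and boot.nfsroot.path. The values are the ones posted in this thread; the function name is hypothetical, and a clean result does not show that the kernel will actually pick up these settings.

```python
import ipaddress


def check_nfsroot_env(env):
    """Return a list of inconsistencies found in a dict of loader variables."""
    problems = []

    # The gateway should be on the subnet defined by ip/netmask.
    iface = ipaddress.IPv4Interface(
        f"{env['boot.netif.ip']}/{env['boot.netif.netmask']}")
    gateway = ipaddress.IPv4Address(env["boot.netif.gateway"])
    if gateway not in iface.network:
        problems.append(f"gateway {gateway} is not on {iface.network}")

    # vfs.root.mountfrom is expected to look like nfs:<server>:<path>.
    fstype, _, rest = env["vfs.root.mountfrom"].partition(":")
    server, _, path = rest.partition(":")
    if fstype != "nfs":
        problems.append(f"unexpected file system type {fstype!r}")
    if server != env["boot.nfsroot.server"]:
        problems.append(f"mountfrom server {server} != boot.nfsroot.server")
    if path.rstrip("/") != env["boot.nfsroot.path"].rstrip("/"):
        problems.append(f"mountfrom path {path} != boot.nfsroot.path")
    return problems


if __name__ == "__main__":
    env = {  # values as posted in this thread
        "vfs.root.mountfrom": "nfs:10.11.12.13:/usr/home/elars/dst",
        "boot.netif.ip": "10.11.12.15",
        "boot.netif.netmask": "255.255.255.0",
        "boot.netif.gateway": "10.11.12.13",
        "boot.nfsroot.server": "10.11.12.13",
        "boot.nfsroot.path": "/usr/home/elars/dst",
    }
    print(check_nfsroot_env(env) or "no inconsistencies found")
```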

However, this still fails:

Trying to mount root from nfs:10.11.12.13:/usr/home/elars/dst []...
mountroot: waiting for device 10.11.12.13:/usr/home/elars/dst ...
Mounting from nfs:10.11.12.13:/usr/home/elars/dst failed with error 19.

Loader variables:
  vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    e.g. ufs:/dev/da0s1a
         zfs:tank
         cd9660:/dev/acd0 ro
         (which is equivalent to: mount -t cd9660 -o ro /dev/acd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input

mountroot>
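The "error 19" in the mountroot messages is an errno value; in FreeBSD's numbering it is `ENODEV` (Linux and macOS use the same code). A quick way to decode such numbers on any host with Python, offered as a convenience; the `os.strerror()` text differs between platforms, so only the symbolic name is portable.

```python
import errno
import os

# The number from "Mounting from nfs:... failed with error 19."
code = 19

# errno.errorcode maps numeric codes to symbolic names on the host running Python.
name = errno.errorcode[code]
print(f"error {code} = {name} ({os.strerror(code)})")
```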

I did a tcpdump and no traffic shows up on the correct interface (em4). I guess 
I need to set yet another loader environment variable to indicate which 
interface I'd like to use. Looking at the source, I only see boot.netif.name, 
but setting that to em4 doesn't help either.

Any further ideas?

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-31 Thread Eggert, Lars

On Jan 31, 2013, at 12:45, Andreas Nilsson <andrn...@gmail.com> wrote:
 Just a shot in the dark, did you actually tell it to do the root mount ro,
 or try with the nfs share as rw?

ro

Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-31 Thread Eggert, Lars

On Jan 31, 2013, at 12:53, Andre Oppermann <an...@freebsd.org> wrote:
 The interface doesn't have a name during loader stage.  The kernel
 finds the interface to use based on the MAC address.  You should
 set boot.netif.hwaddr as well in the kernel environment.
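Andre's point is that the kernel picks the interface by MAC address, not by name. A minimal Python model of that matching step, not the kernel's code: `pick_interface` and the MAC addresses are hypothetical, and normalizing case and leading zeros is an assumption about how tolerant such a comparison ought to be.

```python
from __future__ import annotations

from typing import Mapping, Optional

HEX = set("0123456789abcdef")


def normalize_mac(mac: str) -> str:
    """Lower-case a colon-separated MAC address and zero-pad each octet."""
    octets = mac.strip().lower().split(":")
    if len(octets) != 6 or any(not 1 <= len(o) <= 2 or not set(o) <= HEX for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(o.zfill(2) for o in octets)


def pick_interface(hwaddr: str, interfaces: Mapping[str, str]) -> Optional[str]:
    """Return the interface whose MAC equals boot.netif.hwaddr, or None."""
    wanted = normalize_mac(hwaddr)
    for name, mac in interfaces.items():
        if normalize_mac(mac) == wanted:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical MAC addresses, for illustration only.
    nics = {"em0": "00:1B:21:AA:00:01", "em4": "00:1b:21:aa:00:05"}
    print(pick_interface("0:1b:21:aa:0:5", nics))  # em4
```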

Done, no change. Here is what's in my loader environment:

boot.netif.netmask   255.255.255.0
boot.netif.gateway   10.11.12.13
boot.nfsroot.server  10.11.12.13
boot.nfsroot.path    /usr/home/elars/dst
boot.netif.ip        10.11.12.15
boot.netif.name      em4
boot.netif.hwaddr    xx:xx:xx:xx:xx:xx
vfs.root.mountfrom   nfs:10.11.12.13:/usr/home/elars/dst
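A dump like the one above can be sanity-checked mechanically against the variables suggested in this thread (Craig's list plus Andre's `boot.netif.hwaddr`). A minimal Python sketch, assuming the whitespace-separated `name value` layout shown above; `parse_env` and `check_env` are hypothetical helpers. An empty result only means the names are present and mutually consistent, which, as the boot log that follows shows, was not enough here.

```python
from __future__ import annotations

from typing import Dict, List

# Variable names suggested in this thread.
REQUIRED = (
    "boot.netif.ip", "boot.netif.netmask", "boot.netif.gateway",
    "boot.netif.hwaddr", "boot.nfsroot.server", "boot.nfsroot.path",
    "vfs.root.mountfrom",
)


def parse_env(dump: str) -> Dict[str, str]:
    """Parse whitespace-separated 'name value' lines into a dict."""
    env = {}
    for line in dump.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            env[parts[0]] = parts[1].strip()
    return env


def check_env(env: Dict[str, str]) -> List[str]:
    """List missing names and mountfrom/nfsroot mismatches; [] if none found."""
    problems = [f"missing {name}" for name in REQUIRED if name not in env]
    mountfrom = env.get("vfs.root.mountfrom", "")
    if mountfrom.startswith("nfs:"):
        # nfs:<server>:<path> should name the same server and path as boot.nfsroot.*
        server, _, path = mountfrom[len("nfs:"):].partition(":")
        if env.get("boot.nfsroot.server", server) != server:
            problems.append("server in vfs.root.mountfrom differs from boot.nfsroot.server")
        if env.get("boot.nfsroot.path", path) != path:
            problems.append("path in vfs.root.mountfrom differs from boot.nfsroot.path")
    return problems
```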

And here is what I see during boot:

Trying to mount root from nfs:10.11.12.13:/usr/home/elars/dst []...
mountroot: waiting for device 10.11.12.13:/usr/home/elars/dst ...
Mounting from nfs:10.11.12.13:/usr/home/elars/dst failed with error 19.

Lars

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-31 Thread Eggert, Lars

On Jan 31, 2013, at 15:54, Daniel Braniss <da...@cs.huji.ac.il> wrote:
 a shot in the dark, but is /usr/home/elars/dst properly exported?

Yep, the NFS mount works fine when I use BOOTP with a root-path option

Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-30 Thread Eggert, Lars

Hi,

On Jan 30, 2013, at 10:32, Eggert, Lars <l...@netapp.com> wrote:
 On Jan 29, 2013, at 20:22, Craig Rodrigues <rodr...@crodrigues.org> wrote:
 In src/sys/boot/common/boot.c which is part of the loader (not the kernel),
 if you look in the getrootmount() function,
 you will see that the loader will try to figure out where the root file
 system is by parsing /etc/fstab, and looking for the / mount.
 
 So, if your kernel is located in:
 
  /usr/home/elars/dst/boot/kernel/kernel
 
 Then create a file /usr/home/elars/dst/etc/fstab file with something like:
 
 # Device                          Mountpoint  FSType  Options  Dump  Pass
 10.11.12.13:/usr/home/elars/dst/  /           nfs     ro       0     0
 
 Thanks, will try that!
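Craig's description of getrootmount() amounts to: find the `/` line in the target's /etc/fstab and turn it into an `<fstype>:<device>` root specification. A rough Python model of that lookup, assuming whitespace-separated fields and `#` comments; `root_from_fstab` is a hypothetical name, and the real loader code is C in src/sys/boot/common/boot.c.

```python
from __future__ import annotations

from typing import Optional, Tuple


def root_from_fstab(fstab: str) -> Optional[Tuple[str, str]]:
    """Return ("<fstype>:<device>", options) for the "/" entry, or None."""
    for raw in fstab.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; blank lines become ""
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[:3]
        if mountpoint == "/":
            options = fields[3] if len(fields) > 3 else ""
            return f"{fstype}:{device}", options
    return None


if __name__ == "__main__":
    fstab = (
        "# Device                          Mountpoint  FSType  Options  Dump  Pass\n"
        "10.11.12.13:/usr/home/elars/dst/  /           nfs     ro       0     0\n"
    )
    print(root_from_fstab(fstab))
```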

doesn't work.

The kernel never leaves the "DHCP/BOOTP timeout for server" loop unless I hand
out a root-path option via DHCP.

I tried your tip above, I tried setting ROOTDEVNAME in the kernel, I created a 
/boot.config with -r in it on the NFS root - all to no avail. 

 Alternatively, if you don't want to create an /etc/fstab file, then
 you could put something like this in your loader.conf file:
 
 vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst
 
 Will try that too, but not sure if this works with our custom loader.

Doesn't seem to work either.

Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-29 Thread Eggert, Lars

Hi,

On Jan 29, 2013, at 9:34, Craig Rodrigues <rodr...@crodrigues.org> wrote:
 I recommend that you do not use ROOTDEVNAME, and instead
 you should follow the instructions which I wrote and contributed to the
 FreeBSD handbook:
 
 PXE Booting with an NFS Root File System
 
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-pxe-nfs.html
 
 The content of this document is the same as the text file which Rick
 Macklem pointed out (I wrote that too).

I had read both before, and they're very useful documents. Unfortunately, they 
don't fully apply to my case, since I'm not PXE-booting the system; it netboots 
the kernel from a custom loader. So once the kernel bootstraps, I need it to 
obtain an IP address and then NFS-mount root.

 
 BTW, if you ever visit the Netapp campus in Sunnyvale, California, feel
 free to say hello,
 because I work around the corner from there. :)
 --
 Craig Rodrigues
 rodr...@crodrigues.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-29 Thread Eggert, Lars

On Jan 29, 2013, at 10:13, Lars Eggert <l...@netapp.com> wrote:
 On Jan 29, 2013, at 9:34, Craig Rodrigues <rodr...@crodrigues.org> wrote:
 I recommend that you do not use ROOTDEVNAME, and instead
 you should follow the instructions which I wrote and contributed to the
 FreeBSD handbook:
 
 PXE Booting with an NFS Root File System
 
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-pxe-nfs.html
 
 The content of this document is the same as the text file which Rick
 Macklem pointed out (I wrote that too).
 
 I had read both before, and they're very useful documents. Unfortunately, 
 they don't fully apply to my case, since I'm not PXE-booting the system; it 
 netboots the kernel from a custom loader. So once the kernel bootstraps, I 
 need it to obtain an IP address and then NFS-mount root.

(Whoops, hit send by mistake.)

That's what I was trying to achieve with the BOOTP and BOOTP_WIRED_TO options.

Hm, I wonder if I could simply use the custom loader to netboot tftpboot, and 
then follow your instructions... Will try.

 BTW, if you ever visit the Netapp campus in Sunnyvale, California, feel
 free to say hello, because I work around the corner from there. :)

Am there about once a month, will do :-)

Lars

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

mounting root from NFS via ROOTDEVNAME

2013-01-28 Thread Eggert, Lars

Hi,

I'm trying to netboot a system where the root device is specified in the kernel 
via ROOTDEVNAME:

options BOOTP
options BOOTP_NFSROOT
options BOOTP_NFSV3
options BOOTP_COMPAT
options BOOTP_WIRED_TO=em4
options ROOTDEVNAME=\"nfs:10.11.12.13:/usr/home/elars/dst\"

I was under the assumption that specifying a ROOTDEVNAME in the kernel config 
would override the root-path option in DHCP, or at least take effect when 
root-path wasn't provided via DHCP, but that doesn't seem to be the case. The 
system configures its address correctly over em4, but then enters a loop:

em4: link state changed to UP
Received DHCP Offer packet on em4 from 0.0.0.0 (accepted) (no root path)
Sending DHCP Request packet from interface em4 (XX:XX:XX:XX:XX:XX)
Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
DHCP/BOOTP timeout for server 255.255.255.255
Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
DHCP/BOOTP timeout for server 255.255.255.255
...

If I hand out a root path via DHCP the system boots fine, but the idea here is 
to be able to boot different root devices without needing to diddle dhcpd.conf. 
Can this be done?

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: mounting root from NFS via ROOTDEVNAME

2013-01-28 Thread Eggert, Lars

Hi,

On Jan 28, 2013, at 16:23, Ian Lepore <i...@freebsd.org> wrote:
 Remove the BOOTP_NFSROOT option, it tells the bootp/dhcp code to keep
 querying the server until a root path is delivered.  Without it, the
 ROOTDEVNAME option should get used (and I think even override a path
 from the server, if it delivers one).
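Ian's explanation can be read as a simple decision rule. The Python sketch below models only that rule as stated in his message, with `nfsroot_required` standing in for BOOTP_NFSROOT; `Reply` and `choose_root` are hypothetical and not kernel code. It ignores his suggestion that ROOTDEVNAME might even override a server-supplied path, and the log in the reply shows the kernel kept looping even without the option, so the real BOOTP code evidently behaves differently from this simplified picture.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reply:
    """A hypothetical, simplified DHCP ack: only whether it carried a root path."""
    root_path: Optional[str] = None


def choose_root(replies: Iterable[Reply], nfsroot_required: bool,
                rootdevname: Optional[str]) -> Optional[str]:
    """Pick a root specification per the rule described in the thread."""
    for reply in replies:
        if reply.root_path is not None:
            return reply.root_path
        if not nfsroot_required:
            # Address is configured; fall back to the compiled-in ROOTDEVNAME.
            return rootdevname
        # With BOOTP_NFSROOT, a reply without a root path is not enough: keep asking.
    # Out of replies; a real kernel would keep timing out and retrying instead.
    return None
```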

no luck:

em4: link state changed to UP
Received DHCP Offer packet on em4 from 0.0.0.0 (accepted) (no root path)
Sending DHCP Request packet from interface em4 (XX:XX:XX:XX:XX:XX)
Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
DHCP/BOOTP timeout for server 255.255.255.255
Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
DHCP/BOOTP timeout for server 255.255.255.255
...

The only visible difference is that the first "Received DHCP Ack packet" line
is now printed only once, instead of twice as in the previous log.

Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: serial console not accepting input?

2013-01-24 Thread Eggert, Lars

Hi,

On Jan 23, 2013, at 17:04, Dimitry Andric <d...@freebsd.org> wrote:
 CTS/RTS hardware flow control, maybe?  E.g. add :hw to the default
 settings in /etc/gettytab, or make a specific entry with an added :hw
 setting.

nope, I don't even get a login prompt if I do that.

 If it is a physical serial console, you could also simply have a bad
 cable.  Try swapping it with working system. :)

Spent the last few hours fiddling with the cabling and the various BIOS serial 
redirection options (it's a Dell 2950). My best guess is that the serial port 
on the box is physically broken.

Thanks for the help!

Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

serial console not accepting input?

2013-01-23 Thread Eggert, Lars

Hi,

I'm embarrassed to ask this newbie question, but I'm at my wit's end: I've 
configured a serial console according to the handbook. I see the boot messages 
and get the login prompt. But at no point during the boot process does the 
console seem to accept any input, including at the boot prompt. The same
serial setup works fine with other boxes.

Any ideas?

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

genmodes dumping core during buildworld

2012-11-15 Thread Eggert, Lars

Hi,

has anyone seen something similar before?

 ===> usr.sbin/zic/zdump (depend)
 rm -f .depend
 mkdep -f .depend -a    -DTM_GMTOFF=tm_gmtoff -DTM_ZONE=tm_zone -DSTD_INSPIRED
 -DPCTS -DHAVE_LONG_DOUBLE -DTZDIR=\"/usr/share/zoneinfo\" -Demkdir=mkdir
 -I/usr/src/usr.sbin/zic/zdump/..
 -I/usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/stdtime -std=gnu99
 /usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/zic/zdump.c
 /usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/zic/ialloc.c
 /usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/zic/scheck.c
 echo zdump: /usr/obj/usr/src/tmp/usr/lib/libc.a >> .depend
 ===> usr.sbin/zzz (depend)
 1 error
 *** [_depend] Error code 2
 1 error
 *** [buildworld] Error code 2
 1 error

This is in the log:

 Nov 15 15:56:49 server kernel: pid 83891 (genmodes), uid 0: exited on signal 10 (core dumped)
 Nov 15 15:56:49 server kernel: pid 83893 (genmodes), uid 0: exited on signal 10 (core dumped)
 Nov 15 15:56:49 server kernel: pid 83897 (genmodes), uid 0: exited on signal 10 (core dumped)


genmodes seems to be a component of gcc?

Lars

71 matches

Mail list logo