date:20060705

Re: Which FreeBSD is the most stable for Dell PowerEdge 2850

2006-07-05 Thread Dan Charrois


I don't have any 2850s, but the 1850 I have has been running 6.0
since BETA1, and last night I upgraded it to 6.1.  No issues.
The PERC 4e/Si card is phenomenally fast on this system (running a
2-disk RAID1).  I'd recommend running 6.1, as it is stable on all of
my Dell systems that run it (and I'm migrating the older FreeBSD
boxes to 6.1 as time permits).

If you already have > 1 CPU, you might as well leave hyperthreading
off.  There are cases where it degrades performance rather than
enhancing it.

As for mysql version, "no comment" :-)


Thanks for the reply!  I'm in the process of upgrading the 2850 to  
6.1 now, and it seems to have gone well so far.  Time will tell in  
the long term whether the stability is what I'm hoping for, but at  
least it does seem to be up and running okay so far.


As for hyperthreading, I did some benchmarking back with FreeBSD 5.4  
using the actual SQL databases I'm serving on the machine and loading  
the server with lots of simultaneous queries from remote machines  
similar to those which will be used in production.  Back then, there  
was about a 10% increase in performance.  I'll run the same tests  
again before putting the machine in production again to see if  
anything changed.


10% isn't much, but every bit helps, if hyperthreading doesn't cause  
the machine to become unstable otherwise.


Thanks again!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB  T0G 1L0 Canada
Phone: 780-961-2213

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Vye Wilson

I do not use USB at all on this particular server. The only ports on it are
the 2-4 onboard ones. Removing uhci from my kernel fixed this issue. Thank
you.

On 7/5/06, Max Laier <[EMAIL PROTECTED]> wrote:

On Thursday 06 July 2006 02:17, Vye Wilson wrote:
> # vmstat -i
> interrupt  total   rate
> irq1: atkbd0   5  0
> irq6: fdc0 3  0
> irq10: uhci1   915633230 262810
> irq15: ata1 1306  0
> irq17: fwohci0 1  0
> irq18: fxp0 2876  0
> irq21: twa0  153  0
> cpu0: timer  6964974   1999
> Total  922602548 264811

Are you using usb on that box?  If not, get rid of device uhci in your kernel
config to see if that fixes it.  If you are using usb - I have no idea.  A
BIOS upgrade might help.

--
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News

--
--Vye

Re: pkg_version confused by architecture in package name

2006-07-05 Thread Brooks Davis

On Thu, Jul 06, 2006 at 02:45:45AM +0200, [LoN]Kamikaze wrote:
> I normally run the command
> #  pkg_version -Iv | grep \<
> before running 'portupgrade -a', to see what's going to happen. This time I 
> got the following output:
> 
> diablo-jdk-freebsd6.i386.1.5.0.07.00  <   needs updating (index has 
> 1.5.0.07.00)
> 
> It seems that the tool is confused by the i386 in the package name.

Actually I think it's confused by the fact that the package name is
"diablo-jdk" and the version is "freebsd6.i386.1.5.0.07.00".  That's
just plain bogus.
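
For illustration, here is a minimal Python sketch of the split Brooks describes, assuming the name/version boundary is simply the last hyphen in the package string. The helper name split_pkgname is hypothetical and is not part of pkg_version itself.

```python
def split_pkgname(pkg):
    """Split a package string at its last hyphen into (name, version).

    Assumption: everything after the final '-' is treated as the version.
    """
    name, _, version = pkg.rpartition("-")
    return name, version


name, version = split_pkgname("diablo-jdk-freebsd6.i386.1.5.0.07.00")
print("name:   ", name)
print("version:", version)
```

Under that assumption the "version" begins with "freebsd6.i386", so it can never compare equal to the index's 1.5.0.07.00.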

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4



Re: em device hangs on ifconfig alias ...

2006-07-05 Thread Pyun YongHyeon

On Wed, Jul 05, 2006 at 06:29:55PM -0700, Atanas wrote:
 > Pyun YongHyeon said the following on 6/30/06 8:54 PM:
 > >On Fri, Jun 30, 2006 at 12:28:49PM -0700, Atanas wrote:
 > > > User Freebsd said the following on 6/29/06 9:29 PM:
 > > > >
 > > > >The other funny thing about the current em driver is that if you move an
 > > > >IP to it from a different server, the appropriate ARP packets aren't
 > > > >sent out to redirect the IP traffic .. recently, someone pointed me to
 > > > >arping, which has solved my problem *external* to the driver ...
 > > > >
 > > > That's the second reason why I (still) avoid em in mass-aliased systems.
 > > > 
 > > > I have a single pool of IP addresses shared by many servers with 
 > > > multiple aliases each. When someone leaves and frees an IP, it gets 
 > > > reused and brought up on a different server. In case it was previously 
 > > > handled by em, the traffic doesn't get redirected to the new server.
 > > > 
 > > > Similar thing happens even with machines with single static IPs. For 
 > > > instance when retiring an old production system, I usually request a new
 > > > box to be brought up on a different IP, make a fresh install on
 > > > everything and test, swap IP addresses and reboot. In case of em, after 
 > > > a soft reboot both systems are inaccessible.
 > > > 
 > > > A workaround is to power both of the systems down and then power them 
 > > > up. This however cannot be done remotely and in case there were IP 
 > > > aliases, they still don't get any traffic.
 > > > 
 > >
 > >I haven't fully tested it, but what about the attached patch?
 > >It may fix your ARP issue. The patch also fixes other issues
 > >related to ioctls.
 > >Now em(4) will send an ARP packet when its IP address is changed even
 > >if there is no active link. Since em(4) is not an mii-aware driver I
 > >can't be sure this behaviour is correct.
 > >
 > The patch is against if_em.c,v 1.116 2006/06/06, which is 7-CURRENT. I 
 > tried "merging" the relevant em driver files into a 6-STABLE 
 > installation by simply copying sys/dev/em/* and sys/modules/em/Makefile, 
 > but it seems that the new revision depends on other -CURRENT things and 
 > the module build fails:
 > 
 > # pwd
 > /usr/src/sys/modules/em
 > # make clean; make
 > ...
 > /usr/src/sys/modules/em/../../dev/em/if_em.c: In function 
 > `em_setup_interface':
 > /usr/src/sys/modules/em/../../dev/em/if_em.c:2143: error: 
 > `IFCAP_VLAN_HWCSUM' undeclared (first use in this function)
 > ...
 > 
 > I don't have a 7-CURRENT based box around. It seems too bleeding edge 
 > for me anyway. I was hoping to play with different if_em kernel modules 
 > on a semi-production (spare) box and eventually test the proposed em 
 > patch, but apparently it's not so easy.
 > 
 > Please let me know if I'm missing something obvious.
 > 

My bad. Here is a patch generated against RELENG_6.

-- 
Regards,
Pyun YongHyeon
--- if_em.c.orig Fri May 19 09:19:57 2006
+++ if_em.c Thu Jul  6 11:10:56 2006
@@ -657,8 +657,9 @@
 
mtx_assert(&adapter->mtx, MA_OWNED);
 
-if (!adapter->link_active)
-return;
+   if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
+   IFF_DRV_RUNNING)
+   return;
 
 while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
 
@@ -719,11 +720,6 @@
if (adapter->in_detach) return(error);
 
switch (command) {
-   case SIOCSIFADDR:
-   case SIOCGIFADDR:
-   IOCTL_DEBUGOUT("ioctl rcv'd: SIOCxIFADDR (Get/Set Interface Addr)");
-   ether_ioctl(ifp, command, data);
-   break;
case SIOCSIFMTU:
{
int max_frame_size;
@@ -760,16 +756,17 @@
IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFFLAGS (Set Interface Flags)");
EM_LOCK(adapter);
if (ifp->if_flags & IFF_UP) {
-   if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+   if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+   if ((ifp->if_flags ^ adapter->if_flags) &
+   IFF_PROMISC) {
+   em_disable_promisc(adapter);
+   em_set_promisc(adapter);
+   }
+   } else
em_init_locked(adapter);
-   }
-
-   em_disable_promisc(adapter);
-   em_set_promisc(adapter);
} else {
-   if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
+   if (ifp->if_drv_flags & IFF_DRV_RUNNING)
em_stop(adapter);
-   }
}
EM_UNLOCK(adapter);
break;
@@ -835,8 +832,8 @@
break;
}
default:
-   IOCTL_DEBUGOUT1("ioctl received: UNKNOWN (0x%x)", (int)command);
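
To make the new transmit-path test in the first hunk concrete, here is a small Python sketch of the same mask-and-compare. The flag values are assumed to match net/if.h (please check your own tree), and may_start_tx is a hypothetical name used only for this example.

```python
# Assumed values from net/if.h; verify against your source tree.
IFF_DRV_RUNNING = 0x40
IFF_DRV_OACTIVE = 0x400


def may_start_tx(drv_flags):
    """Mirror of the patched check: transmit only when the driver is
    RUNNING and not OACTIVE, regardless of link state."""
    mask = IFF_DRV_RUNNING | IFF_DRV_OACTIVE
    return (drv_flags & mask) == IFF_DRV_RUNNING


for flags in (0, IFF_DRV_RUNNING, IFF_DRV_RUNNING | IFF_DRV_OACTIVE):
    print(hex(flags), may_start_tx(flags))
```

The point of the change is that the old test bailed out whenever link_active was false, while this one depends only on the driver flags.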

Re: em device hangs on ifconfig alias ...

2006-07-05 Thread Atanas


Pyun YongHyeon said the following on 6/30/06 8:54 PM:

On Fri, Jun 30, 2006 at 12:28:49PM -0700, Atanas wrote:
 > User Freebsd said the following on 6/29/06 9:29 PM:
 > >
 > >The other funny thing about the current em driver is that if you move an 
 > >IP to it from a different server, the appropriate ARP packets aren't 
 > >sent out to redirect the IP traffic .. recently, someone pointed me to 
 > >arping, which has solved my problem *external* to the driver ...

 > >
 > That's the second reason why I (still) avoid em in mass-aliased systems.
 > 
 > I have a single pool of IP addresses shared by many servers with 
 > multiple aliases each. When someone leaves and frees an IP, it gets 
 > reused and brought up on a different server. In case it was previously 
 > handled by em, the traffic doesn't get redirected to the new server.
 > 
 > Similar thing happens even with machines with single static IPs. For 
 > instance when retiring an old production system, I usually request a new 
 > box to be brought up on a different IP, make a fresh install on 
 > everything and test, swap IP addresses and reboot. In case of em, after 
 > a soft reboot both systems are inaccessible.
 > 
 > A workaround is to power both of the systems down and then power them 
 > up. This however cannot be done remotely and in case there were IP 
 > aliases, they still don't get any traffic.
 > 


I haven't fully tested it, but what about the attached patch?
It may fix your ARP issue. The patch also fixes other issues
related to ioctls.
Now em(4) will send an ARP packet when its IP address is changed even
if there is no active link. Since em(4) is not an mii-aware driver I
can't be sure this behaviour is correct.
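
For reference, the kind of packet under discussion is a gratuitous ARP: a broadcast ARP request in which the sender and target protocol addresses are both the interface's own IP. Below is a minimal Python sketch that only builds the 42-byte frame (it does not send anything). The MAC and IP are made-up examples and gratuitous_arp is a hypothetical helper, not code from em(4) or arping.

```python
import socket
import struct


def gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source, ethertype ARP.
    eth = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)
    # ARP header: Ethernet/IPv4, 6-byte and 4-byte addresses, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, zeroed target MAC, target IP equal to sender IP.
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return eth + arp


frame = gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

Neighbours that see such a frame can update their ARP cache for the moved address, which is the redirect the earlier posts say is missing.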

The patch is against if_em.c,v 1.116 2006/06/06, which is 7-CURRENT. I 
tried "merging" the relevant em driver files into a 6-STABLE 
installation by simply copying sys/dev/em/* and sys/modules/em/Makefile, 
but it seems that the new revision depends on other -CURRENT things and 
the module build fails:


# pwd
/usr/src/sys/modules/em
# make clean; make
...
/usr/src/sys/modules/em/../../dev/em/if_em.c: In function 
`em_setup_interface':
/usr/src/sys/modules/em/../../dev/em/if_em.c:2143: error: 
`IFCAP_VLAN_HWCSUM' undeclared (first use in this function)

...

I don't have a 7-CURRENT based box around. It seems too bleeding edge 
for me anyway. I was hoping to play with different if_em kernel modules 
on a semi-production (spare) box and eventually test the proposed em 
patch, but apparently it's not so easy.


Please let me know if I'm missing something obvious.

Thanks,
Atanas






Index: if_em.c
===
RCS file: /pool/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.116
diff -u -r1.116 if_em.c
--- if_em.c 6 Jun 2006 08:03:49 -   1.116
+++ if_em.c 1 Jul 2006 03:51:41 -
@@ -692,7 +692,8 @@
 
 	EM_LOCK_ASSERT(sc);
 
-	if (!sc->link_active)

+   if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
+   IFF_DRV_RUNNING)
return;
 
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {

@@ -751,11 +752,6 @@
return (error);
 
 	switch (command) {

-   case SIOCSIFADDR:
-   case SIOCGIFADDR:
-   IOCTL_DEBUGOUT("ioctl rcv'd: SIOCxIFADDR (Get/Set Interface Addr)");
-   ether_ioctl(ifp, command, data);
-   break;
case SIOCSIFMTU:
{
int max_frame_size;
@@ -802,17 +798,19 @@
IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFFLAGS (Set Interface Flags)");
EM_LOCK(sc);
if (ifp->if_flags & IFF_UP) {
-   if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+   if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+   if ((ifp->if_flags ^ sc->if_flags) &
+   IFF_PROMISC) {
+   em_disable_promisc(sc);
+   em_set_promisc(sc);
+   }
+   } else
em_init_locked(sc);
-   }
-
-   em_disable_promisc(sc);
-   em_set_promisc(sc);
} else {
-   if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
+   if (ifp->if_drv_flags & IFF_DRV_RUNNING)
em_stop(sc);
-   }
}
+   sc->if_flags = ifp->if_flags;
EM_UNLOCK(sc);
break;
case SIOCADDMULTI:
@@ -878,8 +876,8 @@
break;
}
default:
-   IOCTL_DEBUGOUT1("ioctl received: UNKNOWN (0x%x)", (int)command);
-   error = EINVAL;
+   error = ether_ioctl(ifp, command, data);
+   break;
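
The SIOCSIFFLAGS hunk above only reprograms promiscuous mode when IFF_PROMISC differs between the new interface flags and the copy saved in sc->if_flags. Here is a small Python sketch of that XOR test, assuming IFF_PROMISC is 0x100 and IFF_UP is 0x1 as in net/if.h (both values are assumptions; promisc_changed is a hypothetical name).

```python
# Assumed values from net/if.h; verify against your source tree.
IFF_UP = 0x1
IFF_PROMISC = 0x100


def promisc_changed(new_flags, saved_flags):
    """True only if the PROMISC bit differs between the two flag words."""
    return bool((new_flags ^ saved_flags) & IFF_PROMISC)


saved = IFF_UP
print(promisc_changed(IFF_UP | IFF_PROMISC, saved))  # PROMISC bit toggled on
print(promisc_changed(IFF_UP, saved))                # nothing relevant changed
```

Saving ifp->if_flags at the end of the ioctl is what makes the next comparison meaningful.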

pkg_version confused by architecture in package name

2006-07-05 Thread [LoN]Kamikaze

I normally run the command
#  pkg_version -Iv | grep \<
before running 'portupgrade -a', to see what's going to happen. This time I got 
the following output:

diablo-jdk-freebsd6.i386.1.5.0.07.00  <   needs updating (index has 1.5.0.07.00)

It seems that the tool is confused by the i386 in the package name.

Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Max Laier

On Thursday 06 July 2006 02:17, Vye Wilson wrote:
> # vmstat -i
> interrupt  total   rate
> irq1: atkbd0   5  0
> irq6: fdc0 3  0
> irq10: uhci1   915633230 262810
> irq15: ata1 1306  0
> irq17: fwohci0 1  0
> irq18: fxp0 2876  0
> irq21: twa0  153  0
> cpu0: timer  6964974   1999
> Total  922602548 264811

Are you using usb on that box?  If not, get rid of device uhci in your kernel 
config to see if that fixes it.  If you are using usb - I have no idea.  A 
BIOS upgrade might help.
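
If it helps, the same reading of "vmstat -i" can be scripted. This is a minimal Python sketch that parses the output quoted above and picks the source with the highest rate; the function names are hypothetical and the parser assumes the last two columns are always total and rate.

```python
SAMPLE = """\
interrupt  total   rate
irq1: atkbd0   5  0
irq6: fdc0 3  0
irq10: uhci1   915633230 262810
irq15: ata1 1306  0
irq17: fwohci0 1  0
irq18: fxp0 2876  0
irq21: twa0  153  0
cpu0: timer  6964974   1999
Total  922602548 264811
"""


def parse_vmstat_i(text):
    """Return a list of (source, total, rate), skipping header and Total."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not (parts[-1].isdigit() and parts[-2].isdigit()):
            continue  # header or blank line
        name = " ".join(parts[:-2])
        if name == "Total":
            continue
        rows.append((name, int(parts[-2]), int(parts[-1])))
    return rows


def busiest(rows):
    """Return the row with the highest interrupt rate."""
    return max(rows, key=lambda row: row[2])


rows = parse_vmstat_i(SAMPLE)
name, total, rate = busiest(rows)
share = total / sum(row[1] for row in rows)
print(f"{name}: {rate}/s, {share:.1%} of all interrupts")
```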

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News



Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Steven Hartland


Is anything plugged into USB? If so, try removing it, as uhci1
is clearly your issue.

Vye Wilson wrote:

# vmstat -i
interrupt  total   rate
irq1: atkbd0   5  0
irq6: fdc0 3  0
irq10: uhci1   915633230 262810
irq15: ata1 1306  0
irq17: fwohci0 1  0
irq18: fxp0 2876  0
irq21: twa0  153  0
cpu0: timer  6964974   1999




This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to [EMAIL PROTECTED]


Re: NFS Locking Issue

2006-07-05 Thread User Freebsd


On Wed, 5 Jul 2006, Francisco Reyes wrote:


Scott Long writes:


For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
to the test as both an NFS client and server in a mixed OS environment.


I have a few debugging settings/suggestions that have been sent my way and I 
plan to try them tonight, but this is just another report..


FreeBSD only environment.
Today after hours going crazy with horrible performance I brought down nfsd 
and brought it back up.. that simple process got vmstat 'b' column down and 
everything was back to normal.


Again this will not help anyone troubleshoot, but just to mention that it 
happens even with a FreeBSD only environment.


'k, to those out there that know what is useful, and what isn't ...

If Francisco had DDB enabled, did a CTRL-ALT-ESC when the above happens, 
and then did a 'panic' to crash the server and dump a core ... could anything 
useful be gleaned from that core dump?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Vye Wilson


# vmstat -i
interrupt  total   rate
irq1: atkbd0   5  0
irq6: fdc0 3  0
irq10: uhci1   915633230 262810
irq15: ata1 1306  0
irq17: fwohci0 1  0
irq18: fxp0 2876  0
irq21: twa0  153  0
cpu0: timer  6964974   1999
Total  922602548 264811

# systat
   /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
Load Average

   /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root irq10: uhc X
root   idle X
 

So would irq10 be the culprit? If so where do I go from here?


On 7/5/06, Max Laier <[EMAIL PROTECTED]> wrote:


On Thursday 06 July 2006 02:02, Vye Wilson wrote:
> I'm really not sure how to go about troubleshooting this issue. Can someone
> point me in the right direction?

"vmstat -i" should give a good idea what is causing the interrupt load.

--
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News






--
--Vye

Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Steven Hartland


"vmstat -i" and "systat" will be useful for identifying
what is causing the interrupts.

Vye Wilson wrote:

Recently I've had an unusually high amount of 'interrupt' cpu usage. I
stopped all my jails so the box is for the most part idle.


   Steve




Re: 0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Max Laier

On Thursday 06 July 2006 02:02, Vye Wilson wrote:
> I'm really not sure how to go about troubleshooting this issue. Can someone
> point me in the right direction?

"vmstat -i" should give a good idea what is causing the interrupt load.

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News



0.0% user, 0.0% nice, 0.0% system, 53.8% interrupt, 46.2% idle - Unusual interrupt use?

2006-07-05 Thread Vye Wilson


Recently I've had an unusually high amount of 'interrupt' cpu usage. I
stopped all my jails so the box is for the most part idle.

Here is my uname:
FreeBSD Natsume.wow.com 6.1-STABLE FreeBSD 6.1-STABLE #3: Tue Jul  4
22:14:02 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NATSUME  i386

Here is my top output:
last pid:   674;  load averages:  0.00,  0.00,  0.00    up 0+00:32:49  16:51:02
19 processes:  1 running, 18 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system, 53.8% interrupt, 46.2% idle
Mem: 5332K Active, 3984K Inact, 20M Wired, 9056K Buf, 967M Free
Swap: 2022M Total, 2022M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  666 root        1   4    0  6116K  3096K sbwait   0:00  0.00% sshd
  674 root        1 -64    0  2288K  1560K RUN      0:00  0.00% top
  670 vye         1   8    0  3188K  1992K wait     0:00  0.00% bash
  672 vye         1   8    0  1684K  1332K wait     0:00  0.00% su
  312 root        1  96    0  1344K   988K select   0:00  0.00% syslogd
  673 root        1   8    0  3184K  2068K wait     0:00  0.00% bash
  669 vye         1  96    0  6100K  3128K select   0:00  0.00% sshd
  463 root        1   8    0  1356K  1116K nanslp   0:00  0.00% cron
  594 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  599 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  593 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  597 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  592 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  595 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  596 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  598 root        1   5    0  1312K   944K ttyin    0:00  0.00% getty
  450 root        1  96    0  3400K  2556K select   0:00  0.00% sshd
  390 root        1  96    0  1256K   832K select   0:00  0.00% usbd
  283 root        1 108    0   516K   376K select   0:00  0.00% devd

After taking a look at dmesg I'm not sure if I just now noticed this or if
it has recently started doing this:

unknown:  can't assign resources (memory)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (irq)

Full dmesg output:

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
   The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #3: Tue Jul  4 22:14:02 UTC 2006
   [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NATSUME
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2392.05-MHz 686-class CPU)
 Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7

Features=0xbfebfbff
 Features2=0x400
real memory  = 1073479680 (1023 MB)
avail memory = 1041547264 (993 MB)
MPTable: 
ioapic0: Assuming intbase of 0
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
cpu0 on motherboard
pcib0:  pcibus 0 on motherboard
pci0:  on pcib0
pcib0: unable to route slot 31 INTC
agp0:  mem 0xf800-0xfbff at device 0.0 on pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pcib2:  at device 30.0 on pci0
pci2:  on pcib2
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xd400-0xd4ff mem 0xfeaffc00-0xfeaffcff,0xf380-0xf3ff irq 21 at device 9.0 on pci2
twa0: [GIANT-LOCKED]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-12, 12 ports, Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051
pcib3:  at device 11.0 on pci2
pci3:  on pcib3
pci3:  at device 8.0 (no driver attached)
fwohci0:  mem 0xfc8fe000-0xfc8fefff irq 17 at device 9.0 on pci3
fwohci0: OHCI version 1.0 (ROM=1)
fwohci0: No. of Isochronous channels is 8.
fwohci0: EUI64 00:08:d3:f0:00:00:01:09
fwohci0: Phy 1394a available S400, 3 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0:  on fwohci0
fwe0:  on firewire0
if_fwe0: Fake Ethernet address: 02:08:d3:00:01:09
fwe0: Ethernet address: 02:08:d3:00:01:09
fwe0: if_start running deferred for Giant
sbp0:  on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
fxp0:  port 0xdf00-0xdf3f mem 0xfeacf000-0xfeac,0xfea8-0xfea9 irq 18 at device 12.0 on pci2
miibus0:  on fxp0
inphy0:  on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:07:e9:d4:a4:f8
fxp1:  port 0xde80-0xdebf mem 0xfeace000-0xfeacefff,0xfea4-0xfea5 irq 19 at device 13.0 on pci2
miibus1:  on fxp1
inphy1:  on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:07:e9:d4:a4:fa
pci2:  at device 15.0 (no driver attached)
isab0:  at device 31.0 on pci0
isa0:  on i

Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes


User Freebsd writes:


What are others using for ethernet?


Of our two machines having the problem 1 has BGE and the other one has EM 
(Intel). Doesn't seem to make much of a difference.


Except for the network cards, these two machines are identical. Same 
motherboard, same RAID controller, same amount of RAM, same RAID 
configuration...


 

Re: NFS Locking Issue

2006-07-05 Thread Robert Watson



On Wed, 5 Jul 2006, Francisco Reyes wrote:

can you trigger it using work on just one client against a server, without 
client<->client interactions?  This makes tracking and reproduction a lot 
easier


Personally I am experiencing two problems.
1- NFS clients freeze/hang if the server goes away.
We have clients with several mounts so if one of the servers dies then the 
entire operation of the client is put in jeopardy.


This I can reproduce every single time with a 6.X client.. with both a 5.X 
and a 6.X server.


"umount -f" hangs too.


The problems you are experiencing are almost certainly not related to 
rpc.lockd, rather, bugs in the NFS client.


Let's just look at the normal use hang for now, and revisit umount -f after 
that.



as multi-client test cases are really tricky!


The second case only happens under heavy load and restarting nfsd makes it 
go away. Basically 'b' column in vmstat goes high and the performance of 
the machine falls to the floor.


Going to try 
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html


And reading up on how to debug with DDB. Have another user who volunteered 
to give me some pointers.. so will try that.. so I am able to actually 
produce more helpful info.


If you can get into DDB when the hang has occurred, output via serial console 
for the following commands would be very helpful:


show pcpu
show allpcpu
ps
trace
traceall
show locks
show alllocks
show uma
show malloc
show lockedvnods

Note that the last two will only work if you compile WITNESS in -- WITNESS 
significantly changes kernel timing, so you may find it closes whatever race 
you're running into.  If you can reproduce the problem with WITNESS and 
INVARIANTS, that would be very useful.  The above output will hopefully tell 
us the basic state of the system with respect to processes, threads, locking, 
and so on, and may help us track things down.  For the above, you definitely 
want a serial console as it will be quite a bit of output.


Also, can you send the output of the 'mount' command from the un-hung state? 
I notice a lot of threads stuck in 'ufs'.


Finally, during the above, could you disable background file system 
checking by placing the following in /etc/rc.conf:


  background_fsck="NO"

Then boot to single user mode and do a full fsck -p before booting up, in order 
to make sure the file system is in a good state before beginning.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes


User Freebsd writes:

I believe, in Francisco's case, they are willing to pay someone to fix the 
NFS issues they are having, which, i'd assume, means easy access to the 
problematic server(s) to do proper testing in a "real life scenario" ...


Correct. As long as the person is someone "trusted in the community" we 
could do that. And yes we are willing to come to some agreement for 
compensation for the help. Needless to say our introduction of new machines 
will go through a more rigorous test in the future.. especially when jumping 
to a new Release number in FreeBSD. 

We lost 1 big customer and after today we likely will lose 2 or 3 more.. of 
the big ones.. when it's all said and done we are likely to lose several 
thousand dollars/month due to these 6.X incidents.


We are fairly new to NFS and that's why we were hoping to get someone to 
help us.. or at least point us in the right direction.


I plan to go over the link you sent me and try to prepare at least one 
machine. 

As for paying someone, yes we have been actively looking for someone to help 
us since we are relatively new to NFS.. and even newer to 
troubleshooting this type of problem.


Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes


Robert Watson writes:

It's not impossible.  It would be interesting to see if ps axl reports that 
rpc.lockd is in the kqread state


Found my post in another thread.
0   354     1   0  96  0  1412  1032 select Ss    ??    0:07.06 /usr/sbin/rpcbind


It was not in kqread state.. and that was from a point where the machine was 
totally locked up.. had to do a physical reset.. could not even kill nfsd 
that time.
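
For next time, the check Robert asked for can be scripted so it is quick to run when the machine starts to lock up. This is a minimal Python sketch that pulls the wait channel out of "ps axl" text. It assumes the 13-column layout seen in the line above (UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND); the rpc.lockd line in the sample is hypothetical, and wait_channels is a made-up helper name.

```python
SAMPLE = """\
UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
0   354     1   0  96  0  1412  1032 select Ss    ??    0:07.06 /usr/sbin/rpcbind
0   401     1   0  96  0  1500  1100 kqread Ss    ??    0:00.10 /usr/sbin/rpc.lockd
"""
# The second process line is invented purely to show a kqread match.


def wait_channels(ps_output, needle):
    """Map PID -> wait channel for commands containing `needle`."""
    result = {}
    for line in ps_output.splitlines():
        cols = line.split(None, 12)  # keep the full command in cols[12]
        if len(cols) < 13 or not cols[1].isdigit():
            continue  # header or short line
        if needle in cols[12]:
            result[int(cols[1])] = cols[8]
    return result


print(wait_channels(SAMPLE, "rpc.lockd"))
print(wait_channels(SAMPLE, "rpcbind"))
```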


I had also more output from several different ps. You need to do "view more" 
to see them all.


http://tinyurl.com/kpejr

Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes


Robert Watson writes:

It's not impossible.  It would be interesting to see if ps axl reports that 
rpc.lockd is in the kqread state, which would suggest it was blocked in the 
resolver.


Just tried "ps axl | grep rpc" in the machine giving us the most grief.. 
Only got one line back:

root  367  0.0  0.0  1368   960  ??  Ss   25Jun06   0:05.52 /usr/sbin/rpcbin
 0 1   0   4  0 select

Is that one of the lines I should keep an eye on next time the machine is 
locked up?


Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes


Robert Watson writes:

can you trigger it using work on just one client against a server, without 
client<->client interactions?  This makes tracking and reproduction a lot 
easier


Personally I am experiencing two problems.
1- NFS clients freeze/hang if the server goes away.
We have clients with several mounts so if one of the servers dies then the 
entire operation of the client is put in jeopardy.


This I can reproduce every single time with a 6.X client.. with both a 5.X 
and a 6.X server.


"umount -f" hangs too.


as multi-client test cases are really tricky! 


The second case only happens under heavy load and restarting nfsd makes it 
go away. Basically 'b' column in vmstat goes high and the performance of 
the machine falls to the floor.


Going to try 
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html


And reading up on how to debug with DDB. Have another user who volunteered 
to give me some pointers.. so will try that.. so I am able to actually 
produce more helpful info.


Re: NFS Locking Issue

2006-07-05 Thread Francisco Reyes


Scott Long writes:


For what it's worth, I recently spent a lot of time putting FreeBSD 6.1
to the test as both an NFS client and server in a mixed OS environment.


I have a few debugging settings/suggestions that have been sent my way and I 
plan to try them tonight, but this is just another report..


FreeBSD only environment.
Today after hours going crazy with horrible performance I brought down nfsd 
and brought it back up.. that simple process got vmstat 'b' column down and 
everything was back to normal.


Again this will not help anyone troubleshoot, but just to mention that it 
happens even with a FreeBSD only environment.


Re: fetch hangs on AMD64 RELENG_6

2006-07-05 Thread Charles Swiger


On Jul 5, 2006, at 4:22 PM, Justin T. Gibbs wrote:
Hmm.  Seems we close the window unexpectedly and the remote side doesn't
retransmit when we open it.


Yes, interesting that.  :-)

Normally the stack only advertises a window size of 0 when it cannot  
accept any more data, for instance under severe congestion.  It tells the  
other side to stop sending traffic for an interval, although the other  
side should retry with zero-data-length ACK-only packets after a delay,  
or once your side sends a packet opening the window.



> FreeBSD's acks stop once the window is fully
> open... aren't the acks supposed to be retried longer?  If not, shouldn't
> fetch eventually see a socket close event instead of hanging forever?


RFC-793 says:

"The sending TCP must be prepared to accept from the user and send at
  least one octet of new data even if the send window is zero.  The
  sending TCP must regularly retransmit to the receiving TCP even when
  the window is zero.  Two minutes is recommended for the retransmission
  interval when the window is zero.  This retransmission is essential to
  guarantee that when either TCP has a zero window the re-opening of the
  window will be reliably reported to the other.

  When the receiving TCP has a zero window and a segment arrives it must
  still send an acknowledgment showing its next expected sequence number
  and current window (zero)."

The fact that you aren't seeing any ACKs back from this remote  
server suggests that perhaps a stateful firewall is involved which is  
getting confused and/or dropping the state entry once it sees the  
zero-window-size packet from your machine.


There may be something wrong on the FreeBSD side as well, of course:  
the fact that it grows the window by sending twenty or more  
ACK packets in the span of about one millisecond without waiting for  
any ACKs from the other side is pretty wacky in its own right.


--
-Chuck


Re: fetch hangs on AMD64 RELENG_6

2006-07-05 Thread Justin T. Gibbs

Hmm.  Seems we close the window unexpectedly and the remote side doesn't
retransmit when we open it.  FreeBSD's acks stop once the window is fully
open... aren't the acks supposed to be retried longer?  If not, shouldn't
fetch eventually see a socket close event instead of hanging forever?

A similar failure occurs with SACK disabled.

--
Justin

13:31:44.695211 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9018128:9019496(1368) ack 179 win 1716
13:31:44.695229 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.702704 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9019496:9020864(1368) ack 179 win 1716
13:31:44.702719 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.710200 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9020864:9022232(1368) ack 179 win 1716
13:31:44.710215 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.719444 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 9022232:9023600(1368) ack 179 win 1716
13:31:44.719462 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 8957936 win 32832
13:31:44.727065 IP manna.mozilla.org.http > databus.avidyne.com.64531: . 8957936:8959304(1368) ack 179 win 1716
13:31:44.727089 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 0
13:31:44.727146 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 1680
13:31:44.727181 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 3216
13:31:44.727275 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 4752
13:31:44.727295 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 6288
13:31:44.727342 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 7824
13:31:44.727375 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 9360
13:31:44.727492 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 10896
13:31:44.727513 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 12432
13:31:44.727565 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 15504
13:31:44.727632 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 17040
13:31:44.727653 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 18576
13:31:44.727701 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 20112
13:31:44.727780 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 21648
13:31:44.727870 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 23184
13:31:44.727889 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 24720
13:31:44.727920 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 26256
13:31:44.727982 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 27792
13:31:44.728034 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 29328
13:31:44.728053 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 30864
13:31:44.728217 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 32400
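
To put numbers on the burst of window updates in a trace like the one above, here is a minimal Python sketch. It assumes each tcpdump record sits on a single line (long records may arrive wrapped by the mail client and need joining first). The function name summarize_window_updates and the four-record sample are only illustrative; nothing here is part of tcpdump or fetch.

```python
import re
from datetime import datetime

# A few records copied from the trace in this thread, one record per line.
SAMPLE = """\
13:31:44.727089 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 0
13:31:44.727146 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 1680
13:31:44.727181 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 3216
13:31:44.728217 IP databus.avidyne.com.64531 > manna.mozilla.org.http: . ack 9023600 win 32400
"""

# Matches pure ACKs only (no data range) and captures the timestamp,
# the sending host.port, the ack number and the advertised window.
ACK_RE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d+) IP (\S+) > \S+: \. ack (\d+) win (\d+)\s*$"
)


def summarize_window_updates(text, sender):
    """Return (count, span_ms, windows) for the pure ACKs sent by `sender`,
    counted from its first zero-window advertisement onwards."""
    acks = []
    for line in text.splitlines():
        m = ACK_RE.match(line)
        if m and m.group(2) == sender:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            acks.append((stamp, int(m.group(4))))
    # Ignore everything sent before the window was closed.
    start = next((i for i, (_, win) in enumerate(acks) if win == 0), None)
    if start is None:
        return 0, 0.0, []
    burst = acks[start:]
    span_ms = (burst[-1][0] - burst[0][0]).total_seconds() * 1000.0
    return len(burst), span_ms, [win for _, win in burst]


if __name__ == "__main__":
    count, span_ms, windows = summarize_window_updates(
        SAMPLE, "databus.avidyne.com.64531")
    print(f"{count} ACKs in {span_ms:.3f} ms, windows: {windows}")
```

Fed the complete trace instead of the sample, this should count the zero-window ACK plus every window update that follows it, which makes it easy to compare a failing transfer against a good one.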

Re: NFS Locking Issue

2006-07-05 Thread Michel Talon

> with the bge driver ... could we be possibly talking internet vs nfs 
> issues?

Pursuing investigations, I have discovered that for people whose 
workstations have their home directories on an NFS server, and who run 
Gnome or KDE, there is a program with horrible NFS behavior:
gam_server from gamin, which detects alterations to your .kde,
for example. On my machine, running nfsstat -c -w 1, I see 4000 requests/s
due to that. If I displace it (*) and kill it, this drops to 80 requests/s
and KDE works exactly as well, including discovering new files.
I think it is not necessary to comment on the performance penalty: if a number
of stations each send 4000 requests/s to a server, it will soon be killed.
(*) It restarts itself automatically, so it is necessary to displace or rename
it before killing it.

-- 

Michel TALON


Re: Network Card

2006-07-05 Thread Cian Hughes

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/config-network-setup.html
might help you


Regards, cian

On 5 Jul 2006, at 18:54, Mihir Sanghavi wrote:


Hi,
Can someone please tell me how do i activate the network card in  
FreeBSD 5.5.

Thanks.
--
What we see depends mainly on what we look for.
-MIHIR

Network Card

2006-07-05 Thread Mihir Sanghavi


Hi,
Can someone please tell me how to activate the network card in FreeBSD 5.5?
Thanks.
--
What we see depends mainly on what we look for.
-MIHIR

Re: fetch hangs on AMD64 RELENG_6

2006-07-05 Thread Michael Proto

Justin T. Gibbs wrote:
> Hi,
> 
> I'm seeing fetch hang under AMD64/RELENG_6 when fetching data
> from several different sites.  An i386 machine sitting next to it
> running current from a few weeks back is not showing this problem
> when fetching the same files.  The failing machine is a Dell 2850
> with an em0 device.  We have a T-1 here, so transfer speeds are
> usually well over 100KBps.  fetch is stuck in sbwait.  Restarting
> fetch a few times will eventually allow the transfer to complete.
> Anyone else seen this?  Any hints on how I might help debug the
> problem?
> 

Are these fetches for ports installs, and if so are they from the
gnu.org site(s)? I noticed a similar issue myself last night when doing
some installs from ports, and they were all related to gnu.org FTP
sites. Otherwise fetch was working just as expected.


-Proto

Re: 5.5-stable network interface rl0 stops working

2006-07-05 Thread Roland Smith

On Wed, Jul 05, 2006 at 06:40:58PM +0200, Hank Hampel wrote:
> Hello everybody,
> 
> I have a very disturbing problem with one of our FreeBSD 5.5-stable
> machines. It is a box on which ~10 jail systems run, each with
> small to moderate network traffic.
> 
> Now from time to time - sometimes after a few days, sometimes after a
> couple of weeks - the network interface rl0 (which is the main
> interface on the machine, rl1 is for backups/internal use only) stops
> working.

Are they physically on the motherboard? Or on PCI cards? In the latter
case try reseating the card in the slot.

Try switching rl0 and rl1, and see if the problem persists. Also,
swapping out the ethernet cable is worth trying.

Another thing to check is if rl0 is sharing an interrupt with another
device. That can cause problems.
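
As a rough way to check for interrupt sharing from a saved dmesg, here is a minimal Python sketch. The sample lines are adapted from the dmesg posted in this thread (wrapped lines joined, device descriptions omitted), and the function names devices_by_irq and shared_irqs are made up for illustration. On the live machine, vmstat -i is another place to look.

```python
import re
from collections import defaultdict

# Sample lines adapted from the dmesg in this thread; the device
# description strings are left out.
SAMPLE_DMESG = """\
rl0: port 0x9000-0x90ff mem 0xf500-0xf5ff irq 21 at device 1.0 on pci2
rl1: port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
atkbd0: irq 1 on atkbdc0
"""

# A device name at the start of the line, then "irq N" somewhere after it.
IRQ_RE = re.compile(r"^(\w+):.*\birq (\d+)\b")


def devices_by_irq(dmesg_text):
    """Map each IRQ number to the list of devices that report using it."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = IRQ_RE.match(line)
        if m:
            by_irq[int(m.group(2))].append(m.group(1))
    return dict(by_irq)


def shared_irqs(dmesg_text):
    """Return only the IRQs claimed by more than one device."""
    return {irq: devs
            for irq, devs in devices_by_irq(dmesg_text).items()
            if len(devs) > 1}


if __name__ == "__main__":
    shared = shared_irqs(SAMPLE_DMESG)
    if shared:
        for irq, devs in sorted(shared.items()):
            print(f"irq {irq} shared by: {', '.join(devs)}")
    else:
        print("no shared interrupts found in this dmesg")
```

This only sees what the boot messages report, so it is a first pass rather than proof that an interrupt is or is not shared.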

> Each jailed system has its own firewall ruleset, permitting only
> traffic for the services in that specific jail. The packet filter used
> is ipfw. Some of the rules are stateful (keep-state).
> 
> When rl0 stops working ipfw logs lots of denied packets so that it
> seems that the dynamic (keep-state) rules don't work any longer. We
> checked and increased the buffers for the dynamic rules to no avail -
> I doubt they are part of the problem. I'm not even sure ipfw is part
> of the problem.

Does the problem persist without ipfw? I've got an rl0 card in my
workstation (6.1-STABLE, amd64, using PF) and it runs without problems.

> After the stop on the interface occurs there is no other way to get
> the interface up and running again than rebooting the whole machine.
> Restarting /etc/rc.d/netif, the jails or ipfw doesn't help anything.

What does ifconfig say after the interface stops working?
 
> The bad thing is I haven't found any way to trigger this problem so
> that I can only check and change things and wait if the situation
> improves or not. For example I've already set debug.mpsafenet="0" but
> this doesn't help; on the contrary, it seems to worsen the problem a little
> bit.

> Find attached the dmesg output of the machine. If any other
> information is needed to hunt down the cause of this problem please
> let me know. I checked various list archives but haven't found a clue
> yet.

Anything in the logs, except the denied packets?

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: NFS Locking Issue

2006-07-05 Thread User Freebsd


On Wed, 5 Jul 2006, Michel Talon wrote:


So it may be relevant to say that i have kernels without IPV6 support.
Recall that i have absolutely no problem with the client in FreeBSD-6.1.
Tomorrow i will test one of the 6.1 machines as a NFS server and the other as
a client, and will make you know if i see something.


Well, i have checked between 2 FreeBSD-6.1-RELEASE machines on the network,
both have fxp ethernet driver running at 100 Mb/s, one is NFS server other NFS
client. Both run lockd and statd. I have absolutely no problem exchanging
files, for example if i begin to copy /usr/src through NFS from one machine to
the other, which makes a lot of transactions of all sorts, i get:
niobe# mount asmodee:/usr/src /mnt
cp -R /mnt/src .
...
after some time i interrupt the transfer
niobe% du -sh .
131M    .
and during this time i observe the following type of statistics
asmodee% netstat -w 1 -I fxp0
            input         (fxp0)           output
   packets  errs      bytes    packets  errs      bytes colls
       542     0      84116       1330     0    1219388     0
       515     0      72806       1290     0    1196330     0
       501     0      95722       1081     0     741048     0
       539     0      90704       1090     0    1228052     0
       645     0      67888        902     0    1451098     0
       405     0      81264       1609     0     604278     0
       503     0      74218        709     0     924422     0
       500     0      98904        973     0     619350     0
       550     0     100122        855     0     836328     0
       615     0      79336       1081     0     862772     0
       577     0      82862        901     0    1005024     0

which looks decent to me.

Doing the same with just one big file no problem either, and i get a transfer
speed of 6.60 MB/s which is perhaps a little less than with linux, but nothing
catastrophic. I get 8.20 MB/s for FreeBSD client interacting with the Linux
server.

Now netstat gives
   packets  errs      bytes    packets  errs      bytes colls
       785     0     123266       4716     0    6825600     0
       759     0     139898       4530     0    7747276     0
       852     0     124652       5106     0    6902566     0
       863     0     128040       5170     0    7081738     0
       811     0     123760       4862     0    6851498     0
       789     0     123540       4720     0    6834310     0
       840     0     115378       5024     0    6382114     0

So up to what i can see NFS works OK for me on FreeBSD-6.1.

So the main difference with other people cases may be that i have removed IPV6
support from kernel.


What are others using for ethernet?  In your case, you say you are running 
between fxp cards ... I've heard some reports, in another thread, of problems 
with the bge driver ... could we possibly be talking about internet vs NFS 
issues?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

Re: fetch hangs on AMD64 RELENG_6

2006-07-05 Thread Charles Swiger


On Jul 5, 2006, at 1:42 PM, Justin T. Gibbs wrote:

I'm seeing fetch hang under AMD64/RELENG_6 when fetching data
from several different sites.  An i386 machine sitting next to it
running current from a few weeks back is not showing this problem
when fetching the same files. [ ... ]
Any hints on how I might help debug the problem?


Using tcpdump to look at the traffic would be a useful starting  
point.  :-)


--
-Chuck


fetch hangs on AMD64 RELENG_6

2006-07-05 Thread Justin T. Gibbs

Hi,

I'm seeing fetch hang under AMD64/RELENG_6 when fetching data
from several different sites.  An i386 machine sitting next to it
running current from a few weeks back is not showing this problem
when fetching the same files.  The failing machine is a Dell 2850
with an em0 device.  We have a T-1 here, so transfer speeds are
usually well over 100KBps.  fetch is stuck in sbwait.  Restarting
fetch a few times will eventually allow the transfer to complete.
Anyone else seen this?  Any hints on how I might help debug the
problem?

Thanks,
Justin

5.5-stable network interface rl0 stops working

2006-07-05 Thread Hank Hampel

Hello everybody,

I have a very disturbing problem with one of our FreeBSD 5.5-stable
machines. It is a box on which ~10 jail systems run, each with
small to moderate network traffic.

Now from time to time - sometimes after a few days, sometimes after a
couple of weeks - the network interface rl0 (which is the main
interface on the machine, rl1 is for backups/internal use only) stops
working.

Each jailed system has its own firewall ruleset, permitting only
traffic for the services in that specific jail. The packet filter used
is ipfw. Some of the rules are stateful (keep-state).

When rl0 stops working ipfw logs lots of denied packets so that it
seems that the dynamic (keep-state) rules don't work any longer. We
checked and increased the buffers for the dynamic rules to no avail -
I doubt they are part of the problem. I'm not even sure ipfw is part
of the problem.

After the stop on the interface occurs there is no other way to get
the interface up and running again than rebooting the whole machine.
Restarting /etc/rc.d/netif, the jails or ipfw doesn't help anything.

The bad thing is I haven't found any way to trigger this problem so
that I can only check and change things and wait if the situation
improves or not. For example I've already set debug.mpsafenet="0" but
this doesn't help; on the contrary, it seems to worsen the problem a little
bit.

Find attached the dmesg output of the machine. If any other
information is needed to hunt down the cause of this problem please
let me know. I checked various list archives but haven't found a clue
yet.

-[ dmesg ]-
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.5-STABLE #5: Tue May 30 13:51:55 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SHAWSHANK
WARNING: MPSAFE network stack disabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2411.60-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf34  Stepping = 4
  Features=0xbfebfbff
real memory  = 2147418112 (2047 MB)
avail memory = 2096037888 (1998 MB)
ACPI APIC Table: 
ioapic0  irqs 0-23 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0:  on acpi0
acpi_button0:  on acpi0
pcib0:  port 0x1000-0x10bf,0xcf8-0xcff on acpi0
pci0:  on pcib0
agp0:  mem 0xe800-0xefff at device 0.0 on pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pcib2:  at device 30.0 on pci0
pci2:  on pcib2
pci2:  at device 0.0 (no driver attached)
rl0:  port 0x9000-0x90ff mem 0xf500-0xf5ff irq 21 at device 1.0 on pci2
miibus0:  on rl0
rlphy0:  on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:02:2a:d5:39:74
rl1:  port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2
miibus1:  on rl1
rlphy1:  on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: Ethernet address: 00:02:2a:d5:39:53
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0:  at device 31.3 (no driver attached)
acpi_tz0:  on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
pmtimer0 on isa0
orm0:  at iomem 0xc-0xc7fff on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
atkbdc0:  at port 0x64,0x60 on isa0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
ppc0: parallel port not found.
Timecounter "TSC" frequency 2411601876 Hz quality 800
Timecounters tick every 10.000 msec
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled
ad0: 114497MB  [232629/16/63] at ata0-master UDMA100
acd0: DVDROM  at ata1-master PIO4
Mounting root from ufs:/dev/ad0s1a
-[ dmesg ]-


Best regards, Hank



Re: NFS Locking Issue

2006-07-05 Thread Robert Watson



On Mon, 3 Jul 2006, Michael Collette wrote:


-
Let's start with the simplest.  The scenario here involves 2 machines, mach01 
and mach02.  Both are running 6-STABLE, and both are running rpcbind, 
rpc.statd, and rpc.lockd.  mach01 has exported /documents and mach02 is 
mounting that export under /mnt.  Simple enough?


The /documents directory has multiple subdirectories and files of various 
sizes.  The actual amount of data doesn't really matter to produce a failure. 
All you need to do at this point is to try to copy files from that mount 
point to somewhere else on the hard drive.


cp -Rp /mnt/* /tmp/documents/

You may, or not, see that a couple of subdirectories were created, but no 
files actually moved over.  The cp command is now locked up, and no traffic 
moves.  This usually takes a second or two to show up as a problem.  I can 
repeat this with multiple 6-STABLE boxes.


Turn off rpc.lockd on either the server or client before the cp command, and 
things work.


I've tried several times to reproduce this, and have not succeeded in doing 
so.  In principle, cp should not be using advisory locks.  Could you try 
running cp under ktrace, and saving the ktrace file somewhere outside of NFS? 
Something like the following:


  ktrace -f /usr/tmp/localfile cp -Rp /mnt/* /tmp/documents/

If you are able to reproduce the problem with tracing turned on, a copy of the 
tracefile would be very helpful.


Also, when it locks up, are you able to kill cp using Ctrl-C, and if you hit 
Ctrl-T while it appears locked, what output do you get?


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge

Re: NFS Locking Issue

2006-07-05 Thread User Freebsd


On Wed, 5 Jul 2006, Robert Watson wrote:


On Wed, 5 Jul 2006, Danny Braniss wrote:

In my case our main servers are NetApp, and the problems are more related 
to am-utils running into some race condition (need more time to debug this 
:-) the other problem is related to throughput, freebsd is slower than 
linux, and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux 
it's the same. So it seems some tuning is needed.


our main problem now is samba/rpc.lockd, we are stuck with a server running 
FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd 
doesn't work.


So, if someone is willing to look into the lockd issue, we would like to 
help.


The most significant problem working with rpc.lockd is creating easy to 
reproduce test cases.  Not least because they can potentially involve 
multiple clients.  If you can help to produce simple test cases to reproduce 
the bugs you're seeing, that would be invaluable.


I'm aware of two general classes of problems with rpc.lockd.  First, 
architectural issues, some derived from architectural problems in the NLM 
protocol: for example, the assumption that there can be a clean mapping of 
process lock owners to locks, which falls down because locks are properties of 
file descriptors that can be inherited.  Second, implementation 
bugs/misfeatures, such as the kernel not knowing how to cancel lock requests, 
and so being unable to implement interruptible waits on locks in the 
distributed case.


Reducing complex failure modes to easily reproduced test cases is tricky 
also, though.  It requires careful analysis, often with ktrace and 
tcpdump/ethereal to work out what's going on, and not a little luck to 
perform the reduction of a large trace down to a simple test scenario.  The 
first step is to try and figure out what, if any, specific workload results 
in a problem.  For example, can you trigger it using work on just one client 
against a server, without client<->client interactions?  This makes tracking 
and reproduction a lot easier, as multi-client test cases are really tricky! 
Once you've established whether it can be reproduced with a single client, 
you have to track down the behavior that triggers it -- normally, this is 
done by attempting to narrow down the specific program or sequence of events 
that causes the bug to trigger, removing things one at a time to see what 
causes the problem to disappear.  This is made more difficult as lock 
managers are sensitive to timing, so removing a high load item from the list, 
even if it isn't the source of the problem, might cause it to trigger less 
frequently.
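
As one possible starting point for a single-client test case, here is a minimal Python sketch that takes and releases one advisory lock on a file with a timeout. Run against a path on an NFS mount, the lock request should go through rpc.lockd; that routing, the function name try_lock and the 10-second default are assumptions for illustration, not something taken from the rpc.lockd sources. If the kernel wait cannot be interrupted, as described above, the alarm may never fire and the script itself will hang, which is still a useful data point.

```python
import errno
import fcntl
import os
import signal
import sys


class LockTimeout(Exception):
    """Raised from the SIGALRM handler when the lock call takes too long."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def try_lock(path, timeout_s=10):
    """Take and release an exclusive advisory lock on `path`.

    Returns "ok", "busy" (someone else holds the lock) or "timeout".
    Must be called from the main thread because it installs a signal handler.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)
    try:
        # Non-blocking request: a healthy lock manager answers at once.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    except LockTimeout:
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return "busy"
        raise
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
        os.close(fd)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: locktest.py /path/on/nfs/mount/testfile")
    print(try_lock(sys.argv[1]))
```

Running it under ktrace, with tcpdump capturing on the server at the same time, would tie a hang to a specific lock request.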


I'm not sure if this is an option for anyone, either developer or user, 
but in the past, on particularly tricky bugs where I seemed to be the only 
one to be able to produce it, I've given access to a 'trusted developer' 
to the machine itself, to minimize the time lag that emails create ... 
but, also, to let the developer at a machine that has the load required to 
easily reproduce it ...


Not sure if there is anyone out there, on either side of the proverbial 
fence, that feels comfortable doing this, but figured I'd throw the idea 
out ...


I believe, in Francisco's case, they are willing to pay someone to fix the 
NFS issues they are having, which, I'd assume, means easy access to the 
problematic server(s) to do proper testing in a "real life scenario" ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

Re: NFS Locking Issue

2006-07-05 Thread Michel Talon

> So it may be relevant to say that i have kernels without IPV6 support.
> Recall that i have absolutely no problem with the client in FreeBSD-6.1.
> Tomorrow i will test one of the 6.1 machines as a NFS server and the other as
> a client, and will make you know if i see something.

Well, I have checked between 2 FreeBSD-6.1-RELEASE machines on the network.
Both have the fxp ethernet driver running at 100 Mb/s; one is the NFS server, the
other the NFS client. Both run lockd and statd. I have absolutely no problem
exchanging files. For example, if I begin to copy /usr/src through NFS from one
machine to the other, which makes a lot of transactions of all sorts, I get:
niobe# mount asmodee:/usr/src /mnt
cp -R /mnt/src .
...
after some time I interrupt the transfer
niobe% du -sh .
131M    .
and during this time I observe the following type of statistics
asmodee% netstat -w 1 -I fxp0
            input         (fxp0)           output
   packets  errs      bytes    packets  errs      bytes colls
       542     0      84116       1330     0    1219388     0
       515     0      72806       1290     0    1196330     0
       501     0      95722       1081     0     741048     0
       539     0      90704       1090     0    1228052     0
       645     0      67888        902     0    1451098     0
       405     0      81264       1609     0     604278     0
       503     0      74218        709     0     924422     0
       500     0      98904        973     0     619350     0
       550     0     100122        855     0     836328     0
       615     0      79336       1081     0     862772     0
       577     0      82862        901     0    1005024     0
   
which looks decent to me.

Doing the same with just one big file, there is no problem either, and I get a
transfer speed of 6.60 MB/s, which is perhaps a little less than with Linux, but
nothing catastrophic. I get 8.20 MB/s for a FreeBSD client interacting with the
Linux server.

Now netstat gives
   packets  errs      bytes    packets  errs      bytes colls
       785     0     123266       4716     0    6825600     0
       759     0     139898       4530     0    7747276     0
       852     0     124652       5106     0    6902566     0
       863     0     128040       5170     0    7081738     0
       811     0     123760       4862     0    6851498     0
       789     0     123540       4720     0    6834310     0
       840     0     115378       5024     0    6382114     0
   
So as far as I can see, NFS works OK for me on FreeBSD-6.1. 

So the main difference from other people's cases may be that I have removed IPv6
support from the kernel.
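
As a cross-check of the figures in this message, here is a small Python sketch that averages the output bytes column of the second netstat run (the single big file) and converts it to MiB/s and to a share of the 100 Mb/s link. The function names are illustrative, and the byte counts are as seen on the wire, so they include protocol overhead on top of the file data.

```python
# Output bytes per second, taken from the second netstat run in this
# message (the single big file).
OUT_BYTES_PER_SEC = [
    6825600, 7747276, 6902566, 7081738, 6851498, 6834310, 6382114,
]

LINK_BITS_PER_SEC = 100_000_000  # fxp running at 100 Mb/s


def mean_rate_mib(samples):
    """Average of the per-second byte counts, in MiB/s."""
    return sum(samples) / len(samples) / (1024 * 1024)


def link_utilization(samples, link_bps=LINK_BITS_PER_SEC):
    """Average share of the link capacity used, as a fraction of 1."""
    return (sum(samples) / len(samples)) * 8 / link_bps


if __name__ == "__main__":
    print(f"mean output rate: {mean_rate_mib(OUT_BYTES_PER_SEC):.2f} MiB/s")
    print(f"link utilization: {link_utilization(OUT_BYTES_PER_SEC):.0%}")
```

The average lands close to the 6.60 MB/s reported for the transfer, so the interface counters and the measured copy speed agree.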

-- 

Michel TALON


Re: NFS Locking Issue

2006-07-05 Thread Kostik Belousov

On Wed, Jul 05, 2006 at 02:04:59PM +0100, Robert Watson wrote:
> 
> On Wed, 5 Jul 2006, Kostik Belousov wrote:
> 
> >>Also, the both lockd processes now put identification information in the 
> >>proctitle (srv and kern). SIGUSR1 shall be sent to srv process.
> >
> >Hmm, after looking at the dump there and some code reading, I have noted 
> >the following:
> >
> >1. NLM lock request contains the field caller_name. It is filled by (let's 
> >call it) kernel rpc.lockd by the results of hostname(3).
> >
> >2. This caller_name is used by server rpc.lockd to send request for host 
> >monitoring to rpc.statd (see send_granted). Request is made by clnt_call, 
> >that is blocking rpc call.
> >
> >3. rpc.statd does getaddrinfo on caller_name to determine address of the 
> >host to monitor.
> >
> >If the getaddrinfo in step 3 waits for the resolver, then your client machine 
> >will get the locking process stuck in the "lockd" state.
> >
> >Could people experiencing the rpc.lockd mystery at least report whether the 
> >_server_ machine successfully resolves the hostname of clients as reported by 
> >hostname? And, if yes, to what family of IP protocols?
> 
> It's not impossible.  It would be interesting to see if ps axl reports that 
> rpc.lockd is in the kqread state, which would suggest it was blocked in the 
  rpc.statd :).
> resolver.  We probably ought to review rpc.statd and make sure it's 
> generally sensible.  I've noticed that its notification process on start is 
> a bit poorly structured in terms of how it notifies hosts of its state 
> change -- if one host is down, it may take a very long time to notify other 
> hosts.



Re: NFS Locking Issue

2006-07-05 Thread Robert Watson



On Wed, 5 Jul 2006, Kostik Belousov wrote:

Also, the both lockd processes now put identification information in the 
proctitle (srv and kern). SIGUSR1 shall be sent to srv process.


Hmm, after looking at the dump there and some code reading, I have noted the 
following:


1. NLM lock request contains the field caller_name. It is filled by (let's 
call it) kernel rpc.lockd by the results of hostname(3).


2. This caller_name is used by server rpc.lockd to send request for host 
monitoring to rpc.statd (see send_granted). Request is made by clnt_call, 
that is blocking rpc call.


3. rpc.statd does getaddrinfo on caller_name to determine address of the 
host to monitor.


If the getaddrinfo in step 3 waits for the resolver, then your client machine 
will get the locking process stuck in the "lockd" state.


Could people experiencing the rpc.lockd mystery at least report whether the 
_server_ machine successfully resolves the hostname of clients as reported by 
hostname? And, if yes, to what family of IP protocols?
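
One way to answer that question on the server is sketched below in Python. It resolves a name through the system resolver, times the lookup, and lists the address families returned. The function name resolve_families is made up, and the sketch assumes that Python's socket.getaddrinfo uses the same resolver configuration as rpc.statd does; it does not reproduce anything rpc.statd does internally.

```python
import socket
import sys
import time


def resolve_families(name):
    """Resolve `name` with getaddrinfo and return
    (elapsed_seconds, sorted list of address family names)."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        infos = []  # the name did not resolve at all
    elapsed = time.monotonic() - start
    families = sorted(
        {getattr(info[0], "name", str(info[0])) for info in infos}
    )
    return elapsed, families


if __name__ == "__main__":
    # Pass the client's hostname as reported by `hostname` on the client;
    # with no argument, the local hostname is used as a smoke test.
    target = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    elapsed, families = resolve_families(target)
    if families:
        print(f"{target}: {', '.join(families)} in {elapsed:.3f} s")
    else:
        print(f"{target}: did not resolve ({elapsed:.3f} s)")
```

A lookup that takes many seconds, or that returns only AF_INET6 on a kernel built without IPv6, would be worth reporting to the list.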


It's not impossible.  It would be interesting to see if ps axl reports that 
rpc.lockd is in the kqread state, which would suggest it was blocked in the 
resolver.  We probably ought to review rpc.statd and make sure it's generally 
sensible.  I've noticed that its notification process on start is a bit poorly 
structured in terms of how it notifies hosts of its state change -- if one 
host is down, it may take a very long time to notify other hosts.


There are a number of other dubious things about the NLM protocol design (at 
least, from my reading last night). I've also noticed that our rpc.lockd is 
particularly sensitive, on the client side, to locks being released by a 
different process than the process that acquired the lock, which is triggered 
excessively by our new libpidfile in RELENG_6.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: NFS Locking Issue

2006-07-05 Thread Kostik Belousov

On Wed, Jul 05, 2006 at 02:38:22PM +0300, Kostik Belousov wrote:
> On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> > The most significant problem working with rpc.lockd is creating easy to 
> > reproduce test cases.  Not least because they can potentially involve 
> > multiple clients.  If you can help to produce simple test cases to 
> > reproduce the bugs you're seeing, that would be invaluable.
> > 
> 
> > 
> > Reducing complex failure modes to easily reproduced test cases is tricky 
> > also, though.  It requires careful analysis, often with ktrace and 
> > tcpdump/ethereal to work out what's going on, and not a little luck to 
> > perform the reduction of a large trace down to a simple test scenario.  The 
> > first step is to try and figure out what, if any, specific workload results 
> > in a problem.  For example, can you trigger it using work on just one 
> > client against a server, without client<->client interactions?  This makes 
> > tracking and reproduction a lot easier, as multi-client test cases are 
> > really tricky!  Once you've established whether it can be reproduced with a 
> > single client, you have to track down the behavior that triggers it -- 
> > normally, this is done by attempting to narrow down the specific program or 
> > sequence of events that causes the bug to trigger, removing things one at a 
> > time to see what causes the problem to disappear.  This is made more 
> > difficult as lock managers are sensitive to timing, so removing a high load 
> > item from the list, even if it isn't the source of the problem, might cause 
> > it to trigger less frequently.
> 
> I made a patch for rpc.lockd that could make it somewhat easier to obtain
> debug information. The patch is available at
> http://people.freebsd.org/~kib/rpc.lockd-debug.patch
> 
> No functional changes. The patch only adds dumping of the currently held
> locks (as perceived by lockd) on receipt of SIGUSR1. You need to specify
> debug level 2 or 3 to obtain the dump.
> 
> Also, both lockd processes now put identification information
> in the proctitle (srv and kern). SIGUSR1 should be sent to the srv process.

Hmm, after looking at the dump there and some code reading, I have noted
the following:

1. The NLM lock request contains the field caller_name. It is filled in by
(let's call it) the kernel rpc.lockd with the result of hostname(3).

2. This caller_name is used by the server rpc.lockd to send a request
for host monitoring to rpc.statd (see send_granted).
The request is made by clnt_call, which is a blocking RPC call.

3. rpc.statd does getaddrinfo on caller_name to determine the address of the
host to monitor.

If the getaddrinfo in step 3 waits for the resolver, then the locking process
on your client machine will get stuck in the "lockd" state.

Could people experiencing the rpc.lockd mystery at least report whether the
_server_ machine successfully resolves the hostname of the clients, as reported
by hostname? And, if yes, to what family of IP protocols?
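
One way to run this check on the server is sketched below. It is a minimal
Python sketch, not part of rpc.statd: the helper name address_families and the
injectable resolver parameter are assumptions made for illustration, and the
only real library call is socket.getaddrinfo. Substitute the name the client
reports for itself for the sample address.

```python
import socket


def address_families(host, resolver=socket.getaddrinfo):
    """Return the sorted names of the address families `host` resolves to.

    An empty list means the lookup failed. `resolver` is a hypothetical,
    injectable parameter so the function can be exercised without DNS.
    """
    try:
        results = resolver(host, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr).
    return sorted({socket.AddressFamily(r[0]).name for r in results})


if __name__ == "__main__":
    # A numeric address needs no resolver round trip; put the client's
    # hostname here to test real name resolution on the server.
    print(address_families("127.0.0.1"))
```

If the call takes a long time to return for a client's hostname, that would
be consistent with the blocking described in step 3, though it does not prove
it.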



Re: NFS Locking Issue

2006-07-05 Thread Kostik Belousov

On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> The most significant problem working with rpc.lockd is creating easy to 
> reproduce test cases.  Not least because they can potentially involve 
> multiple clients.  If you can help to produce simple test cases to 
> reproduce the bugs you're seeing, that would be invaluable.
> 

> 
> Reducing complex failure modes to easily reproduced test cases is tricky 
> also, though.  It requires careful analysis, often with ktrace and 
> tcpdump/ethereal to work out what's going on, and not a little luck to 
> perform the reduction of a large trace down to a simple test scenario.  The 
> first step is to try and figure out what, if any, specific workload results 
> in a problem.  For example, can you trigger it using work on just one 
> client against a server, without client<->client interactions?  This makes 
> tracking and reproduction a lot easier, as multi-client test cases are 
> really tricky!  Once you've established whether it can be reproduced with a 
> single client, you have to track down the behavior that triggers it -- 
> normally, this is done by attempting to narrow down the specific program or 
> sequence of events that causes the bug to trigger, removing things one at a 
> time to see what causes the problem to disappear.  This is made more 
> difficult as lock managers are sensitive to timing, so removing a high load 
> item from the list, even if it isn't the source of the problem, might cause 
> it to trigger less frequently.

I made a patch for rpc.lockd that could make it somewhat easier to obtain
debug information. The patch is available at
http://people.freebsd.org/~kib/rpc.lockd-debug.patch

No functional changes. The patch only adds dumping of the currently held
locks (as perceived by lockd) on receipt of SIGUSR1. You need to specify
debug level 2 or 3 to obtain the dump.

Also, both lockd processes now put identification information
in the proctitle (srv and kern). SIGUSR1 should be sent to the srv process.
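
The mechanism described here, dumping internal state when a signal arrives,
can be illustrated in a few lines. This is a Python sketch of the general
technique only, not the patch itself (which is C code inside rpc.lockd); the
held_locks table, its entries, and the dump_locks handler are made up for the
example, and it assumes a Unix system where SIGUSR1 exists.

```python
import signal
import sys

# Hypothetical in-memory table standing in for lockd's view of held locks.
held_locks = {"/export/a.db": "client1", "/export/b.db": "client2"}
dumps = []  # every dump is also recorded here so it can be inspected


def dump_locks(signum, frame):
    """Signal handler: write the current lock table to stderr."""
    lines = [f"{path} held by {owner}" for path, owner in sorted(held_locks.items())]
    dumps.append(lines)
    print("\n".join(lines), file=sys.stderr)


signal.signal(signal.SIGUSR1, dump_locks)

# Same effect as running `kill -USR1 <pid>` from a shell against this process.
signal.raise_signal(signal.SIGUSR1)
```

The handler only reads state and writes it out, which matches the "no
functional changes" intent: the dump should not alter what it reports.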



mountd changed?

2006-07-05 Thread Danny Braniss

something has changed wrt nmount(2)/mountd(8)/exports(5):

> cat /etc/exports
/h -alldirs -network 132.65.0.0 -mask 255.255.0.0
> cat /etc/fstab
/dev/da1s1d /h  ufs rw  1 1

and all is fine, the filesystem is exported and accessible.

# /etc/rc.d/mountd reload
Reloading mountd config files.

but /var/log/messages:
mountd[473]: can't change attributes for /h
mountd[473]: bad exports list line /h -alldirs -network 132.65.0.0 -mask 255.255.0.0

BTW, nothing has changed in the /etc/exports file.
Second, the root (/) is mounted read-only over NFS,
and now any attempt to mount is denied.
Just in case: kern.securelevel: -1

danny




Re: NFS Locking Issue

2006-07-05 Thread Chris H.


Quoting Michel Talon <[EMAIL PROTECTED]>:


So it would appear that you cured the NFS problems inherent with FBSD-6
by replacing FBSD with Fedora Linux. Nice to know that NFSd works in Linux.
But won't help those on the FBSD list fix their FBSD-6 boxen. :/



First, NFS is designed to make machines of different OSs interact properly.

Yes, this is its purpose.

If a FreeBSD server interacts properly with a FreeBSD client, but not other
clients, you cannot say that the situation is fine.

Indeed.

Second, I am not the one who chooses the NFS server; there are people working
in social groups, in the real world.

And third, most important, the OP's message seemed to imply that the
FreeBSD-6 NFS client was at fault. I pointed out that in my experience my
FreeBSD-6.1 client works OK, while the 6.0 one doesn't, when interacting with
an FC5 server. This is in itself a relevant piece of information for the
problem at hand. It may be that the server side is at fault, or some complex
interaction between client and server.

Of course. I quite agree. Horrible oversight on my part.


Anyway, some people claimed here that they had no problems with FreeBSD-5
clients and servers. My experience is that I had constant problems
between FreeBSD-5 clients and Fedora Core 3 servers. I cannot provide any
other data point. I am not particularly sure of the quality of the FC3 or
FC5 NFS server implementation, except that the ~100 workstations
running the similar Fedora distribution work like a charm with their home
directories NFS-mounted on the server. On the other hand, a Debian client
machine also has severe NFS problems. My only conclusion is that these NFS
stories are very tricky. The only time everything worked fine was when we
were running Solaris on the server.

Useful knowledge, to be sure.
Sorry for my oversight. I should probably refrain from responding when I
have too many other things percolating in my mind while at work. This
has gotten me in trouble once before on this _same_ list. :)

Thank you for your thoughtful response.




--

Michel TALON






--
panic: kernel trap (ignored)



-
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006
/




Re: NFS Locking Issue

2006-07-05 Thread Robert Watson


On Wed, 5 Jul 2006, Danny Braniss wrote:

In my case our main servers are NetApp, and the problems are more related to 
am-utils running into some race condition (need more time to debug this :-). 
The other problem is related to throughput: FreeBSD is slower than Linux, 
and while NFS over TCP is faster than over UDP on FreeBSD, on Linux the two 
are the same. So it seems some tuning is needed.


Our main problem now is samba/rpc.lockd: we are stuck with a server running 
FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't 
work.


So, if someone is willing to look into the lockd issue, we would like to 
help.


The most significant problem working with rpc.lockd is creating 
easy-to-reproduce test cases, not least because they can potentially involve 
multiple clients.  If you can help to produce simple test cases to reproduce 
the bugs you're seeing, that would be invaluable.


I'm aware of two general classes of problems with rpc.lockd.  First, 
architectural issues, some derived from architectural problems in the NLM 
protocol: for example, assumptions that there can be a clean mapping of 
process lock owners to locks, which fall down as locks are properties of file 
descriptors that can be inherited.  Second, implementation bugs/misfeatures, 
such as the kernel not knowing how to cancel lock requests, so being unable to 
implement interruptible waits on locks in the distributed case.
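
The point about locks belonging to file descriptors rather than processes can 
be seen locally, without NFS.  What follows is a minimal Python sketch, 
assuming a Unix system with flock(2) semantics and a local filesystem; the 
function name and the child_unlocks switch are illustrative only.  The parent 
takes the lock, a forked child holding the inherited descriptor optionally 
releases it, and an independent open of the same file then probes whether the 
lock is still held.

```python
import fcntl
import os
import tempfile


def lock_free_after_child(path, child_unlocks):
    """Return True if the parent's lock is gone once the forked child exits."""
    holder = open(path, "a")
    fcntl.flock(holder, fcntl.LOCK_EX)          # parent acquires the lock
    pid = os.fork()
    if pid == 0:
        if child_unlocks:
            fcntl.flock(holder, fcntl.LOCK_UN)  # child releases via inherited fd
        os._exit(0)
    os.waitpid(pid, 0)
    probe = open(path, "a")                     # separate open file description
    try:
        fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True                             # nobody holds the lock any more
    except BlockingIOError:
        return False                            # parent's lock is still in place
    finally:
        probe.close()
        holder.close()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(lock_free_after_child(tmp.name, child_unlocks=True))
        print(lock_free_after_child(tmp.name, child_unlocks=False))
```

The parent never unlocks in either run, so any difference between the two 
results comes from the child alone, which is exactly the case a 
per-process owner mapping cannot represent.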


Reducing complex failure modes to easily reproduced test cases is tricky also, 
though.  It requires careful analysis, often with ktrace and tcpdump/ethereal 
to work out what's going on, and not a little luck to perform the reduction of 
a large trace down to a simple test scenario.  The first step is to try and 
figure out what, if any, specific workload results in a problem.  For example, 
can you trigger it using work on just one client against a server, without 
client<->client interactions?  This makes tracking and reproduction a lot 
easier, as multi-client test cases are really tricky!  Once you've established 
whether it can be reproduced with a single client, you have to track down the 
behavior that triggers it -- normally, this is done by attempting to narrow 
down the specific program or sequence of events that causes the bug to 
trigger, removing things one at a time to see what causes the problem to 
disappear.  This is made more difficult as lock managers are sensitive to 
timing, so removing a high load item from the list, even if it isn't the 
source of the problem, might cause it to trigger less frequently.
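
The "remove things one at a time" step can be written down as a small loop. 
This is a Python sketch of that reduction strategy under stated assumptions: 
reduce_workload and the triggers_bug predicate are hypothetical names, and the 
predicate (supplied by whoever runs the test) is expected to replay the given 
steps and report whether the problem appeared.

```python
def reduce_workload(steps, triggers_bug):
    """Drop steps one at a time, keeping each removal that still reproduces.

    `steps` is a list of workload items; `triggers_bug(steps)` is a
    caller-supplied (hypothetical) predicate that replays them and returns
    True when the problem shows up.
    """
    current = list(steps)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if triggers_bug(candidate):
            current = candidate   # step i was not needed; retry the same index
        else:
            i += 1                # step i is required; keep it and move on
    return current


def example_predicate(steps):
    # Stand-in for a real test run: the "bug" needs a lock followed later
    # by an unlock from a child process.
    return ("lock" in steps and "unlock-in-child" in steps
            and steps.index("lock") < steps.index("unlock-in-child"))


if __name__ == "__main__":
    workload = ["mount", "open", "lock", "fork", "unlock-in-child", "stat"]
    print(reduce_workload(workload, example_predicate))
```

Because of the timing sensitivity mentioned above, a real predicate would 
probably need to repeat each run several times before concluding that a 
reduced workload no longer triggers the problem.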


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: NFS Locking Issue

2006-07-05 Thread Oliver Brandmueller

Mornin'

On Tue, Jul 04, 2006 at 09:47:21PM +0100, Robert Watson wrote:
> BTW, I noticed yesterday that the IPv6 support commit to rpc.lockd was 
> never backed out.  An immediate question for people experiencing new 
> rpc.lockd problems with 6.x should be whether or not backing out that 
> change helps.

That could be a good pointer. I also started experiencing some problems 
at home (I did not investigate further, but started using local 
locking and all was fine), while in our prod setup, where lots of 
machines are running, many of them on a fairly recent 6-STABLE, 
I never experienced any problems with NFS. The main difference between 
these two networks is that at home I have an IPv6 environment, while 
at work it's IPv4 only.

I will barely find time to do tests before the weekend, but if I don't read 
any postings by then saying that this made a difference, I will start 
testing at home.

Thanx, Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   Ich bin das Internet. Sowahr ich Gott helfe.   |
| Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |


novell mount losing state

2006-07-05 Thread m . ehinger


Hello,

I am using FreeBSD 6.1-STABLE and tried to mount a Novell volume (mount_nwfs). 
Mounting the volume works without problems, but after some time of inactivity 
on that mount I have to remount the volume to get access again.



Syslog message:

  Jul  5 08:51:08 pcmcb3-104 kernel: ncprq: Restoring connection, flags = 101



Output of "ncplist c" working mount (yesterday evening)
  Active NCP connections:
   refid server:user(connid), owner:group(mode), refs, 
   7 SERVER:USER(483), root:wheel(755), 1, 


Output of "ncplist c" non working mount (today morning)

  Active NCP connections:
   refid server:user(connid), owner:group(mode), refs, 
   7 SERVER:USER(397), root:wheel(755), 1, <>



If I use a cronjob to access the mount periodically there is no such problem!
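
The cron workaround amounts to touching the mount at intervals. A small Python
sketch of such a keepalive check follows; the function name touch_mount and the
path /mnt/novell in the comment are placeholders, and whether a plain directory
listing is enough to keep the NCP connection alive is an assumption, not
something verified here.

```python
import os


def touch_mount(path):
    """Access `path` once; return True if the directory could be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        # Covers a missing path as well as a mount that no longer responds.
        return False


# Intended use from a periodically run script, with the real mount point
# substituted for the hypothetical one:
#     ok = touch_mount("/mnt/novell")
```

A False result from a scheduled run would at least narrow down when the
connection was lost.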

Any hints?

If this is the wrong list, please let me know.

If you need more info, just ask.

Thanks in advance

Maik
