Re: KBI unexpected change in stable/11 ?

2018-03-28 Thread Greg Byshenk

On Wed, Mar 28, 2018 at 03:11:50PM +0100, tech-lists wrote:
> On 28/03/2018 14:39, Gregory Byshenk wrote:
> > You can do this manually, or by adding a PORTS_MODULES line to
> > /etc/make.conf. This will rebuild the listed modules from ports
> > when you build a new kernel.
> 
> Are you sure it's in /etc/make.conf and not /etc/src.conf?

No. But it is in the man page for make.conf and not src.conf.
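
As a rough sketch of what such an entry could look like: the helper below appends a PORTS_MODULES line to a make.conf-style file. The helper name and the port origin used in the usage note are assumptions for illustration only; neither comes from this thread.

```sh
#!/bin/sh
# add_ports_module FILE ORIGIN
# Append "PORTS_MODULES+=ORIGIN" to FILE unless that exact line is present.
# On a real system FILE would be /etc/make.conf (see make.conf(5));
# pointing it at a scratch copy first is the safer way to try this out.
add_ports_module() {
    conf=$1
    origin=$2
    line="PORTS_MODULES+=${origin}"
    touch "$conf"
    # -x: match whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}
```

For example, `add_ports_module /tmp/make.conf.test emulators/virtualbox-ose-kmod` (a hypothetical port choice) leaves a single PORTS_MODULES line in the scratch file, however often it is run.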

-- 
gregory byshenk  -  gbysh...@byshenk.net  -  Leiden, NL
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: NFS and amd on older FreeBSD

2017-01-12 Thread Greg Byshenk

On Wed, Jan 11, 2017 at 03:47:37PM -0800, Karl Young wrote:
> I inherited a lab that has a few hundred hosts running FreeBSD 7.2.
> These hosts run test scripts that access files that are stored on
> a FreeBSD 6.3 host.  The 6.3 host exports a /data directory with NFS
> 
> [...]
>
> $ showmount -e  9.3-host
> Exports list on 9.3-host:
> /data   Everyone
> 
> But I can't automount it:
> 
> $ ls -l /net/9.3-host/data
> ls: /net/9.3-host/data: No such file or directory
> 
> If I manually mount the exported directory, it works:
> 
> $ sudo mount -t nfs 9.3-host:/data /mnt/data/
> $ mount | grep nfs
> 9.3-host:/data on /mnt/data (nfs)
> 
> $ ls -l /mnt/data
> total 4
> drwxr-xr-x  9 root  wheel  512 Dec 20 17:41 iaf2
> 
> I've spent some time on Google, but haven't found a solution.  I realize
> these are very old versions, but I'm not in a position to upgrade them
> right now.  My last resort will be to use /etc/fstab to do the NFS
> mount, but I'd rather avoid that if I can.

If you can mount the share manually, there is almost 
certainly nothing wrong with the server. Based on the
error ("No such file or directory"), I would recommend
checking your amd config on the client.
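
A minimal sketch of one way to start that check: list the amd-related settings from the client's rc.conf. The function name is made up for this example, and the default path assumes a stock FreeBSD layout.

```sh
#!/bin/sh
# show_amd_settings [FILE]
# Print the amd-related variables from an rc.conf-style file.
# FILE defaults to /etc/rc.conf, which is an assumption about the client.
show_amd_settings() {
    grep -E '^amd_(enable|flags|map_program)=' "${1:-/etc/rc.conf}"
}
```

If amd is enabled and its flags name a map for /net, running amq(8) on the client should then show which mount points amd is actually serving.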


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: ZFS Panic after freebsd-update

2013-07-02 Thread Greg Byshenk

On Tue, Jul 02, 2013 at 12:57:16AM -0700, Jeremy Chadwick wrote:

> But in the OP's case, the situation sounds dire given the limitations --
> limitations that someone (apparently not him) chose, which greatly
> hinder debugging/troubleshooting.  Had a heterogeneous setup been
> chosen, the debugging/troubleshooting pains are less (IMO).  When I see
> this, it makes me step back and ponder the decisions that led to the
> ZFS-only setup.

As an observer (though one who has used ZFS for some time, now),
I might suggest that this can at least -seem- like FUD about ZFS
because the "limitations" don't necessarily have anything to do
with ZFS. That is, a situation in which one cannot recover, nor
even effectively troubleshoot, if there is a problem, will be a
"dire" one, regardless of what the problem might be or where its
source might lie.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL - Portland, OR USA

Re: svn - but smaller?

2013-01-25 Thread Greg Byshenk

On Fri, Jan 25, 2013 at 03:12:03PM +0200, Daniel Kalchev wrote:
[...]
> It is absurd to require the installation of any port, if your only 
> intention is to update the base system sources.

I think others have already pointed this out, but
"if your only intention is to update the base system
sources", then 'freebsd-update' (from the base system)
will do the job.
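
Whether freebsd-update touches the sources depends on its configuration, so here is a small sketch that checks for the "src" component. The function name is invented for this example, and the default path assumes the standard /etc/freebsd-update.conf.

```sh
#!/bin/sh
# updates_src [FILE]
# Succeed (exit 0) if the Components line of a freebsd-update.conf-style
# file lists "src"; fail otherwise.
updates_src() {
    awk '$1 == "Components" {
             for (i = 2; i <= NF; i++)
                 if ($i == "src") found = 1
         }
         END { exit !found }' "${1:-/etc/freebsd-update.conf}"
}
```

With "src" present, the usual `freebsd-update fetch` followed by `freebsd-update install` should also refresh /usr/src for the release being tracked.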

Or am I missing/misunderstanding something?

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL - Portland, OR USA

Re: fsck_ufs running too often

2012-06-23 Thread Greg Byshenk

On Sat, Jun 23, 2012 at 06:23:58PM +1000, Sean wrote:
> On 23/06/2012, at 7:47 AM, Leonardo M. Ramé wrote:
 
> > Hi, for the last few days I have noticed that my home server turns
> > very slow more than once a day. Every time I run "top" to see what
> > processes are running, I can see fsck_ufs at the very top, and the
> > hard drive working like mad.
> > 
> > I've checked my crontab and there's nothing related to fsck_ufs.
> > Where can I start searching for the cause of the problem? I
> > thought this process should run only at boot or shutdown, but this
> > time it is running, apparently, without a cause.
 
> Background fsck. Your server crashed, rebooted, started up and fsck
> is running in the background while everything else continues.
> 
> [...]
> 
> The more important thing is to find out why it crashed - if there
> was a power outage, hardware or software issue.

Another thing to do is look in the logs to see if background fsck
is failing for some reason. I've seen it happen in some cases that
background fsck fails and asks for a manual run, in which case the
filesystem remains dirty, and further reboots will continue to fail
until a manual fsck is run.
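
A minimal sketch of the log check, assuming the messages end up in /var/log/messages (the default path and the function name are assumptions, not something stated in this thread):

```sh
#!/bin/sh
# fsck_messages [LOGFILE]
# Print every log line that mentions fsck, case-insensitively.
# LOGFILE defaults to /var/log/messages, which may differ per setup.
fsck_messages() {
    grep -i 'fsck' "${1:-/var/log/messages}"
}
```

If the output shows a request to run fsck manually, the filesystem has to be checked in the foreground (typically from single-user mode) before the background runs will stop recurring.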


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL - Portland, OR USA

Re: Experience with Intel SATA and fbsd 8.3-amd64 ?

2012-06-15 Thread Greg Byshenk

On Fri, Jun 15, 2012 at 07:20:28PM +0200, Kurt Jaeger wrote:
> > Kurt Jaeger  wrote:

> >  > I have a problem with some host: If I put heavy IO load on that
> >  > system, write errors happen, and then it crashes.
> > 
> > What kind of write errors, exactly?  What messages do you
> > get on the console?
> 
> g_vfs_done():ada0s1f[WRITE(offset=50699862016, length=16384)]error = 22
> g_vfs_done():ada0s1f[WRITE(offset=50699862016, length=16384)]error = 22
> g_vfs_done():ada0s1e[WRITE(offset=44693307392, length=16384)]error = 22
> g_vfs_done():ada0s1e[WRITE(offset=44693211136, length=2048)]error = 5
> 
> > It's also worth mentioning that such problems could also
> > be caused by bad RAM, or even by the power supply (though
> > the latter is unlikely in this case, I think).
> 
> Well, the device was probably a bit on the cheap side (ALLNET FW9000).

Could it be a device problem? I've seen that type of error
(including a crash in the end) when a device can't handle DMA.
Disabling DMA solved the problem for me.


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL - Portland, OR USA

Re: Strange 'hangs' with RELENG_9

2012-01-19 Thread Greg Byshenk

On Thu, Jan 19, 2012 at 04:00:24PM +0100, László KÁROLYI wrote:
 
> Moreover, I couldn't set SCHED_BSD in the kernel config, it said that
> it's an illegal option. Maybe it does not exist in RELENG_9.

This should be

options SCHED_4BSD
              ^
if you want to try it.

It can be used with RELENG_9; check the NOTES file.
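
A small sketch for catching this kind of typo before a kernel build: it scans a kernel config file for scheduler options and flags anything other than the two names found in NOTES, SCHED_4BSD and SCHED_ULE. The function name is invented for this example.

```sh
#!/bin/sh
# check_scheduler KERNCONF
# Report each "options SCHED_*" line as known or unknown.
check_scheduler() {
    awk '$1 == "options" && $2 ~ /^SCHED_/ {
             if ($2 == "SCHED_4BSD" || $2 == "SCHED_ULE")
                 print "ok: " $2
             else
                 print "unknown: " $2
         }' "$1"
}
```

Run against a config containing `options SCHED_BSD`, it reports the name as unknown instead of leaving config(8) to reject it later.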


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-24 Thread Greg Byshenk

On Mon, Aug 22, 2011 at 11:59:11AM +0100, David Wood wrote:

> In message <20110822094756.gj92...@core.byshenk.net>, Greg Byshenk 
>  writes
> >It doesn't seem to matter; both cuau?.lock and cuau?.init produce the
> >error (for both ports), and cuau? itself remains a no-op.
> 
> You could try
> hint.uart.2.baud="115200"
> 
> in /boot/device.hints - making the relevant changes to port number and 
> speed according to your needs.

This does not help; speed remains set to 9600.

> >Now that I can see that the card is working (at least minimally), it
> >begins to look as if there might be a problem somewhere in 9.x. I'll
> >try to install 8.x and see if the results are different.
> 
> It will be interesting to see if there is a difference between 8.x and 
> 9.x.

Yes, there is.

Using 8-STABLE (with sources from 17 August 2011) and inbuilt puc,
the controller works as expected. It defaults to 9600, but setting
the speed on the cuaa?.lock and cuaa?.init devices works.

Interestingly, setting the speed in device.hints does _not_ work.
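
For reference, the lock/init sequence from David Wood's earlier message in this thread can be wrapped in a function roughly as below. This is only a sketch: the function name and the DRY_RUN switch are additions for illustration, and the redirect to /dev/null from the original one-liner is left out.

```sh
#!/bin/sh
# set_port_speed PORT OPTIONS...
# Apply OPTIONS to PORT.init between the two .lock calls, then show the
# resulting settings of PORT itself, following the sequence quoted in
# the thread. With DRY_RUN set to a non-empty value the commands are
# only printed, not executed.
set_port_speed() {
    port=$1
    shift
    run=${DRY_RUN:+echo}
    $run stty -f "${port}.lock" 0
    $run stty -f "${port}.init" "$@"
    $run stty -f "${port}.lock" 1
    $run stty -f "${port}"
}
```

Calling `DRY_RUN=1 set_port_speed /dev/cuau2 speed 115200 crtscts` prints the four stty commands, which makes it easy to compare what is attempted on 8.x and 9.x.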

So, it appears that there is something wrong (or at least different)
with 9.x

Doing some poking around, I see that, in 9.x, termios.h is not
included in dev/uart/uart_core.c and dev/uart/uart_tty.c, whereas
it is included under 8.x.

If I look at the 8.x .c files, they want 

#include 

... which appears to no longer be used. But adding either that,
or

#include 

... produces errors:

/usr/src/sys/dev/uart/uart_core.c:47:21: error: termios.h: No such file or directory
/usr/src/sys/dev/uart/uart_tty.c:42:21: error: termios.h: No such file or directory
mkdep: compile failed
*** Error code 1

Though a fresh build of world seems to produce termios.h:

# find /usr/obj/ |grep termios.h
/usr/obj/usr/src/lib32/usr/include/sys/termios.h
/usr/obj/usr/src/lib32/usr/include/sys/_termios.h
/usr/obj/usr/src/lib32/usr/include/termios.h
/usr/obj/usr/src/tmp/usr/include/termios.h
/usr/obj/usr/src/tmp/usr/include/sys/termios.h
/usr/obj/usr/src/tmp/usr/include/sys/_termios.h
# 

But I may be completely confused here, as I don't pretend to be
familiar with all of the details of the build process.

Does this look like a bug with 9.x, or something that should be
done differently?

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-22 Thread Greg Byshenk

On Mon, Aug 22, 2011 at 10:23:14AM +0100, David Wood wrote:
 
> In message <20110822083336.gi92...@core.byshenk.net>, Greg Byshenk 
>  writes
> >On Mon, Aug 22, 2011 at 12:20:33AM +0200, Greg Byshenk wrote:
> >>puc0:  mem 
> >>0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 
> >>30 at device 0.0 on pci4
> >>puc0: 2 UARTs detected
> >>uart2: <16950 or compatible> at port 1 on puc0
> >>uart3: <16950 or compatible> at port 2 on puc0
> 
> This indicates that the puc(4) code is working correctly - it recognises 
> the board, reads via one of the BARs to confirm there are two UARTs, 
> initialises both UARTs to 16950 mode, then hands off these ports to 
> uart(4).
> 
> >>I'll follow up tomorrow. Thanks.
> >
> >Following up:
> >
> >It appears that indeed, the "options COM_MULTIPORT" is unnecessary
> >for 9-BETA; I've rebuilt the kernel without it, and the card is
> >still recognized, along with the ports.
> 
> That's what I expected. The only line needed is "device puc". I have no 
> idea why this can't be included in GENERIC, especially as puc(4) doesn't 
> work as a module (no drivers are attached to the ports on the puc 
> board).
> 
> 
> >But all is not as it should be. I still can't set the speed on the
> >card.
> >
> >> # stty -f /dev/cuau2.init speed 115200 crtscts
> >> stty: /dev/cuau2.init isn't a terminal
> >> #
> >
> >And setting speed on the device itself remains a no-op:
> >
> >  # stty -f /dev/cuau2 speed 115200 crtscts
> >  9600
> >  #
> >
> >That said, the card -does- seem to work, at least at some level.
> >With the speed issue pointed out, I set the connection on the
> >other end to 9600, and then it works. But I'd really like it to
> >be faster than that (it's just a serial console, so we could
> >probably live with 9600, though we wouldn't like it).
> >
> >If there is reason to think that this could be a 9.x issue,
> >then I could try going to 8.x.
> 
> My earlier instructions omitted mention of the lock, which is really 
> needed if you want to force a particular speed
> 
> 
> On 8.2:
> 
> [root@manganese ~]# PORT='/dev/cuau5' ; OPTIONS='speed 115200 crtscts' ; 
> stty -f ${PORT}.lock 0 ; stty -f ${PORT}.init ${OPTIONS} > /dev/null ; 
> stty -f ${PORT}.lock 1 ; stty -f ${PORT}
> speed 115200 baud;
> lflags: echoe echoke echoctl
> oflags: tab0
> cflags: cs8 -parenb crtscts
> [root@manganese ~]# cu -l cuau5
> Connected
> ATI4
> U.S. Robotics 56K FAX EXT Settings...
> 
>B0  E1  F1  L2  M1  Q0  V1  X4  Y1
>SPEED=115200  PARITY=N  WORDLEN=8
>DIAL=TONEOFF LINE   CID=1
> 
>&A3  &B1  &C1  &D2  &H2  &I2  &K1
>&M4  &N0  &R1  &S0  &T5  &U0  &Y1
> 
>S00=000  S01=000  S02=043  S03=013  S04=010  S05=008  S06=004
>S07=060  S08=002  S09=006  S10=014  S11=072  S12=050  S13=000
>S15=000  S16=000  S18=000  S19=000  S21=010  S22=017  S23=019
>S25=005  S27=001  S28=008  S29=020  S30=000  S31=128  S32=002
>S33=000  S34=000  S35=000  S36=014  S38=000  S39=012  S40=000
>S41=004  S42=000
> 
>LAST DIALLED #:
> 
> OK
> ~
> [EOT]
> [root@manganese ~]# PORT='/dev/cuau5' ; OPTIONS='speed 38400 crtscts' ; 
> stty -f ${PORT}.lock 0 ; stty -f ${PORT}.init ${OPTIONS} > /dev/null ; 
> stty -f ${PORT}.lock 1 ; stty -f ${PORT}
> speed 38400 baud;
> lflags: echoe echoke echoctl
> oflags: tab0
> cflags: cs8 -parenb crtscts
> [root@manganese ~]# cu -l cuau5
> Connected
> ATI4
> U.S. Robotics 56K FAX EXT Settings...
> 
>B0  E1  F1  L2  M1  Q0  V1  X4  Y1
>SPEED=38400  PARITY=N  WORDLEN=8
>DIAL=TONEOFF LINE   CID=1
> 
>&A3  &B1  &C1  &D2  &H2  &I2  &K1
>&M4  &N0  &R1  &S0  &T5  &U0  &Y1
> 
>S00=000  S01=000  S02=043  S03=013  S04=010  S05=008  S06=004
>S07=060  S08=002  S09=006  S10=014  S11=072  S12=050  S13=000
>S15=000  S16=000  S18=000  S19=000  S21=010  S22=017  S23=019
>S25=005  S27=001  S28=008  S29=020  S30=000  S31=128  S32=002
>S33=000  S34=000  S35=000  S36=014  S38=000  S39=012  S40=000
>S41=004  S42=000
> 
>LAST DIALLED #:
> 
> OK
> ~
> [EOT]
> 
> 
> This is one of my OXPCIe954 ports - the modem on that port identifies 
> the speed it is being talked to in the ATI4 output.
> 
> If this is a 9.x issue, it seems more likely

Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-22 Thread Greg Byshenk

On Mon, Aug 22, 2011 at 12:20:33AM +0200, Greg Byshenk wrote:
> On Sun, Aug 21, 2011 at 09:44:41PM +0100, David Wood wrote:
>  
> > I wrote and contributed the support code for the OXPCIe95x serial chips 
> > - and just happened to notice your report.
> 
> Thanks for the response.
> 
> 
> > In message <20110821154249.ge92...@core.byshenk.net>, Greg Byshenk 
> >  writes
> > >I'm having a problem with a StarTech PEX2S952 dual-port serial
> > >card.
> > >
> > >I believe that it should be supported, as it has this entry in
> > >pucdata.c
> > >
> > >[...]
> > >   {   0x1415, 0xc158, 0x, 0,
> > >   "Oxford Semiconductor OXPCIe952 UARTs",
> > >   DEFAULT_RCLK * 0x22,
> > >   PUC_PORT_NONSTANDARD, 0x10, 0, -1,
> > >   .config_function = puc_config_oxford_pcie
> > >   },
> > >[...]
> > 
> > It should be supported. The OXPCIe952 is more awkward to support than 
> > the OXPCIe954 and OXPCIe958 because it can be configured in so many 
> > > different ways by the board manufacturer. However, 0xc158 is a
> > > configuration that is identical in arrangement to the larger chips, so
> > is the configuration I'm most confident of. I've just double-checked the 
> > data sheets, and can't see any relevant differences between 0xc158 
> > OXPCIe952 and the OXPCIe954 I tested the code with.
> > 
> > I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports 
> > from other OXPCIe954 and OXPCIe958 board users (including someone with a 
> > 16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x 
> > on my hardware.
> > 
> > 
> > >And, while it is recognized at boot -- after adding
> > >
> > >  device  puc
> > >  options COM_MULTIPORT
> > 
> > I'm 99% certain that "options COM_MULTIPORT" relates to the old sio(4) 
> > code - I certainly don't need it on 8.x. Does it make any difference if 
> > you delete that line and just leave "device puc"?
> 
> I will rebuild my kernel and try.
>  
>  
> > >to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
> > >and '/dev/cuau3' show up, and I can connect to them, but they don't
> > >seem to pass any traffic. If I connect to the serial console of
> > >another machine (one that I know for certain is working), I get
> > >nothing at all.
> > 
> > Have you remembered to set the speed (and other relevant options) on the 
> > .init devices? This is a feature (or is it a quirk) of the uart(4) 
> > driver that catches many people out. Setting options on the base device 
> > is normally a no-op.
> > 
> > For example, if the remote device on /dev/cuau2 operates at 115200 bps 
> > with hardware handshaking, try:
> > 
> > stty -f /dev/cuau2.init speed 115200 crtscts
> 
> Interestingly, it -is- a no-op on the device, which I hadn't noticed.
> But trying to set it on the .init fails:
> 
>   # stty -f /dev/cuau2.init speed 115200 crtscts
>   stty: /dev/cuau2.init isn't a terminal
>   # 
> 
>  
> > One frustrating aspect of adding puc(4) support for many devices is that 
> > you can't be certain of the clock rate multiplier - the same device can 
> > crop up on a different manufacturer's board with a different multiplier. 
> > This problem doesn't occur with the OXPCIe95x devices as they derive 
> > their 62.5MHz UART clock from the PCI Express clock. Consequently, the 
> > > problem can't be that your board is inadvertently operating the UARTs at
> > the wrong speed.
> > 
> > 
> > >I suspect (?) that it may not be recognized as the proper card. Boot
> > >and pciconf messages are:
> > >
> > >puc0:  mem 
> > >0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 
> > >30 at device 0.0 on pci4
> > 
> > That is correct. Are there any more lines afterwards - especially one 
> > giving the number of UARTs detected? That line is crucial, as, on these 
> > chips, the number of UARTs has to be read from configuration space 
> > because you can slave two chips together.
> > 
> > My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):
> > 
> > puc0:  mem 
> > 0xd5efc000-0xd5ef,0xd5c0-0xd5df,0xd5a0-0xd5bf irq 18 
> > at device 0.0 on pci8
> > puc0: 4 UARTs detected
> > puc0: [FILTER]
> > uart2: <16

Re: Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread Greg Byshenk

On Sun, Aug 21, 2011 at 09:44:41PM +0100, David Wood wrote:
 
> I wrote and contributed the support code for the OXPCIe95x serial chips 
> - and just happened to notice your report.

Thanks for the response.


> In message <20110821154249.ge92...@core.byshenk.net>, Greg Byshenk 
>  writes
> >I'm having a problem with a StarTech PEX2S952 dual-port serial
> >card.
> >
> >I believe that it should be supported, as it has this entry in
> >pucdata.c
> >
> >[...]
> >   {   0x1415, 0xc158, 0x, 0,
> >   "Oxford Semiconductor OXPCIe952 UARTs",
> >   DEFAULT_RCLK * 0x22,
> >   PUC_PORT_NONSTANDARD, 0x10, 0, -1,
> >   .config_function = puc_config_oxford_pcie
> >   },
> >[...]
> 
> It should be supported. The OXPCIe952 is more awkward to support than 
> the OXPCIe954 and OXPCIe958 because it can be configured in so many 
> different ways by the board manufacturer. However, 0xc158 is a
> configuration that is identical in arrangement to the larger chips, so
> is the configuration I'm most confident of. I've just double-checked the 
> data sheets, and can't see any relevant differences between 0xc158 
> OXPCIe952 and the OXPCIe954 I tested the code with.
> 
> I use my OXPCIe954 board on FreeBSD 8.2, and have had success reports 
> from other OXPCIe954 and OXPCIe958 board users (including someone with a 
> 16 port board based on dual OXPCIe958s). I have yet to try FreeBSD 9.x 
> on my hardware.
> 
> 
> >And, while it is recognized at boot -- after adding
> >
> >  device  puc
> >  options COM_MULTIPORT
> 
> I'm 99% certain that "options COM_MULTIPORT" relates to the old sio(4) 
> code - I certainly don't need it on 8.x. Does it make any difference if 
> you delete that line and just leave "device puc"?

I will rebuild my kernel and try.
 
 
> >to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
> >and '/dev/cuau3' show up, and I can connect to them, but they don't
> >seem to pass any traffic. If I connect to the serial console of
> >another machine (one that I know for certain is working), I get
> >nothing at all.
> 
> Have you remembered to set the speed (and other relevant options) on the 
> .init devices? This is a feature (or is it a quirk) of the uart(4) 
> driver that catches many people out. Setting options on the base device 
> is normally a no-op.
> 
> For example, if the remote device on /dev/cuau2 operates at 115200 bps 
> with hardware handshaking, try:
> 
> stty -f /dev/cuau2.init speed 115200 crtscts

Interestingly, it -is- a no-op on the device, which I hadn't noticed.
But trying to set it on the .init fails:

# stty -f /dev/cuau2.init speed 115200 crtscts
stty: /dev/cuau2.init isn't a terminal
# 

 
> One frustrating aspect of adding puc(4) support for many devices is that 
> you can't be certain of the clock rate multiplier - the same device can 
> crop up on a different manufacturer's board with a different multiplier. 
> This problem doesn't occur with the OXPCIe95x devices as they derive 
> their 62.5MHz UART clock from the PCI Express clock. Consequently, the 
> problem can't be that your board is inadvertently operating the UARTs at
> the wrong speed.
> 
> 
> >I suspect (?) that it may not be recognized as the proper card. Boot
> >and pciconf messages are:
> >
> >puc0:  mem 
> >0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 
> >30 at device 0.0 on pci4
> 
> That is correct. Are there any more lines afterwards - especially one 
> giving the number of UARTs detected? That line is crucial, as, on these 
> chips, the number of UARTs has to be read from configuration space 
> because you can slave two chips together.
> 
> My OXPCIe954 board is recognised thus (FreeBSD 8.2 amd64):
> 
> puc0:  mem 
> 0xd5efc000-0xd5ef,0xd5c0-0xd5df,0xd5a0-0xd5bf irq 18 
> at device 0.0 on pci8
> puc0: 4 UARTs detected
> puc0: [FILTER]
> uart2: <16950 or compatible> on puc0
> uart2: [FILTER]
> uart3: <16950 or compatible> on puc0
> uart3: [FILTER]
> uart4: <16950 or compatible> on puc0
> uart4: [FILTER]
> uart5: <16950 or compatible> on puc0
> uart5: [FILTER]

puc0:  mem 
0xf9dfc000-0xf9df,0xfa00-0xfa1f,0xf9e0-0xf9ff irq 30 at 
device 0.0 on pci4
puc0: 2 UARTs detected
uart2: <16950 or compatible> at port 1 on puc0
uart3: <16950 or compatible> at port 2 on puc0

 
> >puc0@pci0:4:0:0:class=0x070002 card=0xc1581415 chip=0xc15

Serial multiport error Oxford/Startech PEX2S952

2011-08-21 Thread Greg Byshenk

Not sure if -stable is the right place for this, but I'll give it
a shot; if it's not, then a pointer in the right direction would
be much appreciated.

I'm having a problem with a StarTech PEX2S952 dual-port serial
card.

I believe that it should be supported, as it has this entry in
pucdata.c

[...]
{   0x1415, 0xc158, 0x, 0,
"Oxford Semiconductor OXPCIe952 UARTs",
DEFAULT_RCLK * 0x22,
PUC_PORT_NONSTANDARD, 0x10, 0, -1,
.config_function = puc_config_oxford_pcie
},
[...]

And, while it is recognized at boot -- after adding

device  puc
options COM_MULTIPORT

to my kernel, it doesn't seem to be working. The devices '/dev/cuau2'
and '/dev/cuau3' show up, and I can connect to them, but they don't
seem to pass any traffic. If I connect to the serial console of
another machine (one that I know for certain is working), I get 
nothing at all.

I suspect (?) that it may not be recognized as the proper card. Boot
and pciconf messages are:

puc0:  mem 0xf9dfc000-0xf9dfffff,0xfa000000-0xfa1fffff,0xf9e00000-0xf9ffffff irq 30 at device 0.0 on pci4

puc0@pci0:4:0:0:class=0x070002 card=0xc1581415 chip=0xc1581415 rev=0x00 hdr=0x00
    vendor     = 'Oxford Semiconductor Ltd'
    class      = simple comms
    subclass   = UART
    bar   [10] = type Memory, range 32, base 0xf9dfc000, size 16384, enabled
    bar   [14] = type Memory, range 32, base 0xfa000000, size 2097152, enabled
    bar   [18] = type Memory, range 32, base 0xf9e00000, size 2097152, enabled

The kernel is actually FreeBSD 9.0-BETA1 amd64, which is not quite
'STABLE' yet, but I don't think that this should matter.

Any advice would be much appreciated. The machine is still in
test phase, so I can mess around with it as necessary.

Thanks.

-- 
greg byshenk  -  free...@byshenk.net  -  Leiden, NL

Re: Question about packages installed via `pkg_add -r`

2011-03-06 Thread Greg Byshenk

On Sun, Mar 06, 2011 at 10:09:17AM +0800, Yue Wu wrote:
> On Sat, Mar 05, 2011 at 08:46:47PM -0500, ill...@gmail.com wrote:
> > On 5 March 2011 20:14, Yue Wu  wrote:
> > > On Sat, Mar 05, 2011 at 08:02:47PM -0500, ill...@gmail.com wrote:
> > >> On 5 March 2011 20:00, Yue Wu  wrote:
> > >> > Hello, sorry for my poor English; I will do my best to explain more
> > >> > clearly.
> > >> >
> > >> > On Sat, Mar 05, 2011 at 04:48:17PM +0100, Greg Byshenk wrote:
> > >> >> On Sat, Mar 05, 2011 at 11:04:36PM +0800, Yue Wu wrote:
> > >> >>
> > >> >> > I'm trying to use packages instead of ports these days, but I have
> > >> >> > a few questions:
> > >> >> >
> > >> >> > 1. How to reserve packages that fetched via `pkg_add -r`?
> > >> >> >
> > >> >> > 2. How to know if there are updates for packages, and how to update?
> > >> >>
> > >> >> For (1), do you mean 'preserve', as in save a copy?  If so, then
> > >> >> 'portmaster -b [...]' will save a backup copy of installed packages.
> > >> >
> > >> > Yes, I mean 'preserve'. I've read the portmaster man page; it seems
> > >> > -b is for an installed package, so it preserves it by packing up the
> > >> > files from the installed package. Why not preserve it at the moment
> > >> > of fetching with `pkg_add -r`? I think that's the best way; I don't
> > >> > like the portmaster way of doing it afterwards.
> > >>
> > >> from man 1 pkg_add:
> > >>
> > >>      -K, --keep
> > >>              Keep any downloaded package in PKGDIR if it is defined or
> > >>              in current directory by default.
> > >>
> > >
> > > Thanks, sorry for not reading attentively ;p
> > >
> > > Another question arises after checking the pkg that 'pkg_add' saves: why
> > > doesn't the pkg have a version appended to its name? It's hard to know
> > > the version of the pkg downloaded...
> > 
> > Without digging in too deeply (I use ports, so I'm not the
> > _most_ knowledgeable on packages) I believe it has to
> > do with the fact that the packages are symlinked to non-
> > versioned names on the distribution server(s), probably
> > to simplify fetching.  The packages themselves should
> > have the version information in their metadata somewhere,
> > which might be possible to rename via script.
> > 
> > I apologise if that isn't helpful.
> 
> Thank you for the info, I understand the reason now :)
> 
> Ports with portmaster make package installation management much more
> flexible and friendly than packages via pkg_add -r on FreeBSD,
> except that ports take much more time and resources. After trying
> packages, I think I have to stick with ports.

As suggested by some of the other comments, you can choose to use
portmaster with packages, if you prefer not to do local builds.

In my own case, I use ports and packages, via portmaster. That is,
I use one machine to build locally-configured packages (in some 
cases with non-standard options), and then install them on the rest
of the machines as packages. It works very well in my environment.


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Question about packages installed via `pkg_add -r`

2011-03-05 Thread Greg Byshenk

On Sat, Mar 05, 2011 at 11:04:36PM +0800, Yue Wu wrote:

> I'm trying to use packages instead of ports these days, but I have
> a few questions:
> 
> 1. How to reserve packages that fetched via `pkg_add -r`?
> 
> 2. How to know if there are updates for packages, and how to update?

For (1), do you mean 'preserve', as in save a copy?  If so, then
'portmaster -b [...]' will save a backup copy of installed packages.

There may be a better way, but one way to deal with (2) is to have an
up-to-date ports tree. Then 'pkg_version -vL=' will show you which of
your ports are out of date. Then 'portmaster -PP [...]' will force
package use for updates.

If you have an up-to-date ports tree, then I think that

portmaster -abPP

will update all of your ports, using packages, and save a backup copy
of the installed versions.
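
As a sketch of how the two steps might be tied together: the filter below reads `pkg_version -v` style output on stdin and prints only the packages flagged with "<" (older than the port). The function name is an invention for this example, and the sample line in the comment is made up rather than captured from a real system.

```sh
#!/bin/sh
# outdated_packages
# Read pkg_version -v style lines on stdin and print the name of every
# package whose status flag (second column) is "<".
# Made-up sample input line:
#   bash-4.1.9    <   needs updating (port has 4.1.10)
outdated_packages() {
    awk '$2 == "<" { print $1 }'
}
```

Something like `pkg_version -vL= | outdated_packages` would then give a short list to review before running `portmaster -abPP`.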

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: root mount error

2010-12-28 Thread Greg Byshenk

On Tue, Dec 28, 2010 at 11:08:44PM +0300, Michael BlackHeart wrote:
> I'm not looking for help or for instructions on how to build a kernel.
> I'm just installing 8.1-RELEASE and svn-updating it to last week's
> 8-STABLE. Going step by step through the handbook to install the
> kernel, I'm having trouble: it seems that the new kernel doesn't
> recognize my HDD. I'm not doing anything special; if I were, I would
> certainly have mentioned it. I'm just building a GENERIC kernel without
> any configuration of the system after installation: no tweaks, no
> tuning, nothing. It's a new GENERIC kernel and it can't find my HDD,
> but the 8.1 i386/amd64 releases work well and, as I remember, so did
> stable about a month ago.
> >Now, a likely cause of your problem is the installation of a custom
> >kernel with removed support for whatever your hard disk drive or raid
> >controller is recognized as.
> When it works it's just an ad0 HDD, no RAID or special driver.
> I'm just trying to say that recent changes in the kernel or kernel
> modules broke my HDD support, and I'd like to ask the developers to
> check where the problem is.
> And of course I've tried switching SATA native mode, and it doesn't
> change anything.
> The loader, at its own stage, easily detects the HDD and the root
> partition, so I can just select the old kernel and boot. I'm not sure
> how it gains access to the HDD, so I can't draw any conclusion;
> probably through BIOS interrupts, but that's beside the point.
> Unfortunately, I don't know how to dump dmesg without a serial
> connection or a usable disk drive; maybe to a flash drive, but I
> don't know how. And anyway there's no real kernel panic; it just asks
> for the root mount point.
> 
> To be specific, I've got a 2.5" Hitachi HTS542516K9A300 160GB SATA HDD.
> 
> If you need any additional info I'll give it all, just ask.

If you change to SATA native mode, then your HD may show up at a 
different device (mine moved to ad8). If you go to native mode and
issue a '?' when it fails to find the kernel, does it show any HD
devices?

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: root mount error

2010-12-28 Thread Greg Byshenk

On Tue, Dec 28, 2010 at 01:36:01AM +0100, Damien Fleuriot wrote:
> On 12/27/10 9:18 PM, Michael BlackHeart wrote:

> > I've got trouble with FreeBSD 8-STABLE.
> > First I put 8.2 RELEASE amd64 on a notebook, then updated the sources
> > via SVN to yesterday's revision. I don't remember the exact number, but
> > I've had this problem for about a week or two, so it's not that
> > important; it doesn't work on i386 either.
> > 
> > After installing the new kernel I had just built (it was always
> > GENERIC, for both archs, on a clean system) I got a kernel panic
> > caused by the inability to mount the root partition, because the
> > kernel couldn't see the drive. Pressing '?' showed only acd0, which
> > represents the CD-ROM.
> > 
> > In the debug messages I haven't found anything about ad0, which is how
> > the HDD was identified before the new kernel was installed.
> > I've got an HP 6720s notebook with a SATA 160GB Hitachi HDD that
> > works with SATA native mode disabled.
> > 
> > I've not found any info about this error in recent 8-STABLE, so I
> > don't know how to handle this one.

> First, I'd advise making use of FreeBSD's nextboot utility to test new
> kernels:
> http://fuse4bsd.creo.hu/localcgi/man-cgi.cgi?nextboot+8
> 
> Second, I would suggest reading the handbook's excellent section on
> upgrading your machine or rebuilding the kernel:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/updating-upgrading.html
> 
> Now, a likely cause of your problem is the installation of a custom
> kernel with removed support for whatever your hard disk drive or raid
> controller is recognized as.
> 
> Did you reinstall your old, working kernel, or are you actually asking
> for help doing just that ?

What kind of laptop?

For information, I had a similar problem when I updated my laptop
(HP Compaq 6910p) to 8.2-PRERELEASE as of 14 December. For some reason,
the system was no longer seeing the main hard drive. 

I solved the problem by setting 'SATA Native Mode' (or some such) in the
BIOS, which then led my (SATA) drive to be seen at '/dev/ad8'. After 
booting from ad8 and modifying my 'fstab', everything works fine.

So you might try the same thing. At least change the setting in your 
BIOS to see if you can see a drive.
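
A rough sketch of the fstab change described above, for anyone in the same
spot. The device names ad0 and ad8 are the ones from this message, and
retarget_fstab is a made-up helper name, not an existing tool; it only
writes to stdout, so the result can be checked before the real file is
touched.

```shell
#!/bin/sh
# retarget_fstab OLD NEW: read an fstab on stdin and print it with
# /dev/OLD... entries rewritten to /dev/NEW... (hypothetical helper).
retarget_fstab() {
    old=$1
    new=$2
    # Only match OLD when it is followed by a slice/partition letter or
    # whitespace, so that "ad1" does not also rewrite "ad10".
    sed -E "s#^/dev/${old}([sp[:space:]])#/dev/${new}\\1#"
}

# Example: preview the change, then install it by hand if it looks right.
#   retarget_fstab ad0 ad8 < /etc/fstab > /tmp/fstab.new
#   diff -u /etc/fstab /tmp/fstab.new
```

Printing to stdout rather than editing in place is deliberate: a wrong
fstab on a machine that already has trouble finding its root disk is the
last thing one wants.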

-greg


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: ZFS and Storage Systems

2010-10-12 Thread Greg Byshenk

On Tue, Oct 12, 2010 at 10:33:49AM +0100, Michal wrote:

> Apologies for the basic question but I just want to make sure. I have 
> been looking at storage systems like this one 
> http://www.icc-usa.com/storage-35-2u.asp. I am guessing it would be a 
> case of sorting the discs out, probably on some GUI or command line for 
> the box it self, then using a FreeBSD box I can set up ZFS over the 
> drives...or is it not that simple?

You say "using a FreeBSD box I can set up ZFS over the drives...", which
doesn't make sense, if I understand the ICC system. The device appears to
be an NAS system, with an OS, not an external disk bay.

What you would want, I think, is either a) an external FCAL or iSCSI box
that you could connect to another machine (running FreeBSD or some other
OS); or b) a 'storage' server upon which to install FreeBSD and use as
a NAS system.

It may be that the ICC system can export the drives, but it seems like an
unnecessary complication.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Serial console problems with stable/8

2010-09-12 Thread Greg Byshenk

On Sun, Sep 12, 2010 at 05:26:12PM +0200, Oliver Fromme wrote:
 
> On Friday I have updated a machine from 7.1 to stable/8.
> It is connected to a serial console.  With 7.1 everything
> worked fine, but with stable/8 things seem to break.

[...]
 
> Here's my setup (which worked perfectly fine with 7.1):
> 
> /boot.config:
> -P
> 
> /boot/loader.conf:
> kernel_options="-P"
> console="comconsole"
> 
> /etc/ttys:
> ttyu0   "/usr/libexec/getty std.9600"   vt100   off secure

Shouldn't the 'off' in the ttyu0 line above be 'on'...?
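
For illustration, a sketch of that change written as a filter. enable_tty
is a made-up name, and the sketch assumes the entry has 'off' as a
separate whitespace-delimited field, as in the quoted line; after editing
the real /etc/ttys, init has to be told to re-read it.

```shell
#!/bin/sh
# enable_tty NAME: read ttys(5) lines on stdin and print them with the
# status field of entry NAME switched from "off" to "on".
# (Hypothetical helper; all other entries pass through unchanged.)
enable_tty() {
    sed -E "/^$1[[:space:]]/ s/([[:space:]])off([[:space:]]|\$)/\\1on\\2/"
}

# Example (preview only):
#   enable_tty ttyu0 < /etc/ttys | grep '^ttyu0'
# After changing the real file, make init re-read it:
#   kill -HUP 1
```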



-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL
 
> /boot/device.hints:
> hint.uart.0.at="isa"
> hint.uart.0.port="0x3F8"
> hint.uart.0.flags="0x10"
> hint.uart.0.irq="4"
> 
> /var/run/dmesg.boot:
> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> uart0: [FILTER]
> uart0: console (9600,n,8,1)
> 
> The serial port is connected to another PC that runs tip(1)
> in a screen(1) session, using a 9-pin nullmodem cable.
> That setup hasn't changed in ages; that other PC is running
> an older version of FreeBSD.
> 
> I need this issue to be resolved, because the serial console
> is required for remote management (the machine is a 3-hours
> ride away from home).  If it can't be resolved, I will have
> to downgrade it to 7.x.
> 
> Best regards
>Oliver
> 
> -- 
> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
> Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
> secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
> München, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
> 
> FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd
> 
> I suggested holding a "Python Object Oriented Programming Seminar",
> but the acronym was unpopular.
> -- Joseph Strout

Re: igb related(?) panics on 7.3-STABLE

2010-09-02 Thread Greg Byshenk

On Mon, Aug 30, 2010 at 05:22:47AM -0700, Jeremy Chadwick wrote:
> On Mon, Aug 30, 2010 at 04:08:45AM -0700, Jeremy Chadwick wrote:
> > Bcc: 
> > Subject: Re: igb related(?) panics on 7.3-STABLE
> > Reply-To: 
> > In-Reply-To: <20100830094631.gd12...@core.byshenk.net>
> > {snip}

> My apologies -- somehow my mail client completely broke the Subject line
> and pulled it from another thread.  I'm not quite sure how mutt managed
> to do that, but probably an extraneous newline when editing mail
> headers, e.g. PEBKAC.

As an informational followup on this issue, I've updated the problem
machine to 8-STABLE (FreeBSD 8.1-STABLE #7: Mon Aug 23 13:01:15 CEST 2010)
and the problem seems to have gone away.

I had a journal overflow this morning, but that is a different problem,
and I think that it should be fixable via tuning a bit.


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Crashes on X7SPE-HF with em

2010-08-30 Thread Greg Byshenk

On Mon, Aug 30, 2010 at 04:08:45AM -0700, Jeremy Chadwick wrote:
> Bcc: 
> Subject: Re: igb related(?) panics on 7.3-STABLE
> Reply-To: 
> In-Reply-To: <20100830094631.gd12...@core.byshenk.net>
> 
> On Mon, Aug 30, 2010 at 11:46:31AM +0200, Greg Byshenk wrote:
> > On Sun, Aug 29, 2010 at 08:16:59PM +0200, Greg Byshenk wrote:
> > 
> > > I've begun seeing problems on a machine running FreeBSD-7.3-STABLE, 
> > > 64-bit,
> > > with two igb nics in use.  Previously the machine was fine, running 
> > > earlier
> > > versions of 7-STABLE, although the load on the network has increased due
> > > to additional machines being added to the network (the machine functions
> > > as a fileserver, serving files to compute machines via NFS(v3)).
> > > 
> > > Any advice is much appreciated. System info is below.
> > 
> > 
> > Followup with more information. The machine just panic'ed again, with 
> > a lot of load on the network.
> > 
> > Output from the 'systat' that was running at the time:
> > 
> > >    3 users    Load 54.47 42.35 24.25    Aug 30 11:17
> > > 
> > >    Mem:KB  REAL  VIRTUAL  VN PAGER  SWAP PAGER
> > >    Tot   Share  Tot  Share  Free   in   out   in   out
> > >    Act   462325504   86814010548  943324  count
> > >    All  4564847852 1074772k27740  pages
> > >    Proc:    Interrupts
> > >    r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt  cow   54220 total
> > >    1 170  392k8  278  22k  1951 zfod   sio0 irq4
> > >    ozfod   fdc0 irq6
> > >    70.4%Sys   3.1%Intr  0.0%User  0.0%Nice 26.5%Idle  %ozfod  27 twa0 uhci0
> > >    |||||||||||   daefr  2001 cpu0: time
> > >    ===++   prcfr   igb0 256
> > >    9938 dtbuf  1247 totfr   igb0 257
> > >    Namei Name-cache   Dir-cache  10 desvn  react   igb0 258
> > >    Calls  hits   %  hits   %   34443 numvn  1 pdwak   igb0 259
> > >    24996 frevn   112852 pdpgs   igb0 262
> > >    intrn   igb0 263
> > >    Disks   da0   da1 pass0 pass1   2570672 wire   igb0 264
> > >    KB/t   0.00 12.23  0.00  0.00   46760 act   igb0 265
> > >    tps   026 0 014706896 inact   19449 igb1 266
> > >    MB/s   0.00  0.31  0.00  0.000 769796  26585
> > >    021 0 0  173528
> > 
> > -greg
> >  
> >  
> >  
> > > Machine:
> > > ===
> > > 
> > > FreeBSD server.example.com 7.3-STABLE FreeBSD 7.3-STABLE #36: Wed Aug 25 
> > > 11:01:07 CEST 2010 
> > > r...@server.example.com:/usr/obj/usr/src/sys/KERNEL amd64
> > > 
> > > Kernel was csup'd earlier in the day on 25 August, immediately prior to 
> > > the build.
> > > 
> > > 
> > > Panic:
> > > ==
> > > 
> > > Fatal trap 9: general protection fault while in kernel mode
> > > cpuid = 2; apic id = 02
> > > instruction pointer = 0x8:0x8052f40c
> > > stack pointer   = 0x10:0xff82056819d0
> > > frame pointer   = 0x10:0xff82056819f0
> > > code segment= base 0x0, limit 0xf, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > current process = 65 (igb1 que)
> > > trap number = 9
> > > panic: general protection fault
> > > cpuid = 2
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x182
> > > trap_fatal() at trap_fatal+0x294
> > > trap() at trap+0x106
> > > calltrap() at calltrap+0x8
> > > --- trap 0x9, rip = 0x8052f40c, rsp = 0xff82056819d0, rbp = 
> > > 0xff82056819f0 --- m_ta

Re: igb related(?) panics on 7.3-STABLE

2010-08-30 Thread Greg Byshenk

On Sun, Aug 29, 2010 at 08:16:59PM +0200, Greg Byshenk wrote:

> I've begun seeing problems on a machine running FreeBSD-7.3-STABLE, 64-bit,
> with two igb nics in use.  Previously the machine was fine, running earlier
> versions of 7-STABLE, although the load on the network has increased due
> to additional machines being added to the network (the machine functions
> as a fileserver, serving files to compute machines via NFS(v3)).
> 
> Any advice is much appreciated. System info is below.


Followup with more information. The machine just panic'ed again, with 
a lot of load on the network.

Output from the 'systat' that was running at the time:

   3 users    Load 54.47 42.35 24.25    Aug 30 11:17

   Mem:KB  REAL  VIRTUAL  VN PAGER  SWAP PAGER
   Tot   Share  Tot  Share  Free   in   out   in   out
   Act   462325504   86814010548  943324  count
   All  4564847852 1074772k27740  pages
   Proc:    Interrupts
   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt  cow   54220 total
   1 170  392k8  278  22k  1951 zfod   sio0 irq4
   ozfod   fdc0 irq6
   70.4%Sys   3.1%Intr  0.0%User  0.0%Nice 26.5%Idle  %ozfod  27 twa0 uhci0
   |||||||||||   daefr  2001 cpu0: time
   ===++   prcfr   igb0 256
   9938 dtbuf  1247 totfr   igb0 257
   Namei Name-cache   Dir-cache  10 desvn  react   igb0 258
   Calls  hits   %  hits   %   34443 numvn  1 pdwak   igb0 259
   24996 frevn   112852 pdpgs   igb0 262
   intrn   igb0 263
   Disks   da0   da1 pass0 pass1   2570672 wire   igb0 264
   KB/t   0.00 12.23  0.00  0.00   46760 act   igb0 265
   tps   026 0 014706896 inact   19449 igb1 266
   MB/s   0.00  0.31  0.00  0.000 769796  26585
   021 0 0  173528


-greg
 
 
 
> Machine:
> ===
> 
> FreeBSD server.example.com 7.3-STABLE FreeBSD 7.3-STABLE #36: Wed Aug 25 
> 11:01:07 CEST 2010 r...@server.example.com:/usr/obj/usr/src/sys/KERNEL 
> amd64
> 
> Kernel was csup'd earlier in the day on 25 August, immediately prior to 
> the build.
> 
> 
> Panic:
> ==
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 2; apic id = 02
> instruction pointer = 0x8:0x8052f40c
> stack pointer   = 0x10:0xff82056819d0
> frame pointer   = 0x10:0xff82056819f0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 65 (igb1 que)
> trap number = 9
> panic: general protection fault
> cpuid = 2
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x182
> trap_fatal() at trap_fatal+0x294
> trap() at trap+0x106
> calltrap() at calltrap+0x8
> --- trap 0x9, rip = 0x8052f40c, rsp = 0xff82056819d0, rbp = 0xff82056819f0 ---
> m_tag_delete_chain() at m_tag_delete_chain+0x1c
> uma_zfree_arg() at uma_zfree_arg+0x41
> m_freem() at m_freem+0x54
> ether_demux() at ether_demux+0x85
> ether_input() at ether_input+0x1bb
> igb_rxeof() at igb_rxeof+0x29d
> igb_handle_que() at igb_handle_que+0x9a
> taskqueue_run() at taskqueue_run+0xac
> taskqueue_thread_loop() at taskqueue_thread_loop+0x46
> fork_exit() at fork_exit+0x122
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8205681d30, rbp = 0 ---
> Uptime: 11h57m6s
> Physical memory: 18411 MB
> Dumping 3770 MB:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x80
> fault code  = supervisor write data, page not present
> instruction pointer = 0x8:0x80188b5f
> stack pointer   = 0x10:0xff82056811f0
> frame pointer   = 0x10:0xff82056812f0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 65 (igb1 que)
> trap number = 12
> 
> 
> pciconf:
> ===
> 
>

igb related(?) panics on 7.3-STABLE

2010-08-29 Thread Greg Byshenk

I've begun seeing problems on a machine running FreeBSD-7.3-STABLE, 64-bit,
with two igb nics in use.  Previously the machine was fine, running earlier
versions of 7-STABLE, although the load on the network has increased due
to additional machines being added to the network (the machine functions
as a fileserver, serving files to compute machines via NFS(v3)).

Any advice is much appreciated. System info is below.
-greg



Machine:
===

FreeBSD server.example.com 7.3-STABLE FreeBSD 7.3-STABLE #36: Wed Aug 25 
11:01:07 CEST 2010 r...@server.example.com:/usr/obj/usr/src/sys/KERNEL amd64

Kernel was csup'd earlier in the day on 25 August, immediately prior to 
the build.


Panic:
==

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer = 0x8:0x8052f40c
stack pointer   = 0x10:0xff82056819d0
frame pointer   = 0x10:0xff82056819f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 65 (igb1 que)
trap number = 9
panic: general protection fault
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
trap_fatal() at trap_fatal+0x294
trap() at trap+0x106
calltrap() at calltrap+0x8
--- trap 0x9, rip = 0x8052f40c, rsp = 0xff82056819d0, rbp = 0xff82056819f0 ---
m_tag_delete_chain() at m_tag_delete_chain+0x1c
uma_zfree_arg() at uma_zfree_arg+0x41
m_freem() at m_freem+0x54
ether_demux() at ether_demux+0x85
ether_input() at ether_input+0x1bb
igb_rxeof() at igb_rxeof+0x29d
igb_handle_que() at igb_handle_que+0x9a
taskqueue_run() at taskqueue_run+0xac
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x122
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8205681d30, rbp = 0 ---
Uptime: 11h57m6s
Physical memory: 18411 MB
Dumping 3770 MB:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x80
fault code  = supervisor write data, page not present
instruction pointer = 0x8:0x80188b5f
stack pointer   = 0x10:0xff82056811f0
frame pointer   = 0x10:0xff82056812f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 65 (igb1 que)
trap number = 12


pciconf:
===

i...@pci0:10:0:0:   class=0x02 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
i...@pci0:10:0:1:   class=0x02 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet


dmesg:
=

igb0:  port 0xe880-0xe89f mem 0xfbe6-0xfbe7,0xfbe4-0xfbe5,0xfbeb8000-0xfbebbfff irq 16 at device 0.0 on pci10
igb0: Using MSIX interrupts with 10 vectors
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: Ethernet address: 00:30:48:ca:cd:72
igb1:  port 0xec00-0xec1f mem 0xfbee-0xfbef,0xfbec-0xfbed,0xfbebc000-0xfbeb irq 17 at device 0.1 on pci10
igb1: Using MSIX interrupts with 10 vectors
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: [ITHREAD]
igb1: Ethernet address: 00:30:48:ca:cd:73


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: I broke my SSH to jails after 7.2-8.0 src upgrade

2010-03-14 Thread Greg Byshenk

On Sat, Mar 13, 2010 at 09:33:50PM -0800, Doug Barton wrote:
> On 03/12/10 02:13, Greg Byshenk wrote:

> > I would put in a word for 'mergemaster -F' (or maybe '-iF') in such
> > cases.
 
> At this point the -U option is generally a safer bet. The only time this
> won't work for you is when upgrading from an older -RELEASE where you've
> never run mergemaster previously, in which case it will bark loudly that
> there is no mtree database. You could then run 'mergemaster -Fi' as you
> suggested, and run 'mergemaster -U' immediately thereafter and you
> should get as much "automation" as is possible.

I don't actually want "as much 'automation' as is possible".  Generally
I want to know what is being modified, even if it is in a file that I
haven't changed.  I like '-F' because it allows me to ignore the huge
number of files that aren't actually changed -- except the RCS line --
that sometimes arise when moving between versions.


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: I broke my SSH to jails after 7.2-8.0 src upgrade

2010-03-12 Thread Greg Byshenk

On Thu, Mar 11, 2010 at 08:33:29PM -0800, Garrett Cooper wrote:
 
> I've done a few RELENG_8_0 to STABLE-8 to 9-CURRENT upgrades lately
> and mergemaster was goofing up the contents a bit based on the RCS
> versions. I had to hand-edit a crapload of stuff going from 8 to 9,
> and I still don't trust mergemaster's automatic merging logic because
> it goofs up on /etc/group // /etc/passwd still (doesn't merge
> anything, discards my info, etc) for starters.
> 
> -a doesn't actually do any merging though, FWIW:

[...]

I would put in a word for 'mergemaster -F' (or maybe '-iF') in such
cases.

It doesn't try to automate much, but it allows one to concentrate on
actual differences by automating the handling of those files where
only the VCS Id is different.
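
To make concrete which files are meant here, the sketch below tests
whether two files are the same apart from their $FreeBSD$ Id line. It is
only an illustration of the idea: same_apart_from_id is a made-up name,
and this is not a claim about how mergemaster itself does the comparison.
It relies on diff's -I option.

```shell
#!/bin/sh
# same_apart_from_id OLD NEW: succeed (exit 0) if the two files are
# identical except for lines containing a "$FreeBSD: ..." Id string.
# Illustration only; not taken from mergemaster.
same_apart_from_id() {
    # -I makes diff ignore hunks in which every changed line matches
    # the pattern, so Id-only differences give exit status 0.
    diff -I '[$]FreeBSD:' "$1" "$2" >/dev/null 2>&1
}

# Example:
#   same_apart_from_id /etc/nsswitch.conf /usr/src/etc/nsswitch.conf \
#       && echo "no differences other than the Id line"
```

Files for which this succeeds are the ones that need no attention; the
rest still have to be looked at by hand.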

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap

2010-03-09 Thread Greg Byshenk

On Tue, Mar 09, 2010 at 09:00:49AM +0800, Linghua Tseng wrote:
 
> Here is the output of `diff -u /usr/src/etc/nsswitch.conf 
> /etc/nsswitch.conf'.
> --- /usr/src/etc/nsswitch.conf  2010-03-08 09:04:25.0 +0800
> +++ /etc/nsswitch.conf  2010-03-08 18:01:08.0 +0800
> @@ -1,13 +1,13 @@
> #
> # nsswitch.conf(5) - name service switch configuration file
> -# $FreeBSD: src/etc/nsswitch.conf,v 1.1.10.1 2009/08/03 08:13:06 kensmith Exp $
> +# $FreeBSD: src/etc/nsswitch.conf,v 1.1 2006/05/03 15:14:47 ume Exp $
> #
> group: compat
> -group_compat: nis
> +group_compat: ldap nis
> hosts: files dns
> networks: files
> passwd: compat
> -passwd_compat: nis
> +passwd_compat: ldap nis
> shells: files
> services: compat
> services_compat: nis
> 
> The line `+:*' has already put into /etc/master.passwd,
> and the line `+:*::' has already put into /etc/group.

I may be completely wrong (I can't seem to find the source), and I
don't know if it is the source of your problem, but I recall it being
reported that 'passwd_compat' and 'group_compat' require a *single*
source entry. 


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: device.hints isn't setting what I want

2010-01-22 Thread Greg Byshenk

On Fri, Jan 22, 2010 at 10:01:02AM +0100, Greg Byshenk wrote:
> On Thu, Jan 21, 2010 at 08:23:23PM -0500, Dan Langille wrote:
  
> > First, see also my post: do I want ch0 or pass1?
> > 
> > I have an external tape library and an external tape drive.  They are
> > not always powered up.  My goal: always get the same devices regardless
> > of whether or not the tape library is powered on at boot.
> > 
> > After booting, with the tape library powered on, I have these devices:
> > 
> > # camcontrol devlist
> >  at scbus0 target 5 lun 0 (sa0,pass0)
> > at scbus1 target 0 lun 0 (ch0,pass1)
> > at scbus1 target 5 lun 0 (sa1,pass2)
> > at scbus2 target 0 lun 0 (cd0,pass3)
> >   at scbus5 target 0 lun 0 (da0,pass4)
> > 
> > In /boot/devices, I have added these entries:
> > 
> > hint.scbus.1.at="ahc0"
> > hint.scbus.0.at="ahc1"
> > hint.scbus.2.at="acd0"
> > hint.scbus.5.at="umass0"
> 
> I think that this is wrong.
> 
> I had a similar issue (multiple tape drives and changer devices that 
> needed to stay at the same ids).
> 
> Your device.hints entries should look something like this:
> 
>hint.sa.0.at="scbus0"
>hint.sa.0.target="5"
>hint.sa.0.unit="0"
>hint.sa.1.at="scbus0"
>hint.sa.1.target="3"
>hint.sa.1.unit="0"
>hint.sa.2.at="scbus0"
>hint.sa.2.target="1"
>hint.sa.2.unit="0"
>hint.ch.0.at="scbus0"
>hint.ch.0.target="4"
>hint.ch.0.unit="0"
>hint.ch.1.at="scbus0"
>hint.ch.1.target="2"
>hint.ch.1.unit="0"
>hint.ch.2.at="scbus0"
>hint.ch.2.target="0"
>hint.ch.2.unit="0"
> 
> Which I use to get this:
> 
># camcontrol devlist
>at scbus0 target 0 lun 0 (pass0,ch2)
>   at scbus0 target 1 lun 0 (sa2,pass1)
>at scbus0 target 2 lun 0 (pass2,ch1)
>   at scbus0 target 3 lun 0 (sa1,pass3)
># 
> 
> (Currently the first changer is not powered up.)
> 
> 
> So I think that what you want is something like:
> 
>hint.sa.0.at="scbus0"
>hint.sa.0.target="5"
>hint.sa.0.unit="0"
>hint.sa.1.at="scbus1"
>hint.sa.1.target="5"
>    hint.sa.1.unit="0"
>hint.ch.0.at="scbus1"
>hint.ch.0.target="0"
>hint.ch.0.unit="0"
>[...]


Just saw your second message.

I don't know if you can wire down 'pass?' the same way, but if you can,
I would assume that you need to set it the same way as the 'sa?' and 
other devices.

That is, if you want:

> >  at scbus0 target 5 lun 0 (sa0,pass0)

Then the device.hints entry would look like:

   hint.pass.0.at="scbus0"
   hint.pass.0.target="5"
   hint.pass.0.unit="0"

(If you can do that.)

-greg

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: device.hints isn't setting what I want

2010-01-22 Thread Greg Byshenk

On Thu, Jan 21, 2010 at 08:23:23PM -0500, Dan Langille wrote:
 
> First, see also my post: do I want ch0 or pass1?
> 
> I have an external tape library and an external tape drive.  They are
> not always powered up.  My goal: always get the same devices regardless
> of whether or not the tape library is powered on at boot.
> 
> After booting, with the tape library powered on, I have these devices:
> 
> # camcontrol devlist
>  at scbus0 target 5 lun 0 (sa0,pass0)
> at scbus1 target 0 lun 0 (ch0,pass1)
> at scbus1 target 5 lun 0 (sa1,pass2)
> at scbus2 target 0 lun 0 (cd0,pass3)
>   at scbus5 target 0 lun 0 (da0,pass4)
> 
> In /boot/devices, I have added these entries:
> 
> hint.scbus.1.at="ahc0"
> hint.scbus.0.at="ahc1"
> hint.scbus.2.at="acd0"
> hint.scbus.5.at="umass0"

I think that this is wrong.

I had a similar issue (multiple tape drives and changer devices that 
needed to stay at the same ids).

Your device.hints entries should look something like this:

   hint.sa.0.at="scbus0"
   hint.sa.0.target="5"
   hint.sa.0.unit="0"
   hint.sa.1.at="scbus0"
   hint.sa.1.target="3"
   hint.sa.1.unit="0"
   hint.sa.2.at="scbus0"
   hint.sa.2.target="1"
   hint.sa.2.unit="0"
   hint.ch.0.at="scbus0"
   hint.ch.0.target="4"
   hint.ch.0.unit="0"
   hint.ch.1.at="scbus0"
   hint.ch.1.target="2"
   hint.ch.1.unit="0"
   hint.ch.2.at="scbus0"
   hint.ch.2.target="0"
   hint.ch.2.unit="0"

Which I use to get this:

   # camcontrol devlist
   at scbus0 target 0 lun 0 (pass0,ch2)
  at scbus0 target 1 lun 0 (sa2,pass1)
   at scbus0 target 2 lun 0 (pass2,ch1)
  at scbus0 target 3 lun 0 (sa1,pass3)
   # 

(Currently the first changer is not powered up.)


So I think that what you want is something like:

   hint.sa.0.at="scbus0"
   hint.sa.0.target="5"
   hint.sa.0.unit="0"
   hint.sa.1.at="scbus1"
   hint.sa.1.target="5"
   hint.sa.1.unit="0"
   hint.ch.0.at="scbus1"
   hint.ch.0.target="0"
   hint.ch.0.unit="0"
   [...]
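
Since every wired-down device needs the same three lines, a small sketch
that prints them may save some typing. wire_hint is a made-up helper, not
an existing tool, and the bus and target numbers in the example are the
ones from the devlist quoted above; they should be checked against the
local 'camcontrol devlist' output before anything goes into
/boot/device.hints.

```shell
#!/bin/sh
# wire_hint DRIVER UNIT BUS TARGET [UNITHINT]: print the three
# device.hints lines that pin DRIVER+UNIT (e.g. sa0) to a SCSI bus and
# target. UNITHINT is the value of the "unit" hint and defaults to 0,
# as in the entries shown in this message. (Hypothetical helper.)
wire_hint() {
    driver=$1 unit=$2 bus=$3 target=$4 unithint=${5:-0}
    printf 'hint.%s.%s.at="scbus%s"\n' "$driver" "$unit" "$bus"
    printf 'hint.%s.%s.target="%s"\n' "$driver" "$unit" "$target"
    printf 'hint.%s.%s.unit="%s"\n' "$driver" "$unit" "$unithint"
}

# Example: the three devices suggested above.
#   wire_hint sa 0 0 5
#   wire_hint sa 1 1 5
#   wire_hint ch 0 1 0
```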


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: em0 watchdog timeouts

2009-10-05 Thread Greg Byshenk

On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:

> What I need is useful advice/help. I never stated I needed a driver  
> developer.
> 
> I'd like to be able to run my favorite OS on cool hardware, in the  
> future, for a high-performing NFS-server, without problems like I've  
> experienced the past 6months, on a production system.
> Please note that I'm managing a server-park almost completely based on  
> FreeBSD, and I'm running many NFS servers on other hardware, for other  
> services, without issues.
> 
> I've seen several other FreeBSD-users having problems with this too,  
> so I think it's of importance for the project. As I mentioned  
> originally, I'm happy to dispose the hardware to any FreeBSD developer
> that might want to look further into this. Debugging it further is  
> above my skill-set, I don't even know where to begin looking,  
> especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation:
check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very
happy with their performance, but some months ago I began to experience
problems with one of them.  These problems were 'watchdog timeout'
errors.  Tried all manner of things, different NICs of different types,
changing settings, etc., but nothing helped over the long term.  At 
some point, when very heavy i/o was going on to our Beowulf cluster, the
'watchdog timeouts' would begin.  What was strange is that other 
(supposedly identical) machines handled _more_ i/o without a problem.

Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.  I changed the motherboard and all the problems went away,
never to reappear.

I don't know if it was a specific problem with that particular
motherboard, or something about that model, but for whatever reason, it
appears that the buses just couldn't handle a RAID card and three active
NICs.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: issues with Intel Pro/1000 and 1000baseTX

2009-05-16 Thread Greg Byshenk

On Fri, May 15, 2009 at 06:01:33PM -0300, Nenhum_de_Nos wrote:

> I know this is a bit off, but as I never had CAT6 stuff to deal with here
> it goes. is there any problems in using CAT6 cabling and not 1000baseTX
> capable switch ?
> 
> I plan to install cat6 cables and just use 1000baseTX in future. this will
> be my new home network and all I have now is 100baseTX and two 1000baseT
> cards.

There should be no problem at all.  CAT6 must meet higher standards, but
the basic cable design is the same as CAT5, and it works for 100baseTX,
and even for 10baseT (if you really wanted to use it).

When my company relocated to a new building, the entire network was 
cabled at CAT6, but we still have some machines and switches that are
100baseTX, and they work fine.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: em? watchdog timeout 7-stable

2009-05-15 Thread Greg Byshenk

mation re em1 on this machine:

# pciconf -lvb
e...@pci0:7:1:0: class=0x02 card=0x10028086 chip=0x10118086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = '82545EM Gigabit Ethernet Controller (Fiber)'
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xda30, size 131072, enabled
bar   [20] = type I/O Port, range 32, base 0x5000, size 64, enabled


# vmstat -i
interrupt                          total       rate
irq4: sio0                          1479          0
irq6: fdc0                            10          0
irq14: ata0                           58          0
irq16: skc0 em0                   758850         85
irq18: twa0                      2085338        234
irq24: em1                             1          0
cpu0: timer                     17806226       1999
cpu3: timer                     17798161       1998
cpu2: timer                     17798127       1998
cpu1: timer                     17798043       1998
cpu5: timer                     17798058       1998
cpu6: timer                     17798161       1998
cpu4: timer                     17798160       1998
cpu7: timer                     17798160       1998
Total                          145238832      16311


# ifconfig em1
em1: flags=8843 metric 0 mtu 1500
options=db
ether 00:07:e9:1a:ae:dc
inet 192.168.1.62 netmask 0xf800 broadcast 192.168.7.255
media: Ethernet autoselect (1000baseLX )
status: active


Any ideas?



On Wed, May 13, 2009 at 06:44:38PM +0200, Greg Byshenk wrote:
> On Wed, May 13, 2009 at 06:42:07PM +0200, Greg Byshenk wrote:
> 
> > As a followup to my own previous message, I continue to have annoying 
> > problems with "em?: watchdog timeout" on one of my machines (now running
> > 7.2-STABLE as of 2009-05-08).
> > 
> > I have discontinued using the on-board (em, copper) NICs, and replaced
> > the original fibre NIC with a newer model, but the problem persists.
> > I've also set
> > 
> >hw.pci.enable_msix=0
> >hw.pci.enable_msi=0
> >hw.em.rxd=1024
> >hw.em.txd=1024
> >net.inet.tcp.tso=0
> > 
> > ...as suggested in some discussions of this problem, and set the em1
> > interface to 'polling', all to no avail.  Frequently, though irregularly
> > (once or twice a day), the console begins to display
> > 
> >em1: watchdog timeout -- resetting
> >em1: watchdog timeout -- resetting
> >em1: watchdog timeout -- resetting
> > 
> > the network is down, and the machine locks up.
> > 
> > [Note: I am getting 'em1' now instead of 'em0' as previously, but this
> > is due to changing all of the nics, which led to a different numbering;
> > the timeout is still occurring on the (main) interface, the fibre 
> > gigabit connection.]
> > 
> > What is particularly perverse (IMO) is that, since changing the NIC to
> > the newer model (and updating the kernel), I can no longer break to the
> > debugger when the lockup occurs (there is no response to the break) --
> > but I _can_ shut the machine down cleanly via hardware (a touch of the
> > power switch sends 'shutdown', and the machine shuts down cleanly --
> > after killing off processes waiting on network i/o).
> > 
> > The machine is running nfs and samba (3.2.10, from ports), and pretty
> > much nothing else.
> > 
> > 
> > Anyone have any ideas about this...?  I'm going mad with this.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: em0 watchdog timeout 7-stable

2009-05-13 Thread Greg Byshenk

On Wed, May 13, 2009 at 06:42:07PM +0200, Greg Byshenk wrote:

> As a followup to my own previous message, I continue to have annoying 
> problems with "em?: watchdog timeout" on one of my machines (now running
> 7.2-STABLE as of 2009-05-08).
> 
> I have discontinued using the on-board (em, copper) NICs, and replaced
> the original fibre NIC with a newer model, but the problem persists.
> I've also set
> 
>hw.pci.enable_msix=0
>hw.pci.enable_msi=0
>hw.em.rxd=1024
>hw.em.txd=1024
>net.inet.tcp.tso=0
> 
> ...as suggested in some discussions of this problem, and set the em1
> interface to 'polling', all to no avail.  Frequently, though irregularly
> (once or twice a day), the console begins to display
> 
>em1: watchdog timeout -- resetting
>em1: watchdog timeout -- resetting
>em1: watchdog timeout -- resetting
> 
> the network is down, and the machine locks up.
> 
> [Note: I am getting 'em1' now instead of 'em0' as previously, but this
> is due to changing all of the nics, which led to a different numbering;
> the timeout is still occurring on the (main) interface, the fibre 
> gigabit connection.]
> 
> What is particularly perverse (IMO) is that, since changing the NIC to
> the newer model (and updating the kernel), I can no longer break to the
> debugger when the lockup occurs (there is no response to the break) --
> but I _can_ shut the machine down cleanly via hardware (a touch of the
> power switch sends 'shutdown', and the machine shuts down cleanly --
> after killing off processes waiting on network i/o).
> 
> The machine is running nfs and samba (3.2.10, from ports), and pretty
> much nothing else.
> 
> 
> Anyone have any ideas about this...?  I'm going mad with this.


Just as an FYI, the drive errors I described in my previous message
appear to have been due to a bad BBU on the RAID controller, and to
have been resolved.
 

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: em0 watchdog timeout 7-stable

2009-05-13 Thread Greg Byshenk

As a followup to my own previous message, I continue to have annoying 
problems with "em?: watchdog timeout" on one of my machines (now running
7.2-STABLE as of 2009-05-08).

I have discontinued using the on-board (em, copper) NICs, and replaced
the original fibre NIC with a newer model, but the problem persists.
I've also set

   hw.pci.enable_msix=0
   hw.pci.enable_msi=0
   hw.em.rxd=1024
   hw.em.txd=1024
   net.inet.tcp.tso=0

...as suggested in some discussions of this problem, and set the em1
interface to 'polling', all to no avail.  Frequently, though irregularly
(once or twice a day), the console begins to display

   em1: watchdog timeout -- resetting
   em1: watchdog timeout -- resetting
   em1: watchdog timeout -- resetting

the network is down, and the machine locks up.
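
For anyone wanting to try the same settings, a rough sketch of where they
would normally go. This assumes the stock em(4) driver on 7.x: the hw.*
entries are boot-time tunables for /boot/loader.conf, net.inet.tcp.tso is a
run-time sysctl, and polling only works on a kernel built with
'options DEVICE_POLLING'. The file placement is an assumption; only the
values come from the list above.

```sh
# /boot/loader.conf -- boot-time tunables (take effect after a reboot)
hw.pci.enable_msix=0
hw.pci.enable_msi=0
hw.em.rxd=1024
hw.em.txd=1024

# /etc/sysctl.conf -- run-time sysctl
# (or set it once by hand with: sysctl net.inet.tcp.tso=0)
net.inet.tcp.tso=0

# One-off, at run time: switch the interface to polling.
# Assumes 'options DEVICE_POLLING' in the kernel config.
#   ifconfig em1 polling
```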

[Note: I am getting 'em1' now instead of 'em0' as previously, but this
is due to changing all of the nics, which led to a different numbering;
the timeout is still occurring on the (main) interface, the fibre 
gigabit connection.]

What is particularly perverse (IMO) is that, since changing the NIC to
the newer model (and updating the kernel), I can no longer break to the
debugger when the lockup occurs (there is no response to the break) --
but I _can_ shut the machine down cleanly via hardware (a touch of the
power switch sends 'shutdown', and the machine shuts down cleanly --
after killing off processes waiting on network i/o).

The machine is running nfs and samba (3.2.10, from ports), and pretty
much nothing else.


Anyone have any ideas about this...?  I'm going mad with this.

-greg byshenk



# pciconf -lvb
[...]
e...@pci0:7:1:0: class=0x02 card=0x10028086 chip=0x10118086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = '82545EM Gigabit Ethernet Controller (Fiber)'
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xda30, size 131072, enabled
bar   [20] = type I/O Port, range 32, base 0x5000, size 64, enabled
[...]

# vmstat -i
interrupt                          total       rate
irq4: sio0                          1666          0
irq6: fdc0                            10          0
irq14: ata0                           58          0
irq16: skc0 em0                  1437801         98
irq18: twa0                       846981         57
irq24: em1                       4378650        299
cpu0: timer                     29258004       1999
cpu1: timer                     29249758       1999
cpu3: timer                     29249816       1999
cpu7: timer                     29249779       1999
cpu2: timer                     29249729       1999
cpu4: timer                     29249852       1999
cpu6: timer                     29249851       1999
cpu5: timer                     29249814       1999
Total                          240671769      16450



On Sun, Apr 26, 2009 at 02:50:08PM +0200, Greg Byshenk wrote:
> I have one machine that is seeing watchdog timeouts on em0, running 7-STABLE
> amd64 as of 2009.04.19, and also some other more perverse errors.
> 
> Twice now in the last 48 hours, this machine has become unreachable via the
> network, and connecting to the console shows an endless string of 
> 
>[...]
>em0: watchdog timeout -- resetting
>em0: watchdog timeout -- resetting
>em0: watchdog timeout -- resetting
> 
> messages. The machine is almost locked up.  That is, I can get a login
> prompt, but can go no further than typing in a username; after the
> username, no password prompt, and nothing further.  The only option is
> to hard reset the machine or to drop to debugger and reboot.
> 
> Now the "perverse" part.  After restarting, the system partition is no
> more.
> 
> Background detail:  the machine is a fileserver, with a 3Ware 9650SE-16ML
> SATA controller, connected to 16 1TB SATA drives, this configured as
> a 14-drive RAID10 array (+ 2 hot spares), with a 50GB system partition
> and 6.5TB data partition.  The system partition is configured as da1,
> with one slice and more or less standard partitions for / /var /tmp, etc.
> (the data partition of the array is sliced with gpt).
> 
> The issue here is that, upon restart, all partition information on da0
> seems to have disappeared, and restarting results in a "no operating
> system found" message, and a failure to boot (obviously).
> 
> But all of the data is still present.  If I boot into rescue mode,
> recreate da0s1, mark it bootable, and restore the bsdlabel, then
> everything works again.  I can restart the machine, and it comes back
> up normally (it requires an fsck of everything on da0, but after that
> everything is back to normal).
> 
> I don't know if this is two unrelated problems, or one problem with
> two symptoms, or something e

Re: [7.2] R/W mount of / denied. Filesystem not clean - run fsck.

2009-05-06 Thread Greg Byshenk

On Wed, May 06, 2009 at 09:18:02PM +0200, Helmut Schneider wrote:
> Marat N.Afanasyev  wrote:
> >Helmut Schneider wrote:

> >>I do have such thing (IBM Blade Center) but I'm looking for something to 
> >>avoid the situation above. Something that lets me at least boot into 
> >>single user mode.

> >if you have an ip-kvm you can drop into single-user and fsck any disk you 
> >have. all you need to do is to choose 'single user' from beastie-menu. or 
> >start kernel with -s parameter

> I *do* know how to enter single user mode but the kernel panicked *before* 
> the shell started. :)

The problem is that, if something is so far wrong that you can't even
get to the single-user shell, then there probably isn't anything else
but rescue.

One thing that might be an option:  at work, we use PXE for Linux and
FreeBSD installs, so one thing I've done is to create a pxeboot rescue
image (using the mfsroot from the rescue CD).  This means that, if there
is this sort of problem, we can boot into rescue mode from the network
(the BIOS is also redirected to the serial console) and not have to 
worry about swapping CDs.  The same thing should also work for remote
locations.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: [7.2] R/W mount of / denied. Filesystem not clean - run fsck.

2009-05-06 Thread Greg Byshenk

On Wed, May 06, 2009 at 11:50:11AM +0200, Helmut Schneider wrote:
> Marat N.Afanasyev  wrote:
> >Helmut Schneider wrote:

> >>after upgrading a few systems yesterday from 7.1-RELEASE to 7.2-RELEASE 
> >>on one machine I got the error above. The problem was that
> >>
> >>- I was unable to cope with it but booting from a live CD.
> >>- the message appeared ~ 1000 times and then the kernel panicked.
> >>
> >>After fsck'ing / with the help of the live CD I rebooted the machine but 
> >>now I got the same problem with /home.
> >>
> >>How can I avoid such issues (except of not letting the machine crash)? Is 
> >>there a way to boot at least to single user mode and then run fsck (I was 
> >>at home, far away from the machine, not funny)?

> There is no 'login' when / cannot be mounted...
> 
> >fsck it. if you have another machine in there, you can try to make a 
> >serial console. or install a ip-kvm extender ;)
> 
> I do have such thing (IBM Blade Center) but I'm looking for something to 
> avoid the situation above. Something that lets me at least boot into single 
> user mode.

If you had access to the console (I'm guessing you did in order to use the
live CD), did you try booting into single-user from the beastie menu?

IME, failure to fsck the / filesystem should drop automatically to single-user
at the console, but if this fails, then you should be able to choose
single-user boot from the menu, which will then not try to run fsck or
mount / rw.  From there you should be able to fsck and remount /, as well
as /home or anything else.  This will fail if there is something horribly
wrong with /, causing a failure even when / is mounted ro, but then there
may be no good solution.
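
The steps from the single-user shell can be sketched roughly as below. This
is only an outline, assuming UFS filesystems listed in /etc/fstab and a
separate /home as in the original report; the run and single_user_repair
helpers are made up for illustration. DRY_RUN defaults to 1, so the function
just prints the commands unless it is explicitly set to 0.

```sh
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# single_user_repair: check and remount filesystems from the single-user shell.
# 'fsck -p' only fixes safe inconsistencies; if it gives up, run fsck by hand.
single_user_repair() {
    run fsck -p /        || return 1   # check / while it is still mounted read-only
    run mount -u -o rw / || return 1   # upgrade the existing / mount to read-write
    run fsck -p /home    || return 1   # check the next filesystem before mounting it
    run mount -a -t ufs  || return 1   # mount the remaining UFS entries from /etc/fstab
}
```

Calling single_user_repair as-is prints the four commands; with DRY_RUN=0 it
runs them and stops at the first failure.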

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

em0 watchdog timeout (and 3ware problems) 7-stable

2009-04-26 Thread Greg Byshenk

I have one machine that is seeing watchdog timeouts on em0, running 7-STABLE
amd64 as of 2009.04.19, and also some other more perverse errors.

Twice now in the last 48 hours, this machine has become unreachable via the
network, and connecting to the console shows an endless string of 

   [...]
   em0: watchdog timeout -- resetting
   em0: watchdog timeout -- resetting
   em0: watchdog timeout -- resetting

messages. The machine is almost locked up.  That is, I can get a login
prompt, but can go no further than typing in a username; after the
username, no password prompt, and nothing further.  The only option is
to hard reset the machine or to drop to debugger and reboot.

Now the "perverse" part.  After restarting, the system partition is no
more.

Background detail:  the machine is a fileserver, with a 3Ware 9650SE-16ML
SATA controller, connected to 16 1TB SATA drives, this configured as
a 14-drive RAID10 array (+ 2 hot spares), with a 50GB system partition
and 6.5TB data partition.  The system partition is configured as da1,
with one slice and more or less standard partitions for / /var /tmp, etc.
(the data partition of the array is sliced with gpt).

The issue here is that, upon restart, all partition information on da0
seems to have disappeared, and restarting results in a "no operating
system found" message, and a failure to boot (obviously).

But all of the data is still present.  If I boot into rescue mode,
recreate da0s1, mark it bootable, and restore the bsdlabel, then
everything works again.  I can restart the machine, and it comes back
up normally (it requires an fsck of everything on da0, but after that
everything is back to normal).

I don't know if this is two unrelated problems, or one problem with
two symptoms, or something else.  I think that I can safely say that
it is not a problem with the 3Ware controller itself, as I replaced
the controller with a spare (identical model), and the problem
recurred.  Additionally, I have an almost-identical configuration on
four other machines, none of which are experiencing any problems.
One thing that is different is that the other machines use
Intel PRO/1000 PF (pci-e) NICs.

Is there some known problem with the Intel 2572 fibre NIC?  Or some
potential interaction of it with the 3ware RAID controller?

For the moment, I've set hw.pci.enable_msi=0 (as discussed in the
threads on 7.2/bge), and am building a new kernel/world from sources
csup'd one hour ago, but I'd really like to hear any ideas about this
-- particularly the wiping of the label.

Some information about the system:


# /dev/da0s1:
8 partitions:
#         size    offset    fstype   [fsize bsize bps/cpg]
  a:   2097152         0    4.2BSD        0     0     0
  b:   8388608   2097152      swap
  c: 104856192         0    unused        0     0         # "raw" part, don't edit
  d:   8388608  10485760    4.2BSD        0     0     0
  e:   2097152  18874368    4.2BSD        0     0     0
  f:  41943040  20971520    4.2BSD        0     0     0
  g:  41941632  62914560    4.2BSD        0     0     0


e...@pci0:4:1:0: class=0x02 card=0x10038086 chip=0x10018086 rev=0x02 hdr=0x00
vendor = 'Intel Corporation'
device = '2572 10/100/1000 Ethernet Controller (Fiber)'
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 32, base 0xda00, size 131072, enabled
bar   [14] = type Memory, range 32, base 0xda02, size 65536, enabled

t...@pci0:9:0:0: class=0x010400 card=0x100413c1 chip=0x100413c1 rev=0x01 hdr=0x00
device = '9650SE Series PCI-Express SATA2 Raid Controller'
class  = mass storage
subclass   = RAID
bar   [10] = type Prefetchable Memory, range 64, base 0xd800, size 33554432, enabled
bar   [18] = type Memory, range 64, base 0xda30, size 4096, enabled
bar   [20] = type I/O Port, range 32, base 0x3000, size 256, enabled
cap 01[40] = powerspec 2  supports D0 D1 D2 D3  current D0
cap 05[50] = MSI supports 32 messages, 64 bit
cap 10[70] = PCI-Express 1 legacy endpoint

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: X.Org/xdm 'frozen' after installworld (7-stable)

2009-02-04 Thread Greg Byshenk

On Tue, Feb 03, 2009 at 10:58:42AM -0800, Kent Stewart wrote:
> On Tuesday 03 February 2009 09:29:05 am Steve Franks wrote:

> > This is a new weird one I've never had before.  Consoles work fine,
> > but the mouse and keyboard won't move/type when xdm pops up.
> > ctrl-alt-F2 takes you right to a working console, and the mouse works
> > fine in the console...ctrl-alt-backspace no longer kills X either...
 
> The option that I found the easiest was to add
> 
>  Option "AutoAddDevices" "off"
> 
> To the ServerFlags section. I was told in the ports list that you can add it 
> to the ServerLayout section but I could never make that work.

I had the same problem yesterday after updating X.

For me, adding dbus_enable="YES" and hald_enable="YES" to rc.conf and
restarting solved the problem.
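
Spelled out, that is the fragment below. The rc.d paths in the comments
assume dbus and hal were installed from ports under /usr/local.

```sh
# /etc/rc.conf
dbus_enable="YES"
hald_enable="YES"

# To start both without a reboot (dbus first, since hald depends on it):
#   /usr/local/etc/rc.d/dbus start
#   /usr/local/etc/rc.d/hald start
```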


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: SSH problem

2009-01-26 Thread Greg Byshenk

On Mon, Jan 26, 2009 at 11:21:57AM -0800, Xin LI wrote:
> Xian Chen wrote:

> > I can use scp to move files from a linux to my Freebsd machine.
> > 
> > But, when I try to use WinSCP under windows, it always failed. WinSCP
> > errors: "Network error: Connection refused". Both scp & sftp fail if using
> > WinSCP.
> > 
> > Any clues for this?

> My guess is that you have specified an incorrect port number.  Try tcpdump?

Another possibility, IIRC, is a bad ssh hostkey (I haven't used WinSCP in
quite some time, but I recall that its error messages are not particularly
informative).

You can also check to see if you can reach the server.  Try a plain telnet
to port 22.  You won't actually be able to establish a connection if you
aren't running ssh, but you should see something like:

   Connected to .
   Escape character is '^]'.
   SSH-2.0-OpenSSH_5.1p1 FreeBSD-20080901
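
The same check can be scripted. A minimal sketch, assuming nc(1) is
available on the machine you are testing from; is_ssh_banner and probe_ssh
are made-up helper names, and the exact behaviour of nc when stdin is closed
varies a little between implementations.

```sh
#!/bin/sh
# is_ssh_banner LINE: succeed if LINE looks like an SSH identification string.
is_ssh_banner() {
    case "$1" in
        SSH-*) return 0 ;;
        *)     return 1 ;;
    esac
}

# probe_ssh HOST [PORT]: connect, read the first line the server sends,
# and report whether it is an SSH banner. -w 5 gives up after five seconds.
probe_ssh() {
    banner=$(nc -w 5 "$1" "${2:-22}" </dev/null 2>/dev/null | head -n 1 | tr -d '\r')
    if is_ssh_banner "$banner"; then
        echo "sshd answered: $banner"
    else
        echo "no SSH banner from $1 port ${2:-22}"
        return 1
    fi
}
```

For example, 'probe_ssh myserver' or 'probe_ssh myserver 2222' (the host name
is a placeholder). No banner at all points at the port or a firewall rather
than at WinSCP.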

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: mergemaster broken -- take 2

2009-01-08 Thread Greg Byshenk

On Thu, Jan 08, 2009 at 11:47:42AM +0100, Oliver Fromme wrote:
> Greg Byshenk wrote:
>  > Andrei Kolu wrote:
 
>  > > NOTE: I do not reboot my system until everything is updated. Why it is 
>  > > necessary to boot new kernel and then upgrade world is beyound me..YMMW
>  > 
>  > - I suppose that it is not strictly necessary to reboot between 
>  >   installing kernel and world, but I always do so.
 
> It _is_ necessary.  If you don't reboot, you're still running
> the old kernel which might not be able to support new binaries
> and libraries that installworld will install on your system.

Of course this is correct; my error.

The chance of something going wrong in this case is probably quite
small, but if something does go wrong it can go horribly wrong.
 

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: mergemaster broken -- take 2

2009-01-08 Thread Greg Byshenk

On Thu, Jan 08, 2009 at 10:10:25AM +0200, Andrei Kolu wrote:
> Mike Lempriere wrote:

> >Hi folks -- sorry to be a nag, but my main production system is barely 
> >limping along on an old kernel with mismatched libraries.  I have no 
> >idea what else to do -- please help!
> >---
> >I'm upgrading 5-stable (was at 5.5) to 6-stable, in preparation for 
> >6-stable to 7-stable.
> >No problems with cvsup, make buildworld, make installworld, make 
> >buildkernel, mergemaster -p.
> >make installkernel, boot to single user.  Then mergemaster -- blammo:
> What are your exact make sequences?
> 
> I usually do this way:
> 
> # csup /usr/share/examples/cvsup/standard-supfile
> # cd /usr/src
> here I usually softlink my kernel config file in /root directory to 
> appropriate architecture one and edit /etc/make.conf:
> ---
> SUP_UPDATE=yes
> SUPHOST=cvsup.no.FreeBSD.org
> SUPFILE=/usr/share/examples/cvsup/standard-supfile
> PORTSSUPFILE=/usr/share/examples/cvsup/ports-supfile
> DOCSUPFILE=/usr/share/examples/cvsup/doc-supfile
> KERNCONF=KERNEL
> ---
> /usr/src/sys/amd64/conf
> KERNEL -> /root/kernel/KERNEL
> 
> # make buildkernel
> # make installkernel
> # make buildworld
> # mergemaster -p
> # make installworld
> # mergemaster

It may be me that is mistaken, but this seems wrong to me, as does the
sequence in the original message:

   # cvsup
   # make buildworld
   # make installworld
   # make buildkernel
   # mergemaster -p
   # make installkernel
   # boot to single user
   # mergemaster

If I am not very much mistaken, the "canonical" process is:

# make buildworld
# make buildkernel
# make installkernel
# reboot (*)
# mergemaster -p
# make installworld
# mergemaster

The reasons for the other methods being wrong are (as I understand them):

- You should build your new world before building your new kernel, as
  it may be the case that some aspects of the new kernel build are
  dependent upon aspects of the new world build.  If you build your
  new kernel before building your new world, you will be building 
  your new kernel against the old world.

- You should install your new kernel before installing your new world,
  as it can be the case that some aspects of the new world will not be
  understood by your old kernel. A new kernel should always be
  compatible with an old userland/world, but an old kernel may not 
  always be compatible with a new userland/world.

> NOTE: I do not reboot my system until everything is updated. Why it is 
> necessary to boot new kernel and then upgrade world is beyond me. YMMV

- I suppose that it is not strictly necessary to reboot between 
  installing kernel and world, but I always do so.  The reason for
  this is that, if something has gone horribly wrong, it is quite easy
  to go back and boot kernel.old. If you don't realize that there is
  something wrong until after you have installed everything (kernel and
  userland), it can be much more difficult to recover.
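
Put together, the sequence splits naturally into two stages around the
reboot. A rough sketch, assuming it is run from /usr/src with KERNCONF set
in /etc/make.conf as in the quoted setup; run, stage_before_reboot and
stage_after_reboot are made-up names. DRY_RUN defaults to 1, so the commands
are only printed unless it is set to 0.

```sh
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage 1: build everything against the new sources, install only the kernel.
# Reboot by hand afterwards; kernel.old is still there if the new one fails.
stage_before_reboot() {
    run make buildworld    || return 1
    run make buildkernel   || return 1
    run make installkernel || return 1
}

# Stage 2: after booting the new kernel, merge config and install the world.
stage_after_reboot() {
    run mergemaster -p     || return 1   # pre-install merge (new users/groups)
    run make installworld  || return 1
    run mergemaster        || return 1   # merge the remaining config files
}
```

Each stage stops at the first failing step, so a broken build never gets as
far as an install.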


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Problem with Adaptec 29320LPE

2008-11-24 Thread Greg Byshenk

On Mon, Nov 24, 2008 at 12:49:12PM +0100, Rink Springer wrote:
> Hi Greg,
> 
> On Mon, Nov 24, 2008 at 12:42:49PM +0100, Greg Byshenk wrote:
> > backuphost# camcontrol devlist
> > at scbus0 target 0 lun 0 (pass0,ch3)
> >at scbus0 target 1 lun 0 (sa3,pass1)
> > at scbus0 target 2 lun 0 (pass2,ch4)
> >at scbus0 target 3 lun 0 (sa4,pass3)
> > at scbus1 target 0 lun 0 (da0,pass4)
> > at scbus1 target 0 lun 1 (da1,pass5)

> Are these volumes perhaps >2TB ? If so, it won't work...  we stumbled on
> this at work a few weeks ago, and once we resized the volumes so that'd
> all be <2TB, the controller worked fine...
> 
> As far as I know, this is the only workaround - I couldn't see relevant
> patches in Open/NetBSD either that might have fixed this issue :-(
 
The volume da1 is indeed >2TB, but it is not connected to the controller;
it (along with da0) is actually a RAID-10 array connected to a 3Ware/AMCC 
SATA controller.  The Adaptec controller is used only for the tape drives
(the SDX-900V is AIT4; the SDX-1100 is AIT5), and they are <2TB.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Problem with Adaptec 29320LPE

2008-11-24 Thread Greg Byshenk

l: ch: warning: could not map element source address 43927d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 31239d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 31616d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 30983d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 30983d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 31239d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 31616d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 30215d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 25603d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 31239d to a valid element type
Nov 20 15:01:16 backuphost kernel: ch: warning: could not map element source address 31616d to a valid element type


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: System deadlock when using mksnap_ffs

2008-11-14 Thread Greg Byshenk

On Thu, Nov 13, 2008 at 05:08:10PM +0100, Greg Byshenk wrote:
> On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
>  
> > The rest of the below information is good -- but I'm confused about
> > something: is there anyone out there who can use mksnap_ffs on a
> > filesystem (/usr is a good test source) and NOT experience this
> > deadlocking problem?  Literally *every* FreeBSD box I have root access
> > to suffers from this problem, so I'm a little baffled why we end-users
> > need to keep providing debugging output when it should be easy as pie
> > for a developer to do "dump -0 -L -a -f /path/fs.dump /usr" and watch
> > their system wedge.
> 
> As an answer to the question (and additional information), I am 
> experiencing the problem, but not on all filesystems. 
> 
> This is under FreeBSD 7.1-PRERELEASE #7: Thu Nov  6 11:29:52 CET 2008,
> amd64 (from sources csup'ed immediately prior to the build).
> 
> I have four filesystems used for data storage:
> 
> /dev/da1p1    96850470   7866026   81236408     9%    /export/mail
> /dev/da1p2  1937058312 972070320  810023328    55%    /export/home
> /dev/da1p3  1937058312  79027008 1703066640     4%    /export/misc
> /dev/da1p4  2598991534 271980564 2119091648    11%    /export/spare
> 
> I can successfully mksnap_ffs the first (smaller) partition, but an
> attempt to do so on any of the others causes a lock.
> 
> Note: this is a lockup, not a "slow".  The system becomes unresponsive
> to any input, and there is no hard drive activity, and this does not
> change over a period of more than 12 hours.


As a followup to my own post, after reading this discussion, I applied
the patches and rebuilt my system last night.

As of today, with the patched ffs_snapshot.c, I can now make snapshots
of all the filesystems listed above.  It takes rather a long time, but
that is to be expected, I think, and the snapshots finish normally.


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: System deadlock when using mksnap_ffs

2008-11-13 Thread Greg Byshenk

On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:

> The rest of the below information is good -- but I'm confused about
> something: is there anyone out there who can use mksnap_ffs on a
> filesystem (/usr is a good test source) and NOT experience this
> deadlocking problem?  Literally *every* FreeBSD box I have root access
> to suffers from this problem, so I'm a little baffled why we end-users
> need to keep providing debugging output when it should be easy as pie
> for a developer to do "dump -0 -L -a -f /path/fs.dump /usr" and watch
> their system wedge.

As an answer to the question (and additional information), I am 
experiencing the problem, but not on all filesystems. 

This is under FreeBSD 7.1-PRERELEASE #7: Thu Nov  6 11:29:52 CET 2008,
amd64 (from sources csup'ed immediately prior to the build).

I have four filesystems used for data storage:

/dev/da1p1    96850470   7866026   81236408     9%    /export/mail
/dev/da1p2  1937058312 972070320  810023328    55%    /export/home
/dev/da1p3  1937058312  79027008 1703066640     4%    /export/misc
/dev/da1p4  2598991534 271980564 2119091648    11%    /export/spare

I can successfully mksnap_ffs the first (smaller) partition, but an
attempt to do so on any of the others causes a lock.

Note: this is a lockup, not a "slow".  The system becomes unresponsive
to any input, and there is no hard drive activity, and this does not
change over a period of more than 12 hours.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: challenge: end of life for 6.2 is premature with buggy 6.3

2008-06-08 Thread Greg Byshenk

On Sat, Jun 07, 2008 at 03:11:42PM -0700, Jo Rhett wrote:
> On Jun 7, 2008, at 1:44 PM, Patrick M. Hausen wrote:

> >>This is why EoLing 6.2 and forcing people to upgrade to a release
> >>with lots of known issues is a problem.

> >People who have issues with RELENG_6_3 should upgrade to RELENG_6
> >which is perfectly supported.

> I'm sorry, but you clearly don't run RELENG_6 on anything.  I run it  
> on two home computers, and grabbing it on any given day and trying to  
> run with it in production is insanity.  Lots and lots of things are  
> committed, reverted, recommitted, reverted and then finally  
> redesigned.  Each of those steps are often committed to the source  
> tree.  The -RELEASE versions prevent this kind of insanity.

I can't speak for Patrick, but I can add that I very definitely _do_
run RELENG_6 on ~40 machines (web, mail, file, and applications
servers), and do so without any serious problems. Which is not to say
that there are never problems, but that when there have been problems,
they have been uncovered during testing.

Of course it is true that "grabbing" something and "trying to run
with it in production is insanity". But this (at least IMO) has
nothing at all to do with RELENG_X _per_ _se_, as it applies equally
to X-RELEASE, and also to any production systems running any other OS.
Before we roll out a new RELENG_6 build, we test it first to discover
any potential problems -- but this is standard practice for
_everything_ that goes into production, including changes to Linux,
Solaris, and Windows systems, and also changes to samba, apache, or any
other software running on the systems.  My point here is that it is the
"grabbing" something and throwing it into production without testing
that is "insanity", and that this has nothing specifically to do with
RELENG_6.

I might also add that I have machines that "grab" (actually, pretty 
much randomly -- that is, "on a given day" and without particular 
concern from me) RELENG_6 and RELENG_7, and even these machines very
rarely exhibit any problems. Of course, these are just test machines,
and without the full pre-production testing it is possible that there
are some problems in these cases that just don't manifest themselves,
but my experience (and, I suspect, that of many others) indicates that
your description of RELENG_6 as a seething cauldron of uncertainty is
inaccurate. 

> I'm struggling to find a phrase here that can't be taken to be an  
> insult, so forgive me and try to understand when I say that you really  
> should try watching the cvs tree for a bit before making a nonsense  
> comment like that.

You don't seem to have struggled very hard. After all, you could have
made the same point by noting that you consider it a mistake to run
RELENG_6 in production. And by not doing this, you have undermined
your own position, as it seems clear that there are _many_ people and
organizations who run RELENG_6 in production (by which I mean, some
version of RELENG_6, and not the tracking of daily changes to RELENG_6),
which means that your assertion that such is "nonsense" is itself
mistaken.

Somewhat more generally, this sort of thing may be why you are getting
the amount of push-back you see. That is, what you are claiming seems
to match the experience of few (if any) others.  As you may have
noticed from this thread, the general view (a consensus, seemingly,
apart from yourself) is that 6.3 is _better_ (more stable, etc.) than
6.2.  Given that such is the case (as it seems very much to be), then
the response "what exactly is wrong?" to your statement that 6.3 isn't
good enough seems (at least to me) to be entirely reasonable.
When one of my people comes to me and says that something is wrong with
X (and particularly when my experience is that there is nothing wrong
with X), my first response is almost invariably:  "what, specifically,
is wrong with X?"

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: challenge: end of life for 6.2 is premature with buggy 6.3

2008-06-04 Thread Greg Byshenk

On Wed, Jun 04, 2008 at 04:41:45PM -0500, Kevin Kinsey wrote:
> Clifton Royston wrote:

> >  For example, if I take a 6.3R CD, or build one for 6-RELENG, is there
> >a way to do an "upgrade in place" on each server?  Or would it work
> >better to do a build from recent source on the development server, then
> >export /usr/src and /usr/obj via NFS to the production servers and do
> >the usual "make installkernel; reboot;" etc. sequence on them?  (In my
> >case I do have all machines on one GigE switch.)

> I've heard of the latter being done with decent results.

I can't say that it is "better", but I do the latter (well, actually I
build on a test machine to make sure there are no problems, then sync
to an NFS server and mount src and object from there, followed by
installkernel-reboot-installworld-merge-reboot) on a number of different
machines (currently running 6.3-STABLE of 2008-05-22 and 7.0-STABLE of
2008-05-27), and it is certainly faster and easier than doing a build
on each individual machine.
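
As a rough sketch only, the client side of that sequence might look
something like the following.  The host name "buildhost" and the export
paths are made-up placeholders, and the function just prints the steps
(the reboots have to happen by hand in any case).  With a custom kernel
you would also pass KERNCONF to installkernel.

```shell
#!/bin/sh
# Sketch only: print the client-side steps for installing a world/kernel
# that was built elsewhere and is exported over NFS.
# "buildhost" and the export paths are hypothetical placeholders.
nfs_upgrade_steps() {
    server="${1:-buildhost}"
    cat <<EOF
mount -t nfs ${server}:/usr/src /usr/src
mount -t nfs ${server}:/usr/obj /usr/obj
cd /usr/src && make installkernel
# reboot, then mount /usr/src and /usr/obj again
cd /usr/src && make installworld
mergemaster
# reboot
EOF
}

nfs_upgrade_steps "$@"
```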

I do the same thing with ports, doing a 'portupgrade -p' on the build
machine followed by a 'portupgrade -P' on the "clients" (building
packages on the build machine, and then installing via my own packages
on the others).  Again, I can't say that it is "better", but it is
certainly faster and easier.
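
For the ports side, a minimal sketch of the two invocations.  The port
name is only an example, and this assumes the clients can see the
package directory written on the build machine (for instance via an NFS
mount); the function only prints the command for each role.

```shell
#!/bin/sh
# Sketch: print the portupgrade command for each role.
#   build  -> -p: build from source and also create a package
#   client -> -P: install from an existing package where one is available
portupgrade_cmd() {
    role="${1:-}"
    [ $# -gt 0 ] && shift
    case "$role" in
        build)  echo "portupgrade -p $*" ;;
        client) echo "portupgrade -P $*" ;;
        *)      echo "usage: portupgrade_cmd build|client port ..." >&2
                return 1 ;;
    esac
}

portupgrade_cmd build samba     # on the build machine
portupgrade_cmd client samba    # on each "client"
```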

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: possible zfs bug? lost all pools

2008-05-18 Thread Greg Byshenk

On Sun, May 18, 2008 at 09:56:17AM -0300, JoaoBR wrote:
 
> after trying to mount my zfs pools in single user mode I got the following 
> message for each:
> 
> May 18 09:09:36 gw kernel: ZFS: WARNING: pool 'cache1' could not be loaded as 
> it was last accessed by another system (host: gw.bb1.matik.com.br hostid: 
> 0xbefb4a0f).  See: http://www.sun.com/msg/ZFS-8000-EY
> 
> any zpool cmd returned nothing else as not existing zfs, seems the zfs info 
> on 
> disks was gone
> 
> to double-check I recreated them, rebooted in single user mode and repeated 
> the story, same thing, trying to /etc/rc.d/zfs start returnes the above msg 
> and pools are gone ...
> 
> I guess this is kind of wrong 


I think that the problem is related to the absence of a hostid when in
single-user.  Try running '/etc/rc.d/hostid start' before mounting.

http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075001.html
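
Something along these lines, as an untested sketch.  The "run" wrapper
is just a dry-run helper introduced here, so nothing is executed unless
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are only printed.  Set DRY_RUN=0 to
# execute them for real (as root, in single-user mode).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

single_user_zfs() {
    run /etc/rc.d/hostid start   # set the hostid first
    run /etc/rc.d/zfs start      # then mount the ZFS file systems
    run zpool status             # check that the pools are visible again
}

single_user_zfs
```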


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: samba build failure on 6-STABLE

2008-05-03 Thread Greg Byshenk

On Sat, May 03, 2008 at 11:42:14AM +0100, Doug Rabson wrote:
> On 1 May 2008, at 15:39, Michael Proto wrote:
> >Greg Byshenk wrote:

> >>[...] Basically my problem is that the current Samba3 (samba-3.0.28,1)  
> >>won't build on a recent 6-STABLE system (I noticed it with sources
> >>csup'd 24 April, and it continues with sources csup'd today, 1 May).
> >>The strange thing is that this is a version of samba that has
> >>previously built successfully,  on the machine and with the
> >>configuration that is now failing.  (I was  attempting
> >>to rebuild because I saw some strange library errors.)  This at least
> >>suggests to me that the problem is _not_ due to something changing  
> >>with Samba, but to some other change that is being reflected in the
> >>Samba build.  [...]

> >I can confirm this on a 6-STABLE system last SUPed (kernel and world
> >rebuilt) to 20080428 11:23 EDT. samba-3.0.28,1 built fine on this box
> >when it was 6.3-RELEASE, and now fails in exactly the same place when
> >trying to rebuild on 6-STABLE.
 
> The attached patch should fix the problem.

It appears that it does.

I've applied the patch on a test machine, and Samba now builds successfully.
I've also done a reinstall of Samba, and the rebuilt version appears to be
working properly (though I have not yet done any extensive testing).

Thanks,
-greg

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

samba build failure on 6-STABLE

2008-05-01 Thread Greg Byshenk

I'm posting this to freebsd-stable even though it is a problem with a port,
because the port itself has not changed, but a rebuild fails (on a system
and with a configuration that worked before my most recent system updates).

Basically my problem is that the current Samba3 (samba-3.0.28,1) won't build
on a recent 6-STABLE system (I noticed it with sources csup'd 24 April, and
it continues with sources csup'd today, 1 May). The strange thing is that
this is a version of samba that has previously built successfully, on the
machine and with the configuration that is now failing.  (I was attempting
to rebuild because I saw some strange library errors.)  This at least
suggests to me that the problem is _not_ due to something changing with Samba,
but to some other change that is being reflected in the Samba build.


The system in question is built from sources csup'd today (1 May 2008), with
all installed ports current as of today.  The same Samba did build successfully
with a source and ports tree csup'd on 7 March 2008.

As a test to see if there is some problem with the ports dependencies, I've 
tried a 'portupgrade -fR samba'; all of the dependencies built fine, but then
I got the same error when attempting to build Samba itself. It is not
definitive, but this suggests to me that this is not a ports problem (per se),
but a kernel/world problem.

This latter is highlighted by the fact that Samba builds without error on a
system with sources csup'd on 17 April.  That is, if I take the exact same
system on which the build fails, revert my world/kernel to a build from
17 April (leaving everything else exactly the same), then the error 
disappears and Samba builds successfully.


The actual error is below. Any ideas are welcome. I have a machine that I can
play with if someone would like me to try anything.

-greg


Compiling smbd/oplock_linux.c
smbd/oplock_linux.c: In function `signal_handler':
smbd/oplock_linux.c:73: error: structure has no member named `si_fd'
The following command failed:
cc -I. -I/usr/ports/net/samba3/work/samba-3.0.28/source  -O2 
-fno-strict-aliasing -pipe -D_SAMBA_BUILD_=3 -I/usr/local/include  
-I/usr/ports/net/samba3/work/samba-3.0.28/source/iniparser/src -Iinclude 
-I./include  -I. -I. -I./lib/replace -I./lib/talloc -I./tdb/include 
-I./libaddns -I./librpc -DHAVE_CONFIG_H  -I/usr/local/include -DLDAP_DEPRECATED 
   -I/usr/ports/net/samba3/work/samba-3.0.28/source/lib -D_SAMBA_BUILD_=3 -fPIC 
-DPIC -c smbd/oplock_linux.c -o smbd/oplock_linux.o
*** Error code 1

Stop in /usr/ports/net/samba3/work/samba-3.0.28/source.
*** Error code 1

Stop in /usr/ports/net/samba3.
*** Error code 1

Stop in /usr/ports/net/samba3.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: Recent bootloaders not working also on FIC PA-2005 board

2008-04-09 Thread Greg Byshenk

On Wed, Apr 09, 2008 at 10:32:48PM +0200, Marcin Cieslak wrote:

> >It would be these changes.  Debugging this will be hard. :(  Are you 
> >familiar with x86 assembly at all?

In relation to John Baldwin's question, I (at least) have basically zero
knowledge of x86 assembler.  :-(

But this bit caught my eye:

> How can I try to debug this? I have tried to attach serial console
> with AT keyboard unplugged I still get message that VGA console will
> be used. The serial port is working correctly (verified with Windows and
> later with NetBSD).

> Can I get serial console while booting from CDROM - do I need to remove
> VGA card for this?

When my error occurs (with the Asus TR-DLS), I get the message about
using "internal" console (vga?), even when the machine is set to 
use a serial console.  I don't know if this is relevant, but in my
case I can't use serial.

I can also add that -- though I am not much of a programmer -- I will
happily test anything that anyone might suggest.  My machine is not
in production (I built it to do some testing with FreeBSD7 and ZFS),
and I can break it without any real consequences.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

7-STABLE bootloader not working on Asus TR-DLS

2008-04-08 Thread Greg Byshenk

I'm piggybacking this onto the previous bootloader thread because I have
a suspicion that my problem may be related to the 'fix' for the previous
problem.

I've got a machine (old-ish) that will not boot with the changes to
src/sys/boot/i386 in March.

It is a dual-p3 system running on an Asus tr-dls motherboard (with the
most recent BIOS updates, which date from 2002):

   Timecounter "i8254" frequency 1193182 Hz quality 0
   CPU: Intel(R) Pentium(R) III CPU family  1266MHz (1266.72-MHz 686-class CPU)
     Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
     Features=0x383fbff
   real memory  = 2147463168 (2047 MB)
   avail memory = 2091913216 (1995 MB)
   ACPI APIC Table:
   FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
    cpu0 (BSP): APIC ID:  3
    cpu1 (AP): APIC ID:  0

When I install the most recent world (for example, a build of 7-STABLE from
01-04-2008), it simply fails to boot.  No panic, no crash, but just stops.

I get to:

   [...]
   BTX loader 1.00  BTX version is 1.02
   Consoles: internal video/keyboard
   BIOS drive A: is disk0

   ... and then nothing ... just hangs permanently

If I change back to 7-RELEASE, or to 7-STABLE as of 18-03-2008, there is 
no problem at all. If I run the system with 01-04-2008 world, but copy
back in the contents of /boot from 18-03-2008, then there is again no
problem. I can copy in the 01-04-2008 kernel and run under that, and there
is no problem (it is running like that now).  But I have to use the old
version of the bootloader.
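
For anyone wanting to try the same thing, a sketch of keeping a copy of
/boot around before installing a new world, so that the old loader can
be put back afterwards.  The backup location is an arbitrary choice
made for this example, and the "run" wrapper only prints the command
unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are only printed.  Set DRY_RUN=0 to
# execute them for real (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

save_boot() {
    # $1: backup directory; the dated default is a hypothetical choice.
    dest="${1:-/boot.saved-$(date +%Y%m%d)}"
    run cp -Rp /boot "$dest"
}

save_boot "$@"
```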

I'm not a coder, and haven't looked more deeply, but it appears that 
something in here:

   i386/src/sys/boot/i386/btx/btx/Makefile
   i386/src/sys/boot/i386/btx/btx/btx.S
   i386/src/sys/boot/i386/libi386/biosmem.c
   i386/src/sys/boot/i386/libi386/biossmap.c

...has broken booting on this machine.


Any advice gladly accepted.


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: Cannot mount a nfs share after doing a snapshot

2008-01-07 Thread Greg Byshenk

On Sun, Jan 06, 2008 at 05:38:30PM +0100, Jose Garcia Juanino wrote:
> On Sunday, 6 January at 15:41:21 CET, Greg Byshenk wrote:
> > On Sat, Jan 05, 2008 at 11:28:31PM +0100, Jose Garcia Juanino wrote:
> >  
> > > I have a 7.0-PRERELEASE i386 system with a nfs server, with an unique 
> > > export
> > > line in /etc/exports file:
> > > 
> > > / -maproot=root -network 192.168.1.0 -mask 255.255.255.0
> > > 
> > > After a reboot, I have no problem mounting this nfs share from a nfs 
> > > client.
> > > But after issuing the following command on the server:
> > > 
> > > # mount -u -o snapshot /.snap/now /

> > Is the problem that you are trying to mount your snapshot on top of the /
> > directory?  I use snapshots, but have never tried to do this, and can 
> > imagine that there might be a problem, since the snapshot is itself a
> > snapshot of a filesystem (different than the actual root filesystem).
> > 
> > That would explain the error:

> > > Jan  5 22:47:03 gauss mountd[542]: can't delete exports for /: 
> > > Cross-device link

> No, I am not trying to mount the snapshot. I am just taking (making) the
> snapshot, as man mount says.

Sorry, I wasn't following this (as I said, I don't work with snapshots in
this way).

I've looked at the 'mount' man page, and it seems that it should work the
way you are trying to do it. That said, because taking a snapshot grabs
the entirety of a filesystem, I can well imagine that trying to take a 
snapshot of the root filesystem while at the same time exporting that
filesystem via NFS will cause a problem.

> > What happens if you create a directory and mount your snapshot there:
> > 
> > mkdir /snapshotmount
> > mount -u -o snapshot /.snap/now /snapshotmount
> >
> > If this works, then you may need a separate exports line for /snapshotmount.

> # file /.snap/now
> /.snap/now: Unix Fast File system [v2] (little-endian) last mounted on
> /, last written at Sun Jan  6 16:24:19 2008, clean flag 1, readonly flag
> 1, number of blocks 130721, number of data blocks 126520, number of
> cylinder groups 4, block size 16384, fragment size 2048, average file
> size 16384, average number of files in dir 64, pending blocks to free 0,
> pending inodes to free 0, system-wide uuid 0, minimum percentage of free
> blocks 8, TIME optimization

Ok, so it looks like your /.snap/now snapshot actually exists, and is being
made, so it looks like the command

# mount -u -o snapshot /.snap/now /

is actually working. (So ignore the rest of what I said last time...)

I've just played with this a bit myself (I'm no expert, but I use snapshots
currently with 6-STABLE and want to know about any future problems), and I
can reproduce the problem (7.0-PRERELEASE as of 2 Jan 2008). I see the same
sort of errors as you report, and they cannot be cleared even by removing
the snapshot file and restarting nfsd/mountd. The only solution appears to
be to remove the snapshot and restart the machine. I can see how this might
be a bit inconvenient.

That said, there appears to be a problem with using the 

# mount -u -o snapshot  

form of the command.

The problem does _not_ occur (at least in my test) if you use the

# mksnap_ffs  

command. Can you try taking a snapshot using mksnap_ffs?

If mksnap_ffs works, while 'mount -u -o' fails, then it looks like a bug...
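
A sketch of the test I have in mind, using the paths from your example.
The argument order shown is the one I believe the 6.x/7.x man page
gives (mountpoint, then snapshot name); check mksnap_ffs(8) on your
system, as this is from memory.  The "run" wrapper only prints the
commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are only printed.  Set DRY_RUN=0 to
# execute them for real (as root, on the NFS server).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

take_snapshot() {
    fs="${1:-/}"
    snap="${2:-/.snap/now}"
    # Assumed 6.x/7.x form: mksnap_ffs mountpoint snapshot_name.
    # Other releases may differ; check mksnap_ffs(8).
    run mksnap_ffs "$fs" "$snap"
    run showmount -e localhost    # are the exports still listed?
}

take_snapshot "$@"
```

If the exports are still listed afterwards and a client can mount the
share, that would point at the 'mount -u -o snapshot' path specifically.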

-greg

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: Cannot mount a nfs share after doing a snapshot

2008-01-06 Thread Greg Byshenk

reebsd7.0/3.4.6 /usr/local/lib/pth 
> /usr/local/lib/zsh
> a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout
> Creating and/or trimming log files:
> .
> Starting syslogd.
> Initial i386 initialization:
> .
> Additional ABI support:
> .
> Setting date via ntp.
>  5 Jan 22:41:49 ntpdate[496]: step time server 212.9.75.245 offset 0.950467 
> sec
> Starting rpcbind.
> NFS access cache time=60
> Clearing /tmp (X related).
> Starting mountd.
> Starting nfsd.
> Starting statd.
> Starting lockd.
> Starting xinetd.
> Removing stale Samba tdb files: 
> .
> .
> .
> .
> .
> .
> .
> .
>  done
> Starting nmbd.
> Starting smbd.
> Starting local daemons:
> .
> Updating motd
> .
> Mounting late file systems:
> .
> Starting ntpd.
> postfix/postfix-script: starting the Postfix mail system
> Starting distccd.
> Performing sanity check on apache22 configuration:
> Syntax OK
> Starting apache22.
> Starting anacron.
> Configuring syscons:
>  keymap
>  keyrate
>  font8x16
>  font8x14
>  font8x8
>  blanktime
> .
> Starting sshd.
> Starting cron.
> Local package initialization:
> #
> 
> 
> 
> Also, my /etc/src.conf used to build the world:
> 
> 
> #
> WITHOUT_ACPI=1
> WITHOUT_ASSERT_DEBUG=1
> WITHOUT_ATM=1
> WITHOUT_AUDIT=1
> WITHOUT_AUTHPF=1
> WITHOUT_BIND_DNSSEC=1
> WITHOUT_BIND_ETC=1
> WITHOUT_BIND_LIBS_LWRES=1
> WITHOUT_BIND_MTREE=1
> WITHOUT_BIND_NAMED=1
> WITHOUT_BLUETOOTH=1
> WITHOUT_I4B=1
> WITHOUT_IPFILTER=1
> WITHOUT_IPX=1
> WITHOUT_KERBEROS=1
> WITHOUT_LPR=1
> WITHOUT_NIS=1
> WITHOUT_PF=1
> WITHOUT_PROFILE=1
> WITHOUT_SENDMAIL=1
> WITHOUT_SHAREDOCS=1
> #
> 
> 
> The /etc/make.conf file:
> 
> #
> CPUTYPE?=pentium3
> MODULES_OVERRIDE=   linux if_tap sound/driver/emu10k1  syscons/green \
> linprocfs linsysfs  smbfs ntfs ext2fs libiconv \
> libmchain aio if_bridge vesa \
> cd9660_iconv udf_iconv msdosfs_iconv ntfs_iconv \
> zfs bridgestp
> BOOT_COMCONSOLE_PORT=   0x3F8
> BOOT_COMCONSOLE_SPEED=  115200
> PERL_VER=5.8.8
> PERL_VERSION=5.8.8
> #
> 
> 
> 
> Regards



-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Greg Byshenk

On Wed, Jan 02, 2008 at 07:24:28PM -0400, Marc G. Fournier wrote:
> - --On Wednesday, January 02, 2008 22:54:33 + Tom Judge <[EMAIL 
> PROTECTED]> wrote:

> > Not sure if this is related at all but out of the 3 nagios deployments we
> > have here I have only ever seen it on one (It currently has 2 nagios threads
> > spinning CPU time atm).

> > The differences on that server are:
> >
> > * It is amd64 compared to i386

> I never tried on i386, but in my case it was an amd64 system as well ... not 
> sure if that is relevant or not ... has anyone seen this problem *with* i386?

Yes.

We run Nagios on an i386 machine (dual Athlon MP 1800+), and I first saw this
problem with a build of 6-STABLE as of 2007-10-04, and it continues (if I don't
use the libmap.conf settings) with the running system of 6.3-PRERELEASE as of
2007-12-18 and nagios-2.10 (from ports of same date).

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: can I do 6.1-RELEASE to 6.2 via cvsup

2007-10-24 Thread Greg Byshenk

On Wed, Oct 24, 2007 at 11:18:30AM -0400, Tuc at T-B-O-H.NET wrote:

> > > > Also, the list of things to do is a bit mis-ordered and truncated. The
> > > > official list is in /usr/src/UPDATING and reads:
> > > > 
> > > > 
> > > > make buildworld
> > > > make kernel KERNCONF=YOUR_KERNEL_HERE
> > > > [1]
> > > >  [3]
> > > > mergemaster -p  [5]
> > > > make installworld
> > > > make delete-old

>   Um, I went to go check the file on a 7.0-BETA1 I just installed and
> doing the ground up on.. And I just realized something...
> 
>   WHERE is the step to install the kernel?? I always thought it was :
> 
> make buildworld
> make kernel KERNCONF=YOUR_KERNEL_HERE
>   make installkernel KERNCONF=YOUR_KERNEL_HERE
>   [1]
> [3]


Pay attention to the make options (you can find them in /usr/src/Makefile).

'make kernel' is equivalent to 'make buildkernel + installkernel', just like
'make world' is equivalent to 'make buildworld + installworld'. The latter
can be dangerous, but the former usually isn't.

One process is:

[csup, etc.]
make buildworld
make buildkernel
make installkernel  [reboot single user]
[mergemaster -p if necessary]
make installworld   
mergemaster     [reboot]
[ports or other stuff]

If you wish, the 'make buildkernel' + 'make installkernel' can be replaced
with 'make kernel', which does them both in sequence with one command.
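
To make the equivalence concrete, a small sketch that prints the steps
in order, either with the separate build/install targets or with the
combined 'make kernel'.  The kernel config name is whatever you pass in
(MYKERNEL below is only an example); nothing is executed, since the
reboots have to be done by hand anyway.

```shell
#!/bin/sh
# Sketch: print the source-upgrade steps in order.
#   $1: kernel config name (defaults to GENERIC)
#   $2: "yes" to use the combined 'make kernel' target
upgrade_steps() {
    kernconf="${1:-GENERIC}"
    combined="${2:-no}"
    echo "cd /usr/src"
    echo "make buildworld"
    if [ "$combined" = "yes" ]; then
        echo "make kernel KERNCONF=${kernconf}"
    else
        echo "make buildkernel KERNCONF=${kernconf}"
        echo "make installkernel KERNCONF=${kernconf}"
    fi
    echo "# reboot to single user"
    echo "mergemaster -p    # if necessary"
    echo "make installworld"
    echo "mergemaster"
    echo "# reboot"
}

upgrade_steps "$@"
```

Running 'upgrade_steps MYKERNEL' and 'upgrade_steps MYKERNEL yes' shows
that the two lists differ only in the kernel lines.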


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: 7.0-BETA1

2007-10-24 Thread Greg Byshenk

On Wed, Oct 24, 2007 at 02:00:42PM +0500, rihad wrote:
> >>rihad wrote:

> >>>How risky is it to start using 7.0-BETA1 in production, with the 
> >>>intention of upgrading to release as soon as possible? Thanks.

> My question was more a theoretical one: it's called BETA for some 
> reason, otherwise it'd still be in HEAD. To me BETA means that no major 
> architectural changes are expected in it any more, no?

Yes, but it doesn't mean that there can't be undiscovered bugs that could
cause problems.

 
> Our machine-to-be is quite mission-critical... But if I start with the 
> latest 6.x release, it would be more difficult to migrate to 7.0 when it 
> comes out than if I start with 7.0-BETA?. I've known people running 
> 4-STABLE or 5-STABLE branches on mission-critical machines, without even 
> bothering to upgrade, but I think they're stress-testing their luck ;-) 
> So I don't want to join their camp, that's why I asked for advice ;-) 
> Again it's named BETA for a reason, so it could be less intrusive than 
> STABLE?..
 
> I will definitely start with beta if it reaches BETA2 in a week or two - 
> the time I got ;-) Thanks for advice.

Well, if it is a "machine-to-be", then I suspect that you should be safe
in starting with 7.0-BETA. First, there don't appear to be any serious
problems with it, and second, if it is a new build "machine-to-be", then
you will have the opportunity to do the testing required to ensure that
there are no problems (in your situation) prior to rollout.


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: 7.0-BETA1

2007-10-24 Thread Greg Byshenk

On Tue, Oct 23, 2007 at 11:08:27PM +0200, Per olof Ljungmark wrote:
> rihad wrote:

> >How risky is it to start using 7.0-BETA1 in production, with the 
> >intention of upgrading to release as soon as possible? Thanks.

> We've used 7-CURRENT since January on a couple of production boxes and 
> had very few disasters, well, none, but a couple of issues.

> "Risky" is a relative term really, but if you ask me I'd say the "risk" 
> is rather low.

> But: TEST FIRST!

I concur with Per.  I've been running 7-CURRENT on a couple of "production"
machines for some months, without any serious problems -- but these are not
mission-critical machines.

Risk is a relative thing, and it is relative to both the risk of failure and
the cost of that failure should it occur.  I have 7- running on one fileserver
that is used only by our IT group (for online copies of distfiles and other
installable software), meaning that if something should go horribly wrong, it
would be an annoyance, but not a disaster. The same could _not_ be said about
our central user fileservers, and so they do not run 7-.

I could also note that I've been running 7-CURRENT on my own workstation
(including X, but only fvwm2 and nothing too fancy) for about 6 months, and
have experienced no serious problems (though I have swapped out SCHED_4BSD
for SCHED_ULE due to poor interactivity with 4BSD).

And I also emphasise:  TEST FIRST!  My situation is not the same as yours,
and something that works fine in my environment may break horribly in yours.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: FreeBSD 6.x, NIS, local root password, and nsswitch.conf

2006-11-22 Thread Greg Byshenk

On Wed, Nov 22, 2006 at 10:49:01PM +0800, David Adam wrote:
> On Wed, 22 Nov 2006, Gerrit Kühn wrote:
> > On Wed, 22 Nov 2006 09:07:34 -0500 (EST) Mark Hennessy <[EMAIL PROTECTED]>

> > wrote about Re: FreeBSD 6.x, NIS, local root password, and nsswitch.conf:

> > MH> I'm a bit unsure about it myself.
> > MH> I tried exactly what you suggested, putting files on the compat line
> > MH> and before nis for both passwd and groups on the NIS slave server
> > MH> only, and no go.  Perhaps it is the master server that actually
> > MH> controls this? I don't know.  Any further advice would be greatly
> > MH> appreciated.

> > Sorry to disturb, but I don't understand why you distribute the server's
> > root pw via NIS at all. Is it really shown by "ypcat passwd" on the
> > client? If so, how about removing it from the list of exported accounts?
 
> That's a really good point. When you consider the inherent insecurity of
> NIS, having a root password in the maps is a pretty bad plan anyway.
 
> Given my vague handwaving at PAM, and the fact that the OP probably has
> NIS as sufficient above pam_unix, the obvious solution if my unverified
> assertions are correct is to remove the root password from the NIS maps.

I could be mistaken, but isn't the 'compat' entry to cover the case with
the old format passwd/group files, in which one used '+:...' or similar to
include NIS (or other authentication).  As such, 'compat' means "use the
file, plus whatever is added under 'compat'", further meaning that you 
can have only one entry under 'compat'.

So, if you want "old style" behavior, what you want is something like:

   passwd: compat
   passwd_compat: nis

Alternatively, you can use something like:

   passwd: files nis
   # passwd_compat: nis

or even:

   passwd: winbind nis files
   # passwd_compat: nis


[Corrections welcome if I have this wrong]


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Cruel and unusual problems with Proliant ML350

2006-11-13 Thread Greg Byshenk

On Mon, Nov 13, 2006 at 09:19:45AM -0800, Jeremy Chadwick wrote:

> I'll agree with this (re: webservers not needing USB), except in
> regards to one item: keyboards.
> 
> More and more x86 PCs these days are expecting keyboards to be
> USB-based.  Yes, PS/2 ports are still present on most (but not all)
> motherboards, but eventually that will be phased out.
> 
> I like the idea of being able to go to my co-location facility and
> plug in a USB keyboard to begin working on a server, and when
> finished remove the keyboard and leave.

Don't you really need to have a monitor, as well?  I _have_ worked
"blind" before, but I didn't enjoy it.  I can imagine having a 
keyboard with me when wandering around, but wouldn't normally have
a monitor.  I had always thought that the preferred solution for 
this sort of case was to use a serial console.

And what seems to be becoming common on servers is a BIOS that allows
you to fully redirect to serial, including BIOS configuration.  The
servers that I have recently purchased have had a keyboard and monitor
plugged into them _once_ -- for the first BIOS setup -- and then never
again.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: em driver testing

2006-11-07 Thread Greg Byshenk

On Mon, Nov 06, 2006 at 04:14:40PM -0800, Jack Vogel wrote:
> Well, so run 6.2 BETA3 plus the patch I posted as Patrick
> mentioned and then report on that. You've got a lot of
> potential problem areas here, I have no experience with
> samba on FreeBSD. And that motherboard only has PCI
> as I recall, yes? Still, it should get rid of the watchdogs
> unless you have real hardware issues.

As a point of information, I don't think that samba specifically has
anything to do with the problem.

I am running samba on FreeBSD, and have two servers that are rather
heavily used (one is the filestore for a CFD cluster, and the other
for a Maya/Muster rendering cluster), each having two em interfaces
and SMP -- and have not seen any watchdog issues (they are currently
running FreeBSD 6.2-PRERELEASE as of Oct  7 -- but no problems with
any earlier 6.1-STABLE versions either).

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: probs on 6.2-prerelease

2006-09-25 Thread Greg Byshenk

On Mon, Sep 25, 2006 at 09:08:26AM -0400, Michael Proto wrote:
> Michael Vince wrote:

> > I don't know if this is pre 6.2 specific but I changed my /etc/ttys for
> > device ttyd0 to 'on' from 'off' and when I rebooted the pc I couldn't
> > login via regular KVM console, just don't get a login.
> > The more alarming thing was that while it appeared everything was
> > booting up from the boot up messages on the screen, I couldn't remotely
> > log into the server in fact it appears the machine didn't bring up the
> > Ethernet device as I couldn't even ping it.
> > As soon as I switched the ttyd0 back to 'off' and rebooted it I could
> > ssh back into the server etc.
> > I have a regular kernel and 1 jail and samba on this machine.
 
> I know this isn't a "yes I'm having problems" response but thought it
> might be useful anyway.
 
> I'm running 6.2-pre on a Soekris Engineering Net4501 with ttyd0 enabled
> in /etc/ttys and I'm not having any problems with the system booting or
> logging in via serial console. SSH logins work fine and the network is
> brought-up as normal during boot. I've had this system in the same
> config (in regards to /etc/ttys) since the 6 was still the HEAD branch
> and I have yet to see problems with it. One difference here is that I
> don't have any virtual consoles enabled BUT ttyd0 (and
> pseudo-terminals), as this box doesn't have a video card, just a serial
> port.

I can also report no problems running 6.2-pre on i686.  I am running it
on several machines, using serial consoles; these are machines _with_
video cards, though the cards are mostly unused (one machine has a KVM
connected, and it works fine as well).  No problems with video, no
problems with network, no problems with ssh login, etc.

-greg


FreeBSD xxx.xxx.com 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #21: Tue Sep 19 
19:37:00 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/  i386

/etc/ttys:
[...]
ttyd0   "/usr/libexec/getty std.9600"   xterm   on  secure
[...]

/boot/loader.conf
[...]
console=comconsole
[...]


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: ARRRRGH! Guys, who's breaking -STABLE's GMIRROR code?!

2006-09-16 Thread Greg Byshenk

On Fri, Sep 15, 2006 at 03:41:04PM -0300, Marc G. Fournier wrote:

> But, I'm just curious here ... for all of the talk going around about this 
> whole issue, how many ppl have truly ever been bitten by an unstable 
> -STABLE?  And for those that have, how long did it take to get help from a 
> developer to get it fixed?

I run -STABLE on a number of production machines.

I have twice been "bitten by an unstable -STABLE" -- but "bitten" in a 
very small way.

When we build a new -STABLE (on average perhaps once per month), we
build it on a test machine, so that we can be sure that it actually
works. Once it is tested and we know it works, then we can roll it out
to the production machines without undue concern.

I note that we follow the same process with our Linux machines, our
Irix machines, and our Windows machines.  Blindly rolling out updates
or patches to critical production machines is unwise and dangerous (at
least IMO).

I will add that I have never even needed to contact a maintainer.
When there has been a problem, I checked the lists.  In one case the
fix was already committed, in the other there was already an "I'm
working on it" message and a fix was committed in less than 24 hours.
In the interim, my test machine had a problem -- but that's what a 
test machine is for.

> In the case that started this thread, it seems to be that the developer 
> fixed his mistake fairly quickly, which is what one would expect ... it 
> shouldn't be so much that he *broke* -STABLE (shit happens, do you want 
> your money back?), but it should be 'was he around to reverse his mistake 
> in a reasonable amount of time?' ... ?

Exactly.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: NFS locking: lockf freezes (rpc.lockd problem?)

2006-08-31 Thread Greg Byshenk

On Tue, Aug 29, 2006 at 05:05:26PM +, Michael Abbott wrote:

[I wrote]
> >>>An alternative would be to update to RELENG_6 (or at least RELENG_6_1)
> >>>and then try again.

> So.  I have done this.  And I can't reproduce the problem.

> # uname -a
> FreeBSD venus.araneidae.co.uk 6.1-STABLE FreeBSD 6.1-STABLE #1: Mon Aug 28 
> 18:32:17 UTC 2006 
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386

> Hmm.  Hopefully this is a *good* thing, ie, the problem really has been 
> fixed, rather than just going into hiding.

> So, as far as I can tell, lockf works properly in this release.

Just as an interesting side note, I just experienced rpc.lockd crashing.
The server is not running RELENG_6, but RELENG_5 (FreeBSD 5.5-STABLE
#15: Thu Aug 24 18:47:20 CEST 2006).  Due to user error, someone ended
up with over 1000 processes trying to lock the same NFS mounted file at
the same time.  The result was over 1000 "Cannot allocate memory" errors
followed by rpc.lockd crashing.

I guess the server is telling me it wants an update...

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: NFS locking: lockf freezes (rpc.lockd problem?)

2006-08-27 Thread Greg Byshenk

On Sun, Aug 27, 2006 at 07:17:34PM +, Michael Abbott wrote:
> On Sun, 27 Aug 2006, Kostik Belousov wrote:

> >Make sure that rpc.statd is running.
> Yep.  Took me some while to figure that one out, but the first lockf test 
> failed without that.

[...]

> As for the other test, let's have a look.  Here we are before the test 
> (NFS server, 4.11, is saturn, test machine, 6.1, is venus):

> saturn$ ps auxww | grep rpc\\.
> root    48917  0.0  0.1    980  640  ??  Is    7:56am   0:00.01 rpc.lockd
> root      115  0.0  0.1 263096  536  ??  Is   18Aug06   0:00.00 rpc.statd

[...]

> Well, how odd: as soon as I start the test process 515 on venus goes away. 
> Now to wait for it to fail... (doesn't take too long):

[...] 

> In conclusion: I agree with Greg Byshenk that the NFS server is bound to 
> be the one at fault, BUT, is this "freeze until reboot" behaviour really 
> what we want?  I remain astonished (and irritated) that `kill -9` doesn't 
> work!

The problem here is that the process is waiting for something, and
thus not listening to signals (including your 'kill').

I'm not an expert on this, but my first guess would be that saturn (your
server) is offering something that it can't deliver.  That is, the client
asks the server "can you do X?", and the server says "yes I can", so the
client says "do X" and waits -- and the server never does it.

Or alternatively (based on your rpc.statd dying), rpc.lockd on your
client is trying to use rpc.statd to communicate with your server.  And
it starts successfully, but then rpc.statd dies (for some reason) and
your lock ends up waiting forever for it to answer.

I would recommend starting both rpc.lockd and rpc.statd with the '-d'
flag, to see if this provides any information as to what is going on.
There may well be a bug somewhere, but you need to find where it is.
I suspect that it is not actually in rpc.statd, as nothing in the
source has changed since January 2005.

An alternative would be to update to RELENG_6 (or at least RELENG_6_1)
and then try again.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: NFS locking: lockf freezes (rpc.lockd problem?)

2006-08-27 Thread Greg Byshenk

On Sun, Aug 27, 2006 at 11:24:13AM +, Michael Abbott wrote:
> I've been trying to make some sense of the "NFS locking" issue.  I am 
> trying to run
>   # make installworld DESTDIR=/mnt
> where /mnt is an NFS mount on a FreeBSD 4.11 server, but I am unable to 
> get past a call to `lockf`.

I have not closely followed the discussion, as I have not experienced 
the problem.

I am currently running FreeBSD6 based fileservers in an environment that
includes FreeBSD, Linux (multiple flavors), Solaris, and Irix clients,
and have experienced no nfs locking issues (I have one occasional
problem with 64-bit Linux clients, but it is not locking related and
appears to be due to a 64-bit Linux problem).

Further, (though there may well be problems with nfs locking) I cannot
recreate the problem you described -- at least in a FreeBSD6 environment.

I have just performed a test of what you describe, using 'smbtest'
(6.1-STABLE #17: Fri Aug 25 12:25:19 CEST 2006) as the client and 
'data-2' (FreeBSD 6.1-STABLE #16: Wed Aug  9 15:38:12 CEST 2006) as the
server.

   data-2 # mkdir /export/rw/bsd6root/
   ## /export/rw is already exported via NFS
   smbtest # mount data-2:/export/rw/bsd6root /mnt
   smbtest # cd /usr/src
   smbtest # make installworld DESTDIR=/mnt
   [...]
   makewhatis /mnt/usr/share/man
   makewhatis /mnt/usr/share/openssl/man
   rm -rf /tmp/install.2INObZ3j
   smbtest #

Which is to say that it completed successfully.  Which suggests that there
is not a serious and ongoing problem.

There may well be a problem with FreeBSD4, but I no longer have any NFS
servers running FreeBSD4.x, so I cannot confirm.  Alternatively, there
may have been a problem in 6.1-RELEASE that has since been solved in
6.1-STABLE that I am using.  Or there could be a problem with the 
configuration of your server.  Or there could be something else going
on (in the network...?).

But to see what exactly is happening in your case, you would probably 
want to look at what exactly is happening on the client, the server, and
the network between them.
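
One way to look at the client side without running a full installworld
is to take and release a single lock on the NFS mount by hand.  The
following is only a minimal sketch: the function name try_lock and its
timeout and interval parameters are my own invention, and it uses
Python's fcntl.lockf (fcntl(2)-style locks), which is not necessarily
the same call that lockf(1) makes during installworld.

```python
import errno
import fcntl
import os
import time


def try_lock(path, timeout=5.0, interval=0.1):
    """Try to take an exclusive advisory lock on path.

    Returns True if the lock was acquired (it is released again
    immediately), or False if another process still holds it when
    the timeout expires.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                # Non-blocking request: fails at once if the lock is held.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno not in (errno.EAGAIN, errno.EACCES):
                    raise  # a real error, not just "lock is busy"
                if time.monotonic() >= deadline:
                    return False
                time.sleep(interval)
            else:
                fcntl.lockf(fd, fcntl.LOCK_UN)
                return True
    finally:
        os.close(fd)
```

Calling try_lock("/mnt/locktest") on the client (the file name is just
an example) should return quickly on a healthy setup.  If even this
non-blocking request never returns, that would suggest the client is
stuck waiting on the lock daemons rather than on anything specific to
installworld.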

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: 5.5 to 6.1 upgrade

2006-08-23 Thread Greg Byshenk

On Tue, Aug 22, 2006 at 12:23:00PM -0700, Chuck Swiger wrote:

> In practice, however, pretty much all software nowadays depends on  
> shared libraries, so it's reasonable to do a "pkg_delete -a" after  
> upgrading to a new major version of FreeBSD, and then reinstall all  
> of the ports you use once you've finished upgrading.  Run "pkg_info"  
> before the upgrade and keep track of this output to help you remember  
> what ports you've got installed...

As a possible point of clarification, my comments earlier (and, I
suspect similar comments of others) were not meant to imply that one
should not rebuild ports after a major upgrade, but only that one need
not do so _before_ upgrading.

[...probably ... it worked for me ... YMMV ... if it is a critical
package, then it wouldn't hurt to rebuild it first ... usw.]
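
For the "keep track of this output" part, a saved pkg_info listing from
before the upgrade can be compared with one taken afterwards.  This is
just a rough sketch, assuming the usual "name-version  comment" layout
of plain pkg_info output; the function names are made up for the
example.

```python
def package_names(pkg_info_text):
    """Return the set of package names (version stripped) from pkg_info output."""
    names = set()
    for line in pkg_info_text.splitlines():
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        # "name-1.2.3_1" -> "name": the version follows the last hyphen.
        name, _, _version = fields[0].rpartition("-")
        names.add(name or fields[0])
    return names


def missing_after_upgrade(before_text, after_text):
    """List packages present before the upgrade but not afterwards."""
    return sorted(package_names(before_text) - package_names(after_text))
```

Feeding it the contents of the two saved listings gives the ports that
still need to be reinstalled, regardless of which versions end up
installed.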

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: 5.5 to 6.1 upgrade

2006-08-21 Thread Greg Byshenk

On Mon, Aug 21, 2006 at 11:52:02PM +0200, Stefan Bethke wrote:
> Am 21.08.2006 um 18:19 schrieb Ian Smith:

> >I recently (without drama) upgraded a 5.4-RELEASE system to
> >FreeBSD 5.5-STABLE #1: Tue Aug  1 11:11:20 EST 2006
> >for 'target practice' at least, on the way to 6.1-STABLE

> >I was preparing to portupgrade everything next, when I wondered:

> >a) should I upgrade from RELENG_5 straight to RELENG_6 or should I be
> >stopping off at 6.1-RELEASE along the way first?  and

> I'd go straight to 6-stable. Make sure you have a good backup, even  
> if you stop over at 6.1.

I see no reason not to go directly to 6-stable (if that is what you plan
to run); I've done it with multiple machines, and just jump right to the
6-stable version that is active on the machines running 6.x.

Though I've had no problems, I second the recommendation to have a good
backup.  Also, if you don't have a known-good 6-stable build, you might
want to upgrade to the GENERIC kernel.

> >b) do I need to upgrade all existing ports (way out of date) before  
> >the source upgrade, or can I be confident of doing that from 6.1
> >(-R or -S)?

> >FWIW: a wee Celeron 300, so minimising upgrade build times is  
> >desirable.

> Unless you have business critical apps running (downtime must be  
> minimal), you can wait until you've completed the upgrade to 6- 
> stable, and then run portupgrade -af.  If you'd like to run the  
> portupgrade overnight, you might want to define BATCH, and possibly  
> set any port building options in /usr/local/etc/pkgtools.conf,  
> otherwise, the port builds will be frequently interrupted by make  
> config questions.

It shouldn't be necessary to rebuild ports before the upgrade.  If
there is something running that is critical, you might want to upgrade
it first, just to be sure, but it probably isn't necessary.  I upgraded a
workstation with 200+ ports installed, and saw no problems (I can't
say for certain that nothing was broken before I upgraded the ports, but
I experienced no problems).

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Greg Byshenk

On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
> Am 20.08.2006 um 18:20 schrieb Greg Byshenk:

> >What is different is that this was with a 3Ware RAID controller --
> >which made removing/reconfiguring/rebuilding much easier -- but I was
> >seeing the exact same errors.

> No your errors are not related. As of my experience (and the  
> experience of others) the controller forgetting or loosing drives is  
> a "feature" 3ware.
> We had similar problems with 3ware-7500-8 ATA controllers and i was  
> reported of the same errors with 3ware-9000 series. Our in-house  
> 3ware-9500S are not showing this kind of errors.

> This errors are not driver or OS dependent such as they appear on  
> FreeBSD as well on different Linux distros.
> Since not all controllers suffering of these errors it is maybe  
> depending on the firmware or board/chip revisions.

I hesitate to make too strong a statement on this matter, as I have
not done any deep investigation, however...

The explanation above does not appear consistent with my experience.
I am now using (and have used over the past several years) a number
of different 3Ware controllers (7000, 8000, and 9000 series) and have
not previously seen this problem.  Of course I have had drives fail
-- and in one case one port of one controller simply stopped working
-- but never this particular problem.

Further, the very same controller that demonstrated problems (in the
numerically identical server, performing the exact same jobs), had
not demonstrated this problem (over a period of more than six months)
until I installed the June 6.1 STABLE, after which the problem appeared
consistently, until installing the July 6.1 STABLE, at which point the
problem disappeared, and has not occurred since (despite my trying very
hard to make it do so).

It may well be that there is some bug in the 3Ware controllers, but 
my experience suggests that there is/was something else going on.  At
the very least, it suggests that there was something about the June
6.1 STABLE (but not the earlier or later versions) that was triggering
a 3Ware bug -- as my problems occurred only when running the June
6.1 STABLE, and that was the _only_ difference between the cases of
having problems and those of not having problems.

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: Motherboard RAID problem

2006-08-20 Thread Greg Byshenk

On Sun, Aug 20, 2006 at 08:22:48PM +0200, Roland Smith wrote:
> On Sun, Aug 20, 2006 at 07:55:47PM +0200, Greg Byshenk wrote:
> > On Sun, Aug 20, 2006 at 07:38:28PM +0200, Roland Smith wrote:

> > > If FreeBSD supports the device, you should see an ar0 device.

> > > Do you have the ataraid(4) driver loaded, or built into your kernel?

> > Alternatively, are you sure you have identified your hardware correctly?

> > According to Supermicro here

> ><http://www.supermicro.com/products/motherboard/P4/875/P4SCT.cfm>

> > the P4SCT has an Intel 6300ESB onboard RAID controller, which according to
> > this page

> >
> > <http://www.gamepc.com/labs/view_content.asp?id=eoyraid&page=2&cookie%5Ftest=1>

> > is based upon the ICH4, and not Adaptec controller.  And, unless I've
> > missed it, this controller is not supported.

> The ata(4) manual page lists the 6300ESB as supported. The ataraid(4)
> manual only lists the "Intel MatrixRAID" metadata format as supported.


Well, the controller itself is supported, obviously (as an ATA
controller), but I don't see that it is supported as a RAID controller.

And, if this is the case -- ie: 1) the controller is indeed 6300ESB; and
2) it is supported as an ATA controller; but 3) it is not supported as a
RAID controller -- then that would explain the situation described in the
original message.  That is:  the system happily sees two individual ATA
drives, but cannot see any array.

This is all guesswork, but it makes sense.


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Greg Byshenk

On Sun, Aug 20, 2006 at 07:51:29PM +0200, Miroslav Lachman wrote:
> Greg Byshenk wrote:

[...]

> >This happened four times (with the same errors that have been discussed
> >here), running 6.1 STABLE as of June 22.  Before attempting to RMA the
> >drives, I tried an updated kernel, 6.1 STABLE as of July 19.  Strangely
> >enough, the problems disappeared.

> >So, while I have not checked everything that has changed, it _might_ be
> >worth trying 6.1 STABLE...
 
> I have problems with 6.1-RELEASE same as with 6.1-STABLE from August 2. 
> I can try newer STABLE, but as I see on cvsweb, there are not much 
> changes in ATA driver sources, only new chipsets added.

It is only an idea, based on something that worked for me.  And, as I
said, my situation is not exactly the same as the others.
 
> It is strange to me, that I can see significant changes of read/write 
> speed. (I am running nonstop tests with writing disk full of files, 
> delete them, and start again + generating graphs) Speed vary from 
> 2.5MB/s to 11MB/s by jumps. Not continuous from the lowest to the 
> highest. Writing is for example 3MB/s for 20 hours, then jump to 10MB/s 
> and after some time (6 - 20 hours) jump down to about 3MB/s.
> After some days of testing, disk disappear, system reboots itself, 
> resynchronize gmirror and work for next few days till the next disk lose.
> Also earlier synchronization was done after 1:30 hour (at about 30MB/s), 
> now synchronization run at lower speeds - from 2.5MB/s to 15MB/s, so the 
> whole synchronization is done after more then 5 hours (the longest was 
> 20 hours to synchronize 250GB HDDs)

> I don't know what more can I test, what more could be done to solve 
> these problems. :(

You are using gmirror, which I am not, so the situations are not
analogous, since my situation was with h/w RAID.  And I have no direct
experience with gmirror (I use gvinum on a couple of secondary systems,
but those are SCSI based).

Does the output of 'systat -vm' tell you anything of interest?  That is,
are the disks running at or close to 100%, are the CPUs fully loaded, or
anything else...?
 

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: Motherboard RAID problem

2006-08-20 Thread Greg Byshenk

On Sun, Aug 20, 2006 at 07:38:28PM +0200, Roland Smith wrote:
> On Sun, Aug 20, 2006 at 10:02:05AM -0700, Bill Blue wrote:

> > I'm not sure if I'm expecting too much, or this is a real bug.

> > Using FreeBSD 6.1 release, CVSup'd to current. The motherboard is a
> > Supermicro P4SCT0 with a 3.2Ghz P4 and 2 DDR400 1G sticks of RAM.  On
> > the MB is a built-in RAID controller (Adaptec chip) for the SATA
> > drives.  You set it for discrete SATA or RAID.  If RAID is set, on the
> > next boot you have essentially a BIOS configuration for that 'device'
> > consisting of the two SATA devices in either RAID 0 (striped) or RAID
> > 1 (mirrored). 

> The ataraid(4) driver supports the Adaptec HostRAID.
  
> 
> > Boot the OS now and all goes well with the device still showing up on
> > /dev/ad4* but I couldn't tell if the mirroring was really working
> > since the drives have no individual led indications.  I then noticed
> > that there was a new ad6* device, and guess what -- it was the second
> > SATA drive and a mirror image of the *original* first drive.  Watching
> > it with DF for size changes when copying a large file to my home
> > directory, it didn't change at all. 

> > ad6* were the only new devices seen in the OS.
 
> If FreeBSD supports the device, you should see an ar0 device.
 
> Do you have the ataraid(4) driver loaded, or built into your kernel?


Alternatively, are you sure you have identified your hardware correctly?

According to Supermicro here

   <http://www.supermicro.com/products/motherboard/P4/875/P4SCT.cfm>

the P4SCT has an Intel 6300ESB onboard RAID controller, which according to
this page

   <http://www.gamepc.com/labs/view_content.asp?id=eoyraid&page=2&cookie%5Ftest=1>

is based upon the ICH4, and is not an Adaptec controller.  And, unless I've
missed it, this controller is not supported.


What does dmesg output say about the drives and controller?



-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Greg Byshenk

On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
> On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:

> > Do you mean different type of cables, or just another piece? I can't
> > change cables by myself, servers are dedicated from provider, but as I
> > can saw, they picked whole new machine from their HW storage and put new
> > Samsung disk drives in. So these two last machines are brand new with
> > new cables. (Probably with a same type of cables - all machines are ASUS
> > RS120)

> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based 
> system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and 
> it takes a reboot to bring it back. atacontrol reinit has no effect. Tried 
> the following to resolve the problems:

> - Changed cables (both ad4 and ad6)
> - Changed SATA power to legacy
> - Moved the NIC and anything else from the shared PCI INT (thought I'd
>   cracked it at this point as it was stable for a month, then it lost ad6
>   on a nightly dump)
> - Remade my gmirror array as an ar. Put it straight back to gmirror again
>   when I found out what a pain it is to rebuild after ad6 disappears.

I am not sure if it is related, but...  I experienced a similar sort of
problem, although the details in my case are quite different.

What was similar was that I would "lose" two ATA drives from an array,
inexplicably.  Reconfiguring the same drives and rebuilding would cause
them to work perfectly again -- for some number of days, after which 
the same failure would occur.

What is different is that this was with a 3Ware RAID controller -- 
which made removing/reconfiguring/rebuilding much easier -- but I was
seeing the exact same errors.

This happened four times (with the same errors that have been discussed
here), running 6.1 STABLE as of June 22.  Before attempting to RMA the
drives, I tried an updated kernel, 6.1 STABLE as of July 19.  Strangely
enough, the problems disappeared.

So, while I have not checked everything that has changed, it _might_ be
worth trying 6.1 STABLE...

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL

Re: [FreeBSD 6.0-RELEASE] Incorrect geometry for VIA RAID0 array

2005-12-08 Thread greg byshenk

On freebsd-stable, [EMAIL PROTECTED] (Jason Harmening) wrote:

>  Here's the dmesg output from the installer:

>  ad4: 70911MB  at ata2-master SATA150
>  ad6: 70911MB  at ata3-master SATA150
>  ar0: 70911MB  status: READY
>  ar0: disk0 READY using ad4 at ata2-master
>  ar0: disk1 READY using ad6 at ata3-master

Are you _sure_ that the array is being recognized properly?

Based on the dmesg output, it looks like the array is being seen as a
single 74G drive, not as a 148G stripe.

FWIW, this is the section of my dmesg output, for a _mirror_:

   ad4: 78167MB  at ata2-master UDMA133
   ad6: 78167MB  at ata3-master UDMA133
   ar0: 77247MB  status: READY
   ar0: disk0 READY (master) using ad4 at ata2-master
   ar0: disk1 READY (mirror) using ad6 at ata3-master
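
The comparison I am making by eye can be written down roughly as
follows.  This is only a sketch: the function names and the 5% tolerance
are my own assumptions, and it treats every adN line in the excerpt as a
member of the array.

```python
import re

# Matches size lines such as "ad4: 70911MB  at ata2-master SATA150",
# but not status lines such as "ar0: disk0 READY using ad4 ...".
SIZE_RE = re.compile(r"^\s*(a[dr]\d+):\s+(\d+)MB\b")


def device_sizes(dmesg_text):
    """Return {device: size in MB} for the adN/arN size lines."""
    sizes = {}
    for line in dmesg_text.splitlines():
        m = SIZE_RE.match(line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes


def array_looks_like(dmesg_text, array="ar0", tolerance=0.05):
    """Guess the layout of the array from its size relative to its disks."""
    sizes = device_sizes(dmesg_text)
    members = [mb for dev, mb in sizes.items() if dev.startswith("ad")]
    total = sum(members)
    smallest = min(members)
    size = sizes[array]
    if abs(size - total) <= tolerance * total:
        return "stripe"  # capacity is about the sum of the disks
    if abs(size - smallest) <= tolerance * smallest:
        return "single disk or mirror"  # capacity of just one disk
    return "unknown"
```

Run against either excerpt above, this reports a single disk or mirror,
which is what one expects for my mirror but not for an array that is
supposed to be a RAID0 stripe.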


>  On 12/7/05, Jason Harmening <[EMAIL PROTECTED]> wrote:

> > I'm trying to install FreeBSD 6.0-RELEASE on a RAID0 array attached to the
> > VIA 8237 controller on my Asus A8V Deluxe motherboard.  The array consists
> > of two 74G drives.  The installer recognizes the array as ar0, but when I
> > enter FDISK to set up my partition, the size of the array is only recognized
> > as 74G, rather than the true 148G.  I've double-checked all my BIOS
> > settings, and nothing seems out of order.  Please help!


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL
