Re: bsnmpd always died on HDD detach

2012-09-13 Thread Mikolaj Golub
On Wed, Sep 12, 2012 at 10:39:12AM +0200, Miroslav Lachman wrote:

 (gdb) bt
 #0  0x000801046cba in disk_query_disk (entry=0x0) at 
 hostres_diskstorage_tbl.c:241
 #1  0x000801dd6a00 in ?? ()
 #2  0x000801dd6600 in ?? ()
 #3  0x in ?? ()
 #4  0x000801048230 in device_entry_create (name=0x0, 
 location=0x800c14ee0 0, descr=0x8010482a6 ) at hostres_device_tbl.c:217
 #5  0x000801dd7800 in ?? ()
 #6  0x000801dd7800 in ?? ()
 #7  0x000801dd7400 in ?? ()
 #8  0x in ?? ()
 #9  0x000801048230 in device_entry_create (name=0x801dd7c00 , 
 location=0x801048230 ˙˙I\213|$8čŕ\201˙˙L\211çčŘ\201˙˙é\035ţ˙˙H\215\025,
  descr=0x8010482a6 ) at hostres_device_tbl.c:217
 #10 0x000801dd4a00 in ?? ()
 #11 0x000801dd4a00 in ?? ()
 #12 0x000801dd1a00 in ?? ()
 #13 0x in ?? ()
 #14 0x000801048230 in device_entry_create (name=0x801dd8400 , 
 location=0x801048230 ˙˙I\213|$8čŕ\201˙˙L\211çčŘ\201˙˙é\035ţ˙˙H\215\025,
  descr=0x8010482a6 ) at hostres_device_tbl.c:217
 #15 0x000801dd1800 in ?? ()
 #16 0x000801dd1800 in ?? ()
 #17 0x000800c00ea8 in ?? ()
 #18 0x0051b1c8 in ?? ()
 #19 0x000800c00938 in ?? ()
 #20 0x0051b258 in ?? ()
 #21 0x000801dc8a00 in ?? ()
 #22 0x0008009f7be9 in free () from /lib/libc.so.7
 #23 0x in ?? ()
 #24 0x7fffed98 in ?? ()
 #25 0x0008010478bd in device_entry_delete () at hostres_device_tbl.c:266
 #26 0x005187d0 in snmp_error ()
 #27 0x000801047be6 in op_hrDeviceTable (ctx=Variable ctx is not 
 available.
 ) at hostres_device_tbl.c:671
 #28 0x0051b840 in ?? ()
 #29 0x0051b830 in ?? ()
 #30 0x in ?? ()
 #31 0x7fffc360 in ?? ()
 #32 0x0051b830 in ?? ()
 #33 0x in ?? ()
 #34 0x0008009efbd2 in _pthread_mutex_init_calloc_cb () from 
 /lib/libc.so.7
 #35 0x0008009f2d32 in _malloc_prefork () from /lib/libc.so.7
 #36 0x0008009f6e1f in realloc () from /lib/libc.so.7
 #37 0x000800e0b441 in mib_if_is_dyn () from /usr/lib/snmp_mibII.so
 #38 0x in ?? ()
 #39 0x7fffc5cc in ?? ()
 #40 0x0001 in ?? ()
 #41 0x7fffc5e0 in ?? ()
 #42 0x31fa39e2fac72819 in ?? ()
 #43 0x0001 in ?? ()
 #44 0x00080065fad5 in poll_dispatch () from /lib/libbegemot.so.4
 #45 0x0040616a in main ()
 
 
 I hope it helps you to debug this problem.

Looks like we can't trust this output.

-- 
Mikolaj Golub
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Thinkpad X61s cannot boot 9.1-BETA1

2012-09-13 Thread Lars Engels
On Wed, Sep 12, 2012 at 11:08:25PM +0300, Alexander Motin wrote:
 On 12.09.2012 22:58, Lars Engels wrote:
  On Wed, Sep 12, 2012 at 09:58:31PM +0300, Alexander Motin wrote:
  On 12.09.2012 20:46, Lars Engels wrote:
  On Wed, Sep 12, 2012 at 08:30:36PM +0300, Andriy Gapon wrote:
  on 12/09/2012 20:25 Lars Engels said the following:
  On Wed, Sep 12, 2012 at 03:54:30PM +0300, Andriy Gapon wrote:
  Could you try to play with different eventtimer settings (preferably
  in current)?
  You can use this thread / PR as a guide:
  http://thread.gmane.org/gmane.os.freebsd.devel.amd64/14480/focus=14495
 
  The place where the boot stops looks suspiciously close to the place where
  timer interrupts should start driving the system.
 
  Yes, that's it!
  Setting kern.eventtimer.timer=i8254 lets the Thinkpad boot on
  CURRENT with the AC cable inserted.
 
 
  Please share your sysctl kern.eventtimer output with Alexander.
  He will probably ask for some additional information :-)
 
  Sorry if I've missed it, but it would be useful to see a verbose dmesg in
  a situation where the system couldn't boot without switching eventtimer.
 
  No problem. See: http://bsd-geek.de/FreeBSD/IMAG0190.jpg
 
 No, I've seen that one and that's not what I mean. I mean a full verbose dmesg of
 a successful boot in conditions where the system was not booting before
 without setting kern.eventtimer.timer=i8254.

Ok, sorry.
Here's a verbose dmesg booting CURRENT without AC power:
http://bsd-geek.de/FreeBSD/T61_dmesg.boot.works




Re: GEOM_RAID in GENERIC is harmful

2012-09-13 Thread Alexander Motin

On 13.09.2012 08:31, Eugene Grosbein wrote:

9-STABLE has got options GEOM_RAID in GENERIC.
In the real world, this change is pretty harmful and there are lots of cases
when 9.0-RELEASE systems upgraded to 9-STABLE fail to mount the root UFS filesystem
or attach ZFS.

It seems there are lots of HDDs supplied with pseudo-RAID labels at the end:
pre-installed Windows machines having motherboards with pseudo-RAID
like Intel RapidStore and the like. One may not even be aware of these labels.

9.0-RELEASE can be installed on such HDDs and use them with GMIRROR or ZFS
without a problem. Upgraded to 9-STABLE, such a system fails to build due
to GRAID jumping out of the box and grabbing the HDDs for itself,
so GMIRROR or ZFS gets broken.

That makes users very angry when a production server fails to boot
with a GENERIC kernel after a correctly performed upgrade.

GEOM_RAID compiled in GENERIC should be deactivated and require activation
with some loader knob. Also, we need a distinct RELEASE NOTES warning about the
issue.


The problem of on-disk metadata garbage is not limited to GEOM_RAID. For
example, I had a case where remainders of an old UFS file system were found
by GEOM_LABEL and ZFS incorrectly attached to it instead of the proper GPT
partition, making other partitions inaccessible. Does it mean we should
remove GEOM_LABEL also? I don't think so. All that GEOM_RAID is guilty
of is that it was not in place for the 9.0 release. If we remove it now, it
will just postpone the problem until later, or we will never be able to
add it again for the same reasons.


Unlike GEOM_LABEL, the metadata of GEOM_RAID is quite easy to delete without
a complete disk erase: `graid status -ag`, `graid delete ...`. Yes, it can
be a problem if the system can't boot, but now we at least have a live mode on
the installation images, which should make that possible.


Adding some loader tunables could indeed simplify recovery in case of
a boot problem. I will probably add some now. It won't hurt. But I
disagree that GEOM_RAID should be disabled by default, limiting users who really
want to use BIOS RAID. Disabling it will also make metadata removal
without a full wipe more difficult, because different RAIDs have different
on-disk metadata layouts, and you would need to know exactly where to apply dd.
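
As a rough illustration of the cleanup described above, the sketch below wraps the two commands in a dry-run helper. The helper names `run` and `clear_graid_metadata` and the array name in the usage note are hypothetical; the real array name has to come from the `graid status -ag` output, and on a system that cannot boot this would be run from the live mode of an installation image.

```sh
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# clear_graid_metadata ARRAY
# List all graid geoms, then delete the volumes of ARRAY.
clear_graid_metadata() {
    array=$1
    run graid status -ag
    run graid delete "$array"
}
```

Calling `clear_graid_metadata Intel-deadbeef` (a made-up array name) only prints the commands; nothing touches the disks until `DRY_RUN=0` is set.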


--
Alexander Motin


Re: GEOM_RAID in GENERIC is harmful

2012-09-13 Thread Eugene Grosbein
13.09.2012 16:51, Alexander Motin wrote:

 That makes users very angry when a production server fails to boot
 with a GENERIC kernel after a correctly performed upgrade.

 GEOM_RAID compiled in GENERIC should be deactivated and require activation
 with some loader knob. Also, we need a distinct RELEASE NOTES warning about
 the issue.
 
 The problem of on-disk metadata garbage is not limited to GEOM_RAID. For
 example, I had a case where remainders of an old UFS file system were found
 by GEOM_LABEL and ZFS incorrectly attached to it instead of the proper GPT
 partition, making other partitions inaccessible. Does it mean we should
 remove GEOM_LABEL also? I don't think so. All that GEOM_RAID is guilty
 of is that it was not in place for the 9.0 release. If we remove it now, it
 will just postpone the problem until later, or we will never be able to
 add it again for the same reasons.

We must be ready for lots of angry users of 9.1-RELEASE then
and have a BIG RED WARNING in the RELEASE NOTES.

Eugene Grosbein


Re: GEOM_RAID in GENERIC is harmful

2012-09-13 Thread Eugene M. Zheganin

Hi.

On 13.09.2012 15:51, Alexander Motin wrote:


The problem of on-disk metadata garbage is not limited to GEOM_RAID. For
example, I had a case where remainders of an old UFS file system were found
by GEOM_LABEL and ZFS incorrectly attached to it instead of the proper GPT
partition, making other partitions inaccessible. Does it mean we
should remove GEOM_LABEL also? I don't think so. All that GEOM_RAID is
guilty of is that it was not in place for the 9.0 release. If we remove it
now, it will just postpone the problem until later, or we will never be
able to add it again for the same reasons.


Unlike GEOM_LABEL, the metadata of GEOM_RAID is quite easy to delete
without a complete disk erase: `graid status -ag`, `graid delete ...`.
Yes, it can be a problem if the system can't boot, but now we at least
have a live mode on the installation images, which should make that possible.


Adding some loader tunables could indeed simplify recovery in case of
a boot problem. I will probably add some now. It won't hurt. But I
disagree that GEOM_RAID should be disabled by default, limiting users who really
want to use BIOS RAID. Disabling it will also make metadata removal
without a full wipe more difficult, because different RAIDs have
different on-disk metadata layouts, and you would need to know exactly where
to apply dd.




From my point of view, the policy for new features should be this:
new features introduced to the system should by default try to mimic the
old behavior. Right now we will have a situation where most users
will just upgrade to the new kernel and will get a non-bootable system
or a system with one 100% busy disk (for example, a degraded raid0 gives
this). On a system that manages to boot up, 'graid delete -f' could lead
to a livelock (got it today, on a degraded raid1). Furthermore, the
situation where an engineer forgot about a disk with glabel/gmirror
data is less probable than the situation where you have a 'new' disk from
another department which was extracted from some Windows server or
workstation. Should I test all of the disks for graid labels? Yeah,
maybe. But for the last X years I didn't do that, just because it worked
for me and it didn't lead to a crash. The softraid labels were harmless
all the way. I could use a zpool or a gmirror without even knowing that
I had them. Now I suddenly need to care about the labels. Is GEOM_RAID
great as a feature? Yep, it is. Is the way it is introduced into the
system that great? Not at all.


From my point of view GEOM_RAID in GENERIC kernel is a bomb, and we 
will lose lots of FreeBSD beginners due to this.


Eugene.


Re: GEOM_RAID in GENERIC is harmful

2012-09-13 Thread Alexander Motin

On 13.09.2012 13:01, Eugene Grosbein wrote:

13.09.2012 16:51, Alexander Motin wrote:


That makes users very angry when a production server fails to boot
with a GENERIC kernel after a correctly performed upgrade.

GEOM_RAID compiled in GENERIC should be deactivated and require activation
with some loader knob. Also, we need a distinct RELEASE NOTES warning about the
issue.


The problem of on-disk metadata garbage is not limited to GEOM_RAID. For
example, I had a case where remainders of an old UFS file system were found
by GEOM_LABEL and ZFS incorrectly attached to it instead of the proper GPT
partition, making other partitions inaccessible. Does it mean we should
remove GEOM_LABEL also? I don't think so. All that GEOM_RAID is guilty
of is that it was not in place for the 9.0 release. If we remove it now, it
will just postpone the problem until later, or we will never be able to
add it again for the same reasons.


We must be ready for lots of angry users of 9.1-RELEASE then
and have a BIG RED WARNING in the RELEASE NOTES.


A warning is good, but I don't think there will be lots. It has been enabled in
9-STABLE for some time now and I haven't seen many complaints. If re@
permits an MFC of r240465 in a few days, the solution for those who may need it
will be simple: kern.geom.raid.enable=0.
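
For anyone who wants to prepare for that knob, here is a small sketch that writes the setting into a loader configuration file. It assumes r240465 has been merged into the running branch and that the tunable is picked up from /boot/loader.conf at boot; the helper name `set_tunable` is hypothetical.

```sh
#!/bin/sh
# set_tunable FILE NAME VALUE
# Replace or append NAME="VALUE" in a loader.conf-style file,
# so that repeated calls leave exactly one assignment.
set_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="$file.tmp.$$"
    touch "$file"
    # Keep every line that does not start with NAME=
    awk -v n="$name=" 'index($0, n) != 1' "$file" > "$tmp"
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Example (as root):
#   set_tunable /boot/loader.conf kern.geom.raid.enable 0
```

The function only defines the helper; nothing is modified until it is called with a file path.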


--
Alexander Motin


Re: GEOM_RAID in GENERIC is harmful

2012-09-13 Thread John Baldwin
On Thursday, September 13, 2012 6:13:51 am Eugene M. Zheganin wrote:
  From my point of view GEOM_RAID in GENERIC kernel is a bomb, and we 
 will lose lots of FreeBSD beginners due to this.

I had the completely opposite experience.  I bought a new desktop and wanted 
to use the onboard SATA RAID.  9.0 didn't work out-of-the-box with a RAID-1 
volume configured using the BIOS.  I knew to kldload geom_raid.ko, but not all 
new users know to do that.  I think the onboard SATA RAID on typical x86 
motherboards is something we should be supporting out of the box.  I don't 
disagree that there were some surprising side effects from enabling GEOM_RAID, 
but I think your viewpoint is very much one-sided.

-- 
John Baldwin


Re: Thinkpad X61s cannot boot 9.1-BETA1

2012-09-13 Thread Alexander Motin

On 13.09.2012 10:44, Lars Engels wrote:

On Wed, Sep 12, 2012 at 11:08:25PM +0300, Alexander Motin wrote:

On 12.09.2012 22:58, Lars Engels wrote:

On Wed, Sep 12, 2012 at 09:58:31PM +0300, Alexander Motin wrote:

On 12.09.2012 20:46, Lars Engels wrote:

On Wed, Sep 12, 2012 at 08:30:36PM +0300, Andriy Gapon wrote:

on 12/09/2012 20:25 Lars Engels said the following:

On Wed, Sep 12, 2012 at 03:54:30PM +0300, Andriy Gapon wrote:

Could you try to play with different eventtimer settings (preferably in
current)?
You can use this thread / PR as a guide:
http://thread.gmane.org/gmane.os.freebsd.devel.amd64/14480/focus=14495

The place where the boot stops looks suspiciously close to the place where timer
interrupts should start driving the system.


Yes, that's it!
Setting kern.eventtimer.timer=i8254 lets the Thinkpad boot on
CURRENT with the AC cable inserted.



Please share your sysctl kern.eventtimer output with Alexander.
He will probably ask for some additional information :-)


Sorry if I've missed it, but it would be useful to see a verbose dmesg in
a situation where the system couldn't boot without switching eventtimer.


No problem. See: http://bsd-geek.de/FreeBSD/IMAG0190.jpg


No, I've seen that one and that's not what I mean. I mean a full verbose dmesg of
a successful boot in conditions where the system was not booting before
without setting kern.eventtimer.timer=i8254.


Ok, sorry.
Here's a verbose dmesg booting CURRENT without AC power:
http://bsd-geek.de/FreeBSD/T61_dmesg.boot.works


Hmm. I see nothing suspicious. The HPET driver output is typical for the ICH8M
chipset, many of which are working fine in different systems, including
several of mine. There were no significant changes in HPET after 9.0-RELEASE
except r231161. It changed the device probe order, which increased the chance of
interrupt sharing. It should not be a problem, but who knows. You can
try to hint the HPET driver to a specific IRQ, 23 (which looks unused), to avoid
sharing by setting hint.hpet.0.allowed_irqs=0x00800000.


You've said that the problem is related to the AC power state. Have you compared
dmesg outputs with and without it?
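
The hint value is easier to check when it is computed rather than typed by hand. This sketch assumes, as hpet(4) describes, that allowed_irqs is a 32-bit mask in which bit N permits IRQ N; the helper name `irq_mask` is hypothetical.

```sh
#!/bin/sh
# irq_mask IRQ: print a 32-bit hex mask with only bit IRQ set.
irq_mask() {
    printf '0x%08x\n' $((1 << $1))
}

# Print a loader hint restricting HPET 0 to IRQ 23
# (a line for /boot/device.hints or /boot/loader.conf).
echo "hint.hpet.0.allowed_irqs=$(irq_mask 23)"
```

Several IRQs could be allowed by OR-ing the individual masks together before formatting.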


--
Alexander Motin


Re: Userland dtrace broken?

2012-09-13 Thread Chris Nehren
On Wed, Aug 29, 2012 at 14:01:15 +0100 , Matt Burke wrote:
 Following http://wiki.freebsd.org/DTrace/userland on 9.1-RC1, the example
 fails to work as demonstrated:
 
 # dtrace -s pid.d -c test
 dtrace: script 'pid.d' matched 2 probes
 CPU     ID                    FUNCTION:NAME
   1  59284                       main:entry
 dtrace: pid 25479 exited with status 1
 
 #
 
 
 Also, I get hangs when trying to do pretty much anything with the pid:::entry
 
 # dtrace -n 'pid$target::malloc:entry' -c 'echo x'
 dtrace: description 'pid$target::malloc:entry' matched 2 probes
 x
 CPU     ID                    FUNCTION:NAME
   1  59311                     malloc:entry
 load: 0.43  cmd: dtrace 63737 [running] 8.93r 1.60u 4.19s 35% 25072k
 load: 0.88  cmd: echo 63738 [running] 45.10r 2.27u 18.75s 47% 1452k
 load: 1.19  cmd: dtrace 63737 [running] 70.32r 12.14u 33.27s 64% 25072k
 
 # procstat -k 63737 63738
   PID    TID COMM             TDNAME           KSTACK
 63737 101505 dtrace           -                mi_switch sleepq_catch_signals sleepq_timedwait_sig _sleep do_wait __umtx_op_wait_uint_private amd64_syscall Xfast_syscall
 63737 111024 dtrace           -                <running>
 63738 101657 echo             -                mi_switch thread_suspend_switch ptracestop cursig ast doreti_ast
 
 
 I have previously tried using dtrace on 9.0R, but it was insta-panic. Is
 there anything I may have missed here?
 
 
 
 make.conf:
 STRIP=
 CFLAGS+=-fno-omit-frame-pointer
 WITH_CTF=1
 
 kernel config:
 include GENERIC
 ident   DTRACE
 
 makeoptions DEBUG=-g
 makeoptions WITH_CTF=1
 options KDTRACE_FRAME
 options KDTRACE_HOOKS
 options DDB_CTF
 options DDB
 

Relevant to my interests, too. I've followed the instructions on the
wiki / in the handbook (on 9.0/9.1-PRE) and only receive error messages.
Is DTrace supposed to be working properly on 9.x, or is it still
experimental?

It's nice to say that FreeBSD nominally supports DTrace, but if it
doesn't actually work then it needs to be labelled as such. I am fine
with it being experimental if that's the case, but saying so would help
manage expectations a lot better.

-- 
Thanks and best regards,
Chris Nehren




Re: Issue with igb and lagg (was Re: Problem with link aggregation + sshd)

2012-09-13 Thread Giulio Ferro

On 09/12/2012 10:51 PM, Freddie Cash wrote:

On Wed, Sep 12, 2012 at 1:48 PM, Jack Vogel jfvo...@gmail.com wrote:

On Wed, Sep 12, 2012 at 12:40 PM, Freddie Cash fjwc...@gmail.com wrote:

Thanks for checking.  I've used lagg(4) with igb, just not on 9.x.

You're right, it seems to be pointing to the igb(4) driver in 9.x
compared to  9.0.


How do you determine that since it doesn't happen without lagg?  I've no
reports of igb hanging otherwise and it's being used extensively.


Well, I did say seems to.  :)

igb+lagg worked for us on 8.3.  Haven't tried it since moving to 9.0
and 9-STABLE on those three boxes.

igb+lagg doesn't work for him on 9.0.  Although, I don't recall if
non-LACP options were tried earlier in this thread, or if it's just
the LACP mode that's failing.  If one mode works (say failover) and
LACP mode doesn't, that seems to point to lagg.



Sorry, forgot to mention it. I tried both failover and lacp: neither
worked. The switch is a Dell PowerConnect 6248 with ports configured for
aggregation.


I first tried on a 9.1 prerelease, then on a 9.0 release to have
everything clean. In both cases ssh, as server and as client, becomes
unresponsive and unkillable.

The problem might also lie within ssh/d, but I somehow doubt it.
I haven't tried other network services.
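
For context, a lagg setup of the kind being tested usually looks like the rc.conf fragment below. This is a sketch, not the poster's actual configuration: the interface names igb0 and igb1 and the address 192.0.2.10/24 are assumptions. Swapping `lacp` for `failover` gives the other mode mentioned above.

```sh
# /etc/rc.conf fragment (rc.conf is read as sh variable assignments)
# Bring up the physical ports without addresses of their own.
ifconfig_igb0="up"
ifconfig_igb1="up"
# Create the aggregate interface at boot.
cloned_interfaces="lagg0"
# laggproto can be lacp or failover, among others.
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 192.0.2.10/24"
```

Testing another network service over lagg0 with each protocol would help separate a lagg or igb problem from an ssh one.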