superpages not solving "PV entries" limit warning
Hi fellow BSD-types,

I have a busy system that forks lots of processes and I repeatedly see the message: "Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable".

My research suggested that enabling the "superpages" feature via sysctl vm.pmap.pg_ps_enabled was the best action... that doing so would quiet the warning and improve memory mapping efficiency.

As far as I can tell, however, "superpages" haven't done this -- well, to be specific, I can say at least that they haven't quieted the warning.

Since I'm still seeing the warning, do I need to tune something else? Other comments?

System details:

* 8.1-RELEASE-p2 i386 PAE kernel
* 6 GB RAM

Thanks very much!
Charles

--
Charles Owens
Great Bay Software, Inc.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: superpages not solving "PV entries" limit warning
That's very helpful! I had read about that and wondered if it applied to i386.

Should I have expected "superpages" to completely cure the condition... or does it just help? Should I now be looking at tuning the related pmap sysctls to give further relief?

Thanks!
Charles

On 5/10/12 11:24 AM, Alan Cox wrote:
> On Thu, May 10, 2012 at 2:32 AM, Adam Vande More <amvandem...@gmail.com> wrote:
>> On Wed, May 9, 2012 at 12:55 PM, Charles Owens <cow...@greatbaysoftware.com> wrote:
>>> Hi fellow BSD-types,
>>>
>>> I have a busy system that forks lots of processes and I see repeatedly the
>>> message: "Approaching the limit on PV entries, consider increasing either
>>> the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable".
>>>
>>> System details:
>>>
>>> * 8.1-RELEASE-p2 i386 PAE kernel
>>> * 6 GB RAM
>>>
>> The warning is not applicable any longer including your version as well as
>> several previous ones. The warning has been removed from current releases.
>
> For amd64, yes, but not i386. It can't be removed from i386.
>
> Alan
Re: superpages not solving "PV entries" limit warning
On 5/10/12 12:37 PM, Alan Cox wrote:
> On Thu, May 10, 2012 at 10:52 AM, Charles Owens <cow...@greatbaysoftware.com> wrote:
>> That's very helpful! I had read about that and wondered if it applied to i386.
>>
>> Should I have expected "superpages" to completely cure the condition... or
>> does it just help? Should I now be looking at tuning the related pmap
>> sysctls to give further relief?
>
> Superpages won't cure the problem due to the nature of your workload. After
> a fork, writes to portions of the address space that are both superpages and
> copy-on-write will trigger demotion, or re-instantiation of the 4KB page
> granularity PV entries. Ultimately, repromotion to superpages may occur, but
> in the meantime, your peak usage of PV entries is only slightly reduced.
>
> The bottom line is that you'll need to resort to tuning.
>
> Alan

Ok. Very good. Lastly, since we're talking about this (for the future) -- is enabling of superpages generally recommended for amd64?

Thanks,
Charles
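A rough back-of-envelope model of Alan's point, for anyone sizing the tunables. It assumes 2 MB superpages over 4 KB base pages (the PAE i386 and amd64 geometry), and that a promoted superpage mapping costs one PV entry while a demoted one costs one entry per 4 KB page. The function names and sample numbers are made up for illustration; none of this is taken from the pmap code.

```python
BASE_PAGE = 4 * 1024             # 4 KB base page
SUPERPAGE = 2 * 1024 * 1024      # 2 MB superpage (assumption: PAE i386 / amd64)
PAGES_PER_SUPERPAGE = SUPERPAGE // BASE_PAGE


def pv_entries(promoted, demoted):
    """Approximate PV entries for one process.

    promoted: number of 2 MB regions still mapped as superpages
    demoted:  number of 2 MB regions broken back into 4 KB mappings
    """
    return promoted + demoted * PAGES_PER_SUPERPAGE


def peak_after_fork(nprocs, superpages_per_proc, demoted_fraction):
    """Approximate system-wide PV entries once forked children have
    written to (and so demoted) a fraction of their copy-on-write
    superpages. Purely a model; real usage depends on the workload."""
    demoted = int(superpages_per_proc * demoted_fraction)
    promoted = superpages_per_proc - demoted
    return nprocs * pv_entries(promoted, demoted)


if __name__ == "__main__":
    # Hypothetical workload: 100 processes, 8 superpage-sized regions each.
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, peak_after_fork(100, 8, fraction))
```

Even in this crude model, demoting half of the superpages brings the total most of the way back to the all-4KB figure, which is why the peak barely moves for a fork-heavy workload.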
mfi(4) IO performance regression, post 8.1
NAL : Journal 137861468: mfid0s1d contains data.
uhub7: 2 ports with 2 removable, self powered
GEOM_JOURNAL: Journal 137861468: mfid0s1d contains journal.
uhub6: 2 ports with 2 removable, self powered
GEOM_JOURNAL: Journal mfid0s1d clean.
GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s1d.
Root mount waiting for: usbus7 usbus3
Root mount waiting for: usbus7 usbus3
uhub5: 6 ports with 6 removable, self powered
uhub3: 6 ports with 6 removable, self powered
Root mount waiting for: usbus7 usbus3
ugen3.2: at usbus3
uhub8: on usbus3
ugen7.2: at usbus7
umass0: on usbus7
umass0: SCSI over Bulk-Only; quirks = 0x
uhub8: 4 ports with 4 removable, self powered
umass0:0:0:-1: Attached to scbus0
Root mount waiting for: usbus7
(probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error
Trying to mount root from ufs:/dev/ufs/root
(probe0:umass-sim0:0:0:0): SCSI status: Check Condition
(probe0:umass-sim0:0:0:0): SCSI sense: NOT READY asc:3a,2 (Medium not present - tray open)
cd0 at umass-sim0 bus 0 scbus0 target 0 lun 0
cd0: Removable CD-ROM SCSI-0 device
cd0: 40.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray open
ugen5.2: at usbus5
ugen2.2: at usbus2
uhub9: on usbus5
ukbd0: on usbus2
kbd1 at ukbd0
ums0: on usbus2
ums0: 3 buttons and [Z] coordinates ID=0
uhub9: 4 ports with 4 removable, self powered
ugen5.3: at usbus5
ukbd1: on usbus5
kbd2 at ukbd1
uhid0: on usbus5
ums1: on usbus5
ums1: 5 buttons and [XYZ] coordinates ID=0
ums2: on usbus5
ums2: 3 buttons and [Z] coordinates ID=0
eth0: link state changed to UP

--
Charles Owens
Great Bay Software, Inc.
Re: mfi(4) IO performance regression, post 8.1
Yes, of course. So far I can say that the major shift appears to have occurred between 8.1 and 8.2.

Thanks,
Charles Owens
Great Bay Software, Inc.

Sent from my phone

----- Reply message -----
From: "Adrian Chadd"
To: "Charles Owens"
Cc:
Subject: mfi(4) IO performance regression, post 8.1
Date: Fri, Jun 15, 2012 1:55 am

Hm, can you try different subversion checkouts of the kernel tree between 8.1 and 8.3, to pinpoint which commit(s) broke things?

Adrian
Re: mfi(4) IO performance regression, post 8.1
No reason other than history... will be changing over at some point.

Charles Owens
Great Bay Software, Inc.

On 6/15/12 10:38 AM, Brian W. wrote:
> Curious why you are preferring i386 + PAE as opposed to amd64?
>
> On Jun 15, 2012 4:09 AM, "Charles Owens" <cow...@greatbaysoftware.com> wrote:
>> Yes, of course. So far I can say that the major shift appears to have
>> occurred between 8.1 and 8.2.
>>
>> Thanks,
>> Charles Owens
>> Great Bay Software, Inc.
>>
>> Sent from my phone
>>
>> ----- Reply message -----
>> From: "Adrian Chadd" <adr...@freebsd.org>
>> To: "Charles Owens" <cow...@greatbaysoftware.com>
>> Cc: <sta...@freebsd.org>
>> Subject: mfi(4) IO performance regression, post 8.1
>> Date: Fri, Jun 15, 2012 1:55 am
>>
>> Hm, can you try different subversion checkouts of the kernel tree
>> between 8.1 and 8.3, to pinpoint which commit(s) broke things?
>>
>> Adrian
Re: mfi(4) IO performance regression, post 8.1
On 6/15/12 8:04 AM, John Baldwin wrote:
> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>> Hello FreeBSD folk,
>>
>> We're seeing what appears to be a storage performance regression as we
>> try to move from 8.1 (i386) to 8.3. We looked at 8.2 also and it
>> appears that the regression happened between 8.1 and 8.2.
>>
>> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
>> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
>> configuration, using UFS + geom_journal for filesystem.
>>
>> Postgresql performance, as seen via pgbench, dropped by approx 20%.
>> This testing was done with our usual PAE-enabled kernels. We then went
>> back to GENERIC kernels and did comparisons using "bonnie", results
>> below. Following that is a kernel boot log.
>>
>> Notably, we're seeing this regression only with our RAID mfi(4) based
>> systems. Notably, from looking at FreeBSD source changelogs it appears
>> that the mfi(4) code has seen some changes since 8.1.
>
> Between 8.1 and 8.2 mfi has not had any significant changes. The only
> changes made to sys/dev/mfi were to add a new constant:
>
> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> Index: mfireg.h
> ===
> --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
> +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
> @@ -975,7 +975,9 @@
>         MFI_PD_STATE_OFFLINE = 0x10,
>         MFI_PD_STATE_FAILED = 0x11,
>         MFI_PD_STATE_REBUILD = 0x14,
> -       MFI_PD_STATE_ONLINE = 0x18
> +       MFI_PD_STATE_ONLINE = 0x18,
> +       MFI_PD_STATE_COPYBACK = 0x20,
> +       MFI_PD_STATE_SYSTEM = 0x40
>  };
>
>  union mfi_ld_ref {
>
> The difference in write performance must be due to something else. You
> mentioned you are using UFS + gjournal. I think gjournal uses BIO_FLUSH, so
> I wonder if this is related:
>
> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
>
> MFC 212160:
>
> Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
> Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
>
> The barrier semantics of bioq_insert_tail() were broken in two ways:
>
>  o In bioq_disksort(), an added bio could be inserted at the head of
>    the queue, even when a barrier was present, if the sort key for
>    the new entry was less than that of the last queued barrier bio.
>
>  o The last_offset used to generate the sort key for newly queued bios
>    did not stay at the position of the barrier until either the
>    barrier was de-queued, or a new barrier (which updates last_offset)
>    was queued. When a barrier is in effect, we know that the disk
>    will pass through the barrier position just before the
>    "blocked bios" are released, so using the barrier's offset for
>    last_offset is the optimal choice.
>
> sys/geom/sched/subr_disk.c:
> sys/kern/subr_disk.c:
>  o Update last_offset in bioq_insert_tail().
>
>  o Only update last_offset in bioq_remove() if the removed bio is
>    at the head of the queue (typically due to a call via
>    bioq_takefirst()) and no barrier is active.
>
>  o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
>    set prev to the barrier and cur to it's next element. Now that
>    last_offset is kept at the barrier position, this change isn't
>    strictly necessary, but since we have to take a decision branch
>    anyway, it does avoid one, no-op, loop iteration in the while
>    loop that immediately follows.
>
>  o In bioq_disksort(), bypass the normal sort for bios with the
>    BIO_ORDERED attribute and instead insert them into the queue
>    with bioq_insert_tail(). bioq_insert_tail() not only gives
>    the desired command order during insertion, but also provides
>    barrier semantics so that commands disksorted in the future
>    cannot pass the just enqueued transaction.
>
> sys/sys/bio.h:
>  Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
>
> sys/cam/ata/ata_da.c:
> sys/cam/scsi/scsi_da.c
>  Use an ordered command for SCSI/ATA-NCQ commands issued in
>  response to bios with the BIO_ORDERED flag set.
>
> sys/cam/scsi/scsi_da.c
>  Use an ordered tag when issuing a synchronize cache command.
>
>  Wrap some lines to 80 columns.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> sys/geom/geom_io.c
>  Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>
> Sponsored by: Spectra Logic Corporation
>
> Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
> changed in geom_io.c in 8.2? That would be effectively reverting this
> portion of the diff:
>
> Index: geom_io.c
> ===
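For readers who don't have the queue code in their heads, here is a toy model of the barrier behaviour the commit log describes. It is a sketch in Python, not the FreeBSD implementation: it sorts by plain ascending offset and ignores the last_offset elevator logic entirely, and the class and method names are invented for the example.

```python
class ToyBioQueue:
    """Toy model of a sorted I/O queue with barriers (not FreeBSD code)."""

    def __init__(self):
        self.items = []         # request offsets, in dispatch order
        self.barrier_idx = -1   # index of the last barrier, -1 if none

    def insert_tail(self, offset):
        """Append a request and treat it as a barrier: nothing queued
        later may be sorted ahead of it."""
        self.items.append(offset)
        self.barrier_idx = len(self.items) - 1

    def disksort(self, offset, ordered=False):
        """Queue a request. Ordered requests bypass the sort and become
        barriers; the rest are sorted only among requests queued after
        the last barrier."""
        if ordered:
            self.insert_tail(offset)
            return
        pos = self.barrier_idx + 1
        while pos < len(self.items) and self.items[pos] <= offset:
            pos += 1
        self.items.insert(pos, offset)

    def take_first(self):
        """Dispatch the head of the queue, shifting the barrier index."""
        first = self.items.pop(0)
        self.barrier_idx = max(self.barrier_idx - 1, -1)
        return first


if __name__ == "__main__":
    q = ToyBioQueue()
    for off in (50, 10, 30):
        q.disksort(off)
    q.disksort(99, ordered=True)   # e.g. a flush marked as ordered
    for off in (5, 70, 1):
        q.disksort(off)
    print(q.items)
```

In this model an ordered flush pins everything queued before it, so later writes can only be sorted among themselves. That loss of reordering freedom is the kind of effect John suspects might be costing throughput with gjournal; whether it actually explains the regression is what his suggested test would show.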
re: disk IO performance regression, post 8.1
On 6/22/12 10:22 AM, John Baldwin wrote:
> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
Re: ? IO performance regression, post 8.1
Charles Owens
Great Bay Software, Inc.
v: 603.617.4844 m: 603.866.0860

On 6/22/12 10:22 AM, John Baldwin wrote:
> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
Panic during kernel boot, igb-init related? (8.3-RELEASE)
Hello,

We're seeing boot-time panics in about 4% of cases when upgrading from FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that it escaped detection during our regular testing cycle... now with over 100 systems upgraded we're convinced there's a real issue. Our kernel config is essentially PAE (i.e. static modules ... with a few drivers added/removed). The hardware is Intel Server System SR1625UR.

This appears to match a finding discussed in these threads, having to do with timing of initialization of the igb(4)-based NICs (if I'm understanding it properly):

http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html

These threads include some potential patches and possibility of commit/MFC... but it isn't clear that there was ever final resolution (and MFC to 8-stable). I've cc'd a few folks from back then.

A real challenge here is the frequency of occurrence. As mentioned, it only hits a fraction of our systems. When it _does_ hit, the system may enter a reboot loop for days and then mysteriously break out of it... and thereafter seem to work fine.

I'd be very grateful for any help. Some questions:

* Was there ever a final "blessed" patch?
  o if so, will it apply to RELENG_8_3?
* Is there anything that could be said that might help us with reproducing-the-problem / testing / validating-a-fix?

Panic message is --

panic: m_getzone: m_getjcl: invalid cluster type
cpuid = 0
KDB: stack backtrace:
#0 0xc059c717 at kdb_backtrace+0x47
#1 0xc056caf7 at panic+0x117
#2 0xc03c979e at igb_refresh_mbufs+0x25e
#3 0xc03c9f98 at igb_rxeof+0x638
#4 0xc03ca135 at igb_msix_que+0x105
#5 0xc0541e2b at intr_event_execute_handlers+0x13b
#6 0xc05434eb at ithread_loop+0x6b
#7 0xc053efb7 at fork_exit+0x97
#8 0xc0806744 at fork_trampoline+0x8

Thanks very much,
Charles

--
Charles Owens
Great Bay Software, Inc.
Re: Panic during kernel boot, igb-init related? (8.3-RELEASE)
On 11/1/12 2:25 AM, Eugene Grosbein wrote:
> 31.10.2012 23:58, Charles Owens wrote:
>> Hello,
>>
>> We're seeing boot-time panics in about 4% of cases when upgrading from
>> FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that
>> it escaped detection during our regular testing cycle... now with over
>> 100 systems upgraded we're convinced there's a real issue. Our kernel
>> config is essentially PAE (i.e. static modules ... with a few drivers
>> added/removed). The hardware is Intel Server System SR1625UR.
>>
>> This appears to match a finding discussed in these threads, having to do
>> with timing of initialization of the igb(4)-based NICs (if I'm
>> understanding it properly):
>>
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html
>>
>> These threads include some potential patches and possibility of
>> commit/MFC... but it isn't clear that there was ever final resolution
>> (and MFC to 8-stable). I've cc'd a few folks from back then.
>>
>> A real challenge here is the frequency of occurrence. As mentioned, it
>> only hits a fraction of our systems. When it _does_ hit, the system may
>> enter a reboot loop for days and then mysteriously break out of it...
>> and thereafter seem to work fine.
>>
>> I'd be very grateful for any help. Some questions:
>>
>> * Was there ever a final "blessed" patch?
>>   o if so, will it apply to RELENG_8_3?
>> * Is there anything that could be said that might help us with
>>   reproducing-the-problem / testing / validating-a-fix?
>>
>> Panic message is --
>>
>> panic: m_getzone: m_getjcl: invalid cluster type
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xc059c717 at kdb_backtrace+0x47
>> #1 0xc056caf7 at panic+0x117
>> #2 0xc03c979e at igb_refresh_mbufs+0x25e
>> #3 0xc03c9f98 at igb_rxeof+0x638
>> #4 0xc03ca135 at igb_msix_que+0x105
>> #5 0xc0541e2b at intr_event_execute_handlers+0x13b
>> #6 0xc05434eb at ithread_loop+0x6b
>> #7 0xc053efb7 at fork_exit+0x97
>> #8 0xc0806744 at fork_trampoline+0x8
>>
>> Thanks very much,
>> Charles
>
> Take a look at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/172113
> that contains simple workaround in followup message not involving any
> patching, and the fix.
>
> Eugene Grosbein

Eugene, thanks very much for the pointer. This is definitely what we were looking for!

--
Charles

Charles Owens
Great Bay Software, Inc.
Re: make installworld 3.2R -> 3.2S fails over NFS
Rich Winkel wrote:
> I just cvsupped fresh sources this weekend with the same result.
> Does anyone else use nfs to "installworld"?
> Or am I the only one having this problem?

From 3.0R to at least 3.1-stable I've seen this problem. The (annoying) workaround has been to do RW mounts of /usr/src and /usr/obj. I've seen some discussion of this problem on the -hackers list of late, with some mention of fixes. I updated my sources to the latest 3.2-stable on August 4th and happily found that running installworld over read-only NFS mounts worked again!

So... I hope what you've seen doesn't mean it's broken again.

> According to Rich Winkel:
> >
> > This is from 3.2 sources cvsup'd today.
> > The buildworld on the nfs server goes fine, the make.conf files are
> > identical on the server and client, but make installworld on the client gives:
> >
> > [... lots of lines ...]
> > ===> gnu/usr.bin/perl/perl
> > install -c -s -o root -g wheel -m 555 perl /usr/bin
> > /usr/bin/perl5 -> /usr/bin/perl
> > /usr/bin/perl5.00503 -> /usr/bin/perl
> > cd /usr/obj/usr/src/gnu/usr.bin/perl/perl/ext/B ; make -B install INSTALLPRIVLIB=/usr/libdata/perl/5.00503 INSTALLARCHLIB=/usr/libdata/perl/5.00503/mach
> > make: don't know how to make Makefile.PL. Stop
> > *** Error code 2
> >
> > Stop.
> > *** Error code 1
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-stable" in the body of the message

--
Charles N. Owens
Network & Systems Administrator
Information Technology Services
Eastern Nazarene College
Email: [EMAIL PROTECTED]
http://www.enc.edu/~owensc

"Outside of a dog, a book is a man's best friend. Inside of a dog it's too dark to read." - Groucho Marx
8.0 changes behavior of "who am i"
An observation... pre-8.0, "who am i" always returned the owner of the terminal device, regardless of what you might have done with "su". With 8.0, it returns the id of the user you've changed to. Example:

### 7.1 system
[cow...@jakob ~]$ who am i
cowens           ttyp0    Jan  6 07:31 (169.254.222.1)
[cow...@jakob ~]$ su
Password:
[r...@jakob /home/cowens]# who am i
cowens           ttyp0    Jan  6 07:31 (169.254.222.1)

### 8.0 system
[cow...@newercastle ~]$ who am i
cowens           0        Jan  7 17:47
[cow...@newercastle ~]$ su
[r...@newercastle /home/cowens]# who am i
root             0        Jan  7 17:47

The alternative syntax ("who -m") gives the same result. The who(1) man page still states that both forms are supposed to give info about the "terminal attached to standard input," which, if I look with "w", looks as I'd expect:

[r...@newercastle /home/cowens]# w
 5:47PM  up 1 day, 1 min, 2 users, load averages: 0.00, 0.00, 0.00
USER       TTY      FROM              LOGIN@  IDLE WHAT
root       v0       -                17Dec09 21days -bash (bash)
cowens     pts/0    169.254.222.1     5:47PM      - w

Am I missing something, or do we have a bug here? (I looked but can't find any existing threads about this issue.) I'm guessing that the symptom here results somehow from the introduction of pts(4).

Tnx

--
Charles Owens
Great Bay Software, Inc.
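The behaviour the man page describes can be pictured as a lookup keyed on the terminal rather than on the current user ID. The sketch below only illustrates that documented semantics; it is not how who(1) is implemented, and the Session record, its field names, and the who_am_i function are made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a login-accounting record (not the real format).
Session = namedtuple("Session", ["user", "tty", "login_time", "host"])


def who_am_i(sessions, stdin_tty):
    """Return the login record for the terminal attached to standard
    input, or None if no session is recorded for that terminal.

    The current (possibly su'ed) user ID is deliberately not an input:
    the answer depends only on which terminal we are sitting at.
    """
    for session in sessions:
        if session.tty == stdin_tty:
            return session
    return None


if __name__ == "__main__":
    # Sample data loosely based on the "w" output in the message above.
    sessions = [
        Session("root", "v0", "17Dec09", "-"),
        Session("cowens", "pts/0", "5:47PM", "169.254.222.1"),
    ]
    print(who_am_i(sessions, "pts/0"))
```

Under these semantics, running su on pts/0 would not change the result, which matches the 7.1 transcript and not the 8.0 one.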
help with 7.0-p12 crash analysis "panic: lockmgr: upgrade without shared"
i0
May 18 13:52:10 gbs01-etc kernel: sio1: type 16550A, console
May 18 13:52:10 gbs01-etc kernel: sio1: [FILTER]
May 18 13:52:10 gbs01-etc kernel: atkbdc0: port 0x60,0x64 irq 1 on acpi0
May 18 13:52:10 gbs01-etc kernel: atkbd0: irq 1 on atkbdc0
May 18 13:52:10 gbs01-etc kernel: kbd0 at atkbd0
May 18 13:52:10 gbs01-etc kernel: atkbd0: [GIANT-LOCKED]
May 18 13:52:10 gbs01-etc kernel: atkbd0: [ITHREAD]
May 18 13:52:10 gbs01-etc kernel: ipmi1: on isa0
May 18 13:52:10 gbs01-etc kernel: device_attach: ipmi1 attach returned 16
May 18 13:52:10 gbs01-etc kernel: pmtimer0 on isa0
May 18 13:52:10 gbs01-etc kernel: orm0: at iomem 0xc-0xc8fff,0xc9000-0xcdfff pnpid ORM on isa0
May 18 13:52:10 gbs01-etc kernel: ppc0: parallel port not found.
May 18 13:52:10 gbs01-etc kernel: sc0: at flags 0x100 on isa0
May 18 13:52:10 gbs01-etc kernel: sc0: VGA <16 virtual consoles, flags=0x300>
May 18 13:52:10 gbs01-etc kernel: vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0
May 18 13:52:10 gbs01-etc kernel: Timecounters tick every 1.000 msec
May 18 13:52:10 gbs01-etc kernel: acd0: DVDROM at ata0-slave UDMA33
May 18 13:52:10 gbs01-etc kernel: ipmi0: IPMI device rev. 1, firmware rev. 0.2, version 2.0
May 18 13:52:10 gbs01-etc kernel: ipmi0: Number of channels 5
May 18 13:52:10 gbs01-etc kernel: ipmi0: Attached watchdog
May 18 13:52:10 gbs01-etc kernel: mfid0: on mfi0
May 18 13:52:10 gbs01-etc kernel: mfid0: 278472MB (570310656 sectors) RAID volume '' is optimal
May 18 13:52:10 gbs01-etc kernel: lapic3: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #3 Launched!
May 18 13:52:10 gbs01-etc kernel: lapic2: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #2 Launched!
May 18 13:52:10 gbs01-etc kernel: lapic1: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #1 Launched!
May 18 13:52:10 gbs01-etc kernel: lapic7: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #7 Launched!
May 18 13:52:10 gbs01-etc kernel: lapic4: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #4 Launched!
May 18 13:52:10 gbs01-etc kernel: lapic5: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #5 Launched!
May 18 13:52:10 gbs01-etc kernel: lapic6: Forcing LINT1 to edge trigger
May 18 13:52:10 gbs01-etc kernel: SMP: AP CPU #6 Launched!
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: Journal 2994169821: mfid0s1a contains data.
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: Journal 2994169821: mfid0s1a contains journal.
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: Journal mfid0s1a consistent.
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: Journal 128513718: mfid0s1d contains data.
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s1a.
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: Journal 128513718: mfid0s1d contains journal.
May 18 13:52:10 gbs01-etc kernel: WARNING: Expected rawoffset 0, found 63
May 18 13:52:10 gbs01-etc kernel: Root mount waiting for: GJOURNAL
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: Journal mfid0s1d consistent.
May 18 13:52:10 gbs01-etc kernel: GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s1d.
May 18 13:52:10 gbs01-etc kernel: Trying to mount root from ufs:/dev/mfid0s1a.journal
May 18 13:52:10 gbs01-etc kernel: WARNING: / was not properly dismounted
May 18 13:52:10 gbs01-etc kernel: WARNING: Expected rawoffset 0, found 63
May 18 13:52:10 gbs01-etc savecore: reboot after panic: lockmgr: upgrade without shared
May 18 13:52:10 gbs01-etc savecore: writing core to vmcore.0
May 18 13:52:12 gbs01-etc kernel: eth0: link state changed to UP
May 18 13:52:12 gbs01-etc kernel: eth1: link state changed to UP

--
Charles Owens
Great Bay Software | m: 603.866.0860 | f: 603.430.0713 | e: cow...@greatbaysoftware.com
Re: help with 7.0-p12 crash analysis "panic: lockmgr: upgrade without shared"
Kostik Belousov wrote:
> On Mon, May 18, 2009 at 09:39:10PM -0400, Charles Owens wrote:
>
>> Hello,
>>
>> We had a crash of a 7.0-RELEASE-p12 running on a dual quad-core Xeon
>> system with 6 gig RAM and mfi-based RAID. Pasted below are the output
>> of kgdb crashdump backtrace, custom kernel config, and boot log.
>>
>> We really need to understand what happened and would greatly appreciate
>> assistance. What is the next step?
>>
>
> I believe this is fixed by r185210 on stable/7. The fix was included at
> least into 7.1.
>

From the PR that does look relevant. We're giving the patch a try.

Thanks,
Charles

>> Thanks very much,
>>
>> Charles
>>
>> (kgdb) newcastle# kgdb kernel.debug /crash/vmcore.0
>> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so:
>> Undefined symbol "ps_pglobal_lookup"]
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "i386-marcel-freebsd".
>>
>> Unread portion of the kernel message buffer:
>> panic: lockmgr: upgrade without shared
>> cpuid = 3
>> Uptime: 1d0h5m24s
>> Physical memory: 6126 MB
>> Dumping 495 MB: 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256
>> 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
>>
>> #0 doadump () at pcpu.h:195
>> 195 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>> (kgdb) backtrace
>> #0 doadump () at pcpu.h:195
>> #1 0xc0505537 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
>> #2 0xc05057f9 in panic (fmt=Variable "fmt" is not available.
>> ) at /usr/src/sys/kern/kern_shutdown.c:563
>> #3 0xc04f3cb7 in _lockmgr (lkp=0xcc69ee28, flags=8212, interlkp=0xcc69ee58,
>> td=0xcc11e660, file=0xc088d931 "/usr/src/sys/kern/vfs_subr.c", line=2213)
>> at /usr/src/sys/kern/kern_lock.c:310
>> #4 0xc05710b0 in vop_stdlock (ap=0xec5c6bfc)
>> at /usr/src/sys/kern/vfs_default.c:266
>> #5 0xc07734b6 in VOP_LOCK1_APV (vop=0xc0919180, a=0xec5c6bfc)
>> at vnode_if.c:1618
>> #6 0xc06cc3eb in ffs_lock (ap=0xec5c6bfc)
>> at /usr/src/sys/ufs/ffs/ffs_vnops.c:391
>> #7 0xc07734b6 in VOP_LOCK1_APV (vop=0xc09296c0, a=0xec5c6bfc)
>> at vnode_if.c:1618
>> #8 0xc057f9b1 in vput (vp=0xcc69edd0) at vnode_if.h:851
>> #9 0xc06f9f54 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1025
>> #10 0xc04e43c9 in fork_exit (callout=0xc06f8f30 , arg=0x0,
>> frame=0xec5c6d38) at /usr/src/sys/kern/kern_fork.c:781
>> #11 0xc0746f80 in fork_trampoline () at
>> /usr/src/sys/i386/i386/exception.s:205
>> (kgdb)
>>
>>
>> * KERNEL CONF **
>>
>> # Inherit config (most stuff) -- this is slightly customized version...
>> # inherits from GENERIC-cust (tweaked to disable the default scheduler)
>> include PAE-cust
>>
>> # Name of this kernel
>> ident BEACON
>>
>>
>> # Scheduler -- From /usr/src/sys/conf/NOTES:
>> #
>> # SCHED_ULE provides significant performance advantages over 4BSD on many
>> # workloads on SMP machines. It supports cpu-affinity, per-cpu runqueues
>> # and scheduler locks. It also has a stronger notion of interactivity
>> # which leads to better responsiveness even on uniprocessor machines. This
>> # will eventually become the default scheduler.
>> #
>> options SCHED_ULE
>>
>>
>> # Note: we're compiling modules in statically since with PAE we don't want to
>> # load KLDs. See comments in pae(4) and PAE kernel conf file.
>>
>> # Hardware Monitoring / Management
>>
>> device ipmi
>>
>> # Storage
>>
>> options GEOM_JOURNAL
>>
>> # Firewall
>>
>> device pf #PF OpenBSD packet-filter firewall
>> device pflog #logging support interface for PF
>> # device pfsync #synchronization interface for PF
>>
>> options ALTQ
>>
>> options ALTQ_CBQ
>> options ALTQ_RED
>> options ALTQ_RIO
>> options ALTQ_HFSC
>> options ALTQ_CDNR
>> op
READ_BIG w/DVD-pata motion? (kern/133122)
Hello,

Just under a month ago I added a comment to this bug, to share that we're seeing this issue consistently with one of the HP appliance platforms that we support (with both FreeBSD 7.0 and 7.1). This bug (marked as "serious") is still in the "open" state, with as yet no sign that anyone is working on it. Does anyone have a sense as to when it might be given a good look?

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/133122

Thanks much,
Charles

--
Charles Owens
Great Bay Software
Re: READ_BIG w/DVD-pata motion? (kern/133122)
Gavin Atkinson wrote:
> On Mon, 2009-08-17 at 10:58 -0400, Charles Owens wrote:
>
>> Just under a month ago I added a comment to this bug, to share that
>> we're seeing this issue consistently with one of the HP appliance
>> platforms that we support (with both FreeBSD 7.0 and 7.1). This bug
>> (marked as "serious") is still in the "open" state, with as yet no
>> sign that anyone is working on it. Does anyone have a sense as to
>> when it might be given a good look?
>>
>
> A few bits of information are missing from that PR, it would be useful
> if you could add the following to it:
>
> - A full verbose dmesg
> - two outputs of "vmstat -i", say 10 seconds apart, both before and
>   after the problems start.
>
> FWIW, I'm not convinced that your problem is the same as the submitter's
> problem - he's complaining only on issues when he hits the last block of
> the CD, and you're having problems with slow reads throughout the whole
> CD. If you're not using a "JMicron JMB363 SATA300 controller" it's
> likely your issue is different to kern/133122 and it would be worth
> creating a new PR with a full description of exactly what symptoms you
> see, and the above information.
>
> Gavin
>

Thanks... we'll work on submitting a new PR as you've suggested.

Charles
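For anyone gathering the two "vmstat -i" snapshots Gavin asks for, a rough way to compare them is to diff the per-source totals and divide by the interval. The sketch below is only illustrative: it assumes the usual three-column "interrupt / total / rate" layout of vmstat -i, the sample text is made up rather than taken from the affected machine, and the function names are hypothetical.

```python
def parse_vmstat_i(text):
    """Return {interrupt source: total count} from `vmstat -i` style text.

    Assumes each data line ends with two numeric columns (total, rate);
    the header line and anything else that does not fit is skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        name, total, _rate = parts
        counts[name.strip()] = int(total)
    return counts


def interrupt_deltas(before, after, seconds):
    """Interrupts per second for each source present in both snapshots."""
    first = parse_vmstat_i(before)
    second = parse_vmstat_i(after)
    return {
        name: (second[name] - first[name]) / seconds
        for name in second
        if name in first
    }


# Hypothetical sample data, NOT real output from the system in this thread.
SAMPLE_BEFORE = """interrupt                          total       rate
irq14: ata0                         1000          2
cpu0: timer                       200000       2000
Total                             201000       2002
"""

SAMPLE_AFTER = """interrupt                          total       rate
irq14: ata0                        51000        100
cpu0: timer                       220000       2000
Total                             271000       2100
"""

if __name__ == "__main__":
    for source, rate in interrupt_deltas(SAMPLE_BEFORE, SAMPLE_AFTER, 10).items():
        print(f"{source}: {rate:.1f}/s")
```

A source whose rate jumps between the "before" and "after" pairs (here the invented ata0 line) is the sort of thing worth calling out in the new PR alongside the raw output.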
gjournal "error while writing data (error=6)"
Hello folks,

We've had a system crash, apparently related to GEOM_JOURNAL, on an i386 system running 7.0-RELEASE-p11. Here's what we could see on the screen (formatted for readability):

GEOM_JOURNAL: [copy] Error while writting data (error=6) \
    ad4s1a[WRITE(offset=43561402368, length=16384)]
GEOM_JOURNAL: [copy] Error while writting data (error=6) \
    ad4s1a[WRITE(offset=48868164096, length=16896)]
GEOM_JOURNAL: Error while reading data from ad4s1a (error=6).
mode=0134172, inum=5323776, fs = /
panic: ffs_valloc: dup alloc
cpuid = 3
Uptime: 119d10h5m43s
Cannot dump. No dump device defined.
GEOM_JOURNAL: Flush of cache of ad4s1a: error=6.
GEOM_JOURNAL: [flush] Error while writting data (error=6) \
    ad4s1a[WRITE(offset=48868197888, length=98816)]
(4 more lines like last one)
Rebooting...
cpu_reset: Stopping other CPUs

The system didn't actually reboot; it just got stuck there. When it was eventually rebooted manually, it booted just fine.

Any thoughts as to what could be the real problem? What does "error=6" indicate?

I've done some scouring of the net and found something that may not directly relate to this crash, but does relate, at least, to my filesystem configuration. One of the threads:

http://markmail.org/message/tamo4r2jho3zdv3z

In the described crash, similar error messages were seen, but with "error=1". Ultimately Pawel Dawidek (gjournal author) gave the diagnosis that the crash was related to the first filesystem in the slice being set up with an offset of zero, not the correct offset of 16. Either in this thread or elsewhere I also learned that sysinstall always uses the zero offset, even though it is not best practice. Not a happy discovery.

Looking at our system that crashed... sure enough, zero offset (see label below; both 'a' and 'd' are journaled).

So this then prompts two questions:

* Can our crash be explained by the zero offset filesystem configuration?
* If not, separate from the crash, how much should we be worried about running a system with gjournal like this?

Thanks very much for any and all assistance,
Charles

# bsdlabel ad4s1
# /dev/ad4s1:
8 partitions:
#        size     offset    fstype   [fsize bsize bps/cpg]
  a:  77594624          0    4.2BSD     2048 16384 28552
  b:  24113088   77594624      swap
  c: 156296322          0    unused        0     0         # "raw" part, don't edit
  d:  54588610  101707712    4.2BSD     2048 16384 28552

--
Charles Owens
Great Bay Software | e: cow...@greatbaysoftware.com
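On the "What does error=6 indicate?" question: assuming the error= value that GEOM_JOURNAL prints is an ordinary errno code (an assumption, nothing in this thread confirms it), 6 is ENXIO, which FreeBSD's errno table describes as "Device not configured", and the error=1 from the other thread would be EPERM. A minimal sketch for looking such codes up; the helper name is hypothetical, and the message text comes from whatever platform runs the script, so it will read differently off FreeBSD:

```python
import errno
import os


def describe_errno(code):
    """Map a numeric errno value to its symbolic name and local message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"error={code}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 6 is the code from the crash above; 1 is the code from the
    # markmail thread about the zero-offset label.
    for code in (6, 1):
        print(describe_errno(code))
```

If that reading is right, the two crashes report different failures, which may be worth keeping in mind when weighing whether the zero-offset label explains this one.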