Re: Deprecating base system ftpd

2021-04-10 Thread Scott Bennett via freebsd-stable
 On Fri, 09 Apr 2021 07:32:12 +0900 aventa...@fastmail.fm wrote:

>It makes me think that there should be an offering for two completely 
>different audiences:
>(1) FreeBSD core (a very minimal offering for folks that want to build things, 
>like a Desktop, etc.)
>(2) FreeBSD server (an offering for folks that want a server build)
>
>Perhaps that idea is just unreasonably crazy as well. 
>
 LOL!  You have what is called a very big ask.  I would like
something far smaller, namely, a choice of schedulers during/just
after installation of a -RELEASE without having to a) download the
entire source tree, b) make buildworld, and c) make buildkernel.
The kernel developers in their wisdom--ahem--have burdened all new
installations with the abysmal performance of the ULE scheduler.
The installation images for -STABLE versions are much the same.
The 4BSD scheduler has been far from optimal, and the ULE scheduler
looked like a nice idea on paper for newer CPUs, but in fact, the
ULE scheduler's performance is awful, even when compared with the
4BSD scheduler, which generally gives acceptable, though not optimal,
performance.
 If the owner of a new installation wants to get passably usable
performance from his new system, he must first perform the tasks
noted above.  The second and third tasks will take *a lot* of extra
time because they must be done under the ULE scheduler.  Then one
must install the new kernel, reboot, do the mergemaster or etcupdate
steps, install the new world, more mergemaster or etcupdate, and
reboot again.
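
 For readers who have not done this before, the kernel half of those steps
can be sketched roughly as follows.  This is a minimal sketch rather than a
tested procedure: it assumes the sources are in /usr/src and the machine is
amd64, and the config name GENERIC-4BSD and the helper write_4bsd_config are
made up for illustration.

```sh
#!/bin/sh
# Write a kernel config that reuses GENERIC but swaps the scheduler.
# GENERIC-4BSD is a hypothetical name; the target directory is an argument
# so the function can be tried outside /usr/src first.
write_4bsd_config() {
    confdir=$1
    cat > "$confdir/GENERIC-4BSD" <<'EOF'
include GENERIC
ident GENERIC-4BSD
nooptions SCHED_ULE
options SCHED_4BSD
EOF
}

# Assumed use on amd64 with sources in /usr/src (not executed here):
#   write_4bsd_config /usr/src/sys/amd64/conf
#   cd /usr/src && make buildworld && make buildkernel KERNCONF=GENERIC-4BSD
#   make installkernel KERNCONF=GENERIC-4BSD
```

The include/nooptions form keeps the file short and lets it follow later
changes to GENERIC; copying GENERIC and editing the scheduler line by hand
would work as well.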
 Two ways of allowing a choice of scheduler are 1) to provide two
GENERIC kernels, e.g., GENERIC.ULE and GENERIC.4BSD, from which one
could choose at boot time, and 2) to compile both schedulers into the
GENERIC kernel and select between them with a loader tunable at
boot time.
 The current system is yet another discouragement to upgrading to
a new -RELEASE via a new installation.  Further, this fix to bad
performance by default is not documented anywhere.  How is a user who
is new to FreeBSD to know about it?


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?

2021-04-09 Thread Scott Bennett via freebsd-stable
Eugene Grosbein  wrote:

> 07.04.2021 12:49, Scott Bennett via freebsd-stable wrote:
>
> >  At least w.r.t. gvinum's raid5, I can attest that the kernel panics
> > are real.  Before settling on ZFS raidz2 for my largest storage pool, I
> > experimented with gstripe(8), gmirror(8), graid3(8), and graid5(8) (from
> > sysutils/graid5).  All worked reasonably well, except for one operation,
> > namely, "stop".  Most/all such devices cannot actually be stopped because
> > a stopped device does not *stay* stopped.  As soon as the GEOM device
> > node is destroyed, all disks are retasted, their labels, if any, are
> > recognized, and their corresponding device nodes are recreated and placed
> > back on line. :-(  All of this happens too quickly for even a series of
> > commands entered on one line to be able to unload the kernel module for
> > the device node type in question, so there is no practical way to stop
> > such a device once it has been started.
>
> In fact, you can disable re-tasting with sysctl kern.geom.notaste=1,
> stop a GEOM, clear its labels, and re-enable tasting by setting
> kern.geom.notaste=0 again.

 Thank you for this valuable, but undocumented, workaround!  However, it
serves to demonstrate the bugs in gstripe(8), gmirror(8), graid3(8), and
graid5(8), and perhaps a few others, either in the commands themselves, which
do not behave as advertised in their respective man pages, or in their man
pages, which do not correctly document the commands' actual behavior.
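
 For the archives, the sequence Eugene describes might look like the sketch
below for a gmirror(8) device.  The function name stop_mirror and the example
names gm0, ada1, and ada2 are hypothetical, and RUN defaults to echo so that
the commands are only printed for review; nothing here has been tested
against the other GEOM classes.

```sh
#!/bin/sh
# Dry run by default: with RUN=echo each command is printed, not executed.
# Set RUN to the empty string to actually run the commands (as root).
RUN=${RUN-echo}

# stop_mirror NAME PROVIDER...
stop_mirror() {
    name=$1; shift
    $RUN sysctl kern.geom.notaste=1     # suspend re-tasting
    $RUN gmirror stop "$name"           # destroy the GEOM device node
    for prov in "$@"; do
        $RUN gmirror clear "$prov"      # wipe the on-disk metadata
    done
    $RUN sysctl kern.geom.notaste=0     # re-enable tasting
}
```

Calling "stop_mirror gm0 ada1 ada2" prints the five commands in order, which
makes it easy to check the provider names before running anything for real.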
>
> >  A special note is needed here regarding gcache(8) and graid3(8).  The
> > documentation of gcache parameters for sector size for physical devices
> > and gcache logical devices is very unclear, such that a user must have the
> > device nodes and space on them available to create test cases and do so,
> > whereas a properly documented gcache(8) would obviate the need to set up
> > such experiments.  There is similar lack of clarity in various size
> > specifications for blocks, sectors, records, etc. in many of the man pages
> > for native GEOM commands.
>
> I found gcache(8) very nice at first; it really boosts UFS performance,
> provided you have extra RAM to dedicate to its cache. gcache can be stacked
> with gmirror etc., but I found it guilty of some obscure UFS-related panics.
> It seems there were races or something.
> No data loss, though, as it is intended to be transparent for writing.

 There are other, also undocumented, problems.  For example, I played with
gcache(8) for a short time as a method of dividing a ZFS pool into two extents
on a drive in order to place a frequently accessed partition between them.  It
worked nicely for a while, but the first time that gcache(8) choked it made a
real mess of the ZFS pool's copy on that drive.  As a result I immediately
abandoned that use of gcache(8).
 gcache(8) uses two poorly defined sysctl values, kern.geom.cache.used_hi
and kern.geom.cache.used_lo.  Its man page shows them with default values, but
neglects to mention whether they are enforced limits or merely sysctl variables
that report current or high and low watermark usages.
>
> I was forced to stop using gcache for the sake of stability and it's a shame.
> For example, the dump(8) speed-up due to gcache was at least 2x with a big
> cache compared to dump -C32 without gcache.
>
 I used it to make all accesses to a graid3(8) set of partitions work with
64 KB and 32 KB block sizes for UFS2 efficiency on a graid3(8) device.  That use
worked very nicely, but it took some experimentation to figure out how to do it
because the man page is so ambiguous about the gcache command's options and
arguments.
 A similar complaint could be leveled at the man pages for gstripe(8),
graid3(8), and graid5(8) w.r.t. their undocumented definitions of stripe size,
sector size, and block size.  At present, without reading the command and kernel
source code for each or experimenting extensively, it is difficult to understand
what the commands' options and arguments will do and which combinations of their
numerical values can be valid and accepted.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?

2021-04-06 Thread Scott Bennett via freebsd-stable
ata.  IOW, it can be helpful in a potentially large
number of situations for some users, especially for data recovery, but
that is a different purpose from that for which the other GEOM RAID
commands were written to serve.  Again, IMO, gstripe(8), gmirror(8),
graid3(8), and graid5(8) from sysutils/graid5 should be improved, not
eliminated.  Anything gvinum(8) is *supposed* to be able to do, but often
cannot do without violating other system functions, can be done well by
these other GEOM commands.
 A special note is needed here regarding gcache(8) and graid3(8).  The
documentation of gcache parameters for sector size for physical devices
and gcache logical devices is very unclear, such that a user must have the
device nodes and space on them available to create test cases and do so,
whereas a properly documented gcache(8) would obviate the need to set up
such experiments.  There is similar lack of clarity in various size
specifications for blocks, sectors, records, etc. in many of the man pages
for native GEOM commands.
 While you are looking into this situation, please also consider
that deprecation and elimination of the veritably ancient ccd and
ccdconfig are long overdue.  Even the man pages state that these device
nodes are *not* robust and data can be easily lost.  The fact that NetBSD
separately maintains some version of ccd and ccdconfig should be
considered irrelevant in deciding to deprecate and/or eliminate the
supporting code from the source tree.
>
>I plan to add a deprecation notice after a short discussion period,
>assuming no reasonable justification is made to retain it. The notice
>would suggest graid and ZFS as alternatives, and would be merged in
>advance of FreeBSD 13.1. Then, gvinum would be removed in advance of
>FreeBSD 14.0.
>
>Please follow up if you have experience or input on vinum in FreeBSD,
>including past use but especially if you are still using it today and
>expect to continue doing so.

 Given the panics and other problems with gvinum(8), I cannot
recommend that anyone use it.  After experimenting with it, I ended up
using ZFS, gmirror(8), graid5(8), and gconcat(8) to meet my needs.
In sum, I recommend maintaining and enhancing to some degree the native
GEOM support and letting unfinished and/or unmaintained support for
gvinum(8), ccd(8) (and ccd(4)), and ccdconfig(8) be abandoned.  Please
reverse the deprecation for sysutils/graid5, which actually works as
specified, and complete its man page.  Please also add a scrub function
to it.  RAID5, whether by hardware or by software, has known limitations,
but for some purposes it is not only adequate, but is a better choice
than ZFS.  GEOM-based RAID support is also much more versatile and
flexible than hardware RAID, so let's keep it available as an option.
At the same time, the poorly supported, obsolete, and incompatible-with-
other-system-components stuff should rightly be eliminated from the source
tree.  The few known bugs with the native GEOM commands can and should be
fixed.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: swap space issues

2020-07-12 Thread Scott Bennett via freebsd-stable
Don Wilde  wrote:

>
> On 7/11/20 11:28 PM, Scott Bennett via freebsd-stable wrote:
> >   I have read this entire thread to date with growing dismay, and I
> > thank Donald Wilde for reporting his ongoing troubles, although they
> > spoil my hopes that the kernel's memory management bugs that first became
> > apparent in 11.2-RELEASE (and -STABLE around the same time) were not
> > propagated into 12.x.  A recent update to stable/12 source tree made it
> > finally possible for me to build 12.1-STABLE under 11.4-PRERELEASE, and I
> > was just about to install the upgrade when this thread appeared.
> Spoiler alert. Since I gave up on Synth, I haven't had a single swap 
> issue. It does appear to be one particular port that drove it nuts 
> (apparently, one of the 'Google performance' bits, with a 
> mismatched-brackets problem). I have rebuilt the machine several times, 
> but that's more for my sense of tidiness than anything.
>
> I've got a little Crystal script that walks the installed packages and 
> ports and updates them with system() calls.
> The machine is very slow, but it's not swapping at all.

 That's good.  I use portmaster, but not often at present because a
"portmaster -a" run can only be done two or three times per boot before real
memory is locked down to the extent that the system is no longer functional
(i.e., even a scrub of ZFS pools comes to a halt in mid scrub due to lack of a
sufficient supply of free page frames).
 The build procedures of certain ports consistently get killed by the OOM
killer, along with much collateral damage.  I've noticed that lang/golang and
lang/rust are prime examples now, although both used to build without problems.
>
> It is quite usable now with 12-STABLE.

 I don't see any good reason to go through the hassle and lost time of an
upgrade across a major release boundary if I still won't have a production OS
afterward.  I'm already dealing with a graphics stack rendered unsafe to use by
the ongoing churn in X11 code.  (See PR #247441, kindly filed for me by Pau
Amma.)
> >
> >   On Fri, 26 Jun 2020 03:55:04 -0700 : Donald Wilde 
> > wrote:
> >
> >> On 6/26/20, Peter Jeremy  wrote:
> >>>
> [snip]
> >>> I strongly suggest you don't have more than one swap device on spinning
> >>> rust - the VM system will stripe I/O across the available devices and
> >>> that will give particularly poor results when it has to seek between the
> >>> partitions.
> >   True.  The only reason I can think of to use more than one swapping/
> > paging area on the same device for the same OS instance is for emergencies
> > or highly unusual, temporary situations in which more space is needed until
> > those situations conclude, and even in such situations, if the space can be
> > found on another device, it should be placed there.  Interleaving of swap
> > space across multiple devices is intended as a performance enhancement
> > akin to striping (a.k.a. RAID0), although the virtual memory isn't
> > necessarily always actually striped across those devices.  Adding a paging
> > area on the same device as an existing one is an abhorrent situation, as
> > Peter Jeremy noted, and it should be eliminated via swapoff(8) as soon as
> > the extraordinary situation has passed.  N.B. the GENERIC kernel sets a
> > limit of four swap devices, although it can be rebuilt with a different
> > limit.
> That's good data, Scott, thanks! The only reason I got into that 
> situation of trying to add another swap device was that it was crashing 
> with OO swap messages.

 I don't recall you posting those messages, but it sounds like exactly the
*temporary* situation in which adding an inappropriately placed paging area can
be used long enough to get you out of a bind without a reboot, even though
performance will probably suffer until you have removed it again.  Poor
performance is usually preferable to no performance if it is only temporary.
 One cautionary note in such situations, though, applies to remote paging
areas.  Sparse files allocated on the remote system should not be used as
paging areas.  For example, I discovered the hard way (i.e., the problem was
not documented) that SunOS would crash if a sparse file via NFS were added as
a paging area and the SunOS system tried to write a page out to an unallocated
region of the file, which was essentially all of the file at first.
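
 As an illustration of such a temporary paging area on FreeBSD, here is a
minimal sketch using a local, file-backed md(4) device.  The helper names,
the path /usr/swap0, and the unit number 99 are hypothetical; RUN defaults to
echo so the commands are only printed.  Writing zeros with dd is meant to
allocate every block so that the file is not sparse, which I assume holds on
UFS.

```sh
#!/bin/sh
# Dry run by default; set RUN to the empty string to execute as root.
RUN=${RUN-echo}

# add_temp_swap FILE SIZE_MB MD_UNIT
add_temp_swap() {
    file=$1; mb=$2; unit=$3
    $RUN dd if=/dev/zero of="$file" bs=1m count="$mb"   # fully written file
    $RUN chmod 0600 "$file"
    $RUN mdconfig -a -t vnode -f "$file" -u "$unit"     # attach as /dev/mdN
    $RUN swapon "/dev/md$unit"
}

# remove_temp_swap FILE MD_UNIT: undo the above once the emergency is over.
remove_temp_swap() {
    file=$1; unit=$2
    $RUN swapoff "/dev/md$unit"
    $RUN mdconfig -d -u "$unit"
    $RUN rm "$file"
}
```

For example, "add_temp_swap /usr/swap0 1024 99" followed later by
"remove_temp_swap /usr/swap0 99".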

> >> My intent is to make this machine function -- getting the bear
> >> dancing. How deftly she dances is less important than that she dances
> >> at all. My for-real boxen will have real HP and real cores and RAM.
>

Re: swap space issues

2020-07-11 Thread Scott Bennett via freebsd-stable
h, when the free page frame list
has less than ~410 MB of page frames on it.  Setting vm.pageout_wakeup_thresh
to at least 410 MB *seems* to help reduce the number of times a process is
marked as swapped out when the system has been under some form of
memory pressure, but it doesn't stop it from happening when the kernel has
pagefixed far too many page frames and hasn't pagefreed them when no longer
really requiring them to remain in real memory.  I don't know whether the bug
is one of illegitimately pagefixing or failure to pagefree, but eventually
the number of page frames tied up becomes so high that too few frames remain
available for the kernel to be willing to page processes back in to resume
them.  Increasing vm.pageout_wakeup_thresh drastically from its default value
is the primary way I have found to extend the time that my system remains
usable before I am forced to reboot.
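
 For anyone trying the same thing, the arithmetic is sketched below.  It
assumes that the sysctl takes its value in pages rather than bytes and that
the page size is 4096 bytes (check "sysctl hw.pagesize"); the function name
mb_to_pages is made up for this example.

```sh
#!/bin/sh
# mb_to_pages MB [PAGESIZE]: number of pages that cover MB mebibytes.
# PAGESIZE defaults to 4096 bytes, which is an assumption.
mb_to_pages() {
    mb=$1
    pagesize=${2:-4096}
    echo $(( mb * 1024 * 1024 / pagesize ))
}

# Print, but do not apply, the setting for a 410 MB threshold.
echo "sysctl vm.pageout_wakeup_thresh=$(mb_to_pages 410)"
```

With 4096-byte pages, 410 MB works out to 104960 pages.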
 As mentioned previously, I am dismayed that 12.1 appears to contain at
least some of 11.2+'s bugs.  Given that 11.1 went EOL a long time ago, that
means that there is presently *NO PRODUCTION RELEASE OF FREEBSD AVAILABLE*
since 11.1-RELEASE, the FreeBSD project web site's erroneous claims
notwithstanding.  This is an awful situation and probably calls for some
corrective action by the FreeBSD core team.  A production release does not
force a reboot every few days or even every month or two in order to remain
usable.  That's one of the reasons Windows through XP and VISTA were never
production systems, even though Mico$lop gave its users no production
alternatives.  FreeBSD used to do better and it should be doing better now.
 It is worth noting that a few years ago, FreeBSD project staff realized
that an elderly Pentium II(?) machine running FreeBSD 2.something and still
doing some simple, but necessary, task for the project had an uptime in excess
of 19 *years*.  That is a reliability record for any OS to strive for.  It is
a shame that FreeBSD's quality control no longer results in anything close to
that.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: Is it me or does FreeBSD (12.1 amd64) hang when I manually snapshot it in VSphere 6.7?

2020-05-22 Thread Scott
On Fri, May 22, 2020 at 12:31:29AM +0200, rai...@ultra-secure.de wrote:
> Hi,
> 
> 
> subject says it all, basically.
> 
> The system becomes totally unresponsive and has to be power-cycled.
> 
> da0 at mpt0 bus 0 scbus2 target 0 lun 0
> da0:  Fixed Direct Access SCSI-2 device
> da0: 320.000MB/s transfers (160.000MHz, offset 127, 16bit)
> da0: Command Queueing enabled
> da0: 40960MB (83886080 512 byte sectors)
> da0: quirks=0x140
> 
> 
> This is a bit of a problem here, to say the least.
> 
> 
#metoo:

HV: VMware ESXi, 6.7.0, 11675023
VM: # freebsd-version -kru
12.1-RELEASE-p1
12.1-RELEASE-p1
12.1-RELEASE-p1
# uname -a
FreeBSD XXX 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC  amd64
# pkg info open-vm\*
open-vm-tools-nox11-11.0.1_1,2

It doesn't hang but becomes *very* unresponsive over the network.  Pings are
shot.  The console is fine though.  top shows 100% idle across 4 cores.  And:

# openssl speed -evp aes-256-cbc
...
1024 bytes block:  542595.18k (1593774 in 3 seconds)

I never noticed that before.  I normally turn off snapshotting the RAM 
because it's so much faster and I'm happy with crash consistent rollbacks.

Rebooting fixes the awful network behaviour.  I didn't check the disk 
subsystem.

Scott


Re: possible problem with pkg in FreeBSD11.4-BETA2

2020-05-17 Thread Scott Bennett
Mike Karels  wrote:

> >  On Date: Sat, 16 May 2020 16:05:09 +0200 Kurt Jaeger 
> > wrote:
> >  [Mike Karels  wrote:]
>
> > > > This might be due to cockpit error, but I don't remember the exact
> > > > history of this machine.  I have a test machine that was running
> > > > 11.3-RELEASE, and had been upgraded with freebsd-update.  I upgraded
> > > > it to 11.4-BETA1 last week, and BETA2 this weekend.  Then I tried
> > > > "pkg upgrade" and got an error:
> > >
> > > Use
> > >
> > > pkg bootstrap -f
> > >
> > > There's some hiccup in the upgrade path between your version
> > > and the most current, so...
> > >
> >  He switched from -RELEASE packages to -STABLE packages.
>
> Are BETA releases treated as -STABLE?  If so, would this problem be
> expected?  If so, it should be handled much more gracefully.  If not,
> I don't think that's the problem; I'm reasonably sure this machine
> has only been upgraded with freebsd-update; it was running an 11.3
> patch version.
>
 Now that you ask, that's a good question.  I haven't run "make buildworld"
and "make buildkernel" since 28 April because I'm trying to minimize being
forced to reboot by the kernel's memory management bugs, so my stable/11 system
still says "11.4-PRERELEASE".  It has been long enough since 11.3 happened that
I don't recall whether it went through a transition period of BETAx and RCx
changes as it changed from 11.2 to 11.3.  Of course, those memory management
bugs first caused big problems starting when 11.2 happened, so I may have missed
some of the transition due to infrequent system updates.  I only remember that
it won't say it's 11.4-STABLE until the release happens and I've updated
/usr/src and built and installed a new system.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: OpenZFS port updated

2020-04-17 Thread Scott Long
Is the intention to eventually replace the zfs code in src/ ?  What will be the 
long-term relationship between src/ and ports/ for this?

Scott


> On Apr 17, 2020, at 12:35 PM, Ryan Moeller  wrote:
> 
> FreeBSD support has been merged into the master branch of the openzfs/zfs 
> repository, and the FreeBSD ports have been switched to this branch.
> 
> OpenZFS brings many exciting features to FreeBSD, including:
> * native encryption
> * improved TRIM implementation
> * most recently, persistent L2ARC
> 
> Of course, avoid upgrading your pools if you want to keep the option to go 
> back to the base ZFS.
> 
> OpenZFS can be installed alongside the base ZFS. Change your loader.conf 
> entry to openzfs_load="YES" to load the OpenZFS module at boot, and set PATH 
> to find the tools in /usr/local/sbin before /sbin. The base zfs tools are 
> still basically functional with the OpenZFS module, so changing PATH in rc is 
> not strictly necessary.
> 
> The FreeBSD loader can boot from pools with the encryption feature enabled, 
> but the root/bootenv datasets must not be encrypted themselves.
> 
> The FreeBSD platform support in OpenZFS does not yet include all features 
> present in FreeBSD’s ZFS. Some notable changes/missing features include:
> * many sysctl names have changed (legacy compat sysctls should be added at 
> some point) 
> * zfs send progress reporting in process title via setproctitle
> * extended 'zfs holds -r' 
> (https://svnweb.freebsd.org/base?view=revision&revision=290015)
> * vdev ashift optimizations 
> (https://svnweb.freebsd.org/base?view=revision&revision=254591)
> * pre-mountroot zpool.cache loading (for automatic pool imports)
> 
> To the last point, this mainly affects the case where / is on ZFS and /boot 
> is not or is on a different pool. OpenZFS cannot handle this case yet, but 
> work is in progress to cover that use case. Booting directly from ZFS does 
> work.
> 
> If there are pools that need to be imported at boot other than the boot pool, 
> OpenZFS does not automatically import yet, and it uses /etc/zfs/zpool.cache 
> rather than /boot/zfs/zpool.cache to keep track of imported pools.  To ensure 
> all pool imports occur automatically, a simple edit to /etc/rc.d/zfs will 
> suffice:
> 
> diff --git a/libexec/rc/rc.d/zfs b/libexec/rc/rc.d/zfs
> index 2d35f9b5464..8e4aef0b1b3 100755
> --- a/libexec/rc/rc.d/zfs
> +++ b/libexec/rc/rc.d/zfs
> @@ -25,6 +25,13 @@ zfs_start_jail()
> 
> zfs_start_main()
> {
> + local cachefile
> +
> + for cachefile in /boot/zfs/zpool.cache /etc/zfs/zpool.cache; do
> + if [ -f $cachefile ]; then
> + zpool import -c $cachefile -a
> + fi
> + done
>   zfs mount -va
>   zfs share -a
>   if [ ! -r /etc/zfs/exports ]; then
> 
> This will probably not be needed long-term. It is not necessary if the boot 
> pool is the only pool.
> 
> Happy testing :)
> 
> - Ryan
> ___
> freebsd-curr...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"



Re: Slow zfs destroy

2019-12-02 Thread Scott Bennett
Eugene Grosbein  wrote:

> 30.11.2019 0:57, Scott Bennett wrote:
>
> >  On Thu, 28 Nov 2019 23:18:37 +0700 Eugene Grosbein 
> > wrote:
> > 
> >> 28.11.2019 20:34, Steven Hartland wrote:
> >>
> >>> It may well depend on the extent of the deletes occurring.
> >>>
> >>> Have you tried disabling TRIM to see if it eliminates the delay?
> >>
> >> This system used mfi(4) first and mfi(4) does not support TRIM at all. 
> >> Performance was abysmal.
> >> Now it uses mrsas(4) and after switch I ran trim(8) for all SSDs 
> >> one-by-one then re-added them to RAID1.
> >> Disabling TRIM is not an option.
> >>
> >> Almost a year has passed since then and I suspect the SSDs have no or few
> >> spare trimmed cells for some reason.
> >> Is there a documented way to check this out? Maybe some SMART attribute?
> >>
> >  You neglected to state whether you used "zfs destroy datasetname" or
> > "zfs destroy -d datasetname".  If you used the former, then ZFS did what
> > you told it to do.  If you want the data set destroyed in the background,
> > you will need to include the "-d" option in the command.  (See the zfs(8)
> > man page at defer_destroy under "Native Properties".)
>
> The manual says "zfs destroy -d" is not for "background" but for "deferred".
> The "zfs destroy" without -d would return EBUSY for a snapshot on hold
> (zfs hold) or bound with a clone, but "zfs destroy -d" would mark the
> snapshot for later destruction at the moment the clone is deleted or the
> user lock (hold) is lifted.
> Until then the snapshot is still usable and destruction does not happen.
>
> All my snapshots are free from holds or clones and can be deleted,
> so "zfs destroy -d" is equal to "zfs destroy" for them.
>
 What you say is true, and I have seen it accept a "zfs destroy -d" for a
held snapshot but do nothing until the hold is released, whereupon the "destroy"
begins.  However, that cannot be the whole story because...
 The vast majority of my "destroy" operations are for snapshots, but what
I have seen is that, without the "-d", the command does not return until the
disk activity of the "destroy" finishes, but with the "-d", it returns within
a couple of seconds--i.e., just long enough to get the operation going--and
the disk I/Os continue until the work is done and free space in the pool
increases until the I/Os stop.
 Perhaps the man pages for zfs(8) and zpool-features(7) need some
modification/clarification on this matter.
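
 One way to compare the two forms on a given system is sketched below.  It
is only a sketch: the helper names and the example names tank and
tank/data@old are hypothetical, RUN defaults to echo so nothing is destroyed,
and I am assuming that the pool's "freeing" property is the right thing to
watch for space that is still being reclaimed in the background.

```sh
#!/bin/sh
# Dry run by default; set RUN to the empty string to execute for real.
RUN=${RUN-echo}

# timed_destroy SNAPSHOT [FLAGS...]: report how long the command takes
# to return, with or without -d.
timed_destroy() {
    snap=$1; shift
    start=$(date +%s)
    $RUN zfs destroy "$@" "$snap"
    end=$(date +%s)
    echo "returned after $(( end - start ))s"
}

# show_freeing POOL: space still waiting to be reclaimed.
show_freeing() {
    $RUN zpool get freeing "$1"
}
```

Running "timed_destroy tank/data@old" and "timed_destroy tank/data@old -d"
on comparable snapshots, with "show_freeing tank" afterward, should show
whether the difference described above is reproducible.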


  Scott Bennett, Comm. ASMELG, CFIAG


Re: Slow zfs destroy

2019-11-29 Thread Scott Bennett
 On Thu, 28 Nov 2019 23:18:37 +0700 Eugene Grosbein 
wrote:

>28.11.2019 20:34, Steven Hartland wrote:
>
>> It may well depend on the extent of the deletes occurring.
>> 
>> Have you tried disabling TRIM to see if it eliminates the delay?
>
>This system used mfi(4) first and mfi(4) does not support TRIM at all. 
>Performance was abysmal.
>Now it uses mrsas(4) and after switch I ran trim(8) for all SSDs one-by-one 
>then re-added them to RAID1.
>Disabling TRIM is not an option.
>
>Almost a year has passed since then and I suspect the SSDs have no or few spare
>trimmed cells for some reason.
>Is there a documented way to check this out? Maybe some SMART attribute?
>
 You neglected to state whether you used "zfs destroy datasetname" or
"zfs destroy -d datasetname".  If you used the former, then ZFS did what
you told it to do.  If you want the data set destroyed in the background,
you will need to include the "-d" option in the command.  (See the zfs(8)
man page at defer_destroy under "Native Properties".)


  Scott Bennett, Comm. ASMELG, CFIAG


Re: syslogd truncating messages on FreeBSD >11.2-R

2019-11-13 Thread Scott
Thanks all.

Scott

On Wed, Nov 13, 2019 at 05:11:51PM +1100, Kubilay Kocak wrote:
> On 9/11/2019 2:08 pm, Scott wrote:
> > Hi,
> > 
> > please let me know if there is a better forum for this.
> > 
> > As of 11.3-RELEASE syslogd truncates all forwarded messages to 480 bytes for
> > IPv4 and 1180 for IPv6.
> > 
> > The change occurs in the added code:
> > 
> >  switch (f->f_type) {
> >  case F_FORW:
> >          /* Truncate messages to RFC 5426 recommended size. */
> >          dprintf(" %s", f->fu_forw_hname);
> >          switch (f->fu_forw_addr->ai_addr->sa_family) {
> > #ifdef INET
> >          case AF_INET:
> >                  dprintf(":%d\n",
> >                      ntohs(satosin(f->fu_forw_addr->ai_addr)->sin_port));
> >                  iovlist_truncate(il, 480);
> >                  break;
> > #endif
> > 
> > There's more code for IPv6 and the function iovlist_truncate itself.
> > 
> > This change is not turned on by a switch and happens automatically; however,
> > I can't find it documented in UPDATING or the release notes.  I would have
> > thought that any change in default behaviour of the system should at least
> > be documented.
> > 
> > Ideally this change would have been implemented via a switch given that the
> > RFC mentioned in the code (RFC 5426) does not mandate truncation, but
> > recommends it when the network MTU is not known.
> > 
> > What's the best way to reach out to the maintainer to suggest a switch to
> > turn on this code?
> > 
> > Thanks,
> > Scott
> 
> 
> Hi Scott,
> 
> Looks like this came in in base r332510 [1] via:
> 
> https://reviews.freebsd.org/D15011
> 
> The commit log message and review history may provide additional context 
> for the behaviour change (I haven't reviewed them)
> 
> I have CC'd the committer on this email, but in the meantime, and if you 
> wouldn't mind, it is worth reporting the issue via Bugzilla, so that we 
> track it, and so others can find it.
> 
> [1] https://svnweb.freebsd.org/changeset/base/332510
> 
> --
> Regards,
> 
> koobs
> 
> 
> 


Re: kernel bug in 11.3-STABLE causes frequent crashes

2019-11-09 Thread Scott Bennett
Eugene,
 Thank you very much for the fast reply!

Eugene Grosbein  wrote:

> 09.11.2019 19:45, Scott Bennett wrote:
> >  The rest of this message was posted a little while ago to the
> > freebsd-questions list by mistake.  It was intended for freebsd-stable,
> > so I am posting it here now after posting a brief apology on the other
> > list.
> >  I have had to waste a great deal of time lately in recovering my
> > system from crashes due to a kernel bug.  At present, my system is
> > 
> > FreeBSD hellas 11.3-STABLE FreeBSD 11.3-STABLE #12 r352571: Sat Sep 21 
> > 11:39:52 CDT 2019 bennett@hellas:/usr/obj/usr/src/sys/hellas  amd64
> > 
> > There are actually at least two problems, but this particular one has been
> > causing a large portion of my forced reboots.  It usually fails to produce
> > a dump and freezes right after the panic and backtrace messages, as it did
> > earlier tonight, but Wednesday night it did create a dump, which I am
> > keeping in case it should prove helpful in getting the bug identified and
> > solved.  I copied the console messages to paper painstakingly by hand.
> > They appear to be identical each time, except, of course, for the messages
> > that a dump is produced when, indeed, it does produce one.  I am omitting
> > those fairly standard messages.
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 02
> > fault virtual address   = 0x3b8
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0x80a4b14c
> > stack pointer           = 0x0:0xfe012a60ea50
> > frame pointer           = 0x0:0xfe012a60eae0
> > code segment            = base 0x0, limit 0xf, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 28 (flowcleaner)
> > trap number             = 12
> > panic: page fault
> > cpuid = 2
> > KDB: stack backtrace:
> > #0 0x80a94707 at kdb_backtrace+0x67
> > #1 0x80a4fa2e at vpanic+0x17e
> > #2 0x80a4f8a3 at panic+0x43
> > #3 0x80f3a4d0 at trap_pfault+0
> > #4 0x80f3a519 at trap_pfault+0x49
> > #5 0x80f39bad at trap+0x29
> > #6 0x80f19f33 at calltrap+0x8
> > #7 0x80b3bb8d at flowtable_clean_vnet+0x43d
> > #8 0x80b3c758 at flowtable_cleaner+0xc8
> > #9 0x80a12ea2 at fork_exit+0x82
> > #10 0x80f1af4e at fork_trampoline+0xe
> > 
> >  The machine is ancient.  The CPU is a QX9650 (last group of Core 2
> > Quads) with 8 GB of DDR3 memory.
> >  If this can be identified as a known bug and a clue provided to a
> > patch or a safer version to upgrade to, I would be grateful.  I am getting
> > very, very tired of these crashes.
> >  The other forced reboots I will describe in a separate message, but
> > that problem has existed since the time of 11.2-RELEASE and apparently was
> > never investigated, much less fixed, although people began complaining on
> > this list and possibly -questions within the first few days after the
> > release date.
> >  Thanks in advance for any help with this problem!
>
> It seems you have a custom kernel with options FLOWTABLE. The code it includes
> is known to be buggy; this option was removed from GENERIC many releases ago.
> Remove it from your kernel configuration, rebuild the kernel, and you will be fine.
>
 Wonderful.  I have a comment on that line, saying I added it for 8.x, so I
probably found it in 8.1's GENERIC configuration file when I was preparing to
upgrade from 7.3.  It is interesting that it only started hitting me (hard
enough to make me notice it, at least) in 11.3 and maybe a bit earlier in 11.2.
Anyway, that will be easy enough to fix, but will require rolling /usr/src back
to the revision I am running, which is probably also no big deal.
  I don't seem to be able to build it at the current source revision because
11-STABLE's buildworld began failing during the libc build two or three weeks
ago.  I just tried "svn update /usr/src" again, followed by "make -j6
buildworld", and it still fails with this ending.

--- libc_pic.a ---
ranlib -D libc_pic.a
--- libc.a ---
ranlib -D libc.a
--- libc.so.7.full ---
cc: error: unable to execute command: posix_spawn failed: Permission denied
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1

make[4]: stopped in /usr/src/lib/libc
1 error

make[4]: stopped in /usr/src/lib/libc
*** [lib/libc__L] Error code 2

m

kernel bug in 11.3-STABLE causes frequent crashes

2019-11-09 Thread Scott Bennett
 The rest of this message was posted a little while ago to the
freebsd-questions list by mistake.  It was intended for freebsd-stable,
so I am posting it here now after posting a brief apology on the other
list.
 I have had to waste a great deal of time lately in recovering my
system from crashes due to a kernel bug.  At present, my system is

FreeBSD hellas 11.3-STABLE FreeBSD 11.3-STABLE #12 r352571: Sat Sep 21 11:39:52 
CDT 2019 bennett@hellas:/usr/obj/usr/src/sys/hellas  amd64

There are actually at least two problems, but this particular one has been
causing a large portion of my forced reboots.  It usually fails to produce
a dump and freezes right after the panic and backtrace messages, as it did
earlier tonight, but Wednesday night it did create a dump, which I am
keeping in case it should prove helpful in getting the bug identified and
solved.  I copied the console messages to paper painstakingly by hand.
They appear to be identical each time, except, of course, for the messages
that a dump is produced when, indeed, it does produce one.  I am omitting
those fairly standard messages.

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x3b8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80a4b14c
stack pointer           = 0x0:0xfe012a60ea50
frame pointer           = 0x0:0xfe012a60eae0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 28 (flowcleaner)
trap number             = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0x80a94707 at kdb_backtrace+0x67
#1 0x80a4fa2e at vpanic+0x17e
#2 0x80a4f8a3 at panic+0x43
#3 0x80f3a4d0 at trap_pfault+0
#4 0x80f3a519 at trap_pfault+0x49
#5 0x80f39bad at trap+0x29
#6 0x80f19f33 at calltrap+0x8
#7 0x80b3bb8d at flowtable_clean_vnet+0x43d
#8 0x80b3c758 at flowtable_cleaner+0xc8
#9 0x80a12ea2 at fork_exit+0x82
#10 0x80f1af4e at fork_trampoline+0xe

 The machine is ancient.  The CPU is a QX9650 (last group of Core 2
Quads) with 8 GB of DDR3 memory.
 If this can be identified as a known bug and a clue provided to a
patch or a safer version to upgrade to, I would be grateful.  I am getting
very, very tired of these crashes.
 The other forced reboots I will describe in a separate message, but
that problem has existed since the time of 11.2-RELEASE and apparently was
never investigated, much less fixed, although people began complaining on
this list and possibly -questions within the first few days after the
release date.
 Thanks in advance for any help with this problem!


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


syslogd truncating messages on FreeBSD >11.2-R

2019-11-08 Thread Scott
Hi,

please let me know if there is a better forum for this.

As of 11.3-RELEASE syslogd truncates all forwarded messages to 480 bytes for
IPv4 and 1180 for IPv6.

The change occurs in the added code:

    switch (f->f_type) {
    case F_FORW:
        /* Truncate messages to RFC 5426 recommended size. */
        dprintf(" %s", f->fu_forw_hname);
        switch (f->fu_forw_addr->ai_addr->sa_family) {
#ifdef INET
        case AF_INET:
            dprintf(":%d\n",
                ntohs(satosin(f->fu_forw_addr->ai_addr)->sin_port));
            iovlist_truncate(il, 480);
            break;
#endif

There's more code for IPv6 and the function iovlist_truncate itself.

This change is not turned on by a switch and happens automatically; however, I
can't find it documented in UPDATING or the release notes.  I would have
thought that any change in default behaviour of the system should at least be
documented.

Ideally this change would have been implemented via a switch given that the 
RFC mentioned in the code (RFC 5426) does not mandate truncation, but 
recommends it when the network MTU is not known.
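
To make the effect concrete, here is a minimal sketch of truncating a
scatter/gather list to a byte budget.  It is not the syslogd source:
iov_truncate_sketch is a hypothetical stand-in for iovlist_truncate, whose
real implementation is not reproduced in this message, and the two limits are
simply the sizes quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sizes quoted in the message above (RFC 5426 recommendations). */
#define SKETCH_MAX_IPV4 480
#define SKETCH_MAX_IPV6 1180

/*
 * Hypothetical stand-in for syslogd's iovlist_truncate(): shrink the
 * iovec array in place so the total payload is at most `limit` bytes.
 * Returns the number of bytes that remain after truncation.
 */
static size_t
iov_truncate_sketch(struct iovec *iov, int iovcnt, size_t limit)
{
    size_t kept = 0;
    int i;

    assert(iov != NULL && iovcnt >= 0);
    for (i = 0; i < iovcnt; i++) {
        size_t room = limit - kept;  /* kept never exceeds limit */

        if (iov[i].iov_len > room)
            iov[i].iov_len = room;   /* cut this segment short */
        kept += iov[i].iov_len;
    }
    return (kept);
}
```

A switch of the kind suggested here would probably amount to choosing the
limit argument, or skipping the call altogether, based on a command-line
option.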

What's the best way to reach out to the maintainer to suggest a switch to 
turn on this code?

Thanks,
Scott


r352862 with puc serial card driver

2019-10-22 Thread Brian Scott
Hi,

I just updated my old workhorse system to r353746 (11.3-STABLE, i386)
and one of my serial connections (driving a 1-wire network) stopped working.

The card with the serial port is a:

puc0@pci0:4:3:0:        class=0x070002 card=0x40371409 chip=0x71681409 rev=0x01 hdr=0x00
    vendor     = 'Timedia Technology Co Ltd'
    device     = 'PCI2S550 (Dual 16550 UART)'
    class      = simple comms
    subclass   = UART

A ktrace/kdump of the program using the port (around the time when it
gets upset) is:

 71279 WeatherStation CALL  openat(AT_FDCWD,0x29646010,0x6)
 71279 WeatherStation NAMI  "/dev/cuau3"
 71279 WeatherStation RET   openat 3
 71279 WeatherStation CALL  ioctl(0x3,TIOCGETA,0xbfbfe9c8)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCGETA,0xbfbfe9c8)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCSETAF,0xbfbfe9c8)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCFLUSH,0xbfbfe998)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCGETA,0xbfbfe9c4)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCSETAF,0xbfbfe9c4)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCSBRK,0)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  select(0,0,0,0,0xbfbfe9b8)
 71279 WeatherStation RET   select 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCCBRK,0)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  nanosleep(0xbfbfe9f0,0)
 71279 WeatherStation RET   nanosleep 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCFLUSH,0xbfbfe9d8)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  write(0x3,0xbfbfea2d,0x1)
 71279 WeatherStation GIO   fd 3 wrote 1 byte
       0x0000 c1                                        |.|

 71279 WeatherStation RET   write 1
 71279 WeatherStation CALL  ioctl(0x3,TIOCDRAIN,0)
 71279 WeatherStation RET   ioctl -1 errno 35 Resource temporarily unavailable
 71279 WeatherStation CALL  nanosleep(0xbfbfe9f0,0)
 71279 WeatherStation RET   nanosleep 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCFLUSH,0xbfbfe9d8)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  write(0x3,0xbfbfea2d,0x5)
 71279 WeatherStation GIO   fd 3 wrote 5 bytes
       0x0000 1745 5b0f 91                              |.E[..|

 71279 WeatherStation RET   write 5
 71279 WeatherStation CALL  ioctl(0x3,TIOCDRAIN,0)
 71279 WeatherStation RET   ioctl -1 errno 35 Resource temporarily unavailable
 71279 WeatherStation CALL  select(0x4,0xbfbfe968,0,0,0xbfbfe960)
 71279 WeatherStation RET   select 0
 71279 WeatherStation CALL  ioctl(0x3,TIOCSETAF,0x8158b80)
 71279 WeatherStation RET   ioctl -1 errno 35 Resource temporarily unavailable
 71279 WeatherStation CALL  ioctl(0x3,TIOCFLUSH,0xbfbfe9f8)
 71279 WeatherStation RET   ioctl 0
 71279 WeatherStation CALL  close(0x3)
 71279 WeatherStation RET   close 0

So it looks like problems with TIOCDRAIN on the serial port.
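
A minimal sketch of how the failing step might be reproduced outside the
WeatherStation program: write a byte, then wait for the output to drain.  It
assumes tcdrain() ends up as the TIOCDRAIN ioctl seen in the trace, which is
how FreeBSD's libc implements it.  The helper name write_and_drain is
hypothetical and is not taken from the program being traced.

```c
#include <errno.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes to an already-open descriptor
 * and block until the driver reports that the output has drained.
 * Returns 0 on success, or the errno value of the step that failed.
 */
static int
write_and_drain(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n == -1)
        return (errno);
    if ((size_t)n != len)
        return (EIO);       /* short write, reported as an I/O error */
    if (tcdrain(fd) == -1)
        return (errno);     /* the trace shows errno 35 (EAGAIN on FreeBSD) */
    return (0);
}
```

Called on a descriptor opened on /dev/cuau3 with the same termios settings,
a return value of EAGAIN would match the trace above; any other result would
suggest that the surrounding ioctl sequence matters as well.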

Doing some very non-scientific digging around what has changed in recent
times, I found r352862 (sys/dev/uart/uart_dev_ns8250.c) that seems to be
working in this area.
(https://svnweb.freebsd.org/base?view=revision&revision=352862) I backed
that change out and now everything is working properly again.

What I don't know because I didn't dig any further is if this is a
problem with the change or if it's a problem with the puc driver not
supporting fields that are then required by the uart code. I know this
card will be quite old so it may be that I need to upgrade to something
newer at some point. The fact that this change has been in current and
has now been merged back to stable sort of suggests that not many other
people have the same hardware. It also may be that I'm the only person
running a 1-wire network from one of these (very likely) and am
therefore the only one seeing this behaviour.

Any thoughts? Thanks for reading.

Brian



Re: 11-STABLE system unbootable after update

2019-05-19 Thread Scott Bennett
Cyrus Rahman  wrote:

> When you upgrade the kernel you need to upgrade any loadable kernel
> modules at the same time.

 Yes.  For a time, listing them in /etc/src.conf would get them
rebuilt automatically as a sort of epilogue to building a kernel, but then
something changed quite a while back, and I had to comment that line out
and return to doing them manually.  Often they still work without being
rebuilt, and I got careless this time.  Sigh.
>
> Recompile any kmods from ports and you should be ok.

 Done, but the next reboot will be into the previous boot environment,
and it now seems unlikely that I will use this one again rather than
making a new one from a more up-to-date revision.
>
> STABLE doesn't always work.  It's good for adventurous people to try
> it so the bugs get found out, but occasionally it is painful.  Upgrade
> again right after the release in a few weeks.  Usually caution is in
> order just before things get to the BETA stage.  I upgraded just now
> to help test things and discovered your bug, which I posted about a
> few posts before yours on the list.

 Ah.  I suppose I will see that while catching up on my latest email
backlog caused by six days or so with no working system.  Thanks again
for your reply to my call for help.
>
> If the description I posted is similar to yours, go ahead and reply to
> my message on the list, and perhaps go to bugs.freebsd.org and search
> out 'loader', and add any information you might have (or at least
> document the fact that you were affected).

 Will do.
>
> You can quote me on the list, I simply wanted to have you try things
> out before putting my suggestions on it.  Over the years I have grown
> weary of unnecessary noise.
>
 Okay.  I've cc'ed the list this time.
 FWIW, one of the things I have been wishing for in trying new
revisions of 11-STABLE is a fix for the failure of the kernel to honor
the vm.max_wired sysctl variable.  The crash that gave me an opportunity
to try the broken revision was another case of the kernel having pagefixed
so much real memory that it was not only causing paging/swapping when it
should have, but I think the kernel itself couldn't get page frames it
needed fast enough in some situation.  I don't know whether this bug has
been found and fixed yet, though, so I have temporarily returned to
setting vm.kmem_size_max, which does seem to be honored.
 Anyway, thanks for bailing me out yesterday.  The system is now
somewhat usable while I satisfy my curiosity and should be fully usable
upon reversion, which will probably happen tomorrow (Monday) night.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: 11-STABLE system unbootable after update

2019-05-19 Thread Scott Bennett
Matt Garber  wrote:

> On Sun, May 19, 2019 at 9:00 AM Scott Bennett  wrote:
>
> >
> >  Many, many thanks to the person who responded with the solution to
> > get past the
> > loader crash!!  My system is now getting work done again, and the rest of
> > the new
> > problems can be dealt with on a running system.
>
>
> Out of curiosity, you mentioned in your previous email you created a new
> boot environment for this upgrade? Since additional issues remain beyond
> the loader, have you considered rolling back to the known-good BE and
> attempting the entire process again (with another, separate BE) in a week
> or so? Especially since that would hopefully allow you to continue your
> other work without any additional issues or oddities to sort through in the
> meantime?
>
 Yes, and I will likely do so, but not tonight.  I am still exploring
what is new/changed in this revision (besides a broken zfsloader), and I do
have mprime back to work for the time being, so reverting can wait another
day.  I have reverted a few times in the past, but was always able to start
that from the boot menu.  This time really threw me until I received a reply
to my plea for help telling me how to use the previous loader to get to the
boot menu.  I had already successfully run the r347183 kernel in single-user
mode, so I figured I could do so again.  What broke the zfsloader was the
installworld step.
 After reverting to r345498 I will try bringing my source tree up to date,
which would be quite a bit later than this broken r347183, and then run a
fresh buildworld and buildkernel anyway.  If all that fails, then I'll just
go back to r345498 again and sit it out until 11.3-RELEASE happens before
trying again.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: 11-STABLE system unbootable after update

2019-05-19 Thread Scott Bennett
 On Sat, 18 May 2019 08:02:20 -0500 Scott Bennett  wrote:

> I have been running 11.2-STABLE for a while at r345498.  Last weekend it
>crashed, so I took the opportunity to install the most recent build I had
>lying around, r347182.  I created a new boot environment and installed the
>r347182 kernel into it, shut the system down, and rebooted.  The new kernel
>came up and appeared to be working okay, so I continued with the
>mergemaster -p -F, make installworld, and mergemaster -F, then shut it down
>again, and rebooted.  It asked for the GELI key for the boot pool, which I
>then entered.  The spinning slash cursor appeared and may have changed for
>one frame or so, and then I got a message beginning with "BTX" and followed
>by several lines of hexadecimal, and then it stopped.  I tried it again just
>to be sure, and the result was exactly the same.
> Does anyone know whether the PMBR boot block or the loader in the
>freebsd-boot partition changed between
>r345498 and r347182?  I found no warning in /usr/src/UPDATING about
             ^ I wrote the revision down wrong in my notes.  It was really
               r347183.
>installworld potentially leaving a wasted system, so I don't have a clear
>idea of what went wrong, much less whether I missed some instruction
>somewhere about source updates.  If anyone can lend me a clue here, I would
>greatly appreciate it.  I only had one working machine, and now it is only
>working in a "rescue" mode by booting from a DVD.  (Probably needless to
>say, but I will burn new DVDs with up-to-date stuff as soon as my system is
>working the way it is supposed to again.)
> This motherboard is nearly 11 years old and does not boot from USB (in
>spite of what the BIOS menus say), so at the moment I am logged into SDF by
>running a long out-of-date TrueOS installer DVD, which happens to be a pain
>to get to boot all the way, but I've figured out how to make it do it rather
>than get stuck with a logo on the screen that never goes away.
>Unfortunately, it includes no software to burn a CD or DVD, so I cannot make
>a new bootable disk for the time being.  I will check email much later today
>or this evening.

 So far I've received one reply, which was not copied to this list, yet the
person responding suggested something to try and also asked that I post the
result to the list.  I would have done both anyway, but the respondent may have
desired anonymity on the list, so I am not quoting the message I received.
 The suggestion was to wait about a second after entering the GELI passphrase
and then hit the space bar on my keyboard.  At the resulting prompt, I should
enter the path given in the prompt, but with ".old" appended.  I did that, and
YES!!!  It worked and proceeded until I had a boot menu.  I opted for
single-user mode and then responded to further requests for GELI passphrases
until eventually I had a root shell.  Being unable to reach the boot menu was a
problem that hadn't previously even crossed my mind.  I certainly hope that
doing updates from source in the future will not cause this same booby trap
again.
 At that point I renamed /boot/zfsloader to /boot/zfsloader.bad.r347183 and
/boot/zfsloader.old to /boot/zfsloader.  I also added a hard link to the latter
as /boot/zfsloader.good.r345498.
 All is still not well, however.  In multi-user mode, startx turns the screen
black and switches its power setting to standby.  After that it remains
unresponsive until I log in via a different vt and send SIGHUP to xorg.  After
a rather lengthy delay (30-60 seconds, at a guess) it returns to the login
session on the console vt.  I have now commented out the
kld_list="/boot/modules/radeonkms.ko" line in /etc/rc.conf.local in hopes that
the next boot will get the scfb driver to take it instead of the radeonkms
driver from graphics/drm-next-kmod.  If someone knows, is this a case where
rebuilding and reinstalling graphics/drm-next-kmod would help?  If so, then I
will do that, but I see that the Makefile still appears to use the same
distribution file.
 Many, many thanks to the person who responded with the solution to get past
the loader crash!!  My system is now getting work done again, and the rest of
the new problems can be dealt with on a running system.


  Scott Bennett, Comm. ASMELG, CFIAG

11-STABLE system unbootable after update

2019-05-18 Thread Scott Bennett
 I have been running 11.2-STABLE for a while at r345498.  Last weekend it
crashed, so I took the opportunity to install the most recent build I had lying
around, r347182.  I created a new boot environment and installed the r347182
kernel into it, shut the system down, and rebooted.  The new kernel came up and
appeared to be working okay, so I continued with the mergemaster -p -F, make
installworld, and mergemaster -F, then shut it down again, and rebooted.  It
asked for the GELI key for the boot pool, which I then entered.  The spinning
slash cursor appeared and may have changed for one frame or so, and then I got
a message beginning with "BTX" and followed by several lines of hexadecimal,
and then it stopped.  I tried it again just to be sure, and the result was
exactly the same.
 Does anyone know whether the PMBR boot block or the loader in the
freebsd-boot partition changed between r345498 and r347182?  I found no warning
in /usr/src/UPDATING about installworld potentially leaving a wasted system, so
I don't have a clear idea of what went wrong, much less whether I missed some
instruction somewhere about source updates.  If anyone can lend me a clue here,
I would greatly appreciate it.  I only had one working machine, and now it is
only working in a "rescue" mode by booting from a DVD.  (Probably needless to
say, but I will burn new DVDs with up-to-date stuff as soon as my system is
working the way it is supposed to again.)
 This motherboard is nearly 11 years old and does not boot from USB (in spite
of what the BIOS menus say), so at the moment I am logged into SDF by running a
long out-of-date TrueOS installer DVD, which happens to be a pain to get to
boot all the way, but I've figured out how to make it do it rather than get
stuck with a logo on the screen that never goes away.  Unfortunately, it
includes no software to burn a CD or DVD, so I cannot make a new bootable disk
for the time being.  I will check email much later today or this evening.
 Thanks in advance for any helpful ideas!


  Scott Bennett, Comm. ASMELG, CFIAG


Re: route based ipsec

2019-05-04 Thread Scott Aitken
> On 5/2/2019 4:16 PM, KOT MATPOCKuH wrote:
> > 0.The ipsec-tools port currently does not have a maintainer (C) portmaster
> > ... Does this solution really supported? Or I should switch to use
> > another IKE daemon?

I've just started using IPSEC between a 12.0-RELEASE box, a 11.2-RELEASE-p9
box and a Cisco IOS router.

I haven't seen any core dumps or crashes.  I run routing between these
devices (using RIPv2 rather than OSPF) - in order to do this you need to
create tunnels between the devices because encrypting routing protocols and
things that use multicast is tricky.  I felt that the handbook example
was lacking - it should have been encrypting the tunnel endpoints and NOT the
LAN traffic on either side of the tunnel.

Anyway I built IPENCAP (aka IPinIP) tunnels using gif interfaces and
configured racoon/ipsec-tools to build the SA/SADs using the tunnel endpoints
and IP protocol 4 (IPENCAP).

Step 1 was to confirm I could PING over the gif tunnel without crypto.  Then
I fired up racoon (setkey to create the SA and racoon for IPSEC).

If you want the configs let me know.

Scott


Re: MSI allocation regression, still to be corrected in HEAD and please MFC before release/12.0 gets branched

2018-11-13 Thread Scott Long


> On Nov 13, 2018, at 11:11 AM, Harry Schmalzbauer  wrote:
> 
> Am 13.11.2018 um 19:02 schrieb Scott Long:
>>> On Nov 12, 2018, at 10:03 AM, Harry Schmalzbauer  wrote:
>>> 
>>> Am 11.06.2018 um 20:28 schrieb Harry Schmalzbauer:
>>>> Am 05.06.2018 um 19:54 schrieb Scott Long:
>>>> …
>>>>>>>> Late in the 11.2 phase, I identified this commit as a regression for
>>>>>>>> MSI (non-x) allocation.
>>>>>>>> I have an idea what probably causes the problem here (INTx allocation,
>>>>>>>> although MSI (and MSI-x) capability):
>>>>>>>> disable_msix is not 0 (I need to disable MSI-x because of
>>>>>>>> ESXi-passthru…).
>>>>>>>> 
>>>>>>>> Corresponding lines:
>>>>>>>> {
>>>>>>>>     device_t dev;
>>>>>>>>     int error, msgs;
>>>>>>>>
>>>>>>>>     dev = sc->mps_dev;
>>>>>>>>     error = 0;
>>>>>>>>     msgs = 0;
>>>>>>>>
>>>>>>>>     if ((sc->disable_msix == 0) &&
>>>>>>>>         ((msgs = pci_msix_count(dev)) >= MPS_MSI_COUNT))
>>>>>>>>         error = mps_alloc_msix(sc, MPS_MSI_COUNT);
>>>>>>>>     if ((error != 0) && (sc->disable_msi == 0) &&
>>>>>>>>         ((msgs = pci_msi_count(dev)) >= MPS_MSI_COUNT))
>>>>>>>>         error = mps_alloc_msi(sc, MPS_MSI_COUNT);
>>>>>>>>     if (error != 0)
>>>>>>>>         msgs = 0;
>>>>>>>>
>>>>>>>>     sc->msi_msgs = msgs;
>>>>>>>>     return (error);
>>>>>>>> }
>>>>>>>> 
>>> …
>>>>>>> Hi Harry,
>>>>>>> You are correct about the bug.  Please change the line at the top of 
>>>>>>> the function that reads
>>>>>>> error = 0;
>>>>>>> to
>>>>>>> error = ENXIO;
>>>>>>> Let me know if that fixes the MSI problem for you.
>>>> 
>>>> …
>>>> 
>>> …
>>>> Index: src/sys/dev/mps/mps_pci.c
>>>> ===================================================================
>>>> --- sys/dev/mps/mps_pci.c   (revision 334948)
>>>> +++ sys/dev/mps/mps_pci.c   (working copy)
>>>> @@ -244,7 +244,7 @@
>>>>         int error, msgs;
>>>>
>>>>         dev = sc->mps_dev;
>>>> -       error = 0;
>>>> +       error = ENXIO;
>>>>         msgs = 0;
>>>>
>>>>         if ((sc->disable_msix == 0) &&
>>>> 
>>> 
>>> To my understanding, it's obvious that the way mps_pci_alloc_interrupts() 
>>> currently works is unintended.
>>> This might not affect too many people, but is there a reason not to fix it?
>>> 
>>> I already created a corresponding problem report:
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229267
>>> Anything else I should do?
>>> 
>> Hi Harry,
>> Sorry for ignoring this for so long.  I’m going to commit a fix today, but 
>> it won’t be the same one-line change.
>> Upon reviewing the code, I’m going to refactor it so it’s not so confusing
>> and prone to these kinds of mistakes.
>> Thank you for the continued reminders to finish this.
> 
> Hi Scott,
> 
> thanks a lot, in fact I'm not surprised that you come up with a better 
> solution than that quick fix :-)
> Had hoped someone else would do an intermediate commit to get it into 12.0 in 
> time, so you won't feel any time pressure - good job needs the time it needs, 
> as long as the right person is doing the job.
> 
> Unfortunately I don't have a non-productive setup where I could test before 
> release/12.0 will be branched – might be subject to change...

12.0 has completely different code from 11.x, and from my review of it last 
night it should be fine.  If you have evidence that what’s currently in 12 is 
not working, please let me know ASAP.

Scott



Re: MSI allocation regression, still to be corrected in HEAD and please MFC before release/12.0 gets branched

2018-11-13 Thread Scott Long


> On Nov 12, 2018, at 10:03 AM, Harry Schmalzbauer  wrote:
> 
> Am 11.06.2018 um 20:28 schrieb Harry Schmalzbauer:
>> Am 05.06.2018 um 19:54 schrieb Scott Long:
>> …
>>>>>> Late in the 11.2 phase, I identified this commit as a regression for MSI
>>>>>> (non-x) allocation.
>>>>>> I have an idea what probably causes the problem here (INTx allocation,
>>>>>> although MSI (and MSI-x) capability):
>>>>>> disable_msix is not 0 (I need to disable MSI-x because of
>>>>>> ESXi-passthru…).
>>>>>> 
>>>>>> Corresponding lines:
>>>>>> {
>>>>>>     device_t dev;
>>>>>>     int error, msgs;
>>>>>>
>>>>>>     dev = sc->mps_dev;
>>>>>>     error = 0;
>>>>>>     msgs = 0;
>>>>>>
>>>>>>     if ((sc->disable_msix == 0) &&
>>>>>>         ((msgs = pci_msix_count(dev)) >= MPS_MSI_COUNT))
>>>>>>         error = mps_alloc_msix(sc, MPS_MSI_COUNT);
>>>>>>     if ((error != 0) && (sc->disable_msi == 0) &&
>>>>>>         ((msgs = pci_msi_count(dev)) >= MPS_MSI_COUNT))
>>>>>>         error = mps_alloc_msi(sc, MPS_MSI_COUNT);
>>>>>>     if (error != 0)
>>>>>>         msgs = 0;
>>>>>>
>>>>>>     sc->msi_msgs = msgs;
>>>>>>     return (error);
>>>>>> }
>>>>>> 
> …
>>>>> Hi Harry,
>>>>> You are correct about the bug.  Please change the line at the top of the 
>>>>> function that reads
>>>>> error = 0;
>>>>> to
>>>>> error = ENXIO;
>>>>> Let me know if that fixes the MSI problem for you.
>> 
>> …
>> 
> …
>> Index: src/sys/dev/mps/mps_pci.c
>> ===================================================================
>> --- sys/dev/mps/mps_pci.c   (revision 334948)
>> +++ sys/dev/mps/mps_pci.c   (working copy)
>> @@ -244,7 +244,7 @@
>>         int error, msgs;
>> 
>>         dev = sc->mps_dev;
>> -       error = 0;
>> +       error = ENXIO;
>>         msgs = 0;
>> 
>>         if ((sc->disable_msix == 0) &&
>> 
> 
> To my understanding, it's obvious that the way mps_pci_alloc_interrupts() 
> currently works is unintended.
> This might not affect too many people, but is there a reason not to fix it?
> 
> I already created a corresponding problem report: 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229267
> Anything else I should do?
> 

Hi Harry,

Sorry for ignoring this for so long.  I’m going to commit a fix today, but it 
won’t be the same one-line change.
Upon reviewing the code, I’m going to refactor it so it’s not so confusing and 
prone to these kinds of mistakes.
Thank you for the continued reminders to finish this.

Scott
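
For readers following the thread, the fall-through Harry describes can be
modeled in user space.  What follows is only a sketch under stated
assumptions, not the driver code and not the refactoring Scott says he will
commit: the sketch_ names, the struct fields, and the count of 1 are all
hypothetical stand-ins for the kernel's softc, pci_msix_count(),
pci_msi_count(), and MPS_MSI_COUNT.  It keeps the same three if statements as
the quoted mps_pci_alloc_interrupts() body and takes the initial value of
error as a parameter, so the effect of starting from 0 versus ENXIO can be
compared directly.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for MPS_MSI_COUNT; the value 1 is an assumption. */
#define SKETCH_MSI_COUNT 1

/* Hypothetical user-space model of the fields the quoted code touches. */
struct sketch_softc {
	bool disable_msix;	/* models sc->disable_msix != 0 */
	bool disable_msi;	/* models sc->disable_msi != 0 */
	int  msix_avail;	/* what pci_msix_count() would report */
	int  msi_avail;		/* what pci_msi_count() would report */
	int  msi_msgs;		/* result: 0 means no MSI/MSI-X, i.e. INTx */
};

/* Stub allocators that always succeed, to isolate the control flow. */
static int
sketch_alloc_msix(struct sketch_softc *sc, int n)
{
	(void)sc; (void)n;
	return (0);
}

static int
sketch_alloc_msi(struct sketch_softc *sc, int n)
{
	(void)sc; (void)n;
	return (0);
}

/*
 * Same shape as the quoted function.  initial_error is 0 for the
 * behaviour Harry reported and ENXIO for the suggested one-line change.
 */
static int
sketch_alloc_interrupts(struct sketch_softc *sc, int initial_error)
{
	int error, msgs;

	error = initial_error;
	msgs = 0;

	if (!sc->disable_msix &&
	    ((msgs = sc->msix_avail) >= SKETCH_MSI_COUNT))
		error = sketch_alloc_msix(sc, SKETCH_MSI_COUNT);
	/* With MSI-X disabled and error still 0, this branch is skipped. */
	if ((error != 0) && !sc->disable_msi &&
	    ((msgs = sc->msi_avail) >= SKETCH_MSI_COUNT))
		error = sketch_alloc_msi(sc, SKETCH_MSI_COUNT);
	if (error != 0)
		msgs = 0;

	sc->msi_msgs = msgs;
	return (error);
}
```

In this model, with disable_msix set and MSI available, an initial error of 0
returns success with msi_msgs left at 0, which corresponds to the INTx
fallback Harry observed.  Starting from ENXIO lets the second test run, so MSI
is tried and msi_msgs ends up at 1.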




Re: 12.0-BETA3 isp(4): exclusive sleep mutex CAM device lock (CAM device lock)

2018-11-02 Thread Scott Long
It’s harmless unless you run your machine under severe and sustained memory 
exhaustion.

Scott


> On Nov 2, 2018, at 8:15 AM, Harry Schmalzbauer  wrote:
> 
> Hello,
> 
> unfortunately I can't determine if this is benign, so I'd like to ask the 
> experts:
> 
> uma_zalloc_arg: zone "64" with the following non-sleepable locks held:
> exclusive sleep mutex CAM device lock (CAM device lock) r = 0 
> (0xf8000f1424d0) locked @ 
> /usr/local/share/deploy-tools/RELENG_12/src/sys/cam/cam_xpt.c:4309
> stack backtrace:
> #0 0x8060ead3 at witness_debugger+0x73
> #1 0x8060fa48 at witness_warn+0x448
> #2 0x808b6f18 at uma_zalloc_arg+0x38
> #3 0x8058773a at malloc+0x9a
> #4 0x8036f423 at nvlist_create+0x23
> #5 0x8030d597 at ctl_port_register+0x187
> #6 0x8031aa25 at ctlfeasync+0x405
> #7 0x802ca772 at xpt_async_process_dev+0x162
> #8 0x802c603d at xpt_async_process+0x15d
> #9 0x802c697e at xpt_done_process+0x35e
> #10 0x802c8a76 at xpt_done_td+0xf6
> #11 0x8056e094 at fork_exit+0x84
> #12 0x808f46de at fork_trampoline+0xe
> uma_zalloc_arg: zone "64" with the following non-sleepable locks held:
> exclusive sleep mutex CAM device lock (CAM device lock) r = 0 
> (0xf8000f1414d0) locked @ 
> /usr/local/share/deploy-tools/RELENG_12/src/sys/cam/cam_xpt.c:4309
> stack backtrace:
> #0 0x8060ead3 at witness_debugger+0x73
> #1 0x8060fa48 at witness_warn+0x448
> #2 0x808b6f18 at uma_zalloc_arg+0x38
> #3 0x8058773a at malloc+0x9a
> #4 0x8036f423 at nvlist_create+0x23
> #5 0x8030d597 at ctl_port_register+0x187
> #6 0x8031aa25 at ctlfeasync+0x405
> #7 0x802ca772 at xpt_async_process_dev+0x162
> #8 0x802c603d at xpt_async_process+0x15d
> #9 0x802c697e at xpt_done_process+0x35e
> #10 0x802c8a76 at xpt_done_td+0xf6
> #11 0x8056e094 at fork_exit+0x84
> #12 0x808f46de at fork_trampoline+0xe
> 
> Thanks for hints,
> 
> -harry
> 



Re: why does buildworld fail on stable/11 ?

2018-01-28 Thread Scott Bennett
Ian Lepore  wrote:

> On Fri, 2018-01-26 at 09:52 +, Holger Kipp wrote:
> > Dear Scott,
> > 
> > On 26.01.2018 at 09:07, Scott Bennett wrote:
> > 
> > cd /usr/src; PATH=/sbin:/bin:/usr/sbin:/usr/bin MAKE_CMD=make 
> > /usr/obj/usr/src/make.amd64/bmake -m /usr/src/share/mk -f Makefile.inc1 
> > TARGET=amd64 TARGET_ARCH=amd64 MK_META_MODE=no cleandir
> > bmake: illegal argument to d option -- p
> > usage: make [-BPSXeiknpqrstv] [-C directory] [-D variable]
> > [-d flags] [-E variable] [-f makefile] [-I directory]
> > [-j max_jobs] [-m directory] [-V variable]
> > [variable=value] [target ...]
> > *** Error code 2
> > 
> > Stop.
> > make: stopped in /usr/src
> > hellas# exit
> > exit
> > 
> > Script done on Fri Jan 26 01:33:18 2018
> > 
> > 
> > Scott Bennett, Comm. ASMELG, CFIAG
> > 
> > This sounds similar to an issue with make in 2013:
> > 
> > 20130613:
> > Some people report the following error after the switch to bmake:
> > 
> > make: illegal option -- J
> > usage: make [-BPSXeiknpqrstv] [-C directory] [-D variable]
> > ...
> > *** [buildworld] Error code 2
> > 
> > this likely due to an old instance of make in
> > ${MAKEPATH} (${MAKEOBJDIRPREFIX}${.CURDIR}/make.${MACHINE})
> > which src/Makefile will use that blindly, if it exists, so if
> > you see the above error:
> > 
> > rm -rf `make -V MAKEPATH`
> > 
> > should resolve it.
> > 
> > Can you check if you have an older version of make in your makepath and 
> > delete / rename it?
> > 
>
> Yep, it's definitely running a bad old version of make, and the thought
> that it's using /usr/obj/usr/src/make.amd64/bmake even though it's not

 Aack!!  "make buildworld" doesn't kill that??  Wow.  Why does it get
missed by buildworld (or cleanworld, if that's what buildworld uses)?  Should
that get a PR?  That program must have been sitting there for several *years*.
 The irony here is that I have long treated /usr/obj as disposable
(i.e., I don't normally bother to back it up anywhere) because a) a "make
buildworld buildkernel" will recreate all of it, and b) both of those targets
include huge sequences of deletions that wipe out all existing versions of
stuff that they will create.  Or so I have thought until now.  Apparently,
the handbook needs to be updated to reflect the need to

/bin/rm -rf /usr/obj/usr

(or use newfs) before each buildworld just for safety's sake.

> up to date fits the symptoms.  I'm a bit confused by the "rm -rf"

 Yes, it certainly does.  A quick newfs of the device where I keep
/usr/obj, and it seems to work perfectly now.  I'm almost certain that this
same obsolete binary was what put a sudden halt to my updates of 10.3-STABLE
back in early October 2016, as well.

> command at the end... when I do make -V MAKEPATH I get nothing, so the
> rm command would just be an error -- since that's from UPDATING in
> 2013, I'm thinking it may be out of date advice now.
>
> I think the right fix here is probably "rm -rf /usr/obj/*" followed by
> a make buildworld.
>
 Well, in this case, /usr/obj is a mount point for a UFS2 file system,
so it's less messy and much faster just to newfs it and mount it again.
 Anyway, thanks ever so much to Holger Kipp and Ian Lepore for finding
the problem, and thanks also to everyone else who tried.  The buildworld
(complete with ccache) has completed, and right now a kernel (GENERIC except
for SCHED_4BSD instead of the wretched SCHED_ULE) is busily being built.
Then I will try a much more customized kernel, but I no longer expect any
serious obstacles, thanks entirely to the help I got here on this list.


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


Re: why does buildworld fail on stable/11 ?

2018-01-26 Thread Scott Bennett
Ian Lepore  wrote:

> On Thu, 2018-01-25 at 00:25 -0600, Scott Bennett wrote:
> > Ian Lepore  wrote:
> > 
> > > 
> > > On Wed, 2018-01-24 at 12:39 +0100, Dimitry Andric wrote:
> > > > 
> > > > On 24 Jan 2018, at 09:51, Scott Bennett  wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > Subject: Re: why does buildworld fail on stable/11 ?
> > > > > 
> > > > > I wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, 22 Jan 2018 12:42:58 + lists  wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 22/01/2018 09:17, Scott Bennett wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Anyway, I'm stuck.  Can someone please tell me what is going 
> > > > > > > > wrong and
> > > > > > > > how to fix it?  I'd really like to be able to update my system, 
> > > > > > > > not only to
> > > > > > > > keep it reasonably current, but also to be able to customize a 
> > > > > > > > kernel.  Thanks
> > > > > > > > in advance for any suggestions/solutions.
> > > > > > [much deleted  --SB]
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > then
> > > > > > > 
> > > > > > > [/usr/src #] make cleandir && make clean && make buildworld && 
> > > > > > > make
> > > > > > > buildkernel && make installkernel && mergemaster -p
> > > > > > At this point, that looks very optimistic, to say the least. 
> > > > > > :-)  I've
> > > > > > tried "make cleanworld" (with /etc/make.conf still in place), and 
> > > > > > it failed
> > > > > > exactly like the buildworld example I posted before.
> > > > > Okay.  Here's what happened.
> > > > > 
> > > > > Script started on Wed Jan 24 02:17:30 2018
> > > > > hellas#   mv /etc/make.conf{,.save}
> > > > > hellas#   mv /etc/src.conf{,.save}
> > > > > hellas#   cd /usr/src
> > > > > hellas#   make cleandir
> > > > > "/usr/src/share/mk/local.sys.mk", line 51: Malformed conditional 
> > > > > (${.MAKE.MODE:Mmeta*} != "")
> > > > > "/usr/src/share/mk/local.sys.mk", line 58: Malformed conditional 
> > > > > (${.MAKE.MODE:Mnofilemon} == "")
> > > > > "/usr/src/share/mk/local.sys.mk", line 76: if-less else
> > > > > "/usr/src/share/mk/local.sys.mk", line 79: if-less endif
> > > > > "/usr/src/share/mk/sys.mk", line 476: if-less endif
> > > > > bmake: fatal errors encountered -- cannot continue
> > > > Looks like your make is broken.  What is the output of "which make"?
> > > > 
> > > > -Dimitry
> > > > 
> > > And also the output of "make -V MAKE_VERSION".  To me, this looks a lot
> > > like what happens when you try to use old fmake from freebsd 8 to build
> > > modern freebsd source.
> > > 
> > hellas# make -V MAKE_VERSION
> > 20170720
> > hellas#
>
> Well, that kills the wrong-version theory.  The thing I would try next
> is setting some make debug flags, but it'll generate a ton of output.
>
> I'd start with "make -dlp cleandir" that should list everything it's
> doing while reading makefiles, and list any commands it executes.
> Capture that output (stdout and stderr), then a good first-look at the
> file might be something like "grep -v ParseReadLine make.log", that
> should show us what files it's reading from which directories.  My
> theory is maybe it's picking up a wrong include file somehow which
> leads it astray.  If that's not it, we may need to also examine all the
> ParseReadLine stuff, or add some other debug flags.
>
 Okay.  If you really want all of it, let me know, and I'll email it
to you directly.  Here are the last 50 lines or so of it.

# .MAIN, flags 0, type 1, made 0
# _guard, flags 0, type 0, made 0
ParseDoDependency(_guard: .PHONY)
ParseDoDependency(world: .PHONY)
ParseDoDependency(kernel: bu

Re: why does buildworld fail on stable/11 ?

2018-01-24 Thread Scott Bennett
Ian Lepore  wrote:

> On Wed, 2018-01-24 at 12:39 +0100, Dimitry Andric wrote:
> > On 24 Jan 2018, at 09:51, Scott Bennett  wrote:
> > > 
> > > 
> > > Subject: Re: why does buildworld fail on stable/11 ?
> > > 
> > > I wrote:
> > > > 
> > > > On Mon, 22 Jan 2018 12:42:58 + lists  wrote:
> > > > > 
> > > > > On 22/01/2018 09:17, Scott Bennett wrote:
> > > > > > 
> > > > > > Anyway, I'm stuck.  Can someone please tell me what is going 
> > > > > > wrong and
> > > > > > how to fix it?  I'd really like to be able to update my system, not 
> > > > > > only to
> > > > > > keep it reasonably current, but also to be able to customize a 
> > > > > > kernel.  Thanks
> > > > > > in advance for any suggestions/solutions.
> > > [much deleted  --SB]
> > > > 
> > > > > 
> > > > > then
> > > > > 
> > > > > [/usr/src #] make cleandir && make clean && make buildworld && make
> > > > > buildkernel && make installkernel && mergemaster -p
> > > > At this point, that looks very optimistic, to say the least. 
> > > > :-)  I've
> > > > tried "make cleanworld" (with /etc/make.conf still in place), and it 
> > > > failed
> > > > exactly like the buildworld example I posted before.
> > > Okay.  Here's what happened.
> > > 
> > > Script started on Wed Jan 24 02:17:30 2018
> > > hellas#   mv /etc/make.conf{,.save}
> > > hellas#   mv /etc/src.conf{,.save}
> > > hellas#   cd /usr/src
> > > hellas#   make cleandir
> > > "/usr/src/share/mk/local.sys.mk", line 51: Malformed conditional 
> > > (${.MAKE.MODE:Mmeta*} != "")
> > > "/usr/src/share/mk/local.sys.mk", line 58: Malformed conditional 
> > > (${.MAKE.MODE:Mnofilemon} == "")
> > > "/usr/src/share/mk/local.sys.mk", line 76: if-less else
> > > "/usr/src/share/mk/local.sys.mk", line 79: if-less endif
> > > "/usr/src/share/mk/sys.mk", line 476: if-less endif
> > > bmake: fatal errors encountered -- cannot continue
> > Looks like your make is broken.  What is the output of "which make"?
> > 
> > -Dimitry
> > 
>
> And also the output of "make -V MAKE_VERSION".  To me, this looks a lot
> like what happens when you try to use old fmake from freebsd 8 to build
> modern freebsd source.
>
hellas# make -V MAKE_VERSION
20170720
hellas#


  Scott Bennett, Comm. ASMELG, CFIAG


Re: why does buildworld fail on stable/11 ?

2018-01-24 Thread Scott Bennett
tech-lists  wrote:

> On 24/01/2018 08:51, Scott Bennett wrote:
> > hellas# mv /etc/make.conf{,.save}
> > hellas# mv /etc/src.conf{,.save}
> > hellas# cd /usr/src
> > hellas# make cleandir
> > "/usr/src/share/mk/local.sys.mk", line 51: Malformed conditional 
> > (${.MAKE.MODE:Mmeta*} != "")
> > "/usr/src/share/mk/local.sys.mk", line 58: Malformed conditional 
> > (${.MAKE.MODE:Mnofilemon} == "")
> > "/usr/src/share/mk/local.sys.mk", line 76: if-less else
> > "/usr/src/share/mk/local.sys.mk", line 79: if-less endif
> > "/usr/src/share/mk/sys.mk", line 476: if-less endif
> > bmake: fatal errors encountered -- cannot continue
> > *** Error code 1
>
> Move the /usr/src somewhere else, make a new /usr/src dir and then
>
> svnlite co https://svn.FreeBSD.org/base/stable/11 /usr/src
>
> make sure it completes normally
>
> then cd into it and make cleandir. What happens?

 Before I do that, please answer my earlier question regarding svnlite
vs. svn.  The make is failing on a clean checkout of /usr/src, as I stated
before.  The only difference is that I used svn to do the checkout, not
svnlite.  If they give identical checkout output, then repeating that rather
lengthy download would give an identical result and thus would serve no
purpose.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: why does buildworld fail on stable/11 ?

2018-01-24 Thread Scott Bennett
Dimitry Andric  wrote:

> On 24 Jan 2018, at 09:51, Scott Bennett  wrote:
> > 
> > Subject: Re: why does buildworld fail on stable/11 ?
> > 
> > I wrote:
> >>On Mon, 22 Jan 2018 12:42:58 + lists  wrote:
> >>> On 22/01/2018 09:17, Scott Bennett wrote:
> >>>>Anyway, I'm stuck.  Can someone please tell me what is going wrong and
> >>>> how to fix it?  I'd really like to be able to update my system, not only 
> >>>> to
> >>>> keep it reasonably current, but also to be able to customize a kernel.  
> >>>> Thanks
> >>>> in advance for any suggestions/solutions.
> >>> 
> > [much deleted  --SB]
> >>> then
> >>> 
> >>> [/usr/src #] make cleandir && make clean && make buildworld && make
> >>> buildkernel && make installkernel && mergemaster -p
> >> 
> >>At this point, that looks very optimistic, to say the least. :-)  I've
> >> tried "make cleanworld" (with /etc/make.conf still in place), and it failed
> >> exactly like the buildworld example I posted before.
> > 
> > Okay.  Here's what happened.
> > 
> > Script started on Wed Jan 24 02:17:30 2018
> > hellas# mv /etc/make.conf{,.save}
> > hellas# mv /etc/src.conf{,.save}
> > hellas# cd /usr/src
> > hellas# make cleandir
> > "/usr/src/share/mk/local.sys.mk", line 51: Malformed conditional 
> > (${.MAKE.MODE:Mmeta*} != "")
> > "/usr/src/share/mk/local.sys.mk", line 58: Malformed conditional 
> > (${.MAKE.MODE:Mnofilemon} == "")
> > "/usr/src/share/mk/local.sys.mk", line 76: if-less else
> > "/usr/src/share/mk/local.sys.mk", line 79: if-less endif
> > "/usr/src/share/mk/sys.mk", line 476: if-less endif
> > bmake: fatal errors encountered -- cannot continue
>
> Looks like your make is broken.  What is the output of "which make"?
>
hellas# echo $PATH
/usr/local/libexec/ccache:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/root/bin
hellas# which make
/usr/bin/make
hellas# file /usr/bin/make
/usr/bin/make: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), 
statically linked, for FreeBSD 11.1 (1101506), FreeBSD-style, stripped
hellas# 

 Thanks, both of you, for something to try.  With enough ideas, maybe
the problem can be uncovered.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: why does buildworld fail on stable/11 ?

2018-01-24 Thread Scott Bennett
Subject: Re: why does buildworld fail on stable/11 ?

 I wrote:
> On Mon, 22 Jan 2018 12:42:58 + lists  wrote:
>>On 22/01/2018 09:17, Scott Bennett wrote:
>>> Anyway, I'm stuck.  Can someone please tell me what is going wrong and
>>> how to fix it?  I'd really like to be able to update my system, not only to
>>> keep it reasonably current, but also to be able to customize a kernel.  
>>> Thanks
>>> in advance for any suggestions/solutions.
>>
 [much deleted  --SB]
>>then
>>
>>[/usr/src #] make cleandir && make clean && make buildworld && make
>>buildkernel && make installkernel && mergemaster -p
>
> At this point, that looks very optimistic, to say the least. :-)  I've
>tried "make cleanworld" (with /etc/make.conf still in place), and it failed
>exactly like the buildworld example I posted before.

 Okay.  Here's what happened.

Script started on Wed Jan 24 02:17:30 2018
hellas# mv /etc/make.conf{,.save}
hellas# mv /etc/src.conf{,.save}
hellas# cd /usr/src
hellas# make cleandir
"/usr/src/share/mk/local.sys.mk", line 51: Malformed conditional 
(${.MAKE.MODE:Mmeta*} != "")
"/usr/src/share/mk/local.sys.mk", line 58: Malformed conditional 
(${.MAKE.MODE:Mnofilemon} == "")
"/usr/src/share/mk/local.sys.mk", line 76: if-less else
"/usr/src/share/mk/local.sys.mk", line 79: if-less endif
"/usr/src/share/mk/sys.mk", line 476: if-less endif
bmake: fatal errors encountered -- cannot continue
*** Error code 1

Stop.
make: stopped in /usr/src
hellas# exit
exit

Script done on Wed Jan 24 02:19:04 2018

 Does anyone have any idea what is so horribly broken in 11.1-STABLE?
FWIW, this is on a ZFS installation done by the utterly braindead bsdinstall
provided in stable/11 ISO images.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: why does buildworld fail on stable/11 ?

2018-01-24 Thread Scott Bennett
Subject: Re: why does buildworld fail on stable/11 ?

 On Mon, 22 Jan 2018 12:42:58 + lists  wrote:
>On 22/01/2018 09:17, Scott Bennett wrote:
>> Anyway, I'm stuck.  Can someone please tell me what is going wrong and
>> how to fix it?  I'd really like to be able to update my system, not only to
>> keep it reasonably current, but also to be able to customize a kernel.  
>> Thanks
>> in advance for any suggestions/solutions.
>
>Hi,

 Thank you for responding with your thoughts on this.
>
>What I'd do is firstly to make things as simple as possible. First do
>the upgrade simply. Either delete or call make.conf & src.conf something

 Okay.

>else. Then get a fresh src  and ports tree via svnlite. Then do

/usr/src has not been altered since I ran the checkout.  Is there some reason
to use svnlite rather than svn, which is how I did the checkout?
>
>rm -rf /buildwork/ccache.freebsd
>mkdir /buildwork/ccache.freebsd

 I'm not following your thinking here.  If I've eliminated /etc/make.conf
from the picture, then ccache is not involved at all, so why should I wipe out
the cache contents?
>
>then
>
>[/usr/src #] make cleandir && make clean && make buildworld && make
>buildkernel && make installkernel && mergemaster -p

 At this point, that looks very optimistic, to say the least. :-)  I've
tried "make cleanworld" (with /etc/make.conf still in place), and it failed
exactly like the buildworld example I posted before.
>
>[make changes if needed]
>
>[/usr/src #] make installworld && mergemaster
>
>[make more changes if needed]
>
>reboot
>
>then cd into /usr/src as root, then do
>
>yes | make delete-old
>yes | make delete-old-libs

 What do you expect the above to accomplish on a freshly installed system
on which no obsolete directories or libraries from a previous release should
exist?
>
>reboot again
>
>then re-enable your extra lines in make.conf and src.conf, if you need them.
>
>It may be down to something in the ccache dir, I guess.
>
 I will try it with no /etc/{make,src}.conf then just to find out what
will happen, but I will still need a way to build from source with ccache
involved in the process for normal use because it typically cuts the build
times by 50% - 75% from the usual five or six hours elapsed time.
 Thanks again for your suggestions.  I will try it without
/etc/{make,src}.conf and report back.


  Scott Bennett, Comm. ASMELG, CFIAG


why does buildworld fail on stable/11 ?

2018-01-22 Thread Scott Bennett
 I tried asking for help on this several days ago on freebsd-questions@,
but got no responses.  I'm trying freebsd-stable@ next because it involves
trying to build world on a 11.1-STABLE system from a freshly checked out,
unaltered source tree at r328251.  The system currently installed is from a
11.1-STABLE installer image:

FreeBSD hellas 11.1-STABLE FreeBSD 11.1-STABLE #0 r326620: Wed Dec  6 15:08:03 
UTC 2017 r...@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

I would like to keep it reasonably up to date, but I am stumped.
 I did a fresh svn checkout into a completely empty /usr/src, but when
I tried to run a "make buildworld", it failed like this.

Script started on Mon Jan 22 02:11:21 2018
hellas# setenv CCACHE_DIR /buildwork/ccache.freebsd
hellas# cd /usr/src
hellas# date && \
? time nice +11 make buildworld  && date
Mon Jan 22 02:13:05 CST 2018
Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

Unknown modifier 'U'

"/usr/src/share/mk/src.sys.mk", line 28: Option DIRDEPS_BUILD may only be 
defined in , environment, or make argument, not /etc/src.conf.
*** Error code 1

Stop.
make: stopped in /usr/src
0.035u 0.047s 0:01.65 4.2%  15254+422k 88+21io 31pf+0w
hellas# exit
exit

Script done on Mon Jan 22 02:13:19 2018

Note that 'U' has not always been the letter complained of in the "Unknown
modifier" messages.  The content of /etc/src.conf is as follows.

PORTS_MODULES=multimedia/cuse4bsd-kmod sysutils/pefs-kmod 
emulators/virtualbox-ose-kmod net/ndproxy
WITH_LLDB=yes
#WITH_FAST_DEPEND=yes
WITH_CCACHE_BUILD=yes

As you can see, there is no DIRDEPS_BUILD in /etc/src.conf.  I don't even
know what DIRDEPS_BUILD is.
     Anyway, I'm stuck.  Can someone please tell me what is going wrong and
how to fix it?  I'd really like to be able to update my system, not only to
keep it reasonably current, but also to be able to customize a kernel.  Thanks
in advance for any suggestions/solutions.


  Scott Bennett, Comm. ASMELG, CFIAG


Re: automount usb msdosfs no partition table

2017-10-09 Thread Scott Bennett
Tomasz CEDRO  wrote:

> i cannot format that device, as it's the "firmware feature" that it has no
> partition table.. i would have to fix the firmware.. but it would be nice
> to automount it anyway as macos, linux and windoze can :-)
>
 Well, put a partition table onto it, then.  You can use either gpart(8)
or fdisk(8) to do that and to create a slice, and then use newfs_msdos(8) to
create the file system.
 I understood from your previous message that you wanted to create a FAT32
file system on /dev/da0 rather than on /dev/da0s1, which meant on the bare
device rather than on a slice.  Otherwise, create the partition table, create
a slice, and proceed.


  Scott Bennett, Comm. ASMELG, CFIAG


automount usb msdosfs no partition table

2017-10-09 Thread Scott Bennett
On Mon, 9 Oct 2017 01:10:19 +0200 Tomasz CEDRO  wrote:

> I need to configure automount for a testing machine. It seems to work
> fine, except for two issues:

 I've never used the automounter, so I can't help you with that.
>
> 1. Mount point does not disappear after device disappears, which makes
> things harder to script when device is gone. automount -c does not
> remove the mountpoint, only restarting the service does. Is it a bug
> or a feature?
>
> 2. Automounter does not mount USB Pendrive / MSDOSFS devices that do
> not have a partition table. Some USB drives do not have a valid
> partition table, they appear as /dev/da0 and can be mounted with
> mount_msdosfs /dev/da0 /mnt, but they are not recognised by
> automounter.. how can I make it work with such devices?
>
 Try newfs_msdos(8).


  Scott Bennett, Comm. ASMELG, CFIAG


Re: panic: Solaris(panic): blkptr invalid CHECKSUM1

2017-10-01 Thread Scott Bennett
don't know the DVA and blkptr internals, so I won't
>write a zfs_fsck(8) soon ;-)
>
>Does it make sense to dump the disks for further analysis?
>I need to recreate the pool because I need the machine's resources... :-(
>Any help highly appreciated!
>
 First, if it's not too late already, make a copy of the pool's cache file,
and save it somewhere in case you need it unchanged again.
 Can zdb(8) see it without causing a panic, i.e., without importing the
pool?  You might be able to track down more information if zdb can get you in.
 Another thing you could try with an admittedly very low probability of
working would be to try importing the pool with one drive of one mirror
missing, then try it with a different drive of one mirror, and so on, on the
off chance that the critical error is limited to one drive.  If you find a case
where that works, then you could try to rebuild the missing drive and then run
a scrub.  Or vice versa.  This one is time-consuming, I would imagine, given
that each failure means a reboot. :-(


  Scott Bennett, Comm. ASMELG, CFIAG


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-15 Thread Scott Bennett
Mark Millard  wrote:

> [Something strange happened to the automatic CC: fill-in for my original
> reply. Also I should have mentioned that for my test program if a
> variant is made that does not fork the swapping works fine.]
>
> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>
> > On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
> > 
> >>On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
> >>  wrote:
> >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
> >>> 
> >>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> >>>>> [test_check() between the fork and the wait/sleep prevents the
> >>>>> failure from occurring. Even a small access to the memory at
> >>>>> that stage prevents the failure. Details follow.]
> >>>> 
> >>>> Maybe a stupid question, since you might have written it somewhere.
> >>>> What medium do you swap to?
> >>>> I've seen broken firmware on microSD cards doing silent data
> >>>> corruption for some access patterns.
> >>> 
> >>> The root filesystem is on a USB SSD on a powered hub.
> >>> 
> >>> Only the kernel is from the microSD card.
> >>> 
> >>> I have several examples of the USB SSD model and have
> >>> never observed such problems in any other context.
> >>> 
> >>> [remainder of irrelevant material deleted  --SB]
> >> 
> >>You gave a very long-winded non-answer to Bernd's question, so I'll
> >> repeat it here.  What medium do you swap to?
> > 
> > My wording of:
> > 
> > The root filesystem is on a USB SSD on a powered hub.
> > 
> > was definitely poor. It should have explicitly mentioned the
> > swap partition too:
> > 
> > The root filesystem and swap partition are both on the same
> > USB SSD on a powered hub.
> > 
> > More detail from dmesg -a for usb:
> > 
> > usbus0: 12Mbps Full Speed USB v1.0
> > usbus1: 480Mbps High Speed USB v2.0
> > usbus2: 12Mbps Full Speed USB v1.0
> > usbus3: 480Mbps High Speed USB v2.0
> > ugen0.1:  at usbus0
> > uhub0:  on usbus0
> > ugen1.1:  at usbus1
> > uhub1:  on usbus1
> > ugen2.1:  at usbus2
> > uhub2:  on usbus2
> > ugen3.1:  at usbus3
> > uhub3:  on usbus3
> > . . .
> > uhub0: 1 port with 1 removable, self powered
> > uhub2: 1 port with 1 removable, self powered
> > uhub1: 1 port with 1 removable, self powered
> > uhub3: 1 port with 1 removable, self powered
> > ugen3.2:  at usbus3
> > uhub4 on uhub3
> > uhub4:  on usbus3
> > uhub4: MTT enabled
> > uhub4: 4 ports with 4 removable, self powered
> > ugen3.3:  at usbus3
> > umass0 on uhub4
> > umass0:  on usbus3
> > umass0:  SCSI over Bulk-Only; quirks = 0x0100
> > umass0:0:0: Attached to scbus0
> > . . .
> > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> > da0:  Fixed Direct Access SPC-4 SCSI device
> > da0: Serial Number 
> > da0: 40.000MB/s transfers
> > 
> > (Edited a bit because there is other material interlaced, even
> > internal to some lines. Also: I removed the serial number of the
> > specific example device.)

 Thank you.  That presents a much clearer picture.
> > 
> >>I will further note that any kind of USB device cannot automatically
> >> be trusted to behave properly.  USB devices are notorious, for example,
> >> 
> >>   [reasons why deleted  --SB]
> >> 
> >>You should identify where you page/swap to and then try substituting
> >> a different device for that function as a test to eliminate the possibility
> >> of a bad storage device/controller.  If the problem still occurs, that
> >> means there still remains the possibility that another controller or its
> >> firmware is defective instead.  It could be a kernel bug, it is true, but
> >> making sure there is no hardware or firmware error occurring is important,
> >> and as I say, USB devices should always be considered suspect unless and
> >> until proven innocent.
> > 
> > [FYI: This is a ufs context, not a zfs one.]

 Right.  It's only a Pi, after all. :-)
> > 
> > I'm aware of such  things. There is no evidence that has resulted in
> > suggesting the USB devices that I can replace are a problem. Otherwise
> > I'd not be going down this path. I only have access to the one arm64
> > device (a Pine64+ 2GB) so I've no ability to substitution-test

Swapping from a zvol results in a deadman panic

2017-02-05 Thread Scott Bennett
On Sat, 4 Feb 2017 08:18:28 -0500 "Matthew X. Economou" 
wrote:

>My FreeBSD 10.3-RELEASE-p16 server crashes in the middle of a Poudriere
>bulk run (see below).  This crash happens even if I lower
>vfs.zfs.arc_max or tweak vm.v_free_min/target/reserved/severe.  I'm
>looking for configuration advice in case I missed something obvious,
>since this seems to work on Illumos- and Linux-derived O/Ses, but
>failing that, I'd like to get some advice as to how to go about
>debugging this.  I doubt the deadman timer causes the system to stop
>responding.  It's more likely a race condition elsewhere.
>
 Those who try to create deadlocks should not complain when they succeed.
Sigh.




make buildkernel does not respect KERNCONF or JOBS in /etc/make.conf

2016-12-11 Thread Scott Bennett
 On Sun, 11 Dec 2016 12:34:58 + tech-lists 
wrote:

>I have found that make buildkernel/installkernel does not respect 
>KERNCONF= variables. It also doesn't respect MAKE_JOBS_NUMBER. It *DOES* 
>however respect WITH_CCACHE_BUILD. It's hard to say when the behaviour 
>changed to what it is, but it was sometime around the time that 
>11-CURRENT became 11-STABLE.
>
>Sources are 11-STABLE r309795
>
>Here is my /etc/make.conf, which used to work:
>
>MALLOC_PRODUCTION=yes
>WITH_CCACHE_BUILD=yes
>MAKE_JOBS_NUMBER=32
>KERNCONF=PUMPKIN GENERIC
>WITH_MANCOMPRESS=YES
>WITHOUT_DEBUG=YES
>DEFAULT_VERSIONS+=  ssl=libressl
>OPTIMIZED_CFLAGS=YES
>BUILD_OPTIMIZED=YES
>
>I used to be able to buildworld and kernel like this:
>
>root@localhost:/usr/src# make cleandir && make clean && make buildworld 
>&& make buildkernel && make installkernel && mergemaster -p
>
>and I'd get two installed kernels, PUMPKIN and GENERIC
>
>now I have to specify on the line:
>
>root@localhost:/usr/src# make cleandir && make clean && make -j32 
>buildworld && make -j32 buildkernel KERNCONF=PUMPKIN
>
>Also, I have to specify jobs # for both buildworld and buildkernel 
>otherwise it just uses one, two or four cores.
>
>How can I get it to work like it did previously?
>
 You may have misremembered how you did it previously.  Try adding

BUILDKERNELS=PUMPKIN GENERIC

to your /etc/src.conf and removing the KERNCONF line from /etc/make.conf
before you run it again.  KERNCONF goes on the "make buildkernel" command,
not into /etc/make.conf, but should not be necessary at all if /etc/src.conf
contains the list of kernels to be built.  (See src.conf(5).)




Re: huge nanosleep variance on 11-stable

2016-11-03 Thread Scott Bennett
 On Wed, 2 Nov 2016 10:23:24 -0400 George Mitchell 
wrote:
>On 11/01/16 23:45, Kevin Oberman wrote:
>> On Tue, Nov 1, 2016 at 2:36 PM, Jason Harmening 
>> wrote:
>> 
>>> Sorry, that should be ~*30ms* to get 30fps, though the variance is still
>>> up to 500ms for me either way.
>>>
>>> On 11/01/16 14:29, Jason Harmening wrote:
>>>> repro code is at http://pastebin.com/B68N4AFY if anyone's interested.
>>>>
>>>> On 11/01/16 13:58, Jason Harmening wrote:
>>>>> Hi everyone,
>>>>>
>>>>> I recently upgraded my main amd64 server from 10.3-stable (r302011) to
>>>>> 11.0-stable (r308099).  It went smoothly except for one big issue:
>>>>> certain applications (but not the system as a whole) respond very
>>>>> sluggishly, and video playback of any kind is extremely choppy.
>>>>>
>>>>> [...]
>> I eliminated the annoyance by changing the scheduler from ULE to 4BSD. That was
>> it, but I have not seen the issue since. I'd be very interested in whether
>> the scheduler is somehow impacting timing functions or it's a different
>> issue. I've felt that there was something off in ULE for some time, but it
>> was not until these annoying hiccups convinced me to try going back to
>> 4BSD.
>> 
>> Tip o' the hat to Doug B. for his suggestions that ULE may have issues that
>> impacted interactivity.
>> [...]
>
>Not to beat a dead horse, but I've been a non-fan of SCHED_ULE since
>it was first introduced, and I don't like it even today.  I run the
>distributed.net client on my machines, but even without that, ULE
>screws interactive behavior.  With distributed.net running and ULE,
>a make buildworld/make buildkernel takes 10 2/3 hours on 10.3-RELEASE
>on a 6-CPU machine versus 2 1/2 hours on the same machine with 4BSD
>and distributed.net running.  I'm told that SCHED_ULE is the greatest
>thing since sliced bread for some compute load or other (details are
>scarce), but I (fortunately) don't often have to run heavy server
>type loads; and for everyday use (even without distributed.net
>running), SCHED_4BSD is my choice by far.  It's too bad I can't run
>freebsd_update with it, though.
>
 I gave up on ULE during 8-STABLE.  I had tried tinkering with
kern.sched.preempt_thresh as recommended, as well as some more extreme
values, but I couldn't see any improvement.  Some values may have made
performance even worse.  The last straw for me, however, was when I
discovered that ULE happily scheduled *idle* priority processes at times
when both CPU threads on a P4 Prescott were tied up by 100% CPU-bound
(mprime) threads at normal priority niced to 20.  Idle priority tasks
should *only* run when no higher priority tasks are available to run for
all CPU threads.  The 4BSD scheduler handles this situation properly.
 Now I'm running 10.3-STABLE on a QX9650, and I haven't tested ULE
on it to see whether it's still as flawed.  If and when I get a machine
with a multi-cored, hyperthreaded CPU or perhaps a board with multiple
CPU chips, then I may worry about the multi-level affinity stuff that
ULE was supposedly designed for enough to bother testing it.  But for
now, I can't see any advantage in it for my current machine.
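
For anyone wanting to check which scheduler a running kernel uses before repeating that experiment, here is a small sketch. It relies on the FreeBSD sysctl kern.sched.name; the function name current_scheduler and the job name some_cpu_hog are placeholders.

```sh
#!/bin/sh
# Print the name of the scheduler the running kernel was built with.
# kern.sched.name is a FreeBSD sysctl; elsewhere this prints "unknown".
current_scheduler() {
    sysctl -n kern.sched.name 2>/dev/null || echo unknown
}

current_scheduler

# Switching schedulers means rebuilding the kernel with
#     options SCHED_4BSD
# in place of
#     options SCHED_ULE
# in the kernel configuration file.
#
# To repeat the idle-priority test, a CPU-bound job can be started in
# the idle class with idprio(1), for example:
#     idprio 31 ./some_cpu_hog
```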




Re: zfs, a directory that used to hold lot of files and listing pause

2016-10-21 Thread Scott Bennett
 On Fri, 21 Oct 2016 16:51:36 +0500 "Eugene M. Zheganin"
 wrote:

>On 21.10.2016 15:20, Slawa Olhovchenkov wrote:
>>
>> The effect of ZFS prefetch on performance depends on the workload
>> (independent of RAM size): some workloads win, some lose (for my
>> workload prefetch loses and is manually disabled, with 128GB RAM).
>>
>> Anyway, this system has only 24MB in ARC with 2.3GB free; this may
>> be too low for this workload.
>You mean - "for getting a list of a directory with 20 subdirectories" ? 
>Why then does only this directory have this issue with pause, not 
>/usr/ports/..., which has more directories in it ?
>
>(and yes, /usr/ports/www isn't empty and holds 2410 entities)
>
>/usr/bin/time -h ls -1 /usr/ports/www
>[...]
>0.14s real  0.00s user  0.00s sys
>
 Oh, my goodness, how far afield nonsense has gotten!  Have all the
good folks posting in this thread forgotten how directory blocks are
allocated in UNIX?  This isn't even a BSD-specific thing; it's really
ancient.  What Eugene has complained of is exactly what is to be expected--
on really old hardware.  The only eyebrow-raiser is that he has created a
use case so extreme that a live human can actually notice the delays on
modern hardware.
 I quote from his original posting:  "I also have one directory that used
to have a lot of (tens of thousands) files." and "But now I have 2 files and
a couple of dozens directories in it".  A directory with tens of thousands
of files in it at one point in time most likely has somewhere well over one
thousand blocks allocated.  Directories don't shrink.  Directory entries do
not get moved around within directories when files are added or deleted.
Directories can remain the same length or they can grow in length.  If a
directory once had many tens of thousands of filenames and links to their
primary inodes, then the directory is still that big, even if it now only
contains two [+ 20 to 30 directory], probably widely separated, entries.  To
read a file's entry, all blocks must be searched until the desired filename
is found.  Likewise, to list the contents of a directory, all blocks must be
read until the number of files found matches the link count for the directory.
IOW, if you want the performance to go back to what it was when the directory
was fresh (and still small), you have to create a new directory and then move
the remaining entries from the old directory into the new (small) directory.
The only real difference here between UFS (or even the early AT&T filesystem)
and ZFS is that the two remaining entries in a formerly huge directory are
likely to be in different directory blocks that could be at effectively random
locations scattered around the space of a partition for one filesystem in UFS
or over an entire pool of potentially many filesystems and much more space in
ZFS.
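
The "create a new directory and move the remaining entries" step can be sketched as below. The function name rebuild_dir is made up. The sketch assumes nothing else is using the directory while it runs, and it does not copy the old directory's ownership or mode to the new one, so fix those up afterwards if they matter.

```sh
#!/bin/sh
# rebuild_dir DIR: replace DIR with a freshly created directory holding
# the same entries.  Assumes DIR's parent is writable and that no other
# process is working in DIR while this runs.
rebuild_dir() {
    old=${1%/}
    new="$old.new.$$"
    mkdir "$new" || return 1
    # Move every entry, including dot files, but not "." and "..".
    find "$old" -mindepth 1 -maxdepth 1 -exec mv {} "$new"/ \; || return 1
    rmdir "$old" || return 1      # fails if anything was left behind
    mv "$new" "$old"
}
```

Because the new directory sits next to the old one, each mv is only a rename and no file data is copied.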




Re: buildworld errors at outset on fresh svn checkout

2016-10-08 Thread Scott Bennett
 I wrote:

>"Matthew D. Fuller"  wrote:
>
>> On Fri, Oct 07, 2016 at 10:13:02PM -0700 I heard the voice of
>> Kevin Oberman, and lo! it spake thus:
>> > On Thu, Oct 6, 2016 at 10:16 PM, Scott Bennett  wrote:
>> > 
>> > > "/usr/src/Makefile.inc1", line 1113: Malformed conditional
>> > > (${BUILDKERNELS:[)
>> > > Unknown modifier '['
>> > 
>> > '[' needs to be a hard link to test.
>>
>> This is make, not sh.
>>
>> That would be the line
>>
>> .if ${BUILDKERNELS:[#]} > 1 && ${NO_INSTALLEXTRAKERNELS} != "yes"
>>
>> in current stable/10, and it's horking on the [...] modifier.  That's
>
> What is currently installed is
>
>FreeBSD hellas 10.3-STABLE FreeBSD 10.3-STABLE #284 r304657: Tue Aug 23 
>01:48:12 CDT 2016 bennett@hellas:/usr/obj/usr/src/sys/hellas  amd64
>
>> listed in the make manpage on a stable/10 system from last October, so
>> it's not terribly new.  Maybe an ancient make?  Or your 'make' isn't
>> the make it's expecting?
>>
>Here are the ones I have currently installed.
>
>[hellas] 26 % ls -ilgF /usr/{,local/}bin/{,?}make
> 102327 -r-xr-xr-x  1 root  wheel   466396 May  3  2014 /usr/bin/bmake*
>  82321 -r-xr-xr-x  1 root  wheel   715152 Aug 23 05:17 /usr/bin/make*
>7391987 -r-xr-xr-x  1 root  wheel   187600 Aug 23 16:24 /usr/local/bin/bmake*
>7391732 -rwxr-xr-x  1 root  wheel  3805840 Sep 18 13:36 /usr/local/bin/cmake*
>7386658 -r-xr-xr-x  1 root  wheel   112976 Sep  6  2015 /usr/local/bin/dmake*
>7384146 -r-xr-xr-x  1 root  wheel   224520 Jul 10 18:45 /usr/local/bin/gmake*
>7384478 -r-xr-xr-x  1 root  wheel    24664 May 18  2015 /usr/local/bin/imake*
>7386767 -r-xr-xr-x  1 root  wheel  2541440 Dec 14  2014 /usr/local/bin/qmake*
>7392225 -rwxr-xr-x  1 root  wheel   123256 Aug 31 11:03 /usr/local/bin/smake*
>7384593 -r-xr-xr-x  1 root  wheel    32941 May 19  2015 /usr/local/bin/tmake*
>
> So the worrisome one looks like the /usr/bin/bmake, which perhaps should
>have been removed by "make delete-old" during an upgrade somewhere along the
>way.  Let me try renaming it to see what happens.  I'll report back.
>
 Okay.  I tried renaming /usr/bin/bmake to /usr/bin/bmake.old and then
ran "make buildworld".  That failed the same way as before. :-(  So I guess
that wasn't the problem.




Re: buildworld errors at outset on fresh svn checkout

2016-10-08 Thread Scott Bennett
"Matthew D. Fuller"  wrote:

> On Fri, Oct 07, 2016 at 10:13:02PM -0700 I heard the voice of
> Kevin Oberman, and lo! it spake thus:
> > On Thu, Oct 6, 2016 at 10:16 PM, Scott Bennett  wrote:
> > 
> > > "/usr/src/Makefile.inc1", line 1113: Malformed conditional
> > > (${BUILDKERNELS:[)
> > > Unknown modifier '['
> > 
> > '[' needs to be a hard link to test.
>
> This is make, not sh.
>
> That would be the line
>
> .if ${BUILDKERNELS:[#]} > 1 && ${NO_INSTALLEXTRAKERNELS} != "yes"
>
> in current stable/10, and it's horking on the [...] modifier.  That's

 What is currently installed is

FreeBSD hellas 10.3-STABLE FreeBSD 10.3-STABLE #284 r304657: Tue Aug 23 
01:48:12 CDT 2016 bennett@hellas:/usr/obj/usr/src/sys/hellas  amd64

> listed in the make manpage on a stable/10 system from last October, so
> it's not terribly new.  Maybe an ancient make?  Or your 'make' isn't
> the make it's expecting?
>
Here are the ones I have currently installed.

[hellas] 26 % ls -ilgF /usr/{,local/}bin/{,?}make
 102327 -r-xr-xr-x  1 root  wheel   466396 May  3  2014 /usr/bin/bmake*
  82321 -r-xr-xr-x  1 root  wheel   715152 Aug 23 05:17 /usr/bin/make*
7391987 -r-xr-xr-x  1 root  wheel   187600 Aug 23 16:24 /usr/local/bin/bmake*
7391732 -rwxr-xr-x  1 root  wheel  3805840 Sep 18 13:36 /usr/local/bin/cmake*
7386658 -r-xr-xr-x  1 root  wheel   112976 Sep  6  2015 /usr/local/bin/dmake*
7384146 -r-xr-xr-x  1 root  wheel   224520 Jul 10 18:45 /usr/local/bin/gmake*
7384478 -r-xr-xr-x  1 root  wheel    24664 May 18  2015 /usr/local/bin/imake*
7386767 -r-xr-xr-x  1 root  wheel  2541440 Dec 14  2014 /usr/local/bin/qmake*
7392225 -rwxr-xr-x  1 root  wheel   123256 Aug 31 11:03 /usr/local/bin/smake*
7384593 -r-xr-xr-x  1 root  wheel    32941 May 19  2015 /usr/local/bin/tmake*

 So the worrisome one looks like the /usr/bin/bmake, which perhaps should
have been removed by "make delete-old" during an upgrade somewhere along the
way.  Let me try renaming it to see what happens.  I'll report back.
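
A sketch of a more direct way to answer the "which make is it running?" question: walk $PATH in order and print every executable called make or bmake, so the first hit is the one the shell would start. The function name list_on_path is made up for illustration.

```sh
#!/bin/sh
# list_on_path NAME...: print every executable with one of the given
# names found in a $PATH directory, in search order.
list_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.     # an empty PATH entry means "."
        for name in "$@"; do
            [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ] && echo "$dir/$name"
        done
    done
    IFS=$old_ifs
}

list_on_path make bmake

# With bmake, the version of the binary that actually runs can then be
# queried with:
#     make -V MAKE_VERSION
```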




Re: buildworld errors at outset on fresh svn checkout

2016-10-07 Thread Scott Bennett
Kevin Oberman  wrote:

 Thanks much for replying!

> On Thu, Oct 6, 2016 at 10:16 PM, Scott Bennett  wrote:
>
> >  I'm running into a problem in updating my 10-STABLE system from source.
> > A "make buildworld" quits immediately.  I tried a fresh svn checkout for
> > base/stable/10 and then tried to run buildworld again, but got the same error.
> > I've been scratching my head over this for hours, but must be missing something
> > simple.
> >  I have ccache installed and have been using it for a fairly long time now.
> > My /etc/src.conf contains just two lines:
> >
> > PORTS_MODULES=multimedia/cuse4bsd-kmod sysutils/pefs-kmod # emulators/virtualbox-ose-kmod
> > WITH_LLDB=yes
> >
> > My /etc/make.conf is rather longer, so I'll append it following .sig below.
> >
> >  Here's what happens.
> >
> > Script started on Thu Oct  6 23:31:47 2016
> > hellas# cd /usr/src
> > hellas# nice make buildworld
> > Unknown modifier '['
> >
> > "/usr/src/Makefile.inc1", line 1113: Malformed conditional (${BUILDKERNELS:[)
> > Unknown modifier '['
> >
> > "/usr/src/Makefile.inc1", line 1122: if-less endif
> > Unknown modifier '['
> >
> > "/usr/src/Makefile.inc1", line 1144: Malformed conditional (${BUILDKERNELS:[)
> > Unknown modifier '['
> >
> > "/usr/src/Makefile.inc1", line 1161: if-less endif
> > Unknown modifier '['
> >
> > "/usr/src/Makefile.inc1", line 1183: Malformed conditional (${BUILDKERNELS:[)
> > Unknown modifier '['
> >
> > "/usr/src/Makefile.inc1", line 1190: if-less endif
> > bmake: fatal errors encountered -- cannot continue
> > *** Error code 1
> >
> > Stop.
> > make: stopped in /usr/src
> > hellas# exit
> > exit
> >
> > Script done on Thu Oct  6 23:37:00 2016
> >
> >  This just started happening after my machine had been down for a couple
> > of days after a hang that damaged stuff in /usr/home.  I had already restored
> > /usr/local from backups before narrowing down the weird behavior I was seeing
> > in wmaker to /usr/home corruption.  So /usr/home has now been restored to
> > good condition, too, but perhaps I need to restore something else as well.
> > This mess was part of my justification to myself for the fresh checkout of
> > /usr/src, but that doesn't seem to have made any difference in the buildworld
> > failure.
> >  If anyone else can see what's wrong and clue me in, I'd be grateful.
> > I'm subscribed to the digest for this list, so please Cc: me directly, so
> > I'll get replies right away.
> > Thanks in advance!
> >
> >  [stuff deleted --SB]
>
> Could something else have gotten corrupted?
>
> > ls -i /bin/[
> 65795 /bin/[
> > ls -i /bin/test
> 65795 /bin/test
>
> The values (inode) will not match these, but must be identical.

[hellas] 23 % ls -lgiF /bin/\[ /bin/test
131557 -r-xr-xr-x  2 root  wheel  11664 Aug 23 05:16 /bin/[*
131557 -r-xr-xr-x  2 root  wheel  11664 Aug 23 05:16 /bin/test*

So no, that's not the problem.
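
The same check can be scripted rather than read off by eye. This sketch uses the -ef operator of test(1), which succeeds when two names refer to the same device and inode; the function name same_file is made up. Note that -ef also succeeds for a symlink pointing at the same file, so compare the ls -i output if that distinction matters.

```sh
#!/bin/sh
# same_file A B: succeed if A and B name the same file (same device and
# inode), which is what being hard links to one file means.
same_file() {
    [ "$1" -ef "$2" ]
}

if same_file /bin/test '/bin/['; then
    echo "/bin/[ and /bin/test are the same file"
else
    echo "/bin/[ and /bin/test are different files"
fi
```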
>
> '[' needs to be a hard link to test. I'm suspicious that something happened
> to this link. If this is the case, other corruption may have occurred, but,
> if you can re-create the hardlink (ln /bin/test /bin/[) and successfully
> make buildworld  and make buildkernel, it's likely that you can reinstall
> the system and the system will be fine. Of course, things could be damaged
> in the installed ports, too.

 Unfortunately, no, I can't re-install, or at least not without restoring
/usr (and probably / and /usr/obj) to before the damage, assuming I can figure
out exactly between which backups the damage occurred, which may not have been
at the time of the hang.  I already had a recently built /usr/obj and tried to
install its contents.  The failure of installkernel was how I discovered the
problem.
 So I'm still stuck.




buildworld errors at outset on fresh svn checkout

2016-10-06 Thread Scott Bennett
 I'm running into a problem in updating my 10-STABLE system from source.
A "make buildworld" quits immediately.  I tried a fresh svn checkout for
base/stable/10 and then tried to run buildworld again, but got the same error.
I've been scratching my head over this for hours, but must be missing something
simple.
 I have ccache installed and have been using it for a fairly long time now.
My /etc/src.conf contains just two lines:

PORTS_MODULES=multimedia/cuse4bsd-kmod sysutils/pefs-kmod # emulators/virtualbox-ose-kmod
WITH_LLDB=yes

My /etc/make.conf is rather longer, so I'll append it following .sig below.

 Here's what happens.

Script started on Thu Oct  6 23:31:47 2016
hellas# cd /usr/src
hellas# nice make buildworld
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1113: Malformed conditional (${BUILDKERNELS:[)
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1122: if-less endif
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1144: Malformed conditional (${BUILDKERNELS:[)
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1161: if-less endif
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1183: Malformed conditional (${BUILDKERNELS:[)
Unknown modifier '['

"/usr/src/Makefile.inc1", line 1190: if-less endif
bmake: fatal errors encountered -- cannot continue
*** Error code 1

Stop.
make: stopped in /usr/src
hellas# exit
exit

Script done on Thu Oct  6 23:37:00 2016

 This just started happening after my machine had been down for a couple
of days after a hang that damaged stuff in /usr/home.  I had already restored
/usr/local from backups before narrowing down the weird behavior I was seeing
in wmaker to /usr/home corruption.  So /usr/home has now been restored to
good condition, too, but perhaps I need to restore something else as well.
This mess was part of my justification to myself for the fresh checkout of
/usr/src, but that doesn't seem to have made any difference in the buildworld
failure.
 If anyone else can see what's wrong and clue me in, I'd be grateful.
I'm subscribed to the digest for this list, so please Cc: me directly, so
I'll get replies right away.
Thanks in advance!


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**
/etc/make.conf contains:

CPUTYPE?=core2
CFLAGS+="-mtune=core2"
SVNFLAGS?="-r RELENG_10"
# build ports with clang stack protector
WITH_SSP=yes
SSP_CFLAGS=-fstack-protector-all
# added for ports system use to avoid dialogs by SJB  4 May 2007
BATCH=YES
# added for new pkg system  --SJB  10 December 2014
WITH_PKGNG=yes
# build ports using ccache  --SJB  19 January 2015
WITH_CCACHE_BUILD=yes
## buildworld and buildkernel using ccache  --SJB  26 January 2015
.if (!empty(.CURDIR:M/usr/src*) || !empty(.CURDIR:M/usr/obj*))
.if !defined(NOCCACHE) && exists(/usr/local/libexec/ccache/world/cc)
CC:=${CC:C,^cc,/usr/local/libexec/ccache/world/cc,1}
CXX:=${CXX:C,^c\+\+,/usr/local/libexec/ccache/world/c++,1}
CCACHE_COMPILERCHECK=content
CCACHE_DIR=/buildwork/ccache.freebsd
.endif
.else
CFLAGS+="-mssse3"
#CFLAGS+="-mssse3 -msse4.1"
.endif
# added to deal with ccache bug 8460  --SJB  2 November 2013
# bug has been reported fixed, so try without this workaround
#CCACHE_CPP2=1
# added as a better specification of -j by SJB 17 November 2009
MAKE_JOBS_NUMBER=4
# put build tree where there is plenty of temporary workspace
WRKDIRPREFIX=/buildwork/ports
DEFAULT_VERSIONS+=  ssl=openssl
# Allow updating of Mesa3D from 7.4.4 to 7.6.1 and libdrm from 2.4.12 to 2.4.17
WITHOUT_NOUVEAU=yes
# Use ATLAS libraries in ports that use BLAS libraries
OPTIONS_SET=ATLAS
# Tell gnustep-related ports to use base system's compiler
GNUSTEP_WITH_BASE_GCC=yes
GNUSTEP_WITHOUT_LIBOBJC=yes
QT4_OPTIONS= CUPS NAS QGTKSTYLE
# Begin portconf settings
# Do not touch these lines
.if !empty(.CURDIR:M/usr/ports*) && exists(/usr/local/libexec/portconf)
_PORTCONF!=/usr/local/libexec/portconf
.if ${_PORTCONF} != "|"
.for i in ${_PORTCONF:S/^|//:S/|/ /g}
${i:C/^([^=]*)=.*/\1/}=${i:C/^[^=]*=//:S/%/ /g}
.endfor
.endif
.endif
# End portconf settings


Re: devd(8) complains loudly when DVD player is empty, possibly due to r298134

2016-05-02 Thread Scott Long

> On May 1, 2016, at 9:07 AM, Trond Endrestøl 
>  wrote:
> 
> On Wed, 27 Apr 2016 13:46-0400, Scott Long wrote:
> 
>> Thanks for the report.  I might be mistaken, but the default system 
>> is not configured to direct devd messages to user.info, so I didn’t 
>> see this during my development.  However, what you’re reporting is 
>> definitely annoying, so Warner Losh and I are working on a solution.
>> 
>> Scott
> 
> I solved the problem by running devd with -q, i.e. devd_flags="-q" in 
> /etc/rc.conf. This should probably be the default anyway.
> 
> All of my systems (stable/10) have custom logging where each facility 
> has its own file. Also *.*;mark.* is sent to /dev/ttyvb and to the 
> central log host. /dev/ttyvb was pretty busy on the log host.
> 
> Making devd less chatty does have its merits.
> The next servers I buy will probably exclude a DVD player.
> 
> Happy hacking.

Hi Trond,

Thanks for the follow-up.  I still plan to fix the drivers so they don’t 
generate unwanted noise, whether or not it’s recorded in devd.

Scott


Re: devd(8) complains loudly when DVD player is empty, possibly due to r298134

2016-04-27 Thread Scott Long
Hi Trond,

Thanks for the report.  I might be mistaken, but the default system is not 
configured to direct devd messages to user.info, so I didn’t see this during my 
development.  However, what you’re reporting is definitely annoying, so Warner 
Losh and I are working on a solution.

Scott

> On Apr 27, 2016, at 1:23 PM, Trond Endrestøl 
>  wrote:
> 
> Hi,
> 
> The symptoms began after upgrading from stable/10 r298033 to stable/10 
> r298573.
> 
> Apr 27 18:40:00  [HOSTNAME] devd: Processing event '!system=CAM 
> subsystem=periph type=error device=cd0 serial="R8KL6GKC900AFG" 
> cam_status="0xcc" scsi_status=2 scsi_sense="70 02 04 01" CDB="00 00 00 00 00 
> 00 " '
> 
> These messages are just seconds apart:
> 
> Apr 27 18:40:01  [HOSTNAME] devd: Processing event '!system=CAM 
> subsystem=periph type=error device=pass1 serial="R8KL6GKC900AFG" 
> cam_status="0xcc" scsi_status=2 scsi_sense="70 02 04 01" CDB="00 00 00 00 00 
> 00 " '
> Apr 27 18:40:03  [HOSTNAME] devd: Processing event '!system=CAM 
> subsystem=periph type=error device=pass1 serial="R8KL6GKC900AFG" 
> cam_status="0xcc" scsi_status=2 scsi_sense="70 02 04 01" CDB="00 00 00 00 00 
> 00 " '
> Apr 27 18:40:05  [HOSTNAME] devd: Processing event '!system=CAM 
> subsystem=periph type=error device=pass1 serial="R8KL6GKC900AFG" 
> cam_status="0xcc" scsi_status=2 scsi_sense="70 02 04 01" CDB="00 00 00 00 00 
> 00 " '
> 
> When I put a CD or DVD in the DVD player, the messages stop. As soon 
> as I eject the disc, they start appearing again.
> 
> Here's the relevant part from dmesg:
> 
> cd0 at ahcich1 bus 0 scbus1 target 0 lun 0
> cd0:  Removable CD-ROM SCSI device
> cd0: Serial Number R8KL6GKC900AFG
> cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
> cd0: Attempt to query device size failed: NOT READY, Medium not present - 
> tray closed
> 
> This is on a mid-2012 Dell Latitude E5530 with the stock DVD player.
> 
> Upgrading to stable/10 r298705 doesn't resolve this issue.
> 
> Does anyone else see this?
> 
> Maybe r298134 is to blame:
> 
>  stable/10/sys/cam/cam_periph.c
> 
>  MFC r298004:
> 
>  Add a devctl/devd notification conduit for CAM errors that happen at the
>  periph level.
> 
>  Due to not merging the changes to ata_res_sbuf(), this version is a little
>  messy.
> 
>  Sponsored by:Netflix
> 
> http://svnweb.freebsd.org/base?view=revision&revision=298134
> 
> -- 
> +--------------------------+-----------------------------------+
> | Vennlig hilsen,          | Best regards,                     |
> | Trond Endrestøl,         | Trond Endrestøl,                  |
> | IT-ansvarlig,            | System administrator,             |
> | Fagskolen Innlandet,     | Gjøvik Technical College, Norway, |
> | tlf. mob.   952 62 567,  | Cellular...: +47 952 62 567,      |
> | sentralbord 61 14 54 00. | Switchboard: +47 61 14 54 00.     |
> +--------------------------+-----------------------------------+


Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.

2016-03-08 Thread Scott Long

> On Mar 8, 2016, at 11:02 AM, Slawa Olhovchenkov  wrote:
> 
> On Tue, Mar 08, 2016 at 10:56:39AM -0800, Scott Long wrote:
> 
>> 
>>> On Mar 8, 2016, at 10:48 AM, Slawa Olhovchenkov  wrote:
>>> 
>>> On Tue, Mar 08, 2016 at 10:34:23AM -0800, Scott Long wrote:
>>> 
>>>> 
>>>>> On Mar 8, 2016, at 10:07 AM, Slawa Olhovchenkov  wrote:
>>>>> 
>>>>> On Mon, Mar 07, 2016 at 02:10:12PM +0300, Slawa Olhovchenkov wrote:
>>>>> 
>>>>>>>>>> This allocated one for all controllers, or allocated for every 
>>>>>>>>>> controller?
>>>>>>>>> 
>>>>>>>>> It’s per-controller.
>>>>>>>>> 
>>>>>>>>> I’ve thought about making the tuning be dynamic at runtime.  I
>>>>>>>>> implemented similar dynamic tuning for other drivers, but it seemed
>>>>>>>>> overly complex for low benefit.  Implementing it for this driver
>>>>>>>>> would be possible but require some significant code changes.
>>>>>>>> 
>>>>>>>> What cause of chain_free+io_cmds_active << max_chains?
>>>>>>>> One cmd can use many chains?
>>>>>>> 
>>>>>>> Yes.  A request uses an active command, and depending on the size of
>>>>>>> the I/O, it might use several chain frames.
>>>>> 
>>>>> I am playing with max_chains and it looks like handling max_chains has a
>>>>> significant cost: with 8192 the system responded badly compared with 2048.
>>>>> Now I am trying 3192, and the response is like with 2048.
>>>> 
>>>> Hi, I’m not sure I understand what you’re saying.  You said that you tried 
>>>> 8192, but the system still complained of being out of chain frames?  Now 
>>>> you are trying fewer, only 3192?
>>> 
>>> With 8192 the system did not complain of being out of chain frames, but it
>>> seems to need more CPU power to handle this chain list: the traffic graph
>>> (this host serves HTTP via nginx) shows a lot of jerking; with 3192 the
>>> traffic graph is smoother.
>> 
>> Hi,
>> 
>> The CPU overhead of doing more chain frames is nil.  They are just
>> objects in a list, and processing the list is O(1), not O(n).  What
>> you are likely seeing is other problems with the VM and VFS-BIO systems
>> struggling to deal with the amount of I/O that you are doing.
>> Depending on what kind of I/O you are doing (buffered filesystem
>> reads/writes, memory mapped I/O, unbuffered I/O) there are limits
>> and high/low water marks on how much I/O can be outstanding, and
>> when the limits are reached processes are put to sleep and then race
>> back in when they are woken up.  This causes poor, oscillating
>> system behavior.  There’s some tuning you can do to increase the
>> limits, but yes, it’s a problem that behaves poorly in an untuned
>> system.
> 
> Sorry, I don't understand your point: how can a large number of unused chain
> frames consume CPU power?

A ‘chain frame’ is 128 bytes.  By jumping from 2048 to 8192 chain frames 
allocated, you’ve jumped from 256KB to 1MB of allocated memory.  This sounds 
like a lot, but if you’re doing enough I/O to saturate the tunings then you 
likely have many GB of RAM.  The 1MB of memory consumed is going to be well 
less than 1% of what you have, and likely 0.1 to 0.01%.  So it’s likely that the 
VM is not having to work much harder to deal with the missing memory.  In dealing 
with the chain frames themselves, they are stored on a linked list, and that 
list is never walked from head to tail.  The driver adds to head and subtracts 
from the head, so there is no cost for the length of the list.
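
The memory arithmetic above can be checked with a short sketch. The 128-byte frame size and the two frame counts come from this message; the 8 GB of RAM is only an assumed example, and the function names are hypothetical.

```python
# Worked arithmetic for the chain frame memory cost discussed above.
CHAIN_FRAME_BYTES = 128  # size of one chain frame, as stated in this message


def chain_frame_memory(max_chains: int) -> int:
    """Return the bytes reserved for chain frames on one controller."""
    return max_chains * CHAIN_FRAME_BYTES


def percent_of_ram(nbytes: int, ram_bytes: int) -> float:
    """Return nbytes as a percentage of total RAM."""
    return 100.0 * nbytes / ram_bytes


ASSUMED_RAM = 8 * 1024**3  # assumption: an 8 GB machine, for illustration only

for max_chains in (2048, 8192):
    used = chain_frame_memory(max_chains)
    print(f"max_chains={max_chains}: {used // 1024} KB per controller, "
          f"{percent_of_ram(used, ASSUMED_RAM):.4f}% of assumed RAM")
```

The allocation is per controller, so a machine with several controllers multiplies the figure accordingly.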

For comparison, we use 4 ‘mps’ controllers in our servers at Netflix, and run 
20Gbps (2.5GB/s) through them.  We’ve done extensive profiling and tuning of 
the kernel, and we’ve never measured a change in cost for having different 
chain frame lengths, other than the difficulties that come from having too few.
The problems exist in the VM and VFS-BIO interfaces being poorly tuned for 
modern workloads.

Scott


Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.

2016-03-08 Thread Scott Long

> On Mar 8, 2016, at 10:48 AM, Slawa Olhovchenkov  wrote:
> 
> On Tue, Mar 08, 2016 at 10:34:23AM -0800, Scott Long wrote:
> 
>> 
>>> On Mar 8, 2016, at 10:07 AM, Slawa Olhovchenkov  wrote:
>>> 
>>> On Mon, Mar 07, 2016 at 02:10:12PM +0300, Slawa Olhovchenkov wrote:
>>> 
>>>>>>>> This allocated one for all controllers, or allocated for every 
>>>>>>>> controller?
>>>>>>> 
>>>>>>> It’s per-controller.
>>>>>>> 
>>>>>>> I’ve thought about making the tuning be dynamic at runtime.  I
>>>>>>> implemented similar dynamic tuning for other drivers, but it seemed
>>>>>>> overly complex for low benefit.  Implementing it for this driver
>>>>>>> would be possible but require some significant code changes.
>>>>>> 
>>>>>> What cause of chain_free+io_cmds_active << max_chains?
>>>>>> One cmd can use many chains?
>>>>> 
>>>>> Yes.  A request uses an active command, and depending on the size of
>>>>> the I/O, it might use several chain frames.
>>> 
>>> I am playing with max_chains and it looks like handling max_chains has a
>>> significant cost: with 8192 the system responded badly compared with 2048.
>>> Now I am trying 3192, and the response is like with 2048.
>> 
>> Hi, I’m not sure I understand what you’re saying.  You said that you tried 
>> 8192, but the system still complained of being out of chain frames?  Now you 
>> are trying fewer, only 3192?
> 
> With 8192 the system did not complain of being out of chain frames, but it
> seems to need more CPU power to handle this chain list: the traffic graph
> (this host serves HTTP via nginx) shows a lot of jerking; with 3192 the
> traffic graph is smoother.

Hi,

The CPU overhead of doing more chain frames is nil.  They are just objects in a 
list, and processing the list is O(1), not O(n).  What you are likely seeing is 
other problems with the VM and VFS-BIO systems struggling to deal with the amount 
of I/O that you are doing.  Depending on what kind of I/O you are doing (buffered 
filesystem reads/writes, memory mapped I/O, unbuffered I/O) there are limits 
and high/low water marks on how much I/O can be outstanding, and when the 
limits are reached processes are put to sleep and then race back in when they 
are woken up.  This causes poor, oscillating system behavior.  There’s some 
tuning you can do to increase the limits, but yes, it’s a problem that behaves 
poorly in an untuned system.
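
As a rough illustration of why the length of the list does not add cost, here is a minimal Python sketch of a free list that only ever adds and removes at the head. It is not the driver's code (the mps driver is written in C); the class and method names are hypothetical.

```python
# Minimal free-list sketch: frames are taken from and returned to the head,
# so neither operation depends on how many frames the list holds.
from typing import Optional


class Frame:
    """One preallocated chain frame (contents omitted)."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.next: Optional["Frame"] = None


class FreeList:
    def __init__(self, count: int) -> None:
        self.head: Optional[Frame] = None
        self.free = 0
        for i in range(count):  # preallocate everything up front
            self.put(Frame(i))

    def get(self) -> Optional[Frame]:
        """Take a frame from the head; None means 'out of frames, defer'."""
        frame = self.head
        if frame is None:
            return None
        self.head = frame.next
        frame.next = None
        self.free -= 1
        return frame

    def put(self, frame: Frame) -> None:
        """Return a frame to the head."""
        frame.next = self.head
        self.head = frame
        self.free += 1
```

Whether the list holds 2048 or 8192 frames, `get` and `put` touch only the head pointer; the list is never walked.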

Scott


Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.

2016-03-08 Thread Scott Long

> On Mar 8, 2016, at 10:07 AM, Slawa Olhovchenkov  wrote:
> 
> On Mon, Mar 07, 2016 at 02:10:12PM +0300, Slawa Olhovchenkov wrote:
> 
>>>>>> This allocated one for all controllers, or allocated for every 
>>>>>> controller?
>>>>> 
>>>>> It’s per-controller.
>>>>> 
>>>>> I’ve thought about making the tuning be dynamic at runtime.  I
>>>>> implemented similar dynamic tuning for other drivers, but it seemed
>>>>> overly complex for low benefit.  Implementing it for this driver
>>>>> would be possible but require some significant code changes.
>>>> 
>>>> What cause of chain_free+io_cmds_active << max_chains?
>>>> One cmd can use many chains?
>>> 
>>> Yes.  A request uses an active command, and depending on the size of
>>> the I/O, it might use several chain frames.
> 
> I am playing with max_chains and it looks like handling max_chains has a
> significant cost: with 8192 the system responded badly compared with 2048.
> Now I am trying 3192, and the response is like with 2048.

Hi, I’m not sure I understand what you’re saying.  You said that you tried 
8192, but the system still complained of being out of chain frames?  Now you 
are trying fewer, only 3192?

Thanks,
Scott


Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.

2016-03-06 Thread Scott Long

> On Mar 6, 2016, at 10:04 PM, Slawa Olhovchenkov  wrote:
> 
> On Sun, Mar 06, 2016 at 06:20:06PM -0800, Scott Long wrote:
> 
>> 
>>> On Mar 6, 2016, at 1:27 PM, Slawa Olhovchenkov  wrote:
>>> 
>>> On Sun, Mar 06, 2016 at 01:10:42PM -0800, Scott Long wrote:
>>> 
>>>> Hi,
>>>> 
>>>> The message is harmless, it's a reminder that you should tune the kernel 
>>>> for your workload.  When the message is triggered, it means that a 
>>>> potential command was deferred, likely for only a few microseconds, and 
>>>> then everything moved on as normal.  
>>>> 
>>>> A command uses anywhere from 0 to a few dozen chain frames per I/O, 
>>>> depending on the size of the I/O.  The chain frame memory is allocated at 
>>>> boot so that it's always available, not allocated on the fly.  When I 
>>>> wrote this driver, I felt that it would be wasteful to reserve memory for 
>>>> a worst case scenario of all large I/Os by default, so I put in this 
>>>> deferral system with a console reminder for tuning.  
>>>> 
>>>> Yes, you actually do have 900 io's outstanding.  The controller buffers 
>>>> the io requests and allows the system to queue up much more than what sata 
>>>> disks might allow on their own.  It's debatable if this is good or bad, 
>>>> but it's tunable as well.
>>>> 
>>>> Anyways, the messages should not cause alarm.  Either tune up the chain 
>>>> frame count, or tune down the max io count.
>>> 
>>> I don't know whether it is related, but I see a dramatic performance drop
>>> at the time of these messages.
>>> 
>> 
>> Good to know.  Part of the performance drop might be because of the slowness 
>> of printing to the console.
> 
> no, on console print may be one per minute
> 

The one-per-minute prints are by design.  I should probably make it print once 
and then increment a sysctl counter.

>>> How I can calculate buffers numbers?
>> 
>> If your system is new enough to have mpsutil, please run ‘mpsutil
>> show iocfacts’.
> 
> As I see mpsutil present only on -HEAD.
> Can I compile it on 10-STABLE?
> 

Yes, I believe it should compile on 10, but I have not tried it recently.

>> If not, then boot your system with bootverbose and send me the output.
> 
> I can do this day ago.
> 
>>> I am have very heavy I/O.
>> 
>> Out of curiosity, do you redefine MAXPHYS/DFLTPHYS in your kernel config?
> 
> no
> 
>>> This allocated one for all controllers, or allocated for every controller?
>> 
>> It’s per-controller.
>> 
>> I’ve thought about making the tuning be dynamic at runtime.  I
>> implemented similar dynamic tuning for other drivers, but it seemed
>> overly complex for low benefit.  Implementing it for this driver
>> would be possible but require some significant code changes.
> 
> What cause of chain_free+io_cmds_active << max_chains?
> One cmd can use many chains?

Yes.  A request uses an active command, and depending on the size of the I/O,
it might use several chain frames.

Scott


Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.

2016-03-06 Thread Scott Long

> On Mar 6, 2016, at 1:27 PM, Slawa Olhovchenkov  wrote:
> 
> On Sun, Mar 06, 2016 at 01:10:42PM -0800, Scott Long wrote:
> 
>> Hi,
>> 
>> The message is harmless, it's a reminder that you should tune the kernel for 
>> your workload.  When the message is triggered, it means that a potential 
>> command was deferred, likely for only a few microseconds, and then 
>> everything moved on as normal.  
>> 
>> A command uses anywhere from 0 to a few dozen chain frames per I/O, 
>> depending on the size of the I/O.  The chain frame memory is allocated at 
>> boot so that it's always available, not allocated on the fly.  When I wrote 
>> this driver, I felt that it would be wasteful to reserve memory for a worst 
>> case scenario of all large I/Os by default, so I put in this deferral system 
>> with a console reminder for tuning.  
>> 
>> Yes, you actually do have 900 io's outstanding.  The controller buffers the 
>> io requests and allows the system to queue up much more than what sata disks 
>> might allow on their own.  It's debatable if this is good or bad, but it's 
>> tunable as well.
>> 
>> Anyways, the messages should not cause alarm.  Either tune up the chain 
>> frame count, or tune down the max io count.
> 
> I don't know whether it is related, but I see a dramatic performance drop
> at the time of these messages.
> 

Good to know.  Part of the performance drop might be because of the slowness of 
printing to the console.

> How I can calculate buffers numbers?

If your system is new enough to have mpsutil, please run ‘mpsutil show 
iocfacts’.  If not, then boot your system with bootverbose and send me the 
output.

> I am have very heavy I/O.

Out of curiosity, do you redefine MAXPHYS/DFLTPHYS in your kernel config?

> This allocated one for all controllers, or allocated for every controller?

It’s per-controller.

I’ve thought about making the tuning be dynamic at runtime.  I implemented 
similar dynamic tuning for other drivers, but it seemed overly complex for low 
benefit.  Implementing it for this driver would be possible but require some 
significant code changes.

Scott


Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.

2016-03-06 Thread Scott Long
Hi,

The message is harmless; it's a reminder that you should tune the kernel for 
your workload.  When the message is triggered, it means that a potential 
command was deferred, likely for only a few microseconds, and then everything 
moved on as normal.  

A command uses anywhere from 0 to a few dozen chain frames per I/O, depending 
on the size of the I/O.  The chain frame memory is allocated at boot so that 
it's always available, not allocated on the fly.  When I wrote this driver, I 
felt that it would be wasteful to reserve memory for a worst case scenario of 
all large I/Os by default, so I put in this deferral system with a console 
reminder for tuning.  

Yes, you actually do have 900 I/Os outstanding.  The controller buffers the I/O 
requests and allows the system to queue up much more than what SATA disks might 
allow on their own.  It's debatable if this is good or bad, but it's tunable as 
well.

Anyways, the messages should not cause alarm.  Either tune up the chain frame 
count, or tune down the max I/O count.
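
For watching how close each controller is to running out of chain frames, a small sketch along these lines may help. It only parses raw text in the `dev.mps.N.chain_free` / `dev.mps.N.io_cmds_active` format quoted later in this message; the function names and the threshold of 100 free frames are assumptions for illustration, not driver recommendations.

```python
import re
from typing import Dict, List

# Matches lines such as "dev.mps.3.chain_free: 70".
LINE_RE = re.compile(r"^dev\.mps\.(\d+)\.(chain_free|io_cmds_active):\s*(\d+)$")


def parse_mps_sysctl(text: str) -> Dict[int, Dict[str, int]]:
    """Map controller unit number -> {'chain_free': n, 'io_cmds_active': n}."""
    units: Dict[int, Dict[str, int]] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            unit, name, value = match.groups()
            units.setdefault(int(unit), {})[name] = int(value)
    return units


def low_on_chains(units: Dict[int, Dict[str, int]], threshold: int = 100) -> List[int]:
    """Return sorted unit numbers whose free chain count is below threshold."""
    return sorted(u for u, v in units.items()
                  if v.get("chain_free", threshold) < threshold)


# Sample taken from the first sysctl snapshot quoted in this thread.
SAMPLE = """\
dev.mps.3.chain_free: 70
dev.mps.3.io_cmds_active: 901
dev.mps.2.chain_free: 504
dev.mps.2.io_cmds_active: 725
dev.mps.1.chain_free: 416
dev.mps.1.io_cmds_active: 896
dev.mps.0.chain_free: 39
dev.mps.0.io_cmds_active: 12
"""

print(low_on_chains(parse_mps_sysctl(SAMPLE)))
```

Feeding it periodic snapshots during peak hour would show which controllers approach zero free frames before the console message appears.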

Scott

Sent from my iPhone

> On Mar 6, 2016, at 11:45 AM, Slawa Olhovchenkov  wrote:
> 
> I am use 10-STABLE r295539 and LSI SAS2008.
> 
> mps0:  port 0x8000-0x80ff mem 
> 0xdfc0-0xdfc03fff,0xdfb8-0xdfbb irq 32 at device 0.0 on pci2
> mps0: Firmware: 15.00.00.00, Driver: 20.00.00.00-fbsd
> mps0: IOCCapabilities: 185c
> mps1:  port 0x7000-0x70ff mem 
> 0xdf60-0xdf603fff,0xdf58-0xdf5b irq 34 at device 0.0 on pci3
> mps1: Firmware: 17.00.01.00, Driver: 20.00.00.00-fbsd
> mps1: IOCCapabilities: 185c
> mps2:  port 0xf000-0xf0ff mem 
> 0xfba0-0xfba03fff,0xfb98-0xfb9b irq 50 at device 0.0 on pci129
> mps2: Firmware: 15.00.00.00, Driver: 20.00.00.00-fbsd
> mps2: IOCCapabilities: 185c
> mps3:  port 0xe000-0xe0ff mem 
> 0xfb40-0xfb403fff,0xfb38-0xfb3b irq 56 at device 0.0 on pci130
> mps3: Firmware: 15.00.00.00, Driver: 20.00.00.00-fbsd
> mps3: IOCCapabilities: 185c
> 
> Some time ago I started seeing messages like this in the log:
> 
> Mar  6 22:28:27 edge02 kernel: mps3: Out of chain frames, consider increasing 
> hw.mps.max_chains.
> Mar  6 22:28:28 edge02 kernel: mps1: Out of chain frames, consider increasing 
> hw.mps.max_chains.
> Mar  6 22:28:28 edge02 kernel: mps0: Out of chain frames, consider increasing 
> hw.mps.max_chains.
> Mar  6 22:29:39 edge02 kernel: mps0: Out of chain frames, consider increasing 
> hw.mps.max_chains.
> Mar  6 22:30:07 edge02 kernel: mps3: Out of chain frames, consider increasing 
> hw.mps.max_chains.
> Mar  6 22:30:09 edge02 kernel: mps1: Out of chain frames, consider increasing 
> hw.mps.max_chains.
> 
> This is peak hour. I tried monitoring:
> 
> root@edge02:/ # sysctl dev.mps | grep -e chain_free: -e io_cmds_active
> dev.mps.3.chain_free: 70
> dev.mps.3.io_cmds_active: 901
> dev.mps.2.chain_free: 504
> dev.mps.2.io_cmds_active: 725
> dev.mps.1.chain_free: 416
> dev.mps.1.io_cmds_active: 896
> dev.mps.0.chain_free: 39
> dev.mps.0.io_cmds_active: 12
> root@edge02:/ # sysctl dev.mps | grep -e chain_free: -e io_cmds_active
> dev.mps.3.chain_free: 412
> dev.mps.3.io_cmds_active: 572
> dev.mps.2.chain_free: 718
> dev.mps.2.io_cmds_active: 687
> dev.mps.1.chain_free: 211
> dev.mps.1.io_cmds_active: 906
> dev.mps.0.chain_free: 65
> dev.mps.0.io_cmds_active: 144
> root@edge02:/ # sysctl dev.mps | grep -e chain_free: -e io_cmds_active
> dev.mps.3.chain_free: 500
> dev.mps.3.io_cmds_active: 629
> dev.mps.2.chain_free: 623
> dev.mps.2.io_cmds_active: 676
> dev.mps.1.chain_free: 251
> dev.mps.1.io_cmds_active: 907
> dev.mps.0.chain_free: 139
> dev.mps.0.io_cmds_active: 144
> 
> [...]
> 
> root@edge02:/ # sysctl dev.mps | grep -e chain_free: -e io_cmds_active
> dev.mps.3.chain_free: 1874
> dev.mps.3.io_cmds_active: 78
> dev.mps.2.chain_free: 1888
> dev.mps.2.io_cmds_active: 64
> dev.mps.1.chain_free: 1922
> dev.mps.1.io_cmds_active: 42
> dev.mps.0.chain_free: 1936
> dev.mps.0.io_cmds_active: 48
> root@edge02:/ # sysctl dev.mps | grep -e chain_free: -e io_cmds_active
> dev.mps.3.chain_free: 1890
> dev.mps.3.io_cmds_active: 78
> dev.mps.2.chain_free: 1890
> dev.mps.2.io_cmds_active: 82
> dev.mps.1.chain_free: 1729
> dev.mps.1.io_cmds_active: 150
> dev.mps.0.chain_free: 1893
> dev.mps.0.io_cmds_active: 57
> 
> What does this mean? Why, with 2048 chains allocated per controller, do I see
> 65 free and 144 allocated?
> How did I get 976 active commands? I use SATA HDDs that support only 32 tags
> with NCQ; with 8 ports this is a maximum of 256 outstanding commands per
> controller.
> 
> How can I resolve this issue?


10.2 Release - no core dump found after crash

2016-02-11 Thread Scott Otis
ing entropy file:.
Setting hostname: [--redacted--]
hn0: link state changed to UP
Starting dhclient.
DHCPDISCOVER on hn0 to 255.255.255.255 port 67 interval 7
DHCPOFFER from 168.63.129.16
unknown dhcp option value 0xf5
DHCPREQUEST on hn0 to 255.255.255.255 port 67
DHCPACK from 168.63.129.16
unknown dhcp option value 0xf5
bound to 10.0.2.5 -- renewal in -1 seconds.
Starting Network: lo0 hn0.
lo0: flags=8049 metric 0 mtu 16384
options=63
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet 127.0.0.1 netmask 0xff00
nd6 options=21
hn0: flags=8843 metric 0 mtu 1500
options=31b
ether 00:0d:3a:31:0d:fc
inet 10.0.2.5 netmask 0xff00 broadcast 10.0.2.255
nd6 options=29
Starting devd.
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
Creating and/or trimming log files.
Starting syslogd.
No core dumps found.
Clearing /tmp (X related).
Updating motd:.
Mounting late file systems:.
Performing sanity check on sshd configuration.

Any help would be appreciated!

Scott Otis
CTO & Co-Founder
Tandem
www.tandemcal.com<http://www.tandemcal.com/>



RE: 10.2 Release seems to be crashing on Azure with "Standard DS" VM sizes

2016-02-08 Thread Scott Otis
ED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
fdc0:  port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on 
acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
orm0:  at iomem 0xc-0xcbfff on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
ppc0: cannot reserve I/O port range
Timecounters tick every 1.000 msec
random: unblocking device.
Timecounter "TSC" frequency 1109287570 Hz quality 800
Timecounter "Hyper-V" frequency 1000 Hz quality 1000
storvsc0 on vmbus0
storvsc1 on vmbus0
hyperv-utils0 on vmbus0
hyperv-utils0: Hyper-V Service attaching: Hyper-V Heartbeat Service

hyperv-utils1 on vmbus0
hyperv-utils1: Hyper-V Service attaching: Hyper-V KVP Service

da0 at blkvsc0 bus 0 scbus1 target 0 lun 0
hyperv-utils2 on vmbus0
hyperv-utils2: Hyper-V Service attaching: Hyper-V Shutdown Service

da0:  Fixed Direct Access SPC-2 SCSI device
hyperv-utils3 on vmbus0
hyperv-utils3: Hyper-V Service attaching: Hyper-V Time Synch Service

da0: 300.000MB/s transfers
hn0:  on vmbus0
da0: Command Queueing enabled
da0: 21505MB (44042240 512 byte sectors: 255H 63S/T 2741C)
da1 at blkvsc1 bus 0 scbus2 target 1 lun 0
da1:  Fixed Direct Access SPC-2 SCSI device
da1: 300.000MB/s transfershn0: unknown status 1073872902 received

hn0: unknown status 1073872902 received
hn0: hv send offload request succeeded
da1: Command Queueing enabled
hn0: Using defaults for TSO: 65518/35/2048
hn0: Ethernet address: 00:0d:3a:31:0d:fc
storvsc2 on vmbus0
da1: 7168MB (14680064 512 byte sectors: 255H 63S/T 913C)
storvsc3 on vmbus0
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
WARNING: / was not properly dismounted
Setting hostuuid: 253d30c5-d221-8548-b887-f87baa231b2d.
Setting hostid: 0xd995724c.
Entropy harvesting: interrupts ethernet point_to_point swi.
Starting file system checks:
** SU+J Recovering /dev/gpt/rootfs
** Reading 33554432 byte journal from inode 8.
** Biilding recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 1 journal records in 512 bytes for 6.25% utilization
** Freed 1 inodes (0 dirs) 0 blocks, and 0 frags.

* FILE SYSTEM MARKED CLEAN *
Mounting local file systems:.
Writing entropy file:.
Setting hostname: [--redacted--]
hn0: link state changed to UP
Starting dhclient.
DHCPDISCOVER on hn0 to 255.255.255.255 port 67 interval 7
DHCPOFFER from 168.63.129.16
unknown dhcp option value 0xf5
DHCPREQUEST on hn0 to 255.255.255.255 port 67
DHCPACK from 168.63.129.16
unknown dhcp option value 0xf5
bound to 10.0.2.5 -- renewal in -1 seconds.
Starting Network: lo0 hn0.
lo0: flags=8049 metric 0 mtu 16384
options=63
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
inet 127.0.0.1 netmask 0xff00 
nd6 options=21
hn0: flags=8843 metric 0 mtu 1500
options=31b
ether 00:0d:3a:31:0d:fc
inet 10.0.2.5 netmask 0xff00 broadcast 10.0.2.255 
nd6 options=29
Starting devd.
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
Creating and/or trimming log files.
Starting syslogd.
No core dumps found.
Clearing /tmp (X related).
Updating motd:.
Mounting late file systems:.
Performing sanity check on sshd configuration.

Scott

-Original Message-
From: Adrian Chadd [mailto:adrian.ch...@gmail.com] 
Sent: Sunday, February 7, 2016 12:08 AM
To: Scott Otis 
Cc: freebsd-stable@freebsd.org
Subject: Re: 10.2 Release seems to be crashing on Azure with "Standard DS" VM 
sizes

Hm, the runtime going backwards is a bit odd, maybe they fixed that in -HEAD 
recently. Other than that, yeah, you'll need a crash dump or at least some 
screenshot when it does reboot.


-a


Re: 10.2 Release seems to be crashing on Azure with "Standard DS" VM sizes

2016-02-07 Thread Scott Otis
After setting crash dump settings correctly I'm still not seeing a dump in 
/var/crash/ - will try to get a screenshot of the boot screen after the crash 
and get back to you.

Scott

On Feb 7, 2016, at 12:08 AM, Adrian Chadd  wrote:

Hm, the runtime going backwards is a bit odd, maybe they fixed that in
-HEAD recently. Other than that, yeah, you'll need a crash dump or at
least some screenshot when it does reboot.


-a


RE: 10.2 Release seems to be crashing on Azure with "Standard DS" VM sizes

2016-02-05 Thread Scott Otis
Adrian,

3.5 GB of RAM.

Here is the list from dmesg if that helps:

Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.2-RELEASE #0 r28: Wed Aug 12 15:26:37 UTC 2015
r...@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (1109.74-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  
Features=0xf83fbff
  
Features2=0x9e982203
  AMD Features=0x20100800
  AMD Features2=0x1
  XSAVE Features=0x1
Hypervisor: Origin = "Microsoft Hv"
real memory  = 3758096384 (3584 MB)
avail memory = 3515142144 (3352 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: 
ioapic0: Changing APIC ID to 0
ioapic0  irqs 0-23 on motherboard
random:  initialized
kbd1 at kbdmux0
vmbus0:  on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
cpu0:  on acpi0
attimer0:  port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0:  port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0:  at channel 0 on atapci0
ata1:  at channel 1 on atapci0
pci0:  at device 7.3 (no driver attached)
vgapci0:  mem 0xf800-0xfbff irq 11 at device 
8.0 on pci0
vgapci0: Boot video device
atkbdc0:  port 0x60,0x64 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0:  irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
fdc0:  port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on 
acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
orm0:  at iomem 0xc-0xcbfff on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
ppc0: cannot reserve I/O port range
Timecounters tick every 1.000 msec
random: unblocking device.
Timecounter "TSC" frequency 1109744806 Hz quality 800
Timecounter "Hyper-V" frequency 1000 Hz quality 1000
storvsc0 on vmbus0
storvsc1 on vmbus0
hyperv-utils0 on vmbus0
hyperv-utils0: Hyper-V Service attaching: Hyper-V Heartbeat Service

hyperv-utils1 on vmbus0
hyperv-utils1: Hyper-V Service attaching: Hyper-V KVP Service

hyperv-utils2 on vmbus0
hyperv-utils2: Hyper-V Service attaching: Hyper-V Shutdown Service

hyperv-utils3 on vmbus0
hyperv-utils3: Hyper-V Service attaching: Hyper-V Time Synch Service

da0 at blkvsc0 bus 0 scbus1 target 0 lun 0
hn0:  on vmbus0
da0:  Fixed Direct Access SPC-2 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 21505MB (44042240 512 byte sectors: 255H 63S/T 2741C)
hn0: unknown status 1073872902 received
hn0: unknown status 1073872902 received
hn0: hv send offload request succeeded
hn0: Using defaults for TSO: 65518/35/2048
hn0: Ethernet address: 00:0d:3a:31:7e:9e
storvsc2 on vmbus0
da1 at blkvsc1 bus 0 scbus2 target 1 lun 0
da1:  Fixed Direct Access SPC-2 SCSI device
storvsc3 on vmbus0
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 7168MB (14680064 512 byte sectors: 255H 63S/T 913C)
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
WARNING: / was not properly dismounted
calcru: runtime went backwards from 3737 usec to 1889 usec for pid 272 
(dhclient)
calcru: runtime went backwards from 97244 usec to 49171 usec for pid 272 
(dhclient)
calcru: runtime went backwards from 47052 usec to 44339 usec for pid 16 (sh)
calcru: runtime went backwards from 595 usec to 410 usec for pid 5 (pagedaemon)
calcru: runtime went backwards from 18 usec to 9 usec for pid 4 (sctp_iterator)
calcru: runtime went backwards from 8932 usec to 4569 usec for pid 3 (fdc0)
calcru: runtime went backwards from 702786 usec to 368788 usec for pid 2 (cam)
calcru: runtime went backwards from 7662 usec to 6957 usec for pid 14 
(rand_harvestq)
calcru: runtime went backwards from 8949 usec to 4525 usec for pid 13 (geom)
calcru: runtime went backwards from 38401460 usec to 32753458 usec for pid 11 
(idle)
calcru: runtime went backwards from 189139 usec to 95666 usec for pid 1 (init)
calcru: runtime went backwards from 1078465 usec to 545696 usec for pid 0 
(kernel)

Scott


-----Original Messag

10.2 Release seems to be crashing on Azure with "Standard DS" VM sizes

2016-02-05 Thread Scott Otis
Been trying to get a FreeBSD VM server running on Azure on a "Standard DS" size 
VM (so I have access to SSD storage for PostgreSQL).  The "Standard DS" series 
of VMs also have faster/newer CPUs than the original "Standard A" series of 
VMs.  I am using the image of FreeBSD from here: 
https://vmdepot.msopentech.com/Vhd/Show?vhdId=56718&version=61117 .  After 
setting up the VM it started rebooting every hour or so.  Initially I thought 
this was an Azure issue and Azure was forcibly rebooting the VM because it 
wasn't getting health reports back.  But I don't believe this is the case 
anymore and I think FreeBSD is crashing.  (Note: I have set up a FreeBSD VM with 
a "Standard A" series VM and that seems to be running fine - though it looks 
like it might have crashed after about 38 hours which is much better than 
crashing after 1 hour).

Here is the "last" log from the last two days:

[--redacted--]   pts/2[--redacted--] Fri Feb  5 21:10   still logged in
[--redacted--]   pts/1[--redacted--] Fri Feb  5 21:10   still logged in
[--redacted--]   pts/0[--redacted--] Fri Feb  5 20:58 - 21:10  (00:12)
boot time  Fri Feb  5 20:53
[--redacted--]   pts/1[--redacted--] Fri Feb  5 04:59 - crash  (15:54)
[--redacted--]   pts/0[--redacted--] Fri Feb  5 04:59 - 04:59  (00:00)
boot time  Fri Feb  5 04:41
boot time  Fri Feb  5 02:16
[--redacted--]   pts/1[--redacted--] Fri Feb  5 00:36 - crash  (01:40)
[--redacted--]   pts/0[--redacted--] Fri Feb  5 00:36 - 00:36  (00:00)
boot time  Fri Feb  5 00:29
shutdown time  Thu Feb  4 09:33
boot time  Thu Feb  4 08:47
[--redacted--]   pts/0[--redacted--] Thu Feb  4 07:25 - crash  (01:21)
boot time  Thu Feb  4 07:00
boot time  Thu Feb  4 05:03
boot time  Thu Feb  4 02:45
[--redacted--]   pts/0[--redacted--] Thu Feb  4 01:08 - crash  (01:36)
boot time  Thu Feb  4 00:58

There are boot times without previous shutdown times - and there is that 
"crash" text (which is why I'm thinking it is a crash).
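
A rough way to make that reading of the "last" output mechanical is sketched below. It assumes that last(1) prints the newest records first and that a clean shutdown always leaves a "shutdown time" record; the sample lines are abridged from the listing above, "user" is a stand-in for the redacted login column, and the function name is made up for illustration.

```python
def classify_boots(last_lines):
    """Label each boot as 'clean', 'unclean', or 'unknown'.

    last(1) prints newest records first, so the record that precedes a
    boot in time is the next boot/shutdown record further down the list.
    """
    # Keep only boot/shutdown records, preserving newest-first order.
    events = []
    for line in last_lines:
        text = line.strip()
        if text.startswith("boot time"):
            events.append(("boot", text[len("boot time"):].strip()))
        elif text.startswith("shutdown time"):
            events.append(("shutdown", text[len("shutdown time"):].strip()))

    results = []
    for i, (kind, when) in enumerate(events):
        if kind != "boot":
            continue
        if i + 1 == len(events):
            results.append((when, "unknown"))   # nothing older in the log
        elif events[i + 1][0] == "shutdown":
            results.append((when, "clean"))
        else:
            results.append((when, "unclean"))   # boot follows boot: no shutdown logged
    return results


# Abridged sample; "user" replaces the redacted login column.
sample = """\
boot time  Fri Feb  5 04:41
boot time  Fri Feb  5 02:16
user   pts/1  Fri Feb  5 00:36 - crash  (01:40)
boot time  Fri Feb  5 00:29
shutdown time  Thu Feb  4 09:33
boot time  Thu Feb  4 08:47
""".splitlines()

boots = classify_boots(sample)
for when, status in boots:
    print(f"{when}: {status}")
```

A boot directly followed (further down the list) by another boot is what the paragraph above calls a boot time without a previous shutdown time; this heuristic only flags such cases and says nothing about why the system went down.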

It looks like, by default, the OS is set up to NOT save crash dumps:

sudo dumpon -v -l
kernel dumps on /dev/null

If that is the case - do I add this to /etc/rc.conf?

dumpdev="AUTO"
dumpdir="/var/crash"

Is there anything I need to adjust there?
Does that only take effect after a reboot?
Do I need to set anything with dumpon(8)?

Here is the info on the swapfile:

sudo swapinfo -h
Device          512-blocks Used Avail Capacity
/dev/gpt/swapfs    2097152   0B  1.0G     0%

Is that enough space for a kernel crash dump?
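
For a sense of scale, the swapinfo figures above can be converted as in the sketch below. Whether 1 GiB is enough depends on the size of the dump: a full dump is roughly the size of RAM, while a minidump holds only kernel pages and is usually much smaller. The two dump sizes tried at the end are made-up figures for illustration, not measurements from this VM, and the helper names are hypothetical.

```python
SECTOR_BYTES = 512  # swapinfo reported the device in 512-byte blocks


def blocks_to_bytes(blocks, block_size=SECTOR_BYTES):
    """Convert a block count to bytes."""
    return blocks * block_size


def fits_in_swap(dump_bytes, swap_bytes):
    """True if a dump of dump_bytes would fit on the swap device."""
    return dump_bytes <= swap_bytes


swap_bytes = blocks_to_bytes(2097152)   # block count from the swapinfo output
swap_gib = swap_bytes / 2**30
print(f"swap: {swap_bytes} bytes = {swap_gib:.1f} GiB")

# Hypothetical dump sizes, for illustration only.
for label, size in [("512 MiB minidump", 512 * 2**20), ("7 GiB full dump", 7 * 2**30)]:
    print(f"{label}: {'fits' if fits_in_swap(size, swap_bytes) else 'does not fit'}")
```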



Thanks for all your help with this.

Regards,

Scott Otis
CTO & Co-Founder
Tandem
http://www.tandemcal.com/

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ISCI bus_alloc_resource failed

2015-09-07 Thread Scott Long
This is really weird.  According to what you’ve posted, it’s advertising itself 
as an SMBus controller with no BARs.  Maybe you’re passing through the wrong 
device, or someone added the wrong PCI device id data to the driver?
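
As a cross-check of what each device advertises, the class= and chip= values in the pciconf output quoted below can be decoded by hand. The sketch assumes the standard PCI layout (base class, subclass, and programming interface packed into 24 bits; device ID in the upper half of chip= and vendor ID in the lower half); the lookup table is deliberately tiny and covers only the two devices in this thread.

```python
def decode_class(class_code):
    """Split a 24-bit PCI class code into (base class, subclass, prog-if)."""
    return (class_code >> 16) & 0xFF, (class_code >> 8) & 0xFF, class_code & 0xFF


def decode_chip(chip):
    """Split pciconf's chip= value into (device id, vendor id)."""
    return (chip >> 16) & 0xFFFF, chip & 0xFFFF


# Incomplete lookup: only the two class/subclass pairs seen in this thread.
NAMES = {
    (0x0C, 0x05): "serial bus / SMBus",
    (0x01, 0x07): "mass storage / SAS",
}

for name, cls, chip in [("none2", 0x0C0500, 0x1D708086),
                        ("isci0", 0x010700, 0x1D6B8086)]:
    base, sub, progif = decode_class(cls)
    device, vendor = decode_chip(chip)
    print(f"{name}: {NAMES[(base, sub)]}, vendor 0x{vendor:04x}, device 0x{device:04x}")
```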

Scott

> On Sep 7, 2015, at 11:34 AM, Bradley W. Dutton wrote:
> 
> Hi,
> 
> I'm having trouble with the isci driver in both stable and current. I see the 
> following dmesg in stable:
> 
> isci0:  port 
> 0x5000-0x50ff mem 0xe7afc000-0xe7af,0xe740-0xe77f irq 19 at 
> device 0.0 on pci11
> isci: 1:51 ISCI bus_alloc_resource failed
> 
> 
> I'm running FreeBSD on VMWare ESXi 6 with vt-d passthrough of the isci 
> devices, here is the relevant pciconf output:
> 
> none2@pci0:3:0:0: class=0x0c0500 card=0x062815d9 chip=0x1d708086 rev=0x06 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'C600/X79 series chipset SMBus Controller 0'
>     class      = serial bus
>     subclass   = SMBus
>     cap 10[90] = PCI-Express 2 endpoint max data 128(128) link x32(x32)
>                  speed 5.0(5.0) ASPM disabled(L0s)
>     cap 01[cc] = powerspec 3  supports D0 D3  current D0
>     cap 05[d4] = MSI supports 1 message
>     ecap 000e[100] = ARI 1
> isci0@pci0:11:0:0: class=0x010700 card=0x062815d9 chip=0x1d6b8086 rev=0x06 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'C602 chipset 4-Port SATA Storage Control Unit'
>     class      = mass storage
>     subclass   = SAS
>     cap 01[98] = powerspec 3  supports D0 D3  current D0
>     cap 10[c4] = PCI-Express 2 endpoint max data 128(128) link x32(x32)
>                  speed 5.0(5.0) ASPM disabled(L0s)
>     cap 11[a0] = MSI-X supports 2 messages
>                  Table in map 0x10[0x2000], PBA in map 0x10[0x3000]
>     ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
>     ecap 000e[138] = ARI 1
>     ecap 0017[180] = TPH Requester 1
>     ecap 0010[140] = SRIOV 1
> 
> 
> I haven't tried booting on bare metal but running a linux distro (centos 7) 
> in the same VM works without issue. Is it possible the SRIOV option is 
> causing trouble? I don't see a BIOS option to disable that setting on this 
> server like I have on some others. Any other ideas to get this working?
> 
> Thanks,
> Brad
> 


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Scott Sipe
On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick wrote:

> On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote:
> > On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick wrote:
> >
> > > Of course when I see lines like this:
> > >
> > >  Trying to mount root from zfs:zroot
> > >
> > >  ...this greatly diminishes any chances of "live debugging" on the
> > >  system.  It amazes me how often I see this come up on the lists --
> > >  people who have ZFS problems but use ZFS for their root/var/tmp/usr.
> > >  I wish that behaviour would stop, as it makes debugging ZFS a serious
> > >  PITA.  This comes up on the list almost constantly, sad panda.
> >
> >
> > I'm not sure why it amazes you that people are making widespread use of
> > ZFS.
>
> It's not widespread use of ZFS.  It's widespread use of ZFS as their
> sole filesystem (specifically root/var/tmp/usr, or more specifically
> just root/usr).  People are operating with the belief that "ZFS just
> works", when reality shows "it works until it doesn't".  The mentality
> seems to be "it's so rock solid it'll never break" along with "it can't
> happen to me".  I tend to err on the side of caution, hence avoidance of
> ZFS for critical things like the aforementioned.
>
> It's different if you have a UFS root/var/tmp/usr and ZFS for everything
> else.  You then have a system you can boot/use without issue even if ZFS
> is crapping the bed.
>


> ...
>


> 95% of FreeBSD users cannot debug kernel problems**.  To debug a kernel
> problem, you need: a crash dump, a usable system with the exact
> kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and
> boot into 8.2 and reliably debug it using that), and (most important of
> all) a developer who is familiar with kernel debugging *and* familiar
> with the bits which are crashing.  Those who say what you're quoting are
> often the latter.
>


> ...
>


> But the OP is running -RELEASE, and chooses to run that, along with use
> of freebsd-update for binary updates.  Their choices are limited: stick
> with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely.
>

So I realize that neither 8.2-RELEASE nor 8.4-RELEASE is stable, but I
ultimately wasn't sure where the right place to discuss 8.4 is.
Beyond the FS mailing list, was there a better place for my question? I'll
provide the other requested information (zfs outputs, etc) to wherever
would be best.

This is a production machine (has been since late 2010) and after tweaking
some ZFS settings initially has been totally stable. I wasn't incredibly
closely involved in the initial configuration, but I've done at least one
binary freebsd-update previously.

Before this computer I had always done source upgrades. ZFS (and the
thought of a panic like the one I saw this weekend!) made me leery of doing
that. We're a small business--we have this server, an offsite backup
server, and a firewall box. I understand that issues like this are
going to happen when I don't have a dedicated testing box, I just like to
try to minimize them and keep them to weekends!

It sounds like my best bet might be to add a new UFS disk, do a clean
install of 9.1 onto that disk, and then import my existing ZFS pool?

Thanks,
Scott


ZFS Panic after freebsd-update

2013-07-01 Thread Scott Sipe
*** Sorry for partial first message! (gmail sent after multiple returns
apparently?) ***

Hello,

I have not had much time to research this problem yet, so please let me
know what further information I might be able to provide.

This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
using freebsd-update. After I rebooted to test the new kernel, I got a
panic. I had to take a picture of the screen. Here's a condensed version:

panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 kdb_backtrace
#1 panic
#2 trap_fatal
#3 trap_pfault
#4 trap
#5 calltrap
#6 vdev_mirror_child_select
#7 vdev_mirror_io_start
#8 zio_vdev_io_start
#9 zio_execute
#10 arc_read
#11 dbuf_read
#12 dbuf_findbp
#13 dbuf_hold_impl
#14 dbuf_hold
#15 dnode_hold_impl
#16 dmu_buf_hold
#17 zap_lockdir
Uptime: 5s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort

uname -a from before (and after) the reboot:

FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57
UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
 amd64

dmesg is attached.

I was able to reboot to the old kernel and am up and running back on 8.2
right now.

Any thoughts?

Thanks,
Scott
Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU   E5520  @ 2.27GHz (2266.76-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5
  Features=0xbfebfbff
  Features2=0x9ce3bd
  AMD Features=0x28100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 18253611008 (17408 MB)
avail memory = 16513347584 (15748 MB)
ACPI APIC Table: <031710 APIC1617>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
 cpu8 (AP): APIC ID: 16
 cpu9 (AP): APIC ID: 17
 cpu10 (AP): APIC ID: 18
 cpu11 (AP): APIC ID: 19
 cpu12 (AP): APIC ID: 20
 cpu13 (AP): APIC ID: 21
 cpu14 (AP): APIC ID: 22
 cpu15 (AP): APIC ID: 23
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
kbd1 at kbdmux0
acpi0: <031710 XSDT1617> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, bff0 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0:  on acpi0
ACPI Warning: Incorrect checksum in table [OEMB] - 0xAD, should be 0xAA (20101013/tbutils-354)
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
cpu4:  on acpi0
cpu5:  on acpi0
cpu6:  on acpi0
cpu7:  on acpi0
cpu8:  on acpi0
cpu9:  on acpi0
cpu10:  on acpi0
cpu11:  on acpi0
cpu12:  on acpi0
cpu13:  on acpi0
cpu14:  on acpi0
cpu15:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  at device 1.0 on pci0
pci10:  on pcib1
pcib2:  at device 3.0 on pci0
pci9:  on pcib2
pcib3:  at device 7.0 on pci0
pci8:  on pcib3
pcib4:  at device 8.0 on pci0
pci7:  on pcib4
pcib5:  at device 9.0 on pci0
pci6:  on pcib5
pcib6:  at device 10.0 on pci0
pci5:  on pcib6
pci0:  at device 20.0 (no driver attached)
pci0:  at device 20.1 (no driver attached)
pci0:  at device 20.2 (no driver attached)
pci0:  at device 20.3 (no driver attached)
pci0:  at device 22.0 (no driver attached)
pci0:  at device 22.1 (no driver attached)
pci0:  at device 22.2 (no driver attached)
pci0:  at device 22.3 (no driver attached)
pci0:  at device 22.4 (no driver attached)
pci0:  at device 22.5 (no driver attached)
pci0:  at device 22.6 (no driver attached)
pci0:  at device 22.7 (no driver attached)
uhci0:  port 0xa400-0xa41f irq 16 at device 26.0 on pci0
uhci0: [ITHREAD]
uhci0: LegSup = 0x2f00
usbus0:  on uhci0
uhci1:  port 0xa480-0xa49f irq 21 at device 26.1 on pci0
uhci1: [ITHREAD]
uhci1: LegSup = 0x2f00
usbus1:  on uhci1
uhci2:  port 0xa800-0xa81f irq 19 at device 26.2 on pci0
uhci2: [ITHREAD]
uhci2: LegSup = 0x2f00
usbus2:  on uhci2
ehci0:  mem 0xfbcf4000-0xfbcf43ff irq 18 at device 26.7 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3:  on ehci0
pcib7:  irq 17 at device 28.0 on pci0
pci4:  on pcib7
pcib8:  irq 17 at device 28.4 on pci0
pci3:  on pcib8
em0:  port 0xec00-0xec1f mem 0xfbee-0xfbef,0xfbedc000-0xfbed irq 16 at device 0.0 on pci3
em0: Using MSIX interrupts with 3 vectors
em0: [ITHREAD]
em0: [ITHREAD]
em0: [ITHREAD]
em0

ZFS Panic after freebsd-update

2013-07-01 Thread Scott Sipe
Hello,

I have not had much time to research this problem yet, so please let me
know what further information I might be able to provide.

This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
using freebsd-update. After I rebooted to test the new kernel, I got a
panic. I had to take a picture of the screen. Here's a condensed version:

panic: page fault
cpuid = 1
KDB: stack backtrace:
#0
#1
#2
#3
#4
#5
#6
#6
#6
#6
#6
#6
FreeBSD xeon.cap-press.com 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue
Sep 27 18:45:57 UTC 2011
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
 amd64


Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?

2013-06-26 Thread Scott Lambert
On Wed, Jun 26, 2013 at 05:34:45PM -0700, Chris H wrote:
> Greetings, and thank you for your reply.
>
> I understand that portupgrade _will_ pull in other dependencies _as
> needed_ -- I _do_ read the man(1) pages. :)
>
> But it installed (pulled in) far more than those dependencies
> actually required.  I believe, due to the fact that it doesn't
> appear to honor the original build options recorded in
> /var/db/ports//options. Nor, do I recall that it honored
> /etc/make.conf -- make.conf(5). Maybe things have changed?

You may have asked portupgrade to use packages first and fall back
to building from source.  That would install the packages which were
built with the default options on the package building cluster.  It
saves time; but I don't like mixing packages with build from source,
especially when I want custom options on anything.

-- 
Scott Lambert    KC5MLE    Unix SysAdmin
lamb...@lambertfam.org


RE: Flow monitoring with PF

2013-06-11 Thread Scott, Brian
>I was looking at trying out flow monitoring and I found pfflowd, but 
>unfortunately it does not work with FreeBSD >9.0. I thought about ng_netflow 
>but that doesn't see my tun interface which may be related to..
>WARNING: attempt to domain_add(netgraph) after domainfinalize()

Noise message. I've never seen it actually mean anything.

The problem is that tun0 is a generic network interface. ng_ether only exposes 
Ethernet devices. The equivalent to tun but for an Ethernet device is tap. 
Creating a tap device after boot immediately creates the corresponding ng_ether 
node which can then be plumbed into ng_netflow.

Some software is kind enough to work with either tun or tap as a configurable 
option.

>Does anyone have any recommendations for generating flow information from PF?

I've had great success with ng_netflow. I like the fact that all the processing 
is in-kernel.


Re: ada(4) and ahci(4) quirk printing

2013-04-24 Thread Scott Long

On Apr 23, 2013, at 4:33 AM, Steven Hartland wrote:

>>> 
>>> If we can't reach an agreement, I'm happy to wrap the relevant bits with
>>> an "if (bootverbose)", but I really feel users should have some way to
>>> see this information outside of bootverbose.
>> Both da and ada drivers already have sysctl's. It should be trivial to add 
>> one more, especially if just numeric.
> 
> Wouldn't camcontrol be a better place for this?


Yes, the forest of CAM sysctl's needs to be trimmed and discouraged from 
growing.  I like Jeremy's original proposal for exposing quirk information, and 
I think that it would be served well via camcontrol, not the console or sysctl.

Scott



Re: ada(4) and ahci(4) quirk printing

2013-04-24 Thread Scott Long
Your meta-commentary here is irritating.

They're called "quirks" because that is the name given to them in CAM.

Scott

On Apr 23, 2013, at 9:33 AM, Adrian Chadd wrote:

> .. are we really debating this?
> 
> Stop calling them quirks. That sounds like something that won't mess
> up your actual runtime. It's not giving them enough "weight". They're
> more "device behaviours" or "device flags" or something. Print them
> out like that. I think that _not_ printing them out at boot time is
> insane. Doubly so if it could cause issues before you can actually run
> commands.
> 
> So if it were me, I'd print out the device quirks like we print out
> CPU flags. Ie, all the time.
> 
> 
> 
> Adrian



svn revision stable/9

2013-04-22 Thread Scott Reber
I'm using svnup to checkout stable/9 sources.  How do I determine the svn
revision of those sources?  I know that Head has included this in "uname
-a", but I do not see this with stable/9 as of today.

Regards,
Scott



Re: Any objections/comments on axing out old ATA stack?

2013-03-31 Thread Scott Long

On Mar 31, 2013, at 7:04 AM, Victor Balada Diaz wrote:

> On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
>> Hi.
>> 
>> Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA 
>> stack, using only some controller drivers of old ata(4) by having 
>> `options ATA_CAM` enabled in all kernels by default. I have a wish to 
>> drop non-ATA_CAM ata(4) code, unused since that time from the head 
>> branch to allow further ATA code cleanup.
>> 
>> Does anyone here still use the legacy ATA stack (kernel explicitly built 
>> without `options ATA_CAM`) for some reason, for example as workaround 
>> for some regression? Does anybody have good ideas why we should not drop 
>> it now?
> 
> Hello,
> 
> At my previous job we had troubles with NCQ on some controllers. It caused
> failures and silent data corruption. As old ata code didn't use NCQ we just
> used it.
> 
> I reported some of the problems on 8.2[1] but the problem existed with 8.3.
> 
> I no longer have access to those systems, so i don't know if the problem
> still exists or has been fixed on newer versions.
> 
> Regards.
> Victor.


So what I hear you and Matthias saying, I believe, is that it should be easier
to force disks to fall back to non-NCQ mode, and/or have a more responsive
black-list for problematic controllers.  Would this help the situation?  It's
hard to justify holding back overall forward progress because of some bad
controllers; we do several Tbps off of AHCI controllers with NCQ enabled on
FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic,
and we see no problems.  How can we move forward but also take care of you guys
with problematic hardware?

Scott



Re: Any objections/comments on axing out old ATA stack?

2013-03-28 Thread Scott Long

On Mar 28, 2013, at 8:00 AM, Ian Lepore wrote:

> On Thu, 2013-03-28 at 09:17 +0200, Alexander Motin wrote:
>> On 28.03.2013 02:43, Adrian Chadd wrote:
>>> My main concern with the new stuff is that it requires CAM and that's
>>> reasonably big compared to the standalone ATA code.
>>> 
>>> It'd be nice if we could slim down the CAM stack a bit first; it makes
>>> embedding it on the smaller devices really freaking painful.
>> 
>> Are there many boards now with ATA, but without USB? But I agree, it 
>> should be checked.
>> 
> 
> It's not necessarily what the boards have but how they're used.  We use
> industrial SBCs at work that have ata compact flash sockets on the board
> which we do use, and usb interfaces which we don't use.
> 
> I've never tested the new ata+cam stuff on some of these boards, most
> based on Cyrix, Via, Geode, and VortexD86 chipsets.  The older ata code
> works, but not always very well -- for example, we usually have to set
> hw.ata.ata_dma=0 for absolutely no reason we've ever been able to figure
> out except that if we leave it enabled we get DMA errors and panics on
> some CF cards and not on others.  I have no idea whether to expect such
> things to be better, worse, or no different by changing to the ata+cam
> way of doing things (but I don't really have time to do extensive
> testing right now either).
> 


The legacy ATA code was hard to maintain, very buggy (as you point out), and
is essentially unmaintained.  Also, IIRC, the legacy stack simply cannot support
NCQ tagged queueing.

I think that Alexander has done a superb job with both developing and supporting
the CAM_ATA stack.

Scott




Re: Any objections/comments on axing out old ATA stack?

2013-03-28 Thread Scott Long

On Mar 27, 2013, at 6:43 PM, Adrian Chadd wrote:

> My main concern with the new stuff is that it requires CAM and that's
> reasonably big compared to the standalone ATA code.
> 

From a code execution standpoint?  No, it's not.

> It'd be nice if we could slim down the CAM stack a bit first; it makes
> embedding it on the smaller devices really freaking painful.
> 

From a code segment size standpoint, there's definitely some stuff that should
be made modular and optional.

Scott



Re: And for our next trick (Audio problems, Envy24HT driver)

2013-02-17 Thread Scott Lambert
On Fri, Feb 15, 2013 at 03:58:37PM -0600, Karl Denninger wrote:
> Says its running, and mpg123 has attached to it -- but no output.
> 
> Mixer says the volume is on:
> 
> [root@NewFS /boot/kernel]# mixer
> Mixer vol  is currently set to  75:75
> Mixer treble   is currently set to   0:0
> Mixer synth    is currently set to   0:0
> Mixer pcm  is currently set to  75:75
> Mixer speaker  is currently set to   0:0
> Mixer line is currently set to  75:75
> Mixer mic  is currently set to   0:0
> Mixer cd   is currently set to   0:0
> Mixer mix  is currently set to   0:0
> Recording source: mic
> 
> Ideas for further troubleshooting?

Try increasing the volume for "mix" and/or "speaker".  If that
does not work, increase volume for everything and just see if you
get any output.  Maybe the driver got its lines mixed up?

mixer speaker 75:75
mixer mix 75:75
...

I am not a sound guy.  The last time I messed with sound, there
were some mixer line labels that didn't make sense to me on the
card I was using at the time.  It was a long time ago.

-- 
Scott Lambert    KC5MLE    Unix SysAdmin
lamb...@lambertfam.org


Re: geom mirror now rebuilding on every reboot?

2012-08-06 Thread Scott Long

On Aug 6, 2012, at 12:06 PM, Gleb Smirnoff wrote:

>  Michael,
> 
> On Sat, Aug 04, 2012 at 12:49:49PM -0400, Michael Butler wrote:
> M> Something in -current and recently MFC'd to -stable is causing all of my
> M> gmirror drives to rebuild on reboot :-(
> M> 
> M> Being remote and these being production machines, I suspect SVN r237929
> M> and r237930 in -current and SVN r238500 to -stable but haven't yet been
> M> able to prove it.
> 
> I'd appreciate if you test that and either confirm or disclaim that
> r238500 introduces such regression. Thanks!
> 

I'm not sure how r238500 could affect what Max, myself, and presumably the 
original poster are seeing.  There's one other change in 9-stable, r235599, but 
it looks like a benign change as well.

Scott




Re: Netflix's New Peering Appliance Uses FreeBSD

2012-06-07 Thread Scott Long

On Jun 5, 2012, at 6:16 PM, Scott Long wrote:
> 
> Yes, we are indeed using FreeBSD at Netflix!  For those who are interested, I
> recently moved from Yahoo to Netflix to help support FreeBSD for them, and
> I'm definitely impressed with what is going on there.  Other than a few small
> changes, we're using stock FreeBSD 9, tracking the 9-stable branch on a
> regular basis.  Our chassis is a semi-custom 4U 19" form factor with
> thirty-six 3TB SATA disks and 2 SSDs.  Each disk has its own UFS+J filesystem,
> except for the SSDs that are mirrored together with gmirror.  The SSDs hold
> the OS image and cache some of the busiest content.  The other disks hold
> nothing but the audio and video files for our content streams.  We connect to
> the outside world via a twin-port Intel 10GbE optical NIC (only one port is
> active at the moment), and we use LSI MPT2 controllers for 32 of the 36 disks.
> The other 4 disks connect to the onboard AHCI SATA controller.  All of the
> disks are direct-attach with no SAS backplanes or expanders.  Out-of-band
> management happens via IPMI on an on-board 1Gb NIC.  The entire system
> consumes around 500W of power, making it a very efficient appliance for its
> functionality.
> 
> Netflix is also at the front of the internet pack with IPv6 roll-out, and
> FreeBSD plays an essential part of that.  We've been working hard on
> stabilizing the FreeBSD IPv6 stack for production-level traffic, and I
> recommend that all users of IPv6 update to the latest patches in 9-stable and
> 8-stable.  Contact me directly if you have questions about this.  That said,
> we're excited about World IPv6 Day, and we're ready with  DNS records and
> content service from both Amazon and the traditional CDNs as well as our
> OpenConnect network.
> 
> From an advocacy standpoint, Netflix represents 30% of all North American
> internet traffic during peak hours, and FreeBSD is becoming an integral part
> of that metric as we shift traffic off of the traditional CDNs.  We're
> expanding quickly, which means that FreeBSD is once again a core part of the
> internet infrastructure.  As we find and fix stability and performance issues,
> we're aggressively pushing those changes into FreeBSD so that everyone can
> benefit from them, just as we benefit from the contributions of the rest of
> the FreeBSD ecosystem.  We're proud to be a part of the community, and look
> forward to a long-term relationship with FreeBSD.
> 
> If you have any questions, let me know or follow the information links on the
> OpenConnect web site.
> 

I wanted to follow up on this briefly.  I jumped the gun a little bit in 
talking about this publicly, since the OpenConnect website wasn't fully 
globally online at the time.  It is now, so anyone who previously had trouble 
getting to it should try again at https://signup.netflix.com/openconnect.  
Also, I mistakenly claimed that our regular CDN partners were serving streaming 
content over IPv6.  This isn't the case, only OpenConnect is, and I apologize 
for any confusion (hey, I've only just started at Netflix, and I couldn't even 
spell IPv6 two weeks ago =-)  Finally, I wanted to thank the NginX developers, 
they've done an amazing job supporting us.

The community enthusiasm and interest has been outstanding so far, so please 
feel free to continue to ask questions on the mailing list and to make formal 
inquiries to Netflix.

Thanks,
Scott






Re: Netflix's New Peering Appliance Uses FreeBSD

2012-06-07 Thread Scott Long

On Jun 7, 2012, at 3:09 AM, Daniel Kalchev wrote:

> 
> 
> On 06.06.12 03:16, Scott Long wrote:
> 
> [...]
>> Each disk has its own UFS+J filesystem, except for
>> the SSDs that are mirrored together with gmirror.  The SSDs hold the OS image
>> and cache some of the busiest content.  The other disks hold nothing but the
>> audio and video files for our content streams.
> 
> Could you please explain the rationale of using UFS+J for this large storage?
> Your published documentation states that you have reasonable redundancy in 
> case of multiple disk failure and I wonder how you handle this with "plain" 
> UFS. Things like avoiding hangs and panics when a disk is going to die.

Redundancy happens by allowing the streaming clients to choose multiple other 
sources for their stream, and buffer enough of the stream to make a switchover 
appear seamless.  That other source might be a peer node on the same network, 
or might be a node that is upstream or on a different network.  The point of 
the caches is to hold as much content as possible, and we've found that it's 
more effective to maximize capacity but allow drives to fail in place than to 
significantly reduce capacity with hardware or software RAID.  When a disk 
starts having problems that affect its ability to deliver data on time, any 
clients affected by it simply switch to a different source.  When the disk does 
finally die, it is removed from the available pool and content is reshuffled on 
the other drives during the next daily content update.  Once enough disks fail 
that the cache is no longer effective, it gets replaced.
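
The capacity trade-off described above can be put in rough numbers. The sketch below uses the disk count and size given earlier in the thread (thirty-six 3TB disks); the two-way mirror layout is only an assumed point of comparison, since the message does not say which RAID levels were considered, and the function names are made up for illustration.

```python
DISKS = 36     # data disks, from the appliance description earlier in the thread
DISK_TB = 3    # nominal size of each SATA disk, in TB


def usable_independent(disks, disk_tb, failed=0):
    """Each disk carries its own filesystem; a failure costs only that disk."""
    return (disks - failed) * disk_tb


def usable_mirrored(disks, disk_tb):
    """Two-way mirrors (an assumed comparison point): half the raw capacity."""
    return (disks // 2) * disk_tb


print("independent disks, none failed:", usable_independent(DISKS, DISK_TB), "TB")
print("independent disks, 4 failed:   ", usable_independent(DISKS, DISK_TB, failed=4), "TB")
print("two-way mirrors:               ", usable_mirrored(DISKS, DISK_TB), "TB")
```

Under these assumptions, independent disks still offer more cache space after several in-place failures than a mirrored layout does when fully healthy, which is consistent with the reasoning in the reply.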

Scott



Re: Netflix's New Peering Appliance Uses FreeBSD

2012-06-05 Thread Scott Long

On Jun 5, 2012, at 6:42 PM, David Magda wrote:

> On Jun 5, 2012, at 20:16, Scott Long wrote:
> 
>> If you have any questions, let me know or follow the information links on the
>> OpenConnect web site.
> 
> Out of curiosity, given that Linux seems popular in so many other places 
> (Google, Facebook), is there any particular reason why FreeBSD was chosen for 
> this?
> 
> I'm sure Linux is used in many other places (much of Netflix's IT 
> infrastructure is on Amazon IIRC), so I'm kind of surprised that they went 
> with FreeBSD when they probably already have so much knowledge with Linux.
> 
> 

Linux works wonderfully on EC2 for our C&C and computational tasks, while FreeBSD is 
proving to work well on deployed hardware for serving bits.  It is highly 
maintainable, and there's an excellent community supporting it.  From the 
website:

For the operating system, we use FreeBSD <http://www.freebsd.org/> version
9.0. This was selected for its balance of stability and features, a strong
development community and staff expertise. We will contribute changes we
make as part of our project to the community through the FreeBSD committers
on our team.





Re: Netflix's New Peering Appliance Uses FreeBSD

2012-06-05 Thread Scott Long

On Jun 5, 2012, at 9:56 AM, Benjamin Francom wrote:

> I just saw this, and thought I'd share:
> 
> Open Connect Appliance Software
> 
> Netflix delivers streaming content using a combination of intelligent
> clients, a central control system, and a network of Open Connect appliances.
> 
> When designing the Open Connect Appliance Software, we focused on these
> fundamental design goals:
> 
>   - Use of Open Source software
>   - Ability to efficiently read from disk and write to network sockets
>   - High-performance HTTP delivery
>   - Ability to gather routing information via BGP
> 
> Operating System
> 
> For the operating system, we use FreeBSD <http://www.freebsd.org/> version
> 9.0. This was selected for its balance of stability and features, a strong
> development community and staff expertise. We will contribute changes we
> make as part of our project to the community through the FreeBSD committers
> on our team.
> Web server
> 

Yes, we are indeed using FreeBSD at Netflix!  For those who are interested, I
recently moved from Yahoo to Netflix to help support FreeBSD for them, and
I'm definitely impressed with what is going on there.  Other than a few small
changes, we're using stock FreeBSD 9, tracking the 9-stable branch on a
regular basis.  Our chassis is a semi-custom 4U 19" form factor with thirty-six
3TB SATA disks and 2 SSDs.  Each disk has its own UFS+J filesystem, except for
the SSDs that are mirrored together with gmirror.  The SSDs hold the OS image
and cache some of the busiest content.  The other disks hold nothing but the
audio and video files for our content streams.  We connect to the outside world
via a twin-port Intel 10GbE optical NIC (only one port is active at the moment),
and we use LSI MPT2 controllers for 32 of the 36 disks.  The other 4 disks
connect to the onboard AHCI SATA controller.  All of the disks are
direct-attach with no SAS backplanes or expanders.  Out-of-band management
happens via IPMI on an on-board 1Gb NIC.  The entire system consumes
around 500W of power, making it a very efficient appliance for its 
functionality.

Netflix is also at the front of the internet pack with IPv6 roll-out, and
FreeBSD plays an essential part in that.  We've been working hard on stabilizing
the FreeBSD IPv6 stack for production-level traffic, and I recommend that all users
of IPv6 update to the latest patches in 9-stable and 8-stable.  Contact me
directly if you have questions about this.  That said, we're excited about World
IPv6 Day, and we're ready with AAAA DNS records and content service from both
Amazon and the traditional CDNs as well as our OpenConnect network.

From an advocacy standpoint, Netflix represents 30% of all North American
internet traffic during peak hours, and FreeBSD is becoming an integral part
of that metric as we shift traffic off of the traditional CDNs.  We're expanding
quickly, which means that FreeBSD is once again a core part of the internet
infrastructure.  As we find and fix stability and performance issues, we're
aggressively pushing those changes into FreeBSD so that everyone can
benefit from them, just as we benefit from the contributions of the rest of the
FreeBSD ecosystem.  We're proud to be a part of the community, and look
forward to a long-term relationship with FreeBSD.

If you have any questions, let me know or follow the information links on the
OpenConnect web site.

Scott



Fwd: ntpd couldn't resolve host name on system boot

2012-05-19 Thread Scott, Brian

> 
> Matthew,
> 
> Although netwait will probably fix the problem for you, another
> possibility that I have just run into recently involved DNSSEC
> validation in bind. The problem was that without ntp syncing the time at
> boot (the system doesn't have battery backed time) dns resolution failed
> (root key wasn't valid yet). If dns resolution fails then you can't find
> your time servers. Repeat till you get sick of it.
> 
> Possible solutions include:
> *Don't do dnssec validation
> *Hard code an IP address in your ntp.conf (even a low stratum
> server somewhere)
> *Run a task at startup to brute force your time to something
> reasonable (within a year or two ago) to allow dnssec validation to
> work.
> 
> As I said, you will probably find that netwait_* is all you need and you
> can safely ignore this.
> 
> Brian
> 
> -Original Message-
> From: owner-freebsd-sta...@freebsd.org
> [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Matthew Doughty
> Sent: Friday, 18 May 2012 1:03 AM
> To: freebsd-stable@freebsd.org
> Subject: ntpd couldn't resolve host name on system boot
> 
> Dear Jeremy,
> 
> Whilst searching for a solution to a problem, I found your post:
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-October/064350.html
> 
> Please could you explain how I can implement the netwait script to solve
> the problem?  I'm new to freenas/BSD but am willing to try working from
> the Cmd line.
> 
> Best regards,
> 
> Matthew
> 
> 
> PS: Here are the messages
> 
> May 14 13:32:59 freenas kernel: SMP: AP CPU #1 Launched!
> 
> May 14 13:32:59 freenas kernel: GEOM: da0s1: geometry does not match
> label (16h,63s != 255h,63s).
> 
> May 14 13:32:59 freenas kernel: GEOM: da0s2: geometry does not match
> label (16h,63s != 255h,63s).
> 
> May 14 13:32:59 freenas kernel: Trying to mount root from
> ufs:/dev/ufs/FreeNASs2a
> 
> May 14 13:32:59 freenas kernel: WARNING: /data was not properly
> dismounted
> 
> May 14 13:32:59 freenas kernel: ZFS filesystem version 4
> 
> May 14 13:32:59 freenas kernel: ZFS storage pool version 15
> 
> May 14 13:33:00 freenas ntpd[1512]: ntpd 4.2.4p5-a (1)
> 
> May 14 13:33:00 freenas root: /etc/rc: WARNING: failed precmd routine
> for vmware_guestd
> 
> May 14 13:33:02 freenas ntpd_initres[1526]: host name not found:
> 0.freebsd.pool.ntp.org
> 
> May 14 13:33:02 freenas ntpd_initres[1526]: couldn't resolve `
> 0.freebsd.pool.ntp.org', giving up on it
> 
> May 14 13:33:02 freenas ntpd_initres[1526]: host name not found:
> 1.freebsd.pool.ntp.org
> 
> May 14 13:33:02 freenas ntpd_initres[1526]: couldn't resolve `
> 1.freebsd.pool.ntp.org', giving up on it
> 
> May 14 13:33:02 freenas ntpd_initres[1526]: host name not found:
> 2.freebsd.pool.ntp.org
> 
> May 14 13:33:02 freenas ntpd_initres[1526]: couldn't resolve `
> 2.freebsd.pool.ntp.org', giving up on it

**
This message is intended for the addressee named and may contain
privileged information or confidential information or both. If you
are not the intended recipient please delete it and notify the sender.
**


Re: random problem with 8.3 from yesterday

2012-02-25 Thread Scott Bennett
 On Sat, 25 Feb 2012 08:56:24 -0800 Kevin Oberman wrote:
>On Sat, Feb 25, 2012 at 2:27 AM, Scott Bennett wrote:
>>     On Wed, 22 Feb 2012 13:34:36 +0700 Erich Dollansky wrote:
>>
>>>I got a new thumb drive which was FAT formatted. I use this script to change this:
>>>
>>>!/bin/tcsh
>>>#
>>># This script format a thumb drive connected to USB as da0.
>>>#
>>>printf "You have to run this script as 'root' to succeed.\n"
>>>printf "Warning this script will delete all your data from /dev/da0. Continue? > "
>>>set Eingabe = $<
>>>if ("$Eingabe" == "y") then
>>>   printf "\nDeleting the device "
>>>   dd if=/dev/zero of=/dev/da0 bs=1k count=1
>>>   printf "\nWriting the BSD label "
>>>   bsdlabel -Bw da0 auto
>>
>>     Hmmm...so no MBR and no GPT either?  Just the bare device?  I guess
>> I haven't tried that, so I don't know what that would do.
>
>Call me a bit confused, but I thought -B did write an MBR. It always
>has seemed to do so for me, at any rate. From man bsdlabel:
>"Installing Bootstraps
> If the -B option is specified, bootstrap code will be read from the file
> /boot/boot and written to the disk."
>Or am I not understanding something?

 I guess I understand the part that you quoted above as meaning that
the bootstrap code would be copied to the bootstrap sectors.  However, as
I interpret it, the bsdlabel command does not write a MBR, which would
include the slice map for the device.  Further, Erich's later commands did
not specify a slice number.  In short, it looks to me as though he may have
ended up with the initial boot code where it belonged at the start of the
device, but the boot code looks for the slice map, which isn't there, so
it should not be possible to boot a kernel because the bootstrap code
would not be able to find it.  But as far as simply mounting a file system,
I really don't know whether it should work to have a BSD label written to
a bare device with neither a MBR nor a GPT to find that label.  IOW, would
the device node to be used in the mount operation have been created?
 Note to Erich:  did you look in /dev and /dev/ufs to see whether all
of the device files that you expected to be there were, in fact, present
before you attempted the mount?
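
 For anyone who wants to automate that check, a rough Python sketch
follows.  The node names are assumptions taken from the original post
(partition "a" on da0 and the UFS label NewDeviceName); adjust them to
whatever you actually expect to see.

```python
import os
from typing import Callable, Iterable, List


def missing_nodes(paths: Iterable[str],
                  exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return the entries of `paths` that are not present."""
    return [p for p in paths if not exists(p)]


# Assumed names: the bare device, its "a" partition and the glabel/UFS label node.
expected = ["/dev/da0", "/dev/da0a", "/dev/ufs/NewDeviceName"]
for path in missing_nodes(expected):
    print("missing:", path)
```

 Running it right after plugging the drive in, and again just before the
mount, would show whether a node has gone missing in between.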


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at cs.niu.edu  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


random problem with 8.3 from yesterday

2012-02-25 Thread Scott Bennett
 On Wed, 22 Feb 2012 13:34:36 +0700 Erich Dollansky wrote:

>I got a new thumb drive which was FAT formatted. I use this script to change 
>this:
>
>!/bin/tcsh
>#
># This script format a thumb drive connected to USB as da0.
>#
>printf "You have to run this script as 'root' to succeed.\n"
>printf "Warning this script will delete all your data from /dev/da0. Continue? > "
>set Eingabe = $<
>if ("$Eingabe" == "y") then
>   printf "\nDeleting the device "
>   dd if=/dev/zero of=/dev/da0 bs=1k count=1
>   printf "\nWriting the BSD label "
>   bsdlabel -Bw da0 auto

 Hmmm...so no MBR and no GPT either?  Just the bare device?  I guess
I haven't tried that, so I don't know what that would do.

>   printf "\nEditing the BSD label "
>   bsdlabel -e da0
>   newfs /dev/da0a
>   printf "\nThe device /dev/da0 was formated to be used with FreeBSD.\n"
>else
>   printf "\nScript aborted!\n"
>endif
>
>I then call manually
>
>tunefs -L NewDeviceName /dev/da0a

 Just out of curiosity, I'd like to know why you run tunefs manually,
rather than using "-L NewDeviceName" on the newfs command, given that your
script is clearing the physical device and then creating an empty file
system.
>
>Either this call or the mount command does not work randomly.
>
>When I then try to mount the device on /dev/da0a it does not work always.

 What do you mean when you write "mount the device on /dev/da0a"?
Normally one mounts a file system, named by its device node, onto a directory, e.g.,

mount /dev/ad0s1d /var

or some similar thing.  Also, why do you refer to /dev/da0a at all if you
labeled the file system?  The whole point of labeling the file system is
supposed to be so that you can mount it independently of the physical
device name, e.g.,

mount /dev/ufs/NewDeviceName /thumbfs

which allows you to have an entry in /etc/fstab for mounting the file
system that doesn't need to be edited every time you reboot the system or
move devices around.
>
>I do not know what this causes, I am only randomly able to reproduce it.
>
>It might be affected by removing the device or keeping it plugged in.

 Well, yes, that's what you label partitions/devices to avoid having
to deal with manually, right?
>
>uname says:
>
>FreeBSD AMD620.ovitrap.com 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #28: Tue Feb 
>21 17:15:07 WIT 2012 
>er...@amd620.ovitrap.com:/usr/obj/usr/src/sys/AsusAMD620  amd64
>
>dmesg says:
>
>ugen1.2:  at usbus1
>umass0:  on 
>usbus1
>umass0:  SCSI over Bulk-Only; quirks = 0x4001
>umass0:2:0:-1: Attached to scbus2
>da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
>da0: < USB FLASH DRIVE PMAP> Removable Direct Access SCSI-0 device 
>da0: 40.000MB/s transfers
>da0: 15272MB (31277056 512 byte sectors: 255H 63S/T 1946C)
>
>It is not an urgent problem.
>
 It most likely is not a problem at all.  See

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/geom-glabel.html#AEN27470


  With best regards,

  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at cs.niu.edu  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


Re: disk devices speed is ugly

2012-02-20 Thread Scott Long

On Feb 20, 2012, at 12:24 PM, Alex Samorukov wrote:

> On 02/15/2012 05:50 AM, Scott Long wrote:
>> 
>> What would be nice is a generic caching subsystem that any FS can use
>> - similar to the old block devices but with hooks to allow the FS to
>> request read-ahead, advise of unwanted blocks and ability to flush
>> dirty blocks in a requested order with the equivalent of barriers
>> (request Y will not occur until preceding request X has been
>> committed to stable media).  This would allow filesystems to regain
>> the benefits of block devices with minimal effort and then improve
>> performance & cache efficiency with additional work.
>> 
>> Any filesystem that uses bread/bwrite/cluster_read are already using the 
>> "generic caching subsystem" that you propose.  This includes UDF, CD9660, 
>> MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage 
>> filesystem in the tree except for ZFS.  Not all of them implement 
>> VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode 
>> pager, not requirements for using buffer-cache services on block devices.  
>> As Kostik pointed out in a parallel email, the only thing that was removed 
>> from FreeBSD was the userland interface to cached devices via /dev nodes.  
>> This has nothing to do with filesystems, though I suppose that could maybe 
>> sorta kinda be an issue for FUSE?.
> Maybe it's possible to provide some generic interface for fuse based 
> filesystems to use this generic cache? I can test it and report performance.
> 

What you're asking for is to bring back the cached raw devices.  I don't have a 
strong opinion on this one way or another, except that it's a pretty specific 
use case.  Does the inherent performance gap with user land filesystems warrant 
this?  Maybe a simple cache layer can be put into FUSE that would allow client 
filesystems the same control over block caching and clustering that is afforded 
in the kernel?
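
As a very rough illustration of what such a cache layer might offer, here is
a Python sketch of an LRU block cache with sequential read-ahead.  Everything
in it (the class name, the read_block callback, the capacity and readahead
parameters) is hypothetical and has nothing to do with the actual FUSE or
buffer-cache code; it only shows the kind of control a client filesystem
would want.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Tiny LRU cache of blocks with sequential read-ahead (illustrative only)."""

    def __init__(self, read_block: Callable[[int], bytes],
                 capacity: int = 64, readahead: int = 2) -> None:
        self._read_block = read_block   # backend: block number -> data
        self._capacity = capacity       # maximum number of cached blocks
        self._readahead = readahead     # extra blocks fetched on a miss
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()

    def _insert(self, blkno: int, data: bytes) -> None:
        self._blocks[blkno] = data
        self._blocks.move_to_end(blkno)
        while len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict least recently used

    def read(self, blkno: int) -> bytes:
        if blkno in self._blocks:
            self._blocks.move_to_end(blkno)     # refresh LRU position
            return self._blocks[blkno]
        # Miss: fetch the requested block plus the few that follow it.
        data = self._read_block(blkno)
        self._insert(blkno, data)
        for nxt in range(blkno + 1, blkno + 1 + self._readahead):
            if nxt not in self._blocks:
                self._insert(nxt, self._read_block(nxt))
        return data


backend_reads = []


def fake_disk(blkno: int) -> bytes:
    backend_reads.append(blkno)         # record every trip to the "device"
    return b"block %d" % blkno


cache = BlockCache(fake_disk, capacity=8, readahead=2)
cache.read(0)   # miss: fetches blocks 0, 1 and 2
cache.read(1)   # hit
cache.read(2)   # hit
print(backend_reads)
```

Write-back ordering and barriers, which were part of the original wish list,
are deliberately left out of this sketch.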

Scott



Re: disk devices speed is ugly

2012-02-14 Thread Scott Long

On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer  wrote:
>> The filesystem is the *BEST* place to do caching.  It knows what metadata
>> is most effective to cache and what other data (e.g. file contents) doesn't
>> need to be cached.
> 
> Agreed.
> 
>> Any attempt to do this in layers between the FS and
>> the disk won't achieve the same gains as a properly written filesystem. 
> 
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
> 
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks and ability to flush
> dirty blocks in a requested order with the equivalent of barriers
> (request Y will not occur until preceding request X has been
> committed to stable media).  This would allow filesystems to regain
> the benefits of block devices with minimal effort and then improve
> performance & cache efficiency with additional work.
> 

Any filesystem that uses bread/bwrite/cluster_read is already using the 
"generic caching subsystem" that you propose.  This includes UDF, CD9660, 
MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage 
filesystem in the tree except for ZFS.  Not all of them implement 
VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode 
pager, not requirements for using buffer-cache services on block devices.  As 
Kostik pointed out in a parallel email, the only thing that was removed from 
FreeBSD was the userland interface to cached devices via /dev nodes.  This has 
nothing to do with filesystems, though I suppose that could maybe sorta kinda 
be an issue for FUSE.

ZFS isn't in this list because it implements its own private buffer/cache (the 
ARC) that understands the special requirements of ZFS.  There are good and bad 
aspects to this, noted below.

> One downside of the "each FS does its own caching" in that the caches
> are all separate and need careful integration into the VM subsystem to
> prevent starvation (eg past problems with UFS starving ZFS L2ARC).
> 

I'm not sure what you mean here.  The ARC is limited by available wired memory; 
attempts to allocate such memory will evict pages from the buffer cache as 
necessary, until all available RAM is consumed.  If anything, ZFS starves the 
rest of the system, not the other way around, and that's simply because the ARC 
isn't integrated with the normal VM.  Such integration is extremely hard and 
has nothing to do with having a generic caching subsystem.

Scott



RE: Reducing the need to compile a custom kernel

2012-02-14 Thread Scott, Brian
>>  - CPU_SOEKRIS, CPU_GEODE, CPU_ELAN, NO_SWAPPING for embedded devices
>
>Embedded devices are out of the scope of this, normally you do a lot of
>other modifications to such systems anyway, so a custom kernel should not be
>a big problem.

Just as a quick data point here, I have just installed FreeBSD onto an
ALIX system and was hoping to keep everything very standard.

Turns out that I needed to rebuild the kernel to add CPU_GEODE to get a
few simple features added. Everything else is standard GENERIC because
I'm too lazy to fine tune. The geode code is very small and I would
expect completely harmless if left enabled in GENERIC. The overhead of
including it for other systems would be a few extra compares during
startup and a k or so extra size in the kernel.

I would suggest that avoiding custom kernels to make trivial changes is
exactly what you should be looking at. Make features like this removable
for the people who want to fine tune their kernels but included for
people who are happy to have a little overhead as a trade-off for ease of
management.

The only other thing that regularly has me running custom kernels is
IPFIREWALL_FORWARD. As others have said, I'd be very happy if that was
the default but removable.

Brian Scott


Re: problems with AHCI on FreeBSD 8.2

2012-02-14 Thread Scott Long

On Feb 14, 2012, at 4:34 PM, Victor Balada Diaz wrote:

> On Tue, Feb 14, 2012 at 03:09:58PM -0800, Jeremy Chadwick wrote:
>> On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
>>> On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
>>>> schrieb Jeremy Chadwick am 14.02.2012 17:50 (localtime):
>>>>> On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
>>>>>> persists on FreeBSD 9.0 release.
>>>>>> 
>>>>>> Switching from ahci to ataahci resolved the problem for me too.
>>>>>> 
>>>>>> I'm using gmirror for swap, system is on a zpool and the problem first
>>>>>> occurred during a zpool scrub, but it is easily reproducible with dd.
>>>>>> 
>>>>>> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
>>>>>> of=/dev/null is not an issue.
>>>>>> Sometimes I need to power off the server because after a reboot one disk
>>>>>> is still missing.
>>>>>> 
>>>>>> I really would like to help in this issue, so let me know if you need
>>>>>> any more information.
>>>>> I find it interesting that, at least so far, the only people reporting
>>>>> problems of this type with the ahci.ko driver are people using Samsung
>>>>> disks.  The only difference is that your models are F1s while the OPs
>>>>> are F2s.
>>>> 
>>>> I saw such timeouts long ago and mav@ had a look at my postings and he
>>>> mentioned it could be a NCQ problem.
>>>> I suspected the disks firmware.
>>>> I never tracked it down further, because after replacing the Samsung (F3
>>>> in that case) disks with hitachi ones solved all my problems and gave a
>>>> big performance kick as well (with zfs).
>>>> You can find the discussion here:
>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
>>>> 
>>> 
>>> You gave me a good idea: try to disable NCQ and see if that's the fault. So
>>> i went and applied the attached patch. After it, i can no longer reproduce
>>> the issue with ahci driver.
>>> 
>>> I know this is not a solution because it disables NCQ at controller level
>>> instead of disk level, but at least we know for sure where the problem is.
>>> 
>>> I think the solution would be to add a new quirk ADA_Q_NONCQ in
>>> sys/cam/ata/ata_da.c.
>>> Quirks infrastructure is already built, so adding a new quirk for this
>>> seems easy.
>>> 
>>> Is someone interested? Do you think there is a better solution?
>>> 
>>> If someone is interested i can build a patch to add ADA_Q_NONCQ quirk and
>>> add my drives to it.
>> 
>> I took a stab at this, but I don't feel confident this is the proper
>> solution/method.  I worry there's some sort of chicken-or-the-egg
>> condition here (quirk setup/matching comes *after* SATA capabilities
>> detection), or that it makes the code messier.  Need mav@'s
>> recommendations on this.
>> 
>> Below is for RELENG_8.  I should note I haven't tested if this works, or
>> even compiles -- normally I don't provide such patches without testing
>> so I apologise in advance / user beware.
> 
> You're amazingly fast. Thanks for all your help :)
> 
> You start applying the quirks before 
> 
>snprintf(announce_buf, sizeof(announce_buf),
>"kern.cam.ada.%d.quirks", periph->unit_number);
>quirks = softc->quirks;
>TUNABLE_INT_FETCH(announce_buf, &quirks);
> 
> So you're breaking quirk setting at boot time.
> 
> See my attached patch. I can confirm it works for me.
> 
> Regards.
> 

I don't think that disabling NCQ entirely is the right solution.  It's a tag 
starvation issue in the firmware, not a complete failure, and it can be dealt 
with in the CAM XPT scheduler fairly efficiently.  Alexander and I talked about 
this recently, and though we differ on the details, a tag hack is not in order, 
IMHO.  In the short term, try just using "camcontrol tags ada0 -N 1" to limit 
the concurrent commands to 1.

Scott




Re: SCHED_ULE should not be the default

2011-12-12 Thread Scott Lambert
On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload.  My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node.  If I
> send ncpu+1 jobs to the 2nd node with ncpu's, then 
> ncpu-1 jobs are assigned to the 1st ncpu-1 cpus.  The
> last two jobs are assigned to the ncpu'th cpu, and 
> these ping-pong on this cpu.  AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
> 
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that.  It is an
> example showing that ULE has performance issues. 
> So, I now can start only ncpu jobs on each node
> in the cluster and send emails to all other users
> to not use those nodes, or use 4BSD and not worry
> about loading issues.

Does it meet your expectations if the number of jobs j that you start on a
node satisfies (j modulo ncpu) = 0, i.e. is a whole multiple of ncpu?
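
To show the arithmetic behind the question, here is a toy Python model.  It
is emphatically not the ULE algorithm; it just assumes that each job stays on
the CPU it was first given (job i on CPU i mod ncpu) and compares that with
an idealised scheduler that spreads the load evenly.

```python
from typing import List


def sticky_shares(jobs: int, ncpu: int) -> List[float]:
    """CPU share per job when job i stays pinned to CPU (i mod ncpu)."""
    per_cpu = [0] * ncpu
    for j in range(jobs):
        per_cpu[j % ncpu] += 1
    return [1.0 / per_cpu[j % ncpu] for j in range(jobs)]


def fair_shares(jobs: int, ncpu: int) -> List[float]:
    """CPU share per job when the load is spread evenly over all CPUs."""
    return [min(1.0, ncpu / jobs)] * jobs


print(sticky_shares(5, 4))  # ncpu+1 jobs: two of them get half a CPU each
print(fair_shares(5, 4))    # idealised: every job gets 0.8 of a CPU
print(sticky_shares(8, 4))  # j modulo ncpu == 0: same as the fair split
```

With ncpu+1 jobs the pinned model leaves two jobs sharing one CPU while the
others run at full speed; when the job count is a multiple of ncpu the two
models agree, which is why the answer to the question would be informative.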

-- 
Scott Lambert                    KC5MLE                       Unix SysAdmin
lamb...@lambertfam.org



Re: scp: Write Failed: Cannot allocate memory

2011-07-25 Thread Scott Sipe

On Jul 24, 2011, at 10:07 PM, Adrian Chadd wrote:

> Has someone asked for the output of netstat -mb? That error message is
> mbuf related, so I bet it's something to do with mbuf allocation.
> 
> Is it possible that the system is incorrectly tuned when virtualbox is 
> enabled?
> 
> 
> Adrian

I never ran netstat -mb when I was encountering the network memory allocation 
error. At the moment I have adjusted net.graph.maxdata from the default value 
to 8192, rebooted, and have not encountered any more network memory allocation 
errors (and vmstat -z shows 0 NetGraph data item failures). If, at this maxdata 
setting, they show up again I'll post the output of netstat -mb. If it's worth 
anything, current output of netstat -mb is below.

# netstat -mb
1027/6668/7695 mbufs in use (current/cache/total)
1024/4034/5058/25600 mbuf clusters in use (current/cache/total/max)
1024/3456 mbuf+clusters out of packet secondary zone in use (current/cache)
0/1303/1303/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2304K/14947K/17251K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
19 requests for I/O initiated by sendfile
0 calls to protocol drain routines
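
If the errors do come back, the lines worth watching are the "denied" ones.
A small Python sketch for pulling them out of saved netstat output is below;
it assumes the line format shown above and does nothing FreeBSD-specific
beyond that.

```python
import re
from typing import Dict

DENIED_RE = re.compile(r"\s*(\d+(?:/\d+)*) requests for (.+?) denied")


def denied_totals(netstat_output: str) -> Dict[str, int]:
    """Sum the slash-separated 'denied' counters on each matching line."""
    totals: Dict[str, int] = {}
    for line in netstat_output.splitlines():
        m = DENIED_RE.match(line)
        if m:
            totals[m.group(2)] = sum(int(v) for v in m.group(1).split("/"))
    return totals


sample = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
"""
print(denied_totals(sample))
```

Any non-zero total would point at the corresponding pool rather than at
netgraph.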

Scott


Re: disable 64-bit dma for one PCI slot only?

2011-07-20 Thread Scott Long
On Jul 20, 2011, at 3:54 AM, Stefan Esser wrote:
> 
> This is a very good idea, IMHO.
> 
> When I committed pciconf back in 1996 (it had been contributed by
> gwollman) for PCI 1.0 (at a time when there was no standard for PCI to
> PCI bridges, yet ;-) ), the current format seemed sensible, but the
> tabular form suggested by Artem is much better to parse.
> 
> I'd want to suggest another slightly different format:
> 
> Driver Handle         Class    Vnd    Dev    SubVnd SubDev Rev  Hdr
> hostb0 0:0:0:0        0x060000 0x8086 0x0100 0x8086 0x2010 0x09 0x00
> pcib1  0:0:1:0        0x060400 0x8086 0x0101 0x8086 0x2010 0x09 0x01
> pcib2  0:0:1:1        0x060400 0x8086 0x0105 0x8086 0x2010 0x09 0x01
> none0  0:0:22:0       0x078000 0x8086 0x1c3a 0x8086 0x4742 0x04 0x00
> em0    0:0:25:0       0x020000 0x8086 0x1503 0x8086 0x0000 0x04 0x00
> dummy0 65535:255:31:7 0x020000 0x8086 0x1503 0x8086 0x0000 0x04 0x00
> 
> I.e., print only one header line (no "---"), make the "Handle" column
> wide enough to hold the longest possible value, use only white space to
> separate columns and print 0x as a prefix for all hex numbers.
> 
> Instead of "pci0:0:0:0" for the PCI handle, just "0:0:0:0" could be
> printed, IMHO. (But this is bikeshed material, I guess ...)
> 
> The "Rev" column is required for devices that are not uniquely
> identified by their Vnd/Dev-IDs. (These used to exist, e.g. the Symbios
> SCSI controllers, though I'm not aware of any device that needed a
> different driver depending on the PCI revision number.)
> 

Actually, a few drivers (amr in particular) look at this rev field during 
probe, though they should be looking at the subdev/ven ids instead.  I think 
that this behavior has actually caused recent headaches for LSI with other 
drivers.  But as Kostik points out, the rev field is still moderately useful 
for informational purposes.

> I'd be happy to modify pciconf to print the new format in -CURRENT
> (having been the maintainer of the PCI code for quite some time), if
> consensus is reached on a format and if this change is accepted by RE.
> 

I'm pretty sure that we scrape the current format at Yahoo and use it in our 
tools.  Implementing a switch of some sort to fall back to the old format is 
something that will have to happen at some point, whether it's done now or not. 
 I'd probably implement it as an env variable such as "PCICONF_COMPAT", similar 
to what is used by expr(1).
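
To sketch what I mean, here is a bit of Python rather than C.  The
PCICONF_COMPAT name is only my suggestion above and does not exist anywhere,
and the regular expression assumes the current "pciconf -l" line layout
(class=, card=, chip=, rev=, hdr=), with the vendor ID in the low 16 bits of
chip= and card=.

```python
import os
import re
from typing import Dict, Mapping

LEGACY_RE = re.compile(
    r"^(?P<driver>\w+)@pci(?P<handle>[\d:]+):\s+"
    r"class=0x(?P<cls>[0-9a-f]{6}) card=0x(?P<card>[0-9a-f]{8}) "
    r"chip=0x(?P<chip>[0-9a-f]{8}) rev=0x(?P<rev>[0-9a-f]{2}) "
    r"hdr=0x(?P<hdr>[0-9a-f]{2})$"
)
COLUMNS = ("driver", "handle", "class", "vendor", "device",
           "subvendor", "subdevice", "rev", "hdr")


def split_ids(line: str) -> Dict[str, str]:
    """Break one legacy line into separate vendor/device/subvendor/subdevice fields."""
    m = LEGACY_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised pciconf line: %r" % line)
    card, chip = m.group("card"), m.group("chip")
    return {
        "driver": m.group("driver"),
        "handle": m.group("handle"),
        "class": "0x" + m.group("cls"),
        "vendor": "0x" + chip[4:],      # low 16 bits of chip=
        "device": "0x" + chip[:4],      # high 16 bits of chip=
        "subvendor": "0x" + card[4:],   # low 16 bits of card=
        "subdevice": "0x" + card[:4],   # high 16 bits of card=
        "rev": "0x" + m.group("rev"),
        "hdr": "0x" + m.group("hdr"),
    }


def render(line: str, env: Mapping[str, str] = os.environ) -> str:
    """Emit the old format if the (hypothetical) PCICONF_COMPAT is set."""
    if env.get("PCICONF_COMPAT"):
        return line.strip()
    fields = split_ids(line)
    return " ".join(fields[name] for name in COLUMNS)


legacy = ("hostb0@pci0:0:0:0:\tclass=0x060000 card=0x20108086 "
          "chip=0x01008086 rev=0x09 hdr=0x00")
print(render(legacy, env={}))
print(render(legacy, env={"PCICONF_COMPAT": "1"}))
```

Tools that scrape the old output would set the variable once and keep
working; everyone else gets the whitespace-separated columns.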

Scott



Re: disable 64-bit dma for one PCI slot only?

2011-07-19 Thread Scott Long
On Jul 19, 2011, at 7:31 AM, John Baldwin wrote:
>> 
>> If we're going to change it, might as well break it down into 4 fields.  
>> Maybe
>> we retain the old format under a legacy switch and/or env variable for users
>> that have tools that parse the output (cough yahoo cough).
> 
> The only reason it might be nice to stick with two fields is due to the line
> length (though the first line is over 80 cols even in the current format).
> Here are two possible suggestions:
> 

I like A for the explicitness, but B is a bit easier to read on an 80 column 
display.  There's no 'verbose' flag for pciconf, and the '-v' flag has already 
been claimed for another use; if a verbose flag were to appear, I'd suggest 
hiding the rev and hdr fields under it and otherwise shortening the line.  
However, it's not my bikeshed to paint, and I'll be thrilled with either option 
A or B or anything in between.

> I went with vendor word first for both A) and B) as in my experience that is
> the more common ordering in driver tables, etc.
> 

Indeed.  Thanks a lot for working on this.

Scott



Re: disable 64-bit dma for one PCI slot only?

2011-07-18 Thread Scott Long

On Jul 18, 2011, at 3:14 PM, John Baldwin wrote:

> On Monday, July 18, 2011 5:06:40 pm Scott Long wrote:
>> On Jul 18, 2011, at 12:02 PM, John Baldwin wrote:
>>> On Friday, July 15, 2011 6:07:31 pm Mark McConnell wrote:
>>>> Dear folks,
>>>> 
>>>> I have two LSI raid cards, one of which (SCSI 320-I) supports 
>>>> 64-bit DMA when 4GB+ of DDR is present and another which 
>>>> does not (SATA 150-D).  Consequently I've disabled 64-bit 
>>>> addressing for amr devices.
>>>> 
>>>> I would like to disable 64-bit addressing for the SATA card, but 
>>>> permit it for the SCSI card.  Is this possible?
>>> 
>>> You'd have to hack the driver perhaps to only disable 64-bit DMA for
>>> certain PCI IDs.  It probably already does this?
>>> 
>> 
>> The driver already had a table for determining 64bit DMA based on the PCI ID.
>> I guess there's a mistake in the table for this particular card.  I think
>> that changing the following line to remove the AMR_ID_DO_SG64 flag will fix
>> the problem:
>> 
>>     {0x1000, 0x1960, AMR_ID_QUARTZ | AMR_ID_DO_SG64 | AMR_ID_PROBE_SIG},
>> 
>> Actually, what's probably going on is that the driver is only looking at the
>> vendor and device id's, and is ignoring the subvendor and subdevice id's that
>> would give it a better clue on the exact hardware in use.  Fixing the driver
>> to look at all 64bits of id info (and take into account wildcards where
>> needed) would be a good project, if anyone is interested.
>> 
>> Btw, I *HATE* the "chip" and "card" identifiers used in pciconf.  Can we
>> change it to emit the standard (sub)vendor/(sub)device terminology?
> 
> Oh, yeah.  I hate that too.  Would you want them as 4 separate entities or to
> just rename the labels to 'devid' and 'subdevid'?
> 

If we're going to change it, might as well break it down into 4 fields.  Maybe 
we retain the old format under a legacy switch and/or env variable for users 
that have tools that parse the output (cough yahoo cough).
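
For reference, the "chip" and "card" values that pciconf -l prints each pack two 16-bit IDs into one 32-bit word, with the vendor (or subvendor) in the low half and the device (or subdevice) in the high half. A rough sketch of the 4-field breakdown follows; the field labels are placeholders, since no names had been agreed on in this thread, and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Two 16-bit IDs unpacked from one 32-bit chip= or card= word. */
struct id_pair {
    uint16_t vendor;    /* low 16 bits */
    uint16_t device;    /* high 16 bits */
};

static struct id_pair
split_ids(uint32_t word)
{
    struct id_pair p;

    p.vendor = (uint16_t)(word & 0xffffu);
    p.device = (uint16_t)(word >> 16);
    return (p);
}

/*
 * Format four separate fields into buf.  The labels are hypothetical.
 * Returns whatever snprintf() returns.
 */
static int
format_four_fields(char *buf, size_t len, uint32_t chip, uint32_t card)
{
    struct id_pair c, s;

    assert(buf != NULL && len > 0);
    c = split_ids(chip);
    s = split_ids(card);
    return (snprintf(buf, len,
        "vendor=0x%04x device=0x%04x subvendor=0x%04x subdevice=0x%04x",
        (unsigned)c.vendor, (unsigned)c.device,
        (unsigned)s.vendor, (unsigned)s.device));
}
```

With chip=0x19601000 (the vendor 0x1000, device 0x1960 pair from the amr table line quoted above), split_ids() yields vendor 0x1000 and device 0x1960.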

Scott




Re: disable 64-bit dma for one PCI slot only?

2011-07-18 Thread Scott Long
On Jul 18, 2011, at 12:02 PM, John Baldwin wrote:
> On Friday, July 15, 2011 6:07:31 pm Mark McConnell wrote:
>> Dear folks,
>> 
>> I have two LSI raid cards, one of which (SCSI 320-I) supports 
>> 64-bit DMA when 4GB+ of DDR is present and another which 
>> does not (SATA 150-D).  Consequently I've disabled 64-bit 
>> addressing for amr devices.
>> 
>> I would like to disable 64-bit addressing for the SATA card, but 
>> permit it for the SCSI card.  Is this possible?
> 
> You'd have to hack the driver perhaps to only disable 64-bit DMA for certain 
> PCI IDs.  It probably already does this?
> 

The driver already had a table for determining 64bit DMA based on the PCI ID.  
I guess there's a mistake in the table for this particular card.  I think that 
changing the following line to remove the AMR_ID_DO_SG64 flag will fix the 
problem:

{0x1000, 0x1960, AMR_ID_QUARTZ | AMR_ID_DO_SG64 | AMR_ID_PROBE_SIG},

Actually, what's probably going on is that the driver is only looking at the 
vendor and device id's, and is ignoring the subvendor and subdevice id's that 
would give it a better clue on the exact hardware in use.  Fixing the driver to 
look at all 64bits of id info (and take into account wildcards where needed) 
would be a good project, if anyone is interested.
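
As a rough illustration of that project, here is a sketch of a lookup that compares all four 16-bit IDs and honours wildcards. It is not the amr driver's code: the struct, the ID_ANY wildcard value, the FLAG_SG64 stand-in for AMR_ID_DO_SG64, and the 0x1234/0x5678 subsystem IDs are all made up for the example. Only the 0x1000/0x1960 vendor/device pair comes from the table line above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard: matches any subvendor or subdevice ID. */
#define ID_ANY    0xffffu
/* Hypothetical flag standing in for the driver's AMR_ID_DO_SG64. */
#define FLAG_SG64 0x1u

/* One table row: four 16-bit IDs (64 bits of ID info) plus flags. */
struct id_entry {
    uint16_t vendor;
    uint16_t device;
    uint16_t subvendor;   /* ID_ANY acts as a wildcard */
    uint16_t subdevice;   /* ID_ANY acts as a wildcard */
    unsigned flags;
};

/*
 * Return the first row matching all four IDs, or NULL if none does.
 * Specific rows must come before wildcard rows to take precedence.
 */
static const struct id_entry *
id_lookup(const struct id_entry *tbl, size_t n, uint16_t vendor,
    uint16_t device, uint16_t subvendor, uint16_t subdevice)
{
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct id_entry *e = &tbl[i];

        if (e->vendor != vendor || e->device != device)
            continue;
        if (e->subvendor != ID_ANY && e->subvendor != subvendor)
            continue;
        if (e->subdevice != ID_ANY && e->subdevice != subdevice)
            continue;
        return (e);
    }
    return (NULL);
}

/* Example table; the subsystem IDs in the first row are invented. */
static const struct id_entry example_table[] = {
    { 0x1000, 0x1960, 0x1234, 0x5678, 0 },          /* no 64-bit DMA */
    { 0x1000, 0x1960, ID_ANY, ID_ANY, FLAG_SG64 },  /* everything else */
};
```

A card reporting the invented subsystem IDs would hit the first row and get no 64-bit flag, while every other 0x1000/0x1960 card falls through to the wildcard row.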

Btw, I *HATE* the "chip" and "card" identifiers used in pciconf.  Can we change 
it to emit the standard (sub)vendor/(sub)device terminology?

Scott




Re: scp: Write Failed: Cannot allocate memory

2011-07-06 Thread Scott Sipe
On Wed, Jul 6, 2011 at 4:21 AM, Peter Ross wrote:

> Quoting "Peter Ross" :
>
>  Quoting "Peter Ross" :
>>
>>  Quoting "Jeremy Chadwick" :
>>>
>>>  On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>>>>
>>>>> Quoting "Jeremy Chadwick" :
>>>>>
>>>>>  On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>>>>
>>>>>>> Quoting "Jeremy Chadwick" :
>>>>>>>
>>>>>>>  On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>>>>
>>>>>>>>> Quoting "Jeremy Chadwick" :
>>>>>>>>>
>>>>>>>>>  On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with
>>>>>>>>>>> it.
>>>>>>>>>>>
>>>>>>>>>>> sysctl vfs.zfs.arc_max: 62
>>>>>>>>>>>
>>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>>>>>
>>>>>>>>>>> Scott
>>>>>>>>>>>
>>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>>>>
>>>>>>>>>>>  Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>>>>> problem.
>>>>>>>>>>>>
>>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>>>>> some statistics.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> Quoting "Peter Ross" :
>>>>>>>>>>>>
>>>>>>>>>>>>  Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>>>>> similar to one reported last year:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> My server is a Dell T410 server with the same bge card (the
>>>>>>>>>>>>> same pciconf -lvc output as described by Mahlon:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yours, Scott, is a em(4)..
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>>>>>
>>>>&

Re: scp: Write Failed: Cannot allocate memory

2011-07-05 Thread Scott Sipe

On Jul 6, 2011, at 12:15 AM, Jeremy Chadwick wrote:

> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>> Quoting "Jeremy Chadwick" :
>> 
>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>> Quoting "Jeremy Chadwick" :
>>>> 
>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>> Quoting "Jeremy Chadwick" :
>>>>>> 
>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with it.
>>>>>>>> 
>>>>>>>> sysctl vfs.zfs.arc_max: 62
>>>>>>>> 
>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>> 
>>>>>>>> Scott
>>>>>>>> 
>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>> problem.
>>>>>>>>> 
>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>> some statistics.
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> Peter
>>>>>>>>> 
>>>>>>>>> Quoting "Peter Ross" :
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>> similar to one reported last year:
>>>>>>>>>> 
>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>> 
>>>>>>>>>> My server is a Dell T410 server with the same bge card (the
>>>>>>>>>> same pciconf -lvc output as described by Mahlon:
>>>>>>>>>> 
>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>> 
>>>>>>>>>> Yours, Scott, is a em(4)..
>>>>>>>>>> 
>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>> 
>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>>> when the value was still below.
>>>>>>>>>> 
>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>>> does not help.
>>>>>>>>>> 
>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual
>>>>>>>>>> machines). Even if nothing happens for hours the buffer
>>>>>>>>>> isn't released..
>>>>>>>>>> 
>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>>> 
>>>>>>>>>> I am happy to give information gathered on old/new kernel if it 
>>>>>>>>>> helps.
>>>>>>>>>> 
>>>>>>>>>> Regards
>>>>>>>>>> Peter
>>>>>>>>

Re: scp: Write Failed: Cannot allocate memory

2011-07-05 Thread Scott Sipe
I'm running virtualbox 3.2.12_1 if that has anything to do with it.

sysctl vfs.zfs.arc_max: 62

While I'm trying to scp, kstat.zfs.misc.arcstats.size is hovering right around 
that value, sometimes above, sometimes below (that's as it should be, right?). 
I don't think that it dies when crossing over arc_max. I can run the same scp 
10 times and it might fail 1-3 times, with no correlation to the arcstats.size 
being above/below arc_max that I can see.

Scott

On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:

> Hi all,
> 
> just as an addition: an upgrade to last Friday's FreeBSD-Stable and to 
> VirtualBox 4.0.8 does not fix the problem.
> 
> I will experiment a bit more tomorrow after hours and grab some statistics.
> 
> Regards
> Peter
> 
> Quoting "Peter Ross" :
> 
>> Hi all,
>> 
>> I noticed a similar problem last week. It is also very similar to one 
>> reported last year:
>> 
>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>> 
>> My server is a Dell T410 server with the same bge card (the same pciconf 
>> -lvc output as described by Mahlon:
>> 
>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>> 
>> Yours, Scott, is a em(4)..
>> 
>> Another similarity: In all cases we are using VirtualBox. I just want to 
>> mention it, in case it matters. I am still running VirtualBox 3.2.
>> 
>> Most of the time kstat.zfs.misc.arcstats.size was reaching vfs.zfs.arc_max 
>> then, but I could catch one or two cases when the value was still below.
>> 
>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it does not help.
>> 
>> BTW: It looks as if ARC only gives back the memory when I destroy the ZFS (a 
>> cloned snapshot containing virtual machines). Even if nothing happens for 
>> hours the buffer isn't released..
>> 
>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>> 
>> I am happy to give information gathered on old/new kernel if it helps.
>> 
>> Regards
>> Peter
>> 
>> Quoting "Scott Sipe" :
>> 
>>> 
>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>> 
>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>> I'm running 8.2-RELEASE and am having new problems with scp. When scping
>>>>>> files to a ZFS directory on the FreeBSD server -- most notably large
>>>>>> files -- the transfer frequently dies after just a few seconds. In my
>>>>>> last test, I tried to scp an 800mb file to the FreeBSD system and the
>>>>>> transfer died after 200mb. It completely copied the next 4 times I tried,
>>>>>> and then died again on the next attempt.
>>>>>> 
>>>>>> On the client side:
>>>>>> 
>>>>>> "Connection to home closed by remote host.
>>>>>> lost connection"
>>>>>> 
>>>>>> In /var/log/auth.log:
>>>>>> 
>>>>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate
>>>>>> memory
>>>>>> 
>>>>>> I've never seen this before and have used scp before to transfer large
>>>>>> files without problems. This computer has been used in production for
>>>>>> months and has a current uptime of 36 days. I have not been able to
>>>>>> notice any problems copying files to the server via samba or netatalk,
>>>>>> or any problems in apache.
>>>>>> 
>>>>>> Uname:
>>>>>> 
>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST
>>>>>> 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>> 
>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>> 
>>>>>> I have not restarted the sshd daemon or rebooted the computer.
>>>>>> 
>>>>>> Am glad to provide any other information or test anything else.
>>>>>> 
>>>>>> {snip vmstat -z and dmesg}
>>>>> 
>>>>> You didn't provide details about your networking setup (rc.conf,
>>&

Re: scp: Write Failed: Cannot allocate memory

2011-07-02 Thread Scott Sipe

On Jul 2, 2011, at 12:54 AM, jhell wrote:

> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>> I'm running 8.2-RELEASE and am having new problems with scp. When scping
>>> files to a ZFS directory on the FreeBSD server -- most notably large files
>>> -- the transfer frequently dies after just a few seconds. In my last test, I
>>> tried to scp an 800mb file to the FreeBSD system and the transfer died after
>>> 200mb. It completely copied the next 4 times I tried, and then died again on
>>> the next attempt.
>>> 
>>> On the client side:
>>> 
>>> "Connection to home closed by remote host.
>>> lost connection"
>>> 
>>> In /var/log/auth.log:
>>> 
>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate
>>> memory
>>> 
>>> I've never seen this before and have used scp before to transfer large files
>>> without problems. This computer has been used in production for months and
>>> has a current uptime of 36 days. I have not been able to notice any problems
>>> copying files to the server via samba or netatalk, or any problems in
>>> apache.
>>> 
>>> Uname:
>>> 
>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST
>>> 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>> 
>>> I've attached my dmesg and output of vmstat -z.
>>> 
>>> I have not restarted the sshd daemon or rebooted the computer.
>>> 
>>> Am glad to provide any other information or test anything else.
>>> 
>>> {snip vmstat -z and dmesg}
>> 
>> You didn't provide details about your networking setup (rc.conf,
>> ifconfig -a, etc.).  netstat -m would be useful too.
>> 
>> Next, please see this thread circa September 2010, titled "Network
>> memory allocation failures":
>> 
>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>> 
>> The user in that thread is using rsync, which relies on scp by default.
>> I believe this problem is similar, if not identical, to yours.
>> 
> 
> Please also provide your output of ( /usr/bin/limits -a ) for the server
> end and the client.
> 
> I am not quite sure I agree with the need for ifconfig -a but some
> information about the networking driver you're using for the interface
> would be helpful, uptime of the boxes. And configuration of the pool.
> e.g. ( zpool status -a ;zfs get all  ) You should probably
> prop this information up somewhere so you can reference by URL whenever
> needed.
> 
> rsync(1) does not rely on scp(1) whatsoever but rsync(1) can be made to
> use ssh(1) instead of rsh(1) and I believe that is what Jeremy is
> stating here but correct me if I am wrong. It does use ssh(1) by
> default.
> 
> It's a possibility as well that if using tmpfs(5) or mdmfs(8) for /tmp
> type filesystems that rsync(1) may be just filling up your temp ram area
> and causing the connection abort which would be expected. ( df -h ) would
> help here.

Hello,

I'm not using tmpfs/mdmfs at all. The clients yesterday were 3 different OSX 
computers (over gigabit). The FreeBSD server has 12gb of ram and no bce 
adapter. For what it's worth, the server is backed up remotely every night with 
rsync (remote FreeBSD uses rsync to pull) to an offsite (slow cable connection) 
FreeBSD computer, and I have not seen any errors in the nightly rsync.

Sorry for the omission of networking info, here's the output of the requested 
commands and some that popped up in the other thread:

http://www.cap-press.com/misc/

In rc.conf:  ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"

Scott



scp: Write Failed: Cannot allocate memory

2011-07-01 Thread Scott Sipe
Hello,

I'm running 8.2-RELEASE and am having new problems with scp. When scping
files to a ZFS directory on the FreeBSD server -- most notably large files
-- the transfer frequently dies after just a few seconds. In my last test, I
tried to scp an 800mb file to the FreeBSD system and the transfer died after
200mb. It completely copied the next 4 times I tried, and then died again on
the next attempt.

On the client side:

"Connection to home closed by remote host.
lost connection"

In /var/log/auth.log:

Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate
memory

I've never seen this before and have used scp before to transfer large files
without problems. This computer has been used in production for months and
has a current uptime of 36 days. I have not been able to notice any problems
copying files to the server via samba or netatalk, or any problems in
apache.

Uname:

FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST
2011 root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64

I've attached my dmesg and output of vmstat -z.

I have not restarted the sshd daemon or rebooted the computer.

Am glad to provide any other information or test anything else.

Thanks,
Scott
# vmstat -z
ITEM               SIZE    LIMIT     USED     FREE     REQUESTS  FAILURES

UMA Kegs:           208,       0,     205,      16,         205,        0
UMA Zones:          704,       0,     205,       0,         205,        0
UMA Slabs:          568,       0,   69503,   32711,    72942509,        0
UMA RCntSlabs:      568,       0,    4667,     583,        5664,        0
UMA Hash:           256,       0,      79,      11,          81,        0
16 Bucket:          152,       0,      85,     165,         227,        0
32 Bucket:          280,       0,     238,     196,         442,        1
64 Bucket:          536,       0,     359,     222,         721,      126
128 Bucket:        1048,       0,    5867,       1,       45296,   196701
VM OBJECT:          216,       0,   26698,   27140,    40712912,        0
MAP:                232,       0,       7,      25,           7,        0
KMAP ENTRY:         120,  412734,   15244,    3201,   200399669,        0
MAP ENTRY:          120,       0,   10795,    7557,    64096457,        0
DP fakepg:          120,       0,       0,       0,           0,        0
SG fakepg:          120,       0,       0,       0,           0,        0
mt_zone:           2056,       0,     278,       1,         278,        0
16:                  16,       0,   88568,  153688,  1913750912,        0
32:                  32,       0,    3825,   12032,   260796507,        0
64:                  64,       0,   70106,  195950, 21507520870,        0
128:                128,       0,  428518,  337836,  3162416717,        0
256:                256,       0,   47071,  197099,   517289563,        0
512:                512,       0,  593921,  656853,   563338488,        0
1024:              1024,       0,    7190,   18422,    25604261,        0
2048:              2048,       0,    5271,    5559,     7565792,        0
4096:              4096,       0,    3344,    2030,    57255230,        0
Files:               80,       0,    1603,    2312,    52443448,        0
TURNSTILE:          136,       0,    2515,     265,        2545,        0
umtx pi:             96,       0,       0,       0,           0,        0
MAC labels:          40,       0,       0,       0,           0,        0
PROC:              1120,       0,     112,    1544,     1117378,        0
THREAD:            1112,       0,    1932,     582,        3101,        0
SLEEPQUEUE:          80,       0,    2515,     298,        2545,        0
VMSPACE:            392,       0,      86,     784,     1118412,        0
cpuset:              72,       0,       2,      98,           2,        0
audit_record:       952,       0,       0,       0,           0,        0
mbuf_packet:        256,       0,    1024,    3584,   218220869,        0
mbuf:               256,       0,       1,    3461,   562735494,        0
mbuf_cluster:      2048,   25600,    4608,    1008,        5632,        0
mbuf_jumbo_page:   4096,   12800,       0,    1859,    50017085,        0
mbuf_jumbo_9k:     9216,    6400,       0,       0,           0,        0
mbuf_jumbo_16k:   16384,    3200,       0,       0,           0,        0
mbuf_ext_refcnt:      4,       0,       0,    3024,        6978,        0
g_bio:              232,       0,       0,    2480,   492271394,        0
ttyinq:             160,       0,     135,     801,        2205,        0
ttyoutq:            256,       0,      72,     483,        1176,        0
ata_request:        320,
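
In a listing like this the rows worth a second look are the ones with a non-zero FAILURES column (here the 32, 64 and 128 Bucket zones). A small sketch for pulling that column out of a row, assuming the "ITEM: SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES" layout shown in the header; the function name is made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "vmstat -z" row and store its FAILURES count in *failures.
 * Returns 0 on success, or -1 if the row lacks a colon or does not
 * carry six comma-separated numeric fields (header, truncated row).
 */
static int
zone_failures(const char *row, unsigned long long *failures)
{
    const char *p;
    char *end;
    unsigned long long v = 0;
    int field;

    assert(row != NULL && failures != NULL);
    p = strchr(row, ':');          /* zone name ends at the first colon */
    if (p == NULL)
        return (-1);
    p++;
    for (field = 0; field < 6; field++) {
        v = strtoull(p, &end, 10); /* skips leading blanks itself */
        if (end == p)
            return (-1);           /* no digits where a number belongs */
        p = end;
        if (field < 5) {
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p != ',')
                return (-1);
            p++;
        }
    }
    *failures = v;                 /* sixth field is FAILURES */
    return (0);
}
```

Fed the "128 Bucket" row above, it reports 196701 failures; the truncated "ata_request" row is rejected rather than guessed at.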

Re: UFS SU+J

2011-06-30 Thread Scott Long

On Jun 30, 2011, at 4:28 AM, Ivan Voras wrote:

> On 29/06/2011 23:03, Mark Saad wrote:
> 
>> The svn sources are here http://svn.freebsd.org/base/projects/suj/8/ .
>> 
>> Why would suj not make it into 8-STABLE ?
> 
> It is a too large patch, and it changes a lot of important, known and
> working code (like softupdates). In other words, it's too risky.
> 

I'm preparing to put it into large-scale deployment at Yahoo on FreeBSD 7.  It 
wasn't ready for production until the latest round of fixes from Jeff and Kirk.

Scott




Re: RELENG_8 does not build with CPUTYPE=core2

2011-05-14 Thread Scott Allendorf
I had discovered this method of working around the fatal error, but the 
question remains:  Should buildworld ever use the base compiler once the 
bootstrap compiler is built?  Much of buildworld had already completed 
with "-march=core2" being fed to the bootstrap compiler by the time this 
error occurs.


-Scott

Oliver Pinter wrote:

In two steps you can eliminate the error message:
1) compile and install world and kernel with CPUTYPE commented out in make.conf
2) uncomment the CPUTYPE line, and recompile world and kernel

The problem is that the build system is based on the newer gcc (4.2.2): when
you set CPUTYPE=core2, it substitutes -march=core2 into the gcc parameter
list, but the older base system cc (4.2.1) does not know this option
value.

On 5/14/11, Dominic Fandrey  wrote:

On 14/05/2011 02:38, Scott Allendorf wrote:

Dominic Fandrey wrote:

env CCACHE_PREFIX=/usr/local/bin/distcc /usr/local/bin/ccache cc -O2
-pipe -march=core2 -DHAVE_CONFIG_H
-I/usr/src/kerberos5/tools/make-roken/../../include -std=gnu99   -c
make-roken.c
/usr/src/kerberos5/tools/make-print-version/../../../crypto/heimdal/lib/vers/make-print-version.c:1:
error: bad value (core2) for -march= switch
/usr/src/kerberos5/tools/make-print-version/../../../crypto/heimdal/lib/vers/make-print-version.c:1:
error: bad value (core2) for -mtune= switch
make-roken.c:1: error: bad value (core2) for -march= switch
make-roken.c:1: error: bad value (core2) for -mtune= switch
distcc[44991] ERROR: compile make-roken.c on localhost failed
*** Error code 1
...

I saw this too when updating systems across the compiler update.  As
near as I can tell, some part of the build is not using the new
"core2"-aware compiler built as part of the toolchain and is using the
older, installed version instead.

Commenting out the CPUTYPE definition allowed my buildworlds to complete
successfully. ...

Thanks for the workaround!

It still worries me, that there's a bug in the bootstrapping. You
never know what kind of trouble comes from that kind of thing.

I hope this is going to be fixed.

Regards


--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?



--
Scott C. Allendorf Email:  scott-allend...@uiowa.edu
Senior Systems Administrator   Office:   216A Van Allen Hall
Department of Physics and AstronomyVoice: (319) 335-0003
The University of Iowa FAX:   (319) 335-1753
Iowa City, Iowa  52242-1479ICBM:  41 39 43.6 N  91 31 55.1 W




Re: RELENG_8 does not build with CPUTYPE=core2

2011-05-13 Thread Scott Allendorf

Dominic Fandrey wrote:

env CCACHE_PREFIX=/usr/local/bin/distcc /usr/local/bin/ccache cc -O2 -pipe 
-march=core2 -DHAVE_CONFIG_H 
-I/usr/src/kerberos5/tools/make-roken/../../include -std=gnu99   -c make-roken.c
/usr/src/kerberos5/tools/make-print-version/../../../crypto/heimdal/lib/vers/make-print-version.c:1:
 error: bad value (core2) for -march= switch
/usr/src/kerberos5/tools/make-print-version/../../../crypto/heimdal/lib/vers/make-print-version.c:1:
 error: bad value (core2) for -mtune= switch
make-roken.c:1: error: bad value (core2) for -march= switch
make-roken.c:1: error: bad value (core2) for -mtune= switch
distcc[44991] ERROR: compile make-roken.c on localhost failed
*** Error code 1
1 error
*** Error code 2
distcc[44988] ERROR: compile 
/usr/src/kerberos5/tools/make-print-version/../../../crypto/heimdal/lib/vers/make-print-version.c
 on localhost failed
*** Error code 1
1 error
*** Error code 2
2 errors
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2

# make -VCPUTYPE
nocona
# make -VCFLAGS
-O2 -pipe -march=nocona

CPUTYPE is set with CPUTYPE?=core2.



I saw this too when updating systems across the compiler update.  As 
near as I can tell, some part of the build is not using the new 
"core2"-aware compiler built as part of the toolchain and is using the 
older, installed version instead.


Commenting out the CPUTYPE definition allowed my buildworlds to complete 
successfully.  After the resulting system was installed I was then able 
to restore CPUTYPE to "core2" and 'make buildworld' would succeed 
(presumably because the base system compiler now understands 
"-march=core2").


Hope this helps...

-Scott

--
Scott C. Allendorf Email:  scott-allend...@uiowa.edu
Senior Systems Administrator   Office:   216A Van Allen Hall
Department of Physics and AstronomyVoice: (319) 335-0003
The University of Iowa FAX:   (319) 335-1753
Iowa City, Iowa  52242-1479ICBM:  41 39 43.6 N  91 31 55.1 W




Re: ZFS vs OSX Time Machine

2011-04-28 Thread Scott Sipe
On Apr 28, 2011, at 3:56 PM, Jeremy Chadwick wrote:

> On Thu, Apr 28, 2011 at 11:33:22PM +0930, Daniel O'Connor wrote:
>> Does anyone else use ZFS to store TM backups?
>> 
>> I find that whenever my laptop (over wifi!) starts a TM the ZFS machine it's 
>> backing up to grinds to a halt.. Other systems streaming stuff over NFS from 
>> it also tend to stall..
>> 
>> I presume that TM is doing something which causes ZFS some issues but I'm 
>> not sure how to find out what the real problem is let alone how to fix it..
>> 
>> I am running FreeBSD midget.dons.net.au 8.2-PRERELEASE FreeBSD 
>> 8.2-PRERELEASE #8 r217094M: Sat Jan  8 11:15:07 CST 2011 
>> dar...@midget.dons.net.au:/usr/obj/usr/src/sys/MIDGET  amd64
>> 
>> It is a 5 disk RAIDZ1 with 1.29Tb free using WD10EADS drives.
>> 
>> I don't see any SMART errors or ZFS warnings.
>> 
>> I have the following ZFS related tunables
>> 
>> vfs.zfs.arc_max="3072M"
>> vfs.zfs.prefetch_disable="1" 
>> vfs.zfs.txg.timeout=5
>> vfs.zfs.cache_flush_disable=1
> 
> Are the last two actually *working* in /boot/loader.conf?  Can you
> verify by looking at them via sysctl?  AFAIK they shouldn't work, since
> they lack double-quotes around the values.  Parsing errors are supposed
> to throw you back to the loader prompt.  See loader.conf(5) for the
> syntax.

In my /boot/loader.conf I have:

vfs.zfs.arc_max=62
vfs.zfs.vdev.min_pending=4
vfs.zfs.vdev.max_pending=8
vfs.zfs.txg.timeout=5
vfs.zfs.prefetch_disable=1

And they all are properly reflected in sysctl values (no parse errors seen).

> Next, you COULD try using Samba/CIFS on the FreeBSD box to see if you
> can narrow the issue down to bad NFS performance.  Please see this post
> of mine about tuning Samba on FreeBSD (backed by ZFS) to get extremely
> good performance.  Many people responded and said their performance
> drastically improved (you can see the thread yourself).  The trick is
> AIO.  You can ignore the part about setting vm.kmem_size in loader.conf;
> that advice is now old/deprecated (does not pertain to you given the
> date of your kernel), and vfs.zfs.txg.write_limit_override is something
> you shouldn't mess with unless absolutely needed to leave it default:
> 
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061642.html

Just wanted to note that we were having a terrible time with ZFS performance. 
Copying just a single large file over the network would bring interactive 
system usage to an absolute crawl (system is 2x xeons, 12gb ram). Thanks to 
your optimizations we have great ZFS + Samba performance. We also use Netatalk 
for afpd file sharing (though I have not tried Netatalk as a Time Machine 
target) and our performance is quite good there as well.

Scott


Re: problems booting 8.x kernel

2011-03-15 Thread Mike Scott

On 15/03/11 10:17, Ian Smith wrote:

On Tue, 15 Mar 2011, Mike Scott wrote:

  >  I think I'll have to give up because I'm out of ideas, settle for a linux
  >  kernel on this particular target h/w,  and just hope this issue doesn't
  >  affect another machine I have which is due for imminent upgrade from 6.2 to
  >  8.x.

Does it boot ok on 7.4?  Might that be suitable for this box's purpose?


Don't know about booting. I'm looking at multiple installations, which 
confuses me as much as anyone :-)


The small server currently running 6.2 I need^W want to get up to 8.x 
for the userland camera support.


The desktop I've mostly been referring to previously is a testbed before 
I consider buying new hardware - I want to get a system installed, write 
some software and get it all running before committing to significant 
(at least for me!) expense. The problem being that I see some advantage 
in trying debian kfreebsd (I need alsaplayer, and assume this would be 
available) - which uses an 8.1 kernel and is where all this started 
because it wouldn't boot! If pushed, I'd probably drop back to straight 
debian/linux, but I'd rather stick with something I know a little about; 
besides I happen to like freebsd :-)


Does that make some sort of sense??



--
Mike Scott
Harlow, Essex, England


Re: problems booting 8.x kernel

2011-03-15 Thread Mike Scott

On 15/03/11 09:54, Mike Scott wrote:

Sorry, a line or two 'got away from' that last message, so part won't 
make sense. Should be



I've found the occasional similar comment on the web, such as
http://forums.freebsd.org/showthread.php?p=117767
That refers to a hang after 'flowtable cleaner', while I've also seen 
comments about hanging on amd64 smp hardware after 'ata pseudoraid 
loaded' (which is a message I sometimes see just before the 'flowtable' 
message); maybe related, maybe not. But no answers :-{


--
Mike Scott
Harlow, Essex, England


Re: problems booting 8.x kernel

2011-03-15 Thread Mike Scott

On 14/03/11 18:48, Olivier Cochard-Labbé wrote:

On Mon, Mar 14, 2011 at 11:06 AM, Mike Scott  wrote:


Basically, I'm finding that the 8.1 and 8.2 kernels hang on certain machines
during bootup, specifically during device discovery and module load. I've
tried 8.2 off the current release CD, and also 8.1 off the Debian kfreebsd
6.0.0 distribution - both have the same issue.



I've got the same problem with one motherboard (Asus K8N7-E deluxe):
Since FreeBSD 8.0, there has been an ACPI bug (PR 142263) in the FreeBSD
kernel that detects the wrong addresses for all devices.

As an example, here is an extract from the dmesg on FreeBSD 7.2:
nfe0:  port
0xb000-0xb007 mem 0xd300-0xd3000fff irq 21 at device 10.0 on pci0
nfe0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd300

But since FreeBSD 8.0, the dmesg reports this (note the difference in reserved memory):
nfe0:  irq 21 at device
10.0 on pci0
nfe0: Lazy allocation of 0x100 bytes rid 0x10 type 3 at 0x8100
nfe0: Reserved 0x100 bytes for rid 0x10 type 3 at 0x8100!

Try booting with ACPI disabled from the FreeBSD boot screen… This solves
the problem on my motherboard.
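
The comparison of the two dmesg extracts above can be automated. The following is a minimal Python sketch, not part of the original exchange: it assumes the "Reserved ... bytes for rid ... type ... at ..." line format seen in the two quoted extracts, and the function names (reservations, changed_reservations) are hypothetical. The sample values are copied exactly as they appear in this archive, where some addresses may be truncated.

```python
import re

# Pattern inferred only from the two "Reserved" lines quoted in this thread;
# other kernel versions may format these messages differently.
RESERVED = re.compile(
    r"^(?P<dev>\w+): Reserved (?P<size>0x[0-9a-fA-F]+) bytes "
    r"for rid (?P<rid>0x[0-9a-fA-F]+) type (?P<type>\d+) "
    r"at (?P<addr>0x[0-9a-fA-F]+)"
)


def reservations(dmesg_text):
    """Map (device, rid) -> (size, address) for every 'Reserved' line."""
    found = {}
    for line in dmesg_text.splitlines():
        m = RESERVED.match(line.strip())
        if m:
            key = (m.group("dev"), int(m.group("rid"), 16))
            found[key] = (int(m.group("size"), 16), int(m.group("addr"), 16))
    return found


def changed_reservations(old_text, new_text):
    """Return entries present in both logs whose size or address differs."""
    old, new = reservations(old_text), reservations(new_text)
    return {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }


# Sample lines taken verbatim from the extracts quoted above.
OLD = "nfe0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xd300"
NEW = (
    "nfe0: Lazy allocation of 0x100 bytes rid 0x10 type 3 at 0x8100\n"
    "nfe0: Reserved 0x100 bytes for rid 0x10 type 3 at 0x8100!"
)

if __name__ == "__main__":
    for (dev, rid), (before, after) in changed_reservations(OLD, NEW).items():
        print(f"{dev} rid {rid:#x}: size {before[0]:#x} -> {after[0]:#x}, "
              f"addr {before[1]:#x} -> {after[1]:#x}")
```

Feeding it complete verbose dmesg captures from a 7.x boot and an 8.x boot of the same machine would list every device whose reserved range changed, which may help decide whether a given board shows the same symptom.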


Thanks for the note.

I've tried disabling everything - floppy, usb and acpi - in the BIOS. 
Still no joy; it just hangs. I don't get the 'lazy allocation' message 
you do; nfe0 looks reasonable here.


I'm using boot -v. Depending on whether I use -p (single step) as well, 
the hang point is either just after "flowtable cleaner started" or at 
the message about a firewire bus reset I quoted earlier.



I've found the occasional similar comment on the web, such as
http://forums.freebsd.org/showthread.php?p=117767
hanging on amd64 smp hardware after 'ata pseudoraid loaded' (which is a 
message I sometimes see just before the 'flowtable' message); maybe 
related, maybe not. But no answers :-{



I think I'll have to give up because I'm out of ideas, settle for a 
linux kernel on this particular target h/w,  and just hope this issue 
doesn't affect another machine I have which is due for imminent upgrade 
from 6.2 to 8.x.




--
Mike Scott
Harlow, Essex, England


problems booting 8.x kernel

2011-03-14 Thread Mike Scott
(Apologies if this isn't the appropriate forum - I've tried on 
cubf.misc, where it was suggested I come here. More apologies as I'm not 
overly familiar with the bootup sequence, and have probably got some 
terms wrong.)


Basically, I'm finding that the 8.1 and 8.2 kernels hang on certain 
machines during bootup, specifically during device discovery and module 
load. I've tried 8.2 off the current release CD, and also 8.1 off the 
Debian kfreebsd 6.0.0 distribution - both have the same issue.


Basically, the console gets as far as the messages
...
acd0: DVDR .. at ata1-master UDMA66
uhub0: 3 ports with 6 removable, self powered
uhub1: 3 ports with 6 removable, self powered
uhub2: 6 ports with 6 removable, self powered

and there it hangs.

If I run the boot in 'single step' mode (set boot_pause, set 
boot_verbose), I find that one machine hangs consistently. The screen 
shows at the bottom the text


fwip0: Firewire address: .. maxrec 2048
fwohci0: Initiate bus reset
fwohci0: fwohci_intr_core: BUS reset
fwohci0: fwohci_intr_core: node_id=0x, SelfID Count=1
 CYCLEMASTER mode
pcib2:  at device 30.0 on pci0
fwohci0: fwohci_intr_core: BUS reset

and that's it. Dead as a dodo. (Can't check the other machine - it's the 
one I'm typing this on :-)



OTOH if I disable usb and floppy in the bios, and do a 'boot -v', the 
messages stop after

ata5: Identifying devices
ata5: New devices 
ATA pseudoRAID loaded
flowtable cleaner started


While a 'boot -v -p' reaches instead lines about atkbdc0, then atkbd0 
and finally

atkbd: the current kbd controller command byte 0047

before hanging.


It suggests to me the final line shown is not particularly related to 
the lockup, but that something else is happening behind the scenes.



Out of 4 machines I've tried this on, it hangs on two and boots OK on 
two. The OK ones are an acer laptop and a machine with IIRC an SiS 
chipset. Both machines that it fails on are (different) nf7s mobos, and 
both also have floppy (disabling this does not fix the problem, though 
it might be changing the exact message) and firewire (can't remove or 
disable).


I'm at rather a loss; I've found nothing on the net to suggest there's a 
general problem. I know an older version (6.2) of fbsd boots happily on 
all four of these machines; the two failing machines are currently 
running debian and dual-boot ubuntu/XP, so it's not a hardware fault as 
such.


Any ideas please as to what's going on, or where best to look for more 
information?


TIA.


--
Mike Scott
Harlow, Essex, England

