Re: Does FreeBSD 13 disable the VDEV cache in ZFS?

2021-05-14 Thread Stefan Esser
Am 14.05.21 um 10:34 schrieb Pete French:
> 
> Am just upgrading my machines, and have noticed an oddity.
> This is on a machine running 12.2
> 
> # zfs-stats -D
> 
> 
> ZFS Subsystem Report  Fri May 14 08:30:50 2021
> 
> 
> VDEV Cache Summary:             88.31   m
>   Hit Ratio:            31.20%  27.55   m
>   Miss Ratio:           68.48%  60.47   m
>   Delegations:          0.32%   284.97  k
> 
> 
> 
> 
> This is on a machine running 13.0
> 
> # zfs-stats -D
> 
> 
> ZFS Subsystem Report  Fri May 14 08:32:18 2021
> 
> 
> VDEV cache is disabled
> 
> 
> 
> Same config on both. So, is it really disabled, or is zfs-stats getting this
> wrong for some reason? I can't find any reference to this in the release notes
> or by googling.

Hi Pete,

I was the last to modify zfs-stats, to make it work with the sysctl names
changed by OpenZFS. (But I might have missed a few ...)

Could you check the values of the following sysctl variables:

vfs.zfs.vdev.cache_size
vfs.zfs.vdev.cache_bshift
vfs.zfs.vdev.cache_max
kstat.zfs.misc.vdev_cache_stats.misses
kstat.zfs.misc.vdev_cache_stats.hits
kstat.zfs.misc.vdev_cache_stats.delegations

On my system, vfs.zfs.vdev.cache_size is always 0 - and that results in the
message "VDEV cache is disabled" that you got, too. And I'd guess that the
kstat variables will all report 0, too (not surprising if the size is 0).


A quick web search reveals:

https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#zfs-vdev-cache-size

Quote:

Note: with the current ZFS code, the vdev cache is not helpful and in some
cases actually harmful. Thus it is disabled by setting the
zfs_vdev_cache_size = 0


You may want to try setting "vfs.zfs.vdev.cache_size" (name changed, it used
to be cache.size instead of cache_size) in /boot/loader.conf and perform a
few performance tests. (I have not verified that the current implementation
actually uses and supports a value specified that way, though.)

Regards, STefan



OpenPGP_signature
Description: OpenPGP digital signature


Re: freebsd-update and speed

2021-04-16 Thread Stefan Esser

Am 16.04.21 um 10:17 schrieb Ferdinand Goldmann:

On Thu, 15 Apr 2021, Rainer Duffner wrote:




It’s OK-ish most of the time here (CH).

It does *NOT* work through a proxy, due to the use of pipelined http-requests.

What’s your internet-connection?


The 10Gbit uplink of my university, directly connected to the internet, not
behind a proxy. I don't think that's the problem. When update3 was still online
I'd always use that and updates were really fast back then.

Now that update3 is gone all update servers seem to be in the US or Australia.

After waiting for nearly one hour:

..853085408550856085708580859086008610862086308640865086608670868086908700  
done.

Applying patches... done.
Fetching 9628 files... gunzip: (stdin): unexpected end of file
0a4626107f3700cf5f87bd9c123bf427bd5a8561aadc2eca1d1605465c090935 has incorrect hash.


This is getting kind of tiresome. :(


There was a discussion about adding another mirror in Europe, but
it was decided that a suitable system already existed.

Not sure whether this mirror actually has been provided, but I do
remember that it should have been a well connected system (maybe in
NL?) that has been selected to perform all FreeBSD mirror services
for users in Europe with little latency and high throughput.

Regards, STefan





Re: FreeBSD 13/stable and zpool upgrade

2021-02-19 Thread Stefan Esser

Am 19.02.21 um 22:07 schrieb Warner Losh:

To avoid confusion and errors, I think a proper boot1.efifat should be put
back into the base system so that people don't have to track the above
recipe (which is likely to change).



There's no safe way to do this. The old process has been deprecated because
it is dangerous and inflexible.

For upgrading from an old-style installation, the new process is actually
fairly simple:

mount -t msdos /dev/daXsY /mnt
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
umount /mnt

Though if you're using efibootmgr to select things, you'll need to copy to
efi/freebsd/loader.efi. Different BIOS vendors have a number of different
bugs that make this trickier than it should be, alas.


I have been using the attached script to update boot blocks on EFI
or GPT ZFS MBR boot disks for quite some time.

More partition schemes and boot block types could be supported by
extending the covered cases and/or passing the partition type (GPT
vs. MBR).

Regards, STefan
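#!/bin/sh
# Update the boot code on every efi and freebsd-boot partition
# reported by "gpart show -p".

MOUNTPOINT=/.efi

BOOT1=/boot/pmbr
BOOT2=/boot/gptzfsboot
BOOTEFI=/boot/loader.efi

# Install pmbr and gptzfsboot if the freebsd-boot partition contains
# ZFS boot code.
update_gptzfsboot ()
{
    disk=$1
    partname=$2

    if strings "/dev/$partname" | grep -q ZFS; then
        index=${partname##*p}
        gpart bootcode -b "$BOOT1" -p "$BOOT2" -i "$index" "$disk"
    fi
}

# Copy loader.efi to the EFI system partition, keeping the previous
# loader as BOOTx64o.efi.
update_efiboot ()
{
    disk=$1
    partname=$2
    efidir=$MOUNTPOINT/efi/boot

    umount "$MOUNTPOINT" 2>/dev/null
    if mount -t msdos "/dev/$partname" "$MOUNTPOINT"; then
        mkdir -p "$efidir"
        rm -f "$efidir/BOOTx64o.efi"
        mv "$efidir/BOOTx64.efi" "$efidir/BOOTx64o.efi"
        # Use $efidir here instead of a hard-coded /.efi path
        cp "$BOOTEFI" "$efidir/BOOTx64.efi"
        umount "$MOUNTPOINT"
    fi
}

# Lines starting with "=>" describe a disk, all others a partition.
while read line
do
    set - $line

    if [ "$1" = "=>" ]; then
        disk=$4
        partscheme=$5
    else
        partname=$3
        parttype=$4

        case $parttype in
        efi)
            update_efiboot $disk $partname
            ;;
        freebsd-boot)
            update_gptzfsboot $disk $partname
            ;;
        esac
    fi

done <<*EOF
$(gpart show -p)
*EOF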




Re: calendar (1) - patch to correct error description

2020-10-31 Thread Stefan Esser

JFYI: port version 0.7 or the latest sources in CURRENT include a number
of further enhancements (and an important fix to allow the calendar to
be built on 11.x/12.x which do not have _PATH_LOCALBASE in paths.h).

I have added an #undef command (since it was supported by the calendar
program that used cpp and since it occurs in one of calendar files that
have traditionally been included in FreeBSD) and checks for the correct
use of #else and #endif.

Warnings and error messages caused by malformed input files are now
reported with filename and line number.

The man-page and tests have not been updated to reflect this latest
set of changes. I hope to be able to update them in the next few
days.

Please let me know if there are any issues with these changes, since I
want to merge them to 11-STABLE and 12-STABLE sometime next week.

Regards, STefan




Re: calendar (1) - patch to correct error description

2020-10-30 Thread Stefan Esser

Am 30.10.20 um 00:48 schrieb Julian H. Stacey:

Here's another 2 calendar errors, presumably cpp, that manifest in 12.2-STABLE,
that 9.2-RELEASE gets right.

Man calendar:
Empty lines and lines protected by the C commenting syntax
(/* ...  */) are ignored.

--- Input ~/.calendar/calendar
friday  fish
/*
  * Oct 21  AAA
  */
friday  and chips
---12.2-STABLE output
Oct 30* and chips
Oct 30  AAA
Oct 30* fish
Oct 31  AAA
---9.2-RELEASE output
Oct 30* and chips
Oct 30* fish
---

Error 1:Why does it emit AAA ?


The version you used supports /* ... */ only on a single line.
The "*" in front of "Oct" seems to have been parsed as a wild-card,
but I have not checked why it led to being interpreted as "Oct 30".

This is fixed with the comment processing that I have added to the
internal pre-processor.


Error 2:Why twice ?


No idea and I do not consider this relevant now that the issue is
fixed.


Puzzle: Doesnt happen if you change Oct above to Aug inside the comment.


Feel free to solve this puzzle, I really do not have the time to
waste on this question ;-)


(PS both do a nasty stack unstack, which may look familiar to us
programmers, but looks silly to normal people, inverting fish & chips)


Yes, a linked list that got built up by putting the new element at
the head and the previously added values into the "next" field of
that element.
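
A minimal sketch of that effect, not the actual calendar code: the
struct and function names below are hypothetical. Each new entry is
put at the head and the old head goes into its "next" field, so a walk
from the head returns the entries in reverse order of insertion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event record; the field names are illustrative only. */
struct event {
    const char *text;
    struct event *next;
};

/*
 * Prepend a new event: the new element becomes the head and the
 * previous head is stored in its "next" field.
 */
static struct event *
event_push(struct event *head, const char *text)
{
    struct event *e = malloc(sizeof(*e));

    assert(e != NULL);  /* sketch only: no real error handling */
    e->text = text;
    e->next = head;
    return (e);
}

/* Copy the texts in traversal order into out[] and return the count. */
static size_t
event_collect(const struct event *head, const char **out, size_t max)
{
    size_t n = 0;

    for (; head != NULL && n < max; head = head->next)
        out[n++] = head->text;
    return (n);
}
```

Pushing "fish" and then "and chips" and walking the list from the head
yields "and chips" first, which matches the inverted order in the
output quoted above.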


Please check the calendar version in -CURRENT or the deskutils/calendar
port version 0.6. Both issues should be fixed there.

Regards, STefan




Re: calendar (1) - patch to correct error description

2020-10-29 Thread Stefan Esser

Am 29.10.20 um 13:07 schrieb Diane Bruce:

On Thu, Oct 29, 2020 at 01:29:39AM +0100, Julian H. Stacey wrote:

Hi Stefan

Am 28.10.20 um 13:02 schrieb Julian H. Stacey:

man calendar states:
"The calendar internal cpp does not correctly do #ifndef and will discard
the rest of the file if a #ifndef is triggered."
That is wrong, as proved by test file:


If I was asked about this I'd suggest ripping out the internal cpp
and switching back to an external cpp IFF calendar is all in ports.
The idea when the original very hurried hack was done was to remove
more from base. No longer a problem if using ports.


This is a possibility, but there exists no plan to remove the calendar
program from base, currently.

I have created the deskutils/calendar port for RELEASE users that want
to take advantage of recent changes to the calendar program, but this
port exists for only this particular purpose.

Piping of the calendar files through CPP leads to other problems, e.g.
how to feed error messages from CPP back to the calendar program in
a sensible way.

I have made the semantics of #define and #if(n)def more similar to
that of a CPP, but there still is one major difference:

#define COND true
#ifdef COND

will not get the result you might expect, since "COND true" has been
defined and

#ifdef COND true

will evaluate to true.

This is easily changed (I'd use only the first word in #define and
reject #ifdef if followed by more than one word), but while being
nearer to what CPP would give, it deviates from many years of
practice in FreeBSD and might not be allowed to be MFCed.

And different semantics in -CURRENT vs. -STABLE are even less
acceptable, IMHO.

But I'd like to apply such a patch, anyway.
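
A rough sketch of the "first word only" variant, under the assumption
that words are separated by white-space; first_word() is a
hypothetical helper and not part of the calendar sources. It copies
the first word and reports how many words were on the line, so that a
#define could keep only the first word and an #ifdef could be rejected
when the count is not 1.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Copy the first white-space delimited word of s into dst and return
 * the number of words found on the line.
 */
static int
first_word(const char *s, char *dst, size_t dstlen)
{
    int words = 0;
    size_t n = 0;

    assert(dstlen > 0);
    while (*s != '\0') {
        /* skip the blanks in front of the next word */
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            break;
        words++;
        /* copy only the first word, just count the others */
        while (*s != '\0' && !isspace((unsigned char)*s)) {
            if (words == 1 && n < dstlen - 1)
                dst[n++] = *s;
            s++;
        }
    }
    dst[n] = '\0';
    return (words);
}
```

With "COND true" as input the name becomes "COND" and the word count
is 2, so "#ifdef COND true" would be flagged instead of silently
matching.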

There are other changes to the semantics that are possible, e.g.
to check that #ifdef/#endif are balanced or that there is no #else
outside an #ifdef/#endif range.

Implementing such checks is quite simple, given the structure of
the code, but I'm not sure that this is required or even a good
idea, since it might break current calendar data files that are
not really well-formed ...

Best regards, STefan
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: calendar (1) - patch to correct error description

2020-10-28 Thread Stefan Esser

Am 28.10.20 um 13:02 schrieb Julian H. Stacey:

man calendar states:
   "The calendar internal cpp does not correctly do #ifndef and will discard
   the rest of the file if a #ifndef is triggered."
That is wrong, as proved by test file:
---
// Test data for ~/.calendar/calendar
*   bla0
#ifdef DEBUG1
* 28bla1
#endif
#ifdef DEBUG2
* 28bla2
#endif
#ifndef DEBUG3
* 28bla3
#endif
#define DEBUG4 TRUE
#ifndef DEBUG4
* 28bla4
#endif
* 28bla5
---
Produces:
---
Oct 28  bla5
Oct 28  bla4
Oct 28  bla3
Oct 28  bla2
Oct 28  bla1
---
Correction:
   The calendar internal cpp ignores directives #ifdef , #ifndef and #endif ,
   and simply including intervening text regardless.


Hi Julian,

no, the calendar program worked as documented, see the BUGS section of
the man-page:

.Sh BUGS
The
.Nm
internal cpp does not correctly do #ifndef and will discard the rest of
the file if a #ifndef is triggered.
It also has a maximum of 50 include file and/or 100 #defines and only
recognises #include, #define and #ifndef.

There is no mention of #ifdef being supported ...

And your "#ifndef DEBUG4" did not trigger, since the whole line after
#define is used as the identifier, in your case "DEBUG4 TRUE".

This is not obvious from reading the man-page and it might be more
intuitive, if the identifier was only the word up to the first blank,
but the code in the calendar program does just strip off leading and 
trailing white-space. It might be too late to change this behavior.
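
To illustrate the whole-line matching described above, here is a small
sketch. It is not the calendar source; the function names, the table
and the 64 byte name limit are assumptions made for the example. The
rest of the line after #define is trimmed and stored as the name, and
a later test matches only on the identical string.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAXDEFS 100     /* limit quoted in the BUGS section */
#define NAMELEN 64      /* assumed for this sketch */

static char defs[MAXDEFS][NAMELEN];
static int ndefs;

/* Copy src into dst without leading and trailing white-space. */
static void
trim_copy(char *dst, size_t dstlen, const char *src)
{
    size_t len;

    while (isspace((unsigned char)*src))
        src++;
    len = strlen(src);
    while (len > 0 && isspace((unsigned char)src[len - 1]))
        len--;
    if (len >= dstlen)
        len = dstlen - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* "#define <rest of line>": the whole trimmed remainder is the name. */
static void
define_line(const char *rest)
{
    assert(ndefs < MAXDEFS);
    trim_copy(defs[ndefs++], NAMELEN, rest);
}

/* True only if the trimmed remainder matches a stored name exactly. */
static int
is_defined(const char *rest)
{
    char name[NAMELEN];
    int i;

    trim_copy(name, sizeof(name), rest);
    for (i = 0; i < ndefs; i++)
        if (strcmp(defs[i], name) == 0)
            return (1);
    return (0);
}
```

After define_line(" DEBUG4 TRUE"), is_defined("DEBUG4") is false and
is_defined("DEBUG4 TRUE") is true, which is why "#ifndef DEBUG4" did
not suppress the bla4 line in the test file.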


I have updated the code in -CURRENT to support #ifdef (MFC in 3 days)
and I plan to add support for nested conditions in -CURRENT (not
sure whether that should be merged to -STABLE, though).

I could change the #define and #ifdef/#ifndef to only consider the first
following word, but do not plan to do that at this time.

Regards, STefan


New bc and dc implementation merged to 12-STABLE (off by default)

2020-08-08 Thread Stefan Esser
The new bc and dc implementation that has become the default version in
FreeBSD-CURRENT has been merged to 12-STABLE and will be included in the
upcoming release 12.2.

It has a number of advantages (standards compliance, speed, NLS support)
over the one in previous FreeBSD releases, but in accordance with POLA
it will require the option WITH_GH_BC=yes to be passed to buildworld and
installworld in FreeBSD-12.

With WITH_TESTS=yes a large number of tests is built and available as
part of the Kyua based test framework - test results from platforms
other than i386 and amd64 are welcome.

The same version of bc and dc is also available as a port or package,
but built with less aggressive compiler optimizations and without
any test cases being installed (but they can be run with "make test"
in the port directory /usr/ports/math/gh-bc).


Re: 11.2-STABLE kernel wired memory leak

2019-02-14 Thread Stefan Esser
Am 13.02.19 um 10:59 schrieb Andriy Gapon:
> On 12/02/2019 20:17, Eugene Grosbein wrote:
>> 13.02.2019 1:14, Eugene Grosbein wrote:
>>
>>> Use following command to see how much memory is wasted in your case:
>>>
>>> vmstat -z | awk -F, '{printf "%10s %s\n", $2*$5/1024/1024, $1}' | sort -k1,1 -rn | head
>>
>> Oops, small correction:
>>
>> vmstat -z | sed 's/:/,/' | awk -F, '{printf "%10s %s\n", $2*$5/1024/1024, $1}' | sort -k1,1 -rn | head
> 
> I have a much uglier but somewhat more informative "one-liner" for
> post-processing vmstat -z output:
> 
> vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0; cache=0; used=0 } {u =
> $2 * $4; c = $2 * $5; t = u + c; cache += c; used += u; total += t; name=$1;
> gsub(" ", "_", name); print t, name, u, c} END { print total, "TOTAL", used,
> cache } ' | sort -n | perl -a -p -e 'while (($j, $_) = each(@F)) { 1 while
> s/^(-?\d+)(\d{3})/$1,$2/; print $_, " "} print "\n"' | column -t
> 
> This would be much nicer as a small python script.

Or, since you are already using perl:


#!/usr/local/bin/perl5

# Read "vmstat -z" and pipe the report through sort; additional command
# line arguments are passed on to sort.
open STDIN, "vmstat -z |" or die "Failed to run vmstat program";
open STDOUT, "| sort -n @ARGV -k 2" or die "Failed to pipe through sort";

$fmt="%-30s %8.3f %8.3f %8.3f %6.1f%%\n";
while (<STDIN>) {
    # fields: name, item size, limit (ignored), used count, free count
    ($n, $s, undef, $u, $c) = split /[:,] */;
    next unless $s > 0;
    $n =~ s/ /_/g;
    $s /= 1024 * 1024;          # item size in MB
    $u *= $s;                   # MB in use
    $c *= $s;                   # MB cached (free)
    $t =  $u + $c;
    next unless $t > 0;
    printf $fmt, $n, $t, $u, $c, 100 * $c / $t;
    $cache += $c;
    $used  += $u;
    $total += $t;
}
printf $fmt, "TOTAL", $total, $used, $cache, 100 * $cache / $total;
close STDOUT;


This script takes additional parameters, e.g. "-r" to reverse the
sort or "-r -k5" to print those entries with the highest percentage
of unused memory at the top.

(I chose to suppress lines with a current "total" of 0 - you may
want to remove the "next" command above the printf in the loop to
see them, but they did not seem useful to me.)

> Or, even, we could add a sort option for vmstat -z / -m.

Yes, and the hardest part might be to select option characters
for the various useful report formats. ;-)

Regards, STefan


Re: lightly loaded system eats swap space

2018-06-19 Thread Stefan Esser
Am 19.06.18 um 03:48 schrieb Erich Dollansky:
> A very long time ago - and not on FreeBSD but maybe on a real BSD - I
> worked with a system that swapped pages out just to bring it back as
> one contiguous block. This made a difference those days. I do not know
> if the code made it out of the university I was working at. I just
> imagine now that the code made it out and is still in use with the
> opposite effect.

If this was on a VAX, then it was due to a short-coming of the
MMU of the VAX, which used one linear array (in system virtual
memory) to hold physical addresses of user pages of all programs.
Each user program had 2 slices in this array (1 growing up, 1
growing down for the stack) and whenever a program allocated a
new page, this slice needed to grow. That leads to fragmentation
(the same problem as with realloc() for an ever growing array), and
when there was no contiguous free space in the array for a grown
slice, then all processes were swapped out (resulting in this whole
page table array being cleared and thus without fragmentation,
since swapped-out processes needed no space in this array).

This was a solution that worked without the table walk used in
today's VM systems. System pages were mapped by a linear page table
in physical memory, while user programs used the above described
linear page table in system virtual memory.

Nothing of the above applies to any other architecture than the
VAX and thus the swap-out of all user processes serves no purpose
on any other system. It was an implementation detail of the VAX
VM code, not a BSD Unix feature.

Regards, STefan


Re: more data: SCHED_ULE+PREEMPTION is the problem

2018-04-09 Thread Stefan Esser
Am 07.04.18 um 16:18 schrieb Peter:
> 3. kern.sched.preempt_thresh
> 
> I could make the problem disappear by changing kern.sched.preempt_thresh  from
> the default 80 to either 11 (i5-3570T) or 7 (p3) or smaller. This seems to
> correspond to the disk interrupt threads, which run at intr:12 (i5-3570T) or
> intr:8 (p3).

[CC added to include Jeff as the author of the ULE scheduler ...]

Since I had somewhat similar problems on my systems (with 4 Quad-Core with SMT
enabled, i.e. 8 threads of execution) with compute bound processes keeping I/O
intensive processes from running (load average of 12 with 8 "CPUs"), and these
problems were affected by preempt_thresh, I checked how this variable is used
in the scheduler. The code is in /sys/kern/sched_ule.c.

It controls, whether a thread that has become runnable (e.g., after waiting
for disk I/O to complete) will preempt the thread currently running on "this"
CPU (i.e. the one executing this test in the kernel).

IMHO, preempt_thresh should default to a much higher number than 80 (e.g. 190),
but maybe I misunderstand some of the details ...


static inline int
sched_shouldpreempt(int pri, int cpri, int remote)
{

The parameters are:

pri: the priority of the now runnable thread
cpri: the priority of the thread that currently runs on "this" CPU
remote: whether to consider preempting a thread on another CPU

The priority values are those displayed by top or ps -l as "PRI", but with an
offset of 100 applied (i.e. pri=120 is displayed as PRI=20 in top).

If this thread has less priority than the currently executing one (cpri), the
currently running thread will not be preempted:

    /*
     * If the new priority is not better than the current priority there is
     * nothing to do.
     */
    if (pri >= cpri)
        return (0);

If the current thread is the idle thread, it will always be preempted by the
now runnable thread:

    /*
     * Always preempt idle.
     */
    if (cpri >= PRI_MIN_IDLE)
        return (1);

A value of preempt_thresh=0 (e.g. if "options PREEMPTION" is missing in the
kernel config) lets the previously running thread continue (except if it was
the idle thread, which has been dealt with above). The compute bound thread may
continue until its quantum has expired.

    /*
     * If preemption is disabled don't preempt others.
     */
    if (preempt_thresh == 0)
        return (0);

For any other value of preempt_thresh, the new priority of the thread that
just has become runnable will be compared to preempt_thresh and if this new
priority is higher (lower numeric value) or equal to preempt_thresh, the
thread for which (e.g.) disk I/O finished will preempt the current thread:

    /*
     * Preempt if we exceed the threshold.
     */
    if (pri <= preempt_thresh)
        return (1);

===> This is the only condition that depends on preempt_thresh > 0 <===

The flag "remote" controls whether this thread will be scheduled to run, if
its priority is higher or equal to PRI_MAX_INTERACT (less than or equal to
151) and if the opposite is true for the currently running thread (cpri).
The value of remote will always be 0 on kernels built without "options SMP".

    /*
     * If we're interactive or better and there is non-interactive
     * or worse running preempt only remote processors.
     */
    if (remote && pri <= PRI_MAX_INTERACT && cpri > PRI_MAX_INTERACT)
        return (1);
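
For experimenting with these rules outside the kernel, the fragments
can be put together into one function. This is only a sketch:
should_preempt() is a hypothetical name, preempt_thresh is passed as a
parameter instead of being a kernel variable, the value 224 for
PRI_MIN_IDLE is an assumption (151 for PRI_MAX_INTERACT is the value
mentioned above), and the final "return (0)" is added as the fall-through
case for when none of the tests match.

```c
#include <assert.h>

#define PRI_MIN_IDLE        224     /* assumed value */
#define PRI_MAX_INTERACT    151     /* value mentioned in the text */

/* User-space model of the tests in sched_shouldpreempt(). */
static int
should_preempt(int pri, int cpri, int remote, int preempt_thresh)
{
    assert(pri >= 0 && cpri >= 0);

    /* New thread is not better than the running one: nothing to do. */
    if (pri >= cpri)
        return (0);
    /* Always preempt idle. */
    if (cpri >= PRI_MIN_IDLE)
        return (1);
    /* Preemption disabled. */
    if (preempt_thresh == 0)
        return (0);
    /* Numerically at or below the threshold: preempt. */
    if (pri <= preempt_thresh)
        return (1);
    /* Interactive thread vs. non-interactive one on a remote CPU. */
    if (remote && pri <= PRI_MAX_INTERACT && cpri > PRI_MAX_INTERACT)
        return (1);
    /* Assumed fall-through: no preemption. */
    return (0);
}
```

With the default of 80, should_preempt(120, 180, 0, 80) is 0, i.e. a
thread at pri=120 that has just become runnable does not preempt a
compute bound thread at cpri=180 on this CPU, while a threshold of 190
makes the same call return 1.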


The critical use of preempt_thresh is marked above. If it is 0, no preemption
will occur. On a single processor system, this should allow the CPU bound
thread to run for as long as its quantum lasts.

A value of 120 (corresponding to PRI=20 in top) will allow the I/O bound
thread to preempt any other thread with lower priority (cpri > pri). But in
case of a high priority kernel thread being active during this test (with a
low numeric cpri value), the I/O bound process will not preempt that higher
priority thread (i.e. some high priority kernel thread).

Whether the I/O bound thread will run (instead of the compute bound) after
the higher priority thread has given up the CPU, will depend on the scheduler
decision which thread to select. And for "timeshare" threads, this will often
not be the higher priority (I/O bound) thread, but the compute bound thread,
which then may execute until next being interrupted by the I/O bound thread
(which will not happen, if no new I/O has been requested).

This might explain, why setting preempt_thresh to a very low value (in the
range of real-time kernel threads) enforces preemption of the CPU bound
thread, while any higher (numeric) value of preempt_thresh prevents this
and makes tdq_choose() often select the low priority CPU bound over the
higher priority I/O bound thread.

BUT the first test in sched_shouldpreempt() should prevent any user process
from ever preempting a real-time thread "if (pri >= cpri) return 0;".

For preemption to occur,  pri must be numerically lower than cpri, and
pri 

Try setting kern.sched.preempt_thresh != 0 (was: Re: kern.sched.quantum: Creepy, sadistic scheduler)

2018-04-04 Thread Stefan Esser
Am 04.04.18 um 12:39 schrieb Alban Hertroys:
> 
>> On 4 Apr 2018, at 2:52, Peter  wrote:
>>
>> Occasionally I noticed that the system would not quickly process the
>> tasks i need done, but instead prefer other, longrunning tasks. I
>> figured it must be related to the scheduler, and decided it hates me.
> 
> If it hated you, it would behave much worse.
> 
>> A closer look shows the behaviour as follows (single CPU):
> 
> A single CPU? That's becoming rare! Is that a VM? Old hardware? Something 
> really specific?
> 
>> Lets run an I/O-active task, e.g, postgres VACUUM that would
> 
> And you're running a multi-process database server on it no less. That is 
> going to hurt, no matter how well the scheduler works.
> 
>> continuousely read from big files (while doing compute as well [1]):
>>> pool        alloc   free   read  write   read  write
>>> cache           -      -      -      -      -      -
>>>   ada1s4    7.08G  10.9G  1.58K      0  12.9M      0
>>
>> Now start an endless loop:
>> # while true; do :; done
>>
>> And the effect is:
>>> pool        alloc   free   read  write   read  write
>>> cache           -      -      -      -      -      -
>>>   ada1s4    7.08G  10.9G      9      0  76.8K      0
>>
>> The VACUUM gets almost stuck! This figures with WCPU in "top":
>>
>>>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>>> 85583 root        99    0  7044K  1944K RUN      1:06  92.21% bash
>>> 53005 pgsql       52    0   620M 91856K RUN      5:47   0.50% postgres
>>
>> Hacking on kern.sched.quantum makes it quite a bit better:
>> # sysctl kern.sched.quantum=1
>> kern.sched.quantum: 94488 -> 7874
>>
>>> pool        alloc   free   read  write   read  write
>>> cache           -      -      -      -      -      -
>>>   ada1s4    7.08G  10.9G    395      0  3.12M      0
>>
>>>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>>> 85583 root        94    0  7044K  1944K RUN      4:13  70.80% bash
>>> 53005 pgsql       52    0   276M 91856K RUN      5:52  11.83% postgres
>>
>>
>> Now, as usual, the "root-cause" questions arise: What exactly does
>> this "quantum"? Is this solution a workaround, i.e. actually something
>> else is wrong, and has it tradeoff in other situations? Or otherwise,
>> why is such a default value chosen, which appears to be ill-deceived?
>>
>> The docs for the quantum parameter are a bit unsatisfying - they say
>> its the max num of ticks a process gets - and what happens when
>> they're exhausted? If by default the endless loop is actually allowed
>> to continue running for 94k ticks (or 94ms, more likely) uninterrupted,
>> then that explains the perceived behaviour - buts thats certainly not
>> what a scheduler should do when other procs are ready to run.
> 
> I can answer this from the operating systems course I followed recently. This 
> does not apply to FreeBSD specifically, it is general job scheduling theory. 
> I still need to read up on SCHED_ULE to see how the details were implemented 
> there. Or are you using the older SCHED_4BSD?
> Anyway...
> 
> Jobs that are ready to run are collected on a ready queue. Since you have a 
> single CPU, there can only be a single job active on the CPU. When that job 
> is finished, the scheduler takes the next job in the ready queue and assigns 
> it to the CPU, etc.

I'm guessing that the problem is caused by kern.sched.preempt_thresh=0, which
prevents preemption of low priority processes by interactive or I/O bound
processes.

For a quick test try:

# sysctl kern.sched.preempt_thresh=1

to see whether it makes a difference. The value 1 is unreasonably low, but it
has the most visible effect in that any higher priority process can steal the
CPU from any lower priority one (high priority corresponds to low PRI values
as displayed by ps -l or top).

Reasonable values might be in the range of 80 to 224 depending on the system
usage scenario (that's what I found to have been suggested in the mail-lists).

Higher values result in less preemption.

Regards, STefan


Re: KBI unexpected change in stable/11 ?

2018-03-28 Thread Stefan Esser
Am 28.03.18 um 17:30 schrieb John:
> On Wed, 28 Mar 2018, at 15:20, Greg Byshenk wrote:
>> On Wed, Mar 28, 2018 at 03:11:50PM +0100, tech-lists wrote:
>>> On 28/03/2018 14:39, Gregory Byshenk wrote:
>>>> You can do this manually, or by adding a PORTS_MODULES line to
>>>> /etc/make.conf. This will rebuild the listed modules from ports
>>>> when you build a new kernel.
>>>
>>> Are you sure it's in /etc/make.conf and not /etc/src.conf?
>>
>> No. But it is in the man page for make.conf and not src.conf.
> 
> yeah. That is... confusing! at least to me, because [I thought it would be] 
> src.conf that's consulted when src is built. So I ran a couple of tests and 
> found that it would work in either file HOWEVER if one ports module statement 
> was in src.conf and another, different ports module statement was in 
> make.conf, 
> that the one in src.conf would get built but the one in make.conf would not. 
> 
> how confusing is that. This is on 11.1-stable.
No, it's not confusing at all, if you think about it ... ;-)

/etc/make.conf is included in any case, but kernel and world builds will
include /etc/src.conf thereafter (and overwrite earlier settings read from
make.conf, unless ?= is used to preserve the earlier value, or += to append
the new values).

Regards, STefan


Re: program to edit rc.conf key/values ?

2017-10-18 Thread Stefan Esser
Am 18.10.17 um 14:32 schrieb Kurt Jaeger:
> Hi!
> 
>> sysrc nrpe2_enable="YES"
>>
>> for instance
> 
> Yes, that's it! Thanks!
> 
> man rc.conf and man rc need a pointer to this!

Yes, I also once searched for sysrc and would have appreciated references
in some other man pages ...

Therefore, I've just added those references to rc.8 and rc.conf.5,
as you suggested.

I forgot to set a MFC reminder, but I'll see that these references
are merged to 10 and 11 before the end of the month.

Regards, STefan


GELI: Regression between STABLE-10 and STABLE-11?

2017-06-16 Thread Stefan Esser
Hi all,

I'm administrating an SVN server for a small company, which is used
to archive work results, but also customer contracts and information
received under NDA.

The system uses pure ZFS (root on ZFS) and part of the "data" pool
is a ZVOL that is used as a GELI provider to hold the confidential
data.

I just tried to upgrade this system to STABLE-11 (or rather 11-BETA1)
and found, that I could not attach the GELI protected partition with:

# geli attach -d -k /root/MY_GELI_KEYFILE /dev/zvol/data/geli.vol

The command failed with "invalid password" (or along that line, sorry
for not writing the exact text down).

The system was running with consistent STABLE-11 kernel and world,
and there was no sign of any other problem.

I performed a roll-back to STABLE-10 and could attach the GELI
partition without any problem with the key-file and password that
had failed under STABLE-11.

This problem is not critical for me (I can create an encrypted backup
of the encrypted data and restore that into a GELI partition created
under STABLE-11), but it might be a general problem - that's why I'm
reporting this failure ...


Some more details:

$ uname -a
FreeBSD XXX.com 10.3-STABLE FreeBSD 10.3-STABLE #0 r318284: Mon May 15
11:58:47 CEST 2017 root@s...  amd64

The (abridged) ZFS pool status is:

$ zpool status
  pool: sys
config:

        NAME              STATE     READ WRITE CKSUM
        sys               ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/System-1  ONLINE       0     0     0
            gpt/System-2  ONLINE       0     0     0

  pool: data
config:

        NAME            STATE     READ WRITE CKSUM
        data            ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            gpt/Data-1  ONLINE       0     0     0
            gpt/Data-2  ONLINE       0     0     0

  pool: crypto
config:

        NAME                      STATE     READ WRITE CKSUM
        crypto                    ONLINE       0     0     0
          zvol/data/geli.vol.eli  ONLINE       0     0     0

$ zfs list -t volume
NAME            USED  AVAIL  REFER  MOUNTPOINT
data/geli.vol  94.5G  78.5G  37.9G  -

I know about the problem of ZFS on ZFS and this will be fixed (I'm
going to convert the file-system in the ZVOL to UFS), but it was a
valid setup when the server was installed a number of years ago.
(And I use "vfs.zfs.vol.recursive=1" as a work-around to disable
the safe-guard that has been implemented to prevent ZFS on ZPOOL.)

I'm able to work around the problem, since the amount of data in the
encrypted partition is small and I wanted to transfer it into an UFS
file-system on a GELI partition, anyway.

Since I had only reserved a short maintenance window for the attempted
upgrade, I could not perform many tests and I lost all logs during the
rollback to STABLE-10. (I had not considered, this could be a problem
that might affect others, at that time.)

Regards, STefan


Re: passwd and pw speed regression?

2016-02-11 Thread Stefan Esser
On 11.02.2016 at 16:02, Mike Tancsa wrote:
> I noticed that on a new RELENG_10 box we are building, password updates
> are taking a very long time to build.  On the old RELENG_8 box, doing
> something simple like adding a user
> 
> # time pw useradd test12345
> 0.062u 0.063s 0:00.14 85.7% 54+988k 196+134io 0pf+0w
> 
> # time pw userdel test12345
> 0.164u 0.044s 0:00.20 100.0% 28+1181k 0+18io 0pf+0w
> 
> 
> On the new RELENG_10 box,
> 
> # time pw useradd test12345
> 0.060u 0.120s 0:58.89 0.3%  58+146k 12+6485io 0pf+0w
> 
> # time pw userdel test12345
> 0.125u 0.133s 0:58.80 0.4% 46+214k 13+9326io 0pf+0w
> 
> 
> # wc /etc/passwd
> 6113   14792  376128 /etc/passwd
> 
> 
> Yes, almost 60 seconds to add a user to the password file?
> 
> Does anyone know what is going on to account for the large difference
> and how to work around it ?  I am guessing

You are affected by the problem mentioned in

https://reviews.freebsd.org/D5186

The output file is written with O_SYNC a record at a time
and this is slow (100 to 200 records per second on a non-SSD
drive).
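
As a rough plausibility check of that rate against the numbers quoted
above (6113 lines in /etc/passwd, about 59 seconds per update), here is
a small sketch. It assumes one synchronous write per passwd record; the
function name is made up for illustration.

```python
def sync_rewrite_seconds(records, records_per_second):
    """Estimated time to rewrite a file when every record is a separate O_SYNC write."""
    return records / records_per_second

PASSWD_LINES = 6113  # from the "wc /etc/passwd" output quoted above

# 100 to 200 synchronous writes per second on a non-SSD drive (the range given above)
slow = sync_rewrite_seconds(PASSWD_LINES, 100)
fast = sync_rewrite_seconds(PASSWD_LINES, 200)
print(f"estimated rewrite time: {fast:.0f} to {slow:.0f} seconds")
```

The measured 58.89 seconds falls inside the estimated range, close to
the 100 records per second end.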

This will be fixed in -CURRENT soon and I think the fix
should qualify for inclusion in the next 10-BETA, thereafter.

Regards, STefan

> https://svnweb.freebsd.org/base?view=revision&revision=285205
> 
> is the issue. Apart from keeping local source code changes, is there not
> a better way to not have reasonable speeds ?
> 
>   ---Mike



Re: CPU frequency doesn't drop below 1200MHz (like it used to)

2015-05-22 Thread Stefan Esser
On 22.05.2015 at 09:33, Nikos Vassiliadis wrote:
> Hi,
> 
> I just noticed that my CPU's frequency doesn't support dropping
> below 1200MHz. It used to be able to go down to 150MHz, if I am
> not mistaken. I'd like it to go down to 600MHz via powerd, like
> it used to go. This is a month's old 10-STABLE.
> 
>> [nik@moby ~]$ sysctl dev.cpu.0.freq_levels
>> dev.cpu.0.freq_levels: 2400/35000 2300/32872 2200/31127 2100/29417
>> 2000/27740 1900/26096 1800/24490 1700/22588 1600/21045 1500/19534
>> 1400/18055 1300/16611 1200/15194
> 
> This is the CPU:
>> hw.model: Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz

Well, your CPU does not support clock frequencies below 1200 MHz.

Throttling works by injection of "wait cycles" that reduce the
amount of work the CPU can perform per unit of time, but does
not really lower the CPU frequency.

That means that with throttling the CPU will need more energy
to perform a given calculation than it would without.

If you select 150 MHz, then your CPU will be clocked at 1200 MHz,
but will only perform any operations on each 8th clock cycle.
This limits peak energy consumption (and that was the reason this
feature was introduced in the power-hungry Pentium-4 processors),
but increases the amount of energy needed to perform the computation.

The power consumption of your CPU may be (an estimated) 50% to 70%
at "150 MHz" compared to 1200 MHz. But you'll need 8 times as long
until the CPU can fall into a deep sleep state. Since RAM and other
components see the same clock whether throttling is enabled or not,
your RAM will need full power for 8 times as long (it will
also go into a low power refresh mode, when the CPU is idle).
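
To put numbers on this: energy is power multiplied by time. The sketch
below uses the estimate of 50% to 70% power draw and the factor of 8 in
run time from above. It only covers the CPU while it is busy and ignores
RAM and sleep states; the function name is made up for illustration.

```python
def relative_energy(relative_power, slowdown):
    """Energy for a fixed amount of work, relative to running unthrottled.

    energy = power * time, with both factors scaled to the 1200 MHz case.
    """
    return relative_power * slowdown

SLOWDOWN = 1200 / 150  # work is done on every 8th clock cycle only

for power in (0.5, 0.7):  # estimated power draw at "150 MHz" relative to 1200 MHz
    print(f"power {power:.0%} -> {relative_energy(power, SLOWDOWN):.1f}x the CPU energy")
```

Under these assumptions the throttled CPU spends roughly 4 to 5.6 times
the energy on the same computation.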

Throttling has been disabled, because there are no longer any CPUs
which need it to prevent overheating. (Or rather: there are now
better mechanisms than throttling, which are implemented in any
modern x86 CPU.) Throttling could also impact system stability.

It really serves no purpose anymore and it was never suitable to
improve the power efficiency of e.g. a laptop computer. You'll
see better battery life if you keep throttling disabled.

Regards, STefan


Re: if_em, legacy nic and GbE saturation

2013-08-26 Thread Stefan Esser
On 26.08.2013 11:28, Harald Schmalzbauer wrote:
> Regarding Adrian Chadd's message of 26.08.2013 10:34
> (localtime):
>> Hi,
>> 
>> There's bus limits on how much data you can push over a PCI bus.
>> You can look around online to see what 32/64 bit, 33/66MHz PCI
>> throughput estimates are.
>> 
>> It changes massively if you use small versus large frames as
>> well.
>> 
>> The last time I tried it i couldn't hit gige on PCI; I only
>> managed to get to around 350mbit doing TCP tests.
> 
> Thanks, I'm roughly aware about the PCI bus limit, but I guess it
> should be good for almost GbE: 33*10^6*32=1056, so if one considers
> overhead and other bus-blocking things (nothing of significance is
> active on the PCI bus in this case), I'd expect at least 800Mbit/s,
> which is what I get with jumbo frames.

But PCI bus throughput might be much lower than expected:

- The arbitration overhead is quite high, in the order of 0.2 to 0.3us.

- Depending on device capabilities and chip-set configuration and
  features there may be many more arbitration phases than one might
  expect.

- A cache line flush is requested for data held in the CPU, unless the
  bus-master uses special transfer commands to indicate that the full
  cache line will be invalidated within the requested transfer.

These overheads combined may reduce the effective PCI throughput to a
fraction of the nominal performance (1/3 to 1/4 for bursts of 16 bytes).
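
A simplified model shows where such a fraction comes from. It assumes
a 33 MHz, 32-bit bus, one arbitration per burst, and nothing else
(no address phase, wait states or cache line flushes), so it is a
back-of-the-envelope sketch rather than a bus simulation. All names
are made up for illustration.

```python
PCI_CLOCK_HZ = 33e6      # 33 MHz PCI
BUS_WIDTH_BYTES = 4      # 32-bit bus

def effective_fraction(burst_bytes, arbitration_us):
    """Fraction of nominal bandwidth left when each burst pays one arbitration."""
    data_cycles = burst_bytes / BUS_WIDTH_BYTES
    data_us = data_cycles / PCI_CLOCK_HZ * 1e6
    return data_us / (data_us + arbitration_us)

nominal_mbit = PCI_CLOCK_HZ * BUS_WIDTH_BYTES * 8 / 1e6  # 1056 Mbit/s
for overhead_us in (0.2, 0.3):
    frac = effective_fraction(16, overhead_us)
    print(f"{overhead_us} us arbitration: {frac:.2f} of nominal, "
          f"about {nominal_mbit * frac:.0f} Mbit/s")
```

A 16 byte burst needs only 4 data cycles (about 0.12us), so the
arbitration time dominates and less than 40% of the nominal bandwidth
remains in this model.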

The "minimum grant" value is the minimum burst length the device wants
(to avoid a buffer underrun/overrun due to too low effective bandwidth).
The "maximum latency" corresponds to the number of PCI clocks the
device is willing to wait for the bus to be granted (to avoid a buffer
underrun/overrun while waiting to get access to the bus). The
maximum latency value is useful to calculate the maximum arbitration
unit for which no device is stalled longer than allowed by MAXLAT.

MINGNT and MAXLAT of a device can be displayed with pciconf:

# pciconf -r -b pci0:1:0 0x3e:0x3f (e.g., for bus 0 device 1 function 0)

The PCI bus will be "lost" whenever another device gets access to
the bus, whether CPU or another PCI (or PCIe) device.

Especially when simultaneously sending and receiving packets with
two Ethernet controllers, bus arbitration will occur for every 16
to 32 transfers (depending on bus arbiter settings and programmed
MINGNT).

Regards, STefan


Re: zpool on a zvol inside zpool

2013-07-22 Thread Stefan Esser
On 22.07.2013 10:04, Eugene M. Zheganin wrote:
> Hi.
> 
> I'm moving some of my geli installation to a new machine. On an old
> machine it was running UFS. I use ZFS on a new machine, but I don't have
> an encrypted main pool (and I don't want to), so I'm kinda considering a
> way where I will make a zpool on a zvol encrypted by geli. Would it be
> completely insane (should I use UFS instead ?) or would it be still
> valid  ?

I have configured a system in just that way, a few weeks ago.
It seems to work just fine.

This is a workgroup server for a small company, which is meant to
provide secure storage for documents. The system has a separate
boot/root pool and a large pool for data (both as ZFS mirrors).

On the data pool there is a ZVOL which is GELI encrypted to
provide a "disk" for the encrypted ZFS that holds the documents.

The system is running headless in some datacenter. It must boot
multi-user and start an SSHD for remote entry of the passphrase,
therefore solutions where a GELI key is on a USB key or entered
via a console during boot were not possible.

Performance is reasonable and far exceeds the 100Mbit/s Ethernet
port ordered in the data-center, so I did not bother to measure
throughput of this ZFS on GELI encrypted ZPOOL.

For low load scenarios, this seems to be the easiest configuration.
If you have hardware crypto or expect high load, then a ZFS mirror
of GELI encrypted disks may show better performance, though.

Regards, STefan


Re: Flow monitoring with PF

2013-06-12 Thread Stefan Esser
On 12.06.2013 02:17, Scott, Brian wrote:
>> I was looking at trying out flow monitoring and I found pfflowd, but 
>> unfortunately it does not work with FreeBSD >9.0. I thought about ng_netflow 
>> but that doesn't >see my tun interface which may be related to..
>> WARNING: attempt to domain_add(netgraph) after domainfinalize()
> 
> Noise message. I've never seen it actually mean anything.

This message indicates a possible problem (leading to panics under
specific circumstances). I proposed a patch to fix the panic, but
was reluctant to commit it, because I knew the patch was not complete
(and I was working toward a better solution).

It was then taken by somebody who ignored the problems with the patch
and committed against my advice. That's when I stopped working on a
real fix - the committer of my (incomplete) patch owns the problem
now (and is not active anymore, AFAICT).

The problem is that when a network domain is registered after the kernel
has been running (e.g. when loading Netgraph as a kernel module),
data structures in the kernel need to be adjusted. AFAICR, it works
as long as only one new network domain is loaded (e.g. Netgraph),
but may fail if another one is loaded thereafter (this used to be
triggered by ISDN, which had its own network domain but is history,
now).

Sorry for having nothing to add on the subject of this thread ...

Regards, STefan


Re: Any objections/comments on axing out old ATA stack?

2013-04-01 Thread Stefan Esser
On 01.04.2013 15:14, Victor Balada Diaz wrote:
> Being able to configure quirks from loader.conf for disks AND controllers 
> would be great
> and is not hard to do. If you want i can do a patch in two weeks and send it 
> to you. That
> way it's easy to test disabling NCQ and/or other things in case of hitting a 
> bug. Also
> being able to modify the configuration without a kernel recompile would be a 
> big
> improvement because we could still use freebsd-update to keep systems updated.

Something like:

kern.cam.ada.0.quirks=1

to force 4KB sectors?

No need to implement that, it is in -CURRENT (did not check -STABLE).
But there is currently no quirk that disables NCQ, although it is
easy to implement. See the places where "ADA_FLAG_CAN_NCQ" is set and
make that value depend on a new quirk flag being unset ...

But instead of setting that flag in the loader, it would be good to
collect drive signatures that need it and to add quirk entries for
them in ata_da.c ...

Regards, STefan


Re: ZFS / zpool size

2012-01-17 Thread Stefan Esser
On 17.01.2012 16:47, Christer Solskogen wrote:
> Hi!
> 
> I have a zpool called data, and I have some inconsistencies with sizes.
> 
> $ zpool iostat
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> data        3.32T   761G    516     50  56.1M  1.13M
> 
> $ zfs list -t all
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> data  2.21T   463G  9.06G  /data
> 
> Can anyone throw any light on this?

The ZFS numbers are 2/3 of the ZPOOL numbers for alloc.

This looks like a raidz1 over 3 drives. The ZPOOL command shows
disk blocks available and used (disk drive view), the ZFS command
operates on the file-system level and shows blocks used to hold
actual data or available for actual data (does not account for
RAID parity overhead).

> Is not free the same as AVAIL?

Avail should be 2/3 of free, but I guess there is some overhead
that reduces the number of available blocks.
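
The 2/3 factor can be checked against the quoted numbers. The sketch
assumes a raidz1 over 3 drives (a guess, as said above); the helper
name is made up for illustration.

```python
def usable_fraction(disks, parity):
    """Share of raw raidz capacity that holds data rather than parity."""
    return (disks - parity) / disks

RAIDZ1_3_DISKS = usable_fraction(disks=3, parity=1)  # assumed layout, 2/3

alloc = 3.32   # "alloc" from zpool iostat, in the T unit shown there
free = 761     # "free" from zpool iostat, in the G unit shown there

print(f"expected USED : {alloc * RAIDZ1_3_DISKS:.2f}T (zfs list shows 2.21T)")
print(f"expected AVAIL: {free * RAIDZ1_3_DISKS:.0f}G (zfs list shows 463G)")
```

USED matches to two decimals. The expected AVAIL comes out at about
507G, so roughly 44G of the difference to the reported 463G would be
the overhead mentioned above.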

> I do not have any zfs snapshots, but some filesystems are compressed.

Compression affects the ZPOOL and ZFS numbers in the same way, but "zfs
list" and "df" will differ significantly for file-systems that
contain compressible data.

Regards, STefan


Re: FB9-stable: bridge0 doesn't come up via rc

2012-01-17 Thread Stefan Esser
On 17.01.2012 12:57, Denny Schierz wrote:
> hi,
> 
> I have problems starting the bridge via rc.d:
> 
> rc.conf:
> 
> cloned_interfaces="bridge0"
> ifconfig_bge0="up"
> ifconfig_bridge0="addm bge0 up"
> ifconfig_bridge0="inet 192.168.1.0 netmask 255.255.255.0 up"

You forgot that rc.conf does not contain commands, but only variable
assignments. The latter of the last two lines overwrites the value
set in the former.

You may want to replace the first of these two lines by:

autobridge_bridge0="bge0" # add further interfaces as required

The parameter holds a space separated list and may include wild-cards
(e.g. "bge0 ixp*").

> defaultrouter="192.168.1.254"
> gateway_enable="YES"
> 
> It doesn't work. After reboot I have to set up:
> 
> ifconfig bridge0 addm bge0
> 
> then it works.

Yes, as explained above ...

> Also a problem: "/etc/rc.d/netif stop" doesn't destroy bridge0 and 
> "/etc/rc.d/netif start" gives errors, because bridge exists already.

This will be fixed if you use "autobridge" as explained above. See
/etc/rc.d/bridge for details.

Regards, STefan


Re: SCHED_ULE should not be the default

2011-12-24 Thread Stefan Esser
On 24.12.2011 00:02, Andriy Gapon wrote:
> on 24/12/2011 00:49 Adrian Chadd said the following:
>> Does ULE care (much) if the nodes are hyperthreading or real cores?
>> Would that play a part in what it tries to schedule/spread?
> 
> An answer to this part from the theory.
> ULE does care about physical topology of the (logical) CPUs.
> So, for example, four cores are not the same as two core with two hw threads
> from ULE's perspective.  Still, ULE tries to eliminate any imbalances between
> the CPU groups starting from the top level (e.g. CPU packages in a 
> multi-socket
> system) and all the way down to the individual (logical) CPUs.
> Thus, given enough load (L >= N) there should not be an idle CPU in the system
> whatever the topology.  Modulo bugs, of course, as always.

I tried to locate the old message, where somebody explained why the
topology led to a thread being selected for migration, re-assigned and
then on another topology level was swapped back and ended on just the
core it had already been running on. The analysis was quite detailed and
it may well have been part of that discussion back in 2008 that Steve
Kargl mentioned ...

This problem could be fixed by adding a slight degree of randomness.
But IIRC, a deterministic solution might also be possible, which just
takes care not to put a thread back on the core it previously had been
running on, if it has been determined that the thread should be migrated
to a different core, before.
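
Both ideas can be combined in a toy model. This is not ULE code: the
load table, the function name and the tie-breaking rule are all
assumptions made for illustration. It excludes the core the thread
just left and breaks ties randomly.

```python
import random

def pick_target(loads, previous_core, rng=random):
    """Pick the least-loaded core, never the one the thread just left.

    loads: dict mapping core id -> run-queue length (hypothetical model).
    Ties are broken randomly, which adds a slight degree of randomness.
    """
    candidates = {core: load for core, load in loads.items() if core != previous_core}
    if not candidates:
        return previous_core  # single-core system: nowhere else to go
    lowest = min(candidates.values())
    return rng.choice([core for core, load in candidates.items() if load == lowest])

loads = {0: 2, 1: 1, 2: 1, 3: 3}
target = pick_target(loads, previous_core=1)
print("migrate to core", target)
```

In this example cores 1 and 2 are equally loaded, but core 1 is
excluded, so the thread cannot bounce back to where it came from.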

Sorry for not being able to point to the old message that contained the
analysis of this problem.

Regards, STefan


Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-20 Thread Stefan Esser
On 21.12.2011 06:22, Ian Smith wrote:
> I find the results on this page very strange, but perhaps indicative:
> 
> http://www.phoronix.com/scan.php?page=article&item=debian_kfreebsd_h210&num=1
> 
> Here we see scant difference in results between Debian running FreeBSD 
> 7.3 or 8.0 or Linux 2.6.32 kernels, yet native FreeBSD 7.3 and 8.0 
> installations apparently run far slower, especially on the gzip test!

You did not expect this, since all user space programs were compiled
from identical sources, as were FreeBSD and kFreeBSD (probably with
minimal deviations in kFreeBSD, which should not affect the results)?

> Does this imply that given the similar kernel speed, Debian GNU userland 
> performs so dramatically better than FreeBSD userland?  Or does it 
> perhaps point to the default tuning of the FreeBSD systems compared to 
> (here) Debian, for these particular tests?  Indeed, `which gzip`?

Well, the answer is quite simple: Just run the Linux binaries on FreeBSD
or kFreeBSD (those compiled for testing Linux performance) and I'm
convinced that you'll find that performance significantly improves.

You did notice that the 7-zip and gzip binaries were built with
gcc-4.4.4 for Linux and with gcc-4.2.1 for FreeBSD?

And another point: The relative advantage between FreeBSD and Linux is
different on R52 and T61. Might it be the case that gcc-4.4.4 has better
knowledge of the newer CPU in the latter (T61, Core 2 Duo) and optimizes
for it, not for the CPU in the R52 (Pentium-M) anymore?

And apparently 7-zip results are less affected by the compiler version
than the gzip results. This also hints at the compiler as the reason for
the better kFreeBSD and Linux results. (7-zip seems to be less dependent
on the better optimization of the newer gcc, or it does not take as much
advantage from it.)

Funny is the finding that gzip is measured slower on FreeBSD 7.3 than
8.0 on the Pentium-M, while it is faster on 7.3 on the Core 2 Duo. That
does not match my expectations at all ...

There are no technical reasons why FreeBSD does not come with a newer
GCC, as probably all on this list know. But OTOH, the newer GCC versions
can easily be installed from a port or package, and thus it would not
have been impossible to compare native binaries compiled with the same
compiler version for all test cases.

> And yes, FreeBSD could sure use some sort of tuning 'profiles' mechanism 
> to be able to preconfigure systems for at least several vastly different 
> types of workload.  Nate Lawson used to talk about this, then in respect 
> to simple 'laptop vs desktop' scenarios, but we've since seen volumes 
> written, mostly in lists but some wikis, parts of the Handbook, guides 
> for performance tuning etc, scarcely accessible to J. Random Installer.  
> A set of tunings for these Phoronix benchmarks might be a good start?

I doubt that tuning is responsible, because kFreeBSD performed better
(with the test programs compiled with gcc-4.4.4). The benchmark measured
just that, the better optimization of the newer gcc version.

Install the port (perhaps an even later gcc version, gcc-4.5 is said to
generate even better code than gcc-4.4) and make it the default compiler
for ports, if you want to take advantage of the more advanced compiler.
The FreeBSD ports system makes that very easy.

BTW: Why don't we build binary packages with a later version of gcc than
what is in the system? This should not cause any GPLv3 violation, and we
could have the userland built with the compiler giving best performance ...

Regards, Stefan


Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-16 Thread Stefan Esser
On 16.12.2011 08:06, O. Hartmann wrote:
> For the underlying OS, as far as I know, the compiler hasn't as much
> impact as on userland software since autovectorization and other neat
> things are not used during system build.
> 
> From my experience using gcc 4.2 or 4.4/4.5 does not have an impact
> beyond 3% when SSE isn't explicitly enforced.

Well, but the compute intensive tests showed performance variance of a
few percent only, IIRC. The big differences were in the parts that
heavily depend on file system and buffer cache concepts (i.e. the low
limit on dirty buffers in FreeBSD, which is very beneficial in real
world situations; do you remember the first few releases of SunOS-4,
which heavily suffered in interactive performance due to a naive unified
buffer cache VM system that did not limit the amount of dirty buffers?
It caused interactive shells to be swapped out within seconds on systems
with background jobs writing to disk).

> More interesting is the performance gain due to the architecture. I
> think it would be very easy for M. Larabel to repeat this benchmark with
> a "bleeding edge"  Ubuntu or Suse as well. And since FreeBSD 9.0 can be
> compiled with CLANG, it should be possible to compare both also with
> "bleeding edge" compilers, say FreeBSD 9/CLANG, Ubuntu 12/gcc 4.6.2.

Clang may be considered "bleeding edge", but in quite a different way
than gcc-4.6.2. While the latter can look back on 2 decades of
development, clang is still in a state where feature completeness (and
bug-to-bug compatibility with GCC ;-) is much more important than
performance. There is much promise of powerful optimizations becoming
available in clang once it is mature, but just now expect GCC 4.6.2 to
deliver 5% to 10% higher performance than clang.

But as stated before: To exclude compiler dependencies just run the
Linux binaries on FreeBSD. There is slight emulation overhead and Glibc
is not particularly optimized for FreeBSD, but this will still provide
more useful results.

And the tests should be selected to represent reasonable real-world
scenarios. Server programs tested on otherwise idle systems and running
for just a few seconds (not reaching equilibrium during the majority of
the test period) are not representative at all (again: if your goal is
to compare server performance).

Regards, STefan


Re: switching schedulers (Re: SCHED_ULE should not be the default)

2011-12-16 Thread Stefan Esser
On 16.12.2011 09:11, Luigi Rizzo wrote:
> The interesting part is probably the definition of the methods that
> schedulers should implement (see struct _sched_interface ).
> 
> The switch from one scheduler to another was implemented with a
> sysctl. This calls the sched_move() method of the current (i.e.
> old) scheduler, which extracts all ready processes from its own
> "queues" (however they are implemented) and reinserts them onto the
> new scheduler's "queues" using its (new) setrunqueue() method.  You
> don't need to bother for blocked process as the scheduler doesn't
> know much about them.
> 
> I am not preserving the thread's dynamic "priority" (think of
> accumulated work, affinity etc.) when switching
> schedulers, as that is expected to be an infrequent event, and
> so in the end it doesn't really matter -- at a switch, threads
> are inserted in the scheduler as newly created ones, using only
> the static priority as a parameter.

I think this is OK for user processes (which will receive reasonable
relative priorities after running a fraction of a second, anyway).

But I'm not sure whether it is possible to use static priorities for
(real-time) kernel threads, where priority inversion may occur, if the
current dynamic (relative) thread priorities are not preserved.

But not only the relative priorities of the existing processes must be
preserved, new kernel threads must be created with matching (relative)
priorities. This means that the schedulers may be switched at any time,
but the priority values should be portable between schedulers to prevent
dead-lock (or illegal order of execution?) of threads (AFAICT).

Regards, STefan


Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-15 Thread Stefan Esser
On 15.12.2011 11:10, Michael Larabel wrote:
> No, the same hardware was used for each OS.
> 
> In terms of the software, the stock software stack for each OS was used.

Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
journaling enabled) should be an obvious choice since it is more similar
in concept to ext4 and since that is what most FreeBSD users will use
with FreeBSD?

Did you tune the ZFS ARC (e.g. vfs.zfs.arc_max="6G") for the tests?

And BTW: Did your measured run times account for the effect that Linux
keeps much more dirty data in the buffer cache? (FreeBSD has a low limit
on dirty buffers since under realistic load the already cached data is
much more likely to be reused and thus more valuable than freshly
written data; aggressively caching dirty data would significantly reduce
throughput and responsiveness under high load.) Given the hardware specs
of the test system, I guess that Linux accepts at least 100 times the
dirty data in the buffer cache, compared to FreeBSD (where this number
is at most in the range of tens of megabytes).

If you did not, then your results do not represent a server load (which
I'd expect relevant, if you are testing against Oracle Linux 6.1
server), where continuous performance is required. Tests that run on an
idle system starting in a clean state and ignoring background flushing
of the buffer cache after the timed program has stopped are perhaps
useful for a very lowly loaded PC, but not for a system with high load
average as the default.

I bet that the results would look quite different if you compared the
systems under higher load (which admittedly makes it much harder to get
sensible numbers for the program under test) or with reduced buffer
cache size (or with the dirty buffer limit in FreeBSD raised
accordingly, which ought to be possible with sysctl and/or boot time
tuneables, e.g. "vfs.hidirtybuffers").

And a last remark: Single benchmark runs do not provide reliable data.
FreeBSD comes with "ministat" to check the significance of benchmark
results. Each test should be repeated at least 5 times for meaningful
averages with acceptable confidence level.
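
To illustrate why repetition matters, here is a minimal sketch that
computes a mean and a 95% confidence interval from 5 runs, using the
Student's t critical value for 4 degrees of freedom. The run times are
made up for illustration, and the sketch only handles a single data
set; it is not a replacement for ministat.

```python
import statistics

# Two-sided 95% Student's t critical value for 5 samples (4 degrees of freedom)
T_95_DF4 = 2.776

def mean_with_ci(samples):
    """Return (mean, half-width of the 95% confidence interval) for 5 samples."""
    if len(samples) != 5:
        raise ValueError("T_95_DF4 is only valid for exactly 5 samples")
    mean = statistics.mean(samples)
    stderr = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, T_95_DF4 * stderr

# Made-up run times in seconds, for illustration only
runs = [12.1, 11.8, 12.4, 12.0, 11.9]
mean, half_width = mean_with_ci(runs)
print(f"{mean:.2f} s +/- {half_width:.2f} s (95% confidence)")
```

Two systems whose intervals overlap widely cannot be ranked from such
data, which is exactly what a single run hides.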

Regards, STefan


Re: Sandy Bridge and MCA UNCOR PCC (problem + solution)

2011-11-30 Thread Stefan Esser
On 25.11.2011 21:13, Thomas Zander wrote:
> List,
> 
> here's a rant about a recent problem I had and the surprising
> solution.
> 
> I recently had to investigate weird unexpected issues on a workstation.
> Relevant hardware: Asus P8B-WS, Xeon E3-1260L (Sandy Bridge, Intel
> HD-2000 graphics)
> 
> Since we don't have kms and friends in STABLE yet, and I can live
> without accelerated video for now, I am using the vesa driver on this
> machine.
> 
> Initially, this had two major drawbacks:
> 1) 1280x1024 resolution utterly sucks on a 1680x1050 screen.
> 2) Reproducible unhandled MCA events (and subsequent kernel panics)
> like the following whenever I switch from X to console:
> 
> panic: machine check trap
> ...
> MCA: CPU 6 UNCOR UNCOR UNCOR PCC PCC PCC internal error 2internal error
> 2PCC internal error 2
> 
> The kernel dump _always_ showed something like:
> 
> current process = 11 (idle: cpu3)
> trap number = 28
> #1 0x805db167 at panic+0x187
> #2 0x808c6820 at trap_fatal+0x290
> #3 0x808c6d3a at trap+0x10a
> #4 0x808ae894 at calltrap+0x8
> #5 0x801f6b9a at acpi_cpu_idle+0x20a
> #6 0x806003af at sched_idletd+0x11f
> #7 0x805afe6f at fork_exit+0x11f
> #8 0x808aedde at fork_trampoline+0xe
> 
> mcelog did not help decoding the MCA output and the "internal error2"
> message made me suspect that this CPU was maybe just broken.
> However, due to my utter inability of producing the slightest other
> problem with this machine (constantly heavy CPU + IO load) or any
> problem using other operating systems I derived the wild speculation
> that there might be something with the Sandy Bridge silicon which this
> exact sequence of actions on FreeBSD reliably could trigger.
> 
> Long story short: I got the latest Bios from Asus for this Board. The
> changelog of course said absolutely nothing about fixing any known
> problem.
> Upon boot I entered the Bios settings and noticed that it apparently
> contained a microcode update. The changelog for microcode from Intel is
> of course non-existing.
> 
> And since this boot there has not been a single problem with this
> machine. Vesa now works in 1680x1050 and switching from X to console
> and back does not trigger MCA events anymore.
> 
> I like to believe that for the first time a microcode update from Intel
> fixed my specific problem.
> 
> Anyway, now the story is on the list and for Google to find, in case
> anyone else has this problem as well.

Thank you for reporting this. I had a somewhat similar problem with
an i7-2600K (not overclocked) on an ASUS P8H67-M EVO, which has also
been resolved by a BIOS upgrade (to version 2001 for that mainboard).

The system locked up hard (did not even respond to pressing the reset
button) after attempting to switch from X11 to a text console, or even
on shutdown from X11. This started a few months back (it had been
working, when the system was new), but due to lack of debug access and
no way to obtain a core dump, I had just given up on starting a local
X11 server.

Meanwhile I had updated the BIOS to the latest version while trying
to resolve another issue, and after reading your message I retried
starting the X11/vesa server and found that it no longer completely
locks up the system on exit. Text consoles are still unusable once the
X server has been started (no stable signal; the monitor loses sync
and switches off and on again after a few seconds, but only shows
hardly readable characters).

Anyway, I can now use the local X11 and still cleanly shutdown the
system without the need to remove electrical power.

The new version of the microcode patches appears to be "1a", but I
have no idea what version the previous BIOS contained.

Regards, STefan


Re: disable 64-bit dma for one PCI slot only?

2011-07-20 Thread Stefan Esser
On 20.07.2011 18:25, YongHyeon PYUN wrote:
>> The "Rev" column is required for devices that are not uniquely
>> identified by their Vnd/Dev-IDs. (These used to exist, e.g. the Symbios
>> SCSI controllers, though I'm not aware of any device that needed a
>> different driver depending on the PCI revision number.)
>>
> 
> re(4) and rl(4) are one of example that needs the "Rev".

Does the decision whether "re" or "rl" attaches the device depend
on the revision field? This used to be the case for "ncr" and "sym",
too, but one driver was extended to cover all devices supported by
the other ...

Anyway: I agree that the revision is significant information and
should be kept in pciconf output.

Regards, STefan


Re: disable 64-bit dma for one PCI slot only?

2011-07-20 Thread Stefan Esser
On 20.07.2011 18:11, Scott Long wrote:
> On Jul 20, 2011, at 3:54 AM, Stefan Esser wrote:
>>
>> This is a very good idea, IMHO.
>>
>> When I committed pciconf back in 1996 (it had been contributed by
>> gwollman) for PCI 1.0 (at a time when there was no standard for PCI to
>> PCI bridges, yet ;-) ), the current format seemed sensible, but the
>> tabular form suggested by Artem is much easier to parse.
>>
>> I'd want to suggest another slightly different format:
>>
>> Driver Handle         Class    Vnd    Dev    SubVnd SubDev Rev  Hdr
>> hostb0 0:0:0:0        0x06     0x8086 0x0100 0x8086 0x2010 0x09 0x00
>> pcib1  0:0:1:0        0x060400 0x8086 0x0101 0x8086 0x2010 0x09 0x01
>> pcib2  0:0:1:1        0x060400 0x8086 0x0105 0x8086 0x2010 0x09 0x01
>> none0  0:0:22:0       0x078000 0x8086 0x1c3a 0x8086 0x4742 0x04 0x00
>> em0    0:0:25:0       0x02     0x8086 0x1503 0x8086 0x     0x04 0x00
>> dummy0 65535:255:31:7 0x02     0x8086 0x1503 0x8086 0x     0x04 0x00
>>
>> I.e., print only one header line (no "---"), make the "Handle" column
>> wide enough to hold the longest possible value, use only white space to
>> separate columns and print 0x as a prefix for all hex numbers.
>>
>> Instead of "pci0:0:0:0" for the PCI handle, just "0:0:0:0" could be
>> printed, IMHO. (But this is bikeshed material, I guess ...)
>>
>> The "Rev" column is required for of devices that are not uniquely
>> identified by their Vnd/Dev-IDs. (These used to exist, e.g. the Symbios
>> SCSI controllers, though I'm not aware of any device that needed a
>> different driver depending on the PCI revision number.)
> 
> Actually, a few drivers (amr in particular) look at this rev field during 
> probe, though they should be looking at the subdev/ven ids instead.
> I think that this behavior has actually caused recent headaches for
> LSI with other drivers.  But as Kostik points out, the rev field is
> still moderately useful for informational purposes.

Dependency on the revision is bad if it is a required criterion for the
selection of a driver. This used to be the case for the Symbios 53c810
vs. 53c860 (where the latter could take advantage of the "sym" driver,
while the former lacked support for features originally required by the
sym driver and only worked with "ncr"). The subvendor/subdevice ID is
not well suited to selecting a driver in that case, since it was not
generally used (PCI 1.0, on-board controllers), and even where it was
used, the list of subvendor/subdevice tuples is hard to maintain if
there are many vendors using a certain PCI(e) chip.

>> I'd be happy to modify pciconf to print the new format in -CURRENT
>> (having been the maintainer of the PCI code for quite some time), if
>> consensus is reached on a format and if this change is accepted by RE.
> 
> I'm pretty sure that we scrape the current format at Yahoo and use it 
> in our tools.  Implementing a switch of some sort to fall back to the
> old format is something that will have to happen at some point,
> whether it's done now or not.  I'd probably implement it as an env
> variable such as "PCICONF_COMPAT", similar to what is used by expr(1).

Hmm, then how about a new option (e.g. "-t" for tabular output, or "-L"
for an alternate list format)?

For the current format, "-l" can be combined with "-b", "-c" and/or "-v"
to print BARs, CAPs and/or decoded vendor and device information.

The new tabular format suggested above does not mix well with these
extended list options, and thus I think we should introduce a new list
option that is incompatible with -b, -c and -v. The old option would
produce unchanged output and thus there is no need for PCICONF_COMPAT.

Regards, STefan



Re: disable 64-bit dma for one PCI slot only?

2011-07-20 Thread Stefan Esser
Am 19.07.2011 20:17, schrieb Artem Belevich:
> On Tue, Jul 19, 2011 at 6:31 AM, John Baldwin  wrote:
>> The only reason it might be nice to stick with two fields is due to the line
>> length (though the first line is over 80 cols even in the current format).
>> Here are two possible suggestions:
>>
>> old:
>>
>> hostb0@pci0:0:0:0:  class=0x06 card=0x20108086 chip=0x01008086 rev=0x09 hdr=0x00
>> pcib1@pci0:0:1:0:   class=0x060400 card=0x20108086 chip=0x01018086 rev=0x09 hdr=0x01
>> pcib2@pci0:0:1:1:   class=0x060400 card=0x20108086 chip=0x01058086 rev=0x09 hdr=0x01
>> none0@pci0:0:22:0:  class=0x078000 card=0x47428086 chip=0x1c3a8086 rev=0x04 hdr=0x00
>> em0@pci0:0:25:0:    class=0x02 card=0x8086 chip=0x15038086 rev=0x04 hdr=0x00
>> ...
>>
>> A)
>>
>> hostb0@pci0:0:0:0:  class=0x06 vendor=0x8086 device=0x0100 subvendor=0x8086 subdevice=0x2010 rev=0x09 hdr=0x00
>> pcib1@pci0:0:1:0:   class=0x060400 vendor=0x8086 device=0x0101 subvendor=0x8086 subdevice=0x2010 rev=0x09 hdr=0x01
>> pcib2@pci0:0:1:1:   class=0x060400 vendor=0x8086 device=0x0105 subvendor=0x8086 subdevice=0x2010 rev=0x09 hdr=0x01
>> none0@pci0:0:22:0:  class=0x078000 vendor=0x8086 device=0x1c3a subvendor=0x8086 subdevice=0x4742 rev=0x04 hdr=0x00
>> em0@pci0:0:25:0:    class=0x02 vendor=0x8086 device=0x1503 subvendor=0x8086 subdevice=0x rev=0x04 hdr=0x00
>> ...
>>
>> B)
>>
>> hostb0@pci0:0:0:0:  class=0x06 devid=0x8086:0100 subid=0x8086:2010 rev=0x09 hdr=0x00
>> pcib1@pci0:0:1:0:   class=0x060400 devid=0x8086:0101 subid=0x8086:2010 rev=0x09 hdr=0x01
>> pcib2@pci0:0:1:1:   class=0x060400 devid=0x8086:0105 subid=0x8086:2010 rev=0x09 hdr=0x01
>> none0@pci0:0:22:0:  class=0x078000 devid=0x8086:1c3a subid=0x8086:4742 rev=0x04 hdr=0x00
>> em0@pci0:0:25:0:    class=0x02 devid=0x8086:1503 subid=0x8086: rev=0x04 hdr=0x00
>> ...
>>
>> I went with vendor word first for both A) and B) as in my experience that is
>> the more common ordering in driver tables, etc.
> 
> Do we need to print (class|devid|device|subvendor|etc.)= on every
> line? IMHO they belong to a header line. Something like this:
> 
> Driver Handle       Class    Vnd:Dev     Sub Vnd:Dev Rev  Hdr
> --
> hostb0 pci0:0:0:0   0x06     0x8086:0100 0x8086:2010 0x09 0x00
> pcib1  pci0:0:1:0   0x060400 0x8086:0101 0x8086:2010 0x09 0x01
> pcib2  pci0:0:1:1   0x060400 0x8086:0105 0x8086:2010 0x09 0x01
> none0  pci0:0:22:0  0x078000 0x8086:1c3a 0x8086:4742 0x04 0x00
> em0    pci0:0:25:0  0x02     0x8086:1503 0x8086:     0x04 0x00

This is a very good idea, IMHO.

When I committed pciconf back in 1996 (it had been contributed by
gwollman) for PCI 1.0 (at a time when there was no standard for PCI to
PCI bridges, yet ;-) ), the current format seemed sensible, but the
tabular form suggested by Artem is much easier to parse.

I'd want to suggest another slightly different format:

Driver Handle         Class    Vnd    Dev    SubVnd SubDev Rev  Hdr
hostb0 0:0:0:0        0x06     0x8086 0x0100 0x8086 0x2010 0x09 0x00
pcib1  0:0:1:0        0x060400 0x8086 0x0101 0x8086 0x2010 0x09 0x01
pcib2  0:0:1:1        0x060400 0x8086 0x0105 0x8086 0x2010 0x09 0x01
none0  0:0:22:0       0x078000 0x8086 0x1c3a 0x8086 0x4742 0x04 0x00
em0    0:0:25:0       0x02     0x8086 0x1503 0x8086 0x     0x04 0x00
dummy0 65535:255:31:7 0x02     0x8086 0x1503 0x8086 0x     0x04 0x00

I.e., print only one header line (no "---"), make the "Handle" column
wide enough to hold the longest possible value, use only white space to
separate columns and print 0x as a prefix for all hex numbers.

Instead of "pci0:0:0:0" for the PCI handle, just "0:0:0:0" could be
printed, IMHO. (But this is bikeshed material, I guess ...)
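
To make the proposed layout concrete, here is a minimal Python sketch that
converts one line of the existing "pciconf -l" output into a row of the
tabular format. It is not part of pciconf: the regular expression, the
function names and the column widths are assumptions made for illustration.
It relies on the convention visible in the examples quoted in this thread,
where chip= and card= hold the (sub)device ID in the upper 16 bits and the
(sub)vendor ID in the lower 16 bits.

```python
import re

# Pattern for one old-format line, derived from the examples in this thread
# (an assumption, not taken from the pciconf sources).
OLD_LINE = re.compile(
    r"^(?P<driver>\w+)@pci(?P<handle>\d+(?::\d+){3}):\s+"
    r"class=(?P<cls>0x[0-9a-fA-F]+)\s+"
    r"card=(?P<card>0x[0-9a-fA-F]+)\s+"
    r"chip=(?P<chip>0x[0-9a-fA-F]+)\s+"
    r"rev=(?P<rev>0x[0-9a-fA-F]+)\s+"
    r"hdr=(?P<hdr>0x[0-9a-fA-F]+)\s*$"
)

# Whitespace-separated columns; "Handle" is wide enough for 65535:255:31:7.
ROW_FMT = "%-6s %-14s %-8s %-6s %-6s %-6s %-6s %-4s %s"
HEADER = ROW_FMT % ("Driver", "Handle", "Class", "Vnd", "Dev",
                    "SubVnd", "SubDev", "Rev", "Hdr")


def split_id(combined):
    """Split a 32-bit chip=/card= value into (vendor, device) halves."""
    value = int(combined, 16)
    return value & 0xFFFF, value >> 16


def to_row(line):
    """Convert one old-format line into one row of the tabular format."""
    m = OLD_LINE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized pciconf line: %r" % line)
    vnd, dev = split_id(m.group("chip"))
    subvnd, subdev = split_id(m.group("card"))
    return ROW_FMT % (
        m.group("driver"), m.group("handle"), m.group("cls"),
        "0x%04x" % vnd, "0x%04x" % dev,
        "0x%04x" % subvnd, "0x%04x" % subdev,
        m.group("rev"), m.group("hdr"),
    )


if __name__ == "__main__":
    print(HEADER)
    print(to_row("pcib1@pci0:0:1:0:   class=0x060400 card=0x20108086 "
                 "chip=0x01018086 rev=0x09 hdr=0x01"))
```

Because every column is separated by white space only and every hex number
carries its 0x prefix, a consumer can split each row on white space without
any further knowledge of the format.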

The "Rev" column is required for devices that are not uniquely
identified by their Vnd/Dev-IDs. (These used to exist, e.g. the Symbios
SCSI controllers, though I'm not aware of any device that needed a
different driver depending on the PCI revision number.)

I'd be happy to modify pciconf to print the new format in -CURRENT
(having been the maintainer of the PCI code for quite some time), if
consensus is reached on a format and if this change is accepted by RE.

Regards, STefan


Re: PCIe SATA HBA for ZFS on -STABLE

2011-06-01 Thread Stefan Esser
Am 01.06.2011 10:07, schrieb Jeremy Chadwick:
> Sadly I don't have a recommendation for you, since you effectively want
> a 6-port SATA300 controller that's reliable, you're almost certainly
> going to be paying Big Bucks(tm) given the number of ports and your
> requirement that it be PCIe-based.  You state quite boldly "not wanting
> to break the bank", but what you're asking for almost certainly WILL
> break the bank.
> 
> For example, an "affordable" controller might be one driven by Silicon
> Image's SiI3124 chip -- four (4) SATA300 ports, but it's only hooked to
> PCI or PCI-X, not PCIe, which means you're susceptible to a much more
> severe bus bottleneck than with PCIe:
> 
> http://www.siliconimage.com/products/family.aspx?id=3

FYI: There is at least one PCIe card with Sil3124 (with PCIe to PCI-X
bridge on the controller card):

http://www.leaf-computer.com/#24e

Price is 69 Euro plus 5 Euro shipping to Europe or North America (i.e.
some US$110 total).

Regards, STefan


Re: gpart -b 34 versus gpart -b 1024

2010-07-26 Thread Stefan Esser
Am 26.07.2010 03:07, schrieb per...@pluto.rain.com:
> Dmitry Morozovsky  wrote:
>> ... sector numbers (in CHS address method)
>> [start] at 1 (which always suprized me ;)
> 
> This goes back at least as far as soft-sectored 8" diskettes
> in the CP/M era.
> 
> IIRC, physical sector 0 of each track contained the C number,
> possibly the H, and a list of the remaining sectors on the track
> including the size of each sector -- even within a single track
> the sectors did not all have to be the same size.

This is extremely off-topic, and therefore I'll only say
that the above is true neither for 8" diskettes nor for CP/M.
I can only guess that there is a track 0 but no sector
with that number because the first track was reserved for
system-internal use (e.g. it held at least the CCP in the case
of CP/M). I'm quite sure that FDCs generally supported sector
numbers from 0 to 254 (with 255 reserved as a wildcard in
certain commands). But this is all really off-topic ...

Regards, STefan


Re: net/mpd5, ppp, proxy-arp issues

2010-05-03 Thread Stefan Esser
Am 26.04.2010 18:02, schrieb Julian Elischer:
> On 4/26/10 1:11 AM, Stefan Esser wrote:
>> I debugged this problem and prepared a patch for discussion, which
>> later was committed by Max Laier (if memory serves me right). The
>> message was added in order to identify further situations, where
>> network domains are added after network interfaces have been
>> initialized. This message ought to be informational right now, since
>> the interface init is repeated whenever a network domain is added
>> as part of above mentioned patch. Init order should be fixed, if
>> this message is printed for compiled in drivers, but in case of a
>> kernel module (like netgraph) that adds a domain, it is unadvoidable
>> that the init order is reversed.
>>
>> Perhaps the message should be made conditional on the start-up of
>> the kernel not having finished, or it should be completely removed,
>> since time has shown, that the init order is correct in general.
>>
>> I'll remove that message (or make it conditional on "bootverbose")
>> unless there is opposition to this change ...
> please do..
> 
> it's an unavoidable thing that domains added after boot
> are done after boot completes   :-)

Hmmm, I had a look at the code over the weekend and I'm not sure
whether changes during the last 5 years have broken assumptions
and introduced bugs in if_attachdomain1() in /sys/net/if.c ...

The tests at the head of the function seem problematic, but will
need further analysis. I'll have to check the conditions under which
the TRY_LOCK may fail, and the second if clause seems to prevent the
execution of the core of the function for KLDs (which would be BAD).

Since I'm travelling abroad and without access to the sources or a
test system for most of the week, I cannot perform these tests. But
I'd be very surprised if the code still worked as I intended it
more than 5 years ago ...

I'll hold back any commits until I have been able to perform tests
(or somebody else looks into this and gets to a conclusion ...).

Best regards, STefan


Re: net/mpd5, ppp, proxy-arp issues

2010-04-26 Thread Stefan Esser
Am 22.04.2010 20:43, schrieb Marin Atanasov:
> Hello,
> 
> Thanks a lot for the patch, Qing!
> 
> It works fine. However I've noticed one thing, after I start mpd5 and
> connect to my home network:
> 
> kernel: WARNING: attempt to domain_add(netgraph) after domainfinalize()
> 
> Not very sure if this is something to worry about or not?

There was a problem with the initialization order of network "domains",
which caused kernel crashes with ISDN+INET6 some two years ago. The
reason was an implicit assumption that all domains had been
initialized by the time the network interfaces are initialized, which
led to NULL dereferences if a domain was added (and was relevant to a
device) after the device had been initialized.

I debugged this problem and prepared a patch for discussion, which
later was committed by Max Laier (if memory serves me right). The
message was added in order to identify further situations where
network domains are added after network interfaces have been
initialized. This message ought to be purely informational by now,
since the above-mentioned patch repeats the interface init whenever
a network domain is added. The init order should be fixed if this
message is printed for compiled-in drivers, but in the case of a
kernel module (like netgraph) that adds a domain, it is unavoidable
that the init order is reversed.
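
The effect of repeating the interface init for a late domain can be
pictured with a toy model. The following Python sketch is purely
illustrative: the class and method names are hypothetical and do not mirror
the kernel's data structures. Each interface keeps per-domain data, and
adding a domain after the interfaces exist attaches it to every interface
again, so that a later lookup does not hit a missing entry.

```python
class Interface:
    def __init__(self, name):
        self.name = name
        self.domain_data = {}  # per-domain state, keyed by domain name


class Network:
    def __init__(self):
        self.domains = []
        self.interfaces = []
        self.finalized = False
        self.messages = []

    def domain_add(self, domain):
        self.domains.append(domain)
        if self.finalized:
            # Informational only: the attach below repairs the ordering.
            self.messages.append(
                "WARNING: attempt to domain_add(%s) after domainfinalize()"
                % domain)
            for ifp in self.interfaces:
                self._attach_domains(ifp)

    def domainfinalize(self):
        self.finalized = True

    def if_attach(self, name):
        ifp = Interface(name)
        self.interfaces.append(ifp)
        self._attach_domains(ifp)
        return ifp

    def _attach_domains(self, ifp):
        # Idempotent: only domains not yet known to the interface are added.
        for domain in self.domains:
            ifp.domain_data.setdefault(domain, {"attached": True})


if __name__ == "__main__":
    net = Network()
    net.domain_add("inet")
    net.domainfinalize()
    em0 = net.if_attach("em0")
    net.domain_add("netgraph")       # e.g. a kernel module loaded later
    print(sorted(em0.domain_data))   # both domains are present
    print(net.messages)
```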

Perhaps the message should be made conditional on the start-up of
the kernel not having finished, or it should be completely removed,
since time has shown that the init order is correct in general.

I'll remove that message (or make it conditional on "bootverbose")
unless there is opposition to this change ...

Regards, STefan


Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance

2010-01-29 Thread Stefan Esser
Am 26.01.2010 00:15, schrieb Daniel O'Connor:
> On Tue, 26 Jan 2010, Dan Naumov wrote:
>> CPU-performance-wise, I am not really worried. The current system is
>> an Atom 330 and even that is a bit overkill for what I do with it and
>> from what I am seeing, the new Atom D510 used on those boards is a
>> tiny bit faster. What I want and care about for this system are
>> reliability, stability, low power use, quietness and fast disk
>> read/write speeds. I've been hearing some praise of ICH9R and 6
>> native SATA ports should be enough for my needs. AFAIK, the Intel
>> 82574L network cards included on those are also very well supported?
> 
> You might want to consider an Athlon (maybe underclock it) - the AMD IXP 
> 700/800 south bridge seems to work well with FreeBSD (in my 
> experience).
> 
> These boards (eg Gigabyte GA-MA785GM-US2H) have 6 SATA ports (one may be 
> eSATA though) and PATA, they seem ideal really.. You can use PATA with 
> CF to boot and connect 5 disks plus a DVD drive.
> 
> The CPU is not fanless however, but the other stuff is, on the plus side 
> you won't have to worry about CPU power :)
> 
> Also, the onboard video works well with radeonhd and is quite fast.
> 
> One other downside is the onboard network isn't great (Realtek) but I 
> put an em card in mine.

If neither an Atom 330 nor a cheap Athlon 64 offers sufficient
performance, the new i3-530 and H55 chip-set may be a much faster
alternative (with lower power consumption than the Athlon 64).

This combination is more expensive (official CPU list price $113)
but a CPU plus uATX mainboard should still be under $200 (e.g. the
MSI H55M-E33, or if you want an ITX form factor mainboard, the
Zotac H55-ITX).

Idle power of a small system (incl. 2.5" disk) has been reported
as low as 20W (primary power, efficient 120W power supply), with
throughput near that of low end quad CPUs.

Regards, STefan


Re: 8.0-RC USB/FS problem

2009-12-02 Thread Stefan Esser
On 22.11.2009 10:47 Hans Petter Selasky wrote:
> Other operating systems do a port bus reset when the device has a problem.
> On FreeBSD we just try a software reset via the control endpoint. I guess
> that it is a device problem you are seeing. The USB stack in FreeBSD is
> faster than the old one, and maybe the faster queueing of mass storage
> requests trigger some hidden bugs in your device.
> 
> When the problem happens try:
> 
> sysctl hw.usb.umass.debug=-1

I have observed USB lock-ups with several external drive enclosures
that used to work with the old USB stack (and continue to work when
connected to a Windows notebook, for example). (BTW: System is AMD X2
with Nvidia chip-set and i386 kernel.)

In my case, hw.usb.debug=6 makes the drive work at some 4MB/s for
any amount of data transferred, while hw.usb.debug=5 (and any lower
value) lets the drive pause for about 1 minute per 100MB transferred.
I wanted to test whether short delays inserted in the places with
DPRINTFN(6, ...) make a difference, but will not get to it before
the weekend.

Regards, STefan


Re: Phoronix Benchmarks: Waht's wrong with FreeBSD 8.0?

2009-11-30 Thread Stefan Esser
Am 30.11.2009 15:46, schrieb Ivan Voras:
> Robert Huff wrote:
>> Bill Moran writes:
>>
>>>  It's common knowledge that the default value for vfs.read_max is
>>>  non- optimal for most hardware and that significant performance
>>>  improvements can be made in most cases by raising it.
>>
>> Documentation/discussion where?
> 
> There is no documentation except for the sysctl documentation itself:
> "vfs.read_max: Cluster read-ahead max block count" but it depends on the
> load - it helps sequential reads, will probably do nothing for other
> kinds of loads. It is also UFS-only.

I tested different values some time ago. vfs.read_max can be raised to
about twice its default value, and I set it to 15 when I had UFS+SU
file systems (I switched over to ZFS long ago). Tests included
operations on large files (multi-GB) that were processed and written
back to the same drive. But even in these tests, there was an upper
limit beyond which system responsiveness declined massively (IIRC, at
about 25). The best value (without impact on random I/O) seems to be in
the range 12 to 16. (FreeBSD used to apply a heuristic on read-ahead,
and only incremented the read amount to the limit set by the sysctl as
long as the accesses were purely sequential.)
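
The read-ahead heuristic mentioned in the parenthesis can be sketched as
follows. This is a simplified Python model, not the FreeBSD implementation:
the class name, the growth by one block per sequential access and the reset
to zero on a non-sequential access are assumptions chosen for illustration.
Only the cap (the value of vfs.read_max) and the idea of growing the
read-ahead while accesses stay sequential come from the text above.

```python
class ReadAhead:
    """Toy model of a sequential read-ahead window capped by read_max."""

    def __init__(self, read_max):
        self.read_max = read_max
        self.window = 0            # blocks to read ahead on the next access
        self.next_expected = None  # block number that continues the sequence

    def access(self, block):
        """Record an access and return the resulting read-ahead window."""
        if block == self.next_expected:
            # Sequential access: grow the window, but never beyond the cap.
            self.window = min(self.window + 1, self.read_max)
        else:
            # Random access: read-ahead would only waste bandwidth.
            self.window = 0
        self.next_expected = block + 1
        return self.window


if __name__ == "__main__":
    ra = ReadAhead(read_max=4)
    print([ra.access(b) for b in range(10, 18)])  # grows, then stays at cap
    print(ra.access(100))                         # a seek resets the window
```

In such a model a larger cap only helps long sequential runs, which matches
the observation that random I/O is not affected within a moderate range.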

Regards, STefan


Re: 7.2 dies in midnight run again

2009-11-20 Thread Stefan Esser
Am 20.11.2009 01:45, schrieb Randy Bush:
> i think the issue is how to tune for zfs
> 
> i386 with 4G of RAM
> 
> RELENG_7 cvsupped Nov 18 02:42 GMT
> 
> panic: kmem_malloc(65536): kmem_map too small: 535019520 total allocated
> cpuid = 0
> Uptime: 13h15m1s
> Physical memory: 3958 MB
> Dumping 637 MB: 622 606 590 574 558 542 526 510 494 478 462 446 430 414 
> 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 
> 94 78 62 46 30 14
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> 
> and it did not auto reboot
> 
> # cat /boot/loader.conf.local
> ipfw_load=YES
> umass_load=YES
> zfs_load=YES
> vm.kmem_size=536870912
> vm.kmem_size_max=1073741824
> vfs.zfs.prefetch_disable=1

I'm using 8.0-i386 (but AFAIK with same ZFS and allocation behaviour)
on a system with 2GB RAM and a RAIDZ1 consisting of 3*1TB (plus
separate 10GB L2ARC cache on a separate disk, to become a SSD).

Besides increasing KVA_PAGES to cover some 2GB (options KVA_PAGES=512),
I use:

vm.kmem_size=1500M
vm.kmem_size_max=2G

This is the result of quite some tuning, since I also suffered from
kmem_map too small panics under high load.
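
For comparison, the sizes involved can be put side by side. The short
Python sketch below is only arithmetic; the helper names are made up, and
the 4 MB per KVA_PAGES unit is an assumption that holds for an i386 kernel
without PAE.

```python
MIB = 1024 * 1024
SUFFIXES = {"K": 1024, "M": MIB, "G": 1024 * MIB}


def parse_size(text):
    """Convert a size such as '1500M', '2G' or '536870912' to bytes."""
    text = text.strip().upper()
    if text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def kva_bytes(kva_pages, unit=4 * MIB):
    """Kernel virtual address space covered by a KVA_PAGES setting."""
    return kva_pages * unit


if __name__ == "__main__":
    settings = (
        ("vm.kmem_size (panicking system)", parse_size("536870912")),
        ("allocated at panic time", 535019520),
        ("vm.kmem_size (this message)", parse_size("1500M")),
        ("vm.kmem_size_max (this message)", parse_size("2G")),
        ("KVA_PAGES=512", kva_bytes(512)),
    )
    for label, value in settings:
        print("%-34s %11d bytes = %7.1f MB" % (label, value, value / MIB))
```

The panicking system had nearly exhausted its 512 MB kmem_map, while the
settings above allow about three times as much, within a kernel address
space of 2 GB.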

BTW: I use auto-tuning of the ARC cache size:

vfs.zfs.arc_min: 12288
vfs.zfs.arc_max: 98304

Due to the extra ZFS cache drive, I could reduce arc_max to some 300MB
(or even lower, though I have not tested lower values yet) without
noticeable impact (faster than with just a 700MB RAM cache in many
tests). And since L2ARC caches can be removed from pools at any time
and do not need redundancy, it is quite simple to test the effect ...

Regards, STefan


Re: Various Issues with 7.0-BETA4

2007-12-13 Thread Stefan Esser
Ivan Voras schrieb:
> Peter Thoenen wrote:
> 
>> Issue #1:
>>
>> For some reason zfs_enable="YES" in rc.conf doesn't work.  It doesn't
>> seem to auto mount my zfs mounts which is a PITA.  Currently I am forced
>> (each time I reboot) to boot into single user mode, mount all my drives,
>> then exit, continuing into multi-user mode.  The interesting thing is step 3.
>>
>> 1) fsck -p
>> 2) mount -u /
>> 3) zfs
>> 4) zfs mount -a
>> 5) exit
>>
>> NOTE: If I skip #3 and immediately do #4 it fails.  For some reason I
>> have to do a straight zfs call.
>>
>> NOTE: If I immediately go to multiuser mode skipping manually mounting
>> not only does zfs not mount but I have to re-force import the tank pool
>> (e.g. step 3.5: zpool import -f tank)
> 
> Did you create the zfs structures and file system while in single user
> mode with root mounted read-only? If so, this is a "known feature" and
> it won't be fixed: you need to a) mount root read-write and b) run
> /etc/rc.d/hostid start before /etc/rc.d/zfs start. To fix it, mount root
> read-write, remove zpool.cache file (if any) from /boot/zfs, run
> commands from "b" and then run zfs import -f until you have your zfs
> file systems online. Then reboot into multiuser mode - it should work
> now. Never modify zfs without steps "a" and "b", some combinations of
> such modifications lead to kernel panics or possible data loss.

I just rebuilt my world and then kernel based on sources as of a few
hours ago. The system had been running well with kernel and world as of
Nov 22, but fails to mount the ZFS root with the new kernel:

[...]
ZFS filesystem version 6
ZFS storage pool version 6
acd0: CDRW  at ata0-master UDMA33
ad4: 76293MB  at ata2-master SATA150
WARNING: Expected rawoffset 0, found 63
lapic1: Forcing LINT1 to edge trigger
SMP: AP CPU #1 Launched!
Trying to mount root from zfs:root

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot> zfs:root
Trying to mount root from zfs:root
WARNING: TMPFS is considered to be a highly experimental feature in FreeBSD.
bge0: link state changed to UP
[...]

As you can see, the ZFS root mount succeeds after manual entry of just
the value also specified in /boot/loader.conf. The file "zpool.cache"
has also been stable since initial installation of the system:

$ ls -l /boot/zfs/zpool.cache
-rw-r--r--  1 root  wheel  908 Oct  2 17:33 /boot/zfs/zpool.cache


Is this a regression that occurred between 2007-11-22 and 2007-12-13?


I know that automatic reboots (including mounting this ZFS root FS)
worked fine with the previous kernel!

I could try to export/import the ZFS pool from my boot file system,
but since this is a production system, I cannot do this right now ...

Regards, STefan


Re: removing external usb hdd without unmounting causes reboot?

2007-07-21 Thread Stefan Esser
Norberto Meijome schrieb:
> On Thu, 19 Jul 2007 17:38:14 +0200
> "[LoN]Kamikaze" <[EMAIL PROTECTED]> wrote:
> 
>> As I mentioned earlier I remember it working during the 5.3 era on Stable, at
>> some point it worked. I even remember removing my CD-Rom drive from my 
>> Thinkpad
>> without running atacontrol detach. The system just took it and the drive just
>> continued working after I put it back in.
> 
> on 6.2-STABLE (of a few days ago), i have this happening a couple of times 
> with no adverse effect at all. 
> Burn DVD/Cd, when finished, hald detects the disk, mounts it, /dev/cd0 in 
> /media/whatever.
> 
> i can eject the disk just fine (which in itself is weird, i think) the 
> device is still there...
> umount /dev/cd0 
> 
> works fine and off it goes. other than that, no, i havent tried to access the 
> device in question

In that case the device has been mounted R/O before, and if
you don't remove it in the middle of a transaction, there
is nothing the kernel might want to do with the physical
device to unmount it (and even within a transfer, this ought
to be caught by the driver). For that reason I had suggested
to have a soft-R/O mode for removable devices, which together
with a very short flush delay might allow such a device to
be mounted R/O "nearly all the time" (tm) ;-) This is not
a perfect solution, but it is similar to the way USB sticks
are used with Windows/XP: Wait a second or two and remove it.
While not perfect this covers the case of MP3 players or
digicams that are mounted as USB storage devices, and many
other cases. To make this a perfect solution is much harder,
but even a simple implementation would be a big step forward.

Regards, STefan


Re: removing external usb hdd without unmounting causes reboot?

2007-07-18 Thread Stefan Esser
Oliver Fromme wrote:
> Momchil Ivanov wrote:
>  > On Wednesday 18 July 2007 15:52:42 [LoN]Kamikaze wrote:
>  > > Josh Paetzel wrote:
>  > > > Yes, it's expected behavior.  The workaround is to not unplug mounted
>  > > > devices. (There's nothing special about USB here, if you unplugged an
>  > > > IDE drive you'd get the same behavior)
>  > > 
>  > > Wouldn't it make some sense not to panic if mounted devices that are in
>  > > sync get removed? A few applications might get in trouble, but that's
>  > > hardly a reason to bring a whole system down.
>  > 
>  > I don`t know how things work, but shutting down the system when some
>  > mounted fs is no longer present seems like the wrong thing to me.
> 
> As Josh wrote, it's expected.  The problem is known
> to exist for a long time already (probably as long
> as FreeBSD itself exists), and if there was an easy
> solution, certainly someone would have fixed it.

I have to check this, but AFAIK this problem exists only for
devices/partitions that are mounted R/W. Do you happen to
know this? (I cannot risk crashing my box right now for a
test ;-)

There once was an autofs implementation, but IIRC it has
later been removed. It could not only automatically mount
removable media, but it could also help with the problem
of devices that are rarely written to, but still mounted
R/W just in case for easy write-access.


A long time ago I had the idea that a clean file system could
be mounted R/O after a short delay. When all dirty buffers
are flushed, the device could be forcefully disconnected
without causing inconsistencies in the kernel. If there are
no open file descriptors, the super-block could be written
with the "clean" flag set, to signal that no fsck is needed
when the partition is mounted next time.

Internally, the device can be treated as R/O, with the only
exception that an attempted write is not rejected, but
instead triggers the change back to R/W operation (this
means setting the in-RAM copy of the super-block to dirty
before the write is allowed to proceed as normal).

Removable devices and dealing with a device that is gone and
re-appears (either the same device or one that takes its place)
needs special consideration, e.g. by checking a disk label and
flushing cached blocks that were associated with the device
that now is definitely gone.

I had this idea back when floppy disks were common, but with
USB memory sticks and devices the same situation exists ...

The mode change to R/O could be triggered by a timer after
the necessary condition exists (e.g. half a second after the
last write to the device with no dirty buffers left).

The system already knows whether there are dirty buffers for
a partition, it is not hard to detect this case. The other
parameter of interest is whether there are any open files on
that partition (which decides whether the super-block can be
marked as clean).

This functionality could be implemented within an autofs as
a special case (mount only R/O and upgrade only when needed
and for as long as necessary), but I think it should be not
too hard to add as a small in-kernel modification ...
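
The proposed behaviour amounts to a small state machine. The following
Python sketch is a model of the idea only; the class, its method names and
the half-second default delay are illustrative assumptions and not existing
kernel interfaces.

```python
class SoftRoMount:
    """Model of a mount that drops to soft-R/O when idle and clean."""

    def __init__(self, idle_delay=0.5):
        self.idle_delay = idle_delay    # seconds without writes
        self.state = "RW"               # "RW" or "SOFT_RO"
        self.dirty_buffers = 0
        self.open_files = 0
        self.last_write = 0.0
        self.superblock_clean = False   # "clean" flag as written to disk

    def write(self, now):
        """A write is never rejected; it upgrades the mount if needed."""
        if self.state == "SOFT_RO":
            self.state = "RW"
        # The super-block is marked dirty before the write proceeds.
        self.superblock_clean = False
        self.dirty_buffers += 1
        self.last_write = now

    def flush(self):
        """All dirty buffers have reached the device."""
        self.dirty_buffers = 0

    def tick(self, now):
        """Timer: downgrade once nothing is dirty and the delay has passed."""
        idle = now - self.last_write >= self.idle_delay
        if self.state == "RW" and self.dirty_buffers == 0 and idle:
            self.state = "SOFT_RO"
            # Without open files, no fsck is needed on the next mount.
            self.superblock_clean = (self.open_files == 0)
        return self.state

    def safe_to_unplug(self):
        return self.state == "SOFT_RO"


if __name__ == "__main__":
    m = SoftRoMount()
    m.write(now=0.0)
    m.flush()
    print(m.tick(now=0.2), m.safe_to_unplug())   # still R/W, too early
    print(m.tick(now=0.6), m.safe_to_unplug())   # soft-R/O, may be removed
    m.write(now=2.0)
    print(m.state, m.superblock_clean)           # back to R/W and dirty
```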

Regards, STefan


Re: Patch: sym(4) "VTOBUS FAILED" panics on amd64, amd64/89550

2006-09-22 Thread Stefan Esser
Scott Long schrieb:
> Jan Mikkelsen wrote:
> 
>> Hi,
>>
>> Doug White wrote:
>>
>>> On Fri, 22 Sep 2006, Jan Mikkelsen wrote:
>>>
>>>
>>>> Quick summary:  sym(4) assumes on amd64 that virtual addresses provided by
>>>> bus_dmamem_alloc() have the same alignment as the physical addresses (in
>>>> this case, 2*PAGE_SIZE).  They don't, and stuff breaks.  This patch works
>>>> around that.
>>>
>>> Why is this? busdma supports alignment constraints; why not just set
>>> the alignment to what you need it set at? I realize sym has its own
>>> hand rolled DMA management craziness but alignment is something
>>> busdma can take care of easily.
>>
>>
>> sym has the alignment requirement on the virtual address because of the
>> buddy memory allocation algorithm; changing how sym allocates memory
>> internally would remove the requirement.  The buddy algorithm with 2^13
>> bytes aligned on a 2^12 byte (but not a 2^13 byte) boundary can
>> provide two
>> chunks of 2^12 bytes but nothing greater than 2^12 bytes.
>>
>> The VTOBUS failure is caused by the buddy implementation making alignment
>> assumptions which aren't true, and then getting the virtual addresses
>> wrong.
>>
>> Perhaps I'm just doing something wrong with bus_dma.  I believe I set the
>> alignment requirements to be 2*PAGE_SIZE, and this is what I see for the
>> physical address.  However the virtual address seems to only be page
>> aligned.
>>
>> I can't see any mention of virtual address alignment in the bus_dma man
>> page.  Can it take care of virtual address alignment?  If so, how?
>>  
> 
> busdma makes no guarantees on virtual addresses.
> 
> Sigh, sorry I never got this fixed.  The custom memory allocator made me
> unhappy, and I never had time to dig into it.  Do real docs on sym exist
> somewhere?  I'm not against sitting down and re-writing the physical
> memory handling to both work and conform to the FreeBSD APIs.

I've been the co-author of the ncr SCSI driver, on which sym is based
(though not that particular code fragment). Since I know the structure
and principles of the driver (and since I have and know the docs up to
the 53c875, possibly also the 53c895), I'd probably be in a position
to work on this with the least effort to get started. The only problem
is that I do not have an amd64 system for testing ...

I changed the private allocator in the sym driver to use contigmalloc,
some time ago, but now I understand that there are stricter alignment
requirements. For a start, a work-around could be committed, IMHO (even
if it is ugly). The better approach is of course an extension of busdma
to support aligned physical chunks as required by the driver.
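
To make the alignment problem concrete, here is a minimal sketch. It is not
the sym code: the helper names are made up for illustration, and it assumes
the usual XOR scheme for locating a buddy. It shows that an 8 KB region that
starts on a 4 KB (but not 8 KB) boundary contains no aligned 8 KB chunk, and
that the XOR-computed buddies of its two 4 KB halves lie outside the region:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: buddy of a block of `size` bytes (a power of two).
 * The XOR trick is only valid when the arena base is aligned to the
 * largest block size the allocator hands out. */
static uintptr_t buddy_of(uintptr_t addr, size_t size)
{
    return addr ^ (uintptr_t)size;
}

/* Hypothetical helper: does [addr, addr + size) lie inside the arena? */
static int in_arena(uintptr_t base, size_t len, uintptr_t addr, size_t size)
{
    return addr >= base && addr + size <= base + len;
}

/* Largest power-of-two chunk (at most `max`, itself a power of two) that
 * fits in the arena at an address aligned to its own size; 0 if none. */
static size_t largest_aligned_chunk(uintptr_t base, size_t len, size_t max)
{
    for (size_t size = max; size > 0; size >>= 1) {
        /* Round base up to the next multiple of size. */
        uintptr_t first = (base + size - 1) & ~((uintptr_t)size - 1);
        if (size <= len && first + size <= base + len)
            return size;
    }
    return 0;
}
```

With base 0x1000 and length 0x2000, largest_aligned_chunk() with max 0x2000
returns 0x1000, and buddy_of(0x1000, 0x1000) is 0x0, which in_arena()
rejects. With base 0x2000 the same region yields a full 0x2000 chunk.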

But I could also try to find a clean fix for the affected driver code.

Is the Symbios SCSI controller still used widely enough that the effort
required for a "clean" fix is well spent?

Regards, STefan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: top doesn't show any Process in idle-Mode

2006-01-30 Thread Stefan Esser
On 2006-01-30 13:49 +0100, Michael Schuh <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> i use top mostly in idle-mode.
> # top  
> or
> # top -I
> 
> Under releng_6 (stable p4) and older versions,
> I think down to releng_5, it doesn't show any running process.
> 
> I have tried to dig into the source, but my experience is not
> good enough for me to find the possible error.

See line 603 (in HEAD) of /usr/src/usr.bin/top/machine.c:

    if (displaymode == DISP_CPU && !show_idle &&
        (pp->ki_pctcpu == 0 || pp->ki_stat != SRUN))
            /* skip idle or non-running processes */
            continue;

Since I do not like the current behaviour, I considered removing the
test for state SRUN. But I guess that the test cannot be eliminated
completely. Instead of selecting only SRUN, some states may need to
be suppressed (SZOMB, possibly also SIDL, SSTOP).

I'll test the following version on my system:

    if (displaymode == DISP_CPU && !show_idle &&
        (pp->ki_pctcpu == 0 || pp->ki_stat == SZOMB ||
         pp->ki_stat == SSTOP))
            /* skip idle or non-running processes */
            continue;
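
The difference between the two tests can be tried outside of top with a small
stand-alone sketch. The struct and the state constants below are stand-ins
for struct kinfo_proc and the definitions in <sys/proc.h>, and the
displaymode check is left out; only the ki_pctcpu/ki_stat logic is modelled:

```c
#include <stdbool.h>

/* Stand-ins for the process states from <sys/proc.h>. */
enum { SIDL = 1, SRUN, SSLEEP, SSTOP, SZOMB };

/* Stand-in for the two struct kinfo_proc fields the test looks at. */
struct proc_sample {
    unsigned int ki_pctcpu;  /* scaled CPU usage, 0 means idle */
    int ki_stat;             /* process state */
};

/* Current test: hide everything that is idle or not in state SRUN. */
static bool skip_current(const struct proc_sample *pp, bool show_idle)
{
    return !show_idle && (pp->ki_pctcpu == 0 || pp->ki_stat != SRUN);
}

/* Proposed test: hide only idle, zombie and stopped processes. */
static bool skip_proposed(const struct proc_sample *pp, bool show_idle)
{
    return !show_idle &&
        (pp->ki_pctcpu == 0 || pp->ki_stat == SZOMB ||
         pp->ki_stat == SSTOP);
}
```

A process that has used CPU recently but is in SSLEEP at the moment of the
sample is hidden by skip_current() and shown by skip_proposed(); zombies
and idle processes are hidden by both.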

> Has anyone had the same experience with top, or does
> anyone have a workaround (please, nothing like "while true")?

Patch included (not verified to apply to 5.x or 6.x, but editing the
test in place should be easy, then).

Regards, STefan

--- /usr/src/usr.bin/top/machine.c  18 May 2005 13:42:51 -  1.74
+++ /usr/src/usr.bin/top/machine.c  30 Jan 2006 14:26:08 -
@@ -601,7 +601,7 @@
             continue;
 
         if (displaymode == DISP_CPU && !show_idle &&
-            (pp->ki_pctcpu == 0 || pp->ki_stat != SRUN))
+            (pp->ki_pctcpu == 0 || pp->ki_stat == SZOMB || pp->ki_stat == SSTOP))
             /* skip idle or non-running processes */
             continue;
 


Re: Interscan Viruswall for Linux.... on FreeBSD-4.x

2001-01-27 Thread Stefan Esser

On 2001-01-24 22:59 -0500, Forrest Aldrich <[EMAIL PROTECTED]> wrote:
> I would like to hear from anyone who is successfully using Interscan 
> Viruswall for Linux on FreeBSD, under Linux emulation (via the linux_base 
> port).

I ordered the Solaris version and received a single CD-ROM that installs
on all supported platforms (Unix and Win32). At first, I didn't find the
distribution files for Solaris (the box came with NT docs, but I checked
with the vendor that the licence I received is valid on Solaris).

To my surprise, the Linux version on that CD-ROM was 3.6, while the Solaris
version was still at 3.5!

Anyway, I tried the Linux version on a 4.2-STABLE box, just to compare the
performance to that of our production system, and found that the Linux
version works just fine.

> It seems this should work with ease; however, TrendMicro refuses to 
> disclose what "configuration adjustments" need to be made, citing that it's 
> not "officially supported" (ie: some of their programs have hard coded path 
> requirements, which they've been notified about from others who use this on 
> FreeBSD).

Well, if you know about rc files and are not afraid to use symbolic links
between file systems, all should be well ;-)

I even considered creating a FreeBSD port that installs VirusWall from a
CD-ROM ...

My trial period is over, but I have not yet de-installed the software from
my test box, so I can help you get started in case you run into any problems.

My only nit is that you can't bind VirusWall to a single interface. My
firewall architecture requires strictly independent configurations for
incoming and outgoing messages. For that reason, I have separate instances
of VirusWall for each direction, and it required some magic to get all
the parts working together on a single box.

Regards, STefan


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message



Re: Re: PCIOCGETCONF/PCIOCREAD requires write permission?

2000-12-08 Thread Stefan Esser

On 2000-12-08 10:02 -0600, Mike Silbersack <[EMAIL PROTECTED]> wrote:
> Seriously, though.  There must be some way to abuse such direct access to
> the pci configuration registers.  Just because nobody has figured out
> how yet doesn't mean that enabling the feature is a good idea.

Well, what makes you think that nobody has figured out why read access
to the PCI config space registers might not be a good idea? ;-)

The reason is simple: There are a number of PCI devices that fail in a 
number of ways, if certain config space registers are accessed while the
device is active. This is counterintuitive at first, but just try to
read a config register beyond 0x80 from an NCR SCSI chip while it is 
executing SCRIPTS code ...

The PCI spec made higher numbered config space registers implementation
dependent. Some vendors mapped their devices' operational registers into
config space, even though the spec never encouraged that (and I'm not
sure whether such (ab)use of config registers was explicitly forbidden in
later revisions of the spec).

Since there are a number of devices that could be severely impacted by
read accesses to configuration space registers, we can't safely permit
such read access to arbitrary users. Root hopefully knows what he is doing
and only accesses those registers that are meant to be accessed while the
device is operating ...
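
As a rough sketch of that policy (not the actual /dev/pci code; the function
and macro names are hypothetical): the predefined configuration header covers
offsets 0x00-0x3f and everything from 0x40 to 0xff is device dependent, but
since a generic check cannot know which registers a given device tolerates
being read, the gate below refuses every read unless the descriptor was
opened for writing:

```c
#include <errno.h>
#include <stdbool.h>

/* Offsets 0x00-0x3f form the predefined PCI configuration header;
 * 0x40-0xff are device dependent. Macro names are hypothetical. */
#define CFG_HEADER_END 0x40
#define CFG_SPACE_END  0x100

/* True if `reg` lies in the device-dependent part of config space. */
static bool reg_is_device_dependent(unsigned int reg)
{
    return reg >= CFG_HEADER_END && reg < CFG_SPACE_END;
}

/* Hypothetical gate for a config space read request.
 * Returns 0 if the read may proceed, otherwise an errno value.
 * No register range is exempted for unprivileged callers. */
static int config_read_check(unsigned int reg, bool opened_for_write)
{
    if (reg >= CFG_SPACE_END)
        return EINVAL;
    if (!opened_for_write)
        return EPERM;
    return 0;
}
```

reg_is_device_dependent() is only there to show where the risky range
starts; config_read_check() deliberately does not use it to relax the rule.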

Regards, STefan





Re: how to stop route redirects

1999-08-17 Thread Stefan Esser

On 1999-08-16 15:15 -0700, Ed Baxter <[EMAIL PROTECTED]> wrote:
Route redirects are based on ICMP. See "man ipfw" (or "man ipf")
and the man pages referenced there for the packet filter extensions
that allow blocking of all or specific ICMP redirect messages.

If you are running a recent -current (after August 10th), then 
you can control how the kernel reacts to ICMP redirect packets:

net.inet.icmp.log_redirect: 0
net.inet.icmp.drop_redirect: 0

Use "sysctl -w net.inet.icmp.drop_redirect=1" to ignore all ICMP
redirects (possibly after prior logging, if "log_redirect" == 1).

(You may want to merge that code into -stable yourself:

cd /sys/netinet
cvs up -kk -j 1.35 ip_icmp.c

Or apply the patch at the end of this message to add just the "drop"
feature to -stable ...)
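
The effect of the two knobs on an incoming redirect can be modelled in user
space as follows. This is a simplified sketch, not the kernel code: the
variables stand in for the sysctl values, the counter replaces the kernel's
log message, and the log-before-drop ordering is an assumption taken from
the description above:

```c
/* User-space stand-ins for the two sysctl knobs. */
static int log_redirect = 0;   /* net.inet.icmp.log_redirect */
static int drop_redirect = 0;  /* net.inet.icmp.drop_redirect */

/* Hypothetical counter standing in for the kernel's log output. */
static unsigned long redirects_logged = 0;

enum redirect_result {
    REDIRECT_DROPPED,   /* ignored because drop_redirect is set */
    REDIRECT_BADCODE,   /* ICMP code outside the valid range 0..3 */
    REDIRECT_ACCEPTED   /* handed on to normal redirect processing */
};

/* Decide what happens to an ICMP redirect with the given code. */
static enum redirect_result handle_redirect(int code)
{
    if (log_redirect)
        redirects_logged++;
    if (drop_redirect)
        return REDIRECT_DROPPED;
    if (code > 3)
        return REDIRECT_BADCODE;
    return REDIRECT_ACCEPTED;
}
```

With both knobs at 0 a redirect with code 1 is accepted; after setting
drop_redirect to 1 the same packet is dropped, and it is still counted as
logged if log_redirect is 1.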

Regards, STefan

Index: ip_icmp.c
===
RCS file: /usr/cvs/src/sys/netinet/ip_icmp.c,v
retrieving revision 1.33.2.1
diff -u -2 -r1.33.2.1 ip_icmp.c
--- ip_icmp.c   1999/03/06 23:11:41 1.33.2.1
+++ ip_icmp.c   1999/08/17 09:36:45
@@ -70,4 +70,8 @@
         &icmpmaskrepl, 0, "");
 
+static int drop_redirect = 0;
+SYSCTL_INT(_net_inet_icmp, OID_AUTO, drop_redirect, CTLFLAG_RW, 
+        &drop_redirect, 0, "");
+
 #ifdef ICMP_BANDLIM 
 
@@ -463,4 +467,6 @@
 
         case ICMP_REDIRECT:
+                if (drop_redirect)
+                        break;
                 if (code > 3)
                         goto badcode;

