Re: mutex_owner

2013-02-06 Thread Andrey Zonov
On 2/5/13 6:37 PM, Dr. Baud wrote:
> All,
> 
>  Anyone use mutex_owner in a dtrace script, as the obvious does not work 
> for me:
> 
> Content of spin.d:
> 
> #!/usr/sbin/dtrace -qs
> 
> :::*spin
> {
> self->mutex = (kmutex_t *) arg0;
> self->mutex_owner = mutex_owner((kmutex_t *) :self->mutex);
> }
> 

The lock implementation in FreeBSD is different from the one in Solaris.
The script below should do what you want.

:::*spin
{
self->mtx = (struct mtx *)arg0;
self->mtx_owner = mutex_owner(self->mtx);
}

You can find the implementation details of mutexes in sys/sys/_mutex.h,
sys/sys/mutex.h and sys/kern/kern_mutex.c.
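
For readers coming from Solaris, the practical difference is where the
owner is stored.  As far as I understand sys/sys/mutex.h, a FreeBSD
struct mtx keeps the owning thread pointer in its mtx_lock word, with a
few low bits used as flags, so finding the owner means masking those bits
off.  A minimal user-space sketch of that decoding follows; the sk_ names
and the flag values are illustrative stand-ins, not the kernel's
definitions, so check the header for the real ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits (assumption); the real ones are in sys/sys/mutex.h. */
#define SK_MTX_RECURSED   ((uintptr_t)0x1)
#define SK_MTX_CONTESTED  ((uintptr_t)0x2)
#define SK_MTX_UNOWNED    ((uintptr_t)0x4)
#define SK_MTX_FLAGMASK   (SK_MTX_RECURSED | SK_MTX_CONTESTED | SK_MTX_UNOWNED)

/* Stand-in for struct thread; aligned so the three low address bits are free. */
struct sk_thread {
    _Alignas(8) int tid;
};

/* Stand-in for struct mtx: one word holding "owner pointer | flags". */
struct sk_mtx {
    volatile uintptr_t mtx_lock;
};

/* Return the owning thread, or NULL when no thread holds the mutex. */
static struct sk_thread *
sk_mtx_owner(const struct sk_mtx *m)
{
    uintptr_t owner = m->mtx_lock & ~SK_MTX_FLAGMASK;

    if (owner == 0)
        return (NULL);
    return ((struct sk_thread *)owner);
}
```

In a D script the equivalent work is done by mutex_owner() itself, which
is why the argument has to be a struct mtx pointer rather than a Solaris
kmutex_t.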

-- 
Andrey Zonov



signature.asc
Description: OpenPGP digital signature


Re: Failsafe on kernel panic

2013-02-02 Thread Andrey Zonov
On 1/20/13 6:07 PM, Willem Jan Withagen wrote:
> On 17-1-2013 4:18, Ian Lepore wrote:
>> On Wed, 2013-01-16 at 23:27 +0200, Sami Halabi wrote:
>>> Thank you for your response, very helpful.
>>> one question - how do i configure auto-reboot once kernel panic occurs?
>>>
>>> Sami
>>>
>>
>> From src/sys/conf/NOTES, this may be what you're looking for...
>>
>> #
>> # Don't enter the debugger for a panic. Intended for unattended operation
>> # where you may want to enter the debugger from the console, but still want
>> # the machine to recover from a panic.
>> #
>> options  KDB_UNATTENDED
>>
>> But I think it only has meaning if you have option KDB in effect,
>> otherwise it should just reboot itself after a 15 second pause.
> 
> Well it is not the  magical fix-all solution.
> 
> Last night I had to drive to the colo (lucky for me a 5 min drive.)
> because I could not get a system to reboot/recover from a crash.
> 
> Upon arrival the system was crashed and halted on the message:
>   rebooting in 15 sec.
> 

I've seen the same thing many times.  I now use ddb to save a crash dump
and reboot the machine on panic.  It's much more reliable.

-- 
Andrey Zonov





Re: Is there any modern alternative to pstack?

2013-01-14 Thread Andrey Zonov
On 8/7/12 4:34 AM, Yuri wrote:
> On 04/16/2012 06:59, John Baldwin wrote:
>> I'm fine with putting it into the base.  If so, we should import 1.2
>> first I
>> think and then apply the 1.3 patch.
> 
> So are there plans to import it into the base? Maybe for 9.1?
> /usr/ports/sysutils/pstack is still i386 only.
> 

Try this version [1].  I plan to update sysutils/pstack to it.

[1] https://github.com/z0nt/pstack

-- 
Andrey Zonov





Re: Question on io monitoring tools such as gstat and iostat

2012-08-29 Thread Andrey Zonov
On 8/29/12 12:07 PM, Daniel Braniss wrote:
>> On 8/28/12 3:14 PM, Andy Young wrote:
>>> I am relatively new to using IO monitoring tools and wanted to confirm I
>>> understand them correctly. If I specify an interval of 5 seconds, my
>>> assumption is that the data displayed is an average over that 5 second
>>> interval. Is that correct or am I misunderstanding how intervals work?
>>>
>>
>> Yes, you are right.  For more information you can read devstat(3) or
>> sources in src/lib/libdevstat/devstat.c.
> 
> netstat does not!
> 

You are right too, but the question was about gstat(8) and iostat(8).
They print units per second, unlike netstat(1), which prints plain counts
without the "per second".
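
The difference between the two styles of output can be sketched in a few
lines.  This is not code from libdevstat; the function names are made up
for illustration, and the counters are assumed to be monotonically
increasing 64-bit totals sampled at the start and end of an interval.

```c
#include <stdint.h>

/*
 * gstat/iostat style: the change in a counter divided by the length of
 * the interval, i.e. an average rate over that interval.
 */
static double
rate_per_second(uint64_t prev, uint64_t cur, double interval_sec)
{
    if (interval_sec <= 0.0)
        return (0.0);
    return ((double)(cur - prev) / interval_sec);
}

/*
 * netstat style, as described above: the plain change in the counter over
 * the interval, with no division by time.
 */
static uint64_t
count_in_interval(uint64_t prev, uint64_t cur)
{
    return (cur - prev);
}
```

With a 5 second interval and a counter that moves from 1000 to 1500, the
first function reports 100 units per second and the second reports 500
units.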

-- 
Andrey Zonov





Re: Question on io monitoring tools such as gstat and iostat

2012-08-28 Thread Andrey Zonov
On 8/28/12 3:14 PM, Andy Young wrote:
> I am relatively new to using IO monitoring tools and wanted to confirm I
> understand them correctly. If I specify an interval of 5 seconds, my
> assumption is that the data displayed is an average over that 5 second
> interval. Is that correct or am I misunderstanding how intervals work?
> 

Yes, you are right.  For more information, see devstat(3) or the
sources in src/lib/libdevstat/devstat.c.

-- 
Andrey Zonov





Re: GPT boot from 2nd. disk fails

2012-08-16 Thread Andrey Zonov

On 8/16/12 11:06 AM, Daniel Braniss wrote:

On Wednesday, August 15, 2012 4:46:28 am Garrett Cooper wrote:

On Wed, Aug 15, 2012 at 1:27 AM, Daniel Braniss  wrote:

hi,
this host has two disks:
sa0> gpart show
=>       34  976773101  ada0  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162    4194304     2  freebsd-ufs  (2.0G)
    4194466   33554432     3  freebsd-swap  (16G)
   37748898  939024237     4  freebsd-zfs  (447G)

=>       34  976773101  ada1  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162    4194304     2  freebsd-ufs  [bootme]  (2.0G)
    4194466    8388608     3  freebsd-swap  (4.0G)
   12583074  964190061     4  freebsd-zfs  (459G)

but no amount of magic will cause boot from the second disk, it will always
boot from the first disk.

any insights?


 Use boot0cfg -s 5 (untested with GPT disks)?


Will not work with GPT disks.  They use /boot/pmbr to boot, not /boot/boot0.

If you can get your BIOS to explicitly boot ada1 from the start via a BIOS
setting, that should work.  Another option would be to break into gptboot's
prompt (similar to breaking into boot2) and typing in 'ad1p2:/boot/loader' or
some such.  If that works you should even be able to write that to
/boot.config on ada0p2's filesystem.


sorry, as usual my questions are a bit terse :-),
I want to switch between roots either at boot time (this is very tricky now,
since breaking into boot2 needs very fast fingers) or before reboot.
btw, it's 1:ad(0p2)/boot/loader
also, since the disks are hot swap, i can switch between them, but I really
want to do it via software!

the bootme trick did work, on a different host/setup and sometime ago.

before GPT, when we had MBR, I could switch between slices/partitions either
via the menu or via boot0cfg, so maybe I should go back to mbr.



You can erase the boot record of the first disk; your BIOS will then try
to use the second one.  Be careful: some BIOSes only try the first disk.



--
Andrey Zonov
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


system wide major/minor page faults counters

2012-08-02 Thread Andrey Zonov

Hi,

It would be useful to have system-wide major and minor page fault
counters.  The attached patch makes this possible.


Are there any objections to adding it?
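
For comparison, the same two counters already exist per process: the
"page faults" and "page reclaims" labels used in the patch correspond to
the ru_majflt and ru_minflt fields documented in getrusage(2).  A small
sketch of reading them for the current process; the struct and function
names here are made up for the example.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical container for the two per-process counters. */
struct fault_counts {
    long majflt;    /* faults that required I/O ("page faults") */
    long minflt;    /* faults serviced without I/O ("page reclaims") */
};

/* Fill *fc for the calling process; returns 0 on success, -1 on error. */
static int
get_fault_counts(struct fault_counts *fc)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return (-1);
    fc->majflt = ru.ru_majflt;
    fc->minflt = ru.ru_minflt;
    return (0);
}
```

The patch would expose system-wide totals of the same two kinds of
events through vmstat(8) and systat(1).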

--
Andrey Zonov

Index: usr.bin/vmstat/vmstat.c
===
--- usr.bin/vmstat/vmstat.c (revision 238738)
+++ usr.bin/vmstat/vmstat.c (working copy)
@@ -473,6 +473,8 @@ fill_vmmeter(struct vmmeter *vmmp)
ADD_FROM_PCPU(i, v_cow_optim);
ADD_FROM_PCPU(i, v_zfod);
ADD_FROM_PCPU(i, v_ozfod);
+   ADD_FROM_PCPU(i, v_majflt);
+   ADD_FROM_PCPU(i, v_minflt);
ADD_FROM_PCPU(i, v_swapin);
ADD_FROM_PCPU(i, v_swapout);
ADD_FROM_PCPU(i, v_swappgsin);
@@ -511,6 +513,8 @@ fill_vmmeter(struct vmmeter *vmmp)
GET_VM_STATS(vm, v_cow_optim);
GET_VM_STATS(vm, v_zfod);
GET_VM_STATS(vm, v_ozfod);
+   GET_VM_STATS(vm, v_majflt);
+   GET_VM_STATS(vm, v_minflt);
GET_VM_STATS(vm, v_swapin);
GET_VM_STATS(vm, v_swapout);
GET_VM_STATS(vm, v_swappgsin);
@@ -966,6 +970,8 @@ dosum(void)
(void)printf("%9u copy-on-write optimized faults\n", sum.v_cow_optim);
(void)printf("%9u zero fill pages zeroed\n", sum.v_zfod);
(void)printf("%9u zero fill pages prezeroed\n", sum.v_ozfod);
+   (void)printf("%9u page faults\n", sum.v_majflt);
+   (void)printf("%9u page reclaims\n", sum.v_minflt);
(void)printf("%9u intransit blocking page faults\n", sum.v_intrans);
(void)printf("%9u total VM faults taken\n", sum.v_vm_faults);
(void)printf("%9u pages affected by kernel thread creation\n", 
sum.v_kthreadpages);
Index: usr.bin/systat/vmstat.c
===
--- usr.bin/systat/vmstat.c (revision 238738)
+++ usr.bin/systat/vmstat.c (working copy)
@@ -82,6 +82,8 @@ static struct Info {
u_int v_cow_faults; /* number of copy-on-writes */
u_int v_zfod;   /* pages zero filled on demand */
u_int v_ozfod;  /* optimized zero fill pages */
+   u_int v_majflt; /* page faults */
+   u_int v_minflt; /* page reclaims */
u_int v_swapin; /* swap pager pageins */
u_int v_swapout;/* swap pager pageouts */
u_int v_swappgsin;  /* swap pager pages paged in */
@@ -328,20 +330,22 @@ labelkre(void)
mvprintw(VMSTATROW + 1, VMSTATCOL + 9, "zfod");
mvprintw(VMSTATROW + 2, VMSTATCOL + 9, "ozfod");
mvprintw(VMSTATROW + 3, VMSTATCOL + 9 - 1, "%%ozfod");
-   mvprintw(VMSTATROW + 4, VMSTATCOL + 9, "daefr");
-   mvprintw(VMSTATROW + 5, VMSTATCOL + 9, "prcfr");
-   mvprintw(VMSTATROW + 6, VMSTATCOL + 9, "totfr");
-   mvprintw(VMSTATROW + 7, VMSTATCOL + 9, "react");
-   mvprintw(VMSTATROW + 8, VMSTATCOL + 9, "pdwak");
-   mvprintw(VMSTATROW + 9, VMSTATCOL + 9, "pdpgs");
-   mvprintw(VMSTATROW + 10, VMSTATCOL + 9, "intrn");
-   mvprintw(VMSTATROW + 11, VMSTATCOL + 9, "wire");
-   mvprintw(VMSTATROW + 12, VMSTATCOL + 9, "act");
-   mvprintw(VMSTATROW + 13, VMSTATCOL + 9, "inact");
-   mvprintw(VMSTATROW + 14, VMSTATCOL + 9, "cache");
-   mvprintw(VMSTATROW + 15, VMSTATCOL + 9, "free");
-   if (LINES - 1 > VMSTATROW + 16)
-   mvprintw(VMSTATROW + 16, VMSTATCOL + 9, "buf");
+   mvprintw(VMSTATROW + 4, VMSTATCOL + 9, "majflt");
+   mvprintw(VMSTATROW + 5, VMSTATCOL + 9, "minflt");
+   mvprintw(VMSTATROW + 6, VMSTATCOL + 9, "daefr");
+   mvprintw(VMSTATROW + 7, VMSTATCOL + 9, "prcfr");
+   mvprintw(VMSTATROW + 8, VMSTATCOL + 9, "totfr");
+   mvprintw(VMSTATROW + 9, VMSTATCOL + 9, "react");
+   mvprintw(VMSTATROW + 10, VMSTATCOL + 9, "pdwak");
+   mvprintw(VMSTATROW + 11, VMSTATCOL + 9, "pdpgs");
+   mvprintw(VMSTATROW + 12, VMSTATCOL + 9, "intrn");
+   mvprintw(VMSTATROW + 13, VMSTATCOL + 9, "wire");
+   mvprintw(VMSTATROW + 14, VMSTATCOL + 9, "act");
+   mvprintw(VMSTATROW + 15, VMSTATCOL + 9, "inact");
+   mvprintw(VMSTATROW + 16, VMSTATCOL + 9, "cache");
+   mvprintw(VMSTATROW + 17, VMSTATCOL + 9, "free");
+   if (LINES - 1 > VMSTATROW + 18)
+   mvprintw(VMSTATROW + 18, VMSTATCOL + 9, "buf");
 
mvprintw(GENSTATROW, GENSTATCOL, " Csw  Trp  Sys  Int  Sof  Flt");
 
@@ -498,20 +502,22 @@ showkr

Re: /proc filesystem

2012-06-18 Thread Andrey Zonov

On 6/18/12 10:31 PM, Wojciech Puchar wrote:

where can i find description of field of files /proc/*/map
?


Use procstat -v instead.  All fields are documented in procstat(1).

--
Andrey Zonov


Re: usertime stale at about 371k seconds

2012-06-12 Thread Andrey Zonov

On 6/13/12 1:21 AM, Mark Linimon wrote:

On Wed, Jun 13, 2012 at 12:30:08AM +0400, Andrey Zonov wrote:

No, I didn't.  I want to fix the problem not just file a PR and wait
for years.


I do understand your frustration, but we have some new people interested
in picking up and handling src-related PRs, so I see the situation as
improving a bit.



Hi Mark,

Please look at the date of PR/76972.  More than 7 years have passed since
it was filed and I can't see any progress.  I've got more PRs (not only
mine) that were filed but have not been touched for years.  That's what
the frustration is about.


On the other hand, I can read and write C, I understand the FreeBSD code
and I can solve some problems myself.  I forward my patches to the
maintainers.  That works better than just filing a PR and waiting.


--
Andrey Zonov


Re: usertime stale at about 371k seconds

2012-06-12 Thread Andrey Zonov

On 5/31/12 11:34 AM, Andrey Zonov wrote:

On 5/30/12 11:27 PM, Andrey Zonov wrote:

Hi,

I have long running process for which `ps -o usertime -p $pid' shows
always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
process continue to grow and now it's 21538:53.61. It looks like
overflow in resource usage code or something.



I reproduced that problem with attached program. I ran it with 23
threads on machine with 24 CPUs and after night I see this:

$ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
USERTIME TIME
6351:24.74 14977:35.19
USERTIME TIME
6351:24.74 15000:34.53

Per thread user-time counts correct:

$ ps -H -o usertime,time -p 24134
USERTIME TIME
0:00.00 0:00.00
652:35.84 652:38.59
652:34.75 652:37.97
652:50.46 652:51.97
652:38.93 652:43.08
652:39.73 652:43.36
652:44.09 652:47.36
652:56.49 652:57.94
652:51.84 652:54.41
652:37.48 652:41.57
652:36.61 652:40.90
652:39.41 652:42.52
653:03.72 653:06.72
652:49.96 652:53.25
652:45.92 652:49.03
652:40.33 652:42.05
652:46.53 652:49.31
652:44.77 652:47.33
653:00.54 653:02.24
652:33.31 652:36.13
652:51.03 652:52.91
652:50.73 652:52.71
652:41.32 652:44.64
652:59.86 653:03.25

(kgdb) p $my->p_rux
$14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su =
65773652, rux_tu = 904571706136}
(kgdb) p $my->p_rux
$15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su =
66458587, rux_tu = 913099969825}

As you can see rux_uu stale, but rux_uticks still ticks. I think the
problem is in calcru1(). This expression

uu = (tu * ut) / tt

overflows.

I applied the following patch:



I've done some exploring and found that the expression
'(uint64_t)a*(uint64_t)b/(uint64_t)c' can be replaced with
'(a/c)*b + (a%c)*(b/c) + (a%c)*(b%c)/c', which works perfectly as long as
'c' stays below 2^32.  As 'c' is the sum of ticks, overflow occurs after
2^32/128(stathz)/60(sec)/60(min)/24(hours) = 388 days, or after 16 days
on a machine with 24 cores.  That's better than what we have now.


In userland I can use (__uint128_t)a*b/c for this purpose, but the kernel
doesn't build with it.  If you know a good algorithm for calculating
'(uint64_t)a*(uint64_t)b/(uint64_t)c' for 'c > 2^32', please let me know.
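
The decomposition can be checked in userland against the 128-bit
reference mentioned above.  Writing a = q1*c + r1 and b = q2*c + r2 gives
a*b/c = q1*b + r1*q2 + (r1*r2)/c, and r1*r2 is below c^2, so the last
product fits in 64 bits whenever c is below 2^32.  The sketch below is a
function version of the macro from the patch; mul_div_ref() is a helper
added only for the comparison and needs a compiler with __uint128_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * a * b / c without forming the full 64x64 product.  The intermediate
 * terms stay within 64 bits when c < 2^32 and the final result fits.
 */
static uint64_t
mul_div(uint64_t a, uint64_t b, uint64_t c)
{
    assert(c != 0);
    return ((a / c) * b + (a % c) * (b / c) + (a % c) * (b % c) / c);
}

/* Userland-only reference using a 128-bit intermediate product. */
static uint64_t
mul_div_ref(uint64_t a, uint64_t b, uint64_t c)
{
    assert(c != 0);
    return ((uint64_t)((__uint128_t)a * b / c));
}
```

Feeding both functions the values from the second kgdb dump (tu =
913099969825, ut = 115966087, tt = ut + st + it = 115974531) is a
convenient check, since that is a case where the plain 64-bit product
overflows.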


--
Andrey Zonov
Index: sys/kern/kern_resource.c
===
--- sys/kern/kern_resource.c(revision 234600)
+++ sys/kern/kern_resource.c(working copy)
@@ -880,6 +880,8 @@ rufetchtd(struct thread *td, struct rusage *ru)
calcru1(p, &td->td_rux, &ru->ru_utime, &ru->ru_stime);
 }
 
+#define mul_div(a, b, c) (a/c)*b + (a%c)*(b/c) + (a%c)*(b%c)/c
+
 static void
 calcru1(struct proc *p, struct rusage_ext *ruxp, struct timeval *up,
 struct timeval *sp)
@@ -909,10 +911,10 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
 * The normal case, time increased.
 * Enforce monotonicity of bucketed numbers.
 */
-   uu = (tu * ut) / tt;
+   uu = mul_div(tu, ut, tt);
if (uu < ruxp->rux_uu)
uu = ruxp->rux_uu;
-   su = (tu * st) / tt;
+   su = mul_div(tu, st, tt);
if (su < ruxp->rux_su)
su = ruxp->rux_su;
} else if (tu + 3 > ruxp->rux_tu || 101 * tu > 100 * ruxp->rux_tu) {
@@ -941,8 +943,8 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
"to %ju usec for pid %d (%s)\n",
(uintmax_t)ruxp->rux_tu, (uintmax_t)tu,
p->p_pid, p->p_comm);
-   uu = (tu * ut) / tt;
-   su = (tu * st) / tt;
+   uu = mul_div(tu, ut, tt);
+   su = mul_div(tu, st, tt);
}
 
ruxp->rux_uu = uu;

Re: usertime stale at about 371k seconds

2012-06-12 Thread Andrey Zonov

On 6/11/12 7:33 PM, Eric van Gyzen wrote:

On 05/31/2012 02:34, Andrey Zonov wrote:

On 5/30/12 11:27 PM, Andrey Zonov wrote:

Hi,

I have long running process for which `ps -o usertime -p $pid' shows
always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
process continue to grow and now it's 21538:53.61. It looks like
overflow in resource usage code or something.



I reproduced that problem with attached program. I ran it with 23
threads on machine with 24 CPUs and after night I see this:

$ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p
24134
USERTIME TIME
6351:24.74 14977:35.19
USERTIME TIME
6351:24.74 15000:34.53

Per thread user-time counts correct:

$ ps -H -o usertime,time -p 24134
USERTIME TIME
0:00.00 0:00.00
652:35.84 652:38.59
652:34.75 652:37.97
652:50.46 652:51.97
652:38.93 652:43.08
652:39.73 652:43.36
652:44.09 652:47.36
652:56.49 652:57.94
652:51.84 652:54.41
652:37.48 652:41.57
652:36.61 652:40.90
652:39.41 652:42.52
653:03.72 653:06.72
652:49.96 652:53.25
652:45.92 652:49.03
652:40.33 652:42.05
652:46.53 652:49.31
652:44.77 652:47.33
653:00.54 653:02.24
652:33.31 652:36.13
652:51.03 652:52.91
652:50.73 652:52.71
652:41.32 652:44.64
652:59.86 653:03.25

(kgdb) p $my->p_rux
$14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su =
65773652, rux_tu = 904571706136}
(kgdb) p $my->p_rux
$15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su =
66458587, rux_tu = 913099969825}

As you can see rux_uu stale, but rux_uticks still ticks. I think the
problem is in calcru1(). This expression

uu = (tu * ut) / tt

overflows.

I applied the following patch:

Index: /usr/src/sys/kern/kern_resource.c
===
--- /usr/src/sys/kern/kern_resource.c (revision 235394)
+++ /usr/src/sys/kern/kern_resource.c (working copy)
@@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
struct timeval *sp)
{
/* {user, system, interrupt, total} {ticks, usec}: */
- uint64_t ut, uu, st, su, it, tt, tu;
+ uint64_t ut, uu, st, su, it, tt, tu, tmp;

ut = ruxp->rux_uticks;
st = ruxp->rux_sticks;
@@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
* The normal case, time increased.
* Enforce monotonicity of bucketed numbers.
*/
- uu = (tu * ut) / tt;
+ if (ut == 0)
+ uu = 0;
+ else {
+ tmp = tt / ut;
+ uu = tmp ? tu / tmp : 0;
+ }
if (uu < ruxp->rux_uu)
uu = ruxp->rux_uu;

and now ran test again.


This looks related to, and possibly identical to, PR kern/76972:

http://www.freebsd.org/cgi/query-pr.cgi?pr=76972


Yes, that's the same.



If you filed a PR, please submit a follow-up to both PRs so they
reference each other.


No, I didn't.  I want to fix the problem not just file a PR and wait for 
years.




Thanks,



Thank you.


Eric



--
Andrey Zonov


Re: detailed map of WIRED memory under FreeBSD 9

2012-06-02 Thread Andrey Zonov

On 6/1/12 12:19 PM, Wojciech Puchar wrote:

what tool and how can be used to display detailed map what exactly wired
memory on my system as it is far way too much (1.5GB out of 4GB RAM).



I think `vmstat -m' and `vmstat -z' can help you.


i do run 4 virtualboxes but one have 256MB RAM, the others 192 and when
i turn them off wired memory goes down right amount but still it is too
much used.




--
Andrey Zonov


Re: usertime stale at about 371k seconds

2012-05-31 Thread Andrey Zonov

On 5/30/12 11:27 PM, Andrey Zonov wrote:

Hi,

I have long running process for which `ps -o usertime -p $pid' shows
always the same time - 6190:07.65, `ps -o cputime -p $pid' for the same
process continue to grow and now it's 21538:53.61. It looks like
overflow in resource usage code or something.



I reproduced the problem with the attached program.  I ran it with 23
threads on a machine with 24 CPUs, and after a night I see this:


$ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
  USERTIME        TIME
6351:24.74 14977:35.19
  USERTIME        TIME
6351:24.74 15000:34.53

Per-thread user times are counted correctly:

$ ps -H -o usertime,time -p 24134
 USERTIME  TIME
  0:00.00   0:00.00
652:35.84 652:38.59
652:34.75 652:37.97
652:50.46 652:51.97
652:38.93 652:43.08
652:39.73 652:43.36
652:44.09 652:47.36
652:56.49 652:57.94
652:51.84 652:54.41
652:37.48 652:41.57
652:36.61 652:40.90
652:39.41 652:42.52
653:03.72 653:06.72
652:49.96 652:53.25
652:45.92 652:49.03
652:40.33 652:42.05
652:46.53 652:49.31
652:44.77 652:47.33
653:00.54 653:02.24
652:33.31 652:36.13
652:51.03 652:52.91
652:50.73 652:52.71
652:41.32 652:44.64
652:59.86 653:03.25

(kgdb) p $my->p_rux
$14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093, 
rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su = 
65773652, rux_tu = 904571706136}

(kgdb) p $my->p_rux
$15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087, 
rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su = 
66458587, rux_tu = 913099969825}


As you can see, rux_uu is stale while rux_uticks keeps ticking.  I think
the problem is in calcru1().  This expression


uu = (tu * ut) / tt

overflows.
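
The numbers from the kgdb output are enough to see why the value gets
stuck.  A minimal userland sketch, assuming tt is the sum of the three
tick counters and that the result is then clamped so it never decreases,
as the "enforce monotonicity" code in calcru1() does; the function names
are made up for the example.

```c
#include <stdint.h>

/* The expression from calcru1(): the product wraps modulo 2^64. */
static uint64_t
scale_naive(uint64_t tu, uint64_t ut, uint64_t tt)
{
    return ((tu * ut) / tt);
}

/* True when tu * ut does not fit in 64 bits. */
static int
product_overflows(uint64_t tu, uint64_t ut)
{
    return (ut != 0 && tu > UINT64_MAX / ut);
}

/* The monotonicity rule: never report less than the previous value. */
static uint64_t
enforce_monotonic(uint64_t uu, uint64_t prev_uu)
{
    return (uu < prev_uu ? prev_uu : uu);
}
```

With tu = 913099969825 and ut = 115966087 the product is about 1.06e20,
well above 2^64 (about 1.84e19).  The wrapped product divided by tt can
never exceed 2^64/tt, roughly 1.59e11 here, which is below the stored
rux_uu of 381084736784, so the clamp keeps returning the old value.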

I applied the following patch:

Index: /usr/src/sys/kern/kern_resource.c
===
--- /usr/src/sys/kern/kern_resource.c   (revision 235394)
+++ /usr/src/sys/kern/kern_resource.c   (working copy)
@@ -885,7 +885,7 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
 struct timeval *sp)
 {
/* {user, system, interrupt, total} {ticks, usec}: */
-   uint64_t ut, uu, st, su, it, tt, tu;
+   uint64_t ut, uu, st, su, it, tt, tu, tmp;

ut = ruxp->rux_uticks;
st = ruxp->rux_sticks;
@@ -909,10 +909,20 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
 * The normal case, time increased.
 * Enforce monotonicity of bucketed numbers.
 */
-   uu = (tu * ut) / tt;
+   if (ut == 0)
+   uu = 0;
+   else {
+   tmp = tt / ut;
+   uu = tmp ? tu / tmp : 0;
+   }
if (uu < ruxp->rux_uu)
    uu = ruxp->rux_uu;

and am now running the test again.
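
One thing to keep in mind about this variant: it avoids the 64-bit
product by dividing first, so the integer quotient tt / ut is truncated
before it is used.  The sketch below repeats the logic of the patch as a
standalone function (the name is made up) so that the effect can be
tried with small numbers.

```c
#include <stdint.h>

/* uu computed as tu / (tt / ut), following the patch above. */
static uint64_t
scale_by_quotient(uint64_t tu, uint64_t ut, uint64_t tt)
{
    uint64_t tmp;

    if (ut == 0)
        return (0);
    tmp = tt / ut;              /* truncating integer division */
    return (tmp ? tu / tmp : 0);
}
```

For example, with tu = 1000, ut = 40 and tt = 100 the quotient tt / ut
is truncated from 2.5 to 2, and the function returns 500 where the exact
value of tu * ut / tt is 400.  For the process in this report, where
almost all ticks are user ticks, the quotient is 1 and uu simply equals
tu.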

--
Andrey Zonov
/*
 * Andrey Zonov (c) 2012
 */

#include <err.h>
#include <pthread.h>
#include <stdlib.h>

void *func(void *arg);

int
main(int argc, char **argv)
{
    int i;
    int threads;
    int *tid;
    pthread_t *tds;

    if (argc != 2)
        errx(1, "usage: usertime ");

    /* Number of spinning threads to start. */
    threads = atoi(argv[1]);
    tid = malloc(sizeof(int) * threads);
    tds = malloc(sizeof(pthread_t) * threads);

    for (i = 0; i < threads; i++) {
        tid[i] = i;
        if (pthread_create(&tds[i], NULL, func, &tid[i]) != 0)
            err(1, "pthread_create(%d)", i);
    }

    for (i = 0; i < threads; i++)
        if (pthread_join(tds[i], NULL) != 0)
            err(1, "pthread_join(%d)", i);

    exit(0);
}

void *
func(void *arg __unused)
{
    int i;

#define MAX (1<<20)

    /*
     * Burn user time forever: i is reset to 0 each time it reaches
     * MAX - 1, so the loop condition never becomes false.
     */
    for (i = 0; i < MAX; i++) {
        if ((i % (MAX - 1)) == 0) {
            i = 0;
            /*usleep(1);*/
        }
    }

    pthread_exit(NULL);
}

usertime stale at about 371k seconds

2012-05-30 Thread Andrey Zonov

Hi,

I have a long-running process for which `ps -o usertime -p $pid' always
shows the same time, 6190:07.65, while `ps -o cputime -p $pid' for the
same process continues to grow and is now at 21538:53.61.  It looks like
an overflow in the resource usage code or something.


Any ideas?

--
Andrey Zonov


Re: problems with mmap() and disk caching

2012-05-22 Thread Andrey Zonov

On 4/30/12 3:49 AM, Alan Cox wrote:

On 04/11/2012 01:07, Andrey Zonov wrote:

On 10.04.2012 20:19, Alan Cox wrote:

On 04/09/2012 10:26, John Baldwin wrote:

On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get pointer,
then I work with this pointer. I expect that page should be only once
touched to get it into the memory (disk cache?), but this doesn't
work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect. But why doesn't this work without reading the file
manually?
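
The attached test program is not reproduced in this archive.  As a rough
idea of how one pass of such a test might be written, the sketch below
maps a file, touches one byte in every page and then asks mincore(2)
which pages are resident.  It is a guess at the approach, not the
original code: the function name is made up, only the lowest bit of each
mincore() result byte is examined, and the superpage and "other"
categories from the output above are not reproduced.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Touch every page of the file once through a shared read-only mapping,
 * then count how many pages mincore() reports as resident.  Returns the
 * resident page count, or -1 on error; *total_pages gets the page count.
 */
static long
count_resident(const char *path, long *total_pages)
{
    struct stat st;
    unsigned char *p, *vec;
    volatile unsigned char sink;
    long pagesz, npages, i, resident;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (-1);
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return (-1);
    }
    pagesz = sysconf(_SC_PAGESIZE);
    npages = (long)((st.st_size + pagesz - 1) / pagesz);

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return (-1);

    vec = malloc((size_t)npages);
    if (vec == NULL) {
        munmap(p, (size_t)st.st_size);
        return (-1);
    }

    /* One pass: read the first byte of each page to fault it in. */
    for (i = 0; i < npages; i++)
        sink = p[i * pagesz];
    (void)sink;

    resident = -1;
    if (mincore(p, (size_t)st.st_size, (void *)vec) == 0) {
        resident = 0;
        for (i = 0; i < npages; i++)
            if (vec[i] & 1)     /* lowest bit: page is in core */
                resident++;
    }

    free(vec);
    munmap(p, (size_t)st.st_size);
    *total_pages = npages;
    return (resident);
}
```

Calling this repeatedly on the same file and timing each call would give
the kind of per-pass figures shown above, with "res" corresponding to
the returned count and "none" to the remaining pages.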

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);

because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've
seen it triggered by demand paging of the gcc text segment. Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.

Are you planning to commit this?



Not yet. I did some tests with a file that was several times larger than
DRAM, and I didn't like what I saw. Initially, everything behaved as
expected, but about halfway through the test the bulk of the pages were
active. Despite the call to pmap_clear_reference() in
vm_page_dontneed()

Re: service causes ps errors in chroot

2012-04-22 Thread Andrey Zonov

On 4/22/12 4:07 PM, rank1see...@gmail.com wrote:

When I use '/usr/sbin/service' in chroot, it outputs:
ps: empty file: Invalid argument
   OR
ps: cannot read IdlePTD

But it does work.



I think you need devfs in your chroot.  Try this:

mount -t devfs devfs /path/to/chroot/dev


--
Andrey Zonov


Re: problems with mmap() and disk caching

2012-04-10 Thread Andrey Zonov

On 10.04.2012 20:19, Alan Cox wrote:

On 04/09/2012 10:26, John Baldwin wrote:

On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get pointer,
then I work with this pointer. I expect that page should be only once
touched to get it into the memory (disk cache?), but this doesn't
work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect. But why doesn't this work without reading the file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);

because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've
seen it triggered by demand paging of the gcc text segment. Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.

Are you planning to commit this?



Not yet. I did some tests with a file that was several times larger than
DRAM, and I didn't like what I saw. Initially, everything behaved as
expected, but about halfway through the test the bulk of the pages were
active. Despite the call to pmap_clear_reference() in
vm_page_dontneed(), the page daemon is finding the pages to be
referenced and reactivating them. Th

Re: problems with mmap() and disk caching

2012-04-09 Thread Andrey Zonov
On Mon, Apr 9, 2012 at 1:18 PM, Konstantin Belousov  wrote:
> On Mon, Apr 09, 2012 at 11:17:41AM +0400, Andrey Zonov wrote:
>> On 06.04.2012 12:13, Konstantin Belousov wrote:
>> >On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
[snip]
>> >>I always thought that active memory this is a sum of resident memory of
>> >>all processes, inactive shows disk cache and wired shows kernel itself.
>> >So you are wrong. Both active and inactive memory can be mapped and
>> >not mapped, both can belong to vnode or to anonymous objects etc.
>> >Active/inactive distinction is only the amount of references that was
>> >noted by pagedaemon, or some other page history like the way it was
>> >unwired.
>> >
>> >Wired does not necessarily mean kernel-used pages; user processes can
>> >wire their pages as well.
>>
>> Let's talk about that in details.
>>
>> My understanding is the following:
>>
>> Active memory: the memory which is referenced by application.  An
> Assuming the part 'by application' is removed, this sentence is almost right.
> Any managed mapping of the page participates in the active references.
>
>> application may get memory only through mmap() (allocator don't use
>> brk()/sbrk() any more).  The resident memory of an application is the
>> sum of physical used memory.  So, sum of RSS is active memory.
> First, brk/sbrk is still used. Second, there is no requirement that
> resident pages are referenced. E.g. page could have participated in the
> buffer, and unwiring on the buffer dissolve put it into inactive state.
> Or pagedaemon cleared the reference and moved the page to inactive queue.
> Or the page was prefaulted by different optimizations.
>
> More, there is subtle difference between 'resident' and 'not causing fault
> on access'. Page may be resident, but pte was not preinstalled, or pte
> was flushed etc.

From the user's point of view: how can memory be active if no one (I
mean, no application) is using it?

What I actually observed (though not immediately) is that a program that
had worked with a big mmap()'ed file for a long time couldn't work well
(many page faults) with a new version of the file until I manually
flushed active memory by re-mounting the FS.  The new version couldn't
force out the old one.  In my opinion, if the VM moved cached objects to
the inactive queue after program termination, I wouldn't see this problem.

>>
>> Inactive memory: the memory which has no references.  Once we call
>> read() on the file, the file is in inactive memory, because we have no
>> references to this object, we just read it.  This is also released
>> memory by free().
> On buffers dissolve, buffer cache explicitly puts pages constituting
> the buffer, into the inactive queue. In fact, this is not quite right,
> e.g. if the same pages are mapped and actively referenced, then
> pagedaemon has slightly more work now to move the page from inactive
> to active.
>

Yes, sure, if someone else uses the object it should be active, and it
would be even better to introduce a new "SHARED" counter, like the one in
Mac OS X and Linux.

> And, free(3) operates at so much higher level than the vm subsystem that
> describing the interaction between these two is impossible in any
> definitive mood. Old naive mallocs put block description at the beginning
> of the block, actually causing free() to reference at least the first
> page of the block. Jemalloc often does madvise(MADV_FREE) for large
> freed allocations. MADV_FREE moves pages between queues probabilistically.
>

That's exactly what I meant by free().  We drop act_count to 0 and
move the page to the inactive queue with vm_page_dontneed().

>>
>> Cache memory: I don't know what is it. It's always small enough to not
>> think about it.
> This was the bug you reported, and which Alan fixed on Sunday.
>

I've tested this patch under 9.0-STABLE and must say that it
introduces interactivity problems on machines with heavy disk load.
With the patch that I tested before, I didn't observe such problems.

>>
>> Wired memory: kernel memory and yes, application may get wired memory
>> through mlock()/mlockall(), but I haven't seen any real application
>> which calls mlock().
> ntpd, amd from the base system. gpg and similar programs try to mlock
> key store to avoid sensitive material leakage to the swap. cdrecord(8)
> tried to mlock itself to avoid indefinite stalls during write.
>

Nice catch ;-)
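
The point that user processes can wire pages with mlock() can be sketched as follows. This is a minimal illustration, not code from any of the programs named above: secret_alloc() and secret_free() are hypothetical names, and whether mlock() succeeds depends on the memory-lock resource limit of the process.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Allocate `len` bytes of page-aligned anonymous memory and try to wire it
 * with mlock(2).  *locked is set to 1 if the pages were wired and to 0 if
 * mlock() failed (for example because the memory-lock limit is too low).
 * Returns NULL if the mapping itself could not be created.
 * Hypothetical helper, for illustration only.
 */
void *
secret_alloc(size_t len, int *locked)
{
	void *p;

	assert(len > 0 && locked != NULL);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return (NULL);
	*locked = (mlock(p, len) == 0);
	return (p);
}

/* Wipe the buffer, then unwire and unmap it. */
void
secret_free(void *p, size_t len)
{
	volatile unsigned char *q = p;
	size_t i;

	/* Volatile stores so that the wipe is not optimized away. */
	for (i = 0; i < len; i++)
		q[i] = 0;
	(void)munlock(p, len);
	(void)munmap(p, len);
}
```

While the buffer is locked, its pages should be accounted as wired rather than active or inactive, which matches the statement that wired memory is not only kernel memory.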

>
>>
>> >>
>> >>>>
>> >>>>Read the file:
>> >>>>$ cat /mnt/random>   /dev/null
>> >>>>
>> >>>>Mem: 79M Acti

Re: problems with mmap() and disk caching

2012-04-09 Thread Andrey Zonov

On 06.04.2012 12:13, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:

On 05.04.2012 23:41, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:

On 05.04.2012 19:54, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why this doesn't work without reading file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);



Thanks Alan!  Now it works as I expect!

But I have more questions to you and kib@.  They are in my test below.

So, prepare file as earlier, and take information about memory usage
from top(1).  After preparation, but before test:

Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0)

No super pages after first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0)

All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after process
was terminated, that's not good, what do you think?

Why do you think this is 'not good' ? You have plenty of free memory,
there is no memory pressure, and all pages were referenced recently.
There is no reason for them to be deactivated.



I always thought that active memory this is a sum of resident memory of
all processes, inactive shows disk cache and wired shows kernel itself.

So you are wrong. Both active and inactive memory can be mapped and
not mapped, both can belong to vnode or to anonymous objects etc.
Active/inactive distinction is only the amount of references that was
noted by pagedaemon, or some other page history like the way it was
unwired.

Wired does not necessarily mean kernel-used pages; user processes can
wire their pages as well.


Let's talk about that in details.

My understanding is the following:

Active memory: the memory which is referenced by application.  An 
application may get memory only through mmap() (allocator don't use 
brk()/sbrk() any more).  The resident memory of an application is the 
sum of physical used memory.  So, sum of RSS is active memory.


Inactive memory: the memory which has no references.  Once we call 
read() on the file, the file is in inactive memory, because we have no 
references to this object, we just read it.  This is also released 
memory by free().


Cache memory: I don't know what is it. It's always small enough to not 
think about it.


Wired memory: kernel memory and yes, application may get wired memory 
through mlock()/mlockall(), but I haven't seen any real application 
which calls mlock().






Read the file:
$ cat /mnt/random>   /dev/null

Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory.  I do not understand why so.

You do use UFS, right ?


Yes.


There is enough buffer headers and buffer KVA
to have buffers allocated for the whole file content. Since buffers wire
corresponding pages, you get pages migrated to wired.

When there appears a buffer pressure (i.e., any other i/o started),
the buffers will be repurposed and pages moved to inactive.



OK, how can I get amount of disk cache?

You cannot. At least I am not aware of any counter that keeps track
of the resident pages belonging to vnode pager.

Buffers should not be thought as disk cache, pages cache disk content.
Instead, VMIO buffers only provide bread()/bwrite() compatible interface
to the page cache (*) for filesystems.
(*) - The cache term is used in generic term, not to confuse with
cached pages counter from top etc.



Yes, I know that.  Let me ask my question about buffers once again:
is it reasonable to use 10% of physical memory for them, or could we
set a rational upper limit automatically?






Could you please give me explanation about active/inactive/wired memory?



because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've seen
it triggered by demand pagi

Re: problems with mmap() and disk caching

2012-04-05 Thread Andrey Zonov

On 05.04.2012 23:54, Andrey Zonov wrote:

On 05.04.2012 23:41, Konstantin Belousov wrote:

You do use UFS, right ?


Yes.



I've run test on ZFS.

Mem: 2645M Active, 363M Inact, 2042M Wired, 1406M Buf, 42G Free

$ ./mmap /mnt/random

Mem: 3669M Active, 363M Inact, 3067M Wired, 1406M Buf, 40G Free

It eats 2 GB, as I understand it.

# umount /mnt
# zfs mount -a

Mem: 2645M Active, 363M Inact, 2042M Wired, 1406M Buf, 42G Free

$ cat /mnt/random > /dev/null

Mem: 2645M Active, 363M Inact, 3067M Wired, 1406M Buf, 41G Free

That's correct: 1 GB.

About "Buf" memory: is it reasonable to set it to 10% of physical
memory?  I lose 10 GB by default on machines with 96 GB.


--
Andrey Zonov
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: problems with mmap() and disk caching

2012-04-05 Thread Andrey Zonov

On 05.04.2012 23:41, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:

On 05.04.2012 19:54, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why this doesn't work without reading file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);



Thanks Alan!  Now it works as I expect!

But I have more questions to you and kib@.  They are in my test below.

So, prepare file as earlier, and take information about memory usage
from top(1).  After preparation, but before test:
Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0)

No super pages after first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0)

All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after process
was terminated, that's not good, what do you think?

Why do you think this is 'not good' ? You have plenty of free memory,
there is no memory pressure, and all pages were referenced recently.
There is no reason for them to be deactivated.



I always thought that active memory this is a sum of resident memory of 
all processes, inactive shows disk cache and wired shows kernel itself.




Read the file:
$ cat /mnt/random>  /dev/null

Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory.  I do not understand why so.

You do use UFS, right ?


Yes.


There is enough buffer headers and buffer KVA
to have buffers allocated for the whole file content. Since buffers wire
corresponding pages, you get pages migrated to wired.

When there appears a buffer pressure (i.e., any other i/o started),
the buffers will be repurposed and pages moved to inactive.



OK, how can I get amount of disk cache?



Could you please give me explanation about active/inactive/wired memory?



because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've seen
it triggered by demand paging of the gcc text segment. Also, I think
that pmap_remove_all() and especially vm_page_cache() are too severe for
a detection heuristic that is so easily triggered.


[snip]

--
Andrey Zonov


--
Andrey Zonov


Re: problems with mmap() and disk caching

2012-04-05 Thread Andrey Zonov

On 05.04.2012 19:54, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

[snip]

This is what I expect. But why this doesn't work without reading file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.


I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
vm_page_deactivate(mt);
else
vm_page_cache(mt);

to:

vm_page_dontneed(mt);



Thanks Alan!  Now it works as I expect!

But I have more questions to you and kib@.  They are in my test below.

So, prepare file as earlier, and take information about memory usage 
from top(1).  After preparation, but before test:

Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free

First run:
$ ./mmap /mnt/random
mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0)


No super pages after first run, why?..

Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free

Now the file is in inactive memory, that's good.

Second run:
$ ./mmap /mnt/random
mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0)


All super pages are here, nice.

Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free

Wow, all inactive pages moved to active and sit there even after process 
was terminated, that's not good, what do you think?


Read the file:
$ cat /mnt/random > /dev/null

Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free

Now the file is in wired memory.  I do not understand why so.

Could you please give me explanation about active/inactive/wired memory?



because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've seen
it triggered by demand paging of the gcc text segment. Also, I think
that pmap_remove_all() and especially vm_page_cache() are too severe for
a detection heuristic that is so easily triggered.


[snip]

--
Andrey Zonov


Re: problems with mmap() and disk caching

2012-04-04 Thread Andrey Zonov

I forgot to attach my test program.

On 04.04.2012 13:36, Andrey Zonov wrote:

On 04.04.2012 11:17, Konstantin Belousov wrote:


Calling madvise(MADV_RANDOM) fixes the issue, because the code to
deactivate/cache the pages is turned off. On the other hand, it also
turns off read-ahead for faulting, and the first loop becomes eternally
long.


Now it takes 5 times longer. Anyway, thanks for the explanation.



Doing MADV_WILLNEED does not fix the problem indeed, since willneed
reactivates the pages of the object at the time of call. To use
MADV_WILLNEED, you would need to call it between faults/memcpy.



I played with it, but no luck so far.
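
The suggestion to call MADV_WILLNEED between faults/memcpy can be sketched like this. It is only an illustration under stated assumptions: copy_with_willneed() is a hypothetical helper, not the loop from the attached test program, and the advice is treated as a best-effort hint whose failure does not stop the copy.

```c
#define _DEFAULT_SOURCE	/* exposes madvise() on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy `size` bytes from the mapping `src` into `dst`, `block` bytes at a
 * time, hinting each block with MADV_WILLNEED just before it is copied.
 * `src` should be page-aligned (as returned by mmap) and `block` should be
 * a multiple of the page size; otherwise madvise() may reject the hint.
 * Returns the number of madvise() calls that failed.
 */
size_t
copy_with_willneed(char *dst, char *src, size_t size, size_t block)
{
	size_t off, n, failed = 0;

	assert(block > 0);
	for (off = 0; off < size; off += n) {
		/* The last block may be shorter than `block`. */
		n = size - off < block ? size - off : block;
		if (madvise(src + off, n, MADV_WILLNEED) != 0)
			failed++;
		memcpy(dst + off, src + off, n);
	}
	return (failed);
}
```

Issuing the hint per block, instead of once right after mmap(), is the difference between tests 4 and 5 in the list further down in this message.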



I've also never seen super pages, how to make them work?

They just work, at least for me. Look at the output of procstat -v
after enough loops finished to not cause disk activity.



The problem was in my test program. I fixed it, now I see super pages
but I'm still not satisfied. There are several tests below:

1. With madvise(MADV_RANDOM) I see almost all super pages:
$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 26.438535 (none: 0; res: 262144; super: 511; other: 0)
mmap: 2 pass took: 0.187311 (none: 0; res: 262144; super: 511; other: 0)
mmap: 3 pass took: 0.184953 (none: 0; res: 262144; super: 511; other: 0)
mmap: 4 pass took: 0.186007 (none: 0; res: 262144; super: 511; other: 0)
mmap: 5 pass took: 0.185790 (none: 0; res: 262144; super: 511; other: 0)

Should it be 512?
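
As plain arithmetic, 262144 pages of 4096 bytes are 1 GB, and 1 GB divided into 2 MB super pages gives 512. A mapping that does not start on a 2 MB boundary, however, contains only 511 complete aligned 2 MB ranges. The sketch below computes that count; aligned_superpages() is a hypothetical helper, and whether alignment is really the reason for the 511 above is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of complete, naturally aligned super pages of size `spsize`
 * (a power of two) that fit inside the range [base, base + len).
 */
size_t
aligned_superpages(uintptr_t base, size_t len, size_t spsize)
{
	uintptr_t first, end;

	assert(spsize != 0 && (spsize & (spsize - 1)) == 0);
	/* Round the start up and the end down to super page boundaries. */
	first = (base + spsize - 1) & ~(uintptr_t)(spsize - 1);
	end = (base + len) & ~(uintptr_t)(spsize - 1);
	return (end > first ? (size_t)((end - first) / spsize) : 0);
}
```

For a 1 GB range and 2 MB super pages this returns 512 when `base` is 2 MB-aligned and 511 when `base` is offset by a single 4 KB page.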

2. Without madvise(MADV_RANDOM):
$ ./mmap /mnt/random-1024 50
mmap: 1 pass took: 7.629745 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.301720 (none: 261202; res: 942; super: 0; other: 0)
mmap: 3 pass took: 7.261416 (none: 260226; res: 1918; super: 1; other: 0)
[skip]
mmap: 49 pass took: 0.155368 (none: 0; res: 262144; super: 323; other: 0)
mmap: 50 pass took: 0.155438 (none: 0; res: 262144; super: 323; other: 0)

Only 323 pages.

3. If I just re-run the test, I don't see super pages with any "block" size.

$ ./mmap /mnt/random-1024 5 $((1<<30))
mmap: 1 pass took: 1.013939 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.267082 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.270711 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.268940 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.269634 (none: 0; res: 262144; super: 0; other: 0)

4. If I activate madvise(MADV_WILLNEED) in the copy loop and re-run the test,
then I see super pages only if I use a "block" greater than 2 MB.

$ ./mmap /mnt/random-1024 1 $((1<<21))
mmap: 1 pass took: 0.299722 (none: 0; res: 262144; super: 0; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<22))
mmap: 1 pass took: 0.271828 (none: 0; res: 262144; super: 170; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<23))
mmap: 1 pass took: 0.333188 (none: 0; res: 262144; super: 258; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<24))
mmap: 1 pass took: 0.339250 (none: 0; res: 262144; super: 303; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<25))
mmap: 1 pass took: 0.418812 (none: 0; res: 262144; super: 324; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<26))
mmap: 1 pass took: 0.360892 (none: 0; res: 262144; super: 335; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<27))
mmap: 1 pass took: 0.401122 (none: 0; res: 262144; super: 342; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<28))
mmap: 1 pass took: 0.478764 (none: 0; res: 262144; super: 345; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<29))
mmap: 1 pass took: 0.607266 (none: 0; res: 262144; super: 346; other: 0)
$ ./mmap /mnt/random-1024 1 $((1<<30))
mmap: 1 pass took: 0.901269 (none: 0; res: 262144; super: 347; other: 0)

5. If I activate madvise(MADV_WILLNEED) immediately after mmap() then I
see some number of super pages (the number from test #2).

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.178666 (none: 0; res: 262144; super: 323; other: 0)
mmap: 2 pass took: 0.158889 (none: 0; res: 262144; super: 323; other: 0)
mmap: 3 pass took: 0.157229 (none: 0; res: 262144; super: 323; other: 0)
mmap: 4 pass took: 0.156895 (none: 0; res: 262144; super: 323; other: 0)
mmap: 5 pass took: 0.162938 (none: 0; res: 262144; super: 323; other: 0)

6. If I read the file manually before the test, then I don't see super pages
with any "block" size, and madvise(MADV_WILLNEED) doesn't help.

$ ./mmap /mnt/random-1024 5 $((1<<30))
mmap: 1 pass took: 0.996767 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.311129 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.317430 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.314437 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.310757 (none: 0; res: 262144; super: 0; other: 0)




--
Andrey Zonov
/*_
 * Andrey Zonov (c) 2011
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int
main(int argc, char **argv)
{
int i;
  

Re: problems with mmap() and disk caching

2012-04-04 Thread Andrey Zonov
7430 (none:  0; res: 262144; super: 
0; other:  0)
mmap:  4 pass took:   0.314437 (none:  0; res: 262144; super: 
0; other:  0)
mmap:  5 pass took:   0.310757 (none:  0; res: 262144; super: 
0; other:  0)



--
Andrey Zonov


problems with mmap() and disk caching

2012-04-03 Thread Andrey Zonov

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page needs to be touched
only once to get it into memory (disk cache?), but this doesn't work!


I wrote a test (attached) and ran it on a 1 GB file generated from
/dev/random; the result is the following:


Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)


If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)


This is what I expect.  But why doesn't this work without reading the file
manually?


I've also never seen super pages; how do I make them work?

I've been playing with madvise and posix_fadvise but had no luck.  BTW,
posix_fadvise(POSIX_FADV_WILLNEED) does nothing, as the comment in the
source says; shouldn't this be documented in the manual page?


All tests were run under 9.0-STABLE (r233744).

--
Andrey Zonov
/*_
 * Andrey Zonov (c) 2011
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int
main(int argc, char **argv)
{
int i;
int fd;
int num;
int block;
int pagesize;
size_t n;
size_t size;
size_t none, incore, super, other;
char *p;
char *tmp;
char *vec;
char *vecp;
struct stat sb;
struct timeval tp, tp1, tp2;

if (argc < 2 || argc > 4)
errx(1, "usage: mmap  [num] [block]");

fd = open(argv[1], O_RDONLY);
if (fd == -1)
err(1, "open()");

num = 1;
if (argc >= 3)
num = atoi(argv[2]);

pagesize = getpagesize();
block = pagesize;
if (argc == 4)
block = atoi(argv[3]);

if (fstat(fd, &a

Re: backup BIOS settings

2012-02-08 Thread Andrey Zonov

On 10.01.2012 7:01, Łukasz Kurek wrote:

Hi,
Is it possible to backup BIOS settings (CMOS configuration) to file and restore 
this settings on the other machine (the same hardware configuration and the 
same BIOS)?

I tried to do it this way:

kldload nvram

dd if=/dev/nvram of=nvram.bin   (backup)

dd if=nvram.bin of=/dev/nvram   (restore)


but this always loads the default BIOS settings, not mine (probably there is some
kind of error).


Try sysutils/nvramtool instead.

--
Andrey Zonov


Re: Does anyone use nscd?

2011-10-13 Thread Andrey Zonov

Nope, because of http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130749

--
Andrey Zonov


04.10.2011 19:20, Dag-Erling Smørgrav wrote:

Does anyone actually use nscd?



Re: issues with kern.devstat.all

2011-03-21 Thread Andrey Zonov

Hi,

This sysctl contains binary data. You can view it with sysctl's -o or -x
option.

Additional information is in the devstat(3) manpage.

--
Andrey Zonov


20.03.2011 20:51, Alexander Best wrote:

hi there,

could somebody explain the following behavior running a recent CURRENT on amd64?

otaku% sysctl kern.devstat
kern.devstat.numdevs: 6
kern.devstat.generation: 222
kern.devstat.version: 6
otaku% sysctl kern.devstat.all
otaku% echo $?
0
otaku% sysctl -d kern.devstat.all
kern.devstat.all: All devices in the devstat list

cheers.
alex




segfault in libz's longest_match()

2011-03-15 Thread Andrey Zonov
Hi,

After updating to svn://svn.freebsd.org/base/stable/8@215508, some Python
scripts have started to segfault.
I took a core dump and found that the problem is in libz's
longest_match() function. I think my problem is similar to PR
http://www.freebsd.org/cgi/query-pr.cgi?pr=154073.

Does anybody know what's going on?

PS: The backtrace looks like this:
(gdb) bt
#0  0x341552a6 in longest_match () from /lib/libz.so.5
#1  0x34154230 in deflateParams () from /lib/libz.so.5
#2  0x3415370c in deflate () from /lib/libz.so.5
#3  0x3436653d in PyZlib_objcompress () from
/usr/local/lib/python2.6/lib-dynload/zlib.so
#4  0x0047d2ed in PyEval_EvalFrameEx ()
#5  0x0047e194 in PyEval_EvalFrameEx ()
#6  0x0047eb41 in PyEval_EvalCodeEx ()
#7  0x0047c95b in PyEval_EvalFrameEx ()
#8  0x0047eb41 in PyEval_EvalCodeEx ()
#9  0x0047c95b in PyEval_EvalFrameEx ()
#10 0x0047e194 in PyEval_EvalFrameEx ()
#11 0x0047e194 in PyEval_EvalFrameEx ()
#12 0x0047eb41 in PyEval_EvalCodeEx ()
#13 0x004c839c in PyClassMethod_New ()
#14 0x00417edd in PyObject_Call ()
#15 0x0047b08b in PyEval_EvalFrameEx ()
#16 0x0047e194 in PyEval_EvalFrameEx ()
#17 0x0047e194 in PyEval_EvalFrameEx ()
#18 0x0047eb41 in PyEval_EvalCodeEx ()
#19 0x004c829d in PyClassMethod_New ()
#20 0x00417edd in PyObject_Call ()
#21 0x0047b08b in PyEval_EvalFrameEx ()
#22 0x0047e194 in PyEval_EvalFrameEx ()
#23 0x0047e194 in PyEval_EvalFrameEx ()
#24 0x0047e194 in PyEval_EvalFrameEx ()
#25 0x0047e194 in PyEval_EvalFrameEx ()
#26 0x0047e194 in PyEval_EvalFrameEx ()
#27 0x0047eb41 in PyEval_EvalCodeEx ()
#28 0x004c839c in PyClassMethod_New ()
#29 0x00417edd in PyObject_Call ()
#30 0x0047b08b in PyEval_EvalFrameEx ()
#31 0x0047eb41 in PyEval_EvalCodeEx ()
#32 0x004c839c in PyClassMethod_New ()
#33 0x00417edd in PyObject_Call ()
#34 0x0047b08b in PyEval_EvalFrameEx ()
#35 0x0047eb41 in PyEval_EvalCodeEx ()
#36 0x004c839c in PyClassMethod_New ()
#37 0x00417edd in PyObject_Call ()
#38 0x0047b08b in PyEval_EvalFrameEx ()
#39 0x0047eb41 in PyEval_EvalCodeEx ()
#40 0x004c839c in PyClassMethod_New ()
#41 0x00417edd in PyObject_Call ()
#42 0x0047b08b in PyEval_EvalFrameEx ()
#43 0x0047eb41 in PyEval_EvalCodeEx ()
#44 0x0047c95b in PyEval_EvalFrameEx ()
#45 0x0047e194 in PyEval_EvalFrameEx ()
#46 0x0047e194 in PyEval_EvalFrameEx ()
#47 0x0047eb41 in PyEval_EvalCodeEx ()
#48 0x0047c95b in PyEval_EvalFrameEx ()
#49 0x0047eb41 in PyEval_EvalCodeEx ()
#50 0x0047ec22 in PyEval_EvalCode ()
#51 0x004984b2 in Py_CompileString ()
#52 0x00498586 in PyRun_FileExFlags ()
#53 0x00499a0f in PyRun_SimpleFileExFlags ()
#54 0x00413de3 in Py_Main ()
#55 0x004131fa in main ()
(gdb) info threads
* 1 Thread 32e041c0 (LWP 101629)  0x341552a6 in longest_match () from /lib/libz.so.5

-- 
Andrey Zonov


[patch] rresvport_af(3) uses setsockopt(SO_REUSEADDR)

2010-11-24 Thread Andrey Zonov

Hi,

I've made a patch for rresvport_af(3) and rcmd_af(3) that makes it
possible to use more connections for rsh/rshd.
I've also reviewed the FreeBSD src tree, and I think these changes in libc do
not break any existing applications.


Can anybody look at the patch?
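
For readers who want the idea without reading the whole diff, here is a
minimal sketch of the retry loop the patch adds around rresvport_af(3) and
connect(2). The callback types, reserved_connect() and the demo_* functions
are hypothetical names made up for this sketch so that it can be tried
without root; the lower bound on lport is also an addition of the sketch
and is not in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>	/* IPPORT_RESERVED */

/* Hypothetical callbacks standing in for rresvport_af(3), connect(2), close(2). */
typedef int (*alloc_fn)(int *lport);	/* fd >= 0, or -1 with errno set */
typedef int (*connect_fn)(int fd);	/* 0, or -1 with errno set */
typedef void (*close_fn)(int fd);

/*
 * Walk down from IPPORT_RESERVED - 1 until a local port both binds and
 * connects.  Returns the connected descriptor, or -1 with errno set.
 */
int
reserved_connect(alloc_fn alloc, connect_fn conn, close_fn cls)
{
	int lport = IPPORT_RESERVED - 1;
	int s, saved;

	assert(alloc != NULL && conn != NULL && cls != NULL);
	for (;;) {
		/* Lower bound added for this sketch; not part of the patch. */
		if (lport < IPPORT_RESERVED / 2) {
			errno = EAGAIN;
			return (-1);
		}
		s = alloc(&lport);
		if (s < 0) {
			if (errno == EADDRINUSE || errno == EADDRNOTAVAIL) {
				lport--;	/* this port is busy, try the next one */
				continue;
			}
			return (-1);
		}
		if (conn(s) == 0)
			return (s);
		saved = errno;
		cls(s);
		if (saved == EADDRINUSE) {
			lport--;	/* local port clashes with an existing connection */
			continue;
		}
		errno = saved;
		return (-1);
	}
}

/*
 * Fake environment for trying the loop out: port 1023 cannot be bound,
 * 1022 binds but cannot connect, 1021 works.  The "descriptor" returned
 * is simply the port number.
 */
static int demo_closed;

static int
demo_alloc(int *lport)
{
	if (*lport == 1023) {
		errno = EADDRINUSE;
		return (-1);
	}
	return (*lport);
}

static int
demo_connect(int fd)
{
	if (fd == 1022) {
		errno = EADDRINUSE;
		return (-1);
	}
	return (0);
}

static void
demo_close(int fd)
{
	(void)fd;
	demo_closed++;
}
```

With the fake callbacks, reserved_connect(demo_alloc, demo_connect,
demo_close) skips 1023 and 1022 and settles on 1021, closing one
descriptor on the way.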

--
Andrey Zonov

Index: libexec/rshd/rshd.c
===
--- libexec/rshd/rshd.c (revision 215508)
+++ libexec/rshd/rshd.c (working copy)
@@ -278,11 +278,6 @@
(void) alarm(0);
if (port != 0) {
int lport = IPPORT_RESERVED - 1;
-   s = rresvport_af(&lport, af);
-   if (s < 0) {
-   syslog(LOG_ERR, "can't get stderr port: %m");
-   exit(1);
-   }
if (port >= IPPORT_RESERVED ||
port < IPPORT_RESERVED/2) {
syslog(LOG_NOTICE|LOG_AUTH,
@@ -291,10 +286,31 @@
port);
exit(1);
}
-   *((in_port_t *)&fromp->sa_data) = htons(port);
-   if (connect(s, fromp, fromp->sa_len) < 0) {
-   syslog(LOG_INFO, "connect second port %d: %m", port);
-   exit(1);
+   for ( ;; ) {
+   s = rresvport_af(&lport, af);
+   if (s < 0) {
+   if (errno == EADDRINUSE ||
+   errno == EADDRNOTAVAIL) {
+   lport--;
+   continue;
+   }
+   if (errno == EAGAIN)
+   syslog(LOG_ERR, "socket: all ports in use");
+   else
+   syslog(LOG_ERR, "can't get stderr port: %m");
+   exit(1);
+   }
+   *((in_port_t *)&fromp->sa_data) = htons(port);
+   if (connect(s, fromp, fromp->sa_len) < 0) {
+   if (errno == EADDRINUSE) {
+   lport--;
+   close(s);
+   continue;
+   }
+   syslog(LOG_INFO, "connect second port %d: %m", port);
+   exit(1);
+   }
+   break;
}
}
 
@@ -535,11 +551,11 @@
char c;
 
do {
+   if (cnt-- == 0)
+   rshd_errx(1, "%s too long", error);
if (read(STDIN_FILENO, &c, 1) != 1)
exit(1);
*buf++ = c;
-   if (--cnt == 0)
-   rshd_errx(1, "%s too long", error);
} while (c != 0);
 }
 
Index: lib/libc/net/rcmd.c
===
--- lib/libc/net/rcmd.c (revision 215508)
+++ lib/libc/net/rcmd.c (working copy)
@@ -152,6 +152,11 @@
for (timo = 1, lport = IPPORT_RESERVED - 1;;) {
s = rresvport_af(&lport, ai->ai_family);
if (s < 0) {
+   if (errno == EADDRINUSE ||
+   errno == EADDRNOTAVAIL) {
+   lport--;
+   continue;
+   }
if (errno != EAGAIN && ai->ai_next) {
ai = ai->ai_next;
continue;
@@ -212,17 +217,34 @@
fprintf(stderr, "Trying %s...\n", paddr);
}
}
-   lport--;
+   lport = IPPORT_RESERVED - 1;
if (fd2p == 0) {
_write(s, "", 1);
lport = 0;
} else {
-   int s2 = rresvport_af(&lport, ai->ai_family), s3;
+   int s2, s3;
socklen_t len = ai->ai_addrlen;
int nfds;
 
-   if (s2 < 0)
-   goto bad;
+   for ( ;; ) {
+   s2 = rresvport_af(&lport, ai->ai_family);
+   if (s2 < 0) {
+   if (errno == EADDRINUSE ||
+   errno == EADDRNOTAVAIL) {
+   lport--;
+   continue;
+   }
+   if (errno == EAGAIN)
+   (void)fprintf(stderr,
+   "rcmd: socket2: All ports in use\n");
+   

Re: How change process flags from userland?

2010-07-14 Thread Andrey Zonov
Hi,

I resolved this problem (thanks to Julian Elischer for his thoughts):

===
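int fd;
int cnt;
off_t off;
void *p;
kvm_t *kd;
struct kinfo_proc *kip;
struct proc *p_mmap;

/* Look up the kinfo_proc entry of the target pid (error checks omitted). */
kd = kvm_open(NULL, _PATH_MEM, NULL, O_RDONLY, NULL);
kip = kvm_getprocs(kd, KERN_PROC_PID, pid, &cnt);
/* ki_paddr is the kernel address of the process's struct proc. */
fd = open(_PATH_KMEM, O_RDWR, 0);
off = (off_t)((uintptr_t)kip->ki_paddr);
/* Map that struct proc from /dev/kmem so it can be modified in place. */
p = mmap(0, sizeof(struct proc), PROT_READ | PROT_WRITE,
MAP_SHARED, fd, off);
p_mmap = (struct proc *)p;
/* Set the flag directly in kernel memory. */
p_mmap->p_flag |= P_PROTECTED;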
...
===

I wrote a daemon [1] that sets the P_PROTECTED flag for applications. Maybe
it will be useful to someone.
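
One note on the mmap(2) step: the offset used above is a kernel address
and is generally not page-aligned. Some mmap(2) implementations reject
such an offset, so a more portable variant rounds the offset down to a
page boundary and adds the remainder back to the returned pointer. The
sketch below shows only that arithmetic, on an ordinary scratch file
rather than /dev/kmem; struct mapping, map_object(), unmap_object() and
demo_roundtrip() are names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

struct mapping {
	void	*base;	/* page-aligned address returned by mmap(2) */
	size_t	 len;	/* length given to mmap(2), needed for munmap(2) */
	void	*obj;	/* address of the requested object inside the mapping */
};

/* Map `size` bytes that start at an arbitrary file offset `off`. */
int
map_object(int fd, off_t off, size_t size, struct mapping *m)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	off_t pageoff;

	assert(m != NULL && size > 0 && off >= 0);
	if (pagesize <= 0)
		return (-1);
	pageoff = off % pagesize;		/* distance past the page start */
	m->len = (size_t)pageoff + size;
	m->base = mmap(NULL, m->len, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, off - pageoff);
	if (m->base == MAP_FAILED)
		return (-1);
	m->obj = (char *)m->base + pageoff;
	return (0);
}

void
unmap_object(const struct mapping *m)
{
	munmap(m->base, m->len);
}

/*
 * Try it on a scratch file: store one byte through the mapping at an
 * unaligned offset and read it back with pread(2).  Returns the byte
 * read, or -1 on error.
 */
int
demo_roundtrip(void)
{
	char path[] = "/tmp/mapobjXXXXXX";
	struct mapping m;
	unsigned char c = 0;
	long ps = sysconf(_SC_PAGESIZE);
	off_t off = (off_t)ps + 100;	/* deliberately not page-aligned */
	int fd = mkstemp(path);

	if (fd < 0)
		return (-1);
	unlink(path);
	if (ftruncate(fd, 2 * (off_t)ps) != 0 ||
	    map_object(fd, off, 1, &m) != 0) {
		close(fd);
		return (-1);
	}
	*(unsigned char *)m.obj = 0x5a;	/* write through the mapping */
	msync(m.base, m.len, MS_SYNC);
	unmap_object(&m);
	if (pread(fd, &c, 1, off) != 1) {
		close(fd);
		return (-1);
	}
	close(fd);
	return (c);
}
```

In the /dev/kmem case the same helper would be called with the
descriptor and the ki_paddr value from the snippet above, and m.obj
would then be cast to struct proc *.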

[1] http://zonov.pp.ru/pprotectd/pprotectd.tbz

-- 
Andrey Zonov

2010/6/30 Andrey Zonov :
> Hi,
>
> I want to set the P_PROTECTED flag for some daemons after they start, without
> patching the application or the kernel.
> Is it possible?


Re: How change process flags from userland?

2010-06-30 Thread Andrey Zonov

Can you explain how to change flags with /dev/kmem?
kvm_write(3) does not work for this.

Julian Elischer wrote:
> On 6/30/10 11:23 AM, Andrey Zonov wrote:
>> Yes, but I want to change process flags without kernel hacking, loading
>> modules, or modifying applications.
>
> You are going to have to do one of those.
> The only alternative is that, if you have root, you can modify a
> process's flags using gdb and /dev/kmem.
> You could use a program written specially to do it if you have root,
> but if that's not what you want then you will need to add a syscall to
> do what you want, as far as I can see.
>
>> Andrey V. Elsukov wrote:
>>> On 30.06.2010 10:26, Andrey Zonov wrote:
>>>> Hi,
>>>>
>>>> I want to set the P_PROTECTED flag for some daemons after they start,
>>>> without patching the application or the kernel.
>>>> Is it possible?
>>>
>>> Did you try sysutils/scprotect?

--
Andrey Zonov



Re: How change process flags from userland?

2010-06-30 Thread Andrey Zonov
Yes, but I want to change process flags without kernel hacking, loading
modules, or modifying applications.

Andrey V. Elsukov wrote:
> On 30.06.2010 10:26, Andrey Zonov wrote:
>> Hi,
>>
>> I want to set the P_PROTECTED flag for some daemons after they start, without
>> patching the application or the kernel.
>> Is it possible?
>
> Did you try sysutils/scprotect?

  


--
Andrey Zonov



How change process flags from userland?

2010-06-29 Thread Andrey Zonov

Hi,

I want to set the P_PROTECTED flag for some daemons after they start, without
patching the application or the kernel.

Is it possible?

--
Andrey Zonov



Re: 2 bytes allocated problems

2010-02-24 Thread Andrey Zonov

And how does free() know how much it needs to release?
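
One textbook answer, shown here only as a sketch: the allocator records
the size itself, for example in a small header placed just in front of
the block it hands out, so free() needs nothing but the pointer. Real
allocators, including the one in FreeBSD, may keep this bookkeeping
somewhere else entirely; toy_malloc(), toy_size() and toy_free() are
invented names layered on top of the system malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header kept just in front of every block handed out. */
union toy_header {
	size_t		size;	/* number of bytes the caller asked for */
	max_align_t	align;	/* keeps the user data suitably aligned */
};

/* Allocate `size` bytes and remember the size in the hidden header. */
void *
toy_malloc(size_t size)
{
	union toy_header *h = malloc(sizeof(*h) + size);

	if (h == NULL)
		return (NULL);
	h->size = size;
	return (h + 1);		/* the caller sees only the bytes after the header */
}

/* Recover the recorded size from nothing but the user pointer. */
size_t
toy_size(const void *p)
{
	assert(p != NULL);
	return (((const union toy_header *)p - 1)->size);
}

/* Step back to the header and release the whole block. */
void
toy_free(void *p)
{
	if (p != NULL)
		free((union toy_header *)p - 1);
}
```

The caller still has to remember its own size: the header tells the
allocator what to release, but nothing stops the program from writing
past the end of the block.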

Dag-Erling Smørgrav wrote:
> Andrey Zonov  writes:
>> When I try to allocate a pointer to a pointer, and some pointers inside it
>> (important: the size is 2 bytes), the pointers lose their boundaries.
>
> Pointers have no boundaries in C.
>
>> PS: In FreeBSD < 7 it's OK, and in Linux too.
>
> Only by accident.
>
> DES
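
The attached alloc.c is not reproduced in the archive, so the following
is only a guess at the kind of code involved: a table of pointers where
each allocation must be sized in units of the element type, not in raw
byte counts. table_new() and table_free() are names invented for this
sketch. Writing past a block that is too small is undefined behaviour,
and whether it appears to work depends on how much slack a given
allocator happens to leave behind each block.

```c
#include <assert.h>
#include <stdlib.h>

void table_free(char **t, size_t n);

/* Allocate a table of n pointers, each pointing at a zeroed len-byte buffer. */
char **
table_new(size_t n, size_t len)
{
	char **t;
	size_t i;

	assert(n > 0 && len > 0);
	t = calloc(n, sizeof(*t));	/* n * sizeof(char *), not n bytes */
	if (t == NULL)
		return (NULL);
	for (i = 0; i < n; i++) {
		t[i] = calloc(len, 1);	/* exactly len usable bytes */
		if (t[i] == NULL) {
			table_free(t, i);	/* release what was built so far */
			return (NULL);
		}
	}
	return (t);
}

/* Release the first n buffers and then the table itself. */
void
table_free(char **t, size_t n)
{
	size_t i;

	if (t == NULL)
		return;
	for (i = 0; i < n; i++)
		free(t[i]);
	free(t);
}
```

With table_new(2, 2), valid indexes are t[0] and t[1], and each buffer
holds exactly two bytes; anything beyond that belongs to the allocator.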
  



2 bytes allocated problems

2010-02-24 Thread Andrey Zonov
Hi,

When I try to allocate a pointer to a pointer, and some pointers inside it
(important: the size is 2 bytes), the pointers lose their boundaries.
Why can this happen?

A test program is attached.

PS: In FreeBSD < 7 it's OK, and in Linux too.

-- 
Andrey Zonov


alloc.c
Description: Binary data