Out of swap handling and X lockups in 3.2R

1999-09-21 Thread Ivan


Hi,

I have a couple of questions about the way 'out of swap' situations are
handled in FreeBSD. Not that my system often runs out of swap; I'm just
curious:

When the system runs out of swap space, it is supposed to kill the
'biggest' process to regain some space.
I wrote a little program to test this behaviour, basically something like

/* gradually ask for memory at each key stroke */
char *a;
while (getchar() != EOF)
{
  a = malloc(SIZE);
  assert(a != NULL);
  memset(a, 0, SIZE);   /* touch the pages so they really get backed */
}

where SIZE was 4 MB in this case. I ran it on the console (I've got 64 MB
of RAM and 128 MB of swap) until the swap pager ran out of space, and
my huge process was eventually killed as expected. Fine. But when I ran
it under X Window, the system eventually killed the X server (SIZE ~20 MB,
RES ~14 MB -- the biggest RES size) instead of my big process (SIZE ~100
MB, RES 0K).

My question is: why was the X server killed? Was it because the 'biggest'
process is the one with the largest resident memory size?
And if so, why not take the total size of processes into account?

This leads me to another (not related to swap) question:

When the X server is killed, the machine simply hangs, without any
reaction to Ctrl-Alt-F1 or even Ctrl-Alt-Del. Is that the normal
behaviour? (I think it should return the user to the console?!)
Is there any workaround?

TIA,

Ivan



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Ivan


> :where SIZE was 4 MB in this case. I ran it on the console (I've got 64 MB
> :of RAM and 128 MB of swap) until the swap pager went out of space and
> :my huge process was eventually killed as expected. Fine. But when I ran 
> :it under X Window, the system eventually killed the X server (SIZE ~20 MB,
> :RES ~14 MB -- the biggest RES size) instead of my big process (SIZE ~100
> :MB, RES 0K). 
> :
> :My question is: Why was the X server killed ? Was it because the 'biggest'
> :process is the one with the biggest resident memory size ?
> :And if so, why not take into account the total size of processes ?
> 
> The algorithm is pretty dumb.  In fact, it would not be too difficult
> to actually calculate the amount of swap being used by a process and
> add that to the RSS when figuring out who to kill.
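A minimal sketch of the difference Matt describes, using a made-up `proc_info` record rather than FreeBSD's real process and vmspace accounting (this is not the vm_pageout.c code). Comparing RSS alone reproduces the choice Ivan saw; adding swap usage to RSS points at the runaway program instead.

```c
#include <stddef.h>

/* Hypothetical per-process record, sizes in KB (not FreeBSD's struct proc). */
struct proc_info {
  const char *name;
  long rss_kb;    /* resident set size */
  long swap_kb;   /* pages currently held in swap */
};

/*
 * Return the index of the "biggest" of n processes (n must be > 0).
 * With count_swap == 0 only RSS is compared; otherwise RSS + swap usage
 * is the metric.  Ties go to the earlier entry.
 */
static size_t pick_victim(const struct proc_info *p, size_t n, int count_swap)
{
  size_t best = 0;
  long best_size = -1;

  for (size_t i = 0; i < n; i++) {
    long size = p[i].rss_kb + (count_swap ? p[i].swap_kb : 0);
    if (size > best_size) {
      best_size = size;
      best = i;
    }
  }
  return best;
}
```

With Ivan's rough figures, and assuming the non-resident part of the X server (about 6 MB) sits in swap, the entries `{ "X", 14336, 6144 }` and `{ "memhog", 0, 102400 }` give index 0 (the X server) with `count_swap == 0` and index 1 (the test program) with `count_swap == 1`.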

Thank you for your explanations!
I had a look at vm_pageout.c and noticed that situations may occur where
no process can be killed. I guess that in such situations memory
allocation requests are simply rejected (e.g. malloc returning NULL).
Is there a reason why this isn't the default behavior in FreeBSD? I.e.,
why does the system always try to kill a process?

> 
> The X server wasn't killed nicely, it couldn't take you out of the
> video mode.
> 
Indeed, the 'biggest' process is SIGKILLed without any prior notice. Would
it be possible to send it a nicer signal first, to give it a chance to
quit before being killed?

A last question, to FreeBSD developers:
After a few tests, I came to the conclusion that it's quite easy to crash
a vanilla FreeBSD system (without any per-user/per-process limits set) by
simply running it out of swap space... (the 'kill the biggest process'
mechanism doesn't always seem to work!?)
Is this issue currently being addressed, or is it simply not considered
an issue?

Thanks in advance for your time,

Ivan

>   Matthew Dillon 
>   <[EMAIL PROTECTED]>
> 






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Ivan


On Thu, 23 Sep 1999, Daniel C. Sobral wrote:

> > I had a look at vm_pageout.c and noticed that situations may occur where
> > no process can be killed. I guess that in such situations memory
> > allocation requests are simply rejected (e.g. malloc returning NULL).
> 
> Err... no. Malloc() does not "call" these functions. By the time a
> pageout is requested, the malloc() has already finished. The pageout
> is being requested because a program is trying to use the memory
> that was allocated to it.

Of course I didn't mean that malloc() calls the pageout daemon... I
simply meant that if no more memory can be reclaimed (in particular
by killing a process), then at some point memory allocations will be
refused -- otherwise, when does malloc() ever return NULL?!

> > Is there a reason why this isn't the default behavior in FreeBSD? I.e.,
> > why does the system always try to kill a process?
> 
> If no process can be killed, the system will panic (or deadlock).
> 
> > Indeed, the 'biggest' process is SIGKILLed without any prior notice. Would
> > it be possible to send it a nicer signal first, to give it a chance to
> > quit before being killed?
> 
> I'd very much like to see swap space being taken into account in
> addition to RSS. A runaway program is more likely to have a low RSS
> and a large swap than a large RSS.
>
> Anyway, some Unix systems do send a signal in low memory conditions.
> In AIX (the one I'm most familiar with) it is called SIGDANGER, and
> its handler defaults to SIG_IGN.
>
> One reason why we do not do this is the lack of support for more
> than 32 signals. Alas, I think we now support more than 32 signals,
> don't we? If that's the case, I'd think it shouldn't be too
> difficult to make the swapper send SIGDANGER to all processes when
> it reaches a certain threshold (x% full? x MB left?).

Or even simply send SIGTERM, for instance, before SIGKILL... at least
that would be understood by many processes (such as the X server).
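On the application side, catching such a warning is simple. The sketch below assumes the kernel (or some other agent) sends SIGTERM before SIGKILL, which is only the proposal above, not something FreeBSD 3.2 does; SIGDANGER is AIX-only, so it is handled only where the system headers define it. The handler only sets a flag, since very little is safe to do inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; only async-signal-safe work is done inside it. */
static volatile sig_atomic_t got_term = 0;

static void on_term(int sig)
{
  (void)sig;
  got_term = 1;   /* the main loop polls this and shuts down cleanly */
}

/*
 * Install on_term for SIGTERM, and for SIGDANGER on systems that define
 * it (AIX).  Returns 0 on success, -1 if sigaction() fails.
 */
static int install_shutdown_handlers(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_term;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGTERM, &sa, NULL) == -1)
    return -1;
#ifdef SIGDANGER
  if (sigaction(SIGDANGER, &sa, NULL) == -1)
    return -1;
#endif
  return 0;
}
```

A server would poll `got_term` in its main loop and, for example, restore the text console before exiting. Nothing like this helps against SIGKILL, which cannot be caught.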

> > A last question, to FreeBSD developers:
> > After a few tests, I came to the conclusion that it's quite easy to crash
> > a vanilla FreeBSD system (without any per-user/per-process limits set) by
> > simply running it out of swap space... (the 'kill the biggest process'
> > mechanism doesn't always seem to work!?)
> 
> 'kill the biggest process' should always work. Do you have any test
> case where it doesn't?
>

I logged in and ran this little program this morning on a FreeBSD 3.2R box
(128 MB RAM, 300 MB swap) (try this at home :-):

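#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ISIZE (180 * 1024 * 1024)  /* initial allocation: 180 MB */
#define SIZE  (1024 * 1024)        /* extra allocation per key stroke: 1 MB */

int main(void)
{
  char *a;

  /* Grab a big block and touch every page so it is really backed. */
  a = malloc(ISIZE);
  assert(a != NULL);
  memset(a, 0, ISIZE);
  printf("Initial size: %d bytes\n", ISIZE);

  /* Then ask for another SIZE bytes at each key stroke (never freed). */
  while (getchar() != EOF)
  {
    printf("Allocating %d bytes\n", SIZE);
    a = malloc(SIZE);
    assert(a != NULL);
    memset(a, 0, SIZE);
  }
  return 0;
}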

The machine wasn't heavily loaded (no swapping, active pages ~20% of RAM).
I let the program ask for memory (pressed a key a number of times),
leaving enough time, though, for my process to be almost totally
swapped out (and thus ignored by the 'kill the biggest' routine). After a
while, having reached a '99% swap used' state, everything locked up (remote
connections, console, etc.). I couldn't even tell which process had been
killed, or whether anything had been killed at all -- we had to reboot :-(
Still, I'm not certain that this is related to a bug in the pageout daemon...

> > Is this issue currently being addressed, or is it simply not considered
> > an issue?
> 
> FreeBSD's memory overcommit behavior is not considered an issue by
> anyone with the knowledge to do something about it. In fact, these
> people consider FreeBSD's behavior to be a gain over
> non-overcommitting systems (such as Solaris). A lot of people share
> this opinion, and some people strongly disagree.

At least I think that this overcommit behaviour should be better documented :-)
 
> As for the problems that might result from it, the solution is to
> use per-process limits through login.conf, and be a good
> administrator.
> 
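Limits from login.conf are applied to the login session as ordinary per-process resource limits (setrlimit(2)). A small userland sketch of the effect, setting RLIMIT_AS directly rather than going through any particular login.conf capability: with a cap in place, an oversized malloc() simply returns NULL, which is one answer to "when does malloc() ever return NULL?". The function name and the sizes in the usage note are arbitrary.

```c
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Temporarily lower this process's address-space soft limit (RLIMIT_AS)
 * to `limit` bytes, try malloc(request), then restore the previous limit.
 * Returns 1 if malloc succeeded, 0 if it returned NULL, -1 on error.
 */
static int malloc_ok_under_limit(rlim_t limit, size_t request)
{
  struct rlimit old, capped;
  void *p;
  int ok;

  if (getrlimit(RLIMIT_AS, &old) == -1)
    return -1;
  capped = old;
  /* The soft limit may not exceed the hard limit. */
  if (old.rlim_max == RLIM_INFINITY || limit < old.rlim_max)
    capped.rlim_cur = limit;
  if (setrlimit(RLIMIT_AS, &capped) == -1)
    return -1;

  p = malloc(request);   /* NULL if the mapping would exceed the cap */
  ok = (p != NULL);
  free(p);

  /* Raising the soft limit back is allowed: the hard limit is unchanged. */
  if (setrlimit(RLIMIT_AS, &old) == -1)
    return -1;
  return ok;
}
```

For example, `malloc_ok_under_limit((rlim_t)512 << 20, (size_t)1 << 30)` would be expected to return 0 on a typical system, while a 4 KB request under the same cap succeeds.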


> --
> Daniel C. Sobral  (8-DCS)
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> 
>   "Thus, over the years my wife and I have physically diverged. While
> I have zoomed toward a crusty middle-age, she has instead clung
> doggedly to the sweet bloom of youth. Naturally I think this unfair.
> Yet, if it was the other way around, I confess I wouldn't be happy
> either."
> 






Re: Are POSIX mqueues supposed to be functional on FreeBSD?

2010-06-23 Thread Ivan Voras
On 06/21/10 02:25, Garrett Cooper wrote:

> For whatever reason my source tree wasn't prebuilt, so I reran
> buildkernel and everything was fine once again.

So, do the tests pass now? :)

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: an alternative to powerpoint

2010-07-13 Thread Ivan Voras
On 07/13/10 06:15, Luigi Rizzo wrote:

> Have fun, it would be great if you could report how it works
> on fancy devices (iphone, ipad, androids...) 

For what it's worth, it doesn't work at all on Android :) (and the
layout is messed up)





Re: disk I/O, VFS hirunningspace

2010-07-15 Thread Ivan Voras
On 07/14/10 18:27, Jerry Toung wrote:
> On Wed, Jul 14, 2010 at 12:04 AM, Gary Jennejohn
> wrote:
> 
>>
>>
>> Rather than commenting out the code try setting the sysctl
>> vfs.hirunningspace to various powers-of-two.  Default seems to be
>> 1MB.  I just changed it on the command line as a test to 2MB.
>>
>> You can do this in /etc/sysctl.conf.
>>
>>
> thank you all, that did it. The settings that Matt recommended are giving
> the same numbers

Any objections to raising the defaults to 8 MB / 1 MB in HEAD?



Re: Why is TUNABLE_INT discouraged?

2010-08-07 Thread Ivan Voras
On 7.8.2010 15:40, Dag-Erling Smørgrav wrote:
> Garrett Cooper  writes:
>>I found the commit where it was made (by des@ -- cvs revision
>> 1.120), but unfortunately I lack the context as to why that suggestion
>> is made; the commit isn't very explicit as to why integer tunables
>> should be discouraged.
> 
> You're supposed to use TUNABLE_LONG or TUNABLE_ULONG instead.  From
> digging in the -current archives, it seems that the motivation was a bug
> that resulted from using a TUNABLE_INT for a value that was actually an
> address.  It was doubly broken: first because it was too small on 64-bit
> systems, and second because it was signed.

Ok, but still - if the underlying value really is declared as "int",
doesn't it make perfect sense to have something like TUNABLE_INT for it?

Forcing "long" is a bit weird in this context, as C long is 32-bit on
i386 and 64-bit on amd64.
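A userland illustration of the point, using a hypothetical `parse_tunable()` helper (not the kernel's TUNABLE_* implementation): an ordinary count fits in an int, while an address in the upper half of a 32-bit address space, such as 0xc0000000, does not, even though it fits in an unsigned long on both i386 and amd64. The static assertion records the ILP32/LP64 assumption that makes `unsigned long` pointer-sized on those platforms.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Holds on ILP32 (i386) and LP64 (amd64) platforms, not on LLP64 ones. */
_Static_assert(sizeof(unsigned long) == sizeof(void *),
               "unsigned long is expected to be pointer-sized");

/*
 * Hypothetical helper: parse s (decimal, 0x-prefixed hex or 0-prefixed
 * octal) into *out and report whether the value would also survive being
 * stored in a plain int.
 * Returns 1 if it fits in int, 0 if it needs a wider type, -1 if s is not
 * a complete number or is out of range for unsigned long.
 */
static int parse_tunable(const char *s, unsigned long *out)
{
  char *end;
  unsigned long v;

  errno = 0;
  v = strtoul(s, &end, 0);
  if (errno != 0 || end == s || *end != '\0')
    return -1;
  *out = v;
  return v <= (unsigned long)INT_MAX;
}
```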




Re: Why is TUNABLE_INT discouraged?

2010-08-07 Thread Ivan Voras
2010/8/8 Dag-Erling Smørgrav :
> Garrett Cooper  writes:
>> Dag-Erling Smørgrav  writes:
>> > Perhaps.  I don't remember all the details; I can't find a discussion in
>> > the list archives (other than me announcing the change in response to a
>> > bug report), but there must have been one, either on IRC or in Karlsruhe.
>> > In any case, I never removed TUNABLE_INT(), so...
>> It does matter for integers on 64-bit vs 32-bit architectures though,
>> right?
>
> Not sure what you mean.  The original issue was that someone had used
> TUNABLE_INT() for something that was actually a memory address.  I
> changed it to TUNABLE_ULONG().  Of course, if your tunable is a boolean
> value or something like maxprocs, an int is fine - but so is a long.

Semantically valid, but using TUNABLE_INT to hold pointers is a
developer bug, not the fault of the API, and there's nothing wrong
with "int" as a data type in this context.

Unless there is a real hidden danger in using TUNABLE_INT (and/or
adding TUNABLE_UINT etc.) in the expected way, I'd vote for either
removing the cautioning comment or rewriting it to say something like
"developers are hereby warned that ints cannot hold pointers on all
architectures", if it is indeed such a little known fact among kernel
developers :P


glabel "force sectorsize" patch

2010-08-07 Thread Ivan Voras
Hi,

In order to help users with 4k-sector drives that the system
recognizes as 512-byte-sector drives, I'm proposing a patch to glabel
that enables it to use a forced sector size for its native-labeled
providers. It is naturally only usable with glabel-native labels
(those created by "glabel label"), not with partition and file system
labels, because we cannot add arbitrary new fields to the metadata of
those types.

The patch is here:

http://people.freebsd.org/~ivoras/diffs/glabel_ssize.patch

It's tested with UFS+SU and a forced 4k sector size; apparently there
are no problems. Here's what the dumpfs output looks like for the test
file system, created with completely default newfs options (except SU):

magic   19540119 (UFS2)  time    Sun Aug  8 03:40:47 2010
superblock location 65536   id  [ 4c5e0ab3 41c7e8d9 ]
ncg     7        size    524287  blocks  514774
bsize   16384    shift   14      mask    0xc000
fsize   4096     shift   12      mask    0xf000
frag    4        shift   2       fsbtodb 3
minfree 8%       optim   time    symlinklen 120
maxbsize 16384   maxbpg  2048    maxcontig 8     contigsumsize 8
nbfree  128690   ndir    2       nifree  150972  nffree  12
bpg     21567    fpg     86268   ipg     21568   unrefs  0
nindir  2048     inopb   64      maxfilesize     140806241583103
sbsize  4096     cgsize  16384   csaddr  1376    cssize  4096
sblkno  20       cblkno  24      iblkno  28      dblkno  1376
cgrotor 0        fmod    0       ronly   0       clean   1
avgfpdir 64      avgfilesize 16384
flags   soft-updates
fsmnt   /mt
volname          swuid   0

This is a pre-commit review request and also a call for testers :)

This mechanism is a band-aid until there's a better way of dealing
with 4k drives.


Re: glabel "force sectorsize" patch

2010-08-09 Thread Ivan Voras
On 9 August 2010 10:51, Dag-Erling Smørgrav  wrote:
> Marius Nünnerich  writes:
>> I did not think of a new GEOM class that looks like glabel but one
>> that has no metadata stored on disk . It is then activated and
>> controlled by loader.conf variables. (Maybe like gnop? If I remember
>> correctly, I did not take a look at that class for ages).
>
> As you would know if you had followed the discussion about WD EARS
> disks, gnop does what you want and is currently the recommended
> solution.

Of course, but gnop, as a testing GEOM class, does not save its
metadata, meaning it has to be reconfigured after every reboot, etc.

> I am looking into a permanent solution and would appreciate if people
> held off on this for a couple of weeks.

Thank you!


Re: glabel "force sectorsize" patch

2010-08-09 Thread Ivan Voras
On 9 August 2010 14:37, Dag-Erling Smørgrav  wrote:
> Ivan Voras  writes:
>> Dag-Erling Smørgrav  writes:
>> > Marius Nünnerich  writes:
>> > > I did not think of a new GEOM class that looks like glabel but one
>> > > that has no metadata stored on disk . It is then activated and
>> > > controlled by loader.conf variables. (Maybe like gnop? If I
>> > > remember correctly, I did not take a look at that class for ages).
>> > As you would know if you had followed the discussion about WD EARS
>> > disks, gnop does what you want and is currently the recommended
>> > solution.
>> Of course, but gnop, as a testing GEOM class, does not save its
>> metadata, meaning it has to be reconfigured after every reboot, etc.
>
> Please read what Marius wrote, which I quoted above.

You are right, I skipped that part of his message. Gnop fits that.


nsswitch man page

2010-08-31 Thread Ivan Voras
I'm trying to do something with NSS and I see that NetBSD has much
better documentation for it:

http://www.daemon-systems.org/man/nsdispatch.3.html

vs

http://www.freebsd.org/cgi/man.cgi?query=nsdispatch

From the AUTHORS section on FreeBSD's page, it looks like it is an
import of an earlier NetBSD version.

Are the implementations still compatible? Could the manpage simply be
reimported from NetBSD?




Re: nsswitch man page

2010-09-01 Thread Ivan Voras

On 09/01/10 08:16, Michael Bushkov wrote:

> If you don't mind, as I've worked extensively with nsswitch, I can
> check the current implementation and provide you a patch to update the
> docs.


Of course, go ahead.




Re: How to disallow logout

2010-09-10 Thread Ivan Voras

On 09/10/10 05:27, Aryeh Friedman wrote:

> I have a directory that must not exist on logout, and rm -rf is not
> sufficient because the contents need to be processed by our
> version control system. The real-life scenario is that our version
> control system stores the repo for a given project encrypted, but for
> technical reasons it needs to keep the checked-out files in plain text
> (they are all in the same dir), and I want to *NEVER* have the plain
> text checked-out files in my dir when I log out, *BUT* instead of just
> deleting them I need to check them in... so how do I write my .logout so
> that if it exists, it will not exit and will give an error saying that
> the dir is still there? (A minor but unimportant side effect of the
> version control system is that the dir will have a different name every
> time it is created, but always the same prefix.)


Have you thought about what should happen if, for example, the login
session is forcefully terminated by any of:


1) power outage of the server
2) power outage on the client
3) network problems (ssh or TCP connection drop)
4) administrative command (e.g. root executes "killall $shell")

?

I don't think there is a way to protect against all of those, so any
effort at protecting against only part of the problem looks useless.


On the other hand, if partial solutions satisfy your requirements, maybe 
you can do something with 
http://glebkurtsou.blogspot.com/search/label/pefs .
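If a partial solution is acceptable, the check itself is easy. A sketch, assuming a small helper run from `.logout` and that the working directory's name starts with a known prefix; the function is hypothetical and, as far as I know, a `.logout` script can only warn at that point, not cancel the logout. As noted above, none of this runs on a power failure, a dropped connection or a killed shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/*
 * Return 1 if directory dir contains an entry whose name starts with
 * prefix, 0 if it does not, and -1 if dir cannot be opened.
 * Note that "." and ".." are entries too, so pick a prefix that cannot
 * match them.
 */
static int has_entry_with_prefix(const char *dir, const char *prefix)
{
  DIR *d = opendir(dir);
  struct dirent *e;
  size_t plen = strlen(prefix);
  int found = 0;

  if (d == NULL)
    return -1;
  while ((e = readdir(d)) != NULL) {
    if (strncmp(e->d_name, prefix, plen) == 0) {
      found = 1;
      break;
    }
  }
  closedir(d);
  return found;
}
```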




Re: How to disallow logout

2010-09-10 Thread Ivan Voras
On 10 September 2010 14:11, Atom Smasher  wrote:
> On Fri, 10 Sep 2010, Ivan Voras wrote:
>
>> 1) power outage of the server
>> 2) power outage on the client
>> 3) network problems (ssh or TCP connection drop)
>> 4) administrative command (e.g. root executes "killall $shell")
>>
>> ?
>>
>> I don't think there is a way to protect against all of those, so any
>> effort at protecting against only part of the problem looks useless.
>
> 
>
> you forgot cosmic rays, nuclear war and zombie apocalypse, among other
> failure modes. *NOTHING* is capable of protecting against everything; a good
> solution will most always have pitfalls; as a sysadmin/engineer/manager one
> has to either accept the pitfalls or find a more acceptable solution, which
> usually means different pitfalls. that doesn't mean a given solution is
> useless.

On the other hand, things such as power outages, network blackouts and
root security compromises have been statistically shown to occur
more often than zombie apocalypses, so I'd guess, though of course
without absolute certainty, that those problems should be solved first
:)

Otherwise, it's just as effective as putting a README file in the home
directory saying "please go away" :)


Re: is vfs.lookup_shared unsafe in 7.3?

2010-09-14 Thread Ivan Voras

On 09/13/10 22:57, cronfy wrote:

> Hello,
>
> Trying to deal with high server load (sudden peaks of 15%us/85%sy, LA
> over 40, very slow lstat() at these moments; looks like some kind of lock
> contention), I enabled vfs.lookup_shared=1 on two servers today. One is
> FreeBSD-7.3 with a kernel csup'ed and built Sep  9 2010, and the other is
> FreeBSD-7.3 csup'ed and built Jul 16 2010.


The important thing you left out is *where* the supposed lock
contention is. If you have lots of processes in the "ufs" state, there are
other things that can help you, such as increasing vfs.ufs.dirhash_maxmem.





Re: Examining the VM splay tree effectiveness

2010-09-30 Thread Ivan Voras
On 09/30/10 20:01, Alan Cox wrote:
> On Thu, Sep 30, 2010 at 12:37 PM, Andre Oppermann  wrote:
> 
>> On 30.09.2010 18:37, Andre Oppermann wrote:
>>
>>> Just for the kick of it I decided to take a closer look at the use of
>>> splay trees (inherited from Mach if I read the history correctly) in
>>> the FreeBSD VM system suspecting an interesting journey.
>>>
>>
>> Correcting myself regarding the history: The splay tree for vmmap was
>> done about 8 years ago by alc@ to replace a simple linked list and was
>> a huge improvement.  The change in vmpage from a hash to the same splay
>> tree as in vmmap was committed by dillon@ about 7.5 years ago with some
>> involvement of a...@.

> Yes, and there is a substantial difference in the degree of locality of
> access to these different structures, and thus the effectiveness of a splay
> tree.  When I did the last round of changes to the locking on the vm map, I
> made some measurements of the splay tree's performance on a JVM running a
> moderately large bioinformatics application.  The upshot was that the
> average number of map entries visited on an access to the vm map's splay
> tree was less than the expected depth of a node in a perfectly balanced
> tree.

Sorry, I'm not sure how to parse that: are you saying that the splaying
helped, making the number of nodes visited during a tree search smaller
than the expected node depth in a balanced tree?

Even if so, wouldn't the memory bandwidth spent on the splaying operations
(and, worse, write bandwidth) make it unsuitable today?




Re: Examining the VM splay tree effectiveness

2010-09-30 Thread Ivan Voras
On 09/30/10 20:38, Alfred Perlstein wrote:
> Andre,
> 
> Your observations on the effectiveness of the splay tree
> mirror the concerns I have with it when I read about it.
> 
> I have always wondered though if the splay-tree algorithm
> was modified to only perform rotations when a lookup required
> more than "N" traversals to reach a node.
> 
> This would self-balance the tree and maintain cache without 
> the expense of writes for nearly all lookups.
> 
> I'm wondering if you have the time/interest in rerunning
> your tests, but modifying the algorithm to only rebalance
> the splay if a lookup requires more than let's say 3, 5, 7
> or 10 traversals.

I see two possible problems with this:

1: the validity of this heuristic. The splay is not meant to help the
current lookup but future ones, and if a lookup "now" requires e.g. a
5-deep traversal then (barring external information about the
structures; maybe some inner relationship between the nodes can be
exploited) I think the next lookup is about as likely to hit that
rotated node as the former root node.

2: rotating only on lookups deeper than N would have to go like this:

   1. take a read-only lock
   2. make the lookup, count the depth
   3. if depth > N:
      1. relock for write (lock upgrade will not always work)
      2. recheck if the tree is still the same; bail if it isn't
      3. do the splay
   4. unlock

i.e. suspiciously complicated. That is, if you want to take advantage of
read parallelism; if the tree is write-locked all the time it's simpler,
but merely inefficient.
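A sketch of the steps above, assuming POSIX read-write locks. They have no upgrade operation, so step 3.1 drops the read lock and takes a write lock, and a generation counter serves as the "is the tree still the same?" test. For brevity the restructuring is a simple move-to-root by single rotations, not a real splay with zig-zig/zig-zag steps, and `SPLAY_THRESHOLD` stands in for Alfred's N; all names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

#define SPLAY_THRESHOLD 5          /* stands in for Alfred's "N" */

struct node { int key; struct node *left, *right; };

struct tree {
  struct node *root;
  unsigned long gen;               /* bumped on every structural change */
  pthread_rwlock_t lock;
};

static int tree_init(struct tree *t)
{
  t->root = NULL;
  t->gen = 0;
  return pthread_rwlock_init(&t->lock, NULL);
}

/* Plain BST insert of a caller-owned node (keys assumed unique). */
static void tree_insert(struct tree *t, struct node *nn)
{
  struct node **link;

  pthread_rwlock_wrlock(&t->lock);
  link = &t->root;
  while (*link != NULL)
    link = nn->key < (*link)->key ? &(*link)->left : &(*link)->right;
  nn->left = nn->right = NULL;
  *link = nn;
  t->gen++;
  pthread_rwlock_unlock(&t->lock);
}

/* Search without modifying anything; *depth = number of nodes visited. */
static struct node *bst_find(struct node *n, int key, int *depth)
{
  for (*depth = 0; n != NULL; n = key < n->key ? n->left : n->right) {
    (*depth)++;
    if (key == n->key)
      return n;
  }
  return NULL;
}

/* Move key (which must be present) to the root with single rotations. */
static struct node *rotate_to_root(struct node *n, int key)
{
  struct node *c;

  if (n == NULL || n->key == key)
    return n;
  if (key < n->key) {
    if ((n->left = rotate_to_root(n->left, key)) == NULL)
      return n;
    c = n->left;                   /* rotate right */
    n->left = c->right;
    c->right = n;
  } else {
    if ((n->right = rotate_to_root(n->right, key)) == NULL)
      return n;
    c = n->right;                  /* rotate left */
    n->right = c->left;
    c->left = n;
  }
  return c;
}

/* Returns 1 if key is present; restructures only after a deep lookup. */
static int tree_lookup(struct tree *t, int key)
{
  struct node *n;
  unsigned long seen;
  int depth;

  pthread_rwlock_rdlock(&t->lock);             /* 1. shared lock */
  n = bst_find(t->root, key, &depth);          /* 2. look up, count depth */
  seen = t->gen;
  pthread_rwlock_unlock(&t->lock);
  if (n == NULL)
    return 0;
  if (depth <= SPLAY_THRESHOLD)
    return 1;                                  /* shallow: no writes at all */

  pthread_rwlock_wrlock(&t->lock);             /* 3.1 no upgrade: relock */
  if (t->gen == seen) {                        /* 3.2 unchanged? else bail */
    t->root = rotate_to_root(t->root, key);    /* 3.3 restructure */
    t->gen++;
  }
  pthread_rwlock_unlock(&t->lock);             /* 4. unlock */
  return 1;
}
```

`tree_lookup()` returns only a yes/no answer, because a node pointer handed out after the unlock could be changed or freed by a concurrent writer; a real version would need reference counting or a copy made under the lock.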

Of course, real-world measurements trump theory :)



Re: Examining the VM splay tree effectiveness

2010-10-03 Thread Ivan Voras
On 10/01/10 10:54, Andre Oppermann wrote:
> On 30.09.2010 19:51, Ivan Voras wrote:
>> On 09/30/10 18:37, Andre Oppermann wrote:
>>
>>> Both the vmmap and page table make use of splay trees to manage the
>>> entries and to speed up lookups compared to long to traverse linked
>>> lists or more memory expensive hash tables.  Some structures though
>>> do have an additional linked list to simplify ordered traversals.
>>
>> The property of splay tree requiring *writes* for nearly every read
>> really is a thorn in the eye for SMP. It seems to me that even if the
>> immediate benefits from converting to something else are not directly
>> observable, it will still be worth doing it.
> 
> Fully agreed.
> 
>> It's a shame that RCU is still a patent minefield :/
>>
>> http://mirror.leaseweb.com/kernel/people/npiggin/patches/lockless/2.6.16-rc5/radix-intro.pdf
>>
> 
> I'm not convinced that RCU is the only scalable way of sharing a
> data structure across a possibly large number of CPU's.

Of course, it's just well understood currently.

> The term "lockless" is often used and frequently misunderstood.
> Having a lockless data structure *doesn't* mean that it is either
> performant, scalable or both.  It heavily depends on a number
> of additional factors.  Many times "lockless" just replaces a
> simple lock/unlock cycle with a number of atomic operations on
> the data structure.  This can easily backfire because an atomic

Yes.

> operation just hides the computational complexity and also dirties
> the CPU's cache lines.  Generally on cache coherent architectures
> almost all of the atomic operation is in HW with bus lock cycles,
> bus snooping and whatnot.  While seemingly simple from the programmer's
> point of view, the overhead and latency is still there.  Needless

Yes, you basically just offload the operation to hardware, but the steps
it needs to take are conceptually the same.

>  a) make sure the lock is held for only a small amount of time
> to avoid lock contention.
>  b) do everything you can outside of the lock.
>  c) if the lock is found to be heavily contended rethink the
> whole approach and check if other data structures can be used.
>  d) minimize write accesses to memory in the lock protected
> shared data structure.
>  e) PROFILE, DON'T SPECULATE! Measure the access pattern and
> measure the locking/data access strategy's cost in terms
> of CPU cycles consumed.
> 
>  f) on lookup heavy data structures avoid writing to memory and
> by it dirtying CPU caches.
>  g) on modify heavy data structures avoid touching too many
> elements.
>  h) on lookup and modify heavy data structure that are used
> across many CPU's all bets are off and a different data
> structure approach should be considered resulting ideally
> in case f).
> 
> It all starts with the hypothesis that a data structure is not
> optimally locked.

This looks like material for a developer-centric wiki page :) There is a
lot of dispersed wisdom in this thread that it would be nice to gather
in one place.





Re: Examining the VM splay tree effectiveness

2010-10-03 Thread Ivan Voras
On 10/01/10 20:28, Ed Schouten wrote:
> Andre,
> 
> * Andre Oppermann  wrote:
>> A splay tree is an interesting binary search tree with insertion,
>> lookup and removal performed in O(log n) *amortized* time.  With
>> the *amortized* time being the crucial difference to other binary trees.
>> On every access *including* lookup it rotates the tree to make the
>> just found element the new root node.  For all gory details see:
>>  http://en.wikipedia.org/wiki/Splay_tree
> 
> Even though a red-black tree is quite good since it guarantees a $2 \log
> n$ upper bound, the problem is that it's quite computationally intensive.
> 
> Maybe it would be worth looking at other types of balanced trees? For
> example, another type of tree which has only $O(\log n)$ amortized
> insertion/removal/lookup time, but could already be a lot better in
> practice, is a Treap.

How many elements are held in vm_map trees? From the source it looks
like one tree with all pages in the system and then one per-process?

Trees have very varied real-time characteristics, e.g.:

http://attractivechaos.awardspace.com/udb.html
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.6795&rep=rep1&type=pdf
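To make the treap suggestion concrete, a minimal sketch (insertion and lookup only, no deletion): rotations happen only on insertion, so a lookup is a plain read-only walk that dirties no cache lines, unlike a splay tree lookup. Priorities come from the MurmurHash3 32-bit finalizer applied to the key so the example is deterministic; a real treap would normally draw random priorities.

```c
#include <stddef.h>
#include <stdint.h>

struct tnode {
  int key;
  uint32_t prio;                 /* heap order: parent prio >= child prio */
  struct tnode *left, *right;
};

/* MurmurHash3 fmix32, used here as a deterministic stand-in for rand(). */
static uint32_t mix32(uint32_t x)
{
  x ^= x >> 16; x *= 0x85ebca6bu;
  x ^= x >> 13; x *= 0xc2b2ae35u;
  x ^= x >> 16;
  return x;
}

static struct tnode *rot_right(struct tnode *n)
{
  struct tnode *l = n->left;
  n->left = l->right;
  l->right = n;
  return l;
}

static struct tnode *rot_left(struct tnode *n)
{
  struct tnode *r = n->right;
  n->right = r->left;
  r->left = n;
  return r;
}

/* Insert a caller-owned node; returns the new root of this subtree. */
static struct tnode *treap_insert(struct tnode *root, struct tnode *nn)
{
  if (root == NULL) {
    nn->left = nn->right = NULL;
    nn->prio = mix32((uint32_t)nn->key);
    return nn;
  }
  if (nn->key < root->key) {
    root->left = treap_insert(root->left, nn);
    if (root->left->prio > root->prio)
      root = rot_right(root);    /* restore heap order */
  } else {
    root->right = treap_insert(root->right, nn);
    if (root->right->prio > root->prio)
      root = rot_left(root);
  }
  return root;
}

/* Lookup is an ordinary BST search and never writes to the tree. */
static const struct tnode *treap_find(const struct tnode *n, int key)
{
  while (n != NULL && n->key != key)
    n = key < n->key ? n->left : n->right;
  return n;
}
```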




Re: Cannot compile a custom FreeBSD kernel

2010-10-18 Thread Ivan Klymenko
On Mon, 18 Oct 2010 22:03:54 +0200
Jack Engqvist Johansson  writes:

> Hi,
> 
> I have a HP tx2020eo laptop with FreeBSD 8.1 installed. I'm trying to
> recompile the kernel to get even better performance.
> The problem is that I get an error when I compile. I've tried
> commenting/uncommenting lines in my kernel config file, but I always get
> some kind of error.
> 
> Could somebody have a look at my configuration?
> 
> Laptop spec:
> http://h10025.www1.hp.com/ewfrf/wc/document?lc=en&cc=us&docname=c01302377&dlc=en
> 
> --
> Compilation:
> ...
> awk -f /usr/src/sys/conf/kmod_syms.awk ahc.ko.debug  export_syms |
> xargs -J% objcopy % ahc.ko.debug
> objcopy --only-keep-debug ahc.ko.debug ahc.ko.symbols
> objcopy --strip-debug --add-gnu-debuglink=ahc.ko.symbols ahc.ko.debug
> ahc.ko ===> aic7xxx/ahc/ahc_eisa (all)
> cc -O3 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE
> -nostdinc
>

Don't use the -O3 gcc optimisation...


Re: Filesystem full when installing custom kernel in FreeBSD

2010-10-18 Thread Ivan Klymenko
On Tue, 19 Oct 2010 00:53:31 +0200
Jack Engqvist Johansson  writes:

> Hi,
> 
> I just got succeeded with my compilation of a custom kernel for
> FreeBSD 8.1. But when I'm trying to install it, I got an error.
> File system is full!
> 
> So I moved the old kernel to another partition, but got the same
> error. And I cannot move it back again.
> Whats wrong? How can I do to get a kernel again?
> 
> Thanks.
> Best regards, Jack Engvist Johansson
> 
> 
> 
>  bsd# make installkernel KERNCONF=NECTRUS
> --
> >>> Installing kernel
> --
> cd /usr/obj/usr/src/sys/NECTRUS;  MAKEOBJDIRPREFIX=/usr/obj
> MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
> GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac
> PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/usr/games:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/usr/obj/usr/src/tmp/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
>  make KERNEL=kernel install
> thiskernel=`sysctl -n kern.bootfile` ;  if [ ! "`dirname
> "$thiskernel"`" -ef /boot/kernel ] ; then  chflags -R noschg
> /boot/kernel ;  rm -rf /boot/kernel ;  else  if [ -d /boot/kernel.old
> ] ; then  chflags -R noschg /boot/kernel.old ;  rm -rf
> /boot/kernel.old ;  fi ;  mv /boot/kernel /boot/kernel.old ;  sysctl
> kern.bootfile=/boot/kernel.old/"`basename "$thiskernel"`" ;  fi
> mkdir -p /boot/kernel
> install -p -m 555 -o root -g wheel kernel /boot/kernel
> 
> /: write failed, filesystem is full
> install: /boot/kernel/kernel: No space left on device
> *** Error code 71
> 
> Stop in /usr/obj/usr/src/sys/NECTRUS.
> *** Error code 1
> 
> Stop in /usr/src.
> *** Error code 1
> 
> Stop in /usr/src.
> -
> 

Check how much space is left on the / partition:
df -h
and check whether that space is being used up by the root account.


Re: Filesystem full when installing custom kernel in FreeBSD

2010-10-19 Thread Ivan Klymenko
On Tue, 19 Oct 2010 13:53:34 +0200
Jack Engqvist Johansson wrote:

> On Tue, Oct 19, 2010 at 8:30 AM, Ivan Klymenko  wrote:
> > On Tue, 19 Oct 2010 00:53:31 +0200
> > Jack Engqvist Johansson wrote:
> >
> >> Hi,
> >>
> >> I just got succeeded with my compilation of a custom kernel for
> >> FreeBSD 8.1. But when I'm trying to install it, I got an error.
> >> File system is full!
> >>
> >> So I moved the old kernel to another partition, but got the same
> >> error. And I cannot move it back again.
> >> Whats wrong? How can I do to get a kernel again?
> >>
> >> Thanks.
> >> Best regards, Jack Engvist Johansson
> >>
> >>
> >>
> >>  bsd# make installkernel KERNCONF=NECTRUS
> >> --
> >> >>> Installing kernel
> >> --
> >> cd /usr/obj/usr/src/sys/NECTRUS;  MAKEOBJDIRPREFIX=/usr/obj
> >> MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
> >> GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> >> GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> >> GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac
> >> PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/usr/games:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/usr/obj/usr/src/tmp/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
> >>  make KERNEL=kernel install
> >> thiskernel=`sysctl -n kern.bootfile` ;  if [ ! "`dirname
> >> "$thiskernel"`" -ef /boot/kernel ] ; then  chflags -R noschg
> >> /boot/kernel ;  rm -rf /boot/kernel ;  else  if
> >> [ -d /boot/kernel.old ] ; then  chflags -R
> >> noschg /boot/kernel.old ;  rm -rf /boot/kernel.old ;  fi ;
> >>  mv /boot/kernel /boot/kernel.old ;  sysctl
> >> kern.bootfile=/boot/kernel.old/"`basename "$thiskernel"`" ;  fi
> >> mkdir -p /boot/kernel install -p -m 555 -o root -g wheel
> >> kernel /boot/kernel
> >>
> >> /: write failed, filesystem is full
> >> install: /boot/kernel/kernel: No space left on device
> >> *** Error code 71
> >>
> >> Stop in /usr/obj/usr/src/sys/NECTRUS.
> >> *** Error code 1
> >>
> >> Stop in /usr/src.
> >> *** Error code 1
> >>
> >> Stop in /usr/src.
> >> -
> >>
> >
> > Look how much space left on partition /
> > df -h
> > and is not used for the root account
> >
> 
> $ df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a    496M    490M    -34M   108%    /
> devfs          1.0K    1.0K      0B   100%    /dev
> /dev/ad4s1e    496M     26M    430M     6%    /tmp
> /dev/ad4s1f    137G     13G    113G    10%    /usr
> /dev/ad4s1d    2.8G    162M    2.4G     6%    /var
> procfs         4.0K    4.0K      0B   100%    /proc
> linprocfs      4.0K    4.0K      0B   100%    /usr/compat/linux/proc
> 
> 
> Nautilus: 4258945024 bytes (Free space)
> /root: 14.2 KB (Used space)
> 
> 

Show me the output of the following commands, run from the root account:
du -chd0 /bin
du -chd0 /boot
du -chd0 /etc
du -chd0 /lib
du -chd0 /libexec
du -chd0 /root
du -chd0 /sbin


Re: Filesystem full when installing custom kernel in FreeBSD

2010-10-19 Thread Ivan Klymenko
On Tue, 19 Oct 2010 15:58:35 +0200
Jack Engqvist Johansson wrote:

> On Tue, Oct 19, 2010 at 3:38 PM, Ivan Klymenko  wrote:
> > On Tue, 19 Oct 2010 13:53:34 +0200
> > Jack Engqvist Johansson wrote:
> >
> >> On Tue, Oct 19, 2010 at 8:30 AM, Ivan Klymenko 
> >> wrote:
> >> > On Tue, 19 Oct 2010 00:53:31 +0200
> >> > Jack Engqvist Johansson wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I just got succeeded with my compilation of a custom kernel for
> >> >> FreeBSD 8.1. But when I'm trying to install it, I got an error.
> >> >> File system is full!
> >> >>
> >> >> So I moved the old kernel to another partition, but got the same
> >> >> error. And I cannot move it back again.
> >> >> Whats wrong? How can I do to get a kernel again?
> >> >>
> >> >> Thanks.
> >> >> Best regards, Jack Engvist Johansson
> >> >>
> >> >>
> >> >>
> >> >>  bsd# make installkernel KERNCONF=NECTRUS
> >> >> --
> >> >> >>> Installing kernel
> >> >> --
> >> >> cd /usr/obj/usr/src/sys/NECTRUS;  MAKEOBJDIRPREFIX=/usr/obj
> >> >> MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
> >> >> GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> >> >> GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> >> >> GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac
> >> >> PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/usr/games:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/usr/obj/usr/src/tmp/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
> >> >>  make KERNEL=kernel install
> >> >> thiskernel=`sysctl -n kern.bootfile` ;  if [ ! "`dirname
> >> >> "$thiskernel"`" -ef /boot/kernel ] ; then  chflags -R noschg
> >> >> /boot/kernel ;  rm -rf /boot/kernel ;  else  if
> >> >> [ -d /boot/kernel.old ] ; then  chflags -R
> >> >> noschg /boot/kernel.old ;  rm -rf /boot/kernel.old ;  fi ;
> >> >>  mv /boot/kernel /boot/kernel.old ;  sysctl
> >> >> kern.bootfile=/boot/kernel.old/"`basename "$thiskernel"`" ;  fi
> >> >> mkdir -p /boot/kernel install -p -m 555 -o root -g wheel
> >> >> kernel /boot/kernel
> >> >>
> >> >> /: write failed, filesystem is full
> >> >> install: /boot/kernel/kernel: No space left on device
> >> >> *** Error code 71
> >> >>
> >> >> Stop in /usr/obj/usr/src/sys/NECTRUS.
> >> >> *** Error code 1
> >> >>
> >> >> Stop in /usr/src.
> >> >> *** Error code 1
> >> >>
> >> >> Stop in /usr/src.
> >> >> -
> >> >>
> >> >
> >> > Look how much space left on partition /
> >> > df -h
> >> > and is not used for the root account
> >> >
> >>
> >> $ df -h
> >> Filesystem     Size    Used   Avail Capacity  Mounted on
> >> /dev/ad4s1a    496M    490M    -34M   108%    /
> >> devfs          1.0K    1.0K      0B   100%    /dev
> >> /dev/ad4s1e    496M     26M    430M     6%    /tmp
> >> /dev/ad4s1f    137G     13G    113G    10%    /usr
> >> /dev/ad4s1d    2.8G    162M    2.4G     6%    /var
> >> procfs         4.0K    4.0K      0B   100%    /proc
> >> linprocfs      4.0K    4.0K      0B   100%
> >>  /usr/compat/linux/proc
> >>
> >>
> >> Nautilus: 4258945024 bytes (Free space)
> >> /root: 14.2 KB (Used space)
> >>
> >>
> >
> > show me the output the following commands from the root account:
> > du -chd0 /bin
> > du -chd0 /boot
> > du -chd0 /etc
> > du -chd0 /lib
> > du -chd0 /libexec
> > du -chd0 /root
> > du -chd0 /sbin
> >
> 
> bsd# du -chd0 /bin
> 1.2M  /bin
> 1.2M  total
> bsd# du -chd0 /boot
>  14M  /boot
>  14M  total
> bsd# du -chd0 /etc
> 1.7M  /etc
> 1.7M  total
> bsd# du -chd0 /lib
> 7.5M  /lib
> 7.5M  total
> bsd# du -chd0 /libexec
> 514K  /libexec
> 514K  total
> bsd# du -chd0 /root
> 457M  /root
> 457M  total

!!
Do not use the root account for day-to-day work on the system!
!!
Create another account for that...
Go to that directory (/root) and delete the files that take up a lot of space,
and you will free ~450 MB...

> bsd# du -chd0 /sbin
> 4.6M  /sbin
> 4.6M  total
> 
> 




Best regards, Ivan!
--
We can do anything we can imagine!

jabber: fi...@jabber.ru
skype: freedom_fidaj
youtube channel: http://www.youtube.com/freedomfidaj
mob.: +380938326345


Re: Filesystem full when installing custom kernel in FreeBSD

2010-10-19 Thread Ivan Klymenko
On Tue, 19 Oct 2010 16:31:55 +0200
Jack Engqvist Johansson wrote:

> >> bsd# du -chd0 /root
> >> 457M  /root
> >> 457M  total
> >
> > !!
> > do not use the Root account to work in the system!
> > !!
> > Create another account for this...
> > go to this directory (/root) and delete the files that take up much
> > space and you're free ~ 450Mb...
> >
> >> bsd# du -chd0 /sbin
> >> 4.6M  /sbin
> >> 4.6M  total
> >>
> 
> It was the .local folder in /root! Got my kernel back :) Thanks!
> 
> You mean that I should create a user within the group wheel?

Yes.

> Now, I use my user (jack) and just type 'su' to get root access. Or do
> you mean sudo?

Why do you need to run jackd from the root account?
And how do you use it? Through QjackCtl?


Re: Filesystem full when installing custom kernel in FreeBSD

2010-10-19 Thread Ivan Klymenko
On Tue, 19 Oct 2010 23:28:10 +0300
Ivan Klymenko wrote:

> On Tue, 19 Oct 2010 16:31:55 +0200
> Jack Engqvist Johansson wrote:
> 
> > >> bsd# du -chd0 /root
> > >> 457M  /root
> > >> 457M  total
> > >
> > > !!
> > > do not use the Root account to work in the system!
> > > !!
> > > Create another account for this...
> > > go to this directory (/root) and delete the files that take up
> > > much space and you're free ~ 450Mb...
> > >
> > >> bsd# du -chd0 /sbin
> > >> 4.6M  /sbin
> > >> 4.6M  total
> > >>
> > 
> > It was the .local folder in /root! Got my kernel back :) Thanks!
> > 
> > You mean that I should create a user within the group wheel?
> 
> Yes.
> 
> > Now, I use my user (jack) and just type 'su' to get root access. Or
> > do you mean sudo?
> 
> Why do you need to run jackd with root user account?
> And how do you use it? Through QjackCtl?

Excuse me, I misunderstood :)

Yes, of course: use 'sudo' to get root access, or, if the user (jack) is also
added to the wheel group, you can use 'su'...

Sorry for my English ...


Re: fsync(2) manual and hdd write caching

2010-10-26 Thread Ivan Voras

On 10/26/10 23:36, Alexander Best wrote:

> hi there,
>
> since there's a thread on freebsd-questions@ concerning fsync(2) and the fact
> that hdd write caching can cause this syscall to basically be a no op, could
> somebody please copy the BUGS section from sync(2) to fsync(2)?


I don't think they are the same.

The "buffers" of sync(2) are not those from the discussion on fsync(2) 
safety. Or, more precisely, they are, but the two calls work at 
different scopes.


fsync(2) actually does behave as advertised: it "causes all modified data 
and attributes of fd to be moved to a permanent storage device". If the 
"permanent storage device" caches this data further, that is the device's 
problem.




Re: fsync(2) manual and hdd write caching

2010-10-27 Thread Ivan Voras
On 10/27/10 12:11, Bruce Cran wrote:
> On Wed, 27 Oct 2010 02:00:51 -0700
> per...@pluto.rain.com wrote:
> 
>> Short of mounting synchronously, with the attendant performance
>> hit, would it not make sense for fsync(2) to issue ATA_FLUSHCACHE
>> or SCSI "SYNCHRONIZE CACHE" after it has finished writing data
>> to the drive?  Surely the low-level capability to issue those
>> commands must already exist, else we would have no way to safely
>> prepare for power off.
> 
> mounting synchronously won't help, will it? As I understand it that
> just makes sure that data is sent straight to disk and not left in
> memory; the data will still be stored in the HDD cache for a
> while.

Correct. The problem is actually pretty hard: since, AFAIK, SoftUpdates
doesn't have "checkpoints" in the sense that it groups writes so that all
data written "before" a given point can be guaranteed to be on disk, the
problem is *when* to issue BIO_FLUSH requests. One possible solution is
simply to decide on a heuristic like: "OK, doing BIO_FLUSH all the time
will destroy performance, so we will only do it for every metadata write",
possibly with a sysctl tunable or per-mount option.



kern.smp.topology

2010-11-10 Thread Ivan Klymenko
Hello, everyone!

Can anyone explain the purpose of the sysctl variable kern.smp.topology?
What does it affect?

It can take the following values:
1  -Dual core with no sharing.
2  -No topology, all cpus are equal.
3  -Dual core with shared L2.
4  -quad core, shared l3 among each package, private l2.
5  -quad core,  2 dualcore parts on each package share l2.
6  -Single-core 2xHTT
7  -quad core with a shared l3, 8 threads sharing L2.
default-Default, ask the system what it wants.

Does it make sense to set its value manually if I know that my CPU is a Core 2 Duo?
How should I select a value?

I have not found an explanation of this in any of the official guides...

Thanks!


Re: kern.smp.topology

2010-11-10 Thread Ivan Voras
On 11/10/10 11:56, Ivan Klymenko wrote:
> Hello! People.
> 
> Who can explain the purpose of sysctl variable kern.smp.topology?
> What does it affect?
> 
> It may take such values:
> 1  -Dual core with no sharing.
> 2  -No topology, all cpus are equal.
> 3  -Dual core with shared L2.
> 4  -quad core, shared l3 among each package, private l2.
> 5  -quad core,  2 dualcore parts on each package share l2.
> 6  -Single-core 2xHTT
> 7  -quad core with a shared l3, 8 threads sharing L2.
> default-Default, ask the system what it wants.
> 
> Does it make sense to set its value manually, if I know that my CPU Core2Duo?
> How to do this, select a value?
> 
> I not found this explanation in any of the official guides ...

Short answer is: you should not have to touch it, ever.

Long answer: it's used mostly for testing ULE and debugging
topology-related problems. It's even less relevant in recent kernels (9,
8-stable) where a better topology parser has been committed.




Re: kern.smp.topology

2010-11-10 Thread Ivan Klymenko
On Wed, 10 Nov 2010 12:20:45 +0100
Ivan Voras wrote:

> On 11/10/10 11:56, Ivan Klymenko wrote:
> > Hello! People.
> > 
> > Who can explain the purpose of sysctl variable kern.smp.topology?
> > What does it affect?
> > 
> > It may take such values:
> > 1  -Dual core with no sharing.
> > 2  -No topology, all cpus are equal.
> > 3  -Dual core with shared L2.
> > 4  -quad core, shared l3 among each package, private l2.
> > 5  -quad core,  2 dualcore parts on each package share l2.
> > 6  -Single-core 2xHTT
> > 7  -quad core with a shared l3, 8 threads sharing L2.
> > default-Default, ask the system what it wants.
> > 
> > Does it make sense to set its value manually, if I know that my CPU
> > Core2Duo? How to do this, select a value?
> > 
> > I not found this explanation in any of the official guides ...
> 
> Short answer is: you should not have to touch it, ever.
> 
> Long answer: it's used mostly for testing ULE and debugging
> topology-related problems. It's even less relevant in recent kernels
> (9, 8-stable) where a better topology parser has been committed.
> 
Thank you! I understand now. :)


Re: Network socket concurrency (userland)

2010-11-16 Thread Ivan Voras

On 11/16/10 16:19, Joerg Sonnenberger wrote:
> On Tue, Nov 16, 2010 at 03:37:59PM +0100, Ivan Voras wrote:
>> Are there any standard-defined guarantees for TCP network sockets
>> used by multiple threads to do IO on them?
>
> System calls are atomic relative to each other. They may be partially
> executed from the perspective of a remote system, e.g. due to
> segmentation, but one system call will finish before the next call of
> the same category is started.

It seems to me that with a multithreaded kernel and multithreaded 
userland that cannot really be guaranteed in general (and really should 
not be guaranteed for performance reasons), but maybe it's true for e.g. 
sockets if they are protected by a mutex or two within the kernel?

>> Specifically, will multiple write() or send() calls on the same
>> socket execute serially (i.e. not interfere with each other) and
>> blocking (until completion) even for large buffer sizes? What about
>> read() / recv()?
>
> All write operations are serialised against each other, just like all
> read operations are serialised against each other.

To what degree is such serialization standardized (by POSIX?)?



Network socket concurrency (userland)

2010-11-19 Thread Ivan Voras
Are there any standard-defined guarantees for TCP network sockets used 
by multiple threads to do IO on them?


Specifically, will multiple write() or send() calls on the same socket 
execute serially (i.e. not interfere with each other) and blocking 
(until completion) even for large buffer sizes? What about read() / recv()?




Userland DTrace ready?

2010-11-25 Thread Ivan Voras

Is userland DTrace ready?

The postgresql port (databases/postgresql90-server) has an option to
be built with dtrace, but when I use it it fails with this error:

gmake[1]: Entering directory
`/usr/ports/databases/postgresql90-server/work/postgresql-9.0.1/src/backend/utils'
dtrace -C -h -s probes.d -o probes.h.tmp
dtrace: failed to compile script probes.d: "/usr/lib/dtrace/psinfo.d",
line 37: syntax error near "uid_t"
gmake[1]: *** [probes.h] Error 1
gmake[1]: Leaving directory
`/usr/ports/databases/postgresql90-server/work/postgresql-9.0.1/src/backend/utils'
gmake: *** [utils/probes.h] Error 2
*** Error code 2

The error occurs both on 9/HEAD and on 8-stable. Any suggestions?

(for all I know, this may be PostgreSQL's fault...)



Simple kernel attack using socketpair.

2010-11-26 Thread Ivan Klymenko
Hello!
Rumor has it that this vulnerability also applies to FreeBSD, with
SOCK_SEQPACKET replaced by SOCK_DGRAM...

http://lkml.org/lkml/2010/11/25/8

What do you think about this?

Thank you!


Re: Simple kernel attack using socketpair.

2010-11-26 Thread Ivan Klymenko
On Fri, 26 Nov 2010 12:26:39 +0200
Ivan Klymenko wrote:

> Hello!
> Rumor has it that this vulnerability applies to FreeBSD too, with the
> replacement SOCK_SEQPACKET on SOCK_DGRAM...
and add:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

> 
> http://lkml.org/lkml/2010/11/25/8
> 
> What do you think about this?
> 
> Thank you!


Re: [rfc] Replacing FNV and hash32 with Paul Hsieh's SuperFastHash

2010-12-25 Thread Ivan Voras
On 23.12.2010 23:46, Gleb Kurtsou wrote:

> For testing I've used dbench with 16 processes on 1 Gb swap back md
> device, UFS + SoftUpdates:
> Old hash (Mb/s): 599.94  600.096 599.536
> SFH hash (Mb/s): 612.439 612.341 609.673
> 
> It's just ~1% improvement, but dbench is not a VFS metadata intensive
> benchmark. Subjectively it feels faster accessing maildir mailboxes
> with ~10.000 messages : )

Try blogbench if you need metadata-intensive operations, or even fsx.




Re: [rfc] Replacing FNV and hash32 with Paul Hsieh's SuperFastHash

2010-12-26 Thread Ivan Voras
On 26 December 2010 14:24, Gleb Kurtsou  wrote:
> On (25/12/2010 20:29), Ivan Voras wrote:
>> On 23.12.2010 23:46, Gleb Kurtsou wrote:
>>
>> > For testing I've used dbench with 16 processes on 1 Gb swap back md
>> > device, UFS + SoftUpdates:
>> > Old hash (Mb/s): 599.94  600.096 599.536
>> > SFH hash (Mb/s): 612.439 612.341 609.673
>> >
>> > It's just ~1% improvement, but dbench is not a VFS metadata intensive
>> > benchmark. Subjectively it feels faster accessing maildir mailboxes
>> > with ~10.000 messages : )
>>
>> Try blogbench if you need metadata-intensive operations, or even fsx.

> blogbench should be good, but I've always had a hard time interpreting its
> results. Besides, results tend to vary a lot, and there is no way to set a
> seed value like in fsx, so that I could run exactly the same test in
> different configurations.

I think the exact sequence of blogbench operations depends on duration
of previous operations (it's multithreaded) so from that angle you are
right - you can't do a repeatable run except in the trivial cases. On
the other hand, it uses rand() without seeding it with
srand()/sranddev() so this part is actually very repeatable :)

> fsx is a different beast, it reads/writes/truncates at random offsets -
> great tool for debugging mmap/truncate issues. Patch doesn't improve it
> in any way.

It depends on what metadata operations you require - blogbench will
create, find and write files (if we ignore atime); fsx will create a
decent amount of traffic with file size and mtime changes. In your
case you'll probably need to run it on a memory file system or tmpfs
due to sensitivity to disk IO latencies (if your improvements is on
the order of few percent).


Re: gpart/gstripe problems?

2011-01-18 Thread Ivan Voras
On 18.1.2011 16:48, Daniel Braniss wrote:
> I have:
> sf-03> gpart show
> =>        34  976773101  ada0  GPT  (466G)
>           34        128     1  freebsd-boot  (64K)
>          162    4194304     2  freebsd-ufs  (2.0G)
>      4194466  100663296     3  freebsd-swap  (48G)
>    104857762  871915373     4  freebsd  (416G)
> 
> =>        34  976773101  ada1  GPT  (466G)
>           34        128     1  freebsd-boot  (64K)
>          162    4194304     2  freebsd-ufs  (2.0G)
>      4194466  100663296     3  freebsd-swap  (48G)
>    104857762  871915373     4  freebsd  (416G)
> 
> sf-03> ls -ls /dev/ada*
> 0 crw-r-----  1 root  operator    0,  78 Jan 18 14:35 /dev/ada0
> 0 crw-r-----  1 root  operator    0,  80 Jan 18 14:35 /dev/ada0p1
> 0 crw-r-----  1 root  operator    0,  81 Jan 18 14:35 /dev/ada0p2
> 0 crw-r-----  1 root  operator    0,  82 Jan 18 14:35 /dev/ada0p3
> 0 crw-r-----  1 root  operator    0,  83 Jan 18 14:35 /dev/ada0s4
> 0 crw-r-----  1 root  operator    0,  79 Jan 18 14:35 /dev/ada1
> 0 crw-r-----  1 root  operator    0,  84 Jan 18 14:35 /dev/ada1p1
> 0 crw-r-----  1 root  operator    0,  85 Jan 18 14:35 /dev/ada1p2
> 0 crw-r-----  1 root  operator    0,  86 Jan 18 14:35 /dev/ada1p3
> 0 crw-r-----  1 root  operator    0,  87 Jan 18 14:35 /dev/ada1s4
> 
> next I did:
> # gstripe label s0 /dev/ada{0,1}s4 
> and on the console the following appeared:
> GEOM_STRIPE: Device s0 activated.
> GEOM_STRIPE: Cannot add disk gptid/bd0f6e54-22ea-11e0-b27c-001b245d5a5b to s0 
> (error=17).
> GEOM_STRIPE: Cannot add disk gptid/bdf7d563-22ea-11e0-b27c-001b245d5a5b to s0 
> (error=17).
> 
> is this realy an error?

It looks like the same kind of error that people commonly see with
gmirror: a race with glabel. If you don't use glabel, the easiest way
around it would be to disable some of the kern.geom.label.* settings.

I think one way to solve it would be for glabel to export an attribute for
devices (providers) on which this is possible (i.e. those whose size
doesn't change from the underlying devices), which could be checked by
such GEOM classes.




Re: Why not give git a try? (was "Re: [head tinderbox] failure on amd64/amd64")

2011-01-24 Thread Ivan Voras

On 24.1.2011 9:13, Garrett Cooper wrote:

On Sun, Jan 23, 2011 at 9:16 PM, Peter Jeremy  wrote:

On 2011-Jan-21 20:01:32 +0100, "Simon L. B. Nielsen"  wrote:

Perhaps we should just set the tinderbox up to sync directly off cvsup-master 
instead if that makes it more useful?


Can cvsup-master still lose atomicity of commits?  I suspect it can,
in which case syncing directly off the SVN master would seem a better
approach.


I think des is working on "svnup" to work directly on the SVN tree.


I've seen a lot of `self-healing' failures lately w.r.t. cvsup, so I
wonder if it's time to look at another solution to this problem as
these annoying stability issues don't appear to be going away. What
about git?


As long as we're choosing bikeshed colour, I would like to drop 
"mercurial" here :)


Mainly because of this:

> - Higher learning curve.

I found Mercurial to have an easier learning curve and to be something 
like a "DSCM for the users of CVS/SVN".


> - Some slightly annoying nits with stashing local changes when working
> on separate branches (need to talk to git maintainers).

I don't know if we're talking about the same thing, but I've also 
noticed that git tends to take the long way around for things that should 
be simple. Git is also much more "low-level".


They both support pretty much the same feature set; here's a cute but 
dated comparison:


http://importantshock.wordpress.com/2008/08/07/git-vs-mercurial/

Hg is/was AFAIK used by Sun.

Anyway, personally, svn is good enough :)




Re: Tracking down a problem with php on FreeBSD

2011-01-24 Thread Ivan Voras

On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote:


> Good day!
>
> We are using a custom PHP application on FreeBSD 8.1R amd64. It is run
> with php-fpm 5.3.3 from ports as the backend and nginx 0.8.54 as the
> frontend. Several times per day this app makes itself unavailable.


I think it would be more appropriate to ask this on the stable@ list.


> A simple php-fpm restart solves the problem, but I need to track it down
> to its cause, so I am asking for your assistance and instructions on
> how to debug it. Some facts about this:


On one hand, FPM is said to be very experimental...

Personally, I've been using apache22-worker or apache22-event + 
mod_fcgid for years without trouble.



> - I don't know how to reproduce this manually, but it happens several
> times every day
> - It happens on FreeBSD 7.x too
> - It happens with apache+mod_php instead of php-fpm
> - It happens with lighttpd instead of nginx
> - It happens with both SCHED_4BSD and SCHED_ULE
> - It doesn't happen with php <= 5.2.12
> - It happens with and without eaccelerator


It looks very application-specific, possibly not really an OS problem 
(or maybe a problem of different expectations from the OS when porting 
from Linux).



> - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values
> for php-fpm processes, and the load average is more than 120


How many "real" user requests are in these 120? Do any users at the time 
of the problem (this doesn't look like a "crash") receive valid responses?



> - users see an HTTP 502 error code in the browser
> - the php-fpm log has many lines like this at the time of the crash:
> Jan 23 17:56:58.176425 [WARNING] [pool world] server reached
> max_children setting (100), consider raising it


Did you try raising it? Does the error happen ONLY when this limit is 
reached?



> 2011/01/23 17:57:00 [error] 38018#0: *26006023 writev() failed (54:
> Connection reset by peer) while sending request to upstream, client:
> xx.xx.xx.xx, server: some.server.org, request: "POST /?ctrl=Chat&
> a=chatList&__path=chat_list&h=8093b9e1cf448762d5677e21bded97ae&
> h1=38ca8b747a46098c3b1a4f39e6658170 HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/";
> 2011/01/23 17:57:00 [error] 38016#0: *26029878 kevent() reported
> about an closed connection (54: Connection reset by peer) while
> reading response header from upstream, client: xx.xx.xx.xx, server:
> some.server.org, request: "POST /?ctrl=Location&a=refresh&
> __path=refresh&h=276f591df26a65d9a1736f6e1006f4ab&
> h1=3c0916c16b1fc2e7015b71e90bbc3d23 HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/";
> 2011/01/23 17:57:02 [crit] 38020#0: *26034390 open() "/tmp/nginx
> /client_temp/1/74/000741" failed (13: Permission denied) while
> sending request to upstream, client: xx.xx.xx.xx, server:
> some.server.org, request: "POST /?ctrl=Chat&a=send&__path=chat_send&
> h=4a27d8d382ba9b1059412323a451ef84&
> h1=b0a53c86e3c744a01356a5030559ed1a HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/";
> 2011/01/23 17:57:02 [alert] 38020#0: *26034390 http request count is
> zero while sending to client, client: xx.xx.xx.xx, server:
> some.server.org, request: "POST /?ctrl=Chat&a=send&__path=chat_send&
> h=4a27d8d382ba9b1059412323a451ef84&
> h1=b0a53c86e3c744a01356a5030559ed1a HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/";
> 2011/01/23 17:57:03 [error] 38014#0: *25997903 upstream prematurely
> closed connection while reading response header from upstream,
> client: 109.229.69.186, server: some.server.org, request: "POST
> /?ctrl=Chat&a=chatList&__path=chat_list&
> h=c8723de73c4f8ebb98f9bf746d75e965&
> h1=3ab289760a009b07b73c6d96cc94a509 HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/";


These are some very varied errors, not especially consistent with each 
other.


Did you try some generic socket & TCP tuning like described in 
http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel 
?


Other than that, you will probably have to debug the php-fpm processes. 
Start by observing in which state they are (top without "-mio"). If the 
processes are blocking, try "procstat -k " on them.




Re: Why not give git a try? (was "Re: [head tinderbox] failure on amd64/amd64")

2011-01-24 Thread Ivan Voras
On 24 January 2011 19:31, Diane Bruce  wrote:

> As long as it is not GPL.

Unless there's a missing smiley in that sentence there, it is a tough
requirement. Of the major SCMs, only Subversion is non-GPL-ed (even
CVS is...).


Re: Why not give git a try? (was "Re: [head tinderbox] failure on amd64/amd64")

2011-01-25 Thread Ivan Voras
On 25 January 2011 11:22,   wrote:
> Diane Bruce  wrote:
>
>> There certainly would not be a chance of putting
>> mercurial or git into base for example.
>
> Completely apart from licensing, another strike against
> mercurial is that it is written in Python, so it couldn't
> go into base unless Python also went into base.

Of course. OTOH, this topic will only become relevant if anyone
notices that Subversion gets committed into base ;)


Namecache lock contention?

2011-01-28 Thread Ivan Voras

I have this situation on a PHP server:

36623 www         1  76    0   237M 30600K *Name   6   0:14 47.27% php-cgi
36638 www         1  76    0   237M 30600K *Name   3   0:14 46.97% php-cgi
36628 www         1 105    0   237M 30600K *Name   2   0:14 46.88% php-cgi
36627 www         1 105    0   237M 30600K *Name   0   0:14 46.78% php-cgi
36639 www         1 105    0   237M 30600K *Name   5   0:14 46.58% php-cgi
36643 www         1 105    0   237M 30600K *Name   7   0:14 46.39% php-cgi
36629 www         1  76    0   237M 30600K *Name   1   0:14 46.39% php-cgi
36642 www         1 105    0   237M 30600K *Name   2   0:14 46.39% php-cgi
36626 www         1 105    0   237M 30600K *Name   5   0:14 46.19% php-cgi
36654 www         1 105    0   237M 30600K *Name   7   0:13 46.19% php-cgi
36645 www         1 105    0   237M 30600K *Name   1   0:14 45.75% php-cgi
36625 www         1 105    0   237M 30600K *Name   0   0:14 45.56% php-cgi
36624 www         1 105    0   237M 30600K *Name   6   0:14 45.56% php-cgi
36630 www         1  76    0   237M 30600K *Name   7   0:14 45.17% php-cgi
36631 www         1 105    0   237M 30600K RUN     4   0:14 45.17% php-cgi
36636 www         1 105    0   237M 30600K *Name   3   0:14 44.87% php-cgi

It looks like periodically most or all of the php-cgi processes are 
blocked in "*Name" for long enough that "top" notices, then continue, 
probably in a "thundering herd" way. From grepping inside /sys the most 
likely suspect seems to be something in the namecache, but I can't find 
a symbol named exactly "Name" or a string beginning with "Name" that would 
be connected to a lock.


Has anyone investigated this before? Any ideas where to look?

The PHP script used above should not do any filesystem writing but it 
stats and reads a lot of small files and libraries.


This is 8-stable, UFS.
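If it helps to reproduce the stalls without PHP, a load generator that imitates many workers stat()ing the same files could be a starting point. This is a hypothetical sketch, not something from this thread: `stat_storm()` and its parameters are made up for illustration, and it only exercises lookups, not the session-file creates and deletes.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: fork nproc workers, each of which calls
 * stat() on every entry of paths[] iters times, roughly imitating
 * php-cgi processes that stat many small files. Returns the number of
 * workers whose stat() calls all succeeded.
 */
static int stat_storm(const char *const *paths, int npaths, int nproc,
    int iters)
{
	int ok = 0, started = 0, status;

	assert(paths != NULL && npaths > 0);
	for (int p = 0; p < nproc; p++) {
		pid_t pid = fork();

		if (pid < 0)
			break;		/* cannot fork more: use what we have */
		if (pid == 0) {
			struct stat sb;
			int failures = 0;

			for (int i = 0; i < iters; i++)
				for (int j = 0; j < npaths; j++)
					if (stat(paths[j], &sb) != 0)
						failures++;
			_exit(failures == 0 ? 0 : 1);
		}
		started++;
	}
	/* Reap the workers we started and count the clean exits. */
	while (started-- > 0 && wait(&status) > 0)
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	return ok;
}
```

Running it with, say, a few dozen workers over the application's include paths while watching top for the *Name state would indicate whether lookups alone are enough to trigger the contention.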



Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 16:15, John Baldwin  wrote:
> On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote:
>> I have this situation on a PHP server:
>>
>> 36623 www         1  76    0   237M 30600K *Name   6   0:14 47.27% php-cgi
>> 36638 www         1  76    0   237M 30600K *Name   3   0:14 46.97% php-cgi
>> 36628 www         1 105    0   237M 30600K *Name   2   0:14 46.88% php-cgi
>> 36627 www         1 105    0   237M 30600K *Name   0   0:14 46.78% php-cgi
>> 36639 www         1 105    0   237M 30600K *Name   5   0:14 46.58% php-cgi
>> 36643 www         1 105    0   237M 30600K *Name   7   0:14 46.39% php-cgi
>> 36629 www         1  76    0   237M 30600K *Name   1   0:14 46.39% php-cgi
>> 36642 www         1 105    0   237M 30600K *Name   2   0:14 46.39% php-cgi
>> 36626 www         1 105    0   237M 30600K *Name   5   0:14 46.19% php-cgi
>> 36654 www         1 105    0   237M 30600K *Name   7   0:13 46.19% php-cgi
>> 36645 www         1 105    0   237M 30600K *Name   1   0:14 45.75% php-cgi
>> 36625 www         1 105    0   237M 30600K *Name   0   0:14 45.56% php-cgi
>> 36624 www         1 105    0   237M 30600K *Name   6   0:14 45.56% php-cgi
>> 36630 www         1  76    0   237M 30600K *Name   7   0:14 45.17% php-cgi
>> 36631 www         1 105    0   237M 30600K RUN     4   0:14 45.17% php-cgi
>> 36636 www         1 105    0   237M 30600K *Name   3   0:14 44.87% php-cgi
>>
>> It looks like periodically most or all of the php-cgi processes are
>> blocked in "*Name" for long enough that "top" notices, then continue,
>> probably in a "thundering herd" way. From grepping inside /sys the most
>> likely suspect seems to be something in the namecache, but I can't find
>> exactly a symbol named "Name" or string beginning with "Name" that would
>> be connected to a lock.
>
> In vfs_cache.c:
>
> static struct rwlock cache_lock;
> RW_SYSINIT(vfscache, &cache_lock, "Name Cache");

You're right, I misread it as SYSCTL at a glance.

> What are the php scripts doing?  Do they all try to create and delete files at
> the same time (or do renames)?

Right again - they do simultaneously create session files and on rare
occasions (1%) delete them. These are "sharded" into a two-level
directory structure by single letter (/storage/a/b/file, i.e. 32^2
directories); dirhash is large enough.
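A layout like /storage/a/b/file can be produced by taking the first two characters of the session ID as directory names. The sketch below assumes that scheme; the helper name is hypothetical, and the 32^2 figure assumes a 32-character ID alphabet (for example, PHP session IDs generated with 5 bits per character use 0-9 and a-v).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a session ID to "<root>/<c0>/<c1>/<id>",
 * using the first two characters of the ID as the two directory levels.
 * With a 32-character alphabet that gives 32 * 32 = 1024 leaf
 * directories. Returns 0 on success, -1 if the ID is shorter than two
 * characters or the path does not fit in buf.
 */
static int session_path(char *buf, size_t len, const char *root,
    const char *id)
{
	int n;

	assert(buf != NULL && root != NULL && id != NULL);
	if (strlen(id) < 2)
		return -1;
	n = snprintf(buf, len, "%s/%c/%c/%s", root, id[0], id[1], id);
	if (n < 0 || (size_t)n >= len)
		return -1;		/* snprintf() truncated the output */
	return 0;
}
```

For example, the ID "ab9f" under "/storage" maps to "/storage/a/b/ab9f".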

During all this, the web server did around 60 PHP pages per second, so
it doesn't look to me like there should be such noticeable contention
(i.e. at most 60 files/s are created and, on average, 60/100 = 0.6
files/s deleted). The file system is on softupdates; there's only light IO.

Typical vmstat is:

 procs    memory                 page            disks       faults          cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in     sy    cs us sy id
17 0 0   8730M  1240M     3   0   0   0   206   0   1   0 1948 266928 15079 65 34  1
19 0 0   8730M  1240M     0   0   0   0   290   0   1  24 1835 260618 15132 63 35  2
 7 0 0   8730M  1239M     0   0   0   0   200   0   0   0 1822 260783 14851 63 35  2
16 0 0   8730M  1239M     0   0   0   0   199   0 788   0 2744 259902 20465 61 37  2
16 0 0   8730M  1239M     0   0   0   0   210   0   0   0 1755 265081 17564 61 37  2

(8 cores; around 35% sys load across them - I'm trying to find out why).


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 16:25, Dan Nelson  wrote:

> My guess would be:
>
> kern/vfs_cache.c:151 static struct rwlock cache_lock;
> kern/vfs_cache.c:152 RW_SYSINIT(vfscache, &cache_lock, "Name Cache");
>
> The CACHE_*LOCK() macros in vfs_cache.c use cache_lock, so you've got lots
> of possible contention points.  procstat -ka and/or dtrace might help you
> determine exactly where.

I'm new to dtrace, so I tried this:

lockstat:::rw-block
{
	/* count the kernel stacks at which threads block on an rwlock */
	@traces[stack()] = count();
}

with these results:

http://ivoras.net/stuff/rw-block.txt

It's informative because most of the traces are namecache-related. As
suspected, most of the blocking occurs in stat().

As this is a rwlock, I'd interpret it as waiting for a write lock to be
released so that the readers can acquire it, but I need to confirm this
as there are a lot of things that can in theory be stat()ed here.

I'm going to continue investigating with dtrace but I'd appreciate
pointers on how to make the output more useful (like including
filenames from stat()).


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 22:18, Gleb Kurtsou  wrote:

> You could try replacing rwlock with plain mutex to check if there are
> priority propagation issues among readers/writers.

How would that manifest? (i.e. how would it be detectable)

> SX locks should also
> work but would likely be a considerable performance regression.

With mutexes I'd lose all shared (read) acquisitions so I doubt sx
locks would do much more harm :)

> Finding out how much activity there is outside of storage/a/b/file
> (sessions storage) could also be helpful.

Here's more information:

* The session storage (i.e. mostly file creates / writes in this
particular workload) is on a separate file system from the core of the
application (which only does reads)

* The dtrace output I've sent is from around thirty seconds of
operation, so around 2000 PHP runs. (PHP in this case is FastCGI, so
the processes are persistent instead of constantly respawning.) In
these 2000 runs there have been around 20,000 rw-block events in
cache_lookup - which is strange.

* Here's another dtrace output without rwlock mutex inlining, showing
a different picture than I had earlier thought: most rw-block
events are in wlock! http://ivoras.net/stuff/rw-block-noinline.txt  --
there are also some blocks without a rwlock function in the trace; I
don't understand how rwlock inlining is implemented, maybe the readers
are always inlined?

Next step - find out how to make dtrace print files for which this happens.


Re: Namecache lock contention?

2011-01-28 Thread Ivan Voras
On 28 January 2011 23:37, Gleb Kurtsou  wrote:

>> * The dtrace output I've sent is from around thirty seconds of
>> operation, so around 2000 PHP runs. (PHP in this case is FastCGI, so
>> the processes are persistent instead of constantly respawning). In
>> these 2000 runs there have been around 20,000 rw-block events in
>> cache_lookup - which is strange.

> Are there rename, rmdir calls? - these purge namecache.
> If cache is empty, VOP_LOOKUP acquires write lock to populate the cache.

No, only creates and deletes on files, no directory operations at all.
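The behaviour Gleb describes (lookups normally share the lock, but a cache miss takes it exclusively to populate the cache) can be illustrated with a userland analogy using a POSIX rwlock. This is not the kernel's vfs_cache.c code; the hash table, `cached_lookup()` and `slow_lookup()` are hypothetical. The point is that every exclusive acquisition stalls all concurrent readers, so frequent misses or purges can look like many processes waiting on "Name Cache" at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Userland analogy only; not the kernel's name cache implementation. */
#define NBUCKETS 64

struct entry {
	char *name;
	int value;
	struct entry *next;
};

static struct entry *buckets[NBUCKETS];
static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long exclusive_acquisitions;	/* times the write path ran */

static unsigned hash_name(const char *s)
{
	unsigned h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return h % NBUCKETS;
}

/* Caller must hold cache_lock (shared or exclusive). */
static struct entry *find_locked(const char *name)
{
	for (struct entry *e = buckets[hash_name(name)]; e != NULL; e = e->next)
		if (strcmp(e->name, name) == 0)
			return e;
	return NULL;
}

/* Hypothetical stand-in for the expensive file system lookup. */
static int slow_lookup(const char *name)
{
	return (int)strlen(name);
}

static int cached_lookup(const char *name)
{
	struct entry *e;
	int v;

	pthread_rwlock_rdlock(&cache_lock);
	e = find_locked(name);
	if (e != NULL) {
		v = e->value;
		pthread_rwlock_unlock(&cache_lock);
		return v;			/* hit: readers run in parallel */
	}
	pthread_rwlock_unlock(&cache_lock);

	v = slow_lookup(name);			/* miss: no lock held here */

	pthread_rwlock_wrlock(&cache_lock);	/* excludes every reader */
	exclusive_acquisitions++;
	if (find_locked(name) == NULL) {	/* another thread may have won */
		e = malloc(sizeof(*e));
		assert(e != NULL);
		e->name = strdup(name);
		assert(e->name != NULL);
		e->value = v;
		e->next = buckets[hash_name(name)];
		buckets[hash_name(name)] = e;
	}
	pthread_rwlock_unlock(&cache_lock);
	return v;
}
```

Comparing `exclusive_acquisitions` with the total number of lookups gives the miss rate; in this model that ratio determines how often readers are stalled.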

>> * Here's another dtrace output without rwlock mutex inlining, showing
>> a different picture than I had earlier thought: most rw-block
>> events are in wlock! http://ivoras.net/stuff/rw-block-noinline.txt  --
>> there are also some blocks without a rwlock function in the trace; I
>> don't understand how rwlock inlining is implemented, maybe the readers
>> are always inlined?
> Add options RWLOCK_NOINLINE; recompiling with -O0 might also be a good
> idea.

That's what I meant by "without rwlock mutex inlining". The default
-O2 is enough - aggressive inlining only begins at -O3.


Re: Scheduler question

2011-02-04 Thread Ivan Voras

On 04/02/2011 03:56, Daniel O'Connor wrote:


I hooked up a logic analyser and I can see most of the time it's fairly 
regularly transferring 16k of data every 2msec.

If I load up the disk by, eg, tar -cf /dev/null /local0 I find it drops out and 
I can see gaps in the transfers until eventually the FIFO fills up and it stops.

I am wondering if this is a scheduler problem (or I am expecting too much :) in 
that it is not running my libusb thread reliably under load. The other 
possibility is that it is a USB issue, although I am looking at using 
isochronous transfers instead of bulk.


I'm surprised this isn't complained about more often - I also regularly 
see that file system activity blocks other, non-file-using processes 
which are mostly CPU and memory intensive (but since I'm not running 
realtime things, it fell under the "good enough" category). Maybe there 
is some kind of global-ish lock held by the VM or the VFS which 
would interfere with normal operation of other processes (maybe 
when the processes use malloc() to grow their memory?).


Could you try 2 things:

	1) instead of doing file IO, could you directly use a disk device (e.g. 
/dev/ad0), possibly with some more intensive utility than dd (e.g. 
"diskinfo -vt") and see if there is any difference?


	2) if there is a difference in 1), try modifying your program to not 
use malloc() in the critical path (if applicable) and/or use mlock(2)?
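For suggestion 2), a minimal sketch of preparing a transfer buffer before the time-critical loop: allocate it page-aligned, touch every page so it is faulted in up front, and try to wire it with mlock(2). The function names are hypothetical, and mlock() may fail if RLIMIT_MEMLOCK is too low, so the sketch reports the failure rather than aborting. mlockall(MCL_CURRENT | MCL_FUTURE) is an alternative that wires the whole process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate a page-aligned buffer, pre-fault it and try to mlock() it.
 * *locked is set to 1 if mlock() succeeded, 0 otherwise. Returns NULL
 * only if the allocation itself fails.
 */
static void *alloc_transfer_buf(size_t len, int *locked)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = NULL;

	assert(len > 0 && locked != NULL);
	if (page <= 0)
		page = 4096;		/* assumed fallback if sysconf() fails */
	if (posix_memalign(&buf, (size_t)page, len) != 0)
		return NULL;
	memset(buf, 0, len);		/* fault every page in now */
	*locked = (mlock(buf, len) == 0);
	return buf;
}

static void free_transfer_buf(void *buf, size_t len, int locked)
{
	if (locked)
		munlock(buf, len);
	free(buf);
}
```

In the USB case this would be called once per buffer before the first transfer is queued, for example `alloc_transfer_buf(16384, &locked)` to match the 16k transfers mentioned above.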





Re: Scheduler question

2011-02-04 Thread Ivan Voras

On 04/02/2011 12:45, Daniel O'Connor wrote:


On 04/02/2011, at 21:48, Ivan Voras wrote:

I am wondering if this is a scheduler problem (or I am expecting too much :) in 
that it is not running my libusb thread reliably under load. The other 
possibility is that it is a USB issue, although I am looking at using 
isochronous transfers instead of bulk.


I'm surprised this isn't complained about more often - I also regularly see that file 
system activity blocks other, non-file-using processes which are mostly CPU and memory 
intensive (but since I'm not running realtime things, it fell under the "good 
enough" category). Maybe there is some kind of global-ish lock held by the VM 
or the VFS which would interfere with normal operation of other processes (maybe 
when the processes use malloc() to grow their memory?).


I guess for an interactive user anything less than 100msec is probably not 
noticeable unless it happens reasonably regularly when watching a video.


Could you try 2 things:

1) instead of doing file IO, could you directly use a disk device (e.g. 
/dev/ad0), possibly with some more intensive utility than dd (e.g. "diskinfo 
-vt") and see if there is any difference?


OK, I'll give it a shot.


2) if there is a difference in 1), try modifying your program to not 
use malloc() in the critical path (if applicable) and/or use mlock(2)?


It doesn't allocate memory once it's going, everything is preallocated before 
the data transfer starts.

I'll have a go with mlock() and see what happens.


Did you find anything interesting?



Re: Tracking down a problem with php on FreeBSD

2011-02-05 Thread Ivan Voras
On 5 February 2011 19:43, Ruslan Mahmatkhanov  wrote:
> Hi, Ivan!
>
> Thank you very much for the response, and sorry for the late answer. We were
> able to collect some data about the issue to make the discussion more
> objective. See below.

>>> A simple php-fpm restart solves the problem, but I need to track down
>>> the cause of this situation, so I am asking for your assistance and
>>> instructions on how to debug it. Some facts about this:
>>
>> On one hand, FPM is said to be very experimental...
>>
>> Personally, I've been using apache22-worker or apache22-event +
>> mod_fcgid for years without trouble.
>
> We prefer to avoid using apache at all, because here it just adds yet
> another unneeded link and more complexity.

I guess it's about tradeoffs between complexity and stability :)

>>> - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values
>>> for php-fpm processes and LA is more than 120

I think this is significant, especially with this:

> When attaching to any hanging php-fpm process with truss, I see a lot
> of these calls:
> sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078)
> = 0 (0x0)
> sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078)
> = 0 (0x0)
> sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078)
> = 0 (0x0)
> sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078)
> = 0 (0x0)

"Normal" processes of the type PHP is have no need to call
sched_yield() arbitrarily, unless they are implementing something they
shouldn't - like a synchronization primitive (semaphore/lock). If "a
lot" means "of the same order of magnitude as your VCSW rate", this is
the reason for it.

I've analyzed my php-cgi binary and modules and they don't use sched_yield.

And yes, grepping for it in the source finds it only in FPM:

sapi/fpm/fpm/fpm_atomic.h:140:  sched_yield();

It seems they are trying to implement a spinlock by hand, instead of
using what the OS provides. (on the other hand, the implementation
might be correct but they may be using it wrong).
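To show why a hand-made lock can produce that truss output, here is the general shape of a spin lock that calls sched_yield() while waiting. This illustrates the pattern with C11 atomics; it is not FPM's actual fpm_atomic.h code, and the `yield_lock` type and functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/*
 * Illustration of a "spin and sched_yield()" lock. Nothing records who
 * holds the lock, so if the holder dies (e.g. a crashed process with the
 * lock in shared memory), waiters spin in yl_lock() forever.
 */
typedef struct {
	atomic_int held;	/* 0 = free, 1 = held */
} yield_lock;

static void yl_init(yield_lock *l)
{
	atomic_init(&l->held, 0);
}

/* Returns nonzero if the lock was acquired. */
static int yl_trylock(yield_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->held, &expected, 1,
	    memory_order_acquire, memory_order_relaxed);
}

static void yl_lock(yield_lock *l)
{
	while (!yl_trylock(l))
		sched_yield();	/* one system call per failed attempt */
}

static void yl_unlock(yield_lock *l)
{
	assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

If the process holding such a lock crashes, nothing ever stores 0 back, and every other worker loops in yl_lock() calling sched_yield() continuously, which would be consistent with both the high VCSW rate and the truss output.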

In any case, this points to bugs in FPM. If so, unfortunately I can't
help you further.

If you really want to continue using FPM, I guess you should probably
replace this hand-made lock implementation with sem(4) or see if
"robust" pthreads mutexes can be committed and MFCed (maybe with David
Xu).

Here is the FPM file:

http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3/sapi/fpm/fpm/fpm_atomic.h?revision=305417&view=markup


Re: Tracking down a problem with php on FreeBSD

2011-02-05 Thread Ivan Voras
On 5 February 2011 21:03, Ruslan Mahmatkhanov  wrote:

>
> Can you please tell me more what you mean by ""robust" pthreads mutexes" and

It's just a name for properties of a mutex; actually this is
imprecise, what's needed here is process-shared & robust
(fpm_shm_slots.c: FPM uses shared memory).


Re: Tracking down a problem with php on FreeBSD

2011-02-05 Thread Ivan Voras
On 5 February 2011 21:22, Ivan Voras  wrote:
> On 5 February 2011 21:03, Ruslan Mahmatkhanov  wrote:
>
>>
>> Can you please tell me more what you mean by ""robust" pthreads mutexes" and
>
> It's just a name for properties of a mutex; actually this is
> imprecise, what's needed here is process-shared & robust
> (fpm_shm_slots.c: FPM uses shared memory).

Actually I think "robustness" is the key here (in this context it
means that a lock does not stay locked forever if the thread / process
holding it dies unexpectedly (crashes): the next thread / process to
acquire it is told that the previous owner died). It is very likely that
in your case the PHP process with the FPM SAPI module dies while holding
a lock shared between processes, and the other processes get stuck
waiting for this lock to be unlocked.
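A process-shared, robust POSIX mutex handles exactly this case: if the owner dies, the next locker gets EOWNERDEAD instead of blocking forever, repairs the shared state, and marks the mutex consistent. The following sketch assumes a platform that implements pthread_mutexattr_setrobust() and pthread_mutex_consistent() (as the message implies, FreeBSD did not have them at the time). The helper names are hypothetical, and the shared memory here is an anonymous MAP_SHARED mapping inherited across fork(), whereas FPM uses its own shared memory (fpm_shm_slots.c).

```c
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANONYMOUS and POSIX.1-2008 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a process-shared, robust mutex in shared memory, or NULL. */
static pthread_mutex_t *shared_robust_mutex(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t *m;
	int rc;

	m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return NULL;
	if (pthread_mutexattr_init(&attr) != 0) {
		munmap(m, sizeof(*m));
		return NULL;
	}
	rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (rc == 0)
		rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	if (rc == 0)
		rc = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	if (rc != 0) {
		munmap(m, sizeof(*m));
		return NULL;
	}
	return m;
}

/*
 * Lock m. Returns 1 if the previous owner died holding it, 0 on a
 * normal acquisition, -1 on error (the mutex is not held on error).
 */
static int lock_recovering(pthread_mutex_t *m)
{
	int rc;

	assert(m != NULL);
	rc = pthread_mutex_lock(m);
	if (rc == EOWNERDEAD) {
		/* ... repair the protected shared data here ... */
		if (pthread_mutex_consistent(m) != 0) {
			pthread_mutex_unlock(m);
			return -1;
		}
		return 1;
	}
	return rc == 0 ? 0 : -1;
}

/*
 * Simulate a crashed worker: a child takes the lock and exits without
 * unlocking. Returns lock_recovering()'s result, or -1 if fork() fails.
 */
static int demo_dead_owner(pthread_mutex_t *m)
{
	pid_t pid = fork();
	int r;

	assert(m != NULL);
	if (pid < 0)
		return -1;
	if (pid == 0) {
		pthread_mutex_lock(m);
		_exit(0);		/* "crash" while holding the lock */
	}
	waitpid(pid, NULL, 0);
	r = lock_recovering(m);
	if (r >= 0)
		pthread_mutex_unlock(m);
	return r;
}
```

After the recovery and unlock, the mutex behaves normally again, so later lock_recovering() calls return 0.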


Re: Tracking down a problem with php on FreeBSD

2011-02-05 Thread Ivan Voras
On 5 February 2011 23:11, Ruslan Mahmatkhanov  wrote:

> Yes, it seems so. But all of this locking/threading is black magic for me
> right now, and I don't feel able to work out this FPM issue by
> myself. So I just sent this latest info to the php-fpm mailing list. And
> thank you again, Ivan, for your analysis and explanations.

They would probably be helped more if you found a core dump of a crashed
PHP process (if that is what's causing it).


Re: Scheduler question

2011-02-06 Thread Ivan Voras
On 7 February 2011 02:41, Daniel O'Connor  wrote:
>
> On 05/02/2011, at 12:43, Daniel O'Connor wrote:
>> On 05/02/2011, at 11:09, Ivan Voras wrote:
>>>> It doesn't allocate memory once it's going, everything is preallocated 
>>>> before the data transfer starts.
>>>>
>>>> I'll have a go with mlock() and see what happens.
>>>
>>> Did you find anything interesting?
>>
>> I'll be looking at it on Monday, I will let you know :)
>
> No luck with mlock(), so it doesn't appear that paging is the issue :(

I'm also interested in raw device vs file system access!


Re: Scheduler question

2011-02-07 Thread Ivan Voras
On 07/02/2011 04:12, Daniel O'Connor wrote:
>
> On 07/02/2011, at 13:02, Ivan Voras wrote:
>>>> I'll be looking at it on Monday, I will let you know :)
>>>
>>> No luck with mlock(), so it doesn't appear that paging is the issue :(
>>
>> I'm also interested in raw device vs file system access!
>
> Oops, sorry.. I just tried that now but it doesn't improve things :(

Meaning: you still get jitter?

> I am writing directly to /dev/ad10 but stressing /dev/ad14 (sudo tar -cf 
> /dev/null /local0)

Can you do only one of those things? I.e. leave all the file systems
alone and just do something like 'diskinfo -vt /dev/ad14'?


Re: Scheduler question

2011-02-07 Thread Ivan Voras
On 7 February 2011 13:38, Daniel O'Connor  wrote:

>>> I am writing directly to /dev/ad10 but stressing /dev/ad14 (sudo tar -cf 
>>> /dev/null /local0)
>>
>> Can you do only one of those things? I.e. leave all the file systems
>> alone and just do something like 'diskinfo -vt /dev/ad14'?
>
> OK, I wrote the data to /dev/null from USB and ran diskutil in a loop and it 
> doesn't drop out.

Maybe I misunderstood you and it's a different problem than what I was
experiencing; is this a better description of your problem:

1) you have a program communicating with a USB device
2) it reads from the device and writes to a file
3) you experience stalls when you write the data received from the USB
device to the file but only if the file system you're writing on is
also loaded by something else - heavy reads?

?


Analyzing wired memory?

2011-02-08 Thread Ivan Voras
Is there some way to track which kernel subsystem, process or 
thread has wired memory? (including "data exists but needs code to 
extract it")


I'd like to analyze a system where there is a lot of memory wired but 
not accounted for in the output of vmstat -m and vmstat -z. There are no 
user processes which would lock memory themselves.


Any pointers?



Re: Super pages

2011-02-23 Thread Ivan Voras

On 23/02/2011 14:03, Dr. Baud wrote:


 In general, is it unadvisable to disable super pages?


I don't think there would be any effect on the stability of operation if 
you disable superpages, but generally (except in cases of CPU bugs) you 
would not need to. Your system should operate a bit faster with 
superpages enabled.





Re: Mem leak : malloc/free + pthreads = leakage?

2011-03-07 Thread Ivan Voras

On 06/03/2011 18:35, Ryan Stone wrote:

On Sun, Mar 6, 2011 at 10:34 AM, Ryan Stone  wrote:

I would try playing with MALLOC_OPTIONS.  I seriously doubt that there
is an actual leak in jemalloc, but from my own experiences with it I
suspect that there are certain multithreaded malloc/free sequences
that interact badly with the per-thread caching that jemalloc
performs.  The first thing I would try is setting MALLOC_OPTIONS=7h to
disable the caching.



Wait, sorry, apparently this is a new option in HEAD.  Under 8.1
MALLOC_OPTIONS=g will disable the thread-specific caching.  See the
malloc(3) man page for the definitive list of available options.


I can confirm this suspicion; I have a malloc-intensive multithreaded 
program on 8-STABLE which I need to run with MALLOC_OPTIONS g10f2n to 
reduce otherwise severe memory lossage by the allocator. I haven't tried 
it on 9, though.
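One example of a multithreaded malloc/free sequence that may interact badly with per-thread caches is allocating in one thread and freeing in another: in allocators with thread caches, the freed blocks typically go to the freeing thread's cache rather than back to the allocating thread. The sketch below generates that pattern so that memory use can be compared under different MALLOC_OPTIONS settings; whether it triggers the specific lossage described here is an assumption, and the constants and names are arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NBUF	1000
#define BUFSZ	256		/* small size class, likely thread-cached */

/* One batch of buffers allocated by one thread, freed by another. */
struct batch {
	void *ptrs[NBUF];
	size_t freed;
};

static void *free_batch(void *arg)
{
	struct batch *b = arg;

	for (size_t i = 0; i < NBUF; i++) {
		free(b->ptrs[i]);
		b->ptrs[i] = NULL;
		b->freed++;
	}
	return NULL;
}

/* Run `rounds` allocate-here/free-there rounds; returns buffers freed. */
static size_t cross_thread_rounds(int rounds)
{
	struct batch b;
	size_t total = 0;

	for (int r = 0; r < rounds; r++) {
		pthread_t t;

		b.freed = 0;
		for (size_t i = 0; i < NBUF; i++) {
			b.ptrs[i] = malloc(BUFSZ);
			assert(b.ptrs[i] != NULL);
			memset(b.ptrs[i], r & 0xff, BUFSZ);	/* touch memory */
		}
		if (pthread_create(&t, NULL, free_batch, &b) == 0)
			pthread_join(t, NULL);
		else
			free_batch(&b);	/* fall back to freeing locally */
		total += b.freed;
	}
	return total;
}
```

Running a longer version (more rounds, several freeing threads) with and without MALLOC_OPTIONS=g on 8.x while watching the process size in top would show whether thread caching is what holds on to the memory.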





Re: Timecounter Project (GSoc2011)

2011-03-24 Thread Ivan Voras

On 24/03/2011 10:00, Jing Huang wrote:

Hi Everyone,

  I am a student at Peking University in China. I am interested
in the FreeBSD project "Timecounter Performance Improvements".

  I am familiar with the Linux kernel and virtualization systems
like KVM and Xen. I maintained the Linux server for my college
for the whole of last year. Recently, I learned a lot about KVM and assigned
VMs to students who need them. I also have experience installing and
configuring FreeBSD systems.


Offtopic for your specific requests, but if you or these students would 
like to finish porting KVM to FreeBSD, that would also be a great GSoC 
project!





Re: Timecounter Project (GSoc2011)

2011-03-24 Thread Ivan Voras

On 24/03/2011 12:21, Zhihao Yuan wrote:

On Thu, Mar 24, 2011 at 5:39 AM, Ivan Voras  wrote:

On 24/03/2011 10:00, Jing Huang wrote:


Hi Everyone,

  I am a student at Peking University in China. I am interested
in the FreeBSD project "Timecounter Performance Improvements".

  I am familiar with the Linux kernel and virtualization systems
like KVM and Xen. I maintained the Linux server for my college
for the whole of last year. Recently, I learned a lot about KVM and assigned
VMs to students who need them. I also have experience installing and
configuring FreeBSD systems.


Offtopic for your specific requests, but if you or these students would like
to finish porting KVM to FreeBSD, that would also be a great GSoC project!



Linux KVM was ported to FreeBSD before:
http://retis.sssup.it/~fabio/freebsd/lkvm/

But their code is not clean, and the implementation only supports
FreeBSD 6/7 (due to the changes to the USB stack). Since cleaning up
their code could be another big project, FreeBSD dropped that GSoC
result.


Yes, that is why I suggested finishing the port :) There is enough work 
in finishing the KVM port that it can be a new GSoC project.


(also, finishing FUSE...)



Re: Timecounter Project (GSoc2011)

2011-03-24 Thread Ivan Voras

On 24/03/2011 14:11, Zhihao Yuan wrote:


Well, it depends on the decision of the core team. AFAIC, getting KVM
committed is very hard, especially for a GSoC project.


Ah, please read what I'm saying: finish, not commit.


But... I think the thread is not talking about the KVM itself...

FUSE works. It's in the ports, as well as many file systems.


No, it doesn't work. It crashes. Google will show you many bug reports, 
including mine.




Re: [gsoc] HTree Directory Index and Journal in ext2fs

2011-04-05 Thread Ivan Voras

On 05/04/2011 15:48, gnehzuil wrote:

Hello,

I would like to apply for a new project, "HTree Directory Index and Journal
in ext2fs", in GSoC 2011. This project is not on the ideas page, but it
can improve ext2fs in FreeBSD.

Last year, I participated in GSoC 2010 and implemented a
preallocation algorithm in ext2fs and made it able to read ext4 file systems
in read-only mode. I have tried to read the htree dir index in ext4 read-only
mode, but I haven't finished it yet. I plan to implement an htree dir index in
ext2fs before the midterm evaluation.

Next I will try to implement a journal in ext2fs. I think I can borrow
some ideas from WAPBL in NetBSD.


When you say 'journal in ext2fs' do you mean ext3-compatible or 
something different entirely? I don't think that a journalling addition 
to our ext2fs which would not make it compatible with ext3 would be useful.





Re: [UPDATE] New Boot-Loader Menu -- version 1.4

2011-05-05 Thread Ivan Voras

On 05/05/2011 15:40, Warren Block wrote:

On Thu, 5 May 2011, Devin Teske wrote:


Running on i386-compatible hardware supporting ACPI:
B&W (standard): http://twitpic.com/4tlsin
Color (loader_color=YES): http://twitpic.com/4tlt6l


Looks nice. Options 3, 4, and 5 could be changed to

3. Safe Mode
4. Single User Mode
5. Verbose

On/Off or Enabled/Disabled might be bikeshedably better than Yes and No.


If we're going to nitpick, then the style of

*Enable* Safe Mode : *YES | NO*

may be even better :) While at it, I'd also suggest aligning the YES | 
NO fields vertically for better readability.


But these are minor suggestions, it is ok the way it is :)






Re: Capsicum project: Ideas needed

2011-07-08 Thread Ivan Voras

On 08/07/2011 05:42, Ilya Bakulin wrote:

Hi hackers,
As a part of ongoing effort to enhance usage of Capsicum in FreeBSD base
system, I want to ask you, which applications in the base system should
receive sandboxing support.


How about a short description of what sandboxing can bring to applications?

I'm browsing the documents at 
http://www.cl.cam.ac.uk/research/security/capsicum/documentation.html 
but it looks like it still mostly describes the generic framework rather 
than what you can do with it. From it, it looks like you can set limits 
on file handle operations (e.g. lc_limitfd(STDOUT_FILENO, CAP_FSTAT | 
CAP_SEEK | CAP_WRITE)), but what else?





Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-22 Thread Ivan Voras

On 19/08/2011 14:21, Aled Morris wrote:

On 19 August 2011 11:15, Tom Evans  wrote:


On Thu, Aug 18, 2011 at 6:50 PM, Yuri  wrote:

Some of the latest hard drives have logical sectors of 512 bytes when they
actually have 4k physical sectors.



...

Shouldn't UFS and ZFS drivers be able to either read the right sector size
from the underlying device or at least issue a warning?


The device never reports the actual sector size, so unless FreeBSD
keeps a database of 4k sector hard drives that report as 512 byte
sector hard drives, there is nothing that can be done.


At what point should we change the default in newfs/zfs to 4k?


It is already changed for UFS in 9.


I guess formatting the filesystem for 4k sectors on a 512b drive would still
work but it would be suboptimal.  What would the performance penalty be in
reality?


It would be suboptimal but only for the slight waste of space that would 
have otherwise been reclaimed if the block or fragment size remained 512 
or 2K. This waste of space is insignificant for the vast majority of 
users and there are no performance penalties, so it seems that switching 
to 4K sectors by default for all file systems would actually be a good idea.





Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-23 Thread Ivan Voras

On 23/08/2011 03:23, Peter Jeremy wrote:

On 2011-Aug-22 12:45:08 +0200, Ivan Voras  wrote:

It would be suboptimal but only for the slight waste of space that would
have otherwise been reclaimed if the block or fragment size remained 512
or 2K. This waste of space is insignificant for the vast majority of
users and there are no performance penalties, so it seems that switching
to 4K sectors by default for all file systems would actually be a good idea.


This is heavily dependent on the size distribution.  I can't quickly
check for ZFS but I've done some quick checks on UFS.  The following
are sizes in MB for my copies of the listed trees with different UFS
frag size.  These include directories but not indirect blocks:

   1b  512b  1024b  2048b  4096b
 4430  4511   4631   4875   5457  /usr/ncvs
 4910  5027   5181   5499   6133  Old FreeBSD SVN repo
  299   370    485    733   1252  /usr/ports checked out from CVS
  467   485    509    557    656  /usr/src 8-stable checkout from CVS

Note that the ports tree grew by 50% going from 1K to 2K frags and
will grow by another 70% going to 4KB frags.  Similar issues will
be seen when you have lots of small files.


I agree but there are at least two things going for making the increase 
anyway:


1) 2 TB drives cost $80
2) Where the space is really important, the person in charge usually 
knows it and can choose a non-default size like 512b fragments.
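The growth in the table comes from rounding: every small file occupies at least one whole fragment. The sketch below simplifies allocation to the file size rounded up to the fragment size. It ignores inodes, directories, indirect blocks, and the UFS rule that only the tail of a small file uses fragments. The second helper reproduces the ports-tree percentages from the table. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Space a file occupies when allocation is rounded up to whole fragments. */
static uint64_t allocated_bytes(uint64_t file_size, uint32_t frag_size)
{
    assert(frag_size > 0);
    return (file_size + frag_size - 1) / frag_size * frag_size;
}

/* Percentage growth from one total to another (e.g. two table columns). */
static double growth_percent(double from, double to)
{
    assert(from > 0.0);
    return (to - from) / from * 100.0;
}
```

A 100-byte file takes 512 bytes with 512b fragments and 4096 bytes with 4K fragments, an eightfold difference. That difference dominates trees of tiny files such as ports. For the ports tree, growth_percent(485, 733) is about 51 and growth_percent(733, 1252) is about 71.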





Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-23 Thread Ivan Voras

On 23/08/2011 11:59, Aled Morris wrote:

On 23 August 2011 10:52, Ivan Voras  wrote:



I agree but there are at least two things going for making the increase
anyway:

1) 2 TB drives cost $80
2) Where the space is really important, the person in charge usually knows
it and can choose a non-default size like 512b fragments.


helpers like sysinstall should help with choosing the smaller blocks for
smaller drives (especially SSD)


Only via hints and help text. Too much magic in the installer leads to 
awkward choices :)


(e.g. first you need to distinguish between a VM with a small drive, a 
SSD small drive, or a SAN small volume... it quickly turns into an 
AI-class problem).




Large machine test ideas

2011-08-26 Thread Ivan Voras
I'll have an 8x8x2 (128 logical CPUs) machine to test for an afternoon 
next week and I'm just wondering if any of you have something you want 
tested. The opportunities are limited: it would have to be a 
self-contained test (no network, drives, etc.) and fairly short.


Of course, I'll do some of my own tests just to get a feel for the machine.

I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs, 
right?




Re: Large machine test ideas

2011-08-29 Thread Ivan Voras
On 26/08/2011 19:44, Garrett Cooper wrote:
> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras  wrote:
> 
> ...
> 
>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>> right?
> 
> A 9.0-BETA1 snapshot, yes.

Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
froze on boot after showing "SRAT: No CPU found for memory domain 4".

(all this after the traditional "do-nothing" pause of 10 or so minutes
before displaying the copyright banner).



Re: Large machine test ideas

2011-08-29 Thread Ivan Voras
On 29/08/2011 16:46, Ivan Voras wrote:
> On 26/08/2011 19:44, Garrett Cooper wrote:
>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras  wrote:
>>
>> ...
>>
>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>> right?
>>
>> A 9.0-BETA1 snapshot, yes.
> 
> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
> froze on boot after showing "SRAT: No CPU found for memory domain 4".
> 
> (all this after the traditional "do-nothing" pause of 10 or so minutes
> before displaying the copyright banner).

No luck - it's frozen. Linux and Windows Server work fine.




Re: Large machine test ideas

2011-08-29 Thread Ivan Voras
On 29 August 2011 17:20, Andriy Gapon  wrote:
> on 29/08/2011 18:18 Ivan Voras said the following:

>>> Not sure if hw.memtest.tests tunable has made it into 9.0-BETA1.
>>> Setting it to zero should result in skipping the checks.
>>
>> If it did, to what should I set it?
>
> See one line above your question :-)

Sorry :) I blame the influence of fan noise on my head :)

>>> You may also try to capture and share a verbose dmesg, if possible.
>>
>> I'll take some photos of the screen.
>
> No serial console? :(

There is on the server side but not on the client... didn't bring my
usb2serial cable.


Re: Large machine test ideas

2011-08-29 Thread Ivan Voras
On 29 August 2011 17:15, Andriy Gapon  wrote:
> on 29/08/2011 17:46 Ivan Voras said the following:
>> On 26/08/2011 19:44, Garrett Cooper wrote:
>>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras  wrote:
>>>
>>> ...
>>>
>>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>>> right?
>>>
>>> A 9.0-BETA1 snapshot, yes.
>>
>> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
>> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>>
>> (all this after the traditional "do-nothing" pause of 10 or so minutes
>> before displaying the copyright banner).
>
> Not sure if hw.memtest.tests tunable has made it into 9.0-BETA1.
> Setting it to zero should result in skipping the checks.

If it did, to what should I set it?

> You may also try to capture and share a verbose dmesg, if possible.

I'll take some photos of the screen.


Re: Large machine test ideas

2011-08-29 Thread Ivan Voras
On 29 August 2011 18:33,   wrote:
> On Mon, Aug 29, 2011 at 7:46 AM, Ivan Voras  wrote:
>> On 26/08/2011 19:44, Garrett Cooper wrote:
>>> On Fri, Aug 26, 2011 at 10:36 AM, Ivan Voras  wrote:
>>>
>>> ...
>>>
>>>> I think that I'll need a 9-CURRENT snapshot on it to run all 128 CPUs,
>>>> right?
>>>
>>> A 9.0-BETA1 snapshot, yes.
>>
>> Well, I'll leave it another half an hour, but the 9.0-BETA1 snapshot
>> froze on boot after showing "SRAT: No CPU found for memory domain 4".
>
> This message implies the memory affinity information coming from ACPI
> is either non-sensical, or you have an unexpected physical setup where
> there really are CPUs with no memory in the local sockets.
>
> You should be able to boot with something like hint.srat.0="disabled"
> at the boot loader prompt.

Unfortunately, neither the memtest nor the SRAT-disabling tunables
worked (I also tried disabling srat.4).

My time with the machine is over, so I can't do more testing.
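The consistency check behind "No CPU found for memory domain N" can be sketched roughly as follows. The structure is a hypothetical, simplified view of parsed affinity entries, not the kernel's SRAT parser. As noted later in the thread, the real code ignores the table when it finds such an inconsistency instead of hanging.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of parsed SRAT affinity entries:
 * every CPU and every memory range is tagged with a proximity domain. */
struct srat_view {
    const int *cpu_domain;   /* domain of each CPU */
    size_t ncpus;
    const int *mem_domain;   /* domain of each memory range */
    size_t nmem;
};

/* Returns the first memory domain that contains no CPU, or -1 if every
 * memory domain has at least one CPU. */
static int first_cpuless_mem_domain(const struct srat_view *v)
{
    assert(v != NULL);
    for (size_t m = 0; m < v->nmem; m++) {
        int found = 0;
        for (size_t c = 0; c < v->ncpus; c++) {
            if (v->cpu_domain[c] == v->mem_domain[m]) {
                found = 1;
                break;
            }
        }
        if (!found)
            return v->mem_domain[m];
    }
    return -1;
}
```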


9-beta1 installer - partition editor

2011-08-30 Thread Ivan Voras
Am I doing something wrong, or can the BETA1 installer not be used to 
manually create the partition scheme?


1) it doesn't accept "freebsd-swap" as partition type ("invalid argument")
2) it doesn't recognize that I have actually created a root (/) mount 
point; since it doesn't show mountpoints maybe it forgets the input from 
the dialog?


The partition editor looks very rudimentary and feature-less. It really 
should show "space left" on the drive.





Re: 9-beta1 installer - partition editor

2011-08-30 Thread Ivan Voras

On 30.8.2011. 16:36, Brandon Falk wrote:

On 8/30/2011 8:27 AM, Ivan Voras wrote:

Am I doing something wrong, or can the BETA1 installer not be used to
manually create the partition scheme?



I do not have BETA1 available right now on CD, but I do have BETA2 rev
225251. On this system I'm not able to replicate your issue (amd64). I
know I'm using a newer rev, but I've been using 9 for a long time now,
and I've yet to have an issue with the partitioner.


Ok, then it's probably fixed by now.




Re: 9-beta1 installer - partition editor

2011-08-30 Thread Ivan Voras

On 30.8.2011. 16:11, Nathan Whitehorn wrote:

On 08/30/11 07:27, Ivan Voras wrote:

Am I doing something wrong, or can the BETA1 installer not be used to
manually create the partition scheme?

1) it doesn't accept "freebsd-swap" as partition type ("invalid
argument")
2) it doesn't recognize that I have actually created a root (/) mount
point; since it doesn't show mountpoints maybe it forgets the input
from the dialog?

The partition editor looks very rudimentary and feature-less. It
really should show "space left" on the drive.


It does show mountpoints, and of course does support swap partitions.
You can use the partition editor to create quite complicated multi-disk
partition layouts over a variety of schemes, and in that way it is
wildly more featureful than what was in sysinstall.

Can you describe more what you were trying to do, in terms of what
partition scheme you were using, etc.? The "invalid argument" is a
message coming from the kernel, so something must be very wrong in your
setup.


It was a plain install on a RAID volume which appears as an ordinary da0 
drive. I did do a couple of start-overs so it could be that some state 
got lost. It definitely did NOT show mount points in the dialog which 
lists newly created partitions.


I'm sure you've looked around but just in case you missed it, here's what 
Ubuntu's text-mode installer looks like (note its partition editor):


http://www.debianadmin.com/ubuntu-lamp-server-installation-with-screenshots.html




Re: Large machine test ideas

2011-08-30 Thread Ivan Voras

On 29.8.2011. 20:15, John Baldwin wrote:


However, the SRAT code just ignores the table when it encounters an issue like
this, it doesn't hang.  Something else later in the boot must have hung.


Anyway... that machine can in its maximal configuration be populated 
with eight 10-core CPUs, i.e. 80 physical / 160 logical, so here's a 
vote from me to bump the shiny new cpuset infrastructure maximum CPU 
count to 256 before 9.0.


http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm
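Raising the cap mainly means wider fixed-size CPU masks. The sketch below shows a MAXCPU-bounded bitset using hypothetical names, not the kernel's cpuset_t macros. At 256 CPUs each mask is four 64-bit words (32 bytes), which is enough for the 160 logical CPUs of the fully populated machine.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size CPU mask bounded by a compile-time maximum. */
#define SKETCH_MAXCPU 256
#define WORD_BITS (sizeof(uint64_t) * CHAR_BIT)
#define MASK_WORDS ((SKETCH_MAXCPU + WORD_BITS - 1) / WORD_BITS)

typedef struct {
    uint64_t bits[MASK_WORDS];
} cpumask_sketch;

static void mask_set(cpumask_sketch *m, unsigned cpu)
{
    assert(cpu < SKETCH_MAXCPU);
    m->bits[cpu / WORD_BITS] |= UINT64_C(1) << (cpu % WORD_BITS);
}

static bool mask_isset(const cpumask_sketch *m, unsigned cpu)
{
    assert(cpu < SKETCH_MAXCPU);
    return (m->bits[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1u;
}
```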



Re: 9-beta1 installer - partition editor

2011-08-31 Thread Ivan Voras

On 31/08/2011 08:42, Andrey V. Elsukov wrote:

On 30.08.2011 16:27, Ivan Voras wrote:

Am I doing something wrong, or can the BETA1 installer not be used to
manually create the partition scheme?

1) it doesn't accept "freebsd-swap" as partition type ("invalid argument")


Not all partitioning schemes support the "freebsd-swap" partition type.
E.g. MBR does not support it.


This could very well be the cause of my problems! The dialog should 
definitely not include suggestions to create "freebsd-swap" partitions 
if the partitioning scheme does not support it.
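Andrey's point suggests the dialog should filter the offered types by scheme. The sketch below illustrates the idea. Its table is a hand-picked subset chosen for the example, not gpart(8)'s actual alias list, and the function is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset only: which partition types a dialog might offer
 * for each scheme. The authoritative list belongs to gpart(8). */
struct scheme_types {
    const char *scheme;
    const char *const *types;   /* NULL-terminated */
};

static const char *const mbr_types[] = { "freebsd", "fat32", "ntfs", NULL };
static const char *const gpt_types[] = { "freebsd-boot", "freebsd-ufs",
                                         "freebsd-swap", "freebsd-zfs", NULL };
static const char *const bsd_types[] = { "freebsd-ufs", "freebsd-swap",
                                         "freebsd-zfs", NULL };

static const struct scheme_types table[] = {
    { "MBR", mbr_types }, { "GPT", gpt_types }, { "BSD", bsd_types },
};

/* True if the dialog should offer 'type' for a disk using 'scheme'. */
static bool scheme_supports(const char *scheme, const char *type)
{
    assert(scheme != NULL && type != NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(table[i].scheme, scheme) != 0)
            continue;
        for (const char *const *t = table[i].types; *t != NULL; t++)
            if (strcmp(*t, type) == 0)
                return true;
        return false;
    }
    return false;   /* unknown scheme: offer nothing */
}
```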




Re: 9-beta1 installer - partition editor

2011-08-31 Thread Ivan Voras

On 31/08/2011 02:40, Nathan Whitehorn wrote:

On 08/30/11 19:07, Ivan Voras wrote:




It was a plain install on a RAID volume which appears as an ordinary da0
drive. I did do a couple of start-overs so it could be that some state
got lost. It definitely did NOT show mount points in the dialog which
lists newly created partitions.



Which partitioning scheme did you use? How did you lay out the partitions?


I did not deviate from defaults until the partition editor, where I 
deleted existing partitions (Linux) and tried to create new ones.


So, it's an MBR scheme, and I intended to create three partitions, for 
"/", for "/srv" and a swap partition. I think Andrey's idea about what 
went wrong with the swap partition is most probably correct, so this 
only leaves the inability to register mount points with the partitions.


However, if as Brandon suggested this is already fixed, don't bother. 
I'll try the BETA2 when ISOs become available and will post screenshots 
(IPMI) if it fails again.





Re: 9-beta1 installer - partition editor

2011-08-31 Thread Ivan Voras
On 31 August 2011 14:45, Nathan Whitehorn  wrote:

> It does let you set mountpoints, and displays them, and always has, but not
> for bsdlabel container partitions (MBR type "freebsd"), since they aren't
> filesystems. Is this what you were trying to do?

Very probably - it was unclear to me that it still keeps the old
slice-partition division but reverses the names. But, look at the
screenshots here and see what went wrong:

http://ivoras.imgur.com/installer__partitioner

If it is as you say, then the dialog where I entered "/" and "/srv"
should definitely NOT have that field on it.


Re: 9-beta1 installer - partition editor

2011-08-31 Thread Ivan Voras
On 31 August 2011 15:35, Nathan Whitehorn  wrote:
> On 08/31/11 08:28, Ivan Voras wrote:

>> If it is as you say, then the dialog where I entered "/" and "/srv"
>> should definitely NOT have that field on it.
>
> Well, no. It only applies to bsdlabel containers. For instance, were I to
> want to mount an ext2 or fat32 partition directly under MBR, which the
> installer can do (and create, in the case of fat32), the mountpoint field is
> very important. What we *can* do is add a check that rejects mountpoints for
> partitions of type "freebsd". I'll see if I can code that up; it's too late
> for BETA2, however.

As you probably know, nothing prevents users from creating UFS (or any
other file system) directly on the MBR partition or on the disk
itself, so in fact what I showed in the screenshots should have been a
valid operation.

I think the dialogs are confusing, especially for users not used to
the FreeBSD way of doing things. How about these *minimal* changes to
the partition editor:

1) Before the partition editor starts, show an informational dialog
box describing in short (one screen, no scrolling) that they can
choose to either use a "normal" partitioning scheme like Linux,
Windows and others and just create simple partitions or they can go
the weird BSD way and create nested partitions (i.e. disklabels under
a MBR partition).

2) Have a helpful message / line in the partition editor saying that
if a "freebsd"-type partition is created without a mountpoint
specified, the editor will allow creating a second-level bsdlabel
inside it.

I am very much trying to emphasize that any assumption that users will
know these two pieces of information before they use the installer
will only cause them to fill the mailing lists with bug reports such
as mine or worse - just silently give up.
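Nathan's proposed check can be combined with the second suggestion above, which is to tell the user why an input was rejected. The hypothetical validation routine below shows one way to do that. Its names and messages are illustrative and are not taken from the installer's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns NULL if the mountpoint is acceptable for a partition of the
 * given type, or a message explaining why it is rejected. An MBR
 * partition of type "freebsd" is a bsdlabel container, not a file system. */
static const char *check_mountpoint(const char *type, const char *mountpoint)
{
    assert(type != NULL);
    bool has_mp = mountpoint != NULL && mountpoint[0] != '\0';
    if (!has_mp)
        return NULL;                        /* nothing to validate */
    if (strcmp(type, "freebsd") == 0)
        return "\"freebsd\" partitions are bsdlabel containers; "
               "create sub-partitions and set the mountpoint on those";
    if (mountpoint[0] != '/')
        return "mountpoint must be an absolute path";
    return NULL;                            /* accepted */
}
```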


Re: Large machine test ideas

2011-09-01 Thread Ivan Voras
On 1 September 2011 16:11, Attilio Rao  wrote:

>> I mean, if we have 2 cpus in a machine, but MAXCPU is set to 256, there
>> is a bunch of "lost" memory and higher levels of lock contention?
>>
>> I thought that attilio was taking a stab at enhancing this, but at the
>> current time anything more than a value of 64 for MAXCPU is kind of a
>> "caveat emptor" area of FreeBSD.
>
> With newest current you can redefine MAXCPU in your kernel config, so
> you don't need to bump the default value.
> I think 64 as default value is good enough.
>
> Removing MAXCPU dependency from the KBI is an important project
> someone should adopt and bring to conclusion.

That's certainly one half of it and thanks for the work, but the real
question in this thread is what Sean asked: what are the negative
side-effects of simply bumping MAXCPU to 256 by default? AFAIK, there
are not that many structures which are statically sized by MAXCPU and
most use the runtime-detected smp_cpus?
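A hypothetical per-CPU record puts numbers on Sean's trade-off. An array sized by MAXCPU costs MAXCPU times the record size, even on a 2-CPU machine. An allocation sized at run time scales with the detected CPU count, the role of the runtime-detected count mentioned above. This is a userland sketch, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-CPU statistics record. */
struct pcpu_stats {
    unsigned long ticks;
    unsigned long switches;
    char pad[48];            /* stand-in for other per-CPU fields */
};

#define SKETCH_MAXCPU 256

/* Static sizing: space for SKETCH_MAXCPU entries regardless of hardware. */
static struct pcpu_stats static_stats[SKETCH_MAXCPU];

/* Dynamic sizing: allocate only for the CPUs detected at run time. */
static struct pcpu_stats *alloc_stats(unsigned ncpus)
{
    assert(ncpus > 0 && ncpus <= SKETCH_MAXCPU);
    return calloc(ncpus, sizeof(struct pcpu_stats));
}

static size_t static_bytes(void) { return sizeof(static_stats); }
static size_t dynamic_bytes(unsigned ncpus) { return ncpus * sizeof(struct pcpu_stats); }
```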


Re: 9-beta1 installer - partition editor

2011-09-12 Thread Ivan Voras
Unfortunately, I continue to have problems with the partitioner part of
the installer in the BETA2 image. See the (unchanged) problem
screenshots here:

http://ivoras.imgur.com/freebsd_installer_2

See also the screenshots of the entire process here (on BETA1):

http://ivoras.imgur.com/installer__partitioner

I am no longer trying to create a swap partition but still:

1) I cannot proceed without specifying a root partition
2) I cannot specify the root partition (the dialog ignores it).

If this doesn't get solved, it makes FreeBSD uninstallable in this case.
There may be some kind of interference between the existing MBR scheme
and the operations that the installer attempts to do.






Re: 9-beta1 installer - partition editor

2011-09-12 Thread Ivan Voras
On 12 September 2011 18:28, Nathan Whitehorn  wrote:

> This was resolved earlier -- you cannot install onto just MBR without a
> bsdlabel. This has never been supported, and worked only by accident before.
> *As it tells you* you need to create sub-partitions.

Hi,

I'll again note that it should be supported because a) there's no
technical reason not to and b) this is how every other OS works. But
I'll leave it at that, maybe the users won't mind.

But other than that, it might be that I just don't get the workflow
it's supposed to implement. Can you point out to me on these
screenshots: http://ivoras.imgur.com/freebsd_installer_2 (or on the
other set), what option on what screen (i.e. which screenshot) should
I choose to create bsdlabels?


Re: Sharing device driver between kernel and user space

2011-09-21 Thread Ivan Voras
On 21/09/2011 08:05, geoffrey levand wrote:
> I think you misunderstood what I need. If I got it right, then cuse4bsd allows
> user applications to create char devices, right?
> I do not want to create character devices from user space. My VUART kernel
> module should provide the character device for user space. What I need is a
> way to synchronize access to VUART data between kernel and user space. The
> kernel device driver should provide 2 interfaces: one for user space
> (through the char dev) and the other for kernel land. The problem is how to
> synchronize access to the VUART data between the 2 lands, because the VUART
> cannot be shared by both simultaneously.

I'm not sure I understand your question but what exactly is the problem
here? As the userland will access the device through the char device,
you need kernel code which services this device's requests. This kernel
code can use any number of synchronization operations provided by the
kernel to protect access to any and all needed resources.

In other words, you should have a single point of entry to the device in
the kernel anyway (e.g. a module, a header file, whatever) and then you
may need just a simple sx(9) lock or a sema(9) semaphore, assuming the
device access needs sleeping, or mutex(9) if it doesn't.
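The sketch below is a userland analogue of that single entry point, written with POSIX threads so it runs outside the kernel. In a driver, the same shape would use the kernel primitives named above. The ownership enum and function names are hypothetical. Every path to the VUART, whether from the kernel side or through the char device, goes through the same acquire/release pair. A user request that is waiting for its response therefore keeps the device owned, and other users sleep until it is released.

```c
#include <assert.h>
#include <pthread.h>

/* Who currently owns the (hypothetical) VUART. */
enum vuart_owner { OWNER_NONE, OWNER_KERNEL, OWNER_USER };

struct vuart_gate {
    pthread_mutex_t lock;
    pthread_cond_t  freed;
    enum vuart_owner owner;
};

static void gate_init(struct vuart_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->freed, NULL);
    g->owner = OWNER_NONE;
}

/* Sleep until the device is free, then take it for 'who'. */
static void gate_acquire(struct vuart_gate *g, enum vuart_owner who)
{
    assert(who != OWNER_NONE);
    pthread_mutex_lock(&g->lock);
    while (g->owner != OWNER_NONE)
        pthread_cond_wait(&g->freed, &g->lock);
    g->owner = who;
    pthread_mutex_unlock(&g->lock);
}

/* Give the device back and wake every waiter. */
static void gate_release(struct vuart_gate *g, enum vuart_owner who)
{
    pthread_mutex_lock(&g->lock);
    assert(g->owner == who);
    g->owner = OWNER_NONE;
    pthread_cond_broadcast(&g->freed);
    pthread_mutex_unlock(&g->lock);
}
```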






Re: Re[2]: Sharing device driver between kernel and user space

2011-09-21 Thread Ivan Voras
On 21 September 2011 16:09, geoffrey levand  wrote:
> Sure, I can use the synchronization primitives; the problem is that the
> response to a request sent to the PS3 VUART port is not available
> immediately, and I have to disallow kernel access to the PS3 VUART while I'm
> waiting for the response in user space. I send a request with the write
> syscall from user space and wait for the response with the read syscall. In
> the period between sending the request and receiving the response I could
> receive some other packets from the VUART port, e.g. some kind of event
> notification; I have to skip them. But the kernel should not interfere
> until I get my response.
> So I would need to lock out the kernel during this time. I think I found a
> good solution for this problem: just use an IOCTL which tells the kernel
> device driver to stop processing kernel requests and events, something like
> SET_USER_MODE.
> After that I can use it in user space.

Have you read sema(9)?

Or if returning EBUSY is acceptable when the resource is in use by
$whatever, maybe you just need a boolean variable.
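The "boolean variable" suggestion can be sketched with C11 atomics in userland. In a driver, the flag would instead be guarded by a mutex or changed with atomic(9) operations, for example from a handler for the SET_USER_MODE ioctl proposed above. The function names are hypothetical. Unlike the sleeping variant, a caller that finds the device busy gets EBUSY immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A single "busy" flag: whoever sets it first owns the device. */
static atomic_bool vuart_busy = false;

/* Returns 0 on success, EBUSY if someone else already owns the device. */
static int vuart_try_acquire(void)
{
    bool expected = false;
    if (!atomic_compare_exchange_strong(&vuart_busy, &expected, true))
        return EBUSY;
    return 0;
}

static void vuart_release(void)
{
    bool was_busy = atomic_exchange(&vuart_busy, false);
    assert(was_busy);   /* releasing an unowned device is a bug */
    (void)was_busy;
}
```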


Re: Does anyone use nscd?

2011-10-06 Thread Ivan Voras
On 05/10/2011 09:38, Trond Endrestøl wrote:
> On Wed, 5 Oct 2011 12:54+1030, Daniel O'Connor wrote:
> 
>> On 05/10/2011, at 2:30, Michel Talon wrote:
>>
>>> Des wrote:
>>>> Does anyone actually use nscd?
>>>
>>> I have been using it for a long time and have not experienced annoying
>>> bugs in all that time. The last time I was hit was when installing some
>>> new software which required adding a user and a group with pw. Of
>>> course this doesn't work well with caching these data, and I had
>>> completely forgotten I was using a cache. This is very perplexing.
>>
>> In my experience nscd seems to cache negative hits forever, 
>> regardless of the setting for negative-time-to-live.
> 
> I'm glad to see I'm not the only one who has noticed this odd 
> behaviour of nscd. Shame on me for not speaking up sooner, but I 
> feared I might be proved wrong (again), and yes, that's a lame excuse. 
> :-/

+1.

It's very annoying when installing ports which add users: the port adds
the user, then some later step looks it up and the lookup fails. I've
noticed it with at least CUPS.
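The check below shows what the negative-time-to-live setting is supposed to do. The entry layout and function are hypothetical, not nscd's code, and the TTLs are passed in rather than read from the daemon's configuration. With a working negative TTL, a "no such user" answer cached just before a port adds the user stops being served once neg_ttl seconds have passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry; a negative entry records "no such user". */
struct cache_entry {
    char   key[32];
    bool   negative;     /* true: the lookup failed when it was cached */
    time_t stored_at;
};

/* True if the cached answer may still be used at time 'now'. Negative
 * answers expire after neg_ttl seconds, positive ones after pos_ttl. */
static bool entry_is_fresh(const struct cache_entry *e, time_t now,
                           time_t pos_ttl, time_t neg_ttl)
{
    assert(e != NULL && now >= e->stored_at);
    time_t ttl = e->negative ? neg_ttl : pos_ttl;
    return now - e->stored_at < ttl;
}
```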





Re: mmap performance and memory use

2011-10-11 Thread Ivan Voras
On 07/10/2011 19:13, Alan Cox wrote:
> On Thu, Oct 6, 2011 at 11:01 AM, Kostik Belousov wrote:

>> For one thing, this indeed causes more memory use for the OS. This is
>> somewhat mitigated by automatic use of superpages. Superpage promotion
>> still keeps the 4KB page table around, so most savings from the
>> superpages are due to more efficient use of TLB.
>>
>>
> You are correct about the page table page.  However, a superpage mapping
> consumes a single PV entry, in place of 512 or 1024 PV entries.  This winds
> up saving about three physical pages worth of memory for every superpage
> mapping.

But wouldn't the "conservative" superpages policy make it difficult in
the OP's case to ever get promotions to superpages if he's touching pages
almost randomly?
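Alan's "about three physical pages" figure can be checked with a little arithmetic. The sizes used here are assumptions about the pmap of this era, not figures stated in the thread. On amd64, a PV entry is taken as 24 bytes (a virtual address plus a two-pointer list link) and a 2 MB superpage as 512 base pages. On i386, a PV entry is taken as 12 bytes and a 4 MB superpage as 1024 base pages.

```c
#include <assert.h>
#include <stddef.h>

/* PV-entry memory saved when one superpage mapping (a single PV entry)
 * replaces 'base_pages' 4 KB mappings (one PV entry each).
 * pv_entry_size is an assumed per-entry size in bytes. */
static size_t pv_bytes_saved(size_t base_pages, size_t pv_entry_size)
{
    assert(base_pages >= 1);
    return (base_pages - 1) * pv_entry_size;
}
```

pv_bytes_saved(512, 24) is 12264 bytes and pv_bytes_saved(1024, 12) is 12276 bytes. Each is just under three 4096-byte pages.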

> Similarly, mmap(..., MAP_PREFAULT_READ) on a large, memory resident file may 
> pre-map the file using superpage mappings. 

grep doesn't find this symbol in the sys src tree in 8-STABLE, nor, it
seems, in /usr/include.

But anyway, is there a mechanism which gives more guarantees than "may"
(i.e. which forces this) - or if not, how hard would it be to add one?
Some Linux-based "enterprise" software (including Java) use
Linux-specific calls to allocate large pages directly.
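When the system headers define the flag, requesting a prefaulted mapping is a one-line change. The sketch below assumes MAP_PREFAULT_READ may or may not be present; per the message above, it is absent from 8-STABLE. If it is missing, the code falls back to Linux's MAP_POPULATE as a swapped-in counterpart. Neither flag guarantees superpages, which matches the "may" in the quoted text. The helper name is hypothetical.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_POPULATE on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and, where the system offers a flag for it,
 * ask the kernel to pre-populate the mapping instead of faulting pages in
 * one by one. Returns NULL on failure or for an empty file. */
static void *map_file_prefaulted(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    int flags = MAP_SHARED;
#if defined(MAP_PREFAULT_READ)
    flags |= MAP_PREFAULT_READ;  /* FreeBSD: may pre-map, possibly with superpages */
#elif defined(MAP_POPULATE)
    flags |= MAP_POPULATE;       /* Linux counterpart, swapped in for portability */
#endif

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, flags, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```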






Re: Measuring memory footprint in C/C++ code on FreeBSD

2011-10-21 Thread Ivan Voras
On 21/10/2011 12:19, Razmig K wrote:
> Le 21.10.2011 10:44, Peter Jeremy a écrit :
>> On 2011-Oct-20 19:57:31 +0200, Razmig K  wrote:
>> It's not clear whether the program is attempting to determine its
>> own (or a child's) memory footprint, or that of an arbitrary process.
>> In the former case, getrusage() is the obvious choice.  This is a
>> portable interface.
> The program has to determine its own memory footprint. It has no children.
> 
>>
>> If you want to examine arbitrary processes, the best interface on
>> FreeBSD would be kvm_getprocs(3).
>>
>> BTW, since you mention heap objects, I presume you are aware that
>> malloc() uses mmap(), rather than sbrk() to obtain memory.
> No I wasn't aware of that.
> 
> In few words, the program needs to obtain and report information
> reported by the SIZE column of top, since it is going to be run many
> times, and it is impractical to watch top for this purpose.

Well, do you know that SIZE in top is virtual memory size, not resident
size (which is the "RES" column)? You can allocate whatever you want
from virtual memory, it is not "used" until it's touched.






Re: Measuring memory footprint in C/C++ code on FreeBSD

2011-10-21 Thread Ivan Voras
On 21/10/2011 12:57, Razmig K wrote:
> Le 21.10.2011 12:26, Ivan Voras a écrit :
>> Well, do you know that SIZE in top is virtual memory size, not resident
>> size (which is the "RES" column)? You can allocate whatever you want
>> from virtual memory, it is not "used" until it's touched.
> 
> Yes, I do. So do you suggest using RES as a better indicator of memory
> footprint?

Almost certainly yes. Measuring virtual memory is significantly less
important for real-world loads. Some of this is very nicely described
here: https://www.varnish-cache.org/trac/wiki/ArchitectNotes .

> The program in question processes large 3D images via vtk, and I'd like
> to measure its memory usage with different parameter configurations as
> the maximum amount of memory acquired during execution. Since SIZE often
> happens to be larger than RES, and increases more during execution, I
> thought of using it as an indicator of memory footprint.

No; the difference between SIZE and RES is "slack space" - allocated but
untouched virtual memory, which is *NOT PRESENT IN RAM*. You can verify
this yourself: make a small C program and allocate twice the physical
memory (+swap) you have on the machine (try terabytes on a 64-bit
machine), and it will succeed. If you look at this program in top, it
should (barring some optimizations) show you that SIZE is huge, but RES
is a couple of MB, basically like you didn't allocate anything at all.
Now, it is a whole other thing if you try to actually *use* this memory
you've allocated.
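The experiment can be sketched in C. This version is scaled down to 64 MB so it is safe to run anywhere; the one described above reserves more than RAM plus swap. It uses mmap(2) rather than malloc(3) so the reservation is clearly untouched. It uses getrusage(2)'s ru_maxrss, the portable interface suggested earlier in the thread, as a stand-in for watching RES in top. ru_maxrss is the peak resident size, in kilobytes on FreeBSD and Linux, which also corresponds to the "maximum amount of memory acquired during execution". The function names are hypothetical.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Peak resident set size of this process (kilobytes on FreeBSD/Linux). */
static long peak_rss_kb(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Reserve address space without touching it: SIZE grows, RES does not. */
static unsigned char *reserve_untouched(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte per page so that every page becomes resident. */
static void touch_all(unsigned char *p, size_t len)
{
    assert(p != NULL);
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    volatile unsigned char *v = p;
    for (size_t off = 0; off < len; off += (size_t)page)
        v[off] = 1;
}
```

Reserving the region leaves the peak RSS essentially unchanged. Only touch_all() makes it grow, by roughly the size of the region.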

Here's a random link on the topic from Google:
http://opsmonkey.blogspot.com/2007/01/linux-memory-overcommit.html .

Unfortunately, the phrase "memory overcommit" has been hijacked by the
virtualization world to mean a similar thing, but applied to the
memory given to virtual machines.






sleep(3) hangs?

2011-11-04 Thread Ivan Voras
I have an "interesting" problem which is why I'm posting to the hackers@
list :)

The situation is: an 8-STABLE amd64 system from a few months ago running
on VMware ESXi 5, which worked fine until today. Today, it looks like
anything which "sleeps" for whatever reason (including select(2))
simply hangs without returning. Commands like "iostat 1" and "sleep 1" hang.

The timecounter was autodetected as HPET, and hz was autoconfigured to 100.

When I changed the timecounter to ACPI-safe, the situation somewhat
normalized, but each second the machine sees (from "sleep 1") takes
around 3-4 wall-clock seconds.

I'm running the system as-is until a new kernel from today's sources is
built (in the hope that something will fix it). If anyone wants me to
experiment with the machine, tell me...
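The "3-4 wall-clock seconds per second" symptom can be quantified from inside the guest by comparing the requested sleep time with the measured one. The minimal sketch below uses nanosleep(2) and clock_gettime(CLOCK_MONOTONIC); the helper name is hypothetical. It only detects oversleeping relative to the system's own clock. If the timecounter itself runs slow, both sides of the comparison are wrong, and an external reference such as another host's clock is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for 'ms' milliseconds and return the elapsed time actually
 * observed on CLOCK_MONOTONIC, in milliseconds. */
static double timed_sleep_ms(long ms)
{
    assert(ms >= 0);
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec rem, t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (nanosleep(&req, &rem) != 0 && errno == EINTR)
        req = rem;                      /* resume after a signal */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it with 1000 ms and comparing against a stopwatch can help separate two cases. If sleeps overshoot, the measured value is well above 1000. If the clock itself is slow, the measured value stays near 1000 while real time is 3-4 s.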






Re: The zombie has involved into /dev/null

2011-11-16 Thread Ivan Voras
So, if I understand you correctly, you are reporting a bug in which a
jailed process is holding (the jailed instance of) /dev/null open and
"umount -f" doesn't work on the jailed /dev?

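A zombie is only an exit status waiting to be collected. It disappears when its parent calls one of the wait*() functions, or when the parent exits and init reaps it. Signals sent to the zombie itself, SIGCHLD included, have no effect. If the zombie's parent is itself stuck, for example in the uninterruptible D state that ps shows for one of the php processes below, it never gets to make that call. The minimal sketch below shows the two usual reaping patterns; the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Blocking: reap one specific child, retrying if a signal interrupts.
 * Stores the exit code (or -1 if the child did not exit normally). */
static int reap_child(pid_t pid, int *exit_code)
{
    assert(pid > 0);
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r != pid)
        return -1;
    if (exit_code != NULL)
        *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 0;
}

/* Non-blocking sweep, e.g. from a SIGCHLD handler: reap every child that
 * has already exited and return how many were collected. */
static int reap_exited_children(void)
{
    int n = 0;
    int saved = errno;          /* keep errno intact for signal-handler use */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        n++;
    errno = saved;
    return n;
}
```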

On 14/11/2011 23:52, Slono Slono wrote:
> On one of our servers, where cacti is installed in a jail, there is a rather
> strange situation. Sometimes poller.php processes do not have time to
> complete successfully before the next session starts (the absence of a lock
> is another problem, that's OK), so the processes keep piling up until they
> are killed. At such moments zombie processes appear. However, these zombies
> appear to keep the devfs device busy. Possibly because they are started as
> 
> php /poller.php 2>/dev/null 2>&1
> 
> Sending any signal (SIGCHLD too) changes nothing. Strangely, the devfs that
> the zombie was using cannot be unmounted, even with umount -f (force).
> 
> ps axf  shows:
> ..
> 
> 99551  ??  DsJ  0:00.12 /usr/local/bin/php /usr/local/share/cacti/poller.php
> 99554  ??  ZJ   0:00.02
> 
> lsof -p 99551
> COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFFNODE NAME
> php 99551 root  cwd   VBAD (revoked)
> php 99551 root  rtd   VDIR 225,10355344423  678909 
> /usr/jails/jails/mon
> php 99551 root  jld   VDIR 225,10355344423  678909 
> /usr/jails/jails/mon
> php 99551 root  txt   VREG 225,1035534442  3261754 1620922 
> /usr/jails/jails-data/mon-data/usr/local/bin/php
> php 99551 root  txt   VREG 225,1035534442   246776  626780 
> /usr/jails/jails-data/mon-data/libexec/ld-elf.so.1
> php 99551 root  txt   VREG 225,103553444233600  626862 
> /usr/jails/jails-data/mon-data/lib/libcrypt.so.5
> php 99551 root  txt   VREG 225,1035534442   377814 1267501 
> /usr/jails/jails-data/mon-data/usr/local/lib/libpcre.so.0
> php 99551 root  txt   VREG 225,1035534442   150656  626861 
> /usr/jails/jails-data/mon-data/lib/libm.so.5
> php 99551 root  txt   VREG 225,1035534442  1495740  649173 
> /usr/jails/jails-data/mon-data/usr/local/lib/libxml2.so.5
> php 99551 root  txt   VREG 225,103553444284848  626828 
> /usr/jails/jails-data/mon-data/lib/libz.so.5
> php 99551 root  txt   VREG 225,1035534442  1074175  649584 
> /usr/jails/jails-data/mon-data/usr/local/lib/libiconv.so.3
> php 99551 root  txt   VREG 225,1035534442  1270640  626857 
> /usr/jails/jails-data/mon-data/lib/libc.so.7
> php 99551 root  txt   VREG 225,103553444274189  636259 
> /usr/jails/jails-data/mon-data/usr/local/lib/php/20090626/session.so
> php 99551 root  txt   VREG 225,103553444263195  637380 
> /usr/jails/jails-data/mon-data/usr/local/lib/php/20090626/xml.so
> php 99551 root  txt   VREG 225,1035534442    40650  638507
> /usr/jails/jails-data/mon-data/usr/local/lib/php/20090626/snmp.so
> php 99551 root  txt   VREG 225,1035534442   337128  665903 
> /usr/jails/jails-data/mon-data/usr/lib/libssl.so.6
> php 99551 root  txt   VREG 225,1035534442   730269 8050234 
> /usr/jails/jails-data/mon-data/usr/local/lib/libnetsnmp.so.30
> php 99551 root  txt   VREG 225,1035534442    35264  626850
> /usr/jails/jails-data/mon-data/lib/libkvm.so.5
> php 99551 root  txt   VREG 225,1035534442    19720  626858
> /usr/jails/jails-data/mon-data/lib/libdevstat.so.7
> php 99551 root  txt   VREG 225,1035534442  1693344  626824 
> /usr/jails/jails-data/mon-data/lib/libcrypto.so.6
> php 99551 root  txt   VREG 225,1035534442   105904  666224 
> /usr/jails/jails-data/mon-data/usr/lib/libelf.so.1
> php 99551 root  txt   VREG 225,1035534442    61034  635955
> /usr/jails/jails-data/mon-data/usr/local/lib/php/20090626/mysql.so
> php 99551 root  txt   VREG 225,1035534442    54114  637132
> /usr/jails/jails-data/mon-data/usr/local/lib/php/20090626/sockets.so
> php 99551 root   0u   PIPE 0xfe07514ab5b0    16384
> ->0xfe07514ab708
> php 99551 root   1w   VCHR   0,27  0t0  27
> /usr/jails/jails/mon/dev (devfs) (like character special /dev/null)
> php 99551 root   2w   VCHR   0,27  0t0  27
> /usr/jails/jails/mon/dev (devfs) (like character special /dev/null)
> php 99551 root   3u   unix 0xfe074ad832a8  0t0 ->(none)
> php 99551 root   5u   PIPE 0xfe043c62fcb8        0
> ->0xfe043c62fb60
> 
> mount -t devfs |grep mon
> devfs on /usr/jails/jails/mon/dev (devfs, local, multilabel)
> 
> umount -f /usr/jails/jails/mon/dev
> umount: unmount of /usr/jails/jails/mon/dev failed: Device busy
> 
> However, devfs apparently does get unmounted when the jail is stopped:
> 
> ls -la /usr/jails/jails/mon/dev
> total 5
> drwxr-xr-x  2 root  wheel  2 Nov 14 22:36 .
> drwxr-xr-x  3 root  wheel  3 Nov 14 22:36 ..
> 
> How can it be that a zombie blocks devfs, or that the system still holds
> information on an active mou

sem(4) lockup in python?

2012-01-11 Thread Ivan Voras
The lang/python27 port can optionally be built with the support for 
POSIX semaphores - i.e. sem(4). This option is labeled as experimental 
so it may be that the code is simply incorrect. I've tried it and get 
frequent hangs with the python process in the "usem" state. The kernel 
stack is as follows and looks reasonable:


# procstat -kk 19008
   PID    TID COMM             TDNAME           KSTACK

19008 101605 python           -                mi_switch+0x174
sleepq_catch_signals+0x2f4 sleepq_wait_sig+0x16 _sleep+0x269
do_sem_wait+0xa19 __umtx_op_sem_wait+0x51 amd64_syscall+0x450
Xfast_syscall+0xf7


The process doesn't react to SIGINT or SIGTERM but fortunately reacts to 
SIGKILL.


This could be an error in Python code but OTOH this code is not 
FreeBSD-specific so it's unlikely.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: sem(4) lockup in python?

2012-01-11 Thread Ivan Voras
On 11 January 2012 14:06, John Baldwin  wrote:
> On Wednesday, January 11, 2012 6:21:18 am Ivan Voras wrote:
>> The lang/python27 port can optionally be built with the support for
>> POSIX semaphores - i.e. sem(4). This option is labeled as experimental
>> so it may be that the code is simply incorrect. I've tried it and get
>> frequent hangs with the python process in the "usem" state. The kernel
>> stack is as follows and looks reasonable:
>>
>> # procstat -kk 19008
>>    PID    TID COMM             TDNAME           KSTACK
>>
>> 19008 101605 python           -                mi_switch+0x174
>> sleepq_catch_signals+0x2f4 sleepq_wait_sig+0x16 _sleep+0x269
>> do_sem_wait+0xa19 __umtx_op_sem_wait+0x51 amd64_syscall+0x450
>> Xfast_syscall+0xf7
>>
>> The process doesn't react to SIGINT or SIGTERM but fortunately reacts to
>> SIGKILL.
>>
>> This could be an error in Python code but OTOH this code is not
>> FreeBSD-specific so it's unlikely.
>
> This is using the new umtx-based semaphore code that David Xu wrote.  He is
> probably the best person to ask (cc'd).
>

OK, I've encountered the problem repeatedly while building databases/tdb:
it uses Python in its build process (but maybe something else needs to
run in parallel to provoke the problem).


Re: sem(4) lockup in python?

2012-01-11 Thread Ivan Voras
On 11 January 2012 17:47, Garrett Cooper  wrote:

> when doing interactive builds as well. The issue appears to be
> exacerbated when we have more builds running in parallel on the same
> machine. I've also run into the same issue compiling talloc because it
> uses the same waf infrastructure as tdb, which was designed to "speed
> things up by forcing builds to be parallelized" (It builds
> kern.smp.ncpus jobs instead of -j 1). Furthermore, it seems to occur
> regardless of whether or not we have the WITH_SEM enabled in python or
> not

... which is interesting as I've habitually enabled WITH_SEM on 8.x
systems and there it worked without problems. Must be the new code.


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-17 Thread Ivan Voras

(answering out of order)

On 16/01/2012 23:28, John Kozubik wrote:


> 2) Having two simultaneous production releases draws focus away from
> both of them, and keeps any release from ever truly maturing.


This isn't how things work. -CURRENT has always had (and probably always
will have) the focus of developers. For many people the "releases" are
simply a periodic annoyance due to freezes. Reducing the number of
"production releases" will in no way change this. Since this is a
volunteer effort, backports to stable branches only happen when 1) it's
in the interest of the developer, e.g. "I've found a bug on my systems,
want to get it fixed in -STABLE", 2) the developer is nudged by outside
forces (users complaining, other developers requesting it), or 3) they
think it's worth doing and have time to do it spontaneously. These are
listed in order of likelihood.


You could ask why it is so, but I think you know the answer to that: a
small project, not enough manpower and volunteer-hours. However, the
situation is actually quite good because the developers are usually
very responsive to MFC requests...



> going to
> run RELEASE software ONLY



> 4) New code and fixes that people NEED TODAY just sits on the shelf for
> 8 or 10 or (nowadays) 13 months while end users wait for new minor
> releases.


... except if you expect regular releases :)

I've concluded very early that because of what I've said above, the only 
way to run FreeBSD effectively is to track -STABLE. The developers 
MFC-ing stuff usually try hard not to break things so -STABLE has become 
a sort of "running RELEASE" branch. Since -STABLE is so ... stable ..., 
there is less and less incentive to make proper releases (though I think 
nobody would mind it happening).


The next question is: what do releases from a -STABLE branch bring in 
that simply tracking the original -STABLE tree doesn't? Lately, not very 
much. Since a huge number of users and developers track -STABLE, the
testing done specifically for an X.Y (Y>0) RELEASE is not very
extensive, so you might just as well have tracked -STABLE.


I'm sure you know how easy it is to upgrade a STABLE-running system, and
there are simple ways to make that scale to thousands of machines.
Breakage is of course a risk, but not a significantly greater one than
with any other kind of upgrade.



> of 2012, we should be on 6.12 (or so) and just breaking ground on 7.0.
> Instead, each release gets perhaps two years of focused development
> before every new fix is applied only to the upcoming release, and any
> kind of support or enthusiasm from the community just disappears.


If you're saying that -STABLE branches don't get fixes fast enough, I'd 
disagree.



> A few years ago we were dying for new em(4) and twa(4) drivers in
> FreeBSD 6, but they were applied only to 7 and beyond, since that was
> the "new production" release (as opposed to the "old production"
> release). It's the same bad choice again: make major investments in
> testing and people and processes every two years, or just limp along
> with old, buggy drivers and no support.


Have you tried contacting the developers of those drivers? The most
likely reasons the drivers weren't MFC-ed are either that they were
experimental enough that there were fears for their stability, or that
the developers didn't think anyone needed them MFC-ed.


The situation you describe is certainly not FreeBSD-specific: Debian is
notoriously slow in adopting new features, and so is Red Hat Enterprise
Linux, which kept the ancient (2006 vintage) 2.6.18 kernel throughout
its 5.x cycle (still active in 2011), though updated with new drivers.
Compared to these, FreeBSD is in many ways a pleasure to work with.


Seriously, just think of -STABLE as a "rolling release", just like the 
ports tree.




Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-17 Thread Ivan Voras

On 17/01/2012 07:20, John Kozubik wrote:


> as wonderful as ZFS on FreeBSD is (and we are
> deploying it this year) it is only now (well, in March) with 8.3 that I
> feel it is finally safe and stable enough to bet the farm on. I'm not
> the only one that feels this way.


I must remember to ask you about your experiences with ZFS about a
year from now :)





Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-17 Thread Ivan Voras
On 17 January 2012 13:02, Tom Evans  wrote:
> On Tue, Jan 17, 2012 at 11:41 AM, Ivan Voras  wrote:
>> I've concluded very early that because of what I've said above, the only way
>> to run FreeBSD effectively is to track -STABLE. The developers MFC-ing stuff
>> usually try hard not to break things so -STABLE has become a sort of
>> "running RELEASE" branch. Since -STABLE is so ... stable ..., there is less
>> and less incentive to make proper releases (though I think nobody would mind
>> it happening).
>>
>> The next question is: what do releases from a -STABLE branch bring in that
>> simply tracking the original -STABLE tree doesn't? Lately, not very much.
>
> Sorry to just pick out bits of your email Ivan…
>
> Ability to use freebsd-update. It would be better to have more
> frequent releases. As a prime example, ZFS became much more stable
> about 3 months after 8.2 was released. If you were waiting for an 8.x
> release that supported that improved version of ZFS, you are still
> waiting.

You know, that's an excellent point! And maybe an excellent idea: to
provide occasional, time-based STABLE snapshots for freebsd-update.

> You say that snapshots of STABLE are stable and effectively a running
> release branch, so why can't more releases be made?

Nobody volunteered :(

> Is the release process too complex for minor revisions, could that be
> improved to make it easier to have more releases, eg by not bundling
> ports packages?

Almost certainly yes. The current release process involves src, ports
and docs teams. Would you and other RELEASE users be happy with simple
periodic snapshots off the STABLE branches, not much different from
tracking STABLE? The only benefit I see would be a light-weight
opportunity for testing which would probably end up being implemented
by moving to date-based tags (e.g. if a critical bug is found and the
fix MFC-ed, the "current" tag would be advanced to "$today")?

> Can it really be that the best advice for users is to run their own
> build infrastructure and make their own releases?

Maybe. I'm trying to suggest that (I may be wrong, of course) the effort
required to upgrade from one RELEASE to the next is comparable to the
effort of just having a -STABLE build machine somewhere and doing
"make installkernel, make installworld, mergemaster -FU" over NFS on
1000 machines. If you are serious about testing, you would need to test
the RELEASEs too.

> I really don't want to come across as someone throwing their toys out
> and saying that unless everything changes I'm off to Linux-land,
> however there is mutterings at $JOB that too much time is spent
> massaging FreeBSD and that using Linux would be significantly easier
> to manage.

Personally, I actually like apt-get and the way it handles installs,
updates, dependencies, suggestions, etc. I dislike almost everything
else, though.

But now I'm curious: how do you (and others) update from one RELEASE
to the other? Just by using freebsd-update? What do you think prevents
you from using -STABLE?

