Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Maxim Sobolev

Dag-Erling Smørgrav wrote:

Andrey Chernov <[EMAIL PROTECTED]> writes:
"BSD sort" as an idea will be a good project indeed, but "BSD sort" 
implementation we currently have at hand is totally misleading and should 
be rewritten from scratch. I realized this a long time ago when I tried to 
localize it for single-byte locales.


I think part of the problem is that there aren't enough people who truly
understand localization.  I think I understand most of it, but I'm
pretty sure I *don't* understand how collation works, or is supposed to
work.  Amongst other things, I don't understand how (or whether) it
handles cases like "aa" and "å", which are considered the same letter in
Norwegian.

Perhaps you could create a Localization page on wiki.freebsd.org which
addresses these issues, or at least points to relevant resources?


A good regression test suite that includes cases in different single- 
and multi-byte locales for grep, sort, etc. could also be a big help.
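
A minimal sketch of what one case in such a suite could look like, assuming only the standard setlocale(3) and strcoll(3) calls. The helper name collate_in_locale() is hypothetical, and the -2 "skip" convention is an assumption made here so that a missing locale is not reported as a failure.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/*
 * Compare a and b under the collation rules of the named locale.
 * Returns -1, 0 or 1 (a normalized strcoll() result), or -2 if the
 * locale cannot be selected, so the caller can skip the case.
 */
int
collate_in_locale(const char *locale, const char *a, const char *b)
{
	int r;

	assert(locale != NULL && a != NULL && b != NULL);
	if (setlocale(LC_COLLATE, locale) == NULL)
		return (-2);
	r = strcoll(a, b);
	setlocale(LC_COLLATE, "C");	/* go back to a known state */
	return ((r > 0) - (r < 0));
}
```

A test table would then list (locale, a, b, expected) tuples, for example the Norwegian "aa" versus "å" case mentioned above, and compare each expected value with the result.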


Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: sun4v arch

2008-08-23 Thread Maxim Sobolev

Peter Jeremy wrote:

Is there a summary of the open issues somewhere?  There are no sun4v
PRs open.  http://wiki.freebsd.org/FreeBSD/sun4v effectively hasn't
been touched since November 2006 and suggests that the only critical
issue is lack of serial port support.


There is a better interpretation, which is that the only critical issue 
is lack of real users for this port, not lack of serial port support :).


-Maxim


Re: sun4v arch

2008-08-25 Thread Maxim Sobolev

Maxim Sobolev wrote:

Peter Jeremy wrote:

Is there a summary of the open issues somewhere?  There are no sun4v
PRs open.  http://wiki.freebsd.org/FreeBSD/sun4v effectively hasn't
been touched since November 2006 and suggests that the only critical
issue is lack of serial port support.


There is a better interpretation, which is that the only critical issue 
is lack of real users for this port, not lack of serial port support :).


Just to clarify a bit: my point was not to suggest that the port is 
irrelevant, or that FreeBSD should not go there. On the contrary, from 
what I know, sun4v is a good testbed today for the future of 
multi-processor architectures; we will definitely see an ever-increasing 
number of cores in commodity Intel/AMD servers a few years from now. In 
that sense the sun4v work is very important if the FreeBSD project wants 
to stay ahead of things rather than catch up later.


However, realistically, the immaturity of the port and the scarcity of 
hardware severely limit the number of users. Therefore, the absence of 
PRs should not surprise anyone.


-Maxim


Enhancing cdboot [patch for review]

2008-12-08 Thread Maxim Sobolev

Hi,

Please find below a patch that enhances cdboot with two compile-time options:

1. CDBOOT_SILENT. When this option is set, cdboot doesn't produce
any messages except "Loading, please wait..." and it also passes the
RBX_MUTE flag to the next stage to silence it as well. This is intended
for custom installations where the end user does not need to see any
messages or interfere with the boot process.

2. CDBOOT_PROMPT. When this option is enabled, cdboot behaves like the
Windows XP or Vista CD loader: it reads the MBR from the first hard
drive in the system and, if the MBR is bootable (i.e. the drive has some
other operating system installed on it), presents the user with a
"Press any key to boot from CD" prompt and waits 20 seconds. If no key
is pressed, control is passed to the MBR; otherwise the CD is
booted. This is intended for installation CDs, to allow unattended mode,
and it also helps when an installation CD is unintentionally left in the
drive of a machine that is set to boot off CD.
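
The CDBOOT_PROMPT decision can be sketched in C as follows. This is an illustration only: the real cdboot is assembly, and the test used here for "bootable" (the 0x55 0xAA signature at offset 510 of the sector) is an assumption, not something taken from the patch. The names mbr_is_bootable() and choose_boot_target() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MBR_MAGIC_OFF	510	/* offset of the 0x55 0xAA signature */

enum boot_target { BOOT_CD, BOOT_MBR };

/* Assumption: a sector is "bootable" if it carries the boot signature. */
int
mbr_is_bootable(const uint8_t *sector)
{
	assert(sector != NULL);
	return (sector[MBR_MAGIC_OFF] == 0x55 &&
	    sector[MBR_MAGIC_OFF + 1] == 0xaa);
}

/*
 * key_pressed is nonzero if the user hit a key before the prompt
 * timed out.  With no bootable MBR there is nothing to fall back to,
 * so the CD is booted without asking.
 */
enum boot_target
choose_boot_target(const uint8_t *sector, int key_pressed)
{
	if (!mbr_is_bootable(sector))
		return (BOOT_CD);
	return (key_pressed ? BOOT_CD : BOOT_MBR);
}
```

The unattended case is the one where the MBR is bootable and no key is pressed: the machine simply continues to the installed system.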

Any comments/suggestions are appreciated. If there are no objections I
would like to commit the change. The long-term goal is to make
CDBOOT_PROMPT the default mode for installation CDs.

http://sobomax.sippysoft.com/~sobomax/cdboot.diff

-Maxim



Re: Enhancing cdboot [patch for review]

2008-12-08 Thread Maxim Sobolev

Luigi Rizzo wrote:

On Mon, Dec 08, 2008 at 02:40:41PM -0800, Maxim Sobolev wrote:

Hi,

Below please find patch that enhances cdboot with two compile-time options:

...

Any comments/suggestions are appreciated. If there are no objections I
would like to commit the change. The long-term goal is to make
CDBOOT_PROMPT default mode for installation CD.

http://sobomax.sippysoft.com/~sobomax/cdboot.diff


Looks good. Some comments:


Thank you for the review and comments. Please see my answers below.


1. since there is plenty of space in the cdboot sector, why don't you
   make the two options always compiled in, controlling which one to
   activate through flags in the bootsector itself, to be set by
   patching the binary sector itself using a mechanism similar to
   boot0cfg.
   Of course you cannot alter a cdrom after you burn it,
   but it makes it easier to build CDs with one or the other defaults,
   patching cdboot or the iso image itself before creating/burning it.

2. in fact, the 'silent' option could be disabled at runtime by
   pressing some key (e.g. adding a short wait loop before proceeding;
   if this is meant for custom, unattended CDs the extra delay should not
   matter much);


Good idea, I will see if I can put that in. In fact this behavior would 
have to be optional as well, since one of the uses for the "silent" 
option here is to provide a tamper-resistant boot process on custom 
hardware.



3. one nitpick -- in one of the first chunks you replace $start
   with $LOAD, but if i am not mistaken operation depends on $LOAD = $start,
   so why don't you always use the same ?


No, they are not the same. $LOAD is the address where the BIOS loads the 
boot sector, which is 0x7c00 by default (you can configure it when 
building the CD-ROM, which is why it's an option). On the other hand, 
$start is the address the code is compiled to run at, which is 0x0600.



   Also in terms of relocation size, wouldn't it be a case for
   hardwiring the size of the cd boot sector:

-   mov $((end_init - start)/2),%cx
+   mov $1024,%cx


Well, I don't see a reason to hardwire this. The CD-ROM boot code can be 
of different sizes (within certain limits of course, selected when 
building the ISO); it's not limited to a fixed number of sectors like 
boot[012].



4. another nitpick -- the value you pass in %si to the MBR does not
   seem to point to anything useful. As discussed about boot0.S and
   the followup in the mailing lists, there seems to be no standard
   but at least some MBR expect %si to point to a partition entry,
   so you should probably initialize one in a way similar to that
   used by boot0.S


Hmm, maybe I misunderstood it then. What exactly do you mean by "point 
to a partition entry"? Right now it points to the beginning of the MBR.


-Maxim


Re: Enhancing cdboot [patch for review]

2008-12-08 Thread Maxim Sobolev

Luigi Rizzo wrote:

On Mon, Dec 08, 2008 at 04:29:04PM -0800, Maxim Sobolev wrote:

Luigi Rizzo wrote:

...

4. another nitpick -- the value you pass in %si to the MBR does not
  seem to point to anything useful. As discussed about boot0.S and
  the followup in the mailing lists, there seems to be no standard
  but at least some MBR expect %si to point to a partition entry,
  so you should probably initialize one in a way similar to that
  used by boot0.S

Hmm, maybe I misunderstood it then. What exactly do you mean by "point 
to a partition entry"? Right now it points to the beginning of the MBR.


ok, so here is what I know.

Even though there is no standard, at least ldlinux.sys and perhaps
other bootloaders expect %si to point to a 16-byte record containing
the partition descriptor (same structure as one of the 4 records
at 0x1be in the MBR) for the partition they were loaded from.

ldlinux.sys uses this info to "relocate": it knows the location of the
other sectors of ldlinux.sys relative to the beginning of the partition,
and uses the start-of-partition from the record at %si to compute
these locations in terms of absolute disk positions.

Note that in principle a MBR does not need this info -- even if it
is a multi-sector boot code such as boot0ext, it may well assume to
be located at offset 0.

On the other hand if the code on the MBR uses %si, then you should
set the entry so that at least the starting CHS and LBA info point
to the first sector on disk, i.e. CHS=0,0,1 and LBA=0.

In practical terms -- make %si point to a 16-byte area of memory
containing all 0's except for the byte representing the sector
number for the start of the partition.
See the code in a recent sys/boot/i386/boot0/boot0.S which gives
some details on this.
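
The record described above can be sketched in C. The byte offsets follow the classic 16-byte MBR partition entry layout (byte 2 holds the starting sector, bytes 8..11 the little-endian starting LBA); the function names are hypothetical, and this shows only the memory layout %si would point at, not how boot0.S builds it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	PART_ENTRY_SIZE	16
#define	PART_START_SECT	2	/* byte 2: starting sector, 1-based in CHS */

/*
 * Build a partition record meaning "starts at the first sector of the
 * disk": every byte zero (head 0, cylinder 0, LBA 0) except the
 * starting sector number, which is 1 because CHS sectors count from 1.
 */
void
make_whole_disk_entry(uint8_t entry[PART_ENTRY_SIZE])
{
	assert(entry != NULL);
	memset(entry, 0, PART_ENTRY_SIZE);
	entry[PART_START_SECT] = 1;
}

/* Decode the little-endian starting LBA stored in bytes 8..11. */
uint32_t
entry_start_lba(const uint8_t entry[PART_ENTRY_SIZE])
{
	assert(entry != NULL);
	return ((uint32_t)entry[8] | (uint32_t)entry[9] << 8 |
	    (uint32_t)entry[10] << 16 | (uint32_t)entry[11] << 24);
}
```

A loader that relocates itself relative to the partition start, as described for ldlinux.sys, would read LBA 0 from such a record and so treat its offsets as absolute disk positions.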


I see, thank you for the explanation. It looks like it only makes sense 
for multi-stage boot loaders, where a stage has been loaded from some 
location within the disk and needs some clue to determine where it 
has come from. In this case we simply emulate the BIOS loading the MBR, 
and from what I've read here the MBR code should make no assumptions 
with regard to %si, so I would just set it to zero. Do you think that 
could create any issues?


-Maxim


Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-30 Thread Maxim Sobolev

Scott Long wrote:
I've been talking about this for years.  All I need is help with the VM 
magic to create the page on fork.  I also want two pages, one global
for gettimeofday (and any other global data we can think of) and one
per-process for static data like getpid/getgid.


I believe somebody suggested that no real VM magic is needed and that 
libc should be in charge of opening a special pseudo-device and doing 
the necessary mmap(2) magic to get the page mapped in when the user 
calls gettimeofday()/getpid()/getgid() etc. for the first time.
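
A rough sketch of that libc-side approach, under loudly stated assumptions: the device path /dev/timepage and the struct timepage layout are hypothetical (no such device exists), and the sketch ignores the question of reading the shared timeval consistently while the kernel updates it. When the device cannot be opened, the ordinary syscall is used.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical device and page layout, for illustration only. */
#define	TIMEPAGE_DEV	"/dev/timepage"
struct timepage {
	struct timeval	tp_now;	/* assumed to be kept current by the kernel */
};

static const volatile struct timepage *timepage;	/* NULL until mapped */
static int timepage_tried;

/* Try once to map the shared page; leave timepage NULL on failure. */
static void
timepage_init(void)
{
	void *p;
	int fd;

	timepage_tried = 1;
	if ((fd = open(TIMEPAGE_DEV, O_RDONLY)) == -1)
		return;
	p = mmap(NULL, sizeof(struct timepage), PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping outlives the descriptor */
	if (p != MAP_FAILED)
		timepage = p;
}

int
fast_gettimeofday(struct timeval *tv)
{
	assert(tv != NULL);
	if (!timepage_tried)
		timepage_init();	/* first call only */
	if (timepage == NULL)
		return (gettimeofday(tv, NULL));	/* normal syscall */
	tv->tv_sec = timepage->tp_now.tv_sec;
	tv->tv_usec = timepage->tp_now.tv_usec;
	return (0);
}
```

The cost of open/mmap is paid only by processes that actually call the function, which is the point of doing it lazily in libc rather than at fork or exec time.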


-Maxim


Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-30 Thread Maxim Sobolev

Peter Jeremy wrote:

On 2009-Mar-29 08:35:45 +0800, David Xu  wrote:

Julian Elischer wrote:

interestingly it is even feasible to have a per-thread page..
it requires that the scheduler change a page table entry though.

I will knock on his door at midnight if he adds such a heavyweight
task to the scheduler; TLB shootdown is horrible, and big code size
squeezing data out of the CPU cache is not an ideal model.
The scheduler should be as simple as just a context switching routine.


If the TSC is not consistent between all cores (which is probably
the most common situation at present), then using the TSC implies
knowing which core you are executing on.  From a userland perspective,
the easiest way to do this is to have a page of data that varies
depending on which core you are executing on.


It's not that easy, unless you can pin the thread to a specific core 
before reading that page. I.e. imagine the case where your thread reads 
the per-CPU page, gets preempted and scheduled to a different core, then 
executes RDTSC there, still thinking it got the TSC reading from the 
first core. Even if it re-reads that page after reading the TSC to check 
that it has read the correct TSC, it's still possible (though not very 
likely) that it was preempted again and scheduled back to the first core 
after reading the TSC.


-Maxim


Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-30 Thread Maxim Sobolev

Robert Watson wrote:
Part of the point of mapping in the page at execve()-time, or 
fork()-time for per-process pages (which I'm not entirely convinced we 
need yet) is to avoid the cost of an extra device open, mmap, etc, for 
every execve(), which can be quite expensive.  I stuck a prototype page 


You don't really need to do it on every execve() unconditionally. It 
could be done on demand in libc, so that only when a thread passes a 
certain threshold does the "common page optimization code" kick in and 
do its open/mmap/etc. magic. Otherwise, a "normal" syscall is performed. 
The implementation could be as simple as a counter in the appropriate 
libc routine, so that the optimization engages after a certain number of 
calls. For syscalls that return time it's also easy to use frequency 
thresholds, so that for example gettimeofday() only gets optimized if a 
thread calls it more frequently than 1 call/sec.


-Maxim


Re: [RFC] New features for libvgl

2001-01-25 Thread Maxim Sobolev

Nicolas Souchu wrote:

> On Wed, Jan 24, 2001 at 08:14:01PM +0200, Maxim Sobolev wrote:
> > >
> > > Isn't your list of modes redundant with the internal data structures of the
> > > VGA/VESA driver? Why do you list modes if it's not to query a specific one?
> >
> > I believe that it should be possible to do both of these things, i.e. (1) query
> > the list of available modes using some filter, so the application/toolkit will be
> > able to select one that matches its needs, and (2) let the video driver select the
> > best one given certain constraints. For example SDL provides a possibility to at
> > least emulate a mode if it is not directly available from the video hardware, so it
> > needs to know what the alternatives are.
>
> All this is done by GGI. I agree that you may need it for SDL.

So why not put it into libvgl and allow any toolkit to use this code?

> >
> > I'm not very adamant about using internal data structures of the video driver,
> > because they are too generic and include too many irrelevant details. Probably we
> > should extend VGLBitmap to include the missing bits (depth, masks, etc).
>
> Maybe you should rather put it in your VGL<->SDL interface driver and just consider
> VGL as a drawing library.
>
> >
> > > [skip]
> > >
> > > 
> >
> > Well, it's nice, but I'm not sure how it would work in the case of a higher-level
> > toolkit (like SDL). As I said earlier, an application can request an arbitrary mode,
> > i.e. 349x246x19, so we have to do some adjustments before passing these parameters
> > to the video driver; to do this we should know what the alternatives are.
>
> See above.

I see. ;)

> >
> > > About the mouse stuff, what is the exact usage of MousePointerShow? It's
> > > not documented in the manpage.
> >
> > It's an internal VGL function used to draw the mouse pointer on the screen.
>
> So, the mousehandler is called each time the mouse is moving?

Yes, each time the mouse moves the syscons driver sends the signal specified by the
CONS_MOUSECTL ioctl. What I'm proposing here is to add a knob to allow vgl-using apps
to assign their own handlers for this.

> The SDL library is
> synchronous with input peripheral?

It depends on the specific driver implementation. You can make it either query the
mouse position periodically or be synchronous with the input peripheral. Matter of
taste ;).

> How will you handle acceleration with SDL?

There is not much you can do about acceleration with vgl; however, SDL in general
contains a provision for hardware-accelerated blits.

Ok, do you have any patches for vgl yet? I would like to take a look at them if
possible.

-Maxim




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: SER Core Dumps

2006-08-22 Thread Maxim Sobolev

Dan Nelson wrote:

In the last episode (Aug 18), Jean-Michel Hiver said:

Kris Kennaway a écrit :

On Fri, Aug 18, 2006 at 10:37:43AM +0400, Jean-Michel Hiver wrote:

FreeBSD's SER port core dumps when I start it with 'fork=yes' in
the config file. The OS is freebsd 6.1, the platform is:

Typically this is something to take up with the software authors.
Well, it doesn't seem to do so under Linux / Debian, and people on the 
#ser IRC channel have sent me to FreeBSD's maintainers :(


They probably meant the maintainer of that particular port, which in
net/ser's case is [EMAIL PROTECTED] .


Unfortunately I don't have an amd64 machine to test on. There is little I 
can do unless you can debug it yourself and provide a patch for me.


-Maxim


Proper (no) accounting for the disabled HTT cores

2006-09-01 Thread Maxim Sobolev

Hi,

Currently, FreeBSD by default disables hyper-threading "cores" by not 
scheduling any threads on them. However, it still counts those cores as 
"active but permanently idle" when calculating system-wide CPU 
statistics. This is incorrect, since it skews the statistics quite a bit 
and creates real problems for certain types of applications (monitoring 
applications, for example) by making them believe that the system has 
enough idle resources when in fact it does not.


I think the proper way to handle disabled cores is to not account for 
them in any way. Please find the patch attached, which fixes the 
problem. Any comments or suggestions are welcome.


-Maxim

Index: local_apic.c
===
RCS file: /home/ncvs/src/sys/i386/i386/local_apic.c,v
retrieving revision 1.28
diff -d -u -r1.28 local_apic.c
--- local_apic.c	12 Jul 2006 21:22:43 -	1.28
+++ local_apic.c	2 Sep 2006 00:42:32 -
@@ -615,6 +615,16 @@
 	/* Send EOI first thing. */
 	lapic_eoi();
 
+	/*
+	 * Don't do any accounting for the disabled HTT cores, since it
+	 * will provide misleading numbers for the userland.
+	 *
+	 * No locking is necessary here, since even if we lose the race
+	 * when hlt_cpus_mask changes it is not a big deal, really.
+	 */
+	if ((hlt_cpus_mask & (1 << PCPU_GET(cpuid))) != 0)
+		return;
+
 	/* Look up our local APIC structure for the tick counters. */
 	la = &lapics[PCPU_GET(apic_id)];
 	(*la->la_timer_count)++;



Re: HEADS-UP newppbus for beta-testing

2000-01-03 Thread Maxim Sobolev

Nicolas Souchu wrote:

> Hi there!
>
> FOR ANYBODY THAT USES ZIP/PRINTER/PLIP ON THE PARALLEL PORT UNDER -current
>
> A major ppbus(4) release is available for beta-testing.

Good work! Now plip, which has been broken for ages, works perfectly: no more
lockups, spontaneous reboots, panics, etc! To test it I even managed to get X
and NFS working over a plip line, things which were impossible with oldppbus.

Count on my vote to commit it before the branch split, because IMHO it should be
considered a bugfix rather than a new feature.

-Maxim







Re: HEADS-UP newppbus for beta-testing

2000-01-05 Thread Maxim Sobolev

Nicolas Souchu wrote:

> On Mon, Jan 03, 2000 at 09:24:52PM +0200, Maxim Sobolev wrote:
> >
> >Nicolas Souchu wrote:
> >
> >> Hi there!
> >>
> >> FOR ANYBODY THAT USES ZIP/PRINTER/PLIP ON THE PARALLEL PORT UNDER -current
> >>
> >> A major ppbus(4) release is available for beta-testing.
> >
> >Good work! Now plip, which has been broken for ages, works perfectly - no more
> >lockups, spontaneous reboots, panics, etc! To test it I even managed to get X
> >and NFS working over plip line, things which was impossible with oldppbus.
>
> Nice! But, sure the 'net' interrupt level mask (at the ppc0 declaration)
> in your MACHINE config file would have done the job.

Unfortunately that is not a solution, because the net, tty and bio keywords went away
from config(8) a long time ago... I only received a `syntax error' message.

-Maxim







Re: d: /kernel: malformed input file (not rel or archive) ??

2000-03-01 Thread Maxim Sobolev

Johan Kruger wrote:

> If i try to load the example in
> /usr/src/share/examples/lkm/misc/module/misc_mod.o i get the following.
> Pleeaaas help ?
>
> borg# modload ./misc_mod.o
> ld: /kernel: malformed input file (not rel or archive)
> modload: /usr/bin/ld: return code 1

What FreeBSD release are you using? If it is 2.2 then -current is the wrong
place to ask, but if it is 4.0 then you should note that the lkm subsystem was
abolished a long time ago in favor of the new kld system. See man kld and
/usr/share/examples/kld for details.

-Maxim







Re: ACPI project progress report

2000-06-20 Thread Maxim Sobolev

Mike Smith wrote:

> > On Mon, Jun 19, 2000 at 05:01:46PM -0600, Warner Losh wrote:
> > > In message <[EMAIL PROTECTED]> "Andrew Reilly" writes:
> > > : That sounds way too hard.  Why not restrict suspend activity to
> > > : user-level processes and bring the kernel/drivers back up through
> > > : a regular boot process?  At least that way the hardware and drivers
> > > : will know what they are all up to, even if some of it has changed
> > > : in the mean time.
> > >
> > > Takes too long...  That's shutdown, not S4.
> >
> > Yes.  But what is the difference, really?  As far as the
> > hardware is concerned, it's being booted.  If that process can
> > be sped up by using the "S4" mechanisms, why can't they be
> > applied to a regular boot process too?  [I'm thinking about a
> > kernel equivalent of the "clean shutdown" flag on file systems.]
> >
> > Fundamentally, is there no way to get the kernel and drivers to
> > go through a full boot phase in a small fraction of the time
> > that it takes to repopulate 64M of RAM from disk? (*)
>
> The real issue here is persistent system state across the S4 suspend; ie.
> leaving applications open, etc.  IMO this isn't really something worth a
> lot of effort to us, and it has a lot of additional complications for a
> "server-class" operating system in that you have to worry about network
> connections from other systems, not just _to_ other systems.

Why then have brand-name commercial vendors had such a capability in their
"server-class" operating systems for a long time? HP's PA-RISC servers in particular
have it; at least I remember such a feature in the old 30MHz systems which I managed
several years ago (the systems were shipped with a small internal battery, which in
the case of a power failure was used to dump the system to disk).

-Maxim.






Porting Linux Archive FS

2000-06-20 Thread Maxim Sobolev

Hi Hackers,

I'm now looking into the possibility of porting the Linux Archive FS module (a UFS
module that allows mounting archive files,
http://raiden.goice.co.jp/member/mo/release/mi-arcfs/). This should be
extremely useful for keeping the ports/src trees on machines with limited HDD space.

As I'm not very experienced with UFS hacking, I would like to ask somebody with
similar knowledge (ext2fs, smbfs etc.) to provide some help with that (i.e. hints
and tips, a plan of attack etc.). If someone would like to help, please contact me
at my e-mail address, because I am not subscribed to this list.

Thanks!

-Maxim






Automatic fontsize definition in vidcontrol(8) [patch]

2001-04-06 Thread Maxim Sobolev

Hi,

I find it very frustrating that each time I need to change the console
font (my native language has 3 different encodings, so I need to do it
regularly) I have to specify not only the font name, but the fontsize as
well. I do not see any reason why vidcontrol(8) can't guess the correct
value from the fontfile itself, especially considering that there is a
1-to-1 mapping between the size of the fontdata and the font dimensions.
With this message I'm attaching a small patch that adds this functionality
to the utility.

In addition, this patch makes vidcontrol(8) more robust, fixing the
following issues:
o previously vidcontrol could be crashed easily by specifying a valid
uuencoded file with undecoded size greater than the size of the buffer
allocated for that particular fontsize (the same applies for the
screenmap). Now decoding routine will discard any data that exceed the
size of the buffer;

o previously it was quite easy to trash fonts on your console by
erroneously specifying an arbitrary binary file (or even a directory)
instead of a fontfile. Now the utility will refuse to load a binary
file, unless its size is equal to one of 3 possible valid sizes, thus
greatly reducing a possibility of an error.
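
The size-to-dimensions mapping can be sketched as below. It assumes the three valid sizes are those of 256-glyph, 8-pixel-wide fonts of height 8, 14 and 16 (one byte per glyph row, so 2048, 3584 and 4096 bytes); the function name is hypothetical and this is not the code from the attached patch.

```c
#include <assert.h>
#include <stddef.h>

#define	FONT_GLYPHS	256	/* glyphs per console font (assumed) */

/*
 * Map the size in bytes of raw (already decoded) font data to the
 * glyph height of an 8-pixel-wide font.  Returns 0 when the size
 * matches none of the known font dimensions, in which case the file
 * should be refused.
 */
int
font_height_from_size(size_t size)
{
	static const int heights[] = { 8, 14, 16 };
	size_t i;

	for (i = 0; i < sizeof(heights) / sizeof(heights[0]); i++)
		if (size == (size_t)FONT_GLYPHS * (size_t)heights[i])
			return (heights[i]);
	return (0);
}
```

Because every other size returns 0, the same check rejects arbitrary binary files and directories that happen to be passed instead of a fontfile.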

Could somebody please review the attached patches?

Thanks!

-Maxim


Index: decode.c
===
RCS file: /home/ncvs/src/usr.sbin/vidcontrol/decode.c,v
retrieving revision 1.8
diff -d -u -r1.8 decode.c
--- decode.c	1999/08/28 01:20:29	1.8
+++ decode.c	2001/04/06 12:08:43
@@ -35,10 +35,11 @@
 #include 
 #include "decode.h"
 
-int decode(FILE *fd, char *buffer)
+int decode(FILE *fd, char *buffer, int len)
 {
-	int n, pos = 0;
-	char *p;
+	int n, pos = 0, tpos;
+	char *bp, *p;
+	char tbuffer[3];
 	char temp[128];
 
 #define	DEC(c)	(((c) - ' ') & 0x3f)
@@ -48,31 +49,49 @@
 			return(0);
 	} while (strncmp(temp, "begin ", 6));
 	sscanf(temp, "begin %o %s", &n, temp);
+	bp = buffer;
 	for (;;) {
 		if (!fgets(p = temp, sizeof(temp), fd))
 			return(0);
 		if ((n = DEC(*p)) <= 0)
 			break;
-		for (++p; n > 0; p += 4, n -= 3)
+		for (++p; n > 0; p += 4, n -= 3) {
+			tpos = 0;
 			if (n >= 3) {
-				buffer[pos++] = DEC(p[0])<<2 | DEC(p[1])>>4;
-				buffer[pos++] = DEC(p[1])<<4 | DEC(p[2])>>2;
-				buffer[pos++] = DEC(p[2])<<6 | DEC(p[3]);
+				tbuffer[tpos++] = DEC(p[0])<<2 | DEC(p[1])>>4;
+				tbuffer[tpos++] = DEC(p[1])<<4 | DEC(p[2])>>2;
+				tbuffer[tpos++] = DEC(p[2])<<6 | DEC(p[3]);
 			}
 			else {
 				if (n >= 1) {
-					buffer[pos++] =
+					tbuffer[tpos++] =
 						DEC(p[0])<<2 | DEC(p[1])>>4;
 				}
 				if (n >= 2) {
-					buffer[pos++] =
+					tbuffer[tpos++] =
 						DEC(p[1])<<4 | DEC(p[2])>>2;
 				}
 				if (n >= 3) {
-					buffer[pos++] =
+					tbuffer[tpos++] =
 						DEC(p[2])<<6 | DEC(p[3]);
 				}
 			}
+			if (tpos == 0)
+				continue;
+			if (tpos + pos > len) {
+				tpos = len - pos;
+				/*
+				 * Arrange return value > len to indicate
+				 * overflow.
+				 */
+				pos++;
+			}
+			bcopy(tbuffer, bp, tpos);
+			pos += tpos;
+			bp += tpos;
+			if (pos > len)
+				return(pos);
+		}
 	}
 	if (!fgets(temp, sizeof(temp), fd) || strcmp(temp, "end\n"))
 		return(0);
Index: decode.h
===
RCS file: /home/ncvs/src/usr.sbin/vidcontrol/decode.h,v
retrieving revision 1.1
diff -d -u -r1.1 decode.h
--- decode.h	1997/03/07 01:34:44	1.1
+++ decode.h	2001/04/06 12:08:43
@@ -1 +1,3 @@
-int decode(FILE *fd, char *buffer);
+/* $FreeBSD$ */
+
+int decode(FILE *fd, char *buffer, int len);
Index: vidcontrol.1
===
RCS file: /home/n

Merging ln(1) ``-h'' option from NetBSD [patch]

2001-04-20 Thread Maxim Sobolev

Could somebody please take a look at the attached patch, which adds the
``-h'' option to the ln(1) command (obtained from NetBSD, which has had
it since 1997)? In addition, I've tried to minimise the diffs between our
code and NetBSD's, so there are several changes that at first glance
look superfluous.

-Maxim


Index: ln.1
===
RCS file: /home/ncvs/src/bin/ln/ln.1,v
retrieving revision 1.14
diff -d -u -r1.14 ln.1
--- ln.1	2000/11/20 11:39:37	1.14
+++ ln.1	2001/04/20 14:36:37
@@ -44,11 +44,11 @@
 .Nd make links
 .Sh SYNOPSIS
 .Nm
-.Op Fl fisv
+.Op Fl fhinsv
 .Ar source_file
 .Op target_file
 .Nm
-.Op Fl fisv
+.Op Fl fhinsv
 .Ar source_file ...
 .Op target_dir
 .Nm link
@@ -79,6 +79,14 @@
 option overrides any previous
 .Fl i
 options.)
+.It Fl h
+If the
+.Ar target_file
+or
+.Ar target_dir
+is a symbolic link, do not follow it.  This is most useful with the
+.Fl f
+option, to replace a symlink which may point to a directory.
 .It Fl i
 Cause
 .Nm
@@ -94,6 +102,12 @@
 option overrides any previous
 .Fl f
 options.)
+.It Fl n
+Same as
+.Fl h ,
+for compatibility with other
+.Nm
+implementations.
 .It Fl s
 Create a symbolic link.
 .It Fl v
@@ -168,12 +182,18 @@
 and
 .Fl v
 options are non-standard and their use in scripts is not recommended.
-.Sh HISTORY
-An
+.Sh STANDARDS
+The
 .Nm
-command appeared in
-.At v1 .
+utility conforms to
+.St -p1003.2-92 .
+.Pp
 The simplified
 .Nm link
 command conforms to
 .St -susv2 .
+.Sh HISTORY
+An
+.Nm
+command appeared in
+.At v1 .
Index: ln.c
===
RCS file: /home/ncvs/src/bin/ln/ln.c,v
retrieving revision 1.18
diff -d -u -r1.18 ln.c
--- ln.c	2000/08/17 16:08:06	1.18
+++ ln.c	2001/04/20 14:36:37
@@ -57,6 +57,7 @@
 
 int	fflag;				/* Unlink existing files. */
 int	iflag;				/* Interactive mode. */
+int	hflag;				/* Check new name for symlink first. */
 int	sflag;				/* Symbolic, not hard, link. */
 int	vflag;				/* Verbose output. */
 					/* System link call. */
@@ -65,6 +66,7 @@
 
 int	linkit __P((char *, char *, int));
 void	usage __P((void));
+int	main __P((int, char *[]));
 
 int
 main(argc, argv)
@@ -73,7 +75,8 @@
 {
 	struct stat sb;
 	int ch, exitval;
-	char *p, *sourcedir;
+	char *p;
+	char *sourcedir;
 
 	/*
 	 * Test for the special case where the utility is called as
@@ -92,12 +95,16 @@
 		usage();
 	}
 
-	while ((ch = getopt(argc, argv, "fisv")) != -1)
+	while ((ch = getopt(argc, argv, "fhinsv")) != -1)
 		switch (ch) {
 		case 'f':
 			fflag = 1;
 			iflag = 0;
 			break;
+		case 'h':
+		case 'n':
+			hflag = 1;
+			break;
 		case 'i':
 			iflag = 1;
 			fflag = 0;
@@ -122,13 +129,22 @@
 	switch(argc) {
 	case 0:
 		usage();
+		/* NOTREACHED */
 	case 1:				/* ln target */
 		exit(linkit(argv[0], ".", 1));
+		/* NOTREACHED */
 	case 2:				/* ln target source */
 		exit(linkit(argv[0], argv[1], 0));
+		/* NOTREACHED */
 	}
 					/* ln target1 target2 directory */
 	sourcedir = argv[argc - 1];
+	if (hflag && lstat(sourcedir, &sb) == 0 && S_ISLNK(sb.st_mode)) {
+		/* we were asked not to follow symlinks, but found one at
+		   the target--simulate "not a directory" error */
+		errno = ENOTDIR;
+		err(1, "%s", sourcedir);
+	}
 	if (stat(sourcedir, &sb))
 		err(1, "%s", sourcedir);
 	if (!S_ISDIR(sb.st_mode))
@@ -136,6 +152,7 @@
 	for (exitval = 0; *argv != sourcedir; ++argv)
 		exitval |= linkit(*argv, sourcedir, 1);
 	exit(exitval);
+	/* NOTREACHED */
 }
 
 int
@@ -161,18 +178,20 @@
 		}
 	}
 
-	/* If the source is a directory, append the target's name. */
-	if (isdir || ((exists = !stat(source, &sb)) && S_ISDIR(sb.st_mode))) {
+	/* If the source is a directory (and not a symlink if hflag),
+	   append the target's name. */
+	if (isdir ||
+	    (!lstat(source, &sb) && S_ISDIR(sb.st_mode)) ||
+	    (!hflag && !stat(source, &sb) && S_ISDIR(sb.st_mode))) {
 		if ((p = strrchr(target, '/')) == NULL)
 			p = target;
 		else
 			++p;
 		(void)snprintf(path, sizeof(path), "%s/%s", source, p);
 		source = path;
- 
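
The core of the -h/-n change is the difference between stat(2), which follows a symbolic link, and lstat(2), which examines the link itself. A minimal sketch of that decision in isolation follows; it is not the ln.c code, and the helper name is_real_directory is made up for this illustration.

```c
#include <sys/stat.h>

/*
 * Return 1 if path should be treated as a directory to link into.
 * With nofollow set (the -h/-n case), a symlink is never treated as a
 * directory, even when it points to one.  Hypothetical helper, not
 * part of ln.c.
 */
int
is_real_directory(const char *path, int nofollow)
{
    struct stat sb;

    if (nofollow) {
        /* lstat() examines the link itself, not its target. */
        if (lstat(path, &sb) != 0)
            return (0);
        return (S_ISDIR(sb.st_mode) ? 1 : 0);
    }
    /* stat() follows symlinks, so a link to a directory qualifies. */
    if (stat(path, &sb) != 0)
        return (0);
    return (S_ISDIR(sb.st_mode) ? 1 : 0);
}
```

For a symlink that points to a directory, the helper answers "directory" without nofollow and "not a directory" with it, which is the behaviour the manual page text above describes for replacing such a link with -f.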

Re: [CFR] OpenBSD install(1) fixes: atomic install, etc.

2001-04-20 Thread Maxim Sobolev

Konstantin Chuguev wrote:

> Hi,
>
> Ruslan Ermilov wrote:
>
> > The attached patch incorporates most of OpenBSD fixes to install(1).
> > It does not include manpage update.  Most significant changes are:
> >
> > o New flag: -S (atomic install)
> >
> > : -S  Safe copy.  Normally, install unlinks an existing target before
> > :   installing the new file.  With the -S flag a temporary file is
> > :   used and then renamed to be the target.  The reason this is safer
> > :   is that if the copy or rename fails, the existing target is left
> > :   untouched.
> >
>
> Just curious: why not make this way of doing install default (i.e. always use
> it)?

It may effectively double the disk space required during the copy (when the
destination file is not on a softdep-enabled partition and is not open at the
moment when install(1) unlinks it). For small files it doesn't matter, but for
big ones it could lead to a problem.
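
To make the trade-off concrete, here is a minimal sketch of the temporary-file-plus-rename approach that the -S description outlines. It is only an illustration, not the install(1) code; the function name safe_copy and the ".XXXXXX" suffix are assumptions made for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Copy src to dst via a temporary file in the same directory, then
 * rename() it into place.  Returns 0 on success, -1 on failure; on
 * failure an existing dst is left untouched.  Illustrative helper only.
 */
int
safe_copy(const char *src, const char *dst)
{
    char tmp[1024], buf[8192];
    ssize_t n;
    int in, out, len;

    len = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", dst);
    if (len < 0 || (size_t)len >= sizeof(tmp))
        return (-1);
    if ((in = open(src, O_RDONLY)) < 0)
        return (-1);
    if ((out = mkstemp(tmp)) < 0) {
        close(in);
        return (-1);
    }
    /* Old target and new copy coexist here: space for both is needed. */
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
    }
    close(in);
    if (close(out) != 0)
        n = -1;
    /* rename() replaces dst in one step, or not at all. */
    if (n < 0 || rename(tmp, dst) != 0) {
        unlink(tmp);
        return (-1);
    }
    return (0);
}
```

Until the rename() succeeds, both the old target and the complete new copy occupy disk space, which is the doubling mentioned above.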

-Maxim


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Junior Kernel Hacker task: improve vnode->v_tag

2001-09-04 Thread Maxim Sobolev

> In message <[EMAIL PROTECTED]>, Maxim Sobolev writes:
> >> 
> >> In message <[EMAIL PROTECTED]>, Brent Verner writes:
> >> >
> >> >I've done a /cursory/ look over how this v_tag is used.  I'm not sure
> >> >this is a simple/clean as you propose, since this is used in the 
> >> >IS_LOCKING_VFS macro, as well as in union_subr.c...
> >> 
> > Well, that is just too bad, because IS_LOCKING_VFS is wrong then.
> >> 
> >> The places which inspect v_tag will have to be changed to use
> >> strcmp() then...
> >
> >I think that we can add a new vnode flag, say VCANLOCK, so that each
> >particular VFS can set it if it supports locking, which should allow
> >to remove pre-defined VFS list from the IS_LOCKING_VFS macro. I can
> >produce a patch if it sounds reasonably.
> 
> Yeah, I think that makes a lot of sense.

See attached. Please let me know if it is OK for you.

-Maxim


Index: isofs/cd9660/cd9660_vfsops.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_vfsops.c,v
retrieving revision 1.91
diff -d -u -r1.91 cd9660_vfsops.c
--- isofs/cd9660/cd9660_vfsops.c2001/05/16 18:04:30 1.91
+++ isofs/cd9660/cd9660_vfsops.c2001/09/04 15:20:46
@@ -697,6 +697,7 @@
}
MALLOC(ip, struct iso_node *, sizeof(struct iso_node), M_ISOFSNODE,
M_WAITOK | M_ZERO);
+   vp->v_flag |= VLOCKABLE;
lockinit(&vp->v_lock, PINOD, "isonode", 0, 0);
/*
 * ISOFS uses stdlock and can share lock structure
Index: ufs/ffs/ffs_vfsops.c
===
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.157
diff -d -u -r1.157 ffs_vfsops.c
--- ufs/ffs/ffs_vfsops.c2001/06/28 22:21:27 1.157
+++ ufs/ffs/ffs_vfsops.c2001/09/04 15:21:25
@@ -1172,6 +1172,7 @@
return (error);
}
bzero((caddr_t)ip, sizeof(struct inode));
+   vp->v_flag |= VLOCKABLE;
/*
 * FFS supports lock sharing in the stack of vnodes
 */
Index: ufs/ifs/ifs_vfsops.c
===
RCS file: /home/ncvs/src/sys/ufs/ifs/ifs_vfsops.c,v
retrieving revision 1.6
diff -d -u -r1.6 ifs_vfsops.c
--- ufs/ifs/ifs_vfsops.c2001/04/25 07:07:51 1.6
+++ ufs/ifs/ifs_vfsops.c2001/09/04 15:21:25
@@ -217,6 +217,7 @@
return (error);
}
bzero((caddr_t)ip, sizeof(struct inode));
+   vp->v_flag |= VLOCKABLE;
/*
 * IFS supports lock sharing in the stack of vnodes
 */
Index: nfs/nfs_node.c
===
RCS file: /home/ncvs/src/sys/nfs/nfs_node.c,v
retrieving revision 1.49
diff -d -u -r1.49 nfs_node.c
--- nfs/nfs_node.c  2001/05/01 08:13:14 1.49
+++ nfs/nfs_node.c  2001/09/04 15:21:25
@@ -232,6 +232,7 @@
}
vp = nvp;
bzero((caddr_t)np, sizeof *np);
+   vp->v_flag |= VLOCKABLE;
vp->v_data = np;
np->n_vnode = vp;
/*
Index: sys/vnode.h
===
RCS file: /home/ncvs/src/sys/sys/vnode.h,v
retrieving revision 1.154
diff -d -u -r1.154 vnode.h
--- sys/vnode.h 2001/08/27 06:09:55 1.154
+++ sys/vnode.h 2001/09/04 15:21:25
@@ -175,6 +175,7 @@
 /* open for business   0x10 */
 #define VONWORKLST  0x20 /* On syncer work-list */
 #define VMOUNT  0x40 /* Mount in progress */
+#define VLOCKABLE  0x60 /* vnode supports locking */
 
 /*
  * Vnode attributes.  A field value of VNOVAL represents a field whose value
@@ -433,12 +434,7 @@
 /*
  * [dfr] Kludge until I get around to fixing all the vfs locking.
  */
-#define IS_LOCKING_VFS(vp) ((vp)->v_tag == VT_UFS  \
-|| (vp)->v_tag == VT_NFS   \
-|| (vp)->v_tag == VT_LFS   \
-|| (vp)->v_tag == VT_ISOFS \
-|| (vp)->v_tag == VT_MSDOSFS   \
-|| (vp)->v_tag == VT_DEVFS)
+#define IS_LOCKING_VFS(vp) ((vp)->v_flag & VLOCKABLE)
 
 #define ASSERT_VOP_LOCKED(vp, str) \
 do {   \
Index: fs/devfs/devfs_vnops.c
===
RCS file: /home/ncvs/src/sys/fs/devfs/devfs_vnops.c,v
retrieving revision 1.27
diff -d -u -r1.27 devfs_vnops.c
--- fs/devfs/devfs_vnops.c  2001/08/14 06:42:32 1.27
+++ fs/devfs/devfs_vnops.c  2001/09/04 15:21:25
@@ -151,6 +151,7 @@
 

Re: Junior Kernel Hacker task: improve vnode->v_tag

2001-09-04 Thread Maxim Sobolev

> 
> Hi Maxim,
> 
> Perhaps you meant:
> diff -d -u -r1.154 vnode.h
> --- sys/vnode.h 2001/08/27 06:09:55 1.154
> +++ sys/vnode.h 2001/09/04 15:21:25
> @@ -175,6 +175,7 @@
>  /* open for business   0x10 */
>  #define VONWORKLST  0x20 /* On syncer work-list */
>  #define VMOUNT  0x40 /* Mount in progress */
> +#define VLOCKABLE  0x60 /* vnode supports locking */
> ...should be
> +#define VLOCKABLE  0x80 /* vnode supports locking */

Indeed. Thank you for pointing that out!
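
For the record, the reason the first value cannot work: a flag must occupy a bit of its own, and 0x60 is just 0x20 | 0x40. A small sketch of the check, using the flag values exactly as they appear in the quoted diff; the two helper functions and the _BAD/_OK suffixes are made up for this illustration.

```c
/* Flag values as quoted in the diff above. */
#define VONWORKLST      0x20    /* On syncer work-list */
#define VMOUNT          0x40    /* Mount in progress */
#define VLOCKABLE_BAD   0x60    /* value from the first patch */
#define VLOCKABLE_OK    0x80    /* corrected value */

/* Return nonzero if flag shares any bit with the two existing flags. */
int
flag_overlaps(int flag)
{
    return ((flag & (VONWORKLST | VMOUNT)) != 0);
}

/* Return nonzero if exactly one bit is set in flag. */
int
is_single_bit(int flag)
{
    return (flag != 0 && (flag & (flag - 1)) == 0);
}
```

With the first value, a test such as (vp)->v_flag & VLOCKABLE would also be true for any vnode that merely has VONWORKLST or VMOUNT set.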

-Maxim

> 
> 
> 
> Maxim Sobolev wrote:
> 
> > > In message <[EMAIL PROTECTED]>, Maxim Sobolev writes:
> > > >>
> > > >> In message <[EMAIL PROTECTED]>, Brent Verner writes:
> > > >> >
> > > >> >I've done a /cursory/ look over how this v_tag is used.  I'm not sure
> > > >> >this is a simple/clean as you propose, since this is used in the
> > > >> >IS_LOCKING_VFS macro, as well as in union_subr.c...
> > > >>
> > > > Well, that is just too bad, because IS_LOCKING_VFS is wrong then.
> > > >>
> > > >> The places which inspect v_tag will have to be changed to use
> > > >> strcmp() then...
> > > >
> > > >I think that we can add a new vnode flag, say VCANLOCK, so that each
> > > >particular VFS can set it if it supports locking, which should allow
> > > >to remove pre-defined VFS list from the IS_LOCKING_VFS macro. I can
> > > >produce a patch if it sounds reasonably.
> > >
> > > Yeah, I think that makes a lot of sense.
> >
> > See attached. Please let me know if it is OK for you.
> >
> > -Maxim
> >
> >   --
> >Name: p
> >p   Type: Plain Text (text/plain)
> >Encoding: 7bit
> > Description: ASCII C program text
> 
> --
>  * *   Konstantin Chuguev   Francis House
>   *  * Application Engineer 112 Hills Road
> *  Tel: +44 1223 302992 Cambridge CB2 1PQ
> D  A  N  T  E  WWW: http://www.dante.netUnited Kingdom
> 
> 
> 
> 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Junior Kernel Hacker task: improve vnode->v_tag

2001-09-04 Thread Maxim Sobolev

> 
> In message <[EMAIL PROTECTED]>, Brent Verner writes:
> >On 04 Sep 2001 at 10:36 (+0200), Poul-Henning Kamp wrote:
> >| 
> >| Assignment:
> >| 
> >| The v_tag element in struct vnode is a debugging aid, but unfortunately
> >| it is implemented in a way which means that adding a filesystem means
> >| modifying the definition in .
> >| 
> >| Convert the v_tag to an "const char *" and have the filesystems put
> >| their name in there instead.
> >| 
> >| The v_tag has been abused a few places, easily recognizable by the fact
> >| that the kernel should never inspect the value of v_tag.
> >| These places should be easily changeable to use the new representation.
> >| Please mark them with a big fat "/*XXX: ABUSE OF v_tag */" comment.
> >
> >#include 
> >
> >I've done a /cursory/ look over how this v_tag is used.  I'm not sure
> >this is a simple/clean as you propose, since this is used in the 
> >IS_LOCKING_VFS macro, as well as in union_subr.c...
> 
> Well, that is just too bad, because IS_LOCKING_VFS is wrong then.
> 
> The places which inspect v_tag will have to be changed to use
> strcmp() then...

I think that we can add a new vnode flag, say VCANLOCK, so that each
particular VFS can set it if it supports locking, which should allow
us to remove the pre-defined VFS list from the IS_LOCKING_VFS macro. I can
produce a patch if that sounds reasonable.

-Maxim


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Junior Kernel Hacker task: improve vnode->v_tag

2001-09-04 Thread Maxim Sobolev

> 
> 
> apart from the numerical value, yes, looks good.

Ok, please find the final patch attached. Dare I say that it looks really
ugly?

I'm looking forward to your comments.

-Maxim

> 
> Poul-Henning
> 
> In message <[EMAIL PROTECTED]>, Maxim Sobolev writes:
> >
> >> In message <[EMAIL PROTECTED]>, Maxim Sobolev writes:
> >> >> 
> >> >> In message <[EMAIL PROTECTED]>, Brent Verner writes:
> >> >> >
> >> >> >I've done a /cursory/ look over how this v_tag is used.  I'm not sure
> >> >> >this is a simple/clean as you propose, since this is used in the 
> >> >> >IS_LOCKING_VFS macro, as well as in union_subr.c...
> >> >> 
> >> > Well, that is just too bad, because IS_LOCKING_VFS is wrong then.
> >> >> 
> >> >> The places which inspect v_tag will have to be changed to use
> >> >> strcmp() then...
> >> >
> >> >I think that we can add a new vnode flag, say VCANLOCK, so that each
> >> >particular VFS can set it if it supports locking, which should allow
> >> >to remove pre-defined VFS list from the IS_LOCKING_VFS macro. I can
> >> >produce a patch if it sounds reasonably.
> >> 
> >> Yeah, I think that makes a lot of sense.
> >
> >See attached. Please let me know if it is OK for you.
> >
> >-Maxim
> >

Re: Junior Kernel Hacker task: improve vnode->v_tag

2001-09-18 Thread Maxim Sobolev

Chris Costello wrote:

> On Saturday, September 08, 2001, Maxim Sobolev wrote:
> > I don't like idea to hardcode the same string ("procfs"), with the
> > same meaning in several places across kernel. As for your proposition
> > to use f_fstypename to set v_tag, it is even more bogus because
> > value of the f_fstypename is supplied from the user level, so
> > potentially it could be anything and we can't make any reasonable
> > assumptions about mapping between its value and type of the filesystem
> > in question.
>
>How do you figure?  The contents of `f_fstypename' must match
> a configured file system exactly, so it could _not_ be anything.
> To quote sys/kern/vfs_syscalls.c:mount():

Oh, yes, you are obviously correct (I don't know what I was thinking). In this
case it looks like v_tag is redundant, because f_fstypename could be used instead
in the few places where v_tag is abused (the same applies to statfs.f_type,
because essentially it is the same thing as v_tag). Poul, what do you think about
it? In the meantime, I found another place in the kernel where the VT_* macros are
[ab]used: the Linuxulator. Please find attached patches to fix it, and please
review.

-Maxim


Index: linux_stats.c
===
RCS file: /home/ncvs/src/sys/compat/linux/linux_stats.c,v
retrieving revision 1.37
diff -d -u -r1.37 linux_stats.c
--- linux_stats.c   2001/09/12 08:36:57 1.37
+++ linux_stats.c   2001/09/18 11:52:02
@@ -187,10 +187,6 @@
l_int   f_spare[6];
 };
 
-#ifndef VT_NWFS
-#define VT_NWFS VT_TFS  /* XXX - bug compat. with sys/fs/nwfs/nwfs_node.h */
-#endif
-
 #define LINUX_CODA_SUPER_MAGIC  0x73757245L
 #define LINUX_EXT2_SUPER_MAGIC  0xEF53L
 #define LINUX_HPFS_SUPER_MAGIC  0xf995e849L
@@ -202,34 +198,30 @@
 #define LINUX_PROC_SUPER_MAGIC  0x9fa0L
 #define LINUX_UFS_SUPER_MAGIC   0x00011954L /* XXX - UFS_MAGIC in Linux */
 
-/*
- * ext2fs uses the VT_UFS tag. A mounted ext2 filesystem will therefore
- * be seen as an ufs filesystem.
- */
 static long
-bsd_to_linux_ftype(int tag)
+bsd_to_linux_ftype(const char *fstypename)
 {
 
-   switch (tag) {
-   case VT_CODA:
+   if (strcmp(fstypename, "coda") == 0)
return (LINUX_CODA_SUPER_MAGIC);
-   case VT_HPFS:
+   else if (strcmp(fstypename, "hpfs") == 0)
return (LINUX_HPFS_SUPER_MAGIC);
-   case VT_ISOFS:
+   else if (strcmp(fstypename, "cd9660") == 0)
return (LINUX_ISOFS_SUPER_MAGIC);
-   case VT_MSDOSFS:
+   else if (strcmp(fstypename, "msdosfs") == 0)
return (LINUX_MSDOS_SUPER_MAGIC);
-   case VT_NFS:
+   else if (strcmp(fstypename, "nfs") == 0)
return (LINUX_NFS_SUPER_MAGIC);
-   case VT_NTFS:
+   else if (strcmp(fstypename, "ntfs") == 0)
return (LINUX_NTFS_SUPER_MAGIC);
-   case VT_NWFS:
+   else if (strcmp(fstypename, "nwfs") == 0)
return (LINUX_NCP_SUPER_MAGIC);
-   case VT_PROCFS:
+   else if (strcmp(fstypename, "procfs") == 0)
return (LINUX_PROC_SUPER_MAGIC);
-   case VT_UFS:
+   else if (strcmp(fstypename, "ufs") == 0)
return (LINUX_UFS_SUPER_MAGIC);
-   }
+   else if (strcmp(fstypename, "ext2fs") == 0)
+   return (LINUX_EXT2_SUPER_MAGIC);
 
return (0L);
 }
@@ -265,7 +257,7 @@
if (error)
return error;
bsd_statfs->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
-   linux_statfs.f_type = bsd_to_linux_ftype(bsd_statfs->f_type);
+   linux_statfs.f_type = bsd_to_linux_ftype(bsd_statfs->f_fstypename);
linux_statfs.f_bsize = bsd_statfs->f_bsize;
linux_statfs.f_blocks = bsd_statfs->f_blocks;
linux_statfs.f_bfree = bsd_statfs->f_bfree;
@@ -301,7 +293,7 @@
if (error)
return error;
bsd_statfs->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
-   linux_statfs.f_type = bsd_to_linux_ftype(bsd_statfs->f_type);
+   linux_statfs.f_type = bsd_to_linux_ftype(bsd_statfs->f_fstypename);
linux_statfs.f_bsize = bsd_statfs->f_bsize;
linux_statfs.f_blocks = bsd_statfs->f_blocks;
linux_statfs.f_bfree = bsd_statfs->f_bfree;
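
The strcmp() chain in the patch could also be expressed as a table lookup, which keeps each name-to-magic pair on one line. This is only a sketch of that alternative, not what the patch does; the table and the function name fstype_to_linux_magic are made up here, and only the magic numbers whose defines are visible in the diff above are included.

```c
#include <stddef.h>
#include <string.h>

/* Magic numbers copied from the defines visible in the diff above. */
#define LINUX_CODA_SUPER_MAGIC  0x73757245L
#define LINUX_EXT2_SUPER_MAGIC  0xEF53L
#define LINUX_HPFS_SUPER_MAGIC  0xf995e849L
#define LINUX_PROC_SUPER_MAGIC  0x9fa0L
#define LINUX_UFS_SUPER_MAGIC   0x00011954L

/* Hypothetical lookup table keyed by the f_fstypename string. */
static const struct {
    const char *name;
    long magic;
} fstype_map[] = {
    { "coda",   LINUX_CODA_SUPER_MAGIC },
    { "ext2fs", LINUX_EXT2_SUPER_MAGIC },
    { "hpfs",   LINUX_HPFS_SUPER_MAGIC },
    { "procfs", LINUX_PROC_SUPER_MAGIC },
    { "ufs",    LINUX_UFS_SUPER_MAGIC },
};

/* Return the Linux magic for a file system name, or 0 if unknown. */
long
fstype_to_linux_magic(const char *fstypename)
{
    size_t i;

    for (i = 0; i < sizeof(fstype_map) / sizeof(fstype_map[0]); i++)
        if (strcmp(fstypename, fstype_map[i].name) == 0)
            return (fstype_map[i].magic);
    return (0L);
}
```

Either way, keying on the name lets "ext2fs" and "ufs" map to different magic numbers, which the old tag-based switch could not do because ext2fs used the VT_UFS tag.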



Re: Creating Compressed Loop FS from stdin

2004-12-30 Thread Maxim Sobolev
Peter Pentchev wrote:
On Thu, Dec 30, 2004 at 01:28:28PM +0100, Matteo Riondato wrote:
Il giorno Gio, 30-12-2004 alle 12:34 +0200, Peter Pentchev ha scritto:
This could be fixed by the following patch.  I'm CC'ing Maxim Sobolev,
the author of mkuzip(8); Maxim, do you have any objections to this patch?
Thank you for the answer and for the patch! I hope Maxim will commit it
soon.

Actually, if Maxim has no objections, I could commit it myself.
However, it would be totally understandable if he doesn't answer in
the next day or three, what with the calendar moving ahead and all :)
It will not help, since AFAIK you can't seek stdin anyway, or even if I 
am wrong and you can seek it to the end you will be unable to seek it 
backward.

I've already replied to this message, but Matteo has some very strange 
settings of his smtp relay so that neither my original message nor my 
follow-up in which I had forwarded mail delivery error message got through.

-Maxim
 Original Message 
Subject: Re: Creating Compressed Loop FS from stdin
Date: Fri, 17 Dec 2004 17:14:48 +0200
From: Maxim Sobolev <[EMAIL PROTECTED]>
Organization: Porta Software Ltd
To: Matteo Riondato <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
This is not going to work by design, unfortunately. The cloop format has
a serious design flaw: it contains a variable-length header at the beginning
of the compressed image, so before doing compression mkuzip(8) uses a
stat(2) call on the original file to get its size and reserve the necessary
space, which doesn't work with /dev/stdin, as you may guess. The original GNU
utility either keeps the whole compressed image in memory or uses some
form of temporary storage (I don't quite remember) to work around this
problem.
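
The limitation can be sketched as follows: the input size is only known up front when the descriptor refers to a regular file. This is a hypothetical helper for illustration, not code from mkuzip; the name input_size is an assumption.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Store the size of the file behind fd in *size and return 0, or
 * return -1 when the size cannot be known up front (pipe, FIFO,
 * socket, terminal, or fstat() failure).  Hypothetical helper.
 */
int
input_size(int fd, off_t *size)
{
    struct stat sb;

    if (fstat(fd, &sb) != 0)
        return (-1);
    /* Only a regular file reports a meaningful st_size here. */
    if (!S_ISREG(sb.st_mode))
        return (-1);
    *size = sb.st_size;
    return (0);
}
```

A tool that has to write a size-dependent header first can use such a check to refuse pipe input early, or to fall back to temporary storage.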
Regards,
Maxim
 Original Message 
Subject: Returned mail: see transcript for details
Date: Fri, 17 Dec 2004 16:14:58 +0100 (CET)
From: Mail Delivery Subsystem 
To: <[EMAIL PROTECTED]>
The original message was received at Fri, 17 Dec 2004 16:14:55 +0100 (CET)
from [192.168.1.26]
- The following addresses had permanent fatal errors -
<[EMAIL PROTECTED]>
 (reason: 550 Error: Sorry, unallowed MIME charset (too much spam))
- Transcript of session follows -
... while talking to relay.gufi.org.:
>>> DATA
<<< 550 Error: Sorry, unallowed MIME charset (too much spam)
554 5.0.0 Service unavailable

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Creating Compressed Loop FS from stdin

2004-12-30 Thread Maxim Sobolev
You don't check the return code of the second lseek - I bet it fails. This 
probably leads to the creation of a seemingly valid loop fs (i.e. with a valid 
header), but filled with zeroes or some random junk.

-Maxim
Peter Pentchev wrote:
On Thu, Dec 30, 2004 at 03:32:27PM +0200, Maxim Sobolev wrote:
Peter Pentchev wrote:
On Thu, Dec 30, 2004 at 01:28:28PM +0100, Matteo Riondato wrote:

Il giorno Gio, 30-12-2004 alle 12:34 +0200, Peter Pentchev ha scritto:

This could be fixed by the following patch.  I'm CC'ing Maxim Sobolev,
the author of mkuzip(8); Maxim, do you have any objections to this patch?
Thank you for the answer and for the patch! I hope Maxim will commit it
soon.

Actually, if Maxim has no objections, I could commit it myself.
However, it would be totally understandable if he doesn't answer in
the next day or three, what with the calendar moving ahead and all :)
It will not help, since AFAIK you can't seek stdin anyway, or even if I 
am wrong and you can seek it to the end you will be unable to seek it 
backward.

I tested the patch before posting it, fully expecting to find that stdin
really cannot be seeked (sought? :), and surprisingly it worked, at least
on RELENG_5 as of today!

I've already replied to this message, but Matteo has some very strange 
settings of his smtp relay so that neither my original message nor my 
follow-up in which I had forwarded mail delivery error message got through.

-Maxim
 Original Message 
Subject: Re: Creating Compressed Loop FS from stdin
Date: Fri, 17 Dec 2004 17:14:48 +0200
From: Maxim Sobolev <[EMAIL PROTECTED]>
Organization: Porta Software Ltd
To: Matteo Riondato <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
This is not going to work by design, unfortunately. The cloop format has
a serious design flaw: it contains a variable-length header at the beginning
of the compressed image, so before doing compression mkuzip(8) uses a
stat(2) call on the original file to get its size and reserve the necessary
space, which doesn't work with /dev/stdin, as you may guess. The original GNU
utility either keeps the whole compressed image in memory or uses some
form of temporary storage (I don't quite remember) to work around this
problem.

Well, another solution would be to make mkuzip use a temporary file
(obviously, keeping the whole thing in memory would be a bad idea for
big ISO filesystems ;).  However, as I noted above, my lseek patch
works at least for RELENG_5.  Can somebody test it on -CURRENT?
G'luck,
Peter
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Creating Compressed Loop FS from stdin

2004-12-30 Thread Maxim Sobolev
Never mind - people are inherently error-prone creatures. ;-)
In your case the shell was passing a file descriptor of an open file, 
not a pipe, so seeking worked properly.
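
The difference is easy to check at run time: lseek(2) on a pipe fails with ESPIPE, while on a regular file it succeeds. A minimal sketch, with the helper name fd_is_seekable made up for this example:

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Return 1 if fd supports lseek(), 0 if it is a pipe, FIFO or socket
 * (ESPIPE), and -1 on any other error.  Hypothetical helper.
 */
int
fd_is_seekable(int fd)
{
    /* Asking for the current offset does not move the file position. */
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return (1);
    return (errno == ESPIPE ? 0 : -1);
}
```

When a program is run as "prog < file", descriptor 0 is the file itself and the check reports it as seekable; with "cat file | prog", descriptor 0 is a pipe and the check reports 0.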

Anyway, I think that your patch is useful, since it should allow using 
disk devices.

-Maxim
Peter Pentchev wrote:
On Thu, Dec 30, 2004 at 04:55:43PM +0200, Peter Pentchev wrote:
On Thu, Dec 30, 2004 at 04:20:16PM +0200, Maxim Sobolev wrote:
You don't check return code of the second lseek - I bet it fails. This 
probably leads to creation of seemingly valid loop fs (i.e. with valid 
header), but filled with zeroes or some random junk.
I said I'd tested it before posting it the first time.  It works.
It creates a valid loop fs, containing exactly the files that are in
the input ISO image.

Errr.  Oops.
Sorry everyone - the patch does not really work.  I keep testing it with
a *file* passed on mkuzip's stdin, all the while feeling surprised that
lseek() works on the pipe... when there is no pipe at all :(
I just tested it with a real pipe, and of course, it failed.  Again,
sorry for wasting your time; I guess it'd be best if I tucked in for
the holidays now :(
G'luck,
Peter
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Attempt to invoke connect(2) on already connected unix domain datagram socket fails with ECONNRESET

2005-01-10 Thread Maxim Sobolev
Folks,
I've discovered very strange behaviour of the connect(2) system call 
when it's called on an already connected unix domain datagram socket. In 
this case connect(2) fails with ECONNRESET, which is weird. ECONNRESET 
is not even listed among the possible return values of connect(2). I've 
confirmed this behaviour on 4.10 and 5.3 systems. Linux doesn't exhibit 
this (mis?)behaviour.

As far as I can tell, this behaviour contradicts the documentation; the 
connect(2) manpage says:

 Generally, stream sockets may successfully connect() only
 once; datagram sockets may use connect() multiple times to change
 their association.

Attached please find a small test program which illustrates the problem. 
It forks itself at the start; the child becomes a server, while the parent 
becomes a client. After each transaction the server closes its unix domain 
socket and opens it again, while the client attempts to re-connect() to 
that unix domain socket using the already created socket object.

This mimics the real-world scenario in which I've encountered the problem. 
In this scenario, there are two distinct processes communicating over a 
unix domain socket. The client uses connect() on the already connected 
socket object for performance reasons, to avoid calling socket(2) for each 
transaction. Everything works just fine until the server is restarted. After 
that, any attempt to send a command from the client to the server fails 
with ECONNRESET until the client is restarted as well.

-Maxim
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define UDS_NAME    "/tmp/uds_test.sock"

#define sstosa(ss)  ((struct sockaddr *)(ss))

static pid_t pid_kill;

/* Fill in the address of the test socket. */
void
prepare_ifsun(struct sockaddr_un *ifsun)
{
    static char ch = '1' * 2;

    memset(ifsun, '\0', sizeof(*ifsun));
#if !defined(__linux__) && !defined(__solaris__)
    ifsun->sun_len = strlen(UDS_NAME);
#endif
    ifsun->sun_family = AF_LOCAL;
    strcpy(ifsun->sun_path, UDS_NAME);
    //ifsun->sun_path[ifsun->sun_len - 1] = ch / 2;
    ch++;
}

/* Create a fresh datagram socket bound to UDS_NAME. */
int
create_uds_server(void)
{
    struct sockaddr_un ifsun;
    int sock;

    prepare_ifsun(&ifsun);

    unlink(ifsun.sun_path);

    sock = socket(PF_LOCAL, SOCK_DGRAM, 0);
    if (sock == -1)
        err(1, "server: can't create socket");
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &sock, sizeof(sock));
    if (bind(sock, sstosa(&ifsun), sizeof(ifsun)) < 0)
        err(1, "server: can't bind to a socket");

    return sock;
}

/* (Re-)connect the client socket to UDS_NAME. */
void
connect_uds_server(int sock)
{
    struct sockaddr_un ifsun;

    prepare_ifsun(&ifsun);

    if (connect(sock, sstosa(&ifsun), sizeof(ifsun)) < 0)
        err(1, "client: can't connect to a socket");
}

static void
cleanup(void)
{

    kill(pid_kill, SIGKILL);
}

int
main()
{
    int sock, len;
    pid_t pid;

    pid = fork();
    if (pid < 0)
        err(1, "can't fork");
    pid_kill = getpid();
    if (pid != 0) {
        /* Parent */
        pid_kill = pid;
        atexit(cleanup);
        sock = socket(PF_LOCAL, SOCK_DGRAM, 0);
        if (sock < 0)
            err(1, "client: can't create socket");
        for (;;) {
            sleep(1);
            /* Reuse the same socket object for every transaction. */
            connect_uds_server(sock);
            len = write(sock, &pid, sizeof(pid));
            if (len < 0)
                err(1, "client: can't write to a socket");
            printf("client: wrote %d bytes to the socket\n", len);
        }
    } else {
        /* Child */
        atexit(cleanup);
        for (;;) {
            /* Close and reopen the server socket after each read. */
            sock = create_uds_server();
            len = recvfrom(sock, &pid, sizeof(pid), 0, NULL, NULL);
            if (len < 0)
                err(1, "server: can't read from a socket");
            printf("server: read %d bytes from the socket\n", len);
            close(sock);
        }
    }
    exit (1);
}
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Attempt to invoke connect(2) on already connected unix domain datagram socket fails with ECONNRESET

2005-01-11 Thread Maxim Sobolev
Further investigation revealed that the problem only happens when the 
program tries to re-connect() a socket object which was previously 
connected to a unix domain socket that has been closed on the server side 
by the time the second connect() is called. Attached please find a 
simpler test case.

-Maxim
Maxim Sobolev wrote:
Folks,
I've discovered very strange behaviour of the connect(2) system call 
when it's called on an already connected unix domain datagram socket. In 
this case connect(2) fails with ECONNRESET, which is weird. ECONNRESET 
is not even listed among the possible return values of connect(2). I've 
confirmed this behaviour on 4.10 and 5.3 systems. Linux doesn't exhibit 
this (mis?)behaviour.

As far as I can tell, this behaviour contradicts the documentation; the 
connect(2) manpage says:

 Generally, stream sockets may successfully connect() only
 once; datagram sockets may use connect() multiple times to change
 their association.

Attached please find a small test program which illustrates the problem. 
It forks itself at the start; the child becomes a server, while the parent 
becomes a client. After each transaction the server closes its unix domain 
socket and opens it again, while the client attempts to re-connect() to 
that unix domain socket using the already created socket object.

This mimics the real-world scenario in which I've encountered the problem. 
In this scenario, there are two distinct processes communicating over a 
unix domain socket. The client uses connect() on the already connected 
socket object for performance reasons, to avoid calling socket(2) for each 
transaction. Everything works just fine until the server is restarted. After 
that, any attempt to send a command from the client to the server fails 
with ECONNRESET until the client is restarted as well.

-Maxim

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <err.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define UDS_NAME1   "/tmp/uds_test.sock1"
#define UDS_NAME2   "/tmp/uds_test.sock2"

#define sstosa(ss)  ((struct sockaddr *)(ss))

/* Fill in a unix domain socket address for the given path. */
void
prepare_ifsun(struct sockaddr_un *ifsun, const char *path)
{

    memset(ifsun, '\0', sizeof(*ifsun));
#if !defined(__linux__) && !defined(__solaris__)
    ifsun->sun_len = strlen(path);
#endif
    ifsun->sun_family = AF_LOCAL;
    strcpy(ifsun->sun_path, path);
}

/* Create a datagram socket bound to path. */
int
create_uds_server(const char *path)
{
    struct sockaddr_un ifsun;
    int sock;

    prepare_ifsun(&ifsun, path);

    unlink(ifsun.sun_path);

    sock = socket(PF_LOCAL, SOCK_DGRAM, 0);
    if (sock == -1)
        err(1, "server: can't create socket");
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &sock, sizeof(sock));
    if (bind(sock, sstosa(&ifsun), sizeof(ifsun)) < 0)
        err(1, "server: can't bind to a socket");

    return sock;
}

/* Connect sock to the server socket at path, exiting on failure. */
void
connect_uds_server(int sock, const char *path)
{
    struct sockaddr_un ifsun;
    int e;

    prepare_ifsun(&ifsun, path);

    e = connect(sock, sstosa(&ifsun), sizeof(ifsun));
    if (e < 0)
        err(1, "client: can't connect to a socket");
}

int
main()
{
    int s_sock1, s_sock2, c_sock;

    s_sock1 = create_uds_server(UDS_NAME1);
    s_sock2 = create_uds_server(UDS_NAME2);

    c_sock = socket(PF_LOCAL, SOCK_DGRAM, 0);
    if (c_sock < 0)
        err(1, "client: can't create socket");

    connect_uds_server(c_sock, UDS_NAME1);
    /* Closing the first server leaves an error pending on c_sock. */
    close(s_sock1);
    /* This second connect() is the one that fails with ECONNRESET. */
    connect_uds_server(c_sock, UDS_NAME2);

    exit (0);
}
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Attempt to invoke connect(2) on already connected unix domain datagram socket fails with ECONNRESET

2005-01-11 Thread Maxim Sobolev
Maxim Sobolev wrote:
Further investigation revealed that the problem only happens when the 
program tries to re-connect() a socket object which was previously 
connected to a unix domain socket that has been closed on the server side 
by the time the second connect() is called. Attached please find a 
simpler test case.
It seems that I've found the source of the problem. When the server 
closes its side of the unix domain socket, unp_drop(ref, ECONNRESET) is 
called on the client side of the connection, which in turn sets the 
so_error member of the client's struct socket to ECONNRESET. Since we 
don't do any more reads on the client side of the connection, this error 
is never cleared and is then picked up as a connection error by the 
kern_connect() routine, which is obviously incorrect. The funny thing is 
that despite that error (ECONNRESET) one can still use the resulting 
socket as if no error had happened.
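
To illustrate the stale so_error described above from the userland side: 
getsockopt(2) with SO_ERROR returns the pending socket error and clears 
it. The sketch below is only an illustration of that mechanism, not part 
of the patch; the helper name drain_socket_error() is made up here, and 
whether calling it before the second connect() actually avoids the 
ECONNRESET on an affected kernel has not been tested.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch and clear the pending error on a socket.
 * Returns the pending error (0 if there is none), or -1 if
 * getsockopt(2) itself fails.
 */
static int
drain_socket_error(int sock)
{
	int soerr = 0;
	socklen_t len = sizeof(soerr);

	/* Reading SO_ERROR resets the socket's pending error to zero. */
	if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
		return -1;
	return soerr;
}
```

A client would call drain_socket_error(c_sock) between close() on the 
server side and the second connect().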

Attached please find a patch which I believe should fix the problem in 
question. I would appreciate it if somebody could review it.

Thanks in advance!
Regards,
Maxim
Index: uipc_socket.c
===
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.208.2.6
diff -d -u -r1.208.2.6 uipc_socket.c
--- uipc_socket.c   16 Nov 2004 08:15:07 -  1.208.2.6
+++ uipc_socket.c   10 Jan 2005 16:23:07 -
@@ -530,10 +530,19 @@
 */
if (so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING) &&
((so->so_proto->pr_flags & PR_CONNREQUIRED) ||
-   (error = sodisconnect(so))))
+   (error = sodisconnect(so)))) {
error = EISCONN;
-   else
+   } else {
+   SOCK_LOCK(so);
+   /*
+* Prevent accumulated error from previous connection
+* from biting us.
+*/
+   so->so_error = 0;
+   SOCK_UNLOCK(so);
error = (*so->so_proto->pr_usrreqs->pru_connect)(so, nam, td);
+   }
+
return (error);
 }
 


Re: RFC: backporting GEOM to the 4.x branch

2005-02-27 Thread Maxim Sobolev
Roland Dowdeswell wrote:
[ cc'ing [EMAIL PROTECTED], because there has been talk
  of GBDE there in the past.]
Well, I thought that since I saw this:
ALeine wrote a while ago:
[EMAIL PROTECTED] wrote:
Wouldn't be easier porting cgd* from NetBSD ?
* http://www.netbsd.org/guide/en/chap-cgd.html
Perhaps, but I believe GBDE to be superior to CGD for a number
of reasons, one of the most important being that with GBDE you
can change the passphrase without re-encrypting the entire disk,
which is not the case with CGD, AFAIK. From Poul-Henning Kamp's
paper on GBDE:

That, as the author of CGD, I should respond to some common
misconceptions about my work which seem to be percolating around.
First, on the capability front, you can:
1.  change the passphrase on a disk without re-encrypting it,
2.  have as many passphrases as you would like to configure,
3.  use n-factor authentication with arbitrary large n.
Also, GBDE has a number of serious drawbacks.  All of which would
be show-stoppers if I were considering using it for serious security
work, or even use in a production environment.
There is no protection _at_all_ against dictionary attacks.  Where
CGD uses PKCS#5 in a completely standard way to frustrate dictionary
attacks, GBDE does exactly nothing.  In fact, worse than nothing.
It is possible to conduct half of the dictionary attack offline,
so the actual online portion of the attack is something that my
laptop could make about 2^30 guesses in a couple of hours.  So, it
is insecure from the start.
Well, I think that this is quite a minor item, since GBDE doesn't govern 
the transformation of the passphrase into the actual key, so another 
scheme more bullet-proof against dictionary attacks (PKCS#5 or any other) 
can be developed in virtually no time at all and will require making 
only minor changes to the userland utility which gets the password from 
the keyboard or command line, hashes it and feeds it to the kernel.

GBDE has no facility for using different encryption algorithms than
the rather...  interesting one that it comes with.  There is no
way to trade speed and security for different use cases, and the
only algorithm that it comes with is very slow.  Less than half
the performance of CGD's most secure algorithm (AES256).
Well, it's hard to comment on this, since the only paper that I have 
found on CGD is http://www.imrryr.org/~elric/cgd/cgd.pdf, which 
unfortunately doesn't provide any details on how CGD encrypts data and 
lays it out on disk.

So, now that we've touched on the security problems...  Let's think
about using GBDE in production.  Please reference
http://phk.freebsd.dk/pubs/bsdcon-03.gbde.paper.pdf
And read Section 7.5, and refer to figure 2.
Each disk write involves two writes to the disk.  Where is the
journal?  I do not see any talk about a journal in the paper, or
the GBDE source code.  Hence, if the OS crashes or if a removable
disk is removed at the wrong time, etc. etc. it is possible that
only one of those writes would succeed.  I think that we can all
see where this is going.
So what? If the write fails in the middle, reading the sector will just 
produce garbage. I don't think that it's different from a plain old HDD 
which has been powered down in the middle of a disk write. The disk 
encryption layer is definitely not the level at which journaling should 
be implemented; that is the task of the file system. The task of the 
encryption layer is merely to inform the file system when a transaction 
(i.e. both of those two writes in this case) has been completed 
successfully, so that the FS can adjust its journal accordingly.

-Maxim


Sub-optimal libc's read-ahead buffering behaviour

2005-08-04 Thread Maxim Sobolev

Hi,

I have found a scenario in which our libc behaves utterly 
suboptimally. Consider the following piece of code, which reads and 
processes every other 512-byte block in a file (error handling 
intentionally omitted):


FILE *f;
int i;
char buf[512];

f = fopen(...);
for (i = 0; feof(f) == 0; i++) {
  fread(buf, sizeof(buf), 1, f);
  do_process(buf);
  fseek(f, i * 2 * sizeof(buf), SEEK_SET);
}

What I have discovered in this case is that libc reads 4096 bytes from 
the file for *each* fread(3) call, despite the fact that it could do 
only one actual read(2) for every fourth fread(3) and satisfy the rest 
from the internal buffer (4096 bytes). However, if I replace fseek(3) 
with just another dummy fread(3) everything works as expected - libc 
does only one read for every 8 fread(3) calls (4 dummy and 4 real).
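
The dummy-read variant can be sketched as a small helper that reads two 
blocks per fread(3) call and keeps only the first one, so stdio never 
has to seek. This is just an illustration of the idea; the helper name 
read_even_blocks() and the BLKSZ constant are invented for the example.

```c
#include <stdio.h>
#include <string.h>

#define BLKSZ 512

/*
 * Copy every other BLKSZ-byte block of f (blocks 0, 2, 4, ...) into out,
 * up to maxblocks blocks. Each fread(3) pulls in a pair of blocks and the
 * second one is simply discarded, so no fseek(3) is needed.
 * Returns the number of blocks copied.
 */
static size_t
read_even_blocks(FILE *f, unsigned char *out, size_t maxblocks)
{
	unsigned char pair[2 * BLKSZ];
	size_t n, got = 0;

	while (got < maxblocks &&
	    (n = fread(pair, 1, sizeof(pair), f)) >= BLKSZ) {
		memcpy(out + got * BLKSZ, pair, BLKSZ);
		got++;
	}
	return got;
}
```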


Is it something which should be fixed or are there some subtle reasons 
for the current behaviour?


Following is piece of code which illustrates the problem:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
FILE *f;
int i;
char buf[512];

f = fopen("/dev/zero", "r");
for (i = 0; i < 16; i++) {
fread(buf, sizeof(buf), 1, f);
if (argc == 1)
fread(buf, sizeof(buf), 1, f);
else
fseek(f, i * 2 * sizeof(buf), SEEK_SET);
}
exit(0);
}

When run with zero arguments relevant truss output looks like:

open("/dev/zero",0x0,0666)   = 3 (0x3)
fstat(3,0xbfbfe900)  = 0 (0x0)
readlink("/etc/malloc.conf",0xbfbfe8c0,63)   ERR#2 'No such file or directory'
issetugid()  = 0 (0x0)
mmap(0x0,4096,(0x3)PROT_READ|PROT_WRITE,(0x1002)MAP_ANON|MAP_PRIVATE,-1,0x0) = 1209335808 (0x48150000)
break(0x804b000) = 0 (0x0)
break(0x804c000) = 0 (0x0)
ioctl(3,TIOCGETA,0xbfbfe940) ERR#19 'Operation not supported by device'

read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
exit(0x0)

While when I am specifying some argument it becomes:

open("/dev/zero",0x0,0666)   = 3 (0x3)
fstat(3,0xbfbfe900)  = 0 (0x0)
readlink("/etc/malloc.conf",0xbfbfe8c0,63)   ERR#2 'No such file or directory'
issetugid()  = 0 (0x0)
mmap(0x0,4096,(0x3)PROT_READ|PROT_WRITE,(0x1002)MAP_ANON|MAP_PRIVATE,-1,0x0) = 1209335808 (0x48150000)
break(0x804b000) = 0 (0x0)
break(0x804c000) = 0 (0x0)
ioctl(3,TIOCGETA,0xbfbfe940) ERR#19 'Operation not supported by device'

read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x0,SEEK_SET)= 0 (0x0)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x400,SEEK_SET)  = 1024 (0x400)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x800,SEEK_SET)  = 2048 (0x800)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0xc00,SEEK_SET)  = 3072 (0xc00)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x1000,SEEK_SET) = 4096 (0x1000)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x1400,SEEK_SET) = 5120 (0x1400)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x1800,SEEK_SET) = 6144 (0x1800)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x1c00,SEEK_SET) = 7168 (0x1c00)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x2000,SEEK_SET) = 8192 (0x2000)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x2400,SEEK_SET) = 9216 (0x2400)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x2800,SEEK_SET) = 10240 (0x2800)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x2c00,SEEK_SET) = 11264 (0x2c00)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x3000,SEEK_SET) = 12288 (0x3000)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x3400,SEEK_SET) = 13312 (0x3400)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x3800,SEEK_SET) = 14336 (0x3800)
read(0x3,0x804b000,0x1000)   = 4096 (0x1000)
lseek(3,0x3c00,SEEK_SET) = 15360 (0x3c00)
exit(0x0)

The output speaks for itse

Network interrupt after shutdown method has been called [kern/62889]

2005-08-17 Thread Maxim Sobolev

Folks,

We experience a 100% reproducible panic on one of our machines during 
shutdown+power off. We have found that it's caused by an interrupt 
which happens in re(4) after the re_shutdown() method has been called. 
Quick googling reveals that we are not the only ones experiencing this 
problem, and that such a condition sometimes happens as a result of 
interaction with the particular ACPI implementation on shutdown+power 
off.


Some FreeBSD network drivers have been patched to work around the 
problem (i.e. vr(4), see kern/62889), but quick browsing through the 
sources suggests that the majority can still be affected by exactly the 
same problem.


Hence the question:

Who is "guilty"? Can the network driver assume that no interrupt will 
happen after its foo_shutdown() has been called, or can no such 
assumption be made? If it cannot, most of the network drivers have to 
be fixed (usually by turning foo_shutdown() into a wrapper around 
foo_detach() as with vr(4)), while if it can, the cause of this stray 
IRQ should be investigated further and fixed where appropriate.


Any comments/ideas?

-Maxim


Re: cvs commit: ports/devel/ORBit Makefile ports/devel/ORBit/files patch-src::IIOP::giop-msg-buffer.c

2001-10-26 Thread Maxim Sobolev

Ian Dowse wrote:
> 
> In message <[EMAIL PROTECTED]>, Maxim Sobolev writ
> es:
> >  Nautilus from working properly. The problem disappeared when I've replaced
> >  writev(2) call with appropriate loop based around ordinary write(2). Perhaps
> >  this should be investigated and the real source of the problem fixed instead,
> >  but I do not have a time for this right now. For those who interested I'm
> >  ready to provide a step-by step instruction on how to reproduce the bug.
> 
> Hi,
> 
> If you have the details handy, a post to -hackers is likely to be
> quite constructive at getting the problem analysed and resolved.

Ok, details are below.

GNOME oaf is a CORBA-based RPC framework. It uses UNIX
domain sockets to communicate between client application and
oafd daemon that serves requests. Usually the communication
looks like the following:

1. Client connects to the oafd daemon via domain socket and
sends marshalled RPC request.
2. The daemon reads request, demarshalls it and executes
either internally or by invoking external program/shared
library.
3. The daemon marshalls result of the call and passes it
back to the client via the same socket.

In step 3, when marshalling the results of the call, the daemon
creates a large collection of small buffers (usually 5-10
bytes long each) arranged as an array of struct iovec and then
sends this whole array to the client using a writev(2) call.
In my particular case there were some 2,800 entries in the
array, and when the daemon tried to send it to the client
writev(2) returned -1 and set errno to EINVAL,
which confused the server and the client, causing the RPC to
fail.

To check that all buffers are indeed valid I have replaced
writev(2) with a simple loop based around write(2), and the
problem disappeared. See
http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/ORBit/files/patch-src%3a%3aIIOP%3a%3agiop-msg-buffer.c
for details. I suspect that there is some problem associated
with the writev(2)'s handling of EAGAIN (in my
write(2)-based replacement I've observed EAGAIN on some
800th element of the buffer).

If the problem is confirmed, it should be either fixed, or
somehow noted in the manual page.

-Maxim

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: cvs commit: ports/devel/ORBit Makefile ports/devel/ORBit/files patch-src::IIOP::giop-msg-buffer.c

2001-10-26 Thread Maxim Sobolev

Peter Pentchev wrote:
> 
> On Fri, Oct 26, 2001 at 05:49:08PM +0300, Maxim Sobolev wrote:
> > Ian Dowse wrote:
> > >
> > > In message <[EMAIL PROTECTED]>, Maxim Sobolev writ
> > > es:
> > > >  Nautilus from working properly. The problem disappeared when I've replaced
> > > >  writev(2) call with appropriate loop based around ordinary write(2). Perhaps
> > > >  this should be investigated and the real source of the problem fixed instead,
> > > >  but I do not have a time for this right now. For those who interested I'm
> > > >  ready to provide a step-by step instruction on how to reproduce the bug.
> > >
> > > Hi,
> > >
> > > If you have the details handy, a post to -hackers is likely to be
> > > quite constructive at getting the problem analysed and resolved.
> >
> > Ok, details are below.
> >
> > GNOME oaf is a CORBA-based RPC framework. It uses UNIX
> > domain sockets to communicate between client application and
> > oafd daemon that serves requests. Usually the communication
> > looks like the following:
> >
> > 1. Client connects to the oafd daemon via domain socket and
> > sends marshalled RPC request.
> > 2. The daemon reads request, demarshalls it and executes
> > either internally or by invoking external program/shared
> > library.
> > 3. The daemon marshalls result of the call and passes it
> > back to the client via the same socket.
> >
> > On the step 3, when marshalling results of the call, daemon
> > creates a large collection of small buffers (usually 5-10
> > bytes long each) arranged as array of struct iovec and then
> > sends this whole buffer to the client using writev(2) call.
> > In my particular case there were some 2,800 entries in the
> > buffer and when the daemon tried to send it to the client
> > writev(2) was returning -1 and setting errno to be EINVAL,
> > which confused the server and the client causing RPC to
> > fail.
> 
> 2800 entries?  Well, from the writev(2) manual page:
> 
>  In addition, writev() may return one of the following errors:
> 
>  ...
> 
>  [EINVAL]   Iovcnt was less than or equal to 0, or greater than
> UIO_MAXIOV.
> 
> And at least on -stable, UIO_MAXIOV is defined as 1024..

Ah, ok. I've overlooked it somehow.

-Maxim




Re: cvs commit: ports/devel/ORBit Makefile ports/devel/ORBit/files patch-src::IIOP::giop-msg-buffer.c

2001-10-26 Thread Maxim Sobolev

Peter Pentchev wrote:
> 
> On Fri, Oct 26, 2001 at 06:06:59PM +0300, Maxim Sobolev wrote:
> > Peter Pentchev wrote:
> > >
> > > On Fri, Oct 26, 2001 at 05:49:08PM +0300, Maxim Sobolev wrote:
> [snip]
> > > >
> > > > On the step 3, when marshalling results of the call, daemon
> > > > creates a large collection of small buffers (usually 5-10
> > > > bytes long each) arranged as array of struct iovec and then
> > > > sends this whole buffer to the client using writev(2) call.
> > > > In my particular case there were some 2,800 entries in the
> > > > buffer and when the daemon tried to send it to the client
> > > > writev(2) was returning -1 and setting errno to be EINVAL,
> > > > which confused the server and the client causing RPC to
> > > > fail.
> > >
> > > 2800 entries?  Well, from the writev(2) manual page:
> > >
> > >  In addition, writev() may return one of the following errors:
> > >
> > >  ...
> > >
> > >  [EINVAL]   Iovcnt was less than or equal to 0, or greater than
> > > UIO_MAXIOV.
> > >
> > > And at least on -stable, UIO_MAXIOV is defined as 1024..
> >
> > Ah, ok. I've overlooked it somehow.
> 
> So basically, you still want a loop, but it could be a writev(2) loop,
> not a write(2) loop, to keep some of the writev(2) performance benefit.

Yes, I've figured that out already, because doing 2,800 syscalls
when you can do 3 instead is a bad idea. :)
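
A writev(2) loop of the kind discussed above might look roughly like
this. It is only a sketch, not the code from the ORBit port patch: the
name writev_chunked() is made up, and the fallback value for IOV_MAX is
an assumption taken from the UIO_MAXIOV figure quoted earlier.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

#ifndef IOV_MAX
#define IOV_MAX 1024	/* assumption: the limit quoted in this thread */
#endif

/*
 * Write all iovcnt entries of iov to fd, passing at most IOV_MAX
 * entries to each writev(2) call. The iov array is modified in place
 * when a write is partial. Returns 0 on success, or -1 with errno set
 * by writev(2) (including EAGAIN on a non-blocking descriptor).
 */
static int
writev_chunked(int fd, struct iovec *iov, int iovcnt)
{
	while (iovcnt > 0) {
		int n = iovcnt < IOV_MAX ? iovcnt : IOV_MAX;
		ssize_t w = writev(fd, iov, n);

		if (w < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* Skip the entries that were written completely. */
		while (iovcnt > 0 && (size_t)w >= iov->iov_len) {
			w -= iov->iov_len;
			iov++;
			iovcnt--;
		}
		/* Adjust an entry that was written only partially. */
		if (iovcnt > 0 && w > 0) {
			iov->iov_base = (char *)iov->iov_base + w;
			iov->iov_len -= w;
		}
	}
	return 0;
}
```

With 2,800 entries and a limit of 1024 this makes 3 writev(2) calls in
the common case where nothing is written partially.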

-Maxim




kqueue(2) doesn't deliver EV_EOF on pipes [patch]

2001-11-12 Thread Maxim Sobolev

Hi,

I've noticed that kqueue(2) doesn't notify the reader about the EV_EOF
condition on a pipe. The attached simple test program highlights
the problem (confirmed on both 5-CURRENT and 4-STABLE). Also
attached is a simple fix.

-Maxim


Index: sys/kern/sys_pipe.c
===
RCS file: /home/ncvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.86
diff -d -u -r1.86 sys_pipe.c
--- sys/kern/sys_pipe.c 2001/09/21 22:46:53 1.86
+++ sys/kern/sys_pipe.c 2001/11/12 13:28:05
@@ -1221,6 +1221,7 @@
 
ppipe->pipe_state |= PIPE_EOF;
wakeup(ppipe);
+   KNOTE(&ppipe->pipe_sel.si_note, 0);
ppipe->pipe_peer = NULL;
}
/*


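#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>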

void
testpassed(int sig)
{
printf("Test passed\n");
exit(0);
}

int
main(int argc, char **argv)
{
int kq, pid, ppid, nevents;
struct kevent changelist[1];
struct kevent eventlist[1];
int pp[2];

pipe(pp);
ppid = getpid();
pid = fork();

switch (pid) {
case -1:
/* Error */
err(1, "can't fork()");
/* NOTREACHED */

case 0:
/* Child */
close(pp[1]);
kq = kqueue();
EV_SET(changelist, pp[0], EVFILT_READ, EV_ADD | EV_ENABLE | EV_EOF, \
0, 0, NULL);
kevent(kq, changelist, 1, NULL, 0, NULL);
for (;;) {
nevents = kevent(kq, NULL, 0, eventlist, 1, NULL);
if (nevents > 0 || (eventlist[0].flags & EV_EOF) != 0) {
kill(ppid, SIGTERM);
exit(0);
}
}
break;

default:
/* Sever */
close(pp[0]);
break;
}
signal(SIGTERM, testpassed);
/* Give child some time to initialise kqueue(2) */
sleep(1);
close(pp[1]);
/* Give child some time to receive EV_EOF and kill us */
sleep(1);
kill(pid, SIGTERM);
printf("Test failed\n");
exit(1);
}



Re: kqueue(2) doesn't deliver EV_EOF on pipes [patch]

2001-11-12 Thread Maxim Sobolev

>   if (nevents > 0 || (eventlist[0].flags & EV_EOF) != 0) {
^^
OOPS, last minute bug. Should be `&&' instead, but it doesn't affect
outcome of the test.

-Maxim




Using bit 21 of EFLAGS in user-mode [was: Re: sigreturn: eflags creash (fixed!)]

2001-11-15 Thread Maxim Sobolev

On Thu, 15 Nov 2001 14:56:31 -0500 (EST), Joe Clarke wrote:
> 
> I learned about this by reading through some of the -hackers archives.
> One person complained of similar errors trying to get xine to work on
> FreeBSD.  Removing the MMX detection code fixed it.  I remembered libpng
> also used MMX, so I removed the pnggccrd.c source, and voila!
> 
> Based on core dumps, strace output, and a lot of code surfing, this makes
> sense to me.  Basically, any png-dependent app's thread that runs longer
> than what ITIMER_PROF is set to gets hit with a SIGPROF.  When that
> happens, things context switch.  eflags must have been corrupted by the
> MMX code, thus sigreturn() bombs out, and causes uthread_kern to die as
> well.  Here's what strace looks like when balsa tries to read a 33 MB
> mailbox:
> 
> 74202 sigreturn(0x81f2c64
> 
> When this happens, strace politely dies with a bus error.
> 
> Thanks for testing this, Maxim.  Hopefully someone can find the problem
> and fix it for good.

That explains it... After a quick glance at the png code I found that
the only place where EFLAGS is altered is the CPUID code, where
the library flips bit 21 of EFLAGS in order to ensure that the
CPUID instruction is supported (otherwise it will get SIGILL
on older processors). Unfortunately, for some reason the FreeBSD
kernel considers bit 21 of EFLAGS as one that should not be
altered in user mode, thus making it illegal to use standard
user-mode processor-detection routines based around that bit.
AFAIK, it is a bug in FreeBSD, because there is actually nothing
wrong with altering bit 21 in user mode - it doesn't have
any side effects and pretty much every OS currently available
on the i386 allows it.
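
To make the failure mode concrete, here is a simplified userland model
of such a check, assuming the kernel compares the saved and restored
EFLAGS against a mask of user-changeable bits. The flag bit values are
the architectural ones, but the names FL_*, USERCHANGE_* and
eflags_acceptable() are invented for this sketch and the mask is
deliberately incomplete; it is not copied from psl.h or machdep.c.

```c
#include <stdint.h>

/* x86 EFLAGS bits (architectural values). */
#define FL_C	0x00000001u	/* carry */
#define FL_PF	0x00000004u	/* parity */
#define FL_AF	0x00000010u	/* auxiliary carry */
#define FL_Z	0x00000040u	/* zero */
#define FL_N	0x00000080u	/* sign */
#define FL_D	0x00000400u	/* direction */
#define FL_V	0x00000800u	/* overflow */
#define FL_ID	0x00200000u	/* bit 21, used for CPUID detection */

/* Hypothetical masks of the bits user mode is allowed to change. */
#define USERCHANGE_STRICT	(FL_C | FL_PF | FL_AF | FL_Z | FL_N | FL_D | FL_V)
#define USERCHANGE_RELAXED	(USERCHANGE_STRICT | FL_ID)

/* Returns 1 if new_fl differs from old_fl only in bits permitted by mask. */
static int
eflags_acceptable(uint32_t new_fl, uint32_t old_fl, uint32_t mask)
{
	return ((new_fl ^ old_fl) & ~mask) == 0;
}
```

With the strict mask a context whose only difference is a flipped bit 21
is rejected, while the relaxed mask accepts it.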

Therefore, I would like to ask you to test the attached patch, and
if it works and there are no other objections I would like to
commit it shortly. To test the patch, you need to recompile the
kernel with the patch applied, reboot, recompile/reinstall png with
MMX support turned on and try to run Nautilus. Please let me know
whether it helped or not.

Thanks!

-Maxim


psl.h.diff
Description: Binary data


Re: Using bit 21 of EFLAGS in user-mode [was: Re: sigreturn: eflags creash (fixed!)]

2001-11-15 Thread Maxim Sobolev

On Thu, 15 Nov 2001 17:41:32 -0500 (EST), Daniel Eischen wrote:
> On Thu, 15 Nov 2001, Maxim Sobolev wrote:
> > On Thu, 15 Nov 2001 14:56:31 -0500 (EST), Joe Clarke wrote:
> > > 
> > > I learned about this by reading through some of the -hackers archives.
> > > One person complained of similar errors trying to get xine to work on
> > > FreeBSD.  Removing the MMX detection code fixed it.  I remembered libpng
> > > also used MMX, so I removed the pnggccrd.c source, and voila!
> > > 
> > > Based on core dumps, strace output, and a lot of code surfing, this makes
> > > sense to me.  Basically, any png-dependent app's thread that runs longer
> > > than what ITIMER_PROF is set to gets hit with a SIGPROF.  When that
> > > happens, things context switch.  eflags must have been corrupted by the
> > > MMX code, thus sigreturn() bombs out, and causes uthread_kern to die as
> > > well.  Here's what strace looks like when balsa tries to read a 33 MB
> > > mailbox:
> > > 
> > > 74202 sigreturn(0x81f2c64
> > > 
> > > When this happens, strace politely dies with a bus error.
> > > 
> > > Thanks for testing this, Maxim.  Hopefully someone can find the problem
> > > and fix it for good.
> > 
> > That explains... After a quick glance at png code I found that
> > the only place where EFLAGS is altered is CPUID code, where
> > the library flips bit 21 of EFLAGS in order to ensure that the
> > CPUID instruction is supported (otherwise it will get SIGILL
> > on older processors). Unfortunately, for some reason FreeBSB
> 
> Does it need to keep bit 21 of EFLAGS flipped, or can libpng
> set it back and keep knowledge that CPUID is supported?  Or
> does that bit need to remain set for CPUID to work?

No, it doesn't need to be in any specific state. The only
knowledge a program gains from bit 21 is that its state
can be changed, which means that the CPUID instruction is
supported. Unfortunately the original libpng doesn't bother to
set the state of the bit back, which exposed this problem.

> If at all possible, a fix should be committed that wouldn't
> necessitate a new kernel be built for -stable.

Yes, I was also thinking about that. I've committed a patch
which restores the state of bit 21 as soon as possible. There
is still a chance that the program will get a signal during
that time, but this chance is rather slim. The "unsafe" piece
of code in question looks like:

popfl   <-load eflags with bit 21 flipped
pushfl  <-save resulting eflags
+   popl %%eax  <-load resulting eflags into eax
+   pushl %%ecx <-save original eflags
popfl   <-restore original eflags

Of course, it is possible to either mask all signals during
the detection period, or rip out the detection code based around
eflags and replace it with a SIGILL handler, but this will
eat into the speed improvement from the MMX optimisations
because of the additional overhead of the syscall needed
to set up the signal handler or signal mask.
In any case, tomorrow I will test this workaround
extensively, and if it appears that it is not sufficient
to prevent `sigreturn: eflags...' errors, then I'll just
disable the MMX code in libpng.

-Maxim




Re: closing down the squid22/23 ports?

2001-09-28 Thread Maxim Sobolev

Adrian Chadd wrote:

> Hi all,
>
> Pardon the cross-posting. :-)
>
> I'd like to look at closing down / making inactive the squid22 and
> squid23 ports. The squid-2.2 and squid-2.3 codebases have been
> inactive and largely unsupported by the squid developers (read: myself
> inclusive here) for some time now, and I'd like to point users
> at the actively developed/maintained squid branch.
>
> Squid-2.5 is also in the pipeline for release soon, and I don't think
> there is a point in having 4 squid ports.
>
> What do people think?
>
> (please CC me, I'm currently not on the ports/hackers list
> for various time-related reasons..)

I'm pretty sure that squid22 could be safely killed, but perhaps it would make
sense to keep squid23 around for some more time, because many production
systems out there still use it.

-Maxim





Re: evolution & sigreturn: eflags = 0x246

2001-12-07 Thread Maxim Sobolev

"Jacques A. Vidrine" wrote:
> 
> Hi,
> 
> I decided to give Evolution a try.  It seems that with large mail
> folder (via Maildir or IMAP), the mail component dies (signal 6).  I
> notice the following:
> 
>   Dec  7 09:24:56 madman /kernel: sigreturn: eflags = 0x246
>   Dec  7 09:24:56 madman /kernel: pid 56881 (evolution-mail), uid 1001: exited on signal 6
> 
> The sigreturn message is generated inside of `sigreturn', around
> line 947 on sys./i386/i386/machdep.c (in 4.4-RELEASE).  This code is
> unfamiliar to me, but I suspect that there is a bug in Evolution's
> signal handling that is causing corruption of the signal context.  I
> thought I'd ask for a second opinion before trying to track it down.
> 
> Has anyone else seen this with Evolution, or something similar with
> another application?

Apply the following patch and rebuild/reinstall your kernel:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/include/psl.h.diff?r1=1.10&r2=1.11

-Maxim




Re: [SUGGESTION] - JFS for FreeBSD

2001-12-11 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Greg Lehey wrote:
> > Since then, it has become possible for the loader to load modules
> > before booting the kernel.  This means that, theoretically, it would
> > be possible to have a JFS root file system.  Given the strong
> > opposition to the GPL in some factions of the FreeBSD project, I don't
> > see this happening any time soon, especially since we still don't know
> > if it will buy us anything.
> 
> ?
> 
> OK, I load the kernel from the JFS.  I mount the root FS, which
> is a JFS.  I read the module "jfs.ko" from the JFS so that I can
> mount the root FS, which is a JFS, so I can read the module "jfs.ko"
> from the JFS so that I can mount the root FS, which is a JFS, so I
> can read the module "jfs.ko" from the JFS so that I can mount the
> root FS, which is a JFS, so I can...
> 
> Do you see the problem yet?

Libstand (and hence the loader) could be extended to allow reading
files from jfs without using any GPL'ed code. For example our loader
can load modules from the FAT even though we do not have any M$ code.
:) Alternatively, /boot could be placed on separate filesystem, which
could be ufs or anything else supported by the loader.

-Maxim

> > >> It is used on IBM MainFrames and Enterprise servers
> > >> for high performance and maximum throughput...
> > >
> > > No, it's not.  The Linux JFS is derived from the OS/2 JFS code, not
> > > the good AIX JFS code.
> >
> > That's correct, but note that AIX is moving to this code base too, so
> > it's not as if it's second-rate.  From what I've seen of the
> > structures, JFS2 is *much* better than JFS1.  I haven't compared
> > performance.
> 
> None of the Web Connections RS/6000 machines ran this OS/2 derived
> code.  I was under the impression that it was there for Linux
> compatability.  My impression is, layout or not, the original JFS
> is much better code, overall.
> 
> -- Terry
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-current" in the body of the message




Re: what is PSEUDOFS?

2001-12-06 Thread Maxim Sobolev

Hiten Pandya wrote:
> 
> hi all...
> i would like to know if possible what is PSEUDOFS...
> cause i forgot to update my kernel configuration file,
> regarding the message in the UPDATING section...
> 
> i know what DEVFS is... after the lecture at the
> BSDCon
> 2001 Europe by phk

As far as I know, PSEUDOFS is a kernel infrastructure that simplifies
creation of fully-synthetic filesystems (a la procfs, or linprocfs).
It should decrease amount of code duplication among such filesystems.

-Maxim




Linking libc before libc_r into application causes weird problems

2002-02-07 Thread Maxim Sobolev

Hi,

While working on updating the port of Ximian Evolution to the latest released
version I got stuck on a problem - the new version of the application
just hung on startup on my 5-CURRENT box. After a lot of digging and
debugging I found that the source of the problem is that the resulting
application had libc linked in before libc_r, which caused waitpid() in
the ORBit library to hang forever, even though the child process died
almost instantly (I see a zombie in the ps(1) output). When the program was
relinked with the -pthread flag, which seemingly forces the "right" order of
libc/libc_r (libc_r first), the problem disappeared. 

Based on the problematic code in ORBit I have prepared a short testcase
illustrating the problem and am attaching it to this message. The problem
can be exposed by compiling test.c using the following command: 

$ cc test.c -o test -lc -lc_r 

When either of -lc or -lc_r is omitted, or their order is reversed the
problem disappears. The problem doesn't exist on 4-STABLE. 

Any ideas, comments and suggestions are welcome. 

Thanks!

-Maxim




#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
int childpid, exitstatus, itmp;
sigset_t mask, omask;

/* Block SIGCHLD so no one else can wait() on the child before we do. */
sigemptyset(&mask);
sigaddset(&mask, SIGCHLD);
sigprocmask(SIG_BLOCK, &mask, &omask);

childpid = fork();

if(!childpid) {
	int i;

	/* Do something useful */
	sleep(1);

	_exit(0);
}

while ((itmp = waitpid(childpid, &exitstatus, 0)) == -1 && errno == EINTR)
	continue;
sigprocmask (SIG_SETMASK, &omask, NULL);
exit(WEXITSTATUS(exitstatus));
}



Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Maxim Sobolev wrote:
> > $ cc test.c -o test -lc -lc_r
> >
> > When either of -lc or -lc_r is omitted, or their order is reversed the
> > problem disappears. The problem doesn't exist on 4-STABLE.
> >
> > Any ideas, comments and suggestions are welcome.
> 
> Symbols are resolved from libraries in the order in which
> they are specified to the linker.
> 
> So the fix is obvious: specify them in the right order.
> 
> Linux doesn't see this because libc_r is just there for
> the reentrant calls, and their threading uses processes,
> instead of a user space ("quantum conservation") scheduler.

It is not as easy as it seems. -lc may come not from the command
line, but from one of the other libraries the binary is being linked
with. Therefore, in real life resolving this problem can be a little
more tricky, especially with large applications (e.g. Evolution) that
use code from 30+ shared libraries. I think that ld(1) should be
smart enough to reorder libc/libc_r so that libc_r is always linked
before libc.

-Maxim
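
The "first library listed wins" rule quoted above can be illustrated with a toy model. This is a sketch under stated assumptions, not how ld(1) or rtld is implemented: struct toy_lib and first_definition() are hypothetical names, and the symbol lists are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a library: a name plus the symbols it defines. */
struct toy_lib {
	const char *name;
	const char *const *symbols;	/* NULL-terminated list */
};

/*
 * Return the name of the first library, in link order, that defines
 * `symbol`, or NULL if no library in the list defines it.
 */
const char *first_definition(const struct toy_lib *libs, size_t nlibs,
    const char *symbol)
{
	size_t i, j;

	for (i = 0; i < nlibs; i++)
		for (j = 0; libs[i].symbols[j] != NULL; j++)
			if (strcmp(libs[i].symbols[j], symbol) == 0)
				return libs[i].name;
	return NULL;
}
```

With libc listed before libc_r, a lookup of "waitpid" in this model returns "libc"; with the order reversed it returns "libc_r", which mirrors the behaviour reported for the test case.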

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

"M. Warner Losh" wrote:
> 
> In message: <1013147180.73417.2.camel@notebook>
> Maxim Sobolev <[EMAIL PROTECTED]> writes:
> : Based on the problematic code in the ORBit I had prepared short testcase
> : illustrating the problem and attaching it with this message. The problem
> : could be exposed by compiling the test.c using the following command:
> :
> : $ cc test.c -o test -lc -lc_r
> 
> cc test.c -o test -pthread
> 
> If that doesn't work, test.c is broken :-)

Hmm, as far as I understand, in -current -pthread is slowly being
deorbited (replaced with just -lc_r), but this could lead to a problem
when some of the other libraries the binary is being linked with contain
an explicit dependency on libc. I think that ld(1) should be smart enough
to reorder libc/libc_r so that libc_r is always linked before libc.

-Maxim




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

"M. Warner Losh" wrote:
> 
> Confirmed.  test.c appears to work properly when compiled:
> 
> cc -o test test.c -pthread
> ./test
> 
> Generally speaking, if you want to add -lc_r, you are doing things
> incorrectly.  I've done way too much building...  In FreeBSD 3.x you
> did need to do -lc_r, but that was changed to -pthread in 4.0.

And AFAIK it was then changed back to -lc_r in 5.0...

-Maxim




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Stephen Montgomery-Smith wrote:
> 
> "M. Warner Losh" wrote:
> >
> > Confirmed.  test.c appears to work properly when compiled:
> >
> > cc -o test test.c -pthread
> > ./test
> >
> > Generally speaking, if you want to add -lc_r, you are doing things
> > incorrectly.  I've done way too much building...  In FreeBSD 3.x you
> > did need to do -lc_r, but that was changed to -pthread in 4.0.
> >
> > Warner
> >
> 
> According to the man page for gcc, you are supposed to write
> 
> cc -o test test.c -pthread -D_THREAD_SAFE
> 
> or am I misunderstanding something?

In 5.0-CURRENT -pthread was replaced by -lc_r.

-Maxim




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Jason Evans wrote:
> 
> On Fri, Feb 08, 2002 at 07:46:34AM +0200, Maxim Sobolev wrote:
> > Hi,
> >
> > When working on updating port of Ximian Evolution to the latest released
> > version I have stuck to the problem - the new version of application
> > just hanged on startup on my 5-CURRENT box. After lot of digging and
> > debugging I found that the source of the problem is that the resulting
> > application had libc linked in before libc_r, which caused waitpid() in
> > the ORBit library just hang forever, even though child process died
> > almost instantly (I see zombie in the ps(1) output). When program was
> > relinked with -pthread flag, which seemingly forcing "right" order of
> > libc/libc_r (libc_r first) the problem disappeared.
> >
> > Based on the problematic code in the ORBit I had prepared short testcase
> > illustrating the problem and attaching it with this message. The problem
> > could be exposed by compiling the test.c using the following command:
> >
> > $ cc test.c -o test -lc -lc_r
> >
> > When either of -lc or -lc_r is omitted, or their order is reversed the
> > problem disappears. The problem doesn't exist on 4-STABLE.
> >
> > Any ideas, comments and suggestions are welcome.
> 
> IIRC, Dan changed things in -current about six months ago so that -lc_r
> would do the right thing.  Previously (and still in -stable), -pthread was
> necessary in order to prevent libc from being implicitly linked in.
> There's some magic in the compiler front end that prevents libc from being
> implicitly linked in if libc_r is specified.  It may re-order things as
> well, but I'd have to look at the code to verify that.  In any case, don't
> manually specify both, or Bad Things will happen, as you've discovered.

I don't (this was just a test case). In real life it was really unclear
where that libc came from, because there was no -lc on the command
line and none of the shared libraries specified contained an explicit libc
dependency (at least according to ldd(1)).

> It's my hope that we'll be able to use -lpthread by the 5.0 release, which
> is what the standards say should work.  We could have that right now, but
> we've been holding off, since threads may be KSE-based by the 5.0 release.

That would be nice, but we have a real problem at hand. As I said, I
think that ld(1) should be smart enough to reorder libc/libc_r so that
libc_r is always linked before libc. This is clearly not the case
right now. Unfortunately there is no easy way to reproduce this, but
if you have some spare CPU cycles, try removing the explicit -pthread from
ports/mail/evolution/Makefile, build the port on -current and run `ldd
/usr/X11R6/bin/evolution'. You will see that libc.so.X precedes
libc_r.so.X, even though -lc wasn't supplied to the linker, while -lc_r
was.

-Maxim




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Maxim Sobolev wrote:
> > That would be nice, but we have a real problem at hand. As I said, I
> > think that ld(1) should be smart enough to reorder libc/libc_r so that
> > libc_r is always linked before libc. This is clearly not the case
> > right now. Unfortunately there is no easy way to reproduce this, but
> > if you have some spare CPU cycles try to remove explicit -pthread from
> > ports/mail/evolution/Makefile, build the port on -current and do `ldd
> > /usr/X11R6/bin/evolution'. You will see that libc.so.X precedes
> > libc_r.so.X, even though -lc wasn't supplied to a linker, while -lc_r
> > was.
> 
> You aren't including the linker lines for the libraries
> specified before the -lc_r (which may themselves be linked
> against libc.so instead of libc_r.so, which is wrong),
> and you aren't including the final link line.
> 
> See the recent patch to ldd to make it work against .so
> libraries (unfortunately, it's only in -current, not yet
> in -stable).

Heh, actually I'm the author of that patch. :)))

-Maxim




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Maxim Sobolev wrote:
> [...]
> > > Symbols are resolved from libraries in the order in which
> > > they are specified to the linker.
> > >
> > > So the fix is obvious: specify them in the right order.
> [...]
> > All not as easy as it seems to be. -lc could come not from the command
> > line, but from one of the other libraries the binary being linked
> > with. Therefore, in real life resolving this problem could be a little
> > more tricky, especially with large applications (e.g. Evolution), that
> > uses code from 30+ shared libraries. I think that ld(1) should be
> > smart enough to reorder libc/libc_r so that libc_r is always linked
> > before libc.
> 
> Excuse me.
> 
> Even assuming it were possible to order libraries so that
> certain libraries were considered "weak" and others were
> considered "strong" by their symbol tagging alone, you can
> *not* fix this where there are two libraries, or a mutual
> precedence order issue.
> 
> How in the heck does it get the X11 libraries linked in the
> correct -lXt -lXext -lX11 order, if not by specifying them
> in the correct order?

When you are linking with shared libraries you do not need to specify
them in the "correct" order, because AFAIK the linker takes care of that
using the dependency information recorded within each shared library.
The correct order is only required for static libraries, which have no
way to record dependency information.

-Maxim

> It's really, really stupid to make an assumption about libc_r
> that you can't even make on Linux with regards to X11/Xext/Xt,
> just because some software had the misfortune to be born on
> the wrong side of the autoconf tracks.
> 
> Code portability is an attribute of the code, not of the
> environment where the code is linked.
> 
> You might as well assume that you are going to reorder the
> dependency graph for template virtual base classes to their
> dependency order instead of their link order for something
> like ACAP (ACAP didn't used to compile with g++ until
> Jeremy Allison and I hacked it into submission, and away
> from bad assumptions, like that one, or certain spacing of
> underscores in declarations).
> 
> -- Terry




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Ruslan Ermilov wrote:
> 
> On Fri, Feb 08, 2002 at 12:48:34PM +0200, Maxim Sobolev wrote:
> > Terry Lambert wrote:
> > >
> > > Maxim Sobolev wrote:
> > > > That would be nice, but we have a real problem at hand. As I said, I
> > > > think that ld(1) should be smart enough to reorder libc/libc_r so that
> > > > libc_r is always linked before libc. This is clearly not the case
> > > > right now. Unfortunately there is no easy way to reproduce this, but
> > > > if you have some spare CPU cycles try to remove explicit -pthread from
> > > > ports/mail/evolution/Makefile, build the port on -current and do `ldd
> > > > /usr/X11R6/bin/evolution'. You will see that libc.so.X precedes
> > > > libc_r.so.X, even though -lc wasn't supplied to a linker, while -lc_r
> > > > was.
> > >
> When you say ld(1), do you perhaps mean rtld-elf.so.1 (aka rtld(1))?
> ld(1) only _links_ when static linkage was requested (which is not the
> case here), or writes dynamic dependencies on shared objects.

No, I meant ld(1). The problem here is that when libc is
recorded before libc_r in the dynamic dependencies list, the resulting
executable may not work correctly (see my test case).

-Maxim
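
The condition described above (libc recorded before libc_r) is easy to check mechanically once the library names are available in recorded order, for example as read off ldd(1) output. A minimal sketch, assuming the names have already been collected into an array; libc_precedes_libc_r() and is_lib() are hypothetical helpers, not existing tools:

```c
#include <stddef.h>
#include <string.h>

/* Does `name` look like "<base>.so", optionally followed by ".N"? */
static int is_lib(const char *name, const char *base)
{
	size_t n = strlen(base);

	return strncmp(name, base, n) == 0 &&
	    strncmp(name + n, ".so", 3) == 0 &&
	    (name[n + 3] == '\0' || name[n + 3] == '.');
}

/*
 * Given library names in recorded order, return 1 if libc appears
 * before libc_r, and 0 otherwise (including when libc_r is absent).
 */
int libc_precedes_libc_r(const char *const *names, size_t n)
{
	size_t i;
	int seen_libc = 0;

	for (i = 0; i < n; i++) {
		if (is_lib(names[i], "libc_r"))
			return seen_libc;
		if (is_lib(names[i], "libc"))
			seen_libc = 1;
	}
	return 0;
}
```

Matching on the ".so" suffix keeps names such as libcrypto.so.2 from being mistaken for libc.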




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Ruslan Ermilov wrote:
> 
> On Fri, Feb 08, 2002 at 07:06:09AM -0800, Terry Lambert wrote:
> > Maxim Sobolev wrote:
> > > No, I meant ld(1). The problem here is that in the case when libc is
> > > recorded before libc_r in dynamic dependencies list the resulting
> > > executable may not work correctly (see my testcase).
> >
> > Patient: "Doctor, it hurts when I record libc before libc_r
> > in the dynamic dependencies list!"
> >
> > Doctor:  [expected response]
> >
> > 8-).
> >
> > Seriously, the "Evolution" build process is seriously
> > broken; it works on Linux because Linux has a simple
> > threads implementation, rather than an efficient one.
> >
> Doctor's Assistant: "No library should ever have an explicit
> dependency on libc".

But no library has it here! libc comes out of the blue just before libc_r
- see the attached script. Perhaps I'm missing something, but I can't
figure out where it comes from. Can you?

-Maxim

Script started on Fri Feb  8 17:41:11 2002
root@notebook# gmake
Making all in glade
gmake[1]: Entering directory `/tmp/portbuild/usr/ports/mail/evolution/work/evolution-1.0.2/shell/glade'
gmake[1]: Nothing to be done for `all'.
gmake[1]: Leaving directory `/tmp/portbuild/usr/ports/mail/evolution/work/evolution-1.0.2/shell/glade'
Making all in importer
gmake[1]: Entering directory `/tmp/portbuild/usr/ports/mail/evolution/work/evolution-1.0.2/shell/importer'
gmake[1]: Nothing to be done for `all'.
gmake[1]: Leaving directory `/tmp/portbuild/usr/ports/mail/evolution/work/evolution-1.0.2/shell/importer'
gmake[1]: Entering directory `/tmp/portbuild/usr/ports/mail/evolution/work/evolution-1.0.2/shell'
/bin/sh ../libtool --mode=link cc  -pipe -O -mpreferred-stack-boundary=2 
-march=pentium -I/usr/X11R6/include -Wall -Wunused  -L/usr/X11R6/lib -o evolution  
e-activity-handler.o e-component-registry.o e-corba-shortcuts.o 
e-corba-storage-registry.o e-corba-storage.o e-folder-type-registry.o e-folder.o 
e-gray-bar.o e-local-folder.o e-local-storage.o e-setup.o e-shell-about-box.o 
e-shell-folder-commands.o e-shell-folder-creation-dialog.o 
e-shell-folder-selection-dialog.o e-shell-folder-title-bar.o e-shell-importer.o 
e-shell-offline-handler.o e-shell-startup-wizard.o 
e-shell-user-creatable-items-handler.o e-shell-utils.o e-shell-view-menu.o 
e-shell-view.o e-shell.o e-shortcuts-view-model.o e-shortcuts-view.o e-shortcuts.o 
e-splash.o e-storage-set-view.o e-storage-set.o e-storage.o e-summary-storage.o 
e-task-bar.o e-task-widget.o e-uri-schema-registry.o evolution-storage-set-view.o 
evolution-storage-set-view-factory.o main.o libeshell.la   
importer/libevolution-importer.la  
 ../widgets/e-timezone-dialog/libetimezonedialog.a   
../widgets/misc/libemiscwidgets.a   ../e-util/libeutil.la  
 ../libical/src/libical/libical-evolution.la -Wl,-E 
-L/usr/X11R6/lib -L/usr/local/lib -lgal -lgnomeprint -lfreetype -lglade-gnome -lglade 
-lxml -lXpm -ljpeg -lgnomeui -lart_lgpl -lgdk_imlib -ltiff -lungif -lpng -lz -lSM 
-lICE -lgnome -lgnomesupport -lesd -laudiofile -lgdk_pixbuf -lgtk12 -lgdk12 
-lgmodule12 -lglib12 -lintl -lXext -lX11 -lm -lgnomecanvaspixbuf -lgiconv -lc_r 
-L/usr/local/lib -lgthread12 -lglib12 -lc_r  
-Wl,-E -L/usr/X11R6/lib -L/usr/local/lib -lgnomeprint -lXpm -ljpeg -lgnomeui 
-lgdk_imlib -ltiff -lungif -lpng -lSM -lICE -lgnome -lgnomesupport -lesd -laudiofile 
-lgdk_pixbuf -lgtk12 -lgdk12 -lgmodule12 -lglib12 -lintl -lXext -lX11 -lart_lgpl -lm 
-lxml -lz -lfreetype -Wl,-E 
-L/usr/X11R6/lib -L/usr/local/lib -lgtkhtml -lgal -lgnomeprint -lfreetype 
-lglade-gnome -lglade -lxml -lXpm -ljpeg -lgnomeui -lart_lgpl -lgdk_imlib -ltiff 
-lungif -lpng -lz -lSM -lICE -lgnome -lgnomesupport -lesd -laudiofile -lgdk_pixbuf 
-lgnomecanvaspixbuf -lgiconv -lgconf-gtk-1 -lgconf-1 -loaf -lORBitCosNaming -lORBit 
-lIIOP -lORBitutil -lwrap -lgtk12 -lgdk12 -lgmodule12 -lglib12 -lintl -lXext -lX11 -lm 
-Wl,-E -L/usr/X11R6/lib 
-L/usr/local/lib -lbonobo_conf -lbonobo -loaf -lORBitCosNaming -lORBit -lIIOP 
-lORBitutil -lwrap -lbonobox -lXpm -ljpeg -lgnomeui -lart_lgpl -lgdk_imlib -ltiff 
-lungif -lpng -lSM -lICE -lgdk_pixbuf -lgtk12 -lgdk12 -lgmodule12 -lXext -lX11 -lxml 
-lz -lgnome -lgnomesupport -lintl -lesd -laudiofile -lm -lglib12   
-Wl,-E -L/usr/X11R6/lib -L/usr/local/lib -lbonobo -loaf 
-lORBitCosNaming -lORBit -lIIOP -lORBitutil -lwrap -lbonobox -lbonobo-print 
-lgnomeprint -lfreetype -lglade-gnome -lglade -lxml -lgdk_pixbuf -lgnomecanvaspixbuf 
-lXpm -ljpeg -lgnomeui -

Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Will Andrews wrote:
> 
> [ cc:-list from hell snipped ]
> 
> On Fri, Feb 08, 2002 at 05:51:26PM +0200, Maxim Sobolev wrote:
> > But no library has it here! libc comes out of blue just before libc_r
> > - see attached script. Perhaps I'm missing something, but I can't
> > figure out where it comes from, could you?
> 
> It is added automatically by ld(1) if -nostdlib is not specified.

BZZZT! Wrong! It should come *after* libc_r, not before it. Please
read the whole thread.

-Maxim




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Ruslan Ermilov wrote:
> > Sorry, but I don't get it.  I can't reproduce it other than specifying
> > -lc explicitly.  For example, -lssh now depends on -lcrypto and -lz, in
> > that order.  Attempting to link a program with -lc_r -lssh gives, in
> > that order:
> >
> >libc_r.so.5 => /usr/lib/libc_r.so.5 (0x28065000)
> >libssh.so.2 => /usr/lib/libssh.so.2 (0x28083000)
> >libc.so.5 => /usr/lib/libc.so.5 (0x280b2000)
> >libcrypto.so.2 => /usr/lib/libcrypto.so.2 (0x28168000)
> >libz.so.2 => /usr/lib/libz.so.2 (0x28223000)
> >
> > The primary dependencies come first, then secondaries.  I can only
> > imagine the situation where libc.so comes before libc_r.so if some
> > library has a (bogus) explicit dependency on libc.so.
> 
> Yes, this is exactly the case: the shared library is linked
> against libc.so.  This is actually legal, and, in some cases,
> desirable.
> 
> In the "Evolution" case, though, it's bogus.

As you can see from my log, no library was explicitly linked with
libc and there was no -lc command line option, but the resulting executable
ended up with libc recorded right before libc_r. Any clues?

-Maxim

> 
> > What does the ldd(1) output in question look like, the full version?
> 
> Heh.  Same question I asked, with ldd information for the
> .so's, too.  8-).
> 
> -- Terry




Re: Linking libc before libc_r into application causes weird problems

2002-02-08 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Maxim Sobolev wrote:
> > But no library has it here! libc comes out of blue just before libc_r
> > - see attached script. Perhaps I'm missing something, but I can't
> > figure out where it comes from, could you?
> 
> What does your patched ldd say about each and every one
> of those .so's you are linking in perhaps being linked
> against libc.so, or linked against something linked
> against something ... linked against something linked
> against libc.so?

It reports the full chain, so that when A.so depends on B.so, while
B.so depends on C.so, but A.so doesn't explicitly depend on C.so, `ldd
A.so' will show both B.so and C.so.

> Are *any* of the object files created with "ld -r"?

I don't think so.

> Which binutils are you using?

$ ld -v
GNU ld version 2.11.2 20010719 [FreeBSD] (with BFD 2.11.2 20010719
[FreeBSD])

-Maxim
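
The "full chain" behaviour described above (A.so needs B.so, B.so needs C.so, so `ldd A.so' lists both) amounts to walking the dependency graph transitively. The sketch below is a toy model, not the ldd(1) patch itself: struct toy_obj and dependency_chain() are hypothetical, and the breadth-first order (direct dependencies first, then theirs) is an assumption based on the "primaries first, then secondaries" remark made elsewhere in this thread.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 8

/* Toy shared object: its name and the names it directly depends on. */
struct toy_obj {
	const char *name;
	const char *needed[MAX_DEPS];	/* NULL-terminated unless full */
};

static const struct toy_obj *find_obj(const struct toy_obj *objs,
    size_t nobjs, const char *name)
{
	size_t i;

	for (i = 0; i < nobjs; i++)
		if (strcmp(objs[i].name, name) == 0)
			return &objs[i];
	return NULL;
}

static int already_listed(const char **out, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(out[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Breadth-first walk from `root`: store each dependency once, direct
 * ones first, in out[] (at most `max` entries) and return the count.
 * The root itself is never listed.
 */
size_t dependency_chain(const struct toy_obj *objs, size_t nobjs,
    const char *root, const char **out, size_t max)
{
	size_t count = 0, next = 0, i;
	const char *cur_name = root;

	for (;;) {
		const struct toy_obj *cur = find_obj(objs, nobjs, cur_name);

		if (cur != NULL) {
			for (i = 0; i < MAX_DEPS && cur->needed[i] != NULL; i++) {
				const char *dep = cur->needed[i];

				if (count < max && strcmp(dep, root) != 0 &&
				    !already_listed(out, count, dep))
					out[count++] = dep;
			}
		}
		if (next == count)	/* nothing left to expand */
			break;
		cur_name = out[next++];
	}
	return count;
}
```

Objects that are not described in objs[] are treated as having no dependencies, and entries beyond `max` are silently dropped in this sketch.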




or not ? [Was: cvs commit: src/include grp.h]

2002-02-25 Thread Maxim Sobolev

"Andrey A. Chernov" wrote:
> 
> On Mon, Feb 25, 2002 at 05:55:48 -0800, Maxim Sobolev wrote:
> > sobomax 2002/02/25 05:55:48 PST
> >
> >   Modified files:
> > include  grp.h
> >   Log:
> >   Backout rev.1.5 - it seems that it's posixly correct that the program
> >   needs to include <sys/types.h> before <grp.h>.
> 
> No, it breaks POSIX compatibility, please back it out.
> 
> <grp.h> is standalone per POSIX and programs WILL treat it as standalone
> for that reason.

Are you sure? I've just heard so many opinions about that and want to
get some clarity before backouting the backout to avoid backouting the
backouted backout later. :)

Please, could anyone confirm or reject the assertion that POSIX doesn't
require <sys/types.h> before <grp.h>?

-Maxim




Extending loader(8) for loading kernels/modules split across several disks

2002-03-05 Thread Maxim Sobolev
error == 0) {
fp->f_loader = i;   /* remember the loader */
*result = fp;
@@ -247,7 +256,7 @@
continue;   /* Unknown to this handler? */
if (error) {
sprintf(command_errbuf, "can't load file '%s': %s",
-   filename, strerror(error));
+   filesv[0], strerror(error));
break;
}
 }
@@ -261,6 +270,7 @@
 struct mod_depend *verinfo;
 struct kernel_module *mp;
 char *dmodname;
+char *filesv[1];
 int error;
 
 md = file_findmetadata(base_file, MODINFOMD_DEPLIST);
@@ -272,7 +282,8 @@
dmodname = (char *)(verinfo + 1);
if (file_findmodule(NULL, dmodname, verinfo) == NULL) {
printf("loading required module '%s'\n", dmodname);
-   error = mod_load(dmodname, verinfo, 0, NULL);
+   filesv[0] = dmodname;
+   error = mod_load(1, filesv, verinfo, 0, NULL);
if (error)
break;
/*
@@ -305,7 +316,7 @@
  * no arguments or anything.
  */
 int
-file_loadraw(char *type, char *name)
+file_loadraw(char *type, int filesc, char *filesv[])
 {
 struct preloaded_file  *fp;
 char   *cp;
@@ -319,16 +330,16 @@
 }
 
 /* locate the file on the load path */
-cp = file_search(name, NULL);
+cp = file_search(filesv[0], NULL);
 if (cp == NULL) {
-   sprintf(command_errbuf, "can't find '%s'", name);
+   sprintf(command_errbuf, "can't find '%s'", filesv[0]);
return(CMD_ERROR);
 }
-name = cp;
+filesv[0] = cp;
 
-if ((fd = open(name, O_RDONLY)) < 0) {
-   sprintf(command_errbuf, "can't open '%s': %s", name, strerror(errno));
-   free(name);
+if ((fd = sopen(filesc, filesv, O_RDONLY)) < 0) {
+   sprintf(command_errbuf, "can't open '%s': %s", filesv[0], strerror(errno));
+   free(filesv[0]);
return(CMD_ERROR);
 }
 
@@ -339,9 +350,9 @@
if (got == 0)   /* end of file */
break;
if (got < 0) {  /* error */
-   sprintf(command_errbuf, "error reading '%s': %s", name, strerror(errno));
-   free(name);
-   close(fd);
+   sprintf(command_errbuf, "error reading '%s': %s", filesv[0], strerror(errno));
+   free(filesv[0]);
+   sclose(fd);
return(CMD_ERROR);
}
laddr += got;
@@ -349,7 +360,7 @@
 
 /* Looks OK so far; create & populate control structure */
 fp = file_alloc();
-fp->f_name = name;
+fp->f_name = filesv[0];
 fp->f_type = strdup(type);
 fp->f_args = NULL;
 fp->f_metadata = NULL;
@@ -362,7 +373,7 @@
 
 /* Add to the list of loaded files */
 file_insert_tail(fp);
-close(fd);
+sclose(fd);
 return(CMD_OK);
 }
 
@@ -372,18 +383,18 @@
  * If module is already loaded just assign new argc/argv.
  */
 int
-mod_load(char *modname, struct mod_depend *verinfo, int argc, char *argv[])
+mod_load(int filesc, char *filesv[], struct mod_depend *verinfo, int argc, char *argv[])
 {
 struct kernel_module   *mp;
 int    err;
 char   *filename;
 
-if (file_havepath(modname)) {
-   printf("Warning: mod_load() called instead of mod_loadkld() for module '%s'\n", modname);
-   return (mod_loadkld(modname, argc, argv));
+if (file_havepath(filesv[0])) {
+   printf("Warning: mod_load() called instead of mod_loadkld() for module '%s'\n", filesv[0]);
+   return (mod_loadkld(filesc, filesv, argc, argv));
 }
 /* see if module is already loaded */
-mp = file_findmodule(NULL, modname, verinfo);
+mp = file_findmodule(NULL, filesv[0], verinfo);
 if (mp) {
 #ifdef moduleargs
if (mp->m_args)
@@ -394,12 +405,12 @@
return (0);
 }
 /* locate file with the module on the search path */
-filename = mod_searchmodule(modname, verinfo);
+filename = mod_searchmodule(filesv[0], verinfo);
 if (filename == NULL) {
-   sprintf(command_errbuf, "can't find '%s'", modname);
+   sprintf(command_errbuf, "can't find '%s'", filesv[0]);
return (ENOENT);
 }
-err = mod_loadkld(filename, argc, argv);
+err = mod_loadkld(filesc, filesv, argc, argv);
 return (err);
 }
 
@@ -408,7 +419,7 @@
  * search path.
  */
 int
-mod_loadkld(const char *kldname, int argc, char *argv[])
+mod_loadkld(int filesc, char *filesv[], int argc, char *argv[])
 {
 struct preloaded_file  *fp, *last_file;
 int        err;
@@ -417,9 +428,9 @@
 /*
  * Get fully qualified KLD name
 

Re: Extending loader(8) for loading kernels/modules split across

2002-03-05 Thread Maxim Sobolev

John Baldwin wrote:
> 
> On 05-Mar-02 Maxim Sobolev wrote:
> > Hi folks,
> >
> > Please review attached patch, which adds long overdue feature to our
> > loader(8), allowing it to load sequence of files as  a single object.
> > This should allow us to lift 1.44M limit on compressed kernel for the
> > installation diskette. Please note, that to use this feature to load
> > gzip-compressed objects you need to split the object first and then
> > compress each piece individually, not compress first and then split
> > already compressed file. Therefore tight fitting of each piece to the
> > 1.44M limit could be a little tricky, but not impossible. Other way
> > around is to use kgzip(8) utility to compress kernel and then split it
> > into pieces 1.44M each.
> >
> > If there are no objections I would like to commit it ASAP, so that our
> > RE team is able to use this feature in the forthcoming 5.0-DP1
> > release.
> >
> > Any feedback is appreciated.
> 
> Looks good to me I guess. :)  Do you have an example loader.conf that can be
> used on the floppies to demonstrate it?

You probably meant loader.rc? Very simple:

load -n3 /kernel /kernel.1 /kernel.2

This will load the kernel from 3 pieces - they could be either /kernel,
/kernel.1 and /kernel.2, or /kernel.gz, /kernel.1.gz and /kernel.2.gz,
or any combination of those. Just as an example I've split the stock
kern.flp image from 4.5-RELEASE into two images - they can be
downloaded from http://people.freebsd.org/~sobomax/kern.flp.bz2 and
http://people.freebsd.org/~sobomax/kern1.flp.bz2.

-Maxim
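
The pieces named on the load line above have to be produced somehow; split(1) would do, but the step can also be sketched in C. This is an illustration only, not part of the patch: split_into_pieces() is a hypothetical helper, the <dst>, <dst>.1, <dst>.2 naming simply mirrors the /kernel, /kernel.1, /kernel.2 example, and any gzip compression of the individual pieces would be a separate step afterwards.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy `src` into pieces of at most piece_size bytes named
 * <dst>, <dst>.1, <dst>.2, ...  Returns the number of pieces
 * written (0 for an empty source), or -1 on error.
 */
int split_into_pieces(const char *src, const char *dst, size_t piece_size)
{
	FILE *in, *out = NULL;
	char name[1024], buf[4096];
	size_t room = 0, want, got;
	int pieces = 0;

	if (piece_size == 0 || (in = fopen(src, "rb")) == NULL)
		return -1;

	for (;;) {
		/* Never read more than fits into the current piece. */
		want = (room == 0) ? piece_size : room;
		if (want > sizeof(buf))
			want = sizeof(buf);
		got = fread(buf, 1, want, in);
		if (got == 0)
			break;

		if (room == 0) {
			/* Current piece is full (or none open yet): start a new one. */
			if (out != NULL && fclose(out) != 0) {
				out = NULL;
				goto fail;
			}
			if (pieces == 0)
				snprintf(name, sizeof(name), "%s", dst);
			else
				snprintf(name, sizeof(name), "%s.%d", dst, pieces);
			if ((out = fopen(name, "wb")) == NULL)
				goto fail;
			pieces++;
			room = piece_size;
		}
		if (fwrite(buf, 1, got, out) != got)
			goto fail;
		room -= got;
	}
	if (ferror(in))
		goto fail;
	if (out != NULL && fclose(out) != 0) {
		out = NULL;
		goto fail;
	}
	fclose(in);
	return pieces;

fail:
	if (out != NULL)
		fclose(out);
	fclose(in);
	return -1;
}
```

The caller picks piece_size to suit the medium; a value somewhat below the floppy capacity would leave room for the other files on the disk.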




Re: Extending loader(8) for loading kernels/modules split across several disks

2002-03-07 Thread Maxim Sobolev

Michael Smith wrote:
> 
> > Please review attached patch, which adds long overdue feature to our
> > loader(8), allowing it to load sequence of files as  a single object.
> 
> I don't like this.  I would much rather see support for 'split' files
> implemented as a stacking filesystem layer like the gzip support, with
> the simple recognition of 'foo.gz.aa' as the first part of a split
> version of 'foo.gz', which in turn is recognised as a compressed version
> of 'foo'.

I am curious: how, in this case, is the layer going to know how many
parts the file contains?


-Maxim




Re: Extending loader(8) for loading kernels/modules split across several disks

2002-03-07 Thread Maxim Sobolev

Michael Smith wrote:
> 
> > > > Please review attached patch, which adds long overdue feature to our
> > > > loader(8), allowing it to load sequence of files as  a single object.
> > >
> > > I don't like this.  I would much rather see support for 'split' files
> > > implemented as a stacking filesystem layer like the gzip support, with
> > > the simple recognition of 'foo.gz.aa' as the first part of a split
> > > version of 'foo.gz', which in turn is recognised as a compressed version
> > > of 'foo'.
> >
> > I am curious how in this case the layer is going to know how many
> > parts the file contains?
> 
> The simple way to do it is to keep asking for more parts until there are
> no more.
> 
> You can take the NetBSD approach of wrapping the file in a multipart tar
> archive.
> 
> Or a more elegant method involves the use of a control file.
> 
> eg. the splitfs code, when asked to open "foo" looks for "foo.split"
> which is a text file containing a list of filenames and media names, eg.
> 
> foo.aa "Kernel floppy 1"
> foo.ab "Kernel floppy 2"
> foo.ac "Kernel and modules floppy"
> 
> For each file segment, the process is:
> 
>  - try to open the file
>  - prompt "please insert the disk labelled '"
>  - try to open the file
>  - return error
> 
> At any rate, my key point is that the splitting should be invisible, and
> *definitely* not pushed up into the loader.

OK, sounds reasonable. I'll try to reimplement the feature this way.

Thank you for the suggestion.

-Maxim
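
The "simple way" quoted above (keep asking for more parts until there are no more) could look roughly like this for the 'foo.gz.aa' naming. It is a sketch under assumptions: split_part_name() and count_split_parts() are hypothetical names, the two-letter aa..zz suffix scheme is taken from split(1), and plain stdio stands in for whatever open routine a libstand layer would actually use.

```c
#include <stdio.h>

/*
 * Build "<base>.<xy>", where xy is the index-th two-letter suffix in
 * split(1) order: aa, ab, ..., az, ba, ..., zz.
 * Returns 0 on success, -1 if index is out of range or buf is too small.
 */
int split_part_name(const char *base, unsigned int index, char *buf,
    size_t bufsz)
{
	int n;

	if (index >= 26 * 26)
		return -1;
	n = snprintf(buf, bufsz, "%s.%c%c", base,
	    (char)('a' + index / 26), (char)('a' + index % 26));
	return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}

/*
 * Count consecutive parts of `base` by opening successive names until
 * one is missing.  Returns the number of parts found, or -1 on error.
 */
int count_split_parts(const char *base)
{
	char name[1024];
	unsigned int i;
	FILE *f;

	for (i = 0; i < 26 * 26; i++) {
		if (split_part_name(base, i, name, sizeof(name)) != 0)
			return -1;
		if ((f = fopen(name, "rb")) == NULL)
			break;		/* first missing part ends the sequence */
		fclose(f);
	}
	return (int)i;
}
```

Probing like this cannot tell "no more parts" from "next disk not inserted yet", which is one reason a control file listing the parts is the more robust option.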




Re: Extending loader(8) for loading kernels/modules split across several disks

2002-03-15 Thread Maxim Sobolev

Michael Smith wrote:
> 
> > > > Please review attached patch, which adds long overdue feature to our
> > > > loader(8), allowing it to load sequence of files as  a single object.
> > >
> > > I don't like this.  I would much rather see support for 'split' files
> > > implemented as a stacking filesystem layer like the gzip support, with
> > > the simple recognition of 'foo.gz.aa' as the first part of a split
> > > version of 'foo.gz', which in turn is recognised as a compressed version
> > > of 'foo'.
> >
> > I am curious how in this case the layer is going to know how many
> > parts the file contains?
> 
> The simple way to do it is to keep asking for more parts until there are
> no more.
> 
> You can take the NetBSD approach of wrapping the file in a multipart tar
> archive.
> 
> Or a more elegant method involves the use of a control file.
> 
> eg. the splitfs code, when asked to open "foo" looks for "foo.split"
> which is a text file containing a list of filenames and media names, eg.
> 
> foo.aa "Kernel floppy 1"
> foo.ab "Kernel floppy 2"
> foo.ac "Kernel and modules floppy"
> 
> For each file segment, the process is:
> 
>  - try to open the file
>  - prompt "please insert the disk labelled '"
>  - try to open the file
>  - return error
> 
> At any rate, my key point is that the splitting should be invisible, and
> *definitely* not pushed up into the loader.

OK, attached is the patch, which does exactly what was described. Please
review it, and if there are no objections I would like to commit it
shortly, so that our re@ team is able to consider it for the
forthcoming 5.0-DP1 release.

Thanks!

-Maxim
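
For readers who want to experiment with the control-file idea quoted above, parsing one line of the form `foo.aa "Kernel floppy 1"' might look like the following. This is a sketch based only on the example lines in the quoted message; parse_split_line() is a hypothetical helper and is not taken from the splitfs.c in the patch below.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse one control-file line of the form:   filename "media label"
 * Copies the two fields into `file` and `label`.
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
int parse_split_line(const char *line, char *file, size_t filesz,
    char *label, size_t labelsz)
{
	const char *p = line, *start, *end;
	size_t len;

	/* First field: the file name, delimited by whitespace. */
	while (isspace((unsigned char)*p))
		p++;
	start = p;
	while (*p != '\0' && !isspace((unsigned char)*p))
		p++;
	len = (size_t)(p - start);
	if (len == 0 || len >= filesz)
		return -1;
	memcpy(file, start, len);
	file[len] = '\0';

	/* Second field: the media label, enclosed in double quotes. */
	while (isspace((unsigned char)*p))
		p++;
	if (*p != '"')
		return -1;
	start = ++p;
	end = strchr(start, '"');
	if (end == NULL)
		return -1;
	len = (size_t)(end - start);
	if (len >= labelsz)
		return -1;
	memcpy(label, start, len);
	label[len] = '\0';
	return 0;
}
```

The label is what a loader would show in its "please insert the disk labelled ..." prompt before retrying the open.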

Index: src/lib/libstand/Makefile
===
RCS file: /home/ncvs/src/lib/libstand/Makefile,v
retrieving revision 1.27
diff -d -u -r1.27 Makefile
--- src/lib/libstand/Makefile   27 Feb 2002 17:15:37 -  1.27
+++ src/lib/libstand/Makefile   15 Mar 2002 08:40:31 -
@@ -153,6 +153,7 @@
 SRCS+= ufs.c nfs.c cd9660.c tftp.c zipfs.c bzipfs.c
 SRCS+= netif.c nfs.c
 SRCS+= dosfs.c ext2fs.c
+SRCS+= splitfs.c
 
 beforeinstall:
${INSTALL} -C -o ${BINOWN} -g ${BINGRP} -m 444 ${.CURDIR}/stand.h \
Index: src/lib/libstand/bzipfs.c
===
RCS file: /home/ncvs/src/lib/libstand/bzipfs.c,v
retrieving revision 1.3
diff -d -u -r1.3 bzipfs.c
--- src/lib/libstand/bzipfs.c   1 Feb 2002 16:33:40 -   1.3
+++ src/lib/libstand/bzipfs.c   15 Mar 2002 08:40:31 -
@@ -150,7 +150,7 @@
 
 /* If the name already ends in .gz or .bz2, ignore it */
 if ((cp = strrchr(fname, '.')) && (!strcmp(cp, ".gz")
-   || !strcmp(cp, ".bz2")))
+   || !strcmp(cp, ".bz2") || !strcmp(cp, ".split")))
return(ENOENT);
 
 /* Construct new name */
Index: src/lib/libstand/splitfs.c
===
RCS file: src/lib/libstand/splitfs.c
diff -N src/lib/libstand/splitfs.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ src/lib/libstand/splitfs.c  15 Mar 2002 08:40:31 -
@@ -0,0 +1,287 @@
+/* 
+ * Copyright (c) 2002 Maxim Sobolev
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include 
+__FBSDID("$FreeBSD$");
+
+#include "stand.h"
+
+#define NTRIES (3)
+#define CONF_BUF   (512)
+#defin

Is there any co-operation between KSE and similar effort in NetBSD?

2002-06-11 Thread Maxim Sobolev

Folks,

I wonder if there is any co-operation between our KSE and the similar
effort in NetBSD (see
http://web.mit.edu/nathanw/www/usenix/freenix-sa/freenix-sa.html). To
me it sounds logical to unite efforts, if not on the kernel code, then
at least on the kernel interfaces and the userland library.

-Maxim

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Thread-safe resolver [patches for review]

2002-08-12 Thread Maxim Sobolev

Folks,

Attached please find two patches based on PR bin/29581 to make the
FreeBSD resolver thread-safe. They represent two approaches to reaching
this goal: the first is to introduce reentrant versions of the standard
gethostbyXXX(3) APIs, similar to the ones existing in other Unices, and
the second is to make gethostbyXXX(3) return data placed in per-thread
storage when linked with libc_r. I like the latter approach more, since
it doesn't introduce new non-standard APIs.
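
The per-thread storage idea can be sketched roughly as follows. This is
only an illustration, not the code from the patches: it assumes a C11
compiler and uses _Thread_local in place of the libc_r internals, and
the my_* names are hypothetical stand-ins for __h_errno_accessor() and
h_errno.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for per-thread resolver state: every thread
 * gets its own copy of this variable (C11 _Thread_local is an
 * assumption, the 2002 patches relied on libc_r instead).
 */
static _Thread_local int my_h_errno_storage;

/* Returns the address of the calling thread's private copy. */
int *
my_h_errno_accessor(void)
{
	return (&my_h_errno_storage);
}

/* Existing code keeps reading and writing "my_h_errno" unchanged. */
#define my_h_errno (*my_h_errno_accessor())

/* Small helper that exercises the macro as an lvalue and an rvalue. */
int
demo_set_and_get(int value)
{
	my_h_errno = value;
	return (my_h_errno);
}
```

Since the macro expands to a dereferenced function call, old sources
that assign to the variable compile unmodified, which is the main
attraction of this approach.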

I would like to hear any comments and suggestions on the proposed
patches, as well as opinions about which path to choose.

Thanks!

-Maxim

Index: src/include/netdb.h
===
RCS file: /home/ncvs/src/include/netdb.h,v
retrieving revision 1.24
diff -d -u -r1.24 netdb.h
--- src/include/netdb.h 26 Jun 2002 08:18:42 -  1.24
+++ src/include/netdb.h 10 Aug 2002 10:03:43 -
@@ -82,7 +82,10 @@
 #define_PATH_PROTOCOLS "/etc/protocols"
 #define_PATH_SERVICES  "/etc/services"
 
-extern int h_errno;
+__BEGIN_DECLS
+int * __h_errno_accessor(void);
+__END_DECLS
+#define h_errno (* __h_errno_accessor())
 
 /*
  * Structures returned by network data base library.  All addresses are
@@ -240,6 +243,15 @@
 char   *gai_strerror(int);
 void   setnetgrent(const char *);
 void   setservent(int);
+
+intgethostbyaddr_r(const char *, int, int, struct hostent *,
+char *, int, int *);
+intgethostbyname_r(const char *, struct hostent *,
+char *, int, int *);
+intgethostbyname2_r(const char *, int, struct hostent *,
+ char *, int, int *);
+struct hostent *gethostent_r(struct hostent *, char *, int);
+
 
 /*
  * PRIVATE functions specific to the FreeBSD implementation
Index: src/include/resolv.h
===
RCS file: /home/ncvs/src/include/resolv.h,v
retrieving revision 1.21
diff -d -u -r1.21 resolv.h
--- src/include/resolv.h23 Mar 2002 17:24:53 -  1.21
+++ src/include/resolv.h10 Aug 2002 10:03:43 -
@@ -90,11 +90,16 @@
 #defineMAXDFLSRCH  3   /* # default domain levels to try */
 #defineMAXDNSRCH   6   /* max # domains in search path */
 #defineLOCALDOMAINPARTS2   /* min levels in name that is "local" */
+#defineMAXALIASES  35  /* max # of aliases to return */
+#defineMAXADDRS35  /* max # of addresses to return */
 
 #defineRES_TIMEOUT 5   /* min. seconds between retries */
 #defineMAXRESOLVSORT   10  /* number of net to sort on */
 #defineRES_MAXNDOTS15  /* should reflect bit field size */
 
+#defineCAST_ALIGN(ptr, type) \
+   (char*)(type)ptr < (char*)ptr ? ((type)ptr) + 1 : (type)ptr
+
 struct __res_state {
int retrans;/* retransmition time interval */
int retry;  /* number of times to retransmit */
@@ -198,10 +203,6 @@
char *  humanname;  /* Its fun name, like "mail exchanger" */
 };
 
-extern struct __res_state _res;
-/* for INET6 */
-extern struct __res_state_ext _res_ext;
-
 extern const struct res_sym __p_class_syms[];
 extern const struct res_sym __p_type_syms[];
 
@@ -224,6 +225,7 @@
 #definefp_query__fp_query
 #definefp_nquery   __fp_nquery
 #definehostalias   __hostalias
+#definehostalias_r __hostalias_r
 #defineputlong __putlong
 #defineputshort__putshort
 #definep_class __p_class
@@ -273,6 +275,7 @@
 void   fp_query(const u_char *, FILE *);
 void   fp_nquery(const u_char *, int, FILE *);
 const char *   hostalias(const char *);
+const char *   hostalias_r(const char *, char *, int);
 void   putlong(u_int32_t, u_char *);
 void   putshort(u_int16_t, u_char *);
 const char *   p_class(int);
@@ -315,5 +318,30 @@
 void   res_freeupdrec(ns_updrec *);
 #endif
 __END_DECLS
+
+struct __res_data {
+   int h_errno_res;
+   int s;  /* socket used for communications */
+   int connected : 1;  /* is the socket connected */
+   int vc : 1; /* is the socket a virtual circuit? */
+   int af; /* address family of socket */
+   res_send_qhook Qhook;
+   res_send_rhook Rhook;
+   FILE* hostf;
+   int stayopen;
+   struct __res_state *res;
+   struct __res_state_ext *res_ext;
+};
+
+__BEGIN_DECLS
+u_int16_t _getshort(const u_char *);
+u_int32_t _getlong(const u_char *);
+struct __res_data * __res_data_accessor(void);
+struct __res_state * __res_accessor(void);
+__END_DECLS
+#define _res_data (* __res_data_accessor())
+#define _res (* __res_accessor())
+/* for INET6 */

Re: Thread-safe resolver [patches for review]

2002-08-12 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Maxim Sobolev wrote:
> > Attached please find two patches based on PR bin/29581 to make the
> > FreeBSD resolver thread-safe. They represent two approaches to reaching
> > this goal: the first is to introduce reentrant versions of the standard
> > gethostbyXXX(3) APIs, similar to the ones existing in other Unices, and
> > the second is to make gethostbyXXX(3) return data placed in per-thread
> > storage when linked with libc_r. I like the latter approach more, since
> > it doesn't introduce new non-standard APIs.
> >
> > I would like to hear any comments and suggestions on the proposed
> > patches, as well as opinions about which path to choose.
> 
> 1)  Allocate the per thread storage as a single blob, and
> set the pointers into it, instead of using separate
> allocations.  This will have the side effect of letting
> you free it, all at once, and will tend to make it
> faster on each first use per thread, anyway.  You can
> do this by making a meta structure containing the list
> of structures to be allocated, and then setting the
> pointers to the addresses of the structure subelements.

Ok, I'll do it.
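
A minimal sketch of what such a single-blob layout could look like. The
sk_* names and the buffer size are hypothetical, only the 35-entry
limits mirror MAXALIASES and MAXADDRS from the patch.

```c
#include <netdb.h>
#include <stdlib.h>

#define SK_MAXALIASES	35
#define SK_MAXADDRS	35
#define SK_BUFSIZE	1024

/* Hypothetical meta structure: all per-thread pieces in one blob. */
struct sk_res_blob {
	struct hostent	 host;
	char		*aliases[SK_MAXALIASES];
	char		*addrs[SK_MAXADDRS];
	char		 buf[SK_BUFSIZE];
};

/* One allocation on first use in a thread. */
struct sk_res_blob *
sk_res_blob_alloc(void)
{
	struct sk_res_blob *b;

	b = calloc(1, sizeof(*b));
	if (b == NULL)
		return (NULL);
	/* Point the hostent members at sub-elements of the same blob. */
	b->host.h_name = b->buf;
	b->host.h_aliases = b->aliases;
	b->host.h_addr_list = b->addrs;
	return (b);
}

/* One free() releases everything at thread exit. */
void
sk_res_blob_free(struct sk_res_blob *b)
{
	free(b);
}
```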

> 2)  Note somewhere in the man page that this makes it so
> you can not pass the results off to another thread by
> reference, unless you copy them once there (i.e. you
> are not allowed persistent references across threads).
> It seems to me the most likely use would be to permit
> a separate thread (or threads) to be used to resolve
> concurrently, and/or with other operations.  The upshot
> of this is that holding a reference would mean that you
> could not initiate another lookup on the lookup worker
> thread(s) until the reference was freed.

Yup, I'll do it as well.
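
For the man page note, the copy that a caller would need before handing
a result to another thread might look like the sketch below.
sk_hostent_dup() is a hypothetical helper, not part of the patches; it
deep-copies a struct hostent into one malloc()ed block that the
receiving thread owns.

```c
#include <netdb.h>
#include <stdlib.h>
#include <string.h>

/* Count the entries of a NULL-terminated pointer array. */
static size_t
sk_count(char **v)
{
	size_t n = 0;

	while (v != NULL && v[n] != NULL)
		n++;
	return (n);
}

/* Deep-copy *src; the caller releases the result with one free(). */
struct hostent *
sk_hostent_dup(const struct hostent *src)
{
	size_t nal = sk_count(src->h_aliases);
	size_t nad = sk_count(src->h_addr_list);
	size_t len = (size_t)src->h_length;
	size_t size, i;
	struct hostent *dst;
	char **vec, *p;

	/* struct + two NULL-terminated arrays + addresses + strings */
	size = sizeof(*dst) + (nal + nad + 2) * sizeof(char *) + nad * len +
	    strlen(src->h_name) + 1;
	for (i = 0; i < nal; i++)
		size += strlen(src->h_aliases[i]) + 1;
	if ((dst = malloc(size)) == NULL)
		return (NULL);

	vec = (char **)(dst + 1);		/* pointer arrays first */
	p = (char *)(vec + nal + nad + 2);	/* then the byte data */

	dst->h_addrtype = src->h_addrtype;
	dst->h_length = src->h_length;

	dst->h_addr_list = vec;
	for (i = 0; i < nad; i++) {
		vec[i] = memcpy(p, src->h_addr_list[i], len);
		p += len;
	}
	vec[nad] = NULL;

	dst->h_aliases = vec + nad + 1;
	for (i = 0; i < nal; i++) {
		dst->h_aliases[i] = strcpy(p, src->h_aliases[i]);
		p += strlen(p) + 1;
	}
	dst->h_aliases[nal] = NULL;

	dst->h_name = strcpy(p, src->h_name);
	return (dst);
}
```

The copy stays valid even after the originating thread starts its next
lookup and overwrites its own per-thread storage.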

> You may also want to consider the use of a .init and .fini
> section for the code, to permit the creation of an initial
> lookup context chunk; this is kind of a tradeoff, but it will
> mean that a server will not have to do the recheck each time.
> The .fini section would allow auto-cleanup.  This may be a
> necessity for a long running program that uses a shared object
> to perform the thread creation and lookup (you could leak
> memory, otherwise).

Could you please elaborate on how exactly memory could be leaked in
this case, if the program correctly shuts down all its threads?

I also would like to hear from you whether or not you think that we
need all those gethostbyXXX_r(3) functions.

-Maxim




Re: Thread-safe resolver [patches for review]

2002-08-12 Thread Maxim Sobolev

Daniel Eischen wrote:
> 
> On Mon, 12 Aug 2002, Maxim Sobolev wrote:
> > Folks,
> >
> > Attached please find two patches based on PR bin/29581 to make the
> > FreeBSD resolver thread-safe. They represent two approaches to reaching
> > this goal: the first is to introduce reentrant versions of the standard
> > gethostbyXXX(3) APIs, similar to the ones existing in other Unices, and
> > the second is to make gethostbyXXX(3) return data placed in per-thread
> > storage when linked with libc_r. I like the latter approach more, since
> > it doesn't introduce new non-standard APIs.
> >
> > I would like to hear any comments and suggestions on the proposed
> > patches, as well as opinions about which path to choose.
> 
> Why do you need uthread_resolv.c?  You should be able to thread
> calls by checking __isthreaded.  Just keep everything in libc.
> If there are missing stubs for some pthread_* routines (I think
> everything you need is in -current's libc), then add them.

Why do we have uthread_error.c then? Also, it will add a penalty to
every access to the _res_data structure, even in the non-threaded case.

-Maxim




Re: Thread-safe resolver [patches for review]

2002-08-13 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> Maxim Sobolev wrote:
> > > You may also want to consider the use of a .init and .fini
> > > section for the code, to permit the creation of an initial
> > > lookup context chunk; this is kind of a tradeoff, but it will
> > > mean that a server will not have to do the recheck each time.
> > > The .fini section would allow auto-cleanup.  This may be a
> > > necessity for a long running program that uses a shared object
> > > to perform the thread creation and lookup (you could leak
> > > memory, otherwise).
> >
> > Could you please elaborate how exactly memory could be leaked in this
> > case, if the program does correctly shut down all its threads?
> 
> Create PIC object foo.so.
> Link PIC object foo.so against libc.so.
> Call dlopen to load module foo.so into program "bob".
> Call function in foo.so from program "bob".
> Function in foo.so creates two threads, one for IPv4 lookup,
> another for IPv6 lookup to cause lookups to proceed
> concurrently.
> Lookup completes.
> Unload module foo.so.
> -> leak memory in libc.so image

This scenario doesn't look like a legitimate way to do things to me.
Let's inspect what happens when you unload a PIC module that has one
or more threads running. There are two possibilities: either the
thread scheduler (libc_r) was linked with the program itself and
therefore loaded with it, or it was linked with the PIC module and
loaded along with that module. In the first case, after you have
dlclose'd the PIC module, the dynamic linker will unmap the module's
code from memory, but the thread scheduler will keep running, and on
its next attempt to pass control to the thread in your PIC module it
will probably get SIGBUS because the code is no longer mapped. In the
second case, you'll unload the module along with the thread scheduler,
but the thread-scheduling signal setup will remain in place, so that
shortly you will get the same SIGBUS when the kernel tries to deliver
a signal to a region that is no longer mapped.

In either case, you will get a problem much more serious than a memory
leak.

> The assumption (which is potentially wrong) is that the program
> will correctly shut down all its threads, when in fact it was a
> module not under the program's control that created and used the
> threads.

I do not quite agree. In such a case, the module should probably have
a destructor function, either placed into the .fini section or called
explicitly by the program before dlclose().
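
Such a destructor could be sketched as below. The sk_* names and the
pool are hypothetical, and __attribute__((destructor)) is a GCC/Clang
extension assumed here as a convenient way to get a function run when
the object is unloaded.

```c
#include <stdlib.h>

#define SK_POOL_SIZE	4096

static void *sk_pool;	/* hypothetical lookup context owned by the module */

/* Called by the module before it starts its resolver threads. */
int
sk_module_init(void)
{
	if (sk_pool == NULL)
		sk_pool = calloc(1, SK_POOL_SIZE);
	return (sk_pool != NULL ? 0 : -1);
}

/* Explicit destructor: safe to call more than once. */
void
sk_module_fini(void)
{
	/* A real module would stop and join its threads here first. */
	free(sk_pool);
	sk_pool = NULL;
}

/* Reports whether the context is currently allocated. */
int
sk_module_is_ready(void)
{
	return (sk_pool != NULL);
}

/* GCC/Clang extension: also run the cleanup when the object is unloaded. */
__attribute__((destructor))
static void
sk_module_auto_fini(void)
{
	sk_module_fini();
}
```

Because sk_module_fini() is idempotent, the program may call it
explicitly before dlclose() and still let the automatic hook run.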

> The leak depends on:
> 
> 1)  A pool of worker threads being created and left around
> for the purpose of simultaneous resolution
> 
> 2)  The parent shutting down the module without explicitly
> dealing with the threads (basically, code which would
> need to live in ".fini" of the foo.so, and could not be
> automatically triggered on unload of foo.so any other way).
> 
> I think that parallel IPv6/IPv4 resolution presented as a single
> serial interface is a high probability implementation with the
> support for threaded access to the resolver, particularly with
> the Mozilla code acting the way it does.
> 
> > I also would like to hear from you whether or not you think that we
> > need all those gethostbyXXX_r(3) functions.
> 
> No.  I don't think any of the _r functions are needed, so long
> as the results are not cached by pointer instead of a copy,
> before passing them from one thread to another.  It's a risk on
> the clobber case of a call with a cached reference outstanding
> but not processed by another thread which is not an issue with
> the _r functions, which require that you pass the storage down.
> 
> Of course, if you pass down per thread storage, you could have
> the same problem if you didn't copy rather than reference the
> results before passing to another thread by address.
> 
> Given that, per thread allocations ("thread local storage")
> makes more sense than allocate/free fights between threads
> based on who's responsible for owning the memory after an
> inter-thread call.  8-).

Thank you for the explanation!

-Maxim




Re: Thread-safe resolver [patches for review]

2002-08-13 Thread Maxim Sobolev

Daniel Eischen wrote:
> 
> On Mon, 12 Aug 2002, Maxim Sobolev wrote:
> > Folks,
> >
> > Attached please find two patches based on PR bin/29581 to make the
> > FreeBSD resolver thread-safe. They represent two approaches to reaching
> > this goal: the first is to introduce reentrant versions of the standard
> > gethostbyXXX(3) APIs, similar to the ones existing in other Unices, and
> > the second is to make gethostbyXXX(3) return data placed in per-thread
> > storage when linked with libc_r. I like the latter approach more, since
> > it doesn't introduce new non-standard APIs.
> >
> > I would like to hear any comments and suggestions on the proposed
> > patches, as well as opinions about which path to choose.
> 
> Why do you need uthread_resolv.c?  You should be able to thread
> calls by checking __isthreaded.  Just keep everything in libc.
> If there are missing stubs for some pthread_* routines (I think
> everything you need is in -current's libc), then add them.

I did that, but I can't figure out the correct way to get the
_thread_run and _thread_initial symbols into libc from libc_r. Any
ideas?

-Maxim




Increasing size of if_flags field in the ifnet structure [patch for review]

2002-08-15 Thread Maxim Sobolev

Folks,

When implementing the ability to switch an interface into promiscuous
mode using ifconfig(8), I stumbled over the already exhausted space in
the `short if_flags' field of the ifnet structure. I need to allocate
one new flag, while we already have 16 IFF_* flags, plus one additional
flag that is implemented using the currently free if_ipending field of
said structure. The attached patch aims to increase the size of
if_flags to 32 bits, as well as to clean up the if_ipending abuse.
Granted, it will break backward ABI compatibility, but IMO that is not
a big problem.
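
To illustrate why a 17th flag cannot live in the existing field, here
is a small sketch. The SK_* constants are hypothetical, and uint16_t
stands in for the 16-bit short so that the truncation is well defined.

```c
#include <stdint.h>

#define SK_IFF_LAST16	0x8000u		/* bit 15: last bit a 16-bit field holds */
#define SK_IFF_NEWFLAG	0x10000u	/* bit 16: hypothetical new flag */

/* Returns nonzero if 'flag' survives being stored in a 16-bit field. */
int
sk_fits16(uint32_t flag)
{
	uint16_t field = (uint16_t)flag;	/* high bits are silently dropped */

	return (field == flag);
}

/* Returns nonzero if 'flag' survives being stored in a 32-bit field. */
int
sk_fits32(uint32_t flag)
{
	uint32_t field = flag;

	return (field == flag);
}
```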

Comments and suggestions are greatly appreciated. Thanks!

-Maxim

Index: src/share/man/man4/netintro.4
===
RCS file: /home/ncvs/src/share/man/man4/netintro.4,v
retrieving revision 1.20
diff -d -u -r1.20 netintro.4
--- src/share/man/man4/netintro.4   18 Mar 2002 12:39:32 -  1.20
+++ src/share/man/man4/netintro.4   15 Aug 2002 18:33:42 -
@@ -197,7 +197,7 @@
 structsockaddr ifru_addr;
 structsockaddr ifru_dstaddr;
 structsockaddr ifru_broadaddr;
-short ifru_flags;
+int   ifru_flags;
 int   ifru_metric;
 int   ifru_mtu;
 int   ifru_phys;
Index: src/share/man/man9/ifnet.9
===
RCS file: /home/ncvs/src/share/man/man9/ifnet.9,v
retrieving revision 1.25
diff -d -u -r1.25 ifnet.9
--- src/share/man/man9/ifnet.9  10 Jan 2002 11:57:10 -  1.25
+++ src/share/man/man9/ifnet.9  15 Aug 2002 18:33:43 -
@@ -284,7 +284,7 @@
 (Set by driver,
 decremented by generic watchdog code.)
 .It Va if_flags
-.Pq Vt short
+.Pq Vt int
 Flags describing operational parameters of this interface (see below).
 (Manipulated by both driver and generic code.)
 .It Va if_capabilities
Index: src/sys/compat/linux/linux_ioctl.c
===
RCS file: /home/ncvs/src/sys/compat/linux/linux_ioctl.c,v
retrieving revision 1.86
diff -d -u -r1.86 linux_ioctl.c
--- src/sys/compat/linux/linux_ioctl.c  26 Jun 2002 15:53:11 -  1.86
+++ src/sys/compat/linux/linux_ioctl.c  15 Aug 2002 18:33:45 -
@@ -1963,7 +1963,7 @@
 {
l_short flags;
 
-   flags = ifp->if_flags;
+   flags = ifp->if_flags & 0x;
/* these flags have no Linux equivalent */
flags &= ~(IFF_SMART|IFF_OACTIVE|IFF_SIMPLEX|
IFF_LINK0|IFF_LINK1|IFF_LINK2);
Index: src/sys/dev/fxp/if_fxp.c
===
RCS file: /home/ncvs/src/sys/dev/fxp/if_fxp.c,v
retrieving revision 1.138
diff -d -u -r1.138 if_fxp.c
--- src/sys/dev/fxp/if_fxp.c9 Aug 2002 01:48:28 -   1.138
+++ src/sys/dev/fxp/if_fxp.c15 Aug 2002 18:33:46 -
@@ -1193,7 +1193,7 @@
 #ifdef DEVICE_POLLING
struct ifnet *ifp = &sc->sc_if;
 
-   if (ifp->if_ipending & IFF_POLLING)
+   if (ifp->if_flags & IFF_POLLING)
return;
if (ether_poll_register(fxp_poll, ifp)) {
/* disable interrupts */
@@ -1785,7 +1785,7 @@
 * ... but only do that if we are not polling. And because (presumably)
 * the default is interrupts on, we need to disable them explicitly!
 */
-   if ( ifp->if_ipending & IFF_POLLING )
+   if ( ifp->if_flags & IFF_POLLING )
CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, FXP_SCB_INTR_DISABLE);
else
 #endif /* DEVICE_POLLING */
Index: src/sys/dev/vx/if_vx.c
===
RCS file: /home/ncvs/src/sys/dev/vx/if_vx.c,v
retrieving revision 1.36
diff -d -u -r1.36 if_vx.c
--- src/sys/dev/vx/if_vx.c  20 Mar 2002 02:07:47 -  1.36
+++ src/sys/dev/vx/if_vx.c  15 Aug 2002 18:33:47 -
@@ -285,7 +285,7 @@
 register struct ifnet *ifp = &sc->arpcom.ac_if;  
 int i, j, k;
 char *reason, *warning;
-static short prev_flags;
+static int prev_flags;
 static char prev_conn = -1;
 
 if (prev_conn == -1) {
Index: src/sys/kern/kern_poll.c
===
RCS file: /home/ncvs/src/sys/kern/kern_poll.c,v
retrieving revision 1.9
diff -d -u -r1.9 kern_poll.c
--- src/sys/kern/kern_poll.c4 Aug 2002 21:00:49 -   1.9
+++ src/sys/kern/kern_poll.c15 Aug 2002 18:33:48 -
@@ -383,7 +383,7 @@
for (i = 0 ; i < poll_handlers ; i++) {
if (pr[i].handler &&
pr[i].ifp->if_flags & IFF_RUNNING) {
-   pr[i].ifp->if_ipending &= ~IFF_POLLING;
+   pr[i].ifp->if_flags &= ~IFF_POLLING;
pr[i].handler(pr[i].ifp, POLL_DEREGISTER, 1);
}
pr[i].handler=NULL;
@@ -415,7 +415,7 @@
return 0;
if ( !(ifp->if_flags &

Re: Increasing size of if_flags field in the ifnet structure [patch

2002-08-16 Thread Maxim Sobolev

> 
>Please take a look at this patch. It implements one more flag in if_flags
> and of course increases the size of this flag field by using if_ipending,
> which is unused.

There is not much point in this patch, because it will increase the
size of struct ifreq, which means that no ioctls from older apps will
be accepted anyway. Therefore, there is no practical difference between
the two, while my approach is obviously cleaner.

-Maxim

> 
> On Thu, 15 Aug 2002, Julian Elischer wrote:
> 
> > you cannot break ABIs in 4.x
> > in 5.x it will probably be ok until (say) 5.1 or something.
> >
> >
> > On Thu, 15 Aug 2002, Maxim Sobolev wrote:
> >
> > > Folks,
> > >
> > > When implementing ability to switch interface into promisc mode using
> > > ifconfig(8) I've stumbled into the problem with already exhausted
> > > space in the `short if_flags' field in the ifnet structure. I need to
> > > allocate one new flag, while we already have 16 IFF_* flags, and even
> > > one additional flag which is implemented using currently free
> > > if_ipending field of the said structure. Attached patch is aimed at
> > > increasing size of if_flags to 32 bits, as well as to clean-up
> > > if_ipending abuse. Granted, it will break backward ABI compatibility,
> > > but IMO it is not a big problem.
> > >
> > > Comments and suggestions are greatly appreciated. Thanks!
> > >
> > > -Maxim
> >
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
> >
> 





Re: Increasing size of if_flags field in the ifnet structure [patch

2002-08-16 Thread Maxim Sobolev

> 
> On Thu, 15 Aug 2002, Maxim Sobolev wrote:
> 
> > When implementing ability to switch interface into promisc mode using
> > ifconfig(8) I've stumbled into the problem with already exhausted
> > space in the `short if_flags' field in the ifnet structure. I need to
> > allocate one new flag, while we already have 16 IFF_* flags, and even
> > one additional flag which is implemented using currently free
> > if_ipending field of the said structure. Attached patch is aimed at
> > increasing size of if_flags to 32 bits, as well as to clean-up
> > if_ipending abuse. Granted, it will break backward ABI compatibility,
> > but IMO it is not a big problem.
> 
> Why isn't it a big problem?  It affects an application ABI (most socket
> ioctls).  We have whole syscalls whose purpose is to avoid breaking
> application ABIs back to about 4.3BSD.  Some of them may even work.
> 
> > Index: src/share/man/man4/netintro.4
> > ===
> > RCS file: /home/ncvs/src/share/man/man4/netintro.4,v
> > retrieving revision 1.20
> > diff -d -u -r1.20 netintro.4
> > --- src/share/man/man4/netintro.4   18 Mar 2002 12:39:32 -  1.20
> > +++ src/share/man/man4/netintro.4   15 Aug 2002 18:33:42 -
> > @@ -197,7 +197,7 @@
> >  structsockaddr ifru_addr;
> >  structsockaddr ifru_dstaddr;
> >  structsockaddr ifru_broadaddr;
> > -short ifru_flags;
> > +int   ifru_flags;
> >  int   ifru_metric;
> >  int   ifru_mtu;
> >  int   ifru_phys;
> 
> This particular ABI seems to have been broken before (in if.h 1.50 on
> 1999/02/09), since the actual struct has "short ifru_flags[2];" followed
> by "short if_index;" instead of "short ifru_flags;", and it has 2 new
> struct members at the end too.  If the struct were actually as above,
> then changing the short to an int would almost be binary compatible
> since it would just expand ifru_flags to use the 2 bytes of unnamed
> padding caused by the poor layout, so the struct wouldn't expand and
> the other members wouldn't move.  Enlarging ifru_flags itself might
> only break big-endian machines (little-endian ones wouldn't notice
> providing the padding is zeroed).
> 
> > Index: src/share/man/man9/ifnet.9
> 
> Breaking kernel ABIs isn't so important.  They should only be compatible
> within major releases.

BTW, I've just realised that we can easily avoid breaking the
application ABI by using the currently unused second element of
ifr_ifru.ifru_flags[2] (a.k.a. ifr_prevflags) for storing another 16
flags. What do people think?
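
Splitting a 32-bit flag word across the two shorts could look roughly
like the sketch below. The struct and function names are hypothetical.
One caveat worth noting: a short with bit 15 set is negative, so both
halves are masked before being recombined, otherwise sign extension
would set every high bit.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two shorts in struct ifreq. */
struct sk_ifreq_flags {
	short	flags[2];	/* [0] = low 16 bits, [1] = high 16 bits */
};

/* Store a 32-bit flag word as two 16-bit halves. */
void
sk_flags_store(struct sk_ifreq_flags *r, uint32_t flags)
{
	/*
	 * Values above 0x7fff do not fit a signed short; the conversion
	 * is implementation-defined but wraps on common compilers.
	 */
	r->flags[0] = (short)(flags & 0xffff);
	r->flags[1] = (short)(flags >> 16);
}

/* Recombine the halves, masking first to cancel sign extension. */
uint32_t
sk_flags_load(const struct sk_ifreq_flags *r)
{
	uint32_t lo = (uint32_t)(r->flags[0] & 0xffff);
	uint32_t hi = (uint32_t)(r->flags[1] & 0xffff);

	return (lo | (hi << 16));
}
```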

-Maxim




Re: Increasing size of if_flags field in the ifnet structure [patch

2002-08-16 Thread Maxim Sobolev

> > There is no much point in this patch, because it will increase size of
> > struct  ifreq, which means that no ioctl's from older apps will be accepted
> > anyway. Therefore, there is no difference between those two, while my
> > approach is obviously cleaner.
> 
>   It does not increase size of struct ifreq.
>   This is a union not a struct as You see.

Oh, yes, you are obviously correct. However, I still wonder if your
patch is endian-safe.
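
The endianness concern can be demonstrated with a sketch like this one,
which assumes a 16-bit short and a 32-bit int. The union is a cut-down,
hypothetical version of the one in struct ifreq: which element of
flags[] overlays the low 16 bits of the int depends on the byte order
of the host.

```c
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for the union inside struct ifreq. */
union sk_ifru {
	short	flags[2];
	int	flagslong;
};

/* Returns 1 if flags[0] overlays the low 16 bits of flagslong. */
int
sk_low_half_is_flags0(void)
{
	union sk_ifru u;

	memset(&u, 0, sizeof(u));
	u.flagslong = 0x00010002;
	return (u.flags[0] == 0x0002);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int
sk_host_is_little_endian(void)
{
	uint32_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return (first == 1);
}
```

On a big-endian machine old binaries that only know about flags[0]
would therefore see the high half of the new int.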

-Maxim

> union {
> struct  sockaddr ifru_addr;
> struct  sockaddr ifru_dstaddr;
> struct  sockaddr ifru_broadaddr;
> short   ifru_flags[2];
> int ifru_flagslong;
> int ifru_metric;
> int ifru_mtu;
> int ifru_phys;
> int ifru_media;
> caddr_t ifru_data;
> int ifru_cap[2];
> } ifr_ifru;
> >
> > -Maxim
> >
> > >
> > > On Thu, 15 Aug 2002, Julian Elischer wrote:
> > >
> > > > you cannot break ABIs in 4.x
> > > > in 5.x it will probably be ok until (say) 5.1 or something.
> > > >
> > > >
> > > > On Thu, 15 Aug 2002, Maxim Sobolev wrote:
> > > >
> > > > > Folks,
> > > > >
> > > > > When implementing ability to switch interface into promisc mode using
> > > > > ifconfig(8) I've stumbled into the problem with already exhausted
> > > > > space in the `short if_flags' field in the ifnet structure. I need to
> > > > > allocate one new flag, while we already have 16 IFF_* flags, and even
> > > > > one additional flag which is implemented using currently free
> > > > > if_ipending field of the said structure. Attached patch is aimed at
> > > > > increasing size of if_flags to 32 bits, as well as to clean-up
> > > > > if_ipending abuse. Granted, it will break backward ABI compatibility,
> > > > > but IMO it is not a big problem.
> > > > >
> > > > > Comments and suggestions are greatly appreciated. Thanks!
> > > > >
> > > > > -Maxim
> > > >
> > > >
> > > > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > > > with "unsubscribe freebsd-net" in the body of the message
> > > >
> > >
> >
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
> >
> 





Re: Increasing size of if_flags field in the ifnet structure [patch

2002-08-16 Thread Maxim Sobolev

> 
> On Fri, 16 Aug 2002, Maxim Sobolev wrote:
> 
> MS>BTW, I've just realised that we can easily avoid breaking application
> MS>ABI by using currently unused ifr_ifru.ifru_flags[2] (aka. ifr_prevflags)
> MS>for storing another 16 flags. What do people think?
> 
> The ifr_prevflags may be used by snmp daemons to provide the necessary
> atomic rollback.

Could you please verify? Nothing in the base system uses it. Initially,
ifr_prevflags was added with the following log message (rev.1.50):

  Since ifru_flags is a short, we can fit in a copy of the flags
  before they got changed.  This can help eliminate much of the
  gymnastics drivers do in their ioctl routines to figure this out.

but no drivers are using it so far.

Just in case, attached is an updated patch, which uses ifr_prevflags
to extend ifr_flags.

-Maxim


diff -druN src.preflags/sbin/ifconfig/ifconfig.c src/sbin/ifconfig/ifconfig.c
--- src.preflags/sbin/ifconfig/ifconfig.c   Thu Aug 15 09:47:46 2002
+++ src/sbin/ifconfig/ifconfig.cFri Aug 16 16:12:09 2002
@@ -999,14 +999,15 @@
exit(1);
}
strncpy(my_ifr.ifr_name, name, sizeof (my_ifr.ifr_name));
-   flags = my_ifr.ifr_flags;
+   flags = my_ifr.ifr_flags | (my_ifr.ifr_flagshigh << 16);
 
if (value < 0) {
value = -value;
flags &= ~value;
} else
flags |= value;
-   my_ifr.ifr_flags = flags;
+   my_ifr.ifr_flags = flags & 0x;
+   my_ifr.ifr_flagshigh = flags >> 16;
if (ioctl(s, SIOCSIFFLAGS, (caddr_t)&my_ifr) < 0)
Perror(vname);
 }
diff -druN src.preflags/share/man/man4/netintro.4 src/share/man/man4/netintro.4
--- src.preflags/share/man/man4/netintro.4  Thu Aug 15 09:47:47 2002
+++ src/share/man/man4/netintro.4   Fri Aug 16 16:11:11 2002
@@ -197,20 +197,21 @@
 structsockaddr ifru_addr;
 structsockaddr ifru_dstaddr;
 structsockaddr ifru_broadaddr;
-short ifru_flags;
+short ifru_flags[2];
 int   ifru_metric;
 int   ifru_mtu;
 int   ifru_phys;
 caddr_t   ifru_data;
 } ifr_ifru;
-#define ifr_addr  ifr_ifru.ifru_addr/* address */
-#define ifr_dstaddr   ifr_ifru.ifru_dstaddr /* other end of p-to-p link */
+#define ifr_addr  ifr_ifru.ifru_addr  /* address */
+#define ifr_dstaddr   ifr_ifru.ifru_dstaddr   /* other end of p-to-p link */
 #define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */
-#define ifr_flags ifr_ifru.ifru_flags   /* flags */
-#define ifr_metricifr_ifru.ifru_metric  /* metric */
-#define ifr_mtu   ifr_ifru.ifru_mtu /* mtu */
-#define ifr_phys  ifr_ifru.ifru_phys/* physical wire */
-#define ifr_data  ifr_ifru.ifru_data/* for use by interface */
+#define ifr_flags ifr_ifru.ifru_flags[0]  /* flags (low 16 bits) */
+#define ifr_flagshigh ifr_ifru.ifru_flags[1]  /* flags (high 16 bits) */
+#define ifr_metricifr_ifru.ifru_metric/* metric */
+#define ifr_mtu   ifr_ifru.ifru_mtu   /* mtu */
+#define ifr_phys  ifr_ifru.ifru_phys  /* physical wire */
+#define ifr_data  ifr_ifru.ifru_data  /* for use by interface */
 };
 .Ed
 .Pp
diff -druN src.preflags/share/man/man9/ifnet.9 src/share/man/man9/ifnet.9
--- src.preflags/share/man/man9/ifnet.9 Thu Aug 15 09:47:48 2002
+++ src/share/man/man9/ifnet.9  Thu Aug 15 11:36:46 2002
@@ -284,7 +284,7 @@
 (Set by driver,
 decremented by generic watchdog code.)
 .It Va if_flags
-.Pq Vt short
+.Pq Vt int
 Flags describing operational parameters of this interface (see below).
 (Manipulated by both driver and generic code.)
 .It Va if_capabilities
diff -druN src.preflags/sys/compat/linux/linux_ioctl.c src/sys/compat/linux/linux_ioctl.c
--- src.preflags/sys/compat/linux/linux_ioctl.c Thu Aug 15 09:47:48 2002
+++ src/sys/compat/linux/linux_ioctl.c  Thu Aug 15 11:48:59 2002
@@ -1963,7 +1963,7 @@
 {
l_short flags;
 
-   flags = ifp->if_flags;
+   flags = ifp->if_flags & 0x;
/* these flags have no Linux equivalent */
flags &= ~(IFF_SMART|IFF_OACTIVE|IFF_SIMPLEX|
IFF_LINK0|IFF_LINK1|IFF_LINK2);
diff -druN src.preflags/sys/dev/fxp/if_fxp.c src/sys/dev/fxp/if_fxp.c
--- src.preflags/sys/dev/fxp/if_fxp.c   Thu Aug 15 09:47:50 2002
+++ src/sys/dev/fxp/if_fxp.cThu Aug 15 21:17:11 2002
@@ -1193,7 +1193,7 @@
 #ifdef DEVICE_POLLING
struct ifnet *ifp = &sc->sc_if;
 
-   if (ifp->if_ipending & IFF_POLLING)
+   if (ifp->if_flags & IFF_POLLING)
return;
if (ether_poll_register(fxp_poll, ifp)) {
/* disable interrupts */
@@ -1785,7 +1785,7 @@
 * ... but only do that if we are not polling. And because (presumably)
 * the default is interrupts on, we need to disable them expl

Re: Increasing size of if_flags field in the ifnet structure [patch

2002-08-16 Thread Maxim Sobolev

> 
> On Fri, 16 Aug 2002, Maxim Sobolev wrote:
> 
> MS>>
> MS>> On Fri, 16 Aug 2002, Maxim Sobolev wrote:
> MS>>
> MS>> MS>BTW, I've just realised that we can easily avoid breaking application
> MS>> MS>ABI by using currently unused ifr_ifru.ifru_flags[2] (aka. ifr_prevflags)
> MS>> MS>for storing another 16 flags. What do people think?
> MS>>
> MS>> The ifr_prevflags may be used by snmp daemons to provide the necessary
> MS>> atomic rollback.
> MS>
> MS>Could you please verify? Nothing in the base system uses it. Initially,
> MS>ifr_prevflags was added with the following log message (rev.1.50):
> MS>
> MS>  Since ifru_flags is a short, we can fit in a copy of the flags
> MS>  before they got changed.  This can help eliminate much of the
> MS>  gymnastics drivers do in their ioctl routines to figure this out.
> MS>
> MS>but no drivers are using it so far.
> 
> I verified only net-snmp-current. It doesn't use it (I guess that
> variable is not SNMP-writeable in net-snmp). I use it however in the
> bsnmp daemon for NgATM. It's the only way to guarantee the atomicity
> required by SNMP.
> 
> Removing something from the ABI which you cannot do otherwise from
> userspace is always a problem, because it may break 3rd party software
> (I mean not binary breakage, but functional breakage).
> 
> Well, while thinking about it: With a user settable PROXY flag there is no
> way for an application to find out whether the flag was already set or not
> if the application sets it, except by inspecting the ifr_prevflags field.
> So two applications fiddling with this bit may entirely confuse each other.
> Count me as a vote for keeping the field and breaking binary compatibility
> instead of functionality.

Hmm, I do not really see how this field helps your application. To set or
reset a flag in if_flags you have to read the current flags first, so you
can use that value instead of ifr_prevflags. Of course this introduces a
tiny race window, but I do not see how the presence of ifr_prevflags helps
in this case. Moreover, the probability of such a race is IMO quite low, as
interface flags are usually updated very rarely.

Instead, your bsnmp daemon should use some other way to serialise write
access to the interface flags (e.g. a lock file).
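
A minimal sketch of the lock-file idea, assuming that all cooperating
daemons agree on one lock path. The helper name with_flags_lock() and the
callback are hypothetical; flock(2) only protects against processes that
take the same lock, so it is advisory, not enforced by the kernel.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Run fn(arg) while holding an exclusive advisory lock on lockpath.
 * fn is where a daemon would do its SIOCGIFFLAGS / SIOCSIFFLAGS pair.
 * Returns fn's result, or -1 if the lock could not be taken.
 */
static int
with_flags_lock(const char *lockpath, int (*fn)(void *), void *arg)
{
	int fd, rv;

	fd = open(lockpath, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return (-1);
	if (flock(fd, LOCK_EX) != 0) {
		close(fd);
		return (-1);
	}
	rv = fn(arg);		/* critical section */
	flock(fd, LOCK_UN);
	close(fd);
	return (rv);
}

/* Example critical section: stands in for the read-modify-write. */
static int
bump_counter(void *arg)
{
	(*(int *)arg)++;
	return (0);
}
```

Every writer has to go through the same wrapper for this to serialise
anything; a program that sets flags without taking the lock is not stopped.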

-Maxim

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: phk's JKH list

2002-08-19 Thread Maxim Sobolev

Poul-Henning Kamp wrote:
> 
> I've started to type in my mental sticky notes, have at it:
> 
> http://people.freebsd.org/~phk/TODO/

Could you please make each task link to the list of relevant patches
available so far, so that anybody who wants to pick up the task will
have something to start from? For example, I have a bunch of v_tag
cleanup patches sitting in my local tree and am ready to share them
with anybody willing to continue where I left off.

-Maxim




Re: Thread-safe resolver [patches for review]

2002-08-19 Thread Maxim Sobolev

Terry Lambert wrote:
> 
> [...]
> > > The assumption (which is potentially wrong) is that the program
> > > will correctly shut down all its threads, when in fact it was a
> > > module not under the program's control that created and used the
> > > threads.
> >
> > I do not quite agree. In such a case, the module should probably have
> > a destructor function, either placed into the .fini section or
> > explicitly called by the program before dlclose().
> 
> Uh, that's exactly the argument I was making: use a .fini section
> to clean up the per thread memory allocations.
> 8-).

I am not sure how, from a .fini section, you can get the list of per-thread
dynamically allocated storage without resorting to inspecting the inner
implementation details of pthread_{set,get}specific(3). Any ideas?
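
One possible answer, sketched under the assumption that the module controls
every allocation site: keep a module-private registry of the per-thread
blocks next to the pthread key, so that a destructor (placed in .fini or
called before dlclose()) can walk it without looking inside the pthread
implementation. All names below (tls_get, module_fini and so on) are
hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct tls_block {
	struct tls_block *next;
	char data[64];			/* per-thread state */
};

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tls_block *registry;	/* all live blocks */
static int registry_len;

/* Unlink a block from the registry and free it. */
static void
tls_release(void *p)
{
	struct tls_block **pp, *b = p;

	pthread_mutex_lock(&reg_lock);
	for (pp = &registry; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == b) {
			*pp = b->next;
			registry_len--;
			break;
		}
	}
	pthread_mutex_unlock(&reg_lock);
	free(b);
}

static void
tls_init(void)
{
	/* tls_release runs automatically when a thread exits. */
	pthread_key_create(&tls_key, tls_release);
}

/* Return the calling thread's block, allocating it on first use. */
static struct tls_block *
tls_get(void)
{
	struct tls_block *b;

	pthread_once(&tls_once, tls_init);
	b = pthread_getspecific(tls_key);
	if (b == NULL && (b = calloc(1, sizeof(*b))) != NULL) {
		pthread_mutex_lock(&reg_lock);
		b->next = registry;
		registry = b;
		registry_len++;
		pthread_mutex_unlock(&reg_lock);
		pthread_setspecific(tls_key, b);
	}
	return (b);
}

/* Module destructor: free the blocks of threads that never exited. */
static void
module_fini(void)
{
	struct tls_block *b;

	pthread_once(&tls_once, tls_init);
	for (;;) {
		pthread_mutex_lock(&reg_lock);
		b = registry;
		pthread_mutex_unlock(&reg_lock);
		if (b == NULL)
			break;
		tls_release(b);
	}
	/* After this, no key destructor will run for stale values. */
	pthread_key_delete(tls_key);
}

/* Example thread body: touches its per-thread block and exits. */
static void *
worker(void *arg)
{
	(void)arg;
	(void)tls_get();
	return (NULL);
}
```

The cost is one mutex-protected list operation per thread creation and
exit; the sketch also assumes no thread is still using its block when
module_fini() runs.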

-Maxim




Re: libc_r in stable

2002-09-20 Thread Maxim Sobolev

Andriy,

First of all, thank you for your detailed reports; they could be very
useful. Unfortunately, I am currently a bit busy due to participation
in the first Ukrainian OSS Conference, so it might be better to
submit those reports to someone else - I'd recommend either Daniel
Eischen <[EMAIL PROTECTED]> or Julian Elischer
<[EMAIL PROTECTED]> (both CC'ed), who are FreeBSD libc_r gurus, and
see if they could help you.

Thanks!

-Maxim

Andriy Gapon wrote:
> 
> Maxim,
> 
> sorry if my English is not perfect, but I've decided to use it as the more
> official language of FreeBSD.
> 
> I have recently been involved in debugging a complex program on FreeBSD
> 4.6.2 (multiprocess, multithreaded, signal handling, pipes and fifos for
> communication) and based on that I've developed several concerns and ideas
> about pthreads in 4.6.2. I'll start with the most obvious and proceed to
> the ones that I'm not quite sure about.
> 
> 1. write() doesn't set errno to EINTR if the thread receives a signal while
> it is on a queue waiting for data on a descriptor
> 
> 2. in the case above, write() always returns -1 regardless of whether it
> was able to write part of the data on previous attempts; I believe it should
> return the number of bytes written thus far
> 
> 3. likewise, when the "real" write() system call returns a value < 0,
> libc_r write() returns the same value, although some data might have been
> written on the previous attempts.
> 
> 4. libc_r execve() sets all descriptors that were not explicitly set to
> non-blocking mode to blocking mode before doing the "real" execve, which is
> good and is done with non-multithreaded programs possibly being exec'ed in
> mind. However, it has a painful effect if this is done as part of spawning
> another process (fork+exec): all descriptors in the parent become
> blocking, which effectively kills multithreading during I/O. I think there is
> no other option if a programmer really means to share descriptors between
> a multithreaded and a single-threaded program. However, if the
> close-on-exec flag is set on the descriptor, I think it's better to leave
> the descriptor as is, in non-blocking mode.
> 
> 5. I see that when SIGCHLD is received, descriptors are reset back to
> non-blocking mode, with a comment that this is to undo a possible switch
> to blocking mode by a child. There are a number of concerns about
> that:
> a. what if not all of the single-threaded child processes that
> share descriptors with a multithreaded parent have exited?
> b. SIGCHLD may be generated when a child process stops, e.g. by ^Z
> on a controlling terminal; when it continues, the shared descriptors
> will remain in the non-blocking state.
> c. descriptor flags are reset to the union of the saved explicitly set
> value and the O_NONBLOCK flag. This may erase changes performed
> by fcntl() in a child process, which in some exotic case may have
> been meant to persist after the child exits.
> 
> Frankly, I have no good ideas about 5, and obviously all problems with 4
> and 5 are there only if one mixes programs linked with libc and libc_r
> into parent-child relationships and obviously there seems to be no perfect
> solution for such situation, but maybe some improvements can still be
> made.
> 
> --
> Andriy Gapon
> *
> Hang on tightly, let go lightly.
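
The return-value behaviour asked for in items 2 and 3 can be sketched as a
plain userland loop. This is not the libc_r code: write_all() is a
hypothetical wrapper around write(2) that reports the bytes already written
when a later attempt fails, and returns -1 only if nothing was written at
all.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Write up to len bytes, retrying after short writes.  If an attempt
 * fails after some data went out, return the partial count; errno is
 * left as set by the failing write() (e.g. EINTR).
 */
static ssize_t
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = write(fd, p + done, len - done);
		if (n < 0)
			return (done > 0 ? (ssize_t)done : -1);
		if (n == 0)
			break;		/* no progress, do not spin */
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

A caller that gets a short count back can decide for itself whether to
retry the remainder or to report the error.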

Andriy Gapon wrote:
> 
> Maxim,
> 
> in addition to my previous report:
> 
> 6. open() from libc_r should add O_NONBLOCK to flags before executing
> the open() system call, but after saving the actual flags value.
> Otherwise, in situations where the system open()
> blocks, the whole calling process is blocked, where only the calling thread
> should actually be blocked. Necessary retries (similar to read() and
> write()) should obviously be added too.

Andriy Gapon wrote:
> 
> -- Forwarded message --
> Date: Tue, 17 Sep 2002 13:29:08 -0400 (EDT)
> From: Andriy Gapon <[EMAIL PROTECTED]>
> To: Maxim Sobolev <[EMAIL PROTECTED]>
> Subject: libc_r in stable (fwd)
> 
> Maxim,
> 
> in addition to my previous report:
> 
> 6. open() from libc_r should add O_NONBLOCK to flags before executing
> the open() system call, but after saving the actual flags value.
> Otherwise, in situations where the system open()
> blocks, the whole calling process is blocked, where only the calling thread
> should actually be blocked. Necessary retries (similar to read() and
> write()) should obviously be added too.
> 
> -- End of forwarded message --
> 
> sorry about this one, didn't think it through. Looks like, although
> curre

Patch to allow a driver to report unrecoverable write errors to the buf layer

2002-10-18 Thread Maxim Sobolev
Hi folks,

I noticed that the FreeBSD buf/bio subsystem has one very annoying problem:
once a write request has been injected into it and the write operation
has failed, there is seemingly no valid way to tell the layer to drop the
buffer. Instead, it retries the attempt over and over again, until
reboot, even though the originator of the request (usually the vfs layer)
was already notified about the failure and propagated the error condition
to the user-level program.

There is a very easy way to trigger the problem: insert a blank floppy
into your drive, format it with newfs_msdos, mount it, remove the disk
from the drive without unmounting and do `touch /floppy/somefile'.
You'll see that touch(1) fails with an Input/Output error and the kernel
reports a write failure on the console. However, after a couple of seconds
you'll notice that the kernel tries to write exactly the same buffer
again, and then again, ad infinitum. The same happens if you mount a
write-protected floppy in read/write mode.

Moreover, such a stale buffer prevents the fs from being unmounted (even
forcefully) because before unmounting the kernel wants to ensure that
all dirty buffers are flushed, thus blocking umount(8) forever in the
synchronization routine.

OK, you can say "well, don't do that!", and in this particular case
I'd probably agree, but there are at least a few other situations in which
such functionality would be very helpful. Consider a machine which
has several disk drives mounted, and suddenly one of the drives fails:
it would be nice if the OS could at least try to withstand that. Another
example is a RAID array which, due to the failure of some stripes, has
been degraded into read-only mode, so that any write operation would
cause the above-mentioned buf stall. Also, in the era of P-n-P hardware
(USB, FireWire etc.), it is no longer safe to assume that a disk
drive will stay connected until the OS lets it go.

The attached patch addresses the problem (with fd(4) only right now, but
it should be trivial to extend other drivers) by allowing any device
driver to inform the buf layer that an unrecoverable error condition
occurred during a write operation, so that it is meaningless to
retry. I would like to hear any comments or suggestions about my
approach.

Also, it would be very nice to devise some way to propagate such an error
condition into the vfs layer, so that the fs driver could act upon it
somehow (e.g. degrade the fs into read-only mode).

Thanks!

-Maxim
Index: sys/bio.h
===
RCS file: /home/ncvs/src/sys/sys/bio.h,v
retrieving revision 1.122
diff -d -u -r1.122 bio.h
--- sys/bio.h   9 Oct 2002 07:11:03 -   1.122
+++ sys/bio.h   18 Oct 2002 16:53:02 -
@@ -100,6 +100,15 @@
 /* bio_flags */
 #define BIO_ERROR  0x0001
 #define BIO_DONE   0x0004
+#define BIO_NORETRY    0x0008  /* Don't attempt to retry failed   */
+                               /* operation. Should be set when   */
+                               /* the underlying driver detected  */
+                               /* some unrecoverable condition    */
+                               /* e.g. fatal hardware failure,    */
+                               /* forcefully ejected removable    */
+                               /* media, media that has been made */
+                               /* write-protected, replaced with  */
+                               /* another media etc.              */
 #define BIO_FLAG2  0x4000  /* Available for local hacks */
 #define BIO_FLAG1  0x8000  /* Available for local hacks */
 
Index: kern/vfs_bio.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.338
diff -d -u -r1.338 vfs_bio.c
--- kern/vfs_bio.c  28 Sep 2002 17:46:30 -  1.338
+++ kern/vfs_bio.c  18 Oct 2002 16:53:05 -
@@ -2915,6 +2915,8 @@
return (EINTR);
}
if (bp->b_ioflags & BIO_ERROR) {
+   if (bp->b_ioflags & BIO_NORETRY)
+   bp->b_flags |= B_INVAL;
return (bp->b_error ? bp->b_error : EIO);
} else {
return (0);
Index: isa/fd.c
===
RCS file: /home/ncvs/src/sys/isa/fd.c,v
retrieving revision 1.241
diff -d -u -r1.241 fd.c
--- isa/fd.c   2 Oct 2002 20:29:54 -   1.241
+++ isa/fd.c   18 Oct 2002 16:53:13 -
@@ -2530,6 +2530,8 @@
}
if ((fd->options & FDOPT_NOERROR) == 0) {
bp->bio_flags |= BIO_ERROR;
+   if (bp->bio_cmd == BIO_WRITE)
+   bp->bio_flags |= BIO_NORETRY;
bp->bio_error = EIO;
bp->bio_resid = bp->bio_bcount - fdc->fd->skip;
} else



Re: Patch to allow a driver to report unrecoverable write errors to the buf layer

2002-10-18 Thread Maxim Sobolev
Matthew Dillon wrote:
> 
> :Hi folks,
> :
> :I noticed that the FreeBSD buf/bio subsystem has one very annoying problem:
> :once a write request has been injected into it and the write operation
> :has failed, there is seemingly no valid way to tell the layer to drop the
> :buffer. Instead, it retries the attempt over and over again, until
> :reboot, even though the originator of the request (usually the vfs layer)
> :was already notified about the failure and propagated the error condition
> :to the user-level program.
> :
> :There is a very easy way to trigger the problem: insert blank floppy
> :...
> 
> Your patch looks slightly incomplete to me, but the concept is reasonable.
> The BIO_NORETRY test that sets B_INVAL should probably be done in
> brelse(), not in bufwait().  It is the code in brelse() that actually
> does the re-dirtying of the buffer in case of a write-error.

Ah, actually I initially put it into brelse(), but then reconsidered
and moved it down into bufwait(). I'll move it back. ;)
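
As a rough userland model of the behaviour being discussed (not the actual
brelse() code): a failed write is normally re-dirtied so that it will be
retried, while a buffer whose driver set BIO_NORETRY is invalidated
instead. BIO_ERROR and BIO_NORETRY use the values from the patch; the
struct and the MB_* flags are hypothetical stand-ins for the real buf
fields.

```c
#define BIO_ERROR	0x0001
#define BIO_NORETRY	0x0008	/* proposed by the patch */
#define MB_INVAL	0x1	/* model flag: discard the buffer */
#define MB_DELWRI	0x2	/* model flag: dirty, write it later */

struct model_buf {
	int ioflags;		/* BIO_* status reported by the driver */
	int flags;		/* MB_* buffer state */
	int is_write;		/* non-zero for a write request */
};

/* Decide what happens to a buffer when it is released. */
static void
model_brelse(struct model_buf *bp)
{
	if (!bp->is_write || (bp->ioflags & BIO_ERROR) == 0)
		return;
	if (bp->ioflags & BIO_NORETRY) {
		/* Unrecoverable: drop the data, never retry. */
		bp->flags |= MB_INVAL;
		bp->flags &= ~MB_DELWRI;
	} else {
		/* Default: keep the data dirty so the write is retried. */
		bp->ioflags &= ~BIO_ERROR;
		bp->flags |= MB_DELWRI;
	}
}
```

Keeping both branches in one place is the point of doing the check in the
release path: the retry and the no-retry decision then cannot disagree.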

> This re-dirtying is necessary in most cases to prevent filesystem
> corruption.  Otherwise the buffer may be thrown away and a re-read
> may return the original pre-modified data, causing massive filesystem
> corruption elsewhere (consider what that would mean for a bitmap block).
> 
> I think it's perfectly reasonable to do away with the buffer in the
> case of a floppy error, though.

Thanks!

-Maxim

> -Matt
> 
> :...
> :
> :Also it would be very nice to devise some way to propagate such error
> :condition into vfs layer, so that the fs driver could act upon it
> :somehow (e.g. degrade fs into read-only mode).
> :
> :Thanks!
> :
> :-Maxim




Re: Patch to allow a driver to report unrecoverable write errors to the buf layer

2002-10-18 Thread Maxim Sobolev
On Fri, Oct 18, 2002 at 11:35:54AM -0700, Matthew Dillon wrote:
> 
> :> :
> :> :There is a very easy way to trigger the problem: insert blank floppy
> :> :...
> :> 
> :> Your patch looks slightly incomplete to me, but the concept is reasonable.
> :> The BIO_NORETRY test that sets B_INVAL should probably be done in
> :> brelse(), not in bufwait().  It is the code in brelse() that actually
> :> does the re-dirtying of the buffer in case of a write-error.
> :
> :Ah, actually I initially put it into brelse(), but then reconsidered
> :and moved it down into bufwait(). I'll move it back. ;)
> 
> Heh heh.  Well, it seems to me that since it is the BUF abstraction
> that has the error check / redirtying / retry code, then the BUF
> abstraction should probably be responsible for the no-retry case as
> well.  The BIO abstraction is really designed to hold an I/O operation,
> not really to hold meta operations.  You could still specify a BIO
> flag for it since it's a media hack of sorts, but the BUF code should
> be responsible for processing it.

OK, thank you for the detailed explanation.

> I dunno about a formal abstraction.  We need to differentiate between
> media which can and cannot remap blocks.  A 'perfect' solution
> would be far more complex.  File data blocks would have to be
> remapped at the filesystem level and meta-data would have to be 
> invalidated in-core (bitmap, inode blocks with write errors), and
> the filesystem would have to be marked dirty on unmount.  Then unmount
> could safely destroy the buffers representing the write-error'd meta
> data. 
> 
> The VFS layer would definitely need to be involved.  We have the
> advantage in that the buffer cache is already logically mapped, but
> it would still be a fairly sophisticated piece of work.
> 
> :> This re-dirtying is necessary in most cases to prevent filesystem
> :> corruption.  Otherwise the buffer may be thrown away and a re-read
> :> may return the original pre-modified data, causing massive filesystem
> :> corruption elsewhere (consider what that would mean for a bitmap block).
> :> 
> :> I think it's perfectly reasonable to do away with the buffer in the
> :> case of a floppy error, though.
> 
> Just a bit of history.  Originally the buffer cache did not retry error'd
> out writes.  I changed it several years ago because the mechanism
> was producing massive filesystem corruption in the face of disk write
> errors.  The floppy issue was a known issue at the time and I am quite
> happy that someone is tackling the problem now!

Hmm, the current approach doesn't look all that "right" to me, because we are
retrying the operation even though the upper-layer code that initiated it was
already notified about the failure (e.g. received EIO), so it should not
assume that the data was actually written successfully. Or am I missing
something?

-Maxim




New kevent types: NOTE_STARTEXEC and NOTE_STOPEXEC

2002-10-24 Thread Maxim Sobolev
Folks,

Please review the patch, which adds two new types of events -
NOTE_STARTEXEC and NOTE_STOPEXEC, that could be used to get a
notification when the image starts or stops executing. For example, it
could be used to monitor that a daemon is up and running and notify the
administrator when for some reason it exits. I have been running this code
for more than a year now without any problems.

Any comments and suggestions are welcome.

Thanks!

-Maxim
Index: src/lib/libc/sys/kqueue.2
===
RCS file: /home/ncvs/src/lib/libc/sys/kqueue.2,v
retrieving revision 1.28
diff -d -u -r1.28 kqueue.2
--- src/lib/libc/sys/kqueue.2   2 Jul 2002 21:04:00 -   1.28
+++ src/lib/libc/sys/kqueue.2   24 Oct 2002 06:57:41 -
@@ -292,7 +292,7 @@
 .Va fflags ,
 and returns when one or more of the requested events occurs on the descriptor.
 The events to monitor are:
-.Bl -tag -width XXNOTE_RENAME
+.Bl -tag -width XXNOTE_STARTEXEC
 .It NOTE_DELETE
 .Fn unlink
 was called on the file referenced by the descriptor.
@@ -310,6 +310,19 @@
 Access to the file was revoked via
 .Xr revoke 2
 or the underlying fileystem was unmounted.
+.It NOTE_STARTEXEC
+The file referenced by the descriptor has been executed via
+.Xr execve 2 ,
+.Xr fork 2
+or similar call.  The PID of the process is returned in
+.Va data .
+.It NOTE_STOPEXEC
+Execution of the file referenced by the descriptor ended.  Triggered when
+the process associated with the file exited or was replaced with another
+image using
+.Xr execve 2
+or a similar syscall.  The PID of the process is returned in
+.Va data .
 .El
 .Pp
 On return,
Index: src/sys/sys/event.h
===
RCS file: /home/ncvs/src/sys/sys/event.h,v
retrieving revision 1.21
diff -d -u -r1.21 event.h
--- src/sys/sys/event.h 29 Jun 2002 19:14:52 -  1.21
+++ src/sys/sys/event.h 24 Oct 2002 06:57:41 -
@@ -83,13 +83,15 @@
 /*
  * data/hint flags for EVFILT_VNODE, shared with userspace
  */
-#define NOTE_DELETE    0x0001          /* vnode was removed */
-#define NOTE_WRITE     0x0002          /* data contents changed */
-#define NOTE_EXTEND    0x0004          /* size increased */
-#define NOTE_ATTRIB    0x0008          /* attributes changed */
-#define NOTE_LINK      0x0010          /* link count changed */
-#define NOTE_RENAME    0x0020          /* vnode was renamed */
-#define NOTE_REVOKE    0x0040          /* vnode access was revoked */
+#define NOTE_DELETE    0x00100000      /* vnode was removed */
+#define NOTE_WRITE     0x00200000      /* data contents changed */
+#define NOTE_EXTEND    0x00400000      /* size increased */
+#define NOTE_ATTRIB    0x00800000      /* attributes changed */
+#define NOTE_LINK      0x01000000      /* link count changed */
+#define NOTE_RENAME    0x02000000      /* vnode was renamed */
+#define NOTE_REVOKE    0x04000000      /* vnode access was revoked */
+#define NOTE_STARTEXEC 0x08000000      /* vnode was executed */
+#define NOTE_STOPEXEC  0x10000000      /* vnode execution stopped */
 
 /*
  * data/hint flags for EVFILT_PROC, shared with userspace
@@ -98,6 +100,7 @@
 #define NOTE_FORK      0x40000000      /* process forked */
 #define NOTE_EXEC      0x20000000      /* process exec'd */
 #define NOTE_PCTRLMASK 0xf0000000      /* mask for hint bits */
+/* Applies both to EVFILT_VNODE and EVFILT_PROC */
 #define NOTE_PDATAMASK 0x000fffff      /* mask for pid */
 
 /* additional flags for EVFILT_PROC */
Index: src/sys/kern/kern_exec.c
===
RCS file: /home/ncvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.193
diff -d -u -r1.193 kern_exec.c
--- src/sys/kern/kern_exec.c   11 Oct 2002 21:04:01 -  1.193
+++ src/sys/kern/kern_exec.c   24 Oct 2002 06:57:41 -
@@ -518,6 +518,8 @@
 * to locking the proc lock.
 */
textvp = p->p_textvp;
+   if (textvp)
+   VN_KNOTE(textvp, NOTE_STOPEXEC | p->p_pid);
p->p_textvp = ndp->ni_vp;
 
/*
@@ -525,6 +527,7 @@
 * as we're now a bona fide freshly-execed process.
 */
KNOTE(&p->p_klist, NOTE_EXEC);
+   VN_KNOTE(p->p_textvp, NOTE_STARTEXEC | p->p_pid);
p->p_flag &= ~P_INEXEC;
 
/*
Index: src/sys/kern/kern_exit.c
===
RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.184
diff -d -u -r1.184 kern_exit.c
--- src/sys/kern/kern_exit.c   15 Oct 2002 00:14:32 -  1.184
+++ src/sys/kern/kern_exit.c   24 Oct 2002 06:58:03 -
@@ -440,6 +440,8 @@
 * Notify inte

Re: New kevent types: NOTE_STARTEXEC and NOTE_STOPEXEC

2002-10-27 Thread Maxim Sobolev
On Sun, Oct 27, 2002 at 01:04:29AM -0700, Juli Mallett wrote:
> * De: Maxim Sobolev <[EMAIL PROTECTED]> [ Data: 2002-10-27 ]
>   [ Subjecte: Re: New kevent types: NOTE_STARTEXEC and NOTE_STOPEXEC ]
> > On Sat, Oct 26, 2002 at 06:09:31PM -0700, Nate Lawson wrote:
> > > On Thu, 24 Oct 2002, Maxim Sobolev wrote:
> > > > Please review the patch, which adds two new types of events -
> > > > NOTE_STARTEXEC and NOTE_STOPEXEC, that could be used to get
> > > > notification when the image starts or stops executing. For example, it
> > > > could be used to monitor that a daemon is up and running and notify
> > > > administrator when for some reason it exits. I am running this code
> > > > for more than a year now without any problems.
> > > > 
> > > > Any comments and suggestions are welcome.
> > > 
> > > Couldn't this just be done by init(8) and /etc/ttys?  Or inetd?  If you
> > > want to write your own, couldn't you use waitpid()?  Or a kevent() of
> > > EVFILT_PROC with NOTE_EXIT/NOTE_FORK?  I'm not sure I see the need for
> > > this.
> > 
> > EVFILT_PROC operates on pids, while NOTE_{START,STOP}EXEC operate on
> > vnodes - that is the main difference. Currently, you can't reliably
> > get a notification when the kernel starts executing some arbitrary
> > executable from your fs.
> 
> This is not a job for the kernel, I don't think.

Why not? The kernel already reports a number of internal events via the
kqueue(2) interface; there is nothing wrong with adding yet another one.
BTW, Linux and IRIX already have similar functionality provided by /dev/imon.
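
For reference, the patch hands the event and the PID to the vnode's knote
list as a single hint (NOTE_STARTEXEC | p->p_pid). Below is a sketch of how
such a hint could be packed and split again, assuming the PID fits into the
NOTE_PDATAMASK bits. The SK_* event values are assumptions made for this
sketch only, and the helper names are hypothetical; none of this is an
existing kernel interface.

```c
#include <stdint.h>

#define SK_PDATAMASK	0x000fffffu	/* low 20 bits carry the PID */
#define SK_STARTEXEC	0x08000000u	/* assumed value, sketch only */
#define SK_STOPEXEC	0x10000000u	/* assumed value, sketch only */

/* Combine an event bit and a PID into one hint word. */
static uint32_t
hint_pack(uint32_t event, uint32_t pid)
{
	return (event | (pid & SK_PDATAMASK));
}

/* Extract the event bits from a hint word. */
static uint32_t
hint_event(uint32_t hint)
{
	return (hint & ~SK_PDATAMASK);
}

/* Extract the PID from a hint word. */
static uint32_t
hint_pid(uint32_t hint)
{
	return (hint & SK_PDATAMASK);
}
```

The scheme only works while every event bit lies above the PID mask, which
is why the vnode flags cannot share the low bits with the PID.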

> Implement it in userland
> in terms of having the daemon write to a pidfile at startup, and have SIGUSR1
> make it tell the sender it's alive (using my sigq stuff this is trivial, just
> send SIGUSR2 back), and periodically read the pidfile and try to communicate
> with the daemon, and respawn it if it fails.  This could be racy if done
> poorly.  However if you want this for *any* executable, rather than just
> "some arbitrary executable" rather than some specific job, then while I wonder
> how useful it is in a generic concept, the kq solution might be more
> reasonable.
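
The userland alternative described above can be sketched roughly as
follows. The sketch uses kill(2) with signal 0 as the liveness probe rather
than the SIGUSR1/SIGUSR2 handshake, and the helper name daemon_alive() is
hypothetical. As noted in the quoted text, such a check is inherently racy:
the PID in the file may already belong to another process.

```c
#include <sys/types.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>

/*
 * Returns 1 if the process named in pidfile exists, 0 if it does not,
 * and -1 if the pidfile cannot be read or parsed.
 */
static int
daemon_alive(const char *pidfile)
{
	FILE *fp;
	long pid;
	int ok;

	fp = fopen(pidfile, "r");
	if (fp == NULL)
		return (-1);
	ok = (fscanf(fp, "%ld", &pid) == 1 && pid > 0);
	fclose(fp);
	if (!ok)
		return (-1);
	/* Signal 0 only performs the existence and permission checks. */
	if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
		return (1);
	return (0);
}
```

A supervisor would call this periodically and respawn the daemon when it
returns 0; it cannot tell when an arbitrary executable is started.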

-Maxim




Re: New kevent types: NOTE_STARTEXEC and NOTE_STOPEXEC

2002-10-27 Thread Maxim Sobolev
On Sun, Oct 27, 2002 at 01:24:19AM -0700, Juli Mallett wrote:
> * De: Maxim Sobolev <[EMAIL PROTECTED]> [ Data: 2002-10-27 ]
>   [ Subjecte: Re: New kevent types: NOTE_STARTEXEC and NOTE_STOPEXEC ]
> > On Sun, Oct 27, 2002 at 01:04:29AM -0700, Juli Mallett wrote:
> > > * De: Maxim Sobolev <[EMAIL PROTECTED]> [ Data: 2002-10-27 ]
> > >   [ Subjecte: Re: New kevent types: NOTE_STARTEXEC and NOTE_STOPEXEC ]
> > > > On Sat, Oct 26, 2002 at 06:09:31PM -0700, Nate Lawson wrote:
> > > > > On Thu, 24 Oct 2002, Maxim Sobolev wrote:
> > > > > > Please review the patch, which adds two new types of events -
> > > > > > NOTE_STARTEXEC and NOTE_STOPEXEC, that could be used to get
> > > > > > notification when the image starts or stops executing. For example, it
> > > > > > could be used to monitor that a daemon is up and running and notify
> > > > > > administrator when for some reason it exits. I am running this code
> > > > > > for more than a year now without any problems.
> > > > > > 
> > > > > > Any comments and suggestions are welcome.
> > > > > 
> > > > > Couldn't this just be done by init(8) and /etc/ttys?  Or inetd?  If you
> > > > > want to write your own, couldn't you use waitpid()?  Or a kevent() of
> > > > > EVFILT_PROC with NOTE_EXIT/NOTE_FORK?  I'm not sure I see the need for
> > > > > this.
> > > > 
> > > > EVFILT_PROC operates on pids, while NOTE_{START,STOP}EXEC operate on
> > > > vnodes - that is the main difference. Currently, you can't reliably
> > > > get a notification when the kernel starts executing some arbitrary
> > > > executable from your fs.
> > > 
> > > This is not a job for the kernel, I don't think.
> > 
> > Why not? The kernel already reports a number of internal events via the
> > kqueue(2) interface; there is nothing wrong with adding yet another one.
> > BTW, Linux and IRIX already have similar functionality provided by /dev/imon.
> 
> If you implemented a kq interface, and an imon wrapper, I'd be much more
> impressed.  A less-divergent interface to do this would be nice, even if
> kq is the backing.  In fact, especially if kq is the backing, kq is strong,
> kq will make us smart.

Actually, the only user of /dev/imon I am aware of is SGI's fam package
(file alteration monitor). I've already implemented a subset of it
using BSD kqueue as a backend instead of imon. Check the bsdfam project
on SourceForge for details.

-Maxim




Memory corruption in -STABLE on P4/2GHz

2002-11-17 Thread Maxim Sobolev
Hi there,

I'm observing very strange memory corruption problems with a 2GHz P4
system running 4.7 (security branch as of today). Under load
(make -j20 buildworld) the compiler or make(1) often dies with signal
11. I found in the mailing lists that there is a similar-looking problem
with -current; any chance that -stable is affected as well?

Adding `options DISABLE_PSE', as suggested, reduced the likelihood
of the problem, but didn't eliminate it completely (-j20 still fails
with sig11 from time to time, but much less frequently than without
the said option).

Any ideas?

-Maxim


Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.7-RELEASE-p2 #0: Sun Nov 17 05:12:06 PST 2002
[EMAIL PROTECTED]:/usr/src/sys/compile/INSTALL
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium 4 (1990.24-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf24  Stepping = 4
  
Features=0x3febf9ff,ACC>
real memory  = 503250944 (491456K bytes)
config> en apm0
config> q
avail memory = 484413440 (473060K bytes)
Preloaded elf kernel "kernel" at 0xc050d000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc050d09c.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 5 entries at 0xc00fdec0
apm0:  on motherboard
apm: found APM BIOS v1.2, connected at v1.2
npx0:  on motherboard
npx0: INT 16 interface
pcib0:  on motherboard
pci0:  on pcib0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pci1:  at 0.0 irq 11
ohci0:  mem 0xec100000-0xec100fff irq 11 at device 8.0 on pci0
usb0: OHCI version 1.0
usb0:  on ohci0
usb0: USB revision 1.0
uhub0: NEC OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1:  mem 0xec101000-0xec101fff irq 5 at device 8.1 on pci0
usb1: OHCI version 1.0
usb1:  on ohci1
usb1: USB revision 1.0
uhub1: NEC OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0:  at 8.2 irq 10
rl0:  port 0xd000-0xd0ff mem 0xec103000-0xec1030ff irq 10 at device 10.0 on pci0
rl0: Ethernet address: 00:e0:4c:77:20:ce
miibus0:  on rl0
rlphy0:  on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0:  at device 17.0 on pci0
isa0:  on isab0
atapci0:  port 0xd400-0xd40f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0:  (vendor=0x1106, dev=0x3059) at 17.5 irq 10
orm0:  at iomem 0xc-0xcbfff on isa0
fdc0:  at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0:  at port 0x60,0x64 on isa0
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0:  at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
ad0: DMA limited to UDMA33, non-ATA66 cable or device
ad0: 76319MB  [155061/16/63] at ata0-master UDMA33
acd0: CDROM  at ata1-slave PIO4
Mounting root from ufs:/dev/ad0s2a
pid 70813 (make), uid 0: exited on signal 11 (core dumped)



Re: Memory corruption in -STABLE on P4/2GHz

2002-11-17 Thread Maxim Sobolev
On Sun, Nov 17, 2002 at 07:54:48PM -0500, Craig Rodrigues wrote:
> On Sun, Nov 17, 2002 at 11:16:54PM +0200, Maxim Sobolev wrote:
> > Hi there,
> > 
> > I'm observing very strange memory corruption problems with 2GHz P4
> > system running 4.7 (security branch as of today). Under the load
> > (make -j20 buildworld) the compiler or make(1) often die with signal
> > 11. I found in mailing lists that there is similarly looking problem
> > with -current, any chances that -stable is affected as well?
> 
> I'm seeing similar errors on -current on my AMD K6-2 machine:
> 
> CPU: AMD-K6(tm) 3D processor (400.91-MHz 586-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x58c  Stepping = 12
>   Features=0x8021bf
>   AMD Features=0x8800
> Data TLB: 128 entries, 2-way associative
> Instruction TLB: 64 entries, 1-way associative
> L1 data cache: 32 kbytes, 32 bytes/line, 2 lines/tag, 2-way associative
> L1 instruction cache: 32 kbytes, 32 bytes/line, 2 lines/tag, 2-way associative
> Write Allocate Enable Limit: 384M bytes
> Write Allocate 15-16M bytes: Enable
> 
> I am seeing make or /usr/libexec/cc1 intermittently coredump with SIG 11 or 
> SIG 10 errors when trying to do a buildworld.
> I wasn't sure if it was because I had flaky hardware or not.

It is likely that those aren't related. My K6-2/500, which I had a
while back, was also producing SIG 11, due to overheating. Another
possible reason is memory - you should check that you have PC100,
not PC66, installed, because the K6-2/400 runs with a 100MHz FSB.

In my case, possible overheating is ruled out by keeping the
case fully open, but it doesn't help much.

-Maxim





Bumping MAXCPU on amd64?

2010-09-22 Thread Maxim Sobolev

Hi,

Is there any reason to keep MAXCPU at 16 in the default kernel config? 
There are quite a few servers on the market today that have 24 or even 32 
physical cores. With hyper-threading this can even go as high as 48 or 
64 virtual CPUs. People who buy such hardware might be very 
disappointed to find out that FreeBSD is not going to use it 
to its full potential.


Does anybody object if I bump MAXCPU to 32, which is still low but 
might be a more reasonable default these days, or at least make it a 
kernel configuration option documented in NOTES?


Thanks!

-Maxim
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Bumping MAXCPU on amd64?

2010-09-22 Thread Maxim Sobolev

On 9/22/2010 6:37 AM, John Baldwin wrote:

Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for
existing klds.


Ah, OK, sorry, I only checked RELENG_7. Can we make it a kernel option 
then?


Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sa...@sippysoft.com
Skype: SippySoft


Re: Improving geom_mirror(4)'s read balancing

2009-04-28 Thread Maxim Sobolev

Ivan Voras wrote:

Maxim Sobolev wrote:


The patch is available here:
http://sobomax.sippysoft.com/~sobomax/geom_mirror.diff. I would like to
get input on the functionality/code itself, as well on what is the best
way to add this functionality. Right now, it's part of the round-robin
balancing code. Technically, it could be added as a separate new
balancing method, but for the reasons outlined above I really doubt
having "pure" round-robin has any practical value now. The only case
where previous behavior might be beneficial is with solid-state/RAM
disks where there is virtually no seek time, so that by reading close
sectors from two separate disks one could actually get a better speed.
At the very least, the new method should become default, while "old
round-robin" be another option with clearly documented shortcomings. I
would really like to hear what people think about that.


Have you perhaps seen this:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/113885

I'm using the patch in the PR and it helps a bit, similar to what you
have seen. Pawel is silent about the issue so I guess it can also be
taken as silent approval :)


Oh, great! I am curious whether there is any background behind the
"distance to use delay" metric. To me it seems the current number of
outstanding requests is much more important when selecting between disk
X and disk Y. I am not a storage expert, though, so I could be wrong.
One way or another, the load balancing has to be improved, and the
new, more intelligent scheduling IMHO should be the default one.
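
To make the "number of outstanding requests" idea concrete, here is a
minimal sketch in Python. It is not the code from either patch; the
Disk class, the disk names and the helper functions are hypothetical
and only illustrate picking the least-loaded mirror component for the
next read.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Hypothetical mirror component with a per-disk queue depth counter."""
    name: str
    outstanding: int = 0  # requests issued but not yet completed


def pick_disk(disks):
    """Return the disk with the fewest outstanding requests.

    min() keeps the first minimum it sees, so ties go to the disk
    that appears earlier in the list.
    """
    return min(disks, key=lambda d: d.outstanding)


def submit(disks):
    """Route one read to the least-loaded disk and account for it."""
    disk = pick_disk(disks)
    disk.outstanding += 1
    return disk


def complete(disk):
    """Account for a finished request."""
    disk.outstanding -= 1
```

A real scheduler would probably combine this with some notion of head
position; the sketch only covers the queue-depth part.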


-Maxim


Re: C99: Suggestions for style(9)

2009-05-01 Thread Maxim Sobolev

Christoph Mallon wrote:

Roman Divacky schrieb:

I like the part about using as many variables as possible because
of documentation and performance enhancements. I tend to like
the other changes as well..


This is not about using as many variables as possible. The goal is to 
use as many variables as you have logically distinct entities in the 
function. I suppose, this is what you mean, but I want to clarify this 
point.


Why not just put "logically distinct entities" into separate functions
of their own? Reaching this point is a good indicator that
refactoring is due.


-Maxim


heap limits: mmap(2) vs. break(2) on i386

2009-11-27 Thread Maxim Sobolev

Hi,

I am trying to figure out why java fails to start with 1024MB of heap on
i386 with 4GB of RAM and 4GB of swap. Both MAXDSIZ and DFLDSIZ are set
to 2GB. Here are my limits:


Resource limits (current):
  cputime          infinity secs
  filesize         infinity kB
  datasize          2097152 kB
  stacksize           65536 kB
  coredumpsize     infinity kB
  memoryuse        infinity kB
  memorylocked     infinity kB
  maxprocesses         5547
  openfiles               2
  sbsize           infinity bytes
  vmemoryuse       infinity kB

Running ktrace I see:

  9154 java     CALL  mmap(0,0x4400,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_NORESERVE|MAP_ANON,0x,0,0)
  9154 java     RET   mmap -1 errno 12 Cannot allocate memory
  9154 java     CALL  write(0x1,0xbf9fe378,0x2b)
  9154 java     GIO   fd 1 wrote 43 bytes
       "Error occurred during initialization of VM

I made a small program that uses malloc(3) to allocate the same amount
of memory, and that works nicely; ktrace reveals why:


 10108 a.out    CALL  mmap(0,0x4400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,0x,0,0)
 10108 a.out    RET   mmap -1 errno 12 Cannot allocate memory
 10108 a.out    CALL  break(0x4c10)
 10108 a.out    RET   break 0

So the question is: why does mmap() fail while essentially the same
sbrk() request succeeds? This is really bad since, while native FreeBSD
programs can work around this by using malloc(3), Linux programs and
software that knows nothing about the intricate details of the FreeBSD VM
(e.g. Java) will fail miserably.
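
For anyone who wants to probe how large an anonymous mapping a process
can get without involving the JVM, a rough sketch in Python follows.
It only uses the standard mmap module; the helper name is made up for
this example, and the flags Python passes are not identical to the
ones in the Java trace above, so treat it as an approximation.

```python
import mmap


def try_anon_mmap(size):
    """Try to create an anonymous mapping of `size` bytes.

    Returns (True, None) on success and (False, exception) on failure.
    Passing -1 as the file descriptor asks for memory that is not
    backed by any file.
    """
    try:
        region = mmap.mmap(-1, size)
    except (OSError, ValueError, OverflowError, MemoryError) as exc:
        return False, exc
    region.close()
    return True, None
```

Calling it with increasing sizes (for example in 64MB steps) gives a
quick estimate of the largest contiguous region mmap can still hand
out under the current limits.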


I tried increasing vm.max_proc_mmap to 2147483647 from the default of
49344, but it did not do any good. mmap() still fails for a request of
this size.


I have seen several threads on the issue over the years, but still no
resolution. It seems that the only plausible solution is to limit the
heap size in java, which may not work in all cases.


Funny thing is that the first sentence of the sbrk(2) manual page says:

 The brk() and sbrk() functions are legacy interfaces from before
 the advent of modern virtual memory management.

Yet the "legacy interfaces" seem to do a much better job than the
"modern virtual memory management interfaces"!


-Maxim


Re: heap limits: mmap(2) vs. break(2) on i386

2009-11-27 Thread Maxim Sobolev

Jason Evans wrote:

Maxim Sobolev wrote:
I am trying to figure out why java fails to start with 1024MB of heap 
on i386 with 4GB of RAM and 4GB of swap. Both MAXDSIZ and DFLDSIZ are 
set to 2GB.


Some memory (1GiB?) is reserved for kernel address space, and you 
reserved 2GiB for DSS.  That leaves less than 1GiB available after 
shared libraries and whatnot are mapped.  If there is more than 1GiB 
available, mmap can still fail due to the memory being non-contiguous.


Jason,

So, are you saying that by allocating 2GB to MAXDSIZ, I limit myself
to less than 1GB left to be allocated via mmap()?


Perhaps the cause of the problem is that my interpretation of MAXDSIZ,
as an overall limit on the VM that a process can allocate regardless
of the memory management interface, is wrong, and in fact the process
can allocate up to MAXDSIZ using sbrk(2) and then some extra using
mmap(2), up to 3GB?


I tried lowering DFLDSIZ to 1.5GB, and it helped with Java. What is the
best strategy if I want to maximize the amount of memory available to
applications? Most modern applications use mmap(), don't they? Then
where can MAXDSIZ bite me if I set it to, say, 512MB?


-Maxim


Re: heap limits: mmap(2) vs. break(2) on i386

2009-11-27 Thread Maxim Sobolev

Jason Evans wrote:

Maxim Sobolev wrote:

Jason Evans wrote:

Maxim Sobolev wrote:
I am trying to figure out why java fails to start with 1024MB of 
heap on i386 with 4GB of RAM and 4GB of swap. Both MAXDSIZ and 
DFLDSIZ are set to 2GB.


Some memory (1GiB?) is reserved for kernel address space, and you 
reserved 2GiB for DSS.  That leaves less than 1GiB available after 
shared libraries and whatnot are mapped.  If there is more than 1GiB 
available, mmap can still fail due to the memory being non-contiguous.


So, are you saying that by allocating 2GB to MAXDSIZ, I limit myself 
less than 1GB left to be allocated via mmap()?


Yes, my recollection is that MAXDSIZ controls the amount of virtual 
address space dedicated to DSS, and this address space will not be 
mapped via anonymous mmap.  I wanted to move completely away from using 
sbrk in malloc, but we can't completely remove DSS for backward 
compatibility reasons, which means less heap address space than would be 
ideal.
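
The arithmetic behind this can be written down as a small Python
sketch. The 4GiB total is simply the size of a 32-bit address space;
the 1GiB kernel reservation is the guess made above ("1GiB?"), not a
verified figure, and the function name is made up for illustration.
Shared libraries, stacks and fragmentation reduce the result further.

```python
GIB = 1 << 30


def mmap_budget(total_va, kernel_va, maxdsiz):
    """Upper bound on user address space left for mmap(2) mappings.

    total_va  - size of the whole virtual address space
    kernel_va - part reserved for the kernel (assumed, see above)
    maxdsiz   - part set aside for the sbrk(2)-managed data segment
    """
    return total_va - kernel_va - maxdsiz


# MAXDSIZ values mentioned in this thread: 2GB, 512MB and 0.
for maxdsiz in (2 * GIB, GIB // 2, 0):
    left = mmap_budget(4 * GIB, 1 * GIB, maxdsiz)
    print(f"MAXDSIZ={maxdsiz / GIB:.1f}GiB -> "
          f"at most {left / GIB:.1f}GiB for mmap")
```

With MAXDSIZ at 2GB the bound is 1GiB before anything else is mapped,
which is why a 1024MB Java heap requested through mmap() does not fit.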


What is the best strategy if I want to maximize amount of memory 
available to applications? Most of modern applications use mmap(), 
isn't it? Then where MAXDSIZ can bite me if I set it to say 512MB?


I would set MAXDSIZ to 0, so that the maximum amount of memory is 
available for mapping shared libraries and files, and allocating via 
malloc.  This may cause problems with a couple of ports that implement 
their own memory allocators based on sbrk, but otherwise it should be 
all good.  You might also set /etc/malloc.conf  to 'd' in order to 
disable the sbrk calls.


I see, thank you for the explanation. One of the problems that we
have is that we use a lot of interpreted languages in our environment
(python, php etc.), and most of those implement their own memory
allocators, some of which unfortunately rely on sbrk(2). I believe
that's where that 2GB limit of ours comes from - one of our Python
applications is very memory hungry and we had to bump that limit to
allow it sufficient room.


Crazy idea, perhaps, but has anyone considered wrapping sbrk(2) into
mmap(2), so that there is only one memory pool to draw from? Switching
to 64-bit certainly helps, but there are a lot of 32-bit machines
hanging around, and we will see them for a while in the embedded space.
The current situation with two separate sources of heap memory is
certainly not normal.


-Maxim


Sudden mbuf demand increase and shortage under the load

2010-02-15 Thread Maxim Sobolev

Hi,

Our company has a FreeBSD-based product that consists of numerous
interconnected processes and does some high-PPS UDP processing
(30-50K PPS is not uncommon). We are seeing strange periodic
failures under load on several such systems, which usually show up
as IPC (even through unix domain sockets) suddenly either
breaking down or pausing and recovering only some time later (like 5-10
minutes). The only sign of failure I managed to find was an increase in
the "requests for mbufs denied" counter in netstat -m, with the total
number of mbuf clusters rising up to the nmbclusters limit.


I have tried raising some network-related limits (most notably maxusers
and nmbclusters), but it has not helped with the issue - it still
happens to us from time to time. Below you can find the output of
netstat -m taken within a few minutes of such a shortage period - you
can see that somehow the system has allocated a huge amount of memory
for the network (700MB), with only a tiny amount of that actually in
use. This is with kern.ipc.nmbclusters set to 302400. Eventually the
system reclaims all that memory and goes back to its normal use of
30-70MB.


This problem is killing us, so any suggestions are greatly appreciated.
My current hypothesis is that due to some issue either with the network
driver or with the network subsystem itself, the system goes insane and
"eats" up all mbufs up to the nmbclusters limit. But since mbufs are
shared between network and local IPC, IPC goes down as well.


We observe this issue on systems using both the em(4) and igb(4)
drivers. I believe both drivers share the same design; however, I am
not sure whether this is some kind of design flaw in the driver or part
of a larger problem with the network subsystem.


This happens on amd64 7.2-RELEASE and 7.3-PRERELEASE alike, with 8GB of
memory. I have not tried upgrading to 8.0; this is a production system,
so upgrading will not be easy. I am not aware of any differences that
would give us hope that this problem will go away after an upgrade, but
I can try it as a last resort.


As I said, this is a very critical issue, so I can provide any
additional debug information upon request. We are ready to go as far as
paying somebody a reasonable amount of money for tracking down and
resolving the issue.


Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sa...@sippysoft.com
Skype: SippySoft


[ssp-r...@ds-467 /usr/src]$ netstat -m
17061/417669/434730 mbufs in use (current/cache/total)
10420/291980/302400/302400 mbuf clusters in use (current/cache/total/max)
10420/0 mbuf+clusters out of packet secondary zone in use (current/cache)
19/1262/1281/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/25600 9k jumbo clusters in use (current/cache/total/max)
0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
25181K/693425K/718606K bytes allocated to network (current/cache/total)
1246681/129567494/67681640 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

[FEW MINUTES LATER]

[ssp-r...@ds-467 /usr/src]$ netstat -m
10001/84574/94575 mbufs in use (current/cache/total)
6899/6931/13830/302400 mbuf clusters in use (current/cache/total/max)
6899/6267 mbuf+clusters out of packet secondary zone in use (current/cache)
2/1151/1153/51200 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/25600 9k jumbo clusters in use (current/cache/total/max)
0/0/0/12800 16k jumbo clusters in use (current/cache/total/max)
16306K/39609K/55915K bytes allocated to network (current/cache/total)
1246681/129567494/67681640 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
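
The "allocated but barely used" pattern in the first netstat -m output
can be pulled out mechanically. The following Python sketch parses the
"mbuf clusters in use" line; the function name and the returned keys
are made up for this example, and the sample line is copied from the
output above.

```python
import re

# Sample line copied from the first netstat -m output above.
LINE = ("10420/291980/302400/302400 mbuf clusters in use "
        "(current/cache/total/max)")


def cluster_usage(line):
    """Parse the 'mbuf clusters in use' line of netstat -m.

    Returns the four counters plus two percentages relative to the
    limit: how much is allocated and how much is really in use.
    """
    m = re.match(r"\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", line)
    if m is None:
        raise ValueError("not an mbuf cluster line")
    current, cache, total, limit = map(int, m.groups())
    return {
        "current": current,
        "cache": cache,
        "total": total,
        "max": limit,
        "allocated_pct": 100.0 * total / limit,
        "in_use_pct": 100.0 * current / limit,
    }
```

For the line above this reports the zone as fully allocated while only
a few percent of the clusters are in active use; the rest sits in the
cache.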


Re: Sudden mbuf demand increase and shortage under the load

2010-02-15 Thread Maxim Sobolev

Sergey Babkin wrote:

Maxim Sobolev wrote:

Hi,

Our company have a FreeBSD based product that consists of the numerous
interconnected processes and it does some high-PPS UDP processing
(30-50K PPS is not uncommon). We are seeing some strange periodic
failures under the load in several such systems, which usually evidences
itself in IPC (even through unix domain sockets) suddenly either
breaking down or pausing and restoring only some time later (like 5-10
minutes). The only sign of failure I managed to find was the increase of
the "requests for mbufs denied" in the netstat -m and number of total
mbuf clusters (nmbclusters) raising up to the limit.


As a simple idea: UDP is not flow-controlled. So potentially
nothing stops an application from sending the packets as fast
as it can. If it's faster than the network card can process,
they would start collecting. So this might be worth a try
as a way to reproduce the problem and see if the system has
a safeguard against it or not.

Another possibility: what happens if a process is bound to
an UDP socket but doesn't actually read the data from it?
FreeBSD used to be pretty good at it, just throwing away
the data beyond a certain limit, SVR4 was running out of
network memory. But it might have changed, so might be
worth a look too.


Thanks. Yes, the latter could be actually the case. The former is less 
likely since the system doesn't generate so much traffic by itself, but 
rather relays what it receives from the network pretty much in 1:1 
ratio. It could happen though, if somehow the output path has been 
stalled. However, netstat -I igb0 shows zero Oerrs, which I guess means 
that we can rule that out too, unless there is some bug in the driver.


So we are looking for potential issues that can cause the UDP forwarding
application to stall and not dequeue packets in time. So far we have
identified some culprits in the application logic that can cause such
stalls in the unlikely event of gettimeofday() time going backwards.
I've seen some messages from ntpd around the time of the problem,
although it's unclear whether those are a result of the mbuf shortage
or could indicate the root issue. We've also added some debug output to
catch any abnormalities in the processing times.
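
The application code is not shown here, so the following is only a
sketch of the general hazard, in Python, with a made-up class name:
interval arithmetic done on a wall clock can produce a negative result
when the clock is stepped back, while a monotonic clock cannot.

```python
import time


class IntervalTimer:
    """Measure the time between successive calls to elapsed().

    The default clock is time.monotonic(), which never goes backwards.
    A different clock can be injected to see what a stepped wall clock
    would do to the same arithmetic.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = clock()

    def elapsed(self):
        now = self._clock()
        delta = now - self._last
        self._last = now
        return delta


# Simulated wall clock that is stepped back by two seconds.
readings = iter([10.0, 8.0])
stepped = IntervalTimer(clock=lambda: next(readings))
print(stepped.elapsed())  # negative interval
```

Code that sleeps or schedules work based on such a negative interval
can stall in surprising ways; with the monotonic default, elapsed()
never returns a value below zero.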


In any case I am a little surprised at how easily FreeBSD can let
mbuf storage overflow. I'd expect it to be more aggressive in
dropping things received from the network once an application stalls.
Combined with the fact that we apparently use shared storage for
different kinds of network activity and perhaps IPC too, this gives an
easy opportunity for DoS attacks. To me, separate limits for separate
protocols or even classes of traffic (e.g. local/remote) would make
much sense.
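
To illustrate what "separate limits per class of traffic" could mean,
here is a toy accounting model in Python. It has nothing to do with
how the kernel allocates mbufs; the class name, the traffic class
names and the numbers are all invented for the example.

```python
class BufferPool:
    """Toy buffer accounting with an independent limit per traffic class."""

    def __init__(self, limits):
        # limits: mapping of traffic class name -> maximum buffers
        self._limits = dict(limits)
        self._used = {name: 0 for name in self._limits}

    def alloc(self, traffic_class, n=1):
        """Reserve n buffers; refuse if this class would exceed its limit."""
        if self._used[traffic_class] + n > self._limits[traffic_class]:
            return False
        self._used[traffic_class] += n
        return True

    def free(self, traffic_class, n=1):
        """Release n buffers, never dropping below zero."""
        self._used[traffic_class] = max(0, self._used[traffic_class] - n)

    def used(self, traffic_class):
        return self._used[traffic_class]
```

With limits such as {"remote": 1000, "local": 200}, a flood of remote
packets is refused once its own quota is gone, while local IPC can
still allocate from its separate share.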


Thanks to everybody for the useful tips and suggestions. I will do more
research along these lines and let you know once we either resolve the
case or I have more diagnostic output.


Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sa...@sippysoft.com
Skype: SippySoft


Re: Sudden mbuf demand increase and shortage under the load (igb issue?)

2010-02-16 Thread Maxim Sobolev

OK, here is some new data that I think rules out any issues with the
applications. Following Alfred's suggestion I have made a script to run
every second and output some system statistics:


date
netstat -m
vmstat -i
ps -axl
pstat -T
vmstat -z
sysctl -a

The problem hit us again several times today, and upon investigating
the log I found that the increase in mbuf usage happened in one step -
going from the normal 10% to 100% between two script runs. What is more
interesting is that those two consecutive runs were about 2 minutes
apart (instead of 1 second, as they should be), and when inspecting the
cron logs I noticed the same time gap there. I ruled out VM starvation
as the cause of the delay because the system has plenty of free
memory. The incoming network traffic was not sufficient to starve VM so
quickly either - it was about 7MB/sec at the time, so even if all
receivers had stopped draining their buffers it should have taken at
least 1-2 seconds to fill up the mbuf cache and create demand for
additional kernel memory. The failure would likely have been more
gradual, and I should have seen it building up in the debug log.
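
Spotting such gaps in a log of once-per-second samples is easy to
automate. The Python sketch below assumes the timestamps have already
been extracted from the log as seconds; the function name and the
factor of 10 used as the threshold are arbitrary choices made for this
example.

```python
def find_gaps(timestamps, period=1.0, factor=10.0):
    """Return (previous, current, delta) for every suspicious gap.

    timestamps - sample times in seconds, in the order they were logged
    period     - expected interval between samples
    factor     - how many periods a delay must exceed to be reported
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > period * factor:
            gaps.append((prev, cur, delta))
    return gaps


# Samples every second, then a two-minute hole like the one described.
print(find_gaps([0, 1, 2, 122, 123]))
```

Any interval it reports is a period during which the sampling script
itself did not get to run.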


So it looks like a kernel issue of some sort, which causes all userland
activity to cease for 2 minutes when the system reaches a certain load.
The mbuf build-up is only a by-product of this, not really a cause.
igb(4) is the primary suspect now, since we have other machines with
more load that do not have this problem, and none of them uses this
driver. The chip is the following:


i...@pci0:5:0:0: class=0x02 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
i...@pci0:5:0:1: class=0x02 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet

The hardware in question is a new HP DL160G6. I have also checked the
IPMI logs and sensors and have not found any issues there either. No
sensors reported out-of-range values and the chassis temperature is
within normal limits.


I am not sure how to debug this problem further. We are now
looking into installing an external non-igb card in the server
to see if it solves the issue.


I have the whole log if anyone wants to take a closer peek.

Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sa...@sippysoft.com
Skype: SippySoft


Re: Sudden mbuf demand increase and shortage under the load (igb issue?)

2010-02-18 Thread Maxim Sobolev

Folks,

Indeed, it looks like an igb(4) issue. Replacing the card with a
desktop-grade em(4)-supported card has fixed the problem for us. The
system has been happily pushing 110mbps worth of RTP traffic and 2000
concurrent calls without any problems for two days now.


e...@pci0:7:0:0: class=0x02 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet

em0:  port 0xec00-0xec1f mem 0xfbee-0xfbef,0xfbe0-0xfbe7,0xfbedc000-0xfbed irq 24 at device 0.0 on pci7
em0: Using MSIX interrupts
em0: [ITHREAD]
em0: [ITHREAD]
em0: [ITHREAD]
em0: Ethernet address: 00:1b:21:50:02:49

I really think that this has to be addressed before the 7.3 release is
out. FreeBSD used to be famous for its excellent network performance and
it's a shame to see that deteriorating due to sub-standard drivers,
especially when there is a multi-billion-dollar vendor supporting the
driver in question. No finger pointing, but it really looks like either
somebody is not doing his job or the said vendor doesn't care so much
about supporting FreeBSD. I am pretty sure the vendor in question has
access to numerous load-testing tools that should have caught this
issue.


This is the second time in the past 6 months that I have had an issue
with the quality of the Intel drivers - the first one is filed as
kern/140326, which has apparently stalled despite my providing all the
necessary debug information.


Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sa...@sippysoft.com
Skype: SippySoft


Re: Sudden mbuf demand increase and shortage under the load (igb issue?)

2010-02-18 Thread Maxim Sobolev

Jack Vogel wrote:
This thread is confusing, first he says its an igb problem, then you 
offer an em patch :)


I suspect it could be a patch for kern/140326.

-Maxim

