Re: libprocstat(3): retrieve process command line args and environment

2013-04-03 Thread Mikolaj Golub
The updated patch set:

http://people.freebsd.org/~trociny/procstat_core.4.patch

It includes changes discussed with Kostik. New NT_PROCSTAT_PSSTRINGS
and NT_PROCSTAT_AUXV notes are added. libprocstat(3) is extended with
functions to retrieve env, args and auxv (so the patch that started
this thread a couple of months ago has been merged to this patch set
too). procstat(1) is fully converted to work only via libprocstat(3),
all its options are supported for core dumps, except kernel stacks,
which are not useful in this case.

It looks for me to be in a good shape now and I am planning to start
committing things in a week or two, after some additional testing, if
there is no objections or other suggestions.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-31 Thread Mikolaj Golub
On Sun, Mar 31, 2013 at 04:40:47PM +0300, Konstantin Belousov wrote:

> I inspected imgact_elf.c:parse_note(), imgact_elf.c:putnote() and
> rtld.c:digest_notes(). Only  putnote() uses 8-byte alignment.
> Every other OS and our !coredump code assumes 4-byte alignment.

Thanks!
 
> Does changing the putnote() to align on the 4-byte boundary cause
> real change in the core file notes layout ?

Currently, we store only 4 types of notes in a core file:

#define NT_PRSTATUS 1   /* Process status. */
#define NT_FPREGSET 2   /* Floating point registers. */
#define NT_PRPSINFO 3   /* Process state info. */
#define NT_THRMISC  7   /* Thread miscellaneous info. */

I checked the sizes of structures inserted into the notes, and on amd64
they all are multiple of 8:

(kgdb) p sizeof(prpsinfo_t) % 8
$1 = 0
(kgdb) p sizeof(prstatus_t) % 8
$2 = 0
(kgdb) p sizeof(prfpregset_t) % 8
$3 = 0
(kgdb) p sizeof(thrmisc_t) % 8
$4 = 0

so both 4-byte and 8-byte aligned.

I believe that the patch below will not change the current core file
notes layout, will make things consistent in our tree, and will make
adding my procstat notes easier, if I use 4-byte alignment.

Are you ok if I commit it before introducing my changes?

Index: sys/kern/imgact_elf.c
===
--- sys/kern/imgact_elf.c   (revision 248706)
+++ sys/kern/imgact_elf.c   (working copy)
@@ -1538,10 +1538,10 @@ __elfN(putnote)(void *dst, size_t *off, const char
*off += sizeof note;
if (dst != NULL)
bcopy(name, (char *)dst + *off, note.n_namesz);
-   *off += roundup2(note.n_namesz, sizeof(Elf_Size));
+   *off += roundup2(note.n_namesz, sizeof(Elf32_Size));
if (dst != NULL)
bcopy(desc, (char *)dst + *off, note.n_descsz);
-   *off += roundup2(note.n_descsz, sizeof(Elf_Size));
+   *off += roundup2(note.n_descsz, sizeof(Elf32_Size));
 }
 
 static boolean_t

Also, shouldn't we update then the following comment in sys/elf_common.h?

/*
 * Note header.  The ".note" section contains an array of notes.  Each
 * begins with this header, aligned to a word boundary.  Immediately
 * following the note header is n_namesz bytes of name, padded to the
 * next word boundary.  Then comes n_descsz bytes of descriptor, again
 * padded to a word boundary.  The values of n_namesz and n_descsz do
 * not include the padding.
 */

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-29 Thread Mikolaj Golub
On Fri, Mar 29, 2013 at 11:22:45AM +0200, Konstantin Belousov wrote:
> On Thu, Mar 28, 2013 at 11:18:21PM +0200, Mikolaj Golub wrote:
> > On Thu, Mar 28, 2013 at 12:51:34PM +0200, Konstantin Belousov wrote:
> > 
> > > In the generic Elf 64bit draft specification I have, the notes sections
> > > are specified to consists of entries, each of which is an array of 8-byte
> > > words. I think we are right using the 8-byte alignment.
> > 
> > I have impression many implementations use 4-byte alignment. E.g. in
> > NetBSD:
> > 
> > sys/kern/core_elf32.c:
> > 
> > #define ELFROUNDSIZE4   /* XXX Should it be sizeof(Elf_Word)? */
> > #define elfround(x) roundup((x), ELFROUNDSIZE)
> Note that this is core_elf32. I am concerned with the 64-bit cores.

core_elf64.c:

#define ELFSIZE 64

#include "core_elf32.c"

> > 
> > Also, we have inconsistency with imgactl_elf.c/parse_notes(), which
> > uses 4-byte alignment:
> > 
> > note = (const Elf_Note *)((const char *)(note + 1) +
> > roundup2(note->n_namesz, sizeof(Elf32_Addr)) +
> > roundup2(note->n_descsz, sizeof(Elf32_Addr)));
> > 
> > I suppose there were no issues before, because accidentally the sizes
> > of all notes we had were 8 bytes aligned.
> Indeed, both ABI and NOINIT notes have size which is multiple of 8.
> 
> > 
> > Now, when I add new notes it will break things. I don't have strong
> > opinion, it will be ok for me to leave 8-byte alignment and fix
> > issues, just want to have strong support here :-)
> Well, while the issue is discussed and decided, you could just make
> your new notes size be multiple of 8 too.

I thought about this too. Then I need to be more caerful when
extracting stats from notes, because the length returned by
procstat_core_get() can be more than a real payload.

Ok, I will try this way.

I could add length to the note header, which is currently contains
only structsize, so it would became something like:

struct {
  int structsize;
  int lenght;
}

But not sure it is worth doing, especially if the forced 8-bit
alignment is a temporary mesure.

> > 
> > BTW, looking at NetBSD code I see they set p_align in the note
> > segement to ELFROUNDSIZE:
> > 
> > /* Write out the PT_NOTE header. */
> > ws.psections->p_type = PT_NOTE;
> > ws.psections->p_offset = notestart;
> > ws.psections->p_vaddr = 0;
> > ws.psections->p_paddr = 0;
> > ws.psections->p_filesz = notesize;
> > ws.psections->p_memsz = 0;
> > ws.psections->p_flags = PF_R;
> > ws.psections->p_align = ELFROUNDSIZE;
> > 
> > while we set to 0:
> > 
> > /* The note segement. */
> > phdr->p_type = PT_NOTE;
> > phdr->p_offset = hdrsize;
> > phdr->p_vaddr = 0;
> > phdr->p_paddr = 0;
> > phdr->p_filesz = notesz;
> > phdr->p_memsz = 0;
> > phdr->p_flags = 0;
> > phdr->p_align = 0;
> You mean, for the core dumps ?

yes

> > 
> > Shouldn't we set it to alignment size too? Note also, they set
> > "Segment is readable" flag.
> I think both changes are fine.
> 
> I skimmed over the usermode parts of the patch. One thing I did not liked
> is the mis-handling of the read() return values. If there is short read,
> the errno value is meaningless, but warn() would still append it to
> the message. I suggest to explicitely distinguish -1 and >= 0 returns
> from reads.

ok. Thanks.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-28 Thread Mikolaj Golub
On Thu, Mar 28, 2013 at 12:51:34PM +0200, Konstantin Belousov wrote:

> In the generic Elf 64bit draft specification I have, the notes sections
> are specified to consists of entries, each of which is an array of 8-byte
> words. I think we are right using the 8-byte alignment.

I have impression many implementations use 4-byte alignment. E.g. in
NetBSD:

sys/kern/core_elf32.c:

#define ELFROUNDSIZE4   /* XXX Should it be sizeof(Elf_Word)? */
#define elfround(x) roundup((x), ELFROUNDSIZE)

Also, we have inconsistency with imgactl_elf.c/parse_notes(), which
uses 4-byte alignment:

note = (const Elf_Note *)((const char *)(note + 1) +
roundup2(note->n_namesz, sizeof(Elf32_Addr)) +
roundup2(note->n_descsz, sizeof(Elf32_Addr)));

I suppose there were no issues before, because accidentally the sizes
of all notes we had were 8 bytes aligned.

Now, when I add new notes it will break things. I don't have strong
opinion, it will be ok for me to leave 8-byte alignment and fix
issues, just want to have strong support here :-)

BTW, looking at NetBSD code I see they set p_align in the note
segement to ELFROUNDSIZE:

/* Write out the PT_NOTE header. */
ws.psections->p_type = PT_NOTE;
ws.psections->p_offset = notestart;
ws.psections->p_vaddr = 0;
ws.psections->p_paddr = 0;
ws.psections->p_filesz = notesize;
ws.psections->p_memsz = 0;
ws.psections->p_flags = PF_R;
ws.psections->p_align = ELFROUNDSIZE;

while we set to 0:

/* The note segement. */
phdr->p_type = PT_NOTE;
phdr->p_offset = hdrsize;
phdr->p_vaddr = 0;
phdr->p_paddr = 0;
phdr->p_filesz = notesz;
phdr->p_memsz = 0;
phdr->p_flags = 0;
phdr->p_align = 0;

Shouldn't we set it to alignment size too? Note also, they set
"Segment is readable" flag.

> > 
> > 4) In libprocstat I added new functions and placed them under already
> > existent FBSD_1.3 version section in Symbol.map.
> > 
> > Shouldn't I bump the version? Won't I need any additional care if I
> > want to MFC the code to stable/9 and may be 8?
> Version of what ? MFC does not require any additional actions, FBSD_1.3
> is the valid version namespace in 9 and 8.

Ok. Now I see it was rather silly question :-). Thanks. For this and
your other notes.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-24 Thread Mikolaj Golub
Here is an updated patch set, which I think includes all kib's
suggestions. It also introduces procstat groups, umask, rlimit, and
osrel notes.

http://people.freebsd.org/~trociny/procstat_core.3.patch

Sbuf section interface looks like below:

/* start a section */
sbuf_start_section(sb, NULL);
/* write something to sbuf */
...
/* start a subsection (need to save the current section length) */
sbuf_start_section(sb, &old_len);
/* write something to sbuf */
...
/* end a subsection padding to 4 bytes with '\0' bytes 
  (need to provide the previously saved section length) */
sbuf_end_section(sb, old_len, 4, 0);
...
 /* aling the whole section to page size */
sbuf_end_section(sb, -1, PAGE_SIZE, 0);

Open issues/questions:

1) I would also like to make libprocstat(3) extract environment, args,
and auxv from the core. It looks like I don't need to store envv and
argv in notes, as it is already present in the core. But I think I
need to know psstrings to find them.

Would it be ok, if I add auxv and psstrings notes, and extract envv
and agrv from a program section in the core?

2) I started NT_PROCSTAT constants from the first not used number in
elf_common.h, i.e. 8. But in this case they conflict with those
available on other systems:

contrib/binutils/include/elf/common.h:

#define NT_PSTATUS  10  /* Has a struct pstatus */
#define NT_FPREGS   12  /* Has a struct fpregset */
#define NT_PSINFO   13  /* Has a struct psinfo */
#define NT_LWPSTATUS16  /* Has a struct lwpstatus_t */
#define NT_LWPSINFO 17  /* Has a struct lwpsinfo_t */
#define NT_WIN32PSTATUS 18  /* Has a struct win32_pstatus */

Although note name = "FreeBSD" should have disambiguated this, readelf
looks like ignores this and its output for my core on i386 looks
confusing:

  Owner Data size   Description
  FreeBSD   0x006c  NT_PRPSINFO (prpsinfo structure)
  FreeBSD   0x0068  NT_PRSTATUS (prstatus structure)
  FreeBSD   0x00b0  NT_FPREGSET (floating point registers)
  FreeBSD   0x0018  NT_THRMISC (thrmisc structure)
  FreeBSD   0x0304  Unknown note type: (0x0008)
  FreeBSD   0x0a6c  Unknown note type: (0x0009)
  FreeBSD   0x08d4  NT_PSTATUS (pstatus structure)
  FreeBSD   0x000c  Unknown note type: (0x000b)
  FreeBSD   0x0006  NT_FPREGS (floating point registers)
  FreeBSD   0x00d4  NT_PSINFO (psinfo structure)
  FreeBSD   0x0008  Unknown note type: (0x000e)

Should I use some other range for NT_PROCSTAT notes?

3) Following our current code I align notes to sizeof(Elf_Size), which
is 4 on i386 and 8 on amd64.

But I have an issue reading the notes by readelf on amd64, which alway
uses 4 byte alignment:

contrib/binutils/binutils/readelf.c:

  next = (Elf_External_Note *)(inote.descdata + align_power (inote.descsz, 
2));

where

  #define align_power(addr, align)\
  (((addr) + ((bfd_vma) 1 << (align)) - 1) & ((bfd_vma) -1 << (align)))

Should I change alignment to 4 bytes?

4) In libprocstat I added new functions and placed them under already
existent FBSD_1.3 version section in Symbol.map.

Shouldn't I bump the version? Won't I need any additional care if I
want to MFC the code to stable/9 and may be 8?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-17 Thread Mikolaj Golub
On Sun, Mar 17, 2013 at 08:30:33AM +0200, Konstantin Belousov wrote:
> On Sun, Mar 17, 2013 at 12:35:20AM +0200, Mikolaj Golub wrote:
> > A KPI that would be natural for my case: 
> > 
> >   /* start a section that is going to be aligned to sizeof(Elf_Size),
> >  using byte '0' for padding */
> >   sbuf_padded_start(sb, sizeof(Elf_Size), 0);
> >   /* write something to sbuf */
> >   sbuf_bcat(sb, data, len);
> >   /* align the sbuf section */
> >   sbuf_pad(sb);
> > 
> > This might look a little awkward and would add some overhead for the normal
> > case though...
> This looks fine, in fact. You might want to call it sbuf_start_section()
> and sbuf_end_section() ?

Ok, will try this way. Thanks.

...

> All you need is to reset req->oldidx. This could be done outside the
> sbuf interface, in the top level function implementing the sysctl ?

I am afraid at this level I don't know a value to reset req->oldidx
to. Reseting it to 0 I think is not a good solution?

> What you propose in the follow-up message should work too, I do not
> have any preference.

Ok. Thanks.

> > > Indents after the else clauses in kern_proc_out() are wrong.
> > 
> > Do you mean indents after '#ifdef COMPAT_FREEBSD32' block? I did it
> > that way so if COMPAT_FREEBSD32 sections were removed from the code
> > the indentation would be correct. I saw this practice through the code
> > and used it myself before.
> The sections are not going to be removed. IMHO code should be formatted
> as if the preprocessor directive lines are not present. Could you point
> out an example of existing code consistent with your indentation ?

In kern/kern_proc.c my code (get_proc_vector, sysctl_kern_proc_auxv)
has such indentation. There were no objection when I introduced it, so
I thought it was a right way. But surely I didn't invented such
indentation myself. I don't recall if I used some particular examples
as a reference then, but here are several examples of such indentation
found by quick grep:

net/bpf.c-#ifdef BPF_JITTER
net/bpf.c-  bf = bpf_jitter_enable != 0 ? d->bd_bfilter : NULL;
net/bpf.c-  if (bf != NULL)
net/bpf.c-  slen = (*(bf->func))(pkt, pktlen, pktlen);
net/bpf.c-  else
net/bpf.c:#endif
net/bpf.c-  slen = bpf_filter(d->bd_rfilter, pkt, pktlen, pktlen);

kern/kern_conf.c-#if 0
kern/kern_conf.c-   if (dev->si_usecount == 0 &&
kern/kern_conf.c-   (dev->si_flags & SI_CHEAPCLONE) && (dev->si_flags & 
SI_NAMED))
kern/kern_conf.c-   ;
kern/kern_conf.c-   else 
kern/kern_conf.c:#endif
kern/kern_conf.c-   if (dev->si_devsw == NULL && dev->si_refcount == 0) {

kern/kern_jail.c-   if (SV_PROC_FLAG(td->td_proc, SV_ILP32)) {
kern/kern_jail.c-   uint32_t hid32 = pr->pr_hostid;
kern/kern_jail.c-
kern/kern_jail.c-   error = vfs_setopt(opts, "host.hostid", &hid32, 
sizeof(hid32));
kern/kern_jail.c-   } else
kern/kern_jail.c:#endif
kern/kern_jail.c-   error = vfs_setopt(opts, "host.hostid", &pr->pr_hostid,

netinet6/raw_ip6.c- /* Do not inject data into pcb. */
netinet6/raw_ip6.c- INP_RUNLOCK(last);
netinet6/raw_ip6.c- } else
netinet6/raw_ip6.c:#endif /* IPSEC */
netinet6/raw_ip6.c- if (last != NULL) {
netinet6/raw_ip6.c- if (last->inp_flags & INP_CONTROLOPTS ||

On the other hand there are many examples where indentation is used in
the way you prefer. I don't have any strong opinion about this so I
will do in the way you suggest.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-16 Thread Mikolaj Golub
On Sun, Mar 17, 2013 at 12:35:20AM +0200, Mikolaj Golub wrote:

> Ah, this is a thing I wanted to discuss but forgot! As I understand
> the idea of the 'ABI hack' is: if the output buffer is less than the
> size of data we have, truncate our data to the last successfully
> written kinfo_file structure and return without error.
> 
> In my code I do reset ENOMEM to 0 (see sysctl_kern_proc_filedesc), but
> I don't see a way how using sbuf interface I can truncate req->oldidx
> to the last successfully written file: sbuf_bcat() (to internal
> buffer) may be successful and the error might appear only later, when
> draining. I suspect it will break things if I return with a partial
> kinfo_file, but haven't come with a solution yet...

A solution I am going to try is to provide maxlen argument to
kern_proc_filedesc_out(), and if it is not 0, output files that do not
exceed the limit, so sysctl_kern_proc_filedesc would call:

  kern_proc_filedesc_out(p, &sb, req->oldlen);

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-16 Thread Mikolaj Golub
On Sat, Mar 16, 2013 at 09:16:05PM +0200, Konstantin Belousov wrote:

> IMO sbuf_pad() should be moved to subr_sbuf.c. I find the KPI of
> the sbuf_pad() wrong. You have two separate number, the amount to
> pad to, and the current amount. Natural interface would take the
> two numbers separate instead of the difference. Also, could the
> sbuf_pad() deduce the current amount on its own, this would be the
> best ?

Hm, I am not sure about this.

So are you proposing instead of something like this

  sbuf_pad(sb, roundup(x, y) - x, 0);

to have

  sbuf_pad(sb, x, roundup(x, y), 0)?

Although it is a little easier to write, it looks less intuitive for
me.  E.g. I have troubles how to document this and explain. I can't
reffer x as a current position in sbuf, because it might not be. It is
just a some position, roundup(x,y) is another one, and only their
difference makes sence for sbuf_pad, so why not just to provide this
difference? So

  sbuf_pad(sb, from, to, c);

looks for me less intutive than

  sbuf_pad(sb, len, c);

A KPI that would be natural for my case: 

  /* start a section that is going to be aligned to sizeof(Elf_Size),
 using byte '0' for padding */
  sbuf_padded_start(sb, sizeof(Elf_Size), 0);
  /* write something to sbuf */
  sbuf_bcat(sb, data, len);
  /* align the sbuf section */
  sbuf_pad(sb);

This might look a little awkward and would add some overhead for the normal
case though...

> In register_note(), put spaces around '|' for the malloc line.
> 
> It seems that you did not moved the 'ABI hack' for ENOMEM situation for
> sysctl_kern_proc_filedesc() into the rewritten function.
>

Ah, this is a thing I wanted to discuss but forgot! As I understand
the idea of the 'ABI hack' is: if the output buffer is less than the
size of data we have, truncate our data to the last successfully
written kinfo_file structure and return without error.

In my code I do reset ENOMEM to 0 (see sysctl_kern_proc_filedesc), but
I don't see a way how using sbuf interface I can truncate req->oldidx
to the last successfully written file: sbuf_bcat() (to internal
buffer) may be successful and the error might appear only later, when
draining. I suspect it will break things if I return with a partial
kinfo_file, but haven't come with a solution yet...

> Please commit the changes to use pget() in the sysctl handlers separately.
> 
> Indents after the else clauses in kern_proc_out() are wrong.

Do you mean indents after '#ifdef COMPAT_FREEBSD32' block? I did it
that way so if COMPAT_FREEBSD32 sections were removed from the code
the indentation would be correct. I saw this practice through the code
and used it myself before.

> Since you are moving the KERN_PROC_ZOMBMASK out of kern_proc.c,
> a comment is needed to describe its usage. I would find it very
> confusing if grep reveals no like of code that sets the flags.
> 
> In the comment for sbuf_drain_core_output(), s/drainig/draining/,
> s/sefely/safely/ and s/hold/call us with the process lock held/.
> 
> I do not see much sense in the kstack note. The core is dumped when
> the threads are moved into the safe state in the kernel, so you
> cannot catch 'living' stack backtraces for the kernel side of
> things.

I was afraid of it after I had tried it on real dumps :-( Ok, will
remove the kstack note.

> On the other hand, things like umask, resources and osrel might be
> of the interest for post-morted analysis.

This is in my TODO list.

> I did not looked at the usermode changes.

Thanks for your suggestions! Will do them.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-03-16 Thread Mikolaj Golub
On Wed, Feb 20, 2013 at 09:58:02PM +0200, Mikolaj Golub wrote:
> On Wed, Feb 20, 2013 at 09:04:14AM -0500, John Baldwin wrote:
> 
> > The process should be stopped by the time we dump a core, so running it
> > multiple times should be ok in that the sizes should not change.  I would
> > say that you should try to implement a "determine sizes" pass that doesn't
> > allocate anything, but others should comment on that.
> 
> I had a little talk with kib about this recently. Kib's main concern
> looked to be that a process with many threads/open files might require
> considerable amount of kernel memory if the procstat notes are
> prepared in memory before writing. So currently I am working on
> another approach, when on the first pass the sizes are found, and on
> the second pass procstat notes are written to coredump without
> preliminarily storing all notes in memory buffer. Hope, the code won't
> look very ugly...

Here is an updated patch:

http://people.freebsd.org/~trociny/procstat_core.2.patch

- The coredump routines are modified to be able to write notes
directly to a core file, via sbuf interface, without preliminary
preparing them all in a memory buffer.

- To write NT_PROCSTAT_* notes, the corresponding sysctl routines are
changed to provide kern_proc_filedesc_out(), kern_proc_kstack_out(),
kern_proc_out(), and kern_proc_vmmap_out() functions.

- libprocstat(3) is extended to extract procstat notes from a process
core file. Also new functions (procstat_getvmmap, procstat_kstack) are
added.

- procstat(1) is changed to use libprocstat(3) where it is possible
and to treat non-numeric command line arguments as core files.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-02-20 Thread Mikolaj Golub
On Wed, Feb 20, 2013 at 09:04:14AM -0500, John Baldwin wrote:

> The process should be stopped by the time we dump a core, so running it
> multiple times should be ok in that the sizes should not change.  I would
> say that you should try to implement a "determine sizes" pass that doesn't
> allocate anything, but others should comment on that.

I had a little talk with kib about this recently. Kib's main concern
looked to be that a process with many threads/open files might require
considerable amount of kernel memory if the procstat notes are
prepared in memory before writing. So currently I am working on
another approach, when on the first pass the sizes are found, and on
the second pass procstat notes are written to coredump without
preliminarily storing all notes in memory buffer. Hope, the code won't
look very ugly...

> One other thing to consider is if gcore needs to be updated to output these
> records as well.  It could use the sysctls to fetch the data and then write
> out appropriate notes I think, so perhaps it wouldn't be too difficult to add
> this as a followup commit once the kernel version has settled and the file
> format is set?

Looks like very interesting idea!

Thank you for all your comments and suggestions.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-02-12 Thread Mikolaj Golub
 the format first, but I would be
glad if somebody helps me with the problem I faced linking libelf to
libprocstat. With the modifications to the libprocstat makefile:

-DPADD= ${LIBKVM} ${LIBUTIL}
-LDADD= -lkvm -lutil
+DPADD= ${LIBELF} ${LIBKVM} ${LIBUTIL}
+LDADD= -lelf -lkvm -lutil

buildworld fails with the error:

make: don't know how to make 
/scratch2/tmp/trociny/obj/i386.i386/home/trociny/svn/base/head/tmp/usr/lib/libelf.a.
 Stop

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-01-23 Thread Mikolaj Golub
On Wed, Jan 23, 2013 at 11:31:43AM -0500, John Baldwin wrote:
> On Wednesday, January 23, 2013 2:25:00 am Mikolaj Golub wrote:
> > IMHO, after adding procstat_getargv and procstat_getargv, the usage of
> > kvm_getargv() and kvm_getenvv() (at least in the new code) may be
> > deprecated. As this is stated in the man page, BUGS section, "these
> > routines do not belong in the kvm interface". I suppose they are part
> > of libkvm because there was no a better place for them. procstat(1)
> > prefers direct sysctl to them (so, again, code duplication, which I am
> > going to remove adding procstat_getargv/envv).
> 
> Hmm, are you going to rewrite ps(1) to use libprocstat?  Or rather, is that a
> goal someday?  That is one current consumer of kvm_getargv/envv.  That might
> be fine if we want to make more tools use libprocstat instead of using libkvm
> directly.

I didn't have any plans for ps(1) :-) That is why I wrote about "new
code". But if you think it is good to do I might look at it one day...

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-01-22 Thread Mikolaj Golub
On Tue, Jan 22, 2013 at 02:17:39PM -0800, Stanislav Sedov wrote:
> 
> On Jan 22, 2013, at 1:48 PM, John Baldwin  wrote:
> > 
> > Well, you could make procstat open a kvm handle in both cases (open a 
> > "live" 
> > handle in the procstat_open_sysctl() case).  It just seems rather silly to 
> > be 
> > duplicating code in the two interfaces.

In this particular case I prefer code duplication to opening a kvm
handle in procstat_open_sysctl(), as it looks a bit confusing. But I
can do this way if the agreement is reached.

> > More a question for Robert: does 
> > libprocstat intentionally duplicate the code in libkvm for other things as 
> > well in the live case?  (Like fetching the list of processes?)
> > 
> It does not actually has a duplicate code, the code for fetching the list of
> processes via sysctl is different from the KVM case.  The open file 
> descriptors
> processing is different as well.  Because libprocstat implements almost the
> same functionality both for sysctl and mvm backends, it can be used to analyze
> both the live system and the kernel crash dumps.  The code Mikolaj proposed
> only implements the sysctl backend currently, so it does not seem to have
> any relation to KVM, so it will be a bit weird to make it open a KVM handle
> though it does not use it.

IMHO, after adding procstat_getargv and procstat_getargv, the usage of
kvm_getargv() and kvm_getenvv() (at least in the new code) may be
deprecated. As this is stated in the man page, BUGS section, "these
routines do not belong in the kvm interface". I suppose they are part
of libkvm because there was no a better place for them. procstat(1)
prefers direct sysctl to them (so, again, code duplication, which I am
going to remove adding procstat_getargv/envv).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: libprocstat(3): retrieve process command line args and environment

2013-01-22 Thread Mikolaj Golub
On Tue, Jan 22, 2013 at 12:01:06PM -0500, John Baldwin wrote:

> How is this different from kvm_getargv()?  It seems to be a direct copy.

libprocstat(3) is a frontend for sysctl(3) and kvm(3) interfaces, so
it is good to extend it to cover "getarg/env" functionality.

Yes the functions look similar to kvm_getargv() but I couldn't
implement them just as wrappers around kvm_getargv(): I would like to
have libprocstat functions thread safe, while kvm_getargv() uses
static variables for its internal buffers.

It looks like I could fix kvm_getargv() to use fields of kvm structure
instead of static variables to store pointers to the buffers, and then
use it in libprocstat(3). Do you think it is worth doing?

BTW, struct __kvm already contains some pointers, which looks like are
unused currently:

  char**argv; /* (dynamic) storage for argv pointers */
  int argc;   /* length of above (not actual # present) */
  char*argbuf;/* (dynamic) temporary storage */

But if I even had kvm_getargv() to behave as I wanted, there is still
an issue with using it in libprocstat(): to get kvm structure you need
to initialize procstat using procstat_open_kvm(). It is supposed to
call procstat_open_kvm() when you want to read from kernel memory,
while kvm_getargv() uses sysctl. So from a user point of you it would
be a litle confusing if she had to call procstat_open_kvm() to get
runtime args and env. If she wanted e.g. to get both runtime args and
file info (via sysctl) she would have to do procstat_open_kvm() for
args and procstat_open_sysctl() for files.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


libprocstat(3): retrieve process command line args and environment

2013-01-19 Thread Mikolaj Golub
Hi,

Some time ago Stanislav Sedov suggested to me extending libprocstat(3)
with functions to retrieve process command line arguments and
environment variables.

In the first approach I tried, the newly added functions
procstat_getargv/getenvv allocated a buffer of necessary size, stored
the values and returned to the caller:

http://people.freebsd.org/~trociny/libprocstat.1.patch

The problem with this approach was that when I updated procstat(1) to
use this interface, I observed noticeable performance degradation
(about 30% on systems with MALLOC_PRODUCTION off), due to memory
allocation overhead: the original procstat(1) reuses the buffer for
all its retrievals.

So my second approach was to add internal buffers to struct procstat,
which are used by procstat_getargv/getenvv to store values and reused
on the subsequent call:

http://people.freebsd.org/~trociny/libprocstat.2.patch

The drawback of this approach is that a user has to take care and
remember that a subsequent call rewrites argument vector obtained from
the previous call. On the other hand this is ok for typical use cases
while does not add allocation overhead, so I like this approach more.

I would like to commit this second patch, if there are no objections
or suggestions how to improve the things.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Changing `iostat -Ix' output

2012-09-18 Thread Mikolaj Golub
Hi,

I don't like very much what `iostat -Ix' outputs and would like to
change this.

A typical output:

% iostat -Ix
extended device statistics  
device r/i   w/ikr/ikw/i qlen svc_t  %b  
ada0 5599136.0 3953193.0 39982760.0 86819866.50  15.1  16 
cd0  552.0   0.0 4.3 0.00   0.0   0 
pass0333.0   0.0   166.5 0.00   0.0   0 
pass1  2.0   0.0 1.0 0.00   0.0   0 

Parameters like r/i, kr/i (total io operations/kbytes) are very
useful. They allow to use `iostat -Ix' to collect IO statistics
running it periodically (from cron or some monitoring tool) and
calculate average amount of operations or bytes per second at the
specified period subtracting the current value from the previous one
and dividing by time period.

But you can't do the same with % busy, which is very useful IO
characteristics. Average % busy at the specified period could be
calculated storing total busy time for the device at time t1, total
busy time at t2 and then subtracting the last value from the first (to
get busy time at this period) and dividing by the time period.

Currently iostat(8) does not provide 'total busy time' statistics. I
use sysutils/devstat for this but it would be nice if iostat(8) itself
provide such functionality.

I propose for `iostat -Ix` to output total busy time instead of %
busy, and also total duration of transactions instead of average
duration (to be able to calculate average duration for the period
between two iostat runs).

http://people.freebsd.org/~trociny/iostat.total_busy_time.1.patch

Average duration and % busy are still available via `iostat -x`.

Here is an output example

% ./iostat -Ix; sleep 60; ./iostat -Ix
extended device statistics  
device   r/i w/i kr/i kw/i qlen   tdur  
  sb  
ada0   5599785.0   3961913.0   39985960.5   86902385.50   144055.5   
35966.5 
cd0554.0 0.0  4.3  0.000.0  
 9.5 
pass0  336.0 0.0168.0  0.000.0  
17.5 
pass12.0 0.0  1.0  0.000.0  
 0.0 
extended device statistics  
device   r/i w/i kr/i kw/i qlen   tdur  
  sb  
ada0   5599922.0   3963177.0   40002608.0   86958230.50   144074.4   
35970.5 
cd0554.0 0.0  4.3  0.000.0  
 9.5 
pass0  336.0 0.0168.0  0.000.0  
17.5 
pass12.0 0.0  1.0  0.000.0  
 0.0 

So, for ada0, % busy for that period was 100 * (35970.5 - 35966.5) / 60 = 6.

And service time (assuming that only read and write operations were
serviced) was

1000 * (144074.4 - 144055.5) / (5599922 - 5599785 + 3963177 - 3961913)
= 13.4 msec.

What do you think about this?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: SuperPages utilization survey

2012-06-09 Thread Mikolaj Golub

On Sat, 9 Jun 2012 12:07:40 +0300 Konstantin Belousov wrote:

 KB> Well, if I see a report informing me that some 2M region contains 512 super
 KB> pages, how should I interpret it ? For me, it is only one superpage 
(mapping)
 KB> that can be created in one 2M region.

Well, if I see a report like below:

  PID  STARTEND PRTRES PRES   SUP REF SHD  FL 
TP PATH
485680x800c00x820c0 rw- 1310720 51712   2   0 --S 
df 

it tells me that for the region 0x800c0-0x820c0 (512Mb) we have 131072
* 4k = 512Mb resident and 51712 * 4k = 202Mb (a litle less than a half of the
region) promoted (mapped) to superpages.

If I had number of superpages here I would need additional knowledge (a
superpage size) to calculate how effectively superpages are used.

But actually, no much difference for me. To get a number of superpages is it
enough just to divide the result obtained counting normal-sized pages by
(2M/4k) factor?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: SuperPages utilization survey

2012-06-09 Thread Mikolaj Golub

On Sat, 9 Jun 2012 11:38:22 +0300 Konstantin Belousov wrote:

 KB> On Sat, Jun 09, 2012 at 10:31:17AM +0300, Mikolaj Golub wrote:
 >> 
 >> On Fri, 1 Jun 2012 14:54:48 +0200 Ivan Voras wrote:
 >> 
 >>  IV> On 1 June 2012 14:35, Wojciech Puchar  
 >> wrote:
 >>  >>> http://people.freebsd.org/~ivoras/stuff/spsurvey.py
 >> 
 >>  ...
 >> 
 >>  IV> If anyone posts more data, I'll analyse it. I'm more worried about the
 >>  IV> granularity of procstat, where it marks the entire region if a single
 >>  IV> superpage exists in it - it means any such analysis is only
 >>  IV> approximate.
 >> 
 >> Here is a patch (for kernel and procstat) that allows to see amount of pages
 >> mapped to superpages.
 >> 
 >> http://people.freebsd.org/~trociny/procstat-superpages.cnt.1.patch
 >> 
 >> Not sure it is useful enough to be committed.

 KB> Superpage aggregates mappings for several normal-sized pages.
 KB> As a consequence, when you iterate over small pages in
 KB> sysctl_kern_proc_vmmap(), you account each superpage as many time as
 KB> much constituent small pages it contains.

This is exactly what my intention was to count: how much memory are handled by
superpages (using normal-sized page as a measurement unit), not amount of
superpages. And I think this is what Ivan wanted to know. Do you think it is
better to return number of superpages?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: SuperPages utilization survey

2012-06-09 Thread Mikolaj Golub

On Fri, 1 Jun 2012 14:54:48 +0200 Ivan Voras wrote:

 IV> On 1 June 2012 14:35, Wojciech Puchar  
wrote:
 >>> http://people.freebsd.org/~ivoras/stuff/spsurvey.py

 ...

 IV> If anyone posts more data, I'll analyse it. I'm more worried about the
 IV> granularity of procstat, where it marks the entire region if a single
 IV> superpage exists in it - it means any such analysis is only
 IV> approximate.

Here is a patch (for kernel and procstat) that allows to see amount of pages
mapped to superpages.

http://people.freebsd.org/~trociny/procstat-superpages.cnt.1.patch

Not sure it is useful enough to be committed.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: a sysctl for process binary osreldate

2012-03-22 Thread Mikolaj Golub

On Thu, 22 Mar 2012 22:38:15 +0200 Mikolaj Golub wrote:

 MG> Actually I don't see reasons why this may not be p_cansee, so I
 MG> updated the patch  and going to commit it if there is no objections.

The updated patch:

http://people.freebsd.org/~trociny/kern_proc_osrel.2.patch

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: a sysctl for process binary osreldate

2012-03-22 Thread Mikolaj Golub
On Sat, Mar 17, 2012 at 9:30 PM, Mikolaj Golub  wrote:
> Hi,
>
> Currently we can check and change binary osreldate of another process via
> procfs(5).
>
> Kostik suggested to add a new sysctl for the same purpose and also extend
> procstat to show osrel.
>
> Here are patches I am going to commit if there are no objections or
> suggestions.
>
> http://people.freebsd.org/~trociny/kern_proc_osrel.1.patch
> http://people.freebsd.org/~trociny/procstat.osrel.1.patch
>
> I set the same permissions as for procfs(5) osrel -- so only user can read it,
> but may be this is too restrictive and p_cansee on read would be ok?

Actually I don't see reasons why this may not be p_cansee, so I
updated the patch  and going to commit it if there is no objections.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: a sysctl for process binary osreldate

2012-03-18 Thread Mikolaj Golub

On Sat, 17 Mar 2012 22:29:01 +0100 Jilles Tjoelker wrote:

 JT> On Sat, Mar 17, 2012 at 09:30:05PM +0200, Mikolaj Golub wrote:
 >> I added osrel output to procstat -b option:

 >> kopusha:~% procstat -b 2975
 >>   PID COMMOSREL PATH
 >>  2975 emacs 101 /usr/local/bin/emacs-23.3

 >> Would this be ok or someone see a better way?

 JT> Hmm, this means that procstat is not supposed to be used from scripts as
 JT> it is apparently OK to change its output format like this?

Yes, breaking output compatibility worries me too. Although I already broke it
recently for '-s' option, adding umask output. Let me cite Robert (taken from
our then discussion about procstat umask output):

> if we add too many arguments we'll start looking like ps(1), whereas the
> point of procstat(1) is that it's *not* ps(1) :-).

That is why I decided to not introduce yet another option here too at the cost
of breaking compatibility. But I am open for any suggestions. 

 JT> In some ways, querying via ps would be better for scripts since it
 JT> allows things like
 JT>   ps -p PID -o KEYWORD=
 JT> which do not need additional parsing except that many of the newer
 JT> things in procstat do not have ps keywords.

 JT> -- 
 JT> Jilles Tjoelker

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: a sysctl for process binary osreldate

2012-03-17 Thread Mikolaj Golub

On Sat, 17 Mar 2012 23:26:53 +0200 Konstantin Belousov wrote:

 KB> On Sat, Mar 17, 2012 at 11:07:24PM +0200, Mikolaj Golub wrote:
 >> 
 >> On Sat, 17 Mar 2012 16:37:02 -0400 Jason Hellenthal wrote:
 >> 
 >>  JH> Would this be a planned MFC to stable/N as well specifcially 8 ?
 >> 
 >> I plan to MFC to stable/9 if there is no objections.

 KB> I do not see why the merge to stable/8 cannot be done from the technical
 KB> POV. If Mikolaj has no time or desire to merge to 8, I can help him.

If people consider this to be useful, no problem for me to merge to 8.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: a sysctl for process binary osreldate

2012-03-17 Thread Mikolaj Golub

On Sat, 17 Mar 2012 16:37:02 -0400 Jason Hellenthal wrote:

 JH> Would this be a planned MFC to stable/N as well specifcially 8 ?

I plan to MFC to stable/9 if there is no objections.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


a sysctl for process binary osreldate

2012-03-17 Thread Mikolaj Golub
Hi,

Currently we can check and change binary osreldate of another process via
procfs(5).

Kostik suggested to add a new sysctl for the same purpose and also extend
procstat to show osrel.

Here are patches I am going to commit if there are no objections or
suggestions.

http://people.freebsd.org/~trociny/kern_proc_osrel.1.patch
http://people.freebsd.org/~trociny/procstat.osrel.1.patch

I set the same permissions as for procfs(5) osrel -- so only user can read it,
but may be this is too restrictive and p_cansee on read would be ok?

I added osrel output to procstat -b option:

kopusha:~% procstat -b 2975
  PID COMMOSREL PATH
 2975 emacs 101 /usr/local/bin/emacs-23.3

Would this be ok or someone see a better way?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-12-04 Thread Mikolaj Golub

On Sun, 4 Dec 2011 15:57:06 + Robert N. M. Watson wrote:

 RNMW> On 4 Dec 2011, at 14:31, Jilles Tjoelker wrote:

 >> On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote:
 >>> [KERN_PROC_AUXV requires just p_cansee()]
 >> 
 >> If we are ever going to do ASLR, the AUXV information tells an attacker
 >> where the stack, executable and RTLD are located, which defeats much of
 >> the point of randomizing the addresses in the first place.
 >> 
 >> Given that the AUXV information seems to be used by debuggers only
 >> anyway, I think it would be good to move it to p_candebug() now.
 >> 
 >> The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
 >> under p_candebug().


 RNMW> Agreed. In general, my view is that p_cansee() should be used for very
 RNMW> few of our process inspection APIs. I like your example of ASLR
 RNMW> especially, as it illustrates how debugging information can aid even
 RNMW> local attacks (i.e., user vs. setuid binary).

What do you think about recently added kern.proc.ps_strings, which returns
location of ps_strings structure? It uses p_cansee() too. The location is the
same for all processes of the same ABI, so this does not look like sensitive
information, on the other hand it also seems to be used by debuggers only.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-12-04 Thread Mikolaj Golub

On Sun, 4 Dec 2011 15:57:06 + Robert N. M. Watson wrote:

 RNMW> On 4 Dec 2011, at 14:31, Jilles Tjoelker wrote:

 >> On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote:
 >>> [KERN_PROC_AUXV requires just p_cansee()]
 >> 
 >> If we are ever going to do ASLR, the AUXV information tells an attacker
 >> where the stack, executable and RTLD are located, which defeats much of
 >> the point of randomizing the addresses in the first place.
 >> 
 >> Given that the AUXV information seems to be used by debuggers only
 >> anyway, I think it would be good to move it to p_candebug() now.
 >> 
 >> The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
 >> under p_candebug().


 RNMW> Agreed. In general, my view is that p_cansee() should be used for very
 RNMW> few of our process inspection APIs. I like your example of ASLR
 RNMW> especially, as it illustrates how debugging information can aid even
 RNMW> local attacks (i.e., user vs. setuid binary).

Thanks! I will change it to p_candebug().

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-14 Thread Mikolaj Golub

On Wed, 9 Nov 2011 14:53:29 +0200 Kostik Belousov wrote:

 >> > http://people.freebsd.org/~trociny/env.sys.4.patch
 >> > 
 >> > Investigating cases when EFAULT was returned and if the fallback was
 >> > successful I noticed that most of the cases were when p->p_comm changed 
 >> > during
 >> > the read, so the process was in exec in that time. In order to avoid this
 >> > error I added a check for P_INEXEC flag.
 >> And now you return success and nothing gets copied out for the process
 >> in P_INEXEC state. Either you should return an error like EAGAIN, or
 >> consider the P_INEXEC state as transitional and wait till process
 >> leaves it. Or, ignore the state as it was before, and return whatever
 >> error proc_rwmem generated (my preference).

 KB> Forgot to say that the check does not change much because you drop
 KB> process lock immediately after the check, so the process may enter
 KB> the INEXEC state right after the check. I believe you already tried
 KB> to do this with P_WEXIT.

Ok, eventually I decided not to check for P_INEXEC (as the simplest :-).

The updated patch:

http://people.freebsd.org/~trociny/env.sys.5.patch

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-10 Thread Mikolaj Golub

On Wed, 09 Nov 2011 15:31:26 +0200 Mikolaj Golub wrote:

 MG> On Wed, 9 Nov 2011 14:53:29 +0200 Kostik Belousov wrote:

 >>> And now you return success and nothing gets copied out for the process
 >>> in P_INEXEC state. Either you should return an error like EAGAIN, or
 >>> consider the P_INEXEC state as transitional and wait till process
 >>> leaves it. Or, ignore the state as it was before, and return whatever
 >>> error proc_rwmem generated (my preference).

 KB>> Forgot to say that the check does not change much because you drop
 KB>> process lock immediately after the check, so the process may enter
 KB>> the INEXEC state right after the check. I believe you already tried
 KB>> to do this with P_WEXIT.

 MG> Good point :-). Although after adding the P_INEXEC I have not seen errors 
any
 MG> more, while before they were often (when running 'procstat -ca' in loop and
 MG> building world simultaneously). Thus it looks like the probability is much
 MG> smaller.

 MG> So, it still looks good for me to check for P_INEXEC and return EAGAIN, and
 MG> add the comment why we do this and that it still racy. But if you still 
think
 MG> that ignoring the state is the best option no problems for me to return it
 MG> back.

Realted to this, sysctl_kern_proc_kstack() looks like has the similar issue.
But it returns ESRCH instead.

/* XXXRW: Not clear ESRCH is the right error during proc execve(). */
if (p->p_flag & P_WEXIT || p->p_flag & P_INEXEC) {
    PROC_UNLOCK(p);
return (ESRCH);
}
...
_PHOLD(p);
PROC_UNLOCK(p);

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-09 Thread Mikolaj Golub

On Wed, 9 Nov 2011 14:53:29 +0200 Kostik Belousov wrote:

 >> And now you return success and nothing gets copied out for the process
 >> in P_INEXEC state. Either you should return an error like EAGAIN, or
 >> consider the P_INEXEC state as transitional and wait till process
 >> leaves it. Or, ignore the state as it was before, and return whatever
 >> error proc_rwmem generated (my preference).

 KB> Forgot to say that the check does not change much because you drop
 KB> process lock immediately after the check, so the process may enter
 KB> the INEXEC state right after the check. I believe you already tried
 KB> to do this with P_WEXIT.

Good point :-). Although after adding the P_INEXEC I have not seen errors any
more, while before they were often (when running 'procstat -ca' in loop and
building world simultaneously). Thus it looks like the probability is much
smaller.

So, it still looks good for me to check for P_INEXEC and return EAGAIN, and
add the comment why we do this and that it still racy. But if you still think
that ignoring the state is the best option no problems for me to return it
back.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-09 Thread Mikolaj Golub

On Wed, 9 Nov 2011 14:44:55 +0200 Kostik Belousov wrote:

 KB> On Tue, Nov 08, 2011 at 11:47:54PM +0200, Mikolaj Golub wrote:
 >> 
 >> http://people.freebsd.org/~trociny/env.sys.4.patch
 >> 
 >> Investigating cases when EFAULT was returned and if the fallback was
 >> successful I noticed that most of the cases were when p->p_comm changed 
 >> during
 >> the read, so the process was in exec in that time. In order to avoid this
 >> error I added a check for P_INEXEC flag.

 KB> And now you return success and nothing gets copied out for the process
 KB> in P_INEXEC state.

This looked ok for me: new arguments have not been created for the process
yet, so return nothing :-)

 KB> Either you should return an error like EAGAIN, or consider the P_INEXEC
 KB> state as transitional and wait till process leaves it. Or, ignore the
 KB> state as it was before, and return whatever error proc_rwmem generated
 KB> (my preference).

I prefer EAGAIN :-). Reading in the process space that is being currrently
updated does not look good for me. And EAGAIN gives a hint that if I try it
again I will probably get the result, while EFAULT looks mysterious.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-08 Thread Mikolaj Golub

On Sun, 6 Nov 2011 20:10:41 +0200 Kostik Belousov wrote:

 KB> On Sat, Nov 05, 2011 at 10:37:46PM +0200, Mikolaj Golub wrote:
 >> 
 >> http://people.freebsd.org/~trociny/env.sys.3.patch

 KB> Oops, I missed this in the previous review. You cannot use fubyte in
 KB> proc_read_mem(). fubyte reads a byte from the address space of the current
 KB> process. The fix is easy, use proc_rwmem for 1 byte.

 KB> I do not think that fall back to single byte read is warranted for
 KB> proc_read_mem calls e.g. for ps_strings. Add a flag to indicate whether
 KB> the proc_read_mem should fall back to byte read ?

 KB> I would prefer using sizeof(uint64_t) and sizeof(uint32_t) instead of 8
 KB> and 4 constants in the align checks.

 KB> Might be, add PROC_ASSERT_HELD() to get_ps_string() ?

 KB> procfs patch looks good.

Thanks. The updated version:

http://people.freebsd.org/~trociny/env.sys.4.patch

Investigating cases when EFAULT was returned and if the fallback was
successful I noticed that most of the cases were when p->p_comm changed during
the read, so the process was in exec in that time. In order to avoid this
error I added a check for P_INEXEC flag.

After this I observed EFAULT (very rarely) only when reading arg or env
strings and fallback was successful for those cases. So I modified the patch
to do fallback only when reading strings (as it was in one of my earlier
versions but with wrong fubyte), and returned your comment which explains why
it may happen :-)

Also in the procfs patch I have added the check for process state.

The userland part has not been changed since my first report:

http://people.freebsd.org/~trociny/env.user.patch

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-06 Thread Mikolaj Golub

On Sat, 5 Nov 2011 15:58:01 +0200 Kostik Belousov wrote:

 KB> procfs_doproccmdline() can benefit from your work.

Patch for procfs:

http://people.freebsd.org/~trociny/procfs.patch

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-05 Thread Mikolaj Golub

On Sat, 5 Nov 2011 21:45:53 +0200 Kostik Belousov wrote:

 KB> On Sat, Nov 05, 2011 at 08:59:21PM +0200, Mikolaj Golub wrote:
 >> 
 >> On Sat, 5 Nov 2011 17:44:43 +0200 Kostik Belousov wrote:
 >> 
 >>  >>  KB> I think that the aux vector must be naturally aligned. You can 
 >> return
 >>  >>  KB> ENOEXEC early if vptr is not aligned.
 >>  >> 
 >>  >> Not sure I see what you mean. vptr for auxv is calculated just couple 
 >> lines
 >>  >> above, and I check the result here, in the part common for all vector 
 >> types.
 >>  KB> You do not check for the alignment. Am I wrong ?
 >> 
 >> I see now. If natural alignment means "addr % sizeof(aux) == 0" then the aux
 >> vectors are not naturally aligned. After adding this check:
 >> 
 >> if (vptr % sizeof(aux) != 0)
 >> return (ENOEXEC);
 KB> No, the natural alignment of the structure is the alignment of the most
 KB> demanding member. So it is 4 bytes on 32bit, and 8 bytes on 64.

 >> 
 >> I started to observe many ENOEXEC errors. Adding printf showed that the
 >> vectors are half size aligned.
 >> 
 >> On i386:
 >> 
 >> get_proc_vector(pid = getty[3442], type = 2): vptr (2143284876) % 
 >> sizeof(aux) (8) = 4)
 >> 
 >> On amd64:
 >> 
 >> get_proc_vector(pid = getty[2425], type = 2): vptr (140737488346568) % 
 >> sizeof(aux) (16) = 8)
 >> 
 >> Looking at exec_copyout_strings() from kern_exec.c, how destp is 
 >> calculated, I
 >> think they are sizeof(char *) aligned.
 >> 
 >> Do you think it is worth adding the check for sizeof(char *) alignment?
 >> 
 >> if (vptr % (sizeof(char *) != 0)
 >> return (ENOEXEC);
 KB> I suggest to use #if __ELF_WORD_SIZE == 32 or 64.

Thanks. The updated patch:

http://people.freebsd.org/~trociny/env.sys.3.patch

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-05 Thread Mikolaj Golub

On Sat, 5 Nov 2011 17:44:43 +0200 Kostik Belousov wrote:

 >>  KB> I think that the aux vector must be naturally aligned. You can return
 >>  KB> ENOEXEC early if vptr is not aligned.
 >> 
 >> Not sure I see what you mean. vptr for auxv is calculated just couple lines
 >> above, and I check the result here, in the part common for all vector types.
 KB> You do not check for the alignment. Am I wrong ?

I see now. If natural alignment means "addr % sizeof(aux) == 0" then the aux
vectors are not naturally aligned. After adding this check:

if (vptr % sizeof(aux) != 0)
return (ENOEXEC);

I started to observe many ENOEXEC errors. Adding printf showed that the
vectors are half size aligned.

On i386:

get_proc_vector(pid = getty[3442], type = 2): vptr (2143284876) % sizeof(aux) 
(8) = 4)

On amd64:

get_proc_vector(pid = getty[2425], type = 2): vptr (140737488346568) % 
sizeof(aux) (16) = 8)

Looking at exec_copyout_strings() from kern_exec.c, how destp is calculated, I
think they are sizeof(char *) aligned.

Do you think it is worth adding the check for sizeof(char *) alignment?

if (vptr % (sizeof(char *) != 0)
return (ENOEXEC);

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-05 Thread Mikolaj Golub

On Sat, 5 Nov 2011 15:58:01 +0200 Kostik Belousov wrote:

 KB> +if (error == EFAULT) {
 KB> +for (i = 0; i < len; i++) {
 KB> +c = fubyte(sptr + i);
 KB> +if (c < 0)
 KB> As a purely stylistical issue, compare with -1.

 KB> +return (EFAULT);
 KB> +buf[i] = (char)c;
 KB> +if (c == '\0')
 KB> +break;
 KB> +}
 KB> +error = 0;
 KB> +}
 KB> +return error;
 KB> Put () around error.

Thanks.

 KB> +/*
 KB> + * Check that that the address is in user space.
 KB> + */
 KB> +if (vptr + 1 < VM_MIN_ADDRESS + 1 || vptr >= VM_MAXUSER_ADDRESS)
 KB> +return (ENOEXEC);
 KB> Why is this needed ?

I saw this check in libkvm for ps_argvstr and ps_envstr and decided to add it
too. Just some additional check that ps_string fields, which can be
overwritten by the user, look reasonable. If you think this is not very useful
I remove it.

 KB> I think that the aux vector must be naturally aligned. You can return
 KB> ENOEXEC early if vptr is not aligned.

Not sure I see what you mean. vptr for auxv is calculated just couple lines
above, and I check the result here, in the part common for all vector types.

BTW, investigating the cases when I got

procstat: sysctl: kern.proc.args: 58002: 8: Exec format error

they were because the PROC_VECTOR_MAX limit (512 entries, as it is in
linprocfs and libkvm) is small for real world cases:

get_proc_vector(pid = rm[47883], type = 0): vsize (3009) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[47883], type = 0): vsize (3009) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[47890], type = 0): vsize (3008) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[47890], type = 0): vsize (3008) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[47897], type = 0): vsize (4511) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[47897], type = 0): vsize (4511) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[47897], type = 0): vsize (4511) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[48044], type = 0): vsize (611) > PROC_VECTOR_MAX (512))
get_proc_vector(pid = rm[52189], type = 0): vsize (772) > PROC_VECTOR_MAX (512))
get_proc_vector(pid = rm[52192], type = 0): vsize (1157) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[55685], type = 0): vsize (1041) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[55687], type = 0): vsize (1040) > PROC_VECTOR_MAX 
(512))
get_proc_vector(pid = rm[55690], type = 0): vsize (1559) > PROC_VECTOR_MAX 
(512))

So I am going to change it to ARG_MAX and use independent limit (256 entries)
for auxv.

 KB> Why the blank after the loop statement in get_ps_strings() ?

Sorry, what blank you mean? I don't see it in get_ps_strings(). May be you
mean the blank line in get_proc_vector() before return?

 KB> There shall be blank lines after the '{' in proc_getargv() and 
proc_getenvv().

Ah, sure.

 KB> Note that using cached pargs is somewhat inconsistent with the digging
 KB> into ps_strings.

 KB> procfs_doproccmdline() can benefit from your work.

Thanks, I will look at it :-).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-05 Thread Mikolaj Golub

On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote:

 KB> I suspect this is my bug: Reading the GET_PS_STRINGS_CHUNK_SZ may validly
 KB> return EFAULT if the string is shorter than the chunk and aligned at
 KB> the end of the page, assuming the next page is not mapped. There should
 KB> be a fallback to fubyte() read loop. I remember that copyinstr() was
 KB> unsuitable.

Hm, I thought that this issue was only for reading arg and env strings (which
could be shorter than GET_PS_STRINGS_CHUNK_SZ), but investigating the cases
when EFAULT was returned in my tests (running buildworld and procstat in loop)
I saw that it also returned when reading other objects (like struct
ps_strings), and a fallback to fubyte() read loop was successful in those
cases too.

So I updated the patch to do fallback for any type of read (although it does
not contain a good comment explaining why fubyte() read might succeed when
proc_rwmem() failed).

Also there were the cases when EFAULT was returned because arg vector
contained the NULL pointer. I observed this for sh processes. In
lib/libc/gen/setproctitle.c I found this comment:

oargc = ps_strings->ps_nargvstr;
oargv = ps_strings->ps_argvstr;
for (i = len = 0; i < oargc; i++) {
/*
 * The program may have scribbled into its
 * argv array, e.g., to remove some arguments.
 * If that has happened, break out before
 * trying to call strlen on a NULL pointer.
 */
if (oargv[i] == NULL) {
oargc = i;
break;
}

I have updated my patch to do the same.

Running buildworld test after these changes I have observed EFAULT only once,
for cc process, when argv contained a pointer to 0x40.

Also, for kern.proc.args some times errors like below are observed:

procstat: sysctl: kern.proc.args: 58002: 8: Exec format error

And for kern.proc.env:

procstat: sysctl: kern.proc.env: 81352: 16: Device busy

But I have not investigated these cases yet.

The update version:

http://people.freebsd.org/~trociny/env.sys.2.patch

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-11-02 Thread Mikolaj Golub

On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote:

 KB> I think it is better to use sys/elf.h over the machine/elf.h.

 KB> Please change the comment for PROC_AUXV_MAX to "Safety limit on auxv size".
 KB> Also, it worth adding a comment saying that we are reading aux vectors 
twice,
 KB> first to get a size, second time to fetch a content, for simplicity.

 KB> When reading aux vector, if the PROC_AUXV_MAX entries are iterated over,
 KB> and we still not reached AT_NULL, the return error is 0. Was it intended ?

 KB> For PROC_ARG and PROC_ENV, you blindly trust the read values of the arg and
 KB> env vector sizes. This can easily cause kernel panics due to unability to
 KB> malloc the requested memory. I recommend to put some clump, and twice
 KB> of (PATH_MAX + ARG_MAX) is probably enough (see kern_exec.c, in particular,
 KB> exec_alloc_args). Also, you might use the swappable memory for the strings
 KB> as well, in the style of exec_alloc_args().

 KB> I suspect this is my bug: Reading the GET_PS_STRINGS_CHUNK_SZ may validly
 KB> return EFAULT if the string is shorter than the chunk and aligned at
 KB> the end of the page, assuming the next page is not mapped. There should
 KB> be a fallback to fubyte() read loop. I remember that copyinstr() was
 KB> unsuitable.

 KB> The checks for P_WEXIT in the linprocfs routines look strange. Since
 KB> you are unlocking the process right after the check, it does not make
 KB> sense. In fact, the checks are not needed, I believe, since pseudofs
 KB> already did the hold (see e.g. pfs_read and pfs_visible).

Here is an updated version of the patch. Also available at 

http://people.freebsd.org/~trociny/env.sys.1.patch

I decided to use the same constant (PROC_VECTOR_MAX) for limiting both the 
number
of arg or env strings and the numbex of aux vectors.

Also I decided not to play with exec_alloc_args :-).

-- 
Mikolaj Golub

diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index fb97913..4949f98 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -168,6 +168,7 @@ struct p_sched;
 struct proc;
 struct procdesc;
 struct racct;
+struct sbuf;
 struct sleepqueue;
 struct td_sched;
 struct thread;
@@ -843,6 +844,10 @@ int	p_canwait(struct thread *td, struct proc *p);
 struct	pargs *pargs_alloc(int len);
 void	pargs_drop(struct pargs *pa);
 void	pargs_hold(struct pargs *pa);
+int	proc_getargv(struct thread *td, struct proc *p, struct sbuf *sb,
+	size_t nchr);
+int	proc_getenvv(struct thread *td, struct proc *p, struct sbuf *sb,
+	size_t nchr);
 void	procinit(void);
 void	proc_linkup0(struct proc *p, struct thread *td);
 void	proc_linkup(struct proc *p, struct thread *td);
diff --git a/sys/sys/sysctl.h b/sys/sys/sysctl.h
index 1e879f5..99ea342 100644
--- a/sys/sys/sysctl.h
+++ b/sys/sys/sysctl.h
@@ -559,6 +559,8 @@ SYSCTL_ALLOWED_TYPES(UINT64, uint64_t *a; unsigned long long *b; );
 #define	KERN_PROC_VMMAP		32	/* VM map entries for process */
 #define	KERN_PROC_FILEDESC	33	/* File descriptors for process */
 #define	KERN_PROC_GROUPS	34	/* process groups */
+#define	KERN_PROC_ENV		35	/* get environment */
+#define	KERN_PROC_AUXV		36	/* get ELF auxiliary vector */
 
 /*
  * KERN_IPC identifiers
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index 998e7ca..cc7c746 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -41,6 +41,8 @@ __FBSDID("$FreeBSD$");
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +51,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -75,6 +78,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef COMPAT_FREEBSD32
@@ -1356,6 +1360,290 @@ pargs_drop(struct pargs *pa)
 		pargs_free(pa);
 }
 
+static int
+proc_read_mem(struct thread *td, struct proc *p, vm_offset_t offset, void* buf,
+size_t len)
+{
+	struct iovec iov;
+	struct uio uio;
+
+	iov.iov_base = (caddr_t)buf;
+	iov.iov_len = len;
+	uio.uio_iov = &iov;
+	uio.uio_iovcnt = 1;
+	uio.uio_offset = offset;
+	uio.uio_resid = (ssize_t)len;
+	uio.uio_segflg = UIO_SYSSPACE;
+	uio.uio_rw = UIO_READ;
+	uio.uio_td = td;
+
+	return (proc_rwmem(p, &uio));
+}
+
+static int
+proc_read_string(struct thread *td, struct proc *p, const char *sptr, char *buf,
+size_t len)
+{
+	size_t i;
+	int error, c;
+
+	error = proc_read_mem(td, p, (vm_offset_t)sptr, buf, len);
+	/*
+	 * Reading the chunk may validly return EFAULT if the string is shorter
+	 * than the chunk and is aligned at the end of the page, assuming the
+	 * next page is not mapped.  So if EFAULT is returned do a fallback to
+	 * fubyte() read loop.
+	 */
+	if (error == EFAULT) {
+		for (i = 0; i < len; i++) {
+			c = fubyte(sptr + i);
+			if (c < 0)
+return (EFAULT);
+			buf[i] = (char)c;
+			if (c == '\0')
+break;
+		}
+		error = 0;
+	}
+	return error;
+

Re: "ps -e" without procfs(5)

2011-11-01 Thread Mikolaj Golub

On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote:

 KB> For PROC_ARG and PROC_ENV, you blindly trust the read values of the arg and
 KB> env vector sizes. This can easily cause kernel panics due to unability to
 KB> malloc the requested memory. I recommend to put some clump, and twice
 KB> of (PATH_MAX + ARG_MAX) is probably enough (see kern_exec.c, in particular,
 KB> exec_alloc_args). Also, you might use the swappable memory for the strings
 KB> as well, in the style of exec_alloc_args().

After looking at it more closely, I am not sure if I need to use
exec_alloc_args. I malloc explicitly only for array vector (proc_vector). And
actually it should be much smaller than 2 * (PATH_MAX + ARG_MAX). Currently in
linprocfs the limit is 512 entries:

#define MAX_ARGV_STR512 /* Max number of argv-like strings */

The same limit is in libkvm:

/*
 * Check that there aren't an unreasonable number of arguments,
 * and that the address is in user space.  Special test for
 * VM_MIN_ADDRESS as it evaluates to zero, but is not a simple zero
 * constant for some archs.  We cannot use the pre-processor here and
 * for some archs the compiler would trigger a signedness warning.
 */
if (narg > 512 || addr + 1 < VM_MIN_ADDRESS + 1 || addr >= 
VM_MAXUSER_ADDRESS)
return (0);

(BTW, may be the VM_MIN_ADDRESS - VM_MAXUSER_ADDRESS is worth adding in my
code too?)

So it looks like I should use the same limit (512 * sizeof(char *)) for the
allocated array. I could use exec_alloc_args() for the allocation but it would
reqire some changes: I would have to free using kmem_free_wakeup(), which
requires size of the region, while I return the number of entries. So I'd
rather not use exec_alloc_args() for vector allocation because the benefit is
not significant here.

For strings I use sbuf and set it up using sbuf_new_for_sysctl. I could set it
up manually as SBUF_FIXEDLEN allocating buf (up to 2 * (PATH_MAX + ARG_MAX))
with exec_alloc_args() but this would complicate things a little. Do you think
it is worth doing?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-10-31 Thread Mikolaj Golub

On Mon, 31 Oct 2011 11:49:48 +0200 Kostik Belousov wrote:

 KB> On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote:
 >> 
 >> What do you think about the attached patch? This is a kernel
 >> part. COMPAT_FREEBSD32 has not been tested after the last update (just 
 >> checked
 >> that it compiles): it looks I will not have access to amd64 box for testing
 >> during the weekend. I will test it after the weekend.
 >> 
 >> Both kernel and userland parts are available here:
 >> 
 >> http://people.freebsd.org/~trociny/env.sys.patch
 >> http://people.freebsd.org/~trociny/env.user.patch
 >> 
 >> Currently there is an issue with procstat -x: if one tried to run it on 64 
 >> bit
 >> for a 32 bit process it would not detect this so would output a garbage. 
 >> Could
 >> somebody recommend a way how to get this info about a process from userlend?

 KB> I think it is better to use sys/elf.h over the machine/elf.h.

 KB> Please change the comment for PROC_AUXV_MAX to "Safety limit on auxv size".
 KB> Also, it worth adding a comment saying that we are reading aux vectors 
twice,
 KB> first to get a size, second time to fetch a content, for simplicity.

 KB> When reading aux vector, if the PROC_AUXV_MAX entries are iterated over,
 KB> and we still not reached AT_NULL, the return error is 0. Was it intended ?

According to kern_exec.c it is possible that a process doesn't have auxv at
all. I don't know a way how to detect this. So because PROC_AUXV_MAX is much
larger than expected amount of aux entries and we have not reached AT_NULL it
is most likely the process doesn't have auxv and 0 length array (without
error) is returned. 

If you think I should return a error in this situation, I can add this. Please
tell me the error code I should return :-).

Also, may be there is a sane way to check on auxv existence?

 KB> For PROC_ARG and PROC_ENV, you blindly trust the read values of the arg and
 KB> env vector sizes. This can easily cause kernel panics due to unability to
 KB> malloc the requested memory. I recommend to put some clump, and twice
 KB> of (PATH_MAX + ARG_MAX) is probably enough (see kern_exec.c, in particular,
 KB> exec_alloc_args). Also, you might use the swappable memory for the strings
 KB> as well, in the style of exec_alloc_args().

 KB> I suspect this is my bug: Reading the GET_PS_STRINGS_CHUNK_SZ may validly
 KB> return EFAULT if the string is shorter than the chunk and aligned at
 KB> the end of the page, assuming the next page is not mapped. There should
 KB> be a fallback to fubyte() read loop. I remember that copyinstr() was
 KB> unsuitable.

 KB> The checks for P_WEXIT in the linprocfs routines look strange. Since
 KB> you are unlocking the process right after the check, it does not make
 KB> sense. In fact, the checks are not needed, I believe, since pseudofs
 KB> already did the hold (see e.g. pfs_read and pfs_visible).

Ah, right. Unintentionally added when was adding the P_SYSTEM check.

Thank you for all your comments. I will do this.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-10-29 Thread Mikolaj Golub

On Tue, 25 Oct 2011 11:24:51 +0300 Kostik Belousov wrote:

 KB> On Tue, Oct 25, 2011 at 12:13:10AM +0300, Mikolaj Golub wrote:
 >> 
 >> On Sun, 16 Oct 2011 20:10:05 +0300 Kostik Belousov wrote:
 >> 
 >>  KB> In my opinion, the way to implement the feature is to (re)use
 >>  KB> linprocfs_doargv() and provide another kern.proc sysctl to retrieve the
 >>  KB> argv and env vectors. Then, ps(1) and procstat(1) can use it, as well 
 >> as
 >>  KB> procfs and linprocfs inside the kernel.
 >> 
 >> Thanks! I am testing a patch (without auxv vector so far) and have some
 >> questions.
 >> 
 >> Original ps -e returns environment only for user owned processes (the 
 >> access is
 >> restricted by the permissions of /proc/pid/mem file). My kern.proc.env 
 >> sysctl
 >> does not have such a restriction. I suppose I should add it? What function I
 >> could use for this?
 >> 
 >> BTW, linprocfs allows to read other user's environment.
 KB> linprocfs uses p_cansee() to check the permissions. There are sysctls
 KB> security.bsd.see_other_{ug}ids that control the behaviour.

 KB> I believe that the new sysctl shall use the same check.

 >> 
 >>  KB> While you are at the code, it would be useful to also export the auxv 
 >> vector,
 >>  KB> which is immediately before env.
 >> 
 >> It looks I can find the location of auxv but what about the size? Or do you
 >> propose to extend struct ps_strings to store location and size of auxv? I
 >> could do this way...

 KB> No, extending ps_strings is not needed and it is too radical change.
 KB> The auxv vector must end by the AT_NULL aux entry. You can also 
artificially
 KB> limit the amount of read aux vectors to, say, 256, which is much more then
 KB> it is currently defined.

What do you think about the attached patch? This is a kernel
part. COMPAT_FREEBSD32 has not been tested after the last update (just checked
that it compiles): it looks I will not have access to amd64 box for testing
during the weekend. I will test it after the weekend.

Both kernel and userland parts are available here:

http://people.freebsd.org/~trociny/env.sys.patch
http://people.freebsd.org/~trociny/env.user.patch

Currently there is an issue with procstat -x: if one tried to run it on 64 bit
for a 32 bit process it would not detect this so would output a garbage. Could
somebody recommend a way how to get this info about a process from userlend?

-- 
Mikolaj Golub

diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index fb97913..4949f98 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -168,6 +168,7 @@ struct p_sched;
 struct proc;
 struct procdesc;
 struct racct;
+struct sbuf;
 struct sleepqueue;
 struct td_sched;
 struct thread;
@@ -843,6 +844,10 @@ int	p_canwait(struct thread *td, struct proc *p);
 struct	pargs *pargs_alloc(int len);
 void	pargs_drop(struct pargs *pa);
 void	pargs_hold(struct pargs *pa);
+int	proc_getargv(struct thread *td, struct proc *p, struct sbuf *sb,
+	size_t nchr);
+int	proc_getenvv(struct thread *td, struct proc *p, struct sbuf *sb,
+	size_t nchr);
 void	procinit(void);
 void	proc_linkup0(struct proc *p, struct thread *td);
 void	proc_linkup(struct proc *p, struct thread *td);
diff --git a/sys/sys/sysctl.h b/sys/sys/sysctl.h
index 1e879f5..99ea342 100644
--- a/sys/sys/sysctl.h
+++ b/sys/sys/sysctl.h
@@ -559,6 +559,8 @@ SYSCTL_ALLOWED_TYPES(UINT64, uint64_t *a; unsigned long long *b; );
 #define	KERN_PROC_VMMAP		32	/* VM map entries for process */
 #define	KERN_PROC_FILEDESC	33	/* File descriptors for process */
 #define	KERN_PROC_GROUPS	34	/* process groups */
+#define	KERN_PROC_ENV		35	/* get environment */
+#define	KERN_PROC_AUXV		36	/* get ELF auxiliary vector */
 
 /*
  * KERN_IPC identifiers
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index 998e7ca..ef4055a 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$");
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -66,6 +68,8 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 
+#include 
+
 #ifdef DDB
 #include 
 #endif
@@ -1356,6 +1360,218 @@ pargs_drop(struct pargs *pa)
 		pargs_free(pa);
 }
 
+static int
+proc_read_mem(struct thread *td, struct proc *p, vm_offset_t offset, void* buf,
+size_t len)
+{
+	struct iovec iov;
+	struct uio uio;
+
+	iov.iov_base = (caddr_t)buf;
+	iov.iov_len = len;
+	uio.uio_iov = &iov;
+	uio.uio_iovcnt = 1;
+	uio.uio_offset = offset;
+	uio.uio_resid = (ssize_t)len;
+	uio.uio_segflg = UIO_SYSSPACE;
+	uio.uio_rw = UIO_READ;
+	uio.uio_td = td;
+
+	return (proc_rwmem(p, &uio));
+}
+
+#define PROC_AUXV_MAX	256	/* Limit on auxv size. */
+
+enum proc_v

Re: "ps -e" without procfs(5)

2011-10-25 Thread Mikolaj Golub

On Tue, 25 Oct 2011 11:24:51 +0300 Kostik Belousov wrote:

 KB> On Tue, Oct 25, 2011 at 12:13:10AM +0300, Mikolaj Golub wrote:
 >> 
 >> On Sun, 16 Oct 2011 20:10:05 +0300 Kostik Belousov wrote:
 >> 
 >>  KB> In my opinion, the way to implement the feature is to (re)use
 >>  KB> linprocfs_doargv() and provide another kern.proc sysctl to retrieve the
 >>  KB> argv and env vectors. Then, ps(1) and procstat(1) can use it, as well 
 >> as
 >>  KB> procfs and linprocfs inside the kernel.
 >> 
 >> Thanks! I am testing a patch (without auxv vector so far) and have some
 >> questions.
 >> 
 >> Original ps -e returns environment only for user owned processes (the 
 >> access is
 >> restricted by the permissions of /proc/pid/mem file). My kern.proc.env 
 >> sysctl
 >> does not have such a restriction. I suppose I should add it? What function I
 >> could use for this?
 >> 
 >> BTW, linprocfs allows to read other user's environment.
 KB> linprocfs uses p_cansee() to check the permissions. There are sysctls
 KB> security.bsd.see_other_{ug}ids that control the behaviour.

 KB> I believe that the new sysctl shall use the same check.

This looks reasonable for me. But I just wanted to be sure that this would be
ok for other people, as my patch changes the system behavior: currently with
security.bsd.see_other_{ug}ids and procfs (not linprocfs) mounted a user can
see other users args but not env; after the change a user will see both args
and env (until security.bsd.see_other_{ug}ids is off).

 >> 
 >>  KB> While you are at the code, it would be useful to also export the auxv 
 >> vector,
 >>  KB> which is immediately before env.
 >> 
 >> It looks I can find the location of auxv but what about the size? Or do you
 >> propose to extend struct ps_strings to store location and size of auxv? I
 >> could do this way...

 KB> No, extending ps_strings is not needed and it is too radical change.
 KB> The auxv vector must end by the AT_NULL aux entry. You can also 
artificially
 KB> limit the amount of read aux vectors to, say, 256, which is much more then
 KB> it is currently defined.

Thanks.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: "ps -e" without procfs(5)

2011-10-24 Thread Mikolaj Golub

On Sun, 16 Oct 2011 20:10:05 +0300 Kostik Belousov wrote:

 KB> In my opinion, the way to implement the feature is to (re)use
 KB> linprocfs_doargv() and provide another kern.proc sysctl to retrieve the
 KB> argv and env vectors. Then, ps(1) and procstat(1) can use it, as well as
 KB> procfs and linprocfs inside the kernel.

Thanks! I am testing a patch (without auxv vector so far) and have some
questions.

Original ps -e returns environment only for user owned processes (the access is
restricted by the permissions of /proc/pid/mem file). My kern.proc.env sysctl
does not have such a restriction. I suppose I should add it? What function I
could use for this?

BTW, linprocfs allows to read other user's environment.

 KB> While you are at the code, it would be useful to also export the auxv 
vector,
 KB> which is immediately before env.

It looks I can find the location of auxv but what about the size? Or do you
propose to extend struct ps_strings to store location and size of auxv? I
could do this way...

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


"ps -e" without procfs(5)

2011-10-16 Thread Mikolaj Golub
Hi,

I have a patch that makes kvm_uread() read from user space using ptrace(2).

http://people.freebsd.org/~trociny/kvm_uread.ptrace.patch

With this change 'ps -e' does not requires procfs(5).

Do you like it or there might be some reasons why it is a bad idea?

Grepping sources it looks like currently only ps uses kvm_getenvv(3) (and thus
kvm_uread()).

Note, when reading from its own user space it just does bcopy(3), so if a
wrong address range is passed to kvm_uread() the program will segfault. Do we
need some protection here and what? Masking SIGSEGV?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: issues with kern.devstat.all

2011-03-22 Thread Mikolaj Golub

On Tue, 22 Mar 2011 00:15:06 +0300 Andrey Zonov wrote:

 AZ> Hi,

 AZ> This sysctl contains a binary data. You can see it using -o or -x 
 AZ> sysctl's key.
 AZ> Additional information is at devstat(3) manpage.

Or try sysutils/devstat :-)

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: flowtable_cleaner/flowtable_flush livelock

2010-11-20 Thread Mikolaj Golub

On Sat, 20 Nov 2010 20:04:35 + (UTC) Bjoern A. Zeeb wrote:

 BAZ> How do you reproduce the crash?  Is it just another ifioctl race as
 BAZ> from kern/146250?

Using the same script I posted in my first mail, removing a jail and epair
interface simultaneously:

  ifconfig epair0b vnet myjail
  jail -r myjail &
  ifconfig epair0a destroy

For me it loooks like other thread is destroying interface improperly in that
time. One time I saw crash in another thread:

(kgdb) bt
#0  doadump () at pcpu.h:231
#1  0xc04f2439 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056689728, 
dummy4=0xc2ba5984 "")
at /usr/src/sys/ddb/db_command.c:548
#2  0xc04f2831 in db_command (last_cmdp=0xc0e75cfc, cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:445
#3  0xc04f298a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xc04f48ad in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5  0xc090face in kdb_trap (type=12, code=0, tf=0xc2ba5bf8) at 
/usr/src/sys/kern/subr_kdb.c:546
#6  0xc0c3d2bf in trap_fatal (frame=0xc2ba5bf8, eva=3735929066)
at /usr/src/sys/i386/i386/trap.c:971
#7  0xc0c3d4f0 in trap_pfault (frame=0xc2ba5bf8, usermode=0, eva=3735929066)
at /usr/src/sys/i386/i386/trap.c:893
#8  0xc0c3dca5 in trap (frame=0xc2ba5bf8) at /usr/src/sys/i386/i386/trap.c:568
#9  0xc0c24a9c in calltrap () at /usr/src/sys/i386/i386/exception.s:168
#10 0xc09ad219 in vnet_destroy (vnet=0xc2f24240) at /usr/src/sys/net/vnet.c:284
#11 0xc08b5922 in prison_deref (pr=0xc3640800, flags=Variable "flags" is not 
available.
) at /usr/src/sys/kern/kern_jail.c:2506
#12 0xc08b5ab0 in prison_complete (context=0xc3640800, pending=1)
at /usr/src/sys/kern/kern_jail.c:2433
#13 0xc091c87b in taskqueue_run_locked (queue=0xc2dd6d80)
at /usr/src/sys/kern/subr_taskqueue.c:247
#14 0xc091cf17 in taskqueue_thread_loop (arg=0xc0ebb8e8)
at /usr/src/sys/kern/subr_taskqueue.c:379
#15 0xc08af558 in fork_exit (callout=0xc091ceb0 , 
arg=0xc0ebb8e8, 
frame=0xc2ba5d28) at /usr/src/sys/kern/kern_fork.c:835
#16 0xc0c24b44 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
(kgdb) fr 10
#10 0xc09ad219 in vnet_destroy (vnet=0xc2f24240) at /usr/src/sys/net/vnet.c:284
284 TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) {
(kgdb) list
279 VNET_LIST_WUNLOCK();
280
281 CURVNET_SET_QUIET(vnet);
282
283 /* Return all inherited interfaces to their parent vnets. */
284 TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) {
285 if (ifp->if_home_vnet != ifp->if_vnet)
286 if_vmove(ifp, ifp->if_home_vnet);
287 }
288
(kgdb) p ifp
$1 = (struct ifnet *) 0xdeadc0de

Doesn't this need some lock protection? I tried the attached patch, but still
observed crashes in ifioctl I posted earlier.  

-- 
Mikolaj Golub

Index: sys/net/vnet.c
===
--- sys/net/vnet.c	(revision 215576)
+++ sys/net/vnet.c	(working copy)
@@ -268,7 +268,7 @@ vnet_alloc(void)
 void
 vnet_destroy(struct vnet *vnet)
 {
-	struct ifnet *ifp, *nifp;
+	struct ifnet *ifp;
 
 	SDT_PROBE2(vnet, functions, vnet_destroy, entry, __LINE__, vnet);
 	KASSERT(vnet->vnet_sockcnt == 0,
@@ -281,10 +281,20 @@ vnet_destroy(struct vnet *vnet)
 	CURVNET_SET_QUIET(vnet);
 
 	/* Return all inherited interfaces to their parent vnets. */
-	TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) {
-		if (ifp->if_home_vnet != ifp->if_vnet)
+	do {
+		IFNET_RLOCK();
+		TAILQ_FOREACH(ifp, &V_ifnet, if_link) {
+			if (ifp->if_home_vnet != ifp->if_vnet) {
+if_ref(ifp);
+break;
+			}
+		}
+		IFNET_RUNLOCK();
+		if (ifp != NULL) {
 			if_vmove(ifp, ifp->if_home_vnet);
-	}
+			if_rele(ifp);
+		}
+	} while (ifp != NULL);
 
 	vnet_sysuninit();
 	CURVNET_RESTORE();
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Re: flowtable_cleaner/flowtable_flush livelock

2010-11-20 Thread Mikolaj Golub

On Sat, 20 Nov 2010 17:03:13 + (UTC) Bjoern A. Zeeb wrote:

 BAZ> I think net@ would have been a better initial place but since this
 BAZ> seems to be a problem when interacting with VIMAGE
 BAZ> freebsd-virtualization might be better.

 BAZ> What you could try is:
 BAZ> http://people.freebsd.org/~bz/20100216-10-ft-cv.diff

Ah, I have recalled I had already saw this patch but did not understand what
the problem was that it fixed, thus did not associated it with my case
(actually, I thought you had committed all these patches to the tree long time
ago and I was running the kernel with them already :-).

BTW, the patch needs updating: in the current flow_full() wakes up flowcleaner
too, and flowcleaner sleeps for flowclean_freq instead of 10*hz (see the
attached patch).

With the patch I can't reproduce the lock. Only the crash I mentioned in my
first letter is observed:

(kgdb) bt
#0  doadump () at pcpu.h:231
#1  0xc04f2789 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056677760, 
dummy4=0xc8731860 "")
at /usr/src/sys/ddb/db_command.c:548
#2  0xc04f2b81 in db_command (last_cmdp=0xc0e79f7c, cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:445
#3  0xc04f2cda in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xc04f4bfd in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5  0xc09119be in kdb_trap (type=12, code=0, tf=0xc8731a94) at 
/usr/src/sys/kern/subr_kdb.c:546
#6  0xc0c3da8f in trap_fatal (frame=0xc8731a94, eva=3735929074)
at /usr/src/sys/i386/i386/trap.c:970
#7  0xc0c3e0be in trap (frame=0xc8731a94) at /usr/src/sys/i386/i386/trap.c:361
#8  0xc0c272dc in calltrap () at /usr/src/sys/i386/i386/exception.s:168
#9  0xc0988415 in strncmp (s1=0xc1fee4e0 "epair20b", 
s2=0xdeadc0f2 , n=16) at 
/usr/src/sys/libkern/strncmp.c:44
#10 0xc09929d7 in ifunit_ref (name=0xc1fee4e0 "epair20b") at 
/usr/src/sys/net/if.c:1986
#11 0xc0996982 in ifioctl (so=0xc25649c0, cmd=3223349536, data=0xc1fee4e0 
"epair20b", 
td=0xc286c000) at /usr/src/sys/net/if.c:2475
#12 0xc09307f7 in soo_ioctl (fp=0xc1ff5af0, cmd=3223349536, data=0xc1fee4e0, 
active_cred=0xc1d83e80, td=0xc286c000) at /usr/src/sys/kern/sys_socket.c:212
#13 0xc092a61d in kern_ioctl (td=0xc286c000, fd=3, com=3223349536, 
data=0xc1fee4e0 "epair20b")
at file.h:254
#14 0xc092a7a4 in ioctl (td=0xc286c000, uap=0xc8731cec) at 
/usr/src/sys/kern/sys_generic.c:679
#15 0xc091f303 in syscallenter (td=0xc286c000, sa=0xc8731ce4)
at /usr/src/sys/kern/subr_trap.c:318
#16 0xc0c3dd2f in syscall (frame=0xc8731d28) at 
/usr/src/sys/i386/i386/trap.c:1094
#17 0xc0c27371 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266
#18 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 10
#10 0xc09929d7 in ifunit_ref (name=0xc1fee4e0 "epair20b") at 
/usr/src/sys/net/if.c:1986
1986if (strncmp(name, ifp->if_xname, IFNAMSIZ) == 0 &&
(kgdb) p ifp
$1 = (struct ifnet *) 0xdeadc0de

I might want to report it to freebsd-virtualization unless I find that this is
a known issue.

-- 
Mikolaj Golub

Index: sys/net/flowtable.c
===
--- sys/net/flowtable.c	(revision 215574)
+++ sys/net/flowtable.c	(working copy)
@@ -195,7 +195,8 @@ STATIC_VNET_DEFINE(uma_zone_t, flow_ipv6_zone);
 #define	V_flow_ipv6_zone	VNET(flow_ipv6_zone)
 
 
-static struct cv 	flowclean_cv;
+static struct cv 	flowclean_f_cv;
+static struct cv 	flowclean_c_cv;
 static struct mtx	flowclean_lock;
 static uint32_t		flowclean_cycles;
 static uint32_t		flowclean_freq;
@@ -951,7 +952,7 @@ flow_full(struct flowtable *ft)
 		if ((ft->ft_flags & FL_HASH_ALL) == 0)
 			ft->ft_udp_idle = ft->ft_fin_wait_idle =
 			ft->ft_syn_idle = ft->ft_tcp_idle = 5;
-		cv_broadcast(&flowclean_cv);
+		cv_broadcast(&flowclean_c_cv);
 	} else if (!full && ft->ft_full) {
 		flowclean_freq = 20*hz;
 		if ((ft->ft_flags & FL_HASH_ALL) == 0)
@@ -1560,14 +1561,14 @@ flowtable_cleaner(void)
 		}
 		VNET_LIST_RUNLOCK();
 
-		flowclean_cycles++;
 		/*
 		 * The 10 second interval between cleaning checks
 		 * is arbitrary
 		 */
 		mtx_lock(&flowclean_lock);
-		cv_broadcast(&flowclean_cv);
-		cv_timedwait(&flowclean_cv, &flowclean_lock, flowclean_freq);
+		flowclean_cycles++;
+		cv_broadcast(&flowclean_f_cv);
+		cv_timedwait(&flowclean_c_cv, &flowclean_lock, 10*hz);
 		mtx_unlock(&flowclean_lock);
 	}
 }
@@ -1580,8 +1581,8 @@ flowtable_flush(void *unused __unused)
 	mtx_lock(&flowclean_lock);
 	start = flowclean_cycles;
 	while (start == flowclean_cycles) {
-		cv_broadcast(&flowclean_cv);
-		cv_wait(&flowclean_cv, &flowclean_lock);
+		cv_broadcast(&flowclean_c_cv);
+		cv_wait(&flowclean_f_cv, &flowclean_lock);
 	}
 	mtx_unlock(&flowclean_lock);
 }
@@ -1613,7 +1614,8 @@ static vo

Re: flowtable_cleaner/flowtable_flush livelock

2010-11-20 Thread Mikolaj Golub

On Sat, 20 Nov 2010 17:03:13 + (UTC) Bjoern A. Zeeb wrote:

 BAZ> On Sat, 20 Nov 2010, Mikolaj Golub wrote:

 BAZ> Hi,

 >> Running something like below under VirtualBox (CURRENT, VIMAGE)
 BAZ> ...
 >> So the question is who is guilty in this situation? ULE? flowtable? Or
 >> jail/epair, which should not allow simultaneous entering of flowtable_flush?

 BAZ> In general: you for running an experimental feature;-)

I like experimenting :-)

 BAZ> What you could try is:
 BAZ> http://people.freebsd.org/~bz/20100216-10-ft-cv.diff

I will. Thanks.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


flowtable_cleaner/flowtable_flush livelock

2010-11-20 Thread Mikolaj Golub
0004,33b,c1fd47f8,...) at 
flowtable_cleaner+0x255
fork_exit(c0990630,0,c43c7d28) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8

In net/flowtable.c we have two functions:

static void
flowtable_cleaner(void)
{
...
while (1) {
...
flowclean_cycles++;
mtx_lock(&flowclean_lock);
cv_broadcast(&flowclean_cv);
cv_timedwait(&flowclean_cv, &flowclean_lock, flowclean_freq);
mtx_unlock(&flowclean_lock);
}
}

static void
flowtable_flush(void *unused __unused)
{
uint64_t start;

mtx_lock(&flowclean_lock);
start = flowclean_cycles;
while (start == flowclean_cycles) {
cv_broadcast(&flowclean_cv);
cv_wait(&flowclean_cv, &flowclean_lock);
}
mtx_unlock(&flowclean_lock);
}

It looks like when two threads enter flowtable_flush() simultaneously they 
start to
wake up each other not giving to flowcleaner thread (which is in RUNQ) a chance
to run (I suppose because it has higher priority number) and update
flowclean_cycles counter. I added print in flowtable_flush() loop to check my
assumption and got:

flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23
flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23
flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23
flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23
flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23
flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23
flowtable_flush: start(C43FEB14): 23; flowclean_cycles: 23
flowtable_flush: start(C87439F8): 23; flowclean_cycles: 23
...

So the question is who is guilty in this situation? ULE? flowtable? Or
jail/epair, which should not allow simultaneous entering of flowtable_flush?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: close() failing with ECONNRESET

2010-06-09 Thread Mikolaj Golub

On Wed, 9 Jun 2010 12:45:52 +0300 Kostik Belousov wrote:

 KB> On Wed, Jun 09, 2010 at 12:35:11AM -0700, per...@pluto.rain.com wrote:
 >> Timo Sirainen  wrote:
 >> 
 >> > I see that since FreeBSD 6.3 close() can fail with:
 >> >
 >> > > [ECONNRESET]The underlying object was a stream socket that was
 >> > > shut down by the peer before all pending data was
 >> > > delivered.
 >> >
 >> > Could someone explain what this is useful for?
 KB> Note that any return from close(2) that does not set errno to EBADF
 KB> closes the supplied file descriptor. Mentioned errno value supplies
 KB> caller with the information that not "all pending data was delivered".

We have kern/146845 about close(2) returning ECONNRESET for tcp
connections. Looking at the code (which I am not very familiar with though)
and running some tests make me think that currently ECONNRESET may be only
returned by close(2) after shutdown()/close() on our side and simultaneous
close() on the other side (and in this case this is wrong).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: leak of the vnodes

2010-04-04 Thread Mikolaj Golub
On Sat, 3 Apr 2010 19:52:38 +0300 Kostik Belousov wrote:

> Then, after you determined the problematic mp, reboot the machine,
> redo the procedure causing leak. From ddb prompt, you can do "show mount",
> find the mp, then do "show mount ". The later command shall
> produce really large output, listing all mp vnodes, so serial console
> or firewire can be useful. Put output somewhere.

Or use ddb capture buffer :-). In ddb:

capture on

capture off
continue

And then

ddb capture print > capture.out

Make sure your capture buffer is large enough (I have in my /etc/sysctl.conf
debug.ddb.capture.bufsize=5242880).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: unix socket: race on close?

2010-02-18 Thread Mikolaj Golub
On Thu, 18 Feb 2010 11:59:40 + (GMT) Robert Watson wrote:

> On Thu, 18 Feb 2010, Mikolaj Golub wrote:
>
>> Below is a simple test code with unix sockets: the client does
>> connect()/close() in loop and the server -- accept()/close().
>>
>> Sometimes close() fails with 'Socket is not connected' error:
>
> Hi Mikolaj:
>
> Thanks for this report, and sorry about not spotting your earlier post
> to freebsd-net.  I've been fairly preoccupied the last month and not
> keeping up with the mailing lists.  Could I ask you to file a PR on
> this, and forward me the PR number so I can claim ownership?  This
> should prevent it from getting lost while I catch up.

kern/144061

> In short, your evaluation seems reasonable to me -- have you tried
> tweaking soclose() to ignore ENOTCONN from sodisconnect() to confirm
> this diagnosis fixes all the instances you've been seeing?

I just have done this:

1) add logging the error when sodisconnect() returns error:

--- uipc_socket.c.orig  2010-02-18 14:25:25.0 +0200
+++ uipc_socket.c   2010-02-18 14:55:26.0 +0200
@@ -120,6 +120,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -136,6 +137,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 
 #include 
 
+
 #ifdef COMPAT_IA32
 #include 
 #include 
@@ -657,7 +659,7 @@ soclose(struct socket *so)
if ((so->so_state & SS_ISDISCONNECTING) == 0) {
error = sodisconnect(so);
if (error)
-   goto drop;
+   log(LOG_INFO, "soclose: sodisconnect error: 
%d\n", error);
}
if (so->so_options & SO_LINGER) {
if ((so->so_state & SS_ISDISCONNECTING) &&


Then on every error exit of the test application, like this

a.out: parent: close error: 57

I have in the message log:

Feb 18 15:35:32 zhuzha kernel: soclose: sodisconnect error: 57

2) add logging the error when sodisconnect() returns error and ignore the error:

--- uipc_socket.c.orig  2010-02-18 14:25:25.0 +0200
+++ uipc_socket.c   2010-02-18 15:41:07.0 +0200
@@ -120,6 +120,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -136,6 +137,7 @@ __FBSDID("$FreeBSD: src/sys/kern/uipc_so
 
 #include 
 
+
 #ifdef COMPAT_IA32
 #include 
 #include 
@@ -656,8 +658,11 @@ soclose(struct socket *so)
if (so->so_state & SS_ISCONNECTED) {
if ((so->so_state & SS_ISDISCONNECTING) == 0) {
error = sodisconnect(so);
-   if (error)
-   goto drop;
+   if (error) {
+   log(LOG_INFO, "soclose: sodisconnect error: 
%d\n", error);
+   if (error == ENOTCONN)
+   error = 0;
+   }
}
if (so->so_options & SO_LINGER) {
if ((so->so_state & SS_ISDISCONNECTING) &&


After this the test application does not exits and I see in the message log:

Feb 18 16:02:37 zhuzha kernel: soclose: sodisconnect error: 57
Feb 18 16:03:31 zhuzha kernel: soclose: sodisconnect error: 57
Feb 18 16:05:49 zhuzha last message repeated 4 times
Feb 18 16:15:50 zhuzha last message repeated 13 times

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


unix socket: race on close?

2010-02-17 Thread Mikolaj Golub
Hi,

Below is a simple test code with unix sockets: the client does
connect()/close() in loop and the server -- accept()/close().

Sometimes close() fails with 'Socket is not connected' error:

a.out: parent: close error: 57

or

a.out: child: close error: 57

It looks for me like some race in close(). Looking at uipc_socket.c:soclose():

int
soclose(struct socket *so)
{
int error = 0;

KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));

CURVNET_SET(so->so_vnet);
funsetown(&so->so_sigio);
if (so->so_state & SS_ISCONNECTED) {
if ((so->so_state & SS_ISDISCONNECTING) == 0) {
error = sodisconnect(so);
if (error)
goto drop;
}

Isn't the problem here? so_state is checked for SS_ISCONNECTED and
SS_ISDISCONNECTING without locking and then sodisconnect() is called, which
closes both sockets of the connection. So it looks for me that if the close()
is called for both ends simultaneously it is possible that sodisconnect() will
be called for both ends and for one ENOTCONN will be returned. Or may I have
missed something?

We have been observing periodically ENOTCONN errors on unix socket close in
our applications, so it is not just curiosity :-) (I posted about our problem
to freebsd-net@ some time ago but then did not attract any attention
http://lists.freebsd.org/pipermail/freebsd-net/2009-December/024047.html).

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define UNIXSTR_PATH "/tmp/mytest.socket"
#define USLEEP  100

int main(int argc, char **argv)
{
int listenfd, connfd, pid;
struct sockaddr_un  servaddr;

pid = fork();
if (-1 == pid)
errx(1, "fork(): %d", errno);

if (0 != pid) { /* parent */

if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
errx(1, "parent: socket error: %d", errno);

unlink(UNIXSTR_PATH);
bzero(&servaddr, sizeof(servaddr));
servaddr.sun_family = AF_LOCAL;
strcpy(servaddr.sun_path, UNIXSTR_PATH);

if (bind(listenfd, (struct sockaddr *) &servaddr, 
sizeof(servaddr)) < 0)
errx(1, "parent: bind error: %d", errno);

if (listen(listenfd, 1024) < 0)
errx(1, "parent: listen error: %d", errno);

for ( ; ; ) {
if ((connfd = accept(listenfd, (struct sockaddr *) 
NULL, NULL)) < 0)
errx(1, "parent: accept error: %d", errno);

//usleep(USLEEP / 2); // (I) uncomment this or (II) 
below to avoid the race

if (close(connfd) < 0)
errx(1, "parent: close error: %d", errno);
}

} else { /* child */

sleep(1); /* give the parent some time to create the socket */

for ( ; ; ) {

if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
errx(1, "child: socket error: %d", errno);

bzero(&servaddr, sizeof(servaddr));
servaddr.sun_family = AF_LOCAL;
strcpy(servaddr.sun_path, UNIXSTR_PATH);

if (connect(connfd, (struct sockaddr *) &servaddr, 
sizeof(servaddr)) < 0)
errx(1, "child: connect error %d", errno);

// usleep(USLEEP); // (II) uncomment this or (I) above 
to avoid the race

if (close(connfd) != 0) 
errx(1, "child: close error: %d", errno);

usleep(USLEEP);
}
}

return 0;
}

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: crashtar

2009-10-18 Thread Mikolaj Golub

On Tue, 13 Oct 2009 22:50:44 +0200 (CEST) Alexander Best wrote:

 AB> again: great script. would be great to have this in the ports dir in the 
near
 AB> future.

I have created separate google project for this script

http://code.google.com/p/bsdcrashtar/

And submitted to ports

http://www.freebsd.org/cgi/query-pr.cgi?pr=139721

BTW, many things in the script have been improved since I posted it here, and
user friendly error output is among them :-).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: crashtar

2009-10-10 Thread Mikolaj Golub

On Sat, 10 Oct 2009 12:34:05 +0200 (CEST) Alexander Best wrote:

 AB> thanks. this is a cool script and very useful indeed. only thing you might
 AB> want to do is check for root privileges at the beginning to avoid nasty 
error
 AB> messages like.

 AB> awk: can't open file /var/crash/info.0
 AB>  source line number 12

In some cases you might not need root privileges. E.g. on some servers I don't
have root but SA gives me read access to crashdumps. In this case if the
script had a check for root privileges I would not be able to use it.

Actually as for me the message looks informative enough, it says that we have
some problems with accessing crash dump files, so permissions should be
checked.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: crashinfo: print the content of ddb capture budder

2009-10-09 Thread Mikolaj Golub

On Fri, 9 Oct 2009 11:28:11 -0400 John Baldwin wrote:

 JB> On Monday 05 October 2009 1:48:06 am Mikolaj Golub wrote:
 >> Hi,
 >> 
 >> It would be nice if crashinfo(8) were also trying to output the content of 
 >> ddb
 >> capture buffer. Something like in this patch:
 >> 
 >> --- crashinfo.sh.orig2009-10-05 08:26:26.0 +0300
 >> +++ crashinfo.sh2009-10-05 08:43:56.0 +0300
 >> @@ -304,3 +304,18 @@
 >>  echo "kernel config"
 >>  echo
 >>  config -x $KERNEL
 >> +
 >> +file=`mktemp /tmp/crashinfo.XX`
 >> +if [ $? -eq 0 ]; then
 >> +ddb capture -M $VMCORE -N $KERNEL print > $file 2>/dev/null
 >> +if [ -s $file ]; then
 >> +echo 
 >> ""
 >> +echo "ddb capture buffer"
 >> +echo
 >> +cat $file |
 >> +sed -e 's/p\{10\}p*//' # XXX: this removes the unfilled 
 >> part of a capture buffer
 >> +echo
 >> +fi
 >> +rm -f $file
 >> +fi
 >> +
 >> 

 JB> I'm definitely in favor of this.  I assume you have tested it locally?  Do 
you have a sample
 JB> crash.X.txt file with it enabled?

I have tested it on 8.0.

zhuzha:~% ls -l /var/crash/vmcore.23   
-rw--- 1 root wheel 166703104 2009-10-05 08:03 /var/crash/vmcore.23
zhuzha:~% sudo crashinfo 
Writing crash summary to /var/crash/core.txt.23.
zhuzha:~% grep -B5 -A30 'ddb capture buffer' /var/crash/core.txt.23

kernel config

config: File /boot/kernel.old/kernel doesn't contain configuration file. Either 
unsupported, or not compiled with INCLUDE_CONFIG_FILE

ddb capture buffer

db:0:kdb.enter.panic>  show pcpu
cpuid= 0
dynamic pcpu= 0x68ee80
curthread= 0xc4a1ad80: pid 2276 "sysctl"
curpcb   = 0xe6d44d90
fpcurthread  = none
idlethread   = 0xc4576900: pid 11 "idle: cpu0"
APIC ID  = 0
currentldt   = 0x50
spin locks held:
db:0:kdb.enter.panic>  show allpcpu
Current CPU: 0

cpuid= 0
dynamic pcpu= 0x68ee80
curthread= 0xc4a1ad80: pid 2276 "sysctl"
curpcb   = 0xe6d44d90
fpcurthread  = none
idlethread   = 0xc4576900: pid 11 "idle: cpu0"
APIC ID  = 0
currentldt   = 0x50
spin locks held:

cpuid= 1
dynamic pcpu= 0x34ffe80
curthread= 0xc5837480: pid 2191 "screen"
curpcb   = 0xe6e5ed90
fpcurthread  = none
idlethread   = 0xc4576b40: pid 11 "idle: cpu1"
zhuzha:~% tail /var/crash/core.txt.23 
mi_switch(104,0,c0c798d3,1d6,44,...) at mi_switch+0x200
sleepq_switch(c0dc8190,0,c0c798d3,26e,0,...) at sleepq_switch+0x15f
sleepq_timedwait(c0dc7ee0,44,c0c7793c,0,0,...) at sleepq_timedwait+0x6b
_sleep(c0dc7ee0,0,44,c0c7793c,2710,...) at _sleep+0x339
scheduler(0,141ec00,141ec00,141e000,1425000,...) at scheduler+0x23e
mi_startup() at mi_startup+0x96
begin() at begin+0x2c
db:0:kdb.enter.panic>  call doadump


zhuzha:~% 

Actually the last echo in the patch looks like is not necessary.

Do you want the whole crash.23.txt file for review?

Also, I remember I tested it on crashdump of a kernel without ddb support and
no issues were noticed too.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


crashinfo: print the content of ddb capture budder

2009-10-04 Thread Mikolaj Golub
Hi,

It would be nice if crashinfo(8) were also trying to output the content of ddb
capture buffer. Something like in this patch:

--- crashinfo.sh.orig   2009-10-05 08:26:26.0 +0300
+++ crashinfo.sh2009-10-05 08:43:56.0 +0300
@@ -304,3 +304,18 @@
 echo "kernel config"
 echo
 config -x $KERNEL
+
+file=`mktemp /tmp/crashinfo.XX`
+if [ $? -eq 0 ]; then
+   ddb capture -M $VMCORE -N $KERNEL print > $file 2>/dev/null
+   if [ -s $file ]; then
+   echo 
""
+   echo "ddb capture buffer"
+   echo
+   cat $file |
+   sed -e 's/p\{10\}p*//' # XXX: this removes the unfilled part of 
a capture buffer
+   echo
+   fi
+   rm -f $file
+fi
+

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


crashtar

2009-10-04 Thread Mikolaj Golub
Hi,

http://trociny.googlecode.com/files/crashtar

This simple script is useful for me and might be useful for other people
too. The script creates tar archive that contains all files needed for
debugging FreeBSD kernel crash (vmcore, kernel, loaded modules, sources that
appear in backtrace). This is useful e.g. for debugging a crash on another
host, sending it to developers or if you are going to upgrade the kernel on
crashed host but would like to keep crashdump in case the developers ask you
to provide additional info.

Created tar contains also a script that when being run inside unpacked archive
will give kgdb(1) session with crash core loaded in it. The script should be
run with root privileges because it does chroot(8) before starting kgdb(1).

I think I don't have to warn here that a crashdump may be sent only to person
you trust :-).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


7.1 panicked removing namecache entry from cache

2009-09-15 Thread Mikolaj Golub
e_next = 0xcc30acf0, tqe_prev 
= 0xc0c4f3ac}, v_bufobj = {
bo_mtx = 0xc89053c4, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 
0xc8905400}, bv_root = 0x0, 
  bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 
0xc8905410}, bv_root = 0x0, 
  bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_ops = 0xc0be8e40, bo_bsize 
= 16384, bo_object = 0x0, 
bo_synclist = {le_next = 0x0, le_prev = 0xc890520c}, bo_private = 
0xc890533c, 
__bo_vnode = 0xc890533c}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0}

(kgdb) fr 7
#7  0xc07fd34b in cache_zap (ncp=0xcc33783c) at 
/usr/src/sys/kern/vfs_cache.c:276
276 LIST_REMOVE(ncp, nc_hash);
(kgdb) p *ncp
$2 = {nc_hash = {le_next = 0x0, le_prev = 0x0}, nc_src = {le_next = 0xcc31650c, 
le_prev = 0xcc2c51e4}, 
  nc_dst = {tqe_next = 0x0, tqe_prev = 0x0}, nc_dvp = 0x0, nc_vp = 0x0, nc_flag 
= 0 '\0', 
  nc_nlen = 0 '\0', nc_name = 0xcc33785e ""}

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Partial kvm dumps

2009-08-24 Thread Mikolaj Golub
.g. SNMP looks like more proper alternative
solution -- this is standard, also snmpd is actually that program which
"traverse kernel structures extracting all necessary data". But SNMP has its
own limitations, statistics provided via SNMP are rather limited and currently
I don't see how I could use it effectively to echieve my goal, althogh I
haven't think much in this direction yet...

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: possibly tmpfs bug

2009-07-03 Thread Mikolaj Golub

On Fri, 3 Jul 2009 16:07:56 +0200 (CEST) Wojciech Puchar wrote:

 WP> repeatable

 WP> put something on tmpfs filesystem, then download it to other machine
 WP> using ftp (server is ftpd on first machine). no errors, download is
 WP> fine, but you get rubbish - simply data from wrong places in memory.

 WP> using rcp works. most probably ftpd uses sendfile, while rcp does not

Yes, this is sendfile problem. It has been reported.

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/127213

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: 7.1-STABLE crash

2009-06-03 Thread Mikolaj Golub
On Tue, 02 Jun 2009 23:24:54 +0300 Mikolaj Golub wrote:

> Actually some output from crashinfo looks suspicious. zero values for fork()
> calls, negative values in vmstat -m output... Does the userland where you were
> running crashinfo matched the crushed kernel?

It looks like the strange figures in vmstat output because its '-M' option is
broken. I am observing the same issue on my system. I have rebuild the kernel
and the world of today's RELENG_7 to make sure that this is not the issue with
my kernel not in sync with the userland.

zhuzha:~% uname -a
FreeBSD zhuzha.ua1 7.2-STABLE FreeBSD 7.2-STABLE #18: Wed Jun  3 14:28:49 EEST 
2009 r...@zhuzha.ua1:/usr/obj/usr/src/sys/DEBUG  i386
zhuzha:~% vmstat -m |head  
 Type InUse MemUse HighUse Requests  Size(s)
  ata_dma 3 1K   -3  128
 GEOM83 9K   - 1176  16,32,64,128,256,512,1024,2048
   isadev20 2K   -   20  64
ar_driver 0 0K   -6  512,2048
   acd_driver 1 2K   -1  2048
 cdev27 4K   -   28  128
   kbdmux 6 9K   -6  16,128,256,2048,4096
sigio 8 1K   -8  32
 filedesc   15246K   - 1637  16,32,64,128,256,512,1024,2048,4096
zhuzha:~% sudo vmstat -m -M /dev/mem -N /boot/kernel/kernel |head
 Type InUse MemUse HighUse Requests  Size(s)
  ata_dma-3 0K   -0  
 GEOM 1124760 -1107K   -  1125936  
16,32,64,128,256,512,1024,2048,4096,8192
   isadev   -20 0K   -0  
ar_driver 19450   -18K   -19456  32,64
   acd_driver-1-1K   -0  
 cdev   100-2K   -  128  16
   kbdmux-6-7K   -0  
sigio-8 0K   -0  
 filedesc 384793  -422K   -   386432  16,32,64,128,1024,2048,4096,8192
zhuzha:~% vmstat -s |head
  2380238 cpu context switches
27952 device interrupts
   573159 software interrupts
   931165 traps
  9997656 system calls
   56 kernel threads created
 1519  fork() calls
   45 vfork() calls
0 rfork() calls
0 swap pager pageins
zhuzha:~% sudo vmstat -s -M /dev/mem -N /boot/kernel/kernel |head
0 cpu context switches
0 device interrupts
0 software interrupts
0 traps
0 system calls
0 kernel threads created
0  fork() calls
0 vfork() calls
0 rfork() calls
0 swap pager pageins

On CURRENT vmstat -m works ok but vmstat -s gives zero values:

fbsd# uname -a
FreeBSD fbsd.zhuzha.ua1 8.0-CURRENT FreeBSD 8.0-CURRENT #8 r193242M: Mon Jun  1 
23:43:06 EEST 2009 
r...@zhuzha.ua1:/home/golub/freebsd/build/obj/home/golub/freebsd/src/sys/GENERIC
  i386
fbsd# vmstat -m |head
 Type InUse MemUse HighUse Requests  Size(s)
 cdev11 2K   -   11  128
CAM dev queue 1 1K   -1  64
sigio 1 1K   -1  32
 filedesc4913K   - 1109  16,256,512
 kenv   106 8K   -  109  16,32,64,128,4096
   kqueue 0 0K   -   12  128,1024
proc-args29 2K   -  593  16,32,64,128
  ithread63 6K   -   63  16,64,128
   acpica   42623K   - 8140  16,32,64,128,256,512,1024
fbsd# vmstat -m -M /dev/mem -N /boot/kernel/kernel |head
 Type InUse MemUse HighUse Requests  Size(s)
 cdev11 2K   -   11  128
CAM dev queue 1 1K   -1  64
sigio 1 1K   -1  32
 filedesc4913K   -   16,256,512
 kenv   106 8K   -  109  16,32,64,128,4096
   kqueue 0 0K   -   12  128,1024
proc-args29 2K   -  599  16,32,64,128
  ithread63 6K   -   63  16,64,128
   acpica   42623K   - 8140  16,32,64,128,256,512,1024
fbsd# vmstat -s |head
88380 cpu context switches
 5058 device interrupts
16851 software interrupts
   138908 traps
   140549 system calls
   18 kernel threads created
 1070  fork() calls
   22 vfork() calls
0 rfork() calls
0 swap pager pageins
fbsd# vmstat -s -M /dev/mem -N /boot/kernel/kernel |head
0 cpu context switches
0 device interrupts
0 software interrupts
0 traps
0 system calls
0 kernel threads created
0  fork() calls
0 vfork() calls
0 rfork() calls
0 swap pager pageins

On 6.3-RELEASE vmstat -s gives zero values only for some counters:

fbsd6# uname -a
FreeBSD fbsd6.zhuzha.ua1 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 
04:18:52 UTC 2008 r...@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC 
 i386
fbsd6# vmstat -m |head
 Type InUse MemUse 

Re: 7.1-STABLE crash

2009-06-02 Thread Mikolaj Golub

On Tue, 02 Jun 2009 13:41:40 +0400 Asmodean Dark wrote:

 AD> # kgdb kernel.debug vmcore.0
 AD> GNU gdb 6.1.1 [FreeBSD]
 AD> Copyright 2004 Free Software Foundation, Inc.
 AD> GDB is free software, covered by the GNU General Public License, and you 
are
 AD> welcome to change it and/or distribute copies of it under certain 
conditions.
 AD> Type "show copying" to see the conditions.
 AD> There is absolutely no warranty for GDB.  Type "show warranty" for details.
 AD> This GDB was configured as "i386-marcel-freebsd"...No struct type named 
linker_file.
 AD> No struct type named linker_file.
 AD> No struct type named linker_file.
 AD> Attempt to extract a component of a value that is not a structure.
 AD> No struct type named linker_file.
 AD> No struct type named linker_file.
 AD> No struct type named linker_file.
 AD> Attempt to extract a component of a value that is not a structure.

 AD> Attempt to extract a component of a value that is not a structure pointer.
 AD> Attempt to extract a component of a value that is not a structure pointer.
 AD> Attempt to extract a component of a value that is not a structure pointer.
 AD> Attempt to extract a component of a value that is not a structure pointer.
 AD> #0  0x8063d6b0 in doadump ()
 AD> (kgdb) bt
 AD> #0  0x8063d6b0 in doadump ()
 AD> #1  0x8063dc44 in boot ()
 AD> #2  0x8063e0ca in panic ()
 AD> #3  0x807dab3d in trap_fatal ()
 AD> #4  0x807daeba in trap_pfault ()
 AD> #5  0x807db7bd in trap ()
 AD> #6  0x807c2a3b in calltrap ()
 AD> #7  0x806dcb88 in rn_match ()
 AD> #8  0x806ddc8a in rn_lookup ()
 AD> #9  0x8070e460 in ipfw_chk (args=0xe70175fc) at 
../../../netinet/ip_fw2.c:1894
 AD> #10 0x80710c3d in ipfw_check_in (arg=0x0, m0=0xe7017700, ifp=0x91c5a800, 
dir=1, inp=0x0) at ../../../netinet/ip_fw_pfil.c:125
 AD> #11 0x806dc20f in pfil_run_hooks ()
 AD> #12 0x80713984 in ip_input (m=0x91954c00) at 
../../../netinet/ip_input.c:416
 AD> #13 0x806ec0d9 in ng_iface_rcvdata ()
 AD> #14 0x806e9570 in ng_apply_item ()
 AD> #15 0x806e8569 in ng_snd_item ()
 AD> #16 0x806e9570 in ng_apply_item ()
 AD> #17 0x806e8569 in ng_snd_item ()
 AD> #18 0x806e9570 in ng_apply_item ()
 AD> #19 0x806e8569 in ng_snd_item ()
 AD> #20 0x806f16a7 in ng_ppp_proto_recv ()
 AD> #21 0x806f3ed2 in ng_ppp_rcvdata ()
 AD> #22 0x806e9570 in ng_apply_item ()
 AD> #23 0x806e8569 in ng_snd_item ()
 AD> #24 0x806e9570 in ng_apply_item ()
 AD> #25 0x806e8569 in ng_snd_item ()
 AD> #26 0x806ee3c3 in ng_ksocket_incoming2 ()
 AD> #27 0x806e969d in ng_apply_item ()
 AD> #28 0x806ea8aa in ngintr ()
 AD> #29 0x806dab72 in swi_net ()
 AD> #30 0x8061e265 in ithread_loop ()
 AD> #31 0x8061adf5 in fork_exit ()
 AD> #32 0x807c2ab0 in fork_trampoline ()
 AD> (kgdb) fr 9
 AD> #9  0x8070e460 in ipfw_chk (args=0xe70175fc) at 
../../../netinet/ip_fw2.c:1894
 AD> 1894sa.sin_len = 8;
 AD> (kgdb) list
 AD> 1889struct sockaddr_in sa;
 AD> 1890
 AD> 1891if (tbl >= IPFW_TABLES_MAX)
 AD> 1892return (0);
 AD> 1893rnh = ch->tables[tbl];
 AD> 1894sa.sin_len = 8;
^ looks strange. On the line 1894 I expected to see rnh_lookup() call,
which is two lines below. Are you sure your source matches the built kernel?

 AD> 1895sa.sin_addr.s_addr = addr;
 AD> 1896ent = (struct table_entry *)(rnh->rnh_lookup(&sa, NULL, 
rnh));
 AD> 1897if (ent != NULL) {
 AD> 1898*val = ent->value;
 AD> (kgdb) p *cmd
 AD> $1 = {opcode = O_IP_SRC_LOOKUP, len = 1 '\001', arg1 = 2}
 AD> (kgdb) p cmd->arg1
 AD> $2 = 2

It crashed looking for src IP in table 2. But from ps otput I don't see the
process that could modify the table in that time. So the table might have been
corrupted earlier.

Unfortunately, reviewing provided info I don't have any good ideas what might
have caused this. May be other people on the list could help...

Recently I saw some backtrace of the crash in rn_match() too but then it was
pf that was looking for IP in the table. It appeared that the guy was running
ssh brute-force blocker and expiretable, which was run periodically, removed
old entries from the table. He just disabled expiretable and this stopped the
crashes.

Actually some output from crashinfo looks suspicious. zero values for fork()
calls, negative values in vmstat -m output... Does the userland where you were
running crashinfo matched the crushed kernel? Also, does you kernel match
userland on crashed box?

And certainly it would be good to provide backtrace with full debugging info
available :-). Do you remember that debugging symbols for modules are needed
too?
   
-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: 7.1-STABLE crash

2009-06-02 Thread Mikolaj Golub
On Tue, 02 Jun 2009 10:36:32 +0400 Asmodean Dark wrote:

>>> Do you really need a such outdated version of mpd while mpd5 is available 
>>> in ports?
> Yes, we need mpd3 because there are many our custom patches for it (some 
> RADIUS functions, bugfixes etc).
>
>>> Do you have some automatic blocker or some other script that periodically 
>>> add/remove IPs in a ipfw table?
> Yes. For some our users we use ipfw-based rules (SMTP port blocking, enabling 
> transparent proxy and other). Mpd ifaceup script contain something like this:
>   if [ "$FilterId" = "no_transparent" ]; then
>   $IPFW table 3 add $IP
>   fi
>
> And, in ipfw.rules:
>   add allow tcp from table(3) to any dst-port 80 in recv ng*
>
> Why table can be corrupted? ipfw check added address, isn`t it?

It might be that when ipfw was looking for IP in the table it was being
modified by ifaceup script at that time. ipfw has lock protection so this
thing should not have happened but...

Could you run something like this?

ps -auxl -M /path/to/vmcore -N /path/to/kernel.symbols

We could look if at the moment of the crash some process was running that was
adding/removing ipfw table.

Other good and simple thing is to run crashinfo(8) utility and provide its
output. If it runs flawlessly it should contain ps output too among other
useful information.

>>> Do the bt output for other crashes looks the same?
> It`s a first dump obtained. There is some problems in receiving it, because 
> all servers is network-booted (via PXE) and have no usable dump devices :) 
> This dump obtained with USB flash device connected to server. Also, crashes 
> is not often.

So we could try to get as much info from this dump as we can :-). Could you
post here the whole output of kgdb from its start, not only bt (I mean things
like "Reading symbols from..." and the command line itself)?

Also you can try in kgdb session:

fr 9
list
p *cmd
p cmd->arg1
p a

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: 7.1-STABLE crash

2009-06-01 Thread Mikolaj Golub

On Mon, 01 Jun 2009 10:05:46 +0400 Asmodean Dark wrote:

 AD> Hello,

 AD> We have cluster of FreeBSD VPN servers with running mpd3.18 and sometimes 
it crash:

Do you really need a such outdated version of mpd while mpd5 is available in
ports?

 AD> # uname -a
 AD> FreeBSD vpn 7.1-STABLE FreeBSD 7.1-STABLE #13: Wed Mar 18 14:53:13 YEKT 
2009 r...@vpn:/usr/src/sys/i386/compile/kconf  i386

 AD> # dmesg
 AD> Fatal trap 12: page fault while in kernel mode
 AD> cpuid = 1; apic id = 01
 AD> fault virtual address   = 0x8
 AD> fault code  = supervisor read, page not present
 AD> instruction pointer = 0x20:0x806dcb88
 AD> stack pointer   = 0x28:0xe70775c4
 AD> frame pointer   = 0x28:0xe70775e8
 AD> code segment= base 0x0, limit 0xf, type 0x1b
 AD> = DPL 0, pres 1, def32 1, gran 1
 AD> processor eflags= interrupt enabled, resume, IOPL = 0
 AD> current process = 22 (em0 taskq)
 AD> trap number = 12

 AD> (kgdb) bt
 AD> #0  0x8063d6b0 in doadump ()
 AD> #1  0x8063dc44 in boot ()
 AD> #2  0x8063e0ca in panic ()
 AD> #3  0x807dab3d in trap_fatal ()
 AD> #4  0x807daeba in trap_pfault ()
 AD> #5  0x807db7bd in trap ()
 AD> #6  0x807c2a3b in calltrap ()
 AD> #7  0x806dcb88 in rn_match ()
 AD> #8  0x806ddc8a in rn_lookup ()
 AD> #9  0x8070e460 in ipfw_chk (args=0xe70175fc) at 
../../../netinet/ip_fw2.c:1894
 AD> #10 0x80710c3d in ipfw_check_in (arg=0x0, m0=0xe7017700, ifp=0x91c5a800, 
dir=1, inp=0x0) at ../../../netinet/ip_fw_pfil.c:125
 AD> #11 0x806dc20f in pfil_run_hooks ()
 AD> #12 0x80713984 in ip_input (m=0x91954c00) at 
../../../netinet/ip_input.c:416
 AD> #13 0x806ec0d9 in ng_iface_rcvdata ()
 AD> #14 0x806e9570 in ng_apply_item ()
 AD> #15 0x806e8569 in ng_snd_item ()
 AD> #16 0x806e9570 in ng_apply_item ()
 AD> #17 0x806e8569 in ng_snd_item ()
 AD> #18 0x806e9570 in ng_apply_item ()
 AD> #19 0x806e8569 in ng_snd_item ()
 AD> #20 0x806f16a7 in ng_ppp_proto_recv ()
 AD> #21 0x806f3ed2 in ng_ppp_rcvdata ()
 AD> #22 0x806e9570 in ng_apply_item ()
 AD> #23 0x806e8569 in ng_snd_item ()
 AD> #24 0x806e9570 in ng_apply_item ()
 AD> #25 0x806e8569 in ng_snd_item ()
 AD> #26 0x806ee3c3 in ng_ksocket_incoming2 ()
 AD> #27 0x806e969d in ng_apply_item ()
 AD> #28 0x806ea8aa in ngintr ()
 AD> #29 0x806dab72 in swi_net ()
 AD> #30 0x8061e265 in ithread_loop ()
 AD> #31 0x8061adf5 in fork_exit ()
 AD> #32 0x807c2ab0 in fork_trampoline ()

It looks like the kernel crashed when ipfw was looking for the packet's
src/dst in a table. The table might have been corrupted (being modified?)  at
that time. Do you have some automatic blocker or some other script that
periodically add/remove IPs in a ipfw table?

 AD> What can I do with it? Are additional info needed?

It is a bit strange for me that in the bt output the file source information
is displayed only for several functions. Do you have the kernel and all
modules built with the debugging symbols? If you don't I would recommend to
rebuild the kernel so I would be able to provide bt (from the next crash :-)
with all necessary info available. Do the bt output for other crashes looks
the same?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Memory leak on thread removal

2009-05-17 Thread Mikolaj Golub

On Sat, 16 May 2009 20:24:09 +0200 Marius Nünnerich wrote:

 >> http://freshmeat.net/projects/lmdbg
 >>
 >> This is a small memory leak debugger. It does not provide all functionality
 >> you can find in more sophisticated tools but is lightweight, portable and
 >> simple in use. It was very useful when I traced this bug.

 MN> Thanks, I'll take a look at it.

Today I submitted lmdbg port.

http://www.freebsd.org/cgi/query-pr.cgi?pr=134617

At present it is waiting to be committed in ports tree, but you can use shar
from the PR to build the port yourself.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Memory leak on thread removal

2009-05-17 Thread Mikolaj Golub

On Tue, 12 May 2009 09:27:30 +0300 Mikolaj Golub wrote:

 MG> Hi,

 MG> The code below is compiled with -fopenmp and run on FreeBSD6/7 (i386, 
amd64):

 MG> #include 
 MG> #include 

 MG> int n = 4, m = 2;

 MG> int main () {
 MG> for (;;) {
 MG> int i;

 MG> //sleep(2);
 MG> #pragma omp parallel for num_threads(m)
 MG> for(i = 0; i < 1; i++) {}

 MG> //sleep(2);
 MG> #pragma omp parallel for num_threads(n)
 MG> for(i = 0; i < 1; i++) {}
 MG> 
 MG> }

 MG> return 0;
 MG> }

 MG> During the run the program's virtual memory usage constantly grows. The 
growth
 MG> is observed only when n != m. When running the program with uncommented
 MG> sleep() and observing the number of threads with 'top -H' I see in turn 2 
or 4
 MG> threads. So it looks like memory leak when thread is removed. Should I fill
 MG> PR?

Reported.

http://www.freebsd.org/cgi/query-pr.cgi?pr=134604

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Memory leak on thread removal

2009-05-16 Thread Mikolaj Golub

On Fri, 15 May 2009 13:48:51 +0200 Marius Nünnerich wrote:

 MN> On Tue, May 12, 2009 at 08:27, Mikolaj Golub  
wrote:
 >> Hi,
 >>
 >> The code below is compiled with -fopenmp and run on FreeBSD6/7 (i386, 
 >> amd64):
 >>
 >> #include 
 >> #include 
 >>
 >> int n = 4, m = 2;
 >>
 >> int main () {
 >>        for (;;) {
 >>                int i;
 >>
 >>                //sleep(2);
 >> #pragma omp parallel for num_threads(m)
 >>                for(i = 0; i < 1; i++) {}
 >>
 >>                //sleep(2);
 >> #pragma omp parallel for num_threads(n)
 >>                for(i = 0; i < 1; i++) {}
 >>
 >>        }
 >>
 >>        return 0;
 >> }
 >>
 >> During the run the program's virtual memory usage constantly grows. The 
 >> growth
 >> is observed only when n != m. When running the program with uncommented
 >> sleep() and observing the number of threads with 'top -H' I see in turn 2 
 >> or 4
 >> threads. So it looks like memory leak when thread is removed. Should I fill
 >> PR?

It looks like I have found the leak. The problem is in libgomp/team.c.
gomp_thread_start() does sem_init() but sem_destroy() is never called. This
patch solves the problem for me:

--- contrib/gcclibs/libgomp/team.c.orig 2009-05-16 17:32:57.0 +0300
+++ contrib/gcclibs/libgomp/team.c  2009-05-16 19:16:37.0 +0300
@@ -164,9 +164,12 @@ new_team (unsigned nthreads, struct gomp
 static void
 free_team (struct gomp_team *team)
 {
+  int i;
   free (team->work_shares);
   gomp_mutex_destroy (&team->work_share_lock);
   gomp_barrier_destroy (&team->barrier);
+  for(i = 1; i < team->nthreads; i++)
+gomp_sem_destroy (team->ordered_release[i]);
   gomp_sem_destroy (&team->master_release);
   free (team);
 }

I am going to fill PR to gcc mainstream, but should I also register this in
FreeBSD bugtrack as gcc is part of the base?

BTW, the problem is not observed under Linux. I have not looked in Linux code
but it looks like sem_init() implementation for Linux does not do memory
allocation. The memory for the test program below grows under FreeBSD and does
not under Linux.

#include 

int
main(int argc, char *argv[]) {

sem_t sem;

for(;;) { sem_init(&sem, 0, 0);}

return 0;
}


 MN> I can confirm this. I briefly looked through the libgomp code but
 MN> didn't see the leak. Anybody knows good tools how to investigate this?

http://freshmeat.net/projects/lmdbg

This is a small memory leak debugger. It does not provide all functionality
you can find in more sophisticated tools but is lightweight, portable and
simple in use. It was very useful when I traced this bug.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Memory leak on thread removal

2009-05-11 Thread Mikolaj Golub
Hi,

The code below is compiled with -fopenmp and run on FreeBSD6/7 (i386, amd64):

#include 
#include 

int n = 4, m = 2;

int main () {
for (;;) {
int i;

//sleep(2);
#pragma omp parallel for num_threads(m)
for(i = 0; i < 1; i++) {}

//sleep(2);
#pragma omp parallel for num_threads(n)
for(i = 0; i < 1; i++) {}

}

return 0;
}

During the run the program's virtual memory usage constantly grows. The growth
is observed only when n != m. When running the program with uncommented
sleep() and observing the number of threads with 'top -H' I see in turn 2 or 4
threads. So it looks like memory leak when thread is removed. Should I fill
PR?

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: diagnosing freezes (DRI?)

2009-04-11 Thread Mikolaj Golub

On Sat, 11 Apr 2009 11:15:59 +0100 xorquew...@googlemail.com wrote:

 x> On 2009-04-11 02:30:40, Paul B. Mahol wrote:
 >> 
 >> If it locks under X11 then use debug.debugger_on_panic=0 sysctl.
 >> Not doing this will increase drasticaly chances of locking whole system
 >> and not providing any debug data.

 x> I don't seem to have that sysctl.

You will see this sysctl only if you build your kernel with ddb(4) support.

If you are interested in providing useful information about your freezes,
please read the following:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

You need to build your kernel with options described in

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-ddb.html

and 

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

As you run X, you need to have debug.debugger_on_panic=0 set (as Paul has
suggested). Otherwise ddb would enter on panic but you wouldn't be able to
access it due to X. After panic you will be able to get useful information
from generated core dump using kgdb.

Another option is to set debug.debugger_on_panic=1 but also set some ddb
script that will run when the kernel debugger is entered as a result of a
panic. This script will enable output capture, dump some useful debugging info
to capture buffer, and then force a kernel dump to be written out followed by
a reboot. E.g. running something like this will do the trick:

ddb script 'kdb.enter.panic=capture on; show pcpu; show allpcpu; bt; ps; show 
locks; show alllocks; show lockedvnods; alltrace; call doadump; reset'

After reboot you can extract captured information from the capture buffer
information using the command:

ddb capture -M /var/crash/vmcore.X print > ddb.out

You need to increase the value of debug.ddb.capture.bufsize sysctl variable to
make sure all ddb output will be captured.

You can read more about this in ddb(4), ddb(8), textdump(4).

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


How to calculate current kmem usage?

2009-01-14 Thread Mikolaj Golub
Hi,

Could someone explain please, how to calculate current kernel memory
utilization, one that is limited by vm.kmem_size?

There is a script on http://wiki.freebsd.org/ZFSTuningGuide that calculates
kernel memory utilization by summing the values from `kldstat' output (TEXT)
and the values from `vmstat -m' output (DATA). Are these the only data needed
for proper calculation of kmem?

What about zone(9) allocations? Shouldn't data from `vmstat -z' output be
added to calculate kmem usage?

The reason I am asking about this is that we are tuning vfs.ufs.dirhash_maxmem
on our storage servers. By default it is 2Mb that looks like very small
value. We increased it to 30Mb and all 30Mb were filled very quickly, so we
are considering to increase it more but we need the method to monitor the
system resources we can hit (we use the default value for vm.kmem_size 300Mb
that is not so large).

So what the system parameters we should monitor increasing 
vfs.ufs.dirhash_maxmem?
I see the growth of dirhash_maxmem corresponds the growth of wired
memory. Currently wired is 222M on this host. Isn't wired memory limited by
vm.kmem_size or it is limited only by vm.kvm_size?

BTW, how reasonably large the value of vfs.ufs.dirhash_maxmem can be? I have
seen recommendations to increase it until it all in usage, but may be there
are other considerations I should take into account? We use rsync on our
storage servers to synchronize data between the hosts and I suppose this is the
main dirhash_mem eater.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Socket leak

2008-05-14 Thread Mikolaj Golub

On Wed, 14 May 2008 09:46:35 -0400 Mark Saad wrote:

 MS> Mikolaj
 MS>   Thanks for the input, did you change any of the options for
 MS> TimeoutLinger or TimeoutIdle ?

No, I didn't

 MS> The Proftpd I am running is build for 6.3-RELEASE  here are the build
 MS> options

 MS> Compile-time Settings:
 MS>  Version: 1.3.0a
 MS>  Platform: FREEBSD6 (FREEBSD6_3)
 MS>  Built With:
 MS>configure CPPFLAGS=-DHAVE_OPENSSL --localstatedir=/var/run
 MS> --disable-sendfile --disable-ipv6
 MS> --with-modules=mod_sql:mod_sql_mysql:mod_check_mysql:mod_check_digest
 MS> --prefix=/usr/local
 MS> --with-includes=/usr/local/include/mysql:/usr/include/openssl
 MS> --with-libraries=/usr/local/lib/mysql

It might be that it is not proftpd but other application that cause the leak. 
Anyway, to check if it is proftpd, look in its logs for entries like these:  

  Entering Passive Mode (192,168,0,213,241,70).
  FTP session closed.

Convert the last two numbers to port (241*256+70) and check by netstat if you
still have this connection. If you have, then it is likely this is the same
situation as in my case and the proftpd is a problem. Upgrade to 1.3.1 from
ports then.

If proftpd is ok, look for other applications. Search for connections reported
by netstat as ESTABLISHED but not displayed by sockstat utility. You could run
something like this:

netstat -an | grep ESTABL |
while read b l a local remote state; do
echo -n "$local $remote: "
sockstat |
sed -e 's/:/./g' |
grep -c "$local *$remote"
done

Look for sockets with 0 count. These are suspicious ones. Observe these
sockets by netstat and try to figure out what application they could belong
and dig in that direction.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Socket leak

2008-05-14 Thread Mikolaj Golub

On Tue, 13 May 2008 19:37:29 -0400 Mark Saad wrote:

 MS> I started logging the values of kern.ipc.numopensockets and I noticed
 MS> that something is leaking sockets. Here is a sample of the log

 MS> 2008-04-29--15:04.10  kern.ipc.numopensockets: 1501
 MS> 2008-04-29--16:04.01  kern.ipc.numopensockets: 1535
 MS> 2008-04-29--17:04.00  kern.ipc.numopensockets: 1617
 MS> 2008-04-29--18:04.00  kern.ipc.numopensockets: 1710

 MS> This continues until kern.ipc.maxsockets its reached or the box is
 MS> rebooted.

 MS> The other thing we looked at was the output from vmstat -z
 MS> The first thing was the high amount of malloc 128 bucket failures

 MS> 128 Bucket:524,0, 2489,   80, 8364, 23055239

 MS> I also logged the mbuf clusters, we never reached the max mbuf clusters

 MS> Its almost like there are stale sockets. Here is a snapshot of the server 
now

 MS> ewr# sockstat -4u |wc -l
 MS> 139
 MS> ewr# sysctl kern.ipc.numopensockets
 MS> kern.ipc.numopensockets: 13935

 MS> ewr# uptime
 MS> 7:30PM  up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17

We had the same problem on one of hosts running 6.2-RELEASE-p11. The situation
was complicated by the fact that I didn't have root access to the host and
there were problems with getting more debugging or running tcpdump.

Eventually, it appeared that problem was caused by proftpd. One of our clients
connected to ftp server every five minutes looking for new file to
download. When there was the file everything was good. But when there wasn't,
some strange things happened. In proftpd logs we had:

FTP session opened.
mod_delay/0.5: delaying for 28 usecs
user fake authenticated by mod_auth_pam.c
USER fake: Login successful.
Preparing to chroot to directory '/var/ftp/fake'
Environment successfully chroot()ed.
mod_delay/0.5: delaying for 621 usecs
Entering Passive Mode (XX,YY,ZZ,213,241,70).
FTP session closed.

i.e. the client connected to server, had login successful, created new DATA
connection in passive mode and then exited. But although proftpd reported that
connection closed and proftpd process exited we still had this orphaned
connection in our system reported by netstat in ESTABLISHED state. sockstat
did not display this connections. Some of these connections could be in
CLOSE_WAIT mode instead of ESTABLISHED. Such connection was seen by netstat
for several hours and then disappeared but I suspect that the socket buffer
was not freed and numopensockets counter did not decrease.

Unfortunately, I did not managed to persuade admin to increase DebugLevel in
proftpd.conf and run tcpdump to investigate more what was going on. It turned
out that we had proftpd built for FREEBSD5_4:

Compile-time Settings:
  Version: 1.3.0
  Platform: FREEBSD5 (FREEBSD5_4)
  Built With:
configure --localstatedir=/var/run --sysconfdir=/usr/local/etc 
--disable-sendfile --disable-ipv6 
--with-modules=mod_ratio:mod_readme:mod_rewrite:mod_wrap:mod_ifsession 
--prefix=/usr/local i386-portbld-freebsd5.4

Upgrade to more recent proftpd built for proper platform resolved the problem.

So I would recommend to look for process that could cause this leak. In my
case careful investigation of netstat output history and comparing with
sockstat output helped to find guilty. May be it would help to restart daemons
one by one and see if sockets are freed.

You can surely increase kern.ipc.maxsockets as workaround until you identify
what causes the problem.

-- 
Mikolaj Golub
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"